DEV Community

bronifty


Spark Is a DIY Athena - Or SQL Over Python Dataframes

I am new to Spark, but so far I would say: it is a set of code libraries for building a DIY Athena, which lets you write SQL against a file system (e.g. S3) as if it were a database.

The rather dense documentation describes a Python DataFrame API through which SQL accesses these files. A DataFrame here is a distributed table of rows with named, typed columns, so it is closer to a database table than to a matrix. So SQL over DataFrames, essentially.
