I am new to Spark, but so far I would say it is the set of code libraries for building a DIY Athena: it lets you write SQL against files in a file system (e.g. S3) as if they were tables in a database.
Some rather dense documentation discusses a Python DataFrame API through which SQL accesses these files. As far as I can tell, a DataFrame is a distributed table of rows with named columns, closer to a database table than to a matrix. So SQL over DataFrames, essentially.
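To make "SQL over DataFrames" concrete, here is a minimal sketch, assuming PySpark and a Java runtime are installed locally. The function name `count_by_city`, the view name `people`, and the sample rows are all made up for illustration. A small in-memory DataFrame stands in for files on S3 so the example runs without any cloud setup.

```python
def count_by_city():
    """Register a DataFrame as a SQL view and query it (illustrative sketch)."""
    # Imported inside the function so this file still loads where PySpark is absent.
    from pyspark.sql import SparkSession

    # Local single-threaded session; a real cluster would be configured differently.
    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("sql-over-dataframes")
        .getOrCreate()
    )
    try:
        # Stand-in data. With real files this would be a read call instead,
        # e.g. spark.read.parquet(<path to your files>) (path is hypothetical).
        df = spark.createDataFrame(
            [("alice", "Berlin"), ("bob", "Oslo"), ("carol", "Berlin")],
            ["name", "city"],
        )

        # Expose the DataFrame to the SQL engine under a table-like name.
        df.createOrReplaceTempView("people")

        # Plain SQL against the view; the result is itself a DataFrame.
        result = spark.sql(
            "SELECT city, COUNT(*) AS n FROM people GROUP BY city ORDER BY city"
        )
        return [(row["city"], row["n"]) for row in result.collect()]
    finally:
        spark.stop()
```

Calling `count_by_city()` should return one `(city, count)` pair per city. The step that ties SQL to the DataFrame is `createOrReplaceTempView`, which gives the DataFrame a name that the query can refer to.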