DEV Community

bronifty

Posted on Sep 22, 2022

Spark Is a DIY Athena - Or SQL Over Python Dataframes

I am new to Spark, but so far I would say it is the set of code libraries for a DIY Athena: it lets you write SQL against files in a file system (e.g. S3) as if they were a database.

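Some rather dense documentation discusses a DataFrame API, usable from Python among other languages, through which SQL accesses these files. A DataFrame is a distributed table of rows with named columns, so it is closer to a database table than to a matrix. So SQL over DataFrames, essentially.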
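To make "SQL over DataFrames" concrete, here is a minimal sketch using PySpark. It assumes `pyspark` and a Java runtime are installed and runs Spark in local mode against a small local CSV rather than S3. The file contents, the `people` view name, and both helper functions are made up for illustration and are not from any documentation.

```python
import csv


def write_sample_csv(path):
    """Write a tiny CSV standing in for a file that would normally live on S3."""
    rows = [("Ada", 36), ("Linus", 12), ("Grace", 45)]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "age"])
        writer.writerows(rows)


def adults_via_sql(csv_path):
    """Load the CSV into a Spark DataFrame, then query it with plain SQL."""
    # Imported lazily so write_sample_csv stays usable without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")          # single local worker, no cluster needed
        .appName("sql-over-files")   # hypothetical app name
        .getOrCreate()
    )
    try:
        # The file becomes a DataFrame; column types are inferred from the data.
        df = spark.read.csv(csv_path, header=True, inferSchema=True)
        # Registering a temp view is what makes the DataFrame visible to SQL.
        df.createOrReplaceTempView("people")
        result = spark.sql(
            "SELECT name, age FROM people WHERE age >= 18 ORDER BY name"
        )
        return [row.asDict() for row in result.collect()]
    finally:
        spark.stop()
```

Calling `write_sample_csv("people.csv")` followed by `adults_via_sql("people.csv")` should return the rows for Ada and Grace. Pointing `spark.read.csv` at an `s3a://` path instead of a local one is the usual route to the Athena-like setup, though that needs extra Hadoop/AWS configuration not shown here.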
