Build a Python Flask API for your Delta Lake

Francisco Ruiz A
Applying Software Engineering Practices to Big Data

After the recent announcement on the Databricks blog about querying your Delta Lake natively with Python (and other languages) without Apache Spark, I got curious about what a Flask API endpoint would look like, so here it is.

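import json

import flask
from deltalake import DeltaTable
from flask import jsonify

app = flask.Flask(__name__)
app.config["DEBUG"] = True

@app.route('/read-delta-table', methods=['GET'])
def home():
    # Open the Delta table stored at this path (no Spark session needed)
    dt = DeltaTable("/tmp/delta/students-delta-table/")

    # Load the whole table into a pandas DataFrame via PyArrow
    df = dt.to_pyarrow_dataset().to_table().to_pandas()

    # Let pandas serialize the rows to a JSON string, one object per row
    json_str = df.to_json(orient="records")

    # Parse it back into Python objects so jsonify can build the response
    parsed = json.loads(json_str)

    return jsonify(parsed)

# Start the Flask development server
app.run()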

Running the API
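
To check the endpoint from another process, a small client along these lines should do. It is only a sketch: it assumes the server was started with the plain app.run() call above, so it listens on Flask's default development address (127.0.0.1:5000), and the names BASE_URL, endpoint_url and fetch_rows are made up for this example.

```python
import json
from urllib.request import urlopen

# Assumption: the Flask development server runs locally on its default
# address; change BASE_URL if you passed a host or port to app.run().
BASE_URL = "http://127.0.0.1:5000"


def endpoint_url(base_url=BASE_URL):
    """Build the URL of the /read-delta-table route (hypothetical helper)."""
    return base_url.rstrip("/") + "/read-delta-table"


def fetch_rows(base_url=BASE_URL, timeout=10):
    """GET the endpoint and decode the JSON array it returns."""
    with urlopen(endpoint_url(base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, fetch_rows() should return a list of dicts, one per row of the table.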

Pre-requisites:

  1. To compile the Python bindings you need the nightly version of Rust

    [to install]
    $ rustup toolchain install nightly

    [to use]
    $ cd ~/projects/needs-nightly
    $ rustup override set nightly

  2. You need the maturin package to build the wheel (.whl)

    $ pip install maturin
    $ maturin build

This is still an experimental interface to Delta Lake for Rust with native bindings for Python, so proceed with caution. In particular, you wouldn't want to expose an ocean of data through an endpoint.
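
One way to keep the endpoint from returning an entire table is to cap the number of rows per request. The sketch below is an assumption-heavy illustration, not part of delta-rs: parse_limit and read_rows are hypothetical helpers, the default of 100 and maximum of 1000 are arbitrary, and read_rows assumes your installed pyarrow provides Dataset.head().

```python
import json


def parse_limit(raw, default=100, maximum=1000):
    """Turn a raw ?limit= query value into a safe row count (hypothetical)."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Missing or non-numeric input falls back to the default
        return default
    if value < 1:
        return default
    # Never return more than the configured maximum
    return min(value, maximum)


def read_rows(table_path, limit):
    """Read at most `limit` rows from a Delta table as a list of dicts."""
    # Imported lazily so parse_limit can be used without deltalake installed
    from deltalake import DeltaTable

    dataset = DeltaTable(table_path).to_pyarrow_dataset()
    # Assumption: Dataset.head() is available in the installed pyarrow
    head = dataset.head(limit)
    return json.loads(head.to_pandas().to_json(orient="records"))
```

Inside the route, the limit could come from the query string, for example parse_limit(request.args.get("limit")), with request imported from flask.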

I'm excited about this project. Being able to query Delta tables from front-end apps (not via Apache Spark) was a missing piece in the Delta Lake puzzle.

Fantastic effort by the delta-rs contributors:

Delta-rs Git repo is here