Types of Apache Spark tables and views

Subash Sivaji ・ 2 min read
Global Managed Table

A managed table is a Spark SQL table for which Spark manages both the data and the metadata. A global managed table is registered in the metastore, so it is available across all clusters that share that metastore. When you drop the table, both the data and the metadata are deleted.

dataframe.write.saveAsTable("my_table")
Global Unmanaged/External Table

Spark manages the metadata, while you control the data location. As soon as you add the 'path' option to the DataFrame writer, the table is treated as a global external/unmanaged table. When you drop the table, only the metadata is dropped; the data files remain at the path. Like a managed table, a global unmanaged/external table is available across all clusters that share the metastore.

dataframe.write.option('path', "<your-storage-path>").saveAsTable("my_table")
Local Table (a.k.a. Temporary Table or Temporary View)

Scoped to the Spark session that created it. A local table is not accessible from other Spark sessions or clusters (in Databricks, not from other notebooks either), and it is not registered in the metastore.

dataframe.createOrReplaceTempView("my_temp_view")
Global Temporary View

Scoped to the Spark application: global temporary views are tied to the system preserved database global_temp and are dropped when the application terminates. A global temporary view can be shared across different Spark sessions in the same application (in Databricks, across notebooks attached to the same cluster).

dataframe.createOrReplaceGlobalTempView("my_global_view")

From any session in the same application, it can be accessed as:

spark.read.table("global_temp.my_global_view")
Global Permanent View

Persist a query as a permanent view. The view definition is recorded in the underlying metastore. You can only create a permanent view on top of a global managed table or a global unmanaged table; Spark does not allow a permanent view to reference a temporary view or an unregistered DataFrame. Note: permanent views can only be created through the SQL API, not the DataFrame API.

spark.sql("CREATE VIEW permanent_view AS SELECT * FROM my_table")

There is no DataFrame method such as dataframe.createOrReplacePermanentView().

References:

http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter.saveAsTable
https://docs.azuredatabricks.net/spark/latest/spark-sql/language-manual/create-table.html
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.createOrReplaceTempView
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.createOrReplaceGlobalTempView
https://stackoverflow.com/questions/48552620/is-it-possible-to-create-persistent-view-in-spark

Migrated from Medium, where it was originally posted.
