Mike Houngbadji
Apache Spark SQL / Hive: Create External Table based on File in HDFS

To create a table based on a file located in HDFS, we'll proceed as follows:

  • Upload the file/folder to HDFS:
hadoop fs -put /local/source/location /hdfs/destination/location
  • Create the external table using the SQL below (specifying a 'path' in OPTIONS makes the table unmanaged/external — Spark reads the data in place rather than copying it into its warehouse):
CREATE TABLE sample_table(
        key STRING,
        data STRING)
USING CSV  -- Match this to the format of your source files
OPTIONS ('delimiter'=',',  -- Only needed for delimited files
        'path'='hdfs:///hdfs/destination/location')
  • We can now query the table:
SELECT *
FROM sample_table

References:
SparkSQL Documentation - Create Table

PS:
I also wrote this to help myself retrieve the solution faster.
