Explore the Future of Martropolis with Hadoop and Hive

Introduction

In the year 2150, Earth's resources have been depleted, and humanity has established a thriving metropolis on Mars known as Martropolis. As an environmental protection officer, your mission is to keep this city sustainable by analyzing and optimizing its resource use. To do that, you will use Hadoop and Hive to process and analyze large volumes of environmental data.

Your objective is to explore the Hive database, inspect its structure, and understand the data it contains. Knowing how to list and describe databases, tables, and partitions in Hive is the groundwork for the analysis that will guide decisions about Martropolis and its delicate ecosystem.

Connect to Hive and List Available Databases

In this step, you will learn how to connect to the Hive environment and list the available databases.

First, switch to the hadoop user by running the following command in the terminal:

su - hadoop

Now, launch the Hive shell by executing the following command:

hive

Once you're in the Hive shell, use the SHOW DATABASES command to list all available databases. Every statement in the Hive shell must end with a semicolon.

SHOW DATABASES;

This command will display a list of databases, including the default database.
Example output:

hive> SHOW DATABASES;
OK
default
martropolis
Time taken: 0.528 seconds, Fetched: 2 row(s)
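
When a metastore holds many databases, the list can be narrowed with a pattern. The following is a minimal sketch, assuming a Hive version that accepts the LIKE clause on SHOW DATABASES; the pattern 'mart*' is only an illustration.

```sql
-- List only the databases whose names start with "mart".
-- In Hive patterns, * matches any sequence of characters
-- and | separates alternatives.
SHOW DATABASES LIKE 'mart*';
```

With the two databases shown above, this should return only martropolis.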

Switch to the 'martropolis' Database

In this step, you will switch to the martropolis database, which contains the tables relevant to your mission.

USE martropolis;

After executing this command, any table name you use without a database prefix refers to a table in the martropolis database.

Tip: martropolis has been automatically created by the system as a sample database for this lab.
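
If you lose track of which database a session is using, you can ask Hive directly. This is a small sketch, assuming the Hive CLI used in this lab; the SET line is optional and affects only the current session.

```sql
-- Return the name of the database the session is currently using.
SELECT current_database();

-- Optional, Hive CLI only: show the current database in the prompt.
SET hive.cli.print.current.db=true;
```

After the SET command, the prompt should read something like hive (martropolis)> instead of hive>.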

List Tables in the 'martropolis' Database

Now that you're in the martropolis database, you can list all the tables it contains using the SHOW TABLES command.

SHOW TABLES;

This command will display a list of tables available in the martropolis database.
Example output:

hive> SHOW TABLES;
OK
sensor_data
Time taken: 0.028 seconds, Fetched: 1 row(s)
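
SHOW TABLES can also target a database by name and filter by pattern, which helps when you do not want to switch databases first. The sketch below assumes the martropolis database from this lab; the pattern 'sensor*' is illustrative.

```sql
-- List the tables of a named database without running USE first.
SHOW TABLES IN martropolis;

-- Filter the table list by a pattern (* is the wildcard).
SHOW TABLES IN martropolis 'sensor*';
```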

Describe the Structure of a Table

To understand the structure of a table, you can use the DESCRIBE command followed by the table name.

DESCRIBE sensor_data;

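This command lists each column's name, data type, and comment (empty here, because no comments were defined). For a partitioned table, a "# Partition Information" section follows and lists the partition columns. That is why dt appears twice in the output below: once as a regular column and once as the partition column.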
Example output:

hive> DESCRIBE sensor_data;
OK
sensor_id               int                                         
sensor_name             string                                      
reading                 double                                      
dt                      string                                      

# Partition Information          
# col_name              data_type               comment             
dt                      string                                      
Time taken: 0.154 seconds, Fetched: 8 row(s)
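
DESCRIBE also accepts a database-qualified table name and, in recent versions, a column name. The following is a hedged sketch: the single-column form assumes Hive 2.0 or later, which the output in this lab suggests but does not confirm.

```sql
-- Qualify the table with its database so the command works
-- regardless of which database the session is using.
DESCRIBE martropolis.sensor_data;

-- Describe a single column (syntax accepted by Hive 2.0 and later).
DESCRIBE sensor_data reading;
```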

Explore Table Properties

In addition to the table structure, you can also explore the properties of a table using the DESCRIBE EXTENDED command.

DESCRIBE EXTENDED sensor_data;

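This command repeats the column listing and then appends a single "Detailed Table Information" line that serializes the table's metastore record. In it you can find the owner, the creation time (createTime, in Unix epoch seconds), the HDFS location, the input and output formats, the SerDe library, the partition keys, basic statistics such as numRows and totalSize, and the table type (tableType).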
Example output:

hive> DESCRIBE EXTENDED sensor_data;
OK
sensor_id               int                                         
sensor_name             string                                      
reading                 double                                      
dt                      string                                      

# Partition Information          
# col_name              data_type               comment             
dt                      string                                      

Detailed Table Information      Table(tableName:sensor_data, dbName:martropolis, owner:hadoop, createTime:1711106250, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:sensor_id, type:int, comment:null), FieldSchema(name:sensor_name, type:string, comment:null), FieldSchema(name:reading, type:double, comment:null), FieldSchema(name:dt, type:string, comment:null)], location:hdfs://localhost:9000/user/hive/warehouse/martropolis.db/sensor_data, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[FieldSchema(name:dt, type:string, comment:null)], parameters:{totalSize=49, numRows=2, rawDataSize=47, COLUMN_STATS_ACCURATE={\"BASIC_STATS\":\"true\"}, numFiles=1, numPartitions=1, transient_lastDdlTime=1711106250, bucketing_version=2}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)
Time taken: 0.367 seconds, Fetched: 10 row(s)
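
The single dense line produced by DESCRIBE EXTENDED is hard to scan. As an alternative, Hive offers commands that present the same kind of metadata in a more readable layout. This is a brief sketch using the sensor_data table from this lab.

```sql
-- Table metadata laid out as labeled rows instead of one long line.
DESCRIBE FORMATTED sensor_data;

-- Only the table's key/value properties.
SHOW TBLPROPERTIES sensor_data;
```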

Analyze Table Partitions (Optional)

If a table is partitioned, as sensor_data is on the dt column, you can use the SHOW PARTITIONS command to view its partitions.

SHOW PARTITIONS sensor_data;

Each line of the output is one partition, written as key=value pairs of the partition columns.
Example output:

hive> SHOW PARTITIONS sensor_data;
OK
dt=2023-05-01
Time taken: 0.099 seconds, Fetched: 1 row(s)
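
Once you know which partitions exist, you can restrict both metadata commands and queries to one of them. The following is a minimal sketch, assuming the dt=2023-05-01 partition shown above; the SELECT goes one step beyond the commands covered in this lab and is included only to show why partitions matter.

```sql
-- Show only the partitions that match a partition spec.
SHOW PARTITIONS sensor_data PARTITION (dt='2023-05-01');

-- Read one partition. Filtering on the partition column lets Hive
-- scan only that partition's directory instead of the whole table.
SELECT sensor_id, sensor_name, reading
FROM sensor_data
WHERE dt = '2023-05-01';
```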

Summary

In this lab, you learned how to connect to Hive, list and switch between databases, list tables, and describe the structure and properties of a table.

You practiced the SHOW DATABASES, USE, SHOW TABLES, DESCRIBE, DESCRIBE EXTENDED, and SHOW PARTITIONS commands. These are the basic tools for understanding how data is organized in Hive, and they are the first step toward analyzing the environmental data of Martropolis.