DEV Community

loading...

Flex Tables In Vertica

Tomaž Vinko
Interested in programming all kind of stuff, from tiny IoT devices to web apps
・3 min read

Vertica Analytics Platform

Column-oriented Vertica Analytics Platform was designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses and other query-intensive applications.

This Wikipedia citation was enough for me to start exploring the Vertica database.

After my journey into the realm of Vertica, I found great support for machine learning, efficient columnar storage, great performance and Flex Tables among others.

I found the latter one very interesting, so let's dive into it.

What are Flex Tables

Flex Tables are nothing more than flexible tables :)
They are different kinds of database tables, designed for loading and querying unstructured data.
Flex tables can contain only unstructured, raw data, or both unstructured and columnar data.
You can create a flex table with or without a schema or real columns. Hybrid tables consist of both unstructured and real columns.

Let's get our hands dirty

If you don't have Vertica already, grab free version

I run queries in DbVisualizer, but you can also use other Vertica compatible tools.

We'll import 1 million records from the CSV file into Flex Table.
First, download and extract the SalesRecords.zip archive.

Then create flex table:

CREATE FLEX TABLE csv_hello_world();
Enter fullscreen mode Exit fullscreen mode

We created two tables with the previous command.
First one is flex table csv_hello_world and the second one is auto generated normal table csv_hello_world_keys.

csv_hello_world_keys table contains three columns:

  • key_name: The name of the virtual column (key).
  • frequency: The number of times the key occurs in the map.
  • data_type_guess: The type guess that the helper functions determine for each distinct key in the map data.

We'll need to use COMPUTE_FLEXTABLE_KEYS to calculate this information for the flex keys table columns:

SELECT compute_flextable_keys('csv_hello_world');
Enter fullscreen mode Exit fullscreen mode

Let's fire query command to see the content:

SELECT * 
FROM csv_hello_world_keys;
Enter fullscreen mode Exit fullscreen mode

As expected, the table is empty. We didn't import our CSV yet, so the function doesn't have any information for type guessing.

Import CSV with following statement:

COPY csv_hello_world 
FROM '/home/dbadmin/docker/data/SalesRecords.csv' PARSER fcsvparser();
Enter fullscreen mode Exit fullscreen mode

Change file path to your local path where you extracted the archive. I'm running Vertica in docker so this is mapped to my host machine.

Now run COMPUTE_FLEXTABLE_KEYS command again:

SELECT compute_flextable_keys('csv_hello_world');
Enter fullscreen mode Exit fullscreen mode

and check csv_hello_world_keys table for updated keys:

SELECT * 
FROM csv_hello_world_keys;
Enter fullscreen mode Exit fullscreen mode

Voila, we have columns information now:
Alt Text

If we query our csv_hello_world table, we'll see two columns:

  • identity which is just auto increment field
  • raw binary column that stores loaded data Alt Text

This doesn't help us much. But if we query virtual columns explicitly, we'll get the values:

SELECT 
        "Region",
        "Country",
        "Item Type",
        "Sales Channel",
        "Order Priority",
        "Order Date",
        "Order ID",
        "Ship Date",
        "Units Sold",
        "Unit Price",
        "Unit Cost",
        "Total Revenue",
        "Total Cost",
        "Total Profit"
FROM csv_hello_world;
Enter fullscreen mode Exit fullscreen mode

Alt Text

Conclusion

Flex tables are just one of the many interesting Vertica features. I encourage you to dive deeper, especially the machine learning part.
SQL script for this tutorial can be found here.

Discussion (0)