Aerospike is an extremely fast and reliable data store that supports millions of transactions with incredibly low read and write latency. SQL is the lingua franca for data analysis. The key challenge here is that Aerospike is a NoSQL database, hence you cannot use standard SQL tools to analyze data stored in it. Aerospike offers AQL for lightweight administrative tasks for database maintenance. However, our developer community has been asking for a tool that would enable them to quickly browse or discover data stored in the Aerospike clusters visually along with the ability to run ANSI SQL queries. This has provided the impetus for our effort to create a data browser.
There are a few open-source data browsers such as Clairvoyance that can help you view data stored in Aerospike however, but none of them are feature-complete or actively maintained.
Presto is a highly parallel and distributed SQL query engine. We recently announced the GA of the Aerospike Connect for Presto, which can be used to run ANSI SQL queries to perform in-place, on-demand analytics on massive amounts of data in the Aerospike database via Presto.
So, it was a logical next step for us to leverage Presto and the Presto connector for our data browser design.
Aerospike Data Browser is basically a stack that consists of Quix, Presto, and the Aerospike Connector for Presto, and is dockerized. Figure 1 depicts an under the hood view. The Quix UI provides a DB Explorer and a SQL editor, in addition to a notebook manager for managing your notebooks. Presto exposes a JDBC interface to Quix and uses the Aerospike Connector to translate SQL queries into API calls to the DB. Building a stack with the aforementioned components for a desktop installation is not trivial by any means. Presto can scale to 100’s of nodes for a large scale deployment, but we wanted to limit the data browser to a single Presto instance that would run both the coordinator and worker in the developer's desktop environment. Our initial size of the Presto docker image was over 2GB, which was not acceptable. Hence, we stripped out all but the Aerospike connector from the plugin directory. Similarly, we had to downsize the Quix connector. Finally, we got the compressed docker image size under 1GB. We also made a design decision to default to schema inference so that a user that does not know the schema apriori is not left out.
The data browser enables you to:
- View namespace/set/schema of data stored in Aerospike, without the need to know the schema apriori, across multiple Aerospike clusters.
- Run ANSI SQL queries for data exploration, including aggregate functions.
- Visualize data.
It can be installed very easily using docker. See the Github repo for details on installation, configuration, and usage.
The DB explorer component of the UI allows you to browse the database and list catalogs, namespaces, and sets that are available as shown in Figure 2. Note that a catalog corresponds to an Aerospike cluster name, schema to a namespace, and table to an Aerospike set name. See SQL syntax page for examples and more information.
Also, there is an option to save your queries to a notebook for future use.
Supported SQL statements:
- SHOW CATALOGS
- SHOW SCHEMAS
- SHOW TABLES
- INSERT INTO
Figure 3 depicts the result of a query that was written in the query editor built into the UI.
SELECT * FROM aerospike.test.covid_stats LIMIT 1000
You can also visualize the data using the charting capabilities built into the UI as depicted in Figure 4.
Finally, you can use aggregate functions as depicted in Figure 5.
If you find your data in the Aerospike database staring at you waiting to be quickly discovered, look no further. Use the Aerospike Data Browser to browse through the namespaces and sets to understand the schema of the data. Further, you can run supported ANSI SQL queries against it to generate quick insights. And finally, create some cool visualization. So, try it out by downloading the Aerospike Community edition if you don’t have an existing Aerospike deployment, and follow the instructions in the data browser repo.
Keep in mind that the data browser is not an analytics tool and should not be used for complex analysis of large datasets. You should consider using the Aerospike Connect for Presto for analytics use cases.
Spoiler alert! We are working on the Aerospike JDBC driver, which is lightweight, more aligned with the Aerospike server functionality, and easily droppable in any JDBC data browser application. Stay tuned!
Subscribe to the Aerospike Developer Blog for interesting blogs and other info.