Jacob Cohen for HarperDB

5 Ways to Use HarperDB in Your Next Project

HarperDB strives to provide the simplest and most streamlined database solution for developers everywhere. That said, just because it’s simple doesn’t mean it’s not powerful. HarperDB provides plenty of powerful tools to use across a diverse set of projects. In a year that has been unhinged, HarperDB has only added stability. Features and functionality have been hardened, and we’ve added some exciting new features like Upsert, JSON Search, and Token Authentication. As we look forward to next year, I wanted to hit on a few of what I consider to be emerging and ongoing trends heading into 2021. Let’s take a look at some examples of how you can use HarperDB in your next project!

Metadata Management

Let’s start simple with metadata. As our ability to collect data improves, our data footprint also grows to the point where we are storing metadata, which literally means data about data. It’s data that informs us about other data, and that right there is how you know you need a powerful database solution.

Let’s dig into an example. I’m going to take one of the easier, more obvious examples: photos, particularly photos from your phone. Both Android and iOS (and most modern digital cameras) store EXIF data, a standardized format for photo metadata that includes typical fields like timestamp, dimensions, and resolution. It also includes more detailed data like device make and model, aperture value, focal length, ISO speed, F-number, and latitude and longitude. That means there are plenty of different ways to search for photos in an application.
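
To make that concrete, here’s a minimal sketch of what one photo’s metadata could look like going into HarperDB through its JSON operations API. The instance URL, credentials, schema, and attribute names are placeholders I made up for illustration; adjust them to your own instance and data.

// Minimal sketch: insert one photo's EXIF metadata via HarperDB's JSON
// operations API. URL, credentials, and attribute names are placeholders.
// Assumes a runtime with a global fetch (Node 18+ or a browser) and Node's Buffer.
const INSTANCE_URL = "https://my-instance.harperdbcloud.com";
const AUTH = "Basic " + Buffer.from("username:password").toString("base64");

async function hdb(operation: Record<string, unknown>): Promise<unknown> {
  const res = await fetch(INSTANCE_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: AUTH },
    body: JSON.stringify(operation),
  });
  return res.json();
}

hdb({
  operation: "insert",
  schema: "photos",
  table: "metadata",
  records: [
    {
      id: "IMG_0421",
      timestamp: "2020-11-14T16:32:05Z",
      device_make: "Apple",
      device_model: "iPhone 11",
      iso_speed: 100,
      f_number: 1.8,
      focal_length_mm: 4.25,
      geojson_point: { type: "Point", coordinates: [-77.035257, 38.889571] },
    },
  ],
}).then(console.log);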

Why is HarperDB better at this than the average NoSQL database? Enter my old friend SQL! Data can be ingested into HarperDB however you want, but the easiest way to build a complex conditional query is with SQL. For example, I can find all photos taken on an Apple device with an ISO speed between 25 and 200 with the following query:

SELECT *
FROM photos.metadata 
WHERE device_make = 'Apple' 
  AND iso_speed BETWEEN 25 AND 200

Yeah, I get it, that’s a weird thing to be looking for, but maybe that’s exactly what I need.

Geospatial Data Analysis

Heavily related to metadata, but more specific, geospatial data brings the added complication of maps, and wow is it complex! So complex, in fact, that there are a bunch of competing geospatial data standards. I’m a big fan of JSON, so I tend to use GeoJSON when dealing with geospatial data. This certainly falls in the metadata category, but geospatial data is considered its own subset because it requires specially tailored functions to effectively gain insights from the data.
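
For reference, a single location in GeoJSON is just a Point with coordinates in [longitude, latitude] order (longitude first), which is the shape stored in the geojson_point attribute used below.

// A GeoJSON Point (here, the Washington Monument): [longitude, latitude] order.
const washingtonMonument = {
  type: "Point",
  coordinates: [-77.035257, 38.889571],
};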

I’m going to keep going with my photo example from above. We know each photo already comes with latitude and longitude coordinates, so I can easily write queries in HarperDB using the built-in geospatial functions. I grew up near Washington, DC, so I can count how many photos I’ve taken within a mile of the Washington Monument with this query:

SELECT COUNT(*)
FROM photos.metadata
WHERE geoNear([-77.035257,38.889571], geojson_point, 1, 'miles')

Because we’re using SQL, we can run all sorts of different filters and aggregations to narrow down to the exact data we need.
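
For instance, here’s a rough sketch of an operation body you could POST to your instance (same request pattern as the earlier insert sketch) to count photos per device within that same one-mile radius. The table and attribute names are the illustrative ones from above.

// Sketch: a grouped geospatial query sent as a "sql" operation.
const photosPerDevice = {
  operation: "sql",
  sql: `SELECT device_make, COUNT(*) AS photo_count
        FROM photos.metadata
        WHERE geoNear([-77.035257, 38.889571], geojson_point, 1, 'miles')
        GROUP BY device_make`,
};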

IoT Sensor Data Collection

Changing it up a little bit on this one. The Internet of Things (IoT) has taken off in the last few years and only has more room to grow. The thing about IoT (no pun intended) is that the data is incredibly unstructured. Sensors can return data in all sorts of different formats, and you never really know what you’re going to get, especially when trying to configure a hodgepodge of sensors from different manufacturers. You could use a NoSQL database to collect all of that data, but then you have to ask yourself: what happens once I have it? If you’re going to do anything beyond archive it, you’re going to need to be able to query it effectively. That’s where HarperDB thrives: as I mentioned above, you can execute SQL on that data immediately. That is made possible by the HarperDB Dynamic Schema!

You might say: What’s a Dynamic Schema and why do I care? A dynamic schema adjusts to the data as it’s ingested. In HarperDB’s case, this means attributes are added to the schema on the fly as data comes into the database. For example, if I add a new sensor with additional attributes and dump its readings into a sensor table, the new attributes just show up. This means you now automatically have metadata (Aha!) on your schema that you would not have in a standard NoSQL database.
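
To illustrate, here’s a sketch of two insert bodies for a hypothetical iot.sensor_readings table (names made up for illustration; POST them like the earlier sketch). The second sensor reports attributes the first one doesn’t, and those columns simply appear on the table when the record is ingested.

// Sketch of the dynamic schema in action; schema, table, and attribute names
// are illustrative.
const firstReading = {
  operation: "insert",
  schema: "iot",
  table: "sensor_readings",
  records: [{ id: "r-001", sensor_id: "temp-01", temperature_c: 21.4 }],
};

// A newer sensor also reports humidity and battery voltage; those attributes
// show up on the table automatically, and older rows just return null for them.
const secondReading = {
  operation: "insert",
  schema: "iot",
  table: "sensor_readings",
  records: [
    {
      id: "r-002",
      sensor_id: "env-07",
      temperature_c: 20.9,
      humidity_pct: 46,
      battery_v: 3.1,
    },
  ],
};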

There are a couple of other fancy HarperDB features you might appreciate when working with sensor data. Take, for example, the HarperDB WebSocket SDK, which lets you create a publish/subscribe client to listen to data as it’s ingested and take immediate action. Additionally, you can use the Clustering and Replication features to move data between instances, but I’m getting ahead of myself…

Distributed Data Systems

This is what I consider to be the most promising future trend! Truly distributed data: not simply a few data centers across the world, but points of presence physically near your users. Sort of like 5G towers everywhere, but for your data. Distributing data on or near the edge is the most effective way to reduce latency for your users and reduce load on your servers, ultimately improving the overall customer experience. Of course, this is not something that happens overnight, and most likely not on the initial launch of your project, but it’s important to consider the scalability of your project.

At HarperDB, this is always top of mind for us. We call this clustering and replication, and we give users the granularity to define exactly what data is moving and where it’s going by configuring publish/subscribe at the table level. This means, in the IoT example above, we can configure our devices to publish their data to a primary server but not subscribe to (receive) any data from anywhere else. In a fully distributed example where we want exact replicas across the globe, we would configure all tables to both publish and subscribe. This flexibility enables you to define exactly how your data moves. We have some exciting distributed features on our roadmap for 2021, so be sure to keep an eye out for them!
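
To give a feel for that table-level control, here’s a rough sketch of an add_node operation body. I’m hand-waving the exact payload shape (field names differ between HarperDB versions), so treat this as an illustration of the publish/subscribe flags rather than a copy-paste config; the node name, host, schema, and tables are all placeholders.

// Sketch only: register another instance and control, per table, whether this
// node publishes to it and/or subscribes from it. Field names are approximate
// and vary by HarperDB version; check the clustering docs for your release.
const addEdgeNode = {
  operation: "add_node",
  name: "edge_node_1",
  host: "edge1.example.com",
  port: 12345,
  subscriptions: [
    // IoT devices push readings up to this primary instance only.
    { schema: "iot", table: "sensor_readings", publish: true, subscribe: false },
    // Reference data is mirrored in both directions.
    { schema: "iot", table: "device_registry", publish: true, subscribe: true },
  ],
};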

Data Science

This is a broad category that involves the analysis of data, both structured and unstructured, to extract knowledge. Two of the most popular data science techniques are Machine Learning (ML) and Artificial Intelligence (AI). To call ML/AI a trend would be an understatement. Sometimes it seems like they’re all people can talk about, but for good reason: they’re powerful. I’ve always been a proponent of using a database to power these models, as it provides a great foundation of tools for aggregating and querying data. My colleague Margo put together a great blog on using a database for Machine Learning, which you should absolutely check out if you’re interested in this sort of thing.
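
As a quick sketch of why that matters, a single aggregation can pull per-sensor features for a model straight out of the database. This reuses the illustrative iot.sensor_readings table from earlier, posted as a "sql" operation like the first sketch; the feature names are just examples.

// Sketch: aggregate per-sensor features for a model in one query.
const featureQuery = {
  operation: "sql",
  sql: `SELECT sensor_id,
               AVG(temperature_c) AS mean_temp,
               MAX(temperature_c) AS max_temp,
               COUNT(*) AS sample_count
        FROM iot.sensor_readings
        GROUP BY sensor_id`,
};
// The response comes back as an array of plain JSON objects, ready to feed
// into an ML pipeline.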

Closing Thoughts

This is a small subset of projects where HarperDB will provide solid underpinnings and is by no means a complete list. You can try out HarperDB for free with HarperDB Cloud. Give it a shot and I think you’ll find that it’s incredibly easy to use and great for rapid and effective development. Have you used HarperDB in other types of projects? What other types of projects do you think HarperDB would be good for? Drop your thoughts in the comments below!
