Yogesh Dipankar for OCP

Posted on • Originally published at linkedin.com

Time Series Database and Analytics using Azure Data Explorer

In this article, I will summarise what a time series database is and how it can be used for advanced analytics. We will also cover Azure Data Explorer (ADX), a powerful time series database. This is one of the first articles in a series on time series with ADX that I am planning to publish, so let's start with these three basic questions:

  • What is a Time series Database?
  • Why Time-series DB?
  • How to do Advanced Analytics with ADX: Time Series DB?

What is a Time Series Database?

A time series database is a database capable of storing high-frequency streaming data as time series. A “time series” is a set of data points associated with timestamps. Because these points arrive continuously, the size of a time series database grows quickly in a short time. We will discuss its characteristics further below; first, let's look at popular time series databases.

[Image: Time Series Database]

Why Time Series?

Data coming from sensors, scooters, cars, and devices that run for long hours usually arrives at millisecond or even nanosecond intervals. Relational or NoSQL databases are not a perfect match for storing and fetching this data. A time series database, in contrast, has built-in support for storing and optimising data on timestamps down to the nanosecond level, and for fetching and partitioning the data on timestamp.

If you look at the characteristics of this data, it is a stream arriving at a very high frequency, such as every second. The time dimension plays a crucial role here, and a time series database helps partition the data on time.

The best way to analyse things that constantly fluctuate over time, i.e., non-stationary data, is by using time series.

Partitioning data on time helps handle out-of-order data and ordered data separately. This immensely improves the analytical queries, data engineering jobs, and data science models running on it. The data is also indexed on time, so fetching data by time is fast. A time series database also supports time-based joins across multiple tables for effective analytics.
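To make the idea of time partitioning and time indexing concrete, here is a minimal Python sketch. It is a toy illustration, not how ADX or any real time series database is implemented; the class name `TimePartitionedStore` and the hourly partition size are my own assumptions for the example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime, timedelta, timezone


class TimePartitionedStore:
    """Toy store (hypothetical) that keeps readings in hourly partitions, sorted by time."""

    def __init__(self):
        # partition key (start of the hour) -> list of (timestamp, value), kept sorted
        self._partitions = defaultdict(list)

    @staticmethod
    def _hour_of(ts):
        return ts.replace(minute=0, second=0, microsecond=0)

    def insert(self, ts, value):
        rows = self._partitions[self._hour_of(ts)]
        # bisect keeps the partition ordered even when readings arrive out of order
        index = bisect_right(rows, (ts, float("inf")))
        rows.insert(index, (ts, value))

    def query(self, start, end):
        """Return readings with start <= timestamp < end, in time order."""
        result = []
        hour = self._hour_of(start)
        while hour < end:
            # only partitions that overlap the requested range are touched
            rows = self._partitions.get(hour, [])
            lo = bisect_left(rows, (start, float("-inf")))
            hi = bisect_left(rows, (end, float("-inf")))
            result.extend(rows[lo:hi])
            hour += timedelta(hours=1)
        return result


base = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
store = TimePartitionedStore()
store.insert(base + timedelta(minutes=30), 21.5)
store.insert(base + timedelta(minutes=5), 20.0)  # arrives late, out of order
store.insert(base + timedelta(hours=1, minutes=10), 22.1)
readings = store.query(base, base + timedelta(hours=1))
```

A query for the 10:00 to 11:00 window only looks at the 10:00 partition and uses binary search inside it, which is roughly why time-range queries stay fast as the data grows.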

[Image: Why Time Series?]

A few of the things that a time series database is good at for live streaming data are:

a) Low latency for data ingestion from the sensor to the database

b) High write throughput as the time series data is live, write-heavy, and highly concurrent

c) High read throughput as the time series data is to be read and instantly presented on BI Tools

d) High data compression: millions of records are added within a day, and these should be compressed during ingestion or storage to avoid substantial disk consumption
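To show why time series data compresses so well, here is a small sketch of delta encoding, a generic technique where only the gap between consecutive timestamps is stored. I am not claiming this is how ADX compresses data internally; the function names and the sample timestamps are made up for illustration.

```python
def delta_encode(timestamps):
    """Store the first timestamp, then only the gap to the previous one."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    for previous, current in zip(timestamps, timestamps[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(encoded):
    """Rebuild the original timestamps by accumulating the gaps."""
    decoded = []
    total = 0
    for index, value in enumerate(encoded):
        total = value if index == 0 else total + value
        decoded.append(total)
    return decoded


# Epoch milliseconds from a hypothetical sensor reporting roughly once per second
timestamps_ms = [1700000000000, 1700000001000, 1700000002000, 1700000003001]
encoded = delta_encode(timestamps_ms)
print(encoded)  # [1700000000000, 1000, 1000, 1001]
```

After the first value, every entry is a small number instead of a 13-digit timestamp, so it needs far fewer bits and a general-purpose compressor can shrink the repeated gaps further.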

Azure Data Explorer is a fully managed PaaS service from Azure. It is a very good option for a time series database: it meets all the above requirements, and it also has great computing power to process both the live streaming data and the historical data stored in an ADX cluster.

Time series can be created and analysed within a few seconds in Azure Data Explorer.

ADX can generate real-time alerts based on the data that is ingested into it; it can run rules against the data in flight and generate alerts and emails.

ADX also integrates very well with Azure services such as Logic Apps, Event Hubs, Blob Storage, and Application Insights. ADX (Kusto) started its journey at Microsoft as a log aggregator and has been used as the log engine for all Azure services for the last 10 years; in 2021 it was publicly launched for use as a time series database.

It has some great machine learning features: trends, correlations, and models can be executed on the fly, and visualisations can be generated using UI/UX and BI tools.

Other striking features of ADX

  1. It has been used in the background as the logging solution for all Microsoft services for the last 10 years
  2. It is therefore a proven solution: per Microsoft documentation, 100 petabytes of data are ingested into ADX daily, which makes it a very powerful data ingestion system
  3. It uses the intuitive and powerful KQL (Kusto Query Language). Azure Log Analytics queries are actually KQL queries, so upskilling for people already working with Azure resources should not be a big challenge
  4. For SQL users there will be a bit of learning, but it should be easily manageable
  5. It has direct integration points with all Azure services and multiple third-party services
  6. Plotting trends and applying machine learning algorithms becomes easy using ADX features

How to do Advanced Analytics with ADX: Time Series DB?

ADX for Time Series Analytics

  1. Inbuilt support for TimeSeries creation
  2. Inbuilt functions for anomaly detection and forecasting
  3. Inbuilt support for ML algorithm integration
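In KQL, time series are created with the `make-series` operator, and functions such as `series_decompose_anomalies()` and `series_decompose_forecast()` cover anomaly detection and forecasting. To give a feel for what these two steps do conceptually, here is a rough Python sketch. It is only an analogy under my own assumptions: the helper names `make_series` and `flag_anomalies` are hypothetical, timestamps are plain epoch seconds, and the simple z-score rule is a stand-in, not the decomposition-based algorithm ADX uses.

```python
from statistics import mean, pstdev


def make_series(points, start, end, step, default=0.0):
    """Average (timestamp, value) points into fixed-width bins; empty bins get `default`."""
    bin_count = (end - start) // step
    buckets = [[] for _ in range(bin_count)]
    for ts, value in points:
        if start <= ts < end:
            buckets[(ts - start) // step].append(value)
    return [mean(b) if b else default for b in buckets]


def flag_anomalies(series, threshold=2.0):
    """Mark values more than `threshold` standard deviations away from the mean."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [False] * len(series)
    return [abs(v - mu) / sigma > threshold for v in series]


# Hypothetical sensor readings as (epoch seconds, value); one reading spikes to 90
points = [
    (0, 10.0), (5, 10.0), (12, 10.0), (25, 10.0), (31, 10.0),
    (44, 10.0), (52, 90.0), (68, 10.0), (75, 10.0),
]
series = make_series(points, start=0, end=80, step=10)
flags = flag_anomalies(series)
```

The first function turns irregular readings into a regular series with one value per 10-second bin, and the second flags the bin containing the spike. In ADX, both steps run inside the query engine rather than in client code.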

Advanced Analytics with ADX

In the next few articles, I am going to cover different aspects of the ADX time series database and do a deep dive into it.

Let me know what role you play: are you a Data Engineer, Data Scientist, or Data Analyst, and are you already using a time series database? Also let me know which aspect of time series databases you would like to know more about, so that I can cover it in my next article. If you are interested in learning about ADX time series database architecture and implementation, I will be happy to share.

Hope you enjoyed the article. Thanks for reading!

Cheers,

Yogesh
