Adeife Adeoye

What is the modern data stack and why is it so important?

If you're in the tech or data space, you know the hype surrounding the "modern data stack" - everyone has been talking about it. I recently watched a YouTube podcast on this topic featuring Vitaly Gordon, CEO of Faros AI, and Lars Kamp, CEO of Some Engineering, who discussed it at great length.

So what's the big deal with the Modern Data Stack? I’ll explain it to you here.

It's not uncommon for companies to track and store data using different SaaS tools or old-fashioned tools like Google Sheets or Excel.

Unfortunately, this leads to multiple data sources that need to be centralized and combined in one place. Often, that means downloading CSV files and combining all data in one spreadsheet.

This is a terrible approach because:

  • Data becomes siloed. Most companies use many different SaaS tools for their daily operations, and different teams use different tools. This leads to a lot of siloed data that can't be accessed and analyzed in one place.

  • It's easier to make mistakes. Manual processes like spreadsheets mean more opportunities for human errors.

  • It's not scalable. As the number of tools and spreadsheets grows, visibility and transparency across the team decline, leading to delays and breakdowns in operations.

So what is the solution? A modern data stack.

What is a modern data stack?

A modern data stack is a set of cloud-based tools that collect, store, transform, and analyze an organization's data in one place. The result is clean, trustworthy, and accessible data that helps address the modern challenges in data management.

Components of modern data stacks

A modern data stack consists of several components that help streamline data processing and management.

Here are the key components of a modern data stack:

Source: Analytics India Magazine

Data Sources (AKA where the data come from)

Every company generates data from different sources:

  • Databases: Companies use databases to store the information that powers their product. This includes information about their users and the data those users generate in the product. Some commonly used databases include PostgreSQL, MySQL, and MongoDB.

  • SaaS tools: Most companies rely on a growing number of SaaS tools (Salesforce, Zendesk, Notion, etc.) in their daily operations.

Data Ingestion

So what happens after you generate the data? The next step is to take it to a centralized place called a cloud data warehouse.

Data ingestion tools collect data from these sources and move it into a data warehouse, where it can be transformed. There are two types of data ingestion methods:

ETL (Extract, Transform, Load) is the traditional data ingestion method. Data is extracted from a source, transformed on a secondary processing server, and then loaded into a data storage system. The downside is slower ingestion, because the data has to be transformed on a separate server before it can be loaded.

ELT (Extract, Load, Transform) is the modern data ingestion method. Data is extracted from a source, loaded into a storage system in its raw form, and then transformed inside that storage system. Because loading doesn't have to wait for transformation, and the transformation runs on the warehouse's own compute, ingestion is typically faster with ELT.

SaaS tools like Fivetran and the open-source Airbyte (which is used in Faros AI) can help with data ingestion. Both tools use the ELT process, which makes it easy to transport raw data from data sources into a data storage solution.
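
To make the difference in ordering concrete, here is a minimal sketch in Python. It is not how Fivetran or Airbyte work internally. It assumes an in-memory SQLite database as a stand-in for the warehouse, and the RAW_ORDERS records, table names, and helper functions are all made up for illustration.

```python
import sqlite3

# Hypothetical raw records, standing in for rows pulled from a SaaS tool or database.
RAW_ORDERS = [
    {"id": 1, "customer": " Alice ", "amount_cents": 1250},
    {"id": 2, "customer": "bob", "amount_cents": 800},
    {"id": 3, "customer": "ALICE", "amount_cents": 450},
]


def clean(name):
    """Normalize a customer name so spelling variants group together."""
    return name.strip().lower()


def run_etl(conn):
    """ETL: transform in application code first, then load only the finished table."""
    totals = {}
    for row in RAW_ORDERS:
        key = clean(row["customer"])
        totals[key] = totals.get(key, 0) + row["amount_cents"]
    conn.execute("CREATE TABLE etl_customer_totals (customer TEXT, total_cents INTEGER)")
    conn.executemany("INSERT INTO etl_customer_totals VALUES (?, ?)", sorted(totals.items()))


def run_elt(conn):
    """ELT: load the raw rows untouched, then transform with SQL inside the store."""
    conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount_cents INTEGER)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:id, :customer, :amount_cents)", RAW_ORDERS
    )
    conn.execute(
        """
        CREATE TABLE elt_customer_totals AS
        SELECT lower(trim(customer)) AS customer, SUM(amount_cents) AS total_cents
        FROM raw_orders
        GROUP BY lower(trim(customer))
        """
    )


def fetch_totals(conn, table):
    # The table name is one of two fixed strings in this sketch, never user input.
    query = f"SELECT customer, total_cents FROM {table} ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
run_etl(conn)
run_elt(conn)
etl_totals = fetch_totals(conn, "etl_customer_totals")
elt_totals = fetch_totals(conn, "elt_customer_totals")
```

Both paths end with the same totals. The difference is that the ELT path also keeps raw_orders in the store, so the transformation can be changed and rerun later without extracting the data again.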

Data Storage

All the data coming from the data sources is loaded into a centralized place called a cloud data warehouse. In addition to storing the data, these cloud warehouses can be queried directly for analytics purposes, which makes them the central component of the data stack.

The most popular data warehouses are Google’s BigQuery, Snowflake and Amazon Redshift.

Data transformation

After the data is loaded and stored in a warehouse, it is transformed into user-friendly data models, making it easier to consume and more useful for analysis.

A good modeling layer is essential to the success of your data analytics and business intelligence programs because the model ensures that business users use consistent, reliable, and accurate information in their analysis.

dbt and LookML are well-known examples of data transformation and modeling tools, and Apache Airflow is often used to schedule and orchestrate transformation jobs.
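
As a rough illustration of what a data model looks like, here is a small Python sketch. It uses an in-memory SQLite database as a stand-in for the warehouse and a plain SQL view as the model. The tables, columns, and the "paid orders only" revenue rule are hypothetical, and this is not dbt's or LookML's actual syntax, although dbt models are also written as SELECT statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# Raw tables as an ingestion tool might have loaded them (hypothetical schema).
conn.executescript(
    """
    CREATE TABLE raw_users (user_id INTEGER, email TEXT, signup_date TEXT);
    CREATE TABLE raw_orders (order_id INTEGER, user_id INTEGER, amount_cents INTEGER, status TEXT);

    INSERT INTO raw_users VALUES
        (1, 'ada@example.com', '2023-01-05'),
        (2, 'lin@example.com', '2023-02-11');
    INSERT INTO raw_orders VALUES
        (10, 1, 2000, 'paid'),
        (11, 1, 500, 'refunded'),
        (12, 2, 1500, 'paid');

    -- The model: one agreed definition of revenue per user (paid orders only).
    CREATE VIEW user_revenue AS
    SELECT u.user_id,
           u.email,
           COALESCE(SUM(o.amount_cents), 0) / 100.0 AS revenue
    FROM raw_users AS u
    LEFT JOIN raw_orders AS o
           ON o.user_id = u.user_id AND o.status = 'paid'
    GROUP BY u.user_id, u.email;
    """
)

# Business users query the model instead of re-deriving revenue from raw tables.
rows = conn.execute("SELECT email, revenue FROM user_revenue ORDER BY user_id").fetchall()
```

Because the revenue rule lives in one view, every dashboard built on user_revenue uses the same definition, which is the consistency the modeling layer is meant to provide.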

Data Analytics / Business Intelligence

Let's get into data analytics (sometimes simplified as "data visualization"). Data analytics tools such as Notable and Hex allow users to explore and find insights in their data, often through visualizations like tables, charts, graphs, and dashboards.

Business Intelligence (BI) tools help non-technical users (such as business owners) explore data without needing to know SQL. This frees them from depending on analysts and developers. Some popular BI tools include Tableau, Mode, and the open-source Metabase.

Data Governance and monitoring

Before I wrap up the components of the modern data stack, let's quickly touch on data governance and monitoring. This usually involves:

  • Data privacy and access control, so an organization stays legally compliant regarding data protection. One of the popular tools for data privacy is Atlan.

  • Data observability so that issues and errors in the data can be caught and addressed. Great Expectations and Monte Carlo are two popular tools used for data observability.

  • Data cataloging/documentation and discovery, so that organizations can keep track of and make sense of their data, which helps with data discoverability, quality, and sharing. Atlan is also one of the tools in this space, and there are several open-source solutions: DataHub, Metacat, and Amundsen.
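
To give a feel for what observability checks do, here is a minimal sketch in plain Python. It does not use the Great Expectations or Monte Carlo APIs. The rows, column names, and check functions are hypothetical and only illustrate the kinds of rules such tools let you declare.

```python
# Hypothetical rows, standing in for a table read from the warehouse.
ROWS = [
    {"order_id": 1, "email": "ada@example.com", "amount": 20.0},
    {"order_id": 2, "email": None, "amount": 15.0},
    {"order_id": 2, "email": "lin@example.com", "amount": -5.0},
]


def check_not_null(rows, column):
    """Return the indexes of rows where the column is missing."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_unique(rows, column):
    """Return the values that appear more than once in the column."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(column)
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def check_min(rows, column, minimum):
    """Return the indexes of rows whose value falls below the minimum."""
    return [i for i, row in enumerate(rows) if row[column] < minimum]


# Each check maps to the list of problems it found; an empty list means it passed.
report = {
    "email_not_null": check_not_null(ROWS, "email"),
    "order_id_unique": check_unique(ROWS, "order_id"),
    "amount_non_negative": check_min(ROWS, "amount", 0),
}
failed = [name for name, problems in report.items() if problems]
```

In practice, checks like these would run on a schedule after each load, and a non-empty failed list would trigger an alert before the bad rows reach a dashboard.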

Benefits of Modern Data Stacks

Here are some of the benefits of well-built modern data stack solutions:

Reduced costs and maintenance

Modern data stacks can significantly reduce your IT and data engineering costs because your team can launch fully managed data connectors within minutes. Modern data stack (MDS) tools also often have consumption-based pricing - you only pay for what you use. In addition, since MDS tools are often managed SaaS, you don’t have to invest in maintaining the systems yourself.

Greater Scalability

Unlike traditional data stacks, a modern data stack does not depend on on-premises servers, so you can scale capacity as your business grows. Moreover, the modern data stack consists of a combination of tools that complement but do not interfere with each other, so your organization can easily swap tools or integrate them with other tools or websites.

Fast Execution

Since many processes are automated, your data engineers and analysts don't need to waste time with manual infrastructure management. Instead, they can funnel their efforts into data analytics and business intelligence, creating actionable insights for your organization.

How does Faros AI fit into all of this?

Faros AI's Data Stack

On the podcast, they shared what the Faros AI data stack looks like. It is a neat solution for engineering teams to analyze their operations and understand where the bottlenecks in their software development life cycle may be.

The Faros AI data stack is currently made up of dbt, Airbyte, Postgres, and Meltano. It aligns well with the “modern data stack” concept and seems to be a good example of tools built in this fashion.

You can watch the video below to learn more:

Conclusion

If there can be only one takeaway from this blog, it should be this: a modern data stack will help you save time, effort, and money. In addition, your company will benefit significantly from tooling that is faster, more accessible, and more scalable.

If you are an engineering leader or part of a team of developers/engineers, then you might want to consider Faros AI, an engineering operations platform that can provide the visibility and insight to help you manage your operations better.

I hope you found the article helpful and resourceful. If you have any questions, don’t hesitate to reach out to me.
