Segment is a popular Customer Data Platform (CDP) used by over 20,000 companies. One of its primary use cases is collecting event or clickstream data and piping it into a destination of choice.
With any event collection system, anomalies are bound to occur in both the ingestion pipeline and the data itself. General observability of such pipelines is important, and tools like Datadog provide it by collecting metadata around event delivery, such as latency and delivery success vs. failure rates.
Unfortunately, these tools fall short when it comes to monitoring the actual data (and the related metadata) collected in the events.
Segment events generally carry vital information, standardized by a set schema and instrumented in code. Schemas defined in code are occasionally incorrect or drift from the underlying standard. This quickly becomes a data quality issue that standard observability tools cannot detect.
This is where data observability tooling comes to the rescue. Monosi is an open source data observability and monitoring platform for data teams (see the Monosi GitHub). It lets you quickly set up monitors on a data store, run checks for data quality issues, and alert on detected anomalies.
This post walks you through getting started with Segment event data monitoring on the Monosi platform, ensuring the data quality of your events in less than 10 minutes.
For this tutorial, you will need a Segment business account with an event source set up. If you don't have a business plan, Segment offers a 14-day free trial. If you don't have an event source, learn more on how to start collecting events with Segment here.
You will also need one of your Segment destinations to be Postgres or Snowflake to work with Monosi. If you don't have either, I recommend creating a free Snowflake account and adding a Snowflake destination in Segment. This tutorial uses Snowflake, but Postgres works as well.
Finally, you will need Docker to install and run Monosi. If you're not familiar with Docker, follow the tutorial here, and make sure Docker is installed and ready to use.
Monosi provides a Docker image to run the web interface and simplify deployment. To install and run Monosi through Docker, run the following command:
docker run -p 3000:3000 monosi/monosi
Since Segment is writing data to a data store destination (in this case Snowflake), we need to give Monosi the appropriate connection details.
Set up a connection to the data store in the UI by navigating to http://localhost:3000/settings/sources. Click the Create a Data Source button and fill out the connection details.
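Before pointing Monosi at the warehouse, it's worth confirming that Segment events are actually landing there. The sketch below runs a simple freshness query; the `pages` table and `received_at` column reflect Segment's typical warehouse schema but are assumptions here, and sqlite3 stands in for Snowflake or Postgres so the example is runnable as-is (against the real warehouse you would issue the same SQL through its own connector).

```python
import sqlite3
from datetime import datetime, timedelta

# Freshness check: how many Page events arrived in the last `hours`?
# Segment's warehouse sync typically writes a `pages` table with a
# `received_at` column; both names are assumptions in this sketch.
FRESHNESS_SQL = "SELECT COUNT(*) FROM pages WHERE received_at >= :cutoff"

def recent_page_events(conn, hours=24):
    cutoff = (datetime.utcnow() - timedelta(hours=hours)).isoformat()
    (count,) = conn.execute(FRESHNESS_SQL, {"cutoff": cutoff}).fetchone()
    return count

# sqlite3 stands in for the warehouse so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id TEXT, received_at TEXT)")
conn.execute("INSERT INTO pages VALUES ('evt_1', ?)",
             (datetime.utcnow().isoformat(),))
print(recent_page_events(conn))  # 1
```

A count of zero here means the Segment sync hasn't delivered events yet, and any monitor you create later would have nothing to observe.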
Monosi sends alerts on detected anomalies to Slack. Set up a Slack integration by navigating to http://localhost:3000/settings/integrations and creating a new integration with a Slack channel webhook. More information can be found here.
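You can verify the webhook itself before wiring it into Monosi by posting a test message directly. Slack incoming webhooks accept a JSON body with a "text" field; the helper names below (`build_alert`, `send_alert`) and the placeholder URL are illustrative, not part of Monosi.

```python
import json
import urllib.request

def build_alert(message: str) -> dict:
    # Slack incoming webhooks accept a JSON payload with a "text" field.
    return {"text": message}

def send_alert(webhook_url: str, message: str) -> None:
    body = json.dumps(build_alert(message)).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Preview the payload; to actually send, call send_alert with the
# webhook URL copied from your Slack app's settings, e.g.:
#   send_alert("https://hooks.slack.com/services/...", "Test alert")
print(json.dumps(build_alert("Test alert from Monosi setup")))
```

If the test message shows up in your channel, the same webhook will work for Monosi's alerts.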
With the Segment event data source and the Slack integration connected, we can now create a monitor. Navigate to http://localhost:3000/monitors and select the table health monitor.
Fill out the form with a monitor name, description, and an interval for the monitor to run on (in minutes).
Then, select the data source you created. For the purposes of this tutorial we are using Snowflake, so the inputs for a Segment Page event monitor would point at the Snowflake source and the table holding Page events (Segment writes these to a PAGES table in your configured schema).
Save the monitor and it will appear in the monitors index view. The monitor runs on its schedule until it is deleted. If any anomalies are detected in the data, an alert is sent to the connected Slack channel.
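To make "anomaly" concrete: a table health monitor tracks metrics such as row counts and freshness over time and flags values that deviate sharply from recent history. The sketch below shows one such check, a simple z-score test; the metric, numbers, and threshold are illustrative and not Monosi's exact implementation.

```python
from statistics import mean, stdev

def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it sits more than `threshold` standard
    deviations from the historical mean (a z-score check, one of
    the simpler tests a table health monitor can run)."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Daily row counts for a Segment PAGES table (illustrative numbers).
history = [980, 1010, 995, 1005, 990, 1002, 998]
print(is_anomalous(history, 1001))  # False: within the normal range
print(is_anomalous(history, 40))    # True: sudden drop in events
```

A sudden drop like the one above often means an instrumentation change or a broken sync upstream, which is exactly the kind of issue you want surfaced in Slack rather than discovered in a dashboard weeks later.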
🎉 Congratulations, you've just set up and scheduled a data monitor on your Segment event data. You can now add more monitors to other event tables in your database. Find more information on how to use Monosi here.