DEV Community

Ivan Porollo


Setting up data monitoring for Snowflake

Updated Feb. 4th, 2022

Data quality and reliability are still a source of headaches for data organizations today. Monosi aims to resolve the issues these teams face.

Monosi is an open source data observability and monitoring platform for data teams (see the Monosi GitHub repository). It lets you quickly set up monitors on a data store that run checks for data quality issues and alert on detected anomalies.

This article will walk you through how to get started monitoring a data warehouse in less than 10 minutes with Monosi.

Prerequisites

For this tutorial, we are going to use Snowflake as our data warehouse.

If you don't have a Snowflake account, it's easy to create one. Go ahead and create an account: select the Standard edition and your preferred cloud provider (this example uses AWS US West).

After signing up, you should receive an email with the account details you need to log in. With those details in hand, you are ready to start with Monosi.

Installing Monosi

Monosi is a Python package distributed through PyPI.

Note: Monosi is only compatible with Python 3.6 and higher.
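If you're not sure whether your interpreter meets this requirement, a quick check like the following can help. It's a minimal sketch using only the standard library; the `python_is_supported` helper is our own illustration, not part of Monosi. It deliberately uses `str.format` instead of f-strings so that it still runs, and reports the problem, on interpreters older than 3.6.

```python
import sys

# Minimum interpreter version, taken from the note above.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) part of version_info is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print("Python {} is supported; you can run: pip install monosi".format(current))
    else:
        print("Python {} is too old; Monosi needs 3.6 or higher".format(current))
```

Run it with the same `python` that your `pip` belongs to; installing with `python -m pip install monosi` keeps the two in sync.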

To install it, open your terminal and run:

pip install monosi

Check that it's been installed by running:

monosi --version

Configuring a data source

With Monosi installed, we need to configure a data source to monitor. Monosi reads connection details from the ~/.monosi/workspaces.yml file, so let's create it:

mkdir -p ~/.monosi
touch ~/.monosi/workspaces.yml

Edit the workspaces.yml file in your editor of choice and fill it out with your information (specifically <user-name>, <password>, <account-name>). This is what the file will look like:

default:
    sources:
        default:
            type: snowflake
            user: <user-name>
            password: <password>
            account: <account-name>
            warehouse: COMPUTE_WH
            database: SNOWFLAKE_SAMPLE_DATA

For the purposes of this tutorial, we will be using the SNOWFLAKE_SAMPLE_DATA database. This database is provided out of the box by Snowflake.

You can find more information on setting up the workspaces.yml file here.
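If you'd rather script this step (for example, when provisioning several machines), the sketch below writes the same workspaces.yml using only the Python standard library. `write_workspaces_file` is a hypothetical helper, not a Monosi API; it assumes exactly the layout shown above. Restricting the file to owner-only permissions (0600) is our own suggestion, since the file stores a plain-text password.

```python
from pathlib import Path

# Mirrors the workspaces.yml layout shown above; values are filled in by
# write_workspaces_file. Values containing YAML special characters (such as
# ": " or " #") would need quoting, which this sketch does not handle.
WORKSPACES_TEMPLATE = """\
default:
    sources:
        default:
            type: snowflake
            user: {user}
            password: {password}
            account: {account}
            warehouse: {warehouse}
            database: {database}
"""


def write_workspaces_file(user, password, account,
                          warehouse="COMPUTE_WH",
                          database="SNOWFLAKE_SAMPLE_DATA",
                          config_dir=None):
    """Write workspaces.yml into config_dir (default ~/.monosi) and return its path."""
    config_dir = Path(config_dir) if config_dir else Path.home() / ".monosi"
    config_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    path = config_dir / "workspaces.yml"
    path.write_text(WORKSPACES_TEMPLATE.format(
        user=user, password=password, account=account,
        warehouse=warehouse, database=database))
    path.chmod(0o600)  # owner read/write only, since the file holds a password
    return path
```

Calling `write_workspaces_file("<user-name>", "<password>", "<account-name>")` with your real values produces the file shown above. Note that it overwrites any existing workspaces.yml.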

Creating a Monosi project

Navigate to the directory where you want your Monosi project to live and create a project repository by running:

monosi init

In the directory you should now see a folder called monosi-repo (you can rename this if you want). Navigate into the folder by running:

cd monosi-repo 

In the monosi-repo directory, you should see a monosi_project.yml file.

This file configures which connection to use, where your monitors live, and other metadata. More information on setting up the monosi_project.yml file can be found here.

For the purposes of this tutorial, we don't need to edit the file.

Creating a monitor

Monosi automatically creates a folder called monitors in the monosi-repo directory. In that folder, an example custom SQL monitor is defined to show how the syntax works.

As mentioned, Snowflake provides some example data in its SNOWFLAKE_SAMPLE_DATA database. We can create a monitor for the orders table:

touch ./monitors/orders.yml

Edit the orders.yml file in your editor of choice and paste the following:

monosi:
  monitors:
  - table: SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.orders
    timestamp_field: o_orderdate
    type: table
    days_ago: -10000

This defines a monitor as code on the orders table, using o_orderdate as its timestamp field. Because its type is table, it tracks table health metrics.
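Monitor files are plain YAML, so they are easy to generate. The sketch below renders the same orders.yml from a Python dict and writes it into the monitors folder. `monitors_yaml` and `write_monitor_file` are hypothetical helpers, not part of Monosi, and accepting several entries per file is an assumption based on `monitors` being a YAML list.

```python
from pathlib import Path

# Same values as the orders.yml monitor above.
ORDERS_MONITOR = {
    "table": "SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.orders",
    "timestamp_field": "o_orderdate",
    "days_ago": -10000,
}


def monitors_yaml(monitors):
    """Render table-health monitors using the orders.yml layout shown above."""
    lines = ["monosi:", "  monitors:"]
    for monitor in monitors:
        lines.append(f"  - table: {monitor['table']}")
        lines.append(f"    timestamp_field: {monitor['timestamp_field']}")
        lines.append("    type: table")
        lines.append(f"    days_ago: {monitor['days_ago']}")
    return "\n".join(lines) + "\n"


def write_monitor_file(repo_dir, name, monitors):
    """Write <repo_dir>/monitors/<name>.yml and return its path."""
    path = Path(repo_dir) / "monitors" / f"{name}.yml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(monitors_yaml(monitors))
    return path
```

Run from inside monosi-repo, `write_monitor_file(".", "orders", [ORDERS_MONITOR])` produces the same file as the manual touch-and-paste steps.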

Run the monitors

Run the monitor by executing the following command in the monosi-repo directory:

monosi run

It takes a few seconds to run. The resulting output should be similar to:

[Screenshot: orders monitor output]

🎉 Congratulations, you just ran your first Monosi monitor! The output of the run should show several anomalies that the monitor detected in the example data.
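This tutorial doesn't describe how Monosi decides that a value is anomalous, so the sketch below is a generic illustration rather than Monosi's own algorithm. It flags points in a metric series (here, a hypothetical series of daily row counts) whose z-score, the distance from the mean measured in sample standard deviations, exceeds a threshold.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return the indexes of values whose |z-score| exceeds threshold.

    The z-score of v is (v - mean) / sample standard deviation.
    """
    if len(values) < 2:
        return []  # stdev needs at least two data points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily row counts with a sudden spike on the last day.
daily_row_counts = [100, 102, 98, 101, 99, 100, 250]
print(zscore_anomalies(daily_row_counts, threshold=2.0))  # [6]
```

With only n points, a single outlier's z-score can never exceed (n - 1)/sqrt(n), about 2.27 for the seven points above, so a threshold of 3 needs at least 11 points before it can fire. That is why the example uses 2.0.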

Scheduling monitors

With the release of v0.0.3, we have added a user interface that supports the scheduling of monitors. We've also created a video walkthrough of the UI here.

Monosi provides a Docker image to run the application. Make sure Docker is installed and running by executing:

docker ps

To get the Monosi UI up and running with Docker, run the following command:

docker run -p 3000:3000 monosi/monosi

Navigate to http://localhost:3000 and you will see the Monosi UI.
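The container can take a few moments to start. If you're scripting the setup, a small poll like the following waits until something answers on port 3000 before you open the browser. `wait_for_ui` is a hypothetical helper built on Python's `urllib`; it treats any HTTP response, even an error status, as "up", and it bypasses proxy settings because the target is local.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url="http://localhost:3000", timeout=60.0, interval=2.0):
    """Poll url until an HTTP server answers; return False after timeout seconds."""
    # An empty ProxyHandler ignores any HTTP(S)_PROXY settings for this local check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=5):
                return True  # got a successful response
        except urllib.error.HTTPError:
            return True  # the server answered, just with an error status
        except OSError:
            pass  # nothing listening yet (URLError is a subclass of OSError)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    print("Monosi UI is up" if wait_for_ui() else "Monosi UI did not respond")
```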

Set up a connection to your Snowflake account in the UI by navigating to http://localhost:3000/settings/sources.

After setting up the connection, create a monitor in the UI by navigating to http://localhost:3000/monitors and clicking the Create Monitor button. Fill out the form with the following information:

Name: Orders Monitor
Check every: 720 minutes
Monitor type: Table Health
Data Source: <Your Snowflake Datasource Name>
Table: SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.orders
Timestamp Field: o_orderdate

Hit save and your new monitor will appear in the monitors table.

🎉 Congratulations, you just scheduled a data monitor! It will keep running on this schedule until you delete it. To get alerts on detected anomalies, set up a Slack connection at http://localhost:3000/settings/integrations.
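A check interval of 720 minutes works out to every 12 hours, or twice a day. Assuming the scheduler runs a monitor at fixed intervals counted from a starting time (the tutorial doesn't say how Monosi anchors its schedule), the upcoming run times can be sketched as follows. `next_runs` and the start time are hypothetical.

```python
from datetime import datetime, timedelta

CHECK_EVERY_MINUTES = 720  # the "Check every" value from the form above


def next_runs(start, every_minutes, count):
    """Return the next `count` run times of a fixed-interval schedule after `start`."""
    step = timedelta(minutes=every_minutes)
    return [start + step * i for i in range(1, count + 1)]


runs_per_day = 24 * 60 // CHECK_EVERY_MINUTES
print(runs_per_day)  # 2
for run in next_runs(datetime(2022, 2, 4, 9, 0), CHECK_EVERY_MINUTES, 3):
    print(run.isoformat())
# 2022-02-04T21:00:00
# 2022-02-05T09:00:00
# 2022-02-05T21:00:00
```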

Wrapping up

Now that you’ve worked through an example using Snowflake's provided data, you can further extend this to your own data with the Monosi profiler and custom SQL monitors.

If you have any questions, join our Slack community or open an issue in our repository on GitHub.
