For pretty decent time doing analytics with MongoDB required additional overhead compared to modern SQL RDBMS and Data Warehouses associated with aggregation pipeline and map-reduce practices. While this approach allowed to craft advanced custom tailored aggregation algorithms it required additional knowledge of how to build and maintain it.
To fill this gap MongoDB released MongoDB connector for BI which acts as MySQL server on top of your MongoDB data. Under the hood it bridges existing aggregation mechanisms to MySQL protocol allowing standard MySQL clients to connect and issue SQL queries.
In this short 30-minutes tutorial we’ll setup MongoDB connector for your local MongoDB instance and spin up Node.js application which provides an analytics API, query cache and orchestration using Cube.js analytics framework. Cube.js can be used as a standalone server or can be embedded as part of existing Node.js web application. You can learn more about it here.
Setting up MongoDB connector for BI
To install MongoDB connector for BI locally you can use either quickstart guide or one of platform dependent installation guides.
Please make sure you use MongoDB version that supports MongoDB connector for BI. In this tutorial we use 4.0.5.
If you don’t have local MongoDB instance please download it here.
BI Connector can be downloaded here.
After BI connector has been installed please start mongod
instance first. If you use downloaded installation it can be started from its home directory like:
$ bin/mongod
BI connector itself can be started the same way:
$ bin/mongosqld
Please note that mongosqld
resides in another bin
directory.
If everything works correctly you should see success log message in your shell for mongosqld
process:
[initandlisten] waiting for connections at 127.0.0.1:3307
If you’re using MongoDB Atlas you can use this guide to enable BI connector.
Importing test dataset
You can skip this step if you already have data in your DB which can be analyzed. Otherwise you can use zip code test dataset from MongoDB to populate your DB with some test data.
Download zips.json and run mongoimport
from MongoDB home directory:
$ bin/mongoimport --db test --collection zips --file <path/to/zips.json>
Please make sure to restart MongoDB BI connector instance in order to generate up-to-date MySQL schema out from just added collection.
Spinning up Cube.js application
We’ll use Cube.js to make analytic queries to our MongoDB instance. To install its CLI run:
$ npm install -g cubejs-cli
To create new Cube.js application with MySQL driver run:
$ cubejs create mongo-tutorial -d mysql
Go to just created mongo-tutorial
directory and edit just created .env
file: replace placeholders with your MongoDB BI connector credentials.
By default it should be something like:
CUBEJS_DB_HOST=localhost
CUBEJS_DB_NAME=test
CUBEJS_DB_PORT=3307
CUBEJS_DB_TYPE=mysql
CUBEJS_API_SECRET=941ed7ad8a49bec1b3f87a9f48bff2a5e549e946fc22fcc5f18c3a17bf62c64ed3398e99b271cd66d4521a6fd7caa4bfd268dfccea59ffd9c67de660f3967338
Now generate Cube.js schema files for zips
collection from test data set or for your own collection:
$ cd mongo-tutorial
$ cubejs generate -t zips
In order to start Cube.js dev server you’ll also need locally running Redis instance which is used for cache and query queue orchestration. You can download it and run using these instructions.
If everything went smoothly you’re able to run Cube.js dev server:
$ npm run dev
If server started successfully you can now open http://localhost:4000
and navigate to Cube.js dev environment. There you should see working example of a Pie chart.
Building Dashboard
Cube.js dev environment example contains all essential client pieces to build analytics dashboard. Let’s modify it a little bit so it looks like more a dashboard and uses zips collection.
Replace contents of index.js
in your dev environment code sandbox:
import React from "react";
import ReactDOM from "react-dom";
import "antd/dist/antd.css";
import { Row, Col, Card, Layout, Spin } from "antd";
import cubejs from "@cubejs-client/core";
import { QueryRenderer } from "@cubejs-client/react";
import { Chart, Axis, Tooltip, Geom, Coord, Legend } from "bizcharts";
const renderChart = resultSet => (
<Chart height={400} data={resultSet.chartPivot()} forceFit>
<Coord type="theta" radius={0.75} />
<Axis name="Zips.count" />
<Legend position="bottom" name="category" />
<Tooltip showTitle={false} />
<Geom type="intervalStack" position="Zips.count" color="x" />
</Chart>
);
const cubejsApi = cubejs(
"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1NDkzMDk3NzMsImV4cCI6MTU0OTM5NjE3M30.eXEdfUa_ek2V9MlGTpBMOd_AFfs8laaZj8ZsuM1wqqo",
{ apiUrl: "http://localhost:4000/cubejs-api/v1" }
);
const { Header, Footer, Sider, Content } = Layout;
const App = () => (
<Layout>
<Header>
<h2 style={{ color: "#fff" }}>MongoDB Dashboard</h2>
</Header>
<Content style={{ padding: "25px", margin: "25px" }}>
<Row type="flex" justify="space-around" align="top" gutter={24}>
<Col span={24} lg={12}>
<Card title="Zip count by state" style={{ marginBottom: "24px" }}>
<QueryRenderer
query={{ measures: ["Zips.count"], dimensions: ["Zips.state"] }}
cubejsApi={cubejsApi}
render={({ resultSet, error }) =>
(resultSet && renderChart(resultSet)) ||
(error && error.toString()) || <Spin />
}
/>
</Card>
</Col>
</Row>
</Content>
</Layout>
);
const rootElement = document.getElementById("root");
ReactDOM.render(<App />, rootElement);
Please make sure to replace the auth token with your own Cube.js temporary token that’s printed in the console while server start. Read more about auth tokens here.
If everything worked well you should see the following dashboard:
Why Cube.js?
So why using Cube.js is better than hitting SQL queries to MongoDB directly? Cube.js solves plethora of different problems every production ready analytic application need to solve: analytic SQL generation, query results caching and execution orchestration, data pre-aggregation, security, API for query results fetch and visualization.
These features allows to build production grade analytics applications that able to handle thousands of concurrent users and billions of data points. It also allows you to do analytics on a production MongoDB read replica or even MongoDB main node due to ability to reduce amount of actual queries issued to a MongoDB instance. Cube.js schemas also allows to model everything from simple counts to funnels and cohort retention analysis. You can learn more about it here.
Performance considerations
In order to be able to handle massive amount of data Cube.js heavily relies on pre-aggregations. As of now MongoDB BI Connector doesn’t support Create Table as Statement which is required to materialize query results right in database and create pre-aggregations. If you need to analyze well over 100M of data points in MongoDB please consider using Presto with MongoDB Connector which is also supported by Cube.js.
Top comments (0)