Introduction
This tutorial will show how to use MindsDB to predict gold prices based on historical data stored in a MongoDB database.
What is MindsDB?
MindsDB is an open-source technology that adds machine learning capabilities to databases, making it possible to generate predictions over standard databases through standard queries.
What is MongoDB?
MongoDB is a popular non-relational document database that provides support for JSON-like storage.
By using these two, we can predict data values in large databases. In this tutorial, we will predict the gold price over time using this Kaggle dataset.
Setting everything up
MindsDB is available both through its cloud service (cloud.mindsdb.com) and locally. MongoDB is likewise available locally and in a hosted version, MongoDB Atlas.
In this tutorial, I will focus on running both MongoDB and MindsDB in local docker containers. The process is quite similar to how the cloud version works; you can check out the documentation.
Requirements
- docker environment. For an introduction to what Docker is and how to use it, check out this tutorial
- docker-compose, to create and run the multi-container docker application
- mongosh client, to run queries against the MongoDB instance.
Creating the docker containers
First, we will create the docker containers. Create a folder and, inside it, a file named docker-compose.yml:
This will create two containers: mongodb and mindsdb. The first one uses default authentication, so everybody who can reach it will be able to access it. Such a setup should never be publicly available, but it will do for a first local experiment. We also use a network bridge between both containers to allow connections between them.
To make data persistent across restarts, create a folder called database inside the first one. It will be mounted by both containers, and it is where data will be saved.
By now we should have the following directory structure:
.
└── gold-price-estimation/
    ├── docker-compose.yml
    └── database/
        └── ...
To start the containers, open a terminal, go inside the gold-price-estimation folder and run docker-compose up -d (you might need superuser privileges, depending on your configuration). You now have MindsDB and MongoDB running on your system!
Check this by going to 127.0.0.1:47334. You should find MindsDB's web UI.
If something didn't work out, double-check the previous steps, and if the problem persists feel free to leave a comment below.
Importing the data in the database
After downloading the .csv file from Kaggle, we need to copy the file from the host to the container by running docker cp file_name.csv mongodb:/file_name.csv (the commands below assume the file is named gold.csv).
To import the data, we first need to access the docker container with docker exec -it mongodb bash, after which we will be presented with a bash command line. Then, run:
mongoimport --type csv -d golddb -c price --headerline --drop gold.csv
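Here, golddb is the name of the database and price is the name of the collection (MongoDB's equivalent of a table). The --headerline option takes the field names from the first line of the CSV file, and --drop removes any existing collection with that name before importing.
Leave the container with exit and check whether the data was imported correctly: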
mongosh --host mongodb --port 27017
use golddb
db.price.find({});
This should output the first entries in the collection.
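Beyond eyeballing the output, the imported documents can be checked for the fields the later steps rely on. The following is only a sketch: findMissingFields is a hypothetical helper, the field names Date and Value are taken from the predictor and query steps of this tutorial, and the sample documents are stand-ins rather than real rows from the dataset.

```javascript
// Hypothetical helper: reports which expected fields are absent from each document.
function findMissingFields(docs, requiredFields) {
  const problems = [];
  docs.forEach((doc, index) => {
    const missing = requiredFields.filter((field) => !(field in doc));
    if (missing.length > 0) {
      problems.push({ index, missing });
    }
  });
  return problems;
}

// Stand-in documents; the values are illustrative, not taken from the dataset.
const sampleDocs = [
  { Date: "1975-07-01", Value: 1 },
  { Date: "1975-08-01" }, // deliberately lacks the Value field
];
const importProblems = findMissingFields(sampleDocs, ["Date", "Value"]);
```

Inside mongosh, the same function could be fed real data, for example with findMissingFields(db.price.find({}).limit(5).toArray(), ["Date", "Value"]). An empty result means every checked document has both fields.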
Create a connection from MindsDB to MongoDB
We need to tell MindsDB where our database is. For this, run:
mongosh --host mindsdb --port 47336
In the client, execute:
use mindsdb
db.databases.insertOne({ name: "mongo_gold", engine: "mongodb", connection_args: { "port": 27017, "host": "mongodb", "database": "golddb" } })
With this, we first connect our client to MindsDB’s database. Inside it, we select the database mindsdb and insert an entry with the parameters of the connection:
- mongo_gold: name by which MindsDB identifies the connection
- mongodb: URL/IP of the host, in this case the name of our docker container.
This command should return
{
acknowledged: true,
insertedId: ObjectId("635433076e11662dcd20042c")
}
This tells us that the configuration has been saved. Running
db.databases.find()
confirms it by returning
{
name: 'mongo_gold',
database_type: 'mongodb',
host: 'mongodb',
port: '27017',
user: null
}
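If several connections need to be registered, the document passed to db.databases.insertOne can be assembled by a small function. This is a minimal sketch, not part of MindsDB: buildConnectionDoc is a hypothetical name, and the port check is an extra safeguard added for illustration.

```javascript
// Hypothetical helper (not part of MindsDB): builds the document that
// db.databases.insertOne(...) receives when registering a MongoDB connection.
function buildConnectionDoc(name, host, port, database) {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  return {
    name, // name by which MindsDB identifies the connection
    engine: "mongodb",
    connection_args: { port, host, database },
  };
}

// Same values as in the command used in this tutorial.
const goldConnection = buildConnectionDoc("mongo_gold", "mongodb", 27017, "golddb");
// In mongosh, connected to MindsDB: db.databases.insertOne(goldConnection)
```

The resulting object has the same shape as the one typed by hand earlier, so it can be passed to insertOne unchanged.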
All that remains is to create the predictor!
Creating the predictor
While still inside the client, we run:
db.predictors.insert({ name: "mongo_gold_p", predict: "Value", connection: "mongo_gold", "select_data_query": "db.price.find()" });
What do those parameters mean?
- name: name by which MindsDB identifies the predictor
- predict: name of the column in the database whose values we want to predict
- connection: the name we created previously, by which MindsDB identifies the connection
- select_data_query: a standard MongoDB query that selects the rows used for training. For this example, we will use all rows.
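The four parameters can also be put together programmatically. The sketch below assumes a hypothetical helper named buildPredictorDoc; the collection name price comes from the import step, and the generated query selects every row.

```javascript
// Hypothetical helper (not part of MindsDB): builds the document passed to
// db.predictors.insert(...).
function buildPredictorDoc(name, target, connection, collection) {
  return {
    name, // name by which MindsDB identifies the predictor
    predict: target, // column whose values we want to predict
    connection, // name of the connection registered earlier
    select_data_query: `db.${collection}.find()`, // all rows of the collection
  };
}

const goldPredictor = buildPredictorDoc("mongo_gold_p", "Value", "mongo_gold", "price");
// In mongosh, connected to MindsDB: db.predictors.insert(goldPredictor)
```

This mirrors the command above, with select_data_query pointing at the price collection created during the import.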
We should receive an acknowledged message, and then MindsDB will start working on creating the predictor. You can check the status of the model generation with:
db.predictors.find({name: "mongo_gold_p"});
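Instead of re-running the query by hand, the check can be wrapped in a polling loop. This is a sketch under assumptions: waitForPredictor is a hypothetical helper, the status lookup and the pause are passed in as functions so the loop itself does not depend on a database, and the status value "error" for a failed run is an assumption (only "complete" is confirmed by this tutorial).

```javascript
// Hypothetical helper: polls a status lookup until the predictor is ready.
// getStatus returns the current status string; pause waits between checks.
function waitForPredictor(getStatus, pause, maxAttempts = 60) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = getStatus();
    if (status === "complete") {
      return attempt; // number of checks it took
    }
    if (status === "error") {
      // Assumption: a failed training run reports the status "error".
      throw new Error("predictor training failed");
    }
    pause();
  }
  throw new Error(`predictor not ready after ${maxAttempts} checks`);
}

// Possible use inside mongosh (sleep is a mongosh built-in; the status field
// name is assumed from the query output):
// waitForPredictor(
//   () => db.predictors.find({ name: "mongo_gold_p" }).toArray()[0].status,
//   () => sleep(5000)
// );
```

Capping the number of attempts keeps the shell from hanging indefinitely if training never finishes.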
Test it out!
Once the status is complete, it will be ready to answer queries.
Go ahead and predict a value with:
use mongo_gold
db.mongo_gold_p.find({Date: "1975-07-01"})
If it returns the prediction (along with some additional fields such as confidence; the full description is in the documentation), congratulations!
Conclusion
You have successfully created a MongoDB database which, extended thanks to the power of MindsDB, is able to generate predictions based on the data provided.
For more info, make sure to check out the docs.
Thank you for reading my tutorial! If it has been useful for you, remember to give it a like and share it with your friends. Also, check out my Github profile.