Rutam Prita Mishra

Posted on Oct 13, 2022 • Edited on Oct 15, 2022

Predict Water Quality using MindsDB

#tutorial #machinelearning #datascience #database

Introduction

MindsDB embeds an AI layer on top of existing traditional databases. This makes them smarter and lets them train robust Predictor models on the data they already hold, without you having to worry about the underlying machine learning code.

MindsDB also keeps user interaction easy, as a few simple SQL queries are enough to get things done. Currently, MindsDB offers both a free and a paid version and is available in two variants, i.e., Self-hosted deployment and MindsDB Cloud.

In this tutorial we will be predicting the quality of water based on several feature parameters using a dataset on MindsDB Cloud.

Feeding Data to MindsDB Cloud

To feed the dataset to MindsDB Cloud, we first need to download it from a freely available source like Kaggle and then upload it to MindsDB using the steps mentioned below.

Step 1: Let's sign in to the MindsDB Cloud console or simply register for a new account.

Step 2: Once you're logged in, you will find the MindsDB Cloud Editor. The top portion is a Query Editor where we can write and execute queries. The bottom portion is the Result Viewer where we can see the results of the executed queries. The right panel contains the Learning Hub for anyone who is just getting started with MindsDB.

Step 3: Now hit the Add Data button from the top right corner and click on the Files tab instead of Databases followed by clicking on the Import File button.

Step 4: In this step, we upload the .CSV file downloaded from Kaggle. Provide a name in the Table name field and then click on Save and Continue to import the file into MindsDB Cloud as a table with that name.

Step 5: After the table is imported successfully, we are taken back to the Cloud Editor page, where two SQL queries are already filled in: one to list the names of the available tables and one to check the data in the table we just imported.

Let's execute the first command and check the list of available tables. We should be able to find a table named WaterQL, which confirms that the table is present in the database.

SHOW TABLES FROM files;

Now let's execute the second query and check whether we have data rows present in our table or not. This query should return 10 data rows.

SELECT * FROM files.WaterQL LIMIT 10;

This confirms that we are ready with the data table now. Let's proceed to the next part where we will train a Predictor model using this data.
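The same kind of sanity check can also be done locally on the downloaded CSV before uploading it. The following is a minimal Python sketch using only the standard library. The helper name `summarize_csv` and the tiny inline sample are made up for illustration; the column names are taken from the queries used later in this tutorial.

```python
import csv
import io


def summarize_csv(handle):
    """Return (row_count, column_names, missing_counts) for an open CSV file."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Count empty cells (and cells absent from short rows) as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return rows, columns, missing


# Made-up sample standing in for the real dataset (hypothetical values).
SAMPLE = """ph,Hardness,Sulfate,Potability
7.1,204.9,,0
,129.4,356.9,1
8.3,224.2,310.1,0
"""

rows, columns, missing = summarize_csv(io.StringIO(SAMPLE))
print(rows, columns, missing)
```

For the real file, you would pass an open file object instead of the inline sample, for example `open(path, newline="")` where `path` points to wherever you saved the download.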

Training a Predictor Model

MindsDB provides very simple SQL queries to carry out different tasks in its interface. So, we will now proceed with the steps below to get ready with the Predictor model.

Step 1: MindsDB provides a CREATE PREDICTOR syntax that we can use to train the model. Follow the syntax below.

CREATE PREDICTOR mindsdb.predictor_name        -- your Predictor name
FROM database_name                             -- your database name
(SELECT columns FROM table_name LIMIT 10000)   -- your columns and table name
PREDICT target_parameter;                      -- your target parameter

Simply replace the placeholders with the name you want for your Predictor and the respective database, table and target names, and you should be good to go. For example, the actual query for me looks like this.

CREATE PREDICTOR mindsdb.water_quality
FROM files 
(SELECT * FROM WaterQL LIMIT 10000)
PREDICT Potability;

Step 2: Based on the size of the dataset used, it might take a while for the model to complete its training. We can check the training status of the model using the following statement.

SELECT status
FROM mindsdb.predictors
WHERE name='Name_of_the_Predictor';

The actual query puts the name of our model in place of the placeholder above.

SELECT status
FROM mindsdb.predictors
WHERE name='water_quality';

As the status returned is complete, we are now ready to do the predictions for water quality.

Note: There are 3 possible statuses for the model, in the following sequence.

generating --> The model is currently being generated.

training --> The model is now being trained with the dataset.

complete --> The model is ready to make predictions.
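If you run the status check from a script rather than the Cloud Editor, the sequence above can be wrapped in a small polling loop. This is only a sketch: `fetch_status` is a hypothetical callable that you would implement yourself to run the status query and return the status string, and the demo below fakes it with a fixed list so the code runs on its own.

```python
import time


def wait_until_complete(fetch_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch_status() until it returns 'complete'.

    Returns the distinct statuses seen, in order. Any value other than the
    three statuses listed in this tutorial is treated as unexpected.
    """
    seen = []
    for _ in range(max_polls):
        status = fetch_status()
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "complete":
            return seen
        if status not in ("generating", "training"):
            raise RuntimeError(f"unexpected status: {status!r}")
        sleep(poll_seconds)
    raise TimeoutError(f"model not complete after {max_polls} polls")


# Demo with a faked status source and no real sleeping.
fake_statuses = iter(["generating", "training", "training", "complete"])
history = wait_until_complete(lambda: next(fake_statuses), sleep=lambda _seconds: None)
print(history)
```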

Describing the Predictor Model

Before we proceed to the final part of predicting the water quality, let us first understand the underlying model that we just trained.

MindsDB provides the following 3 types of descriptions for the model using the DESCRIBE statement.

By Features
By Model
By Model Ensemble

By Features

DESCRIBE mindsdb.predictor_model_name.features;

This query shows the role of each column for the model along with the type of encoders used on them while training.

By Model

DESCRIBE mindsdb.predictor_model_name.model;

This query lists all the underlying candidate models that are used during training. The one with the best performance, i.e., the one whose performance value is closest to 1, is selected. The selected column shows 1 for that model and 0 for the others.
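The selection rule described here can be pictured with a few lines of Python. This is an illustration of the idea only, not MindsDB's actual implementation, and the candidate names and performance values are invented.

```python
def mark_selected(candidates):
    """Flag the candidate whose performance is closest to 1 with selected=1."""
    best = min(candidates, key=lambda c: abs(1.0 - c["performance"]))
    return [{**c, "selected": 1 if c is best else 0} for c in candidates]


# Hypothetical candidates; real names and scores come from the DESCRIBE output.
candidates = [
    {"name": "candidate_a", "performance": 0.58},
    {"name": "candidate_b", "performance": 0.66},
    {"name": "candidate_c", "performance": 0.61},
]

for row in mark_selected(candidates):
    print(row["name"], row["performance"], row["selected"])
```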

By Model Ensemble

DESCRIBE mindsdb.predictor_model_name.ensemble;

This query gives back a JSON output that contains the different parameters that ultimately helped to choose the best candidate model for the Predictor.

Now that we understand our Predictor model, let's move on to predicting values in the next section.

Predicting the Target Value

Predicting the water quality (Potability) is as easy as running a simple SELECT statement using the Predictor.

As water quality depends on many feature parameters, it is advisable to provide values for all of them to get an accurate prediction. However, we can still go ahead and make a prediction by passing only a few of them.

The query for this will be as follows.

SELECT target_value_name, target_value_confidence, target_value_explain
FROM mindsdb.predictor_name
WHERE feature1=value1 AND feature2=value2 AND ...;

Now, replacing the placeholders in the above query, the actual query will be like this.

SELECT Potability,Potability_confidence,Potability_explain
FROM mindsdb.water_quality
WHERE ph=2.6 AND Hardness=210 AND Solids=18645.233 AND Chloramines=6.546;

As the predicted Potability (Water Quality) is 0, this water is not safe for human consumption.

We will now pass the full set of feature parameters to obtain a more accurate prediction of the water quality. Note that Sulfate is passed as NULL in this example. The query now looks like this.

SELECT Potability,Potability_confidence,Potability_explain
FROM mindsdb.water_quality
WHERE ph=6.9 AND Hardness=201 
AND Solids=11350.675 AND Chloramines=4.3 AND Sulfate=NULL 
AND Conductivity=467.5 AND Organic_carbon=9.98 AND Trihalomethanes=89.686 AND Turbidity=4.99;

As the predicted Potability (Water Quality) is 1, this water is safe for human consumption.
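When trying many feature combinations, typing the WHERE clause by hand gets tedious. Below is a small Python sketch that assembles the same kind of query from a dictionary of feature values. The helper name `build_prediction_query` is made up for this example, and it only produces the query text; running it against MindsDB is left to whatever client you use. A value of `None` is rendered as NULL, mirroring the Sulfate entry above.

```python
def build_prediction_query(predictor, target, features):
    """Build a prediction SELECT for mindsdb.<predictor> from numeric features."""
    if not features:
        raise ValueError("at least one feature value is required")
    conditions = []
    for name, value in features.items():
        if value is None:
            rendered = "NULL"
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            # Only plain numbers are supported in this sketch.
            raise TypeError(f"{name} must be a number or None, got {value!r}")
        else:
            rendered = repr(value)
        conditions.append(f"{name}={rendered}")
    where = " AND ".join(conditions)
    return (
        f"SELECT {target},{target}_confidence,{target}_explain\n"
        f"FROM mindsdb.{predictor}\n"
        f"WHERE {where};"
    )


query = build_prediction_query(
    "water_quality",
    "Potability",
    {"ph": 2.6, "Hardness": 210, "Solids": 18645.233, "Chloramines": 6.546},
)
print(query)
```

With these inputs, the helper reproduces the first prediction query used in this section.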

Kudos! We have now successfully predicted the water quality using a Predictor.

Note: While predicting, we requested three columns in the SELECT statement.

target_parameter: This returns the value we want to predict.

target_parameter_confidence: This returns how confident the model is about the Prediction.

target_parameter_explain: This returns all the details about the predicted target_value i.e., the value of the target predicted, the confidence level, anomalies, if any, truth value, etc.
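If you read the prediction from a script, the explain column can be unpacked with Python's `json` module. The sketch below assumes the column arrives as a JSON string, and the key names and values in the sample are assumptions for illustration, so check them against the output you actually get.

```python
import json

# Made-up sample; the real keys and values come from the Potability_explain column.
SAMPLE_EXPLAIN = '{"predicted_value": "1", "confidence": 0.62, "anomaly": null, "truth": null}'


def summarize_explain(raw):
    """Pick out the fields described above from an explain JSON string."""
    details = json.loads(raw)
    return {
        "predicted_value": details.get("predicted_value"),
        "confidence": details.get("confidence"),
        "anomaly": details.get("anomaly"),
        "truth": details.get("truth"),
    }


summary = summarize_explain(SAMPLE_EXPLAIN)
print(summary)
```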

Conclusion

This concludes the tutorial. Before we wrap up, let's do a quick recap of what we did. We started by creating a MindsDB Cloud account, fed the dataset and created a table using the cloud UI, trained a Predictor model, described its model features and finally predicted the target water quality value.

MindsDB is simple, easy to use and has a free version for its users. So, I would suggest that you pick any dataset from the internet and start predicting values from it using your own MindsDB Predictors.

Lastly, before you leave, I would love to hear your feedback in the Comments section below and would be really motivated if you drop a LIKE on this article.

