DEV Community

Rutam Prita Mishra
Rutam Prita Mishra

Posted on • Updated on

Predict Water Quality using MindsDB

Cover

Introduction

MindsDB is amazing in what it does as it embeds an AI layer on top of the existing traditional databases making them smarter and enabling them to create robust Predictor models with the data they have, without really worrying about the underlying coding and all.

MindsDB also makes the user interaction really easy as one just needs to use some simple SQL queries to get things done here. Currently, MindsDB offers both a free and paid version for all of its users and is available in two variants i.e., Self-hosted deployment and MindsDB Cloud.

In this tutorial we will be predicting the quality of water based on several feature parameters using a dataset on MindsDB Cloud.

Feeding Data to MindsDB Cloud

In order to feed the dataset to MindsDB Cloud, we need to first download it from any freely available sources like Kaggle and then upload it simply to MindsDB using the steps mentioned below.

Step 1: Let's sign in to the MindsDB Cloud console or simply register for a new account.

MindsDB Register

Step 2: Once you're logged in, you will find the MindsDB Cloud Editor. The top portion is simply a Query Editor where we can write the queries and execute them, the bottom comprises of the Result Viewer where we can see the results of the executed queries and the right panel contains the Learning Hub for anyone who is just getting started with MindsDB.

Cloud Editor

Step 3: Now hit the Add Data button from the top right corner and click on the Files tab instead of Databases followed by clicking on the Import File button.

Files Dashboard

Step 4: In this step we will simply upload the .CSV file that we can easily download from Kaggle here. Now we just have to provide a table name in the Table name field and then click on Save and Continue to import the file as a table in the given table name to MindsDB Cloud.

Import File Wizard

Step 5: Upon successful import of the table, the control takes us back to the Cloud Editor page where we can see two simple SQL queries listed to list the names of available tables and then check the data in the current table that we just imported.

Let's execute the first command and check the list of available. We should be able to find a table named WaterQL which confirms that the current table is present in the database.

SHOW TABLES FROM files;
Enter fullscreen mode Exit fullscreen mode

Show Tables

Now let's execute the second query and check whether we have some data rows present in our table or now. This query should return 10 data rows.

SELECT * FROM files.WaterQL LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

Table Rows

This confirms that we are ready with the data table now. Let's proceed to the next part where we will train a Predictor model using this data.

Training a Predictor Model

MindsDB provides very simple SQL queries to carry out different tasks in its interface. So, we will now proceed with the steps below to get ready with the Predictor model.

Step 1: MindsDB provides a CREATE PREDICTOR syntax that we can use to train the model. Follow the syntax below.

CREATE PREDICTOR mindsdb.predictor_name       (Your Predictor Name)
FROM database_name                            (Your Database Name)
(SELECT columns FROM table_name LIMIT 10000)  (Your Table Name)
PREDICT target_parameter;                     (Your Target Parameter)
Enter fullscreen mode Exit fullscreen mode

Simply, replace the names with the ones you want to use for your Predictor and the respective table names and you should be good to go with this. For e.g., the actual query for me looks something like this.

CREATE PREDICTOR mindsdb.water_quality
FROM files 
(SELECT * FROM WaterQL LIMIT 10000)
PREDICT Potability;
Enter fullscreen mode Exit fullscreen mode

Create Predictor

Step 2: Based on the size of the dataset used, it might take a while for the model to complete its training. We can check the training status of the model using the following statement.

SELECT status
FROM mindsdb.predictors
WHERE name='Name_of_the_Predictor';
Enter fullscreen mode Exit fullscreen mode

The actual query will be formed like this putting the name of the model in place of the placeholder above.

SELECT status
FROM mindsdb.predictors
WHERE name='water_quality';
Enter fullscreen mode Exit fullscreen mode

Predictor Status

As the status returned is complete, we are now ready to do the predictions for water quality.

Note: There are 3 possible statuses for the model in the following sequence.

  • generating--> It means the model is getting generated currently
  • training--> It means that the model is now getting trained with the dataset.
  • complete--> It means the model is now ready to do the predictions.

Describing the Predictor Model

Before we proceed to the final part of predicting the water quality, let us first understand the underlying model that we just trained.

MindsDB provides the following 3 types of descriptions for the model using the DESCRIBE statement.

  • By Features
  • By Model
  • By Model Ensemble

By Features

DESCRIBE mindsdb.predictor_model_name.features;
Enter fullscreen mode Exit fullscreen mode

This query shows the role of each column for the model along with the type of encoders used on them while training.

Feature description

By Model

DESCRIBE mindsdb.predictor_model_name.model;
Enter fullscreen mode Exit fullscreen mode

This query shows the list of all the underlying candidate models that are used during training. The one with the best performance i.e., whose value is closer to 1, is selected. You can see the value 1 for the selected one in the selected column while others are set at 0.

Model description

By Model Ensemble

DESCRIBE mindsdb.predictor_model_name.ensemble;
Enter fullscreen mode Exit fullscreen mode

This query gives back a JSON output that contains the different parameters that ultimately helped to choose the best candidate model for the Predictor.

Ensemble description

As we are done now understanding our Predictor model, let's move on to prediciting values in the next section.

Predicting the Target Value

Predicitng the water quality(Potability) is as easy as running a simple SELECT statement using the Predictor.

As water quality depends on many feature parameters, it is advised to do the prediction providing all the required feature parameter values for an accurate prediction. However, we can still go ahead and choose doing this by passing a few of them.

The query for this will be as follows.

SELECT target_value_name, target_value_confidence, target_value_confidence
FROM mindsdb.predictor_name
WHERE feature1=value1 AND feature2=value 2,...;
Enter fullscreen mode Exit fullscreen mode

Now, replacing the placeholders in the above query, the actual query will be like this.

SELECT Potability,Potability_confidence,Potability_explain
FROM mindsdb.water_quality
WHERE ph=2.6 AND Hardness=210 AND Solids=18645.233 AND Chloramines=6.546;
Enter fullscreen mode Exit fullscreen mode

Partial Potability

As the predicted Potability (Water Quality) is 0, this water is not safe for human consumption.

We will now pass all the required feature parameters to obtain a more accurate prediction of the water quality. So, the query now becomes something like this.

SELECT Potability,Potability_confidence,Potability_explain
FROM mindsdb.water_quality
WHERE ph=6.9 AND Hardness=201 
AND Solids=11350.675 AND Chloramines=4.3 AND Sulfate=NULL 
AND Conductivity=467.5 AND Organic_carbon=9.98 AND Trihalomethanes=89.686 AND Turbidity=4.99;
Enter fullscreen mode Exit fullscreen mode

Complete Potability

As the predicted Potability (Water Quality) is 1, this water is safe for human consumption.

Kudos! We have now successfully predicted the water quality using a Predictor.

Note: While predicting we supplied three parameters with the SELECT statement.

  • target_parameter: This returns the value we want to predict.
  • target_parameter_confidence: This returns how confident the model is about the Prediction.
  • target_parameter_explain: This returns all the details about the predicted target_value i.e., the value of the target predicted, the confidence level, anomalies, if any, truth value, etc.

Conclusion

This concludes the tutorial here. Before we wrap this up, let's do a quick recap of what we did here. We first started with creating a MindsDB Cloud account, fed the dataset and created a table using the cloud UI, trained a Predictor model, described its model features and finally predicted the target water quality value.

MindsDB is really simple, easy-to-use and free to all of its users. So, I would suggest all of you to pick up any random dataset from the internet and start predicting values out of it using your own MindsDB Predictors.

Lastly, before you leave, I would love to know your feedback in the Comments section below and would be really motivated if you drop a LIKE on this article.

MindsDB Sponsor

Sponsorship Badge

Top comments (0)