DEV Community

Cover image for Predicting Supermarket Future Sales Using Machine Learning with MindsDB.
EphraimX
EphraimX

Posted on

Predicting Supermarket Future Sales Using Machine Learning with MindsDB.

Data science and machine learning have been dubbed the sexiest jobs of the twenty-first century, but despite the hype, corporate executives and managers have come to appreciate the value they provide to any organization. Values such as assisting management in making sound decisions, assisting companies in defining goals and targets, boosting income through effective identification of target audiences, and recruiting the right people These advantages have led to increased growth and adoption among enterprises in a variety of industries.

However, developing competence in data science entails learning not only programming languages like Python or R but also principles in topics like arithmetic and statistics, making the learning curve a bit steep for prospective and current developers. Aside from the individual's learning curve, corporations will need to employ a variety of individuals to deploy an end-to-end machine learning model in production. This increases the challenge of developing and growing machine learning systems, particularly for small and medium-sized businesses.

To solve these issues, MindsDB is developing a platform that enables developers with a basic knowledge of Structured Query Language (SQL) to develop and deploy models using machine learning in databases. And in this tutorial, you'll learn how to create these models with the MindsDB SQL editor.

Getting Started by Creating a MindsDB Account.

To create an account;

  1. Click the Get Started button on the homepage. Get started with MindsDB
  2. Fill in your details in the respective fields below. Sign up for MindsDB
  3. You will be sent receive an email requesting that you validate your email address. Verify Email Address
  4. On validation, you will be directed to MindsDB SQL editor. MindsDB SQL Editor

To follow along with this tutorial, you need to upload the dataset to MindsDB GUI.

Uploading the Dataset to MindsDB GUI.

  • First, dowload the zip file of the dataset here and extract its contents (a train.csv file).
  • Next, click on the Add Data button at the top of your MindsDB SQL Editor. Click Add Data Button
  • Next, select the files tab, and click the import file button. Select Files Tab and Import File Button
  • Click the import file button, name the table sales_data. Click Save and Continue to upload your dataset. Save and Continue
  • On Successful upload, you should be presented with the SQL Editor. Successful Upload

Understanding the Data.

Every dataset you upload to MindsDB is kept in the files database as a table. As a result, you must run your queries for a table against this database.

Run the query below to see the first five records in the sales data table.

SELECT * FROM files.sales_data LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

First Five Records

This dataset contains eighteen columns, including the sales column. However, for this lesson, the order date, ship date, ship mode, segment, and category columns will be used to make predictions.

To see these columns, run the query below.

SELECT `Order Date`, `Ship Date`, `Ship Mode`, Segment, Category, Sales FROM files.sales_data
Enter fullscreen mode Exit fullscreen mode

Predictive Columns

What do these columns stand for?

  • Order Date: The date the product was ordered by the consumer.
  • Ship Date: The date the product was shipped by the company.
  • Ship Mode: The shipment class that the customer has selected. There are four classes available here: Same Day, First Class, Standard Class, and Second Class.
  • Segment: The product's segmentation. This field has three possible values: consumer, corporate, and home office.
  • Category: The category to which the products belong. There are three values in this column: furniture, office supplies, and technology.

Building the Model.

Photo by Johnny Ho on Unsplash
Before you start developing the model, you might want to know the steps you'll need to follow to build it with Python and machine learning tools like Scikit-Learn and Pandas.

  • Data Cleaning: First, you will clean the data by removing duplicates, filling in missing values, and verifying data types in a field are consistent, among other things.
  • Exploratory Data Analysis: Next, you will analyze the data to identify patterns and trends that may be relevant to the problem at hand.
  • Feature Selection: Not every column in a dataset is important. At this point, you will remove any fields that you believe will have little to no effect on your machine learning model's performance.
  • Data training: This is when you try out multiple machine learning algorithms to see which ones perform the best on your data. If solving a regression problem, you choose a regression algorithm; the same is true for classification and time-series analysis, among other things. Not to mention before training the data, you must ensure that all categorical data (words) have been converted to numerical data (numbers) and that these numerical data have been normalized.
  • Model validation: Once the model has been trained, it is tested against the validation and test data to determine its accuracy. The model's parameters are then tweaked until you get a consistent and good performance.
  • Deploying the model: Now that you've trained and tested your model, you must convert it to a binary file using Pickle or Joblib to deploy it and make predictions. Tired Spongebob These are the steps you would follow if you were creating a model using Python and machine learning tools, but MindsDB takes care of these steps and enables you to create your model using SQL queries.

Run the query below to create a model for the 'sales data' table using the columns supplied.

CREATE PREDICTOR mindsdb.predict_sales
FROM files
    (SELECT `Order Date`, `Ship Date`, `Ship Mode`, Segment, Category, Sales FROM sales_data)
PREDICT Sales;
Enter fullscreen mode Exit fullscreen mode

If everything is done correctly, you should see a "Query Successful" message.
The following syntax illustrates how the query works.

CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
    (SELECT [column_name, ...] FROM [table_name])
PREDICT [target_column];
Enter fullscreen mode Exit fullscreen mode
  • [predictor name]: The name of the model.
  • [integration name]: The name of the database where your table is stored, for example, files.
  • [column name,...]: The field to predict, as well as the fields to train for the prediction.
  • [table name]: The table containing the columns, for example,'sales data'.
  • [target column]: The column to be predicted, such as Sales.

Making Predictions.

Photo by Jordan Rowland on Unsplash
Consider the following question for this section.

A customer orders a product on March 28th, 2022, and the product is dispatched on March 31st, 2022. The product is from the consumer segment, the furniture category, and was shipped first class. How much did the item cost?

To find out, use the query below.

Select Sales_confidence, Sales_explain
FROM mindsdb.predict_sales
WHERE  `Order Date` = '28/03/2022' AND
        `Ship Date` = '31/03/2022'AND
        `Ship Mode` = 'First Class'AND
        Segment = 'Consumer'AND
        Category = 'Funiture';
Enter fullscreen mode Exit fullscreen mode

Predictive Results

Using the newly constructed model, the product costs around $201.27. The model has a confidence level of 76%, or 0.76.

That's all. MindsDB makes it simple to build machine learning models.
Happy Seal
That's all for now, guys. This tutorial shows how MindsDB is democratizing access to machine learning with SQL, a language that has been used for decades and is still relevant today.

You can connect on Slack to ask questions about MindsDB, and you can also raise an issue on GitHub if you run into any problems.

You may learn more about MindsDB by reading the documentation.

Thank you for reading this post. if you liked it, please share it.

To understand more about the field of machine learning and artificial intelligence in general, you can follow me on Twitter and LinkedIn, and please leave a comment below if you have any questions. For the time being, bye.

Latest comments (0)