
Creating a dataset for your ML project.

Raksha Kannusami ・2 min read

The problem.

I set out to work on a machine learning project yesterday. I grabbed my cup of coffee and was really motivated to start.

The first thing I wanted to find was a good dataset to train my model on. I was particular about the dataset because I knew exactly which variables it should contain for this project. I searched every platform that offers free datasets, but couldn't find exactly the one I wanted!

Then I thought, the unavailability of a dataset shouldn't stop me from making this project.

The solution.

So I decided to generate my own dataset, which took me barely a few minutes.

I created a table with all the necessary variables as headings for the columns and then used the RANDBETWEEN function under each variable to generate a random value.

Step 1.

Create a simple table with the necessary variables.

Step 2.

Use the RANDBETWEEN function to generate a random integer in any range, say 1 to 100: =RANDBETWEEN(1, 100).

Step 3.

Drag the cell down over as many rows as you need to generate the data.

Step 4.

Export the sheet as a CSV file.

This is a simple trick for creating your own dataset for any kind of ML project that you want to build.
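
For anyone who prefers a script to a spreadsheet, the four steps above can be sketched in Python using only the standard library. This is a rough equivalent, not the method used in this post: the column names (`age`, `income`, `score`), the helper names, and the file name `dataset.csv` are all made up for illustration.

```python
import csv
import random


def randbetween(low, high):
    """Mimic the spreadsheet RANDBETWEEN: a random integer in [low, high]."""
    return random.randint(low, high)


def make_dataset(columns, n_rows, low=1, high=100, seed=None):
    """Steps 1-3: one random value per column, repeated for n_rows rows."""
    if seed is not None:
        random.seed(seed)  # optional, makes the "random" data reproducible
    return [[randbetween(low, high) for _ in columns] for _ in range(n_rows)]


def write_csv(file_obj, columns, rows):
    """Step 4: write the header row and the data rows as CSV."""
    writer = csv.writer(file_obj)
    writer.writerow(columns)
    writer.writerows(rows)


if __name__ == "__main__":
    columns = ["age", "income", "score"]  # hypothetical variable names
    rows = make_dataset(columns, n_rows=50, seed=42)
    with open("dataset.csv", "w", newline="") as f:
        write_csv(f, columns, rows)
```

Passing a `seed` is a small design choice worth noting: it lets you regenerate the same file later, whereas a spreadsheet recalculates RANDBETWEEN values.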

Now nothing can stop you from making that ML project! 🎉
keep learning! keep coding! 💖

Let's be friends on Twitter, LinkedIn or GitHub! 😊

Raksha Kannusami

@rakshakannu

There is magic in code! 🌟 Constantly learning and sharing to get better at coding. 🎯

Discussion

 

If your features are random, I don't think the model will be able to learn anything. I totally agree this is a great way to get started, but if/when you do get a realistic dataset, you will basically be starting from scratch.

 

You are right, Tony! My main aim was to build an end-to-end ML project. My goal was not accuracy or prediction quality, but to create a working model so that, given those variables as inputs, I get an output. If I wanted to focus on prediction accuracy, that couldn't be done without a real dataset.

 
 

from sklearn.datasets import make_regression
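
To make the point from the discussion concrete: when every column is drawn independently, the target has no relationship to the features, so there is nothing for a model to learn. One possible workaround, sketched below with only the Python standard library, is to compute the target from the random features plus some noise. The coefficients, noise level, and function names here are made up for illustration; scikit-learn's `make_regression`, imported above, generates this kind of dataset too.

```python
import random


def make_linear_dataset(n_rows, coefs, noise=5.0, low=1, high=100, seed=None):
    """Random integer features; target = weighted sum of features + Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        features = [rng.randint(low, high) for _ in coefs]
        target = sum(c * x for c, x in zip(coefs, features)) + rng.gauss(0, noise)
        rows.append(features + [round(target, 2)])
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


# Hypothetical coefficients: the target depends strongly on the first feature.
rows = make_linear_dataset(n_rows=500, coefs=[2.0, 0.5], seed=0)
first_feature = [r[0] for r in rows]
linked_target = [r[-1] for r in rows]

# For comparison: a target drawn independently, as in the all-random sheet.
rng = random.Random(1)
independent_target = [rng.randint(1, 100) for _ in rows]

print("feature vs. derived target:    ", round(pearson(first_feature, linked_target), 3))
print("feature vs. independent target:", round(pearson(first_feature, independent_target), 3))
```

The first correlation should come out close to 1 and the second close to 0, which is the difference between a dataset a model can fit and one it cannot.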