Brett Hammit

Intro to Machine Learning in Python: Part I

After spending a while getting to know the ins and outs of data frame management and other sides of data science in Python, I have been reluctant to get into Machine Learning, worried that I would not have the time to commit to it and get as good as I would like. Like everything, sometimes you just gotta do it. So here we go.

Starting Point:

Where I am starting is supervised learning, which basically means you have known inputs and outputs, and you adjust the parameters of your model so it can predict the outputs for new inputs.
- An example of this would be classifying movie reviews as positive or negative

I am doing this work in Jupyter with the Python library scikit-learn, which ships with many algorithms already implemented and makes it much easier to fit models, split data into training and test sets, etc.
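
To give a feel for what scikit-learn handles for you, here is a minimal sketch that splits some data into training and test sets and fits a model. The data is made up for illustration, and the variable names, the 25% test size, and the random seeds are my own assumptions rather than anything from a real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data for illustration: y is roughly 3*x + 2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# Hold out 25% of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out rows (1.0 would be a perfect fit).
r2 = model.score(X_test, y_test)
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}, R^2={r2:.3f}")
```

The fitted slope and intercept should land close to the 3 and 2 used to generate the data.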

Linear Regression

Linear Regression is the step up after correlation: we model the relationship between two variables by fitting a line to the data, then use that line to predict a value.
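
To make "fitting a line" concrete, here is a small sketch of ordinary least squares for one input variable, written in plain Python so nothing is hidden inside a library. The points and the helper name fit_line are hypothetical, chosen only for illustration; in practice a library such as scikit-learn does this for you.

```python
# Made-up points for illustration; they lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)
predicted = slope * 6.0 + intercept  # predict y for a new x of 6
print(slope, intercept, predicted)   # prints: 2.0 1.0 13.0
```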

Within Machine Learning there are some base algorithms, and it can be hard to decide which model is best for your data. This cheat sheet gives a pretty good guide to what you should use based on your data.
(Image: algorithm cheat sheet)

Working With Our Data

So the first thing we need is data to work with in order to build a model. Once you have your data available, the first thing to do is analyze what you are working with.

The first step is loading your data into a data frame. We can do this with "pd.read_csv('YourData')", or the matching pandas reader for whatever type of file you are working with. Creating this data frame allows us to dig deeper and see what we need to do with our model.
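
Here is a minimal sketch of that loading step. So it runs on its own, it reads from an in-memory string instead of a file; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for a file on disk. With a real file you would pass its path,
# e.g. pd.read_csv("YourData.csv").
csv_text = """sqft,bedrooms,price
850,2,155000
1200,3,210000
1500,3,255000
2000,4,340000
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())  # first rows, to check the file parsed as expected
print(df.shape)   # (number of rows, number of columns)
```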

Analyzing Our Data

A good place to start is to call the .describe() method and look at the .columns attribute of your data frame to see your column names and some summary statistics about them.
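
As a quick sketch of that first look, using a small made-up data frame (the columns are hypothetical):

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "sqft": [850, 1200, 1500, 2000],
    "bedrooms": [2, 3, 3, 4],
    "price": [155000, 210000, 255000, 340000],
})

print(df.columns.tolist())  # column names (an attribute, so no parentheses)
summary = df.describe()     # count, mean, std, min, quartiles, max per numeric column
print(summary)
df.info()                   # dtypes and non-null counts; prints directly
```

I added .info() here as well since it shows data types and missing values, which .describe() does not.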

With Seaborn imported (conventionally as sns), we can use "sns.pairplot(YourDataFrame)" to get a good idea of the distribution of each column and how the columns relate to one another.

(Image: an example of normally distributed data vs. not normally distributed data)

After that we can look at the correlations in our data by using "sns.heatmap(df.corr(), annot=True)", which draws a heat map with the correlation coefficients written on top of each cell. A value of 1 means two columns are perfectly positively correlated, -1 means perfectly negatively correlated, and 0 means no linear correlation.
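
A minimal sketch of the heat map on made-up data follows. One thing I am assuming about your data: if the data frame has text columns, recent pandas versions raise an error from df.corr() unless you pass numeric_only=True, so the sketch includes it. The color limits and color map are just my choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; leave this line out in Jupyter

import pandas as pd
import seaborn as sns

# Made-up data for illustration, including one non-numeric column.
df = pd.DataFrame({
    "sqft": [850, 1200, 1500, 2000, 2400],
    "bedrooms": [2, 3, 3, 4, 4],
    "price": [155000, 210000, 255000, 340000, 400000],
    "city": ["A", "B", "A", "C", "B"],
})

# numeric_only=True leaves the text column out of the correlation matrix.
corr = df.corr(numeric_only=True)

# Fix the color scale to the full correlation range of -1 to 1.
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```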

Lastly, we need to pick what we would like to predict. Choose the target column and use "sns.distplot(YourDataFrame['ColumnName'])" to pull up a distribution plot of that column. Note that distplot is deprecated in recent Seaborn releases in favor of histplot and displot. Ideally the distribution looks roughly normal, like I talked about above; a heavily skewed target is usually harder to fit well with a straight line.

Conclusion

In this post I mainly talked about my first day in Machine Learning, primarily working with Linear Regression and analyzing your data to get it ready for fitting. My next post should be more about actual ML: training, testing and fitting our model!

Top comments (4)

Samuel NAIT 🇫🇷

Nice! Can't wait to read more on this.

I'm very interested into ML but doesn't have time to go deep into it (As you say in the beginning so no excuses I guess. 🙄)

Anyway, thanks for sharing your ML experience with us. 👍

Brett Hammit

It can be very hard to find time when life can be so busy! Keep going and keep at it! :)

gravesli

I think you are great! i just want to discuss tech with Python developer.
I built a display machine state using Python3 with Flask!
Flask State Github:github.com/yoobool/flask-state
Should i can get some improvement suggestions from you? Thanks~

gravesli

That's great. Would you give me a star on GitHub Flask State?
because my project isn't active. ^.^