DEV Community

Damon Marc Rocha II


Linear Regression With Python and Sklearn


This week, at the request of Sm03leBr00t, I dabbled in something a little different: linear regression in Python. I really enjoyed the chance to work with machine learning and Python again. The project was simple; I think I used around 50 lines of code.
To start, you need three packages: matplotlib and pandas at first, then sklearn a little later.

import pandas as pd
import matplotlib.pyplot as mplot

I had a lot of issues getting all of these packages to work on my WSL Ubuntu subsystem, so I finally broke down and downloaded Anaconda and Spyder. I did this because, even after getting the dependencies to work, I still could not display the graphs generated by this program. So I will say this now: if you plan on using Python for machine learning and you are using WSL, get something similar to Anaconda and Spyder to run these packages.
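A possible alternative, which is not the approach used in this post: when no graphical display is available, matplotlib can render with its non-interactive "Agg" backend and save the figure to a file instead of opening a window. The sketch below is only an illustration; the data points and the file name `example_plot.png` are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as mplot

# Hypothetical sample points, only for illustration
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

mplot.scatter(hours, scores, color='red')  # draw the points
mplot.title('Example plot')
mplot.savefig('example_plot.png')  # write the figure to disk instead of calling show()
mplot.close()  # release the figure
```

The saved image can then be opened from the Windows side with any image viewer.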
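After setting up Anaconda and Spyder, I created a simple CSV file to read into my program using pandas. You can create a simple spreadsheet in Google Sheets or Excel and then save it as a CSV file. The code in this post assumes two columns: the input values first and the values to predict second.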
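If you would rather not use a spreadsheet, a small test file can also be written with pandas itself. This is a minimal sketch with made-up data; the column names, the values, and the file name `sample_data.csv` are all hypothetical.

```python
import pandas as pd

# Hypothetical two-column data: input values first, values to predict second
sample = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Salary": [40000, 45000, 52000, 58000, 63000, 70000],
})
sample.to_csv("sample_data.csv", index=False)  # index=False keeps the file to two columns

# Read it back the same way the program does
check = pd.read_csv("sample_data.csv")
print(check.shape)  # (6, 2)
```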

file_name = input("Enter CSV file: ") #read the CSV path from the user
dataset = pd.read_csv(file_name) #load the CSV file into a DataFrame
X = dataset.iloc[:, :-1] #every column except the last (the first column in a two-column file)
y = dataset.iloc[:, 1]  #second column (the values to predict)
print(dataset)

After this, I passed the data above into the sklearn machine learning package:

from sklearn.model_selection import train_test_split 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
#X_train contains the training rows of the 1st col
#y_train contains the matching rows of the 2nd col

The train_test_split function splits the CSV data into a training set and a test set. The test_size parameter is the fraction of rows held out for testing, so 1/3 keeps one third of the data for the test set. The rows are shuffled before splitting, and random_state seeds that shuffle; I set it to zero so that the split is the same on every run.
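To see what the split does in isolation, here is a small sketch on made-up data (the column names `x` and `y` and the nine rows are hypothetical). It prints the sizes of the two sets and checks that repeating the call with the same random_state selects the same test rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 9 rows, one feature column and one target column
data = pd.DataFrame({"x": range(9), "y": [2 * v + 1 for v in range(9)]})
X = data[["x"]]  # double brackets keep X as a DataFrame
y = data["y"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
print(len(X_train), len(X_test))  # 6 3

# Repeating the call with the same random_state selects the same rows
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=1/3, random_state=0)
print(X_test.index.equals(X_test2.index))  # True
```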
With the data split, I needed to train the model for linear regression, as shown below:

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

The code above imports LinearRegression from sklearn and then fits the model to the training data from the last step.
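Once fitted, the model exposes the line it learned. The sketch below is not part of the original program; it uses made-up points that lie exactly on y = 2x + 1 so the fitted slope and intercept are easy to check, and it also computes the R² score that `score` returns.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data lying exactly on the line y = 2x + 1
X = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0]})
y = pd.Series([1.0, 3.0, 5.0, 7.0, 9.0])

regressor = LinearRegression()
regressor.fit(X, y)

slope = regressor.coef_[0]        # one coefficient per feature column
intercept = regressor.intercept_  # predicted value when x is 0
r2 = regressor.score(X, y)        # R^2: 1.0 means a perfect fit

print(slope, intercept, r2)
```

In the real program, calling `regressor.score(X_test, y_test)` would give a rough idea of how well the line generalizes to the held-out rows.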

With all of this set up, the only thing left is to set up the graphs and let the user enter the values they want predicted.

#get graph names/ xy name
graph_name = input("Enter ScatterPlot name: ")
x_name = input("Enter Name of X axis: ")
y_name = input("Enter Name of Y axis: ")

# Visualizing the Training set results

mplot.scatter(X_train, y_train, color='red') #create datapoint scatter plot
mplot.plot(X_train, regressor.predict(X_train), color='blue') #create linear line through pts
mplot.title('{} (Training set)'.format(graph_name)) #give graph a title
mplot.xlabel(x_name) #x axis name
mplot.ylabel(y_name) # y axis name
mplot.show() #print out training data graph

# Visualizing the Test set results

mplot.scatter(X_test, y_test, color='red')  #create test datapoint scatter plot
mplot.plot(X_train, regressor.predict(X_train), color='blue') #same fitted line as above, drawn from the training points
mplot.title('{} (Test set)'.format(graph_name)) #give test graph a title
mplot.xlabel(x_name) #x axis name
mplot.ylabel(y_name) #y axis name
mplot.show() #print out test data graph

With the two graphs created, I then wrote a loop that lets the user enter values and get predictions from the model:

choice = "y" 
while(choice == "y" or choice == "Y"): #while user wants to enter data loop
  regre_val = float(input("What {} Would You like to find: ".format(x_name))) #get value from user and convert to a float
  y_pred = regressor.predict([[regre_val]])[0] #pass the value to the trained model to get a prediction
  print("Here is your {} {:.2f}".format(y_name, y_pred)) #print out result
  choice = input("Would You like to predict more values(y/n): ") #user input to try again

print("Thank You")
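One detail worth knowing: when a model is fitted on a pandas DataFrame and later given a plain nested list, recent scikit-learn versions emit a warning about missing feature names, although the prediction itself still works. A possible way to avoid that is to wrap the value in a one-row DataFrame with the same column name. The helper `predict_one` and the data below are hypothetical and only sketch the idea.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data on the line y = 3x
X = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
y = pd.Series([3.0, 6.0, 9.0, 12.0])
regressor = LinearRegression().fit(X, y)

def predict_one(model, value, column):
    """Return the model's prediction for a single input value."""
    row = pd.DataFrame({column: [value]})  # same column name the model was fitted with
    return float(model.predict(row)[0])

print("{:.2f}".format(predict_one(regressor, 5.0, "x")))  # 15.00
```

In the program above, the column name would be the header of the first CSV column, available as `X.columns[0]`.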

This project really opened my eyes to data science, and I appreciate the suggestion, Sm03leBr00t. I will definitely work on something similar in the future.
GitHub: Repo
