# Prediction using Supervised ML

• Predict the percentage of marks of a student based on the number of study hours.
• This is a simple linear regression task as it involves just 2 variables.
• Data can be found at: click here
• You can use R, Python, SAS Enterprise Miner or any other tool.
• What will be the predicted score if a student studies for 9.25 hrs/day?

# Demo

Prediction using Supervised Machine Learning

In this regression task I tried to predict the percentage of marks that a student is expected to score based upon the number of hours they studied.

This is a simple linear regression task as it involves just two variables: one input (hours studied) and one output (percentage score).
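With a single input, the least-squares line has a closed form. The slope is the covariance of hours and scores divided by the variance of hours, and the intercept makes the line pass through the point of means. The sketch below computes this with NumPy on a few made-up (hypothetical) data points. It only illustrates the kind of line `LinearRegression` fits in this task and is not part of the original notebook.

``````python
import numpy as np

# Hypothetical toy data (NOT the task dataset): hours studied and scores
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([12.0, 21.0, 33.0, 41.0, 52.0])

# Closed-form ordinary least squares for one input variable:
# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
# intercept = y_mean - slope * x_mean
x_mean, y_mean = hours.mean(), scores.mean()
slope = np.sum((hours - x_mean) * (scores - y_mean)) / np.sum((hours - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print(f"score ~ {intercept:.2f} + {slope:.2f} * hours")  # score ~ 1.80 + 10.00 * hours
``````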

# Importing the required libraries

``````python
# Importing the required libraries
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
``````

# Step 1 - Reading the data from source

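``````python
# Reading data from remote link
url = "<data-url>"  # placeholder: the data link given with the task
s_data = pd.read_csv(url)
print("Data import successful")
``````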
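Before plotting, it can help to confirm that the data loaded as expected. Below is a minimal sketch of typical pandas checks. It runs on a small made-up DataFrame and assumes only the `Hours` and `Scores` column names used later in this notebook. The values are hypothetical.

``````python
import pandas as pd

# Hypothetical stand-in for s_data; the values are made up for illustration
s_data = pd.DataFrame({
    "Hours": [1.5, 3.0, 4.5, 6.0, 7.5],
    "Scores": [20, 35, 48, 61, 77],
})

print(s_data.shape)           # (rows, columns)
print(s_data.dtypes)          # both columns should be numeric
print(s_data.isnull().sum())  # number of missing values per column
print(s_data.describe())      # count, mean, std, min, quartiles, max
``````

On the real data, the same calls reveal missing values or non-numeric columns before they cause errors in training.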

# Step 2 - Input Data Visualization

``````python
# Plotting the distribution of scores
s_data.plot(x='Hours', y='Scores', style='o')
plt.title('Hours vs Percentage')
plt.xlabel('Hours Studied')
plt.ylabel('Percentage Score')
plt.show()
``````

The scatter plot suggests a positive, roughly linear relationship between the number of hours studied and the percentage score.
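A visual impression can be backed up with a number. The Pearson correlation coefficient is close to +1 when two variables have a strong positive linear relationship. Below is a sketch using pandas `Series.corr` on made-up values that stand in for `s_data`. The values are hypothetical, not the task data.

``````python
import pandas as pd

# Made-up values standing in for s_data (same column names as this notebook)
s_data = pd.DataFrame({
    "Hours": [1.5, 3.0, 4.5, 6.0, 7.5],
    "Scores": [20, 35, 48, 61, 77],
})

# Pearson correlation (the default method of Series.corr)
r = s_data["Hours"].corr(s_data["Scores"])
print(f"Pearson r = {r:.3f}")
``````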

# Step 3 - Data Preprocessing

This step divides the data into "attributes" (inputs) and "labels" (outputs).

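``````python
# Attributes: every column except the last (Hours), kept 2D for scikit-learn
X = s_data.iloc[:, :-1].values
# Labels: the second column (Scores)
y = s_data.iloc[:, 1].values
``````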
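The slicing matters because scikit-learn expects the inputs as a 2D array (samples × features), while the target can be 1D. Below is a small sketch on a made-up two-column frame with the same layout as `s_data`, showing the resulting shapes.

``````python
import pandas as pd

# Made-up frame with the same layout as s_data: Hours, Scores
s_data = pd.DataFrame({"Hours": [1.5, 3.0, 4.5], "Scores": [20, 35, 48]})

X = s_data.iloc[:, :-1].values  # all columns but the last -> 2D array
y = s_data.iloc[:, 1].values    # second column only -> 1D array

print(X.shape)  # (3, 1)
print(y.shape)  # (3,)
``````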

# Step 4 - Model Training

Splitting the data into training and testing sets, and training the algorithm.

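``````python
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)  # X_train is already 2D, so no reshape is needed

print("Training complete.")
``````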
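To show what `test_size=0.2` and `fit` produce, here is a sketch on made-up, perfectly linear data. The data follows the line score = 2 + 10 × hours, an assumption chosen so that the expected slope and intercept are known in advance.

``````python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data lying exactly on the line score = 2 + 10 * hours
X = np.arange(1, 11, dtype=float).reshape(-1, 1)  # 10 samples, 1 feature
y = 2 + 10 * X.ravel()

# test_size=0.2 holds out 20% of the rows (2 of 10 here) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
regressor = LinearRegression().fit(X_train, y_train)

print(len(X_train), len(X_test))              # 8 2
print(regressor.coef_, regressor.intercept_)  # learned slope (array) and intercept
``````

On real data, the learned slope and intercept only approximate the underlying trend.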

# Step 5 - Plotting the Regression Line

Now that the model is trained, it's time to visualize the best-fit regression line.

``````python
# Regression line: coef * X + intercept, evaluated at every input
line = regressor.coef_*X + regressor.intercept_

# Plotting all data points together with the fitted line
plt.scatter(X, y)
plt.plot(X, line, color='red')
plt.show()
``````
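The expression `regressor.coef_*X+regressor.intercept_` evaluates the fitted line at every input. For a single feature, this is the same computation `predict` performs. Below is a quick check on made-up data. The values are hypothetical.

``````python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data (hypothetical, for illustration only)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([12.0, 21.0, 33.0, 41.0, 52.0])
regressor = LinearRegression().fit(X, y)

# The manually computed line used for plotting has shape (5, 1) ...
line = regressor.coef_ * X + regressor.intercept_
# ... and matches predict(), which returns shape (5,), once flattened
same = np.allclose(line.ravel(), regressor.predict(X))
print(same)  # True
``````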

# Step 6 - Making Predictions

Now that we have trained our algorithm, it's time to test the model by making some predictions.

For this, we use the test-set data.

``````python
# Testing data
print(X_test)
# Model Prediction
y_pred = regressor.predict(X_test)
``````

# Step 7 - Comparing Actual Results with Predicted Results

``````python
# Comparing Actual vs Predicted
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df
``````
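A column with the difference between the two values makes the comparison easier to read. Below is a sketch using made-up `y_test`/`y_pred` values. These are hypothetical, not this notebook's results.

``````python
import numpy as np
import pandas as pd

# Hypothetical actual and predicted values (not results from this notebook)
y_test = np.array([30.0, 45.0, 80.0])
y_pred = np.array([28.5, 47.0, 78.0])

df = pd.DataFrame({"Actual": y_test, "Predicted": y_pred})
# Residual = actual - predicted; positive means the model under-predicted
df["Residual"] = df["Actual"] - df["Predicted"]
print(df)
``````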
``````python
# R^2 (coefficient of determination) on the training and test sets
print("Training Score:", regressor.score(X_train, y_train))
print("Test Score:", regressor.score(X_test, y_test))
``````
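For a scikit-learn regressor, `score` returns R². This is the same quantity computed in Step 8 with `metrics.r2_score`. Below is a sketch confirming this on made-up data.

``````python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Made-up data for illustration
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([11.0, 19.0, 32.0, 38.0, 52.0, 59.0])
regressor = LinearRegression().fit(X, y)

# score() on a regressor computes R^2 of predict(X) against y
score = regressor.score(X, y)
r2 = r2_score(y, regressor.predict(X))
print(np.isclose(score, r2))  # True
``````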

# Plotting a bar graph to depict the difference between the actual and predicted values

``````python
# Plotting the Bar graph to depict the difference between the actual and predicted values
df.plot(kind='bar', figsize=(5, 5))
plt.minorticks_on()  # minor gridlines are only drawn where minor ticks exist
plt.grid(which='major', linewidth=0.5, color='red')
plt.grid(which='minor', linewidth=0.5, color='blue')
plt.show()
``````

# Testing the model with our own data

``````python
# Testing the model with our own data
hours = 9.25
test = np.array([hours]).reshape(-1, 1)  # predict() expects 2D input: (1 sample, 1 feature)
own_pred = regressor.predict(test)
print("No of Hours = {}".format(hours))
print("Predicted Score = {}".format(own_pred[0]))  # own_pred is a 1-element array
``````
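`predict` expects a 2D input with one row per sample, which is why the single value is reshaped to shape (1, 1). The result is simply the fitted line evaluated at that point. Below is a sketch on made-up data lying exactly on score = 2 + 10 × hours, an assumption chosen so that the answer is known in advance.

``````python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data lying exactly on score = 2 + 10 * hours
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2 + 10 * X.ravel()
regressor = LinearRegression().fit(X, y)

hours = 9.25
# A single sample with a single feature: shape (1, 1)
own_pred = regressor.predict(np.array([[hours]]))
# The same value computed directly from the fitted line
manual = regressor.intercept_ + regressor.coef_[0] * hours

print(own_pred.shape)         # (1,)
print(round(own_pred[0], 2))  # 94.5 on this made-up data
``````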

# Step 8 - Evaluating the model

The final step is to evaluate the performance of the algorithm. This step is particularly important for comparing how well different algorithms perform on a particular dataset. Here, several error metrics are calculated to measure how closely the predictions match the actual scores.

``````python
from sklearn import metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('R-2:', metrics.r2_score(y_test, y_pred))
``````

``````text
Mean Absolute Error: 4.183859899002975
Mean Squared Error: 21.598769307217406
Root Mean Squared Error: 4.647447612100367
R-2: 0.9454906892105355
``````
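R² (the coefficient of determination) measures the proportion of the variance in the actual scores that the model explains, where 1.0 would be a perfect fit. Here R² = 0.9454906892105355 on the test set. This means the model explains about 94.5% of the variance in the test scores, a strong fit for such a simple model.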
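All four metrics follow directly from the residuals (actual minus predicted):

* MAE is the mean absolute residual.
* MSE is the mean squared residual.
* RMSE is the square root of MSE, so it is in the same units as the scores.
* R² is one minus the ratio of the residual sum of squares to the total sum of squares.

Below is a sketch that computes them by hand on made-up values (not the results reported above) and checks them against `sklearn.metrics`.

``````python
import numpy as np
from sklearn import metrics

# Made-up actual and predicted scores (not the results reported above)
y_test = np.array([30.0, 45.0, 80.0, 62.0])
y_pred = np.array([28.5, 47.0, 78.0, 60.0])

errors = y_test - y_pred
mae = np.mean(np.abs(errors))   # Mean Absolute Error
mse = np.mean(errors ** 2)      # Mean Squared Error
rmse = np.sqrt(mse)             # Root Mean Squared Error
ss_res = np.sum(errors ** 2)                   # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2) # total sum of squares
r2 = 1 - ss_res / ss_tot                       # R^2

# Each hand-computed value agrees with scikit-learn's implementation
assert np.isclose(mae, metrics.mean_absolute_error(y_test, y_pred))
assert np.isclose(mse, metrics.mean_squared_error(y_test, y_pred))
assert np.isclose(r2, metrics.r2_score(y_test, y_pred))
print(mae, mse, rmse, r2)
``````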

I successfully carried out the Prediction using Supervised ML task and evaluated the model's performance using several metrics (MAE, MSE, RMSE and R²).