khaula nauman

Predicting House Rent with Linear Regression in Python

Table of Contents

  1. Introduction
  2. Dataset
  3. Visualizing the Data
  4. Splitting Data: Features and Target
  5. Splitting Data: Training and Testing
  6. Implementing Linear Regression
  7. Making Predictions
  8. Wrapping Up
  9. What’s Next?

Introduction

Are you curious about predicting house rent based on factors like area? Let's walk through building a simple predictive model using Python! We’ll use the House Rent Prediction Dataset from Kaggle and tools like Google Colab, Pandas, NumPy, and Matplotlib. For machine learning, we’ll leverage Scikit-Learn.


Dataset

We’ll use the House Rent Prediction dataset from Kaggle. Download the .xlsx file and save it as Rent.xlsx where your notebook can read it (in Colab, upload it to the session).

import pandas as pd

# Load the Excel file
df = pd.read_excel("Rent.xlsx")

Explanation:

  • pd.read_excel(): Reads the Excel file and parses it into a pandas DataFrame. Reading .xlsx files requires an Excel engine such as openpyxl to be installed.
  • DataFrame (df): This variable now holds the data from the Excel file in a structured format that can be easily manipulated and analyzed.

# Preview the data
df.head()

Explanation:

  • .head(): Displays the first 5 rows of the DataFrame by default. This is a quick way to preview the column names and a sample of the actual values. To see each column’s data type, use df.dtypes or df.info().
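
If you want to try these inspection calls without downloading anything, here is a minimal sketch on a tiny made-up DataFrame. The values and the column names `area` and `rent` are assumptions for illustration, not rows from the Kaggle file.

```python
import pandas as pd

# Hypothetical stand-in for the rent data (made-up values).
sample = pd.DataFrame({
    "area": [650, 800, 1200, 1500, 2000, 2400],
    "rent": [9000, 11500, 17000, 21000, 28000, 33500],
})

preview = sample.head()        # first 5 rows only, even though there are 6
column_types = sample.dtypes   # one dtype per column
n_rows, n_cols = sample.shape  # (rows, columns) of the full DataFrame

print(preview)
print(column_types)
print(n_rows, n_cols)
```

Checking shape and dtypes alongside head() is a cheap way to catch surprises, such as a numeric column that was read in as text.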

Fun Fact 🧠

The .head() method gets its name because it displays the "head", or first few rows, of a dataset, just like a quick peek at the top of a document.


Visualizing the Data

To understand the relationship between area and rent, let’s plot a scatter plot.

import matplotlib.pyplot as plt

plt.scatter(df['area'], df['rent'])
plt.xlabel('area')
plt.ylabel('rent')
plt.show()

Explanation:

  • plt.scatter(): Creates a scatter plot with:
    • x=df['area']: Values on the x-axis (independent variable, area).
    • y=df['rent']: Values on the y-axis (dependent variable, rent).
  • plt.xlabel() and plt.ylabel(): Add labels for the x and y axes.

Observation 📊

Here’s what we observe from the plot:

The relationship between area and rent looks roughly linear, which makes simple linear regression a reasonable first model to try.

Visualization Output
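
A plot gives a visual impression. One way to put a number on "roughly linear" is the Pearson correlation coefficient. This is a small sketch on made-up values, not the real dataset; the column names `area` and `rent` are assumed to match the ones used in the plot.

```python
import pandas as pd

# Hypothetical data with a nearly linear area/rent relationship.
sample = pd.DataFrame({
    "area": [650, 800, 1200, 1500, 2000, 2400],
    "rent": [9000, 11500, 17000, 21000, 28000, 33500],
})

# Pearson correlation: values near +1 or -1 suggest a strong linear trend,
# values near 0 suggest little linear relationship.
r = sample["area"].corr(sample["rent"])
print(round(r, 3))
```

On the real data you would call the same method on df['area'] and df['rent'].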


Splitting Data: Features and Target

We’ll now separate our dataset into features (x) and target (y).

# Selecting the feature and target variables
x = df.iloc[:, 0:1]  # Feature: Area
y = df.iloc[:, -1]   # Target: Rent

y

Explanation:

  • iloc: An indexer (used with square brackets, not called like a function) that selects rows and columns by integer position.
  • [:, 0:1]: Selects all rows and the first column (0:1 includes column index 0, excludes column index 1).
  • [:, -1]: Selects all rows and the last column (-1 refers to the last column).

Pro Tip 📝

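Whether you pass a slice or a single position changes what iloc returns:

  • x = df.iloc[:, 0:1] → The slice keeps the first column (Area) as a two-dimensional DataFrame, the shape Scikit-Learn expects for features.
  • y = df.iloc[:, -1] → The single position returns the last column (Rent) as a one-dimensional Series, which works as the target.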
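
To see the difference between the two selections, here is a short sketch on a made-up DataFrame. The `bedrooms` column is a hypothetical extra, included only to show that -1 picks the last column.

```python
import pandas as pd

# Hypothetical three-column table (made-up values).
sample = pd.DataFrame({
    "area": [650, 800, 1200],
    "bedrooms": [1, 2, 3],
    "rent": [9000, 11500, 17000],
})

x = sample.iloc[:, 0:1]  # slice of columns -> DataFrame with one column
y = sample.iloc[:, -1]   # single column position -> Series

print(type(x).__name__, x.shape)
print(type(y).__name__, y.shape)
```

Here x has shape (3, 1) and y has shape (3,). Writing sample.iloc[:, 0] would also give a Series, which is why the tutorial uses the 0:1 slice for the feature.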

Splitting Data: Training and Testing

We need to split our data into training and testing sets. This helps us evaluate how well our model performs on unseen data.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)

Explanation:

  • train_test_split(): Splits arrays or matrices into random training and testing subsets.
  • Arguments:
    • test_size=0.2: Allocates 20% of the data to testing, 80% to training.
    • random_state=2: Ensures reproducibility of the split.
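
Here is a quick sketch of what those two arguments do, using ten made-up rows so the 80/20 split is easy to count. The data is hypothetical, and the `_again` variables exist only to demonstrate reproducibility.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows, so a 20% test split is 2 rows.
sample = pd.DataFrame({
    "area": list(range(500, 1500, 100)),
    "rent": list(range(7000, 17000, 1000)),
})
x = sample[["area"]]
y = sample["rent"]

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=2
)

# Repeating the call with the same random_state selects the same rows.
x_train_again, x_test_again, _, _ = train_test_split(
    x, y, test_size=0.2, random_state=2
)

print(len(x_train), len(x_test))
print(list(x_test.index) == list(x_test_again.index))
```

Without random_state, each run would shuffle differently and your results would change from run to run.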

Fun Fact 🤓

Scikit-Learn, the library we’re using, was originally a Google Summer of Code project. It has since grown into one of the most widely used tools for machine learning.


Implementing Linear Regression

Time to build and train our linear regression model! 🎉

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(x_train, y_train)

Explanation:

  • LinearRegression(): Initializes the linear regression model.
  • fit(): Trains the model using:
    • x_train: Training feature data (area).
    • y_train: Training target data (rent).
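
After fit(), the model is just a straight line: rent ≈ slope × area + intercept. The sketch below fits made-up data that follows rent = 12 × area + 1000 exactly (an assumption chosen so the answer is easy to check) and reads the learned values back from the model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical, perfectly linear data: rent = 12 * area + 1000.
area = np.array([600, 900, 1200, 1500, 1800])
sample = pd.DataFrame({"area": area, "rent": 12 * area + 1000})

model = LinearRegression().fit(sample[["area"]], sample["rent"])

slope = model.coef_[0]        # learned change in rent per unit of area
intercept = model.intercept_  # learned rent when area is 0

# A prediction is the line evaluated at the given area.
manual = slope * 1000 + intercept
predicted = model.predict(pd.DataFrame({"area": [1000]}))[0]

print(slope, intercept)
print(manual, predicted)
```

On the real dataset, lr.coef_ and lr.intercept_ give the same two numbers for the model trained above.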

Making Predictions

Let’s test our model by predicting the rent for a sample area from our test data.

# Predict rent for a specific area in the test set
lr.predict(x_test.loc[[x_test.index[2]], ['area']])

Explanation:

  • predict(): Takes a two-dimensional set of feature rows and returns one predicted target value (here, rent) per row.
  • x_test.loc[]: Retrieves specific rows and columns using label-based indexing.
  • [x_test.index[2]]: Looks up the index label of the third row in the test set. Wrapping it in a list makes loc return a one-row DataFrame rather than a Series.
  • ['area']: Ensures only the 'area' column is used as the feature for prediction.

Output 🏡

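Our model predicts a rent of ₹21,112, which is within ₹388 (about 1.8%) of the actual rent of ₹21,500! 🎯 Keep in mind that this is a single example, so it says little about accuracy across the whole test set.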

Prediction Output
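
A single prediction is only a spot check. To judge the model on the whole test set, you could compute metrics such as mean absolute error (MAE) and R². The sketch below uses synthetic, randomly generated data, not the Kaggle file, so the slope, noise level, and column names are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: a linear trend plus random noise (seeded for repeatability).
rng = np.random.default_rng(0)
area = rng.uniform(500, 2500, size=200)
rent = 12 * area + 1000 + rng.normal(0, 1500, size=200)
sample = pd.DataFrame({"area": area, "rent": rent})

x_train, x_test, y_train, y_test = train_test_split(
    sample[["area"]], sample["rent"], test_size=0.2, random_state=2
)

model = LinearRegression().fit(x_train, y_train)
y_pred = model.predict(x_test)  # one prediction per test row

mae = mean_absolute_error(y_test, y_pred)  # average absolute error, in rent units
r2 = r2_score(y_test, y_pred)              # share of variance explained (1.0 is perfect)

print(round(mae, 2), round(r2, 3))
```

With the tutorial's variables, the equivalent calls would be lr.predict(x_test) followed by the same two metric functions on y_test.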


Wrapping Up

In this tutorial, we:

  1. Explored the House Rent Prediction Dataset.
  2. Visualized the relationship between area and rent.
  3. Built a Linear Regression Model using Scikit-Learn.
  4. Made a prediction and compared it with the actual rent.

What’s Next?

Learn multivariate linear regression!
Happy coding 🌠

