juved

Linear Transformation

Why use linear transformations when they appear to have no significant impact on the model's performance metrics?
The secret lies in their ability to make the results more relatable and understandable for stakeholders. They express the same model in more familiar terms, which makes the interpretation of the results more compelling and gives it a human touch.

Scenario: Predicting Energy Consumption

Imagine that we have to predict energy consumption in buildings based on several features, including temperature, square footage, and the number of rooms. Let’s build a linear regression model to understand how these features influence energy consumption, and then modify the model using linear transformation techniques (scaling, shifting, and standardizing).

Data Understanding

We will apply the different linear transformation techniques to the energy_consumption_data_set.

We start by importing the necessary libraries: pandas, statsmodels, and sklearn. We then use pandas to load our energy consumption dataset.

image 1

image 2

  • Let's build our initial model, selecting the features temperature, square_footage, and room_count, and the target energy_consumption.
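The original code is only available as images, so here is a minimal sketch of what fitting such an initial model could look like. The dataset below is synthetic (the real energy_consumption_data_set is not shown), the generating coefficients 20, 100, and 49 are chosen only to echo the interpretation that follows, and NumPy's least-squares solver stands in for statsmodels to keep the example dependency-light.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the article's dataset (assumption: the real file is not shown).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "temperature": rng.uniform(30, 100, n),        # degrees Fahrenheit
    "square_footage": rng.uniform(500, 5000, n),
    "room_count": rng.integers(1, 10, n),
})
# Hypothetical relationship plus noise, used only for illustration.
df["energy_consumption"] = (
    20 * df["temperature"]
    + 100 * df["square_footage"]
    + 49 * df["room_count"]
    + rng.normal(0, 50, n)
)

features = ["temperature", "square_footage", "room_count"]
X = df[features].to_numpy(dtype=float)
y = df["energy_consumption"].to_numpy(dtype=float)

# Ordinary least squares: prepend a column of ones for the intercept.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

intercept = coef[0]
slopes = dict(zip(features, coef[1:]))
print("intercept:", intercept)
print("slopes:", slopes)
```

Each slope is read as the expected change in energy_consumption for a one-unit increase in that feature, with the other features held constant.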

image 3

image 4

*Interpretation:*
Holding the other features constant, for every increase of 1 degree Fahrenheit, energy consumption increases by about 20 units. For every additional square foot, it increases by about 100 units. For every additional room, it increases by about 49 units.
The results look reasonable. However, the stakeholders are more familiar with Celsius, so let’s convert the temperature to Celsius.

Scaling:

Scaling gives our variables a tailored makeover: it makes the model more relatable without changing its overall metrics. Think of it as the same model expressed in different units, which makes it easier to tell a story that resonates with stakeholders.

-- We first make a copy of the subset, and then apply the conversion formula C = (F - 32) × 5/9 to convert the temperature from Fahrenheit to Celsius.
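As a rough sketch of this step, the snippet below converts a Fahrenheit column on a copy of the feature table and fits the model both ways. The data is synthetic and the helper `fit_ols` is a hypothetical name introduced here; it uses NumPy least squares rather than the statsmodels call from the original images.

```python
import numpy as np
import pandas as pd

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return coef, 1 - (resid @ resid) / (centred @ centred)

# Synthetic data (assumption: stands in for the article's dataset).
rng = np.random.default_rng(1)
n = 200
X = pd.DataFrame({
    "temperature": rng.uniform(30, 100, n),        # degrees Fahrenheit
    "square_footage": rng.uniform(500, 5000, n),
})
y = 20 * X["temperature"] + 100 * X["square_footage"] + rng.normal(0, 50, n)

# Work on a copy so the Fahrenheit version stays intact.
X_celsius = X.copy()
X_celsius["temperature"] = (X_celsius["temperature"] - 32) * 5 / 9

coef_f, r2_f = fit_ols(X, y)
coef_c, r2_c = fit_ols(X_celsius, y)

print("slope per degree F:", coef_f[1])
print("slope per degree C:", coef_c[1])   # 1.8 times the Fahrenheit slope
print("R^2 (F, C):", r2_f, r2_c)          # identical up to floating-point error
```

The temperature slope changes by a factor of 1.8 and the intercept shifts, while the other slope and the R² stay the same, which is why the conversion is safe from a model-quality point of view.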

image 4

Cmodels
-- Let's build the Celsius model using the converted feature

image 5

C models output

*Interpretation:*
For every increase of 1 degree Celsius, energy consumption increases by about 37 units. This is the Fahrenheit coefficient multiplied by 1.8 (up to rounding), because a change of 1 degree Celsius equals a change of 1.8 degrees Fahrenheit.
For every additional square foot, energy consumption increases by about 100 units.
For every additional room, energy consumption increases by about 49 units.

Shifting:

In linear regression, shifting is a common way to improve interpretability and give the intercept a meaningful interpretation. It can also reduce multicollinearity when the model includes interaction or polynomial terms. Shifting typically means centring (mean-centring) a variable: subtracting the variable's mean from each of its data points. It should be done before building the model.

Below, we shift our features so that 0 represents the mean.

S models

-- Let’s build our centred model on X_centred
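A minimal sketch of the centring step and the centred fit follows, again on synthetic data. `X_centred` mirrors the name used above, while `fit_ols` is a hypothetical helper built on NumPy least squares, not the article's original code.

```python
import numpy as np
import pandas as pd

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (intercept, slopes)."""
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1:]

# Synthetic data (assumption: stands in for the article's dataset).
rng = np.random.default_rng(2)
n = 200
X = pd.DataFrame({
    "temperature": rng.uniform(30, 100, n),
    "square_footage": rng.uniform(500, 5000, n),
    "room_count": rng.integers(1, 10, n),
})
y = (20 * X["temperature"] + 100 * X["square_footage"]
     + 49 * X["room_count"] + rng.normal(0, 50, n))

# Subtract each column's mean so that 0 means "average" for every feature.
X_centred = X - X.mean()

intercept_raw, slopes_raw = fit_ols(X, y)
intercept_centred, slopes_centred = fit_ols(X_centred, y)

print("raw intercept:", intercept_raw)
print("centred intercept:", intercept_centred)   # equals the mean of y
print("mean of target:", y.mean())
```

After centring, the slopes are unchanged and the intercept becomes the predicted value for a building with average values on every feature, which for ordinary least squares equals the mean of the target.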

S output

*Interpretation:*

We would expect about 205157 energy consumption units for a building with average temperature, average square_footage, and average room_count.

Standardizing:

Standardizing offers the benefit of making the coefficients directly comparable to each other, because each feature is expressed in units of its own standard deviation.

-- Let's run .describe() on the dataset to inspect the values before standardization.

before std

-- Let’s standardize the features


-- Let’s build our standardized model
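The sketch below shows one way to standardize the features and refit, using synthetic data. It standardizes by hand with pandas, using the population standard deviation (`ddof=0`), which is the convention scikit-learn's `StandardScaler` follows. `fit_ols` is a hypothetical helper, not the article's original code.

```python
import numpy as np
import pandas as pd

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (intercept, slopes)."""
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1:]

# Synthetic data (assumption: stands in for the article's dataset).
rng = np.random.default_rng(3)
n = 200
X = pd.DataFrame({
    "temperature": rng.uniform(30, 100, n),
    "square_footage": rng.uniform(500, 5000, n),
    "room_count": rng.integers(1, 10, n),
})
y = (20 * X["temperature"] + 100 * X["square_footage"]
     + 49 * X["room_count"] + rng.normal(0, 50, n))

# Standardize: subtract the mean, divide by the population standard deviation.
stds = X.std(ddof=0)
X_std = (X - X.mean()) / stds

_, slopes_raw = fit_ols(X, y)
intercept_std, slopes_std = fit_ols(X_std, y)

# Each standardized slope is the raw slope times that feature's standard deviation.
for name, raw, std_slope in zip(X.columns, slopes_raw, slopes_std):
    print(f"{name}: raw={raw:.3f}, per 1 SD={std_slope:.3f}")
```

Because every standardized slope is "change in the target per one standard deviation of the feature", the slopes can be ranked against each other, which is not meaningful for raw slopes measured in different units.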


*Interpretation:*
For each increase of 1 standard deviation in temperature, we see an associated increase of about 21 energy_consumption units.
For each increase of 1 standard deviation in square_footage, we see an associated increase of about 100 energy_consumption units.
For each increase of 1 standard deviation in room_count, we see an associated increase of about 49 energy_consumption units.

Conclusion:

These transformations change how the features are expressed without compromising the linear regression itself, as illustrated by the energy consumption example. Other feature scaling techniques, such as Min-Max Scaling and Unit Vector Transformation, along with many other tools provided by Scikit-learn, can also be used.
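As one illustration of another technique, here is a hedged sketch of Min-Max Scaling on synthetic data, written by hand with pandas (scikit-learn offers `MinMaxScaler` for the same purpose). `fit_r2` is a hypothetical helper introduced only for this example.

```python
import numpy as np
import pandas as pd

def fit_r2(X, y):
    """Least-squares fit with an intercept; returns the R^2 of the fit."""
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1 - (resid @ resid) / (centred @ centred)

# Synthetic data (assumption: stands in for the article's dataset).
rng = np.random.default_rng(4)
n = 200
X = pd.DataFrame({
    "temperature": rng.uniform(30, 100, n),
    "square_footage": rng.uniform(500, 5000, n),
})
y = 20 * X["temperature"] + 100 * X["square_footage"] + rng.normal(0, 50, n)

# Min-max scaling: map each feature linearly onto the range [0, 1].
X_minmax = (X - X.min()) / (X.max() - X.min())

r2_raw = fit_r2(X, y)
r2_minmax = fit_r2(X_minmax, y)
print("R^2 raw:", r2_raw)
print("R^2 min-max:", r2_minmax)   # same fit quality, different units
```

With min-max scaled features, each slope reads as the expected change in the target when moving from the smallest to the largest observed value of that feature.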

More:
https://sebastianraschka.com/Articles/2014_about_feature_scaling.html
https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing
