Saoni Deb

Posted on May 17, 2022

Linear Regression With Time Series

#100daysofcode #100daysofml #python #machinelearning

Documentation: https://www.kaggle.com/code/ryanholbrook/linear-regression-with-time-series

This post shows how to apply modern machine learning methods to time series data, with the goal of producing accurate predictions.

After working through these steps, we'll know how to:

engineer features to model the major time series components (trends, seasons, and cycles),
visualize time series with many kinds of time series plots,
create forecasting hybrids that combine the strengths of complementary models, and
adapt machine learning methods to a variety of forecasting tasks.

What is a Time Series?
The basic object of forecasting is the time series, which is a set of observations recorded over time. In forecasting applications, the observations are typically recorded with a regular frequency, like daily or monthly.

The example series in this post, Hardcover Sales, records the number of hardcover book sales at a retail store over 30 days. It has a single column of observations, Hardcover, with a time index, Date.

Linear Regression with Time Series
Linear regression adapts naturally to even complex forecasting tasks. The linear regression algorithm learns how to make a weighted sum from its input features.
For two features, we would have:

target = weight_1 * feature_1 + weight_2 * feature_2 + bias

During training, the regression algorithm learns values for the parameters weight_1, weight_2, and bias that best fit the target. (This algorithm is often called ordinary least squares since it chooses the values that minimize the sum of squared errors between the target and the predictions.) The weights are also called regression coefficients, and the bias is also called the intercept because it tells you where the graph of this function crosses the y-axis.
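As a minimal sketch of the weighted-sum model above, the following fits two weights and a bias by ordinary least squares using NumPy. The data is synthetic and made up for illustration: the target is built from known weights (2.0, -3.0) and a known bias (5.0), so the fit can be checked against them. The variable names simply mirror the formula and are not from the original notebook.

```python
import numpy as np

# Synthetic data (made up for illustration): the target is built from known
# weights and a known bias, with no noise, so least squares can recover them.
rng = np.random.default_rng(0)
feature_1 = rng.normal(size=50)
feature_2 = rng.normal(size=50)
target = 2.0 * feature_1 - 3.0 * feature_2 + 5.0

# Design matrix: one column per feature plus a column of ones for the bias.
X = np.column_stack([feature_1, feature_2, np.ones_like(feature_1)])

# lstsq returns the coefficients that minimize the sum of squared errors.
coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
weight_1, weight_2, bias = coefs

print(weight_1, weight_2, bias)
```

With noisy real data the learned values would only approximate the underlying relationship; the noise-free setup here is just to make the mechanics visible.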

Time-step features
There are two kinds of features unique to time series: time-step features and lag features.

Time-step features are features we can derive directly from the time index. The most basic time-step feature is the time dummy, which counts off time steps in the series from beginning to end.

Linear regression with the time dummy produces the model:

target = weight * time + bias

The time dummy then lets us fit curves to time series in a time plot, where Time forms the x-axis.

Time-step features let you model time dependence.
A series is time dependent if its values can be predicted from the time they occurred.
In the Hardcover Sales series, we can predict that sales later in the month are generally higher than sales earlier in the month.
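The time dummy described above can be sketched with pandas and NumPy. The series here is synthetic (an upward trend plus noise, invented for illustration) and is not the real Hardcover Sales data; the dates and the column names Time and Trend are likewise assumptions chosen to match the text.

```python
import numpy as np
import pandas as pd

# Synthetic 30-day series (made up for illustration, not the real data):
# a rising trend of 1.5 units per day plus random noise.
rng = np.random.default_rng(0)
dates = pd.date_range("2000-04-01", periods=30, freq="D", name="Date")
sales = 100 + 1.5 * np.arange(30) + rng.normal(scale=5, size=30)
df = pd.DataFrame({"Hardcover": sales}, index=dates)

# Time dummy: counts off the time steps 0, 1, 2, ... from beginning to end.
df["Time"] = np.arange(len(df))

# Fit target = weight * time + bias by least squares (a degree-1 polynomial).
weight, bias = np.polyfit(df["Time"].to_numpy(), df["Hardcover"].to_numpy(), deg=1)

# The fitted line evaluated at each time step, ready to draw on a time plot.
df["Trend"] = weight * df["Time"] + bias
print(df.head())
```

A positive fitted weight corresponds to the statement that sales later in the period tend to be higher than sales earlier in the period.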

Lag features
To make a lag feature, we shift the observations of the target series so that they appear to have occurred later in time. Here we've created a 1-step lag feature, though shifting by multiple steps is possible too.

Linear regression with a lag feature produces the model:

target = weight * lag + bias

So lag features let us fit curves to lag plots where each observation in a series is plotted against the previous observation.

You can see from the lag plot that sales on one day (Hardcover) are correlated with sales from the previous day (Lag_1). When you see a relationship like this, you know a lag feature will be useful.

More generally, lag features let you model serial dependence.
A time series has serial dependence when an observation can be predicted from previous observations. In Hardcover Sales, we can predict that high sales on one day usually mean high sales the next day.
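A 1-step lag feature like the one described above can be sketched with the pandas `shift` method. The values below are made up for illustration and are not the real Hardcover Sales data; the column name Lag_1 follows the text.

```python
import numpy as np
import pandas as pd

# Made-up observations for illustration (not the real Hardcover Sales data).
df = pd.DataFrame({"Hardcover": [120.0, 135.0, 128.0, 150.0, 160.0, 155.0, 170.0, 182.0]})

# Shift the target down one step so each row also holds the previous value.
df["Lag_1"] = df["Hardcover"].shift(1)

# The first row has no previous observation, so its lag is NaN; drop it
# before fitting.
train = df.dropna()

# Fit target = weight * lag + bias by least squares.
weight, bias = np.polyfit(train["Lag_1"].to_numpy(), train["Hardcover"].to_numpy(), deg=1)

print(df)
print(weight, bias)
```

Shifting by more steps, for example `shift(2)`, would give a 2-step lag in the same way, at the cost of one more row with a missing value at the start.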

Adapting machine learning algorithms to time series problems is largely about feature engineering with the time index and lags. For most of the course, we use linear regression for its simplicity, but these features will be useful whichever algorithm you choose for your forecasting task.

Tunnel Traffic is a time series describing the number of vehicles traveling through the Baregg Tunnel in Switzerland each day from November 2003 to November 2005. In this example, we'll get some practice applying linear regression to time-step features and lag features.

Notebook 1: https://www.kaggle.com/code/saonideb/linear-regression-with-time-series-17-05-22/

Top comments (1)

Dev Mehta • Jan 24 '23

Great post Saoni. If you want to also understand Gradient Descent and how linear regression works behind the scenes with visual learning of the topic, I would recommend checking out this blog post. Also, learning about basic linear algebra and calculus would help new developers getting into this field :)
