You might be wondering: what are time series models? They are models built on time series data. There are a variety of time series models to choose from, ranging from simple models like Autoregressive (AR) models to deep learning models like Recurrent Neural Networks (RNN).
However, time series modelling does not begin with the model itself; several steps lead up to it. This whole process is called Time Series Analysis.
And what are the possible benefits of conducting time series analysis?
- You can identify trends and seasonality in data that may not otherwise be obvious. As a business, you can draw valuable insights from these that support decision making, for example in marketing.
- Forecasting future events. Through analysis and modelling you can predict future values, for example predicting a nation's GDP from its previous GDP values.
- It can help identify relationships between variables over time, for example the positive relationship between labor, gross enrollment ratio and development in a country like Kenya.
- By viewing the trends and distributions of variables over time, it is easier to spot anomalies in an organization. This is relevant in fraud detection or in detecting the failure of machines or amenities.
In this article, I will take you through the steps involved in time series analysis.
1. Where to source data?
When it comes to time series analysis, the data has to be indexed by time. This means that observations are recorded in temporal order at a regular interval, whether every minute, hour, day and so on. Such data is commonly referred to as a time series.
An important point to note is that the data used for time series analysis should have no missing values, so any gaps need to be filled or otherwise handled before the analysis. This ensures that the trends observed in the data are as accurate as possible.
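A missing timestamp in a time-indexed series does not always show up as an empty value; it can simply be an absent row. A minimal pandas sketch of one way to surface and fill such gaps is shown below. The helper name `fill_gaps`, the daily frequency and the sales figures are made up for illustration, and linear interpolation is only one possible choice (it assumes the gaps are short).

```python
import numpy as np
import pandas as pd


def fill_gaps(series: pd.Series, freq: str = "D") -> pd.Series:
    """Hypothetical helper: expose missing timestamps and fill them by interpolation."""
    # asfreq reindexes to a regular grid, inserting NaN for any absent timestamp.
    regular = series.asfreq(freq)
    n_missing = int(regular.isna().sum())
    print(f"{n_missing} missing value(s) found")
    # Linear interpolation between neighbouring observations.
    return regular.interpolate(method="linear")


# Made-up daily sales: 2024-01-03 is absent and 2024-01-05 is NaN.
dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05", "2024-01-06"])
sales = pd.Series([10.0, 12.0, 16.0, np.nan, 20.0], index=dates)
filled = fill_gaps(sales)
print(filled.tolist())  # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```

For longer gaps, dropping the affected period or using a model-based imputation may be more appropriate than straight-line filling.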
2. What should be the nature of the data?
Data used in time series analysis should be stationary. And what does it mean for data to be stationary? A time series is said to be stationary when its statistical properties, such as its mean, variance and autocorrelation, do not change over time. In practice, this means the data has no:
- Trend
- Drift
- Seasonality
- Cyclicality
NB: As much as the data needs some 'predictability' removed, extreme randomness is still not suitable. A series that behaves like a pure random walk, for example, is of little use for modelling: once differenced it is just unpredictable noise, so there is nothing left for a model to learn. After all, the whole purpose of a time series model is to predict values, right? :>)
Because stationarity is such an important consideration, it is worth diagnosing it early. Luckily, there are tests to ease this decision. One of the most popular is the Augmented Dickey-Fuller (ADF) test. Its null hypothesis is that the time series has a unit root (is non-stationary), against the alternative that the series is stationary. A small p-value (commonly below 0.05) therefore leads us to reject the null hypothesis and treat the series as stationary.
If the data is found to be stationary, you can proceed with your analysis comfortably. If it is not, there is no need to worry: there are ways to transform the time series into a stationary one.
1. Differencing the data points.
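This is done by taking the difference between each data point and the one n steps (lags) before it. For example:
a 1st lag difference would be x(t) - x(t-1)
a 2nd lag difference would be x(t) - x(t-2)
and, in general, an nth lag difference would be x(t) - x(t-n)
A lag-n difference (for example lag 12 for monthly data with a yearly pattern) is not the same as differencing n times. Differencing twice means applying the 1st lag difference to an already differenced series, which gives x(t) - 2x(t-1) + x(t-2).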
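Both kinds of differencing map onto pandas' `Series.diff`. The sketch below uses two hypothetical helpers, `lag_difference` and `difference_d_times`, on a toy series of squares to show how a lag-n difference differs from differencing repeatedly.

```python
import pandas as pd


def lag_difference(series: pd.Series, n: int = 1) -> pd.Series:
    """nth lag difference x(t) - x(t-n); the first n points have no partner and are dropped."""
    return series.diff(periods=n).dropna()


def difference_d_times(series: pd.Series, d: int) -> pd.Series:
    """Apply the 1st lag difference d times (the order of differencing)."""
    for _ in range(d):
        series = series.diff().dropna()
    return series


x = pd.Series([1, 4, 9, 16, 25])          # squares: a quadratic trend
print(lag_difference(x, 1).tolist())      # [3.0, 5.0, 7.0, 9.0]
print(lag_difference(x, 2).tolist())      # [8.0, 12.0, 16.0]
print(difference_d_times(x, 2).tolist())  # [2.0, 2.0, 2.0]
```

The quadratic trend in the squares survives a single lag difference of any size, but differencing twice leaves a constant series.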
2. Log-transforming
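This is done by log-transforming every data point in the time series, from x to ln(x). It only works for strictly positive values and mainly stabilises a variance that grows with the level of the series, so it is often followed by differencing to remove the trend.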
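A minimal sketch of the log transform with NumPy and pandas, followed by a 1st lag difference of the logs. The helper names and the made-up price series, which grows by 10% at every step, are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def log_transform(series: pd.Series) -> pd.Series:
    """Map x to ln(x); only defined for strictly positive values."""
    if (series <= 0).any():
        raise ValueError("log transform needs strictly positive values")
    return np.log(series)


def log_difference(series: pd.Series) -> pd.Series:
    """Log transform, then take the 1st lag difference: ln(x(t)) - ln(x(t-1))."""
    return log_transform(series).diff().dropna()


prices = pd.Series([100.0, 110.0, 121.0, 133.1])  # made-up: +10% per step
print(log_difference(prices).round(4).tolist())    # [0.0953, 0.0953, 0.0953]
```

Exponential growth becomes a straight line after the log, and the difference of the logs is then constant (ln 1.1, about 0.0953).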
3. Seasonal decomposition
This involves splitting the series into trend, seasonal and residual components. The trend (and, where present, the seasonal component) can then be removed from the data, leaving a residual that is closer to stationary.
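After applying one or a combination of these methods (for example, a log transform followed by differencing), re-run the ADF test to confirm that your time series is now stationary and ready for modelling.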
3. Identifying the autoregressive (p) and moving average (q) lags for the models.
As discussed earlier, there are a variety of time series models to choose from. Moving average (MA) and autoregressive (AR) models can be considered foundational in time series analysis. When building them, we look for the best autoregressive and moving average lags to use, with the help of the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The 'p' in AR models is read from the PACF, which for an AR(p) process cuts off after lag p, while the 'q' in MA models is read from the ACF, which for an MA(q) process cuts off after lag q.
When an ARIMA model is used, an 'i' parameter (usually written as d) is introduced alongside 'p' and 'q'. The 'i' stands for integration and is the number of times the data was differenced to make it stationary: 1 if the 1st lag difference was applied once, 2 if it was applied twice (differencing the already differenced series), and so on.
4. Building the model
Now that all necessary parameters are ready, all that is left is to choose the model that suits your problem. The basic models are AR and MA models. If the series has both autoregressive and moving average components, you can use an ARMA model, which combines the two. ARIMA models also combine AR and MA terms but add the integrated parameter. The beauty of this is that the data passed to an ARIMA model can be non-stationary, because the model applies the differencing itself.
**NB:** The non-stationarity in data for ARIMA models should only be due to a unit root, which differencing removes. The model cannot deal with other causes of non-stationarity, such as seasonality or a variance that changes over time.
I hope this has given you an idea of what time series modelling is and how to approach it.