This example is meant to show how we can apply AI to simple time series data; it is not a comprehensive tutorial on finding the best model for a time series. Also, I have used ReactJS to build this article, but I am not a React expert.
First, what is time series data? It is any data with a time dimension; for example, data collected at particular intervals of time forms a time series.
The data can also be collected at irregular intervals and later binned into regular intervals in a preprocessing step, but that is beyond the scope of this article. We assume that the data is already binned and available at a regular time cadence.
We are going to consider a univariate time series, i.e. there is no other variable in the model, only the dependent variable (the one we want to forecast). So a natural question is: how can our model predict without independent (input) variables or features? We are going to do a little feature engineering on our data.
But before that, we first need to build an application where the user can select the time variable and the variable they want to forecast (the dependent variable). A simple UI wizard will drive this process.
To prepare the data and hold it in a tabular form similar to pandas in Python, I am using the npm package dataframe.js. It lets us manipulate data in columns and rows, run queries, and load data easily.
Once the time and dependent variables are selected, we need to do some feature engineering to generate independent variables. The question is how? We are going to use a simple technique called lag. The idea is that we assume the current point is correlated with previous time points; this relationship is called autocorrelation. For example, we might say that today's stock price is correlated with the stock prices of the previous 6 days. The value 6 here is the number of lags. We obviously do not know the right value in advance, so it is a hyperparameter of our model, meaning that by varying it we can see how the model's performance changes. Once we get this value from the user, we split the time series into sequences of that many lags. For example, assuming 3 lags, we effectively generate a table of 4 columns (the 3 lagged values plus the value to forecast); please check the image below. The left-hand side shows the actual data and the right-hand side shows the split sequences.
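The lag split described above can be sketched in plain JavaScript. This is only an illustration under my own assumptions: the helper name `makeLagRows` and the `inputs`/`target` field names are hypothetical and not taken from the project code.

```javascript
// Hypothetical helper (not from the article's code): turn a series into
// rows of `lags` input values plus the next value as the target.
function makeLagRows(series, lags) {
  if (!Number.isInteger(lags) || lags < 1) {
    throw new RangeError('lags must be a positive integer');
  }
  const rows = [];
  // Each row needs `lags` inputs and one target, so stop `lags` short of the end.
  for (let i = 0; i + lags < series.length; i++) {
    rows.push({
      inputs: series.slice(i, i + lags), // the previous `lags` values
      target: series[i + lags],          // the value to forecast
    });
  }
  return rows;
}

const exampleRows = makeLagRows([10, 20, 30, 40, 50], 3);
console.log(exampleRows);
// [ { inputs: [ 10, 20, 30 ], target: 40 },
//   { inputs: [ 20, 30, 40 ], target: 50 } ]
```

With 3 lags, a series of 5 points yields only 2 rows, so a larger lag value leaves fewer training samples. That is one trade-off to keep in mind when tuning this hyperparameter.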
The model we are going to train has two layers. The first layer is an LSTM (Long Short-Term Memory) layer with 50 units. The number of units could also be a hyperparameter, but to keep things simple it is hardcoded. If you do not know what an LSTM is, do not worry too much. It is a more complex form of RNN (Recurrent Neural Network) used to model sequential data such as time series or language data.
The RNN structure looks like this:
Image Courtesy: fdeloche
An LSTM, by comparison, looks like this:
Image Courtesy: Guillaume Chevalier
Don't worry too much about the internals; simply understand that it is a two-layer model whose first layer is an LSTM with 50 units and ReLU activation.
The second layer is a simple dense layer with one unit. Since our model outputs a number, it is a regression model, and its loss function is mean squared error.
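To make the loss concrete, here is a small plain JavaScript sketch of mean squared error: the average of the squared differences between the actual and the predicted values. The function name is my own choice for illustration; in practice the training library computes this for us.

```javascript
// Mean squared error between two equal-length arrays of numbers.
function meanSquaredError(actual, predicted) {
  if (actual.length !== predicted.length || actual.length === 0) {
    throw new RangeError('arrays must be non-empty and of equal length');
  }
  let sum = 0;
  for (let i = 0; i < actual.length; i++) {
    const diff = actual[i] - predicted[i];
    sum += diff * diff; // squaring penalises large misses more heavily
  }
  return sum / actual.length;
}

// Squared errors are 1, 0 and 4, so the result is 5 / 3.
console.log(meanSquaredError([3, 5, 7], [2, 5, 9]));
```

A lower value means the predictions sit closer to the actual series, which is what training tries to achieve.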
Here is what the JS code for model building looks like.
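A minimal sketch of such a two-layer model follows, assuming the library is TensorFlow.js and that it is passed in as `tf`. The function name `buildModel`, the `[lags, 1]` input shape and the `adam` optimizer are my assumptions for illustration; only the 50 LSTM units, the ReLU activation, the single dense unit and the mean squared error loss come from the description above.

```javascript
// Sketch only: `tf` is assumed to be TensorFlow.js, for example
//   const tf = require('@tensorflow/tfjs');
function buildModel(tf, lags, units = 50) {
  const model = tf.sequential();
  // LSTM layer: each sample is `lags` time steps with one value per step.
  model.add(tf.layers.lstm({ units, activation: 'relu', inputShape: [lags, 1] }));
  // Dense layer with a single unit: one forecast value per sample.
  model.add(tf.layers.dense({ units: 1 }));
  // 'adam' is an assumed optimizer; the loss is mean squared error.
  model.compile({ optimizer: 'adam', loss: 'meanSquaredError' });
  return model;
}
```

Passing `tf` in as an argument keeps the sketch independent of how the library is loaded (script tag, bundler or Node).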
The final step is predicting with the model and comparing the predictions with the actual series to see how well the model performs.
The prediction code is very simple: we just take the original series and run it through the model with the predict function.
Of course, at every step we need to convert the values to tensors.
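The prediction step can be sketched as follows, again assuming TensorFlow.js is passed in as `tf` and that `model` is an already trained model. The helper name `predictWindows` and the shape of `windows` (an array of lag sequences, each an array of numbers) are assumptions for illustration.

```javascript
// Sketch only: `tf` is assumed to be TensorFlow.js and `model` a trained model.
// `windows` is an array of lag sequences, e.g. [[10, 20, 30], [20, 30, 40]].
function predictWindows(tf, model, windows) {
  // Reshape [samples][lags] into [samples][lags][1] and convert to a tensor.
  const x = tf.tensor3d(windows.map((w) => w.map((v) => [v])));
  const y = model.predict(x);
  // Copy the predictions back into a plain JavaScript array.
  const values = Array.from(y.dataSync());
  // Free the tensor memory once the values have been read.
  x.dispose();
  y.dispose();
  return values;
}
```

The returned array holds one predicted value per window, so it can be lined up against the actual values that follow each window and plotted or scored.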
If this article has sparked your curiosity, feel free to check out the entire code.
You can also fork it on GitHub and dig deeper into the code.