Rafael Rocha

Time Series Forecasting through Extreme Learning Machine:

Extreme Learning Machine

The most common artificial neural network architecture is the feedforward neural network. The information in this network propagates (flows) in one direction, from the input layer to the output layer.

Extreme Learning Machines (ELMs) are feedforward neural networks that can be used, for example, for regression and classification. The weights between the input layer and the hidden layer are assigned randomly, while the weights between the hidden layer and the output layer are computed (learned) in a single step. This second set of weights is computed with the Moore-Penrose inverse of the hidden layer output matrix.

Feedforward Neural Network

The figure below shows a feedforward neural network and illustrates the elements of an ELM.

Feedforward neural network

The input layer is composed of the input matrix X of size M x N plus a bias of size M x 1, where M is the number of examples and N (equal to 3 in the image) is the number of features. Next, the weights W1, of size L x (N+1), are assigned randomly, where L is the number of neurons in the hidden layer.

The hidden layer output is computed by the following equation:

H = tanh(X_{a}W_{1}^{T})

Where Xa is the concatenation of the bias and the input matrix X, and tanh is the hyperbolic tangent activation function, which limits the output of each neuron to the range (-1, 1).

The weights W2 are obtained by multiplying the Moore-Penrose inverse of Ha by the target y, as shown in the equation below:

W_{2} = H_{a}^{+}y

Where Ha is the hidden layer output matrix H with the bias added. Thus, we can make predictions with the following equation:

y_{pred} = H_{a}W_{2}^{T}

All steps to obtain the parameters W1 and W2 are performed on the training data, and with these parameters in hand, we can make predictions on data that was not part of the training process, in this case the test data. The function below shows the one-step learning of ELM, returning the predictions and the parameters W1 and W2:

import numpy as np

def elm_train(X, y, L, w1=None):

  M = np.size(X, axis=0) # Number of examples
  N = np.size(X, axis=1) # Number of features

  # If w1 is not provided, initialize it randomly
  if w1 is None:
    w1 = np.random.uniform(low=-1, high=1, size=(L, N+1)) # Weights with bias, values in [-1, 1]

  bias = np.ones(M).reshape(-1, 1) # Bias definition
  Xa = np.concatenate((bias, X), axis=1) # Input with bias

  S = Xa.dot(w1.T) # Weighted sum of hidden layer
  H = np.tanh(S) # Activation function f(x) = tanh(x), dimension M X L

  bias = np.ones(M).reshape(-1, 1) # Bias definition
  Ha = np.concatenate((bias, H), axis=1) # Activation function with bias

  w2 = (np.linalg.pinv(Ha).dot(y)).T # w2' = pinv(Ha)*y

  y_pred = Ha.dot(w2.T) # Predictions

  return y_pred, w1, w2
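
To make predictions on data outside the training set, the same forward pass is repeated with the already-fixed weights W1 and W2. The helper below is a minimal sketch of this step; the name elm_predict is illustrative and not part of the original code.

def elm_predict(X, w1, w2):

  M = np.size(X, axis=0) # Number of examples

  bias = np.ones(M).reshape(-1, 1) # Bias definition
  Xa = np.concatenate((bias, X), axis=1) # Input with bias

  H = np.tanh(Xa.dot(w1.T)) # Hidden layer output with the fixed random weights w1

  bias = np.ones(M).reshape(-1, 1) # Bias definition
  Ha = np.concatenate((bias, H), axis=1) # Hidden layer output with bias

  y_pred = Ha.dot(w2.T) # Predictions with the learned weights w2

  return y_pred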

Time Series Forecasting

Initially, it is necessary to transform the time series forecasting problem into a machine learning problem. To do that, we arrange the temporal data into input and target variables, making it suitable for any linear regression method, including ELM.

To do this, we use the concept of lag, which refers to past values of the time series. For our problem, the lag sets the number of features in the input matrix: if we use a lag of 3, the input matrix will have size M x 3, and the three past values of the series are used as input. The target variable is the next value of the series, which can come immediately after the lag window (one step forward) or more steps forward.

To exemplify, consider the example below:

# Time series
series = [3.93, 4.58, 4.8, 5.07, 5.14, 4.94]

# Three lags and one-step forward
X = [[3.93, 4.58, 4.8],
     [4.58, 4.8, 5.07],
     [4.8, 5.07, 5.14]]
y = [[5.07],
     [5.14],
     [4.94]]

# Two lags and two-steps forward
X = [[3.93, 4.58],
     [4.58, 4.8],
     [4.8, 5.07]]
y = [[5.07],
     [5.14],
     [4.94]]

In this example, series is the time series, and X and y are the input matrix and the target, respectively. The first case uses three lags and one step forward: to predict the value at time t, the values at t-1, t-2, and t-3 are needed. The second case uses two lags and two steps forward: the values at t-3 and t-2 are used to predict the value at time t.

The function below transforms the time series into a machine learning problem:

def sliding_window(serie, lag=2, step_forward=1):

  M = len(serie) # Length of the time series

  X = np.zeros((M-(lag+step_forward-1), lag)) # Input definition
  y = np.zeros((M-(lag+step_forward-1), 1)) # Target definition

  cont = 0 # Start index of the current input window
  posinput = lag + cont # End index (exclusive) of the input window
  posout = posinput + step_forward # Position of the target value (1-based)

  i = 0
  while posout<=M:

    X[i, :] = serie[cont:posinput]
    y[i] = serie[posout-1]
    cont+=1
    posinput = lag+cont
    posout = posinput + step_forward
    i+=1

  return X, y
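
As a quick check, calling sliding_window on the example series from before reproduces the input/target pairs shown above:

series = [3.93, 4.58, 4.8, 5.07, 5.14, 4.94]

# Three lags and one-step forward
X, y = sliding_window(series, lag=3, step_forward=1)
# X[0] -> [3.93, 4.58, 4.8], y[0] -> [5.07]

# Two lags and two-steps forward
X, y = sliding_window(series, lag=2, step_forward=2)
# X[0] -> [3.93, 4.58], y[0] -> [5.07]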

The inputs of the function are the time series, the number of lags (lag), and the number of steps forward (step_forward). The outputs are the input matrix of size (M-(lag+step_forward-1)) x lag and the target variable of size (M-(lag+step_forward-1)) x 1.

Results and Evaluation

To evaluate the predictions of the ELM, the mean squared error (MSE) is used, as shown in the equation below:

MSE = \frac{1}{MC}\sum_{i=1}^{M}(y_{i}-y_{pred,i})^{2}

Where M is the number of examples, C is the number of outputs, y is the true target variable, and y_pred is the predicted target variable.

The function below computes the MSE:

def mse_function(Y, Y_pred):

  M, C = Y.shape[0], Y.shape[1] # Number of examples (M) and number of outputs (C)

  E = Y-Y_pred # Error between Y true and Y predicted
  mse = np.sum(E**2)/(M*C) # Mean squared error over all examples and outputs

  return mse

Before training the ELM model and obtaining the sets of weights used to make predictions, we must normalize the time series. Here, min-max normalization is used, which maps the data into a given range (a, b), in this case (0, 1), as shown in the function below.

def minmax_normalization(X, a=0, b=1):

  xmin, xmax = X.min(), X.max() # Min and max values of data
  X_norm = a + ( (X-xmin)*(b-a)/(xmax-xmin) ) # Normalized data in a new range

  return X_norm, a, b, xmin, xmax

Only the training data is used to obtain the normalization parameters, and with these parameters we normalize both the training and test data. After training the model and making predictions on both sets, we apply the reverse normalization to properly evaluate the model's performance.
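
The reverse normalization mentioned above can be written as a small helper using the parameters returned by minmax_normalization; the sketch below (the name minmax_denormalization is illustrative) simply inverts the min-max formula.

def minmax_denormalization(X_norm, a, b, xmin, xmax):

  X = xmin + ( (X_norm-a)*(xmax-xmin)/(b-a) ) # Data mapped back to the original range

  return X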

We used the time series of the Brazilian gross domestic product between 1980 and 1997, with monthly frequency (256 months in total), available at link. The first 80% of the monthly data is used to train the model and the last 20% to test the trained parameters. The figure below shows the time series split into training (blue line) and test (green line) data, where the black dashed line marks the split.

Time series

First, to create X and y, a lag of 2 and one step forward are used. The model trained with 3 hidden neurons performs well on this time series, reaching an MSE of 30.99 on the training data and 38.74 on the test data. The figure below shows the predictions on the test data (red dashed line), which closely follow the real test data (green line).

Predictions

When the lag is changed to 3 while keeping one step forward, the model's performance worsens, reaching an MSE of 180.83 on the test data. Therefore, increasing the number of past values used to predict one step forward is not a good idea for the analyzed time series.
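
Putting the pieces together, the lag-2 experiment can be reproduced roughly as in the sketch below. It assumes the monthly series is already loaded into a 1-D NumPy array called gdp (an illustrative name; the exact data loading is in the linked repository) and uses the elm_predict and minmax_denormalization helpers sketched earlier.

split = int(0.8*len(gdp)) # First 80% of the months for training, last 20% for testing
train, test = gdp[:split], gdp[split:]

X_train, y_train = sliding_window(train, lag=2, step_forward=1)
X_test, y_test = sliding_window(test, lag=2, step_forward=1)

# Normalization parameters are computed from the training data only
_, a, b, xmin, xmax = minmax_normalization(train)
X_train_n = a + (X_train-xmin)*(b-a)/(xmax-xmin)
y_train_n = a + (y_train-xmin)*(b-a)/(xmax-xmin)
X_test_n = a + (X_test-xmin)*(b-a)/(xmax-xmin)

# One-step training with 3 hidden neurons, then prediction on the test inputs
_, w1, w2 = elm_train(X_train_n, y_train_n, L=3)
y_pred_test_n = elm_predict(X_test_n, w1, w2)

# Reverse normalization before evaluating the predictions
y_pred_test = minmax_denormalization(y_pred_test_n, a, b, xmin, xmax)
mse_test = mse_function(y_test, y_pred_test)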

Conclusion

In this way, the extreme learning machine is a good starting method for time series forecasting, as long as the time series is first transformed into input and target variables. The ELM is a simple approach, but it achieves strong results for this purpose, since training is done in a single step with the help of the Moore-Penrose inverse.


The complete code is available on Github and Colab. Follow the blog if the post is helpful to you.
