Amit Kayal for AWS Community Builders

Posted on Oct 2, 2021 • Edited on Oct 3, 2021

Demand Forecasting with AWS Forecast

#aws #machinelearning #datascience #deeplearning

What is forecasting?

A time series is a sequence of quantitative values observed over time, usually at equal time intervals. The interval depends on the data and may be yearly, quarterly, monthly or hourly, for instance.

Common time-series forecasting methods include:

Moving Average (MA)
Autoregression (AR)
Vector Autoregression (VAR)
Autoregressive Integrated Moving Average (ARIMA)
Autoregressive Moving Average (ARMA)
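The minimal, standard-library Python sketch below makes the first two methods concrete. It shows a simple moving-average forecast, which is the mean of the last few observations. Note that this is the smoothing idea, not the MA(q) error term used inside ARMA/ARIMA. It also shows an AR(1) model, y[t] = c + phi * y[t-1], fitted by ordinary least squares. The function names and sample data are hypothetical. The code only illustrates the ideas and is not how Amazon Forecast implements these methods.

```python
from statistics import mean


def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return mean(series[-window:])


def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares; return (c, phi)."""
    x = series[:-1]  # lagged values y[t-1]
    y = series[1:]   # current values y[t]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("constant series: phi cannot be estimated")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = y_bar - phi * x_bar
    return c, phi


def ar1_forecast(series, steps):
    """Roll the fitted AR(1) model forward `steps` periods."""
    c, phi = fit_ar1(series)
    last, out = series[-1], []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


# Hypothetical daily sales for one item
sales = [12, 15, 14, 16, 18, 17, 19, 21]
print(moving_average_forecast(sales, 3))
print(ar1_forecast(sales, 2))
```

Libraries such as statsmodels cover the full family, including VAR, ARMA and ARIMA, with proper estimation and diagnostics.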

Components of Time Series

The components of a time series are usually classified as:

Random or Irregular movements.
Cyclic Variations.
Seasonal Variations.
Trend.
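These components are often separated with a classical additive decomposition, which models the series as trend + seasonal + irregular. The minimal standard-library sketch below makes three simplifying assumptions: an odd, known seasonal period, a purely additive model, and no separation of cyclic variation from the trend. The function name is hypothetical.

```python
from statistics import mean


def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Assumes an odd `period` so that a simple centred moving average works.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only supports odd periods")
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    half = period // 2
    n = len(series)
    # Trend: centred moving average, undefined (None) at the edges
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = mean(series[t - half:t + half + 1])
    # Seasonal: average detrended value at each position in the cycle
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [mean(b) for b in buckets]
    offset = mean(raw)
    pattern = [s - offset for s in raw]  # centred so the pattern sums to ~0
    seasonal = [pattern[t % period] for t in range(n)]
    # Irregular component: whatever trend and season do not explain
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual
```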

Why Amazon Forecast

Fully managed service that uses machine learning to deliver highly accurate forecasts.
Based on the same machine learning forecasting technology used by Amazon.com.
Automated machine learning
- Includes AutoML capabilities that take care of the machine learning for you.
Works with any historical time series data to create accurate forecasts
- In a retail scenario, Amazon Forecast uses machine learning to process your time series data (such as price, promotions, and store traffic) and combines it with associated data (such as product features, floor placement, and store locations) to determine the complex relationships between them.
- In other words, it can combine time series data with additional variables for time-series prediction.

How does Amazon Forecast work?

Here is the flow diagram taken from the AWS site. The key points to note are:

Historical data can be pushed into S3.
A forecast can be triggered when data arrives in S3.
The output of the forecast can be pushed into S3 for further action.

Forecasting automation with Amazon Forecast by applying MLOps

The following architecture, taken from the AWS site, lets us build, train, and deploy a time-series forecasting model with an MLOps pipeline that combines Amazon Forecast, AWS Lambda, and AWS Step Functions. To visualize the generated forecast, we use a combination of AWS serverless analytics services, namely Amazon Athena and Amazon QuickSight.

The key components of this architecture are:

The dataset is uploaded to the Amazon S3 bucket under the /train directory (prefix).
The uploaded file triggers a Lambda function, which starts the MLOps pipeline built as a Step Functions state machine.
The state machine stitches together a series of Lambda functions to build, train, and deploy an ML model in Amazon Forecast.
Amazon CloudWatch, which captures Forecast metrics, is used for log analysis.
Amazon SNS is used to send notifications when a forecasting job's status changes.
- The final forecasts become available in the source Amazon S3 bucket under the /forecast directory.
- The ML pipeline saves any old forecasts under the /history directory.
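The S3 trigger can be sketched as a Lambda handler that reads the S3 event notification and starts the Step Functions state machine through boto3's start_execution call. This is a minimal sketch under stated assumptions, not the code of the AWS sample. Three parts of it are assumptions made for illustration: the make_handler factory, the train/ prefix check, and the JSON input passed to the state machine. The Step Functions client is passed in, so the logic can be exercised without AWS.

```python
import json
import urllib.parse


def parse_s3_event(event):
    """Return (bucket, key) for each object in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes '+')
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def make_handler(sfn_client, state_machine_arn, prefix="train/"):
    """Build a Lambda handler that starts the pipeline for new training files."""
    def handler(event, context):
        started = []
        for bucket, key in parse_s3_event(event):
            if not key.startswith(prefix):
                continue  # ignore writes under forecast/, history/, etc.
            response = sfn_client.start_execution(
                stateMachineArn=state_machine_arn,
                input=json.dumps({"bucket": bucket, "key": key}),
            )
            started.append(response["executionArn"])
        return {"started": started}
    return handler


# Wiring in a real deployment (the ARN is a hypothetical placeholder):
# import boto3
# handler = make_handler(boto3.client("stepfunctions"), "<your-state-machine-arn>")
```

Filtering on the prefix matters because the pipeline itself writes to the same bucket, under /forecast and /history. Without the filter, those writes could retrigger the pipeline if the S3 notification is not scoped to /train.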

Forecast Workflow

The workflow for generating forecasts consists of the following steps:

Creating related datasets and a dataset group
Importing training data
Training predictors (trained models) using a specific algorithm or AutoML
Evaluating the predictor with metrics
Creating a forecast
Retrieving forecasts for users
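Most of these steps create a resource asynchronously. A script, or a Lambda function in a pipeline like the one above, therefore has to wait for each resource to become ACTIVE before starting the next step. The minimal polling sketch below takes a zero-argument callable that wraps the relevant boto3 describe call. Examples are describe_dataset_import_job, describe_predictor or describe_forecast, each of which returns a Status field. The helper name, the timeout values and the exact set of failure statuses are assumptions.

```python
import time

# Statuses treated as final failures in this sketch (an assumption)
FAILED_STATUSES = {"CREATE_FAILED", "CREATE_STOPPED"}


def wait_until_active(describe, poll_seconds=60, timeout_seconds=6 * 3600,
                      sleep=time.sleep):
    """Poll `describe()` until its "Status" is ACTIVE; raise on failure or timeout."""
    waited = 0
    while True:
        status = describe()["Status"]
        if status == "ACTIVE":
            return status
        if status in FAILED_STATUSES:
            raise RuntimeError(f"resource ended in status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Example use with a real client (the ARN variable is a placeholder):
# forecast = boto3.client("forecast")
# wait_until_active(lambda: forecast.describe_predictor(PredictorArn=predictorArn))
```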

Amazon Forecast supports the following dataset domains:

RETAIL Domain – For retail demand forecasting
INVENTORY_PLANNING Domain – For supply chain and inventory planning
EC2_CAPACITY Domain – For forecasting Amazon Elastic Compute Cloud (Amazon EC2) capacity
WORK_FORCE Domain – For workforce planning
WEB_TRAFFIC Domain – For estimating future web traffic
METRICS Domain – For forecasting metrics, such as revenue and cash flow
CUSTOM Domain – For all other types of time-series forecasting

Example 1: Dataset Types in the RETAIL Domain

If you are a retailer interested in forecasting demand for items, you might create the following datasets in the RETAIL domain:

Target time series is the required dataset of historical time-series demand (sales) data for each item (each product a retailer sells). In the RETAIL domain, this dataset type requires that the dataset includes the item_id, timestamp, and the demand fields. The demand field is the forecast target, and is typically the number of items sold by the retailer in a particular week or day.
Optionally, a dataset of the related time series type. In the RETAIL domain, this type can include optional, but suggested, time-series information such as price, inventory_onhand, and webpage_hits.
Optionally, a dataset of the item metadata type. In the RETAIL domain, Amazon Forecast suggests providing metadata information related to the items that you provided in target time series, such as brand, color, category, and genre.
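The three dataset types above correspond to the DatasetType values TARGET_TIME_SERIES, RELATED_TIME_SERIES and ITEM_METADATA in boto3's forecast.create_dataset call. The minimal sketch below builds RETAIL-domain schemas for them. It assumes price as the only related field and brand/category as the metadata fields, all examples taken from the text above. The constant names and the dataset_request helper are hypothetical. The resulting dictionary would be passed as forecast.create_dataset(**request).

```python
# RETAIL-domain schemas in the {"Attributes": [...]} shape used by create_dataset
TARGET_TIME_SERIES_SCHEMA = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "demand", "AttributeType": "float"},
    ]
}

RELATED_TIME_SERIES_SCHEMA = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "price", "AttributeType": "float"},
    ]
}

ITEM_METADATA_SCHEMA = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "brand", "AttributeType": "string"},
        {"AttributeName": "category", "AttributeType": "string"},
    ]
}


def dataset_request(name, dataset_type, schema, frequency="D"):
    """Keyword arguments for forecast.create_dataset for one RETAIL dataset."""
    request = {
        "DatasetName": name,
        "Domain": "RETAIL",
        "DatasetType": dataset_type,
        "Schema": schema,
    }
    if dataset_type != "ITEM_METADATA":
        # Item metadata has no time dimension, so only time series get a frequency
        request["DataFrequency"] = frequency
    return request
```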

A case study

I used the dataset from the Kaggle Store Item Demand Forecasting Challenge. It provides 5 years of store-item sales data and asks participants to predict 3 months of sales for 50 different items at 10 different stores.

Here is how I used Amazon Forecast with minimal coding.

Import your data

Dataset Details

The following details need to be provided:

Dataset name
Frequency of your data
Data schema
- I used the schema builder option, which is the more visual one. The other option is the JSON schema, which lets us specify AttributeName and AttributeType in JSON format.
- The Forecast data schema uses the concept of a domain to make dataset creation easier. I selected the RETAIL domain, and Forecast guided me to include the following attributes:
- item_id (attribute type: string) - mandatory in Forecast
- timestamp (attribute type: timestamp, with the format yyyy-MM-dd) - mandatory in Forecast
- demand (attribute type: float) - mandatory in Forecast
- store (attribute type: string) - I had to add this because my forecast has to be based on timestamp, store ID, and item ID.
- All displayed attributes must exist in your CSV file, in the same order in which they appear in the CSV file.
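Because the column order has to match the schema, it is worth checking the prepared file before uploading it. The minimal sketch below encodes the case-study schema, with the attribute order listed above. It compares that order with the header row of the locally prepared CSV, which the pandas code in this post writes. The constant and function names are hypothetical.

```python
import csv

# Case-study schema in the attribute order used above
CASE_STUDY_SCHEMA = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "demand", "AttributeType": "float"},
        {"AttributeName": "store", "AttributeType": "string"},
    ]
}


def check_column_order(lines, schema):
    """Raise ValueError unless the CSV header lists the schema attributes in order.

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    """
    header = next(csv.reader(lines))
    expected = [a["AttributeName"] for a in schema["Attributes"]]
    if header != expected:
        raise ValueError(f"CSV columns {header} do not match schema {expected}")
    return True


# with open("train_df.csv", newline="") as f:
#     check_column_order(f, CASE_STUDY_SCHEMA)
```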

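I used the following Python code to prepare the Kaggle dataset. The Kaggle train.csv has the columns date, store, item and sales, so they are renamed to the attribute names in the schema.

import pandas as pd

train_df = pd.read_csv("train.csv")
# Map Kaggle's columns onto the RETAIL domain attribute names
train_df = train_df.rename(columns={"date": "timestamp", "item": "item_id", "sales": "demand"})
train_df["timestamp"] = pd.to_datetime(train_df["timestamp"], format="%Y-%m-%d")
# Columns must be in the same order as the dataset schema
train_df_final = train_df[["item_id", "timestamp", "demand", "store"]]
train_df_final.to_csv("train_df.csv",
                      index=False,
                      date_format="%Y-%m-%d")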

Upload dataset into AWS S3

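Create an Amazon S3 bucket and upload the time-series data into it. Here, bucketName and key are placeholders for your bucket name and object key.

import boto3
s3 = boto3.client("s3")
s3.create_bucket(Bucket=bucketName)  # outside us-east-1, also pass CreateBucketConfiguration={"LocationConstraint": region}
s3.upload_file(Filename="train_df.csv", Bucket=bucketName, Key=key)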

Dataset import details

The following details need to be provided for the import task:

Dataset import name
Select time zone
- My dataset does not have a time zone variable, so I selected the option "Do not use time zone".
Data location
- This is the path of the input file in my S3 bucket.
IAM Role
- Dataset groups need IAM permissions to read your dataset files in S3.

We can then start the import. Once it completes, the console shows "Successfully imported your data".
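The same import can be started with boto3's forecast.create_dataset_import_job. The minimal sketch below only builds the keyword arguments, so it can be inspected without AWS. The helper name and the placeholder ARNs and paths are hypothetical. No TimeZone setting is passed, which mirrors the "do not use time zone" choice above.

```python
def import_job_request(job_name, dataset_arn, s3_path, role_arn,
                       timestamp_format="yyyy-MM-dd"):
    """Keyword arguments for forecast.create_dataset_import_job."""
    return {
        "DatasetImportJobName": job_name,
        "DatasetArn": dataset_arn,
        # The IAM role must allow Forecast to read the file at s3_path
        "DataSource": {"S3Config": {"Path": s3_path, "RoleArn": role_arn}},
        # Java-style pattern: MM is the month (mm would mean minutes)
        "TimestampFormat": timestamp_format,
    }


request = import_job_request(
    "kaggle_import",
    "<dataset-arn>",
    "s3://<bucket>/train/train_df.csv",
    "<forecast-role-arn>",
)
# forecast = boto3.client("forecast")
# forecast.create_dataset_import_job(**request)
```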

Train a predictor

Next, train a predictor: a custom model, with its underlying infrastructure, that Amazon Forecast trains on your datasets. The key configuration decisions at this stage are:

How should the dataset be split into training and testing data?
How should missing data be handled?
How should the model be validated (the backtest window, in time-series terms)?
How many times should validation be performed during training (the number of backtest windows)?
What is the forecast horizon?
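The idea behind backtest windows can be sketched with a simple rolling-origin split. The model trains on everything before a split point and is evaluated on the following horizon points, repeated for several split points. This is a simplified, generic illustration and not necessarily how Amazon Forecast places its windows. Forecast exposes these choices through the predictor's EvaluationParameters (NumberOfBacktestWindows and BackTestWindowOffset). The function and its step parameter are hypothetical.

```python
def backtest_windows(n_points, horizon, n_windows=1, step=None):
    """Return (split, test_end) index pairs, latest window first.

    Each window trains on points [0, split) and evaluates on [split, test_end),
    where test_end = split + horizon. Windows move back by `step` points
    (default: horizon).
    """
    step = horizon if step is None else step
    windows = []
    for i in range(n_windows):
        split = n_points - horizon - i * step
        if split <= 0:
            raise ValueError("not enough history for this many windows")
        windows.append((split, split + horizon))
    return windows


# About 5 years of daily data, a 90-day horizon and 3 windows
print(backtest_windows(1826, 90, 3))
```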

Predictor settings

Predictor name
Forecast horizon
- This number tells Amazon Forecast how far into the future to predict, in units of the forecast frequency.
Forecast frequency
- My dataset has daily timestamps, so I set this to 1 day.

Predictor details

Algorithm selection
- I selected AutoML, which lets Amazon Forecast choose the right algorithm for the dataset.
Optimization metric
- I kept the default.
- Amazon Forecast provides Root Mean Square Error (RMSE), Weighted Quantile Loss (wQL), Average Weighted Quantile Loss (Average wQL), Mean Absolute Scaled Error (MASE), Mean Absolute Percentage Error (MAPE), and Weighted Absolute Percentage Error (WAPE) metrics to evaluate your predictors.
Forecast dimensions
- item_id is always used in training and is mandatory in Forecast. I additionally selected store, because my aim is to forecast by store and item ID.
Forecast type
- You can choose up to 5 quantiles between 0.01 and 0.99 (in increments of 0.01). By default, AWS uses 0.1, 0.5 and 0.9.
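Several of these metrics have short formulas. The sketch below follows the definitions in the Amazon Forecast documentation as I understand them. wQL at quantile tau is twice the pinball loss divided by the sum of absolute actuals, and average wQL is the mean over the chosen quantiles. MASE is omitted. Skipping zero actuals in mape is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def wape(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(abs(a) for a in actual)


def mape(actual, predicted):
    # Points with a zero actual are skipped to avoid division by zero
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    return sum(terms) / len(terms)


def wql(actual, quantile_forecast, tau):
    """Weighted quantile loss at quantile level tau (e.g. 0.1, 0.5, 0.9)."""
    loss = sum(
        tau * max(a - q, 0) + (1 - tau) * max(q - a, 0)
        for a, q in zip(actual, quantile_forecast)
    )
    return 2 * loss / sum(abs(a) for a in actual)


def average_wql(actual, quantile_forecasts):
    """Mean wQL; quantile_forecasts maps tau -> list of quantile predictions."""
    total = sum(wql(actual, q, tau) for tau, q in quantile_forecasts.items())
    return total / len(quantile_forecasts)
```

At tau = 0.5, wQL reduces to WAPE computed on the median forecast. This is one way to see why the metrics often move together.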

Advanced Configuration

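Below is the default featurization configuration recommended by Amazon Forecast. It specifies the method used to featurize (transform) a dataset field.

Here the method is applied only to the demand field, which is specified by AttributeName. Values are summed when aggregated (aggregation). Gaps before an item's first observation are left unfilled (frontfill). Gaps within and after an item's observed range are filled with zeros (middlefill, backfill).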

[
    {
        "AttributeName": "demand",
        "FeaturizationPipeline": [
            {
                "FeaturizationMethodName": "filling",
                "FeaturizationMethodParameters": {
                    "aggregation": "sum",
                    "frontfill": "none",
                    "middlefill": "zero",
                    "backfill": "zero"
                }
            }
        ]
    }
]
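The three fill settings can be illustrated on one item's series laid out on the global timeline, with None marking missing points. The minimal sketch below assumes three regions. Front fill covers points before the item's first observation, middle fill covers gaps between its first and last observations, and back fill covers points after its last observation. The aggregation step only matters when the raw data is finer-grained than the forecast frequency, so it is not modelled. The function name is hypothetical.

```python
def apply_filling(values, frontfill=None, middlefill=0.0, backfill=0.0):
    """Fill missing points (None) in one item's series on the global timeline.

    A fill value of None leaves the point missing, like "none" in Forecast.
    """
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        return list(values)
    first, last = observed[0], observed[-1]
    out = []
    for i, v in enumerate(values):
        if v is not None:
            out.append(v)
        elif i < first:
            out.append(frontfill)   # before the item's first observation
        elif i > last:
            out.append(backfill)    # after the item's last observation
        else:
            out.append(middlefill)  # gap inside the observed range
    return out


# Default settings from the JSON above: frontfill none, middlefill/backfill zero
print(apply_filling([None, 3, None, 5, None]))
```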

Supplementary features

Supplementary features can be crucial and can sometimes have a significant impact on the business problem.

Weather index
- Amazon Forecast Weather Index combines multiple weather metrics from historical weather events and current forecasts at a given location to increase your demand forecast model accuracy.
- In retail inventory management use cases, day-to-day weather variation affects foot traffic and product mix.
Holidays
- Forecast can add a built-in holiday calendar for a selected country as an additional feature.
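Outside the console, the predictor settings from this section map onto the parameters of boto3's forecast.create_predictor:

- PerformAutoML for AutoML
- FeaturizationConfig for the frequency, the forecast dimensions and the filling pipeline
- ForecastTypes for the quantiles
- InputDataConfig's SupplementaryFeatures for the holiday calendar

The minimal sketch below only builds the keyword arguments. The helper name is hypothetical. The 90-day horizon (roughly the 3 months asked for by the Kaggle challenge) and the "US" country code are also assumptions, because the dataset does not state a country. The weather index has additional data requirements, such as location information, so it is left out.

```python
def predictor_request(name, dataset_group_arn, horizon, frequency="D",
                      dimensions=("store",), quantiles=("0.10", "0.50", "0.90"),
                      holiday_country=None, featurizations=None):
    """Keyword arguments for forecast.create_predictor with AutoML enabled."""
    if len(quantiles) > 5:
        raise ValueError("at most 5 forecast types are allowed")
    input_config = {"DatasetGroupArn": dataset_group_arn}
    if holiday_country:
        # Built-in holiday calendar for one country
        input_config["SupplementaryFeatures"] = [
            {"Name": "holiday", "Value": holiday_country}
        ]
    featurization = {
        "ForecastFrequency": frequency,
        # item_id is implicit; extra dimensions such as store are listed here
        "ForecastDimensions": list(dimensions),
    }
    if featurizations:
        # e.g. the filling configuration shown in Advanced Configuration
        featurization["Featurizations"] = featurizations
    return {
        "PredictorName": name,
        "ForecastHorizon": horizon,
        "PerformAutoML": True,
        "InputDataConfig": input_config,
        "FeaturizationConfig": featurization,
        "ForecastTypes": list(quantiles),
    }


# forecast = boto3.client("forecast")
# response = forecast.create_predictor(**predictor_request(
#     "kaggle_predictor", "<dataset-group-arn>", 90, holiday_country="US"))
# predictorArn = response["PredictorArn"]
```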

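Create a Forecast

Once the predictor is trained, we use it to generate a forecast. In the code below, forecastName and predictorArn are placeholders for your forecast name and the ARN of the trained predictor.

import boto3
forecast = boto3.client("forecast")
create_forecast_response = forecast.create_forecast(
                           ForecastName=forecastName,
                           PredictorArn=predictorArn)
forecastArn = create_forecast_response["ForecastArn"]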

In the console, the following key inputs need to be provided before starting the forecast process:

Forecast name
Predictor (the one created in the earlier step)
Forecast types
- By default, Amazon Forecast generates forecasts for the 0.10, 0.50 and 0.90 quantiles.

Make Forecasts

Now we are ready to make forecasts. In our case, we are going to write the forecast outputs back to the S3 bucket.

The following inputs need to be provided:

Start date
End date
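Programmatically, as far as I know, these actions map to two separate calls. forecast.create_forecast_export_job writes the forecast to an S3 path. The forecastquery client's query_forecast retrieves values for a date range and a filter such as item_id. The minimal sketch below only builds the keyword arguments. The helper names, placeholder ARNs and example dates are hypothetical.

```python
def export_request(job_name, forecast_arn, s3_prefix, role_arn):
    """Keyword arguments for forecast.create_forecast_export_job."""
    return {
        "ForecastExportJobName": job_name,
        "ForecastArn": forecast_arn,
        "Destination": {"S3Config": {"Path": s3_prefix, "RoleArn": role_arn}},
    }


def query_request(forecast_arn, start_date, end_date, **filters):
    """Keyword arguments for the forecastquery client's query_forecast."""
    return {
        "ForecastArn": forecast_arn,
        "StartDate": start_date,  # format: yyyy-MM-ddTHH:mm:ss
        "EndDate": end_date,
        # Filter values are sent as strings, e.g. {"item_id": "1", "store": "1"}
        "Filters": {k: str(v) for k, v in filters.items()},
    }


# forecastquery = boto3.client("forecastquery")
# forecastquery.query_forecast(**query_request(
#     forecastArn, "2018-01-01T00:00:00", "2018-03-31T00:00:00", item_id=1, store=1))
```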

I have given below a snapshot of the forecasts which I got using the Predictor that I trained.
