What is forecasting?
A time series is a sequence of quantitative values observed over time, usually at equal intervals. The interval depends on the use case and may be yearly, quarterly, monthly, or hourly, for instance. Forecasting uses the historical values of such a series to predict its future values.
Common time-series methods include:
- Moving Average
- Autoregression
- Vector Autoregression
- Autoregressive Integrated Moving Average
- Autoregressive Moving Average
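To make two of these ideas concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not a production implementation: the first function is a naive moving-average smoother (not the MA(q) error model used in ARMA), the second fits a first-order autoregression by ordinary least squares, and the `sales` numbers are made up.

```python
from statistics import mean


def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return mean(series[-window:])


def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares."""
    x, y = series[:-1], series[1:]
    x_bar, y_bar = mean(x), mean(y)
    phi = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )
    c = y_bar - phi * x_bar
    return c, phi


sales = [10, 12, 14, 16, 18, 20]  # made-up, perfectly linear data
ma_next = moving_average_forecast(sales, window=3)
c, phi = fit_ar1(sales)
ar_next = c + phi * sales[-1]  # one-step-ahead AR(1) forecast
```

On a steadily rising series the moving average lags behind the trend, while the autoregression picks it up, which is one reason the choice of method matters.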
Components of Time Series
The components of a time series are commonly classified as:
- Random or Irregular movements.
- Cyclic Variations.
- Seasonal Variations.
- Trend.
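As a rough illustration of how these components combine, the sketch below builds a synthetic daily series from a linear trend plus a weekly seasonal pattern and then recovers the trend with a centered moving average. The numbers are invented, and the cyclic and irregular components are left out to keep the example short.

```python
def centered_moving_average(series, window):
    """Centered moving average for an odd window; edges are left as None."""
    half = window // 2
    out = [None] * len(series)
    for i in range(half, len(series) - half):
        out[i] = sum(series[i - half : i + half + 1]) / window
    return out


period = 7
seasonal_pattern = [-3, -2, -1, 0, 1, 2, 3]  # sums to zero over one week
n = 28
trend = [100 + 0.5 * t for t in range(n)]
seasonal = [seasonal_pattern[t % period] for t in range(n)]
series = [trend[t] + seasonal[t] for t in range(n)]

# Averaging over one full period cancels the seasonal pattern.
estimated_trend = centered_moving_average(series, window=period)
# What remains after removing the trend is the seasonal component.
estimated_seasonal = [
    None if m is None else y - m for y, m in zip(series, estimated_trend)
]
```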
Why Amazon Forecast
- Fully managed service that uses machine learning to deliver highly accurate forecasts.
- Based on the same machine learning forecasting technology used by Amazon.com.
- Automated machine learning
- Includes AutoML capabilities that take care of the machine learning for you.
- Works with any historical time-series data to create accurate forecasts.
- In a retail scenario, Amazon Forecast uses machine learning to process your time-series data (such as price, promotions, and store traffic) and combines that with associated data (such as product features, floor placement, and store locations) to determine the complex relationships between them.
- In other words, it can combine time-series data with additional variables when making predictions.
How Amazon Forecast works
Here is the flow diagram taken from the AWS site. The key points to note are:
- Historical data can be pushed into S3
- Forecast can be triggered on data arrival in S3
- Output of forecast can be pushed into S3 for further actions.
Forecasting automation with Amazon Forecast by applying MLOps
The following architecture, taken from the AWS site, allows us to build, train, and deploy a time-series forecasting model using an MLOps pipeline that combines Amazon Forecast, AWS Lambda, and AWS Step Functions. To visualize the generated forecast, we use AWS serverless analytics services: Amazon Athena and Amazon QuickSight.
Key components of this architecture are:
- The dataset is uploaded to Amazon S3 under the **/train** directory (prefix).
- The uploaded file triggers a Lambda function, which initiates the MLOps pipeline built using a Step Functions state machine.
- The state machine stitches together a series of Lambda functions to build, train, and deploy an ML model in Amazon Forecast.
- Amazon CloudWatch captures Forecast metrics and is used for log analysis.
- Amazon SNS sends notifications when the status of a forecasting job changes.
- The final forecasts become available in the source Amazon S3 bucket in the **/forecast** directory.
- The ML pipeline saves any old forecasts in the **/history** directory.
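The trigger step can be sketched as a small Lambda handler. This is only an assumption about how such a function might look, not the code from the AWS reference architecture: the `STATE_MACHINE_ARN` environment variable, the `train/` prefix check, and the shape of the state machine input are all hypothetical.

```python
import json
import os
from urllib.parse import unquote_plus


def build_execution_input(event):
    """Extract bucket and key from an S3 event and build the state machine input."""
    record = event["Records"][0]["s3"]
    return {
        "bucket": record["bucket"]["name"],
        # Object keys arrive URL-encoded in S3 event notifications.
        "key": unquote_plus(record["object"]["key"]),
    }


def lambda_handler(event, context):
    import boto3  # imported lazily so the helper above can be tested offline

    payload = build_execution_input(event)
    if not payload["key"].startswith("train/"):
        return {"started": False}  # ignore uploads outside the train prefix
    sfn = boto3.client("stepfunctions")
    response = sfn.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # hypothetical env var
        input=json.dumps(payload),
    )
    return {"started": True, "executionArn": response["executionArn"]}


# A trimmed-down S3 notification, for illustration only.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-forecast-bucket"},
                "object": {"key": "train/item+demand.csv"}}}
    ]
}
execution_input = build_execution_input(sample_event)
```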
Forecast Workflow
The workflow for generating forecasts consists of the following steps:
- Creating related datasets and a dataset group
- Retrieving training data
- Training predictors (trained model) using an algorithm or AutoML
- Evaluating the predictor with metrics
- Creating a forecast
- Retrieving forecast for users
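Most of these steps are asynchronous: a create call returns immediately and the resource becomes usable only once its status is ACTIVE. A small polling helper like the one below is one way to chain the steps in a script. The helper name, the polling interval, and the timeout are my own choices; in real use `describe` would wrap a call such as `forecast.describe_predictor(PredictorArn=arn)`.

```python
import time


def wait_until_active(describe, poll_seconds=30, timeout_seconds=4 * 3600,
                      sleep=time.sleep):
    """Poll `describe()` until its Status is ACTIVE; raise on failure or timeout."""
    waited = 0
    while True:
        status = describe()["Status"]
        if status == "ACTIVE":
            return status
        if status.endswith("FAILED"):
            raise RuntimeError(f"resource ended in status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError("resource did not become ACTIVE in time")
        sleep(poll_seconds)
        waited += poll_seconds


# Offline demonstration with a fake describe call and no real sleeping.
_statuses = iter(["CREATE_PENDING", "CREATE_IN_PROGRESS", "ACTIVE"])
final_status = wait_until_active(
    lambda: {"Status": next(_statuses)}, sleep=lambda seconds: None
)
```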
Amazon Forecast supports the following dataset domains:
- RETAIL Domain – For retail demand forecasting
- INVENTORY_PLANNING Domain – For supply chain and inventory planning
- EC2_CAPACITY Domain – For forecasting Amazon Elastic Compute Cloud (Amazon EC2) capacity
- WORK_FORCE Domain – For workforce planning
- WEB_TRAFFIC Domain – For estimating future web traffic
- METRICS Domain – For forecasting metrics, such as revenue and cash flow
- CUSTOM Domain – For all other types of time-series forecasting
Example 1: Dataset Types in the RETAIL Domain
If you are a retailer interested in forecasting demand for items, you might create the following datasets in the RETAIL domain:
- Target time series is the required dataset of historical time-series demand (sales) data for each item (each product a retailer sells). In the RETAIL domain, this dataset type requires the item_id, timestamp, and demand fields. The demand field is the forecast target, and is typically the number of items sold by the retailer in a particular week or day.
- Optionally, a dataset of the related time series type. In the RETAIL domain, this type can include optional, but suggested, time-series information such as price, inventory_onhand, and webpage_hits.
- Optionally, a dataset of the item metadata type. In the RETAIL domain, Amazon Forecast suggests providing metadata information related to the items that you provided in the target time series, such as brand, color, category, and genre.
A case study
I took the dataset from the Kaggle Store Item Demand Forecasting Challenge, which provides 5 years of store-item sales data and asks for a prediction of 3 months of sales for 50 different items at 10 different stores.
Here is how I used Amazon Forecast with minimal coding.
Import your data
Dataset Details
The following details need to be provided:
- Dataset name
- Frequency of your data
- Data schema
- I used the schema builder option, which is the more visual one. The other option is a JSON schema, which lets us specify AttributeName and AttributeType in JSON format.
- The Forecast data schema has the concept of a domain to make dataset creation much easier. I selected the retail domain here, and Forecast guided me to the following attributes.
- item_id (attribute type: string) - mandatory in Forecast
- timestamp (attribute type: timestamp; I selected the format yyyy-MM-dd) - mandatory in Forecast
- demand (attribute type: float) - mandatory in Forecast
- store (attribute type: string) - I had to add this because my forecast has to be based on timestamp, store id, and item id.
- All attributes displayed must exist in your CSV file and must appear in the same order as the columns in your CSV file.
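For reference, the same schema expressed in the JSON form might look like the sketch below, together with a small check that a column list matches the schema order. The `create_dataset` call and its parameters exist in boto3, but the dataset name is up to you and the helper function names are mine, so treat this as an outline rather than the exact setup used in the console.

```python
RETAIL_TARGET_SCHEMA = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "demand", "AttributeType": "float"},
        {"AttributeName": "store", "AttributeType": "string"},
    ]
}


def columns_match_schema(columns, schema=RETAIL_TARGET_SCHEMA):
    """True when the columns appear exactly in the schema's order."""
    return list(columns) == [a["AttributeName"] for a in schema["Attributes"]]


def create_target_dataset(dataset_name):
    """Create a daily RETAIL target time series dataset (requires AWS credentials)."""
    import boto3  # imported lazily so the check above runs without AWS access

    forecast = boto3.client("forecast")
    response = forecast.create_dataset(
        DatasetName=dataset_name,
        Domain="RETAIL",
        DatasetType="TARGET_TIME_SERIES",
        DataFrequency="D",
        Schema=RETAIL_TARGET_SCHEMA,
    )
    return response["DatasetArn"]


ok = columns_match_schema(["item_id", "timestamp", "demand", "store"])
wrong_order = columns_match_schema(["timestamp", "item_id", "demand", "store"])
```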
I used the following Python code to process my dataset from Kaggle.
import pandas as pd

# Assumption: the Kaggle train.csv has the columns date, store, item, sales
train_df = pd.read_csv("train.csv")
train_df = train_df.rename(
    columns={"date": "timestamp", "item": "item_id", "sales": "demand"})
train_df["timestamp"] = pd.to_datetime(train_df["timestamp"])
# Column order must match the schema defined in Amazon Forecast
train_df_final = train_df[["item_id", "timestamp", "demand", "store"]]
train_df_final.to_csv("train_df.csv",
                      index=False,
                      date_format="%Y-%m-%d")
Upload dataset into AWS S3
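Create an AWS S3 bucket and upload the time-series data into it. Here bucketName and key stand for your bucket name and the object key. Outside us-east-1, create_bucket also needs a CreateBucketConfiguration with the region.
import boto3
s3 = boto3.client("s3")
s3.create_bucket(Bucket=bucketName)
s3.upload_file(Filename="data/item-demand-time.csv", Bucket=bucketName, Key=key)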
Dataset import details
The following details need to be provided for the import task:
- Dataset import name
- Select time zone
- My dataset does not have TZ as any variable and so I have selected the option of do
- Data location
- This is the path of the input file in my S3 bucket.
- IAM role
- Dataset groups require permissions from IAM to read your dataset files in S3.
The console then offers the option to start the import. Once it completes, we should see the message: Successfully imported your data.
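The same import can be started programmatically. The sketch below builds the request for the boto3 `create_dataset_import_job` call; the job name, ARNs, and S3 path are placeholders, and the timestamp format assumes the daily yyyy-MM-dd data used in this case study.

```python
def build_import_job_request(job_name, dataset_arn, s3_path, role_arn):
    """Build the keyword arguments for create_dataset_import_job."""
    return {
        "DatasetImportJobName": job_name,
        "DatasetArn": dataset_arn,
        # The IAM role must allow Forecast to read the file from S3.
        "DataSource": {"S3Config": {"Path": s3_path, "RoleArn": role_arn}},
        "TimestampFormat": "yyyy-MM-dd",
    }


def start_import(request):
    """Start the import job (requires AWS credentials)."""
    import boto3  # imported lazily so the builder runs without AWS access

    forecast = boto3.client("forecast")
    return forecast.create_dataset_import_job(**request)["DatasetImportJobArn"]


# Placeholder values, for illustration only.
import_request = build_import_job_request(
    job_name="store_item_import",
    dataset_arn="arn:aws:forecast:us-east-1:123456789012:dataset/example",
    s3_path="s3://my-forecast-bucket/train/train_df.csv",
    role_arn="arn:aws:iam::123456789012:role/ExampleForecastRole",
)
```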
Train a predictor
A predictor is a custom model, along with its underlying infrastructure, that Amazon Forecast trains on your datasets.
The key configurations to set during this phase include:
- How the dataset is split into training and testing sets
- How missing data is handled
- How model validation is performed (i.e., the backtest window in the context of time-series analysis)
- How many times model validation is performed during training (i.e., the number of backtest windows)
- What the forecast horizon is
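To give a feel for what backtest windows mean, here is a generic rolling-origin split in plain Python. It is a simplified illustration and not Amazon Forecast's exact internal procedure; the 1826 daily points (5 years) and the 90-day horizon are assumptions based on the case study.

```python
def backtest_splits(n_points, horizon, n_windows):
    """Return (train_end, test_end) index pairs for rolling-origin backtests.

    Each test window covers `horizon` points, and each successive window
    ends `horizon` points earlier than the previous one.
    """
    splits = []
    for w in range(n_windows):
        test_end = n_points - w * horizon
        train_end = test_end - horizon
        if train_end <= 0:
            raise ValueError("not enough data for the requested windows")
        splits.append((train_end, test_end))
    return splits


# Train on series[:train_end], evaluate on series[train_end:test_end].
splits = backtest_splits(n_points=1826, horizon=90, n_windows=2)
```

With these numbers the first window tests on the last 90 days and the second on the 90 days before that, so every evaluation uses data the model has not seen.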
Predictor settings
- Forecast name
- Forecast horizon
- This number tells Amazon Forecast how far into the future to predict your data at the specified forecast frequency.
- Forecast frequency
- My dataset has daily timestamps, so I set this to 1 day.
Predictor details
- Algorithm selection
- I selected the AutoML option, which lets Amazon Forecast choose the right algorithm for the dataset.
- Optimization metric
- I have selected default here.
- Amazon Forecast provides Root Mean Square Error (RMSE), Weighted Quantile Loss (wQL), Average Weighted Quantile Loss (Average wQL), Mean Absolute Scaled Error (MASE), Mean Absolute Percentage Error (MAPE), and Weighted Absolute Percentage Error (WAPE) metrics to evaluate your predictors.
- Forecast dimensions
- Item id is used in training by default, as Forecast makes it mandatory. Additionally, I selected Store because my aim is to forecast by store and item id.
- Forecast type
- Choose up to 5 quantiles between 0.01 and 0.99 (in increments of 0.01). By default, AWS uses 0.1, 0.5, and 0.9.
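To make the evaluation metrics listed under Optimization metric a little more tangible, the sketch below computes three of them in plain Python on made-up numbers. The formulas follow the standard definitions (the weighted quantile loss is scaled by 2 and normalized by the sum of absolute actuals); it is an illustration and not Amazon Forecast's own code.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def wape(actual, predicted):
    """Weighted absolute percentage error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(
        abs(a) for a in actual
    )


def weighted_quantile_loss(actual, quantile_forecast, tau):
    """wQL at quantile tau: under-forecasts weigh tau, over-forecasts 1 - tau."""
    loss = sum(
        tau * max(a - q, 0) + (1 - tau) * max(q - a, 0)
        for a, q in zip(actual, quantile_forecast)
    )
    return 2 * loss / sum(abs(a) for a in actual)


actual = [10, 20, 30]  # made-up demand
p50 = [12, 18, 30]     # made-up median forecast
rmse_value = rmse(actual, p50)
wape_value = wape(actual, p50)
wql_50 = weighted_quantile_loss(actual, p50, tau=0.5)
```

At the 0.5 quantile the weighted quantile loss reduces to WAPE, which is a handy sanity check when reading the metrics of a trained predictor.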
Advanced Configuration
This is the default featurization method recommended by Amazon Forecast. A featurization method describes how a dataset field is featurized (transformed).
Here the method is applied only to my demand attribute, which is specified by AttributeName.
[
  {
    "AttributeName": "demand",
    "FeaturizationPipeline": [
      {
        "FeaturizationMethodName": "filling",
        "FeaturizationMethodParameters": {
          "aggregation": "sum",
          "frontfill": "none",
          "middlefill": "zero",
          "backfill": "zero"
        }
      }
    ]
  }
]
Supplementary features
Supplementary features can be crucial, since they sometimes have a large effect on the business problem being forecast.
- Weather info
- Amazon Forecast Weather Index combines multiple weather metrics from historical weather events and current forecasts at a given location to increase your demand forecast model accuracy.
- In retail inventory management use cases, day-to-day weather variation impacts foot traffic and product mix.
- Holiday info
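Putting the predictor settings together, the console choices above roughly correspond to a `create_predictor` request like the one sketched here. Several values are assumptions: a 90-day horizon (about 3 months of daily data), a single backtest window, and the US holiday calendar as the supplementary feature. Check the parameter combination against the current API documentation before relying on it.

```python
def build_predictor_request(predictor_name, dataset_group_arn):
    """Build keyword arguments for the boto3 create_predictor call."""
    return {
        "PredictorName": predictor_name,
        "ForecastHorizon": 90,  # assumption: about 3 months of daily forecasts
        "ForecastTypes": ["0.1", "0.5", "0.9"],
        "PerformAutoML": True,
        "InputDataConfig": {
            "DatasetGroupArn": dataset_group_arn,
            # assumption: US holiday calendar
            "SupplementaryFeatures": [{"Name": "holiday", "Value": "US"}],
        },
        "FeaturizationConfig": {
            "ForecastFrequency": "D",
            "ForecastDimensions": ["store"],
            "Featurizations": [
                {
                    "AttributeName": "demand",
                    "FeaturizationPipeline": [
                        {
                            "FeaturizationMethodName": "filling",
                            "FeaturizationMethodParameters": {
                                "aggregation": "sum",
                                "frontfill": "none",
                                "middlefill": "zero",
                                "backfill": "zero",
                            },
                        }
                    ],
                }
            ],
        },
        "EvaluationParameters": {
            "NumberOfBacktestWindows": 1,
            "BackTestWindowOffset": 90,
        },
    }


def train_predictor(request):
    """Start predictor training (requires AWS credentials)."""
    import boto3  # imported lazily so the builder runs without AWS access

    forecast = boto3.client("forecast")
    return forecast.create_predictor(**request)["PredictorArn"]


predictor_request = build_predictor_request(
    "store_item_predictor",
    "arn:aws:forecast:us-east-1:123456789012:dataset-group/example",  # placeholder
)
```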
Create a Forecaster
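Once the predictor is trained, it can be used to generate a forecast.
import boto3
forecast = boto3.client("forecast")
# forecastName and predictorArn: your forecast name and the trained predictor's ARN
create_forecast_response = forecast.create_forecast(
    ForecastName=forecastName,
    PredictorArn=predictorArn)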
The following key inputs need to be provided in the console before we can start generating the forecast:
- Forecast name
- Predictor (the one created in the earlier step)
- Forecast types
- By default, Amazon Forecast generates forecasts for the 0.10, 0.50, and 0.90 quantiles.
Make Forecasts
Now we are ready to make forecasts. In our case, we are going to write the forecast output back to the S3 bucket.
The following inputs need to be provided:
- Start date
- End date
Below is a snapshot of the forecasts I got using the predictor I trained.
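The same lookup and export can be done from code. The sketch below uses the boto3 `forecastquery` client for a single item and store, and the `create_forecast_export_job` call to write all forecasts to S3. The ARNs, dates, export job name, and S3 path are placeholders, and the fake response only mimics the shape of a real one.

```python
def build_query(forecast_arn, item_id, store, start_date, end_date):
    """Build keyword arguments for forecastquery.query_forecast."""
    return {
        "ForecastArn": forecast_arn,
        "StartDate": start_date,
        "EndDate": end_date,
        "Filters": {"item_id": item_id, "store": store},
    }


def extract_quantile(response, quantile="p50"):
    """Turn one quantile of a query_forecast response into (timestamp, value) pairs."""
    points = response["Forecast"]["Predictions"][quantile]
    return [(p["Timestamp"], p["Value"]) for p in points]


def query_and_export(query, export_name, s3_path, role_arn):
    """Query one series and export all forecasts to S3 (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above run without AWS access

    response = boto3.client("forecastquery").query_forecast(**query)
    boto3.client("forecast").create_forecast_export_job(
        ForecastExportJobName=export_name,
        ForecastArn=query["ForecastArn"],
        Destination={"S3Config": {"Path": s3_path, "RoleArn": role_arn}},
    )
    return extract_quantile(response)


query = build_query(
    "arn:aws:forecast:us-east-1:123456789012:forecast/example",  # placeholder
    item_id="1", store="1",
    start_date="2018-01-01T00:00:00", end_date="2018-01-03T00:00:00",
)
# Fake response with the same shape as a real one, for illustration only.
fake_response = {"Forecast": {"Predictions": {"p50": [
    {"Timestamp": "2018-01-01T00:00:00", "Value": 13.0},
    {"Timestamp": "2018-01-02T00:00:00", "Value": 15.5},
]}}}
median_points = extract_quantile(fake_response)
```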