Lab 3 – Forecast

In this lab, we will use aggregated retail analytics to forecast future sales.

A Quick Note, before we begin:

Outside of this workshop setting, you would have already set up infrastructure for ingesting, processing, analyzing, and storing data (essentially labs 1 and 2), and you’d have data readily available.

But because it takes some time to train Amazon Forecast’s predictors, for this workshop we will simulate aggregated retail analytics data and use it to train predictors. We will then work on labs 1 and 2 while the training runs in the background.

This simulated dataset is aggregated hourly and, for convenience, is included in the file retail_analytics.csv. It looks something like the sample below.

Simulated Sample Data (aggregated hourly)

Time	Item	Quantity	StoreLocation
2019-07-01 09:00:00	staplers	38	San Francisco
2019-07-01 10:00:00	post-its	30	New York
…	…	…	…
2019-07-01 14:00:00	markers	29	Los Angeles
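
To make "aggregated hourly" concrete, here is a minimal sketch of how raw point-of-sale transactions could be rolled up into rows like the ones above. The raw transactions and the aggregate_hourly helper are hypothetical and are not part of the lab's scripts.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw point-of-sale transactions: (time, item, quantity, store).
raw_transactions = [
    ("2019-07-01 09:05:12", "staplers", 20, "San Francisco"),
    ("2019-07-01 09:47:30", "staplers", 18, "San Francisco"),
    ("2019-07-01 10:15:00", "post-its", 30, "New York"),
]

def aggregate_hourly(transactions):
    """Sum quantities per (hour, item, store location)."""
    totals = defaultdict(int)
    for time_str, item, quantity, store in transactions:
        ts = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")
        hour = ts.replace(minute=0, second=0)  # truncate to the start of the hour
        totals[(hour.strftime("%Y-%m-%d %H:%M:%S"), item, store)] += quantity
    # Emit rows in the same column order as the sample table:
    # Time, Item, Quantity, StoreLocation.
    return sorted(
        (hour, item, qty, store) for (hour, item, store), qty in totals.items()
    )

hourly_rows = aggregate_hourly(raw_transactions)
for row in hourly_rows:
    print("\t".join(str(field) for field in row))
```

The two stapler sales in the 09:00 hour collapse into a single row, which is the shape Amazon Forecast expects for an hourly dataset.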

Regions

If you’re using an AWS account vended by the Event Engine for this lab, you will be in us-west-2. This is displayed as Oregon in the top right-hand corner of the AWS console and is referred to as us-west-2 when using the CLI or API.

If you’re using one of your own AWS accounts, please note the region that you created your Cloud9 IDE console in and remain within that region.

Step A

In the same browser window that you have the Cloud9 IDE open, open a new browser tab and point it to https://console.aws.amazon.com/forecast. In the top right-hand corner of the console, note the region you’re in.
Click on ‘Create dataset group’
We’ll first give this dataset group a name. Against ‘Dataset group name’ enter something descriptive.
For ‘Forecasting domain’, choose ‘Retail’.
Click ‘Next’.

Step B

Now we’ll define the schema of the dataset that we’ll base forecasts on. The RETAIL domain supports 3 dataset types: TARGET_TIME_SERIES, RELATED_TIME_SERIES, and ITEM_METADATA.

TARGET_TIME_SERIES is the core dataset that has the feature (column) whose value we’re trying to forecast. The other two are optional and can help add peripheral information (weather, color, etc.) for more accurate forecasts.

Enter a name for the dataset against ‘Dataset Name’. For example JulyToSeptemberSales.
For ‘Frequency of your data’ dropdowns, leave the first at ‘1’ and choose ‘hour’ for the second.

For ‘Data schema’, copy and paste the schema below. It matches the aggregated dataset we generated in retail_analytics.csv.

{
    "Attributes": [
        {
            "AttributeName": "timestamp",
            "AttributeType": "timestamp"
        },
        {
            "AttributeName": "item_id",
            "AttributeType": "string"
        },
        {
            "AttributeName": "demand",
            "AttributeType": "float"
        },
        {
            "AttributeName": "location",
            "AttributeType": "string"
        }
    ]
}
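
As a rough illustration of how the schema lines up with the file, the sketch below reads one row in schema order and converts each field to its declared type. It assumes the file is comma-separated with timestamps in yyyy-MM-dd HH:mm:ss form; the sample line and the helper names are hypothetical.

```python
import csv
import io
from datetime import datetime

# Attribute names and types, in the same order as the schema above.
SCHEMA = [
    ("timestamp", "timestamp"),
    ("item_id", "string"),
    ("demand", "float"),
    ("location", "string"),
]

def parse_value(raw, attribute_type):
    """Convert one CSV field according to its schema type."""
    if attribute_type == "timestamp":
        return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    if attribute_type == "float":
        return float(raw)
    return raw  # "string" attributes are kept as-is

def parse_row(fields):
    """Map the fields of one row onto the schema's attribute names."""
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} columns, got {len(fields)}")
    return {
        name: parse_value(raw, kind) for (name, kind), raw in zip(SCHEMA, fields)
    }

# Assumed sample line; the real file may differ in its details.
sample = "2019-07-01 09:00:00,staplers,38,San Francisco\n"
records = [parse_row(fields) for fields in csv.reader(io.StringIO(sample))]
print(records[0])
```

The point to notice is that column order matters: the first column is read as the timestamp, the second as item_id, and so on, regardless of what the columns are called in the source data.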

Now click ‘Next’.

Step C

We will now import data into the target time series dataset that we just defined.

Before we import the target timeseries dataset, we’ll need to upload the retail_analytics.csv to an Amazon S3 bucket that Amazon Forecast can access and get the dataset from.
Switch your browser tab back to the one where you have Cloud9 open.
All S3 bucket names, regardless of who created them, need to be globally unique. So choose something unique to you and append or prepend it to retail-forecast, such as sudoamit-retail-forecast.

From the Cloud9 terminal window, run the commands below after replacing [SOME_UNIQUE_NAME] with the unique name that you came up with for your S3 bucket.
```
cd lab3/src
```
```
aws s3 mb s3://[SOME_UNIQUE_NAME]-retail-forecast
```
```
aws s3 cp retail_analytics.csv s3://[SOME_UNIQUE_NAME]-retail-forecast/
```
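
If you want to sanity-check a bucket name before running the commands above, a small sketch like the following can help. It checks only a subset of the S3 naming rules, and the make_bucket_name helper is hypothetical.

```python
import re

# Subset of the S3 bucket naming rules: 3-63 characters, lowercase letters,
# digits, and hyphens, starting and ending with a letter or digit.
# (Dots are also legal in bucket names but are left out of this sketch.)
BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")

def make_bucket_name(unique_prefix, suffix="retail-forecast"):
    """Prepend a personal prefix to the lab's suffix and validate the result."""
    name = f"{unique_prefix.lower()}-{suffix}"
    if not BUCKET_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"not a valid S3 bucket name: {name!r}")
    return name

bucket = make_bucket_name("sudoamit")
print(f"aws s3 mb s3://{bucket}")
print(f"aws s3 cp retail_analytics.csv s3://{bucket}/")
```

Passing this check does not guarantee that the name is free; global uniqueness is only confirmed when aws s3 mb succeeds.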
Switch back to the browser tab where you have Amazon Forecast open to pick up where we left off. The console should now be in the ‘Import target time series dataset’ screen.
Enter a descriptive name for ‘Dataset import name’.
Leave the ‘Timestamp format’ as-is.
For ‘IAM Role’, click on drop-down and choose
Click on ‘Create role’
For ‘Data location’, enter the S3 path of the file that you uploaded earlier, like so: s3://[SOME_UNIQUE_NAME]-retail-forecast/retail_analytics.csv
Click on ‘Start Import’. If all is successful, you should see a flash message.

This should take around 2 minutes. Refresh the browser a few times to see if it is complete.
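
For reference, the same import can be described as a request to Forecast's CreateDatasetImportJob API. The sketch below only assembles the request; the ARNs, job name, and role name are placeholders, and the timestamp format is assumed to match the console default that we left as-is.

```python
def build_import_request(job_name, dataset_arn, bucket, role_arn):
    """Assemble keyword arguments for Forecast's CreateDatasetImportJob call."""
    return {
        "DatasetImportJobName": job_name,
        "DatasetArn": dataset_arn,
        "DataSource": {
            "S3Config": {
                "Path": f"s3://{bucket}/retail_analytics.csv",
                "RoleArn": role_arn,
            }
        },
        # Assumption: this is the format the console shows by default.
        "TimestampFormat": "yyyy-MM-dd HH:mm:ss",
    }

# Placeholder values; substitute the ARNs from your own account.
request = build_import_request(
    job_name="retail_import",
    dataset_arn="arn:aws:forecast:us-west-2:123456789012:dataset/JulyToSeptemberSales",
    bucket="sudoamit-retail-forecast",
    role_arn="arn:aws:iam::123456789012:role/ForecastS3AccessRole",
)
print(request["DataSource"]["S3Config"]["Path"])

# With credentials configured, the request could be sent with boto3:
#   import boto3
#   boto3.client("forecast").create_dataset_import_job(**request)
```

Note that the role must allow Amazon Forecast to read the bucket, which is what the role created in the console takes care of.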

Step D

We’ll now train a Predictor on this dataset that we just imported.

Click on ‘Start’ under ‘Train a predictor’ (in the middle column)
Enter something descriptive for ‘Predictor name’
For ‘Forecast horizon’ enter 30 days (we’ll attempt to predict demand over the next 30 days)
For ‘Forecast frequency’ enter ‘1’ in the first drop down and ‘day’ in the second. Our forecast frequency will be daily.
For ‘Algorithm selection’, choose the ‘Automatic’ option. Amazon Forecast will evaluate the available forecasting algorithms and choose the one that performs best on our dataset.

Forecast supports 5 algorithms: ARIMA, DeepAR+, ETS, NPTS, and Prophet. You can read more about these algorithms here: https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-choosing-recipes.html
For ‘Forecast dimensions’, click the drop-down and choose ‘Location’
OPTIONAL We’re not assuming any holidays in this simulation, so you can choose to leave the ‘Country for holidays’ blank.

Review your choices on the Train predictor screen.
Click on ‘Train Predictor’. If all succeeds, you will see a screen that shows training is in progress.
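
The console choices above correspond roughly to the parameters of Forecast's CreatePredictor API. The following is a minimal sketch that only builds the request; the predictor name and dataset group ARN are placeholders, and the mapping of ‘Automatic’ to PerformAutoML is our assumption about what the console does.

```python
def build_predictor_request(name, dataset_group_arn):
    """Assemble keyword arguments for Forecast's CreatePredictor call."""
    return {
        "PredictorName": name,
        "ForecastHorizon": 30,  # 30 steps at the forecast frequency (30 days)
        "PerformAutoML": True,  # assumed equivalent of 'Automatic' selection
        "InputDataConfig": {"DatasetGroupArn": dataset_group_arn},
        "FeaturizationConfig": {
            "ForecastFrequency": "D",  # daily forecasts
            "ForecastDimensions": ["location"],
        },
    }

# Placeholder values; substitute the ARN of your own dataset group.
predictor_request = build_predictor_request(
    name="retail_predictor",
    dataset_group_arn="arn:aws:forecast:us-west-2:123456789012:dataset-group/retail",
)
print(predictor_request["ForecastHorizon"], predictor_request["FeaturizationConfig"])

# With credentials configured, the request could be sent with boto3:
#   import boto3
#   boto3.client("forecast").create_predictor(**predictor_request)
```

The horizon is expressed in units of the forecast frequency, so 30 with a daily frequency means 30 days, even though the training data is hourly.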

Recap

This training step can take a while, at least 20 to 30 minutes, so we are going to jump from here to Lab 1.

Just to recap, what we’ve done so far is:

Simulated an aggregated retail analytics dataset.
Copied this dataset to an S3 bucket.
Imported this dataset into Forecast.
Launched a job to train a predictor on this data.

At this point, we’ll jump to Lab 1 and pick this thread back up when the training is done.

Step E

Once the predictor is trained, we’ll generate forecasts.

Click on ‘Start’ under ‘Generate forecasts’
Enter a descriptive name against ‘Forecast name’
For the ‘Predictor’ drop-down, choose the predictor that we just trained in the previous step.
Click on ‘Create a Forecast’
If successful, you should see a flash message.

Note: This step, though shorter than training a predictor, should still take around 10 to 15 minutes. Given the time required for this step to complete, let’s jump to Lab 2 and pick this back up when it is complete.

Step F

After the forecast generation is complete, we can look up forecasts for specific items. Click on ‘Lookup Forecast’
In the ‘Forecast lookup’ screen, enter a name for the ‘Forecast’
We’ll do our 30-day forecast from Sep 29 through Oct 29. So, for the ‘Start date’, enter 2019/09/29 as the date and 09:00:00 as the time.
For the ‘End date’, enter 2019/10/29 as the date and 17:00:00 as the time.
For ‘Forecast key’, item_id should have already been chosen by default (required by default, since this is what we’re forecasting)
For ‘Value’, enter ‘staplers’
Click on ‘Get Forecast’
You should subsequently see a graph with P90, P50, and P10 forecasts.
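
As a quick check of the lookup window used above, this sketch derives the end date from the start date and the 30-day forecast horizon. The 09:00 and 17:00 times are the ones entered in the steps above; the variable names are our own.

```python
from datetime import datetime, timedelta

FORECAST_HORIZON_DAYS = 30

# Start of the lookup window, as entered for 'Start date'.
start = datetime(2019, 9, 29, 9, 0, 0)
# 30 days later, with the time set to the 17:00 used for 'End date'.
end = (start + timedelta(days=FORECAST_HORIZON_DAYS)).replace(hour=17)

print("Start date:", start.strftime("%Y/%m/%d %H:%M:%S"))
print("End date:  ", end.strftime("%Y/%m/%d %H:%M:%S"))
```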

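Results Explained: The P10, P50, and P90 values are quantile forecasts. Actual demand is expected to be at or below the P10 value 10% of the time, the P50 value 50% of the time, and the P90 value 90% of the time. Put differently, stocking to the P90 forecast gives roughly a 90% probability of satisfying actual demand.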
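
To build some intuition for these numbers, here is a small sketch that computes P10, P50, and P90 from simulated demand and then measures how often each stock level would have covered demand. The demand values are randomly generated for illustration and are not output from Amazon Forecast.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical simulated daily demand for one item.
simulated_demand = [rng.gauss(100, 15) for _ in range(1000)]

# quantiles(..., n=10) returns 9 cut points: P10, P20, ..., P90.
deciles = statistics.quantiles(simulated_demand, n=10)
p10, p50, p90 = deciles[0], deciles[4], deciles[8]

def share_of_days_covered(stock_level):
    """Fraction of simulated days on which stock_level meets demand."""
    covered = sum(demand <= stock_level for demand in simulated_demand)
    return covered / len(simulated_demand)

for label, level in (("P10", p10), ("P50", p50), ("P90", p90)):
    share = share_of_days_covered(level)
    print(f"{label}: stocking {level:.1f} units covers {share:.0%} of days")
```

Which quantile to plan against is a business decision: P90 reduces the chance of running out of stock at the cost of carrying more inventory, while P10 does the opposite.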

Running the Code (OPTIONAL)

While you can directly use the pre-generated retail_analytics.csv file to generate forecasts, you can also modify the gen_aggregate_pos_data.rb Ruby script (or gen_aggregate_pos_data.py in Python) that generates this file, produce a new retail_analytics.csv, and see how the forecasts differ based on your changes.

To run this code:

cd into the lab3 directory

$ cd lab3

install bundler (which is used to install ruby dependencies)

$ gem install bundler

then install dependencies

$ bundle install

and run the script like so…

$ ruby gen_aggregate_pos_data.rb
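
If you would like a feel for what such a generator involves before editing the real script, here is a minimal sketch in Python. It is not the contents of gen_aggregate_pos_data.py; the item list, store list, quantity range, and generate_rows helper are all assumptions made for illustration.

```python
import csv
import io
import random
from datetime import datetime, timedelta

# Assumed items and stores, taken from the sample table in this lab.
ITEMS = ["staplers", "post-its", "markers"]
STORES = ["San Francisco", "New York", "Los Angeles"]

def generate_rows(start, hours, seed=0):
    """Yield one (time, item, quantity, store) row per item, store, and hour."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    for offset in range(hours):
        ts = start + timedelta(hours=offset)
        for item in ITEMS:
            for store in STORES:
                quantity = rng.randint(20, 40)  # assumed hourly sales range
                yield (ts.strftime("%Y-%m-%d %H:%M:%S"), item, quantity, store)

# Written to memory here; use open("retail_analytics.csv", "w", newline="")
# instead to produce a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(generate_rows(datetime(2019, 7, 1, 9), hours=2))
print(buffer.getvalue())
```

Changing the quantity range, or making it depend on the hour or the store, is the kind of modification that should show up as a visible difference in the resulting forecasts.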
