Sales season is on. Black Friday and Christmas are traditionally some of the promotion strongest dates during the year. However for many marketing analysts and data driven marketers a lot of uncertainty develops on how to predict the outcome of promotional efforts. Obviously most promotions will probably lead to more orders, but to how much incremental revenue will it lead? Can it even have a negative effect by cannibalizing sales in the time following the promotions? Or is the overall effect positive nonetheless?
So imagine if you had a predictive model with which you can forecast the effects of your planned promotions. Giving you more certainty on how big your sales revenue will be.
This guide provides step-by-step instructions on how develop such a forecasting model for your business based on past promotions. All you need is Google Sheets and a free solver add-on. I recommend working along in the provided sheet to truly understand everything.
Defining the task
For this example you are given the historical daily revenue for an online retailer for the last two and a half years up until 2018-11-28 (obviously the techniques in this model can be used for other products such as online courses, software and others as well). In the past years the retailer ran 20% and 30% promotions reducing the price of all sold products respectively. Promotions were run on random days as well as during Black Friday and before Christmas.
We now want to find out a) what the impact of those promotions is, b) if it makes more sense to run a 20% or 30% promotion for overall revenue and c) if a promotion should be run during upcoming Christmas time.
Preparing the data
Formatting and cleaning data is usually the first step you have to take for every data analysis. As we are dealing with quite a simple data set here (revenue and dates) and the data has been prepared upfront there is not much to do in terms of cleaning and formatting.
However we still have to do some preparations for our upcoming analysis. As we want to investigate the impact of promotions we have to indicate the dates on which promotions happened. Furthermore we need to indicate if the promotions had any visible impact on the days after the promotions, i.e. negatively due to the cannibalization of revenue.
Both has been done already in the provided sheet. In column E - H a “1” indicates that a event has occurred respectively. E.g. a “1” In cell E2 would mean a 30% promotion has happened on 2016-01-01, while a 1 in cells E3 - E5 means a negative after-promotion effect was observed on those days.
Days of the week and seasonality usually have an impact on sales as well (e.g. weekdays vs. weekends, summer month vs December), which is why we want to factor in those as well.
There are two simple formulas we can use here: MONTH() and WEEKDAY() which will give you month and the day of the week for a reference cell respectively. Those formulas have been used in column B and C.
That’s it for preparation. Obviously even though everything above is already provided in our example data you would still have to do it for your own datasets.
Building a basic model
What we are going to do is build is a model, which forecasts sales based on the impact of seasonality and promotions. We will have the following five factors influencing our sales prediction:
- A constant as a basis
- Influencing effects for the day of the week
- Influencing effects for the month
- Influencing effects for a promotion day
- Influencing effects for the days after a promotion
This means for each day the forecast of revenue will be calculated by the following equation:
Predicted revenue = Constant + [day of the week effect] + [month effect] + [20% promotion effect] + [days after 20% promotion effect] +[30% promotion effect] + [days after 30% promotion effect]
Put those individual parameters in cell Q2:Q27 and a 100 in the respective cells in column R. Those are only trial values and will act as coefficients.
We will use the solver to estimate the coefficients, which will make the model fit best to our data, i.e. minimizing the (squared) error between our forecasts and the historical data. But more on that later on .
Next step is to write the actual formula, which corresponds to above equation. In cell M2 put the following formula and drag it down the whole column:
=$R$2 + VLOOKUP(B2,$Q$4:$R$15,2,FALSE) + VLOOKUP(C2,$Q$17:$R$23,2,FALSE) + E2*$R$24 + F2*$R$26 + G2*$R$25 + H2*$R$27
R2 represents the constant. VLOOKUP(B2,$Q$4:$R$15,2,FALSE) picks up the coefficient for each month while VLOOKUP(C2,$Q$17:$R$23,2,FALSE) picks up the coefficient for the day of the week. If a promotion occurred on a given day E2$R$24 and G2$R$25 will pick up the effects of a 20% and 30% promotion respectively, while F2$R$26 and H2$R$27 will do the same for after promotion day effects.
Now you’ll calculate squared error for each day. This can be done simply by putting =(D2-M2)^2 into N2 and dragging it down the column. Sum all squared errors up in cell Y4.
One last thing you have to do before using the solver is to average all values of the day of the week and months values. You can do that by putting =AVERAGE(R17:R23) in into V4 and =AVERAGE(R4:R15) into V5. We are doing this as we will add constraints to the Solver model, which constrain the average day of the week and the average month to equal 0. We are doing this as those constraints make sure that a day of the week or month with a positive effect has a higher than average revenue and a day of the week or month with a negative effect has a lower than average revenue.
If you haven’t so far: No it’s time to download the solver. After doing so click on Add-ons -> Solver -> Start to set it up:
We want to minimize objective cell Y4 by changing values R2:R27, while constraining V4:V5 = 0.
So basically what we are doing is we are minimizing the squared error between the historical revenue data and our fictional forecast by changing the coefficients for each of our above defined five factors. When you are done setting up the Solver click on “solve”.
The outcome tells you a couple of things. E.g. a 30% promotion day raises (all else being equal) the revenue by $6116. Weekends are revenue wise a lot better than other week days and December accounts for a major sales raise as well. Days after a promotion usually lead to a drop in revenue.
Evaluating the model
You can evaluate the model by calculating the R2 value between the forecasts and the actual historical revenue. Plug =RSQ(D2:D1064,M2:M1064) into cell Y5 for this. The formula calculates the percentage of the variation in revenue, which is explained by the forecasting model. In our case we should get a 0.59, which means that the independent variables of the five factors explain 59% of the daily variation of the revenue. Obviously as such the model has some room for improvement.
In order to do so we’ll try to spot any outliers in the daily forecast accuracy. As a first step calculate the error between the forecast and the historical daily revenue. Do so by putting =D2-M2 into cell O2 and copying it down the column. Calculate a close approximation to the standard error of the forecast with the standard deviation of the errors with the formula =STDEV(O2:O1064) in cell Y6.
We define outliers if the absolute value of the forecast error exceeds two times the standard error. So as a next step we want to identify the outliers fitting into this definition. For this we will use conditional formatting. Select range O2:O1064 and click on Format -> Conditional Formatting -> Add new rule. In the "Format if…" dropdown menu choose "Custom formula is" and put =abs(O2)>=2*$Y$6 in the box.
This will highlight all outliers within above definition.
And, surprise surprise we will find clusters of outliers around Black Friday and before Christmas, both we did not include in our original model. However, obviously people are shopping a lot more during these dates (e.g. 12/01 - 12/17) and less after (e.g. 12/18 - 12/31).
Improving the model
Now that we know what caused the outliers and thus a lower forecasting accuracy we can improve our model by adding those factors to our original five factors.
As such we’ll add Black Friday, After Black Friday, Christmas sales, After Christmas into the headers of the columns I:L in order to be able to mark the occurrence of those events for each day similar to what we did before for the promotions. As such we’ll add the coefficients for those new factors in Q28:R31 as well.
Last adjustment we have to do now is to add the factors in our forecasting calculation. Put the following formula into Cell M2 and drag it to M1064 (we added the Black Friday, Christmas and the days after those events respectively as factors):
=$R$2 + VLOOKUP(B2,$Q$4:$R$15,2,FALSE) + VLOOKUP(C2,$Q$17:$R$23,2,FALSE) + E2*$R$24 + F2*$R$26 + G2*$R$25 + H2*$R$27 + I2*$R$28 + J2*$R$29 + K2*$R$30 + L2*$R$31
Now run the solver again with the added variables:
Adding those variables improved R2 drastically to 0.88, which means the variables explain 88% of the daily variation of the revenue
The actual forecasting
So far we have only worked on the prediction model. Next step is obviously to forecast sales revenue with this model until the end of the year and to examine what impact a 20% or 30% promotion has. This actually the easy part.
First copy the formula from cell M1064 to all cells M1065:M1097. Next mark the Christmas sales time and days after the Christmas sales in column K and L (this is quite easy here as the change in sales revenue is quite obvious. However it might be more difficult to spot in real life data.)
This will give you a prediction on what revenue to expect for December without any promotions: An overall revenue of $175,734.
Let’s try now what will happen, if we have a 20% promotion on 2018-12-01. Plug in the indicators into G1067 and for the after promotion effects into H1068:H10670 respectively. The overall revenue for December would be $175,082 now.
Delete the indicators for the 20% promotion and try the same for a 30% promotion by writing a “1” into E1067 and F1068:F10670. Overall revenue for December would be now $177,291. Not much of a difference but still it is better to go with a 30% promotion.
That’s it, we are done! We are now able to answer all of our initial questions: We know what revenue impact the promotions have respectively and which promotion we should run during Christmas time
You might wonder, why we did not solve the problem with a regression analysis tool. The answer to this is actually quite simple:
For a regression analysis with a typical spreadsheet analysis add-on the dependent variable would have been revenue. In addition we would have needed 25 independent variables to account for the different factors (months, day of the week, promotions, Black Friday, Christmas, after event days).
Most analysis add-ons simply can’t handle that many independent variables (including the Excel Analysis ToolPak, which can handle up to 15 independent variables). That’s why we use the Solver, which allows a lot more changing cells) to estimate the coefficients of the independent variables in order to minimize the sum of squared errors.
In addition as with all models above created model is only a simplification of reality and can never give a 100% accurate forecast. Furthermore in this case the revenue numbers were simplified as well. As such revenue growth wasn’t included. In addition events and after event days were very easily identifiable and could be clearly distinguished from normal days. As mentioned this might be more difficult for a real life data sets.
Nevertheless the above guide provides a good basis to develop your own forecasting models and to fit a model to your revenue data and events. So next step for you is to fit it to your own data set and to predict if promotions for Christmas will make sense for you!