A Machine Learning Approach to Inventory Demand Forecasting
The problem of Inventory Demand Forecasting is extremely simple to understand, yet challenging to solve optimally. The classic example is a grocery store that needs to forecast demand for perishable items. Purchase too many and you’ll end up discarding valuable product. Purchase too few and you’ll run out of stock. Numerous businesses face different flavors of the same basic problem, yet many of them use outdated or downright naive methods to tackle it (like spreadsheet-guided, stock-boy-adjusted guessing). In this article, I’ll outline a scientific approach to inventory demand forecasting using Machine Learning.
The Value Add
It’s worth discussing the value add of accurately forecasting inventory demand. There are two ways such a model can add value: 1) by reducing the number of overstocked items and 2) by reducing the number of understocked items.
Overstocked Items
In the case of grocery stores, overstocked perishable items result in a direct loss since expired perishable items must be discarded. For example, consider a grocery chain operating ten stores. Each store sells 100 perishable products with an average unit cost of $3. On a typical day each store restocks its shelves with ten units of each product, replacing items that were either 1) sold or 2) discarded. If just 5% of the outflowing items are discarded, this is a loss of
10 stores
x 100 products per store
x 10 items per product off-shelved per day
x 0.05 discard rate
= 500 items discarded per day
or equivalently $547,500 discarded per year. If a forecasting model can reduce discarded items by 10%, from 500 per day to 450 per day, this would result in annual savings of $54,750 (assuming the model does not increase the number of understocked items).
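The arithmetic above is easy to sanity-check in a few lines of Python (the figures are the hypothetical ones from the example):

```python
# Hypothetical figures from the example above.
stores = 10
products_per_store = 100
items_off_shelved_per_day = 10   # per product
discard_rate = 0.05
unit_cost = 3.00                 # average cost per item, in dollars

items_discarded_per_day = (
    stores * products_per_store * items_off_shelved_per_day * discard_rate
)  # 500 items per day

annual_discard_cost = items_discarded_per_day * unit_cost * 365  # $547,500

# A model that cuts discards by 10% saves 10% of that cost.
annual_savings = 0.10 * annual_discard_cost  # $54,750
print(f"Annual discard cost: ${annual_discard_cost:,.0f}")
print(f"Savings from a 10% reduction: ${annual_savings:,.0f}")
```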
Other retail companies face a different issue with overstocking. For example, a clothing store that overstocks on winter coats will 1) consume limited store space that could have been used to sell winter boots and 2) potentially be forced to sell the coats at a discount or a loss come spring to make space for new items. These financial losses are harder to measure, but depending on the size of the business and the quality of the existing inventory demand forecasting model, losses can reach tens or hundreds of thousands of dollars annually.
Understocked Items
Understocking items is potentially a more severe issue. Understocking milk in a grocery store is likely to drive customers to a competitor. Losing a customer can cost hundreds if not thousands of dollars. Similarly, understaffing a restaurant on a busy day can result in long wait times and poor service, which would also result in lost customers. Estimating the financial losses from understocking can be difficult since it’s not clear how much product would have been sold had the product been available. It’s even less clear how much customer value is lost when understocking drives customers away from a business. Nonetheless, it’s worth analyzing the frequency with which products run out of stock and attempting to estimate the financial loss of such occurrences.
Phases of the project
1) Data
As with any machine learning project, the first step is to collect, interpret, and analyze data. To implement a forecasting model, you should ideally have historical data covering
- which products were in stock each day/week
- how many units of each product were sold per day/week
- when each product was promoted (e.g. when products were on sale)
- which days/hours your business was open
If your business is affected by seasonal trends, you should collect at least a couple of years’ worth of data. Other datasets that may feed into a forecasting model include things like weather, holidays, event dates, etc., but these can usually be obtained from a third party. The important thing is that you collect and curate as much of the stock and sales data as possible. Machine learning models are trained on historical data, working under the assumption that past data can be used to make predictions about the future, and it’s important to be able to back-test a model to make sure it would have worked across different time periods.
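To make this concrete, here’s a minimal sketch of what the curated data might look like using pandas. The table and column names are illustrative assumptions, not a prescribed schema:

```python
import pandas as pd

# Illustrative daily sales history: one row per (store, product, date).
sales = pd.DataFrame({
    "store_id":   [1, 1, 3],
    "product":    ["apples", "milk", "milk"],
    "date":       pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-01"]),
    "units_sold": [4, 26, 21],
})

# Illustrative promotions table: when each product was on sale.
promotions = pd.DataFrame({
    "product":    ["apples"],
    "start_date": pd.to_datetime(["2018-01-01"]),
    "end_date":   pd.to_datetime(["2018-01-07"]),
})

# Third-party data like weather and holidays can be joined in by date later.
```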
2) Evaluation Metric
The next step is to determine what to predict and how to measure the performance of predictions. There is no single best evaluation metric; the right choice depends on the goals of the business. For some companies, minimizing understocked items is ten times more important than minimizing overstocked items, while for others the ratio might be 3 to 1. However, certain evaluation metrics are easier to minimize/maximize using standard machine learning models. For a grocery store, we may be asked to predict the sales of each (store, item, date), resulting in a table of predictions like
| Store | Product | Date | Predicted Sales |
|---|---|---|---|
| 1 | apples | 2018-01-01 | 4 |
| 1 | apples | 2018-02-01 | 5 |
| 1 | milk | 2018-01-01 | 26 |
| ... | ... | ... | ... |
| 3 | apples | 2018-02-01 | 9 |
| 3 | milk | 2018-01-01 | 21 |
| 3 | milk | 2018-02-01 | 8 |
Once we know the true sales, we could use a simple metric like Root Mean Squared Logarithmic Error (RMSLE) to evaluate the accuracy of the model. I’ll avoid the gritty details here, but the metric reports a single positive value like 1.4 or 0.7 that scores the overall error of the model. Lower values are better, and the metric has a particularly nice property in that it penalizes the ratio of predicted-to-actual values: predicting sales of 5 turkeys when only 1 turkey is sold will be penalized more than predicting sales of 100 apples when only 90 are sold. The big takeaway here is that selecting an appropriate evaluation metric is extremely powerful because it quantifies the model’s performance with a single number. This allows you to make statements like “Model A is better than Model B” or “The model’s performance improves by 0.07 when we include feature X.” You could also use the same evaluation metric to measure the performance gain provided by a machine learning model versus whatever technique you currently use to forecast demand.
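For the curious, RMSLE is straightforward to implement. Here’s a minimal sketch in Python using numpy, reproducing the turkey/apple comparison above:

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error.

    Works on the log of (value + 1), so it effectively penalizes the
    ratio of predicted to actual values rather than their difference.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

print(rmsle([1], [5]))      # ~1.10 -- predicted 5 turkeys, sold 1
print(rmsle([90], [100]))   # ~0.10 -- predicted 100 apples, sold 90
```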
3) Validation Framework
Once an evaluation metric has been selected, you’ll need a framework for validating model predictions. In the case of inventory demand forecasting, an example framework would be
- Train a model using only data known as of 2016-12-31. Predict the sales for items in the range [2017-01-01 through 2017-01-14]. Measure the performance, $P_1$
- Train a model using only data known as of 2017-01-14. Predict the sales for items in the range [2017-01-15 through 2017-01-29]. Measure the performance, $P_2$
- Train a model using only data known as of 2017-01-29. Predict the sales for items in the range [2017-01-30 through 2017-02-13]. Measure the performance, $P_3$
- Average the performance scores, $P_{Average} = (P_1 + P_2 + P_3)/3$.
Repeating this procedure over more time periods will increase your confidence in the model’s ability to forecast demand. Developing a cross-validation framework like this allows you to compare different models and gives you a sense of how well a model will perform in practice before you actually implement it. The crucial requirement of this framework is that when you make predictions for the period [date1 through date2], the model must not be allowed to see any information/data on or after date1. Such a mistake is called leakage and is a common pitfall among model builders. Leakage results in over-estimating a model’s performance compared to how it will perform on real future data.
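Here’s a minimal sketch of the framework in Python. It assumes the sales frame sketched earlier and the rmsle() metric defined above; fit_model() is a hypothetical stand-in for whatever model you train:

```python
import numpy as np

# Each fold: (train cutoff, prediction window start, prediction window end).
folds = [
    ("2016-12-31", "2017-01-01", "2017-01-14"),
    ("2017-01-14", "2017-01-15", "2017-01-29"),
    ("2017-01-29", "2017-01-30", "2017-02-13"),
]

scores = []
for cutoff, start, end in folds:
    # Leakage guard: train only on data known as of the cutoff date.
    train = sales[sales["date"] <= cutoff]
    test = sales[(sales["date"] >= start) & (sales["date"] <= end)]

    model = fit_model(train)              # hypothetical training helper
    predictions = model.predict(test)     # hypothetical prediction call
    scores.append(rmsle(test["units_sold"], predictions))

p_average = np.mean(scores)
print(f"P_average = {p_average:.3f}")
```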
My suggestion to any company looking to build a forecasting model is to ask the data scientist to provide his or her cross-validation score along with a set of genuine future predictions. Do this before you put the model into production. Then wait and see if the real predictions align with your sales, and check whether the score on those future predictions aligns with the score(s) achieved during cross-validation. It’s often the case that data scientists achieve good results on past data but perform poorly on real future data due to flaws in their cross-validation framework.
4) Develop a benchmark model
A simple benchmark model might be: for a given (store, item, day), predict the sales to be exactly what they were one year ago. Another benchmark might be to ask the stock-boy to make his best predictions of demand. Benchmark models are important because they put the evaluation metric into context by providing a baseline score. For example, if your stock-boy’s predictions yield a Root Mean Squared Logarithmic Error of 0.8, you’d absolutely expect to see a model you purchased yield a lower error. If your model scores 0.8 or above, it probably either has a bug or is poorly designed (or you have a really smart stock-boy).
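The “same as last year” benchmark takes only a few lines with pandas. A sketch, assuming the sales frame from earlier and a test frame holding the (store, product, date) rows you want predictions for:

```python
import pandas as pd

# Shift last year's history forward one year so it lines up with the
# dates we want to predict.
benchmark = sales.copy()
benchmark["date"] = benchmark["date"] + pd.DateOffset(years=1)
benchmark = benchmark.rename(columns={"units_sold": "predicted_sales"})

# Join onto the rows to be predicted; products with no history a year
# ago get a prediction of 0.
predictions = test[["store_id", "product", "date"]].merge(
    benchmark, on=["store_id", "product", "date"], how="left"
)
predictions["predicted_sales"] = predictions["predicted_sales"].fillna(0)
```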
5) Implementation
Last but not least, you’ll need to design and implement a model. Machine learning models like ridge regression, tree-based learners, neural networks, etc. may all be well suited to the task. However, model selection tends to be over-valued by data scientists. The difference between a gradient boosting model and a random forest model is marginal compared to the improvement you’ll see by including more data samples and better quality features. The benefit of using cutting-edge, complex methods is rarely worth the cost of debugging them and trying to explain why they occasionally break or make weird predictions. Additionally, post-processing techniques can be effective in cleaning up nonsensical predictions. (For example, if a model predicts negative sales, the prediction will obviously need to be post-processed to 0.)
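To close the loop, here’s a deliberately plain implementation sketch using scikit-learn. The feature columns are illustrative assumptions on the train/test frames from earlier; the point is a simple tree-based learner followed by post-processing, not a prescription:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative feature columns, assumed to exist on the train/test frames.
features = ["day_of_week", "month", "on_promotion", "units_sold_last_year"]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(train[features], train["units_sold"])

raw_predictions = model.predict(test[features])

# Post-process nonsensical outputs: sales can never be negative.
predictions = np.clip(raw_predictions, 0, None)
```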
Credit to Kaggle for introducing me to this problem via the Corporacion Favorita Grocery Sales Forecasting competition.
The Size Curve Problem
A highly related problem is “the size curve problem,” whereby a retailer has to determine how many of each size or variant of a particular product to order. Check out my write-up on this problem here.