Sales Forecasting with Prophet

From a business strategy perspective, forecasting accuracy can be one of the most important factors in a company's success, depending on the industry. This is certainly the case in supply chain, manufacturing, retail, and most other industries that involve trading physical goods.

Why forecasting matters so much might not be obvious at first. Most people associate bad forecasts with lost sales (under-forecasting) or storage costs (over-forecasting). But for some businesses (especially in trading), the most pressing cost of inaccurate sales forecasts is actually the cost of working capital. Stock sitting unsold in the warehouse means not only warehouse costs but also no cash coming in to pay your suppliers and staff. Borrowing that cash from the bank comes with an 8-15% interest rate, which is never fun.
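
To put a rough number on the working-capital cost, the sketch below computes the interest on cash tied up in unsold stock. It assumes the stock is financed with a bank loan at simple annual interest on a 365-day year. The $500,000 inventory value and the 90-day holding period are hypothetical. Only the 8-15% rate range comes from the paragraph above.

```python
def financing_cost(inventory_value: float, annual_rate: float, days_held: int) -> float:
    """Interest paid on borrowed cash tied up in unsold stock.

    Assumes simple (non-compounding) annual interest on a 365-day year.
    """
    return inventory_value * annual_rate * days_held / 365


# Hypothetical scenario: $500,000 of excess stock sits unsold for 90 days.
excess_stock = 500_000
for rate in (0.08, 0.15):  # low and high end of the 8-15% range above
    cost = financing_cost(excess_stock, rate, days_held=90)
    print(f"{rate:.0%} interest: ${cost:,.0f} of financing cost")
```

At those rates, the interest alone comes to roughly $9,900 to $18,500, before any warehouse cost is counted.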

In this post, I want to walk through a sales forecasting Kaggle competition I participated in, using Facebook Prophet. The key feature of this post is bringing real-life business practice to measuring the accuracy of forecasting models. If you would like to see the detailed code (Jupyter Notebook), please check it out on my GitHub.

1. Key Focus in EDA

Using this retail dataset, I want to emphasize a few key areas of focus in the EDA stage. First of all, retail is, for the most part, an industry with strong seasonality in sales. For instance, people are less inclined to go out in winter than in summer, so retail sales are stronger in the summer months. The same goes for weekends vs. weekdays within a week, and working hours vs. off-hours within a day.
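
A quick way to check these seasonalities during EDA is to average sales by calendar bucket. The sketch below uses only the standard library and a small synthetic series. The series is a flat baseline of 100 with a weekend uplift of 40, invented purely for illustration. With the real Kaggle data, you would feed in its own date and sales columns instead.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def seasonal_profile(records, key):
    """Average sales per seasonal bucket; `key` maps a date to its bucket."""
    buckets = defaultdict(list)
    for day, sales in records:
        buckets[key(day)].append(sales)
    return {bucket: mean(values) for bucket, values in sorted(buckets.items())}


# Synthetic daily sales for four weeks: baseline 100, plus 40 on Saturdays and Sundays.
start = date(2023, 1, 2)  # a Monday
records = []
for offset in range(28):
    day = start + timedelta(days=offset)
    sales = 100 + (40 if day.weekday() >= 5 else 0)  # weekday(): Monday=0, Sunday=6
    records.append((day, sales))

weekday_names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
by_weekday = seasonal_profile(records, key=lambda d: d.weekday())
for weekday, avg_sales in by_weekday.items():
    print(weekday_names[weekday], avg_sales)
```

Swapping the key for `lambda d: d.month`, or for `lambda ts: ts.hour` on timestamped data, gives the yearly and intraday profiles. Prophet fits weekly and yearly seasonality automatically when there is enough history, because the `weekly_seasonality` and `yearly_seasonality` arguments default to `'auto'`. It does the same for daily seasonality on sub-daily data. A profile like this is therefore mainly a sanity check on what the model should pick up.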

2. Models and Parameters

First of all, there is no one-size-fits-all when it comes to modeling. When you are solving a data science question, selecting the model that best fits the dataset is fine. However, in the real business world, there are no "datasets", only historical figures that we are trying to draw insights from in our attempt to predict the future. Like most people, I would advise testing a few models before using the best one. One thing I would add from experience, though, is that model selection should be repeated for every new forecast as time goes on. The model that fit the history best last time is not guaranteed to be the best once new data arrives.
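
Re-running model selection at every forecast origin can be sketched as a small backtest. Hold out the most recent week, score each candidate on it, and keep the winner. The three candidates here (naive, seasonal naive and a 7-day moving average) are simple stand-ins chosen for illustration, not the models from the competition. Mean absolute error on a one-week holdout is likewise an assumed scoring rule. In practice, Prophet and any other contenders would sit in the same candidate table.

```python
from statistics import mean


def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive(history, horizon, season=7):
    """Repeat the value observed on the same day one season (here, one week) earlier."""
    return [history[-season + (h % season)] for h in range(horizon)]


def moving_average(history, horizon, window=7):
    """Repeat the mean of the last `window` observations."""
    return [mean(history[-window:])] * horizon


CANDIDATES = {"naive": naive, "seasonal_naive": seasonal_naive, "moving_average": moving_average}


def mae(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def select_model(history, horizon=7):
    """Backtest every candidate on the last `horizon` points and return (best_name, scores)."""
    train, holdout = history[:-horizon], history[-horizon:]
    scores = {name: mae(holdout, model(train, horizon)) for name, model in CANDIDATES.items()}
    return min(scores, key=scores.get), scores


# Hypothetical series: four weeks with a clear weekly pattern.
weekly = [100, 100, 100, 100, 100, 140, 140] * 4
print(select_model(weekly)[0])  # seasonal_naive

# Two weeks later, the pattern has given way to a steady upward trend.
trend = list(range(150, 164))
print(select_model(weekly + trend)[0])  # naive
```

The specific winner matters less than the fact that it changed once newer data came in. That is why the selection step belongs inside every forecasting cycle rather than being done once up front.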

I tried over 1,000 hyper-parameter combinations for my model within my resource constraints. Just like in any business, there are always limitations, and we do our best with what we have.
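
The post does not spell out how the 1,000-plus configurations were chosen. The sketch below shows one common way to stay within a fixed budget: random search, which samples a capped number of combinations from a larger grid. The argument names (`changepoint_prior_scale`, `seasonality_prior_scale`, `holidays_prior_scale`, `seasonality_mode`, `changepoint_range`, `n_changepoints`) are real Prophet constructor parameters. The candidate values, the budget of 1,000 and the seed are illustrative assumptions, not the settings used in the competition.

```python
import itertools
import math
import random

# Candidate values for real Prophet constructor arguments (values are illustrative assumptions).
param_grid = {
    "changepoint_prior_scale": [0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 5.0, 10.0],
    "holidays_prior_scale": [0.01, 0.1, 1.0, 10.0],
    "seasonality_mode": ["additive", "multiplicative"],
    "changepoint_range": [0.8, 0.9, 0.95],
    "n_changepoints": [15, 25, 35],
}


def sample_configs(grid, budget, seed=0):
    """Random search: draw up to `budget` distinct configurations from the full grid."""
    keys = list(grid)
    all_combos = list(itertools.product(*(grid[k] for k in keys)))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = rng.sample(all_combos, k=min(budget, len(all_combos)))
    return [dict(zip(keys, combo)) for combo in chosen]


grid_size = math.prod(len(values) for values in param_grid.values())
configs = sample_configs(param_grid, budget=1000)
print(f"Evaluating {len(configs)} of {grid_size} possible configurations")
```

Each sampled dict can then be passed to `Prophet(**config)`, fitted, and scored with `cross_validation` and `performance_metrics` from `prophet.diagnostics`, keeping the configuration with the lowest error. An exhaustive grid search is the same loop without the sampling step. Its cost, however, grows multiplicatively with every parameter added, which is what a fixed budget guards against.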

3. Analysis of Results

How to measure the accuracy of results is an ongoing discussion in the data science community, with metrics such as RMSE. Given my business background, I tend to argue for measuring accuracy by what matters to the business. For instance, say I'm Zara and doing a forecasting exercise for a store in NY. Our goal is to have sufficient stock to sell. In our daily forecast, we modeled sales of 10 pcs for next Monday and 1 pc for next Tuesday. However, the actual results show that we sold 1 pc on Monday and 10 pcs on Tuesday. Did we have a horrible model? Yes, if we are measuring MSE on a daily basis. But in the real business world, this is a perfect model. Most supply cycles are weekly to monthly for most products across industries, so the daily split makes no difference to actual business planning.
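
The Zara example can be checked in a few lines by scoring the same forecast once per day and once per planning period. The `aggregate` helper and the 7-day period are assumptions standing in for a weekly supply cycle. Only Monday and Tuesday from the example are included, so the other days of the week are simply left out.

```python
def mse(actual, forecast):
    """Mean squared error between two equal-length sequences."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def aggregate(values, period):
    """Sum consecutive values into buckets of length `period` (e.g. 7 days -> 1 week)."""
    return [sum(values[i:i + period]) for i in range(0, len(values), period)]


# Pieces sold on Monday and Tuesday, taken from the Zara example.
forecast_daily = [10, 1]
actual_daily = [1, 10]

print("Daily MSE:", mse(actual_daily, forecast_daily))  # 81.0

weekly_forecast = aggregate(forecast_daily, period=7)
weekly_actual = aggregate(actual_daily, period=7)
print("Weekly MSE:", mse(weekly_actual, weekly_forecast))  # 0.0
```

The daily MSE of 81 flags a terrible model, while the weekly error is zero. The weekly figure is what the person placing the weekly replenishment order actually cares about. The same idea applies to RMSE or any other metric: compute it at the granularity at which decisions are made.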

Another area that deserves more consideration is business relevance. Let's use the Zara example again (I think I just really like their clothes). If I'm a category manager, my monthly target is set on my categories. I need to find out in which stores my products are not selling according to forecast, and take corrective action. However, if I'm a Zara store manager, my priorities are different. I'm responsible for the overall sales of the store, as that's my target, so it makes no difference to me which products are selling and which are not. As long as I don't run into a stock shortage and have enough to sell, I'm happy.
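
The difference between the two managers' views comes down to which dimension the forecast variance is summed over. Below is a minimal sketch with made-up monthly figures for two hypothetical stores (`store_A`, `store_B`) and two categories. The variance is simply actual minus forecast units.

```python
from collections import defaultdict

# Hypothetical monthly figures: (store, category, forecast_units, actual_units).
rows = [
    ("store_A", "dresses", 120, 80),
    ("store_A", "jackets", 60, 100),
    ("store_B", "dresses", 90, 95),
    ("store_B", "jackets", 50, 45),
]


def variance_by(rows, key):
    """Sum (actual - forecast) per group; `key` picks the grouping from (store, category)."""
    totals = defaultdict(int)
    for store, category, forecast, actual in rows:
        totals[key(store, category)] += actual - forecast
    return dict(totals)


# Store manager's view: total sales per store against forecast.
store_view = variance_by(rows, key=lambda store, category: store)
# Category manager's view: total sales per category against forecast.
category_view = variance_by(rows, key=lambda store, category: category)
# Category manager's drill-down: which stores drive the gap in one category.
dress_by_store = variance_by(
    [r for r in rows if r[1] == "dresses"], key=lambda store, category: store
)

print(store_view)      # {'store_A': 0, 'store_B': 0}
print(category_view)   # {'dresses': -35, 'jackets': 35}
print(dress_by_store)  # {'store_A': -40, 'store_B': 5}
```

Both stores hit their totals exactly, so the store managers are happy. The category view, by contrast, shows dresses 35 units under forecast, mostly at `store_A`. The same forecast is perfect for one audience and calls for corrective action from the other.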

In other cases, for instance if the company is having cash flow issues, our forecast should be relevant to the people making stock management decisions or negotiating supplier contracts.

If you are interested in seeing my entire approach to the project, please feel free to check out my GitHub page.
