Large companies of all kinds define, evolve and rely on enterprise-wide forecasting systems that model and predict many aspects of business development. Central to such business analyses are revenue forecasting components that operate at multiple scales in time and across business enterprises. In large retail supermarket companies, forecasts are impacted by multi-scale influences such as company-wide policy, regional differences, variation across Categories of items bought and sold, and demand for individual items at individual stores, among many other influences on revenue streams. In large and diverse supermarket chains, forecast information at multiple levels of aggregation, devolving to groups of items (Categories) and groups of stores, referred to as Local Store Groups (LSGs), is utilized by down-stream decision makers in the enterprise. In this setting, we discuss aspects of a large case study that evolves modeling approaches to aid and inform these complex decision processes.
In business sales forecasting, information about demand filters from the bottom-up in terms of consumer behavior that underlies item-level sales. In parallel, information about supply, projected sales targets and macroeconomic considerations filter from the top-down, often in formats that are not easily compatible with statistical forecasting models. Models generating revenue forecasts for product Categories and groups of stores thus need to integrate bottom-up and top-down information. Forecast outputs also need to be in a form that Category-managers, store-managers and executives can utilize. In major companies with many stores and products, what may appear to be very small improvements in forecast accuracy at the levels of groups of items and groups of stores can translate to very major revenue impact at the enterprise level; hence modeling developments that yield apparently modest improvements at the “micro” levels are of major interest.
In this work, we discuss aspects of a long-term case study of revenue forecasting for a large grocery chain. There are two primary dimensions of interest: Local Store Groups, groups of policy-similar stores (in terms of geography or management); and Categories, defined groups of similar or related items on sale. The business setting defines a focus on forecasting revenue 12 weeks ahead for every LSG-Category pair. There are multiple challenges in this and related settings. While patterns of Category demand are related across LSGs, there is also considerable heterogeneity by LSG and Category. Sharing information has the potential to improve forecasts, especially for smaller LSGs and Categories, but it is not obvious at what level to share information due to the heterogeneity. Key questions arise on how to utilize Category-level information on discounts and pricing, in particular. The focus on longer-term forecasting, with a forecast horizon of 12 weeks or more to feed into longer-term planning and decisions, defines challenges to all forecasting approaches.
A number of down-stream business questions are informed by revenue forecasts. The primary interest is in forecasting for 12 weeks ahead to feed into pricing decisions; even very small improvements in forecast accuracy at LSG and Category levels can translate to large monetary gains across the system. The grocery chain is also interested in understanding the roles of pricing and promotion strategies, for both LSGs and Categories, and in exploring “What-if?” scenarios where pricing and discounts are altered and the impact of these changes assessed. This necessitates interpretable models such that: (i) the roles of such control and predictor variables can be assessed; (ii) users can intervene in the models in informed ways; and (iii) forecast uncertainties are fully characterized for proper use in down-stream decision making. There is also interest in understanding dependencies between Categories, particularly in relation to possible “cannibalization” effects that might occur when one Category is subject to more aggressive discount policies than another that might “compete” for customer purchases. There is also the evident need for models to be open, responsive and adaptable over time as realized consumer behavior and grocery demand are inherently time-varying. We address these desiderata using customized classes of dynamic linear models (West and Harrison, 1997; Prado et al., 2021) applied to revenue time series at the LSG-Category level, with multi-scale extensions (e.g. Berry and West, 2020; West, 2020) to represent key aspects of multivariate relationships.
Statistical forecasting has a long history in revenue management across industries. Models must address basic questions of seasonality, stochastic variation in demand, price sensitivity, and computational efficiency (e.g. Weatherford, 2016). More recently, machine learning and algorithmic approaches have been explored for revenue forecasting. Pundir et al. (2020) and Lei and Cailan (2021) et al. (2019) and Chu and Zhang (2003) explore deep learning methods. Such approaches can yield forecast accuracy improvements, especially in short-term forecasting and when time-variation is very limited. They are, however, challenging to interpret and typically neither probabilistic nor dynamic. Particularly in the retail domain, Bayesian dynamic models have been successful in terms of forecasting accuracy, and are substantially preferable in terms of interpretation, openness to intervention, and fully probabilistic forecasting (e.g. Berry and West, 2020; Berry et al., 2020; Yanchenko et al., 2021).
Our case study also involves methodological contributions. We extend multi-scale models (e.g. Berry and West, 2020; Berry et al., 2020; Yanchenko et al., 2021) to allow sharing of discount information, and represent multivariate structure in pricing and revenue via a recoupled system of univariate models. These are embedded in the case study discussion throughout.
Section 2 introduces the retail setting and data. Section 3 describes the multi-scale modeling framework, noting the role of the decouple/recouple approach in engendering scalability of multivariate models. Section 4 discusses selected results, highlighting: (i) retail Categories that benefit from multi-scale modeling in improved revenue forecasting, and others that do not; (ii) contexts where forecasts can be improved by joint modeling of pricing, revenue and dependencies across Categories; and (iii) aspects of cross-Category dependencies. Concluding comments are in Section 5.
2 Setting and Data
The setting is revenue forecasting at the LSG-Category level for a large grocery chain. The forecasting level of interest here is across groups of items (Categories) and groups of stores (LSGs). Each Category is a collection of (a large number of) related items; each LSG is a subset of (a small number of) regionally proximate stores. LSGs, in general, share traits in terms of discounts offered and pricing, though there is variability across LSGs and Categories. It is thus important to allow for variability by LSG and Category, while also allowing information sharing, as appropriate, to potentially increase forecast accuracy.
The data provide 2 calendar years of weekly information for 100 product Categories across 9 LSGs in one geographic region of the USA. This includes weekly revenue (in $s) and detailed information about pricing and promotion for each Category and LSG. Several “breadth of discount” measures (weighted averages across items within each Category) exist and we use three: Temporary Price Reduction (TPR) percent, a percent measure of advertising on the front page of leaflets (AdFront percent), and a percent measure of special stock displays in the back of stores (DspBack percent). Each of these discount measures represents the percentage of items within each Category with each type of discount, weighted by how often each item has historically been purchased. Other information includes the weighted average of discounted price of items within a Category, referred to as the Net Price; this is a quantity that turns out to be quite useful in forecasting weekly LSG-Category level revenue. Throughout, all revenue results are scaled by a random factor.
In Figure 2, we see that revenue varies both by LSG and Category. Over all 104 weeks, however, revenue by Category trends appear similar across LSGs, though different in scale (Figure 2). While there do appear to be potential holiday effects for some Categories, we do not explicitly take holidays into account here. Both pricing (Figure 4) and discounts (Figure 6) tend to be very similar across LSGs, and to vary considerably by Category. While each LSG has some control over individual discounts for that particular group of stores, there is coordination among the LSGs in terms of pricing and promotion decisions. Pricing, in particular, tends to be very similar between LSGs over time, and in general, fairly stable for most Categories (Figure 4). Variation in the Net Price variable over time and between LSGs is largely a function of discounting, as the Net Price variable is the weighted average of price actually paid by customers after taking any discounts into account. On the other hand, there is much more variation over time in terms of TPR percent (Figure 6). Again, TPR trends are similar across LSGs, though vary considerably by Category. TPR percent tends to be the most variable of the three available discount measures.
3.1 Multi-Scale Modeling
We are interested in forecasting revenue 12 weeks ahead for each LSG-Category pair. Discount information is set multiple weeks in advance, so discount covariates can be treated as known 12 weeks into the future. However, Net Price needs to be forecast to be used as a covariate at this forecast horizon, as Net Price depends on the discounts seen by individual customers. To improve the revenue forecasts at the LSG-Category level, we utilize aggregate multi-scale discount information across LSGs, extending the approach of Berry and West (2020).
Multi-scale analysis enables forecast information from aggregate levels to inform lower-level forecasts, and is inherently hierarchical by design. Multi-scale models are critically interesting alternatives to far more computationally demanding hierarchical models (e.g. Salinas et al., 2019; Sen et al., 2019). Multi-scale approaches share information across series while enabling parallel estimation of univariate models (Berry and West, 2020; Berry, 2019; West, 2020). This enables scaling to large numbers of time series such as are frequently seen in business contexts; computations scale linearly in the number of series. Importantly, this avoids the need for large, complex Markov chain Monte Carlo or particle filtering methods, while retaining the ability to improve multi-step ahead forecasts for individual series by incorporating multi-scale “dynamic factor” signals. Scalability is especially relevant in demand forecasting settings, where there are very many noisy, sparse and heterogeneous individual series. However, there often exist cross-sectional or other hierarchical structures in this type of data (across items, for example) that can be leveraged as aggregate, multi-scale signals to improve forecasts at the lowest level. Our models here build on this background.
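Let $y_{t,c,\ell}$ be the revenue for week $t$, Category $c$ and LSG $\ell$, and $y_{t,c}$ be the revenue aggregated across LSGs for each Category $c$. Then, let
$$\mathbf{x}_{t,c,\ell} = (\mathrm{TPR}_{t,c,\ell},\ \mathrm{AdFront}_{t,c,\ell},\ \mathrm{DspBack}_{t,c,\ell})'$$
be the vector of discount measures (TPR percent, AdFront percent and DspBack percent). Here $\mathbf{x}_{t+1:t+12,c,\ell}$ is known 12 weeks in advance and we aim to forecast $y_{t+k,c,\ell}$ for all $c, \ell$ at $k = 12$ weeks into the future. Our modeling strategy is to:
1. Model aggregate revenue across LSGs (multi-scale): $p(y_{t+k,c} \mid \bar{\mathbf{x}}_{t+k,c})$, where $\bar{\mathbf{x}}_{t,c}$ denotes the discounts averaged across LSGs.
2. Extract inferred effects of aggregate discounts from model (1): $\phi_{t+k,c}$.
3. Model revenue: $p(y_{t+k,c,\ell} \mid \mathbf{x}_{t+k,c,\ell}, \phi_{t+k,c})$.
This model for revenue depends on LSG-Category specific discount information ($\mathbf{x}_{t,c,\ell}$) and multi-scale discount information across LSGs ($\phi_{t,c}$); see Figure 7. This defines a flexible baseline model. Section 4.2 discusses extensions to include Category pricing information that can yield revenue forecasting improvements.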
This hierarchical, multi-scale approach allows each LSG-Category pair to “see” common, aggregate revenue responses to discounts differently and allows for sharing of information and personalization of the common trends for each specific LSG. This approach increases forecast accuracy for many LSG-Category pairs for 12-week ahead revenue forecasts, in particular for smaller LSGs that build on information from larger LSGs. On a key technical point, we use “plug-in” point forecasts of the multi-scale effects of discount predictors, choosing the current (time $t$) posterior mean of the effect in the aggregate model. This under-states uncertainty in resulting revenue forecast distributions as it ignores uncertainty about aggregate discount effects. Applied evaluations lead us to accept this practical side-step of full uncertainty characterization, as it has modest practical impact. At the cost of more extensive computation it is, of course, easy to extend the analysis to include full uncertainty characterization, repeating the analysis with Monte Carlo samples of the discount effect; see Berry and West (2020) in related models. This more computationally intensive analysis, across numerous LSGs and Categories, can aid in understanding how relevant, or in this case study how practically limited, the impact of this second-order uncertainty analysis is.
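The difference between the plug-in analysis and full Monte Carlo propagation can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not taken from the case study: log revenue is normal, the aggregate discount effect enters additively, and all numerical values and the function name `simulate_log_revenue` are hypothetical.

```python
import random
import statistics


def simulate_log_revenue(phi_mean, phi_sd, base_mean, obs_sd, n_draws, plug_in, seed=0):
    """Simulate log-revenue forecasts that depend on an aggregate discount effect.

    plug_in=True fixes the effect at its posterior mean (the plug-in analysis);
    plug_in=False draws the effect afresh for every forecast sample.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        phi = phi_mean if plug_in else rng.gauss(phi_mean, phi_sd)
        draws.append(rng.gauss(base_mean + phi, obs_sd))
    return draws


# All numbers below are hypothetical and chosen only for illustration.
plug = simulate_log_revenue(0.10, 0.10, 8.0, 0.20, 20000, plug_in=True)
full = simulate_log_revenue(0.10, 0.10, 8.0, 0.20, 20000, plug_in=False)
sd_plug = statistics.stdev(plug)   # close to obs_sd = 0.20
sd_full = statistics.stdev(full)   # close to sqrt(0.20**2 + 0.10**2), about 0.224
print(round(sd_plug, 3), round(sd_full, 3))
```

The plug-in forecast spread omits the contribution of uncertainty about the aggregate effect; how much that matters in practice depends on the relative sizes of the two variances.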
3.2 Dynamic Linear Models
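DLMs define the core class of time series models for all levels in the multi-scale setting of Figure 7. For a generic univariate time series $y_t$ observed at discrete times $t = 1, 2, \ldots$, information at time $t$ is denoted by $\mathcal{D}_t = \{y_t, \mathcal{D}_{t-1}, \mathcal{I}_{t-1}\}$ where $\mathcal{I}_{t-1}$ represents any additional relevant information beyond the observed data. A DLM has the form
$$y_t = \mathbf{F}_t'\boldsymbol{\theta}_t + \nu_t, \qquad \nu_t \sim N(0, v_t),$$
$$\boldsymbol{\theta}_t = \mathbf{G}_t\boldsymbol{\theta}_{t-1} + \boldsymbol{\omega}_t, \qquad \boldsymbol{\omega}_t \sim N(\mathbf{0}, \mathbf{W}_t),$$
where
$\mathbf{F}_t$ is a matrix of known covariates at time $t$,
$\boldsymbol{\theta}_t$ is the state vector, which evolves via a first-order Markov process,
$\mathbf{G}_t$ is a known state evolution matrix,
$\boldsymbol{\omega}_t$ is the stochastic innovation vector, with the $\boldsymbol{\omega}_t$ independent over time, and
$\mathbf{W}_t$ is the known innovation variance matrix at time $t$.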
Sequential learning in the DLM proceeds naturally via computationally easy updates and forecasting algorithms. Analysis at the level of each univariate series is standard (West and Harrison, 1997; Prado et al., 2021).
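As a concrete instance of these sequential updates, the following is a minimal sketch of forward filtering for the simplest DLM, a first-order polynomial (local level) model. It assumes a known, constant observation variance `v` and a single discount factor `delta` that sets the evolution variance; the models in the case study are richer (regression and seasonal components), and the function name is hypothetical.

```python
def local_level_filter(ys, m0, C0, v, delta):
    """Forward filter for a first-order polynomial DLM with known variance v.

    The evolution variance is set implicitly by the discount factor delta in (0, 1].
    Returns, per time point, the one-step forecast mean and variance (f, q)
    and the posterior mean and variance (m, C) of the level.
    """
    m, C = m0, C0
    out = []
    for y in ys:
        a, R = m, C / delta          # prior for the level at time t
        f, q = a, R + v              # one-step ahead forecast mean and variance
        A = R / q                    # adaptive coefficient
        m = a + A * (y - f)          # posterior mean
        C = R - A * A * q            # posterior variance
        out.append((f, q, m, C))
    return out


# Hypothetical series and prior settings, for illustration only.
steps = local_level_filter([1.0, 1.2, 0.9], m0=0.0, C0=1.0, v=1.0, delta=0.9)
```

Each update costs a fixed number of arithmetic operations, which is what makes running many such univariate models in parallel inexpensive.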
3.3 Modeling Details
Revenue is modeled on the log scale using normal DLMs with a trend term and additional covariates; each univariate DLM has the vector $\mathbf{F}_t$ with a leading element of 1 followed by entries representing potential seasonal components and known predictor/covariate values. Among the latter, the aggregate revenue model for $y_{t,c}$ uses the average discounts across LSGs, $\bar{\mathbf{x}}_{t,c}$, as additional covariates and has yearly seasonality represented by the fundamental (52 week) harmonic model component. The LSG-Category revenue model for $y_{t,c,\ell}$ has multi-scale discount information included as predictor values; here $\mathbf{F}_t$ has elements $\mathbf{x}_{t,c,\ell}$ and $\phi_{t,c}$ as covariates, again with yearly seasonality defined by the first harmonic. All models use the same specific state evolution discount factors to define rates of change over time of state vectors. This completes the basic DLM outlook for each univariate revenue series.
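To make the yearly harmonic component concrete, the sketch below builds the evolution block for the fundamental harmonic of a 52-week cycle in the standard rotation-matrix form (West and Harrison, 1997). It assumes that this form is the one used; the state values and helper names are hypothetical.

```python
import math


def harmonic_G(period, harmonic=1):
    """Rotation matrix for one harmonic of a seasonal cycle with given period."""
    w = 2.0 * math.pi * harmonic / period
    return [[math.cos(w), math.sin(w)],
            [-math.sin(w), math.cos(w)]]


def rotate(G, state):
    """Apply the 2x2 evolution matrix G to a 2-element state."""
    return [G[0][0] * state[0] + G[0][1] * state[1],
            G[1][0] * state[0] + G[1][1] * state[1]]


F_SEASONAL = [1.0, 0.0]          # picks out the current seasonal effect
G = harmonic_G(52)               # fundamental harmonic for weekly data

state = [0.3, -0.1]              # hypothetical seasonal state
effects = []
for _ in range(104):
    effects.append(F_SEASONAL[0] * state[0] + F_SEASONAL[1] * state[1])
    state = rotate(G, state)
```

Without evolution noise the seasonal effect repeats exactly every 52 weeks; discounting the state lets the amplitude and phase drift over time.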
In terms of customized predictor information, Category price discount covariates that are negligible over all weeks are not included (some Categories are rarely discounted, especially various alcohol Categories). Similarly, covariates that are static for many weeks have some small amount of noise added to them to stabilize the modeling; this is a common approach in machine learning and has connections to ridge regression. Here, we add noise to control variables to (1) stabilize inference when there is not much variation in the covariates, and (2) reflect potential noise in the estimation of these control variables out to 12 weeks in advance, for some increased robustness in the models for practical application. All LSG-Category pairs are modeled separately as univariate DLMs as described in Section 3.2. Recoupling is then induced by sharing information within the over-arching multi-scale framework. Analysis is implemented in PyBats (Lavine and Cron, 2020).
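The noise-addition step for static covariates might look like the following sketch. The rule used here (jitter only when the whole series is essentially constant), the noise scale and the function name are all assumptions made for illustration; the text does not specify how "static for many weeks" is detected or how much noise is added.

```python
import random
import statistics


def jitter_if_static(xs, noise_sd=0.01, tol=1e-8, seed=0):
    """Add small Gaussian noise to a covariate series that barely varies.

    Series whose standard deviation exceeds tol are returned unchanged.
    """
    if statistics.pstdev(xs) > tol:
        return list(xs)
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_sd) for x in xs]


flat_tpr = [0.2] * 12                        # hypothetical constant discount series
varying_tpr = [0.1, 0.3, 0.2, 0.4]           # hypothetical varying series
jittered = jitter_if_static(flat_tpr)
untouched = jitter_if_static(varying_tpr)
```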
4 Selected Results
Models were fit and evaluated over the first year of data to define selection of DLM discount factors. The detailed forecasting analysis and selected evaluations are based on then running the analyses sequentially over the second year of data with out-of-sample forecasts generated each week for the following 12 weeks. Empirical forecast accuracy measures are all on the 12-week horizon. Section 4.1 gives selected examples where multi-scale modeling improves revenue forecasts and others where it does not. Section 4.2 highlights situations where adding information to the multi-scale models is shown to improve revenue forecasting, with rationalization and discussion of business implications. Section 4.3 explores aspects of dependencies across Categories with a view to advising potential competing goals in Category-wide pricing and discount strategies. Throughout, all revenue results are scaled by a random factor.
4.1 Multi-Scale Revenue Forecasting
4.1.1 Some Aggregate Results
A first interest is in identifying Categories and LSGs where there are forecast improvements using the multi-scale analysis that shares discount information across LSGs, as described in Section 3. Using the MAPE metric, the results vary by LSG-Category pair, as seen in Figure 8. About 45% of the LSG-Category pairs benefit from the inclusion of multi-scale discount information, having lower MAPE values. Again, at this enterprise-wide level of forecasting, even very small improvements in MAPE can lead to large increases in revenue, so these cases are of key interest. Then, identifying cases that are better forecast without the multi-scale information is just as important; these LSG-Category pairs will be forecast using their individual models.
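The per-pair comparison described above can be sketched as follows: compute MAPE for each candidate model on one LSG-Category pair and keep the model with the lower value. The function names and the numbers are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def choose_model(actual, forecasts_by_model):
    """Return the name of the model with the lowest MAPE and all scores."""
    scores = {name: mape(actual, f) for name, f in forecasts_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical revenue and 12-week ahead point forecasts for one LSG-Category pair.
actual = [100.0, 200.0, 150.0]
forecasts = {
    "multi_scale": [105.0, 190.0, 150.0],
    "no_multi_scale": [110.0, 180.0, 165.0],
}
best, scores = choose_model(actual, forecasts)
```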
4.1.2 Revenue Forecasts
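We now focus on specific LSG-Category examples that benefit from multi-scale information. In addition to the forecasts themselves, we look at the regression effect of the discount information from the multi-scale model to illuminate the impact of the multi-scale information. For each Monte Carlo sample of the state vector $\boldsymbol{\theta}_{t,c}$ from the multi-scale model across LSGs, the discount regression effect is
$$\phi_{t,c} = \bar{\mathbf{x}}_{t,c}'\,\boldsymbol{\beta}_{t,c},$$
where $\boldsymbol{\beta}_{t,c}$ is the subvector of the sampled state vector holding the coefficients on the average discounts $\bar{\mathbf{x}}_{t,c}$.
This represents the overall impact of the multi-scale discount information.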
Some general points and findings are noted first. Forecasting 12 weeks ahead is challenging. An evaluation on 1 week ahead forecasts could be misleading in terms of the main longer-term horizon of interest. Then, we find that in the cases where multi-scale information improves the forecasts at the 12-week horizon, it also does at the 1 week ahead forecast horizon. Further, multi-scale information can improve the forecasts of both large and small LSGs. Additionally, Category discount information is absolutely critical to include in the revenue forecasting models, either as multi-scale information or not. As the main control variable, the discount information is able to produce good forecasts alone for the majority of LSGs and Categories. Finally, some Categories have clear and strong holiday effects. With only two years of data here, there is not enough information to estimate holiday effects directly, but we discuss possible approaches to addressing holiday information in more detail in Section 4.2.
One Category that particularly benefits from multi-scale information is the Sugars & Sweeteners Category, with forecasts for two LSGs and the multi-scale regression effect shown in Figure 9. Across both larger and smaller LSGs, the inclusion of multi-scale discount information defines MAPE optimal forecasts that are more accurate than those from the no multi-scale model, especially over weeks 10 and 30. For Sugars & Sweeteners, around weeks 10-20 in (c) there is a dynamic, negative discount regression effect, compared to the rest of the weeks; this translates to lower forecasts from the multi-scale model compared to the no multi-scale model in Figure 9. This negative regression effect pulls the forecasts down in this region, leading to more accurate forecasts. This response to discounts in terms of the revenue is shared across LSGs and well captured by the multi-scale model, leading to improved forecasts for this specific Category. Additionally, in both the forecasts and regression effect in Figure 9, there are strong holiday effects around week 30 (the week of December 15).
Figure 10 shows similar forecast summaries at the 1 week ahead horizon. While overall forecast accuracy is naturally higher than that for the 12-week horizon, note that the multi-scale model still leads to improved forecasts for these LSG-Category pairs.
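On the notion of a MAPE optimal point forecast: if the forecast distribution of log revenue is taken to be normal with mean $\mu$ and standard deviation $\sigma$ (an assumption for this sketch; the actual forecast distributions may differ, for example when variances are learned), then revenue is lognormal and the point forecast minimizing expected absolute percentage error is $\exp(\mu - \sigma^2)$, which lies below both the median and the mean. The Monte Carlo check below uses hypothetical values.

```python
import math
import random


def mape_optimal_lognormal(mu, sigma):
    """Point forecast minimising expected absolute percentage error
    when the outcome is lognormal with log-scale mean mu and sd sigma."""
    return math.exp(mu - sigma ** 2)


def expected_ape(forecast, samples):
    """Monte Carlo estimate of E|y - forecast| / y over outcome samples."""
    return sum(abs(y - forecast) / y for y in samples) / len(samples)


mu, sigma = 8.0, 0.5              # hypothetical log-scale forecast moments
rng = random.Random(1)
samples = [math.exp(rng.gauss(mu, sigma)) for _ in range(20000)]

candidates = {
    "mape_optimal": mape_optimal_lognormal(mu, sigma),
    "median": math.exp(mu),
    "mean": math.exp(mu + 0.5 * sigma ** 2),
}
scores = {name: expected_ape(f, samples) for name, f in candidates.items()}
```

Because percentage errors penalize over-forecasts of small outcomes heavily, the optimal point forecast is shifted downward relative to the median.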
Figure 9: Frames (a) and (b) show 12-week ahead forecasts from the multi-scale and the no multi-scale models for the Sugars & Sweeteners Category for two LSGs. Average MAPE values over the year are shown in the legends. The point forecasts are MAPE optimal, shading shows 90% credible intervals in the multi-scale model, and points are the observed revenue values. For all LSGs, the inclusion of multi-scale information improves the forecasts. Frame (c) shows the on-line estimated regression effects, with 90% credible intervals, of the combined discount predictor information.
Broth/Dry Soup is an example of a Category where the value of the multi-scale information varies by LSG. In the larger LSGs in Figure 11, there is little benefit from the multi-scale information and the multi-scale model tends to under-forecast around weeks 20-30. However, there is real benefit from the multi-scale information for the smaller LSG. This is a common finding in hierarchical models: smaller groups (here LSGs) can benefit more from sharing of information across larger groups due to the increased shrinkage on smaller groups. The multi-scale discount information improves forecasts the most for smaller LSGs generally. In this example, note also the change in regression effect around weeks 20-30, shown in (c). The multi-scale regression effect tends to lead to better forecasts for this time period for the smaller LSGs, as compared to the larger LSGs which under-forecast here. Forecasts for both models also naturally improve at the shorter, 1 week ahead, forecast horizon; see Figure 12.
Finally, Baked Sweet Goods is an example of a Category where multi-scale information does not improve revenue forecasts. In general, from weeks 35-50, the multi-scale model tends to over-forecast, as reflected in both the forecasts themselves and the positive regression effect for this time period in Figure 13. For weeks prior to week 35, the regression effect is approximately 0 and the no multi-scale and multi-scale models give very similar forecasts. This Category could perhaps benefit from other types of multi-scale information that are more relevant, especially in early weeks when the discount regression effects are minimal. One week ahead forecasts are given in Figure 14.
4.2 Extending the Revenue Models
Additional information from the grocery chain offers potential to further improve revenue forecasting in specific settings. Here, we focus on the role of Category level pricing and holiday effects.
Additional pricing information is available via the Net Price variable; this is an average measure of the price realized by customers (including discounts), averaged over customers within LSG and Category. We find that jointly modeling and forecasting Net Price together with revenue can further improve revenue forecasts quite generally. Updating the details in Section 3, the modifications are as follows.
Let $z_{t,c,\ell}$ be the Net Price for week $t$, Category $c$ and LSG $\ell$; we need to forecast $z_{t+k,c,\ell}$ as it incorporates realized discounts received by customers and so is uncertain in future weeks. We define a joint model by coupling two univariate dynamic models: one for Net Price and one for revenue that extends the earlier DLM to also include Net Price as a predictor. This decouple/recouple approach enables customization of each of the univariate models as well as sensitive modeling of dependence of revenue on Net Price. In summary, for each LSG-Category over weeks $t+1{:}t+k$ we:
1. Model Net Price: $p(z_{t+k,c,\ell} \mid \mathbf{x}_{t+k,c,\ell})$.
2. Model revenue across LSGs (multi-scale): $p(y_{t+k,c} \mid \bar{\mathbf{x}}_{t+k,c})$.
3. Extract imputed values of the discount state vectors from model (2): $\phi_{t+k,c}$, as before.
4. Model revenue: $p(y_{t+k,c,\ell} \mid \mathbf{x}_{t+k,c,\ell}, \phi_{t+k,c}, z_{t+k,c,\ell})$, now also conditional on imputed values of $z_{t+k,c,\ell}$.
At the final model stage, the imputed values of Net Price $z_{t+k,c,\ell}$ can be any selected point forecasts; the baseline choice is a “plug-in” analysis that uses the forecast median of Net Price from its univariate model. This can be refined to run analyses repeatedly over a range of values or a Monte Carlo forecast sample of Net Price to understand if uncertainty under-quantification using the plug-in analysis is practically meaningful. The revenue model also includes both the LSG-Category specific discount information and multi-scale discount information across LSGs, as before. The Net Price model uses the LSG-Category specific discount information and pricing information without discounts (the latter being a control variable for the grocery chain).
Selected aggregate results are highlighted in Figure 15. With the set of univariate DLMs without multi-scale and Net Price extensions (“No Multi-Scale”) as baseline, this shows average revenue forecast MAPE values from (i) a revenue model with Net Price information only, (ii) the original multi-scale revenue model, and (iii) the more general revenue model with both multi-scale and Net Price information of this section. Compared to the baseline, 28% of the LSG-Category pairs are improved with the Net Price model, 45% for the multi-scale model, and 37% for the multi-scale and Net Price model. A number of specific LSG-Category pairs particularly benefit from the inclusion of pricing information, while others do not.
One Category where the combination of pricing and multi-scale discount information improves revenue forecasts is Craft/Micro Beers. This Category is rarely discounted and when it is the discounts tend to be small. There is also some retail price drift separate from discount information that can be helpful for this Category (see further comments in Supplementary Materials). Forecast comparisons and regression effects are given in Figure 16. The regression effect is generally insignificant over time. We do see that, around weeks 35-40, the larger negative regression effect pulls down the forecasts in the multi-scale model, improving 12-week forecast accuracy over this time period. While there is limited explanatory information in LSG-specific or multi-scale discounts for this Category, they nevertheless have practical value in revenue forecasting.
4.2.2 Holiday Effects
Some product Categories exhibit clear, important but sporadic holiday effects. The Sugars & Sweeteners Category, for example, shows effects particularly around Christmas (Figure 17). However, two years of data do not provide historical information sufficient to incorporate holiday week dummy variables, or holiday-specific transfer response model components over the week before, of and after the holiday period, such as is standard in Bayesian forecasting in commercial settings (West and Harrison, 1997, Sections 9.3 and 11.2). Transfer response models designed specifically for local holiday effects have been utilized in related models in our setting, and coded for public access and incorporation into revenue models (Lavine and Cron, 2020).
The revenue models in further development for routine application are developed this way, but for our interest here we are mainly concerned about the impact of holiday events on forecast accuracy summaries. In terms of basic empirical accuracy impact, it is easy to re-evaluate MAPE (or other) metrics across all LSGs and Categories over the year of test data by simply dropping the (rare) holiday weeks from the summary. This does not wholly re-evaluate accuracy, since the model analysis includes those weeks and so the sequential updating analysis is inevitably perturbed (negatively) by poor forecasts at holiday times that are not explicitly modeled as they might be, as noted above. But, simply masking out a few holiday weeks from the forecast error evaluation gives at least a lower bound on potential improvements.
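Masking holiday weeks out of the accuracy summary amounts to the following sketch; the function name, week indices and numbers are hypothetical.

```python
def mape_excluding(actual, forecast, skip_weeks=()):
    """MAPE in percent, ignoring the week indices listed in skip_weeks."""
    skip = set(skip_weeks)
    kept = [(a, f) for w, (a, f) in enumerate(zip(actual, forecast)) if w not in skip]
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in kept) / len(kept)


# Hypothetical test-period data in which week 1 is a holiday week.
actual = [100.0, 100.0, 100.0]
forecast = [100.0, 50.0, 100.0]
all_weeks = mape_excluding(actual, forecast)
non_holiday = mape_excluding(actual, forecast, skip_weeks=[1])
```

Note that this only changes the evaluation; the forecasts themselves still come from a model whose updating included the holiday weeks.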
More formally, a fully Bayesian feed-forward intervention approach simply defines each holiday week as a known time when major departures from the routine model forecasts are expected, and treats the outcome data for those few weeks as missing observations. This is effectively building in a “holiday week” random intervention effect specific to each holiday, and with very high prior uncertainty. The result is that the state vectors in the baseline models will be protected from what may be large forecast errors in the forward filtering and updating analysis (Berry and West, 2020; West and Harrison, 1997, Section 11.2.4).
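Treating holiday weeks as missing observations can be sketched with a local level DLM filter in which `None` marks a masked week: the state evolves but is not updated, so the posterior for that week equals its prior. This is an illustration under simplifying assumptions (known observation variance, a single discount factor that is still applied at masked weeks), not the case-study implementation; the function name and data are hypothetical.

```python
def filter_with_missing(ys, m0, C0, v, delta):
    """Local-level DLM filter in which None marks a week treated as missing."""
    m, C = m0, C0
    history = []
    for y in ys:
        a, R = m, C / delta            # evolve: prior for this week
        if y is None:
            m, C = a, R                # no update: posterior equals prior
        else:
            q = R + v                  # one-step forecast variance
            A = R / q                  # adaptive coefficient
            m = a + A * (y - a)
            C = R - A * A * q
        history.append((m, C))
    return history


# Hypothetical series with a holiday spike in week 2.
raw = filter_with_missing([1.0, 1.1, 5.0, 1.0], m0=1.0, C0=1.0, v=1.0, delta=0.9)
masked = filter_with_missing([1.0, 1.1, None, 1.0], m0=1.0, C0=1.0, v=1.0, delta=0.9)
```

In the masked run the level estimate after the holiday stays near its pre-holiday value, while in the raw run it is pulled toward the spike and takes several weeks to recover.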
Identifying the week of Thanksgiving, the week of Christmas and the week after Christmas (New Years’) for the Sugars & Sweeteners Category leads to strong aggregate improvements in terms of lower MAPE values; the net reduction in empirical MAPE values averaged over LSGs, Categories and across the 1 year evaluation period is about 7-8%. This indicates that the three holiday periods have a substantial impact on forecast accuracy metrics. Some Categories are far more impacted than others, of course, and implementation of the models for routine use will customize developments for holidays as needed. For a subset of Categories, including specific holiday effects formally with more data is likely to be beneficial to revenue forecasts. This has been found to be the case internally by the grocery chain on separate data for which a longer period of time is available on some Categories and LSGs.
4.3 Exploration of Cross-Category Dependence
There are business interests in identifying whether discounts for one Category affect sales and hence revenue in other Categories. Identifying such relationships has potential to yield forecast accuracy improvements by including relevant cross-Category discount predictors in revenue models. Then, if higher discounts for Category A are associated with higher sales for Category B, the products within the Categories are potential complements and cross-Category promotion strategies may be of interest to management. A store or LSG could offer discounts in Category A to induce customers to also purchase products in Category B at lesser discounts. On the other hand, if higher discounts for Category A are associated with lower sales for Category B, then products within the two Categories are possible substitutes of each other, and discounts potentially “cannibalize” cross-Category sales. If some products in Category A are heavily discounted and sales within Category B decrease, then consumption has merely shifted and apparent sales lift in Category A is masking potentially store-level, or LSG-level, drops in revenue.
We identify potential pairs for cross-Category analysis by examining relationships between standardized forecast errors from the log revenue models. This is exemplified here using the 12-week multi-scale revenue models incorporating multi-scale discount, Net Price, and the primary discount variables as predictors. Post-forecasting exploration of 12-week ahead forecast errors is key as these realized errors are implicitly already free (modulo the assumed adequacy of the models) of the effects of Category-specific discounts and other effects that may generate spurious indications of cross-Category relationships. The latter include, for example, any patterns of local trend and/or seasonality that may be common to Category revenue and discount decisions; e.g. sales of hot chocolate increase in the winter while discounts on ice cream decrease. Further, we use realized errors standardized under their $k$-step ahead forecast distributions (here $k = 12$); this appropriately accounts for series-specific residual volatility over time prior to evaluating cross-Category correlations.
It is also important to examine consistency of any potential cross-Category relationships across Local Store Groups. Each LSG has, in theory, the ability to independently select discount strategies in any Category for stores in the LSG. If an observed cross-Category relationship is consistent across LSGs, then that Category pair is of more interest for further exploration.
Some summaries of exploratory analysis using the top Category combinations are highlighted. Figure 18 presents a heat-map of cross-Category correlations of forecast errors from the 12-week ahead multi-scale revenue models with TPR% discount. For each pair of Categories, the correlation is that between realized forecast errors in one Category and TPR% in the other Category, evaluated over the 52-week forecasting test period and averaged across the 9 LSGs. While many pairwise correlations are apparently negligible, interest lies in exploring specific example pairs where the correlation seems highest. First note, however, that the corresponding correlation heat-map based on raw revenue data rather than on the model-based forecast errors shows substantial numbers of much higher correlations (Supplementary Material, Figure 27). The naive analysis using raw revenue generates many apparently interesting but spurious suggestions of cross-Category relationships that disappear when evaluation uses forecast errors instead of revenue. A more important comparison is with the corresponding heat-map of correlations using 1-week rather than 12-week ahead forecast errors (Supplementary Material, Figure 28). Analysis at the longer forecast horizon shows evidence of some stronger correlations than using 1-week forecast errors. This is important since the 12-week horizon is most relevant for business decisions; at that horizon, forecasts are generally less accurate than the 1-week forecasts, so there is more room for improvement by incorporating cross-Category promotion strategies in the 12-week forecasting models.
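The averaged correlations underlying such a heat-map can be sketched as follows. This is an illustrative computation only: the nested-dictionary layout, the function names, and the toy values are assumptions introduced here, and a plain Pearson correlation is used as a stand-in for whatever correlation measure was used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


def average_cross_correlations(errors, discounts):
    """Average, over LSGs, the correlation between forecast errors in one
    Category and the discount series of another Category.

    errors[lsg][category] and discounts[lsg][category] are weekly series.
    Returns a dict keyed by (error_category, discount_category).
    """
    lsgs = list(errors)
    categories = list(errors[lsgs[0]])
    table = {}
    for i in categories:
        for j in categories:
            per_lsg = [pearson(errors[g][i], discounts[g][j]) for g in lsgs]
            table[(i, j)] = sum(per_lsg) / len(per_lsg)
    return table


# Hypothetical toy data: two LSGs, two Categories, four weeks.
discounts = {
    "LSG1": {"Milk": [0.1, 0.0, 0.2, 0.1], "Cereal": [0.0, 5.0, 10.0, 15.0]},
    "LSG2": {"Milk": [0.0, 0.1, 0.1, 0.3], "Cereal": [5.0, 10.0, 15.0, 20.0]},
}
errors = {
    # Milk errors are built as an exact linear function of Cereal discount.
    "LSG1": {"Milk": [-0.7, -0.2, 0.3, 0.8], "Cereal": [0.5, -0.2, 0.1, -0.4]},
    "LSG2": {"Milk": [-0.2, 0.3, 0.8, 1.3], "Cereal": [-0.3, 0.2, 0.0, 0.1]},
}

table = average_cross_correlations(errors, discounts)
```

Keeping the per-LSG correlations before averaging would also allow a check of consistency across LSGs, in line with the discussion above.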
We highlight one particular pair of Categories: Cold Cereal and NF (organic) Milk. Discounts for Cold Cereal have the largest correlation with NF Milk forecast errors across the examined Categories in two of the nine LSGs, always positive, and fourth largest when averaged across LSGs. Figure 19 shows 12-week forecast errors for NF Milk against Cold Cereal TPR% discount for each of the LSGs. Slightly positive, albeit rather weak and noisy, relationships are consistent with the view that the models tend to under-predict NF Milk revenues when Cold Cereal experiences higher discounts. The concordance across several LSGs is important in supporting the view that this is a systematic, potentially causal relationship. From a forecasting viewpoint, the potential for such a cross-Category association to be useful is explored by re-estimating the revenue model for NF Milk, now extended to also include the Cold Cereal TPR% discount as a predictor. That analysis was performed and confirmed point forecast improvements; the 12-week ahead MAPE metric averaged over the 52-week test period is reduced for 6 of the 9 LSGs and remains essentially unchanged for the other 3. We reiterate that even very small improvements in this measure of forecast accuracy at the LSG level can be of real practical importance in informing planning, promotion and logistics, with meaningful business revenue impact.
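A comparison of this kind can be sketched with a simple MAPE calculation. The code below is a hedged illustration: it assumes the standard definition of mean absolute percentage error, and the revenue and forecast values are hypothetical rather than results from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumption: the standard definition, the mean of |actual - forecast| / |actual|.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical weekly revenues and two sets of point forecasts for one LSG.
revenue = [100.0, 200.0, 400.0]
baseline_forecast = [90.0, 220.0, 360.0]   # model without the cross-Category predictor
extended_forecast = [95.0, 210.0, 380.0]   # model with the cross-Category predictor

baseline_mape = mape(revenue, baseline_forecast)
extended_mape = mape(revenue, extended_forecast)
improved = extended_mape < baseline_mape
```

Repeating the calculation for each LSG and counting the cases where `improved` is true gives a summary in the same form as the one reported above.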
5 Summary Comments
Our case study of revenue modeling at the LSG-Category level for a large grocery chain has extended Bayesian multi-scale dynamic modeling to enable integration of series-specific as well as cross-Category discount information in forecasting several hundred multivariate revenue time series. A few summaries here of the much broader analysis exhibit key practical aspects of these joint models for pricing and revenue, and examine features of cross-Category dependencies. Substantial heterogeneity across both LSGs and Categories offers opportunities for multi-scale, aggregate information sharing to improve LSG-Category specific forecasts. The multi-scale signal in this setting is the aggregate state vector information related to discounts for each Category across LSGs, and we find that this can improve multi-step ahead (12-week ahead) forecasts for about half of the main Categories of interest to the company. The baseline dynamic models should be maintained for the other Categories; they already define forecasting advances in relying on LSG-Category predictors generated from pricing and discount information, as well as benefiting from the inherent adaptability over time of Bayesian DLMs. For the LSG-Category cases that do benefit from the multi-scale extension, forecast improvements are practically relevant and some are quite large in terms of revenue implications.
There are several avenues for future development in applications and methodology. The company is involved in developing broader evaluation on more extensive data sets, including explicit integration of holiday effects in the DLMs. Additional exploration of other types of multi-scale information, for example across groups of similar Categories, is one direction that raises potential for further improvements in forecast accuracy. In particular, alcohol (and other) Categories that are rarely discounted are likely to benefit from information sharing across multiple contextually-related Categories, in addition to across LSGs. Additionally, measures of traffic, such as weekly transactions within an LSG containing items of a given Category, might be jointly modeled with Category revenue to improve forecast accuracy. Exploring cross-Category dependence is possible with this modeling approach and is of interest for the application, specifically to understand how discounts in one Category impact revenue in another Category. This ties into more formal “What-if?” decision analyses (also known as “scenario forecasting”) to explore, for example, how changes in pricing or promotions for specific Categories lead to changes in revenue in the same or other Categories. The potential for extending this line of thinking to a causal basis, involving real-time experimentation, is clearly an open and interesting area, though as yet not a direction addressed in public-domain R&D linked to this specific study.
We note that the model analyses presented and summarized can be developed by interested readers and potential users based on prototype code available in PyBats (Lavine and Cron, 2020).
References
- Probabilistic forecasting of heterogeneous consumer transaction-sales time series. International Journal of Forecasting 36, pp. 552–569.
- Bayesian forecasting of many count-valued time series. Journal of Business and Economic Statistics 38, pp. 872–887.
- Bayesian dynamic modeling and forecasting of count time series. Ph.D. Thesis, Department of Statistical Science, Duke University.
- A comparative study of linear and nonlinear models for aggregate retail sales forecasting. International Journal of Production Economics 86, pp. 217–231.
- PyBats: a Python package for Bayesian Analysis of Time Series and Bayesian forecasting. https://pypi.org/project/pybats/
- Comparison of multiple machine learning models based on enterprise revenue forecasting. In 2021 Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS), pp. 354–359.
- Forecasting corporate revenue by using deep-learning methodologies. In 2019 International Conference on Control, Artificial Intelligence, Robotics & Optimization (ICCAIRO), pp. 115–120.
- Time series: modeling, computation & inference. 2nd edition, Chapman & Hall/CRC Press.
- Machine learning for revenue forecasting: a case study in retail business. In 2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 201–207.
- High-dimensional multivariate forecasting with low-rank Gaussian copula processes. In Advances in Neural Information Processing Systems, Vol. 32, pp. 6827–6837.
- Think globally, act locally: a deep neural network approach to high-dimensional time series forecasting. In Advances in Neural Information Processing Systems, Vol. 32.
- The history of forecasting models in revenue management. Journal of Revenue and Pricing Management 15, pp. 212–221.
- Bayesian forecasting of multivariate time series: Scalability, structure uncertainty and decisions (with discussion). Annals of the Institute of Statistical Mathematics 72, pp. 1–44.
- Bayesian forecasting and dynamic models. 2nd edition, Springer-Verlag, New York, Inc.
- Hierarchical dynamic modeling for individualized Bayesian forecasting (submitted). arXiv:2101.03408.
Additional Joint Pricing and Revenue Forecasting Summaries
Observed covariates and retail price information (panel (a)) for the Craft/Micro Beers example shown in Section 4.2. This Category is rarely discounted and there is some observed price drift over time. Net Price forecasts for this chosen LSG-Category pair are given in panel (b).
Additional Aspects of Cross-Category Dependence
Figures 27 and 28 display heat-maps of cross-Category correlations of actual revenue and realized 1-week ahead revenue forecast errors with TPR% discount. As in the case of 12-week ahead forecast errors (main paper Section 4.3 and Figure 18) these are computed over the 52-week forecasting test period and averaged across the 9 LSGs.