It is well known from the Computational Complexity Theory that for some NP-hard optimization problems the complexity class can be reduced in cases when discovering exact global optimum is not necessary AVL85 . For example, it can be done when it is possible to find a special heuristic algorithm which solves the optimization problem approximately (with an accuracy up to according to the objective function), but with polynomial complexity Heur2009 . Moreover, for tasks of P complexity class a similar situation also takes place: if it is not necessary to minimize objective function with very high degree of precision, it is often possible to encounter heuristic with a simple structure, allowing as a consequence its widespread application without the necessity to use special mathematical packages. Sometimes this heuristic demonstrates additional benefits, such as high convergence speed as well as parallelizability. However, the convergence conditions analysis even for very simple methods can be far from trivial, while without these conditions application of the method to the real-life problems might be questionable.
In this article we propose to use aforementioned approach to the quadratic transportation problem (QTP) with additional constraints. This particular problem has arisen while working on the task of forecasting two-dimensional hierarchical time-series and was initially posed in RH2014 . In many businesses data is organized in a hierarchical way, and often has several dimensions. For example, sales transactions between a company and its clients might be represented as a two-dimensional hierarchical data structure (cube), one dimension being product dimension and another client one (note that time dimension is excluded from the cube as it is reserved for the forecasting purpose). Both products and clients are organized in hierarchies: e.g. clients are aggregated by geography while products by brands. For the sake of simplicity, we will further assume that both hierarchies are comprised of only two levels: top level (all products or clients) and bottom level (particular products or clients), but in fact, the method we propose can be repeatedly applied for each level of multi-level hierarchies as well.
To forecast the data-cube described above, several approaches are applicable. Clearly, all low-level forecasts can be obtained independently. Unfortunately, low-level time series are often quite volatile and forecasting quality can be poor. Another approach is to forecast upper-level time series, and then prorate obtained forecast to the low levels. All pros and cons of these approaches are described in details in SC2016 . For one-dimensional time series in RH2011 a special combination approach was proposed, where forecasts are created simultaneously at all levels of the hierarchy, and then optimally reconciled using a regression model. In this article, the reconciliation approach is applied to the two-dimensional time series which leads us to QTP as described below.
Let’s consider a structured set of random functions and their partial realizations, which represent actual sales of a company’s products to its clients (the first index indicates client and the second one – product):
Also, additional random functions, strictly dependent on original ones, are taken into consideration:
Suppose that at a particular time we have obtained independent forecasts: , , and . Further on variable in all formulas will be omitted for the sake of simplicity.
In the first step, we need to balance upper-level forecasts, i.e. to make sure that: . In many businesses, this condition is satisfied automatically due to the nature of the top-down forecasting process: forecasts and are in fact not independent but obtained as a shares of forecast . In case this condition is not satisfied, a relatively simple optimization task might be solved to perform optimal balancing. This optimization task will be described in details in our article to follow.
Moreover, it is clear that upper-level forecast cannot differ dramatically from the sum of lower-level forecasts, so we assume that . We now need to find corrected sales forecast matrix , balanced by both rows and columns, while at the same time minimally different from the original matrix . We assume that both actual and future sales are non-negative (negative sales are nothing but returns of the products, which are not taken into account when performing forecasting task). Moreover, often there are products that are not sold to particular clients due to a company policy or other factors, which means that if there are zeros in some positions of the matrix , these zeros should also be present in the same positions of the matrix . As a consequence, we obtain the following quadratic programming problem:
with the following sets of indexes:
2 Numerical algorithm and its convergence analysis
Suppose . For let’s consider the following two-step iterative procedure:
We will further use term "iteration" to designate elementary periodical unit of the aforementioned iterative procedure and term "step" to designate first and second steps inside every iteration. Nevertheless, we will use continuous numbering of the steps through all the iterations, so refer to the steps of the first iteration, – to the steps of the second iteration and so on.
At every step of a particular iteration balancing of the forecast matrix either by lines or by columns is performed. It’s quite clear that restrictions (7-8) in this case are fulfilled automatically. As we start our iterative procedure from matrix , there is hope that solutions, obtained by this heuristic, will provide the objective function (4) with volumes, relatively close to the minimal ones. Mind, that as we could have used other norm (not Euclidean) to define the measure of the difference between the objective matrix and the original one, there is no need to minimize the Euclidean norm exactly. What is indeed important, is for the solution to be in some way close to the original matrix .
matrix does not contain neither zero lines no zero columns,
all the elements of vectorsand are strictly positive,
and for the initial approximation the following is true:
[Proof.] Let’s take (14) as a Lyapunov function Polyak . For any matrix , which belongs to the admissible set (5-6), and only for such matrices, . Let’s show that the results of iterations of the method (10-11) does not move away from the admissible set, that is . For the second step (11) we have the following upper estimate:
For the first step (10) estimation is made in a similar way. The last equality in the previous formula is the consequence of the fact that all elements of matrices are non-negative. The last inequality in this formula might be strict. To understand when exactly it happens, let’s separately consider the expression below
Here we use the fact that sum of the elements of sequence by all at every iteration is equal to zero, and, as a consequence, the set of indexes is divided into two disjoint subsets: , that is the subset of indexes where elements of the sequence are positive, and the subset of indexes where they are negative (both subsets are not empty). But in this case, the set of indexes of the external sum is also divided into two subsets: , where the first subset corresponds with positive internal sums inside the module sign, while the second subset – with negative ones. Let’s show that both these subsets cannot be empty as well:
Let’s now in the expression for add and subtract those 2 terms out of 4, which participate in this expression with preceding minus sign:
The last inequality is correct because for 4 non-negative numbers the following implication is true:
Let’s make the first estimation of the proof stronger, taking into account that :
The last inequality is necessary because we need to obtain iteration-independent estimation. As current division of the sets of indexes into subsets for the iteration is not known, we have to use double minimum by all possible subsets , . This continues the upper estimation.
Suppose now, that the initial approximation is not in the "dead zone" () and is close enough to the admissible set for the condition (12
) to be true. For the oddand arbitrary subsets and we can estimate the following expression:
It’s easy to check that for odd number :
If we continue the obtained estimation by all iterations, we get:
In accordance with Lemma 1, sequence , which satisfy (18), converges to zero with the speed of geometric progression, if the initial approximation satisfy the condition (12), which corresponds with inequality (23) in the lemma.
Let a sequence of positive numbers be given by the recurrence relation
then there does exist a majorizing geometric progression with ratio (between zero and one) so that:
if for the initial term of the original sequence following conditions are satisfied:
[Proof.] Let’s rewrite recurrence relation (19) in the following way:
where is 1st order finite difference operator on an arbitrary numeric sequence . It can be shown that this operator has the following property:
where . Let’s now apply finite difference operator to the equation (24):
After summing in (27) from 1 to and reducing on the left side the same terms we get:
In accordance with Lemma 2 majorizing geometric progression will exist, if
for all . In this case all the minorants are equal to constant and inequalities (35) are fulfilled, the last one as a consequence of (22). Let’s designate expression in the left part of (29) as and calculate for it an estimate from below:
The last inequality is fulfilled only in case the initial term was selected not too far from zero:
Performed above simplification is based on (21) and on definition of the function . Let the ratio of the geometric progression be defined by the following parameter
then, according to (30), for the initial term to be close to zero is necessary:
where auxiliary function has, taking into account (21), the following form and properties:
Let the sequence of functions be positive and does not exceed 1 in the respective domains
In case there does exist minorizing sequence of functions so that
monotone decreasing by the set of its arguments, that is
then for recurrently defined positive numeric sequence
there does exist a majorizing sequence :