1 Introduction and motivation
1.1 Problem formulation
Ferson, Ginzburg, Kreinovich, Longpré, and Aviles studied the pair of optimization problems
where and are given input data and
It is obvious that Eq. 1 is a convex quadratic program (CQP) solvable in polynomial time, while Eq. 2 is easily proven to be NP-hard. It is worth noting that a general CQP solver yields a weakly polynomial algorithm for Eq. 1, but Ferson et al. introduced a strongly polynomial method.
1.2 Summary of results
In this text we focus on the NP-hard case Eq. 2, called MaxVariance, and the FGKLA algorithm. Our contribution is twofold.
Improving the worst-case complexity of the FGKLA algorithm
Proving a “good” behaviour in a probabilistic setting
Then we treat the input data as random variables. We introduce a natural and fairly general probabilistic model (details are in Section 3), under which we show that
on average, the algorithm works in time
the probability that the algorithm computes in exponential time tends to zero faster than exponentially with . In other words, we show that the “hard” instances are indeed rare.
More specifically, we prove that under the probabilistic model the following holds:
1.3 Motivation from statistics
The statistical motivation is as follows: we are interested in the sample variance of a dataset . However, the data are not observable. What is available instead is a collection of intervals , , such that (for example, instead of the exact values we only have rounded versions). Then cannot be computed exactly, but we can obtain tight bounds for it in the form of Eqs. 1 and 2. In econometrics, this phenomenon is sometimes called partial identification .
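For illustration, consider a minimal brute-force sketch in Python (function names are ours, and we assume the population form of the variance, with the factor 1/n). Since the variance is a convex quadratic function, its maximum over the box of intervals is attained at a vertex, so enumerating all 2^n endpoint combinations yields the upper bound; this is, of course, feasible only for tiny n:

```python
# Brute-force sketch of the upper variance bound over interval data.
# Assumptions: intervals given as (lo_i, hi_i) pairs containing the
# unobserved exact values; variance in the population form (factor 1/n).
from itertools import product

def sample_variance(xs):
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n

def max_variance_bruteforce(intervals):
    # The maximum is attained at a vertex of the box, so it suffices
    # to check all 2^n combinations of interval endpoints.
    return max(sample_variance(xs) for xs in product(*intervals))

# Rounded observations: each true value lies in [k - 0.5, k + 0.5].
ivals = [(0.5, 1.5), (1.5, 2.5), (3.5, 4.5)]
print(max_variance_bruteforce(ivals))
```

The exponential enumeration is exactly what the FGKLA algorithm avoids for typical inputs.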
1.4 Related work
In general, this paper contributes to the analysis of the complexity of optimization problems and algorithms when the input data can be assumed to be random, drawn from a particular distribution or a class of distributions. As a prominent example, recall the famous average-case analysis of the Simplex Algorithm, where the phenomenon “exponential in the worst case but fast on average” has been studied since the 1980s.
The phenomenon is particularly interesting in the case of NP-hard problems, since exponential worst-case time seems to be unavoidable. From the areas related to our work, we mention average-case complexity studies of the well-known NP-complete -clique problem: Rossman derived bounds on the average-case complexity of the -clique problem on monotone circuits. His results were subsequently followed by Fountoulakis et al. in a study of whether the “hard” instances occur frequently or rarely under a probabilistic setup. The result is in some sense similar to ours: if the probability of an edge between two vertices comes from a “natural” distribution function, then deterministic algorithms for the -clique problem work in polynomial time with “high” probability, i.e., “hard” instances occur with probability smaller than the reciprocal of any polynomial in the number of vertices.
2 FGKLA Algorithm
Recall that the input instance is given by the pair and . Compact intervals will be denoted in boldface, e.g. . For define
The numbers are referred to as the center and the radius of , respectively, and is called a narrowed interval (i.e., shrunk by a factor around its center). For we define (the mean of ).
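In code, these quantities might be computed as follows (a sketch with illustrative names; we assume the narrowing factor 1/n used in the FGKLA construction, which is elided above):

```python
# Hypothetical helper computing centers, radii, and narrowed intervals.
# Assumption: the narrowed interval is [c_i - r_i/n, c_i + r_i/n], i.e.
# the interval shrunk by the factor 1/n around its center; names are
# illustrative only.
def narrow(intervals):
    n = len(intervals)
    narrowed = []
    for lo, hi in intervals:
        c = (lo + hi) / 2   # center
        r = (hi - lo) / 2   # radius
        narrowed.append((c - r / n, c + r / n))
    return narrowed

print(narrow([(0.0, 2.0), (1.0, 5.0)]))
```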
Our version of the FGKLA algorithm is summarized as Algorithm 1. The main result of this section is Theorem 1. In particular, it improves the worst-case complexity bound from  (see also Remark 1). The proof of Theorem 1 will be given in Section 2.1.
Theorem 1 (properties of the FGKLA algorithm (Algorithm 1))
The FGKLA algorithm correctly solves Eq. 2.
Let be an undirected graph where if and only if (here, ). Let be the size of the largest clique in . Then, the FGKLA algorithm works in time .
The graph from Theorem 1 is referred to as FGKLA intersection graph with data .
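A sketch of the clique number ω of the FGKLA intersection graph, assuming (as the construction suggests) that distinct vertices are adjacent iff the corresponding narrowed intervals intersect; this adjacency condition is our assumption, since it is elided above. A graph of pairwise interval intersections is an interval graph, so by the Helly property of intervals every clique shares a common point; ω is therefore the maximum overlap depth, found by a sweep over the endpoints:

```python
# Clique number of an interval-intersection graph via an endpoint sweep.
def clique_number(narrowed):
    events = []
    for lo, hi in narrowed:
        events.append((lo, 0))   # interval opens
        events.append((hi, 1))   # interval closes
    events.sort()                # at ties, opens (0) precede closes (1):
                                 # closed intervals touching in a point intersect
    depth = best = 0
    for _, kind in events:
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best

print(clique_number([(0.5, 1.5), (1.0, 2.0), (1.2, 3.0), (4.0, 5.0)]))
```

The sweep runs in O(n log n) time, so determining ω itself is cheap compared to the 2^ω factor in the running time bound.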
2.1 Idea of the FGKLA algorithm
Since the quadratic form is positive semidefinite, the maximum of Eq. 2 is attained in a vertex (also called extremal point) of the feasible set
Lemma 1 ()
Let and let there exist such that
for all and
one of the following is satisfied:
, and ,
, and .
Let be a maximizer and .
Let be the set of all vectors satisfying:
if , and
Then contains a maximizer.
In cases (a) and (b) we say that variable (or index ) is fixable with respect to ; in case (c), variable (or index ) is free with respect to .
Algorithm 1 works as follows. It builds the set (line 1) containing all endpoints of the narrowed intervals , and denotes them (line 2). Consider the set of all possible means. The endpoints from divide the interval into at most regions
Thanks to Corollaries 1 and 1, every region contains means with the same set of free indices. For a region , we denote this set by , i.e. . The set of endpoints contains the worst possible mean values with respect to the number of free indices. More precisely: for every , , all indices from are free.
Then, Algorithm 1 takes the means one by one. For every mean, say , it takes the set of indices of narrowed intervals beginning in and inserts it into the set of free indices with respect to (line 7). Indices are fixable with respect to . This yields candidate extremal points that are examined by Algorithm 2, called on line 8.
Then, indices from the set of narrowed intervals ending in are removed from . Intervals with these indices will be fixed to the lower endpoint for every upcoming (line 9 of Algorithm 1). The update of and will be explained later.
Algorithm 2 consecutively traverses all extremal points (vertices of ) resulting from fixing either or for the free indices . For every such vertex, say , the variance is computed. To make these computations cheap, the traversal of is performed in such a way that two successive extremal points differ in just one component. Then Lemma 2 shows how to get from in arithmetic operations. The variance is stored indirectly via the variables and ; they can be easily updated when is switched to , or vice versa.
For , we have , where and . Furthermore, if differs from in just one component, say the th, then
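The constant-time update behind Lemma 2 can be sketched as follows (the names s1, s2 are illustrative assumptions): maintain the sums s1 = Σ x_i and s2 = Σ x_i², so that the variance is V = s2/n − (s1/n)², and switching one coordinate costs O(1):

```python
# O(1) variance update when one coordinate of x is switched.
def variance_from_sums(s1, s2, n):
    # V = (1/n) * sum x_i^2 - mean^2
    return s2 / n - (s1 / n) ** 2

def flip(s1, s2, old, new):
    # Replace one coordinate old -> new in the running sums.
    return s1 + new - old, s2 + new * new - old * old

xs = [1.0, 2.0, 3.0]
n = len(xs)
s1, s2 = sum(xs), sum(x * x for x in xs)
s1, s2 = flip(s1, s2, xs[2], 5.0)      # switch x_3 from 3.0 to 5.0
print(variance_from_sums(s1, s2, n))   # the variance of [1, 2, 5]
```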
Algorithm 2 is an adaptation of the algorithm from [10, p. 37] for enumeration of the elements of the set for a given . The enumeration can in general start from an arbitrary element. The proof of correctness can be found therein. In our variant, the variable indicates the current extremal point. In every iteration of the while cycle, some is set to . The th index is taken from (here we consider as a list rather than a set) and is switched to the other endpoint. For this new extremal point, and are updated (line 8) and the resulting variance is compared to the best value found so far (line 9).
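A loose sketch of the enumeration idea (not the paper's exact Algorithm 2; names and the Gray-code bookkeeping are our assumptions): visit all 2^m endpoint choices for the free intervals in Gray-code order, so that consecutive extremal points differ in exactly one component, and update the running sums in O(1) per step:

```python
# Enumerate all vertices obtained by fixing each free interval to one of
# its endpoints, in Gray-code order, tracking the maximal variance.
# 'fixed' holds already-fixed coordinate values; 'free' holds (lo, hi) pairs.
def max_variance_over_vertices(fixed, free):
    n = len(fixed) + len(free)
    xs = fixed + [lo for lo, hi in free]   # start: all free at lower endpoint
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    best = s2 / n - (s1 / n) ** 2
    m = len(free)
    for k in range(1, 2 ** m):
        j = (k & -k).bit_length() - 1      # Gray code: bit j flips at step k
        old = xs[len(fixed) + j]
        lo, hi = free[j]
        new = hi if old == lo else lo      # switch to the other endpoint
        xs[len(fixed) + j] = new
        s1 += new - old                    # O(1) update of the sums
        s2 += new * new - old * old
        best = max(best, s2 / n - (s1 / n) ** 2)
    return best

print(max_variance_over_vertices([2.0], [(0.0, 1.0), (3.0, 4.0)]))
```

Since the lowest set bit of k indicates the flipped coordinate, every vertex is visited exactly once and each step costs O(1) beyond the amortized bookkeeping discussed below.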
Finally, line 9 of Algorithm 1 removes the intervals ending in from . These intervals will be fixed to their lower endpoints in the following iterations. Since they are at the upper endpoint now, line 9 updates and accordingly.
Proof (of Theorem 1)
Correctness. Let be a maximizer of Eq. 2. Since the maximum is attained in a vertex of the feasible set , we can assume for all . Moreover, thanks to Corollary 1, we can assume for every such that , and for every such that . Put all other indices into the set , i.e. . Set . Consider the set processed by Algorithm 2 in the th iteration of Algorithm 1. By construction, . Hence, the maximizer is among the examined extremal points.
Complexity. On line 2, the algorithm sorts numbers, with complexity . Algorithm 2 is called at most times, where . Recall that is the size of the largest clique of the FGKLA intersection graph. In the th iteration of the for cycle on lines 6 to 10 of Algorithm 1 we have . Thus .
Algorithm 2 performs exactly iterations of the while cycle on lines 2 to 11. Inside each iteration, there is the for cycle on lines 3 to 6. The amortized time complexity of this for cycle is , because in each iteration it either sets some nonzero to or stops iterating. Since is set to a nonzero value only times, the overall time of all runs of the for cycle is .
The amount of work in the remaining steps is negligible. In particular, note that since are pairwise disjoint sets (the same holds true for ), the total number of iterations of the for cycles on lines 9 and 7 is at most during the whole run of the FGKLA algorithm.
The overall complexity is .
Aside from the implementation details (which are, however, important for the reduced time complexity bound), our formulation of the algorithm also differs from the original paper for another reason. The original formulation can lead to complexity , for example if and there are narrowed intervals ending in some and further narrowed intervals starting in . However, a minor modification of the original formulation would suffice to achieve time .
3 A probabilistic model where the FGKLA algorithm works in time on average
This section is devoted to the main result: on average, the FGKLA algorithm works in “almost linear time”, and the cases when it computes in exponential time occur extremely rarely.
Here we use the statistical motivation of the problem as described in Section 1.3. Namely, in statistics, data are often assumed to form a random sample from a certain distribution. This is exactly our probabilistic model: we assume that both centers of the intervals and their radii form two independent random samples from fairly general classes of distributions.
Assumption (the probabilistic model)
are independent and identically distributed (“i.i.d.”) random variables with a Lipschitz continuous cumulative distribution function (“c.d.f.”). That is, there exists a constant such that
are i.i.d. nonnegative random variables with a finite moment of order for some . In other words, we assume
We assume that the pair of random variables , are independent.
Theorem 2
The size of the largest clique of the FGKLA intersection graph with data has the following properties:
for a sufficiently large ; here ,
for every there is an such that if .
Corollary 2 (main result)
The average computing time is
for an arbitrarily small . Moreover, the exponential case, when is linear in , occurs with probability (i.e., with a probability decaying even faster than exponentially).
The assumption on the distribution of the radii is very mild; indeed, we need just a little more than the existence of the expectation (we do not even need a finite variance). On the other hand, the Lipschitz continuity of (Item 1) is unavoidable; we will show in Section 4 what can happen without Lipschitz continuity. We will also discuss there what happens when we relax the independence assumption (Item 3), and what price for dependence is paid in terms of the existence of higher-order moments.
In Eq. 7 we have imposed a technical condition . This is not restrictive since the interesting cases are those with . Indeed, the difficult case is when is close to zero (“radii can be large with a high probability”), (“radii are large on average”) and (“the density of centers can have high peaks”, or “many centers can be close to one another”).
We have also introduced a technical condition in Item 2. Again, this is not at all restrictive: for example, if a distribution of the reader's interest has finite moments of high order, it must also have a finite moment of low order .
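The probabilistic model is easy to simulate; the following Monte Carlo sketch uses concrete distributional choices that are purely illustrative assumptions: centers i.i.d. uniform on [0, 1] (a Lipschitz c.d.f.), radii i.i.d. Pareto with tail index 1.5 (a finite moment of order 1 + ε for every ε < 0.5, yet infinite variance), centers independent of radii, and the narrowing factor 1/n. One observes that the clique number of the resulting intersection graph stays far below n:

```python
# Monte Carlo sketch of the probabilistic model; all concrete choices
# (uniform centers, Pareto(1.5) radii, narrowing factor 1/n) are
# illustrative assumptions.
import random

def omega(n, rng):
    ivals = []
    for _ in range(n):
        c = rng.random()                       # center ~ U(0, 1)
        r = rng.paretovariate(1.5)             # heavy-tailed radius, E[r] = 3
        ivals.append((c - r / n, c + r / n))   # narrowed interval
    # Clique number of the interval-intersection graph = max overlap depth.
    events = sorted((e, k) for lo, hi in ivals for e, k in ((lo, 0), (hi, 1)))
    depth = best = 0
    for _, kind in events:
        depth += 1 if kind == 0 else -1
        best = max(best, depth)
    return best

rng = random.Random(42)
print(max(omega(1000, rng) for _ in range(20)))   # typically a small constant
```

Since the expected length of a narrowed interval is of order 1/n while the centers spread over a fixed interval, the expected overlap depth at any point stays bounded, which is the intuition behind the theorem.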
3.1 Proof of Theorem 2
Let and let be the probability that is an edge of the FGKLA intersection graph. That is,
Observe that does not depend on by the i.i.d. assumptions.
Given a random variable , its probability density function (“p.d.f.”) is denoted by .
We have for every .
Observe that the p.d.f. of exists because the c.d.f. is Lipschitz continuous (and thus absolutely continuous). Recall that is the corresponding Lipschitz constant. The Lipschitz condition also implies for all . By symmetry, . Using the independence of , the symmetry of around zero, and the convolution theorem, we get
where is the 0-1 indicator of . Now .
Recall that the number has been introduced in Eq. 7.
Recall that is the value of the th moment of for some . The well-known Minkowski inequality tells us
By Markov’s inequality we get
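The two estimates just invoked might read as follows; the symbols used here ($R_i$, $R_j$ for the radii, $M_{1+\varepsilon}$ for the moment, $t$ for the threshold) are our illustrative reconstruction, not necessarily the notation of the elided displays:

```latex
% Minkowski: the (1+eps)-norm of a sum is at most the sum of the norms.
\left(\mathbb{E}\,(R_i + R_j)^{1+\varepsilon}\right)^{\frac{1}{1+\varepsilon}}
  \le \left(\mathbb{E}\,R_i^{1+\varepsilon}\right)^{\frac{1}{1+\varepsilon}}
    + \left(\mathbb{E}\,R_j^{1+\varepsilon}\right)^{\frac{1}{1+\varepsilon}}
  = 2\,M_{1+\varepsilon}^{\frac{1}{1+\varepsilon}}.
% Markov: applied to the nonnegative variable (R_i + R_j)^{1+eps}.
\Pr\bigl(R_i + R_j \ge t\bigr)
  = \Pr\bigl((R_i + R_j)^{1+\varepsilon} \ge t^{1+\varepsilon}\bigr)
  \le \frac{\mathbb{E}\,(R_i + R_j)^{1+\varepsilon}}{t^{1+\varepsilon}}
  \le \frac{2^{1+\varepsilon}\, M_{1+\varepsilon}}{t^{1+\varepsilon}}.
```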
Using the Law of Total Probability and the independence of and , we get
Let us introduce indicator variables ():
Obviously, almost surely (“a.s.”) if . If , then is Bernoulli distributed with parameter . Moreover, the variables
are independent (this is an important point). Putting
Now we can use an estimate based on (a kind of) Penrose's method; see , Chapter 6. The crucial observation is
Indeed, if the FGKLA graph has a clique of size containing vertex , then at least indicators from Eq. 9 are one.
Lemma 5 (Tail bound for the binomial distribution [9, p. 16])
If and , then
where was defined in Eq. 7 and .
for a sufficiently large .
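A tail bound of this type can be checked numerically. The sketch below uses one common Chernoff-type form appearing in Penrose's book (whether it matches the elided statement of Lemma 5 verbatim is our assumption): for Y ~ Bin(n, p) and k ≥ np, P(Y ≥ k) ≤ exp(−np·H(k/(np))) with H(a) = 1 − a + a·ln a:

```python
# Numeric sanity check of a Chernoff-type binomial tail bound
# (one common form; equivalence to the paper's Lemma 5 is assumed).
from math import comb, exp, log

def binom_tail(n, p, k):
    # Exact P(Y >= k) for Y ~ Bin(n, p).
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def chernoff_bound(n, p, k):
    # exp(-n*p*H(a)) with a = k/(n*p) and H(a) = 1 - a + a*ln a, valid for k >= n*p.
    a = k / (n * p)
    return exp(-n * p * (1 - a + a * log(a)))

n, p = 200, 0.05                       # mean n*p = 10
for k in (10, 20, 40, 80):
    assert binom_tail(n, p, k) <= chernoff_bound(n, p, k) + 1e-12
print("bound holds on the tested points")
```

For a = 1 the exponent H(1) vanishes and the bound is trivial; it becomes exponentially strong as k moves above the mean, which is exactly the regime used in the proof.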
It is easy to verify that is increasing for . If (see also Remark 4)
Continuing with estimate Eq. 11 with , we get
Property Eq. 10 and Lemma 4 imply the correctness of Eq. 16 when is sufficiently large. In Eq. 15 we used the fact that the are identically distributed (but not independent), together with the Bonferroni inequality , valid for arbitrary events . It remains to prove Eq. 17.