The moment problem is a classical problem in analysis and optimization, with roots dating back to the middle of the nineteenth century. At that time, the goal was to bound tail probabilities and expectations given distributional moment information. This line of research remains active to the present day. For example, Bertsimas and Popescu  provide tight closed-form bounds on
given the first three moments of a random variable. He et al.  extend the problem to random variables with given first, second and fourth-order moments, which also yields the first nontrivial bound for .
Beyond these foundational questions, the moment problem serves as an important building block in a variety of applications in the stochastic and robust optimization literatures [44, 42, 48, 47, 51]. In particular, moment problems are foundational to distribution-free robust optimization, where insight into the structure of optimal measures can be used to devise algorithms and describe properties of optimal decisions. A classic example of this approach is due to Scarf et al. , who leverage the fact that an optimal solution to the moment problem given the first two moments is a sum of two Dirac measures. This insight provides an analytical formula for the optimal inventory decision in a robust version of the newsvendor problem. There is a vast literature on robust optimization that builds on these initial insights in a variety of directions (see, for instance, [20, 16, 27, 4, 11, 36, 18, 14, 28] among many others).
The focus of this paper is the discrete moment problem, an important special case of the general moment problem that is less well studied in the literature. In the discrete moment problem, the underlying sample space is a discrete set. The work of Prékopa (see, for instance, Prékopa ) made a fundamental contribution by devising efficient linear programming methods to study discrete moment problems. These approaches remain state of the art and have seen application in numerous areas, including project management and network reliability . Project management has also been studied in the robust optimization literature (see, for instance, ).
In classical versions of the moment problem (including the works by Prékopa and his co-authors just cited), the only constraints arise from specifying a finite number of moments. One criticism of this approach is that it can result in bounds and conclusions that are too weak to be meaningful or, in the case of robust optimization with only moment constraints, in decisions that are too conservative. For instance, Scarf’s solution for the newsvendor problem may even suggest ordering no inventory even when the profit margin is high . This has driven researchers to introduce additional constraints, including constraints on the shape of the distribution. For example, Perakis and Roels  study the newsvendor problem leveraging non-moment information, including symmetry and unimodality. Han et al.  study the newsvendor problem relaxing the usual assumption of risk neutrality. Saghafian and Tomlin  analyze the problem with bounds on tail probabilities, and Karthik et al.  recently developed closed-form solutions under asymmetric demand information. In all cases, the resulting inventory decisions are more intuitive and less conservative than in the classical setting with moment information alone. Other robust optimization papers that consider shape constraints include Li et al. , who study chance constraints and conditional Value-at-Risk constraints when the distributional information consists of the first two moments and unimodality; Lam and Mottet , who study tail distributions under convex-shape constraints; and Hanasusanto et al. , who study multi-item newsvendor problems with multimodal demand distributions.
However, introducing shape constraints brings new theoretical challenges. A seminal paper by Popescu  provides a general framework for studying continuous moment problems under shape constraints that include, among others, symmetry and unimodality. These moment problems are formulated as semidefinite programs (SDPs) that are solvable in polynomial time. Perakis and Roels  also employ Popescu’s framework to provide analytical robust solutions to the newsvendor problem under shape constraints that are better behaved than the classical Scarf solutions. For the discrete moment problem, we are aware of only one paper  that considers shape constraints. Subasi et al.  adapt Prékopa’s linear programming (LP) methodology to include unimodality, which is modeled by an additional set of linear constraints.
Both Popescu  and Subasi et al.  illustrate how a certain class of constraints can be adapted into existing computational frameworks: SDP-based in the case of Popescu and LP-based in the case of Subasi et al. However, there remain shape constraints of practical significance that do not naturally fit into these settings. In this paper we focus on two shape constraints: log-concavity (LC) and the increasing failure rate (IFR) property of discrete distributions (these are defined in Section 2 below). Here, we briefly highlight the importance and applications of each class of distributions.
LC measures arise naturally in many applications. For example, Subasi et al.  illustrate how the length of a critical path in a PERT model, where individual task times are described by beta distributions, has an LC distribution whose other properties (beyond the moments inferred from the beta distributions) are unknown. Log-concavity has a wide range of applications in statistical modeling and estimation; e.g., Dümbgen et al.  show how log-concavity allows the estimation of a distribution from arbitrarily censored data (a common form of data for demand observations). Log-concavity also plays a critical role in economics . For example, in contract theory, one commonly assumes that an agent’s type is an LC random variable . The log-concavity of a distribution function has also been widely used in the theory of regulation [6, 34] and in characterizing efficient auctions [39, 37].
IFR distributions also play an important role in numerous applications in fields as wide-reaching as reliability theory , inventory management , revenue management  and contract theory [15, 33]. One reason for the prevalence of IFR distributions in applications is that they are closed under sums of independent random variables (and the associated convolutions of distribution functions). This is not the case for some shape properties studied by others, such as unimodality. In applications, the IFR property simplifies optimality conditions and thereby facilitates the derivation of properties of optimal decisions that yield managerial insights.
In Section 2 we show that the standard characterizations of discrete LC and IFR distributions, when added to the moment problem, make the resulting problem nonconvex and thus not amenable to either an SDP or LP formulation. Indeed, when Subasi et al.  derive LC distributions in their applications, they relax the LC property to unimodality, a shape constraint that can be approached by LP techniques.
At this point, one could turn to approximation methods, including conic optimization techniques, to solve the resulting nonconvex formulation. It is well known that the copositive cone and its dual are powerful tools that can convert nonconvex problems into equivalent convex ones (see, e.g., [40, 53, 12, 22]). For instance, the LC discrete moment problem considered here can be cast as a completely positive conic problem . Despite this convexity, the resulting problem is still computationally intractable, and further relaxation is required to obtain an approximate solution.
We do not follow an approximation approach. The nonconvexities that arise in our problems are of a certain type that can be leveraged to provide an exact global optimization algorithm and analytical results on the structure of optimal solutions. Indeed, the feasible regions have reverse convex properties (as introduced in  and later developed in  among others). A set is reverse convex if its complement is convex. Reverse convex programming is a little-studied field that has largely found application in the global optimization literature (see, for instance, Horst and Thoai ). To our knowledge, this theory has not been applied in the robust optimization literature.
In Section 3 we extend standard results in the reverse convex programming literature (in particular, those of ) so that they are applicable to our setting by introducing the notion of reverse convexity relative to another set. The main benefit is that we can show reverse convex programs of this type have the following appealing structure: there exist optimal extreme point solutions with a basic feasible structure analogous to basic feasible solutions in linear programming. The basic feasible structure reveals (in Section 4) that optimal extreme point distributions in the LC and IFR settings have piecewise geometric structure. This analytical characterization allows us to solve these moment problems as low-dimensional systems of polynomial equations. We propose a specialized computational scheme for working with such systems. This allows us to provide numerical bounds on probabilities that are tighter than those in the existing literature, including those bounds that leverage unimodal shape constraints (see Section 5). All proofs not in the main text are included in the appendix.
Summary of contributions
The main focus of the paper is on theoretical properties of LC- and IFR-constrained moment problems, where we provide structural results on optimal solutions. For the LC case we show that there exist optimal solutions that are piecewise geometric, and for the IFR case we show that the tail probabilities of optimal distributions are piecewise geometric.
Our structural results and computational approaches suggest a wide range of applications due to the prevalence of these classes of shape constraints in real applications, as discussed above. Our results can provide new bounds on tail inequalities (i.e., ) for a random variable under moment and shape constraints, and we provide a numerical framework for computing these bounds.
Moreover, the techniques developed in this paper allow us to solve an inner maximization problem with LC and IFR constraints. Our structural results could prove useful in solving the outer minimization problem in a robust optimization framework. Indeed, solution approaches to the standard max-min robust optimization formulation benefit greatly when the inner maximization problem has analytical structure.
Finally, we prove a new result on a generalized form of reverse convex optimization (Theorem 3.6) that may be of independent interest, with potential applications to other nonconvex optimization problems.
We use the following notation throughout the paper. Let denote the set of real numbers and
the vector space of -dimensional real vectors. Moreover, let denote the set of -dimensional vectors with all nonnegative components and denote the set of -dimensional vectors with all positive components. The closure of the set in (in the usual topology) is denoted and its boundary by . Let denote the expectation operator and the indicator function of set .
Let denote the set of consecutive integers, starting with integer and ending with integer . Similarly, let . We will not have occasion to use and in their usual sense as intervals in , so there is no chance for confusion. For , positive integers, denotes the binomial coefficient of choose ; that is, it counts the number of ways to choose -subsets of objects.
2 The discrete moment problem with nonconvex shape constraints
We study the classical problem of moments with moments (cf. ):
where is a subset of measures on the measurable space (with elements denoted by ) with -algebra , is a measurable function and for . We take to ensure that is a probability measure and the remaining constraints correspond to requiring the measure has as its first moments.
Our focus is where is a finite set of real numbers and is the power set of . In fact, we assume that and so (however, see Section 5.1 where we have occasion to rescale the ). In this setting, a measure can be represented by a nonnegative -dimensional vector where and for . We will refer to the vector as a distribution and often suppress the measure that it represents. This yields the following discrete moment problem (DMP):
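To make the constraints of (DMP) concrete, the following Python sketch evaluates the moment constraints for a candidate distribution vector. It assumes, for illustration only, that the sample space is {0, 1, ..., n} and that the j-th moment function is i -> i^j; the helper names `moments` and `satisfies_moments` are hypothetical. Exact rational arithmetic (`fractions.Fraction`) avoids spurious violations of the equality constraints caused by floating-point rounding.

```python
from fractions import Fraction


def moments(p, m):
    """Return [sum_i i**j * p[i] for j = 0, ..., m].

    Assumption (illustration only): p[i] is the mass placed on the point i of
    the sample space {0, 1, ..., n}, and the j-th moment function is i**j.
    """
    return [sum(Fraction(i) ** j * Fraction(pi) for i, pi in enumerate(p))
            for j in range(m + 1)]


def satisfies_moments(p, b):
    """Check p >= 0 and the equalities sum_i i**j * p[i] == b[j], j = 0..m.

    b[0] should be 1 so that p is a probability distribution.
    """
    if any(pi < 0 for pi in p):
        return False
    return moments(p, len(b) - 1) == [Fraction(bj) for bj in b]


# Example: the binomial(2, 1/2) distribution on {0, 1, 2}.
p = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(moments(p, 2))  # total mass 1, mean 1, second moment 3/2
```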
Definition 2.1 (cf. Definition 2.2 in ).
A distribution is discrete log-concave (or simply log-concave or (LC)) if (i) for any such that then ; and (ii) for all , . We let denote the class of all LC distributions.
More precisely, (i) implies that for every LC distribution there exists a consecutive support for some such that for and otherwise. For an LC distribution with support we must then ensure holds for . At all other the inequality is trivial because at least one of or is zero.
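A direct check of Definition 2.1 mirrors the two-step view above: first verify that the support is consecutive, then verify the inequality only at interior points of that support. This is a minimal sketch; the function name is ours, and treating the zero vector as non-LC is an assumed convention.

```python
from fractions import Fraction


def is_log_concave(p):
    """Check Definition 2.1 for a finite nonnegative sequence p.

    (i)  the support {i : p[i] > 0} consists of consecutive indices, and
    (ii) p[i]**2 >= p[i-1] * p[i+1] on that support.
    """
    support = [i for i, pi in enumerate(p) if pi > 0]
    if not support:
        return False  # assumed convention: the zero vector is not a distribution
    lo, hi = support[0], support[-1]
    if len(support) != hi - lo + 1:
        return False  # a zero strictly inside the support violates (i)
    # At i = lo, i = hi and outside [lo, hi], a neighbor is zero (or absent),
    # so the inequality is trivial; only interior points of the support matter.
    return all(p[i] ** 2 >= p[i - 1] * p[i + 1] for i in range(lo + 1, hi))


binomial = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(is_log_concave(binomial))                                            # True
print(is_log_concave([Fraction(1, 2), 0, Fraction(1, 2)]))                 # False
print(is_log_concave([Fraction(1, 10), Fraction(1, 10), Fraction(4, 5)]))  # False
```

The second example fails condition (i) because of the gap at 1; the third has consecutive support but violates (ii) at the middle point.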
Definition 2.2 (cf. Definition 2.4 in ).
A distribution has an increasing failure rate (IFR) if the failure rate sequence is a non-decreasing sequence; that is, for all . We let denote the class of all IFR distributions.
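The IFR property can be checked in the same spirit. The sketch below assumes the usual discrete convention for the failure rate, r_i = P(X = i) / P(X >= i), defined only while the tail probability is positive; the helper names are hypothetical.

```python
from fractions import Fraction


def failure_rates(p):
    """Return r[i] = p[i] / (p[i] + p[i+1] + ... + p[n]) while the tail is positive.

    Assumption: the standard discrete convention r_i = P(X = i) / P(X >= i);
    beyond the support the tail is zero and the rate is left undefined.
    """
    rates = []
    tail = sum(Fraction(pi) for pi in p)
    for pi in p:
        if tail == 0:
            break
        rates.append(Fraction(pi) / tail)
        tail -= Fraction(pi)
    return rates


def is_ifr(p):
    """True if the (defined) failure rates form a non-decreasing sequence."""
    r = failure_rates(p)
    return all(a <= b for a, b in zip(r, r[1:]))


p = [Fraction(1, 10), Fraction(1, 10), Fraction(4, 5)]
print(failure_rates(p))  # rates 1/10, 1/9, 1
print(is_ifr(p))         # True
```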
It is well-known that is a strict subset of . It is relatively straightforward to see that the sets and are nonconvex. However, they share one additional common feature that is critical to our approach.
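Both claims in the paragraph above can be illustrated on three-point distributions. In this sketch (helper names hypothetical; failure rate convention r_i = P(X = i) / P(X >= i) assumed), two point masses are LC and IFR, yet their equal-weight mixture is neither, so both classes are nonconvex; a second example is IFR but not LC, so the inclusion is strict.

```python
from fractions import Fraction


def is_lc(p):
    """Definition 2.1: consecutive support and p[i]**2 >= p[i-1]*p[i+1]."""
    s = [i for i, x in enumerate(p) if x > 0]
    if not s or len(s) != s[-1] - s[0] + 1:
        return False
    return all(p[i] ** 2 >= p[i - 1] * p[i + 1] for i in range(s[0] + 1, s[-1]))


def is_ifr(p):
    """Definition 2.2, with failure rate p[i] / P(X >= i) where the tail is positive."""
    rates, tail = [], sum(p)
    for x in p:
        if tail == 0:
            break
        rates.append(Fraction(x) / tail)
        tail -= x
    return all(a <= b for a, b in zip(rates, rates[1:]))


def mix(p, q, t=Fraction(1, 2)):
    """Convex combination t*p + (1 - t)*q."""
    return [t * a + (1 - t) * b for a, b in zip(p, q)]


point_mass_0 = [1, 0, 0]
point_mass_2 = [0, 0, 1]
mixture = mix(point_mass_0, point_mass_2)  # [1/2, 0, 1/2]

# Both endpoints are LC and IFR, but their midpoint is neither: the gap at 1
# breaks consecutive support, and the failure rate drops from 1/2 to 0.
print(is_lc(point_mass_0), is_lc(point_mass_2), is_lc(mixture))     # True True False
print(is_ifr(point_mass_0), is_ifr(point_mass_2), is_ifr(mixture))  # True True False

# IFR but not LC, so the inclusion of the LC class in the IFR class is strict.
q = [Fraction(1, 10), Fraction(1, 10), Fraction(4, 5)]
print(is_ifr(q), is_lc(q))  # True False
```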
Definition 2.3.
A set in is reverse convex if for some convex set . A set is said to be reverse convex with respect to (w.r.t.) a set if for some convex set .
In the remainder of this section we show that, when  is set to  or , problem (DMP) has constraints that are reverse convex w.r.t. . This common feature is leveraged to solve these related problems to global optimality in a unified framework.
This seemingly straightforward generalization to reverse convexity w.r.t. , however, can lead to significantly different analytical properties. For example, observe that if a function is quasiconcave (over ) then its lower level sets are reverse convex. However, a function whose lower level sets are reverse convex w.r.t. some strict subset of need not be quasiconcave. In Section 3 we show that problems with this kind of reverse convex structure can be approached using a novel optimization technique that extends the pioneering work of .
2.1 The moment problem over log-concave distributions
Consider problem (DMP) when . We separate the optimization over into first determining a support (mapping to condition (i) of Definition 2.1) and then introducing inequalities of the form for in that support (mapping to condition (ii) in Definition 2.1). This yields the two-stage optimization problem:
The strict constraints (2d) make the feasible region appear not to be closed. However, the following reformulation of (DMP-LC) reveals that the feasible region can be described with non-strict inequalities and is thus closed:
In (DMP-LC’) there is no outer maximization over the support between and .
Proposition 2.4.
Problems (DMP-LC) and (DMP-LC’) are equivalent.
Proposition 2.5.
The set is convex for any positive integers and .
Whereas the set is convex, the set obtained by relaxing nonnegativity (that is, ) is not convex. Indeed, and are in but is not in . This means that is not quasiconcave on its domain.
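As a sanity check of this convexity contrast, we take what we assume to be the simplest instance of the set in Proposition 2.5: C = {(u, v, w) in R^3_+ : uw > v^2}, the complement within the orthant of a single LC constraint v^2 >= uw. The sketch tests midpoint convexity on random nonnegative integer points using exact arithmetic (a numerical check, not a proof; convexity of C follows from concavity of sqrt(uw) on the orthant), and then exhibits a violating pair once nonnegativity is dropped. The two points in the second check are our own illustration.

```python
import random
from fractions import Fraction


def in_C(x, require_nonnegative=True):
    """Membership in {(u, v, w) : u*w > v**2}, optionally intersected with R^3_+."""
    u, v, w = x
    if require_nonnegative and min(u, v, w) < 0:
        return False
    return u * w > v * v


def midpoint(x, y):
    return tuple(Fraction(a + b, 2) for a, b in zip(x, y))


def midpoint_violations(points, require_nonnegative):
    """Count ordered pairs of members whose midpoint is not a member.

    A convex set has no violations; a single violation certifies nonconvexity.
    """
    members = [x for x in points if in_C(x, require_nonnegative)]
    return sum(1 for x in members for y in members
               if not in_C(midpoint(x, y), require_nonnegative))


rng = random.Random(0)
sample = [tuple(rng.randint(0, 20) for _ in range(3)) for _ in range(150)]
print(midpoint_violations(sample, require_nonnegative=True))  # 0 inside the orthant

# Without nonnegativity: (2, 1, 2) and (-2, 1, -2) both satisfy u*w = 4 > 1,
# but their midpoint (0, 1, 0) does not.
print(midpoint_violations([(2, 1, 2), (-2, 1, -2)], require_nonnegative=False))  # 2
```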
2.2 The moment problem over increasing failure rate distributions
Consider problem (DMP) with . The following result illustrates a tight connection between the IFR case and the LC case. This result is known in the continuous case (see [5, Chapter 2]); we provide details for the discrete analogue that is the focus of this paper.
A distribution has an increasing failure rate if and only if its tail probability sequence is log-concave, where .
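The equivalence between the IFR property and log-concave tails can be checked exhaustively on a small grid. The sketch below (helper names hypothetical; failure rate r_i = p_i / T_i assumed, with T_i = P(X >= i)) compares the IFR test with log-concavity of the tail sequence for every distribution on {0, 1, 2, 3} whose masses are multiples of 1/6. A grid check is evidence, not a proof; the proof rests on the identity r_i = 1 - T_{i+1} / T_i.

```python
from fractions import Fraction
from itertools import product


def tails(p):
    """T[i] = p[i] + ... + p[n], i.e. the tail probability P(X >= i)."""
    out, acc = [], Fraction(0)
    for x in reversed(p):
        acc += x
        out.append(acc)
    return out[::-1]


def is_lc(p):
    """Definition 2.1: consecutive support and p[i]**2 >= p[i-1]*p[i+1]."""
    s = [i for i, x in enumerate(p) if x > 0]
    if not s or len(s) != s[-1] - s[0] + 1:
        return False
    return all(p[i] ** 2 >= p[i - 1] * p[i + 1] for i in range(s[0] + 1, s[-1]))


def is_ifr(p):
    """Non-decreasing failure rates p[i] / T[i] over the indices with T[i] > 0."""
    rates = [Fraction(x) / t for x, t in zip(p, tails(p)) if t > 0]
    return all(a <= b for a, b in zip(rates, rates[1:]))


def check_equivalence(n_points=4, denominator=6):
    """Compare is_ifr(p) with is_lc(tails(p)) on every distribution on
    {0, ..., n_points - 1} whose masses are multiples of 1/denominator."""
    checked, mismatches = 0, []
    for ks in product(range(denominator + 1), repeat=n_points):
        if sum(ks) != denominator:
            continue
        p = [Fraction(k, denominator) for k in ks]
        checked += 1
        if is_ifr(p) != is_lc(tails(p)):
            mismatches.append(p)
    return checked, mismatches


print(check_equivalence())  # 84 distributions checked, no mismatches
```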
In the IFR case, (DMP) becomes
where are set to . Constraint (5c) captures the log-concavity of the tail probabilities and (5d) captures the non-increasing property of tail probabilities. There is no need to consider an outer optimization over supports or to use strict inequalities to capture consecutive supports: consecutiveness of the support is immediate from the monotonicity of the . Indeed, once for some , then for all by monotonicity.
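One way to see why moment constraints remain linear after switching to tail variables is summation by parts: with p_i = T_i - T_{i+1} and T_{n+1} = 0, one has sum_i f(i) p_i = f(0) T_0 + sum_{i >= 1} (f(i) - f(i-1)) T_i. The sketch assumes f(i) = i^j on the sample space {0, ..., n} and that this matches the moment constraints in (5); it verifies the identity on an example distribution. The helper names are hypothetical.

```python
from fractions import Fraction


def tails(p):
    """T[i] = p[i] + ... + p[n] (so T[0] = 1 for a probability vector)."""
    out, acc = [], Fraction(0)
    for x in reversed(p):
        acc += x
        out.append(acc)
    return out[::-1]


def moment_from_p(p, j):
    """sum_i i**j * p[i], computed directly from the point masses."""
    return sum(Fraction(i) ** j * x for i, x in enumerate(p))


def moment_from_tails(T, j):
    """The same moment via summation by parts:
    f(0) * T[0] + sum_{i >= 1} (f(i) - f(i - 1)) * T[i], with f(i) = i**j.
    The coefficients do not depend on T, so the constraint is linear in T."""
    f = [Fraction(i) ** j for i in range(len(T))]
    return f[0] * T[0] + sum((f[i] - f[i - 1]) * T[i] for i in range(1, len(T)))


p = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(2, 5)]
T = tails(p)
for j in range(4):
    assert moment_from_p(p, j) == moment_from_tails(T, j)
print([moment_from_tails(T, j) for j in range(4)])
```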
3 A special class of nonconvex optimization problems
In this section we present a general class of problems that includes all the problems introduced in Section 2 as special cases. This class admits optimal extreme point solutions that are determined by setting a sufficient number of inequalities to equalities. This result is reminiscent of linear programming where extreme points have algebraic characterizations as basic feasible solutions.
Our analysis proceeds in two stages. First, we discuss a broad class of optimization problems that have optimal extreme point solutions. Second, we specialize this general class to a class of nonconvex optimization problems where the source of nonconvexity arises from reverse convex sets (see Definition 2.3). This work extends part of the theory of reverse convex optimization initiated by , tailoring these results to the discrete moment problem. To our knowledge, these results are not subsumed by others in the existing literature.
3.1 Linear optimization over (nonconvex) compact sets
Let us first consider a very general optimization problem:
where is a lower semicontinuous and quasiconcave function and is a nonempty and compact (closed and bounded) subset of . We note that the results in this section can be generalized to any locally convex topological vector space in the sense of Aliprantis and Border [1, Chapter 5]. This is not required for the study of the discrete moment problem, but is potentially relevant for an exploration of the continuous case along similar lines.
The goal of this subsection is to prove the following:
Theorem 3.1.
There exists an optimal solution to (6) that is an extreme point of .
Recall that an extreme point of is any point where the set of such that for some is empty. Let denote the extreme points of the set . The special case of Theorem 3.1 where is convex is well known and immediate from Aliprantis and Border [1, Corollary 7.75]:
If is compact and convex then (6) has an optimal extreme point solution.
The proof when is not convex takes a couple more steps. The first step is to work with the closed convex hull of , which is the intersection of all closed convex sets that contain .
(Theorem 5.3 in ) The closed convex hull of a compact set is compact. In particular, is a compact convex set.
The following lemma helps us to leverage these results about closed convex hulls to learn about the original problem (6).
Let be a compact subset of . Then .
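For a finite set S (which is compact), Theorem 3.1 and the lemma above can be observed directly. The sketch assumes that (6) is a minimization, as the lower semicontinuity and quasiconcavity assumptions suggest, and uses an example concave objective of our choosing. The extreme points of conv(S) are computed with Andrew's monotone chain algorithm, a standard two-dimensional convex hull method that is not part of the paper's development.

```python
import random


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points):
    """Extreme points of conv(points) via Andrew's monotone chain.

    Collinear boundary points are dropped (the `<= 0` test), so only
    extreme points are returned.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def f(x):
    """An example concave (hence quasiconcave) objective to be minimized."""
    return -((x[0] - 3) ** 2 + (x[1] + 1) ** 2)


rng = random.Random(1)
S = [(rng.randint(-10, 10), rng.randint(-10, 10)) for _ in range(60)]
V = hull_vertices(S)

assert set(V) <= set(S)                   # extreme points of conv(S) lie in S
assert min(map(f, S)) == min(map(f, V))   # some optimal solution is extreme
print(len(V), "extreme points; optimal value", min(map(f, V)))
```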
3.2 Reverse convex optimization problem with nonnegative constraints
The following lemma captures the essence of reverse convex optimization and serves as motivation and a visualization tool for understanding our main theoretical result below (see Theorem 3.6).
Lemma 3.5.
Consider the optimization problem
where is a lower semicontinuous and quasiconcave function, , and the are closed, reverse convex sets such that is a nonempty and compact subset of . Then there exists an optimal solution that lies on the boundary of at least of the sets .
Lemma 3.5 extracts some ideas from existing results (particularly from [25, Theorem 2]) and presents them in a clean, geometric form. To facilitate the understanding of this lemma, we further provide an intuitive graphical illustration in Figure 1.
Despite its elegance, this lemma is insufficient for our purposes. First, it only applies when the are reverse convex. The argument breaks down if the are reverse convex w.r.t. another convex set , as needed for the problems in Section 2. In particular, when the convex set is a polytope, even though we can use as a reverse convex set in place of , this set contains the boundary of the original polytope, which is undesirable when analyzing extreme optimal solutions. Second, the conclusion only provides a lower bound on the number of boundaries an optimal solution lies on. Although this suffices for the LC case, a strengthening that leverages linear independence (familiar from the analogous linear programming result [9, Theorem 2.3]) is needed for the IFR case.
As to the second insufficiency, a standard setting in reverse convex optimization is to consider a feasible region
and assume properties on the functions . These properties typically include differentiability assumptions (so that gradients are defined) and some form of concavity (the weakest being quasiconcavity). Under these concavity assumptions, the lower level sets of are reverse convex and Lemma 3.5 applies, so that extreme points are determined by a minimum number of tight constraints of the form . Unfortunately, those results do not apply in our setting. Indeed, the discrete moment problems we consider here do not involve quasiconcave functions, but rather functions whose lower level sets are reverse convex w.r.t. the nonnegative orthant.
These considerations motivate us to establish a more general theory of reverse convex optimization. In particular, we analyze the following problem
where and the are functions from to and is an by , and for , the set is reverse convex w.r.t. the nonnegative orthant .
We make the following additional technical assumptions on (3.2):
The objective function is continuous and quasiconcave,
The matrix is full-row rank,
For each , is differentiable, and
The feasible region is nonempty and compact.
We also need the following notation to state the main theorem of this section. For any feasible solution to (2.3) let denote the support of ; that is, . Let denote the -th row of the matrix and the -th column. For any subset of (for instance, the support of a feasible solution), let . That is, is the submatrix of consisting of the columns indexed by . Recall that denotes the -th row of the matrix and let denote the span of the rows of . Finally, let denote the gradient of at , where is the gradient of restricted to the components in the subset .
Moreover, for any extreme point optimal solution , of the following inequalities
In addition, letting , if we further assume that for all the tight constraints with one has , then of the vectors are linearly independent, where is the unit vector with in the th component and otherwise.
Theorem 3.6 is the main theoretical result in this paper. The proof largely follows the geometric intuition captured in Figure 1. At its core, it involves defining separating hyperplanes and inscribing a polyhedral set inside the feasible region. Then, the equivalence of extreme points and basic feasible solutions for the polyhedron is leveraged to establish the result.
However, the proof has additional technical challenges. It must make sense of how inequalities that describe the orthant interact with the gradients of the constraint functions . Moreover, the affine equality constraints , that correspond to the moment conditions in (1), force us to work within the affine space defined by these constraints for much of the proof. Finally, we require a spanning condition of the gradients to ensure that the full analysis can be captured in that space.
Proof of Theorem 3.6.
Since is continuous and quasiconcave and is a compact set, Theorem 3.1 guarantees that there exists an optimal extreme point solution. For any such optimal extreme point with support define
Then the feasible region includes . Let and denote
Our goal is as follows. For , we want to construct sets of the form
is a subset of , where and will be specified later. As long as , since is an extreme point of , it is an extreme point of as well. Since is defined by finitely many linear equalities and inequalities, there must exist of them that are tight at , and we can then check which constraints are tight.
We now construct such a . Let and
A key property of is that it admits a strong separation property useful for our arguments (see Claim 1 below). To describe this property, we explore a related set in a smaller subspace. Construct a matrix such that its columns span the whole null space of ; that is, and . Then, we have that
We can define the “strong separation” property of as follows.
(Strong separation) For all there exists and such that
Moreover, if we further assume then for all .
since . In other words, we need to prove that implies .
|, , and obtained by strong separation|
|, , are obtained by weak separation|
In case (i), according to Claim 1, there exists some and such that
By letting and , one has . Moreover, from definition (7) of , any satisfies and
In case (ii), again by Claim 1, we have , where if . Then we can take and in (7). Obviously, and for any . Similarly, we can argue that such a does not belong to due to the violation of the constraint . Then it follows that for all .
So far, we have constructed in the form of (9) as in Table 1 and based on (8). Moreover, we have shown . Since is an extreme point of and lies both in and , it is an extreme point of as well. Since is defined by finitely many linear equalities and inequalities, by standard theory (e.g., [9, Theorem 2.3]) there must exist of them that are tight and linearly independent at .
Since is an by matrix of rank , there are tight constraints from
where if . We now investigate which of the above constraints can be tight. First of all, it is obvious that is tight for all and cannot be tight for all . Then, for the constraint such that , since , cannot be tight. Finally, recall that we proved above that for all such that . That is, when , it holds that and thus the corresponding constraint cannot be tight at . In summary, all tight constraints come from
which implies of the inequalities
in (3.2) are tight. Moreover, when for all , in (14) and these tight constraints are linearly independent. In other words, the set of vectors are linearly independent, where is the gradient of the constraint . This completes the proof of Theorem 3.6. ∎
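The last step of the proof above relies on the standard fact (e.g., [9, Theorem 2.3]) that a point of a polyhedron in R^n is an extreme point exactly when it is feasible and the constraints tight at it contain n linearly independent ones. The sketch below checks this condition for a polyhedron written as {x : Ax <= b}; the function names are hypothetical, and exact rational arithmetic is used so that tightness tests are not affected by rounding.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of rational row vectors, by Gauss-Jordan elimination."""
    m = [[Fraction(a) for a in r] for r in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def is_extreme_point(A, b, x):
    """Is x an extreme point of the polyhedron {x : A x <= b}?

    x must be feasible, and the rows of A for the constraints that are tight
    at x must contain n linearly independent vectors (n = len(x)).
    An equality a.x = beta can be encoded as a.x <= beta and -a.x <= -beta.
    """
    x = [Fraction(v) for v in x]
    values = [sum(Fraction(a) * v for a, v in zip(row, x)) for row in A]
    if any(val > bi for val, bi in zip(values, b)):
        return False
    tight = [row for row, val, bi in zip(A, values, b) if val == bi]
    return rank(tight) == len(x)


# Unit square {0 <= x1 <= 1, 0 <= x2 <= 1} written as A x <= b.
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 0, 1, 0]
print(is_extreme_point(A, b, [0, 0]))                            # True: corner
print(is_extreme_point(A, b, [Fraction(1, 2), 0]))               # False: edge midpoint
print(is_extreme_point(A, b, [Fraction(1, 2), Fraction(1, 2)]))  # False: interior
```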
4 Characterizing optimal extreme point solutions in the discrete moment problem
Theorems 3.1 and 3.6 are powerful tools for analyzing the moment problems discussed in Section 2. They allow us to characterize the structure of optimal extreme point solutions. In the following two subsections we analyze the LC and IFR cases from Sections 2.1 and 2.2. There is a general pattern to our analysis, which we briefly describe here.
Each problem has two alternate formulations, one of which is indicated by a “prime”. In the LC case these two formulations are (DMP-LC) and (DMP-LC’). The “prime” formulation has a compact feasible region, which allows us to leverage Theorem 3.1 to show the existence of an optimal extreme point solution . With in hand, we apply Theorem 3.6 to a small adjustment of the “non-prime” formulation that replaces strict inequalities with non-strict inequalities based on the support of . Theorem 3.6 implies that a certain number of constraints are tight, including some number of the reverse convex constraints (for instance, (2c) in (DMP-LC)). Making these constraints tight determines the structure of the optimal extreme point solutions. In the LC case, a piecewise geometric structure is obtained.
Theorem 4.1.
Every feasible instance of (DMP-LC) has an optimal extreme point solution. Moreover, every optimal extreme point solution has the following structure: there exist (i) integers and for with where is the support of and (ii) real parameters , for such that
That is, there exists an optimal solution to (DMP-LC) that has a piecewise geometric structure with (at most) pieces.
Consider the (DMP-LC’) representation of the problem. The zeroth order moment constraint ((3b) for ) is , which, along with the nonnegativity constraints (3d), implies that the feasible region of problem (DMP-LC’) is compact. Then by Theorem 3.1, there exists an optimal extreme point solution to (DMP-LC’) and thus also to (DMP-LC), since these problems are equivalent (via Proposition 2.4).
Let be any extreme optimal solution and, for simplicity, assume that its support is (the general case of support with follows analogously). Note that when , there are at most points in the interval , where each point , can be viewed as a single piece, and the conclusion readily follows. Therefore, in the remainder of the proof we assume .
Let and define the following problem:
Note that (16) is a restriction of (DMP-LC) to a given support, in which the strict inequalities in (2d) are replaced with the non-strict inequalities in (16d). Note also that is an extreme optimal solution to (DMP-LC) and is feasible for (16); hence is an extreme optimal solution to (16).
To uncover the structure (15) of we apply Theorem 3.6. We convert the constraint into a nonnegativity constraint, mimicking the nonnegativity constraint of (2.3), by a change of variables , which yields the following equivalent form:
Observe that is an optimal extreme point solution of (17).
We now verify that (17) satisfies the conditions of Theorem 3.6. Again, the zeroth order moment constraint guarantees that the feasible region is compact. Let for . Here the index plays the role of index in Theorem 3.6. Note that (the index of the constraint functions) need not be tied to (the index of the decision variable components) in a general application of Theorem 3.6. As shown in Proposition 2.5, the set is convex, and an easy extension shows that ; this implies that is convex. Hence all of the conditions in Theorem 3.6 are satisfied when it is applied to (17).
Since the constraints cannot be tight at point for , this application of Theorem 3.6 implies that at least of the (17c) constraints are tight at , or equivalently that at most of the (16c) constraints are not tight at . These non-tight indices divide the interval into at most pieces, and within each piece we have , where , are the left and right endpoints of piece of the domain. It is a standard observation that such a system implies for . Setting yields the form (15). ∎
The proof of Theorem 4.1 does not use the linear independence conditions of Theorem 3.6. A basic count of tight constraints suffices to deliver the piecewise geometric structure, since the number of constraints in problem (DMP-LC) for a given support is small compared to the number of variables. Consider support in (DMP-LC). Theorem 3.6 implies that of the constraints in (2c)–(2d) are tight. Since all constraints in (2d) are strict (this is handled carefully in the proof), all tight constraints are from (2c), which are of the form . Setting of these constraints to equality directly yields the geometric structure (15).
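The passage from tight LC constraints to geometric pieces can be illustrated as follows. The sketch locates the interior indices where p_i^2 > p_{i-1} p_{i+1} (the non-tight constraints), splits the support at those indices, and checks that each piece has a constant ratio p_{i+1} / p_i. The function names, and the convention that adjacent pieces share their breakpoint, are our assumptions for illustration.

```python
from fractions import Fraction


def geometric_pieces(p):
    """Split the support of an LC sequence at the interior indices where the
    LC inequality p[i]**2 >= p[i-1]*p[i+1] is NOT tight.

    Returns (start, end) index pairs; consecutive pieces share their
    breakpoint. Within each piece the tight equalities force a constant
    ratio p[i+1] / p[i], i.e. a geometric sequence.
    """
    s = [i for i, x in enumerate(p) if x > 0]
    lo, hi = s[0], s[-1]
    slack = [i for i in range(lo + 1, hi) if p[i] ** 2 > p[i - 1] * p[i + 1]]
    breaks = [lo] + slack + [hi]
    return list(zip(breaks, breaks[1:]))


def is_geometric(seq):
    """True if the consecutive ratios seq[i+1] / seq[i] are all equal."""
    ratios = [Fraction(b) / a for a, b in zip(seq, seq[1:])]
    return all(r == ratios[0] for r in ratios)


# Ratio 1/2 on indices 0..2, then ratio 1/4 on indices 2..4 (ratios are
# non-increasing, so the sequence is LC); normalize to total mass 1.
raw = [16, 8, 4, 1, Fraction(1, 4)]
total = sum(raw)
p = [Fraction(x) / total for x in raw]

pieces = geometric_pieces(p)
print(pieces)  # [(0, 2), (2, 4)]
assert all(is_geometric(p[a:b + 1]) for a, b in pieces)
```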
4.2 Increasing failure rate
Recall the formulation (DMP-IFR’) of the IFR moment problem in Section 2.2 with . We will show that the optimal solution has a structure similar to that in the log-concave case, again using Theorems 3.1 and 3.6.
We note two facts. First, by the log-concavity constraint and the non-increasing property of , any feasible solution naturally has a consecutive support, and the support starts from . This is different from the log-concave case. Second, if there is some such that , this combined with the constraint indicates that . However, we also have in the problem’s constraints. This means that implies . Then by induction we have .
Combining the two facts above, the interval can be divided into three consecutive parts: , where we have , , , i.e., an all-one interval, a strictly decreasing interval, and an all-zero interval. Further, the optimal solution in the middle interval has a more detailed characterization, stated next.
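The three-part structure of the tail sequence can be checked exhaustively on a small grid. The sketch (helper names hypothetical) uses the equivalence between the IFR property and log-concavity of the tail probabilities T_i = P(X >= i), splits T into its all-one, strictly decreasing, and all-zero blocks, and asserts that this structure holds for every IFR distribution on {0, ..., 4} with masses in multiples of 1/6. This is a finite check of the argument above, not a replacement for it.

```python
from fractions import Fraction
from itertools import product


def tails(p):
    """T[i] = p[i] + ... + p[n] = P(X >= i)."""
    out, acc = [], Fraction(0)
    for x in reversed(p):
        acc += x
        out.append(acc)
    return out[::-1]


def tails_log_concave(T):
    """Log-concavity of a non-increasing tail sequence (its support is a prefix)."""
    return all(T[i] ** 2 >= T[i - 1] * T[i + 1] for i in range(1, len(T) - 1))


def three_part_split(T):
    """Return (k1, k2) with T == 1 on [0, k1), 0 < T < 1 on [k1, k2), T == 0 after."""
    k1 = 0
    while k1 < len(T) and T[k1] == 1:
        k1 += 1
    k2 = k1
    while k2 < len(T) and T[k2] > 0:
        k2 += 1
    return k1, k2


def has_three_part_structure(T):
    """All-one block, then a strictly decreasing block, then an all-zero block."""
    k1, k2 = three_part_split(T)
    middle = T[k1:k2]
    strictly_decreasing = all(a > b for a, b in zip(middle, middle[1:]))
    return strictly_decreasing and all(t == 0 for t in T[k2:])


# Exhaustive check on a small grid: every IFR distribution (equivalently,
# every p with log-concave tails) has the three-part tail structure.
count = 0
for ks in product(range(7), repeat=5):
    if sum(ks) == 6:
        T = tails([Fraction(k, 6) for k in ks])
        if tails_log_concave(T):
            count += 1
            assert has_three_part_structure(T)
print(count, "IFR distributions checked")
```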
Every feasible instance of (DMP-IFR’) has an optimal extreme point solution. Moreover, for every optimal extreme point solution , there exist integers such that when , when . The interval can be divided as follows. There exist (i) integers and for with