# Toward breaking the curse of dimensionality: an FPTAS for stochastic dynamic programs with multidimensional actions and scalar states

We propose a Fully Polynomial-Time Approximation Scheme (FPTAS) for stochastic dynamic programs with multidimensional actions, scalar states, convex costs and linear state transition functions. The action spaces are polyhedral and described by parametric linear programs. This type of problem finds applications in the area of optimal planning under uncertainty, and can be thought of as the problem of optimally managing a single non-discrete resource over a finite time horizon. We show that under a value oracle model for the cost functions this result for one-dimensional state space is "best possible", because a similar dynamic programming model with two-dimensional state space does not admit a PTAS. The FPTAS relies on the solution of polynomial-sized linear programs to recursively compute an approximation of the value function at each stage. Our paper enlarges the class of dynamic programs that admit an FPTAS by showing, under suitable conditions, how to deal with multidimensional action spaces and with vectors of continuous random variables with bounded support. These results bring us one step closer to overcoming the curse of dimensionality of dynamic programming.


## 1 Introduction

A dynamic program (DP) is a mathematical model for sequential decision making. DPs are widely used by the operations research community to model and solve a large variety of problems concerning optimal planning under uncertainty. Unfortunately, DPs are affected by the curse of dimensionality – an expression coined by Richard E. Bellman more than 50 years ago [2] – that makes their solution very difficult in practice. There is a large body of work devoted to ways of circumventing the curse, possibly foregoing optimality or approximation guarantees: this is discussed in Section 1.1.

This paper deals with a class of discrete-time finite-horizon stochastic DPs characterized by a scalar state and a multidimensional action, where the optimal action at each stage and state can be computed as the solution of a linear program (LP). We now give a more detailed description of the underlying mathematical model. The evolution of the state of the DP is governed by the transition function $f_t$ and the equation $I_{t+1} = f_t(I_t, \vec{x}_t, \vec{D}_t)$, where: $t = 1, \dots, T$ is the discrete time index, $I_t \in \mathcal{I}_t$ is the state of the system at time $t$ ($\mathcal{I}_t$ is the state space at stage $t$), $\vec{x}_t \in \mathcal{A}_t(I_t)$ is the action or decision to be selected at time $t$ after observing state $I_t$ ($\mathcal{A}_t(I_t)$ is the action space at stage $t$ and state $I_t$), $\vec{D}_t$ is a vector of interstage independent random variables (r.v.s) over the sample space $\mathcal{D}_t$, and $T$ is the number of time periods. The random vector $\vec{D}_t$ represents an exogenous information flow, and can be continuous or discrete. The cost function $g_t(I_t, \vec{x}_t, \vec{D}_t)$ gives the cost of performing action $\vec{x}_t$ from state $I_t$ at time $t$ for each possible realization of $\vec{D}_t$. The function $g_{T+1}$ is also called the “terminal cost function”, and it gives the cost of leaving the system in state $I_{T+1}$ at the end of the time horizon under consideration. Costs are accumulated over all time periods: the total incurred cost is equal to $\sum_{t=1}^{T} g_t(I_t, \vec{x}_t, \vec{D}_t) + g_{T+1}(I_{T+1})$. We use $\vec{x}_t$, $\vec{D}_t$ to emphasize vector quantities, whereas the other quantities are scalars. The problem is that of choosing policies $\pi_t$ mapping states to actions in order to determine a sequence of actions that minimizes the expectation of the total incurred cost. This problem is called a stochastic dynamic program. Formally, we want to determine:

$$z^*(I_1) = \min_{\pi_1, \dots, \pi_T} \mathbb{E}\left[\sum_{t=1}^{T} g_t(I_t, \pi_t(I_t), \vec{D}_t) + g_{T+1}(I_{T+1})\right], \tag{DP}$$

subject to: $I_{t+1} = f_t(I_t, \pi_t(I_t), \vec{D}_t)$ for all $t = 1, \dots, T$,

where $I_1$ is the initial state of the system and the expectation is taken with respect to the joint probability distribution of the random variables $\vec{D}_1, \dots, \vec{D}_T$. The seminal result of Bellman [1] is that the solution to this problem is given by a recursion:

###### Theorem 1.1

For every initial state $I_1$, the optimal value of the DP is given by $z^*(I_1) = z_1(I_1)$, where $z_1$ is the function defined by $z_{T+1}(I_{T+1}) := g_{T+1}(I_{T+1})$ together with the recursion:

$$z_t(I_t) = \min_{\vec{x}_t \in \mathcal{A}_t(I_t)} \mathbb{E}\left\{ g_t(I_t, \vec{x}_t, \vec{D}_t) + z_{t+1}\left(f_t(I_t, \vec{x}_t, \vec{D}_t)\right) \right\}, \quad t = 1, \dots, T. \tag{1}$$

When the state and action spaces are finite, and the expectations can be computed with a finite process, this recursion gives a finite algorithm to compute the optimal value. However, this algorithm may require exponential time in general. If the state and action spaces are not finite (e.g. when they are intervals or polyhedra as is discussed in the present paper), or if we cannot compute expectations in finite time (e.g. when there are continuous r.v.s that do not possess a closed-form formula for the expectation), computing $z_1(I_1)$ may be intractable.
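For intuition, recursion (1) can be evaluated exactly when the state and action spaces are small and finite. The following sketch illustrates this on a toy inventory problem; all data (states, actions, demand distribution, costs) are invented for illustration and are not from the paper:

```python
# Toy illustration of the exact backward recursion (1) on a small inventory
# problem with finite state/action spaces and discrete demand.

T = 3
states = range(6)                      # inventory level I_t in {0, ..., 5}

def actions(I):                        # order quantity x_t; storage capacity 5
    return range(6 - I)

demand = {0: 0.2, 1: 0.5, 2: 0.3}     # P(D_t = d)

def cost(I, x, d):                     # holding cost 1, shortage cost 4 per unit
    left = I + x - d
    return left if left >= 0 else -4 * left

def transition(I, x, d):               # f_t(I, x, d): leftover inventory
    return max(I + x - d, 0)

# z holds z_{t+1}; initialize with the terminal cost g_{T+1} (here zero)
z = {I: 0.0 for I in states}
for t in range(T, 0, -1):              # backward pass, t = T, ..., 1
    z = {I: min(sum(p * (cost(I, x, d) + z[transition(I, x, d)])
                    for d, p in demand.items())
                for x in actions(I))
         for I in states}

print(round(z[0], 4))                  # optimal expected cost from empty stock; prints 2.7
```

The table of values grows with the product of the sizes of the state space, action space and demand support, which is exactly why this exhaustive approach breaks down beyond small finite instances.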

This paper proposes an FPTAS for a specific class of DPs. A polynomial-time approximation scheme (PTAS) is an approximation scheme that returns a solution whose cost is at most $(1+\epsilon)$ times the optimal cost, where $\epsilon > 0$ is a given parameter. A PTAS runs in time polynomial in the (binary) input size. If the algorithm also runs in time polynomial in $1/\epsilon$, we call it an FPTAS. To construct an FPTAS, we must impose additional structure on the DP, because it is known that not all DPs admit a polynomial-time approximation scheme (see e.g. [13, Thm. 9.2]). We assume that the state $I_t$ is a scalar, whereas the action $\vec{x}_t$ can be vector-valued. We show in Sect. 3 that if the cost functions are only accessible via value oracles, the restriction that $I_t$ is a scalar cannot be relaxed. At each stage, the vector $\vec{D}_t$ is composed of a given number of independent but not necessarily identically distributed r.v.s with bounded support, that can have continuous or discrete distribution. The vectors $\vec{D}_1, \dots, \vec{D}_T$ are also interstage independent. (If the vectors are not independent, it is known that the DP may be APX-hard and therefore cannot admit an FPTAS, see e.g. [13, Thm. 10.1, Cor. 10.2].) Furthermore, we assume that: the transition functions $f_t$ are affine; the cost functions decompose into the sum of a deterministic piecewise linear convex function and a convex function that may depend on the r.v.s via an affine transformation; and the action spaces $\mathcal{A}_t(I_t)$ are polyhedral sets, described as the feasible region of a parametric LP with right-hand side parameter $I_t$. A formal description of the assumptions is given in Sect. 2.2. DPs satisfying these assumptions can be seen as multistage stochastic LPs, see e.g. [4], with a single variable linking consecutive stages (the state variable) and stochastic r.h.s. vector. The constraint matrix and the matrix linking the state between consecutive stages (also called “technology matrix”) are deterministic.
However, unlike the sample average approximation approach typically used in multistage stochastic LP, our algorithm is deterministic and its running time depends polynomially on the number of stages. Our framework allows some degree of convex nonlinearity in the objective function, see Condition 3(ii). One can think of the type of optimization problems addressed here as stochastic resource management problems with a single resource, see Sect. 2.3. An example of a problem that exhibits the required characteristics in the context of energy resource allocation is described in [24]. More applications are described in [22]: management of water in a reservoir, of cement for a construction company, of the cash reserve for an investment bank.

#### Organization of the paper.

This paper is organized as follows. In Sect. 2 we define our notation, state our assumptions, and present an example of an application. In Sect. 3 we discuss the necessity of our assumptions and show some related hardness results. Sect. 4 introduces an algorithm to deal with the issue of number sizes growing too rapidly. Sect. 5 contains an FPTAS for approximating the CDF of a sum of r.v.s, and shows how to efficiently compute the expectation in the DP in the presence of continuous r.v.s. Sect. 6 puts all the building blocks together to design the FPTAS for problem (DP) satisfying our assumptions. Sect. 7 concludes the paper. In the rest of the Introduction, we position our paper with respect to the existing literature, provide a summary of our contributions, and give an overview of the techniques used.

### 1.1 Relevance to existing literature

Dynamic programming is an invaluable tool for sequential decision making under uncertainty, and it has received a large amount of attention. DPs are often very difficult to solve in practice; for this reason, several approximate solution methodologies have been developed. [23] discusses explicitly the three curses of dimensionality of DPs, namely: the large dimensionality of the state space, of the action space, and the difficulty or impossibility of computing expectations.

To deal with these curses, [5] studies fixed-dimensional stochastic dynamic programs with discrete state and action spaces over a finite horizon. Assuming that the cost-to-go functions are discrete convex (classes of discrete convex functions are discussed later in this subsection), [5] proposes a pseudopolynomial-time approximation scheme that satisfies an arbitrary pre-specified additive error guarantee. The proposed approximation algorithm is a generalization of the explicit enumeration algorithm. The main differences between our paper and [5] are: (i) [5] considers discrete state and action spaces (as opposed to continuous); (ii) it considers fixed-dimensional (as opposed to one-dimensional) state spaces; (iii) it gives additive (as opposed to relative) error approximation; (iv) the running time of the approximation algorithm in [5] is pseudopolynomial (as opposed to polynomial) in the binary size of the input. Both [5] and the current paper are based on generalizations of the technique of K-approximation sets and functions. Another relevant work dealing with the curses of dimensionality, in the context of options pricing and optimal stopping, is [6]. While [6] provides rigorous guarantees on the approximation error (additive and, in some cases, relative), its assumptions and techniques are very different from ours.

The discipline known as Approximate Dynamic Programming (ADP) strives to deal with the three curses of dimensionality, while hopefully providing theoretical guarantees on the solution quality. Comprehensive references in this area are [3, 23]. In most cases, ADP approaches are content with proving asymptotic convergence to a “best possible” value function, e.g., the best value function approximation that can be obtained from a given set of basis functions [7]. This is considerably different from the approach presented in this paper: we compute an $\epsilon$-approximate solution in polynomial time, for any given $\epsilon > 0$.

Our paper gives a framework to construct an FPTAS for a continuous optimization problem. The literature contains only a few general frameworks that yield approximation schemes for non-discrete optimization problems. The framework in [26] deals with stochastic LPs, and is a randomized scheme, whereas the algorithm given in the present paper is deterministic. The framework of [13] deals with stochastic discrete DPs in which the r.v.s are described explicitly as lists of scenarios, and the single-period cost functions possess either monotone or convex structure. [16] studies a subclass of the DP model of [13], in which the single-period cost functions are assumed to possess convex structure, and provides a faster FPTAS from both the theoretical worst-case and the practical standpoint. An extension of [13] to continuous state and action spaces is given in [15]: however, [15] still deals with scalar state and action spaces, and discrete (scalar) r.v.s. Our paper lifts some of those restrictions.

To construct an approximation scheme for (DP), we study the problem of approximating the cumulative distribution function (CDF) of $X_1 + \dots + X_n$, where $X_1, \dots, X_n$ are given r.v.s. This problem is well known to be #P-hard, see e.g. [18], [14, Thm. 4.1]. It plays a central role in stochastic optimization because many problems in this area inherit #P-hardness from it, e.g. [14, 13]. The problem is related to counting knapsack solutions: given a binary knapsack $\sum_{i=1}^{n} a_i x_i \le b$, a possible way to determine the number of feasible solutions is to define discrete r.v.s $X_1, \dots, X_n$, with $X_i$ equal to $0$ or $a_i$ with probability $1/2$ each, and then compute $2^n \Pr(X_1 + \dots + X_n \le b)$. Based on the technique for counting knapsack solutions developed in [9] (see also [11]), [19] gives the first FPTAS to approximate the CDF of the sum of r.v.s. The approach presented in this paper to deal with vectors of r.v.s gives, as a by-product, an FPTAS for the same problem, under weaker assumptions: the difference between [19] and our paper is discussed toward the end of the next section.
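The reduction from counting knapsack solutions can be made concrete with a short sketch. The instance below is invented for illustration; the exact convolution it performs may create exponentially many support points, which is precisely the blow-up that an FPTAS must avoid:

```python
from itertools import product
from collections import defaultdict
from fractions import Fraction

def count_knapsack(a, b):
    """Count 0/1 vectors x with sum_i a_i x_i <= b via the CDF of a sum of
    r.v.s: each X_i is 0 or a_i with probability 1/2, and the number of
    feasible solutions equals 2^n * P(X_1 + ... + X_n <= b)."""
    dist = {0: Fraction(1)}                   # exact pmf of the partial sum
    for ai in a:
        new = defaultdict(Fraction)
        for s, p in dist.items():             # convolve in X_i
            new[s] += p / 2                   # X_i = 0
            new[s + ai] += p / 2              # X_i = a_i
        dist = new
    cdf_at_b = sum(p for s, p in dist.items() if s <= b)
    return int(cdf_at_b * 2 ** len(a))

# brute-force check on a tiny invented instance
a, b = [3, 5, 7, 9], 12
brute = sum(1 for x in product([0, 1], repeat=len(a))
            if sum(ai * xi for ai, xi in zip(a, x)) <= b)
assert count_knapsack(a, b) == brute
print(count_knapsack(a, b))                   # prints 9
```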

Another area of relevance to the present paper is that of discrete convexity, a discrete analog of convexity studied in the discrete optimization literature. The first investigation of a class of discrete functions for which local optimality implies global optimality is due to Miller [20]. Favati and Tardella [8] considered a certain special way of extending functions defined over the integer lattice to piecewise-linear functions defined over the real space, and they introduced the concept of “integrally convex functions”. [21] introduced the concepts of “convex extensible”, “L-convexity” and “M-convexity”, in which “L” stands for “Lattice” and “M” stands for “Matroid”; see [21, Sec. 1.4]. The relationship between several classes of discrete convex functions is depicted in Fig. 1. [10, Sec. 3] shows that separable convex functions admit polylogarithmic-space approximations, while any approximation of a two-dimensional nondecreasing convex function in the sense of Miller may require exponential space in the size of the domain of the function. Of the classes represented in Fig. 1, [10] leaves open the approximability status of the class of convex-extensible functions; it is settled in the present paper, see Theorem 1.2 below.

### 1.2 Our contribution

Our paper makes progress toward overcoming some of the curses of dimensionality, while keeping the strong theoretical guarantee of finding an $\epsilon$-approximate solution. First, the framework presented in this paper allows an action vector of arbitrary dimension, and the running time depends polynomially on that dimension. Second, we show how to handle vectors of continuous r.v.s with bounded support under suitable conditions using an oracle for their CDF, yielding an efficient approximate computation of expected values of the cost functions. We remark that continuous r.v.s are harder to handle than their discrete counterpart, because an exact computation of expected values cannot be carried out in finite time in general.

Thus, we summarize below our contribution regarding the three curses of dimensionality:

1. We can handle scalar state spaces in polynomial time, as was first shown in [15] under different assumptions. We show that if the cost functions are only accessible via value oracles, our continuous DP model with two-dimensional state space does not admit a PTAS.

2. We can handle vector action spaces of arbitrary dimension in polynomial time.

3. We can handle vectors of continuous r.v.s with bounded support in polynomial time. We can also handle vectors of discrete r.v.s, extending the work of [15] on scalar discrete r.v.s to the vector case. A more precise definition of the required conditions is given in Sect. 2.2.

To prove that a DP with multidimensional state space does not admit a PTAS when cost functions are described by an oracle, we prove in Section 3.1 the following result (we define discrete convex-extensible in Section 3.1).

###### Theorem 1.2

There exists a family of two-dimensional positive nondecreasing discrete convex-extensible functions for which any approximation (with approximation ratio below a certain threshold) requires exponential space in the size of the domain of the function, regardless of the scheme used to represent the approximation.

This result is of independent interest and resolves the approximability status of discrete convex-extensible functions, see Fig. 1. Since discrete convex-extensible functions coincide with their convex envelope by definition, the above result implies that two-dimensional nondecreasing piecewise-linear convex functions do not admit an approximation that is polylogarithmic in the size of the domain (i.e., polynomial in the (binary) size of the domain’s description).

To handle vectors of r.v.s, we assume that such vectors appear in the cost and transition functions only via affine transformations. Our FPTAS to approximate the weighted sum of r.v.s has weaker assumptions than [19]. The work of [19] assumes that we are given integer discrete r.v.s and an exact oracle for the corresponding CDFs. It constructs an FPTAS for the sum of r.v.s by discretizing a DP recurrence relation. We relax their assumptions: the approach described in this paper only requires an approximate oracle for the CDF of the r.v.s, and it can also handle non-integer discrete r.v.s and continuous r.v.s with bounded support. The running time is essentially the same. This contribution of our paper extends the applicability of approximation schemes for the sum of r.v.s.

The main constructive result shown in this paper is that a stochastic DP satisfying some structural assumptions (Conditions 1-3) admits an FPTAS. The approximation scheme is remarkably short and simple: its pseudocode consists of less than ten lines that use three different subroutines. The majority of this paper is devoted to their analysis. While the problem that we solve is continuous in nature, the technique used to build the approximation is discrete in nature, and may be useful for the construction of approximation schemes for continuous optimization problems. We remark that our approximation scheme is fully deterministic and does not rely on any form of random sampling, unlike known sample average approximation schemes for multistage optimization problems.

While the work presented in this paper is still far from the generality of ADP approaches, there are several real-world applications that fit into our framework, see Sect. 2.3. Furthermore, our framework provides much stronger theoretical guarantees than ADP: despite the fact that an exact solution to the problem may be impossible to compute in finite time, we determine an $\epsilon$-approximation of the optimum for arbitrary $\epsilon > 0$. To the best of our knowledge, the present paper is the first that uses FPTASs to tackle two curses of dimensionality of DP, i.e., multidimensional action spaces and r.v.s: other FPTASs for DPs mentioned in our literature review do not address these two aspects.

### 1.3 Techniques used

The approximation scheme discussed in this paper relies on the concept of K-approximation sets and functions, introduced in [14]. Intuitively, a K-approximation function (with $K \ge 1$) is a function that has a relative error of at most $K$ with respect to a given function (i.e., it lies between the function and $K$ times the function). A K-approximation set is a set of points that induces a K-approximation function via linear interpolation. A K-approximation set of a function with maximum value $\varphi^{\max}$ and minimum value $\varphi^{\min}$ can be constructed with $O(\log_K(\varphi^{\max}/\varphi^{\min}))$ points, because we can allow the function values to increase by roughly a factor $K$ between two consecutive points. The papers [14, 13] define K-approximation sets for functions over discrete domains. The paper [15] extends this to continuous domains, but uses both relative and absolute error measures. Since we work with continuous domains, the definition of K-approximation sets used in this paper is essentially the same as in [15] when only relative error is allowed, i.e., the additive error is fixed to 0. Formal definitions are given in Sect. 2.1.
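The idea of keeping only points where the function value grows by a factor K can be sketched as follows. This is a simplified stand-in for the ApxSetInc routine of [13, 15], with an invented test function, not the paper's algorithm:

```python
import bisect

def apx_set_inc(phi, lo, hi, K):
    """Build a K-approximation set for a nondecreasing positive function phi
    on the integer interval [lo, hi] (relative error only): each kept point
    is the farthest one whose value is still within a factor K of the last
    kept value, so roughly O(log_K(phi(hi) / phi(lo))) points are kept."""
    W = [lo]
    x = lo
    while x < hi:
        # binary search for the largest y in (x, hi] with phi(y) <= K * phi(x)
        left, right = x + 1, hi
        while left < right:
            mid = (left + right + 1) // 2
            if phi(mid) <= K * phi(x):
                left = mid
            else:
                right = mid - 1
        x = left                      # may be a forced unit step if phi jumps
        W.append(x)
    return W

phi = lambda x: x * x + 1             # an invented nondecreasing test function
K = 2.0
W = apx_set_inc(phi, 0, 1000, K)

# check: the monotone extension induced by W is a K-approximation of phi
for x in range(0, 1001):
    w = W[bisect.bisect_left(W, x)]   # smallest kept point >= x
    assert phi(x) <= phi(w) <= K * phi(x)
print(len(W))
```

Note the logarithmic size of the output set: a thousand-point domain is summarized by a few dozen points while preserving the factor-K guarantee everywhere.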

The approach used in this paper to deal with vector action spaces is based on LP, and for this reason we impose structure on the problem that allows us to use an exact LP solution methodology as a subroutine. Since our approximation scheme proceeds recursively, we must be careful to ensure that the size of the LPs does not grow too fast. This is achieved with a scaling technique, finding a K-approximation set with integer domain and codomain values for a suitably scaled version of the function to be approximated; see Sect. 4.

Regarding the approximation for the weighted sum of r.v.s, our construction is not based on counting knapsack solutions as in [19], but rather on constructing compact representations for the value function of a DP. As a result, we obtain a compact representation for the CDF of the sum of r.v.s over the entire domain, rather than its value at a single point. The construction is iterative and amounts to building a K-approximation set for the CDF of a sum of r.v.s, adding one r.v. at a time. This is discussed in Sect. 5.

## 2 Preliminaries

In this section, we formally introduce our notation and give the necessary definitions, discussing the type of problems that can be handled by the algorithm we propose.

### 2.1 Definitions

Let $\mathbb{Z}_+$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}_+$ and $\mathbb{R}$ be the sets of nonnegative integers, integers, rational numbers, nonnegative real numbers and real numbers, respectively. We distinguish between three types of continuous random variables: continuous r.v.s that may have unbounded support and for which $\Pr(X = x) = 0$ for all $x$, mixed continuous r.v.s that may have $\Pr(X = x) > 0$ for some $x$, and truncated continuous r.v.s that have bounded support with strictly positive probability on the endpoints only. Vectors are denoted with an overhead arrow, such as in $\vec{x}$. Subscripts should generally be intended to index a stage, e.g. $\vec{x}_t$ is the action vector at stage $t$ rather than the $t$-th component of a vector $\vec{x}$. When it is necessary to index a component of a vector, it will be made explicit, e.g. $x_{t,j}$ will be the $j$-th component of the vector $\vec{x}_t$. For $a, b \in \mathbb{Z}$ with $a \le b$, we denote a finite set of the form $\{a, a+1, \dots, b\}$ by $[a, \dots, b]$, whereas $[a, b]$ denotes a real interval.

Given a real-valued function $\varphi$ over a bounded linearly-ordered domain $D$ such that $\varphi$ is not identically zero, we denote $\varphi^{\max} := \max_{x \in D} \varphi(x)$ and $\varphi^{\min} := \min_{x \in D} \varphi(x)$. For any function $\varphi$, $t_\varphi$ denotes an upper bound on the time needed to evaluate $\varphi$ on a single point in its domain. A function $\varphi$ is called Lipschitz continuous with a given Lipschitz constant $L$ ($L$-Lipschitz continuous, in short) if $|\varphi(x) - \varphi(y)| \le L|x - y|$ for all $x, y$ in its domain. Given a Lipschitz continuous function $\varphi$, we denote by $L_\varphi$ its Lipschitz constant. Let $S$ be a set, and let $A(s)$ be a set for every $s \in S$. We denote by $A(S)$ the set $\bigcup_{s \in S} A(s)$. When indicating running times of algorithms, we give bounds on the number of operations, which we assume to have unitary cost. For arithmetic operations, the number of bit operations depends on the size of the numbers: because we always ensure that the size of the numbers is polynomially bounded in the (binary) input size, the number of bit operations is at most a polylogarithmic factor larger than the number of arithmetic operations.

As mentioned in Sect. 1.3, the approximation scheme discussed in this paper relies on the concept of K-approximation sets and functions [14]. These concepts are formalized below. The definitions are based on [13, 15].

###### Definition 2.1

Let $K \ge 1$ and let $\varphi : D \to \mathbb{R}_+$. We say that $\tilde{\varphi} : D \to \mathbb{R}_+$ is a K-approximation function of $\varphi$ (or more briefly, a K-approximation of $\varphi$) if for all $x \in D$ we have $\varphi(x) \le \tilde{\varphi}(x) \le K\varphi(x)$.

###### Definition 2.2

Let $\varphi : D \to \mathbb{R}$ be a convex function. For every finite $W \subseteq D$, the convex extension of $\varphi$ induced by $W$ is the function $\hat{\varphi}$ defined as the lower envelope of the convex hull of $\{(x, \varphi(x)) : x \in W\}$.

###### Definition 2.3

Let $\varphi : D \to \mathbb{R}$ be nondecreasing. For every finite $W \subseteq D$, the monotone extension of $\varphi$ induced by $W$ is the function $\hat{\varphi}$ defined as $\hat{\varphi}(x) := \varphi(\min\{w \in W : w \ge x\})$.

###### Definition 2.4

Let $K \ge 1$ and let $\varphi$ be a convex (resp. nondecreasing) function. We say that a finite set $W \subseteq D$ is a K-approximation set of $\varphi$ if the convex extension (resp. the monotone extension) of $\varphi$ induced by $W$ is a K-approximation function of $\varphi$.

For a monotone nondecreasing function, an algorithm to compute a K-approximation set in polynomial time is described in [13] if the function is defined over integers. If the function is defined over a real interval and is strictly positive, such an algorithm is given in [15], see Appendix A. We call this algorithm ApxSetInc, consistently with these papers: whether the discrete or continuous version should be applied will be clear from the context. K-approximation functions can be combined as indicated below.

###### Proposition 2.5 (Calculus of K-approximation functions, Prop. 5.1 in [13])

For $j \in \{1, 2\}$, let $K_j \ge 1$, let $\varphi_j$ be an arbitrary function over domain $D$, and let $\tilde{\varphi}_j$ be a $K_j$-approximation of $\varphi_j$. Let $\alpha, \beta \in \mathbb{R}_+$, and let $\psi$ be a function whose image lies in $D$. Then:

1. $\varphi_1$ is a 1-approximation of itself,

2. (linearity of appr.) $\alpha + \beta\tilde{\varphi}_1$ is a $K_1$-approximation of $\alpha + \beta\varphi_1$,

3. (summation of appr.) $\tilde{\varphi}_1 + \tilde{\varphi}_2$ is a $\max\{K_1, K_2\}$-approximation of $\varphi_1 + \varphi_2$,

4. (composition of appr.) $\tilde{\varphi}_1 \circ \psi$ is a $K_1$-approximation of $\varphi_1 \circ \psi$,

5. (minimization of appr.) $\min\{\tilde{\varphi}_1, \tilde{\varphi}_2\}$ is a $\max\{K_1, K_2\}$-approximation of $\min\{\varphi_1, \varphi_2\}$,

6. (maximization of appr.) $\max\{\tilde{\varphi}_1, \tilde{\varphi}_2\}$ is a $\max\{K_1, K_2\}$-approximation of $\max\{\varphi_1, \varphi_2\}$,

7. (appr. of appr.) If $\tilde{\varphi}_2$ is a $K_2$-approximation of $\tilde{\varphi}_1$, then $\tilde{\varphi}_2$ is a $K_1 K_2$-approximation of $\varphi_1$.
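The summation and minimization rules of this calculus are easy to sanity-check numerically. The following sketch does so on invented test functions and invented (but valid) approximations of them:

```python
# Numeric sanity check of the summation and minimization rules from the
# calculus of K-approximation functions, on a small integer domain.
D = range(1, 51)
phi1 = {x: float(x * x) for x in D}
phi2 = {x: 3.0 * x + 2.0 for x in D}
K1, K2 = 1.5, 2.0
# arbitrary valid approximations: phi(x) <= phi_tilde(x) <= K * phi(x)
phi1t = {x: 1.4 * v for x, v in phi1.items()}                   # a 1.5-approx
phi2t = {x: (K2 * v if x % 2 else v) for x, v in phi2.items()}  # a 2-approx

K = max(K1, K2)
for x in D:
    s, st = phi1[x] + phi2[x], phi1t[x] + phi2t[x]
    assert s <= st <= K * s                 # summation of approximations
    m, mt = min(phi1[x], phi2[x]), min(phi1t[x], phi2t[x])
    assert m <= mt <= K * m                 # minimization of approximations
print("ok")
```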

We call canonical representation of an approximation set $W$ of a function $\varphi$ its representation as a linearly ordered set of points and the corresponding function values. Formally, we write the canonical representation as $\{(x, \varphi(x)) : x \in W\}$, represented as an array sorted by its first coordinate. Such a representation corresponds to the data that is normally stored in a computer implementation of the proposed algorithms. Let $\varphi$ be a convex (resp. monotone) function, $W$ a K-approximation set of $\varphi$, and let $\hat{\varphi}$ be the convex (resp. monotone) extension of $\varphi$ induced by $W$. We say that $\hat{\varphi}$ is a K-approximation function of $\varphi$ that is given in the form of a canonical representation. (Note that by Def. 2.4, $\hat{\varphi}$ is indeed a K-approximation function of $\varphi$.) Given the canonical representation and any query point $x$ in the domain of $\varphi$, one can compute in $O(\log |W|)$ time the value of $\hat{\varphi}$ at $x$. Therefore, the canonical representation serves as a value oracle for $\hat{\varphi}$ with query time $O(\log |W|)$. Given a canonical representation, we can recover the approximation set on which it was constructed in time linear in $|W|$.
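A minimal sketch of such a value oracle for the nondecreasing case (class name and data invented for illustration) is:

```python
import bisect

class CanonicalRep:
    """Sketch of a value oracle given by a canonical representation, for the
    monotone (nondecreasing) case: an array of (x, phi(x)) pairs sorted by
    the first coordinate, queried by binary search in O(log |W|) time.
    Queries must not exceed the largest stored point."""
    def __init__(self, points):
        self.xs = [x for x, _ in points]
        self.vals = [v for _, v in points]

    def __call__(self, x):
        i = bisect.bisect_left(self.xs, x)   # smallest stored point >= x
        return self.vals[i]

# invented approximation set for an increasing function
rep = CanonicalRep([(0, 1.0), (4, 2.0), (9, 4.0), (16, 8.0)])
print(rep(5), rep(9), rep(16))               # prints 4.0 4.0 8.0
```

Evaluating at the smallest stored point above the query is exactly the monotone extension of Def. 2.3, so the oracle overestimates by at most the factor guaranteed by the approximation set.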

We define a routine called CompressInc that, given a monotone nondecreasing function $\varphi$ and an approximation factor $K$, constructs a value oracle (in the form of a canonical representation) for a K-approximation function of $\varphi$. Details are given in Appendix A. Similar routines will be constructed in the remainder of this paper.

The main idea of the approximation algorithm proposed in this paper is to construct a value function approximation following the backward recursion (1). The value function approximation is stored as a K-approximation set. At each stage, we must be able to efficiently compute the expected value appearing in (1), and to perform the minimization over the action space. We will show how to achieve these goals under the assumptions detailed in the next section.

### 2.2 Assumptions

To construct an FPTAS, we require the following conditions to be satisfied:

###### Condition 1

[Domains] The state spaces $\mathcal{I}_t$ for $t = 1, \dots, T+1$ are intervals on the real line. Each action space $\mathcal{A}_t(I_t)$ is a multidimensional polyhedral set, expressed as the feasible set of a parametric LP whose right-hand side vector is an affine function of the parameter $I_t$.

###### Condition 2

[Description of random events] For every $t$, the random vector $\vec{D}_t$ consists of a given number of components. For every $t$ and every component, one of the following two conditions holds for the corresponding random variable of the vector $\vec{D}_t$:

• The random variable is truncated continuous with compact support, and its CDF is Lipschitz continuous.

• The random variable is discrete with finite support, and the $k$-th largest element in the support can be identified in time polylogarithmic in the size of the support.

Furthermore, the information about the random events is given via value oracles to the CDFs. All random variables are independent.

###### Condition 3

[Structure of the functions] For every $t = 1, \dots, T$, the transition function $f_t$ is affine in its variables. The cost function $g_t$ can be expressed as the sum of two terms: a deterministic piecewise linear convex function of the state and the action, described as the pointwise maximum of a set of given hyperplanes; and a convex function applied to an affine transformation of the state, the action, and the r.v.s. One of the following conditions holds:

• The terminal cost function $g_{T+1}$ is positive piecewise linear convex with given breakpoints and slopes, and a given minimum value. Furthermore, for all $t$ the convex component of $g_t$ is a piecewise linear convex function over a compact interval with given breakpoints and slopes.

• The terminal cost function $g_{T+1}$ is Lipschitz continuous and convex, is described via a value oracle, and a positive lower bound on its minimum value is given. Furthermore, for all $t$ the convex component of $g_t$ is non-negative, Lipschitz continuous and convex over a compact interval, and is described via a value oracle.

We give next a few remarks about our DP model. In Condition 2(i) and Condition 3(ii) the Lipschitz constant need not be given a priori. However, the running time of our algorithms depends polylogarithmically on it. In Condition 2(ii), we use a single parameter to indicate either the length of the support interval for continuous random variables, or the cardinality of the support for discrete random variables: this allows us to write the running time in a more compact way. An example of a truncated continuous r.v. satisfying Condition 2 is that of a Gaussian r.v. with given mean and standard deviation, clipped to a bounded interval: for example, if the r.v. represents a demand, we can truncate it from below at zero because demand cannot be negative, and from above at some positive order capacity because all demand above this value is treated identically. Truncated mixed r.v.s can in principle be handled under assumptions similar to Condition 2, but their treatment is cumbersome and is not pursued here. We remark that the implicit model for the description of the r.v.s, as stated in Condition 2, is both more general and more compact than listing all possible values in the support of the r.v.s and the corresponding probabilities. The latter approach, while simpler and more commonly employed, would only be viable for discrete r.v.s with support of small size.
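The clipped Gaussian example can be sketched as a CDF value oracle; the parameters below (mean, standard deviation, capacity) are invented for illustration:

```python
import math

def normal_cdf(x, mu, sigma):
    # standard Gaussian CDF via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncated_cdf(x, mu, sigma, lo, hi):
    """CDF oracle for a Gaussian demand clipped to [lo, hi]: the probability
    mass below lo is moved to lo and the mass above hi to hi, yielding a
    truncated continuous r.v. with atoms only at the endpoints."""
    if x < lo:
        return 0.0
    if x >= hi:
        return 1.0
    return normal_cdf(x, mu, sigma)   # for x >= lo this includes the atom at lo

mu, sigma, cap = 10.0, 3.0, 15.0      # invented demand distribution and capacity
print(truncated_cdf(-1.0, mu, sigma, 0.0, cap),
      truncated_cdf(10.0, mu, sigma, 0.0, cap),
      truncated_cdf(20.0, mu, sigma, 0.0, cap))   # prints 0.0 0.5 1.0
```

Between the endpoints the CDF inherits the Lipschitz continuity of the Gaussian CDF (its constant is bounded by the peak density), matching Condition 2(i).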

Condition 3(i) is the traditional model in which the cost functions are analytically known in explicit form. Condition 3(ii) is a more flexible model that allows some of the cost functions to be described by value oracles. The FPTAS proposed in this paper is essentially the same under both Condition 3(i) and 3(ii), but some of our hardness results assume the oracle model. Studying the oracle model is appealing because positive results that hold in this setting have weak assumptions and can therefore be broadly applied. Value oracles can also allow strong negative results, see Sect. 3.

The input data of the problem includes: the number of time periods $T$; the initial state $I_1$; the maximum cost incurred in a single time period; the maximum Lipschitz constant of the cost functions (and of the CDFs of the r.v.s in the case of Condition 2(i)); the minimum value of the terminal cost function; the maximum length of the state spaces; the maximum size of the coefficients in the constraint matrix or right-hand side vector of the LPs describing the action spaces; the maximum size of the coefficients in the description of the cost and transition functions; the maximum number of pieces in the piecewise descriptions of the cost functions; and the maximum size of the support of the r.v.s. Our algorithms run in time polynomial in (the binary sizes of) these quantities and in $1/\epsilon$. Throughout this paper, the Lipschitz constants of the functions involved need not be known, but the running time of the algorithms depends on them: intuitively, a function with high Lipschitz constant may change quickly, and it may take longer for the algorithms to identify sudden “jumps” in function value with the required precision. The necessity of our assumptions is discussed in Sect. 3.

### 2.3 An application to resource management under uncertainty

We describe here an example of the type of models allowed under the assumptions given in the previous section. The example involves managing a resource of non-discrete nature, e.g., liquefied natural gas or cement, at a central storing facility. Let be a directed graph with nodes, with a special node that is the central storing facility; for , we denote by its set of outgoing edges and by its set of ingoing edges. The optimization is performed over a finite time horizon . The state of the system is the amount of resource available at the central storing facility at the beginning of time period . At the end of every time period, there is an unknown arrival of the resource encoded by a r.v. , e.g., cargo ships (in which case, the r.v. is discrete). There are locations that consume the resource but do not have storing capabilities between time periods. Each location has unknown demand at time period , distributed according to a continuous r.v. with . The demand cannot be transferred to other locations: unsatisfied demand or excess inventory at each of the locations is lost. The resource can be moved from the central storing facility to the locations, as well as between locations, over a capacitated flow network described by , potentially with losses over the arcs (i.e., the flow variables may appear with coefficients in the flow balance constraints at some nodes). There is no backlogging at the central storing facility: inventory must stay nonnegative (but the formulation could easily be amended to allow backlogging). At each time period , one observes the state of the system , and based on the distribution of the demand in time periods , decides on the flow over each arc . At the end of the period a cost is incurred, consisting of the transportation cost given by the flow network with unit cost on each arc, and a shortage (resp. disposal) cost that is accounted for every unit by which the demand at location is not met (resp. is exceeded).
At the final time period , there is a disposal cost for every unit left in the inventory. We denote by the flow balance (exiting minus entering) at location ; it is positive if the node sends out more than it receives, and negative otherwise. Furthermore, we denote by an auxiliary variable that encodes the cost incurred at the end of time period at location , i.e., the maximum between the shortage and disposal costs.

This problem can be formulated as a multistage stochastic linear program as follows:

$$
\begin{aligned}
\min\ \mathbb{E}_{\vec D}\Biggl[\,\sum_{t=1}^{T}\Biggl(\sum_{(i,j)\in E} c_{ij}\, f_{t,ij}(\vec D_t) + \sum_{i=1}^{m} w_{t+1,i}(\vec D_t)\Biggr) + h_0\, I_{T+1}(\vec D_T)\Biggr] &&& (2)\\
-I_t(\vec D_{t-1}) + I_{t+1}(\vec D_t) + y_{t,0}(\vec D_t) = \vec D_{t,0} && t=1,\dots,T & (3)\\
\sum_{(i,j)\in \delta^{+}(i)} f_{t,ij}(\vec D_t) - \sum_{(j,i)\in \delta^{-}(i)} a_{ji}\, f_{t,ji}(\vec D_t) = y_{t,i}(\vec D_t) && t=1,\dots,T,\ i=0,\dots,m & (4)\\
y_{t,0}(\vec D_t) \le I_t(\vec D_{t-1}) && t=1,\dots,T & (5)\\
-h_i\, y_{t-1,i}(\vec D_{t-1}) - h_i\, \vec D_{t-1,i} \le w_{t,i}(\vec D_t) && t=2,\dots,T+1,\ i=1,\dots,m & (6)\\
s_i\, y_{t-1,i}(\vec D_{t-1}) + s_i\, \vec D_{t-1,i} \le w_{t,i}(\vec D_t) && t=2,\dots,T+1,\ i=1,\dots,m & (7)\\
f_{t,ij}(\vec D_t) \ge 0 && t=1,\dots,T,\ (i,j)\in E & (8)\\
y_{t,i}(\vec D_t) \le 0 && t=1,\dots,T,\ i=1,\dots,m & (9)\\
I_1(\vec D_0) = \bar I_1. &&& (10)
\end{aligned}
$$

In the above formulation, is the initial inventory. The inventory level depends on the random variable at time period , as indicated in constraint (3): is automatically determined after all decisions at time period are taken and is revealed. Similarly, the auxiliary variable encodes the cost at location at the end of the previous time period, because this cost can only be computed after the demand at a given time period is revealed. The variable only serves the purpose of keeping this notation consistent for , and is otherwise not used in the model. Constraint (5) implies .

If the are all discrete r.v.s, this problem admits a deterministic equivalent formulation by introducing a copy of the decision variables for each sample path, but the number of sample paths is exponential in . We now show that a DP formulation of this problem satisfies Conditions 1-3.
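To make the blow-up concrete (the numbers below are illustrative, not from the paper): if each of the T demand vectors independently takes one of k values, the deterministic equivalent needs a copy of the decision variables for each of the k^T sample paths.

```python
def num_sample_paths(k, T):
    """Number of sample paths in the scenario tree when each of the T
    (independent) r.v.s takes one of k values: k ** T leaves."""
    return k ** T

# even a modest instance is far beyond what can be written down explicitly:
paths = num_sample_paths(10, 20)  # 10 scenarios per period, 20 periods
```

With 10 scenarios per period and 20 periods, the deterministic equivalent would require 10^20 copies of the per-period variables, which is why the FPTAS instead approximates the value function stage by stage.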

Let be the state variable. The action vector is defined by ; for simplicity, we use to refer to the variables in the action vector corresponding to the variables with the same name in (2)-(10). The transition function is . The action space is . The cost functions are: , , . It is easy to verify that with the cost functions defined as given, we obtain an equivalent description to problem (2)-(10), and Conditions 1-3(i) are satisfied.
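For intuition, here is a toy version of the single-period decision, with one arc from the facility to a single location and hypothetical cost parameters of our choosing (unit transport cost `c`, shortage cost `s`, disposal cost `h`, arc capacity `cap`): given inventory `I` and a realized demand `d`, the best shipment can be found by a simple grid search over the scalar action, a brute-force stand-in for the parametric LP that the model solves exactly.

```python
# Toy single-period instance; all numbers are hypothetical. The central
# facility holds inventory I and ships y units (0 <= y <= min(I, cap)) to a
# single location at unit transport cost c. The location then faces demand d
# and pays the larger of the shortage and disposal charges,
# max(s*(d - y), h*(y - d)), mirroring the auxiliary variable w above.
def stage_cost(I, d, c=1.0, s=4.0, h=0.5, cap=10.0, grid=200):
    """Cost of the best shipment, by grid search over the scalar action y."""
    best = float("inf")
    for k in range(grid + 1):
        y = min(I, cap) * k / grid
        cost = c * y + max(s * (d - y), h * (y - d))
        best = min(best, cost)
    return best
```

Since `s > c` here, it is optimal to ship `min(I, cap, d)` units: with ample inventory the cost is just the transport cost of meeting demand, while with scarce inventory the shortage charge dominates.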

## 3 Hardness results

We now discuss the necessity of our assumptions. Regarding Condition 1, it is an open question under what additional assumptions the restriction that the state spaces are intervals on the real line can be lifted in our DP model. However, the next subsections show that constructing an approximation of the value function is difficult already when the state spaces are two-dimensional boxes (Cor. 3.3). We show this by first proving the existence of a class of two-dimensional piecewise linear convex functions that are hard to approximate.

Regarding Condition 2, the conditions on truncated r.v.s seem difficult to relax under the oracle model for the CDFs: [15, Prop. 2.6] shows that a continuous function over a compact interval may not admit an efficient approximation unless its values at the endpoints of the domain are bounded away from zero. This result holds even if the function is monotone and Lipschitz continuous. Hence, approximating the CDF of a continuous r.v. that does not have positive probability mass at the endpoints seems difficult. ([15] also shows that for bounded Lipschitz continuous functions we can drop the strict nonnegativity requirement if we allow both relative and absolute error at the same time; however, in this paper we aim for an FPTAS in the usual sense.)

Regarding Condition 3, the nonnegativity requirement on the cost functions cannot be relaxed: [17] exhibits a DP that is #P-hard to approximate to any constant factor, and their problem can be cast into our framework if we allow the cost functions to be of either sign. In other words, the #P-hard DP in [17] satisfies Conditions 1-3 except for the nonnegativity requirement on . It is an open question whether the restriction (or in the setting of Condition 3(ii)) can be relaxed to (to in the setting of Condition 3(ii)). The difficulty stemming from is that we would have to keep track of the exact location of the zeroes of the value function at each stage.

### 3.1 Hardness of approximation of multivariate convex functions

The DP model discussed in this paper naturally yields piecewise linear convex value functions. To extend the framework to a multidimensional state space setting we would have to efficiently build approximations of such functions. However, the next results show that some piecewise linear convex functions do not admit an efficient approximation. We first show that there is an exponential number of two-dimensional piecewise linear convex functions with distinct values at integer points, and use this to prove that approximations of such functions require space exponential in the size of the domain.

###### Theorem 3.1

For any nonnegative integer , there exist distinct nondecreasing piecewise linear convex functions that attain the value at distinct sets of points in , and at least at the other sets of integer points. These functions are described as the pointwise maximum of hyperplanes.

###### Proof.

Let

$$
\Phi := \left\{ \varphi_{r_1,\dots,r_U} : [1,U]^2 \to [A,\, A+U] \;\middle|\; r_1,\dots,r_U \in [0,U];\ \sum_{i=1}^{U} r_i = U,\ r_1 \le r_2 \le \dots \le r_U \right\}
$$

be a family of bivariate discrete functions defined as follows: for ,

$$
\varphi_{r_1,\dots,r_U}(x,y) = \begin{cases} A & \text{if } x = 1,\dots,U-\sum_{i=1}^{y} r_i,\\[2pt] A + x - U + \sum_{i=1}^{y} r_i & \text{otherwise.} \end{cases} \tag{11}
$$

[10, Prop. 3.4] shows (taking ) that any pair of elements of attains the value at different points on the integer grid. Furthermore, it shows that the number of such functions is equal to the number of partitions of the integer , which is bounded below by . In Fig. 2 we illustrate the shape of by plotting its values for two choices of .
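Assuming the r_i range over the integers (as the partition-counting argument suggests), the size of the family Φ can be checked against the integer-partition count for small U; the function names below are ours:

```python
from itertools import combinations_with_replacement

def family_size(U):
    """Count nondecreasing tuples (r_1,...,r_U) in {0,...,U}^U with sum U,
    i.e., the admissible parameter vectors of the family Phi."""
    return sum(1 for r in combinations_with_replacement(range(U + 1), U)
               if sum(r) == U)

def partitions(n):
    """Number of integer partitions of n, via the classic counting DP."""
    p = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            p[total] += p[total - part]
    return p[n]
```

Padding with zeros identifies each admissible tuple with a partition of U into at most U parts, so `family_size(U)` coincides with the partition number p(U), which grows superpolynomially in U.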

Let be the number of distinct values among . We show next that each function can be described as the pointwise maximum of at most hyperplanes, in such a way that the value of the pointwise maximum on the integer grid matches that of . Note that the slopes of satisfy the following relationship:

$$
\varphi_{r_1,\dots,r_U}(x+1,y) - \varphi_{r_1,\dots,r_U}(x,y) = \begin{cases} 0 & \text{if } x < U - \sum_{i=1}^{y} r_i,\\[2pt] 1 & \text{otherwise.} \end{cases}
$$

Furthermore, for all valid choices of .

Claim. For , there exists , with , such that for all .

Proof of the claim. By backward induction. We start with . Consider the set , which corresponds to two hyperplanes given in the form : one “flat” hyperplane at the value , and one hyperplane with slope along the axis, along the axis. Consider the latter hyperplane, which we call . By the choice of , interpolates at the points for all . Furthermore, by (13), it interpolates at the points for which , because for such points its slope in the direction matches the slope of . It remains to analyze the points with . Let be the set of such points. For , has value by definition, see (11). Furthermore, . Then, since interpolates at , and its slope is in the direction, we have that the value of at is equal to . But then the maximum of the two hyperplanes in at is given by (i.e., it is given by the “flat” hyperplane), which is exactly the value of by equation (11). This concludes the initial step of the induction.

We now need to prove the induction step , assuming the claim is true for . Suppose first that . In this case, a similar argument to the one above shows that no further hyperplane is needed in , because the slope of along the axis for points with and is the same as for , and for the remaining points, the “flat” hyperplane suffices. We need to analyze the case . In this case, let . Let be the hyperplane defined by the coefficients ; it is easy to verify that interpolates at for all by construction. The argument now proceeds as before: the hyperplane interpolates at points for which , because for such points its slope in the direction matches the slope of . The remaining points with have value , therefore the “flat” hyperplane interpolates them and has value at them. Furthermore, because the , the value of at all points with is smaller than . This concludes the induction step and, with it, the proof of the claim.
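The claim can be sanity-checked numerically. Writing g(y) for the partial sum of the first y values r_i (a convex piecewise linear function of y, since the r_i are nondecreasing), eq. (11) gives φ = max(A, A + x − U + g(y)); replacing g by the maximum of its segment lines yields one hyperplane per linear piece plus the “flat” one. The sketch below sets A = 0 (our arbitrary choice; all names are ours) and verifies the agreement on the integer grid:

```python
A = 0  # base value of phi; chosen arbitrarily for this check

def g(r, y):
    """g(y) = sum_{i=1}^{y} r_i, convex in y since r is nondecreasing."""
    return sum(r[:y])

def phi(r, x, y):
    """phi_{r_1,...,r_U}(x, y) as in eq. (11)."""
    U = len(r)
    return A if x <= U - g(r, y) else A + x - U + g(r, y)

def planes(r):
    """The 'flat' hyperplane at A, plus one hyperplane per segment line of g:
    the line through (j, g(j)) with slope r[j] (0-based indexing)."""
    U = len(r)
    hs = [lambda x, y: A]
    for j in range(U):
        c0, c1 = g(r, j) - j * r[j], r[j]
        hs.append(lambda x, y, c0=c0, c1=c1: A + x - U + c0 + c1 * y)
    return hs

def max_planes(r, x, y):
    """Pointwise maximum of the hyperplanes; should match phi on the grid."""
    return max(h(x, y) for h in planes(r))
```

Because g is convex, the maximum of its segment lines recovers g at every integer y, so the pointwise maximum of the hyperplanes reproduces φ exactly on the integer grid, as the claim asserts.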

The above claim shows that we can construct a set containing hyperplanes such that for every , the pointwise maximum of the hyperplanes at is equal to . Note that all hyperplanes except the “flat” hyperplane have positive integer slopes. Thus, the pointwise maximum of the hyperplanes at integer points is integer-valued. This implies that the value of the pointwise maximum of such hyperplanes on the integer grid is either or . To conclude the proof, we need a bound on . Recall that