Recursive Optimization of Convex Risk Measures: Mean-Semideviation Models

04/02/2018 ∙ by Dionysios S. Kalogerias, et al.

We develop and analyze stochastic subgradient methods for optimizing a new, versatile, application-friendly and tractable class of convex risk measures, termed here as mean-semideviations. Their construction relies on the concept of a risk regularizer, a one-dimensional nonlinear map with certain properties, essentially generalizing the positive part weighting function in the mean-upper-semideviation risk measure. After we formally introduce mean-semideviations, we study their basic properties, and we present a fundamental constructive characterization result, demonstrating their generality. We then introduce and rigorously analyze the MESSAGEp algorithm, an efficient stochastic subgradient procedure for iteratively solving convex mean-semideviation risk-averse problems to optimality. The MESSAGEp algorithm may be derived as an application of the T-SCGD algorithm of (Yang et al., 2018). However, the generic theoretical framework of (Yang et al., 2018) is too narrow and structurally restrictive, as far as optimization of mean-semideviations is concerned, including the classical mean-upper-semideviation risk measure. By exploiting problem structure, we propose a substantially weaker theoretical framework, under which we establish pathwise convergence of the MESSAGEp algorithm, in the same strong sense as in (Yang et al., 2018). The new framework reveals a fundamental trade-off between the smoothness of the random position function and that of the particular mean-semideviation risk measure under consideration. Further, we explicitly show that the class of mean-semideviation problems supported under our framework is strictly larger than the respective class of problems supported in (Yang et al., 2018). Thus, applicability of compositional stochastic optimization is established for a strictly wider spectrum of mean-semideviation problems, justifying the purpose of our work.

1 Introduction

Over roughly the last twenty years, many significant advances have been made in the now relatively mature area of risk-averse modeling and optimization. These primarily include the fundamental axiomatization and theoretical characterization of risk functionals, also commonly known as risk measures (Kijima and Ohnishi, 1993; Rockafellar and Uryasev, 1997; Artzner et al., 1999; Ogryczak and Ruszczyński, 1999, 2002; Rockafellar and Uryasev, 2002; Rockafellar et al., 2003, 2006; Ruszczyński and Shapiro, 2006b; Shapiro et al., 2014), as well as extensive analysis in the context of risk-averse stochastic programs in both static and sequential decision making problem settings (Rockafellar and Uryasev, 1997; Föllmer and Schied, 2002; Rockafellar et al., 2003, 2006; Ruszczyński and Shapiro, 2006a; Collado et al., 2012; Çavuş and Ruszczyński, 2014a; Asamov and Ruszczyński, 2015; Dentcheva and Ruszczyński, 2017; Grechuk and Zabarankin, 2017; Shapiro, 2017; Fan and Ruszczyński, 2018). The importance of building a well structured theory of risk is motivated by its natural and intuitive relevance to problems from a large variety of applied domains. Arguably the oldest, archetypical application of risk is in Finance (Kijima and Ohnishi, 1993; Rockafellar and Uryasev, 1997; Andersson et al., 2001; Krokhmal et al., 2001; Chen and Wang, 2008; Shang et al., 2018), which has decisively driven pioneering research in risk-averse modeling and optimization from its very birth, probably dating back to the work of Markowitz (Markowitz, 1952), to the present. Other applications of risk may be found in both classical and contemporary domains such as Energy (Moazeni et al., 2015; Bruno et al., 2016; Jiang and Powell, 2016), Wireless Networks (Ma et al., 2018), Inventory Optimization (Ahmed et al., 2007; Chen et al., 2007; Xinsheng et al., 2015) and Supply Chain Management (Gan et al., 2004; Sawik, 2016), to name a few.

Most recently, the development of effective computational methods for applying risk-averse optimization to actual problems has also been attracting considerable attention; see, e.g., (Ruszczyński, 2010; Çavuş and Ruszczyński, 2014b; Moazeni et al., 2017; Tamar et al., 2017; Dentcheva et al., 2017; Huang and Haskell, 2018; Jiang and Powell, 2017; Yu et al., 2018). This line of work can be divided between sequential settings (Çavuş and Ruszczyński, 2014b; Moazeni et al., 2017; Tamar et al., 2017; Huang and Haskell, 2018; Jiang and Powell, 2017; Yu et al., 2018), and static settings (Tamar et al., 2017; Dentcheva et al., 2017), for a variety of different problem characteristics. Computational recipes also vary. For instance, (Ruszczyński, 2010) and (Çavuş and Ruszczyński, 2014b) develop and analyze variations of the well known value and policy iteration algorithms of risk-neutral dynamic programming; (Moazeni et al., 2017) proposes a method for risk-averse nonstationary direct parametric policy search for finite horizon problems; (Tamar et al., 2017), (Dentcheva et al., 2017) and (Yu et al., 2018) rely on the so-called Sample Average Approximation (SAA) approach (Shapiro et al., 2014), where an appropriately constructed empirical estimate of the original objective is used as a surrogate to that of the original stochastic program, assuming existence of a sufficiently large sample of the processes introducing uncertainty into the corresponding risk-averse objective; (Huang and Haskell, 2018) and (Jiang and Powell, 2017) consider an Approximate Dynamic Programming (ADP) (Powell, 2011) approach, where sequential finite state/action risk-averse stochastic programs are tackled via stochastic approximation (Kushner and Yin, 2003).

Following this recent trend, this paper proposes and rigorously analyzes recursive stochastic subgradient methods for an important class of static, convex risk-averse stochastic programs. In a nutshell, we make the following contributions:

  1. Following the Mean-Risk Model paradigm (Shapiro et al., 2014), we introduce a new class of convex risk measures, called mean-semideviations. These strictly generalize the well known mean-upper-semideviation risk measure, and are constructed by replacing the positive part weighting function of the latter by another nonlinear map, termed here as a risk regularizer, obeying certain properties. Mean-semideviations share the same core analytical structure with the mean-upper-semideviation risk measure; however, they are much more versatile in applications. We study mean-semideviations in terms of their basic properties, and we present a fundamental constructive characterization result, demonstrating their generality. Specifically, we show that the class of all mean-semideviation risk measures is almost in one-to-one correspondence with the class of cumulative distribution functions (cdfs) of all integrable random variables. This result provides an analytical device for constructing mean-semideviations with desirable characteristics, starting from any cdf of the aforementioned type. The flexibility and effectiveness of mean-semideviations are explicitly demonstrated on a classical, chance-constrained newsvendor model, as well.

  2. We introduce the MESSAGEp (MEan-Semideviation Stochastic compositionAl subGradient dEscent of order $p$) algorithm, an efficient, data-driven Stochastic Subgradient Descent (SSD)-type procedure for iteratively solving convex mean-semideviation risk-averse problems to optimality. The MESSAGEp algorithm constitutes a parallel variation of the general purpose T-level Stochastic Compositional Gradient Descent (T-SCGD) algorithm, recently developed in (Yang et al., 2018) under a generic theoretical framework. Although risk-averse optimization is listed in (Yang et al., 2018) as a potential application of stochastic compositional optimization only for the case of mean-upper-semideviations, this work is the first to propose a general algorithm, applicable to any mean-semideviation model of choice.

  3. We analyze the asymptotic behavior of the MESSAGEp algorithm under a new, flexible and structure-exploiting set of problem assumptions, which reveal a well-defined trade-off between the expansiveness of the random cost and the smoothness of the mean-semideviation risk measure under consideration. In particular, under our proposed structural framework:

    • Under appropriate stepsize rules, we establish pathwise convergence of the MESSAGEp algorithm in a strong technical sense, confirming its asymptotic consistency.

    • Assuming a strongly convex cost function, the convergence rate of the algorithm is studied in detail. More specifically, we show that, for fixed semideviation order and for , the algorithm achieves a squared- solution suboptimality rate of the order of iterations, where, for , pathwise convergence is simultaneously guaranteed. Thus, this new result establishes a rate of order arbitrarily close to , also ensuring strongly stable pathwise operation of the algorithm. In the simpler case where the semideviation order is chosen as , the rate order of the proposed algorithm improves to , which is sufficient for pathwise convergence as well, and matches previous results in the related literature (Wang et al., 2017).

    • For the general case of a convex cost, we show that, for any , the algorithm with iterate smoothing achieves an objective suboptimality rate of the order of . As in the strongly convex case, for , pathwise convergence is also simultaneously guaranteed. For , this result provides maximal rates of , if , and , if , matching the state of the art, as well.

  4. We discuss the superiority of the proposed framework for convergence, as compared to that employed earlier in (Yang et al., 2018), within the risk-averse context under consideration. By performing careful analysis and by constructing non-trivial counterexamples, we explicitly demonstrate that the class of mean-semideviation problems supported herein is strictly larger than the respective class of problems supported in (Yang et al., 2018). As a result, this paper establishes the applicability of compositional stochastic optimization for a significantly and strictly wider spectrum of convex mean-semideviation risk-averse problems, as compared to the state of the art. This fact justifies the purpose of our work from this perspective, as well.

Our contributions, briefly outlined above, are now discussed in greater detail. We also briefly explain how our work relates to and is placed within the existing literature.

1.1 Mean-Semideviation Risk Measures

Mean-semideviation risk measures, as proposed and developed in this work, constitute a new class of risk measures where, given a random cost, the corresponding dispersion measure (the term penalizing the “mean” part of a mean-risk functional) is defined as the $\mathcal{L}_p$-norm of a nonlinear, one-dimensional map of the centered cost, or, in other words, of its central deviation. This map is called a risk regularizer, and possesses certain analytical properties: convexity, nonnegativity, monotonicity and nonexpansiveness. Dispersion measures with this structure are suggestively called generalized semideviations.

This terminology originates from the presence of the positive part function $(\cdot)_+$, which is the simplest, prototypical example of a risk regularizer, in the corresponding dispersion measure of the well known mean-upper-semideviation risk measure (Shapiro et al., 2014), i.e., the upper-(central)-semideviation. Mean-semideviations are much more versatile, however, since different choices for the involved risk regularizer correspond to different rules for ranking the relative effect of both riskier (higher than the mean) and less risky (lower than the mean) events, corresponding to specific regions in the range of the (centered) cost. As a result, the choice of the risk regularizer affects the general quality and the roughness/stability of an optimal random cost, in a decision making setting. Consequently, owing to their versatility, mean-semideviations are practically appealing as well, because they are parametrizable and they may incorporate domain specific knowledge more easily than the rigid mean-upper-semideviation.
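The construction described above can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it computes a plug-in (sample-based, generally biased) estimate of a mean-semideviation, assuming the mean-risk form "mean plus multiplier times the order-p norm of the regularized centered cost". The function names, the multiplier `c`, and the softplus regularizer are illustrative assumptions; softplus is chosen only because it is convex, nonnegative, nondecreasing and nonexpansive.

```python
import numpy as np

def positive_part(t):
    """Prototypical risk regularizer: the positive part max(t, 0)."""
    return np.maximum(t, 0.0)

def softplus(t):
    """Illustrative smooth regularizer log(1 + exp(t)), evaluated stably."""
    return np.logaddexp(0.0, t)

def mean_semideviation(costs, c=0.5, p=1, regularizer=positive_part):
    """Plug-in estimate of mean + c * (order-p norm of R(cost - mean))."""
    costs = np.asarray(costs, dtype=float)
    mean = costs.mean()
    deviations = regularizer(costs - mean)      # regularizer applied to the centered cost
    dispersion = np.mean(deviations ** p) ** (1.0 / p)
    return mean + c * dispersion

costs = np.array([1.0, 2.0, 3.0, 10.0])
print(mean_semideviation(costs, c=0.5, p=1))    # 4.75: mean 4 plus 0.5 * 1.5
print(mean_semideviation(costs, c=0.5, p=2, regularizer=softplus))
```

With the positive part, only costs above the mean contribute to the penalty; softplus also assigns a small, smoothly vanishing weight to costs below the mean, which is one way in which the choice of regularizer changes how events are ranked.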

In this work, after we formulate simple conditions for the existence of mean-semideviation risk measures, we study their basic geometric properties, such as convexity and monotonicity. Contrary to the mean-upper-semideviation alone, mean-semideviations are not coherent risk measures, in general (as a class), because they do not satisfy positive homogeneity (Shapiro et al., 2014). This is due to the potential nonhomogeneity of the risk regularizer involved. They do satisfy convexity, monotonicity and translation equivariance, though, and therefore belong to the class of convex risk measures (Föllmer and Schied, 2002; Shapiro et al., 2014), and to that of convex-monotone risk measures, as well.

Further, we present a fundamental constructive characterization result, demonstrating the generality of mean-semideviations. Specifically, on the one hand, this result shows that the class of all mean-semideviation risk measures is almost in one-to-one correspondence with the class of cdfs of all integrable random variables (on the line). On the other, it provides an analytical device for constructing such risk measures from any cdf of the aforementioned type. Although not studied in this paper, this correspondence between mean-semideviations and cdfs might be of interest in other areas related to stochastically robust optimization such as stochastic dominance; see, for instance, the seminal articles (Ogryczak and Ruszczyński, 1999, 2002) for some interesting connections.

Our discussion on mean-semideviation risk measures is concluded by a demonstration of their practical usefulness and flexibility on a classical, chance-constrained newsvendor model. After we briefly analyze the structure of the problem under consideration, we put risk regularizers (each inducing a mean-semideviation risk measure) in context, and we explicitly discuss their construction, so that the resulting mean-semideviation risk measure best reflects problem characteristics, and the objectives of the decision maker. Additionally, we present numerical simulations, experimentally confirming the effectiveness of the proposed risk-averse approach. Our simulations also reveal some interesting features of the resulting risk-averse solutions, which we further discuss.

Relation to the Literature: We are not the first to propose convex risk measures featuring nonlinear weighting functions; see, for instance, (Kijima and Ohnishi, 1993; Chen and Yang, 2011; Fu et al., 2017). In particular, the recent article (Fu et al., 2017) considers risk measures defined as a nonlinearly weighted, order-$p$ (lower) semideviation from a fixed target (see, for instance, Example 6.25 in (Shapiro et al., 2014)), focusing mainly on their applications to a portfolio selection model. In (Fu et al., 2017), the corresponding weighting function shares the same properties as a risk regularizer (see above), except for nonexpansiveness. However, our proposed mean-semideviation risk measures are substantially different and structurally more complex compared to the risk measures proposed in (Fu et al., 2017). The main reason is the presence of the expected cost, rather than a fixed target, in the definition of mean-semideviations; for more details, compare ((Fu et al., 2017), Definition 1) with Section 3 herein.

1.2 Recursive Optimization of Mean-Semideviations

The main contribution of this work concerns efficient optimization of mean-semideviations, measuring convexly parameterized random cost functions, over a closed and convex set. We introduce and rigorously analyze the MESSAGEp (MEan-Semideviation Stochastic compositionAl subGradient dEscent of order $p$) algorithm (Algorithm 1 in Section 4.3), which constitutes an efficient Stochastic Subgradient Descent (SSD)-type procedure for iteratively solving our base problem to optimality. The MESSAGEp algorithm may be seen as a parameterized (relative to the choice of the risk regularizer), parallel variation of the general purpose T-Level Stochastic Compositional Gradient Descent (T-SCGD) algorithm, presented and analyzed very recently in (Yang et al., 2018) under generic assumptions. In turn, the T-SCGD algorithm is a natural generalization of the Basic 2-Level SCGD algorithm, presented and analyzed earlier in (Wang et al., 2017). A key feature of the aforementioned compositional stochastic subgradient schemes is the existence of more than one ($T$, in general) pairwise coupled stochastic approximation updates, or levels, each with a dedicated stepsize, which are executed concurrently through the operation of the algorithm. In the case of the MESSAGEp algorithm, there exist three such levels (that is, $T = 3$), which results naturally from our specific problem structure. However, contrary to the T-SCGD algorithm, all three stochastic approximation levels of the MESSAGEp algorithm are executed completely in parallel within every iteration, presenting additional operational efficiency, potentially important in various applications.
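To make the idea of three coupled levels concrete, the following is a rough sketch of a three-level compositional stochastic subgradient iteration for a mean-upper-semideviation objective. It is not Algorithm 1 of the paper and should not be read as the MESSAGEp algorithm itself. The quadratic cost, the box constraint, the stepsize exponents, the clamp on `z` and all function names are assumptions made purely for illustration. The search direction follows from the chain rule applied to "mean plus `c` times the order-`p` norm of the positive part of the centered cost", with two independent samples used to decouple the inner and outer expectations.

```python
import numpy as np

def cost(x, w):
    """Illustrative random cost F(x, w) = 0.5 * ||x - w||^2 (an assumption)."""
    return 0.5 * np.sum((x - w) ** 2)

def cost_grad(x, w):
    """Gradient of the illustrative cost with respect to x."""
    return x - w

def pos_part(t):
    return max(t, 0.0)

def pos_part_sub(t):
    """A subderivative selection of the positive part (0 chosen at t = 0)."""
    return 1.0 if t > 0 else 0.0

def compositional_descent(sample, x0, c=0.5, p=2, iters=5000, lo=-5.0, hi=5.0):
    x = np.array(x0, dtype=float)   # level 1: decision
    y = 0.0                         # level 2: running estimate of E[F(x, W)]
    z = 1.0                         # level 3: running estimate of E[R(F(x, W) - y)^p]
    for n in range(1, iters + 1):
        alpha = 0.5 * n ** -0.75            # decision stepsize (arbitrary choice)
        beta = gamma = (n + 1) ** -0.5      # tracking stepsizes (arbitrary choice)
        w1, w2, w3 = sample(), sample(), sample()
        g1, g2 = cost_grad(x, w1), cost_grad(x, w2)
        t = cost(x, w1) - y
        # Chain-rule weight; z is clamped away from zero for numerical safety.
        weight = c * max(z, 1e-6) ** ((1.0 - p) / p) * pos_part(t) ** (p - 1) * pos_part_sub(t)
        direction = g1 + weight * (g1 - g2)
        # All three levels are updated from the same old (x, y, z).
        x_new = np.clip(x - alpha * direction, lo, hi)   # projection onto the box
        y_new = (1 - beta) * y + beta * cost(x, w3)
        z_new = (1 - gamma) * z + gamma * pos_part(cost(x, w3) - y) ** p
        x, y, z = x_new, y_new, z_new
    return x

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
x_hat = compositional_descent(lambda: mu + rng.standard_normal(2), x0=[0.0, 0.0])
print(x_hat)
```

Because each level reads only the previous values of `x`, `y` and `z`, the three updates within one iteration do not depend on one another and could be computed concurrently. Setting `c=0` reduces the sketch to plain projected stochastic gradient descent on the expected cost.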

Pathwise convergence and convergence rate analyses of the T-SCGD algorithm are presented in (Yang et al., 2018), and (Wang et al., 2017) (where, in the latter, $T = 2$). However, the respective structural framework considered in both (Yang et al., 2018) and (Wang et al., 2017), when applied to the problem class considered in this work, imposes significant restrictions in regard to the possible choice of the risk regularizer, partially related to the expansiveness and smoothness (or roughness) of the involved random cost function. This fact significantly limits the type of problems the T-SCGD algorithm is provably applicable to, at least within the class of risk-averse problems introduced and studied herein. For example, when $p = 1$, arguably the most popular regularizer $(\cdot)_+$, leading to the mean-upper-semideviation risk measure, is not supported within the framework of (Wang et al., 2017; Yang et al., 2018). This is because nonsmooth risk regularizers exhibiting corner points, such as $(\cdot)_+$, evidently have discontinuous subderivatives, whereas the respective assumptions made in (Wang et al., 2017; Yang et al., 2018) essentially require the respective risk regularizer to be not only everywhere differentiable, but to have Lipschitz derivatives, as well. This shortcoming of the theoretical framework of (Wang et al., 2017; Yang et al., 2018) naturally carries over to higher values of the semideviation order, $p$. The theoretical narrowness of (Wang et al., 2017; Yang et al., 2018) thus motivates closer study of a compositional subgradient algorithm that would exploit the special characteristics of a mean-semideviation risk measure. The ultimate goal is the development of a sufficiently general theoretical framework, which will justify the compositional optimization approach for the whole class of mean-semideviation risk measures, under as weak structural assumptions as possible.

Following this direction, and focusing on optimizing mean-semideviation models, we present a new and flexible set of problem assumptions, substantially weaker than those employed in (Wang et al., 2017; Yang et al., 2018), under which we analyze the asymptotic behavior of the MESSAGEp algorithm proposed in our work. Our framework carefully exploits the structure of mean-semideviations, and presents a probably fundamental, though practically useful, trade-off between the expansiveness of the random cost function and the smoothness of the chosen risk regularizer, in a very well-defined sense. Our results, outlined previously, are restated as follows.

First, under appropriate stepsize rules, we establish pathwise convergence of the MESSAGEp algorithm in the same strong sense as in (Wang et al., 2017; Yang et al., 2018), thus confirming its asymptotic consistency.

Second, assuming a strongly convex cost function, we study the convergence rate of the algorithm, in detail. More specifically, we show that, for fixed semideviation order and for any choice of , the algorithm achieves a squared- solution suboptimality rate of the order of iterations. Here, is a user-specified parameter, which directly affects stepsize selection. If, additionally, is chosen to be strictly positive, that is, for , pathwise convergence is simultaneously guaranteed. This completely novel result establishes a convergence rate of order arbitrarily close to as , while ensuring strongly stable pathwise operation of the algorithm. In the structurally simpler case where , the rate order improves to , which is sufficient for pathwise convergence as well, and matches existing results in compositional stochastic optimization, developed earlier along the lines of (Wang et al., 2017).

Third, for the general case of a convex cost function, we show that, for any , the algorithm with iterate smoothing achieves an objective suboptimality rate of the order of . As in the strongly convex case, for , pathwise convergence is also simultaneously guaranteed. For , this result provides maximal rates of , if , and , if , matching the state of the art, as well (Wang et al., 2017; Yang et al., 2018). Although those rates may not be particularly satisfying, they quantitatively demonstrate the remarkable speedup achieved by assuming and leveraging strong convexity for the analysis and operation of the algorithm.

The proposed structural framework adequately mitigates the aforementioned technical issues of that considered in (Wang et al., 2017; Yang et al., 2018). For example, we show that, when the random cost function has bounded (random) subgradients and its distribution is generally well-behaved, the choice of the risk regularizer can be completely unconstrained, regardless of the value of $p$. As a result, under the new framework, the most popular candidate $(\cdot)_+$, but also every risk regularizer exhibiting corner points, are now valid choices (under appropriate conditions) for any $p$, contrary to (Wang et al., 2017; Yang et al., 2018).

Finally, in order to show the superiority of our proposed framework compared to that of (Wang et al., 2017; Yang et al., 2018), we present a detailed analytical comparison, which rigorously demonstrates that the class of mean-semideviation programs supported within this work contains the respective class of problems supported within (Wang et al., 2017; Yang et al., 2018); further, the inclusion is strict. Such a comparison is made possible by performing careful analysis and by constructing non-trivial, non-corner-case counterexamples. As a result, the applicability of compositional stochastic optimization is established herein for a significantly and strictly wider spectrum of convex mean-semideviation risk-averse problems, as compared to the state of the art. This fact justifies the purpose of our work from this perspective, in addition to our algorithmic contribution.

Relation to the Literature: Evidently, the results presented in this work are related to those developed in (Wang et al., 2017; Yang et al., 2018) for a generic problem setting. Indeed, as already stated, optimization of mean-upper-semideviation risk measures has been briefly identified in (Wang et al., 2017; Yang et al., 2018) as a potential application of the compositional algorithms proposed therein. However, as mentioned above, the assumptions on problem structure employed in (Wang et al., 2017; Yang et al., 2018) are too restrictive to adequately study the class of mean-semideviation risk measures introduced herein, which includes the mean-upper-semideviation as a single member. Apart from the aforementioned works, and as also discussed above, there is a significant line of research considering the SAA approach to risk-averse stochastic optimization, both from a fundamental, theoretical perspective (Shapiro, 2013; Guigues et al., 2016; Dentcheva et al., 2017) and from a computational one (Dentcheva et al., 2017; Tamar et al., 2017). As noted in (Wang et al., 2017; Yang et al., 2018), the compositional, SSD-type optimization algorithms analyzed in this paper present some major natural advantages over the SAA approach. First, the MESSAGEp algorithm solves the original risk-averse stochastic program asymptotically to optimality, whereas, in the SAA approach, the corresponding SAA surrogate to the original program is solved, producing only an approximate solution; as the sample size increases, the solution to the SAA surrogate approaches that of the original stochastic program, in some well defined sense (Shapiro, 2013; Dentcheva et al., 2017). Second, because of their nature, SAAs cannot exploit new information available to the decision maker in order to improve on the decisions made so far; in fact, the SAA surrogate needs to be redefined using the new available information, and then solved afresh. In contrast, the MESSAGEp algorithm efficiently exploits new information, due to its recursive, sequential nature. Third, as a result of the above, SAAs are not suitable for settings where information is available sequentially, and decisions have to be made adaptively over time. Fourth, SAAs might often require a very large number of samples for producing accurate approximations to the optimal decisions corresponding to the original problem, and this might result in optimization problems whose objective is computationally difficult to evaluate. For more details on this, see (Wang et al., 2017). On the contrary, the MESSAGEp algorithm is iterative in nature, and presents minimal and fixed time and space complexity per iteration.

Organization of the Paper

The rest of the paper is organized as follows. Section 2 establishes the stochastic risk-averse convex programming setting under study, and provides some elementary, albeit necessary preliminaries on the theory of risk measures. In Section 3, we constructively introduce the class of mean-semideviation risk measures, we study their existence and their structural properties, we discuss specific examples, and we develop our above-mentioned fundamental characterization result. Section 4 is devoted to the development and analysis of the MESSAGEp algorithm, under our proposed theoretical framework for convergence, and includes the rigorous comparison of our results with those presented in (Yang et al., 2018). Finally, Section 5 concludes the paper.

Note: Some longer proofs of the theoretical results presented in the paper in the form of Theorems, Lemmata and Propositions are excluded from the main body of the paper for clarity in the exposition, and are presented in Section 7 (Appendix).

Notation & Definitions

Matrices and vectors will be denoted by boldface uppercase and boldface lowercase letters, respectively. Calligraphic letters and formal script letters will generally denote sets and σ-algebras, respectively, except for clearly specified exceptions. The operator will denote vector transposition. The -norm of is , for all . Similarly, the norm of an appropriately measurable function will be for , and , where the reference measure will be clearly specified by the context. The finite -dimensional identity operator will be denoted as . Additionally, we define , and , for .

If denotes a base sample space and (referring directly to ), then, for the sake of clarity, we sometimes drop dependence on , and write simply (clear by the context).

For every set , which is nonempty, closed and convex, the Euclidean projection onto , is defined, as usual, as , for all . Euclidean projections, as defined above, always exist and are nonexpansive operators.
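As a small numerical illustration of Euclidean projections and their nonexpansiveness, the sketch below implements two standard closed-form cases, a box and a closed Euclidean ball. The choice of these two sets and the function names are assumptions made for illustration only; the paper's feasible set is a general nonempty, closed and convex set.

```python
import numpy as np

def project_box(x, lower, upper):
    """Euclidean projection onto the box {y : lower <= y <= upper}."""
    return np.clip(x, lower, upper)

def project_ball(x, radius=1.0):
    """Euclidean projection onto the closed Euclidean ball of the given radius."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)

# Nonexpansiveness: the distance between projections never exceeds
# the distance between the original points.
lhs = np.linalg.norm(project_ball(u) - project_ball(v))
rhs = np.linalg.norm(u - v)
print(lhs <= rhs + 1e-12)
```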

For every real-valued function , which is differentiable at a point , the vector denotes its gradient at . If, additionally, is differentiable on , the function denotes its gradient function, mapping each to .

If is nonsmooth and convex, its subdifferential is the closed-valued multifunction , defined, for every , as the set of all gradients each corresponding to a linear underestimator of , or, in other words,

(1)

A subgradient (function) of , suggestively denoted as , is defined as any selection of the subdifferential multifunction , that is, for every , it is true that ; for brevity, we write . For fixed , will be called a subgradient of at .
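For reference, the verbal description of the subdifferential given above corresponds to the standard definition from convex analysis. The display below writes it under assumed notation, since the original symbols are not legible here: $f$ is the convex function on $\mathbb{R}^N$, $x$ the point of interest, and $g$ a candidate subgradient.

```latex
\partial f(x) \triangleq \left\{ g \in \mathbb{R}^{N} \;:\;
    f(y) \ge f(x) + g^{\top}(y - x), \ \text{for all } y \in \mathbb{R}^{N} \right\}
```

Each $g$ in this set defines the linear underestimator $y \mapsto f(x) + g^{\top}(y - x)$, which touches $f$ at $x$.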

2 Problem Setting & Preliminaries

We now formally introduce the problem of interest in this work. Henceforth, all subsequent probabilistic statements will presume the existence of a common probability space . We refer to as the base space. We place no topological restrictions on the sample space . However, in order for some mild technicalities to be easily resolved, we conveniently assume that constitutes a complete measure space.

Let be a bivariate real-valued mapping, such that, for every , the function is -measurable and, for every , the path is (real-valued) convex (and subdifferentiable). Also, for a given -measurable (in general) random element , consider the composite function , defined as

(2)

It easily follows that, for every , the function is an -measurable (in general), real-valued random variable. We additionally assume that, for every , belongs to the Lebesgue space for some fixed choice of , relative to the base measure , that is, . Of course, if, for every , , where is the Borel pushforward of , then , as well. Hereafter, will be referred to as a random cost function.

With the term risk measure, we refer to some fixed and known real-valued functional on the Banach space (Shapiro et al., 2014). Among all risk measures on , we pay special attention to those exhibiting the following basic structural characteristics.

Definition 1.

(Convex-Monotone Risk Measures) A real valued functional on , , is called a convex-monotone risk measure, if and only if it satisfies the following conditions:

(Convexity): For every and , it is true that

(3)

for all .

(Monotonicity): For every and , such that , for -almost all , it is true that .
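In symbols, and under assumed notation ($\rho$ for the risk measure, $Z_1, Z_2$ for random costs in the underlying Lebesgue space, and $\alpha$ for the mixing weight), the two conditions of Definition 1 take the following standard form:

```latex
\begin{align*}
\text{(Convexity)} \quad
  & \rho\big(\alpha Z_1 + (1-\alpha) Z_2\big)
    \le \alpha\, \rho(Z_1) + (1-\alpha)\, \rho(Z_2),
    \quad \text{for all } \alpha \in [0,1], \\
\text{(Monotonicity)} \quad
  & Z_1 \le Z_2 \ \text{almost everywhere}
    \;\Longrightarrow\; \rho(Z_1) \le \rho(Z_2).
\end{align*}
```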

For a possibly convex-monotone risk measure (following Definition 1), we will be interested in the “static” stochastic program

(4)

where the set of feasible decisions is assumed to be closed and convex.

Under the standard problem setting outlined above, it is straightforward to formulate the following elementary result, provided here without proof, and for completeness.

Proposition 1.

(Convexity of Risk-Function Compositions (Shapiro et al., 2014)) Consider a real-valued random function , as well as a real-valued risk measure . Suppose that, for every , is convex and that is convex-monotone. Then, the real-valued composite function is convex.
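Although the result is stated without proof, the underlying argument is short and can be sketched as follows. The notation is assumed for illustration: $F(x, W)$ is the random cost at decision $x$, $\rho$ is the risk measure, $x_1, x_2$ are two decisions and $\alpha \in [0,1]$.

```latex
\begin{align*}
F\big(\alpha x_1 + (1-\alpha) x_2, W\big)
  &\le \alpha F(x_1, W) + (1-\alpha) F(x_2, W)
  && \text{(pathwise convexity of } F\text{)} \\
\rho\Big(F\big(\alpha x_1 + (1-\alpha) x_2, W\big)\Big)
  &\le \rho\big(\alpha F(x_1, W) + (1-\alpha) F(x_2, W)\big)
  && \text{(monotonicity of } \rho\text{)} \\
  &\le \alpha\, \rho\big(F(x_1, W)\big) + (1-\alpha)\, \rho\big(F(x_2, W)\big)
  && \text{(convexity of } \rho\text{)}
\end{align*}
```

The first inequality holds for almost every outcome, which is what monotonicity requires. Chaining the last two lines gives convexity of the composition $x \mapsto \rho(F(x, W))$.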

Proposition 1 shows that, under the respective assumptions, (4) constitutes a convex mathematical program in standard form. Thus, application of a subgradient method would require that some selection of the subdifferential multifunction can be evaluated at will, at any . However, for most choices of the random cost function and of the risk measure , even the composition is impossible to be evaluated exactly, let alone (a selection of) . Instead, we may be given either realizations of the random exogenous information , or direct evaluations of and a subgradient, , at some test decision candidate . It might also be desirable that decision making is performed sequentially over time, where decisions are updated adaptively as new information arrives. Such settings motivate the consideration of SSD-type algorithms for solving (4), which are of main interest in this paper.

Some basic assumptions follow, fairly standard in the literature of stochastic approximation (Shapiro et al., 2014; Wang et al., 2017; Yang et al., 2018; Kushner and Yin, 2003). To this end, let us first formally introduce the elementary concept of an IID process.

Definition 2.

(IID Process) A stochastic sequence is called IID if and only if it consists of statistically independent, -valued random elements, identically distributed according to a fixed Borel measure .

Assumption 1.

(Availability of Information) Either one, or more, mutually independent, IID sequences are available sequentially, all distributed according to .

Remark 1.

Note that in Assumption 1 we do not require that the process is actually observable to the user, but only available, either in the form of a data stream, or by simulation.

Assumption 2.

(Existence of an ) There exists a mechanism, called a Sampling Oracle (), which, given and , returns either , or , a subgradient of relative to , or both. It is further assumed that the has direct access to all available information streams, according to Assumption 1.
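As an illustration only, the interplay between Assumptions 1 and 2 may be sketched in Python as follows. The class name `SamplingOracle`, the helper `iid_stream`, and the absolute-deviation cost are hypothetical choices made for this sketch and do not appear in the paper. The only feature mirrored from the text is that a query consumes one sample from an IID stream and returns the cost value together with one subgradient, without revealing the sample itself.

```python
import random
from typing import Callable, Iterator, Tuple


def iid_stream(draw: Callable[[], float]) -> Iterator[float]:
    """Endless IID stream: every element is an independent call to `draw`."""
    while True:
        yield draw()


class SamplingOracle:
    """Hypothetical oracle: one sample is consumed per query."""

    def __init__(self, stream, cost, subgrad):
        self._stream = stream      # information stream (Assumption 1)
        self._cost = cost          # (x, w) -> value of the random cost
        self._subgrad = subgrad    # (x, w) -> one subgradient in x

    def query(self, x: float) -> Tuple[float, float]:
        w = next(self._stream)     # the caller never observes w directly
        return self._cost(x, w), self._subgrad(x, w)


# Illustrative cost |x - w|, convex in x for every fixed w (an assumption).
def abs_cost(x, w):
    return abs(x - w)


def abs_subgrad(x, w):
    return 1.0 if x > w else (-1.0 if x < w else 0.0)


rng = random.Random(0)
oracle = SamplingOracle(iid_stream(lambda: rng.gauss(0.0, 1.0)), abs_cost, abs_subgrad)
value, g = oracle.query(0.5)
```

Several mutually independent streams, as allowed by Assumption 1, would simply correspond to several generators built from independent random sources.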

In this work, we propose and analyze efficient algorithms for solving (4) under Assumptions 1 and 2, and explicitly assuming no prior knowledge of either the random cost function , or its respective subgradients. We will be restricting our attention to a new class of convex-monotone risk measures with, however, wide applicability, and whose general structure follows the so-called Mean-Risk Model ((Shapiro et al., 2014), Section 6.2). This special class of risk measures is introduced and analyzed, in detail, in Section 3.

3 Mean-Semideviation Models

Under the Mean-Risk Model paradigm (Shapiro et al., 2014), a risk measure $\rho$ is defined, for each random cost $Z$, as

$$\rho(Z) \triangleq \mathbb{E}\{Z\} + c\,\mathbb{D}(Z), \tag{5}$$

where the functional $\mathbb{D}$ constitutes a dispersion measure, and provided that the respective quantities are well defined, for the particular choice of $Z$. The dispersion measure $\mathbb{D}$ may be conveniently thought of as a penalty, weighted by the penalty multiplier $c \ge 0$, effectively quantifying the uncertainty of the particular cost $Z$.

In this section, we introduce a special class of dispersion measures, which constitute natural generalizations of the well-known upper semideviation of order (Shapiro et al., 2014). This new class of dispersion measures is termed here as generalized semideviations. Reasonably enough, risk measures of the form of (5), where the respective dispersion measure constitutes a generalized semideviation will be called either mean-semideviation risk measures, or, interchangeably, mean-semideviation models, or, simply, mean-semideviations.

This section is structured as follows. First, the simple notion of a risk regularizer is introduced; risk regularizers constitute the basic building block of generalized semideviations. The basic properties of risk regularizers are concisely presented, and a formal definition of generalized semideviations is also formulated, along with a brief discussion related to their practical relevance. Mean-semideviation risk measures are then formally introduced, along with their basic properties, and specific examples are discussed, highlighting their versatility. Next, we develop a constructive characterization result, essentially showing that the class of all mean-semideviation risk measures is almost in one-to-one correspondence with the class of cumulative distribution functions of all integrable random variables (on the line). This result readily demonstrates an apparent generality of mean-semideviations, as well. Lastly, the usefulness, flexibility and effectiveness of mean-semideviation risk measures are demonstrated on a classical, chance-constrained newsvendor model. In particular, risk regularizers (each inducing a mean-semideviation risk measure) are put in context, and their construction is explicitly discussed, reflecting the special characteristics of the specific newsvendor problem under consideration, and the objectives of the decision maker.

3.1 Basic Concepts

We start by introducing the concept of a risk regularizer. Risk regularizers are simple, real-valued functions of one variable, which are reasonably structured, so that they, on the one hand, can be used to quantify risk (see below) and, on the other, can result in problems which can be solved efficiently and exactly via convex stochastic optimization.

Definition 3.

(Risk Regularizers) A real-valued function is called a risk regularizer, if it satisfies the following conditions:

is convex.

is nonnegative.

is nondecreasing.

For every , it is true that , for all .

Fig. 3.1 illustrates the shapes of various risk regularizers, other than the arguably most obvious example of the positive part function . Note that a risk regularizer need not be smooth (a trivial example is ); several of the examples of Fig. 3.1 are indeed nonsmooth, with the respective corner points highlighted by black dots.

Risk regularizers of Definition 3 may be further structurally characterized via the following simple result.

Proposition 2.

(Characterization of ) Consider a real-valued function , satisfying condition of Definition 3. Then, condition holds if and only if is nonexpansive.

Proof of Proposition 2.

First, assume that condition holds. Then, by the fact that is nondecreasing (), it is true that

(6)

for all , showing that is a nonexpansive map. Conversely, assume that is nonexpansive. Then, for any , it is true that

(7)

for all , verifying condition . ∎
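The four conditions of Definition 3 can be probed numerically for a candidate function. The following is a minimal sketch, not a proof: it only tests the conditions on a finite grid, and it checks the fourth condition in its equivalent nonexpansive form, as permitted by Proposition 2 for nondecreasing functions. The function name `check_risk_regularizer`, the grid range and the tolerance are assumptions of this sketch.

```python
def check_risk_regularizer(R, lo=-5.0, hi=5.0, n=401, tol=1e-9):
    """Grid-based sanity check of the conditions of Definition 3."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [R(x) for x in xs]
    h = xs[1] - xs[0]
    return {
        # Convexity on a uniform grid: second differences are nonnegative.
        "convex": all(ys[i - 1] - 2.0 * ys[i] + ys[i + 1] >= -tol
                      for i in range(1, n - 1)),
        "nonnegative": all(y >= -tol for y in ys),
        "nondecreasing": all(b - a >= -tol for a, b in zip(ys, ys[1:])),
        # Nonexpansiveness: neighboring increments never exceed the grid step.
        "nonexpansive": all(abs(b - a) <= h + tol for a, b in zip(ys, ys[1:])),
    }


# The positive part function passes all four checks.
report = check_risk_regularizer(lambda x: max(x, 0.0))
```

A function such as `lambda x: 2.0 * max(x, 0.0)` is convex, nonnegative and nondecreasing, but fails the nonexpansiveness check, so it is not a risk regularizer.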

At this point, let us emphasize the elementary fact that, because of convexity, every (real-valued) risk-regularizer must also be differentiable almost everywhere, relative to the Lebesgue measure on the Borel space . This also follows either by monotonicity, or due to the fact that a risk regularizer is nonexpansive and, therefore, Lipschitz continuous on . Further, because of convexity, the set of Lebesgue measure zero of points in , where a risk regularizer is nondifferentiable, is at most countable.

The class of all possible risk regularizers induces that of generalized semideviations, which constitute the class of dispersion measures considered in this paper. The definition of a generalized semideviation is presented below.

Definition 4.

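(Generalized Semideviations) Fix $p \in [1,\infty)$ and choose a risk regularizer $\mathcal{R}$. A dispersion measure $\mathbb{D}$ is called a generalized semideviation of order $p$, if and only if, for every cost $Z$ under consideration,

$$\mathbb{D}(Z) \equiv \big\| \mathcal{R}(Z - \mathbb{E}\{Z\}) \big\|_{\mathcal{L}_p} \triangleq \Big( \mathbb{E}\big\{ \big( \mathcal{R}(Z - \mathbb{E}\{Z\}) \big)^p \big\} \Big)^{1/p}, \tag{8}$$

where it is assumed that all involved quantities are well defined and finite.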

Figure 3.1: Some examples of both smooth and nonsmooth risk regularizers. Black dots highlight the respective corner points of nondifferentiability (some imperceptible).

The power of generalized semideviations is in the fact that they form a parametric family relative to the choice of the risk regularizer ; different risk regularizers correspond to different rules for ranking the relative effect of both riskier (higher than the mean) and less risky (lower than the mean) events, corresponding to specific regions in the range of the cost. For more details, see Section 3.3, where we illustrate the versatility of generalized semideviations via additional examples, considering various specific choices for , with the well known upper-semideviation dispersion measure (Shapiro et al., 2014) being the prototypical representative of this class.

3.2 Mean-Semideviations: Definition, Existence & Structure

Utilizing the concept of generalized semideviations, we may now introduce the class of risk measures of central interest in this work, as follows.

Definition 5.

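(Mean-Semideviation Risk Measures) Fix $p \in [1,\infty)$ and choose a risk regularizer $\mathcal{R}$. The mean-semideviation of order $p$, induced by $\mathcal{R}$, is the real-valued risk measure $\rho$ defined, for every cost $Z$ under consideration, as

$$\rho(Z) \triangleq \mathbb{E}\{Z\} + c\,\big\| \mathcal{R}(Z - \mathbb{E}\{Z\}) \big\|_{\mathcal{L}_p}, \tag{9}$$

where $c \ge 0$ constitutes a fixed penalty multiplier, and provided that all involved quantities are well defined and finite.

Footnote 1: A mean-semideviation risk measure will be denoted either with explicit reference to $p$ and $\mathcal{R}$, which is proper, or simply as $\rho$, which is simpler, as long as the choices of $p$ and $\mathcal{R}$ are clearly specified.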
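To make Definition 5 concrete, the following sketch evaluates a mean-semideviation on the empirical distribution of a finite sample. It assumes the form "mean plus penalty multiplier times the order-p norm of the regularized, centered cost", which is how we read the definition; the function name `mean_semideviation` and its argument names are hypothetical.

```python
def mean_semideviation(samples, R, p=1.0, c=1.0):
    """Plug-in value of the risk measure on equally weighted samples.

    samples : realizations of the random cost
    R       : risk regularizer, a function of one real variable
    p       : order of the semideviation (p >= 1)
    c       : penalty multiplier
    """
    n = len(samples)
    mean = sum(samples) / n
    # Order-p norm of the regularized deviations from the mean.
    dispersion = (sum(R(z - mean) ** p for z in samples) / n) ** (1.0 / p)
    return mean + c * dispersion


def positive_part(x):
    return max(x, 0.0)


# Two equally likely costs, 0 and 2: mean 1, upper semideviation 0.5.
value = mean_semideviation([0.0, 2.0], positive_part, p=1.0, c=1.0)
```

Note that this plug-in evaluation is only a descriptive device. It is not the recursive procedure studied in the paper, which never forms the full empirical distribution.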

Next, we state and prove a small number of relatively simple results, related to the existence of mean-semideviation risk measures, introduced in Definition 5, as well as their functional structure. First, as it might be expected, we show that mean-semideviation risk measures of order may be naturally associated with costs which are also in (i.e., choosing ). Recall that, throughout the paper, is reserved for specifying the order of the mean-semideviation risk measure under consideration, whereas is related to the integrability of the respective cost.

Proposition 3.

(Compatibility of ’s and ’s) Fix , , and choose any risk regularizer . Then, as long as , the risk measure is well-defined and finite, for every .

Proof of Proposition 3.

Since , it is trivial that , simply due to the inclusion , for any choice of . Thus, the expectation of every exists and is finite, and what remains is to prove the result for the dispersion measure .

For simplicity, let . Using the fact that is finite, it is true that, for every , the shifted cost is in . It thus suffices to show that, for every , is in , as well. Because the risk regularizer is nonnegative (condition ), the integral exists. Also, due to condition of Definition 3, it follows that, for every , and since is nondecreasing (), it is true that , for all . Setting , this yields

(10)

and since , , as well. Consequently, it is true that

(11)

showing that and, therefore, , are both well defined and finite, for every .

Now, due to the inclusion , we know that, if , for some , then , as well. Enough said. ∎

Hereafter, for the sake of generality, we will implicitly assume that and are compatible, so that existence and finiteness of the resulting risk measures considered is ensured. Of course, in actual applications, Proposition 3 may be directly invoked on a case-by-case basis, in order to select the order of the particular dispersion measure of choice, depending on the nature of the random cost, or a family of those, under study.

After characterizing existence and finiteness of mean-semideviation risk measures, as introduced in Definition 5, we focus on their structural properties, from a functional point of view. As the following result suggests, mean-semideviation risk measures are indeed convex-monotone under a standardized assumption on the penalty multiplier .

Theorem 1.

(When are Mean-Semideviations Convex-Monotone?) Fix and choose any risk regularizer . Then, as long as , the risk measure is convex-monotone; that is, it satisfies both conditions and .

Proof of Theorem 1.

Let us start with verifying convexity (). Since the expectation term of is a linear functional on , it will suffice to show that the generalized semideviation term is convex. Indeed, for every , and every , we may write

(12)

where the first inequality is true due to conditions (convexity) and (nonnegativity), and the second is due to the triangle (Minkowski) inequality. Thus, is a convex functional, which means that is also convex on . Note that the value of is not crucial in order to show convexity of .

Let us now study monotonicity () of the risk measure . For every and , such that , for -almost all , we have

(13)

where the first inequality is due to conditions (nonnegativity) and (monotonicity), the second is due to conditions (nonnegativity), (nonexpansiveness), as well as the fact that , and the third is again due to the triangle inequality. From (13), we readily see that, as long as , we may further write

(14)

completing the proof of the theorem. ∎
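The role of the penalty multiplier in Theorem 1 can be seen on a two-point example. The sketch below assumes that the condition on the multiplier is that it lies in [0, 1], and uses the positive part as risk regularizer with order one. The sample space has two equally likely outcomes; the first cost is dominated by the second on every outcome. All names are hypothetical.

```python
def mean_upper_semideviation(outcomes, c, p=1.0):
    """Exact value on a finite space with equally likely outcomes."""
    n = len(outcomes)
    mean = sum(outcomes) / n
    disp = (sum(max(z - mean, 0.0) ** p for z in outcomes) / n) ** (1.0 / p)
    return mean + c * disp


z1 = [0.0, 1.0]   # cost 0 or 1, each with probability 1/2
z2 = [1.0, 1.0]   # constant cost 1, so z1 <= z2 on every outcome

# With c = 1 the ordering of the costs is preserved by the risk measure.
preserved = mean_upper_semideviation(z1, c=1.0) <= mean_upper_semideviation(z2, c=1.0)

# With c = 3 the dominated cost is ranked as riskier: monotonicity fails.
violated = mean_upper_semideviation(z1, c=3.0) > mean_upper_semideviation(z2, c=3.0)
```

Here the dominated cost has value 0.5 + 0.25c, while the constant cost has value 1, so the ordering is reversed as soon as c exceeds 2 in this particular example.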

We may now invoke Proposition 1, presented earlier, to immediately obtain the following key corollary. The proof is trivial and, thus, omitted.

Corollary 1.

(When is (4) Convex?) Fix and choose any risk regularizer . Then, as long as , the composite function is convex on , and (4) constitutes a convex stochastic program.

Corollary 1 is an important result, because it shows that, for every mean-semideviation risk measure, or equivalently, for every risk regularizer of choice, problem (4) would be exactly solvable via, for instance, subgradient methods, if the function was known in advance. This fact reinforces our hope that it might indeed be possible to solve (4) to optimality, utilizing some carefully designed stochastic search, or, more specifically, and based on the assumed subdifferentiability of , stochastic subgradient algorithm. Of course, such an algorithm should be designed to work under Assumptions 1 and 2, without the need for explicit knowledge of , or .

Remark 2.

(Coherence?) We should mention that mean-semideviations are not coherent risk measures ((Shapiro et al., 2014), Section 6.3), since they do not satisfy the axiomatic property of positive homogeneity. This is simply due to the fact that, in general, one may find choices for such that

(15)

for some and . Nevertheless, mean-semideviations may be readily shown to satisfy translation equivariance, although such property is not explicitly required in this work. As a result, in addition to being convex-monotone, mean-semideviations also belong to the class of convex risk measures (Föllmer and Schied, 2002; Shapiro et al., 2014).
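Both claims of Remark 2 can be checked numerically on a small example. The sketch below uses the smooth regularizer log(1 + exp(x)) as one possible choice for which positive homogeneity fails, and evaluates the risk measure exactly on three equally likely outcomes. The function names and the test values are assumptions of this sketch.

```python
import math


def softplus(x):
    # Numerically stable evaluation of log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def rho(outcomes, R, c=1.0, p=1.0):
    """Mean-semideviation on equally likely outcomes."""
    n = len(outcomes)
    mean = sum(outcomes) / n
    disp = (sum(R(z - mean) ** p for z in outcomes) / n) ** (1.0 / p)
    return mean + c * disp


z = [-1.0, 0.5, 2.0]
t, shift = 2.0, 3.0

# Positive homogeneity fails: rho(t * Z) differs from t * rho(Z).
scaled = rho([t * v for v in z], softplus)
homogeneity_gap = abs(scaled - t * rho(z, softplus))

# Translation equivariance holds: rho(Z + shift) equals rho(Z) + shift.
translation_gap = abs(rho([v + shift for v in z], softplus) - (rho(z, softplus) + shift))
```

For the positive part regularizer the same computation gives a zero homogeneity gap, which is consistent with the remark stating only that homogeneity fails in general.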

3.3 Examples of Mean-Semideviation Models

Before moving on, it would be instructive to discuss some examples of mean-semideviations, highlighting the versatility of this particular class of risk measures. We start from simple, illustrative choices as far as the involved risk regularizer is concerned, and then we generalize.

3.3.1 Mean-Upper-Semideviations

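The simplest, prototypical example of a mean-semideviation risk measure is the mean-upper-semideviation of order $p$ ((Shapiro et al., 2014), Sections 6.2.2 & 6.3.2), which is constructed by choosing as risk regularizer the function

$$\mathcal{R}(x) \triangleq (x)_+ \equiv \max\{x, 0\}, \quad x \in \mathbb{R}, \tag{16}$$

yielding the risk measure

$$\rho(Z) \equiv \mathbb{E}\{Z\} + c\,\big\| (Z - \mathbb{E}\{Z\})_+ \big\|_{\mathcal{L}_p}, \tag{17}$$

for every cost $Z$ under consideration. Of course, in this case, it is trivial to show that $\mathcal{R}$ satisfies all four conditions of Definition 3. Recall that we have assumed that the order of the risk measure and the integrability of the cost are compatible (Proposition 3), such that $\rho$ is a well defined, real-valued functional.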

3.3.2 Entropic Mean-Semideviations

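Our second example is a generalization of the mean-upper-semideviation risk measure discussed in the previous example. Here, the risk regularizer is chosen itself from a parametric family, as

$$\mathcal{R}_t(x) \triangleq \frac{1}{t}\log\big(1 + \exp(t x)\big), \quad x \in \mathbb{R}, \tag{18}$$

where $t > 0$ is a parameter, regulating the sharpness of the function at zero. It is trivial to verify the nonnegativity and monotonicity conditions of Definition 3. Also, for fixed $t > 0$, the first derivative of $\mathcal{R}_t$ relative to $x$ is the logistic function

$$\frac{\mathrm{d}\mathcal{R}_t(x)}{\mathrm{d}x} = \frac{1}{1 + \exp(-t x)} \in (0, 1), \tag{19}$$

showing that $\mathcal{R}_t$ is a nonexpansive map, immediately verifying the fourth condition of Definition 3 (nonexpansiveness), via Proposition 2. Likewise, the second derivative of $\mathcal{R}_t$ is given by

$$\frac{\mathrm{d}^2\mathcal{R}_t(x)}{\mathrm{d}x^2} = \frac{t \exp(-t x)}{\big(1 + \exp(-t x)\big)^2} > 0, \tag{20}$$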

and, thus, (convexity) is readily verified, as well. Hence, is a valid risk regularizer. Alternatively and to illustrate the procedure, we may verify condition directly; for fixed , for every and for every , we may write

(21)

where the inequality is due to the fact that . It is also easy to see that, for every , , showing that constitutes a smooth approximation to the risk regularizer of the mean-upper-semideviation risk measure discussed previously.

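The resulting risk measure is called an entropic mean-semideviation of order $p$, and may be expressed as

$$\rho(Z) \equiv \mathbb{E}\{Z\} + \frac{c}{t}\,\Big\| \log\big(1 + \exp\big(t (Z - \mathbb{E}\{Z\})\big)\big) \Big\|_{\mathcal{L}_p}, \tag{22}$$

for every cost $Z$ under consideration, where $t > 0$ is the sharpness parameter of the risk regularizer and $c$ is the penalty multiplier. For obvious reasons, this risk measure may be considered a soft version of the mean-upper-semideviation risk measure.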

3.3.3 CDF-Antiderivative (CDFA) Mean-Semideviations

We now show that, in fact, both previously presented examples are special cases of a much more general approach, which may be utilized for the construction of risk regularizers. To this end, let $Y$ be an integrable random variable, with cumulative distribution function (cdf) $F_Y$. Consider the choice

$$\mathcal{R}(x) \triangleq \int_{-\infty}^{x} F_Y(y)\,\mathrm{d}y, \quad x \in \mathbb{R}, \tag{23}$$

where, because $F_Y$ is a nonnegative Borel measurable function, the involved integration is always well-defined (might be $+\infty$, though), in the sense of Lebesgue. The particular antiderivative of the cdf $F_Y$, as defined in (23), constitutes a very important quantity in the theory of stochastic dominance; see, for instance, related articles (Ogryczak and Ruszczyński, 1999) and (Ogryczak and Ruszczyński, 2002) for definition and insights. In particular, via Fubini’s Theorem (Theorem 2.6.6 in (Ash and Doléans-Dade, 2000)), $\mathcal{R}$ may be easily shown to admit the alternative integral representation

$$\mathcal{R}(x) \equiv \mathbb{E}\{(x - Y)_+\}, \quad x \in \mathbb{R}. \tag{24}$$

Exploiting the assumption that $Y$ is integrable, it follows that $\mathcal{R}(x) < \infty$, for every $x \in \mathbb{R}$. Also, from (24), it is trivial to see that, because of the structure of the function $(\cdot)_+$, $\mathcal{R}$ is convex, nonnegative and nondecreasing on $\mathbb{R}$. Nonexpansiveness may also be readily verified.

Consequently, is a valid risk regularizer, and the resulting risk measure, called a CDF-Antiderivative (CDFA) mean-semideviation, may be expressed in various forms as

(25)

for , where can be arbitrarily taken to be independent of and denotes the Borel pushforward of .

We may now verify that both mean-upper-semideviation and entropic mean-semideviation risk measures discussed above are special cases of CDFA mean-semideviations. In mean-upper-semideviations, the respective risk regularizer is an antiderivative (taken piecewise) of the cdf corresponding to the Dirac measure at zero. In entropic mean-semideviations, the respective risk regularizer is an antiderivative of (19) (by monotone convergence and via a sequential argument), which is the cdf of a zero-mean element in . In both cases, the antiderivatives involved are of the form of (23).
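The two special cases above can be reproduced numerically by integrating the respective cdfs. The sketch below assumes that the antiderivative in (23) is the integral of the cdf from minus infinity up to the argument, and approximates it with a trapezoidal rule on a truncated interval. The function names, the truncation point and the number of steps are hypothetical choices.

```python
import math


def cdf_antiderivative(cdf, x, lower=-40.0, n=20000):
    """Trapezoidal approximation of the integral of `cdf` over [lower, x]."""
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    total = 0.5 * (cdf(lower) + cdf(x))
    for i in range(1, n):
        total += cdf(lower + i * h)
    return total * h


def dirac_cdf(y):
    # cdf of the Dirac measure at zero.
    return 1.0 if y >= 0.0 else 0.0


def logistic_cdf(y):
    # cdf of the standard (zero-mean) logistic distribution.
    return 1.0 / (1.0 + math.exp(-y))


x = 1.0
from_dirac = cdf_antiderivative(dirac_cdf, x)        # close to max(x, 0)
from_logistic = cdf_antiderivative(logistic_cdf, x)  # close to log(1 + exp(x))
```

The truncation at -40 is harmless for these two cdfs because both are negligible there; a heavier left tail would require a smaller lower limit.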

3.3.3.1 Special Case: Gaussian Antiderivative (GA) Mean-Semideviations

An interesting subclass of CDFA mean-semideviations is the one resulting from taking antiderivatives of the cdf of a standard Gaussian random variable $Y \sim \mathcal{N}(0,1)$. In this case, the simplest possible risk regularizer may be constructed as

$$\mathcal{R}(x) \triangleq \int_{-\infty}^{x} \Phi(y)\,\mathrm{d}y \equiv x\,\Phi(x) + \varphi(x), \quad x \in \mathbb{R}, \tag{26}$$

where $\Phi$ and $\varphi$ denote the standard Gaussian cdf and density, respectively. This particular antiderivative of $\Phi$ appears naturally in standard treatments of the so-called ranking-&-selection, or best arm identification problem and, more specifically, in lookahead selection policies, such as the Knowledge Gradient and the Expected Improvement (Frazier et al., 2008; Ryzhov, 2016).

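The resulting mean-semideviation risk measure is called a Gaussian Antiderivative (GA) mean-semideviation of order $p$, and may be expressed as

$$\rho(Z) \equiv \mathbb{E}\{Z\} + c\,\big\| (Z - \mathbb{E}\{Z\})\,\Phi(Z - \mathbb{E}\{Z\}) + \varphi(Z - \mathbb{E}\{Z\}) \big\|_{\mathcal{L}_p}, \tag{27}$$

for every cost $Z$ under consideration, where $\Phi$ and $\varphi$ denote the standard Gaussian cdf and density, respectively. Of course, as it happens for all mean-semideviations, the functional $\rho$, as defined in (27), is a convex risk measure for every $c \ge 0$, and a convex-monotone risk measure, if $c \in [0,1]$.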
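The Gaussian antiderivative regularizer is easy to evaluate with the standard library. The sketch below assumes the closed form x Φ(x) + φ(x), where Φ and φ are the standard Gaussian cdf and density. The function names are hypothetical, and the cdf is computed through the complementary error function to retain accuracy in the left tail.

```python
import math


def gauss_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gauss_cdf(x):
    # Phi(x) = erfc(-x / sqrt(2)) / 2, accurate for large negative x.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def ga_regularizer(x):
    """Antiderivative of the standard Gaussian cdf that vanishes at minus infinity."""
    return x * gauss_cdf(x) + gauss_pdf(x)


# At zero the regularizer equals the Gaussian density at zero, 1 / sqrt(2 * pi).
at_zero = ga_regularizer(0.0)
```

Differentiating the closed form gives back Φ, since the derivative of φ is -x φ, which cancels the term x φ produced by the product rule. The derivative therefore lies in (0, 1), in agreement with nonexpansiveness.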

3.4 A Complete Characterization of Mean-Semideviations

As a result of the discussion in Section 3.3.3 above, it follows that risk regularizers may be formed by taking antiderivatives of the cdf of any integrable random variable of choice, resulting in a vast variety of mean-semideviation risk measures, all sharing a common favorable structure.

Here, we show that if we start from a given risk regularizer , the converse statement is also true. In this respect, we state and prove the following important result.

Theorem 2.

(CDF-Based Representation of Risk Regularizers) Let be a random variable, such that, for every , , and let denote its cdf. Then, for any fixed and , the function defined as

(28)

is a valid risk regularizer, where integration may be interpreted either in the improper Riemann sense (for computation), or in the standard sense of Lebesgue (for derivation).

Conversely, let be any risk regularizer. Then, there exist some random variable , satisfying , for all , with cdf , and constants and , such that, for every , the representation (28) is valid. In particular, if denotes the right derivative of , it is always true that , , and, as long as is nonconstant, it holds that , and is given by for all .

Proof of Theorem 2.

See Section 7.1 (Appendix). ∎

Theorem 2 is important for two main reasons, the first being related to the forward statement, and the second to the converse. On the one hand, Theorem 2 provides us with the clean, very versatile and analytically friendly integral formula (28) for constructing risk regularizers of various shapes and types. On the other hand, it informs us that, necessarily, any risk regularizer can be expressed in the form of (28) and, as a result, all possible risk regularizers may be constructed utilizing (28), each time for some suitably chosen cdf. Therefore, risk regularizers are completely characterized by the cdf-based representation of Theorem 2.

Of course, every risk regularizer induces a unique mean-semideviation risk measure. But also notice that, trivially, every mean-semideviation risk measure corresponds to a uniquely specified risk regularizer (as a functional, or when all costs in the corresponding -space -the largest such space, for the smallest possible - are considered). Therefore, Theorem 2 provides a complete characterization of the whole class of mean-semideviation risk measures. In particular, Theorem 2 implies that the class of all mean-semideviation risk measures is almost in one-to-one correspondence with the class of cdfs of all integrable -valued random elements. The “almost” in the preceding statement is due to the presence of constants and in Theorem 2, and that actually slightly less is required than (absolute) integrability of the involved random variable .

3.5 Practical Illustration of Mean-Semideviation Models

We conclude this section by briefly outlining the relevance of mean-semideviation models in applications, also putting our proposed risk regularizers in context. More specifically, we consider a chance-constrained version of the prototypical, single-product newsvendor problem (see, for instance, Chapter 1 in (Shapiro et al., 2014)), on which we build in order to formulate a doubly risk-averse newsvendor problem, which jointly controls both unmet demand and holding costs. We also explicitly demonstrate how the respective risk regularizer may be potentially designed, based on the characteristics of the particular problem under consideration.

Although the single-product newsvendor problem (and its variations) indeed constitutes a one-dimensional, toy example, it provides insights and highlights some important features of the mean-semideviation risk measures advocated herein. Additionally, the simplicity of such a problem facilitates numerical solution, and enables us to present some numerical results, verifying the effectiveness of the proposed mean-semideviation risk measures experimentally, as well.

3.5.1 A Chance-Constrained Single-Product NewsVendor

Suppose that a newsvendor is interested in optimally producing newspapers for an uncertain market, so that they minimize the cost incurred by actual production and by not meeting market demand, while respecting their holding capacity, or a predefined holding cost target. Let , and be known constants, standing for the production, unmet demand and holding costs per production unit. Also let be the random market demand, a random variable with cdf , for simplicity assumed to be absolutely continuous relative to the Lebesgue measure on . Since the market is uncertain, the newsvendor resorts to stochastically deciding their production plan by solving the chance-constrained program

(29)

where, also for simplicity, we assume that the decision variable is real-valued, and where constitutes the newsvendor’s tolerance in the event that their holding cost will exceed a prescribed threshold . Both and are fixed design parameters decided by the newsvendor beforehand. Chance-constrained newsvendor problems similar to (29) have been previously considered in the literature; see, for instance, the related article (Zhang et al., 2009). Here, an important detail is that, despite the probabilistic constraint, problem (29) is risk-neutral as far as treatment of unmet demand is concerned. This is because only the expectation of the cost of not meeting the demand, corresponding to , is considered in the objective.

Problem (29) exhibits some interesting features and may be significantly simplified, as follows. First, we may observe that, for every fixed choice of ,

(30)

Consequently, it is true that

(31)
(32)

where, due to the cdf of the demand being continuous, its pseudo-inverse, or quantile function, is defined as

(33)

Thus, problem (29) is convex and may be reformulated as

(34)

Hereafter, without loss of generality, we may assume that and are chosen such that . Otherwise, the problem is trivially solved at . To be fully compatible with the generic notation utilized in this paper, we may also define and .
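The reduction of the chance constraint to a simple upper bound on the production level can be illustrated on a case with closed-form expressions. The sketch below rests on several assumptions that are not confirmed by the text as extracted: the objective is taken to be a·x + b·E{(W − x)_+}, the constraint is taken to be P(h·(x − W)_+ > τ) ≤ α with τ > 0, and the demand W is taken to be exponential with rate `lam`. All symbol and function names are hypothetical.

```python
import math


def exp_quantile(u, lam):
    """Quantile function of the exponential distribution with rate lam."""
    return -math.log1p(-u) / lam


def feasible_upper_bound(alpha, tau, h, lam):
    # P(h * (x - W)_+ > tau) <= alpha  <=>  F_W(x - tau / h) <= alpha
    #                                  <=>  x <= quantile(alpha) + tau / h.
    return exp_quantile(alpha, lam) + tau / h


def solve_newsvendor(a, b, h, alpha, tau, lam):
    """Minimize a * x + b * exp(-lam * x) / lam over 0 <= x <= x_max."""
    x_max = feasible_upper_bound(alpha, tau, h, lam)
    if b <= a:
        # Derivative a - b * exp(-lam * x) is nonnegative on x >= 0.
        return 0.0
    x_free = math.log(b / a) / lam   # stationary point of the convex objective
    return min(x_free, x_max)


# Tight tolerance: the chance constraint is active and caps production.
x_capped = solve_newsvendor(a=1.0, b=3.0, h=1.0, alpha=0.1, tau=0.5, lam=1.0)
```

With these assumed forms, producing nothing is optimal whenever the unmet demand cost does not exceed the production cost, and otherwise the solution is the unconstrained minimizer clipped to the feasible interval.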

Next, let us consider the derivative of the objective of (34), relative to . We have

(35)

Hence, unless , it readily follows that, for every , , again implying that the choice constitutes a solution of (34); in other words, producing nothing is always optimal whenever . On the other hand, it is apparently true that , for all , if and only if

(36)

implying that the condition

(37)

is sufficient to ensure negativity of