1 Introduction
During the last almost twenty years, many significant advances have been made in the now relatively mature area of risk-averse modeling and optimization. These primarily include the fundamental axiomatization and theoretical characterization of risk functionals, also commonly known as risk measures (Kijima and Ohnishi, 1993; Rockafellar and Uryasev, 1997; Artzner et al., 1999; Ogryczak and Ruszczyński, 1999, 2002; Rockafellar and Uryasev, 2002; Rockafellar et al., 2003, 2006; Ruszczyński and Shapiro, 2006b; Shapiro et al., 2014), as well as extensive analysis in the context of risk-averse stochastic programs in both static and sequential decision making problem settings (Rockafellar and Uryasev, 1997; Föllmer and Schied, 2002; Rockafellar et al., 2003, 2006; Ruszczyński and Shapiro, 2006a; Collado et al., 2012; Çavuş and Ruszczyński, 2014a; Asamov and Ruszczyński, 2015; Dentcheva and Ruszczyński, 2017; Grechuk and Zabarankin, 2017; Shapiro, 2017; Fan and Ruszczyński, 2018). The importance of building a well structured theory of risk is motivated by its natural and intuitive relevance to problems from a large variety of applied domains. Arguably the oldest, archetypical application of risk is in Finance (Kijima and Ohnishi, 1993; Rockafellar and Uryasev, 1997; Andersson et al., 2001; Krokhmal et al., 2001; Chen and Wang, 2008; Shang et al., 2018), which has decisively driven pioneering research in risk-averse modeling and optimization, from its very birth, probably dating back to the work of Markowitz (Markowitz, 1952), to present. Other applications of risk may be found in both classical and contemporary domains such as Energy (Moazeni et al., 2015; Bruno et al., 2016; Jiang and Powell, 2016), Wireless Networks (Ma et al., 2018), Inventory Optimization (Ahmed et al., 2007; Chen et al., 2007; Xinsheng et al., 2015) and Supply Chain Management (Gan et al., 2004; Sawik, 2016), to name a few.
Most recently, the development of effective computational methods for applying risk-averse optimization to actual problems has also been attracting considerable attention; see, e.g., (Ruszczyński, 2010; Çavuş and Ruszczyński, 2014b; Moazeni et al., 2017; Tamar et al., 2017; Dentcheva et al., 2017; Huang and Haskell, 2018; Jiang and Powell, 2017; Yu et al., 2018). This line of work can be divided between sequential settings (Çavuş and Ruszczyński, 2014b; Moazeni et al., 2017; Tamar et al., 2017; Huang and Haskell, 2018; Jiang and Powell, 2017; Yu et al., 2018), and static settings (Tamar et al., 2017; Dentcheva et al., 2017), for a variety of different problem characteristics. Computational recipes also vary. For instance, (Ruszczyński, 2010) and (Çavuş and Ruszczyński, 2014b) develop and analyze variations of the well known value and policy iteration algorithms of risk-neutral dynamic programming; (Moazeni et al., 2017) proposes a method for risk-averse nonstationary direct parametric policy search for finite horizon problems; (Tamar et al., 2017), (Dentcheva et al., 2017) and (Yu et al., 2018) rely on the so-called Sample Average Approximation (SAA) approach (Shapiro et al., 2014), where an appropriately constructed empirical estimate of the original objective is used as a surrogate to that of the original stochastic program, assuming existence of a sufficiently large sample of the processes introducing uncertainty into the corresponding risk-averse objective; (Huang and Haskell, 2018) and (Jiang and Powell, 2017) consider an Approximate Dynamic Programming (ADP) (Powell, 2011) approach, where sequential finite state/action risk-averse stochastic programs are tackled via stochastic approximation (Kushner and Yin, 2003).
Following this recent trend, this paper proposes and rigorously analyzes recursive stochastic subgradient methods for an important class of static, convex risk-averse stochastic programs. In a nutshell, we make the following contributions:

Following the Mean-Risk Model paradigm (Shapiro et al., 2014), we introduce a new class of convex risk measures, called mean-semideviations. These strictly generalize the well known mean-upper-semideviation risk measure, and are constructed by replacing the positive part weighting function of the latter by another nonlinear map, termed here as a risk regularizer, obeying certain properties. Mean-semideviations share the same core analytical structure with the mean-upper-semideviation risk measure; however, they are much more versatile in applications. We study mean-semideviations in terms of their basic properties, and we present a fundamental constructive characterization result, demonstrating their generality. Specifically, we show that the class of all mean-semideviation risk measures is almost in one-to-one correspondence with the class of cumulative distribution functions (cdfs) of all integrable random variables. This result provides an analytical device for constructing mean-semideviations with desirable characteristics, starting from any cdf of the aforementioned type. The flexibility and effectiveness of mean-semideviations are explicitly demonstrated on a classical, chance-constrained newsvendor model, as well.
We introduce the $\mathit{MESSAGE}^{p}$ (MEan-Semideviation Stochastic compositionAl subGradient dEscent of order $p$) algorithm, an efficient, data-driven Stochastic Subgradient Descent (SSD) type procedure for iteratively solving convex mean-semideviation risk-averse problems to optimality. The algorithm constitutes a parallel variation of the general purpose $T$-level Stochastic Compositional Gradient Descent (SCGD) algorithm, recently developed in (Yang et al., 2018), under a generic theoretical framework. Although risk-averse optimization is listed in (Yang et al., 2018) as a potential application of stochastic compositional optimization for the mere case of mean-upper-semideviations, this work is the first to propose a general algorithm, applicable to any mean-semideviation model of choice.

We analyze the asymptotic behavior of the algorithm under a new, flexible and structure-exploiting set of problem assumptions, which reveals a well-defined tradeoff between the expansiveness of the random cost and the smoothness of the mean-semideviation risk measure under consideration. In particular, under our proposed structural framework:

Under appropriate stepsize rules, we establish pathwise convergence of the algorithm in a strong technical sense, confirming its asymptotic consistency.

Assuming a strongly convex cost function, the convergence rate of the algorithm is studied in detail. More specifically, we show that, for fixed semideviation order and for , the algorithm achieves a squared solution suboptimality rate of the order of iterations, where, for , pathwise convergence is simultaneously guaranteed. Thus, this new result establishes a rate of order arbitrarily close to , also ensuring strongly stable pathwise operation of the algorithm. In the simpler case where the semideviation order is chosen as , the rate order of the proposed algorithm improves to , which is sufficient for pathwise convergence as well, and matches previous results in the related literature (Wang et al., 2017).

For the general case of a convex cost, we show that, for any , the algorithm with iterate smoothing achieves an objective suboptimality rate of the order of . As in the strongly convex case, for , pathwise convergence is also simultaneously guaranteed. For , this result provides maximal rates of , if , and , if , matching the state of the art, as well.


We discuss the superiority of the proposed framework for convergence, as compared to that employed earlier in (Yang et al., 2018), within the risk-averse context under consideration. By performing careful analysis and by constructing nontrivial counterexamples, we explicitly demonstrate that the class of mean-semideviation problems supported herein is strictly larger than the respective class of problems supported in (Yang et al., 2018). As a result, this paper establishes the applicability of compositional stochastic optimization for a significantly and strictly wider spectrum of convex mean-semideviation risk-averse problems, as compared to the state of the art. This fact justifies the purpose of our work from this perspective, as well.
Our contributions, briefly outlined above, are now discussed in greater detail. We also briefly explain how our work relates to and is placed within the existing literature.
1.1 Mean-Semideviation Risk Measures
Mean-semideviation risk measures, as proposed and developed in this work, constitute a new class of risk measures where, given a random cost, the corresponding dispersion measure (the term penalizing the “mean” part of a mean-risk functional) is defined as the norm of a nonlinear, one-dimensional map of the centered cost, or, in other words, its central deviation. This map is called a risk regularizer, and possesses certain analytical properties: convexity, nonnegativity, monotonicity and nonexpansiveness. Dispersion measures with this structure are suggestively called generalized semideviations.
This terminology originates from the presence of the positive part function $(\cdot)_+$, which is the simplest, prototypical example of a risk regularizer, in the corresponding dispersion measure of the well known mean-upper-semideviation risk measure (Shapiro et al., 2014), i.e., the upper (central) semideviation. Mean-semideviations are much more versatile, however, since different choices for the involved risk regularizer correspond to different rules for ranking the relative effect of both riskier (higher than the mean) and less risky (lower than the mean) events, corresponding to specific regions in the range of the (centered) cost. As a result, the choice of the risk regularizer affects the general quality and the roughness/stability of an optimal random cost, in a decision making setting. Consequently, owing to their versatility, mean-semideviations are practically appealing as well, because they are parametrizable and they may incorporate domain specific knowledge more easily than the rigid mean-upper-semideviation.
In this work, after we formulate simple conditions for the existence of mean-semideviation risk measures, we study their basic geometric properties, such as convexity and monotonicity. Contrary to the mean-upper-semideviation alone, mean-semideviations are not coherent risk measures, in general (as a class), because they do not satisfy positive homogeneity (Shapiro et al., 2014). This is due to the potential nonhomogeneity of the risk regularizer involved. They do satisfy convexity, monotonicity and translation equivariance, though, and, therefore, they belong to the class of convex risk measures (Föllmer and Schied, 2002; Shapiro et al., 2014), and that of convex-monotone risk measures, as well.
Further, we present a fundamental constructive characterization result, demonstrating the generality of mean-semideviations. Specifically, on the one hand, this result shows that the class of all mean-semideviation risk measures is almost in one-to-one correspondence with the class of cdfs of all integrable random variables (on the line). On the other, it provides an analytical device for constructing such risk measures from any cdf of the aforementioned type. Although not studied in this paper, this correspondence between mean-semideviations and cdfs might be of interest in other areas related to stochastically robust optimization such as stochastic dominance; see, for instance, the seminal articles (Ogryczak and Ruszczyński, 1999, 2002) for some interesting connections.
Our discussion on mean-semideviation risk measures is concluded by a demonstration of their practical usefulness and flexibility on a classical, chance-constrained newsvendor model. After we briefly analyze the structure of the problem under consideration, we put risk regularizers (each inducing a mean-semideviation risk measure) in context, and we explicitly discuss their construction, so that the resulting mean-semideviation risk measure best reflects problem characteristics, and the objectives of the decision maker. Additionally, we present numerical simulations, experimentally confirming the effectiveness of the proposed risk-averse approach. Our simulations also reveal some interesting features of the resulting risk-averse solutions, which we further discuss.
Relation to the Literature: We are not the first to propose convex risk measures featuring nonlinear weighting functions; see, for instance, (Kijima and Ohnishi, 1993; Chen and Yang, 2011; Fu et al., 2017). In particular, the recent article (Fu et al., 2017) considers risk measures defined as a nonlinearly weighted (lower) semideviation of a given order from a fixed target (see, for instance, Example 6.25 in (Shapiro et al., 2014)), focusing mainly on their applications to a portfolio selection model. In (Fu et al., 2017), the corresponding weighting function shares the same properties as a risk regularizer (see above), except for nonexpansiveness. However, our proposed mean-semideviation risk measures are substantially different and structurally more complex compared to the risk measures proposed in (Fu et al., 2017). The main reason is the presence of the expected cost, rather than a fixed target, in the definition of mean-semideviations; for more details, compare ((Fu et al., 2017), Definition 1) with Section 3 herein.
1.2 Recursive Optimization of Mean-Semideviations
The main contribution of this work concerns efficient optimization of mean-semideviations, measuring convexly parameterized random cost functions, over a closed and convex set. We introduce and rigorously analyze the $\mathit{MESSAGE}^{p}$ (MEan-Semideviation Stochastic compositionAl subGradient dEscent of order $p$) algorithm (Algorithm 1 in Section 4.3), which constitutes an efficient Stochastic Subgradient Descent (SSD) type procedure for iteratively solving our base problem to optimality. The algorithm may be seen as a parameterized (relative to the choice of the risk regularizer), parallel variation of the general purpose $T$-Level Stochastic Compositional Gradient Descent (SCGD) algorithm, presented and analyzed very recently in (Yang et al., 2018) under generic assumptions. In turn, the $T$-Level SCGD algorithm is a natural generalization of the Basic 2-Level SCGD algorithm, presented and analyzed earlier in (Wang et al., 2017). A key feature of the aforementioned compositional stochastic subgradient schemes is the existence of more than one ($T$, in general), pairwise coupled stochastic approximation updates, or levels, each with a dedicated stepsize, which are executed concurrently through the operation of the algorithm. In the case of the algorithm proposed here, there exist three such levels (that is, $T = 3$), and this results naturally, due to our specific problem structure. However, contrary to the $T$-Level SCGD algorithm, all three stochastic approximation levels of the proposed algorithm are executed completely in parallel within every iteration, presenting additional operational efficiency, potentially important in various applications.
Pathwise convergence and convergence rate analyses of the $T$-Level SCGD algorithm are presented in (Yang et al., 2018), and (Wang et al., 2017) (where, in the latter, $T = 2$). However, the respective structural framework considered in both (Yang et al., 2018) and (Wang et al., 2017), when applied to the problem class considered in this work, imposes significant restrictions in regard to the possible choice of the risk regularizer, partially related to the expansiveness and smoothness (or roughness) of the involved random cost function. This fact significantly limits the type of problems the SCGD algorithm is provably applicable to, at least within the class of risk-averse problems introduced and studied herein. For example, when $p = 1$, arguably the most popular regularizer $(\cdot)_+$, leading to the mean-upper-semideviation risk measure, is not supported within the framework of (Wang et al., 2017; Yang et al., 2018). This is because nonsmooth risk regularizers exhibiting corner points, such as $(\cdot)_+$, have discontinuous subderivatives, whereas the respective assumptions made in (Wang et al., 2017; Yang et al., 2018) essentially require the respective risk regularizer to be not only everywhere differentiable, but to have Lipschitz derivatives, as well. This shortcoming of the theoretical framework of (Wang et al., 2017; Yang et al., 2018) naturally carries over to higher values of the semideviation order, $p > 1$. Naturally, the theoretical narrowness of (Wang et al., 2017; Yang et al., 2018) motivates a closer study of compositional subgradient algorithms that exploit the special characteristics of a mean-semideviation risk measure. The ultimate goal is the development of a sufficiently general theoretical framework, which will justify the compositional optimization approach for the whole class of mean-semideviation risk measures, under as weak structural assumptions as possible.
Following this direction, and focusing on optimizing mean-semideviation models, we present a new and flexible set of problem assumptions, substantially weaker than those employed in (Wang et al., 2017; Yang et al., 2018), under which we analyze the asymptotic behavior of the algorithm proposed in our work. Our framework carefully exploits the structure of mean-semideviations, and presents a probably fundamental, though practically useful, tradeoff between the expansiveness of the random cost function and the smoothness of the chosen risk regularizer, in a very well-defined sense. As previously outlined, our results are restated, as follows.
First, under appropriate stepsize rules, we establish pathwise convergence of the algorithm in the same strong sense as in (Wang et al., 2017; Yang et al., 2018), thus confirming its asymptotic consistency.
Second, assuming a strongly convex cost function, we study the convergence rate of the algorithm, in detail. More specifically, we show that, for fixed semideviation order and for any choice of , the algorithm achieves a squared solution suboptimality rate of the order of iterations. Here, is a userspecified parameter, which directly affects stepsize selection. If, additionally, is chosen to be strictly positive, that is, for , pathwise convergence is simultaneously guaranteed. This completely novel result establishes a convergence rate of order arbitrarily close to as , while ensuring strongly stable pathwise operation of the algorithm. In the structurally simpler case where , the rate order improves to , which is sufficient for pathwise convergence as well, and matches existing results in compositional stochastic optimization, developed earlier along the lines of (Wang et al., 2017).
Third, for the general case of a convex cost function, we show that, for any , the algorithm with iterate smoothing achieves an objective suboptimality rate of the order of . As in the strongly convex case, for , pathwise convergence is also simultaneously guaranteed. For , this result provides maximal rates of , if , and , if , matching the state of the art, as well (Wang et al., 2017; Yang et al., 2018). Although those rates may not be particularly satisfying, they quantitatively demonstrate the remarkable speedup achieved by assuming and leveraging strong convexity for the analysis and operation of the algorithm.
The proposed structural framework adequately mitigates the aforementioned technical issues of that considered in (Wang et al., 2017; Yang et al., 2018). For example, we show that, when the random cost function has bounded (random) subgradients and its distribution is generally well-behaved, the choice of the risk regularizer can be completely unconstrained, regardless of the value of the semideviation order $p$. As a result, under the new framework, the most popular candidate $(\cdot)_+$, but also every risk regularizer exhibiting corner points, are now valid choices (under appropriate conditions) for any order $p$, contrary to (Wang et al., 2017; Yang et al., 2018).
Finally, in order to show the superiority of our proposed framework compared to that of (Wang et al., 2017; Yang et al., 2018), we present a detailed analytical comparison, which rigorously demonstrates that the class of mean-semideviation programs supported within this work contains the respective class of problems supported within (Wang et al., 2017; Yang et al., 2018); further, the inclusion is strict. Such comparison is made possible by performing careful analysis and by constructing nontrivial, non-corner-case counterexamples. As a result, the applicability of compositional stochastic optimization is established herein for a significantly and strictly wider spectrum of convex mean-semideviation risk-averse problems, as compared to the state of the art. This fact justifies the purpose of our work from this perspective, in addition to our algorithmic contribution, as well.
Relation to the Literature: Clearly, the results presented in this work are related to those developed in (Wang et al., 2017; Yang et al., 2018), for a generic problem setting. Indeed, as already stated, optimization of mean-upper-semideviation risk measures has been briefly identified in (Wang et al., 2017; Yang et al., 2018) as a potential application of the compositional algorithms proposed therein. However, as mentioned above, the assumptions on problem structure employed in (Wang et al., 2017; Yang et al., 2018) are too restrictive to adequately study the class of mean-semideviation risk measures introduced herein, which includes the mean-upper-semideviation as a single member. Apart from the aforementioned works, and as also discussed above, there is a significant line of research considering the SAA approach to risk-averse stochastic optimization, both from a fundamental, theoretical perspective (Shapiro, 2013; Guigues et al., 2016; Dentcheva et al., 2017) and from the computational one (Dentcheva et al., 2017; Tamar et al., 2017). As noted in (Wang et al., 2017; Yang et al., 2018), the compositional, SSD-type optimization algorithms analyzed in this paper present some major natural advantages over the SAA approach. First, the algorithm solves the original risk-averse stochastic program asymptotically to optimality, whereas, in the SAA approach, the corresponding SAA surrogate to the original program is solved, producing only an approximate solution; as the number of samples increases, the solution to the SAA surrogate approaches that of the original stochastic program, in some well defined sense (Shapiro, 2013; Dentcheva et al., 2017). Second, by its nature, the SAA approach cannot exploit new information as it becomes available to decision makers, so that they can improve their decisions based on those made so far; in fact, the SAA surrogate needs to be redefined using the new available information, and then solved afresh. In contrast, the algorithm efficiently exploits new information, due to its recursive, sequential nature. Third, as a result of the above, SAAs are not suitable for settings where information is available sequentially, and decisions have to be made adaptively over time. Fourth, SAAs might often require a very large number of samples for producing accurate approximations to the optimal decisions corresponding to the original problem, and this might result in optimization problems whose objective is computationally difficult to evaluate. For more details on this, see (Wang et al., 2017). On the contrary, the algorithm is iterative in nature, and presents minimal and fixed time and space complexity per iteration.
Organization of the Paper
The rest of the paper is organized as follows. Section 2 establishes the stochastic risk-averse convex programming setting under study, and provides some elementary, albeit necessary preliminaries on the theory of risk measures. In Section 3, we constructively introduce the class of mean-semideviation risk measures, we study their existence and their structural properties, we discuss specific examples, and we develop our above mentioned fundamental characterization result. Section 4 is devoted to the development and analysis of the algorithm, under our proposed theoretical framework for convergence, and includes the rigorous comparison of our results with those presented in (Yang et al., 2018). Finally, Section 5 concludes the paper.
Note: Some longer proofs of the theoretical results presented in the paper in the form of Theorems, Lemmata and Propositions are excluded from the main body of the paper for clarity in the exposition, and are presented in Section 7 (Appendix).
Notation & Definitions
Matrices and vectors will be denoted by boldface uppercase and boldface lowercase letters, respectively. Calligraphic letters and formal script letters will generally denote sets and σ-algebras, respectively, except for clearly specified exceptions. The operator will denote vector transposition. The norm of is , for all . Similarly, the norm of an appropriately measurable function will be for , and , where the reference measure will be clearly specified by the context. The finite dimensional identity operator will be denoted as . Additionally, we define , and , for . If denotes a base sample space and (referring directly to ), then, for the sake of clarity, we sometimes drop dependence on , and write simply (clear by the context).
For every set , which is nonempty, closed and convex, the Euclidean projection onto , is defined, as usual, as , for all . Euclidean projections, as defined above, always exist and are nonexpansive operators.
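As a small illustration of the projection operator just described, the following Python sketch projects onto a box, one of the simplest closed convex sets, and lets the reader check nonexpansiveness numerically. The box-shaped set and the function names `project_onto_box` and `euclidean_distance` are assumptions made for this example only; they do not appear in the paper.

```python
import math


def project_onto_box(x, lower, upper):
    """Euclidean projection of the vector x onto the box [lower, upper].

    For a box, the projection decouples across coordinates and reduces
    to clipping each entry to its own interval.
    """
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Two arbitrary points and a box chosen only for illustration.
a = [2.0, -3.0, 0.5]
b = [0.2, 4.0, -0.7]
lower = [-1.0, -1.0, -1.0]
upper = [1.0, 1.0, 1.0]

pa = project_onto_box(a, lower, upper)
pb = project_onto_box(b, lower, upper)

# Nonexpansiveness: projecting cannot increase the distance between points.
is_nonexpansive_here = euclidean_distance(pa, pb) <= euclidean_distance(a, b)
```

For general closed convex sets the projection has no closed form and must itself be computed by solving a small optimization problem; the box case is used here because it can be written in one line.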
For every realvalued function , which is differentiable at a point , the vector denotes its gradient at . If, additionally, is differentiable on , the function denotes its gradient function, mapping each to .
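If $f:\mathbb{R}^N \to \mathbb{R}$ is nonsmooth and convex, its subdifferential is the closed-valued multifunction $\partial f$, defined, for every $x \in \mathbb{R}^N$, as the set of all gradients each corresponding to a linear underestimator of $f$, or, in other words,
(1) $\partial f(x) \triangleq \{ g \in \mathbb{R}^N : f(y) \ge f(x) + g^T (y - x), \text{ for all } y \in \mathbb{R}^N \}$.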
A subgradient (function) of , suggestively denoted as , is defined as any selection of the subdifferential multifunction , that is, for every , it is true that ; for brevity, we write . For fixed , will be called a subgradient of at .
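The defining inequality of a subgradient can be checked numerically on a simple nonsmooth convex function. The sketch below uses the scalar positive part function and one particular selection of its subdifferential; the function names and the test grid are hypothetical choices made for this illustration, not notation from the paper.

```python
def positive_part(x):
    """The positive part function, a convex function with a corner at 0."""
    return max(x, 0.0)


def positive_part_subgradient(x):
    """One valid selection of the subdifferential of the positive part.

    At the corner x = 0 any value in [0, 1] is a subgradient; this
    selection picks 0 there.
    """
    return 1.0 if x > 0 else 0.0


def satisfies_subgradient_inequality(f, g, points, tol=1e-12):
    """Check f(y) >= f(x) + g(x) * (y - x) for all pairs of test points."""
    return all(f(y) >= f(x) + g(x) * (y - x) - tol
               for x in points for y in points)


points = [-2.0, -1.0, 0.0, 1.0, 2.0]
valid = satisfies_subgradient_inequality(
    positive_part, positive_part_subgradient, points)
# A constant slope of 1 is not a subgradient on the flat part (x < 0).
invalid = satisfies_subgradient_inequality(
    positive_part, lambda x: 1.0, points)
```

A finite grid can only refute a candidate selection, never prove it correct; the check is meant as a sanity test of the definition rather than a verification method.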
2 Problem Setting & Preliminaries
We now formally introduce the problem of interest in this work. Henceforth, all subsequent probabilistic statements will presume the existence of a common probability space . We refer to as the base space. We place no topological restrictions on the sample space . However, in order for some mild technicalities to be easily resolved, we conveniently assume that constitutes a complete measure space.
Let be a bivariate realvalued mapping, such that, for every , the function is measurable and, for every , the path is (realvalued) convex (and subdifferentiable). Also, for a given measurable (in general) random element , consider the composite function , defined as
(2) 
It easily follows that, for every , the function is an measurable (in general), realvalued random variable. We additionally assume that, for every , belongs to the Lebesgue space for some fixed choice of , relative to the base measure , that is, . Of course, if, for every , , where is the Borel pushforward of , then , as well. Hereafter, will be referred to as a random cost function.
With the term risk measure, we refer to some fixed and known realvalued functional on the Banach space (Shapiro et al., 2014). Among all risk measures on , we pay special attention to those exhibiting the following basic structural characteristics.
Definition 1.
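(Convex-Monotone Risk Measures) A real-valued functional on the Lebesgue space $\mathcal{Z}_q$ of random costs, $\rho: \mathcal{Z}_q \to \mathbb{R}$, is called a convex-monotone risk measure, if and only if it satisfies the following conditions:

(Convexity): For every $Z_1 \in \mathcal{Z}_q$ and $Z_2 \in \mathcal{Z}_q$, it is true that
(3) $\rho(\alpha Z_1 + (1 - \alpha) Z_2) \le \alpha \rho(Z_1) + (1 - \alpha) \rho(Z_2)$, for all $\alpha \in [0, 1]$.

(Monotonicity): For every $Z_1 \in \mathcal{Z}_q$ and $Z_2 \in \mathcal{Z}_q$, such that $Z_1(\omega) \le Z_2(\omega)$, for almost all $\omega \in \Omega$, it is true that $\rho(Z_1) \le \rho(Z_2)$.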
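For a possibly convex-monotone risk measure $\rho$ (following Definition 1), we will be interested in the “static” stochastic program
(4) $\inf_{x \in \mathcal{X}} \rho(F(x, W))$,
where $F(\cdot, W)$ is the random cost function and the set of feasible decisions $\mathcal{X}$ is assumed to be closed and convex.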
Under the standard problem setting outlined above, it is straightforward to formulate the following elementary result, provided here without proof, and for completeness.
Proposition 1.
(Convexity of Risk-Function Compositions (Shapiro et al., 2014)) Consider a real-valued random function $F(\cdot, W)$, as well as a real-valued risk measure $\rho$. Suppose that, for every $\omega \in \Omega$, $F(\cdot, W(\omega))$ is convex and that $\rho$ is convex-monotone. Then, the real-valued composite function $\rho(F(\cdot, W))$ is convex.
Proposition 1 shows that, under the respective assumptions, (4) constitutes a convex mathematical program in standard form. Thus, application of a subgradient method would require that some selection of the subdifferential multifunction of the objective $\rho(F(\cdot, W))$ can be evaluated at will, at any feasible decision. However, for most choices of the random cost function $F(\cdot, W)$ and of the risk measure $\rho$, even the composition $\rho(F(\cdot, W))$ cannot be evaluated exactly, let alone (a selection of) its subdifferential. Instead, we may be given either realizations of the random exogenous information $W$, or direct evaluations of the random cost and one of its subgradients, at some test decision candidate. It might also be desirable that decision making is performed sequentially over time, where decisions are updated adaptively as new information arrives. Such settings motivate the consideration of SSD-type algorithms for solving (4), which are of main interest in this paper.
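To make the notion of an SSD-type algorithm concrete, here is a minimal Python sketch of projected stochastic subgradient descent for a plain risk-neutral problem, namely minimizing the expected absolute deviation of a scalar decision from a Gaussian sample. This is not the compositional algorithm proposed in the paper: the cost, the stepsize rule, the feasible interval and all function names are assumptions chosen only to show how decisions are updated one sample at a time.

```python
import math
import random


def project_onto_interval(x, lo, hi):
    """Euclidean projection of a scalar onto the closed interval [lo, hi]."""
    return min(max(x, lo), hi)


def projected_ssd(sample, subgradient, x0, lo, hi, num_iters):
    """Projected stochastic subgradient descent on a scalar decision.

    sample():          draws one realization w of the random information
    subgradient(x, w): a subgradient of the convex cost F(., w) at x
    """
    x = x0
    for n in range(1, num_iters + 1):
        w = sample()
        step = 1.0 / math.sqrt(n)  # diminishing stepsize (an assumed choice)
        x = project_onto_interval(x - step * subgradient(x, w), lo, hi)
    return x


def abs_cost_subgradient(x, w):
    """A subgradient of F(x, w) = |x - w| with respect to x."""
    if x > w:
        return 1.0
    if x < w:
        return -1.0
    return 0.0


# Minimizing E|x - W| over [-5, 5] with W ~ N(1, 1); the minimizer is the
# median of W, which equals 1 for this distribution.
rng = random.Random(0)
x_hat = projected_ssd(lambda: rng.gauss(1.0, 1.0), abs_cost_subgradient,
                      x0=4.0, lo=-5.0, hi=5.0, num_iters=20000)
```

Each iteration touches a single fresh sample and stores only the current decision, which is the property that makes such recursions attractive when information arrives sequentially.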
Some basic assumptions follow, fairly standard in the literature of stochastic approximation (Shapiro et al., 2014; Wang et al., 2017; Yang et al., 2018; Kushner and Yin, 2003). To this end, let us formally introduce the elementary concept of an IID process. Then, our assumptions follow.
Definition 2.
(IID Process) A stochastic sequence is called IID if and only if it consists of statistically independent, valued random elements, identically distributed according to a fixed Borel measure .
Assumption 1.
(Availability of Information) Either one, or more, mutually independent, IID sequences are available sequentially, all distributed according to .
Remark 1.
Note that in Assumption 1 we do not require that the process is actually observable to the user, but only available, either in the form of a data stream, or by simulation.
Assumption 2.
(Existence of an SO) There exists a mechanism, called a Sampling Oracle (SO), which, given a decision and a realization of the random information, returns either the corresponding value of the random cost, or a subgradient of the random cost relative to the decision, or both. It is further assumed that the SO has direct access to all available information streams, according to Assumption 1.
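One possible way to picture a Sampling Oracle in code is sketched below. The class name `SamplingOracle`, its `query` method, and the absolute-deviation cost are hypothetical and serve only to illustrate the interface described in Assumption 2: the user supplies a decision, and the oracle consumes the next element of an information stream and returns a cost value together with a subgradient.

```python
class SamplingOracle:
    """Hypothetical sampling oracle over a stream of random realizations.

    stream:           any iterable yielding realizations w (data or simulation)
    cost:             function (x, w) -> value of the random cost
    cost_subgradient: function (x, w) -> a subgradient of the cost in x
    """

    def __init__(self, stream, cost, cost_subgradient):
        self._stream = iter(stream)
        self._cost = cost
        self._cost_subgradient = cost_subgradient

    def query(self, x):
        """Consume the next realization and return (cost value, subgradient).

        The realization itself is never exposed to the caller, mirroring
        the remark that the process need not be observable to the user.
        """
        w = next(self._stream)
        return self._cost(x, w), self._cost_subgradient(x, w)


def abs_cost(x, w):
    """Illustrative convex cost F(x, w) = |x - w|."""
    return abs(x - w)


def abs_cost_subgradient(x, w):
    """A subgradient of |x - w| with respect to x."""
    return 1.0 if x > w else (-1.0 if x < w else 0.0)


oracle = SamplingOracle([1.0, 3.0], abs_cost, abs_cost_subgradient)
first = oracle.query(2.0)   # uses w = 1.0
second = oracle.query(2.0)  # uses w = 3.0
```

Several independent oracles of this kind, each wrapping its own stream, would correspond to the case of multiple mutually independent IID sequences mentioned in Assumption 1.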
In this work, we propose and analyze efficient algorithms for solving (4) under Assumptions 1 and 2, explicitly assuming no prior knowledge of either the random cost function or its respective subgradients. We will be restricting our attention to a new class of convex-monotone risk measures with, however, wide applicability, and whose general structure follows the so-called Mean-Risk Model ((Shapiro et al., 2014), Section 6.2). This special class of risk measures is introduced and analyzed, in detail, in Section 3.
3 Mean-Semideviation Models
Under the Mean-Risk Model paradigm (Shapiro et al., 2014), a risk measure $\rho$ is defined, for each random cost $Z$, as
(5) $\rho(Z) \triangleq \mathbb{E}\{Z\} + c\, \mathbb{D}(Z)$,
where the functional $\mathbb{D}$ constitutes a dispersion measure, and provided that the respective quantities are well defined, for the particular choice of $Z$. The dispersion measure may be conveniently thought of as a penalty, weighted by the penalty multiplier $c$, effectively quantifying the uncertainty of the particular cost $Z$.
In this section, we introduce a special class of dispersion measures, which constitute natural generalizations of the well-known upper semideviation of order $p$ (Shapiro et al., 2014). This new class of dispersion measures is termed here as generalized semideviations. Reasonably enough, risk measures of the form of (5), where the respective dispersion measure constitutes a generalized semideviation, will be called either mean-semideviation risk measures, or, interchangeably, mean-semideviation models, or, simply, mean-semideviations.
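As a rough numerical illustration of a mean-risk functional with a semideviation-type dispersion term, the following Python sketch computes a plug-in sample estimate: the sample mean plus a multiple of the order-$p$ norm of a nonlinear map applied to the centered samples. With the positive part as the map, this corresponds to the classical mean-upper-semideviation. The function names, the argument `c` for the penalty multiplier, and the use of a naive plug-in estimator are assumptions of this sketch, not constructions taken from the paper.

```python
def positive_part(x):
    """Positive part, the weighting used by the upper semideviation."""
    return max(x, 0.0)


def mean_semideviation(samples, c, p=1, regularizer=positive_part):
    """Plug-in estimate of mean + c * || regularizer(Z - mean) ||_p.

    samples:     list of realized costs
    c:           penalty multiplier weighting the dispersion term
    p:           order of the semideviation (p >= 1)
    regularizer: nonlinear map applied to the centered cost
    """
    n = len(samples)
    mean = sum(samples) / n
    # Empirical p-th moment of the regularized, centered cost.
    moment = sum(regularizer(z - mean) ** p for z in samples) / n
    return mean + c * moment ** (1.0 / p)


costs = [0.0, 2.0]
rho_order_1 = mean_semideviation(costs, c=0.5, p=1)
rho_order_2 = mean_semideviation(costs, c=0.5, p=2)
rho_constant = mean_semideviation([3.0, 3.0, 3.0], c=0.5, p=2)
```

Only outcomes above the mean contribute to the penalty when the positive part is used, so a deterministic cost is assigned exactly its own value. Passing a different `regularizer` changes how deviations on either side of the mean are weighted.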
This section is structured as follows. First, the simple notion of a risk regularizer is introduced; risk regularizers constitute the basic building block of generalized semideviations. The basic properties of risk regularizers are concisely presented, and a formal definition of generalized semideviations is also formulated, along with a brief discussion related to their practical relevance. Mean-semideviation risk measures are then formally introduced, along with their basic properties, and specific examples are discussed, highlighting their versatility. Next, we develop a constructive characterization result, essentially showing that the class of all mean-semideviation risk measures is almost in one-to-one correspondence with the class of cumulative distribution functions of all integrable random variables (on the line). This result readily demonstrates an apparent generality of mean-semideviations, as well. Lastly, the usefulness, flexibility and effectiveness of mean-semideviation risk measures are demonstrated on a classical, chance-constrained newsvendor model. In particular, risk regularizers (each inducing a mean-semideviation risk measure) are put in context, and their construction is explicitly discussed, reflecting the special characteristics of the specific newsvendor problem under consideration, and the objectives of the decision maker.
3.1 Basic Concepts
We start by introducing the concept of a risk regularizer. Risk regularizers are simple, real-valued functions of one variable, which are reasonably structured, so that they, on the one hand, can be used to quantify risk (see below) and, on the other, can result in problems which can be solved efficiently and exactly via convex stochastic optimization.
Definition 3.
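(Risk Regularizers) A real-valued function $\mathcal{R}:\mathbb{R}\to\mathbb{R}$ is called a risk regularizer, if it satisfies the following conditions:
S1 $\mathcal{R}$ is convex.
S2 $\mathcal{R}$ is nonnegative.
S3 $\mathcal{R}$ is nondecreasing.
S4 For every $x\in\mathbb{R}$, it is true that $\mathcal{R}(x+x')\le x'+\mathcal{R}(x)$, for all $x'\ge 0$.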
Fig. 3.1 illustrates the shapes of various risk regularizers, other than the arguably most obvious example of the positive part function $(\cdot)_+\triangleq\max\{\cdot,0\}$. Note that a risk regularizer need not be smooth (a trivial example is $(\cdot)_+$); several of the examples of Fig. 3.1 are indeed nonsmooth, with the respective corner points highlighted by black dots.
Risk regularizers of Definition 3 may be further structurally characterized via the following simple result.
Proposition 2.
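(Characterization of S4) Consider a real-valued function $\mathcal{R}:\mathbb{R}\to\mathbb{R}$, satisfying condition S3 (monotonicity) of Definition 3. Then, condition S4 holds if and only if $\mathcal{R}$ is nonexpansive.
Proof of Proposition 2.
First, assume that condition S4 holds. Then, by the fact that $\mathcal{R}$ is nondecreasing (S3), it is true that
(6) $|\mathcal{R}(x)-\mathcal{R}(y)| = \mathcal{R}(\max\{x,y\})-\mathcal{R}(\min\{x,y\}) = \mathcal{R}(\min\{x,y\}+|x-y|)-\mathcal{R}(\min\{x,y\}) \le |x-y|,$
for all $(x,y)\in\mathbb{R}^2$, showing that $\mathcal{R}$ is a nonexpansive map. Conversely, assume that $\mathcal{R}$ is nonexpansive. Then, for any $x\in\mathbb{R}$, it is true that
(7) $\mathcal{R}(x+x')-\mathcal{R}(x) \le |\mathcal{R}(x+x')-\mathcal{R}(x)| \le |x'| = x',$
for all $x'\ge 0$, verifying condition S4. ∎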
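The four defining properties of a risk regularizer (convexity, nonnegativity, monotonicity, and the nonexpansiveness that Proposition 2 ties to the last condition) can be probed numerically on a grid. The following is only a minimal sketch: the helper name `check_regularizer`, the grid range, and the tolerance are assumptions made for illustration, and a passing grid check is a sanity test, not a proof.

```python
import math

def positive_part(x):
    """The positive part function max{x, 0}."""
    return max(x, 0.0)

def softplus(x, t=2.0):
    """(1/t) * log(1 + exp(t*x)), evaluated in a numerically stable way."""
    u = t * x
    return (max(u, 0.0) + math.log1p(math.exp(-abs(u)))) / t

def check_regularizer(R, lo=-5.0, hi=5.0, n=201, tol=1e-9):
    """Grid-based check of the four risk regularizer properties (hypothetical helper)."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    vals = [R(x) for x in xs]
    step = xs[1] - xs[0]
    return {
        # midpoint convexity on an equally spaced grid
        "convex": all(vals[i] <= 0.5 * (vals[i - 1] + vals[i + 1]) + tol
                      for i in range(1, n - 1)),
        "nonnegative": all(v >= -tol for v in vals),
        "nondecreasing": all(b >= a - tol for a, b in zip(vals, vals[1:])),
        # for a nondecreasing function, bounding adjacent increments by the
        # grid step bounds all increments on the grid (by telescoping)
        "nonexpansive": all(b - a <= step + tol for a, b in zip(vals, vals[1:])),
    }

if __name__ == "__main__":
    print(check_regularizer(positive_part))
    print(check_regularizer(lambda x: 2.0 * max(x, 0.0)))  # slope 2: not nonexpansive
```

Scaling the positive part by a factor larger than one keeps the first three properties but violates nonexpansiveness, which is exactly what the last condition of the definition rules out.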
At this point, let us emphasize the elementary fact that, because of convexity, every (real-valued) risk regularizer must also be differentiable almost everywhere, relative to the Lebesgue measure on the Borel space $(\mathbb{R},\mathscr{B}(\mathbb{R}))$. This also follows either by monotonicity, or due to the fact that a risk regularizer is nonexpansive and, therefore, Lipschitz continuous on $\mathbb{R}$. Further, because of convexity, the (Lebesgue-null) set of points in $\mathbb{R}$ where a risk regularizer is nondifferentiable is at most countable.
The class of all possible risk regularizers induces that of generalized semideviations, which constitute the class of dispersion measures considered in this paper. The definition of a generalized semideviation is presented below.
Definition 4.
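(Generalized Semideviations) Fix $p\in[1,\infty)$ and choose a risk regularizer $\mathcal{R}:\mathbb{R}\to\mathbb{R}$. A dispersion measure $\mathbb{D}$ is called a generalized semideviation of order $p$, if and only if, for $Z\in\mathcal{Z}_q$,
(8) $\mathbb{D}(Z) \equiv \|\mathcal{R}(Z-\mathbb{E}\{Z\})\|_{\mathcal{L}_p} = \big(\mathbb{E}\{(\mathcal{R}(Z-\mathbb{E}\{Z\}))^p\}\big)^{1/p},$
where it is assumed that all involved quantities are well defined and finite.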
The power of generalized semideviations is in the fact that they form a parametric family relative to the choice of the risk regularizer $\mathcal{R}$; different risk regularizers correspond to different rules for ranking the relative effect of both riskier (higher than the mean) and less risky (lower than the mean) events, corresponding to specific regions in the range of the cost. For more details, see Section 3.3, where we illustrate the versatility of generalized semideviations via additional examples, considering various specific choices for $\mathcal{R}$, with the well-known upper-semideviation dispersion measure (Shapiro et al., 2014) being the prototypical representative of this class.
3.2 Mean-Semideviations: Definition, Existence & Structure
Utilizing the concept of generalized semideviations, we may now introduce the class of risk measures of central interest in this work, as follows.
Definition 5.
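(Mean-Semideviation Risk Measures) Fix $p\in[1,\infty)$ and choose a risk regularizer $\mathcal{R}:\mathbb{R}\to\mathbb{R}$. The mean-semideviation of order $p$, induced by $\mathcal{R}$, is the real-valued risk measure $\rho$ defined, for $Z\in\mathcal{Z}_q$, as
(9) $\rho(Z) \equiv \mathbb{E}\{Z\} + c\,\|\mathcal{R}(Z-\mathbb{E}\{Z\})\|_{\mathcal{L}_p},$
where $c\ge 0$ constitutes a fixed penalty multiplier, and provided that all involved quantities are well defined and finite. (Footnote 1: A mean-semideviation risk measure will be denoted either with its order and risk regularizer made explicit, which is proper, or simply as $\rho$, which is simpler, as long as the choices of $p$ and $\mathcal{R}$ are clearly specified.)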
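To make the definition concrete, a mean-semideviation can be estimated from a finite sample by replacing each expectation with a sample average. The sketch below is a plain plug-in estimator written for illustration; the function name `mean_semideviation` and its default arguments are assumptions, not notation from the paper.

```python
def positive_part(x):
    """Risk regularizer of the mean-upper-semideviation: max{x, 0}."""
    return max(x, 0.0)

def mean_semideviation(samples, R=positive_part, p=1, c=0.5):
    """Plug-in estimate of E{Z} + c * ||R(Z - E{Z})||_p from equally weighted samples.

    samples : list of realized costs
    R       : risk regularizer (any callable on the reals)
    p       : order of the semideviation (p >= 1)
    c       : penalty multiplier (c >= 0)
    """
    n = len(samples)
    mean = sum(samples) / n
    # empirical p-th moment of the regularized, centered cost
    moment = sum(R(z - mean) ** p for z in samples) / n
    return mean + c * moment ** (1.0 / p)

if __name__ == "__main__":
    costs = [0.0, 2.0]
    print(mean_semideviation(costs, p=1))  # mean 1.0 plus 0.5 * 0.5
    print(mean_semideviation(costs, p=2))  # mean 1.0 plus 0.5 * sqrt(0.5)
```

A constant cost has zero dispersion, so the estimate then reduces to the cost itself, whatever the regularizer, as long as the regularizer vanishes at zero (as the positive part does).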
Next, we state and prove a small number of relatively simple results, related to the existence of mean-semideviation risk measures, introduced in Definition 5, as well as their functional structure. First, as might be expected, we show that mean-semideviation risk measures of order $p$ may be naturally associated with costs which are also in $\mathcal{Z}_p$ (i.e., choosing $q\equiv p$). Recall that, throughout the paper, $p$ is reserved for specifying the order of the mean-semideviation risk measure under consideration, whereas $q$ is related to the integrability of the respective cost.
Proposition 3.
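(Compatibility of $p$'s and $q$'s) Fix $p\in[1,\infty)$, $c\ge 0$, and choose any risk regularizer $\mathcal{R}$. Then, as long as $q\in[p,\infty]$, the risk measure $\rho$ is well-defined and finite, for every $Z\in\mathcal{Z}_q$.
Proof of Proposition 3.
Since $Z\in\mathcal{Z}_q$, it is trivial that $Z\in\mathcal{Z}_1$, simply due to the inclusion $\mathcal{Z}_q\subseteq\mathcal{Z}_1$, for any choice of $q\in[1,\infty]$. Thus, the expectation of every $Z\in\mathcal{Z}_q$ exists and is finite, and what remains is to prove the result for the dispersion measure $\|\mathcal{R}(Z-\mathbb{E}\{Z\})\|_{\mathcal{L}_p}$.
For simplicity, let $q\equiv p$. Using the fact that $\mathbb{E}\{Z\}$ is finite, it is true that, for every $Z\in\mathcal{Z}_p$, the shifted cost $Z-\mathbb{E}\{Z\}$ is in $\mathcal{Z}_p$. It thus suffices to show that, for every $Z\in\mathcal{Z}_p$, $\mathcal{R}(Z)$ is in $\mathcal{Z}_p$, as well. Because the risk regularizer $\mathcal{R}$ is nonnegative (condition S2), the integral $\mathbb{E}\{(\mathcal{R}(Z))^p\}$ exists. Also, due to condition S4 of Definition 3, it follows that, for every $x\ge 0$, $\mathcal{R}(x)\le x+\mathcal{R}(0)$, and since $\mathcal{R}$ is nondecreasing (S3), it is true that $\mathcal{R}(x)\le\mathcal{R}(|x|)\le|x|+\mathcal{R}(0)$, for all $x\in\mathbb{R}$. Setting $x\equiv Z$, this yields
(10) $0 \le \mathcal{R}(Z) \le |Z|+\mathcal{R}(0),$
and since $Z\in\mathcal{Z}_p$, $|Z|+\mathcal{R}(0)\in\mathcal{Z}_p$, as well. Consequently, it is true that
(11) $\|\mathcal{R}(Z)\|_{\mathcal{L}_p} \le \||Z|+\mathcal{R}(0)\|_{\mathcal{L}_p} \le \|Z\|_{\mathcal{L}_p}+\mathcal{R}(0) < \infty,$
showing that $\|\mathcal{R}(Z-\mathbb{E}\{Z\})\|_{\mathcal{L}_p}$ and, therefore, $\rho(Z)$, are both well defined and finite, for every $Z\in\mathcal{Z}_p$.
Now, due to the inclusion $\mathcal{Z}_q\subseteq\mathcal{Z}_p$, for $q\ge p$, we know that, if $Z\in\mathcal{Z}_q$, for some $q\in[p,\infty]$, then $Z\in\mathcal{Z}_p$, as well. Enough said. ∎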
Hereafter, for the sake of generality, we will implicitly assume that $p$ and $q$ are compatible, so that existence and finiteness of the resulting risk measures considered is ensured. Of course, in actual applications, Proposition 3 may be directly invoked on a case-by-case basis, in order to select the order of the particular dispersion measure of choice, depending on the nature of the random cost, or a family of those, under study.
After characterizing existence and finiteness of mean-semideviation risk measures, as introduced in Definition 5, we focus on their structural properties, from a functional point of view. As the following result suggests, mean-semideviation risk measures are indeed convex-monotone under a standardized assumption on the penalty multiplier $c$.
Theorem 1.
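(When are Mean-Semideviations Convex-Monotone?) Fix $p\in[1,\infty)$ and choose any risk regularizer $\mathcal{R}$. Then, as long as $c\in[0,1]$, the risk measure $\rho$ is convex-monotone; that is, it satisfies both conditions A1 (convexity) and A2 (monotonicity).
Proof of Theorem 1.
Let us start with verifying convexity (A1). Since the expectation term of $\rho$ is a linear functional on $\mathcal{Z}_q$, it will suffice to show that the generalized semideviation term is convex. Indeed, for every $Z_1,Z_2\in\mathcal{Z}_q$, and every $\alpha\in[0,1]$, we may write
(12) $\|\mathcal{R}(\alpha Z_1+(1-\alpha)Z_2-\mathbb{E}\{\alpha Z_1+(1-\alpha)Z_2\})\|_{\mathcal{L}_p} = \|\mathcal{R}(\alpha(Z_1-\mathbb{E}\{Z_1\})+(1-\alpha)(Z_2-\mathbb{E}\{Z_2\}))\|_{\mathcal{L}_p} \le \|\alpha\mathcal{R}(Z_1-\mathbb{E}\{Z_1\})+(1-\alpha)\mathcal{R}(Z_2-\mathbb{E}\{Z_2\})\|_{\mathcal{L}_p} \le \alpha\|\mathcal{R}(Z_1-\mathbb{E}\{Z_1\})\|_{\mathcal{L}_p}+(1-\alpha)\|\mathcal{R}(Z_2-\mathbb{E}\{Z_2\})\|_{\mathcal{L}_p},$
where the first inequality is true due to conditions S1 (convexity) and S2 (nonnegativity), and the second is due to the triangle (Minkowski) inequality. Thus, the generalized semideviation is a convex functional, which means that $\rho$ is also convex on $\mathcal{Z}_q$. Note that the value of $c\ge 0$ is not crucial in order to show convexity of $\rho$.
Let us now study monotonicity (A2) of the risk measure $\rho$. For every $Z_1$ and $Z_2$ in $\mathcal{Z}_q$, such that $Z_1(\omega)\le Z_2(\omega)$, for almost all $\omega\in\Omega$, we have
(13) $\|\mathcal{R}(Z_1-\mathbb{E}\{Z_1\})\|_{\mathcal{L}_p} \le \|\mathcal{R}(Z_2-\mathbb{E}\{Z_2\}+(\mathbb{E}\{Z_2\}-\mathbb{E}\{Z_1\}))\|_{\mathcal{L}_p} \le \|\mathcal{R}(Z_2-\mathbb{E}\{Z_2\})+(\mathbb{E}\{Z_2\}-\mathbb{E}\{Z_1\})\|_{\mathcal{L}_p} \le \|\mathcal{R}(Z_2-\mathbb{E}\{Z_2\})\|_{\mathcal{L}_p}+\mathbb{E}\{Z_2\}-\mathbb{E}\{Z_1\},$
where the first inequality is due to conditions S2 (nonnegativity) and S3 (monotonicity), the second is due to conditions S2 (nonnegativity), S4 (nonexpansiveness), as well as the fact that $\mathbb{E}\{Z_2\}-\mathbb{E}\{Z_1\}\ge 0$, and the third is again due to the triangle inequality. From (13), we readily see that, as long as $c\in[0,1]$, we may further write
(14) $\rho(Z_1) = \mathbb{E}\{Z_1\}+c\,\|\mathcal{R}(Z_1-\mathbb{E}\{Z_1\})\|_{\mathcal{L}_p} \le (1-c)\,\mathbb{E}\{Z_1\}+c\,\mathbb{E}\{Z_2\}+c\,\|\mathcal{R}(Z_2-\mathbb{E}\{Z_2\})\|_{\mathcal{L}_p} \le \mathbb{E}\{Z_2\}+c\,\|\mathcal{R}(Z_2-\mathbb{E}\{Z_2\})\|_{\mathcal{L}_p} = \rho(Z_2),$
completing the proof of the theorem. ∎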
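The role of the restriction on the penalty multiplier in Theorem 1 can be seen on a two-point example. The sketch below evaluates the mean-upper-semideviation of order one exactly for finite discrete costs; the helper `rho` and the specific numbers are illustrative assumptions. The cost `z1` is pointwise no larger than `z2`, yet with a multiplier of 2 it is assigned a strictly larger risk, so monotonicity fails, while with a multiplier of 1 the ordering is preserved.

```python
def rho(values, probs, c, p=1):
    """Exact mean-upper-semideviation of a finite discrete cost.

    values : cost realizations
    probs  : probabilities of the realizations (summing to one)
    c      : penalty multiplier
    p      : order of the semideviation
    """
    mean = sum(q * z for z, q in zip(values, probs))
    moment = sum(q * max(z - mean, 0.0) ** p for z, q in zip(values, probs))
    return mean + c * moment ** (1.0 / p)

# Two costs on the same two-point probability space, with z1 <= z2 pointwise.
probs = [0.25, 0.75]
z1 = [-1.0, 0.0]   # saves one unit with probability 0.25
z2 = [0.0, 0.0]    # identically zero

if __name__ == "__main__":
    for c in (1.0, 2.0):
        print(c, rho(z1, probs, c), rho(z2, probs, c))
```

Here the mean of `z1` is -0.25 and its upper semideviation is 0.75 * 0.25 = 0.1875, so the risk is -0.0625 for a multiplier of 1 and 0.125 for a multiplier of 2, against a risk of 0 for `z2` in both cases.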
We may now invoke Proposition 1, presented earlier, to immediately obtain the following key corollary. The proof is trivial and, thus, omitted.
Corollary 1.
Corollary 1 is an important result, because it shows that, for every meansemideviation risk measure, or equivalently, for every risk regularizer of choice, problem (4) would be exactly solvable via, for instance, subgradient methods, if the function was known in advance. This fact reinforces our hope that it might indeed be possible to solve (4) to optimality, utilizing some carefully designed stochastic search, or, more specifically, and based on the assumed subdifferentiability of , stochastic subgradient algorithm. Of course, such an algorithm should be designed to work under Assumptions 1 and 2, without the need for explicit knowledge of , or .
Remark 2.
(Coherence?) We should mention that mean-semideviations are, in general, not coherent risk measures ((Shapiro et al., 2014), Section 6.3), since they do not necessarily satisfy the axiomatic property of positive homogeneity. This is simply due to the fact that, in general, one may find choices for $\mathcal{R}$ such that
(15) $\mathcal{R}(\lambda x) \neq \lambda\,\mathcal{R}(x),$
for some $x\in\mathbb{R}$ and $\lambda>0$. Nevertheless, mean-semideviations may be readily shown to satisfy translation equivariance, although such a property is not explicitly required in this work. As a result, in addition to being convex-monotone, mean-semideviations also belong to the class of convex risk measures (Föllmer and Schied, 2002; Shapiro et al., 2014).
3.3 Examples of Mean-Semideviation Models
Before moving on, it would be instructive to discuss some examples of mean-semideviations, highlighting the versatility of this particular class of risk measures. We start from simple, illustrative choices as far as the involved risk regularizer is concerned, and then we generalize.
3.3.1 Mean-Upper-Semideviations
The simplest, prototypical example of a mean-semideviation risk measure is the mean-upper-semideviation of order $p$ ((Shapiro et al., 2014), Sections 6.2.2 & 6.3.2), which is constructed by choosing as risk regularizer the function
(16) $\mathcal{R}(x) \equiv (x)_+ \triangleq \max\{x,0\},\quad x\in\mathbb{R},$
yielding the risk measure
(17) $\rho(Z) \equiv \mathbb{E}\{Z\}+c\,\|(Z-\mathbb{E}\{Z\})_+\|_{\mathcal{L}_p},$
for $Z\in\mathcal{Z}_q$. Of course, in this case, it is trivial to show that $\mathcal{R}$ satisfies conditions S1-S4 of Definition 3. Recall that we have assumed that $q$ is appropriately chosen, such that $\rho$ is a well defined, real-valued functional on $\mathcal{Z}_q$.
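3.3.2 Entropic Mean-Semideviations
Our second example is a generalization of the mean-upper-semideviation risk measure discussed in the previous example. Here, the risk regularizer is chosen itself from a parametric family, as
(18) $\mathcal{R}_t(x) \equiv \dfrac{1}{t}\log(1+\exp(tx)),\quad x\in\mathbb{R},$
where $t>0$ is a parameter, regulating the sharpness of the function at zero. It is trivial to verify conditions S2 (nonnegativity) and S3 (monotonicity). Also, for fixed $t>0$, the first derivative of $\mathcal{R}_t$ relative to $x$ is the logistic function
(19) $\dfrac{\mathrm{d}}{\mathrm{d}x}\mathcal{R}_t(x) = \dfrac{\exp(tx)}{1+\exp(tx)} = \dfrac{1}{1+\exp(-tx)} \in (0,1),$
showing that $\mathcal{R}_t$ is a nonexpansive mapping, immediately verifying condition S4 (nonexpansiveness), via Proposition 2. Likewise, the second derivative of $\mathcal{R}_t$ is given by
(20) $\dfrac{\mathrm{d}^2}{\mathrm{d}x^2}\mathcal{R}_t(x) = \dfrac{t\exp(tx)}{(1+\exp(tx))^2} > 0,$
and, thus, S1 (convexity) is readily verified, as well. Hence, $\mathcal{R}_t$ is a valid risk regularizer. Alternatively and to illustrate the procedure, we may verify condition S4 directly; for fixed $t>0$, for every $x\in\mathbb{R}$ and for every $x'\ge 0$, we may write
(21) $\mathcal{R}_t(x+x') = \dfrac{1}{t}\log(1+\exp(tx)\exp(tx')) \le \dfrac{1}{t}\log\big(\exp(tx')(1+\exp(tx))\big) = x'+\mathcal{R}_t(x),$
where the inequality is due to the fact that $\exp(tx')\ge 1$. It is also easy to see that, for every $x\in\mathbb{R}$, $\mathcal{R}_t(x)\to(x)_+$ as $t\to\infty$, showing that $\mathcal{R}_t$ constitutes a smooth approximation to the risk regularizer of the mean-upper-semideviation risk measure discussed previously.
The resulting risk measure is called an entropic mean-semideviation of order $p$, and may be expressed as
(22) $\rho(Z) \equiv \mathbb{E}\{Z\}+\dfrac{c}{t}\,\big\|\log\big(1+\exp(t(Z-\mathbb{E}\{Z\}))\big)\big\|_{\mathcal{L}_p},$
for $Z\in\mathcal{Z}_q$. For obvious reasons, this risk measure may be considered a soft version of the mean-upper-semideviation risk measure.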
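The claims about the entropic risk regularizer (logistic first derivative, and convergence to the positive part as the sharpness parameter grows) are easy to check numerically. The sketch below assumes the softplus form $(1/t)\log(1+\exp(tx))$ for the regularizer; the function names are illustrative. It relies on the elementary bound that the gap between the softplus and the positive part lies between $0$ and $\log(2)/t$.

```python
import math

def entropic(x, t):
    """Entropic risk regularizer (1/t) * log(1 + exp(t*x)), computed stably."""
    u = t * x
    return (max(u, 0.0) + math.log1p(math.exp(-abs(u)))) / t

def logistic(x, t):
    """Logistic function 1 / (1 + exp(-t*x)), computed stably for both signs of x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-t * x))
    e = math.exp(t * x)
    return e / (1.0 + e)

if __name__ == "__main__":
    # The gap to the positive part shrinks like log(2)/t.
    for t in (1.0, 10.0, 100.0):
        gap_at_zero = entropic(0.0, t) - max(0.0, 0.0)
        print(t, gap_at_zero, math.log(2.0) / t)
    # A central finite difference of the regularizer approximates the logistic function.
    h, t, x = 1e-5, 2.0, 0.7
    fd = (entropic(x + h, t) - entropic(x - h, t)) / (2.0 * h)
    print(fd, logistic(x, t))
```

The largest gap is attained at zero, where the softplus equals exactly $\log(2)/t$, which is why increasing the parameter sharpens the kink.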
3.3.3 CDF-Antiderivative (CDFA) Mean-Semideviations
We now show that, in fact, both previously presented examples are special cases of a much more general approach, which may be utilized for the construction of risk regularizers. To this end, let $Y$ be a random variable in $\mathcal{Z}_1$, with cumulative distribution function (cdf) $F_Y$. Consider the choice
(23) $\mathcal{R}(x) \equiv \displaystyle\int_{-\infty}^{x}F_Y(y)\,\mathrm{d}y,\quad x\in\mathbb{R},$
where, because $F_Y$ is a nonnegative Borel measurable function, the involved integration is always well-defined (might be $+\infty$, though), in the sense of Lebesgue. The particular antiderivative of the cdf $F_Y$, as defined in (23), constitutes a very important quantity in the theory of stochastic dominance; see, for instance, related articles (Ogryczak and Ruszczyński, 1999) and (Ogryczak and Ruszczyński, 2002) for definition and insights. In particular, via Fubini’s Theorem (Theorem 2.6.6 in (Ash and Doléans-Dade, 2000)), $\mathcal{R}$ may be easily shown to admit the alternative integral representation
(24) $\mathcal{R}(x) = \mathbb{E}\{(x-Y)_+\},\quad x\in\mathbb{R}.$
Exploiting the assumption that $Y\in\mathcal{Z}_1$, it follows that $\mathcal{R}(x)<\infty$, for every $x\in\mathbb{R}$. Also, from (24), it is trivial to see that, because of the structure of the function $(\cdot)_+$, $\mathcal{R}$ is convex (S1), nonnegative (S2) and nondecreasing (S3) on $\mathbb{R}$. Nonexpansiveness (S4) may also be readily verified.
Consequently, $\mathcal{R}$ is a valid risk regularizer, and the resulting risk measure, called a CDF-Antiderivative (CDFA) mean-semideviation, may be expressed in various forms as
(25) $\rho(Z) \equiv \mathbb{E}\{Z\}+c\,\Big\|\displaystyle\int_{-\infty}^{Z-\mathbb{E}\{Z\}}F_Y(y)\,\mathrm{d}y\Big\|_{\mathcal{L}_p} = \mathbb{E}\{Z\}+c\,\big\|\mathbb{E}\{(Z-\mathbb{E}\{Z\}-Y)_+\,|\,Z\}\big\|_{\mathcal{L}_p} = \mathbb{E}\{Z\}+c\,\Big(\displaystyle\int\Big(\int(z-\mathbb{E}\{Z\}-y)_+\,\mathrm{d}\mathcal{P}_Y(y)\Big)^{p}\mathrm{d}\mathcal{P}_Z(z)\Big)^{1/p},$
for $Z\in\mathcal{Z}_q$, where $Y$ can be arbitrarily taken to be independent of $Z$ and $\mathcal{P}_Z$ denotes the Borel pushforward of $Z$.
We may now verify that both mean-upper-semideviation and entropic mean-semideviation risk measures discussed above are special cases of CDFA mean-semideviations. In mean-upper-semideviations, the respective risk regularizer is an antiderivative (taken piecewise) of the cdf corresponding to the Dirac measure at zero. In entropic mean-semideviations, the respective risk regularizer is an antiderivative of (19) (by monotone convergence and via a sequential argument), which is the cdf of a zero-mean element in $\mathcal{Z}_1$. In both cases, the antiderivatives involved are of the form of (23).
3.3.3.1 Special Case: Gaussian Antiderivative (GA) Mean-Semideviations
An interesting subclass of CDFA mean-semideviations is the one resulting from taking antiderivatives of the cdf of a standard Gaussian random variable $Y\sim\mathcal{N}(0,1)$. In this case, the simplest possible risk regularizer may be constructed as
(26) $\mathcal{R}(x) \equiv \displaystyle\int_{-\infty}^{x}\Phi(y)\,\mathrm{d}y = x\,\Phi(x)+\varphi(x),\quad x\in\mathbb{R},$
where $\Phi$ and $\varphi$ denote the standard Gaussian cdf and density, respectively. This particular antiderivative of $\Phi$ appears naturally in standard treatments of the so-called ranking & selection, or best arm identification problem and, more specifically, in lookahead selection policies, such as the Knowledge Gradient and the Expected Improvement (Frazier et al., 2008; Ryzhov, 2016).
The resulting mean-semideviation risk measure is called a Gaussian Antiderivative (GA) mean-semideviation of order $p$, and may be expressed as
(27) $\rho(Z) \equiv \mathbb{E}\{Z\}+c\,\big\|(Z-\mathbb{E}\{Z\})\,\Phi(Z-\mathbb{E}\{Z\})+\varphi(Z-\mathbb{E}\{Z\})\big\|_{\mathcal{L}_p},$
for $Z\in\mathcal{Z}_q$. Of course, as it happens for all mean-semideviations, the functional $\rho$, as defined in (27), is a convex risk measure for every $c\ge 0$, and a convex-monotone risk measure, if $c\in[0,1]$.
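The closed form of the Gaussian antiderivative, $x\Phi(x)+\varphi(x)$, can be cross-checked against a direct numerical integration of the standard Gaussian cdf. The sketch below uses only the standard library; the truncation of the lower limit at $-10$ and the trapezoidal rule are assumptions made for illustration (the neglected tail is negligible at that point).

```python
import math

def gauss_cdf(x):
    """Standard Gaussian cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gauss_pdf(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def ga_regularizer(x):
    """Closed-form antiderivative of the Gaussian cdf: x * Phi(x) + phi(x)."""
    return x * gauss_cdf(x) + gauss_pdf(x)

def integrate_cdf(x, lower=-10.0, n=20000):
    """Trapezoidal approximation of the integral of the Gaussian cdf over [lower, x]."""
    h = (x - lower) / n
    total = 0.5 * (gauss_cdf(lower) + gauss_cdf(x))
    for i in range(1, n):
        total += gauss_cdf(lower + i * h)
    return total * h

if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.5):
        print(x, ga_regularizer(x), integrate_cdf(x))
```

At zero the regularizer equals the density value $1/\sqrt{2\pi}$, so, unlike the positive part, it penalizes costs slightly below the mean as well.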
3.4 A Complete Characterization of Mean-Semideviations
As a result of the discussion in Section 3.3.3 above, it follows that risk regularizers may be formed by taking antiderivatives of the cdf of any integrable random variable of choice, resulting in a vast variety of mean-semideviation risk measures, all sharing a common favorable structure.
Here, we show that if we start from a given risk regularizer $\mathcal{R}$, the converse statement is also true. In this respect, we state and prove the following important result.
Theorem 2.
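(CDF-Based Representation of Risk Regularizers) Let $Y$ be a random variable, such that, for every $x\in\mathbb{R}$, $\mathbb{E}\{(x-Y)_+\}<\infty$, and let $F_Y$ denote its cdf. Then, for any fixed $\alpha\in[0,1]$ and $\beta\ge 0$, the function $\mathcal{R}:\mathbb{R}\to\mathbb{R}$ defined as
(28) $\mathcal{R}(x) \equiv \alpha\displaystyle\int_{-\infty}^{x}F_Y(y)\,\mathrm{d}y+\beta,\quad x\in\mathbb{R},$
is a valid risk regularizer, where integration may be interpreted either in the improper Riemann sense (for computation), or in the standard sense of Lebesgue (for derivation).
Conversely, let $\mathcal{R}$ be any risk regularizer. Then, there exist some random variable $Y$, satisfying $\mathbb{E}\{(x-Y)_+\}<\infty$, for all $x\in\mathbb{R}$, with cdf $F_Y$, and constants $\alpha\in[0,1]$ and $\beta\ge 0$, such that, for every $x\in\mathbb{R}$, the representation (28) is valid. In particular, if $\mathcal{R}'_+$ denotes the right derivative of $\mathcal{R}$, it is always true that $\alpha=\sup_{x\in\mathbb{R}}\mathcal{R}'_+(x)$, $\beta=\inf_{x\in\mathbb{R}}\mathcal{R}(x)$, and, as long as $\mathcal{R}$ is nonconstant, it holds that $\alpha>0$, and $F_Y$ is given by $F_Y(x)=\mathcal{R}'_+(x)/\alpha$, for all $x\in\mathbb{R}$.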
Theorem 2 is important for two main reasons, the first being related to the forward statement, and the second to the converse. On the one hand, Theorem 2 provides us with the clean, very versatile and analytically friendly integral formula (28) for constructing risk regularizers of various shapes and types. On the other hand, it informs us that, necessarily, any risk regularizer can be expressed in the form of (28) and, as a result, all possible risk regularizers may be constructed utilizing (28), each time for some suitably chosen cdf. Therefore, risk regularizers are completely characterized by the cdf-based representation of Theorem 2.
Of course, every risk regularizer induces a unique mean-semideviation risk measure. But also notice that, trivially, every mean-semideviation risk measure corresponds to a uniquely specified risk regularizer (as a functional, or when all costs in the corresponding space $\mathcal{Z}_q$, the largest such space, for the smallest possible $q$, are considered). Therefore, Theorem 2 provides a complete characterization of the whole class of mean-semideviation risk measures. In particular, Theorem 2 implies that the class of all mean-semideviation risk measures is almost in one-to-one correspondence with the class of cdfs of all integrable, real-valued random elements. The “almost” in the preceding statement is due to the presence of the two constants appearing in Theorem 2, and to the fact that actually slightly less is required than (absolute) integrability of the involved random variable.
3.5 Practical Illustration of Mean-Semideviation Models
We conclude this section by briefly outlining the relevance of mean-semideviation models in applications, also putting our proposed risk regularizers in context. More specifically, we consider a chance-constrained version of the prototypical, single-product newsvendor problem (see, for instance, Chapter 1 in (Shapiro et al., 2014)), which we use as a basis to formulate a doubly risk-averse newsvendor problem, which jointly controls both unmet demand and holding costs. We also explicitly demonstrate how the respective risk regularizer may be designed, based on the characteristics of the particular problem under consideration.
Although the single-product newsvendor problem (and its variations) indeed constitutes a one-dimensional, toy example, it provides insights and highlights some important features of the mean-semideviation risk measures advocated herein. Additionally, the simplicity of such a problem facilitates numerical solution, and enables us to present some numerical results, verifying the effectiveness of the proposed mean-semideviation risk measures experimentally, as well.
3.5.1 A Chance-Constrained Single-Product Newsvendor
Suppose that a newsvendor is interested in optimally producing newspapers for an uncertain market, so that they minimize the cost incurred by actual production and by not meeting market demand, while respecting their holding capacity, or a predefined holding cost target. Let , and be known constants, standing for the production, unmet demand and holding costs per production unit. Also let be the random market demand, a random variable with cdf , for simplicity assumed to be absolutely continuous relative to the Lebesgue measure on . Since the market is uncertain, the newsvendor resorts to stochastically deciding their production plan by solving the chanceconstrained program
(29) 
where, also for simplicity, we assume that the decision variable is realvalued, and where constitutes the newsvendor’s tolerance in the event that their holding cost will exceed a prescribed threshold . Both and are fixed design parameters decided by the newsvendor beforehand. Chanceconstrained newsvendor problems similar to (29) have been previously considered in the literature; see, for instance, the related article (Zhang et al., 2009). Here, an important detail is that, despite the probabilistic constraint, problem (29) is riskneutral as far as treatment of unmet demand is concerned. This is because only the expectation of the cost of not meeting the demand, corresponding to , is considered in the objective.
Problem (29) exhibits some interesting features and may be significantly simplified, as follows. First, we may observe that, for every fixed choice of ,
(30) 
Consequently, it is true that
(31)  
(32) 
where, due to the cdf of the demand being continuous, the pseudoinverse or quantile function is defined as
(33) 
Thus, problem (29) is convex and may be reformulated as
(34) 
Hereafter, without loss of generality, we may assume that and are chosen such that . Otherwise, the problem is trivially solved at . To be fully compatible with the generic notation utilized in this paper, we may also define and .
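The reduction of the chance constraint to a simple upper bound on the production level can be sketched numerically. Everything named below is an assumption introduced for illustration, since the symbols of the original formulation are not reproduced here: `c_prod`, `b_unmet` and `h_hold` stand for the production, unmet demand and holding costs per unit, `tau` for the holding cost threshold, `alpha` for the tolerance, and the demand is taken to be exponential with unit rate. Under these assumptions, the objective is the expected production plus unmet demand cost, the chance constraint bounds the probability that the holding cost exceeds `tau` by `alpha`, and the feasible set is the interval from zero to `quantile(alpha) + tau / h_hold`.

```python
import math

def exp_cdf(w):
    """Cdf of a unit-rate exponential demand (an illustrative choice)."""
    return 1.0 - math.exp(-w) if w > 0.0 else 0.0

def exp_quantile(a):
    """Quantile function of the unit-rate exponential distribution."""
    return -math.log(1.0 - a)

def solve_newsvendor(c_prod, b_unmet, h_hold, tau, alpha, cdf, quantile):
    """Minimize E{c*x + b*(W - x)_+} over 0 <= x <= quantile(alpha) + tau/h.

    The objective is convex in x, with derivative c - b * (1 - cdf(x)),
    so the constrained minimizer is found by checking the endpoints and
    otherwise bisecting on the sign of the derivative.
    """
    x_max = quantile(alpha) + tau / h_hold
    if x_max <= 0.0:
        return 0.0
    dg = lambda x: c_prod - b_unmet * (1.0 - cdf(x))
    if dg(0.0) >= 0.0:       # objective nondecreasing: produce nothing
        return 0.0
    if dg(x_max) <= 0.0:     # objective still decreasing at the bound
        return x_max
    lo, hi = 0.0, x_max
    for _ in range(100):     # bisection on the nondecreasing derivative
        mid = 0.5 * (lo + hi)
        if dg(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(solve_newsvendor(1.0, 4.0, 1.0, 0.5, 0.3, exp_cdf, exp_quantile))  # bound is active
    print(solve_newsvendor(1.0, 4.0, 1.0, 0.5, 0.9, exp_cdf, exp_quantile))  # interior solution
    print(solve_newsvendor(1.0, 0.5, 1.0, 0.5, 0.9, exp_cdf, exp_quantile))  # produce nothing
```

With a tight tolerance the chance constraint is active and the solution sits at the upper bound; with a loose tolerance the unconstrained minimizer (here $\log 4$) is recovered; and when the unmet demand cost does not exceed the production cost, producing nothing is optimal.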
Next, let us consider the derivative of the objective of (34), relative to . We have
(35) 
Hence, unless , it readily follows that, for every , , again implying that the choice constitutes a solution of (34); in other words, producing nothing is always optimal whenever . On the other hand, it is apparently true that , for all , if and only if
(36) 
implying that the condition
(37) 
is sufficient to ensure negativity of