
# Parametrized Convex Universal Approximators for Decision-Making Problems

Parametrized max-affine (PMA) and parametrized log-sum-exp (PLSE) networks are proposed for general decision-making problems. The proposed approximators generalize existing convex approximators, namely, max-affine (MA) and log-sum-exp (LSE) networks, by considering function arguments of condition and decision variables and replacing the network parameters of MA and LSE networks with continuous functions with respect to the condition variable. The universal approximation theorem of PMA and PLSE is proven, which implies that PMA and PLSE are shape-preserving universal approximators for parametrized convex continuous functions. Practical guidelines for incorporating deep neural networks within PMA and PLSE networks are provided. A numerical simulation is performed to demonstrate the performance of the proposed approximators. The simulation results support that PLSE outperforms other existing approximators in terms of minimizer and optimal value errors with scalable and efficient computation for high-dimensional cases.


## Code Repositories

### ParametrisedConvexApproximators.jl

A Julia package for parametrised convex approximators including parametrised max-affine (PMA) and parametrised log-sum-exp (PLSE) neural networks.

## I Introduction

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Conditional decision-making problems find a suitable decision for a given condition; examples include optimal control and reinforcement learning [liberzonCalculusVariationsOptimal2012, suttonReinforcementLearningIntroduction2018] and inference via optimization in energy-based learning [lecunTutorialEnergyBasedLearning2006]. To make a decision in an optimal sense, the decision-making problem can be solved by minimizing a bivariate function for a given condition, whose arguments consist of the condition and decision variables. This formulation carries an implication: the bivariate function is regarded as a cost function, which evaluates the cost for a given condition and decision variables. Since most convex optimization problems are known to be relatively reliable and tractable for high dimensions, i.e., polynomial time complexity [boydConvexOptimization2004, liuSurveyConvexOptimization2017], incorporating convex optimization in decision-making is apparently beneficial. A natural approach is to implement a parametrized convex surrogate model for the cost function approximation, where a bivariate function is said to be parametrized convex if the function is convex when the condition variables are fixed. In this regard, data-driven surrogate model approaches using (parametrized) convex approximators have drawn much attention and have been implemented in many applications, including structured prediction and continuous-action Q-learning [calafioreLogSumExpNeuralNetworks2020, calafioreEfficientModelFreeQFactor2020, amosInputConvexNeural2017a].

Owing to the advances of machine learning, performing data-driven function approximation has become essential and promising in many areas, including differential equations, generative model learning, and natural language processing with transformers [chenNeuralOrdinaryDifferential2018, goodfellowGenerativeAdversarialNetworks2014, vaswaniAttentionAllYou2017]. One of the main theoretical results of function approximation is referred to as the universal approximation theorem. In general, if a function approximator is a universal approximator, it is capable of approximating any function with arbitrary precision (usually on a compact set). For example, it was shown that a shallow feedforward neural network (FNN) is a universal approximator for continuous functions [cybenkoApproximationSuperpositionsSigmoidal1989, pinkusApproximationTheoryMLP1999a], and studies on the universal approximation theorem for deep FNNs have been conducted recently [kidgerUniversalApproximationDeep2020]. On the other hand, one may want to approximate a class of functions specified by a certain shape, for example, monotonicity and convexity, while the approximator preserves the same shape. This is called shape-preserving approximation (SPA) [devoreConstructiveApproximation1993]. For convex continuous functions, some approximators were proposed as shape-preserving universal approximators, including max-affine (MA) and log-sum-exp (LSE) networks [calafioreLogSumExpNeuralNetworks2020]. For decision-making problems, however, one may need a shape-preserving approximator for parametrized convex functions, not just convex functions. An attempt was made to suggest a parametrized convex approximator, the partially input convex neural network (PICNN); however, it has not been shown that PICNN is a universal approximator [amosInputConvexNeural2017a].

To address this issue, in this study, new parametrized convex approximators are proposed: parametrized MA (PMA) and parametrized LSE (PLSE) networks. PMA and PLSE networks are extensions of MA and LSE networks to parametrized convex functions by replacing the parameters of MA and LSE networks with continuous functions with respect to condition variables. This study demonstrates that PMA and PLSE are shape-preserving universal approximators for parametrized convex continuous functions. The main challenge of showing that PMA and PLSE are universal approximators comes from the replacement of network parameters with continuous functions. In the construction of MA and LSE networks, subgradients are arbitrarily selected from corresponding subdifferentials [calafioreLogSumExpNeuralNetworks2020], while the subgradients are replaced with functions of the condition variables. Therefore, the subgradient functions should carefully be approximated along with condition variable axes. This issue is resolved by continuous selection of multivalued subdifferential mappings. From the practical point of view, guidelines for practical implementation are provided, for example, by utilizing the deep architecture. Numerical simulation is performed to demonstrate the proposed approximators’ approximation capability in terms of minimizer and optimal value errors as well as the solving time of convex optimization for decision-making. The simulation result highly supports that the proposed approximators, particularly the PLSE network, show the smallest minimizer and optimal value errors from low- to high-dimensional cases as well as scalable solving time compared to existing approximators, FNN, MA, LSE, and PICNN.

The rest of this paper is organized as follows. In Section II, some mathematical backgrounds for set-valued analysis and convex analysis are briefly summarized, and several types of universal approximators are defined. Additionally, the proposed parametrized convex approximators, PMA and PLSE networks, are introduced. Section III describes the main theoretical results of this study. The main results include that PMA and PLSE are parametrized convex universal approximators and can collaborate with ordinary universal approximators, for example, the multilayer FNN, to approximate continuous selection of subdifferential mappings. In Section IV, numerical simulation is performed to demonstrate the approximation capability of the proposed approximators, in terms of minimizer and optimal value errors as well as solving time. Finally, Section V concludes this study with a summary and future works.

## II Preliminaries

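Let $\mathbb{N}$ and $\mathbb{R}$ be the sets of all natural and real numbers, respectively. The extended real number line is denoted by $\overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty, +\infty\}$. In this study, it is assumed that functions are defined in Euclidean vector space with a standard inner product $\langle \cdot, \cdot \rangle$ and Euclidean norm $\|\cdot\|$. Given a set $X$, the interior and closure of $X$ are denoted by $\operatorname{int}(X)$ and $\operatorname{cl}(X)$, respectively. The diameter of a set $X$ is defined as $\operatorname{diam}(X) := \sup_{x, y \in X} \|x - y\|$. The supremum norm of a function $f$ is defined as $\|f\|_\infty := \sup_{x \in X} |f(x)|$, where $X$ is the domain of the function $f$, or an appropriate set in context. $L$-Lipschitzness stands for the Lipschitzness with a specific Lipschitz constant of $L$ and is defined as follows.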

###### Definition 1 (L-Lipschitzness).

A real-valued function $f : X \to \mathbb{R}$ is said to be $L$-Lipschitz for some $L \ge 0$ (simply Lipschitz or Lipschitz continuous) if for all $x, y \in X$ it satisfies $|f(x) - f(y)| \le L \|x - y\|$.

### II-A Set-valued analysis

If a function $f$ is defined on $X$, whose value is a subset of $Y$, i.e., $f(x) \subset Y$ for all $x \in X$, then $f$ is called a multivalued function (or a set-valued function) and denoted as $f : X \rightrightarrows Y$. Ordinary functions can be regarded as single-valued functions in context, i.e., $f(x) = \{y\}$ for a certain $y \in Y$.

A graph of a multivalued function is defined by

$$
\operatorname{Graph}(f) := \{ (x, y) \in X \times Y \mid y \in f(x) \}. \tag{1}
$$

A multivalued function $f : X \rightrightarrows Y$ is said to be upper hemicontinuous (u.h.c.) at $x_0 \in X$ if for any open neighborhood $V$ of $f(x_0)$, there exists a neighborhood $W$ of $x_0$ such that for all $x \in W$, $f(x)$ is a subset of $V$.

A ball of radius around in is denoted by . If there is no confusion, let . The unit ball is denoted by , and therefore . -selection, a notion of the approximation of a multivalued function with a specific accuracy by a single-valued function, can be defined as follows.

###### Definition 2 (ϵ-selection).

Given multivalued functions and , if there exists a single-valued function such that , is said to be an -selection of .

If the selection is continuous then it is referred to as a continuous selection.

### II-B Convex analysis

A function is said to be lower semicontinuous (l.s.c.) at if for every , there exists a neighborhood of such that for all where , and tends to as when . In other words, .

Let be a convex function. A convex function is called proper if where the effective domain of is defined as .

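Given a function $f : X \to \mathbb{R} \cup \{-\infty, +\infty\}$, the convex conjugate (a.k.a. Legendre-Fenchel transformation) of $f$ is the function $f^* : X^* \to \mathbb{R} \cup \{-\infty, +\infty\}$, where the value at $x^* \in X^*$ is

$$
f^*(x^*) = \sup_{x \in X} \left\{ \langle x, x^* \rangle - f(x) \right\}, \tag{2}
$$

where $X^*$ is the dual space of $X$.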

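If $f$ is a convex function defined on a convex open set $X$ in $\mathbb{R}^n$, a vector $v \in \mathbb{R}^n$ is called a subgradient at $x_0 \in X$ if for any $x \in X$ one has that

$$
f(x) \ge f(x_0) + \langle v, x - x_0 \rangle. \tag{3}
$$

The set of all subgradients at $x_0$ is called the subdifferential at $x_0$, denoted by $\partial f(x_0)$. That is, $\partial f(x_0) := \{ v \in \mathbb{R}^n \mid f(x) \ge f(x_0) + \langle v, x - x_0 \rangle, \ \forall x \in X \}$.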

Now, we define a class of functions.

###### Definition 3 (Parametrized convexity).

A function $f : X \times U \to \mathbb{R}$ is said to be parametrized convex (with respect to the second argument) if for any $x \in X$, $f(x, \cdot)$ is convex.

### II-C Related works and existing universal approximators

A universal approximation theorem (UAT) [cybenkoApproximationSuperpositionsSigmoidal1989, pinkusApproximationTheoryMLP1999a] describes a kind of approximation capability of an approximator. UATs usually consider continuous functions on a compact subspace of Euclidean vector space. In this study, universal approximators for continuous functions are referred to as ordinary universal approximators, which are defined as follows.

###### Definition 4 (Ordinary universal approximator).

Given a compact subspace of , let be the collection of all continuous functions from to , and if there is no confusion. A collection of continuous functions defined on is said to be an (ordinary) universal approximator if is dense in , i.e., such that .

Examples of ordinary universal approximators include a single-hidden-layer FNN [cybenkoApproximationSuperpositionsSigmoidal1989, pinkusApproximationTheoryMLP1999a]. Another class of universal approximators for convex continuous functions, which preserves the convexity, is defined as follows.

###### Definition 5 (Convex universal approximator).

Given a compact convex subspace of , let be the collection of all convex continuous functions from to , and if there is no confusion. A collection of convex continuous functions defined on is said to be a convex universal approximator if is dense in , i.e., such that .

Examples of convex universal approximators include max-affine (MA) and log-sum-exp (LSE) networks [calafioreLogSumExpNeuralNetworks2020]. The MA network is constructed as a pointwise supremum of supporting hyperplanes of a given convex function, which are underestimators of the convex function. The LSE network is a smooth version of the MA network that replaces the pointwise supremum with a log-sum-exp operator. MA and LSE networks can be represented with some $I \in \mathbb{N}$, $a_i \in \mathbb{R}^m$, $b_i \in \mathbb{R}$ for $i = 1, \ldots, I$, and $T > 0$ as

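$$
f_{\mathrm{MA}}(u) = \max_{1 \le i \le I} \left( \langle a_i, u \rangle + b_i \right), \qquad
f_{\mathrm{LSE}}(u) = T \log \left( \sum_{i=1}^{I} \exp \left( \frac{\langle a_i, u \rangle + b_i}{T} \right) \right). \tag{4}
$$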

Hereafter, we adopt the following notation for brevity: Condition (state) and decision (action or input) variables are denoted by $x \in X$ and $u \in U$, respectively, where $X$ and $U$ denote condition and decision spaces, respectively. It is assumed that $X$ is a compact subspace of $\mathbb{R}^n$ and that $U$ is a convex compact subspace of $\mathbb{R}^m$.

## III Main Result

In this section, two parametrized convex approximators are proposed, the parametrized max-affine (PMA) and parametrized log-sum-exp (PLSE) networks, and the main results of this study are presented. The main results are: i) PMA and PLSE networks are parametrized convex universal approximators, ii) the continuous functions $a_i$ and $b_i$ in Eq. (5) can be replaced by ordinary universal approximators for practical implementation, and iii) under a mild assumption, the results also hold for conditional decision space settings, that is, PMA and PLSE are parametrized convex universal approximators even when a conditional decision space mapping is given, where the decision $u$ must be in $U(x)$ for a given condition $x$.

### III-A Proposed parametrized convex approximators

To begin, universal approximators for parametrized convex continuous functions are defined as follows.

###### Definition 6 (Parametrized convex universal approximator).

Given a compact subspace of and a compact convex subspace of , let be the collection of all parametrized convex continuous functions from to , and if there is no confusion. A collection of parametrized convex continuous functions defined on is said to be a parametrized convex universal approximator if is dense in , i.e., such that .

Let us introduce the PMA and PLSE networks, which are the generalized MA and LSE networks for parametrized convex function approximation. Each PMA and PLSE network can be represented with some $I \in \mathbb{N}$, continuous functions $a_i : X \to \mathbb{R}^m$ and $b_i : X \to \mathbb{R}$ for $i = 1, \ldots, I$, and $T > 0$ as

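$$
f_{\mathrm{PMA}}(x, u) = \max_{1 \le i \le I} \left( \langle a_i(x), u \rangle + b_i(x) \right), \qquad
f_{\mathrm{PLSE}}(x, u) = T \log \left( \sum_{i=1}^{I} \exp \left( \frac{\langle a_i(x), u \rangle + b_i(x)}{T} \right) \right), \tag{5}
$$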

where $T$ is usually referred to as the temperature. Compared to MA and LSE in Eq. (4), PMA and PLSE generalize MA and LSE to be parametrized convex by replacing network parameters $a_i$ and $b_i$ with continuous functions $a_i(x)$ and $b_i(x)$ for $i = 1, \ldots, I$. Note from Eq. (5) that PMA and PLSE networks are indeed parametrized convex.
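To make Eq. (5) concrete, the following is a minimal Python sketch of how PMA and PLSE values could be evaluated. It is not the implementation in ParametrisedConvexApproximators.jl; the function names `pma` and `plse` and the example functions in `a_fns` and `b_fns` are hypothetical and chosen only for illustration.

```python
import math

def pma(a_fns, b_fns, x, u):
    """Parametrized max-affine value: max_i (<a_i(x), u> + b_i(x))."""
    return max(
        sum(ai * uj for ai, uj in zip(a(x), u)) + b(x)
        for a, b in zip(a_fns, b_fns)
    )

def plse(a_fns, b_fns, x, u, T=1.0):
    """Parametrized log-sum-exp value with temperature T > 0."""
    z = [
        (sum(ai * uj for ai, uj in zip(a(x), u)) + b(x)) / T
        for a, b in zip(a_fns, b_fns)
    ]
    z_max = max(z)  # shift the exponents for numerical stability
    return T * (z_max + math.log(sum(math.exp(zi - z_max) for zi in z)))

# Hypothetical continuous functions a_i(x), b_i(x) for x in R^1, u in R^2, I = 3.
a_fns = [
    lambda x: [math.sin(x[0]), 1.0],
    lambda x: [-1.0, math.cos(x[0])],
    lambda x: [0.5 * x[0], -0.5],
]
b_fns = [
    lambda x: x[0] ** 2,
    lambda x: -x[0],
    lambda x: 0.0,
]

x = [0.3]
u = [0.2, -0.4]
print(pma(a_fns, b_fns, x, u), plse(a_fns, b_fns, x, u, T=0.5))
```

For a fixed `x`, both functions are convex in `u` because they are a maximum and a log-sum-exp of functions affine in `u`, regardless of how nonlinear `a_fns` and `b_fns` are in `x`.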

In the following Theorem 1, it is shown that a PLSE network can be made arbitrarily close to the corresponding PMA network with a sufficiently small temperature.

###### Theorem 1.

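Given $I \in \mathbb{N}$, $T > 0$, $a_i$, and $b_i$ for $i = 1, \ldots, I$, let $f_{\mathrm{PMA}}$ and $f_{\mathrm{PLSE}}$ be the PMA and PLSE networks constructed as in Eq. (5), respectively. Then, for all $(x, u) \in X \times U$, the following inequalities hold,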

$$
f_{\mathrm{PMA}}(x, u) \le f_{\mathrm{PLSE}}(x, u) \le T \log I + f_{\mathrm{PMA}}(x, u). \tag{6}
$$

###### Proof.

The proof is merely an extension of [calafioreLogSumExpNeuralNetworks2020, Lemma 2] to the case of parametrized convex approximators. For completeness, the proof is shown here.

It can be deduced from Eq. (5) that

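$$
\begin{aligned}
f_{\mathrm{PMA}}(x, u) &= \max_{1 \le i \le I} \left( \langle a_i(x), u \rangle + b_i(x) \right) \\
&= \max_{1 \le i \le I} T \log \left( \left( \exp \left( \langle a_i(x), u \rangle + b_i(x) \right) \right)^{1/T} \right) \\
&= T \log \left( \max_{1 \le i \le I} \exp \left( \frac{\langle a_i(x), u \rangle + b_i(x)}{T} \right) \right) \\
&\le T \log \left( \sum_{i=1}^{I} \exp \left( \frac{\langle a_i(x), u \rangle + b_i(x)}{T} \right) \right) = f_{\mathrm{PLSE}}(x, u),
\end{aligned} \tag{7}
$$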

which shows the first inequality. The second inequality can be derived as follows.

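$$
\begin{aligned}
f_{\mathrm{PLSE}}(x, u) &= T \log \left( \sum_{i=1}^{I} \exp \left( \frac{\langle a_i(x), u \rangle + b_i(x)}{T} \right) \right) \\
&\le T \log \left( I \exp \left( \max_{1 \le i \le I} \frac{\langle a_i(x), u \rangle + b_i(x)}{T} \right) \right) \\
&= T \log \left( I \left( \exp \left( f_{\mathrm{PMA}}(x, u) \right) \right)^{1/T} \right) \\
&= T \log I + f_{\mathrm{PMA}}(x, u),
\end{aligned} \tag{8}
$$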

which concludes the proof. ∎

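Theorem 1 implies that $0 \le f_{\mathrm{PLSE}}(x, u) - f_{\mathrm{PMA}}(x, u) \le T \log I$ for all $(x, u) \in X \times U$, so the PLSE network converges uniformly to the corresponding PMA network as $T \to 0^{+}$.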
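The bounds of Eq. (6) can be checked numerically. The sketch below assumes a hypothetical list `scores` holding the values $\langle a_i(x), u \rangle + b_i(x)$ at one fixed pair $(x, u)$; the numbers are illustrative and not taken from the paper.

```python
import math

def max_affine(scores):
    """PMA value at a fixed (x, u), given the I affine scores."""
    return max(scores)

def log_sum_exp(scores, T):
    """PLSE value at a fixed (x, u) with temperature T > 0 (stabilized)."""
    z = [s / T for s in scores]
    z_max = max(z)
    return T * (z_max + math.log(sum(math.exp(zi - z_max) for zi in z)))

# Hypothetical values of <a_i(x), u> + b_i(x) for I = 4 pieces.
scores = [0.3, -1.2, 0.25, 0.9]
I = len(scores)

gaps = []
bounds_hold = []
for T in (1.0, 0.1, 0.01):
    gap = log_sum_exp(scores, T) - max_affine(scores)
    gaps.append(gap)
    # Eq. (6): 0 <= f_PLSE - f_PMA <= T * log(I), up to rounding error.
    bounds_hold.append(-1e-12 <= gap <= T * math.log(I) + 1e-12)

print(gaps, bounds_hold)
```

The gap shrinks as the temperature decreases, which is the sense in which PLSE is a smoothed PMA.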

### III-B ϵ-selection of subdifferential mapping

To prove that MA and LSE are convex universal approximators, as in [calafioreLogSumExpNeuralNetworks2020], a dense sequence of points in the decision space is chosen, and corresponding subgradient vectors are chosen arbitrarily from subdifferentials at each point of the dense sequence. However, to extend MA and LSE to PMA and PLSE, that is, to prove that PMA and PLSE are parametrized convex universal approximators, the main difficulty arises from the fact that the subgradient vectors that appeared in MA and LSE become functions of the condition variable in PMA and PLSE, and therefore the subgradient vectors cannot be chosen arbitrarily. The following theorem addresses how to deal with this issue: each subdifferential mapping, a function of the condition variable, can be approximated by a continuous selection of the corresponding multivalued functions.

###### Theorem 2.

Let be a parametrized convex continuous function. Suppose for all that is -Lipschitz. Given , let be a multivalued function such that for all . Suppose that has a nonempty interior. Given a sequence , for all , there exist -selections of for all . Additionally, a sequence of -selections, , is equi-Lipschitz.

###### Proof.

See Appendix A. ∎

### III-C Universal approximation theorem

The main UAT results are provided in the following; PMA and PLSE networks can be arbitrarily close to any parametrized convex continuous functions on the product of condition and decision spaces.

###### Theorem 3 (PMA is a parametrized convex universal approximator).

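Given a parametrized convex continuous function $f : X \times U \to \mathbb{R}$, for any $\epsilon > 0$, there exists a PMA network $\hat{f}_{\mathrm{PMA}}$ such that $\| \hat{f}_{\mathrm{PMA}} - f \|_\infty < \epsilon$.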

###### Proof.

See Appendix B. ∎

###### Corollary 3.1 (PLSE is a parametrized convex universal approximator).

Given a parametrized convex continuous function , for any , there exists a positive constant such that for all , there exists a PLSE network such that .

###### Proof.

By Theorem 3, given , there exists a PMA network (with in Eq. (5)) such that . By Theorem 1, setting and letting be the corresponding PLSE network imply that

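$$
\| \hat{f}_{\mathrm{PLSE}} - f \|_\infty \le \| \hat{f}_{\mathrm{PLSE}} - \hat{f}_{\mathrm{PMA}} \|_\infty + \| \hat{f}_{\mathrm{PMA}} - f \|_\infty < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \tag{9}
$$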

for all , which concludes the proof. ∎
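The temperature requirement behind Eq. (9) can be made explicit from Theorem 1. The following worked step is a sketch assuming a PMA network with $I \ge 2$ affine pieces; the numbers in the example are illustrative only.

$$
\| \hat{f}_{\mathrm{PLSE}} - \hat{f}_{\mathrm{PMA}} \|_\infty \le T \log I \le \frac{\epsilon}{2}
\quad \Longleftrightarrow \quad
T \le \frac{\epsilon}{2 \log I}.
$$

For instance, with $I = 100$ and $\epsilon = 10^{-2}$, any temperature $T \le 0.005 / \log 100 \approx 1.09 \times 10^{-3}$ is sufficient for the first term in Eq. (9).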

### III-D Implementation guidelines

Although it is shown from Theorem 3 and Corollary 3.1 that PMA and PLSE networks have enough capability to approximate any parametrized convex continuous functions, it is hard in practice to directly find the continuous functions $a_i$'s and $b_i$'s that appear in Eq. (5). For practical implementations, one would utilize ordinary universal approximators to approximate $a_i$'s and $b_i$'s. The following theorem supports that PMA and PLSE can be constructed with ordinary universal approximators to make them practically implementable while not losing their approximation capability.

###### Theorem 4 (PMA with ordinary universal approximators is a parametrized convex universal approximator).

Let and be ordinary universal approximators on to and , respectively. Given parametrized convex continuous function , for any , there exist , , and for such that where .

###### Proof.

By Theorem 3, , such that where the PMA network is defined as in Eq. (5). Additionally, since and are ordinary universal approximators on to and , respectively, given , , for , there exist , such that and for . Then,

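$$
\| \hat{\hat{f}} - f \|_\infty \le \| \hat{\hat{f}} - \hat{f} \|_\infty + \| \hat{f} - f \|_\infty < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \tag{10}
$$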

which concludes the proof. ∎

###### Corollary 4.1 (PLSE with ordinary universal approximators is a parametrized convex universal approximator).

Let and be ordinary universal approximators on to and , respectively. Given parametrized convex continuous function , for any , there exist and such that for all , there exist and for such that where .

###### Proof.

The proof can be shown from Theorem 4 and Theorem 1 in a similar manner as the proof of Corollary 3.1 and is omitted here. ∎

In Theorems 3 and 4, it is proven that PMA and PLSE networks are shape-preserving universal approximators for parametrized convex continuous functions on the product of condition and decision spaces. In practice, the decision space may depend on a given condition. The following corollary shows that a simple modification can make the above results applicable to conditional decision space settings.

###### Corollary 4.2 (Extension to conditional decision space).

Let be a mapping of conditional decision space such that is convex compact for all . Suppose that there exists a convex compact subspace of such that . Then, Theorem 3 can be replaced by the conditional decision space setting.

###### Proof.

By Theorem 3, PMA is a parametrized convex universal approximator on , A fortiori, PMA is a parametrized convex universal approximator on from the assumption. ∎

Indeed, it is straightforward to show that this extension is also applicable to the other results, e.g., Corollary 3.1, Theorem 4, and Corollary 4.1.

###### Remark 1 (Comparison to the existing convex universal approximators).

Since the projection of convex functions is also convex, the existing convex universal approximators, e.g., MA and LSE networks [calafioreLogSumExpNeuralNetworks2020], are also applicable to decision-making problems. That is, with appropriate dimension modification, Eq. (4) changes to

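$$
f_{\mathrm{MA}}(x, u) = \max_{1 \le i \le I} \left( \langle a_i, z \rangle + b_i \right), \qquad
f_{\mathrm{LSE}}(x, u) = T \log \left( \sum_{i=1}^{I} \exp \left( \frac{\langle a_i, z \rangle + b_i}{T} \right) \right),
$$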

where $z := [x^\top, u^\top]^\top$. Examples of this approach include finite-horizon Q-learning using LSE [calafioreEfficientModelFreeQFactor2020]. Compared to this projection approach, PMA and PLSE have several advantages: i) PMA and PLSE are not required to restrict the condition space to be a convex compact space, while MA and LSE are, and ii) PMA and PLSE can be constructed by utilizing deep networks, e.g., multilayer FNN, for the $a_i$'s and $b_i$'s in Theorem 4 and Corollary 4.1. Building a network with deep networks would be practically attractive considering the successful application of deep learning in numerous fields.

###### Remark 2 (Comparison between PMA and PLSE).

As LSE networks can be viewed as smoothed MA networks by replacing the pointwise supremum with the log-sum-exp operator, PLSE networks can likewise be viewed as smoothed PMA networks with respect to the decision variable $u$. The choice of networks may depend on tasks, domain knowledge, convex optimization solvers, etc.

###### Remark 3 (Data normalization).

Normalization of data may be critical for the training and inference performance of PLSE. Note that LSE can also be normalized by its temperature parameter as if all LSE networks have the same temperature, usually set to be one, i.e., [calafioreLogSumExpNeuralNetworks2020]. It should be pointed out that the temperature normalization is also applicable for PLSE in the same manner as in LSE.

## IV Numerical Simulation

In this section, a numerical simulation of the function approximation is performed to demonstrate the proposed approximators’ approximation capability, optimization accuracy, and solving time of optimization for a given condition to perform decision-making. For the simulation of the proposed parametrized convex approximators, a Julia [bezansonJuliaFreshApproach2017] package, ParametrisedConvexApproximators.jl, is developed in this study. All simulations were performed on a desktop with an AMD Ryzen™ 9 5900X and Julia v1.7.1.

Several approximators will be compared: FNN, MA, LSE, PICNN, PMA, and PLSE. FNN is the most widely used class of neural networks and has been proven to be an ordinary universal approximator for certain architectures. MA and LSE are convex universal approximators [calafioreLogSumExpNeuralNetworks2020]. PICNN is a parametrized convex approximator proposed for decision-making problems, mainly motivated by energy-based learning to perform inference via optimization [amosInputConvexNeural2017a]. The main characteristics of PICNN include its deep architecture recursively constructed with two paths, one driven by the condition variable and one by the decision variable. Note that it has not been shown that PICNN is a parametrized convex universal approximator.

The following target function is chosen as a parametrized convex function,

$$
f(x, u) = -\frac{1}{2n} x^\top x + \frac{1}{2m} u^\top u. \tag{11}
$$

Figure 1 shows the target function for $(n, m) = (1, 1)$.
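The target function in Eq. (11) is convex in $u$ and concave in $x$, so it is parametrized convex but not jointly convex. A small Python sketch illustrates this with a midpoint test; the helper `midpoint_gap` and the sample points are assumptions introduced here for illustration.

```python
def target(x, u):
    """Target function of Eq. (11): -x'x / (2n) + u'u / (2m)."""
    n, m = len(x), len(u)
    return -sum(xi * xi for xi in x) / (2 * n) + sum(ui * ui for ui in u) / (2 * m)

def midpoint_gap(g, p, q):
    """Return 0.5 * (g(p) + g(q)) - g((p + q) / 2); nonnegative when g is convex."""
    mid = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * (g(p) + g(q)) - g(mid)

x0 = [0.3, -0.8]        # fixed condition, n = 2
u0 = [0.1, 0.4, -0.6]   # fixed decision, m = 3

# Convexity in u for fixed x: the gap is positive.
gap_in_u = midpoint_gap(lambda u: target(x0, u), [1.0, -1.0, 0.5], [-0.5, 0.2, 0.9])
# Concavity in x for fixed u: the gap is negative, so f is not convex in x.
gap_in_x = midpoint_gap(lambda x: target(x, u0), [1.0, -1.0], [-0.5, 0.2])

print(gap_in_u, gap_in_x)
```

For any fixed $x$, the unconstrained minimizer over $u$ is $u = 0$ with optimal value $-x^\top x / (2n)$, which gives a known reference for minimizer and optimal value errors.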

Data points are uniformly randomly sampled within the box or hypercube for high-dimensional cases of , where denotes the number of data points. The data points are split into two groups, train and test data, in a ratio of :. Each approximator is trained via supervised learning with the ADAM optimizer
[kingmaAdamMethodStochastic2017]. Simulation settings are summarized in Table I. FNN is constructed as a fully connected feedforward neural network with input nodes, hidden layer nodes (denoted as hidden layer width in Table I), and output node. For PMA and PLSE, a fully connected feedforward neural network is constructed with input nodes, hidden layer nodes, and output nodes. The output of the feedforward neural network is split into matrices whose sizes are and , respectively. The -th column of each matrix corresponds to and in Eq. (5), for . For the construction of PICNN, - and -paths’ hidden layer nodes are used [amosInputConvexNeural2017a]. Each simulation is performed with various input-output dimensions of . Note that is for 3D visualization, and others are borrowed from OpenAI gym environments [brockmanOpenAIGym2016] considering future applications to high-dimensional dynamical systems: and from the dimensions of observation and action spaces of HandManipulateBlock-v0 and Humanoid-v2 from OpenAI gym, respectively. Note that HandManipulateBlock-v0 is an environment for guiding a block on a hand to a randomly chosen goal orientation in three-dimensional space, and Humanoid-v2 is an environment for making a three-dimensional bipedal robot walk forward as fast as possible without falling over. From the fact that the observation and action spaces of most dynamical systems range in dimension from and to and , respectively, each dynamical system represents the dynamical system with i) medium- and high-dimensional state and action variables and ii) high- () and slightly high-dimensional state and action variables.
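The output-splitting step for PMA and PLSE can be sketched as follows. This is an illustration under assumptions that the source does not state explicitly: the network is taken to have $I(m+1)$ outputs, split into an $m \times I$ block (columns $a_i(x)$) and a $1 \times I$ block (entries $b_i(x)$), with the $a_i$ entries stored first. The name `split_output` and the storage order are hypothetical.

```python
def split_output(y, m, I):
    """Split a flat network output of length I*(m+1) into a_i(x) and b_i(x).

    Assumed layout: the first m*I entries hold the columns a_1, ..., a_I,
    and the last I entries hold b_1, ..., b_I.
    """
    if len(y) != I * (m + 1):
        raise ValueError("expected I * (m + 1) network outputs")
    a = [y[i * m:(i + 1) * m] for i in range(I)]  # i-th column of the m-by-I block
    b = y[m * I:]                                  # entries of the 1-by-I block
    return a, b

# Example with m = 2 decision dimensions and I = 3 affine pieces.
a, b = split_output([0, 1, 2, 3, 4, 5, 6, 7, 8], m=2, I=3)
print(a, b)  # [[0, 1], [2, 3], [4, 5]] [6, 7, 8]
```

With this layout, the number of trainable parameters is governed by the hidden layers of the network rather than by $I$ alone.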

Figure 2 shows the illustration of trained approximators, only for $(n, m) = (1, 1)$ for 3D visualization. As shown in Figure 2, most approximators can approximate the given target function, which is not convex but parametrized convex, as its dimension is very low, while MA and LSE cannot because they can represent only convex functions. Note that MA and PMA have piecewise-linear parts, especially along the $u$-axis, due to their maximum operator and linear combination. Compared to others, FNN, PICNN, and PLSE approximate the target function smoothly. This low-dimensional visualization indicates that MA and LSE may be restrictive for function approximation of a larger class of functions, e.g., parametrized convex functions, and that FNN, PICNN, PMA, and PLSE can approximate a smooth target function very well.

Numerical optimization is performed with the trained approximators to verify that the proposed approximators are well suited for optimization-based inference considering various pairs. The condition variables in the test data (unseen data) are used for the numerical optimization. For the decision variable optimization, a box constraint is imposed on the data point sampling, that is, . For (parametrized) convex approximators, a splitting conic solver (SCS) convex optimization solver is used, SCS.jl v0.8.1 [scs], with a disciplined convex programming (DCP) package, Convex.jl v0.14.18 [convexjl]. For nonconvex approximators, FNN in this case, the interior-point Newton method is used for optimization using Optim.jl v1.5.0 [kmogensenOptimMathematicalOptimization2018].
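The decision step, minimizing a trained PLSE network over a box for a fixed condition, can be sketched in a few lines. The paper uses SCS through Convex.jl; the sketch below swaps in plain projected gradient descent only to keep the example dependency-free. The values in `a` and `b` stand for hypothetical network outputs $a_i(x)$ and $b_i(x)$ at one fixed $x$, and the step size and iteration count are illustrative choices.

```python
import math

def plse_and_grad(a, b, u, T):
    """Value and gradient in u of T * log(sum_i exp((<a_i, u> + b_i) / T))."""
    z = [(sum(p * q for p, q in zip(ai, u)) + bi) / T for ai, bi in zip(a, b)]
    z_max = max(z)
    w = [math.exp(zi - z_max) for zi in z]
    s = sum(w)
    value = T * (z_max + math.log(s))
    # The gradient is the softmax-weighted average of the vectors a_i.
    grad = [sum(wi / s * ai[j] for wi, ai in zip(w, a)) for j in range(len(u))]
    return value, grad

def minimize_over_box(a, b, lo, hi, T=0.1, step=0.05, iters=2000):
    """Projected gradient descent on the box lo <= u <= hi (elementwise)."""
    u = [0.5 * (l + h) for l, h in zip(lo, hi)]  # start at the box center
    for _ in range(iters):
        _, g = plse_and_grad(a, b, u, T)
        u = [min(max(uj - step * gj, l), h) for uj, gj, l, h in zip(u, g, lo, hi)]
    return u, plse_and_grad(a, b, u, T)[0]

# Hypothetical a_i(x), b_i(x) at a fixed condition x (I = 4, m = 2).
a = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [-0.3, 0.3, 0.0, 0.0]
u_star, f_star = minimize_over_box(a, b, lo=[-1.0, -1.0], hi=[1.0, 1.0])
print(u_star, f_star)
```

Because the objective is convex in $u$, any such first-order or conic method converges to a global minimizer over the box; for this symmetric example the minimizer is $u = (0.3, 0)$ with value $T \log 4$.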

Figure 3 shows the violin plot of the 2-norm of minimizer error and the absolute value of the optimal value error. FNN shows precise minimizer and optimal value estimation performance for the low-dimensional case of $(n, m) = (1, 1)$ except for a few cases. However, FNN’s estimation performance degrades significantly for the two higher-dimensional cases. Due to its nonconvexity, the optimization solver finds poor minimizers and optimal values for high-dimensional cases. MA and LSE show slightly poor estimation performance. Given that they show poor approximation capability for parametrized convex functions, as shown in Figure 2, if the target function were not symmetric, the minimizer and optimal value estimation performance would be worse. PICNN’s performance seems better as the dimension increases, but for all cases, including the high-dimensional ones, the performance of PLSE is superior to that of PICNN. Note that MA and PMA show worse minimizer and optimal value estimation performance compared to LSE and PLSE in most cases, respectively, due to their inherited nonsmoothness from piecewise-linear construction. It should be pointed out that PLSE shows the near-minimum error in terms of both minimizer error and optimal value error.

Table II shows the mean values of solving time for optimization, minimizer error, and optimal value error, averaged over test data (unseen data). For the low-dimensional case of $(n, m) = (1, 1)$, the mean solving times of FNN and MA are the smallest. As the dimension increases, most approximators’ solving times are not scalable; however, PLSE shows scalable solving time for the two higher-dimensional cases. This tendency is also similar for minimizer and optimal value errors. For most cases, PLSE shows the near-minimum mean values of minimizer and optimal value errors. It should be clarified in this simulation study that all approximators have similar hyperparameters of network architecture although similar hyperparameter settings do not imply a similar number of network parameters for each approximator. For example, the same $I$ for MA, LSE, PMA, and PLSE gives very different numbers of network parameters. The reason is related to how convex optimization solvers and DCP packages work: Most DCP packages generate slack variables and related constraints to transform the original problem into a corresponding conic problem. For instance, consider the following optimization problem:

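$$
\begin{aligned}
\text{Minimize} \quad & \max_{1 \le i \le I} u_i \\
\text{subject to} \quad & u_{\min} \le u \le u_{\max},
\end{aligned} \tag{12}
$$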

where $u \in \mathbb{R}^I$ is the optimization variable, and the inequality constraint is elementwise. Then, the above problem can be transformed into an equivalent problem as

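$$
\begin{aligned}
\text{Minimize} \quad & t \\
\text{subject to} \quad & u_{\min} \le u \le u_{\max}, \quad u \le t \mathbf{1},
\end{aligned} \tag{13}
$$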

where $\mathbf{1}$ denotes a vector whose elements are one. Similarly, in the case of MA, it generates one slack variable with $I$ additional constraints. For example, to make MA have a similar number of network parameters in the above simulation, $I$ would have to be increased substantially, which significantly slows the solver. In this regard, MA and LSE cannot easily increase the number of network parameters because the number of their network parameters depends only on $I$. In contrast, PMA and PLSE can have many network parameters while fixing $I$, which maintains the sufficiently fast solving time.
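For the MA case, the same slack-variable idea leads to the standard epigraph form written out below. This is a sketch of the textbook reformulation, with $a_i$ and $b_i$ as in Eq. (4); the exact problem generated by a particular DCP package may differ in detail.

$$
\begin{aligned}
\text{Minimize} \quad & t \\
\text{subject to} \quad & u_{\min} \le u \le u_{\max}, \\
& \langle a_i, u \rangle + b_i \le t, \quad i = 1, \ldots, I.
\end{aligned}
$$

The problem has one extra variable $t$ and $I$ extra linear inequalities, so the size of the conic problem grows with $I$ but not with the number of parameters used to produce $a_i$ and $b_i$.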

In summary, the numerical simulation results support that PLSE can approximate a given target function in low- to high-dimensional cases with small minimizer and optimal value errors and fast solving time compared to other existing approximators, including an ordinary universal approximator (FNN), convex universal approximators (MA and LSE), and a parametrized convex approximator (PICNN).

## V Conclusion

In this paper, parametrized convex approximators were proposed, namely, the parametrized max-affine (PMA) and parametrized log-sum-exp (PLSE) networks. PMA and PLSE generalize the existing convex universal approximators, max-affine (MA) and log-sum-exp (LSE) networks, respectively, for approximation of parametrized convex functions by replacing network parameters with continuous functions. It was proven that PMA and PLSE are shape-preserving universal approximators, i.e., they are parametrized convex and can approximate any parametrized convex continuous function with arbitrary precision on any compact condition space and compact convex decision space. To show the universal approximation theorem of PMA and PLSE, the continuous functions replacing network parameters of MA and LSE were constructed by continuous selection of multivalued subdifferential mappings.
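As a rough illustration of the construction summarized above, the following sketch evaluates a PMA-style and a PLSE-style function in which the slopes and offsets of the affine pieces depend continuously on the condition variable. Everything concrete here is an assumption for illustration: the map `params` is a hand-written stand-in for a neural network, the temperature `T` and the form $T \log \sum_i \exp(\cdot / T)$ follow the usual LSE-network convention, and the numerical values are arbitrary.

```python
import math

I, T = 3, 0.1  # number of affine pieces and LSE temperature (illustrative values)


def params(x):
    """Hypothetical continuous maps x -> (a_i(x), b_i(x)); a stand-in for a network."""
    a = [[math.sin(x + i), math.cos(2.0 * x - i)] for i in range(I)]
    b = [0.5 * x * i - 1.0 for i in range(I)]
    return a, b


def affine_pieces(x, u):
    """Values a_i(x)^T u + b_i(x) for i = 1, ..., I."""
    a, b = params(x)
    return [sum(aij * uj for aij, uj in zip(a_i, u)) + b_i for a_i, b_i in zip(a, b)]


def pma(x, u):
    """Max-affine in u for each fixed x."""
    return max(affine_pieces(x, u))


def plse(x, u):
    """Log-sum-exp of the same pieces, smooth and convex in u for each fixed x."""
    z = [v / T for v in affine_pieces(x, u)]
    z_max = max(z)  # shift before exponentiating for numerical stability
    return T * (z_max + math.log(sum(math.exp(v - z_max) for v in z)))


# Numerical spot checks at one condition value x (not a proof of convexity).
x = 0.7
u1, u2 = [1.0, -2.0], [-0.5, 1.5]
u_mid = [0.5 * (p + q) for p, q in zip(u1, u2)]
convex_pma = pma(x, u_mid) <= 0.5 * (pma(x, u1) + pma(x, u2)) + 1e-12
convex_plse = plse(x, u_mid) <= 0.5 * (plse(x, u1) + plse(x, u2)) + 1e-12
# Standard bound: max <= T * logsumexp(./T) <= max + T * log(I).
sandwich = pma(x, u1) - 1e-12 <= plse(x, u1) <= pma(x, u1) + T * math.log(I) + 1e-12
```

In this sketch the number of pieces `I` stays fixed while `params` could be made arbitrarily expressive, which mirrors the point that the parameter count of the parametrized approximators is not tied to `I` alone.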

The results of the numerical simulation show that PLSE can approximate a target function even in high-dimensional cases. From the conditional optimization tests performed with unseen condition variables, PLSE’s minimizer and optimal value errors are small, and the solving time is scalable, compared to existing approximators including ordinary and convex universal approximators as well as a practically suggested parametrized convex approximator.

The proposed parametrized convex universal approximators, such as PLSE, may be useful for i) decision-making problems, e.g., continuous-action reinforcement learning, and ii) applications to differentiable convex programming. Future research directions may include i) surrogate model approaches that extend PMA and PLSE to the approximation of functions that are not parametrized convex and ii) various applications to decision-making with dynamical systems, including aerospace and robotic systems.

### Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1A2C208394612).

## Appendix A Proof of Theorem 2

The proof of Theorem 2 relies on the following lemmas.

###### Lemma A.1.

The pointwise supremum of a collection of l.s.c. functions is l.s.c.

###### Proof.

Let be a collection of l.s.c. functions on to . Let be a pointwise supremum function of , i.e., . Given , for any , there exists an l.s.c. function such that . Since is l.s.c., there exists a neighborhood of such that . Hence, , which implies that is l.s.c. ∎
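The argument of this proof can be written out as follows. The symbols $f_\alpha$, $f$, $x_0$, $y$, and $N$ are notation assumed here for illustration, and $f(x_0)$ is taken to be finite.

$$
\begin{aligned}
& f(x) := \sup_{\alpha} f_\alpha(x), \qquad \text{fix } x_0 \text{ and } \epsilon > 0, \\
& \exists\, \alpha : \quad f_\alpha(x_0) > f(x_0) - \epsilon
  && \text{(definition of the supremum)}, \\
& \exists\, N \ni x_0 \text{ open} : \quad f_\alpha(y) > f(x_0) - \epsilon \quad \forall y \in N
  && \text{(} f_\alpha \text{ is l.s.c.)}, \\
& f(y) \ge f_\alpha(y) > f(x_0) - \epsilon \quad \forall y \in N
  && \text{(hence } f \text{ is l.s.c. at } x_0 \text{)}.
\end{aligned}
$$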

###### Lemma A.2.

Let be a parametrized convex continuous function. Let for any . Then, a mapping is l.s.c. on .

###### Proof.

Fix . From the continuity of , given , , such that where . Choose . Given ,

 (A.1)

for all such that , which implies . That is, is continuous for any . A fortiori, it is l.s.c. By Lemma A.1, the mapping is l.s.c. on . ∎

###### Lemma A.3.

Suppose that all assumptions of Theorem 2 hold. Then, given , is a nonempty-convex-valued u.h.c. multivalued function.

###### Proof.

First, we show that is closed for any . Given , it is sufficient to show that , as where . By [rockafellarConvexAnalysis1970, Theorem 23.5], the following relation holds,

$$
u_i^* \in \Gamma_u(x_i) = \partial f_{x_i}(u) \iff \langle u, u_i^* \rangle \ge f_{x_i}(u) + f_{x_i}^*(u_i^*). \tag{A.2}
$$

By Lemma A.2 and the continuity of , taking implies that

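$$
\langle u, u^* \rangle = \liminf_{i \to \infty} \langle u, u_i^* \rangle \ge \liminf_{i \to \infty} \left( f_{x_i}(u) + f_{x_i}^*(u_i^*) \right) \ge f_x(u) + f_x^*(u^*) \iff u^* \in \partial f_x(u) = \Gamma_u(x). \tag{A.3}
$$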

That is, is closed. Furthermore, given , is nonempty and closed convex by definition for all [rockafellarConvexAnalysis1970, Theorem 23.4]. Since is -Lipschitz for all , the codomain of is compact. By the converse statement of [aubinSetValuedAnalysis2009, Proposition 1.4.8], is u.h.c. for any . Therefore, is a nonempty-convex-valued u.h.c. multivalued function. ∎

Now, we are ready to prove Theorem 2.

###### Proof of Theorem 2.

Note that the following proof is based on [aubinSetValuedAnalysis2009, Theorem 9.2.1] and [aubinDifferentialInclusionsSetValued1984, Theorem 2, Section 0]. Fix . Given , since is u.h.c. by Lemma A.3, for every , there exists such that , . The collection of balls covers . Since is compact, there exists a finite sequence of indices such that the collection of balls also covers . We set and take a locally Lipschitz partition of unity subordinated to this covering [aubinDifferentialInclusionsSetValued1984, Theorem 2, Section 0]. That is, is a locally Lipschitz function such that vanishes outside of , , and , . Since is compact, a locally Lipschitz function is in fact Lipschitz on . Note that , and the Lipschitz constant of (namely, ) is independent of . Let us associate with every a point and define the map as , which is an -selection of , whose values are in the convex hull of the image of [aubinSetValuedAnalysis2009, Theorem 9.2.1]. Then, for all ,

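$$
\left\| \hat{u}^*_{\epsilon,i}(x) - \hat{u}^*_{\epsilon,i}(x') \right\| = \left\| \sum_{j \in J} \left( a_j(x) - a_j(x') \right) y_{i,j} \right\| \le \sum_{j \in J} \left\| \left( a_j(x) - a_j(x') \right) y_{i,j} \right\| \le \sum_{j \in J} \left| a_j(x) - a_j(x') \right| \left\| y_{i,j} \right\|, \tag{A.4}
$$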

and since $\| y_{i,j} \| \le L$ due to the Lipschitzness assumption,

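$$
\left\| \hat{u}^*_{\epsilon,i}(x) - \hat{u}^*_{\epsilon,i}(x') \right\| \le \sum_{j \in J} \left| a_j(x) - a_j(x') \right| \left\| y_{i,j} \right\| \le L \sum_{j \in J} \left| a_j(x) - a_j(x') \right| \le \tilde{L} \left\| x - x' \right\|, \tag{A.5}
$$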

where , . Hence, is an equi-Lipschitz sequence of -selections of . ∎

## Appendix B Proof of Theorem 3

As in Appendix A, the proof of Theorem 3 relies on the following lemmas.

###### Lemma B.1.

Let be a metric space. Let be a dense subset of . Let be a Banach space. Let be a sequence of continuous functions . If uniformly on , then uniformly on as well.

###### Proof.

From the uniform convergence of , for all , there exists such that on , . Then, for all , the following inequality holds,

 ∥fn−fm∥∞≤∥fn−f∥∞+∥f