Balanced Identification as an Intersection of Optimization and Distributed Computing

07/31/2019 · Alexander Sokolov et al., Institute for Problems of Information Transmission

A technology for formal quantitative estimation of the conformity of mathematical models to an available dataset is presented. The main purpose of the technology is to simplify the model selection decision-making process for the researcher. The technology combines approaches from data analysis, optimization, and distributed computing, including cross-validation and regularization methods, algebraic modelling in optimization, methods of optimization, automatic discretization of differential and integral equations, and optimization REST services. The technology is illustrated by a demo case study. A general mathematical formulation of the method is presented, followed by a description of the main aspects of the algorithmic and software implementation. The list of successful applications of the presented approach is already rather long; nevertheless, its domain of applicability and important unresolved issues are discussed.


1 Introduction

The same object, process, or phenomenon can be described by different mathematical models. Model selection among possible candidates remains one of the important problems in applied mathematics. The complexity, dimension, and level of detail of a mathematical description are defined both by the qualitative knowledge of the object (which can be formalized as mathematical statements) and by the availability of quantitative data, their volume, detail, reliability, and accuracy. The more complex the phenomenon under study and the corresponding mathematical model, the more detailed and reliable the measurements should be. Usually, to find a balance between model complexity and the available measurements, a straightforward approach is used: various models and datasets are tested until the resulting model meets the goals of the research. Still, there is no universally accepted approach here. In each case, various parametrizations, different identification and verification procedures, and a variety of software are used. In addition, the model selection criteria are not formalized, and the choice of a particular model is often made subjectively.

What we need is a technology that simplifies model selection with less researcher participation, and a unified method giving a quantitative estimate of the conformity of the mathematical description to the available dataset for each model under study.

This paper presents a method of balanced identification (also referred to as the SvF-method, simplicity vs. fitting) as a candidate satisfying the above requirements. Besides the formal description, a software implementation of the SvF-method, called SvF-technology, is presented. Although the method is rather mature and has been successfully used in a number of studies over the past few years, its general mathematical scheme has been published only recently [1]. The basis of the SvF-method is nonparametric identification with regularization [2] and the well-known cross-validation modelling error [3, 4, 5]. Regularization makes it possible to search for the desired balance between simplicity and fitting to the dataset. The cross-validation error is used as a quantitative estimate of the model.

In general, the SvF-method may be applied to both parametric and nonparametric identification (where the model under study contains unknown functions). In this paper, nonparametric models are considered. Depending on the class of models under study, the variational problems arising here can be interpreted as problems of variational spline approximation [6], nonparametric regression [7], predictive modeling [5], machine learning [4], and others.

One should say that in the realm of nonlinear regression there exists a trend to create algorithms for the "automatic" composition of the best regression function from a predefined set of basic functions [8]. There, the object under study is considered a black box, and the best regression remains a purely phenomenological description. SvF-technology is not a fully automated model selection routine. It enables quantitative comparison of a given set of models, but it is the researcher who makes modifications to the model (e.g. includes/excludes model constraint equations). Moreover, the SvF-method is an attempt to select among models describing the internal structure of the phenomenon, e.g. via differential equations. In some sense, the SvF-method generalizes the concept of constrained regression. A rather popular example is monotonic regression [9], where a monotonic response function has to be found.

For a given model, the SvF-method requires solving a bilevel optimization problem. Here, at the lower level, a number of independent mathematical programming problems are solved to obtain the value of the objective function of the upper-level problem. As a result, for the given model, optimal values of the cross-validation error, the mean square error, and the optimal regularization coefficients are obtained. Bilevel optimization is a well-known hard challenge for numerical methods [10]. The current implementation of SvF is based on an approach similar to surrogate optimization and on ideas of the Pshenichny-Danilin linearization method [11].

Because at the lower level we have a number of independent problems, it is possible to increase the performance of the technology using distributed computing. One should mention that optimization in distributed computing environments is already used in regression analysis, e.g. [12]. In SvF-technology, the basic tool for solving independent mathematical programming problems in parallel is the Everest optimization service [13, 14] based on the Everest toolkit [15], http://everest.distcomp.org.

The paper is organized as follows. The next section is a demonstrative example of applying SvF-technology to the modelling of an oscillator with friction. Section 3 contains the mathematical formulation of the basic concepts of the SvF-method, including the formulation of the main bilevel optimization problem. Section 4 presents the software implementation of the SvF-method. Its subsections present the main aspects of the implementation: usage of the Pyomo package (Python Optimization Modelling Objects), http://www.pyomo.org; a special symbolic notation that simplifies the formulation of the model and the regularization rule; discretization of differential and integral equations if they are contained in the model; and usage of the Everest optimization service. Successful uses of SvF-technology in different studies are presented in Section 5. Some important open issues related to the current implementation of SvF-technology are discussed in Section 6, followed by Acknowledgments.

2 Demonstrative example

Before the formal description of the method, it is worth considering a demonstrative case of its usage for a classical damped oscillator described by the following function of time:

(1)

where x(t) is the trajectory of oscillation, and the parameters are the elastic coefficient, the friction factor, and the initial displacement of the oscillator.

Assume that we do not know expression (1) in advance. What we have is a dataset (a measurement series) on a time interval, indexed by a finite set of measurement numbers:

(2)

where the measurement noise is a random error with zero mean and variance 0.1. The data "distorted" in this way are presented in Fig. 1. Hereinafter, the following notation for the intervals of possible values of x and v will be used (each interval is chosen with some margin):

(3)
Figure 1: Raw data of demo example
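For concreteness, the setup of this demo can be reproduced in a few lines of Python. The snippet below is a hedged sketch: the noise variance 0.1 comes from the text above, while the coefficient values k, mu, x0 and the sampling grid are purely illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters of the damped oscillation (1): elastic coefficient,
# friction factor and initial displacement. The values are illustrative only.
k, mu, x0 = 9.0, 0.5, 2.0
t = np.arange(0.0, 2.5, 0.05)                            # assumed measurement grid
x_true = x0 * np.exp(-mu * t) * np.cos(np.sqrt(k) * t)   # a standard damped-oscillation form
# Measurement series (2): the true trajectory plus noise with variance 0.1.
x_meas = x_true + rng.normal(0.0, np.sqrt(0.1), size=t.shape)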

Let us pose the problem of determining the equation of motion (1) by the method of balanced identification from the given measurements (2). To do this, we consider several mathematical models and obtain estimates of their errors, which will be used as a criterion when comparing the models.

One should say that all subsequent models of the oscillator produce variational optimization problems. Direct solving of such problems is challenging because off-the-shelf optimization solvers (at least those which are stable and "mature") cannot handle integral and differential equations. To obtain a numerical solution, all these problems will be replaced with corresponding finite-dimensional mathematical programming problems. To do that, discretization of the differential and integral equations is applied (see details in Section 4).

Model 0. Spline approximation of the function x(t).

Take the simplest model, a twice differentiable function:

(4)

and consider the following optimization problem (with respect to the variable x) depending on a regularization coefficient:

(5)

It is known [6, 4] that for any fixed value of the regularization coefficient, problem (5) has a unique solution, which is a cubic spline for the given set of points. It is presented in Fig. 2(c) and looks rather reasonable.

Figure 2(a) (a linear function) corresponds to the case of a very large coefficient, when the second (regularization) term of objective function (5) suppresses the first one and the optimal solution tends to the linear regression given by the least squares method (too rough an approximation). Case (b) corresponds to a zero coefficient, and we have the problem of spline interpolation: find a curve with minimal integral curvature passing exactly through the given points (obvious overfitting).

It is required to find the "best" value of the regularization coefficient at which the "smooth" model function passes "close enough" to the measurements, smoothing out random errors (as in Fig. 2(c)). The choice of an optimal coefficient can be made by minimizing the error estimate on the basis of the cross-validation procedure [3, 4, 5].

Figure 2: Three variants of data approximation by a function: (a) too large a coefficient (linear regression), (b) zero coefficient (spline interpolation), (c) balanced solution.

The simplest variant of the cross-validation (CV) procedure (leave-one-out) is to use a one-point test sample and a training sample consisting of the remaining measurements (set (2) with one element removed). Formally, for a fixed regularization coefficient and for every measurement index (see (6)): find a solution of the following optimization problems, calculate the approximation error on the test samples and, finally, obtain the average CV-error (which depends on the coefficient):

(6)

The best coefficient value minimizes the CV-error:

(7)

So, the optimal coefficient is a solution of a bilevel optimization problem: (7) at the upper level and independent problems (6) at the lower level. The resulting value is called the modelling CV-error. It will be used as a quantitative measure of the model's correspondence to the available experimental data:

(8)

Finally, the optimal coefficient gives the sought model function and its mean square error:

(9)

The function obtained for Model 0 is shown in Fig. 3 (the same as in Fig. 2(c)).

Figure 3: Model 0. SvF-approximation as cubic spline approximation
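Model 0 can be mimicked with off-the-shelf tools: criterion (5) is precisely the penalized smoothing-spline objective, and the loop (6)-(7) is leave-one-out cross-validation over the regularization weight. The sketch below assumes SciPy (version 1.10 or later, for make_smoothing_spline) and the hypothetical (t, x_meas) arrays from the snippet above; it is a simplified stand-in for the actual SvF pipeline, which solves these problems as NLPs on a server.

import numpy as np
from scipy.interpolate import make_smoothing_spline

def loo_cv_error(t, x, lam):
    # CV-error (6): refit on all points but one, test on the left-out point.
    errs = []
    for i in range(len(t)):
        mask = np.arange(len(t)) != i
        spl = make_smoothing_spline(t[mask], x[mask], lam=lam)
        errs.append(float(spl(t[i]) - x[i]) ** 2)
    return float(np.mean(errs))

lams = np.logspace(-6, 1, 15)                      # candidate regularization weights
cv = [loo_cv_error(t, x_meas, lam) for lam in lams]
lam_best = lams[int(np.argmin(cv))]                # the minimizer, cf. (7)
model0 = make_smoothing_spline(t, x_meas, lam=lam_best)   # final fit, cf. (9)
print(f"best lam = {lam_best:.2e}, CV-error = {min(cv):.4f}")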

Cubic spline approximation is the simplest form of nonparametric regression (modelling), but other nonparametric models may be considered. Because the object under study seems to be a dynamical one, the following models will include differential equations with unknown characteristics.

Model 1. 1st order differential equation.

Apply the SvF-method to a model with two unknown functions related by the following first-order differential equation (see the definition of the intervals in (3)):

(10)

In this case, it is reasonable to characterize the complexity of the model by the smoothness of the right-hand-side function responsible for the dynamics, not of the trajectory itself. The main criterion (with a regularization term) will be the following (hereinafter we use the same notation, but with another argument list):

(11)

Now, the search for a "balanced" solution for the measurement data (2) reduces to the search for the coefficient (in (11)) that minimizes the value of the following CV-error:

(12)

and the corresponding solution is determined by the following problem:

(13)

The mean square error is calculated as in (9). See the error values in the 2nd row of Table 1 and the plot of the resulting function in Fig. 4. All of them are obviously worse than those given by Model 0.

Figure 4: Model 1. 1st order ODE: optimal solution and raw data.

Replacing Model 0 with Model 1 (based on a solution of ODE (10) instead of an arbitrary smooth function) reduces the possibility of describing the measurements (2), because the set of feasible solutions of equation (10) is too narrow. Indeed, it is easy to see that only a monotonic function may be a solution of (10), which is obviously insufficient for approximating the raw data in Fig. 1. This is the reason for the increase of the CV-error and the mean square error in the 2nd row of Table 1 relative to those in the 1st row. We conclude that Model 1 is worse than Model 0.

Model 2. 2nd order differential equation.

Let us take a model with three unknown functions related by the following 2nd order ODE:

(14)

The interval V is wide enough to hold all possible values of the velocity.

Now, the regularization term of the SvF-criterion includes two coefficients and the partial derivatives of the right-hand-side function:

(15)

In this case, the SvF-method finds two optimal coefficients minimizing the CV-error, which is calculated similarly to equation (12). The calculation of solutions, the CV-error, and the mean square error is similar to that used for Models 0 and 1 (see equations (13), (9)). The error values are presented in Table 1; contour levels of the identified right-hand-side function and the phase trajectory are shown in Fig. 5. The plot of the trajectory is skipped, because it is almost the same as in Fig. 3 for Model 0.

Let us draw conclusions from the results obtained above:

  1. Errors for Model 2 are almost the same as for Model 0; thus, the transition from an arbitrary smooth function to a solution of ODE (14) does not reduce the capability of describing the raw data (2);

  2. The level lines in Fig. 5 are parallel and equidistant, so the graph of the right-hand-side function is a plane, i.e. it is an affine function (a linear function plus a constant).

Model 3. Oscillator with friction.

The affinity of the right-hand side permits refining ODE (14):

(16)

where the scalar coefficients are unknown parameters to be identified from the raw data (2). Although this parametrization has been chosen "keeping in mind" equation (1), it is equivalent to the general form of an affine function.

The results of the SvF-method for Model 3 are presented in Table 1. One can see a substantially smaller predictive (CV) error than for the other models, even though the plot of the resulting function is almost equivalent to that of Fig. 3.

The numerical results presented in Table 1 illustrate the proposed methodology of quantitatively comparing successive model refinements. If the CV-error decreases after a modification, then "we are on the right path". So, Model 2 is better than Model 0, and Model 3 is better than Model 2; but replacing Model 0 with Model 1 was wrong.

The mean square error is another quantitative characteristic of the model, a measure of its "rigidity". Indeed, acceptance of "right" assumptions about the model reduces the domain of feasible solutions. This means a refinement of the model and a change of the mean square error (see its changes in Table 1). Ideally, it has to be equal to the raw data measurement error.

The calculations lead to the conclusion that Model 3 (an oscillator with viscous friction) fits the available data best. Indeed, formula (1), which was used to generate the data, is one of the solutions of the equation of motion (16), a well-known model of an oscillator with friction.
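Continuing the hypothetical snippets above: once affinity of the right-hand side is accepted, the two coefficients of Model 3 can be estimated even by ordinary least squares on the derivatives of the Model 0 spline. This is only a sanity-check sketch, not the paper's procedure, which re-runs the full SvF identification for (16).

# Fit x'' ~ a*x + b*x' using derivatives of the smoothing spline model0.
xs = model0(t)
vs = model0.derivative(1)(t)          # first derivative of the fitted BSpline
accs = model0.derivative(2)(t)        # second derivative
A = np.column_stack([xs, vs])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, accs, rcond=None)
print(f"estimated right-hand side: f(x, v) = {a_hat:.3f}*x + {b_hat:.3f}*v")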

Figure 5: Model 2. Contour levels of the identified right-hand-side function and the phase trajectory.
No | Model | CV-error | Mean square error
0 | A function; spline approximation | 0.1144 | 0.0907
1 | 1st order differential equation | 0.2093 | 0.1869
2 | 2nd order differential equation | 0.1146 | 0.0924
3 | Oscillator with friction | 0.1106 | 0.0932
Table 1: Modelling cross-validation error and mean square error of different models (by the oscillator's motion data).

3 Balanced identification method

In the previous section, the balanced identification method (SvF-method) was described by the example of modelling an oscillator with friction. Below, a general description of the method is presented. It includes a formalization of the basic concepts: the model; the data and the model error; the main criterion (with regularization); and the cross-validation scheme.

Mathematical model.

A mathematical model is a set of statements (hypotheses) on the properties of the object under study in the form of equations, inequalities, statements about the membership of variables in various spaces (finite-dimensional and functional) and sets, logical expressions, etc.

Let us use a formal description of the mathematical model in the form of a system of equations:

(17)

where each element of the vector corresponds to one of the model's statements (hypotheses). For the sake of simplicity, the equality sign is used in (17), but it can be substituted by various inequalities, a membership sign, logical conditions, etc. For example, model (10) is written in the form of four statements:

(18)

In many cases, the above models may be presented in the following form, as is customary in the optimization literature:

(19)

where the feasible domain of model variables is defined by a set of constraints: a system of equations; a space of model variables (possibly a composition of Euclidean and functional spaces); and a set defined by additional simple constraints, e.g. inequalities, logical expressions, integrality conditions on variables, etc. For instance, the monotonic regression model [9] may be considered a special case of (19) with inequalities.

The variables may include both functional variables, e.g. describing the trajectory of a dynamical system (see the demo in Section 2), and sought model parameters, e.g. the elasticity and viscous friction coefficients (in the same demo).

Data and model error.

Usually, the feasible domain is quite a "wide" set, and to narrow it (or to choose one of its elements) when modelling real objects, some characteristics of the object are measured, which gives a set of data:

(20)

where measurements from a finite index set are given as vectors of measured values, together with an operator expressing the measured values via the model variables. The measured values may contain measurement errors. Further, for simplicity, the source dataset will be identified with its set of indices.

Main criterion.

Consider the problem of minimizing the balanced criterion (model simplicity vs. data fitting), which depends on the model variables, the dataset, and a vector of regularization coefficients:

(21)

where the first term of the criterion is responsible for fitting the measurements and the second for the model's complexity (as a rule, for smoothness).

The first functional is a measure of the model approximation error. Usually, an additive measure is considered, i.e. the error over a union of disjoint subsets of measurements is the sum of the errors over the subsets. For definiteness, assume that it is the standard deviation:

(22)

The second functional is a measure of model complexity (one of the interpretations of the Tikhonov regularization functional [2]). It depends on the sought solution and on the vector of non-negative regularization coefficients. The individual components of this vector are penalty coefficients for various "aspects of the complexity" of the solution. It is presumed that the complexity functional is monotonically increasing in each coefficient, tends to zero as the coefficients tend to zero, and tends to infinity as they tend to infinity.

The complexity functional is chosen based on the specifics of the object's model (19). It can comprise various characteristics of the curvature of the functions constituting the model and/or some integral characteristics, such as energy or entropy. For instance, in the demo example of Section 2, it is the integral of the squared second derivative for a one-variable function (see Model 1), and the integral of the squared second-order partial derivatives for a two-variable function (see Model 2).

For fixed regularization coefficients, the identification problem for a given set of measurements is to find the model variables that minimize functional (21) on the feasible domain (19).

When the coefficients are very large, we get a solution with minimal complexity, while the model error becomes very large. If, on the contrary, they tend to zero, the model solution passes very close to the measurements but with poor smoothness. These limiting cases are rarely of practical interest; it is required to find a balanced solution, i.e. a value of the coefficients that provides a reasonable compromise between the model error and the model's simplicity.
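In generic notation (the symbols below, u for the sought variables, α for the coefficient vector, and Y_k(u), y_k for predicted and measured values, are our notational assumptions), the fixed-α identification problem can be written as:

\min_{u \in U} \; \Phi_\alpha(u),
\qquad
\Phi_\alpha(u) \;=\;
\underbrace{\frac{1}{|K|}\sum_{k \in K}\bigl(Y_k(u)-y_k\bigr)^2}_{\text{fit to data, cf. (22)}}
\;+\;
\underbrace{C(u,\alpha)}_{\text{complexity, cf. (21)}} .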

Cross-validation procedure

To find the optimal coefficients, a cross-validation procedure is used [3]. The dataset K is subdivided into a set of disjoint subsets corresponding to independent (in terms of probability theory) subsets of measurements:

(23)

(note that when the measurement errors are random, any non-intersecting sets will be independent).

Exclude one subset from the dataset. Let us find the minimum of (21) for a given value of the coefficients and for the remaining set (the training set), and let the corresponding solution be:

(24)

then the error of this solution on the excluded subset is the approximation error on the test set. Repeating this procedure for all subsets and summing up the results, we obtain the cross-validation error for the given coefficients:

(25)

The sought vector of optimal weight coefficients and the corresponding solution (for the whole set of measurements) are defined as follows:

(26)

So, the search for optimal coefficients for a given dataset (20) is a bilevel optimization problem [10]: independent optimization problems (24) (possibly variational ones) at the lower level calculate the value of the cross-validation error, while the upper level searches for the coefficients minimizing it, (26).

In the case when the model error is the standard deviation (22), we have the mean cross-validation error and the mean squared error:

(27)

The practical applicability of the SvF-method depends on the complexity of the model of the object under study and on the number and quality of measurements. Generally speaking, the SvF-method may lead to variational problems if the model variables are functions of a continuous argument. In this case, discretization is applied: e.g., a continuous domain is replaced with a finite set of points with unknown values of the sought function at these points, or an unknown function is replaced with polynomials with unknown coefficients. As a result, all problems become finite-dimensional mathematical programming problems. For their solution, one can effectively use available solvers and high-performance computing environments.
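The overall bilevel scheme (24)-(26) can be stated compactly in code. The Python sketch below uses two placeholder callables, solve_lower (one regularized identification problem on a training set; in the real technology, an NLP dispatched to a solver) and predict; these names, and the brute-force grid search at the upper level, are simplifying assumptions, as the actual implementation uses the surrogate procedure of Section 4.5.

from itertools import product

def cv_error(alpha, folds, solve_lower, predict):
    # Cross-validation error (25) for one vector of regularization weights.
    total, count = 0.0, 0
    for i, test in enumerate(folds):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        u = solve_lower(alpha, train)               # lower-level problem (24)
        for inp, obs in test:
            total += (predict(u, inp) - obs) ** 2   # error on the test subset
            count += 1
    return total / count

def svf_search(alpha_grids, folds, solve_lower, predict):
    # Upper-level problem (26) by exhaustive search over a coefficient grid.
    return min(product(*alpha_grids),
               key=lambda a: cv_error(a, folds, solve_lower, predict))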

4 SvF-technology as software implementation of the method

The method of balanced identification (or, shorter, the SvF-method) has been described above in a rather abstract manner. In this section, the main details of its software implementation are presented, namely: the capabilities of the symbolic notation for the model's equations and for SvF-method options; the discretization of continuous variables and of differential and integral equations; the surrogate optimization method for solving the bilevel optimization problem (26); and the usage of the Python Pyomo package and the Everest optimization service.

4.1 SvF–technology workflow

The general scheme of the human-computer technology that implements the balanced identification method is shown in Fig. 6. At the first level (the user level), an expert (shape 1 in Fig. 6), who has an idea of the object under study, should prepare a measurement file and a task-file (shape 2 in Fig. 6). The measurement file contains a table with experimental data (in text format or in MS Excel or MS Access formats).

A text task-file usually contains the name of the measurement file, a mathematical description of the object (the model), a list of unknown parameters, cross-validation specifications, etc. These files are transferred to the client program (shape 3). Here the variational problems are replaced with NLP problems (discretization), the sets for CV are generated, and the Pyomo NLP model is formed. The constructed data structures are transmitted to the surrogate optimization subroutine (shape 4), which implements an iterative numerical search for the unknown model parameters and the regularization coefficients minimizing the cross-validation error. This subroutine includes parallel solving of mathematical programming problems in the distributed environment of Everest optimization services. The Pyomo package converts the NLP description into so-called NL-files (shape 5), which at the server level are processed by solvers (shape 7) under the control of the service for solving optimization problems (shape 6). The solutions obtained by the solvers (shape 8) are collected (shape 9) and sent back to the client level (shape 4), where they are analyzed: the conditions for the completion of the iterative process are checked.

If the conditions are not met, the surrogate optimization program calculates new values of the weights and the process repeats. Otherwise, the program summarizes the results (shape 10), calculates the error estimates, records the solution files, prepares the plots of the functions found (shape 11), and sends them to the expert (shape 1). The obtained results, especially the estimates of the measurement modelling errors, are used by the expert as an argument to choose a new (or modified) model or to stop the calculations.

Figure 6: SvF-technology workflow

4.2 Symbolic notation of model and identification problem (task-file)

Figure 7 shows, as an example, the task-file with the description of the oscillator model (Model 2) and the parameters of the numerical method under discussion.

CVNumOfIter 20
CVstep 21
RunSolver ServerParallel
Select x, t from ../DATA/Spring5.dat
GRID:
  t ∈ [ -1., 2.5, 0.025 ]
  X ∈ [ -0.1, 2.2, 0.1 ]
  V ∈ [ -1, 1.5, 0.1 ]
VAR:
  x ( t )
  v ( t )
  f ( X, V ); PolyPow = 6
EQ:
  d2/dt2(x(t)) == f(x(t),v(t));
  v(t) == d/dt(x(t));
OBJ:
  x.MSD() + f.Complexity([Penal[0],Penal[1]])
Draw
EOF
Figure 7: Task-file for oscillator. Model 2

The first three lines define the control parameters: CVNumOfIter, the maximum number (20) of iterations of the surrogate optimization procedure; CVstep, the splitting of the dataset into subsets (21) for the cross-validation procedure; RunSolver, the place and method of calculations (ServerParallel: calculations performed on the server, in parallel).

The next line defines a data file and the names of the columns to be read ("x" and "t"). The file contains the dataset defined in (2).

The rest of the task-file contains the "translation" of the mathematical problem (14), (15) into the language of the task-file notation. In the GRID: section, three sets (and their discretization parameters) are specified. For example, the expression t ∈ [ -1., 2.5, 0.025 ] defines a grid from -1 to 2.5 with step 0.025.

The keyword VAR: at the beginning of a line starts a block describing the variables (the parameters that need to be identified). The grid function x(t) is defined on the grid t and is linked by the names (x and t) with the measurement values from the file "../DATA/Spring5.dat". The function f(X,V) is defined as a sixth-degree polynomial (PolyPow = 6) with unknown coefficients. In this case, the sets X and V are used for the numerical integration of the "complexity" functional.

In the next block (the keyword EQ:), the two equations (14) of Model 2 are defined on the grid t.

Finally, the keyword OBJ: defines the objective function consisting of two parts: the first determines the data approximation error for the trajectory x(t) (the first term in (15)); the second is the measure of the "complexity" of the function f(X,V) (the second term in (15)). The Complexity function calculates the regularization penalty for the Penal parameters. Note that this function can also be specified explicitly in the notation. For example, the first part of the integral looks like Penal[0]**2 *∫ d(V)*∫( d(X)*(d2/dX2(f(X,V)))**2).

The last two lines contain commands to draw the results (the functions) and to complete the calculations.

Comparing the task-file with the formal mathematical description (Model 2), one can note the clarity and convenience of the proposed notation.

4.3 Implementation of discretization

To obtain an approximate solution, the continuous (infinite-dimensional, variational) identification problem has to be transformed by discretization into a finite-dimensional one. In this implementation of the balanced identification technology, the discretization parameters are set in two objects: the discretization step in grids and the polynomial degree in polynomial functions. Thus, continuous functions are replaced by grid or polynomial functions, integrals by finite sums, and differential equations by sets of algebraic equations.

Relatively simple schemes are used: trapezoid integration; the two-point "forward" scheme for first derivatives; the three-point "central" scheme for second derivatives, etc. The problem reformulated in the Python language is written as a Pyomo model into a special Python file. A qualified user can always modify it to fit their needs, for example, to use another discretization scheme. However, there remains the problem of evaluating the proximity of the found numerical solution (based on finite-dimensional mathematical programming problems) to the solution of the original variational problem (in a functional space).
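As a hedged illustration of these rules on a uniform grid of step h (the actual module generates the corresponding algebraic constraints symbolically):

import numpy as np

def trapezoid(values, h):
    # Integral of a grid function: h * (f_0/2 + f_1 + ... + f_{n-1} + f_n/2).
    return h * (values[0] / 2 + values[1:-1].sum() + values[-1] / 2)

def d_forward(x, h):
    # First derivative, two-point "forward" scheme: (x_{i+1} - x_i) / h.
    return (x[1:] - x[:-1]) / h

def d2_central(x, h):
    # Second derivative, three-point "central" scheme: (x_{i+1} - 2*x_i + x_{i-1}) / h**2.
    return (x[2:] - 2 * x[1:-1] + x[:-2]) / h**2

# E.g., the complexity term of (5) for a grid function x with step h could be
# evaluated as trapezoid(d2_central(x, h)**2, h).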

4.4 Implementation by Python & Pyomo

In the last decade, Python has become one of the most widely used programming languages in scientific computing across different areas of applied mathematics. This also concerns optimization modelling, i.e. support of AMPL-compatible solvers.

The point is that the correct programming of mathematical programming problems (e.g. the discrete analogue of (24)) for passing to a solver via its low-level API may be very time-consuming, especially for complex nonlinear problems. The problem becomes more complicated when solvers are used in a distributed computing environment. Fortunately, state-of-the-art solvers support the so-called AMPL (A Modeling Language for Mathematical Programming) [16]. Our research team from the Center for Distributed Computing, IITP RAS, has successfully used AMPL in studies on optimization modelling for about ten years. In particular, AMPL usage enabled the development of a distributed environment of optimization services on the basis of the Everest toolkit [13].

Originally, the usage of AMPL required a commercially licensed translator to generate a special NL-file containing all the data of the optimization problem to be solved. This NL-file may be passed to an AMPL-compatible solver. On successful solving, the solver returns the solution as a SOL-file, which may be read by the AMPL translator to get the values of the variables (including dual ones). An open-source alternative to the AMPL translator appeared in 2012 as the Pyomo package [17]. Pyomo makes it possible to reproduce the whole "lifecycle" of an AMPL script but without the AMPL translator: the description of optimization problems, the generation of NL-files, and the reading of solution data from SOL-files may be done in any Python application.

It is well known that Python itself is not intended for intensive computing; it is used rather as glue scripting providing data exchange between high-performance software components. As to Pyomo, the bottleneck may be the preparation of the NL-file. Fortunately, a noticeable delay becomes prominent only for rather big optimization problems with dozens of thousands of variables and constraints. The hardest computations, the solving of the problems presented by NL-files, are performed by rather efficient solvers written in C, C++, and Fortran.

One of the results of the task-file processing is a special software module that reformulates the optimization problem in the Python language using the Pyomo optimization modelling package. An example of a fragment of auto-generated Python code corresponding to the first differential equation (15) is shown in Figure 8.

def EQ0(Gr, t):
    # Three-point central scheme for d2x/dt2 with grid step 0.025.
    return ((x(t + 0.025) + x(t - 0.025) - 2 * x(t)) / 0.025**2) == f(x(t), v(t))

Gr.conEQ0 = Constraint(
    myrange(-1.0 + 0.025, 2.5 - 0.025, 0.025),  # interior grid points of t
    rule=EQ0,
)
Figure 8: First differential equation (15) as a part of Pyomo model

4.5 Surrogate optimization in SvF-technology

Bilevel optimization is a well-known hard challenge for researchers in numerical methods [10]. The current implementation of SvF is based on an approach similar to surrogate optimization [18], i.e. the upper-level objective function is approximated from its evaluations at a few points. The corresponding set of points is generated successively until some stopping condition is satisfied.

The scheme of the SvF-algorithm has two essential features: 1) the computational cost of each iteration is rather high (a set of independent NLP problems is solved); 2) the resulting CV-error estimate may contain additional random errors caused by inexact solutions of the lower-level NLP problems. A special surrogate optimization procedure that takes these features into account has been developed. The lower-level problem is treated as a "black box" that calculates (spending significant computational resources) an approximate result for the input vector of regularization coefficients.

So, we are looking for an approximate minimum of the CV-error function (25), the optimal value function of the upper-level problem (shape 4 in Fig. 6). Its value depends on the solutions of the independent optimization problems (24) for a given coefficient vector (shapes 5-9 in Fig. 6). The method is based on the successive approximation of this function by a 2nd-order polynomial on a sequence of points. The coefficients of the polynomial are recalculated to obtain better and better approximations of the values at the already "passed" points. Namely, the function is approximated by a multidimensional polynomial:

(28)

whose coefficients are corrected as the algorithm proceeds.

The algorithm builds the sequence of points stepwise as follows. Suppose that, at some step, a set of pairs (coefficient vector, CV-error value) is already known; at the initial stage, new values of the coefficient vector are obtained by small perturbations of all its components. In what follows, this set of pairs will be treated as a set of points. Without loss of generality (possibly after renumbering), assume that the first point has the smallest CV-error.

Consider the problem of approximating these points by the polynomial (28). Moreover, the smaller the distance between a point and the "best" one, the larger the weight its error has in the following penalty function with a regularization term (the sum of all squared polynomial coefficients from (28), weighted by a regularization parameter):

(29)

This approximation (not interpolation!) does not pass exactly through the points, but it is close to the points where the value is close to the minimum.

It is easy to see that the objective function of (29) is strongly convex (as a function of the polynomial coefficients) and reaches a unique minimum. The algorithm searches for the value of the regularization parameter that minimizes the error of the approximation (by the polynomial) at the best vector. Another bilevel optimization problem arises:

(30)

where is a solution of lower-level problem (29).

Note that in the upper-level problem we are looking for an optimal scalar value, and the lower-level problem is effectively solvable (as a strongly convex one). Thus, the optimal solution may be found by one of the known line search algorithms, e.g. the bisection method.

After the polynomial approximation is determined, the next vector is found as a solution of the following auxiliary problem, similar to the one used in the Pshenichny-Danilin linearization method [11]:

(31)

where the gradient of the polynomial with respect to the coefficient vector and the scalar product of vectors are used. This problem has a solution, and it is unique. The linearization of the polynomial (in a neighbourhood of the lowest point) makes it possible to smooth out the black-box computational errors.

At the next step of the algorithm, we calculate the CV-error value at the new point by solving the set of problems (24). If its difference from the current best value is less than some predefined small threshold, then the algorithm stops and returns the better of the two points. If the difference is "significant", then a round of the algorithm is repeated with the extended set of pairs.
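For intuition, one round of such a surrogate step can be sketched for a scalar regularization weight as follows; the plain quadratic fit, the distance-based weights, and the fallback rule are deliberate simplifications of the regularized weighted fit (29) and the linearized step (31).

import numpy as np

def next_candidate(alphas, cv_values):
    # Fit p(a) = c2*a**2 + c1*a + c0 to the evaluated pairs (a_i, CV_i) by
    # weighted least squares, favouring points near the current best one,
    # then propose the minimizer of p as the next candidate point.
    alphas, cv_values = np.asarray(alphas), np.asarray(cv_values)
    a_best = alphas[np.argmin(cv_values)]
    w = 1.0 / (1.0 + (alphas - a_best) ** 2)   # heavier weight near the best point
    V = np.vander(alphas, 3)                   # columns: a**2, a, 1
    c2, c1, c0 = np.linalg.lstsq(V * w[:, None], cv_values * w, rcond=None)[0]
    if c2 <= 0:                                # surrogate not convex: stay at the best point
        return a_best
    return -c1 / (2 * c2)                      # vertex of the fitted parabola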

4.6 Solvers and Everest optimization service

The basic tool for solving independent mathematical programming problems in parallel (shapes 6-8 in Fig. 6) is the Everest optimization service [13, 14]. A number of services are available at https://optmod.distcomp.org. All of them are based on the Everest toolkit [15]. Each service, an Everest application in "Everest terminology", is a front-end to a pool of solvers deployed and running on a number of various computing resources (desktops, standalone servers, clusters). This computing environment is extensible because potential users can add their own computing resources to the optimization environment via the open-source Everest agent, http://everest.distcomp.org/docs.

The current implementation of SvF-technology is based on the solve-ampl-stub Everest application. This service solves mathematical programming problems presented in the form of an NL-file [16] and returns a SOL-file with the solution found. The service provides unified access to several solvers (including CBC, Bonmin, Ipopt, and SCIP discussed below), allowing the main types of mathematical programming problems (LP/MILP/NLP/MINLP) to be solved.

One should say that CBC and Bonmin seem to be a bit obsolete now. All results presented above in Section 2 and below in Section 5 have been obtained with the solver Ipopt. Currently, we are beginning to use the SCIP global optimization solver [19]. We do so carefully, because the search for a global optimum takes much more time than the search for a local solution that Ipopt performs.

Besides, we are working on an implementation of an analogue of AMPLX (an extension of AMPL scripting to exchange data with Everest optimization services [13]) on the basis of Pyomo, the so-called PyomoEverest, https://github.com/distcomp/pyomo-everest. It will provide developers with a simple Python API to exchange data between Pyomo optimization models and AMPL-compatible solvers. In addition to the solve-ampl-stub service, another Everest application to solve a set of independent problems more efficiently, based on the generic Everest Parameter Sweep application [20], is being developed now.

5 Successful use cases

The technology has been successfully used in the different application areas listed below.

Plant physiology [21]. The dataset (about 1600 points in time, step 30 min) consists of transpiration and external climatic (photoactive radiation, humidity, and temperature) measurements. Among a number of (about 10) models of different complexity, a dynamic model of water in a plant was chosen.

Ecology [22]. The method of balanced identification was used to describe the response of the Net Ecosystem Exchange of CO2 (NEE) to changes in environmental factors, and to fill gaps in continuous flux measurements in a sphagnum peat bog in the Tver region. The dataset is about 2000 points (step 30 min). The selected model (the dependence of NEE on climatic parameters) was a superposition of one- and two-variable functions.

Meteorology [23]. A model of radon dynamics in the atmosphere was built. The dataset (about 3000 points, about a month, step 15 min) contains measurements of the characteristics of the γ-radiation field by a gamma spectrometer. The model, consisting of 2 ODEs, describes the dynamics of the volume activity of radon and its daughter products.

Pollution, geography, soil erosion [24]. The datasets (from 100 to 1000 points along the X and Y coordinates, step about 100 meters) consist of altitude (SRTM satellite data) and 137Cs decay activity (airborne gamma survey data) measurements. A number of models (functional dependencies) were considered, including digital terrain, soil erosion, and contamination models.

Air pollution [25]. The problem of selecting a mathematical description based on a 2D diffusion-transport model is considered for the solution of the inverse problem of atmospheric pollution dispersion. The following problems were studied (see the poster at https://www.researchgate.net/publication/334523588):

F) The estimation of the distribution and sources of atmospheric aerosol pollution (PM10) in the North of France. The dataset (about 30000 points) comes from 12 stations of the Atmo Hauts-de-France pollution measurement network.

R) The evaluation of pollutant emissions and the average wind rose for contamination sources with known positions. Measurements of the pollution of soil and lake water by metals (Cu, Ni, etc.) were obtained on the Kola Peninsula in Russia at about 100 stations.

Medicine [26]. Different models of glucose and insulin dynamics in human blood are analyzed. The datasets (about 20 measurements of glucose and insulin concentrations) contain intravenous glucose tolerance test results.

Thermonuclear plasma [27, 28]. Several models (integro-differential and partial differential) were compared that describe the initial stage of fast nonlocal transport events, which exhibit an immediate response, on the diffusion time scale, of the spatial profile of the electron temperature to its local perturbation, while the net heat flux is directed opposite to ordinary diffusion (i.e. along the temperature gradient). The results of experiments on a few thermonuclear installations were used. The datasets contain about 100 points.

Evolutionary theory [29]. An optimization problem was considered that reflects the mechanisms of regulation of the speed of evolution which provide an adequate population response to the direction and rate of environmental change. The numerical experiments show plausible dependences of age-specific fertility on the rate of environmental change, and they describe and explain a number of evolutionary effects.

6 Discussion

The presented SvF-method and its software implementation are an interesting combination of knowledge from the areas of data analysis, optimization, and distributed computing, including cross-validation and regularization methods, algebraic modelling in optimization, methods of optimization, automatic discretization of differential and integral equations, optimization REST services, etc.

The current implementation of the balanced identification method is based on state-of-the-art open-source AMPL-compatible NLP solvers [30, 19] and, optionally, on the optimization services [13] of the extensible computing environment https://optmod.distcomp.org based on the Everest toolkit [15]. The environment is extensible because potential users can add their own computing resources via the open-source Everest agent, http://everest.distcomp.org/docs.

The implementation supports a symbolic notation to describe the model and the corresponding identification problem. This feature automates the formulation of optimization problems with less user effort. A special module generates fragments of Python code, based on the discretization subsystem and the Pyomo framework, to interact with open-source AMPL-compatible solvers.

The list of success stories of the presented approach is already rather long. Nevertheless, the authors are aware of its domain of applicability and of important unresolved issues.

First of all, the method is not a fully automated "model generation and/or selection" tool. SvF-technology enables quantitative comparison of a given set of models, but it is the researcher who makes modifications to the model (e.g. includes/excludes model constraint equations).

The second issue concerns the search for the global optimum in the problems (24) required for cross-validation. In the general case, these problems may have many local extrema; moreover, they may even contain integer variables. If so, the NLP solver Ipopt cannot guarantee a global optimum. It is possible to use the SCIP solver, which implements a branch-and-bound algorithm for mixed-integer nonlinear mathematical programming problems, but at the cost of a much longer solving time (and memory consumption). A parallel implementation of SCIP is available [31], but first we are going to use simpler workarounds (e.g. Ipopt multi-start options and some easily checkable necessary conditions for Ipopt to return a true global optimum). In current practice, some "informal" analysis of the SvF-algorithm's convergence with respect to the decrease of the modelling cross-validation error is applied.

Discretization is the third important challenge for the SvF-method. The presence of ordinary and partial differential equations, along with integral ones, is typical for models in physics, biology, medicine, etc. The current implementation is based on a rather simple discretization module that implements two basic schemes: finite differences and approximation by sums of polynomials with unknown coefficients. We plan to analyse the usefulness of the Pyomo subsystem DAE (discretization of Differential and Algebraic Equations) for the SvF implementation. One should recall that the SvF implementation already uses the Pyomo package for programming optimization problems and exchanging data with AMPL-compatible solvers.

Another issue concerns the domain of applicability of discretization itself, e.g. for models based on optimal control problems. Here, the domain of applicability of discretization depends on whether the corresponding problem has special properties, such as chattering of the globally optimal solution. The well-known "Fuller problem" [32] is one of the simplest demonstrations of that effect. At first glance, it is a very simple optimal control problem on a finite time interval, but for some set of initial conditions the optimal solution cannot be reproduced by any finite discretization of the time interval.

Acknowledgments

The results of Section 4 were obtained within the Russian Science Foundation grant (project No. 16-11-10352). The results of Sections 1-3 and 5 were obtained within the Russian Foundation for Basic Research grant (project No. 18-07-01175).

References

  • [1] A.V. Sokolov and V.V. Voloshinov. Choice of mathematical model: balance between complexity and proximity to measurements. International Journal of Open Information Technologies, 6(9), 2018.
  • [2] A.N. Tikhonov. On mathematical methods for automating the processing of observations. In Problems of Computational Mathematics, pages 3–17, 1980.
  • [3] Ron Kohavi et al. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, volume 14, pages 1137–1145. Montreal, Canada, 1995.
  • [4] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: data mining, inference and prediction. Springer, 2 edition, 2009.
  • [5] Max Kuhn and Kjell Johnson. Applied predictive modeling, volume 26. Springer, 2013.
  • [6] A.I. Rozhenko. Theory and Algorithms of Variational Spline-Approximations. Novosibirsk State Technical University, 2005. (in Russian).
  • [7] Wolfgang Härdle. Applied nonparametric regression. Number 19. Cambridge university press, 1990.
  • [8] Vadim Strijov and Gerhard Wilhelm Weber. Nonlinear regression model generation using hyperparameter optimization. Computers & Mathematics with Applications, 60(4):981–988, 2010.
  • [9] Oleg Sysoev and Oleg Burdakov. A smoothed monotonic regression via L2 regularization. Knowledge and Information Systems, 59(1):197–218, 2019.
  • [10] Stephan Dempe. Foundations of bilevel programming. Springer Science & Business Media, 2002.
  • [11] B.N. Pshenichnyi and A.A. Sosnovsky. The linearization method: Principal concepts and perspective directions. Journal of Global Optimization, 3(4):483–500, 1993.
  • [12] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
  • [13] S. Smirnov, V. Voloshinov, and O. Sukhoroslov. Distributed optimization on the base of AMPL modeling language and Everest platform. Procedia Computer Science, 101:313–322, 2016.
  • [14] S. Smirnov and V. Voloshinov. On domain decomposition strategies to parallelize branch-and-bound method for global optimization in Everest distributed environment. Procedia Computer Science, 136:128–135, 2018.
  • [15] O. Sukhoroslov, S. Volkov, and A. Afanasiev. A web-based platform for publication and distributed execution of computing applications. In 2015 14th International Symposium on Parallel and Distributed Computing (ISPDC), pages 175–184, June 2015.
  • [16] R. Fourer, D.M. Gay, and B.W. Kernighan. AMPL: A Modeling Language for Mathematical Programming. Second edition. Duxbury Press/Brooks/Cole Publishing Company, 2003. https://ampl.com/resources/the-ampl-book.
  • [17] W. E. Hart, C. D. Laird, J.-P. Watson, D. L. Woodruff, G. A. Hackebeil, B. L. Nicholson, and J. D. Siirola. Pyomo–optimization modeling in Python. 2nd edition, volume 67. Springer, 2017.
  • [18] Alexander Forrester, Andras Sobester, and Andy Keane. Engineering design via surrogate modelling: a practical guide. John Wiley & Sons, 2008.
  • [19] A. Gleixner, M. Bastubbe, L. Eifler, T. Gally, G. Gamrath, R. L. Gottwald, G. Hendel, C. Hojny, T. Koch, M. E. Lübbecke, S. J. Maher, M. Miltenberger, B. Müller, M. E. Pfetsch, C. Puchert, D. Rehfeldt, F. Schlösser, C. Schubert, F. Serrano, Y. Shinano, et al. The SCIP Optimization Suite 6.0. Technical Report 18-26, ZIB, Takustr. 7, 14195 Berlin, 2018.
  • [20] S. Volkov and O. Sukhoroslov. A generic web service for running parameter sweep experiments in distributed computing environment. Procedia Computer Science, 66:477–486, 2015.
  • [21] A.V. Sokolov, V.K. Bolondinsky, and V.V. Voloshinov. Technologies for constructing mathematical models from experimental data: applying the method of balanced identification using the example of choosing a pine transpiration model. In National Supercomputer Forum (NSCF-2018), 2018.
  • [22] A.V. Sokolov, V.V. Mamkin, V.K. Avilov, D.L. Tarasov, Y.A. Kurbatova, and A.V. Olchev. Application of a balanced identification method for gap-filling in CO2 flux data in a sphagnum peat bog. Computer Research and Modeling, 11(1):153–171, 2019.
  • [23] Yu.E. Lavruhin, A.V. Sokolov, and D.S. Grozdov. Monitoring of volume activity in the atmospheric surface layer based on the testimony of the spectrometer seg-017: error analysis. In Radioactivity after nuclear explosions and accidents: consequences and ways to overcome, pages 359–368, 2016.
  • [24] V.G. Linnik, A.V. Sokolov, and I.V. Mironenko. 137Cs patterns and their transformation in landscapes of the Opolye of the Bryansk region. Modern Trends in the Development of Biogeochemistry, pages 423–434, 2016.
  • [25] A.V. Sokolov, A.A. Sokolov, and Hervé Delbarre. Method of balanced identification in the inverse problem of transport and diffusion of atmospheric pollution. In EGU2019-15175, volume 26, 2019.
  • [26] A.V. Sokolov and L.A. Sokolova. Building mathematical models: quantifying the significance of accepted hypotheses and used data. In XXI International Conference on Computational Mechanics and Modern Applied Software Systems (CMMASS’2019), pages 114–115, 2019.
  • [27] A.P. Afanasiev, V.V. Voloshinov, and A.V. Sokolov. Inverse problem in the modeling on the basis of regularization and distributed computing in the Everest environment. In CEUR Workshop Proceedings, pages 100–108, 2017.
  • [28] A.B. Kukushkin, A.A. Kulichenko, P.A. Sdvizhenskii, A.V. Sokolov, and V.V. Voloshinov. A model of recovering parameters of fast non-local heat transport in magnetic fusion plasma. Problems of Atomic Science and Technology, Ser. Thermonuclear Fusion, 40(1):45–55, 2017.
  • [29] A.V. Sokolov. Mechanisms of regulation of the speed of evolution: The population level. Biophysics, 61(3):513–520, 2016.
  • [30] Andreas Wächter and Lorenz T. Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical programming, 106(1):25–57, 2006.
  • [31] Yuji Shinano, Tobias Achterberg, Timo Berthold, Stefan Heinz, Thorsten Koch, and Michael Winkler. Solving open MIP instances with ParaSCIP on supercomputers using up to 80,000 cores. In 2016 IEEE International Parallel and Distributed Processing Symposium, pages 770–779. IEEE, 2016.
  • [32] A.T. Fuller. Relay control systems optimized for various performance criteria. volume 1, pages 520–529. Elsevier, 1960. https://doi.org/10.1016/S1474-6670(17)70097-3.