While in the very heart of the standard traditional identification schemes from the early years of automatic control, the use of data-driven models in control design witnessed a renewable surge these last years due to the emergence of user-friendly tools of map fitting libraries such as scikitlearn  and TensorFlow 
to cite but few ones. These tools enable to try, in an astonishingly effortless way, several possible structures in order to map static relationships between a vector of inputs (the regressor) and some targeted variable (label) to be identified.
When it comes to their use in solving standard control problems, the emergence of these tools in the control community induces quite often very passionate debates regarding their true novelty when compared to the traditional ways of handling the same problems in the control community. As soon as nonlinear systems are concerned, the opportunities offered by these tools should be, and is in fact, less debatable as they clearly enable an easy and rapid prototyping of candidate solutions to many nonlinear control problems.
Regarding the use of nonlinear fitting maps in control design, several approaches can be used depending on the choice of the target variable for which a model is searched for. Indeed:
A first option is to use the optimal solution (the first control in the optimal sequence over the prediction horizon) as a target variable. This is particularly appealing since it avoids the need to do on-line optimization as it is typically done in standard Model Predictive Control design. For this to be done, a dynamic model of the controlled system is needed. This model can therefore be used to solve off-line a high number of open-loop optimal control problems so that a learning data can be obtained for the derivation of the model via nonlinear regression problem settings. Each of these problems is obtained by choosing a different initial state of the presumably available dynamic model. Beside the need for the complete model for the formulation of the open-loop optimal control problems, the computation time is expected to be quite important as for each sample in the learning data, a constrained NLP problem need to be solved off-line using some appropriate NLP solver. Finally, even if a mathematical model is available, the values of its parameters are generally not well known and all mismatches in these values are inherited by the learning data leading to detuned results that have to be corrected in some way.
A second option might be used when the equations of the dynamics are not available. In this case, the data-driven tools might be used to derive an input/output nonlinear model of the dynamics. The latter can then be used in two ways:
a) Either as in the previous approach in which the learned model is used to define a high number of optimal control problems to be solved off-line.
b) Or the model is used in a standard on-line NMPC framework where repetitive on-line solution of the underlying sequence of optimal control problems is solved using fast NMPC approaches.
In this option, the original task is rather ambitious because one is seeking a complete faithful model of the whole dynamics of the system that can is generally very difficult to achieve even for almost linear systems.
The third option concerns also the case where no mathematical model is available but here, one does not try to identify a simulator of the system. Rather, one tries to identify the relationship between past measurements, past and future control on one hand and a resulting cost value that is defined on the future prediction horizon on the other hand. In order to get the data that makes the identification of such map possible, it should be assumed that some initial loosely defined control (that might be open-loop or roughly defined output feedback) can be applied to the system over a sufficiently long time in order to gather a rich set of measurement data for the above-mentioned identification task. This requirement is satisfied for a reasonably large class of systems. Note that the resulting model does not allow system simulation as in the last option, but it hopefully captures the precisely needed relationship that enables the underlying optimal control problem to be defined at each decision instant based on the previously obtained measurement sequences.
The present contribution follows this third option and proposes a complete framework that is tested on an illustrative example. Moreover, all the tools used in the implementation of the proposed solution, namely the preprocessing of the data, the identification of the model and the solution of the resulting optimization problems, are made available on an open GitHub repository111https://github.com/mazenalamir/siso_predictor, https://github.com/mazenalamir/torczon .
The use of the newly available Machine Learning libraries in addressing many control problems is a very wide investigation area and it is unlikely that any additional contribution can legitimately claim a total novelty as far as general ideas are concerned. However, as the Devil is in the details, any complete exposition of a successful application of such ideas together with the availability of the tools and lines of code that made this success possible can be of great help to many researchers that are seeking concrete and accessible recipes. This is the aim of the present contribution.
This paper is organized as follows. First, some definitions and notation are stated in Section 2. The illustrative example that is used throughout the paper to support the presentation is explained in Section 3. The proposed framework is detailed in Section 4 while the results are shown in Section 5 together with an extended discussion. Finally Section 6.
2 Definition and Notation
Consider a nonlinear system to be controlled that can be described by some unknown dynamics given by
where , and stand for the state vector, the control input vector and the vector of measured variables respectively. Since the maps , and the very definition of the state vector are all supposed to be unknown, it is assumed that a cost function expressing the quality of the system’s trajectories over some prediction horizon can be expressed, at each instant , through an expression cost that involves the only measured quantities, namely and where:
The feasibility of any output feedback design is based on the observability assumption according to which there is some static map such that for sufficiently high , one has:
as this simply stems from the fact that the state is inferred from which together with determine the future (see Figure 1).
A direct consequence of the observability property is that the cost function cost can be written in the following form cost or equivalently, using a straightforward definition of :
This cost can then obviously be used to design NMPC output feedback by solving at each instant the optimization problem:
and by applying the first control in the so-computed optimal sequence (see  for more details regarding MPC design).
Note that in (5) the set stands for the admissible domain for the input vector. This implements box-like hard constraints on the control. Should output constraints be incorporated, they can appear in the definition of the cost function through constraints penalty leading to soft implementation of such constraints. These obvious details are omitted here for the sake of conciseness.
In the next section, an illustrative example is introduced so that the concepts can be associated to a concrete use-case. This example is also used in the numerical investigation-related section.
3 Illustrative Example
Consider the example of the nonlinear continuous flow stirred-tank reactor with parallel reactions :
This system is commonly modeled by the following set of highly nonlinear Ordinary Differential Equations (ODEs):
where and stand for the concentrations of and respectively while represents the temperature of the mixture in the reactor. represents the waste product. The control variable is given by the heat flow . The objective of the control is to maximize the amount of product which is precisely the only measured variable. It can be checked that this output makes the system completely observable.
This system is recurrently invoked in the works related to the so-called Economic MPC  where the cost function is not based on a priori known steady state. Instead, only an economic stage cost is used to enhance the maximization of the production. More precisely, the following structure of the cost function is typically used:
where the first term is noting but the average value over the prediction horizon while the second terms implements a terminal penalty at the end of the prediction horizon following standard stability-related recommendations . The minus sign recalls that maximization is required.
The following are some known facts that are relevant for a better understanding of the numerical results shown in the remainder of this paper:
The consequence of the EMPC formulation is that optimal trajectories are not necessarily steady but rather oscillating around the optimal steady state since these oscillations potentially induce a larger production.
The optimal solution is very sensitive to the initial state. This can be felt when using even sophisticated solvers  in the fact that they need rather high numbers of iterations even when warm start is used during the closed-loop simulations. This means that the underlying optimization problems are rather challenging. This should be kept in mind when appreciating the quality of the results under the identified maps.
In the forthcoming developments, a sampling period of time units is used (note that the equations above are normalized so that the absolute time is meaningless). Within all possible steady pairs, the optimal one corresponds to and and hence .
4 Proposed Framework
The key tasks in the implementation of the output NMPC feedback as sketched in the previous section are the following:
Define an initial input choice (initial output controller or simple open-loop) to generate the learning data (Section 4.1).
These steps are successively detailed in the following subsections. Obviously there are many possible choices for each of the above mentioned tasks. The framework sketched here proposes a set of choices and examine how the resulting framework behaves in closed-loop.
4.1 Generate the learning data
This task is probably the most problem-dependent among the above enumerated tasks. In some cases, sub-optimal explicit initial output feedback can be implemented, for others, simple random sequences can be applied. In the case of the example under consideration, a long random sequence of input is sampled uniformly inside the admissible set. This sequence is applied to the system and the resulting output is collected for the learning phase. Figures 2-3 show typical resulting sequences of input/output measurements.
To be more precise regarding the way the input signal is generated, the following procedure is used to induce explicitly non uniform piece-wise time segment lengths:
First a set of random values of interval durations is sampled uniformly in the set of integer multiple of lying inside . This leads to durations randomly sampled between and time units.
A corresponding set of
values of the input uniformly distributed insidehas been sampled.
Figures 2-3 show one realization that resulted from the above procedure. Experience showed that the overall results in terms of closed-loop performances are quite insensitive t to the specific realization that is used to learn the model provided that the duration is sufficient to enable the learning data to be representative.
4.2 Identification of the cost function
The above generated data set is used here to identify the relationship (3) [or equivalently (4)]. This problem is a standard regression problem where one seeks a function that maps a regressor to some continuous label , namely:
the is to be understood in terms of some measure of the statistics of the errors over a set of realizations that is called the learning set. Note that in the case of the example under study, the regressor and the label are defined by (see Figure 4):
Using a rolling windows of length over the time series depicted on Figures 2-3 that contains 50324 lines, it is possible to extract a number of samples to build the data set mentioned above.
The quality of the identified map is generally evaluated using a so called test set that contains realizations that are not present in the learning set in order to check the generalization power of the identified map (or to say it in other words, in order to evaluate the degree of overfitting that might occur when using a too complex model compared to the available data).
As it is shown later in the validation section, the window size can be as big as when using cost functions that are defined on a prediction horizon of 2 time units (recall that the sampling period is which is needed to ensure stability of the numerical integration as the system dynamics is stiff. Therefore, when considering the regressor defined by (9), this would lead to a regressor of dimension which is obviously prone to overfitting (too many parameters).
A typical approach to reduce the size of the regressor is to use Principal Component Analysis feature reduction method. However when it comes to handle physical time series, this approach is not necessarily the lighter one or even the more appropriate. Rather, it is advised here to use the followingmoment-like features consisting in considering the average of the successive derivatives of a smoothed version of the signals over the time intervals. More precisely, given a signal defined on some interval , we define the following features:
where is the value at instant of the -th derivative of the signal with the convention . This feature extractor is implemented in the utility feature_gen of the module siso_predictor available on the GitHub repository .
Therefore by extracting the features of , and that are involved in the definition (9
), one defines the following features extraction map:
As far as the example is concerned, is used in the implementation leading to a regressor of dimension 9.
Having this reduced vector of features at hand, one can choose one of the many available Machine Learning
nonlinear structures (SVM, Random Forest, Neural Network, Ridge, Lasso, to cite but a few of available structures) in order to solve the regression problem that can be stated shortly denoted as follows:
As far as the example is concerned, a Random Forest (RF) structure with a controlled maximum number of leaves nodes has been used222https://scikit-learn.org/stable/modules/generated/sklearn.ensemble. RandomForestRegressor.html. As a matter of fact, the utility learn_model of the module siso_predictor available on the GitHub repository  enables to built a piece-wise RF model with a desired number of clusters in order to increase the precision of the model while controlling the complexity of each sub-model). An example of the fitting results that can be achieved using a 66-33% of training/testing split of the data set is shown in Figure 5. When clustering is required, the clustering is performed based on the features contained in using a standard K-Nearest-Neighbors clustering algorithm of the scikitlearn library .
4.3 Solving the optimization problem
Recall that one is interested in finding the best sequence of future control given the previous measurement of output and control over some past measurement window of length . The resulting optimization problem can therefore be expressed as follows:
in which are the parameters of the problem (obtained from the past window ) while the future sequence of control is the decision variable. Note however that the dimension of the decision variable is which can be huge. We know however that due to the very definition of the feature extraction map introduced in the previous section:[backgroundcolor=Gray!20, linecolor=white] All the control profiles that share the same first moments lead to the same predicted cost function.
Consequently, there is no benefit from having too many degrees of freedom in the decision variable involved in (15) that is the reason why a low-dimensional parameterization of the control profile
is used that leaves as free only the values of the control input at few specific future instants on the prediction horizon. These values induce the remaining values by linear interpolation. Figure6 shows an example of such piece-wise linear control profile over a prediction horizon of length and using thee d.o.f at instants and .
Using this parameterization, one can denote the future control profile by
where is the new vector of d.o.f and the optimization problem (15) to be solved becomes:
Now one might say that at this stage, it is about solving an optimization problem which can be done using many available solvers. This is only partially true because many of the map structures that can be invoked from the ML libraries to fit a nonlinear function as expressed in (13) are discontinuous by nature (this is in particular true for the Random Forest regressor). This means that the optimization method should be selected with some care.
To address this issue, it is proposed here to use a modified version of the derivative-free Torczon algorithm [5, 10] that is made available in GitHub repository . This algorithm is particularly adapted to discontinuous cost functions and it incorporates a multiple starting point mechanism in order to avoid local minima. In a nutshell, the Torczon algorithm starts with a regular polygone in where is the dimension of the decision variable and improves it iteratively based on the values of the cost functions at its nodes. The updating involves three possible operations: contraction, reflexion and extension depending on the success/failure of the reflexion beyond the best node. Only function evaluation is required and discontinuity of the cost function induces no specific issue.
5 Results and Discussion
The following parameters are used in the closed-loop simulations:
Prediction horizon , sampling period
The parameterization of the control profile is done using two different options:
Either two degrees of freedom parameterization with the values of the control at the sampling instants taken as free decision variables to define the piece-wise affine control profile. In this case, one single initial guess () is used and the maximum number of iterations of the Torczon algorithm is fixed to .
Or Four degrees of freedom at the sampling instants . Here, and are used.
different initial states have been used to check the robustness of the result to context variations. For each of these initial states, the closed-loop system under each of the above settings is simulated. The results are shown in Figures 7 and 8 hereafter.
The identification of the model has been done using a random forest regressor with a single cluster and a maximum number of leaves nodes set to .
Note that since the cost function prediction model needs the measurement over the past window of length , the first control actions are taken randomly waiting for the buffer to be filled. This corresponds to the first time units of the simulations.
It can be noticed that the results are globally quite good in terms of the convergence towards the vicinity of the optimal value . Note however that the oscillations that is the signature of the fully optimal behavior are not present probably due to the low dimensional parameterization that prevents the cycles to take place since its open-loop realization is not inside the possible options offered by the parameterized control profile. By the way, this steady behavior can be the desired one since the slight increase in production induced by the fully optimal limit cycles is obtained at the price of undesirable fatigue on the actuator.
It can also be observed that the quality of closed-loop is slightly better in the case where the second setting is used. However, one must keep in mind that the maximum number of iterations is times higher in the second option compared to the first. ( compared to ). In fact what does really matter here is to assess the fact that the identified model seems appropriate and the remaining issue is a matter of optimization parameters tuning which is a typical problem/hardware dependent trade-off.
5.1 Computation time
Although the computation time is not the topics of this paper, it is worth giving some hints regarding the complexity of the resulting on-line computation. Computation is done using Python 3 jupyter-notebook on a MacBook-Pro 2017, Mojave 10.14.5 with a 2.9 GHz Core i7 processor. The computation of a single evaluation of the identified Random-Forest cost function (13) takes less than 1.7 ms. Since the Torczon algorithm performs () evaluation per iteration (where is the number of decision variables), this means that the maximum MPC computation time when , and (the scenario of Figure 7) is about 0.3 s against 1.53 s for the settings of Figure 8.
6 Conclusion and Future Work
In this paper, a complete framework for data-driven control is presented and tested through a rather challenging nonlinear control problem. The framework applies to the class of dynamical system for which it is possible to apply a random or a priori simple output feedback control law which can be far from being optimal but which enable a learning data set to be built. A third obvious condition is that the control objective can be expressed in terms of the measured variables. A specific and rather general choice of the feature selection is proposed and validated on the illustrative example. In the current setting, a Random Forest regressor is used to fit the underlying static relationship but many other options are obviously possible.
Undergoing work concerns the application of the framework on larger problems such as the Building Energy Management Systems where the absence of model is the main obstacle that faces the use of advanced MPC control in marketed solutions.
-  Müller M. A., David Angeli, Frank Allgöwer, Rishi Amrit, and James B. Rawlings. Convergence in economic model predictive control with average constraints. Automatica, 50(12):3100 – 3111, 2014.
-  Martin Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
-  M. Alamir. siso_predictor: A python source a modified derivative-free torczon algorithm. GitHub repository: https://github.com/mazenalamir/torczon, 2020.
-  M. Alamir. siso_predictor: A python source for the implementation of the identification framework. GitHub repository: https://github.com/mazenalamir/siso_predictor, 2020.
-  J. E. Dennis, Jr. and Virginia Torczon. Direct search methods on parallel machines. SIAM Journal on Optimization, 1(4):448–474, 1991.
-  Andersson J. A. E., J. Gillis, G. Horn, J. B Rawlings, and M. Diehl. CasADi – A software framework for nonlinear optimization and optimal control. Mathematical Programming Computation, 2018.
-  Bailey J. E., F. J. M. Horn, and R. C. Lin. Cyclic operation of reaction systems: effect of heat and mass transfer resistance. AIChE Journal, 17(4):818–825, 1971.
-  D. Q. Mayne, J.B. Rawlings, C. V. Rao, and P. O. M. Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36:789–814, 2000.
-  F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
-  Virginia Torczon. On the convergence of pattern search algorithms. SIAM Journal on Optimization, 7(1):1–25, 1997.