tvopt
tvopt is a prototyping and benchmarking Python framework for time-varying (or online) optimization.
This paper introduces tvopt, a Python framework for prototyping and benchmarking time-varying (or online) optimization algorithms. The paper first describes the theoretical approach that informed the development of tvopt. Then it discusses the different components of the framework and their use for modeling and solving time-varying optimization problems. In particular, tvopt provides functionalities for defining both centralized and distributed online problems, and a collection of built-in algorithms to solve them, for example gradient-based methods, ADMM and other splitting methods. Moreover, the framework implements prediction strategies to improve the accuracy of the online solvers. The paper then proposes some numerical results on a benchmark problem and discusses their implementation using tvopt. The code for tvopt is available at https://github.com/nicola-bastianello/tvopt.
In recent years, time-varying (or online) optimization has received increasing interest from the optimization, control, and learning communities [18, 11, 20]. Indeed, in many applications technological advances brought about a shift from traditional optimization problems to problems that have a dynamic nature, e.g. because they depend on streaming sources of data. Static optimization techniques then need to be revisited and adapted in order to provide reliable, on-the-fly algorithms for solving time-varying problems.
The goal of this paper is to propose tvopt, a framework written in Python for prototyping and benchmarking of online optimization algorithms, and to facilitate this shift from a static to a dynamic optimization context. The idea indeed is to provide all the necessary tools to model time-varying optimization problems, and to implement suitable solution algorithms and analyze their performance.
Formally, time-varying optimization problems can be modeled as follows:
$$x^*(t) = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} f(x; t) \tag{1}$$
where $f(x; t)$ is a cost function that varies over time $t$. For example, we may have to solve a regression task that employs time-varying observations of a signal affected by additive noise.
We are also interested in the distributed counterpart of (1), defined as
$$x^*(t) = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \sum_{i=1}^{N} f_i(x; t) \tag{2}$$
where $N$ is the number of agents cooperating towards the solution of (2), and each $f_i$ is the private local cost of agent $i$. For example, a multi-agent system of robots may encode a coordination task (e.g. moving in formation) as the distributed optimization problem, which is inherently time-varying due to the dynamic nature of the system.
Different online optimization algorithms have been proposed, both for centralized problems e.g. [14, 23], and in decentralized scenarios [16, 22, 6]. An interesting approach to developing online algorithms is that of prediction-correction, proposed in [23, 21] and extended in [5]. The main idea is to exploit past information on the problem to improve the solution accuracy of the algorithm.
Applications in which online algorithms are required range from signal and image processing, to control, and smart grids.
We refer to the surveys [11, 20] for an in-depth literature review and a discussion of the different applications.
This paper describes the tvopt framework and the theoretical approach that informed its design. The paper describes the main features of tvopt, which can be summarized as follows:
problem modeling: the framework provides an object-oriented approach to modeling and defining online optimization problems, both the costs and constraints;
decentralized problems: moreover, tvopt offers tailored tools to model multi-agent networks and decentralized problems;
solvers: tvopt implements widely used solution algorithms for different classes of unconstrained and constrained problems, both centralized and decentralized.
The paper also presents some results of numerical simulations performed on a benchmark problem, and shows and discusses their implementation using the tools of tvopt.
Section II discusses the theoretical approach that informs the design of tvopt. Section III describes the main components of the framework and their use for online optimization, while section IV presents additional tools for simulating distributed online problems. Section V concludes with a numerical example implemented using tvopt, and some simulation results.
In this section we review the approach to time-varying optimization that informed the design of the tvopt framework. We refer the interested reader to the theoretical framework developed in [5] and the surveys [11, 20] for more details.
At the foundation of tvopt is a discrete-time approach to online optimization, see e.g. [11], which samples (1) in the following sequence of (static) problems:
$$x_k^* = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} f(x; t_k), \quad k \in \mathbb{N}, \tag{3}$$
where $t_k = k T_s$, $k \in \mathbb{N}$, are the sampling instants and $T_s > 0$ is a chosen sampling time. This approach is opposed to a continuous-time one, see e.g. [13].
The goal then is to track the optimal trajectory $\{x_k^*\}_{k \in \mathbb{N}}$ given by the sequence of minimizers of the sampled costs. However, the dynamic nature of the problem implies that the optima can be tracked only approximately, since a limited computational time (upper bounded by $T_s$) is available to solve each problem in the sequence.
In order to illustrate this framework, consider the following examples.
Consider a signal to be reconstructed from noisy observations corrupted by random noise. In this case we can define the cost $f(x; t_k)$ as the sum of two terms, one that fits the observed data and one that promotes sparsity.
In model predictive control (MPC), a control law is designed by solving a sequence of optimization problems which vary over time, since they depend on the states of a dynamical system. Thus MPC can be cast as a time-varying optimization problem and solved using tools developed in this framework. See [20] for an overview.
As remarked above, each problem in the sequence (3) needs to be (approximately) solved. We thus introduce the concept of solver, by which we mean any recursive algorithm that can be applied to the sampled problems.
Due to the limited computational time that is available between the observation of a problem and the next, in general we cannot solve each problem in the sequence exactly. Rather, we apply a finite number of steps of the solver to each sampled problem, denoted by $N_\mathrm{C}$. This yields an approximate solution of the problems and thus leads to an approximate tracking of the optimal trajectory.
Depending on the structure of the problem, a wide array of solvers can be used. For example, if the cost $f$ is smooth, then a gradient method is a suitable solver. Or, for a composite problem, in which the cost is the sum of a smooth and a non-smooth term, we can apply a proximal gradient method.
As proposed in [23, 21] and further extended in [5], the knowledge of past sampled problems can be exploited to improve the tracking of the optimal trajectory. Indeed, the information collected up to time $t_k$ can be used to shape a prediction of the (as yet unobserved) problem at time $t_{k+1}$. Then an approximate solution of the predicted problem can be used to warm-start the solver when applied to the actual problem (that is, the approximate prediction solution is used as initial condition for the solver applied to the problem at time $t_{k+1}$). As proved in [5], this approach makes it possible to reduce the tracking error.
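A very simple prediction strategy is to choose $\hat{f}_{k+1}(x) = f(x; t_k)$, where $\hat{f}_{k+1}$ denotes the prediction of the cost at time $t_{k+1}$. Another, called extrapolation, chooses $\hat{f}_{k+1}(x) = 2 f(x; t_k) - f(x; t_{k-1})$.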
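The two strategies can be compared on a toy problem. The sketch below is not part of tvopt: it assumes the scalar cost $f(x; t) = \frac{1}{2}(x - b(t))^2$ with $b(t) = \cos(t)$, a sampling time of 0.1, and it reads "extrapolation" as the two-point rule $2 f(\cdot; t_k) - f(\cdot; t_{k-1})$. For this cost the minimizer of each predicted problem is available in closed form, so the prediction error can be measured directly.

```python
import math

T_S = 0.1  # sampling time (assumed value)

def b(t):
    """Time-varying target; f(x; t) = 0.5 * (x - b(t))**2 is minimized at x = b(t)."""
    return math.cos(t)

def one_step_minimizer(k):
    # f_hat_{k+1} = f(.; t_k): the predicted minimizer is b(t_k)
    return b(k * T_S)

def extrapolation_minimizer(k):
    # f_hat_{k+1} = 2 f(.; t_k) - f(.; t_{k-1}) has gradient
    # 2 (x - b_k) - (x - b_{k-1}), which vanishes at 2 b_k - b_{k-1}
    return 2 * b(k * T_S) - b((k - 1) * T_S)

steps = range(1, 100)
# worst-case distance between the predicted and the true minimizer at t_{k+1}
err_one_step = max(abs(b((k + 1) * T_S) - one_step_minimizer(k)) for k in steps)
err_extrapolation = max(abs(b((k + 1) * T_S) - extrapolation_minimizer(k)) for k in steps)
print(err_one_step, err_extrapolation)
```

On this example the one-step error scales with the sampling time, while the extrapolation error scales with its square, which is the kind of improvement that motivates the prediction step.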
To conclude this section, we detail the steps of a prediction-correction method for solving (3), see Figure 1, and refer to [5] for further details.
Initialization: choose the sampling time $T_s$, a prediction strategy, and the solver and its parameters.
At each sampling time $t_k$, $k \in \mathbb{N}$:
Prediction: (i) predict the problem by computing $\hat{f}_{k+1}$, and (ii) solve it approximately with $N_\mathrm{P}$ steps of the solver, which yields the prediction $\hat{x}_{k+1}$.
Correction: (i) sample the new problem at time $t_{k+1}$, and (ii) solve it approximately with $N_\mathrm{C}$ steps of the solver, using $\hat{x}_{k+1}$ as initial condition. The result is denoted by $x_{k+1}$.
The framework delineated above can be particularized by omitting either the prediction or correction step. We remark that omitting the correction step yields the approach usually employed in online learning [18].
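The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not tvopt code: the cost is the scalar quadratic $f(x; t) = \frac{1}{2}(x - \cos(\omega t))^2$, the solver is a gradient method, the prediction is the two-point extrapolation, and all numerical values are arbitrary choices.

```python
import math

T_S, OMEGA = 0.1, 1.0        # sampling time and signal frequency (assumed)
STEP, N_P, N_C = 0.5, 5, 5   # step-size, prediction and correction steps (assumed)

def target(k):
    """Minimizer of the sampled cost f(x; t_k) = 0.5 * (x - cos(OMEGA * t_k))**2."""
    return math.cos(OMEGA * k * T_S)

def gradient_steps(center, x, num_iter):
    """num_iter gradient steps on 0.5 * (x - center)**2, starting from x."""
    for _ in range(num_iter):
        x = x - STEP * (x - center)
    return x

def track(use_prediction, num_samples=500):
    """Return the tracking errors |x_k - x_k^*| of the online method."""
    x_hat, errors = 0.0, []
    for k in range(num_samples):
        # correction: sample the cost at t_k, warm-start from the prediction
        x = gradient_steps(target(k), x_hat, N_C)
        errors.append(abs(x - target(k)))
        if use_prediction and k >= 1:
            # prediction: the extrapolated cost 2 f(.; t_k) - f(.; t_{k-1})
            # is a quadratic centered at 2 b_k - b_{k-1}
            x_hat = gradient_steps(2 * target(k) - target(k - 1), x, N_P)
        else:
            x_hat = x  # no prediction: warm-start from the last iterate
    return errors

err_pc = max(track(True)[-100:])   # prediction-correction
err_c = max(track(False)[-100:])   # correction-only
print(err_pc, err_c)
```

Calling `track(False)` recovers the correction-only particularization mentioned above, which on this example tracks the optimum less accurately than the prediction-correction variant.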
In this section we describe the main components of tvopt and their application to online optimization as reviewed in the previous section; see [8] for the full documentation. The framework can be conceptually divided into the following parts:
Problem formulation: the sub-modules sets and costs, which implement an object-oriented approach to defining time-varying problems. The sub-module networks can be used alongside the previous two to define distributed problems.
Prediction: the sub-module prediction is provided for approximating future problems based on the problems observed in the past.
Solvers: the sub-modules solvers and distributed_solvers implement a wide range of solvers that can be applied to (3).
In the following we describe the sub-modules of tvopt in more detail, while section IV will discuss the specific tools implemented for online distributed optimization.
This sub-module implements the Set objects which are used to define the domain of the cost objects. In particular, a Set is defined as a subset of $\mathbb{R}^{n_1 \times n_2 \times \ldots}$, for some $n_1, n_2, \ldots \in \mathbb{N}$. Set objects are then characterized by the dimensions of the underlying space, which are stored in the attribute shape.
In this sub-module and in the following the unknown of problem (3) is modeled as a NumPy ndarray of proper size [15]. In tvopt, then, sets are built to be compatible with NumPy’s ndarrays, and to use their broadcasting rules.
We remark that the most commonly used domains in online optimization are $\mathbb{R}^n$ and $\mathbb{R}^{n \times m}$; the latter can be used, for example, for image processing without the need to vectorize. The definition of Set objects with more than two dimensions can however be useful as well, for example in the distributed scenario discussed in section IV.
The other element of a Set definition is its contains method, which returns True if an input is in the Set, False otherwise. When defining a Set, the projection method should also be implemented, which, given an input $x$, returns its projection onto the set, defined as
$$\operatorname{proj}_{\mathbb{S}}(x) = \operatorname*{arg\,min}_{y \in \mathbb{S}} \frac{1}{2} \| y - x \|^2$$
with $\mathbb{S}$ the Set.
Finally, Set objects provide a check_input method which verifies if a given array x fits the dimension of the set (possibly reshaping it). This is useful to implement validity checks on the inputs inside a Cost method (see section III-B).
The contains method can be accessed using the Python reserved keyword in. Set objects can be modified via the scale and translate methods. We can also define intersections of Sets by summing them (that is, using the + operator), in which case an approximate projection onto the intersection is implemented using the method of alternating projections (MAP) [9]. We remark that MAP returns a point in the intersection, not the actual projection; however, in practice MAP is faster than methods that are proven to return the projection, see [9].
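The interface described above can be illustrated with a small stand-alone sketch. The classes below are hypothetical stand-ins written for this example (they are not tvopt's Set classes and use plain lists rather than ndarrays); they only mirror the ideas of a contains method reachable through `in`, a projection method, and `+` building an intersection whose projection uses MAP.

```python
import math

class SetSketch:
    """Hypothetical base class mimicking the interface described above."""
    def __contains__(self, x):           # enables `x in s`
        return self.contains(x)
    def __add__(self, other):            # `s1 + s2` stands for the intersection
        return IntersectionSketch(self, other)

class Ball(SetSketch):
    def __init__(self, center, radius):
        self.center, self.radius = list(center), radius
        self.shape = (len(self.center),)
    def contains(self, x, tol=1e-9):
        return math.dist(x, self.center) <= self.radius + tol
    def projection(self, x):
        d = math.dist(x, self.center)
        if d <= self.radius:
            return list(x)
        return [c + self.radius * (xi - c) / d for xi, c in zip(x, self.center)]

class Box(SetSketch):
    def __init__(self, lower, upper):
        self.lower, self.upper = list(lower), list(upper)
        self.shape = (len(self.lower),)
    def contains(self, x, tol=1e-9):
        return all(l - tol <= xi <= u + tol
                   for xi, l, u in zip(x, self.lower, self.upper))
    def projection(self, x):
        return [min(max(xi, l), u) for xi, l, u in zip(x, self.lower, self.upper)]

class IntersectionSketch(SetSketch):
    def __init__(self, first, second):
        self.first, self.second = first, second
        self.shape = first.shape
    def contains(self, x):
        return x in self.first and x in self.second
    def projection(self, x, num_iter=100):
        # method of alternating projections: yields a point of the
        # intersection, not necessarily the exact projection
        for _ in range(num_iter):
            x = self.second.projection(self.first.projection(x))
        return x

ball, box = Ball([0.0, 0.0], 1.0), Box([0.5, -2.0], [2.0, 2.0])
p = (ball + box).projection([3.0, 3.0])
print(p, p in ball, p in box)
```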
Different Sets are implemented, for example: the whole space, ball and box sets, and half-spaces. A particular built-in set is T, which defines the set of sampling instants.
The sub-module costs implements the Cost object to define time-varying cost functions $f(x; t)$ or, as a sub-case, static costs. A cost is characterized by the dom and (optionally) time attributes, which point to Sets for the domain of $x$ and the sampling times $t_k$.
Cost objects are then defined by the function, gradient and hessian methods, where gradient returns a (sub-)gradient evaluation and hessian is implemented only if the cost is twice differentiable.
For example, we evaluate the (sub-)gradient of a function F as F.gradient(x, t).
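The costs then provide a proximal method, which computes:
$$\operatorname{prox}_{\rho f(\cdot; t)}(x) = \operatorname*{arg\,min}_{y} \left\{ f(y; t) + \frac{1}{2\rho} \| y - x \|^2 \right\}$$
for a penalty parameter $\rho > 0$, using either a gradient or Newton method, depending on the smoothness of $f$. If a closed-form proximal is available, e.g. for quadratic costs or norms, then this method should be overridden.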
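To illustrate why a closed-form proximal is worth overriding, the following sketch compares a generic numerical proximal (gradient descent on the regularized problem) with the closed form for a scalar quadratic. The function names, the penalty convention $f(y) + \frac{1}{2\rho}(y - x)^2$, and the numerical values are assumptions made for the example; they are not taken from tvopt.

```python
def proximal_numeric(gradient, x, penalty=1.0, step=0.1, num_iter=500):
    """Approximate the proximal by gradient descent on
    y -> f(y) + (y - x)**2 / (2 * penalty)."""
    y = x
    for _ in range(num_iter):
        y = y - step * (gradient(y) + (y - x) / penalty)
    return y

# scalar quadratic f(y) = 0.5 * a * y**2 - b * y (illustrative values)
a, b = 2.0, 1.0

def quadratic_gradient(y):
    return a * y - b

def proximal_closed_form(x, penalty=1.0):
    # optimality condition: a * y - b + (y - x) / penalty = 0
    return (x + penalty * b) / (1.0 + penalty * a)

x, rho = 3.0, 0.5
y_numeric = proximal_numeric(quadratic_gradient, x, penalty=rho)
y_closed = proximal_closed_form(x, penalty=rho)
print(y_numeric, y_closed)
```

Both routes return the same point, but the closed form costs one expression instead of hundreds of gradient evaluations.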
Time-varying costs provide the time_derivative method which computes, using backward finite differences [17], derivatives of $f$ (or of its gradient and Hessian) w.r.t. time. For example the time-derivative of the gradient is approximated by
$$\nabla_{tx} f(x; t_k) \approx \frac{\nabla_x f(x; t_k) - \nabla_x f(x; t_{k-1})}{T_s}.$$
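A backward finite difference of the gradient can be checked on a cost whose time derivative is known analytically. The sketch below assumes the scalar cost $f(x; t) = \frac{1}{2}(x - \cos(t))^2$ and a sampling time of 0.1; the function names are illustrative and do not reproduce tvopt's time_derivative signature.

```python
import math

T_S = 0.1  # sampling time (assumed value)

def gradient(x, t):
    """Gradient in x of the illustrative cost f(x; t) = 0.5 * (x - cos(t))**2."""
    return x - math.cos(t)

def time_derivative_of_gradient(x, t):
    """Backward finite difference of the gradient with respect to time."""
    return (gradient(x, t) - gradient(x, t - T_S)) / T_S

x, t = 0.3, 2.0
approx = time_derivative_of_gradient(x, t)
exact = math.sin(t)  # analytical d/dt of (x - cos(t))
print(approx, exact)
```

For this cost the approximation error is bounded by half the sampling time, so a smaller sampling time gives a more accurate derivative.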
Further, the costs have a sample method that returns a static cost representing $f(\cdot; t_k)$ at a chosen sampling instant $t_k$.
Costs can be scaled by a scalar and elevated to a given power, and they can be summed and multiplied by other costs.
Some of the Cost objects that are implemented are norms, quadratic costs, the Huber loss, and the indicator function of any given Set. The benchmark dynamic costs proposed in [23, section IV.A] and [25, section III.B] are implemented. Dynamic costs can also be defined from a sequence of static costs using DiscreteDynamicCost.
The sub-module prediction implements the Prediction object for approximating future costs from previously sampled costs. The object stores a dynamic Cost to be predicted and, through the method update, uses information on the cost up to a specified time to shape a prediction. The object behaves like a static cost, in the sense that it exposes the function, gradient, etc. methods of the (static) predicted cost.
The sub-module implements the Taylor expansion-based and extrapolation-based prediction strategies studied in [5].
The sub-module solvers implements a selection of algorithms for solving different classes of static problems. The solvers are Python functions that are passed a static problem (in the form of a dictionary), the number of iterations to be applied, and any required parameters, such as step-sizes. The functions then return an approximate solution.
None of the solvers provided by tvopt is tailored to a specific choice of cost functions. Instead, they are defined to exploit the common template of Cost objects by calling e.g. their gradient, without needing to know how gradient is actually computed.
Notice that the modular design of solvers makes it possible to define costs that, for example, compute gradient inexactly using a zeroth-order approximation, without the need to implement a different solver.
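The decoupling between solvers and costs can be sketched as follows. The classes and the `gradient_method` function are hypothetical, written only for this illustration: they mimic the idea of a solver that receives a problem dictionary and calls the cost's gradient, while one cost computes it exactly and the other by central finite differences.

```python
class Quadratic:
    """Illustrative cost f(x) = 0.5 * (x - 3)**2 with an exact gradient."""
    def function(self, x):
        return 0.5 * (x - 3.0) ** 2
    def gradient(self, x):
        return x - 3.0

class ZerothOrder:
    """Wraps a cost and approximates its gradient from function values only."""
    def __init__(self, cost, delta=1e-4):
        self.cost, self.delta = cost, delta
    def function(self, x):
        return self.cost.function(x)
    def gradient(self, x):
        # central finite difference, using two function evaluations
        d = self.delta
        return (self.cost.function(x + d) - self.cost.function(x - d)) / (2 * d)

def gradient_method(problem, step, x_0=0.0, num_iter=100):
    """The solver only relies on problem["f"].gradient, however it is computed."""
    f, x = problem["f"], x_0
    for _ in range(num_iter):
        x = x - step * f.gradient(x)
    return x

x_exact = gradient_method({"f": Quadratic()}, step=0.5)
x_zeroth = gradient_method({"f": ZerothOrder(Quadratic())}, step=0.5)
print(x_exact, x_zeroth)
```

The same solver function is reused unchanged for both costs, which is the design choice the text describes.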
There are two implementation choices that underlie the solvers module. First of all, solvers are designed to solve a static problem, since in tvopt we model a time-varying problem as a sequence of static, sampled ones. We recall that dynamic costs provide the sample method. As a by-product, this also makes it possible to employ tvopt for prototyping and benchmarking static optimization algorithms.
The second design choice is to define solvers as functions, rather than objects, in order to provide a more flexible and efficient implementation. Indeed, in the course of solving (3) a solver will be applied to several static problems and (possibly) using different parameters for each of them. As a consequence, defining a solver object is not very different from using a function, since the attributes (e.g. problem and parameters) of the object would need to be changed often, cluttering the syntax.
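The sub-module utils provides the implementation of different metrics for evaluating the performance of a solver. Let $\{x_k\}_{k \in \mathbb{N}}$ be the sequence generated by a solver applied to (3). The available metrics are: fixed point residual defined as $\{\| x_k - x_{k-1} \|\}_{k \in \mathbb{N}}$; tracking error defined as $\{\| x_k - x_k^* \|\}_{k \in \mathbb{N}}$; and regret defined as $\left\{ \frac{1}{k+1} \sum_{j=0}^{k} \left[ f(x_j; t_j) - f(x_j^*; t_j) \right] \right\}_{k \in \mathbb{N}}$.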
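As a concrete reading of these metrics, the sketch below computes them for a short scalar sequence. The definitions used (distance between consecutive iterates, distance from the optimum, and running average of the cost gap) are stated assumptions, and the function names are chosen for the example rather than taken from tvopt's utils.

```python
def fixed_point_residual(xs):
    """Distance between consecutive iterates."""
    return [abs(xs[k] - xs[k - 1]) for k in range(1, len(xs))]

def tracking_error(xs, xs_opt):
    """Distance of each iterate from the corresponding optimum."""
    return [abs(x - x_opt) for x, x_opt in zip(xs, xs_opt)]

def regret(costs, xs, xs_opt):
    """Running average of the gap between achieved and optimal cost."""
    out, total = [], 0.0
    for k, (f, x, x_opt) in enumerate(zip(costs, xs, xs_opt)):
        total += f(x) - f(x_opt)
        out.append(total / (k + 1))
    return out

# three samples of the (here static) cost 0.5 * (x - 1)**2, illustrative data
costs = [lambda x: 0.5 * (x - 1.0) ** 2] * 3
xs, xs_opt = [0.0, 0.5, 0.75], [1.0, 1.0, 1.0]

fpr = fixed_point_residual(xs)
err = tracking_error(xs, xs_opt)
reg = regret(costs, xs, xs_opt)
print(fpr, err, reg)
```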
Examples of built-in solvers are gradient method, proximal point algorithm, forward-backward (also called proximal gradient method) and Peaceman-Rachford splittings, dual ascent, ADMM. The documentation [8] lists all solvers with appropriate references.
We conclude this section by discussing how tvopt can be used to solve online constrained optimization problems.
A first class of constraints that can be implemented is $x \in \mathbb{C}$ where $\mathbb{C}$ is a non-empty, closed, and convex set. Indeed, given a Set object representing $\mathbb{C}$, we can define the indicator function of the set using an Indicator cost object. The indicator is $0$ for $x \in \mathbb{C}$ and $+\infty$ for $x \notin \mathbb{C}$, and its proximal operator coincides with the projection onto $\mathbb{C}$.
Indicator functions can for example appear as the non-smooth term in a composite optimization problem.
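A constrained problem handled through an indicator can be sketched as a proximal gradient iteration in which the proximal step is a projection. The class and function below are hypothetical stand-ins (not tvopt's Indicator or solvers), and the problem, minimizing $\frac{1}{2}(x - 3)^2$ over the interval $[0, 1]$, is an illustrative choice.

```python
class IntervalIndicator:
    """Indicator of [lower, upper]: 0 inside the interval, +inf outside."""
    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper
    def function(self, x):
        return 0.0 if self.lower <= x <= self.upper else float("inf")
    def proximal(self, x, penalty=1.0):
        # the proximal of an indicator is the projection, for any penalty
        return min(max(x, self.lower), self.upper)

def proximal_gradient(gradient, g, step, x_0=0.0, num_iter=50):
    """Gradient step on the smooth term, proximal step on the non-smooth one."""
    x = x_0
    for _ in range(num_iter):
        x = g.proximal(x - step * gradient(x), penalty=step)
    return x

g = IntervalIndicator(0.0, 1.0)
x = proximal_gradient(lambda z: z - 3.0, g, step=0.5)
print(x)
```

The unconstrained minimizer is 3, so the iterates settle on the boundary point of the interval closest to it.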
Equality and inequality constraints can also be defined making use of Cost objects. The costs can then be used to define the Lagrangian of the constrained problem in order to apply primal-dual solvers [10].
A particular class of constrained problems that can be modeled using tvopt is the following:
$$\min_{x \in \mathbb{R}^n,\, y \in \mathbb{R}^m} f(x; t_k) + g(y; t_k) \quad \text{s.t.} \quad A x + B y = c \tag{4}$$
where $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, $c \in \mathbb{R}^p$. As discussed in [5] and references therein, problem (4) can be solved by formulating its dual and applying suitable solvers to it. tvopt provides the following dual solvers: dual ascent, method of multipliers, ADMM, and dual forward-backward splitting. Moreover, distributed versions of dual ascent and ADMM are also implemented.
Notice that the constraint data $A$, $B$, $c$ can be defined using NumPy’s ndarrays, owing to the fact that the unknowns of an optimization problem are modeled as compatible arrays.
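The idea of solving a linearly constrained problem through its dual can be sketched on a scalar instance. The example below is an illustration under stated assumptions, not tvopt's dual ascent: it takes the costs $\frac{1}{2}x^2$ and $\frac{1}{2}y^2$ with the coupling constraint $x + y = c$, for which the primal minimizers of the Lagrangian are available in closed form.

```python
def dual_ascent(c, step=0.3, num_iter=200):
    """Dual ascent for min 0.5*x**2 + 0.5*y**2 subject to x + y = c."""
    lam = 0.0
    for _ in range(num_iter):
        # primal updates: minimize the Lagrangian
        # 0.5*x**2 + 0.5*y**2 + lam * (x + y - c) over x and y
        x, y = -lam, -lam
        # dual update: gradient ascent step using the constraint violation
        lam = lam + step * (x + y - c)
    # primal variables associated with the final multiplier
    return -lam, -lam, lam

x, y, lam = dual_ascent(c=2.0)
print(x, y, lam)
```

At convergence the constraint is satisfied and the two variables share the load equally, as expected from the symmetry of the costs.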
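This section describes the features of tvopt that make it possible to model and solve distributed online problems. As in the case of centralized problems, we consider a sequence of samples from (2):
$$x_k^* = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \sum_{i=1}^{N} f_i(x; t_k), \quad k \in \mathbb{N}. \tag{5}$$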
We remark that the Python framework DISROPT [12] is available for distributed optimization in static settings. Although there is some overlap with tvopt’s features described in this section, differently from DISROPT our goal is to model time-varying problems over networks.
The sub-module networks defines the Network objects that model the connectivity pattern of a multi-agent system cooperating towards the solution of (5). The network is created from a given adjacency matrix.
A network implements the exchange of information between agents via the methods send and receive. In particular, send is called specifying a sender, receiver, and the packet to be exchanged. After calling send, the transmitted packet can be accessed using receive, which by default performs a destructive read of the message.
The network also implements a broadcast method, with which an agent transmits the same message to all its neighbors, and a consensus method, which performs a consensus mixing of given local states using the send and receive methods.
In general, to define a new type of network (see e.g. the built-ins) it is sufficient to override the send method.
The built-in Network class models a standard loss-less network. Other types of networks available are a lossy network (transmissions may randomly fail), and noisy or quantized networks (which add Gaussian noise to or quantize the transmissions, respectively).
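The communication pattern described above can be mimicked with a small stand-alone sketch. The two classes below are hypothetical (they are not tvopt's Network classes and their signatures are assumptions): the base class buffers packets per sender-receiver pair and reads them destructively, while the lossy variant only overrides send.

```python
import random

class NetworkSketch:
    """Loss-less network built from an adjacency matrix (list of lists)."""
    def __init__(self, adj_mat):
        self.adj_mat = adj_mat
        self.buffer = {}
    def neighbors(self, i):
        return [j for j, a in enumerate(self.adj_mat[i]) if a and j != i]
    def send(self, sender, receiver, packet):
        self.buffer[(sender, receiver)] = packet
    def receive(self, receiver, sender, destructive=True):
        if destructive:
            # the message is removed from the buffer once read
            return self.buffer.pop((sender, receiver), None)
        return self.buffer.get((sender, receiver))
    def broadcast(self, sender, packet):
        for j in self.neighbors(sender):
            self.send(sender, j, packet)

class LossyNetworkSketch(NetworkSketch):
    """Each transmission fails with probability loss_prob; only send changes."""
    def __init__(self, adj_mat, loss_prob, seed=0):
        super().__init__(adj_mat)
        self.loss_prob, self.rng = loss_prob, random.Random(seed)
    def send(self, sender, receiver, packet):
        if self.rng.random() >= self.loss_prob:
            super().send(sender, receiver, packet)

ring = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
net = NetworkSketch(ring)
net.broadcast(0, "hello")
first_read = net.receive(1, 0)    # the packet sent by agent 0
second_read = net.receive(1, 0)   # nothing left after the destructive read

lossy = LossyNetworkSketch(ring, loss_prob=1.0)
lossy.broadcast(0, "hello")
lost = lossy.receive(1, 0)        # every transmission failed
```

Because broadcast is written in terms of send, the lossy behavior carries over to it without further changes.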
The sub-module provides also a number of built-in functions for generating the adjacency matrix of different types of graphs, for example: random, circulant or complete graphs.
Formulating distributed optimization problems is done using the SeparableCost object defined in costs, which models a cost
$$F(x; t) = \sum_{i=1}^{N} f_i(x_i; t)$$
with $f_i$ the local cost function of the $i$-th agent, evaluated at its local variable $x_i$. The cost is created from a list of static or dynamic local costs. Notice that the last dimension of $F$’s domain is the number of agents, using the flexibility of Set objects that allow for multiple dimensions.
SeparableCost implements all the methods described in section III-B, with the difference that the outputs are arranged in an array with the last dimension indexing the agents. This choice allows for easier access to the evaluation of each cost $f_i$. For example, if F is separable, then the result of F.function(x, t) will be $[f_1(x_1; t), \ldots, f_N(x_N; t)]$.
A SeparableCost also makes it possible to evaluate e.g. the gradient of a single component function $f_i$ by specifying the argument i.
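The agent-indexed output convention can be illustrated with a minimal stand-in. The class below is hypothetical (it is not tvopt's SeparableCost, handles static scalar costs only, and uses a list in place of an ndarray), but it shows the two access patterns: evaluating all local costs at once, or a single one through the argument i.

```python
class SeparableSketch:
    """Collects local costs; entry i of the output belongs to agent i."""
    def __init__(self, local_costs):
        self.costs = list(local_costs)
        self.N = len(self.costs)
    def function(self, x, i=None):
        if i is not None:
            # evaluate only the cost of agent i at its local variable
            return self.costs[i](x[i])
        return [f(xi) for f, xi in zip(self.costs, x)]

# three agents with local costs 0.5 * (z - b_i)**2 (illustrative data)
F = SeparableSketch([lambda z, b=b: 0.5 * (z - b) ** 2 for b in (1.0, 2.0, 3.0)])
values = F.function([0.0, 0.0, 0.0])
single = F.function([0.0, 0.0, 0.0], i=1)
print(values, single, sum(values))
```

Summing the returned entries gives the value of the overall separable cost.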
The sub-module distributed_solvers then provides built-in implementations of several distributed solvers. The difference with the centralized solvers described in section III-D is that these functions must also be passed a Network object to perform agent-to-agent communications.
The built-in solvers are primal methods, e.g. DPGM [4]; primal-dual methods based on gradient tracking strategies, e.g. [19, 1]; and dual methods, e.g. dual decomposition and ADMM [3].
tvopt also provides different functions to solve average consensus using different protocols, for example gossip consensus [2].
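As a simple illustration of average consensus, the sketch below runs a synchronous protocol in which every agent mixes its state with those of its neighbors. It is not one of tvopt's consensus functions and it is not the gossip protocol of [2]; the max-degree weights and the ring graph are assumptions made for the example.

```python
def consensus(adj_mat, x, num_iter=100):
    """Synchronous average consensus with max-degree weights."""
    n = len(x)
    max_degree = max(sum(row) for row in adj_mat)
    w = 1.0 / (max_degree + 1)
    for _ in range(num_iter):
        # each agent moves towards the states received from its neighbors
        x = [x[i] + w * sum(x[j] - x[i] for j in range(n) if adj_mat[i][j])
             for i in range(n)]
    return x

# ring graph with four agents
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
x0 = [1.0, 2.0, 3.0, 6.0]
x = consensus(ring, x0)
print(x)
```

Since the weights are symmetric, the average of the states is preserved at every iteration, and all agents converge to the average of the initial conditions.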
The following sections present a centralized online optimization benchmark and an example of distributed linear regression. A step-by-step discussion of the code is presented alongside some numerical results.
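The benchmark problem was proposed in [23, section IV.A] and is characterized by
$$f(x; t) = \frac{1}{2} \left( x - \cos(\omega t) \right)^2 + \kappa \log\left( 1 + \exp(\mu x) \right)$$
with $\omega = 0.02\pi$, $\kappa = 7.5$, and $\mu = 1.75$. We test the prediction-correction framework (see [5]) using the extrapolation-based prediction $\hat{f}_{k+1}(x) = 2 f(x; t_k) - f(x; t_{k-1})$ (the alternative Taylor expansion-based prediction is also implemented in the examples section of [7]).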
Defining the cost requires fixing the sampling time and a time horizon.

    from tvopt import costs, prediction, solvers

    # sampling time and time horizon
    t_s, t_max = 0.1, 1e4

    # cost function
    f = costs.DynamicExample_1D(t_s, t_max)

We also define the simulation parameters, with num_pred and num_corr representing the number of prediction and correction steps, $N_\mathrm{P}$ and $N_\mathrm{C}$. The solver we will use is a gradient method, so we define its step-size.

    # num. of prediction and correction steps
    num_pred, num_corr = 5, 5

    step = 0.2  # gradient method's step-size

We define the extrapolation-based prediction (cf. Example 4) with

    f_hat = prediction.ExtrapolationPrediction(f, 2)

where the argument specifies the number of past costs to use for computing the prediction $\hat{f}_{k+1}$.
We then apply the prediction-correction solver as follows.

    x, x_hat = 0, 0

    for k in range(f.time.num_samples):

        # correction
        p = {"f": f.sample(t_s*k)}  # correction problem
        x = solvers.gradient(p, x_0=x_hat, step=step, num_iter=num_corr)

        # prediction
        f_hat.update(t_s*k)
        p = {"f": f_hat}  # prediction problem
        x_hat = solvers.gradient(p, x_0=x, step=step, num_iter=num_pred)

In the code, we update x and x_hat during the correction and prediction steps, respectively. Moreover, notice that the dynamic cost is sampled at every iteration, and that the prediction is consequently updated. The correction and prediction problems are each defined with a dictionary.
Figure 3 depicts the evolution of the tracking error for the prediction-correction method discussed above. The method is compared with a correction-only strategy that does not employ a prediction to warm-start the solver at each sampling time.
The results show that prediction has a valuable effect in improving the performance of the online solver, and tvopt provides an easy way to experiment with prediction strategies.
The problem is (5) with $N$ agents, whose local costs fit noisy observations of a sinusoidal signal corrupted by Gaussian noise. The solver employed is DGD [24]. We report a sample of the code in the following.
The network can be created as follows:

    from tvopt import networks, distributed_solvers

    # N, f, x0, step, num_iter and t_s are defined in the omitted part of the script

    # adjacency matrix
    adj_mat = networks.random_graph(N, 0.5)
    # network
    net = networks.Network(adj_mat)

and the distributed, online solver is implemented with:

    x = x0  # initial condition

    for k in range(f.time.num_samples):

        # problem creation
        problem = {"f": f.sample(t_s*k), "network": net}

        # distributed solver
        x = distributed_solvers.dpgm(problem, step, x_0=x, num_iter=num_iter)

In Figure 4 we report the fixed point residual (defined as $\{\| x_k - x_{k-1} \|\}_{k \in \mathbb{N}}$) for different graph topologies. We remark that, owing to its number of edges, the random graph is the most connected of the four topologies, which explains why it achieves the best results.
The author would like to thank Dr. Andrea Simonetto, Prof. Ruggero Carli, and Prof. Emiliano Dall’Anese for the many valuable discussions.
Foundations and Trends® in Machine Learning 4 (2), pp. 107–194.