This toolbox is designed to solve convex optimization problems of the form
or more generally
where the are lower semi-continuous convex functions from to . We assume and the have non-empty domains, where the domain of a function is given by
In problem (2), and when both and are smooth functions, gradient descent methods can be used to solve (1); however, gradient descent methods cannot be used to solve (1) when and/or are not smooth. In order to solve such problems more generally, we implement several algorithms including the forward-backward algorithm [gabay1983chapter]-[combettes2005signal] and the Douglas-Rachford algorithm [douglas1956numerical, lions1979splitting]-[combettes2007douglas].111In fact, the toolbox implements generalized versions of these algorithms that can solve problems with sums of a general number of such functions, but for simplicity, we discuss here the simplest case of two functions.
Both the forward-backward and Douglas-Rachford algorithms fall into the class of proximal splitting algorithms. The term proximal refers to their use of proximity operators, which are generalizations of convex projection operators. The proximity operator of a lower semi-continuous convex function is defined by
Note that the minimization problem in (3) has a unique solution for every , so is well-defined. The proximity operator is a useful tool because (see, e.g., [martinet1972determination, rockafellar1976monotone]) is a minimizer in (1) if and only if for any ,
The term splitting refers to the fact that the proximal splitting algorithms do not directly evaluate the proximity operator , but rather try to find a solution to (4) through sequences of computations involving the proximity operators and separately. The recent survey [combettes2011proximal] provides an excellent review of proximal splitting algorithms used to solve (1) and related convex optimization problems.
The toolbox is essentially made of three kind of functions:
Solvers: the core of the toolbox
Proximity operators: they solve small minimization problems and allow a quick implementation of common problems.
Demonstration files: examples to help you to use the toolbox
The design of the UNLocBoX
was largely inspired by the LTFAT toolbox [ltfatnote015]. The authors are jointly developing mat2doc a documentation system that allows to generate documentation from source files.
2 Solvers / algorithm
The forward-backward algorithm can be used to solve
when either or is a continuously differentiable convex function with a Lipschitz continuous gradient. A function has a -Lipschitz-continuous gradient if
If, without loss of generality, is the function with a -Lipschitz-continuous gradient , then is a solution to (1) if and only if for any (see, e.g., [combettes2005signal, Proposition 3.1]),
The forward-backward algorithm finds a point satisfying (6) by computing a sequence via
For any , the sequence converges to a point satisfying (6), which is therefore a minimizer of (1). For a detailed convergence analysis that includes generalizations of (7) which may result in improved convergence rates, see [combettes2005signal, Theorem 3.4].
The associated Matlab function forward_backward takes four parameters
function sol = forward_backward (x0, f1, f2, param).
x0 is the starting point for the algorithm. f1 and f2 are two Matlab structures that represent the functions and . Each of these structures needs at least two fields. The Matlab structure f1 contains the fields f1.eval and f1.prox
. The former is a Matlab function that takes as input a vectorand returns the value , the latter is a Matlab function that takes as input a vector , a strictly positive real number and returns the vector (In Matlab, write: f1.prox=@(x, T) prox_f1(x, T), where prox_f1(x, T) solves the problem given in equation 3). In the same way, the Matlab structure f2 contains the fields f2.eval and f2.grad. The former is a Matlab function that takes as input a vector and returns the value , the latter is also a Matlab function that takes as input a vector and returns the vector (In Matlab, write: f2.grad=@(x) grad_f2(x), where grad_f2(x) return the value of ). Finally, param is a Matlab structure that containing a set of optional parameters. The list of parameters is described in the help of the function itself. The following two fields are specific to the forward_backward function:
param.method: “ISTA” or “FISTA”. Specify the method used to solve problem (1). ISTA stands for the original forward-backward algorithm while FISTA stands for its accelerated version (for details, see [beck2009fast]).
param.gamma: step-size parameter . This constant should satisfy: , for .
param.lambda: weight of the update term used in ISTA method . By default, it’s equal to one.
The Douglas-Rachford algorithm [douglas1956numerical][lions1979splitting] is a more general algorithm to solve
that does not require any assumptions on the smoothness of or . For any constant , a point is a solution to (1) if and only if there exists a such that [combettes2007douglas, Proposition 18]
The convergence of the Douglas-Rachford algorithm is analyzed in [combettes2004solving, Corollary 5.2] and [combettes2007douglas, Theorem 20], which also show that the algorithm can be accelerated by allowing the step size to change at every iteration of the algorithm. Note that while the Douglas-Rachford algorithm does not require any smoothness assumptions on or , it requires two proximal steps at each iteration, as compared to one proximal step per iteration for the forward-backward algorithm.
The associated Matlab function douglas_rachford takes four parameters
function sol = douglas_rachford (x0, f1, f2, param).
As in Section 2.1, x0 is the starting point for the algorithm. f1 and f2 are two Matlab structures that represent the functions and . The Matlab structure f1 contains the fields f1.eval and f1.prox. The former is a Matlab function that takes as input a vector and returns the value , the latter is a Matlab function that takes as input a vector , a strictly positive real number and returns the vector . The Matlab structure f2 contains the exact same fields but for the function . Finally, param contains a list of parameters. The list of parameters is described in the help of the function itself. The following two fields are specific to the douglas_rachford function:
param.lambda: acts as a stepsize parameter. Let , should be in the interval . Its default value is .
param.gamma: controls the speed of convergence. Its default value is .
2.2 Alternating-direction method of multipliers (ADMM)
Augmented Lagrangian techniques are classical approaches for solving problem like:
First we reformulate (12) to
We then solve this problem using the augmented Lagrangian technique.
the proximal operator of function is defined to be:
The ADMM algorithm can be used when and are in and in with invertible and . The associated Matlab function ADMM takes five parameters:
function sol = admm(x_0, f1, f2, param).
As in Section 2.1, x0 is the starting point for the algorithm. f1 and f2 are two Matlab structures that represent the functions and . The Matlab structure f1 contains the fields f1.eval and f1.prox. The former is a Matlab function that takes as input a vector and returns the value , the latter is a Matlab function that takes as input a vector , a strictly positive real number and returns the vector . The Matlab structure f2 contains the exact same fields but for the function . Finally, param contains a list of parameters. The list of parameters is described in the help of the function itself. The following field is specific to the admm function:
param.gamma: controls the speed of convergence. Its default value is .
param.L: is an operator or a matrix. Its default value is the identity.
2.3 Other solvers
The UNLocBoX contains some other solvers that should not be forgotten. Very important are the generalization of forward backard (generalized forward backward), Douglas Rachford (PPXA) and ADMM (SDMM). Those allows to solve problems of the type (2). A list of the solver can be found at http://unlocbox.sourceforge.net/doc/solver/index.php.
3 Proximal operator
For every function that is minimized by the UNLocBoX, the gradient or the proximal operator has to be determined. The toolbox provide some of the most common ones (see table 1). A complete list can be found at http://unlocbox.sourceforge.net/doc/prox/index.php.
All proximal operator functions take as input three parameters: x, lambda, param. First, x is the initial signal. Then lambda is the weight of the objective function. Finally, param is a Matlab structure that containing a set of optional parameters.
|prox_l1||norm proximal operator. Solve: min_x 12 ∥x - z∥_2^2 + λ∥Ψ(x)∥_1|
|Prox_linf||norm proximal operator. Solve: min_x 12 ∥x - z∥_2^2 + λ∥Ψ(x)∥_∞|
|prox_tv||TV norm proximal operator. Solve: min_x 12 ∥x - z∥_2^2 + λ∥x∥_TV|
|prox_l12||norm222The norm of x for a group ensemble is: proximal operator. Solve: min_x 12 ∥x - z∥_2^2 + λ∥x∥_12|
|prox_l1inf||norm333The norm of x for a group ensemble is: proximal operator. Solve: min_x 12 ∥x - z∥_2^2 + λ∥x∥_1∞|
|prox_nuclear_norm||norm proximal operator. Solve: min_x 12 ∥x - z∥_2^2 + λ∥x∥_*|
If we would like to restrict the set of admissible functions to a subset of , i.e. find the optimal solution to (2) considering only solutions in , we can use projection operator instead of proximal operators. Indeed proximal operators are generalization of projections. For any nonempty, closed and convex set , the indicator function [combettes2011proximal] of is defined as
The corresponding proximity operator is given by the projection onto the set :
Such restrictions are called constraints and can be given, e.g. by a set of linear equations that the solution is required to satisfy. Table 2 summaries some of the projections function present in the toolbox.
|proj_b1||Projection on B1-ball. lambda is an unused parameter for compatibility reasons. Solve: min_x ∥x - z∥_2^2 s.t. ∥x∥_1 < ϵ|
|proj_b2||Projection on B2-ball. lambda is an unused parameter for compatibility reasons. Solve: min_x ∥x - z∥_2^2 s.t. ∥x∥_2< ϵ|
Let’s suppose we have a noisy image with missing pixels. Our goal would be to find the closest image to the original one. We begin first by setting up some assumptions about the problem.
In this particular example, we firstly assume that we know the position of the missing pixels. This can be the result of a previous process on the image or a simple assumption. Secondly, we assume that the image is not special. Thus, it is composed of well delimited patches of colors. Thirdly, we suppose that known pixels are subject to some Gaussian noise with a variance of.
Formulation of the problem
At this point, the problem can be expressed in a mathematical form. We will simulate the masking operation by a mask . This first assumption leads to a constraint.
where is the vectorized image we want to recover, are the observe noisy pixels and a linear operator representing the mask. However due to the addition of noise this constraint is a little bit relaxed. We rewrite it in the following form
Note that can be chosen equal to 0 to satisfy exactly the constraint. In our case, as the measurements are noisy, we set
to the standard deviation of the noise.
Using the prior assumption that the image has a small TV-norm (image composed of patch of color and few degradee), we will express the problem as
where b is the degraded image and A an linear operator representing the mask. is a free parameter that tunes the confidence to the measurements. This is not the only way to define the problem. We could also write:
with the first function playing the role of a data fidelity term and the second a prior assumption on the signal. adjusts the tradeoff between measurement fidelity and prior assumption. We call it the regularization parameter. The smaller it is, the more we trust the measurements and vice-versa. play a similar role as . Note that there exist a bijection between the parameters and leading to the same solution. The bijection function is not trivial to determine. Choosing between one or the other problem will affect the solvers and the convergence rate.
Solving problem I
The UNLocBoX solvers take as input functions with their proximity operator or with their gradient. In the toolbox, functions are modelized with structure object with at least two fields. One field contains an operator to evaluate the function and the other allows to compute either the gradient (in case of differentiable function) or the proxitity operator ( in case of non differentiable functions). In this example, we need to provide two functions:
The proximal operator of is defined as:
This function is defined in Matlab using:
f1.prox=@(x, T) prox_tv(x, T, paramtv);
This function is a structure with two fields. First, f1.prox is an operator taking as input and and evaluating the proximity operator of the function ( plays the role of is the equation above). Second, and sometime optional, f1.eval is also an operator evaluating the function at .
The proximal operator of the TV norm is already implemented in the UNLocBoX by the function prox_tv. We tune it by setting the maximum number of iterations and a verbosity level. Other parameters are also available (see documentation http://unlocbox.sourceforge.net/doc.php).
paramtv.verbose selects the display level (0 no log, 1 summary at convergence and 2 display all steps).
paramtv.maxit defines the maximum number of iteration.
is the indicator function of the set S defines by . We define the proximity operator of as
with is zero if x is in the set S and infinite otherwise. This previous problem has an identical solution as:
It is simply a projection on the B2-ball. In matlab, we write:
The prox field of f2 is in that case the operator computing the projection. Since we suppose that the constraint is satisfied, the value of the indicator function is . For implementation reasons, it is better to set the value of the operator f2.eval to eps than to . Note that this hypothesis could lead to strange evolution of the objective function. Here the parameter A and At are mendatory. Note that A = At, since the masking operator can be performed by a diagonal matrix containing 1’s for observed pixels and 0’s for hidden pixels.
At this point, a solver needs to be selected. The UNLocBoX contains many different solvers. You can try them and observe the convergence speed. Just remember that some solvers are optimized for specific problems. In this example, we present two of them forward_backward and douglas_rachford. Both of them take as input two functions (they have generalization taking more functions), a starting point and some optional parameters.
In our problem, both functions are not smooth on all points of the domain leading to the impossibility to compute the gradient. In that case, solvers (such as forward backward) using gradient descent cannot be used. As a consequence, we will use Douglas Rachford instead. In matlab, we write:
sol = douglas_rachford(y,f1,f2,param);
param.verbose selects the display level (0 no log, 1 summary at convergence and 2 display all steps).
param.maxit defines the maximum number of iteration.
param.tol is stopping criterion for the loop. The algorithm stops if
where is the objective function at iteration t
param.gamma defines the stepsize. It is a compromise between convergence speed and precision. Note that if gamma is too big, the algorithm might not converge.
The solution is displayed in figure 2
Solving problem II
Solving problem II instead of problem I can be done with a small modification of the previous code. First we define another function as follow:
The structure of f3 contains a field f3.grad. In fact, the l2 norm is a smooth function. As a consequence the gradient is well defined on the entire domain. This allows using the forward backward solver. However, we can in this case also use the Douglas Rachford solver. For this we have defined the field f3.prox.
We remind that forward backward will not use the field f3.prox and Douglas Rachford will not use the field f3.grad. The solvers can be called by:
sol21 = forward_backward(y,f1,f3,param);
sol22 = douglas_rachford(y,f3,f1,param);
These two solvers will converge (up to numerical errors) to the same solution. However, convergence speed might be different. As we perform only 100 iterations with both of them, we do not obtain exactly the same result. Results is shown in figure 3.
Remark: The parameter lambda (the regularization parameter) and epsilon (The radius of the l2 ball) can be chosen empirically. Some methods allow to compute those parameter. However, this is far beyond the scope of this tutorial.