UNLocBoX: A MATLAB convex optimization toolbox for proximal-splitting methods

02/04/2014 · by Nathanaël Perraudin, et al.

Convex optimization is an essential tool for machine learning, as many of its problems can be formulated as the minimization of specific objective functions. While a large variety of algorithms is available to solve convex problems, it is increasingly important to focus on efficient, scalable methods that can deal with big data. When the objective function can be written as a sum of "simple" terms, proximal splitting methods are a good choice. UNLocBoX is a MATLAB library that implements many of these methods, designed to solve convex optimization problems of the form min_{x ∈ ℝ^N} ∑_{n=1}^K f_n(x). It contains the most recent solvers such as FISTA, Douglas-Rachford and SDMM, as well as primal-dual techniques such as Chambolle-Pock and forward-backward-forward. It also includes an extensive list of common proximal operators that can be combined, allowing for a quick implementation of a large variety of convex problems.


1 Introduction

This toolbox is designed to solve convex optimization problems of the form

  min_{x ∈ ℝ^N} f_1(x) + f_2(x),    (1)

or more generally

  min_{x ∈ ℝ^N} ∑_{n=1}^K f_n(x),    (2)

where the f_n are lower semi-continuous convex functions from ℝ^N to (−∞, +∞]. We assume lim_{∥x∥_2 → ∞} f_1(x) + f_2(x) = +∞ and that the f_n have non-empty domains, where the domain of a function f is given by dom f := {x ∈ ℝ^N : f(x) < +∞}.

When both f_1 and f_2 are smooth functions, gradient descent methods can be used to solve (1); however, they cannot be used when f_1 and/or f_2 are not smooth. In order to solve such problems more generally, we implement several algorithms including the forward-backward algorithm [gabay1983chapter]-[combettes2005signal] and the Douglas-Rachford algorithm [douglas1956numerical, lions1979splitting]-[combettes2007douglas]. (In fact, the toolbox implements generalized versions of these algorithms that can solve problems with sums of an arbitrary number of such functions, but for simplicity we discuss here the simplest case of two functions.)

Both the forward-backward and Douglas-Rachford algorithms fall into the class of proximal splitting algorithms. The term proximal refers to their use of proximity operators, which are generalizations of convex projection operators. The proximity operator of a lower semi-continuous convex function f : ℝ^N → (−∞, +∞] is defined by

  prox_f(x) := argmin_{y ∈ ℝ^N} 1/2 ∥x − y∥_2^2 + f(y).    (3)

Note that the minimization problem in (3) has a unique solution for every x ∈ ℝ^N, so prox_f : ℝ^N → ℝ^N is well-defined. The proximity operator is a useful tool because (see, e.g., [martinet1972determination, rockafellar1976monotone]) x* is a minimizer in (1) if and only if for any γ > 0,

  x* = prox_{γ(f_1+f_2)}(x*).    (4)

The term splitting refers to the fact that the proximal splitting algorithms do not directly evaluate the proximity operator prox_{γ(f_1+f_2)}, but rather try to find a solution to (4) through sequences of computations involving the proximity operators prox_{γ f_1} and prox_{γ f_2} separately. The recent survey [combettes2011proximal] provides an excellent review of proximal splitting algorithms used to solve (1) and related convex optimization problems.
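For intuition, proximity operators are often available in closed form. As a minimal sketch (an illustration, not part of the toolbox itself), the proximity operator of f(y) = λ∥y∥_1 is the well-known element-wise soft-thresholding operation:

function y = soft_threshold(x, lambda)
% Sketch: evaluates prox_{lambda*||.||_1}(x), i.e. the solution of
% argmin_y 1/2*||x - y||_2^2 + lambda*||y||_1 (element-wise soft-thresholding).
y = sign(x) .* max(abs(x) - lambda, 0);
end

This closed form is what the toolbox function prox_l1 computes in its simplest case.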

The toolbox is essentially made of three kinds of functions:

  • Solvers: the core of the toolbox

  • Proximity operators: they solve small minimization problems and allow a quick implementation of common problems.

  • Demonstration files: examples to help you use the toolbox

The design of the UNLocBoX was largely inspired by the LTFAT toolbox [ltfatnote015]. The authors are jointly developing mat2doc, a documentation system that generates documentation from source files.

2 Solvers / algorithms

The forward-backward algorithm can be used to solve (1) when either f_1 or f_2 is a continuously differentiable convex function with a Lipschitz continuous gradient. A function f has a β-Lipschitz-continuous gradient ∇f if

  ∥∇f(x) − ∇f(y)∥_2 ≤ β ∥x − y∥_2    ∀ x, y ∈ ℝ^N,    (5)

where β > 0.

If, without loss of generality, f_2 is the function with a β-Lipschitz-continuous gradient ∇f_2, then x* is a solution to (1) if and only if for any γ > 0 (see, e.g., [combettes2005signal, Proposition 3.1]),

  x* = prox_{γ f_1}(x* − γ ∇f_2(x*)).    (6)

The forward-backward algorithm finds a point satisfying (6) by computing a sequence {x^(k)} via

  x^(k+1) = prox_{γ f_1}(x^(k) − γ ∇f_2(x^(k))).    (7)

For any starting point x^(0), the sequence {x^(k)} converges to a point satisfying (6), which is therefore a minimizer of (1). For a detailed convergence analysis that includes generalizations of (7) which may result in improved convergence rates, see [combettes2005signal, Theorem 3.4].

The associated Matlab function forward_backward takes four parameters:

function sol = forward_backward(x0, f1, f2, param).

x0 is the starting point for the algorithm. f1 and f2 are two Matlab structures that represent the functions f_1 and f_2. Each of these structures needs at least two fields. The Matlab structure f1 contains the fields f1.eval and f1.prox. The former is a Matlab function that takes as input a vector x and returns the value f_1(x); the latter is a Matlab function that takes as input a vector x and a strictly positive real number T and returns the vector prox_{T f_1}(x) (in Matlab, write f1.prox=@(x, T) prox_f1(x, T), where prox_f1(x, T) solves the problem given in equation (3) for the function T f_1). In the same way, the Matlab structure f2 contains the fields f2.eval and f2.grad. The former is a Matlab function that takes as input a vector x and returns the value f_2(x); the latter is also a Matlab function that takes as input a vector x and returns the vector ∇f_2(x) (in Matlab, write f2.grad=@(x) grad_f2(x), where grad_f2(x) returns the value of ∇f_2(x)). Finally, param is a Matlab structure containing a set of optional parameters. The list of parameters is described in the help of the function itself. The following fields are specific to the forward_backward function:

  • param.method: “ISTA” or “FISTA”. Specifies the method used to solve problem (1). ISTA stands for the original forward-backward algorithm, while FISTA stands for its accelerated version (for details, see [beck2009fast]).

  • param.gamma: step-size parameter γ. This constant should satisfy γ < 2/β, where β is the Lipschitz constant of ∇f_2 in (5).

  • param.lambda: weight λ of the update term used in the ISTA method. By default, it is equal to one.
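As an illustration (a hedged sketch rather than toolbox documentation; the noisy vector y is assumed given), a simple sparse denoising problem min_x ∥x∥_1 + ∥x − y∥_2^2 can be set up and solved as follows:

% Sketch: sparse denoising with the forward-backward solver.
f1.eval = @(x) norm(x, 1);
f1.prox = @(x, T) prox_l1(x, T);   % prox of T*||.||_1 (soft-thresholding)
f2.eval = @(x) norm(x - y, 2)^2;
f2.grad = @(x) 2*(x - y);          % gradient of ||x - y||_2^2, so beta = 2
param.method = 'FISTA';            % accelerated variant
param.gamma = 0.9;                 % step size; satisfies gamma < 2/beta = 1
sol = forward_backward(y, f1, f2, param);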

2.1 Douglas-Rachford

The Douglas-Rachford algorithm [douglas1956numerical, lions1979splitting] is a more general algorithm to solve (1) that does not require any assumptions on the smoothness of f_1 or f_2. For any constant γ > 0, a point x* is a solution to (1) if and only if there exists a y ∈ ℝ^N such that [combettes2007douglas, Proposition 18]

  x* = prox_{γ f_2}(y),    (8)
  prox_{γ f_2}(y) = prox_{γ f_1}(2 prox_{γ f_2}(y) − y).    (9)

To find an x* and y that satisfy (8) and (9), the Douglas-Rachford algorithm computes the following sequence of points, for any fixed γ > 0, starting point y^(0) and stepsize λ:

  x^(k) = prox_{γ f_2}(y^(k)),    (10)
  y^(k+1) = y^(k) + λ ( prox_{γ f_1}(2 x^(k) − y^(k)) − x^(k) ).    (11)

The convergence of the Douglas-Rachford algorithm is analyzed in [combettes2004solving, Corollary 5.2] and [combettes2007douglas, Theorem 20], which also show that the algorithm can be accelerated by allowing the step size to change at every iteration of the algorithm. Note that while the Douglas-Rachford algorithm does not require any smoothness assumptions on or , it requires two proximal steps at each iteration, as compared to one proximal step per iteration for the forward-backward algorithm.

The associated Matlab function douglas_rachford takes four parameters:

function sol = douglas_rachford (x0, f1, f2, param).

As in Section 2, x0 is the starting point for the algorithm. f1 and f2 are two Matlab structures that represent the functions f_1 and f_2. The Matlab structure f1 contains the fields f1.eval and f1.prox. The former is a Matlab function that takes as input a vector x and returns the value f_1(x); the latter is a Matlab function that takes as input a vector x and a strictly positive real number T and returns the vector prox_{T f_1}(x). The Matlab structure f2 contains the exact same fields but for the function f_2. Finally, param contains a list of parameters. The list of parameters is described in the help of the function itself. The following two fields are specific to the douglas_rachford function:

  • param.lambda: acts as a stepsize parameter, the λ in (10) and (11); it should be in the interval (0, 2). Its default value is 1.

  • param.gamma: γ controls the speed of convergence. Its default value is 1.
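A minimal sketch of a call (same denoising setup as the forward-backward sketch above, with the noisy vector y assumed given; both functions are now passed through their proximity operators):

% Sketch: solve min_x ||x||_1 + ||x - y||_2^2 with Douglas-Rachford.
f2.eval = @(x) norm(x - y, 2)^2;
f2.prox = @(x, T) (x + 2*T*y) / (1 + 2*T);  % closed-form prox of T*||x - y||_2^2
param.gamma = 1;                            % convergence speed parameter
sol = douglas_rachford(y, f1, f2, param);   % f1 as in the previous sketch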

2.2 Alternating-direction method of multipliers (ADMM)

Augmented Lagrangian techniques are classical approaches for solving problems like:

  min_{x ∈ ℝ^N} f_1(x) + f_2(Lx).    (12)

First we reformulate (12) to

  min_{x ∈ ℝ^N, y ∈ ℝ^M} f_1(x) + f_2(y)    s.t.    y = Lx.

We then solve this problem using the augmented Lagrangian technique.

Warning: in this solver, the proximal operator of the function f_1 is defined to be:

  prox^L_{γ f_1}(z) := argmin_x 1/2 ∥Lx − z∥_2^2 + γ f_1(x).

The ADMM algorithm can be used when f_1 and f_2 are lower semi-continuous convex functions on ℝ^N and ℝ^M respectively, with L : ℝ^N → ℝ^M a linear operator such that L^T L is invertible. The associated Matlab function admm takes four parameters:

function sol = admm(x0, f1, f2, param).

As in Section 2.1, x0 is the starting point for the algorithm. f1 and f2 are two Matlab structures that represent the functions f_1 and f_2. The Matlab structure f1 contains the fields f1.eval and f1.prox. The former is a Matlab function that takes as input a vector x and returns the value f_1(x); the latter is a Matlab function that takes as input a vector x and a strictly positive real number T and returns the vector prox^L_{T f_1}(x), in the modified sense of the warning above. The Matlab structure f2 contains the exact same fields but for the function f_2. Finally, param contains a list of parameters. The list of parameters is described in the help of the function itself. The following fields are specific to the admm function:

  • param.gamma: γ controls the speed of convergence. Its default value is 1.

  • param.L: the operator L, given as a function handle or a matrix. Its default value is the identity.
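As a hedged sketch (with param.L left at its default, the identity, so the modified proximal operator of the warning coincides with the usual one; the observation y is assumed given):

% Sketch: ADMM with L = identity, solving min_x ||x||_1 + ||x - y||_2^2.
f1.eval = @(x) norm(x, 1);
f1.prox = @(x, T) prox_l1(x, T);            % with L = I this is the usual prox
f2.eval = @(x) norm(x - y, 2)^2;
f2.prox = @(x, T) (x + 2*T*y) / (1 + 2*T);  % closed-form prox of T*||x - y||_2^2
param.gamma = 1;
param.maxit = 100;
sol = admm(y, f1, f2, param);               % param.L defaults to the identity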

2.3 Other solvers

The UNLocBoX contains some other solvers that should not be forgotten. Most important are the generalizations of forward-backward (generalized forward backward), Douglas-Rachford (PPXA) and ADMM (SDMM). These allow solving problems of the form (2). A list of the solvers can be found at http://unlocbox.sourceforge.net/doc/solver/index.php.

3 Proximal operators

For every function that is minimized by the UNLocBoX, the gradient or the proximal operator has to be determined. The toolbox provides some of the most common ones (see Table 1). A complete list can be found at http://unlocbox.sourceforge.net/doc/prox/index.php.

All proximal operator functions take as input three parameters: x, lambda, param. First, x is the initial signal. Then lambda is the weight of the objective function. Finally, param is a Matlab structure containing a set of optional parameters.

Function            Explanation
prox_l1             L1-norm proximal operator. Solves: min_x 1/2 ∥x − z∥_2^2 + λ∥Ψ(x)∥_1
prox_linf           L∞-norm proximal operator. Solves: min_x 1/2 ∥x − z∥_2^2 + λ∥Ψ(x)∥_∞
prox_tv             TV-norm proximal operator. Solves: min_x 1/2 ∥x − z∥_2^2 + λ∥x∥_TV
prox_l12            L1,2-norm proximal operator (for a group ensemble G, ∥x∥_{1,2} = ∑_{g∈G} ∥x_g∥_2). Solves: min_x 1/2 ∥x − z∥_2^2 + λ∥x∥_{1,2}
prox_l1inf          L1,∞-norm proximal operator (for a group ensemble G, ∥x∥_{1,∞} = ∑_{g∈G} ∥x_g∥_∞). Solves: min_x 1/2 ∥x − z∥_2^2 + λ∥x∥_{1,∞}
prox_nuclear_norm   Nuclear-norm proximal operator. Solves: min_x 1/2 ∥x − z∥_2^2 + λ∥x∥_*
Table 1: List of proximal operators

If we would like to restrict the set of admissible solutions to a subset C of ℝ^N, i.e. find the optimal solution to (2) considering only solutions in C, we can use projection operators instead of proximal operators. Indeed, proximal operators are generalizations of projections. For any nonempty, closed and convex set C ⊂ ℝ^N, the indicator function [combettes2011proximal] of C is defined as

  i_C : ℝ^N → {0, +∞} : x ↦ 0 if x ∈ C, and +∞ otherwise.    (13)

The corresponding proximity operator is given by the projection onto the set C:

  prox_{i_C}(z) = argmin_{x ∈ C} 1/2 ∥x − z∥_2^2.

Such restrictions are called constraints and can be given, e.g., by a set of linear equations that the solution is required to satisfy. Table 2 summarizes some of the projection functions present in the toolbox.

Function   Explanation
proj_b1    Projection on the B1-ball. lambda is an unused parameter, kept for compatibility reasons. Solves: min_x ∥x − z∥_2^2    s.t.    ∥x∥_1 < ϵ
proj_b2    Projection on the B2-ball. lambda is an unused parameter, kept for compatibility reasons. Solves: min_x ∥x − z∥_2^2    s.t.    ∥x∥_2 < ϵ
Table 2: List of projection operators
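The calling convention is the same for all of them. A brief sketch (assuming a given vector z, and relying on the toolbox defaults of proj_b2, which reduce the constraint to ∥x∥_2 < ϵ as in Table 2):

% Sketch: direct calls to a proximal operator and a projection.
param.verbose = 1;
x1 = prox_l1(z, 0.1, param);   % weight lambda = 0.1 (soft-thresholding)
param.epsilon = 1;             % radius of the B2-ball
x2 = proj_b2(z, 0, param);     % second argument (lambda) is unused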

4 Example

The problem

Let us suppose we have a noisy image with missing pixels. Our goal is to find the image closest to the original one. We begin by setting up some assumptions about the problem.

Assumptions

In this particular example, we first assume that we know the positions of the missing pixels. This can be the result of a previous process on the image, or a simple assumption. Secondly, we assume that the image is relatively simple, composed of well-delimited patches of colors. Thirdly, we suppose that the known pixels are subject to some Gaussian noise with a variance of σ².

Figure 1: This figure shows the image chosen for this example: the cameraman.

Formulation of the problem

At this point, the problem can be expressed in mathematical form. We simulate the masking operation by a masking operator A. The first assumption leads to the constraint

  Ax = b,

where x is the vectorized image we want to recover, b are the observed noisy pixels and A a linear operator representing the mask. However, due to the addition of noise, this constraint is slightly relaxed. We rewrite it in the following form:

  ∥Ax − b∥_2 ≤ ε.

Note that ε can be chosen equal to 0 to satisfy the constraint exactly. In our case, as the measurements are noisy, we set ε according to the standard deviation of the noise.

Using the prior assumption that the image has a small TV-norm (an image composed of patches of color has few gradients and thus a small TV-norm), we express the problem as

Problem I:  argmin_x ∥x∥_TV    s.t.    ∥Ax − b∥_2 ≤ ε,

where b is the degraded image and A a linear operator representing the mask. ε is a free parameter that tunes the confidence in the measurements. This is not the only way to define the problem. We could also write

Problem II:  argmin_x λ∥Ax − b∥_2^2 + ∥x∥_TV,

with the first function playing the role of a data fidelity term and the second a prior assumption on the signal. λ adjusts the tradeoff between measurement fidelity and prior assumption; we call it the regularization parameter. The larger it is, the more we trust the measurements, and vice versa. ε plays a similar role to λ. Note that there exists a bijection between the parameters ε and λ leading to the same solution, but the bijection function is not trivial to determine. Choosing between one problem or the other will affect the solvers that can be used and the convergence rate.

Solving problem I

The UNLocBoX solvers take as input functions with their proximity operator or with their gradient. In the toolbox, functions are modeled as structure objects with at least two fields. One field contains an operator to evaluate the function and the other allows computing either the gradient (in the case of a differentiable function) or the proximity operator (in the case of a non-differentiable function). In this example, we need to provide two functions:


  • The first function is f_1(x) = ∥x∥_TV. Its proximal operator is defined as:

      prox_{T f_1}(z) = argmin_x 1/2 ∥x − z∥_2^2 + T ∥x∥_TV.

    This function is defined in Matlab using:

    paramtv.verbose=1;
    paramtv.maxit=50;
    f1.prox=@(x, T) prox_tv(x, T, paramtv);
    f1.eval=@(x) tv_norm(x);

    This function is a structure with two fields. First, f1.prox is an operator taking as input x and T and evaluating the proximity operator of the function (T plays the role of λ in the equation above). Second, and sometimes optional, f1.eval is also an operator evaluating the function at x.
    The proximal operator of the TV norm is already implemented in the UNLocBoX by the function prox_tv. We tune it by setting the maximum number of iterations and a verbosity level. Other parameters are also available (see documentation http://unlocbox.sourceforge.net/doc.php).

    • paramtv.verbose selects the display level (0 no log, 1 summary at convergence and 2 display all steps).

    • paramtv.maxit defines the maximum number of iterations.

  • The second function f_2 is the indicator function of the set S defined by ∥Ax − b∥_2 ≤ ε. We define the proximity operator of f_2 as

      prox_{T f_2}(z) = argmin_x 1/2 ∥x − z∥_2^2 + T i_S(x),

    where i_S(x) is zero if x is in the set S and infinite otherwise. This problem has an identical solution to:

      argmin_x ∥x − z∥_2^2    s.t.    ∥Ax − b∥_2 ≤ ε.

    It is simply a projection on the B2-ball. In Matlab, we write:

    param_proj.epsilon=epsilon;
    param_proj.A=A;
    param_proj.At=A;
    param_proj.y=y;
    f2.prox=@(x,T) proj_b2(x,T,param_proj);
    f2.eval=@(x) eps;

    The prox field of f2 is in that case the operator computing the projection. Since we suppose that the constraint is satisfied, the value of the indicator function is 0. For implementation reasons, it is better to set the value of the operator f2.eval to eps than to 0. Note that this hypothesis could lead to a strange evolution of the objective function. Here the parameters A and At are mandatory. Note that A = At, since the masking operator can be performed by a diagonal matrix containing 1’s for observed pixels and 0’s for hidden pixels.

At this point, a solver needs to be selected. The UNLocBoX contains many different solvers. You can try them and observe the convergence speed. Just remember that some solvers are optimized for specific problems. In this example, we present two of them: forward_backward and douglas_rachford. Both of them take as input two functions (they have generalizations taking more functions), a starting point and some optional parameters.

In our problem, both functions are non-smooth at some points of the domain, so their gradients cannot be computed everywhere. In that case, solvers using gradient descent steps (such as forward_backward) cannot be used. As a consequence, we use douglas_rachford instead. In Matlab, we write:

param.verbose=1;
param.maxit=100;
param.tol=1e-5;
param.gamma=1;
sol = douglas_rachford(y,f1,f2,param);

  • param.verbose selects the display level (0 no log, 1 summary at convergence and 2 display all steps).

  • param.maxit defines the maximum number of iterations.

  • param.tol is the stopping criterion for the loop. The algorithm stops if

      |n(t) − n(t−1)| / n(t) < tol,

    where n(t) is the value of the objective function at iteration t.

  • param.gamma defines the stepsize. It is a compromise between convergence speed and precision. Note that if gamma is too big, the algorithm might not converge.

The solution is displayed in figure 2.

Figure 2: This figure shows the image reconstructed by solving problem I with the Douglas-Rachford algorithm.

Solving problem II

Solving problem II instead of problem I requires only a small modification of the previous code. First we define another function as follows:

param_l2.A=A;
param_l2.At=A;
param_l2.y=y;
param_l2.verbose=1;
f3.prox=@(x,T) prox_l2(x,lambda*T,param_l2);
f3.grad=@(x) 2*lambda*A(A(x)-y);
f3.eval=@(x) lambda*norm(A(x)-y,'fro')^2;

The structure f3 contains a field f3.grad. In fact, the squared l2 norm is a smooth function, so its gradient is well defined on the entire domain. This allows using the forward_backward solver. However, in this case we can also use the douglas_rachford solver; for this we have defined the field f3.prox.

Recall that forward_backward will not use the field f3.prox and douglas_rachford will not use the field f3.grad. The solvers can be called by:

sol21 = forward_backward(y,f1,f3,param);

Or:

sol22 = douglas_rachford(y,f3,f1,param);

These two solvers will converge (up to numerical errors) to the same solution. However, the convergence speed might differ. As we perform only 100 iterations with each of them, we do not obtain exactly the same result. Results are shown in figure 3.

Figure 3: This figure shows the image reconstructed by solving problem II.

Remark: The parameters lambda (the regularization parameter) and epsilon (the radius of the l2-ball) can be chosen empirically. Some methods allow computing these parameters automatically; however, this is far beyond the scope of this tutorial.

Appendix: example’s code

%% Initialisation
clear all;
close all;
% Loading the toolbox
global GLOBAL_useGPU;
init_unlocbox();
verbose=2; % verbosity level
%% Load an image
% Original image
im_original=cameraman;
% Displaying the original image
imagescgray(im_original,1,'Original image');
%% Creation of the problem
sigma_noise = 20/255;
im_noisy=im_original+sigma_noise*randn(size(im_original));
% Create a mask with 50% of the entries randomly set to zero
p=0.5;
matA=rand(size(im_original));
matA=(matA>(1-p));
% Define the operator
A=@(x) matA.*x;
% Depleted image
y=matA.*im_noisy;
% Displaying the noisy image
imagescgray(im_noisy,2,'Noisy image');
% Displaying the depleted image
imagescgray(y,3,'Depleted image');
%% Setting the proximity operators
% setting the function f1 (TV norm)
paramtv.useGPU = GLOBAL_useGPU;   % Use GPU
paramtv.verbose = verbose-1;
paramtv.maxit = 50;
f1.prox=@(x, T) prox_tv(x, T, paramtv);
f1.eval=@(x) tv_norm(x);
% setting the function f2 (projection on the B2-ball)
param_proj.epsilon = sqrt(sigma_noise^2*length(im_original(:))*p);
param_proj.A = A;
param_proj.At = A;
param_proj.y = y;
param_proj.verbose = verbose-1;
f2.prox=@(x,T) proj_b2(x,T,param_proj);
f2.eval=@(x) eps;
% setting the function f3 (squared l2 data fidelity)
lambda = 10;
param_l2.A = A;
param_l2.At = A;
param_l2.y = y;
param_l2.verbose = verbose-1;
param_l2.tight = 0;
param_l2.nu = 1;
f3.prox=@(x,T) prox_l2(x,lambda*T,param_l2);
f3.grad=@(x) 2*lambda*A(A(x)-y);
f3.eval=@(x) lambda*norm(A(x)-y,'fro')^2;
%% Solving problem I
% setting different parameters for the simulation
param.verbose = verbose;    % display parameter
param.maxit = 100;    % maximum number of iterations
param.tol = 1e-5;    % tolerance to stop iterating
param.gamma = 1;     % convergence parameter
% solving the problem with Douglas-Rachford
sol = douglas_rachford(y,f1,f2,param);
%% Displaying the result
imagescgray(sol,4,'Problem I - Douglas Rachford');
%% Solving problem II (forward-backward)
param.gamma=0.5/lambda;     % convergence parameter
param.tol=1e-5;
% solving the problem with forward-backward
sol21 = forward_backward(y,f1,f3,param);
%% Displaying the result
imagescgray(sol21,5,'Problem II - Forward Backward');
%% Solving problem II (Douglas-Rachford)
param.gamma=0.5/lambda;     % convergence parameter
sol22 = douglas_rachford(y,f3,f1,param);
%% Displaying the result
imagescgray(sol22,6,'Problem II - Douglas Rachford');
%% Close the UNLocBoX
close_unlocbox();

References