 # A family of multi-parameterized proximal point algorithms

In this paper, a multi-parameterized proximal point algorithm combining with a relaxation step is developed for solving convex minimization problem subject to linear constraints. We show its global convergence and sublinear convergence rate from the prospective of variational inequality. Preliminary numerical experiments on testing a sparse minimization problem from signal processing indicate that the proposed algorithm performs better than some well-established methods

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

We focus on the following convex minimization problem with linear equality constraints,

 min{f(x) |Ax=b,x∈X}, (1)

where is a proper closed convex function but possibly nonsmooth, and

are given matrix and vector, respectively,

is a closed convex set. Without loss of generality, the solution set of the problem (1) denoted by is assumed to be nonempty.

The augmented Lagrangian method (ALM), independently proposed by Hestenes  and Powell , is a benchmark method for solving problem (1). Its iteration scheme reads as

 ⎧⎨⎩xk+1=argminx∈Xf(x)+β2∥Ax−b−λk/β∥2,λk+1=λk−β(Axk+1−b),

where denote the penalty parameter and the Lagrange multiplier w.r.t. the equality constraint, respectively. As analyzed in , ALM can be viewed as an application of the well-known proximal point algorithm (PPA) that can date back to the seminal work of Martinet  and Rockafellar  for the dual problem of (1). Obviously, the efficiency of ALM heavily depends on the solvability of the -subproblem, that is, whether or not the core -subproblem has closed-form solution. Unfortunately, in many real applications [2, 4, 9, 10], the coefficient matrix

is not identity matrix (or does not satisfies

), which makes it difficult even infeasible for solving this subproblem of ALM. To overcome such difficulty, Yang and Yuan  proposed a linearized ALM aiming at linearizing the -subproblem such that its closed-form solution can be easily derived. We refer to the recent progress on this direction [3, 6].

Under basic regularity condition , it is well-known that is an optimal solution of (1) if and only if there exists such that the following variational inequality holds

 VI(f,J,M):f(x)−f(x∗)+(w−w∗)TJ(w∗)≥0,∀w∈M, (2)

where

 w=(xλ),w∗=(x∗λ∗),J(w)=(−ATλAx−b)andM=X×Rm.

From the aforementioned assumption on the solution set of the problem (1), the solution set of (2) denoted by is also nonempty. When PPA is applied to solve the variational inequality , it usually reads the unified updating scheme: at the -th iteration, find satisfying

 f(x)−f(xk+1)+(w−wk+1)T[J(wk+1)+G(wk+1−wk)]≥0,∀w∈M. (3)

We call the proximal matrix that is usually required to be symmetric positive definite to ensure the convergence of (3). To our knowledge, this idea was initialized by He et al. [HLHY02]. Clearly, different structures of would result in different versions of PPA. From a computational perspective, our motivation is to design a multi-parameterized PPA for solving problem (1) while maintaining the efficiency as the linearized ALM, although the feasible starting point may be different. Interestingly, many customized proximal matrices shown in [5, 7, 10, 16] turn out to be special cases of our multi-parameterized proximal matrix (See Remark 2.2 for details). In this sense, our proposed algorithm can be viewed as a general customized PPA for solving problem (1). Moreover, we adopt a relaxation strategy to accelerate the convergence.

## 2 Main Algorithm

In this paper, we design the following multi-parameterized proximal matrix

 G=[rIn+(θ−1)2−ρsATA(θ−1)AT(θ−1)AsIm]∈R(n+m)×(n+m), (4)

where denotes the identity matrix, is an arbitrary real scalar and

 ρ∈(−∞,1],(r,s)∈{(r,s)| r>0,s>0,rs>∥ATA∥2}. (5)

The notation represents the spectral norm of It is easy to check that the above matrix is symmetric positive definite for any parameters satisfying (5).

Now, substituting the matrix into (3) we have

 {Axk+1−b+(θ−1)A(xk+1−xk)+s(λk+1−λk)=0,xk+1∈X,f(x)−f(xk+1)+(x−xk+1)TRk+1≥0,∀x∈X, (6)

with

 Rk+1=−ATλk+1+[rIn+(θ−1)2−ρsATA](xk+1−xk)+(θ−1)AT(λk+1−λk). (7)

By the equation in (6), it can be deduced that

 λk+1=λk−1s[θ(Axk+1−b)+(1−θ)(Axk−b)],

which further makes (7) become

 Rk+1 = −AT[(2−θ)λk+1+(θ−1)λk+ρ−(θ−1)2sA(xk+1−xk)]+r(xk+1−xk) = −AT[λk−(2−θ)s(Axk−b)]+[rIn+ρ−1sATA](xk+1−xk).

Based on the inequality in (6), i.e., the first-order optimality condition of -subproblem, we obtain

 xk+1=argminx∈X{f(x)+r2∥∥∥x−xk−1rAT[λk−2−θs(Axk−b)]∥∥∥2+ρ−12s∥∥A(x−xk)∥∥2}. (8)

Then, our relaxed multi-parameterized PPA (RM-PPA) is described as Algorithm 2.1, where we use to replace the output of (3) with given iterate , and we use to stand for the new iterate after combining a relaxation step. Finally, the inequality (3) becomes

 (9)
###### Algorithm 2.1

[RM-PPA for solving problem (1)]1Choose and satisfying (5).
2 Initialize
3 for do
4   .
5   .
6   .
7 end

###### Remark 2.1

If we set in (8), then the

-subproblem amounts to estimating the proximity operator of

when . The implementation of (8) for such cases is thus extremely simple. Here, we allow just from the theoretical point of view.

###### Remark 2.2

Note that in step 5 actually plays a role of penalty parameter in ALM, while can be treated as the proximal parameter as used in the customized PPA . The quadratic term

 ρ−12s∥∥A(x−xk)∥∥2=ρ−12s∥∥(Ax−b)−(Axk−b)∥∥2

plays a second penalty role for the equality constraint relating to its -th iteration. By the way of updating , it uses the convex combination of the feasibility error at the current iteration and the former iteration when . The parameterized matrix designed in this paper is more general than some in the literature:

• If , then our matrix given by (4) will become that in [7, Eq.(2.5)]. And in such case, the variable updates practically in the same way as in ALM, but the core -subproblem in Algorithm 2.1 is a proximal mapping to have a unique minimum since the subproblem is strongly convex.

• If , then our parameterized proximal matrix turns to the matrix involved in [5, page 158]. If , then our matrix is identical to that in [10, Eq. (3.1)] but Algorithm 2.1 uses an additional relaxation step for fast convergence. Moreover, we establish the worst-case ergodic convergence rate for the objective function value error and the feasibility error.

• Regardless of step 6, it is easy to check that Algorithm 2.1 with is a linearization of ALM:

 ⎧⎪⎨⎪⎩xk+1=argminx∈X{f(x)+12s∥Ax−b−sλk∥2+12∥x−xk∥2rIn−1sATA},λk+1=λk−1s(Axk+1−b). (10)

Specifically, by letting the scheme (10) is ALM with extra proximal term which eliminates the term in the iteration. Algorithm 2.1, in such choice of parameters, is a linearized ALM. Besides, our parameter is more general and flexible than that in .

## 3 Convergence Analysis

Before analyzing the global convergence and sublinear convergence rate of Algorithm 2.1, we give a fundamental lemma as the following.

###### Lemma 3.1

The sequence generated by Algorithm 2.1 satisfies

 ∥wk+1−w∗∥2˜G+1−ρs∥Axk+1−Ax∗∥2≤∥wk−w∗∥2˜G+1−ρs∥Axk−Ax∗∥2−Tk+1,  ∀w∗∈M∗,

where is given by (16) and

 ˜G=[rIn+(θ−1)2−1sATA(θ−1)AT(θ−1)AsIm]. (11)

Proof According to the inequality (2

) and the skew-symmetric property of

, i.e.

 (w−¯w)T[J(w)−J(¯w)]≡0,% for any w,¯w∈M, (12)

the inequality (9) with setting gives Note that the step 6 shows

 ˜wk−wk=(wk+1−wk)/σ, (13)

so we have

 (w∗−˜wk)TG(wk+1−wk)≥0.

Since the matrix can be decomposed as where

denotes the zero matrix of size

and is given by (11), we thus obtain

 (w∗−˜wk)T˜G(wk+1−wk)+1−ρs(Ax∗−A˜xk)T(Axk+1−Axk)≥0. (14)

Then, applying the identity

 2(a−l)˜G(c−d)=∥a−d∥2˜G−∥a−c∥2˜G+∥c−l∥2˜G−∥d−l∥2˜G

to the left-hand side of (14), the following inequality holds immediately

 ∥w∗−wk+1∥2˜G+1−ρs∥Ax∗−Axk+1∥2≤∥w∗−wk∥2˜G+1−ρs∥Ax∗−Axk∥2−Tk+1, (15)

where

 Tk+1=∥wk−˜wk∥2˜G−∥wk+1−˜wk∥2˜G+1−ρs(∥A(xk−˜xk)∥2−∥A(xk+1−˜xk)∥2).

Substituting (13) into the expression of , it can be deduced that

 Tk+1 = (16) = σ(2−σ)∥wk−˜wk∥2˜G+(1−ρ)σ(2−σ)s∥A(xk−˜xk)∥2 = 2−σσ∥wk−wk+1∥2˜G+(1−ρ)(2−σ)sσ∥A(xk−xk+1)∥2.

This completes the whole proof.

Lemma 3.1 shows the sequence is contractive under the -norm w.r.t. the solution set , since the matrix is positive definite and the term . Similar to the convergence proof in e.g.  and the proof of Lemma 3.1, the global convergence and sublinear convergence rate of Algorithm 2.1 can be easily established as as below, whose proof is omitted here for the sake of conciseness.

###### Theorem 3.1

Let satisfy (5) and be generated by Algorithm 2.1. Then,

• there exists a such that ;

• for any , let and . Then,

 f(xt)−f(x)+(wt−w)TJ(w)≤12σ(t+1){∥w0−w∥2˜G+1−ρs∥Ax0−Ax∥2},∀w∈M.

Theorem 3.1 illustrates that Algorithm 2.1 converges globally with a sublinear ergodic convergence rate. Furthermore, we can deduce a compact result as the following corollary by making using of the second result in Theorem 3.1. For any , let and

 γξ=infx∗∈X∗supλ∈Γξ∥∥(x0−x∗;λ0−λ)∥∥2˜G+1−ρs∥Ax0−b∥2. (17)
###### Corollary 3.1

Let be generated by Algorithm 2.1. For any , there exists a defined in (17) such that for any , we have

 f(xt)−f(x∗)+ξ∥Axt−b∥≤γξ2σ(t+1),∀x∗∈X∗.

Proof By making use of the identity in (12) and by setting into the second result of Theorem 3.1, we have

 f(xt)−f(x∗)+(wt−w)TJ(w) (18) = f(xt)−f(x∗)−λTA(xt−x∗)+(λt−λ)T(Ax∗−b) = f(xt)−f(x∗)−λT(Axt−b)≤12σ(t+1){∥∥(x0−x∗;λ0−λ)∥∥2˜G+1−ρs∥Ax0−b∥2},

where the second equality and the final inequality use . Then, it follows from (18) that

 f(xt)−f(x∗)+ξ∥Axt−b∥ = supλ∈Γξ[f(xt)−f(x∗)−λT(Axt−b)] ≤ 12σ(t+1){infx∗∈X∗supλ∈Γξ∥∥(x0−x∗;λ0−λ)∥∥2˜G+1−ρs∥Ax0−b∥2},

which, by the definition of in (17), completes the proof.

In a similar analysis to (18) together with (2), we can derive showing that So, taking in Corollary 3.1, the following inequality

 (∥λ∗∥+1)∥Axt−b∥≤f(xt)−f(x∗)+(2∥λ∗∥+1)∥Axt−b∥≤¯γ2σ(t+1),

holds with given by (17). Rearranging the above inequality, we have

 ∥Axt−b∥≤12σ(t+1) ¯γ∥λ∗∥+1. (19)

Hence, we will also have showing that

 |f(xt)−f(x∗)|≤¯γ2σ(t+1). (20)

According to (20) and (19), both the objective function value error and the feasibility error at the ergodic iterate will decrease in the order of as goes to infinity.

## 4 Numerical Experiments

In this section, we apply the proposed algorithm to solve the following -minimization problem from signal processing , which aims to reconstruct a length sparse signal from observations:

 min{∥x∥1 |Ax=b,x∈Rn}. (21)

Note that this is a special case of (1) with specifications and . Applying Algorithm 2.1 to problem (21), we derive444The proximity operator is defined as that can be explicitly expressed by the shrinkage operator  to be coded by the MATLAB inner function ‘withresh’. Followed by Lemma 3.1, we use the following stopping criterions under given tolerance:

 It\_{err}(k):=max{∥xk−xk−1∥,∥λk−λk−1∥}max{∥xk−1∥,∥λk−1∥,1}≤10−4andEq\_err(k):=∥Axk−b∥∥b∥≤10−4. (22)

All of the forthcoming experiments use the same starting points and are tested in MATLAB R2018a (64-bit) on Windows 10 system with an Intel Core i7-8700K CPU (3.70 GHz) and 16GB memory.

Consider an original signal containing 180 spikes with amplitude . The measurement matrix is drawn firstly from the standard norm distribution and then each of its row is normalized. The observation is generated by , where

is generated by the Gaussian distribution

on . With the tuned parameters , some computational results under different parameter are shown in Table 1 in which we present the number of iterations (IT), the CPU time in seconds (CPU), the final obtained residuals It_err and Eq_err, as well as the recovery error . Reported results from Table 1 indicate that the choice of could make a great effect on the performance of our algorithm w.r.t. IT and CPU. And it seems that setting would be a reasonable choice to save the CPU time and to cost fewer number of iterations. The reconstruction results under are shown in Fig. 1, from which the solution obtained by our algorithm always has the correct number of pieces and is closer to the original noseless signal.

IT CPU It_err Eq_err RE
-5 886 37.40 9.97e-5 7.96e-5 6.93e-2
-2 827 34.85 9.99e-5 8.34e-5 6.92e-2
-1 844 34.76 9.96e-5 8.52e-5 6.92e-2
-0.5 851 34.83 9.98e-5 8.61e-5 6.92e-2
0 845 34.50 9.97e-5 8.61e-5 6.92e-2
0.2 851 35.15 9.99e-5 8.62e-5 6.92e-2
0.5 826 33.93 9.98e-5 8.62e-5 6.91e-2
1 840 34.64 9.97e-5 8.65e-5 6.91e-2
2 832 34.15 9.99e-5 8.65e-5 6.91e-2
5 855 35.68 9.99e-5 8.43e-5 6.91e-2
10 881 36.63 9.93e-5 7.84e-5 6.92e-2

Table 1: Results by Algorithm 2.1 with different parameter . Fig. 1: Comparison results by its minimum energy reconstruction (a) and by RM-PPA with θ=0.5 (b). Fig. 2: Convergence behaviors of the residuals LER and LIR by different algorithms.

Next, we consider the problem (21) with large-scale dimensions . By comparing the proposed Algorithms 2.1 (RM-PPA) with the aforementioned tuned parameters to

• Algorithm 2.1 without the relaxation step (“M-PPA”),

• The customized PPA (“C-PPA”, ) with parameters ,

• The parameterized PPA (“P-PPA”, ) with parameters ,

we show comparative results about the convergence behaviors of the residuals and in Fig. 2, respectively. The effect on recovering the original signal with different algorithms is shown in Fig. 3. Here, we emphasize that the parameter values in [7, 10] can not terminate the algorithms C-PPA and P-PPA because of the fact for their examples, so we set the same value as ours but keep as the value in their experiments. From Figs. 2-3, we observe that M-PPA is competitive to P-PPA and RM-PPA (that is, Algorithm 2.1) performs better than the rest three algorithms. Fig. 3: Comparison results of the problem (21) with n=20000 by different algorithms.

## References

• 
•  J. Bai, H. Zhang, J. Li, A parameterized proximal point algorithm for separable convex optimization, Optim. Lett. 12 (2018) 1589-1608.
•  J. Bai, J. Liang, K. Guo, Y. Jing, Accelerated symmetric ADMM and its applications in signal processing, (2019) arXiv:1906.12015v2.
•  D. Donoho, Y. Tsaig, Fast solution of -norm minimization problems when the solution may be sparse, IEEE Trans. Inform. Theory, 54 (2008) 4789-4812.
•  G. Gu, B. He, X. Yuan, Customized proximal point algorithms for linearly constrained convex minimization and saddle-point problems: a unified approach, Comput. Optim. Appl. 59 (2014) 135-161.
•  B. He, F. Ma, X. Yuan, Optimal proximal augmented Lagrangian method and its application to full Jacobian splitting for multi-block separable convex minimization problems, IMA J. Numer. Anal. (2019) doi:10.1093/imanum/dry092.
•  B. He, X. Yuan, W. Zhang, A customized proximal point algorithm for convex minimization with linear constraints, Comput. Optim. Appl. 56 (2013) 559-572.
•  M. Hestenes, Multiplier and gradient methods, J. Optim. Theory Appl. 4 (1969) 303-320.
•  S. Kim, K. Koh, M. Lustig, S. Boyd, D. Gorinvesky, An interior-point method for large-scale -regularized least squares, IEEE J-STSP, 1 (2007) 606-617.
•  F. Ma, M. Ni, A class of customized proximal point algorithms for linearly constrained convex optimization, Comp. Appl. Math. 37 (2018) 896-911.
•  B. Martinet, Brve communication, Rgularisation d’inquations variationnelles par approximations successives, ESAIM: Math. Model. Numer. Anal. 4(R3), (1970) 154-159.
•  M. Powell, A method for nonlinear constraints in minimization problems, Optimization (R. Fletcher ed.). New York: Academic Press, (1969) 283-298.
•  R. Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming, Math. Oper. Res. 1(1976) 97-116.
•  R. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim. 14, (1976) 97-116.
•  J. Yang, X. Yuan, Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization, Math. Comput. 82 (2013) 301-329.
•  Y. Zhu, J. Wu, G. Yu, A fast proximal point algorithm for -minimization problem in compressed sensing, Appl. Math. Comput. 270 (2015) 777-784.
•