# Solving Large Scale Quadratic Constrained Basis Pursuit

Inspired by the alternating direction method of multipliers (ADMM) and the idea of operator splitting, we propose an efficient algorithm for solving large-scale quadratically constrained basis pursuit. Experimental results show that the proposed algorithm achieves a 50 to 100 times speedup over the baseline interior point algorithm implemented in CVX.

12/29/2014

## 1 Theoretical guarantees

We reformulate (0.1) as

$$\min_{x,z}\; g(x) + f(z), \quad \text{s.t.}\; Ax = z, \tag{1.1}$$

where $g(x) := \|x\|_1$, and $f$ is the indicator function of the set $\Omega$ defined as

$$f(z) := \begin{cases} 0, & z \in \Omega, \\ \infty, & z \notin \Omega, \end{cases} \qquad \Omega := \{z : \|z - y\|_2 \le \eta\}. \tag{1.2}$$

The Lagrangian dual of (1.1) is

$$L(v) := \min_{x,z}\left(g(x) + f(z) + v^\top(Ax - z)\right) = -f^*(v) - g^*(-A^\top v), \tag{1.3}$$

where $v \in \mathbb{R}^m$ is the dual variable, $f^*(v)$ is the convex conjugate of $f$ at $v$, and $g^*(-A^\top v)$ is the convex conjugate of $g$ at $-A^\top v$, i.e.,

$$g^*(-A^\top v) := \begin{cases} 0, & -A^\top v \in \Omega', \\ \infty, & -A^\top v \notin \Omega', \end{cases} \qquad \Omega' := \{x : \|x\|_\infty \le 1\}, \tag{1.4}$$

and

$$f^*(v) := y^\top v + \eta\|v\|_2. \tag{1.5}$$
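The conjugate in (1.5) follows from a direct computation: since $f$ is the indicator of the ball $\Omega = \{y + u : \|u\|_2 \le \eta\}$,

$$f^*(v) = \sup_{z \in \Omega} v^\top z = v^\top y + \sup_{\|u\|_2 \le \eta} v^\top u = y^\top v + \eta\|v\|_2,$$

where the inner supremum is attained at $u = \eta\, v / \|v\|_2$ by Cauchy-Schwarz.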

The dual problem can be formulated as

$$\max_{v,\mu}\; -f^*(v) - g^*(\mu), \quad \text{s.t.}\; -A^\top v = \mu. \tag{1.6}$$

Assume Slater's condition holds, i.e., there exists $x$ such that $\|Ax - y\|_2 < \eta$; then the convexity of problem (0.1) implies that the optimal solution achieves zero duality gap, i.e.,

$$g(x) + f(z) = -f^*(v) - g^*(\mu). \tag{1.7}$$

From the KKT conditions, the optimal solution must satisfy (1.7) and

$$Ax = z, \qquad -A^\top v = \mu. \tag{1.8}$$

Thus, (1.7) and (1.8) can be used as optimality certificates or stopping criteria in algorithm design. More specifically, we define the primal residual, dual residual, and duality gap with respect to a tuple $(x, z, v, \mu)$ as

$$r_p := \|Ax - z\|, \qquad r_d := \|A^\top v + \mu\|, \qquad \delta_g := g(x) + f(z) + f^*(v) + g^*(\mu). \tag{1.9}$$
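As an illustrative sketch (function name and signature are my own; NumPy is used here for exposition, while the paper's experiments are in MATLAB), these certificates can be evaluated for a candidate tuple $(x, z, v, \mu)$, assuming $z \in \Omega$ and $\|\mu\|_\infty \le 1$ so that $f(z) = g^*(\mu) = 0$:

```python
import numpy as np

def residuals(A, y, eta, x, z, v, mu):
    """Primal residual, dual residual, and duality gap from (1.9).

    Assumes z is in Omega and ||mu||_inf <= 1, so f(z) = 0 and
    g*(mu) = 0, leaving gap = ||x||_1 + y^T v + eta * ||v||_2.
    """
    r_p = np.linalg.norm(A @ x - z)      # primal residual ||Ax - z||
    r_d = np.linalg.norm(A.T @ v + mu)   # dual residual ||A^T v + mu||
    gap = np.abs(x).sum() + y @ v + eta * np.linalg.norm(v)
    return r_p, r_d, gap
```

All three quantities vanish exactly at a primal-dual optimal tuple, so they serve directly as a stopping criterion.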

## 2 Algorithm design based on ADMM

We adopt ideas from alternating projection methods, and reformulate (1.1) as

$$\min_{x,z,x',z'}\; g(x) + f(z) + I_G(x', z'), \quad \text{s.t.}\; \begin{bmatrix} x \\ z \end{bmatrix} = \begin{bmatrix} x' \\ z' \end{bmatrix}, \tag{2.1}$$

where $I_G$ is defined as

$$I_G(x', z') := \begin{cases} 0, & (x', z') \in G, \\ \infty, & (x', z') \notin G, \end{cases} \qquad G := \{(x, z) : Ax = z\}. \tag{2.2}$$

The augmented Lagrangian of (2.1) becomes

$$L_\rho(x, z, x', z', V) := g(x) + f(z) + I_G(x', z') + V^\top\left(\begin{bmatrix} x \\ z \end{bmatrix} - \begin{bmatrix} x' \\ z' \end{bmatrix}\right) + \frac{\rho}{2}\left\|\begin{bmatrix} x \\ z \end{bmatrix} - \begin{bmatrix} x' \\ z' \end{bmatrix}\right\|_2^2, \tag{2.3}$$

where $V \in \mathbb{R}^{d+m}$ is the dual variable, and $\rho > 0$ is a penalty parameter. Define

$$X := \begin{bmatrix} x \\ z \end{bmatrix} \in \mathbb{R}^{d+m}, \qquad X' := \begin{bmatrix} x' \\ z' \end{bmatrix} \in \mathbb{R}^{d+m}, \tag{2.4}$$

and the iterations of ADMM are

$$\begin{cases} X^{k+1} := \arg\min_X L_\rho(X, X'^k, V^k), \\ X'^{k+1} := \arg\min_{X'} L_\rho(X^{k+1}, X', V^k), \\ V^{k+1} := V^k + \rho\,(X^{k+1} - X'^{k+1}). \end{cases} \tag{2.5}$$

More specifically,

$$X^{k+1} := \arg\min_X L_\rho(X, X'^k, V^k) = \begin{bmatrix} \arg\min_x\left(g(x) + (v_x^k)^\top(x - x'^k) + \frac{\rho}{2}\|x - x'^k\|_2^2\right) \\ \arg\min_z\left(f(z) + (v_z^k)^\top(z - z'^k) + \frac{\rho}{2}\|z - z'^k\|_2^2\right) \end{bmatrix} = \begin{bmatrix} \arg\min_x\left(g(x) + \frac{\rho}{2}\left\|x - \left(x'^k - \frac{v_x^k}{\rho}\right)\right\|_2^2\right) \\ \arg\min_z\left(f(z) + \frac{\rho}{2}\left\|z - \left(z'^k - \frac{v_z^k}{\rho}\right)\right\|_2^2\right) \end{bmatrix} \tag{2.6}$$

or simply

$$X^{k+1} = \begin{bmatrix} \operatorname{prox}_g(x'^k - \tilde{x}^k) \\ \operatorname{prox}_f(z'^k - \tilde{z}^k) \end{bmatrix}, \tag{2.7}$$

where $\operatorname{prox}_g$ is the proximal operator of the function $g$ (with parameter $\rho$), defined as

$$\operatorname{prox}_g(v) := \arg\min_x\left(g(x) + \frac{\rho}{2}\|x - v\|_2^2\right). \tag{2.8}$$

The scaled dual variables $\tilde{x}^k$ and $\tilde{z}^k$ are defined as

$$\tilde{x}^k := \frac{v_x^k}{\rho}, \qquad \tilde{z}^k := \frac{v_z^k}{\rho}. \tag{2.9}$$

More specifically, the proximal operator of $g(x) = \|x\|_1$ at $v$ is

$$\operatorname{prox}_g(v) = S_{1/\rho}(v), \tag{2.10}$$

where $S_{1/\rho}$ is the elementwise soft-thresholding function, i.e.,

$$[S_{1/\rho}(v)]_i := \begin{cases} v_i - 1/\rho, & v_i > 1/\rho, \\ 0, & |v_i| \le 1/\rho, \\ v_i + 1/\rho, & v_i < -1/\rho. \end{cases} \tag{2.11}$$
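As a minimal illustration (NumPy used here for exposition; the paper's experiments are in MATLAB), the soft-thresholding operator (2.11) is a vectorized one-liner:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: S_t(v)_i = sign(v_i) * max(|v_i| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

Entries with magnitude at most $t = 1/\rho$ are set to zero, which is what drives sparsity in the $x$-update.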

The proximal operator of $f$ at $v$ is the projection onto the ball $\Omega$:

$$\operatorname{prox}_f(v) = y + \frac{\eta}{\|v - y\|_2}(v - y) \tag{2.12}$$

when $\|v - y\|_2 > \eta$; otherwise $\operatorname{prox}_f(v) = v$.

The updating rule for $X'$ can be specified as

$$X'^{k+1} := \arg\min_{X'} L_\rho(X^{k+1}, X', V^k) = \arg\min_{X' \in G} \frac{1}{2}\left\|X' - \left(X^{k+1} + \frac{V^k}{\rho}\right)\right\|_2^2 = \arg\min_{X' \in G} \frac{1}{2}\|x' - (x^{k+1} + \tilde{x}^k)\|_2^2 + \frac{1}{2}\|z' - (z^{k+1} + \tilde{z}^k)\|_2^2 = \Pi_G(x^{k+1} + \tilde{x}^k,\; z^{k+1} + \tilde{z}^k), \tag{2.13}$$

where $\Pi_G(x, z)$ is the projection of $(x, z)$ onto $G$, i.e., the solution to

$$\min_{x',z'}\; \frac{1}{2}\|x' - x\|_2^2 + \frac{1}{2}\|z' - z\|_2^2, \quad \text{s.t.}\; Ax' = z'. \tag{2.14}$$

Define

$$\tilde{X} := \frac{V}{\rho} = \begin{bmatrix} v_x/\rho \\ v_z/\rho \end{bmatrix}, \tag{2.15}$$

and the updating rule for the dual variable can be written as

$$\tilde{X}^{k+1} = \tilde{X}^k + (X^{k+1} - X'^{k+1}). \tag{2.16}$$

### 2.1 Analytic solution to (2.14)

Since (2.14) is convex, from the KKT conditions of (2.14) we know that $(x', z')$ is the optimal solution to (2.14) if and only if there exists a multiplier $\lambda$ such that

$$\begin{cases} Ax' = z', \\ x' - x + A^\top \lambda = 0, \\ z' - z - \lambda = 0, \end{cases} \tag{2.17}$$

which implies that the optimal $(x', z')$ can be obtained by solving the following linear system:

$$\begin{bmatrix} A & -I_{m \times m} \\ I_{d \times d} & A^\top \end{bmatrix} \begin{bmatrix} x' \\ z' \end{bmatrix} = \begin{bmatrix} 0 \\ x + A^\top z \end{bmatrix}. \tag{2.18}$$

Remarks: (1) the coefficient matrix in (2.18) is highly sparse, and this structure can be combined with other potential structure of $A$ to simplify the computation; (2) even simple elimination can be used to simplify the problem, i.e.,

$$\begin{cases} z' = (AA^\top + I_{m \times m})^{-1}(AA^\top z + Ax), \\ x' = A^\top(z - z') + x, \end{cases} \tag{2.19}$$

or

$$\begin{cases} x' = (A^\top A + I_{d \times d})^{-1}(x + A^\top z), \\ z' = Ax'. \end{cases} \tag{2.20}$$

Both matrices $AA^\top + I_{m \times m}$ and $A^\top A + I_{d \times d}$ are positive definite, so factorization techniques can be used to accelerate the computation; (3) since $m < d$, (2.19) will be more efficient; (4) apply the Cholesky decomposition once to get $AA^\top + I_{m \times m} = LL^\top$; (5) calculate $AA^\top$ once; (6) solve for $z'$ by forward and backward substitution, i.e., $LL^\top z' = AA^\top z + Ax$.
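A sketch of remarks (4)-(6) in illustrative NumPy (`factor` and `project_onto_G` are hypothetical names): factor $AA^\top + I$ once; each subsequent projection (2.19) then costs only matrix-vector products and two triangular solves.

```python
import numpy as np

def factor(A):
    """Cholesky factor L of A A^T + I, computed once up front."""
    m = A.shape[0]
    return np.linalg.cholesky(A @ A.T + np.eye(m))

def project_onto_G(A, L, x, z):
    """Solve (2.19): z' = (A A^T + I)^{-1}(A A^T z + A x), x' = A^T(z - z') + x."""
    rhs = A @ (A.T @ z) + A @ x
    w = np.linalg.solve(L, rhs)      # forward solve  L w = rhs
    zp = np.linalg.solve(L.T, w)     # backward solve L^T z' = w
    xp = A.T @ (z - zp) + x
    return xp, zp
```

The returned pair satisfies $Ax' = z'$ exactly, and the stationarity condition (2.17) can be checked via $x' + A^\top z' = x + A^\top z$.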

### 2.2 Algorithm in pseudocode

The algorithm can be summarized as in Algorithm 1.
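Algorithm 1 itself is not reproduced here; as an illustrative end-to-end sketch (NumPy; variable names, default parameters, and the consensus-based stopping test are my own choices), the iteration combines the proximal updates (2.7), the projection (2.13) via (2.19) with a one-time Cholesky factorization, and the scaled dual update (2.16):

```python
import numpy as np

def admm_qcbp(A, y, eta, rho=1.0, max_iter=5000, tol=1e-10):
    """Sketch of ADMM for min ||x||_1  s.t.  ||Ax - y||_2 <= eta."""
    m, d = A.shape
    L = np.linalg.cholesky(A @ A.T + np.eye(m))   # factor once (remark 4)
    xp, zp = np.zeros(d), np.zeros(m)             # x', z'
    tx, tz = np.zeros(d), np.zeros(m)             # scaled duals x~, z~
    for _ in range(max_iter):
        # X-update (2.7): prox_g = soft-thresholding, prox_f = ball projection
        v = xp - tx
        x = np.sign(v) * np.maximum(np.abs(v) - 1.0 / rho, 0.0)
        w = zp - tz
        r = w - y
        nrm = np.linalg.norm(r)
        z = w.copy() if nrm <= eta else y + (eta / nrm) * r
        # X'-update (2.13): projection onto G = {(x, z): Ax = z} via (2.19)
        cx, cz = x + tx, z + tz
        rhs = A @ (A.T @ cz) + A @ cx
        zp = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        xp = A.T @ (cz - zp) + cx
        # scaled dual update (2.16); small consensus residual => stop
        tx = tx + (x - xp)
        tz = tz + (z - zp)
        if np.linalg.norm(x - xp) + np.linalg.norm(z - zp) < tol:
            break
    return xp
```

Since $Ax' = z'$ holds exactly after every projection step, the consensus residual $\|X - X'\|$ doubles as a surrogate for the primal residual $r_p$ in (1.9).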

Computational complexity (running time): (1) lines 5, 7, and 8 take $O(d + m)$; (2) line 6 takes $O(m^3)$ for the Cholesky decomposition of $AA^\top + I_{m \times m}$, $O(m^2 d)$ for forming $AA^\top$ once, and $O(m^2 + md)$ per iteration for the forward/backward solves using (2.19); (3) lines 9 and 10 take $O(d + m)$. Thus the dominant $O(m^2 d + m^3)$ cost is incurred only once in total.

Computational complexity (space or memory): $O(md + m^2)$, for storing $A$ and the Cholesky factor.

Baseline algorithm (CVX with an interior point method): (1) a comparable factorization cost is incurred at every interior point iteration, i.e., multiple times in total.

## 3 Numerical experiments

Computational environment: (1) desktop with an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz and 32.0 GB RAM; (2) OS: Windows 10 Education; (3) MATLAB R2018a; (4) baseline: CVX, which solves (0.1) using an interior point method.

Computational setup: (1) $x$ is assumed to be sparse with a fixed cardinality, and generated randomly; (2) $A \in \mathbb{R}^{m \times d}$ is generated randomly; (3) the noise $e$ is generated randomly and normalized to have magnitude $\eta$; (4) $y$ is assumed to be generated via $y = Ax + e$.

Results: see Table 1 and Figure 1.