# A unified approach for projections onto the intersection of ℓ_1 and ℓ_2 balls or spheres

This paper focuses on designing a unified approach for computing the projection onto the intersection of an ℓ_1 ball/sphere and an ℓ_2 ball/sphere. We show that the major computational effort in solving these problems lies in finding the root of the same piecewise quadratic function, and we propose a unified numerical method to compute this root. In particular, we design breakpoint search methods, with and without sorting, combined with bisection, secant and Newton methods to find the interval containing the root, on which the root has a closed form. It can be shown that our proposed algorithms without sorting possess O(n log n) worst-case complexity and O(n) complexity in practice. The efficiency of our proposed algorithms is demonstrated in numerical experiments.


## 1 Introduction

In this paper, we consider designing a unified numerical method for computing the solution of the following three types of problems: projection onto the intersection of an $\ell_1$ ball and an $\ell_2$ ball:

$$\min_{x\in\mathbb{R}^n}\ \tfrac12\|x-v\|_2^2\quad\text{subject to}\quad x\in B_1^t\cap B_2, \tag{1.1}$$

projection onto the intersection of an $\ell_1$ ball and an $\ell_2$ sphere:

$$\min_{x\in\mathbb{R}^n}\ \tfrac12\|x-v\|_2^2\quad\text{subject to}\quad x\in B_1^t\cap S_2, \tag{1.2}$$

and projection onto the intersection of an $\ell_1$ sphere and an $\ell_2$ sphere:

$$\min_{x\in\mathbb{R}^n}\ \tfrac12\|x-v\|_2^2\quad\text{subject to}\quad x\in S_1^t\cap S_2. \tag{1.3}$$

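Here the $\ell_2$ (i.e., Euclidean) norm on $\mathbb{R}^n$ is indicated as $\|\cdot\|_2$ with the unit $\ell_2$ ball (sphere) defined as $B_2:=\{x\in\mathbb{R}^n : \|x\|_2\le 1\}$ ($S_2:=\{x\in\mathbb{R}^n : \|x\|_2=1\}$), and the $\ell_1$ norm is indicated as $\|\cdot\|_1$ with the $\ell_1$ ball (sphere) with radius $t$ denoted as $B_1^t:=\{x\in\mathbb{R}^n : \|x\|_1\le t\}$ ($S_1^t:=\{x\in\mathbb{R}^n : \|x\|_1=t\}$). Notice that $\|x\|_2\le\|x\|_1\le\sqrt{n}\,\|x\|_2$. Trivial cases for the problems of interest are: (a) $t\le 1$, in this case $\|x\|_1\le t$ implies $\|x\|_2\le 1$, meaning $B_1^t\subseteq B_2$, and $B_1^t\cap B_2=B_1^t$. (b) $t\ge\sqrt{n}$, in this case $\|x\|_2\le 1$ implies $\|x\|_1\le\sqrt{n}\le t$, meaning $B_2\subseteq B_1^t$, and $B_1^t\cap B_2=B_2$. Without loss of generality, we assume $1<t<\sqrt{n}$ in the remainder of this paper.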

Problems (1.1), (1.2) and (1.3) arise widely in modern science, engineering and business. For example, the gradient projection methods for Sparse Principal Component Analysis (sPCA) [1, 2, 3, 4] often involve problems (1.1) or (1.3), and (1.3) is also an integral part of efficient sparse non-negative matrix factorization [6, 7], of a supervised online autoencoder intended for classification using neural networks that feature sparse activity and sparse connectivity [8], and of dictionary learning with sparseness-enforcing projections [8]. Problem (1.2) often arises in Sparse Generalized Canonical Correlation Analysis (SGCCA) [5], and Witten et al. [4] use (1.2) to compute the rank-1 approximation of a given matrix with a block coordinate descent method, which can be applied to sparse principal components and canonical correlation.

Our contribution in this paper can be summarized as follows:

• We propose a unified analysis for solving these problems. Specifically, we show that their solutions can all be determined by the root of a piecewise quadratic auxiliary function.

• We establish a series of properties of the proposed auxiliary function, which give a detailed characterization of the solutions of these problems.

• A unified method with/without sorting is designed for finding the root of the auxiliary function, which accounts for the major computational efforts of solving these problems.

### 1.1 Organization

In the remainder of this section, we outline our notation and introduce various concepts that will be employed throughout the paper. In §2, we discuss the most closely related existing problems and algorithms. In §3, we introduce our proposed auxiliary function and provide a series of its properties. We use the proposed auxiliary function to characterize the optimal solutions in §4. A unified algorithm is proposed in §5 for finding the root of the auxiliary function. Numerical results are shown in §6. Concluding remarks are provided in §7.

### 1.2 Notation

For any , let be the -th element of and be the nonnegative orthant of , i.e., . Denote the soft thresholding operator in with threshold by , i.e., for any , for . Given , denote as the projection of onto the nonnegative orthant , i.e. . The norm of is defined as with and , where is the number of groups. For a compact set and , denote . The function is convex, then the subdifferential of at is given by

Denote

be the vector of all ones. The largest and the second-largest of

are denoted as and , respectively. To simplify the analysis, we assume are the distinct components of such that with and .

## 2 Related methods

We discuss the most related works in this section.

Projection onto the $\ell_1$ ball. Many algorithms have emerged for projection onto a single $\ell_1$ ball. It can be shown [9, 10, 11] that the projection of $v$ onto $B_1^t$ can be characterized by the root of the auxiliary function

$$\psi(\lambda):=\sum_{i=1}^n\max(v_i-\lambda,0)-t=\|(v-\lambda\mathbf{1})_+\|_1-t.$$

The properties of $\psi$ are summarized in the proposition below.

###### Proposition 2.1.

Function $\psi$ is continuous, strictly decreasing and piecewise linear on $(-\infty,v_{\max}]$ with breakpoints at the distinct components of $v$, and $\psi(\lambda)=-t$ for any $\lambda\ge v_{\max}$.

By Proposition 2.1, $\psi$ has a unique root on $(-\infty,v_{\max})$ since $\psi(\lambda)\to+\infty$ as $\lambda\to-\infty$ and $\psi(v_{\max})=-t<0$. The algorithms for computing the $\ell_1$ ball projection are summarized and compared in [12], in which an efficient algorithm is also proposed together with its worst-case and observed complexities.
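To make the role of the auxiliary function concrete, the following is a minimal Python sketch of one classical way to locate its root, namely a scan over the sorted breakpoints. It is not the algorithm of [12]; the function names are hypothetical, and the sketch assumes a nonnegative `v` whose entries sum to more than `t > 0`, so that the root is positive.

```python
def psi(v, lam, t):
    """Auxiliary function psi(lam) = sum_i max(v_i - lam, 0) - t."""
    return sum(max(vi - lam, 0.0) for vi in v) - t


def psi_root(v, t):
    """Root of psi via sorted breakpoints (assumes v >= 0 and sum(v) > t > 0)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    lam = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        cand = (cumsum - t) / j  # root of the linear piece built from the j largest entries
        if cand < uj:            # the piece is valid as long as its root stays below u_j
            lam = cand
        else:
            break
    return lam


def project_l1_ball_nonneg(v, t):
    """Shift by the root of psi and clip at zero."""
    lam = psi_root(v, t)
    return [max(vi - lam, 0.0) for vi in v]
```

For instance, `project_l1_ball_nonneg([0.5, 0.3, 0.9], 1.0)` returns a nonnegative vector whose entries sum to 1 up to rounding.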

Group ball projection. The first related work is the Euclidean projection onto the intersection of and norm balls ( or ) proposed by Su et al. [13]. With and one group, this problem reverts to (1.1). They proved that the projection can be reduced to finding the root of an auxiliary function

$$\phi_1(\lambda):=\frac{\|S_\lambda(v)\|_1}{\|S_\lambda(v)\|_2}-t,\qquad\lambda\in[0,v_{\max}).$$

Su et al. [13] studied the properties of this auxiliary function, which are summarized in Lemma 2.2 below. Based on this lemma, a bisection algorithm is proposed to find the root of $\phi_1$.

###### Lemma 2.2 ([13] Theorem 1).

The following statements hold true: (i) $\phi_1$ is continuous and piecewise smooth on $[0,v_{\max})$; (ii) $\phi_1$ is monotonically decreasing and has a unique root in $[0,v_{\max})$.

Remark: Part (ii) of this lemma may not hold in general, as the following two counterexamples show.

Example 1. Consider $v=(1,0)^T$ and $t=1.2$. Then for $\lambda\in[0,1)$,

$$\phi_1(\lambda)=\frac{\max(1-\lambda,0)+\max(-\lambda,0)}{\sqrt{(\max(1-\lambda,0))^2+(\max(-\lambda,0))^2}}-1.2=-0.2.$$

Obviously, for this instance, $\phi_1$ has no root on $[0,1)$. Therefore, Lemma 2.2 does not hold.

Example 2. Consider $v=(1,1,0)^T$ and $t=\sqrt{2}$. Then for $\lambda\in[0,1)$,

$$\phi_1(\lambda)=\frac{2\max(1-\lambda,0)+\max(-\lambda,0)}{\sqrt{2(\max(1-\lambda,0))^2+(\max(-\lambda,0))^2}}-\sqrt{2}\equiv 0.$$

Clearly, any point in $[0,1)$ is a root of $\phi_1$, so that Lemma 2.2 does not hold.
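The two counterexamples can be checked numerically. The sketch below assumes the data read off from the displayed formulas, namely $v=(1,0)^T$ with $t=1.2$ and $v=(1,1,0)^T$ with $t=\sqrt{2}$, and the usual soft thresholding operator $[S_\lambda(v)]_i=\operatorname{sign}(v_i)\max(|v_i|-\lambda,0)$; the function names are hypothetical.

```python
import math


def soft_threshold(v, lam):
    """Componentwise soft thresholding: sign(v_i) * max(|v_i| - lam, 0)."""
    return [math.copysign(max(abs(vi) - lam, 0.0), vi) for vi in v]


def phi1(v, lam, t):
    """phi_1(lam) = ||S_lam(v)||_1 / ||S_lam(v)||_2 - t, for 0 <= lam < max|v_i|."""
    s = soft_threshold(v, lam)
    l1 = sum(abs(si) for si in s)
    l2 = math.sqrt(sum(si * si for si in s))
    return l1 / l2 - t


for lam in (0.0, 0.25, 0.5, 0.9):
    # Example 1 stays near -0.2 (no root); Example 2 stays near 0 (every point is a root).
    print(lam, phi1([1.0, 0.0], lam, 1.2), phi1([1.0, 1.0, 0.0], lam, math.sqrt(2.0)))
```

Up to rounding, the first column of values is constant at $-0.2$ and the second is constant at $0$, which is exactly the failure of existence and of uniqueness described above.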

Sparseness-enforcing projection operator. Another related work is the “sparseness-enforcing projection operator” proposed by Hoyer [6], which requires the solution to satisfy a normalized smooth “sparseness measure” defined by

$$\sigma:\mathbb{R}^n\setminus\{0\}\to[0,1],\qquad v\mapsto\frac{\sqrt{n}-\|v\|_1/\|v\|_2}{\sqrt{n}-1}.$$

This leads to solving problem (1.3).
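As a small illustration, the sparseness measure is straightforward to evaluate, and fixing its value on the unit $\ell_2$ sphere fixes the $\ell_1$ norm: if $\|x\|_2=1$ and $\sigma(x)=\sigma_0$, then $\|x\|_1=\sqrt{n}-\sigma_0(\sqrt{n}-1)$. The Python sketch below uses hypothetical function names and is only meant to show this correspondence.

```python
import math


def sparseness(v):
    """Hoyer's sparseness measure; v must be nonzero and have length n >= 2."""
    n = len(v)
    l1 = sum(abs(x) for x in v)
    l2 = math.sqrt(sum(x * x for x in v))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)


def l1_radius_for_sparseness(sigma0, n):
    """l1 norm that a unit-l2-norm vector in R^n needs in order to have sparseness sigma0."""
    return math.sqrt(n) - sigma0 * (math.sqrt(n) - 1.0)
```

A vector with a single nonzero entry has sparseness 1 and a constant vector has sparseness 0, matching the range $[0,1]$.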

Theis et al. [14] showed that the projection is almost surely unique for $v$ drawn from a continuous distribution, and if it is unique, the projection is determined by the root of $\phi_1$. We summarize the results in Lemma 2.3. Algorithms for solving (1.3) mainly include the alternating projection method in [6, 14], the method of Lagrange multipliers based on the sorted $v$ in [7], and the method in [8] based on computing the root of the auxiliary function $\phi_1$.

###### Lemma 2.3.

([8, Lemma 3 in Appendix]) Let be a point such that is unique and . Then is well defined and the following hold:

1. is continuous on ;

2. is differentiable on .

3. is strictly decreasing on , and is constant on .

4. has a unique root , and .

Remark: Here the condition holds if and only if . Compared with Theorem 4.3, Lemma 2.3 does not cover the situations where the projection is not unique, or where the projection is unique but .

Projection onto the intersection of an $\ell_1$ ball and an $\ell_2$ sphere. Tenenhaus et al. [5] provided a closed form of the solution of (1.2). The algorithms for solving (1.2) mainly include the root finding with bisection proposed by [5] and the root finding method with sorting by [15]. Let and suppose the elements are sorted in descending order. They analyzed the properties of in the following lemma.

###### Lemma 2.4.

([15, Proposition 1]) The following statements hold true.

• is continuous and decreasing.

• Let be the number of elements of equal to . For , there exists and such that .

• is a solution of a second degree polynomial equation.

Remark. Part (ii) of Lemma 2.4 shows that is a sufficient condition for to have a root on . However, Example 1 is a counterexample indicating that is not sufficient to guarantee that has a root on .

## 3 Proposed auxiliary function

Based on the discussion in §2, most existing projection algorithms onto the intersection of $\ell_1$ and $\ell_2$ balls/spheres are constructed by using the auxiliary function $\phi_1$. Our proposed methods are based on different auxiliary functions for characterizing the properties of the projections, which is the main focus of this section.

We first show that the solutions of (1.1)/(1.2)/(1.3) have the same sign as the given $v$, which generalizes the corresponding result for the $\ell_1$ ball projection in [9, 16].

###### Proposition 3.1.

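Let $x$ be first-order optimal for (1.1)/(1.2)/(1.3); then $v_ix_i\ge 0$ for $i=1,\dots,n$.

###### Proof.

Assume by contradiction that there exists $i_0$ such that $v_{i_0}x_{i_0}<0$. Define $\hat{x}$ such that $\hat{x}_{i_0}=-x_{i_0}$ and $\hat{x}_i=x_i$ for all $i\ne i_0$, implying $\|\hat{x}\|_1=\|x\|_1$ and $\|\hat{x}\|_2=\|x\|_2$. Therefore, $\hat{x}$ is feasible for (1.1)/(1.2)/(1.3). However,

$$\tfrac12\left(\|x-v\|_2^2-\|\hat{x}-v\|_2^2\right)=\tfrac12\left((x_{i_0}-v_{i_0})^2-(-x_{i_0}-v_{i_0})^2\right)=-2v_{i_0}x_{i_0}>0,$$

contradicting that $x$ is optimal for (1.1)/(1.2)/(1.3). This completes the proof. ∎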

Using the symmetry of the feasible region stated in Proposition 3.1, we can transform the original problems (1.1), (1.2) and (1.3) to their corresponding problems restricted to $\mathbb{R}^n_+$, so that from now on we can focus on the following problems

$$\min_{x}\ \tfrac12\|x-v\|_2^2\quad\text{subject to}\quad x\in B_1^t\cap B_2\cap\mathbb{R}^n_+, \tag{3.1}$$

$$\min_{x}\ \tfrac12\|x-v\|_2^2\quad\text{subject to}\quad x\in B_1^t\cap S_2\cap\mathbb{R}^n_+, \tag{3.2}$$

$$\min_{x}\ \tfrac12\|x-v\|_2^2\quad\text{subject to}\quad x\in S_1^t\cap S_2\cap\mathbb{R}^n_+, \tag{3.3}$$

corresponding to (1.1), (1.2) and (1.3), respectively.
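In practice, this reduction is commonly implemented by solving the restricted problem for the componentwise absolute value of $v$ and then restoring the signs. The Python sketch below illustrates the pattern under that assumption; `project_by_symmetry` and `normalize_nonneg` are hypothetical names, and the stand-in solver (normalization onto the unit $\ell_2$ sphere) is used only because it is a simple sign-symmetric projection, not because it solves (3.1)-(3.3).

```python
import math


def project_by_symmetry(v, project_nonneg):
    """Project |v| with a solver for the nonnegative problem, then restore the signs of v."""
    x = project_nonneg([abs(vi) for vi in v])
    return [xi if vi >= 0 else -xi for xi, vi in zip(x, v)]


def normalize_nonneg(a):
    """Stand-in solver: projection of a nonzero nonnegative vector onto the unit l2 sphere."""
    norm = math.sqrt(sum(ai * ai for ai in a))
    return [ai / norm for ai in a]


print(project_by_symmetry([3.0, -4.0], normalize_nonneg))
```

Any routine for the restricted problems could be passed in place of the stand-in solver.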

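We define the following univariate function for given $v$ and $t$:

$$\phi(\lambda):=\|(v-\lambda\mathbf{1})_+\|_1^2-t^2\|(v-\lambda\mathbf{1})_+\|_2^2.$$

Denote the index set of components greater than or equal to a given $\lambda$, and its cardinality, by

$$\mathcal{I}_\lambda=\{i : v_i\ge\lambda,\ i=1,\dots,n\}\quad\text{and}\quad I_\lambda=|\mathcal{I}_\lambda|.$$

The sums of those components and of their squares are denoted as $s_\lambda:=\sum_{i\in\mathcal{I}_\lambda}v_i$ and $w_\lambda:=\sum_{i\in\mathcal{I}_\lambda}v_i^2$, respectively. For simplicity, for the distinct values $\lambda_1>\dots>\lambda_k$ in $v$, we write

$$\mathcal{I}_j=\mathcal{I}_{\lambda_j},\quad I_j=I_{\lambda_j},\quad s_j=s_{\lambda_j},\quad w_j=w_{\lambda_j},\qquad\text{for } j=1,\dots,k.$$

Notice that since $\lambda_1>\dots>\lambda_k$,

$$\mathcal{I}_j\subset\mathcal{I}_{j+1},\quad I_j<I_{j+1},\qquad j=1,\dots,k-1.$$

In particular, it is obvious that (with $\lambda_{k+1}$ understood as $-\infty$)

$$\begin{cases}\mathcal{I}_\lambda=\mathcal{I}_j,\ I_\lambda=I_j,\ s_\lambda=s_j,\ w_\lambda=w_j, & \forall\lambda\in(\lambda_{j+1},\lambda_j],\ j=1,\dots,k,\\[2pt] I_k=n,\ s_k=\sum_{i=1}^nv_i,\ w_k=\sum_{i=1}^nv_i^2.\end{cases}\tag{3.4}$$

Therefore, we can rewrite $\phi$ as

$$\phi(\lambda)=(I_\lambda-t^2)(I_\lambda\lambda-2s_\lambda)\lambda+s_\lambda^2-t^2w_\lambda. \tag{3.5}$$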
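The equivalence between the definition of $\phi$ and the rewritten form (3.5) can be checked numerically. The following Python sketch evaluates both expressions; the function names are hypothetical, and the index set is taken as $\{i : v_i\ge\lambda\}$ as above.

```python
def phi_direct(v, lam, t):
    """phi(lam) = ||(v - lam*1)_+||_1^2 - t^2 * ||(v - lam*1)_+||_2^2."""
    pos = [max(vi - lam, 0.0) for vi in v]
    return sum(pos) ** 2 - t * t * sum(p * p for p in pos)


def phi_from_sums(v, lam, t):
    """Same value through (3.5), using the count I, sum s and sum of squares w."""
    active = [vi for vi in v if vi >= lam]
    count = len(active)
    s = sum(active)
    w = sum(a * a for a in active)
    return (count - t * t) * (count * lam - 2.0 * s) * lam + s * s - t * t * w


for lam in (-1.0, 0.5, 1.5, 2.0, 2.5, 3.5):
    print(lam, phi_direct([3.0, 1.0, 2.0], lam, 1.5), phi_from_sums([3.0, 1.0, 2.0], lam, 1.5))
```

The second form only needs the three running quantities $I_\lambda$, $s_\lambda$ and $w_\lambda$, which is what makes a breakpoint search cheap.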

For , , , , define

$$\varphi_j(\lambda):=(I_j-t^2)(I_j\lambda-2s_j)\lambda+s_j^2-t^2w_j,\qquad j=1,\dots,k. \tag{3.6}$$

For brevity, let $j_t:=\min\{j : I_j\ge t^2\}$, which must exist by the fact that $I_k=n$ and $t<\sqrt{n}$.

The properties of dependent on are analyzed below.

###### Lemma 3.2.
1. For , is concave on and strictly increasing on .

2. If and , is convex and strictly decreasing on . If and , is convex on and strictly decreasing on and

$$\varphi_{j_t}(\lambda)\equiv s_{j_t}^2-t^2w_{j_t}\le 0,\qquad\forall\lambda\in\mathbb{R}, \tag{3.7}$$

where the equality holds only if .

3. For and , for any .

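4. For $I_j>t^2$, the smaller root of $\varphi_j$ is

$$\lambda_{\varphi_j}=\frac{1}{I_j}\left(s_j-t\sqrt{\frac{I_jw_j-s_j^2}{I_j-t^2}}\right). \tag{3.8}$$

There is no root for $\varphi_j$ if $I_j=t^2$ and $j>1$.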

###### Proof.

(i) It follows from (3.6) that the first and second derivatives of $\varphi_j$ are

$$\varphi_j'(\lambda)=2(I_j-t^2)(I_j\lambda-s_j)\quad\text{and}\quad\varphi_j''(\lambda)=2I_j(I_j-t^2). \tag{3.9}$$

Note that for any . Therefore, both the sign of and are determined by the sign of . For , on and on since by the definition of .

(ii) For and , we have on and on . For and , we have on and on ; in particular, and takes constant (3.7) on by the definition (3.6).

(iii) It holds naturally that

$$I_j=I_{j-1}+(I_j-I_{j-1}),\quad s_j=s_{j-1}+(I_j-I_{j-1})\lambda_j,\quad w_j=w_{j-1}+(I_j-I_{j-1})\lambda_j^2.$$

Plugging this into yields that . Moreover, it can be easily verified that for ,

$$\varphi_j'(\lambda)-\varphi_{j-1}'(\lambda)=2(I_j-I_{j-1})\left[(I_j-t^2)(\lambda-\lambda_j)+I_{j-1}\lambda-s_{j-1}\right].$$

If , then , meaning for . In addition,

$$I_{j-1}\lambda-s_{j-1}=\lambda|\mathcal{I}_{j-1}|-\sum_{i\in\mathcal{I}_{j-1}}v_i<\lambda_j|\mathcal{I}_{j-1}|-\sum_{i\in\mathcal{I}_{j-1}}v_i<0$$

for any . Therefore, for , it holds that It then follows that for any , completing the proof of (iii).

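(iv) The discriminant of $\varphi_j$ is $\Delta_j=4t^2(I_j-t^2)(I_jw_j-s_j^2)$. Now we discuss the sign of $\Delta_j$. By the Cauchy-Schwarz inequality

$$I_jw_j-s_j^2=|\mathcal{I}_j|\sum_{i\in\mathcal{I}_j}v_i^2-\Big(\sum_{i\in\mathcal{I}_j}v_i\Big)^2\ge 0,$$

where the inequality holds strictly for $j>1$ since there are at least two distinct values in the summation. Therefore, if $I_j>t^2$, then $\Delta_j\ge 0$ and the smaller root of $\varphi_j$ is given by (3.8). In particular, if $j=1$, then $\Delta_1=0$ and $\varphi_1$ has a unique root $\lambda_1$. Moreover, if $I_j=t^2$ and $j>1$, then $\varphi_j\equiv s_j^2-t^2w_j<0$ since $I_j=t^2$ and $I_jw_j-s_j^2>0$, implying $\varphi_j$ has no root. This completes the proof of (iv). ∎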
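The closed form (3.8) can be sanity-checked by evaluating the quadratic piece (3.6) at the computed root. The Python sketch below does this for given values of $I_j$, $s_j$, $w_j$ and $t$; the function names are hypothetical, and the formula is only applied when $I_j>t^2$.

```python
import math


def phi_piece(count, s, w, t, lam):
    """Quadratic piece (3.6) for a fixed count I_j, sum s_j and sum of squares w_j."""
    return (count - t * t) * (count * lam - 2.0 * s) * lam + s * s - t * t * w


def smaller_root(count, s, w, t):
    """Smaller root (3.8) of the quadratic piece; requires count > t^2."""
    if count <= t * t:
        raise ValueError("formula (3.8) requires I_j > t^2")
    return (s - t * math.sqrt((count * w - s * s) / (count - t * t))) / count


# v = (3, 2, 1) with all three components active: I = 3, s = 6, w = 14, and t = 1.5.
root = smaller_root(3, 6.0, 14.0, 1.5)
print(root, phi_piece(3, 6.0, 14.0, 1.5, root))  # the second value should be close to zero
```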

###### Proposition 3.3.

The following statements hold true.

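1. $\phi$ is continuous on $\mathbb{R}$.

2. Suppose $I_1>t^2$, $\phi$ is decreasing, piecewise convex and quadratic on $(-\infty,\lambda_1]$.

3. Suppose $I_1=t^2$, $\phi$ is decreasing, piecewise convex and quadratic on $(-\infty,\lambda_2]$ and $\phi\equiv 0$ on $[\lambda_2,\lambda_1]$.

4. Suppose $I_1<t^2$. $\phi$ is increasing and piecewise concave and quadratic on $[\lambda_{j_t},\lambda_1]$. Furthermore, if $I_{j_t}>t^2$, then $\phi$ is decreasing and piecewise convex and quadratic on $(-\infty,\lambda_{j_t}]$; if $I_{j_t}=t^2$, then $\phi$ is decreasing and piecewise quadratic convex on $(-\infty,\lambda_{j_t+1}]$, and on $[\lambda_{j_t+1},\lambda_{j_t}]$

$$\varphi_{j_t}(\lambda)\equiv s_{j_t}^2-t^2w_{j_t}.$$

5. For any $\lambda\le\lambda_{j_t}$,

$$\phi(\lambda)=\max\{\varphi_{j_t}(\lambda),\dots,\varphi_k(\lambda)\}, \tag{3.10}$$

and $\phi$ is convex on $(-\infty,\lambda_{j_t}]$. Furthermore, for for and .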

###### Proof.

Part (i) is trivial.

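For part (ii), it can be easily verified that

$$\phi(\lambda)=\varphi_j(\lambda),\qquad\forall\lambda\in(\lambda_{j+1},\lambda_j],\ j=1,\dots,k \tag{3.11}$$

by (3.4) and (3.5). Moreover, $\phi(\lambda)=0$ for $\lambda\ge\lambda_1$.

Part (iii) follows naturally from Part (ii) and Lemma 3.2(ii).

For part (iv), Lemma 3.2(ii) shows $\varphi_j$ is convex for $j\ge j_t$. By Lemma 3.2(iii), $\phi$ takes the form (3.10). Therefore, $\phi$ is convex on $(-\infty,\lambda_{j_t}]$ with . ∎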

Using Proposition 3.3, we can summarize the behavior of as follows.

###### Proposition 3.4.

For , the following statements hold true:

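1. If $I_1>t^2$, then $\phi(\lambda)>0$ for any $\lambda<\lambda_1$.

2. If $I_1=t^2$, then $\phi(\lambda)=0$ for any $\lambda\in[\lambda_2,\lambda_1]$ and $\phi(\lambda)>0$ for any $\lambda<\lambda_2$.

3. If $I_1<t^2$, $\phi$ possesses a unique root on $(-\infty,\lambda_1)$ and this root lies in $(-\infty,\lambda_{j_t})$. Furthermore, possesses a unique root on if and only if .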

###### Proof.

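(i) If $I_1>t^2$, then $j_t=1$. By Proposition 3.3(ii), $\phi$ is strictly decreasing on $(-\infty,\lambda_1]$, and $\phi(\lambda_1)=0$. Therefore, part (i) is true.

(ii) If $I_1=t^2$, then $j_t=1$ and $\varphi_1\equiv s_1^2-t^2w_1=0$ since $s_1=I_1\lambda_1$ and $w_1=I_1\lambda_1^2$. By Proposition 3.3(iii), $\phi$ is decreasing on $(-\infty,\lambda_2]$. Hence part (ii) is true.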

(iii) If , then and is strictly increasing on by Proposition 3.3 (iv). Now we consider two cases. If , is decreasing on by Proposition 3.3 (iv); this together with the fact is continuous and , implies part (iii) is true. If , is strictly decreasing on and keeps a negative constant by Proposition 3.3 (iv) because and . This implies that attains 0 only once on , and more precisely we know the root lies in . Overall, we know part (iii) is true. ∎
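The behavior described above suggests a simple way to locate the root once the components are sorted: skip the pieces that contain too few components to cross zero, find the first convex piece whose value at its lower breakpoint is nonnegative, and apply the closed form for the smaller root of that piece. The Python sketch below is a plain sorted scan written for illustration, not the unified algorithm of §5. It assumes that $v$ is nonnegative, that fewer than $t^2$ components attain the maximum, and that $t^2<n$; the function names are hypothetical.

```python
import math


def phi_root_sorted(v, t):
    """Root of phi on (-inf, max(v)), assuming (#maximal entries) < t^2 < len(v)."""
    u = sorted(v, reverse=True)
    n, t2 = len(u), t * t
    if not (u.count(u[0]) < t2 < n):
        raise ValueError("sketch assumes (#maximal entries) < t^2 < n")
    count, s, w = 0, 0.0, 0.0
    i = 0
    while i < n:
        lam_j = u[i]
        # Absorb every entry equal to the current breakpoint lam_j.
        while i < n and u[i] == lam_j:
            count += 1
            s += u[i]
            w += u[i] * u[i]
            i += 1
        if count <= t2:
            continue  # concave or constant piece: the root is not located here
        # Value of the current quadratic piece at the next breakpoint (+inf on the last piece).
        if i < n:
            lam_next = u[i]
            value = (count - t2) * (count * lam_next - 2.0 * s) * lam_next + s * s - t2 * w
        else:
            value = math.inf
        if value >= 0.0:
            disc = max(count * w - s * s, 0.0) / (count - t2)
            return (s - t * math.sqrt(disc)) / count  # smaller root of this piece


def normalized_positive_part(v, lam):
    """(v - lam*1)_+ scaled to unit l2 norm."""
    pos = [max(vi - lam, 0.0) for vi in v]
    norm = math.sqrt(sum(p * p for p in pos))
    return [p / norm for p in pos]
```

At a root $\lambda^*<\max_i v_i$ of $\phi$, the vector `normalized_positive_part(v, lam)` has unit $\ell_2$ norm and $\ell_1$ norm equal to $t$, since $\phi(\lambda^*)=0$ means $\|(v-\lambda^*\mathbf{1})_+\|_1=t\,\|(v-\lambda^*\mathbf{1})_+\|_2$. The sort makes this sketch $O(n\log n)$.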

## 4 Characterizing the solution

In this section, we use $\phi$ to characterize the solutions of (3.1), (3.2) and (3.3). Notice that (3.1) is convex, whereas (3.2) and (3.3) are nonconvex. We develop a unified framework using the partial Lagrangian duality, which takes the form

$$L(x,\lambda,\mu)=\tfrac12\|x-v\|_2^2+\lambda(\mathbf{1}^Tx-t)+\tfrac{\mu}{2}\left(\|x\|_2^2-1\right).$$

Here, for each problem, the dual variable $\lambda$ is associated with the $\ell_1$ ball/sphere constraint and $\mu$ is associated with the $\ell_2$ ball/sphere constraint. The dual function is given by

$$g(\lambda,\mu)=\inf_{x\in\mathbb{R}^n_+}L(x,\lambda,\mu). \tag{P}$$

The properties of $g$ are analyzed in the following lemma.

###### Lemma 4.1.

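For given $(\lambda,\mu)$, the following hold.

1. Suppose $\mu=-1$. If $\lambda>\lambda_1$, then the optimal solution of (P) is $x=0$; if $\lambda=\lambda_1$, then any $x$ satisfying

$$x_i\ge 0,\ i\in\mathcal{I}_1,\quad\text{and}\quad x_i=0,\ i\notin\mathcal{I}_1 \tag{4.1}$$

is optimal. In both cases, we have

$$g(\lambda,\mu)=\tfrac12\|v\|_2^2-\lambda t+\tfrac12.$$

2. Suppose $\mu>-1$. The solution of (P) is

$$x(\lambda,\mu)=\frac{1}{1+\mu}(v-\lambda\mathbf{1})_+, \tag{4.2}$$

with the dual function being

$$g(\lambda,\mu)=\tfrac12\|v\|_2^2-\lambda t-\frac{\mu}{2}-\frac{1}{2(1+\mu)}\|(v-\lambda\mathbf{1})_+\|_2^2$$

and partial derivatives

$$\frac{\partial g}{\partial\lambda}=\frac{1}{1+\mu}\|(v-\lambda\mathbf{1})_+\|_1-t, \tag{4.3}$$

$$\frac{\partial g}{\partial\mu}=\frac12\left(\frac{1}{(1+\mu)^2}\|(v-\lambda\mathbf{1})_+\|_2^2-1\right). \tag{4.4}$$

Moreover, $\nabla g(\lambda,\mu)=0$ if and only if $\phi(\lambda)=0$ with $\lambda<\lambda_1$ and $\mu=\|(v-\lambda\mathbf{1})_+\|_2-1$. In addition, reduces to .

3. If $\mu<-1$, or $\mu=-1$ and $\lambda<\lambda_1$, then $g(\lambda,\mu)=-\infty$.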

###### Proof.

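(i) Suppose $\mu=-1$. We have $L(x,\lambda,-1)=(\lambda\mathbf{1}-v)^Tx+\tfrac12\|v\|_2^2-\lambda t+\tfrac12$. Clearly, if $\lambda>\lambda_1$, the optimal solution of (P) is $x=0$; if $\lambda=\lambda_1$, the solution must satisfy (4.1). The rest of (i) is trivial.

(ii) Suppose $\mu>-1$. Let $\zeta$ be the multipliers for $x\ge 0$. The optimal $x$ must satisfy

$$x-v+\lambda\mathbf{1}+\mu x-\zeta=0,\quad x^T\zeta=0,\quad x\ge 0,\quad\zeta\ge 0.$$

If $x_i>0$, meaning $\zeta_i=0$, it follows that $x_i=(v_i-\lambda)/(1+\mu)$. If $x_i=0$, it follows that $v_i-\lambda=-\zeta_i\le 0$. Therefore, (4.2) is true, and the dual function, (4.3) and (4.4) can be computed accordingly.

Now, suppose $(\lambda^*,\mu^*)$ is stationary for $g$. It holds that $\frac{\partial g}{\partial\lambda}(\lambda^*,\mu^*)=0$ and $\frac{\partial g}{\partial\mu}(\lambda^*,\mu^*)=0$, that is, $\|(v-\lambda^*\mathbf{1})_+\|_1=(1+\mu^*)t$ and $\|(v-\lambda^*\mathbf{1})_+\|_2=1+\mu^*$, implying $\lambda^*<\lambda_1$ and

$$\phi(\lambda^*)=\|(v-\lambda^*\mathbf{1})_+\|_1^2-t^2\|(v-\lambda^*\mathbf{1})_+\|_2^2=(1+\mu^*)^2t^2-t^2(1+\mu^*)^2=0.$$

Conversely, if $\phi(\lambda^*)=0$ with $\lambda^*<\lambda_1$, letting

$$\mu^*=\|(v-\lambda^*\mathbf{1})_+\|_2-1,$$

we can see $\frac{\partial g}{\partial\lambda}(\lambda^*,\mu^*)=0$ and $\frac{\partial g}{\partial\mu}(\lambda^*,\mu^*)=0$. Hence $(\lambda^*,\mu^*)$ is stationary for $g$. This completes the proof of part (ii).

(iii) It can be verified trivially.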
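The closed forms in Lemma 4.1(ii) can be checked numerically. The Python sketch below assumes the partial Lagrangian $L(x,\lambda,\mu)=\tfrac12\|x-v\|_2^2+\lambda(\mathbf{1}^Tx-t)+\tfrac{\mu}{2}(\|x\|_2^2-1)$, which is the form consistent with (4.2) and the dual function stated in the lemma; the function names are hypothetical.

```python
import math


def lagrangian(v, t, x, lam, mu):
    """Assumed partial Lagrangian: 0.5||x-v||^2 + lam*(1^T x - t) + 0.5*mu*(||x||^2 - 1)."""
    fit = 0.5 * sum((xi - vi) ** 2 for xi, vi in zip(x, v))
    return fit + lam * (sum(x) - t) + 0.5 * mu * (sum(xi * xi for xi in x) - 1.0)


def x_of(v, lam, mu):
    """Minimizer (4.2) over the nonnegative orthant, valid for mu > -1."""
    return [max(vi - lam, 0.0) / (1.0 + mu) for vi in v]


def dual(v, t, lam, mu):
    """Closed-form dual function of Lemma 4.1(ii), valid for mu > -1."""
    sq = sum(max(vi - lam, 0.0) ** 2 for vi in v)
    return 0.5 * sum(vi * vi for vi in v) - lam * t - 0.5 * mu - sq / (2.0 * (1.0 + mu))


v, t = [3.0, 2.0, 1.0], 1.5
# The two values should agree: the closed form equals the Lagrangian at its minimizer.
print(dual(v, t, 0.5, 0.25), lagrangian(v, t, x_of(v, 0.5, 0.25), 0.5, 0.25))
```

At a root $\lambda^*$ of $\phi$ with $\mu^*=\|(v-\lambda^*\mathbf{1})_+\|_2-1$, the point `x_of(v, lam, mu)` has $\ell_1$ norm $t$ and unit $\ell_2$ norm, which is the stationarity condition stated in the lemma.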

### 4.1 Projection onto $B_1^t\cap B_2\cap\mathbb{R}^n_+$

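We first use the dual to analyze the properties of the solution of (3.1). Consider the Lagrangian dual problem of (3.1)

$$\max_{\lambda,\mu}\ g(\lambda,\mu)\quad\text{subject to}\quad\lambda\ge 0,\ \mu\ge 0. \tag{D1}$$

Let $(\lambda^*,\mu^*)$ solve the dual (D1). If the solution $x^*$ of (P) for given $(\lambda^*,\mu^*)$ is feasible for (3.1) and satisfies the complementary condition

$$\lambda^*(\mathbf{1}^Tx^*-t)=0\quad\text{and}\quad\mu^*(\|x^*\|_2^2-1)=0, \tag{4.5}$$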

then