## I Introduction

Most natural signals are well described by only a few significant coefficients in an appropriate transform domain, with the number of significant coefficients much smaller than the signal size. Therefore, for a vector $x$ that represents the signal and an appropriate *sparsifying transform* $\psi(\cdot)$, $\psi(x)$ is a signal transform-coefficient vector with most elements having negligible magnitudes. The idea behind compressed sensing [Candes2006] is to *sense* the significant components of $\psi(x)$ using a small number of measurements. Define the noiseless measurement vector $\phi(x)$, whose dimension is much smaller than the signal size. Most effort in compressed sensing has focused on the linear sparsifying-transform and noiseless measurement models with

$$\psi(x) = \Psi^H x \tag{1a}$$

$$\phi(x) = \Phi x \tag{1b}$$

where $\Psi$ and $\Phi$ are known *sparsifying dictionary*
and *sensing* matrices, respectively. Here, we consider signals that belong
to a closed convex set $C$ in addition to their sparsity in the transform
domain. The nonnegative signal scenario with

$$C = \{x : x \succeq 0\} \tag{2}$$

is of significant practical interest and applicable to X-ray CT, SPECT, PET, and MRI, where the pixel values correspond to inherently nonnegative density or concentration maps [PrinceLinks2015]. Harmany et al. consider such a nonnegative sparse signal model and develop in [Harmany2012] and [Harmany2010Gradient] the convex-relaxation SPIRAL method and a linearly constrained gradient-projection method for Poisson and Gaussian linear measurements, respectively. In addition to signal nonnegativity, other convex-set constraints have been considered in the literature, such as prescribed values in the Fourier domain; box, geometric, and total-energy constraints; and intersections of these sets [YoulaWebb1982, SezanStark1983].

We adopt the analysis regularization framework and minimize the penalized-NLL objective

$$f(x) = L(x) + u\, r(x) \tag{3a}$$

with respect to the signal $x$, where $L(x)$ is a convex
differentiable data-fidelity (NLL) term and $u > 0$ is a
scalar tuning constant that quantifies the weight of the convex
regularization term $r(x)$, which
imposes signal sparsity and the convex-set constraint:

$$r(x) = \|\psi(x)\|_1 + \iota_C(x) \tag{3b}$$

and $\iota_C(\cdot)$ is the indicator function of the set $C$. Common choices for the signal sparsifying transform $\psi(x)$ are the linear map in (1a), the isotropic gradient map

$$[\psi(x)]_i = \sqrt{\textstyle\sum_{j \in N_i} (x_i - x_j)^2} \tag{4}$$

and their combinations; here, $N_i$ is the index set of neighbors of $x_i$ in an appropriate (e.g., 2D) arrangement. Summing (4) over $i$ leads to the isotropic TV penalty [gdasil15, Harmany2012, Beck2009TV]; in the 2D case, the anisotropic TV penalty is slightly different and easy to accommodate as well. Assume
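As a concrete illustration of the gradient map and the resulting TV penalty, the following sketch computes the isotropic gradient magnitudes of a small 2D image; taking the right and lower pixels as neighbors and replicating the boundary are common conventions assumed here, not taken from this paper.

```python
import numpy as np

def isotropic_gradient_map(x):
    """Magnitude of the discrete gradient at each pixel of a 2D image x.

    Neighbors of pixel (i, j) are taken as (i+1, j) and (i, j+1); summing the
    returned magnitudes over all pixels gives the isotropic TV penalty.
    """
    dv = np.diff(x, axis=0, append=x[-1:, :])   # vertical differences (last row: 0)
    dh = np.diff(x, axis=1, append=x[:, -1:])   # horizontal differences (last col: 0)
    return np.sqrt(dv**2 + dh**2)

x = np.array([[0., 0., 1.],
              [0., 1., 1.],
              [1., 1., 1.]])
tv = isotropic_gradient_map(x).sum()   # isotropic TV penalty of the image
```

For this step-edge image, only the two pixels adjacent to the diagonal edge contribute, each with magnitude $\sqrt{2}$.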

$$C \subseteq \operatorname{cl}(\operatorname{dom} L) \tag{5}$$

which ensures that $L(x)$ is computable for all $x \in C \cap \operatorname{dom} L$; taking the closure ensures that points in $\operatorname{dom} L$ close to its open boundary, if there is any, will not be excluded upon projecting onto the closed set $C$. If $C \setminus \operatorname{dom} L$ is not empty, then $L$ is not computable on it, which needs special attention; see Section III.

Define the proximal operator for a function $r(\cdot)$ scaled by $\lambda > 0$:

$$\operatorname{prox}_{\lambda r}(a) = \arg\min_x \tfrac{1}{2}\|x - a\|_2^2 + \lambda r(x). \tag{6}$$
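For intuition about definition (6), the proximal operator of the scalar $\ell_1$ penalty has the well-known soft-thresholding closed form, which can be checked against a brute-force minimization of the prox objective; the function names below are illustrative.

```python
import numpy as np

def prox_l1(a, lam):
    """Closed-form prox of lam*|x|: soft thresholding."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

# Compare against brute-force minimization of the prox objective in (6):
#   prox_{lam r}(a) = argmin_x 0.5*(x - a)^2 + lam*|x|
a, lam = 1.3, 0.5
grid = np.linspace(-3.0, 3.0, 600001)           # 1e-5 grid spacing
objective = 0.5 * (grid - a) ** 2 + lam * np.abs(grid)
x_grid = grid[int(np.argmin(objective))]
x_closed = prox_l1(a, lam)                      # -> 0.8
```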

References [DupeFadiliStarck2012, Raguet2013GFB, Condat2013Primal] view (3a) as a sum of three terms (the NLL, the sparsity penalty, and the convex-set indicator) and minimize it by splitting schemes, such as forward-backward, Douglas–Rachford, and primal–dual. A potential benefit of splitting schemes is that they apply proximal operations to individual summands rather than to their combination, which is useful if all individual proximal operators are easy to compute. However, [DupeFadiliStarck2012] requires the proximal operator of the NLL, which is difficult to compute in general and needs an inner iteration. Both [DupeFadiliStarck2012] and GFB splitting [Raguet2013GFB] require inner iterations for computing the proximal operator of the sparsity term (see (1a) and (6)) in the general case where the sparsifying matrix $\Psi$ is not orthogonal. The elegant PDS method in [Condat2013Primal, Vu2013] does not require inner iterations. The convergence rate of both the GFB and PDS methods can be upper-bounded by $C/k$, where $k$ is the number of iterations and the constant $C$ is determined by the values of the tuning proximal and relaxation constants [Liang2014Conv, Davis2015ConvPDS].

In this paper, we develop a PNPG method whose momentum acceleration
accommodates (increasing) adaptive step-size selection (see also
[gdasil15, gdasil14, NesterovTechReport]) and a convex-set constraint on
the signal $x$. PNPG needs an inner iteration to compute the
proximal operator with respect to $r(x)$, which implies inexact proximal
operation. We account for this inexactness and establish the
$O(k^{-2})$ convergence rate of the PNPG method *as
well as* convergence of its iterates; the obtained convergence conditions
motivate our selection of convergence criteria for proximal-mapping
iterations. We modify the original Nesterov acceleration
[Nesterov1983, Beck2009FISTA] so that we can establish these
convergence results when the step size is adaptive and
adjusts to the local curvature of the NLL.
Thanks to the step-size adaptation, PNPG *does not* require
Lipschitz continuity of the gradient of the NLL and applies to the
Poisson compressed-sensing scenario described in
Section II-A. Our integration of the adaptive step size
and convex-set constraint extends the application of the Nesterov-type
acceleration to more general measurement models than those used previously.
Furthermore, a convex-set constraint can bring significant improvement to
signal reconstructions compared with imposing signal sparsity only, as
illustrated in Section LABEL:sec:linear1dex. See
Section LABEL:sec:Okminustwoaccelerationapproaches for discussion of other
acceleration approaches: Auslender-Teboulle (AT)
[Auslender2006AT, Becker2011TFOCS]
and BonettiniPortaRuggiero2015
[BonettiniPortaRuggiero2015].
Proximal quasi-Newton-type methods with *problem-specific* diagonal
Hessian approximations have been considered in
[BonettiniPortaRuggiero2015, BonettiniLorisPortaPrato2015];
[BonettiniLorisPortaPrato2015] applies step-size adaptation and
accounts for inaccurate proximal operator, but *does not* employ
acceleration or provide fast convergence-rate guarantees.

PNPG code is easy to maintain: for example, the proximal-mapping
computation can be easily replaced as a module by the latest
state-of-the-art solver. Furthermore, PNPG requires minimal
*application-independent tuning*; indeed, we use the same set of
tuning parameters in two different application examples. This is in
contrast with the existing splitting methods, which require
problem-dependent (NLL-dependent) design and tuning.

We introduce the notation: $0$, $1$, and $I$ denote the vectors of zeros and ones and the identity matrix, respectively; “$\succeq$” is the elementwise version of “$\geq$”. For a vector $a$, define the projection and soft-thresholding operators:

$$P_C(a) = \arg\min_{x \in C} \|x - a\|_2 \tag{7a}$$

$$[T_\lambda(a)]_i = \operatorname{sign}(a_i)\max(|a_i| - \lambda, 0) \tag{7b}$$

and the elementwise logarithm and exponential functions $\ln_\circ(\cdot)$ and $\exp_\circ(\cdot)$. The projection onto the nonnegative set (2) and the proximal operator (6) for the $\ell_1$-norm can be computed in closed form:

$$P_C(a) = \max(a, 0), \qquad \operatorname{prox}_{\lambda \|\cdot\|_1}(a) = T_\lambda(a). \tag{7c}$$
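A quick numerical check of these closed forms, under the assumption that the convex set is the nonnegative orthant (2) and the sparsifying transform is the identity; in that case the combined proximal operator of the penalty reduces to a one-sided soft threshold (this composition is our illustration, not a quoted formula).

```python
import numpy as np

def prox_nonneg_l1(a, lam):
    """Closed-form prox of lam*(||x||_1 + indicator{x >= 0}): one-sided threshold."""
    return np.maximum(np.asarray(a) - lam, 0.0)

a = np.array([-0.7, 0.2, 1.5])
lam = 0.4
x = prox_nonneg_l1(a, lam)                      # -> [0., 0., 1.1]

# brute-force check of one coordinate against the prox objective in (6),
# minimized over the nonnegative half-line
grid = np.linspace(0.0, 3.0, 300001)
obj = 0.5 * (grid - a[2]) ** 2 + lam * grid
assert abs(grid[np.argmin(obj)] - x[2]) < 1e-4
```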

Define the $\epsilon$-subgradient [Rockafellar1970, Sec. 23]:

$$\partial_\epsilon r(x) = \{g : r(z) \geq r(x) + g^T (z - x) - \epsilon \ \text{for all } z\} \tag{8}$$

and an *inexact proximal operator*:

###### Definition 1

We say that $\tilde{x}$ is an approximation of $\operatorname{prox}_{\lambda r}(a)$ with $\epsilon$-precision [Villa2013Inexact], denoted

$$\tilde{x} \overset{\epsilon}{\simeq} \operatorname{prox}_{\lambda r}(a) \tag{9a}$$

if

$$\frac{a - \tilde{x}}{\lambda} \in \partial_{\epsilon^2/(2\lambda)}\, r(\tilde{x}). \tag{9b}$$

Note that (9a) implies

$$\|\tilde{x} - \operatorname{prox}_{\lambda r}(a)\|_2 \leq \epsilon. \tag{10}$$

We introduce representative NLL functions in Section II, describe the proposed PNPG reconstruction algorithm in Section III, establish its convergence properties (Section LABEL:sec:convergence_analysis), present numerical examples (Section LABEL:sec:NumEx), and make concluding remarks (Section LABEL:sec:conclusion).

## II Probabilistic Measurement Models

For numerical stability, we normalize the likelihood function so that the corresponding NLL $L(x)$ is lower-bounded by zero. For NLLs that correspond to discrete GLMs, this normalization corresponds to the generalized Kullback–Leibler divergence form of the NLL and is also closely related to the Bregman divergence [BanerjeeDhillon2005].

### II-A Poisson Generalized Linear Model

GLMs with Poisson observations are often adopted in astronomical, optical, hyperspectral, and tomographic imaging [Willett2014, PrinceLinks2015, StarckMurtagh2006, Snyder1993] and used to model event counts, e.g., numbers of particles hitting a detector. Assume that the measurements $y = (y_n)$ are independent Poisson-distributed¹ with means $[\phi(x)]_n$.

¹ Here, we use the extended Poisson pmf, defined for all nonnegative means, to accommodate the identity-link model.

Upon ignoring constant terms and normalization, we obtain the generalized Kullback–Leibler divergence form [ZanniBertero2014] of the NLL

$$L(x) = \sum_n \Big( [\phi(x)]_n - y_n + y_n \ln \frac{y_n}{[\phi(x)]_n} \Big) \tag{11a}$$

The NLL is a convex function of the signal $x$. Here, the relationship between the linear predictor and the expected value of the measurements is summarized by the link function [McCullagh1989]:

(11b)

Note that $L(x) \geq 0$.
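The generalized-KL form of the NLL and its lower bound of zero can be verified directly. The sketch below assumes the convention $0 \ln 0 = 0$; the function name is illustrative.

```python
import numpy as np

def poisson_nll(phi, y):
    """Generalized-KL form of the Poisson NLL, normalized to be >= 0.

    phi: mean vector phi(x) (nonnegative); y: observed counts.
    Uses the convention 0*ln(0) = 0 for zero counts.
    """
    phi = np.asarray(phi, dtype=float)
    y = np.asarray(y, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / phi), 0.0)
    return np.sum(phi - y + term)

y = np.array([3.0, 0.0, 5.0])
nll_at_minimum = poisson_nll(y, y)              # zero when the means match the counts
nll_elsewhere = poisson_nll([2.0, 0.5, 6.0], y) # strictly positive otherwise
```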

Two typical link functions in the Poisson GLM are log and identity, described in the following:

#### II-A1 Identity link

The identity link function with

$$\phi(x) = \Phi x + b \tag{12}$$

is used for modeling the photon count in optical imaging [Snyder1993]
and radiation activity in emission tomography
[PrinceLinks2015, Ch. 9.2], as well as for astronomical image
deconvolution [StarckMurtagh2006, Sec. 3.5.4]. Here, $\Phi$ and $b$ are
the known sensing matrix and intercept term, respectively; the intercept
models background radiation and scattering determined, e.g., by
calibration before the measurements have been collected.
The nonnegative set $C$ in (2) satisfies (5),
where we have used the fact that the elements of $\Phi$ are nonnegative.
If $b$ has zero components, $C \setminus \operatorname{dom} L$ is *not empty*
and the NLL does not have a Lipschitz-continuous gradient.

Setting $b = 0$ leads to the identity link without intercept used, e.g., in [Snyder1993, StarckMurtagh2006, Harmany2012].
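The loss of gradient Lipschitz continuity near the boundary can be illustrated numerically: with the identity link and a zero intercept, the Poisson NLL gradient involves the ratio of the counts to the means, which is unbounded as a mean approaches zero. The one-dimensional setup below is an assumed toy example, not from the paper.

```python
import numpy as np

# Poisson NLL with identity link and zero intercept: L(x) = sum(phi - y*ln(phi)),
# phi = Phi @ x.  Its gradient is Phi^T (1 - y/phi), which blows up as any
# phi_n -> 0 while y_n > 0, so no global Lipschitz constant exists on x >= 0.
Phi = np.array([[1.0]])
y = np.array([2.0])

def grad_norm(x):
    phi = Phi @ x
    return np.linalg.norm(Phi.T @ (1.0 - y / phi))

# gradient norm along a path approaching the boundary of the domain
norms = [grad_norm(np.array([t])) for t in (1.0, 1e-2, 1e-4, 1e-6)]
```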

#### II-A2 Log link

The log-link function

$$\phi(x) = I \exp_\circ(-\Phi x) \tag{13}$$

has been used to account for the exponential attenuation of particles
(e.g., in tomographic imaging), where $I$ is the incident energy before
attenuation. The intercept term is often assumed known [Lange2013, Sec.
8.10]. The Poisson GLM with the log link function is referred
to as the *log-linear model* in [McCullagh1989, Ch. 6], which
treats the known- and unknown-intercept cases as the same model.

Log link with unknown intercept. For unknown $I$,
(11a) does not hold because the underlying NLL is a
function of *both* $x$ and $I$. Substituting
(13) into the NLL function, concentrating it with
respect to $I$, and ignoring constant terms yields the following convex
concentrated (profile) NLL:

$$L(x) = y^T \Phi x + (1^T y) \ln\big(1^T \exp_\circ(-\Phi x)\big) \tag{14}$$

see [NesterovTechReport, App. LABEL:report-app:derconcentratednllpoisson], where we also derive the Hessian of (14). Note that the domain of $L$ is the whole space; hence, any closed convex $C$ satisfies (5).

### II-B Linear Model with Gaussian Noise

The linear measurement model (1b) with zero-mean AWGN leads to the following scaled NLL:

$$L(x) = \tfrac{1}{2}\|y - \Phi x\|_2^2 \tag{15}$$

where $y$ is the measurement vector and constant terms (not functions of $x$) have been ignored. This NLL belongs to the Gaussian GLM with the identity link and no intercept: $\phi(x) = \Phi x$. Here, the domain of $L$ is the whole space, any closed convex $C$ satisfies (5), and the set $C \setminus \operatorname{dom} L$ is empty.

Minimization of the objective function (3a) with penalty
(3b) and the Gaussian NLL (15) can be thought of
as an *analysis BPDN problem with a convex signal constraint*;
see also [NesterovTechReport, gdasil14], which use the nonnegative
$C$ in (2). A synthesis BPDN problem with a convex signal
constraint was considered in [YamagishiYamada2009].

## III Reconstruction Algorithm

We propose a PNPG approach for minimizing (3a) that combines convex-set projection with Nesterov acceleration [Nesterov1983, Beck2009FISTA], applies an adaptive step size to adjust to the local curvature of the NLL, and restarts to ensure monotonicity of the resulting iteration. The pseudocode in Algorithm 1 summarizes our PNPG method.

Define the quadratic approximation of the NLL $L(x)$:

$$Q_\beta(x \mid \bar{x}) = L(\bar{x}) + (x - \bar{x})^T \nabla L(\bar{x}) + \frac{1}{2\beta}\|x - \bar{x}\|_2^2 \tag{16}$$

with the step size $\beta > 0$ chosen so that (16) majorizes $L(x)$ in the neighborhood of $\bar{x}$. Iteration $i$ of the PNPG method proceeds as follows:

(17a)

(17d)

(17e)

(17aw)
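To make the structure of the iteration concrete, the following heavily simplified sketch combines FISTA-type momentum, projection onto the nonnegative orthant (2), and an exact closed-form proximal step, for the Gaussian NLL (15) with an identity sparsifying transform. It uses a fixed $1/L$ step size and the standard momentum recursion, so it illustrates the projected accelerated proximal-gradient pattern only; the actual PNPG method additionally adapts the step size, restarts to ensure monotonicity, and allows inexact inner proximal iterations.

```python
import numpy as np

def accelerated_projected_prox_gradient(y, Phi, u, iters=1000):
    """Sketch: FISTA momentum + projection onto x >= 0 + exact l1 prox.

    Solves min_x 0.5*||y - Phi x||_2^2 + u*||x||_1 subject to x >= 0.
    """
    p = Phi.shape[1]
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2    # 1/L; L = Lipschitz const. of grad
    x_prev = x = np.zeros(p)
    t_prev = 1.0
    for _ in range(iters):
        t = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2))
        # momentum step followed by projection onto the convex set C = {x >= 0}
        x_bar = np.maximum(x + (t_prev - 1.0) / t * (x - x_prev), 0.0)
        grad = Phi.T @ (Phi @ x_bar - y)
        # proximal step: prox of u*||.||_1 + nonnegativity is a one-sided threshold
        x_prev, x = x, np.maximum(x_bar - step * grad - step * u, 0.0)
        t_prev = t
    return x

rng = np.random.default_rng(0)
Phi = rng.standard_normal((40, 100))            # 40 measurements of a length-100 signal
x_true = np.zeros(100)
x_true[[5, 17, 60]] = [1.0, 2.0, 0.5]           # sparse nonnegative ground truth
y = Phi @ x_true
x_hat = accelerated_projected_prox_gradient(y, Phi, u=0.01)
```

With noiseless measurements and a small regularization weight, the recovered signal closely matches the sparse nonnegative ground truth.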

We first prove Lemma LABEL:thm:lll and then derive the acceleration (17a)–(17e) and prove Theorem LABEL:thm:conv.

###### Proof:

The following result from [BertsekasOzdaglarNedic2003, Proposition 2.2.1] states that the distance between two points does not increase when both are projected onto a closed convex set $C$.

###### Lemma 2 (Projection theorem)

The projection mapping onto a nonempty closed convex set $C$ is nonexpansive:

$$\|P_C(a) - P_C(b)\|_2 \leq \|a - b\|_2 \tag{A2}$$

for all $a, b$.
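Lemma 2 can be sanity-checked numerically for a concrete closed convex set, e.g., the nonnegative orthant, whose projection is the elementwise positive part:

```python
import numpy as np

rng = np.random.default_rng(1)

def project(a):
    """Projection onto the closed convex set C = {x : x >= 0}."""
    return np.maximum(a, 0.0)

# (A2): ||P_C(a) - P_C(b)|| <= ||a - b|| for every pair of points
max_ratio = 0.0
for _ in range(1000):
    a, b = rng.standard_normal(5), rng.standard_normal(5)
    num = np.linalg.norm(project(a) - project(b))
    den = np.linalg.norm(a - b)
    max_ratio = max(max_ratio, num / den)
```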

We now derive the Nesterov acceleration step (17)–(17e), with the goal of selecting the acceleration parameters in (17aw) so as to achieve the $O(i^{-2})$ convergence rate.

Define sequences and , multiply them with (LABEL:eq:stari) and (LABEL:eq:im1i), respectively, add the resulting expressions, and multiply by to obtain

(A3) | |||||

where

(A4a) | |||||

(A4b) | |||||

(A4c) |

We arranged (A3) using completion of squares so that the first two summands are similar (but with opposite signs), with the goal of facilitating cancellations as we sum over iterations. Since we have control over the sequences and , we impose the following conditions for :

(A5a) | |||||

(A5b) |

where

(A6) |

Now, due to , the inequality (A7c) leads to

(A8) |

with simple upper bound on the right-hand side, thanks to summand cancellations facilitated by the assumptions (A5).

As long as grows at a rate of and the inaccuracy of the proximal mappings leads to bounded , the centered objective function achieves the desired $O(i^{-2})$ decrease rate. We now discuss how to satisfy (A5) and ensure the growth rate of through an appropriate selection of .
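The required linear growth can be illustrated with the standard FISTA momentum recursion $\theta_i = \bigl(1 + \sqrt{1 + 4\theta_{i-1}^2}\bigr)/2$ (the constant-step-size special case; the recursion used here also involves the step-size ratio). Each update increases $\theta$ by at least $1/2$, so $\theta_i \geq (i+1)/2$:

```python
import math

theta = 1.0   # theta_0
for i in range(1, 2001):
    theta = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * theta * theta))
    # linear growth: theta after i updates is at least (i+1)/2,
    # which yields the O(i^{-2}) objective decrease rate
    assert theta >= (i + 1) / 2.0
```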

### A-I Satisfying Conditions (A5)

#### A-Ia Imposing equality in (A5a)

(A5a) holds with equality for all and any when we choose that satisfy

(A9) |

Now, (A9) requires equal coefficients multiplying on both sides, thus for all , where is a constant (not a function of ), which implies and , see also (A4a). Upon defining

(A10a) | |||||

we have | |||||

(A10b) |

Plug (A10) into (A9) and reorganize to obtain the following form of momentum acceleration:

(A11) |

Although satisfies (A5a), it
is not guaranteed to be within ; consequently, the
proximal-mapping step for this selection *may not* be computable.

#### A-Ib Selecting that satisfies (A5a)

We now seek within that satisfies the inequality (A5a). Since and are in , by the convexity of ; see (A4c). According to Lemma 2, projecting (A11) onto preserves or reduces the distance between points. Therefore,

(A12) |

belongs to *and* satisfies the condition (A5a):

(A13a) | |||||

(A13b) |

where (A13a) and (A13b) follow from (A9) and by using Lemma 2, respectively; see also (A4b).

Without loss of generality, set and rewrite and modify (A6), (A4b), and (A7c) using (A10) to obtain

(A14a) | |||||

(A14b) | |||||

(A14c) |

where (A14c) is obtained by discarding the negative term and the zero term (because ) on the left-hand side of (A7c). Now, (LABEL:eq:DueTo0Theta) follows from (A8) by using (see (17)), (A10), and (A14b) with .

#### A-Ic Satisfying (A5b)

### A-Ii Connection to Convergence-Rate Analysis of Fista

If the step-size sequence is non-increasing (e.g., in the backtracking-only scenario with ), (17) with also satisfies the inequality (LABEL:eq:solvedTheta). In this case, (LABEL:eq:DueTo0Theta) still holds but (LABEL:eq:upperboundonDeltawithbetaonly) does not because (LABEL:eq:thetaGrow) no longer holds. However, because , we have and

(A15) |

which generalizes [Beck2009FISTA, Th. 4.4] to include the inexactness of the proximal operator and the convex-set projection.

## Appendix B Convergence of Iterates

To prove convergence of iterates, we need to show that the centered objective function decreases faster than the right-hand side of (LABEL:eq:upperboundonDeltawithbetaonly). We introduce Lemmas 3 and 4 and then use them to prove Theorem LABEL:thm:convItr. Throughout this appendix, we assume that Assumption LABEL:th1cond of Theorem LABEL:thm:convItr holds, which justifies (LABEL:eq:twoineq) and (LABEL:eq:thetaCond0) as well as the results from Appendix A that we use in the proofs.

###### Lemma 3

Under Assumptions LABEL:th1cond–LABEL:convitergammabcond of Theorem LABEL:thm:convItr,

(B1) |

###### Proof:

By letting in (A14c) and using (LABEL:eq:Econverges), we obtain

(B2) |

For , rewrite (A14a) using expressed in terms of (based on (17)):

(B3) | |||||

where the inequality in (B3) is due to ; see Assumption LABEL:convitergammabcond. Apply nonexpansiveness of the projection operator to (LABEL:eq:im1i) and use (A11) to obtain

(B4) |

then multiply both sides of (B4) by , sum over and reorganize:

(B5a) | |||||

(B5b) | |||||

where (see (A14a)) | |||||

(B5c) | |||||

(B5d) |

and we drop the zero term and the negative term from (B5a) and use the fact that implied by (B3) to get (B5b). Finally, let and use (LABEL:eq:Econverges) and (B2) to conclude (B1).

###### Lemma 4

For ,

(B6) |

###### Proof:

For ,

(B7a) | |||||

(B7b) |

where we obtain the inequality (B7a) by combining the terms on the right-hand side and using (LABEL:eq:grb1), and (B7b) holds because is an increasing sequence (see Section LABEL:sec:convergence_analysis). Now,

(B8a) | |||||

(B8b) |

where (B8a) follows by using (17d), (LABEL:eq:thetaCond) with , and fraction-term cancellation; (B8b) is obtained by substituting (B7b) into (B8a) and canceling summation terms. (B8b) implies (B6) by using (LABEL:eq:grb1) with .

Define

(B9) |

Since converges to as the iteration index grows and is a minimizer, it is sufficient to prove the convergence of , see [Chambolle2015Convergence, Th. 4.1].

###### Proof:

Use (LABEL:eq:stari) and the fact that to get

(B10) |

Now,

(B11a) | |||||

(B11b) |

where (B11a) and (B11b) follow by using the nonexpansiveness of the projection operator (see also (A11)) and the identity

(B12) |

respectively. Combine the inequalities (B11b) and (B10) to get

(B13a) | |||||

(B13b) |

where (B13b) is due to (see (LABEL:eq:xi)) and the following

(B14a) | |||||

(B14b) |

where we have used (17d) and that is an increasing sequence, (see Section LABEL:sec:stepsize), and (LABEL:eq:xi).

According to (LABEL:eq:thetaGrow) and Assumption LABEL:stepsizeseqcond that the sequence is bounded, there exists an integer such that

(B15) |

for all , where the second inequality follows from the first and the definition of , see (17d). Then

(B16a) | |||||

(B16b) |

for , where the inequality in (B16a) follows by combining the inequalities (B13b) and , and (B16b) follows by recursively applying inequality (B16a) with the index replaced by smaller values. Now, sum the inequalities (B16b) over and exchange the order of summation over and on the right-hand side:
