Most natural signals are well described by only a few significant coefficients in an appropriate transform domain, with the number of significant coefficients much smaller than the signal size. Therefore, for a vector that represents the signal and an appropriate sparsifying transform , is a signal transform-coefficient vector with most elements having negligible magnitudes. The idea behind compressed sensing [Candes2006] is to sense the significant components of using a small number of measurements. Define the noiseless measurement vector , where and . Most effort in compressed sensing has focused on the linear sparsifying transform and noiseless measurement models with
where and are known sparsifying dictionary and sensing matrices. Here, we consider signals that belong to a closed convex set in addition to their sparsity in the transform domain. The nonnegative signal scenario with
is of significant practical interest and applicable to X-ray CT, SPECT, PET, and MRI, where the pixel values correspond to inherently nonnegative density or concentration maps [PrinceLinks2015]. Harmany et al. consider such a nonnegative sparse signal model and develop a convex-relaxation SPIRAL method [Harmany2012] and a linearly constrained gradient-projection method [Harmany2010Gradient] for Poisson and Gaussian linear measurements, respectively. In addition to signal nonnegativity, other convex-set constraints have been considered in the literature, such as prescribed value in the Fourier domain; box, geometric, and total-energy constraints; and intersections of these sets [YoulaWebb1982, SezanStark1983].
We adopt the analysis regularization framework and minimize
with respect to the signal , where is a convex differentiable data-fidelity (NLL) term, is a scalar tuning constant that quantifies the weight of the convex regularization term that imposes signal sparsity and the convex-set constraint:
and is the indicator function. Common choices for the signal sparsifying transform are the linear map in (1a), isotropic gradient map
and their combinations; here, is the index set of neighbors of in an appropriate (e.g., 2D) arrangement. Summing (4) over leads to the isotropic TV penalty [gdasil15, Harmany2012, Beck2009TV]; in the 2D case, the anisotropic TV penalty is slightly different and easy to accommodate as well. Assume
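To make the isotropic TV penalty concrete, here is a minimal NumPy sketch; the function name, the forward-difference stencil, and the replicated-boundary convention are our own choices, not prescribed by the paper:

```python
import numpy as np

def isotropic_tv(X):
    """Isotropic TV of a 2D array: sum over pixels of sqrt(dx^2 + dy^2),
    using forward differences with a replicated last row/column."""
    dx = np.diff(X, axis=0, append=X[-1:, :])  # vertical forward differences
    dy = np.diff(X, axis=1, append=X[:, -1:])  # horizontal forward differences
    return float(np.sum(np.sqrt(dx**2 + dy**2)))
```

Summing the per-pixel gradient norms (rather than the absolute differences separately) is what distinguishes the isotropic penalty from the anisotropic one.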
which ensures that is computable for all ; closure ensures that points in but close to its open boundary, if there is any, will not be excluded upon projecting onto the closed set . If is not empty, then is not computable on it, which needs special attention; see Section III.
Define the proximal operator for a function scaled by :
References [DupeFadiliStarck2012, Raguet2013GFB, Condat2013Primal] view (3a) as a sum of three terms, , , and , and minimize it by splitting schemes, such as forward-backward, Douglas-Rachford, and primal-dual. A potential benefit of splitting schemes is that they apply proximal operations on individual summands rather than on their combination, which is useful if all individual proximal operators are easy to compute. However, [DupeFadiliStarck2012] requires the proximal operator of , which is difficult in general and needs an inner iteration. Both [DupeFadiliStarck2012] and GFB splitting [Raguet2013GFB] require inner iterations for solving (see (1a) and (6)) in the general case where the sparsifying matrix is not orthogonal. The elegant PDS method in [Condat2013Primal, Vu2013] does not require inner iterations. The convergence rate of both GFB and PDS methods can be upper-bounded by where is the number of iterations and the constant is determined by values of the tuning proximal and relaxation constants [Liang2014Conv, Davis2015ConvPDS].
In this paper, we develop a PNPG method whose momentum acceleration accommodates (increasing) adaptive step-size selection (see also [gdasil15, gdasil14, NesterovTechReport]) and a convex-set constraint on the signal . PNPG needs an inner iteration to compute the proximal operator with respect to , which implies inexact proximal operation. We account for this inexactness and establish the convergence rate of the PNPG method as well as convergence of its iterates; the obtained convergence conditions motivate our selection of convergence criteria for proximal-mapping iterations. We modify the original Nesterov acceleration [Nesterov1983, Beck2009FISTA] so that we can establish these convergence results when the step size is adaptive and adjusts to the local curvature of the NLL. Thanks to the step-size adaptation, PNPG does not require Lipschitz continuity of the gradient of the NLL and applies to the Poisson compressed-sensing scenario described in Section II-A. Our integration of the adaptive step size and convex-set constraint extends the application of the Nesterov-type acceleration to more general measurement models than those used previously. Furthermore, a convex-set constraint can bring significant improvement to signal reconstructions compared with imposing signal sparsity only, as illustrated in Section LABEL:sec:linear1dex. See Section LABEL:sec:Okminustwoaccelerationapproaches for discussion of other acceleration approaches: Auslender-Teboulle (AT) [Auslender2006AT, Becker2011TFOCS] and the method of [BonettiniPortaRuggiero2015]. Proximal quasi-Newton-type methods with problem-specific diagonal Hessian approximations have been considered in [BonettiniPortaRuggiero2015, BonettiniLorisPortaPrato2015]; [BonettiniLorisPortaPrato2015] applies step-size adaptation and accounts for an inaccurate proximal operator, but does not employ acceleration or provide fast convergence-rate guarantees.
PNPG code is easy to maintain: for example, the proximal-mapping computation can be easily replaced as a module by the latest state-of-the-art solver. Furthermore, PNPG requires minimal application-independent tuning; indeed, we use the same set of tuning parameters in two different application examples. This is in contrast with the existing splitting methods, which require problem-dependent (NLL-dependent) design and tuning.
We introduce the notation: , ,
, denote the vectors of zeros and ones and the identity matrix, respectively; “” is the elementwise version of “”. For a vector , define the projection and soft-thresholding operators:
and the elementwise logarithm and exponential functions and . The projection onto and the proximal operator (6) for the -norm can be computed in closed form:
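These closed forms are simple elementwise operations; a minimal NumPy sketch for the nonnegative orthant and the -norm proximal operator (function names are ours):

```python
import numpy as np

def project_nonneg(a):
    """Projection onto the nonnegative orthant: [P(a)]_i = max(a_i, 0)."""
    return np.maximum(a, 0.0)

def soft_threshold(a, lam):
    """Proximal operator of lam * ||.||_1: elementwise soft thresholding,
    shrinking each entry toward zero by lam and clipping at zero."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)
```

For a general (non-orthogonal) sparsifying matrix the proximal operator has no such closed form, which is why PNPG resorts to an inner iteration.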
Define the -subgradient [Rockafellar1970, Sec. 23]:
and an inexact proximal operator:
We say that is an approximation of with -precision [Villa2013Inexact], denoted
Note that (9a) implies
We introduce representative NLL functions in Section II, describe the proposed PNPG reconstruction algorithm in Section III, establish its convergence properties (Section LABEL:sec:convergence_analysis), present numerical examples (Section LABEL:sec:NumEx), and make concluding remarks (Section LABEL:sec:conclusion).
II Probabilistic Measurement Models
For numerical stability, we normalize the likelihood function so that the corresponding NLL
is lower-bounded by zero. For NLLs that correspond to discrete GLMs, this normalization corresponds to the generalized Kullback-Leibler divergence form of the NLL and is also closely related to the Bregman divergence [BanerjeeDhillon2005].
II-A Poisson Generalized Linear Model
GLMs with Poisson observations are often adopted in astronomical, optical, hyperspectral, and tomographic imaging [Willett2014, PrinceLinks2015, StarckMurtagh2006, Snyder1993] and are used to model event counts, e.g., numbers of particles hitting a detector. Assume that the measurements
are independent and Poisson distributed with means . (Here, we use the extended Poisson pmf for all by defining to accommodate the identity-link model.)
Upon ignoring constant terms and normalization, we obtain the generalized Kullback-Leibler divergence form [ZanniBertero2014] of the NLL
The NLL is a convex function of the signal . Here, the relationship between the linear predictor and the expected value of the measurements is summarized by the link function [McCullagh1989]:
Note that .
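For illustration, the generalized Kullback-Leibler form of the Poisson NLL can be evaluated as follows; this is a minimal NumPy sketch with our own function name, using the convention 0·log 0 = 0 to handle zero counts:

```python
import numpy as np

def poisson_nll(mu, y):
    """Generalized Kullback-Leibler form of the Poisson NLL:
    sum_n [ mu_n - y_n + y_n * log(y_n / mu_n) ], with 0 * log(0) := 0.
    Nonnegative, and zero when mu == y (the normalization in the text)."""
    mu = np.asarray(mu, dtype=float)
    y = np.asarray(y, dtype=float)
    kl = mu - y
    pos = y > 0                      # skip terms with y_n = 0 (0*log 0 = 0)
    kl[pos] += y[pos] * np.log(y[pos] / mu[pos])
    return float(np.sum(kl))
```

The lower bound of zero, attained at mu == y, is exactly the normalization property discussed above.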
Two typical link functions in the Poisson GLM are log and identity, described in the following:
II-A1 Identity link
The identity link function with
is used for modeling the photon count in optical imaging [Snyder1993] and radiation activity in emission tomography [PrinceLinks2015, Ch. 9.2], as well as for astronomical image deconvolution [StarckMurtagh2006, Sec. 3.5.4]. Here, and are the known sensing matrix and intercept term, respectively; the intercept models background radiation and scattering determined, e.g., by calibration before the measurements have been collected. The nonnegative set in (2) satisfies (5), where we have used the fact that the elements of are nonnegative. If has zero components, is not empty and the NLL does not have a Lipschitz-continuous gradient.
Setting leads to the identity link without intercept used, e.g., in [Snyder1993, StarckMurtagh2006, Harmany2012].
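Under the identity link, differentiating the generalized-KL NLL with respect to the signal gives a simple closed-form gradient; the following sketch uses our own variable names and assumes strictly positive means where needed:

```python
import numpy as np

def poisson_identity_nll_grad(x, Phi, b, y):
    """Gradient of sum_n [ mu_n - y_n + y_n*log(y_n/mu_n) ] with mu = Phi x + b:
    grad = Phi.T @ (1 - y / mu).  Assumes mu_n > 0 for all n with y_n > 0;
    when some mu_n can reach zero, the gradient is not Lipschitz continuous,
    as noted in the text."""
    mu = Phi @ x + b
    return Phi.T @ (1.0 - y / mu)
```

The gradient vanishes exactly when the model means match the data, consistent with the NLL attaining its lower bound of zero there.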
II-A2 Log link
The log-link function
has been used to account for the exponential attenuation of particles (e.g., in tomographic imaging), where is the incident energy before attenuation. The intercept term is often assumed known [Lange2013, Sec. 8.10]. The Poisson GLM with log link function is referred to as the log-linear model in [McCullagh1989, Ch. 6], which treats known and unknown as the same model.
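With a known intercept, the log-link means follow the exponential (Beer's-law) attenuation described above; a small sketch, where the names `I0` and `Phi` for the incident intensity and sensing matrix are our own:

```python
import numpy as np

def poisson_log_link_mean(x, Phi, I0):
    """Measurement means under the attenuation log link:
    mu_n = I0_n * exp(-[Phi x]_n), with I0 the known incident
    intensity (intercept term) before attenuation."""
    return I0 * np.exp(-(Phi @ x))
```

With zero attenuation (x = 0), the detector simply records the incident intensity, which is a quick sanity check on the sign convention.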
Log link with unknown intercept. For unknown , (11a) does not hold because the underlying NLL is a function of both and . Substituting (13) into the NLL function, concentrating it with respect to , and ignoring constant terms yields the following convex concentrated (profile) NLL:
II-B Linear Model with Gaussian Noise
Linear measurement model (1b) with zero-mean AWGN leads to the following scaled NLL:
where is the measurement vector and constant terms (not functions of ) have been ignored. This NLL belongs to the Gaussian GLM with identity link without intercept: . Here, , any closed convex satisfies (5), and the set is empty.
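The scaled Gaussian NLL and its gradient are straightforward to evaluate; a minimal sketch (function names ours, with the sensing matrix written as `A`):

```python
import numpy as np

def gaussian_nll(x, A, y):
    """Scaled Gaussian NLL 0.5 * ||y - A x||_2^2 (constant terms ignored)."""
    r = A @ x - y
    return 0.5 * float(r @ r)

def gaussian_nll_grad(x, A, y):
    """Gradient A.T @ (A x - y); its Lipschitz constant is ||A||_2^2,
    so this NLL always has a Lipschitz-continuous gradient."""
    return A.T @ (A @ x - y)
```

Unlike the Poisson identity-link case, the Lipschitz constant here is finite and global, which is why the set of problematic points is empty for this model.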
Minimization of the objective function (3a) with penalty (3b) and Gaussian NLL (15) can be thought of as an analysis BPDN problem with a convex signal constraint; see also [NesterovTechReport, gdasil14] which use the nonnegative in (2). A synthesis BPDN problem with a convex signal constraint was considered in [YamagishiYamada2009].
III Reconstruction Algorithm
We propose a PNPG approach for minimizing (3a) that combines convex-set projection with Nesterov acceleration [Nesterov1983, Beck2009FISTA], uses an adaptive step size to adjust to the local curvature of the NLL, and restarts to ensure monotonicity of the resulting iteration. The pseudocode in Algorithm 1 summarizes our PNPG method.
Define the quadratic approximation of the NLL :
with chosen so that (16) majorizes in the neighborhood of . Iteration of the PNPG method proceeds as follows:
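To illustrate the structure of such an iteration, the following sketch specializes to the case Ψ = I with a nonnegativity constraint, where the proximal step is exact and reduces to shrinkage followed by projection. It keeps the momentum extrapolation, the projection, and backtracking against the quadratic majorization, but omits PNPG's adaptive step-size increase and restart (see Algorithm 1); all names are ours:

```python
import numpy as np

def pnpg_sketch(f, grad, x0, u, beta0=1.0, n_iter=200, xi=2.0):
    """Illustrative sketch of a projected Nesterov proximal-gradient iteration
    for f(x) + u*||x||_1 + indicator(x >= 0), with Psi = I so the proximal
    step max(a - beta*u, 0) is exact.  Backtracking divides the step size
    beta by xi until the quadratic majorization of f holds.  Not the full
    PNPG: no adaptive step-size increase and no restart."""
    x_prev = x = np.maximum(np.asarray(x0, dtype=float), 0.0)
    theta_prev = theta = 1.0
    beta = beta0
    for _ in range(n_iter):
        # momentum extrapolation, projected back onto the constraint set
        x_bar = np.maximum(x + (theta_prev - 1.0) / theta * (x - x_prev), 0.0)
        g, fx = grad(x_bar), f(x_bar)
        while True:
            x_new = np.maximum(x_bar - beta * g - beta * u, 0.0)  # exact prox
            d = x_new - x_bar
            # accept the step if the quadratic upper bound on f holds
            if f(x_new) <= fx + g @ d + (d @ d) / (2.0 * beta):
                break
            beta /= xi
        x_prev, x = x, x_new
        theta_prev, theta = theta, 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * theta**2))
    return x
```

For a simple quadratic NLL this sketch recovers the closed-form shrinkage solution, which is a convenient correctness check.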
The following result from [BertsekasOzdaglarNedic2003, Proposition 2.2.1] states that the distance between and does not increase upon projecting them onto a closed convex set .
Lemma 2 (Projection theorem)
The projection mapping onto a nonempty closed convex set is nonexpansive
for all .
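A quick numerical sanity check of this nonexpansiveness for the nonnegative orthant (illustrative only; the random points and orthant projection are our own choices):

```python
import numpy as np

# Lemma 2 for C = nonnegative orthant: ||P(x) - P(y)|| <= ||x - y||.
rng = np.random.default_rng(0)
for _ in range(100):
    x, y = rng.normal(size=8), rng.normal(size=8)
    Px, Py = np.maximum(x, 0.0), np.maximum(y, 0.0)
    assert np.linalg.norm(Px - Py) <= np.linalg.norm(x - y)
```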
Define sequences and , multiply them with (LABEL:eq:stari) and (LABEL:eq:im1i), respectively, add the resulting expressions, and multiply by to obtain
We arranged (A3) using completion of squares so that the first two summands are similar (but with opposite signs), with the goal of facilitating cancellations as we sum over . Since we have control over the sequences and , we impose the following conditions for :
Now, due to , the inequality (A7c) leads to
with simple upper bound on the right-hand side, thanks to summand cancellations facilitated by the assumptions (A5).
As long as grows at a rate of and the inaccuracy of the proximal mappings leads to bounded , the centered objective function achieves the desired decrease rate of . We now discuss how to satisfy (A5) and attain this growth rate of by an appropriate selection of .
A-I Satisfying Conditions (A5)
A-Ia Imposing equality in (A5a)
(A5a) holds with equality for all and any when we choose that satisfy
Although satisfies (A5a), it is not guaranteed to be within ; consequently, the proximal-mapping step for this selection may not be computable.
A-Ib Selecting that satisfies (A5a)
We now seek within that satisfies the inequality (A5a). Since and are in , by the convexity of ; see (A4c). According to Lemma 2, projecting (A11) onto preserves or reduces the distance between points. Therefore,
belongs to and satisfies the condition (A5a):
where (A14c) is obtained by discarding the negative term and the zero term (because ) on the left-hand side of (A7c). Now, (LABEL:eq:DueTo0Theta) follows from (A8) by using (see (17)), (A10), and (A14b) with .
A-Ic Satisfying (A5b)
A-II Connection to Convergence-Rate Analysis of FISTA
If the step-size sequence is non-increasing (e.g., in the backtracking-only scenario with ), (17) with also satisfies the inequality (LABEL:eq:solvedTheta). In this case, (LABEL:eq:DueTo0Theta) still holds but (LABEL:eq:upperboundonDeltawithbetaonly) does not because (LABEL:eq:thetaGrow) no longer holds. However, because , we have and
which generalizes [Beck2009FISTA, Th. 4.4] to include the inexactness of the proximal operator and the convex-set projection.
Appendix B Convergence of Iterates
To prove convergence of iterates, we need to show that the centered objective function decreases faster than the right-hand side of (LABEL:eq:upperboundonDeltawithbetaonly). We introduce Lemmas 3 and 4 and then use them to prove Theorem LABEL:thm:convItr. Throughout this Appendix, we assume that Assumption LABEL:th1cond of Theorem LABEL:thm:convItr holds, which justifies (LABEL:eq:twoineq) and (LABEL:eq:thetaCond0) as well as results from Appendix A that we use in the proofs.
Under Assumptions LABEL:th1cond–LABEL:convitergammabcond of Theorem LABEL:thm:convItr,
By letting in (A14c) and using (LABEL:eq:Econverges), we obtain
then multiply both sides of (B4) by , sum over and reorganize:
where (see (A14a))
where we obtain the inequality (B7a) by combining the terms on the right-hand side and using (LABEL:eq:grb1); (B7b) holds because is an increasing sequence (see Section LABEL:sec:convergence_analysis). Now,
where (B8a) follows by using (17d), (LABEL:eq:thetaCond) with , and fraction-term cancellation; (B8b) is obtained by substituting (B7b) into (B8a) and canceling summation terms. (B8b) implies (B6) by using (LABEL:eq:grb1) with .
Since converges to as the iteration index grows and is a minimizer, it suffices to prove the convergence of ; see [Chambolle2015Convergence, Th. 4.1].
Use (LABEL:eq:stari) and the fact that to get
where (B13b) is due to (see (LABEL:eq:xi)) and the following
where we have used (17d), the fact that is an increasing sequence (see Section LABEL:sec:stepsize), and (LABEL:eq:xi).
According to (LABEL:eq:thetaGrow) and Assumption LABEL:stepsizeseqcond that the sequence is bounded, there exists an integer such that
for all , where the second inequality follows from the first and the definition of , see (17d). Then
for , where the inequality in (B16a) follows by combining the inequalities (B13b) and , and (B16b) follows by recursively applying inequality (B16a) with replaced by . Now, sum the inequalities (B16b) over and exchange the order of summation over and on the right-hand side: