Plug and play methods for magnetic resonance imaging

Magnetic Resonance Imaging (MRI) is a non-invasive diagnostic tool that provides excellent soft-tissue contrast without the use of ionizing radiation. But, compared to other clinical imaging modalities (e.g., CT or ultrasound), the data acquisition process for MRI is inherently slow. Furthermore, dynamic applications demand collecting a series of images in quick succession. As a result, reducing acquisition time and improving imaging quality for undersampled datasets have been active areas of research for the last two decades. The combination of parallel imaging and compressive sensing (CS) has been shown to benefit a wide range of MRI applications. More recently, deep learning techniques have been shown to outperform CS methods. Some of these techniques pose the MRI reconstruction as a direct inversion problem and tackle it by training a deep neural network (DNN) to map from the measured Fourier samples to the final image. Considering that the forward model in MRI changes from one dataset to the next, such methods have to be either trained over a large and diverse corpus of data or limited to a specific application, and even then they cannot ensure data consistency. An alternative is to use "plug-and-play" (PnP) algorithms, which iterate image denoising with forward-model based signal recovery. PnP algorithms are an excellent fit for compressive MRI because they decouple image modeling from the forward model, which can change significantly among different scans due to variations in the coil sensitivity maps, sampling patterns, and image resolution. Consequently, with PnP, state-of-the-art image-denoising techniques, such as those based on DNNs, can be directly exploited for compressive MRI image reconstruction. The objective of this article is two-fold: i) to review recent advances in plug-and-play methods, and ii) to discuss their application to compressive MRI image reconstruction.



I Introduction

Magnetic Resonance Imaging (MRI) is a non-invasive diagnostic tool that provides excellent soft-tissue contrast without the use of ionizing radiation. Compared to other clinical imaging modalities (e.g., CT or ultrasound), however, the data acquisition process for MRI is inherently slow. A typical clinical MRI exam consists of multiple scans and can take more than an hour to complete. For each scan, the patient may be asked to stay still for several minutes, with slight motion potentially resulting in image artifacts. Furthermore, dynamic applications demand collecting a series of images in quick succession. Due to the limited time window in many dynamic applications (e.g., contrast enhanced MR angiography), it is not always feasible to collect fully sampled datasets. As a result, reducing acquisition time and improving imaging quality for undersampled datasets have been active areas of research for the last two decades.

The combination of parallel imaging and compressive sensing (CS) has been shown to benefit a wide range of MRI applications [1], including dynamic applications, and has been included in the default image processing framework offered by several major MRI vendors. More recently, deep learning techniques have been shown to outperform CS methods [2]. Some of these techniques pose the MRI reconstruction as a direct inversion problem and tackle it by training a deep neural network (DNN) that learns the end-to-end mapping between the measured Fourier samples and the final image [3]. Considering that the forward model in MRI changes from one dataset to the next, such methods have to be either trained over a large and diverse corpus of data or limited to a specific application, and even then they cannot ensure data consistency. Other deep learning techniques explicitly account for the forward model and use convolutional neural network (CNN) based regularization. Typically, these techniques alternate between gradient descent of the data fidelity term and the CNN-based regularization [4, 5]. Integrating learning-based methods into physical inverse problems remains an open challenge, since the conventional formulation via maximum-a-posteriori (MAP) estimation is not extendable to non-optimization based methods, e.g., DNNs.

An alternative is to use “plug-and-play” (PnP) algorithms [6], which integrate image denoising with forward-model based signal recovery. PnP algorithms are an excellent fit for compressive MRI because they decouple image modeling from the forward model, which can change significantly among different scans due to variations in the coil sensitivity maps, sampling patterns, and image resolution. Consequently, state-of-the-art image-denoising techniques, such as those based on DNNs (e.g., [7, 8]), can be directly exploited for compressive MRI image reconstruction through the PnP methodology. The objective of this article is thus two-fold: i) to review recent advances in plug-and-play methods, and ii) to discuss their application to compressive MRI image reconstruction.

The remainder of the paper is organized as follows. We first detail the inverse problem encountered in MRI reconstruction. We then review several PnP methods, where the unifying commonality is to iteratively call a denoising subroutine as one step of a larger optimization-inspired algorithm. Next, we describe how the result of the PnP method can be interpreted as a solution to an equilibrium equation, allowing convergence analysis from the equilibrium perspective. Finally, we present illustrative examples of PnP methods applied to MRI image recovery.

II Inverse problems in MRI

In this section, we describe typical inverse problems in MRI. Briefly, the measurements are samples of the Fourier transform of the image, where the Fourier domain is often referred to as "k-space." The transform can be taken across two or three spatial dimensions and includes an additional temporal dimension in dynamic applications. Furthermore, measurements are often collected in parallel from multiple receiver coils. In dynamic parallel MRI with Cartesian sampling, the time-$t$ k-space measurements from the $c$th coil take the form

$y_t^{(c)} = P_t F S_c x_t + w_t^{(c)}$,   (1)

where $x_t \in \mathbb{C}^N$ is the vectorized 2D or 3D image at discrete time $t$, $S_c \in \mathbb{C}^{N \times N}$ is a diagonal matrix containing the sensitivity map for the $c$th coil, $F$ is the 2D or 3D discrete Fourier transform (DFT), the sampling matrix $P_t \in \{0,1\}^{M \times N}$ contains $M$ rows of the $N \times N$ identity matrix, and $w_t^{(c)}$ is additive white Gaussian noise (AWGN). Note that the sampling pattern may change across frames $t$. The MRI literature often refers to $R := N/M$ as the "acceleration rate."

MRI measurements are acquired using a sequence of measurement trajectories through k-space. These trajectories can be Cartesian or non-Cartesian in nature. Cartesian trajectories are essentially lines through k-space. In the Cartesian case, one k-space dimension (i.e., the frequency encoding) is fully sampled, while the other one or two dimensions (i.e., the phase encodings) are undersampled to reduce acquisition time. Typically, one line, or “readout,” is collected after every RF pulse, and the process is repeated several times to collect adequate samples of k-space. Non-Cartesian trajectories include radial or spiral curves, which have the effect of distributing the undersampling among all dimensions of k-space. Compared to Cartesian sampling, non-Cartesian sampling provides more efficient coverage of k-space and yields an “incoherent” forward operator that is more conducive to compressed-sensing reconstruction [9]. But Cartesian sampling remains the method of choice in clinical practice, due to its higher tolerance to system imperfections and an extensive record of success.
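To make the Cartesian undersampling concrete, here is a small sketch (in Python/NumPy, with an illustrative density profile and parameter values that are our own assumptions, not the article's) of a variable-density pseudo-random Cartesian mask: the frequency-encoding direction is fully sampled, while phase-encoding lines are drawn preferentially near the k-space center.

```python
import numpy as np

rng = np.random.default_rng(0)

def vd_cartesian_mask(n_pe, n_fe, accel, rng):
    """Variable-density pseudo-random Cartesian mask: the frequency-encoding
    (FE) direction is fully sampled; phase-encoding (PE) lines are drawn at
    random with probability decaying away from the k-space center."""
    n_lines = max(1, n_pe // accel)                      # PE lines to keep
    k = np.arange(n_pe) - n_pe // 2                      # PE index, centered
    prob = 1.0 / (1.0 + (np.abs(k) / (n_pe / 8)) ** 2)   # center-weighted density
    prob /= prob.sum()
    picked = rng.choice(n_pe, size=n_lines, replace=False, p=prob)
    mask = np.zeros((n_pe, n_fe), dtype=bool)
    mask[picked, :] = True                               # full readouts along FE
    return mask

mask = vd_cartesian_mask(n_pe=192, n_fe=160, accel=8, rng=rng)
R = mask.size / mask.sum()      # realized acceleration rate
print(mask.shape, R)
```

Each selected row is a full readout, mirroring the fact that the frequency-encoding dimension is fully sampled for every collected line.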

Since the sensitivity map $S_c$ is patient-specific and varies with the location of the coil with respect to the imaging plane, both $S_c$ and $x_t$ are unknown in practice. Although calibration-free methods have been proposed to estimate the image without explicit knowledge of $S_c$ (e.g., [10]) or to jointly estimate $S_c$ and $x_t$ (e.g., [11]), it is more common to first estimate $S_c$ through a calibration procedure and then treat $S_c$ as known in (1). Stacking $\{x_t\}$, $\{y_t^{(c)}\}$, and $\{w_t^{(c)}\}$ into vectors $x$, $y$, and $w$, and packing $\{P_t F S_c\}$ into a known block-diagonal matrix $A$, we obtain the linear inverse problem of recovering $x$ from

$y = A x + w$.   (2)

III Signal Recovery and Denoising

The maximum likelihood (ML) estimate of $x$ from $y$ in (2) is $\hat{x}_{\rm ML} := \arg\max_x p(y|x)$, where $p(y|x)$, the probability density of $y$ conditioned on $x$, is known as the "likelihood function." The ML estimate is often written in the equivalent form $\hat{x}_{\rm ML} = \arg\min_x \{-\ln p(y|x)\}$. In the case of $\sigma^2$-variance AWGN $w$, we have $-\ln p(y|x) = \frac{1}{2\sigma^2}\|y - Ax\|_2^2 + {\rm const}$, and so $\hat{x}_{\rm ML} = \arg\min_x \|y - Ax\|_2^2$, which can be recognized as least-squares estimation. Although least-squares estimation can give reasonable performance when $A$ is tall and well conditioned, this is rarely the case under moderate to high acceleration (i.e., $R > 1$). With acceleration, it is critically important to exploit prior knowledge of signal structure.
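The following toy example (our own illustration, using an unstructured Gaussian matrix rather than the MRI operator) shows why least squares alone fails under undersampling: for a tall, well-conditioned matrix it recovers the signal accurately, while for a wide matrix the minimum-norm least-squares solution is far from the truth.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
x = rng.standard_normal(n)

# Tall, well-conditioned A: least squares recovers x accurately.
A_tall = rng.standard_normal((2 * n, n))
y_tall = A_tall @ x + 0.01 * rng.standard_normal(2 * n)
x_ls = np.linalg.lstsq(A_tall, y_tall, rcond=None)[0]
err_tall = np.linalg.norm(x_ls - x) / np.linalg.norm(x)

# Wide A ("accelerated": 4x fewer rows than columns): the problem is
# underdetermined; lstsq returns the minimum-norm solution, far from x.
A_wide = rng.standard_normal((n // 4, n))
y_wide = A_wide @ x
x_mn = np.linalg.lstsq(A_wide, y_wide, rcond=None)[0]
err_wide = np.linalg.norm(x_mn - x) / np.linalg.norm(x)

print(err_tall, err_wide)   # small vs. large relative error
```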

The traditional approach to exploiting such prior knowledge is to formulate and solve an optimization problem of the form

$\hat{x} = \arg\min_x \{ \frac{1}{2}\|y - Ax\|_2^2 + \phi(x) \}$,   (3)

where the regularization term $\phi(x)$ encodes prior knowledge of $x$. In fact, $\hat{x}$ in (3) can be recognized as the maximum a posteriori (MAP) estimate of $x$ under the prior density model $p(x) \propto \exp(-\phi(x)/\sigma^2)$. To see why, recall that the MAP estimate maximizes the posterior distribution $p(x|y)$. That is, $\hat{x}_{\rm MAP} = \arg\max_x p(x|y) = \arg\max_x \ln p(x|y)$. Since Bayes' rule implies that $\ln p(x|y) = \ln p(y|x) + \ln p(x) - \ln p(y)$, we have

$\hat{x}_{\rm MAP} = \arg\min_x \{ -\ln p(y|x) - \ln p(x) \}$.   (4)

Recalling that the first term in (3) was shown to be $-\sigma^2 \ln p(y|x)$ (plus a constant) under AWGN noise, the second term in (3) must obey $\phi(x) = -\sigma^2 \ln p(x)$ (plus a constant). We will find this MAP interpretation useful in the sequel.

It is not easy to design good regularizers $\phi$ for use in (3). They must not only mimic the negative log signal-prior, but also enable tractable optimization. One common approach is to use convex regularizations of the form $\phi(x) = \sum_{d=1}^{D} \lambda_d \|\Psi_d x\|_1$, where $\Psi_d$ are transforms, $\lambda_d$ are non-negative regularization weights, and the $\ell_1$ norm rewards sparsity in the transform outputs (see, e.g., [12]). Due to the richness of data structure in MRI, especially for dynamic applications, utilizing multiple ($D > 1$) linear sparsifying transforms has been shown to improve recovery quality [12], but tuning multiple regularization weights $\{\lambda_d\}$ is a non-trivial problem [13].

Particular insight comes from considering the special case of $A = I$, where $y$ in (2) reduces to an AWGN-corrupted version of the image $x$, i.e.,

$y = x + w$.   (5)

The problem of recovering $x$ from noisy $y$, known as "denoising," has been intensely researched for decades. While it is possible to perform denoising by solving a regularized optimization problem of the form (3) with $A = I$, today's state-of-the-art approaches are either algorithmic (e.g., [14]) or DNN-based (e.g., [7, 8]). This begs an important question: can these state-of-the-art denoisers be leveraged for MRI signal reconstruction, by exploiting the connections between the denoising problem and (3)? As we shall see, this is precisely what the PnP methods do.

IV Plug-and-Play Methods

In this section, we review several approaches to PnP signal reconstruction. What these approaches have in common is that they recover $x$ from measurements $y$ of the form (2) by iteratively calling a sophisticated denoiser within a larger optimization or inference algorithm.

IV-A Prox-based PnP

To start, let us imagine how the optimization in (3) might be solved. Through what is known as "variable splitting," we could introduce a new variable, $v$, to decouple the regularizer $\phi(v)$ from the data fidelity term $\ell(x) := \frac{1}{2}\|y - Ax\|_2^2$. The variables $x$ and $v$ could then be equated using an external constraint, leading to the constrained minimization problem

$\hat{x} = \arg\min_{x} \min_{v} \{ \ell(x) + \phi(v) \}$ subject to $x = v$.   (6)

Equation (6) suggests an algorithmic solution that alternates between separately estimating $x$ and estimating $v$, with an additional mechanism to asymptotically enforce the constraint $x = v$.

The original PnP method [6] is based on the alternating direction method of multipliers (ADMM) [15]. For ADMM, (6) is first reformulated as the "augmented Lagrangian"

$L(x, v, \lambda) = \ell(x) + \phi(v) + \lambda^{\mathsf{H}}(x - v) + \frac{1}{2\eta}\|x - v\|_2^2$,   (7)

where $\lambda$ are Lagrange multipliers and $\eta > 0$ is a design parameter that affects the convergence speed of the algorithm, but not the final solution. With $u := \eta\lambda$, (7) can be rewritten as

$L(x, v, u) = \ell(x) + \phi(v) + \frac{1}{2\eta}\|x - v + u\|_2^2 - \frac{1}{2\eta}\|u\|_2^2$.   (8)

ADMM solves (8) by alternating the minimization over $x$ and $v$ with gradient ascent of $u$, i.e.,

$x_k = h(v_{k-1} - u_{k-1}; \eta)$   (9a)
$v_k = \mathrm{prox}_{\eta\phi}(x_k + u_{k-1})$   (9b)
$u_k = u_{k-1} + (x_k - v_k)$,   (9c)

where $h(\cdot\,; \eta)$ and $\mathrm{prox}_{\eta\phi}(\cdot)$, known as "proximal maps," are defined as

$h(z; \eta) := \arg\min_x \{ \ell(x) + \frac{1}{2\eta}\|x - z\|_2^2 \}$   (10)
$\phantom{h(z; \eta)} = (A^{\mathsf{H}}A + \eta^{-1} I)^{-1} (A^{\mathsf{H}} y + \eta^{-1} z)$   (11)
$\mathrm{prox}_{\eta\phi}(z) := \arg\min_v \{ \phi(v) + \frac{1}{2\eta}\|v - z\|_2^2 \}$.   (12)
Under some weak technical constraints, it can be proven [15] that when $\phi$ is convex, the ADMM iteration (9) converges to $\hat{x}$, the global minimizer of (3) and (6).

From the discussion in Section III, we immediately recognize $\mathrm{prox}_{\eta\phi}$ in (12) as the MAP denoiser of $z$ under AWGN variance $\eta$ and signal prior $p(x) \propto \exp(-\phi(x))$. The key idea behind the original PnP work [6] was, in the ADMM recursion (9), to "plug in" a powerful image denoising algorithm like BM3D [14] in place of the proximal denoiser $\mathrm{prox}_{\eta\phi}$ from (12). If the plug-in denoiser is denoted by "$f$," then the PnP ADMM algorithm becomes

$x_k = h(v_{k-1} - u_{k-1}; \eta)$   (13a)
$v_k = f(x_k + u_{k-1})$   (13b)
$u_k = u_{k-1} + (x_k - v_k)$.   (13c)

A wide variety of empirical results (see, e.g., [6, 16, 17, 18]) have demonstrated that, when $f$ is a powerful denoising algorithm like BM3D, the PnP algorithm (13) produces far better recoveries than the regularization-based approach (9).
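A minimal sketch of the PnP ADMM iteration may help fix ideas. Here a simple soft-threshold stands in for the powerful denoiser, and the forward model is a small Gaussian compressed-sensing instance; both choices, and all parameter values, are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 64, 32
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = 1.0      # sparse "image" stand-in
y = A @ x_true + 0.01 * rng.standard_normal(m)

def h(z, eta):
    # proximal map of the data-fidelity term:
    # argmin_x 0.5*||y - A x||^2 + (1/(2*eta))*||x - z||^2
    return np.linalg.solve(A.T @ A + np.eye(n) / eta, A.T @ y + z / eta)

def f(z, tau=0.05):
    # plug-in denoiser; a soft threshold stands in for BM3D/DnCNN here
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

eta = 1.0
x = np.zeros(n); v = np.zeros(n); u = np.zeros(n)
for _ in range(100):
    x = h(v - u, eta)       # data-fidelity proximal step
    v = f(x + u)            # plug-in denoising step
    u = u + (x - v)         # dual (Lagrange-multiplier) ascent
err = np.linalg.norm(v - x_true) / np.linalg.norm(x_true)
print(err)
```

With this particular denoiser the iteration reduces to ADMM for an l1-regularized least-squares problem, which is why it converges cleanly; a learned denoiser would simply replace `f`.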

The success of PnP methods raises important theoretical questions. Since $f$ is not in general the proximal map of any regularizer $\phi$, the iterations (13) may not minimize a cost function of the form in (3), and (13) may not be an implementation of ADMM. It is then unclear if the iterations (13) will converge. And if they do converge, it is unclear what they converge to. The consensus equilibrium framework, which we discuss in Section V, provides answers to these questions.

The use of a generic denoiser in place of a proximal denoiser can be translated to non-ADMM algorithms, such as FISTA [19], primal-dual algorithms [20], and others, as in [21, 22, 23]. Instead of optimizing as in (13), PnP FISTA [21] uses the iterative update

$z_k = s_{k-1} - \eta A^{\mathsf{H}}(A s_{k-1} - y)$   (14a)
$x_k = f(z_k)$   (14b)
$s_k = x_k + \frac{q_{k-1} - 1}{q_k}(x_k - x_{k-1})$,   (14c)

where (14a) is a gradient descent (GD) step on the negative log-likelihood $\ell(x)$ at $x = s_{k-1}$ with step-size $\eta$, (14b) is the plug-in replacement of the usual proximal denoising step in FISTA, and (14c) is an acceleration step, where it is typical to use $q_k = \frac{1}{2}(1 + \sqrt{1 + 4 q_{k-1}^2})$ and $q_0 = 1$.

Comparing PnP FISTA (14) to PnP ADMM (13), one can see that they differ in how the data fidelity term is handled: PnP ADMM uses the proximal update (11), while PnP FISTA uses the GD step (14a). If the proximal update is solved exactly, then PnP ADMM usually requires many fewer iterations than PnP FISTA. But exactly solving the proximal update (11) is much more computationally costly than taking a GD step (14a). Another option is to approximate the proximal update (11) using, e.g., several iterations of GD, which will simplify (11) but increase the number of ADMM iterations. In the end, the most efficient approach will depend on the relative complexity of the denoiser and the (exact or approximated) proximal update (11).
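For comparison, the PnP FISTA recursion can be sketched in the same toy setting (again with a soft-threshold stand-in for the denoiser; the problem instance and all parameters are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 64, 32
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = 1.0
y = A @ x_true

def f(z, tau=0.05):
    # plug-in denoiser stand-in (soft threshold)
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

eta = 1.0 / np.linalg.norm(A, 2) ** 2        # GD step-size 1/||A||^2
x_prev = np.zeros(n); s = np.zeros(n); q_prev = 1.0
for _ in range(200):
    z = s - eta * (A.T @ (A @ s - y))                 # gradient step on the data fit
    x = f(z)                                          # plug-in denoising step
    q = 0.5 * (1 + np.sqrt(1 + 4 * q_prev ** 2))      # FISTA acceleration
    s = x + ((q_prev - 1) / q) * (x - x_prev)
    x_prev, q_prev = x, q
err = np.linalg.norm(x_prev - x_true) / np.linalg.norm(x_true)
print(err)
```

Note that each iteration costs only one forward/adjoint application plus one denoise, with no matrix inverse; this is the trade-off against ADMM discussed above.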

IV-B The balanced FISTA approach

Consider a denoiser $f$ that executes the following three steps: i) transform $z$ to the wavelet domain,¹ ii) soft-threshold the wavelet coefficients, and iii) transform back to the original domain, i.e.,

$f(z) = \Psi^{\mathsf{H}}\,\mathrm{SoftThresh}_{\tau}(\Psi z)$, with $[\mathrm{SoftThresh}_{\tau}(\alpha)]_n := \mathrm{sgn}(\alpha_n)\max(|\alpha_n| - \tau, 0)$,   (15)

where $\Psi$ is the wavelet transform and $\tau$ denotes the soft-threshold level. (¹Although we focus on the non-decimated wavelet transform $\Psi$, the results of this section hold for any tight frame $\Psi$.) Such "wavelet thresholding" denoisers work particularly well when the wavelet transform is non-decimated [24], in which case $\Psi$ is a tall matrix. In the sequel, we assume that $\Psi^{\mathsf{H}}\Psi = I$. One advantage of $f$ in (15) is its computational simplicity: it uses little more than two fast (wavelet) transforms. The computational advantages of $f$ are especially evident in the case of 3D signals, like those that occur in cardiac MRI, in comparison to algorithmic approaches like BM4D [25]. Note that, in the 3D case, different threshold levels should be used in the spatial and temporal dimensions for best performance.
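A minimal 1D sketch of such a denoiser uses a single-level non-decimated (circular) Haar transform: with the filter scaling below, the analysis operator is a tight frame, so synthesis after analysis is exact. The 1D setting and all parameters are our own simplifications of the 3D transform used later in the article.

```python
import numpy as np

def haar_nd(x):
    # Single-level non-decimated (circular) Haar analysis Psi: R^n -> R^{2n}.
    # With filters [1/2, 1/2] and [1/2, -1/2], Psi is a tight frame: Psi^T Psi = I.
    lo = 0.5 * (x + np.roll(x, -1))
    hi = 0.5 * (x - np.roll(x, -1))
    return np.stack([lo, hi])

def haar_nd_adj(c):
    # Adjoint Psi^T: R^{2n} -> R^n.
    lo, hi = c
    return 0.5 * (lo + np.roll(lo, 1)) + 0.5 * (hi - np.roll(hi, 1))

def f_wavelet(z, tau):
    # Wavelet-thresholding denoiser in the style of (15):
    # analyze, soft-threshold, synthesize.
    c = haar_nd(z)
    c = np.sign(c) * np.maximum(np.abs(c) - tau, 0.0)
    return haar_nd_adj(c)

x = np.repeat([0.0, 1.0], 16)        # piecewise-constant test signal
z = x + 0.1 * np.random.default_rng(4).standard_normal(x.size)
xhat = f_wavelet(z, tau=0.1)
print(np.linalg.norm(z - x), np.linalg.norm(xhat - x))
```

The cost is just two fast transforms plus an elementwise threshold, which is the computational advantage claimed above.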

Note that, for non-orthogonal $\Psi$, the denoiser $f$ in (15) is not the proximal map of any regularizer $\phi$, and so it does not solve a convex optimization problem of the form (3). That said, $f$ could be plugged into any of the prox-based PnP algorithms discussed previously. Interestingly, if $f$ is used in PnP FISTA, then it does solve a convex optimization problem, although one with a different form than (3). This approach was first proposed in [26, 27] under the name "balanced FISTA" (bFISTA) and applied to parallel cardiac MRI. Notably, bFISTA was proposed before the advent of PnP FISTA. More details are provided below.

In Section III, when discussing the optimization problem (3), the regularizer $\phi(x) = \lambda\|\Psi x\|_1$ was mentioned as a popular option. The resulting optimization problem,

$\hat{x} = \arg\min_x \{ \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|\Psi x\|_1 \}$,   (16)

is said to be stated in "analysis" form [28, 29]. Note that the proximal denoiser associated with (16) has the form $\arg\min_v \{ \lambda\|\Psi v\|_1 + \frac{1}{2\eta}\|v - z\|_2^2 \}$, which is computationally intensive to solve for non-orthogonal $\Psi$. The problem (16) can also be stated in "synthesis" form as

$\hat{x} = \Psi^{\mathsf{H}}\hat{\alpha}$, with $\hat{\alpha} = \arg\min_{\alpha \in \mathrm{range}(\Psi)} \{ \frac{1}{2}\|y - A\Psi^{\mathsf{H}}\alpha\|_2^2 + \lambda\|\alpha\|_1 \}$,   (17)

where $\alpha$ are wavelet coefficients. Equivalently, (17) can be expressed as the unconstrained problem

$\hat{\alpha} = \arg\min_{\alpha} \{ \frac{1}{2}\|y - A\Psi^{\mathsf{H}}\alpha\|_2^2 + \lambda\|\alpha\|_1 + \frac{\kappa}{2}\|P_{\Psi}^{\perp}\alpha\|_2^2 \}$   (18)

for the projection matrix $P_{\Psi}^{\perp} := I - \Psi\Psi^{\mathsf{H}}$ as $\kappa \to \infty$.

In practice, it is not possible to take $\kappa \to \infty$ and, for finite values of $\kappa$, the problems (17) and (18) are not equivalent. However, problem (18) under finite $\kappa$ is interesting to consider in its own right, and it is sometimes referred to as the "balanced" approach [30]. If we solve (18) using FISTA with step-size $\eta$ (recall (14a)) and choose the particular value $\kappa = 1/\eta$ then, remarkably, the resulting algorithm takes the form of PnP FISTA (14) with $f$ from (15). Note that this particular choice of $\kappa$ is motivated by computational efficiency (since it leads to the use of $f$ in (15)) rather than recovery performance. Still, as we demonstrate in Section VI, it performs very well.

IV-C Regularization by denoising

Another PnP approach, proposed by Romano, Elad, and Milanfar in [31], recovers $x$ from the measurements $y$ in (2) by finding the $\hat{x}$ that solves the optimality condition²

$0 = \nabla\ell(\hat{x}) + \lambda(\hat{x} - f(\hat{x}))$,   (19)

where $f$ is an arbitrary (i.e., "plug in") image denoiser and $\lambda > 0$ is a tuning parameter. (²We begin our discussion of RED by focusing on the real-valued case, but later we state the RED algorithms for the complex-valued case of interest in MRI.) In [31], several algorithms were proposed to solve (19). Numerical experiments in [31] suggest that, when $f$ is a sophisticated denoiser (like BM3D) and $\lambda$ is well tuned, the solutions $\hat{x}$ to (19) are state-of-the-art, similar to those of PnP ADMM.

The approach (19) was coined "regularization by denoising" (RED) in [31] because, under certain conditions, the $\hat{x}$ that solve (19) are the solutions to the regularized least-squares problem

$\hat{x} = \arg\min_x \{ \ell(x) + \frac{\lambda}{2} x^{\mathsf{T}}(x - f(x)) \}$,   (20)

where the regularizer $\phi_{\rm RED}(x) := \frac{\lambda}{2} x^{\mathsf{T}}(x - f(x))$ is explicitly constructed from the plug-in denoiser $f$. But what are these conditions? Assuming that $f$ is differentiable almost everywhere, it was shown in [32] that the solutions of (19) correspond to those of (20) when i) $f$ is locally homogeneous³ and ii) $f$ has a symmetric Jacobian matrix (i.e., $[Jf(x)]^{\mathsf{T}} = Jf(x)$). (³Locally homogeneous means that $(1+\epsilon)f(x) = f((1+\epsilon)x)$ for all $x$ and sufficiently small nonzero $\epsilon$.) But it was demonstrated in [32] that these properties are not satisfied by popular image denoisers, such as the median filter, wavelet-domain thresholding, NLM, BM3D, TNRD, and DnCNN. Furthermore, it was proven in [32] that if the Jacobian of $f$ is non-symmetric, then there does not exist any regularizer $\phi$ under which the solutions of (19) minimize a regularized loss of the form in (3).

One may then wonder how to justify (19). In [32], Reehorst and Schniter proposed an explanation for (19) based on "score matching" [33], which we now summarize. Suppose we are given a large corpus of training images $\{x_t\}_{t=1}^{T}$, from which we could build the empirical prior model

$\hat{p}(x) := \frac{1}{T}\sum_{t=1}^{T} \delta(x - x_t)$,

where $\delta$ denotes the Dirac delta. Since images are known to exist outside the training set $\{x_t\}$, it is possible to build an improved prior model $\tilde{p}$ using kernel density estimation (KDE), i.e.,

$\tilde{p}(x) := \frac{1}{T}\sum_{t=1}^{T} \mathcal{N}(x; x_t, \nu I)$,   (21)

where $\nu > 0$ is a tuning parameter. If we adopt $\tilde{p}$ as the prior model for $x$, then the MAP estimate of $x$ (recall (4)) becomes

$\hat{x}_{\rm MAP} = \arg\min_x \{ \ell(x) - \ln \tilde{p}(x) \}$.   (22)

Because $\tilde{p}$ is differentiable, the solutions to (22) must obey

$0 = \nabla\ell(\hat{x}_{\rm MAP}) - \nabla \ln \tilde{p}(\hat{x}_{\rm MAP})$.   (23)
A classical result known as "Tweedie's formula" [34, 35] says that

$\nabla \ln \tilde{p}(z) = \frac{1}{\nu}\big( f_{\rm MMSE}(z) - z \big)$,   (24)

where $f_{\rm MMSE}$ is the minimum mean-squared error (MMSE) denoiser under the prior $x \sim \hat{p}$ and $\nu$-variance AWGN. That is, $f_{\rm MMSE}(z) = \mathrm{E}[x \,|\, z]$, where $z = x + \mathcal{N}(0, \nu I)$ and $x \sim \hat{p}$. Applying (24) to (23), the MAP estimate under the KDE prior $\tilde{p}$ obeys

$0 = \nabla\ell(\hat{x}_{\rm MAP}) + \frac{1}{\nu}\big( \hat{x}_{\rm MAP} - f_{\rm MMSE}(\hat{x}_{\rm MAP}) \big)$,   (25)

which matches the RED condition (19) when $f = f_{\rm MMSE}$ and $\lambda = 1/\nu$. Thus, if we could implement the MMSE denoiser $f_{\rm MMSE}$ for a given training corpus $\{x_t\}$, then RED provides a way to compute the MAP estimate of $x$ under the KDE prior $\tilde{p}$.
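Tweedie's formula is easy to verify numerically in a scalar toy problem, where the MMSE denoiser under a discrete empirical prior has a closed form (the toy "training set" and all constants below are invented for illustration):

```python
import numpy as np

xt = np.array([-2.0, 0.5, 3.0])   # toy scalar "training images"
nu = 0.7                          # KDE / AWGN variance

def p_tilde(z):
    # KDE prior: mixture of Gaussians centered on the training points
    return np.mean(np.exp(-(z - xt) ** 2 / (2 * nu)) / np.sqrt(2 * np.pi * nu))

def f_mmse(z):
    # MMSE denoiser E[x|z] under the empirical prior and nu-variance AWGN
    w = np.exp(-(z - xt) ** 2 / (2 * nu))
    return np.sum(w * xt) / np.sum(w)

z = 1.3
eps = 1e-5
score_fd = (np.log(p_tilde(z + eps)) - np.log(p_tilde(z - eps))) / (2 * eps)
score_tweedie = (f_mmse(z) - z) / nu      # Tweedie: score = (f_mmse(z) - z)/nu
print(score_fd, score_tweedie)
```

The finite-difference score of the KDE prior and the Tweedie expression agree to numerical precision.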

Although the MMSE denoiser $f_{\rm MMSE}$ can be expressed in closed form (see [32, (67)]), it is not practical to implement for large $T$. Thus the question remains: Can the RED approach (19) also be justified for non-MMSE denoisers $f$, especially those that are not locally homogeneous or Jacobian-symmetric? As shown in [32], the answer is yes. Consider a practical denoiser $f_{\theta}$ parameterized by tunable weights $\theta$ (e.g., a DNN). A typical strategy is to choose $\theta$ to minimize the mean-squared error on $\{x_t\}$, i.e., set $\hat{\theta} = \arg\min_{\theta} \mathrm{E}\|x - f_{\theta}(z)\|_2^2$, where the expectation is taken over $x \sim \hat{p}$ and $z = x + \mathcal{N}(0, \nu I)$. By the MMSE orthogonality principle, we have

$\mathrm{E}\|x - f_{\theta}(z)\|_2^2 = \mathrm{E}\|x - f_{\rm MMSE}(z)\|_2^2 + \mathrm{E}\|f_{\rm MMSE}(z) - f_{\theta}(z)\|_2^2$,   (26)

and so we can write

$\hat{\theta} = \arg\min_{\theta} \mathrm{E}\|f_{\rm MMSE}(z) - f_{\theta}(z)\|_2^2$   (27)
$\phantom{\hat{\theta}} = \arg\min_{\theta} \mathrm{E}\big\| \nabla\ln\tilde{p}(z) - \tfrac{1}{\nu}(f_{\theta}(z) - z) \big\|_2^2$,   (28)

where (28) follows from (24). Equation (28) says that choosing $\theta$ to minimize the MSE is equivalent to choosing $\theta$ so that $\frac{1}{\nu}(f_{\theta}(z) - z)$ best matches the "score" $\nabla\ln\tilde{p}(z)$. This is an instance of the "score matching" framework, as described in [33]. In summary, the RED approach (19) approximates the KDE-MAP approach (23) by using a plug-in denoiser $f$ to approximate the MMSE denoiser $f_{\rm MMSE}$. When $f = f_{\rm MMSE}$, RED exactly implements MAP-KDE, but with a practical $f$, RED implements a score-matching approximation of MAP-KDE. Thus, a more appropriate title for RED might be "score matching by denoising."

Comparing the RED approach from this section to the prox-based PnP approach from Section IV-A, we see that RED starts with the KDE-based MAP estimation problem (22) and replaces the $\hat{p}$-based MMSE denoiser $f_{\rm MMSE}$ with a plug-in denoiser $f$, while PnP ADMM starts with the $\phi$-based MAP estimation problem (3) and replaces the $\phi$-based MAP denoiser $\mathrm{prox}_{\eta\phi}$ from (12) with a plug-in denoiser $f$. It has been recently demonstrated that, in prox-based PnP, the use of MAP denoisers leads to higher performance than MMSE denoisers [36, 37, 38, 39], which is not surprising in light of the discussion above. Thus it is interesting that RED offers an approach to MAP-based recovery using MMSE denoising, which is much easier to implement than MAP denoising [36, 37, 38, 39].

Further insight into the difference between RED and prox-based PnP can be obtained by considering the case of symmetric linear denoisers, i.e., $f(z) = Wz$ with $W = W^{\mathsf{T}}$, where we will also assume that $W$ is invertible. Although such denoisers are far from state-of-the-art, they are useful for interpretation. It is easy to show [40] that $f(z) = Wz$ is the proximal map of $\phi(x) = \frac{1}{2\eta} x^{\mathsf{T}}(W^{-1} - I)x$, i.e., that $Wz = \mathrm{prox}_{\eta\phi}(z)$, recalling (12). With this proximal denoiser, we know that the prox-based PnP algorithm solves the optimization problem

$\hat{x}_{\rm PnP} = \arg\min_x \{ \ell(x) + \tfrac{1}{2\eta} x^{\mathsf{T}}(W^{-1} - I)x \}$.   (29)

Meanwhile, since $f(z) = Wz$ is both locally homogeneous and Jacobian-symmetric, we know from (20) that RED under this $f$ solves the optimization problem

$\hat{x}_{\rm RED} = \arg\min_x \{ \ell(x) + \tfrac{\lambda}{2} x^{\mathsf{T}}(I - W)x \}$.   (30)

By comparing (29) and (30), we see a clear difference between RED and prox-based PnP. Section V-B compares RED to prox-based PnP from yet another perspective: consensus equilibrium.

So far, we have described RED as solving for $\hat{x}$ in (19). But how exactly is this accomplished? In the original RED paper [31], three algorithms were proposed to solve (19): GD, inexact ADMM, and a "fixed point" heuristic that was later recognized [32] as a special case of the proximal gradient (PG) algorithm [41, 42]. Generalizations of PG RED were proposed in [32]. The fastest among them is the accelerated-PG RED algorithm, which uses the iterative update⁴

$x_k = h\big( \tfrac{1}{L} f(v_{k-1}) + (1 - \tfrac{1}{L}) v_{k-1};\; \tfrac{1}{\lambda L} \big)$   (31a)
$v_k = x_k + \frac{q_{k-1} - 1}{q_k}(x_k - x_{k-1})$,   (31b)

(⁴The RED equations (31) and (32) may be used with complex-valued quantities.) where $h$ was defined in (11), line (31b) uses the same acceleration as PnP FISTA (14c), and $L > 0$ is a design parameter. When $L = 1$ and $q_k = 1\;\forall k$, (31) reduces to the "fixed point" heuristic from [31]. If $h$ was too complex to implement, one could replace (31a) with the GD update

$x_k = v_{k-1} - \tfrac{1}{\lambda L}\big[ \nabla\ell(v_{k-1}) + \lambda\big(v_{k-1} - f(v_{k-1})\big) \big]$   (32)

to achieve a similar complexity reduction as when going from PnP ADMM to PnP FISTA (as discussed in Section IV-A). The result would be an "accelerated GD" [43] form of RED. Convergence of the RED algorithms will be discussed in Section V-B.
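A sketch of the GD form of RED, without acceleration and with a soft-threshold stand-in for the denoiser (both simplifications, and the toy problem instance, are our own assumptions): at convergence the iterate approximately satisfies the RED optimality condition (19).

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 64, 32
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = 1.0
y = A @ x_true

def f(z, tau=0.05):
    # plug-in denoiser stand-in (soft threshold)
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

lam = 0.5

def red_field(x):
    # RED optimality residual: gradient of the data fit plus lam*(x - f(x))
    return A.T @ (A @ x - y) + lam * (x - f(x))

step = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam)   # conservative step-size
x = np.zeros(n)
resid0 = np.linalg.norm(red_field(x))
for _ in range(300):
    x = x - step * red_field(x)
resid = np.linalg.norm(red_field(x))
print(resid0, resid)
```

The residual of the optimality condition shrinks substantially over the iterations, which is the practical signature of RED "solving" (19).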

IV-D AMP

The approximate message passing (AMP) algorithm [44] estimates the signal $x$ using the iteration⁵

$v_k = y - A x_k + \frac{N}{M}\, v_{k-1}\, \frac{1}{N}\mathrm{tr}\{ J f_{k-1}(r_{k-1}) \}$   (33a)
$x_{k+1} = f_k(r_k)$, with $r_k := x_k + A^{\mathsf{H}} v_k$,   (33b)

(⁵Equation (33) states the AMP algorithm for real-valued quantities. To implement it for complex-valued quantities, one would replace the Jacobian trace with its complex-valued analog [45].) starting with an initial $x_0$ (e.g., $x_0 = 0$) and some $v_{-1}$. Note that $N$ and $M$ in (33) refer to the dimensions of $A \in \mathbb{R}^{M \times N}$, and that the denoiser $f_k$ may change with iteration $k$. In practice, the trace of the Jacobian in (33a) can be approximated using the Monte Carlo estimate [46]

$\mathrm{tr}\{ J f(r) \} \approx \tfrac{1}{\epsilon}\, q^{\mathsf{T}}\big[ f(r + \epsilon q) - f(r) \big]$   (34)

using small $\epsilon > 0$ and random $q$ with i.i.d. $\mathcal{N}(0,1)$ entries. The original AMP paper [44] considered only proximal denoising (i.e., $f = \mathrm{prox}_{\eta\phi}$ for some $\phi$), but the PnP version of AMP shown in (33) was proposed in [47, 48] under the name "denoising AMP."
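The Monte Carlo trace estimate can be checked on a toy linear "denoiser" whose Jacobian trace is known exactly (the smoothing filter and all constants below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200

def f(z):
    # toy linear denoiser: circular smoothing, f(z) = W z
    # with diag(W) = 0.5, so tr{W} = 0.5 * n exactly
    return 0.5 * z + 0.25 * (np.roll(z, 1) + np.roll(z, -1))

z = rng.standard_normal(n)
eps = 1e-4
K = 300
est = 0.0
for _ in range(K):
    q = rng.standard_normal(n)                      # random probe vector
    est += (q @ (f(z + eps * q) - f(z))) / eps      # one probe of the trace
est /= K
print(est, 0.5 * n)
```

Averaging over many probes reduces the estimator's variance; in practice a single probe per AMP iteration is often used.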

The rightmost term in (33a) is known as the "Onsager correction term," due to connections with statistical physics [49]. If we remove this term from (33a), the resulting recursion is no more than unaccelerated FISTA (also known as ISTA [50]), i.e., (14) with $q_k = 1\;\forall k$ and a particular step-size $\eta$. The Onsager term is what gives AMP the remarkable properties discussed below.

When $A$ is i.i.d. Gaussian and the denoiser $f$ is Lipschitz, the PnP AMP iteration (33) has remarkable properties in the large-system limit (i.e., $M, N \to \infty$ with $M/N$ fixed), as proven in [51]. First,

$r_k = x_k + A^{\mathsf{H}} v_k \sim x + \mathcal{N}(0, \sigma_k^2 I)$.   (35)

That is, the denoiser input $r_k$ behaves like an AWGN-corrupted version of the true image $x$. Moreover, the mean-squared error of AMP's estimate $x_k$ is exactly predicted by the "state evolution"

$\mathcal{E}_k = \tfrac{1}{N}\,\mathrm{E}\big\| f_k\big(x + \mathcal{N}(0, \sigma_k^2 I)\big) - x \big\|_2^2$, with $\sigma_{k+1}^2 = \sigma_w^2 + \tfrac{N}{M}\mathcal{E}_k$.   (36)

Furthermore, if $f_k$ is the MMSE denoiser and the state evolution (36) has a unique fixed point, then $\mathcal{E}_k$ converges to the MMSE: the holy grail of signal recovery. Or, if $f_k$ is the proximal denoiser $\mathrm{prox}_{\eta\phi}$ and the state evolution (36) has a unique fixed point, then AMP's $x_k$ converges to the MAP solution, even if $\phi$ is non-convex.

The remarkable properties of AMP are only guaranteed to manifest with large i.i.d. Gaussian $A$. For this reason, a "vector AMP" (VAMP) algorithm was proposed [52] with a rigorous state evolution and a denoiser-input property like (35) that hold under the larger class of right-rotationally invariant (RRI)⁶ matrices $A$. (⁶A matrix $A$ is said to be RRI if its singular-value decomposition $A = U\Sigma V^{\mathsf{H}}$ has Haar $V$, i.e., $V$ distributed uniformly over the group of orthogonal matrices. Note that i.i.d. Gaussian $A$ is a special case of RRI where $U$ is also Haar and the singular values in $\Sigma$ have a particular distribution.) Later, a PnP version of VAMP was proposed [53] and rigorously analyzed under Lipschitz $f$ in [54]. Similar to how AMP can be understood as ISTA with Onsager correction, VAMP can be understood as the symmetric variant [55] of ADMM with Onsager correction [56]. Although PnP versions of AMP and VAMP have been shown to outperform PnP ADMM in computed-tomography applications [57, 58], more work is needed to handle the structured $A$ operators (recall Section II) that manifest in MRI.

V Understanding PnP through Consensus Equilibrium

The success of the PnP methods in Section IV raises important theoretical questions. For example, in the case of PnP ADMM, if the plug-in denoiser $f$ is not the proximal map of any regularizer $\phi$, then it is not clear what cost function is being minimized or whether the algorithm will even converge. Similarly, in the case of RED, if the plug-in denoiser $f$ is not the MMSE denoiser $f_{\rm MMSE}$, then RED no longer solves the MAP-KDE problem, and it is not clear what RED does solve, or whether a given RED algorithm will even converge. In this section, we show that many of these questions can be answered through the consensus equilibrium (CE) framework [59, 40, 60, 23, 32]. We start by discussing CE for the PnP approaches from Section IV-A and follow with a discussion of CE for the RED approaches from Section IV-C.

V-A Consensus equilibrium for prox-based PnP

Let us start by considering the PnP ADMM algorithm (13). Rather than viewing (13) as minimizing some cost function, we can view it as seeking a solution $(\hat{x}, \hat{u})$ to

$\hat{x} = h(\hat{x} - \hat{u}; \eta)$   (37a)
$\hat{x} = f(\hat{x} + \hat{u})$,   (37b)

which, by inspection, must hold when (13) is at a fixed point. Not surprisingly, it is straightforward to show that the PnP FISTA algorithm (14), at a fixed point, also seeks a solution to (37). A similar procedure can be applied to the primal-dual PnP algorithm from [22]. With (37), the goal of the prox-based PnP algorithms becomes well defined! The pair (37) reaches a "consensus" in that the denoiser $f$ and the data-fitting operator $h$ agree on the output $\hat{x}$. The "equilibrium" comes from the opposing signs on the noise-like term $\hat{u}$: the data-fitting operator $h$ removes it, while the denoiser $f$ adds it back.

By viewing the goal of prox-based PnP as solving the fixed-point problem (37), it becomes clear that other solvers beyond ADMM, FISTA, and primal-dual can be used. For example, it was shown in [59] that the PnP CE condition (37) can be achieved by finding a fixed point of the system

$\hat{z} = T(\hat{z})$, with $T := (2H - I)(2F - I)$,   (38)

where⁷

$H(z) := h(z; \eta)$ and $F(z) := f(z)$.   (39)

(⁷The paper [59] actually considers the consensus equilibrium among $N \geq 2$ agents, whereas here we consider the simple case of two agents.)

There exist many algorithms to solve (38). For example, one could use the Mann iteration [61]

$z_k = (1 - \gamma) z_{k-1} + \gamma\, T(z_{k-1})$ for $\gamma \in (0, 1)$,   (40)

when $T$ is nonexpansive. The paper [59] also shows that this fixed point is equivalent to the solution of a root-finding problem, in which case Newton's method or other root-finding methods could be applied.
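As a toy illustration of the Mann iteration (with an operator of our own choosing, not one arising from PnP): for a nonexpansive but non-contractive affine map, the plain iteration $z \leftarrow T(z)$ cycles forever, while the averaged (Mann) iteration converges to the fixed point.

```python
import numpy as np

# T(z) = Q z + b with Q a 90-degree rotation: nonexpansive (an isometry) but
# not a contraction, so plain fixed-point iteration cycles with period 4.
theta = np.pi / 2
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
b = np.array([1.0, 0.0])

def T(z):
    return Q @ z + b

z_star = np.linalg.solve(np.eye(2) - Q, b)   # the unique fixed point of T

gamma = 0.5
z = np.zeros(2)
for _ in range(200):
    z = (1 - gamma) * z + gamma * T(z)       # Mann (averaged) iteration
print(z, z_star)
```

The averaged operator $(1-\gamma)I + \gamma T$ has spectral radius below one here, which is what turns the non-converging orbit into a linearly convergent sequence.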

The CE viewpoint also provides a path to proving the convergence of the PnP ADMM algorithm. Sreehari et al. [16] used a classical result from convex analysis to show that a sufficient condition for convergence is that i) $f$ is non-expansive, i.e., $\|f(x) - f(y)\|_2 \leq \|x - y\|_2$ for any $x$ and $y$, and ii) $f$ is a sub-gradient of some convex function, i.e., there exists $\varphi$ such that $f(x) \in \partial\varphi(x)$. If these two conditions are met, then PnP ADMM (13) will converge to a global solution. Similar observations were made in other recent studies, e.g., [40, 62]. That said, Chan et al. [17] showed that many practical denoisers do not satisfy these conditions, and so they designed a variant of PnP ADMM in which $\eta$ is decreased at every iteration. Under appropriate conditions on $f$ and the rate of decrease, this latter method also guarantees convergence, although not exactly to a fixed point of (37), since $\eta$ is no longer fixed.

Similar techniques can be used to prove the convergence of other prox-based PnP algorithms. For example, under certain technical conditions, including non-expansiveness of $f$, it was established in [23] that PnP FISTA converges to the same fixed point as PnP ADMM.

V-B Consensus equilibrium for RED

Just as the prox-based PnP algorithms in Section IV-A can be viewed as seeking the consensus equilibrium of (37), it was shown in [32] that the proximal-gradient and ADMM-based RED algorithms seek the consensus equilibrium $(\hat{x}, \hat{u})$ of

$\hat{x} = h\big(\hat{x} - \hat{u}/L;\; \tfrac{1}{\lambda L}\big)$   (41a)
$\hat{x} = \big( (1 + \tfrac{1}{L}) I - \tfrac{1}{L} f \big)^{-1}(\hat{x} + \hat{u}/L)$,   (41b)

where $h$ was defined in (11) and $L$ is the algorithmic parameter that appears in (31).⁸ (⁸The parameter $L$ also manifests in ADMM RED, as discussed in [32].) Since (41) is stated in the same form as (37), a direct comparison between the CE conditions of RED and prox-based PnP can be made.

Perhaps a more intuitive way to compare the CE conditions of RED and prox-based PnP follows from rewriting (41b) as $\hat{x} = f(\hat{x}) + \hat{u}$, after which the RED CE condition becomes

$\hat{x} = h\big(\hat{x} - \hat{u}/L;\; \tfrac{1}{\lambda L}\big)$   (42a)
$\hat{x} = f(\hat{x}) + \hat{u}$,   (42b)

which involves no inverse operations. In the typical case of $L = 1$, we see that (42) matches (37) with $\eta = 1/\lambda$, except that the "noise" $\hat{u}$ is added after denoising in (42b) and before denoising in (37b).

Yet another way to compare the CE conditions of RED and prox-based PnP is to eliminate the û variable. Solving (42b) for û and plugging the result back into (42a) gives the fixed-point condition

x̂ = h((1/L)f(x̂) + (1 − 1/L)x̂; σ/√L).  (43)
Applying the same procedure to (37) yields the fixed-point condition

x̂ = h(2x̂ − f⁻¹(x̂); σ),  (44)

or, equivalently, the fixed-point condition stated earlier in (38).

The CE framework also facilitates the convergence analysis of RED algorithms. For example, using the Mann iteration [61], it was proven in [32] that when f is nonexpansive and L ≥ 1, the PG RED algorithm converges to a fixed point.
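To make the fixed-point property concrete, the following toy sketch (ours, with an artificial linear shrinkage standing in for the denoiser f) runs plain GD RED on a small least-squares problem and checks that the iterate approximately satisfies the standard RED fixed-point condition ∇ℓ(x̂) + λ(x̂ − f(x̂)) = 0 from [31], [32]:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 32, 24
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = rng.standard_normal(m)
lam = 1.0  # RED regularization weight

def f(x):
    # A trivially nonexpansive stand-in "denoiser": shrinkage toward zero.
    return 0.9 * x

def grad_ell(x):
    # Gradient of the data-fidelity term ell(x) = 0.5*||Ax - y||^2
    return A.T @ (A @ x - y)

x = np.zeros(n)
step = 0.2
for _ in range(5000):
    # GD RED update: gradient of the data term plus the RED term lam*(x - f(x))
    x = x - step * (grad_ell(x) + lam * (x - f(x)))

# At convergence, x satisfies the RED fixed-point condition (up to tolerance)
residual = np.linalg.norm(grad_ell(x) + lam * (x - f(x)))
```

With this linear denoiser the update is gradient descent on a strongly convex quadratic, so the residual shrinks geometrically to zero.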

VI Demonstration of PnP in MRI

In this final section, we demonstrate the application of PnP methods to parallel cardiac MRI. In particular, we demonstrate the recovery of a cardiac cine dataset collected on a 1.5 T scanner (Magnetom Avanto, Siemens Healthcare, Erlangen, Germany) with an 18-channel body array coil. A fully sampled dataset was collected using a breath-held, segmented acquisition and was retrospectively downsampled using variable-density pseudo-random Cartesian sampling [63]. Other imaging parameters included: 10 mm slice thickness, flip angle, field-of-view, matrix size, 36.5 ms temporal resolution, and a total of 25 frames. The 18 channels were compressed to 12 virtual channels (coils) for faster computation [64]. The sensitivity maps were estimated from the fully sampled time-averaged data using ESPIRiT [65]. The measurements were modeled as described in equation (1).
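The coil-compression step can be illustrated with a small SVD-based sketch (a generic PCA-style compression in the spirit of [64]; the dimensions and data below are synthetic, not the paper's):

```python
import numpy as np

# Toy coil compression: project 18 physical channels onto the top 12
# principal components of the multi-coil data.
rng = np.random.default_rng(3)
n_coils, n_samples, n_virtual = 18, 5000, 12

# Simulated multi-coil data with strongly correlated channels
mixing = rng.standard_normal((n_coils, 6))
data = mixing @ rng.standard_normal((6, n_samples)) \
       + 0.01 * rng.standard_normal((n_coils, n_samples))

U, s, _ = np.linalg.svd(data, full_matrices=False)
compress = U[:, :n_virtual].conj().T   # 12 x 18 compression matrix
virtual_data = compress @ data         # 12 virtual channels

# Fraction of signal energy retained by the virtual channels
retained = np.linalg.norm(virtual_data)**2 / np.linalg.norm(data)**2
```

Because the simulated channels are highly correlated, nearly all of the energy survives the compression, which is the property that makes the reduced channel count affordable in practice.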

Note that, in this demonstration, the space-time signal vector in (2) and the measurement vector in (2) are high dimensional. As such, it is computationally impractical to compute the matrix inverse in (11), or even to store that inverse in memory. The forward operator and its adjoint, however, have fast Fourier transform (FFT) implementations. For this reason, we focus on PnP algorithms that handle the data-fidelity term using GD, such as PnP FISTA (recall (14a)), bFISTA, and accelerated GD RED (recall (32)).
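The appeal of FFT-based operator implementations can be sketched in a toy single-coil Cartesian setting (our simplification; the actual experiment uses multi-coil dynamic data). Writing the forward operator as a sampling mask composed with an orthonormal FFT, the data-fidelity gradient Aᴴ(Ax − y) costs two FFTs per iteration and no matrix is ever formed:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 64
x_true = rng.standard_normal((N, N))
mask = rng.random((N, N)) < 0.4          # pseudo-random undersampling pattern

def A(x):
    # Forward operator: orthonormal 2D FFT followed by k-space sampling
    return mask * np.fft.fft2(x, norm="ortho")

def AH(k):
    # Adjoint operator: zero-fill unsampled locations, then inverse FFT
    return np.fft.ifft2(mask * k, norm="ortho")

y = A(x_true)

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# A few PnP ISTA iterations: gradient step on the data term, then denoise.
x = np.zeros((N, N))
for _ in range(50):
    x = x - (AH(A(x) - y)).real   # unit step is safe since ||A^H A|| <= 1 here
    x = soft(x, 0.01)             # plug-in denoiser (soft-thresholding)
```

The same pattern extends to the multi-coil case by summing coil-wise adjoints weighted by conjugate sensitivity maps.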

Because, in this example, the signal is a cine (i.e., a video) rather than a still image, relatively few sophisticated denoisers are available. Although algorithmic denoisers like BM4D [25] have been proposed, they run very slowly at these dimensions, especially when compared to implementations of the forward and adjoint operators. For this reason, we consider the wavelet-thresholding denoiser from (15), which requires only computing a wavelet transform and its inverse. For the wavelet transform, we used a single-level 3D non-decimated Haar transform. Note that, when PnP FISTA is used with this wavelet-thresholding denoiser, it coincides with the earlier bFISTA technique, as discussed in Section IV-B. We also consider a denoiser that performs adaptive Wiener filtering on overlapping spatiotemporal patches in each of the eight subbands of our non-decimated wavelet transform [66].
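A 1D sketch of such a wavelet-thresholding denoiser (our simplified stand-in for the single-level 3D non-decimated Haar transform used here) can be written as follows; only the detail subband is thresholded, and setting the threshold to zero recovers the input exactly:

```python
import numpy as np

def haar_nondec(x):
    """Single-level non-decimated Haar transform, circular boundary."""
    xs = np.roll(x, -1)
    lo = (x + xs) / 2.0   # approximation (lowpass) coefficients
    hi = (x - xs) / 2.0   # detail (highpass) coefficients
    return lo, hi

def haar_nondec_inv(lo, hi):
    """Inverse that averages the two shifted reconstructions."""
    return (lo + np.roll(lo, 1) + hi - np.roll(hi, 1)) / 2.0

def soft(c, tau):
    return np.sign(c) * np.maximum(np.abs(c) - tau, 0.0)

def denoise(x, tau):
    # Threshold only the detail subband, then invert the transform
    lo, hi = haar_nondec(x)
    return haar_nondec_inv(lo, soft(hi, tau))
```

In 3D, the same construction applied along each axis yields the eight subbands mentioned above (one approximation and seven mixed/detail subbands).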

Figures 1 and 2 show the results of recovering the MRI cine. Figure 1 shows normalized MSE (NMSE) versus run time for PnP FISTA and accelerated GD RED. For this experiment, we implemented the algorithms on a 4-core Intel Xeon W-2123 processor with an Nvidia GeForce GTX TITAN Z GPU (for the wavelet transform), and we calculated NMSE with respect to the fully sampled least-squares solution. Figure 2(b) shows one frame from the image sequence recovered using PnP FISTA with the wavelet-thresholding denoiser (i.e., bFISTA). A comparison to the fully sampled least-squares solution in Figure 2(a), together with the recovery error shown in Figure 2(c), confirms the excellent reconstruction quality.
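For reference, NMSE as plotted in Figure 1 is typically computed as the energy of the error relative to the energy of the reference reconstruction, often reported in decibels (a generic definition we assume here; the paper's exact scaling may differ):

```python
import numpy as np

def nmse(x_hat, x_ref):
    """Normalized MSE: ||x_hat - x_ref||^2 / ||x_ref||^2."""
    return np.linalg.norm(x_hat - x_ref) ** 2 / np.linalg.norm(x_ref) ** 2

def nmse_db(x_hat, x_ref):
    """NMSE expressed in decibels."""
    return 10.0 * np.log10(nmse(x_hat, x_ref))
```

For example, a reconstruction whose error energy is 1% of the reference energy has an NMSE of 0.01, i.e., −20 dB.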

Fig. 1: NMSE versus runtime for PnP FISTA and accelerated GD RED, using denoisers that performed soft-thresholding (ST) or blockwise Wiener filtering (W) of the non-decimated wavelet coefficients, when recovering a 25-frame cardiac cine sequence using 12 coils and acceleration .
Fig. 2: Results from the recovery of a 25-frame cardiac cine sequence using 12 coils and acceleration . (a): A frame from the fully-sampled least-squares solution. (b): A frame from the image sequence recovered using PnP FISTA with ST. (c): The error of the recovery in (b).


  • [1] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic Resonance Med., vol. 58, no. 6, pp. 1182–1195, 2007.
  • [2] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, 2017.
  • [3] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, and M. S. Rosen, “Image reconstruction by domain-transform manifold learning,” Nature, vol. 555, no. 7697, p. 487, 2018.
  • [4] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated MRI data,” Magnetic Resonance Med., vol. 79, no. 6, pp. 3055–3071, 2018.
  • [5] H. K. Aggarwal, M. P. Mani, and M. Jacob, “Model based image reconstruction using deep learned priors (MODL),” in Proc. IEEE Int. Symp. Biomed. Imag., 2018, pp. 671–674.
  • [6] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in Proc. IEEE Global Conf. Signal Info. Process., 2013, pp. 945–948.
  • [7] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1256–1272, 2017.
  • [8] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, 2017.
  • [9] B. Adcock, A. Bastounis, and A. C. Hansen, “From global to local: Getting more from compressed sensing,” SIAM News, October 2017.
  • [10] P. J. Shin, P. E. Larson, M. A. Ohliger, M. Elad, J. M. Pauly, D. B. Vigneron, and M. Lustig, “Calibrationless parallel imaging reconstruction based on structured low-rank matrix completion,” Magnetic Resonance Med., vol. 72, no. 4, pp. 959–970, 2014.
  • [11] L. Ying and J. Sheng, “Joint image reconstruction and sensitivity estimation in SENSE (JSENSE),” Magnetic Resonance Med., vol. 57, no. 6, pp. 1196–1202, 2007.
  • [12] C. Bilen, I. W. Selesnick, Y. Wang, R. Otazo, D. Kim, L. Axel, and D. K. Sodickson, “On compressed sensing in parallel MRI of cardiac perfusion using temporal wavelet and TV regularization,” in Proc. IEEE Int. Conf. Acoust. Speech & Signal Process., 2010, pp. 630–633.
  • [13] R. Ahmad and P. Schniter, “Iteratively reweighted approaches to sparse composite regularization,” IEEE Trans. Comp. Imag., vol. 10, no. 2, pp. 220–235, Dec. 2015.
  • [14] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, 2007.
  • [15] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.
  • [16] S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman, “Plug-and-play priors for bright field electron tomography and sparse interpolation,” IEEE Trans. Comp. Imag., vol. 2, pp. 408–423, 2016.
  • [17] S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Trans. Comp. Imag., vol. 3, no. 1, pp. 84–98, 2017.
  • [18] S. V. Venkatakrishnan, “Code for plug-and-play-priors,” accessed: 2019-03-15.
  • [19] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, 2009.
  • [20] E. Esser, X. Zhang, and T. F. Chan, “A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science,” SIAM J. Imag. Sci., vol. 3, no. 4, pp. 1015–1046, 2010.
  • [21] U. Kamilov, H. Mansour, and B. Wohlberg, “A plug-and-play priors approach for solving nonlinear imaging inverse problems,” IEEE Signal Process. Lett., vol. 24, no. 12, pp. 1872–1876, May 2017.
  • [22] S. Ono, “Primal-dual plug-and-play image restoration,” IEEE Signal Process. Lett., vol. 24, no. 8, pp. 1108–1112, 2017.
  • [23] Y. Sun, B. Wohlberg, and U. S. Kamilov, “An online plug-and-play algorithm for regularized image reconstruction,” IEEE Trans. Comp. Imag., to appear 2019 (see also arXiv:1809.04693).
  • [24] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed.   San Diego, CA: Academic Press, 2008.
  • [25] M. Maggioni, V. Katkovnik, K. Egiazarian, and A. Foi, “Nonlocal transform-domain filter for volumetric data denoising and reconstruction,” IEEE Trans. Image Process., vol. 22, no. 1, pp. 119–133, 2013.
  • [26] S. T. Ting, R. Ahmad, N. Jin, J. Craft, J. Serafim da Silverira, H. Xue, and O. P. Simonetti, “Fast implementation for compressive recovery of highly accelerated cardiac cine MRI using the balanced sparse model,” Magnetic Resonance Med., vol. 77, no. 4, pp. 1505–1515, Apr. 2017.
  • [27] Y. Liu, Z. Zhan, J.-F. Cai, D. Guo, Z. Chen, and X. Qu, “Projected iterative soft-thresholding algorithm for tight frames in compressed sensing magnetic resonance imaging,” IEEE Trans. Med. Imag., vol. 35, no. 9, pp. 2130–2140, 2016.
  • [28] M. Elad, P. Milanfar, and R. Rubinstein, “Analysis versus synthesis in signal priors,” Inverse Problems, vol. 23, pp. 947–968, 2007.
  • [29] E. J. Candès, Y. Eldar, D. Needell, and P. Randall, “Compressed sensing with coherent and redundant dictionaries,” Appl. Computational Harmonic Anal., vol. 31, no. 1, pp. 59–73, 2010.
  • [30] Z. Shen, K.-C. Toh, and S. Yun, “An accelerated proximal gradient algorithm for frame-based image restoration via the balanced approach,” SIAM J. Imag. Sci., vol. 4, no. 2, pp. 573–596, 2011.
  • [31] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM J. Imag. Sci., vol. 10, no. 4, pp. 1804–1844, 2017.
  • [32] E. T. Reehorst and P. Schniter, “Regularization by denoising: Clarifications and new interpretations,” IEEE Trans. Comp. Imag., vol. 5, no. 1, pp. 52–67, Mar. 2019.
  • [33] A. Hyvärinen, “Estimation of non-normalized statistical models by score matching,” J. Mach. Learn. Res., vol. 6, pp. 695–709, 2005.
  • [34] H. Robbins, “An empirical Bayes approach to statistics,” in Proc. Berkeley Symp. Math. Stats. Prob., 1956, pp. 157–163.
  • [35] B. Efron, “Tweedie’s formula and selection bias,” J. Am. Statist. Assoc., vol. 106, no. 496, pp. 1602–1614, 2011.
  • [36] C. K. Sønderby, J. Caballero, L. Theis, W. Shi, and F. Huszár, “Amortised MAP inference for image super-resolution,” arXiv:1610.04490 (and ICLR 2017), 2016.
  • [37] J.H.R. Chang, C.-L. Li, B. Póczos, B.V.K.V. Kumar, and A. C. Sankaranarayanan, “One network to solve them all—Solving linear inverse problems using deep projection models,” in Proc. IEEE Intl. Conf. Comp. Vision, 2017, pp. 5888–5897.
  • [38] T. Meinhardt, M. Moller, C. Hazirbas, and D. Cremers, “Learning proximal operators: Using denoising networks for regularizing inverse imaging problems,” in Proc. IEEE Intl. Conf. Comp. Vision, 2017, pp. 1781–1790.
  • [39] S. Bigdeli and S. Süsstrunk, “Image denoising via MAP estimation using deep neural networks,” in Proc. Intl. Biomed. Astronom. Signal Process. (BASP) Frontiers Workshop, 2019, p. 76.
  • [40] S. H. Chan, “Performance analysis of plug-and-play ADMM: A graph signal processing perspective,” IEEE Trans. Comp. Imag., to appear 2019 (see also arXiv:1809.00020).
  • [41] A. Beck and M. Teboulle, “Gradient-based algorithms with applications to signal recovery,” in Convex Optimization in Signal Processing and Communications, D. P. Palomar and Y. C. Eldar, Eds.   Cambridge, 2009, pp. 42–88.
  • [42] P. L. Combettes and J.-C. Pesquet, “Proximal splitting methods in signal processing,” in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. Bauschke, R. Burachik, P. Combettes, V. Elser, D. Luke, and H. Wolkowicz, Eds.   Springer, 2011, pp. 185–212.
  • [43] Y. Nesterov, “A method for solving the convex programming problem with convergence rate O(1/k^2),” Soviet Math. Dokl., vol. 27, no. 2, pp. 372–376, 1983.
  • [44] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18 914–18 919, Nov. 2009.
  • [45] P. Schniter, “Turbo reconstruction of structured sparse signals,” in Proc. Conf. Inform. Science & Syst., Princeton, NJ, Mar. 2010, pp. 1–6.
  • [46] S. Ramani, T. Blu, and M. Unser, “Monte-Carlo SURE: A black-box optimization of regularization parameters for general denoising algorithms,” IEEE Trans. Image Process., vol. 17, no. 9, pp. 1540–1554, 2008.
  • [47] C. A. Metzler, A. Maleki, and R. G. Baraniuk, “BM3D-AMP: A new image recovery algorithm based on BM3D denoising,” in Proc. IEEE Int. Conf. Image Process., 2015, pp. 3116–3120.
  • [48] C. A. Metzler, A. Maleki, and R. G. Baraniuk, “From denoising to compressed sensing,” IEEE Trans. Inform. Theory, vol. 62, no. 9, pp. 5117–5144, 2016.
  • [49] L. Zdeborová and F. Krzakala, “Statistical physics of inference: Thresholds and algorithms,” Advances in Physics, vol. 65, no. 5, pp. 453–552, 2016.
  • [50] A. Chambolle, R. A. De Vore, N.-Y. Lee, and B. J. Lucier, “Nonlinear wavelet image processing: Variational problems, compression, and noise removal through wavelet shrinkage,” IEEE Trans. Image Process., vol. 7, no. 3, pp. 319–335, 1998.
  • [51] R. Berthier, A. Montanari, and P.-M. Nguyen, “State evolution for approximate message passing with non-separable functions,” Inform. Inference, 2019.
  • [52] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message passing,” arXiv:1610.03082, 2016.
  • [53] P. Schniter, S. Rangan, and A. K. Fletcher, “Denoising-based vector approximate message passing,” in Proc. Intl. Biomed. Astronom. Signal Process. (BASP) Frontiers Workshop, 2017.
  • [54] A. K. Fletcher, S. Rangan, S. Sarkar, and P. Schniter, “Plug-in estimation in high-dimensional linear inverse problems: A rigorous analysis,” in Proc. Neural Inform. Process. Syst. Conf., 2018 (see also arXiv:1806.10466).
  • [55] B. He, F. Ma, and X. Yuan, “Convergence study on the symmetric version of ADMM with larger step sizes,” SIAM J. Imag. Sci., vol. 9, no. 3, pp. 1467–1501, 2016.
  • [56] A. K. Fletcher, M. Sahraee-Ardakan, S. Rangan, and P. Schniter, “Expectation consistent approximate inference: Generalizations and convergence,” arXiv:1602.07795, 2016.
  • [57] A. Perelli, M. Lexa, A. Can, and M. Davies, “Denoising message passing for X-ray computed tomography reconstruction,” in arXiv:1609.04661, 2016.
  • [58] S. Sarkar, S. Rangan, A. K. Fletcher, and P. Schniter, “Bilinear recovery using adaptive vector-AMP,” in arXiv:1809.00024, 2018.
  • [59] G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman, “Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium,” SIAM J. Imag. Sci., vol. 11, no. 3, pp. 2001–2020, 2018.
  • [60] X. Wang, J. Juang, and S. H. Chan, “Automatic foreground extraction using multi-agent consensus equilibrium,” arXiv:1808.08210, 2018.
  • [61] N. Parikh and S. Boyd, “Proximal algorithms,” Found. Trends Optim., vol. 3, no. 1, pp. 123–231, 2013.
  • [62] A. M. Teodoro, J. M. Bioucas-Dias, and M. A. T. Figueiredo, “Scene-adapted plug-and-play algorithm with convergence guarantees,” in Proc. IEEE Int. Workshop Mach. Learn. Signal Process., Sep. 2017, pp. 1–6.
  • [63] R. Ahmad, H. Xue, S. Giri, Y. Ding, J. Craft, and O. P. Simonetti, “Variable density incoherent spatiotemporal acquisition (VISTA) for highly accelerated cardiac MRI,” Magnetic Resonance Med., vol. 74, no. 5, pp. 1266–1278, 2015.
  • [64] M. Buehrer, K. P. Pruessmann, P. Boesiger, and S. Kozerke, “Array compression for MRI with large coil arrays,” Magnetic Resonance Med., vol. 57, no. 6, pp. 1131–1139, 2007.
  • [65] M. Uecker, P. Lai, M. J. Murphy, P. Virtue, M. Elad, J. M. Pauly, S. S. Vasanawala, and M. Lustig, “ESPIRiT—An eigenvalue approach to autocalibrating parallel MRI: Where SENSE meets GRAPPA,” Magnetic Resonance Med., vol. 71, no. 3, pp. 990–1001, 2014.
  • [66] J. S. Lim, Two-Dimensional Signal and Image Processing.   Englewood Cliffs, NJ: Prentice Hall, 1990.