I Introduction
Magnetic Resonance Imaging (MRI) is a noninvasive diagnostic tool that provides excellent softtissue contrast without the use of ionizing radiation. Compared to other clinical imaging modalities (e.g., CT or ultrasound), however, the data acquisition process for MRI is inherently slow. A typical clinical MRI exam consists of multiple scans and can take more than an hour to complete. For each scan, the patient may be asked to stay still for several minutes, with slight motion potentially resulting in image artifacts. Furthermore, dynamic applications demand collecting a series of images in quick succession. Due to the limited time window in many dynamic applications (e.g., contrast enhanced MR angiography), it is not always feasible to collect fully sampled datasets. As a result, reducing acquisition time and improving imaging quality for undersampled datasets have been active areas of research for the last two decades.
The combination of parallel imaging and compressive sensing (CS) has been shown to benefit a wide range of MRI applications [1], including dynamic applications, and has been included in the default image processing framework offered by several major MRI vendors. More recently, deep learning techniques have been shown to outperform CS methods [2]. Some of these techniques pose the MRI reconstruction as a direct inversion problem and tackle it by training a deep neural network (DNN) that learns the endtoend mapping between the measured Fourier samples and the final image [3]
. Considering that the forward model in MRI changes from one dataset to the next, such methods have to be either trained over a large and diverse corpus of data or limited to a specific application, and even then they cannot ensure data consistency. Other deep learning techniques explicitly account for the forward model and use convolutional neural network (CNN) based regularization. Typically, these techniques alternate between gradient descent of the data fidelity term and the CNNbased regularization
[4, 5]. Integrating learningbased methods into physical inverse problems remains an open challenge, since the conventional formulation via maximumaposteriori (MAP) estimation is not extendable to nonoptimization based methods, e.g., DNN.
An alternative is to use “plugandplay” (PnP) algorithms [6], which integrate image denoising with forwardmodel based signal recovery. PnP algorithms are an excellent fit for compressive MRI because they decouple image modeling from the forward model, which can change significantly among different scans due to variations in the coil sensitivity maps, sampling patterns, and image resolution. Consequently, stateoftheart imagedenoising techniques, such as those based on DNNs (e.g., [7, 8]), can be directly exploited for compressive MRI image reconstruction through the PnP methodology. The objective of this article is thus twofold: i) to review recent advances in plugandplay methods, and ii) to discuss their application to compressive MRI image reconstruction.
The remainder of the paper is organized as follows. We first detail the inverse problem encountered in MRI reconstruction. We then review several PnP methods, where the unifying commonality is to iteratively call a denoising subroutine as one step of a larger optimizationinspired algorithm. Next, we describe how the result of the PnP method can be interpreted as a solution to an equilibrium equation, allowing convergence analysis from the equilibrium perspective. Finally, we present illustrative examples of PnP methods applied to MRI image recovery.
Ii Inverse problems in MRI
In this section, we describe typical inverse problems in MRI. Briefly, the measurements are samples of the Fourier transform of the image, where the Fourier domain is often referred to as “kspace.” The transform can be taken across two or three spatial dimensions and includes an additional temporal dimension in dynamic applications. Furthermore, measurements are often collected in parallel from multiple receiver coils. In dynamic parallel MRI with Cartesian sampling, the time
kspace measurements from the coil take the form(1) 
where
is the vectorized 2D or 3D image at discrete time
, is a diagonal matrix containing the sensitivity map for the coil, is the 2D or 3D discrete Fourier transform (DFT), the sampling matrix contains rows of the identity matrix, and is additive white Gaussian noise (AWGN). Note that the sampling pattern may change across frames . The MRI literature often refers to as the “acceleration rate.”MRI measurements are acquired using a sequence of measurement trajectories through kspace. These trajectories can be Cartesian or nonCartesian in nature. Cartesian trajectories are essentially lines through kspace. In the Cartesian case, one kspace dimension (i.e., the frequency encoding) is fully sampled, while the other one or two dimensions (i.e., the phase encodings) are undersampled to reduce acquisition time. Typically, one line, or “readout,” is collected after every RF pulse, and the process is repeated several times to collect adequate samples of kspace. NonCartesian trajectories include radial or spiral curves, which have the effect of distributing the undersampling among all dimensions of kspace. Compared to Cartesian sampling, nonCartesian sampling provides more efficient coverage of kspace and yields an “incoherent” forward operator that is more conducive to compressedsensing reconstruction [9]. But Cartesian sampling remains the method of choice in clinical practice, due to its higher tolerance to system imperfections and an extensive record of success.
Since the sensitivity map, , is patientspecific and varies with the location of the coil with respect to the imaging plane, both and are unknown in practice. Although calibrationfree methods have been proposed to estimate (e.g., [10]) or to jointly estimate and (e.g., [11]), it is more common to first estimate through a calibration procedure and then treat as known in (1). Stacking , , and into vectors , , and , and packing into a known blockdiagonal matrix , we obtain the linear inverse problem of recovering from
(2) 
Iii Signal Recovery and Denoising
The maximum likelihood (ML) estimate of from in (2) is , where
, the probability density of
conditioned on , is known as the “likelihood function.” The ML estimate is often written in the equivalent form . In the case ofvariance AWGN
, we have that , and so , which can be recognized as leastsquares estimation. Although leastsquares estimation can give reasonable performance when is tall and well conditioned, this is rarely the case under moderate to high acceleration (i.e., ). With acceleration, it is critically important to exploit prior knowledge of signal structure.The traditional approach to exploiting such prior knowledge is to formulate and solve an optimization problem of the form
(3) 
where the regularization term encodes prior knowledge of . In fact, in (3) can be recognized as the maximum a posteriori (MAP) estimate of under the prior density model . To see why, recall that the MAP estimate maximizes the posterior distribution . That is, . Since Bayes’ rule implies that , we have
(4) 
Recalling that the first term in (3) was shown to be (plus a constant) under AWGN noise, the second term in (3) must obey . We will find this MAP interpretation useful in the sequel.
It is not easy to design good regularizers for use in (3). They must not only mimic the negative log signalprior, but also enable tractable optimization. One common approach is to use convex regularizations of the form , where are transforms, are nonnegative regularization weights, and the norm rewards sparsity in the transform outputs (see, e.g., [12]). Due to the richness of data structure in MRI, especially for dynamic applications, utilizing multiple () linear sparsifying transforms has been shown to improve recovery quality [12], but tuning multiple regularization weights, , is a nontrivial problem [13].
Particular insight comes from considering the special case of , where in (2) reduces to an AWGNcorrupted version of the image , i.e.,
(5) 
The problem of recovering from noisy , known as “denoising,” has been intensely researched for decades. While it is possible to perform denoising by solving a regularized optimization problem of the form (3) with , today’s stateoftheart approaches are either algorithmic (e.g., [14]) or DNNbased (e.g., [7, 8]). This begs an important question: can these stateoftheart denoisers be leveraged for MRI signal reconstruction, by exploiting the connections between the denoising problem and (3)? As we shall see, this is precisely what the PnP methods do.
Iv PlugandPlay Methods
In this section, we review several approaches to PnP signal reconstruction. What these approaches have in common is that they recover from measurements of the form (2) by iteratively calling a sophisticated denoiser within a larger optimization or inference algorithm.
Iva Proxbased PnP
To start, let us imagine how the optimization in (3) might be solved. Through what is known as “variable splitting,” we could introduce a new variable, , to decouple the regularizer from the data fidelity term . The variables and could then be equated using an external constraint, leading to the constrained minimization problem
(6) 
Equation (6) suggests an algorithmic solution that alternates between separately estimating and estimating , with an additional mechanism to asymptotically enforce the constraint .
The original PnP method [6] is based on the alternating direction method of multipliers (ADMM) [15]. For ADMM, (6) is first reformulated as the “augmented Lagrangian”
(7) 
where are Lagrange multipliers and is a design parameter that affects the convergence speed of the algorithm, but not the final solution. With , (7) can be rewritten as
(8) 
ADMM solves (8) by alternating the optimization of and with gradient ascent of , i.e.,
(9a)  
(9b)  
(9c) 
where and , known as “proximal maps,” are defined as
(10)  
(11)  
(12) 
Under some weak technical constraints, it can be proven [15] that when is convex, the ADMM iteration (9) converges to , the global minimum of (3) and (6).
From the discussion in Section III, we immediately recognize in (12) as the MAP denoiser of under AWGN variance and signal prior . The key idea behind the original PnP work [6] was, in the ADMM recursion (9), to “plug in” a powerful image denoising algorithm like BM3D [14] in place of the proximal denoiser from (12). If the plugin denoiser is denoted by “,” then the PnP ADMM algorithm becomes
(13a)  
(13b)  
(13c) 
A wide variety of empirical results (see, e.g., [6, 16, 17] [18]) have demonstrated that, when is a powerful denoising algorithm like BM3D, the PnP algorithm (13) produces far better recoveries than the regularizationbased approach (9).
The success of PnP methods raises important theoretical questions. Since is not in general the proximal map of any regularizer , the iterations (13) may not minimize a cost function of the form in (3), and (13) may not be an implementation of ADMM. It is then unclear if the iterations (13) will converge. And if they do converge, it is unclear what they converge to. The consensus equilibrium framework, which we discuss in Section V, provides answers to these questions.
The use of a generic denoiser in place of a proximal denoiser can be translated to nonADMM algorithms, such as FISTA [19], primaldual algorithms [20], and others, as in [21, 22, 23]. Instead of optimizing as in (13), PnP FISTA [21] uses the iterative update
(14a)  
(14b)  
(14c) 
where (14a) is a gradient descent (GD) step on the negative loglikelihood at with stepsize , (14b) is the plugin replacement of the usual proximal denoising step in FISTA, and (14c) is an acceleration step, where it is typical to use and .
Comparing PnP FISTA (14) to PnP ADMM (13), one can see that they differ in how the data fidelity term is handled: PnP ADMM uses the proximal update (11), while PnP FISTA uses the GD step (14a). If the proximal update is solved exactly, then PnP ADMM usually requires many fewer iterations than PnP FISTA. But exactly solving the proximal update (11) is much more computationally costly than taking a GD step (14a). Another option is to approximate the proximal update (11) using, e.g., several iterations of GD, which will simplify (11) but increase the number of ADMM iterations. In the end, the most efficient approach will depend on the relative complexity of the denoiser and the (exact or approximated) proximal update (11).
IvB The balanced FISTA approach
Consider a denoiser that executes the following three steps: i) transform to the wavelet domain,^{1}^{1}1Although we focus on nondecimated wavelet , the results of this section hold for any tight frame . ii) softthreshold, and iii) transform back to the original domain, i.e.,
(15) 
and denotes the softthreshold level. Such “wavelet thresholding” denoisers work particularly well when the wavelet transform is nondecimated [24], in which case is tall. In the sequel, we assume that . One advantage of is its computational simplicity: it uses little more than two fast (wavelet) transforms. The computational advantages of are especially evident in the case of 3D signals, like those that occur in cardiac MRI, in comparison to algorithmic approaches like BM4D [25]. Note that, in the 3D case, different threshold levels should be used in the spatial and temporal dimensions for best performance.
Note that, for nonorthogonal , the denoiser in (15) is not the proximal map of any , and so it does not solve a convex optimization problem of the form (3). That said, could be plugged into any of the proxbased PnP algorithms discussed previously. Interestingly, if is used in PnP FISTA, then it does solve a convex optimization problem, although one with a different form than (3). This approach was first proposed in [26, 27] under the name “balanced FISTA” (bFISTA) and applied to parallel cardiac MRI. Notably, bFISTA was proposed before the advent of PnP FISTA. More details are provided below.
In Section III, when discussing the optimization problem (3), the regularizer was mentioned as a popular option. The resulting optimization problem,
(16) 
is said to be stated in “analysis” form [28, 29]. Note that the proximal denoiser associated with (16) has the form , which is computationally intensive to solve for nonorthogonal . The problem (16) can also be stated in “synthesis” form as
(17) 
where are wavelet coefficients. Equivalently, (17) can be expressed as the unconstrained problem
(18) 
for projection matrix as .
In practice, it is not possible to take and, for finite values of , the problems (17) and (18) are not equivalent. However, problem (18) under finite is interesting to consider in its own right, and it is sometimes referred to as the “balanced” approach [30]. If we solve (18) using FISTA with stepsize (recall (14a)) and choose the particular value then, remarkably, the resulting algorithm takes the form of PnP FISTA (14) with . Note that this particular choice of is motivated by computational efficiency (since it leads to the use of ) rather than recovery performance. Still, as we demonstrate in Section VI, it performs very well.
IvC Regularization by denoising
Another PnP approach, proposed by Romano, Elad, and Milanfar in [31], recovers from measurements in (2) by finding the that solves the optimality condition^{2}^{2}2We begin our discussion of RED by focusing on the realvalued case, but later we state the RED algorithms for the complexvalued case of interest in MRI.
(19) 
where is an arbitrary (i.e., “plug in”) image denoiser and is a tuning parameter. In [31], several algorithms were proposed to solve (19). Numerical experiments in [31] suggest that, when is a sophisticated denoiser (like BM3D) and is well tuned, the solutions to (19) are stateoftheart, similar to those of PnP ADMM.
The approach (19) was coined “regularization by denoising” (RED) in [31] because, under certain conditions, the that solve (19) are the solutions to the regularized leastsquares problem
(20) 
where the regularizer is explicitly constructed from the plugin denoiser . But what are these conditions? Assuming that is differentiable almost everywhere, it was shown in [32] that the solutions of (19) correspond to those of (20) when i) is locally homogeneous^{3}^{3}3Locally homogeneous means that for all and sufficiently small nonzero . and ii) has a symmetric Jacobian matrix (i.e., ). But it was demonstrated in [32] that these properties are not satisfied by popular image denoisers, such as the median filter, waveletdomain thresholding, NLM, BM3D, TNRD, and DnCNN. Furthermore, it was proven in [32] that if the Jacobian of is nonsymmetric, then there does not exist any regularizer under which the solutions of (19) minimize a regularized loss of the form in (3).
One may then wonder how to justify (19). In [32], Reehorst and Schniter proposed an explanation for (19) based on “score matching” [33], which we now summarize. Suppose we are given a large corpus of training images , from which we could build the empirical prior model
where denotes the Dirac delta. Since images are known to exist outside , it is possible to build an improved prior model
using kernel density estimation (KDE), i.e.,
(21) 
where is a tuning parameter. If we adopt as the prior model for , then the MAP estimate of (recall (4)) becomes
(22) 
Because is differentiable, the solutions to (22) must obey
(23) 
A classical result known as “Tweedie’s formula” [34, 35] says that
(24) 
where is the minimum meansquared error (MMSE) denoiser under the prior and variance AWGN. That is, , where and . Applying (24) to (23), the MAP estimate under the KDE prior obeys
(25) 
which matches the RED condition (19) when . Thus, if we could implement the MMSE denoiser for a given training corpus , then RED provides a way to compute the MAP estimate of under the KDE prior .
Although the MMSE denoiser can be expressed in closed form (see [32, (67)]), it is not practical to implement for large . Thus the question remains: Can the RED approach (19) also be justified for nonMMSE denoisers , especially those that are not locally homogeneous or Jacobiansymmetric? As shown in [32], the answer is yes. Consider a practical denoiser parameterized by tunable weights (e.g., a DNN). A typical strategy is to choose to minimize the meansquared error on , i.e., set , where the expectation is taken over and . By the MMSE orthogonality principle, we have
(26) 
and so we can write
(27)  
(28) 
where (28) follows by from (24). Equation (28) says that choosing to minimize the MSE is equivalent to choosing so that best matches the “score” . This is an instance of the “score matching” framework, as described in [33]. In summary, the RED approach (19) approximates the KDEMAP approach (23) by using a plugin denoiser to approximate the MMSE denoiser . When , RED exactly implements MAPKDE, but with a practical , RED implements a scorematching approximation of MAPKDE. Thus, a more appropriate title for RED might be “score matching by denoising.”
Comparing the RED approach from this section to the proxbased PnP approach from Section IVA, we see that RED starts with the KDEbased MAP estimation problem (22) and replaces the based MMSE denoiser with a plugin denoiser , while PnP ADMM starts with the based MAP estimation problem (3) and replaces the based MAP denoiser from (12) with a plugin denoiser . It has been recently demonstrated that, in proxbased PnP, the use of MAP denoisers leads to higher performance than MMSE denoisers [36, 37, 38, 39], which is not surprising in light of the discussion above. Thus it is interesting that RED offers an approach to MAPbased recovery using MMSE denoising, which is much easier to implement than MAP denoising [36, 37, 38, 39].
Further insight into the difference between RED and proxbased PnP can be obtained by considering the case of symmetric linear denoisers, i.e., with , where we will also assume that is invertible. Although such denoisers are far from stateoftheart, they are useful for interpretation. It is easy to show [40] that is the proximal map of , i.e., that , recalling (12). With this proximal denoiser, we know that the proxbased PnP algorithm solves the optimization problem
(29) 
Meanwhile, since is both locally homogeneous and Jacobiansymmetric, we know from (20) that the RED under this solves the optimization problem
(30) 
By comparing (29) and (30), we see a clear difference between RED and proxbased PnP. Section VB compares RED to proxbased PnP from yet another perspectice: consensus equilibrium.
So far, we have described RED as solving for in (19). But how exactly is this accomplished? In the original RED paper [31], three algorithms were proposed to solve (19
): GD, inexact ADMM, and a “fixed point” heuristic that was later recognized
[32] as a special case of the proximal gradient (PG) algorithm [41, 42]. Generalizations of PG RED were proposed in [32]. The fastest among them is the acceleratedPG RED algorithm, which uses the iterative update^{4}^{4}4The RED equations (31) and (32) may be used with complexvalued quantities.(31a)  
(31b)  
(31c) 
where was defined in (11), line (31b) uses the same acceleration as PnP FISTA (14b), and is a design parameter. When and , (31) reduces to the “fixed point” heuristic from [31]. If was too complex to implement, one could replace (31a) with the GD update
(32) 
to achieve a similar complexity reduction as when going from PnP ADMM to PnP FISTA (as discussed in Section IVA). The result would be an “accelerated GD” [43] form of RED. Convergence of the RED algorithms will be discussed in Section VB.
IvD Amp
The approximate message passing (AMP) algorithm [44] estimates the signal using the iteration^{5}^{5}5Equation (33) states the AMP algorithm for realvalued quantities. To implement it for complexvalued quantities, one would replace with in (33b) [45].
(33a)  
(33b)  
(33c) 
starting with and some . Note that in (33), and refer to the dimensions of and the denoiser may change with iteration . In practice, the trace of the Jacobian in (33a) can be approximated using [46]
(34) 
using small and random with i.i.d. entries. The original AMP paper [44] considered only proximal denoising (i.e., for some ), but the PnP version of AMP shown in (33) was proposed in [47][48] under the name “denoising AMP.”
The rightmost term in (33a) is known as the “Onsager correction term,” due to connections with statistical physics [49]. If we remove this term from (33a), the resulting recursion is no more than unaccelerated FISTA (also known as ISTA [50]), i.e., (14) with and a particular stepsize . The Onsager term is what gives AMP the remarkable properties discussed below.
When is i.i.d. Gaussian and the denoiser is Lipschitz, the PnP AMP iteration (33) has remarkable properties in the largesystem limit (i.e., with fixed ), as proven in [51]. First,
(35) 
That is, the denoiser input behaves like an AWGN corrupted version of the true image . Moreover, the meansquared of AMP’s estimate is exactly predicted by the “state evolution”
(36a)  
(36b) 
Furthermore, if is the MMSE denoiser (i.e., ) and the stateevolution (36) has a unique fixedpoint, then converges to the MMSE: the holy grail of signal recovery. Or, if is the proximal denoiser and the stateevolution (36) has a unique fixedpoint, then AMP’s converges to the MAP solution, even if is nonconvex.
The remarkable properties of AMP are only guaranteed to manifest with large i.i.d. Gaussian . For this reason, a “vector AMP” (VAMP) algorithm was proposed [52] with a rigorous stateevolution and denoiserinput property (35) that hold under the larger class of rightrotationally invariant (RRI)^{6}^{6}6Matrix
is said to be RRI if its singularvalue decomposition
has Haar , i.e., distributed uniformly over the group of orthogonal matrices for any and . Note that i.i.d. Gaussian is a special case of RRI where is also Haar and the singular values in have a particular distribution. matrices . Later, a PnP version of VAMP was proposed [53] and rigorously analyzed under Lipschitz in [54]. Similar to how AMP can be understood as ISTA with Onsager correction, VAMP can be understood as the symmetric variant [55] of ADMM with Onsager correction [56]. Although PnP versions of AMP and VAMP have been shown to outperform PnP ADMM in computed tomography applications [57][58], more work is needed to handle the structured operators (recall Section II) that manifest in MRI.V Understanding PnP through Consensus Equilibrium
The success of the PnP methods in Section IV raises important theoretical questions. For example, in the case of PnP ADMM, if the plugin denoiser is not the proximal map of any regularizer , then it is not clear what cost function is being minimized or whether the algorithm will even converge. Similarly, in the case of RED, if the plugin denoiser is not the MMSE denoiser , then RED no longer solves the MAPKDE problem, and it is not clear what RED does solve, or whether a given RED algorithm will even converge. In this section, we show that many of these questions can be answered through the consensus equilibrium (CE) framework [59, 40, 60, 23, 32]. We start by discussing CE for the PnP approaches from Section IVA and follow with a discussion of CE for the RED approaches from Section IVC.
Va Consensus equilibrium for proxbased PnP
Let us start by considering the PnP ADMM algorithm (13). Rather than viewing (13) as minimizing some cost function, we can view it as seeking a solution to
(37a)  
(37b) 
which, by inspection, must hold when (13) is at a fixed point. Not surprisingly, by setting in the PnP FISTA algorithm (14), it is straightforward to show that it too seeks a solution to (37). A similar procedure can be applied to the primaldual PnP algorithm from [22]. With (37), the goal of the proxbased PnP algorithms becomes well defined! The pair (37) reaches a “consensus” in that the denoiser and the data fitting operator agree on the output . The “equilibrium” comes from the opposing signs for the noiselike term : the data fitting operator removes it, while the denoiser adds it back.
By viewing the goal of proxbased PnP as solving the fixedpoint problem (37), it becomes clear that other solvers beyond ADMM, FISTA, and primaldual can be used. For example, it was shown in [59] that the PnP CE condition (37) can be achieved by finding a fixedpoint of the system
(38) 
where^{7}^{7}7The paper [59] actually considers the consensus equilibrium among agents, whereas here we consider the simple case of agents.
(39) 
There exist many algorithms to solve (38). For example, one could use the Mann iteration [61]
(40) 
when is nonexpansive. The paper [59] also shows that this fixed point is equivalent to the solution of , in which case Newton’s method or other rootfinding methods could be applied.
The CE viewpoint also provides a path to proving the convergence of the PnP ADMM algorithm. Sreehari et al. [16] used a classical result from convex analysis to show that a sufficient condition for convergence is that i) is nonexpansive, i.e., for any and , and ii) is a subgradient of some convex function, i.e., there exists such that . If these two conditions are met, then PnP ADMM (13) will converge to a global solution. Similar observations were made in other recent studies, e.g., [40] [62]. That said, Chan et al. [17] showed that many practical denoisers do not satisfy these conditions, and so they designed a variant of PnP ADMM in which is decreased at every iteration. Under appropriate conditions on and the rate of decrease, this latter method also guarantees convergence, although not exactly to a fixed point of (37) since is no longer fixed.
Similar techniques can be used to prove the convergence of other proxbased PnP algorithms. For example, under certain technical conditions, including nonexpansiveness of , it was established [23] that PnP FISTA converges to the same fixedpoint as PnP ADMM.
VB Consensus equilibrium for RED
Just as the proxbased PnP algorithms in Section IVA can be viewed as seeking the consensus equilibrium of (37), it was shown in [32] that the proximalgradient and ADMMbased RED algorithms seek the consensus equilibrium of
(41a)  
(41b) 
where was defined in (11) and is the algorithmic parameter that appears in (31). ^{8}^{8}8The parameter also manifests in ADMM RED, as discussed in [32]. Since (41) is stated in the same form as (37), a direct comparison between the CE conditions of RED and proxbased PnP can be made.
Perhaps a more intuitive way to compare the CE conditions of RED and proxbased PnP follows from rewriting (41b) as , after which the RED CE condition becomes
(42a)  
(42b) 
which involves no inverse operations. In the typical case of , we see that (42) matches (37), except that the “noise” is added after denoising in (42b) and before denoising in (37b).
Yet another way to compare the CE conditions of RED and proxbased PnP is to eliminate the variable. Solving (42b) for and plugging the result back into (42a) gives the fixedpoint condition
(43)  
(44) 
Applying the same procedure to (37) yields the fixedpoint condition
(45) 
or, equivalently, the fixedpoint condition stated earlier in (38).
Vi Demonstration of PnP in MRI
In this final section, we demonstrate the application of PnP methods to parallel cardiac MRI. In particular, we demonstrate the recovery of a cardiac cine dataset collected using 1.5 T scanner (Magnetom Avanto, Siemens Healthcare, Erlangen, Germany) and an 18channel body array coil. A fully sampled dataset was collected using breathheld, segmented acquisition and was retrospectively downsampled at using variable density pseudorandom Cartesian sampling [63]. Other imaging parameters included: 10 mm slice thickness, flip angle, fieldofview, matrix size, 36.5 ms temporal resolution, and a total of 25 frames. The 18 channels were compressed to virtual channels (coils) for faster computation [64]. The sensitivity maps, were estimated from the fully sampled timeaveraged data using ESPIRiT [65]. The measurements were modeled as described in equation (1).
Note that, in this demonstration, the spacetime signal vector in (2) has dimension and the measurement vector in (2) has dimension . As such, it is computationally impractical to compute the matrix inverse in (11), or even store the matrix inverse in memory. The forward and adjoint operators and , however, have fast fast Fourier transform (FFT) implementations. For this reason, we focus on PnP algorithms that handle the datafidelity term using GD, such as PnP FISTA (recall (14a)), bFISTA, and accelerated GD RED (recall (32)).
Because, in this example, the signal is a cine (i.e., a video) rather than a still image, there are relatively few options available for sophisticated denoisers. Although algorithmic denoisers like BM4D [25] have been proposed, they tend to run very slowly at these dimensions, especially when compared to the implementations of the linear operators and . For this reason, we consider the waveletthresholding denoiser from (15), which requires only computing a wavelet transform and its inverse. For the wavelet transform, we used a singlelevel 3D nondecimated Haar transform. Note that, when PnP FISTA is used with , it coincides with the earlier bFISTA technique, as discussed in Section IVB. We also consider a denoiser that performed adaptive Wiener filtering on overlapping spatiotemporal patches in each of the eight subbands of our nondecimated wavelet transform [66].
Figures 12 show the results of recovering an MRI cine. Figure 1 shows normalized MSE (NMSE) versus run time for PnP FSTA and accelerated GD RED. For this experiment, we implemented the algorithms using a 4core Intel Xeon W2123 processor with an Nvidia GeForce GTX TITAN Z GPU (for the wavelet transform) and we calculated NMSE with respect to the fullysampled leastsquares solution. Figure 2(b) shows one frame from the image sequence recovered using PnP FISTA with the waveletthresholding denoiser (i.e., bFISTA). A comparison to the fullysampled leastsquares solution in Figure 2(a) shows excellent reconstruction quality. Figure 2(c) shows the recovery error for that frame, confirming the excellent reconstruction quality.
(a)  (b) 
(c)  
References
 [1] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic Resonance Med., vol. 58, no. 6, pp. 1182–1195, 2007.
 [2] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, 2017.
 [3] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, and M. S. Rosen, “Image reconstruction by domaintransform manifold learning,” Nature, vol. 555, no. 7697, p. 487, 2018.
 [4] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated MRI data,” Magnetic Resonance Med., vol. 79, no. 6, pp. 3055–3071, 2018.
 [5] H. K. Aggarwal, M. P. Mani, and M. Jacob, “Model based image reconstruction using deep learned priors (MODL),” in Proc. IEEE Int. Symp. Biomed. Imag., 2018, pp. 671–674.
 [6] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plugandplay priors for model based reconstruction,” in Proc. IEEE Global Conf. Signal Info. Process., 2013, pp. 945–948.
 [7] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1256–1272, 2017.
 [8] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, 2017.
 [9] B. Adcock, A. Bastounis, and A. C. Hansen, “From global to local: Getting more from compressed sensing,” SIAM News, October 2017.
 [10] P. J. Shin, P. E. Larson, M. A. Ohliger, M. Elad, J. M. Pauly, D. B. Vigneron, and M. Lustig, “Calibrationless parallel imaging reconstruction based on structured lowrank matrix completion,” Magnetic Resonance Med., vol. 72, no. 4, pp. 959–970, 2014.
 [11] L. Ying and J. Sheng, “Joint image reconstruction and sensitivity estimation in SENSE (JSENSE),” Magnetic Resonance Med., vol. 57, no. 6, pp. 1196–1202, 2007.
 [12] C. Bilen, I. W. Selesnick, Y. Wang, R. Otazo, D. Kim, L. Axel, and D. K. Sodickson, “On compressed sensing in parallel MRI of cardiac perfusion using temporal wavelet and TV regularization,” in Proc. IEEE Int. Conf. Acoust. Speech & Signal Process., 2010, pp. 630–633.
 [13] R. Ahmad and P. Schniter, “Iteratively reweighted approaches to sparse composite regularization,” IEEE Trans. Comp. Imag., vol. 10, no. 2, pp. 220–235, Dec. 2015.
 [14] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3D transformdomain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, 2007.
 [15] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.

[16]
S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman, “Plugandplay priors for bright field electron tomography and sparse interpolation,”
IEEE Trans. Comp. Imag., vol. 2, pp. 408–423, 2016.  [17] S. H. Chan, X. Wang, and O. A. Elgendy, “Plugandplay ADMM for image restoration: Fixedpoint convergence and applications,” IEEE Trans. Comp. Imag., vol. 3, no. 1, pp. 84–98, 2017.
 [18] S. V. Venkatakrishnan, “Code for plugandplaypriors,” https://github.com/venkatpurdue/plugandplaypriors, accessed: 20190315.
 [19] A. Beck and M. Teboulle, “A fast iterative shrinkagethresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, 2009.
 [20] E. Esser, X. Zhang, and T. F. Chan, “A general framework for a class of first order primaldual algorithms for convex optimization in imaging science,” SIAM J. Imag. Sci., vol. 3, no. 4, pp. 1015–1046, 2010.
 [21] U. Kamilov, H. Mansour, and B. Wohlberg, “A plugandplay priors approach for solving nonlinear imaging inverse problems,” IEEE Signal Process. Lett., vol. 24, no. 12, pp. 1872–1876, May 2017.
 [22] S. Ono, “Primaldual plugandplay image restoration,” IEEE Signal Process. Lett., vol. 24, no. 8, pp. 1108–1112, 2017.
 [23] Y. Sun, B. Wohlberg, and U. S. Kamilov, “An online plugandplay algorithm for regularized image reconstruction,” IEEE Trans. Comp. Imag., to appear 2019 (see also arXiv:1809.04693).
 [24] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed. San Diego, CA: Academic Press, 2008.
 [25] M. Maggioni, V. Katkovnik, K. Egiazarian, and A. Foi, “Nonlocal transformdomain filter for volumetric data denoising and reconstruction,” IEEE Trans. Image Process., vol. 22, no. 1, pp. 119–133, 2013.
 [26] S. T. Ting, R. Ahmad, N. Jin, J. Craft, J. Serafim da Silverira, H. Xue, and O. P. Simonetti, “Fast implementation for compressive recovery of highly accelerated cardiac cine MRI using the balanced sparse model,” Magnetic Resonance Med., vol. 77, no. 4, pp. 1505–1515, Apr. 2017.
 [27] Y. Liu, Z. Zhan, J.F. Cai, D. Guo, Z. Chen, and X. Qu, “Projected iterative softthresholding algorithm for tight frames in compressed sensing magnetic resonance imaging,” IEEE Trans. Med. Imag., vol. 35, no. 9, pp. 2130–2140, 2016.
 [28] M. Elad, P. Milanfar, and R. Rubinstein, “Analysis versus synthesis in signal priors,” Inverse Problems, vol. 23, pp. 947–968, 2007.
 [29] E. J. Candès, Y. Eldar, D. Needell, and P. Randall, “Compressed sensing with coherent and redundant dictionaries,” Appl. Computational Harmonic Anal., vol. 31, no. 1, pp. 59–73, 2010.
 [30] Z. Shen, K.C. Toh, and S. Yun, “An accelerated proximal gradient algorithm for framebased image restoration via the balanced approach,” SIAM J. Imag. Sci., vol. 4, no. 2, pp. 573–596, 2011.
 [31] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM J. Imag. Sci., vol. 10, no. 4, pp. 1804–1844, 2017.
 [32] E. T. Reehorst and P. Schniter, “Regularization by denoising: Clarifications and new interpretations,” IEEE Trans. Comp. Imag., vol. 5, no. 1, pp. 52–67, Mar. 2019.
 [33] A. Hyvärinen, “Estimation of nonnormalized statistical models by score matching,” J. Mach. Learn. Res., vol. 6, pp. 695–709, 2005.
 [34] H. Robbins, “An empirical Bayes approach to statistics,” in Proc. Berkeley Symp. Math. Stats. Prob., 1956, pp. 157–163.
 [35] B. Efron, “Tweedie’s formula and selection bias,” J. Am. Statist. Assoc., vol. 106, no. 496, pp. 1602–1614, 2011.
 [36] C. K. Sønderby, J. Caballero, L. Theis, W. Shi, and F. Huszár, “Amortised MAP inference for image superresolution,” arXiv:1610.04490 (and ICLR 2017), 2016.
 [37] J.H.R. Chang, C.L. Li, B. Póczos, B.V.K.V. Kumar, and A. C. Sankaranarayanan, “One network to solve them all—Solving linear inverse problems using deep projection models,” in Proc. IEEE Intl. Conf. Comp. Vision, 2017, pp. 5888–5897.
 [38] T. Meinhardt, M. Moller, C. Hazirbas, and D. Cremers, “Learning proximal operators: Using denoising networks for regularizing inverse imaging problems,” in Proc. IEEE Intl. Conf. Comp. Vision, 2017, pp. 1781–1790.
 [39] S. Bigdeli and S. Süsstrunk, “Image denoising via MAP estimation using deep neural networks,” in Proc. Intl. Biomed. Astronom. Signal Process. (BASP) Frontiers Workshop, 2019, p. 76.
 [40] S. H. Chan, “Performance analysis of plugandplay ADMM: A graph signal processing perspective,” IEEE Trans. Comp. Imag., to appear 2019 (see also arXiv:1809.00020).
 [41] A. Beck and M. Teboulle, “Gradientbased algorithms with applications to signal recovery,” in Convex Pptimization in Signal Processing and Communications, D. P. Palomar and Y. C. Eldar, Eds. Cambridge, 2009, pp. 42–88.
 [42] P. L. Combettes and J.C. Pesquet, “Proximal splitting methods in signal processing,” in FixedPoint Algorithms for Inverse Problems in Science and Engineering, H. Bauschke, R. Burachik, P. Combettes, V. Elser, D. Luke, and H. Wolkowicz, Eds. Springer, 2011, pp. 185–212.
 [43] Y. Nesterov, “A method for solving the convex programming problem with convergence rate O(1/k^2),” Soviet Math. Dokl., vol. 27, no. 2, pp. 372–376, 1983.
 [44] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18 914–18 919, Nov. 2009.
 [45] P. Schniter, “Turbo reconstruction of structured sparse signals,” in Proc. Conf. Inform. Science & Syst., Princeton, NJ, Mar. 2010, pp. 1–6.
 [46] S. Ramani, T. Blu, and M. Unser, “MonteCarlo SURE: A blackbox optimization of regularization parameters for general denoising algorithms,” IEEE Trans. Image Process., vol. 17, no. 9, pp. 1540–1554, 2008.
 [47] C. A. Metlzer, A. Maleki, and R. G. Baraniuk, “BM3DAMP: A new image recovery algorithm based on BM3D denoising,” in Proc. IEEE Int. Conf. Image Process., 2015, pp. 3116–3120.
 [48] C. A. Metzler, A. Maleki, and R. G. Baraniuk, “From denoising to compressed sensing,” IEEE Trans. Inform. Theory, vol. 62, no. 9, pp. 5117–5144, 2016.
 [49] L. Zdeborová and F. Krzakala, “Statistical physics of inference: Thresholds and algorithms,” Advances in Physics, vol. 65, no. 5, pp. 453–552, 2016.
 [50] A. Chambolle, R. A. De Vore, N.Y. Lee, and B. J. Lucier, “Nonlinear wavelet image processing: Variational problems, compression, and noise removal through wavelet shrinkage,” IEEE Trans. Image Process., vol. 7, no. 3, pp. 319–335, 1998.
 [51] R. Berthier, A. Montanari, and P.M. Nguyen, “State evolution for approximate message passing with nonseparable functions,” Inform. Inference, 2019.
 [52] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message passing,” arXiv:1610.03082, 2016.
 [53] P. Schniter, S. Rangan, and A. K. Fletcher, “Denoisingbased vector approximate message passing,” in Proc. Intl. Biomed. Astronom. Signal Process. (BASP) Frontiers Workshop, 2017.
 [54] A. K. Fletcher, S. Rangan, S. Sarkar, and P. Schniter, “Plugin estimation in highdimensional linear inverse problems: A rigorous analysis,” in Proc. Neural Inform. Process. Syst. Conf., 2018 (see also arXiv:1806.10466).
 [55] B. He, F. Ma, and X. Yuan, “Convergence study on the symmetric version of ADMM with larger step sizes,” SIAM J. Imag. Sci., vol. 9, no. 3, pp. 1467–1501, 2016.
 [56] A. K. Fletcher, M. SahraeeArdakan, S. Rangan, and P. Schniter, “Expectation consistent approximate inference: Generalizations and convergence,” arXiv:1602.07795, 2016.
 [57] A. Perelli, M. Lexa, A. Can, and M. Davies, “Denoising message passing for Xray computed tomography reconstruction,” in arXiv:1609.04661, 2016.
 [58] S. Sarkar, S. Rangan, A. K. Fletcher, and P. Schniter, “Bilinear recovery using adaptive vectorAMP,” in arXiv:1809.00024, 2018.
 [59] G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman, “Plugandplay unplugged: Optimizationfree reconstruction using consensus equilibrium,” SIAM J. Imag. Sci., vol. 11, no. 3, pp. 2001–2020, 2018.
 [60] X. Wang, J. Juang, and S. H. Chan, “Automatic foreground extraction using multiagent consensus equilibrium,” arXiv:1808.08210, 2018.
 [61] N. Parikh and S. Boyd, “Proximal algorithms,” Found. Trends Optim., vol. 3, no. 1, pp. 123–231, 2013.
 [62] A. M. Teodoro, J. M. BioucasDias, and M. A. T. Figueiredo, “Sceneadapted plugandplay algorithm with convergence guarantees,” in Proc. IEEE Int. Workshop Mach. Learn. Signal Process., Sep. 2017, pp. 1–6.
 [63] R. Ahmad, H. Xue, S. Giri, Y. Ding, J. Craft, and O. P. Simonetti, “Variable density incoherent spatiotemporal acquisition (VISTA) for highly accelerated cardiac MRI,” Magnetic Resonance Med., vol. 74, no. 5, pp. 1266–1278, 2015.
 [64] M. Buehrer, K. P. Pruessmann, P. Boesiger, and S. Kozerke, “Array compression for MRI with large coil arrays,” Magnetic Resonance Med., vol. 57, no. 6, pp. 1131–1139, 2007.

[65]
M. Uecker, P. Lai, M. J. Murphy, P. Virtue, M. Elad, J. M. Pauly, S. S. Vasanawala, and M. Lustig, “ESPIRiT—An eigenvalue approach to autocalibrating parallel MRI: Where SENSE meets GRAPPA,”
Magnetic Resonance Med., vol. 71, no. 3, pp. 990–1001, 2014.  [66] J. S. Lim, TwoDimensional Signal and Image Processing. Englewood Cliffs, NJ: Prentice Hall, 1990.