1 Introduction
The problem of recovering an underlying unknown image from noisy and/or incomplete measured data is fundamental in computational imaging, in applications including magnetic resonance imaging (MRI) (Fessler, 2010), computed tomography (CT) (Elbakri & Fessler, 2002), microscopy (Aguet et al., 2008; Zheng et al., 2013), and inverse scattering (Katz et al., 2014; Metzler et al., 2017b). This image recovery task is often formulated as an optimization problem that minimizes a cost function, i.e.,
\hat{x} = \arg\min_{x} f(x) + \lambda g(x),   (1)

where f(x) is a data-fidelity term that ensures consistency between the reconstructed image and the measured data, and g(x) is a regularizer that imposes prior knowledge, e.g. smoothness (Osher et al., 2005; Ma et al., 2008), sparsity (Yang et al., 2010; Liao & Sapiro, 2008; Ravishankar & Bresler, 2010), low rank (Semerci et al., 2014; Gu et al., 2017) and nonlocal self-similarity (Mairal et al., 2009; Qu et al., 2014), regarding the unknown image. The problem in Eq. (1) is often solved by first-order iterative proximal algorithms, e.g. the fast iterative shrinkage/thresholding algorithm (FISTA) (Beck & Teboulle, 2009) and the alternating direction method of multipliers (ADMM) (Boyd et al., 2011), which tackle the nonsmoothness of the regularizer.
To handle the nonsmoothness caused by the regularizer, first-order algorithms rely on proximal operators (Beck & Teboulle, 2009; Boyd et al., 2011; Chambolle & Pock, 2011; Parikh et al., 2014; Geman, 1995; Esser et al., 2010), defined by
\mathrm{prox}_{\sigma^2 g}(v) = \arg\min_{x} \frac{1}{2\sigma^2} \|x - v\|^2 + g(x).   (2)
Interestingly, since the proximal operator in (2) is mathematically equivalent to regularized Gaussian denoising, it can be replaced by any off-the-shelf denoiser operating at noise level \sigma, yielding a new framework named plug-and-play (PnP) prior (Venkatakrishnan et al., 2013). The resulting algorithms, e.g. PnP-ADMM, can be written as
x^{k+1} = H_{\sigma_k}(z^k - u^k),   (3)
z^{k+1} = \mathrm{prox}_{f/\mu_k}(x^{k+1} + u^k),   (4)
u^{k+1} = u^k + x^{k+1} - z^{k+1},   (5)

where k = 0, \dots, N-1 denotes the k-th iteration, N is the terminal time, H_{\sigma_k} is the plugged denoiser, and \sigma_k and \mu_k indicate the denoising strength (of the denoiser) and the penalty parameter used in the k-th iteration, respectively.
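The iterations (3)–(5) can be sketched as follows; `denoise` and `prox_f` are hypothetical stand-ins for the plugged denoiser H_{\sigma} and the data-fidelity proximal operator, so this is an illustrative skeleton rather than the exact implementation used in the paper:

```python
import numpy as np

def pnp_admm(y, denoise, prox_f, sigmas, mus, x0):
    """Sketch of PnP-ADMM, Eqs. (3)-(5): `denoise(v, sigma)` plays the role
    of the regularizer's proximal operator; `prox_f(v, y, mu)` enforces
    data consistency."""
    x = x0.copy()
    z = x0.copy()
    u = np.zeros_like(x0)
    for sigma_k, mu_k in zip(sigmas, mus):
        x = denoise(z - u, sigma_k)   # Eq. (3): denoising step
        z = prox_f(x + u, y, mu_k)    # Eq. (4): data-fidelity proximal step
        u = u + x - z                 # Eq. (5): dual update
    return x
```

With an identity "denoiser" and a quadratic data term, the iterates converge to the data-consistent solution, which makes the skeleton easy to sanity-check.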
In this formulation, the regularizer is implicitly defined by the plugged denoiser, which opens a new door to leveraging the vast progress made on image denoising to solve more general inverse imaging problems. Plugging well-known image denoisers, e.g. BM3D (Dabov et al., 2007) and NLM (Buades et al., 2005), into optimization algorithms often yields sizeable performance gains over explicitly defined regularizers such as total variation. As a standalone framework, PnP can thus combine the benefits of both deep-learning-based denoisers and optimization methods, e.g. (Zhang et al., 2017b; Rick Chang et al., 2017; Meinhardt et al., 2017): fast and effective inference, while circumventing the need for expensive network retraining whenever the specific problem changes.
Whilst the PnP framework offers promising image recovery results, a major drawback is that its performance is highly sensitive to the internal parameter selection, which generically includes the penalty parameter, the denoising strength (of the denoiser), and the terminal time. The body of literature often resorts to manual tweaking, e.g. (Rick Chang et al., 2017; Meinhardt et al., 2017), or handcrafted criteria, e.g. (Chan et al., 2017; Zhang et al., 2017b; Eksioglu, 2016; Tirer & Giryes, 2018), to select parameters for each specific problem setting. However, manual parameter tweaking requires several trials, which is cumbersome and time-consuming. Semi-automated handcrafted criteria (for example, monotonically decreasing the denoising strength) can to some degree ease the burden of exhaustively searching a large parameter space, but often lead to suboptimal local minima. Moreover, the optimal parameter setting differs image by image, depending on the measurement model, noise level, noise type and the unknown image itself. These differences can be observed in the detailed comparison in Fig. 1, where peak signal-to-noise ratio (PSNR) curves are displayed for four images under varying denoising strength.
This paper is devoted to addressing the aforementioned challenge: how to deal with the manual parameter tuning problem in a PnP framework. To this end, we formulate the internal parameter selection as a sequential decision-making problem, in which a policy selects a sequence of internal parameters to guide the optimization. Such a problem naturally fits into a reinforcement learning (RL) framework, where a policy agent seeks to map observations to actions with the aim of maximizing the cumulative reward. The reward encodes which behaviours are desirable for the agent: a high reward is obtained if the policy leads to faster convergence and better restoration accuracy.
We demonstrate, through extensive numerical and visual experiments, the advantage of our algorithmic approach on compressed sensing MRI and phase retrieval problems. We show that the policy well approximates the intrinsic function that maps an input state to its optimal parameter setting. By using the learned policy, the guided optimization can reach results comparable to those obtained with oracle parameters tuned via the inaccessible ground truth. An overview of our algorithm is shown in Fig. 2. Our contributions are as follows:

We present a tuning-free PnP algorithm that can customize parameters for diverse images, and that often demonstrates faster practical convergence and better empirical performance than handcrafted criteria.

We introduce an efficient mixed model-free and model-based RL algorithm that jointly optimizes the discrete terminal time and the continuous denoising strength/penalty parameters.

We validate our approach with an extensive range of numerical and visual experiments, showing how the performance of PnP is affected by its parameters, and demonstrate that our approach leads to better results than state-of-the-art techniques on compressed sensing MRI and phase retrieval.
2 Related Work
The body of literature has reported several PnP algorithmic techniques. In this section, we provide a short overview of these techniques.
Plug-and-play (PnP). The concept of PnP was first introduced in (Danielyan et al., 2010; Zoran & Weiss, 2011; Venkatakrishnan et al., 2013), and has attracted great attention owing to its effectiveness and flexibility in handling a wide range of inverse imaging problems. Following this philosophy, many works have been developed; they can be roughly categorized along four aspects: proximal algorithms, imaging applications, denoiser priors, and convergence. (i) Proximal algorithms include half-quadratic splitting (Zhang et al., 2017b), the primal-dual method (Ono, 2017), generalized approximate message passing (Metzler et al., 2016b) and the (stochastic) accelerated proximal gradient method (Sun et al., 2019a). (ii) Imaging applications include bright-field electron tomography (Sreehari et al., 2016), diffraction tomography (Sun et al., 2019a), low-dose CT imaging (He et al., 2018), compressed sensing MRI (Eksioglu, 2016), electron microscopy (Sreehari et al., 2017), single-photon imaging (Chan et al., 2017), phase retrieval (Metzler et al., 2018), Fourier ptychography microscopy (Sun et al., 2019b), light-field photography (Chun et al., 2019), hyperspectral sharpening (Teodoro et al., 2018), denoising (Rond et al., 2016), and image processing tasks such as demosaicking, deblurring, super-resolution and inpainting (Heide et al., 2014; Meinhardt et al., 2017; Zhang et al., 2019a; Tirer & Giryes, 2018). (iii) Denoiser priors include BM3D (Heide et al., 2014; Dar et al., 2016; Rond et al., 2016; Sreehari et al., 2016; Chan et al., 2017), nonlocal means (Venkatakrishnan et al., 2013; Heide et al., 2014; Sreehari et al., 2016; Teodoro et al., 2016, 2018), weighted nuclear norm minimization (Kamilov et al., 2017), and deep-learning-based denoisers (Meinhardt et al., 2017; Zhang et al., 2017b; Rick Chang et al., 2017). Finally, (iv) theoretical analyses of convergence rely on the symmetric gradient (Sreehari et al., 2016), the bounded denoiser (Chan et al., 2017) and nonexpansiveness assumptions (Sreehari et al., 2016; Teodoro et al., 2018; Sun et al., 2019a; Ryu et al., 2019; Chan, 2019). Differing from these aspects, in this work we focus on the challenge of parameter selection in PnP, where a bad choice of parameters often leads to severe degradation of the results (Romano et al., 2017; Chan et al., 2017). Unlike existing semi-automated parameter tuning criteria (Wang & Chan, 2017; Chan et al., 2017; Zhang et al., 2017b; Eksioglu, 2016; Tirer & Giryes, 2018), our method is fully automatic and purely learned from data, which significantly eases the burden of manual parameter tuning.
Automated Parameter Selection. Some works have considered automatic parameter selection in inverse problems. However, the prior term in these works is restricted to certain types of regularizers, e.g. Tikhonov regularization (Hansen & O'Leary, 1993; Golub et al., 1979), smoothed norm variants (Eldar, 2008; Giryes et al., 2011), or general convex functions (Ramani et al., 2012). To the best of our knowledge, none of these approaches is applicable to the PnP framework with its sophisticated, nonconvex and learned priors.
Deep Unrolling. Perhaps the concept most easily confused with PnP in the deep learning era is that of so-called deep unrolling methods (Gregor & LeCun, 2010; Hershey et al., 2014; Wang et al., 2016; Yang et al., 2016; Zhang & Ghanem, 2018; Diamond et al., 2017; Metzler et al., 2017a; Adler & Oktem, 2018; Dong et al., 2018; Xie et al., 2019), which explicitly unroll/truncate iterative optimization algorithms into learnable deep architectures. In this way, the penalty parameters (and the denoiser prior) are treated as trainable parameters, but the number of iterations has to be fixed to enable end-to-end training. By contrast, our PnP approach can adaptively select the stopping time and penalty parameters given varying input states, while using an off-the-shelf denoiser as the prior.
Reinforcement Learning for Image Recovery. Although reinforcement learning (RL) has been applied in a range of domains, from game playing (Mnih et al., 2013; Silver et al., 2016) to robotic control (Schulman et al., 2015), only a few works have successfully employed RL for image recovery tasks. Yu et al. (2018) learned an RL policy to select appropriate tools from a toolbox to progressively restore corrupted images. Zhang et al. (2019b) proposed a recurrent image restorer whose endpoint is dynamically controlled by a learned policy. Furuta et al. (2019) used RL to select a sequence of classic filters to process images gradually. Yu et al. (2019) learned network path selection for image restoration in a multi-path CNN. In contrast to these works, we apply a mixed model-free and model-based deep RL approach to automatically select the parameters of a PnP image recovery algorithm.
3 Tuning-free PnP Proximal Algorithm
In this section, we elaborate on our tuning-free PnP proximal algorithm based on (3)–(5). Our approach contains three main parts: first, we describe how the automated parameter selection is formulated; second, we introduce our environment model; and finally, we present the policy learning, which mixes model-free and model-based RL.
It is worth mentioning that our method is generic and is also applicable to PnP methods derived from other proximal algorithms, e.g. forward-backward splitting: although these are distinct methods, they share the same fixed points as PnP-ADMM (Meinhardt et al., 2017).
3.1 RL Formulation for Automated Parameter Selection
This work mainly focuses on the automated parameter selection problem in the PnP framework, where we aim to select a sequence of internal parameters (the denoising strengths, the penalty parameters, and the terminal time) to guide the optimization such that the recovered image is close to the underlying image. We formulate this problem as a Markov decision process (MDP), which can be addressed via reinforcement learning (RL).
We denote the MDP by the tuple (\mathcal{S}, \mathcal{A}, p, r), where \mathcal{S} is the state space, \mathcal{A} is the action space, p is the transition function describing the environment dynamics, and r is the reward function. Specifically, for our task, \mathcal{S} is the space of optimization variable states, which includes the initialization and all intermediate results of the optimization process. \mathcal{A} is the space of internal parameters, including both the discrete terminal time and the continuous denoising strength/penalty parameters (\sigma, \mu). The transition function maps an input state s_t to its outcome state s_{t+1} after taking action a_t; the state transition can be expressed as s_{t+1} = p(s_t, a_t), and is composed of one or several iterations of the optimization. On each transition, the environment emits a reward r_t = r(s_t, a_t), which evaluates the action given the state. Applying a sequence of parameters to the initial state results in a trajectory of states, actions and rewards: \tau = (s_0, a_0, r_0, s_1, a_1, r_1, \dots). Given a trajectory \tau, we define the return R_t as the summation of discounted rewards after time t,
R_t = \sum_{t'=t}^{N} \gamma^{t'-t} r_{t'},   (6)
where \gamma \in [0, 1] is a discount factor that prioritizes earlier rewards over later ones.
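Concretely, the return in (6) can be computed for every time step with a single backward pass over the reward sequence; a minimal sketch:

```python
def discounted_return(rewards, gamma):
    """Return R_t from Eq. (6) for every t, via the backward recursion
    R_t = r_t + gamma * R_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```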
Our goal is to learn a policy \pi for the decision-making agent that maximizes the objective

J(\pi) = \mathbb{E}_{s_0 \sim p(s_0)}\big[ R_0 \big],   (7)

where \mathbb{E} denotes expectation, s_0 is the initial state, and p(s_0) is the corresponding initial-state distribution. Intuitively, the objective describes the expected return over all possible trajectories induced by the policy \pi. The expected returns on states and state-action pairs under the policy are given by the state-value function and the action-value function respectively, i.e.,
V^{\pi}(s) = \mathbb{E}\big[ R_t \mid s_t = s \big],   (8)
Q^{\pi}(s, a) = \mathbb{E}\big[ R_t \mid s_t = s, a_t = a \big].   (9)
In our task, we decompose the action a into two parts: a discrete decision a^1 on the terminal time and a continuous decision a^2 on the denoising strength and penalty parameter. The policy likewise consists of two sub-policies: a stochastic policy \pi_1 and a deterministic policy \pi_2 that generate a^1 and a^2 respectively. The role of \pi_1 is to decide whether to terminate the iterative algorithm when the next state is reached: it samples a boolean-valued outcome a^1 from a two-class categorical distribution whose probability mass function is computed from the current state. We move forward to the next iteration if a^1 indicates continuation; otherwise the optimization is terminated and the final state is output. In contrast to the stochastic policy \pi_1, we treat \pi_2 deterministically, since a^2 is differentiable with respect to the environment, so that its gradient can be precisely estimated.
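The two sub-policies can be sketched as follows; the feature vector and the weight matrices are hypothetical stand-ins for a shared feature extractor and the two output heads, so this only illustrates the stochastic/deterministic split, not the actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(features, w_term, w_cont, sigma_max=1.0, mu_max=1.0):
    """Toy sketch of the two sub-policies. `features` stands in for the
    state encoding; `w_term` and `w_cont` are placeholder weights for the
    termination head and the continuous-parameter head."""
    # Stochastic sub-policy pi_1: sample a^1 from a two-class categorical
    # distribution obtained via a softmax over two logits.
    logits = features @ w_term                 # shape (2,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a1 = rng.choice(2, p=probs)                # 0 = continue, 1 = terminate
    # Deterministic sub-policy pi_2: sigmoid keeps outputs in a valid range.
    raw = features @ w_cont                    # shape (2,)
    sigma, mu = 1.0 / (1.0 + np.exp(-raw))     # each in (0, 1)
    return a1, sigma * sigma_max, mu * mu_max
```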
3.2 Environment Model
In RL, the environment is characterized by two components: the environment dynamics and the reward function. In our task, the environment dynamics is described by the transition function associated with PnP-ADMM. Here, we elucidate the detailed setting of PnP-ADMM as well as the reward function used for training the policy.
Denoiser prior. A differentiable environment makes policy learning more efficient. To make the environment differentiable with respect to a^2 (a^1 is non-differentiable with respect to the environment regardless of how the environment is formulated), we take a convolutional neural network (CNN) denoiser as the image prior. In practice, we use a residual U-Net (Ronneberger et al., 2015) architecture, originally designed for medical image segmentation but recently found to be useful in image denoising as well. Besides, we incorporate an additional tunable noise level map into the input, as in (Zhang et al., 2018), enabling continuous noise level control (i.e. different denoising strengths) within a single network.

Proximal operator of the data-fidelity term. Enforcing consistency with the measured data requires evaluating the proximal operator in (4). For some inverse problems, fast solutions exist owing to the special structure of the observation model. We adopt the fast solution when feasible (e.g. a closed-form solution using the fast Fourier transform, rather than general matrix inversion); otherwise a single step of gradient descent is performed as an inexact solution of (4).

Transition function. To reduce computation cost, we define the transition function to comprise K iterations of the optimization, so at each time step the agent decides the internal parameters for the next K iterates. We set K and the maximum number of time steps in our algorithm such that at most 30 iterations of the optimization are performed.
Reward function. To take both image recovery performance and runtime efficiency into account, we define the reward function as

r_t = \zeta(s_{t+1}) - \zeta(s_t) - \eta,   (10)

where \zeta(s_t) denotes the PSNR of the recovered image at step t. The first term, \zeta(s_{t+1}) - \zeta(s_t), is the PSNR increment made by the policy: a higher reward is acquired if the policy leads to a higher performance gain in terms of PSNR. The second term, -\eta, penalizes the policy for not choosing to terminate at step t, where \eta sets the degree of penalty. A negative reward is given if the PSNR gain does not exceed the penalty, thereby encouraging the policy to stop the iterations early once returns diminish.
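A sketch of the reward in (10), with an illustrative PSNR helper; the peak value and all numbers here are assumptions for demonstration, not the paper's settings:

```python
import numpy as np

def psnr(x, ref, peak=1.0):
    """PSNR in dB between an estimate and a reference image."""
    mse = np.mean((x - ref) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def reward(state, next_state, ground_truth, eta):
    """Reward of Eq. (10): PSNR gain over one transition minus a constant
    time penalty eta."""
    gain = psnr(next_state, ground_truth) - psnr(state, ground_truth)
    return gain - eta
```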
(The choice of the hyper-parameters, e.g. \eta and K, is discussed in the suppl. material.)

3.3 RL-based Policy Learning
In this section, we present a mixed model-free and model-based RL algorithm to learn the policy. Specifically, model-free RL (agnostic to the environment dynamics) is used to train \pi_1, while model-based RL is utilized to optimize \pi_2 so as to make full use of the environment model (\pi_2 can also be optimized in a model-free manner; the comparison can be found in Section 4.2). We apply the actor-critic framework (Sutton et al., 2000), which uses a policy network (actor) and a value network (critic) to represent the policy and the state-value function respectively (details of the networks are given in the suppl. material). The policy and value networks are learned in an interleaved manner. For each gradient step, we optimize the value network parameters \omega by minimizing

L(\omega) = \mathbb{E}_{s \sim \mathcal{D}} \big[ ( V_{\omega}(s) - \hat{V}(s) )^2 \big], \quad \hat{V}(s) = r(s, a) + \gamma V_{\omega'}(s'),   (11)
where \mathcal{D} is the distribution of previously sampled states, practically implemented by a state buffer, and s' is the state following s. This partly serves the role of the experience replay mechanism (Lin, 1992), which has been observed to "smooth" the training data distribution (Mnih et al., 2013). The update makes use of a target value network V_{\omega'}, where \omega' is an exponentially moving average of the value network weights; this has been shown to stabilize training (Mnih et al., 2015).
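The critic-side quantities can be sketched as follows, using scalar weights for illustration; `tau` is a hypothetical smoothing rate, and the bootstrapped target mirrors (11) under our assumed form:

```python
def td_target(r, v_next, gamma):
    """Bootstrapped target V_hat(s) = r + gamma * V_{omega'}(s'), where
    v_next is the *target* network's estimate of the next state's value."""
    return r + gamma * v_next

def ema_update(target_w, online_w, tau=0.005):
    """Exponential moving average of the value-network weights, producing
    the target-network weights omega'."""
    return [(1 - tau) * tw + tau * ow for tw, ow in zip(target_w, online_w)]
```

In practice the squared difference between the online network's prediction and `td_target` is minimized, while `ema_update` is applied after every gradient step.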
The policy network comprises the two sub-policies, which employ shared convolutional layers to extract image features, followed by two separate groups of fully-connected layers that produce the termination probability (after a softmax) and the denoising strength/penalty parameters (after a sigmoid). We denote the parameters of the sub-policies \pi_1 and \pi_2 by \theta_1 and \theta_2 respectively, and we seek to optimize them so that the objective J is maximized. The policy network is trained using policy gradient methods (Peters & Schaal, 2006). The gradient with respect to \theta_1 is estimated in a model-free manner by a likelihood-ratio estimator, while the gradient with respect to \theta_2 is estimated by backpropagation through the environment dynamics in a model-based manner. Specifically, for the discrete terminal-time decision a^1, we apply the policy gradient theorem (Sutton et al., 2000) to obtain an unbiased Monte Carlo estimate of \nabla_{\theta_1} J, using the advantage function as the target, i.e.,

\nabla_{\theta_1} J = \mathbb{E}\big[ A(s, a^1) \nabla_{\theta_1} \log \pi_1(a^1 \mid s) \big], \quad A(s, a^1) = r(s, a) + \gamma V_{\omega}(s') - V_{\omega}(s).   (12)
For the continuous denoising strength and penalty parameter selection a^2, we utilize the deterministic policy gradient theorem (Silver et al., 2014) to formulate its gradient, i.e.,

\nabla_{\theta_2} J = \mathbb{E}\big[ \nabla_{a^2} Q(s, a^2) \nabla_{\theta_2} \pi_2(s) \big],   (13)

where we approximate the action-value function by its one-step unfolded definition, Q(s, a^2) \approx r(s, a^2) + \gamma V_{\omega}(s').
Using the chain rule, we can then obtain the gradient of \theta_2 directly by backpropagation through the reward function, the value network and the transition function, in contrast to relying only on the gradient backpropagated from a learned action-value function as in the model-free DDPG algorithm (Lillicrap et al., 2016).

Table 1. Comparison of denoiser priors: Gaussian denoising performance, PnP (CS-MRI) performance, and runtime.

Performance     | DnCNN | MemNet | UNet
Denoising perf. | 27.18 | 27.32  | 27.40
PnP perf.       | 25.43 | 25.67  | 25.76
Time            | 8.09  | 64.65  | 5.65
Table 2. Comparison of different policies (PSNR / #IT. under three test settings; rows marked * use optimal early stopping).

Policies        | PSNR  | #IT. | PSNR  | #IT. | PSNR  | #IT.
handcrafted     | 30.05 | 30.0 | 27.90 | 30.0 | 25.76 | 30.0
handcrafted*    | 30.06 | 29.1 | 28.20 | 18.4 | 26.06 | 19.4
fixed           | 23.94 | 30.0 | 24.26 | 30.0 | 22.78 | 30.0
fixed*          | 28.45 |  1.6 | 26.67 |  3.4 | 24.19 |  7.3
fixed optimal   | 30.02 | 30.0 | 28.27 | 30.0 | 26.08 | 16.7
fixed optimal*  | 30.03 |  6.7 | 28.34 | 12.6 | 26.16 | 30.0
oracle          | 30.25 | 30.0 | 28.60 | 30.0 | 26.41 | 30.0
oracle*         | 30.26 |  8.0 | 28.61 | 13.9 | 26.45 | 21.6
model-free      | 28.79 | 30.0 | 27.95 | 30.0 | 26.15 | 30.0
Ours            | 30.33 |  5.0 | 28.42 |  5.0 | 26.44 | 15.0
Table 3. PSNR comparison on CS-MRI between traditional methods (RecPF, FCSA), deep unrolling methods (ADMM-Net, ISTA-Net) and PnP methods (BM3D-MRI, IRCNN, Ours) on the Medical7 and MICCAI datasets. [Numerical entries were not recovered from the source.]
4 Experiments
In this section, we detail the experiments and evaluate our proposed algorithm. We mainly focus on the tasks of compressed sensing MRI (CS-MRI) and phase retrieval (PR), which are representative linear and nonlinear inverse imaging problems, respectively.
4.1 Implementation Details
Our algorithm requires two training processes: one for the denoising network and one for the policy network (and value network). For training the denoising network, we follow the common practice of using 87,000 overlapping patches drawn from 400 images of the BSD dataset (Martin et al., 2001). For each patch, we add white Gaussian noise with a randomly sampled noise level. The denoising networks are trained for 50 epochs using the Adam optimizer (Kingma & Ba, 2014) with batch size 32; the base learning rate is halved at epoch 30 and reduced again at epoch 40.

To train the policy network and value network, we use 17,125 resized images from the PASCAL VOC dataset (Everingham et al., 2014). Both networks are trained using the Adam optimizer with batch size 48 for 1500 iterations, with separate base learning rates for the policy and value networks that are reduced at iteration 1000. We perform 10 gradient steps at every iteration.
For the CS-MRI application, a single policy network is trained to handle multiple sampling ratios (x2/x4/x8 acceleration) and noise levels (5/10/15) simultaneously. Similarly, one policy network is learned for phase retrieval under its different settings.
[Fig. 3: visual comparison on CS-MRI. Per-image PSNR for three examples — RecPF: 22.57 / 18.74 / 24.89; FCSA: 22.27 / 19.23 / 24.47; ADMM-Net: 24.15 / 20.48 / 26.85; ISTA-Net: 24.61 / 21.37 / 27.90; BM3D-MRI: 23.64 / 20.62 / 26.72; IRCNN: 24.16 / 20.91 / 27.74; Ours: 25.28 / 22.02 / 28.65.]
4.2 Compressed sensing MRI
The forward model of CS-MRI can be mathematically described as y = F_p x + n, where x is the underlying image, the operator F_p denotes the partially-sampled Fourier transform, and n is additive white Gaussian noise. The data-fidelity term is f(x) = (1/2) \|F_p x - y\|^2, whose proximal operator admits the closed-form solution given in (Eksioglu, 2016).
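Because this data-fidelity term decouples in the Fourier domain, its proximal step has a closed form. A minimal sketch assuming an orthonormal FFT and a binary sampling mask (function and argument names are ours, not from the cited reference):

```python
import numpy as np

def prox_csmri(v, y, mask, mu):
    """Closed-form data-fidelity proximal step for CS-MRI:
    solves min_x 0.5*||mask * F(x) - y||^2 + (mu/2)*||x - v||^2
    in the Fourier domain. `y` holds the measured k-space samples
    (zero off the mask); `mask` is a 0/1 array."""
    V = np.fft.fft2(v, norm="ortho")
    X = (y + mu * V) / (mask + mu)   # elementwise normal-equation solve
    return np.fft.ifft2(X, norm="ortho").real
```

Setting the first-order condition to zero per Fourier coefficient gives (mask + mu) X = y + mu V, which is exactly the elementwise division above.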
Denoiser priors. To show how denoiser priors affect the performance of PnP, we train three state-of-the-art CNN-based denoisers, i.e. DnCNN (Zhang et al., 2017a), MemNet (Tai et al., 2017) and residual U-Net (Ronneberger et al., 2015), each with a tunable noise level map. We compare both the Gaussian denoising performance and the PnP performance of these denoisers (exhaustively searching for the best denoising strength/penalty parameters, to exclude the impact of the internal parameters). As shown in Table 1, the residual U-Net and MemNet consistently outperform DnCNN on both denoising and CS-MRI, which seems to imply that a better Gaussian denoiser is also a better denoiser prior for the PnP framework (further investigation of this argument can be found in the suppl. material). Since the U-Net is significantly faster than MemNet, we choose it as our denoiser prior.
[Fig. 4: visual comparison on phase retrieval. Per-image PSNR for two examples — HIO: 14.40 / 15.10; WF: 15.52 / 16.27; DOLPHIn: 19.35 / 19.62; SPAR: 22.48 / 22.51; BM3D-prGAMP: 25.66 / 23.61; prDeep: 27.72 / 24.59; Ours: 28.01 / 25.12.]
Table 4. PSNR comparison on phase retrieval under three noise levels (level labels were not recovered from the source).

Algorithms  | PSNR  | PSNR  | PSNR
HIO         | 35.96 | 25.76 | 14.82
WF          | 34.46 | 24.96 | 15.76
DOLPHIn     | 29.93 | 27.45 | 19.35
SPAR        | 35.20 | 31.82 | 22.44
BM3D-prGAMP | 40.25 | 32.84 | 25.43
prDeep      | 39.70 | 33.54 | 26.82
Ours        | 40.33 | 33.90 | 27.23
Comparisons of different policies. We start by giving some insight into our learned policy by comparing the performance of PnP-ADMM under different policies: i) the handcrafted policy used in IRCNN (Zhang et al., 2017b); ii) the fixed policy that uses fixed parameters (\sigma, \mu); iii) the fixed optimal policy that adopts fixed parameters searched to maximize the average PSNR across all testing images; iv) the oracle policy that uses different parameters for different images such that the PSNR of each image is maximized; and v) our learned policy, which uses the policy network to select parameters for each image. We remark that all compared policies are run for 30 iterations, whilst ours automatically chooses the terminal time.

To understand the usefulness of the early stopping mechanism, we also report the results of these policies with optimal early stopping (note that some policies, e.g. "fixed optimal" and "oracle", require access to the ground truth to determine parameters, which is generally impractical in real testing scenarios). Moreover, we analyze whether model-based RL benefits our algorithm by comparing it against a policy learned purely by model-free RL, whose \pi_2 is optimized using the model-free DDPG algorithm (Lillicrap et al., 2016).
The results of all aforementioned policies are provided in Table 2. We can see that a bad choice of parameters ("fixed") induces poor results, in which case early stopping is much needed to rescue performance. When the parameters are properly assigned, early stopping is still helpful to reduce computation cost. Our learned policy leads to fast practical convergence as well as excellent performance, sometimes even outperforming the oracle policy tuned via the inaccessible ground truth. We attribute this to the parameters varying across iterations in our algorithm, which yields extra flexibility compared with parameters held constant over iterations. Besides, we find that the policy learned in a purely model-free manner produces suboptimal denoising strength/penalty parameters compared with our mixed model-free and model-based policy, and it also fails to learn the early stopping behavior.
Comparisons with the state of the art. We compare our method against six state-of-the-art methods for CS-MRI, including traditional optimization-based approaches (RecPF (Yang et al., 2010) and FCSA (Huang et al., 2010)), PnP approaches (BM3D-MRI (Eksioglu, 2016) and IRCNN (Zhang et al., 2017b)), and deep unrolling approaches (ADMM-Net (Yang et al., 2016) and ISTA-Net (Zhang & Ghanem, 2018)). To keep the comparison fair, for each deep unrolling method only a single network is trained to tackle all cases, using the same dataset as ours. Table 3 shows the performance on two sets of medical images: 7 widely used medical images (Medical7) (Huang et al., 2010) and 50 medical images from the MICCAI 2013 grand challenge dataset (https://my.vanderbilt.edu/masi/). The visual comparison can be found in Fig. 3. Our approach outperforms the state-of-the-art PnP method (IRCNN) by a large margin, especially in the difficult cases. In simple cases, our algorithm runs only 5 iterations to arrive at the desired performance, in contrast with the 30 or 70 iterations required by IRCNN and BM3D-MRI respectively.
4.3 Phase retrieval
The goal of phase retrieval (PR) is to recover the underlying image from only the amplitude, or intensity, of the output of a complex linear system. Mathematically, PR can be defined as the problem of recovering a signal x from measurements of the form y = |Ax| + w, where the measurement matrix A represents the forward operator of the system and w represents shot noise. We approximate the shot noise with additive Gaussian noise, whose strength controls the signal-to-noise ratio in this problem.
We test the algorithms with coded diffraction patterns (CDP) (Candès et al., 2015). Multiple measurements with different random spatial light modulator (SLM) patterns are recorded; we model the capture of four measurements using a phase-only SLM, as in (Metzler et al., 2018). Each measurement operator can be mathematically described as A_l = F D_l, where F represents the 2D Fourier transform and D_l is a diagonal matrix whose nonzero elements are drawn uniformly from the unit circle in the complex plane.
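A sketch of this measurement model; the function names are ours, and the orthonormal FFT plays the role of F:

```python
import numpy as np

def random_patterns(shape, num, rng):
    """Phase-only SLM patterns: entries drawn uniformly from the unit
    circle in the complex plane (the diagonals of the matrices D_l)."""
    return [np.exp(2j * np.pi * rng.random(shape)) for _ in range(num)]

def cdp_measure(x, patterns):
    """Noiseless CDP magnitude measurements |A_l x| with A_l = F D_l:
    modulate by each pattern, apply the 2D FFT, and keep the magnitude."""
    return [np.abs(np.fft.fft2(d * x, norm="ortho")) for d in patterns]
```

Because each D_l has unit-modulus entries and the orthonormal FFT preserves energy, each measurement carries the full signal energy, which is a convenient sanity check.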
We compare our method with three classic approaches (HIO (Fienup, 1982), WF (Candes et al., 2014), and DOLPHIn (Mairal et al., 2016)) and three PnP approaches (SPAR (Katkovnik, 2017), BM3D-prGAMP (Metzler et al., 2016a) and prDeep (Metzler et al., 2018)). Table 4 and Fig. 4 summarize the results of all competing methods on the twelve images used in (Metzler et al., 2018). Our method again delivers state-of-the-art performance on this nonlinear inverse problem, and produces cleaner and clearer results than the other competing methods.
5 Conclusion
In this work, we introduced RL into the PnP framework, yielding a novel tuning-free PnP proximal algorithm for a wide range of inverse imaging problems. The main strength of our approach is the policy network, which can customize well-suited parameters for different images. Through numerical experiments, we demonstrated that our learned policy often generates highly effective parameters, frequently reaching performance comparable to that of the "oracle" parameters tuned via the inaccessible ground truth.
References
 Adler & Oktem (2018) Adler, J. and Oktem, O. Learned primal-dual reconstruction. IEEE Transactions on Medical Imaging, 37(6):1322–1332, 2018.
 Aguet et al. (2008) Aguet, F., Van De Ville, D., and Unser, M. Model-based 2.5-D deconvolution for extended depth of field in bright-field microscopy. IEEE Transactions on Image Processing, 17(7):1144–1153, 2008.
 Beck & Teboulle (2009) Beck, A. and Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
 Boyd et al. (2011) Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.

 Buades et al. (2005) Buades, A., Coll, B., and Morel, J.-M. A non-local algorithm for image denoising. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 60–65, 2005.
 Candes et al. (2014) Candes, E., Li, X., and Soltanolkotabi, M. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61, 07 2014.
 Candès et al. (2015) Candès, E. J., Li, X., and Soltanolkotabi, M. Phase retrieval from coded diffraction patterns. Applied and Computational Harmonic Analysis, 39(2):277–299, 2015.
 Chambolle & Pock (2011) Chambolle, A. and Pock, T. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
 Chan (2019) Chan, S. H. Performance analysis of plug-and-play ADMM: A graph signal processing perspective. IEEE Transactions on Computational Imaging, 5(2):274–286, 2019.
 Chan et al. (2017) Chan, S. H., Wang, X., and Elgendy, O. A. Plug-and-play ADMM for image restoration: Fixed-point convergence and applications. IEEE Transactions on Computational Imaging, 3(1):84–98, 2017.
 Chun et al. (2019) Chun, I. Y., Huang, Z., Lim, H., and Fessler, J. A. Momentumnet: Fast and convergent iterative neural network for inverse problems. arXiv preprint arXiv:1907.11818, 2019.
 Dabov et al. (2007) Dabov, K., Foi, A., Katkovnik, V., and Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080, 2007.
 Danielyan et al. (2010) Danielyan, A., Katkovnik, V., and Egiazarian, K. Image deblurring by augmented lagrangian with bm3d frame prior. In Workshop on Information Theoretic Methods in Science and Engineering, pp. 16–18, 2010.
 Dar et al. (2016) Dar, Y., Bruckstein, A. M., Elad, M., and Giryes, R. Postprocessing of compressed images via sequential denoising. IEEE Transactions on Image Processing, 25(7):3044–3058, 2016.
 Diamond et al. (2017) Diamond, S., Sitzmann, V., Heide, F., and Wetzstein, G. Unrolled optimization with deep priors. arXiv preprint arXiv:1705.08041, 2017.
 Dong et al. (2018) Dong, W., Wang, P., Yin, W., Shi, G., Wu, F., and Lu, X. Denoising prior driven deep neural network for image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(10):2305–2318, 2018.
Eksioglu (2016) Eksioglu, E. M. Decoupled algorithm for MRI reconstruction using nonlocal block matching model: BM3D-MRI. Journal of Mathematical Imaging and Vision, 56(3):430–440, 2016.
Elbakri & Fessler (2002) Elbakri, I. A. and Fessler, J. A. Segmentation-free statistical image reconstruction for polyenergetic X-ray computed tomography. In IEEE International Symposium on Biomedical Imaging, pp. 828–831, 2002.
Eldar (2008) Eldar, Y. C. Generalized SURE for exponential families: Applications to regularization. IEEE Transactions on Signal Processing, 57(2):471–481, 2008.
Esser et al. (2010) Esser, E., Zhang, X., and Chan, T. F. A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences, 3(4):1015–1046, 2010.
Everingham et al. (2014) Everingham, M., Eslami, S., Van Gool, L., Williams, C., Winn, J., and Zisserman, A. The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111, 2014.
Fessler (2010) Fessler, J. A. Model-based image reconstruction for MRI. IEEE Signal Processing Magazine, 27(4):81–89, 2010.
 Fienup (1982) Fienup, J. R. Phase retrieval algorithms: a comparison. Applied Optics, 21(15):2758–2769, 1982.

Furuta et al. (2019) Furuta, R., Inoue, N., and Yamasaki, T. Fully convolutional network with multi-step reinforcement learning for image processing. In AAAI Conference on Artificial Intelligence, pp. 3598–3605, 2019.
Geman (1995) Geman, D. Nonlinear image recovery with half-quadratic regularization. IEEE Transactions on Image Processing, 4(7):932–946, 1995.
Giryes et al. (2011) Giryes, R., Elad, M., and Eldar, Y. C. The projected GSURE for automatic parameter tuning in iterative shrinkage methods. Applied and Computational Harmonic Analysis, 30(3):407–422, 2011.
Golub et al. (1979) Golub, G. H., Heath, M., and Wahba, G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21(2):215–223, 1979.
 Gregor & LeCun (2010) Gregor, K. and LeCun, Y. Learning fast approximations of sparse coding. In International Conference on Machine Learning (ICML), pp. 399–406, 2010.
 Gu et al. (2017) Gu, S., Xie, Q., Meng, D., Zuo, W., Feng, X., and Zhang, L. Weighted nuclear norm minimization and its applications to low level vision. International Journal of Computer Vision, 121(2):183–208, 2017.
Hansen & O'Leary (1993) Hansen, P. C. and O'Leary, D. P. The use of the L-curve in the regularization of discrete ill-posed problems. SIAM Journal on Scientific Computing, 14(6):1487–1503, 1993.
He et al. (2018) He, J., Yang, Y., Wang, Y., Zeng, D., Bian, Z., Zhang, H., Sun, J., Xu, Z., and Ma, J. Optimizing a parameterized plug-and-play ADMM for iterative low-dose CT reconstruction. IEEE Transactions on Medical Imaging, 38(2):371–382, 2018.
Heide et al. (2014) Heide, F., Steinberger, M., Tsai, Y.-T., Rouf, M., Pajak, D., Reddy, D., Gallo, O., Liu, J., Heidrich, W., Egiazarian, K., et al. FlexISP: A flexible camera image processing framework. ACM Transactions on Graphics, 33(6):231, 2014.
Hershey et al. (2014) Hershey, J. R., Roux, J. L., and Weninger, F. Deep unfolding: Model-based inspiration of novel deep architectures. arXiv preprint arXiv:1409.2574, 2014.
Huang et al. (2010) Huang, J., Zhang, S., and Metaxas, D. Efficient MR image reconstruction for compressed MR imaging. Medical Image Analysis, 15:135–142, 2010.
Kamilov et al. (2017) Kamilov, U. S., Mansour, H., and Wohlberg, B. A plug-and-play priors approach for solving nonlinear imaging inverse problems. IEEE Signal Processing Letters, 24(12):1872–1876, 2017.
 Katkovnik (2017) Katkovnik, V. Phase retrieval from noisy data based on sparse approximation of object phase and amplitude. arXiv preprint arXiv:1709.01071, 2017.
Katz et al. (2014) Katz, O., Heidmann, P., Fink, M., and Gigan, S. Non-invasive single-shot imaging through scattering layers and around corners via speckle correlations. Nature Photonics, 8(10):784, 2014.
 Kingma & Ba (2014) Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Liao & Sapiro (2008) Liao, H. Y. and Sapiro, G. Sparse representations for limited data tomography. In IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 1375–1378. IEEE, 2008.
Lillicrap et al. (2016) Lillicrap, T., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. In International Conference on Learning Representations (ICLR), 2016.
Lin (1992) Lin, L. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3):293–321, 1992.
Ma et al. (2008) Ma, S., Yin, W., Zhang, Y., and Chakraborty, A. An efficient algorithm for compressed MR imaging using total variation and wavelets. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8. IEEE, 2008.
Mairal et al. (2016) Tillmann, A. M., Eldar, Y. C., and Mairal, J. DOLPHIn: Dictionary learning for phase retrieval. IEEE Transactions on Signal Processing, 2016.
Mairal et al. (2009) Mairal, J., Bach, F. R., Ponce, J., Sapiro, G., and Zisserman, A. Non-local sparse models for image restoration. In IEEE International Conference on Computer Vision (ICCV), volume 29, pp. 54–62, 2009.
 Martin et al. (2001) Martin, D., Fowlkes, C., Tal, D., and Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In IEEE International Conference on Computer Vision (ICCV), pp. 416–423, 2001.
 Meinhardt et al. (2017) Meinhardt, T., Moller, M., Hazirbas, C., and Cremers, D. Learning proximal operators: Using denoising networks for regularizing inverse imaging problems. In IEEE International Conference on Computer Vision (ICCV), Oct 2017.
Metzler et al. (2017a) Metzler, C., Mousavi, A., and Baraniuk, R. Learned D-AMP: Principled neural network based compressive image recovery. In Advances in Neural Information Processing Systems (NIPS), pp. 1772–1783, 2017a.
Metzler et al. (2018) Metzler, C., Schniter, P., Veeraraghavan, A., et al. prDeep: Robust phase retrieval with a flexible deep network. In International Conference on Machine Learning (ICML), pp. 3498–3507, 2018.
Metzler et al. (2016a) Metzler, C. A., Maleki, A., and Baraniuk, R. G. BM3D-PRGAMP: Compressive phase retrieval based on BM3D denoising. In IEEE International Conference on Image Processing, 2016a.
 Metzler et al. (2016b) Metzler, C. A., Maleki, A., and Baraniuk, R. G. From denoising to compressed sensing. IEEE Transactions on Information Theory, 62(9):5117–5144, 2016b.
 Metzler et al. (2017b) Metzler, C. A., Sharma, M. K., Nagesh, S., Baraniuk, R. G., Cossairt, O., and Veeraraghavan, A. Coherent inverse scattering via transmission matrices: Efficient phase retrieval algorithms and a public dataset. In IEEE International Conference on Computational Photography (ICCP), pp. 1–16, 2017b.
Mnih et al. (2013) Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
Mnih et al. (2015) Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
Ono (2017) Ono, S. Primal-dual plug-and-play image restoration. IEEE Signal Processing Letters, 24(8):1108–1112, 2017.
Osher et al. (2005) Osher, S., Burger, M., Goldfarb, D., Xu, J., and Yin, W. An iterative regularization method for total variation-based image restoration. Multiscale Modeling and Simulation, 4(2):460–489, 2005.
 Parikh et al. (2014) Parikh, N., Boyd, S., et al. Proximal algorithms. Foundations and Trends® in Optimization, 1(3):127–239, 2014.
Peters & Schaal (2006) Peters, J. and Schaal, S. Policy gradient methods for robotics. In International Conference on Intelligent Robots and Systems (IROS), pp. 2219–2225, 2006.
 Qu et al. (2014) Qu, X., Hou, Y., Lam, F., Guo, D., Zhong, J., and Chen, Z. Magnetic resonance image reconstruction from undersampled measurements using a patchbased nonlocal operator. Medical Image Analysis, 18(6):843–856, 2014.
Ramani et al. (2012) Ramani, S., Liu, Z., Rosen, J., Nielsen, J.-F., and Fessler, J. A. Regularization parameter selection for nonlinear iterative image restoration and MRI reconstruction using GCV and SURE-based methods. IEEE Transactions on Image Processing, 21(8):3659–3672, 2012.
Ravishankar & Bresler (2010) Ravishankar, S. and Bresler, Y. MR image reconstruction from highly undersampled k-space data by dictionary learning. IEEE Transactions on Medical Imaging, 30(5):1028–1041, 2010.
 Rick Chang et al. (2017) Rick Chang, J. H., Li, C.L., Poczos, B., Vijaya Kumar, B. V. K., and Sankaranarayanan, A. C. One network to solve them all – solving linear inverse problems using deep projection models. In IEEE International Conference on Computer Vision (ICCV), 2017.
Romano et al. (2017) Romano, Y., Elad, M., and Milanfar, P. The little engine that could: Regularization by denoising (RED). SIAM Journal on Imaging Sciences, 10(4):1804–1844, 2017.
Rond et al. (2016) Rond, A., Giryes, R., and Elad, M. Poisson inverse problems by the plug-and-play scheme. Journal of Visual Communication and Image Representation, 41:96–108, 2016.
Ronneberger et al. (2015) Ronneberger, O., Fischer, P., and Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, 2015.
Ryu et al. (2019) Ryu, E., Liu, J., Wang, S., Chen, X., Wang, Z., and Yin, W. Plug-and-play methods provably converge with properly trained denoisers. In International Conference on Machine Learning (ICML), pp. 5546–5557, 2019.
 Schulman et al. (2015) Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. Trust region policy optimization. In International Conference on Machine Learning (ICML), pp. 1889–1897, 2015.
Semerci et al. (2014) Semerci, O., Hao, N., Kilmer, M. E., and Miller, E. L. Tensor-based formulation and nuclear norm regularization for multi-energy computed tomography. IEEE Transactions on Image Processing, 23(4):1678–1693, 2014.
Silver et al. (2014) Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. Deterministic policy gradient algorithms. In International Conference on Machine Learning (ICML), 2014.
Silver et al. (2016) Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484, 2016.

Sreehari et al. (2016) Sreehari, S., Venkatakrishnan, S. V., Wohlberg, B., Buzzard, G. T., Drummy, L. F., Simmons, J. P., and Bouman, C. A. Plug-and-play priors for bright field electron tomography and sparse interpolation. IEEE Transactions on Computational Imaging, 2(4):408–423, 2016.
Sreehari et al. (2017) Sreehari, S., Venkatakrishnan, S., Bouman, K. L., Simmons, J. P., Drummy, L. F., and Bouman, C. A. Multi-resolution data fusion for super-resolution electron microscopy. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 88–96, 2017.
Sun et al. (2019a) Sun, Y., Wohlberg, B., and Kamilov, U. S. An online plug-and-play algorithm for regularized image reconstruction. IEEE Transactions on Computational Imaging, 2019a.
Sun et al. (2019b) Sun, Y., Xu, S., Li, Y., Tian, L., Wohlberg, B., and Kamilov, U. S. Regularized Fourier ptychography using an online plug-and-play algorithm. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7665–7669, 2019b.
Sutton et al. (2000) Sutton, R., McAllester, D., Singh, S., and Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS), 2000.
Tai et al. (2017) Tai, Y., Yang, J., Liu, X., and Xu, C. MemNet: A persistent memory network for image restoration. In IEEE International Conference on Computer Vision (ICCV), Oct 2017.
Teodoro et al. (2016) Teodoro, A. M., Bioucas-Dias, J. M., and Figueiredo, M. A. Image restoration and reconstruction using variable splitting and class-adapted image priors. In IEEE International Conference on Image Processing, pp. 3518–3522, 2016.
Teodoro et al. (2018) Teodoro, A. M., Bioucas-Dias, J. M., and Figueiredo, M. A. A convergent image fusion algorithm using scene-adapted Gaussian-mixture-based denoising. IEEE Transactions on Image Processing, 28(1):451–463, 2018.
 Tirer & Giryes (2018) Tirer, T. and Giryes, R. Image restoration by iterative denoising and backward projections. IEEE Transactions on Image Processing, 28(3):1220–1234, 2018.
 Venkatakrishnan et al. (2013) Venkatakrishnan, S. V., Bouman, C. A., and Wohlberg, B. Plugandplay priors for model based reconstruction. In IEEE Global Conference on Signal and Information Processing, pp. 945–948, 2013.
 Wang et al. (2016) Wang, S., Fidler, S., and Urtasun, R. Proximal deep structured models. In Advances in Neural Information Processing Systems (NIPS), pp. 865–873, 2016.
Wang & Chan (2017) Wang, X. and Chan, S. H. Parameter-free plug-and-play ADMM for image restoration. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1323–1327, 2017.
Xie et al. (2019) Xie, X., Wu, J., Liu, G., Zhong, Z., and Lin, Z. Differentiable linearized ADMM. In International Conference on Machine Learning (ICML), pp. 6902–6911, 2019.
Yang et al. (2010) Yang, J., Zhang, Y., and Yin, W. A fast alternating direction method for TVL1-L2 signal reconstruction from partial Fourier data. IEEE Journal of Selected Topics in Signal Processing, 4(2):288–297, 2010.
Yang et al. (2016) Yang, Y., Sun, J., Li, H., and Xu, Z. Deep ADMM-Net for compressive sensing MRI. In Advances in Neural Information Processing Systems (NIPS), pp. 10–18, 2016.
 Yu et al. (2018) Yu, K., Dong, C., Lin, L., and Change Loy, C. Crafting a toolchain for image restoration by deep reinforcement learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2443–2452, 2018.
Yu et al. (2019) Yu, K., Wang, X., Dong, C., Tang, X., and Loy, C. C. Path-Restore: Learning network path selection for image restoration. arXiv preprint arXiv:1904.10343, 2019.
Zhang & Ghanem (2018) Zhang, J. and Ghanem, B. ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
Zhang et al. (2017a) Zhang, K., Zuo, W., Chen, Y., Meng, D., and Zhang, L. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017a.
Zhang et al. (2017b) Zhang, K., Zuo, W., Gu, S., and Zhang, L. Learning deep CNN denoiser prior for image restoration. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017b.
Zhang et al. (2018) Zhang, K., Zuo, W., and Zhang, L. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Transactions on Image Processing, 27(9):4608–4622, 2018.
Zhang et al. (2019a) Zhang, K., Zuo, W., and Zhang, L. Deep plug-and-play super-resolution for arbitrary blur kernels. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019a.
 Zhang et al. (2019b) Zhang, X., Lu, Y., Liu, J., and Dong, B. Dynamically unfolding recurrent restorer: A moving endpoint control method for image restoration. In International Conference on Learning Representations (ICLR), 2019b.
Zheng et al. (2013) Zheng, G., Horstmeyer, R., and Yang, C. Wide-field, high-resolution Fourier ptychographic microscopy. Nature Photonics, 7(9):739, 2013.
 Zoran & Weiss (2011) Zoran, D. and Weiss, Y. From learning models of natural image patches to whole image restoration. In IEEE International Conference on Computer Vision (ICCV), pp. 479–486, 2011.