Image demosaicing and single image super-resolution (SISR) are two important image processing tasks in the pipeline of color imaging. Demosaicing is a necessary step to reconstruct full-resolution color images from a color filter array (CFA) such as the Bayer pattern. SISR is a cost-effective alternative to more expensive hardware-based solutions (i.e., optical zoom). Both problems have been extensively yet separately studied in the literature, from model-based methods [1, 2, 3, 4, 5, 6, 7, 8, 9] to learning-based approaches [10, 11, 12, 13, 14, 15, 16, 17, 18]. Treating demosaicing and SISR as two independent problems may generate undesirable edge blurring, as shown in Fig. 1. Moreover, from a practical application point of view, the processes of demosaicing and SISR can be integrated and optimized together (e.g., digital zoom for smartphone cameras such as Google Pixel 3 and Huawei P30).
Inspired by the success of joint demosaicing and denoising, we propose to study the problem of joint image demosaicing and super-resolution (JDSR) in this paper and develop a principled solution leveraging the latest advances in deep learning for computational imaging. We argue that the newly formulated JDSR problem has high practical impact (e.g., to support the mission of NASA Mars Curiosity and smartphone applications). The problem of JDSR is intellectually appealing but has been under-researched so far. The only existing work we can find in the open literature is a recently published paper which contains a straightforward application of ResNet and considers a scaling ratio of two only. As demonstrated in Fig. 1, our optimized solution to JDSR can achieve significantly better visual quality (both subjectively and objectively) than that ad-hoc approach.
The motivation behind our approach is mainly two-fold. On one hand, rapid advances in deep residual learning have offered a rich set of tools for image demosaicing and SISR. For example, DenseNet has been adapted to fully exploit hierarchical features for SR in SRDenseNet and the residual dense network (RDN); the residual channel attention network (RCAN) allows us to develop much deeper networks (over 1000 layers) with squeeze-and-excitation (SE) blocks than previous works (e.g., [14, 26]). However, to the best of our knowledge, a spatio-spectral attention mechanism has not been explicitly addressed for color images in the open literature. How to jointly exploit spatial and spectral dependencies for JDSR in network design deserves a systematic study.
On the other hand, we propose to optimize the perceptual quality for JDSR because that is what really matters in real-world applications. The generative adversarial network (GAN) is arguably the most popular approach toward perceptual optimization and has demonstrated convincing improvements for SISR in SRGAN. However, it has also been widely observed that the training of GANs suffers from stability issues, which can have a catastrophic impact on reconstructed images. A flurry of recent works (e.g., the relativistic average GAN (RaGAN), enhanced SRGAN (ESRGAN), and perception-enhanced SR (PESR)) has shown the potential of the relativistic discriminator in stabilizing GAN training and improving the visual quality of SISR images. How to leverage those latest advances to optimize the perceptual quality for JDSR has practical significance.
Overall, our contributions are summarized as follows:
Network design: we propose a Densely-connected Squeeze-and-Excitation Residual Network (DSERN) for JDSR. A novel Dense Squeeze-and-Excitation Residual Block (D_SERB) is designed to facilitate information flow in deeper and wider networks by smooth activation, which can more effectively suppress spatio-spectral aliasing.
Perceptual optimization: we have leveraged RaGAN, a recent advance in SISR, for JDSR and studied the choices of perceptual loss function for JDSR. In addition to improved stability, we have found that a Texture-enhanced RaGAN (TRaGAN) with a before-activation perceptual loss function can produce visually more pleasant results.
Real-world application: we have applied the proposed DSERN+TRaGAN solution to raw Bayer pattern data collected by the Mast Camera (Mastcam) of the NASA Mars Curiosity Rover. Our experimental results show that visually superior high-resolution image reconstruction can be achieved at a scaling ratio as large as 4.
II Related Work
Both image demosaicing and super-resolution have been studied for decades in the open literature. In this section, we review image demosaicing and image super-resolution approaches separately, with a focus on deep learning based methods.
II-A Image Demosaicing
Existing approaches toward image demosaicing can be classified into two categories: model-based methods [1, 2, 3, 4] and learning-based methods [10, 11, 13]. Model-based approaches rely on hand-crafted parametric models, which often lack the generalization capability to handle the varying characteristics of color images (i.e., the potential model-data mismatch). Recently, deep learning methods have shown advantages in image demosaicing. Inspired by the single image super-resolution model SRCNN, DMCNN utilized a super-resolution-based CNN model and ResNet to investigate the image demosaicing problem. CDM-CNN introduced residual learning with a two-phase network architecture, which first recovers the green channel as a guidance prior and then uses this prior to reconstruct the RGB channels. Beyond demosaicing alone, several works have studied the joint image demosaicing and denoising (JDD) problem. Dong et al. developed a deep neural network with generative adversarial networks (GAN) and perceptual loss functions to solve JDD problems. Inspired by classical image regularization and majorization-minimization optimization, Kokkinos and Lefkimmiatis proposed a deep neural network for the JDD problem. Deep learning based image demosaicing techniques have shown convincingly improved performance over model-based ones on several widely-used benchmark datasets (e.g., Kodak and McMaster). However, the issue of suppressing spatio-spectral aliasing has not been addressed in the open literature as far as we know.
II-B Image Super-resolution
Model-based approaches toward SISR [5, 6, 7, 8, 9] suffer from notorious aliasing artifacts and edge blurring. Recently, deep learning-based approaches have advanced rapidly. SRCNN first introduced a deep learning based method for the single image super-resolution task with three convolutional layers and achieved much better performance than model-based methods. Benefiting from the concept of ResNet, VDSR first trained a 20-layer deep network with a long residual connection, so that the network need only learn the high-frequency information, which also increases the convergence speed. EDSR proposed to integrate several resblocks and remove batch-normalization layers, which saves GPU memory, allows more layers to be stacked and makes the network wider, further improving SISR performance. LapSRN proposed to super-resolve the LR image progressively, saving GPU memory and achieving better performance.

More recent advances include SRDenseNet, which applied DenseNet to the SISR task, and RDN, which combined ResNet and DenseNet into the residual dense block (RDB). Through local feature fusion, the proposed RDB allows a larger growth rate to boost performance. RCAN first introduced an attention mechanism, inspired by SENet, to calibrate feature maps and proposed a residual-in-residual structure to build very deep convolutional networks, achieving new state-of-the-art performance for the SISR task. Beyond objective measures such as PSNR/SSIM, SRGAN introduced a novel generative adversarial network (GAN) based architecture to optimize the perceptual quality of SR images; benefiting from the GAN, SRGAN can reconstruct more textures from low-resolution images. An enhanced version of SRGAN named ESRGAN, which uses the relativistic average GAN (RaGAN), was later developed and can recover more realistic super-resolved images compared with SRGAN. By contrast, the problem of JDSR has been under-researched so far, with only one exception.
III Network Design: Spatio-Spectral Attention
III-A DSERN: Deeper and Wider are Better
The channel attention mechanism has been successfully applied in both high-level (e.g., SENet and LS-CNN) and low-level (e.g., RCAN) tasks. A channel attention module first squeezes the input feature map and then activates a one-time reduction-and-expansion to excite the squeezed feature map. Such a strategy is not optimal for recovering missing high-frequency information in SISR when the network is very deep; meanwhile, the JDSR problem requires simultaneous recovery of incomplete color information across the R, G, B bands, which demands extra attention to the dependency among spectral bands. How to generalize the channel attention mechanism from spatial-only to joint spatio-spectral serves as the key motivation behind our approach.
As discussed in prior work, high-frequency components often correspond to regions in an image such as textures, edges and corners. Conventional Conv layers have limited capability of exploiting contextual information outside the local receptive field, especially due to the missing data in the Bayer pattern. To overcome this difficulty, we propose a new Densely-Connected Squeeze-and-Excitation Residual Block (D_SERB), as shown in Fig. 3b and Fig. 4. The proposed D_SERB implements a deeper and wider spatio-spectral channel attention mechanism for the purpose of more effectively suppressing spatio-spectral aliasing in the low-resolution (LR) Bayer pattern.
Unlike SENet and RCAN (which use one-time reduction-and-expansion), D_SERB uses multiple expansion modules after reduction to assure more faithful information recovery when the network gets deeper and wider. As shown in Fig. 3, we keep both long and short skip connections, like RCAN, to make the overall training stable and facilitate the information flow both inside and outside the D_SERB modules. Although a similar idea of local feature fusion exists in the residual dense block of RDN, our hybrid design - i.e., the Dense SE (DSE) block combining the ideas of RDN and RCAN - is novel from the perspective of achieving joint spatio-spectral attention for JDSR. The spatio-spectral channel attention mechanism in the proposed D_SERB module helps recalibrate input features via channel statistics across spectral bands. In SISR, a one-time reduction-and-expansion operation might be sufficient for capturing channel-wise dependencies of low-resolution color images; however, our JDSR task aims at recovering two-thirds of the missing data across spectral bands in addition to high-frequency spatial details, which calls for the design of deeper and wider networks.
III-B Dense Squeeze-and-Excitation Residual Block
The key to deeper and wider networks lies in the design of the D_SERB module - i.e., how to use multiple expansions after reduction to assure more faithful information recovery both inside and outside D_SERB modules. As shown in Fig. 3(b), we propose a Dense Squeeze-and-Excitation (DSE) block in which the channel size is expanded step by step (see Fig. 4). The key advantages of this newly designed DSE block are: 1) the reduced channel descriptor can be smoothly activated multiple times, so that more faithful information across the spatio-spectral domain is accumulated; 2) dense connections can increase the network depth and width without running into the notorious vanishing-gradient problem; 3) both information flow and network stability, which are important to a principled solution to JDSR, can be jointly improved by introducing dense connections to SE residual blocks (so we can train even deeper than RCAN).
More specifically, to implement the DSE block, we first apply global average pooling to squeeze the input feature maps. Let us denote the input feature maps by $X = [x_1, \dots, x_C]$, which contains $C$ feature maps with spatial dimensions $H \times W$. Then the global average pooling output $z \in \mathbb{R}^C$ can be calculated by:

$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j), \qquad (1)$$

where $z_c$ is the $c$-th element of $z$, and $x_c(i, j)$ is the pixel value of the $c$-th feature map at position $(i, j)$. Then we implement a simple gating mechanism as adopted by previous works including SENet and RCAN:

$$s = \sigma\left(W_U\, \delta(W_D\, z)\right), \qquad (2)$$

where $\sigma$ refers to a sigmoid function and $\delta$ denotes the ReLU function. Note that both $W_D$ and $W_U$ are Conv layers, whose weights reduce and expand the channel dimension respectively, and $r$ is the reduction ratio used to reduce the dimension of $z$ (details about this hyperparameter, which controls the tradeoff between capacity and complexity, can be found in SENet).
In order to achieve deeper and wider channel attention, we propose a novel strategy of activating the reduced features step by step (instead of in one shot) with dense connections. As shown in Fig. 4, after reducing the channel dimension by a factor of $r$, we gradually expand (i.e., smoothly activate) the feature map $T$ times. The procedure of our proposed DSE module can be written as:

$$z_t = \delta\left(W_t\,[z_0, z_1, \dots, z_{t-1}]\right), \quad t = 1, \dots, T, \qquad s = \sigma\left(W_U\,[z_0, z_1, \dots, z_T]\right), \qquad (3)$$

where $\sigma$, $\delta$, $W_D$, $W_U$ are the same as in Eq. (2), $z_0 = \delta(W_D z)$ is the reduced descriptor, and $[\cdot]$ refers to the concatenation of the feature maps produced by each DSE layer. Finally, we rescale the input feature map by

$$\hat{x}_c = s_c \cdot x_c. \qquad (4)$$
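The squeeze, step-by-step expansion, and rescaling steps above can be sketched in PyTorch as follows. This is a minimal illustration with hypothetical layer sizes (64 channels, reduction ratio 16, two expansion steps); the exact configuration of the published D_SERB may differ.

```python
import torch
import torch.nn as nn

class DSEBlock(nn.Module):
    """Sketch of a Dense Squeeze-and-Excitation block: squeeze by global
    average pooling, reduce channels by ratio r, then expand step by step
    with dense connections before a final sigmoid gate."""
    def __init__(self, channels=64, reduction=16, steps=2):
        super().__init__()
        reduced = channels // reduction
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.reduce = nn.Sequential(nn.Conv2d(channels, reduced, 1), nn.ReLU())
        # each expansion step sees the concatenation of all previous outputs
        self.steps = nn.ModuleList(
            nn.Sequential(nn.Conv2d(reduced * (t + 1), reduced, 1), nn.ReLU())
            for t in range(steps))
        self.expand = nn.Sequential(
            nn.Conv2d(reduced * (steps + 1), channels, 1), nn.Sigmoid())

    def forward(self, x):
        z = self.reduce(self.squeeze(x))         # squeezed, reduced descriptor
        feats = [z]
        for step in self.steps:                  # smooth, step-by-step activation
            feats.append(step(torch.cat(feats, dim=1)))
        s = self.expand(torch.cat(feats, dim=1)) # per-channel attention weights
        return x * s                             # rescale the input feature map
```

The block preserves the input shape, so it can be dropped into a residual branch like the SE module of RCAN.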
With the new DSE block, we can train an even deeper network than RCAN thanks to the improved information flow.
IV Perceptual Optimization: Relativistic Discriminator and Loss Function
IV-A Texture-enhanced Relativistic average GAN (TRaGAN)
The discriminator in a standard GAN can be expressed as $D(x) = \sigma(C(x))$, where $\sigma$ is the sigmoid function, $C(x)$ is the non-transformed discriminator output, and $x$ is the input image. This idea has been successfully applied to the problem of SISR, e.g., in SRGAN, in which the super-resolved image (fake version) is compared against the ground truth (real version). In other words, the discriminator serves as a judge for the perceptual optimization of the generator (as shown in Fig. 2(b)).
Unlike the standard GAN, the relativistic average GAN (RaGAN) makes the discriminator estimate the probability that a real image is more realistic than a fake one (on average), based on both real and fake images. RaGAN can not only generate more realistic images but also stabilize the training process. Recently, the benefit of RaGAN over the conventional GAN has been demonstrated for SISR. Here we propose to extend the idea of RaGAN to JDSR and demonstrate how the relativistic discriminator can work with the proposed DSERN (generator) for the purpose of perceptual optimization (overlooked in RDN and RCAN).
To implement RaGAN, we denote the real and fake images by $x_r$ and $x_f$ respectively; then the output of the modified discriminator for RaGAN can be formulated as:

$$D_{Ra}(x_r, x_f) = \sigma\left(C(x_r) - \mathbb{E}_{x_f}[C(x_f)]\right), \qquad D_{Ra}(x_f, x_r) = \sigma\left(C(x_f) - \mathbb{E}_{x_r}[C(x_r)]\right),$$

where $\mathbb{E}_{x_r}$ and $\mathbb{E}_{x_f}$ denote the expectations over the real and fake images in a mini-batch. It follows that the discriminator loss function and adversarial loss function can be written as:

$$L_D^{Ra} = -\mathbb{E}_{x_r}\left[\log D_{Ra}(x_r, x_f)\right] - \mathbb{E}_{x_f}\left[\log\left(1 - D_{Ra}(x_f, x_r)\right)\right],$$
$$L_G^{Ra} = -\mathbb{E}_{x_r}\left[\log\left(1 - D_{Ra}(x_r, x_f)\right)\right] - \mathbb{E}_{x_f}\left[\log D_{Ra}(x_f, x_r)\right].$$
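A compact NumPy sketch of the relativistic average discriminator and the two losses, operating on raw (non-transformed) discriminator scores for a mini-batch; approximating the expectation by the batch mean is our assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ragan_losses(c_real, c_fake, eps=1e-12):
    """Discriminator and adversarial (generator) losses of RaGAN.

    c_real, c_fake: non-transformed discriminator outputs C(x) for the real
    and fake images of a mini-batch; E[.] is approximated by the batch mean."""
    d_rf = sigmoid(c_real - c_fake.mean())   # D_Ra(x_r, x_f)
    d_fr = sigmoid(c_fake - c_real.mean())   # D_Ra(x_f, x_r)
    loss_d = -np.mean(np.log(d_rf + eps)) - np.mean(np.log(1 - d_fr + eps))
    loss_g = -np.mean(np.log(1 - d_rf + eps)) - np.mean(np.log(d_fr + eps))
    return loss_d, loss_g
```

When real scores clearly exceed fake scores, the discriminator loss is near zero while the generator loss is large, which is what drives the generator updates.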
It has been observed that the class of texture images is often more difficult for SISR due to spatial aliasing. One way of achieving better texture reconstruction is through an attention mechanism at the image level - i.e., emphasizing (increasing the weight of) difficult samples and down-weighting easy ones. Such a weighting can be conveniently incorporated into the RaGAN framework because the PyTorch implementation of the binary cross-entropy loss allows an optional weight input. More specifically, we propose a weighted loss function with a new hyperparameter tailored for texture enhancement.
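Since the exact form of the TRaGAN weighting is not reproduced here, the sketch below only illustrates the general idea under our own assumptions: per-sample weights (larger for texture-rich patches, here measured by a hypothetical gradient-energy score with exponent `gamma`) are applied inside the cross-entropy terms, mirroring the optional weight argument of PyTorch's BCE loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def texture_weights(patches, gamma=1.0):
    """Hypothetical weighting: per-patch gradient energy, normalized so
    texture-rich (harder) samples receive larger weights (mean ~ 1)."""
    gx = np.diff(patches, axis=-1)
    gy = np.diff(patches, axis=-2)
    energy = np.array([np.mean(a**2) + np.mean(b**2) for a, b in zip(gx, gy)])
    w = energy ** gamma
    return w / (w.mean() + 1e-12)

def weighted_ragan_d_loss(c_real, c_fake, w, eps=1e-12):
    """Relativistic discriminator loss with per-sample texture weights."""
    d_rf = sigmoid(c_real - c_fake.mean())
    d_fr = sigmoid(c_fake - c_real.mean())
    return -np.mean(w * np.log(d_rf + eps)) - np.mean(w * np.log(1 - d_fr + eps))
```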
IV-B Perceptual Loss Function
We have implemented the following perceptual loss function based on [38, 15, 16, 29]. With a pre-trained VGG19 model, we extract high-level perceptual features of both the high-resolution (HR) and SR images from the fourth convolutional layer before the fifth max-pooling layer of VGG19 (the "VGG19-54" layer), before the activation function is applied. Extracting the features before activation further improves the performance. Let us define the perceptual loss as $L_{percep}$ and the $\ell_1$-norm distance as $L_1$. Then the total loss for our generator can be formulated as:

$$L_{total} = L_1 + \lambda L_{percep} + \eta L_{adv}, \qquad (12)$$

where the coefficients $\lambda$ and $\eta$ balance the different loss terms, $L_{percep}$ is the mean-squared error (MSE) between the high-level features extracted from the VGG19-54 layer, and $L_{adv}$ is the adversarial loss. Note that although similar loss functions were considered in previous studies, their experiments include synthetic low-resolution images only. In this paper, we demonstrate the effectiveness of the proposed perceptual optimization for JDSR on real-world data.
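The total generator loss can be sketched as below; `vgg_features` stands in for a (here hypothetical) VGG19-54 feature extractor applied before activation, and the weights `lam`/`eta` are placeholders.

```python
import numpy as np

def total_generator_loss(sr, hr, loss_adv, vgg_features, lam=1.0, eta=1.0):
    """Sketch of L_total = L_1 + lam * L_percep + eta * L_adv.

    L_1 is the mean absolute error between SR and HR images; L_percep is
    the MSE between their VGG feature maps; loss_adv is the adversarial
    loss produced by the relativistic discriminator."""
    l1 = np.mean(np.abs(sr - hr))
    f_sr, f_hr = vgg_features(sr), vgg_features(hr)
    l_percep = np.mean((f_sr - f_hr) ** 2)
    return l1 + lam * l_percep + eta * loss_adv
```

Passing an identity function for `vgg_features` reduces the perceptual term to pixel-wise MSE, which is a convenient sanity check.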
V Experimental Results
V-A Implementation Details
In our proposed DSERN network, we keep the basic settings the same as RCAN: the number of D_RGs is set to 10 and every D_RG contains 20 D_SERBs. All Conv layers use 3×3 kernels with 64 filters, except the Conv layers in our DSE modules. The reduction ratio is set to 16, following RCAN. The upscale module we use is the same sub-pixel convolution module as in prior work. The filter number of the last layer is set to 3 in order to output super-resolved color images. For the discriminator, we implement the same network structure as SRGAN; the kernel sizes of its Conv layers are shown in Fig. 2(b).
In our PyTorch implementation of DSERN, we first randomly crop the three-channel Bayer patterns into small patches of size 48×48, and crop the corresponding HR color image patches, with a batch size of 16; then we augment the training set by standard geometric transformations (flipping and rotation). Our model is trained with the ADAM optimizer using its default hyperparameters. The learning rate is decreased by half after each of four milestone steps; the $\ell_1$ loss function is applied to minimize the error between HR and SR images. To train the GAN-based networks, we use the trained DSERN to initialize the generator of the GAN so that the discriminator starts from a better initial SR image. The same learning rate and decay strategies are adopted here. The coefficients $\lambda$ and $\eta$ in Eq. (12) are set following previous work.
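The step-decay schedule described above (halving the learning rate after each milestone) can be expressed as a small helper; the initial rate and milestone steps below are placeholders, since the exact values are not reproduced in the text.

```python
def step_decay_lr(step, initial_lr=1e-4,
                  milestones=(200_000, 400_000, 600_000, 800_000)):
    """Halve the learning rate after each milestone step (a sketch with
    placeholder values; the paper's exact settings may differ)."""
    halvings = sum(step >= m for m in milestones)
    return initial_lr * (0.5 ** halvings)
```

The same behavior is available in PyTorch as `torch.optim.lr_scheduler.MultiStepLR` with `gamma=0.5`.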
Because the code of RDSR is not publicly available, we have done our best to reproduce RDSR in PyTorch while keeping the batch size (16), patch size and number of residual blocks (24) the same as in the original work. The learning rate and decay steps in our RDSR implementation are the same as those in our DSERN. In this way, we have striven to make the experimental comparison against RDSR as fair as possible.
V-B Training Dataset
In our experiment, we use the DIV2K dataset as the training set, which includes 800 images (2K resolution). For testing, we evaluate both popular image super-resolution benchmark datasets, including Set5, Set14, B100, Urban100 and Manga109, and popular image demosaicing datasets such as McMaster and Kodak PhotoCD. To pre-process the training and testing data, we downsample the original high-resolution images by the target scaling factors using bicubic interpolation and then generate the 'RGGB' Bayer pattern. Based on previous work and our own study (see the next paragraph), supplying the three channels separately as the input (instead of the mosaicked single-channel composition) works better for the proposed network architecture. All experiments are implemented in the PyTorch framework and trained on NVIDIA Titan Xp GPUs.
Note that we have to be careful about the four different spatial arrangements of Bayer patterns (RGGB, GRBG, GBRG, BGGR) in our definition of feature maps. One can either treat the Bayer pattern like a gray-scale image (one-channel setting), which ignores the important spatial arrangement of R/G/B, or take the spatial arrangement as a priori knowledge and pad the missing values across the R, G, B bands with zeros (three-channel setting). As shown in Fig. 5, the former tends to produce color misregistration artifacts, which suggests the latter works better. Our experimental result confirms a similar finding previously reported.
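The three-channel setting described above, i.e., padding the missing samples with zeros according to the 'RGGB' arrangement, can be sketched as:

```python
import numpy as np

def bayer_rggb_to_3ch(raw):
    """Expand a single-channel 'RGGB' Bayer mosaic of shape (H, W) into a
    zero-padded (H, W, 3) array that preserves the spatial arrangement:
    R at (even, even), G at (even, odd) and (odd, even), B at (odd, odd)."""
    h, w = raw.shape
    out = np.zeros((h, w, 3), dtype=raw.dtype)
    out[0::2, 0::2, 0] = raw[0::2, 0::2]   # R samples
    out[0::2, 1::2, 1] = raw[0::2, 1::2]   # G samples (even rows)
    out[1::2, 0::2, 1] = raw[1::2, 0::2]   # G samples (odd rows)
    out[1::2, 1::2, 2] = raw[1::2, 1::2]   # B samples
    return out
```

Every raw sample lands in exactly one channel, so the network sees both the measured values and their color identity.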
V-C PSNR/SSIM Comparisons
It is convenient to further improve the performance of our DSERN by a so-called self-ensemble strategy (as done in previous works [26, 50, 18, 17]). The improved results are denoted as "DSERN+". We have compared our method against two benchmark methods: a separated (brute-force) approach, Flex + RCAN, and the recently published RDSR. To evaluate the Flex + RCAN approach, we first demosaic the LR mosaicked images using Flex to get LR color images, and then super-resolve them by applying a pre-trained RCAN model. Note that we use the pre-trained RCAN weights provided by the authors on GitHub.
Table I shows the PSNR/SSIM comparison results for different scaling factors. It can be seen that our DSERN method performs the best for all datasets and scaling factors. Even without self-ensemble, DSERN still leads on all datasets and scaling factors, with moderate PSNR/SSIM gains over the previous state of the art. Since PSNR/SSIM metrics do not always faithfully reflect the visual quality of images, we also include the subjective quality comparison for the image "TotteokiNoABC" in Fig. 6. It can be readily observed that, for the top of the pink sock, only our DSERN can faithfully recover the stripe patterns; both the brute-force approach (Flex+RCAN) and RDSR produce severe blurring artifacts. As another example, Fig. 7 shows the comparison at two other scaling factors. For "img_062", all approaches contain noticeable visual distortion, but our DSERN method recovers more shape details than the competing approaches; for "253027", the zebra pattern recovered by DSERN appears to have the highest quality. Fig. 9 and Fig. 10 show more visual comparisons among the various competing approaches (please zoom in for a detailed comparison).
V-D Perceptual Index (PI) Comparisons
The perceptual index is defined as $PI = \frac{1}{2}\left((10 - Ma) + NIQE\right)$, where Ma denotes a no-reference quality metric and NIQE refers to the Natural Image Quality Evaluator. Note that the lower the PI score, the better the perceptual quality (i.e., contrary to SSIM). An objective comparison of the competing JDSR methods in terms of PI is shown in Table II. We observe that the GAN-based methods produce the lowest PI scores for all datasets and scaling factors. Fig. 8 provides a visual comparison for the image "IMG0019". It can be observed that the GAN-based methods recover sharper edges and overcome the issue of over-smoothed regions. Additionally, TRaGAN is capable of achieving even lower PI scores than the standard GAN.
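The combination of Ma's no-reference metric and NIQE amounts to a one-line computation:

```python
def perceptual_index(ma_score, niqe_score):
    """PI = 0.5 * ((10 - Ma) + NIQE); lower is better.
    ma_score and niqe_score are assumed to be precomputed elsewhere."""
    return 0.5 * ((10.0 - ma_score) + niqe_score)
```

Since Ma is higher-is-better and NIQE is lower-is-better, the `10 - Ma` term flips the first metric so both pull PI in the same direction.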
V-E Ablation Studies
To demonstrate the effect of the proposed DSE module, we study three networks: 1) one based on ResNet only; 2) ResNet with the channel attention module (RCAN); 3) ResNet with the proposed DSE module (DSERN). All three networks are trained under the same settings for a fair comparison on the general SR benchmark datasets with a scaling factor of 2. From Table III, we find that ResNet performs similarly to the more advanced RCAN and DSERN on Set5, Set14 and B100. On Urban100, Manga109 and McM, however, RCAN and DSERN outperform ResNet, and the proposed DSERN has the best performance on most benchmark datasets.
V-F Performance on the Real-world Data
Finally, we have tested our proposed JDSR technique on real-world data collected by the Mastcam of NASA Mars Curiosity. The raw data are 'RGGB' Bayer patterns. Due to hardware constraints, the left camera and the right camera of the Mastcam have different focal lengths (the left is about 3 times weaker than the right). To compensate for such a "lazy-eye" effect on the raw Bayer patterns, it is desirable to develop a joint demosaicing and SR technique with a scaling factor of at least 3 (in order to support high-level stereo-based vision tasks such as 3D reconstruction and object recognition). Our proposed JDSR algorithm is well suited to this task, which shows the great potential of computer vision and deep learning in deep space exploration.
The visual comparison results are shown in Fig. 11 for a scaling factor of 4. It can be seen that brute-force approach (Flex+RCAN) suffers from undesired artifacts especially around the edge of rocks. Our proposed DSERN method can overcome this difficulty but the result appears over-smoothed. DSERN_GAN improves the visual quality to some degree - e.g., more fine details are present and sharper edges can be observed. Replacing GAN by TRaGAN can further improve the visual quality not only around the textured regions (e.g., roads and rocks) but also in the background (e.g., terrain appears visually clearer and sharper). Fig. 12 shows the visual comparison among Flex+RCAN, DSERN, DSERN_GAN and DSERN_TRaGAN approaches. The raw image is captured by the right eye of NASA Mast Camera. The scale factor is 4 (please zoom in to get a better view).
In this paper, we studied the problem of joint demosaicing and super-resolution (JDSR), a topic that has been underexplored in the deep learning literature. Our solution consists of a new densely-connected squeeze-and-excitation residual network for image reconstruction and an improved GAN with a relativistic discriminator and new loss functions for texture enhancement. Compared with naive network designs, our proposed network can stack more layers and be trained deeper thanks to the newly designed DSE block, which makes multiple expansions on a reduced channel descriptor to allow a more faithful information flow. Additionally, we have studied the problem of perceptual optimization for JDSR. Our experimental results verify that TRaGAN can generate more realistic-looking images (especially around textured regions) and achieve lower PI scores than the standard GAN. Finally, we have evaluated our proposed method (DSERN_TRaGAN) on real-world Bayer patterns collected by the Mastcam of the NASA Mars Curiosity Rover, which supports its superiority over naive network designs (e.g., Flex+RCAN) and the effectiveness of perceptual optimization. Another potential application of JDSR in practice is the digital zoom feature of smartphone cameras.
The authors would like to thank Dr. Chiman Kwan for supplying real-world Bayer pattern collected by NASA Mars Curiosity. This work is partially supported by the DoJ/NIJ under grant NIJ 2018-75-CX-0032, NSF under grant OAC-1839909 and the WV Higher Education Policy Commission Grant (HEPC.dsr.18.5).
-  L. Zhang and X. Wu, “Color demosaicking via directional linear minimum mean square-error estimation,” IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2167–2178, 2005.
-  X. Li, B. Gunturk, and L. Zhang, “Image demosaicing: A systematic survey,” in Visual Communications and Image Processing 2008, vol. 6822. International Society for Optics and Photonics, 2008, p. 68221J.
-  X. Li, “Demosaicing by successive approximation,” IEEE Transactions on Image Processing, vol. 14, no. 3, pp. 370–379, 2005.
-  W. Ye and K.-K. Ma, “Color image demosaicing using iterative residual interpolation,” IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5879–5891, 2015.
-  M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” 2012.
-  H. Chang, D.-Y. Yeung, and Y. Xiong, “Super-resolution through neighbor embedding,” in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 1. IEEE, 2004, pp. I–I.
-  R. Timofte, V. De Smet, and L. Van Gool, “Anchored neighborhood regression for fast example-based super-resolution,” in Proceedings of the IEEE international conference on computer vision, 2013, pp. 1920–1927.
-  J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE transactions on image processing, vol. 19, no. 11, pp. 2861–2873, 2010.
-  R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in International conference on curves and surfaces. Springer, 2010, pp. 711–730.
-  F.-L. He, Y.-C. F. Wang, and K.-L. Hua, “Self-learning approach to color demosaicking via support vector regression,” in Image Processing (ICIP), 2012 19th IEEE International Conference on. IEEE, 2012, pp. 2765–2768.
-  O. Kapah and H. Z. Hel-Or, “Demosaicking using artificial neural networks,” in Applications of Artificial Neural Networks in Image Processing V, vol. 3962. International Society for Optics and Photonics, 2000, pp. 112–121.
-  F. Kokkinos and S. Lefkimmiatis, “Deep image demosaicking using a cascade of convolutional residual denoising networks,” in The European Conference on Computer Vision (ECCV), September 2018.
-  J. Sun and M. F. Tappen, “Separable markov random field model and its applications in low level vision,” IEEE transactions on image processing, vol. 22, no. 1, pp. 402–407, 2013.
-  J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1646–1654.
-  C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network.” in CVPR, vol. 2, no. 3, 2017, p. 4.
-  X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy, “Esrgan: Enhanced super-resolution generative adversarial networks,” in The European Conference on Computer Vision Workshops (ECCVW), September 2018.
-  Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
-  Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep residual channel attention networks,” in The European Conference on Computer Vision (ECCV), September 2018.
-  F. Heide, M. Steinberger, Y.-T. Tsai, M. Rouf, D. Pajak, D. Reddy, O. Gallo, J. Liu, W. Heidrich, K. Egiazarian et al., “FlexISP: A flexible camera image processing framework,” ACM Transactions on Graphics (TOG), vol. 33, no. 6, p. 231, 2014.
-  M. Gharbi, G. Chaurasia, S. Paris, and F. Durand, “Deep joint demosaicking and denoising,” ACM Transactions on Graphics (TOG), vol. 35, no. 6, p. 191, 2016.
-  R. Zhou, R. Achanta, and S. Süsstrunk, “Deep residual network for joint demosaicing and super-resolution,” in Color and Imaging Conference, vol. 2018, no. 1. Society for Imaging Science and Technology, 2018, pp. 75–80.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
-  G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
-  T. Tong, G. Li, X. Liu, and Q. Gao, “Image super-resolution using dense skip connections,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4799–4807.
-  J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
-  B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative Adversarial Nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
-  A. Jolicoeur-Martineau, “The relativistic discriminator: a key element missing from standard GAN,” arXiv preprint arXiv:1807.00734, 2018.
-  T. Vu, T. M. Luu, and C. D. Yoo, “Perception-enhanced image super-resolution via relativistic generative adversarial networks,” in The European Conference on Computer Vision (ECCV) Workshops, September 2018.
-  C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in European conference on computer vision. Springer, 2014, pp. 184–199.
-  N.-S. Syu, Y.-S. Chen, and Y.-Y. Chuang, “Learning deep convolutional networks for demosaicing,” arXiv preprint arXiv:1802.03769, 2018.
-  R. Tan, K. Zhang, W. Zuo, and L. Zhang, “Color image demosaicking via deep residual learning,” in 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2017, pp. 793–798.
-  W. Dong, M. Yuan, X. Li, and G. Shi, “Joint demosaicing and denoising with perceptual optimization on a generative adversarial network,” arXiv preprint arXiv:1802.04723, 2018.
-  W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep laplacian pyramid networks for fast and accurate super-resolution,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, no. 3, 2017, p. 5.
-  Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.
-  Q. Wang and G. Guo, “Ls-cnn: Characterizing local patches at multiple scales for face recognition,” IEEE Transactions on Information Forensics and Security, 2019.
-  Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE transactions on neural networks, vol. 5, no. 2, pp. 157–166, 1994.
-  J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision. Springer, 2016, pp. 694–711.
-  K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.
-  W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1874–1883.
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
-  E. Agustsson and R. Timofte, “NTIRE 2017 Challenge on Single Image Super-resolution: Dataset and Study,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
-  R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in International conference on curves and surfaces. Springer, 2010, pp. 711–730.
-  D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 2. IEEE, 2001, pp. 416–423.
-  J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5197–5206.
-  Y. Matsui, K. Ito, Y. Aramaki, A. Fujimoto, T. Ogawa, T. Yamasaki, and K. Aizawa, “Sketch-based manga retrieval using manga109 dataset,” Multimedia Tools and Applications, vol. 76, no. 20, pp. 21811–21838, 2017.
-  L. Zhang, X. Wu, A. Buades, and X. Li, “Color demosaicking by local directional interpolation and nonlocal adaptive thresholding,” Journal of Electronic Imaging, vol. 20, no. 2, p. 023016, 2011.
-  A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” in NIPS-W, 2017.
-  B. K. Gunturk, J. Glotzbach, Y. Altunbasak, R. W. Schafer, and R. M. Mersereau, “Demosaicking: color filter array interpolation,” IEEE Signal processing magazine, vol. 22, no. 1, pp. 44–54, 2005.
-  R. Timofte, R. Rothe, and L. Van Gool, “Seven ways to improve example-based single image super resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1865–1873.
-  Y. Blau and T. Michaeli, “The perception-distortion tradeoff,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
-  Y. Blau, R. Mechrez, R. Timofte, T. Michaeli, and L. Zelnik-Manor, “2018 PIRM Challenge on Perceptual Image Super-resolution,” arXiv preprint arXiv:1809.07517, 2018.
-  C. Ma, C.-Y. Yang, X. Yang, and M.-H. Yang, “Learning a no-reference quality metric for single-image super-resolution,” Computer Vision and Image Understanding, vol. 158, pp. 1–16, 2017.
-  A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a “completely blind” image quality analyzer,” IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209–212, 2013.