Single image super resolution (SISR) aims to recover high-resolution (HR) images from their low-resolution (LR) counterparts. SISR is a fundamental problem in computer vision and underlies many image analysis tasks, including surveillance and satellite imaging. It is a widely known ill-posed problem, since each LR input may correspond to multiple HR solutions. With the development of deep learning, a number of SR methods [8, 35]
have been proposed. Most of them are optimized by the mean squared error (MSE), which measures the pixel-wise distance between SR images and the HR ones. However, such an objective impels a deep model to produce a statistical average of the possible HR solutions to this one-to-many problem. As a result, such methods usually generate blurry images, albeit with high peak signal-to-noise ratio (PSNR).
Hence, several methods aiming to recover photo-realistic images have recently utilized the generative adversarial network (GAN), such as SRGAN , EnhanceNet , ESRGAN  and NatSR . While GAN-based methods can generate high-fidelity SR results, geometric distortions often accompany the sharp edges and fine textures they produce. Some SR examples are presented in Figure 1: RCAN  recovers blurry but straight edges for the bricks, while the edges restored by perceptual-driven methods are sharper but twisted. In fact, GAN-based methods generally suffer from structural inconsistency, since the discriminators may introduce unstable factors into the optimization procedure. Some methods have been proposed to balance the trade-off between the merits of the two kinds of SR methods. For example, the Controllable Feature Space Network (CFSNet)  designs an interactive framework to transfer continuously between the two objectives of perceptual quality and distortion reduction. Nevertheless, the intrinsic problem is not mitigated, since the two goals cannot be achieved simultaneously. Hence it is necessary to explicitly guide perceptual-driven SR methods to preserve structures, so as to further enhance SR performance.
In this paper, we propose a structure-preserving super resolution method to alleviate the above-mentioned issue. Since the gradient map reveals the sharpness of each local region in an image, we exploit this powerful tool to guide image recovery. On the one hand, we design a gradient branch which converts the gradient maps of LR images to HR ones as an auxiliary SR problem. The recovered gradients can be integrated into the SR branch to provide a structural prior for SR. Besides, the gradients can highlight the regions where sharpness and structures deserve more attention, so as to guide high-quality generation explicitly. This idea is motivated by the observation that once edges are recovered with high fidelity, the SR task can be treated as a color-filling problem with strong clues given by the LR images. On the other hand, we propose a gradient loss to explicitly supervise the gradient maps of recovered images. Together with the image-space loss functions in existing methods, the gradient loss restricts the second-order relationships of neighboring pixels. Hence the structural configuration can be better retained with such guidance, and SR results with high perceptual quality and fewer geometric distortions can be obtained. Moreover, our method is model-agnostic, so it can potentially be applied to off-the-shelf SR networks. To the best of our knowledge, we are the first to explicitly consider preserving geometric structures in GAN-based SR methods. Experimental results on benchmark datasets show that our method succeeds in enhancing SR fidelity by reducing structural distortions.
2 Related Work
Existing SR methods can be classified into two categories: PSNR-oriented methods and perceptual-driven ones. We also review methods relevant to gradients.
PSNR-Oriented Methods: Most previous approaches target high PSNR. As a pioneer, Dong et al.  propose SRCNN, which first maps LR images to HR ones by a three-layer CNN. DRCN  and VDSR  are further proposed by Kim et al. to improve SR performance. Moreover, Ledig et al.  propose SRResNet by employing the idea of ResNet . Zhang et al.  propose RDN by utilizing residual dense blocks in the SR framework. They further introduce RCAN  and achieve superior performance on PSNR. Li et al.  propose a feedback framework to refine the super-resolved results step by step.
Perceptual-Driven Methods: The methods mentioned above all focus on achieving high PSNR and thus use the MSE or L1 loss as their objective. However, these methods usually produce blurry images. Johnson et al.  propose the perceptual loss to improve the visual quality of recovered images. Ledig et al.  utilize an adversarial loss  to construct SRGAN, the first framework able to generate photo-realistic HR images. Furthermore, Sajjadi et al.  restore high-fidelity textures with a texture loss. Wang et al.  enhance the previous frameworks by introducing the Residual-in-Residual Dense Block (RRDB) in the proposed ESRGAN. Wang et al.  exploit semantic segmentation maps as priors to generate more natural textures for specific categories. Rad et al.  propose a targeted perceptual loss based on object, background and boundary labels. Although these existing perceptual-driven methods indeed improve the overall visual quality of super-resolved images, they sometimes generate unnatural artifacts, including geometric distortions, when recovering details.
Gradient-Relevant Methods: Gradient information has been utilized in previous work [29, 2]. For SR methods, Fattal  proposes a method based on edge statistics of image gradients by learning the prior dependency of different resolutions. Sun et al.  propose a gradient profile prior to represent image gradients and a gradient field transformation to enhance sharpness of super-resolved images. Yan et al. 
propose an SR method based on gradient profile sharpness, which is extracted from gradient description models. In these methods, statistical dependencies are modeled by estimating HR edge-related parameters according to those observed in LR images. However, the modeling procedure is accomplished point by point, which is complex and inflexible. In fact, deep learning is outstanding at handling probability transformations over the distribution of pixels, yet few methods have exploited this ability in gradient-relevant SR. Moreover, Zhu et al.  propose a gradient-based SR method by collecting a dictionary of gradient patterns and modeling deformable gradient compositions. Yang et al.  propose a recurrent residual network to reconstruct fine details guided by edges which are extracted by an off-the-shelf edge detector. While edge reconstruction and gradient field constraints have been utilized in some methods, their purpose is mainly to recover high-frequency components for PSNR-oriented SR methods. Different from these methods, we aim to reduce the geometric distortions produced by GAN-based methods and exploit gradient maps as structural guidance for SR. For deep adversarial networks, a gradient-space constraint may provide additional supervision for better image reconstruction. To the best of our knowledge, no GAN-based SR method has exploited gradient-space guidance for preserving texture structures. In this work, we aim to leverage gradient information to further improve GAN-based SR methods.
3 Method
In this section, we first introduce the overall framework. Then we present the details of the gradient branch, the attentive fusion module and the final objective functions accordingly.
3.1 Overview
In SISR, we aim to take LR images $I^{LR}$ as inputs and generate SR images $I^{SR}$, given their HR counterparts $I^{HR}$ as ground-truth. We denote the generator as $G$ and its parameters as $\theta$; then we have $I^{SR} = G(I^{LR}; \theta)$. $I^{SR}$ should be as similar to $I^{HR}$ as possible. If the parameters are optimized by a loss function $L(\cdot, \cdot)$, we have the following formulation:
$$\hat{\theta} = \arg\min_{\theta} \; \mathbb{E}_{I^{LR}} \, L\big(G(I^{LR}; \theta),\, I^{HR}\big).$$
The overall framework is depicted in Figure 2. The generator is composed of two branches, one of which is a structure-preserving SR branch and the other a gradient branch. The SR branch takes the LR image as input and aims to recover the SR output with the guidance provided by the SR gradient map from the gradient branch.
3.2 Details in Architecture
3.2.1 Gradient Branch
The target of the gradient branch is to estimate the translation of gradient maps from the LR modality to the HR one. The gradient map for an image $I$ is obtained by computing the differences between adjacent pixels:
$$\nabla I(x, y) = \big(I(x+1, y) - I(x-1, y),\; I(x, y+1) - I(x, y-1)\big), \qquad M(I)(x, y) = \|\nabla I(x, y)\|_2,$$
where $M(\cdot)$ stands for the operation that extracts the gradient map, whose elements are gradient magnitudes at pixels with coordinates $(x, y)$. The operation to get the gradients can be easily achieved by a convolution layer with a fixed kernel. In fact, we do not consider gradient direction information, since gradient intensity is adequate to reveal the sharpness of local regions in recovered images. Hence we adopt the intensity maps as the gradient maps. Such gradient maps can be regarded as another kind of image, so techniques for image-to-image translation can be utilized to learn the mapping between the two modalities. The translation process is equivalent to a spatial distribution translation from LR edge sharpness to HR edge sharpness. Since most areas of the gradient map are close to zero, the convolutional neural network can concentrate more on the spatial relationship of outlines. Therefore, it may be easier for the network to capture structure dependency and consequently produce accurate gradient maps for SR images.
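The fixed-kernel extraction described above can be written in a few lines. The sketch below uses our own naming and kernel choice (central differences), not necessarily the authors' exact implementation; it computes horizontal and vertical differences with frozen convolution kernels and keeps only the intensity:

```python
import torch
import torch.nn.functional as F

def gradient_map(img: torch.Tensor) -> torch.Tensor:
    """Gradient-magnitude map of a (N, C, H, W) image via fixed kernels."""
    # Fixed (non-learned) kernels computing differences of adjacent pixels.
    kx = torch.tensor([[0., 0., 0.],
                       [-1., 0., 1.],
                       [0., 0., 0.]])
    ky = kx.t()
    c = img.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(img, kx, padding=1, groups=c)  # horizontal differences
    gy = F.conv2d(img, ky, padding=1, groups=c)  # vertical differences
    # Keep only the gradient length; direction is discarded as in the text.
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)
```

Because the kernels are fixed, the operation adds no learnable parameters yet remains differentiable.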
As shown in Figure 2, the gradient branch incorporates several intermediate-level representations from the SR branch. The motivation of this scheme is that the well-designed SR branch carries rich structural information which is pivotal to the recovery of gradient maps. Hence we utilize these features as a strong prior to promote the performance of the gradient branch, whose parameters can be largely reduced in this case. Between every two incorporated features there is a gradient block, which can be any basic block for extracting higher-level features. Once we obtain the SR gradient maps from the gradient branch, we integrate the gradient features into the SR branch to guide SR reconstruction in turn. The magnitude of the gradient map implicitly reflects whether a recovered region should be sharp or smooth. In practice, we feed the feature maps produced by the next-to-last layer of the gradient branch to the SR branch. Meanwhile, we generate the output gradient maps by a convolution layer that takes these feature maps as inputs.
3.2.2 Structure-Preserving SR Branch
We design a structure-preserving SR branch to produce the final SR outputs. This branch consists of two parts. The first part is a regular SR network comprising multiple generative neural blocks, which can be of any architecture. Here we adopt the Residual in Residual Dense Block (RRDB) proposed in ESRGAN . There are 23 RRDB blocks in the original model, so we incorporate the feature maps from the 5th, 10th, 15th and 20th blocks into the gradient branch. Since regular SR models produce images with only 3 channels, we remove the last convolutional reconstruction layer and feed the output features to the second part. The second part of the SR branch wires in the SR gradient feature maps obtained from the gradient branch as mentioned above. We incorporate the structural information by a fusion block which merges the features from the two branches: specifically, we concatenate the two features and then use another RRDB block and a convolutional layer to reconstruct the final SR output. It is noteworthy that we add only one RRDB block to the SR branch, so the parameter increment is slight compared to the original model with 23 blocks.
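As a concrete sketch of this fusion step (the class name, channel widths, and the plain residual pair standing in for the RRDB block are our assumptions, not the authors' exact implementation):

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Concatenate SR-branch and gradient-branch features, then fuse them
    with one residual block and a reconstruction conv (RRDB stand-in)."""
    def __init__(self, sr_ch: int, grad_ch: int):
        super().__init__()
        self.reduce = nn.Conv2d(sr_ch + grad_ch, sr_ch, 1)
        self.body = nn.Sequential(
            nn.Conv2d(sr_ch, sr_ch, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(sr_ch, sr_ch, 3, padding=1),
        )
        self.recon = nn.Conv2d(sr_ch, 3, 3, padding=1)

    def forward(self, f_sr, f_grad):
        x = self.reduce(torch.cat([f_sr, f_grad], dim=1))
        x = x + self.body(x)   # residual fusion of the merged features
        return self.recon(x)   # final 3-channel SR image
```

The fusion keeps the extra capacity small: only the single residual block and two convolutions are added on top of the backbone.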
3.3 Objective Functions
Conventional Loss: Most SR methods optimize elaborately designed networks with a common pixelwise loss, which is effective for super resolution measured by PSNR. This objective reduces the average pixel difference between recovered images and ground-truths, but the results may be too smooth to maintain sharp edges. Nevertheless, this loss is still widely used to accelerate convergence and improve SR performance:
$$L_{pix} = \mathbb{E}_{I^{LR}} \, \|G(I^{LR}) - I^{HR}\|_1.$$
Perceptual loss has been proposed in  to improve the perceptual quality of recovered images. Features containing semantic information are extracted by a pre-trained VGG network , and the Euclidean distances between the features of HR images and SR ones are minimized:
$$L_{per} = \mathbb{E}_{I^{LR}} \, \|\phi_i(G(I^{LR})) - \phi_i(I^{HR})\|_2,$$
where $\phi_i$ denotes the $i$-th layer output of the VGG model.
Methods [27, 42] based on generative adversarial networks (GANs) [15, 21, 33, 3, 16, 4] also play an important role in the SR problem. The discriminator $D$ and the generator $G$ are optimized in a two-player game:
$$\min_{G} \max_{D} \; \mathbb{E}_{I^{HR}}\big[\log D(I^{HR})\big] + \mathbb{E}_{I^{LR}}\big[\log\big(1 - D(G(I^{LR}))\big)\big].$$
Following [21, 42], we adopt the relativistic average GAN (RaGAN) to achieve better optimization in practice. Models supervised by the above objective functions merely consider the image-space constraint, but neglect the structural information provided by the gradient space. While the generated results look photo-realistic, they also contain a number of undesired geometric distortions. Thus we introduce the gradient loss to alleviate this issue.
Gradient Loss: Our motivation can be illustrated clearly by Figure 3. Here we only consider a simple 1-dimensional case. If the model is optimized only in image space by the L1 loss, we usually obtain an SR sequence as in Figure 3 (b) given an input testing sequence whose ground-truth is a sharp edge, as in Figure 3 (a). The model fails to recover sharp edges because it tends to output a statistical average of the possible HR solutions seen in the training data. In this case, if we compute and plot the gradient magnitudes of the two sequences, we observe that the SR gradient is flat with low values, while the HR gradient is a spike with high values: the two are far apart. This inspires us that if we add a second-order gradient constraint to the optimization objective, the model can learn more from the gradient space. It helps the model focus on the neighboring configuration, so that the local intensity of sharpness can be inferred more appropriately. Therefore, if the gradient information of Figure 3 (f) is captured, the probability of recovering Figure 3 (c) increases significantly. SR methods can benefit from such guidance to avoid over-smoothed or over-sharpened restoration. Moreover, it is easier to extract geometric characteristics in the gradient space, so geometric structures can be preserved well, resulting in more photo-realistic SR images.
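The 1-D observation can be reproduced numerically; the two sequences below are illustrative stand-ins for Figure 3 (a) and (b), not data from the paper:

```python
import numpy as np

# A sharp 1-D edge (the HR ground truth) and an over-smoothed SR estimate.
hr = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
sr = np.array([0., 0.05, 0.15, 0.35, 0.65, 0.85, 0.95, 1.])

grad = lambda s: np.abs(np.diff(s))  # first-order difference magnitude

hr_peak = grad(hr).max()  # a narrow spike of height 1.0
sr_peak = grad(sr).max()  # a flat, low profile with peak about 0.3
```

Although the image-space L1 distance between the two signals is small, their gradient profiles differ sharply, which is exactly the discrepancy the gradient loss penalizes.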
Here we propose a gradient loss to achieve the above goals. Since the gradient map, as mentioned above, is an ideal tool to reflect the structural information of an image, it can also be utilized as a second-order constraint to supervise the generator. We formulate the gradient loss by diminishing the distance between the gradient map extracted from the SR image and the one from the corresponding HR image. With supervision in both the image and gradient domains, the generator can not only learn fine appearance but also avoid detailed geometric distortions. Therefore, we design two loss terms to penalize the difference in the gradient maps (GM) of the SR and HR images. One is based on the pixelwise loss, as follows:
$$L_{GM}^{pix} = \mathbb{E}_{I^{LR}} \, \big\|M\big(G(I^{LR})\big) - M(I^{HR})\big\|_1,$$
where $M(\cdot)$ denotes the operation that extracts gradient maps.
The other is to discriminate whether a gradient patch is from the HR gradient map. We design another gradient discriminator network $D_{GM}$ to achieve this goal:
$$\min_{D_{GM}} \; -\mathbb{E}_{I^{HR}}\big[\log D_{GM}\big(M(I^{HR})\big)\big] - \mathbb{E}_{I^{LR}}\big[\log\big(1 - D_{GM}\big(M(G(I^{LR}))\big)\big)\big].$$
The gradient discriminator can also supervise the generation of SR results by adversarial learning:
$$L_{adv}^{GM} = -\mathbb{E}_{I^{LR}}\big[\log D_{GM}\big(M(G(I^{LR}))\big)\big].$$
Note that each step in the gradient-map extraction is differentiable, so the model with the gradient loss can be trained in an end-to-end manner. Furthermore, it is convenient to adopt the gradient loss as additional guidance in any generative model thanks to its concise formulation and strong transferability.
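A minimal sketch of the pixelwise gradient-loss term (the function names are ours; `grad_fn` stands for any differentiable gradient-map extractor, such as fixed difference kernels):

```python
import torch
import torch.nn.functional as F

def gradient_loss(sr: torch.Tensor, hr: torch.Tensor, grad_fn) -> torch.Tensor:
    """L1 distance between the gradient maps (GM) of the SR output and the
    HR ground truth. Since grad_fn is differentiable, this term can be
    trained end-to-end alongside the image-space losses."""
    return F.l1_loss(grad_fn(sr), grad_fn(hr))
```

In training, this term is simply added to the image-space losses with its own trade-off weight. Note that a constant intensity shift leaves the gradient maps unchanged, which is why the gradient loss is used together with, not instead of, the image-space losses.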
Overall Objective: In conclusion, we have two discriminators, $D$ and $D_{GM}$, which are optimized by their respective adversarial objectives. For the generator, two groups of losses provide supervision signals simultaneously: one is imposed on the structure-preserving SR branch, while the other reconstructs high-quality gradient maps by minimizing a pixelwise loss in the gradient branch (GB). The overall objective is defined as follows:
$$L = L_{per} + \lambda_1 L_{pix}^{SR} + \lambda_2 L_{GM}^{pix} + \lambda_3 L_{pix}^{GB} + \lambda_4 L_{adv}^{SR} + \lambda_5 L_{adv}^{GM},$$
where $\lambda_1, \ldots, \lambda_5$ denote the trade-off parameters of the different losses. Among these, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the weights of the pixel losses for SR images, gradient maps of SR images, and SR gradient maps, respectively; $\lambda_4$ and $\lambda_5$ are the weights of the adversarial losses for SR images and their gradient maps.
4 Experiments
4.1 Implementation Details
Datasets and Evaluation Metrics: We evaluate the SR performance of our proposed SPSR method. We utilize DIV2K  as the training dataset and five commonly used benchmarks for testing: Set5 , Set14 , BSD100 , Urban100  and General100 . We downsample HR images by bicubic interpolation to obtain the LR inputs, and consider a single scaling factor in our experiments. We choose Perceptual Index (PI) , Learned Perceptual Image Patch Similarity (LPIPS) , PSNR and Structural Similarity (SSIM)  as the evaluation metrics. Lower PI and LPIPS values indicate higher perceptual quality.
Training Details: We use the architecture of ESRGAN  as the backbone of our SR branch and the RRDB block  as the gradient block. We randomly sample 15 LR patches for each input mini-batch, with the ground-truth HR patches cropped at the corresponding positions. We initialize the generator with the parameters of a pre-trained PSNR-oriented model. The pixelwise loss, perceptual loss, adversarial loss and gradient loss are used as the optimization objectives. A pre-trained 19-layer VGG network  is employed to calculate the feature distances in the perceptual loss, and we use a VGG-style network to perform discrimination. The Adam optimizer  is used for optimization. The generator and discriminator share the same initial learning rate, which is halved at 50k, 100k, 200k and 300k iterations. As for the trade-off parameters of the losses, we follow the settings in  for the image-space pixel and adversarial weights, and set the weights of the gradient-space losses equal to those of their image-space counterparts. The weight of the pixel loss in the gradient branch is chosen for better gradient-translation performance. All the experiments are implemented in PyTorch on NVIDIA GTX 1080Ti GPUs.
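The optimization schedule described above can be set up as follows. The initial learning rate of 1e-4 and the Adam betas (0.9, 0.999) are assumed values chosen to match common GAN-SR practice, not figures from the text:

```python
import torch

# Hypothetical model parameters; in practice these come from the generator.
params = [torch.nn.Parameter(torch.zeros(1))]

# Adam with assumed hyperparameters (lr and betas are our guesses).
opt = torch.optim.Adam(params, lr=1e-4, betas=(0.9, 0.999))

# Halve the learning rate at 50k, 100k, 200k and 300k iterations.
sched = torch.optim.lr_scheduler.MultiStepLR(
    opt, milestones=[50_000, 100_000, 200_000, 300_000], gamma=0.5)
```

The same scheduler would be instantiated for the discriminator, since both networks share the schedule.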
4.2 Results and Analysis
Quantitative Comparison: We compare our method quantitatively with state-of-the-art perceptual-driven SR methods, including SFTGAN , SRGAN , ESRGAN  and NatSR . Results of PI, LPIPS, PSNR and SSIM values are presented in Table 1. In each row, the best result is highlighted in red and the second best in blue. SPSR achieves the best PI and LPIPS performance on all the testing datasets, while obtaining the second-best PSNR and SSIM values on most of them. It is noteworthy that while NatSR obtains the highest PSNR and SSIM values on all the datasets, our method surpasses NatSR by a large margin in terms of PI and LPIPS; moreover, NatSR does not achieve even the second-best PI or LPIPS value on any testing set. Thus NatSR behaves more like a PSNR-oriented SR method, tending to produce relatively blurry results with high PSNR compared to other perceptual-driven methods. Besides, we outperform ESRGAN with only a slight increase in the network parameters of the SR branch. Therefore, the results demonstrate the superior ability of our SPSR method to obtain excellent perceptual quality and minor distortions simultaneously.
Qualitative Comparison: We also conduct a visual comparison with perceptual-driven SR methods. From Figure 4 we see that our results are more natural and realistic than those of other methods. For the first image, SPSR infers sharp edges of the bricks properly, indicating that our method is capable of capturing structural characteristics of objects in images. In the other rows, our method also recovers better textures than the compared SR methods: the structures in our results are clear, without severe distortions, while the other methods fail to produce satisfactory appearances for the objects. Gradient maps for the last row are shown in Figure 5. The gradient maps of the other methods tend to have small values or contain structural degradation, while ours are bold and natural. This qualitative comparison shows that our proposed SPSR method can learn more structural information from the gradient space, which helps generate photo-realistic SR images by preserving geometric structures.
|Model||PI||PSNR||SSIM||PI||PSNR||SSIM||PI||PSNR||SSIM|
|SPSR w/o GB||2.864||26.027||0.785||2.370||25.376||0.659||3.604||23.939||0.940|
|SPSR w/o GL||3.028||26.547||0.794||2.456||25.214||0.647||3.605||24.309||0.942|
User Study: We further perform a user study to evaluate visual quality of different SR methods. Detailed settings and results are presented in the supplementary material.
Ablation Study: We conduct further experiments on different models to validate the necessity of each part of our proposed framework. Since we apply the architecture of ESRGAN  in our SR branch, we use ESRGAN as the baseline and compare three models against it. The first has the same architecture as ESRGAN without the gradient branch (GB) and is trained with both the image-space and gradient-space losses. The second is trained without the gradient loss (GL) but includes the gradient branch in the network. The third is our proposed SPSR model, which utilizes both the gradient loss and the gradient branch. Quantitative comparison is presented in Table 2. SPSR w/o GB shows a significant enhancement in PI performance over ESRGAN, which demonstrates the effectiveness of the proposed gradient loss in improving perceptual quality. Besides, the results of SPSR w/o GL show that the gradient branch can significantly improve either PI or PSNR while largely preserving the other. As for the complete model, SPSR surpasses ESRGAN on all the measurements over all the testing sets. Therefore, the effectiveness of our method is clearly verified.
Effects of the Gradient Branch: In order to validate the effectiveness of the gradient branch, we also visualize the output gradient maps, as shown in Figure 6. Given HR images with sharp edges, the extracted HR gradient maps have thin and clear outlines for objects in the images. However, the gradient maps extracted from the LR counterparts commonly have thick lines after bicubic upsampling. Our gradient branch takes LR gradient maps as inputs and produces HR gradient maps, so as to provide explicit structural information as guidance for the SR branch. By treating gradient generation as an image translation problem, we can exploit the strong generative ability of deep models. From the output gradient map in Figure 6 (d), we can see that our gradient branch successfully recovers thin and structurally faithful gradient maps.
We conduct another experiment to evaluate the effectiveness of the gradient branch. Starting from a complete SPSR model, we disable the features from the gradient branch and use only the SR branch for inference. The visualization results are shown in Figure 7. From the patches, we can see that the furs and whiskers super-resolved by the SR branch alone are blurrier than those recovered by the complete model. The change in detailed textures reveals that the gradient branch helps produce sharp edges for better perceptual fidelity.
5 Conclusion
In this paper, we have proposed a structure-preserving super resolution method (SPSR) with gradient guidance to alleviate the issue of geometric distortions commonly present in the SR results of perceptual-driven methods. We preserve geometric structures in two ways. First, we build a gradient branch which recovers high-resolution gradient maps from the LR ones and provides gradient information to the SR branch as explicit structural guidance. Second, we propose a new gradient loss to impose second-order restrictions on the recovered images. Geometric relationships can be better captured with both image-space and gradient-space supervision. Quantitative and qualitative experimental results on five popular benchmark testing sets have shown the effectiveness of our proposed method.
Acknowledgments
This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grant 61822603, Grant U1813218, Grant U1713214, and Grant 61672306, in part by the Shenzhen Fundamental Research Fund (Subject Arrangement) under Grant JCYJ20170412170602564, and in part by the Tsinghua University Initiative Scientific Research Program.
References
- (2017) NTIRE 2017 challenge on single image super-resolution: dataset and study. In CVPR, pp. 126–135.
- (2019) Night-to-day image translation for retrieval-based localization. In ICRA, pp. 5958–5964.
- (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875.
- (2017) BEGAN: boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717.
- (2012) Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In BMVC.
- (2018) The 2018 PIRM challenge on perceptual image super-resolution. In ECCV, pp. 334–355.
- (2004) Super-resolution through neighbor embedding. In CVPR, pp. 275–282.
- (2014) Learning a deep convolutional network for image super-resolution. In ECCV, pp. 184–199.
- (2016) Accelerating the super-resolution convolutional neural network. In ECCV, pp. 391–407.
- (1979) Lanczos filtering in one and two dimensions. Journal of Applied Meteorology 18 (8), pp. 1016–1022.
- (2007) Image upsampling via imposed edge statistics. TOG 26 (3), pp. 95.
- (2011) Image and video upscaling from local self-examples. TOG 30 (2), pp. 12.
- (2002) Example-based super-resolution. CG&A 22 (2), pp. 56–65.
- (2009) Super-resolution from a single image. In ICCV, pp. 349–356.
- (2014) Generative adversarial nets. In NIPS, pp. 2672–2680.
- (2017) Improved training of Wasserstein GANs. In NIPS, pp. 5767–5777.
- (2016) Deep residual learning for image recognition. In CVPR, pp. 770–778.
- (2015) Single image super-resolution from transformed self-exemplars. In CVPR, pp. 5197–5206.
- (1991) Improving resolution by image registration. CVGIP 53 (3), pp. 231–239.
- (2016) Perceptual losses for real-time style transfer and super-resolution. In ECCV, pp. 694–711.
- (2018) The relativistic discriminator: a key element missing from standard GAN.
- (1982) Cubic convolution interpolation for digital image processing. TASSP 29, pp. 1153–1160.
- (2016) Accurate image super-resolution using very deep convolutional networks. In CVPR, pp. 1646–1654.
- (2016) Deeply-recursive convolutional network for image super-resolution. In CVPR, pp. 1637–1645.
- (2010) Single-image super-resolution using sparse regression and natural image prior. TPAMI 32 (6), pp. 1127–1133.
- (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- (2017) Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, pp. 4681–4690.
- (2019) Feedback network for image super-resolution. In CVPR, pp. 3867–3876.
- (2017) Deep photo style transfer. In CVPR, pp. 4990–4998.
- (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, pp. 416–425.
- (2017) Automatic differentiation in PyTorch. In NIPS-W.
- (2019) SROBB: targeted perceptual loss for single image super-resolution. arXiv preprint arXiv:1908.07222.
- (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
- (2017) EnhanceNet: single image super-resolution through automated texture synthesis. In ICCV, pp. 4491–4500.
- (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR, pp. 1874–1883.
- (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
- (2019) Natural and realistic single image super-resolution with explicit natural manifold discrimination. In CVPR, pp. 8122–8131.
- (2008) Image super-resolution using gradient profile prior. In CVPR, pp. 1–8.
- (2010) Gradient profile prior and its applications in image super-resolution and enhancement. TIP 20 (6), pp. 1529–1542.
- (2019) CFSNet: toward a controllable feature space for image restoration. arXiv preprint arXiv:1904.00634.
- (2018) Recovering realistic texture in image super-resolution by deep spatial feature transform. In CVPR, pp. 606–615.
- (2018) ESRGAN: enhanced super-resolution generative adversarial networks. In ECCV, pp. 63–79.
- (2004) Image quality assessment: from error visibility to structural similarity. TIP 13 (4), pp. 600–612.
- (2010) Robust web image/video super-resolution. TIP 19 (8), pp. 2017–2028.
- (2015) Single image super-resolution based on gradient profile sharpness. TIP 24 (10), pp. 3187–3202.
- (2008) Image super-resolution as sparse representation of raw image patches. In CVPR, pp. 1–8.
- (2010) Image super-resolution via sparse representation. TIP 19 (11), pp. 2861–2873.
- (2017) Deep edge guided recurrent residual learning for image super-resolution. TIP 26 (12), pp. 5895–5907.
- (2010) On single image scale-up using sparse-representations. In ICCS, pp. 711–730.
- (2018) The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, pp. 586–595.
- (2018) Image super-resolution using very deep residual channel attention networks. In ECCV, pp. 286–301.
- (2018) Residual dense network for image super-resolution. In CVPR, pp. 2472–2481.
- (2015) Modeling deformable gradient compositions for single-image super-resolution. In CVPR.
Appendix A User Study
We conduct a user study as a subjective assessment of the visual performance of different SR methods on benchmark datasets. HR images are displayed as references, while the SR results of our SPSR method, ESRGAN , NatSR  and SRGAN  are presented in a randomized order. Human raters are asked to rank the four SR versions according to their perceptual quality. In total, we collect 1290 votes from 43 human raters. The summarized results are presented in Figure 8. As shown, our SPSR method receives far more rank-1 votes than ESRGAN, NatSR and SRGAN. Meanwhile, most SR results of ESRGAN are voted the second best among the four methods, since there are more structural distortions in the recovered images of ESRGAN than in ours. NatSR and SRGAN fail to obtain satisfactory results; we think the reason is that they sometimes generate relatively blurry textures and undesirable artifacts. The comparison with the state-of-the-art GAN-based SR methods verifies the superiority of our proposed method in generating high-fidelity SR results.
Appendix B More Qualitative Results
We display more comparisons with state-of-the-art SR methods, including EnhanceNet , SFTGAN , SRGAN , ESRGAN  and NatSR , as shown in Figures 9, 10, 11, 12 and 13. The results show that our SPSR method outperforms the other SR methods in recovering structurally pleasing and photo-realistic images. We also visualize the outputs of the gradient branch, as shown in Figure 14, where the gradient branch succeeds in converting LR gradient maps to HR ones.