Image super-resolution (SR) is a rapidly growing research area in computer vision and has received considerable attention in recent years. Image SR aims to recover high-resolution (HR) images from given low-resolution (LR) images. It is essential for a wide range of real-world applications such as medical image processing [19, 14, 18], compressed image/video enhancement [25, 2], surveillance and security [29, 44], etc. Image super-resolution is not only about improving the visual quality of images: more importantly, almost all vision tasks can benefit from it. HR images generated by SR methods offer more options for vision tasks because they contain richer information.
In the field of remote sensing, image super-resolution is even more significant. SR methods can facilitate other remote sensing missions, such as target detection [1, 45, 35], environmental monitoring [4, 11, 27], scene analysis and so on. Remote sensing images are generally acquired by satellites from high altitude, which results in their low spatial resolution. The quality of remote sensing images is also affected by many factors, such as motion blur, atmospheric interference (e.g., cloud cover), ultra-long-range imaging, transmission noise, etc. [38, 40]. Some of these problems can be solved by utilizing more advanced equipment, but such equipment is more expensive with respect to both launch deployment and routine maintenance. It is therefore reasonable to conclude that the most cost-effective way to obtain high-resolution images is through SR algorithms.
Unfortunately, image super-resolution is a severely ill-posed problem, as the process of image degradation is not unique and there is therefore no unique solution. Traditional image super-resolution algorithms are basically interpolation-based or reconstruction-based. Interpolation-based methods obtain a pixel value by using its neighborhood information, and interpolation algorithms such as linear and bicubic interpolation and Lanczos upsampling differ essentially in neighborhood selection and calculation methods. Although interpolation-based methods are less computationally demanding, they often fail to recover high-frequency information and therefore result in an excessively smooth image with blurred details, especially edges and texture information. Reconstruction-based SR algorithms achieve better results by using different prior knowledge or constraints, in the form of distributions or energy functions, between HR and LR images. Local [32, 34], nonlocal and sparse priors [8, 28] are the most widely used constraints for image super-resolution tasks. Reconstruction-based methods mostly use a single prior or a combination of several priors. However, in comparison to interpolation-based methods, they are more computationally demanding. Meanwhile, these manually designed priors may not perform well when the scene changes.
Recently, with the development of machine learning (ML), learning-based SR algorithms have emerged. These algorithms attempt to establish an implicit mapping from LR images to the corresponding HR images through ML models. Among these algorithms, there has been a huge explosion in the number of deep learning (DL)-based SR algorithms. In this paper, we focus only on DL-based SR algorithms. Since Dong et al. proposed SRCNN
in 2014, an increasing number of studies have been performed using convolutional neural networks (CNNs) for SR tasks, due to the powerful non-linear fitting and learning capabilities of CNNs. At the same time, with the development of several excellent CNNs (ResNet, DenseNet, etc.), more CNN-based SR networks have emerged, such as VDSR, EDSR, DBPN and others. These end-to-end models need to be trained with a large number of paired LR and HR images to learn the mapping from LR to HR images. Thanks to the advancement of DL, these models can use deeper and more complex networks to learn higher-level features and thus produce higher-quality HR images. Although CNN-based SR algorithms outperform the others, remote sensing images are somewhat different from natural images. Compared with natural images, the target in a remote sensing image covers fewer pixels and the background is more complex. For example, a desert image may contain little texture, but a dense residential area will contain an extremely rich amount of textural information, making super-resolution of remote sensing images more difficult. Current work [38, 41] already addresses some of these issues to some extent.
In this work, we propose a GAN-based SR network, called the multi-attention GAN (MA-GAN), that increases the resolution of remote sensing images in a satisfactory way, as shown in Figure 1 and Figure 2. The main body of the generator, which is the core of the super-resolution task, consists of two modules: the PCRDB block (Figure 3) and the AUP block (Figure 4(b)). The structure of the PCRDB block is similar to that of ESRGAN, which includes skip and dense connections to better extract features. However, we replace the last convolution operation with an AttPConv of our own design. The AttPConv block first performs multi-scale convolution without requiring more computation, and subsequently computes channel attention on the feature maps obtained after convolution at different scales in order to dynamically adjust each feature map and achieve better SR performance. Compared with ESRGAN, which uses a fixed, manually designed parameter to scale the residuals, our PCRDB undoubtedly makes fuller use of the powerful learning ability of CNNs and is more generalizable. The AUP block is a module that can upsample the input feature map by an arbitrary multiple. It first upsamples the input by nearest-neighbor interpolation and then fine-tunes the result of the upsampling by means of a small neural network and PA. Since simple upsampling can lead to excessively smooth output, the design of the AUP block compensates for the drawbacks caused by interpolation and thus enables better SR results.
In summary, the contributions of this work are as follows:
First, we propose a GAN-based SR network that can increase the resolution of remote sensing images by any scale factor. Our experiments on the NWPU-RESISC45 dataset demonstrate that the proposed MA-GAN outperforms state-of-the-art SR methods.
We design the PCRDB block, which contains the AttPConv module. The PCRDB block extracts features better through skip and dense connections; features at different scales are obtained by multi-scale convolution in the final AttPConv block, and the residuals are then adjusted for each feature map by the final channel attention.
We design the AUP block. With nearest-neighbor interpolation at the beginning, the AUP block can achieve upsampling at any scale. The small CNN and PA immediately after the interpolation adjust the interpolated feature maps to achieve better upsampling results.
The remainder of this paper is organized as follows. Section II briefly reviews the related works. Section III presents the proposed MA-GAN in detail. Experimental results are shown in Section IV. Finally, the conclusion is presented in Section V.
II Related Work
II-A Upsampling Method
The aim of image SR is to improve the resolution of the image, so an upsampling method is indispensable for any SR algorithm, and the way in which it is performed also affects the final result. There are three commonly used upsampling methods. The first is interpolation. The second is the subpixel convolution layer proposed by Shi et al., which rearranges an $H \times W \times Cr^2$ feature map into an $rH \times rW \times C$ feature map for a scale factor $r$ ($H$, $W$ and $C$ are the height, width and channels of the feature maps). The last upsampling method is transposed convolution, also called deconvolution in some papers. The most common approach to using transposed convolution is the up- and down-projection unit module proposed in DBPN. In recent studies, researchers have favored the direct use of interpolation for upsampling. This is because operations such as transposed convolution can introduce checkerboard-like artifacts that affect the experimental results, and subpixel and transposed convolution can only upsample by integer multiples, which is more restrictive. In this paper, interpolated upsampling is used to achieve super-resolution at any scale factor.
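To make the rearrangement concrete, the following minimal NumPy sketch (our own illustration, not code from any of the cited papers) implements the channel-to-space mapping of the subpixel convolution layer:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) feature map into (C, r*H, r*W),
    as in the subpixel convolution layer of Shi et al."""
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0
    c = c_r2 // (r * r)
    # split the channel axis into (C, r, r), then interleave into space
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)          # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(4 * 2 * 2).reshape(4, 2, 2)   # C*r^2 = 4 with r = 2, so C = 1
y = pixel_shuffle(x, 2)
print(y.shape)  # (1, 4, 4)
```

Note that the output spatial size is fixed at an integer multiple $r$ of the input, which is exactly the restriction mentioned above.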
II-B Super-Resolution Framework
SR networks can be divided into four types, depending on where the upsampling operation is located in the network: pre-upsampling, post-upsampling, progressive upsampling, and iterative up-and-down sampling SR frameworks. The pre-upsampling framework [5, 6] performs the difficult upsampling task by interpolation at the beginning and later refines the interpolated image. The pre-upsampling framework is easier to train because only the coarse images need to be refined, but it costs more time and space because of its higher-dimensional feature maps. The post-upsampling framework [24, 26] can be trained much more rapidly because it places the upsampling step at the end. By dividing the upsampling into smaller tasks, the progressive upsampling framework not only speeds up training but also yields better results. The iterative up-and-down sampling framework [15, 38] allows for better mining of the relationships between paired LR and HR images, but because of its complex structure and large number of up- and down-sampling operations, it often requires heavy manual network design.
II-C Loss Function
The loss function is one of the most essential aspects in image SR tasks since one LR image could correspond to several HR images. A suitable loss function brings the generative model closer to the true HR image area in its latent space.
We denote $I_{LR}$ for an LR image, $I_{HR}$ for an HR image and $I_{SR}$ for the generated HR image. Almost all SR algorithms introduce a pixelwise loss that brings $I_{SR}$ closer to $I_{HR}$ in pixel value. $L_1$ loss and $L_2$ loss are the most frequently used pixelwise losses. The pixel-level loss improves the peak signal-to-noise ratio (PSNR) of $I_{SR}$, but results in loss of high-frequency information. Therefore, Ledig et al. presented perceptual loss with respect to perceptually relevant characteristics. GAN-based models have adversarial loss, which helps the generator to produce higher-quality images with the help of a discriminator. In addition to these losses, there are also texture losses, cycle consistency loss, etc.
III Proposed Multi-Attention Generative Adversarial Nets
In this section, we will introduce the proposed MA-GAN in detail. First, we will present the overall framework of the GAN used in this paper. Second, we will introduce the multi-attention mechanism and describe each block in detail. Finally, we will present a description of the loss function used in this paper.
III-A Network Framework
To speed up the training, we design a generator that belongs to the post-upsampling framework. The generator network can be divided into four parts. The head of the generator is a Convolution-BatchNorm-LeakyReLU block, while the body consists of PCRDB modules, followed by upsampling blocks. Finally, the tail of the generator consists of a Convolution-Tanh block. In this paper, we consider the SR task as an optimization task, so we perform pixel-level summation of the output of the tail with the interpolated $I_{LR}$, which has the same spatial resolution as $I_{HR}$. Since GANs are difficult to train, this operation reduces the difficulty of training and shortens the time required to obtain the desired model. The detailed architectures of the PCRDB and AUP blocks will be introduced in Section III-B and Section III-C.
The discriminator we use is the same as in SRGAN, which mainly consists of Convolution-BatchNorm-LeakyReLU blocks. The generated image is fed into the discriminator together with $I_{HR}$ to calculate the adversarial loss that guides the generator training.
III-B Pyramidal Convolution in Residual-Dense Block
To introduce the PCRDB block, we first present the AttPConv block. AttPConv is a special convolution combining multi-scale convolution and multi-channel attention. As shown in Figure 4(a), our proposed multi-scale convolution is not simply composed of multiple parallel conventional convolutions, and we prefer to call it pyramidal convolution. The pyramidal convolution is implemented internally by grouped convolution without increasing the computational cost or the model complexity. The pyramidal convolution in this paper has three kernels of different sizes, corresponding to feature map groups of 1, 4 and 8, respectively. Each group convolution is followed by a channel attention module, and all the feature maps are concatenated together as the final output. Based on this structure, the AttPConv block has the same input and output forms as a conventional convolution.
Since the AttPConv module contains an attention module, it is used as the last convolutional module of the PCRDB block. AttPConv follows three Convolution-LeakyReLU modules, as shown in Figure 3, which are connected by dense connections. The final output feature maps are obtained by element-level addition of the output of AttPConv and the input of the PCRDB module. It is worth mentioning that the residual-in-residual dense block (RRDB) proposed in ESRGAN has a structure similar to PCRDB. However, the RRDB block uses residual scaling [26, 33], multiplying the residuals by a constant between 0 and 1 before adding them to the main path to prevent instability. This constant can only be determined experimentally, which entails additional work. In contrast, the last AttPConv block in PCRDB not only uses convolution kernels of different sizes to extract feature information at different scales; its channel attention also fulfills the role of the scaling constant, which is equivalent to multiplying by a dynamic scaling constant whose value is different for each feature map, thus scaling the residuals more accurately.
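The channel-attention rescaling that replaces ESRGAN's fixed residual scaling constant can be sketched as follows (a minimal SE-style NumPy illustration; the layer shapes and reduction ratio are our assumptions, not the paper's exact configuration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """SE-style channel attention over a (C, H, W) feature map:
    squeeze by global average pooling, excite with two small linear
    layers, then rescale each channel by a weight in (0, 1). This
    per-channel weight plays the role of ESRGAN's fixed residual
    scaling constant, but is learned and input-dependent."""
    s = x.mean(axis=(1, 2))                   # squeeze: (C,)
    a = sigmoid(w2 @ np.maximum(w1 @ s, 0))   # excitation: (C,), values in (0, 1)
    return x * a[:, None, None]               # rescale each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))              # reduction ratio 4 (hypothetical)
w2 = rng.standard_normal((8, 2))
y = channel_attention(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Because the sigmoid output lies strictly in (0, 1), every channel of the residual is attenuated rather than amplified, which is the stabilizing effect the fixed scaling constant provides in RRDB.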
III-C Attention-Based Upsample Block
We mentioned several upsampling methods in Section II-A; since the subpixel convolution layer and transposed convolution present some drawbacks, we use an interpolation-based upsampling method in this paper. However, simple interpolation can cause excessive smoothing of the image; thus, we design an upsampling module based on pixel attention (PA), as shown in Figure 4(b).
First, we perform nearest-neighbor interpolation on the input of the upsampling module to improve the spatial resolution of the feature map. It is then fed into two Convolution-LeakyReLU blocks, and the feature map of the final Convolution-LeakyReLU block is denoted as $x$. The working process of the PA module can be described by Equation (1):

$$x_{out} = \sigma\left(f_{1\times1}(x)\right) \otimes x \quad (1)$$
Here, $x_{out}$ refers to the output of the PA module, $f_{1\times1}$ represents the convolution operation with a kernel size of $1\times1$, $\sigma$ is the sigmoid function, and $\otimes$ denotes element-wise multiplication. The number of channels of the feature maps is reduced by $f_{1\times1}$, and the attention map is then obtained by the sigmoid operation. The values of the attention map are between 0 and 1 so that the feature map can be adjusted at the element level to produce better SR results.
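A minimal NumPy sketch of the two AUP ingredients, nearest-neighbor upsampling and the pixel attention of Equation (1) (the intermediate Convolution-LeakyReLU blocks are omitted for brevity, and the $1\times1$ kernel `w` is a hypothetical weight vector):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nearest_upsample(x, r):
    """Nearest-neighbor upsampling of a (C, H, W) feature map by factor r."""
    return x.repeat(r, axis=1).repeat(r, axis=2)

def pixel_attention(x, w):
    """PA as sketched from Equation (1): a 1x1 convolution (here a plain
    channel mixing with hypothetical weights w) followed by a sigmoid gives
    a per-pixel gate in (0, 1) that rescales the feature map element-wise."""
    a = sigmoid(np.tensordot(w, x, axes=([0], [0])))  # (H, W) attention map
    return x * a[None, :, :]

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 3, 3))
up = nearest_upsample(x, 2)          # (4, 6, 6)
w = rng.standard_normal(4)
out = pixel_attention(up, w)
print(up.shape, out.shape)
```

Because nearest-neighbor interpolation accepts any target size, this construction is what lets the AUP block upsample by an arbitrary multiple, with the attention-gated refinement compensating for the blocky interpolation result.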
III-D Loss Function
In general, $L_2$ loss speeds up the training process, but it can introduce gradient explosion, and therefore the results are less robust. $L_1$ loss has a stable gradient and is more robust, but in contrast, the training is slower and the solution is less stable. Since $L_1$ and $L_2$ losses both exhibit their own advantages and disadvantages, we choose smooth-$L_1$ loss, which combines the advantages of $L_1$ and $L_2$ losses, as the pixelwise loss. The smooth-$L_1$ loss is calculated as:

$$L_{pix} = \frac{1}{HWC}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{C} \mathrm{smooth}_{L_1}\left(I_{HR}^{(i,j,k)} - I_{SR}^{(i,j,k)}\right) \quad (2)$$

where

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

and $H$, $W$, and $C$ are the height, width and channels of $I_{HR}$, respectively.
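The piecewise behavior of the smooth-$L_1$ function is easy to verify numerically (a minimal NumPy sketch):

```python
import numpy as np

def smooth_l1(x):
    """Smooth-L1: quadratic (L2-like) inside |x| < 1, linear (L1-like) outside."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

vals = smooth_l1(np.array([0.5, 2.0]))
print(vals)  # 0.5 -> 0.125 (quadratic branch), 2.0 -> 1.5 (linear branch)
```

The two branches meet with matching value and slope at |x| = 1, which is why the gradient stays bounded for large errors (avoiding the explosion of $L_2$) while remaining smooth near zero (avoiding the kink of $L_1$).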
$L_{pix}$ makes the $I_{SR}$ as close to the corresponding $I_{HR}$ as possible at each pixel value to obtain a high PSNR. However, this tends to ignore the high-frequency information of the image and yields perceptually unsatisfying images. To solve this, inspired by the perceptual loss in SRGAN, we design the feature loss $L_{fea}$. The $L_{fea}$ is based on the input of the last average pooling layer of the pretrained ResNet-18 model. We denote the process of obtaining the desired feature map by $\phi(\cdot)$. The feature loss is then defined as:

$$L_{fea} = \left\|\phi(I_{HR}) - \phi(I_{SR})\right\|_2^2 \quad (3)$$
We also introduce the adversarial loss $L_{adv}$ from the discriminator to assist in the training of the generator. The adversarial loss for the generator is:

$$L_{adv} = -\log D\left(G(I_{LR})\right) \quad (4)$$

where $G(\cdot)$ denotes the output of the generator and $D(\cdot)$ denotes the output of the discriminator.
The total loss for the generator is:

$$L_G = L_{pix} + \lambda_1 L_{adv} + \lambda_2 L_{fea} \quad (5)$$

where $\lambda_1$ and $\lambda_2$ are the weighting parameters for $L_{adv}$ and $L_{fea}$, respectively.
For the discriminator, we use a loss function consistent with other GANs, shown as follows:

$$L_D = -\log D(I_{HR}) - \log\left(1 - D\left(G(I_{LR})\right)\right) \quad (6)$$
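A minimal numerical sketch of how the generator and discriminator losses described in this subsection combine (the weights `lam1` and `lam2` are placeholders, since the paper's values were not recoverable here, and `feat_hr`/`feat_sr` stand in for the ResNet-18 feature maps):

```python
import numpy as np

def smooth_l1(x):
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def generator_loss(hr, sr, d_sr, feat_hr, feat_sr, lam1=1e-3, lam2=1e-2):
    """Total generator loss: pixel term + lam1 * adversarial + lam2 * feature.
    lam1 and lam2 are placeholder weights; feat_hr/feat_sr stand in for the
    feature maps phi(I_HR) and phi(I_SR)."""
    l_pix = smooth_l1(hr - sr).mean()
    l_adv = -np.log(d_sr)                      # -log D(G(I_LR))
    l_fea = np.mean((feat_hr - feat_sr) ** 2)  # squared feature distance
    return l_pix + lam1 * l_adv + lam2 * l_fea

def discriminator_loss(d_hr, d_sr):
    """Standard GAN discriminator loss: -log D(I_HR) - log(1 - D(G(I_LR)))."""
    return -np.log(d_hr) - np.log(1.0 - d_sr)

g = generator_loss(np.zeros((2, 2)), np.full((2, 2), 0.5), 0.5,
                   np.zeros(3), np.zeros(3))
d = discriminator_loss(0.9, 0.1)
print(g, d)
```

The discriminator loss is small when it scores real images near 1 and generated images near 0; the adversarial term pushes the generator in the opposite direction.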
IV Experimental Results
| ID | Scene | ID | Scene |
| S1 | parking lot | S24 | baseball diamond |
| S4 | sparse residential | S27 | circular farmland |
| S5 | commercial area | S28 | basketball court |
| S7 | tennis court | S30 | sea ice |
| S12 | overpass | S35 | thermal power station |
| S14 | medium residential | S37 | storage tank |
| S16 | ground track field | S39 | railway |
| S18 | mobile home park | S41 | palace |
| S20 | rectangular farmland | S43 | golf course |
IV-A Dataset
We perform experiments on the widely used remote sensing image dataset NWPU-RESISC45. This dataset contains a total of 31,500 images of $256\times256$ pixels covering 45 scenes. NWPU-RESISC45 contains large-scale remote sensing images that vary greatly in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion. It also has high within-class diversity and between-class similarity. Therefore, it is a very challenging dataset for image SR. In this paper, we randomly divide the images of each scene in the dataset into a training set, a validation set and a test set at a fixed ratio. All images are resized to the same resolution in the experiments.
IV-B Implementation Details
We trained our network on a platform equipped with the Intel(R) Xeon(R) CPU (2.50 GHz) with 250-GB memory, an NVIDIA Tesla P100 with 16-GB memory and an AVAGO MR9361 with 2-TB capacity.
Our experiments used three scale factors: the simpler upsampling by a factor of 2, the more challenging upsampling by a factor of 4 and the more difficult upsampling by a factor of 8, which correspond to 4x, 16x, and 64x increases in the number of image pixels, respectively. HR images $I_{HR}$ are obtained from the divided dataset; these images are then downsampled by bicubic interpolation to obtain the LR images $I_{LR}$ that form the input of the generator, and all images are normalized. The weighting parameter $\lambda_1$ for the adversarial loss in Equation (5) and the weighting parameter $\lambda_2$ for the feature loss are fixed across all experiments.
The training process alternates between the discriminator and the generator. We first train the discriminator with $L_D$. The initial learning rate of the discriminator is 0.0001 and is decreased when the training process is 25%, 50%, and 75% complete. We use the Adam optimizer for the discriminator. Whenever a training iteration of the discriminator is completed, the generator is immediately trained with $L_G$. The initial learning rate of the generator is also set to 0.0001 and is decreased when the training process is 25%, 50%, and 75% complete. For the generator, we also use the Adam optimizer.
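The step-decay schedule can be sketched as follows (the decay factor 0.5 is an assumption, since the paper's factor was not recoverable from this copy):

```python
def lr_at(step, total_steps, base_lr=1e-4, decay=0.5):
    """Learning rate for a given update step: start at base_lr and shrink it
    each time training passes 25%, 50%, and 75% completion. The decay
    factor is a hypothetical choice, not the paper's value."""
    milestones = (0.25, 0.50, 0.75)
    passed = sum(step >= m * total_steps for m in milestones)
    return base_lr * decay ** passed

print(lr_at(0, 1000), lr_at(300, 1000), lr_at(800, 1000))
```

The same schedule is applied independently to the generator and the discriminator, so both networks anneal at the same milestones even though they are updated in alternation.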
For the whole training process, we carried out a large number of update iterations for both the generator and the discriminator.
IV-C Evaluation Using Different Numbers of PCRDB Blocks
As for the image super-resolution task, the generator that produces the HR images is undoubtedly the most important part, and the PCRDB and AUP blocks, as the main body of the generator, serve very important roles in it. The number of AUP blocks is determined by the scale factor in the experiment. We set each AUP block to upsample by a scale factor of 2, so the number of AUP blocks is 1, 2, and 3, corresponding to scale factors of 2, 4, and 8, respectively. Therefore, in this subsection, we conduct experiments on generators using different numbers of PCRDB blocks. We design experiments with $n = 3, 4, 5, 6, 7$ PCRDB blocks for scale factors $r$ of 2, 4, and 8, and denote the model containing $n$ PCRDB blocks by MA-GAN($n$). For a fair comparison, the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) are used as quantitative evaluation indicators. The quantitative experimental results are shown in Table II. We also produced a graph to compare the experimental results, shown in Figure 8.
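For reference, the PSNR metric used in this comparison is a direct function of the mean squared error; a minimal NumPy sketch for images with pixel values in $[0, 1]$:

```python
import numpy as np

def psnr(hr, sr, peak=1.0):
    """PSNR in dB between two images with pixel values in [0, peak]."""
    mse = np.mean((hr - sr) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

hr = np.zeros((8, 8))
sr = np.full((8, 8), 0.1)   # constant error of 0.1 -> MSE = 0.01
print(psnr(hr, sr))         # 20.0 dB
```

Because PSNR is logarithmic in the MSE, differences below 1 dB between models (as in Table II) correspond to quite small changes in average pixel error.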
When $r = 2$, MA-GAN(7) reaches the highest PSNR value of 31.98 dB. The lowest value of 30.99 dB corresponds to MA-GAN(4), a difference of 0.99 dB. Additionally, the SSIM values obtained for these two models are 0.9102 and 0.8846, respectively, with a difference of 0.0256. In regard to $r = 4$, MA-GAN(7) reaches the highest PSNR value of 28.05 dB. The lowest is MA-GAN(5) at 27.49 dB, a difference of 0.56 dB. The highest SSIM at this point is that of MA-GAN(3) at 0.7640, while the lowest is that of MA-GAN(5) at 0.7442, a difference of 0.0198. Finally, for $r = 8$, MA-GAN(5) obtains the highest PSNR value of 24.58 dB, while the lowest is that of MA-GAN(6) at 23.85 dB, a difference of 0.73 dB. For SSIM, MA-GAN(4) obtains the highest value of 0.5450, and the lowest is that of MA-GAN(6) at 0.5144, a difference of 0.0306. It can also be determined intuitively from Figure 8 that the performances of the different models do not differ much when $n$ varies. This is due to our multi-attention mechanism, which allows the model to automatically adjust the corresponding feature maps to achieve better SR performance.
IV-D Comparison with Different Methods
We further compare the performance of our MA-GAN with bicubic interpolation and some advanced SR methods, including SRCNN, FSRCNN, VDSR, EDSR and DBPN, on the NWPU-RESISC45 dataset. SRCNN, FSRCNN and VDSR are three CNN-based networks that minimize the $L_2$ loss, while EDSR and DBPN minimize the $L_1$ loss. All of these methods are optimized on the same dataset and environment for a fair comparison.
In Section IV-C, we reached the conclusion that there is little difference in the performance of the models when $n$ varies. Therefore, in this section, for each scale factor, we choose the better-performing MA-GAN: for $r = 2$, MA-GAN(7); for $r = 4$, MA-GAN(3); and for $r = 8$, MA-GAN(4).
Figures 5, 6, and 7 show the visual comparisons between different SR methods for scale factors $r = 2$, 4, and 8, respectively. In these figures, we select only part of each image for magnified display. The selected image areas, marked with red boxes, are rich in texture information and difficult to recover. In this way, we can observe the performances of the different methods. In our experiments, EDSR did not successfully learn the mapping of LR images to HR images, and its results were less than satisfactory. From the figures, we can also determine that although our method achieves the relatively best results, there is still a gap between the $I_{HR}$ and the $I_{SR}$ generated by our model, mainly in the difficulty of recovering tiny textural features.
Tables III, IV, and V show the quantitative comparisons for scale factors $r = 2$, 4, and 8, respectively. We compare the super-resolution performances of the different models for each of the image scenes in the NWPU-RESISC45 dataset to obtain a more detailed comparison. The highest PSNR value for each scene is marked in red, and the highest SSIM is marked in blue.
For $r = 2$ (Table III), MA-GAN(7) obtains an average PSNR of 31.98 dB and an average SSIM of 0.9102, both of which rank first among all the methods. The PSNR of MA-GAN(7) is 1.43 dB higher than the second-best value, while the SSIM is 0.0005 higher. MA-GAN(7) ranks first in PSNR for almost all scenes and first in SSIM for nearly half of the scenes. With $r = 4$ (Table IV), MA-GAN(3) performs best with an average PSNR of 27.62 dB and an average SSIM of 0.7640. The PSNR of MA-GAN(3) is 1.18 dB higher than the second-best value, while the SSIM is 0.0044 higher. The PSNR value of MA-GAN(3) is lower than those of the other comparison methods in only 7 scenes. It obtains the highest SSIM value for 34 scenes and is lower than only DBPN in the other 11 scenes. The last case is $r = 8$ (Table V), where MA-GAN(4) achieves the highest average PSNR of 24.54 dB and the highest average SSIM of 0.5450. The PSNR of MA-GAN(4) is 0.98 dB higher than the second-best result, while the SSIM is 0.0063 higher. MA-GAN(4) performs better in PSNR than almost all other methods and is not ranked first in only 4 scenes. With respect to SSIM, MA-GAN(4) performs better on 29 scenes.
In summary, our MA-GAN outperforms all the other methods for $r = 2$, 4, and 8. In particular, MA-GAN is generally more than 1 dB higher than the second-best results among all methods in terms of PSNR, with a maximum difference of 1.43 dB. However, the lead is not as large in SSIM; the result is only 0.0005 higher than DBPN at $r = 2$. This is probably because the SR task of upsampling by a scale factor of 2 is simpler, so the gap is not significant in this aspect. When $r$ increases, the advantage of MA-GAN in SSIM becomes more obvious, with a maximum improvement of 0.0063 over the second-best method ($r = 8$). Overall, although there are still some unsatisfactory aspects, our MA-GAN performs the best.
V Conclusion
In this paper, we present a GAN-based SR network named the multi-attention GAN (MA-GAN) that learns the mapping from LR to HR images to generate perceptually pleasing HR images. Specifically, we first designed a GAN-based framework for the image SR task. The key to accomplishing the SR task is the image generator with post-upsampling that we designed. The main body of the generator contains two blocks: the PCRDB block and the AUP block. The AttPConv in the PCRDB block is a module that combines multi-scale convolution and channel attention to automatically learn and adjust the scaling of the residuals for better results. The AUP block is a module that combines pixel attention to perform arbitrary multiples of upsampling. These two blocks work together to help generate better-quality images. For the loss function, we design a pixelwise loss and introduce both adversarial loss and feature loss to guide the generator learning. Finally, our experiments demonstrate that the proposed MA-GAN can perform better than some state-of-the-art SR methods.
Our MA-GAN still encounters difficulty in generating tiny textures, which is a problem for all super-resolution algorithms, and we will continue to work on this issue in the future. Additionally, inspired by the satisfactory performance of our model, we will attempt to integrate our SR models into other vision tasks to help improve the results.
- (2018) SOD-MTGAN: small object detection via multi-task generative adversarial network. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 206–221.
- Real-time video super-resolution with spatio-temporal networks and motion compensation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4778–4787.
- Remote sensing image scene classification: benchmark and state of the art. Proceedings of the IEEE 105 (10), pp. 1865–1883.
- (2020) ResLap: generating high-resolution climate prediction through image super-resolution. IEEE Access 8, pp. 39623–39634.
- (2014) Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision, pp. 184–199.
- (2015) Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2), pp. 295–307.
- (2016) Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision, pp. 391–407.
- (2016) Hyperspectral image super-resolution via non-negative structured sparse representation. IEEE Transactions on Image Processing 25 (5), pp. 2337–2352.
- (1979) Lanczos filtering in one and two dimensions. Journal of Applied Meteorology 18 (8), pp. 1016–1022.
- (2020) Pyramidal convolution: rethinking convolutional neural networks for visual recognition. arXiv preprint arXiv:2006.11538.
- (2018) Monitoring the environmental risks around Medinet Habu and Ramesseum temple at West Luxor, Egypt, using remote sensing and GIS techniques. Journal of Archaeological Method and Theory 25 (2), pp. 587–610.
- (2015) Texture synthesis using convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 262–270.
- (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
- (2009) Super-resolution in medical imaging. The Computer Journal 52 (1), pp. 43–63.
- (2018) Deep back-projection networks for super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1664–1673.
- (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
- (2017) Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708.
- (2017) Simultaneous super-resolution and cross-modality synthesis of 3D medical images using weakly-supervised joint convolutional sparse coding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6070–6079.
- (2015) Super resolution techniques for medical image processing. In 2015 International Conference on Technologies for Sustainable Development (ICTSD), pp. 1–6.
- (1981) Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing 29 (6), pp. 1153–1160.
- (2016) Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646–1654.
- (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- (2017) Deep Laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 624–632.
- (2017) Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690.
- (2016) Video super-resolution using an adaptive superpixel-guided auto-regressive model. Pattern Recognition 51, pp. 59–71.
- (2017) Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144.
- (2017) Change detection in heterogeneous remote sensing images via homogeneous pixel transformation. IEEE Transactions on Image Processing 27 (4), pp. 1822–1834.
- (2013) Image super-resolution via double sparsity regularized manifold learning. IEEE Transactions on Circuits and Systems for Video Technology 23 (12), pp. 2022–2033.
- Convolutional neural network super resolution for face recognition in surveillance monitoring. In International Conference on Articulated Motion and Deformable Objects, pp. 175–184.
- (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1874–1883.
- (2017) Can a machine generate humanlike language descriptions for a remote sensing image? IEEE Transactions on Geoscience and Remote Sensing 55 (6), pp. 3623–3634.
- (2008) Image super-resolution using gradient profile prior. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
- Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 31.
- (2010) Super resolution using edge prior and single image detail synthesis. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2400–2407.
- (2001) Super-resolution target identification from remotely sensed images using a Hopfield neural network. IEEE Transactions on Geoscience and Remote Sensing 39 (4), pp. 781–796.
- (2018) ESRGAN: enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops.
- (2010) Exploiting self-similarities for single frame super-resolution. In Asian Conference on Computer Vision, pp. 497–510.
- (2020) E-DBPN: enhanced deep back-projection networks for remote sensing scene image super-resolution. IEEE Transactions on Geoscience and Remote Sensing.
- (2018) Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 701–710.
- (2020) Remote sensing image super-resolution via mixed high-order attention network. IEEE Transactions on Geoscience and Remote Sensing.
- (2020) Scene-adaptive remote sensing image super-resolution using a multiscale attention network. IEEE Transactions on Geoscience and Remote Sensing.
- (2018) Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 286–301.
- (2020) Efficient image super-resolution using pixel attention. arXiv preprint arXiv:2010.01073.
- (2011) Very low resolution face recognition problem. IEEE Transactions on Image Processing 21 (1), pp. 327–340.
- (2016) Ship detection in spaceborne optical image with SVD networks. IEEE Transactions on Geoscience and Remote Sensing 54 (10), pp. 5832–5845.