Single image super-resolution (SISR) refers to estimating a high-resolution (HR) image from a single low-resolution (LR) observation, which is of great significance to many image processing and analysis systems. However, the SISR problem is very challenging due to its ill-posed nature: an LR image corresponds to a set of HR images, most of which are not the desired one. In general, the reconstructed HR image should be visually pleasant and as close to the real one as possible.
The SISR problem has been widely researched over the past 20 years, and plenty of algorithms have been proposed. Roughly speaking, interpolation-based [1, 2, 3, 4, 5, 6, 7, 8, 9], reconstruction-based [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], and learning-based [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43] algorithms are the three main classes of SISR methods. Generally, interpolation-based super-resolution (SR) approaches estimate unknown HR pixels from their neighborhoods (the known LR pixels) according to local structure properties. In reconstruction-based methods, the observation model of the LR image and prior knowledge of the HR image are integrated into an energy function, so that the SR task can be cast as an optimization problem. The prior knowledge, which greatly affects SR performance, is the research focus of this kind of method. Commonly used priors include gradient [10, 11, 12], sparsity [13, 14, 15], and nonlocal self-similarity [14, 15, 16, 17, 18, 19, 20] priors. Many reconstruction-based SR methods use two or more priors to combine their complementary properties. In learning-based methods, a pre-trained mapping between LR and HR images is usually adopted to guide the SR process. According to their core techniques, learning-based methods can be further divided into five subclasses, i.e., neighbor embedding-based [21, 22, 23], example-based [24, 25, 26, 27], sparse coding-based [28, 29, 30, 31, 32], regression-based [33, 34, 35, 36]
, and deep learning-based [37, 38, 39, 40, 41, 42, 43] methods. With fast execution speed and outstanding restoration quality, deep learning-based methods show great potential for the SR problem. Meanwhile, some researchers have attempted to combine different kinds of SR methods to integrate their merits [44, 45].
In some practical applications, such as mobile communication and the internet, images and videos are generally downsampled and compressed to reduce data volume, limited by storage capacity and transmission bandwidth. In these cases, the observations suffer from both downsampling and compression degradations, which makes the SR problem more difficult. Although much research has been devoted to the SISR problem and plenty of effective SISR methods have been proposed over the past few decades, few methods are concerned with compressed images [46, 47, 48, 49, 50, 51]. Roughly, there are two kinds of frameworks for compressed image SR. Some researchers converted this task into an optimization problem via compression process modeling and prior knowledge regularization. In SRCDFOE, the compression distortion is treated as spatially correlated Gaussian noise, and a Markov random field and total variation are used to regularize the estimated HR images. To realize decompression and SR simultaneously, DCSRMOTV incorporates a multi-order total variation model into the JPEG image acquisition model. For this type of method, the main difficulty is how to model the compression process accurately. In addition, it is difficult to balance compression artifacts reduction (CAR) and detail preservation. Another commonly used strategy is to decompose the task into two subproblems (i.e., CAR and SR) and address them with a cascading framework. For example, Xiong et al. combined adaptive regularization and learning-based SR to reduce compression noise and compensate details, respectively. Kang et al. proposed a sparse coding-based SR method for compressed images, in which patches with and without compression artifacts are processed differently. Using a denoised training dataset, Lee et al. presented a dual-learning-based algorithm for compression noise reduction and SR. More recently, Zhao et al.
constructed a three-step framework for compressed image SR, which is composed of BM3D filtering-based compression noise reduction, local encoding-based patch classification, and mapping-based reconstruction. However, most such algorithms treat compression noise reduction and upsampling as two independent stages. Consequently, the resultant images of existing methods tend to either retain compression noise or be over-smoothed. On the whole, research on the SR of compressed images is lacking, and there is still much room for performance improvement.
The core issue of compressed image SR is how to reduce compression noise and preserve details as much as possible while enhancing image resolution. On the one hand, it is hard to remove compression artifacts from super-resolved images without a CAR or denoising stage; what is worse, the compression noise in the LR image may be significantly magnified in the HR output. On the other hand, the CAR and SR operations should not be separated, as part of the details removed in the CAR stage are useful for SR. On the basis of these insights, an end-to-end trainable deep convolutional neural network (CNN) is designed to perform SR on compressed images, which we name CISRDCNN. The CISRDCNN takes the compressed LR image as input and outputs the resultant HR image directly, without any preprocessing or postprocessing stage. Fig. 1 gives an example of the result of CISRDCNN; our result is much more visually pleasant than the LR input and the result of Bicubic interpolation. The framework of the proposed CISRDCNN is illustrated in Fig. 2, and our contributions in this work are mainly in the following aspects:
We propose a deep CNN-based SR framework for compressed images, which reduces compression artifacts and enhances image resolution simultaneously.
To preserve the functions of the different modules in CISRDCNN and achieve joint optimization of CAR and SR, a special strategy is used to train the proposed network, i.e., individual training followed by joint optimization.
Extensive experiments show that the proposed CISRDCNN achieves outstanding SR performance in simulation experiments as well as in tests on real low quality web images.
We explore the application of the proposed CISRDCNN to low bit-rate image coding, and the experimental results demonstrate that it can improve the rate-distortion performance of JPEG over a wide range of coding bit-rates.
The proposed method can be easily extended to other compression standards, such as JPEG 2000, H.264, and HEVC. In addition, this work provides some insights into the SR of low quality LR images (e.g., noisy and blurry images), and we hope it will draw more attention to this kind of problem.
The rest of this paper is organized as follows. Section 2 briefly reviews related works. Section 3 presents the proposed CISRDCNN. Extensive experiments are shown in Section 4. Finally, Section 5 concludes this paper.
2 Related work
2.1 Problem formulation of compressed image SR
Let $x$ and $y$ be the HR and LR images, respectively. Then the conventional SR problem can be formulated as
$$y = Hx + n, \qquad (1)$$
where $H$ represents the composite operator of downsampling and blurring, and $n$ denotes the additive noise. The aim of SR is to obtain a high-quality estimate of $x$ from $y$.
In this work, we address the problem of compressed image SR; therefore, the compression process must be taken into account. Let $C$ be the composite operator of compression and decompression. Eq. (1) then becomes
$$y_c = C(Hx), \qquad (2)$$
where $y_c$ is the compressed LR image. Note that we neglect the additive noise in this work, as our focus is compression artifacts. Correspondingly, our goal is to estimate $x$ from the compressed LR observation $y_c$. For convenience and clarity, we still let $y = Hx$, so that $y_c = C(y)$. Hence, $y$ suffers only from blurring and downsampling, while $y_c$ suffers from blurring, downsampling, and compression degradations. The compression process causes information loss, especially at high compression ratios. Intuitively, it is much harder to recover $x$ from $y_c$ than from $y$. Therefore, we use the intermediate observation $y$ to assist the reconstruction process in the proposed CISRDCNN. Note that this intermediate observation exists only in the training phase; the only input in the testing phase is $y_c$.
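To make the degradation chain concrete, here is a toy 1-D sketch in Python: `blur_downsample` plays the role of the blur-and-downsample operator, and `compress_decompress` mimics lossy compression followed by decompression with uniform quantization. All function names and values are illustrative placeholders, not the actual JPEG pipeline.

```python
def blur_downsample(x, factor=2):
    """Toy stand-in for the blur/downsample operator: average `factor`
    neighboring samples (blur) and keep one value per group (downsampling)."""
    return [sum(x[i:i + factor]) / factor for i in range(0, len(x), factor)]

def compress_decompress(y, step=8):
    """Toy stand-in for compression + decompression: uniform quantization
    mimics the irreversible information loss of lossy coding."""
    return [round(v / step) * step for v in y]

# A 1-D "HR image", its clean LR version, and the compressed LR observation
# that a compressed-image SR method actually receives as input.
x = [10, 12, 30, 34, 90, 96, 20, 22]
y = blur_downsample(x)          # clean LR: only blurred and downsampled
y_c = compress_decompress(y)    # compressed LR: additionally quantized
```

Note that the quantization step cannot be undone, which is why recovering the HR image from the compressed observation is strictly harder than from the clean LR image.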
Many compression methods have been proposed for still images; nevertheless, JPEG remains one of the most widely used standards. Hence, in this work, we take JPEG as an example to test the performance of the proposed CISRDCNN.
2.2 Deep neural networks for image SR
Deep neural networks have been widely used to address image restoration problems, including SR [37, 38, 39, 40, 41, 42, 43, 52, 53], denoising [52, 53], CAR [52, 53, 54, 55], deblurring, dehazing, etc. In this section, we review some relevant deep neural network-based SR methods.
Dong et al. proposed a CNN-based SR framework (SRCNN) composed of three convolutional layers, which realize patch extraction, non-linear mapping, and reconstruction, respectively. The SRCNN has drawn wide attention for its excellent performance and simple network architecture. Later on, Dong et al. presented an accelerated version of SRCNN, named FSRCNN. The FSRCNN incorporates the upsampling operation into the network and has an hourglass-shaped structure, thus achieving remarkable restoration quality and fast execution speed. By contrast, the SRCNN and FSRCNN are relatively shallow. Kim et al. designed a 20-layer CNN (VDSR), which produces a great performance enhancement over SRCNN. For more stable training and better performance, residual learning and gradient clipping are used in VDSR. More recently, Zhang et al. proposed a similar network (DnCNN), which combines further advances in deep learning, including residual learning and batch normalization [59, 60]. The DnCNN shows great effectiveness on several general image restoration problems, including SR and CAR. Overall, deep neural network-based SR methods consistently deliver compelling performance, and most of them are efficient in the testing phase.
To the best of our knowledge, however, very little research has been done on deep neural network-based SR for compressed images. The aim of this work is to propose an effective SR method for compressed images that exploits recent advances in deep CNNs. We design a deep network that realizes CAR and SR jointly.
2.3 Advances on deep neural networks
In recent years, many advances have been made in deep learning. In the following, we introduce some representative achievements related to this work, i.e., residual learning, batch normalization, and the ReLU activation.
2.3.1 Residual learning
He et al. first proposed the residual learning strategy to address the performance degradation problem caused by increasing network depth. The main assumption of residual learning is that learning a residual mapping is much easier than learning the original mapping. With the residual learning framework, deeper networks can be designed and well trained, thus achieving better performance. Actually, a similar idea has been used in many learning-based SR methods, in which the residual image between the ground truth and an initialization (generally, the interpolated image) is predicted, e.g., ScSR, ANR, and A+.
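As a toy numeric illustration of the residual target (all values here are hypothetical): the network regresses the typically small difference between the ground truth and the initialization, and the final prediction adds that residual back to the input.

```python
# With residual learning, the regression target is the difference between
# the ground truth and the current estimate, not the full mapping.
ground_truth = [52.0, 54.0, 60.0, 61.0]
initialization = [50.0, 55.0, 58.0, 62.0]   # e.g., an interpolated image

residual_target = [g - i for g, i in zip(ground_truth, initialization)]

# The final prediction adds the (here: perfectly learned) residual back.
prediction = [i + r for i, r in zip(initialization, residual_target)]
```

Because the residual values are small and centered near zero, they are generally easier for a deep network to learn than the full-intensity mapping.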
2.3.2 Batch normalization
To ease internal covariate shift, Ioffe et al. presented batch normalization. Internal covariate shift refers to the change in the distribution of each layer's inputs during training, which slows down the training process and makes it harder. To address this problem, Ioffe et al. proposed to normalize the layer inputs. More specifically, in each layer, a normalization step and a scale-and-shift step are inserted before the nonlinearity. To realize batch normalization, two extra parameters are added per activation, and these parameters are learned during network training. Batch normalization brings many benefits, such as strong robustness to initialization, fewer training steps, and better performance.
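A minimal sketch of the per-activation normalize-then-scale/shift computation, assuming scalar activations and a plain Python implementation (not the actual framework code); `gamma` and `beta` are the two learned parameters mentioned above:

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalar activations to zero mean / unit variance,
    then apply the learned scale (gamma) and shift (beta)."""
    m = sum(batch) / len(batch)
    var = sum((v - m) ** 2 for v in batch) / len(batch)
    return [gamma * (v - m) / math.sqrt(var + eps) + beta for v in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])  # zero-mean, unit-variance output
```

In training, the statistics are computed per mini-batch as above; at test time, frameworks substitute running averages of the mean and variance.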
2.3.3 ReLU
ReLU is a commonly used activation function in deep neural networks, which outputs zero for non-positive inputs and retains positive inputs. The definition of ReLU is:
$$f(x) = \max(0, x).$$
The ReLU alleviates the gradient vanishing problem to some extent, thus making the training of deep neural networks easier.
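The definition above translates directly into code:

```python
def relu(x):
    """ReLU activation: zero for non-positive inputs, identity for positive."""
    return max(0.0, x)

outputs = [relu(v) for v in [-2.0, -0.5, 0.0, 0.5, 2.0]]
```

For positive inputs the derivative is exactly 1, which is why ReLU passes gradients through unattenuated and thereby alleviates gradient vanishing.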
For a fast and stable training procedure and excellent restoration quality, the proposed CISRDCNN integrates residual learning, batch normalization, and ReLU. Unlike the DnCNN, which employs a single residual unit, the CISRDCNN uses two residual units, since the input and output of CISRDCNN differ in resolution. More details about CISRDCNN are given in Section 3.
3 The proposed CISRDCNN
In this section, we present the CISRDCNN in detail. As illustrated in Fig. 2, the proposed CISRDCNN consists of three modules: a deblocking module (DBCNN), an upsampling module (USCNN), and a quality enhancement module (QECNN). Firstly, the DBCNN removes compression artifacts in the input and generates a better input for USCNN. Secondly, the USCNN magnifies its input to the expected resolution, so no extra interpolation procedure is needed. Finally, the QECNN further improves the quality of the upsampled image. Although the three modules have their respective functions, they are not independent, and the whole network is end-to-end trainable. Overall, the differences between CISRDCNN and relevant CNN-based SR methods (e.g., SRCNN, FSRCNN, VDSR, DnCNN) are mainly in the following aspects:
The CISRDCNN is designed for compressed images, and thus compression noise is carefully taken into consideration. In contrast, most learning-based SR methods do not apply to noisy images.
The CISRDCNN is composed of three functional modules, yet it is end-to-end trainable. In this way, the functions of the three modules can be preserved to some extent; meanwhile, the whole network can be optimized to minimize prediction error.
The CISRDCNN is trained in a particular way. It first trains the three functional modules separately to achieve their respective goals, then optimizes the whole network jointly with a fine-tuning strategy.
In CISRDCNN, this specific design and these improvements for compressed images enable a more accurate estimation of the ground truth. In the following, more details about the architecture and training strategy of CISRDCNN are presented.
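To make the data flow of the three modules concrete, the following schematic uses trivial placeholder functions in place of the real CNNs; the function bodies (adding a constant "residual", nearest-neighbor repetition) are purely illustrative stand-ins, not the actual learned mappings:

```python
def dbcnn(y_c):
    # Deblocking module: predicts a residual and adds it to its input via the
    # identity connection, yielding an artifact-reduced LR image.
    return [v + 0.1 for v in y_c]          # placeholder residual correction

def uscnn(y_hat, factor=2):
    # Upsampling module: the deconvolutional layer enlarges the image;
    # nearest-neighbor repetition is used here as a stand-in.
    return [v for v in y_hat for _ in range(factor)]

def qecnn(x_hat):
    # Quality enhancement module: a second residual refinement in HR space.
    return [v - 0.05 for v in x_hat]       # placeholder residual correction

def cisrdcnn(y_c):
    # End-to-end composition: compressed LR in, HR estimate out.
    return qecnn(uscnn(dbcnn(y_c)))

hr_estimate = cisrdcnn([1.0, 2.0])
```

The key structural point is that the three stages form one differentiable composition, so gradients from the final HR error can flow back through all three modules during joint optimization.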
3.1 Network architecture
For convenience of presentation, the depths of DBCNN, USCNN, and QECNN are denoted as $d_1$, $d_2$, and $d_3$, respectively.
The DBCNN is composed of two types of layers. The first $d_1 - 1$ convolutional layers use filters of the same spatial size, and each is followed by batch normalization and ReLU. The last (the $d_1$-th) layer generates the restored image with a single filter. Since the input and output of DBCNN are very similar, learning the residual image is more suitable; hence, we adopt the residual learning strategy in this module. More specifically, an identity connection is used to pass the input of DBCNN to its output. Note that the batch normalization and ReLU operations are omitted from Fig. 2 for brevity.
The first $d_2 - 1$ convolutional layers of USCNN are identical, using filters of the same spatial size, each followed by batch normalization and ReLU. The last layer is a deconvolutional layer, which performs the upsampling operation and produces one upsampled image.
The architecture of QECNN is similar to DBCNN. Therefore, we do not introduce the QECNN in detail to avoid redundancy.
3.2 Network training strategy
Let $\{x_i, y_i, y_{c,i}\}_{i=1}^{N}$ denote the training triplets. As introduced in Section 2.1, $x_i$ denotes an HR sample, $y_i$ denotes the corresponding LR sample that suffers only from blurring and downsampling, and $y_{c,i}$ represents the compressed version of $y_i$.
As shown in Fig. 3, the training of CISRDCNN is composed of four main steps. Firstly, the set $\{y_{c,i}, y_i\}$ is used to train the deblocking network DBCNN. As we adopt the residual learning strategy, the goal is to learn a residual mapping $F_D$ that predicts the residual image $y_i - y_{c,i}$. Consequently, the loss function of DBCNN is defined as
$$L_D(\Theta_D) = \frac{1}{2N} \sum_{i=1}^{N} \left\| F_D(y_{c,i}; \Theta_D) - (y_i - y_{c,i}) \right\|^2,$$
where $\Theta_D$ denotes the trainable parameter set of DBCNN, and the deblocked estimate is $\hat{y}_i = y_{c,i} + F_D(y_{c,i}; \Theta_D)$.
Secondly, we train the upsampling network USCNN. Once the training of DBCNN is finished, we can obtain the estimate of $y_i$ (denoted $\hat{y}_i$) from its compressed observation $y_{c,i}$. The training set for USCNN is $\{\hat{y}_i, x_i\}$; that is, USCNN aims to learn a function $F_U$ that maps $\hat{y}_i$ to $x_i$. Formally, the loss function of USCNN is defined as
$$L_U(\Theta_U) = \frac{1}{2N} \sum_{i=1}^{N} \left\| F_U(\hat{y}_i; \Theta_U) - x_i \right\|^2,$$
where $\Theta_U$ denotes the trainable parameter set of USCNN.
Thirdly, we train the quality enhancement network QECNN. Similarly, the HR version of $y_{c,i}$ is estimated using the learned DBCNN and USCNN, and this estimate is denoted $\hat{x}_i$. Correspondingly, the training set for QECNN is $\{\hat{x}_i, x_i\}$. In QECNN, we also adopt residual learning, and thus the goal is to learn a residual mapping $F_Q$ that predicts the residual image $x_i - \hat{x}_i$. Hence, the loss function of QECNN is defined as
$$L_Q(\Theta_Q) = \frac{1}{2N} \sum_{i=1}^{N} \left\| F_Q(\hat{x}_i; \Theta_Q) - (x_i - \hat{x}_i) \right\|^2,$$
where $\Theta_Q$ denotes the trainable parameter set of QECNN, and the enhanced estimate is $\hat{x}_i + F_Q(\hat{x}_i; \Theta_Q)$.
Finally, the CISRDCNN is optimized in an end-to-end manner. The learned parameters of DBCNN, USCNN, and QECNN are first used to initialize CISRDCNN, and then the training sample set $\{y_{c,i}, x_i\}$ is used to optimize the whole network with the fine-tuning strategy. Denoting the overall mapping of CISRDCNN by $F$, the loss function for the end-to-end optimization procedure is defined as
$$L(\Theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| F(y_{c,i}; \Theta) - x_i \right\|^2,$$
where $\Theta$ denotes the full trainable parameter set of CISRDCNN.
In CISRDCNN, the three modules have specific functions, i.e., deblocking, upsampling, and quality enhancement. With the above training strategy, the goal of each module can be achieved, while the final joint optimization minimizes the overall prediction error. Moreover, training a deep network from scratch is hard; initializing the network with learned parameters is beneficial for a stable training procedure and fast convergence. Note that for different compression quality factors (QFs), the networks can also be trained from a learned model with the fine-tuning strategy, rather than from scratch.
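The four-step strategy can be sketched with toy numbers, assuming a mean-squared-error criterion for every stage; all "network outputs" below are fabricated placeholder values, not real predictions:

```python
def mse(pred, target):
    """Mean squared error, the per-module training criterion assumed here."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Toy 1-D samples (illustrative values): HR x, clean LR y, compressed LR y_c.
x, y, y_c = [4.0, 8.0, 6.0, 2.0], [6.0, 4.0], [5.0, 5.0]

# Step 1: DBCNN learns the LR residual y - y_c from y_c.
dbcnn_residual_pred = [0.8, -0.9]                 # fake network output
loss_db = mse(dbcnn_residual_pred, [a - b for a, b in zip(y, y_c)])

# The deblocked LR estimate feeds the next stage (identity connection).
y_hat = [b + r for b, r in zip(y_c, dbcnn_residual_pred)]

# Step 2: USCNN maps the deblocked LR estimate to the HR image x.
uscnn_pred = [4.2, 7.6, 6.1, 2.3]                 # fake network output
loss_us = mse(uscnn_pred, x)

# Step 3: QECNN learns the HR residual x - x_hat.
qecnn_residual_pred = [-0.1, 0.3, -0.1, -0.2]     # fake network output
loss_qe = mse(qecnn_residual_pred, [a - b for a, b in zip(x, uscnn_pred)])

# Step 4: joint fine-tuning minimizes the end-to-end error against x,
# backpropagating through all three modules at once.
```

The staged losses give each module a well-defined target; the final joint step then trades off all three against the single end-to-end error.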
| QF | Method | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8 | I9 | I10 | Avg. |
|----|--------|------|------|------|------|------|------|------|------|------|------|------|
| 10 | CONCOLOR-VDSR [61, 41] | 23.454 | 28.292 | 26.918 | 27.323 | 24.397 | 22.336 | 29.555 | 26.801 | 29.795 | 24.202 | 26.307 |
| 10 | ARCNN-VDSR [54, 41] | 24.115 | 28.388 | 27.153 | 27.526 | 24.586 | 23.222 | 29.602 | 26.668 | 29.846 | 24.748 | 26.585 |
| 20 | CONCOLOR-VDSR [61, 41] | 24.628 | 29.344 | 28.093 | 28.804 | 25.779 | 23.969 | 31.683 | 28.109 | 31.415 | 25.941 | 27.777 |
| 20 | ARCNN-VDSR [54, 41] | 25.429 | 29.536 | 28.531 | 28.990 | 26.235 | 24.951 | 31.647 | 28.241 | 31.652 | 26.470 | 28.168 |
| 30 | CONCOLOR-VDSR [61, 41] | 25.316 | 29.992 | 29.085 | 29.630 | 26.455 | 24.972 | 32.405 | 28.975 | 32.252 | 27.091 | 28.617 |
| 30 | ARCNN-VDSR [54, 41] | 26.257 | 29.816 | 29.518 | 30.221 | 27.048 | 26.080 | 32.586 | 29.376 | 32.663 | 27.637 | 29.120 |

PSNR (dB) scores of different methods on Set10 (SR factor: 2, QF: 10/20/30). Columns I1-I10 denote the ten test images of Set10; the last column is the average.
| QF | Method | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8 | I9 | I10 | Avg. |
|----|--------|------|------|------|------|------|------|------|------|------|------|------|
| 10 | CONCOLOR-VDSR [61, 41] | 0.7422 | 0.8057 | 0.8255 | 0.8317 | 0.7705 | 0.7549 | 0.8486 | 0.8556 | 0.8118 | 0.8697 | 0.8116 |
| 10 | ARCNN-VDSR [54, 41] | 0.7494 | 0.8050 | 0.8278 | 0.8317 | 0.7678 | 0.8023 | 0.8477 | 0.8466 | 0.8089 | 0.8720 | 0.8159 |
| 20 | CONCOLOR-VDSR [61, 41] | 0.7835 | 0.8301 | 0.8535 | 0.8634 | 0.8159 | 0.8166 | 0.8863 | 0.8815 | 0.8369 | 0.9036 | 0.8471 |
| 20 | ARCNN-VDSR [54, 41] | 0.7902 | 0.8370 | 0.8548 | 0.8616 | 0.8186 | 0.8550 | 0.8795 | 0.8762 | 0.8365 | 0.9046 | 0.8514 |
| 30 | CONCOLOR-VDSR [61, 41] | 0.8047 | 0.8507 | 0.8734 | 0.8823 | 0.8359 | 0.8505 | 0.8979 | 0.8960 | 0.8490 | 0.9257 | 0.8666 |
| 30 | ARCNN-VDSR [54, 41] | 0.8165 | 0.8509 | 0.8748 | 0.8853 | 0.8412 | 0.8840 | 0.8948 | 0.8961 | 0.8504 | 0.9249 | 0.8719 |

SSIM scores of different methods on Set10 (SR factor: 2, QF: 10/20/30). Columns I1-I10 denote the ten test images of Set10; the last column is the average.
| QF | Method | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8 | I9 | I10 | Avg. |
|----|--------|------|------|------|------|------|------|------|------|------|------|------|
| 10 | CONCOLOR-VDSR [61, 41] | 2.115 | 1.268 | 1.298 | 1.401 | 2.222 | 2.555 | 1.345 | 1.626 | 1.310 | 1.716 | 1.686 |
| 10 | ARCNN-VDSR [54, 41] | 1.982 | 1.183 | 1.211 | 1.347 | 2.183 | 2.676 | 1.229 | 1.540 | 1.247 | 1.696 | 1.629 |
| 20 | CONCOLOR-VDSR [61, 41] | 2.710 | 1.627 | 1.683 | 1.873 | 2.889 | 3.320 | 1.831 | 2.085 | 1.725 | 2.279 | 2.202 |
| 20 | ARCNN-VDSR [54, 41] | 2.562 | 1.533 | 1.564 | 1.770 | 2.907 | 3.340 | 1.620 | 2.020 | 1.653 | 2.206 | 2.118 |
| 30 | CONCOLOR-VDSR [61, 41] | 3.096 | 1.905 | 2.023 | 2.201 | 3.297 | 3.810 | 2.042 | 2.400 | 2.003 | 2.686 | 2.546 |
| 30 | ARCNN-VDSR [54, 41] | 3.006 | 1.815 | 1.917 | 2.171 | 3.363 | 3.885 | 1.893 | 2.400 | 1.993 | 2.628 | 2.507 |

IFC scores of different methods on Set10 (SR factor: 2, QF: 10/20/30). Columns I1-I10 denote the ten test images of Set10; the last column is the average.
4 Experimental results
In this section, the experimental settings are introduced first, and then extensive results are presented to verify the effectiveness of CISRDCNN, including a test on real low quality web images. In addition, we take low bit-rate coding as an example application of the proposed CISRDCNN.
4.1 Experimental settings
Main parameters of CISRDCNN: in our implementation, we set , , and .
Training data: as in previous SR works, the 291-image set, consisting of 200 images from BSDS500 (available: http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench) and 91 images from Yang et al., is used to train CISRDCNN. To increase the number of samples and improve SR performance, we also adopt data augmentation techniques. To generate LR observations, the HR images are first downsampled using the imresize function in Matlab (kernel: bicubic, downsampling factor: 2), and then the downsampled images are compressed using JPEG.
Test images: Fig. 4 shows the ten test images (named Set10) used in our experiments, which are widely used to evaluate SR methods in the literature. For color images, only the luminance component is processed.
Degradation model: for the simulation experiments, the original HR image is first downsampled using the imresize function in Matlab (kernel: bicubic, downsampling factor: 2), and then the downsampled image is compressed using JPEG with different QFs.
Comparison baselines: the comparison baselines include Bicubic, A+, FSRCNN, VDSR, CONCOLOR-VDSR [61, 41], ARCNN-VDSR [54, 41] (ARCNN and CONCOLOR are typical and effective compression artifacts reduction methods), SRCDFOE, and LJSRDB. For A+, FSRCNN, and VDSR, we retrained the models according to our experimental settings. The CONCOLOR-VDSR [61, 41] and ARCNN-VDSR [54, 41] are cascading methods, which combine state-of-the-art deblocking and SR methods. The SRCDFOE and LJSRDB are two SR algorithms for JPEG compressed images.
Performance evaluation: the resultant images of the different methods are evaluated objectively and subjectively. For the simulation experiments, PSNR, SSIM, and IFC are adopted for objective evaluation. For the SR of real-world compressed images, we use a no-reference quality metric for SR to evaluate the results objectively.
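For reference, PSNR, one of the objective metrics used here, can be computed as follows; the pixel values in the usage line are illustrative:

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and an
    estimate, both given as flat lists of pixel intensities."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return 10.0 * math.log10(peak ** 2 / mse)

score = psnr([100.0, 110.0, 120.0], [101.0, 108.0, 121.0])  # about 45.12 dB
```

Higher PSNR indicates a smaller mean squared error against the ground truth; SSIM and IFC complement it by measuring structural and information fidelity, respectively.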
4.2 Super-resolution results on synthetic LR images
4.2.1 Objective evaluation
Due to limited space, we only present the objective scores of the different methods at QF = 10/20/30 in this subsection. It can be seen from the results reported in Table 1, Table 2, and Table 3 that the CISRDCNN consistently produces the highest PSNR/SSIM/IFC values. Overall, the VDSR generates the second-best results. The FSRCNN and ARCNN-VDSR [41, 54] achieve similar performance, and both are slightly inferior to the VDSR. The A+, CONCOLOR-VDSR [41, 61], and SRCDFOE are superior to Bicubic, but the gains are somewhat limited. Compared with Bicubic, the LJSRDB produces worse results in some cases. For A+, the severe compression noise in the LR images causes significant performance degradation, as the method is sensitive to noise. The SRCDFOE and LJSRDB are unified frameworks for JPEG compressed images; however, they do not handle compression noise well. By contrast, a much more obvious improvement is produced by the proposed CISRDCNN. For example, at QF = 10, the CISRDCNN achieves average PSNR/SSIM/IFC gains of 1.977 dB/0.0911/0.694 over Bicubic, and 0.577 dB/0.0135/0.205 over VDSR. Note that the VDSR is one of the state-of-the-art SR methods. Compared with the SR methods for JPEG compressed images, i.e., the SRCDFOE and LJSRDB, the average PSNR/SSIM/IFC gains are up to 1.411 dB/0.0474/0.501 and 1.871 dB/0.0751/0.636, respectively. Similar results can be observed at QF = 20 and QF = 30. In sum, the CISRDCNN achieves state-of-the-art performance.
4.2.2 Subjective evaluation
Part of the resultant images are presented to compare visual quality. To comprehensively show the performance of all methods, we deliberately illustrate results at different QFs: Fig. 5 shows the results for Butterfly at QF = 10, Fig. 6 the results for Ppt3 at QF = 20, and Fig. 7 the results for House at QF = 30. For better viewing and comparison, two local regions are highlighted in each figure. The results of Bicubic, A+, FSRCNN, SRCDFOE, and LJSRDB contain obvious artifacts, especially at low QFs. The VDSR, CONCOLOR-VDSR [61, 41], and ARCNN-VDSR [54, 41] can remove most of the compression artifacts; nevertheless, their results are somewhat blurred. Comparatively, the results of CISRDCNN are more visually pleasant, with fewer artifacts and clearer structures, for instance, the text in image Ppt3 (Fig. 6) and the eave in image House (Fig. 7). In sum, benefiting from the strong capability of deep CNNs and the specific design for compressed images, the CISRDCNN realizes joint optimization of the compression noise reduction and SR processes, thus leading to state-of-the-art performance.
The results in this subsection provide some insights for further research on compressed image SR. The comparison set in this experiment is composed of different kinds of methods, including conventional SR methods (A+, FSRCNN, VDSR), cascading SR methods (CONCOLOR-VDSR [61, 41], ARCNN-VDSR [54, 41]), unified SR frameworks (SRCDFOE, LJSRDB), and a jointly optimized SR method (CISRDCNN). According to their performance, we can draw the following conclusion: the CAR stage is necessary, but it should not be independent of the SR stage. The CAR stage is beneficial for reducing compression artifacts; however, it is hard to control the degree of artifact reduction. Therefore, joint optimization of CAR and SR is significant. These insights may also apply to the SR of noisy and blurred images, which will be studied in our future work.
4.3 Robustness to quality factors
In this subsection, the robustness of CISRDCNN to compression QFs is tested. To conduct this experiment, a series of CNN models are trained at different QFs. Fig. 8 presents the average PSNR gains of CISRDCNN over Bicubic at different QFs on Set10. It can be observed that CISRDCNN achieves an obvious PSNR gain over a wide range of QFs, even at low compression ratios. Hence, the CISRDCNN is robust to QFs and applies to compressed images of different quality.
4.4 Experimental results on image datasets
In order to evaluate the stability and robustness of CISRDCNN on different kinds of images, we conduct experiments on four standard image sets: Set5, Set14, B100, and Urban100. The images in B100 and Urban100 are cropped to generate smaller test images. Due to limited space, we only take QF = 10 as an example in this experiment, and Bicubic, VDSR, ARCNN-VDSR [54, 41], and SRCDFOE are selected as baselines. The average PSNR/SSIM/IFC results are reported in Table 4. It can be observed that the CISRDCNN consistently outperforms all of the compared baselines.
We further plot the distributions of the PSNR/SSIM/IFC gains of CISRDCNN over the baselines in Fig. 9. One can easily see that the CISRDCNN outperforms its competitors on most of the test images in the four commonly used image sets. The results in this subsection demonstrate the robustness and stability of CISRDCNN.
| Method | Metric | Set5 | Set14 | B100 | Urban100 |
|--------|--------|------|-------|------|----------|
| ARCNN-VDSR [54, 41] | PSNR (dB) | 27.827 | 25.856 | 24.899 | 23.929 |
| ARCNN-VDSR [54, 41] | SSIM | 0.7878 | 0.6803 | 0.6153 | 0.6774 |
| ARCNN-VDSR [54, 41] | IFC | 1.398 | 1.240 | 0.983 | 1.510 |

Comparisons of average PSNR (dB)/SSIM/IFC scores on the datasets (SR factor: 2, QF: 10).
4.5 Empirical study on computational time
In this subsection, we compare the running time and PSNR of the different methods (QF = 10). This experiment is conducted on a desktop computer (Win7, Intel Core i5 CPU 3.3 GHz, 12 GB memory, Matlab 2014a 64-bit). The LJSRDB is run on another computer, as its code only runs in the 32-bit version of Matlab, so its running time is not presented in Fig. 10. The running time and PSNR of each method are averaged over all ten test images in Fig. 4. As depicted in Fig. 10, the proposed CISRDCNN achieves state-of-the-art performance with acceptable computational time. Note that we use the Matlab test code of FSRCNN (available: http://mmlab.ie.cuhk.edu.hk/projects/FSRCNN.html), which is much slower than the implementation used in the original work. In addition, the execution of CISRDCNN can be greatly accelerated with a powerful GPU.
4.6 Super-resolution on real low quality web images
We further test the effectiveness of CISRDCNN on real low quality web images, which usually suffer from downsampling and compression due to limited bandwidth and storage capacity. The test images used in this experiment were downloaded from the internet (available: http://image.baidu.com). As presented in Fig. 11, the CISRDCNN achieves an obvious perceptual quality enhancement over the original images and the interpolation results of Bicubic, with fewer artifacts and clearer structures.
Furthermore, the no-reference image quality index for SR is used to quantitatively compare the resultant images, and the scores are listed in Table 5. It can be seen that the CISRDCNN generates higher values than Bicubic on all three test images, which also indicates that the resultant images of CISRDCNN are of higher quality. The results in this subsection verify that the proposed CISRDCNN is also applicable to real-world compressed images.
No-reference image quality assessment of the SR results on low quality web images, using the evaluation metric for SR (SR factor: 2).
4.7 Application in low bit-rate image coding
At low bit-rates, existing compression methods (e.g., JPEG and JPEG 2000) always produce visually unpleasant compression artifacts. In this subsection, we take JPEG as an example to show how the proposed CISRDCNN can be used to construct a low bit-rate coding framework (CISRDCNN-LBRC), thus enhancing the rate-distortion performance of JPEG. The idea is to reduce the data volume while preserving the main structures of the original image by placing a downsampling operator before the JPEG encoder; correspondingly, a CISRDCNN module is placed after the JPEG decoder to perform upsampling. As shown in Fig. 12, the presented CISRDCNN-LBRC consists of four parts: downsampling, JPEG encoder, JPEG decoder, and CISRDCNN.
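The four-part pipeline can be sketched schematically as follows; every function below is a toy stand-in (simple decimation, uniform quantization, pixel repetition), not the actual JPEG codec or the CISRDCNN network:

```python
def downsample(img, factor=2):
    # Pre-encoder downsampling: fewer samples means fewer bits to code.
    return img[::factor]

def jpeg_encode(img, step=8):
    # Toy "encoder": uniform quantization stands in for DCT + quantization.
    return [round(v / step) for v in img]      # quantized coefficients

def jpeg_decode(code, step=8):
    # Toy "decoder": dequantize back to pixel values.
    return [c * step for c in code]

def upsample_sr(img, factor=2):
    # Placeholder for the CISRDCNN module: here, simple repetition.
    return [v for v in img for _ in range(factor)]

original = [10, 12, 30, 34, 90, 96, 20, 22]
bitstream = jpeg_encode(downsample(original))  # half as many symbols to code
decoded_hr = upsample_sr(jpeg_decode(bitstream))
```

The rate saving comes from the downsampler (the encoder sees half the samples), while the quality at the receiver depends on how well the learned upsampler recovers the discarded detail.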
The test images Butterfly, Woman, Circuit, Leaves, Foreman, and Peppers are selected to test the effectiveness of CISRDCNN-LBRC. Note that we use the luminance components of the six test images in this subsection. For a fair comparison, JPEG is used as the baseline. The rate-distortion curves of JPEG and CISRDCNN-LBRC are presented in Fig. 13. It can be seen that the rate-distortion performance of CISRDCNN-LBRC is clearly superior to that of JPEG at low bit-rates. From another point of view, the CISRDCNN-LBRC can save a large number of coding bits.
To compare the perceptual quality of the decoded images, we show the results of CISRDCNN-LBRC and JPEG at different bit-rates. Due to limited space, only the results for Woman, Circuit, Leaves, and Foreman are presented, in Fig. 14 to Fig. 17. We can observe that the CISRDCNN-LBRC generates fewer artifacts and better preserves the main structures, for instance, the fingers in image Woman (Fig. 14) and the collar in image Foreman (Fig. 17). Overall, the CISRDCNN-LBRC performs better than JPEG at low bit-rates in terms of both objective and subjective evaluation.
In this paper, we propose an SR algorithm for compressed images. Unlike existing SR methods for compressed images, we treat this task as two related subproblems, i.e., CAR and SR, and design a deep network to realize the joint optimization of the two. We take the compression standard JPEG as an example to test the effectiveness of the proposed CISRDCNN, and experiments on both synthetic images and real low quality web images show that it produces state-of-the-art SR results. Moreover, we show an application of the proposed SR method in low bit-rate image coding, where it improves the rate-distortion performance of JPEG. The proposed SR method and the low bit-rate coding framework can be readily extended to other image and video compression standards, e.g., JPEG 2000, H.264, and HEVC. In addition, this work provides some insights into the SR of low quality LR images (e.g., noisy and blurry ones), which we hope will encourage other researchers to address this kind of problem.
However, due to the high complexity of the training process and the lack of high-performance computing devices, the hyperparameters of the proposed framework are not fully optimized, such as the number of layers and filters, the kernel sizes, etc. In future work, we will study the settings of these main parameters, which may lead to better performance and lower complexity.
Funding: This work was supported by the National Natural Science Foundation of China [grant number 61471248]; the National Postdoctoral Program for Innovative Talents of China [grant number BX201700163]; and the Post-Doctoral Research and Development Foundation of Sichuan University [grant number 2017SCU12003].
-  X. Li, M. T. Orchard, New edge-directed interpolation, IEEE Trans. Image Process. 10 (10) (2001) 1521-1527.
-  X. Zhang, X. Wu, Image interpolation by adaptive 2-d autoregressive modeling and soft-decision estimation, IEEE Trans. Image Process. 17 (6) (2008) 887-896.
-  Z. Wei, K. K. Ma, Contrast-guided image interpolation, IEEE Trans. Image Process. 22 (11) (2013) 4271-4285.
-  W. Dong, L. Zhang, G. Shi, X. Li, Sparse representation based image interpolation with nonlocal autoregressive modeling, IEEE Trans. Image Process. 22 (4) (2013) 1382-1394.
-  Y. Romano, M. Protter, M. Elad, Single image interpolation via adaptive nonlocal sparsity-based modeling, IEEE Trans. Image Process. 23 (7) (2014) 3085-3098.
-  F. Cao, M. Cai, Y. Tan, Image interpolation via low-rank matrix completion and recovery, IEEE Trans. Circuits Syst. Video Technol. 25 (8) (2015) 1261-1270.
-  J. J. Huang, W. C. Siu, T. R. Liu, Fast image interpolation via random forest, IEEE Trans. Image Process. 24 (10) (2015) 3232-3245.
-  W. Yang, J. Liu, M. Li, Z. Guo, Isophote-constrained autoregressive model with adaptive window extension for image interpolation, IEEE Trans. Circuits Syst. Video Technol. 2016. DOI: 10.1109/TCSVT.2016.2638864.
-  S. Zhu, B. Zeng, L. Zeng, M. Gabbouj, Image interpolation based on non-local geometric similarities and directional gradients, IEEE Trans. Multimedia. 18 (9) (2016) 1707-1719.
-  L. Wang, H. Wu, C. Pan, Fast image upsampling via the displacement field, IEEE Trans. Image Process. 23 (12) (2014) 5123-5135.
-  Q. Yan, Y. Xu, X. Yang, T. Q. Nguyen, Single image super-resolution based on gradient profile sharpness, IEEE Trans. Image Process. 25 (5) (2016) 2168-2183.
-  W. Dong, L. Zhang, G. Shi, X. Wu, Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization, IEEE Trans. Image Process. 20 (7) (2011) 1838-1857.
-  W. Dong, L. Zhang, G. Shi, X. Li, Nonlocally centralized sparse representation for image restoration, IEEE Trans. Image Process. 22 (4) (2013) 1620-1630.
-  S. Mandal, A. Bhavsar, A. K. Sao, Noise adaptive super-resolution from single image via non-local mean and sparse representation, Signal Process. 132 (2017) 134-149.
-  K. Zhang, X. Gao, D. Tao, X. Li, Single image super-resolution with non-local means and steering kernel regression, IEEE Trans. Image Process. 21 (11) (2012) 4544-4556.
-  H. Chen, X. He, Q. Teng, R. Chao, Single image super resolution using local smoothness and nonlocal self-similarity priors, Signal Process., Image Commun. 43 (2016) 68-81.
-  C. Ren, X. He, Q. Teng, Y. Wu, T. Q. Nguyen, Single image super-resolution using local geometric duality and non-local similarity, IEEE Trans. Image Process. 25 (5) (2016) 2168-2183.
-  C. Ren, X. He, T. Q. Nguyen, Single image super-resolution via adaptive high-dimensional non-local total variation and adaptive geometric feature, IEEE Trans. Image Process. 26 (1) (2017) 90-106.
-  W. Gong, Y. Tang, X. Chen, Q. Yi, W. Li, Combining edge difference with nonlocal self-similarity constraints for single image super-resolution, Neurocomputing. 249 (2017) 157-170.
-  H. Chang, D. Y. Yeung, Y. Xiong, Super-resolution through neighbor embedding, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2004, pp. 275-282.
-  M. Bevilacqua, A. Roumy, C. Guillemot, M. L. Alberi-Morel, Low complexity single-image super-resolution based on nonnegative neighbor embedding, in: Proceedings of the British Machine Vision Conference (BMVC), 2012, pp. 135.1-135.10.
-  R. He, Z. Zhang, Locally affine patch mapping and global refinement for image super-resolution, Pattern Recognit. 44 (2011) 2210-2219.
-  Z. Xiong, D. Xu, X. Sun, F. Wu, Example-based super-resolution with soft information and decision, IEEE Trans. Multimedia. 15 (6) (2013) 1458-1465.
-  M. C. Yang, Y. C. Wang, A self-learning approach to single image super-resolution, IEEE Trans. Multimedia. 15 (3) (2013) 498-508.
-  J. B. Huang, A. Singh, N. Ahuja, Single image super-resolution from transformed self-exemplars, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2015, pp. 5197-5206.
-  T. Li, X. He, Q. Teng, X. Wu, Rotation expanded dictionary-based single image super-resolution, Neurocomputing. 216 (2016) 1-17.
-  J. Yang, J. Wright, T. S. Huang, Y. Ma, Image super-resolution via sparse representation, IEEE Trans. Image Process. 19 (11) (2010) 2861-2873.
-  R. Zeyde, M. Elad, M. Protter, On single image scale-up using sparse-representations, in: Proceedings of the International Conference on Curves & Surfaces, 2010, pp. 711-730.
-  S. Wang, L. Zhang, Y. Liang, Q. Pan, Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2012, pp. 2216-2223.
-  L. Shang, S. Liu, Y. Zhou, Z. Sun, Modified sparse representation based image super-resolution reconstruction method, Neurocomputing. 228 (2017) 37-52.
-  S. Gu, W. Zuo, Q. Xie, D. Meng, X. Feng, L. Zhang, Convolutional sparse coding for image super-resolution, in: Proceedings of the International Conference on Computer Vision (ICCV), IEEE, 2015, pp. 1823-1831.
-  R. Timofte, V. De Smet, L. Van Gool, Anchored neighborhood regression for fast example-based super-resolution, in: Proceedings of the International Conference on Computer Vision (ICCV), IEEE, 2013, pp. 1920-1927.
-  R. Timofte, V. De Smet, L. Van Gool, A+: Adjusted anchored neighborhood regression for fast super-resolution, in: Proceedings of the Asian Conference on Computer Vision (ACCV), 2014, pp. 111-126.
-  Y. Zhang, Y. Zhang, J. Zhang, Q. Dai, CCR: Clustering and collaborative representation for fast single image super-resolution, IEEE Trans. Multimedia. 18 (3) (2016) 405-417.
-  W. Yang, Y. Tian, F. Zhou, Q. Liao, H. Chen, C. Zheng, Consistent coding scheme for single-image super-resolution via independent dictionaries, IEEE Trans. Multimedia. 18 (3) (2016) 313-325.
-  C. Dong, C. C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. 38 (2) (2016) 295-307.
-  Y. Liang, J. Wang, S. Zhou, Y. Gong, N. Zheng, Incorporating image priors with deep convolutional neural networks for image super-resolution, Neurocomputing. 194 (2016) 340-347.
-  D. Liu, Z. Wang, B. Wen, J. Yang, W. Han, T. S. Huang, Robust single image super-resolution via deep networks with sparse prior, IEEE Trans. Image Process. 25 (7) (2016) 3194-3207.
-  C. Dong, C. C. Loy, X. Tang, Accelerating the super-resolution convolutional neural network, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 391-407.
-  J. Kim, J. K. Lee, K. M. Lee, Accurate image super-resolution using very deep convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2016, pp. 1646-1654.
-  Y. Li, J. Hu, X. Zhao, W. Xie, J. Li, Hyperspectral image super-resolution using deep convolutional neural network, Neurocomputing. 266 (2017) 29-41.
-  L. Wang, Z. Huang, Y. Gong, C. Pan, Ensemble based deep networks for image super-resolution, Pattern Recognit. 68 (2017) 191-198.
-  J. Liu, W. Yang, X. Zhang, Z. Guo, Retrieval compensated group structured sparsity for image super-resolution, IEEE Trans. Multimedia. 19 (2) (2017) 302-316.
-  H. Chen, X. He, L. Qing, Q. Teng, Single image super-resolution via adaptive transform-based nonlocal self-similarity modeling and learning-based gradient regularization, IEEE Trans. Multimedia. 19 (8) (2017) 1702-1717.
-  J. Xiao, C. Wang, X. Hu, Single image super-resolution in compressed domain based on field of expert prior, in: Proceedings of the International Congress on Image and Signal Processing (CISP), IEEE, 2012, pp. 607-611.
-  S. Ono, I. Yamada, Optimized JPEG image decompression with super-resolution interpolation using multi-order total variation, in: Proceedings of the International Conference on Image Processing (ICIP), IEEE, 2013, pp. 474-478.
-  Z. Xiong, X. Sun, F. Wu, Robust web image/video super-resolution, IEEE Trans. Image Process. 19 (8) (2010) 2017-2028.
-  L. W. Kang, C. C. Hsu, B. Zhuang, C. W. Lin, C. H. Yeh, Learning-based joint super-resolution and deblocking for a highly compressed image, IEEE Trans. Multimedia. 17 (7) (2015) 921-934.
-  O. Lee, J. W. Lee, D. Y. Lee, J. O. Lee, Joint super-resolution and compression artifact reduction based on dual-learning, in: Proceedings of the International Conference on Visual Communications and Image Processing (VCIP), IEEE, 2016, pp. 1-4.
-  Y. Zhao, W. Jia, L. Li, L. Cao, X. Liu, Filtered mapping based method for compressed web image super-resolution, IEEE Access. 5 (2017) 12682-12695.
-  H. Zhao, O. Gallo, I. Frosio, J. Kautz, Loss functions for image restoration with neural networks, IEEE Trans. Comput. Imag. 3 (1) (2017) 47-57.
-  K. Zhang, W. Zuo, Y. Chen, D. Meng, L. Zhang, Beyond a gaussian denoiser: residual learning of deep CNN for image denoising, IEEE Trans. Image Process. 26 (7) (2017) 3142-3155.
-  C. Dong, Y. Deng, C. C. Loy, X. Tang, Compression artifacts reduction by a deep convolutional network, in: Proceedings of the International Conference on Computer Vision (ICCV), IEEE, 2015, pp. 576-584.
-  J. Guo, H. Chao, Building dual-domain representations for compression artifacts reduction, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 628-644.
-  J. Sun, W. Cao, Z. Xu, J. Ponce, Learning a convolutional neural network for non-uniform motion blur removal, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2015, pp. 769-777.
-  B. Cai, X. Xu, K. Jia, C. Qing, D. Tao, DehazeNet: An end-to-end system for single image haze removal, IEEE Trans. Image Process. 25 (11) (2016) 5187-5198.
-  K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2016, pp. 770-778.
-  S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: Proceedings of the International Conference on Machine Learning (ICML), 2015, pp. 448-456.
-  A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Proceedings of the Neural Information Processing Systems Conference (NIPS), 2012, pp. 1097-1105.
-  J. Zhang, R. Xiong, C. Zhao, Y. Zhang, S. Ma, W. Gao, CONCOLOR: Constrained non-convex low-rank model for image deblocking, IEEE Trans. Image Process. 25 (3) (2016) 1246-1259.
-  Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600-612.
-  H. R. Sheikh, A. C. Bovik, G. de Veciana, An information fidelity criterion for image quality assessment using natural scene statistics, IEEE Trans. Image Process. 14 (12) (2005) 2117-2128.
-  C. Ma, C. Y. Yang, X. Yang, M. H. Yang, Learning a no-reference quality metric for single-image super-resolution, Comput. Vis. Image Understand. 158 (2017) 1-16.