Image Super-Resolution (SR) is a class of image processing techniques that infer a High-Resolution (HR) image from one or a sequence of Low-Resolution (LR) images. It can transcend the limitations of current optical imaging systems, and has been widely applied in medical and remote sensing imaging, digital photography, depth-based 3D reconstruction, and intelligent video surveillance systems [2, 3, 4].
The SR problem is a severely ill-posed inverse problem because information is lost during image degradation, e.g., through blurring, aliasing from subsampling, and noise. Reconstructing a visually pleasing HR image from an LR one remains an extremely challenging task. Various kinds of prior knowledge, such as piecewise smoothness [5, 6, 7], sharp edges [8, 9], textures, local/nonlocal similar patterns [11, 12, 13, 14], low-rank constraints [15, 16], and sparse representations under certain transformations [17, 18, 19, 20, 21], have been investigated to regularize the SR reconstruction procedure. Generally speaking, current methods fall into two categories: multi-frame reconstruction approaches and learning-based single image SR approaches.
By making full use of inter-frame complementary information, multi-frame reconstruction based SR approaches take a sequence of LR images of the same scene and fuse them to produce an HR output or a sequence of HR outputs. However, sub-pixel registration is an exceedingly difficult problem, and the magnification factor is limited in practice. Learning-based single image SR methods aim at learning the relationship between LR and HR example pairs, and then apply the learned transformation to predict the missing details of an observed LR image. In this paper, we focus on the single image SR problem.
Since the pioneering work of Freeman et al., the single image SR problem has been studied increasingly and has attracted great research interest in recent decades. For example, Chang et al. were the first to introduce manifold learning theory based on locally linear embedding into the SR problem, after which a series of neighbor embedding algorithms were proposed [8, 26, 27, 28, 29]. These can exploit the local manifold structure of the image patch space well. Yang et al. proposed to use sparse representation to adaptively choose the most relevant neighbors, which avoids the over- or under-fitting of neighbor embedding based methods and obtains better results [30, 31, 32, 33]. To overcome the inconsistency between the LR and HR spaces, quite a few coupled learning based methods have also been developed recently [34, 35, 36]. They essentially learn the mapping from one domain/space to another, i.e., from the LR space to the corresponding HR one. The approach of Timofte et al. leverages a divide-and-conquer strategy to learn the mapping between LR and HR samples in multiple local neighbor spaces, yielding a fast single image SR method based on Anchored Neighborhood Regression (ANR). To further improve the quality of the mapping, they combined ANR with the simple functions based method and proposed the Adjusted ANR (A+ for short) approach. A+ learns the mapping between LR and HR samples in a much denser sample space, which helps guarantee the performance of local linear regression. In addition to the work of [37, 38, 39], some regression algorithms have also been developed to directly learn the relationship between LR and HR samples in a coarse-to-fine [40, 41], sparse [42, 43, 44], collaborative [45, 46], adaptive, local [47, 48], pairwise, or structured manner. These algorithms are simple and fast, and they characterize the mapping between the LR and HR spaces (especially the local image patch space) well; consequently, they achieve very favorable performance.
Over the past few years, deep learning, the re-emergence of neural networks, has been applied with great success in a multitude of fields, such as self-driving cars, computer vision, speech recognition, and machine translation, and has achieved impressive results. Most recently, this technology has also been introduced to solve the image SR problem by learning the mapping between LR and HR samples in an end-to-end manner [52, 53, 54, 55, 56, 57, 58, 59]. Deep learning SR techniques based on Super-Resolution using Deep Convolutional Networks (SRCNN), Cascade of Sparse Coding based Networks (CSCN), Very Deep Convolutional Networks (VDSR), and Deeply-Recursive Convolutional Networks (DRCN) carefully design different network structures to meet the challenge of SR reconstruction. Specifically, SRCNN uses a network with three convolutional layers, while CSCN cascades sparse coding networks. VDSR uses a deep model with up to 20 weight layers to predict the residual image between the HR images and the LR ones. With this very deep network, it has a large receptive field and takes a large image context into account, thus capturing the image structure well, especially as the scale factor increases. DRCN recursively applies the same convolutional layers as many times as desired without introducing additional parameters for the additional convolutions. To obtain better perceptual quality, a number of photo-realistic SR methods based on Generative Adversarial Networks (GAN) have also been presented recently [61, 62].
However, the aforementioned methods, whether based on shallow prior models (a local manifold structure prior or a sparse prior) or on different deep networks, each have their own advantages and capture different image details. Over the years, we have witnessed a constant effort to design better-performing methods for the SR problem. A natural question is whether these methods can be brought into a unifying framework, and whether such a framework helps the SR task.
One very natural idea is to integrate the outputs of different SR methods (in the following, we call the SR algorithms to be ensembled component super-resolvers) in an ensemble learning framework and produce an output that is better than that of every component super-resolver. Then, given the results obtained by the component super-resolvers, how should they be ensembled to produce a better result? The most obvious way is to average the outputs of all the component super-resolvers with equal weights. However, ensemble learning theory has shown that it may be better to combine some instead of all of the learners. That is to say, when we know in advance that the performance of one component super-resolver is poor, we can remove it or assign it a relatively small ensemble weight. The remaining question is how to determine whether a component super-resolver is superior or not. In other words, determining the ensemble weights is the essential problem in ensemble learning based SR.
In this paper, we contribute a simple but effective Ensemble learning SR algorithm with a Reference dataset, denoted RefESR for short. Our method is inspired by external dataset based models. Unlike previous methods that learn prior knowledge for the parameters of one statistical model or for the desired HR images, our method directly learns the SR abilities of different methods and uses them to guide the optimization of the ensemble parameters, i.e., the ensemble (or combination) weights. In particular, to estimate the optimal ensemble weights, the proposed RefESR method considers both the reconstruction error derived from the image degradation model and the ensemble weight prior learned from an additional reference dataset, and formulates them in a Maximum A Posteriori (MAP) framework. Moreover, we introduce a simple method to obtain an analytical solution for the ensemble parameters. Fig. 1 shows the pipeline of the proposed RefESR algorithm. To the best of our knowledge, this is the first work to leverage an additional reference dataset to guide the SR reconstruction. Although many previous works have used an additional dataset to exploit the natural image prior, our proposed method directly leverages a reference dataset to obtain the SR ability (in terms of objective quality) of different component super-resolvers, and applies it to guide the subsequent SR reconstruction. Experimental results demonstrate that our RefESR method outperforms state-of-the-art deep learning based SR methods. Moreover, our method is very general: it can ensemble the best available methods fed into our framework to improve the SR performance, and is thus expected to achieve the best reconstruction results.
The rest of this paper is organized as follows. In Section II, we present related work on ensemble learning based SR approaches. Section III introduces the proposed ensemble SR framework and the optimization of its objective function in detail. The experimental results are presented in Section IV. Further analysis and discussion of the proposed ensemble learning framework are presented in Section V. Finally, we conclude this work in Section VI.
II Related Work
In statistics and machine learning, ensemble learning is a powerful way to obtain better performance than could be obtained from any of the component methods alone. It has been widely applied in data mining and pattern recognition. Although ensemble learning has achieved great success in machine learning problems, it had not been applied to image SR until very recently, when two ensemble learning related SR methods were proposed.
In the first, a video SR method is presented. The authors decompose the video SR task into two stages: draft-ensemble generation, and determination of the optimal draft via a deep convolutional neural network. In essence, they leverage deep networks to select the candidate HR samples in the patch space, which is the general idea of many learning-based SR methods. Though it is termed ensemble-based, it is not strictly an ensemble SR method, because selecting the best samples for the subsequent reconstruction is the basic idea of many learning based SR methods [23, 24, 17]. The other work, by Wang et al., introduced ensemble learning into the SR problem and proposed an ensemble based deep network method for image SR. It focuses on one deep learning based SR method, and generates different models through different initializations of one specific neural network. Specifically, they took sparse coding based networks (SCN) as the baseline, and developed Ensemble based Sparse Coding Networks (ESCN) by changing the initializations of SCN. In ESCN, the ensemble weights are adaptively determined by a back-projection model.
The ESCN based SR method achieves better performance than the original SCN method; however, it has two limitations. Firstly, it essentially integrates only one deep learning model, the SCN based neural network, with multiple outputs under different initial conditions. Due to the limited capacity of a single network, the complementary information obtained by only changing the initialization is insufficient, so the improvement of the ensemble result is limited. Secondly, it only considers the reconstruction constraint when determining the ensemble weights, and no other prior is taken into consideration. Their model is actually ill-posed, and many solutions satisfy its objective function. Their experiments also show that the optimal ensemble weights and average weights obtain almost the same results. Therefore, considering only the reconstruction constraint is not really effective. In contrast, our proposed method ensembles a variety of different methods, including traditional state-of-the-art learning based methods and recent deep learning based methods with different neural networks. Moreover, we introduce a reference dataset to measure the performance of different SR methods, which can be seen as a model prior and is incorporated into our objective function as a regularization term.
III Proposed Method
In this section, we present the proposed RefESR method in detail. We first give the problem definition of RefESR in a Bayesian framework. Then, we show how to model the reconstruction constraint and the prior of the ensemble weights. Next, we derive the objective function of our proposed RefESR method. Finally, we describe an analytical way to solve the optimization problem.
III-A Problem Setup
In our proposed ensemble learning based SR method, we can obtain the SR reconstruction results, $Y = \{y_1, \dots, y_N\}$, of different methods, $\{f_1, \dots, f_N\}$, for the observed LR image, x. Here, $f_i$ can be seen as the $i$-th SR model. Given $Y$ and x in the ensemble SR framework, our aim is to infer the optimal ensemble weights, $w = [w_1, \dots, w_N]^T$, where $w_i$ is associated with the $i$-th SR model $f_i$. After obtaining the optimal ensemble weights, we can predict the HR output of the LR input by

$$\hat{y} = \sum_{i=1}^{N} w_i y_i. \tag{1}$$

Under the Bayesian framework, the regularized SR problem is related to a probabilistic model as follows:

$$P(w \mid x, Y) = \frac{P(x \mid Y, w)\, P(w)}{P(x \mid Y)}. \tag{2}$$

Notice that the marginal likelihood, $P(x \mid Y)$, does not depend on w. With the observation of x and $Y$, the MAP estimation of w can be formulated as,

$$w^{*} = \arg\max_{w} \; \log P(x \mid Y, w) + \log P(w),$$

where the first term is the likelihood term and the second term denotes the prior knowledge of w. By defining the likelihood term and the prior term, we can maximize this objective function to obtain the optimal ensemble weights $w^{*}$. With the optimal ensemble weights, we can then infer the target HR output.
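As a minimal sketch of the weighted combination in Eq. (1), assume the component HR results are already available as equally sized grayscale images stored as nested lists. The names `Image` and `ensemble` and the toy pixel values are hypothetical and only serve the illustration.

```python
from typing import List, Sequence

# Hypothetical alias: a grayscale image stored as rows of pixel values.
Image = List[List[float]]


def ensemble(outputs: Sequence[Image], weights: Sequence[float]) -> Image:
    """Pixel-wise weighted sum of the component HR estimates."""
    if len(outputs) != len(weights):
        raise ValueError("need exactly one weight per component output")
    rows, cols = len(outputs[0]), len(outputs[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for image, weight in zip(outputs, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += weight * image[r][c]
    return fused


# Two toy 2x2 "HR results" from two component super-resolvers.
y1 = [[100.0, 120.0], [140.0, 160.0]]
y2 = [[110.0, 130.0], [150.0, 170.0]]

# The second component is trusted more, so it receives the larger weight.
fused = ensemble([y1, y2], [0.25, 0.75])
```

With weights that sum to one, every fused pixel stays within the range spanned by the component results at that pixel, which is what makes the weights easy to interpret.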
III-B Reconstruction Constraint Modeling
For the single image SR problem, the relationship between the HR image y and the LR image x can be modeled by the observation model:

$$x = DBy + v. \tag{3}$$

Here, B denotes the blurring operator, D denotes the down-sampling operator, and v is additive white Gaussian noise. If we use the matrix H = DB to denote the combined blurring and downsampling processes (i.e., H stands for the degradation operations), Eq. (3) can be rewritten as:

$$x = Hy + v. \tag{4}$$
Since the matrix H has far fewer rows than columns, Eq. (4) is ill-posed and has an infinite number of solutions. Therefore, in order to recover a reasonable HR image, SR approaches typically try to find and model appropriate prior knowledge of natural images. For example, gradient priors, the self-similarity property (salient features repeat across different scales within an image), and coupled LR/HR patch based algorithms have been used to model the prior needed to build the inverse recovery mapping.
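The ill-posedness can be made concrete with a toy one-dimensional example. The sketch below assumes a noise-free degradation in which H averages each block of two samples, which is one simple instance of blurring followed by down-sampling; the function name `degrade` and the signal values are hypothetical.

```python
from typing import List


def degrade(y: List[float], factor: int = 2) -> List[float]:
    """Noise-free toy version of x = Hy: average each block of `factor` samples.

    Block averaging stands in for blurring (B) followed by down-sampling (D);
    it is an assumption made for illustration only.
    """
    if factor < 1 or len(y) % factor != 0:
        raise ValueError("signal length must be a positive multiple of factor")
    return [sum(y[i:i + factor]) / factor for i in range(0, len(y), factor)]


# Two different HR signals ...
y_a = [1.0, 3.0, 5.0, 7.0]
y_b = [2.0, 2.0, 6.0, 6.0]

# ... map to exactly the same LR observation, so x alone cannot identify y.
x_a = degrade(y_a)
x_b = degrade(y_b)
```

Both signals yield the LR observation `[2.0, 6.0]`, so additional prior knowledge is needed to prefer one HR candidate over the other.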
Developing sophisticated image priors has been the focus of much single image SR research in the past decade. In contrast, the reconstruction constraint, which states that the degraded HR image should equal the LR observation, has received relatively little attention. Some algorithms do not enforce x = Hy at all. The representative ANR, A+, and recently proposed deep learning based methods [52, 53, 54, 55] all ignore this reconstruction constraint.
To this end, in our ensemble learning based SR framework, we introduce this reconstruction constraint into our objective function. Specifically, we require that the blurred and downsampled HR ensemble output approximately equal the LR input image. We assume that the reconstruction error, i.e., the difference between the degraded ensemble HR output and the LR input image, obeys a Gaussian distribution, so the likelihood term can be written as follows:

$$P(x \mid Y, w) \propto \exp\left( -\frac{1}{2\sigma_1^{2}} \Big\| x - H \sum_{i=1}^{N} w_i y_i \Big\|_2^{2} \right),$$

where $y_i$ is the HR result of the $i$-th component super-resolver, $w_i$ is its ensemble weight, and $\sigma_1$ denotes the standard deviation of the noise.
III-C Prior Modeling of Ensemble Weights
The aforementioned reconstruction constraint can be seen as a regularization specific to the ensemble weights of an observed LR image. In this subsection, we propose to regularize the ensemble weights by defining an additional prior on them, thus overcoming the ill-posedness of Eq. (5).
In practice, the performance of a component super-resolver on a new input is unknown. However, we can obtain its SR results on a reference dataset, which can be used to approximate that performance. Specifically, we introduce an additional reference dataset and test the performance of the component super-resolvers on it. Their reconstruction quality scores are then obtained by combining their performances at different magnification factors, e.g., 2, 3, and 4 in our experiments.
The quantities combined are the mean Peak Signal-to-Noise Ratio (PSNR) and the mean Structural Similarity Index (SSIM) of each component super-resolver at each scale. It is worth mentioning that more measurements can be incorporated into the performance score. Our basic assumption is that a method that performs better on the reference dataset should receive a relatively larger weight when reconstructing the HR output of an LR input in the ensemble framework. Fig. 2 shows the process of obtaining the ensemble weight prior.
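For concreteness, the sketch below computes the mean PSNR of one component super-resolver over a small reference set, using the standard definition PSNR = 10 log10(peak^2 / MSE). It assumes 8-bit images (peak value 255) flattened into lists; the function names and the pixel values are hypothetical, and SSIM is omitted because it requires windowed statistics.

```python
import math
from typing import List, Sequence, Tuple


def psnr(reference: Sequence[float], estimate: Sequence[float],
         peak: float = 255.0) -> float:
    """Standard PSNR in dB between a ground-truth image and an estimate."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


def mean_psnr(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
              peak: float = 255.0) -> float:
    """Average PSNR over (ground truth, SR result) pairs of a reference set."""
    return sum(psnr(ref, est, peak) for ref, est in pairs) / len(pairs)


# Toy reference set: every pixel is off by 1 in the first pair, by 2 in the second.
truth: List[float] = [10.0, 20.0, 30.0, 40.0]
pairs = [
    (truth, [11.0, 19.0, 31.0, 39.0]),  # MSE = 1
    (truth, [12.0, 18.0, 32.0, 38.0]),  # MSE = 4
]
score = mean_psnr(pairs)
```

Running the same routine for each component super-resolver and each magnification factor gives the per-scale PSNR values that enter the performance score.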
Therefore, given the performance of the component super-resolvers on the reference dataset, we define each element of the reference weight vector as a ratio. The numerator measures the performance similarity between the corresponding component super-resolver and the best method, i.e., the one with the highest score on the reference dataset, and is controlled by a bandwidth parameter; the denominator is a normalization constant that guarantees that the elements of the reference weight vector sum to one. The bandwidth parameter is crucial for the subsequent SR task: values that are very large or very small are detrimental to the final result. As shown in Fig. 3, when the bandwidth is too small, the best component super-resolver dominates the SR reconstruction, i.e., its weight is close to 1 while the weights of the other component super-resolvers are almost 0. In contrast, when the bandwidth is too large, all the component super-resolvers contribute equally to the SR reconstruction, i.e., they are all assigned the same weight. For a more detailed analysis, please refer to the experimental section.
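The two limiting behaviors of the bandwidth can be reproduced with a small sketch. It assumes a Gaussian kernel on the gap between each score and the best score, normalized to sum to one; the kernel form, the function name `reference_weights`, and the score values are assumptions made for illustration and may differ from the definition used in the paper.

```python
import math
from typing import List, Sequence


def reference_weights(scores: Sequence[float], bandwidth: float) -> List[float]:
    """Normalized similarity of each score to the best score.

    Assumed kernel: exp(-(best - s)^2 / (2 * bandwidth^2)).
    """
    if bandwidth <= 0.0:
        raise ValueError("bandwidth must be positive")
    best = max(scores)
    kernel = [math.exp(-((best - s) ** 2) / (2.0 * bandwidth ** 2)) for s in scores]
    total = sum(kernel)  # at least 1, because the best method contributes exp(0)
    return [k / total for k in kernel]


# Hypothetical performance scores of three component super-resolvers.
scores = [30.1, 29.5, 28.0]

narrow = reference_weights(scores, 0.01)    # tiny bandwidth: best method dominates
wide = reference_weights(scores, 1000.0)    # huge bandwidth: near-uniform weights
```

Any kernel that decreases with the gap to the best score and is then normalized shows the same qualitative behavior, so the sketch does not depend on the exact kernel choice.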
Note that the reference weight vector, denoted $w_0$ here, holds the prior weights learned from the reference dataset. Our aim is to obtain an input-specific ensemble weight vector w that does not differ too much from $w_0$. Thus, we define the prior probability of w by a Gaussian model, chosen for its simplicity:

$$P(w) \propto \exp\left( -\frac{1}{2\sigma_2^{2}} \left\| w - w_0 \right\|_2^{2} \right),$$

where $\sigma_2$ is a scale parameter for the prior distribution of the ensemble weights w.
III-D Objective Function
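Combining the Gaussian likelihood of the reconstruction error with the Gaussian prior on the ensemble weights in the MAP formulation, the optimal ensemble weights minimize the following objective:

$$w^{*} = \arg\min_{w} \; \Big\| x - H \sum_{i=1}^{N} w_i y_i \Big\|_2^{2} + \lambda \left\| w - w_0 \right\|_2^{2}.$$

The first term is the reconstruction error, while the second is the difference between the pre-learned weight vector $w_0$ and the weight vector w to be estimated. The regularization parameter $\lambda$ is related to the noise standard deviation $\sigma_1$ of the likelihood and the scale parameter $\sigma_2$ of the prior by $\lambda = \sigma_1^{2} / \sigma_2^{2}$, and is used to balance the contributions of the reconstruction error and the prior knowledge of w.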
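The step from the MAP formulation to a regularized least squares objective can be sketched as follows. The symbols $\sigma_1$ (noise standard deviation), $\sigma_2$ (scale of the weight prior), $w_0$ (reference weight vector), $y_i$ (result of the $i$-th component super-resolver) and $\lambda$ are notation assumed for this sketch, with a Gaussian likelihood and a Gaussian prior as described in the text.

```latex
\begin{align}
-\log P(w \mid x, Y)
  &= \frac{1}{2\sigma_1^{2}} \Big\| x - H \sum_{i=1}^{N} w_i y_i \Big\|_2^{2}
   + \frac{1}{2\sigma_2^{2}} \left\| w - w_0 \right\|_2^{2} + \text{const}, \\
w^{*}
  &= \arg\min_{w} \; \Big\| x - H \sum_{i=1}^{N} w_i y_i \Big\|_2^{2}
   + \lambda \left\| w - w_0 \right\|_2^{2},
  \qquad \lambda = \frac{\sigma_1^{2}}{\sigma_2^{2}}.
\end{align}
```

The second line follows from the first by dropping the constant and multiplying by $2\sigma_1^{2}$, which does not change the minimizer. A noisier observation (larger $\sigma_1$) or a tighter prior (smaller $\sigma_2$) therefore increases $\lambda$ and pulls the solution toward the reference weights.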
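In order to make the ensemble SR results interpretable, we incorporate the sum-to-one constraint into the objective function. Thus, we have

$$w^{*} = \arg\min_{w} \; \Big\| x - H \sum_{i=1}^{N} w_i y_i \Big\|_2^{2} + \lambda \left\| w - w_0 \right\|_2^{2}, \quad \text{s.t.} \;\; \sum_{i=1}^{N} w_i = 1,$$

where $y_i$ is the HR result of the $i$-th component super-resolver, $w_0$ is the reference weight vector, and $\lambda$ is the regularization parameter.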
To obtain an optimal ensemble weight vector, we simultaneously take into consideration the input-dependent reconstruction constraint and the prior of the ensemble methods learned from a reference dataset. The first term can be seen as a global reconstruction constraint, which promotes consistency between the degraded HR estimate and the input LR image. For patch based SR methods [24, 37, 39], the averaged and fused HR estimate may not satisfy the global reconstruction constraint exactly [17, 70]. In other words, these patch based SR methods reconstruct the HR image locally (patchwise) and ignore global information. By adding this global reconstruction constraint, our method encourages the degraded HR image (Hy) to match the observed LR image (x), thus capturing more information about the global structure of the target HR image. Therefore, the proposed ensemble model avoids both the lack of flexibility caused by the absence of a data-based reconstruction constraint and the non-uniqueness of the solution caused by ill-posedness.
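Because the blurring and downsampling processes are linear operators, we have

$$H \sum_{i=1}^{N} w_i y_i = \sum_{i=1}^{N} w_i H y_i,$$

where $y_i$ and $w_i$ are the HR result and the ensemble weight of the $i$-th component super-resolver.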
Eq. (11) can be written as,
where and , , and is a unit matrix with the size of .
where 1 is a column vector of ones. Then, the problem (12) has the following analytical solution:
Upon acquiring the optimal ensemble weights, we simply combine the results of the component super-resolvers through Eq. (1). It is worth noting that the objective functions of Cevikalp et al. and of our proposed method are both essentially constrained least squares problems. The former work tries to obtain the optimal combination weights of different classifiers to achieve the best classification performance, while our method focuses on the image SR problem and tries to obtain the optimal combination (ensemble) weights under the global reconstruction constraint as well as the weight prior. In this sense, they are different, though they use the same optimization method to solve their objective functions. In fact, in image processing and computer vision, the objective function of many methods is very simple, i.e., a constrained least squares problem. The difference lies in the constraints (prior knowledge) that different methods use to regularize the solutions. How to find good prior knowledge and how to model it effectively is the key to the success of an algorithm. The novelty of the proposed method is the introduction of a reference dataset and its use to produce prior knowledge that regularizes the combination (ensemble) weights.
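A minimal sketch of one standard way to solve a sum-to-one constrained, regularized least squares problem of this kind is given below. It assumes the objective ||Zw - x||^2 + lam * ||w - w_ref||^2 subject to sum(w) = 1, where the i-th column of Z is the degraded result of the i-th component super-resolver. With G = Z^T Z + lam * I, u = G^{-1}(Z^T x + lam * w_ref) and v = G^{-1} 1, the Lagrange multiplier condition gives w = u + nu * v with nu = (1 - sum(u)) / sum(v). All names are hypothetical, and this is not necessarily the exact expression used in the paper.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger regularization weight")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def ensemble_weights(degraded: Sequence[Sequence[float]], x: Sequence[float],
                     w_ref: Sequence[float], lam: float) -> List[float]:
    """Closed-form weights; `degraded[i]` is the flattened degraded i-th result."""
    n = len(degraded)
    # G = Z^T Z + lam * I, built from inner products of the degraded results.
    G = [[sum(a * b for a, b in zip(degraded[i], degraded[j]))
          + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    rhs = [sum(a * b for a, b in zip(degraded[i], x)) + lam * w_ref[i]
           for i in range(n)]
    u = solve(G, rhs)          # unconstrained minimizer
    v = solve(G, [1.0] * n)    # correction direction for the constraint
    nu = (1.0 - sum(u)) / sum(v)
    return [ui + nu * vi for ui, vi in zip(u, v)]


# Toy case with orthonormal degraded results, so the answer is easy to verify.
w = ensemble_weights([[1.0, 0.0], [0.0, 1.0]], [0.2, 0.8], [0.5, 0.5], 1.0)
```

In the toy case the data term alone prefers (0.2, 0.8) and the prior prefers (0.5, 0.5); with lam = 1 the solution is their midpoint (0.35, 0.65), which already sums to one. A very large lam drives the weights to the reference weights, matching the behavior described for an overemphasized prior.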
IV Experimental Results
In this section, we present the experimental settings used to evaluate the proposed RefESR approach and show the reconstruction results generated by carrying out SR experiments on three public general image databases and some face image databases.
IV-A Experimental Setup
Database. To test the performance, we use three commonly used image sets, SET5, SET14, and Urban100, as the testing images (see Footnote 1). Like many state-of-the-art single image SR methods [53, 39, 69, 52, 58, 54, 55], in our experiments the original HR images are degraded by Bicubic interpolation (i.e., the imresize function in Matlab) with factors of 2, 3, and 4 to generate the corresponding LR images. It should be noted that if the degradation process of the input LR image is unknown, which is the blind image SR setting, the performance of our method will drop sharply because of the mismatch between the true image degradation and the simulated image degradation of the training dataset.
Footnote 1: SET14 includes 14 different scenes and was first used by Zeyde et al. to show their results; SET5 includes 5 different scenes and was used by Bevilacqua et al.; Urban100 was created by Huang et al. and contains 100 HR images with a variety of real-world structures, such as urban, city, and architecture scenes. The length and width of the original HR images for the SET14 (the first column), SET5 (the second column), and Urban100 (the last two columns) databases all range from 200 pixels to 600 pixels.
Note that there are some contextual connections between the images in the reference set and the images in the test set. This has been confirmed by many domain-specific image SR methods, e.g., face hallucination and text SR. When super-resolving LR face images, a good general image SR method trained on diverse general images usually performs worse than a domain-specific face image SR method trained on face images. In this paper, we consider only the general image SR problem, so the reference dataset should be as diverse as possible.
Implementation Details. To ensemble different component super-resolvers, we first select some state-of-the-art SR algorithms, which include four non-deep-learning methods, i.e., Kim, SelfExSR, A+, and IA, and five deep learning based methods, i.e., SRCNN, CSCN, CSCN-MV, VDSR, and DRCN (see Footnote 2). Then we test their performance on the reference dataset. Because the ground truth of each LR image in the reference dataset is known, the SR abilities of these algorithms can be measured by objective metrics such as PSNR, SSIM, or their combination. The reference weight vector (calculated by Eq. (6)) is then applied to regularize the optimization of the ensemble weights.
Footnote 2: We select these nine methods for their representativeness, their good performance, and the public availability of their source code.
In the testing phase, we first reconstruct the HR images with the above-mentioned component super-resolvers. Then, the optimal ensemble weights are obtained by Eq. (14). The final HR output is constructed by combining the HR results of the different component super-resolvers using the optimal ensemble weights.
IV-B Parameter Analysis
In this subsection, we analyze the effect of the model parameters on the performance of RefESR, and validate the reconstruction constraint and the reference ensemble weight prior used in the proposed method. In particular, we conduct experiments on the SET14 testing image set with a magnification factor of 3. Similar conclusions can be drawn for the other cases, so we do not show them one by one. From the objective function (9) of our method, we learn that the bandwidth parameter and the regularization parameter have a great impact on the performance of the algorithm.
Table IV: PSNR (dB) and SSIM on SET14 at a magnification factor of 3.
|Method|PSNR (dB)|SSIM|
|---|---|---|
|Best Component Super-Resolver|29.78|0.8314|
|Without Reconstruction Constraint|29.89|0.8337|
|Without Weights Prior|29.70|0.8304|
|Ensemble Via Averaging|29.71|0.8301|
|The Proposed Method|29.90|0.8338|
Fig. 4 and Fig. 5 show the performance of our method as one parameter varies while the other is set to its optimal value. From Fig. 4, we can draw at least the following two conclusions. (i) The ensemble SR reconstruction is effective. This can be concluded by comparing the results at a very small bandwidth value and at the optimal bandwidth value. At the very small value, almost only the best component super-resolver is active (the 8-th method, i.e., VDSR; please refer to the top-left of Fig. 3), whereas at the optimal value, three component super-resolvers (the 4-th, 8-th, and 9-th methods, i.e., IA, VDSR, and DRCN) contribute to the final result. (ii) The prior knowledge learned from the reference dataset is effective. This can be concluded by comparing the results at the optimal bandwidth value and at a very large bandwidth value. At the very large value, all the component super-resolvers are treated equally, i.e., the ensemble weights are set to the same value (please refer to the bottom-right of Fig. 3), and the performance is worse. This is because the poor component super-resolvers, with their unreasonable reconstructions, pull down the overall reconstruction performance.
From Fig. 5, we can see that the performance first increases with the value of the regularization parameter and then slightly decreases. This indicates that the prior knowledge of the reference ensemble weights is very effective for the SR reconstruction. When the regularization parameter is zero, the method reduces to the case of considering only the reconstruction constraint. The proposed method has a gain of 0.2 dB over the method neglecting the prior knowledge of the reference ensemble weights. The decrease beyond the optimal value is due to overemphasizing the prior knowledge of the reference ensemble weights while neglecting the reconstruction constraint. This verifies our motivation of simultaneously taking into consideration the reconstruction constraint (which favors the degradation model) and the prior knowledge generated from the reference dataset.
For ease of comparison, Table IV tabulates the performance of the above-mentioned cases: RefESR without the reconstruction constraint, RefESR without the weights prior, ensemble via averaging, and the proposed RefESR method. In the second row, we also list the performance of the best component super-resolver, i.e., VDSR. The case that uses only the reconstruction constraint (i.e., RefESR without the weights prior) and the averaging based ensemble obtain similar results, which is consistent with Wang et al.'s results (see Table 4 of their paper). This also shows that considering the reconstruction constraint alone is not enough. Comparing RefESR without the reconstruction constraint with RefESR without the weights prior indicates that the ensemble weight prior is effective and relatively more important than the reconstruction constraint. This is mainly because the component super-resolvers used in our experiments are very competitive and have very good SR performance, so they already essentially satisfy the reconstruction constraint. By incorporating the prior knowledge of ensemble weights, our method obtains a gain of 0.2 dB. Because image SR is a very hot topic that has become a test bed for many emerging models and algorithms, and very strong methods are constantly being presented, it is very difficult for a new method to obtain a very large gain over previous methods. From Table IV, we can also see that simply averaging the results of all the methods sacrifices the final ensemble performance, e.g., a 0.07 dB decrease compared with VDSR. This once again shows the effectiveness of adaptively assigning different ensemble weights to different component super-resolvers.
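The gains quoted above can be checked directly against the PSNR column of Table IV. The short sketch below only repeats that arithmetic; the dictionary keys and the helper `gain` are hypothetical names.

```python
# PSNR values (dB) copied from Table IV (SET14, magnification factor 3).
table_iv_psnr = {
    "best_component": 29.78,                      # VDSR
    "without_reconstruction_constraint": 29.89,
    "without_weights_prior": 29.70,
    "averaging": 29.71,
    "proposed": 29.90,
}


def gain(method: str, baseline: str) -> float:
    """PSNR difference in dB, rounded to the precision of the table."""
    return round(table_iv_psnr[method] - table_iv_psnr[baseline], 2)


gain_over_no_prior = gain("proposed", "without_weights_prior")
loss_of_averaging = gain("best_component", "averaging")
gain_over_best = gain("proposed", "best_component")
```

The first two differences reproduce the 0.2 dB and 0.07 dB figures in the text, and the third shows that the proposed ensemble is 0.12 dB above its best single component on this setting.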
Under the above optimal settings of the bandwidth and regularization parameters, we examine the final ensemble weights of the different testing images in SET14. As shown in Fig. 6, the three best component super-resolvers, IA, VDSR, and DRCN, dominate the SR reconstruction. The better the SR performance on the reference dataset, the larger the ensemble weight. This verifies our assumption that a method with better performance on the reference dataset should receive a relatively larger weight when reconstructing the HR output of an LR input in the ensemble framework. Moreover, the results show that some low-quality component super-resolvers do not contribute substantially to the results. When we select only the three methods that play dominant roles (i.e., those whose ensemble weights are relatively large), we find that this has little impact on the final performance of the proposed algorithm. This is consistent with the observation that it may be better to combine some instead of all of the component super-resolvers.
IV-C Comparison with State-of-the-art
To verify the effectiveness of the proposed RefESR method, we provide quantitative and qualitative comparisons with the nine component super-resolvers and Wang et al.'s ESCN method on SET5, SET14, and Urban100 for different upscaling factors. We also include the visual results of Bicubic interpolation as a baseline. Table I, Table II, and Table III report the PSNR and SSIM of the ten comparison methods and of our RefESR method. All the values in the tables are averages over all the images within a dataset. From the results, we can see that our method outperforms almost all existing methods, including the most competitive deep learning based methods, on all datasets and scale factors in terms of PSNR. Only in two situations is our RefESR method slightly worse than DRCN in terms of SSIM. The visual comparisons of three typical images are shown in Fig. 7, Fig. 8, and Fig. 9. To make the comparison clearer, we also show magnified local regions (marked by red boxes). Our method produces relatively sharper boundaries and is free of ringing artifacts. This can be explained by the following two reasons: (i) Through the ensemble strategy, it is possible to emphasize the strengths of the approaches with superior performance while suppressing the weaknesses of the approaches with poor performance. (ii) The reconstruction artifacts, e.g., ringing artifacts, of one component super-resolver can be weakened by fusing multiple results. However, if all methods produce ringing artifacts in the same region, the ensemble result cannot eliminate these artifacts.
Image SR is a very hot topic and has become a test bed for many emerging models and algorithms, especially the recently popular deep learning techniques. A new algorithm is released on arXiv almost every few days. While we were preparing this paper, a series of deep learning based SR algorithms were released and achieved very good performance. To further demonstrate the effectiveness of the proposed ensemble learning framework, we additionally ensemble the most competitive method, EDSR, with the aforementioned nine component super-resolvers. Table V shows the results of EDSR and the proposed RefESR. In addition, we also give the results of combining the geometric ensemble strategy with the proposed ensemble strategy, which is denoted as RefE2SR. From these results, we observe that: (i) Although the performance of EDSR is already very good, the proposed ensemble framework can still improve the final results. This shows that EDSR and the other methods still have a certain degree of complementarity. (ii) RefE2SR is better than RefESR. This can be explained as follows: when the geometric ensemble strategy is applied to the component super-resolvers, their performance improves. With these improved SR results, our proposed method can further improve the overall performance of the combined ensemble strategy.
IV-D Ensemble Super-Resolution Results with Face Images
In order to verify the universality of our proposed ensemble framework, we test the proposed RefESR method on the task of face image SR, a.k.a. face hallucination . As before, the performance of the different face SR algorithms is learned from a reference set, i.e., their ensemble weights are estimated, and the reconstruction results of the different algorithms on newly observed LR faces are then integrated based on the estimated weights.
The reference face dataset consists of 600 images of 600 subjects, of which 200 subjects are from the CAS-PEAL-R1 face database , 100 subjects are from the CUHK face database , 200 subjects are from the COX-S2V face database , and 100 subjects are from the Scface face database . To evaluate the performance of the component super-resolvers, we additionally collect 20, 10, 20, and 10 face images from these four databases to form the evaluation dataset. For testing, we capture 42 High-Definition (HD) images, whose faces are very different from those in the reference face dataset. Some example images are shown in Fig. 10. In our experiments, the component super-resolvers for face images include Wang et al.’s eigentransformation method , neighbor embedding (NE) , least squares representation (LSR) , sparse representation (SR) , locality-constrained representation (LcR) , smooth sparse representation (SSR) , and dual regularization prior (DRP) .
As in the general image experiments, we use the reference face dataset to train the component super-resolvers and use the evaluation dataset to measure their performance in terms of PSNR and SSIM. The reference weight vector can then be calculated according to Eq. (6). Based on the prior knowledge of w, we can obtain the optimal ensemble weight vector for each input LR face image by Eq. (9). In addition, we also conduct experiments to test the robustness of our method when the input is contaminated by noise. Our intuition is as follows: given a noisy input, if the images generated by the different algorithms are not optimal (they may contain noise), then the noise can be smoothed by fusing the different results.
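The per-image weight estimation can be illustrated with a simplified sketch. It does not reproduce Eq. (6) or Eq. (9), which are defined earlier in the paper. Instead, it assumes a plain regularized least squares objective, min_w ||y - A w||^2 + lam ||w - w_ref||^2, where the columns of A are the component outputs after degradation to the LR grid, y is the LR observation, and w_ref is the reference weight vector. Any constraints on w used in the paper are omitted, and the names `ensemble_weights` and `lam` are hypothetical.

```python
import numpy as np

def ensemble_weights(lr_obs, degraded_outputs, w_ref, lam=1.0):
    """Closed-form minimizer of ||y - A w||^2 + lam * ||w - w_ref||^2.

    lr_obs: observed LR image (any shape).
    degraded_outputs: list of K component SR results, each already
        degraded to the LR grid (same shape as lr_obs).
    w_ref: length-K reference weight vector (the prior).
    """
    y = np.asarray(lr_obs, dtype=np.float64).ravel()
    A = np.stack(
        [np.asarray(d, dtype=np.float64).ravel() for d in degraded_outputs],
        axis=1,
    )
    w_ref = np.asarray(w_ref, dtype=np.float64)
    K = A.shape[1]
    lhs = A.T @ A + lam * np.eye(K)   # normal equations with the prior term
    rhs = A.T @ y + lam * w_ref
    return np.linalg.solve(lhs, rhs)
```

In this sketch a large `lam` pulls the solution toward the reference weights, while a small `lam` lets the fit to the LR observation dominate.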
The face SR results table reports the performance (in terms of average PSNR and SSIM) of the different component super-resolvers and the proposed RefESR method under different noise levels. We learn that RefESR achieves the best average PSNR and SSIM results. The gains of the proposed method over the second best method are clear, exceeding 0.5 dB in terms of PSNR. In addition, we observe that the advantage of the proposed method grows as the noise increases. In particular, when the input is noiseless, the PSNR gain of the proposed method over the second best method is 0.53 dB. When the input is contaminated by noise, the gain is 0.58 dB and 0.69 dB for the two noise levels, respectively. We attribute this to the advantages of ensemble learning, which can reduce the uncertainties caused by noise in the different methods. Fig. 11 shows some visual comparisons between the component super-resolvers and the proposed method. From these results, we observe that the proposed RefESR method removes most of the noise while preserving the main structural information.
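The noise-smoothing effect attributed to fusion above can be made concrete under an idealized assumption that is not claimed in the paper: each component output equals the true HR image plus zero-mean noise that is uncorrelated across methods. The symbols $n_k$ and $\sigma_k^2$ are introduced here only for illustration.

```latex
\begin{align}
  x_k &= x + n_k, \qquad \mathbb{E}[n_k] = 0, \quad
         \mathrm{Var}(n_k) = \sigma_k^2, \quad
         \mathrm{Cov}(n_j, n_k) = 0 \ (j \neq k), \\
  \hat{x} &= \sum_{k=1}^{K} w_k x_k, \qquad \sum_{k=1}^{K} w_k = 1
         \;\Rightarrow\; \hat{x} - x = \sum_{k=1}^{K} w_k n_k, \\
  \mathrm{Var}(\hat{x} - x) &= \sum_{k=1}^{K} w_k^2 \sigma_k^2
         \;=\; \frac{\sigma^2}{K}
         \quad \text{when } w_k = \tfrac{1}{K},\ \sigma_k = \sigma .
\end{align}
```

In practice the errors of different super-resolvers are partly correlated, so the actual reduction is smaller than the $1/K$ factor. This is consistent with the earlier observation that artifacts shared by all methods in the same region cannot be removed by the ensemble.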
In this section, we provide a deeper analysis of the proposed ensemble learning framework, so that readers can better understand our idea.
Time complexity. By ensembling the results of several state-of-the-art methods, we can expect better reconstruction performance. However, this also results in very high computational complexity. Although the optimization procedure of ESR is efficient, taking around 0.06 seconds per image, the computational cost of our method is high because the total running time is the sum of (i) the running times of all component super-resolvers and (ii) the optimization procedure of ESR. Therefore, the computational complexity will be a bottleneck for our approach in practical applications.
Theoretical guarantee. Another drawback of the proposed algorithm is that there is no theoretical guarantee that ensembling different methods produces a better result, which is also a limitation of conventional ensemble learning based machine learning methods . From the experiments, we learn that in most cases our RefESR method beats all the comparison methods. However, in some situations, our RefESR method is worse than the best comparison method. Therefore, in the future we will consider learning a safe prediction from multiple component super-resolvers, i.e., one that is no worse than any of the component super-resolvers.
Model universality. Different methods are suited to different kinds of test images. For example, there are SR algorithms for general images and SR algorithms for specific images such as digital characters, faces, and irises. SR models trained on general images are not suitable for reconstructing specific images, and vice versa. Furthermore, the ensemble weight prior (of ensemble learning) obtained for different methods from general images may not reflect their SR ability on specific images. In this paper, the proposed ensemble learning based SR method is applied to both general image and face image SR tasks. The experiments indicate that the proposed method is effective in improving the performance of existing generic image SR algorithms and face image SR algorithms. In summary, the proposed framework is universal in the sense that, given a reference dataset, it can improve the performance of existing SR methods when the input image belongs to the same class as the reference dataset.
Choice of component super-resolvers. In this paper, we do not consider the complementarity of different methods, but directly select several representative methods in the current SR field, including four shallow learning-based methods and five deep learning-based methods. We believe that the characteristics of the different algorithms should be considered when choosing component super-resolvers. Ensembling component super-resolvers with different characteristics is more likely to improve the final ensemble performance.
Global reconstruction constraint. As shown in many previous works [84, 32, 17], the global reconstruction constraint, which requires that the degraded HR estimate be consistent with the observed LR image [17, 70], is very effective for enhancing the final super-resolved results through an iterative back projection strategy. In our experiments, we have found that if the performance of a component super-resolver is good enough, the improvement brought by the reconstruction constraint is very limited. In other words, when the component super-resolver is good enough, its output already approximately satisfies the reconstruction constraint.
In this paper, we present a novel framework based on ensemble learning to solve the single image SR problem. It introduces a reference dataset and incorporates a learned prior for each component super-resolver to regularize the optimization of the ensemble weights: a method that performs better on the reference dataset should receive a relatively larger weight when reconstructing the HR output for an LR input. We jointly model this learned prior on the ensemble weights and the reconstruction constraint, which states that the degraded HR image should equal the LR observation, in an MAP formulation. Finally, we present an analytical solution to the constrained least squares problem induced by the MAP framework. Results show the effectiveness of the introduced prior knowledge of ensemble weights learned from a reference dataset.
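The iterative back projection strategy mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the setup used in the cited works: the degradation is taken to be s-by-s block averaging, the back-projection operator is pixel replication, and the function names `downsample`, `upsample`, and `iterative_back_projection` are hypothetical. The HR size is assumed to be divisible by the scale factor.

```python
import numpy as np

def downsample(hr, s):
    """Assumed degradation: s x s block averaging (blur plus decimation)."""
    h, w = hr.shape
    return hr.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(lr, s):
    """Pixel replication, used to project the LR residual back to HR."""
    return np.kron(lr, np.ones((s, s)))

def iterative_back_projection(hr_init, lr_obs, s, n_iters=20, step=1.0):
    """Refine hr_init so that its degraded version matches lr_obs."""
    hr = np.asarray(hr_init, dtype=np.float64).copy()
    for _ in range(n_iters):
        residual = lr_obs - downsample(hr, s)   # violation of the constraint
        hr += step * upsample(residual, s)      # back-project the residual
    return hr
```

When the initial estimate already nearly satisfies the constraint, the residual is close to zero and the update changes little, which matches the observation that strong component super-resolvers gain little from this step.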
-  S. C. Park, M. K. Park, and M. G. Kang, “Super-resolution image reconstruction: a technical overview,” IEEE Signal Processing Magazine, vol. 20, no. 3, pp. 21–36, 2003.
-  N. Wang, D. Tao, X. Gao, X. Li, and J. Li, “A comprehensive survey to face hallucination,” Int. J. Comput. Vis., vol. 106, no. 1, pp. 9–30, 2014.
-  X. Liu, D. Zhai, R. Chen, X. Ji, D. Zhao, and W. Gao, “Depth super-resolution via joint color-guided internal and external regularizations,” IEEE Trans. Image Process., vol. 28, no. 4, pp. 1636–1645, 2019.
-  K. Jiang, Z. Wang, P. Yi, J. Jiang, J. Xiao, and Y. Yao, “Deep distillation recursive network for remote sensing imagery super-resolution,” Remote Sensing, vol. 10, no. 11, p. 1700, 2018.
-  H. A. Aly and E. Dubois, “Image up-sampling using total-variation regularization with a new observation model,” IEEE Trans. Image Process., vol. 14, no. 10, pp. 1647–1659, 2005.
-  X. Liu, D. Zhai, D. Zhao, G. Zhai, and W. Gao, “Progressive image denoising through hybrid graph laplacian regularization: A unified framework,” IEEE Trans. Image Process., vol. 23, no. 4, pp. 1491–1503, April 2014.
-  X. Liu, G. Cheung, X. Wu, and D. Zhao, “Random walk graph laplacian-based smoothness prior for soft decoding of JPEG images,” IEEE Trans. Image Process., vol. 26, no. 2, pp. 509–524, Feb 2017.
-  J. Sun, J. Sun, Z. Xu, and H.-Y. Shum, “Image super-resolution using gradient profile prior,” in CVPR. IEEE, 2008, pp. 1–8.
-  H. Chen, X. He, L. Qing, and Q. Teng, “Single image super-resolution via adaptive transform-based nonlocal self-similarity modeling and learning-based gradient regularization,” IEEE Trans. on Multimedia, vol. PP, no. 99, pp. 1–1, 2017.
-  Y. HaCohen, R. Fattal, and D. Lischinski, “Image upsampling via texture hallucination,” in ICCP. IEEE, 2010, pp. 1–8.
-  D. Glasner, S. Bagon, and M. Irani, “Super-resolution from a single image,” in ICCV, Sept 2009, pp. 349–356.
-  X. Liu, D. Zhao, R. Xiong, S. Ma, W. Gao, and H. Sun, “Image interpolation via regularized local linear regression,” IEEE Trans. Image Process., vol. 20, no. 12, pp. 3455–3469, 2011.
-  Y. Zhang, D. Zhao, J. Zhang, R. Xiong, and W. Gao, “Interpolation-dependent image downsampling,” IEEE Trans. Image Process., vol. 20, no. 11, pp. 3291–3296, 2011.
-  J. Jiang, X. Ma, C. Chen, T. Lu, Z. Wang, and J. Ma, “Single image super-resolution via locally regularized anchored neighborhood regression and nonlocal means,” IEEE Trans. Multimedia, vol. 19, no. 1, pp. 15–26, 2017.
-  W. Dong, G. Shi, and X. Li, “Nonlocal image restoration with bilateral variance estimation: A low-rank approach,” IEEE Trans. Image Process., vol. 22, no. 2, pp. 700–711, 2013.
-  W. Gong, L. Hu, J. Li, and W. Li, “Combining sparse representation and local rank constraint for single image super resolution,” Inf. Sci., vol. 325, pp. 1–19, 2015.
-  J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, 2010.
-  S. Yang, M. Wang, Y. Sun, F. Sun, and L. Jiao, “Compressive sampling based single-image super-resolution reconstruction by dual-sparsity and non-local similarity regularizer,” Pattern Recognition Letters, vol. 33, no. 9, pp. 1049–1059, 2012.
-  X. Li, H. He, R. Wang, and D. Tao, “Single image superresolution via directional group sparsity and directional features,” IEEE Trans. Image Process., vol. 24, no. 9, pp. 2874–2888, 2015.
-  S. Yang, J. Zhang, S. Cui, M. Wang, and L. Jiao, “Curvelet support value filters (csvfs) for image super-resolution,” Neurocomputing, vol. 211, pp. 53–59, 2016.
-  J. Ma, J. Zhao, J. Tian, X. Bai, and Z. Tu, “Regularized vector field learning with sparse approximation for mismatch removal,” Pattern Recognit., vol. 46, no. 12, pp. 3519–3532, 2013.
-  Z. Lin and H.-Y. Shum, “Fundamental limits of reconstruction-based superresolution algorithms under local translation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 1, pp. 83–97, Jan. 2004.
-  W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Example-based super-resolution,” in IEEE Computer Graphics and Applications, 2002, pp. 56–65.
-  H. Chang, D. Yeung, and Y. Xiong, “Super-resolution through neighbor embedding,” in CVPR, vol. 1, 2004, pp. 275 – 282.
-  S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
-  M. Bevilacqua, A. Roumy, C. Guillemot, and M. Alberi, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” in Proceedings of British Machine Vision Conference (BMVC), 2012, pp. 1–10.
-  S. Yang, Z. Wang, L. Zhang, and M. Wang, “Dual-geometric neighbor embedding for image super resolution with sparse tensor,” IEEE Trans. Image Process., vol. 23, no. 7, pp. 2793–2803, July 2014.
-  J. Ma, H. Zhou, J. Zhao, Y. Gao, J. Jiang, and J. Tian, “Robust feature matching for remote sensing image registration via locally linear transforming,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 12, pp. 6469–6481, 2015.
-  J. Ma, J. Wu, J. Zhao, J. Jiang, H. Zhou, and Q. Z. Sheng, “Nonrigid point set registration with robust transformation learning under manifold regularization,” IEEE Trans. Neural Netw. Learn. Syst., 2018.
-  H. Zhang, J. Yang, Y. Zhang, N. M. Nasrabadi, and T. S. Huang, “Close the loop: Joint blind image restoration and recognition with sparse representation prior,” in ICCV. IEEE, 2011, pp. 770–777.
-  K. I. Kim and Y. Kwon, “Single-image super-resolution using sparse regression and natural image prior,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 6, pp. 1127–1133, June 2010.
-  R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations.” Springer Berlin Heidelberg, 2012, vol. 6920, pp. 711–730.
-  J. Jiang, X. Ma, Z. Cai, and R. Hu, “Sparse support regression for image super-resolution,” IEEE Photonics J., vol. 7, no. 5, pp. 1–11, 2015.
-  K. Jia, X. Wang, and X. Tang, “Image transformation based on learning dictionaries across image spaces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 2, pp. 367–380, 2013.
-  S. Wang, L. Zhang, Y. Liang, and Q. Pan, “Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis,” in CVPR, 2012, pp. 2216–2223.
-  W. Yang, Y. Tian, F. Zhou, Q. Liao, H. Chen, and C. Zheng, “Consistent coding scheme for single-image super-resolution via independent dictionaries,” IEEE Trans. on Multimedia, vol. 18, no. 3, pp. 313–325, March 2016.
-  R. Timofte, V. De Smet, and L. Van Gool, “Anchored neighborhood regression for fast example-based super-resolution,” in ICCV, Dec 2013, pp. 1920–1927.
-  C.-Y. Yang and M.-H. Yang, “Fast direct super-resolution by simple functions,” in ICCV, 2013, pp. 561–568.
-  R. Timofte, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood regression for fast super-resolution,” in Asian Conference on Computer Vision. Springer, 2014, pp. 111–126.
-  K. Zhang, D. Tao, X. Gao, X. Li, and J. Li, “Coarse-to-fine learning for single-image super-resolution,” IEEE Trans. Neural Netw. Learn. Syst., vol. PP, no. 99, pp. 1–14, 2016.
-  Y. Hu, N. Wang, D. Tao, X. Gao, and X. Li, “SERF: A simple, effective, robust, and fast image super-resolver from cascaded linear regression,” IEEE Trans. Image Process., vol. 25, no. 9, pp. 4091–4102, 2016.
-  Y. Tang and Y. Yuan, “Learning from errors in super-resolution,” IEEE Trans. Cybern., vol. 44, no. 11, pp. 2143–2154, Nov 2014.
-  W. Yang, T. Yuan, W. Wang, F. Zhou, and Q. Liao, “Single-image super-resolution by subdictionary coding and kernel regression,” IEEE Trans. Syst., Man, Cybern.: Systems, vol. 47, no. 9, pp. 2478–2488, Sept 2017.
-  J. Liu, W. Yang, X. Zhang, and Z. Guo, “Retrieval compensated group structured sparsity for image super-resolution,” IEEE Trans. Multimedia, vol. 19, no. 2, pp. 302–316, Feb 2017.
-  Y. Zhang, Y. Zhang, J. Zhang, and Q. Dai, “Ccr: Clustering and collaborative representation for fast single image super-resolution,” IEEE Trans. on Multimedia, vol. 18, no. 3, pp. 405–417, March 2016.
-  Y. Zhang, Y. Zhang, J. Zhang, D. Xu, Y. Fu, Y. Wang, X. Ji, and Q. Dai, “Collaborative representation cascade for single-image super-resolution,” IEEE Trans. Syst., Man, Cybern.: Systems, vol. PP, no. 99, pp. 1–16, 2018.
-  S. Yang, J. Liu, Y. Fang, and Z. Guo, “Joint-feature guided depth map super-resolution with face priors,” IEEE Trans. Cybern., vol. PP, no. 99, pp. 1–13, 2017.
-  H. Shen, L. Peng, L. Yue, Q. Yuan, and L. Zhang, “Adaptive norm selection for regularized image restoration and super-resolution,” IEEE Trans. Cybern., vol. 46, no. 6, pp. 1388–1399, June 2016.
-  Y. Tang and L. Shao, “Pairwise operator learning for patch-based single-image super-resolution,” IEEE Trans. Image Process., vol. 26, no. 2, pp. 994–1003, 2017.
-  C. Deng, J. Xu, K. Zhang, D. Tao, X. Gao, and X. Li, “Similarity constraints-based structured output regression machine: An approach to image super-resolution,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 12, pp. 2472–2485, Dec 2016.
-  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
-  C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, 2016.
-  J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in CVPR, 2015, pp. 5197–5206.
-  J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image super-resolution using very deep convolutional networks,” in CVPR, 2016, pp. 1646–1654.
-  ——, “Deeply-recursive convolutional network for image super-resolution,” in CVPR, 2016, pp. 1637–1645.
-  K. Zeng, J. Yu, R. Wang, C. Li, and D. Tao, “Coupled deep autoencoder for single image super-resolution,” IEEE Trans. Cybern., vol. 47, no. 1, pp. 27–37, Jan 2017.
-  Y. Huang, L. Shao, and A. F. Frangi, “DOTE: dual convolutional filter learning for super-resolution and cross-modality synthesis in MRI,” CoRR, vol. abs/1706.04954, 2017.
-  Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image super-resolution with sparse prior,” in ICCV, 2015, pp. 370–378.
-  Z. Wang, P. Yi, K. Jiang, J. Jiang, Z. Han, T. Lu, and J. Ma, “Multi-memory convolutional neural network for video super-resolution,” IEEE Trans. Image Process., pp. 1–1, 2018.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in NIPS, 2014, pp. 2672–2680.
-  J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in ECCV, 2016, pp. 694–711.
-  Y. Yuan, S. Liu, J. Zhang, Y. Zhang, C. Dong, and L. Lin, “Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks,” in CVPRW, June 2018.
-  Z.-H. Zhou, J. Wu, and W. Tang, “Ensembling neural networks: many could be better than all,” Artificial intelligence, vol. 137, no. 1-2, pp. 239–263, 2002.
-  Z.-H. Zhou, Ensemble methods: foundations and algorithms. CRC press, 2012.
-  R. Liao, X. Tao, R. Li, Z. Ma, and J. Jia, “Video super-resolution via deep draft-ensemble learning,” in ICCV, 2015, pp. 531–539.
-  L. Wang, Z. Huang, Y. Gong, and C. Pan, “Ensemble based deep networks for image super-resolution,” Pattern Recogn., vol. 68, pp. 191–198, 2017.
-  M. Elad and A. Feuer, “Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images,” IEEE Trans. Image Process., vol. 6, no. 12, pp. 1646–1658, 1997.
-  Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600 –612, 2004.
-  R. Timofte, R. Rothe, and L. Van Gool, “Seven ways to improve example-based single image super resolution,” in CVPR, 2016, pp. 1865–1873.
-  X. Gao, K. Zhang, D. Tao, and X. Li, “Joint learning for single image super-resolution via coupled constraint,” IEEE Trans. Image Process., vol. 21, no. 2, pp. 469–480, 2012.
-  H. Cevikalp and R. Polikar, “Local classifier weighting by quadratic programming,” IEEE Trans. Neural Netw., vol. 19, no. 10, pp. 1832–1838, Oct 2008.
-  R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, L. Zhang, B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee et al., “Ntire 2017 challenge on single image super-resolution: Methods and results,” in CVPRW. IEEE, 2017, pp. 1110–1121.
-  B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” in CVPRW, 2017, pp. 1132–1140.
-  X. Wang and X. Tang, “Hallucinating face by eigentransformation,” IEEE Trans. Syst. Man Cybern. Part C-Appl. Rev., vol. 35, no. 3, pp. 425 –434, 2005.
-  X. Ma, J. Zhang, and C. Qi, “Hallucinating face by position-patch,” Pattern Recogn., vol. 43, no. 6, pp. 2224 – 2236, 2010.
-  J. Jiang, R. Hu, Z. Wang, and Z. Han, “Noise robust face hallucination via locality-constrained representation,” IEEE Trans. Multimedia, vol. 16, no. 5, pp. 1268–1281, 2014.
-  J. Jiang, J. Ma, C. Chen, X. Jiang, and Z. Wang, “Noise robust face image super-resolution through smooth sparse representation,” IEEE Trans. Cybern., vol. 47, no. 11, pp. 3991–4002, 2017.
-  J. Shi and C. Qi, “Kernel-based face hallucination via dual regularization priors,” IEEE Signal Proc. Let., vol. 22, no. 8, pp. 1189–1193, 2015.
-  S. Baker and T. Kanade, “Hallucinating faces,” in Proc. IEEE Conf. on Automatic Face and Gesture (FG), 2000, pp. 83 –88.
-  W. Gao, B. Cao, S. Shan, X. Chen, D. Zhou, X. Zhang, and D. Zhao, “The cas-peal large-scale chinese face database and baseline evaluations,” IEEE Trans. Syst. Man Cybern. Part A-Syst. Hum., vol. 38, no. 1, pp. 149 –161, 2008.
-  X. Wang and X. Tang, “Face photo-sketch synthesis and recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 11, pp. 1955–1967, 2009.
-  Z. Huang, S. Shan, R. Wang, H. Zhang, S. Lao, A. Kuerban, and X. Chen, “A benchmark and comparative study of video-based face recognition on cox face database,” IEEE Trans. Image Process., vol. 24, no. 12, pp. 5967–5981, 2015.
-  M. Grgic, K. Delac, and S. Grgic, “Scface–surveillance cameras face database,” Multimedia Tools Appl., vol. 51, no. 3, pp. 863–879, 2011.
-  S. Yang, M. Wang, Y. Chen, and Y. Sun, “Single-image super-resolution reconstruction via learned geometric dictionaries and clustered sparse coding,” IEEE Trans. Image Process., vol. 21, no. 9, pp. 4016–4028, 2012.