I Introduction
Image Super-Resolution (SR) is a class of image processing techniques that infer a High-Resolution (HR) image from one or a sequence of Low-Resolution (LR) images [1]. It can transcend the limitations of current optical imaging systems, and has been widely applied in medical and remote sensing imaging, digital photography, depth-based 3D reconstruction, and intelligent video surveillance systems [2, 3, 4].
The SR problem is a severely ill-posed inverse problem due to information loss during the image degradation process, e.g., image blurring, aliasing from subsampling, and noise. Reconstructing a visually pleasing HR image from an LR one remains an extremely challenging task. Prior knowledge, such as piecewise smoothness [5, 6, 7], sharp edges [8, 9], textures [10], local/nonlocal similar patterns [11, 12, 13, 14], low-rank constraints [15, 16], and sparse representations under certain transformations [17, 18, 19, 20, 21], has been investigated to regularize the SR reconstruction procedure. Generally speaking, current methods fall into two categories: multi-frame reconstruction approaches and learning-based single image SR approaches.
By making full use of inter-frame complementary information, multi-frame reconstruction based SR approaches take a sequence of LR images of the same scene and fuse them to produce an HR output or a sequence of HR outputs. However, sub-pixel registration is an exceedingly difficult problem and the magnification factor is limited in practice [22]. Learning-based single image SR methods aim at learning the relationship between LR and HR example pairs, and then applying the learned transformation to predict the missing details of an observed LR image. In this paper, we focus on the single image SR problem.
Since the pioneering work of Freeman et al. [23], the single image SR problem has been increasingly studied and has attracted great research interest in recent decades. For example, Chang et al. [24] were the first to introduce locally linear embedding [25] based manifold learning into the SR problem, after which a series of neighbor embedding algorithms were proposed [8, 26, 27, 28, 29]. These methods exploit the local manifold structure of the image patch space well. Yang et al. [17] proposed to use sparse representation to adaptively choose the most relevant neighbors, avoiding the over- or under-fitting of neighbor embedding based methods and obtaining better results [30, 31, 32, 33]. To overcome the inconsistency between the LR and HR spaces, quite a few coupled learning based methods have also been developed recently [34, 35, 36]. In essence, they learn the relationship from one domain/space to another, i.e., from the LR space to the corresponding HR one. The approach of Timofte et al. [37] uses a divide-and-conquer strategy to learn the mapping between LR and HR samples in multiple local neighbor spaces, yielding a fast single image SR method based on Anchored Neighborhood Regression (ANR). To further enhance the quality of the mapping, they combined ANR with the simple function based method [38] and proposed the Adjusted ANR (A+ for short) approach [39]. A+ learns the mapping between LR and HR samples in a much denser sample space, which helps guarantee the performance of local linear regression. In addition to the work of [37, 38, 39], some regression algorithms have also been developed to directly learn the relationship between LR and HR samples in a coarse-to-fine [40, 41], sparse [42, 43, 44], collaborative [45, 46], adaptive [9], local [47, 48], pairwise [49], or structured [50] manner. These algorithms are simple and fast, and characterize the mapping between the LR and HR spaces (especially the local image patch space) well, and thus they produce very favorable performance.
Over the past few years, deep learning, the re-emergence of neural networks, has been used with tremendous success in a multitude of fields, such as self-driving cars, computer vision, speech recognition, and machine translation, and has achieved significant and impressive results [51]. Most recently, this technology has also been introduced to solve the image SR problem by learning the mapping between LR and HR samples in an end-to-end manner [52, 53, 54, 55, 56, 57, 58, 59]. Super-Resolution using Deep Convolutional Networks (SRCNN) [52], Cascade of Sparse Coding based Networks (CSCN) [58], Very Deep Convolutional Networks (VDSR) [54], and Deeply-Recursive Convolutional Networks (DRCN) [55] each carefully design a different network structure to meet the challenge of SR reconstruction. Specifically, SRCNN [52] constructs a network of three convolutional layers, while CSCN [58] cascades sparse coding networks. VDSR [54] uses a deep model of up to 20 weight layers to predict the residual image between the HR and LR images. With this very deep network, it has a large receptive field and takes a large image context into account, thus capturing the image structure well, especially as the scale factor increases. DRCN [55] recursively applies the same convolutional layer as many times as desired without introducing additional parameters for the additional convolutions. To obtain better perceptual quality, a number of photo-realism oriented methods based on Generative Adversarial Networks (GAN) [60] have also been presented recently [61, 62].
However, the aforementioned methods, based on different shallow prior models (local manifold structure prior or sparse prior) or different deep networks, each have their own advantages and capture different image details. Over the years, we have witnessed a constant effort to design better-performing methods for the SR problem. A natural question that arises is whether these methods can be brought into a unifying framework, and whether such a framework benefits the SR task.
One very natural idea is to integrate the outputs of different SR methods (in the following, we call the SR algorithms to be ensembled component super-resolvers) in an ensemble learning framework and produce an output that is better than that of every component super-resolver. Given a number of results obtained by the component super-resolvers, how should they be combined to produce a better result? The most obvious way is to average the outputs of all the component super-resolvers with equal weights. However, ensemble learning theory [63] has shown that it may be better to combine some instead of all of the learners. That is to say, when we know in advance that the performance of one component super-resolver is poor, we can remove it or assign it a relatively small ensemble weight. The remaining question is how to determine whether a component super-resolver is superior or not. In other words, how to determine the ensemble weights is the essential problem in ensemble learning based SR.
In this paper, we contribute a simple but effective Ensemble learning SR algorithm with a Reference dataset, denoted RefESR for short. Our method is inspired by external dataset based models. Unlike previous methods, which learn prior knowledge for the parameters of one statistical model or for the desired HR images, our method directly learns the SR abilities of different methods and uses them to guide the optimization of the ensemble parameters, i.e., the ensemble (or combination) weights. In particular, to estimate the optimal ensemble weights, the proposed RefESR method considers both the reconstruction error deduced from the image degradation model and the ensemble weight prior learned from an additional reference dataset, and formulates them in a Maximum A Posteriori (MAP) framework. Moreover, we introduce a simple method to obtain an analytical solution for the ensemble parameters. Fig. 1 shows the pipeline of the proposed RefESR algorithm. To the best of our knowledge, this is the first work to leverage an additional reference dataset to guide the SR reconstruction. Although many previous works have used an additional dataset to exploit the natural image prior, our proposed method directly leverages a reference dataset to obtain the SR ability (in terms of objective quality) of different component super-resolvers, and applies it to guide the subsequent SR reconstruction. Experimental results demonstrate that our RefESR method outperforms state-of-the-art deep learning based SR methods. Moreover, our method is very general: the best available methods can be fed into our framework as component super-resolvers to further improve the SR performance.
The remainder of this paper is organized as follows. In Section II, we present related work on ensemble learning based SR approaches. Section III introduces the proposed ensemble SR framework and the optimization of its objective function in detail. The experimental results are presented in Section IV. Further analysis and discussion of the proposed ensemble learning framework are presented in Section V. Finally, we conclude this work in Section VI.
II Related Work
In statistics and machine learning, ensemble learning is a powerful way to obtain better performance than could be obtained from any of the component methods alone. It has been widely applied in the fields of data mining and pattern recognition [64]. Although ensemble learning has achieved great success in machine learning problems, it had not been applied to image SR until very recently, when two ensemble learning related SR methods were proposed.
In [65], a video SR method is presented. The authors decompose the video SR task into two stages: generating an ensemble of drafts, and determining the optimal one via a deep convolutional neural network. In essence, they use the deep network to select candidate HR samples in the patch space, which is the general idea of many learning-based SR methods. Though it is termed ensemble-based, it is not strictly an ensemble SR method, because selecting the best samples for the subsequent reconstruction is the basic idea of many learning-based SR methods [23, 24, 17]. The other work is by Wang et al. [66], who introduced ensemble learning into the SR problem and proposed an ensemble based deep network method for image SR. It focuses on one deep learning based SR method, and generates different models by different initializations of one specific neural network. Specifically, they took the sparse coding based network (SCN) [58] as the baseline, and developed an Ensemble based Sparse Coding Network (ESCN) by changing the initialization of SCN [58]. In ESCN, the ensemble weights are adaptively determined by a back-projection model.
The ESCN based SR method achieves better performance than the original SCN method [58]; however, it has two limitations. Firstly, it essentially integrates only one deep learning model, the SCN based neural network, with multiple outputs under different initial conditions. Due to the limited capacity of the same network, the complementary information obtained by only changing the initialization is insufficient, so the improvement of the ensemble result is limited. Secondly, it only considers the reconstruction constraint when determining the ensemble weights, and no other prior is taken into consideration. Their model is therefore ill-posed: many solutions satisfy its objective function. Their experiments also show that the optimal ensemble weights and the average weights obtain almost the same results. Therefore, considering only the reconstruction constraint is not really effective. In contrast, our proposed method ensembles a variety of different methods, including traditional state-of-the-art learning based methods and recent deep learning based methods with different network architectures. Moreover, we introduce a reference dataset to measure the performance of different SR methods, which can be seen as the model prior and is incorporated into our objective function as a regularization term.
III Proposed Method
In this section, we present the proposed RefESR method in detail. We first give the problem definition of RefESR in a Bayesian framework. Then, we show how to model the reconstruction constraint and the prior of the ensemble weights, and derive the objective function of the proposed RefESR method. After that, we describe an analytical way to solve the optimization problem.
III-A Problem Setup
In our proposed ensemble learning based SR method, we can obtain the SR reconstruction results, $\{y_n\}_{n=1}^{N}$, of different methods, $M = \{M_n\}_{n=1}^{N}$, for the observed LR image, x. Here, $M_n$ can be seen as the $n$-th SR model. Given $M$ and x in the ensemble SR framework, our aim is to infer the optimal ensemble weights, $w = [w_1, w_2, \ldots, w_N]^T$, where $w_n$ is associated with the $n$-th SR model $M_n$. After obtaining the optimal ensemble weights, we can predict the HR output of the LR input by
$$y = \sum_{n=1}^{N} w_n y_n. \qquad (1)$$
Under the Bayesian framework, the regularized SR problem is related to a probabilistic model as follows:
$$P(w \mid x, M) = \frac{P(x \mid w, M)\, P(w)}{P(x \mid M)}.$$
Notice that the marginal likelihood, $P(x \mid M)$, does not depend on w. With the observation of x and $M$, the MAP estimate of w can be formulated as
$$w^* = \arg\max_{w} \; \log P(x \mid w, M) + \log P(w), \qquad (2)$$
where the first term is the likelihood term and the second term denotes the prior knowledge of w. By defining the likelihood term and the prior term, we can maximize the objective function (2) to obtain the optimal ensemble weights $w^*$. With the optimal ensemble weights, we can infer the target HR output.
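The weighted combination in Eq. (1) can be illustrated with a minimal sketch. The function name `ensemble`, the flattened-list image representation, and the three toy outputs are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
def ensemble(outputs, weights):
    """Pixel-wise weighted sum of the component outputs: y = sum_n w_n * y_n."""
    if len(outputs) != len(weights):
        raise ValueError("one weight per component output is required")
    n_pixels = len(outputs[0])
    return [
        sum(w * y[p] for w, y in zip(weights, outputs))
        for p in range(n_pixels)
    ]


# Three hypothetical component outputs for a 4-pixel image (flattened).
y1 = [10.0, 20.0, 30.0, 40.0]
y2 = [12.0, 18.0, 33.0, 39.0]
y3 = [8.0, 22.0, 27.0, 41.0]
fused = ensemble([y1, y2, y3], [0.5, 0.25, 0.25])
```

With weights that sum to one, every fused pixel stays within the range spanned by the component outputs at that pixel as long as the weights are non-negative.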
III-B Reconstruction Constraint Modeling
For the single image SR problem, the relationship between the HR image y and the LR one x can be modeled by the observation model [67]:
$$x = DBy + v. \qquad (3)$$
Here, the matrix B denotes the blurring operator, the matrix D the downsampling operator, and v the additive Gaussian white noise. If we use the matrix $H = DB$ to denote the combined blurring and downsampling processes (i.e., H stands for the degradation operations), (3) can be rewritten as [1]:
$$x = Hy + v. \qquad (4)$$
Since the matrix H has far fewer rows than columns, Eq. (4) is ill-posed and has an infinite number of solutions. Therefore, in order to recover a reasonable HR image, SR approaches typically try to find and model appropriate prior knowledge of natural images. For example, the gradient prior, the self-similarity property (salient features repeat across different scales within an image), and coupled LR/HR patch based algorithms have been used to model the prior for the inverse recovery problem.
Developing sophisticated image priors has been the focus of much single image SR research in the past decade. In contrast, the reconstruction constraint, which states that the degraded HR image should be equal to the observed LR image, has received relatively little attention. Some algorithms do not enforce x = Hy at all. The representative ANR [37], A+ [39], and recently proposed deep learning based methods [52, 53, 54, 55] all ignore this reconstruction constraint.
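To make the reconstruction constraint concrete, the sketch below checks how well an HR estimate agrees with an LR observation after degradation. The box-average operator in `degrade` is an assumed, simplified stand-in for H (the experiments in this paper use Bicubic downsampling instead), and both function names are hypothetical.

```python
def degrade(hr, scale):
    """Box-average each scale x scale block of a 2-D list: a simple linear stand-in for H."""
    rows, cols = len(hr) // scale, len(hr[0]) // scale
    return [
        [
            sum(hr[r * scale + i][c * scale + j]
                for i in range(scale) for j in range(scale)) / scale ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def reconstruction_error(lr, hr_estimate, scale):
    """Squared residual ||x - H y||^2 between the LR input and the degraded HR estimate."""
    degraded = degrade(hr_estimate, scale)
    return sum(
        (a - b) ** 2
        for row_x, row_d in zip(lr, degraded)
        for a, b in zip(row_x, row_d)
    )


hr = [[1.0, 3.0, 5.0, 7.0],
      [1.0, 3.0, 5.0, 7.0],
      [2.0, 2.0, 6.0, 6.0],
      [2.0, 2.0, 6.0, 6.0]]
lr = degrade(hr, 2)
flat_guess = [[4.0] * 4 for _ in range(4)]
```

Here `reconstruction_error(lr, hr, 2)` is zero because `lr` was generated from `hr`, while the flat guess violates the constraint and yields a positive residual.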
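To this end, in our ensemble learning based SR framework, we introduce this reconstruction constraint into our objective function. Specifically, we require that the blurred and downsampled HR ensemble output approximately equal the LR input image. We assume that the difference between the degraded ensemble HR output and the LR input image, i.e., the reconstruction error, obeys a Gaussian distribution; thus the likelihood term can be written as follows:
$$P(x \mid w, M) = \frac{1}{Z} \exp\left(-\frac{\left\lVert x - H\sum_{n=1}^{N} w_n y_n \right\rVert_2^2}{2\sigma^2}\right), \qquad (5)$$
where $y_n$ is the output of the $n$-th component super-resolver, $w_n$ is its ensemble weight, $\sigma$ denotes the standard deviation of the noise, and $Z$ is a normalization constant.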
III-C Prior Modeling of Ensemble Weights
The aforementioned reconstruction constraint can be seen as an input-specific regularization of the ensemble weights for an observed LR image. In this subsection, we propose to further regularize the ensemble weights by defining a prior on them, thus overcoming the ill-posedness of estimating the weights from Eq. (5) alone.
In practice, the performance of a component super-resolver on a given input is unknown. However, we can obtain the SR results of the component super-resolvers on a reference dataset, which can be used to approximate their performance. Specifically, we introduce an additional reference dataset and test the performance of the component super-resolvers on it. Their reconstruction quality scores are then obtained by combining their performance at different magnification factors, e.g., 2, 3, and 4 in our experiments.
We denote by $\mathrm{PSNR}_n^s$ and $\mathrm{SSIM}_n^s$ the mean Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) [68] results of the $n$-th component super-resolver at scale $s$, respectively. It is worth mentioning that more measurements can be incorporated to obtain the performance score. Our basic assumption is that a method obtaining better performance on the reference dataset should get a relatively larger weight when reconstructing the HR output of an LR input in the ensemble framework. Fig. 2 shows the process of obtaining the ensemble weight prior.
Therefore, given the performance of the component super-resolvers on the reference dataset, we define the $n$-th element of the reference weight vector $\bar{w}$ as follows,
(6)
where $\delta$ is the bandwidth parameter, $s_n$ is the performance score of the $n$-th component super-resolver, and $s^* = \max_n s_n$ is the best performance among the component super-resolvers. The numerator represents the performance similarity between the $n$-th component super-resolver and the best method, while the denominator is a normalization constant that guarantees that the elements of $\bar{w}$ sum to one. The bandwidth parameter $\delta$ is crucial for the subsequent SR task: very large or very small values are detrimental to the final result. As shown in Fig. 3, when the value of $\delta$ is too small, the best component super-resolver dominates the SR reconstruction, i.e., its weight is close to 1 while the weights of the other component super-resolvers are almost 0. In contrast, when the value of $\delta$ is too large, all the component super-resolvers contribute equally to the SR reconstruction, i.e., they are all assigned the same weight. For a more detailed analysis, please refer to the experimental section.
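The effect of the bandwidth parameter can be reproduced with a small sketch. The Gaussian kernel on the gap to the best score is an assumption chosen only because it matches the behaviour described above (a similarity to the best method, normalized to sum to one); the function name `reference_weights` and the score values are likewise hypothetical.

```python
import math


def reference_weights(scores, bandwidth):
    """Kernel-normalised weights; the Gaussian kernel on the score gap is an assumption."""
    best = max(scores)
    kernel = [math.exp(-((best - s) ** 2) / (2.0 * bandwidth ** 2)) for s in scores]
    total = sum(kernel)
    return [k / total for k in kernel]


scores = [29.13, 29.63, 29.78, 29.77]       # hypothetical reference-set scores
sharp = reference_weights(scores, 0.001)    # tiny bandwidth: best method dominates
flat = reference_weights(scores, 1000.0)    # huge bandwidth: nearly uniform weights
```

With a tiny bandwidth almost all of the mass falls on the best-scoring method, and with a huge bandwidth the weights approach simple averaging, which mirrors the two extremes discussed for Fig. 3.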
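Note that $\bar{w}$ denotes the prior weights learned from the reference dataset. Our aim is to obtain an input-specific ensemble weight vector w that does not differ too much from $\bar{w}$. Thus, we define the prior probability of w by a Gaussian model due to its simplicity:
$$P(w) = \frac{1}{Z_w} \exp\left(-\frac{\lVert w - \bar{w} \rVert_2^2}{2\sigma_w^2}\right), \qquad (7)$$
where $\sigma_w$ is a scale parameter for the prior distribution of the ensemble weights w, and $Z_w$ is a normalization constant.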
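III-D Objective Function
By substituting Eq. (5) and Eq. (7) into Eq. (2) and dropping some constant terms, we have
$$w^* = \arg\min_{w} \left\lVert x - H\sum_{n=1}^{N} w_n y_n \right\rVert_2^2 + \lambda \lVert w - \bar{w} \rVert_2^2. \qquad (8)$$
The first term is the reconstruction error, while the second is the difference between the pre-learned weight vector $\bar{w}$ and the weight vector w to be estimated. The regularization parameter $\lambda$ is related to the noise standard deviation $\sigma$ and the prior scale $\sigma_w$ by $\lambda = \sigma^2 / \sigma_w^2$, and is used to balance the contributions of the reconstruction error and the prior knowledge of w.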
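The step from the MAP estimate to the regularized least-squares objective can be written out explicitly. The following is a sketch under the stated Gaussian assumptions, with noise standard deviation $\sigma$, prior scale $\sigma_w$, reference weights $\bar{w}$, and the shorthand $\hat{y}(w) = \sum_n w_n y_n$; the symbol names are chosen here for illustration.

```latex
\begin{aligned}
-\log\big[P(x \mid w, M)\,P(w)\big]
  &= \frac{\lVert x - H\hat{y}(w)\rVert_2^2}{2\sigma^2}
   + \frac{\lVert w - \bar{w}\rVert_2^2}{2\sigma_w^2} + \text{const},\\
2\sigma^2 \cdot \Big(-\log\big[P(x \mid w, M)\,P(w)\big]\Big)
  &= \lVert x - H\hat{y}(w)\rVert_2^2
   + \lambda\,\lVert w - \bar{w}\rVert_2^2 + \text{const},
  \qquad \lambda = \frac{\sigma^2}{\sigma_w^2}.
\end{aligned}
```

Maximizing the posterior is therefore equivalent to minimizing the right-hand side, and a noisier observation (larger $\sigma$) or a tighter prior (smaller $\sigma_w$) both increase the influence of the reference weights.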
In order to make the ensemble SR results interpretable, we incorporate a sum-to-one constraint into the objective function. Thus, we have
$$w^* = \arg\min_{w} \left\lVert x - H\sum_{n=1}^{N} w_n y_n \right\rVert_2^2 + \lambda \lVert w - \bar{w} \rVert_2^2, \quad \text{s.t.} \; \sum_{n=1}^{N} w_n = 1. \qquad (9)$$
To obtain an optimal ensemble weight vector, we simultaneously take into consideration the input-dependent reconstruction constraint and the prior of the ensemble methods learned from a reference dataset. The first term can be seen as a global reconstruction constraint, which enforces consistency between the degraded HR estimate and the input LR image. For patch based SR methods [24, 37, 39], the averaged and fused HR estimate may not perfectly satisfy the global reconstruction constraint [17, 70]. In other words, these patch based SR methods reconstruct the HR image locally (patch-wise) and ignore the global information. By adding this global reconstruction constraint, our method encourages the degraded HR image (Hy) to agree with the observed LR image (x), and thus captures more information about the global structure of the target HR image. Therefore, the proposed ensemble model avoids both the lack of flexibility caused by the absence of a data-dependent reconstruction constraint and the non-uniqueness of the solution caused by ill-posedness.
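III-E Optimization
Since the blurring and downsampling processes are linear operators, we have
$$H\sum_{n=1}^{N} w_n y_n = \sum_{n=1}^{N} w_n H y_n = \tilde{Y} w, \qquad \tilde{Y} = [Hy_1, Hy_2, \ldots, Hy_N]. \qquad (10)$$
Each column of $\tilde{Y}$ denotes one downsampled HR output, $Hy_n$. By substituting Eq. (10) into Eq. (9), the objective function can be rewritten in the following matrix form,
$$w^* = \arg\min_{w} \lVert x - \tilde{Y} w \rVert_2^2 + \lambda \lVert w - \bar{w} \rVert_2^2, \quad \text{s.t.} \; \mathbf{1}^T w = 1, \qquad (11)$$
where $\bar{w}$ is the reference weight vector and $\lambda$ is the regularization parameter. Eq. (11) is a constrained linear least squares problem. Under the sum-to-one constraint, $x = x\mathbf{1}^T w$, so the reconstruction error equals $w^T G w$. Following the work of [25], we first define a local Gram matrix G for x,
$$G = (x\mathbf{1}^T - \tilde{Y})^T (x\mathbf{1}^T - \tilde{Y}), \qquad (13)$$
where 1 is a column vector of ones. Then, the problem (11) has the following analytical solution:
$$w^* = (G + \lambda I)^{-1} (\lambda \bar{w} + \mu \mathbf{1}), \qquad \mu = \frac{1 - \lambda\, \mathbf{1}^T (G + \lambda I)^{-1} \bar{w}}{\mathbf{1}^T (G + \lambda I)^{-1} \mathbf{1}}, \qquad (14)$$
where I is the identity matrix and $\mu$ is the Lagrange multiplier that enforces the sum-to-one constraint.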
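The closed-form solution of a sum-to-one constrained least-squares problem with a quadratic prior toward reference weights can be sketched in a few lines. This is an illustrative derivation via a Lagrange multiplier, not the authors' code: the names `solve`, `ensemble_weights`, `degraded`, `w_ref`, and `lam` are hypothetical, images are flattened lists, and `lam` is assumed positive so that the linear system is well conditioned.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ensemble_weights(x, degraded, w_ref, lam):
    """Minimise ||x - sum_n w_n d_n||^2 + lam * ||w - w_ref||^2 subject to sum(w) = 1."""
    n = len(degraded)
    # Residual of each degraded component output against the LR input.
    diff = [[xp - dp for xp, dp in zip(x, d)] for d in degraded]
    gram = [[sum(p * q for p, q in zip(diff[i], diff[j])) for j in range(n)]
            for i in range(n)]
    a = [[gram[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    u = solve(a, w_ref)            # (G + lam I)^{-1} w_ref
    v = solve(a, [1.0] * n)        # (G + lam I)^{-1} 1
    mu = (1.0 - lam * sum(u)) / sum(v)   # multiplier enforcing sum(w) = 1
    return [lam * ui + mu * vi for ui, vi in zip(u, v)]


x = [2.0, 6.0]                         # toy LR observation
degraded = [[2.0, 6.0], [3.0, 5.0]]    # toy degraded outputs of two components
w = ensemble_weights(x, degraded, [0.5, 0.5], 0.1)
```

In this toy case the first component matches the observation exactly, so it receives most of the weight even though the reference weights are equal; increasing `lam` pulls the result back toward `w_ref`.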
Upon acquiring the optimal ensemble weights $w^*$, we can simply combine the results of the component super-resolvers with $w^*$ through Eq. (1). It is worth noting that the objective functions of Cevikalp et al. [71] and of our proposed method are both essentially constrained least squares problems, as in [25]. The work of [71] tries to obtain the optimal combination weights of different classifiers to achieve the best classification performance, while our method focuses on the image SR problem, and tries to obtain the optimal combination (ensemble) weights under the global reconstruction constraint as well as the weight prior. In this sense, the two are different, although they use the same optimization method to solve their objective functions. In fact, in the field of image processing and computer vision, the objective function of many methods is very simple, i.e., a constrained least squares problem. The difference lies in that different methods use different constraints (prior knowledge) to regularize the solutions. How to find good prior knowledge and how to model it effectively is the key to the success of an algorithm. The novelty of the proposed method is the introduction of a reference dataset and its use to produce prior knowledge that regularizes the combination (ensemble) weights.
Dataset  SET14

Scale  ×2  ×2  ×3  ×3  ×4  ×4
Metric  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM 
Bicubic  30.24  0.8688  27.55  0.7742  26.00  0.7027 
Kim [31]  32.14  0.9032  28.96  0.8144  27.18  0.744 
SelfExSR [53]  32.22  0.9034  29.16  0.8196  27.40  0.7518 
A+ [39]  32.28  0.9056  29.13  0.8188  27.32  0.7491 
IA [69]  32.83  0.9110  29.63  0.8296  27.85  0.7643 
SRCNN [52]  32.42  0.9063  29.28  0.8209  27.49  0.7503 
CSCN [58]  32.56  0.9074  29.41  0.8238  27.64  0.7587 
CSCNMV [58]  32.80  0.9101  29.57  0.8263  27.81  0.7619 
VDSR [54]  33.03  0.9124  29.78  0.8314  28.01  0.7674 
DRCN [55]  33.04  0.9118  29.77  0.8312  28.02  0.7570 
ESCN [66]  32.67  0.9093  29.51  0.8264  27.75  0.7611 
RefESR  33.16  0.9134  29.90  0.8338  28.14  0.7702 
Dataset  SET5  

Scale  ×2  ×2  ×3  ×3  ×4  ×4
Metric  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM 
Bicubic  33.66  0.9299  30.39  0.8682  28.42  0.8104 
Kim [31]  36.24  0.9518  32.3  0.9032  30.07  0.8542 
SelfExSR [53]  36.49  0.9537  32.58  0.9093  30.31  0.8619 
A+ [39]  36.54  0.9544  32.59  0.9088  30.28  0.8603 
IA [69]  37.37  0.9582  33.43  0.9186  31.05  0.8764 
SRCNN [52]  36.66  0.9542  32.58  0.9093  30.86  0.8732 
CSCN [58]  36.93  0.9552  33.10  0.9144  30.86  0.8732 
CSCNMV [58]  37.21  0.9571  33.34  0.9173  31.14  0.8189 
VDSR [54]  37.53  0.9587  33.66  0.9213  31.35  0.8838 
DRCN [55]  37.63  0.9588  33.82  0.9226  31.53  0.8854 
ESCN [66]  37.14  0.9571  33.28  0.9173  31.02  0.8774 
RefESR  37.71  0.9593  33.87  0.9224  31.55  0.8848 
Dataset  Urban100  

Scale  ×2  ×2  ×3  ×3  ×4  ×4
Metric  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM 
Bicubic  26.88  0.8403  24.46  0.7349  23.14  0.6577 
Kim [31]  28.71  0.8942  25.24  0.7761  23.53  0.6790 
SelfExSR [53]  29.54  0.8967  26.44  0.8088  24.79  0.7374 
A+ [39]  29.20  0.8938  26.03  0.7973  24.32  0.7183 
IA [69]  29.93  0.9077  26.71  0.8106  24.93  0.7416 
SRCNN [52]  29.50  0.8946  26.24  0.7989  24.52  0.7221 
CSCN [58]  29.14  0.8988  25.58  0.7858  23.80  0.6924 
CSCNMV [58]  29.30  0.9015  25.70  0.7903  23.91  0.6984 
VDSR [54]  30.76  0.9140  27.14  0.8279  25.18  0.7524 
DRCN [55]  30.75  0.9133  27.15  0.8276  25.14  0.7510 
ESCN [66]  29.25  0.8986  25.72  0.7912  23.99  0.6975 
RefESR  30.88  0.9150  27.26  0.8285  25.28  0.7529 
IV Experimental Results
In this section, we present the experimental settings used to evaluate the proposed RefESR approach and show the reconstruction results generated by carrying out SR experiments on three public general image databases and some face image databases.
IV-A Experimental Setup
Database. To test the performance, we use three commonly used image sets, SET5, SET14, and Urban100, as the testing images (see footnote 1). Like many state-of-the-art single image SR methods [53, 39, 69, 52, 58, 54, 55], in our experiments the original HR images are degraded by Bicubic interpolation (i.e., the imresize function in Matlab) with factors of 2, 3, and 4 to generate the corresponding LR images. It should be noted that if the image degradation process of the input LR image is unknown, which can be seen as blind image SR, the performance of our method will drop sharply because of the mismatch between the true image degradation and the simulated image degradation of the training dataset [72].
Footnote 1: SET14 includes 14 different scenes and was first used by Zeyde et al. [32] to show their results. SET5 includes 5 different scenes and was used by Bevilacqua et al. [26]. Urban100 was created by Huang et al. [53] and contains 100 HR images with a variety of real-world structures, such as urban, city, and architecture scenes. The length and width of the original HR images in the SET14 (the first column), SET5 (the second column), and Urban100 (the last two columns) databases all range from 200 pixels to 600 pixels.
Note that there should be some contextual connection between the images in the reference set and the images in the test set. This has been confirmed by many domain-specific image SR methods, e.g., face hallucination and text SR. When super-resolving LR face images, a good general image SR method trained on diverse general images is usually worse than a domain-specific face image SR method trained on face images. In this paper, we consider only the general image SR problem, so the reference dataset should be as diverse as possible.
Implementation Details. To ensemble different component super-resolvers, we first select some state-of-the-art SR algorithms, comprising four non-deep-learning methods, i.e., Kim [31], SelfExSR [53], A+ [39], and IA [69], and five deep learning based methods, i.e., SRCNN [52], CSCN [58], CSCNMV [58], VDSR [54], and DRCN [55] (see footnote 2). Then we test their performance on the reference dataset. Because we know the ground truth of the LR images in the reference dataset, the SR abilities of these algorithms can be measured by objective metrics, such as PSNR, SSIM, or their combination. The reference weight vector (calculated by Eq. (6)) is then applied to regularize the optimization of the ensemble weights.
Footnote 2: We select these nine methods for their representativeness, their good performance, and the public availability of their source code.
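As an illustration of how an objective score can be computed on a reference image, the sketch below implements the standard PSNR definition. The function name `psnr`, the flattened-list representation, and the 8-bit peak value of 255 are assumptions; evaluation protocols in the SR literature often add details such as luminance-only evaluation or border cropping, which are omitted here.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized flattened images."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf   # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


ground_truth = [100.0, 100.0, 100.0, 100.0]
sr_output = [110.0, 110.0, 110.0, 110.0]
score = psnr(ground_truth, sr_output)
```

Averaging such scores over all images of the reference dataset gives one performance number per component super-resolver and scale.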
In the testing phase, we first reconstruct the HR images with the above-mentioned component super-resolvers. Then, the optimal ensemble weights are obtained by Eq. (14). The final HR output is constructed by combining the HR results of the different component super-resolvers with the optimal ensemble weights.
IV-B Parameter Analysis
In this subsection, we analyze the effect of the model parameters on the performance of RefESR, and validate the reconstruction constraint and the reference ensemble weight prior used in the proposed method. In particular, we conduct experiments on the SET14 testing set with a magnification factor of 3. Similar conclusions can be drawn for the other cases, so we do not show them one by one. From the objective function (9) of our method, we see that the bandwidth parameter $\delta$ and the regularization parameter $\lambda$ have a great impact on the performance of the algorithm.
Method  PSNR  SSIM 

Best Component SuperResolver  29.78  0.8314 
Without Reconstruction Constraint  29.89  0.8337 
Without Weights Prior  29.70  0.8304 
Ensemble Via Averaging  29.71  0.8301 
The Proposed Method  29.90  0.8338 
Fig. 4 and Fig. 5 show the performance of our method as one parameter varies while the other is set to its optimal value. From Fig. 4, we can draw at least the following two conclusions: (i) The ensemble SR reconstruction is effective. This can be concluded by comparing the results for a very small and a moderate value of the bandwidth parameter $\delta$. When $\delta$ is very small, almost only the best component super-resolver is active (the 8th method, i.e., VDSR [54]; please refer to the top-left of Fig. 3), while for a moderate $\delta$, only three component super-resolvers (the 4th, 8th, and 9th methods, i.e., IA [69], VDSR [54], and DRCN [55]) contribute to the final result. (ii) The prior knowledge learned from the reference dataset is effective. This can be concluded by comparing the results for a moderate and a very large value of $\delta$. When $\delta$ is very large, all the component super-resolvers are treated equally, i.e., the ensemble weights are set to the same value (please refer to the bottom-right of Fig. 3), and the performance is worse. This is because poor component super-resolvers with unreasonable reconstruction results pull down the overall reconstruction performance.
From Fig. 5, we can see that the performance first increases with the value of the regularization parameter $\lambda$, and then slightly decreases. This indicates that the prior knowledge of the reference ensemble weights is very effective for the SR reconstruction. When $\lambda = 0$, the model reduces to the case of considering only the reconstruction constraint. There is a 0.2 dB gain of the proposed method over the method that neglects the prior knowledge of the reference ensemble weights. The decrease at large values of $\lambda$ is caused by over-emphasizing the prior knowledge of the reference ensemble weights while neglecting the reconstruction constraint. This verifies our motivation of simultaneously taking into consideration the reconstruction constraint (which favors the degradation model) and the prior knowledge generated from the reference dataset.
For convenience of comparison, Table IV tabulates the performance of the above-mentioned cases: RefESR without the reconstruction constraint, RefESR without the weights prior, ensemble via averaging, and the proposed RefESR method. In the second row, we also list the performance of the best component super-resolver, i.e., VDSR [54]. Using only the reconstruction constraint (i.e., RefESR without the weights prior) and averaging based ensemble obtain similar results, which is consistent with Wang et al.’s results (see Table 4 in [66]). This also shows that considering the reconstruction constraint alone is not enough. Comparing RefESR without the reconstruction constraint and RefESR without the weights prior indicates that the ensemble weight prior is effective and relatively more important than the reconstruction constraint. This is mainly because the component super-resolvers used in our experiments are very competitive and have very good SR performance, so their outputs already essentially satisfy the reconstruction constraint. By incorporating the prior knowledge of the ensemble weights, our method obtains a gain of 0.2 dB. Because image SR is a very active topic that serves as a test bed for many emerging models and algorithms, and superior methods are constantly being presented, it is very difficult for a new method to obtain a very large gain over previous ones. From Table IV, we can also see that simply averaging the results of all methods sacrifices the final ensemble performance, e.g., a 0.07 dB decrease compared with VDSR [54]. This once again shows the effectiveness of adaptively assigning different ensemble weights to different component super-resolvers.
Under the above optimal settings of $\delta$ and $\lambda$, we examine the final ensemble weights of the different testing images in SET14. As shown in Fig. 6, the three best component super-resolvers, IA [69], VDSR [54], and DRCN [55], dominate the SR reconstruction. The better the SR performance on the reference dataset, the larger the ensemble weight. This verifies our assumption that a method with better performance on the reference dataset should get a relatively larger weight when reconstructing the HR output of an LR input in the ensemble framework. Moreover, the results also show that some low-quality component super-resolvers do not contribute substantially to the results. When we select only the three methods that play dominant roles (i.e., those with relatively large ensemble weights), we find that this has little impact on the final performance of the proposed algorithm. This is consistent with the observation that it may be better to combine some instead of all of the component super-resolvers.
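Keeping only the dominant component super-resolvers can be sketched as a simple pruning step. Zeroing the small weights and renormalizing the rest is an assumption made here for illustration (an alternative would be to re-solve the weight optimization with only the selected components); the function name `keep_top_k` and the weight values are hypothetical.

```python
def keep_top_k(weights, k):
    """Zero all but the k largest weights and rescale the rest to sum to one."""
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    total = sum(weights[i] for i in top)
    return [weights[i] / total if i in top else 0.0 for i in range(len(weights))]


# Hypothetical ensemble weights for nine component super-resolvers.
weights = [0.01, 0.02, 0.02, 0.20, 0.02, 0.03, 0.05, 0.35, 0.30]
pruned = keep_top_k(weights, 3)
```

In this example the 4th, 8th, and 9th entries survive, matching the situation in which three strong methods carry almost all of the ensemble weight.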
IV-C Comparison with State-of-the-Art
To verify the effectiveness of the proposed RefESR method, we provide quantitative and qualitative comparisons with the eight component super-resolvers and Wang et al.’s ESCN method [66] over SET5, SET14, and Urban100 for different upscaling factors. We also include Bicubic interpolation, which can be seen as the baseline. Table I, Table II, and Table III show the PSNR and SSIM of the ten comparison methods and our RefESR method. All values in the tables are averages over all the images within a dataset. From the results, we learn that our method outperforms almost all existing methods, including the most competitive deep learning based methods, on all datasets and scale factors in terms of PSNR. Only in two situations is our RefESR method slightly worse than DRCN [55], in terms of SSIM. The visual comparisons of three typical images are shown in Fig. 7, Fig. 8, and Fig. 9. To make the comparison clearer, we also magnify a local region (marked by red boxes) of each result. Our method produces relatively sharper boundaries and is free of ringing artifacts. This can be explained by the following two reasons: (i) Through the ensemble strategy, it is possible to emphasize the strengths of the approaches with superior performance while suppressing the weaknesses of the approaches with poor performance. (ii) The reconstruction artifacts, e.g., ringing artifacts, of one component super-resolver can be weakened by fusing multiple results. However, if all methods produce ringing artifacts in the same region, the ensemble result cannot eliminate them.
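For readers who wish to reproduce dataset-level numbers of this kind, the following is a minimal sketch of PSNR averaged over a dataset. The evaluation protocol details (e.g., color channel used or border cropping) are not stated in this section, so the sketch simply compares full arrays and assumes a peak value of 255; the function names are hypothetical.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def average_psnr(pairs, peak=255.0):
    """Mean PSNR over an iterable of (ground truth, reconstruction) pairs."""
    return float(np.mean([psnr(r, e, peak) for r, e in pairs]))
```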
Image SR is a very active topic and has become a test bed for many emerging models and algorithms, especially the recently popular deep learning techniques. New algorithms are released on arXiv almost every few days. While this paper was being prepared, a series of deep learning based SR algorithms were released and achieved very good performance. To further demonstrate the effectiveness of the proposed ensemble learning framework, we additionally ensemble the most competitive method, EDSR [73], with the aforementioned nine component super-resolvers. Table V shows the results of EDSR and the proposed RefESR. In addition, we give the results of combining the geometric ensemble strategy with the proposed ensemble strategy, which is denoted as RefE2SR. From these results, we observe that: (i) Although the performance of EDSR is already very good, the proposed ensemble framework can still improve the final results, which shows that EDSR and the other methods still have a certain degree of complementarity. (ii) RefE2SR is better than RefESR. This can be explained as follows: when the geometric ensemble strategy is applied to the component super-resolvers, their performance improves, and with these improved SR results our proposed method can further raise the overall performance of the combined ensemble strategy.
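The geometric ensemble strategy can be sketched as below, in the form commonly used with EDSR-style models: the LR input is flipped and rotated, each version is super-resolved, the outputs are mapped back, and the results are averaged. Whether the experiments use exactly these eight transforms is an assumption; `super_resolve` stands for any component super-resolver and is a hypothetical callable.

```python
import numpy as np

def geometric_self_ensemble(lr, super_resolve):
    """Average the SR outputs over 8 flip/rotation variants of the LR input.

    lr            : array of shape (H, W) or (H, W, C)
    super_resolve : callable mapping an LR array to an HR array (hypothetical)
    """
    outputs = []
    for flip in (False, True):
        for k in range(4):
            x = np.fliplr(lr) if flip else lr
            x = np.rot90(x, k)                # forward: flip, then rotate
            y = super_resolve(x)
            y = np.rot90(y, -k)               # inverse: rotate back, then unflip
            y = np.fliplr(y) if flip else y
            outputs.append(y)
    return np.mean(outputs, axis=0)
```

For a super-resolver that is already equivariant to flips and rotations, this returns the same output as a single pass; any gain comes from models whose outputs differ across the transformed inputs.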
Table V: PSNR (dB) and SSIM of EDSR [73], RefESR, and RefE2SR. Each PSNR/SSIM column pair corresponds to one scale.

Method       PSNR   SSIM    PSNR   SSIM    PSNR   SSIM
Dataset: SET14
EDSR [73]    33.68  0.9172  30.34  0.8434  28.66  0.7845
RefESR       33.85  0.9194  30.45  0.8454  28.75  0.7862
RefE2SR      33.95  0.9203  30.61  0.8470  28.91  0.7873
Dataset: SET5
EDSR [73]    38.11  0.9601  34.64  0.9282  32.46  0.8968
RefESR       38.16  0.9607  34.66  0.9285  32.48  0.8970
RefE2SR      38.26  0.9611  34.92  0.9299  32.77  0.8996
Dataset: Urban100
EDSR [73]    32.93  0.9351  28.80  0.8653  26.64  0.9033
RefESR       33.02  0.9373  28.89  0.8672  26.73  0.9039
RefE2SR      33.19  0.9378  29.03  0.8698  26.96  0.9086
Face image SR results: average PSNR (dB) and SSIM. Each PSNR/SSIM column pair corresponds to one noise level; the Gains row gives the improvement of RefESR over the second best method.

Methods      PSNR   SSIM    PSNR   SSIM    PSNR   SSIM
Wang [74]    27.73  0.7642  27.93  0.7564  27.01  0.7251
NE [24]      30.73  0.8587  29.19  0.8065  27.92  0.7682
LSR [75]     32.12  0.8969  28.70  0.7469  24.44  0.5269
SR [17]      32.21  0.8983  28.37  0.7238  23.96  0.4903
LcR [76]     32.23  0.8981  30.09  0.8275  30.29  0.8449
SSR [77]     32.34  0.8992  29.82  0.8445  28.56  0.8022
DRP [78]     32.60  0.9213  27.79  0.7102  23.21  0.4585
RefESR       33.13  0.9252  30.67  0.8500  30.98  0.8624
Gains        0.53   0.0039  0.58   0.0055  0.69   0.0175
IV-D Ensemble Super-Resolution Results with Face Images
In order to verify the universality of our proposed ensemble framework, we test the proposed RefESR method on the task of face image SR, a.k.a. face hallucination [79]. As before, the performance of the different face SR algorithms is learned through a reference set, i.e., their ensemble weights are estimated, and then the reconstruction results of the different algorithms on newly observed LR faces are integrated based on the estimated weights.
The reference face dataset consists of 600 images of 600 subjects, of which 200 subjects are from the CAS-PEAL-R1 face database [80], 100 are from the CUHK face database [81], 200 are from the COX-S2V face database [82], and 100 are from the SCface face database [83]. To evaluate the performance of the component super-resolvers, we additionally collect 20, 10, 20, and 10 face images from these four databases, respectively, to form the evaluation dataset. For testing, we capture 42 High-Definition (HD) images, whose faces are very different from those in the reference face dataset. Some example images are shown in Fig. 10. In our experiments, the component super-resolvers for face images include Wang et al.’s eigentransformation method [74], neighbor embedding (NE) [24], least squares representation (LSR) [75], sparse representation (SR) [17], locality-constrained representation (LcR) [76], smooth sparse representation (SSR) [77], and dual regularization prior (DRP) [78].
Similarly, we apply the reference face dataset to train the component super-resolvers and use the evaluation dataset to obtain their performance in terms of PSNR and SSIM. The reference weight vector can then be calculated according to Eq. (6). Based on the prior knowledge of w, we can obtain the optimal ensemble weight vector for each input LR face image by Eq. (9). In addition, we conduct experiments to test the robustness of our method when the input is contaminated by noise. Our intuition is that, given a noisy input, if the images generated by the different algorithms are not optimal (and may contain noise), then the noise can be smoothed through the fusion of the different results.
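The smoothing intuition can be made concrete under an idealized assumption that is not stated in the source: each component output equals the clean HR image plus zero-mean noise of variance \sigma^2 that is independent across the K methods. The symbols below (w_k for the ensemble weights, n_k for the residual noise of method k) are hypothetical notation introduced only for this illustration.

```latex
\hat{x} = \sum_{k=1}^{K} w_k \hat{x}_k, \qquad
\hat{x}_k = x + n_k, \qquad \sum_{k=1}^{K} w_k = 1
\;\;\Longrightarrow\;\;
\hat{x} = x + \sum_{k=1}^{K} w_k n_k, \qquad
\mathrm{Var}\!\Big(\sum_{k=1}^{K} w_k n_k\Big) = \sigma^2 \sum_{k=1}^{K} w_k^2 .
```

With uniform weights w_k = 1/K the residual variance is \sigma^2 / K, and for any nonnegative weights summing to one it does not exceed \sigma^2. In practice the residual noise of different methods is correlated, because all methods process the same noisy input, so the actual reduction is expected to be smaller than this idealized bound.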
The face SR results table tabulates the performance (in terms of average PSNR and SSIM) of the different component super-resolvers and the proposed RefESR method under different noise levels. We learn that RefESR achieves the best average PSNR and SSIM results. The gains of the proposed method over the second best method are obvious, exceeding 0.5 dB in terms of PSNR. In addition, we observe that the advantage of the proposed method becomes more pronounced as the noise increases. In particular, when the input is noiseless, the PSNR gain of the proposed method over the second best method is 0.53 dB. When the input is contaminated by noise, the gains are 0.58 dB and 0.69 dB for the two noise levels, respectively. We attribute this to the advantages of ensemble learning, which can reduce the uncertainties caused by noise in the different methods. Fig. 11 shows some visual comparison results of the component super-resolvers and the proposed method. From these results, we observe that the proposed RefESR method can remove most of the noise while preserving the main structural information well.
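The reported PSNR gains can be checked directly against the face SR table. The short sketch below uses the PSNR columns transcribed from that table (one value per noise level, noiseless first) and reports, for each level, which component is second best and the gain of RefESR over it; the function name is hypothetical.

```python
# PSNR (dB) of each component super-resolver at the three noise levels,
# transcribed from the PSNR columns of the face SR table.
component_psnr = {
    "Wang [74]": (27.73, 27.93, 27.01),
    "NE [24]": (30.73, 29.19, 27.92),
    "LSR [75]": (32.12, 28.70, 24.44),
    "SR [17]": (32.21, 28.37, 23.96),
    "LcR [76]": (32.23, 30.09, 30.29),
    "SSR [77]": (32.34, 29.82, 28.56),
    "DRP [78]": (32.60, 27.79, 23.21),
}
refesr_psnr = (33.13, 30.67, 30.98)

def gains_over_second_best(components, ours):
    """For each noise level, return (best component, gain of ours in dB)."""
    gains = []
    for level, value in enumerate(ours):
        name, best = max(
            ((n, v[level]) for n, v in components.items()),
            key=lambda item: item[1],
        )
        gains.append((name, round(value - best, 2)))
    return gains

gains = gains_over_second_best(component_psnr, refesr_psnr)
```

Note that the second best method is not the same at every noise level: it is DRP [78] for the noiseless input and LcR [76] for both noisy inputs.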
V Discussion
In this section, we analyze the proposed ensemble learning framework in more depth, so that readers can better grasp our idea.
Time complexity. By ensembling the results of several state-of-the-art methods, we can expect better reconstruction performance, but at the cost of very high computational complexity. Although the optimization procedure of ESR is efficient, taking around 0.06 seconds per image, the total running time of our method is the sum of (i) the running times of all component super-resolvers and (ii) the optimization procedure of ESR. Therefore, computational complexity will be a bottleneck for our approach in practical applications.
Theoretical guarantee. Another drawback of the proposed algorithm is that there is no theoretical guarantee that ensembling different methods produces a better result, which is also a limitation of conventional ensemble learning based machine learning methods [64]. From the experiments, we learn that in most cases our RefESR method beats all the comparison methods. However, in some situations it is worse than the best comparison method. Therefore, in the future we will consider learning a safe prediction from multiple component super-resolvers, i.e., one whose performance is not worse than that of any component super-resolver.
Model universality. Different methods are suited to different kinds of test images. For example, there are SR algorithms for general images and SR algorithms for specific images such as digital characters, faces, and irises. SR models trained on general images are not suitable for reconstructing specific images, and vice versa. Furthermore, the ensemble weight prior obtained for the different methods from general images may not reflect their SR ability on specific images. In this paper, the proposed ensemble learning based SR method is applied to both general image and face image SR tasks. The experiments indicate that the proposed method is indeed effective in improving the performance of existing generic image SR algorithms and face image SR algorithms. In summary, the proposed framework is universal in the sense that, given a reference dataset, it can improve the performance of existing SR methods when the input image belongs to the same class as the reference dataset.
Choice of component super-resolvers. In this paper, we do not consider the complementarity of different methods, but directly select several representative methods in the current SR field, including four shallow learning-based methods and five deep learning-based methods. We believe that the characteristics of the different algorithms should be considered when choosing component super-resolvers. Ensembling component super-resolvers with different characteristics is more likely to improve the final ensemble performance.
Global reconstruction constraint. As shown in many previous works [84, 32, 17], the global reconstruction constraint, which requires that the degraded HR estimate be consistent with the observed LR image [17, 70], is very effective for enhancing the final super-resolved results through an iterative back projection strategy. In our experiments, we have found that if the performance of a component super-resolver is good enough, the improvement brought by the reconstruction constraint is very limited. In other words, a sufficiently good component super-resolver already largely satisfies the reconstruction constraint.
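A minimal sketch of iterative back projection illustrates why the constraint brings little when the estimate is already consistent with the LR input. The degradation model here is an assumption made for illustration (s-by-s block averaging for downsampling, pixel replication for upsampling), not the blur and downsampling operators used in the experiments; all function names are hypothetical.

```python
import numpy as np

def downsample(x, s):
    """Assumed degradation: average over non-overlapping s-by-s blocks."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(e, s):
    """Pixel replication, used to project the LR residual back to HR size."""
    return np.kron(e, np.ones((s, s)))

def back_project(x_hr, y_lr, s, n_iter=10, step=1.0):
    """Refine x_hr so that downsample(x_hr, s) matches the observation y_lr."""
    x = x_hr.astype(np.float64).copy()
    for _ in range(n_iter):
        residual = y_lr - downsample(x, s)   # violation of the constraint
        x += step * upsample(residual, s)    # back-project the residual
    return x
```

If `downsample(x_hr, s)` already equals `y_lr`, the residual is zero and the estimate is returned unchanged, which mirrors the observation that strong component super-resolvers gain little from this step.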
VI Conclusion
In this paper, we present a novel framework based on ensemble learning to solve the single image SR problem. It introduces a reference dataset and incorporates a learned prior for each component super-resolver to regularize the optimization of the ensemble weights. This prior states that a method that performs better on the reference dataset should receive a relatively larger weight when reconstructing the HR output of an LR input in the ensemble framework. We simultaneously model this learned prior of ensemble weights and the reconstruction constraint, which states that the degraded HR image should equal the LR observation, in an MAP formulation. Finally, we present an analytical solution to the constrained least squares problem induced by the MAP framework. Results show the effectiveness of the introduced prior knowledge of ensemble weights learned from a reference dataset.
Acknowledgment
References
 [1] S. C. Park, M. K. Park, and M. G. Kang, “Superresolution image reconstruction: a technical overview,” IEEE Signal Processing Magazine, vol. 20, no. 3, pp. 21–36, 2003.
 [2] N. Wang, D. Tao, X. Gao, X. Li, and J. Li, “A comprehensive survey to face hallucination,” Int. J. Comput. Vis., vol. 106, no. 1, pp. 9–30, 2014.
 [3] X. Liu, D. Zhai, R. Chen, X. Ji, D. Zhao, and W. Gao, “Depth superresolution via joint colorguided internal and external regularizations,” IEEE Trans. Image Process., vol. 28, no. 4, pp. 1636–1645, 2019.
 [4] K. Jiang, Z. Wang, P. Yi, J. Jiang, J. Xiao, and Y. Yao, “Deep distillation recursive network for remote sensing imagery superresolution,” Remote Sensing, vol. 10, no. 11, p. 1700, 2018.
 [5] H. A. Aly and E. Dubois, “Image upsampling using totalvariation regularization with a new observation model,” IEEE Trans. Image Process., vol. 14, no. 10, pp. 1647–1659, 2005.
 [6] X. Liu, D. Zhai, D. Zhao, G. Zhai, and W. Gao, “Progressive image denoising through hybrid graph laplacian regularization: A unified framework,” IEEE Trans. Image Process., vol. 23, no. 4, pp. 1491–1503, April 2014.
 [7] X. Liu, G. Cheung, X. Wu, and D. Zhao, “Random walk graph laplacianbased smoothness prior for soft decoding of JPEG images,” IEEE Trans. Image Process., vol. 26, no. 2, pp. 509–524, Feb 2017.
 [8] J. Sun, J. Sun, Z. Xu, and H.Y. Shum, “Image superresolution using gradient profile prior,” in CVPR. IEEE, 2008, pp. 1–8.
 [9] H. Chen, X. He, L. Qing, and Q. Teng, “Single image superresolution via adaptive transformbased nonlocal selfsimilarity modeling and learningbased gradient regularization,” IEEE Trans. on Multimedia, vol. PP, no. 99, pp. 1–1, 2017.
 [10] Y. HaCohen, R. Fattal, and D. Lischinski, “Image upsampling via texture hallucination,” in ICCP. IEEE, 2010, pp. 1–8.
 [11] D. Glasner, S. Bagon, and M. Irani, “Superresolution from a single image,” in ICCV, Sept 2009, pp. 349–356.
 [12] X. Liu, D. Zhao, R. Xiong, S. Ma, W. Gao, and H. Sun, “Image interpolation via regularized local linear regression,” IEEE Trans. Image Process., vol. 20, no. 12, pp. 3455–3469, 2011.
 [13] Y. Zhang, D. Zhao, J. Zhang, R. Xiong, and W. Gao, “Interpolationdependent image downsampling,” IEEE Trans. Image Process., vol. 20, no. 11, pp. 3291–3296, 2011.
 [14] J. Jiang, X. Ma, C. Chen, T. Lu, Z. Wang, and J. Ma, “Single image superresolution via locally regularized anchored neighborhood regression and nonlocal means,” IEEE Trans. Multimedia, vol. 19, no. 1, pp. 15–26, 2017.

 [15] W. Dong, G. Shi, and X. Li, “Nonlocal image restoration with bilateral variance estimation: A low-rank approach,” IEEE Trans. Image Process., vol. 22, no. 2, pp. 700–711, 2013.
 [16] W. Gong, L. Hu, J. Li, and W. Li, “Combining sparse representation and local rank constraint for single image super resolution,” Inf. Sci., vol. 325, pp. 1–19, 2015.
 [17] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, 2010.
 [18] S. Yang, M. Wang, Y. Sun, F. Sun, and L. Jiao, “Compressive sampling based singleimage superresolution reconstruction by dualsparsity and nonlocal similarity regularizer,” Pattern Recognition Letters, vol. 33, no. 9, pp. 1049–1059, 2012.
 [19] X. Li, H. He, R. Wang, and D. Tao, “Single image superresolution via directional group sparsity and directional features,” IEEE Trans. Image Process., vol. 24, no. 9, pp. 2874–2888, 2015.
 [20] S. Yang, J. Zhang, S. Cui, M. Wang, and L. Jiao, “Curvelet support value filters (csvfs) for image superresolution,” Neurocomputing, vol. 211, pp. 53–59, 2016.
 [21] J. Ma, J. Zhao, J. Tian, X. Bai, and Z. Tu, “Regularized vector field learning with sparse approximation for mismatch removal,” Pattern Recognit., vol. 46, no. 12, pp. 3519–3532, 2013.
 [22] Z. Lin and H.Y. Shum, “Fundamental limits of reconstructionbased superresolution algorithms under local translation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 1, pp. 83–97, Jan. 2004.
 [23] W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Examplebased superresolution,” in IEEE Computer Graphics and Applications, 2002, pp. 56–65.
 [24] H. Chang, D. Yeung, and Y. Xiong, “Superresolution through neighbor embedding,” in CVPR, vol. 1, 2004, pp. 275 – 282.
 [25] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
 [26] M. Bevilacqua, A. Roumy, C. Guillemot, and M. Alberi, “Lowcomplexity singleimage superresolution based on nonnegative neighbor embedding,” in Proceedings of British Machine Vision Conference (BMVC), 2012, pp. 1–10.

 [27] S. Yang, Z. Wang, L. Zhang, and M. Wang, “Dual-geometric neighbor embedding for image super resolution with sparse tensor,” IEEE Trans. Image Process., vol. 23, no. 7, pp. 2793–2803, July 2014.
 [28] J. Ma, H. Zhou, J. Zhao, Y. Gao, J. Jiang, and J. Tian, “Robust feature matching for remote sensing image registration via locally linear transforming,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 12, pp. 6469–6481, 2015.
 [29] J. Ma, J. Wu, J. Zhao, J. Jiang, H. Zhou, and Q. Z. Sheng, “Nonrigid point set registration with robust transformation learning under manifold regularization,” IEEE Trans. Neural Netw. Learn. Syst., 2018.
 [30] H. Zhang, J. Yang, Y. Zhang, N. M. Nasrabadi, and T. S. Huang, “Close the loop: Joint blind image restoration and recognition with sparse representation prior,” in ICCV. IEEE, 2011, pp. 770–777.
 [31] K. I. Kim and Y. Kwon, “Singleimage superresolution using sparse regression and natural image prior,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 6, pp. 1127–1133, June 2010.
 [32] R. Zeyde, M. Elad, and M. Protter, “On single image scaleup using sparserepresentations.” Springer Berlin Heidelberg, 2012, vol. 6920, pp. 711–730.
 [33] J. Jiang, X. Ma, Z. Cai, and R. Hu, “Sparse support regression for image superresolution,” IEEE Photonics J., vol. 7, no. 5, pp. 1–11, 2015.
 [34] K. Jia, X. Wang, and X. Tang, “Image transformation based on learning dictionaries across image spaces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 2, pp. 367–380, 2013.
 [35] S. Wang, L. Zhang, Y. Liang, and Q. Pan, “Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis,” in CVPR, 2012, pp. 2216–2223.
 [36] W. Yang, Y. Tian, F. Zhou, Q. Liao, H. Chen, and C. Zheng, “Consistent coding scheme for singleimage superresolution via independent dictionaries,” IEEE Trans. on Multimedia, vol. 18, no. 3, pp. 313–325, March 2016.
 [37] R. Timofte, V. De Smet, and L. Van Gool, “Anchored neighborhood regression for fast example-based super-resolution,” in ICCV, Dec 2013, pp. 1920–1927.
 [38] C.Y. Yang and M.H. Yang, “Fast direct superresolution by simple functions,” in ICCV, 2013, pp. 561–568.
 [39] R. Timofte, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood regression for fast superresolution,” in Asian Conference on Computer Vision. Springer, 2014, pp. 111–126.
 [40] K. Zhang, D. Tao, X. Gao, X. Li, and J. Li, “Coarsetofine learning for singleimage superresolution,” IEEE Trans. Neural Netw. Learn. Syst., vol. PP, no. 99, pp. 1–14, 2016.
 [41] Y. Hu, N. Wang, D. Tao, X. Gao, and X. Li, “SERF: A simple, effective, robust, and fast image superresolver from cascaded linear regression,” IEEE Trans. Image Process., vol. 25, no. 9, pp. 4091–4102, 2016.
 [42] Y. Tang and Y. Yuan, “Learning from errors in superresolution,” IEEE Trans. Cybern., vol. 44, no. 11, pp. 2143–2154, Nov 2014.
 [43] W. Yang, T. Yuan, W. Wang, F. Zhou, and Q. Liao, “Singleimage superresolution by subdictionary coding and kernel regression,” IEEE Trans. Syst., Man, Cybern.: Systems, vol. 47, no. 9, pp. 2478–2488, Sept 2017.
 [44] J. Liu, W. Yang, X. Zhang, and Z. Guo, “Retrieval compensated group structured sparsity for image superresolution,” IEEE Trans. Multimedia, vol. 19, no. 2, pp. 302–316, Feb 2017.
 [45] Y. Zhang, Y. Zhang, J. Zhang, and Q. Dai, “Ccr: Clustering and collaborative representation for fast single image superresolution,” IEEE Trans. on Multimedia, vol. 18, no. 3, pp. 405–417, March 2016.
 [46] Y. Zhang, Y. Zhang, J. Zhang, D. Xu, Y. Fu, Y. Wang, X. Ji, and Q. Dai, “Collaborative representation cascade for singleimage superresolution,” IEEE Trans. Syst., Man, Cybern.: Systems, vol. PP, no. 99, pp. 1–16, 2018.
 [47] S. Yang, J. Liu, Y. Fang, and Z. Guo, “Jointfeature guided depth map superresolution with face priors,” IEEE Trans. Cybern., vol. PP, no. 99, pp. 1–13, 2017.
 [48] H. Shen, L. Peng, L. Yue, Q. Yuan, and L. Zhang, “Adaptive norm selection for regularized image restoration and superresolution,” IEEE Trans. Cybern., vol. 46, no. 6, pp. 1388–1399, June 2016.
 [49] Y. Tang and L. Shao, “Pairwise operator learning for patchbased singleimage superresolution,” IEEE Trans. Image Process., vol. 26, no. 2, pp. 994–1003, 2017.
 [50] C. Deng, J. Xu, K. Zhang, D. Tao, X. Gao, and X. Li, “Similarity constraintsbased structured output regression machine: An approach to image superresolution,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 12, pp. 2472–2485, Dec 2016.
 [51] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
 [52] C. Dong, C. C. Loy, K. He, and X. Tang, “Image superresolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, 2016.
 [53] J.B. Huang, A. Singh, and N. Ahuja, “Single image superresolution from transformed selfexemplars,” in CVPR, 2015, pp. 5197–5206.
 [54] J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image superresolution using very deep convolutional networks,” in CVPR, 2016, pp. 1646–1654.
 [55] ——, “Deeplyrecursive convolutional network for image superresolution,” in CVPR, 2016, pp. 1637–1645.

 [56] K. Zeng, J. Yu, R. Wang, C. Li, and D. Tao, “Coupled deep autoencoder for single image super-resolution,” IEEE Trans. Cybern., vol. 47, no. 1, pp. 27–37, Jan 2017.
 [57] Y. Huang, L. Shao, and A. F. Frangi, “DOTE: dual convolutional filter learning for super-resolution and cross-modality synthesis in MRI,” CoRR, vol. abs/1706.04954, 2017.
 [58] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image superresolution with sparse prior,” in ICCV, 2015, pp. 370–378.
 [59] Z. Wang, P. Yi, K. Jiang, J. Jiang, Z. Han, T. Lu, and J. Ma, “Multimemory convolutional neural network for video superresolution,” IEEE Trans. Image Process., pp. 1–1, 2018.
 [60] I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in NIPS, 2014, pp. 2672–2680.
 [61] J. Johnson, A. Alahi, and L. FeiFei, “Perceptual losses for realtime style transfer and superresolution,” in ECCV, 2016, pp. 694–711.
 [62] Y. Yuan, S. Liu, J. Zhang, Y. Zhang, C. Dong, and L. Lin, “Unsupervised image superresolution using cycleincycle generative adversarial networks,” in CVPRW, June 2018.
 [63] Z.-H. Zhou, J. Wu, and W. Tang, “Ensembling neural networks: many could be better than all,” Artificial Intelligence, vol. 137, no. 1–2, pp. 239–263, 2002.
 [64] Z.H. Zhou, Ensemble methods: foundations and algorithms. CRC press, 2012.
 [65] R. Liao, X. Tao, R. Li, Z. Ma, and J. Jia, “Video superresolution via deep draftensemble learning,” in ICCV, 2015, pp. 531–539.
 [66] L. Wang, Z. Huang, Y. Gong, and C. Pan, “Ensemble based deep networks for image superresolution,” Pattern Recogn., vol. 68, pp. 191–198, 2017.
 [67] M. Elad and A. Feuer, “Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images,” IEEE Trans. Image Process., vol. 6, no. 12, pp. 1646–1658, 1997.
 [68] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600 –612, 2004.
 [69] R. Timofte, R. Rothe, and L. Van Gool, “Seven ways to improve examplebased single image super resolution,” in CVPR, 2016, pp. 1865–1873.
 [70] X. Gao, K. Zhang, D. Tao, and X. Li, “Joint learning for single image superresolution via coupled constraint,” IEEE Trans. Image Process., vol. 21, no. 2, pp. 469–480, 2012.
 [71] H. Cevikalp and R. Polikar, “Local classifier weighting by quadratic programming,” IEEE Trans. Neural Netw., vol. 19, no. 10, pp. 1832–1838, Oct 2008.
 [72] R. Timofte, E. Agustsson, L. Van Gool, M.H. Yang, L. Zhang, B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee et al., “Ntire 2017 challenge on single image superresolution: Methods and results,” in CVPRW. IEEE, 2017, pp. 1110–1121.
 [73] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image superresolution,” in CVPRW, 2017, pp. 1132–1140.
 [74] X. Wang and X. Tang, “Hallucinating face by eigentransformation,” IEEE Trans. Syst. Man Cybern. Part CAppl. Rev., vol. 35, no. 3, pp. 425 –434, 2005.
 [75] X. Ma, J. Zhang, and C. Qi, “Hallucinating face by positionpatch,” Pattern Recogn., vol. 43, no. 6, pp. 2224 – 2236, 2010.
 [76] J. Jiang, R. Hu, Z. Wang, and Z. Han, “Noise robust face hallucination via localityconstrained representation,” IEEE Trans. Multimedia, vol. 16, no. 5, pp. 1268–1281, 2014.
 [77] J. Jiang, J. Ma, C. Chen, X. Jiang, and Z. Wang, “Noise robust face image superresolution through smooth sparse representation,” IEEE Trans. Cybern., vol. 47, no. 11, pp. 3991–4002, 2017.
 [78] J. Shi and C. Qi, “Kernelbased face hallucination via dual regularization priors,” IEEE Signal Proc. Let., vol. 22, no. 8, pp. 1189–1193, 2015.
 [79] S. Baker and T. Kanade, “Hallucinating faces,” in Proc. IEEE Conf. on Automatic Face and Gesture (FG), 2000, pp. 83 –88.
 [80] W. Gao, B. Cao, S. Shan, X. Chen, D. Zhou, X. Zhang, and D. Zhao, “The caspeal largescale chinese face database and baseline evaluations,” IEEE Trans. Syst. Man Cybern. Part ASyst. Hum., vol. 38, no. 1, pp. 149 –161, 2008.
 [81] X. Wang and X. Tang, “Face photosketch synthesis and recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 11, pp. 1955–1967, 2009.

 [82] Z. Huang, S. Shan, R. Wang, H. Zhang, S. Lao, A. Kuerban, and X. Chen, “A benchmark and comparative study of video-based face recognition on COX face database,” IEEE Trans. Image Process., vol. 24, no. 12, pp. 5967–5981, 2015.
 [83] M. Grgic, K. Delac, and S. Grgic, “SCface – surveillance cameras face database,” Multimedia Tools Appl., vol. 51, no. 3, pp. 863–879, 2011.
 [84] S. Yang, M. Wang, Y. Chen, and Y. Sun, “Singleimage superresolution reconstruction via learned geometric dictionaries and clustered sparse coding,” IEEE Trans. Image Process., vol. 21, no. 9, pp. 4016–4028, 2012.