
Sparsity-based Color Image Super Resolution via Exploiting Cross Channel Constraints

Sparsity constrained single image super-resolution (SR) has been of much recent interest. A typical approach involves sparsely representing patches in a low-resolution (LR) input image via a dictionary of example LR patches, and then using the coefficients of this representation to generate the high-resolution (HR) output via an analogous HR dictionary. However, most existing sparse representation methods for super resolution focus on the luminance channel information and do not capture interactions between color channels. In this work, we extend sparsity-based super-resolution to multiple color channels by taking color information into account. Edge similarities amongst RGB color bands are exploited as cross channel correlation constraints. These additional constraints lead to a new optimization problem which is not easily solvable; however, a tractable solution is proposed to solve it efficiently. Moreover, to fully exploit the complementary information among color channels, a dictionary learning method is also proposed specifically to learn color dictionaries that encourage edge similarities. Merits of the proposed method over the state of the art are demonstrated both visually and quantitatively using image quality metrics.



I Introduction

Super-resolution is a branch of image reconstruction and an active area of research that focuses on the enhancement of image resolution. Conventional Super-Resolution (SR) approaches require multiple Low Resolution (LR) images of the same scene as input and map them to a High Resolution (HR) image based on reasonable assumptions, prior knowledge, or the diversity captured across the LR images [1, 2, 3]. This can be seen as an inverse problem: recovering the high resolution image by fusing the low resolution images of the scene. The recovered image should reproduce the same low resolution images when the physical image formation model is applied to the HR image. However, the SR task is a severely ill-posed problem, since much information is lost in going from high resolution images to low resolution images, and hence the solution is not unique. Consequently, strong prior information is incorporated to yield realistic and robust solutions. Example priors include knowledge of the underlying scene, the distribution of pixels, historical data, and smoothness and edge information [4, 5, 6, 7].

In contrast to the conventional super resolution problem with multiple low resolution images as input, single image super-resolution methods have been developed recently that generate the high resolution image based on only a single low resolution image. Classically, solutions to this problem are example-based methods exploiting nearest neighbor estimation, where pairs of low and high resolution image patches are collected and each low resolution patch is mapped to a corresponding high resolution patch. Freeman et al. [1] proposed an estimation scheme where high-frequency details are obtained by nearest neighbor based estimation on low resolution patches. Glasner et al. [8] approached single image super resolution using the observation that patches in a natural image tend to redundantly recur many times inside the image, both within the same scale and across different scales. An alternate mapping scheme using kernel ridge regression was proposed by Kim et al. [9].

Many learning techniques have been developed which attempt to capture the co-occurrence of low resolution and high resolution image patches. [10] proposes a Bayesian approach by using Primal Sketch priors. Inspired by manifold forming methods like locally linear embedding (LLE), Chang et al. [11] propose a neighbourhood embedding approach. Specifically, small image patches in the low and high resolution images form manifolds with similar local geometry in two distinct feature spaces and local geometry information is used to reconstruct a patch using its neighbors in the feature space.
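The LLE-style weight computation that the neighbor embedding of [11] builds on can be sketched in a few lines. The function below is our own minimal illustration (not the authors' code), assuming the standard sum-to-one constrained least-squares formulation of the reconstruction weights:

```python
import numpy as np

def ne_weights(y, neighbors):
    """Sum-to-one reconstruction weights of y from its LR neighbors,
    i.e. the LLE-style local-geometry step underlying neighbor embedding."""
    Z = neighbors - y                       # shift neighbors relative to y
    G = Z @ Z.T                             # local Gram matrix
    G = G + 1e-8 * np.trace(G) * np.eye(G.shape[0])  # regularize for stability
    w = np.linalg.solve(G, np.ones(G.shape[0]))
    return w / w.sum()                      # enforce the sum-to-one constraint

rng = np.random.default_rng(5)
N = rng.standard_normal((4, 16))            # 4 LR neighbor patches (flattened)
y = 0.25 * N.sum(axis=0)                    # a target inside the neighbor hull
w = ne_weights(y, N)
y_hat = w @ N                               # reconstruction from the neighbors
```

In the SR setting, the weights found on the LR neighbors are then applied to the corresponding HR neighbors to synthesize the HR patch.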

More recently, sparse representation based methods have been applied to the single image super resolution problem. Essentially, these techniques exploit a historical record of typical geometrical structures observed in images: examples of high and low resolution image patches are collected as dictionaries (matrices). Yang et al. proposed to apply sparse coding for retrieving the high resolution image from the LR image [12]. Zeyde et al. extended this method to develop a local Sparse-Land model on image patches [13]. Timofte et al. proposed the Anchored Neighborhood Regression (ANR) method, which uses learned dictionaries in combination with neighbor embedding methods [14, 15]. Other super resolution methods based on statistical signal processing or dictionary learning have been proposed in [16, 17, 18, 19, 20, 21].

On top of sparsity based methods, learning based methods have also been exploited for the SR problem to learn dictionaries that are more suitable for this task. Mostly, dictionary learning or example-based learning methods in super-resolution use an image patch or feature-based approach to learn the relationship between high resolution scenes and their low resolution counterparts. Yang et al. [22] propose to use a collection of raw image patches as dictionary elements in their framework. Subsequently, a method that learns LR and HR dictionaries jointly was proposed in [12]. A semi-coupled dictionary learning (SCDL) model with a mapping function was proposed in [23], where the learned dictionary pairs characterize the structural features of the two image domains, while the mapping function reveals the intrinsic relationship between the two. In addition, coupled dictionary learning for the same problem was proposed in [24], where the learning process is modeled as a bilevel optimization problem. Dual (joint) filter learning in addition to dual (joint) dictionaries was developed by Zhang et al. [25].

I-A Sparsity Based Single Image Super-Resolution

In the setting proposed by Yang et al. (ScSR) [12], a large collection of corresponding high resolution and low resolution image patches is obtained from training data. In this framework, the low resolution information can either be in the form of raw image patches, high frequency or edge information, or any other type of representative feature, while the high resolution information is in the form of image pixels to ensure reconstruction of high resolution images. Using the dictionary learning methods mentioned above and sparsity constraints, high resolution and low resolution dictionaries are jointly learned such that they are capable of representing the LR image patches and their corresponding HR counterparts using the same sparse code. Once the dictionaries are learned, the algorithm searches for a sparse linear representation of each patch of the LR image based on the following sparse coding optimization:

    min_α  ||y − D_l α||_2^2 + λ ||α||_1    (1)

where D_l is the learned low resolution dictionary (or a dictionary learned from features extracted from LR patches), α is the sparse code representing the LR patch (or its extracted features) y with respect to D_l, and λ is a regularization parameter enforcing the sparsity prior and regularizing the ill-posed problem. This is the familiar LASSO [26, 27] problem, which can be solved using any standard sparse solver. The high resolution reconstruction x of each low resolution patch (or patch feature vector) y is then obtained from the same sparse code via the HR dictionary as:

    x = D_h α    (2)

Joint dictionary learning for SR considers the problem of learning two dictionaries D_l and D_h for two feature spaces (the low resolution and high resolution domains) which are assumed to be tied by a certain mapping function [12, 23]. The assumption is that α, the sparse representation of an LR patch y under the learned low resolution dictionary D_l, should be the same as that of the corresponding HR patch x under D_h. The following optimization problem encourages this idea and learns low resolution and high resolution dictionaries tied by a shared sparse code:

    min_{D_l, D_h, {α_i}}  Σ_{i=1}^{N} ( ||x_i − D_h α_i||_2^2 + ||y_i − D_l α_i||_2^2 + λ ||α_i||_1 )
    s.t.  ||d_l,k||_2 ≤ 1,  ||d_h,k||_2 ≤ 1,  k = 1, …, K    (3)

where N is the number of training sample pairs and K is the number of desired dictionary atoms; d_k denotes the k-th column of the corresponding dictionary matrix.
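As a concrete illustration of the sparse coding and reconstruction steps, the toy sketch below solves the LASSO with plain ISTA (a simple stand-in for an off-the-shelf sparse solver) and maps the resulting code through a hypothetical HR dictionary; all matrices here are random placeholders, not learned dictionaries:

```python
import numpy as np

def ista_lasso(D, y, lam, n_iter=500):
    """Solve min_a ||y - D a||_2^2 / 2 + lam * ||a||_1 by ISTA
    (proximal gradient); a stand-in for any off-the-shelf LASSO solver."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - y) / L       # gradient step on the smooth term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

# toy instance: random placeholder dictionaries with a shared sparse code
rng = np.random.default_rng(0)
D_l = rng.standard_normal((20, 40))         # hypothetical LR feature dictionary
D_h = rng.standard_normal((64, 40))         # hypothetical HR pixel dictionary
a_true = np.zeros(40)
a_true[[3, 17]] = [1.5, -2.0]
y = D_l @ a_true                            # synthetic LR observation
a = ista_lasso(D_l, y, lam=0.05)            # sparse coding step
x_hr = D_h @ a                              # HR reconstruction from the same code
```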

I-B Motivation and Contributions

Most super-resolution methods, especially in the single image SR literature, have been designed to increase the resolution of a single channel (monochromatic) image. A related yet more challenging problem, color super-resolution, addresses enhancing the resolution of color (multi-channel) low resolution images. The typical solution for color super resolution involves applying SR algorithms to each of the color channels independently [28, 29]. Another, more common approach is to transform the problem to a different color space such as YCbCr, where chrominance information is separated from luminance, and to apply SR only to the luminance channel [24, 12, 14], since the human eye is more sensitive to luminance than to chrominance. Both of these methods are suboptimal for the color super-resolution problem, as they do not fully exploit the complementary information that may exist in different color channels; the correlation across color bands and the cross channel information are ignored. In addition, many images carry more information in the color channels than in the luminance channel alone. For instance, Fig. 1 illustrates a synthetic image where there is much more color information (prominent edges) in the chrominance channels (Cb and Cr) than in the luminance channel (Y). In the traditional multi-frame super resolution problem, color information has been used in different ways to enhance super resolution results. Farsiu et al. [30] proposed a multi-frame demosaicing and super resolution framework for color images using different color regularizers. Belekos et al. proposed multi-channel video super resolution in [31], and general color dictionary learning for image restoration is proposed in [32, 33, 34]. Other methods that use color channel information are proposed in [35, 36, 37, 38, 39, 40, 41, 42].

We develop a sparsity based Multi-Channel (i.e. color) constrained Super Resolution (MCcSR) framework. The key contributions of our work (a preliminary version was presented at ICIP 2016 [43]) are as follows:

  • We explicitly address the problem of color image super-resolution by inclusion of color regularizers in the sparse coding for SR. These color regularizers capture the cross channel correlation information existing in different color channels and exploit it to better reconstruct super-resolution patches. The resulting optimization problem with added color-channel regularizers is not easily solvable and a tractable solution is proposed.

  • The amount of color information is not the same in each region of an image. In order to enforce color constraints appropriately, we develop a measure that captures the amount of color information in a patch and use it to balance the effect of the color regularizers. An adaptive color patch processing scheme is therefore proposed, in which patches with stronger edge similarities are optimized with more emphasis on the color constraints.

  • In most dictionary learning algorithms for super-resolution, only the correspondence between low and high resolution patches is considered. We instead propose to learn dictionaries whose atoms (columns) are not only low and high resolution counterparts of each other; in the high resolution dictionary in particular, we incorporate color regularizers such that the learned high resolution patches exhibit high edge correlation across the RGB color bands.

  • Reproducibility: All results in this paper are completely reproducible. The MATLAB code as well as images corresponding to the SR results are made available at:

The rest of this paper is organized as follows: In Section II, we generalize the sparsity-based super resolution framework to multiple (color) channels and motivate the choice of color regularizers. These color regularizers are used in Section III to assist learning of color adaptive dictionaries suitable for color super resolution task. Section IV includes experimental validation which demonstrates the effectiveness of our approach by comparing it with state-of-the-art image SR techniques. Concluding remarks are collected in Section V.

II Sparsity Constrained Color Image Super Resolution

II-A Problem formulation

A characteristic associated with most natural images is strong correlation between high-frequency spatial components across the color (RGB) channels. This is based on the intuition that a luminance edge for example is spread across the RGB channels [30, 44]. Fig. 1 illustrates this idea.

We can hence encourage the edges across color channels to be similar to each other. Fig. 2 also shows that RGB edges are far closer to each other than YCbCr edges are. Such ideas have been exploited in traditional image fusion type super-resolution techniques [30], yet sparsity-based single image super resolution lacks a concrete color super resolution framework.
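A toy numeric illustration of this cross-channel edge correlation, using a horizontal first-difference filter as a stand-in for a high-pass edge detector:

```python
import numpy as np

# synthetic 16x16 RGB patch with one vertical edge shared by all channels
img = np.zeros((16, 16, 3))
img[:, 8:, 0] = 0.9                         # red step
img[:, 8:, 1] = 0.6                         # green step
img[:, 8:, 2] = 0.3                         # blue step

def hp_edges(ch):
    """Horizontal first difference: a minimal high-pass edge detector."""
    return np.diff(ch, axis=1)

def ncorr(a, b):
    """Normalized correlation between two edge maps."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

e_r, e_g, e_b = (hp_edges(img[..., c]) for c in range(3))
rg = ncorr(e_r, e_g)                        # co-located edges -> close to 1
rb = ncorr(e_r, e_b)
```

Here the edge maps of the three channels are scaled copies of one another, so the correlations approach 1, mirroring the behavior illustrated in Figs. 1 and 2.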

Fig. 1: Color chessboard cube and color channel components.

Fig. 2: Edges for color channels of chessboard cube.

Edge similarities across RGB color channels may be enforced in the following manner [30, 45, 46]:

    F x_r ≈ F x_g,   F x_r ≈ F x_b,   F x_g ≈ F x_b

where the subscripts r, g and b indicate signals in the R, G and B channels, and the matrix F is a high-pass edge detector filter as in [44]. For instance, F x_r illustrates the edges in the red channel of the desired high resolution image. These constraints essentially enforce the edge information across color channels to be similar in the high resolution patches x_r, x_g and x_b. The underlying assumption here is that the high resolution patches are known beforehand, which is not true in practice. We recognize, however, that these constraints can be equivalently posed on the sparse coefficient vectors corresponding to the individual color channels, since each high resolution patch is reconstructed from its sparse code:

    F x_c = F D_h α_c,   c ∈ {r, g, b}
Note that the sparse codes for the different color channels are then no longer independent; they may be jointly determined by solving the following optimization problem:

    min_{α_r, α_g, α_b}  Σ_{c ∈ {r,g,b}} ||y_c − D_l α_c||_2^2 + λ Σ_{c ∈ {r,g,b}} ||α_c||_1
        + γ ( ||F D_h (α_r − α_g)||_2^2 + ||F D_h (α_r − α_b)||_2^2 + ||F D_h (α_g − α_b)||_2^2 )    (4)

where the cost function is equivalent to the following:

    L(α_r, α_g, α_b) = Σ_c ( ||y_c − D_l α_c||_2^2 + λ ||α_c||_1 ) + γ Σ_{c < c'} ||F D_h α_c − F D_h α_c'||_2^2    (5)

For simplicity, we assume the same regularization parameters λ and γ for all color channels and edge difference terms, respectively. The high-pass edge detector F is also chosen to be the same for each color channel. It is worth mentioning that if γ = 0, (4) reduces to three independent sparse coding problems (ScSR), one per color channel. With the cross channel regularization terms, the sparse codes are no longer independent, and (4) presents a challenging optimization problem in contrast with single channel sparsity based super resolution: the additional color channel regularizers are quadratic in nature and couple the sparse codes of the three channels. Next, we propose a tractable solution.
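For reference, the coupled cost can be evaluated numerically as below; the variable names and the first-difference edge operator are our own stand-ins for the paper's notation:

```python
import numpy as np

def mccsr_cost(alphas, ys, D_l, D_h, F, lam, gamma):
    """Per-channel data fit + l1 sparsity + pairwise cross-channel
    edge-difference penalties; a sketch of the coupled cost above."""
    data = sum(np.sum((y - D_l @ a) ** 2) for y, a in zip(ys, alphas))
    sparsity = lam * sum(np.sum(np.abs(a)) for a in alphas)
    edges = [F @ (D_h @ a) for a in alphas]       # per-channel HR edge maps
    pairs = [(0, 1), (0, 2), (1, 2)]              # (r,g), (r,b), (g,b)
    cross = gamma * sum(np.sum((edges[i] - edges[j]) ** 2) for i, j in pairs)
    return data + sparsity + cross

# toy sizes: 8-dim LR features, 16-pixel HR patches, 12 atoms per dictionary
rng = np.random.default_rng(1)
D_l = rng.standard_normal((8, 12))
D_h = rng.standard_normal((16, 12))
F = np.diff(np.eye(16), axis=0)                   # first-difference high-pass
ys = [rng.standard_normal(8) for _ in range(3)]
alphas = [0.1 * rng.standard_normal(12) for _ in range(3)]
c0 = mccsr_cost(alphas, ys, D_l, D_h, F, lam=0.1, gamma=0.0)  # decoupled
c1 = mccsr_cost(alphas, ys, D_l, D_h, F, lam=0.1, gamma=1.0)  # coupled
```

With gamma = 0 the cost decouples into three independent sparse coding problems, matching the observation above.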

II-B Solution to the optimization problem

We introduce the following vectors and matrices:

    α̃ = [α_r ; α_g ; α_b],   ỹ = [y_r ; y_g ; y_b],
    D̃_l = blkdiag(D_l, D_l, D_l),   D̃_h = blkdiag(D_h, D_h, D_h)    (6)

where α̃ and ỹ respectively are the concatenations of the sparse codes and of the low resolution image patches (or features) of the different color channels. P_1 and P_2 are shifting matrices that cyclically shift the channel blocks within these stacked vectors; they consist of zero and identity blocks and have sizes 3K × 3K and 3n × 3n, respectively, where K is the length of the sparse code for each color channel and n is the size of the HR patches. D̃_l and D̃_h are dictionaries that contain the color dictionaries in their block diagonals, and m is the length of the LR features (patches). We also define F̃ = blkdiag(F, F, F) and simplify the sum of pairwise edge difference terms: the three pairwise differences are exactly the blocks of (I − P_2) F̃ D̃_h α̃, so that

    Σ_{c < c'} ||F D_h α_c − F D_h α_c'||_2^2 = ||(I − P_2) F̃ D̃_h α̃||_2^2    (7)

Finally, the cost function in (5) can be written as follows:

    ||ỹ − D̃_l α̃||_2^2 + λ ||α̃||_1 + γ Σ_{c < c'} ||F D_h α_c − F D_h α_c'||_2^2

Substituting (7) in the above we have:

    min_α̃  || [ỹ ; 0] − [D̃_l ; √γ (I − P_2) F̃ D̃_h] α̃ ||_2^2 + λ ||α̃||_1    (11)

The re-written cost function in (11), which is now in a familiar form, is a convex sparsity constrained optimization, and consequently numerical algorithms such as FISTA [47, 48, 49] can be applied to solve it. Note that the stacked matrix in (11) captures the cross channel constraints through its off-diagonal blocks.
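A generic FISTA iteration for a problem of this form can be sketched as follows; the quadratic term below uses a small identity matrix as a PSD stand-in for the true cross-channel coupling, so this illustrates the solver mechanics rather than the exact problem:

```python
import numpy as np

def fista(grad, L, prox, x0, n_iter=300):
    """Generic FISTA: minimize f(x) + g(x) given grad f (L-Lipschitz)
    and the proximal operator of g."""
    x, z, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iter):
        x_new = prox(z - grad(z) / L, 1.0 / L)         # proximal gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x

# toy stacked problem: smooth part 0.5 a^T Q a - p^T a, nonsmooth lam*||a||_1
rng = np.random.default_rng(2)
D = rng.standard_normal((30, 36))
Q = D.T @ D + 0.1 * np.eye(36)       # 0.1*I stands in for the coupling term
p = D.T @ rng.standard_normal(30)
lam = 0.05
grad = lambda a: Q @ a - p
L = float(np.linalg.eigvalsh(Q).max())
soft = lambda v, step: np.sign(v) * np.maximum(np.abs(v) - lam * step, 0.0)
a_hat = fista(grad, L, soft, np.zeros(36))
obj = 0.5 * a_hat @ Q @ a_hat - p @ a_hat + lam * np.abs(a_hat).sum()
```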

II-C Color adaptive patch processing

In the previous subsection we presented our color image super resolution framework, which exploits edge similarities across color channels. We should emphasize, however, that not all patches in an image have the same amount of color information and edge similarity. Therefore, each patch should be treated individually in terms of the color constraints. The regularization parameter on the edge difference terms controls the emphasis on color edge similarities. Next, we explain our approach to automatically determine this parameter in an image/patch adaptive manner.

We use a color variance measure to quantify the color information in each patch: the responses of horizontal and vertical high-pass Scharr operators, applied to the Y, Cb and Cr channel bands in YCbCr color space, are aggregated and normalized by a normalization parameter. We evaluated this measure over a large number of image patches and determined a mapping from its values to the actual regularization parameter values used in the optimization framework. This mapping is illustrated in Fig. 3.
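A minimal sketch of such a measure follows; the normalization here (chrominance high-pass energy relative to total high-pass energy) is our own choice, not the paper's exact constant:

```python
import numpy as np

SCHARR_X = np.array([[3, 0, -3], [10, 0, -10], [3, 0, -3]]) / 32.0
SCHARR_Y = SCHARR_X.T

def conv2_valid(ch, k):
    """Plain 2-D 'valid' convolution (no SciPy dependency)."""
    kh, kw = k.shape
    kf = k[::-1, ::-1]                       # flip kernel for true convolution
    out = np.zeros((ch.shape[0] - kh + 1, ch.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(ch[i:i + kh, j:j + kw] * kf)
    return out

def color_variance(y, cb, cr):
    """Chrominance high-pass energy relative to total high-pass energy."""
    def hp_energy(ch):
        gx = conv2_valid(ch, SCHARR_X)
        gy = conv2_valid(ch, SCHARR_Y)
        return float(np.sum(gx ** 2) + np.sum(gy ** 2))
    e_y, e_cb, e_cr = hp_energy(y), hp_energy(cb), hp_energy(cr)
    return (e_cb + e_cr) / (e_y + e_cb + e_cr + 1e-12)

flat = np.zeros((8, 8))
edge = np.zeros((8, 8)); edge[:, 4:] = 1.0
cv_luma = color_variance(edge, flat, flat)    # edge only in Y -> near 0
cv_chroma = color_variance(flat, edge, edge)  # edges only in Cb/Cr -> near 1
```

Patches scoring high on such a measure would receive a larger color regularization weight, in the spirit of the mapping in Fig. 3.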

Fig. 3: Relationship between the color variance measure and the regularization parameter.

III Joint Learning of Color Dictionaries

Correlation between color channels can be captured even better if the individual color channel dictionaries are also designed to facilitate it. In order to learn such dictionaries, we propose a new cost function for joint learning of the color channel dictionaries.

Given a set of sampled training image patch pairs {X, Y}, where X = {x_1, …, x_N} is the set of high resolution patches sampled from training images and Y = {y_1, …, y_N} is the set of corresponding low resolution patches (or extracted features), we aim to learn dictionaries with the aforementioned characteristics. One essential requirement, of course, is that the sparse representations of the low resolution patches and of the corresponding high resolution patches be the same. At the same time, the high resolution dictionary, which is responsible for reconstructing the HR patches, should be designed to capture the RGB edge correlations in the super-resolved images. Individually, the sparse coding problems in the low resolution and high resolution settings may be written as:

    min_{D̃_l, A}  ||Y − D̃_l A||_F^2 + λ ||A||_1    (13)
    min_{D̃_h, A}  ||X − D̃_h A||_F^2 + λ ||A||_1 + γ ||(I − P_2) F̃ D̃_h A||_F^2    (14)

The additional term in (14) incorporates the edge information across color channels as in (4). Note that there is an implicit constraint on D̃_l and D̃_h: both are block diagonal matrices as defined in (6). Considering the requirement that the sparse codes be the same in the LR and HR frameworks, we obtain the following optimization problem, which simultaneously optimizes the LR and HR dictionaries:

    min_{D̃_l, D̃_h, A}  β ||Y − D̃_l A||_F^2 + (1 − β) ||X − D̃_h A||_F^2 + λ ||A||_1 + γ ||(I − P_2) F̃ D̃_h A||_F^2

where β balances the reconstruction error in the low resolution and high resolution settings. Using simplifications similar to (9), this cost function can be re-written as follows:

    min_{D̃_l, D̃_h, A}  β ||Y − D̃_l A||_F^2 + (1 − β) ||X − D̃_h A||_F^2 + λ ||A||_1 + γ Tr( A^T D̃_h^T M^T M D̃_h A )    (17)

where M = (I − P_2) F̃. The first and second terms in (17) are respectively responsible for small reconstruction error on the low resolution and high resolution training data. The third term enforces sparsity and the last one encourages edge similarity via the learned dictionaries. We propose to minimize this cost function by alternately optimizing over A, D̃_l and D̃_h individually, while keeping the others fixed.

With D̃_l and D̃_h being fixed, we optimize (17) over the sparse code matrix A. Interestingly, because of the Trace operator and the Frobenius norms, the columns of A can be obtained independently. For each column α_i of A we can simplify the problem to:

    min_{α_i}  ||ŷ_i − D̂ α_i||_2^2 + λ ||α_i||_1    (18)

where ŷ_i = [√β y_i ; √(1−β) x_i ; 0] and D̂ = [√β D̃_l ; √(1−β) D̃_h ; √γ (I − P_2) F̃ D̃_h]. The optimization in (18) can be solved using FISTA [47].

The next step is to find the low resolution dictionary D̃_l. By fixing A and D̃_h, the cost function reduces to:

    min_{D̃_l}  ||Y − D̃_l A||_F^2    s.t.  ||d_k||_2 ≤ 1    (19)

Since D̃_l is block diagonal and there is no explicit cross channel constraint on the low resolution dictionary, the above optimization can be split into three separate dictionary learning procedures, writing the diagonal blocks of D̃_l as D_l^(c):

    min_{D_l^(c)}  ||Y_c − D_l^(c) A_c||_F^2    s.t.  ||d_k||_2 ≤ 1,   c ∈ {r, g, b}    (20)

where Y_c and A_c take the subscript from c, indicating a specific color channel. Each of these dictionaries is learned by the dictionary learning method in [50].
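The per-channel split can be illustrated with a one-step MOD-style dictionary update, used here as a simple stand-in for the online learner of [50]:

```python
import numpy as np

def mod_dict_update(Y, A):
    """One MOD-style dictionary update, D = Y A^T (A A^T)^{-1}, with
    column renormalization; a stand-in for the online method of [50]."""
    D = Y @ A.T @ np.linalg.pinv(A @ A.T)
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0                     # guard against empty atoms
    return D / norms

# the block-diagonal LR problem splits per channel: one update for each
rng = np.random.default_rng(3)
A = rng.standard_normal((10, 50))               # shared sparse codes
Y_rgb = [rng.standard_normal((25, 50)) for _ in range(3)]  # per-channel data
D_rgb = [mod_dict_update(Yc, A) for Yc in Y_rgb]  # three independent updates
```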

Finally, for finding D̃_h, when A and D̃_l are fixed, we have:

    min_{D̃_h}  ||X − D̃_h A||_F^2 + γ ||(I − P_2) F̃ D̃_h A||_F^2    s.t.  ||d_k||_2 ≤ 1    (21)

We develop a solution for (21) using the Alternating Direction Method of Multipliers (ADMM) [51]. We first define the function f as follows, which is essentially the same cost function with the product D̃_h A in the final term of (21) substituted by a slack matrix S:

    f(D̃_h, S) = ||X − D̃_h A||_F^2 + γ ||(I − P_2) F̃ S||_F^2

Then, solving the following optimization problem, which is a bi-convex problem, is equivalent to solving (21):

    min_{D̃_h, S}  f(D̃_h, S)    s.t.  S = D̃_h A,  ||d_k||_2 ≤ 1    (22)

The following is a summary of the iterative solution of (22) using ADMM, repeated until convergence, where t is the iteration index of the ADMM procedure and U is the scaled multiplier matrix:

    D̃_h^(t+1) = argmin_{D̃_h}  ||X − D̃_h A||_F^2 + (ρ/2) ||D̃_h A − S^(t) + U^(t)||_F^2   s.t. ||d_k||_2 ≤ 1    (23)
    S^(t+1)  = argmin_S  γ ||(I − P_2) F̃ S||_F^2 + (ρ/2) ||D̃_h^(t+1) A − S + U^(t)||_F^2    (24)
    U^(t+1)  = U^(t) + D̃_h^(t+1) A − S^(t+1)    (25)

Step 3 of the above ADMM procedure is straightforward. However, Steps 1 and 2 need further analytical simplification for tractability.

Step 1: The optimization in this step can be re-written as:

    min_{D̃_h}  ||X̂ − D̃_h Â||_F^2    s.t.  ||d_k||_2 ≤ 1    (26)

where X̂ = [X, √(ρ/2) (S^(t) − U^(t))] and Â = [A, √(ρ/2) A] are column-wise concatenations. Assuming the following block structure for X̂ and Â:

    X̂ = [X̂^(r) ; X̂^(g) ; X̂^(b)],   Â = [Â^(r) ; Â^(g) ; Â^(b)]

and due to the block diagonal structure of D̃_h as in (6), we can rewrite the term in (26) in the following form:

    ||X̂ − D̃_h Â||_F^2 = Σ_{c ∈ {r,g,b}} ||X̂^(c) − D_h^(c) Â^(c)||_F^2

Finally the cost function reduces to a separable optimization problem, i.e. it can be solved for D_h^(r), D_h^(g) and D_h^(b) separately as follows:

    min_{D_h^(c)}  ||X̂^(c) − D_h^(c) Â^(c)||_F^2    s.t.  ||d_k||_2 ≤ 1,   c ∈ {r, g, b}

Each of the above subproblems is now solvable using the algorithmic approach of Online Dictionary Learning [50].

Step 2: This is an unconstrained convex optimization problem in S, and we can find the minimum by setting the derivative to zero. The closed form solution for S is given by:

    S^(t+1) = ( 2γ F̃^T (I − P_2)^T (I − P_2) F̃ + ρ I )^{-1} ρ ( D̃_h^(t+1) A + U^(t) )
A formal stepwise description of our color dictionary learning algorithm is given in Algorithm 1.

Input: training patch pairs {X, Y}. Initialize: the LR and HR dictionaries; iteration index t = 0.
  for t ≤ Maxiter do
     (1) Find the sparse code matrix A by solving the convex optimization problem in (18)
     (2) Solve the LR dictionary learning problem in (20)
     (3) Solve the HR dictionary learning problem in (21):
     while stopping criterion not met do
        (3-1) Update the HR dictionary using (23)
        (3-2) Update the slack matrix using (24)
        (3-3) Update the multiplier matrix using (25)
        (3-4) Increase the inner iteration index
     end while
     (4) Increase the iteration index t
  end for
Algorithm 1 Color Dictionary Learning
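The slack-variable ADMM mechanics used in step (3) can be illustrated on a familiar toy problem, the LASSO with an explicit splitting x = z; the dictionary update swaps in different subproblems, so this is only a sketch of the three-step structure, not the paper's update:

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, n_iter=200):
    """Slack-variable ADMM on min 0.5||Ax - b||^2 + lam||z||_1 s.t. x = z.
    Same three-step structure as the dictionary update: a smooth subproblem,
    a subproblem for the slack variable, and a multiplier update."""
    n = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    M = np.linalg.inv(AtA + rho * np.eye(n))     # factor reused every iteration
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        x = M @ (Atb + rho * (z - u))            # step 1: smooth subproblem
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # step 2: prox
        u = u + x - z                            # step 3: scaled dual update
    return z

rng = np.random.default_rng(4)
A = rng.standard_normal((40, 15))
x_true = np.zeros(15)
x_true[[2, 9]] = [1.0, -1.0]
b = A @ x_true
x_hat = admm_lasso(A, b, lam=0.1)
```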

IV Experimental Results

Our experiments are performed on the widely used Set5 and Set14 images as in [13]. We compare the proposed Multi-Channel constrained Super Resolution (MCcSR) method with several well-known single image super resolution methods. These include the ScSR method [24], because our MCcSR method can be seen as a multi-channel extension of it. Other methods for which we report results are the Single Image Scale-up using Sparse Representation by Zeyde et al. [13], the Anchored Neighborhood Regression for Fast Example-Based Super-Resolution (ANR) [15] and Global Regression (GR) [14] methods by Timofte et al., and Neighbor Embedding with Locally Linear Embedding (NE+LLE) [11] and Neighbor Embedding with Non-Negative Least Squares (NE+NNLS) [52], both adapted to learned dictionaries.

In our experiments, we magnify the input images by a factor of 2, 3 or 4, which is commonplace in the literature. For the low-resolution images, we use low-resolution patches with an overlap of 4 pixels between adjacent patches and extract features based on the method in [12]. It is worth noting that these features are not extracted from the low resolution patches themselves, but rather from a bicubic interpolated version of the whole image at the desired magnification factor. The extracted features are then used to find the sparse codes according to (11), which involves the color information as well. High resolution patches are then reconstructed from the same sparse codes using the learned high resolution dictionaries and averaged over the overlapping regions. Dictionaries are obtained by training over patch pairs which are preprocessed by cropping out the textured regions and discarding the smooth regions. The number of columns in each learned dictionary is fixed for most of our experiments, and the regularization parameter is picked via cross-validation.
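The overlap-and-average reconstruction can be sketched as follows; patch size 6 and step 2 below give a 4-pixel overlap as stated, and the SR mapping itself is elided (patches are passed through unchanged):

```python
import numpy as np

def extract_patches(img, p=6, step=2):
    """Slide a p x p window with the given step (overlap = p - step)."""
    H, W = img.shape
    pts = [(i, j) for i in range(0, H - p + 1, step)
                  for j in range(0, W - p + 1, step)]
    return np.stack([img[i:i + p, j:j + p] for i, j in pts]), pts

def assemble(patches, pts, shape, p=6):
    """Average the (reconstructed) patches over their overlapping regions."""
    acc, cnt = np.zeros(shape), np.zeros(shape)
    for patch, (i, j) in zip(patches, pts):
        acc[i:i + p, j:j + p] += patch
        cnt[i:i + p, j:j + p] += 1.0
    return acc / np.maximum(cnt, 1.0)

img = np.arange(100, dtype=float).reshape(10, 10)
P, pts = extract_patches(img)           # here the "SR step" is the identity
rec = assemble(P, pts, img.shape)       # overlap-averaged reconstruction
```

With the identity mapping the overlap-averaged assembly recovers the covered pixels exactly, which is the sanity check one expects before substituting the actual per-patch SR reconstruction.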

Fig. 4: Comparison of different methods for the comic image with a scaling factor of 2 (please refer to the electronic version and zoom in for a detailed comparison). Numbers in parentheses are PSNR, SSIM and S-CIELAB error measures, respectively. Left to right: Original, Bicubic (30.46, 0.840, 1.898e4), Zeyde et al. (31.97, 0.887, 1.127e4), GR (31.70, 0.879, 1.198e4), ANR (32.09, 0.889, 1.077e4), NENNLS (31.87, 0.884, 1.159e4), NELLE (32.03, 0.889, 1.099e4), MCcSR (32.23, 0.899, 9.770e3), ScSR (32.14, 0.893, 1.014e4).
Fig. 5: Super-resolution results for scaling factor 3 and quantitative measures. Left to right: Original, Bicubic (27.51, 0.685, 3.423e4), Zeyde et al. (28.28, 0.737, 2.896e4), GR (28.15, 0.729, 3.008e4), ANR (28.36, 0.742, 2.865e4), NENNLS (28.17, 0.730, 2.961e4), NELLE (28.30, 0.738, 2.905e4), MCcSR (28.51, 0.758, 2.709e4), ScSR (28.31, 0.740, 2.860e4) .
Fig. 6: Super-resolution results for scaling factor 4 and quantitative measures. Left to right: Original, Bicubic (26.05, 0.566, 4.369e4), Zeyde et al. (26.61, 0.615, 3.923e4), GR (26.51, 0.607, 4.045e4), ANR (26.63, 0.618, 3.928e4), NENNLS (26.50, 0.606, 3.984e4), NELLE (26.57, 0.614, 3.967e4), MCcSR (26.74, 0.632, 3.818e4), ScSR (26.35, 0.608, 4.002e4) .
Fig. 7: Comparison of different methods for baboon image with scaling factor of 2. Numbers in parenthesis are PSNR, SSIM and SCIELAB error measures, respectively. Left to right: Original, Bicubic (28.19, 0.635, 7.856e4), Zeyde et al. (28.62, 0.683, 6.570e4), GR (28.63, 0.690, 6.388e4), ANR (28.67, 0.689, 3.287e4), NENNLS (28.58, 0.680, 6.585e4), NELLE (28.66, 0.688, 6.421e4), MCcSR (28.78, 0.705, 5.799e4), ScSR (28.69, 0.692, 6.296e4) .
Fig. 8: Super-resolution results for scaling factor 3 and quantitative measures. Left to right: Original, Bicubic (26.71, 0.480, 1.078e5), Zeyde et al. (26.94, 0.520, 1.008e5), GR (26.95, 0.529, 1.000e5), ANR (26.97, 0.527, 9.962e4), NENNLS (26.92, 0.518, 1.010e5), NELLE (26.97, 0.526, 9.998e4), MCcSR (27.11, 0.549, 9.574e4), ScSR (26.95, 0.524, 1.018e5) .
Fig. 9: Super-resolution results for scaling factor 4 and quantitative measures. Left to right: Original, Bicubic (26.00, 0.390, 1.237e5), Zeyde et al. (26.17, 0.420, 1.186e5), GR (26.17, 0.428, 1.183e5), ANR (26.19, 0.426, 1.180e5), NENNLS (26.15, 0.419, 1.190e5), NELLE (26.18, 0.425, 1.183e5), MCcSR (26.25, 0.446, 1.136e5), ScSR (26.11, 0.415, 1.185e5) .
Fig. 10: Effect of dictionary size on PSNR, SSIM and S-CIELAB error of SR methods with a scaling factor of .
Images      Bicubic  Zeyde   GR      ANR     NE+NNLS NE+LLE  MCcSR   ScSR
baby        38.42    39.51   39.38   39.56   39.22   39.49   39.51   39.40
butterfly   28.73    30.60   29.73   30.57   30.29   30.42   30.59   30.64
bird        36.37    37.90   37.44   37.92   37.68   37.90   38.02   37.59
face        35.96    36.44   36.40   36.50   36.39   36.47   36.48   36.37
foreman     35.76    37.67   36.84   37.71   37.37   37.69   37.74   37.64
coastguard  31.31    31.91   31.78   31.84   31.77   31.83   31.95   31.83
flowers     30.92    31.84   31.62   31.88   31.68   31.80   32.07   31.87
head        36.02    36.47   36.42   36.52   36.40   36.50   36.51   36.42
lenna       35.26    36.23   35.99   36.29   36.11   36.24   36.33   36.14
man         31.78    32.68   32.44   32.71   32.50   32.65   32.75   32.68
pepper      35.25    36.27   35.77   36.13   35.99   36.12   36.30   36.20
average     33.08    34.06   33.76   34.07   33.88   34.03   34.14   34.00
TABLE I: PSNR (dB) results of different methods for various images with scaling factor of .
Images      Bicubic  Zeyde   GR      ANR     NE+NNLS NE+LLE  MCcSR   ScSR
baby        0.88     0.90    0.90    0.90    0.89    0.90    0.90    0.89
butterfly   0.79     0.85    0.80    0.84    0.84    0.84    0.85    0.85
bird        0.90     0.92    0.91    0.92    0.92    0.92    0.93    0.91
face        0.72     0.74    0.74    0.74    0.74    0.74    0.75    0.74
foreman     0.89     0.91    0.90    0.91    0.90    0.91    0.91    0.90
coastguard  0.57     0.62    0.63    0.62    0.61    0.62    0.63    0.62
flowers     0.77     0.80    0.79    0.80    0.79    0.80    0.81    0.80
head        0.72     0.74    0.74    0.75    0.74    0.74    0.75    0.74
lenna       0.78     0.80    0.80    0.80    0.80    0.80    0.81    0.80
man         0.72     0.76    0.76    0.77    0.76    0.76    0.76    0.76
pepper      0.78     0.80    0.79    0.80    0.79    0.79    0.80    0.79
average     0.745    0.776   0.769   0.778   0.771   0.775   0.785   0.774
TABLE II: SSIM results of different methods for various images with scaling factor of .
Images      Bicubic   Zeyde     GR        ANR       NE+NNLS   NE+LLE    MCcSR     ScSR
baby        2.07E+04  1.36E+04  1.40E+04  1.32E+04  1.47E+04  1.34E+04  1.34E+04  1.50E+04
butterfly   2.28E+04  1.55E+04  1.84E+04  1.55E+04  1.60E+04  1.60E+04  1.54E+04  1.49E+04
bird        1.07E+04  7.36E+03  8.02E+03  7.21E+03  7.73E+03  7.30E+03  6.50E+03  7.81E+03
face        3.79E+03  2.71E+03  2.73E+03  2.57E+03  2.73E+03  2.61E+03  2.47E+03  2.70E+03
foreman     8.46E+03  3.90E+03  4.79E+03  3.48E+03  4.01E+03  3.62E+03  3.72E+03  3.89E+03
coastguard  1.96E+04  1.71E+04  1.70E+04  1.70E+04  1.76E+04  1.71E+04  1.69E+04  1.70E+04
flowers     4.47E+04  3.75E+04  3.89E+04  3.69E+04  3.84E+04  3.74E+04  3.29E+04  3.70E+04
head        3.79E+03  2.69E+03  2.74E+03  2.54E+03  2.79E+03  2.61E+03  2.42E+03  2.65E+03
lenna       2.44E+04  1.74E+04  1.85E+04  1.67E+04  1.79E+04  1.69E+04  1.58E+04  1.72E+04
man         3.80E+04  2.91E+04  3.03E+04  2.84E+04  3.02E+04  2.89E+04  2.88E+04  2.95E+04
pepper      2.48E+04  1.91E+04  2.15E+04  1.96E+04  2.02E+04  1.95E+04  1.73E+04  1.91E+04
average     2.79E+04  2.27E+04  2.36E+04  2.24E+04  2.33E+04  2.26E+04  2.14E+04  2.28E+04
TABLE III: S-CIELAB error results of different methods for various images with scaling factor of .

We perform visual comparisons of the obtained super-resolution images and additionally evaluate them quantitatively using image quality metrics. The metrics we use include: 1) the Peak Signal to Noise Ratio (PSNR), while recognizing its limitations [53] (note that since we work on color images, the reported PSNR is computed over all color channels); 2) the widely used Structural Similarity Index (SSIM) [54]; and 3) a popular color-specific quality measure called S-CIELAB [55], which evaluates color fidelity while taking spatial context into account.
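For instance, PSNR computed jointly over all color channels amounts to the standard definition below (a generic sketch, not code from the paper):

```python
import numpy as np

def psnr_color(ref, rec, peak=255.0):
    """PSNR with the mean squared error taken jointly over all color channels."""
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# a constant offset of 16 intensity levels on an 8-bit image gives MSE = 256
err16 = psnr_color(np.zeros((4, 4, 3)), np.full((4, 4, 3), 16.0))
```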

IV-A Generic SR results

Fig. 4 shows SR results for a popular natural image where resolution enhancement was performed with a scaling factor of 2. In the figure caption, the PSNR (in dB), SSIM and S-CIELAB error measures appear in parentheses for each method. As can be seen in the enlarged area of Fig. 4, MCcSR more faithfully retains color texture. The bottom row of Fig. 4 shows the S-CIELAB error maps for the different methods. It is again apparent that the MCcSR method produces less error around edges and color textures. Consistent with the visual observations, the S-CIELAB error is lowest for MCcSR.

Fig. 5 shows the same image with a scaling factor of 3 along with the corresponding S-CIELAB error maps. In this case, the color texture in the enlarged area is even more pronounced for MCcSR than for the other methods. The trend continues, and the benefits of MCcSR are most significant for a scaling factor of 4 in Fig. 6. Similar results for the Baboon image are shown for scaling factors of 2, 3 and 4 in Figs. 7-9, respectively.

The degradation in image quality for SR results with increased scaling factor is intuitively expected. In a relative sense, however, MCcSR suffers a more graceful decay. This is attributed to the use of prior information in the form of the quadratic color regularizers in our cost function, which compensates for the lack of information available to perform the super-resolution task.

Tables I-III summarize the results of super-resolution on images in the Set5 and Set14 databases with a scaling factor of . PSNR, SSIM and S-CIELAB error measures are compared and, almost consistently, our MCcSR method outperforms all other competing state-of-the-art methods. The last row in these tables gives the average performance of each method over all images in the Set5 and Set14 datasets. Due to space constraints, we do not include all the LR and SR images for Set5 and Set14 in the paper, but they are made available online in addition to the code at:

Fig. 11: Visual images as well as S-CIELAB error maps, shown for a scaling factor of 3. From left to right, the images in each row correspond to: original image, SR applied separately on RGB channels, ScSR, and MCcSR.
Fig. 12: Visual images as well as S-CIELAB error maps, shown for a scaling factor of 3. From left to right, the images in each row correspond to: original image, SR applied separately on RGB channels (36.26, 0.83, 1.57e4), ScSR (36.13, 0.83, 1.67e4) and MCcSR (36.67, 0.85, 1.43e4). Numbers in parentheses are PSNR, SSIM and S-CIELAB error measures.

IV-B Effect of Dictionary Size

So far we have used a fixed dictionary of size atoms for all methods. In this section, we evaluate the effect of the learned dictionary size on super-resolution performance. We again sampled image patches and trained dictionaries of size and , respectively. The results are evaluated both visually and quantitatively in terms of PSNR, SSIM and S-CIELAB. As is intuitively expected, reconstruction artifacts gradually diminish as the dictionary size increases, and our visual observations are supported by the PSNR, SSIM and S-CIELAB of the recovered images. Fig. 10 shows the variation of the different image quality metrics against dictionary size. For SSIM and S-CIELAB in particular, MCcSR generates effective results even with smaller dictionaries.

IV-C Effect of Color Regularizers: Separate RGBs

We provide evidence for the importance of effectively accounting for color geometry via an illustrative example image. Three variations of color SR results are presented next:

  1. SR performed only on the luminance channel using the ScSR method [24], with bicubic interpolation applied to the chrominance channels.

  2. Single-channel SR performed on the red, green and blue channels independently. We again use the ScSR method; however, we learn separate dictionaries for the RGB channels and apply ScSR to each channel independently.

  3. Super-resolution that explicitly incorporates cross-channel information into the reconstruction (our MCcSR).
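The structural difference between the first two variations can be sketched as follows. The single-channel SR model is replaced by a hypothetical `sr_channel` placeholder (here just nearest-neighbor replication, standing in for ScSR), and `bicubic_stub` stands in for bicubic interpolation; only the pipeline wiring is meant to be faithful.

```python
import numpy as np

def sr_channel(ch, s):
    """Placeholder for a learned single-channel SR model (e.g. ScSR)."""
    return np.kron(ch, np.ones((s, s)))

def bicubic_stub(ch, s):
    """Placeholder for bicubic interpolation of a chrominance channel."""
    return np.kron(ch, np.ones((s, s)))

# BT.601 RGB <-> YCbCr for float images in [0, 1]
def rgb_to_ycbcr(rgb):
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 0.5
    cr =  0.5 * r - 0.4187 * g - 0.0813 * b + 0.5
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 0.5)
    g = y - 0.34414 * (cb - 0.5) - 0.71414 * (cr - 0.5)
    b = y + 1.772 * (cb - 0.5)
    return np.clip(np.stack([r, g, b], axis=-1), 0.0, 1.0)

def sr_variation1(lr_rgb, s):
    """Variation 1: SR on luminance only; chrominance is interpolated."""
    y, cb, cr = rgb_to_ycbcr(lr_rgb)
    return ycbcr_to_rgb(sr_channel(y, s), bicubic_stub(cb, s), bicubic_stub(cr, s))

def sr_variation2(lr_rgb, s):
    """Variation 2: independent SR per RGB channel, no cross-channel coupling."""
    return np.stack([sr_channel(lr_rgb[..., c], s) for c in range(3)], axis=-1)

lr = np.random.default_rng(0).random((16, 16, 3))
hr1, hr2 = sr_variation1(lr, 3), sr_variation2(lr, 3)
print(hr1.shape, hr2.shape)
```

With these trivial stubs the two pipelines coincide; the artifacts discussed below arise precisely when real, independently learned per-channel models disagree around color edges, which is what the cross-channel constraints of MCcSR are designed to prevent.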

In these experiments we use a scaling factor of , and the results are reported in Figs. 12 and 11 and Table IV. It should particularly be noted (see Fig. 12) that applying the SR method independently to the RGB channels introduces very significant artifacts around color edges, which are not visible in the results of MCcSR and ScSR. Fig. 11 shows similar results for a few other images. Table IV reports image quality measures, which confirm the importance of using color channel constraints.

         PSNR                        SSIM                        S-CIELAB
         Separate RGB  ScSR   MCcSR  Separate RGB  ScSR  MCcSR  Separate RGB  ScSR    MCcSR
comic    28.37         28.25  28.51  0.74          0.74  0.76   2.80e4        3.00e4  2.71e4
baboon   26.95         26.95  27.11  0.53          0.52  0.55   9.93e4        1.02e5  9.57e4
pepper   36.14         36.20  36.30  0.79          0.79  0.80   1.93e4        1.91e4  1.73e4
bird     37.71         37.59  38.02  0.92          0.91  0.93   7.28e3        7.81e3  6.50e3
TABLE IV: Quantitative measures to show effectiveness of color constraints in SR for a scaling factor of 3.

IV-D Robustness to Noise

A common assumption in single image SR is that the input images are clean and free of noise, which is likely to be violated in many real-world applications. Classical methods deal with noisy images by first filtering out the noise and then performing super-resolution. The final output of such a procedure depends heavily on the denoising technique itself, and artifacts introduced by denoising may remain, or even be magnified, after super-resolution.

Similar to [12], the parameter in (4) is tuned based on the noise level of the input image and controls the smoothness of the output. We argue that our approach not only inherits the noise robustness of ScSR [12], but also exploits the additional cross-channel correlation information to recover cleaner images.

We add different levels of Gaussian noise to the LR input image to test the robustness of our algorithm and compare our results with the ScSR method, which has demonstrated success in SR in the presence of noise [12]. With a scaling factor of , we chose a range of noise standard deviations from to and, similar to [12], set to be one tenth of the noise standard deviation. Likewise, we made the choice of in (4) using a cross-validation procedure to suppress noise. Fig. 13 shows the SR results for an image with different levels of noise in comparison with the ScSR and bicubic methods. Table V reports the average PSNR, SSIM and S-CIELAB error measures of images reconstructed from different levels of noisy input. In all cases, MCcSR outperforms the competition.
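The experimental protocol just described can be sketched as follows. The noise levels, image size, and data are illustrative stand-ins (the specific values used in the paper are not reproduced here); the one concrete rule carried over is setting the sparsity weight to one tenth of the noise standard deviation, per [12].

```python
import numpy as np

rng = np.random.default_rng(0)
clean_lr = rng.random((32, 32, 3))            # toy LR image, values in [0, 1]

results = []
for sigma in (0.02, 0.04, 0.06, 0.08, 0.10):  # hypothetical noise std levels
    noisy = np.clip(clean_lr + rng.normal(0.0, sigma, clean_lr.shape), 0.0, 1.0)
    lam = sigma / 10.0                        # lambda = sigma / 10, as in [12]
    mse = np.mean((noisy - clean_lr) ** 2)
    input_psnr = 10.0 * np.log10(1.0 / mse)   # PSNR of the noisy input itself
    results.append((sigma, lam, input_psnr))

for sigma, lam, p in results:
    print(f"sigma={sigma:.2f}  lambda={lam:.3f}  input PSNR={p:.1f} dB")
```

Each noisy input would then be fed to the SR method with its matched lambda; the monotonically dropping input PSNR is what drives the degradation pattern visible in Table V.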

Measure  Method   (noise standard deviation increasing left to right)
PSNR     Bicubic  33.08  32.99  32.75  32.50  31.88
         ScSR     34.00  33.95  33.92  33.90  33.86
         MCcSR    34.14  34.11  34.09  34.09  34.07