1 Introduction
Super-resolution (SR) is the problem of reconstructing a high resolution (HR) image from one or several low resolution (LR) images [1] (in this article, by resolution we mean spatial resolution). It has many potential applications, such as enhancing the image quality of low-cost imaging sensors (e.g., cell phone cameras) and increasing the resolution of standard definition (SD) movies to display them on high definition (HD) TVs, to name a few.
Prior to SR methods, the usual way to increase the resolution of an image was to use simple interpolation-based methods such as bilinear and bicubic interpolation and, more recently, the resampling method described in [2], among many others. However, all these methods blur the high-frequency details of the image, especially for large upscaling factors (the factor by which the resolution of the image is increased in each dimension). Thus, over the last few years, a large number of SR algorithms have been proposed [3]. These methods can be classified into two categories: multi-image SR and single-image SR.
Since the seminal work by Tsai and Huang [4] in 1984, many multi-image SR techniques have been proposed [5, 6, 7, 8]. In the conventional SR problem, multiple images of the same scene with subpixel motion are required to generate the HR image. However, the performance of these SR methods is acceptable only for small upscaling factors (usually smaller than 2). As the upscaling factor increases, the SR problem becomes severely ill-conditioned, and a large number of LR images are needed to recover the HR image with acceptable quality.
To address this problem, example-based SR techniques were developed which require only a single LR image as input [9]. In these methods, an external training database is used to learn the correspondence between manifolds of LR and HR image patches. In some approaches, instead of using an external database, patches extracted from the LR image itself across different resolutions are used [10]. In [9], Freeman et al. used a Markov network model for super-resolution. Inspired by the ideas of locally linear embedding (LLE) [11], the authors of [12] used the similarity between the manifolds of HR patches and LR patches to estimate HR image patches. Motivated by results in compressive sensing [13], Yang et al. used sparse representation for SR in [14] and [15]. In [16], they introduced coupled dictionary training, in which the sparse representation of LR image patches better reconstructs the HR patches. Recently, joint and coupled learning methods have been utilized for efficient modeling of correlated sparsity structures [17, 15]. However, the joint and coupled learning methods proposed in [14, 15, 18] still do not guarantee that the sparse representation of HR image patches over the HR dictionary is the same as the sparse representation of LR patches over the LR dictionary. To address this problem, in this paper we propose a direct way to train the dictionaries that enforces the same sparse representation for LR and HR patches. Moreover, since the HR dictionary is trained by minimizing the final error in the reconstruction of HR patches, the reconstruction error of our method is smaller.
The rest of this paper is organized as follows. In Section 2, Yang’s method for super-resolution via sparse representation is reviewed. In Section 3, a flaw in Yang’s method is discussed, and our method to solve this problem is presented. Finally, Section 4 is devoted to simulation results.
2 Review of Super-Resolution via Sparse Representation
In SR via sparse representation we are given two sets of training data: a set of LR image patches, and a set of corresponding HR image patches. In other words, in the training data we have pairs of LR and HR image patches. The goal of SR is to use this database to increase the resolution of a given LR image.
Let $\{y_i\}_{i=1}^{N}$ be the set of LR patches (each patch arranged into a column vector $y_i \in \mathbb{R}^n$) and let $\{x_i\}_{i=1}^{N}$ be the set of corresponding HR patches, $x_i \in \mathbb{R}^m$. In SR using sparse representation, the problem is to train two dictionaries, $D_l$ and $D_h$, for the set of LR patches (or a feature of these patches) and the set of HR patches respectively, such that for any LR patch $y$, its sparse representation $\alpha$ over $D_l$ reconstructs the corresponding HR patch using $D_h$: $x \approx D_h \alpha$ [15]. Towards this end, first the dictionary learning problem is briefly reviewed in Section 2.1. Then the dictionary learning method for SR proposed in [15] is studied in Section 2.2. Finally, in Section 2.3, it is shown how the trained dictionaries can be used to perform SR on an LR image.
2.1 Dictionary learning
Given a set of signals $\{s_i\}_{i=1}^{N}$, $s_i \in \mathbb{R}^n$, dictionary learning is the problem of finding a wide matrix $D \in \mathbb{R}^{n \times K}$ (with $K > n$) over which the signals have sparse representations [19]. This problem is closely related to subspace identification [20]; sparsity, however, turns subspace recovery into a well-defined problem. This approach has attracted a lot of attention in the last decade and has found diverse applications [21, 22, 23]. If we denote the sparse representation of $s_i$ over $D$ by $\alpha_i$, the dictionary learning problem can be formulated as
$$\min_{D, \{\alpha_i\}} \sum_{i=1}^{N} \|\alpha_i\|_0 \quad \text{s.t.} \quad \|s_i - D \alpha_i\|_2 \le \varepsilon, \quad i = 1, \ldots, N \tag{1}$$
in which $\|\cdot\|_0$ is the $\ell_0$ pseudo-norm, i.e., the number of nonzero components of a vector, and $\varepsilon$ is a small constant which determines the maximum tolerable error in the sparse representations. Replacing the $\ell_0$ norm by the $\ell_1$ norm, Yang et al. in [15] used the following formulation for sparse coding instead of (1):
$$\min_{D, \{\alpha_i\}} \sum_{i=1}^{N} \left( \|s_i - D \alpha_i\|_2^2 + \lambda \|\alpha_i\|_1 \right) \tag{2}$$
By defining $S = [s_1, \ldots, s_N]$ and $A = [\alpha_1, \ldots, \alpha_N]$, (2) can be rewritten in matrix form as
$$\min_{D, A} \|S - D A\|_F^2 + \lambda \|A\|_1 \tag{3}$$
in which $\|\cdot\|_F$ stands for the Frobenius norm and $\|A\|_1 = \sum_{i,j} |A_{ij}|$. Problems (2) and (1) are not equivalent, but they are closely related: (2) can be interpreted as minimizing the representation error of the signals over the dictionary while forcing these representations to be sparse by adding an $\ell_1$ regularization term to the error. Therefore $\lambda$ can be used as a parameter that balances sparsity against error; a larger $\lambda$ results in sparser representations with larger errors.
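As a concrete illustration, the alternating minimization behind (2)-(3) can be sketched in a few lines of numpy: sparse coding by iterative soft-thresholding (ISTA) for the $\ell_1$ step, and a least-squares (MOD-style) dictionary update. This is a minimal sketch for intuition only, not the solver used in [15]; all function names and parameter values are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(S, D, lam, n_iter=200):
    """Approximately solve min_A ||S - D A||_F^2 + lam ||A||_1 by ISTA."""
    L = np.linalg.norm(D, 2) ** 2            # sigma_max(D)^2; the gradient is 2L-Lipschitz
    A = np.zeros((D.shape[1], S.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - D.T @ (D @ A - S) / L, lam / (2 * L))
    return A

def learn_dictionary(S, n_atoms, lam, n_outer=10, seed=0):
    """Alternate ISTA sparse coding with a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((S.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
    A = sparse_code_ista(S, D, lam)
    for _ in range(n_outer):
        D = S @ np.linalg.pinv(A)            # argmin_D ||S - D A||_F^2
        D /= np.linalg.norm(D, axis=0) + 1e-12
        A = sparse_code_ista(S, D, lam)
    return D, A
```

Normalizing the atoms after each update is a standard device to remove the scale ambiguity between $D$ and $A$.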
2.2 Dictionary learning for SR
Given the sets of LR and HR training patches, $\{y_i\}_{i=1}^{N}$ and $\{x_i\}_{i=1}^{N}$, by defining $X_l = [y_1, \ldots, y_N]$ and $X_h = [x_1, \ldots, x_N]$, and having (3) in mind, Yang et al. in [15] proposed the following joint dictionary learning problem to ensure that the sparse representation of the LR patches over $D_l$ is the same as the sparse representation of the HR patches over $D_h$:
$$\min_{D_l, D_h, A} \|X_l - D_l A\|_F^2 + \|X_h - D_h A\|_F^2 + \lambda \|A\|_1 \tag{4}$$
The key point here is that the same representation matrix $A$ is used for the sparse representation of both the LR and HR patches, which makes sure that their representations are the same over the dictionaries $D_l$ and $D_h$. If we define the concatenated space of HR and LR patches,
$$X_c = \begin{bmatrix} X_l \\ X_h \end{bmatrix}, \qquad D_c = \begin{bmatrix} D_l \\ D_h \end{bmatrix},$$
then the joint dictionary training (4) can equivalently be written as
$$\min_{D_c, A} \|X_c - D_c A\|_F^2 + \lambda \|A\|_1 \tag{5}$$
This formulation has exactly the form of (3). In other words, in the concatenated space, joint dictionary learning is the same as conventional dictionary learning, and any dictionary learning algorithm can be used for joint dictionary learning.
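This equivalence is easy to exercise in code: stack the LR and HR training matrices, run any plain dictionary learner on the stacked data, and split the resulting dictionary back into its LR and HR rows. The learner below is a minimal ISTA/least-squares sketch running on random stand-in data; the dimensions and parameter values are illustrative, not those of [15].

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def dict_learn(S, k, lam=0.5, outer=8, inner=50, seed=0):
    """Plain dictionary learning: ISTA sparse coding + least-squares update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((S.shape[0], k))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((k, S.shape[1]))
    for _ in range(outer):
        L = 2 * np.linalg.norm(D, 2) ** 2    # Lipschitz constant of the gradient
        for _ in range(inner):
            A = soft(A - 2 * D.T @ (D @ A - S) / L, lam / L)
        D = S @ np.linalg.pinv(A)
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D, A

# Joint training (4) via the concatenated space (5):
n, m, N, k = 9, 36, 200, 48                  # LR dim, HR dim, #patch pairs, #atoms
rng = np.random.default_rng(1)
X_l = rng.standard_normal((n, N))            # stand-in LR training patches (columns)
X_h = rng.standard_normal((m, N))            # stand-in HR training patches (columns)
X_c = np.vstack([X_l, X_h])                  # concatenated patch pairs
D_c, A = dict_learn(X_c, k)                  # ordinary dictionary learning on X_c
D_l, D_h = D_c[:n], D_c[n:]                  # split D_c back into D_l and D_h
```

The split in the last line works because each atom of $D_c$ is itself a concatenation of an LR atom and an HR atom.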
2.3 Super-Resolution
After training the two dictionaries $D_l$ and $D_h$, the input LR image can be super-resolved using the following steps:

1. The input LR image is divided into a set of overlapping LR patches $\{y_i\}$.

2. From each image patch $y_i$, subtract its mean $m_i$, and find the sparse representation of the result over $D_l$:
$$\alpha_i = \arg\min_{\alpha} \|(y_i - m_i \mathbf{1}) - D_l \alpha\|_2^2 + \lambda \|\alpha\|_1 .$$

3. Using the sparse representation of each LR patch and its mean, the corresponding HR patch is estimated by
$$\hat{x}_i = D_h \alpha_i + m_i \mathbf{1} .$$

4. Combining the estimated HR image patches (averaging in the overlapping regions), the output HR image is generated.
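The steps above can be sketched in numpy as follows, assuming a 3×3 LR patch size, an upscaling factor of 2, and already-trained dictionaries (the test uses random stand-ins); overlapping HR estimates are combined by simple averaging. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(y, D, lam=0.1, n_iter=200):
    """ISTA for min_a ||y - D a||_2^2 + lam ||a||_1."""
    L = 2 * np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft(a - 2 * D.T @ (D @ a - y) / L, lam / L)
    return a

def super_resolve(img_l, D_l, D_h, p=3, scale=2):
    """Patch-wise SR: code each mean-removed LR patch over D_l, reconstruct
    the HR patch with D_h, and average the overlapping HR estimates."""
    H, W = img_l.shape
    P = p * scale                              # HR patch side
    out = np.zeros((H * scale, W * scale))
    weight = np.zeros_like(out)
    for i in range(H - p + 1):                 # step 1: overlapping LR patches
        for j in range(W - p + 1):
            y = img_l[i:i + p, j:j + p].ravel()
            mu = y.mean()                      # step 2: remove the patch mean
            a = sparse_code(y - mu, D_l)       # step 2: sparse code over D_l
            x_hat = D_h @ a + mu               # step 3: HR patch estimate
            out[i*scale:i*scale + P, j*scale:j*scale + P] += x_hat.reshape(P, P)
            weight[i*scale:i*scale + P, j*scale:j*scale + P] += 1.0
    return out / np.maximum(weight, 1.0)       # step 4: average the overlaps
```

Accumulating into `out` and `weight` and dividing at the end is a common way to implement the overlap-averaging of step 4.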
3 Our Proposed Method
Our method for SR improves the dictionary learning part of Yang’s method, described in Section 2.2. Once the dictionaries are trained, the rest of the method is the same as described in Section 2.3.
As mentioned earlier, in SR the dictionaries should be trained in such a way that the sparse representation of each LR patch reconstructs the corresponding HR patch well. Yang’s method uses (4) to accomplish this: it uses the same sparse representation matrix $A$ for both the LR and HR patches to ensure that each LR patch and its HR counterpart have the same sparse representation. However, as can be seen from (5), this joint dictionary learning is only optimal in the concatenated space of LR and HR patches; if we look at the spaces of LR and HR patches separately, we may find, for some patches, a sparser representation than the one found in the concatenated space.
To address this problem, note first that the SR method described in Section 2.3 consists of two distinct operations: finding the sparse representation of the LR patch, and the reconstruction of the HR patch. Then, note that the first operation uses only $D_l$, and the second operation uses only $D_h$. Therefore, instead of training the dictionaries jointly as in (4), we propose to train $D_l$ on the LR patches alone, and then to train the HR dictionary $D_h$ by minimizing the reconstruction error when the sparse representations of the LR patches are used.
Mathematically, we propose to train the LR dictionary as
$$\min_{D_l, A} \|X_l - D_l A\|_F^2 + \lambda \|A\|_1 \tag{6}$$
which is a conventional dictionary learning problem. After training the LR dictionary, the sparse representation of each LR training patch is found over $D_l$ (note that this step is already performed during the dictionary training in (6)):
$$\alpha_i = \arg\min_{\alpha} \|y_i - D_l \alpha\|_2^2 + \lambda \|\alpha\|_1, \quad i = 1, \ldots, N \tag{7}$$
Using the sparse representations of the LR patches, $A = [\alpha_1, \ldots, \alpha_N]$, the HR dictionary is found such that the reconstruction error of the corresponding HR patches is minimized, that is,
$$D_h = \arg\min_{D} \|X_h - D A\|_F^2 \tag{8}$$
This is an unconstrained quadratic optimization problem which has the following closed-form solution:
$$D_h = X_h A^T (A A^T)^{-1} = X_h A^{\dagger} \tag{9}$$
in which $(\cdot)^T$ and $(\cdot)^{\dagger}$ represent the transpose and the pseudo-inverse of a matrix, respectively.
Note that unlike Yang’s method, in the proposed method $D_h$ is not trained in a way that explicitly enforces the sparsity of the representations of the HR patches over it; rather, it is trained to minimize the final reconstruction error.
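The closed-form solution (9) is one line of numpy. The sketch below uses random stand-ins for $X_h$ and $A$ (in practice they would come from (6) and (7)) and also shows the pseudo-inverse form, which is numerically preferable when $A A^T$ is poorly conditioned.

```python
import numpy as np

# Stand-ins for the training quantities: A holds the sparse codes of the LR
# patches from (7) and X_h the corresponding HR patches (random here).
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 500))
X_h = rng.standard_normal((36, 500))

# Closed-form solution (9): D_h = X_h A^T (A A^T)^{-1} = X_h A^+
D_h = X_h @ A.T @ np.linalg.inv(A @ A.T)
D_h_pinv = X_h @ np.linalg.pinv(A)   # same minimizer, numerically safer
```

Both lines compute the unique minimizer of (8) when $A$ has full row rank; for rank-deficient $A$, only the pseudo-inverse form is well defined.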
4 Simulation Results
In this section we compare the performance of our method with Yang’s method. The error criteria used here are the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index [24]. The PSNR criterion is defined as
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \tag{10}$$
where MSE is the mean square error, given by
$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \big( x(i,j) - \hat{x}(i,j) \big)^2 \tag{11}$$
in which $x$ is the original distortion-free image, $\hat{x}$ is the super-resolved image produced by the SR algorithm, and $M$ and $N$ are the dimensions of the image in pixels.
For the definition of SSIM, refer to [24]. From these definitions it is clear that a higher PSNR means a smaller mean square error; however, it does not necessarily mean better image quality as perceived by the human eye. Many other error criteria, SSIM among them, have been proposed to address this shortcoming of PSNR, but PSNR is still widely used because of its simple mathematical form. This can be seen in (8), where MSE is used to train the HR dictionary. Nevertheless, to show the effectiveness of our method, here we use both SSIM and PSNR to compare the images produced by our method with those of Yang’s method.
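For reference, (10)-(11) translate directly into code; the helper below is a hypothetical utility, assuming 8-bit images with peak value 255.

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    """PSNR in dB between a reference image x and an estimate x_hat, per (10)-(11)."""
    mse = np.mean((np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Note that PSNR diverges for identical images (MSE = 0), so in practice it is only evaluated on distorted estimates.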
To make a fair comparison, the same set of 80000 training patches, sampled randomly from natural images, is used to train the dictionaries for both Yang’s method and ours. The LR patches are magnified by a factor of 2, i.e., each generated HR patch is twice the size of its LR patch in each dimension. The LR patches extracted from the input image have a 4-pixel overlap. The dictionary size is fixed at 1024, and the same value of $\lambda$ is used for both methods as in [15].
In Fig. 1, simulation results of Yang’s method and the proposed method on the Lena image can be seen. The original image and the image magnified using bicubic interpolation are also given as references. The PSNRs of these images are 32.79 dB, 34.73 dB and 34.86 dB for bicubic interpolation, Yang’s method and ours, respectively (see Table 1). The quality of the images magnified by SR is clearly much better than that of the image magnified by bicubic interpolation: the details are more visible, which results in sharper images. The difference between image (c) and image (d), however, is not noticeable visually, although the PSNR of image (d), which is super-resolved by our method, is about 0.13 dB higher.
Table 1: PSNR (dB) and SSIM of the test images for bicubic interpolation, Yang’s method, and the proposed method.

Image      Metric   Bicubic   Yang     Proposed
Lena       PSNR     32.79     34.73    34.86
           SSIM     0.9012    0.9268   0.9283
Parthenon  PSNR     26.50     27.77    27.89
           SSIM     0.8334    0.8737   0.8762
Baboon     PSNR     24.66     25.30    25.39
           SSIM     0.9529    0.9872   0.9873
Barbara    PSNR     27.93     28.61    28.59
           SSIM     0.9609    0.9852   0.9852
Flower     PSNR     30.51     33.24    33.36
           SSIM     0.9230    0.9526   0.9538
Average    PSNR     28.47     29.93    30.02
           SSIM     0.9143    0.9451   0.9462
In Table 1, the PSNRs and SSIMs of several images produced by our method are compared with those of Yang’s method and bicubic interpolation. Almost all of the images recovered by our method have higher PSNRs than the images recovered by Yang’s method. The average PSNRs given in the last rows show that our method performs slightly better than Yang’s method on average.
The SSIMs in Table 1 also confirm that our method performs better than Yang’s method: the images super-resolved by the proposed method have, on average, a higher SSIM than the images recovered by Yang’s method. Since SSIM is much more consistent than PSNR with image quality as perceived by the human eye, the higher SSIM of the images recovered by our method suggests that they also have better visual quality.
5 Conclusion and Future Works
In this paper, we presented a new dictionary learning algorithm for example-based SR. The dictionaries were trained from a set of sample LR and HR image patches in order to minimize the final reconstruction error. Simulation results on real images showed the effectiveness of our algorithm in super-resolving images with less error compared to Yang’s method: the average PSNR and average SSIM of the images produced by our method were higher than those of the images recovered by Yang’s method. In the future, this work can be extended by training the HR dictionary with a better error criterion than MSE. One of the advantages of our method is that the training of $D_h$ in (8) is separated from the training of $D_l$ in (6); we can therefore use an error criterion that better represents image quality, such as SSIM, in (8) without making the training of $D_l$ more complex. Changing the error criterion in Yang’s method would make the optimization in their algorithm much more complex.
References
 [1] Sung Cheol Park, Min Kyu Park, and Moon Gi Kang, “Super-resolution image reconstruction: a technical overview,” Signal Processing Magazine, IEEE, vol. 20, no. 3, pp. 21–36, 2003.
 [2] Lei Zhang and Xiaolin Wu, “An edge-guided image interpolation algorithm via directional filtering and data fusion,” Image Processing, IEEE Transactions on, vol. 15, no. 8, pp. 2226–2238, 2006.
 [3] Peyman Milanfar, Super-Resolution Imaging, CRC Press, 2010.
 [4] R.Y. Tsai and T.S. Huang, “Multiframe image restoration and registration,” Advances in Computer Vision and Image Processing, vol. 1, no. 2, pp. 317–339, 1984.
 [5] Michael Elad and Arie Feuer, “Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images,” Image Processing, IEEE Transactions on, vol. 6, no. 12, pp. 1646–1658, 1997.
 [6] Michael Elad and Arie Feuer, “Superresolution reconstruction of image sequences,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, no. 9, pp. 817–834, 1999.
 [7] David Capel and Andrew Zisserman, “Super-resolution from multiple views using learnt image models,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. IEEE, 2001, vol. 2, pp. II–627.
 [8] Sina Farsiu, M Dirk Robinson, Michael Elad, and Peyman Milanfar, “Fast and robust multiframe super resolution,” Image Processing, IEEE Transactions on, vol. 13, no. 10, pp. 1327–1344, 2004.
 [9] William T Freeman, Thouis R Jones, and Egon C Pasztor, “Example-based super-resolution,” Computer Graphics and Applications, IEEE, vol. 22, no. 2, pp. 56–65, 2002.
 [10] Daniel Glasner, Shai Bagon, and Michal Irani, “Super-resolution from a single image,” in Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009, pp. 349–356.
 [11] Sam T Roweis and Lawrence K Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
 [12] Hong Chang, Dit-Yan Yeung, and Yimin Xiong, “Super-resolution through neighbor embedding,” in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on. IEEE, 2004, vol. 1, pp. I–275.
 [13] Emmanuel J Candès, “Compressive sampling,” in Proceedings of the International Congress of Mathematicians: Madrid, August 22–30, 2006: invited lectures, 2006, pp. 1433–1452.
 [14] Jianchao Yang, John Wright, Thomas Huang, and Yi Ma, “Image super-resolution as sparse representation of raw image patches,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.
 [15] Jianchao Yang, John Wright, Thomas S Huang, and Yi Ma, “Image super-resolution via sparse representation,” Image Processing, IEEE Transactions on, vol. 19, no. 11, pp. 2861–2873, 2010.
 [16] Jianchao Yang, Zhaowen Wang, Zhe Lin, Scott Cohen, and Thomas Huang, “Coupled dictionary training for image super-resolution,” Image Processing, IEEE Transactions on, vol. 21, no. 8, pp. 3467–3478, 2012.
 [17] A. Taalimi, H. Qi, and R. Khorsandi, “Online multimodal taskdriven dictionary learning and robust joint sparse representation for visual tracking,” in Advanced Video and Signal Based Surveillance (AVSS), 2015 12th IEEE International Conference on, Aug 2015, pp. 1–6.
 [18] A. Taalimi, A. Rahimpour, C. Capdevila, Z. Zhang, and H. Qi, “Robust coupling in space of sparse codes for multiview recognition,” in 2016 IEEE International Conference on Image Processing (ICIP), Sept 2016, pp. 3897–3901.
 [19] Michael Elad, Sparse and redundant representations: from theory to applications in signal and image processing, Springer, 2010.
 [20] Mostafa Rahmani and George Atia, “Innovation pursuit: A new approach to subspace clustering,” arXiv preprint arXiv:1512.00907, 2015.
 [21] Shervin Minaee, Amirali Abdolrashidi, and Yao Wang, “Screen content image segmentation using sparse-smooth decomposition,” in 2015 49th Asilomar Conference on Signals, Systems and Computers. IEEE, 2015, pp. 1202–1206.
 [22] Mahdi Abavisani, Mohsen Joneidi, Shideh Rezaeifar, and Shahriar Baradaran Shokouhi, “A robust sparse representation based face recognition system for smartphones,” in 2015 IEEE Signal Processing in Medicine and Biology Symposium (SPMB). IEEE, 2015, pp. 1–6.
 [23] M Joneidi, M Rahmani, HB Golestani, and M Ghanbari, “Eigengap of structure transition matrix: A new criterion for image quality assessment,” in Signal Processing and Signal Processing Education Workshop (SP/SPE), 2015 IEEE. IEEE, 2015, pp. 370–375.
 [24] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli, “Image quality assessment: From error visibility to structural similarity,” Image Processing, IEEE Transactions on, vol. 13, no. 4, pp. 600–612, 2004.