Across-scale Process Similarity based Interpolation for Image Super-Resolution

03/20/2020 ∙ by Sobhan Kanti Dhara, et al. ∙ 2

A pivotal step in image super-resolution techniques is interpolation, which aims at generating high resolution images without introducing artifacts such as blurring and ringing. In this paper, we propose a technique that performs interpolation through an infusion of high frequency signal components computed by exploiting `process similarity'. By `process similarity', we refer to the resemblance between a decomposition of the image at a resolution to the decomposition of the image at another resolution. In our approach, the decompositions generating image details and approximations are obtained through the discrete wavelet (DWT) and stationary wavelet (SWT) transforms. The complementary nature of DWT and SWT is leveraged to get the structural relation between the input image and its low resolution approximation. The structural relation is represented by optimal model parameters obtained through particle swarm optimization (PSO). Owing to process similarity, these parameters are used to generate the high resolution output image from the input image. The proposed approach is compared with six existing techniques qualitatively and in terms of PSNR, SSIM, and FSIM measures, along with computation time (CPU time). It is found that our approach is the fastest in terms of CPU time and produces comparable results.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 26

page 27

page 28

page 29

page 38

page 40

page 42

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Due to constraints in device and sensor technology, an image may not be captured in the required high resolution Park2003. Moreover, for high-speed transmission, the captured image is sometimes down-sampled to lower the bandwidth required wu2009low. The transmitted low resolution image is then required to be processed at the receiver to generate the image in higher resolution. Hence, approaches are needed to construct an image of the desired (higher) resolution from an image at lower resolution. Image super-resolution systems are designed for this purpose Park2003. Super-resolution has a critical role in applications where it is necessary to perform in-depth analysis of local regions for detection and recognition purposes yue2016image.

Most super-resolution strategies involve interpolation as a pivotal step to increase the resolution of an image. But interpolation is prone to artifacts such as blurring and ringing effects. Blurring artifact is due to attenuation at high frequencies which correspond to edges. The ringing artifact is due to the oscillation of signal at sharp edges because of the truncation of high frequency components Gonzalez2002.

Here, we discuss different interpolation techniques starting from the standard and well-known ones to recently proposed techniques in different categories more or less in a chronological order. The standard and well-known interpolation techniques are bilinear, bicubic and Lanczos, which introduce artifacts, especially at the edges Keys1981; Wolberg1990. Due to computational simplicity and fast performance siu2012review, these techniques are used by the standard photo editing software like Photoshop®111https://helpx.adobe.com/in/photoshop/using/resizing-image.html. Moreover, in many recently developed super resolution techniques, the input image is initially interpolated kim2016accurate; kim2016deeply or the chrominance channels are interpolated Dong2013; yang2017deep using one of the basic techniques. To handle the artifacts in the interpolation, different edge directed interpolations are proposed in the literature Jensen1995; Li2001. The technique proposed in Jensen1995 interpolates based on an edge fitting operator to maintain visual integrity. The technique proposed in Allebach1996 interpolates based on the direction of edges, and consists of data correction and edge directed rendering operations. The new edge directed interpolation (NEDI) proposed in Li2001 preserves low resolution covariance values at the higher resolutions based on resolution invariant property Li2001 of edge orientation. Such a property ensures interpolation along the edge direction avoiding blurring. NEDI is one of the widely referred techniques in the domain of interpolation.

Discrete wavelet transform (DWT) mallat1999wavelet decomposes an input image into four frequency subbands having low frequency component, and horizontal, vertical and diagonal edge components, separately. So, DWT based interpolation techniques in Carey1999a; nguyen2000wavelet; Demirel2011; Tian2011 can handle the edges efficiently. Regularity preserving image interpolation proposed in Carey1999aestimates the regularity of edges by measuring the decay of wavelet transform coefficients across scales. The algorithm also preserves the underlying regularity by extrapolating new subbands to be used in the generation of the high resolution image. The DWT-based technique mentioned in Demirel2011 decomposes an input image at finer to coarser levels and the subbands generated at each level are interpolated for estimation. The estimated high frequency subbands using DWT are then modified by using high frequency subbands obtained through the stationary wavelet transform (SWT) nason1995stationary. Then, inverse DWT (IDWT) is applied to get a sharper high resolution image. Prediction of the wavelet coefficients and parameter optimization by ant colony optimization is proposed in Tian2011 for a DWT-based interpolation. In Tian2011, a three-component exponential mixture (TCEM) model is used to analyze magnitude and sign information of wavelet coefficients. Ant colony optimization (ACO) dorigo2010ant technique is used for optimal parameter estimation of the TCEM model.

Sparse representation model (SRM) for image super-resolution is being widely explored Zhang2008; Mallat2010; Yang2010; Dong2011; Dong2013. The SRM based interpolation approach of Mallat2010, sparse mixing estimator (SME), interpolates by estimating a sparse image and by mixing directional interpolators over oriented blocks in a wavelet frame. SME computes mixing coefficients by minimizing norm which is weighted by signal regularity in each block. On the other hand, the sparse basis set-based technique of Zhang2008; Dong2011 learns a map from a relevant training set. Thus the missing information is retrieved or generated from training data. However, in Ebrahimi2007, the authors suggest extraction of information from the input image because it provides limited but more relevant information than that from a universal database Freedman2011. A self-exemplar SRM (without external training) is proposed in the NARM-SRM-NL approach of Dong2013

, where nonlocal autoregressive modeling is used. The earlier autoregressive model has been used

Zhang2008 to exploit the local similarity for interpolation, whereas, NARM-SRM-NL exploits both the local and nonlocal similarity in an adaptive manner and interpolates a pixel as a linear combination of its nonlocal neighbors. Although the algorithm promises improved performance, it suffers from a huge computational burden due to patch clustering and PCA sub-dictionary learning.

Self-similarity based super-resolution (SR) algorithms generate higher resolution images by exploiting statistical prior. These algorithms utilise the internal statistics of LR-HR patch pair Freedman2011; Singh2014; Huang2015. The technique proposed in Freedman2011 searches extremely localized regions for the example patches. However, realistic reproduction of fine-detailed cluttered regions is not achieved, and those regions appear somewhat faceted. Instead of searching for example patches from the input image, the technique proposed in Singh2014 searches for patches in different subbands of the input image. Recently, in Huang2015, the authors proposed a geometric patch transformation model which includes affine and perspective transformations during interpolation by patch matching. However, patch matching and dictionary learning based algorithms are always computationally expensive.

Recently, few end to end deep neural networks have been proposed for super-resolution. SRCNN

dong2014learning; dong2016image

uses the convolutional neural network (CNN)

krizhevsky2012imagenet for image super-resolution. The technique is later extended to DRCN kim2016deeply by increasing the number of layers of the network and incorporating recursive learning. The techniques proposed in kim2016accurate; yang2017deep use residual learning in deep and very deep networks. EDSR lim2017enhanced is an enhanced version of deep residual networks in single-scale architecture. The performance of EDSR is shown to be superior compared to other learning based super-resolution approaches. The super-resolution technique SRGAN ledig2017photo uses the generative adversarial network (GAN) goodfellow2014generative to generate perceptually high quality photo-realistic natural output images. As stated in Singh2014; cruz2018single, such methods are well suited to perform on images of types similar to that in its training set.

Restoration of edges (which is high frequency components) is crucial for the improvement of image visual quality shao2008edge; vishnukumar2014edge. Recently, authors in Acharya2017a proposed the so-called composite predictive technique to efficiently restore the high frequency and very high frequency components of the image. The technique extracts these components from the input low resolution image by exploiting the local statistics of an image. Later, in Acharya2017, the authors proposed a technique where fuzzy local adaptive Laplacian post-processing is used to enhance high frequency details in up-sampled images for interpolation.

The above discussion shows that image structure information at the desired high resolution image is related to that in the low resolution image. Such a relation is considered in many recent interpolation approaches as structural similarity deng2014structural. A higher resolution image can be generated by modeling the relation between its structure information and that in its low resolution image (given input). Such a modeling may not assume structural similarity. The relationship in structure information must be exploited, as it is the only relevant information about the high resolution image available from the input image for constructing the high frequency details.

In this paper, we propose a technique that performs interpolation by infusing high frequency signal components computed leveraging ‘process similarity’. For this, we consider the decomposition of an image at a higher resolution into a lower resolution image and high frequency details. Process similarity refers to the similarity between such a decomposition process of the image at one resolution and that at another resolution. This process similarity allows us to model the relation between structure information of the input low resolution image and that of the desired high resolution image. The modeling can be based on the available relation between structure information of the input image and its low resolution approximation obtained by the decomposition (See Figure 2). To accomplish this, discrete wavelet transform (DWT) and stationary wavelet transform (SWT) are used. The high frequency components and lower resolution approximations obtained through the said transforms are used to get the structural relation between the input image and its low resolution approximation. The structural relation is represented by model parameters obtained through particle swarm optimization (PSO). The parameters are then used to fuse the high frequency components that are subsequently employed to estimate the input image from its low resolution approximation. The PSO works to minimize the error between the actual input image and its estimate. Once the model parameters are obtained, they are then used to generate the desired high resolution image from the input image, just like the estimation process of input image from its low resolution approximation. Note that, the structural relation modeling in our approach should not be considered as a modeling under the well-known concept of self-similarity yang2010exploiting in super resolution. Presence of a structural relation between an image at two different resolutions does not necessarily mean that their structures are similar, that is, structural relation does not imply structural self-similarity.

As mentioned earlier, the concept of process similarity allows modeling of structural relation based on the input image and its low resolution approximation. Therefore, the structural information used for interpolation is obtained from the input image itself, which is more likely to carry relevant information required to generate the image at higher resolution compared to a database of images similar to the input image. Moreover, the high frequency components generated for the infusion are directional sensitive (vertical, horizontal and diagonal). To incorporate location consistency and value exactness (See Section 2), our approach considers both DWT and SWT in the modeling. These are the reasons why our technique when compared to the existing, produces results having lower artifacts, and improves both the subjective and objective resemblance between the output high resolution and the input low resolution images. Finally, the structural relation is represented by model parameters obtained through computationally inexpensive particle swarm optimization. With respect to the algorithms compared, our approach is less expensive in terms of memory and speed requiring only multi-scale decompositions up to two levels, multi-scale reconstruction and optimization using PSO. We consider comparisons of our approach based on visual inspection as well as PSNR, SSIM, Wang2004 and FSIM Zhang2011 measures with the widely accepted edge directed NEDI Li2001, nonlocal autoregressive modeling based NARM Dong2013, recent self-exemplar based SR-TSE Huang2015, the latest fuzzy predictive composite scheme (FPCS-LAL) Acharya2017

, and deep learning based EDSR

lim2017enhanced and SRGAN ledig2017photo. A preliminary version of this work has been published in dhara2017, where a few other existing approaches were considered for result analysis. They, in general, perform inferior to the existing approaches considered here for comparison. As discrete and stationary wavelet transforms, and particle swarm optimization are the core methods employed in our approach, a brief summary of them is provided in A and B.

The overall contributions of the paper are as follows :

  • We propose across-scale process similarity based on discrete wavelet transform (DWT) and stationary wavelet transform (SWT).

  • We model the structural relation between the input image and its low resolution approximation by exploiting the complementary nature of the DWT and SWT, and using particle swarm optimization (PSO).

  • Our approach is the fastest (in terms of CPU time) while producing satisfactory results.

Rest of the paper is organized as follows. The motivation of our proposal is explained in Section 2, our proposed technique is elaborated in Section 3 and the comparative results of interpolation for super-resolution using various techniques including the proposed approach are given in Section 4. Finally, Section 5 concludes the paper.

2 Motivation

A super-resolution system comprises of different sub-modules, namely interpolation, deblurring, artifact removal, denoising, etc. In the system, the pivotal step is interpolation which estimates values for increasing the resolution of the input image Park2003. Interpolation is an ill-posed problem as there is no unique solution for the estimation Dong2013. Different techniques in literature gather and utilize information relevant to the content of the high resolution image to be generated in different ways.

Symbol Description
Image
Scaling function
Wavelet function
k times the resolution of an image X
Scaling factor
M, N M is the row size and N is the column size of an image
or Approximation component after wavelet decomposition
or Vertical details after wavelet decomposition
or Horizontal details after wavelet decomposition
or Diagonal details after wavelet decomposition
Optimal weights as model parameters
Table 1: Description of important symbols used in the paper

To perform interpolation, in this paper, we consider human nature regarding anticipation in perception Bubic2010. Humans expect and thus predict a certain entity by projecting the knowledge about a contextually similar entity from their past experience. In our interpolation approach, we consider the ‘past experience’ to be the process that estimates the low resolution input from its lower resolution approximation. The approach performs the ‘projection of past experience’ by using the aforesaid process to generate the unknown higher resolution image from the low resolution input. Owing to the said human nature, it would not be inappropriate to assume that a human would do interpolation in a manner similar to which we consider to design our algorithm.

Now, according to analysis and synthesis of signals using wavelets Gonzalez2002; Daubechies1990 (A) an image of size at a higher scale (say ) can be generated through inverse wavelet transform (defined in equation (1)) when we have its approximation (LL), vertical details (LH), horizontal details (HL), and diagonal details (HH) in scale.

(1)

where is the scaling function and , and are wavelet functions in horizontal, vertical and diagonal direction, respectively, of scale. The approximation (LL) and the detail coefficients in horizontal (HL), vertical (LH) and diagonal (HH) directions are generated by wavelet decomposition of the signal as shown in (1) using the scaling and wavelet functions.

(2)
(3)

If the aforesaid wavelet based analysis and synthesis is considered for interpolation, then, the input low resolution image can be decomposed into its immediate lower resolution approximation (LL) using (2) and high frequency details using (3). Now, one can assume that only the lower approximation is available for synthesizing back the input image, and not the high frequency details. In such a case, one can try to estimate the unavailable high frequency details from the input image’s lower approximation. The estimated high frequency details can then be used in (1) to synthesize an image, which will be an estimate of the input image. The estimation of the unavailable high frequency details should obviously be done to minimize the error between the actual input image and its estimate. Such an estimation process can be considered analogous to modeling the ‘past experience’. Thus the ‘projection of past experience’ will be analogous to using the process to estimate the input image from its lower approximation for generating (or estimating) the desired high resolution image from the input low resolution image. That is, the input image should be considered as the lower resolution approximation (LL) of the desired high resolution image and the derived estimation model of unavailable high frequency details should be used to compute the high frequency details required to generate the desired high resolution image. With the high frequency details estimated, (1) can be used for the aforesaid generation, achieving interpolation.

The approach described above is what we employ in our technique and refer to it as the process similarity based interpolation. In the term, process signifies the ‘past experience’ (available knowledge) and similarity allows the ‘projection of past experience’.

Figure 1: Similarity between decompositions at different scales

Note that, the use of the process similarity by us is possible due to the resemblance in wavelet decomposition processes at different scales Carey1999a. The fundamental idea behind our proposal is that the higher resolution image is generated from a lower resolution image in a similar way irrespective of the scale. This can be depicted from the illustration shown in Figure 1. Let, (size: MN) denotes the resolution of the input image . So, the output image having the resolution (size: 2M 2N) is assumed to be generated from the image of resolution in a similar way to the generation of the image having resolution from the image of resolution (size: M/2N/2). It is obvious that the input image at resolution and the image at resolution are readily available to optimally model the estimation () of image at from the image at . Here, we consider input image () of resolution as the ideal one to be estimated from the image of resolution to capture and minimize the loss of information during interpolation. Once the modeling is performed, it can be applied recursively to generate an image of any higher resolution (size: MN, =, ) from a given input image of resolution . Note that, by definition, resolution and size of an image may not correspond to each other. As assuming resolution and size to be the same does not affect our analysis, we use them interchangeably for simplicity.

As mentioned earlier, we consider both the discrete wavelet transform (DWT) and the stationary wavelet transform (SWT) in modeling the process similarity. SWT is considered because of its scale-invariance property nason1995stationary. This allows us to have exact correspondence between spatial locations in an image and its SWT decomposed approximation (LL). Scale-invariance property does not hold for DWT decomposition, as it involves down-sampling. On the other hand, an application of DWT on an image yields an approximation (LL) with faithful value representation. This is because the basis functions used for DWT decomposition are not only similar across scales Carey1999a but also work at the same translations across scales Gonzalez2002; Daubechies1990. This is evident from the well-known recursion expression relating DWT basis functions at two consecutive scales:

(4)

where is scaling coefficient, and we see that a translation of the basis function is related to a translation of the down-sampled basis function , signifying equal amount of translation. Such a correspondence does not hold for SWT decomposition.

Note that, although the above discussion of SWT and DWT considers the decomposed approximation (LL), the same holds for the decomposed detail coefficients (HL, LH, and HH) as well. Thus, for SWT decomposition, location correspondence between an image and its decomposed coefficients is accurate. However, the values in the decomposed coefficients might not faithfully represent that in the image. So, we can consider that the location correspondence is our wanted component and the inexact value as the unwanted one in SWT usage. For DWT decomposition, it is the other way around, and hence, we can consider that the location inconsistency is our unwanted component and the value faithfulness as the wanted one.

Figure 2: Block diagram of the proposed interpolation approach for image super-resolution

3 Proposed process similarity based interpolation

Our image interpolation approach uses process similarity (See Section 2) to model the structural relation between scales in the image under consideration. The model is then used to generate the image at the required higher resolution. As elaborated in Section 1, across-scale subband self-similarity as in Singh2014; piao2007image must not be confused with across-scale process similarity, as the latter gives structural relation whereas the former assumes subband similarity. Our proposed algorithm is shown in Figure 2. It consists of two modules: modeling and generation. In the modeling stage, the input low resolution image is used to estimate a model capturing the structural relation between scales in the form of certain optimal weights . In the generation stage, these optimal weights are used to generate the higher resolution image from the input low resolution image. The modules of the proposed technique are elaborated below.

3.1 Module 1: Modeling

3.1.1 Part 1: High frequency components from discrete (DWT) and shift-invariant (SWT) wavelet decomposition

As shown in Figure 2, in the proposed approach, DWT decomposes the signal to half () the input resolution . Four subbands, approximation image LL, vertical details LH, horizontal details HL, and diagonal details HH are obtained with the latter three being the high frequency subbands. Further, the approximation LL is decomposed by both SWT and DWT which will generate subbands at resolutions and , respectively. The corresponding high frequency subbands at these two different resolutions are weighted by parameters and then fused using across-scale addition to generate three components (LH, HL, and HH), which are used in parameter optimization. For across-scale addition, upsampling followed by Gaussian filtering (zero-phase filter) of the lower resolution () components is performed before element-wise addition at the higher resolution ().

The parameter values determined from the optimization (See Section 3.1.2) are tuned to model the structural relation due to process similarity. That is, we consider components at resolutions and which are used to estimate the image at , similar to how we will use components at and to estimate the desired image at resolution . Recursive implementation of this operation can generate an image at a desired high resolution with scaling factor =, .

The model parameters (weights ) are optimized to generate the LH, HL, and HH components at resolution such that error between the generated image at and the input image is minimized. Obviously, the image at is generated using the optimal LH, HL, and HH along with the LL at through IDWT.

3.1.2 Part 2: Parameter optimization

As mentioned in Section 2, our across-scale process similarity based high resolution image generation process uses value faithfulness provided by DWT and location invariance provided by SWT. We use DWT and SWT to get three high frequency subbands that are obtained through across-scale addition.

During across-scale addition, the three components each from DWT and SWT are weighted by model parameters which would represent the structural relation between scales. Among the model parameters, the weights correspond to the three components from DWT and the weights correspond to those from SWT. To represent the structural relation between scales appropriately, these model parameters need to be optimized (). To do so, as mentioned in Section 3.1.1, we first decompose the input image into lower resolution () components and then use the components to optimally get an estimate of the input image. The optimization minimizes the error between the actual input image at resolution and its estimate. Particle swarm optimization (PSO) is a standard and widely used evolutionary optimization algorithm for various applications zhang2015comprehensive. PSO is very efficient in reaching an optimal solution utilizing limited memory, and faster computation kirchmaier2013swarm with ‘primitive mathematical operations’ chen2010saliency. Moreover, the optimization algorithm does not require any prior information about the solution. Global and local explorations of the particles ensure effective and faster optimization. The simple yet robust algorithm can optimize the solution through parallel processing. Therefore, we use PSO (as discussed in B) whose details are given in Section 4.1.

Note that the presence of across-scale process similarity allows these optimal weight parameters to represent the structural relation. A lower optimal weight value corresponding to a component from DWT suggests that the error contributed by location correspondence inaccuracy in it is dominating the correctness contributed by its value faithfulness. A higher optimal weight value would signify the opposite. On the other hand, a lower optimal weight value corresponding to a component from SWT suggests that the error contributed by value inexactness in it is dominating the correctness contributed by its accurate location correspondence. Again, a higher optimal weight value would signify the opposite.

Therefore, the optimal weights essentially represent the suitability of each of the six high frequency bands that are used to generate the image with resolution from the image with resolution . Due to process similarity, as elaborated under the generation stage, we use these weights to generate an unavailable higher resolution image from the input image at resolution .

3.2 Module 2: Generation

In the generation stage, the model parameters which represent the structural relation, will be used on the input image to generate the image at the desired higher resolution. The approach employs the optimal weights determined in the modeling stage (See Section 3.1). The generation block of our algorithm is shown in Figure 2 within the dashed (- - -) box. As mentioned earlier, our high resolution image generation process uses DWT and SWT.

Initially, during the generation, the input image of resolution is decomposed by DWT and SWT operation. We have four subbands LL, LH, HL, and HH of resolution and each due to DWT and SWT, respectively. Due to the presence of process similarity, we now use the optimal weights determined in the modeling stage to get the high frequency components to be used in the generation. During fusion by across-scale addition of corresponding high frequency components from DWT and SWT, the six subbands are multiplied with the corresponding optimal weights. The fusion generates high frequency components LH, HL, and HH at resolution . Then IDWT is applied, where the available input image acts as the approximation LL and the three estimated components LH, HL, and HH as the required high frequency details.

Further, to generate an image of a desired resolution , where =, ), one would require to apply the image generation stage of the proposed interpolation approach times recursively.

4 Experimental Results and Discussions

Experiments are performed to compare the proposed method with six existing methods and the results are reported here. The performance of our approach is compared to that of edge directed NEDI Li2001, nonlocal autoregressive modeling based NARM Dong2013, self-exemplar based SR-TSE Huang2015, fuzzy predictive composite scheme (FPCS-LAL)Acharya2017, and deep learning based EDSR lim2017enhanced, SRGAN ledig2017photo. EDSR is a recent technique which performs well for a variety of images and represents the state-of-the-art. The other five approaches are popular in their respective categories, with a couple of them (SRGAN, FPCS-LAL) being recently reported ones. The comparisons here are made both quantitatively and qualitatively in a multifaceted analysis, where the computation time is also considered. Note that, the codes of NARM, SR-TSE, FPCS-LAL, EDSR, and SRGAN were provided by the corresponding authors. We have used the trained models corresponding to the particular decimations for both EDSR and SRGAN as exactly provided by the corresponding authors.

4.1 Implementation details:

Our algorithm is based on wavelet decomposition, and hence it is required to choose a wavelet function (/scaling function). As our approach analyzes the representation of an image in its approximation and detail high frequency components to perform interpolation, we choose two different wavelet functions for the experiments, which differ substantially in representing local regularity of a signal. We consider Daubechies wavelet with support 4 (db2) and discrete approximation of Meyer filter (support is 102), where the latter excels in approximating signals with low local regularity/smoothness. The proposed method using Daubechies wavelet is referred here as ‘proposed (Daub)’ and that using discrete Meyer wavelet as ‘proposed (Meyer)’. Note that, the proposed (Daub) and proposed (Meyer) follow exactly the same approach (See Figure 2) except for the wavelet used in decomposition and reconstruction.

Our algorithm also uses PSO where we need to provide initialization and stopping criteria. We empirically observed that stopping with the maximum number of iterations as 15 after initializing the population with 15 randomly generated particles in 6-dimensional space (, [0 1]) works efficiently to produce at least near-optimal solution. Our PSO is designed for early termination. It stops if there is no change in the fitness value of the best solution in consecutive runs after the number of iterations is greater than 7. The minimum number of iterations is considered as 7 to allow sufficient exploration of the solution space . PSNR is considered as the fitness/objective function whose maximization results in minimization of error between the desired image and the generated image . This optimization function is as follows

(5)

The source code of our approach can be downloaded from here222https://drive.google.com/open?id=1jHw9r6gvT7AC8EnmozBV4z4U8AvFnAOe.

4.2 Computation speed

The increasing developments in display technology for electronic gadgets and content delivery through data streaming demands faster interpolation for immediate display of content without compromising much on quality. Such real-life applications require an algorithm to get high resolution, good quality output as fast as possible. Therefore, as far as interpolation is concerned in today’s world, the computation time of an algorithm is still an important factor.


Input Low
Resolution
Technique (speed in seconds
NEDI NARM SR-TSE FPCS-LAL
Proposed
(Daub)
Proposed
(Meyer)
0.593 13.81 7.57 0.38 0.38 0.022
0.93
0.075
2.68 64.16 34.57 4.23 0.54 0.014
1.40
0.119
11.56 290.37 187.72 152.65 0.93 0.093
2.71
0.218


Table 2: Computation time comparison between different interpolation algorithm for scaling factor 2 (Best result in bold)

Input Low
Resolution
Technique (speed in seconds)
NEDI SR-TSE FPCS-LAL
Proposed
(Daub)
Proposed
(Meyer)
3.22 22.00 4.36 1.23 0.108
2.74
0.208
13.70 113.18 148.65 1.78 0.036
4.72
0.370
54.65 534.77 3072 5.79 0.201
12.75
1.0


Table 3: Computation time (CPU time) comparison between different interpolation algorithm for scaling factor 4 (Best result in bold)

So, in this paper, we consider the computation time as an essential parameter to judge the efficiency of an algorithm. We use an Intel®i5(3.30 GHz) machine with 16 GB RAM and use MATLAB®(version 2016a) for this purpose. Our target is to perform fast interpolation which generates a high resolution image exploiting process similarity at low resolution without introducing any visual artifact. To calculate the computation speed, we have considered the 16, images from USC SIPI Miscellaneous gray scale image database333http://sipi.usc.edu/database/, and down-sampled them to generate images of , and sizes. We consider gray scale images of above mentioned three sizes as we perform an analysis of computation speed variations with the change in 2D matrix size (image size). The average computation time (CPU time) of all the algorithms over 16 images of a particular size are presented in Table 2 and 3

for scaling factors 2 and 4, respectively. As the proposed approach is based on heuristic optimization, the computation time may slightly vary due to early termination and random initialization. So, for our algorithm, we have considered 5 runs for each image to present the average computation time (CPU time) (over 16

5 runs) along with the standard deviation, which is negligible. For NARM, we have shown the computation time (CPU time) only for scaling factor 2 as the code

444http://www4.comp.polyu.edu.hk/ cslzhang/NARM.htm provided does not have an implementation of factor 4. Deep learning based algorithms such as EDSR, and SRGAN require a considerable number of low resolution (LR)-high resolution (HR) image pairs for learning, which (CPU time) consumes considerable amount of time in graphical processing units (GPUs) cruz2018single. Moreover, the implementations of EDSR and SRGAN by their authors (trained networks) also require GPUs to generate the output. As a fair comparison of computation times between approaches implemented in CPU and GPU is not possible, we do not include the same.

From Table 2, it is evident that FPCS-LAL and proposed (Daub) algorithm have the least computation time (CPU time) of 0.38 seconds for an input image of yielding an output of size . If the input size increases, our proposed (Daub) becomes substantially faster compared to the other algorithms compared here. Our proposed (Meyer) algorithm is slightly slower than the proposed (Daub) algorithm, but much faster when compared with other interpolation techniques presented in the table. Table 3, where the scaling factor is 4, shows that for an input image of size to , our proposed algorithms are much faster than all the existing algorithms compared here. So, our proposed approach is most suited among all the algorithms here when time is an important factor of application.

It is to be noted that the faster speed achieved by our algorithms is not by utilizing large memory space. This is because memory is occupied only by the multi-scale decompositions up to two levels, and the input and output images. In addition, a little memory is required Poli2007 for the PSO particles and parameters.

4.3 Subjective Evaluation:

Computation time (CPU time) comparison above shows that our proposed algorithms are considerably faster than the other algorithms and the difference increases in favor of our algorithms with the increase of the input image size and scaling factor.

With the above observation, it becomes important to analyze and compare the interpolation performance. Before we go into quantitative evaluation, where decimation/down-sampling of images from databases would be required, we present qualitative evaluation here. For qualitative analysis, we consider images from databases as input images for interpolation, and generate twice and four times larger images by the various interpolation approaches. For this experiment, we consider USC SIPI Miscellaneous image database (24 gray images and 16 color images). While the entire set of results is shared online111https://drive.google.com/file/d/1-X-ENfISl1tSlOOJsWAR5FxIHn-OSl0s/view?usp=sharing/, we consider a few examples here to summarize the generic observations on interpolation by the various approaches.

In Figure 3, we show the subjective comparison on a gray scale image considering the scaling factor to be 4. The cropped portion of the whole image is shown here for better visualization222Visualization is appropriate when looked at 100% on a computer screen. The observations made later may not be visible in print.. We see that NEDI’s output is smoother than all the others, which results in blurring and distortions especially at regions having changing gray values. SR-TSE, FPCS-LAL, and EDSR produce much sharper outputs than NEDI but are prone to ringing artifacts at regions of drastic change and appearance of the non-existent pattern in homogeneous regions. SRGAN produces artifacts which look like fixed pattern noise. Our approach produces negligible amount of the artifacts mentioned above, but are prone to some amount of jaggies at the boundary regions. As can be seen, the overall impression from the results is that the proposed algorithms perform better than the others. In Figure 4, we show another such (scaling factor 4) subjective comparison now on a color image. We again see here the artifacts mentioned above in the results by NEDI, SR-TSE, FPCS-LAL, EDSR, SRGAN and our proposed algorithms. Further, we can see that the artifacts in the results of the proposed algorithms is less pronounced compared to that in the others.

In Figures 3 and 4, although both the proposed algorithms do better than the others and also contain some amount of jaggies artifact, there are differences in their results. We have observed in our experiments that the jaggies introduced by the proposed algorithms are mainly due to aliasing, but with different characteristics. As can be seen, in Figure 3, the effect of aliasing by the proposed (Meyer) algorithm is concentrated at the mid-frequencies (like texture regions). On the other hand, aliasing by the proposed (Daub) algorithm is spread across all frequencies in small amount and is pronounced at high frequencies (sharp edges, see Figures 2(g) and 3(g)).

Further, in Figures 5 and 6, we show subjective comparison considering scaling factor 2 on gray and color images, respectively. Some of the observations similar to Figures 3 and 4 such as NEDI’s smoother output and appearance of the non-existent pattern in outputs of SR-TSE, FPCS-LAL, SRGAN, and EDSR are evident upon close observation. NARM, like NEDI, also produces a smooth output. However, upon quick observation it is hard to find a substantial difference between all the results. In such a situation, it is obvious that one should prefer an algorithm with the least computation time (CPU time), which is, our proposed (Daub) algorithm.

Now, let us consider a situation where it is possible for a user to choose an algorithm for an image. The following may be done based on the subjective evaluation of interpolation performance. In such a scenario, the proposed (Daub) algorithm should be chosen for images having a good amount of contents with changing pixel values (mid frequency) like textures. Whereas, the proposed (Meyer) algorithm should be employed if the changes in an image are mostly drastic (high frequency) and it has a good amount of homogeneous regions (low frequency). However, if for any input image, it is not possible for a user to choose an algorithm among proposed (Daub) and proposed (Meyer) for generating the output, then proposed (Daub) algorithm should automatically be chosen because of

  • The lower computation time (CPU time).

  • The fact that natural images contain very less high frequency contents than low and mid frequency ones.

(a) Original
(b) NEDI
(c) SR-TSE
(d) FPCS-LAL
(e) EDSR
(f) SRGAN
(g) Proposed(Daub)
(h) Proposed(Meyer)
Figure 3: A cropped version of gray scale Man image for scaling factor of 4
(a) Original
(b) NEDI
(c) SR-TSE
(d) FPCS-LAL
(e) EDSR
(f) SRGAN
(g) Proposed(Daub)
(h) Proposed(Meyer)
Figure 4: A cropped version of Pepper image for scaling factor of 4
(a) Original
(b) NEDI
(c) NARM
(d) SR-TSE
(e) FPCS-LAL
(f) EDSR
(g) SRGAN
(h) Proposed(Daub)
(i) Proposed(Meyer)
Figure 5: A cropped version of gray scale Man image for scaling factor of 2
Original
(a) NEDI
(b) NARM
(c) SR-TSE
(d) FPCS-LAL
(e) EDSR
(f) SRGAN
(g) Proposed(Daub)
(h) Proposed(Meyer)
Figure 6: A cropped version of color Pepper image for scaling factor of 2

4.4 Quantitative Evaluation:

Unlike subjective evaluation, to evaluate the methods quantitatively, we need a known target/reference based on which the generated high resolution image can be assessed. Therefore, we generate the low resolution input images considering the actual images in databases as the desired high resolution images.

As stated in Dong2013, low resolution images to be used as the inputs are usually obtained by decimation (smoothing followed by down-sampling) of those images that are considered as the desired high resolution images. As stated in Demirel2011; Tian2011, wavelet approximation filter is used for the smoothing, as said in ledig2017photo, Gaussian filter is used for smoothing, as stated in Li2001; Mallat2010 and Dong2011 only down-sampling without smoothing is considered. The smoothing in the decimation can be achieved using different filters with different properties. However, an interpolation algorithm might be particularly suited to a kind of smoothing used to produce the low resolution input for quantitative evaluation. This could happen when the interpolation uses operators/filters consistent with a particular kind of decimation.

Therefore, we choose to treat the kind of smoothing used in the decimation as a parameter. To evaluate an algorithm’s quantitative performance corresponding to an image, we calculate the PSNR, SSIM and FSIM values for five types of inputs obtained through the following operations

  • Bicubic approximation(Bicubic)

  • Wavelet filter approximation (Daubechies)

  • Wavelet filter approximation (D-Meyer)

  • Gaussian filtering followed by down-sampling (Gaussian)

  • Down-sampling without any smoothing (Sub-sampling)

For quantitative comparisons, experiments are carried out using standard USC SIPI Miscellaneous image database (24 gray images and 16 color images), BSD image database (100 color images), SET 5 (5 color images) and SET 14 (14 color and gray scale images) Huang2015. These images offer different types of challenges for an interpolation algorithm to deal with. As PSO based optimization in our algorithm can yield slightly different results in different runs, like in Section 4.2, we consider the average performance over 5 runs of our algorithm on each image. These average performances are obtained for all the images in a database, and the average over all these images are reported. The standard deviation quantifying variation in average performance over all images in a database during the 5 runs is found to be in the order of dB (PSNR), and hence it is not reported individually. For all the different cases, the quantitative results are listed in tables with the best performing result in bold.


Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 27.60 27.78 31.24 30.41 32.07 30.83 29.27 28.09
Daubechies 28.25 29.14 29.15 29.31 29.22 24.14 30.24 29.39
D-Meyer 28.95 30.00 27.05 27.35 26.96 26.31 29.09 30.68
Gaussian 28.55 29.12 27.94 27.91 27.99 27.45 29.11 29.93
Sub-sampling 28.06 28.96 24.70 25.30 24.40 28.40 27.46 28.02
Average 28.28 29.00 28.02 28.06 28.13 27.43 29.03 29.22


Table 4: PSNR(dB) comparison of gray scale images from USC SIPI Miscellaneous database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.7972 0.8013 0.8867 0.8714 0.8995 0.8859 0.8427 0.8216
Daubechies 0.8280 0.8433 0.8543 0.7607 0.8619 0.7076 0.8786 0.8587
D-Meyer 0.8337 0.8540 0.8066 0.8166 0.8108 0.7834 0.8526 0.8765
Gaussian 0.8204 0.8271 0.8238 0.8208 0.8293 0.8133 0.8494 0.8578
Sub-sampling 0.8340 0.8495 0.7330 0.7581 0.7316 0.8407 0.8244 0.8218
Average 0.8223 0.8350 0.8208 0.8055 0.8266 0.8062 0.8495 0.8472

Table 5: SSIM comparison of gray scale images from USC SIPI Miscellaneous database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.9342 0.9365 0.9694 0.9618 0.9747 0.973 0.9479 0.9381
Daubechies 0.9451 0.9520 0.9699 0.9658 0.9711 0.9075 0.9724 0.9503
D-Meyer 0.9527 0.9616 0.9286 0.9313 0.9296 0.9267 0.9558 0.9616
Gaussian 0.9437 0.9500 0.9352 0.9337 0.9365 0.9338 0.9576 0.9578
Sub-sampling 0.9426 0.9491 0.9093 0.9103 0.9092 0.9649 0.9335 0.9361
Average 0.9437 0.9499 0.9424 0.9405 0.9442 0.9412 0.9534 0.9487

Table 6: FSIM comparison of gray scale images from USC SIPI Miscellaneous database for scaling factor 2

Tables 4, 5 and 6 show comparative results considering images from USC SIPI Miscellaneous gray scale image database in terms of PSNR, SSIM, and FSIM, respectively, for the scaling factor of 2. Table 4 shows that among all the algorithms compared on the basis of PSNR, EDSR performs the best for input generated by bicubic approximation, proposed (Daub) algorithm for input generated by Daubechies wavelet approximation, the proposed (Meyer) algorithm for input generated by Meyer wavelet approximation and Gaussian filtering followed by down-sampling, and NARM for input generated by only down-sampling.

But on an average over all types of inputs, the performance of our proposed approach is superior. PSNR value of proposed (Meyer) and proposed (Daub) algorithms are respectively 0.22 dB and 0.03 dB higher than the next best, NARM. Tables 5 and 6 respectively considering SSIM and FSIM also show that different algorithms perform the best for different input types, with the proposed approach being the best for Daubechies and Meyer wavelet approximations, and Gaussian filtering followed by down-sampling. On an average over all types of inputs, our proposed (Daub) algorithm yields slightly better results.


Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 30.60 31.13 36.48 34.92 37.82 35.59 33.18 31.12
Daubechies 31.23 33.25 33.85 33.75 33.95 32.69 34.90 33.22
D-Meyer 32.49 35.12 30.09 30.26 29.75 29.11 32.59 35.47
Gaussian 32.11 33.74 31.24 31.00 31.18 30.64 32.67 34.37
Sub-sampling 32.00 34.56 28.34 28.76 27.94 27.42 31.64 33.11
Average 31.66 33.56 32.00 31.74 32.13 31.09 33.00 33.46

Table 7: PSNR(dB) comparison of color images from USC SIPI Miscellaneous database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 27.63 27.84 31.12 30.55 32.17 31.22 29.53 28.16
Daubechies 28.55 29.37 29.12 29.18 29.07 28.53 30.10 29.31
D-Meyer 29.22 30.29 26.95 27.24 26.7 26.46 28.82 30.70
Gaussian 28.74 29.31 27.92 27.91 27.96 27.74 28.85 29.82
Sub-sampling 28.78 29.39 24.59 25.16 23.99 24.01 27.51 28.00
Average 28.54 29.24 27.94 28.00 27.98 27.59 28.97 29.20

Table 8: PSNR(dB) comparison of color images from BSD database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 30.33 30.49 36.46 35.19 37.97 35.54 33.48 30.65
Daubechies 28.95 32.73 33.41 33.13 33.13 31.71 34.35 32.72
D-Meyer 32.8 34.93 29.63 29.75 29.37 28.78 31.98 35.44
Gaussian 32.15 33.27 30.65 30.56 30.69 30.35 31.93 33.97
Sub-sampling 32.58 34.79 28.05 28.35 27.44 26.95 31.00 33.07
Average 31.36 33.24 31.64 31.40 31.72 30.66 32.55 33.17

Table 9: PSNR(dB) comparison of color images from SET 5 database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)

Bicubic
27.59 27.94 32.29 31.28 33.30 31.90 29.94 28.03
Daubechies 28.43 29.81 28.47 29.89 28.06 28.91 30.76 29.71
D-Meyer 29.42 31.18 26.99 27.29 26.63 26.31 26.24 31.52
Gaussian 28.95 30.00 28.01 27.96 28.03 27.74 29.09 30.42
Sub-sampling 28.94 30.51 25.05 25.56 24.19 24.14 27.96 29.09
Average 28.67 29.89 28.16 28.40 28.04 27.80 28.80 29.75

Table 10: PSNR(dB) comparison of gray scale and color images from SET 14 database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.8803 0.8847 0.9359 0.9274 0.9431 0.9247 0.9124 0.8927
Daubechies 0.9001 0.9119 0.9195 0.9220 0.9257 0.8988 0.9278 0.9157
D-Meyer 0.9066 0.9192 0.8817 0.8872 0.8817 0.8521 0.9082 0.9273
Gaussian 0.8975 0.9045 0.8954 0.8929 0.8986 0.8818 0.9079 0.9185
Sub-sampling 0.9062 0.9188 0.8351 0.8505 0.8327 0.7912 0.8931 0.8955
Average 0.8981 0.9078 0.8935 0.8960 0.8964 0.8697 0.9099 0.9100

Table 11: SSIM comparison of color images from USC SIPI Miscellaneous database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.7873 0.7938 0.8856 0.8740 0.899 0.8843 0.8509 0.8264
Daubechies 0.8217 0.8421 0.8556 0.8574 0.8634 0.8429 0.8716 0.8544
D-Meyer 0.8331 0.8562 0.8064 0.8135 0.8058 0.7838 0.8449 0.8784
Gaussian 0.8112 0.8218 0.8172 0.8136 0.8224 0.8109 0.8351 0.8518
Sub-sampling 0.8353 0.8511 0.7323 0.7558 0.7235 0.7063 0.8257 0.8141
Average 0.8177 0.8330 0.8194 0.8228 0.8228 0.8057 0.8457 0.8457

Table 12: SSIM comparison of color images from BSD database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)

Bicubic
0.8935 0.8956 0.9532 0.944 0.9594 0.9398 0.9306 0.9041
Daubechies 0.8869 0.9248 0.9333 0.93 0.9343 0.9045 0.9388 0.9283
D-Meyer 0.9212 0.9346 0.8908 0.893 0.8887 0.8548 0.9179 0.9423
Gaussian 0.9115 0.9184 0.9073 0.9043 0.9097 0.8916 0.9173 0.9332
Sub-sampling 0.9224 0.9386 0.8479 0.8591 0.8376 0.8005 0.9049 0.9148
Average 0.9071 0.9224 0.9065 0.9061 0.9059 0.8782 0.9219 0.9245

Table 13: SSIM comparison of color images from SET 5 database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.8135 0.8229 0.9042 0.8928 0.9153 0.8955 0.8722 0.8359
Daubechies 0.8459 0.8671 0.8533 0.8780 0.8442 0.8552 0.8896 0.8741
D-Meyer 0.8572 0.8806 0.8257 0.8324 0.8218 0.7950 0.7231 0.8950
Gaussian 0.8393 0.8527 0.8397 0.8361 0.844 0.8274 0.8560 0.8744
Sub-sampling 0.8573 0.8783 0.7586 0.7810 0.7431 0.7162 0.8431 0.8497
Average 0.8426 0.8603 0.8363 0.8441 0.8337 0.8179 0.8368 0.8658

Table 14: SSIM comparison of gray scale and color images from SET 14 database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.9407 0.9422 0.9814 0.9727 0.9866 0.9807 0.9631 0.9466
Daubechies 0.9570 0.9627 0.9719 0.9706 0.9729 0.9656 0.9765 0.9634
D-Meyer 0.9658 0.9731 0.9398 0.9418 0.9399 0.9329 0.9596 0.9753
Gaussian 0.9578 0.9622 0.9487 0.9463 0.9499 0.9461 0.9608 0.9704
Sub-sampling 0.9640 0.9704 0.9225 0.9266 0.9234 0.9149 0.9516 0.9587
Average 0.9570 0.9621 0.9528 0.9516 0.9546 0.9481 0.9517 0.9629

Table 15: FSIM comparison of color images from USC SIPI Miscellaneous database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.8682 0.8731 0.9400 0.9283 0.9503 0.9441 0.9114 0.9063
Daubechies 0.8966 0.9080 0.9188 0.9192 0.9224 0.9148 0.9288 0.9169
D-Meyer 0.9051 0.9189 0.8820 0.8888 0.8815 0.8745 0.9092 0.9289
Gaussian 0.8815 0.8890 0.8959 0.8925 0.8967 0.8973 0.9033 0.9143
Sub-sampling 0.9161 0.9264 0.8490 0.8634 0.8458 0.8407 0.9040 0.8885
Average 0.8935 0.9031 0.8971 0.8984 0.8993 0.8943 0.9113 0.9110

Table 16: FSIM comparison of color images from BSD database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.9257 0.9262 0.9748 0.9604 0.9822 0.9739 0.9499 0.9307
Daubechies 0.9344 0.9503 0.9554 0.9488 0.9572 0.9432 0.9599 0.9483
D-Meyer 0.9498 0.9586 0.9173 0.9166 0.9164 0.9029 0.9408 0.9574
Gaussian 0.9407 0.9454 0.9370 0.9320 0.9381 0.9347 0.9438 0.9548
Sub-sampling 0.9577 0.9677 0.8979 0.9013 0.8954 0.8827 0.9390 0.9422
Average 0.9417 0.9496 0.9456 0.9318 0.9379 0.9275 0.9467 0.9467

Table 17: FSIM comparison of color images from SET 5 database for scaling factor 2

Input Low
Resolution
Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.9174 0.9212 0.973 0.9637 0.9783 0.9727 0.9501 0.9286
Daubechies 0.9406 0.9489 0.9562 0.9602 0.9325 0.9530 0.9684 0.9523
D-Meyer 0.9520 0.9628 0.9224 0.9243 0.9209 0.9168 0.8756 0.9684
Gaussian 0.9382 0.9454 0.9302 0.9278 0.9312 0.9295 0.9454 0.9578
Sub-sampling 0.9489 0.9580 0.8990 0.9067 0.8953 0.8926 0.9354 0.9459
Average 0.9394 0.9472 0.9362 0.9366 0.9316 0.9329 0.9350 0.9506

Table 18: FSIM comparison of gray scale and color images from SET 14 database for scaling factor 2

The computationally intensive algorithms proposed in Yang2010; Dong2013, generate high resolution color images, by applying their proposed interpolation technique on the luminance component and a simple interpolation technique like the cubic keys1981cubic method to generate chromatic component555This is evident from the codes provided as well. As our approach is computationally fast, we leverage this advantage and apply our interpolation to all the channels. The benefits of our approach are thus reflected in all the three channels.

Tables 7 - 18 show comparative results in terms of PSNR, SSIM, and FSIM considering scaling factor of 2 for the images from the USC SIPI Miscellaneous color and BSD, SET 5, and SET 14 databases. Here, we see the technique performing the best for different kinds of inputs is similar, that is, EDSR performs the best for input generated by bicubic approximation, proposed (Daub) algorithm for input generated by Daubechies wavelet approximation, proposed (Meyer) algorithm for input generated by Meyer wavelet approximation (except FSIM in SET 5) and NARM for input generated by only down-sampling. For Gaussian, the best performing algorithm is our proposed (Meyer). On an average over all types of inputs, the performance of our algorithm is comparable with the best result (that of NARM) in terms of PSNR. However, our algorithms’ average performance is the best in terms of SSIM for all databases and also in terms of FSIM except for SET 5 database.

Tables 19 - 33 show the comparative results of PSNR, SSIM and FSIM similar to that of Tables 4 - 18, except that the scaling factor is 4 instead of 2. Here, we find that different techniques perform the best for different kinds of inputs. On an average over all types of inputs, the performance of our algorithm is the best for all the databases except for USC SIPI Miscellaneous gray database in terms of PSNR. But in terms of SSIM and FSIM, our techniques’ performance is comparable with the best which is one of the algorithms among SR-TSR, EDSR, and SRGAN.

The above quantitative analysis shows that on an average performance over all types of inputs considered here, our proposed algorithms in terms of PSNR, SSIM and FSIM perform either the best or near to the best considering different databases containing color and gray scale images, and different scaling factors. Among the proposed algorithms, we do not find enough evidence here to choose one over the other. But one can choose the proposed (Daub) algorithm based on the observations made during qualitative analysis.

So, through the computation time (CPU time), qualitative and quantitative analyses, we find that the proposed approach not only consistently produces the best or near-best results but also does so in the least time (CPU time). Moreover, the lower computation time (CPU time) becomes more significant with the increase in the input image size and scaling factor.

Note that, our proposed approach suffers from a few drawbacks due to the use of heuristic optimization, PSO. Due to the stochastic behavior of PSO, our approach can yield slightly different results in different runs. Although the performance for quantitative evaluation shows that there can be a very slight variation from the average values, the algorithms may not produce the best result possible by them in a particular run. Moreover, the computation time may slightly vary in each run due to the early termination. In addition, as mentioned already, although our approach produces negligible amount of artifacts than the other algorithms, it is prone to appearance of some amount of jaggies at the boundary regions.

4.5 Auxiliary analysis

We provide a couple of additional analysis of the proposed approach. The first analysis is regarding the fact that the proposed approach is designed to generate images considering any scaling factor of the form , . When one desires a scaling factor , which cannot be expressed as , we suggest that a value of is considered such that . Then, the interpolated image of larger size can be decimated by using any simple and fast symmetric spline spath1995two approximation.

(a)
(b)
(c)
(d)
Figure 7: Subjective Performance of our proposed (Daub) for Color and Gray Scale Image

We show the results obtained by performing such an interpolation of scaling factor 3 using the proposed (Daub) algorithm in Figure 7. As we see for both the gray and color images, no additional artifacts have been introduced in the images of Figures 6(b) and 6(d) due to the decimation using bicubic approximation applied on Figures 6(a) and 6(c), respectively.

The second analysis here is regarding further improvement in computation time (CPU time) of the proposed approach. In this analysis, we show that by compromising a little in the interpolation performance of the proposed approach, results can be obtained much faster. Consider Table 34, which shows computation times for proposed (Daub) algorithm similar to those discussed in Tables 2 and 3. However, the computation times in Table 34 have been obtained by altering the proposed (Daub) algorithm’s PSO based optimization, where instead of zero change in fitness value during iteration, a change dB is checked for termination. As can be seen, comparing the computation times in Table 34, and Tables 2 and 3, there is about average improvement in the case where a scaling factor is 2 and about average improvement when a scaling factor is 4. With the alteration in the proposed (Daub) algorithm, it was observed that the average interpolation performance over all images in the USC SIPI Miscellaneous gray scale database was reduced by only around dB (in terms of PSNR) when input images were obtained by Daubechies wavelet filter approximation.

The final analysis provided here is on performance efficiency. Considering the USC SIPI Miscellaneous gray image database, Table 35 lists the average performance for the scaling factor 2 of different approaches along with the average time (CPU time) taken to achieve them. Doing so, it sheds some light on the rate of quantitative performance with respect to computational expenses. As can be seen, the proposed approach is the most efficient one, achieving the best average performance in the least average time.

4.6 Domain-specific Interpolation

In this section, we aim to analyse the performance and applicability of our proposed approach for domain specific interpolation. For this, we consider interpolation of human face and text images.

(a) Original
(b) Original
(c) EDSR
(d) EDSR
(e) SRGAN
(f) SRGAN
(g) Proposed(Daub)
(h) Proposed(Daub)
(i) Proposed(Meyer)
(j) Proposed(Meyer)
Figure 8: Cropped version of a gray scale face image (left column) and a color face image (right column) for scaling factor of 2

Human face images are images where a wide range of frequencies exist just like many other natural images. As we have shown earlier that our proposed algorithm performs well for natural images (five databases), we expect it to do well in face images as well. For the experiment on face images, we consider the standard gray scale face images from Jaffe666http://www.kasrl.org/jaffe.html (213 images) database and the three color face images from SET5 database.

To quantitatively evaluate the performance, we calculate the PSNR, SSIM and FSIM values for five types of inputs generated through the five approaches mentioned in Section 4.4. We compare results of our approach with the two best performing existing algorithms found in our earlier quantitative and subjective evaluations, which are EDSR and SRGAN. We present the average PSNR, SSIM and FSIM values over the five types of inputs for scaling factors of 2 and 4 in Tables 36-39. The results show that our proposed approach performs the best in terms of PSNR and is as good as or better than the two best performing algorithms EDSR and SRGAN in terms of SSIM and FSIM. For subjective evaluation, we have shown the performance on cropped versions of a gray scale image taken from the Jaffe database and a color image taken from the SET5 database. Figures 8 and 9 show the subjective evaluation of the two images for scaling factors of 2 and 4, respectively. It shows that our proposed approach generates outputs as good as or better than the other two.

(a) Original
(b) Original
(c) EDSR
(d) EDSR
(e) SRGAN
(f) SRGAN
(g) Proposed(Daub)
(h) Proposed(Daub)
(i) Proposed(Meyer)
(j) Proposed(Meyer)
Figure 9: Cropped version of a gray scale face image (left column) and a color face image (right column) for scaling factor of 4
(a) Original
(b) Original
(c) EDSR
(d) EDSR
(e) SRGAN
(f) SRGAN
(g) Proposed(Daub)
(h) Proposed(Daub)
(i) Proposed(Meyer)
(j) Proposed(Meyer)
Figure 10: Cropped version of a hand written text image (left column) and a machine printed text image (right column) for scaling factor of 2
(a) Original
(b) Original
(c) EDSR
(d) EDSR
(e) SRGAN
(f) SRGAN
(g) Proposed(Daub)
(h) Proposed(Daub)
(i) Proposed(Meyer)
(j) Proposed(Meyer)
Figure 11: Cropped version of a hand written text image (left column) and a machine printed text image (right column) for scaling factor of 4

On the other hand, text images are images which mainly have very low and very high frequencies alone unlike most natural images. As pointed out in walha2016resolution, text images have sudden discontinuity, regularity in fine pattern and continuity of same intensity values along a particular direction. As our proposed approach is not targeted to work specifically on such special images, we do not expect it to perform well in text images. For an analysis, we have taken cropped versions of handwritten and machine printed images from standard IAM777http://www.fki.inf.unibe.ch/databases/iam-handwriting-database/download-the-iam-handwriting-database database to perform subjective comparison. Figures 10 and 11 show the interpolated output for scaling factors of 2 and 4, respectively. Interestingly, we find qualitatively that the performance of our algorithm is comparable to that of the two best performing existing algorithms EDSR and SRGAN even in the case of text images.

Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 26.11 27.11 26.54 27.92 27.22 25.92 26.08
Daubechies 26.39 26.24 26.19 26.40 26.09 26.47 26.29
D-Meyer 26.30 25.76 26.03 25.79 25.54 26.15 26.70
Gaussian 25.92 26.29 25.92 26.54 26.26 25.89 26.13
Sub-sampling 25.28 21.85 23.05 21.40 21.76 23.69 23.81
Average 26.00 25.45 25.54 25.61 25.37 25.62 25.80

Table 19: PSNR(dB) comparison of gray scale images from USC SIPI Miscellaneous database for scaling factor 4
Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.6532 0.7291 0.7082 0.7565 0.7352 0.6880 0.6927
Daubechies 0.6677 0.7213 0.7084 0.7439 0.7199 0.7201 0.6956
D-Meyer 0.6633 0.7030 0.6967 0.7158 0.6929 0.6860 0.7103
Gaussian 0.6414 0.7101 0.6867 0.7231 0.711 0.6709 0.6858
Sub-sampling 0.6624 0.5810 0.6194 0.5872 0.5749 0.6485 0.6320
Average 0.6596 0.6889 0.6839 0.7053 0.6868 0.6827 0.6833

Table 20: SSIM comparison of gray scale images from USC SIPI Miscellaneous database for scaling factor 4
Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.8553 0.9186 0.8955 0.9303 0.9257 0.8791 0.8783
Daubechies 0.8686 0.9019 0.8928 0.9110 0.9078 0.8985 0.8848
D-Meyer 0.8695 0.8998 0.8937 0.9044 0.8961 0.8940 0.8938
Gaussian 0.8418 0.9048 0.8847 0.9125 0.9056 0.8768 0.8750
Sub-sampling 0.8673 0.8208 0.8383 0.8214 0.8223 0.8544 0.8420
Average 0.8605 0.8892 0.8810 0.8959 0.8915 0.8801 0.8748


Table 21: FSIM comparison of gray scale images from USC SIPI Miscellaneous database for scaling factor 4
Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 27.9 30.96 30.00 32.37 30.99 28.78 29
Daubechies 28.08 29.40 29.19 29.36 28.91 29.46 29.58
D-Meyer 28.03 28.67 28.90 28.33 28.00 29.26 30.04
Gaussian 27.61 29.43 28.82 29.94 29.31 29.00 29.14
Sub-sampling 27.07 25.21 26.3 24.29 24.54 27.31 27.42
Average 27.74 28.73 28.64 28.86 28.35 28.76 29.04

Table 22: PSNR(dB) comparison of color scale images from USC SIPI Miscellaneous database for scaling factor 4
Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 25.34 26.85 26.39 27.58 27.02 25.84 25.91
Daubechies 25.58 26.02 26.03 26.06 25.92 26.18 26.21
D-Meyer 25.57 25.69 25.90 25.51 25.42 26.06 26.49
Gaussian 24.69 26.09 25.82 26.44 26.12 25.90 25.95
Sub-sampling 24.51 21.69 22.91 20.83 21.42 23.73 23.69
Average 25.14 25.27 25.41 25.28 25.18 25.54 25.65

Table 23: PSNR(dB) comparison of color scale images from BSD database for scaling factor 4

Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 27.09 30.21 29.26 31.98 30.57 27.96 28.21
Daubechies 27.40 28.6 28.46 28.7 28.32 28.79 29.06
D-Meyer 27.26 27.77 28.12 27.51 27.24 28.66 29.48
Gaussian 26.73 28.51 28.00 29.16 28.58 28.17 28.31
Sub-sampling 26.64 24.34 25.62 23.12 23.63 26.76 26.94
Average 27.02 27.89 27.89 28.10 27.67 28.07 28.40

Table 24: PSNR(dB) comparison of color images from SET 5 database for scaling factor 4

Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 23.47 27.39 26.51 28.33 27.49 25.65 25.83
Daubechies 25.34 26.16 26.05 25.90 25.82 26.26 26.30
D-Meyer 25.20 25.47 25.81 25.19 24.99 26.16 26.62
Gaussian 24.82 26.27 25.70 26.58 26.15 25.83 25.88
Sub-sampling 24.21 19.93 23.03 20.54 21.12 24.01 23.97
Average 24.61 25.04 25.42 25.31 25.11 25.58 25.72

Table 25: PSNR(dB) comparison of gray scale and color images from SET 14 database for scaling factor 4
Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.7895 0.8493 0.8289 0.8697 0.8399 0.8104 0.8127
Daubechies 0.8003 0.8358 0.8229 0.8497 0.8174 0.8232 0.8180
D-Meyer 0.7950 0.8162 0.8098 0.8207 0.7823 0.8064 0.8201
Gaussian 0.7825 0.8287 0.8118 0.8416 0.8199 0.8096 0.8123
Sub-sampling 0.7964 0.7392 0.7598 0.7323 0.6937 0.7857 0.7718
Average 0.7927 0.8138 0.8067 0.8228 0.7906 0.8070 0.8070

Table 26: SSIM comparison of color images from USC SIPI Miscellaneous database for scaling factor 4
Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.6400 0.7121 0.6923 0.7362 0.7124 0.6738 0.6735
Daubechies 0.6531 0.7008 0.69132 0.7204 0.6965 0.6914 0.6853
D-Meyer 0.6495 0.6878 0.6827 0.6959 0.6684 0.6782 0.6941
Gaussian 0.6145 0.6838 0.6686 0.7000 0.6829 0.6677 0.6677
Sub-sampling 0.6495 0.5816 0.6105 0.5735 0.5615 0.6444 0.6266
Average 0.6413 0.6732 0.6690 0.6852 0.6643 0.6711 0.6694


Table 27: SSIM comparison of color images from BSD database for scaling factor 4

Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.7792 0.8593 0.8272 0.8048 0.8598 0.8029 0.8088
Daubechies 0.7917 0.8388 0.8167 0.8566 0.8229 0.8173 0.8195
D-Meyer 0.7823 0.8147 0.8012 0.8173 0.7790 0.8029 0.8220
Gaussian 0.7658 0.8297 0.8074 0.8524 0.8291 0.8033 0.8075
Sub-sampling 0.7910 0.7260 0.747 0.7044 0.6803 0.7848 0.7694
Average 0.7820 0.8137 0.7999 0.8071 0.7942 0.8022 0.8054

Table 28: SSIM comparison of color images from SET 5 database for scaling factor 4

Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.6260 0.7518 0.7243 0.7785 0.7496 0.7009 0.7025
Daubechies 0.6878 0.7353 0.7207 0.7506 0.7231 0.7218 0.7135
D-Meyer 0.6826 0.7159 0.7083 0.7208 0.6866 0.7072 0.7205
Gaussian 0.6609 0.7169 0.6997 0.7379 0.7176 0.6979 0.6970
Sub-sampling 0.6845 0.5321 0.6398 0.5928 0.5743 0.6776 0.6539
Average 0.6684 0.6904 0.6986 0.7161 0.6902 0.7011 0.6975

Table 29: SSIM comparison of gray scale and color images from SET 14 database for scaling factor 4
Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.8751 0.9301 0.9092 0.9465 0.9364 0.8948 0.8961
Daubechies 0.8874 0.9171 0.9051 0.9257 0.9181 0.9054 0.9027
D-Meyer 0.8865 0.8995 0.8958 0.9039 0.8924 0.8951 0.9067
Gaussian 0.8701 0.9117 0.8971 0.9215 0.9161 0.8948 0.8968
Sub-sampling 0.8888 0.8500 0.8639 0.8466 0.8358 0.8816 0.8754
Average 0.8816 0.9017 0.8942 0.9088 0.8998 0.8943 0.8955



Table 30: FSIM comparison of color images from USC SIPI Miscellaneous database for scaling factor 4
Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.7435 0.8246 0.8038 0.8378 0.8361 0.7874 0.7870
Daubechies 0.7611 0.8286 0.8165 0.836 0.8351 0.8063 0.8052
D-Meyer 0.7643 0.8204 0.8128 0.8243 0.8205 0.8024 0.8107
Gaussian 0.7200 0.7985 0.7795 0.8048 0.8088 0.7733 0.7721
Sub-sampling 0.7932 0.7764 0.8031 0.7588 0.7677 0.8158 0.8080
Average 0.7564 0.8093 0.8031 0.8123 0.8137 0.7970 0.7966


Table 31: FSIM comparison of color images from BSD database for scaling factor 4

Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.838 0.9004 0.7963 0.8682 0.9203 0.8582 0.8607
Daubechies 0.8502 0.8870 0.8700 0.9012 0.8897 0.8685 0.8700
D-Meyer 0.8455 0.8713 0.8605 0.8798 0.8658 0.8599 0.8722
Gaussian 0.8317 0.8817 0.8613 0.9038 0.8967 0.8576 0.8595
Sub-sampling 0.8660 0.8236 0.8406 0.8067 0.8049 0.8641 0.8538
Average 0.8463 0.8728 0.8457 0.8719 0.8755 0.8617 0.8632

Table 32: FSIM comparison of color images from SET 5 database for scaling factor 4

Input Low
Resolution
Technique
NEDI SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Bicubic 0.8010 0.9005 0.8745 0.9172 0.9097 0.8622 0.8605
Daubechies 0.8465 0.8874 0.8743 0.8917 0.8879 0.8753 0.8716
D-Meyer 0.8482 0.8712 0.8665 0.8725 0.8646 0.8706 0.8771
Gaussian 0.8190 0.8595 0.8554 0.8833 0.8809 0.8546 0.8530
Sub-sampling 0.8539 0.7665 0.8291 0.7907 0.7915 0.8539 0.8414
Average 0.8337 0.8570 0.8600 0.8711 0.8669 0.8633 0.8607

Table 33: FSIM comparison of gray scale and color images from SET 14 database for scaling factor 4
Scaling
Factor
Input Image Size (Time in Seconds)
0.215
0.010
0.312
0.021
0.645
0.031
0.69
0.044
1.12
0.039
3.79
0.051
Table 34: Computation Time (CPU time) of Proposed (Daub) obtained by compromising around dB in PSNR for USC SIPI Miscellaneous gray scale image database for input generated by Wavelet filter approximation (Daubechies)
Computational Efficiency Technique
NEDI NARM SR-TSE FPCS-LAL EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
Avg. Time(sec) 14.36 403 210 526 GPU GPU 1.29 3.76
Avg. PSNR 28.28 29 28.02 28.06 28.12862 27.42518 29.03 29.22
Avg. SSIM 0.8223 0.835 0.8208 0.8055 0.826608 0.806168 0.8495 0.8472
Avg. FSIM 0.9437 0.9499 0.9424 0.9405 0.944217 0.941171 0.9534 0.9487
Table 35: Computational efficiency analysis on USC SIPI Miscellaneous gray image database for scaling factor 2

5 Conclusion

The paper proposes an approach that performs interpolation exploiting across-scale ‘process similarity’ in wavelet decomposition. The first stage of our proposed approach is the modeling stage, where a model capturing the structural relation between scales is estimated. The model is in the form of optimal weights obtained through particle swarm optimization (PSO). The weights are used to fuse the high frequency components that are subsequently employed to estimate the input image from its low resolution approximation. The second stage is the generation stage, where these optimal weights modeling the structural relation between scales are used to generate the higher resolution image from the input image. In this stage, the approach considers the input image as the low resolution approximation of the output high resolution image to be generated.

We find through qualitative analysis that our process similarity based interpolation approach produces results with fewer artifacts. Quantitative analysis shows that on an average, our proposed approach, in terms of PSNR, SSIM and FSIM perform either the best or near-best considering different databases containing color and gray scale images, and different scaling factors. The major advantage of our proposed approach is that it is very fast in terms of CPU computation time, much faster than all the algorithms compared.

The proposed approach has been implemented by performing both DWT and SWT based on the findings that they complement each other where the former provides value faithfulness and the latter provides accurate location correspondence. The implementation of our approach has been performed considering two different wavelet functions (Daubechies wavelet with support 4 and discrete approximation of Meyer filter with support 102) yielding two proposed algorithms. Subjective and quantitative comparisons between these algorithms have shown one of them to be marginally more preferable.

The modeling performed in this paper employing process similarity is based on wavelet decomposition. In future, process similarity can be explored as an independent concept beyond the use of wavelet decomposition.

Measure Technique
EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
PSNR 30.37 28.93 30.75 30.93
SSIM 0.8352 0.7833 0.8417 0.8392
FSIM 0.9358 0.9152 0.9384 0.9376
Table 36: Quantitative evaluation of gray scale face images from Jaffe database for scaling factor 2
Measure Technique
EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
PSNR 33.00 31.59 33.64 34.11
SSIM 0.9367 0.8688 0.9139 0.9162
FSIM 0.9579 0.9402 0.9593 0.9602
Table 37: Quantitative evaluation of color face images from SET5 for scaling factor 2
Measure Technique
EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
PSNR 27.41 27.10 27.59 27.85
SSIM 0.7516 0.7234 0.7508 0.7506
FSIM 0.8780 0.8780 0.8730 0.8740
Table 38: Quantitative evaluation of gray scale face images from Jaffe for scaling factor 4
Measure Technique
EDSR SRGAN
Proposed
(Daub)
Proposed
(Meyer)
PSNR 28.91 28.88 29.64 29.94
SSIM 0.8447 0.7904 0.8101 0.8134
FSIM 0.8998 0.8981 0.8960 0.8972
Table 39: Quantitative evaluation of color face images from SET5 for scaling factor 4

6 Acknowledgment

Authors would like to thank Mr. Debanjan Sengupta and Ms. Anusha Vupputuri, Ph.D. students at IIT Kharagpur, for their help in completing this article. The authors would also like to thank the anonymous reviewers, as their contribution to bring the article to its best version is enormous.

References

References

Appendix A Discrete and Stationary Wavelet Transforms

Wavelet transform mallat1999wavelet is a powerful multi-resolution analysis (MRA) mallat1989theory, designed to obtain the components of input signal at every location in different scales /resolutions. The discrete wavelet transform (DWT) mallat1989theory works on discrete signals to produce discrete coefficients quantifying the said components. A discrete signal can be represented as follows

(6)

With unique real valued expansion coefficients , a acts as a basis function over which the signal is represented. In other words, given all the s and the corresponding set of basis functions, the signal can be generated.

In the field of wavelet-based analysis and synthesis, the basis functions are of two types, known as scaling function and wavelet function. The scaling functions defined as s satisfy certain properties. In fact, , where , respectively represent scale and translation, and .

One property satisfied by scaling functions, which is the heart of discrete wavelet transform, is

(7)

where and s are known as scaling function coefficients. Note that the above expression relates s at consecutive scales. Another function called the wavelet function () is also defined, which is related to the scaling function as follows

(8)
(9)

where s are known as wavelet function coefficients. Note that, the above expression relates and at consecutive scales with being at the finer scale. In such condition, any function can be decomposed to approximation coefficients (low frequency component) with scaling function and detail coefficients (high frequency component) with wavelet function. They are collectively referred to as the wavelet coefficients.

Now let us consider a discrete signal having M samples such that . So, is the highest (finest) possible scale and is the lowest (coarsest) possible scale. As the input signal is at the scale, it is decomposed into approximation and detail coefficients at scale as follows

(10)
(11)

where and denote the approximation and detail coefficients, respectively. The decomposed coefficients are combined in the following way for exact reconstruction.

(12)

To generate the approximation and detail coefficients, the following convolutions and downsampling can be performed employing scaling and wavelet function coefficients.

(13)
(14)

Just as the scaling and wavelet function coefficients are used for the analysis above, they can be used for synthesis as well to reconstruct the signal. The diagram 12 depicts the one level decomposition (analysis) followed by reconstruction (synthesis) procedure.

Figure 12: One level decomposition and reconstruction of a 1D signal through discrete wavelet transform, with evaluated only at , that is, and

The procedure applied on can be further applied on the approximation coefficients at scales till , if required, to obtain their approximation and detail coefficients. This suggests that is considered the approximation co-efficient at the finest scale, that is, .

As the presence of downsampling and upsampling results in loss of translation invariance, in stationary wavelet transform (SWT) nason1995stationary, exactly the same procedure of decomposition is followed, but the downsampling and upsampling are not performed.

The above discussion considers signals of single dimension (e.g. time series). To handle 2-dimensional signals like images, scaling and wavelet functions are required for both row-wise and column-wise operations. This leads us to

(15)

where and are the two dimensional scaling and wavelet functions, respectively, which are assumed to be separable functions. Note that the wavelet functions are directionally sensitive as well. The wavelet decomposition and reconstruction for a 2-dimensional signal is defined in (1), (2), (3) which are the 2-dimensional extension of 1-dimensional representations defined in (11),(12),(14) respectively.

Appendix B Particle Swarm Optimization

Input: Fitness function to be maximized;
Position initialization ;
Velocity initialization ;
Initialize ;
Initialize cognitive parameter , social parameter , inertia weight ;
Initialize ;
Output:
while termination criteria not reached do
       for  to  do
             if  then
                  
             end if
             if  then
                  
             end if
             
       end for
      for  to  do
             for  to  do
                  
             end for
            
       end for
       
end while
Algorithm 1 Particle Swarm Optimization

Particle swarm optimization (PSO) eberhart1995new acquires its inspiration from movement of the flock of birds searching for food and is very good at approximating solutions using relatively less sensitive and limited number of parameters compared to other such heuristic algorithms in a computationally inexpensive manner eberhart1995new; eberhart1998comparison; lee2006application; omran2004particle. Particle’s initial position and velocity are initialized either based on the prior knowledge of the problem or randomly. Movement of a particle in search space is evaluated based on two criteria. The first criterion is, personal experience, which is represented by the best value achieved by every individual particle and is termed as personal best (). The second criterion is, neighboring particle (including own) experience, which is represented by best value among all personal bests obtained so far in the population, and is known as global best (). In each iteration, position and corresponding velocity of each particle are updated based on its previous velocity, and .

The generic procedure of particle swam optimization (PSO) is shown in Algorithm 1. In a population of N particles, the corresponding particle position with respect to a D-dimensional search is defined as and velocity of the particle is defined as . The positions of particles in each dimension are initialized randomly in the range [ ] and the corresponding velocities in the range [ ]. A fitness function that optimizes desired criteria is employed. In every iteration, PSO computes the fitness of all solutions in the population, and updates and their . The updates take place only when the current and have higher fitness values respectively than the previous. Then they are used along velocity values to stochastically update the population (or position) to be employed in the next iteration. As mentioned by the authors eberhart1995new, the implementation of PSO requires only primitive mathematical operators making it computationally inexpensive in terms of memory and speed. But the technique may get trapped in local minima for a non-convex problem especially with limited particles and iterations.

It is straightforward that in our use of PSO, the weights form 6-dimensional search space from where the of Algorithm 1 takes values, and the fitness function is PSNR. In our algorithm, we have set the values of , as 2 and as 1.