I Introduction
Millions of images are captured and viewed daily on social networks and photo-sharing sites like Facebook and Flickr; it is estimated that 300 million photos are uploaded to Facebook every day. The explosive growth in the volume of these images is outpacing the cost decrease of storage devices, and thus lossy image compression remains indispensable in today's visual communication systems. The most prevalent compression format for images is still JPEG (Joint Photographic Experts Group, https://jpeg.org/): a lossy image compression standard whose first and most popular version was finalized more than two decades ago. JPEG is a block-based transform coding scheme, where an image is first divided into non-overlapping 8x8 pixel blocks, each transformed via the discrete cosine transform (DCT) to coefficients, which are then lossily quantized and entropy coded.

Broadly speaking, there are two approaches to decoding a JPEG image. Given the encoded quantization bin indices of the DCT coefficients in a pixel block, hard decoding chooses the bin centers as the reconstructed coefficients and performs inverse DCT to recover the block's pixels. Inevitably, as the quantization bin sizes increase (coarser quantization), the resulting reconstruction quality worsens correspondingly.
Instead, one can take a soft decoding approach: each DCT coefficient is only constrained to lie within the indexed quantization bin, and the reconstruction value is chosen with the help of signal priors. This approach has been taken in many previous works [1, 2, 3, 4] and is the one we take in this paper. The challenge thus lies in identifying appropriate signal priors and incorporating them into an effective soft decoding algorithm.
In this paper, we combine three signal priors—the Laplacian prior for DCT coefficients [5], and the sparsity prior and graph-signal smoothness prior for image patches [6, 7]—to build an efficient soft decoding algorithm for JPEG images. Specifically, for each code block in an image, we first use the Laplacian prior—modeling the probability distributions of DCT coefficients as Laplacian—to compute an initial signal estimate in closed form that minimizes the expected mean square error (MMSE). However, optimizing non-overlapping blocks individually can lead to block artifacts. Using a sparse signal model for a larger patch that straddles multiple code blocks, one can remove block artifacts using an over-complete dictionary of atoms [8] trained offline from a large dataset of natural image patches.

The complexity of computing a code vector is high when the dictionary is large [8]. Unfortunately, when one reduces the dictionary size, recovery of high DCT frequencies in image patches becomes poor. We explain this phenomenon by drawing an analogy to low-rate vector quantizers (VQ) [9] when the DCT frequencies are modeled as Laplacian distributions.

To satisfactorily recover high DCT frequencies, we design a new graph-signal smoothness prior, where the key assumption is that the desired signal (pixel patch) contains mainly low graph frequencies. Our graph-signal smoothness prior is constructed using left eigenvectors of the random walk graph Laplacian matrix (LERaG); compared to previous graph-based priors [10, 11, 12], LERaG has desirable image filtering properties with low computation overhead. We demonstrate how LERaG can facilitate recovery of high DCT frequencies of piecewise smooth (PWS) signals via an interpretation of low graph frequency components as relaxed solutions to normalized cut in spectral clustering [13]—the closer a target signal is to PWS, the more easily its pixels are divided into clusters, and the more likely LERaG can restore the signal. Finally, we construct a soft decoding algorithm using the three signal priors with appropriate prior weights. Experimental results show that our proposal noticeably outperforms state-of-the-art soft decoding algorithms in both objective and subjective evaluations.
The outline of the paper is as follows. We first review related works in Section II and formulate the soft decoding problem in Section III. We then discuss the three signal priors in order—the Laplacian prior, the sparsity prior and the graph-signal smoothness prior—in Sections IV, V and VI, respectively. In Section VII, we combine the priors into a JPEG soft decoding algorithm. Finally, we present experimental results and conclusions in Sections VIII and IX, respectively.
II Related Work
II-A Soft Decoding of JPEG Images
There are two general approaches in the literature to reconstruct a compressed JPEG image: deblocking and dequantization. The purpose of deblocking is to remove the block artifacts in a JPEG image. By regarding compression noise as additive white Gaussian noise, deblocking works on the decoded JPEG image directly and recovers it much like denoising, via predefined prior models such as local smoothness [14, 15], Gaussian processes [16], sparsity [17], etc. However, the nonlinearity of the quantization operations in JPEG makes quantization noise signal-dependent—far from white and independent. Inaccurate modeling of compression artifacts limits the restoration performance of deblocking. Dequantization, also called soft decoding, treats JPEG image restoration as an ill-posed inverse problem: find the most probable transform coefficients in a code block subject to the indexed quantization bin constraints, with the help of signal priors and optimization. Soft decoding utilizes all available information, such as the indexed quantization bin boundaries and natural image priors, and hence has more potential to achieve better restoration performance. Soft decoding is also the approach taken in this paper. In the following, we review some popular and state-of-the-art methods in the literature.
Many dequantization methods exploit image priors in the pixel or transform domain to address the inverse problem, such as the classical projection onto convex sets (POCS) method [1], the total variation (TV) model [2], and non-local self-similarity in the DCT domain [18]. Following its success in other restoration applications, the sparsity prior has also shown promise in combating compression distortions [3, 19, 20]. For instance, Chang et al. [3] proposed to learn the dictionary from the input JPEG image, and used total variation and the quantization constraint to further limit the solution space. However, a dictionary trained from the JPEG image itself will also learn noise patterns as atoms; in some cases, this method therefore enhances rather than suppresses compression artifacts. Very recently, Zhao et al. [20] proposed to combine a structural sparse representation prior with the quantization constraint; this scheme achieves state-of-the-art soft decoding performance. As shown by the theoretical analysis in Section V-C, however, the sparsity prior cannot satisfactorily preserve high-frequency information when the dictionary size is small.
II-B Graph Laplacian Regularizer
Leveraging recent advances in graph signal processing [7, 21], the graph Laplacian regularizer has shown its strength as a prior in a wide range of inverse problems, such as denoising [22, 11, 23], bit-depth enhancement [24] and deblurring [25]. The definition of the graph Laplacian regularizer relies on the graph Laplacian matrix, which describes the underlying structure of the graph-signal. Two definitions of the graph Laplacian matrix are typically used: the combinatorial Laplacian and the normalized Laplacian.

Most existing methods [22, 11, 23, 24] utilize the combinatorial graph Laplacian—which is real, symmetric and positive semi-definite—to define the smoothness prior. Its spectrum carries a notion of frequency. However, its filtering strength depends on the degrees of the graph vertices. Several normalized graph Laplacian variants have been proposed so that the filtering strength is independent of the vertex degrees. One popular option is to normalize each edge weight symmetrically [7]. The symmetrically normalized graph Laplacian has similar properties to the combinatorial Laplacian, except that it lacks the DC component; it therefore cannot handle constant signals well. [25] proposed a doubly stochastic graph Laplacian, which is symmetric and contains the DC component. However, it requires non-trivial computation to identify the transformation matrices that make the rows and columns of the Laplacian matrix stochastic.
In our previous work [26], we used a graph-signal smoothness prior based on the combinatorial graph Laplacian for soft decoding of JPEG images. In this paper, we propose a new smoothness prior using left eigenvectors of the random walk graph Laplacian matrix to overcome the drawbacks of the combinatorial graph Laplacian. Further, we provide a thorough analysis from a spectral clustering perspective to explain why our proposed graph-signal smoothness prior can recover high DCT frequencies of piecewise smooth signals.
III Problem Formulation
III-A Quantization Bin Constraint
We begin with the problem setup. In the JPEG standard, each non-overlapping 8x8 pixel block (called a code block in the sequel) in an image is compressed independently via transform coding. Specifically, each code block x in vector form is transformed via DCT to 64 coefficients y = T x, where T is the transform matrix. The i-th coefficient y_i is quantized using quantization parameter (QP) Q_i—assigned a quantization bin (q-bin) index (q-index) q_i as:

q_i = round(y_i / Q_i).    (1)
Because natural images tend to be smooth, most AC coefficients are close or equal to zero. JPEG also employs larger QPs at higher frequencies [27] due to humans' lower visual sensitivity to fast-changing details, resulting in a compact signal representation in the DCT domain for most code blocks.
At the decoder, having received only the q-index q_i, there remains uncertainty when recovering y_i; in particular, all that is known is:

(q_i - 0.5) Q_i ≤ y_i < (q_i + 0.5) Q_i.    (2)

This quantization constraint defines the search space for y_i during restoration of the code block.
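As a small self-contained sketch of (1) and (2)—with hypothetical coefficient and QP values—the encoder's rounding and the bin boundaries available at the decoder can be written as:

```python
import numpy as np

# Toy illustration of JPEG quantization (1) and the q-bin constraint (2).
# y: hypothetical DCT coefficients; Q: hypothetical quantization parameters.
y = np.array([57.3, -13.8, 4.2, 0.6])
Q = np.array([16.0, 11.0, 10.0, 24.0])

q = np.round(y / Q)                    # (1): q-index chosen by the encoder
lo, hi = (q - 0.5) * Q, (q + 0.5) * Q  # (2): bin boundaries known at decoder

assert np.all((lo <= y) & (y < hi))    # the true coefficient lies in its bin
y_hard = q * Q                         # hard decoding: pick the bin center
```

Hard decoding commits to the bin center `q * Q`; soft decoding only commits to the interval `[lo, hi)`.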
III-B Soft Decoding of JPEG Images
A JPEG standard decoder [27] simply chooses the q-bin centers as the reconstructed coefficients and performs inverse DCT to recover the block's pixels. Clearly, the restoration quality worsens as the sizes of the q-bins increase, particularly at high frequencies. This approach is called hard decoding. Instead, our approach is to only constrain each DCT coefficient to be within the indexed q-bin using (2), and to choose the "optimal" reconstruction value using predefined signal priors. This approach is called soft decoding.
It is known that block artifacts appear as unnatural high frequencies across block boundaries at low rates [1]. To reduce block artifacts, we enforce consistency among adjacent blocks as follows. As shown in Fig. 2, we define a larger patch z ∈ R^M that encloses a smaller code block x ∈ R^N, where M > N. Mathematically, x = B z, where the binary matrix B extracts the pixels in z corresponding to the smaller code block x. Patch z is the basic processing unit; since a patch contains pixels from multiple code blocks and adjacent patches overlap, pixels across patch boundaries are enforced to be consistent by averaging when recovering the image.
The ill-posed nature of soft decoding means that signal prior(s) are needed to differentiate among good candidate solutions z. We first denote by q the vector of q-indices corresponding to the code block x enclosed by z (i.e., x = B z), and by Q the vector of corresponding QPs. Having defined q, we pursue a maximum a posteriori (MAP) formulation and seek the signal z* with the largest posterior probability P(z | q). Mathematically, MAP is formulated as:

z* = arg max_z P(z | q) = arg max_z P(q | z) P(z),    (3)

where P(q | z) denotes the likelihood of observing the q-indices q given z. P(q | z) can be written as:

P(q | z) = 1 if round((T B z) ./ Q) = q, and 0 otherwise,    (4)

where ./ here means element-by-element division. Thus, the MAP formulation (3) becomes:

z* = arg max_z P(z)  s.t.  round((T B z) ./ Q) = q.    (5)
IV The Laplacian Prior
Given that the individual DCT coefficients of feasible solutions are constrained to be inside the indexed q-bins (2), one natural choice of signal prior is the Laplacian prior, which is likewise defined on individual DCT coefficients. Specifically, the Laplacian prior [5] states that the probability density function of a DCT coefficient y_i is:

f(y_i) = (μ_i / 2) exp(-μ_i |y_i|),    (6)

where μ_i is a parameter. The authors of [5] show that the higher the frequency, the sharper the Laplacian distribution (larger μ_i).
Using the Laplacian prior alone, given the q-bin constraints we can compute a minimum mean square error (MMSE) solution for soft decoding. In particular, each optimal DCT coefficient ŷ_i of a code block in the MMSE sense can be computed independently as:

ŷ_i = arg min_ŷ ∫_{l_i}^{r_i} (y - ŷ)² f(y) dy,    (7)

where l_i = (q_i - 0.5) Q_i and r_i = (q_i + 0.5) Q_i are the boundaries of the indexed q-bin. By taking the derivative of (7) with respect to ŷ and setting it to zero, we obtain the closed-form solution (the conditional mean of f over the q-bin):

ŷ_i = 0,  if q_i = 0;
ŷ_i = sign(q_i) [ l_i + 1/μ_i - Q_i / (exp(μ_i Q_i) - 1) ],  with l_i = (|q_i| - 0.5) Q_i,  otherwise.    (8)
The recovered code block x̂ can then be obtained by performing inverse DCT on the estimated coefficients ŷ. After performing MMSE estimation for all code blocks, we get a soft-decoded image.

A clear advantage of Laplacian-based soft decoding is that the MMSE solution has a closed form and is computed efficiently; by definition it has a smaller expected squared error than a MAP solution, which is merely the most probable solution [28]. The closed form is possible because the Laplacian prior describes distributions of individual coefficients, which is also how the q-bin constraints are specified in (2). However, the MMSE solution (8) can only recover code blocks separately, and thus cannot handle block artifacts that occur across adjacent blocks. As such, we next employ the sparsity prior to provide additional a priori signal information and optimize at the larger patch level z.
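The per-coefficient MMSE rule above—the conditional mean of a Laplacian density over the indexed q-bin—can be sketched as follows; the parameter values are hypothetical, and the closed form is the standard conditional mean of an exponential tail over an interval:

```python
import numpy as np

def mmse_dequant(q, Q, mu):
    """MMSE reconstruction of one Laplacian coefficient from its q-index q,
    QP Q and Laplacian parameter mu (conditional mean over the q-bin)."""
    if q == 0:
        return 0.0                      # zero bin: symmetric density, mean 0
    l = (abs(q) - 0.5) * Q              # lower bin boundary (positive side)
    # conditional mean of the exponential tail over an interval of width Q
    y = l + 1.0 / mu - Q / np.expm1(mu * Q)
    return np.sign(q) * y

# The estimate stays inside the q-bin and is pulled toward zero relative to
# the bin center, since Laplacian mass concentrates near zero.
y_hat = mmse_dequant(q=3, Q=16.0, mu=0.1)
assert (2.5 * 16.0) <= y_hat < (3.5 * 16.0)   # inside the bin
assert y_hat < 3 * 16.0                       # below the bin center
```

For large μ (sharp distributions, i.e., high frequencies), the estimate moves close to the lower bin boundary; for μ → 0 it approaches the bin center, recovering hard decoding.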
V The Sparsity Prior
Given that it is difficult to use the Laplacian prior directly to recover a larger patch z, we first formulate a MAP problem for a patch using the sparsity prior. Then, using the Laplacian prior once more, we analyze K-SVD-based dictionary learning, showing that when the dictionary is small, the atoms tend to have low average DCT frequency.
V-A Sparse Signal Model
The sparsity prior [8] states that a signal z ∈ R^M can be well approximated by a sparse linear combination of atoms from an appropriately chosen over-complete dictionary Φ ∈ R^{M×K}, where the dictionary size K is much larger than the signal dimension M. Mathematically, we write:

z = Φ α + e,    (9)

where the code vector α ∈ R^K contains the coefficients corresponding to the atoms in Φ, and e is a small error term. α is assumed to be sparse; i.e., the number of non-zero entries in α is small. The dictionary Φ can be learned offline from a large set of training patches using K-SVD [8].
Given signal z, an optimal α can be found via [8]:

min_α ‖z - Φ α‖₂² + λ ‖α‖₀,    (10)

where ‖α‖₀ is the l₀-norm of α (the number of non-zero entries), and λ is a parameter trading off the fidelity term against the sparsity prior. (10) is NP-hard, but matching pursuit and basis pursuit algorithms have been shown to be effective in finding approximate solutions [8, 29, 30]. For instance, [8] employs the orthogonal matching pursuit (OMP) algorithm, which at each iteration greedily identifies the atom with the highest correlation to the current signal residual. Once the atom is selected, the signal is orthogonally projected onto the span of the selected atoms, the signal residual is recomputed, and the process repeats.
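The greedy OMP loop just described can be sketched in a few lines; the random dictionary and the 2-sparse test signal below are purely illustrative:

```python
import numpy as np

def omp(z, Phi, n_nonzero):
    """Orthogonal matching pursuit sketch: pick the atom most correlated
    with the residual, re-project onto the selected atoms, repeat."""
    residual = z.copy()
    support = []
    alpha = np.zeros(Phi.shape[1])
    for _ in range(n_nonzero):
        correlations = np.abs(Phi.T @ residual)
        correlations[support] = 0.0           # do not pick an atom twice
        support.append(int(np.argmax(correlations)))
        # orthogonal projection onto the span of the selected atoms
        coef, *_ = np.linalg.lstsq(Phi[:, support], z, rcond=None)
        residual = z - Phi[:, support] @ coef
    alpha[support] = coef
    return alpha

# illustrative run on a signal that is 2-sparse in a random dictionary
rng = np.random.default_rng(0)
Phi = rng.standard_normal((16, 64))
Phi /= np.linalg.norm(Phi, axis=0)            # unit-norm atoms
alpha_true = np.zeros(64); alpha_true[[3, 40]] = [2.0, -1.5]
z = Phi @ alpha_true
alpha = omp(z, Phi, n_nonzero=2)
assert np.linalg.norm(z - Phi @ alpha) < np.linalg.norm(z)
```

Each iteration strictly decreases the residual (the projection step guarantees this whenever some atom correlates with the residual), and the returned code has at most `n_nonzero` non-zero entries, matching the l₀ budget in (10).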
We can write the sparsity prior as a probability function:

P(z) ∝ exp( - min_α { ‖z - Φ α‖₂² + λ ‖α‖₀ } ).    (11)
V-B Sparsity-based Soft Decoding
Given the prior probability in (11), we can rewrite the optimization problem in (5) as follows:

min_{z, α} ‖z - Φ α‖₂² + λ ‖α‖₀  s.t.  round((T B z) ./ Q) = q.    (12)

Compared to (10), there is an additional quantization constraint, which is specific to the soft decoding problem.
In (12), both the signal z and the code vector α are unknown. We solve the problem alternately, fixing one variable and solving for the other, repeating until convergence:

Step 1–Initial Estimation: Initialize z⁰. For example, z⁰ is initialized as the closed-form solution using the Laplacian prior described in Section IV.

Step 2–Sparse Decomposition: Given z^{t-1} from iteration t-1, compute the corresponding code vector α^t by solving the following minimization problem:

α^t = arg min_α ‖z^{t-1} - Φ α‖₂² + λ ‖α‖₀    (13)

using OMP as stated earlier. (In the unlikely case that multiple α's yield the same minimum objective, we assume the algorithm deterministically returns the "first" solution—e.g., the one whose non-zero coefficients have the smallest atom indices—so the solution to (13) is always unique.)

Step 3–Quantization Constraint: Given the sparse code α^t, optimize z^t subject to the q-bin constraint:

z^t = arg min_z ‖z - Φ α^t‖₂²  s.t.  (q - 0.5) ∘ Q ≤ T B z < (q + 0.5) ∘ Q,    (14)

where ∘ denotes element-wise multiplication; (14) can be solved via quadratic programming [31].
Steps 2 and 3 are repeated until z^t converges. In the above procedure, the computational burden lies mainly in the sparse code search step, where the complexity of OMP increases linearly with the number of dictionary atoms.
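In general, Step 3 is a quadratic program, but in the simplifying special case where the patch coincides with a single code block (B = I) and the transform is orthonormal, it has a closed form: clip the DCT coefficients of the target Φα to the q-bin boundaries and inverse-transform. A sketch under that assumption (all signal values hypothetical):

```python
import numpy as np

# Closed-form Step 3 in the special case B = I: min_z ||z - t||^2 subject to
# bin bounds on T z reduces to clipping T t to the bins, since T is orthonormal.
N = 8
k, i = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
T = np.sqrt(2.0 / N) * np.cos(np.pi / N * (i + 0.5) * k)
T[0, :] = np.sqrt(1.0 / N)                    # orthonormal 1-D DCT-II

rng = np.random.default_rng(1)
target = rng.standard_normal(N)               # stands in for Phi @ alpha
Q = np.full(N, 0.5)                           # hypothetical QPs
q = np.round(T @ rng.standard_normal(N) / Q)  # q-indices of a "true" block
lo, hi = (q - 0.5) * Q, (q + 0.5) * Q

y = np.clip(T @ target, lo, hi - 1e-12)       # project coefficients into bins
z = T.T @ y                                   # inverse DCT

assert np.allclose(T @ T.T, np.eye(N))        # T is orthonormal
assert np.all((lo - 1e-6 <= T @ z) & (T @ z <= hi))
```

The key fact is that an orthonormal change of basis preserves Euclidean distances, so the constrained projection decouples per coefficient. For general B (a patch straddling several code blocks) the constraint couples pixels and a QP solver is needed, as the text states.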
We prove local convergence of our sparsity-based soft decoding algorithm via the following lemma.
Lemma 1.
The sparsity-based soft decoding algorithm converges to a local minimum.
Proof.
Steps 2 and 3, updating variables α and z separately, both minimize the objective in (12), which is lower-bounded by 0. At Step 2, given that z^{t-1} was optimized assuming the fixed previous code α^{t-1}, the algorithm finds α^t. If α^t ≠ α^{t-1}, then the objective must be smaller; otherwise, α^t is not the objective-minimizing argument—a contradiction. The same holds for Step 3. Hence the objective is monotonically decreasing whenever the variables are updated to different values. Given that the objective is lower-bounded by 0, it cannot decrease indefinitely, and hence the algorithm converges to a local minimum. ∎
V-C Limitation of a Small K-SVD-Trained Dictionary
We now argue that when the size of a dictionary trained by K-SVD [8] is small (to reduce complexity during code vector search via OMP), the recovery of high DCT frequencies suffers. Given a set of training patches {z_j}, each of size M, a dictionary Φ of K atoms is trained using K-SVD via the following formulation:

min_{Φ, {α_j}} Σ_j ‖z_j - Φ α_j‖₂² + λ ‖α_j‖₀.    (15)
The variables are both Φ and {α_j}, which are solved alternately by fixing one and solving for the other.
We can write (15) in the DCT frequency domain using Parseval's theorem [32]:

min_{Φ, {α_j}} Σ_j ‖T z_j - T Φ α_j‖₂² + λ ‖α_j‖₀,    (16)

where T is the DCT transform for the pixel patch, and u_j = T z_j is the vector of DCT coefficients for z_j.
We rewrite the optimization in its constrained form:

min_{Φ, {α_j}} Σ_j ‖u_j - T Φ α_j‖₂²  s.t.  ‖α_j‖₀ ≤ T₀, ∀j,    (17)

where T₀ is a preset sparsity limit for each α_j.
For intuition, we focus on the special case where T₀ = 1. In this special case, the dictionary learning problem is analogous to the vector quantization (VQ) design problem [9]. Specifically, selecting the K atoms in dictionary Φ is analogous to designing K partition cells R_k so that their union is the space of feasible signals u, i.e., ∪_k R_k = R^M, and no two cells overlap, i.e., R_k ∩ R_l = ∅, ∀k ≠ l. The centroid of each cell R_k is atom φ_k.
When the number of training samples tends to infinity, the sum over training patches is replaced by an integration over each cell R_k with signal probability p(u):

min_Φ Σ_k ∫_{R_k} ‖u - T φ_k‖₂² p(u) du,    (18)

where a signal u in cell R_k (DCT frequency domain) is represented by atom φ_k (in the pixel domain), and thus incurs distortion ‖u - T φ_k‖₂².
We have assumed in Section IV that p(u) is a product of Laplacian distributions for the individual DCT frequencies, where for high frequencies the distributions are sharper and more concentrated around zero. Thus, as the number K of atoms in dictionary Φ (and hence the number of cells R_k) decreases, in order to minimize the expected squared error in (18), the quantization cells become relatively coarse at large magnitudes of high frequencies, because such signals are comparatively improbable [33]. See Fig. 3 for an example showing coarse quantization cells for a product VQ of two dimensions. This explains why, when the dictionary is small, recovery of large-magnitude high DCT frequencies is difficult.

We can observe empirically that this is indeed the case. For a given K-SVD-trained dictionary Φ, we compute the mean frequency (MF) of its atoms as follows. Let ω_i be a DCT frequency and c_{k,i} be atom φ_k's coefficient at that frequency. The MF for Φ is computed as:

MF(Φ) = (1/K) Σ_k ( Σ_i ω_i |c_{k,i}| / Σ_i |c_{k,i}| ).    (19)
We computed the MF for dictionaries of various sizes. As illustrated in Fig. 4 (a), MF increases roughly linearly with dictionary size. Further, as illustrated in Fig. 4 (b), the performance of sparsity-based soft decoding also improves with dictionary size. As shown in Fig. 5, using a large dictionary can significantly improve the PSNR and SSIM values of the restored image, where the improvement over a smaller dictionary lies mainly in edge structures.
VI The Graph-Signal Smoothness Prior
Given that the sparsity prior cannot recover high DCT frequencies when the QP is large and the dictionary is small, we introduce the graph-signal smoothness prior, which has been shown to restore piecewise smooth (PWS) signals well [10, 11, 12]. We first define a discrete signal on a graph, then define a conventional smoothness notion for graph-signals and interpret graph-signal smoothness in the graph frequency domain. Finally, we propose a new graph-signal smoothness prior based on the random walk graph Laplacian for soft decoding.
VI-A Signals on Graphs
We define a graph as G = (V, E, W). V and E are respectively the sets of vertices and edges in G. Entry w_{i,j} in the adjacency matrix W is the weight of the edge connecting vertices i and j. We consider only undirected graphs and non-negative edge weights [7]; i.e., w_{i,j} = w_{j,i} and w_{i,j} ≥ 0.

A graph-signal x is a collection of N discrete samples x_i on the vertices of a graph G, where N = |V|. In our work, for each target pixel patch we construct a fully-connected graph, where each vertex (pixel) is connected to all other vertices in V. The edge weight w_{i,j} between two vertices i and j is conventionally computed using Gaussian kernels [7]:

w_{i,j} = exp( -(x_i - x_j)² / σ_x² ) · exp( -‖l_i - l_j‖₂² / σ_l² ),    (20)

where x_i and l_i are the intensity and location of pixel i, and σ_x and σ_l are chosen parameters. This definition is similar to the domain and range filtering weights in the bilateral filter [34].
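The two-kernel weight in (20) can be sketched on a toy patch; the intensities and the parameter values σ_x, σ_l below are hypothetical:

```python
import numpy as np

# Fully-connected graph on a small patch with edge weights per (20):
# product of an intensity kernel and a location kernel.
patch = np.array([[10., 10., 200.],
                  [10., 10., 200.],
                  [10., 10., 200.]])          # toy 3x3 patch with a sharp edge
x = patch.ravel()
rows, cols = np.indices(patch.shape)
loc = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)

sigma_x, sigma_l = 30.0, 2.0                  # hypothetical kernel parameters
d_int = (x[:, None] - x[None, :]) ** 2        # squared intensity differences
d_loc = ((loc[:, None, :] - loc[None, :, :]) ** 2).sum(-1)  # squared distances
W = np.exp(-d_int / sigma_x**2) * np.exp(-d_loc / sigma_l**2)
np.fill_diagonal(W, 0.0)                      # no self-loops

# weights within a constant piece dwarf weights across the intensity edge
assert W[0, 1] > W[0, 2]
```

As intended, the graph encodes the expected discontinuity: pixel pairs straddling the intensity edge receive near-zero weight, so the smoothness prior defined next will not penalize that jump.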
VI-B Graph Laplacians and Signal Smoothness
The degree matrix D is a diagonal matrix where d_{i,i} = Σ_j w_{i,j}. The combinatorial graph Laplacian L is [7]:

L = D - W.    (21)

There exist two normalized variants of L: the normalized graph Laplacian L_n and the random walk graph Laplacian L_r:

L_n = D^{-1/2} L D^{-1/2},    L_r = D^{-1} L.    (22)

A normalized version means that its filtering strength does not depend on the degrees of the graph vertices.
Given L, one can describe the squared variations of a signal x with respect to G using the graph Laplacian regularizer [10, 11]:

xᵀ L x = (1/2) Σ_{i,j} w_{i,j} (x_i - x_j)².    (23)

A common graph-signal smoothness prior can then be defined as:

P(x) ∝ exp( -τ xᵀ L x ),    (24)

where τ > 0 is a weight parameter. The prior states that a probable signal x has small xᵀ L x. By (23), xᵀ L x is small if x has similar values at each vertex pair (i, j) connected by an edge, or if the edge weight w_{i,j} is small. Thus x is smooth if its variations across connected vertices are consistent with the underlying graph (whose edge weights reflect expected signal variations). For example, if a discontinuity is expected at pixel pair (i, j), one can preset a small weight w_{i,j}, so that the smoothness prior will not over-smooth x during optimization. Later, we will propose a smoothness prior different from the conventional (24), with more desirable filtering properties.
VI-C Frequency Interpretation of the Graph Smoothness Prior
Since L is a real, symmetric, positive semi-definite matrix, it can be decomposed into a set of orthogonal eigenvectors, denoted by {u_k}, with real non-negative eigenvalues {λ_k}. We define U as the eigen-matrix with the u_k's as columns and Λ as the diagonal matrix with the λ_k's on its diagonal. L can thus be written as:

L = U Λ Uᵀ.    (25)

We define the graph Fourier transform (GFT) matrix as Uᵀ. A graph-signal x can be transformed to the graph frequency domain via:

x̂ = Uᵀ x.    (26)
With this definition, the graph Laplacian regularizer can be written as:

xᵀ L x = Σ_k λ_k x̂_k².    (27)

xᵀ L x can thus be written as a sum of squared GFT coefficients, each scaled by its eigenvalue λ_k, which is interpreted as a frequency in the graph transform domain.

By maximizing the graph-signal smoothness prior in (24) (minimizing xᵀ L x), signal x is smoothed with respect to the graph; i.e., high graph frequencies are suppressed while low graph frequencies are preserved. It has been shown [10, 11, 12] that PWS signals—whose internal discontinuities translate to high DCT frequencies—can be well approximated as linear combinations of low graph frequencies for appropriately constructed graphs. We provide an explanation next, which leads naturally to the derivation of a new smoothness prior.
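Both identities above—the pairwise form (23) and the spectral form (27)—can be verified numerically on a small graph with arbitrary (here randomly generated) weights:

```python
import numpy as np

# Numerical check of (23) and (27) on a random symmetric weight matrix.
rng = np.random.default_rng(0)
n = 6
A = rng.random((n, n))
W = (A + A.T) / 2                         # symmetric non-negative weights
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W                                 # combinatorial Laplacian (21)

lam, U = np.linalg.eigh(L)                # eigenpairs, lam sorted ascending
x = rng.standard_normal(n)
x_hat = U.T @ x                           # GFT (26)

# (27): regularizer = frequency-weighted sum of squared GFT coefficients
assert np.isclose(x @ L @ x, np.sum(lam * x_hat**2))
# (23): regularizer = half the weighted sum of squared pairwise differences
pairwise = 0.5 * np.sum(W * (x[:, None] - x[None, :])**2)
assert np.isclose(x @ L @ x, pairwise)
```

Note that `numpy.linalg.eigh` returns eigenvalues in ascending order, so `lam[0] ≈ 0` corresponds to the DC component of a connected graph.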
VI-D Building a Random Walk Graph Laplacian Prior
We first show that low graph frequencies of the normalized graph Laplacian L_n can be interpreted as relaxed solutions to spectral clustering, which are PWS in general. We then argue that to induce desirable filtering properties, a similarity transformation of L_n is required, resulting in the random walk Laplacian L_r. Finally, we show that a more appropriate smoothness prior than (24) can be defined using L_r, with many desirable filtering properties.
VI-D1 Spectral Clustering
To understand the low graph frequencies of L_n computed from a PWS signal x, we take a spectral clustering perspective. Spectral clustering [35] is the problem of separating the vertices of a similarity graph into two subsets of roughly the same size via spectral graph analysis. Specifically, the authors of the landmark paper [13] proposed the minimization objective normalized cut (Ncut) to divide V into subsets A and B:

Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V),    (28)

where cut(A, B) counts the weighted edges across the two subsets A and B, and assoc(A, V) counts the degrees of the vertices in A:

cut(A, B) = Σ_{i∈A, j∈B} w_{i,j},    assoc(A, V) = Σ_{i∈A, j∈V} w_{i,j}.    (29)
From (28), it is clear that a good division of V into similar-sized clusters A and B with small Ncut will have small cut(A, B) and large assoc(A, V) and assoc(B, V). However, minimizing Ncut over all possible A and B that divide V is NP-hard.
Towards an approximation, the authors first rewrote the minimization of Ncut as:

min_y (yᵀ L y) / (yᵀ D y)  s.t.  y_i ∈ {1, -b},  yᵀ D 1 = 0,    (30)

where

y_i = 1 if i ∈ A;  y_i = -b if i ∈ B,  with b = assoc(A, V) / assoc(B, V).    (31)

We see that y is piecewise constant (PWC).
The problem constraints (31) are then relaxed, resulting in:

min_y (yᵀ L y) / (yᵀ D y)  s.t.  yᵀ D 1 = 0.    (32)

The idea is that if the solution y* of (32) happens to satisfy (31), then it also solves the original Ncut problem exactly. Thus a solution to (32) can be interpreted as a relaxed / approximate solution to the Ncut problem in (28).
We now use this result from [13] to interpret the low graph frequencies of L_n. We first define v = D^{1/2} y and recall L_n = D^{-1/2} L D^{-1/2}. (32) can then be rewritten as:

min_v (vᵀ L_n v) / (vᵀ v)  s.t.  vᵀ D^{1/2} 1 = 0.    (33)

Note that the objective in (33) is the Rayleigh quotient with respect to matrix L_n [36]. It is easy to see that φ₁ = D^{1/2} 1 minimizes this quotient: L_n D^{1/2} 1 = D^{-1/2} L 1 = 0. Hence φ₁ is the first eigenvector of L_n, and the solution to (33)—which must be orthogonal to φ₁—is the second eigenvector φ₂ of L_n. We can thus conclude the following:

The second eigenvector φ₂ of L_n is a relaxed solution to the Ncut problem; if the solution becomes exact, then y = D^{-1/2} φ₂ is PWC according to (31).

The eigenbasis of L_n thus seems suitable to compactly represent PWS signals.
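The Ncut relaxation can be checked on a toy graph with two dense clusters and a weak bridge (all weights hypothetical): the second eigenvector of L_n, un-normalized by D^{-1/2}, is nearly PWC, and its sign recovers the two clusters.

```python
import numpy as np

# Two 4-node cliques joined by one weak edge: the relaxed Ncut solution
# (second eigenvector of L_n) separates the cliques by sign.
n = 8
W = np.zeros((n, n))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0                           # two unweighted cliques
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 0.01                  # weak link between clusters
D = np.diag(W.sum(1))
d_isqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
Ln = d_isqrt @ (D - W) @ d_isqrt          # normalized Laplacian

lam, Phi = np.linalg.eigh(Ln)
y = d_isqrt @ Phi[:, 1]                   # un-normalized second eigenvector
labels = y > 0                            # sign gives the two clusters

assert lam[1] < 0.05                      # weakly connected: small eigenvalue
assert labels[0] != labels[4]             # clusters get opposite signs
```

The smaller `lam[1]` is, the closer the relaxed solution is to an exact (PWC) Ncut solution; this is the Fiedler number discussed later in the section.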
VI-D2 Smoothness Prior Using the Random Walk Graph Laplacian
However, because the first eigenvector of L_n, φ₁ = D^{1/2} 1, is not a constant vector (DC component) in general, the eigenbasis of L_n is not suitable for filtering images, which tend to be smooth. We thus perform a similarity transformation (https://en.wikipedia.org/wiki/Matrix_similarity) on L_n to efficiently filter constant signals. Let L_n = B Λ Bᵀ, where B is the matrix with the eigenvectors of L_n as columns, and Λ is the diagonal matrix with the corresponding eigenvalues on its diagonal. We now define L_r = D^{-1/2} L_n D^{1/2} = D^{-1} L. We see that L_r has left eigenvectors (the rows of Ψ = Bᵀ D^{1/2}):

Ψ L_r = Λ Ψ.    (34)

Note that because L_n and L_r are similar, they have the same eigenvalues λ_k. When a constant signal x = c 1 is projected onto the left eigenvectors Ψ, only the first row yields a non-zero projection: the k-th entry of Ψ 1 is φ_kᵀ D^{1/2} 1 ∝ φ_kᵀ φ₁, which vanishes for k ≠ 1 by orthogonality. This means that the left eigenbasis coefficients of x, defined as α = Ψ x, are non-zero only in the first element when x = c 1.
However, L_r is asymmetric and cannot in general be decomposed into a set of orthogonal eigenvectors with real eigenvalues. Thus there is no graph frequency interpretation of a quadratic form in L_r as clean as the one we derived for L in (27).
To achieve symmetry, suppose we use L_rᵀ L_r and define the following regularization term instead:

xᵀ L_rᵀ L_r x = ‖L_r x‖₂².    (35)

If we write α = Ψ x, then (35) becomes:

xᵀ L_rᵀ L_r x = αᵀ Λ Bᵀ D^{-1} B Λ α.    (36)

Since D is a diagonal matrix, one can see that:

(1/d_max) αᵀ Λ² α ≤ xᵀ L_rᵀ L_r x ≤ (1/d_min) αᵀ Λ² α,    (37)

where d_max and d_min are the maximum and minimum vertex degrees in graph G. So instead of minimizing αᵀ Λ² α, we can minimize its upper bound d_max · xᵀ L_rᵀ L_r x.
We can interpret this regularizer more naturally as follows. αᵀ Λ² α can be written as:

αᵀ Λ² α = Σ_k λ_k² α_k².    (38)

Thus we have a graph frequency interpretation of our regularizer: the high graph frequency components of α are suppressed to restore a smooth signal x. For convenience, we call our regularizer Left Eigenvectors of the Random walk Graph Laplacian (LERaG).
For efficient computation, the regularizer can be evaluated without any eigen-decomposition as:

xᵀ L_rᵀ L_r x = ‖D^{-1} L x‖₂².    (39)
Compared to xᵀ L x in (24), LERaG is based on the normalized variant L_r, and thus its filtering strength is insensitive to the vertex degrees of particular graphs. Further, unlike L_n, LERaG is based on the left eigenvectors of L_r, which filter constant signals efficiently and are thus suitable for image filtering. Finally, compared to [25], which requires non-trivial computation to identify transformation matrices that make the rows and columns of the Laplacian stochastic, LERaG can be computed simply via (39). Our regularizer thus has desirable filtering properties with little computation overhead compared to previous graph-based regularizers in the literature.
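A minimal sketch of evaluating LERaG via (39)—no eigen-decomposition, and exactly zero cost on constant signals, since L_r annihilates the DC component (the graph weights below are hypothetical):

```python
import numpy as np

# LERaG regularizer computed directly from L_r = D^{-1} L, per (39).
rng = np.random.default_rng(2)
n = 6
A = rng.random((n, n))
W = (A + A.T) / 2                      # symmetric positive weights
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)                      # vertex degrees
Lr = (np.diag(d) - W) / d[:, None]     # random walk Laplacian D^{-1} L

def lerag(x):
    """||L_r x||_2^2: penalizes non-smoothness, zero on constant signals."""
    return float(np.sum((Lr @ x) ** 2))

r_const = lerag(np.ones(n))                 # DC component: (numerically) zero
r_ramp = lerag(np.arange(n, dtype=float))   # non-constant: positive
assert r_const < 1e-12 and r_ramp > 0.0
```

The per-evaluation cost is one sparse-friendly matrix-vector product plus a squared norm, which is the "little computation overhead" claimed above.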
VI-D3 Analysis of Ideal Piecewise Constant Signals
We now show that LERaG evaluates to 0 for an ideal PWC signal x, where:

x_i = c₁ for 1 ≤ i ≤ N₁;  x_i = c₂ for N₁ + 1 ≤ i ≤ N,    (40)

and constants c₁ and c₂ are sufficiently different, i.e., |c₁ - c₂| > C σ_x for some sufficiently large C.
Using (20) to compute the edge weights, this PWC signal implies that w_{i,j} ≈ 0 if i and j belong to different constant pieces. Assume further that σ_l is sufficiently large relative to the location differences ‖l_i - l_j‖₂, so that the first (intensity) exponential kernel in (20) alone determines w_{i,j}. The adjacency matrix and the graph Laplacian are then both block-diagonal:

W = [ W₁ 0 ; 0 W₂ ],    L = [ L₁ 0 ; 0 L₂ ],    (41)

where 0 is a matrix of all zeros, and W_k and L_k are the respective adjacency and combinatorial graph Laplacian matrices of a full unweighted graph of N_k vertices:

W_k = 1 1ᵀ - I,    L_k = N_k I - 1 1ᵀ.    (42)
Clearly, the normalized L_n is also block-diagonal:

L_n = [ L_{n,1} 0 ; 0 L_{n,2} ],    (43)

where L_{n,k} is the normalized graph Laplacian of a full unweighted graph of N_k vertices:

L_{n,k} = (N_k I - 1 1ᵀ) / (N_k - 1).    (44)
We already know that φ₁ = D^{1/2} 1 is the first eigenvector, corresponding to eigenvalue 0. Further, we see that a second (unnormalized) eigenvector corresponding to eigenvalue 0 can be constructed as:

φ₂,i = √d_i / assoc(A, V) if i ∈ A;  φ₂,i = -√d_i / assoc(B, V) if i ∈ B,    (45)

where A and B are the two constant pieces. It is easy to verify that L_n φ₂ = 0 and φ₁ᵀ φ₂ = 0. In fact, we see that D^{-1/2} φ₂ takes the form in (31) up to scaling; i.e., for this graph corresponding to an ideal PWC signal, the relaxed solution to Ncut is also the optimal one, which is PWC.
Analyzing the coefficients α = Ψ x, we see that D^{1/2} x can be expressed exactly as a linear combination of the first two eigenvectors of L_n, i.e., D^{1/2} x = β₁ φ₁ + β₂ φ₂, where

β₁ = ( c₁ assoc(A, V) + c₂ assoc(B, V) ) / assoc(V, V)    (46)

and

β₂ = ( c₁ - c₂ ) / ( 1/assoc(A, V) + 1/assoc(B, V) ).    (47)
Because the first two eigenvectors φ₁ and φ₂ correspond to eigenvalue 0 of L_n, we can conclude:

Given an ideal two-piece PWC signal x defined in (40), D^{1/2} x can be represented exactly using the first two eigenvectors of L_n corresponding to eigenvalue 0, and hence LERaG evaluates to 0.

In other words, α is ideally low-pass (cutoff frequency at 0) given the eigenvectors of L_n, and so LERaG, which penalizes high frequencies according to (38), evaluates to 0. Another interpretation is that an ideal two-piece PWC signal leads to two clusters of identical vertices, hence φ₂ of L_n is an exact solution to Ncut, which is PWC (31). In this case, D^{-1/2} φ₁ and D^{-1/2} φ₂ are both PWC, so together they are sufficient to represent x, which is also PWC.
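The conclusion can be checked numerically: build the graph from an ideal two-piece PWC signal via the intensity kernel of (20) (assuming σ_l large so the location kernel is ≈ 1; the signal values and σ_x are hypothetical), and LERaG evaluates to zero on that signal.

```python
import numpy as np

# For an ideal two-piece PWC signal and its own graph, LERaG ||L_r x||^2 = 0.
x = np.array([10., 10., 10., 200., 200.])   # ideal PWC, |c1 - c2| >> sigma_x
sigma_x = 5.0
# intensity kernel of (20) only; location kernel assumed ~ 1 (sigma_l large)
W = np.exp(-(x[:, None] - x[None, :])**2 / sigma_x**2)
np.fill_diagonal(W, 0.0)                    # cross-piece weights underflow to 0
d = W.sum(axis=1)
Lr = (np.diag(d) - W) / d[:, None]          # random walk Laplacian

lerag_val = float(np.sum((Lr @ x) ** 2))
assert lerag_val < 1e-12                    # LERaG vanishes on the PWC signal
```

Within each piece the signal is constant, so each diagonal block of L_r annihilates it; across pieces the weights are (numerically) zero, so no penalty is incurred at the discontinuity.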
VI-D4 Analysis of Ideal Piecewise Smooth Signals
We now generalize our analysis from PWC signals to PWS signals. We first define a PWS signal x as:

|x_i - x_j| ≤ ε if i, j are in the same piece;  |x_i - x_j| > δ if i, j are in different pieces,    (48)

where ε < δ. In words, x contains two smooth pieces on A and B: two samples in the same piece are similar to within ε, and two samples from different pieces differ by more than δ. Clearly, if ε = 0, then x is PWC.
We now construct a complete graph for x using (20) to compute the edge weight w_{i,j} between vertices i and j. The resulting normalized L_n is still block-diagonal for sufficiently large δ, but the diagonal blocks are no longer of the form (44). φ₂ in this case can be computed as:

φ₂,i = √d_i / assoc(A, V) if i ∈ A;  φ₂,i = -√d_i / assoc(B, V) if i ∈ B.    (49)

One can verify that L_n φ₂ = 0 and φ₁ᵀ φ₂ = 0. φ₂ is roughly PWC according to (49), since the degrees d_i are similar for nodes within the same smooth piece. Thus D^{1/2} x, with x roughly PWS, can be well approximated as β₁ φ₁ + β₂ φ₂, with little energy leaking to high frequencies; i.e., α is approximately low-pass given the eigenvectors of L_n. That means LERaG evaluates to a small value for a PWS signal x.
In the more general case when signal x is not PWS, L_n is not block-diagonal, and the second eigenvector φ₂ computed in (33) corresponds to an eigenvalue λ₂ > 0. This first non-zero eigenvalue is called the Fiedler number [37], and is a measure of how connected the graph is. In our signal restoration context, it means that the closer x is to PWS, the closer λ₂ is to 0, and the more likely LERaG can recover the signal well.
VII Soft Decoding Based on Prior Mixture
We now combine the priors discussed in previous sections to construct an efficient JPEG soft decoding algorithm.
VII-A The Objective Function Formulation
Combining the sparsity prior and LERaG, we arrive at the following optimization for soft decoding:

min_{z, α} ‖z - Φ α‖₂² + λ ‖α‖₀ + τ ‖D^{-1} L z‖₂²  s.t.  (q - 0.5) ∘ Q ≤ T B z < (q + 0.5) ∘ Q,    (50)

where both the patch z and its sparse code α are unknown.
For the parameter λ of the sparsity prior, we set a fixed value empirically. For the parameter τ of the graph-signal smoothness prior, we set it adaptively as follows. As argued in Section V-C, if a patch contains large high DCT frequencies, then it will not be recovered well by the sparsity prior. We thus adaptively increase τ if the q-bin indices indicate the presence of high DCT frequencies in the target z.
Since by design adjacent patches overlap, the pixel values reconstructed for an overlapping region from two patches should be as similar as possible. To this end, after obtaining all the optimal reconstructed patches, the soft-decoded image is obtained by averaging pixel values over the overlapped regions.
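The averaging over overlapped regions can be sketched as follows; the patch geometry in the example is illustrative, not the paper's actual patch size:

```python
import numpy as np

def aggregate_patches(patches, positions, image_shape, patch_size):
    """Place reconstructed square patches at their top-left positions and
    average the accumulated values wherever patches overlap."""
    acc = np.zeros(image_shape)
    cnt = np.zeros(image_shape)
    for patch, (r, c) in zip(patches, positions):
        acc[r:r + patch_size, c:c + patch_size] += patch
        cnt[r:r + patch_size, c:c + patch_size] += 1
    return acc / np.maximum(cnt, 1)   # avoid division by zero off-patch

# Two 4x4 patches overlapping by 2 columns on a 4x6 image:
# the overlap averages to the mean of the two patch values.
p1 = np.full((4, 4), 2.0)
p2 = np.full((4, 4), 4.0)
img = aggregate_patches([p1, p2], [(0, 0), (0, 2)], (4, 6), 4)
```
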
VII-B Optimization
Optimization of (50) is not convex. We therefore employ an alternating procedure that optimizes the two variables iteratively. First, for initialization we compute the Laplacian-prior-based MMSE solution per code block for the entire image. Then, each iteration of the optimization procedure proceeds as follows:
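For the initialization step, the per-coefficient MMSE estimate under a Laplacian prior is the prior's conditional mean over the indexed quantization bin. The paper uses a closed form; the sketch below instead integrates numerically, and the scale parameter is illustrative:

```python
import numpy as np

def laplacian_mmse(bin_low, bin_high, scale, n=100001):
    """Conditional mean E[X | bin_low <= X < bin_high] for a zero-mean
    Laplacian X with the given scale, via numerical integration."""
    t = np.linspace(bin_low, bin_high, n)
    pdf = np.exp(-np.abs(t) / scale) / (2.0 * scale)
    # Riemann-sum ratio; the uniform grid spacing cancels out.
    return float(np.sum(t * pdf) / np.sum(pdf))
```

For a bin on the positive axis the estimate is pulled below the bin center toward zero, which is exactly why this initialization improves on hard decoding's bin-center rule.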
1) Fix the current patch estimate and update the sparse code:
The optimization problem becomes a standard sparse coding problem:
(51) 
which can be solved efficiently by well-developed greedy algorithms such as orthogonal matching pursuit (OMP).
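A minimal OMP sketch for a problem of the form (51) — an illustrative implementation, not the authors' solver:

```python
import numpy as np

def omp(Phi, y, sparsity):
    """Orthogonal matching pursuit: greedily select the atom most
    correlated with the residual, then re-fit all selected atoms by
    least squares."""
    residual = y.astype(float).copy()
    support = []
    alpha = np.zeros(Phi.shape[1])
    for _ in range(sparsity):
        # Atom most correlated with the current residual
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares re-fit over the selected support
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    alpha[support] = coef
    return alpha
```
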
2) Fix the sparse code and update the patch estimate:
The objective function becomes:
(52) 
which can be solved efficiently using quadratic programming [31].
The resulting solution is then used as the initialization for the next iteration, and the graph is updated accordingly. The algorithm terminates when both variables converge. Local convergence can be proved in a manner similar to Lemma 1.
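Setting aside the quantization-bin constraints, the patch update in step 2 has a closed form: the gradient of the data term plus a symmetric graph Laplacian regularizer vanishes at a linear system. The sketch below uses a symmetric Laplacian and drops the bin constraints that the paper's QP enforces, so it is a simplification rather than the full step:

```python
import numpy as np

def update_patch(Phi, alpha, L, mu):
    """Unconstrained minimizer of ||x - Phi @ alpha||^2 + mu * x^T L x
    for symmetric PSD L: setting the gradient to zero gives
    (I + mu * L) x = Phi @ alpha."""
    n = L.shape[0]
    return np.linalg.solve(np.eye(n) + mu * L, Phi @ alpha)

# Path-graph Laplacian on two nodes; a large mu pulls the patch constant.
L2 = np.array([[1.0, -1.0], [-1.0, 1.0]])
y = np.array([0.0, 2.0])              # Phi @ alpha for identity Phi
x0 = update_patch(np.eye(2), y, L2, 0.0)      # no smoothing: x0 == y
x1 = update_patch(np.eye(2), y, L2, 1000.0)   # heavy smoothing
```

With heavy smoothing the two pixels are pulled toward their common mean, illustrating how the regularizer suppresses within-patch discontinuities.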
TABLE I: PSNR (dB) and SSIM comparison at QF = 5
Images  JPEG  BM3D [38]  KSVD [8]  ANCE [18]  DicTV [3]  SSRQC [20]  Ours  
PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  
Butterfly  22.65  0.7572  23.91  0.8266  24.55  0.8549  24.34  0.8532  23.54  0.8228  25.31  0.8764  25.82  0.8861 
Leaves  22.49  0.7775  23.78  0.8408  24.39  0.8684  24.18  0.8551  23.27  0.8245  25.01  0.8861  25.57  0.8979 
Hat  25.97  0.7117  26.79  0.7497  27.41  0.7802  27.09  0.7706  27.33  0.7707  27.27  0.7753  27.71  0.7946 
Boat  25.23  0.7054  26.31  0.7547  26.85  0.7739  26.59  0.7637  26.31  0.7491  26.85  0.7745  27.17  0.7783 
Bike  21.72  0.6531  22.60  0.7039  22.84  0.7130  22.74  0.6973  22.28  0.6952  22.96  0.7119  23.32  0.7291 
House  27.76  0.7732  28.87  0.8020  29.53  0.8185  29.07  0.8131  29.59  0.8072  29.92  0.8226  30.25  0.8237 
Flower  24.51  0.6866  25.49  0.7352  25.84  0.7471  25.54  0.7337  25.88  0.7316  25.69  0.7407  25.93  0.7521 
Parrot  26.15  0.7851  27.40  0.8329  27.22  0.8465  27.81  0.8475  27.92  0.8382  28.21  0.8566  28.25  0.8572 
Pepper512  27.17  0.7078  28.31  0.7573  29.32  0.7949  29.03  0.7891  28.81  0.7769  29.28  0.7948  29.81  0.7984 
Fishboat512  25.56  0.6563  26.44  0.6921  26.87  0.7072  26.72  0.6994  26.35  0.6963  26.86  0.7042  27.04  0.7113 
Lena512  27.32  0.7365  28.43  0.7788  29.13  0.8079  28.96  0.8024  28.51  0.7976  29.19  0.8072  29.29  0.8077 
Airplane512  26.34  0.7843  27.11  0.8101  27.62  0.8262  27.40  0.8201  26.95  0.8112  28.90  0.8261  29.19  0.8482 
Bike512  22.25  0.6231  23.12  0.6693  23.39  0.6781  23.24  0.6605  22.73  0.6559  24.60  0.6815  24.86  0.6992 
Statue512  25.72  0.6629  33.61  0.9188  27.03  0.7371  26.83  0.7265  26.51  0.7268  28.03  0.7627  28.30  0.7685 
Average  25.06  0.7157  26.58  0.7765  26.62  0.7827  26.39  0.7737  26.14  0.7645  27.01  0.7872  27.32  0.7941 
TABLE II: PSNR (dB) and SSIM comparison at QF = 40
Images  JPEG  BM3D [38]  KSVD [8]  ANCE [18]  DicTV [3]  SSRQC [20]  Ours  
PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  PSNR  SSIM  
Butterfly  29.97  0.9244  31.35  0.9555  31.57  0.9519  31.38  0.9548  31.22  0.9503  32.02  0.9619  32.87  0.9627 
Leaves  30.67  0.9438  32.55  0.9749  33.04  0.9735  32.74  0.9728  32.45  0.9710  32.13  0.9741  34.42  0.9803 
Hat  32.78  0.9022  33.89  0.9221  33.62  0.9149  33.69  0.9169  33.20  0.8988  34.10  0.9237  34.46  0.9268 
Boat  33.42  0.9195  34.77  0.9406  34.28  0.9301  34.64  0.9362  26.08  0.7550  33.88  0.9306  34.98  0.9402 
Bike  28.98  0.9131  29.96  0.9356  30.19  0.9323  30.31  0.9357  29.75  0.9154  30.35  0.9411  31.14  0.9439 
House  35.07  0.8981  36.09  0.9013  36.05  0.9055  36.12  0.9048  35.17  0.8922  36.49  0.9072  36.55  0.9071 
Flower  31.62  0.9112  32.81  0.9357  32.63  0.9271  32.67  0.9314  31.86  0.9084  33.02  0.9362  33.37  0.9371 
Parrot  34.03  0.9291  34.92  0.9397  34.91  0.9371  35.02  0.9397  33.92  0.9227  35.11  0.9401  35.32  0.9401 
Pepper512  34.21  0.8711  34.94  0.8767  34.89  0.8784  34.99  0.8803  34.24  0.8639  35.05  0.8795  35.19  0.8811 
Fishboat512  32.76  0.8763  33.61  0.8868  33.36  0.8809  33.60  0.8861  32.53  0.8496  33.68  0.8859  33.73  0.8871 
Lena512  35.12  0.9089  36.03  0.9171  35.82  0.9146  36.04  0.9177  34.85  0.8986  36.09  0.9187  36.11  0.9191 
Airplane512  33.36  0.9253  34.38  0.9361  34.36  0.9341  34.53  0.9358  33.75  0.9134  35.81  0.9355  36.07  0.9439 
Bike512  29.43  0.9069  30.47  0.9299  30.66  0.9258  30.71  0.9298  30.05  0.9043  32.26  0.9372  32.55  0.9387 
Statue512  32.78  0.9067  33.61  0.9188  33.55  0.9149  33.55  0.9193  32.53  0.8806  34.88  0.9249  34.95  0.9273 
Average  32.44  0.9097  33.52  0.9264  33.50  0.9229  33.57  0.9258  32.25  0.8945  33.91  0.9283  34.41  0.9311 
VIII Experimentation
We now present experimental results to demonstrate the superior performance of our soft decoding algorithm. We selected fourteen images widely used in the literature as test images. The first eight images are smaller, while the remaining six, whose names carry the suffix 512, are of size 512×512, as illustrated in Fig. 7.
VIII-A Competing Methods
Our proposed method is compared against: 1) BM3D [39]: a state-of-the-art image denoising method, included because the restoration of compressed images can be viewed as a denoising problem; in our tests, the extended version [38] is used, which achieves better performance than the original BM3D. 2) KSVD [8]: a well-known sparse coding framework; we test KSVD with a sufficiently large overcomplete dictionary, whereas our scheme uses a much smaller one. We will demonstrate that our scheme achieves better reconstruction performance with lower computational complexity than KSVD. 3) ANCE [18]: a state-of-the-art JPEG image restoration algorithm. 4) DicTV [3]: a recent sparsity-based compressed image restoration algorithm, which exploits both sparsity and TV priors. 5) SSRQC [20]: a recently published state-of-the-art JPEG soft decoding method, which can be regarded as one of the best algorithms to date. The source code of each compared method was kindly provided by its authors.
The determination of parameters deserves clarification. Regarding patch size, larger patches yield more overlapping pixels among neighboring code blocks, which helps remove blocking artifacts; however, they also make finding sparse representations of the patches harder. We therefore set the patch size so that each patch overlaps its neighbors by 2 pixels horizontally and vertically, achieving a good tradeoff. For the uncompressed training set used to learn the overcomplete dictionary, we randomly selected five images from the Kodak Lossless True Color Image Suite^{5}^{5}5 http://r0k.us/graphics/kodak/. The training set has no overlap with the test set.
Our method can easily be extended to restore compressed color images. When compressing color images, JPEG first performs a YUV color transformation, and then compresses the resulting Y, U and V channels separately. As the image signal energy is highly concentrated in the luminance channel Y, we apply our proposed scheme to Y. For the chrominance channels U and V, we use only the graph-signal smoothness prior to speed up the restoration process. Due to space limitations, we report only results on gray-level images in the following.
VIII-B Objective Performance Comparison
Table I and Table II tabulate the objective evaluations (PSNR and SSIM) of the above algorithms for the fourteen test images, coded by a JPEG coder with a small quality factor QF = 5 and a medium QF = 40, respectively.
When QF = 5, the average PSNR gains are 0.74dB and 0.7dB over the two well-known algorithms BM3D and KSVD, respectively. Note that KSVD uses a sufficiently large dictionary for reconstruction; our method achieves better results with a much smaller dictionary, which demonstrates the effectiveness of the proposed graph-signal smoothness prior. Compared to the state-of-the-art ANCE algorithm, our method greatly improves reconstruction quality: the PSNR gain is up to 1.48dB (Butterfly), and the average gain is 0.93dB. Our method also outperforms state-of-the-art sparsity-based methods. Compared to DicTV, designed specifically to restore compressed images, the average gain of our method is 1.18dB. Compared to SSRQC, the newest sparsity-based soft decoding method, our method achieves an average gain of 0.31dB. When QF = 40, the proposed algorithm again outperforms the other methods on all test images, with average PSNR gains of 0.89dB, 0.91dB, 0.84dB, 2.16dB and 0.5dB, respectively.
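For reference, the PSNR values in these tables follow the standard definition for 8-bit images:

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and
    a reconstruction, for a given peak pixel value."""
    ref = np.asarray(ref, dtype=float)
    rec = np.asarray(rec, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

A gain of 1 dB thus corresponds to roughly a 21% reduction in mean squared error.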
We further show results for a wide range of QFs (from 10 to 80) in Fig. 8. Due to space limitations, we report results for only six test images: four smaller images and two 512×512 images. We observe that our method consistently outperforms the other methods.
VIII-C Subjective Performance Comparison
We now demonstrate that our soft decoding algorithm also achieves better perceptual quality in the restored images. When QF = 5, the quantization noise is severe, leading to very poor subjective quality of JPEG-compressed images. We therefore use QF = 5 as the example for evaluating the visual quality of the different schemes.
Fig. 9 and Fig. 10 illustrate the perceptual quality for the different methods. The test images Butterfly and Leaves have rich structural regions, so we use them to demonstrate how well each method recovers structure. The images reproduced by BM3D suffer from highly visible noise, especially blocking artifacts. KSVD achieves better results than BM3D, with fewer detectable blocking artifacts, but it cannot preserve edge structure well. In the results produced by DicTV, strong blocking artifacts remain. This is because DicTV learns its dictionary from the JPEG-compressed image itself: when the quantization step is large, the structured noise is learned as dictionary atoms, so the subsequent sparse-coding-based restoration enhances rather than suppresses the quantization noise. ANCE suppresses most blocking artifacts, but noticeable artifacts remain along edges. SSRQC removes most blocking artifacts, but the edge regions are blurred to some extent.
The images restored by our method are much cleaner, with structures and edge sharpness well preserved. Our proposed method benefits primarily from exploiting both the sparsity prior for recovering textures (low DCT frequencies) and the graph-signal smoothness prior for recovering structures (high DCT frequencies); the experimental results validate this point. Our method also removes DCT blocking artifacts in smooth areas completely, and the results are largely free of staircase and ringing artifacts along edges. Due to space limitations, we show subjective comparisons only for low QF, where the differences are most visible; our method also achieves better subjective quality for medium to high QFs.
VIII-D Computational Complexity Comparison
We further compare computational complexity: Table III shows the average running time over eight test images at QF = 40. All methods were run on a typical laptop (Intel Core i7 CPU at 2.6GHz, 8GB memory, Windows 10, Matlab R2014a). As shown in Table III, the complexity of our method is lower than that of the state-of-the-art algorithms BM3D (BM3D-SAPCA) and ANCE. Compared with KSVD, which uses a much larger overcomplete dictionary, the proposed scheme significantly reduces computational complexity.
TABLE III: Average running time comparison
TIME  BM3D  KSVD  ANCE  DicTV  SSRQC  Proposed 
Average  373.35  209.71  307.43  39.53  70.32  143.73 
VIII-E Comparison of Different Graph-Signal Smoothness Priors
To demonstrate the superiority of the proposed graph Laplacian regularizer LERaG, we compare it with the popular combinatorial graph Laplacian regularizer, the symmetrically normalized graph Laplacian regularizer, and the doubly stochastic one proposed in [25]. We test eight images at QF = 5 and report PSNR values after the first iteration; that is, after solving (51), we use each of the graph Laplacian regularizers in turn to formulate (52). For fairness, the regularization parameters are carefully tuned for optimal performance in each case. The proposed regularizer outperforms the other three in terms of average PSNR, improving PSNR by up to 0.18dB, 0.87dB and 0.42dB over the combinatorial, normalized and doubly stochastic regularizers, respectively.
Images  Combinatorial  Normalized  Doubly Stochastic  LERaG 

Butterfly  25.42  24.70  25.15  25.57 
Leaves  24.99  24.54  24.84  25.17 
Hat  27.53  27.42  27.43  27.56 
Boat  26.99  26.94  26.98  26.99 
Bike  23.12  23.01  23.09  23.17 
House  29.87  29.83  29.86  29.89 
Flower  25.84  25.78  25.82  25.87 
Parrot  27.97  27.95  27.97  28.02 
Average  26.46  26.27  26.39  26.53 
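The first three regularizers compared above differ only in how the graph Laplacian is normalized (LERaG itself builds on the random walk variant). A sketch of these constructions follows; the doubly stochastic variant of [25] requires an iterative normalization of the adjacency matrix and is omitted:

```python
import numpy as np

def laplacians(W):
    """Combinatorial, symmetric-normalized, and random walk Laplacians
    from a symmetric nonnegative adjacency matrix W."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                       # combinatorial: D - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt      # symmetric normalized
    L_rw = np.diag(1.0 / d) @ L              # random walk: D^{-1} L
    return L, L_sym, L_rw

# Small illustrative weighted graph
W = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
L, L_sym, L_rw = laplacians(W)
```

Both the combinatorial and random walk Laplacians annihilate constant signals, while the symmetric normalized one does not; this difference in filtering behavior is what the table above probes.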
IX Conclusion
In this paper, we proposed a novel soft decoding approach for the restoration of JPEG-compressed images. The main technical contributions are twofold. First, we proposed a new graph-signal smoothness prior based on the left eigenvectors of the random walk graph Laplacian, with desirable image filtering properties that recover the high DCT frequencies of piecewise smooth signals well. Second, we combined the Laplacian prior, the sparsity prior and our new graph-signal smoothness prior into an efficient soft decoding algorithm. Experimental results demonstrate that our method achieves better objective and subjective restoration quality than state-of-the-art soft decoding algorithms.
References
 [1] A. Zakhor, “Iterative procedures for reduction of blocking effects in transform image coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 2, no. 1, pp. 91–95, Mar 1992.
 [2] K. Bredies and M. Holler, “A total variation-based JPEG decompression model,” SIAM J. Img. Sci., vol. 5, no. 1, pp. 366–393, Mar. 2012.
 [3] H. Chang, M. Ng, and T. Zeng, “Reducing artifacts in JPEG decompression via a learned dictionary,” IEEE Transactions on Signal Processing, vol. 62, no. 3, pp. 718–728, Feb 2014.

 [4] X. Liu, X. Wu, J. Zhou, and D. Zhao, “Data-driven sparsity-based restoration of JPEG-compressed images in dual transform-pixel domain,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 5171–5178.
 [5] E. Lam and J. Goodman, “A mathematical analysis of the DCT coefficient distributions for images,” IEEE Transactions on Image Processing, vol. 9, no. 10, pp. 1661–1666, Oct 2000.
 [6] M. Elad, M. Figueiredo, and Y. Ma, “On the role of sparse and redundant representations in image processing,” Proceedings of the IEEE, vol. 98, no. 6, pp. 972–982, June 2010.

 [7] D. Shuman, S. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, May 2013.
 [8] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, Nov 2006.
 [9] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Norwell, MA, USA: Kluwer Academic Publishers, 1991.
 [10] J. Pang, G. Cheung, W. Hu, and O. Au, “Redefining self-similarity in natural images for denoising using graph signal gradient,” in APSIPA ASC, December 2014.
 [11] J. Pang, G. Cheung, A. Ortega, and O. Au, “Optimal graph Laplacian regularization for natural image denoising,” in IEEE International Conference on Acoustics, Speech and Signal Processing, April 2015.
 [12] W. Hu, G. Cheung, and M. Kazui, “Graph-based dequantization of block-compressed piecewise smooth images,” IEEE Signal Processing Letters, vol. 23, no. 2, pp. 242–246, February 2016.
 [13] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, Aug. 2000.
 [14] H. Reeve and J. Lim, “Reduction of blocking effect in image coding,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 8, Apr 1983, pp. 1212–1215.
 [15] D. Tschumperle and R. Deriche, “Vector-valued image regularization with PDEs: A common framework for different applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 4, pp. 506–517, April 2005.
 [16] Y. Kwon, K. I. Kim, J. Tompkin, J. H. Kim, and C. Theobalt, “Efficient learning of image super-resolution and compression artifact removal with semi-local Gaussian processes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1792–1805, Sept 2015.
 [17] C. Jung, L. Jiao, H. Qi, and T. Sun, “Image deblocking via sparse representation,” Signal Processing: Image Communication, vol. 27, no. 6, pp. 663–677, 2012.
 [18] X. Zhang, R. Xiong, X. Fan, S. Ma, and W. Gao, “Compression artifact reduction by overlapped-block transform coefficient estimation with block similarity,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 4613–4626, Dec 2013.
 [19] X. Liu, X. Wu, J. Zhou, and D. Zhao, “Data-driven soft decoding of compressed images in dual transform-pixel domain,” IEEE Transactions on Image Processing, vol. 25, no. 4, pp. 1649–1659, April 2016.
 [20] C. Zhao, J. Zhang, S. Ma, X. Fan, Y. Zhang, and W. Gao, “Reducing image compression artifacts by structural sparse representation and quantization constraint prior,” IEEE Transactions on Circuits and Systems for Video Technology, vol. PP, no. 99, pp. 1–1, 2016.
 [21] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs: Frequency analysis,” IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3042–3054, June 2014.
 [22] W. Hu, X. Li, G. Cheung, and O. Au, “Depth map denoising using graph-based transform and group sparsity,” in IEEE International Workshop on Multimedia Signal Processing, Pula, Italy, October 2013.

 [23] Y. Mao, G. Cheung, and Y. Ji, “Image interpolation for DIBR view synthesis using graph Fourier transform,” in 3DTV-Conference, July 2014, pp. 1–4.
 [24] P. Wan, G. Cheung, D. Florencio, C. Zhang, and O. C. Au, “Image bit-depth enhancement via maximum a posteriori estimation of AC signal,” IEEE Transactions on Image Processing, vol. 25, no. 6, pp. 2896–2909, June 2016.
 [25] A. Kheradmand and P. Milanfar, “A general framework for regularized, similarity-based image restoration,” IEEE Transactions on Image Processing, vol. 23, no. 12, pp. 5136–5151, Dec 2014.
 [26] X. Liu, G. Cheung, X. Wu, and D. Zhao, “Inter-block consistent soft decoding of JPEG images with sparsity and graph-signal smoothness priors,” in IEEE International Conference on Image Processing (ICIP), Sept 2015, pp. 1628–1632.
 [27] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, 1st ed. Norwell, MA, USA: Kluwer Academic Publishers, 1992.
 [28] P. Wan, G. Cheung, D. Florencio, C. Zhang, and O. Au, “Image bit-depth enhancement via maximum a posteriori estimation of AC signal,” IEEE Transactions on Image Processing, vol. 25, no. 6, pp. 2896–2909, June 2016.
 [29] S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, Dec 1993.
 [30] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Rev., vol. 43, no. 1, pp. 129–159, Jan. 2001.
 [31] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed. New York, NY, USA: Springer, 2006.
 [32] J. Rayleigh, “On the character of the complete radiation at a given temperature,” Philosophical Magazine, vol. 7, pp. 460–469, 1889.
 [33] P. F. Swaszek, “A vector quantizer for the Laplace source,” IEEE Transactions on Information Theory, vol. 37, no. 5, pp. 1355–1365, Sep 1991.
 [34] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Sixth International Conference on Computer Vision, Jan 1998, pp. 839–846.
 [35] U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
 [36] R. A. Horn and C. R. Johnson, Eds., Matrix Analysis. New York, NY, USA: Cambridge University Press, 1986.
 [37] D. A. Spielman and S. Teng, “Spectral partitioning works: Planar graphs and finite element meshes,” Linear Algebra and its Applications, vol. 421, no. 2–3, pp. 284–305, 2007.

 [38] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “BM3D image denoising with shape-adaptive principal component analysis,” in Proc. Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS’09), 2009.
 [39] ——, “Image denoising by sparse 3D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, Aug 2007.