Millions of images are now captured and viewed daily on social networks and photo-sharing sites like Facebook and Flickr111 It is estimated that 300 million photos are uploaded to Facebook a day.
It is estimated that 300 million photos are uploaded to Facebook a day.. The explosive volume increase of these images are outpacing the cost decrease of storage devices, and thus lossy image compression is still indispensable in today’s visual communication systems. The most prevalent compression format for images remains JPEG (Joint Photographic Experts Group)222https://jpeg.org/: a lossy image compression standard whose first and most popular version was finalized more than two decades ago. JPEG is a block-based transform coding scheme, where an image is first divided into non-overlapping pixel blocks, transformed via discrete cosine transform (DCT) to coefficients, then lossily quantized and entropy coded.
Broadly speaking, there are two approaches to decoding a JPEG image. Given encoded quantization bin indices of different DCT coefficients in a pixel block, hard decoding chooses the bin centers as reconstructed coefficients and performs inverse DCT to recover the block’s pixels. It is thus inevitable that when the quantization bin sizes increase (coarser quantization), the resulting reconstruction quality will worsen correspondingly.
Instead, one can take a soft decoding approach: each DCT coefficient is only constrained to be within the indexed quantization bin, and the reconstruction value is chosen with the help of signal priors. This is the approach taken in many previous works [1, 2, 3, 4] and is the approach taken in this paper. The challenge thus lies in identifying appropriate signal priors and incorporating them into an effective soft decoding algorithm.
In this paper, we combine three signal priors—Laplacian prior for DCT coefficients , sparsity prior and graph-signal smoothness prior for image patches [6, 7]—to build an efficient soft decoding algorithm for JPEG images. Specifically, for each
code block in an image, we first use the Laplacian prior—modeling the probability distributions of DCT coefficients as Laplacian—to compute an initial signal estimate in a closed form that minimizes the expected mean square error (MMSE). However, optimizing non-overlapping blocks individually can lead to block artifacts. Using a sparse signal model for a larger patch that straddles multiplecode blocks, one can remove block artifacts using an over-complete dictionary of atoms  trained offline from a large dataset of natural image patches.
The complexity of computing a code vector is high when the dictionary is large. Unfortunately, when one reduces the dictionary size, recovery of high DCT frequencies in image patches becomes poor. We explain this phenomenon by drawing an analogy to low-rate vector quantizers (VQ)  when the DCT frequencies are modeled as Laplacian distributions.
To satisfactorily recover high DCT frequencies, we design a new graph-signal smoothness prior, where the key assumption is that the desired signal (pixel patch) contains mainly low graph frequencies. Our graph-signal smoothness prior is constructed using left eigenvectors of the random walk graph Laplacian matrix (LERaG); compared to previous graph-based priors [10, 11, 12], LERaG has desirable image filtering properties with low computation overhead. We demonstrate how LERaG can facilitate recovery of high DCT frequencies of piecewise smooth (PWS) signals via an interpretation of low graph frequency components as relaxed solutions to normalized cut in spectral clustering —the closer a target signal is PWS, the more easily the pixels are divided into clusters, and the more likely LERaG can restore the signal. Finally, we construct a soft decoding algorithm using the three signal priors with appropriate prior weights. Experimental results show that our proposal outperforms state-of-the-art soft decoding algorithms in both objective and subjective evaluations noticeably.
The outline of the paper is as follows. We first review related works in Section II. We then discuss the three signal priors in order: the Laplacian prior, the sparsity prior and the graph-signal smoothness prior, in Section IV, Section V and Section VI, respectively. In Section VII, we combine the priors into a JPEG soft decoding algorithm. Finally, we present results and conclusion in Section VIII and IX, respectively.
Ii Related Work
Ii-a Soft Decoding of JPEG Images
There are two general approaches to reconstruct a compressed JPEG image in the literature: deblocking and dequantization. The purpose of deblocking is to remove the block artifacts in a JPEG image. By regarding compression noise as additive white Gaussian noise, deblocking works on decoded JPEG image directly, and recovers it in a similar way as denoising via pre-defined prior models, such as local smoothness [14, 15], Gaussian processes , sparsity , etc. However, the non-linearity of quantization operations in JPEG makes quantization noises signal dependent, far from being white and independent. Inaccurate modeling of compression artifacts limits the restoration performance of deblocking. Dequantization, also called soft decoding, treats JPEG image restoration as an ill-posed inverse problem: find the most probable transform coefficients in a code block subject to indexed quantization bin constraints, with the help of signal priors and optimization. Soft decoding utilizes all available information such as indexed quantization bin boundaries and natural image priors, and hence has more potential to get better restoration performance. Soft decoding is also the approach taken in this paper. In the following, we review some popular and state-of-the-art methods in the literature.
Many dequantization methods exploit image priors in the pixel or transform domain to address the inverse problem, such as the classical projection on convex sets (POCS) method , the total variation (TV) model , and non-local self-similarity in DCT domain . Accompanying its success in other restoration applications, the sparsity prior has also shown its promise in combating compression distortions [3, 19, 20]. For instance, Chang et al.  proposed to learn the dictionary from the input JPEG image, and use total variation and quantization constraint to further limit the solution space. However, the dictionary trained from JPEG image itself will also learn noise patterns as atoms. Therefore, in some cases, this method will enhance but not suppress the compressed artifacts. Very recently, Zhao et al.  proposed to combine structural sparse representation prior with quantization constraint. This scheme achieves state-of-the-art soft decoding performance. As depicted by the theoretical analysis in Section V-C, the sparsity prior cannot provide satisfactory high-frequency information preservation when the dictionary size is small.
Ii-B Graph Laplacian Regularizer
Most of existing methods [22, 11, 23, 24] utilize combinatorial graph Laplacian to define the smoothness prior, which is real, symmetric and positive-semidefinite. Its spectrum carries a notion of frequency. However, its filtering strength depends on the degrees of the graph vertices. Several normalized graph Laplacian versions are proposed so that their filtering strength is independent of the vertex degrees. One popular option is to normalize each weight symmetrically . The symmetrically normalized graph Laplacian has similar properties as the combinatorial Laplacian, except that it does not have the DC component. Therefore, it cannot handle constant signals well.  proposed a doubly stochastic graph Laplacian, which is symmetric and contains the DC component. However, it requires non-trivial computation to identify transformation matrices to make the rows and columns of the Laplacian matrix stochastic.
In our previous work , we used a graph-signal smoothness prior based on combinatorial graph Laplacian for soft decoding of JPEG image. In this paper, we propose a new smoothness prior using left eigenvectors of the random walk graph Laplacian matrix to overcome the drawbacks of the combinatorial graph Laplacian. Further, we provide a thorough analysis from a spectral clustering perspective to explain why our proposed graph-signal smoothness prior can recover high DCT frequencies of piecewise smooth signals.
Iii Problem Formulation
Iii-a Quantization Bin Constraint
We begin with the problem setup. In JPEG standard, each non-overlapping pixel block (called code block in the sequel) in an image is compressed independently via transform coding. Specifically, each code block in vector form is transformed via DCT to 64 coefficients , where is the transform matrix. The -th coefficient is quantized using quantization parameter (QP) —assigned a quantization bin (q-bin) index (q-index) as:
Because natural images tend to be smooth, most AC coefficients are close or equal to zero. JPEG also employs larger QPs for higher frequencies  due to human’s lesser visual sensitivities to fast changing details, resulting in compact signal representation in the DCT domain for most code blocks.
At the decoder, having received only q-index there exists an uncertainty when recovering , in particular:
This quantization constraint defines the search space for during restoration of the code block.
Iii-B Soft Decoding of JPEG Images
A JPEG standard decoder  simply chooses the q-bin centers as the reconstructed coefficients and performs inverse DCT to recover the block’s pixels. Clearly, the restoration quality will worsen as the sizes of the q-bins increase, particular at high frequencies. This approach is called hard decoding. Instead, our approach is to only constrain each DCT coefficient to be within the indexed q-bin using (2), and choose the “optimal” reconstruction value using pre-defined signal priors. This approach is called soft decoding.
It is known that block artifacts appear as unnatural high frequencies across block boundaries at low rates . To reduce block artifacts, we enforce consistency among adjacent blocks as follows. As shown in Fig. 2, we define a larger patch that encloses a smaller code block , where . Mathematically, , where binary matrix extracts pixels in corresponding to the smaller code block . is the basic processing unit containing pixels from multiple patches, and thus when recovering , pixels across patch boundaries are enforced to be consistent by averaging.
The ill-posed nature of soft decoding means that signal prior(s) is needed to differentiate among good candidate solutions ’s. We first denote by the vector of q-indices corresponding to block surrounded by (i.e., ) and the vector of corresponding QPs. Having defined , we pursue a maximum a posterior (MAP) formulation and seek the signal
with the largest posterior probability. Mathematically, MAP is formulated as:
where denotes the likelihood of observing q-indices given . can be written as:
where here means element-by-element division. Thus, the MAP formulation (3) becomes:
Iv The Laplacian Prior
Given that individual DCT coefficients of feasible solutions are constrained to be inside indexed q-bins (2), one natural choice for signal prior is the Laplacian prior, which is also defined for individual DCT coefficients. Specifically, the Laplacian prior 
states that the probability density function of a DCT coefficientis:
where is a parameter.  show that the higher the frequency, the sharper the Laplacian distribution (larger ).
Using the Laplacian prior alone, given q-bin constraints we can compute a minimum mean square error (MMSE) solution for soft decoding. In particular, each optimal DCT coefficient of a code block in a MMSE sense can be computed independently as:
By taking the derivative of (7) with respect to and setting it to zero, we obtain a closed-form solution:
The recovered code block can be obtained by performing inverse DCT on the estimated coefficients . After performing MMSE estimation for all code blocks, we get a soft decoded image.
A clear advantage of Laplacian-based soft decoding is that there is a closed-form MMSE solution computed efficiently, which by definition has a smaller expected squared error than a MAP solution, which is just the most probable solution . The closed form is possible because the Laplacian prior describes distributions of individual coefficients, which is also how the q-bin constraints are specified in (2). However, the MMSE solution (8) can only be used to recover code blocks separately, and thus cannot handle block artifacts that occur across adjacent blocks. As such, we next propose to employ the sparsity prior to provide additional a priori signal information and optimize at a larger patch level .
V The Sparsity Prior
Given that it is difficult to use the Laplacian prior directly to recover a larger patch , we first formulate a MAP problem for a patch using the sparsity prior. Then, using again the Laplacian prior we analyze the K-SVD based dictionary learning, showing that when the dictionary is small, the atoms tend to have low average DCT frequency.
V-a Sparse Signal Model
The sparsity prior  states that a signal can be well approximated by a sparse linear combination of atoms from an appropriately chosen over-complete dictionary , where the dictionary size is much larger than the signal dimension, i.e., . Mathematically we write:
where code vector contains the coefficients corresponding to atoms in , and is a small error term. is assumed to be sparse; i.e., the number of non-zero entries in is small. Dictionary can be learned offline from a large set of training patches using K-SVD .
Given signal , an optimal can be found via :
where is the -norm of ; is a parameter trading off the fidelity term with the sparsity prior. (10) is NP-hard, but the matching and basis pursuit algorithms have been shown effective in finding approximated solutions [8, 29, 30]. For instance, in , the orthogonal matching pursuit (OMP) algorithm is employed, which greedily identifies the atom with the highest correlation to the current signal residual at each iteration. Once the atom is selected, the signal is orthogonally projected to the span of the selected atoms, the signal residual is recomputed, and the process repeats.
We can write the sparsity prior as a probability function:
V-B Sparsity-based Soft Decoding
Given the prior probability in (11), we can rewrite the optimization problem in (5) as follows:
Compared to (10), there is an additional quantization constraint, which is specific to the soft decoding problem.
In (12), both signal and code vector are unknown. We can solve the problem alternately by fixing one variable and solving for the other, then repeat until convergence:
Step 1–Initial Estimation: Initialize . For example, is initialized as the closed-form solution image using the Laplacian prior described in Section IV.
Step 2–Sparse Decomposition: Given of iteration , compute the corresponding code vector by solving the following minimization problem:
using OMP333In the unlikely case that there exist multiple ’s that yield the same minimum objective, we assume that the algorithm returns deterministically the “first” solution (e.g., one with non-zero coefficients of the smallest atom indices) so the solution to (13) is always unique. stated earlier.
Step 3–Quantization Constraint: Given sparse code , optimize that satisfies the q-bin constraint:
which can be solved via quadratic programming .
Step 2 and Step 3 are repeated until converges. In the above procedure, the computational burden mainly lies in the sparse code search step, where the complexity of OMP increases linearly with the number of dictionary atoms.
We prove local convergence of our sparsity-based soft decoding algorithm via the following lemma.
The sparsity-based soft decoding algorithm converges to a local minimum.
Steps 2 and 3, examining variables and separately, are both minimizing objective in (12) that is lower-bounded by 0. At step 2, given that was optimized assuming a fixed previous , the algorithm finds . If , then the objective must be smaller; otherwise, is not the objective-minimizing argument, a contradiction. The same is true for step 3. Hence the objective is monotonically decreasing when the variables are updated to different values. Given that the objective is lower-bounded by , the objective cannot decrease indefinitely, and hence the algorithm converges to a local minimum. ∎
V-C Limitation of Small K-SVD Trained Dictionary
We now argue that when the size of a dictionary trained by K-SVD  is small (to reduce complexity during code vector search via OMP), the recovery of high DCT frequencies suffers. Given a set of training patches each of size , dictionary of atoms, is trained using K-SVD via the following formulation:
The variables are both and , which are solved alternately by fixing one and solving for the other.
where is the DCT transform for the -pixel patch, and is the vector of DCT coefficients for .
We rewrite the optimization to its constrained form:
where is a pre-set sparsity limit for each .
For intuition, we focus on the special case when . In this special case, the dictionary learning problem is analogous to the vector quantization (VQ) design problem . Specifically, selecting atoms in dictionary is analogous to designing partition so that their union is the space of feasible signal , i.e., , and no two partitions overlap, i.e. , . The centroid of each cell is atom .
When the number of training samples tends to infinity, the sum over training patches is replaced by integration over each with probability :
where a signal in partition (DCT frequency domain) is represented by atom (in pixel domain), thus distortion .
We have assumed in Section IV that
is a product of Laplacian distributions for individual DCT frequencies, where for high frequencies the distributions are more skewed and concentrated around zero. Thus, as the numberof atoms in dictionary (hence the number of partitions ) decreases, in order to minimize the expected squared error in (18), quantization bins will be relatively coarse for large magnitude of high frequencies, because they are less probable in comparison . See Fig. 3 for an example that shows coarse quantization bins for for a product VQ of two dimensions. This explains why when the dictionary is small, recovery of large magnitude high DCT frequencies is difficult.
We can observe empirically this is indeed the case. For a given K-SVD trained dictionary , we compute the mean frequency (MF) of atoms as follows. Let and be a DCT frequency and an atom ’s corresponding coefficient at that frequency. MF for is computed as:
We computed MF for dictionaries of sizes ranging from to (the dimension of an atom is if is set to ). As illustrated in Fig. 4 (a), MF is roughly linearly increasing with dictionary size. Further, as illustrated in Fig. 4 (b), the performance of sparsity-based soft decoding also improves with dictionary size. As shown in Fig. 5, we can observe that using a large dictionary can significantly improve PSNR and SSIM values of the restored image, where the improvement over a smaller dictionary mainly lies in edge structures.
Vi The Graph-Signal Smoothness Prior
Given that the sparsity prior cannot recover high DCT frequencies when QP is large and the dictionary is small, we introduce the graph-signal smoothness prior, which was shown to restore piecewise smooth (PWS) signals well [10, 11, 12]. We first define a discrete signal on a graph, then define a conventional smoothness notion for graph-signals and interpret graph-signal smoothness in the graph frequency domain. Finally, we propose a new graph-signal smoothness prior based on the random walk graph Laplacian for soft decoding.
Vi-a Signal on Graphs
We define a graph as . and are respectively the sets of vertices and edges in . in the adjacency matrix is the weight of the edge connecting vertices and . We consider only undirected graphs and non-negative edge weights ; i.e., and .
A graph-signal is a collection of discrete samples on vertices of a graph , where . In our work, for each target pixel patch we construct a fully-connected graph, where each vertex (pixel) is connected to all other vertices in . The edge weight between two vertices and is conventionally computed using Gaussian kernels :
where and are the intensity and location of pixel , and and are chosen parameters. This definition is similar to domain and range filtering weights in bilateral filter .
Vi-B Graph Laplacians and Signal Smoothness
A degree matrix is a diagonal matrix where . A combinatorial graph Laplacian is :
There exist two normalized variants of : normalized graph Laplacian and random walk graph Laplacian ,
A normalized version means that its filtering strength does not depend on the degrees of the graph vertices.
A common graph-signal smoothness prior can then be defined as:
which states that a probable signal has small . By (23), is small if has similar values at each vertex pair connected by an edge, or the edge weight is small. Thus is smooth if its variations across connected vertices are consistent with the underlying graph (where edge weights reflect expected signal variations). For example, if a discontinuity is expected at pixel pair , one can pre-set a small weight , so that the smoothness prior will not over-smooth during optimization. Later, we will propose a different smoothness prior than the conventional (24) with more desirable filtering properties.
Vi-C Frequency Interpretation of Graph Smoothness Prior
Since is a real, symmetric, positive semi-definite matrix, it can be decomposed into a set of orthogonal eigenvectors, denoted by
, with real non-negative eigenvalues. We define as the eigen-matrix with ’s as columns and as the diagonal matrix with ’s on its diagonal. can thus be written as:
We define the graph Fourier transform
graph Fourier transform(GFT) matrix as . A graph-signal can be transformed to the graph frequency domain via:
With this definition, the graph Laplacian regularizer can be written as:
can thus be written as a sum of squared GFT coefficients each scaled by the eigenvalue , which is interpreted as frequency in graph transform domain.
By maximizing the graph-signal smoothness prior in (24) (minimizing ), signal is smoothened with respect to the graph, i.e., high graph frequencies are suppressed while low graph frequencies are preserved. It is shown [10, 11, 12] that PWS signals—with discontinuities inside signals that translate to high DCT frequencies—can be well approximated as linear combinations of low graph frequencies for appropriately constructed graphs. We provide an explanation next, which leads naturally to the derivation of a new smoothness prior.
Vi-D Building A Random Walk Graph Laplacian Prior
We first show that low graph frequencies for the normalized graph Laplacian can be interpreted as relaxed solutions for spectral clustering, which are PWS in general. We then argue that to induce desirable filtering properties for , a similarity transformation is required, resulting in the random walk Laplacian . Finally, we show that a more appropriate smoothness prior than (24) can be defined using , with many desirable filtering properties.
Vi-D1 Spectral Clustering
To understand low graph frequencies for computed from a PWS signal , we take a spectral clustering perspective. Spectral clustering  is the problem of separating vertices in a similarity graph into two subsets of roughly the same size via spectral graph analysis. Specifically, the authors in a landmark paper  proposed a minimization objective normalized cut (Ncut) to divide into subsets and :
where counts the edges across the two subsets and , and counts the degrees of vertices in :
From (28), it is clear that a good division of into similar clusters and with small will have small and large and . However, minimizing Ncut over all possible and that divide is NP-hard.
Towards an approximation, the authors first rewrote the minimization of Ncut as:
We see that is piecewise constant (PWC).
The problem constraints (31) are then relaxed, resulting in:
Note that the objective in (33) is the Rayleigh quotient with respect to matrix . It is easy to see that minimizes this objective: . Hence is the first eigenvector of , and the solution to (33) is the second eigenvector of . We can thus conclude the following:
The second eigenvector of is a relaxed solution to the Ncut problem; if the solution becomes exact, then is PWC according to (31).
Eigen-basis of thus seem suitable to compactly represent PWS signals.
Vi-D2 Smoothness Prior using Random Walk Graph Laplacian
However, because the first eigenvector of , , is not a constant vector (DC component) in general, the eigen-basis of are not suitable for filtering of images, which tend to be smooth. We thus perform a similarity transformation444https://en.wikipedia.org/wiki/Matrix_similarity on to efficiently filter constant signals. Let , where is a matrix with eigenvectors of as columns, and is a diagonal matrix with corresponding eigenvalues on its diagonal. We now define . We see that has left eigenvectors (in rows):
Note that because and are similar, they have the same eigenvalues . When a constant signal is projected onto left eigenvectors , is in fact in , which is orthogonal to other rows of . This means that left eigen-basis coefficients of , defined as , has non-zero only in the first element when .
However, is asymmetric, which cannot be easily decomposed into a set of orthogonal eigenvectors with real eigenvalues. Thus there is no clear graph frequency interpretation of as we did for in (27).
To achieve symmetry, suppose we use and define the following regularization term instead:
If we write , then (35) becomes:
Since is a diagonal matrix, one can see that:
where and are maximum and minimum vertex degrees in graph . So instead of minimizing , we can minimize its upper bound .
We can interpret more naturally as follows. can now be written as:
Thus we can now have a graph frequency interpretation of our regularizer : high frequencies of (or ) are suppressed to restore smooth signal . For convenience, we call our regularizer Left Eigenvector Random-walk Graph Laplacian (LERaG).
For efficient computation, can be most efficiently computed as:
Compared to in (24), LERaG is based on the normalized variant , and thus its filtering is insensitive to the vertex degrees of particular graphs. Further, unlike , LERaG is based on left eigenvectors of that efficiently filter constant signals, thus is suitable for image filtering. Finally, compare to  which requires non-trivial computation to identify transformation matrices to make row and column stochastic, LERaG can be computed simply via (39). Thus our regularizer has desirable filtering properties with little computation overhead compared to previous graph-based regularizers in the literature.
Vi-D3 Analysis of Ideal Piecewise Constant Signals
We now show that LERaG computes to 0 for ideal PWC signal , where:
where constants and are sufficiently different, i.e., for some sufficiently large .
Using (20) to compute edge weights, this PWC signal implies that if and belong to different constant pieces. Assume further that is sufficiently large relative to location differences , so that the first exponential kernel in (20) alone determines . The adjacency matrix and the graph Laplacian are then both block-diagonal:
where is a matrix of all zeros, and and are respective adjacency and combinatorial graph Laplacian matrices for full unweighted graph of vertices:
Clearly, normalized is also block-diagonal:
where is a normalized graph Laplacian for a full unweighted graph of vertices:
We already know is the first eigenvector corresponding to eigenvalue . Further, we see also that a second (unnormalized) eigenvector corresponding to eigenvalue can be constructed as:
It is easy to verify that and . In fact, we see that takes the form in (31); i.e., for this graph corresponding to an ideal PWC signal, the relaxed solution to Ncut is also the optimal one, which is PWC.
Analyzing coefficients , we see that can be expressed exactly as a linear combination of the first two eigenvectors of , i.e., , where
Because the first two eigenvectors and correspond to eigenvalue of , we can conclude:
Given an ideal two-piece PWC signal defined in (40), can be represented exactly using the first two eigenvectors of corresponding to eigenvalue , and hence LERaG evaluates to .
In other words, is also ideal low-pass (cut-off frequency at 0) given eigenvectors of , and so LERaG, which penalizes high frequencies according to (38), computes to . Another interpretation is that an ideal two-piece PWC signal leads to two clusters of identical vertices, hence of is an exact solution to Ncut, which is PWC (31). In this case, and are both PWC, so together they are sufficient to represent , also PWC.
Vi-D4 Analysis of Ideal Piecewise Smooth Signals
We now generalize our analysis of PWC signals to PWS signals. We first define a PWS signal as:
where . In words, contains two smooth pieces in and : two samples in the same piece are similar to within , and two samples from two different pieces differ by more than . Clearly, if , then is PWC.
We now construct a complete graph for using (20) to compute edge weight between vertices and . The resulting normalized is still block-diagonal for sufficiently large , but the diagonal blocks are no longer in (44). in this case can be computed as:
One can verify that and . is roughly PWS according to (49), since is similar for nodes within the same smooth piece. Thus , also roughly PWS, can be well approximated as , with little energy leakage to high frequencies; i.e., is approximately low-pass given eigenvectors of . That means LERaG computes to a small value for PWS signal .
In the more general case when signal is not PWS, then is not block-diagonal, and second eigenvector computed in (33) corresponds to an eigenvalue . This first non-zero eigenvalue is called the Fiedler number , and is a measure of how connected the graph is. In our signal restoration context, it means that the closer is to PWS, the closer is to , and the more likely LERaG can recover the signal well.
Vii Soft Decoding based on Prior Mixture
We now combine the priors discussed in previous sections to construct an efficient JPEG soft decoding algorithm.
Vii-a The Objective Function Formulation
Combining the sparsity prior and LERaG, we arrive at the following optimization for soft decoding:
where both patch and its sparse code are unknown.
For parameter of the sparsity prior, we empirically set . For parameter of the graph-signal smoothness prior, we set it adaptively as follows. As argued in Section IV-C, if a patch contains large high DCT frequencies, then it will not be recovered well by the sparsity prior. We thus adaptively increase if q-bin indices indicate the presence of high DCT frequencies in target .
Since by design adjacent patches have overlaps, the pixel values in an overlapping region from two patches should be as similar as possible. To this end, upon obtaining all the optimal reconstructed patches , the soft-decoded image can then be obtained by averaging over pixels in overlapped regions.
Optimization of (50) is not convex. Instead, we propose to employ an alternating procedure to optimize and iteratively. First, for initialization we compute the Laplacian prior based MMSE solution per code block for the entire image to obtain . Then, the -th () iteration of the optimization procedure is described as follows:
1) Fix and estimate :
The optimization problem becomes a standard sparse coding:
which can be efficiently solved by some well-developed greedy -minimization algorithm such as OMP.
2) Fix and estimate :
The objective function becomes:
which can be solved efficiently using quadratic programming .
The outputted solution is used as the initial for -th iteration. The graph is updated according to . The algorithm terminates when both and converge. The local convergence can be proved in a similar way as Lemma 1.
|Images||JPEG||BM3D ||KSVD ||ANCE ||DicTV ||SSRQC ||Ours|
|Images||JPEG||BM3D ||KSVD ||ANCE ||DicTV ||SSRQC ||Ours|
We now present experimental results to demonstrate the superior performance of our soft decoding algorithm. We selected fourteen widely used images in the literature as test images. The first eight images are sized of , the rest six images are sized of , as illustrated in Fig. 7.
Viii-a Competing Methods
Our proposed method is compared against: 1) BM3D : a state-of-the-art image denoising method, which is included because the restoration of compressed images can be viewed as a denoising problem; in our test, the extended version  is used, which achieves better performance than original BM3D. 2) KSVD : a well-known sparse coding framework; we test KSVD with a large enough over-complete dictionary (). In contrast, our scheme uses a much smaller one (). We will demonstrate that our scheme achieves better reconstruction performance along with lower computational complexity compared with KSVD. 3) ANCE : a state-of-the-art JPEG image restoration algorithm. 4) DicTV : a recent sparsity-based compressed image restoration algorithm, which exploits both sparsity and TV priors. 5) SSRQC : a state-of-the-art JPEG soft decoding method published very recently, which can be regarded as one of the best algorithms to-date. The source codes of compared methods are all kindly provided by their authors.
The determination of parameters deserves clarification. Regarding the size of patches, larger patches can result in more overlapping pixels among neighboring code blocks, which will be beneficial in removing block artifcacts. However, it will also make the task of finding their sparse representations difficult. Thus we set the size of the patch to be , overlapping 2 pixels horizontally and vertically with neighboring patches, to achieve a good tradeoff. For the uncompressed training set used to obtain the over-complete dictionary, we randomly select five images from the Kodak Lossless True Color Image Suite555 http://r0k.us/graphics/kodak/. The training set does not have any overlap with the test set.
Our method can be easily extended to restore compressed color images. When compressing color images, JPEG first performs YUV color transformation, and then compresses the resulting Y, U and V channels separately. As the image signal energy is highly packed into the luminance channel Y, we apply our proposed scheme to Y. For chrominance channels U and V, we only use the graph-signal smoothness prior to speed up the restoration process. Due to space limitation, we only report the test results on gray-level images in the following.
Viii-B Objective Performance Comparison
Table I and Table II tabulate the objective evaluations (PSNR and SSIM) of the above algorithms for fourteen test images. The test images are coded by a JPEG coder with with a small QF = 5 and a medium QF = 40.
When QF = 5, the average PSNR gains are 0.74dB and 0.7dB respectively, compared to two well-known algorithms BM3D and KSVD. Note that KSVD uses a large enough dictionary for reconstruction. While our method uses a much smaller dictionary, it achieves better results than KSVD, which demonstrate the effectiveness of the proposed graph-signal smoothness prior. Compared to the state-of-the-art ANCE algorithm, our method greatly improves the reconstruction quality. The PSNR gain is up to 1.48dB (Butterfly) and the average PSNR gain is 0.93dB. Our method also performs better than state-of-the-art sparsity based methods. Compared to DicTV, designed specifically to restore compressed images, the average gain of our method is 1.18dB. Compared to SSRQC, the newest sparsity-based soft decoding method, our method achieves an average gain of 0.31dB. When QF = 40, the proposed algorithm also performs better than other methods for all test images. The average PSNR gains are 0.89dB, 0.91dB, 0.84dB, 2.16dB and 0.5dB, respectively.
We further show the test results for a wide range of QFs (from 10 to 80), illustrated in Fig. 8. Due to the space limitation, here we only report the test results over six test images, including four images and two images. We observe that our method consistently performs better than other methods.
Viii-C Subjective Performance Comparison
We now demonstrate that our soft decoding algorithm also achieves better perceptual quality of the restored images. When QF is 5, the quantization noise is severe, which leads to very poor subjective quality of JPEG-compressed images. Therefore, we use QF = 5 as an example to evaluate visual quality of different schemes.
Fig. 9 and Fig. 10 illustrate the perceptual quality for different methods. The test images Butterfly and Leaves have rich structure regions. We use them as examples to demonstrate the performance of different methods in recovering structure regions. The images reproduced by BM3D suffer from highly visible noises, especially the blocking artifacts. KSVD achieves better results than BM3D, where fewer blocking artifacts are detectable. However, KSVD cannot preserve the edge structure well. In results produced by DicTV, there are still strong blocking artifact. This is because, in DicTV, the dictionary is learned from the JPEG compressed image itself. When quantization step is large, the structure noise is also learned as atoms of dictionary. Therefore, it will enhance but not suppress the quantization noise through subsequent sparse coding based restoration. ANCE can suppress most blocking artifacts, but there are still noticeable artifacts along edges. SSRQC removes most block artifacts. However, it can be observed that the edge regions are blurred to some extent.
The images restored by our method are much cleaner, in which the structures and sharpness of edges are well preserved. Our proposed method primarily benefit from exploiting both sparsity prior for recovering textures (low DCT frequencies) and graph-signal smoothness prior for recovering structures (high DCT frequencies). The experimental results validate this point. Our proposed method can also remove DCT blocking artifacts in smooth areas completely, which are largely free of the staircase and ringing artifacts along edges. Due to space limitation, here we only show the subjective comparisons for low QF, as the superiority can be better visually reflected in these cases. Our method also achieves better subjective quality for medium to high QFs as well.
Viii-D Computational Complexity Comparison
We further report the results on computational complexity comparison. Here we show the average running time comparison on eight test images when QF = 40. The compared methods are running on a typical laptop computer (Intel Core i7 CPU 2.6GHz, 8G Memory, Win10, Matlab R2014a). As depicted in Table III, the complexity of our method is lower than the state-of-the-art algorithms BM3D (BM3D-SAPCA) and ANCE. Compared with K-SVD, which is with a large enough over-complete dictionary, the proposed scheme can significantly reduce the computational complexity.
Viii-E Comparison of Different Graph-signal Smoothness Prior
To demonstrate the superiority of the proposed graph Laplacian regularizer LERaG, we compare it with the popular combinational graph Laplacian regularizer, symmetrically normalized graph Laplacian regularizer and the doubly stochastic one proposed in . We test eight images when QF = 5. The results reported are PSNR values after the first iteration. That is, after solving (51), we use three different graph Laplacian regularizer to formulate (52). For fairness of comparison, the regularization parameter are carefully selected for optimal performance for each case. It can be found that the proposed regularizer outperform other three ones with respect to average PSNR. Compared with other three graph Laplacian regularizers, our method can improve the PSNR performance up to 0.18dB, 0.87dB and 0.42dB, respectively.
In this paper, a novel soft decoding approach for the restoration of JPEG-compressed images is proposed. The main technical contribution is twofold. First, we propose a new graph-signal smoothness prior based on left eigenvectors of the random walk graph Laplacian with desirable image filtering properties, which can recover high DCT frequencies of piecewise smooth signals well. Second, we combine the Laplacian prior, sparsity prior and our new graph-signal smoothness prior into an efficient soft-decoding algorithm. Experimental results demonstrate that our method achieves better objective and subjective restoration quality compared to state-of-the-art soft decoding algorithms.
-  A. Zakhor, “Iterative procedures for reduction of blocking effects in transform image coding,” IEEE Transactions on Circuits and Systems for Video Technology,, vol. 2, no. 1, pp. 91–95, Mar 1992.
-  K. Bredies and M. Holler, “A total variation-based jpeg decompression model,” SIAM J. Img. Sci., vol. 5, no. 1, pp. 366–393, Mar. 2012.
-  H. Chang, M. Ng, and T. Zeng, “Reducing artifacts in jpeg decompression via a learned dictionary,” IEEE Transactions on Signal Processing,, vol. 62, no. 3, pp. 718–728, Feb 2014.
-  X. Liu, X. Wu, J. Zhou, and D. Zhao, “Data-driven sparsity-based restoration of jpeg-compressed images in dual transform-pixel domain,” in
-  E. Lam and J. Goodman, “A mathematical analysis of the dct coefficient distributions for images,” IEEE Transactions on Image Processing,, vol. 9, no. 10, pp. 1661–1666, Oct 2000.
-  M. Elad, M. Figueiredo, and Y. Ma, “On the role of sparse and redundant representations in image processing,” Proceedings of the IEEE, vol. 98, no. 6, pp. 972–982, June 2010.
D. Shuman, S. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,”IEEE Signal Processing Magazine,, vol. 30, no. 3, pp. 83–98, May 2013.
-  M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing,, vol. 54, no. 11, pp. 4311–4322, Nov 2006.
-  A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Norwell, MA, USA: Kluwer Academic Publishers, 1991.
-  J. Pang, G. Cheung, W. Hu, and O. Au, “Redefining self-similarity in natural images for denoising using graph signal gradient,” in APSIPA ASC, December 2014.
-  J. Pang, G. Cheung, A. Ortega, and O. Au, “Optimal graph laplacian regularization for natural image denoising,” in IEEE International Conference on Acoustics, Speech and Signal Processing, April 2015.
-  W. Hu, G. Cheung, and M. Kazui, “Graph-based dequantization of block-compressed piecewise smooth images,” in IEEE Signal Processing Letters, vol. 23, no.2, February 2016, pp. 242–246.
-  J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, Aug. 2000.
-  I. Reeve, H. and J. Lim, “Reduction of blocking effect in image coding,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 8, Apr 1983, pp. 1212–1215.
-  D. Tschumperle and R. Deriche, “Vector-valued image regularization with pdes: a common framework for different applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 4, pp. 506–517, April 2005.
-  Y. Kwon, K. I. Kim, J. Tompkin, J. H. Kim, and C. Theobalt, “Efficient learning of image super-resolution and compression artifact removal with semi-local gaussian processes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1792–1805, Sept 2015.
-  C. Jung, L. Jiao, H. Qi, and T. Sun, “Image deblocking via sparse representation,” Signal Processing: Image Communication, vol. 27, no. 6, pp. 663 – 677, 2012.
-  X. Zhang, R. Xiong, X. Fan, S. Ma, and W. Gao, “Compression artifact reduction by overlapped-block transform coefficient estimation with block similarity,” IEEE Transactions on Image Processing,, vol. 22, no. 12, pp. 4613–4626, Dec 2013.
-  X. Liu, X. Wu, J. Zhou, and D. Zhao, “Data-driven soft decoding of compressed images in dual transform-pixel domain,” IEEE Transactions on Image Processing, vol. 25, no. 4, pp. 1649–1659, April 2016.
-  C. Zhao, J. Zhang, S. Ma, X. Fan, Y. Zhang, and W. Gao, “Reducing image compression artifacts by structural sparse representation and quantization constraint prior,” IEEE Transactions on Circuits and Systems for Video Technology, vol. PP, no. 99, pp. 1–1, 2016.
-  A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs: Frequency analysis,” IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3042–3054, June 2014.
-  W. Hu, X. Li, G. Cheung, and O. Au, “Depth map denoising using graph-based transform and group sparsity,” in IEEE International Workshop on Multimedia Signal Processing, Pula, Italy, October 2013.
Y. Mao, G. Cheung, and Y. Ji, “Image interpolation for dibr viewsynthesis using graph fourier transform,” in3DTV-Conference, July 2014, pp. 1–4.
P. Wan, G. Cheung, D. Florencio, C. Zhang, and O. C. Au, “Image bit-depth enhancement via maximum a posteriori estimation of ac signal,”IEEE Transactions on Image Processing, vol. 25, no. 6, pp. 2896–2909, June 2016.
-  A. Kheradmand and P. Milanfar, “A general framework for regularized, similarity-based image restoration,” IEEE Transactions on Image Processing, vol. 23, no. 12, pp. 5136–5151, Dec 2014.
-  X. Liu, G. Cheung, X. Wu, and D. Zhao, “Inter-block consistent soft decoding of jpeg images with sparsity and graph-signal smoothness priors,” in 2015 IEEE International Conference on Image Processing (ICIP), Sept 2015, pp. 1628–1632.
-  W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, 1st ed. Norwell, MA, USA: Kluwer Academic Publishers, 1992.
-  P. Wan, G. Cheung, D. Florencio, C. Zhang, and O. Au, “Image bit-depth enhancement via maximum-a-posteriori estimation of AC signal,” in IEEE Transactions on Image Processing, vol. 25, no.6, June 2016, pp. 2896–2909.
-  S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, Dec 1993.
-  S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Rev., vol. 43, no. 1, pp. 129–159, Jan. 2001.
-  J. Nocedal and S. J. Wright, Numerical Optimization, second edition. World Scientific, 2006.
-  J. Rayleigh, “On the character of the complete radiation at a given temperature,” Philosophical Magazine, vol. 7, pp. 460 C–469, 1889.
-  P. F. Swaszek, “A vector quantizer for the laplace source,” IEEE Transactions on Information Theory, vol. 37, no. 5, pp. 1355–1365, Sep 1991.
-  C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Sixth International Conference on Computer Vision, 1998., Jan 1998, pp. 839–846.
-  U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
-  R. A. Horn and C. R. Johnson, Eds., Matrix Analysis. New York, NY, USA: Cambridge University Press, 1986.
-  D. A. Spielman and S. Teng, “Spectral partitioning works: Planar graphs and finite element meshes,” Linear Algebra and its Applications, vol. 421, no. 2 C3, pp. 284–305, 2007.
K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Bm3d image denoising with shape-adaptive principal component analysis,” inProc. Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS’09), 2009.
-  ——, “Image denoising by sparse 3-d transform-domain collaborative filtering,” IEEE Transactions on Image Processing,, vol. 16, no. 8, pp. 2080–2095, Aug 2007.