1 Introduction
In visual communication and computing systems, the most common cause of image degradation is arguably compression. Lossy compression, such as JPEG [25] and HEVC-MSP [4], is widely adopted in image and video codecs to save both bandwidth and in-device storage. It exploits inexact approximations to represent the encoded content compactly. Inevitably, it introduces undesired, complex artifacts such as blockiness, ringing effects, and blurs, which are typically caused by the discontinuities arising from block-wise processing, the loss of high-frequency components under coarse quantization, and so on. These artifacts not only degrade perceptual visual quality, but also adversely affect various low-level image processing routines that take compressed images as input [11].
As practical image compression methods are not information-theoretically optimal [24], the resulting code streams still possess residual redundancies, which makes restoration of the original signals possible. Different from general image restoration problems, compression artifact restoration has problem-specific properties that can be exploited as powerful priors. For example, JPEG compression first divides an image into 8×8 pixel blocks, followed by a discrete cosine transform (DCT) on every block. Quantization is then applied to the DCT coefficients of every block, with quantization levels known in advance [25]. Moreover, compression noise is more difficult to model than other common noise types. In contrast to the tradition of assuming noise to be white and signal-independent [2], the non-linearity of the quantization operation makes quantization noise non-stationary and signal-dependent.
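The block-wise transform-and-quantize step described above can be sketched as follows. This is a minimal illustration only: the uniform quantization table and the function names are our own assumptions, not the codec's actual implementation (a real JPEG table assigns a different step size to each frequency).

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal n-point 1D DCT-II matrix."""
    C = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            C[k, i] = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    C[0, :] *= 1.0 / np.sqrt(n)
    C[1:, :] *= np.sqrt(2.0 / n)
    return C

def jpeg_quantize_block(block, q_table):
    """DCT-transform one 8x8 pixel block and quantize its coefficients.
    The decoder only knows that each true coefficient lies somewhere
    inside its quantization interval -- the source of quantization noise."""
    C = dct_matrix(8)
    coeffs = C @ (block - 128.0) @ C.T      # 2D DCT after JPEG mean shift
    return np.round(coeffs / q_table)

# toy uniform quantization table (hypothetical; real tables are frequency-dependent)
q_table = np.full((8, 8), 16.0)
block = np.arange(64, dtype=float).reshape(8, 8)
q = jpeg_quantize_block(block, q_table)
```

Coarser tables (larger steps) zero out more high-frequency coefficients, which is exactly the information the restoration model must recover.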
Various approaches have been proposed to suppress compression artifacts. Early works [6, 22] utilized filtering-based methods to remove simple artifacts. Data-driven methods were then considered to avoid inaccurate empirical modeling of compression degradations. Sparsity-based image restoration approaches have been discussed in [7, 8, 19, 23, 26] to produce sharpened images, but they are often accompanied by artifacts along edges and unnaturally smooth regions. In [24], Liu et al. proposed a sparse coding process carried out jointly in the DCT and pixel domains, to simultaneously exploit residual redundancies of JPEG code streams and sparsity properties of latent images. More recently, Dong et al. [11] first introduced deep learning techniques [21] into this problem, by specifically adapting their SRCNN model from [12]. However, it does not incorporate much problem-specific prior knowledge.

The time constraint is often stringent in image or video codec post-processing scenarios. Low-complexity or even real-time attenuation of compression artifacts is highly desirable [28]. The inference process of traditional approaches, for example sparse coding, usually involves iterative optimization algorithms, whose inherently sequential structure, as well as their data-dependent complexity and latency, often constitutes a major bottleneck in computational efficiency [14].
Deep networks benefit from their feed-forward structure and enjoy much faster inference. However, to maintain competitive performance, deep networks demand increased width (number of filters) and depth (number of layers), as well as smaller strides, all leading to growing computational costs [16].

In this paper, we focus on removing artifacts in JPEG-compressed images. Our major innovation is to explicitly combine the prior knowledge in the JPEG compression scheme with the successful practice of dual-domain sparse coding [24], in order to design a task-specific deep architecture. Furthermore, we introduce a One-Step Sparse Inference (1SI) module that acts as a highly efficient, lightweight approximation of sparse coding inference [10]. 1SI also reveals important inner connections between sparse coding and deep learning. The proposed model, named Deep Dual-Domain (D) based fast restoration, proves to be more effective and interpretable than general deep models. It gains remarkable margins over several state-of-the-art methods, in terms of both restoration performance and time efficiency.
2 Related Work
Our work is inspired by the prior wisdom in [24]. Most previous works restored compressed images solely in either the pixel domain [2] or the DCT domain [25]. However, an isolated quantization error in a single DCT coefficient is propagated to all pixels of the same block. An aggressively quantized DCT coefficient can further produce structured errors in the pixel domain that correlate with the latent signal. On the other hand, the compression process sets most high-frequency coefficients to zero, making it impossible to recover details from the DCT domain alone. In view of their complementary characteristics, the dual-domain model was proposed in [24]. While the spatial redundancies in the pixel domain were exploited by a learned dictionary [2], the residual redundancies in the DCT domain were also utilized to directly restore DCT coefficients. In this way, quantization noise was suppressed without propagating errors. The final objective (see Section 3.1) is a combination of DCT- and pixel-domain sparse representations, which can cross-validate each other.
To date, deep learning [21] has shown impressive results on both high-level and low-level vision problems [35, 36]. The SRCNN proposed by Dong et al. [12] showed the great potential of end-to-end trained networks in image super-resolution (SR). Their recent work [11] proposed a four-layer convolutional network tuned from SRCNN, named Artifacts Reduction Convolutional Neural Networks (ARCNN), which was effective in dealing with various compression artifacts.
In [14], the authors leveraged fast trainable regressors and constructed feed-forward network approximations of learned sparse models. By turning sparse coding into deep networks, one may expect faster inference, larger learning capacity, and better scalability. Similar views were adopted in [29] to develop a fixed-complexity algorithm for solving structured sparse and robust low-rank models. The paper [17] summarized this methodology as “deep unfolding”. [35] proposed deeply improved sparse coding for SR, which can be incarnated as an end-to-end neural network. Lately, [34] proposed Deep Encoders to model sparse approximation as feed-forward neural networks. [33] further extended the same “task-specific” strategy to graph-regularized approximation. Our task-specific architecture shares similar spirits with these works.

3 Deep Dual-Domain (D) based Restoration
3.1 Sparsity-based Dual-Domain Formulation
We first review the sparsity-based dual-domain restoration model established in [24]. Considering a training set of uncompressed images, pixel-domain blocks {x_i} (each vectorized from a √n × √n patch; n = 64 for JPEG) are drawn for training, along with their quantized DCT coefficient blocks {y_i}. For each (JPEG-coded) input x, two dictionaries D_1 ∈ R^{n×m_1} and D_2 ∈ R^{n×m_2} (m_1 and m_2 denote the dictionary sizes) are constructed from the training data {y_i} and {x_i}, in the DCT and pixel domains, respectively, via locally adaptive feature selection and projection. The following optimization model is then solved during the testing stage:

min_{α,β} ||y − D_1 α||_2^2 + λ_1 ||α||_1 + λ_2 ||D_2 β − T^{-1}(D_1 α)||_2^2 + λ_3 ||β||_1,
s.t. q^− ≤ D_1 α ≤ q^+,    (1)

where y is the DCT coefficient block of x, α and β are the sparse codes in the DCT and pixel domains, respectively, and T^{-1} denotes the inverse discrete cosine transform (IDCT) operator. λ_1, λ_2 and λ_3 are positive scalars. One noteworthy point is the inequality constraint, where q^− and q^+ represent the (pre-known) quantization intervals according to the JPEG quantization table [25]. The constraint incorporates this important side information and further confines the solution space. Finally, D_2 β provides an estimate of the original uncompressed pixel block x.

Such a sparsity-based dual-domain model (1) exploits residual redundancies (e.g., inter-DCT-block correlations) in the DCT domain without spreading errors into the pixel domain, and at the same time recovers high-frequency information driven by a large training set. However, note that the inference process of (1) relies on iterative algorithms and is computationally expensive. Also, in (1) the three parameters λ_1, λ_2 and λ_3 have to be manually tuned; the authors of [24] simply set them all equal, which may hamper performance. In addition, the dictionaries D_1 and D_2 have to be learned individually for each patch, which allows for extra flexibility but also brings a heavy computational load.
3.2 D: A Feed-Forward Network Formulation
In training, we have the pixel-domain blocks after JPEG compression, as well as the original blocks. During testing, for an input compressed block x, our goal is to estimate the original block, using the redundancies in both the DCT and pixel domains, as well as the JPEG prior knowledge.
As illustrated in Fig. 1, the input x is first transformed into its DCT coefficient block y by feeding it through a constant 2D DCT matrix layer T. The subsequent two layers aim to enforce DCT-domain sparsity, where we refer to the concepts of analysis and synthesis dictionaries in sparse coding [15]. The Sparse Coding (SC) Analysis Module 1 is implemented to solve the following type of sparse inference problem in the DCT domain (λ_1 is a positive coefficient):

min_α (1/2) ||y − D_1 α||_2^2 + λ_1 ||α||_1.    (2)

The Sparse Coding (SC) Synthesis Module 1 outputs the DCT-domain sparsity-based reconstruction in (1), i.e., ŷ = D_1 α.

The intermediate output ŷ is further constrained by an auxiliary loss, which encodes the inequality constraint in (1): q^− ≤ ŷ ≤ q^+. We design the following signal-dependent, box-constrained [20] loss:

L_Q(ŷ, x) = ||(q^− − ŷ)_+||_2^2 + ||(ŷ − q^+)_+||_2^2.    (3)

Note that it takes not only ŷ but also x as input, since the actual JPEG quantization interval [q^−, q^+] depends on x. The operator (·)_+ keeps the non-negative elements unchanged while setting the others to zero. Eqn. (3) will thus only penalize the coefficients falling outside the quantization interval.
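The box-constrained penalty described above can be sketched as follows. This is an illustrative implementation under our own naming; the interval endpoints here are hypothetical stand-ins for the signal-dependent JPEG quantization bounds:

```python
import numpy as np

def box_constrained_loss(y_hat, q_lo, q_hi):
    """Penalize reconstructed DCT coefficients that fall outside their
    known quantization intervals [q_lo, q_hi]; coefficients inside the
    interval incur zero loss. Sketch of the auxiliary loss in Eq. (3)."""
    below = np.maximum(q_lo - y_hat, 0.0)   # (.)_+ keeps non-negative parts
    above = np.maximum(y_hat - q_hi, 0.0)
    return np.sum(below ** 2 + above ** 2)

# toy intervals and candidate reconstructions (hypothetical values)
q_lo = np.array([-1.0, 0.0, 2.0])
q_hi = np.array([1.0, 1.0, 4.0])
inside = np.array([0.0, 0.5, 3.0])      # all coefficients within bounds
outside = np.array([-2.0, 0.5, 5.0])    # first and last violate bounds
```

Because the penalty vanishes on the whole feasible box, it steers the network toward DCT reconstructions consistent with the encoder's side information without overriding the data-fitting losses.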
After the constant IDCT matrix layer T^{-1}, the DCT-domain reconstruction is transformed back to the pixel domain, x̂ = T^{-1}(ŷ), for one more sparse representation. The SC Analysis Module 2 solves (λ_2 is a positive coefficient):

min_β (1/2) ||x̂ − D_2 β||_2^2 + λ_2 ||β||_1,    (4)

while the SC Synthesis Module 2 produces the final pixel-domain reconstruction x̄ = D_2 β. Finally, the reconstruction loss between x̄ and the ground-truth block is enforced.
Note that in the above, we correspond the intermediate outputs of D with the variables in (1), in order to help understand the close analytical relationship between the proposed deep architecture and the sparse coding-based model. That does not imply any exact numerical equivalence, since D allows for end-to-end learning of all parameters (including the sparsity coefficients in (2) and (4)). However, we will see in the experiments that such enforcement of the specific problem structure improves the network performance and efficiency remarkably. In addition, the above relationships suggest that the deep model can be well initialized from the sparse coding components.
3.3 One-Step Sparse Inference Module
The implementation of the SC Analysis and Synthesis Modules appears to be the core of D. While the synthesis process is naturally feed-forward (a multiplication by the dictionary), it is less straightforward to transform the sparse analysis (or inference) process into a feed-forward network.
We take (2) as an example; the same solution applies to (4). Such a sparse inference problem can be solved by the iterative shrinkage and thresholding algorithm (ISTA) [5], each iteration of which updates the sparse code α as follows:

α^{k+1} = s_θ(W y + S α^k),    (5)

where α^k denotes the intermediate result of the k-th iteration, W and S are linear operators derived from the dictionary and its Gram matrix, and s_θ is an element-wise shrinkage function (u is a vector and u_i is its i-th element, i = 1, …, m):

[s_θ(u)]_i = sign(u_i) max(|u_i| − θ_i, 0).    (6)
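The iteration above can be sketched as follows, a minimal textbook ISTA with the standard choices W = Dᵀ/L and S = I − DᵀD/L (L being the Lipschitz constant of the gradient); the function names are our own, and the dictionary here is an arbitrary stand-in rather than the paper's learned one:

```python
import numpy as np

def shrink(u, theta):
    """Element-wise soft-thresholding s_theta (Eq. (6))."""
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

def ista(D, y, lam, n_iter=100):
    """Solve min_a 0.5*||y - D a||^2 + lam*||a||_1 by ISTA.
    Each iteration is a linear map of the input plus a linear map of the
    previous code, followed by shrinkage -- the form unfolded in Eq. (5)."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of grad
    W = D.T / L                              # input weights
    S = np.eye(D.shape[1]) - D.T @ D / L     # recurrent weights
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = shrink(W @ y + S @ a, lam / L)
    return a
```

Each iteration is one "layer" (linear maps plus a nonlinearity), which is what makes the time-unfolding into a feed-forward network natural.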
The learned ISTA (LISTA) [14] parameterized encoder further proposed a natural network implementation of ISTA. The authors time-unfolded and truncated (5) into a fixed number of stages (two or more), and then jointly tuned all parameters on training data, yielding a good feed-forward approximation of sparse inference. A similar unfolding methodology has lately been exploited in [17], [29], [30].

In our work, we launch a more aggressive approximation by keeping only one iteration of (5), leading to a One-Step Sparse Inference (1SI) Module. Our major motivation lies in the same observation as in [11]: overly deep networks can adversely affect performance in low-level vision tasks. Note that we have two SC Analysis modules where the original LISTA would apply, and two more SC Synthesis modules (each with one learnable layer). Even if only two iterations were kept as in [14], we would end up with a six-layer network that suffers from both difficulties in training [11] and fragility in generalization [31] for this task.
A 1SI module takes the following simplest form:

α = s_θ(W y),    (7)

which can be viewed as first passing y through a fully-connected layer (W), followed by neurons that take the form of s_θ. We further rewrite (6) as [35] did^1:

s_θ(u) = Θ s_1(Θ^{-1} u), with Θ = diag(θ).    (8)

^1 In (8), we slightly abuse notation and set θ to be a vector of the same dimension as u, for extra element-wise flexibility.
Eqn. (8) indicates that the original neuron with trainable thresholds can be decomposed into two linear scaling layers plus a unit-threshold neuron s_1. The weights of the two scaling layers are diagonal matrices defined by θ and its element-wise reciprocal, respectively. The unit-threshold neuron s_1 can in essence be viewed as a double-sided and translated variant of the ReLU [21].

A form related to (7) was obtained in [10] for a different case of non-negative sparse coding. The authors studied its connections with the soft-threshold feature for classification, but did not correlate it with network architectures.
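The decomposition above can be verified numerically. The sketch below (our own function names) checks that the one-step form (7) with per-element thresholds equals the diagonal-scale / unit-threshold / diagonal-scale composition of (8):

```python
import numpy as np

def shrink(u, theta):
    """Element-wise soft-thresholding with threshold vector theta."""
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

def one_step_inference(W, theta, y):
    """One-Step Sparse Inference (Eq. (7)): one fully-connected layer
    followed by trainable-threshold shrinkage neurons."""
    return shrink(W @ y, theta)

def one_step_decomposed(W, theta, y):
    """Eq. (8): scale by 1/theta, apply a *unit*-threshold neuron, then
    scale back by theta -- two diagonal layers around a fixed neuron."""
    u = (W @ y) / theta
    return theta * shrink(u, np.ones_like(theta))
```

Keeping the two diagonal layers explicit (rather than absorbing them into W and the synthesis dictionary) is what preserves the reciprocal tying discussed in Section 3.4.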
3.4 Model Overview
By plugging in the 1SI module (7), we are ready to obtain the SC Analysis and Synthesis Modules, as in Fig. 3. By comparing Fig. 3 with Eqn. (2) (or (4)), it is easy to notice the analytical relationships between the network layers and the corresponding sparse coding dictionaries and thresholds. In fact, these network hyper-parameters can be well initialized from the sparse coding parameters, which are easy to obtain. The entire model, consisting of four learnable fully-connected weight layers (besides the diagonal layers), is then trained end to end^2.

^2 From the analytical perspective, the analysis weight layer is the transpose of the synthesis dictionary, but we untie them during training for larger learning capacity.

In Fig. 3, we intentionally do not merge the diagonal layers into the adjacent weight layers, because we wish to keep the two diagonal layers tied as element-wise reciprocals. That proves to have positive implications in our experiments. If we absorbed the two diagonal layers into the weight layers, Fig. 3 would reduce to two fully-connected weight matrices concatenated by one layer of hidden neurons (8). However, keeping the “decomposed” model architecture facilitates the incorporation of problem-specific structures.
3.5 Complexity Analysis
From the clear correspondences between the sparsity-based formulation and the D model, we immediately derive the dimensions of the weight layers, as in Table 1 (n is the block size; m_1 and m_2 are the DCT- and pixel-domain dictionary sizes):

Layer                      Analysis W    diag(θ)      Synthesis D
Stage I (DCT Domain)       m_1 × n       m_1 × m_1    n × m_1
Stage II (Pixel Domain)    m_2 × n       m_2 × m_2    n × m_2
3.5.1 Time Complexity
During training, deep learning with the aid of gradient descent scales linearly in time and space with the number of training samples. We are primarily concerned with the time complexity during testing (inference), which is more relevant to practical usage. Since all learnable layers in the D model are fully-connected, the inference process of D is nothing more than a series of matrix multiplications. The multiplications are counted as: n·m_1 (analysis layer in Stage I) + 2m_1 (two diagonal layers) + n·m_1 (synthesis layer in Stage I) + n·m_2 (analysis layer in Stage II) + 2m_2 (two diagonal layers) + n·m_2 (synthesis layer in Stage II). The 2D DCT and IDCT each take O(n^{3/2}) multiplications [25]. Therefore, the total inference time complexity of D per block is:

2n(m_1 + m_2) + 2(m_1 + m_2) + O(n^{3/2}).    (9)

The complexity can also be expressed as O(n(m_1 + m_2)).
It is obvious that the sparse coding inference of [24] has a dramatically higher time complexity. We are also interested in the inference time complexity of other competitive deep models, especially ARCNN [11]. For their fully convolutional architecture, the total complexity [16] is:

O( Σ_{l=1}^{d} n_{l−1} · s_l^2 · n_l · m_l^2 ),    (10)

where l is the layer index, d is the total depth, n_l is the number of filters in the l-th layer, s_l is the spatial size of the filter, and m_l is the spatial size of the output feature map.
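The summation above can be evaluated with a small helper (a sketch with our own function and parameter names, assuming a single-channel input so that n_0 = 1):

```python
def conv_time_complexity(filters, sizes, map_sizes):
    """Per-image multiplication count of a fully convolutional network:
    sum over layers of n_{l-1} * s_l^2 * n_l * m_l^2."""
    n_prev = 1  # single-channel (grayscale) input
    total = 0
    for n_l, s_l, m_l in zip(filters, sizes, map_sizes):
        total += n_prev * s_l * s_l * n_l * m_l * m_l
        n_prev = n_l
    return total
```

With all output maps set to unit size, the count collapses to the per-pixel weight count of the network; for ARCNN's default configuration (filter sizes 9, 7, 1, 5 with 64, 32, 16, 1 filters, per Section 4.1), that gives 106,448, matching the #Param row of Table 2.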
3.5.2 Parameter Complexity
The total number of free parameters in D is:

2n(m_1 + m_2) + 2(m_1 + m_2).    (11)

As a comparison, the ARCNN model [11] contains:

Σ_{l=1}^{d} n_{l−1} · s_l^2 · n_l    (12)

parameters.
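These two counts can be checked directly against the #Param row of Table 2 (a sketch; the function names are ours, and biases are omitted as in the counts above):

```python
def d_param_count(n=64, m1=256, m2=256):
    """Free parameters of the D model (Eq. (11)): four fully-connected
    layers (analysis + synthesis in each of two stages) plus four
    diagonal scaling layers."""
    return 2 * n * m1 + 2 * n * m2 + 2 * m1 + 2 * m2

def arcnn_param_count(filters=(64, 32, 16, 1), sizes=(9, 7, 1, 5)):
    """Convolutional weights of ARCNN (Eq. (12)), single-channel input."""
    n_prev, total = 1, 0
    for n_l, s_l in zip(filters, sizes):
        total += n_prev * s_l * s_l * n_l
        n_prev = n_l
    return total
```

The evaluated counts (33,280 for D128, 66,560 for D256, 106,448 for ARCNN) reproduce the numbers reported in Table 2.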
                Compressed  SD       ARCNN    D128     D256     DBase256
Q = 5   PSNR    24.61       25.83    26.64    26.26    27.37    25.83
        SSIM    0.7020      0.7170   0.7274   0.7203   0.7303   0.7186
        PSNR-B  22.01       25.64    26.46    25.86    26.95    25.51
Q = 10  PSNR    27.77       28.88    29.03    28.62    29.96    28.24
        SSIM    0.7905      0.8195   0.8218   0.8198   0.8233   0.8161
        PSNR-B  25.33       27.96    28.76    28.33    29.45    27.57
Q = 20  PSNR    30.07       31.62    31.30    31.20    32.21    31.27
        SSIM    0.8683      0.8830   0.8871   0.8829   0.8903   0.8868
        PSNR-B  27.57       29.73    30.80    30.56    31.35    29.25
#Param          NA          106,448  33,280   66,560   66,560
4 Experiments
4.1 Implementation and Setting
We use the disjoint training set (200 images) and test set (200 images) of the BSDS500 database [3] as our training set; its validation set (100 images) is used for validation, following [11]. For training the D model, we first divide each original image into overlapping patches, and subtract 128 from the pixel values, as in the JPEG mean-shifting process. We then perform JPEG encoding with the MATLAB JPEG encoder at a specific quality factor Q, to generate the corresponding compressed samples. Whereas JPEG works on non-overlapping blocks, we emphasize that the training patches are overlapping and extracted from arbitrary positions. For a testing image, we sample blocks with a stride of 4 and apply the D model in a patch-wise manner. For a patch that misaligns with the original JPEG block boundaries, we find its most similar coding block in its local neighborhood, whose quantization intervals are then applied to the misaligned patch. We find this practice effective and important for removing blocking artifacts and ensuring neighborhood consistency. The final result is obtained by aggregating all patches, with the overlapping regions averaged.
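The stride-4 overlapped inference and averaging described above can be sketched as follows. This is a simplified illustration under our own naming: `restore_patch` is a stand-in for the trained model, and the boundary-block matching step for misaligned patches is omitted:

```python
import numpy as np

def patchwise_restore(img, restore_patch, patch=8, stride=4):
    """Apply a per-patch restoration model over overlapping patches of a
    2D image and average the overlapping regions -- the aggregation step
    used at test time."""
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    weight = np.zeros_like(img, dtype=float)
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            out[i:i + patch, j:j + patch] += restore_patch(img[i:i + patch, j:j + patch])
            weight[i:i + patch, j:j + patch] += 1.0
    # average each pixel over all patches that covered it
    return out / np.maximum(weight, 1.0)
```

Averaging the overlaps acts as a simple consensus over shifted block grids, which helps smooth out residual blocking seams.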
The proposed networks are implemented using the cuda-convnet package [21]. We apply a constant learning rate of 0.01 and a batch size of 128, with no momentum. Experiments run on a workstation with 12 Intel Xeon 2.67 GHz CPUs and 1 GTX 680 GPU. The two losses, (3) and the final reconstruction loss, are equally weighted. For the parameters in Table 1, the block size n is fixed as 64. We try different values of the dictionary sizes m_1 and m_2 in the experiments.
Based on solving Eqn. (1), one can initialize the analysis layer, the thresholds, and the synthesis dictionary in the DCT-domain block of Fig. 1 from the corresponding sparse coding components, and likewise for the pixel-domain block. In practice, we find that such an initialization strategy benefits the performance and usually leads to faster convergence.
We test the quality factors Q = 5, 10, and 20. For each Q, we train a dedicated model. We further find the easy-hard transfer suggested by [11] useful. As images with low Q values (heavily compressed) contain more complex artifacts, it is helpful to use the features learned from images with high Q values (lightly compressed) as a starting point. In practice, we first train the D model on JPEG-compressed images with Q = 20 (the highest quality among the three). We then initialize the Q = 10 model with the Q = 20 model, and similarly initialize the Q = 5 model from the Q = 10 one.
4.2 Restoration Performance Comparison
We include the following two relevant, state-of-the-art methods for comparison:

Sparsity-based Dual-Domain Method (SD) [24] can be viewed as the “shallow” counterpart of D. It has outperformed most traditional methods [24], such as BM3D [9] and DicTV [7], with which we thus do not compare again. The algorithm has a few parameters to be manually tuned. In particular, its dictionary atoms are adaptively selected by a nearest-neighbor type algorithm, and the number of selected atoms varies for every testing patch. Therefore, the parameter complexity of SD cannot be exactly computed.

ARCNN is the latest deep model addressing the JPEG compression artifact removal problem. In [11], the authors show its advantage over SA-DCT [13], RTF [18], and SRCNN [12]. We adopt the default network configuration in [11]: filter sizes f_1 = 9, f_2 = 7, f_3 = 1, f_4 = 5, and filter numbers n_1 = 64, n_2 = 32, n_3 = 16, n_4 = 1. The authors adopted the easy-hard transfer in training.
For D, we test m_1 = m_2 = 128 and 256^3. The resulting D models are denoted as D128 and D256, respectively. In addition, to verify the superiority of our task-specific design, we construct a fully-connected Deep Baseline Model (DBase) of the same complexity as D256, named DBase256. It consists of four weight matrices of the same dimensions as D256's four trainable layers^4. DBase256 utilizes ReLU [21] neurons and the dropout technique.

^3 Following the common experience in choosing dictionary sizes [2].
^4 DBase256 is a four-layer neural network operating in the pixel domain, without DCT/IDCT layers. The diagonal layers contain a very small portion of the parameters and are ignored here.
We use the 29 images in the LIVE1 dataset [27] (converted to grayscale) to evaluate both the quantitative and qualitative performance. Three quality assessment criteria are evaluated: PSNR, structural similarity (SSIM) [32], and PSNR-B [37], the last of which is designed specifically to assess blocky images. The averaged results on the LIVE1 dataset are listed in Table 2.
Compared to SD, both D128 and D256 gain remarkable advantages, thanks to their end-to-end training as deep architectures. As m_1 and m_2 grow from 128 to 256, one observes clear improvements in PSNR/SSIM/PSNR-B. D256 outperforms the state-of-the-art ARCNN by around 1 dB in PSNR. Moreover, D256 also demonstrates a notable performance margin over DBase256, although they possess the same number of parameters. D is thus verified to benefit from its task-specific architecture inspired by the sparse coding process (1), rather than merely the large learning capacity of generic deep models. The parameter counts of the different models are compared in the last row of Table 2. It is impressive to see that D256 also has fewer parameters than ARCNN.
We display three groups of visual results, on the Bike, Monarch and Parrots images with Q = 5, in Figs. 2, 4 and 5, respectively. ARCNN tends to generate over-smoothness, such as in the edge regions of the butterfly wings and the parrot head. SD is capable of restoring sharper edges and textures. The D models further reduce the unnatural artifacts occurring in the SD results. In particular, while the D128 results still suffer from a small amount of visible ringing artifacts, D256 is not only superior in preserving details, but also suppresses artifacts well.
4.3 Analyzing the Impressive Results of D
We attribute the impressive recovery of clear, fine details to the combination of our specific pipeline, the initialization, and the box-constrained loss.
Task-specific and interpretable pipeline. The benefits of our specifically designed architecture were demonstrated by the comparison experiments against the baseline encoders. Further, we provide intermediate outputs of the IDCT layer, i.e., the recovery after the DCT-domain reconstruction. We hope this helps to understand how each component, i.e., the DCT-domain reconstruction or the pixel-domain reconstruction, contributes to the final results. As shown in Fig. 6 (a)-(c), such intermediate reconstruction results contain both sharpened details (see the characters in (a), which become more recognizable) and unexpected noisy patterns (see (a)-(c) for the blockiness and ringing-type noise along edges and textures). This implies that the Stage I DCT-domain reconstruction enhances high-frequency features, yet simultaneously introduces artifacts due to quantization noise. Afterwards, the Stage II pixel-domain reconstruction performs extra noise suppression and global reconstruction, which leads to the artifact-free and more visually pleasing final results.
Sparse coding-based initialization. We conjecture that the reason why D is more capable of restoring the text on Bike and other subtle textures hinges on our sparse coding-based initialization, an important training detail of D. To verify this, we retrain D with random initialization, with the testing results in Fig. 6 (d)-(f), which turn out to be visually smoother (closer to the ARCNN results). For example, the characters in (d) are now hardly recognizable. We notice that the SD results, as in Figs. 2, 4 and 5 (c), also present sharper and more recognizable texts and details than ARCNN. These observations validate our conjecture. So the next question is: why does sparse coding help significantly here? The quantization process can be considered as a low-pass filter that cuts off high-frequency information. The dictionary atoms are learned offline from high-quality training images, which contain rich high-frequency information. A sparse linear combination of atoms is thus richer in high-frequency details, which is not necessarily the case for generic regression (as in deep learning).
Box-constrained loss. The loss (3) acts as another effective regularization. We retrain D without this loss and obtain the results in Fig. 6 (g)-(i). It is observed that the box-constrained loss helps generate details (e.g., comparing the characters in (g) with those in Fig. 2 (f)) by bounding the DCT coefficients, and brings PSNR gains.
4.4 Running Time Comparison
Image and video codecs desire highly efficient compression artifact removal algorithms as a post-processing tool. Traditional TV and digital cinema businesses use frame rate standards such as 24p (i.e., 24 frames per second), 25p, and 30p. Emerging standards require much higher rates. For example, high-end High-Definition (HD) TV systems adopt 50p or 60p; the Ultra-HD (UHD) TV standard advocates 100p/119.88p/120p; the HEVC format can reach a maximum frame rate of 300p [1]. To this end, higher time efficiency is as desirable as improved performance.
         ARCNN     D128    D256     DBase256
Q = 5    396.76    7.62    12.20    9.85
Q = 10   400.34    8.84    12.79    10.27
Q = 20   394.61    8.42    12.02    9.97
We compare the averaged testing times of ARCNN and the proposed D models in Table 3, on the 29 LIVE1 images, using the same machine and software environment. All running times were collected from GPU tests. Our best model, D256, takes approximately 12 ms per image; that is more than 30 times faster than ARCNN. The speed difference is not mainly caused by the different implementations: both being completely feed-forward, ARCNN relies on time-consuming convolution operations while ours takes only a few matrix multiplications. That is also in accordance with the theoretical time complexities computed from (9) and (10). As a result, D256 is able to process 80p image sequences (or even higher). To the best of our knowledge, D is the fastest among all state-of-the-art algorithms, and proves to be a practical choice for HDTV industrial usage.
5 Conclusion
We introduce the D model for the fast restoration of JPEG-compressed images. The successful combination of JPEG prior knowledge and sparse coding expertise makes D both highly effective and efficient. In the future, we aim to extend the methodology to more related applications.
References
 [1] https://en.wikipedia.org/wiki/Frame_rate/.
 [2] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. TSP, 54(11):4311–4322, 2006.
 [3] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. TPAMI, 33(5):898–916, 2011.
 [4] E. A. Ayele and S. Dhok. Review of proposed high efficiency video coding (hevc) standard. International Journal of Computer Applications, 59(15):1–9, 2012.
 [5] T. Blumensath and M. E. Davies. Iterative thresholding for sparse approximations. Journal of Fourier Analysis and Applications, 14(56):629–654, 2008.
 [6] K. Bredies and M. Holler. A total variationbased jpeg decompression model. SIAM Journal on Imaging Sciences, 5(1):366–393, 2012.
 [7] H. Chang, M. K. Ng, and T. Zeng. Reducing artifacts in jpeg decompression via a learned dictionary. TSP, 2014.
 [8] I. Choi, S. Kim, M. S. Brown, and Y.W. Tai. A learningbased approach to reduce jpeg artifacts in image matting. In ICCV, pages 2880–2887. IEEE, 2013.
 [9] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3d transformdomain collaborative filtering. TIP, 16(8):2080–2095, 2007.
 [10] M. Denil and N. de Freitas. Recklessly approximate sparse coding. arXiv preprint arXiv:1208.0959, 2012.
 [11] C. Dong, Y. Deng, C. C. Loy, and X. Tang. Compression artifacts reduction by a deep convolutional network. ICCV, 2015.
 [12] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image superresolution. In ECCV, pages 184–199. Springer, 2014.
 [13] A. Foi, V. Katkovnik, and K. Egiazarian. Pointwise shapeadaptive dct for highquality denoising and deblocking of grayscale and color images. TIP, 2007.
 [14] K. Gregor and Y. LeCun. Learning fast approximations of sparse coding. In ICML, pages 399–406, 2010.
 [15] S. Gu, L. Zhang, W. Zuo, and X. Feng. Projective dictionary pair learning for pattern classification. In NIPS, pages 793–801, 2014.
 [16] K. He and J. Sun. Convolutional neural networks at constrained time cost. In CVPR, 2015.
 [17] J. R. Hershey, J. L. Roux, and F. Weninger. Deep unfolding: Modelbased inspiration of novel deep architectures. arXiv preprint arXiv:1409.2574, 2014.
 [18] J. Jancsary, S. Nowozin, and C. Rother. Lossspecific training of nonparametric image restoration models: A new state of the art. In ECCV, pages 112–125. Springer, 2012.
 [19] C. Jung, L. Jiao, H. Qi, and T. Sun. Image deblocking via sparse representation. Signal Processing: Image Communication, 27(6):663–677, 2012.
 [20] D. Kim, S. Sra, and I. S. Dhillon. Tackling boxconstrained optimization via a new projected quasinewton approach. SIAM Journal on Scientific Computing, 2010.
 [21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.
 [22] K. Lee, D. S. Kim, and T. Kim. Regressionbased prediction for blocking artifact reduction in jpegcompressed images. TIP, 14(1):36–48, 2005.
 [23] X. Liu, G. Cheung, X. Wu, and D. Zhao. Interblock soft decoding of jpeg images with sparsity and graphsignal smoothness priors. In ICIP. IEEE, 2015.
 [24] X. Liu, X. Wu, J. Zhou, and D. Zhao. Datadriven sparsitybased restoration of jpegcompressed images in dual transformpixel domain. In CVPR, 2015.
 [25] W. B. Pennebaker and J. L. Mitchell. JPEG: Still image data compression standard. Springer Science & Business Media, 1993.
 [26] R. Rothe, R. Timofte, and L. Van Gool. Efficient regression priors for reducing image compression artifacts. In IEEE ICIP, 2015.
 [27] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik. Live image quality assessment database release 2, 2005.

 [28] M.-Y. Shen and C.-C. Jay Kuo. Real-time compression artifact reduction via robust nonlinear filtering. In ICIP, volume 2, pages 565–569. IEEE, 1999.
 [29] P. Sprechmann, A. Bronstein, and G. Sapiro. Learning efficient sparse and low rank models. TPAMI, 2015.
 [30] P. Sprechmann, R. Litman, T. B. Yakar, A. M. Bronstein, and G. Sapiro. Supervised sparse analysis and synthesis operators. In NIPS, pages 908–916, 2013.
 [31] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1):1929–1958, 2014.
 [32] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. TIP, 13(4):600–612, 2004.
 [33] Z. Wang, S. Chang, J. Zhou, M. Wang, and T. S. Huang. Learning a taskspecific deep architecture for clustering. SDM, 2016.
 [34] Z. Wang, Q. Ling, and T. Huang. Learning deep encoders. AAAI, 2016.
 [35] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang. Deep networks for image superresolution with sparse prior. ICCV, 2015.
 [36] Z. Wang, Y. Yang, Z. Wang, S. Chang, W. Han, J. Yang, and T. Huang. Selftuned deep super resolution. In IEEE CVPR Workshops, pages 1–8, 2015.
 [37] C. Yim and A. C. Bovik. Quality assessment of deblocked images. TIP, 20(1):88–98, 2011.