A Comprehensive Benchmark for Single Image Compression Artifacts Reduction

09/09/2019 · Jiaying Liu, et al.

We present a comprehensive study and evaluation of existing single image compression artifacts removal algorithms, using a new 4K-resolution benchmark with diversified foreground objects and background scenes with rich structures, called the Large-scale Ideal Ultra high definition 4K (LIU4K) benchmark. Compression artifacts removal, as a common post-processing technique, aims at alleviating undesirable artifacts such as blockiness, ringing, and banding caused by quantization and approximation in the compression process. In this work, a systematic listing of the reviewed methods is presented based on their basic models (handcrafted models and deep networks). The main contributions and novelties of these methods are highlighted, and the main development directions, including architectures, multi-domain sources, signal structures, and new targeted units, are summarized. Furthermore, under a unified deep learning configuration (i.e. the same training data, loss function, optimization algorithm, etc.), we evaluate recent deep learning-based methods with diversified evaluation measures. The experimental results provide a state-of-the-art performance comparison of existing methods under full-reference, non-reference, and task-driven metrics. Our survey provides a comprehensive reference source for future research on single image compression artifacts removal and inspires new directions in related fields.



I Introduction

Lossy compression, such as JPEG [1], HEVC (High Efficiency Video Coding) [2], and Advanced Video Coding (AVC) [3], has been widely used in image and video codecs to reduce information redundancy during transmission and storage and thus save bandwidth and resources. Based on properties of human vision, these codecs exploit redundancies in the spatial, temporal, and transform domains to represent the encoded content compactly. They effectively reduce the bit-rate cost but inevitably introduce unsatisfactory visual artifacts, e.g. blockiness, ringing, and banding. These artifacts derive from the loss of high-frequency detail in the quantization process and from the discontinuities caused by block-wise processing. They not only degrade the user's visual experience but are also detrimental to subsequent image processing and computer vision tasks.

In our work, we focus on the degradation of compressed images. The degradation configurations of two codecs are considered: JPEG and HEVC. Most modern codecs first divide the whole image into blocks, which have a fixed size in some codecs, e.g. JPEG, and variable sizes in others, e.g. HEVC. Transformations, e.g. the discrete cosine transform (DCT) and the discrete sine transform (DST), then convert each block into transform coefficients with more compact energy and sparser distributions than in the spatial domain. After that, quantization is applied to the transform coefficients, based on pre-defined quantization steps, to remove the signal components that have less significant influence on the human visual system. The quantization intervals are usually much larger for high-frequency components than for low-frequency ones, because the human visual system is less capable of distinguishing high frequencies. Note that quantization is the main cause of artifacts. After quantization, the boundaries between blocks become discontinuous, generating blocking artifacts. Blurring is caused by the loss of high-frequency components. In regions containing sharp edges, ringing artifacts become visible. When the quantization step becomes larger, the reconstructed images suffer from severe distortions, and noticeable banding effects appear in smooth regions of the image.
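To make the source of these artifacts concrete, the following minimal sketch emulates the lossy step of a JPEG-like codec: each 8×8 block is transformed by the DCT, quantized with the standard JPEG luminance table, and inverse-transformed. It is an illustrative toy, not a full codec (no color conversion, chroma subsampling, or entropy coding); the `scale` knob is a stand-in for the quality factor.

```python
import numpy as np
from scipy.fftpack import dct, idct

# Standard JPEG luminance quantization table (quality ~50).
Q50 = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=np.float64)

def dct2(b):
    return dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(c):
    return idct(idct(c, axis=0, norm='ortho'), axis=1, norm='ortho')

def jpeg_like(img, scale=2.0):
    """Quantize each 8x8 block of a grayscale image.

    A larger `scale` means coarser quantization steps and stronger
    blocking/ringing/banding artifacts.
    """
    out = img.astype(np.float64).copy()
    h, w = img.shape
    q = Q50 * scale
    for y in range(0, h - h % 8, 8):
        for x in range(0, w - w % 8, 8):
            block = out[y:y+8, x:x+8] - 128.0
            coeff = np.round(dct2(block) / q) * q      # quantize + dequantize
            out[y:y+8, x:x+8] = idct2(coeff) + 128.0   # seams appear on the 8x8 grid
    return np.clip(out, 0, 255)
```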

Many efforts have been dedicated to the restoration of compressed images. Early works [4, 5] perform filtering along block boundaries to remove simple artifacts. Data-driven methods were then proposed to learn the inverse mapping of the compression degradation, proceeding in two directions: 1) better inference models, e.g. sparse coding [6] and deep networks [7]; and 2) better priors and side information [8, 9]. In recent years, the emergence of deep learning [7], with its excellent nonlinear modeling capacity, has largely improved the restoration capability of data-driven methods. More advanced network architectures, e.g. the dense residual network [10], have been put forward, and stronger side information, e.g. the partition mask [11], has been employed for compression artifacts removal. Besides these two common factors, for learning-based approaches, the training configurations and protocols, including training data, losses, optimization approaches, data generation, and codec details, also have large effects on the final performance. Thus, changes in these factors also contribute to the reported performance gains.

Despite their promising results, previous methods leave several issues unaddressed. First, there is no unified framework to understand and organize all previous methods; a survey that compares and summarizes them from a simple, integrated view is needed. Second, inconsistent experimental configurations and protocols are employed in different works, and there is a lack of benchmarking of state-of-the-art algorithms on a large-scale public dataset. Last, previous datasets do not cover 4K-resolution images, which sets a barrier to comparing the performance of different methods on the increasingly popular ultra-high-definition display devices.

Fig. 1: Milestones of compressed image restoration methods, including filter-based artifacts removal, probabilistic-prior-based artifacts removal, deep learning-based artifacts removal, and deep learning-based loop filters. The period up to 2015 was dominated by handcrafted methods, i.e. filter-based and probabilistic-prior-based artifacts removal. The emergence of ARCNN [7] changed the development of this domain, marking a turning point in 2015; deep learning-based methods have played the major role since. The years 2017 and 2018 in particular saw a blooming of deep learning-based artifacts removal and loop filters.
Fig. 2: Example images sampled from LIU4K. (a) Training set. (b) Testing set.

Our work is directly motivated by the above issues and makes four-fold technical contributions:

  • The first contribution of this paper is a comprehensive survey of compression artifacts removal methods. Our survey provides a holistic view of most existing methods. Particular emphasis is placed on deep learning-based single-image compression artifacts removal methods, as they offer state-of-the-art performance and much flexibility for further improvement.

  • We introduce a new single image compression artifacts removal benchmark, called the Large-scale Ideal Ultra high definition 4K (LIU4K) dataset. It is the first dataset to include 4K images as training and testing images for image restoration, and it is larger in scale than existing datasets containing high-definition images. LIU4K provides a better foundation for evaluating the performance of different methods, especially for the recently popular ultra-high-resolution display devices.

  • We conduct a systematic and extensive range of experiments to compare state-of-the-art methods quantitatively with diversified measures, on the new LIU4K dataset as well as previous commonly used datasets, under a unified experimental setting including the same training data, optimization method, and loss function. The thorough evaluation and analysis show the performance and limitations of state-of-the-art methods, and the resulting insights inspire new research directions.

  • We also explore generalizing some constraints and training strategies from JPEG artifacts removal to general compression artifacts removal. Three strategies, including dense DCT transform constraints, mixed batches with patches of different sizes, and gradually expanded patch sizes, are used in our experimental setting. These strategies are general and can benefit future compression artifacts removal methods.

II A New Dataset for Restoration: LIU4K

II-A Previous Datasets

We first review existing testing and training datasets: 1) testing: BSD100, Kodak, DIV2K-test, Set5, Set14, Classic5, and Twitter; 2) training: BSD400, DIV2K-train, and Mini-ImageNet.

Kodak (http://r0k.us/graphics/kodak/). A very representative dataset released in 1991, including 24 digital color images extracted from a wide range of films. Since then, many image processing methods have been proposed, optimized, and evaluated on this dataset. The resolution of the images is 768×512 or 512×768.

BSD400 and BSD100. These two datasets are parts of BSD500 [12], which was originally designed for semantic segmentation. It covers a large variety of real-life scenes, with 200 training images, 200 validation images, and 100 testing images. The image resolution is 321×481 or 481×321. For image restoration, we combine the training and validation sets of BSD500 as the restoration training set and use its testing set for evaluation.

DIV2K [13]. A set of 1,000 images at 2K resolution: 800 for training, 100 for validation, and 100 for testing. Image sizes are around 2000×1000 or 1000×2000; the longer side of each image is 2,040 pixels and the shorter side is greater than 1,000. It is a milestone dataset for image super-resolution and underpins the NTIRE Challenge (http://www.vision.ee.ethz.ch/ntire17/), which opened the era of challenges in low-level image enhancement.

Set5 [14] and Set14 [15]. Two effective small-scale datasets for evaluating image restoration quality, which usually provide evaluation results consistent with those on large-scale datasets. The resolution of Set5 images is less than 500×500; Set14 image sizes range between 250×250 and 500×500.

Classic5 [16]. The Classic5 dataset includes five representative images used for evaluating compressed image restoration. Their resolution is 512×512.

Twitter [7]. 40 images compressed by the Twitter platform, with sizes varying from 3,264×2,448 to 600×450. The included artifacts are highly complex because the compression process includes a rescaling operation.

Mini-ImageNet [17]. The dataset used to train SRGAN in [17], including 300,000 images sampled from ImageNet. The smallest size is less than 50×50 and the largest is above 4,000×3,000. Although it may lead to superior performance of restoration models, training with it is very time- and resource-consuming.

| Dataset | Number | Resolution | Train/Test | Features |
| --- | --- | --- | --- | --- |
| Kodak | 24 | 768×512 | Test | Earliest Milestone |
| Set5 | 5 | < 500×500 | Test | Small and Effective |
| Set14 | 14 | 250×250 – 500×500 | Test | Small, Effective |
| Classic5 | 5 | 512×512 | Test | Small, Effective |
| BSD500 | 200/200/100 | 321×481 | Train/Validation/Test | Middle Scale with Abundant Texture |
| Mini-ImageNet | 300,000 | 50×50 – 4,000×3,000 | Train | Very Large Scale |
| Twitter | 40 | 600×450 – 3,264×2,448 | Test | Complex Degradation |
| DIV2K | 800/100/100 | ~2,040×1,000 | Train/Validation/Test | Large-Scale with 2K Images |
| LIU4K | 1,500/200 | 3,264×2,448 | Train/Test | Large-Scale with 4K Images |

TABLE I: The summary of different datasets for compression artifacts removal.
| Dataset | BPP | PPI (×10⁵) | Number | Entropy | BRISQUE | ENIQA | NIQE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Set5 | 0.52 (0.004) | 1.13 | 5 | 6.84 (0.548) | 33.58 (47.82) | 0.1393 (74.75) | 4.79 (4.62) |
| Classic5 | 0.65 (0.004) | 2.62 | 5 | 7.37 (0.037) | 22.99 (146.58) | 0.3696 (3.46) | 5.08 (1.62) |
| Kodak | 0.56 (0.006) | 3.93 | 24 | 6.93 (0.175) | 6.22 (17.32) | 0.0202 (6.35) | 2.95 (0.21) |
| LIVE1 | 0.58 (0.007) | 3.57 | 39 | 7.14 (0.107) | 5.01 (11.82) | 0.0218 (5.80) | 2.87 (0.18) |
| Set14 | 0.58 (0.013) | 2.30 | 14 | 6.74 (0.605) | 26.29 (147.77) | 0.1421 (194.00) | 4.71 (2.69) |
| BSD100 | 0.60 (0.010) | 1.54 | 100 | 6.94 (0.258) | 20.01 (84.39) | 0.0801 (97.13) | 3.09 (0.76) |
| DIV2K | 0.53 (0.011) | 28.35 | 100 | 7.02 (0.748) | 23.64 (131.93) | 0.0925 (47.01) | 3.18 (2.05) |
| LIU4K | 0.92 (0.0298) | 132.79 | 200 | 7.43 (0.039) | 15.98 (73.86) | 0.0036 (32.02) | 2.39 (0.24) |

TABLE II: The statistical comparisons of different testing sets. The numbers in brackets denote the variance. The best results are denoted in bold.

A summary of all these datasets is provided in Table I.

| Method | Published | Category | Inference Model | Priors / Side Information | Basic Idea |
| --- | --- | --- | --- | --- | --- |
| Minami and Zakhor | TCSVT-1995 [18] | Filter | Linear model (constrained quadratic programming) | Mean squared difference of slope (MSDS) | Reducing the expected MSDS |
| Deblocking Filter | TCSVT-2012 [68] | Filter | Hand-crafted-metric-based boundary classification | Coding information (PU/TU, intra mode, motion vector, etc.) | Divide blocking boundaries into different types and accordingly choose different kinds of deblocking filters |
| Field of Experts | TIP-2007 [20] | Probabilistic prior | MAP framework; high-order Markov model | Quantization tables | The original image is modeled as a high-order MRF with learned potential functions |
| Transform Domain Non-Local Similarity | ICME-2012 [21], TIP-2013 [22] | Probabilistic prior | MAP framework; adaptive parameter selection | Nonlocal similarity | The decoded coefficient and its nonlocal estimation are fused adaptively |
| Low-Rank Minimization | DCC-2013 [23] | Probabilistic prior | Patch clustering; singular value thresholding | Local sparsity; low-rank prior | Similar patches are clustered and reconstructed by low-rank minimization |
| Sparse Coding with Total Variation | TSP-2014 [6] | Probabilistic prior | Sparse coding | Sparsity prior; total variation | Combination of sparse representation and total variation |
| Dual Domain Sparse Representation | CVPR-2015 [24], TIP-2016 [25] | Probabilistic prior | Sparse coding | Spatial domain; DCT domain; external data | Sparse representations jointly in dual domains, augmented by external data |
| SA-DCT Transform | TIP-2007 [16] | Probabilistic prior | Wiener filtering in SA-DCT domain | SA-DCT transform; structural constraint | The transform uses adaptive supports, which leads to better edge reconstruction |
| Adaptive Low-Rank Minimization | TIP-2016 [26] | Probabilistic prior | Patch clustering; singular value thresholding | Local sparsity; low-rank prior; transform coefficient variance; quantization step | The thresholds in SVT are adaptively determined |

TABLE III: An overview of existing works on non-deep JPEG artifacts removal.
| Method | Published | Inference Model | Priors / Side Information | Basic Idea |
| --- | --- | --- | --- | --- |
| Artifacts Reduction CNN | ICCV-2015 [7] | Three-layer CNN | / | The first work introducing deep models to the topic |
| Trainable Nonlinear Reaction Diffusion | TPAMI-2017 [27] | Trainable nonlinear diffusion model | / | The proposed nonlinear diffusion model is unrolled into a deep network |
| D³ Model | CVPR-2016 [8] | Learned iterative shrinkage and thresholding with DCT layers | DCT domain constraint; sparsity constraint | A sparse-coding-based dual-domain restoration scheme is unrolled into a deep network |
| Denoising CNN | TIP-2017 [28] | CNN with residual learning and batch normalization | / | The combination of residual learning, batch normalization, and Adam optimization |
| Dual-Domain CNN | ECCV-2016 [9] | A two-branch CNN | Range of DCT coefficients | A two-branch CNN works in the pixel and DCT domains and finally aggregates their information |
| Residual Encoder-Decoder Network | NIPS-2016 [29] | Encoder-decoder with skip connections | / | An encoder-decoder with symmetric skip connections |
| Compression Artifact Suppression CNN | IJCNN-2017 [30] | Encoder-decoder with skip connections | Multi-scale losses | An encoder-decoder constrained by multi-scale losses |
| One-to-Many Network | CVPR-2017 [31] | ResNet; shift-and-average strategy | Perceptual loss; adversarial loss; JPEG loss | A ResNet takes random noise and a compressed image as input, and its output is constrained by three losses |
| MemNet | ICCV-2017 [32] | DenseNet architecture; memory block | Multi-supervision; long-term memory | The network is stacked from memory blocks, each consisting of a recursive unit and a gate unit, to learn explicit persistent memories |
| DMCNN | ICIP-2018 [33] | A two-branch auto-encoder with dilated convolution | DCT domain constraint; multi-scale loss | It integrates the dual-domain architecture (DCT and spatial domains), DCT loss, and multi-scale loss |
| Multi-level Wavelet-CNN | CVPRW-2018 [34] | Encoder-decoder with wavelet transforms | Wavelet signal structure | Wavelet transforms are introduced into the CNN architecture |
| Dual Pixel-Wavelet Domain Deep CNN | CVPRW-2018 [35] | A two-branch CNN with wavelet transforms | Dual domains; wavelet signal structure | A two-branch CNN is constructed to make use of redundancy in both pixel and frequency domains |
| VRCNN | MMM-2017 [36] | Variable-filter-size residue-learning CNN | / | The designed CNN owns variable filter sizes to learn the residual between input and target frames |
| Deep CNN-based Auto Decoder | DCC-2017 [37] | ResNet | TU size | A ResNet is used for quality enhancement at the decoder end |
| Partition Mask CNN | ICIP-2018 [11] | ResNet | CU size | The CU size is utilized and integrated with the distorted decoded frame |
| Residual Highway CNN | TIP-2018 [38] | Highway network | QP range | Residual highway CNNs are trained delicately for each QP range |
| MLSDRN | DCC-2018 [39] | Multi-channel long-short term dependency residual network | Block boundary; multi-channel | MLSDRN uses an update cell to adaptively store and select the long-term and short-term dependency |
| Adversarial Intra Coding | ICASSP-2018 [40] | Multi-scale structure | Adversarial learning | A multi-level progressive refinement network with adversarial learning |
| Decoder-Side Scalable CNN | ICME-2017 [41] | Two-branch scalable CNN | / | The network has two branches; a group of switches controls whether the complicated one is activated |
| Practical CNN | ICIP-2018 [42] | Compressed fixed-point CNN | QP | The network also takes QP as input; after training, the model is compressed and converted into fixed-point format |
| Multi-Scale Deep Decoder | DCC-2018 [43] | CNN; multi-scale LSTM | / | Each frame is fed into a CNN, and a multi-scale LSTM is then connected to fuse multi-frame redundancies |
| MF Quality Enhancer | CVPR-2018 [44] | SVM classifier; CNN-based alignment; CNN-based enhancer | Neighboring peak quality frames | The neighboring high-quality frames are fed into a CNN to facilitate inference of enhanced frames |
| Separable CNN Filter | JVET-K0158 [45] | SE block; separable convolution | Normalized Y/U/V; normalized QP | The network takes normalized Y/U/V and QP as input and consists of SE blocks and separable convolutions |
| Dense Residual Network | VCIP-2018 [10] | Dense shortcuts; residual learning; bottleneck layer | / | The network consists of dense shortcuts, residual learning, and bottleneck layers |
| CU Classification | VCIP-2018 [72] | Multiple variable-filter-size residue-learning CNNs | CU classification | A classifier is employed to decide whether to use VRCNN-ext for each coding unit |
| Progressive Rethinking Network | ICIP-2019 [70] | Progressive rethinking block and network | Multi-scale mean value of CUs | The progressive rethinking network takes the multi-scale mean value of coding units as side information |
| Coding Prior based High Efficiency Restoration | ICIP-2019 [71] | Weight normalization | Unfiltered frame; prediction frame | An EDSR-like network takes the unfiltered and prediction frames as side information and is trained with weight normalization |
| Content-Aware CNN | TIP-2019 [73] | Context-based model selection | Clusters based on quality ranking | A discriminative model analyzes region content for model selection; iterative training labels filter categories and fine-tunes the CNN models |

TABLE IV: An overview of existing works on deep compression artifacts removal.

II-B LIU4K Dataset

The main characteristics of the LIU4K dataset and of previous datasets for image restoration are listed in Table II. LIU4K has several advantages, as follows:

  • High resolution. Compared to previous datasets, the resolution of the images in our dataset is 2848×4288, larger than in previous datasets, which offers abundant material for testing and evaluating performance on 4K/8K display devices.

  • Large scale. Our training and testing sets include 1,541 and 200 4K images, respectively, far more than previous datasets. Thus, training and evaluation based on LIU4K are more comprehensive and balanced.

  • Diversified and complex signals. As shown in Table II, our dataset achieves the best results in entropy-driven non-reference metrics, which demonstrates its signal diversity and complexity.

  • High visual quality. LIU4K wins on the general-purpose non-reference metrics (except against Kodak and LIVE1, which serve as training sets for some of the metrics), as shown in Table II, which confirms its high visual quality.

We perform statistical comparisons to demonstrate the superiority of the LIU4K dataset. Entropy, BPP (bits per pixel), and PPI (pixels per image) are used to indicate the amount of information included in each dataset. Three non-reference image quality assessment metrics are utilized to assess perceptual image quality: the natural image quality evaluator (NIQE) [55], the blind/referenceless image spatial quality evaluator (BRISQUE) [56], and entropy-based image quality assessment (ENIQA). The entropy is estimated following the most primitive calculation, assuming a per-pixel independent distribution [69]. The bits used to calculate BPP values are estimated by compressing the grayscale version of each image into a PNG image. The work in [57] has shown that non-reference image quality assessment metrics are highly correlated with human perception and, in terms of visual quality, are superior to some full-reference measures. In our work, we calculate NIQE, BRISQUE, and ENIQA values with the codes provided by their authors under the default settings. For NIQE, BRISQUE, and ENIQA, smaller values indicate better image quality.
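These two statistics can be reproduced roughly as follows, under our reading of the text: per-pixel entropy from the grayscale histogram (i.i.d. model), and BPP estimated by losslessly PNG-compressing the grayscale image. The file handling is a placeholder.

```python
import io
import numpy as np
from PIL import Image

def entropy_and_bpp(path):
    gray = Image.open(path).convert("L")             # grayscale version
    px = np.asarray(gray)
    hist = np.bincount(px.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))                # bits/pixel under an i.i.d. model
    buf = io.BytesIO()
    gray.save(buf, format="PNG")                     # lossless proxy for "bits used"
    bpp = 8.0 * len(buf.getvalue()) / px.size
    return entropy, bpp
```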

As listed in Table II, LIU4K exceeds previous datasets in both scale and resolution. From the perspective of information theory, the images in LIU4K are more informative: their mean BPP and entropy values are greater, which means the dataset contains more information. For perceptual image quality assessment, LIU4K also achieves very competitive scores in BRISQUE, ENIQA, and NIQE. Note that the BRISQUE and ENIQA values of LIU4K are worse than those of Kodak and LIVE1, since these metrics are trained on those two datasets and their scales are relatively small. These assessments indicate that images in LIU4K are of relatively high perceptual quality and suitable for image restoration tasks.

III Algorithm Survey

Many approaches for compression artifacts removal, including loop filters in codecs, have been proposed in the literature. We review them in four categories: filter-based methods, probabilistic-prior-based methods, deep learning-based JPEG artifacts removal methods, and deep learning-based loop filter methods. The first two and the last two categories are summarized in Tables III and IV, respectively. We review the four categories and then briefly summarize their technical improvements. Note that the technologies discussed in our work can be applied without changing the existing pipeline of codecs.

III-A Filtering-Based Methods

The earliest methods [46, 47] perform filtering operations to remove compression artifacts. Later approaches [18, 2] attempt to infer the parameters of the filtering operations adaptively. Minami and Zakhor [18] observed that quantization of the DCT coefficients of two neighboring blocks increases the expected value of the mean squared difference of slope (MSDS) between the slope across the two adjacent blocks and the average of the boundary slopes within each block. Thus, a constrained quadratic programming problem is built to reduce the expected MSDS, decreasing the blocking effect while preserving texture details. In HEVC, an in-loop deblocking filter is specially designed [68] to reduce blocking artifacts between coding units: the picture is divided into blocks, boundaries on the grid are classified by a series of metrics, and different levels of deblocking are then applied to the boundaries according to their types.
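A toy boundary filter in the spirit of these early deblocking methods is sketched below: pixels are smoothed across each block boundary only when the step across the boundary is small, i.e. likely a quantization discontinuity rather than a true edge. The threshold `thresh` is a hypothetical parameter, not taken from any cited method.

```python
import numpy as np

def deblock(img, block=8, thresh=8.0):
    """Smooth across block-grid boundaries only where the step is small."""
    out = img.astype(np.float64).copy()
    h, w = out.shape
    for x in range(block, w, block):                 # vertical boundaries
        a, b = out[:, x - 1].copy(), out[:, x].copy()
        mask = np.abs(a - b) < thresh                # large steps kept as true edges
        mean = 0.5 * (a + b)
        out[mask, x - 1] = 0.5 * (a[mask] + mean[mask])
        out[mask, x] = 0.5 * (b[mask] + mean[mask])
    for y in range(block, h, block):                 # horizontal boundaries
        a, b = out[y - 1, :].copy(), out[y, :].copy()
        mask = np.abs(a - b) < thresh
        mean = 0.5 * (a + b)
        out[y - 1, mask] = 0.5 * (a[mask] + mean[mask])
        out[y, mask] = 0.5 * (b[mask] + mean[mask])
    return out
```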

III-B Probabilistic-Prior Methods

Some successive approaches are based on probabilistic estimation with image prior models. According to their basic models, these methods can be further categorized into Markov random field [20], non-local similarity [21, 22], low-rank minimization [23, 26], sparse coding [6, 24, 25], and adaptive DCT transformation [16]. In [20], the distortion term is modeled as additive, spatially correlated Gaussian noise, and the original image is depicted as a high-order Markov random field under the fields-of-experts framework. Non-local methods [21, 22] consider similar blocks to be potentially correlated, estimate the overlapped-block transform coefficients, and remove compression noise using non-local similar blocks. For low-rank methods, Ren et al. [23] performed patch clustering and low-rank minimization simultaneously to exploit both local sparsity and non-local similarity, and a successive work [26] selects the thresholds adaptively for each group of similar patches based on the compression noise level and the decomposed singular values. In [16], a new shape-adaptive DCT transform is proposed for image compression artifacts reduction.
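As a concrete illustration of the low-rank idea, the following sketch applies singular value thresholding (SVT) to a group of vectorized similar patches. The fixed threshold `tau` is an illustrative stand-in for the adaptively determined thresholds of [26].

```python
import numpy as np

def svt_denoise_group(patches, tau=10.0):
    """patches: (n_patches, patch_size**2) matrix of vectorized similar patches.

    Soft-thresholding the singular values yields a low-rank reconstruction of
    the patch group, suppressing compression noise shared across the patches.
    """
    u, s, vt = np.linalg.svd(patches, full_matrices=False)
    s = np.maximum(s - tau, 0.0)        # soft-threshold singular values
    return (u * s) @ vt                 # low-rank reconstruction
```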

Fig. 3: The technical improvement route for deep learning-based compression artifacts removal and loop filter of codecs.
Fig. 4: The network improvement route for compression artifacts removal and loop filter of codecs, where the multiplication sign in the circle in (c) denotes the element-wise multiplication operation.

III-C Deep Learning-Based JPEG Artifacts Removal

Deep learning-based methods largely improve the restoration capacity of data-driven methods. ARCNN [7] is the seminal work and takes the architecture of a three-layer CNN. Deep Dual-Domain (D³) [8] is the first work to introduce DCT-domain priors to facilitate JPEG artifacts removal; it combines the strong learning capacity of deep networks with the problem-specific knowledge of JPEG artifacts removal.

Successive works proceed in two main streams: better network architectures [28, 30, 32] and better utilization of DCT-domain information [33, 9]. Many advanced networks are constructed to model the rich dependencies of deep features. The Residual Encoder-Decoder Network (RED-Net) [29] and the Compression Artifact Suppression CNN (CAS-CNN) [30] utilize deep encoding-decoding frameworks with symmetric convolutional-deconvolutional layers. Tai et al. [32] constructed a deep persistent memory network whose memory blocks consist of a recursive unit and a gate unit: the former extracts multi-level representations from the last input feature, while the latter learns to control the ratio between the memory and the current input. The Dual-Domain Multi-Scale CNN (DMCNN) [33] integrates dual-domain and auto-encoder-style networks with dilated convolutions to obtain very large receptive fields and eliminate banding effects. In [34], wavelet transforms are introduced into the CNN architecture for a better trade-off between receptive field size and computational efficiency. In [35], a two-branch CNN is constructed to handle restoration in the pixel and discrete wavelet domains.
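As an illustration of the residual-learning architectures discussed here, the following is a minimal DnCNN-style network in PyTorch (convolution + batch normalization + ReLU, predicting the artifact residual). The depth and width are illustrative choices, not the exact settings of [28].

```python
import torch
import torch.nn as nn

class ResidualCNN(nn.Module):
    """Minimal DnCNN-style sketch: the body predicts the compression residual."""

    def __init__(self, depth=8, width=64):
        super().__init__()
        layers = [nn.Conv2d(1, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1),
                       nn.BatchNorm2d(width),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # Residual learning: clean estimate = input - predicted artifacts.
        return x - self.body(x)
```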

Besides network improvements, some works embed traditional priors or constraints into deep networks, e.g. sparsity [8], nonlinear diffusion [27], multi-scale constraints [30, 33], and wavelet signal structures [34, 35]. In the one-to-many network [31], adversarial learning is introduced to facilitate generating visually pleasing restoration results. A summary of the performance of typical deep learning-based JPEG artifacts removal methods, as reported in their papers, is presented in Fig. 5.

Fig. 5: Recent evolution of deep learning-based JPEG artifacts removal. Significant performance (PSNR) improvement can be observed since deep learning entered the scene in 2015. The performances shown here are quoted directly from the published papers.

III-D Deep Learning-Based Loop Filter

Besides JPEG, deep learning techniques have also been applied to the latest codecs, e.g. HEVC, as post-processors. Beyond the improvement directions embodied in JPEG artifacts removal, deep learning-based loop filters focus more on handling the degradation caused by variable-size partitioning and on utilizing side information from codecs. The Variable-Filter-Size Residue-Learning CNN (VRCNN) [36] is the pioneering work: the designed CNN owns variable filter sizes to learn the residual between the input and target frames. Successive works again proceed in two directions: better networks and better side information. Zhang et al. [38] proposed a residual highway convolutional neural network (RHCNN) for the in-loop filter of HEVC. In [43], Wang et al. proposed a multi-scale LSTM to fuse multi-frame redundancies along the temporal dimension. Meng et al. [39] proposed a multi-channel long-short term dependency residual network that simulates the mechanism of human memory updating, introducing an update cell which learns to store and select long-term and short-term dependencies adaptively. Li et al. [52] presented a dynamic classification mechanism in which an up-to-one-byte flag indicates the complexity of the video content and the quality of each frame. In [41], Yang et al. designed a scalable deep CNN to reduce the distortion of both I and B/P frames in HEVC; it has two branches and a group of switches that control whether the DS-CNN-B branch is activated based on the resource state. In [42], Song et al. developed a CNN that can enhance compressed videos of different qualities with low redundancy. In [44], Yang et al. explored enhancing compressed video frames using the neighboring high-quality frames, building a novel multi-frame convolutional neural network for compressed video enhancement. In [45], Hashimoto et al. proposed a CNN with squeeze-and-excitation blocks and spatially separable convolutions for deblocking. In [10], Wang et al. proposed a dense residual convolutional neural network (DRN) that combines dense shortcuts and residual learning; bottleneck layers are injected into each DRN block to save computational resources while adaptively fusing hierarchical features.

Various kinds of side information are designed for more effective post-processing of compression artifacts. This side information includes the compression parameters from coding tree units (CTU) [51], the partition mask of the CTU [11], the QP parameter [38], block boundaries [39], content complexity [52], peak-quality frames and optical flow [44], and normalized Y/U/V and normalized QP [45].

III-E Technical Improvement Summary

The typical improvement route of deep learning-based compression artifacts reduction is summarized in Fig. 3. Three aspects of improvement are included: side information utilization, e.g. injecting the partition mask of the CTU [11] as input; network improvement, e.g. the dense residual network [10]; and novel loss functions, e.g. the adversarial loss [31]. Network improvements fall into four directions: 1) network architecture improvement (summarized more specifically in Fig. 4); 2) multi-domain networks, e.g. DMCNN [33]; 3) signal structure embedding, e.g. D³ [8]; and 4) new unit design, e.g. TNRD [27]. In the next section, we benchmark these methods under unified protocols.

IV Algorithm Benchmarking

With the rich resources provided by LIU4K, we evaluate 9 representative state-of-the-art algorithms: Shape-Adaptive DCT (SA-DCT) [16], Artifacts Removal CNN (ARCNN) [7], Trainable Nonlinear Reaction Diffusion (TNRD) [27], Denoising CNN (DnCNN) [28], the persistent memory network MemNet [32], Dual-Domain Multi-Scale CNN (DMCNN) [33], Multi-Level Wavelet CNN (MWCNN) [34], Variable-Filter-Size Residue-Learning CNN (VRCNN) [36], and the Progressive Rethinking Network (PRN) [70]. The selected baselines cover most of the representative methods: the first is a traditional non-deep method, the next six are deep learning-based JPEG artifacts reduction methods, and the last two are deep learning-based loop filter methods. We apply most of the learning-based methods to the restoration of images compressed by both JPEG and HEVC. For JPEG artifacts reduction, we train the models on the training set of LIU4K. For loop filters, the models are trained on the training sets of both BSD500 and LIU4K. Note that the source codes of SA-DCT and TNRD provided by the authors only support removing JPEG artifacts with quality factors 10, 20, 30, 40 and 10, 20, 30, respectively; for these two methods, we only compare performance in these cases. We also add residual learning to our implementation of ARCNN for fast training and comparison.

IV-A Advanced Training Strategies

In our benchmarking, we also make efforts to generalize some constraints and methods from JPEG artifacts reduction to general compression artifacts reduction.
Dense DCT Domain Constraint. For some codecs, e.g. HEVC, the partitioned block sizes are not always the same. The original DCT-branch constraint, which regularizes the reconstruction of fixed-size blocks in JPEG artifacts removal, therefore cannot be used directly. We slightly change our DCT branch design as follows:

  • Location independence. We do not impose the constraint based on block partitions. Instead, we impose the DCT constraint densely at every pixel location.

  • Variable-size DCT constraint. To handle variable block sizes in HEVC codecs, we extend the DCT branch into multiple branches. Each branch is responsible for constructing DCT constraints at a certain scale.

In summary, our DCT constraint applies several variable-block-size DCT transformation paths densely at every location, as sketched below.
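A possible implementation of this constraint: for each block size, a DCT patch is extracted at every pixel location via unfolding, and the coefficient error against the ground truth is penalized. The block sizes, the L1 penalty, and the equal weighting across scales are our illustrative assumptions.

```python
import math
import torch
import torch.nn.functional as F

def dct_matrix(n, device=None):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = torch.arange(n, device=device, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(n, device=device, dtype=torch.float32).unsqueeze(0)
    m = math.sqrt(2.0 / n) * torch.cos(math.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = 1.0 / math.sqrt(n)
    return m

def dense_dct_loss(pred, target, sizes=(4, 8, 16)):
    """pred, target: (B, 1, H, W). L1 loss over dense, multi-size DCT coefficients."""
    loss = 0.0
    for n in sizes:
        c = dct_matrix(n, pred.device)
        # Extract an n x n patch at every pixel location (stride 1).
        pu = F.unfold(pred, n).transpose(1, 2).reshape(-1, n, n)
        tu = F.unfold(target, n).transpose(1, 2).reshape(-1, n, n)
        loss = loss + F.l1_loss(c @ pu @ c.t(), c @ tu @ c.t())  # 2D DCT via basis
    return loss / len(sizes)
```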
Gradually Expanding Patch Size. The DCT branch is not stable during training. To make training more effective, we first use small patches to train the network and then enlarge the patch size gradually. Let p and t denote the patch size and the training epoch, respectively. When training a restoration model for JPEG, p is set as a non-decreasing step function of t:

(1)

For HEVC post-processing, all interval bounds for t are multiplied by 5. This strategy leads to a better-behaved DCT-branch constraint and also offers better performance.
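A hypothetical instance of such a schedule is sketched below; the epoch bounds and patch sizes are illustrative placeholders rather than the exact values of Eqn. (1), while the factor of 5 for HEVC follows the text.

```python
def patch_size(epoch, hevc=False):
    """Return the training patch size p for epoch t (illustrative step function)."""
    scale = 5 if hevc else 1          # "all interval bounds ... multiplied by 5"
    if epoch < 4 * scale:             # placeholder bounds and sizes
        return 48
    elif epoch < 8 * scale:
        return 96
    else:
        return 192
```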

Learning with Mixed Batches. For methods with high complexity, it is impossible to train with both a large patch size and a large batch size at the same time. Thus, we apply training with mixed batches: a combination of (large patch, small batch) and (small patch, large batch). With limited GPU memory, network training is stabilized by the large batch size while, at the same time, the model still learns from a large context through the large patch size. In our benchmark, we train MemNet and PRN with this strategy.
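The following sketch shows one way to realize mixed-batch training by alternating the two regimes step by step; the concrete patch/batch sizes and the `sample_batch` data loader are hypothetical.

```python
def train_mixed(model, optimizer, loss_fn, sample_batch, steps):
    """Alternate (large patch, small batch) and (small patch, large batch) steps."""
    configs = [(192, 4), (48, 64)]            # (patch_size, batch_size), illustrative
    for step in range(steps):
        patch, batch = configs[step % 2]      # alternate the two regimes
        x, y = sample_batch(patch, batch)     # compressed / ground-truth pair
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```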

IV-B Evaluation Protocols

Four full-reference metrics, PSNR, PSNR-B [58], SSIM [59], and MS-SSIM [60], and two non-reference metrics, NIQE [55] and BRISQUE [61], are used to evaluate the effectiveness of each method. In our implementation, we use the Adam optimizer [62] to pre-train each network and fine-tune it with stochastic gradient descent (SGD) [63] with cosine decay. In the first stage, the learning rate is set to 0.001 (0.0001 for PRN and MemNet). After training for 16 epochs, SGD is used for fine-tuning, with the initial learning rate set to 0.0001 and cosine decay. We allow at most 60 epochs for JPEG artifacts removal and 300 epochs for the restoration of images compressed by HEVC. For all methods, the models for restoring images compressed by JPEG with quality factor 40 and by HEVC with quantization parameter 22 are trained from scratch; the other models are initialized from these two during training.
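A minimal sketch of this two-stage optimization, assuming PyTorch: Adam pre-training for 16 epochs, then SGD fine-tuning at lr=1e-4 with cosine decay. `run_epoch` is a hypothetical function performing one training epoch, and splitting the 60-epoch JPEG budget as 16 + 44 is our reading of the text.

```python
import torch

def two_stage_train(model, run_epoch, adam_lr=1e-3, adam_epochs=16, sgd_epochs=44):
    # Stage 1: Adam pre-training (use adam_lr=1e-4 for PRN and MemNet).
    opt = torch.optim.Adam(model.parameters(), lr=adam_lr)
    for _ in range(adam_epochs):
        run_epoch(model, opt)
    # Stage 2: SGD fine-tuning with cosine learning-rate decay.
    opt = torch.optim.SGD(model.parameters(), lr=1e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=sgd_epochs)
    for _ in range(sgd_epochs):
        run_epoch(model, opt)
        sched.step()
```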

IV-C Objective Comparison

The objective results are presented in Table V. DMCNN is the clear winner on the full-reference metrics, followed by MWCNN for JPEG artifacts removal and PRN for the loop filter. On the whole, deep learning-based methods achieve significantly superior performance to earlier ones. On the no-reference metrics, TNRD achieves superior performance for JPEG artifacts removal, and almost all methods generate results that score worse than the original compressed images. We provide further objective results on other testing sets in the supplementary material; these results are highly consistent across the different testing sets.

| Metric | Quality | Compressed | SA-DCT | TNRD | ARCNN | VRCNN | DnCNN | MemNet | MWCNN | PRN | DMCNN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSNR | QF=10 | 30.45 | 31.32 | 31.85 | 31.86 | 31.83 | 32.05 | 32.18 | 32.28 | 32.19 | 32.33 |
| PSNR-B | QF=10 | 29.93 | 31.32 | 31.82 | 31.83 | 31.81 | 32.01 | 32.10 | 32.23 | 32.15 | 32.30 |
| SSIM | QF=10 | 0.8090 | 0.8237 | 0.8427 | 0.8423 | 0.8419 | 0.8463 | 0.8488 | 0.8508 | 0.8491 | 0.8520 |
| MS-SSIM | QF=10 | 0.9270 | 0.9353 | 0.9457 | 0.9457 | 0.9452 | 0.9481 | 0.9496 | 0.9511 | 0.9498 | 0.9513 |
| NIQE | QF=10 | 6.81 | 4.55 | 5.31 | 5.22 | 5.31 | 5.16 | 5.27 | 5.39 | 5.25 | 5.48 |
| BRISQUE | QF=10 | 56.39 | 51.09 | 43.00 | 49.94 | 48.63 | 49.40 | 49.56 | 50.32 | 50.51 | 51.16 |
| PSNR | QF=20 | 33.28 | 33.87 | 34.48 | 34.51 | 34.47 | 34.57 | 34.80 | 34.86 | 34.83 | 34.91 |
| PSNR-B | QF=20 | 32.61 | 33.86 | 34.42 | 34.45 | 34.39 | 34.47 | 34.71 | 34.80 | 34.78 | 34.86 |
| SSIM | QF=20 | 0.8772 | 0.8787 | 0.8958 | 0.8963 | 0.8958 | 0.8970 | 0.9005 | 0.9013 | 0.9007 | 0.9023 |
| MS-SSIM | QF=20 | 0.9675 | 0.9665 | 0.9737 | 0.9741 | 0.9738 | 0.9740 | 0.9757 | 0.9760 | 0.9757 | 0.9762 |
| NIQE | QF=20 | 5.30 | 4.23 | 4.84 | 4.84 | 4.98 | 4.86 | 4.94 | 4.96 | 4.94 | 5.17 |
| BRISQUE | QF=20 | 53.67 | 46.99 | 39.44 | 45.69 | 47.01 | 43.65 | 46.62 | 45.85 | 46.20 | 47.10 |
| PSNR | QF=30 | 34.81 | 35.27 | 35.92 | 35.90 | 35.89 | 36.11 | 36.21 | 36.23 | 36.23 | 36.31 |
| PSNR-B | QF=30 | 34.09 | 35.26 | 35.84 | 35.82 | 35.76 | 35.97 | 36.04 | 36.17 | 36.16 | 36.25 |
| SSIM | QF=30 | 0.9062 | 0.9040 | 0.9192 | 0.9196 | 0.9198 | 0.9215 | 0.9229 | 0.9232 | 0.9233 | 0.9242 |
| MS-SSIM | QF=30 | 0.9799 | 0.9774 | 0.9830 | 0.9832 | 0.9833 | 0.9839 | 0.9842 | 0.9844 | 0.9843 | 0.9846 |
| NIQE | QF=30 | 4.68 | 4.08 | 4.57 | 4.64 | 4.59 | 4.57 | 4.74 | 4.92 | 4.78 | 4.96 |
| BRISQUE | QF=30 | 48.48 | 45.28 | 37.59 | 43.10 | 42.64 | 42.49 | 43.69 | 44.11 | 43.94 | 44.28 |
| PSNR | QF=40 | 35.82 | 36.20 | - | 36.89 | 36.86 | 37.08 | 37.11 | 37.11 | 37.18 | 37.23 |
| PSNR-B | QF=40 | 35.07 | 36.19 | - | 36.77 | 36.74 | 36.96 | 36.94 | 37.03 | 37.06 | 37.13 |
| SSIM | QF=40 | 0.9220 | 0.9188 | - | 0.9331 | 0.9330 | 0.9349 | 0.9354 | 0.9353 | 0.9358 | 0.9363 |
| MS-SSIM | QF=40 | 0.9856 | 0.9829 | - | 0.9878 | 0.9878 | 0.9882 | 0.9883 | 0.9884 | 0.9885 | 0.9886 |
| NIQE | QF=40 | 4.28 | 4.00 | - | 4.49 | 4.54 | 4.63 | 4.61 | 4.75 | 4.68 | 4.80 |
| BRISQUE | QF=40 | 43.77 | 44.01 | - | 41.35 | 41.00 | 40.79 | 41.28 | 41.51 | 41.56 | 41.80 |
| PSNR | QP=22 | 41.94 | - | - | 41.72 | 41.72 | 41.79 | 41.80 | 41.79 | 41.75 | 41.86 |
| PSNR-B | QP=22 | 41.67 | - | - | 41.58 | 41.58 | 41.69 | 41.67 | 41.68 | 41.62 | 41.77 |
| SSIM | QP=22 | 0.9728 | - | - | 0.9729 | 0.9730 | 0.9732 | 0.9732 | 0.9732 | 0.9729 | 0.9734 |
| MS-SSIM | QP=22 | 0.9957 | - | - | 0.9957 | 0.9957 | 0.9957 | 0.9957 | 0.9957 | 0.9957 | 0.9958 |
| NIQE | QP=22 | 3.86 | - | - | 3.80 | 3.81 | 3.93 | 3.96 | 3.91 | 3.78 | 3.97 |
| BRISQUE | QP=22 | 21.29 | - | - | 24.61 | 24.51 | 24.36 | 24.56 | 24.33 | 23.30 | 24.57 |
| PSNR | QP=27 | 38.47 | - | - | 38.48 | 38.49 | 38.50 | 38.55 | 38.56 | 38.61 | 38.59 |
| PSNR-B | QP=27 | 38.33 | - | - | 38.43 | 38.45 | 38.47 | 38.50 | 38.52 | 38.56 | 38.57 |
| SSIM | QP=27 | 0.9456 | - | - | 0.9462 | 0.9462 | 0.9465 | 0.9465 | 0.9468 | 0.9473 | 0.9472 |
| MS-SSIM | QP=27 | 0.9897 | - | - | 0.9896 | 0.9896 | 0.9898 | 0.9897 | 0.9898 | 0.9899 | 0.9899 |
| NIQE | QP=27 | 4.02 | - | - | 4.06 | 4.13 | 4.13 | 4.22 | 4.17 | 4.19 | 4.26 |
| BRISQUE | QP=27 | 32.68 | - | - | 34.65 | 34.81 | 34.83 | 35.37 | 35.25 | 35.29 | 35.60 |
| PSNR | QP=32 | 35.52 | - | - | 35.63 | 35.65 | 35.71 | 35.74 | 35.75 | 35.78 | 35.79 |
| PSNR-B | QP=32 | 35.46 | - | - | 35.62 | 35.64 | 35.70 | 35.71 | 35.74 | 35.77 | 35.78 |
| SSIM | QP=32 | 0.9061 | - | - | 0.9072 | 0.9073 | 0.9083 | 0.9082 | 0.9085 | 0.9091 | 0.9094 |
| MS-SSIM | QP=32 | 0.9776 | - | - | 0.9777 | 0.9776 | 0.9780 | 0.9779 | 0.9781 | 0.9782 | 0.9783 |
| NIQE | QP=32 | 4.55 | - | - | 4.57 | 4.62 | 4.68 | 4.76 | 4.61 | 4.74 | 4.79 |
| BRISQUE | QP=32 | 40.99 | - | - | 41.87 | 42.58 | 42.47 | 42.61 | 43.32 | 43.51 | 43.69 |
| PSNR | QP=37 | 32.85 | - | - | 33.00 | 33.02 | 33.06 | 33.12 | 33.14 | 33.16 | 33.17 |
| PSNR-B | QP=37 | 32.81 | - | - | 33.00 | 33.01 | 33.05 | 33.10 | 33.14 | 33.14 | 33.16 |
| SSIM | QP=37 | 0.8558 | - | - | 0.8577 | 0.8582 | 0.8582 | 0.8594 | 0.8604 | 0.8603 | 0.8613 |
| MS-SSIM | QP=37 | 0.9559 | - | - | 0.9563 | 0.9562 | 0.9564 | 0.9567 | 0.9573 | 0.9571 | 0.9575 |
| NIQE | QP=37 | 5.04 | - | - | 5.03 | 5.07 | 5.07 | 5.24 | 5.08 | 5.14 | 5.21 |
| BRISQUE | QP=37 | 46.61 | - | - | 47.16 | 47.62 | 46.83 | 48.74 | 47.77 | 47.11 | 48.30 |

TABLE V: Objective evaluations of different methods on LIU4K for compression artifacts reduction. The values in red and blue denote the first- and second-best results, respectively.
| Metric | SA-DCT | ARCNN | TNRD | DnCNN | MemNet | MWCNN | PRN (J) | DMCNN (J) | VRCNN | PRN (H) | DMCNN (H) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Parameters | - | 106,564 | 26,645 | 1,112,192 | 3,165,196 | 16,152,260 | 1,312,140 | 5,751,614 | 54,673 | 7,600,065 | 9,400,180 |
| Storage (MB) | - | 0.40 | 1.93 | 2.12 | 12.26 | 61.60 | 5.16 | 21.95 | 0.21 | 29.16 | 22.67 |
| Time (ms/image) | 43164.00 | 3.56 | 15050.30 | 6.31 | 186.30 | 132.39 | 49.92 | 17.34 | 3.92 | 841.01 | 32.48 |

TABLE VI: The model complexity analysis of different methods. SA-DCT is non-deep; ARCNN through DMCNN (J) are deep JPEG artifacts removal methods; VRCNN, PRN (H), and DMCNN (H) are deep loop filter methods. (J) denotes the version used for JPEG artifacts removal; (H) signifies the version used for the restoration of images compressed by HEVC.

IV-D Subjective Results

We also compare the subjective quality of different methods in Figs. 6 and 7. DMCNN achieves the best overall visual quality: most artifacts are removed and texture details are preserved, thanks to its superior modeling capacity. As shown in Fig. 6, JPEG, ARCNN, VRCNN, and DnCNN produce obvious banding effects in large, smooth regions. MemNet and PRN achieve better results, although gentle bands remain on close inspection. Benefiting from their large receptive fields, MWCNN and DMCNN restore the smooth regions well and remove banding artifacts. For the water-wave textures, some regions are quantized into small smooth blocks after compression, and all methods fail to restore visually pleasing texture: ARCNN, VRCNN, and DnCNN only remove the blocky boundaries; MemNet and PRN restore water-wave textures in stochastic directions; MWCNN and DMCNN generate water-wave textures consistent with the surrounding waves. Fig. 7 provides results for edges and regular textures. The results of ARCNN, VRCNN, and DnCNN contain many artifacts; MWCNN, MemNet, and PRN generate better results; and DMCNN generates the sharpest edges and the most regular brick textures.

Fig. 8: Visual results of performance versus complexity (i.e. parameters) of different methods. (a) JPEG artifacts removal (QF=10). (b) Restoration of images compressed by HEVC (QP=37).

IV-E Evaluation on Model Capacity

Table VI reports the parameter number, the storage usage, and the per-image running time of each method, averaged over the images (768×512) in LIVE1, on a machine with an Intel(R) Xeon(TM) E5-2650 v4 2.20 GHz CPU, 16 GB RAM, and a GeForce GTX 1080 Ti.

ARCNN, DnCNN, MemNet, MWCNN, PRN, DMCNN, and VRCNN are implemented in PyTorch and run on the GPU, while SA-DCT and TNRD are implemented in MATLAB and run on the CPU. All deep learning-based methods can process an image within 1 second. ARCNN, VRCNN, and DnCNN achieve the shortest running times and finish the restoration within 10 milliseconds. As for storage, ARCNN and VRCNN use the minimum space. As for model complexity, MWCNN uses the most parameters while TNRD uses the fewest. Note that PRN and DMCNN use different network architectures for JPEG artifacts removal and for the restoration of images compressed by HEVC; we therefore present the complexities of both versions in Table VI, denoted with (J) and (H), respectively. The results are also visualized in Fig. 8.

IV-F Evaluations on Performance of Computer Vision Tasks

Fig. 9: The visual results of depth estimation on compressed images (JPEG) with and without compression artifacts reduction. (a) Input RGB image. (b) Depth map of compressed image (QF=10). (c) Depth map of restored image (QF=10). (d) Depth map of original image.
Fig. 10: Visual results of performance changes before and after restoration at different QP/QF settings for depth estimation. JC: compressed by JPEG. HC: compressed by HEVC. JR: restored from images compressed by JPEG. HR: restored from images compressed by HEVC.
Fig. 11: The visual results of semantic segmentation on compressed images (HEVC) with and without compression artifacts reduction. (a) Input RGB image. (b) Semantic map of compressed image (QP=37). (c) Semantic map of restored image (QP=37). (d) Semantic map of original image.
Fig. 12: Visual results of performance changes before and after restoration at different QP/QF settings for semantic segmentation. JC: compressed by JPEG. HC: compressed by HEVC. JR: restored from images compressed by JPEG. HR: restored from images compressed by HEVC. M1: ResNet50Dilated + PPM_Deepsup. M2: ResNet50 + UperNet.
| Metric (JPEG) | Uncompressed | Compressed QF=10 | QF=20 | QF=30 | QF=40 | Restored QF=10 | QF=20 | QF=30 | QF=40 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MSE | 0.2994 | 0.4072 | 0.3193 | 0.3080 | 0.3031 | 0.3652 | 0.3163 | 0.3058 | 0.3021 |
| RMS | 0.5471 | 0.6382 | 0.5651 | 0.5550 | 0.5505 | 0.6043 | 0.5624 | 0.5530 | 0.5496 |
| REL | 0.1179 | 0.1482 | 0.1248 | 0.1207 | 0.1191 | 0.1354 | 0.1245 | 0.1214 | 0.1201 |
| log10 | 0.0521 | 0.0644 | 0.0548 | 0.0531 | 0.0526 | 0.0611 | 0.0551 | 0.0535 | 0.0530 |
| δ < 1.25 | 0.8572 | 0.7910 | 0.8414 | 0.8529 | 0.8546 | 0.8057 | 0.8384 | 0.8488 | 0.8520 |
| δ < 1.25² | 0.9728 | 0.9484 | 0.9689 | 0.9711 | 0.9720 | 0.9531 | 0.9679 | 0.9703 | 0.9715 |
| δ < 1.25³ | 0.9928 | 0.9864 | 0.9914 | 0.9921 | 0.9927 | 0.9888 | 0.9917 | 0.9922 | 0.9926 |
| P | 0.6220 | 0.5694 | 0.6091 | 0.6167 | 0.6193 | 0.6020 | 0.6099 | 0.6133 | 0.6156 |
| R | 0.4904 | 0.4481 | 0.4751 | 0.4825 | 0.4853 | 0.4575 | 0.4795 | 0.4851 | 0.4880 |
| F1 | 0.5421 | 0.4928 | 0.5271 | 0.5349 | 0.5379 | 0.5126 | 0.5303 | 0.5353 | 0.5381 |

| Metric (HEVC) | Uncompressed | Compressed QP=22 | QP=27 | QP=32 | QP=37 | Restored QP=22 | QP=27 | QP=32 | QP=37 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MSE | 0.2994 | 0.2976 | 0.2973 | 0.3129 | 0.3385 | 0.2971 | 0.2991 | 0.3183 | 0.3483 |
| RMS | 0.5471 | 0.5455 | 0.5453 | 0.5594 | 0.5818 | 0.5451 | 0.5469 | 0.5642 | 0.5901 |
| REL | 0.1179 | 0.1185 | 0.1199 | 0.1235 | 0.1300 | 0.1188 | 0.1204 | 0.1246 | 0.1319 |
| log10 | 0.0521 | 0.0522 | 0.0527 | 0.0546 | 0.0580 | 0.0523 | 0.0530 | 0.0553 | 0.0594 |
| δ < 1.25 | 0.8572 | 0.8553 | 0.8535 | 0.8420 | 0.8247 | 0.8547 | 0.8517 | 0.8376 | 0.8171 |
| δ < 1.25² | 0.9728 | 0.9725 | 0.9714 | 0.9682 | 0.9617 | 0.9724 | 0.9709 | 0.9664 | 0.9581 |
| δ < 1.25³ | 0.9928 | 0.9929 | 0.9926 | 0.9918 | 0.9900 | 0.9930 | 0.9924 | 0.9916 | 0.9895 |
| P | 0.6220 | 0.6196 | 0.6133 | 0.6090 | 0.6030 | 0.6188 | 0.6130 | 0.6098 | 0.6065 |
| R | 0.4904 | 0.4909 | 0.4913 | 0.4847 | 0.4710 | 0.4915 | 0.4909 | 0.4833 | 0.4687 |
| F1 | 0.5421 | 0.5415 | 0.5393 | 0.5334 | 0.5216 | 0.5416 | 0.5388 | 0.5326 | 0.5213 |

TABLE VII: Comparisons of SENet on the NYU-Depth V2 dataset. The input testing images in the "Compressed" category are compressed with different quality factors / quantization parameters. The input testing images in the "Restored" category are additionally processed with VRCNN. "QF" denotes the quality factor; "QP" signifies the quantization parameter. The δ rows report threshold accuracies.
| Metric (JPEG) | Baseline | Uncompressed | Compressed QF=10 | QF=20 | QF=30 | QF=40 | Restored QF=10 | QF=20 | QF=30 | QF=40 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mean IoU | ResNet50Dilated + PPM_Deepsup | 0.4075 | 0.2794 | 0.3711 | 0.3911 | 0.394 | 0.3145 | 0.3624 | 0.382 | 0.3831 |
| Accuracy | ResNet50Dilated + PPM_Deepsup | 79.64% | 70.77% | 77.17% | 78.49% | 78.78% | 73.98% | 77.15% | 78.26% | 78.41% |
| Mean IoU | ResNet50 + UperNet | 0.4029 | 0.2814 | 0.3662 | 0.3842 | 0.3884 | 0.3345 | 0.3685 | 0.3834 | 0.3843 |
| Accuracy | ResNet50 + UperNet | 79.57% | 70.85% | 77.32% | 78.47% | 78.65% | 75.40% | 77.56% | 78.48% | 78.45% |

| Metric (HEVC) | Baseline | Uncompressed | Compressed QP=22 | QP=27 | QP=32 | QP=37 | Restored QP=22 | QP=27 | QP=32 | QP=37 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mean IoU | ResNet50Dilated + PPM_Deepsup | 0.4075 | 0.4043 | 0.3956 | 0.3736 | 0.3363 | 0.4027 | 0.3902 | 0.3653 | 0.3269 |
| Accuracy | ResNet50Dilated + PPM_Deepsup | 79.64% | 79.50% | 79.02% | 77.75% | 75.08% | 79.45% | 78.80% | 77.41% | 74.66% |
| Mean IoU | ResNet50 + UperNet | 0.4029 | 0.4006 | 0.3945 | 0.3789 | 0.3463 | 0.3999 | 0.3931 | 0.3744 | 0.3417 |
| Accuracy | ResNet50 + UperNet | 79.57% | 79.45% | 79.03% | 78.12% | 76.02% | 79.42% | 78.94% | 77.95% | 75.84% |

TABLE VIII: Comparisons of semantic segmentation baselines on the ADE20K dataset. The input testing images in the "Compressed" category are compressed with different quality factors / quantization parameters. The input testing images in the "Restored" category are additionally processed with VRCNN. "QF" denotes the quality factor; "QP" signifies the quantization parameter.

Depth Estimation. Table VII shows the results of depth estimation with accurate object boundaries [66], one of the state-of-the-art depth estimation methods, on images with and without compression artifacts reduction by VRCNN, evaluated on NYUv2 [67]. Several accuracy measures are employed: mean squared error (MSE), root mean squared error (RMS), mean relative error (REL), mean log10 error (log10), and threshold accuracy, as well as the precision (P), recall (R), and F1 score of the estimated edge maps. For MSE, RMS, REL, and log10, smaller values signify better performance; for the threshold accuracies, P, R, and F1 score, larger values denote better performance. From the results, for MSE, RMS, and REL, it is always beneficial to perform compression artifacts reduction (for both JPEG and HEVC codecs, across all QPs and QFs). For the other metrics, the results are more mixed: the results on restored images are sometimes inferior to those on the compressed ones, e.g. at QF=10 on log10, and at QF=30 and 40 on P. In general, however, the restored images win in more cases than the compressed ones. This also demonstrates that restoring compressed images for visual quality is not always beneficial to successive tasks. The trend of the performance change before and after restoration at different QP/QF settings is visualized in Fig. 10, and visual results are shown in Fig. 9. When QF=10, the result on the compressed image degrades severely, and the enhancement operation effectively improves the visual quality of the depth map. When QF=20, the degradation in the result on the compressed image is not obvious, and the enhancement operation leads to only minor visual quality gains: some discontinuous boundary artifacts are removed, as shown in the red boxes of Fig. 9, but some details become blurry, e.g. the details and boundaries in the blue boxes of Fig. 9.
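For reference, these depth measures can be computed as follows, assuming their standard definitions over positive depth arrays; this is an illustrative sketch, not the evaluation code of [66].

```python
import numpy as np

def depth_metrics(pred, gt):
    """pred, gt: positive depth arrays of the same shape."""
    err = pred - gt
    mse = np.mean(err ** 2)
    rms = np.sqrt(mse)
    rel = np.mean(np.abs(err) / gt)                         # mean relative error
    log10 = np.mean(np.abs(np.log10(pred) - np.log10(gt)))  # mean log10 error
    ratio = np.maximum(pred / gt, gt / pred)
    deltas = [np.mean(ratio < 1.25 ** i) for i in (1, 2, 3)]  # threshold accuracies
    return mse, rms, rel, log10, deltas
```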

Semantic Segmentation. We adopt two baselines for evaluation: ResNet50Dilated + PPM_Deepsup and ResNet50 + UperNet [64]. The evaluation is performed on ADE20K [64]. Results are reported in two metrics commonly used for semantic segmentation [65]: pixel accuracy, the proportion of correctly classified pixels, and mean IoU, the intersection-over-union between the predicted and ground-truth pixels averaged over all classes (see the sketch after this paragraph). As observed from Table VIII, compression artifacts reduction (here, VRCNN) does not always benefit semantic segmentation inference. In several cases, e.g. for JPEG artifacts at QF=40, the performance of the ResNet50Dilated + PPM_Deepsup baseline on restored images is worse than on compressed images. The trend of the performance change before and after restoration at different QP/QF settings is visualized in Fig. 12. The main reason for the performance drop is likely the mismatch between the MSE objective used in training and the goal of semantic segmentation: trained with MSE, the restored versions of compressed images with severe artifacts tend to be over-smooth, and some critical details are lost, lowering accuracy. As for visual results, Fig. 11 shows that compression artifacts removal can slightly correct some false boundaries.
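The two segmentation measures can be sketched as below, computed from a confusion matrix over class labels 0..n_classes-1; this is an illustrative implementation, not the benchmark's evaluation code.

```python
import numpy as np

def seg_metrics(pred, gt, n_classes):
    """pred, gt: integer label maps of the same shape."""
    mask = (gt >= 0) & (gt < n_classes)            # ignore invalid labels
    cm = np.bincount(n_classes * gt[mask] + pred[mask],
                     minlength=n_classes ** 2).reshape(n_classes, n_classes)
    pixel_acc = np.diag(cm).sum() / cm.sum()       # correctly classified pixels
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)
    iou = np.diag(cm) / np.maximum(union, 1)       # per-class IoU
    return pixel_acc, iou.mean()                   # pixel accuracy, mean IoU
```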

V Trends and Challenges

Although deep learning techniques have brought rapid development to compression artifacts reduction, several important challenges and trends remain. First, although recent methods obtain ever-higher accuracy with advanced deep models carrying huge numbers of parameters, it is still hard to apply these methods in real scenarios; it would be valuable to design compact deep network architectures and to compress or adapt existing models into small ones for real-time compression artifacts reduction. Second, in the latest codecs, i.e. Versatile Video Coding (VVC), more coding tools are integrated, so the distribution of compression artifacts is more complex, and applying existing methods to next-generation codecs is challenging. With more powerful deep learning tools, e.g. capsule networks and reinforcement learning, we believe that future techniques for restoring such complex degradations will bring new surprises. Third, for compression artifacts reduction, there are few works on the internal mechanisms of feature learning and the related interpretability; beyond pursuing superior performance, one direction is to give comprehensive explanations of which factors make a network more effective and through what mechanism. Last, for various low-level image processing tasks, it is critical to design and apply proper metrics to constrain model training and evaluate model effectiveness; an important future direction is to develop more effective and rational measures that balance signal fidelity and visual perception for compression artifacts reduction.

VI Conclusion

This paper presents a systematic review of compression artifacts reduction methods, covering both traditional and deep learning-based methods. These methods have evolved along several axes, including model architecture improvement and the continued exploration of side information embedding. We summarize milestone and typical methods and highlight their contributions, strengths, and weaknesses. We also conduct a thorough benchmark of state-of-the-art compression artifacts reduction methods, in which some constraints and training techniques targeted at JPEG artifacts removal are generalized to general compression artifacts reduction. Based on our evaluation and analysis, overall remarks, challenges, and trends are given. Although our attempts are preliminary, they build a bridge from the existing world to a new one, where more researchers are expected to come.

References

  • [1] Gregory K. Wallace, “The JPEG still picture compression standard,” Commun. ACM, 1991.
  • [2] Gary J. Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans. on Circuits and Systems for Video Technology, 2012.
  • [3] Iain E. Richardson, The H.264 Advanced Video Compression Standard, 2010.
  • [4] K. Bredies and M. Holler, “A total variation-based jpeg decompression model,” SIAM J. Img. Sci., March 2012.
  • [5] Kiryung Lee, Dong Sik Kim, and Taejeong Kim, “Regression-based prediction for blocking artifact reduction in jpeg-compressed images,” TIP, Jan 2005.
  • [6] H. Chang, M. K. Ng, and T. Zeng, “Reducing artifacts in jpeg decompression via a learned dictionary,” TSP, Feb 2014.
  • [7] C. Dong, Y. Deng, C. C. Loy, and X. Tang, “Compression artifacts reduction by a deep convolutional network,” in ICCV, 2015.
  • [8] Z. Wang, D. Liu, S. Chang, Q. Ling, Y. Yang, and T. S. Huang, “D3: Deep dual-domain based fast restoration of jpeg-compressed images,” in CVPR, 2016.
  • [9] Jun Guo and Hongyang Chao, “Building dual-domain representations for compression artifacts reduction,” in ECCV, 2016.
  • [10] Y. Wang, H. Zhu, Y. Li, Z. Chen, and S. Liu, “Dense residual convolutional neural network based in-loop filter for hevc,” in VCIP, 2018.
  • [11] X. He, Q. Hu, X. Zhang, C. Zhang, W. Lin, and X. Han, “Enhancing hevc compressed videos with a partition-masked convolutional neural network,” in ICIP, 2018.
  • [12] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in ICCV, 2001.
  • [13] Radu Timofte, Shuhang Gu, Jiqing Wu, Luc Van Gool, Lei Zhang, Ming-Hsuan Yang, Muhammad Haris, et al., “Ntire 2018 challenge on single image super-resolution: Methods and results,” in CVPR, 2018.
  • [14] Marco Bevilacqua, Aline Roumy, Christine Guillemot, and Marie-Line Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” in BMVC, 2012.
  • [15] Abhishek Singh and Narendra Ahuja, “Super-resolution using sub-band self-similarity,” in ACCV, Daniel Cremers, Ian Reid, Hideo Saito, and Ming-Hsuan Yang, Eds., 2015.
  • [16] A. Foi, V. Katkovnik, and K. Egiazarian, “Pointwise shape-adaptive dct for high-quality denoising and deblocking of grayscale and color images,” TIP, May 2007.
  • [17] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in CVPR, 2017.
  • [18] S. Minami and A. Zakhor, “An optimization approach for removing blocking effects in transform coding,” TCSVT, April 1995.
  • [19] C. Tsai, C. Chen, T. Yamakage, I. S. Chong, Y. Huang, C. Fu, T. Itoh, T. Watanabe, T. Chujoh, M. Karczewicz, and S. Lei, “Adaptive loop filtering for video coding,” JSTSP, Dec 2013.
  • [20] Deqing Sun and Wai-kuen Cham, “Postprocessing of low bit-rate block DCT coded images based on a fields of experts prior,” TIP, 2007.
  • [21] X. Zhang, R. Xiong, S. Ma, and W. Gao, “Reducing blocking artifacts in compressed images via transform-domain non-local coefficients estimation,” in ICME, 2012.
  • [22] X. Zhang, R. Xiong, X. Fan, S. Ma, and W. Gao, “Compression artifact reduction by overlapped-block transform coefficient estimation with block similarity,” TIP, Dec 2013.
  • [23] J. Ren, J. Liu, M. Li, W. Bai, and Z. Guo, “Image blocking artifacts reduction via patch clustering and low-rank minimization,” in DCC, 2013.
  • [24] X. Liu, X. Wu, J. Zhou, and D. Zhao, “Data-driven sparsity-based restoration of JPEG-compressed images in dual transform-pixel domain,” in CVPR, 2015.
  • [25] X. Liu, X. Wu, J. Zhou, and D. Zhao, “Data-driven soft decoding of compressed images in dual transform-pixel domain,” TIP, April 2016.
  • [26] X. Zhang, W. Lin, R. Xiong, X. Liu, S. Ma, and W. Gao, “Low-rank decomposition-based restoration of compressed images via adaptive noise estimation,” TIP, Sep. 2016.
  • [27] Yunjin Chen and Thomas Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” TPAMI, 2017.
  • [28] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” TIP, July 2017.
  • [29] Xiaojiao Mao, Chunhua Shen, and Yu-Bin Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” in NIPS, 2016.
  • [30] Lukas Cavigelli, Pascal Hager, and Luca Benini, “CAS-CNN: A deep convolutional neural network for image compression artifact suppression,” in IJCNN, 2017.
  • [31] J. Guo and H. Chao, “One-to-many network for visually pleasing compression artifacts reduction,” in CVPR, 2017.
  • [32] Y. Tai, J. Yang, X. Liu, and C. Xu, “Memnet: A persistent memory network for image restoration,” in ICCV, 2017.
  • [33] X. Zhang, W. Yang, Y. Hu, and J. Liu, “DMCNN: Dual-domain multi-scale convolutional neural network for compression artifacts removal,” in ICIP, 2018.
  • [34] Pengju Liu, Hongzhi Zhang, Kai Zhang, Liang Lin, and Wangmeng Zuo, “Multi-level wavelet-CNN for image restoration,” in CVPRW, 2018.
  • [35] H. Chen, X. He, L. Qing, S. Xiong, and T. Q. Nguyen, “DPW-SDNet: Dual pixel-wavelet domain deep CNNs for soft decoding of JPEG-compressed images,” in CVPRW, 2018.
  • [36] Yuanying Dai, Dong Liu, and Feng Wu, “A convolutional neural network approach for post-processing in HEVC intra coding,” in MultiMedia Modeling, 2017.
  • [37] T. Wang, M. Chen, and H. Chao, “A novel deep learning-based method of improving coding efficiency from the decoder-end for HEVC,” in DCC, 2017.
  • [38] Y. Zhang, T. Shen, X. Ji, Y. Zhang, R. Xiong, and Q. Dai, “Residual highway convolutional neural networks for in-loop filtering in HEVC,” TIP, Aug 2018.
  • [39] X. Meng, C. Chen, S. Zhu, and B. Zeng, “A new HEVC in-loop filter based on multi-channel long-short-term dependency residual networks,” in DCC, 2018.
  • [40] Z. Jin, P. An, C. Yang, and L. Shen, “Quality enhancement for intra frame coding via CNNs: An adversarial approach,” in ICASSP, 2018.
  • [41] R. Yang, M. Xu, and Z. Wang, “Decoder-side HEVC quality enhancement with scalable convolutional neural network,” in ICME, 2017.
  • [42] X. Song, J. Yao, L. Zhou, L. Wang, X. Wu, D. Xie, and S. Pu, “A practical convolutional neural network as loop filter for intra frame,” in ICIP, 2018.
  • [43] T. Wang, W. Xiao, M. Chen, and H. Chao, “The multi-scale deep decoder for the standard HEVC bitstreams,” in DCC, 2018.
  • [44] R. Yang, M. Xu, Z. Wang, and T. Li, “Multi-frame quality enhancement for compressed video,” in CVPR, 2018.
  • [45] Tomonori Hashimoto, Eiichi Sasaki, and Tomohiro Ikai, “JVET-K0158: Separable convolutional neural network filter with squeeze-and-excitation block,” http://phenix.it-sudparis.eu/jvet/, July 2018.
  • [46] Howard C. Reeve and Jae S. Lim, “Reduction of blocking effects in image coding,” Optical Engineering, 1984.
  • [47] Bhaskar Ramamurthi and Allen Gersho, “Nonlinear space-variant postprocessing of block coded images,” TASSP, 1986.
  • [48] Jeremy Jancsary, Sebastian Nowozin, and Carsten Rother, “Loss-specific training of non-parametric image restoration models: A new state of the art,” in ECCV, 2012.
  • [49] Ke Yu, Chao Dong, Chen Change Loy, and Xiaoou Tang, “Deep Convolution Networks for Compression Artifacts Reduction,” arXiv e-prints, August 2016.
  • [50] Karol Gregor and Yann LeCun, “Learning fast approximations of sparse coding,” in ICML, 2010.
  • [51] J. Kang, S. Kim, and K. M. Lee, “Multi-modal/multi-scale convolutional neural network based in-loop filter design for next generation video codec,” in ICIP, 2017.
  • [52] C. Li, L. Song, R. Xie, and W. Zhang, “Cnn based post-processing to improve HEVC,” in ICIP, 2017.
  • [53] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in CVPR, 2016.
  • [54] Dezhao Wang, Sifeng Xia, Wenhan Yang, Yueyu Hu, and Jiaying Liu, “Partition tree guided progressive rethinking network for in-loop filtering of HEVC,” in ICIP, 2019.
  • [55] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a “completely blind” image quality analyzer,” SPL, March 2013.
  • [56] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” TIP, Dec 2012.
  • [57] Chao Ma, Chih-Yuan Yang, Xiaokang Yang, and Ming-Hsuan Yang, “Learning a no-reference quality metric for single-image super-resolution,” CVIU, 2017.
  • [58] C. Yim and A. C. Bovik, “Quality assessment of deblocked images,” TIP, Jan 2011.
  • [59] Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” TIP, April 2004.
  • [60] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in The Asilomar Conf. Signals, Systems Computers, Nov 2003.
  • [61] A. Mittal, A. K. Moorthy, and A. C. Bovik, “Blind/referenceless image spatial quality evaluator,” in ASILOMAR, Nov 2011.
  • [62] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” in ICLR, 2015.
  • [63] Léon Bottou, “Large-scale machine learning with stochastic gradient descent,” in COMPSTAT, 2010.
  • [64] Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba, “Scene parsing through ADE20K dataset,” in CVPR, 2017.
  • [65] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” TPAMI, April 2017.
  • [66] Junjie Hu, Mete Ozay, Yan Zhang, and Takayuki Okatani, “Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries,” in WACV, 2019.
  • [67] Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus, “Indoor segmentation and support inference from rgbd images,” in ECCV, 2012.
  • [68] A. Norkin, G. Bjontegaard, A. Fuldseth, M. Narroschke, M. Ikeda, K. Andersson, M. Zhou, and G. Van der Auwera, “HEVC deblocking filter,” TCSVT, Dec 2012.
  • [69] R.C. Gonzalez, R.E. Woods, and S.L. Eddins, Digital Image Processing Using MATLAB, Prentice Hall, 2003.
  • [70] Dezhao Wang, Sifeng Xia, Wenhan Yang, Yueyu Hu, and Jiaying Liu, “Partition tree guided progressive rethinking network for in-loop filtering of HEVC,” in ICIP, 2019.
  • [71] Longtao Feng, Xinfeng Zhang, and Siwei Ma, “Coding prior based high efficiency restoration for compressed video,” in ICIP, 2019.
  • [72] Y. Dai, D. Liu, Z. Zha, and F. Wu, “A CNN-based in-loop filter with CU classification for HEVC,” in VCIP, 2018.
  • [73] C. Jia, S. Wang, X. Zhang, S. Wang, J. Liu, S. Pu, and S. Ma, “Content-aware convolutional neural network for in-loop filtering in high efficiency video coding,” TIP, July 2019.