Better Compression with Deep Pre-Editing

02/01/2020 · Hossein Talebi, et al.

Could we compress images via standard codecs while avoiding artifacts? The answer is obvious – this is doable as long as the bit budget is generous enough. What if the allocated bit-rate for compression is insufficient? Then unfortunately, artifacts are a fact of life. Many attempts were made over the years to fight this phenomenon, with various degrees of success. In this work we aim to break the unholy connection between bit-rate and image quality, and propose a way to circumvent compression artifacts by pre-editing the incoming image and modifying its content to fit the given bits. We design this editing operation as a learned convolutional neural network, and formulate an optimization problem for its training. Our loss takes into account a proximity between the original image and the edited one, a bit-budget penalty over the proposed image, and a no-reference image quality measure for forcing the outcome to be visually pleasing. The proposed approach is demonstrated on the popular JPEG compression, showing savings in bits and/or improvements in visual quality, obtained with intricate editing effects.


1 Introduction

Commonly used still image compression algorithms, such as JPEG [51], JPEG-2000 [11], HEIF [41] and WebP [21] produce undesired artifacts when the allocated bit rate is relatively low. Blockiness, ringing, and other forms of distortion are often seen in compressed-decompressed images, even at intermediate bit-rates. As such, the output images from such a compression procedure are of poor quality, which may hinder their use in some applications, or more commonly, simply introduce annoying visual flaws.

Numerous methods have been developed over the years to confront this problem. In Section 4 we provide a brief review of the relevant literature, encompassing the various strategies taken to fight compression artifacts. Most of the existing solutions consider a post-processing stage that removes such artifacts after decompression [57, 32, 10, 48, 2, 3, 30, 53, 58, 29, 59, 12, 17, 16, 33]. Indeed, hundreds of papers that take this post-processing approach have been published over the years, including recent deep-learning based solutions (e.g., [15, 52, 19, 9]).

(a) Input
(b) Baseline JPEG (0.4809 bpp)
(c) Edited input
(d) JPEG after editing (0.4726 bpp)
Figure 1: Comparison of our pre-editing method with baseline JPEG. The uncompressed input (a) is compressed by JPEG (b), which shows a lot of compression artifacts. We propose to edit the input image (c) before JPEG compression (d) to obtain a better perceptual quality and lower bit rate.

Far less popular are algorithms that propose to pre-process the image prior to its compression, in order to reduce its entropy, thus avoiding the creation of artifacts in the first place [40, 46, 54, 13, 42, 14]. Indeed, denoising applied before compression is often found effective for better encoding performance (e.g., [45]). This line of thinking is less common in the literature due to the more complex treatment it requires and the weaker control it provides over the output artifacts. Still, such a pre-processing approach has a great advantage over the alternatives: the changes to the image are made on the server side, while the decoder side does not need to be modified or adjusted.

In this work we propose to pre-process the image by automatically editing its content, applied before its compression using a standard coding algorithm. Our goal is to modify the image content smartly so as to guarantee that (i) most of the visual information in the image is preserved; (ii) the subsequent compression operates in a much better regime and thus leads to reduced artifacts; and (iii) the edited image after compression is still visually appealing. By considering all these forces holistically, we aim to get creative editing effects that enable the compression-decompression stage to perform at its best for the given bit budget.

While one could pose the proposed editing task as an optimization problem to be solved for each incoming image separately, we take a more challenging route, in which we target the design of a universal deep neural network that performs the required editing on any input image. The clear advantage in this approach is the speed with which inference is obtained once the network has been trained.

Our learning relies on minimizing a loss function that includes three key penalties, aligned with the above description. The first forces the original and the edited images to be “sufficiently close” to each other, while still allowing content editing. A second term penalizes the bit content of the edited image, so as to enforce the bit-budget constraint while striving for an artifact-free compression. This part is achieved by yet another network [4] that predicts the entropy and quality of the image to be compressed. Last, but definitely not least, is a third penalty that encourages the edited image after compression to be visually pleasing. Indeed, our formulation of the problem relies heavily on the availability of a no-reference quality metric, a topic that has seen much progress in recent years [39, 56, 36, 26, 37, 7, 38]. All the above-mentioned ingredients are posed as differentiable machines, enabling an effective end-to-end learning of the editing operation. An example of the proposed technique is shown in Fig. 1, where the editing operation allows for better perceptual quality and a lower bit budget.

2 Formulating the Problem

Figure 2: Our learning pipeline for training the image editing network. The input image is first edited by our editing network. Then, the edited image is fed to the differentiable JPEG encoder/decoder. The entropy of the quantized DCT coefficients is predicted and used in our training loss. To ensure that the compressed image stays close to the uncompressed input, we use a distance measure. We also use a quality term to enforce the human perceptual preference.

We start with a few definitions that will help in formulating our problem and its solution.

Definition 1

(Codec Operation) We define by $C_B(x)$ the process of compression and decompression of a given image $x$ using $B$ bits. This function gets an image $x$ and produces an image, $C_B(x)$, possibly with the compression artifacts mentioned above.

Definition 2

(Quality Assessment) We define by $Q(x)$ the process of allocating a no-reference quality score to a given image $x$. The output is a non-negative scalar with values tending to zero for higher-quality images.

Definition 3

(Distance Measure) We define by $d(x_1, x_2)$ the distance between two images, $x_1$ and $x_2$, of the same size. Our distance function should be defined such that it is “forgiving” to minor content changes such as small geometrical shifts or warps, delicate variations in gray-values, or removal of fine texture.

Armed with the above, we are now ready to formulate our problem. Given an image $x$ to be compressed with a bit budget of $B$ bits, the common practice is to perform compression and decompression directly, $C_B(x)$, and live with the limitations.

In this work we suggest a novel alternative: We seek a new image $z$ that is (i) as close as possible to the given image $x$; (ii) compressible using $B$ bits; and, most importantly, (iii) of high quality. Naturally, $z$ will be an edited variation of $x$ in which some of the content has been changed, so as to enable good quality despite the compression. Here is our first attempt to formulate this problem:

$$\min_z \; d(x, z) + \lambda Q(z) \quad \text{s.t.} \quad z = C_B(z). \tag{1}$$

In words, given $x$ and $B$ we seek an image $z$ that is close to $x$, is of high quality (low value of $Q(z)$), and can be represented via $B$ bits. Referring to the constraint, recall that the compression-decompression operation is idempotent, i.e., applying it more than once on a given image results in the same outcome as using it once [28]. Thus, the constraint $z = C_B(z)$ aims to say that $z$ is a feasible outcome of the compression algorithm with the given budget of $B$ bits.
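The idempotency claim can be sanity-checked in a few lines. Below is a minimal sketch assuming Pillow and NumPy are available; input.png is a hypothetical test file. A second round-trip at the same quality factor should change (almost) nothing [28].

```python
import io
import numpy as np
from PIL import Image

def jpeg_roundtrip(img: Image.Image, quality: int) -> Image.Image:
    """Compress and decompress an image with JPEG at a given quality factor."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

img = Image.open("input.png").convert("RGB")      # hypothetical input image
once = jpeg_roundtrip(img, quality=20)
twice = jpeg_roundtrip(once, quality=20)          # second application
diff = np.abs(np.asarray(once, np.int16) - np.asarray(twice, np.int16))
print("max abs pixel difference:", diff.max())    # expected: zero or near-zero
```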

An alternative formulation that may serve the same goal is one in which we fix the quality as a constraint as well,

$$\min_z \; d(x, z) \quad \text{s.t.} \quad z = C_B(z), \;\; Q(z) = Q_0, \tag{2}$$

so as to say that whatever happens, we insist on a specific output quality, willing to sacrifice content accordingly.

Both problems defined in Equations (1) and (2), while clearly communicating our goal, are hard to handle. This is mainly due to the non-differentiable nature of the function $C_B(\cdot)$, and the fact that it is hard to fix a rate $B$ while modifying the image $z$. While these difficulties could be dealt with from a projection point of view (see [5]), we take a different route and modify our formulation to alleviate them. This brings us to the following additional definitions:

Definition 4

(Quality-Driven Codec Operation) We define by $C_q(x)$ the process of compression and decompression of a given image $x$ with a quantization (or quality) factor $q$. This function gets an image $x$ and produces an image, $C_q(x)$, possibly with the compression artifacts mentioned above.

Definition 5

(Entropy Predictor) We define by $E(x, q)$ the process of predicting the compression-decompression performance of a specific algorithm (e.g., JPEG) for a given image $x$ and a quantization level $q$. This function produces the expected file size (or entropy).

Note that by setting $q$, we induce a roughly fixed PSNR on the image after compression-decompression. Thus, by minimizing $E(z, q)$ with respect to $z$, we aim to reduce the file size while preserving quality. Returning to our formulation, we add a penalty term, $\mu E(z, q)$, so as to guarantee that the rate is preserved (or, more accurately, controlled). This leads to

$$\min_z \; d(x, z) + \lambda Q(z) + \mu E(z, q) \quad \text{s.t.} \quad z = C_q(z). \tag{3}$$

The constraint assures that $z$ is a valid output of the compression, and it can alternatively be written as¹

$$\min_z \; d\big(x, C_q(z)\big) + \lambda Q\big(C_q(z)\big) + \mu E(z, q). \tag{4}$$

¹ Admittedly, the notations introduced are a bit cumbersome, as both $E(\cdot, q)$ and $C_q(\cdot)$ use the same quantization level $q$. The alternative could have been to divide the compression into an encoder and a decoder, and feed the encoder result to $E$ without specifying $q$. We chose to stay with the above formulation for consistency with the opening description.

If we have a differentiable proxy for the compression operation $C_q(\cdot)$, the above loss is manageable.

We could have stopped here, handling this optimization task and getting an edited and compressed image $z$ for any incoming image $x$. This could have been a worthy and even fascinating feat by itself, but we leave it for future work.

As we have already mentioned in the introduction, we aim higher. Our goal is to design a feed-forward CNN that would perform this editing for any given image automatically. Denoting this editing network by $T_\Theta(x)$, where $\Theta$ are the network parameters to be learned/set, our training loss is given by the following expression:

$$\mathcal{L}(\Theta) = \sum_k d\big(x_k, C_q(T_\Theta(x_k))\big) + \lambda Q\big(C_q(T_\Theta(x_k))\big) + \mu E\big(T_\Theta(x_k), q\big).$$

This expression simply sums the per-image loss of Equation (4) over many training images $\{x_k\}$, and replaces the edited image $z$ by the network’s output $T_\Theta(x_k)$. Minimizing this loss with respect to $\Theta$, we obtain the editing network, as desired. Our learning pipeline is shown in Fig. 2.

3 The Proposed Approach

In this section we dive into our implementation of the above-discussed editing idea. We start by specifying the ingredients used within the loss function and then turn to describe the training procedure employed.

3.1 Specifying the Training Loss

Returning to our definitions, we would like to identify the specific ingredients used in the training loss above. In this work we concentrate on the JPEG compression algorithm, due to its massive popularity and the fact that it is central in digital imaging and mobile photography. Our formulation relies on the following ingredients:

Distance measure: We seek a definition of $d(x, z)$ that does not penalize moderate editing changes between the two images. In our implementation we construct this distance measure via a feature extraction function, $F(\cdot)$, and use the perceptual loss between $F(x)$ and $F(z)$ as our distance [20, 27, 50]. These features could be the activations of an inner layer within the VGG-16 [44] or the NIMA [38] networks, and they are used to characterize an image in a domain that is less sensitive to the allowed perturbations. The deeper these features are taken from, the more daring the editing of the image is expected to be. We experimented with various activations of a VGG-16 trained for the image quality assessment task [38], and selected the output of the second convolutional layer before max pooling as our feature extraction function $F(\cdot)$.
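To make this concrete, here is a minimal sketch of such a perceptual distance, assuming a stock torchvision VGG-16 as a stand-in for the quality-tuned variant of [38] (input normalization and custom weight loading omitted):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# features[:4] = conv1_1, ReLU, conv1_2, ReLU -- i.e., the output of the
# second convolutional layer, before the first max-pooling stage.
feature_extractor = vgg16(pretrained=True).features[:4].eval()
for p in feature_extractor.parameters():
    p.requires_grad_(False)   # F(.) stays fixed; only the editor is trained

def perceptual_distance(x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """d(x, z): mean squared error in feature space, forgiving to small warps."""
    return F.mse_loss(feature_extractor(x), feature_extractor(z))
```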

Quality measure: We assess image quality using NIMA [38]. NIMA is a no-reference image quality assessment machine that has been used for training image enhancement [47].

Differentiable JPEG: As mentioned above, we need to incorporate the function $C_q(\cdot)$ within our loss, and thus it should be differentiable. Indeed, as we are working with JPEG, this function does not control the rate but rather the quality factor when running this compression-decompression. We obtain a differentiable version of this operator by replacing the quantization step function with a smoothed polynomial approximation. Our implementation is, in essence, quite similar to [43].
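A minimal sketch of the differentiable quantization inside such a JPEG proxy is given below. The cubic smoothing of the rounding function follows the spirit of [43]; the exact polynomial used in our implementation may differ.

```python
import torch

def soft_round(x: torch.Tensor) -> torch.Tensor:
    """Differentiable stand-in for rounding: round(x) + (x - round(x))**3.

    The correction term has derivative 3*(x - round(x))**2, which is nonzero
    almost everywhere, unlike the step-function derivative of exact rounding.
    """
    return torch.round(x) + (x - torch.round(x)) ** 3

def quantize_dequantize(dct: torch.Tensor, q_table: torch.Tensor) -> torch.Tensor:
    """Softly quantize and de-quantize DCT coefficients with a JPEG table."""
    return soft_round(dct / q_table) * q_table
```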

Entropy prediction: In our framework with JPEG, the discrete entropy of the quantized DCT coefficients should be measured. However, just as described above, the derivatives of the quantization operation are zero almost everywhere, and consequently gradient descent would be ineffective. To allow optimization via stochastic gradient descent, we use the entropy estimator proposed in [4], where i.i.d. uniform noise is added to the quantized coefficients. This means that the probability mass function of the DCT coefficients is estimated by a continuous relaxation of it, implying that the differential entropy of the DCT coefficients can be used as an approximation of the discrete entropy. This provides a slightly biased estimate of the discrete entropy in coarser quantization regimes, but the bias vanishes for finer quantization levels [4].
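A sketch of this estimator is shown below, under the assumption that a learned continuous density model (here an abstract log_density callable, in the spirit of [4]) supplies log-probabilities of the noise-relaxed coefficients:

```python
import torch

def rate_estimate(quantized_dct: torch.Tensor, log_density) -> torch.Tensor:
    """Approximate code length (in bits) of the quantized DCT coefficients.

    i.i.d. uniform noise in (-0.5, 0.5) makes the discrete values continuous,
    so -log2 p(.) under the relaxed density approximates the discrete entropy.
    """
    noisy = quantized_dct + torch.empty_like(quantized_dct).uniform_(-0.5, 0.5)
    return -log_density(noisy).sum() / torch.log(torch.tensor(2.0))
```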

To summarize, the following is the loss function we use in our experiments:

$$\mathcal{L}(\Theta) = \sum_k d\big(x_k, C_q(T_\Theta(x_k))\big) + \lambda Q\big(C_q(T_\Theta(x_k))\big) + \mu E\big(T_\Theta(x_k), q\big),$$

where the distance function $d(\cdot,\cdot)$ represents the perceptual error measure, the image quality $Q(\cdot)$ is computed by NIMA and a total variation measure, and the entropy estimate $E(\cdot, q)$ is computed over the quantized DCT coefficients of the edited image. Note that the same q-factor is applied both in the entropy predictor $E(\cdot, q)$ and the differentiable JPEG function $C_q(\cdot)$.
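Schematically, one training step ties these pieces together as follows; edit_net, diff_jpeg, nima_score, total_variation and rate_estimate are the (hypothetical) modules discussed above, and lam, mu are the loss weights:

```python
def training_loss(x, q, edit_net, diff_jpeg, perceptual_distance,
                  nima_score, total_variation, rate_estimate, lam, mu):
    z = edit_net(x, q)                     # edited image T_Theta(x)
    z_hat = diff_jpeg(z, q)                # differentiable round-trip C_q(z)
    dist = perceptual_distance(x, z_hat)   # proximity to the original
    quality = nima_score(z_hat) + total_variation(z_hat)  # no-reference quality
    rate = rate_estimate(z, q)             # entropy of the quantized DCT
    return dist + lam * quality + mu * rate
```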

Figure 3: Our image smoothing CNN.
Figure 4: Our patch-based spatial transformer network. The affine transformation parameters of the image blocks are obtained from a trainable CNN. The transformed image grid is interpolated to obtain a warped image block, and finally a central block aligned with the JPEG grid is extracted.
(a) Input
(b) Smoothed input
(c) Difference between (a) and (b)
Figure 5: The difference between the input and the smoothed images (without JPEG compression). Our trained smoothing network removes fine-grained details from the input image to make it more compressible by JPEG. Compressing images (a) and (b) with the JPEG encoder at quality factor 20 takes 1.15 and 1.03 bpp, respectively.
(a) Input
(b) Warped input
(c) Difference of (a) and (b)
Figure 6: The difference between the input and the warped images (without JPEG compression). Our warping applies spatial transformations to local image patches to make them more compressible by JPEG. Compressing images (a) and (b) with the JPEG encoder at quality factor 20 takes 0.725 and 0.708 bpp, respectively.

3.2 The Editing Network

Our editing network consists of two parts: an image smoothing network (Fig. 3) and a patch-based warping operation (Fig. 4). While the smoothing is similar to a denoiser that controls fine-grained details, the spatial transformer allows for subtle local warps that make the underlying image more compressible [42]. More details on both parts are given below.

3.2.1 The Smoothing Network

Our image smoothing CNN is shown in Fig. 3. This convolutional neural network is similar to the residual CNN of Ledig et al. [34]. This architecture has identical residual blocks, with 3×3 kernels and 64 feature maps, followed by batch normalization layers [24] and Leaky ReLU activations [22] (instead of parametric ones). To avoid boundary artifacts, the input image and feature maps are symmetrically padded before convolutions.
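A sketch of this architecture in PyTorch is given below; the block count is a free parameter here, and reflection padding stands in for the symmetric padding:

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two padded 3x3 convs with BN and Leaky ReLU."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3),
            nn.BatchNorm2d(ch), nn.LeakyReLU(0.2),
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3),
            nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

class Smoother(nn.Module):
    def __init__(self, in_ch: int = 5, n_blocks: int = 8):
        # in_ch = RGB + noise-sigma map + quality-factor map (see Section 5)
        super().__init__()
        layers = [nn.ReflectionPad2d(1), nn.Conv2d(in_ch, 64, 3), nn.LeakyReLU(0.2)]
        layers += [ResBlock(64) for _ in range(n_blocks)]
        layers += [nn.ReflectionPad2d(1), nn.Conv2d(64, 3, 3)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```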

Examples of using the trained smoothing network are shown in Fig. 5. These images are not compressed by JPEG, and only represent edits applied to the input. The difference image shows that our editing removes fine details. Note that compressing the smoothed image with JPEG encoder at quality factor 20 takes 1.03 bpp, whereas the same encoder takes 1.15 bpp for compressing the input image.

3.2.2 The Spatial Transformer Network (STN)

As shown by Rott et al. [42], local image deformations can lead to DCT-domain sparsity and consequently better compressibility. Unlike [42], which solves an alternating optimization with an optical flow, we use the differentiable spatial transformer network [25]. The STN learns 6 parameters for an affine local transformation that allows cropping, translation, rotation, scale, and skew to be applied on the input (Fig. 4). We apply the STN on overlapping blocks and then extract central crops that are aligned with JPEG blocks. Since each block is warped separately, this can cause inconsistency near the boundaries of the cropped blocks. To alleviate this, all overlapping grid values are averaged across neighboring blocks.
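The sketch below illustrates the per-block mechanics under assumed block and crop sizes (24 and 16 are placeholders, not the paper's values); the overlap-averaging of neighboring grids is omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockSTN(nn.Module):
    def __init__(self, block: int = 24, crop: int = 16):
        super().__init__()
        self.block, self.crop = block, crop
        self.loc = nn.Sequential(        # localization net: block -> 6 params
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 6),
        )
        # Start from the identity transform, i.e., no warp at initialization.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, blocks: torch.Tensor) -> torch.Tensor:
        # blocks: (N, 3, block, block) overlapping patches of the input image
        theta = self.loc(blocks).view(-1, 2, 3)
        grid = F.affine_grid(theta, blocks.size(), align_corners=False)
        warped = F.grid_sample(blocks, grid, align_corners=False)
        m = (self.block - self.crop) // 2
        return warped[:, :, m:m + self.crop, m:m + self.crop]  # central crop
```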

Examples of using the trained STN are shown in Fig. 6. The STN warps textures and edges locally to make the blocks more compressible by the JPEG encoder. Compressing the input and deformed images in Fig. 6(a) and Fig. 6(b) with the JPEG encoder at quality factor 20 requires 0.725 bpp and 0.708 bpp, respectively.

To take advantage of both editing stages, we cascade the smoothing and warping operations. While the smoothing allows for fewer blockiness artifacts, the STN leads to better texture preservation. Next, we discuss our training data.

3.3 Data

Our editing networks are trained on uncompressed images. To this end, we use the burst-processed images of Hasinoff et al. [23], which provide 3640 images of 12 megapixels each. All images are converted to 8-bit PNG format. We extract about 120K non-overlapping patches and use them to train our model. We also set aside part of the data as a test set.
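A small sketch of the patch extraction step, with the patch size left as a free parameter (the paper's exact size is not repeated here):

```python
import numpy as np
from PIL import Image

def extract_patches(path: str, size: int = 256):
    """Yield non-overlapping size x size RGB patches from one PNG image."""
    img = np.asarray(Image.open(path).convert("RGB"))
    h, w = img.shape[:2]
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            yield img[i:i + size, j:j + size]
```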

4 Relation to Prior Work

We pause our main story for a while and discuss the rich literature on combating compression artifacts. Our goal is to give better context to the suggested methodology by presenting the main existing alternatives. Note, however, that this does not constitute an exhaustive scan of the existing literature, as this is beyond the scope of this work. We survey these algorithms by dividing them into categories based on their core strategies:

Post-Processing Algorithms [57, 32, 10, 48, 2, 3, 30, 53, 58, 29, 59, 12, 17, 16, 33]: These are the most common methods available, operating on the image after the compression-decompression damage has already been induced. Algorithms of this sort designed in the context of the JPEG format are known as deblocking algorithms. The idea behind these methods, be it for JPEG or any other transform-based coder, is quite simple, even though there are many ways to practice it: Given the compressed-decompressed image and knowing the quantization levels and the transform applied, the original image to be recovered must lie in a convex set that has a rotated hyper-rectangle shape. A recovery algorithm should seek the most probable image within this set, something that could be done by relying on various regularization strategies. While some algorithms make use of this very rationale directly, others relax it in various ways, by simplifying the constraint set to a sphere, by forcing the recovery algorithm to take a specific shape, and more. In its simplest form, such deblocking could be a simple linear filter applied to the boundaries between adjacent blocks.

Deep-Learning Based Solutions [15, 52, 19, 9]: Still under the regime of post-processing, recent solutions rely on deep neural networks, trained in a supervised fashion to achieve their cleaning goal. These methods tend to be better performing, as their design targets the recovery error directly, instead of relying on model-based restoration methods.

Scale-Down and Scale-Up [8, 49, 35, 55]: An entirely different way to avoid compression artifacts is to scale down the image before compression, apply the compression-decompression to the resulting smaller image, and scale up the outcome at the client after decompression. This approach is especially helpful at low bit-rates, since the number of blocks is reduced, the bit-stream overhead is reduced along with it, and the scale-up at the client brings extra smoothing. Variations on this core scheme have been proposed over the years, in which the scale-down or scale-up operators are optimized for better end-to-end performance.

Pre-Processing Algorithms [40, 46, 54, 45]: It is well known that compression-decompression often behaves as a denoiser, removing small and faint details from the image. Nevertheless, applying a well-designed denoiser prior to the compression may improve the overall encoding performance by better prioritizing the content to be treated. The existing publications offering this strategy have typically relied on this intuition, without an ability to systematically design the pre-filter for best end-to-end performance, as the formulation of this problem is quite challenging.

Deformation Aware Compression [42]: While this work offers a pre-processing of the image along the same lines as described above, we consider it a class of its own for two reasons: (i) rather than using a denoiser, the pre-process applied in this work is a geometrical warp, which re-positions elements in the image to better match the coder's transform and block division; and (ii) the design of the warp is obtained by an end-to-end approximate optimization method. Indeed, this paper has been the source of inspiration behind our ideas in this work.

Our proposed method joins the list of pre-processing based artifact removal algorithms, generalizing the work in [42] in various important ways: (i) Our method could accommodate more general editing effects; (ii) Its application is simple and fast, once the editing network has been trained; and (iii) We employ a no-reference image quality assessment that supports better quality outcomes. As already mentioned, the pre-processing strategy has a unique advantage over the alternative methods in the fact that the decoder does not have to be aware of the manipulations that the image has gone through, applying a plain decoding, while leaving the burden of the computations to the encoder. That being said, we should add that this approach can be easily augmented with a post-processing stage, for fine-tuning and improving the results further.

We conclude this survey of the relevant literature by referring to two recent and very impressive papers. The work reported in [6] offers a theoretical extension of the classic rate-distortion theory by incorporating the perceptual quality of the decompressed image, exposing an unavoidable trade-off between distortion and visual quality. Our work practices this very rationale by sacrificing image content (via pre-editing) to obtain better-looking compressed-decompressed images. The work by Agustsson et al. [1] offers a GAN-based learned compression algorithm that practically trades visual quality for distortion. While aiming for the same goal as our work, [1] replaces the whole compression-decompression process, whereas we insist on boosting available standard algorithms, such as JPEG, due to their massive availability and widespread use.

5 Experimental Results

Figure 7: Our training loss components during gradient descent with JPEG quality factor in the range [8,25]. For better display, all losses are smoothed.
Figure 8: MSE vs. mean bit-rate for the Kodak dataset [18].
(a) Input
(b) Baseline JPEG (0.3529 bpp)
(c) Smoothing + JPEG (0.3508 bpp)
Figure 9: Compression performance with our smoothing network. Smoothing the image before compression leads to less blockiness and color artifacts.
(a) Input
(b) Baseline JPEG (1.0567 bpp)
(c) STN + JPEG (1.0485 bpp)
Figure 10: Compression performance of the STN network. The STN applies local warps that lead to better detail preservation after compression.
(a) Input
(b) Baseline JPEG (0.4705 bpp)
(c) Smoothing + STN + JPEG (0.4355 bpp)
Figure 11: Compression performance for applying smoothing and STN.
(a) Input
(b) Baseline JPEG (0.4293 bpp)
(c) Smoothing + STN + JPEG (0.4169 bpp)
Figure 12: Compression performance for applying smoothing and STN.
Figure 13: Percentage of human raters preference for pairwise comparison between our result and baseline JPEG. Each data point is an average of 480 ratings (24 Kodak images [18] and 20 human raters).

In this section our results are discussed and compared to other methods. Training and testing are performed on a single Nvidia V100 GPU with 16 GB RAM. At training, images are cropped to a fixed patch size, and testing is performed on the Kodak dataset [18]. We use the Adam optimizer [31] with a batch size of 1. The editing network is trained with stochastic gradient descent, and the weights of the NIMA model are kept fixed during training.

In order to train the STN and smoothing networks, we randomly sample the JPEG quality factor from a uniform distribution in the range [8, 25] at each step of the gradient descent (see Fig. 7). This allows our editing to be effective for a range of bit-rates. At test time, we compare our results with the baseline JPEG at comparable bit-rates. To compress a test image at various bit-rates, we adjust the JPEG quality factor to ensure that our result compresses with fewer bits, as sketched below.
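The matching procedure amounts to a simple search over the quality factor; a minimal sketch with Pillow (file size standing in for bpp, since the resolution is fixed) could read:

```python
import io
from PIL import Image

def jpeg_bytes(img: Image.Image, quality: int) -> int:
    """Size in bytes of the image encoded with JPEG at a given quality."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.tell()

def match_budget(edited: Image.Image, budget_bytes: int) -> int:
    """Largest quality factor whose file size stays within the budget."""
    for q in range(95, 1, -1):       # walk down from high quality
        if jpeg_bytes(edited, q) <= budget_bytes:
            return q
    return 1
```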

Our weighted loss for the cascaded smoother and STN is shown in Fig. 7. We select the NIMA weight $\lambda$ and adjust the entropy prediction weight $\mu$ accordingly. Our experiments suggest that the weighted predicted entropy and the distance measure should be close to each other. Also, as discussed in [47], the quality measure is most effective when its contribution is limited to a fraction of the total loss.

We trained the smoother and STN networks separately, and then fine-tuned them jointly. The pre-trained smoother is obtained by training with a pixel-wise distance measure. Training images are augmented with random additive white Gaussian noise to enforce the smoothing property in the resulting network. We randomly vary the standard deviation of the noise at each training step, and append the noise standard deviation and the JPEG quality factor as extra channels to the input RGB image. Rate-distortion curves of the regular JPEG and the smoothed content are shown in Fig. 8. As expected, the smoothing improves upon the baseline JPEG. Note that these results are obtained before fine-tuning the smoother with the STN. Examples of the smoother's editing are shown in Fig. 9, where color degradation and blockiness artifacts are more visible in the baseline JPEG, compared to our results.

Results for training the STN network are shown in Fig. 10. Our editing of the input images allows structures and textures to be preserved more effectively. The local deformations of the STN seem to make certain image textures more compressible. Note that this is a different behavior from the smoother's effect.

We fine-tune both the smoother and STN networks jointly and present the results in Figs. 11 and 12. The cascaded editor seems to present details comparable to the baseline, but with fewer JPEG artifacts.

We carried out a human evaluation study to compare our proposed framework with baseline JPEG, using Amazon Mechanical Turk with pairwise comparisons. We asked raters to select the image with better quality. We processed 24 images from the Kodak dataset [18] with our smoothing and warping (STN) frameworks and compared them with their baseline JPEG counterparts at similar bit-rates. Comparisons were made by 20 human raters, and the average percentage of rater preference over baseline JPEG is reported in Fig. 13. As can be seen, both the STN and our smoothing show a majority perceptual preference for bit-rates smaller than 0.5 bpp. For higher bit-rates our methods did not provide a statistically significant advantage over the baseline. Also, we observed that smoothing consistently outperforms the STN.

We conclude by referring to run time: We ran both our editors on an Intel Xeon CPU @ 3.5 GHz with 32 GB memory and 12 cores. We only measure the timing of the pre-editing operation, as both methods use the same JPEG encoder. The per-image run times of the smoothing CNN and the STN were measured on a 1-megapixel image; since our editors are based on convolutional neural networks, these running times can be further improved by GPU inference.

6 Conclusion

One of the main bottlenecks of low bit-rate JPEG compression is the loss of textures and details and the presence of visual artifacts. In this work we have proposed an end-to-end trainable manipulation framework that edits images before compression in order to mitigate these problems. Our CNN-based trained editors optimize for better perceptual quality and reduced JPEG distortions and color degradation. The proposed image editors are trained offline, avoiding the need for per-image optimization or post-processing on the decoder (client) side. Our future work will focus on extending this idea to other image compression standards, while seeking new ways to allow for more daring editing effects.

References

  • [1] E. Agustsson, M. Tschannen, F. Mentzer, R. Timofte, and L. Van Gool (2019) Generative adversarial networks for extreme learned image compression. arXiv:1804.02958. Cited by: §4.
  • [2] F. Alter, S. Durand, and J. Froment (2005) Adapted total variation for artifact free decompression of JPEG images. Journal of Mathematical Imaging and Vision 23 (2), pp. 199–211. Cited by: §1, §4.
  • [3] A. Z. Averbuch, A. Schclar, and D. L. Donoho (2005) Deblocking of block-transform compressed images using weighted sums of symmetrically aligned pixels. IEEE Transactions on Image Processing 14 (2), pp. 200–212. Cited by: §1, §4.
  • [4] J. Ballé, V. Laparra, and E. P. Simoncelli (2017-04) End-to-end optimized image compression. In Int’l. Conf. on Learning Representations (ICLR2017), Toulon, France. Cited by: §1, §3.1.
  • [5] S. Beygi, S. Jalali, A. Maleki, and U. Mitra (2018) An efficient algorithm for compression-based compessed sensing. Information and Inference: A Journal of the IMA 8 (2), pp. 343–375. Cited by: §2.
  • [6] Y. Blau and T. Michaeli (2019) Rethinking lossy compression: the rate-distortion-perception tradeoff. In International Conf. on Machine Learning (ICML). Cited by: §4.
  • [7] S. Bosse, D. Maniry, T. Wiegand, and W. Samek (2016) A deep neural network for image quality assessment. In IEEE International Conference on Image Processing (ICIP), pp. 3773–3777. Cited by: §1.
  • [8] A. M. Bruckstein, M. Elad, and R. Kimmel (2003) Down-scaling for better transform compression. IEEE Transactions on Image Processing 12 (9), pp. 1132–1144. Cited by: §4.
  • [9] L. Cavigelli, P. Hager, and L. Benini (2017) CAS-cnn: a deep convolutional neural network for image compression artifact suppression. In International Joint Conference on Neural Networks (IJCNN), Cited by: §1, §4.
  • [10] T. Chen, H. R. Wu, and B. Qiu (2001) Adaptive postfiltering of transform coefficients for the reduction of blocking artifacts. IEEE Transactions on Circuits and Systems for Video Technology 11 (5), pp. 594–602. Cited by: §1, §4.
  • [11] C. Christopoulos, A. Skodras, and T. Ebrahimi (2000) The JPEG2000 still image coding system: an overview. IEEE Transactions on Consumer Electronics 46 (4), pp. 1103–1127. Cited by: §1.
  • [12] Y. Dar, A. M. Bruckstein, M. Elad, and R. Giryes (2016) Postprocessing of compressed images via sequential denoising. IEEE Transactions on Image Processing 25 (7), pp. 3044–3058. Cited by: §1, §4.
  • [13] Y. Dar, M. Elad, and A. M. Bruckstein (2018) Optimized pre-compensating compression. IEEE Transactions on Image Processing 27 (10), pp. 4798–4809. Cited by: §1.
  • [14] Y. Dar, M. Elad, and A. M. Bruckstein (2018) System-aware compression. In IEEE International Symposium on Information Theory (ISIT), pp. 2226–2230. Cited by: §1.
  • [15] C. Dong, Y. Deng, C. C. Loy, and X. Tang (2015) Compression artifacts reduction by a deep convolutional network. In Proceedings of the IEEE International Conference on Computer Vision. Cited by: §1, §4.
  • [16] K. Du, H. Han, and G. Wang (2011) A new algorithm for removing compression artifacts of wavelet-based image. In IEEE International Conference on Computer Science and Automation Engineering, Vol. 1, pp. 336–340. Cited by: §1, §4.
  • [17] K. Du, J. Lu, H. Sekiya, and T. Yahagi (2007) Post-processing for restoring edges and removing artifacts of low bit rates wavelet-based image. IEEJ Transactions on Electronics, Information and Systems 127 (6), pp. 928–936. Cited by: §1, §4.
  • [18] R. Franzen (1999) Kodak lossless true color image suite. Note: http://r0k.us/graphics/kodak Cited by: Figure 13, Figure 8, §5, §5.
  • [19] L. Galteri, L. Seidenari, M. Bertini, and A. Del Bimbo (2017) Deep generative adversarial compression artifact removal. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4826–4835. Cited by: §1, §4.
  • [20] L. A. Gatys, A. S. Ecker, and M. Bethge (2016) Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2414–2423. Cited by: §3.1.
  • [21] D. D. Giusto (2012) Objective assessment of the WebP image coding algorithm. Signal Processing: Image Communication 27 (8), pp. 867–874. Cited by: §1.
  • [22] X. Glorot, A. Bordes, and Y. Bengio (2011) Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323. Cited by: §3.2.1.
  • [23] S. W. Hasinoff, D. Sharlet, R. Geiss, A. Adams, J. T. Barron, F. Kainz, J. Chen, and M. Levoy (2016) Burst photography for high dynamic range and low-light imaging on mobile cameras. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 35 (6). Cited by: §3.3.
  • [24] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. Cited by: §3.2.1.
  • [25] M. Jaderberg, K. Simonyan, A. Zisserman, et al. (2015) Spatial transformer networks. In Advances in neural information processing systems, pp. 2017–2025. Cited by: §3.2.2.
  • [26] B. Jin, M. V. O. Segovia, and S. Süsstrunk (2016) Image aesthetic predictors based on weighted cnns. In IEEE International Conference on Image Processing (ICIP), pp. 2291–2295. Cited by: §1.
  • [27] J. Johnson, A. Alahi, and L. Fei-Fei (2016) Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pp. 694–711. Cited by: §3.1.
  • [28] R. L. Joshi, M. Rabbani, and M. A. Lepley (2000) Comparison of multiple compression cycle performance for JPEG and JPEG 2000. In Applications of Digital Image Processing XXIII, Vol. 4115, pp. 492–501. Cited by: §2.
  • [29] C. Jung, L. Jiao, H. Qi, and T. Sun (2012) Image deblocking via sparse representation. Signal Processing: Image Communication 27 (6), pp. 663–677. Cited by: §1, §4.
  • [30] T. Kartalov, Z. A. Ivanovski, L. Panovski, and L. J. Karam (2007) An adaptive POCS algorithm for compression artifacts removal. In 2007 9th International Symposium on Signal Processing and its Applications, pp. 1–4. Cited by: §1, §4.
  • [31] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §5.
  • [32] C-C. J. Kuo (1998) Review of postprocessing techniques for compression artifact removal. Journal of Visual Communication and Image Representation 9 (1), pp. 2–14. Cited by: §1, §4.
  • [33] Y. Kwon, K. I. Kim, J. Tompkin, J. H. Kim, and C. Theobalt (2015) Efficient learning of image super-resolution and compression artifact removal with semi-local gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (9), pp. 1792–1805. Cited by: §1, §4.
  • [34] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681–4690. Cited by: §3.2.1.
  • [35] W. Lin and L. Dong (2006) Adaptive downsampling to improve image compression at low bit rates. IEEE Transactions on Image Processing 15 (9), pp. 2513–2521. Cited by: §4.
  • [36] X. Lu, Z. Lin, X. Shen, R. Mech, and J. Z. Wang (2015) Deep multi-patch aggregation network for image style, aesthetics, and quality estimation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 990–998. Cited by: §1.
  • [37] L. Mai, H. Jin, and F. Liu (2016) Composition-preserving deep photo aesthetics assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 497–506. Cited by: §1.
  • [38] H. Talebi and P. Milanfar (2018) NIMA: neural image assessment. IEEE Transactions on Image Processing 27 (8), pp. 3998–4011. Cited by: §1, §3.1, §3.1.
  • [39] A. Mittal, A. K. Moorthy, and A. C. Bovik (2012) No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing 21 (12), pp. 4695–4708. Cited by: §1.
  • [40] M. Oizumi (2006) Preprocessing method for DCT-based image compression. IEEE Transactions on Consumer Electronics 52 (3), pp. 1021–1026. Cited by: §1, §4.
  • [41] (2013) Requirements for still image coding using HEVC. Note: http://mpeg.chiariglione.org/standards/mpeg-h/high-efficiency-video-coding/requirements-still-image-coding-using-hevc Cited by: §1.
  • [42] T. Rott Shaham and T. Michaeli (2018) Deformation aware image compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2453–2462. Cited by: §1, §3.2.2, §3.2, §4, §4.
  • [43] R. Shin and D. Song (2017) JPEG-resistant adversarial images. In NIPS 2017 Workshop on Machine Learning and Computer Security, Cited by: §3.1.
  • [44] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §3.1.
  • [45] J. Starck, F. Murtagh, B. Pirenne, and M. Albrecht (1996) Astronomical image compression based on noise suppression. Publications of the Astronomical Society of the Pacific 108 (723). Cited by: §1, §4.
  • [46] T. Szirányi (2005) Artifact reduction with diffusion preprocessing for image compression. Optical Engineering 44 (2), pp. 027003. Cited by: §1, §4.
  • [47] H. Talebi and P. Milanfar (2018) Learned perceptual image enhancement. In 2018 IEEE International Conference on Computational Photography (ICCP), pp. 1–13. Cited by: §3.1, §5.
  • [48] G. Triantaffilidis, D. Sampson, D. Tzovaras, and M. Strintzis (2002) Blockiness reduction in JPEG coded images. In 2002 14th International Conference on Digital Signal Processing Proceedings. DSP 2002 (Cat. No. 02TH8628), Vol. 2, pp. 1325–1328. Cited by: §1, §4.
  • [49] Y. Tsaig, M. Elad, P. Milanfar, and G. H. Golub (2005) Variable projection for near-optimal filtering in low bit-rate block coders. IEEE Transactions on Circuits and Systems for Video Technology 15 (1), pp. 154–160. Cited by: §4.
  • [50] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022. Cited by: §3.1.
  • [51] G. K. Wallace (1992) The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38 (1), pp. xviii–xxxiv. Cited by: §1.
  • [52] Z. Wang, D. Liu, S. Chang, Q. Ling, Y. Yang, and T. S. Huang (2016) D3: deep dual-domain based fast restoration of JPEG-compressed images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §1, §4.
  • [53] P. Weiss, L. Blanc-Féraud, T. André, and M. Antonini (2008) Compression artifacts reduction using variational methods: algorithms and experimental study. In IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1173–1176. Cited by: §1, §4.
  • [54] M. H. F. Wilkinson (2007) Image preprocessing for compression: attribute filtering. In Proceedings of International Conference on Signal Processing and Imaging Engineering (ICSPIE’07), Cited by: §1, §4.
  • [55] X. Wu, X. Zhang, and X. Wang (2009) Low bit-rate image compression via adaptive down-sampling and constrained least squares upconversion. IEEE Transactions on Image Processing 18 (3), pp. 552–561. Cited by: §4.
  • [56] W. Xue, L. Zhang, and X. Mou (2013) Learning without human scores for blind image quality assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 995–1002. Cited by: §1.
  • [57] A. Zakhor (1992) Iterative procedures for reduction of blocking effects in transform image coding. IEEE Transactions on Circuits and Systems for Video Technology 2 (1), pp. 91–95. Cited by: §1, §4.
  • [58] G. Zhai, W. Lin, J. Cai, X. Yang, and W. Zhang (2009) Efficient quadtree based block-shift filtering for deblocking and deringing. Journal of Visual Communication and Image Representation 20 (8), pp. 595–607. Cited by: §1, §4.
  • [59] X. Zhang, R. Xiong, X. Fan, S. Ma, and W. Gao (2013) Compression artifact reduction by overlapped-block transform coefficient estimation with block similarity. IEEE Transactions on Image Processing 22 (12), pp. 4613–4626. Cited by: §1, §4.