Compression of user generated content using denoised references

by   Eduardo Pavez, et al.
University of Southern California

Video shared over the Internet is commonly referred to as user generated content (UGC). UGC video may have low quality due to various factors, including previous compression. UGC video is uploaded by users and then re-encoded to be made available at various levels of quality and resolution. In a traditional video coding pipeline, the encoder parameters are optimized to minimize a rate-distortion criterion, but when the input signal has low quality, this results in sub-optimal coding parameters optimized to preserve undesirable artifacts. In this paper we formulate the UGC compression problem as that of compression of a noisy/corrupted source. The noisy source coding theorem reveals that an optimal UGC compression system consists of optimal denoising of the UGC signal, followed by compression of the denoised signal. Since optimal denoising is unattainable, and users may object to modification of their content, we propose using denoised references to compute distortion, so that the encoding process can be guided towards perceptually better solutions. We demonstrate the effectiveness of the proposed strategy for JPEG compression of UGC images and videos.





1 Introduction

Video sharing applications (e.g., YouTube, TikTok) produce a large percentage of Internet traffic. This type of video is commonly referred to as user generated content (UGC) [19]. UGC is first uploaded by users and then re-encoded by service providers in order to be made available at various levels of quality and resolution. The traditional video compression pipeline assumes the input video is pristine; however, this is often not true for UGC, where the source material has already been compressed by the users sharing it. In addition, UGC may have low quality due to other factors, e.g., use of non-professional video equipment, poor shooting skills, low light, editing, special effects, etc.

In the traditional video compression pipeline, the encoder/decoder parameters are optimized to minimize distortion, subject to bitrate (and possibly computational complexity) constraints [16, 17]. However, when the distortion is computed with respect to a corrupted reference signal, the rate-distortion optimization process may lead to suboptimal coding parameters that preserve undesirable features which do not improve perceptual quality (e.g., blocking artifacts due to previous compression).

The fundamental UGC compression problem is, given a UGC signal and a compression system (e.g., JPEG, AV1), to choose coding parameters that accurately represent and encode the perceptually meaningful parts of the signal, while avoiding allocating resources to encoding compression artifacts and noise.

To address the issue of a low quality and unreliable reference, researchers have proposed using no-reference metrics to assess subjective video quality [21, 15], which can be used to perceptually optimize (guide) the compression of UGC videos. Another approach classifies UGC based on content category and similarity in rate-distortion characteristics, so that fixed coding parameters can be used for each UGC class [11, 13]. While previous works have recognized that the encoding process should adapt to the quality of the input UGC video, and have provided tools and insights to design UGC compression systems, we take a step towards solving the UGC compression problem from a rate-distortion theoretic perspective [2].

Figure 1: Block diagram for UGC video coding. $x$ is the pristine (unknown) signal, $y$ is the UGC video, and $\hat{x}$ is the reconstructed signal at the decoder.

In Section 2 we formulate the UGC compression problem as an instance of noisy source coding, where the noiseless source corresponds to the pristine original, and the noisy/corrupted signal is the UGC. This process is depicted in Figure 1. In this ideal scenario, the goal is to minimize distortion computed with respect to the pristine (unknown) original. By invoking a noisy source coding theorem, we can show that the optimal encoder-decoder system in the mean-squared-error (MSE) sense is comprised of optimal MSE estimation of the clean source from the noisy source, followed by optimal (noiseless) source coding of this estimate [7, 20, 2]. The noisy source coding theorem has been applied to compression of noisy images [1], speech coding [8, 9, 10], and to the design of video coding systems robust to pre- and post-processing [5, 6]. However, to the best of our knowledge, it has not yet been applied to UGC compression.

Note that in traditional video coding the distortion goes to zero as the rate increases, and the quality of the encoded video improves. In contrast, a consequence of the noisy source coding theorem is that, when encoding a UGC source, the distortion with respect to the pristine reference does not go to zero, i.e., further increases in bitrate beyond a certain point do not lead to improved performance. This is because in the UGC coding scenario we wish to minimize distortion with respect to the pristine reference, but we have to do this without being able to encode the pristine signal directly. An example of the optimal rate-distortion curve for a Gaussian source corrupted by additive Gaussian noise [20] is depicted in Figure 2. When distortion is computed with respect to the noisy input, the distortion quickly goes to zero, while for the optimal UGC coding system, distortion decreases more slowly and saturates at a positive value.

While encoding a denoised signal is theoretically optimal, in a practical system it may be preferable to encode the UGC signal directly, because: 1) users may object to a service provider modifying their uploaded content, and 2) finding good denoising/restoration algorithms may be difficult, given that there may be multiple causes of quality degradation in a UGC signal, so a specific denoiser may not always produce reliable outputs. Hence, we propose to use a denoised UGC signal only as a proxy to compute distortion, i.e., as a replacement for the (unavailable) pristine original, while using the UGC signal itself as the source for the encoder. In Section 3 we show experimentally that the rate-distortion curve of this system has a saturation region, similar to the optimal RD curve. Based on this observation, we propose an algorithm that chooses coding parameters by detecting saturation of the distortion curve, to avoid encoding at bitrates for which the quality of the encoded signal does not improve. We show that for a JPEG encoder, the quality parameter associated with the onset of the saturation region is positively correlated with the perceptual quality of the UGC video.

Figure 2: Optimal distortion-rate functions for scalar zero-mean Gaussian sources. Comparison of optimal RD curves for: encoding of $x$ with $x$ as reference, encoding of the noisy signal $y$ with $y$ as reference ($D_y(R)$ from (1)), and encoding of $y$ with the pristine source $x$ as reference ($D_x(R)$ from (3)). Here $x$ has variance $\sigma_x^2$, and the UGC signal is $y = x + n$, where $n$ is zero-mean Gaussian with variance $\sigma_n^2$, independent of $x$.

2 The UGC compression problem

In this section we propose a theoretical formulation of the UGC compression problem. We show that optimal denoising is essential for efficient compression of UGC. We then propose a practical framework using an off-the-shelf denoiser to guide the encoder towards a reference signal that approximates the optimal denoised reference.

2.1 Noisy source coding

In the UGC compression problem (Figure 1), $x$ and $y$ are random vectors representing the pristine content and the UGC signal, respectively. The encoded representation is denoted by $b = f(y)$, where $f$ is the encoder, while the number of bits of the representation is denoted by $R(b)$. The output of the decoder $g$ is denoted by $\hat{x} = g(b)$. For a given rate $R$, a traditional (noiseless) source coding problem has the form

$$D_y(R) = \min_{f,g}\ \mathbb{E}\|y - \hat{x}\|^2 \quad \text{s.t.} \quad R(b) \le R, \tag{1}$$

where $D_y(R)$ is the distortion-rate function. Note that as the bitrate $R$ increases in (1), the distortion decreases, so that $\lim_{R \to \infty} D_y(R) = 0$. This is problematic, because at high rates low distortion simply means that $\hat{x}$ and $y$ are close, but the best possible representation ($\hat{x} = y$, corresponding to $D_y(R) = 0$) is not guaranteed to have good quality, given that the input is UGC.

Ideally, since the source is noisy, the source coding problem should be formulated so that distortion is computed with respect to the pristine original:

$$D_x(R) = \min_{f,g}\ \mathbb{E}\|x - \hat{x}\|^2 \quad \text{s.t.} \quad R(b) \le R. \tag{2}$$

Under the optimality criterion of (2), the decoded signal $\hat{x}$ has to approximate the pristine content $x$. The following result allows us to break down (2) into two simpler steps: 1) optimal denoising, and 2) optimal (noiseless) source coding.

Theorem 1.

[7, 20, 2] The optimal distortion-rate function for the UGC coding problem is:

$$D_x(R) = \sigma_e^2 + D_{\tilde{x}}(R), \tag{3}$$

where $\tilde{x} = \mathbb{E}[x \mid y]$, $\sigma_e^2 = \mathbb{E}\|x - \tilde{x}\|^2$, and $D_{\tilde{x}}(R)$ is the distortion-rate function obtained by solving (1) with $\tilde{x}$ as the source.

Note that $\tilde{x}$ is the minimum mean square error estimator (MMSEE) of the pristine signal, which does not depend on the encoder/decoder functions or the rate $R$. The proof of Theorem 1 [7, 20, 2] uses two facts: (i) $b$ and $\hat{x}$ are measurable functions of $y$, and (ii) the MMSEE is orthogonal, namely $\mathbb{E}[(x - \tilde{x})^\top h(y)] = 0$, for any measurable function $h$.
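Using these two facts, the decomposition behind Theorem 1 can be written in one line; with $\tilde{x}$ denoting the MMSEE and $\hat{x}$ the decoder output, expanding the squared error gives:

```latex
\mathbb{E}\|x-\hat{x}\|^{2}
 = \mathbb{E}\|x-\tilde{x}\|^{2}
 + 2\,\mathbb{E}\!\left[(x-\tilde{x})^{\top}(\tilde{x}-\hat{x})\right]
 + \mathbb{E}\|\tilde{x}-\hat{x}\|^{2}
 = \sigma_{e}^{2} + \mathbb{E}\|\tilde{x}-\hat{x}\|^{2}.
```

The cross term vanishes because $\tilde{x} - \hat{x}$ is a measurable function of $y$; minimizing the remaining term over encoder/decoder pairs subject to the rate constraint yields (3).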

A first consequence of Theorem 1 is the lower bound

$$D_x(R) \ge \sigma_e^2, \tag{4}$$

which establishes that $\sigma_e^2$ is the lowest distortion achievable by any encoder/decoder, at any rate. Since $\sigma_e^2$ is the error of the MMSEE, it can be interpreted as a quality metric of the UGC signal. Another consequence of Theorem 1 is that for an encoder/decoder to asymptotically achieve this lower bound, that is, $\lim_{R \to \infty} D_x(R) = \sigma_e^2$, the encoder must act on the estimate, i.e., $b = f(\tilde{x})$, while $g$ is the corresponding decoder for $\tilde{x}$. In other words, an optimal UGC compression system has two components:

  1. Optimal denoising with the MMSEE, $\tilde{x} = \mathbb{E}[x \mid y]$,

  2. Lossy encoding/decoding that acts on $\tilde{x}$ instead of $y$.

Example optimal distortion-rate curves are shown in Figure 2, where we consider a zero-mean scalar Gaussian source $x$ with variance $\sigma_x^2$, contaminated with additive independent zero-mean Gaussian noise $n$ with variance $\sigma_n^2$. For this example, we can compare the derivatives of $D_y(R)$ from (1) and $D_x(R)$ from (3) using formulas from [20] to obtain

$$\frac{d D_y(R)}{dR} = \left(\frac{\sigma_y^2}{\sigma_x^2}\right)^2 \frac{d D_x(R)}{dR}, \tag{5}$$

where we used $\sigma_y^2 = \sigma_x^2 + \sigma_n^2$. Therefore, the formulation with the noisy reference (i.e., (1)) suggests that by increasing the rate by one bit, quality improves by $\tfrac{3}{4}D_y(R)$, while in fact the correct formulation (i.e., (3)) only guarantees the more modest improvement $\tfrac{3}{4}(\sigma_x^2/\sigma_y^2)^2 D_y(R)$.
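As an illustration of this saturation behaviour, the scalar-Gaussian distortion-rate curves can be evaluated numerically. This is a sketch with illustrative function names; it assumes the standard closed forms $D_y(R) = \sigma_y^2\, 2^{-2R}$ for noiseless coding of $y$, and $D_x(R) = \sigma_e^2 + (\sigma_x^4/\sigma_y^2)\, 2^{-2R}$ for the noisy source problem [20]:

```python
def gaussian_dr_curves(sigma_x2, sigma_n2, rates):
    """Distortion-rate curves for a zero-mean Gaussian source x with variance
    sigma_x2, observed as y = x + n, with independent zero-mean Gaussian
    noise n of variance sigma_n2 (closed forms from Wolf-Ziv [20])."""
    sigma_y2 = sigma_x2 + sigma_n2              # variance of the UGC signal y
    sigma_e2 = sigma_x2 * sigma_n2 / sigma_y2   # MMSE floor: error of E[x|y]
    sigma_t2 = sigma_x2 ** 2 / sigma_y2         # variance of the estimate E[x|y]
    D_y = [sigma_y2 * 2.0 ** (-2 * R) for R in rates]             # y as reference, eq. (1)
    D_x = [sigma_e2 + sigma_t2 * 2.0 ** (-2 * R) for R in rates]  # x as reference, eq. (3)
    return D_y, D_x, sigma_e2

# With sigma_x^2 = 1 and sigma_n^2 = 0.25: D_y decays to zero, while D_x
# saturates at the floor sigma_e^2, matching the qualitative shape of Figure 2.
rates = [0.5 * k for k in range(17)]
D_y, D_x, floor = gaussian_dr_curves(1.0, 0.25, rates)
```

At rate zero both curves start at the corresponding signal variance; as the rate grows, only the curve with the noisy reference reaches zero.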

While Theorem 1 gives us a clear solution, its implementation is impractical for several reasons. First, optimal MSE denoising depends on the signal $y$ and on the joint distribution of $x$ and $y$, which are unknown. Second, while practical video codecs can achieve impressive compression performance using rate-distortion optimization of encoding parameters for a single video [16, 17], the noisy source coding formulation (3) is concerned with guaranteeing optimality in an average sense, i.e., when considering the average performance for a family of signals with the same distribution. Third, while Theorem 1 suggests directly encoding a denoised signal $\tilde{x}$, this may be undesirable for the reasons mentioned in the introduction (users may not want their content to be modified, and the denoising/estimation methods may not be reliable).

Figure 3: UGC encoding strategies using different signals for the encoder input and the reference metric.

2.2 UGC compression with denoised references

We propose compressing the UGC signal $y$ using the denoised signal $\tilde{y} = h(y)$ as a reference for distortion computation, where $h$ is a denoiser. Using this metric we can guide the encoding process towards solutions with fewer artifacts. The main idea behind our proposal is illustrated in Figure 3. According to noisy source coding theory, points inside the red circle with $x$ at the center have distortions that are not achievable. The MMSEE $\tilde{x}$ is a point on the boundary of that red circle, and thus ideal UGC compression using $\tilde{x}$ as reference is depicted by the dashed arrow, where as the bitrate increases, the encoded signal approaches $\tilde{x}$. A standard UGC encoder is represented by a solid black arrow, such that as the bitrate increases the encoded signal approaches $y$. Note that when the bitrate is small, the encoded signal $\hat{x}$ is at a similar distance from both the UGC signal and the pristine original, that is, $\|\hat{x} - y\| \approx \|\hat{x} - x\|$; thus, in this regime, an encoded version of $y$ may look similar to an encoded version of $x$. In this figure we can see that as the rate increases, $\hat{x}$ becomes closer to $y$ at the expense of increasing its distance to $x$, which is clearly undesirable. Thus, we can define a noise encoding region, corresponding to the family of all encoder/decoders (with their parameters) for which the encoded signal is closer to $y$ than to $x$, or more precisely

$$\mathcal{N} = \left\{ (f, g) \,:\, \mathbb{E}\|\hat{x} - y\|^2 < \mathbb{E}\|\hat{x} - x\|^2 \right\}. \tag{6}$$

The boundary of the noise encoding region is depicted by a blue dashed line in Figure 3. Our goal is to avoid $\mathcal{N}$ and find encoder/decoder pairs $(f, g) \notin \mathcal{N}$. Clearly, $\mathcal{N}$ cannot be found, given that we do not have access to $x$. If the denoised signal is a better approximation of the pristine original than the UGC signal, that is, $\|\tilde{y} - x\| < \|y - x\|$, then we can use $\tilde{y}$ to guide $\hat{x}$ away from $y$ as the rate increases. Thus, we propose to define an empirical noise encoding region using the denoised reference,

$$\hat{\mathcal{N}} = \left\{ (f, g) \,:\, \|\hat{x} - y\|^2 < \|\hat{x} - \tilde{y}\|^2 \right\}, \tag{7}$$

which can be used to choose coding parameters.
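As a sketch (function and variable names are illustrative, not from the paper's implementation), testing whether a given encoding falls inside the empirical noise encoding region of (7) reduces to comparing two MSEs, with the UGC signal and its denoised version as the two candidate references:

```python
import numpy as np

def in_empirical_noise_region(x_hat, y, y_denoised):
    """True if the decoded signal x_hat is closer (in MSE) to the UGC input y
    than to the denoised reference, i.e. the chosen coding parameters fall
    inside the empirical noise encoding region of eq. (7)."""
    x_hat = np.asarray(x_hat, dtype=np.float64)
    mse_to_ugc = np.mean((x_hat - y) ** 2)
    mse_to_denoised = np.mean((x_hat - y_denoised) ** 2)
    return bool(mse_to_ugc < mse_to_denoised)
```

Coding parameters for which this test returns True are spending bits on reproducing noise in $y$, and are the ones the proposed strategy avoids.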

Figure 4: MSE values for individual blocks and their IQR. Synthetic UGC image created by compression with H.264 using quantization parameter (top). Block MSE with respect to pristine original (left), and block MSE with respect to BM3D denoised reference (right).

3 Experiments

To find the region in which the distortion saturates and the quality of the encoded UGC signal does not improve (see Figure 2), we use denoised reference signals and the criterion from (7) to detect $\hat{\mathcal{N}}$. For simplicity, our experiments use JPEG for compression. Within a single image, different regions have varying levels of complexity and correspondingly require different bitrates to achieve the same quality. Since different images will have different mixes of high and low complexity blocks, we do not use the overall MSE; instead, we partition each image into blocks and use the per-block MSEs to detect saturation of the distortion function. Specifically, to capture the typical block behaviour and to remove outliers, we use the interquartile range (IQR) of the per-block MSE. Letting $d_i$ be the MSE of the $i$th block (for either the pristine reference or the denoised reference), and assuming that the $d_i$ are sorted by magnitude, so that $d_i \le d_{i+1}$ for all blocks, the MSE-IQR is defined as

$$\mathrm{IQR} = d_{q_{75}} - d_{q_{25}}, \tag{8}$$

where $d_{q_{75}}$ is larger than $75\%$ of the values, and $d_{q_{25}}$ is larger than $25\%$ of the values. The MSE-IQR captures the variation of the middle of the block MSE values, while removing outliers.
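A minimal sketch of the per-block MSE and its IQR (the block size and function names here are illustrative assumptions, since the text does not fix them):

```python
import numpy as np

def per_block_mse(img, ref, b=16):
    """MSE between img and ref on a grid of b-by-b blocks (boundary blocks
    may be smaller); returns one MSE value per block."""
    h, w = img.shape[:2]
    return np.array([
        np.mean((img[i:i + b, j:j + b].astype(np.float64)
                 - ref[i:i + b, j:j + b]) ** 2)
        for i in range(0, h, b) for j in range(0, w, b)
    ])

def mse_iqr(block_mses):
    """Interquartile range of the per-block MSEs, as in eq. (8): the spread
    of the middle 50% of block distortions, robust to outlier blocks."""
    q25, q75 = np.percentile(block_mses, [25, 75])
    return q75 - q25
```

The reference passed to `per_block_mse` can be either the pristine original (when available, as in the synthetic experiments) or the denoised UGC signal.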

3.1 Experiments with synthetic UGC images

In this section we show experimentally that when the UGC has low quality due to previous compression, RD curves computed using the pristine original and the denoised UGC content have a similar saturation region. We used pristine images from the KADID-10k dataset [12] and compressed them with H.264. In Figure 4, we depict an example of a heavily compressed image to be used as UGC. This UGC image is then encoded with JPEG at different bitrates. For each bitrate, the image is divided into blocks, and for each block we compute the MSE with respect to the pristine original and with respect to a BM3D-denoised [14] reference. Figure 4 also shows the per-block MSE as a function of the total bitrate. At lower bitrates, there is high variation of the MSE across blocks, while at higher bitrates this variation decreases. In Figure 5 we plot the MSE-IQR, computed for pristine and denoised references, as a function of the bitrate. For the same UGC image at different quality levels, we observe that both distortions, computed with pristine and denoised references, saturate at similar bitrates, which decrease with the UGC quality level.

Figure 5: Synthetic UGC images obtained by H.264 compression with high quality (top-left), intermediate quality (middle-left), and low quality (bottom-left). Saturation of block-based MSE-IQR curves (right).

3.2 Experiments with YouTube UGC dataset

YouTube UGC is a large-scale dataset sampled from YouTube videos. Each video clip in YouTube UGC is accompanied by a mean-opinion-score (MOS) that provides a subjective measure of its quality. Videos are also annotated with content type categories. The dataset provides two versions of the original videos: raw YUV, and almost-lossless compressed videos using H.264 with CRF 10. We use the H.264 CRF 10 versions. For each clip we sample frames at a regular interval. The denoised references are computed using the Python scikit-image [18] implementation of the BayesShrink wavelet denoiser [3]. We encode each frame with JPEG (using the Pillow Python implementation [4]) at different quality values $QF$, ranging from worst to best quality in fixed steps. Our goal is to find a saturation quality value $QF^*$, so that if we choose a $QF$ larger than $QF^*$ (i.e., we increase the bitrate), the quality of the encoded UGC has already saturated. Let $y_{i,t}$ and $\tilde{y}_{i,t}$ denote the $i$th blocks of the $t$th frames of the UGC and denoised UGC signals, respectively, while $\hat{x}_{i,t}(QF)$ is the $i$th block of the $t$th frame of the UGC encoded using $QF$. Applying the saturation criterion (7) to each block, we compute

$$s_{i,t}(QF) = \mathbb{1}\!\left[\,\|\hat{x}_{i,t}(QF) - y_{i,t}\|^2 < \|\hat{x}_{i,t}(QF) - \tilde{y}_{i,t}\|^2\,\right]. \tag{9}$$
We say that the $i$th block of the $t$th frame has saturated at quality value $QF$ if $s_{i,t}(QF) = 1$. We define the saturation quality value $QF^*_{i,t}$ for the $i$th block in frame $t$ as the smallest quality value that satisfies

$$s_{i,t}(QF) = 1 \quad \text{for all} \quad QF \ge QF^*_{i,t}. \tag{10}$$

If no such quality value exists, we say that the block does not saturate, and $QF^*_{i,t}$ is set to the maximum quality value. The saturation quality value of frame $t$ is computed as $QF^*_t = \max_i QF^*_{i,t}$, and the saturation quality value of the clip as $QF^* = \max_t QF^*_t$. Video clips from the Sports and LiveMusic categories with resolution 360P are used to show the correlation between MOS and the saturation quality value $QF^*$, where the MOS value used is measured over the first 10 seconds rather than the whole video. In Figure 6, we observe a positive correlation between MOS and $QF^*$. Note that a perceptual metric such as MOS depends on multiple factors, including the content quality, and not just the compression quality of the UGC content. Thus, while we do observe a positive correlation, it is not surprising that the correlation is not perfect.
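One way to implement the per-block saturation search is sketched below; the names, the max-aggregation over blocks, and the never-saturates fallback are assumptions where the text leaves details unspecified:

```python
def block_saturation_qf(sat, qf_values):
    """Smallest quality value qf such that the block satisfies the saturation
    criterion (sat[qf] == 1) at every quality value >= qf.  If no such qf
    exists, return the maximum quality value (the block never saturates)."""
    smallest = None
    for qf in sorted(qf_values, reverse=True):  # scan highest quality first
        if sat[qf]:
            smallest = qf        # the saturation run extends down to this qf
        else:
            break                # run interrupted: stop extending it
    return smallest if smallest is not None else max(qf_values)

def clip_saturation_qf(per_block_qfs):
    """Aggregate block-level values with a max, so that at the returned
    quality value every block has saturated (assumed aggregation rule)."""
    return max(per_block_qfs)
```

Encoding at a quality value above `clip_saturation_qf` would spend extra bits without improving the perceived quality of the clip.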

Figure 6: Scatter plot of MOS versus the saturation quality value $QF^*$. Each point represents a video clip from YouTube UGC.

4 Conclusion

We have formulated the problem of compression of user generated content (UGC) as compression of a noisy/distorted source. Using classic results from rate-distortion theory, we showed that optimal UGC compression can be obtained by optimal denoising/restoration followed by optimal compression of the resulting noiseless signal. Since in practical systems it may be undesirable to modify the user's content, and challenging to find good denoising/restoration algorithms, we propose instead using a denoised reference to compute distortion and guide (regularize) the compression process, to avoid spending bitrate on encoding noise and undesirable artifacts. We performed experiments on synthetic UGC images, and showed that the distortion-rate curve with denoised UGC as a reference shares similar saturation properties with the distortion-rate curve that uses the pristine (unknown) signal as reference. We then proposed a simple method to detect distortion saturation of YouTube UGC videos, and demonstrated that the quality parameter of a JPEG encoder at which the distortion saturates is positively correlated with the mean-opinion-score.


  • [1] O. K. Al-Shaykh and R. M. Mersereau (1998) Lossy compression of noisy images. IEEE Transactions on Image Processing 7 (12), pp. 1641–1652. Cited by: §1.
  • [2] T. Berger (1971) Rate distortion theory: a mathematical basis for data compression. Prentice-Hall electrical engineering series, Prentice-Hall. External Links: ISBN 9780137531035, LCCN 75148254, Link Cited by: §1, §1, §2.1, Theorem 1.
  • [3] S. G. Chang, B. Yu, and M. Vetterli (2000) Adaptive wavelet thresholding for image denoising and compression. IEEE transactions on image processing 9 (9), pp. 1532–1546. Cited by: §3.2.
  • [4] A. Clark (2015) Pillow (PIL fork) documentation. Read the Docs. Cited by: §3.2.
  • [5] Y. Dar, A. M. Bruckstein, M. Elad, and R. Giryes (2016) Postprocessing of compressed images via sequential denoising. IEEE Transactions on Image Processing 25 (7), pp. 3044–3058. Cited by: §1.
  • [6] Y. Dar, M. Elad, and A. M. Bruckstein (2018) Optimized pre-compensating compression. IEEE Transactions on Image Processing 27 (10), pp. 4798–4809. Cited by: §1.
  • [7] R. Dobrushin and B. Tsybakov (1962) Information transmission with additional noise. IRE Transactions on Information Theory 8 (5), pp. 293–304. External Links: Document Cited by: §1, §2.1, Theorem 1.
  • [8] Y. Ephraim and R. M. Gray (1988) A unified approach for encoding clean and noisy sources by means of waveform and autoregressive model vector quantization. IEEE Transactions on Information Theory 34 (4), pp. 826–834. Cited by: §1.
  • [9] T. R. Fischer, J. D. Gibson, and B. Koo (1990) Estimation and noisy source coding. IEEE Transactions on Acoustics, Speech, and Signal Processing 38 (1), pp. 23–34. Cited by: §1.
  • [10] J. Gibson, B. Koo, and S. Gray (1991) Filtering of colored noise for speech enhancement and coding. IEEE Transactions on Signal Processing 39 (8), pp. 1732–1742. Cited by: §1.
  • [11] S. John, A. Gadde, and B. Adsumilli (2020) Rate distortion optimization over large scale video corpus with machine learning. In 2020 IEEE International Conference on Image Processing (ICIP), pp. 1286–1290. Cited by: §1.
  • [12] H. Lin, V. Hosu, and D. Saupe (2019) KADID-10k: a large-scale artificially distorted iqa database. In 2019 Tenth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–3. Cited by: §3.1.
  • [13] S. Ling, Y. Baveye, P. Le Callet, J. Skinner, and I. Katsavounidis (2020) Towards perceptually-optimized compression of user generated content (UGC): prediction of UGC rate-distortion category. In 2020 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. Cited by: §1.
  • [14] Y. Mäkinen, L. Azzari, and A. Foi (2020) Collaborative filtering of correlated noise: exact transform-domain variance for improved shrinkage and patch matching. IEEE Transactions on Image Processing 29, pp. 8339–8354. Cited by: §3.1.
  • [15] A. Mittal, A. K. Moorthy, and A. C. Bovik (2012) No-reference image quality assessment in the spatial domain. IEEE Transactions on image processing 21 (12), pp. 4695–4708. Cited by: §1.
  • [16] A. Ortega and K. Ramchandran (1998) Rate-distortion methods for image and video compression. IEEE Signal processing magazine 15 (6), pp. 23–50. Cited by: §1, §2.1.
  • [17] G. J. Sullivan and T. Wiegand (1998) Rate-distortion optimization for video compression. IEEE signal processing magazine 15 (6), pp. 74–90. Cited by: §1, §2.1.
  • [18] S. Van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner, N. Yager, E. Gouillart, and T. Yu (2014) Scikit-image: image processing in python. PeerJ 2, pp. e453. Cited by: §3.2.
  • [19] Y. Wang, S. Inguva, and B. Adsumilli (2019) YouTube UGC dataset for video compression research. In 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP), pp. 1–5. Cited by: §1.
  • [20] J. Wolf and J. Ziv (1970) Transmission of noisy information to a noisy receiver with minimum distortion. IEEE Transactions on Information Theory 16 (4), pp. 406–411. Cited by: §1, §1, §2.1, §2.1, Theorem 1.
  • [21] X. Yu, N. Birkbeck, Y. Wang, C. G. Bampis, B. Adsumilli, and A. C. Bovik (2021) Predicting the quality of compressed videos with pre-existing distortions. IEEE Transactions on Image Processing 30, pp. 7511–7526. Cited by: §1.