Lossy compression techniques [1, 2, 3] allow a significant reduction of image sizes, which is highly beneficial for various applications that involve storing [4, 5] and/or sharing image content over the internet . This, however, comes at the cost of removing some image details, as can be clearly seen in Fig. 1, where the left image shows the result of the compression-decompression process of the recently introduced PIK  algorithm applied to the middle image of the figure. Some details have disappeared and the image looks unnaturally smooth. To overcome this problem, we suggest augmenting the compressed image with noise that makes the resulting image more visually appealing, as can be seen in the right image of Fig. 1.
The most straightforward way of doing this is to simply generate white noise and add it to the decompressed image. This, however, typically results in even less natural-looking images. One of the main reasons is that different parts of an image captured by an ordinary camera exhibit different amounts of noise. For natural raw images, the noise typically has a smaller magnitude in the presence of a low-intensity signal and a larger one otherwise. Most modern cameras, however, perform gamma correction after capturing the image; therefore, the noise level becomes higher for a low-intensity signal and smaller otherwise. Fig. 1 illustrates this phenomenon. To address this issue, we propose to use a physically inspired, intensity-dependent noise model , which describes the interaction between light and the camera sensor. This allows us to model the amount of noise that we add to the compressed image as a function of the image intensities in a small neighbourhood of each pixel.
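The effect of gamma correction on noise can be illustrated with a small simulation. The sketch below is not part of the proposed method; it assumes pure Poisson shot noise (no read noise or de-mosaicking), a hypothetical full-scale value of 10,000 photons and a plain power-law gamma of 1/2.2, and compares the noise standard deviation of a dark and a bright pixel before and after gamma correction.

```python
import numpy as np

def noise_std_linear_and_gamma(mean_photons, full_well=10_000.0, gamma=1 / 2.2,
                               n=200_000, seed=0):
    """Std of simulated shot noise before and after a power-law gamma curve.

    Assumptions: Poisson shot noise only, hypothetical full-scale value
    `full_well`, and a simple power-law gamma (not a real camera curve).
    """
    rng = np.random.default_rng(seed)
    counts = rng.poisson(mean_photons, size=n)   # photon shot noise samples
    linear = counts / full_well                  # normalised linear-light signal
    encoded = np.power(linear, gamma)            # simple gamma correction
    return float(linear.std()), float(encoded.std())

for photons in (200, 8000):                      # a dark and a bright pixel
    lin, gam = noise_std_linear_and_gamma(photons)
    print(f"{photons:>5} photons: linear std={lin:.5f}, gamma std={gam:.5f}")
```

Under these assumptions the linear-domain standard deviation grows with intensity, while after gamma correction the dark pixel is the noisier of the two, which matches the behaviour described above.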
Further, our approach operates in the receptor color space , which models different processes happening in the human eye and is also used in the recent PIK  compression algorithm. This allows us to generate realistic-looking colored noise that is consistent with the image content.
In short, our contribution in this paper is two-fold:
- Our physically and biologically inspired noise generation algorithm significantly improves the realism of decompressed images.
- Our approach is reasonably fast in the decompression step, which is crucial for compression algorithms.
2 Related work
To the best of our knowledge, the problem of noise re-generation for compression algorithms is relatively unexplored. Therefore, in this section we discuss methods for noise estimation and film-grain generation, as we find these tasks the most similar to our problem.
De-noising algorithms often rely on noise estimation methods as part of their pipeline. Many of these methods assume that noise is independent across pixels and normally distributed (i.e., white Gaussian noise). For example, to estimate the level of noise in an image, one approach uses the mean absolute deviation, while the authors of  propose a method based on Laplacian and Sobel filters. Contrary to these methods, we design a model that depends on pixel intensities and mimics the physical and biological processes happening inside both a camera sensor and the human visual system.
A similar direction was taken by the authors of [9, 12]. The first of these works designs an intensity-dependent noise model and infers the noise level from a single image using Bayesian MAP inference, which is relatively slow and therefore cannot be directly applied to our problem. The second introduces a segmentation-based algorithm, which also requires heavy computation to find clusters. By contrast, we propose a fast method for estimating a signal-dependent noise model. Moreover, to make this model biologically and physically plausible, we work in the XYB color space, which is discussed in detail in Section 3.1.
Additionally, the authors of  propose a noise re-generation algorithm for video compression. This work is similar in spirit to ours; however, the authors apply additive white Gaussian noise, which may introduce image artifacts. Instead, we propose to process the random signal with a high-pass filter, which produces noise that is more visually appealing.
Film-grain noise, clearly visible in traditional analog movies, is currently used as an artistic effect that makes compressed images and videos more visually pleasing [14, 15] and makes them appear as if they were recorded on photographic film. Although the idea of making compressed images visually pleasing is similar to our approach, film-grain noise is very different in nature from the noise coming from a digital sensor (which we simulate), and it therefore results in different-looking images. Nevertheless, in this section we describe some of the approaches for generating additive film-grain noise. For example, the authors of  generate signal-dependent film-grain noise using higher-order statistics of the image signal. While effective, their approach relies on noisy measurements of higher-order signal statistics and works in RGB color space, which makes it difficult to combine the different levels of noise coming from the individual RGB channels. A different approach was proposed by , where the authors use spectral-domain analysis to generate film-grain noise. Their method, however, relies on the DCT  for noise model estimation, which is time-consuming and requires a large amount of memory to store all the DCT coefficients.
To sum up, various noise estimation models have been proposed as part of de-noising and image compression algorithms. These models, however, either assume that noise is independently distributed across the image or rely on complex inference models, which are impractical for many applications. Furthermore, there is a separate class of methods aimed at generating film-grain noise, which, however, do not aim to simulate the sensor artifacts present in modern digital images. Contrary to all these methods, we propose a fast and memory-efficient algorithm for noise level estimation and re-generation that mimics the image artifacts arising from the nature of digital camera sensors.
In this section we introduce our noise re-generation algorithm. An overview of our approach is shown in Fig. 2. Briefly, the algorithm consists of two parts. First, prior to image encoding, we estimate the parameters of our noise model from non-overlapping image patches. Then, during decoding, we use this model to re-generate an appropriate amount of noise for different parts of the decompressed image.
In this section, we first introduce the color space in which our method operates and then discuss in more detail each of the aforementioned steps.
3.1 XYB color space
The human eye relies on two types of cells, rods and cones, to capture light coming from the environment. Rods are very sensitive and capture the intensity of the signal, while cones extract chromatic information. Cones are further subdivided into three types, which roughly capture signals of long, medium and short wavelengths. The recently introduced XYB color space  is specifically designed to model this behavior. Its core advantage for us over the commonly used RGB  and CIELAB  color spaces is that it allows us to model noise the same way as it appears in the human eye, which in turn lets us add natural-looking noise to the decompressed images and makes them more visually pleasing.
More formally, the relationship between XYB and RGB can be defined as follows. First, we transform the input linear-light RGB signal into three signals that capture long (L), medium (M) and short (S) wavelengths  as follows:
where L, M and S depend linearly on the photon counts  of each cell in the camera sensor, and R, G and B are the intensities in RGB color space. Then, to model the processes happening in the human eye, we apply gamma correction to each of these signals:
For simplicity, we refer to the space of the three resulting gamma-corrected signals (channels) as the g-LMS space. Finally, we compute the XYB image channels as follows:
where the weighting coefficients are fixed constants .
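To make the conversion chain (linear RGB to g-LMS to XYB) concrete, the following sketch implements it under stated assumptions only. The mixing matrix `MIX`, the cube-root gamma `GAMMA` and the opponent constants `alpha` and `beta` are hypothetical placeholders, not the constants used by PIK; the form X ∝ L − M, Y ∝ L + M with S passed through as the third channel is likewise assumed. The rows of `MIX` sum to one, so gray inputs give X = 0.

```python
import numpy as np

# Hypothetical mixing matrix (rows: L, M, S responses to linear R, G, B).
# Illustrative placeholders only, NOT the constants used by PIK.
MIX = np.array([
    [0.30, 0.62, 0.08],   # long-wavelength (L) response
    [0.23, 0.69, 0.08],   # medium-wavelength (M) response
    [0.24, 0.20, 0.56],   # short-wavelength (S) response
])
GAMMA = 1.0 / 3.0  # assumed gamma exponent (cube root)

def rgb_to_glms(rgb):
    """Linear-light RGB (H, W, 3) -> gamma-corrected g-LMS (H, W, 3)."""
    lms = rgb @ MIX.T                                 # per-pixel linear mixing
    return np.power(np.clip(lms, 0.0, None), GAMMA)  # gamma correction

def glms_to_xyb(glms, alpha=0.5, beta=0.5):
    """Opponent combination of L and M; alpha and beta are hypothetical."""
    l, m, s = glms[..., 0], glms[..., 1], glms[..., 2]
    x = alpha * (l - m)   # red-green opponent channel
    y = beta * (l + m)    # luminance-like channel
    return np.stack([x, y, s], axis=-1)
```

Because both X and Y are built from the L and M channels, noise added to those two g-LMS channels shows up in both X and Y, which is the property the noise re-generation step relies on.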
3.2 Noise estimation
We estimate noise during the encoding step of our algorithm. In short, we first select homogeneous patches from the image, i.e., patches that do not contain texture, edges, or other image details. Then, we design and fit the noise model based on the intensity values of these patches. In the remainder of this section we discuss this process in more detail.
Noise is difficult to separate from image details. Therefore, texture, edges, and spots inside an image patch may make the noise estimate imprecise. To estimate the level of image noise precisely, we first select “homogeneous” patches that are free from such image details. To do so, we divide each image patch into a set of blocks, as shown in Fig. 3. Then, following , for each patch of the image we compute the Sum of Absolute Differences (SAD) between the center block $B_c$ and each of the surrounding blocks $B_i$:

$$\mathrm{SAD}(B_c, B_i) = \sum_{x} \left| B_c(x) - B_i(x) \right|, \tag{4}$$

where $B_c$ and $B_i$ are the two blocks of pixel intensities, as described in Fig. 3, and $x$ runs over the pixel positions within a block. Inspired by the Rank-Ordered Absolute Differences (ROAD) metric , which has been shown to be robust to impulse noise, we then form a subset $S_p$ containing the $m$ smallest SAD values of patch $p$. The resulting homogeneity measure of each patch is computed as:

$$H_p = \sum_{s \in S_p} s. \tag{5}$$
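The SAD and ROAD-style homogeneity computation can be sketched as follows. This is a minimal illustration, assuming the patch is split into a 3×3 grid of equally sized square blocks and that the score is the sum of the `k` smallest SAD values; the grid layout and `k = 4` are illustrative choices, not the settings used in the experiments.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return float(np.abs(a.astype(np.float64) - b.astype(np.float64)).sum())

def homogeneity(patch, block, k=4):
    """ROAD-style homogeneity score of a square patch (lower = more homogeneous).

    Assumes the patch is a 3x3 grid of `block` x `block` sub-blocks; the grid
    layout and k are illustrative choices.
    """
    assert patch.shape == (3 * block, 3 * block)
    center = patch[block:2 * block, block:2 * block]
    scores = []
    for i in range(3):
        for j in range(3):
            if i == 1 and j == 1:
                continue  # skip the center block itself
            other = patch[i * block:(i + 1) * block, j * block:(j + 1) * block]
            scores.append(sad(center, other))
    # Keep only the k smallest SAD values, so a few outlier blocks are ignored.
    return float(np.sum(np.sort(scores)[:k]))
```

Because only the k smallest differences are kept, a single outlier block (for example, an edge passing through a corner) does not make an otherwise flat patch look textured.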
Using the homogeneity measure $H_p$ of patch $p$ (Eq. (5)), we can decide whether the patch is homogeneous with a simple threshold $T$: the patch is considered homogeneous if

$$H_p \le T. \tag{6}$$

This approach, however, requires manual selection of $T$, whose appropriate value depends on the image properties. To overcome this limitation, we build a histogram of the homogeneity values of all image patches. Fig. 4 shows sample images with the respective histograms. As we can see, the histograms of images that contain homogeneous areas have a large peak. Natural images typically have only a single peak; if an image contains several, we choose the largest one. This peak corresponds to a set of patches with low scores. Therefore, to select the homogeneity threshold automatically for such images, we simply need to find the location $T_{\mathrm{hist}}$ of this peak in each histogram, which we compute using a robust mode estimator . However, for images with a high level of texture and detail, the histogram peak may lie at a high value, which no longer corresponds to a set of homogeneous patches and may therefore lead to an erroneous noise model. We overcome this problem by empirically setting a maximum possible threshold $T_{\max}$. Thus, we compute the homogeneity threshold as follows:

$$T = \min\left(T_{\mathrm{hist}},\, T_{\max}\right). \tag{7}$$
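A sketch of the automatic threshold selection follows. Instead of locating the peak of a binned histogram, it applies the half-sample mode, a robust mode estimator of the kind analysed by Bickel and Frühwirth, directly to the homogeneity scores, and then caps the result with `t_max`; this substitution, and the use of `<=` in the comparison, are assumptions of the sketch.

```python
import numpy as np

def half_sample_mode(values):
    """Robust mode estimate via the half-sample mode (HSM) recursion."""
    x = np.sort(np.asarray(values, dtype=np.float64))
    while x.size > 3:
        h = (x.size + 1) // 2                      # size of each half sample
        widths = x[h - 1:] - x[:x.size - h + 1]    # range of every window of h
        start = int(np.argmin(widths))             # densest window
        x = x[start:start + h]
    if x.size == 3:
        left, right = x[1] - x[0], x[2] - x[1]
        if left < right:
            return float((x[0] + x[1]) / 2)
        if right < left:
            return float((x[1] + x[2]) / 2)
        return float(x[1])
    return float(x.mean())

def homogeneous_patch_indices(scores, t_max):
    """Indices of patches whose score does not exceed min(mode, t_max)."""
    threshold = min(half_sample_mode(scores), t_max)
    return [i for i, s in enumerate(scores) if s <= threshold]
```

For a heavily textured image, the mode of the scores is large and the cap `t_max` takes over, so only genuinely flat patches are kept.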
Given a set of homogeneous patches, we need to estimate the noise level of the original image, which later allows us to re-generate the same amount of noise and add it to the decompressed image. To do so, we define the noise level metric of patch $p$ as:

$$\sigma_p = \frac{1}{N} \sum_{x} \left| (P_p * \Lambda)(x) \right|, \tag{8}$$

where $P_p$ is the matrix of pixel intensity values of patch $p$, $\Lambda$ is a Laplacian filter , $N$ is the total number of pixels in each patch and $*$ is the convolution operator. The Laplacian filter is a discrete version of the Laplace operator and is defined as follows:
The intuition behind this noise level metric (Eq. (8)) lies in the nature of the Laplace operator, which computes the divergence of the gradient of the image signal. As a result, the metric is close to 0 for smooth patches and large for noisy ones. Furthermore, the entries $\Lambda_{ij}$ of the Laplacian kernel sum to zero, $\sum_{i,j} \Lambda_{ij} = 0$, so adding the same constant to every pixel of a patch leaves its filter response unchanged. Hence the metric does not depend on the absolute intensity values of the pixels: the same additive noise yields the same score at different intensity levels. Thus, the metric provides an unbiased estimate of the noise level across different homogeneous patches.
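The noise level metric can be sketched in a few lines of NumPy. The sketch assumes the common 4-neighbour Laplacian kernel and takes the mean absolute filter response over the valid (unpadded) region; the exact kernel and normalization used in Eq. (8) may differ.

```python
import numpy as np

# Assumed 4-neighbour discrete Laplacian; its entries sum to zero.
LAPLACIAN = np.array([[0.0, 1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0, 1.0, 0.0]])

def convolve_valid(img, kernel):
    """2D 'valid' filtering via shifted slices.

    This computes a correlation; it equals convolution here because the
    kernel is symmetric.
    """
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * img[dy:dy + h - kh + 1, dx:dx + w - kw + 1]
    return out

def noise_level(patch):
    """Mean absolute Laplacian response of a patch (assumed form of Eq. (8))."""
    response = convolve_valid(patch.astype(np.float64), LAPLACIAN)
    return float(np.abs(response).mean())
```

For instance, a constant or linearly varying patch scores zero, and adding a constant offset to a patch leaves its score unchanged, mirroring the zero-sum argument above.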
In both camera sensors and human eyes, noise approximately follows a Poisson distribution. Therefore, in principle, image noise can be modeled using the intensity-dependent model of . However, this model does not take into account various camera post-processing steps, such as gamma correction and de-mosaicking. Further, as mentioned earlier, image noise typically depends on the material properties of the objects present in the scene and therefore varies across different parts of the image. To alleviate these issues, we propose to learn an intensity-dependent model for each image as follows:
where the three scalar coefficients of Eq. (10) are the parameters of the noise model. These parameters are fitted independently for each image during the encoding step by minimizing the following objective function:
where the sum runs over the set of homogeneous patches; for each patch, the noise model of Eq. (10), evaluated at the mean intensity of the patch in the corresponding g-LMS channel from Eq. (2), is compared with the noise level of the patch computed by Eq. (8); and an empirically chosen weight scales the regularization term. We minimize this function using the scaled conjugate gradient method . It is worth noting that we chose a power function as our noise model because we work with a gamma-corrected signal in g-LMS space. In practice, such a model is very compact and requires only a few bytes of additional memory regardless of image size, as it only requires storing three (quantized) floating-point values, namely the three model parameters.
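As an illustration of fitting an intensity-dependent power-law model to (mean intensity, noise level) pairs, the sketch below assumes the hypothetical form σ(I) = a·I^b + c. It uses a different solver from the one described above: it searches over a grid of exponents b and solves for a and c with linear least squares, and it omits the regularization term entirely.

```python
import numpy as np

def fit_power_model(intensity, sigma, exponents=np.linspace(-3.0, 3.0, 601)):
    """Fit sigma ~ a * I**b + c (hypothetical form) by exhaustive search over b.

    For each candidate exponent b the problem is linear in (a, c) and is
    solved exactly with least squares; the best (a, b, c) is returned.
    """
    i = np.clip(np.asarray(intensity, dtype=np.float64), 1e-6, None)
    s = np.asarray(sigma, dtype=np.float64)
    best_err, best = np.inf, (0.0, 0.0, 0.0)
    for b in exponents:
        basis = np.column_stack([i ** b, np.ones_like(i)])
        (a, c), *_ = np.linalg.lstsq(basis, s, rcond=None)
        err = float(np.sum((basis @ np.array([a, c]) - s) ** 2))
        if err < best_err:
            best_err, best = err, (float(a), float(b), float(c))
    return best

def power_model(intensity, a, b, c):
    """Evaluate the fitted model; intensities are clipped to stay positive."""
    i = np.clip(np.asarray(intensity, dtype=np.float64), 1e-6, None)
    return a * i ** b + c
```

The three returned floats are all that this sketch would need to store, in line with the compactness argument above.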
3.3 Noise re-generation
Finally, during the decoding step, we re-generate noise using the estimated parameters of Eq. (10) and the intensity values of the image pixels. Briefly, we first estimate the expected noise level for each pixel and then generate random noise that matches this level. We now discuss this process in more detail.
Noise level estimation.
As discussed in Section 3.1, the XYB color space has a direct relationship with the g-LMS space. This allows us to add different amounts of noise to different wavelengths of the input signal. Due to the nature of the human eye, the long- and medium-wavelength signals matter the most; therefore, here we consider only the long- and medium-wavelength channels of the g-LMS space, introduced in Eq. (2). Based on our noise model, we then estimate the appropriate noise level for each of these channels as:
where the two resulting quantities are the per-pixel noise levels of the g-LMS channels corresponding to medium and long wavelengths, respectively.
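Applying a fitted model at decoding time can be sketched as below. The sketch assumes that the intensity fed to the model is a 3×3 box-filtered local mean of the channel (the text only speaks of a small neighbourhood of each pixel) and that negative model outputs are clamped to zero; `model` can be any callable mapping intensities to noise levels, such as a fitted power law.

```python
import numpy as np

def local_mean(channel, radius=1):
    """Box-filter mean over a (2r+1) x (2r+1) neighbourhood, edges replicated."""
    padded = np.pad(np.asarray(channel, dtype=np.float64), radius, mode="edge")
    h, w = np.shape(channel)
    size = 2 * radius + 1
    acc = np.zeros((h, w))
    for dy in range(size):
        for dx in range(size):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / size ** 2

def per_pixel_noise_level(channel, model, radius=1):
    """Evaluate an intensity->noise model on the local mean of every pixel."""
    return np.maximum(model(local_mean(channel, radius)), 0.0)
```

Running this once on the long-wavelength channel and once on the medium-wavelength channel yields the two per-pixel noise-level maps.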
Additive random noise generation.
Now that the noise level of each pixel is estimated, our final step is to generate the appropriate amount of random noise and add it to the decompressed image. According to Eq. (3), both the X and Y channels of the XYB color space depend on the long- and medium-wavelength channels of the g-LMS space. It is therefore natural to assume that the additive noise should consist of two parts, one shared across X and Y and one that is not. To model this behavior, we generate three random matrices with the same size as the decompressed image. These matrices must contain a high-frequency signal; therefore, to generate them we apply a high-pass filter to random matrices whose elements are uniformly distributed. To make the computation fast, which is crucial for a compression algorithm, we use the following trick: from each element of a generated matrix we subtract a random element from its neighborhood, as depicted in Fig. 5. Then, the matrices are normalized with respect to their response to the Laplacian filter introduced in Eqs. (8) and (9). In this formulation, two of the matrices account for independent noise appearing in the long- and medium-wavelength channels of the g-LMS space, while the third one models correlated random noise.
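The following sketch illustrates the fast high-pass trick: from each element of a uniform random matrix, a randomly chosen neighbour is subtracted. The 8-neighbourhood, the uniform range [0, 1) and the normalization target (a mean absolute 4-neighbour Laplacian response of exactly 1, so that scaling the matrix by a noise level σ yields noise whose Laplacian-based level is σ) are assumptions of this sketch.

```python
import numpy as np

# Assumed 4-neighbour Laplacian, used only to normalise the noise amplitude.
LAPLACIAN = np.array([[0.0, 1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0, 1.0, 0.0]])

def laplacian_level(img):
    """Mean absolute response to LAPLACIAN over the valid region."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return float(np.abs(out).mean())

# The 8 neighbour offsets a pixel may be paired with (assumed neighbourhood).
OFFSETS = np.array([(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)])

def highpass_noise(h, w, rng):
    """Uniform noise minus a randomly chosen neighbour, normalised to level 1."""
    u = rng.uniform(0.0, 1.0, size=(h + 2, w + 2))              # 1-pixel border
    pick = OFFSETS[rng.integers(0, len(OFFSETS), size=(h, w))]  # shape (h, w, 2)
    ys, xs = np.mgrid[1:h + 1, 1:w + 1]
    neighbour = u[ys + pick[..., 0], xs + pick[..., 1]]
    noise = u[1:h + 1, 1:w + 1] - neighbour                     # cheap high-pass
    return noise / laplacian_level(noise)
```

Three independent calls, for example `n_l, n_m, n_c = (highpass_noise(h, w, rng) for _ in range(3))`, give the two channel-specific matrices and the shared one. Subtracting a neighbour removes the low-frequency content (including the mean) at the cost of one random index per pixel, which is far cheaper than a general filter.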
Further, we augment each channel of every pixel of the decompressed image in the following way:
where the two noise levels are those introduced in Eq. (12), and a regularization parameter balances the correlation between the noise generated for the X and Y channels of the XYB color space. In practice, this parameter directly influences the ‘colorfulness’ of the generated noise, with its degenerate value corresponding to the generation of completely gray-scale noise.
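One way to realise the balance between shared and independent noise is a convex blend controlled by a single parameter. The rule below is a hypothetical illustration, not the paper's exact formula: `rho = 1` uses only the channel-specific matrices, while `rho = 0` adds the same shared pattern to both the long- and medium-wavelength channels, making the perturbation fully correlated.

```python
import numpy as np

def add_noise(l_gamma, m_gamma, sigma_l, sigma_m, n_l, n_m, n_c, rho):
    """Blend independent and shared noise for the L and M g-LMS channels.

    Hypothetical blending rule: rho = 1 uses only the independent matrices
    n_l and n_m, rho = 0 uses only the shared matrix n_c.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    l_out = l_gamma + sigma_l * (rho * n_l + (1.0 - rho) * n_c)
    m_out = m_gamma + sigma_m * (rho * n_m + (1.0 - rho) * n_c)
    return l_out, m_out
```

Lowering `rho` in this sketch increases the share of the common pattern, which reduces the difference between the L and M perturbations and hence the color of the noise.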
In this section we describe our experiments. First, we describe the settings and parameters of the algorithm. Then, we show visual results of our approach. Finally, we present the results of a user study that quantifies the improvement in the perceived visual quality of images processed with our noise re-generation algorithm. The code for the proposed approach is available at https://github.com/google/pik.
We run our experiments with the recently proposed lossy compression algorithm PIK , which is designed to replace JPEG  with about one-third of the data size at similar perceptual quality. Our goal is to improve the visual perception of strongly compressed images, because under these conditions compression algorithms often remove important small image details. Therefore, we re-generate noise in the PIK output with the Butteraugli psychovisual target distance  set to 3.0. This setting results in highly compressed images, which comes at the cost of removing some image details and introducing ringing and blocking artifacts. To reduce the influence of these effects, in our experiments we apply a simple deblocking filter similar to  before adding noise.
To evaluate our method, we created a dataset of images with different noise conditions. In particular, we use images from  (the list of images is available at https://github.com/WyohKnott/image-formats-comparison/blob/gh-pages/cite_images.txt). We describe the parameters of our method for noise estimation and re-generation below.
Noise estimation parameters.
We set the size of the image patches used for noise estimation to , which is the same as PIK’s block size. Further, to select homogeneous patches, we evaluate the SAD metric (Eq. (4)) on sub-patches of size . To build the SAD histogram we use  and  bins.
Sample images with the respective noise models are shown in Fig. 6. It is worth noting that in the linear color space the amount of noise grows with signal intensity. However, because modern cameras apply gamma correction, the relative noise level in digital images is considerably stronger in dark areas than in bright ones. A similar effect occurs in the human eye, which makes people perceive noise differently in areas of different brightness.
To model this effect, we introduced a regularization term in Eq. (11). This regularization biases the optimization towards decreasing functions when fitting the model to the data.
Noise re-generation parameters.
The generation part of the proposed algorithm relies on a color adjustment parameter, which we set to . Lower values of this parameter give less colored noise; according to our preliminary experiments, the chosen value results in images that are the most pleasant for users.
4.2 Visual results
In this section we show visual results of our algorithm. To better illustrate the advantage of our technique, we select images containing high-frequency content, as it is altered the most by the compression algorithm. Fig. 7 illustrates the performance of our approach. In particular, the first row shows the original high-resolution image. The middle two rows show image patches cropped from the decompressed and the original images, respectively. The last row shows the corresponding patch from the image produced by our algorithm, which appears much closer to the original image (third row) than the decompressed one (second row).
4.3 User experiment
To evaluate the perceived visual quality of the images generated by our method, we run three experiments. The first two aim at determining whether users prefer images with noise over their smooth versions (produced by the PIK compression algorithm). For these experiments, we select images and generate noise for them with noise levels chosen by an expert. Finally, in our last experiment we evaluate how close the noise level estimated by our system is to the one preferred by users.
Perceived quality of noise generation.
For this experiment, we select a dataset of pairs of images and show them to people. Each pair contains an image processed by the PIK algorithm and the same image with additive noise generated by our approach. During the experiment, users are asked to choose the image that they perceive to be of higher quality, without knowing which of the two corresponds to our method. This allows us to evaluate the performance of our approach according to user preferences. The result of this experiment is depicted in Fig. 8(a), which for each pair of images shows the probability, with its confidence interval, that users prefer the image with noise over the one without it. The final column in Fig. 8(a) shows the average probability. As we can see, users often prefer noisy images to smooth ones.
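The text does not state how the confidence intervals in Fig. 8 were computed. One common choice for a preference probability estimated from binary votes is the Wilson score interval, sketched below with z = 1.96 (about 95% coverage); both the method and the confidence level are assumptions here.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 -> ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half
```

For example, `wilson_interval(70, 100)` returns an interval around the observed preference rate of 0.7; unlike the normal approximation, the Wilson interval stays inside [0, 1] even when all votes go one way.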
Fig. 8: (a) Perceived quality; (b) Authenticity.
Authenticity of noise generation.
For this experiment, we slightly change the conditions of the previous experiment. Instead of pairs of images, we use triplets that contain the original image, the image processed by the PIK algorithm and the same image with our generated noise. These triplets are shown to users, who need to choose which of the two processed images (the PIK output or the one generated by our method) is the more authentic copy of the original, without knowing which one is which. The result of this experiment is summarized in Fig. 8(b), which for each triplet shows the probability, with its confidence interval, that the image with noise is selected. The last column shows the average probability across all triplets in the dataset. As we can see, users typically prefer the images with noise over those produced by the PIK algorithm without additive noise, since they perceive them as closer to the originals.
These two experiments show that our method generates noise that in most of the cases is pleasant for the users and makes decompressed images look more natural.
Estimation of noise and noise re-generation.
For our last experiment, we developed a system that allows users to adjust the level of the generated noise added to the decompressed image. Specifically, users can select one of  noise levels to find the one that makes the decompressed image look as close as possible to the original image (before compression). These levels are obtained by multiplying the noise level estimated by our method with coefficients starting from  with the step . For this experiment, we selected  examples from our dataset and showed them to users. The users were then asked to adjust the balance between smoothness and graininess of the compressed image to match their viewing experience with the original one. The results of this experiment are summarized in Fig. 9, which shows the median and the median absolute deviation of the noise levels selected by the users for each of the examples in the dataset. The last column shows the median noise level selected across all people and images in the dataset. As we can see, this median is very close to the level estimated by our method, which means that our noise generation system estimates an appropriate noise level in most cases.
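The per-example summary in Fig. 9 can be computed as below, reading the statistic as the median absolute deviation. The votes are hypothetical and expressed as multiples of the noise level estimated by our method, so a median of 1.0 means that the users' choice coincides with the estimate.

```python
import numpy as np

def median_and_mad(selected_coefficients):
    """Median and median absolute deviation of user-selected noise coefficients."""
    x = np.asarray(selected_coefficients, dtype=np.float64)
    median = float(np.median(x))
    mad = float(np.median(np.abs(x - median)))
    return median, mad

# Hypothetical votes, as multiples of the model's estimated noise level.
votes = [0.5, 1.0, 1.0, 1.5, 2.0]
print(median_and_mad(votes))  # (1.0, 0.5)
```

Both statistics are robust to a few extreme votes, which suits a small panel of users.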
To further analyze the performance of our system, we build a histogram of the votes for the different noise levels over all users and images. The majority of the votes are concentrated near the level estimated by our method. Further, only very few votes show a preference for noiseless images, which means that users typically prefer the images with noise generated by our method over the smoother images produced by the PIK algorithm. This observation is consistent with the results of the previous experiments.
We further investigated the images for which users generally prefer a higher amount of noise. It turned out that these were highly textured images, which suggests that a model inspired solely by the camera sensor is not sufficient and that more complex models, which also take image texture into account, could be used. Applying such models, however, may significantly increase the processing time and the memory required for storing their parameters, which may become a severe limitation for compression algorithms. There are also a number of images in our dataset for which users generally prefer a smaller noise level than the one suggested by the model. These images typically contain very shiny surfaces, where users expect not to see any noise at all. We would like to address these issues and extend our technique to such cases in future work.
In summary, our experiments show that users typically prefer images with noise over those that are processed by the PIK algorithm and contain no noise at all. Furthermore, on average our noise estimation model selects a noise level that is perceived favorably by most users.
In this paper we proposed a novel noise re-generation method for image compression algorithms, which estimates the noise model parameters from the input image at the encoding step and re-generates noise at the decompression step. Our model is physically and biologically inspired. We further introduced a fast noise generation technique based on a Laplacian filter which is suitable for fast decompression. As illustrated by the user study, our method is able to generate the appropriate level of additive noise that improves the perceived quality of the image. Our implementation of the proposed algorithm is open-sourced and publicly available.
-  Wallace, G.K.: The JPEG still picture compression standard. IEEE transactions on consumer electronics 38(1) (1992) xviii–xxxiv
-  Haffner, P., Bottou, L., Howard, P.G., LeCun, Y.: Djvu: Analyzing and compressing scanned documents for internet distribution. In: IEEE transactions on document analysis and recognition, IEEE (1999) 625–628
-  PIK. https://github.com/google/pik Accessed: 2018-01-25.
-  Dropbox. https://www.dropbox.com Accessed: 2018-01-25.
-  Google cloud. https://cloud.google.com Accessed: 2018-01-25.
-  Google photos. https://photos.google.com Accessed: 2018-01-25.
-  Mandel, L.: Fluctuations of photon beams: the distribution of the photo-electrons. Proceedings of the physical society 74(3) (1959) 233
-  Butteraugli. https://github.com/google/butteraugli Accessed: 2018-01-25.
-  Liu, C., Freeman, W.T., Szeliski, R., Kang, S.B.: Noise estimation from a single image. In: IEEE computer society conference on computer vision and pattern recognition, IEEE (2006) 901–908
-  Donoho, D.L.: De-noising by soft-thresholding. IEEE transactions on information theory 41(3) (1995) 613–627
-  Tai, S.C., Yang, S.M.: A fast method for image noise estimation using laplacian operator and adaptive edge detection. In: IEEE international symposium on communications, control and signal processing, IEEE (2008) 1077–1081
-  Liu, X., Tanaka, M., Okutomi, M.: Signal dependent noise removal from a single image. In: IEEE international conference on image processing, IEEE (2014) 2679–2683
-  Gärdenäs, A.D.: Denoising and renoising of video for compression (2017)
-  Yan, J.C.K., Hatzinakos, D.: Signal-dependent film grain noise removal and generation based on higher-order statistics. In: IEEE transactions on signal processing workshop on higher-order statistics, IEEE (1997) 77–81
-  Sun, T., Liang, L., Chiu, K.H., Wan, P., Au, O.C.: DCT coefficients generation model for film grain noise and its application in super-resolution. In: IEEE international conference on image processing, IEEE (2014) 3857–3861
-  Ahmed, N., Natarajan, T., Rao, K.R.: Discrete cosine transform. IEEE transactions on computers 100(1) (1974) 90–93
-  Süsstrunk, S., Buckley, R., Swen, S.: Standard RGB color spaces. In: Color and imaging conference. Number 1 (1999) 127–134
-  Hunter, R.: Photoelectric color difference meter. Josa 48(12) (1958) 985–995
-  Winkler, S.: Issues in vision modeling for perceptual video quality assessment. Signal processing 78(2) (1999) 231–252
-  Jaynes, E.: Information theory and statistical mechanics. Physical review 106(4) (1957) 620
-  Richardson, I.E.: H. 264 and MPEG-4 video compression: video coding for next-generation multimedia. John Wiley & Sons (2004)
-  Garnett, R., Huegerich, T., Chui, C., He, W.: A universal noise removal algorithm with an impulse detector. IEEE transactions on image processing 14(11) (2005) 1747–1754
-  Bickel, D., Frühwirth, R.: On a fast, robust estimator of the mode: comparisons to other robust estimators with applications. Computational Statistics & Data Analysis 50(12) (2006) 3500–3530
-  Aubry, M., Paris, S., Hasinoff, S.W., Kautz, J., Durand, F.: Fast local laplacian filters: Theory and applications. ACM transactions on graphics 33(5) (2014) 167
-  Verma, R., Ali, D.J.: A comparative study of various types of image noise and efficient noise removal techniques. International journal of advanced research in computer science and software engineering 3(10) (2013)
-  Møller, M.F.: A scaled conjugate gradient algorithm for fast supervised learning. Neural networks 6(4) (1993) 525–533
-  Norkin, A., Bjontegaard, G., Fuldseth, A., Narroschke, M., Ikeda, M., Andersson, K., Zhou, M., der Auwera, G.V.: Hevc deblocking filter. IEEE transactions on circuits and systems for video technology 22(12) (2012) 1746–1754
-  Wikimedia. http://commons.wikimedia.org Accessed: 2018-02-26.