1 Introduction
Consider the classical image processing tasks of image compression and denoising. While there exists a wealth of successful methods to address them, the specificity and intricate optimization in their design hinder their application to more general tasks and setups. For example, suppose that instead of one single image we are given a collection of similar-looking images. Can standard image compression codecs benefit from the shared redundancy to compress the images further? Such a setup is of great practical importance for the compression of facial or iris images in biometrics, of medical images, or for the compression and transmission of very large, but similar-looking images in remote sensing and astronomy. In these cases, the usage of generic codecs like JPEG2000, whose basis vectors are not adapted to the statistics of the images, is known to be inefficient.
Take the case of facial images. In spite of the extensive literature in generic image compression, only a few learning-based algorithms have studied the compression of facial images. For example, [1] was an early attempt based on VQ. [2] learns dictionaries based on the K-SVD [3], while [4] uses a tree-based wavelet transform. [5] proposes a codec using the Iteration-Tuned and Aligned Dictionary (ITAD). In spite of their high compression performance, the problem with most of these approaches is that they rely heavily on the alignment of images and are unlikely to generalize once the imaging setup changes even slightly. Some of them require the detection of facial features (sometimes manually), alignment by geometrical transformation into some canonical form, and a background-removal stage.
Similarly, for image denoising, only a few methods have benefited from external clean databases of similar images. For example, [6] reports an improvement over the BM3D by adaptively targeting an external database.
On the other hand, one can think of different tasks being performed jointly. Can more favorable scenarios, like the availability of a collection of similar domain-specific images, help to compress and denoise images at the same time? As a practical scenario, consider an object identification system in which several exemplar images have been taken with high-quality acquisition systems in the enrollment mode. At query time, however, only low-quality and noisy cameras are available. It is highly desirable to be able to jointly denoise and compress the acquisitions.
The rest of the paper is organized as follows. Section 2 gives a very brief overview of the general image representation formulation and quickly reviews several relevant cases. Section 3 begins with a review of a problem from rate-distortion theory, namely the reverse water-filling paradigm, which serves as the core concept behind the proposed Regularized Residual Quantization (RRQ) introduced next. Section 4 presents experiments with the RRQ algorithm on the image compression and denoising tasks. Finally, section 5 concludes the paper.
2 Related work
Many methods for image representation and dictionary learning can be generalized in the inverse-problem formulation of Eq. 1, where $\mathbf{X} = [\mathbf{x}_1, \cdots, \mathbf{x}_N]$ contains the data points (e.g., image patches) $\mathbf{x}_i$'s in its columns.^{1} The codebook and the codes can be represented in matrix form as $\mathbf{C} = [\mathbf{c}_1, \cdots, \mathbf{c}_K]$ and $\mathbf{A} = [\boldsymbol{\alpha}_1, \cdots, \boldsymbol{\alpha}_N]$, respectively:

$$\min_{\mathbf{C}, \mathbf{A}} \|\mathbf{X} - \mathbf{C}\mathbf{A}\|_F^2 \quad \text{s.t.} \quad \mathbf{C} \in \mathcal{C}, \; \mathbf{A} \in \mathcal{A}, \tag{1}$$

where $\mathcal{C}$ and $\mathcal{A}$ are sets of constraints on the construction of the codebook and the codes, respectively.

^{1} Notation: matrices are denoted as $\mathbf{X}$, random vectors as $X$ and vectors as $\mathbf{x}$.
Depending on $\mathcal{C}$ and $\mathcal{A}$, the problem of Eq. 1 can be treated in many different ways.^{2} For example, under the famous sparsity constraint $\|\boldsymbol{\alpha}_i\|_0 \leqslant s$, or its relaxed version $\|\boldsymbol{\alpha}_i\|_1 \leqslant \lambda$, the K-SVD algorithm [3] solves it for local minima in an iterative way.

^{2} See [7] and [8] for detailed reviews and discussions.
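To make the sparsity-constrained case concrete, the following is a minimal sketch of the sparse coding step via orthogonal matching pursuit, a standard greedy baseline for this constraint; it is not the K-SVD dictionary update itself, and the function name is ours:

```python
import numpy as np

def omp(C, x, s):
    """Greedy s-sparse code for x over codebook C (orthogonal matching pursuit)."""
    resid, support = x.copy(), []
    for _ in range(s):
        # pick the atom most correlated with the current residual
        support.append(int(np.abs(C.T @ resid).argmax()))
        # re-fit coefficients on the chosen support by least squares
        coef, *_ = np.linalg.lstsq(C[:, support], x, rcond=None)
        resid = x - C[:, support] @ coef
    return support, coef
```

On an orthonormal codebook and a truly 2-sparse signal, the greedy selection recovers the exact support.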
In this work, we follow the VQ-based interpretation of Eq. 1 where, as a general formulation, it is required that each code be a one-hot vector, i.e., $\boldsymbol{\alpha}_i \in \{0, 1\}^K$ with $\|\boldsymbol{\alpha}_i\|_0 = 1$, so that every data point is approximated by exactly one codeword.
This problem can be solved using the k-means algorithm. However, the lack of structure in this formulation leads to poor generalization performance. To address some of the issues with this simple formulation, Product Quantization (PQ) (e.g., [9], [10]) divides the vectors into several blocks and runs k-means on each of them independently. While PQ can achieve good rate-distortion performance under certain conditions, its lack of design flexibility and the fact that the system must be retrained for every rate make it an unsuitable solution for image analysis.
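The one-hot structure of VQ codes can be checked directly in the matrix form of Eq. 1: each column of the code matrix is an indicator, so the product with the codebook selects one codeword per data point. A small numpy illustration (the codebook and data here are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10))            # 10 data points in 3 dimensions
C = rng.normal(size=(3, 4))             # codebook with 4 codewords
# nearest-codeword assignment, as in one k-means assignment step
idx = ((X[:, None, :] - C[:, :, None]) ** 2).sum(axis=0).argmin(axis=0)
A = np.zeros((4, 10))
A[idx, np.arange(10)] = 1.0             # one-hot columns
recon = C @ A                           # equals C[:, idx]: one codeword per point
```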
As an alternative, residual quantization (RQ) is a multi-layer approach that, at each layer, quantizes the residuals of the quantization at the previous layer. While extensively studied in the 1980s and 90s for tasks like image coding (e.g., refer to [11], [12] or [9]), its efficiency was limited for more modern applications. In practice, it was not possible to learn codewords for more than a couple of layers.
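For reference, classical RQ can be sketched as k-means applied layer by layer to the residuals; the following is a plain numpy sketch, with layer count and codebook size as illustrative parameters:

```python
import numpy as np

def train_rq(X, n_layers, k, iters=15, seed=0):
    """Classical RQ: run k-means on the residuals, one layer at a time."""
    rng = np.random.default_rng(seed)
    codebooks, R = [], X.copy()
    for _ in range(n_layers):
        C = R[:, rng.choice(R.shape[1], k, replace=False)].copy()  # init from data
        for _ in range(iters):
            # assignment step: nearest codeword for each residual
            idx = ((R[:, None, :] - C[:, :, None]) ** 2).sum(axis=0).argmin(axis=0)
            for j in range(k):                       # update step: cluster means
                if (idx == j).any():
                    C[:, j] = R[:, idx == j].mean(axis=1)
        idx = ((R[:, None, :] - C[:, :, None]) ** 2).sum(axis=0).argmin(axis=0)
        codebooks.append(C)
        R = R - C[:, idx]                            # residuals feed the next layer
    return codebooks, R
```

Each layer strictly refines the previous one, so the residual energy decreases with depth; the difficulty discussed above is that, without regularization, later layers overfit the training residuals.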
In this work, we use an approach based on RQ for which we introduce a preprocessing stage and an efficient regularization, making it possible to learn arbitrary numbers of layers. Moreover, the introduced regularization makes it possible to go beyond image patches and work directly with the high-dimensional image. This brings an important advantage for tasks like image compression: since the global picture of the image is preserved in the high-dimensional representation, one does not have to encode the relation between similar patches after compression.
3 Proposed framework: RRQ
We first recall a concept from rate-distortion theory: the quantization of independent Gaussian sources. Although in a slightly different setup than practical quantization (e.g., being asymptotic), this motivates the core idea behind the RRQ algorithm introduced next in this section.
3.1 Preliminaries: Quantization of independent sources
The trade-off between the compactness and the fidelity of the representation of a signal is classically treated in Shannon's rate-distortion theory [13].^{3}

^{3} Refer to Ch. 10 of [14] for further details on this subsection.
A special setup studied in this theory is the rate-distortion trade-off for independent Gaussian distributed sources $X_i$'s with different variances. Concretely, assume $X_i \sim \mathcal{N}(0, \sigma_i^2)$, for $1 \leqslant i \leqslant n$. Define the expected distortion between a random vector $X$ and its estimate $\hat{X}$ as $D = \mathbb{E}[d(X, \hat{X})]$, where the distortion between two $n$-vectors $\mathbf{x}$ and $\hat{\mathbf{x}}$ is defined as $d(\mathbf{x}, \hat{\mathbf{x}}) = \sum_{i=1}^{n}(x_i - \hat{x}_i)^2$. Here we ask the question: given a fixed total distortion allowed, i.e., $D$, what is the optimal way to divide the distortion (or rate) between these sources such that the overall allocated rate (distortion) is minimized? This can be posed as:
$$R(D) = \min \sum_{i=1}^{n} \frac{1}{2}\log_2\left(\frac{\sigma_i^2}{D_i}\right) \quad \text{s.t.} \quad \sum_{i=1}^{n} D_i \leqslant D, \tag{2}$$

where $D_i$ is the distortion of each source after rate allocation. The solution to this convex problem is known as reverse water-filling and is given as:

$$D_i = \begin{cases} \gamma, & \text{if } \sigma_i^2 \geqslant \gamma \\ \sigma_i^2, & \text{if } \sigma_i^2 < \gamma, \end{cases} \tag{3}$$

where $\gamma$ is a constant which should be chosen to guarantee that $\sum_{i=1}^{n} D_i = D$.
Denote by $\sigma_{C_i}^2$ the variance of the codewords for the quantization of $X_i$. Due to the principle of orthogonality and the independence of dimensions, we have that $\sigma_i^2 = \sigma_{C_i}^2 + D_i$. Therefore, according to Eq. 3, the optimal assignment of the codeword variances will be a soft-thresholding of $\sigma_i^2$ with $\gamma$:

$$\sigma_{C_i}^2 = \max(\sigma_i^2 - \gamma, 0). \tag{4}$$

This means that the optimal rate allocation requires that sources with variances less than $\gamma$ should not be assigned any rate at all. This, when used in codebook design, results in sparsity of the codewords, which we incorporate in the RRQ algorithm.
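Eqs. 3 and 4 are easy to compute numerically: a bisection on $\gamma$ satisfies the total-distortion constraint, and the codeword variances follow by soft-thresholding. The variances below are illustrative:

```python
import numpy as np

def reverse_waterfill(var, D):
    """Bisect for the water level gamma with sum(min(gamma, var_i)) = D."""
    lo, hi = 0.0, var.max()
    for _ in range(100):
        gamma = 0.5 * (lo + hi)
        if np.minimum(gamma, var).sum() < D:
            lo = gamma
        else:
            hi = gamma
    return gamma

var = np.array([4.0, 1.0, 0.25])          # source variances (illustrative)
gamma = reverse_waterfill(var, D=2.0)
Di = np.minimum(gamma, var)               # per-source distortions, Eq. 3
cw_var = np.maximum(var - gamma, 0.0)     # codeword variances, Eq. 4
rates = 0.5 * np.log2(var / Di)           # rate per source; zero when var <= gamma
```

Note that the low-variance source ends up with zero codeword variance and zero rate, which is exactly the sparsity exploited by RRQ.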
3.2 The RRQ algorithm
Inspired by the setup studied in section 3.1, we argue that, after a preprocessing stage, natural images can be globally represented as variance-decaying vectors with independent, or at least uncorrelated, dimensions. One might think of PCA as a simple way to achieve this. However, since the dimensionality of the entire vectorized image is high, apart from the large complexity incurred, there would be too many parameters to estimate in the covariance matrix. A global PCA would therefore likely overfit the training set and deviate largely on the test set. To overcome this issue, we propose the preprocessing in Algorithm 1.
After the PCA rotation matrices are learned from the training set, the same procedure is applied to images from the test set. In fact, this preprocessing is a more robust estimation of the global PCA: with the help of the 2D-DCT, it has far fewer parameters to estimate than the direct PCA rotation matrix of the entire image. This is an effective way to trade independence of dimensions for robustness between the train and test sets.
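Since Algorithm 1 itself is not reproduced here, the following is only a plausible sketch of such a preprocessing: an orthonormal 2D-DCT followed by a PCA rotation learned per sub-band. The grouping of DCT coefficients into contiguous sub-bands, and the function names, are our assumptions:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, so the 2D transform is a pure rotation."""
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    M[0] /= np.sqrt(2.0)
    return M

def preprocess(images, n_sub):
    """2D-DCT each image, split the coefficient vector into n_sub groups,
    and learn a PCA rotation per group (sketch of Algorithm 1)."""
    N, H, W = images.shape
    Mh, Mw = dct_matrix(H), dct_matrix(W)
    coeffs = np.stack([(Mh @ im @ Mw.T).ravel() for im in images])
    rotations, out = [], []
    for Xs in np.array_split(coeffs, n_sub, axis=1):
        Xc = Xs - Xs.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=True)
        rotations.append(Vt)             # small per-sub-band rotation
        out.append(Xc @ Vt.T)            # decorrelated within the sub-band
    return np.hstack(out), rotations
```

Within each sub-band the output coordinates are exactly uncorrelated on the training set; across sub-bands only the DCT's approximate decorrelation is relied upon, which is the robustness trade-off described above.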
The RRQ framework is introduced in Algorithm 2. For each of the layers, given the desired number of codewords, after calculating the variances of the residuals, the algorithm first finds the optimal $\gamma^*$ and calculates the optimal variances of the codewords based on Eq. 4, and then randomly generates codewords from these variances. Especially at the first layers, since the data has a strongly decaying variance profile, this makes the codewords very sparse, significantly reducing the complexity and storage cost of the codebooks. The algorithm continues by quantizing the residuals with the generated codewords, updating the estimates and the new residuals, and finishes at the desired number of layers.
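A simplified sketch of this layer loop is given below. How $\gamma$ is fixed per layer from the codebook size is a detail of Algorithm 2 not reproduced here; in this sketch we assume a fixed per-layer distortion fraction instead:

```python
import numpy as np

def rrq(X, n_layers, K, frac=0.5, seed=0):
    """Sketch of an RRQ layer loop.  Per layer: soft-threshold the residual
    variances (Eq. 4), draw K random codewords from N(0, cw_var) -- sparse,
    since thresholded dimensions get exactly zero -- then quantize and recurse.
    Setting the water level via the fraction `frac` is our assumption."""
    rng = np.random.default_rng(seed)
    R, Xhat = X.copy(), np.zeros_like(X)
    for _ in range(n_layers):
        var = R.var(axis=1)
        lo, hi = 0.0, var.max() + 1e-12              # bisect for gamma (Eq. 3)
        for _ in range(60):
            gamma = 0.5 * (lo + hi)
            if np.minimum(gamma, var).sum() < frac * var.sum():
                lo = gamma
            else:
                hi = gamma
        cw_var = np.maximum(var - gamma, 0.0)        # Eq. 4: sparse variances
        C = rng.normal(size=(var.size, K)) * np.sqrt(cw_var)[:, None]
        idx = ((R[:, None, :] - C[:, :, None]) ** 2).sum(axis=0).argmin(axis=0)
        Xhat, R = Xhat + C[:, idx], R - C[:, idx]
    return Xhat, R
```

Because the codewords are zero in all thresholded dimensions, only the high-variance dimensions are touched at each layer, which is what keeps the random codebooks cheap to store and robust across train and test sets.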
4 Experiments
We perform the two tasks of image compression and denoising of facial images. For image compression, we compare the performance of the proposed method with the JPEG and JPEG2000 standards; for denoising, we compare with BM3D. These are widely considered baselines in the literature.
The CroppedYaleB set [15] is used, which contains 2408 images from 38 subjects. Each subject has between 57 and 64 acquisitions with extreme illumination changes. For each subject, we randomly choose half of the images for training and the rest for testing.
We choose two different value-pairs of $(L, K)$, where $L$ is the number of layers and $K$ is the number of codewords per layer. As described earlier, all codewords are generated randomly according to Eq. 4. Algorithm 1 is used for preprocessing with a fixed number of sub-bands. The resulting decorrelated vectors have the same dimensionality as the original images.
Fig. 1(a) sketches the distortion-rate curve for this set. The gap between the training and test sets for the proposed RRQ is very small, indicating the success of the algorithm in terms of generalization. The non-regularized RQ, on the other hand, while having much lower distortion on the training set, fails to compress the test set beyond the first several layers.
Fig. 1(b) shows the results of image compression, averaged over 20 randomly chosen images from the test set. The advantage of the proposed method over the highly-optimized JPEG2000 codec is significant under this setup, particularly at lower rates. It should be noted that we do not perform any entropy coding of the codebook indices; further compression improvement can be achieved by entropy coding over the tree-like structure of the codebooks.
The results of image denoising for three different noise levels,^{4} averaged over 20 randomly chosen test images, are depicted in Fig. 1(c). The network is trained on clean images and is exactly the same as the one used for compression. Test images are contaminated with noise and given as input to the network for reconstruction. When reconstructing the noisy image, the network uses the priors from the clean images on which it has been trained. These priors are automatically used in the reconstruction process, serving as a very efficient denoising strategy that surpasses the prior-less BM3D (though only in highly noisy regimes).

^{4} Gray values are normalized between 0 and 1.
As the network tries to reconstruct the noisy image with further detail, the noise statistics become more present in the reconstruction, degrading its quality. Therefore, depending on the noise variance, the maximum PSNR lies somewhere in the middle of the distortion-rate curve; noisier images reach their maximum at lower rates.
Fig. 2 illustrates the denoising quality for two sample images. It is interesting to notice that BM3D, although producing a smooth image, fails to reconstruct the face contours since it lacks sufficient priors.
5 Conclusions
A framework for the multi-layer representation of images was proposed where, instead of local patch-based processing, a global high-dimensional vector representation of images is successively quantized at different levels of reconstruction fidelity. As an alternative to the classical RQ framework based on k-means, the proposed RRQ, along with its preprocessing, randomly generates codewords from a regularized, learned distribution. Apart from the many potential advantages of having random codewords, this is shown to lead to efficient quantization with low train-test distortion gaps. The experimental results show promise for practical scenarios, e.g., when the acquisition devices at the query phase are much noisier than the enrollment cameras. Future work will consider using the variance priors to further train the codewords, as well as entropy coding on the tree of indices for better rate-distortion performance.
References
 [1] M. Elad, R. Goldenberg, and R. Kimmel, “Low bit-rate compression of facial images,” IEEE Transactions on Image Processing, vol. 16, no. 9, pp. 2379–2383, Sept 2007.
 [2] Ori Bryt and Michael Elad, “Compression of facial images using the K-SVD algorithm,” Journal of Visual Communication and Image Representation, vol. 19, no. 4, pp. 270–282, 2008.
 [3] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, Nov 2006.
 [4] I. Ram, I. Cohen, and M. Elad, “Facial image compression using patch-ordering-based adaptive wavelet transform,” IEEE Signal Processing Letters, vol. 21, no. 10, pp. 1270–1274, Oct 2014.
 [5] J. Zepeda, C. Guillemot, and E. Kijak, “Image compression using sparse representations and the iteration-tuned and aligned dictionary,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 5, pp. 1061–1073, Sept 2011.
 [6] E. Luo, S. H. Chan, and T. Q. Nguyen, “Adaptive image denoising by targeted databases,” IEEE Transactions on Image Processing, vol. 24, no. 7, pp. 2167–2181, July 2015.
 [7] R. Rubinstein, A. M. Bruckstein, and M. Elad, “Dictionaries for sparse representation modeling,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1045–1057, June 2010.
 [8] Julien Mairal, Francis Bach, and Jean Ponce, “Sparse modeling for image and vision processing,” Foundations and Trends® in Computer Graphics and Vision, vol. 8, no. 2–3, pp. 85–283, 2014.
 [9] Allen Gersho and Robert M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Norwell, MA, USA, 1991.
 [10] H. Jegou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117–128, Jan 2011.
 [11] C. F. Barnes, S. A. Rizvi, and N. M. Nasrabadi, “Advances in residual vector quantization: a review,” IEEE Transactions on Image Processing, vol. 5, no. 2, pp. 226–262, Feb 1996.
 [12] N. M. Nasrabadi and R. A. King, “Image coding using vector quantization: a review,” IEEE Transactions on Communications, vol. 36, no. 8, pp. 957–971, Aug 1988.
 [13] Claude E. Shannon, “Coding theorems for a discrete source with a fidelity criterion,” IRE Nat. Conv. Rec., vol. 4, pp. 142–163, 1959.
 [14] T. Cover and J. Thomas, Elements of Information Theory, Wiley-Interscience, 2nd edition, July 2006.

 [15] A.S. Georghiades, P.N. Belhumeur, and D.J. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,” IEEE Trans. Pattern Anal. Mach. Intelligence, vol. 23, no. 6, pp. 643–660, 2001.