1 Introduction
1.1 Motivation
Signal degradation refers to the corruption of a signal by causes such as interference or the mixing of the signal of interest with unwanted signals or noise, and it is observed ubiquitously in practical information systems. The cause may be physical, such as imperfect data acquisition devices and noise in the transmission medium, or artificial, such as lossy data compression and the simultaneous transmission of multiple sources over the same medium. In addition, when we want to enhance a signal, we may assume that the signal has been somehow “degraded.” For example, when we want to enhance the resolution of an image, we assume the image is a degraded version of an ideal “original” image that has high resolution [1].
To tackle signal degradation or to achieve signal enhancement, computational restoration of degraded signals has been investigated for many years. There are various signal restoration tasks corresponding to different causes of degradation. Taking images as an example, image denoising [2], image deblurring [3], single image super-resolution [1], image contrast enhancement [4], image compression artifact removal [5], image inpainting [6], …, all belong to image restoration tasks.
Different restoration tasks have different objectives. Some tasks aim to recover the “original” signal as faithfully as possible: image denoising is to recover the noise-free image, and compression artifact removal is to recover the uncompressed image. Some other tasks are more concerned with the perceptual quality of the restored signal: image super-resolution is to produce image details that make the enhanced image look “high-resolution,” and image inpainting is to generate a complete image that looks “natural.” Yet other tasks serve recognition or understanding: for one example, an image containing a car license plate may be blurred, and image deblurring can produce a less blurred image so that the license plate can be recognized [7]; for another example, an image taken at night is difficult to interpret, and image contrast enhancement can produce a more natural-looking image that is better understood [8]. Recent years have witnessed more and more efforts on the last category [9, 10].
Given the different objectives, it is apparent that a signal restoration method designed for one specific task shall be evaluated with the specific metric that corresponds to the task’s objective. Indeed, the aforementioned objectives correspond to three groups of evaluation metrics:

Signal fidelity metrics, which evaluate how similar the restored signal is to the “original” signal. These include all the full-reference quality metrics, such as the well-known mean-squared-error (MSE) and its counterpart peak signal-to-noise ratio (PSNR), the structural similarity (SSIM) [11], and the difference in features extracted from the original signal and the restored signal [12], to name a few.

Perceptual naturalness metrics, which evaluate how “natural” the restored signal is with respect to human perception. Perceptual naturalness has been evaluated by humans and approximated by no-reference quality assessment methods [13, 14]. Recently, the popularity of generative adversarial networks (GANs) has motivated a formulation of perceptual naturalness [15].

Semantic quality metrics, which evaluate how “useful” the restored signal is, in the sense that it better serves the subsequent semantic-related analyses. For example, how well a classifier performs on the restored signal is a measure of the semantic quality. There are only a few studies about semantic quality assessment methods [16].
It is worth noting that signal fidelity metrics have dominated research on signal restoration. However, is a method optimized for signal fidelity also optimal for perceptual naturalness or semantic quality? This question was overlooked for a long while until recently. Blau and Michaeli considered signal fidelity and perceptual naturalness and concluded that the two metrics cannot both be optimized simultaneously [15]. Indeed, they provided a rigorous proof of the existence of the perception-distortion tradeoff: with distortion representing signal fidelity and perceptual difference representing perceptual naturalness, no signal restoration method can achieve both low distortion and low perceptual difference (up to a bound). This conclusion reveals a fundamental limit on the capability of signal restoration, and has inspired the adoption of perceptual naturalness metrics in related tasks [17, 18].
Following the work on the perception-distortion tradeoff, in this paper we consider the three groups of metrics jointly, i.e. we study the relation between signal fidelity, perceptual naturalness, and semantic quality. We take the classification error rate as the representative of semantic quality, because classification is the most fundamental semantic-related analysis. We find that there is indeed a tradeoff between the three metrics, which we name the classification-distortion-perception (CDP) tradeoff. In short, the CDP tradeoff states that the distortion, perceptual difference, and classification error rate cannot be made minimal simultaneously. Our proof indicates the essential difference between the three quality metrics. In practice, it suggests adopting semantic quality metrics instead of signal fidelity or perceptual naturalness metrics if a signal restoration method is meant to serve recognition purposes.
1.2 Problem Definition
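Consider the process $X \to Y \to \hat{X}$, where $X$ denotes the ideal “original” signal, $Y$ denotes the degraded signal, and $\hat{X}$ denotes the restored signal. We formulate $X$, $Y$, and $\hat{X}$ each as a discrete random variable. The cases of continuous random variables can be deduced in a similar manner, and thus are omitted hereafter. The probability mass function of $X$ is denoted by $p_X(x)$. The degradation model is denoted by $P_{Y|X}$, which is characterized by a conditional mass function $p(y|x)$. The restoration method is then denoted by $P_{\hat{X}|Y}$ and characterized by $p(\hat{x}|y)$.
We are interested in classifying the signal into two categories in this paper. Thus, we assume each sample of the original signal belongs to one of two classes: $\omega_1$ or $\omega_2$. The a priori probabilities and the conditional mass functions are assumed to be known as $P_1, P_2$ and $p_{X1}(x), p_{X2}(x)$, respectively. In other words, $X$ follows a two-component mixture model: $p_X(x) = P_1 p_{X1}(x) + P_2 p_{X2}(x)$. Accordingly, $Y$ follows the model: $p_Y(y) = P_1 p_{Y1}(y) + P_2 p_{Y2}(y)$, and $\hat{X}$ follows the model: $p_{\hat{X}}(\hat{x}) = P_1 p_{\hat{X}1}(\hat{x}) + P_2 p_{\hat{X}2}(\hat{x})$, where
(1)  $p_{Yi}(y) = \sum_{x} p(y|x)\, p_{Xi}(x), \quad i = 1, 2$
(2)  $p_{\hat{X}i}(\hat{x}) = \sum_{y} p(\hat{x}|y)\, p_{Yi}(y), \quad i = 1, 2$
A binary classifier with decision region $R$ can be denoted by
(3)  $c(t|R) = \begin{cases} \omega_1, & t \in R \\ \omega_2, & \text{otherwise} \end{cases}$
If we apply this classifier to the original signal $X$, we achieve an error rate
(4)  $\varepsilon(X|c) = \varepsilon(X|R) = P_2 \sum_{x \in R} p_{X2}(x) + P_1 \sum_{x \notin R} p_{X1}(x)$
The optimal classifier is defined as the classifier that achieves the minimal error rate for a given signal, e.g. $c^*_X = c(\cdot|R^*_X) = \arg\min_c \varepsilon(X|c)$ for $X$. According to the Bayes decision rule (see [19] for proof), the optimal classifier is given by
(5)  $R^*_X = \{x : P_1 p_{X1}(x) \ge P_2 p_{X2}(x)\}$
which leads to the minimal error rate, a.k.a. the Bayes error rate
(6)  $\varepsilon(X|c^*_X) = \sum_{x} \min\big(P_1 p_{X1}(x),\ P_2 p_{X2}(x)\big)$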
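The quantities in (4) to (6) can be checked numerically on a small alphabet. The following is a minimal sketch, not part of the paper: the function names, the three-symbol alphabet, and the toy mass functions are all illustrative assumptions, and ties in (5) are assigned to class 1.

```python
def error_rate(priors, cond, region):
    """Error rate (4) of the classifier that outputs class 1 on `region`, class 2 elsewhere."""
    (P1, P2), (p1, p2) = priors, cond
    return sum(P2 * p2[t] if t in region else P1 * p1[t] for t in range(len(p1)))


def bayes_region(priors, cond):
    """Decision region (5) of the Bayes classifier; ties are assigned to class 1."""
    (P1, P2), (p1, p2) = priors, cond
    return {t for t in range(len(p1)) if P1 * p1[t] >= P2 * p2[t]}


def bayes_error(priors, cond):
    """Bayes error rate (6): sum over symbols of the smaller weighted mass."""
    (P1, P2), (p1, p2) = priors, cond
    return sum(min(P1 * a, P2 * b) for a, b in zip(p1, p2))


# Hypothetical toy problem: three symbols, equal priors.
priors = (0.5, 0.5)
cond = ([0.6, 0.3, 0.1], [0.1, 0.3, 0.6])  # class-conditional mass functions
region = bayes_region(priors, cond)
print(region, error_rate(priors, cond, region), bayes_error(priors, cond))
```

In this toy case the error rate of the classifier built from (5) coincides with the closed form (6), and any other region, for example `{0}`, gives an error rate that is no smaller.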
1.3 Main Theorems
We prove two versions of the CDP tradeoff. For the first version, we consider using a predefined classifier on the restored signal. This leads to
Definition 1.
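The classification-distortion-perception (CDP) function is
(7)  $C(D, P) = \min_{P_{\hat{X}|Y}} \varepsilon(\hat{X}|c_0), \quad \text{subject to } \mathbb{E}[\Delta(X, \hat{X})] \le D, \ d(p_X, p_{\hat{X}}) \le P$
where $c_0 = c(\cdot|R_0)$ is the predefined classifier, $\mathbb{E}$ is to take expectation, $\Delta(\cdot, \cdot)$ is a function to measure distortion between the original and the restored signals, and $d(\cdot, \cdot)$ is a function to measure the difference between two probability mass functions, which is claimed to be indicative of perceptual difference [15].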
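To make the three terms in the CDP function concrete, the sketch below evaluates them for one given restoration method on a toy problem. It is an illustration under stated assumptions rather than the paper's procedure: the distortion is taken to be the Hamming (0/1) distortion, the perceptual difference is taken to be total variation, the original and restored signals share one alphabet, and the names `push` and `cdp_terms` are hypothetical. It only evaluates a candidate restoration; it does not perform the minimization.

```python
def push(p, channel):
    """Push mass function p through a channel with rows channel[a][b] = p(b | a)."""
    return [sum(p[a] * channel[a][b] for a in range(len(p)))
            for b in range(len(channel[0]))]


def cdp_terms(priors, cond_x, degrade, restore, region0):
    """Return (error rate of c0 on the restored signal, expected distortion, perceptual difference)."""
    P1, P2 = priors
    # Class-conditional mass functions of the restored signal: degrade, then restore.
    cond_xhat = [push(push(p, degrade), restore) for p in cond_x]
    p_x = [P1 * a + P2 * b for a, b in zip(*cond_x)]
    p_xhat = [P1 * a + P2 * b for a, b in zip(*cond_xhat)]
    # End-to-end channel p(xhat | x), used for the expected Hamming distortion.
    end_to_end = [push(row, restore) for row in degrade]
    n = len(p_x)
    distortion = sum(p_x[x] * end_to_end[x][xh]
                     for x in range(n) for xh in range(n) if x != xh)
    # Total variation between the mass functions of the original and restored signals.
    perception = 0.5 * sum(abs(a - b) for a, b in zip(p_x, p_xhat))
    # Error rate of the fixed classifier with decision region region0.
    error = sum(P2 * cond_xhat[1][t] if t in region0 else P1 * cond_xhat[0][t]
                for t in range(n))
    return error, distortion, perception


# Hypothetical toy problem: noiseless degradation, two candidate restorations.
priors = (0.5, 0.5)
cond_x = ([0.6, 0.3, 0.1], [0.1, 0.3, 0.6])
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
collapse = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]  # always outputs symbol 0
keep = cdp_terms(priors, cond_x, identity, identity, {0, 1})
flat = cdp_terms(priors, cond_x, identity, collapse, {0, 1})
print(keep, flat)
```

A restoration method is feasible for a pair of bounds when its second and third returned values do not exceed them; the CDP function is the smallest first value among feasible methods.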
Theorem 1.
Consider (7). If $d(p, q)$ is convex in $q$, then $C(D, P)$ is

monotonically non-increasing,

convex in $D$ and $P$.
Note that the convexity of the perceptual difference is assumed, which is claimed to be satisfied by a large number of commonly used difference functions, including any f-divergence (e.g. Kullback-Leibler divergence, total variation, Hellinger) and the Rényi divergence [20, 21].
For the second version, we consider using the optimal classifier on the restored signal, i.e. the classifier is adaptive to the restored signal. According to the Bayes decision rule, we are actually considering the Bayes error rate of $\hat{X}$. This leads to
Definition 2.
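The strong classification-distortion-perception (SCDP) function is
(8)  $S(D, P) = \min_{P_{\hat{X}|Y}} \varepsilon(\hat{X}|c^*_{\hat{X}}), \quad \text{subject to } \mathbb{E}[\Delta(X, \hat{X})] \le D, \ d(p_X, p_{\hat{X}}) \le P$
Theorem 2.
Consider (8). $S(D, P)$ is monotonically non-increasing.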
1.4 Paper Organization
In the following sections, we first give some properties of the classification error rate, especially the Bayes error rate, which will be helpful in our proofs of the main theorems. Then we prove the two theorems one by one. Discussion and conclusion are finally presented.
2 Properties of the Classification Error Rate
2.1 Classification Error Rate is Linear
Theorem 3.
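Let $X$ follow a two-component mixture model: $p_X(t) = P_1 p_{X1}(t) + P_2 p_{X2}(t)$, and similarly let $Y$ follow: $p_Y(t) = P_1 p_{Y1}(t) + P_2 p_{Y2}(t)$. Let $Z$ be the random variable with $p_{Zi}(t) = \lambda p_{Xi}(t) + (1 - \lambda) p_{Yi}(t)$, $i = 1, 2$, where $\lambda \in [0, 1]$. Let $c$ be a fixed classifier, then
(9)  $\varepsilon(Z|c) = \lambda\, \varepsilon(X|c) + (1 - \lambda)\, \varepsilon(Y|c)$
Proof.
As $c$ is a fixed classifier, it can be denoted in general by $c(\cdot|R)$. Then we have
(10)  $\varepsilon(X|c) = P_2 \sum_{t \in R} p_{X2}(t) + P_1 \sum_{t \notin R} p_{X1}(t)$
(11)  $\varepsilon(Y|c) = P_2 \sum_{t \in R} p_{Y2}(t) + P_1 \sum_{t \notin R} p_{Y1}(t)$
Thus
(12)  $\varepsilon(Z|c) = P_2 \sum_{t \in R} p_{Z2}(t) + P_1 \sum_{t \notin R} p_{Z1}(t) = \lambda\, \varepsilon(X|c) + (1 - \lambda)\, \varepsilon(Y|c)$
∎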
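The linearity in (9) can be checked numerically. The sketch below is illustrative only: the priors, mass functions, decision region, and mixing weight are arbitrary assumptions, and `mix` is a hypothetical helper that forms the class-conditional mass functions of the mixed variable.

```python
def error_rate(priors, cond, region):
    """Error rate of the classifier that outputs class 1 on `region`, class 2 elsewhere."""
    (P1, P2), (p1, p2) = priors, cond
    return sum(P2 * p2[t] if t in region else P1 * p1[t] for t in range(len(p1)))


def mix(cond_x, cond_y, lam):
    """Class-conditional mass functions of Z: lam * X-part + (1 - lam) * Y-part, per class."""
    return tuple([lam * a + (1 - lam) * b for a, b in zip(px, py)]
                 for px, py in zip(cond_x, cond_y))


priors = (0.3, 0.7)
cond_x = ([0.6, 0.3, 0.1], [0.1, 0.3, 0.6])
cond_y = ([0.2, 0.5, 0.3], [0.4, 0.4, 0.2])
region, lam = {0}, 0.25  # an arbitrary fixed classifier and mixing weight

lhs = error_rate(priors, mix(cond_x, cond_y, lam), region)
rhs = lam * error_rate(priors, cond_x, region) + (1 - lam) * error_rate(priors, cond_y, region)
print(lhs, rhs)
```

The two printed values agree up to floating-point rounding for any choice of `region` and `lam`, because the error rate of a fixed classifier is a linear functional of the class-conditional mass functions.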
2.2 Bayes Error Rate is Concave
Theorem 4.
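Let $X$, $Y$, and $Z$ be defined as in Theorem 3, then
(13)  $\varepsilon(Z|c^*_Z) \ge \lambda\, \varepsilon(X|c^*_X) + (1 - \lambda)\, \varepsilon(Y|c^*_Y)$
Proof.
$c^*_Z$ denotes the optimal classifier for $Z$, so the Bayes error rate of $Z$ is $\varepsilon(Z|c^*_Z)$. According to (9) we have $\varepsilon(Z|c^*_Z) = \lambda\, \varepsilon(X|c^*_Z) + (1 - \lambda)\, \varepsilon(Y|c^*_Z)$. Note that $\varepsilon(X|c^*_Z) \ge \varepsilon(X|c^*_X)$ and $\varepsilon(Y|c^*_Z) \ge \varepsilon(Y|c^*_Y)$, because $c^*_X$ and $c^*_Y$ are optimal for $X$ and $Y$ respectively. Thus $\varepsilon(Z|c^*_Z) \ge \lambda\, \varepsilon(X|c^*_X) + (1 - \lambda)\, \varepsilon(Y|c^*_Y)$. ∎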
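A small example shows that the inequality in (13) can be strict. This is a sketch with assumed toy numbers: two variables whose classes are each easy to separate, but in opposite directions, so that their equal mixture is not separable at all.

```python
def bayes_error(priors, cond):
    """Bayes error rate: sum over symbols of the smaller prior-weighted mass."""
    (P1, P2), (p1, p2) = priors, cond
    return sum(min(P1 * a, P2 * b) for a, b in zip(p1, p2))


def mix(cond_x, cond_y, lam):
    """Class-conditional mass functions of the mixed variable Z."""
    return tuple([lam * a + (1 - lam) * b for a, b in zip(px, py)]
                 for px, py in zip(cond_x, cond_y))


priors = (0.5, 0.5)
cond_x = ([0.9, 0.1], [0.1, 0.9])  # class 1 mostly emits symbol 0
cond_y = ([0.1, 0.9], [0.9, 0.1])  # class 1 mostly emits symbol 1
lam = 0.5

err_x = bayes_error(priors, cond_x)
err_y = bayes_error(priors, cond_y)
err_z = bayes_error(priors, mix(cond_x, cond_y, lam))
print(err_x, err_y, err_z)
```

Here the two individual Bayes error rates are about 0.1 each, while the mixture has identical class-conditional mass functions and therefore a Bayes error rate of about 0.5, well above the weighted average.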
2.3 Bayes Error Rate is Non-Decreasing
Theorem 5.
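Let the process of $Y \to \hat{X}$ be denoted by $P_{\hat{X}|Y}$, which is characterized by a conditional mass function $p(\hat{x}|y)$, then $\varepsilon(\hat{X}|c^*_{\hat{X}}) \ge \varepsilon(Y|c^*_Y)$, and $\varepsilon(\hat{X}|c^*_{\hat{X}}) = \varepsilon(Y|c^*_Y)$ if and only if $P_{\hat{X}|Y}$ satisfies: for every $\hat{x}$, $\{y : p(\hat{x}|y) > 0\} \subseteq R^+ \cup R^0$ or $\{y : p(\hat{x}|y) > 0\} \subseteq R^- \cup R^0$, where $R^+ = \{y : P_1 p_{Y1}(y) > P_2 p_{Y2}(y)\}$, $R^- = \{y : P_1 p_{Y1}(y) < P_2 p_{Y2}(y)\}$, and $R^0 = \{y : P_1 p_{Y1}(y) = P_2 p_{Y2}(y)\}$. Note that $R^+$ is slightly different from $R^*_Y$ defined as in (5), which also contains the ties in $R^0$.
Proof.
(14)  $\varepsilon(\hat{X}|c^*_{\hat{X}}) = \sum_{\hat{x}} \min\big(P_1 p_{\hat{X}1}(\hat{x}),\ P_2 p_{\hat{X}2}(\hat{x})\big) = \sum_{\hat{x}} \min\Big(\sum_{y} p(\hat{x}|y) P_1 p_{Y1}(y),\ \sum_{y} p(\hat{x}|y) P_2 p_{Y2}(y)\Big) \ge \sum_{\hat{x}} \sum_{y} p(\hat{x}|y) \min\big(P_1 p_{Y1}(y),\ P_2 p_{Y2}(y)\big) = \sum_{y} \min\big(P_1 p_{Y1}(y),\ P_2 p_{Y2}(y)\big) = \varepsilon(Y|c^*_Y)$
When $\varepsilon(\hat{X}|c^*_{\hat{X}}) = \varepsilon(Y|c^*_Y)$, for any $\hat{x}$, we need to have
(15)  $\min\Big(\sum_{y} p(\hat{x}|y) P_1 p_{Y1}(y),\ \sum_{y} p(\hat{x}|y) P_2 p_{Y2}(y)\Big) = \sum_{y} p(\hat{x}|y) \min\big(P_1 p_{Y1}(y),\ P_2 p_{Y2}(y)\big)$
which is equivalent to: the $y$’s that satisfy $p(\hat{x}|y) > 0$ shall either all have $P_1 p_{Y1}(y) \ge P_2 p_{Y2}(y)$ or all have $P_1 p_{Y1}(y) \le P_2 p_{Y2}(y)$. The condition is further equivalent to: the $y$’s that satisfy $p(\hat{x}|y) > 0$ shall be either all in $R^+ \cup R^0$, or all in $R^- \cup R^0$. In other words, $P_{\hat{X}|Y}$ satisfies the condition stated in the theorem. ∎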
We can compare Theorem 5 with the data processing theorem in information theory: consider the process of $Y \to \hat{X}$ as a deterministic function $\hat{X} = f(Y)$, then $H(\hat{X}) \le H(Y)$, and $H(\hat{X}) = H(Y)$ if and only if $f$ is invertible [22]. That is to say, the amount of information we have about the source is non-increasing after data processing. Similarly, Theorem 5 claims that the Bayes error rate is non-decreasing after data processing, because processing can lose information and at best preserves it. Moreover, the condition required in Theorem 5 is satisfied not only by invertible functions but also by a large group of non-invertible functions and probabilistic mappings, which is quite different from the data processing theorem. In other words, we may lose information, but that information loss may not affect classification.
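The contrast drawn above can be illustrated with two non-invertible deterministic mappings on a four-symbol alphabet. This is a sketch with assumed toy numbers; the helper names and the two mappings are hypothetical. The first mapping merges only symbols that lie on the same side of the Bayes decision boundary, the second merges symbols from opposite sides.

```python
def push(p, channel):
    """Push mass function p through a channel with rows channel[a][b] = p(b | a)."""
    return [sum(p[a] * channel[a][b] for a in range(len(p)))
            for b in range(len(channel[0]))]


def bayes_error(priors, cond):
    """Bayes error rate: sum over symbols of the smaller prior-weighted mass."""
    (P1, P2), (p1, p2) = priors, cond
    return sum(min(P1 * a, P2 * b) for a, b in zip(p1, p2))


priors = (0.5, 0.5)
cond_y = ([0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4])  # symbols 0, 1 favor class 1

# Deterministic, non-invertible mappings written as 0/1 channels.
same_side = [[1, 0], [1, 0], [0, 1], [0, 1]]  # merges {0, 1} and {2, 3}
across = [[1, 0], [0, 1], [0, 1], [1, 0]]     # merges {0, 3} and {1, 2}

err_y = bayes_error(priors, cond_y)
err_same = bayes_error(priors, [push(p, same_side) for p in cond_y])
err_across = bayes_error(priors, [push(p, across) for p in cond_y])
print(err_y, err_same, err_across)
```

Both mappings discard information, yet only the second one raises the Bayes error rate (from about 0.3 to about 0.5); the first one leaves it unchanged.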
3 Proof of the CDP Tradeoff (Theorem 1)
Proof.
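For the first point, simply note that when increasing $D$ or $P$, the feasible domain of $P_{\hat{X}|Y}$ is enlarged; as $C(D, P)$ is the minimal value of $\varepsilon(\hat{X}|c_0)$ over the feasible domain, and the feasible domain is enlarged, the minimal value will not increase.
For the second point, it is equivalent to prove:
(16)  $\lambda\, C(D_a, P_a) + (1 - \lambda)\, C(D_b, P_b) \ge C\big(\lambda D_a + (1 - \lambda) D_b,\ \lambda P_a + (1 - \lambda) P_b\big)$
for any $\lambda \in [0, 1]$. First, let $P_{\hat{X}_a|Y}$ (resp. $P_{\hat{X}_b|Y}$) denote the optimal restoration method under constraint $(D_a, P_a)$ (resp. $(D_b, P_b)$), and $\hat{X}_a$ (resp. $\hat{X}_b$) be the restored signal, i.e.
(17)  $P_{\hat{X}_a|Y} = \arg\min_{P_{\hat{X}|Y}} \varepsilon(\hat{X}|c_0), \quad \text{subject to } \mathbb{E}[\Delta(X, \hat{X})] \le D_a, \ d(p_X, p_{\hat{X}}) \le P_a$
(18)  $P_{\hat{X}_b|Y} = \arg\min_{P_{\hat{X}|Y}} \varepsilon(\hat{X}|c_0), \quad \text{subject to } \mathbb{E}[\Delta(X, \hat{X})] \le D_b, \ d(p_X, p_{\hat{X}}) \le P_b$
Then the left hand side of (16) becomes
(19)  $\lambda\, \varepsilon(\hat{X}_a|c_0) + (1 - \lambda)\, \varepsilon(\hat{X}_b|c_0) = \varepsilon(\hat{X}_\lambda|c_0)$
where we have used Theorem 3 and $\hat{X}_\lambda$ denotes the restored signal corresponding to $P_{\hat{X}_\lambda|Y} = \lambda P_{\hat{X}_a|Y} + (1 - \lambda) P_{\hat{X}_b|Y}$. Let $D_\lambda = \mathbb{E}[\Delta(X, \hat{X}_\lambda)]$, $P_\lambda = d(p_X, p_{\hat{X}_\lambda})$, then by definition
(20)  $\varepsilon(\hat{X}_\lambda|c_0) \ge C(D_\lambda, P_\lambda)$
Next, as $d(\cdot, \cdot)$ in (7) is convex in its second argument, we have
(21)  $P_\lambda = d\big(p_X,\ \lambda p_{\hat{X}_a} + (1 - \lambda) p_{\hat{X}_b}\big) \le \lambda\, d(p_X, p_{\hat{X}_a}) + (1 - \lambda)\, d(p_X, p_{\hat{X}_b}) \le \lambda P_a + (1 - \lambda) P_b$
the last inequality is due to (17) and (18). Similarly, we have
(22)  $D_\lambda = \mathbb{E}[\Delta(X, \hat{X}_\lambda)] = \lambda\, \mathbb{E}[\Delta(X, \hat{X}_a)] + (1 - \lambda)\, \mathbb{E}[\Delta(X, \hat{X}_b)] \le \lambda D_a + (1 - \lambda) D_b$
the last inequality is again due to (17) and (18). Finally, note that $C(D, P)$ is non-increasing with respect to $D$ and $P$,
(23)  $C(D_\lambda, P_\lambda) \ge C\big(\lambda D_a + (1 - \lambda) D_b,\ \lambda P_a + (1 - \lambda) P_b\big)$
Combining (19), (20), and (23) gives (16). ∎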
4 Proof of the CDP Tradeoff (Theorem 2)
Proof.
Simply note that when increasing $D$ or $P$, the feasible domain of $P_{\hat{X}|Y}$ is enlarged; as $S(D, P)$ is the minimal value of $\varepsilon(\hat{X}|c^*_{\hat{X}})$ over the feasible domain, and the feasible domain is enlarged, the minimal value will not increase. ∎
5 Discussion and Conclusion
We would like to mention the difference between Theorem 1 and Theorem 2. Theorem 2 is more fundamental as it deals with the theoretically minimal error rate of the restored signal. However, in practice this error rate is not achievable if the degradation model is unknown. Clearly, if $P_{Y|X}$ is not available, we cannot draw any meaningful conclusion regarding the mass function $p_{\hat{X}}(\hat{x})$ of the restored signal, which prohibits the search for the optimal restoration method together with the optimal classifier. From a practical perspective, we usually adopt a fixed classifier (for example, a classifier trained on samples of the original signal) and adjust the restoration method only. On the other hand, if the degradation model is known, then it is possible to consider the optimal classifier for the degraded signal directly: in theory it is better to consider $c^*_Y$ instead of $c^*_{\hat{X}}$ because we have confirmed that $\varepsilon(Y|c^*_Y) \le \varepsilon(\hat{X}|c^*_{\hat{X}})$ (Theorem 5). In other words, signal restoration cannot improve the classification accuracy as long as the degradation model is known. According to these analyses, Theorem 2 is less appealing in practice.
Note that we do not prove the convexity of the strong CDP function, as we have done for the CDP function in Theorem 1. This is due to the essential difference between the classification error rate with a fixed classifier and the Bayes error rate: the former is linear and the latter is concave (Theorems 3 and 4). Note that the distortion is also linear but the perceptual difference is convex. We suspect the strong CDP function may not be convex, which is to be confirmed in the future.
Our findings can be useful especially for computer vision research where some low-level vision tasks (signal restoration) serve high-level vision tasks (visual understanding). If the degradation model is known, we recommend directly classifying the degraded signal without any restoration; in this case the classifier can be trained on samples simulated by applying the known degradation to the original signal. If the degradation model is unknown, we recommend using a fixed classifier, which can be trained for example on samples of the original signal; meanwhile, we recommend searching for the restoration method with the classification error rate as the objective (or one of the objectives) of optimization. This strategy is clearly different from previous works that optimize various kinds of distortion metrics in order to improve the classification performance.
References
[1] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in ECCV, 2014, pp. 184–199.
[2] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
[3] S. Su, M. Delbracio, J. Wang, G. Sapiro, W. Heidrich, and O. Wang, “Deep video deblurring for hand-held cameras,” in CVPR, 2017, pp. 1279–1288.
[4] M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand, “Deep bilateral learning for real-time image enhancement,” ACM Transactions on Graphics, vol. 36, no. 4, p. 118, 2017.
[5] C. Dong, Y. Deng, C. C. Loy, and X. Tang, “Compression artifacts reduction by a deep convolutional network,” in ICCV, 2015, pp. 576–584.
[6] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, “Generative image inpainting with contextual attention,” in CVPR, 2018, pp. 5505–5514.
[7] Q. Lu, W. Zhou, L. Fang, and H. Li, “Robust blur kernel estimation for license plate images from fast moving vehicles,” IEEE Transactions on Image Processing, vol. 25, no. 5, pp. 2311–2323, 2016.
[8] H. Kuang, X. Zhang, Y.-J. Li, L. L. H. Chan, and H. Yan, “Nighttime vehicle detection based on bio-inspired image enhancement and weighted score-level feature fusion,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 4, pp. 927–936, 2017.
[9] J. Shermeyer and A. Van Etten, “The effects of super-resolution on object detection performance in satellite imagery,” arXiv preprint arXiv:1812.04098, 2018.
[10] R. G. VidalMata, S. Banerjee, B. RichardWebster et al., “Bridging the gap between computational photography and visual recognition,” arXiv preprint arXiv:1901.09482, 2019.
[11] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[12] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in ECCV, 2016, pp. 694–711.
[13] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, 2012.
[14] M. A. Saad, A. C. Bovik, and C. Charrier, “Blind image quality assessment: A natural scene statistics approach in the DCT domain,” IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3339–3352, 2012.
[15] Y. Blau and T. Michaeli, “The perception-distortion tradeoff,” in CVPR, 2018, pp. 6228–6237.
[16] D. Liu, D. Wang, and H. Li, “Recognizable or not: Towards image semantic quality assessment for compression,” Sensing and Imaging, vol. 18, no. 1, pp. 1–20, 2017.
[17] Y. Blau, R. Mechrez, R. Timofte, T. Michaeli, and L. Zelnik-Manor, “The 2018 PIRM challenge on perceptual image super-resolution,” in ECCV, 2018, pp. 1–22.
[18] T. Vu, C. Van Nguyen, T. X. Pham, T. M. Luu, and C. D. Yoo, “Fast and efficient image quality enhancement via desubpixel convolutional neural networks,” in ECCV, 2018, pp. 1–17.
[19] K. Fukunaga, Introduction to Statistical Pattern Recognition (2nd Edition). San Diego, CA, USA: Academic Press, 1990, ch. 3.1, pp. 51–65.
[20] I. Csiszár and P. C. Shields, “Information theory and statistics: A tutorial,” Foundations and Trends® in Communications and Information Theory, vol. 1, no. 4, pp. 417–528, 2004.
[21] T. Van Erven and P. Harremoës, “Rényi divergence and Kullback-Leibler divergence,” IEEE Transactions on Information Theory, vol. 60, no. 7, pp. 3797–3820, 2014.
[22] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.