Statistical analysis based on lossy or incomplete data has attracted increasing attention in machine learning and information theory. For instance, quantization is routinely required to store and process signals on digital devices. Quantization maps measurements from a large set (often an uncountably infinite set) to values in a smaller set, and the resulting values are called quantized samples. A fundamental research problem is how to make optimal statistical inferences based on quantized samples. This problem is challenging because, in addition to measurement errors, quantized samples suffer from information loss due to the so-called quantization errors. Traditional theory and methods account only for measurement errors and hence are not valid in the quantization setting.
In recent years, researchers have made steady progress in signal recovery based on quantized linear measurements; see, for example, Boufounos and Baraniuk (2008); Gupta et al. (2010); Gopi et al. (2013); Plan and Vershynin (2013); Zhang et al. (2014); Slawski and Li (2015); Zhu and Gu (2015); Slawski and Li (to appear). In particular, Slawski and Li (2015) and Slawski and Li (to appear) proposed feasible algorithms for compressed sensing based on b-bit measurements with theoretical guarantees. However, most existing work in this direction has focused only on estimation. For instance, Meinicke and Ritter (2002); Chen and Varshney (2010); Zhu and Lafferty (2014, 2018) proposed optimal procedures for estimating a nonparametric function when the number of measurement bits is constrained. In contrast, research on statistical inference based on quantized data is quite limited. To the best of our knowledge, the literature on nonparametric testing under quantization is still missing. The aim of this paper is to fill this gap by proposing a conceptually simple but asymptotically valid nonparametric testing method under restricted measurement bits and by establishing its minimax optimality. In particular, our test achieves the minimax rate of testing in the sense of Ingster (1993). A concrete quantization scheme is then designed to achieve such minimaxity. Our work can be viewed as an extension of traditional nonparametric inference (Fan et al., 2001; Shang and Cheng, 2013; Cheng and Shang, 2015; Shang and Cheng, 2015) to the quantization setting, shedding some light on the possibility of optimal statistical testing with compressed resources.
The rest of the paper is organized as follows. Section 2 gives a brief review of classical smoothing spline regression. In Section 3, we propose a b-bit nonparametric estimator and the corresponding test statistic. In Section 4, we first establish a nonasymptotic mean square error (MSE) bound for the proposed b-bit estimator, followed by its asymptotic convergence rate. We then investigate the asymptotic normality and the power of the proposed test statistic, which is shown to attain minimax optimality under certain concrete quantization designs. Simulation examples in Section 5 demonstrate the finite sample performance of our methods, and a real data analysis is presented in Section 6. Technical proofs are collected in a separate supplementary document.
2 Classical Smoothing Spline Regression
In this section, we review classical smoothing spline regression. Consider samples generated from the following nonparametric model:
where are iid
, and a standard deviation, and belongs to an -order () periodic Sobolev space defined by
where for , , . Throughout this paper, we assume that for any .
In classical smoothing spline (ss) regression, is estimated through the following optimization problem:
It follows from Wahba (1990) that endowed with inner product is a reproducing kernel Hilbert space (RKHS). Let be the corresponding reproducing kernel function and define for any . It is well known that (see, e.g., Gu (2013)) has an explicit expression
where is the Bernoulli polynomial of order . Clearly, is an element of for any . By the representer theorem, has an explicit form
where with and being the -dimensional scaled kernel matrix. Define , where
is the “tensor” of. The matrices and will be used jointly in Section 3 to construct both the data-driven estimation procedure and our test statistic.
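As a minimal sketch of this classical fit, the Python code below builds a periodic-spline kernel from a Bernoulli polynomial and solves the representer-theorem system on the design points. It assumes the kernel form K(s, t) = 1 + (-1)^(m-1) B_{2m}(|s - t|)/(2m)! on [0, 1], the equally spaced design x_i = i/n, and an objective of the form (1/n) Σ (y_i - f(x_i))² + λ‖f‖², whose minimizer has coefficients c = (K + nλI)^{-1} y with the unscaled kernel matrix K; the exact scaling used in (2.2) may differ. The helper names and the test function are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

import numpy as np


def bernoulli_numbers(n):
    """Exact Bernoulli numbers B_0, ..., B_n (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for k in range(1, n + 1):
        # Recurrence: sum_{j=0}^{k} C(k+1, j) B_j = 0, solved for B_k.
        B.append(-sum(comb(k + 1, j) * B[j] for j in range(k)) / (k + 1))
    return B


def bernoulli_poly(n, x):
    """Bernoulli polynomial B_n(x) = sum_k C(n, k) B_k x^(n-k), elementwise."""
    B = bernoulli_numbers(n)
    x = np.asarray(x, dtype=float)
    return sum(float(comb(n, k) * B[k]) * x ** (n - k) for k in range(n + 1))


def periodic_kernel(s, t, m=2):
    """Assumed kernel K(s, t) = 1 + (-1)^(m-1) B_2m(|s - t|) / (2m)! for s, t in [0, 1]."""
    d = np.abs(np.subtract.outer(np.asarray(s, float), np.asarray(t, float)))
    return 1.0 + (-1) ** (m - 1) * bernoulli_poly(2 * m, d) / factorial(2 * m)


def fit_smoothing_spline(x, y, lam, m=2):
    """Hypothetical helper: f_hat = sum_i c_i K(., x_i) with c = (K + n*lam*I)^{-1} y."""
    n = len(x)
    K = periodic_kernel(x, x, m)
    c = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda x_new: periodic_kernel(np.atleast_1d(x_new), x, m) @ c


# Usage with a hypothetical truth f(x) = sin(2*pi*x) and N(0, 0.5^2) errors.
rng = np.random.default_rng(0)
n = 200
x = np.arange(1, n + 1) / n
f_true = np.sin(2 * np.pi * x)
y = f_true + 0.5 * rng.standard_normal(n)
f_hat = fit_smoothing_spline(x, y, lam=1e-5)
K = periodic_kernel(x, x)
print("MSE:", np.mean((f_hat(x) - f_true) ** 2))
```

On the equally spaced design the kernel matrix depends only on (i - j) mod n, so it is symmetric circulant; this is the structure that makes exact trigonometric calculations possible in the technical proofs.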
3 b-bit Smoothing Spline Regression
In reality, storing the samples exactly may require infinitely many measurement bits. When measurement bits are limited, only coarsely quantized samples are available, in which case becomes infeasible. In this section, we propose an estimator of based on quantized samples and subsequently construct a test statistic for hypothesis testing.
3.1 b-bit Nonparametric Estimator
Suppose that b bits are available for data processing; then the continuous variables ’s can be discretized into at most 2^b distinct values. Consider a quantizer defined as
where quantized values are real constants, for and , . Here, the ’s form a partition of the real line with assigned marks ’s.
Suppose that the b-bit samples ’s are obtained through the quantizer in the sense that
Based on these new samples , we consider a b-bit (bb) estimation procedure
Similarly to (2.2), has an explicit expression
In practice, several tuning parameters need to be specified. For the quantization scheme, one can choose , , with being the -th order statistic of and are chosen to be equally spaced grid points within the interval . Given , we propose two choices for the representatives : (i) either choosing , and for , or (ii)
if the denominator is nonzero and setting otherwise. The design in (3.3) is optimal in the sense that it minimizes the information loss, as discussed in detail in Section 4. Finally, can be selected by minimizing the generalized cross validation (GCV) score
being the Euclidean norm of vectors.
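The practical choices above can be sketched as follows. The sketch assumes thresholds equally spaced between the smallest and largest sample (so that the two outer cells are unbounded), within-cell sample averages as representatives as in option (ii), a midpoint fallback for empty cells (a hypothetical choice, not necessarily the value used in (3.3)), and the standard Craven–Wahba form of GCV, n^{-1}‖(I - A(λ))ỹ‖² / (1 - tr A(λ)/n)², with smoother matrix A(λ) = K(K + nλI)^{-1}. The kernel is an assumed order m = 2 periodic-spline kernel built from B_4; all function names are hypothetical.

```python
import numpy as np


def periodic_kernel_m2(s, t):
    """Assumed order-2 periodic-spline kernel: 1 - B_4(|s - t|) / 4!."""
    d = np.abs(np.subtract.outer(s, t))
    return 1.0 - (d**4 - 2 * d**3 + d**2 - 1.0 / 30.0) / 24.0


def quantize(y, b):
    """b-bit quantizer (b >= 2): thresholds t[0] = min(y) < ... < t[-1] = max(y),
    equally spaced, giving 2**b cells (-inf, t[0]], (t[0], t[1]], ..., (t[-1], inf)."""
    t = np.linspace(y.min(), y.max(), 2**b - 1)
    cells = np.searchsorted(t, y, side="left")  # index of the cell containing each y_i
    edges = np.concatenate(([t[0]], t, [t[-1]]))  # finite stand-ins for -inf and +inf
    marks = np.empty(2**b)
    for k in range(2**b):
        in_cell = cells == k
        if in_cell.any():
            marks[k] = y[in_cell].mean()  # option (ii): within-cell average
        else:
            marks[k] = 0.5 * (edges[k] + edges[k + 1])  # hypothetical fallback
    return marks[cells]


def gcv_score(K, y, lam):
    """Craven-Wahba GCV score for the smoother A = K (K + n*lam*I)^{-1}."""
    n = len(y)
    A = np.linalg.solve(K + n * lam * np.eye(n), K)  # equals K (K + n*lam*I)^{-1}
    return np.mean((y - A @ y) ** 2) / (1.0 - np.trace(A) / n) ** 2


def fit_bbit(x, y, b, grid):
    """Quantize y with b bits, pick lam by GCV on the quantized data, return fitted values."""
    y_q = quantize(y, b)
    K = periodic_kernel_m2(x, x)
    lam = grid[int(np.argmin([gcv_score(K, y_q, g) for g in grid]))]
    c = np.linalg.solve(K + len(x) * lam * np.eye(len(x)), y_q)
    return K @ c, y_q, lam


# Usage with hypothetical data.
rng = np.random.default_rng(0)
n = 200
x = np.arange(1, n + 1) / n
y = np.sin(2 * np.pi * x) + 0.5 * rng.standard_normal(n)
grid = 10.0 ** np.arange(-9, -1)
f_hat, y_q, lam = fit_bbit(x, y, b=4, grid=grid)
print(len(np.unique(y_q)), "distinct quantized values; GCV lambda =", lam)
```

Because every representative is a within-cell average, the quantized samples keep the overall sample mean of the raw data, which is one simple way to see why this choice loses less information than fixed marks.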
3.2 b-bit Nonparametric Testing
In this section, we propose a b-bit statistic for testing the following hypothesis
versus the nonparametric alternative
where is a known target function. Such a test may be useful in applications where the expected signal process is known. For example, testing shows whether the observed process ’s is pure noise (using only the quantized samples ’s). Alternatively, can be the signal process of a normally functioning machine obtained from historical data, in which case testing reveals whether the machine is working properly.
Let represent the -norm, i.e., . A natural test statistic for (3.5) can then be based on the distance
where is the b-bit estimator under a certain quantization scheme and . Intuitively, measures the closeness of and , and is rejected if takes a large value. Our goal is to construct a valid test statistic based on the quantized samples given by (3.1) and to analyze its asymptotic power. To design a valid testing rule, we derive the asymptotic null distribution of . In Theorem 4, we show that, under mild conditions and under ,
where is the -percentile of the standard Gaussian distribution. We reject (3.5) if and only if
Since the population variance in (3.8) is unavailable in practice, we suggest replacing it with the empirical variance, i.e., with .
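The sketch below illustrates one way to standardize the squared distance when the estimator is a linear smoother applied to the centered quantized samples r_i = ỹ_i - f_0(x_i). It assumes that, under the null, T = n^{-1}‖Ar‖² has mean σ² tr(A²)/n and variance 2σ⁴ tr(A⁴)/n², the standard moments of a Gaussian quadratic form; the exact centering and scaling in (3.7)-(3.8) may differ. The population variance is replaced by the empirical variance of the centered quantized samples, and the test rejects when the standardized statistic exceeds the upper α-quantile of N(0, 1). Function names, the crude quantizer, and the data are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def periodic_kernel_m2(s, t):
    """Assumed order-2 periodic-spline kernel: 1 - B_4(|s - t|) / 4!."""
    d = np.abs(np.subtract.outer(s, t))
    return 1.0 - (d**4 - 2 * d**3 + d**2 - 1.0 / 30.0) / 24.0


def l2_test(x, y_q, f0, lam, alpha=0.05):
    """Hypothetical standardized test of H0: f = f0 from quantized samples y_q.
    Returns (z, reject)."""
    n = len(x)
    K = periodic_kernel_m2(x, x)
    A = np.linalg.solve(K + n * lam * np.eye(n), K)  # symmetric smoother matrix
    r = y_q - f0(x)  # quantized samples centered at the null function
    T = np.mean((A @ r) ** 2)  # ||f_hat - f0||_n^2 when f_hat = f0 + A r on the design
    sigma2 = np.var(r, ddof=1)  # empirical variance replacing the population one
    A2 = A @ A
    mean_T = sigma2 * np.trace(A2) / n  # assumed null mean of T
    sd_T = sigma2 * np.sqrt(2.0 * np.trace(A2 @ A2)) / n  # assumed null s.d. (Gaussian form)
    z = (T - mean_T) / sd_T
    return z, bool(z > NormalDist().inv_cdf(1.0 - alpha))


def crude_quantize(y, b):
    """Illustration only: 2**b equal-width cells over the sample range, midpoint marks."""
    edges = np.linspace(y.min(), y.max(), 2**b + 1)
    idx = np.digitize(y, edges[1:-1])  # cell index in 0, ..., 2**b - 1
    return 0.5 * (edges[idx] + edges[idx + 1])


def zero(t):
    return np.zeros_like(t)


rng = np.random.default_rng(1)
n = 200
x = np.arange(1, n + 1) / n
y_null = crude_quantize(0.5 * rng.standard_normal(n), b=3)
y_alt = crude_quantize(np.sin(2 * np.pi * x) + 0.5 * rng.standard_normal(n), b=3)
z_null, reject_null = l2_test(x, y_null, zero, lam=1e-4)
z_alt, reject_alt = l2_test(x, y_alt, zero, lam=1e-4)
print(z_null, reject_null)
print(z_alt, reject_alt)
```

Under the alternative the signal inflates the empirical variance as well as T, so this plug-in choice is conservative; it is kept here only because it mirrors the replacement suggested above.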
The testing procedure defined in (3.8) performs as well as the testing procedure based on the original samples and achieves the optimal rate of testing, provided that the tuning parameters are well chosen. These results are stated in Theorems 4 and 5 in Section 4.
In some applications, one may not fully know in (3.5) but can only assume that it resides in a parametric family. For example, one may be interested in testing the linearity of , i.e., . In this case, one can simply obtain a least squares estimator based on the quantized samples and replace in (3.7) with . Theorems 4 and 5 remain valid with minor but tedious modifications.
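For the linearity example, the least squares step on quantized samples can be sketched as follows. The linear family a + bx, the crude rounding quantizer, and the helper name fit_linear_null are illustrative assumptions; the returned callable would play the role of the estimated null function in the test statistic.

```python
import numpy as np


def fit_linear_null(x, y_q):
    """Least squares fit of f0(x) = a + b*x to quantized samples y_q."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y_q, rcond=None)
    return coef, (lambda t: coef[0] + coef[1] * np.asarray(t, dtype=float))


rng = np.random.default_rng(2)
n = 200
x = np.arange(1, n + 1) / n
y = 1.0 + 2.0 * x + 0.2 * rng.standard_normal(n)  # hypothetical linear truth
y_q = np.round(4 * y) / 4  # crude quantization to a grid of width 0.25
coef, f0_hat = fit_linear_null(x, y_q)
print(coef)
```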
4 Asymptotic Theory
In this section, we present several asymptotic results for the b-bit estimator and the test statistic. For simplicity, we assume that the quantization parameters and are both nonrandom constants. Extensions to the random case can be accomplished by more cumbersome arguments.
4.1 Optimal Rate of Convergence
The following theorem shows that the difference between and can be well controlled by carefully choosing the quantization parameters and .
For any and , it holds that
It can be shown that the solution to (4.2) is
Let denote the quantization estimator corresponding to . Let be the true function that generates the samples under model (2.1). We now establish a nonasymptotic upper bound for the MSE .
For any , , and , it holds that
where , with
Theorem 2 provides a nonasymptotic error bound for . The bound consists of two parts: the MSE of the original smoothing spline estimator and . The latter can be viewed as the error resulting from quantization. In the extreme case where and , i.e., when the quantizer becomes sufficiently dense, tends to zero and the problem reduces to the classical nonparametric estimation setting.
Following Theorem 2, Corollary 3 states that, under regularity conditions on the quantizer , the proposed quantization estimator performs as well as the original smoothing spline estimator, in the sense that the MSE of the former does not exceed that of the latter. This suggests that a suitable quantization scheme with only a few measurement bits can indeed preserve estimation optimality.
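The role of the representatives and of the quantizer density can be illustrated numerically. For a fixed partition, the within-cell average minimizes the squared quantization error within each cell (the centroid condition familiar from Lloyd–Max quantization), and refining the partition cannot increase that error. The Monte Carlo sketch below assumes standard Gaussian measurements, 2^b - 1 equally spaced interior thresholds on [-3, 3], and cell midpoints as the competing representatives; it illustrates the qualitative behavior only and does not compute the quantization term in Theorem 2.

```python
import numpy as np


def quantization_mse(y, thresholds, centroid=True):
    """Mean squared quantization error for a fixed partition.
    Cell 0 = (-inf, t[0]], cell k = (t[k-1], t[k]], last cell = (t[-1], inf)."""
    cells = np.searchsorted(thresholds, y, side="left")
    edges = np.concatenate(([thresholds[0]], thresholds, [thresholds[-1]]))
    y_q = np.empty_like(y)
    for k in range(len(thresholds) + 1):
        in_cell = cells == k
        if not in_cell.any():
            continue
        if centroid:
            y_q[in_cell] = y[in_cell].mean()  # within-cell average
        else:
            # cell midpoint; the outer cells use their single finite endpoint
            y_q[in_cell] = 0.5 * (edges[k] + edges[k + 1])
    return float(np.mean((y - y_q) ** 2))


rng = np.random.default_rng(3)
y = rng.standard_normal(100_000)  # hypothetical N(0, 1) measurements
results = {}
for b in range(1, 7):
    t = np.linspace(-3.0, 3.0, 2**b + 1)[1:-1]  # nested grids as b increases
    results[b] = (quantization_mse(y, t, True), quantization_mse(y, t, False))
    print(b, results[b])
```

For b = 1 the centroid error is close to 1 - 2/π, the known value for a sign quantizer with optimal marks, while the midpoint rule (here the single threshold 0) loses all the information.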
Suppose that, as , satisfies , where is a constant. Furthermore, satisfies , , and that , , are all of order . Then .
Remark 2 further provides a concrete construction of such a scheme that achieves optimal estimation.
We provide an example of a quantization scheme with bits that yields estimation optimality. Suppose . Then (Wahba, 1990). Consider for and a uniform quantizer for a positive . Suppose while with provided in Corollary 3; then satisfy the conditions of Corollary 3, and so . In particular, if , then we need to achieve optimality. Recalling , we need bits to maintain optimality. Moreover, if has exponentially decaying tails, i.e., as , for a positive , then a counterpart of Corollary 3 holds as well. In such a scenario, one can construct a uniform quantizer with and , such that and satisfy .
4.2 Optimal Rate of Testing
Throughout this section we assume that the samples satisfy the following centralization condition:
Condition (C): .
Condition (C) means that for each , i.e.,
is centered at zero under the null hypothesis. An example satisfying Condition (C) is the choice for , where are defined by (4.3).
In the following theorems we let .
Suppose that , Condition (C) holds, and as tends to infinity, the following Rate Condition holds
Then under ,
Overall, the conditions are rather mild; see Remark 3 for more details. The only assumption that requires further discussion concerns , which is deferred to Proposition 1 below. Complementing Theorem 4, Proposition 1 asserts that this condition holds when and the ’s satisfy the following boundedness condition
Condition (B): for .
In particular, satisfy Condition (B).
Suppose that Condition (B) holds, and . Furthermore, for . Then we have that .
We now examine the power of the proposed testing method. For simplicity, we consider Gaussian regression, i.e., are iid standard Gaussian variables. The results can be naturally extended to more general situations, such as errors with sub-Gaussian or sub-exponential tails, with more tedious technical arguments. Let be a fixed constant and . Define
Theorem 5 below says that, under regularity conditions, our test can achieve arbitrarily high power provided that and are separated by a sufficiently large multiple of the rate . The additional Rate Condition (R2) needed to prove this theorem is easy to verify; see Remark 3 for more details.
Suppose Conditions (B), (C) and (R1) are satisfied. Furthermore, the following Rate Condition holds
Condition (R2): , .
Then for any , there exist positive constants and such that for any ,
where is the “empirical” norm of based on the design points.
The separation rate consists of two parts. The first part results from the variance of (under ) and the squared bias . This component serves as the separation rate of the classical nonparametric testing problem; see Shang and Cheng (2013); Cheng and Shang (2015); Shang and Cheng (2015, 2017). The additional part comes from the quantization error. Indeed, when the quantizer becomes sufficiently dense in the sense that , reduces to the classical separation rate.
When , the separation rate satisfies . The sum of the first two terms inside the above square root is minimized when . Therefore, if , , the minimax rate of testing. Hence, our test is minimax optimal in the sense of Ingster (1993) under a proper quantization scheme.
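To make the minimization step explicit, assume, as in the classical setting of Shang and Cheng (2013), that the first two terms take the form λ + (nλ^{1/(4m)})^{-1}; this functional form is an assumption of the derivation below rather than a restatement of the display above.

```latex
\begin{aligned}
g(\lambda) &= \lambda + \frac{1}{n\,\lambda^{1/(4m)}}, \qquad
g'(\lambda) = 1 - \frac{1}{4mn}\,\lambda^{-(4m+1)/(4m)} = 0
\;\Longrightarrow\;
\lambda^{*} = (4mn)^{-4m/(4m+1)} \asymp n^{-4m/(4m+1)},\\
g(\lambda^{*}) &\asymp n^{-4m/(4m+1)} + n^{-1}\,n^{1/(4m+1)} \asymp n^{-4m/(4m+1)},
\qquad \sqrt{g(\lambda^{*})} \asymp n^{-2m/(4m+1)}.
\end{aligned}
```

For m = 2 this gives λ ≍ n^{-8/9} and a separation rate of order n^{-4/9}, compared with λ ≍ n^{-4/5} for estimation.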
Theorem 5 indicates a concrete quantizer of bits that yields testing optimality. To see this, assume that and , and for . This scheme guarantees that our testing method is optimal, as indicated by Theorem 5 and Remark 3. Then it can be seen that , leading to . Together with the convention , we have . That is, only bits are needed for our test to be optimal. In practice, one can simply choose and as the minimum and maximum of the samples. Such a choice satisfies Condition (R2) provided that the error is sub-Gaussian.
In this section, we evaluate the finite sample performance of our methods through a simulation study. In Section 5.1, we demonstrate the performance of the quantization estimator defined in (3.2). In Section 5.2, we evaluate the performance of our testing procedure. Three simulation settings were considered to evaluate the MSE of the estimator and the size and power of the test, based on independent replications. We considered the periodic Sobolev space of order with kernel function , where is the Bernoulli polynomial of order . Measurement bits were chosen as . We used a uniform quantization scheme that divides the real line into segments, with the middle intervals forming an equal-length partition of the data range. We also compared our quantization results with those based on , which we call the “nonquantization” results.
5.1 Estimation Performance
We generated data from model for with sample size and examined two types of errors: (1) ; (2) . The MSEs of and are compared to demonstrate the impact of quantization, with chosen through GCV defined in (3.4). Results are summarized in Figure 1, where the MSEs clearly decrease as increases in all considered settings. Moreover, always has a smaller MSE than , and the gap between the two MSEs tends to zero as increases. This is consistent with our theory, which says that increasing diminishes the quantization error so that the quantization estimator becomes more accurate.
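A scaled-down version of this experiment can be sketched as follows. It assumes a hypothetical truth f(x) = sin(2πx) with N(0, 0.5²) errors on the design x_i = i/n, an order m = 2 periodic-spline kernel, the uniform quantizer described in the simulation setup with within-cell averages as representatives, and λ chosen by GCV over a coarse grid. It reports Monte Carlo averages of the MSE for several values of b and for the unquantized data; none of these settings are the exact settings behind Figure 1.

```python
import numpy as np


def kernel_m2(s, t):
    """Assumed order-2 periodic-spline kernel: 1 - B_4(|s - t|) / 4!."""
    d = np.abs(np.subtract.outer(s, t))
    return 1.0 - (d**4 - 2 * d**3 + d**2 - 1.0 / 30.0) / 24.0


def quantize_uniform(y, b):
    """2**b cells: (-inf, min], (max, inf), and equal-length cells splitting [min, max];
    each sample is replaced by the average of its cell."""
    t = np.linspace(y.min(), y.max(), 2**b - 1)
    cells = np.searchsorted(t, y, side="left")
    y_q = np.empty_like(y)
    for k in np.unique(cells):
        y_q[cells == k] = y[cells == k].mean()
    return y_q


def gcv_fit(K, y, grid):
    """Fitted values at the design points, with lam minimizing the GCV score over grid."""
    n = len(y)
    best_score, best_fit = np.inf, None
    for lam in grid:
        A = np.linalg.solve(K + n * lam * np.eye(n), K)  # smoother matrix
        score = np.mean((y - A @ y) ** 2) / (1.0 - np.trace(A) / n) ** 2
        if score < best_score:
            best_score, best_fit = score, A @ y
    return best_fit


rng = np.random.default_rng(4)
n, reps = 200, 10
x = np.arange(1, n + 1) / n
f_true = np.sin(2 * np.pi * x)  # hypothetical truth
K = kernel_m2(x, x)
grid = 10.0 ** np.arange(-9, -1)
mse = {"full": [], 2: [], 4: [], 8: []}
for _ in range(reps):
    y = f_true + 0.5 * rng.standard_normal(n)
    mse["full"].append(np.mean((gcv_fit(K, y, grid) - f_true) ** 2))
    for b in (2, 4, 8):
        y_q = quantize_uniform(y, b)
        mse[b].append(np.mean((gcv_fit(K, y_q, grid) - f_true) ** 2))
for key, values in mse.items():
    print(key, np.mean(values))
```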
5.2 Hypothesis Testing
Let us now consider hypothesis testing (3.5) vs. (3.6). We generated data from model , with , . The sample size is chosen to be for and for . In particular, was used for examining the size of the test, while other values of were used for power. We again examined two types of errors: (1) ; (2) . The target significance level was chosen as . The tuning parameter was set as with being picked by GCV. This choice accommodates the observation that the optimal for estimation is of the order (see Remark 2), while the optimal for hypothesis testing is of the order (see Remark 3). As is approximately the optimal choice for estimation (Wahba, 1990), it is sensible to scale it down by a factor of .
Figure 2 reports the size of the test under various settings. In particular, the sizes of both the quantization and nonquantization tests approach the nominal level as increases for all cases with , while for , the size differs from due to severe loss of information during quantization. This is consistent with the asymptotic distribution of the proposed test established in Theorem 4. Figures 3 and 4 summarize the power of the proposed test under various alternative hypotheses. In all scenarios, the powers of both the quantization and nonquantization tests approach one as or increases, which supports our theoretical findings in Theorem 5. When increases, additional data information makes it easier to detect the difference between and , hence the larger power. When is small, the significant loss of information due to quantization results in lower power, and this loss of power quickly diminishes as increases, indicating that the proposed quantization scheme can maintain optimal statistical efficiency while requiring much smaller storage and transmission capacity.
5.3 Additional simulations
Additional simulation results for testing the linearity of the underlying function are provided in a separate online supplement, following the approach described in Remark 1.
6 Empirical Study
In this section, we apply our method to the Oregon Climate-Station data with sample size . The aim is to explore the relationship between elevation () and average annual (centered) temperature (). Consider a nonparametric model error. Figure 5 displays the estimated curve based on the full data (non-quan) versus the estimated curves based on b-bit quantizations (). A periodic spline of order was used. The quantization estimates based on , i.e., the red and blue curves, differ from the black curve based on the full data. When , this difference quickly diminishes; the purple curve based on almost coincides with the black one. This shows the effectiveness of b-bit quantization when is suitably large.
Next, we conduct hypothesis tests on the relationship between elevation and temperature. The first test examines whether there is any association between them, i.e., . The p-values for based on the full data and on b-bit quantizations with are all close to zero, implying strong rejection. This is also evident from Figure 5. Next, Figure 5 shows that, except for the case , all the estimated curves display strong linear patterns. Therefore, we also test following the approach described in Remark 1. The p-values for are for and for , compared to based on the full data, which is consistent with the findings from Figure 5.
7 Conclusion and Extensions
In this paper, we propose a nonparametric testing procedure based on quantized observations. Our test is simple and easy to use, being based on the -metric between the quantization estimator and the hypothesized function. Using Stein’s exchangeable pair method, we show that the proposed test is asymptotically Gaussian under the null hypothesis, which leads to an asymptotically valid testing rule. We also examine the power of the test under local alternatives and derive its minimax optimality. A concrete quantizer achieving minimaxity is also constructed.
Finally, we discuss two extensions of the current work. First, the present paper deals only with periodic splines. It would be interesting to extend our results to more general splines or even kernel ridge regression. The special periodic spline considerably reduces the difficulty of the technical proofs: the majority of the proofs can be accomplished by exact calculations based on trigonometric series. For a general RKHS, exact calculations are impossible, and more involved proofs are needed. Second, the current results require a prefixed regularity. When is unknown, an adaptive testing procedure that does not require knowledge of would be highly desirable. This may be done by constructing a sequence of quantization tests indexed by a range of values. The adaptive test can simply be taken as the maximum of these tests. Following Liu et al. (2018), such an adaptive test may asymptotically approach an extreme value distribution. A lower bound on that attains the minimax rate of adaptive testing would be a useful result.
- Adams, R. A. and Fournier, J. J. (2003). Sobolev spaces, volume 140. Elsevier.
- Boufounos, P. T. and Baraniuk, R. G. (2008). 1-bit compressive sensing. In 42nd Annual Conference on Information Sciences and Systems (CISS 2008), pages 16–21.
- Chen, H. and Varshney, P. K. (2010). Nonparametric one-bit quantizers for distributed estimation. IEEE Transactions on Signal Processing, 58(7):3777–3787.
- Cheng, G. and Shang, Z. (2015). Joint asymptotics for semi-nonparametric regression models with partially linear structure. The Annals of Statistics, 43(3):1351–1390.
- Fan, J., Zhang, C., and Zhang, J. (2001). Generalized likelihood ratio statistics and Wilks phenomenon. The Annals of Statistics, 29(1):153–193.
- Goldstein, L. and Rinott, Y. (1996). Multivariate normal approximations by Stein’s method and size bias couplings. Journal of Applied Probability, 33(1):1–17.
- Gopi, S., Netrapalli, P., Jain, P., and Nori, A. (2013). One-bit compressed sensing: Provable support and vector recovery. In International Conference on Machine Learning, pages 154–162.
- Gu, C. (2013). Smoothing spline ANOVA models, volume 297. Springer Science & Business Media.
- Gupta, A., Nowak, R., and Recht, B. (2010). Sample complexity for 1-bit compressed sensing and sparse classification. In 2010 IEEE International Symposium on Information Theory, pages 1553–1557. IEEE.
- Ingster, Y. I. (1993). Asymptotically minimax hypothesis testing for nonparametric alternatives. I, II, III. Math. Methods Statist., 2(2):85–114.
- Liu, M., Shang, Z., and Cheng, G. (2018). Nonparametric testing under random projection. arXiv preprint arXiv:1802.06308.
- Meinicke, P. and Ritter, H. (2002). Quantizing density estimators. In Advances in Neural Information Processing Systems, pages 825–832.
- Plan, Y. and Vershynin, R. (2013). Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. IEEE Transactions on Information Theory, 59(1):482–494.
- Reinert, G. and Röllin, A. (2009). Multivariate normal approximation with Stein’s method of exchangeable pairs under a general linearity condition. The Annals of Probability, 37(6):2150–2173.
- Shang, Z. and Cheng, G. (2013). Local and global asymptotic inference in smoothing spline models. The Annals of Statistics, 41(5):2608–2638.
- Shang, Z. and Cheng, G. (2015). Nonparametric inference in generalized functional linear models. The Annals of Statistics, 43(4):1742–1773.
- Shang, Z. and Cheng, G. (2017). Computational limits of a distributed algorithm for smoothing spline. The Journal of Machine Learning Research, 18(1):3809–3845.
- Slawski, M. and Li, P. (2015). b-bit marginal regression. In Advances in Neural Information Processing Systems, pages 2062–2070.
- Slawski, M. and Li, P. (to appear). Linear signal recovery from b-bit-quantized linear measurements: precise analysis of the trade-off between bit depth and number of measurements. IEEE Transactions on Information Theory.
- Wahba, G. (1990). Spline models for observational data, volume 59. SIAM.
- Zhang, L., Yi, J., and Jin, R. (2014). Efficient algorithms for robust one-bit compressive sensing. Pages 820–828.
- Zhu, R. and Gu, Q. (2015). Towards a lower sample complexity for robust one-bit compressed sensing. In International Conference on Machine Learning, pages 739–747.
- Zhu, Y. and Lafferty, J. (2014). Quantized estimation of Gaussian sequence models in Euclidean balls. In Advances in Neural Information Processing Systems, pages 3662–3670.
- Zhu, Y. and Lafferty, J. (2018). Quantized nonparametric estimation over Sobolev ellipsoids. Information and Inference: A Journal of the IMA, 7:31–82.
S.1 Additional Simulations
In this section, we provide additional simulation results for the linearity test described in Remark 1. We generated data from the model for with two types of error, and , where
is the density function of the beta distribution with parameters . In particular, the case was used to examine the size of the test and the other cases to examine power. The sample size was set to for and for . The tuning parameter was selected as with being picked by GCV. The significance level of the test is chosen to be .
Figure S.1 reports the size of the test under various settings. The sizes of the non-quantized test and of the b-bit test with larger , i.e., when and when , approach as the sample size increases, which confirms the validity of our theory. For small , the size is far from 0.1, which may be due to inaccurate estimation of the linear function based on quantized data.
Figures S.2 and S.3 summarize the power of the test under different alternative hypotheses. In all cases, the power approaches one when either or increases, which supports our theoretical results. Moreover, with small , e.g., , the power is smaller than in scenarios with larger , probably due to excessive information loss in the quantization step. For , the quantized test and the test based on the full data have almost the same power, which suggests that our statistic has satisfactory finite sample performance.
S.2 Technical Proofs
Proof of Theorem 1.
By direct calculations, we have
where . So
We now look at and . For , let
Since and for , and are both symmetric circulant matrices of order . Let . and
share the same normalized eigenvectors as