Optimal Nonparametric Inference under Quantization

01/24/2019 ∙ by Ruiqi Liu, et al. ∙ Indiana University 0

Statistical inference based on lossy or incomplete samples is of fundamental importance in research areas such as signal/image processing, medical image storage, remote sensing, signal transmission. In this paper, we propose a nonparametric testing procedure based on quantized samples. In contrast to the classic nonparametric approach, our method lives on a coarse grid of sample information and are simple-to-use. Under mild technical conditions, we establish the asymptotic properties of the proposed procedures including asymptotic null distribution of the quantization test statistic as well as its minimax power optimality. Concrete quantizers are constructed for achieving the minimax optimality in practical use. Simulation results and a real data analysis are provided to demonstrate the validity and effectiveness of the proposed test. Our work bridges the classical nonparametric inference to modern lossy data setting.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Statistical analysis based on lossy or incomplete data has attracted increasing attention in machine learning and information theory. For instance, in order to store and process signals using digital devices,

quantization is a common activity. Quantization is the process of mapping the measurements from a large set (often an uncountably infinite set) to values in a smaller set. The resulting values are often called as the quantized samples. A fundamentally important research problem is how to make optimal statistical inferences based on quantized samples. This problem is challenging in that, in addition to the measurement errors, quantized samples suffer from information loss due to the so-called quantization errors. Traditional theory and methods only take into account measurement errors, and hence, are invalid in the quantization setting.

In recent years, researchers have made steady progress in signal recovery based on quantized linear measurements, see, for example, Boufounos and Baraniuk (2008); Gupta et al. (2010); Gopi et al. (2013); Plan and Vershynin (2013); Zhang et al. (2014); Slawski and Li (2015); Zhu and Gu (2015); Slawski and Li (pear). In particular, Slawski and Li (2015) and Slawski and Li (pear) proposed feasible algorithms for compressed sensing based on

-bit measurements with theoretical guarantees. However, most of existing works in this direction have been only focusing on estimations. For instance,

Meinicke and Ritter (2002); Chen and Varshney (2010); Zhu and Lafferty (2014, 2018) proposed optimal procedures for estimating a nonparametric function when measurement bits are constrained. On the contrary, researches on the statistical inferences based on quantized data are quite limited. To the best of our best knowledge, literature on nonparametric testing under quantization is still missing. The aim of this paper is to fill this gap by proposing a conceptually simple but asymptotically valid nonparametric testing method under restricted measurement bits and derive its minimax optimality. In particular, our test can achieve the minimax rate of testing in the sense of Ingster (1993). A concrete quantization scheme is later designed to achieve such minimaxity. Our work can be viewed as an extension of the traditional nonparametric inference (Fan et al., 2001; Shang and Cheng, 2013; Cheng and Shang, 2015; Shang and Cheng, 2015) to quantization setting, shedding some lights on the possibility of optimal statistical testing with compressed resources.

The rest of paper is organized as follows. Section 2 first gives a brief review on the classical smoothing spline regression. In Section 3, we propose a -bit nonparametric estimator and corresponding test statistic. In section 4, we first establish a nonasymptotic mean square error (MSE) bound for the proposed -bit estimator followed by its asymptotic convergence rate. The asymptotic normality and the power of the proposed test statistic are then investigated, which are shown to attain minimax optimality for certain concrete quantization designs. Simulation examples are provided to demonstrate the finite sample performance of our methods in Section 5 and a real data analysis is illustrated in Section 6. Technical proofs are collected in a separate supplement document.

2 Classical Smoothing Spline Regression

In this section, we review the classic smoothing spline regression. Consider samples generated from the following nonparametric model:

(2.1)

where are iid

zero-mean random variables with a common probability density function

, and a standard deviation

, and belongs to an -order () periodic Sobolev space defined by

where for , , . Throughout this paper, we assume that for any .

In classic smoothing spline (ss) regression, is estimated though the following optimization problem:

It follows from Wahba (1990) that endowed with inner product is a reproducing kernel Hilbert space (RKHS). Let be the corresponding reproducing kernel function and define for any . It is well known that (see, e.g., Gu (2013)) has an explicit expression

where is the Bernoulli polynomial of order . Clearly, is an element in for any . By the representer theorem, has an explicit form

(2.2)

where with and being the -dimensional scaled kernel matrix. Define , where

is the “tensor” of

. The matrices and will be jointly used in Section 3 to construct both data-driven estimation procedure and our test statistic.

3 -bit Smoothing Spline Regression

In reality, the storage of the samples may require infinitely many measurement bits. When measurement bits are limited, only coarsely quantized samples are available, in which case becomes infeasible. In this section, we propose an estimator of based on quantized samples and subsequently construct a test statistic for hypothesis testing.

3.1 -bit Nonparametric Estimator

Suppose that bits are available for data processing, we can discretize continuous variables ’s with at most distinct values. Consider a quantizer defined as

where quantized values are real constants, for and , . Here, the ’s form a partition of the real line with assigned marks ’s.

Suppose that the -bit samples ’s are obtained through the quantizer in the sense that

(3.1)

Based on these new samples , we consider a -bits (bb) estimation procedure

(3.2)

Similar to (2.2), we get that has an explicit expression

with .

In practice, there are several turning parameters to be specified. For the quantization scheme, one can choose , , with being the -th order statistic of and are chosen to be equally spaced grid points within the interval . Given , we propose two choices for the representatives , (i) either choosing , and for , or (ii)

(3.3)

if the denominator is nonzero and setting otherwise. The reason of design scheme in (3.3) will be optimal in the sense that the information loss is minimized, which will be further discussed in detail in Section 4. Finally, the selection of can be carried out by minimizing the generalized cross validation (GCV) score

(3.4)

with

being the Euclidean Norm of vectors.

Even though quantized data will suffer from information loss comparing to original data, the the differences among and can be well controlled by smartly choosing quantization parameters and , which are summarized in Theorem 1, Theorem 2 and Corollary 3 in Section 4.

3.2 -bit Nonparametric Testing

In this section, we propose a -bits statistic for testing the following hypothesis

(3.5)

versus the nonparametric alternative

(3.6)

where is a known target function. Such a test maybe useful in applications when there are known expectations of the signal process . For example, testing shows whether the observed process ’s are pure noises (through only the quantized samples ’s). Or can be the signal process for a normally functioning machine obtained from historical data and testing reveals whether the machine is working properly.

Let represent the -norm, i.e., , a natural test statistic for (3.5) can be based on the distance

(3.7)

where is the -bits estimator under a certain quantization scheme and . Intuitively , measures the closeness of and , and tends to be rejected if has a large value. Our goal is to construct a valid test statistic based on quantized samples given by (3.1), and analyze its asymptotic power. To design a valid testing rule, we derive an asymptotic null-distribution of . In Theorem 4, we shall show that under mild conditions, under ,

where , with defined in Section 2 and with being the th entry of . Consequently, the decision for testing (3.5) vs. (3.6) at significance level is

(3.8)

where is the -percentile of standard Gaussian variable. We reject (3.5) if and only if

. Notice population variance

in (3.8) is practically unavailable, we suggest replacing it with empirical variance, i.e., with .

The testing procedure defined in (3.8) is able to perform as good as testing procedure based on original samples and achieve to optimal rate of testing, as long as the turning parameters are well chosen. These results are stated in Theorem 4 and Theorem 5 in Section 4.

Remark 1.

In some applications, one may not have the full knowledge of in (3.5) but can only assumes it to reside in a parametric family. For example, one may be interested in testing the linearity of , i.e., . In this case, one can simply obtain a least squared estimator based on quantized samples and replace in (3.7) with . Our Theorems 4 and 5 are still valid with some minor but tedious modifications.

4 Asymptotic Theory

In this section, several asymptotic results of the -bit estimator and the test statistic are presented. For simplicity, we assume that the quantization parameters and are both nonrandom constants. Extensions to random case can be accomplished by more cumbersome arguments.

4.1 Optimal Rate of Convergence

The following theorem describes that the difference between and can be well controlled by carefully choosing quantization parameters and .

Theorem 1.

For any and , it holds that

(4.1)

Theorem 1 provides some insights to choose the vector of representatives . For any , we can choose to minimize the expectation of the upper bound in (4.1). That is, we aim to find

(4.2)

It can be shown that the solution to (4.2) is

(4.3)

Since calculation of (4.3) is practically infeasible, one can choose their empirical counterparts ’s defined in (3.3).

Let denote the quantization estimator corresponding to . Let be the true function that generates the samples under model (2.1). We now establish a nonasymptotic upper bound for the MSE .

Theorem 2.

For any , , and , it holds that

where , with

Theorem 2 provides a nonasymptotic error bound for . The error bound consists of two parts: the MSE of the original smoothing spline estimator and . The latter can be viewed as the error resulting from quantization. An extreme case is and , i.e., the quantizer becomes dense enough, in which tends to zero reducing to the classical nonparametric estimation setting.

Following Theorem 2, we have following Corollary 3 stating that, under regularity conditions on the quantizer , the proposed quantization estimator performs as good as the original smoothing spline estimator in the sense that the MSE of the former does not exceed the latter. This suggests that a suitable quantization scheme with only a few measurement bits can indeed preserve estimation optimality.

Corollary 3.

Suppose that, as , satisfies , where is a constant. Furthermore, satisfies , , and that , , are all of order . Then .

Remark 2 further provides a concrete construction of such a scheme that achieves optimal estimation.

Remark 2.

We provide an example quantization scheme of bits that yields estimation optimality. Suppose . Then (Wahba, 1990). Consider for and a uniform quantizer for a positive . Suppose while with provided in Corollary 3, then satisfy conditions of Corollary 3, and so . In particular, if , then we need to achieve optimality. Recalling , we nee bits to maintain optimality. Moreover, if has exponentially decaying tails, i.e., as , for a positive , then a counterpart of Corollary 3 will hold as well. In such a scenario, one can construct a uniform quantizer with and , such that and will satisfy .

4.2 Optimal Rate of Testing

Throughout this section we assume that the samples satisfy the following centralization condition:

Condition (C): .

Condition (C) means that for each , i.e.,

is centered at zero under null hypothesis. An example for Condition (

C) is the choice for , where are defined by (4.3).

In the following theorems we let .

Theorem 4.

Suppose that , Condition (C) holds, and as tends to infinity, the following Rate Condition holds

Then under ,

(4.4)

The proof of Theorem 4 relies on Stein’s exchangeable pair method. Theorem 4 shows that, under regularity conditions, is asymptotically Gaussian under .

Overall the conditions are rather mild; see Remark 3 for more details. The only assumption that needs some discussions is with regard to , which are deferred to Proposition 1 below. Based on Theorem 4, the Proposition 1 asserts that the condition holds when and ’s satisfy the following boundedness condition

Condition (B): for .

In particular, satisfy Condition (B).

Proposition 1.

Suppose that Condition (B) holds, and . Furthermore, for . Then we have that .

We now proceed to examine the power of the proposed testing methods. For simplicity, we consider the Gaussian regression, i.e., are iid standard Gaussian variables. The results can be naturally extended to more general situations such as variables with sub-Gaussian/exponential tails, with more tedious technical argument. Let be a fixed constant and . Define

Theorem 5 below says that, under regularity conditions, our test can achieve arbitrary high power provided that and are sufficiently separated by the rate . The additional Rate Condition (R2) needed for proving such theorem is easy to verify; see Remark 3 for more details.

Theorem 5.

Suppose Conditions (B), (C) and (R1) are satisfied. Furthermore, the following Rate Condition holds

Condition (R2): , .

Then for any , there exists positive constants and s.t. for any ,

where is the “empirical” norm of based on the design points.

The separation rate consists of two parts. The first part results from the variance of (under ) and the squared bias . This component serves as the separation rate of the classical nonparametric testing problem; see Shang and Cheng (2013); Cheng and Shang (2015); Shang and Cheng (2015, 2017). The additional part comes from quantization error. Indeed, when quantizer becomes dense enough in the sense that , reduces to the classical separation.

Remark 3.

When , the separation rate satisfies . The sum of the first two terms inside the above square-root achieves minimum when . Therefore, if , , the minimax rate of testing. And so our test is minimax optimal in the sense of Ingster (1993) under proper quantization scheme.

Remark 4.

Theorem 5 indicates a concrete quantizer of bits that yields testing optimality. To see this, assume that and , and for . This scheme guarantees that our testing method is optimal, as indicated by Theorem 5 and Remark 3. Then it can be seen that , leading to that . Together with the convention , we have . That is, only bits are needed for quantization such that our test becomes optimal. In practice, one can simply choose and as the minimum and maximum samples. Such choice will satisfy Condition (R2) provided that the error is sub-Gaussian.

5 Simulation

In this section, we evaluate the finite sample performance of our methods through a simulation study. In Section 5.1, we demonstrate the performance of our quantization estimator defined in (3.2). In Section 5.2, we evaluate the performance of our testing procedure. Three simulation settings were conducted to evaluate the MSE of the estimator, size and power of the test based on independent replications. We considered periodic Sobolev space of order with kernel function , where is the Bernoulli polynomial of order . Measurement bits were chosen as . We considered a uniform quantization scheme designed by dividing the real numbers into segments with the middle intervals being the equal-length partitions of the data range. We also compared our quantization results with those based on , which we call as the “nonquantization” results.

5.1 Estimation Performance

We generated data from model for with sample size and examined two types of errors: (1) ; (2) . The MSE of both and are compared to demonstrate the impact of quantization with chosen through GCV defined in (3.4). Results are summarized in Figure 1, where it is apparent that the MSEs decrease as increases in all considered settings. Moreover, always has smaller MSE than , and the gap between the MSE tends to zero as increases. This is consistent with our theory which says that increasing will diminish the quantization error so that the quantization estimator becomes more accurate.

Figure 1: MSE. for left 2 panels and for right 2 panels. for top 2 panels; for bottom 2 panels.

5.2 Hypothesis Testing

Let us now consider hypothesis testing (3.5) vs. (3.6). We generated data from model , with , . The sample size is chosen to be for and for . In particular, was used for examining the size of the test while other values of for power. We examined again two types of errors: (1) ; (2) . The target significance level was chosen as . The tuning parameter was set as with being picked by GCV. This choice is to be accommodate the observation that the optimal for estimation is of the order (see Remark 2) while the optimal for hypothesis testing is of the order (see Remark 3). As is about the optimal choice for estimation Wahba (1990), it is sensible to scale it down by a factor of .

Figure 2 reports the size of the test under various settings. Specially, the size of both quantization and nonquantization tests approach the correct level as increases, for all cases that , while for , the size is different from due to severe loss of information during quantization. This is consistent with the asymptotic distribution of the proposed test established in Theorem 4. Figure 3 and 4 summarize the power of the proposed test under various alternative hypotheses. In all case scenarios, we observe that the powers of both quantization and nonquantization tests approach one when or increases, which supports our theoretical findings in Theorem 5. When increases, additional data information makes it easier to detect the differences between and , hence the larger power. When is small, significant loss of information due to quantization results in lower power and such losses of powers quickly diminishes as increases, indicating the proposed quantization scheme can indeed maintain optimal statistical efficiency although much smaller storage/transmitting capacity are required.

5.3 Additional simulations

Additional simulation results for testing the linearity of the underlying function are provided in a separate online supplement, following the approach described in Remark 1.

Figure 2: Size under . Left panel and right panel .
Figure 3: Power under . for left 3 panels; for right 3 panels.
Figure 4: Power under . for left 3 panels; for right 3 panels.

6 Empirical Study

In this section, we examined our method by Oregon Climate-Station Data with sample size . The aim is to explore the relationship between elevation () and average annual (centered) temperature (). Consider a nonparametric model error. Figure 5 displays the estimated curve based on full data (non-quan) versus the estimated curves based on b-bits quantizations (). Periodic spline with order was used. It can be observed that the quantization estimations based on , i.e., the red and blue curves, are different from the black curve based on full data. When , such difference quickly diminishes; the purple curve based on almost coincides with the black one. This shows the effectiveness of -bits quantization when is suitably large.

Next, we conduct some hypothesis tests for the relationship between the elevation and the temperature. The first test is to test whether there is any association between them, i.e., . The p-values for based on full data and b-bits quantizations with are all close to zero, implying strong rejection. This is obvious from Figure 5. Next, observe from Figure 5 that except the case , all the estimated curves display strong linear patterns. Therefore, we also test following the approach described in Remark 1. The p-values for are for and for comparing to based on full data, which coincide with the findings based on Figure 5.


Figure 5: Elevation vs. Temperature based on full data (non-quan) and b-bits quantizations with . Sample size is 2000.

7 Conclusion and Extensions

In this paper, we propose a non-parametric testing procedure based on quantized observations. Our test is simple and easy-to-use based on -metric between the quantization estimator and the hypothesized function. Using Stein’s exchangeable pair method, we show that the proposed test is asymptotically Gaussian under null hypothesis, which leads to an asymptotically valid testing rule. We also examine the power of the test under local alternatives and derive minimax optimality. Concrete quantizer for achieving minimaxity is also constructed.

In the end, we discuss two extensions of the current work. First, the present paper only deals with periodic splines. It is interesting to extend our results to more general splines or even kernel ridge regression. The special periodic spline largely reduces the difficulty level of the technical proofs. Indeed, the majority of the proofs can be accomplished by exact calculations based on trigonometric series. For general RKHS, exact calculations are impossible, and so more involved proofs are needed. Second, the current results require a prefixed regularity

. When is unknown, a new adaptive testing procedure that is free of the knowledge on will be highly desirable. This may be done by constructing a sequence of quantization tests index by a range of values. The adaptive test can be simply taken as the maximum value of these tests. Motivated by Liu et al. (2018), such adaptive test may asymptotically approach an extreme value distribution. A lower bound on that attains minimax rate of adaptive testing will be a useful result.

References

  • Adams and Fournier (2003) Adams, R. A. and Fournier, J. J. (2003). Sobolev spaces, volume 140. Elsevier.
  • Boufounos and Baraniuk (2008) Boufounos, P. T. and Baraniuk, R. G. (2008). 1-bit compressive sensing. Information Sciences and Systems, 2008. CISS 2008. 42nd Annual Conference on, pages 16–21.
  • Chen and Varshney (2010) Chen, H. and Varshney, P. K. (2010). Nonparametric one-bit quantizers for distributed estimation. IEEE Transactions on Signal Processing, 58(7):3777–3787.
  • Cheng and Shang (2015) Cheng, G. and Shang, Z. (2015). Joint asymptotics for semi-nonparametric regression models with partially linear structure. The Annals of Statistics, 43(3):1351–1390.
  • Fan et al. (2001) Fan, J., Zhang, C., and Zhang, J. (2001). Generalized likelihood ratio statistics and wilks phenomenon. The Annals of statistics, 29(1):153–193.
  • Goldstein and Rinott (1996) Goldstein, L. and Rinott, Y. (1996). Multivariate normal approximations by stein’s method and size bias couplings. Journal of Applied Probability, 33(1):1–17.
  • Gopi et al. (2013) Gopi, S., Netrapalli, P., Jain, P., and Nori, A. (2013). One-bit compressed sensing: Provable support and vector recovery. In International Conference on Machine Learning, pages 154–162.
  • Gu (2013) Gu, C. (2013). Smoothing spline ANOVA models, volume 297. Springer Science & Business Media.
  • Gupta et al. (2010) Gupta, A., Nowak, R., and Recht, B. (2010). Sample complexity for 1-bit compressed sensing and sparse classification. In 2010 IEEE International Symposium on Information Theory, pages 1553–1557. IEEE.
  • Ingster (1993) Ingster, Y. I. (1993). Asymptotically minimax hypothesis testing for nonparametric alternatives. i, ii, iii. Math. Methods Statist, 2(2):85–114.
  • Liu et al. (2018) Liu, M., Shang, Z., and Cheng, G. (2018). Nonparametric testing under random projection. arXiv preprint arXiv:1802.06308.
  • Meinicke and Ritter (2002) Meinicke, P. and Ritter, H. (2002). Quantizing density estimators. In Advances in Neural Information Processing Systems, pages 825–832.
  • Plan and Vershynin (2013) Plan, Y. and Vershynin, R. (2013).

    Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach.

    IEEE Transactions on Information Theory, 59(1):482–494.
  • Reinert and Röllin (2009) Reinert, G. and Röllin, A. (2009). Multivariate normal approximation with stein’s method of exchangeable pairs under a general linearity condition. The Annals of Probability, 37(6):2150–2173.
  • Shang and Cheng (2013) Shang, Z. and Cheng, G. (2013). Local and global asymptotic inference in smoothing spline models. The Annals of Statistics, 41(5):2608–2638.
  • Shang and Cheng (2015) Shang, Z. and Cheng, G. (2015). Nonparametric inference in generalized functional linear models. The Annals of Statistics, 43(4):1742–1773.
  • Shang and Cheng (2017) Shang, Z. and Cheng, G. (2017). Computational limits of a distributed algorithm for smoothing spline. The Journal of Machine Learning Research, 18(1):3809–3845.
  • Slawski and Li (2015) Slawski, M. and Li, P. (2015). b-bit marginal regression. In Advances in Neural Information Processing Systems, pages 2062–2070.
  • Slawski and Li (pear) Slawski, M. and Li, P. (to appear). Linear signal recovery from -bit-quantized linear measurements: precise analysis of the trade-off between bit depth and number of measurements. IEEE Transactions on Information Theory.
  • Wahba (1990) Wahba, G. (1990). Spline models for observational data, volume 59. Siam.
  • Zhang et al. (2014) Zhang, L., Yi, J., and Jin, R. (2014). Efficient algorithms for robust one-bit compressive sensing. pages 820–828.
  • Zhu and Gu (2015) Zhu, R. and Gu, Q. (2015). Towards a lower sample complexity for robust one-bit compressed sensing. In International Conference on Machine Learning, pages 739–747.
  • Zhu and Lafferty (2014) Zhu, Y. and Lafferty, J. (2014). Quantized estimation of gaussian sequence models in euclidean balls. In Advances in Neural Information Processing Systems, pages 3662–3670.
  • Zhu and Lafferty (2018) Zhu, Y. and Lafferty, J. (2018). Quantized nonparametric estimation over sobolev ellipsoids. Information and Inference: A Journal of the IMA, 7:31–82.

s.1 Additional Simulations

In this section, we provide some additional simulation results for testing linearity proposed in our Remark 1. We generated the model for with two types of error and , where

is the density function of beta distribution with parameters

. In particular, the case when was to examine the size of the test and other cases are for power. The sample size was set to be for and for . The turning parameter was selected with being picked by GCV. The significant level of the test is chosen to be .

Figure S.1 reports size of the test under various settings. The size of non-quantized testing and -bit testing with larger , i.e., when and when , are approaching as the sample size increasing, which confirms the validity of our theorem. For small , the size is far way from 0.1, which may due to the inaccurate estimation of the linear function based on quantized data.

Figure S.1: Size under is linear. Left panel and right panel .

Figure S.2 and Figure S.3 summarize the power of testing under different alternative hypotheses. In all cases, the power of the test will approach to one when either or increases, which supports our theoretical results. Moreover, with small , e.g., , the power is smaller comparing to other scenarios with larger , probably is due to too much information loss in the quantization step. For , quantized testing and testing based on full data have almost the same power, which suggests that our statistic has satisfactory finite sample performance.

Figure S.2: Power under is not linear with . Left panel , middle panel and right panel
Figure S.3: Power under is not linear with . Left panel , middle panel and right panel

s.2 Technical Proofs

Proof of Theorem 1.

By direct calculations, we have

where . So

(S.1)

We now look at and . For , let

Since and for , and are both symmetric circulant of order . Let . and

share the same normalized eigenvectors as