1 Introduction
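In directional statistics, the sample space is the unit sphere $\mathcal{S}^{p-1} := \{\mathbf{x} \in \mathbb{R}^p : \|\mathbf{x}\| = 1\}$ in $\mathbb{R}^p$. By far the most classical distributions on $\mathcal{S}^{p-1}$ are the Fisher–von Mises–Langevin (FvML) ones; see, e.g., Mardia and Jupp (2000) or Ley and Verdebout (2017). We say that the random vector $\mathbf{X}$ with values in $\mathcal{S}^{p-1}$ has an $\mathrm{FvML}_p(\boldsymbol{\theta}, \kappa)$ distribution, with $\boldsymbol{\theta} \in \mathcal{S}^{p-1}$ and $\kappa \in (0, \infty)$, if it admits the density (throughout, densities on the unit sphere are with respect to the surface area measure)
$$\mathbf{x} \mapsto \frac{c_{p,\kappa}}{\omega_{p-1}} \exp(\kappa\, \mathbf{x}'\boldsymbol{\theta}), \tag{1.1}$$
where, denoting as $\Gamma(\cdot)$ the Euler Gamma function and as $\mathcal{I}_\nu(\cdot)$ the order-$\nu$ modified Bessel function of the first kind, $\omega_p := 2\pi^{p/2}/\Gamma(p/2)$ is the surface area of $\mathcal{S}^{p-1}$ and
$$c_{p,\kappa} := 1 \Big/ \int_{-1}^{1} (1 - t^2)^{(p-3)/2} \exp(\kappa t)\, dt = \frac{(\kappa/2)^{p/2-1}}{\sqrt{\pi}\, \Gamma((p-1)/2)\, \mathcal{I}_{p/2-1}(\kappa)}.$$
Clearly, $\boldsymbol{\theta}$ is a location parameter ($\boldsymbol{\theta}$ is the modal location on the sphere) that identifies the spike direction of the hyperspherical signal. In contrast, $\kappa$ is a scale or concentration parameter: the larger $\kappa$, the more concentrated the distribution is about the modal location $\boldsymbol{\theta}$. As $\kappa$ converges to zero, $c_{p,\kappa}$ converges to $c_p := \Gamma(p/2)/(\sqrt{\pi}\, \Gamma((p-1)/2))$ and the density in (1.1) converges to the density $\mathbf{x} \mapsto 1/\omega_p$ of the uniform distribution over $\mathcal{S}^{p-1}$. The other extreme case, obtained for arbitrarily large values of $\kappa$, provides distributions that converge to a point mass in $\boldsymbol{\theta}$. Of course, it is expected that the larger $\kappa$, the easier it is to conduct inference on $\boldsymbol{\theta}$, that is, the more powerful the tests on $\boldsymbol{\theta}$ and the smaller the corresponding confidence zones.
In this paper, we consider inference on $\boldsymbol{\theta}$ and focus on the generic testing problem for which the null hypothesis $\mathcal{H}_0: \boldsymbol{\theta} = \boldsymbol{\theta}_0$, for a fixed $\boldsymbol{\theta}_0 \in \mathcal{S}^{p-1}$, is to be tested against $\mathcal{H}_1: \boldsymbol{\theta} \neq \boldsymbol{\theta}_0$ on the basis of a random sample $\mathbf{X}_{n1}, \ldots, \mathbf{X}_{nn}$ from the $\mathrm{FvML}_p(\boldsymbol{\theta}, \kappa)$ distribution; the triangular array notation anticipates non-standard setups where $p$ (hence, also $\boldsymbol{\theta}$) and/or $\kappa$ will depend on $n$. Inference problems on $\boldsymbol{\theta}$ in the low-dimensional case have been considered among others in Watson (1983), Chang and Rivest (2001), Larsen, Blæsild and Sørensen (2002), Ley et al. (2013) and Paindaveine and Verdebout (2017). The related spherical regression problem has been tackled in Rivest (1989) and Downs (2003), while testing for location on axial frames has been considered in Arnold and Jupp (2013).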
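To make the density in (1.1) concrete, the following sketch evaluates the FvML log-density in pure Python. It assumes the usual normalization of the FvML density with respect to the surface area measure, namely $\kappa^{p/2-1}/((2\pi)^{p/2}\mathcal{I}_{p/2-1}(\kappa))$. The names `log_bessel_i` and `fvml_log_density` are illustrative and do not come from the paper, and the Bessel function is evaluated through a truncated power series that is only adequate for moderate concentrations.

```python
import math


def log_bessel_i(nu, kappa, terms=400):
    """Log of the modified Bessel function I_nu(kappa), kappa > 0, nu >= 0.

    Uses the power series sum_m (kappa/2)^(2m+nu) / (m! * Gamma(m+nu+1)),
    summed in log-space. The fixed truncation is an assumption of this
    sketch: it is fine for moderate kappa, not for very large kappa.
    """
    logs = [
        (2 * m + nu) * math.log(kappa / 2.0)
        - math.lgamma(m + 1)
        - math.lgamma(m + nu + 1)
        for m in range(terms)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def fvml_log_density(x, theta, kappa):
    """Log-density of FvML_p(theta, kappa) at x (surface area measure).

    x and theta are unit vectors given as sequences of floats of length p.
    """
    p = len(x)
    nu = p / 2.0 - 1.0
    log_norm = (
        nu * math.log(kappa)
        - (p / 2.0) * math.log(2.0 * math.pi)
        - log_bessel_i(nu, kappa)
    )
    inner = sum(a * b for a, b in zip(x, theta))
    return log_norm + kappa * inner
```

For $p = 3$ the normalizing constant reduces to $\kappa/(4\pi \sinh\kappa)$, which gives a convenient sanity check of the general formula.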
Letting $\bar{\mathbf{X}}_n := \frac{1}{n} \sum_{i=1}^n \mathbf{X}_{ni}$, the most classical test for the testing problem above is the Watson (1983) test rejecting the null at asymptotic level $\alpha$ whenever
$$W_n := \frac{n(p-1)\, \bar{\mathbf{X}}_n'(\mathbf{I}_p - \boldsymbol{\theta}_0 \boldsymbol{\theta}_0') \bar{\mathbf{X}}_n}{1 - \frac{1}{n} \sum_{i=1}^n (\mathbf{X}_{ni}'\boldsymbol{\theta}_0)^2} > \chi^2_{p-1,1-\alpha}, \tag{1.2}$$
where $\mathbf{I}_\ell$ stands for the $\ell$-dimensional identity matrix and $\chi^2_{\ell,1-\alpha}$ denotes the $\alpha$-upper quantile of the chi-square distribution with $\ell$ degrees of freedom. In the classical setup where $p$ and $\kappa$ are fixed, the asymptotic properties of the Watson test are well-known, both under the null and under local alternatives; see, e.g., Watson (1983) or Mardia and Jupp (2000). Optimality properties in the Le Cam sense have been studied in Paindaveine and Verdebout (2015). In the non-standard setup where $\kappa = \kappa_n$ converges to zero, Paindaveine and Verdebout (2017) investigated the asymptotic null and non-null behaviors of the Watson test. Interestingly, irrespective of the rate at which $\kappa_n$ converges to zero (that is, irrespective of how fast the inference problem becomes more challenging as a function of $n$), the Watson test keeps meeting the asymptotic nominal level constraint and maintains strong optimality properties; see Paindaveine and Verdebout (2017) for details.
For a fixed dimension $p$, this essentially settles the investigation of the properties of the Watson test and the study of the corresponding hypothesis testing problem. Nowadays, however, increasingly many applications lead to considering high-dimensional directional data: tests of uniformity on high-dimensional spheres have been studied in Chikuse (1991), Cuesta-Albertos, Cuevas and Fraiman (2009), Cai, Fan and Jiang (2013) and Cutting, Paindaveine and Verdebout (2017a), while high-dimensional FvML distributions (or mixtures of high-dimensional FvML distributions) have been considered in magnetic resonance, gene-expression, and text mining; see, among others, Dryden (2005), Banerjee et al. (2003), and Banerjee et al. (2005). This motivates considering the high-dimensional spherical location problem, based on a random sample $\mathbf{X}_{n1}, \ldots, \mathbf{X}_{nn}$ from the $\mathrm{FvML}_{p_n}(\boldsymbol{\theta}_n, \kappa_n)$ distribution, with $(p_n)$ diverging to infinity (the dimension of $\boldsymbol{\theta}_n$ then depends on $n$, which justifies the notation). In this context, Ley, Paindaveine and Verdebout (2015) proved that the Watson test is robust to high-dimensionality in the sense that, as $p_n$ goes to infinity with $n$, this test still has asymptotic size $\alpha$ under $\mathcal{H}_0: \boldsymbol{\theta}_n = \boldsymbol{\theta}_{n0}$.
This does not require any condition on the concentration sequence $(\kappa_n)$ nor on the rate at which $p_n$ goes to infinity, hence covers arbitrarily easy problems ($\kappa_n$ large) and arbitrarily challenging ones ($\kappa_n$ small), as well as moderately high dimensions and ultra-high dimensions. On its own, however, this null robustness result is obviously far from sufficient to motivate using the Watson test in high dimensions, as it might very well be that robustness under the null is obtained at the expense of power (in the extreme case, the Watson test, in high dimensions, might actually asymptotically behave like the trivial $\alpha$-level test that randomly rejects the null with probability $\alpha$).
These considerations raise many interesting questions, among which: are there alternatives under which the Watson test is consistent in high dimensions? What are the least severe alternatives (if any) under which the Watson test exhibits non-trivial asymptotic powers? Is the Watson test rate-optimal or, on the contrary, are there tests that show asymptotic powers under less severe alternatives than those detected by the Watson test? Does the Watson test enjoy optimality properties in high dimensions? As we will show, answering these questions will require considering several regimes fixing how the concentration $\kappa_n$ behaves as a function of the dimension $p_n$ and sample size $n$. Our results, which will crucially depend on the regime considered, are extensive in the sense that they answer the questions above in all possible regimes.
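For concreteness, the Watson statistic whose high-dimensional behavior these questions concern can be computed as in the sketch below. It assumes the standard form of the Watson (1983) statistic, $n(p-1)\,\bar{\mathbf{X}}_n'(\mathbf{I}_p - \boldsymbol{\theta}_0\boldsymbol{\theta}_0')\bar{\mathbf{X}}_n / (1 - n^{-1}\sum_i (\mathbf{X}_{ni}'\boldsymbol{\theta}_0)^2)$. The function name and the toy sample are hypothetical and only serve as an illustration.

```python
def watson_statistic(sample, theta0):
    """Watson statistic for the null location theta0.

    sample: list of n unit vectors (sequences of floats) of length p.
    theta0: unit vector of length p.
    Assumes the observations do not all lie on the axis +/- theta0,
    in which case the denominator would vanish.
    """
    n = len(sample)
    p = len(theta0)
    mean = [sum(x[j] for x in sample) / n for j in range(p)]
    proj = sum(m * t for m, t in zip(mean, theta0))
    # mean' (I - theta0 theta0') mean = ||mean||^2 - (mean' theta0)^2,
    # because theta0 has unit norm.
    quad = sum(m * m for m in mean) - proj * proj
    second_moment = sum(
        sum(a * b for a, b in zip(x, theta0)) ** 2 for x in sample
    ) / n
    return n * (p - 1) * quad / (1.0 - second_moment)


# Toy usage in dimension p = 3 with null location along the third axis.
toy_sample = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
toy_value = watson_statistic(toy_sample, [0.0, 0.0, 1.0])
```

The test then compares this value with the appropriate chi-square quantile with $p - 1$ degrees of freedom.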
We achieve this by combining two approaches that are somewhat orthogonal in spirit. (a) The first approach is based on Le Cam’s theory of asymptotic experiments. While this theory is very general, it does not directly apply in the present context since the high-dimensional spherical location problem involves a parametric space, namely $\mathcal{S}^{p_n-1}$, that depends on $n$ (through $p_n$). We solve this by exploiting the invariance properties of the testing problem considered. In the image of the model by the corresponding maximal invariant, indeed, the parametric space does not depend on $n$ anymore, which opens the door to studying the problem through the Le Cam approach. We derive stochastic second-order expansions of the resulting log-likelihood ratios, which is the main technical ingredient to establish the local asymptotic normality (LAN) of the invariant model. The LAN property takes different forms and involves different contiguity rates depending on the regime that is considered. In each regime, we determine the Le Cam optimal test for the problem considered and apply the Le Cam third lemma to obtain the asymptotic powers of this test and of the Watson test. This allows us to determine the regime(s) in which the Watson test is Le Cam optimal, or only rate-optimal, or not even rate-optimal. While this is first done under specified concentration $\kappa_n$, we also provide LAN results with respect to both location and concentration to be able to discuss optimality under unspecified $\kappa_n$, too.
(b) In regimes where the Watson test is not rate-optimal, that is, in regimes where it is blind to contiguous alternatives, this first, Le Cam, approach leaves open the question of the existence of alternatives that can be detected by the Watson test. This motivates complementing our investigation by a second approach, where we resort to martingale CLTs to study the asymptotic non-null properties of the Watson test. We identify the alternatives (if any) under which the Watson test will show non-trivial asymptotic powers in high dimensions, which again requires considering various regimes according to the concentration pattern. We do so in a broad, semiparametric, model, namely in the class of rotationally symmetric distributions. Therefore, the corresponding results not only allow us to answer the questions left open by the Le Cam approach but they also extend beyond the parametric FvML framework many of the results obtained there through the Le Cam approach.
The outline of the paper is as follows. In Section 2, we consider the high-dimensional version of the FvML spherical location problem. In Section 2.1, we describe the invariance approach that allows us to later rely on Le Cam’s theory of asymptotic experiments. In Section 2.2, we provide a stochastic second-order expansion of the resulting invariant log-likelihood ratios and prove, in various regimes that we identify, that these invariant models are locally asymptotically normal. This allows us to derive the corresponding Le Cam optimal tests for the specified concentration problem and to study the non-null asymptotic behavior of the Watson test in the light of these results. In Section 2.3, we tackle the unspecified concentration problem through the derivation of LAN results that are with respect to both location and concentration. In Section 3, we conduct a systematic investigation of the non-null asymptotic properties of the Watson test in the broader context of rotationally symmetric distributions. Jointly with the results of Section 2, this allows us to fully characterize, in the FvML case, the non-null asymptotic behavior of the Watson test. In Section 4, we conduct a Monte Carlo study to investigate how well the finite-sample behaviors of the various tests reflect our theoretical asymptotic results. In Section 5, we summarize the results derived in the paper and comment on the few remaining open questions. The appendix contains all proofs.
2 Invariance and Le Cam optimality
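As already mentioned in the introduction, the high-dimensional spherical location problem requires considering triangular arrays of observations of the form $\mathbf{X}_{ni}$, $i = 1, \ldots, n$, $n = 1, 2, \ldots$. For any sequence $(\boldsymbol{\theta}_n)$ such that $\boldsymbol{\theta}_n$ belongs to $\mathcal{S}^{p_n-1}$ for any $n$ and any sequence $(\kappa_n)$ in $(0, \infty)$, we denote as $\mathrm{P}^{(n)}_{\boldsymbol{\theta}_n, \kappa_n}$ the hypothesis under which $\mathbf{X}_{ni}$, $i = 1, \ldots, n$, form a random sample from the $\mathrm{FvML}_{p_n}(\boldsymbol{\theta}_n, \kappa_n)$ distribution. The resulting sequence of statistical models is then associated with
$$\big\{ \mathrm{P}^{(n)}_{\boldsymbol{\theta}_n, \kappa_n} : \boldsymbol{\theta}_n \in \mathcal{S}^{p_n-1} \big\} \tag{2.1}$$
(the index $n$ in the parameter $\boldsymbol{\theta}_n$ in principle is superfluous but is used here to stress the dependence of this parameter on $p_n$, hence on $n$). The spherical location problem consists in testing the null hypothesis $\mathcal{H}_0^{(n)}: \boldsymbol{\theta}_n = \boldsymbol{\theta}_{n0}$ against the alternative $\mathcal{H}_1^{(n)}: \boldsymbol{\theta}_n \neq \boldsymbol{\theta}_{n0}$, where $(\boldsymbol{\theta}_{n0})$ is a fixed sequence such that $\boldsymbol{\theta}_{n0}$ belongs to $\mathcal{S}^{p_n-1}$ for any $n$. Clearly, $\boldsymbol{\theta}_n$ is the parameter of interest, whereas $\kappa_n$ plays the role of a nuisance. The main objective of this section is to derive Le Cam optimality results for this problem, referring to local alternatives of the form $\mathrm{P}^{(n)}_{\boldsymbol{\theta}_n, \kappa_n}$, with $\boldsymbol{\theta}_n = \boldsymbol{\theta}_{n0} + \nu_n \boldsymbol{\tau}_n$, where the sequence $(\nu_n)$ and the bounded sequence $(\boldsymbol{\tau}_n)$, respectively in $(0, \infty)$ and in $\mathbb{R}^{p_n}$, are such that $\boldsymbol{\theta}_{n0} + \nu_n \boldsymbol{\tau}_n \in \mathcal{S}^{p_n-1}$ for any $n$, which imposes that
$$\boldsymbol{\theta}_{n0}'\boldsymbol{\tau}_n = -\frac{\nu_n}{2} \|\boldsymbol{\tau}_n\|^2 \tag{2.2}$$
for any $n$. Since the sequence of “statistical experiments” associated with (2.1) involves parametric spaces that depend on $n$, applying Le Cam’s theory will require the following reduction of the problem through invariance arguments.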
2.1 Reduction through invariance
Denoting as the collection of orthogonal matrices satisfying , the null hypothesis is invariant under the group collecting the transformations
with . The transformation induces a transformation of the parametric space defined through . The orbits of the resulting induced group are , with and . In such a context, the invariance principle (see, e.g., Lehmann and Romano, 2005, Chapter 6) leads to restricting to tests that are invariant with respect to the group . Denoting as a maximal invariant statistic for , the class of invariant tests coincides with the class of -measurable tests. Invariant tests thus are to be defined in the image
(2.3) |
of the model by , where denotes the common distribution of under any with . Unlike the original sequence of statistical experiments in (2.1), the invariant one in (2.3) involves a fixed parametric space , which makes it in principle possible to rely on Le Cam’s asymptotic theory.
Now, the original local log-likelihood ratios associated with the generic local alternatives above correspond, in view of (2.2), to the invariant local log-likelihood ratios
(2.4) |
Deriving local asymptotic normality (LAN) results requires investigating the asymptotic behavior of such invariant log-likelihood ratios, which in turn requires evaluating the corresponding likelihoods. While obtaining a closed-form expression for and its distribution is a very challenging task, these likelihoods can be obtained from Lemma 2.5.1 in Giri (1996), which, denoting as the surface area measure on ( times), yields
(2.5) | |||||
where integration is with respect to the Haar measure on . Note that (2.5) shows that the invariant null probability measure coincides with the original null probability measure . In other words, it is only for non-null probability measures that the invariance reduction above is non-trivial.
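As a small numerical illustration of the invariance discussed in this section, the sketch below checks that the Watson statistic is unchanged when every observation is rotated by an orthogonal matrix that fixes the null location. This is only a toy check in dimension 3, with the null location taken along the third axis, so that the relevant rotations are rotations about that axis. The statistic is assumed to take the standard Watson (1983) form, and all names and data are hypothetical.

```python
import math


def watson_statistic(sample, theta0):
    """Standard Watson statistic (assumed form) for the null location theta0."""
    n = len(sample)
    p = len(theta0)
    mean = [sum(x[j] for x in sample) / n for j in range(p)]
    proj = sum(m * t for m, t in zip(mean, theta0))
    quad = sum(m * m for m in mean) - proj * proj
    second_moment = sum(
        sum(a * b for a, b in zip(x, theta0)) ** 2 for x in sample
    ) / n
    return n * (p - 1) * quad / (1.0 - second_moment)


def rotate_about_third_axis(x, angle):
    """Apply a 3x3 rotation that leaves the third coordinate axis fixed."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1], x[2]]


theta0 = [0.0, 0.0, 1.0]
sample = [
    [0.6, 0.0, 0.8],
    [0.0, 0.6, 0.8],
    [-0.8, 0.0, 0.6],
    [0.0, 0.0, 1.0],
]
rotated = [rotate_about_third_axis(x, 1.234) for x in sample]

w_original = watson_statistic(sample, theta0)
w_rotated = watson_statistic(rotated, theta0)
```

Both values agree up to rounding, because the statistic depends on the data only through the projections onto the null location and through the norm of the averaged orthogonal component, both of which such rotations preserve.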
2.2 Optimal testing under specified
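The main ingredient needed to obtain LAN results is Theorem 2.1 below, which provides a stochastic second-order expansion of the invariant log-likelihood ratios in (2.4). To state this theorem, we need to introduce the following notation. We will refer to the decomposition $\mathbf{X}_{ni} = u_{ni} \boldsymbol{\theta}_n + v_{ni} \mathbf{S}_{ni}$, with
$$u_{ni} := \mathbf{X}_{ni}'\boldsymbol{\theta}_n, \quad v_{ni} := \sqrt{1 - u_{ni}^2}, \quad \mathbf{S}_{ni} := \frac{(\mathbf{I}_{p_n} - \boldsymbol{\theta}_n \boldsymbol{\theta}_n') \mathbf{X}_{ni}}{\|(\mathbf{I}_{p_n} - \boldsymbol{\theta}_n \boldsymbol{\theta}_n') \mathbf{X}_{ni}\|},$$
as the tangent-normal decomposition of $\mathbf{X}_{ni}$ with respect to $\boldsymbol{\theta}_n$. Under the hypothesis $\mathrm{P}^{(n)}_{\boldsymbol{\theta}_n, \kappa_n}$, $u_{ni}$ has probability density function
$$u \mapsto c_{p_n, \kappa_n} (1 - u^2)^{(p_n-3)/2} \exp(\kappa_n u)\, \mathbb{I}[u \in [-1, 1]], \tag{2.6}$$
where $\mathbb{I}[A]$ denotes the indicator function of $A$, $\mathbf{S}_{ni}$ is uniformly distributed over the “equator” $\{\mathbf{x} \in \mathcal{S}^{p_n-1} : \mathbf{x}'\boldsymbol{\theta}_n = 0\}$, and $u_{ni}$ and $\mathbf{S}_{ni}$ are mutually independent. Throughout, we will denote as $e_{n\ell} := \mathrm{E}[u_{ni}^\ell]$ and $\tilde{e}_{n\ell} := \mathrm{E}[(u_{ni} - e_{n1})^\ell]$, $\ell = 1, 2, \ldots$, the non-centered and centered moments of $u_{ni}$ under $\mathrm{P}^{(n)}_{\boldsymbol{\theta}_n, \kappa_n}$, and as $f_{n\ell} := \mathrm{E}[v_{ni}^\ell]$ the corresponding non-centered moments of $v_{ni}$. Although this is not stressed in the notation, these moments clearly depend on $p_n$ and $\kappa_n$; for instance,
$$e_{n1} = \frac{\mathcal{I}_{p_n/2}(\kappa_n)}{\mathcal{I}_{p_n/2-1}(\kappa_n)} \quad \text{and} \quad \tilde{e}_{n2} = 1 - \frac{p_n - 1}{\kappa_n}\, e_{n1} - e_{n1}^2 \tag{2.7}$$
(this readily follows from (2)-(3) in Schou, 1978 by using the standard properties of exponential families; see also Lemma S.2.1 in Cutting, Paindaveine and Verdebout, 2017b). We can now state the stochastic second-order expansion result of the invariant log-likelihood ratios in (2.4).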
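Before turning to the theorem, the first two moments referred to in (2.7) can be evaluated numerically, as sketched below. The sketch assumes the classical FvML identities: the mean of the projection onto the modal location is the Bessel ratio $\mathcal{I}_{p/2}(\kappa)/\mathcal{I}_{p/2-1}(\kappa)$, and its variance is $1 - ((p-1)/\kappa)$ times that ratio, minus the squared ratio. The function names are hypothetical, and the truncated series is only meant for moderate values of $\kappa$.

```python
import math


def log_bessel_i(nu, kappa, terms=400):
    """Log of I_nu(kappa) via its power series, summed in log-space.

    The fixed number of terms is an assumption of this sketch and is
    adequate for moderate kappa only.
    """
    logs = [
        (2 * m + nu) * math.log(kappa / 2.0)
        - math.lgamma(m + 1)
        - math.lgamma(m + nu + 1)
        for m in range(terms)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def fvml_projection_moments(p, kappa):
    """Mean and variance of X'theta when X is FvML_p(theta, kappa)."""
    ratio = math.exp(
        log_bessel_i(p / 2.0, kappa) - log_bessel_i(p / 2.0 - 1.0, kappa)
    )
    mean_u = ratio
    var_u = 1.0 - (p - 1.0) / kappa * ratio - ratio * ratio
    return mean_u, var_u


# Example: p = 3, where the mean is coth(kappa) - 1/kappa.
mean_u, var_u = fvml_projection_moments(3, 2.0)
```

For small $\kappa$ the variance is computed as a difference of nearly equal terms, so a more careful implementation would be needed in the weak-concentration regimes.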
Theorem 2.1.
Let be a sequence of integers that diverges to infinity and be an arbitrary sequence in . Let , and be sequences such that and belong to for any , with bounded and such that
(2.8) |
Then, letting
we have that
as under .
Recalling that the log-likelihood ratio refers to the local perturbation of the null reference value , the result in Theorem 2.1 essentially shows that the invariant model considered enjoys a local asymptotic quadraticity (LAQ) structure in the vicinity of the null hypothesis ; see, e.g., Le Cam and Yang (2000), page 120. Actually, quadraticity, which is supposed to be in the increment , only holds for arbitrarily small values of this increment, hence only in regimes where will converge to zero (in regimes below where, in contrast, will be constant, the non-flat manifold structure of the hypersphere actually prevents a standard quadraticity property). This LAQ result hints that optimal testing for the specified- problem at hand is obtained by rejecting the null for small values of (that is, when and project far from each other onto the axis ), for large values of (that is, when and project far from each other onto the orthogonal complement to in
), or, more generally, for large values of a hybrid test statistic of the form
with non-negative weights and . While any provides a reasonable test statistic for the problem at hand, only one set of weights will yield a Le Cam optimal test and, interestingly, this set of weights depends on the way behaves with and . This will be one of the many consequences of the following LAN result.
Theorem 2.2.
Let be a sequence of integers that diverges to infinity, be a sequence in , and be a sequence such that belongs to for any .
Then, there exist a sequence in and a sequence of random variables that is asymptotically normal with zero mean and variance
as under . If (i) , then
if (ii) , then, letting ,
if (iii) with , then
if (iv) , then
if (v) with , then
if (vi) , then
finally, if (vii) , then, even with , the invariant log-likelihood ratio is as under .
In the image model (2.3), the spherical location problem consists in testing against . In the localized at experiments, parametrized by as in (2.4), this reduces to testing against . In any given regime (i)-(vii) from Theorem 2.2, it directly follows from this theorem that a locally asymptotically most powerful test for this problem — hence, locally asymptotically most powerful invariant test for the original spherical location problem — rejects the null at asymptotic level whenever
(2.9) |
where $\Phi$ denotes the cumulative distribution function of the standard normal (in the rest of the paper, the term “optimal” will refer to this particular Le Cam optimality concept). A routine application of the Le Cam third lemma then shows that, in each regime, the asymptotic distribution of , under the corresponding contiguous alternatives with , is normal with mean and variance , so that the resulting asymptotic power of the optimal test in (2.9) is
(2.10) |
In each regime (i)-(vii), is the contiguity rate, which implies that the least severe alternatives under which a test may have non-trivial asymptotic powers are of the form , with a sequence that is but not . Theorem 2.2 shows that this contiguity rate depends on the regime considered and does so in a monotonic fashion, which is intuitively reasonable: the larger (that is, the easier the inference problem), the faster goes to zero, that is, the less severe the alternatives that can be detected by rate-consistent tests. Because the unit sphere has a fixed diameter, characterizes the most severe alternatives that can be considered. In regime (vi), no tests will therefore be consistent under such most severe alternatives, while, in regime (vii), the distribution is so close to the uniform distribution on that no tests can show non-trivial asymptotic powers under such alternatives, so that even the trivial -test is optimal.
One of the most striking consequences of Theorem 2.2 is that the optimal test depends on the regime considered. In regimes (v)-(vii), the optimal test in (2.9) rejects the null when ; of course, this optimality is degenerate in regime (vii), where any invariant test with asymptotic level would also be optimal. In contrast, the optimal -level test in regimes (i)-(iii) rejects the null when
Since the chi-square distribution with $p_n - 1$ degrees of freedom converges, after standardization via its mean $p_n - 1$ and standard deviation $\sqrt{2(p_n - 1)}$, to the standard normal distribution as $p_n$ diverges to infinity, this test is asymptotically equivalent to the Watson test in (1.2), based obviously on the dimension $p_n$ at hand. This shows that, in regimes (i)-(iii), the traditional, low-dimensional, Watson test is optimal in high dimensions. In regime (iv), which is at the frontier between these regimes where the optimal test is the Watson test and those where the optimal test is based on , the optimal test is quite naturally based on a linear combination of and .
Finally, the Le Cam third lemma allows us to derive the asymptotic non-null behavior of the Watson test under the contiguous alternatives considered in any regime (i)-(vii). In regimes (i)-(iv), the limiting powers under contiguous alternatives of the form , with , are given by
(2.11) |
In regimes (i)-(iii), the Watson test is the optimal test and these asymptotic powers are equal to those in (2.10), whereas in regime (iv), the Watson test is only rate-consistent, as the corresponding asymptotic powers of the optimal test are
(2.12) |
In regimes (v)-(vi), the Le Cam third lemma shows that the limiting powers of the Watson test, still under the corresponding contiguous alternatives, are equal to the nominal level , so that the Watson test is not even rate-consistent in those regimes. Finally, as already discussed, the Watson test is optimal in regime (vii), but trivially so since the trivial -test there also is.
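The asymptotic equivalence between chi-square and normal critical values used in this section suggests the following sketch of the Watson decision rule in high dimensions. It assumes that the Watson statistic `w` has already been computed and that it is standardized by the mean $p - 1$ and standard deviation $\sqrt{2(p-1)}$ of the chi-square distribution with $p - 1$ degrees of freedom. The function name is illustrative only.

```python
import math
from statistics import NormalDist


def watson_rejects_high_dim(w, p, alpha=0.05):
    """Return True if the standardized Watson statistic exceeds the
    upper-alpha standard normal quantile.

    w: value of the Watson statistic; p: dimension (p >= 2).
    """
    z = (w - (p - 1)) / math.sqrt(2.0 * (p - 1))
    return z > NormalDist().inv_cdf(1.0 - alpha)


# Example in dimension p = 1000: a statistic equal to its null mean is
# not rejected, one lying three standard deviations above it is.
not_rejected = watson_rejects_high_dim(999.0, 1000)
rejected = watson_rejects_high_dim(999.0 + 3.0 * math.sqrt(2.0 * 999.0), 1000)
```

In moderate dimensions the chi-square quantile itself would be the more natural critical value; the normal approximation is only meant for the high-dimensional setting considered here.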
2.3 Optimal testing under unspecified
The optimal test in regimes (i)-(iii), namely the Watson test, is a genuine test in the sense that it can be applied on the basis of the observations only. In contrast, the optimal tests in regimes (iv)-(vi) are “oracle” tests since they require knowing the values of and , or equivalently (see (2.7)), the value of the concentration . This concentration, however, can hardly be assumed to be specified in practice, so that it is natural to wonder what is the optimal test, in regimes (iv)-(vi), when is treated as a nuisance parameter.
We first focus on regime (iv). There, the concentration is asymptotically of the form for some . Within regime (iv), , obviously, is a perfectly valid alternative concentration parameter. Inspired by the classical treatment of asymptotically optimal inference in the presence of nuisance parameters (see, e.g., Bickel et al., 1998), this suggests studying the asymptotic behavior of invariant log-likelihood ratios of the form
where is a suitable sequence of perturbed concentrations. We have the following result.
Theorem 2.3.
Let be a sequence of integers that diverges to infinity with as . Let , with , and , where is such that for any . Let and be sequences such that and , with the below, belong to for any . Then, putting ,
we have
(2.13) |
as under , where , under the same sequence of hypotheses, is asymptotically normal with mean zero and covariance matrix .
Theorem 2.3 shows that, in regime (iv), the sequence of high-dimensional FvML experiments is jointly LAN in the location and concentration parameters. The corresponding Fisher information matrix is not diagonal, which entails that the unspecification of the concentration parameter has asymptotically a positive cost when performing inference on the location parameter. In the present joint LAN framework, Le Cam optimal inference for location under unspecified concentration is to be based (see again Bickel et al., 1998) on the residual of the regression (in the limiting Gaussian shift experiment) of the location part of the central sequence with respect to the concentration part , that is, is to be based on the efficient central sequence
Under the null, is asymptotically normal with mean zero and variance , and the Le Cam optimal location test under unspecified rejects the null at asymptotic level when
As a corollary, provided that , the unspecified- optimal test in regime (iv) is the Watson test. Consequently, the difference between the local asymptotic powers in (2.11) and (2.12), associated with the Watson test and the specified- optimal test in regime (iv), respectively, can be interpreted as the asymptotic cost of the unspecification of the concentration when performing inference on location in the regime considered.
We now turn to regime (v), which is associated with , where and is a positive sequence satisfying and . Taking (as in Theorem 2.2) and considering perturbed concentrations of the form , one can show, by working along the same lines as in the proof of Theorem 2.3, that the sequence of experiments is also jointly LAN in location and concentration, provided that . The corresponding central sequence and Fisher information matrix are
(2.14) |
The collinearity between the location part and concentration part of the central sequence implies that the efficient central sequence is zero in regime (v). As a result, for the unspecified concentration problem, no test can detect deviations from the null hypothesis at the -rate in regime (v), which is in line with the corresponding trivial asymptotic powers of the Watson test in Section 2.2. In the next section, we will investigate whether or not the Watson test can detect more severe alternatives in this regime.
Finally, consider regime (vi), where the concentration is asymptotically of the form . Then, taking (still as in Theorem 2.2) and perturbed concentrations of the form , we obtain that, without any condition on , the sequence of experiments is jointly LAN in location and concentration with the same central sequence and Fisher information matrix as in (2.14). The resulting efficient central sequence is therefore zero again. Since, in regime (vi), provides the most severe location alternatives that can be considered, we conclude that, for the unspecified concentration problem, no test in regime (vi) can do asymptotically better than the trivial -level test that randomly rejects the null with probability . Under unspecified , thus, the Watson test is optimal in regime (vi), too, even if only in a degenerate way.
Wrapping up, we proved that the Watson test is optimal in regimes (i)-(iii) for the specified concentration problem and that it is optimal in all regimes but possibly regime (v) in the more important unspecified concentration one (in regime (iv), the latter optimality actually requires that ). The asymptotic cost due to the unspecification of the concentration is nil in regimes (i)-(iii) (and (vii)), affects limiting powers but not consistency rates in regime (iv), and affects consistency rates in regimes (v)-(vi).
3 Non-null investigation via martingale CLTs
The results of the previous section bring much information on the non-null behavior of the Watson test in high dimensions, but two important questions remain open. First, while the Watson test was shown to be Le Cam optimal for the unspecified concentration problem in regimes (i)-(iv) and (vi)-(vii), little is known about its performance in regime (v). More precisely, it is only known that, like any location test addressing the unspecified- problem, the Watson test is blind to the contiguous alternatives in Theorem 2.2(v), so that it is unclear whether or not this test can detect more severe alternatives. Second, the results of the previous section are limited to the FvML case, while the Watson test is valid (in the sense that it meets the asymptotic nominal level constraint) under much broader distributional assumptions. One may therefore wonder about the non-null behavior of the Watson test also under high-dimensional non-FvML distributions.
In this section, we address these two questions by investigating, through martingale CLTs, the non-null behavior of the Watson test under general rotationally symmetric alternatives. Recall that the distribution of a random vector with values in is rotationally symmetric about if and share the same distribution for any , and that it is rotationally symmetric if it is rotationally symmetric about some in . Clearly, if has an distribution, then it is rotationally symmetric about , so that the distributional context considered in this section will encompass the one in Section 2. Parallel to what was done there, we will refer to the decomposition with , and , as the tangent-normal decomposition of with respect to . If is rotationally symmetric about , then is uniformly distributed over and is independent of . The distribution of is then fully determined by and by the cumulative distribution function of , which justifies denoting the corresponding distribution as . In the sequel, we tacitly restrict to classes of rotationally symmetric distributions making identifiable, which typically only excludes distributions satisfying .
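The tangent-normal decomposition recalled above can be sketched numerically as follows. The sketch assumes the usual form of the decomposition, in which a unit vector is written as $u\boldsymbol{\theta} + v\mathbf{S}$, with $u$ the projection onto $\boldsymbol{\theta}$, $v = \sqrt{1 - u^2}$, and $\mathbf{S}$ the normalized component orthogonal to $\boldsymbol{\theta}$. The function name and the example vectors are hypothetical.

```python
import math


def tangent_normal(x, theta):
    """Return (u, v, s) such that x = u * theta + v * s.

    x and theta are unit vectors given as sequences of floats.
    s is a unit vector orthogonal to theta; it is undefined when
    x equals +/- theta, in which case a ValueError is raised.
    """
    u = sum(a * b for a, b in zip(x, theta))
    v = math.sqrt(max(0.0, 1.0 - u * u))
    residual = [a - u * b for a, b in zip(x, theta)]
    norm = math.sqrt(sum(r * r for r in residual))
    if norm == 0.0:
        raise ValueError("x is collinear with theta; s is not defined")
    s = [r / norm for r in residual]
    return u, v, s


# Example in dimension 3.
x = [0.6, 0.0, 0.8]
theta = [0.0, 0.0, 1.0]
u, v, s = tangent_normal(x, theta)
rebuilt = [u * t + v * si for t, si in zip(theta, s)]
```

Under rotational symmetry about $\boldsymbol{\theta}$, only the distribution of $u$ carries information beyond the location itself, which is why the section parametrizes these distributions by the location and the cumulative distribution function of $u$.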
We consider then a triangular array of observations of the form , , , where form a random sample from the rotationally symmetric distribution . The corresponding hypothesis, that will be denoted as involves a sequence of integers diverging to infinity, a sequence such that for any , and a sequence of cumulative distribution functions over . In this framework, the spherical location problem consists in testing against , where is a fixed null parameter sequence. Parallel to the notation that was used in the FvML case, we will write and , for the non-centered and centered moments of , respectively. These are the moments, under , of the quantity in the tangent-normal decomposition of with respect to . The corresponding non-centered moments of will still be denoted as .
Using the notation and from the tangent-normal decomposition of with respect to the null location , the Watson test statistic rewrites
where denotes the Watson test statistic in (1.2) based on the null location . Under the null and under appropriate local alternatives, it is expected that is asymptotically equivalent in probability to
so that an important step in the investigation of the non-null properties of is the study of the non-null behavior of . A classical martingale central limit theorem (see, e.g., Theorem 35.12 in Billingsley, 1995) provides the following result.
Theorem 3.1.
Let be a sequence of integers that diverges to infinity and be a sequence such that belongs to for any . Let be a sequence of cumulative distribution functions on such that (a) for any , (b) and (c) . Then, we have the following, where in each case refers to an arbitrary sequence such that