Detecting the direction of a signal on high-dimensional spheres: Non-null and Le Cam optimality results

We consider one of the most important problems in directional statistics, namely the problem of testing the null hypothesis that the spike direction θ of a Fisher-von Mises-Langevin distribution on the p-dimensional unit hypersphere is equal to a given direction θ_0. After a reduction through invariance arguments, we derive local asymptotic normality (LAN) results in a general high-dimensional framework where the dimension p_n goes to infinity at an arbitrary rate with the sample size n, and where the concentration κ_n behaves in a completely free way with n, which offers a spectrum of problems ranging from arbitrarily easy to arbitrarily challenging ones. We identify seven asymptotic regimes, depending on the convergence/divergence properties of (κ_n), that yield different contiguity rates and different limiting experiments. In each regime, we derive Le Cam optimal tests under specified κ_n and we compute, from the Le Cam third lemma, asymptotic powers of the classical Watson test under contiguous alternatives. We further establish LAN results with respect to both spike direction and concentration, which allows us to discuss optimality also under unspecified κ_n. To obtain a full understanding of the non-null behavior of the Watson test, we use martingale CLTs to derive its local asymptotic powers in the broader, semiparametric, model of rotationally symmetric distributions. A Monte Carlo study shows that the finite-sample behaviors of the various tests remarkably agree with our asymptotic results.






1 Introduction

In directional statistics, the sample space is the unit sphere S^{p-1} := {x in R^p : x'x = 1} in R^p. By far the most classical distributions on S^{p-1} are the Fisher–von Mises–Langevin (FvML) ones; see, e.g., Mardia and Jupp (2000) or Ley and Verdebout (2017). We say that the random vector X with values in S^{p-1} has an FvML_p(θ, κ) distribution, with θ in S^{p-1} and κ in (0, ∞), if it admits the density (throughout, densities on the unit sphere are with respect to the surface area measure)

$$x \mapsto \frac{c_{p,\kappa}}{\omega_p} \exp(\kappa\, x'\theta), \qquad (1.1)$$

where, denoting as Γ(·) the Euler Gamma function and as I_ν(·) the order-ν modified Bessel function of the first kind, ω_p := 2π^{p/2}/Γ(p/2) is the surface area of S^{p-1} and

$$c_{p,\kappa} := \frac{(\kappa/2)^{p/2-1}}{\Gamma(p/2)\, I_{p/2-1}(\kappa)}.$$

Clearly, θ is a location parameter (θ is the modal location on the sphere) that identifies the spike direction of the hyperspherical signal. In contrast, κ is a scale or concentration parameter: the larger κ, the more concentrated the distribution is about the modal location θ. As κ converges to zero, c_{p,κ} converges to one and the density in (1.1) converges to the density x ↦ 1/ω_p of the uniform distribution over S^{p-1}. The other extreme case, obtained for arbitrarily large values of κ, provides distributions that converge to a point mass in θ. Of course, it is expected that the larger κ, the easier it is to conduct inference on θ, that is, the more powerful the tests on θ and the smaller the corresponding confidence zones.
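The two limiting behaviours described above can be checked numerically. The sketch below assumes the standard FvML normalisation, in which the density with respect to the surface area measure is (c/ω_p) exp(κ x'θ) with c = (κ/2)^{p/2-1} / (Γ(p/2) I_{p/2-1}(κ)); the function names `sphere_area` and `fvml_c` are ours, and the Bessel function is evaluated through its power series, which is only suitable for moderate values of κ.

```python
import math

def sphere_area(p: int) -> float:
    """Surface area of the unit sphere S^{p-1} in R^p: 2 pi^{p/2} / Gamma(p/2)."""
    return 2.0 * math.pi ** (p / 2.0) / math.gamma(p / 2.0)

def fvml_c(p: int, kappa: float, terms: int = 200) -> float:
    """Normalising factor c = (kappa/2)^nu / (Gamma(nu+1) I_nu(kappa)), nu = p/2 - 1.

    Uses the series I_nu(k) = sum_j (k/2)^(2j+nu) / (j! Gamma(j+nu+1)), so that
    1/c = sum_j (k/2)^(2j) Gamma(nu+1) / (j! Gamma(j+nu+1)).
    Assumption: kappa > 0 is moderate (large kappa would overflow exp).
    """
    nu = p / 2.0 - 1.0
    log_half = math.log(kappa / 2.0)
    inv_c = 0.0
    for j in range(terms):
        log_term = (2 * j * log_half + math.lgamma(nu + 1.0)
                    - math.lgamma(j + 1.0) - math.lgamma(j + nu + 1.0))
        inv_c += math.exp(log_term)
    return 1.0 / inv_c

# As kappa -> 0 the factor tends to one, so the density tends to 1 / sphere_area(p).
print(fvml_c(50, 1e-6))
# For p = 3 the factor has the closed form kappa / sinh(kappa).
print(fvml_c(3, 2.0), 2.0 / math.sinh(2.0))
```

For p = 3 the closed form κ / sinh(κ) gives an independent check of the series, and it also shows the factor decaying as κ grows, in line with the concentration of the mass around θ.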

In this paper, we consider inference on θ and focus on the generic testing problem for which the null hypothesis H_0: θ = θ_0, for a fixed θ_0 in S^{p-1}, is to be tested against H_1: θ ≠ θ_0 on the basis of a random sample X_{n1}, ..., X_{nn} from the FvML_p(θ, κ) distribution; the triangular array notation anticipates non-standard setups where p (hence, also θ) and/or κ will depend on n. Inference problems on θ in the low-dimensional case have been considered among others in Watson (1983), Chang and Rivest (2001), Larsen, Blæsild and Sørensen (2002), Ley et al. (2013) and Paindaveine and Verdebout (2017). The related spherical regression problem has been tackled in Rivest (1989) and Downs (2003), while testing for location on axial frames has been considered in Arnold and Jupp (2013).

Letting X̄_n := (1/n) Σ_{i=1}^n X_{ni}, the most classical test for the testing problem above is the Watson (1983) test rejecting the null at asymptotic level α whenever

$$W_n := \frac{n(p-1)\, \bar{X}_n'(I_p - \theta_0\theta_0')\bar{X}_n}{1 - \frac{1}{n}\sum_{i=1}^n (X_{ni}'\theta_0)^2} > \chi^2_{p-1,1-\alpha}, \qquad (1.2)$$

where I_ℓ stands for the ℓ-dimensional identity matrix and χ²_{ℓ,1-α} denotes the α-upper quantile of the chi-square distribution with ℓ degrees of freedom. In the classical setup where p and κ are fixed, the asymptotic properties of the Watson test are well-known, both under the null and under local alternatives; see, e.g., Watson (1983) or Mardia and Jupp (2000). Optimality properties in the Le Cam sense have been studied in Paindaveine and Verdebout (2015). In the non-standard setup where κ = κ_n converges to zero, Paindaveine and Verdebout (2017) investigated the asymptotic null and non-null behaviors of the Watson test. Interestingly, irrespective of the rate at which κ_n converges to zero (that is, irrespective of how fast the inference problem becomes more challenging as a function of n), the Watson test keeps meeting the asymptotic nominal level constraint and maintains strong optimality properties; see Paindaveine and Verdebout (2017) for details.
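To make the Watson statistic concrete, here is a minimal sketch of its computation, assuming the usual form n(p-1) X̄'(I_p - θ_0θ_0')X̄ divided by 1 - (1/n) Σ (X_i'θ_0)². The function name `watson_statistic` and the toy samples are ours; observations and θ_0 are assumed to be unit vectors given as plain Python sequences.

```python
def watson_statistic(sample, theta0):
    """Watson statistic for H0: theta = theta0, from unit vectors in R^p.

    Assumed form: n (p-1) xbar'(I - theta0 theta0') xbar / (1 - mean((x'theta0)^2)).
    """
    n = len(sample)
    p = len(theta0)
    # Sample mean vector.
    xbar = [sum(x[j] for x in sample) / n for j in range(p)]
    # Squared norm of the part of xbar orthogonal to theta0; since the projector
    # I - theta0 theta0' is idempotent, this equals xbar'(I - theta0 theta0') xbar.
    proj = sum(a * b for a, b in zip(xbar, theta0))
    tangent_sq = sum((a - proj * b) ** 2 for a, b in zip(xbar, theta0))
    # Average squared projection of the observations onto theta0.
    mean_u2 = sum(sum(a * b for a, b in zip(x, theta0)) ** 2 for x in sample) / n
    return n * (p - 1) * tangent_sq / (1.0 - mean_u2)

theta0 = (0.0, 0.0, 1.0)
# Observations that all lean the same way off theta0 give a large statistic ...
print(watson_statistic([(0.6, 0.0, 0.8), (0.6, 0.0, 0.8)], theta0))
# ... whereas deviations that cancel around theta0 give zero.
print(watson_statistic([(0.6, 0.0, 0.8), (-0.6, 0.0, 0.8)], theta0))
```

The returned value would then be compared with the upper-α quantile of the chi-square distribution with p - 1 degrees of freedom, which the Python standard library does not provide and is therefore left out of this sketch.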

For a fixed dimension p, this essentially settles the investigation of the properties of the Watson test and the study of the corresponding hypothesis testing problem. Nowadays, however, increasingly many applications lead to considering high-dimensional directional data: tests of uniformity on high-dimensional spheres have been studied in Chikuse (1991), Cuesta-Albertos, Cuevas and Fraiman (2009), Cai, Fan and Jiang (2013) and Cutting, Paindaveine and Verdebout (2017a), while high-dimensional FvML distributions (or mixtures of high-dimensional FvML distributions) have been considered in magnetic resonance, gene-expression, and text mining; see, among others, Dryden (2005), Banerjee et al. (2003), and Banerjee et al. (2005). This motivates considering the high-dimensional spherical location problem, based on a random sample X_{n1}, ..., X_{nn} from the FvML_{p_n}(θ_n, κ_n) distribution, with (p_n) diverging to infinity (the dimension of θ_n then depends on n, which justifies the notation). In this context, Ley, Paindaveine and Verdebout (2015) proved that the Watson test is robust to high-dimensionality in the sense that, as p_n goes to infinity with n, this test still has asymptotic size α under the null hypothesis. This does not require any condition on the concentration sequence (κ_n) nor on the rate at which p_n goes to infinity, hence covers arbitrarily easy problems (κ_n large) and arbitrarily challenging ones (κ_n small), as well as moderately high dimensions and ultra-high dimensions. On its own, however, this null robustness result is obviously far from sufficient to motivate using the Watson test in high dimensions, as it might very well be that robustness under the null is obtained at the expense of power (in the extreme case, the Watson test, in high dimensions, might actually asymptotically behave like the trivial α-level test that randomly rejects the null with probability α).

These considerations raise many interesting questions, among which: are there alternatives under which the Watson test is consistent in high dimensions? What are the least severe alternatives (if any) under which the Watson test exhibits non-trivial asymptotic powers? Is the Watson test rate-optimal or, on the contrary, are there tests that show asymptotic powers under less severe alternatives than those detected by the Watson test? Does the Watson test enjoy optimality properties in high dimensions? As we will show, answering these questions will require considering several regimes fixing how the concentration κ_n behaves as a function of the dimension p_n and sample size n. Our results, which will crucially depend on the regime considered, are extensive in the sense that they answer the questions above in all possible regimes.

We achieve this by combining two approaches that are somewhat orthogonal in spirit. (a) The first approach is based on Le Cam’s theory of asymptotic experiments. While this theory is very general, it does not directly apply in the present context since the high-dimensional spherical location problem involves a parametric space, namely S^{p_n-1}, that depends on n (through p_n). We solve this by exploiting the invariance properties of the testing problem considered. In the image of the model by the corresponding maximal invariant, indeed, the parametric space does not depend on n anymore, which opens the door to studying the problem through the Le Cam approach. We derive stochastic second-order expansions of the resulting log-likelihood ratios, which is the main technical ingredient to establish the local asymptotic normality (LAN) of the invariant model. The LAN property takes different forms and involves different contiguity rates depending on the regime that is considered. In each regime, we determine the Le Cam optimal test for the problem considered and apply the Le Cam third lemma to obtain the asymptotic powers of this test and of the Watson test. This allows us to determine the regime(s) in which the Watson test is Le Cam optimal, or only rate-optimal, or not even rate-optimal. While this is first done under specified concentration κ_n, we also provide LAN results with respect to both location and concentration to be able to discuss optimality under unspecified κ_n, too.

(b) In regimes where the Watson test is not rate-optimal, that is, in regimes where it is blind to contiguous alternatives, this first, Le Cam, approach leaves open the question of the existence of alternatives that can be detected by the Watson test. This motivates complementing our investigation by a second approach, where we resort to martingale CLTs to study the asymptotic non-null properties of the Watson test. We identify the alternatives (if any) under which the Watson test will show non-trivial asymptotic powers in high dimensions, which again requires considering various regimes according to the concentration pattern. We do so in a broad, semiparametric, model, namely in the class of rotationally symmetric distributions. Therefore, the corresponding results not only allow us to answer the questions left open by the Le Cam approach but they also extend beyond the parametric FvML framework many of the results obtained there through the Le Cam approach.

The outline of the paper is as follows. In Section 2, we consider the high-dimensional version of the FvML spherical location problem. In Section 2.1, we describe the invariance approach that allows us to later rely on Le Cam’s theory of asymptotic experiments. In Section 2.2, we provide a stochastic second-order expansion of the resulting invariant log-likelihood ratios and prove, in various regimes that we identify, that these invariant models are locally asymptotically normal. This allows us to derive the corresponding Le Cam optimal tests for the specified concentration problem and to study the non-null asymptotic behavior of the Watson test in the light of these results. In Section 2.3, we tackle the unspecified concentration problem through the derivation of LAN results that are with respect to both location and concentration. In Section 3, we conduct a systematic investigation of the non-null asymptotic properties of the Watson test in the broader context of rotationally symmetric distributions. Jointly with the results of Section 2, this allows us to fully characterize, in the FvML case, the non-null asymptotic behavior of the Watson test. In Section 4, we conduct a Monte Carlo study to investigate how well the finite-sample behaviors of the various tests reflect our theoretical asymptotic results. In Section 5, we summarize the results derived in the paper and comment on the few remaining open questions. The appendix contains all proofs.

2 Invariance and Le Cam optimality

As already mentioned in the introduction, the high-dimensional spherical location problem requires considering triangular arrays of observations of the form X_{ni}, i = 1, ..., n, n = 1, 2, ... For any sequence (θ_n) such that θ_n belongs to S^{p_n-1} for any n and any sequence (κ_n) in (0, ∞), we denote as P^{(n)}_{θ_n,κ_n} the hypothesis under which X_{ni}, i = 1, ..., n, form a random sample from the FvML_{p_n}(θ_n, κ_n) distribution. The resulting sequence of statistical models is then associated with


(the index n in the parameter θ_n in principle is superfluous but is used here to stress the dependence of this parameter on p_n, hence on n). The spherical location problem consists in testing the null hypothesis H_0: θ_n = θ_{n0} against the alternative H_1: θ_n ≠ θ_{n0}, where (θ_{n0}) is a fixed sequence such that θ_{n0} belongs to S^{p_n-1} for any n. Clearly, θ_n is the parameter of interest, whereas κ_n plays the role of a nuisance. The main objective of this section is to derive Le Cam optimality results for this problem, referring to local alternatives of the form P^{(n)}_{θ_n,κ_n}, with θ_n = θ_{n0} + ν_n τ_n, where the sequence (ν_n) and the bounded sequence (τ_n), respectively in (0, ∞) and in R^{p_n}, are such that θ_{n0} + ν_n τ_n belongs to S^{p_n-1} for any n, which imposes that

$$\theta_{n0}'\tau_n = -\frac{\nu_n}{2}\,\|\tau_n\|^2 \qquad (2.2)$$

for any n. Since the sequence of “statistical experiments” associated with (2.1) involves parametric spaces S^{p_n-1} that depend on n, applying Le Cam’s theory will require the following reduction of the problem through invariance arguments.
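The constraint that the local perturbation must satisfy follows from a one-line norm computation. As a sketch, write the perturbed location as θ_{n0} + ν_n τ_n, with ν_n > 0 the scalar rate and τ_n the bounded perturbation direction (this notation is assumed here), and impose that both θ_{n0} and the perturbed location have unit norm:

```latex
\begin{align*}
1 = \|\theta_{n0} + \nu_n \tau_n\|^2
  &= \|\theta_{n0}\|^2 + 2\nu_n\, \theta_{n0}'\tau_n + \nu_n^2\, \|\tau_n\|^2 \\
  &= 1 + 2\nu_n\, \theta_{n0}'\tau_n + \nu_n^2\, \|\tau_n\|^2 ,
\end{align*}
so that, dividing by $2\nu_n > 0$,
\[
\theta_{n0}'\tau_n = -\frac{\nu_n}{2}\, \|\tau_n\|^2 .
\]
```

In particular, τ_n is asymptotically orthogonal to θ_{n0} whenever ν_n converges to zero, whereas for a non-vanishing ν_n the curvature of the sphere forces a genuinely negative component along θ_{n0}.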

2.1 Reduction through invariance

Denoting as  the collection of orthogonal matrices satisfying , the null hypothesis  is invariant under the group  collecting the transformations

with . The transformation  induces a transformation of the parametric space  defined through . The orbits of the resulting induced group are , with  and . In such a context, the invariance principle (see, e.g., Lehmann and Romano, 2005, Chapter 6) leads to restricting to tests  that are invariant with respect to the group . Denoting as  a maximal invariant statistic for , the class of invariant tests coincides with the class of -measurable tests. Invariant tests thus are to be defined in the image


of the model  by , where  denotes the common distribution of  under any  with . Unlike the original sequence of statistical experiments in (2.1), the invariant one in (2.3) involves a fixed parametric space , which makes it in principle possible to rely on Le Cam’s asymptotic theory.

Now, the original local log-likelihood ratios associated with the generic local alternatives  above correspond, in view of (2.2), to the invariant local log-likelihood ratios


Deriving local asymptotic normality (LAN) results requires investigating the asymptotic behavior of such invariant log-likelihood ratios, which in turn requires evaluating the corresponding likelihoods. While obtaining a closed-form expression for  and its distribution is a very challenging task, these likelihoods can be obtained from Lemma 2.5.1 in Giri (1996), which, denoting as  the surface area measure on  ( times), yields


where integration is with respect to the Haar measure on . Note that (2.5) shows that the invariant null probability measure  coincides with the original null probability measure . In other words, it is only for non-null probability measures that the invariance reduction above is non-trivial.

2.2 Optimal testing under specified κ_n

The main ingredient needed to obtain LAN results is Theorem 2.1 below, that provides a stochastic second-order expansion of the invariant log-likelihood ratios in (2.4). To state this theorem, we need to introduce the following notation. We will refer to the decomposition , with

as the tangent-normal decomposition of  with respect to . Under the hypothesis ,

has probability density function


where  denotes the indicator function of , is uniformly distributed over the “equator” , and  and  are mutually independent. Throughout, we will denote as  and ,

the non-centered and centered moments of 

under , and as  the corresponding non-centered moments of . Although this is not stressed in the notation, these moments clearly depend on  and ; for instance,


(this readily follows from (2)-(3) in Schou, 1978 by using the standard properties of exponential families; see also Lemma S.2.1 in Cutting, Paindaveine and Verdebout, 2017b). We can now state the stochastic second-order expansion result of the invariant log-likelihood ratios in (2.4).

Theorem 2.1.

Let  be a sequence of integers that diverges to infinity and  be an arbitrary sequence in . Let , and  be sequences such that and  belong to  for any , with  bounded and  such that


Then, letting

we have that

as  under .

Recalling that the log-likelihood ratio  refers to the local perturbation  of the null reference value , the result in Theorem 2.1 essentially shows that the invariant model considered enjoys a local asymptotic quadraticity (LAQ) structure in the vicinity of the null hypothesis ; see, e.g., Le Cam and Yang (2000), page 120. Actually, quadraticity, which is supposed to be in the increment , only holds for arbitrarily small values of this increment, hence only in regimes where  will converge to zero (in regimes below where, in contrast,  will be constant, the non-flat manifold structure of the hypersphere actually prevents a standard quadraticity property). This LAQ result hints that optimal testing for the specified- problem at hand is obtained by rejecting the null for small values of  (that is, when  and  project far from each other onto the axis ), for large values of  (that is, when  and  project far from each other onto the orthogonal complement to  in 

), or, more generally, for large values of a hybrid test statistic of the form

with non-negative weights  and . While any  provides a reasonable test statistic for the problem at hand, only one set of weights will yield a Le Cam optimal test and, interestingly, this set of weights depends on the way  behaves with  and . This will be one of the many consequences of the following LAN result.

Theorem 2.2.

Let  be a sequence of integers that diverges to infinity,  be a sequence in , and  be a sequence such that belongs to  for any . Then, there exist a sequence  in 

and a sequence of random variables 

that is asymptotically normal with zero mean and variance 

under  such that, for any bounded sequence  such that  belongs to  for any ,

as  under . If (i) , then

if (ii) , then, letting ,

if (iii)  with , then

if (iv) , then

if (v)  with , then

if (vi) , then

finally, if (vii) , then, even with , the invariant log-likelihood ratio  is  as  under .

In the image model (2.3), the spherical location problem consists in testing  against . In the localized at  experiments, parametrized by  as in (2.4), this reduces to testing  against . In any given regime (i)-(vii) from Theorem 2.2, it directly follows from this theorem that a locally asymptotically most powerful test for this problem — hence, locally asymptotically most powerful invariant test for the original spherical location problem — rejects the null at asymptotic level  whenever



where Φ denotes the cumulative distribution function of the standard normal (in the rest of the paper, the term “optimal” will refer to this particular Le Cam optimality concept). A routine application of the Le Cam third lemma then shows that, in each regime, the asymptotic distribution of , under the corresponding contiguous alternatives  with , is normal with mean  and variance , so that the resulting asymptotic power of the optimal test in (2.9) is


In each regime (i)-(vii),  is the contiguity rate, which implies that the least severe alternatives under which a test may have non-trivial asymptotic powers are of the form , with a sequence  that is  but not . Theorem 2.2 shows that this contiguity rate depends on the regime considered and does so in a monotonic fashion, which is intuitively reasonable: the larger  (that is, the easier the inference problem), the faster  goes to zero, that is, the less severe the alternatives that can be detected by rate-consistent tests. Because the unit sphere  has a fixed diameter, characterizes the most severe alternatives that can be considered. In regime (vi), no tests will therefore be consistent under such most severe alternatives, while, in regime (vii), the distribution is so close to the uniform distribution on  that no tests can show non-trivial asymptotic powers under such alternatives, so that even the trivial -test is optimal.
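The power computation that the Le Cam third lemma delivers can be illustrated generically. In the sketch below, a test statistic is assumed to be asymptotically N(0, variance) under the null and N(shift, variance) under the contiguous alternatives, and the test rejects when the standardized statistic exceeds the upper-α standard normal quantile. The arguments `shift` and `variance` are placeholders for illustration, not the regime-specific expressions of Theorem 2.2.

```python
from statistics import NormalDist

def gaussian_shift_power(shift: float, variance: float, alpha: float = 0.05) -> float:
    """Asymptotic power of a one-sided level-alpha test in a Gaussian shift model.

    Null: statistic ~ N(0, variance); alternative: statistic ~ N(shift, variance).
    The test rejects when statistic / sqrt(variance) > Phi^{-1}(1 - alpha), so the
    power is 1 - Phi(Phi^{-1}(1 - alpha) - shift / sqrt(variance)).
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)
    return 1.0 - std_normal.cdf(critical - shift / variance ** 0.5)

# A zero shift returns the nominal level; larger shifts give larger powers.
for shift in (0.0, 1.0, 2.0, 3.0):
    print(shift, gaussian_shift_power(shift, variance=1.0))
```

A test that is blind to the contiguous alternatives corresponds to a zero shift, hence to a limiting power equal to the nominal level α.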

One of the most striking consequences of Theorem 2.2 is that the optimal test depends on the regime considered. In regimes (v)-(vii), the optimal test in (2.9) rejects the null when ; of course, this optimality is degenerate in regime (vii), where any invariant test with asymptotic level  would also be optimal. In contrast, the optimal -level test in regimes (i)-(iii) rejects the null when

Since the chi-square distribution with ℓ degrees of freedom converges, after standardization via its mean ℓ and standard deviation √(2ℓ), to the standard normal distribution as ℓ diverges to infinity, this test is asymptotically equivalent to the Watson test in (1.2), based obviously on the dimension p_n at hand. This shows that, in regimes (i)-(iii), the traditional, low-dimensional, Watson test is optimal in high dimensions. In regime (iv), which is at the frontier between these regimes where the optimal test is the Watson test and those where the optimal test is based on , the optimal test is quite naturally based on a linear combination of  and .
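The chi-square standardization invoked above can be spelled out as follows. This is a sketch of the standard argument, with Z_1, ..., Z_ℓ denoting independent standard normal variables and W_n the Watson statistic (both symbols are assumed notation here):

```latex
\[
\chi^2_\ell \overset{d}{=} \sum_{j=1}^{\ell} Z_j^2 ,
\qquad
\mathrm{E}[Z_j^2] = 1 , \quad \mathrm{Var}[Z_j^2] = 2
\quad\Longrightarrow\quad
\frac{\chi^2_\ell - \ell}{\sqrt{2\ell}} \xrightarrow{\;d\;} \mathcal{N}(0,1)
\quad \text{as } \ell \to \infty ,
\]
so that the upper-$\alpha$ quantiles satisfy
\[
\frac{\chi^2_{\ell,1-\alpha} - \ell}{\sqrt{2\ell}} \longrightarrow \Phi^{-1}(1-\alpha) ,
\]
and the rejection rule $W_n > \chi^2_{p_n-1,1-\alpha}$ can be rewritten as
\[
\frac{W_n - (p_n - 1)}{\sqrt{2(p_n - 1)}}
> \frac{\chi^2_{p_n-1,1-\alpha} - (p_n - 1)}{\sqrt{2(p_n - 1)}}
= \Phi^{-1}(1-\alpha) + o(1) .
\]
```

The first display is the classical central limit theorem applied to the squared normal variables; the last one shows that replacing the chi-square quantile by its normal approximation only shifts the standardized critical value by a vanishing amount.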

Finally, the Le Cam third lemma allows us to derive the asymptotic non-null behavior of the Watson test under the contiguous alternatives considered in any regime (i)-(vii). In regimes (i)-(iv), the limiting powers under contiguous alternatives of the form , with , are given by


In regimes (i)-(iii), the Watson test is the optimal test and these asymptotic powers are equal to those in (2.10), whereas in regime (iv), the Watson test is only rate-consistent, as the corresponding asymptotic powers of the optimal test are


In regimes (v)-(vi), the Le Cam third lemma shows that the limiting powers of the Watson test, still under the corresponding contiguous alternatives, are equal to the nominal level α, so that the Watson test is not even rate-consistent in those regimes. Finally, as already discussed, the Watson test is optimal in regime (vii), but trivially so since the trivial α-level test there also is.

2.3 Optimal testing under unspecified κ_n

The optimal test in regimes (i)-(iii), namely the Watson test, is a genuine test in the sense that it can be applied on the basis of the observations only. In contrast, the optimal tests in regimes (iv)-(vi) are “oracle” tests since they require knowing the values of  and , or equivalently (see (2.7)), the value of the concentration . This concentration, however, can hardly be assumed to be specified in practice, so that it is natural to wonder what is the optimal test, in regimes (iv)-(vi), when  is treated as a nuisance parameter.

We first focus on regime (iv). There, the concentration  is asymptotically of the form  for some . Within regime (iv), , obviously, is a perfectly valid alternative concentration parameter. Inspired by the classical treatment of asymptotically optimal inference in the presence of nuisance parameters (see, e.g., Bickel et al., 1998), this suggests studying the asymptotic behavior of invariant log-likelihood ratios of the form

where  is a suitable sequence of perturbed concentrations. We have the following result.

Theorem 2.3.

Let  be a sequence of integers that diverges to infinity with as . Let , with , and , where  is such that for any . Let  and  be sequences such that and , with the  below, belong to  for any . Then, putting ,

we have


as under , where , under the same sequence of hypotheses, is asymptotically normal with mean zero and covariance matrix .

Theorem 2.3 shows that, in regime (iv), the sequence of high-dimensional FvML experiments is jointly LAN in the location and concentration parameters. The corresponding Fisher information matrix  is not diagonal, which entails that the unspecification of the concentration parameter has asymptotically a positive cost when performing inference on the location parameter. In the present joint LAN framework, Le Cam optimal inference for location under unspecified concentration is to be based (see again Bickel et al., 1998) on the residual of the regression (in the limiting Gaussian shift experiment) of the location part  of the central sequence  with respect to the concentration part , that is, is to be based on the efficient central sequence

Under the null, this efficient central sequence is asymptotically normal with mean zero and variance , and the Le Cam optimal location test under unspecified κ_n rejects the null at asymptotic level α when

As a corollary, provided that , the unspecified- optimal test in regime (iv) is the Watson test. Consequently, the difference between the local asymptotic powers in (2.11) and (2.12), associated with the Watson test and the specified- optimal test in regime (iv), respectively, can be interpreted as the asymptotic cost of the unspecification of the concentration when performing inference on location in the regime considered.

We now turn to regime (v), which is associated with , where  and  is a positive sequence satisfying  and . Taking (as in Theorem 2.2) and considering perturbed concentrations of the form , one can show, by working along the same lines as in the proof of Theorem 2.3, that the sequence of experiments is also jointly LAN in location and concentration, provided that . The corresponding central sequence and Fisher information matrix are


The collinearity between the location part and the concentration part of the central sequence implies that the efficient central sequence is zero in regime (v). As a result, for the unspecified concentration problem, no test can detect deviations from the null hypothesis at the contiguity rate of Theorem 2.2(v), which is in line with the corresponding trivial asymptotic powers of the Watson test in Section 2.2. In the next section, we will investigate whether or not the Watson test can detect more severe alternatives in this regime.

Finally, consider regime (vi), where the concentration κ_n is asymptotically of the form . Then, taking  (still as in Theorem 2.2) and perturbed concentrations of the form , we obtain that, without any condition on , the sequence of experiments is jointly LAN in location and concentration with the same central sequence and Fisher information matrix as in (2.14). The resulting efficient central sequence is therefore zero again. Since, in regime (vi),  provides the most severe location alternatives that can be considered, we conclude that, for the unspecified concentration problem, no test in regime (vi) can do asymptotically better than the trivial α-level test that randomly rejects the null with probability α. Under unspecified κ_n, thus, the Watson test is optimal in regime (vi), too, even if it is in a degenerate way.

Wrapping up, we proved that the Watson test is optimal in regimes (i)-(iii) for the specified concentration problem and that it is optimal in all regimes but possibly regime (v) in the more important unspecified concentration one (in regime (iv), the latter optimality actually requires that ). The asymptotic cost due to the unspecification of the concentration is nil in regimes (i)-(iii) (and (vii)), affects limiting powers but not consistency rates in regime (iv), and affects consistency rates in regimes (v)-(vi).

3 Non-null investigation via martingale CLTs

The results of the previous section bring much information on the non-null behavior of the Watson test in high dimensions, but two important questions remain open. First, while the Watson test was shown to be Le Cam optimal for the unspecified concentration problem in regimes (i)-(iv) and (vi)-(vii), little is known about its performance in regime (v). More precisely, it is only known that, like any location test addressing the unspecified-κ_n problem, the Watson test is blind to the contiguous alternatives in Theorem 2.2(v), so that it is unclear whether or not this test can detect more severe alternatives. Second, the results of the previous section are limited to the FvML case, while the Watson test is valid (in the sense that it meets the asymptotic nominal level constraint) under much broader distributional assumptions. One may therefore wonder about the non-null behavior of the Watson test also under high-dimensional non-FvML distributions.

In this section, we address these two questions by investigating, through martingale CLTs, the non-null behavior of the Watson test under general rotationally symmetric alternatives. Recall that the distribution of a random vector  with values in  is rotationally symmetric about  if  and  share the same distribution for any , and that it is rotationally symmetric if it is rotationally symmetric about some  in . Clearly, if  has an distribution, then it is rotationally symmetric about , so that the distributional context considered in this section will encompass the one in Section 2. Parallel to what was done there, we will refer to the decomposition with , and , as the tangent-normal decomposition of  with respect to . If  is rotationally symmetric about , then is uniformly distributed over  and is independent of . The distribution of  is then fully determined by  and by the cumulative distribution function  of , which justifies denoting the corresponding distribution as . In the sequel, we tacitly restrict to classes of rotationally symmetric distributions making  identifiable, which typically only excludes distributions satisfying .

We consider then a triangular array of observations of the form , , , where  form a random sample from the rotationally symmetric distribution . The corresponding hypothesis, that will be denoted as  involves a sequence of integers  diverging to infinity, a sequence  such that  for any , and a sequence  of cumulative distribution functions over . In this framework, the spherical location problem consists in testing against , where  is a fixed null parameter sequence. Parallel to the notation that was used in the FvML case, we will write  and , for the non-centered and centered moments of , respectively. These are the moments, under , of the quantity  in the tangent-normal decomposition of  with respect to . The corresponding non-centered moments of will still be denoted as .
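The tangent-normal decomposition and the moments it gives rise to can be sketched as follows. The code assumes the usual form of the decomposition, x = u θ + v S with u = x'θ, v = √(1 - u²) and S the unit vector along the part of x orthogonal to θ; the function names `tangent_normal` and `u_moments` are ours, and the moments computed are empirical counterparts of the first non-centered moment of u, the second centered moment of u and the second non-centered moment of v.

```python
import math

def tangent_normal(x, theta):
    """Decompose a unit vector x as u * theta + v * s, with s a unit vector orthogonal to theta."""
    u = sum(a * b for a, b in zip(x, theta))
    v = math.sqrt(max(0.0, 1.0 - u * u))
    if v == 0.0:
        raise ValueError("x equals +theta or -theta: the direction s is undefined")
    # For unit x and theta, the norm of x - u * theta equals v.
    s = [(a - u * b) / v for a, b in zip(x, theta)]
    return u, v, s

def u_moments(us):
    """Empirical moments built from the projections u_i = x_i' theta."""
    n = len(us)
    e1 = sum(us) / n                                   # first non-centered moment of u
    e2_centered = sum((u - e1) ** 2 for u in us) / n   # second centered moment of u
    f2 = sum(1.0 - u * u for u in us) / n              # second non-centered moment of v
    return e1, e2_centered, f2

theta = (0.0, 0.0, 1.0)
u, v, s = tangent_normal((0.6, 0.0, 0.8), theta)
print(u, v, s)
print(u_moments([0.8, 0.6]))
```

Under rotational symmetry about θ, only the distribution of u carries information beyond θ itself, which is why the moments above are computed from the projections alone.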

Using the notation  and  from the tangent-normal decomposition of  with respect to the null location , the Watson test statistic rewrites

where  denotes the Watson test statistic in (1.2) based on the null location . Under the null and under appropriate local alternatives, it is expected that  is asymptotically equivalent in probability to

so that an important step in the investigation of the non-null properties of  is the study of the non-null behavior of . A classical martingale central limit theorem (see, e.g., Theorem 35.12 in Billingsley, 1995) provides the following result.

Theorem 3.1.

Let  be a sequence of integers that diverges to infinity and  be a sequence such that belongs to  for any . Let  be a sequence of cumulative distribution functions on  such that (a) for any , (b) and (c) . Then, we have the following, where in each case  refers to an arbitrary sequence such that