# Asymptotic regime for improperness tests of complex random vectors

Improperness testing for complex-valued vectors and signals has received attention lately due to potential applications in complex-valued time series analysis, encountered in many fields from communications to oceanography. This paper provides new results for such tests in the asymptotic regime, i.e. when the vector and sample sizes grow commensurately to infinity. The studied tests are based on invariant statistics named canonical correlation coefficients. Limiting distributions for these statistics are derived, together with those of the Generalized Likelihood Ratio Test (GLRT) and Roy's test, in the Gaussian case. This characterization in the asymptotic regime also makes it possible to identify a phase transition in Roy's test, with potential application to the detection of complex-valued low-rank signals corrupted by proper noise in large datasets. Simulations illustrate the accuracy of the proposed asymptotic approximations.

## I Introduction

Testing for properness consists of deciding whether an $N$-dimensional complex random vector $z$ is proper or improper. For complex Gaussian vectors, properness means that $z$ is equal in distribution to $e^{i\theta}z$ for any angle $\theta$ (i.e. the pdf of $z$ is invariant under rotation) [1, 2]. This is actually equivalent to second-order circularity [1, 2]. This problem has been considered by several authors, using different means, including Generalized Likelihood Ratio Tests (GLRT) [3, 4], the locally most powerful (LMP) test [5, Chapter 3] and frequency-domain tests [6]. The asymptotic behavior of the GLRT was studied for large sample sizes in the case of random variables ($N=1$) [7] or small/fixed values of $N$ [8]. The situation where both the dimension of the complex vector and the size of the sample tend to infinity was not considered until our recent preliminary study [9]. This article formalizes and extends [9], thereby filling the gap and providing insight into the asymptotic behavior of improperness tests when the vector dimension and sample size grow commensurately to infinity.

Central to many of the tests available in the literature, the set of invariant parameters was first considered in [10], allowing for the derivation of invariant statistics used in [8]. Invariant parameters are in one-to-one correspondence with canonical correlation coefficients [4] as explained in [5].

In this article, we consider $N$-dimensional complex-valued centered random vectors $z$ with Cartesian form $z = u + iv$, i.e. $u$ and $v$ are $N$-dimensional real vectors with zero mean. Two augmented representations are classically used in the literature to study complex vectors $z$, namely the real augmented representation $x$ and the complex augmented representation $\tilde z$. The former consists of representing $z$ by a twice larger real-valued vector $x = (u^T, v^T)^T$ made up from the real and imaginary parts of $z$, while the latter consists of using a twice larger complex-valued vector $\tilde z$ containing $z$ and its conjugate $z^*$. Both representations are equivalent and easily connected using linear mappings given for example in [5]. In this paper, we make use of the real representation $x$.

Recalling that $z$ is centered, and thus that $x$ is centered as well, the second-order statistics of $z$ are contained in the real-valued covariance matrix $C$ of the real representation vector $x$, which reads:

$$E[xx^T] = C = \begin{pmatrix} C_{uu} & C_{uv} \\ C_{vu} & C_{vv} \end{pmatrix} \tag{1}$$

where $C_{ab}$ denotes the real-valued (cross-)covariance matrix between real vectors $a$ and $b$, with $a, b \in \{u, v\}$. A complex-valued Gaussian vector $z$ is called proper iff the following two conditions hold:

$$C_{uu} = C_{vv} \quad\text{and}\quad C_{uv}^T = -C_{uv}. \tag{2}$$

If these conditions are not fulfilled, then $z$ is called improper. Properness thus means that the real and imaginary parts, i.e. $u$ and $v$, have the same covariance matrix and that their cross-covariance is skew-symmetric, see (2). When using the complex representation, properness is equivalent to having $C_{zz^*} = 0$, which means that $z$ and $z^*$ are uncorrelated (note that, even in the Gaussian case, $z$ and $z^*$ being uncorrelated does not mean that they are independent).

The original contributions of the present work consist of the following results in the asymptotic high-dimensional regime: the limiting distribution of the maximal invariant statistics, and accurate approximations for standard test statistics used in multivariate analysis, namely the GLRT and Roy’s test, are derived under the null hypothesis of properness. Moreover, a phase transition behavior is shown to exist, allowing for the detection of complex-valued low-rank signals corrupted by proper noise.

The paper is organized as follows. In Section II, the maximal invariant statistics are introduced for improperness testing. Their limiting distributions are also derived. Section III provides the exact and limiting distributions of the GLRT, as well as the limiting distribution of Roy’s test statistic. Simulations are conducted in Section IV to assess the accuracy of the proposed approximations. Some concluding remarks are given in the last section.

## II Canonical correlation coefficients

In order to design statistical tests that are based on second-order statistics and are not sensitive to reparametrization (by linear transformation), it is known [10, 5] that the canonical correlation coefficients should be the quantities to use. In this section, we introduce them and provide joint and marginal distributions in the asymptotic regime for their empirical estimates.

### II-A Testing problem

In several applications (e.g. fMRI [11], DOA estimation [12] or communications [5]; see [5] and references therein for a larger list of applications involving complex signals and improperness-related issues), it is common to model the signal/vector of interest, denoted $z$, as being improper and corrupted by proper Gaussian noise. Consequently, statistical tests have been proposed to investigate the properness/improperness of a complex signal given a sample (of size $M$ in the sequel), from which one needs to decide:

$$\begin{cases} H_0: z \text{ is proper} & \text{if condition (2) holds,} \\ H_1: z \text{ is improper} & \text{otherwise.} \end{cases} \tag{3}$$

In order to design a statistical test which is invariant under linear transformation, and as explained in [10], one should use the eigenvalues of the augmented covariance matrix $C$ given in (1). Before introducing the invariant statistics to be derived from observed complex vectors, we first recall some results about the eigenvalues of real augmented PSD (Positive Semi-Definite) matrices.

### II-B Invariant parameters

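Let $\mathcal{G}$ be the set of non-singular $2N \times 2N$ matrices $G$ s.t.

$$G = \begin{pmatrix} G_1 & -G_2 \\ G_2 & G_1 \end{pmatrix},$$

where $G_1, G_2 \in \mathbb{R}^{N \times N}$. Let $\mathcal{S}$ be the set of all $2N \times 2N$ real positive definite symmetric matrices. According to the test formulation (3) and condition (2), the null hypothesis $H_0$ is equivalent to $C \in \mathcal{T}$, where $\mathcal{T} = \mathcal{G} \cap \mathcal{S}$.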

As explained in [10], $\mathcal{G}$ is a group (isomorphic to the group of non-singular $N \times N$ complex matrices under the mapping $G \mapsto G_1 + iG_2$), with matrix multiplication as the group operation. Moreover, $\mathcal{G}$ acts transitively on $\mathcal{T}$ under the action $(G, C) \mapsto GCG^T$. Thus, a parametric characterization of $H_1$ should be invariant to this group action: the value of the parameters to be tested should be the same for $C$ and for $GCG^T$, for any $G \in \mathcal{G}$.

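Next, we introduce a decomposition for any $C \in \mathcal{S}$ that was originally given in [10] and reads:

$$C = \dot C + \ddot C \tag{4}$$

where

$$\dot C = \frac{1}{2}\begin{pmatrix} C_{uu}+C_{vv} & C_{uv}-C_{vu} \\ C_{vu}-C_{uv} & C_{uu}+C_{vv} \end{pmatrix} \in \mathcal{G}, \qquad \ddot C = \frac{1}{2}\begin{pmatrix} C_{uu}-C_{vv} & C_{uv}+C_{vu} \\ C_{uv}+C_{vu} & C_{vv}-C_{uu} \end{pmatrix}.$$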

Using this decomposition, one can define the following real symmetric matrix

$$\Gamma(C) = \dot C^{-\frac12}\, \ddot C\, \dot C^{-\frac12}. \tag{5}$$

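It is now possible to give the following lemma about the parametrization of $\mathcal{S}$.

###### Lemma 1 (Invariant parameters [10]).

Any matrix $C \in \mathcal{S}$ can be written as:

$$C = G \begin{pmatrix} I_N + D_\lambda & 0 \\ 0 & I_N - D_\lambda \end{pmatrix} G^T,$$

where $G \in \mathcal{G}$, $I_N$ is the $N \times N$ identity matrix and $D_\lambda$ is an $N \times N$ diagonal matrix whose diagonal entries, denoted as $\lambda_n$ for $1 \le n \le N$, are the non-negative eigenvalues of the matrix $\Gamma(C)$ given in (5). They satisfy the following properties:

1. $\lambda_n$ and $-\lambda_n$, for $1 \le n \le N$, form the set of eigenvalues of $\Gamma(C)$,

2. $0 \le \lambda_n < 1$ with, by convention, the following ordering $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$.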

###### Proof.

See [10, lemma 5.1 and 5.2]. ∎

Lemma 1 shows that any invariant parameterization of the covariance matrix $C$ for the group action of $\mathcal{G}$ depends only on the (non-negative) eigenvalues $\lambda_n$ of $\Gamma(C)$. Thus these eigenvalues are termed maximal invariant parameters [13, Chapter 6]. Moreover, under the null hypothesis $H_0$, one has that $\Gamma(C) = 0$, as $\ddot C$ reduces to the zero matrix according to (2). Within the invariant parameterization, the testing problem in (3) becomes:

$$\begin{cases} H_0: \lambda_1 = 0, \\ H_1: \lambda_1 > 0, \end{cases} \tag{6}$$

where the alternative hypothesis means that there exists at least one positive eigenvalue. Note that the invariance property ensures that the test does not depend on the (common) representation basis of the real and imaginary parts of $z$, i.e. vectors $u$ and $v$. Also, the $\lambda_n$, the eigenvalues of $\Gamma(C)$, are directly related to the ones obtained using the complex augmented representation, i.e. based on the complex covariance matrix:

$$E[\tilde z \tilde z^\dagger] = \begin{pmatrix} C_{zz} & C_{zz^*} \\ C_{z^*z} & C_{z^*z^*} \end{pmatrix},$$

where $\dagger$ stands for transposition and conjugation, and $C_{ab}$ denotes the (complex) cross-covariance between the $N$-sized complex vectors $a$ and $b$. In fact, as detailed in [5, Chapter 3] and [8], the eigenvalues $\lambda_n$, for $1 \le n \le N$, are also the square roots of the eigenvalues of the following complex matrix:

$$C_{z^*z^*}^{-1}\, C_{z^*z}\, C_{zz}^{-1}\, C_{zz^*} \tag{7}$$

Matrix (7) corresponds to the usual population canonical correlation matrix used to derive the canonical variables between two vectors, here the complex ones $z$ and $z^*$. In the sequel, we will use the name population canonical correlation coefficients for the eigenvalues $\lambda_n$ (and their sample versions denoted $l_n$) as well as for their squared values $\lambda_n^2$ (with sample versions denoted $r_n = l_n^2$). These coefficients are also known as circularity coefficients [14, 15]. Note finally that, even when $z$ and $z^*$ are uncorrelated, i.e. $C_{zz^*} = 0$, they cannot be independent since they are deduced from one another in a deterministic way.
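The two routes described above (the real matrix $\Gamma(C)$ of (5) and the complex matrix (7)) can be compared numerically. The following is a minimal sketch, assuming NumPy; the function names and the test matrices are illustrative choices made here, not part of the original work.

```python
import numpy as np

def _blocks(C):
    """Split a 2N x 2N real covariance into its four N x N blocks."""
    N = C.shape[0] // 2
    return C[:N, :N], C[:N, N:], C[N:, :N], C[N:, N:]

def canonical_correlations(C):
    """Non-negative eigenvalues of Gamma(C), eq. (5), in decreasing order."""
    Cuu, Cuv, Cvu, Cvv = _blocks(C)
    N = Cuu.shape[0]
    C_dot = 0.5 * np.block([[Cuu + Cvv, Cuv - Cvu], [Cvu - Cuv, Cuu + Cvv]])
    C_ddot = 0.5 * np.block([[Cuu - Cvv, Cuv + Cvu], [Cuv + Cvu, Cvv - Cuu]])
    w, V = np.linalg.eigh(C_dot)
    inv_sqrt = (V / np.sqrt(w)) @ V.T            # C_dot^{-1/2}
    gamma = inv_sqrt @ C_ddot @ inv_sqrt
    eig = np.linalg.eigvalsh(0.5 * (gamma + gamma.T))
    return np.clip(eig[::-1][:N], 0.0, None)     # eigenvalues come in +/- pairs

def canonical_correlations_complex(C):
    """Square roots of the eigenvalues of the complex matrix (7)."""
    Cuu, Cuv, Cvu, Cvv = _blocks(C)
    Czz = (Cuu + Cvv) + 1j * (Cvu - Cuv)         # E[z z^dagger]
    Pzz = (Cuu - Cvv) + 1j * (Cuv + Cvu)         # E[z z^T], i.e. C_{zz*}
    mat7 = np.linalg.solve(Czz.conj(), Pzz.conj()) @ np.linalg.solve(Czz, Pzz)
    eig = np.sort(np.linalg.eigvals(mat7).real)[::-1]
    return np.sqrt(np.clip(eig, 0.0, None))

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
C_improper = A @ A.T + 6.0 * np.eye(6)           # generic covariance, N = 3

W = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = W @ W.conj().T + np.eye(3)                   # Hermitian positive definite
C_proper = 0.5 * np.block([[H.real, -H.imag], [H.imag, H.real]])

lam_real = canonical_correlations(C_improper)
lam_cplx = canonical_correlations_complex(C_improper)
lam_proper = canonical_correlations(C_proper)
```

For a proper covariance the matrix $\ddot C$ vanishes, so `lam_proper` should be numerically zero, while `lam_real` and `lam_cplx` should coincide for the generic covariance.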

### II-C Invariant statistics

Consider a sample of size $M$, denoted $X = (x_1, \dots, x_M)$, where the $x_m$ are $2N$-dimensional i.i.d. Gaussian real vectors with zero mean and covariance matrix $C$. In the Gaussian framework, a sufficient statistic is given by the sample covariance matrix:

$$S = \begin{pmatrix} S_{uu} & S_{uv} \\ S_{vu} & S_{vv} \end{pmatrix}, \tag{8}$$

with $S_{ab}$ the real-valued sample (cross-)covariance matrix of real vectors $a$ and $b$ such that:

$$S_{ab} = \frac{1}{M}\sum_{m=1}^{M} a_m b_m^T. \tag{9}$$

We assume here that $M \ge 2N$, thus $S$ belongs (almost surely) to the set $\mathcal{S}$ of real symmetric positive definite matrices. According to the previous section, since the null hypothesis is invariant under the action of the group $\mathcal{G}$, an invariant test statistic must only depend on the non-negative eigenvalues $l_n$, $1 \le n \le N$, of

$$\Gamma(S) = \dot S^{-\frac12}\, \ddot S\, \dot S^{-\frac12}. \tag{10}$$

These sample canonical correlations obey $1 > l_1 \ge \dots \ge l_N \ge 0$ according to Lemma 1, and are an estimate of the population canonical correlations $\lambda_n$ obtained from the population covariance $C$. Note that all the $\lambda_n$ are zero under the null hypothesis $H_0$, and at least one is positive otherwise. As a consequence, the distribution of the $l_n$ should be stochastically greater under $H_1$ than under $H_0$. All invariant tests can be derived from this property. A key point for deriving a tractable statistical test procedure is now to characterize the null distribution of these sample canonical correlations.

### II-D Eigenvalue distribution under $H_0$

Let $\mathrm{Beta}_N(a, b)$ denote the $N$-dimensional matrix variate beta distribution with parameters $a$ and $b$, as defined for instance in [16, definition 3.3.2, p. 110]. It is possible to obtain, under $H_0$, the joint pdf of the squared eigenvalues of $\Gamma(S)$ in terms of this matrix-variate beta distribution.

###### Proposition 2 (Joint distribution of canonical correlations).

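Under $H_0$, the vector of the squared canonical correlations $r_n = l_n^2$, for $1 \le n \le N$, is distributed as the eigenvalues of the matrix-variate beta distribution $\mathrm{Beta}_N(a, b)$, with parameters $a = (N+1)/2$ and $b = (M-N)/2$. Moreover, the joint pdf of $(r_1, \dots, r_N)$ is expressed as:

$$p(r_1, \dots, r_N) \propto \prod_{n=1}^{N} (1-r_n)^{(M-2N-1)/2} \prod_{k<n}^{N} (r_k - r_n), \tag{11}$$

where $1 > r_1 \ge \dots \ge r_N \ge 0$.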

###### Proof.

As shown in [10, pp. 39-41], the sample eigenvalue vector $(l_1, \dots, l_N)$ is characterized by the following probability density function (pdf):

$$p(l_1, \dots, l_N) \propto \prod_{n=1}^{N} (2 l_n)(1-l_n^2)^{(M-2N-1)/2} \prod_{k<n}^{N} (l_k^2 - l_n^2).$$

The change of variables $r_n = l_n^2$ yields the pdf of $(r_1, \dots, r_N)$ given in (11). Moreover, according to [16, Theorem 3.3.4, p. 112], (11) is the pdf of the eigenvalues of the matrix variate beta distribution $\mathrm{Beta}_N((N+1)/2, (M-N)/2)$, which concludes the proof. ∎

It is interesting to note that the pdf given in Proposition 2 is very close to what would be obtained if one performed a canonical correlation analysis on $z$ and $z^*$ considered as $N$-dimensional real Gaussian independent vectors (in this case, the squared sample canonical correlations would be distributed as the eigenvalues of a matrix-variate beta matrix with slightly different parameters, as shown in [16, Section 11.3]). Here vectors $z$ and $z^*$ are actually complex valued and only uncorrelated (not independent).

Expression (11) gives, under the $H_0$ hypothesis, the joint distribution of the squared sample canonical correlation coefficients $r_n$. In the general case, obtaining an analytic expression of the marginal distributions of individual eigenvalues is a complicated task. However, in the asymptotic regime, i.e. when the dimension and the number of samples go to infinity while their ratio stays commensurable, one can obtain those marginal laws. The following theorem gives the distribution of one sample canonical correlation coefficient in this regime.

###### Theorem 3 (Limiting empirical distribution).

As $M, N \to \infty$ with the ratio $M/N \to \gamma$ being finite, the marginal empirical distribution of the squared canonical correlation coefficients (i.e. the squared eigenvalues of $\Gamma(S)$) converges, under the $H_0$ hypothesis, to the probability measure with density:

$$f(r) = \frac{1}{2\pi(1-r)}\sqrt{4(\gamma-1)\frac{1-r}{r} - (\gamma-2)^2}, \tag{12}$$

on its support $[0, c]$, with $c = 4(\gamma-1)/\gamma^2$.

###### Proof.

See Appendix A. ∎
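As a numerical sanity check of the density (12), the sketch below evaluates it and integrates it with a simple midpoint rule. It assumes NumPy; the edge value $c = 4(\gamma-1)/\gamma^2$ is taken as the point where the radicand in (12) vanishes, and the comparison of the mean with $1/\gamma$ is an assumption of this sketch (it is the limit of the matrix-variate beta mean $(N+1)/(M+1)$). Function names are illustrative.

```python
import numpy as np

def bulk_edge(gamma):
    """Right edge of the support: the radicand of (12) vanishes at this point."""
    return 4.0 * (gamma - 1.0) / gamma**2

def limiting_density(r, gamma):
    """Density (12) evaluated at r in (0, c); returns 0 where the radicand is negative."""
    r = np.asarray(r, dtype=float)
    radicand = 4.0 * (gamma - 1.0) * (1.0 - r) / r - (gamma - 2.0) ** 2
    return np.sqrt(np.clip(radicand, 0.0, None)) / (2.0 * np.pi * (1.0 - r))

def integrate_against_density(g, gamma, n=200_000):
    """Midpoint rule for the integral of g(r) f(r) over [0, c].

    The substitution r = c sin^2(t) removes the integrable singularities
    of f at the ends of the support.
    """
    c = bulk_edge(gamma)
    h = (np.pi / 2.0) / n
    t = (np.arange(n) + 0.5) * h
    r = c * np.sin(t) ** 2
    jacobian = 2.0 * c * np.sin(t) * np.cos(t)
    return float(np.sum(g(r) * limiting_density(r, gamma) * jacobian) * h)

gammas = (2.0, 4.0, 10.0)
totals = [integrate_against_density(lambda r: np.ones_like(r), g) for g in gammas]
means = [integrate_against_density(lambda r: r, g) for g in gammas]
```

With $\gamma = 2$ the density reduces to $1/(\pi\sqrt{r(1-r)})$ on $[0, 1]$, which is a convenient special case for checking the implementation.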

###### Corollary 4 (Moments).

The mean and variance of the limiting distribution under $H_0$ of a sample squared canonical correlation are expressed respectively as $1/\gamma$ and $(\gamma-1)/\gamma^3$.

###### Proof.

Expressions of these limiting moments can be derived directly from the pdf (12) by symbolic computation. ∎

A few remarks are in order:

• When $\gamma \to \infty$, the expression of the mean and variance emphasizes that the sample canonical correlations converge to zero, which are the population values under $H_0$. This is the usual behavior in small dimension when $N$ is fixed while $M$ tends to infinity.

• Conversely, in the special case where $\gamma = 2$, the asymptotic null distribution of the squared sample canonical correlations is $f(r) = 1/(\pi\sqrt{r(1-r)})$, known as the arcsine law. In this limiting case, the squared sample canonical correlations are symmetrically distributed on $[0, 1]$ around $1/2$ with two symmetric modes at the edges (even if the population canonical correlations are zero).

## III Testing for improperness

In this section, we make use of the results from Proposition 2 and Theorem 3 to introduce the asymptotic behavior of two improperness tests: the classical GLRT and Roy’s test (based on the largest eigenvalue of the correlation matrix).

### III-A GLRT

#### III-A1 Expression of the GLRT statistic

A very classical procedure to test for improperness is obtained from the Generalized Likelihood Ratio Test (GLRT) statistic defined as:

$$T \propto \frac{\sup_{C \text{ s.t. } H_0}\, p(X; C)}{\sup_{C \text{ s.t. } H_1}\, p(X; C)},$$

where $p(X; C)$ is the multivariate normal pdf of the sample $X$ composed of $M$ i.i.d. $2N$-dimensional real Gaussian vectors with zero mean and covariance matrix $C$. Under $H_1$, $C$ is a symmetric positive definite matrix. Its maximum likelihood (ML) estimate is the sample covariance $S$. Under $H_0$, one has that $\ddot C = 0$ so that $C = \dot C$. Then the ML estimate of $C$ under $H_0$ reduces to $\dot S$, as shown for instance in [10]. The testing problem in (3) can thus be rephrased:

$$\begin{cases} H_0: C \in \mathcal{T}, \\ H_1: C \in \mathcal{S}. \end{cases} \tag{13}$$

Actually, the GLRT statistic is expressed as:

$$\begin{aligned} T &= |S| / |\dot S| \\ &= |\dot S^{\frac12}(I_{2N} + \Gamma(S))\dot S^{\frac12}| / |\dot S| = |I_{2N} + \Gamma(S)| \\ &= \prod_{n=1}^{N}(1+l_n)(1-l_n) = \prod_{n=1}^{N}(1-r_n), \end{aligned} \tag{14}$$

where the first line is due to the Gaussian pdf expression; the second line comes from the decomposition $S = \dot S + \ddot S$ and the expression (10) of $\Gamma(S)$; the third line comes from Lemma 1, where $r_n = l_n^2$, $1 \le n \le N$, are the squared sample canonical correlations. As explained previously, it is important to note that the GLRT is invariant: the resulting statistic given in (14) only depends on the eigenvalues of $\Gamma(S)$.
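The equality between the determinant ratio and the product over squared sample canonical correlations in (14) can be checked on simulated data. This is a minimal sketch assuming NumPy and SciPy; the function name, the sample sizes and the simulated proper data (independent standard normal real and imaginary parts) are illustrative assumptions. The eigenvalues of $\Gamma(S)$ are obtained here as generalized eigenvalues of the pair $(\ddot S, \dot S)$, which is an equivalent formulation rather than the computation used by the authors.

```python
import numpy as np
from scipy.linalg import eigh

def glrt_from_samples(U, V):
    """GLRT statistic (14) computed two ways from real parts U and imaginary parts V.

    U and V are (M, N) arrays whose rows are the samples u_m and v_m.
    Returns (T from the determinant ratio, T from the eigenvalues, squared correlations r_n).
    """
    M, N = U.shape
    X = np.hstack([U, V])                       # rows are x_m^T = (u_m^T, v_m^T)
    S = X.T @ X / M                             # sample covariance, eqs. (8)-(9)
    Suu, Suv = S[:N, :N], S[:N, N:]
    Svu, Svv = S[N:, :N], S[N:, N:]
    S_dot = 0.5 * np.block([[Suu + Svv, Suv - Svu], [Svu - Suv, Suu + Svv]])
    S_ddot = S - S_dot                          # decomposition S = S_dot + S_ddot
    T_det = np.exp(np.linalg.slogdet(S)[1] - np.linalg.slogdet(S_dot)[1])
    # Eigenvalues of Gamma(S) are the generalized eigenvalues of (S_ddot, S_dot).
    l = eigh(S_ddot, S_dot, eigvals_only=True)[::-1][:N]
    r = np.clip(l, 0.0, None) ** 2
    return T_det, float(np.prod(1.0 - r)), r

rng = np.random.default_rng(1)
M, N = 200, 10
U = rng.standard_normal((M, N))                 # proper data: same covariance,
V = rng.standard_normal((M, N))                 # zero cross-covariance
T_det, T_eig, r = glrt_from_samples(U, V)
```

Working with log-determinants avoids overflow or underflow of the determinants when $N$ is large.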

#### III-A2 Distribution under the hypothesis $H_0$

Let $\Lambda(p, m, n)$ denote Wilks lambda distribution, with dimension parameter $p$ and degrees of freedom parameters $m$ and $n$, as defined for instance in [17, definition 3.7.1, p. 81].

###### Theorem 5.

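The GLRT statistic given in (14) is distributed under $H_0$ as the following Wilks lambda distribution:

$$T \sim \Lambda(N, M-N, N+1).$$

Moreover, this statistic can be expressed under $H_0$ as:

$$T = \prod_{n=1}^{N} u_n, \tag{15}$$

where the $u_n$ are independent beta-distributed random variables such that $u_n \sim \mathrm{Beta}\left(\frac{M-N-n+1}{2}, \frac{N+1}{2}\right)$, for $1 \le n \le N$.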

###### Proof.

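According to Proposition 2, the $r_n$ in (14) are distributed as the eigenvalues of the matrix variate beta distribution with parameters $(N+1)/2$ and $(M-N)/2$. Using the mirror symmetry property of the beta distribution, the $1-r_n$ are distributed as the eigenvalues of a random matrix $U \sim \mathrm{Beta}_N((M-N)/2, (N+1)/2)$. According now to [16, Theorem 3.3.3, p. 110], $U$ can be decomposed as $U = R^T R$, where $R$ is upper triangular with diagonal entries $R_{nn}$ that are independent and where $R_{nn}^2 \sim \mathrm{Beta}\left(\frac{M-N-n+1}{2}, \frac{N+1}{2}\right)$ for $1 \le n \le N$. This concludes the proof since $T = |U| = \prod_{n=1}^{N} R_{nn}^2$. ∎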

Equation (15) also gives a more efficient way to sample from the null distribution of $T$ in independent draws, as it is not required to generate the sample covariance matrix $S$, nor to compute the eigenvalues of $\Gamma(S)$.
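The sampling shortcut offered by (15) can be sketched as follows, assuming NumPy. The beta parameters $a_n = (M-N-n+1)/2$ and $b = (N+1)/2$ are an assumption of this sketch: they follow the standard product representation of the Wilks distribution $\Lambda(N, M-N, N+1)$. Function names and the values of $N$, $M$ are illustrative.

```python
import numpy as np

def beta_parameters(N, M):
    """Assumed parameters (a_n, b) of the independent beta factors u_n in (15)."""
    n = np.arange(1, N + 1)
    a = (M - N - n + 1) / 2.0
    b = (N + 1) / 2.0
    return a, b

def sample_glrt_null(N, M, size, rng):
    """Draw `size` independent realizations of T under H0 as a product of betas."""
    a, b = beta_parameters(N, M)
    u = rng.beta(a, b, size=(size, N))          # one column per factor u_n
    return u.prod(axis=1)

def exact_null_mean(N, M):
    """E[T] under H0: product of the beta means a_n / (a_n + b), by independence."""
    a, b = beta_parameters(N, M)
    return float(np.prod(a / (a + b)))

rng = np.random.default_rng(2)
N, M = 5, 40
draws = sample_glrt_null(N, M, size=50_000, rng=rng)
mc_mean, exact_mean = float(draws.mean()), exact_null_mean(N, M)
```

Each draw costs $N$ beta variates instead of building and diagonalizing a $2N \times 2N$ sample covariance, which is what makes Monte Carlo calibration of the test cheap.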

#### III-A3 High-dimensional asymptotic distribution under $H_0$

The characterization given in (15) allows us to derive, under the null hypothesis $H_0$, an asymptotic distribution for the GLRT statistic in the high dimensional (i.e. large $N$) case. This yields a simple tractable closed form approximation of the considered Wilks lambda distribution when both the dimension and the sample size are large.

###### Theorem 6 (Central limit theorem in high dimension).

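Let $T' = -\ln T$ where $T$ is the GLRT statistic given in (14). Assume that $M, N \to \infty$ so that the ratio $M/N \to \gamma$ with $2 < \gamma < \infty$. Under $H_0$, the following asymptotic normal distribution is obtained for $T'$:

$$\frac{T' - m}{s} \xrightarrow{d} \mathcal{N}(0, 1), \tag{16}$$

where

$$m = M\left[\ln\frac{\gamma}{\gamma-1} + \frac{\gamma-2}{\gamma}\ln\frac{\gamma-2}{\gamma-1}\right] + \frac{1}{2}\ln\frac{\gamma}{\gamma-2}, \qquad s^2 = 2\left[\ln\frac{(\gamma-1)^2}{\gamma(\gamma-2)} + \frac{1}{M}\frac{1}{\gamma-2}\right].$$

###### Proof.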

See Appendix B. ∎
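The closed-form centering $m$ and scaling $s^2$ can be compared with the exact mean and variance of $-\ln T$ under the product-of-betas representation. The sketch below assumes NumPy and SciPy, takes $T' = -\ln T$, and assumes beta parameters $a_n = (M-N-n+1)/2$, $b = (N+1)/2$ for the factors of (15); the exact moments then follow from the digamma and trigamma functions. Function names and the values $N = 50$, $M = 200$ are illustrative.

```python
import numpy as np
from scipy.special import digamma, polygamma

def clt_parameters(N, M):
    """Centering m and squared scaling s^2 of Theorem 6, with gamma = M / N > 2."""
    g = M / N
    m = M * (np.log(g / (g - 1.0)) + (g - 2.0) / g * np.log((g - 2.0) / (g - 1.0)))
    m += 0.5 * np.log(g / (g - 2.0))
    s2 = 2.0 * (np.log((g - 1.0) ** 2 / (g * (g - 2.0))) + 1.0 / (M * (g - 2.0)))
    return float(m), float(s2)

def exact_log_moments(N, M):
    """Exact mean and variance of -ln T when T is a product of independent betas.

    Uses E[ln u] = psi(a) - psi(a + b) and var(ln u) = psi_1(a) - psi_1(a + b)
    for u ~ Beta(a, b), with the (assumed) parameters a_n and b stated above.
    """
    n = np.arange(1, N + 1)
    a = (M - N - n + 1) / 2.0
    b = (N + 1) / 2.0
    mean = np.sum(digamma(a + b) - digamma(a))
    var = np.sum(polygamma(1, a) - polygamma(1, a + b))
    return float(mean), float(var)

m, s2 = clt_parameters(50, 200)
exact_mean, exact_var = exact_log_moments(50, 200)
```

Under these assumptions the two pairs of values should agree closely already for moderate sizes, which is one way to check a transcription of the formulas.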

Bartlett derived a classical approximation for Wilks lambda distribution [17, p. 94] in a low-dimensional setting. This gives, when the dimension $N$ is fixed while $M$ goes to infinity, the same asymptotic distribution as obtained in [8]:

$$-(M-N)\ln T \xrightarrow{d} \chi^2_{N(N+1)}, \tag{17}$$

where $\chi^2_k$ denotes the chi-squared distribution with $k$ degrees of freedom.

Using Theorem 6, the Bartlett approximation can now be adjusted to cover both the low and high-dimensional cases. Let $G(q, p)$ denote the gamma distribution with pdf $g(x) = x^{q-1}e^{-x/p} / (\Gamma(q)\, p^q)$ for $x > 0$, where $q$ and $p$ are the shape and scale parameters respectively.

###### Corollary 7 (Adjusted Bartlett approximation).

Let $T' = -\ln T$, then the log-GLRT statistic can be approximated by a shifted gamma distribution:

$$\frac{1}{s}(T' - \alpha) \approx G(q, p), \tag{18}$$

with

$$q = N(N+1)/2, \qquad p = \sqrt{1/q}, \qquad \alpha = m - pqs,$$

where $m$ and $s$ are defined in Theorem 6, and where $\approx$ stands for pointwise equivalence of distribution functions for large $M$ under both the low dimensional regime, i.e. $N$ is fixed and small w.r.t. $M$, and the high dimensional regime, i.e. $N$ has the order of $M$.

###### Proof.

In the high dimensional setting, under the assumptions of Theorem 6, the gamma distribution converges towards the normal one as the shape parameter $q$ goes to infinity. Since the mean and variance of $T'$ are the same for the shifted gamma approximation (18) and the normal one (16), they are asymptotically equivalent.

In the low-dimensional setting where is fixed while goes to infinity, one gets that and are fixed, , and . Moreover, in this limiting case, according to the decomposition given in (15). Thus . Because , (18) means that is asymptotically distributed. This is the Bartlett limiting distribution (17), which is known to be valid in this low dimensional asymptotic regime ∎
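The adjusted Bartlett approximation (18) can be turned into a calibration rule with a few lines of code. This is a minimal sketch assuming SciPy, with $T' = -\ln T$ and the centering and scaling of Theorem 6; the function names are illustrative. The shifted gamma is expressed through the `loc` and `scale` arguments of `scipy.stats.gamma`, i.e. $T' \approx \alpha + s\,p\,G(q, 1)$.

```python
import numpy as np
from scipy import stats

def clt_parameters(N, M):
    """Centering m and scaling s of Theorem 6, with gamma = M / N > 2."""
    g = M / N
    m = M * (np.log(g / (g - 1.0)) + (g - 2.0) / g * np.log((g - 2.0) / (g - 1.0)))
    m += 0.5 * np.log(g / (g - 2.0))
    s2 = 2.0 * (np.log((g - 1.0) ** 2 / (g * (g - 2.0))) + 1.0 / (M * (g - 2.0)))
    return float(m), float(np.sqrt(s2))

def adjusted_bartlett(N, M):
    """Frozen distribution approximating T' = -ln T under H0, following (18)."""
    m, s = clt_parameters(N, M)
    q = N * (N + 1) / 2.0                       # gamma shape
    p = np.sqrt(1.0 / q)                        # gamma scale
    alpha = m - p * q * s                       # shift
    return stats.gamma(a=q, loc=alpha, scale=s * p)

# High-dimensional example: reject H0 at level 5% when -ln T exceeds this value.
dist_high = adjusted_bartlett(50, 200)
threshold_high = dist_high.ppf(0.95)

# Low-dimensional example, compared with the Bartlett chi-squared limit (17).
dist_low = adjusted_bartlett(2, 2000)
threshold_low = dist_low.ppf(0.95)
threshold_bartlett = stats.chi2.ppf(0.95, df=2 * 3) / (2000 - 2)
```

Large values of $T'$ correspond to small values of $T$, so the rejection region is the upper tail of the approximating distribution.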

### III-B Roy’s test

In multivariate statistics, Roy’s test is a well-known procedure to detect the alternative hypothesis for which at least one eigenvalue is non-zero. This test relies on the statistic of the largest eigenvalue [17, p. 84] or, equivalently in our case, the statistic of the largest squared canonical correlation $r_1$. The principle is to reject the hypothesis $H_0$ as soon as $r_1$ exceeds a threshold, which is tuned according to the law of $r_1$ under the hypothesis $H_0$ together with the nominal control level (probability of false alarm).

###### Theorem 8 (Limiting null distribution for Roy’s test).

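As $M, N \to \infty$ such that the ratio $M/N$ is finite, let $W = \ln\left(r_1 / (1 - r_1)\right)$ be the logit transform of the largest squared canonical correlation coefficient $r_1$. Under $H_0$, the law of $W$ converges towards a first order Tracy-Widom law denoted as $TW_1$:

$$\frac{W - \mu}{\sigma} \to TW_1, \tag{19}$$

with

$$\mu = 2\log\tan\left(\frac{\varphi+\psi}{2}\right), \qquad \sigma^3 = \frac{16}{M^2}\,\frac{1}{\sin^2(\varphi+\psi)\,\sin\psi\,\sin\varphi},$$

$$\psi = \arccos\left(\frac{M-2N+1}{M}\right), \qquad \varphi = \arccos\left(\frac{M-2N-1}{M}\right).$$

###### Proof.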

This is a direct result of Proposition 2 and the asymptotic law of the largest eigenvalue of a matrix-variate beta distribution given in [18]. ∎
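The centering and scaling constants of Theorem 8 are straightforward to evaluate, as in the sketch below (NumPy only; function names are illustrative). SciPy does not ship a Tracy-Widom distribution, so the $TW_1$ quantile is passed in by the caller; the value 0.98 used in the example is only a rough figure for the 95% quantile taken from commonly tabulated values and should be treated as an assumption.

```python
import numpy as np

def roy_centering_scaling(N, M):
    """Constants mu and sigma of Theorem 8 for the logit of the largest r_n."""
    psi = np.arccos((M - 2 * N + 1) / M)
    phi = np.arccos((M - 2 * N - 1) / M)
    mu = 2.0 * np.log(np.tan((phi + psi) / 2.0))
    sigma3 = 16.0 / M**2 / (np.sin(phi + psi) ** 2 * np.sin(psi) * np.sin(phi))
    return float(mu), float(np.cbrt(sigma3))

def roy_threshold(N, M, tw1_quantile):
    """Threshold on the r_1 scale: reject H0 when r_1 exceeds the returned value.

    `tw1_quantile` is the quantile of the Tracy-Widom law of order 1 at the
    desired level; it must be supplied by the caller (e.g. from tables).
    """
    mu, sigma = roy_centering_scaling(N, M)
    w = mu + sigma * tw1_quantile               # threshold on the logit scale
    return float(1.0 / (1.0 + np.exp(-w)))      # inverse logit

N, M = 200, 800
mu, sigma = roy_centering_scaling(N, M)
centre_on_r_scale = 1.0 / (1.0 + np.exp(-mu))   # inverse logit of mu
threshold = roy_threshold(N, M, tw1_quantile=0.98)   # approximate 95% quantile
```

For large $N$ the inverse logit of $\mu$ sits close to the right edge of the bulk, $4(\gamma-1)/\gamma^2$ with $\gamma = M/N$, so the threshold lands slightly above that edge.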

The variable $W$ being expressed as an increasing function of $r_1$, Roy’s test is equivalent to thresholding $W$, and Theorem 8 allows for the calibration of the test. It should be noted that [19] proposes a procedure to evaluate the exact (not asymptotic) law of $r_1$. Nevertheless, the simple approximation by the $TW_1$ law is in practice sufficiently precise for most cases as long as the dimension $N$ is large enough.

### III-C Spiked correlation model

Spiked models are special sparse cases for the alternative hypothesis $H_1$. They assume that the rank of the population matrix $\Gamma(C)$ is low and remains fixed in the high-dimensional asymptotic regime. For the improperness test setting, this means that the number of non-zero eigenvalues of $\Gamma(C)$, or equivalently $\ddot C$, is fixed. An example, which corresponds to a low-rank improper signal corrupted by proper noise, is given in Section IV-B2.

###### Theorem 9 (Phase transition threshold).

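Assume that there exists a fixed number of non-zero population canonical correlation coefficients $\lambda_n$, all the other ones being zero. Under the assumptions of Theorem 3, we have the following convergence for the squares $r_n$ of the corresponding largest sample canonical correlations:

$$\text{if } \lambda_n^2 \le \rho_c, \quad r_n \xrightarrow{a.s.} c; \qquad \text{if } \lambda_n^2 > \rho_c, \quad r_n \xrightarrow{a.s.} \bar\rho_n,$$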

where

 ρc (20)

are respectively the phase transition threshold and the limiting values, and $c$ is the edge of the limiting distribution of the bulk defined in Theorem 3.

###### Proof.

This follows from Proposition 2 and from results for high dimensional limiting distribution of spiked models with Beta distributed matrices given in [20, see Theorem 1.8], or [21]. ∎

Theorem 9 shows that when the spikes are weaker than the phase transition threshold $\rho_c$, none of the sample canonical correlations separate from the bulk. This makes the testing problem challenging, and Roy’s test would be powerless in this case. Conversely, for spikes larger than $\rho_c$, it is easy to check that $\bar\rho_n > c$. These sample canonical correlations now separate from the bulk, and Roy’s test is expected to be very powerful.

## IV Simulations

This section starts with simulations validating the accuracy of the asymptotic distributions derived under the properness hypothesis $H_0$. Then, improperness testing is illustrated under two alternative hypotheses: i) equi-correlated model ($N$ non-zero identical canonical correlation coefficients $\lambda_1 = \dots = \lambda_N$) and ii) spiked model ($\lambda_1 > 0$ while $\lambda_n = 0$ for $n \ge 2$).

### IV-A Empirical distribution of correlations under $H_0$

#### IV-A1 Empirical vs limiting distributions

Fig. 1 displays, for different values of $N$ and $M$, the empirical distribution of the squares of the sample canonical correlations under the properness assumption $H_0$. This shows the very good agreement with the limiting empirical distribution derived in Theorem 3. Note that, in the lower-dimensional case, small fluctuations can be observed (white bars) around the right edge of the limiting empirical distribution, whereas in a larger dimension the greatest correlations concentrate well at this edge.

#### IV-A2 Distribution of the GLRT statistic

Fig. 2 depicts, for different values of $N$ and $M$, a probability-probability plot of the theoretical null distribution of $T$ against each one of these asymptotic approximations. A deviation from the diagonal line indicates a difference between the theoretical and the asymptotic distributions. This shows that, as expected in a high-dimensional setting and/or for large sample sizes, the asymptotic log-normal distribution derived in Theorem 6 becomes very accurate and much better than the Bartlett approximation. In addition, the adjusted Bartlett approximation obtained in Corollary 7 is very accurate in all cases (low/high dimension or small/large sample size). The latter is therefore of practical interest to calibrate the GLRT procedure according to a nominal significance level.

#### IV-A3 Distribution of Roy’s statistic

Fig. 3 depicts, for different values of $N$ and $M$, a probability-probability plot of the theoretical null distribution for the largest correlation statistic $r_1$, or equivalently its logit transform $W$, against the asymptotic approximation given in Theorem 8. This shows that even for a moderate dimension, the Tracy-Widom approximation is quite accurate, and that it becomes very accurate for a larger dimension.

### IV-B Two improperness test scenarios

#### IV-B1 Equal canonical correlation coefficients

The case where the population canonical correlations are all non-zero and equal, hereinafter referred to as equi-correlated model, can be obtained when the real and imaginary parts have a common contribution:

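$$u_m = s_m + \sqrt{\theta}\, q_m, \qquad v_m = t_m + \sqrt{\theta}\, q_m, \tag{21}$$

where $\theta \ge 0$, and $s_m$, $t_m$ and $q_m$ are i.i.d. Gaussian vectors in $\mathbb{R}^N$, for $1 \le m \le M$. Straightforward computations show that the non-negative roots of $\Gamma(C)$, i.e. the population canonical correlations $\lambda_n$, are all equal to $\theta/(1+\theta)$.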

Fig. 4 displays the power of both the GLRT and Roy’s test, under the alternative obtained for this equi-correlated model, as a function of the correlation level. As expected, the GLRT, which uses information from all sample correlations, is here much more powerful than Roy’s test, especially in the high-dimensional case.

#### IV-B2 Spiked model

A Gaussian spike model with a single non-zero canonical correlation can be obtained when the real and imaginary parts have a common contribution of rank one:

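$$u_m = s_m + \sqrt{\theta}\, w_m \varphi, \qquad v_m = t_m + \sqrt{\theta}\, w_m \varphi, \tag{22}$$

where $\theta \ge 0$, $\varphi$ is a normed deterministic vector in $\mathbb{R}^N$, the $w_m$ are i.i.d. Gaussian centered random variables with unit variance, and $s_m$, $t_m$ are i.i.d. Gaussian vectors in $\mathbb{R}^N$, for $1 \le m \le M$. This scenario depicts a case where a low-rank improper signal is corrupted by proper noise. Straightforward computations show that there is a single non-zero population canonical correlation, a spike, which is $\lambda_1 = \theta/(1+\theta)$ when $s_m$ and $t_m$ have identity covariance.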
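The population covariance implied by the spiked model (22) can be built explicitly to check that it carries a single non-zero canonical correlation. This is a minimal sketch assuming NumPy and SciPy, and assuming that $s_m$ and $t_m$ are independent with identity covariance (the covariance is not legible in the text above); under that assumption the spike evaluates to $\theta/(1+\theta)$. Function names and the values of $N$, $\theta$, $\varphi$ are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def spiked_covariance(N, theta, phi):
    """Covariance of x = (u, v) under model (22), assuming s_m, t_m ~ N(0, I_N)."""
    spike = theta * np.outer(phi, phi)          # rank-one common contribution
    Cuu = np.eye(N) + spike                     # equal to Cvv
    return np.block([[Cuu, spike], [spike, Cuu]])

def population_canonical_correlations(C):
    """Non-negative eigenvalues of Gamma(C), via the generalized problem (C_ddot, C_dot)."""
    N = C.shape[0] // 2
    Cuu, Cuv = C[:N, :N], C[:N, N:]
    Cvu, Cvv = C[N:, :N], C[N:, N:]
    C_dot = 0.5 * np.block([[Cuu + Cvv, Cuv - Cvu], [Cvu - Cuv, Cuu + Cvv]])
    lam = eigh(C - C_dot, C_dot, eigvals_only=True)[::-1][:N]
    return np.clip(lam, 0.0, None)

N, theta = 8, 0.5
phi = np.ones(N) / np.sqrt(N)                   # unit-norm direction of the spike
lam = population_canonical_correlations(spiked_covariance(N, theta, phi))
```

Solving the generalized symmetric eigenproblem avoids forming $\dot C^{-1/2}$ explicitly and gives the same eigenvalues as $\Gamma(C)$.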

Fig. 5 displays the empirical distribution of the squares of the sample canonical correlations under the alternative spiked model for different spike levels. Again, the bulk of these correlations matches very well the limiting distribution derived under the properness hypothesis $H_0$, whatever the spike level. In addition, for “weak” spikes, i.e. when $\lambda_1^2$ is small relative to the phase transition threshold $\rho_c$ defined in Theorem 9, the greatest sample correlation $r_1$, which is an estimator of the spike power $\lambda_1^2$, does not separate from this bulk and is stuck around the edge $c$ of the limiting distribution. Conversely, for stronger spikes where $\lambda_1^2 > \rho_c$, the greatest correlation $r_1$ clearly separates from the bulk and concentrates around the limiting value $\bar\rho_1$. This numerically supports Theorem 9.

In Fig. 6, the power of both the GLRT and Roy’s test is displayed as a function of the spike power. This shows that for “weak” spikes, i.e. when $\lambda_1^2 \le \rho_c$, the two tests have a very low power. In fact, the largest sample canonical correlation does not separate from the bulk and cannot be detected correctly using Roy’s test. It is interesting to note that the GLRT, which uses the information in all the correlations, is here slightly “more powerful” to detect such weak spikes. Nevertheless, as soon as $r_1$ separates from the bulk, i.e. for stronger spikes where $\lambda_1^2 > \rho_c$, Roy’s test becomes much more powerful than the GLRT, with a power that converges quickly towards $1$ as expected.

## V Concluding remarks

Properness testing for complex Gaussian random vectors in the asymptotic regime relies on the characterization of sample canonical correlations. In particular, their limiting distributions give access to the behavior of classical GLRT and Roy’s test. The results presented in this article demonstrate that the asymptotic regime is actually reached quite rapidly in practice and the proposed original approximations are well-suited to a wide range of complex-valued datasets.

The phase transition highlighted in Roy’s test has also potential applications in the search for complex-valued low-rank signals corrupted by proper noise in large datasets for example. Another natural extension of the proposed work consists of considering the case of quaternion random vectors which possess several properness levels, thus trying to decipher their correlation symmetry patterns. This could be helpful in the spectral characterization of bivariate signals [22] among other quaternion signal processing applications.

## Acknowledgment

The authors would like to thank Prof. Romain Couillet for his many valuable comments and suggestions.

## Appendix A Proof of Theorem 3 (limiting empirical distribution)

Given two independent matrices , with respective distributions the Wishart laws and . Assume that we are in the asymptotic regime where with and . As demonstrated in [23], the empirical law of the eigenvalues of converges to a distribution with pdf given as:

$$f(x) = \frac{(1-d')\sqrt{(x-a)(b-x)}}{2\pi x\,(xd+d')}, \tag{23}$$

on the interval , where and .

According to [16, Thm 3.3.1, p. 109], the eigenvalues of have the same law as the eigenvalues of a matrix with law . Moreover, each eigenvalue of can be deduced from each eigenvalue of the matrix thanks to the relation . The continuous mapping theorem ensures that the asymptotic law of the eigenvalues of such a beta matrix can be directly deduced from (23) using the aforementioned change of variable.

Finally, according to Proposition 2, the parameters of the matrix-variate beta law in our case are $(N+1)/2$ and $(M-N)/2$. Then, thanks to the hypotheses stated above, one obtains the limiting values of the ratios $d$ and $d'$ in terms of $\gamma$, which concludes the proof.

## Appendix B Proof of Theorem 6 (central limit theorem)

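According to Theorem 5, $T' = -\ln T = \sum_{n=1}^{N} \zeta_n$, where the $\zeta_n = -\ln u_n$ are independent random variables such that $u_n \sim \mathrm{Beta}(a_n, b)$ with $a_n = (M-N-n+1)/2$ and $b = (N+1)/2$, for $1 \le n \le N$. Based on the centered moments of a logarithmically transformed beta-distributed variable as given in [24], $E[\zeta_n] = \psi(a_n + b) - \psi(a_n)$, where $\psi$ is the digamma function, and $\mathrm{var}(\zeta_n) = \psi_1(a_n) - \psi_1(a_n + b)$, where $\psi_1$ is the trigamma function. Using Taylor series expansions of the digamma and trigamma functions, straightforward computations, omitted here for the sake of brevity, yield that $E[T'] = m + o(1)$ and $\mathrm{var}(T') = s^2(1 + o(1))$.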

In order to apply the Lyapunov central limit theorem [25, p. 362] to $T'$, it is sufficient to show that

$$\frac{1}{\mathrm{var}(T')^2}\sum_{n=1}^{N} E\left[(\zeta_n - E[\zeta_n])^4\right] \to 0.$$

The expression of the fourth order centered moment of gives that for . As , the previous Lyapunov sufficient condition holds, and

 Z≡1√var(T′)N∑n=1(ζn−E[ζn]) d→ N(0,1).

By noting finally that , Slutsky’s theorem allows us to conclude the proof.
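The Lyapunov condition used in this proof can be illustrated numerically. The sketch below evaluates the fourth-moment ratio for a sum of independent log-beta terms and shows it shrinking as the number of terms grows. It is only an illustration under stated assumptions: the beta parameters `(2 * n_terms - n + 1, n_terms)` are hypothetical stand-ins for a regime where all sizes grow together, not the parameters of the paper, and the second parameter is taken to be an integer so that the cumulants $\psi^{(m)}(a) - \psi^{(m)}(a+b)$ reduce to finite sums.

```python
def log_beta_central_moments(a, b):
    """Variance and fourth central moment of log X, X ~ Beta(a, b), integer b."""
    var = sum(1.0 / (a + k) ** 2 for k in range(b))
    # Fourth cumulant: psi3(a) - psi3(a + b) = 6 * sum 1 / (a + k)^4.
    kappa4 = 6.0 * sum(1.0 / (a + k) ** 4 for k in range(b))
    # Fourth central moment = fourth cumulant + 3 * variance^2.
    return var, kappa4 + 3.0 * var ** 2


def lyapunov_ratio(n_terms):
    """Sum of fourth central moments divided by the squared total variance.

    Hypothetical setup: term n is log X_n with X_n ~ Beta(2*n_terms - n + 1, n_terms).
    """
    total_var = 0.0
    total_m4 = 0.0
    for n in range(1, n_terms + 1):
        var, m4 = log_beta_central_moments(2 * n_terms - n + 1, n_terms)
        total_var += var
        total_m4 += m4
    return total_m4 / total_var ** 2


for n_terms in (10, 40, 160):
    print(n_terms, lyapunov_ratio(n_terms))
```

In this hypothetical setup the total variance stays bounded away from zero while the sum of fourth moments decays, so the ratio tends to zero, which is the behaviour the sufficient condition requires.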

## References

• [1] P. Comon, “Circularité et signaux aléatoires à temps discret,” Traitement du Signal, vol. 11, pp. 417–420, 1994.
• [2] B. Picinbono, “On circularity,” IEEE Trans. on Signal Processing, vol. 42, no. 12, pp. 3473–3482, 1994.
• [3] E. Ollila and V. Koivunen, “Generalized complex elliptical distributions,” in IEEE Sensor Array and Multichannel Signal Processing workshop (SAM), 2004, pp. 460–464.
• [4] P. Schreier, L. Scharf, and A. Hanssen, “A generalized likelihood ratio test for impropriety of complex signals,” IEEE Signal Processing Letters, vol. 13, no. 7, pp. 433–436, 2006.
• [5] P. Schreier and L. Scharf, Statistical Signal Processing of Complex-Valued Data: The Theory of Improper and Noncircular Signals.   Cambridge University Press, 2010.
• [6] S. Chandna and A. Walden, “A frequency domain test for propriety of complex-valued vector time series,” IEEE Transactions on Signal Processing, vol. 65, pp. 1425–1436, 2017.
• [7] J. Delmas, A. Oukaci, and P. Chevalier, “On the asymptotic distribution of GLR for impropriety of complex signals,” Signal Processing, vol. 91, no. 10, pp. 2259–2267, 2011.
• [8] A. Walden and P. Rubin-Delanchy, “On testing for impropriety of complex-valued Gaussian vectors,” IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 825–834, 2009.
• [9] F. Chatelain and N. Le Bihan, “Exact distribution and high-dimensional asymptotics for improperness test of complex signals,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
• [10] S. Andersson and M. Perlman, “Two testing problems relating the real and complex multivariate normal distributions,” Journal of Multivariate Analysis, vol. 15, pp. 21–51, 1984.
• [11] T. Adali and V. D. Calhoun, “Complex ICA of brain imaging data [life sciences],” IEEE Signal Processing Magazine, vol. 24, no. 5, pp. 136–139, 2007.
• [12] J.-P. Delmas, “Asymptotically minimum variance second-order estimation for noncircular signals with application to DOA estimation,” IEEE Transactions on Signal Processing, vol. 52, no. 5, pp. 1235–1241, 2004.
• [13] E. L. Lehmann and J. P. Romano, Testing statistical hypotheses, 3rd ed., ser. Springer Texts in Statistics.   New York: Springer, 2005.
• [14] J. Eriksson and V. Koivunen, “Complex random vectors and ICA models: identifiability, uniqueness, and separability,” IEEE Transactions on Information Theory, vol. 52, no. 3, pp. 1017–1029, 2006.
• [15] E. Moreau and T. Adali, Blind Identification and Separation of Complex-Valued Signals.   ISTE-Wiley, 2013.
• [16] R. Muirhead, Aspects of Multivariate Statistical Theory.   Wiley-Interscience, 2005.
• [17] K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate analysis.   Academic Press London ; New York, 1979.
• [18] I. Johnstone, “Approximate null distribution of the largest root in multivariate analysis,” Ann. Appl. Stat., vol. 3, pp. 1616–1633, 2009.
• [19] M. Chiani, “Distribution of the largest root of a matrix for Roy’s test in multivariate analysis of variance,” J. Multivar. Anal., vol. 143, pp. 467–471, 2016.
• [20] Z. Bao, J. Hu, G. Pan, and W. Zhou, “Canonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank case,” Ann. Statist., vol. 47, no. 1, pp. 612–640, 2019. [Online]. Available: https://doi.org/10.1214/18-AOS1704
• [21] I. Johnstone and A. Onatski, “Testing in high-dimensional spiked models,” arXiv e-prints, p. arXiv:1509.07269v2, Feb 2018.
• [22] J. Flamant, N. Le Bihan, and P. Chainais, “Spectral analysis of stationary random bivariate signals,” IEEE Transactions on Signal Processing, vol. 65, no. 23, pp. 6135–6145, Dec 2017.
• [23] J. Silverstein, “The limiting eigenvalue distribution of a multivariate matrix,” SIAM J. Math. Anal., vol. 16, pp. 641–646, 1985.
• [24] S. Nadarajah and S. Kotz, “The beta exponential distribution,” Reliability Engineering & System Safety, vol. 91, no. 6, pp. 689–697, 2006.
• [25] P. Billingsley, Probability and Measure, ser. Wiley Series in Probability and Statistics.   Wiley, 1995.