I-A Problem statement
Consider the problem of detecting the presence or absence of a signal from the measured output of a multi-element antenna array. We are interested in the case where the signal is unknown but structured. A motivating example arises with communications signals, where typically a few “training” symbols are known and the remaining “data” symbols are unknown, apart from their alphabet. We will assume that the signal’s array response is completely unknown but constant over the measurement epoch and signal bandwidth. The complete lack of knowledge about the array response is appropriate when the array manifold is unknown or uncalibrated (e.g., see the discussion in ), or when the signal is observed in a dense multipath environment (e.g., ). Also, we will assume that the measurements are corrupted by white noise of unknown variance and possibly strong interferers. The interference statistics are assumed to be unknown, as is the number of interferers.
The signal-detection problem can be formulated as a binary hypothesis test between the signal-present and signal-absent hypotheses, i.e.,
In (1), the additive terms refer to the noise and the interference. We model the noise as white Gaussian noise (WGN) with unknown variance; by white Gaussian, we mean that the noise has i.i.d. zero-mean circularly symmetric complex Gaussian entries. If the array responses of the interferers are constant over the measurement epoch and bandwidth, then the rank of the interference component will be at most the number of interferers. As will be discussed in the sequel, we will sometimes (but not always) model the temporal interference component as white and Gaussian.
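As a concrete illustration of this WGN model, the following sketch (our own; it assumes NumPy, and the sizes and variance are arbitrary) draws i.i.d. zero-mean circularly symmetric complex Gaussian noise and checks its second-order statistics:

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 4, 100000          # illustrative: antennas, time samples
sigma2 = 2.0              # noise variance per complex entry

# Circularly symmetric complex Gaussian: independent real and imaginary
# parts, each with variance sigma2/2, so E[|w|^2] = sigma2 and E[w^2] = 0.
W = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, T))
                           + 1j * rng.standard_normal((M, T)))

print(np.mean(np.abs(W) ** 2))   # close to sigma2
print(np.abs(np.mean(W ** 2)))   # close to 0 (circular symmetry)
```

Splitting the variance equally between the real and imaginary parts is what makes the samples circularly symmetric (zero pseudo-covariance).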
Communications signals often take a form like
where the first block is a known training sequence and the second is an unknown data sequence drawn from a known finite alphabet. Suppose that the measurements are partitioned conformally with (2). For the purpose of signal detection or synchronization, the data measurements are often ignored (see, e.g., ). But these data measurements can be very useful, especially when the training symbols (and thus the training measurements) are few. Our goal is to develop detection schemes that use all measurements while handling the incomplete knowledge of the data symbols in a principled manner.
We propose to model the signal structure probabilistically. That is, we treat the symbol vector as a random vector with a prior pdf, statistically independent of the array response, the interference, and the noise. Although the general methodology we propose supports an arbitrary prior, we sometimes focus (for simplicity) on the case of statistically independent components, i.e.,
where the Dirac delta pins each training symbol to its known value and each data symbol is drawn from a finite-cardinality set. For coded communications signals, the independent prior (3) would still be appropriate if a “turbo equalization” approach were used, where symbol estimation is iterated with soft-input soft-output decoding. A variation of (2) that avoids the need to know the symbol alphabet follows from modeling the data symbols as i.i.d. Gaussian. In practical communications scenarios, time and frequency synchronization are imperfect, which leads to mismatch in the assumed model (3)-(4). In Sec. V, we discuss synchronization mismatch and investigate its effect in numerical experiments.
The proposed probabilistic framework is quite general. For example, in addition to training/data structures of the form in (2), the independent model (3) covers superimposed training, bit-level training, constant-envelope waveforms, and pulsed signals (i.e., signals with an unknown on/off pattern). To exploit sinusoidal signal models, or signals with known spectral characteristics (see, e.g., ), the independent model (3) would be discarded in favor of a more appropriate prior. There is an excellent description of most of these topics in , and we refer readers to that source for more details.
I-B Prior work
For the case where the entire signal is known, the detection problem (1) has been studied in detail. For example, in the classical work of Kelly [7, 8], the interference-plus-noise was modeled as temporally white and Gaussian with unknown (and unstructured) spatial covariance, and the generalized likelihood ratio test (GLRT) was derived. By temporally white and Gaussian, we mean that the columns are i.i.d. circularly symmetric complex Gaussian random vectors with zero mean and a generic covariance matrix. Detector performance can be improved when the interference is known to have low rank. For example, Gerlach and Steiner assumed temporally white Gaussian interference with known noise variance and unknown interference rank and derived the GLRT. More recently, Kang, Monga, and Rangaswamy assumed temporally white Gaussian interference with unknown noise variance and known interference rank and derived the GLRT. Other structures on the covariance were considered by Aubry et al. in . In a departure from the above methods, McWhorter proposed to treat the interference components, as well as the noise variance, as deterministic unknowns. He then derived the corresponding GLRT. Note that McWhorter’s approach implicitly assumes knowledge of the interference rank. Bandiera et al. proposed yet a different approach, based on a Bayesian perspective.
For adaptive detection of unknown but structured signals, we are aware of relatively little prior work. Forsythe [1, p.110] describes an iterative scheme for signals with deterministic (e.g., finite-alphabet, constant envelope) structure that builds on Kelly’s GLRT. Each iteration involves maximum-likelihood (ML) signal estimation and least-squares beamforming, based on the intuition that correct decisions will lead to better beamformers and thus better interference suppression. Error propagation remains a serious issue, however, as we will demonstrate in the sequel.
We propose three GLRT-based schemes for adaptive detection of unknown structured signals with unknown array responses, additive WGN of unknown variance, and interference of possibly low rank. All of our schemes use a probabilistic signal model, under which the direct evaluation of the GLRT numerator becomes intractable. To circumvent this intractability, we use expectation maximization (EM). In particular, we derive computationally efficient EM procedures for the independent prior (3), paying special attention to finite-alphabet and Gaussian cases.
Our first approach treats the interference as temporally white and Gaussian, and it makes no attempt to leverage low interference rank, similar to Kelly’s approach. A full-rank interference model would be appropriate if, say, the interferers’ array responses varied significantly over the measurement epoch. We show that our first approach is a variation on Forsythe’s iterative scheme [1, p.110] that uses “soft” symbol estimation and “soft” signal subtraction, making it much more robust to error propagation.
Our second approach is an extension of our first that aims to exploit the possibly low-rank nature of the interference. As in [9, 10, 11], the interference is modeled as temporally white Gaussian but, different from [9, 10, 11], both the interference rank and the noise variance are unknown. More significantly, unlike [9, 10, 11], the signal is assumed to be unknown.
Our third approach also aims to exploit low-rank interference, but it does so while modeling the interference as deterministic, as in McWhorter. Unlike that work, however, the interference rank and the signal are assumed to be unknown. Numerical experiments are presented to demonstrate the efficacy of our three approaches.
We first provide some background that will be used in developing the proposed methods. In our discussions below, we will use the standard notation for the orthogonal projection onto the column space of a given matrix, i.e.,
and the corresponding notation for its orthogonal complement. Recall that both projection matrices are Hermitian and idempotent.
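For concreteness, a minimal NumPy sketch (variable names are ours) of the projection onto the column space of a full-column-rank matrix and its orthogonal complement, verifying the Hermitian and idempotent properties:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2)) + 1j * rng.standard_normal((6, 2))

# Orthogonal projection onto the column space of A: P = A (A^H A)^{-1} A^H,
# and its orthogonal complement P_perp = I - P.
P = A @ np.linalg.inv(A.conj().T @ A) @ A.conj().T
P_perp = np.eye(6) - P

# Both are Hermitian and idempotent:
assert np.allclose(P, P.conj().T)
assert np.allclose(P @ P, P)
assert np.allclose(P_perp @ P_perp, P_perp)
# P leaves the columns of A unchanged; P_perp annihilates them.
assert np.allclose(P @ A, A)
assert np.allclose(P_perp @ A, 0)
```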
II-A Full-rank Gaussian Interference
The classical work of Kelly [7, 8] tackled the binary hypothesis test (1) by treating the interference-plus-noise as temporally white and Gaussian with an unknown spatial covariance matrix. This reduces (1) to
Above, the vectorization operation forms a vector by concatenating all columns of a matrix, the circularly symmetric multivariate complex Gaussian distribution is parameterized by its mean vector and covariance matrix, and the Kronecker product builds the overall covariance. We note that the covariance structure in (6) corresponds to temporal whiteness across time samples and spatial correlation with an unknown spatial covariance matrix. With a known signal, the GLRT takes the form
for the eigenvalues arranged in decreasing order (i.e.,
Kelly’s approach was applied to the detection/synchronization of communications signals by Bliss and Parker after discarding the measurements corresponding to the unknown data symbols.
When the number of time samples is smaller than the array dimension, some eigenvalues of the sample covariance will be zero-valued, and so the test (8) is not directly applicable. One can imagine many strategies to circumvent this problem (e.g., restricting the test to the positive eigenvalues, computing eigenvalues from a diagonally loaded sample covariance, etc.) that can be considered as departures from Kelly’s approach. In the sequel, we describe approaches that use a low-rank-plus-identity covariance, as would be appropriate when the interferers are few, i.e., when the interference rank is small relative to the array dimension.
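The rank deficiency described above is easy to reproduce numerically. The sketch below (our own, with arbitrary dimensions) shows the zero eigenvalues that arise when the snapshot count is below the array dimension, and one diagonal-loading regularization of the kind mentioned:

```python
import numpy as np

rng = np.random.default_rng(2)
M, T = 8, 5                      # more antennas than snapshots: T < M
Y = (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)

R_hat = (Y @ Y.conj().T) / T     # sample covariance, rank at most T < M
evals = np.sort(np.linalg.eigvalsh(R_hat))[::-1]
# At least M - T eigenvalues are (numerically) zero:
assert np.all(evals[T:] < 1e-10)

# One workaround: diagonal loading ("regularized sample covariance"),
# which makes all eigenvalues strictly positive; eps is our own choice.
eps = 1e-3
evals_reg = np.sort(np.linalg.eigvalsh(R_hat + eps * np.eye(M)))[::-1]
assert np.all(evals_reg > 0)
```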
II-B Low-rank Gaussian Interference
The low-rank property of the interference can be exploited to improve detector performance. Some of the first work in this direction was published by Gerlach and Steiner in . They assumed known noise variance and temporally white Gaussian interference whose spatial covariance is a low-rank term plus a scaled identity, with the rank unknown. The GLRT was then posed under a rank constraint:
More recently, Kang, Monga, and Rangaswamy proposed a variation on Gerlach and Steiner’s approach where the noise variance is unknown but the interference rank is known. In particular, they proposed the GLRT
with a smoothed version of the eigenvalues from (9):
II-C Low-rank Deterministic Interference
The approaches discussed above all model the interference as temporally white Gaussian. McWhorter instead proposed to treat the interference components as deterministic unknowns, yielding the GLRT
using the eigenvalues defined in (9). Comparing (17) to (13), we see that both GLRTs involve noise-variance estimates computed by averaging the smallest eigenvalues. However, (17) discards the largest eigenvalues, whereas (13) uses them in the test.
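To illustrate the eigenvalue-smoothing idea shared by these detectors, the following sketch (a simplified stand-in, not the paper's exact (14)-(15); dimensions, rank, and variable names are our own) estimates the noise variance by averaging the smallest sample-covariance eigenvalues under a rank-r-plus-identity model:

```python
import numpy as np

rng = np.random.default_rng(3)
M, T, r = 6, 5000, 2      # antennas, snapshots, interference rank
sigma2 = 0.5              # true noise variance

# Rank-r interference plus white noise: covariance B B^H + sigma2 I.
B = rng.standard_normal((M, r)) + 1j * rng.standard_normal((M, r))
S = (rng.standard_normal((r, T)) + 1j * rng.standard_normal((r, T))) / np.sqrt(2)
W = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, T))
                           + 1j * rng.standard_normal((M, T)))
Y = B @ S + W

evals = np.sort(np.linalg.eigvalsh((Y @ Y.conj().T) / T))[::-1]

# "Smoothed" eigenvalues: keep the r principal ones and replace the
# remaining M - r by their average, which is the noise-variance estimate.
sigma2_hat = np.mean(evals[r:])
evals_smooth = np.concatenate([evals[:r], np.full(M - r, sigma2_hat)])

print(sigma2_hat)   # close to sigma2 for large T
```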
III GLRTs via White Gaussian Interference
We now consider adaptive detection via the binary hypothesis test (1) with an unknown structured signal. As described earlier, our approach is to model the signal as a random vector with a prior density. The interference-plus-noise is modeled as temporally white Gaussian with an unknown spatial covariance matrix and unknown noise variance. For now, we will model the interference using a fixed and known rank. One extreme of the rank is reminiscent of Kelly, and the other is reminiscent of Kang, Monga, and Rangaswamy. The estimation of the rank will be discussed in Sec. III-G.
For a fixed interference rank, the hypothesis test (1) reduces to
where the covariance parameters (defined in (12)) are unknown. In the full-rank case, the structured covariance reduces to an unstructured one. The corresponding GLRT is
III-A GLRT Denominator
where the eigenvalues follow the definition in (14). That is, they are a smoothed version of the eigenvalues of the sample covariance matrix in decreasing order, where the smoothing averages the smallest eigenvalues to form the noise-variance estimate, as in (15). In a special case of the rank, the results in (see also ) apply directly. In either case, the columns of
are the corresponding eigenvectors of the sample covariance matrix. Plugging (23) into (22), taking the log, and rearranging gives
Since , we have
Note that, for small rank, the statistic can be computed using only the principal eigenvalues of the sample covariance, since
III-B GLRT Numerator
Exact maximization of the likelihood over the unknown parameters appears to be intractable. We thus propose to approximate the maximization by applying EM with the unknown symbols as hidden data. This implies that we iterate the following updates:
The EM algorithm is guaranteed to converge to a local maximum or saddle point of the likelihood. Furthermore, at each iteration, the EM-approximated log-likelihood increases and lower bounds the true log-likelihood.
Because the symbol prior is statistically independent of the remaining unknowns, the joint density factors, which allows us to rewrite the EM objective as
We first perform the minimization in (35) over . Since
the gradient of the cost in (35) w.r.t. equals
and this gradient is set to zero by
which uses the notation
Substituting this minimizer back into (35), we obtain the cost that must be minimized over the remaining unknowns:
Note that is a regularized version of the projection matrix that equals when is completely known. In general, however, is not a projection matrix. Minimizing (42) over is equivalent to maximizing
where are the eigenvalues of the matrix in decreasing order, and the columns of are the corresponding eigenvectors. When , we have that .
III-C EM Update under an Independent Prior
The EM updates in (39)-(40) compute the conditional mean (or, equivalently, the MMSE estimate) and conditional second moment of the symbols, given the measurements in (19) under the assumed model. For any independent prior, as in (3), we can MMSE-estimate the symbols one at a time from the measurement equation
We obtain a sufficient statistic for the estimation of each symbol by spatially whitening the measurements via
and then matched filtering via
We find it more convenient to work with the normalized and conjugated statistic
which is a Gaussian-noise-corrupted version of the true symbol, with a known noise precision.
The computation of the MMSE estimate from this statistic depends on the prior. For the Gaussian prior, we have the posterior mean and variance
which from (40) implies
For the discrete prior, with a finite alphabet
and prior symbol probabilities (summing to one), it is straightforward to show that the posterior density is
and thus the posterior mean and second moment are
which from (40) implies
This EM update procedure is summarized in Alg. 1.
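The scalar posterior computations above are standard. The sketch below (function and variable names are ours, and it is a simplified stand-in rather than the paper's Alg. 1 itself) implements the Gaussian-prior and finite-alphabet-prior MMSE symbol estimates from an AWGN-corrupted statistic with known precision:

```python
import numpy as np

def mmse_gaussian(r, rho, var_prior=1.0):
    """MMSE estimate of s from r = s + w, w ~ CN(0, 1/rho), s ~ CN(0, var_prior).
    Returns the posterior mean and posterior variance."""
    gain = var_prior * rho / (1.0 + var_prior * rho)
    return gain * r, var_prior / (1.0 + var_prior * rho)

def mmse_discrete(r, rho, alphabet, probs):
    """MMSE estimate of s from r = s + w, w ~ CN(0, 1/rho), with s drawn from
    a finite alphabet under the given prior probabilities.
    Returns the posterior mean and posterior second moment (soft estimates)."""
    logp = np.log(probs) - rho * np.abs(r - alphabet) ** 2
    logp -= logp.max()                  # for numerical stability
    post = np.exp(logp)
    post /= post.sum()                  # posterior pmf over the alphabet
    mean = np.sum(post * alphabet)
    second = np.sum(post * np.abs(alphabet) ** 2)
    return mean, second

# Example: BPSK alphabet {+1, -1}, uniform prior, high precision.
alphabet = np.array([1.0, -1.0])
probs = np.array([0.5, 0.5])
mean, second = mmse_discrete(0.9, rho=10.0, alphabet=alphabet, probs=probs)
print(mean)    # close to +1 at high precision
```

At low precision the discrete posterior mean shrinks toward the prior mean, which is the "soft" symbol behavior exploited by the EM updates.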
III-D Fast Implementation of Algorithm 1
The implementation complexity of Alg. 1 is dominated by the eigenvalue decomposition in line 12, which consumes the bulk of the operations per EM iteration. We now describe how the complexity of this step can be reduced. Recall that
using the definition
The key idea is that the eigen-decomposition of
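One standard device for this kind of speedup, sketched below under our own assumptions (it may differ from the construction the authors intend), recovers the nonzero eigenpairs of a large outer-product matrix from a much smaller Gram matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 40, 3   # tall matrix: M >> N
B = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

# Nonzero eigenpairs of the M x M matrix B B^H can be obtained from the
# much smaller N x N Gram matrix B^H B: if (lam, v) is an eigenpair of
# B^H B with lam > 0, then (lam, B v / sqrt(lam)) is an eigenpair of B B^H.
lam, V = np.linalg.eigh(B.conj().T @ B)   # N x N problem, O(N^3)
U = (B @ V) / np.sqrt(lam)                # lift eigenvectors to M-space

# Check against the direct (expensive) M x M decomposition:
big = B @ B.conj().T
for k in range(N):
    assert np.allclose(big @ U[:, k], lam[k] * U[:, k])
    assert np.isclose(np.linalg.norm(U[:, k]), 1.0)
```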