Estimation in the group action channel

01/13/2018 ∙ by Emmanuel Abbe, et al.

We analyze the problem of estimating a signal from multiple measurements on a group action channel that linearly transforms a signal by a random group action followed by a fixed projection and additive Gaussian noise. This channel is motivated by applications such as multi-reference alignment and cryo-electron microscopy. We focus on the large noise regime prevalent in these applications. We give a lower bound on the mean square error (MSE) of any asymptotically unbiased estimator of the signal's orbit in terms of the signal's moment tensors, which implies that the MSE is bounded away from 0 when $N/\sigma^{2d}$ is bounded from above, where $N$ is the number of observations, $\sigma$ is the noise standard deviation, and $d$ is the so-called moment order cutoff. In contrast, the maximum likelihood estimator is shown to be consistent if $N/\sigma^{2d}$ diverges.




I Introduction

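In this paper, we consider the problem of estimating a signal $x \in \mathbb{R}^L$ from measurements of the group action channel, defined as

$$Y_j = P G_j x + \sigma Z_j, \qquad j \in \{1, \dots, N\}, \tag{I.1}$$

where the $Z_j$ are i.i.d. and drawn from $\mathcal{N}(0, I_K)$, i.e. $Z_j \in \mathbb{R}^K$ and its entries are i.i.d. standard Gaussian variables; $P \in \mathbb{R}^{K \times L}$ is a known projection matrix; and the $G_j$ are i.i.d. matrices drawn from a distribution $\theta$ on a compact subgroup $\Theta$ of $O(L)$, the space of orthogonal matrices in $\mathbb{R}^{L \times L}$. The distribution $\theta$ is not known; however, the main goal is to estimate the signal $x$.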

The goal of this paper is to understand the sample complexity of (I.1), i.e. the relation between the number of measurements $N$ and the noise standard deviation $\sigma$ such that an estimator $\hat{X}$ of the signal $x$ converges in probability to the true value as $N$ diverges, up to a group action. Allowing for a group action is intrinsic to the problem: if we apply an element $g$ of the group $\Theta$ to $x$, and its inverse to the right of each $G_j$, we produce exactly the same samples; thus no estimator can distinguish observations that originate from $x$ from those that originate from $gx$.
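
The channel and the orbit ambiguity can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes the group of cyclic shifts (implemented with `np.roll`), a uniform shift distribution, and a projection that keeps the first `K` coordinates. The function name `simulate_channel` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_channel(x, P, shifts, sigma, noise):
    """Return samples Y_j = P G_j x + sigma Z_j as columns of a K x N matrix.

    G_j is the cyclic shift by shifts[j] positions, and noise[:, j] plays
    the role of the standard Gaussian vector Z_j.
    """
    shifted = np.stack([np.roll(x, s) for s in shifts], axis=1)  # L x N
    return P @ shifted + sigma * noise

rng = np.random.default_rng(0)
L, K, N, sigma = 5, 3, 1000, 2.0
x = rng.standard_normal(L)
P = np.eye(L)[:K]                       # keep the first K coordinates
shifts = rng.integers(0, L, size=N)     # uniform distribution (assumption)
noise = rng.standard_normal((K, N))

Y = simulate_channel(x, P, shifts, sigma, noise)

# Orbit ambiguity: replace x by g x (shift by one) and each G_j by
# G_j g^{-1} (shift by shifts[j] - 1); the samples do not change.
gx = np.roll(x, 1)
Y_alt = simulate_channel(gx, P, shifts - 1, sigma, noise)
print(np.allclose(Y, Y_alt))
```

Because the two sets of samples coincide, any estimator computed from them returns the same value for both signals, which is why recovery is only possible up to a group action.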

The model (I.1) is a generalization of multi-reference alignment (MRA), which arises in a variety of engineering and scientific applications, among them structural biology [1, 2, 3], radar [4, 5], robotics [6] and image processing [7, 8, 9]. The one-dimensional MRA problem, where $\Theta$ is the group generated by the matrix that cyclically shifts the elements of the signal, has recently been a topic of active research. In [10], it was shown that the sample complexity is $\omega(\sigma^6)$ when $\theta$ is the uniform distribution and the projection matrix is the identity. In [11], a provable polynomial-time estimator that achieves the sample complexity was presented, while [12] presented a non-convex optimization framework that is more efficient in practice. Note that, when the projection matrix is the identity, we can always enforce a uniform distribution on $\Theta$ by applying a random group action, i.i.d. and drawn from the uniform distribution, to the observations. In [13], it was shown that $\omega(\sigma^6)$ is also the sample complexity if $\theta$ is unknown beforehand but is uniform or periodic, that is, invariant under a cyclic shift by fewer than $L$ positions. However, if $\theta$ is aperiodic, the sample complexity is $\omega(\sigma^4)$. That paper also presents an efficient estimator that uses the first and second moments of the signal over the group, which can be estimated with order of $\sigma^2$ and $\sigma^4$ observations, respectively, thus achieving the sample complexity. The main result in this paper is a generalization of the information lower bound presented in [13]; however, the proof techniques remain the same.

We can also use (I.1) to model the problem of single particle reconstruction in cryo-electron microscopy (cryo-EM), in which a three-dimensional volume is recovered from two-dimensional noisy projections taken at unknown viewing directions [14, 15]. Here the signal $x$ is a linear combination of products of spherical harmonics and radial basis functions, the group $\Theta$ is the rotation group $SO(3)$, and its elements act on $x$ by rotating the basis functions. Finally, $P$ is a tomographic projection onto the plane. The paper [16] considers the problem (I.1) with $\theta$ being known and uniform. It obtains the same result for the sample complexity as this paper, and together with results from computational algebra and invariant theory verifies that in many cases the sample complexity for the considered cryo-EM model is , and at least more generally. They also consider the problem of heterogeneity in cryo-EM.

II The Main Result

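Since we can only determine the signal $x$ up to a group action, we define the best alignment of an estimate $\hat{x}$ with $x$ by

$$\phi_x(\hat{x}) := \operatorname*{argmin}_{z \in \{g \hat{x} \,:\, g \in \Theta\}} \|z - x\|, \tag{II.1}$$

and the mean square error (MSE) as

$$\mathrm{MSE} := \mathbb{E}\big[\|\phi_x(\hat{X}) - x\|^2\big] = \mathbb{E}\Big[\min_{g \in \Theta} \|g \hat{X} - x\|^2\Big]. \tag{II.2}$$

The expectation is taken over the estimator $\hat{X}$, which is a function of the observations with distribution determined by (I.1). Since we are interested in estimators that converge to an orbit of $x$ in probability as $N$ diverges, we only consider estimators that are asymptotically unbiased, i.e., $\mathbb{E}[\phi_x(\hat{X})] \to x$ as $N \to \infty$. However, the results presented in this paper can be adapted to biased estimators (see Theorem III.2).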
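
For a finite group, the best alignment can be found by enumerating the orbit. The sketch below is an illustration under stated assumptions, not the paper's implementation: it takes the group to be the cyclic shifts, and the function names `best_alignment` and `aligned_squared_error` are hypothetical. The MSE is the expectation of the returned error over the randomness of the estimator.

```python
import numpy as np

def best_alignment(x_hat, x):
    """Return the cyclic shift of x_hat that is closest to x in Euclidean norm."""
    candidates = [np.roll(x_hat, s) for s in range(len(x_hat))]
    return min(candidates, key=lambda z: np.sum((z - x) ** 2))

def aligned_squared_error(x_hat, x):
    """Squared distance between x and the orbit of x_hat under cyclic shifts."""
    return float(np.sum((best_alignment(x_hat, x) - x) ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0])
# An estimate that is a shifted copy of x with a small error in one entry.
x_hat = np.roll(x, 2) + np.array([0.1, 0.0, 0.0, 0.0])
err = aligned_squared_error(x_hat, x)
print(err)
```

An estimate that is an exact shifted copy of the signal has zero error under this metric, which reflects that only the orbit is identifiable.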

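Let us introduce some notation regarding tensors. For a vector $x \in \mathbb{R}^L$, we denote by $x^{\otimes n}$ the $L^n$-dimensional tensor where the entry indexed by $k = (k_1, \dots, k_n)$ is given by $\prod_{j=1}^n x[k_j]$. The space of $L^n$-dimensional tensors forms a vector space, with sum and scalar multiplication defined entry-wise. This vector space has inner product and norm defined by $\langle A, B \rangle = \sum_k A[k] B[k]$ and $\|A\|^2 = \langle A, A \rangle$, respectively.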

Definition II.1.

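The $n$-th order moment of the signal $x$ over the distribution $\theta$ is the tensor of order $n$ and dimension $K^n$, defined by

$$M^n_{x,\theta} := \mathbb{E}\big[(P G x)^{\otimes n}\big], \tag{II.3}$$

where $G \sim \theta$.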
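
To make the definition concrete, the moment tensors can be computed exactly when the group is finite. The sketch below assumes cyclic shifts on a length-3 signal, a projection onto the first two coordinates, and a uniform distribution; the function name `moment_tensor` and the values `a = 1`, `b = 2` are hypothetical. It compares two signals with one zero entry whose other two entries are swapped.

```python
import numpy as np

def moment_tensor(x, P, theta, n):
    """n-th order moment E[(P G x)^{tensor n}] for cyclic shifts.

    theta[s] is the probability of the shift by s positions.
    """
    K = P.shape[0]
    M = np.zeros((K,) * n)
    for s, prob in enumerate(theta):
        v = P @ np.roll(x, s)
        T = v
        for _ in range(n - 1):
            T = np.multiply.outer(T, v)   # v (x) v (x) ... (x) v
        M += prob * T
    return M

a, b = 1.0, 2.0
P = np.eye(3)[:2]                 # keep the first two coordinates
theta = np.full(3, 1 / 3)         # uniform distribution over the 3 shifts
x = np.array([0.0, a, b])
x_tilde = np.array([0.0, b, a])   # not in the orbit of x when a != b

# Squared distance between the moment tensors of x and x_tilde, per order.
gaps = [
    float(np.sum((moment_tensor(x, P, theta, n)
                  - moment_tensor(x_tilde, P, theta, n)) ** 2))
    for n in (1, 2, 3)
]
print(gaps)
```

In this instance the first and second moments of the two signals agree, while the third moments differ, so the two orbits can only be told apart at order three.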

In this paper, we provide lower bounds for the MSE in terms of the noise standard deviation and the number of observations. We show that the MSE is bounded below by order of $\sigma^{2d}/N$, where $d$ is the moment order cutoff, defined as the smallest $n$ such that the moment tensors up to order $n$ define the signal unequivocally. We also show that if $N = \omega(\sigma^{2d})$, then the marginalized maximum likelihood estimator (MLE) converges in probability to the true signal (up to a group action). We now present the main result of the paper.

Theorem II.2.

Consider the estimation problem given by equation (I.1). For any signal such and for any group distribution , let , and define the moment order cutoff as . Finally let

We have


thus the MSE is bounded away from zero if $N/\sigma^{2d}$ is bounded from above. Moreover, if $N = \omega(\sigma^{2d})$, then the MLE converges in probability to $gx$, for some element $g \in \Theta$.

II-A Taking the limit

Theorem II.2 is an application of a modified Chapman-Robbins bound, presented later in Theorem III.2. On the other hand, the classical Cramér-Rao bound [17], which gives a lower bound on the variance of an estimator of a parameter, can be obtained from the Chapman-Robbins bound by taking the limit in which the alternative parameter value approaches the true one. We present an analogous version of Theorem II.2 obtained by taking a similar limit.
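
As a reminder of how this limit works in the classical scalar setting, the following sketch uses hypothetical notation that is not taken from the paper: $t$ is a scalar parameter, $f_t$ the density of the data, $\hat{T}$ an unbiased estimator of $t$, and $I(t)$ the Fisher information. It assumes the usual regularity conditions under which the $\chi^2$ divergence admits a second-order expansion.

```latex
% Chapman-Robbins bound for an unbiased estimator \hat{T} of t:
\operatorname{Var}_t(\hat{T}) \;\ge\; \sup_{h \neq 0}
    \frac{h^2}{\chi^2(f_{t+h} \,\|\, f_t)} .
% Under regularity conditions, as h -> 0:
\chi^2(f_{t+h} \,\|\, f_t) = h^2 I(t) + o(h^2) ,
% so bounding the supremum from below by the limit h -> 0
% gives the Cramer-Rao bound:
\operatorname{Var}_t(\hat{T}) \;\ge\; \lim_{h \to 0}
    \frac{h^2}{h^2 I(t) + o(h^2)} = \frac{1}{I(t)} .
```

Since the supremum is at least as large as the limit, the Chapman-Robbins bound is never weaker than the Cramér-Rao bound obtained from it.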

Corollary II.3.

Under the conditions of Theorem II.2, let, ,

and . Then


We leave the proof of this corollary to [13, Appendix C]. It is interesting to compare this bound with (II.4) when or diverge. If , then will dominate , and the lower bound for the MSE will be inversely proportional to $N$, which is a behavior typical of estimation problems with continuous model parameters. On the other hand, if , then dominates , and the MSE will depend exponentially on , which is a behavior typical of discrete parameter estimation problems [18]. One can show that only happens when the supremum in is attained by some not in the orbit of . The exponential decay in is the same as the probability of error of the hypothesis test that decides whether the observations come from or .

We conjecture that the lower bounds presented in this paper can be achieved asymptotically by the MLE. In fact when the search space is discrete, the MLE achieves the least probability of error (assuming a uniform prior on the parameters), which behaves like (II.4). Also, when the search space is continuous, the MLE is asymptotically efficient, which means it achieves the Cramér-Rao lower bound. However this bound is obtained from the Chapman-Robbins lower bound (which we use in this paper) by taking a similar limit as in (II.5), and the bound also scales inversely proportional to the number of observations.

II-B Prior Knowledge

The result presented can be adapted to improve the bound if we have prior knowledge about the signal and group distribution. If we know beforehand that (for instance, has a zero element or is the uniform distribution on ), we can instead define and restrict the supremum in (II.4) to in .

II-C Examples

1)  Let ; be the group generated by the cyclic shift matrix that maps ; and projects onto its first two elements, i.e. . Furthermore, we know a priori that one, and only one, of the elements of is (let’s assume without loss of generality that ), the other two elements are distinct, and is uniform, i.e. . We have


From these two moments, we can solve for and ; however, all these equations are symmetric in and , so we cannot identify which of the values obtained is and which is . In other words, both candidate solutions are and . However, differs from : if we look at the entry in indexed by , we note that

and analogously . From the 8 entries of , are equal to and differ by , in absolute value, so . This means , , thus if the lower bound (II.4) dominates (II.5), the supremum is attained at and

Note that .

2)  Let ; be the group generated by the cyclic shift matrix that maps ; and projects onto its first element, i.e. . Furthermore, we know a priori that is uniform, i.e. . We have


From these two moments we can determine and up to an action of the group. Now take , so that . We have

Here , thus if diverges, (II.5) dominates (II.4), and the lower bound is

III Proof Techniques

The outline of the proof is as follows. In Section III-A we use an adaptation of the Chapman-Robbins lower bound [19] to derive a lower bound on the MSE in terms of the $\chi^2$ divergence; this is Theorem III.2. Then, in Section III-B, we express the $\chi^2$ divergence in terms of the Taylor expansion of the posterior probability density and the moment tensors, obtaining Lemma III.3. Finally, in Section III-C we combine Theorem III.2 and Lemma III.3 to obtain (II.4), use Lemma III.3 to obtain a similar Taylor expansion for the Kullback-Leibler (KL) divergence, and use this to show that the MLE is consistent.

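Throughout the paper we denote the expectation by $\mathbb{E}$, and use capital letters for random variables and lower-case letters for instances of these random variables. Let $Y^N = [Y_1, \dots, Y_N]$ be the collection of all measurements as columns in a matrix. Let us denote by $f^N_{x,\theta}$ the probability density of the posterior distribution of $Y^N$, which factorizes over the i.i.d. observations into single-observation densities $f_{x,\theta}$,

$$f^N_{x,\theta}(y^N) = \prod_{j=1}^N f_{x,\theta}(y_j), \tag{III.1}$$
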
and the expectation of a function of the measurements under the measure by

For ease of notation, we write when the signal and distribution are implicit. The bias-variance trade-off of the MSE is given by:




Our last definition is of the $\chi^2$ divergence, which gives a measure of how "far" two probability distributions are.

Definition III.1.

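The $\chi^2$ divergence between two probability densities $f_A$ and $f_B$ is defined by

$$\chi^2(f_A \,\|\, f_B) := \mathbb{E}\left[\left(\frac{f_A(B)}{f_B(B)} - 1\right)^2\right],$$

where $B \sim f_B$.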

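Due to equation (III.1), the relation between the $\chi^2$ divergence for $N$ observations and for one observation is given by

$$\chi^2(f^N_A \,\|\, f^N_B) = \left(1 + \chi^2(f_A \,\|\, f_B)\right)^N - 1.$$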
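
For densities that factorize over i.i.d. observations, the $\chi^2$ divergence for $N$ observations equals $(1 + \chi^2_1)^N - 1$, where $\chi^2_1$ is the divergence for a single observation. The following sketch checks this numerically for $N = 2$ in a toy case that is not the channel of the paper: unit-variance Gaussians with means `mu` and `0`, for which the single-observation divergence is known in closed form, $e^{\mu^2} - 1$. The function names and the grid are hypothetical choices.

```python
import numpy as np

def chi2_divergence(f_a, f_b, grid_points, cell_volume):
    """Riemann-sum approximation of E[(f_a(B)/f_b(B) - 1)^2] with B ~ f_b."""
    fa, fb = f_a(grid_points), f_b(grid_points)
    return float(np.sum((fa / fb - 1.0) ** 2 * fb) * cell_volume)

def gaussian_pdf(mean):
    """Density of N(mean, I), evaluated on points of shape (..., dim)."""
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    def pdf(y):
        sq = np.sum((y - mean) ** 2, axis=-1)
        return np.exp(-0.5 * sq) / (2 * np.pi) ** (mean.size / 2)
    return pdf

mu = 0.5
t = np.linspace(-10, 10, 801)
h = t[1] - t[0]

# One observation: integrate over a 1-D grid.
one = chi2_divergence(gaussian_pdf([mu]), gaussian_pdf([0.0]), t[:, None], h)

# Two i.i.d. observations: product densities on a 2-D grid.
grid = np.stack(np.meshgrid(t, t, indexing="ij"), axis=-1)
two = chi2_divergence(gaussian_pdf([mu, mu]), gaussian_pdf([0.0, 0.0]), grid, h * h)

print(one, np.exp(mu**2) - 1)      # single observation vs closed form
print(two, (1 + one) ** 2 - 1)     # two observations vs product formula
```

The identity follows from expanding the square: the expectation of the squared likelihood ratio is multiplicative over independent observations.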


III-A Chapman-Robbins lower bound for an orbit

The classical Chapman-Robbins bound gives a lower bound on an error metric of the form , hence we modify it to accommodate the group-invariant metric defined in (II.2). We point out that is related to the by (III.2).

Theorem III.2 (Chapman-Robbins for orbits).

For any and group distribution in , we have

where .


The proof mimics that of the classical Chapman-Robbins bound, and is also presented in [13, Appendix A]. Define

and note that

We have

and by Cauchy-Schwarz

III-B $\chi^2$ divergence and moment tensors

In this subsection we give a characterization of the $\chi^2$ divergence, which appears in the Chapman-Robbins bound, in terms of the moment tensors.

Instead of considering the posterior probability density of , we will consider its normalized version . We then have


where , and . While this change of variable does not change the $\chi^2$ divergence, we can now take the Taylor expansion of the probability density around , that is,


where is the probability density of (since when , ) and


thus . We note is infinitely differentiable for all , thus is always well-defined. We now use (III.6) to give an expression of the $\chi^2$ divergence in terms of the moment tensors.

Lemma III.3.

The $\chi^2$ divergence is expressed in terms of the moment tensors as:


where .


This proof is presented in more detail in [13, Appendix B]. Equation (III.8) is obtained by Taylor expanding the $\chi^2$ divergence around , using (III.6) and the fact that almost surely for all , which follows from the definition of and equation (III.9). Now to prove (III.9), it is enough to show that


Let and be two independent random variables such that and . On one hand we have


On the other hand, we can write explicitly by


where , and using equation (III.7) we can write

where and are defined as in (III.11), and (III.10) finally follows from equation (III.11). ∎

III-C Final details of the proof of Theorem II.2

By Theorem III.2, Lemma III.3, equations (III.3) and (III.4) we obtain


Equation (II.4) now follows from

and taking the supremum over and .

Finally we prove that the MLE is consistent, i.e. it converges to the true signal in probability, when $N = \omega(\sigma^{2d})$. Let


The MLE is given by

Fix and , and for ease of notation let . We can write

We have

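where $D$ denotes the KL divergence, defined for two probability densities $f_A$ and $f_B$ as

$$D(f_A \,\|\, f_B) := \mathbb{E}\left[\log \frac{f_A(A)}{f_B(A)}\right],$$

where $A \sim f_A$.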

Using (III.6), with , we have as , which implies by [20, Section F, Theorem 9] that


thus by the law of large numbers, since


As , the maximum of tends to in probability, and is achieved when , thus the MLE must converge in probability to for some .
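
The behavior of the marginalized likelihood can be illustrated on a toy instance. The sketch below is not the paper's estimator: it assumes cyclic shifts, an identity projection, and a known uniform shift distribution (the paper treats the distribution as unknown), and the function name `log_likelihood` and all parameter values are hypothetical. It evaluates the log-likelihood, up to an additive constant, at the true signal, at a shifted copy, and at a perturbed signal.

```python
import numpy as np

def log_likelihood(x, Y, P, sigma):
    """Marginalized log-likelihood of x, up to an additive constant,
    for cyclic shifts drawn uniformly (assumptions of this sketch)."""
    L = len(x)
    shifted = np.stack([P @ np.roll(x, s) for s in range(L)])           # L x K
    # Squared distance from every sample to every shifted, projected copy.
    sq = np.sum((Y.T[:, None, :] - shifted[None, :, :]) ** 2, axis=-1)  # N x L
    log_terms = -sq / (2 * sigma**2) - np.log(L)
    m = log_terms.max(axis=1, keepdims=True)                            # log-sum-exp
    return float(np.sum(m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))))

rng = np.random.default_rng(1)
L, N, sigma = 4, 5000, 1.0
x = np.array([1.0, -2.0, 0.5, 3.0])
P = np.eye(L)
shifts = rng.integers(0, L, size=N)
Y = np.stack([np.roll(x, s) for s in shifts], axis=1) + sigma * rng.standard_normal((L, N))

ll_true = log_likelihood(x, Y, P, sigma)
ll_orbit = log_likelihood(np.roll(x, 1), Y, P, sigma)                   # same orbit
ll_other = log_likelihood(x + np.array([0.5, 0.0, 0.0, 0.0]), Y, P, sigma)
print(ll_true, ll_orbit, ll_other)
```

Signals in the same orbit have the same likelihood under the uniform distribution, so a maximizer can only be identified up to a group action, consistent with the statement above.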


EA was partly supported by the NSF CAREER Award CCF–1552131, ARO grant W911NF–16–1–0051 and NSF Center for the Science of Information CCF–0939370. JP and AS were partially supported by Award Number R01GM090200 from the NIGMS, the Simons Foundation Investigator Award and Simons Collaborations on Algorithms and Geometry, the Moore Foundation Data-Driven Discovery Investigator Award, and AFOSR FA9550-17-1-0291.

We would like to thank Afonso Bandeira, Tamir Bendory, Joseph Kileel, William Leeb and Nir Sharon for many insightful discussions.