I Introduction
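In this paper, we consider the problem of estimating $\theta \in \mathbb R^L$ with measurements from the group action channel, defined as
$$Y_j = P R_j \theta + \sigma Z_j, \qquad j = 1, \dots, N, \tag{I.1}$$
where the $Z_j$ are i.i.d. and drawn from $\mathcal N(0, I_K)$, i.e. $Z_j \in \mathbb R^K$ and its entries are i.i.d. standard Gaussian variables; $P \in \mathbb R^{K \times L}$ is a known projection matrix; and the $R_j$ are i.i.d. matrices drawn from a distribution $\rho$ on a compact subgroup $G$ of $O(L)$, the group of orthogonal matrices in $\mathbb R^{L \times L}$. The distribution $\rho$ is not known; however, the main goal is to estimate the signal $\theta$.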
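To make the model concrete, the following is a minimal Python sketch of (I.1) in the cyclic-shift (MRA) setting. As illustrative assumptions, $P$ keeps the first $K$ coordinates and $\rho$ is an arbitrary distribution on the shifts; the helper names `cyclic_shift_matrix` and `group_action_channel` are ours, not the paper's. The sketch also checks the orbit ambiguity: replacing $\theta$ by $g\theta$ and each $R_j$ by $R_j g^{-1}$ leaves every sample unchanged.

```python
import numpy as np


def cyclic_shift_matrix(L: int) -> np.ndarray:
    """Permutation matrix R with R @ x = (x_L, x_1, ..., x_{L-1})."""
    return np.roll(np.eye(L), 1, axis=0)


def group_action_channel(theta, P, Rs, Z, sigma):
    """Samples Y_j = P R_j theta + sigma Z_j, returned as the rows of an (N, K) array.

    Rs has shape (N, L, L), one group element per sample; Z has shape (N, K).
    """
    return np.einsum("kl,nlm,m->nk", P, Rs, theta) + sigma * Z


rng = np.random.default_rng(0)
L, K, N, sigma = 5, 3, 1000, 2.0
theta = rng.standard_normal(L)
P = np.eye(K, L)                        # assumption: keep the first K coordinates
R = cyclic_shift_matrix(L)
group = np.stack([np.linalg.matrix_power(R, k) for k in range(L)])
rho = rng.dirichlet(np.ones(L))         # an arbitrary (unknown) distribution on G
idx = rng.choice(L, size=N, p=rho)      # R_j = group[idx[j]], i.i.d. from rho
Z = rng.standard_normal((N, K))
Y = group_action_channel(theta, P, group[idx], Z, sigma)

# Orbit ambiguity: g theta observed through R_j g^{-1} gives identical samples
# (g is orthogonal, so g^{-1} = g.T).
g = group[2]
Y_alt = group_action_channel(g @ theta, P, group[idx] @ g.T, Z, sigma)
print(np.allclose(Y, Y_alt))  # True
```

Because the same noise `Z` and the same draws `idx` are reused, the two sample sets coincide exactly; the second one corresponds to the shifted distribution of $R g^{-1}$.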
The goal of this paper is to understand the sample complexity of (I.1), i.e. the relation between the number of measurements $N$ and the noise standard deviation $\sigma$ such that an estimator $\hat\theta_N$ of $\theta$ converges in probability to the true value as $N$ diverges, up to a group action. Allowing for a group action is intrinsic to the problem: if we apply an element $g \in G$ to $\theta$, and its inverse to the right of the $R_j$, we produce exactly the same samples, since $(R_j g^{-1})(g\theta) = R_j\theta$; thus no estimator is able to distinguish the observations that originate from $\theta$ from those that originate from $g\theta$.

The model (I.1) is a generalization of multi-reference alignment (MRA), which arises in a variety of engineering and scientific applications, among them structural biology [1, 2, 3], radar [4, 5], robotics [6] and image processing [7, 8, 9]. The one-dimensional MRA problem, where $G$ is the group generated by the matrix $R$ that cyclically shifts the elements of the signal, i.e. it maps $(x_1, \dots, x_L) \mapsto (x_L, x_1, \dots, x_{L-1})$, has recently been a topic of active research. In [10], it was shown that the sample complexity is $\omega(\sigma^6)$ when $\rho$ is the uniform distribution and the projection matrix $P$ is the identity. In [11], a provable polynomial-time estimator that achieves this sample complexity was presented, while [12] presented a non-convex optimization framework that is more efficient in practice. Note that, when the projection matrix is the identity, we can always enforce a uniform distribution on $G$ by applying to the observations a random group action, i.i.d. and drawn from the uniform distribution. In [13], it was shown that $\omega(\sigma^6)$ is also the sample complexity if $\rho$ is unknown beforehand but is uniform or periodic, that is, $\rho(R^k) = \rho(R^{k+\ell})$ for all $k$ and some $1 \le \ell < L$. However, if $\rho$ is aperiodic, the sample complexity is $\omega(\sigma^4)$. The same paper also presents an efficient estimator that uses the first and second moments of the signal over the group, which can be estimated with order of $\sigma^2$ and $\sigma^4$ observations, respectively, thus achieving the sample complexity. The main result in this paper is a generalization of the information lower bound presented in [13]; the proof techniques, however, remain the same.

We can also use (I.1) to model the problem of single particle reconstruction in cryo-electron microscopy (cryo-EM), in which a three-dimensional volume is recovered from two-dimensional noisy projections taken at unknown viewing directions [14, 15]. Here $\theta$ is a linear combination of products of spherical harmonics and radial basis functions, $G = SO(3)$, and its elements act on $\theta$ by rotating the basis functions. Finally, $P$ is a tomographic projection onto the $xy$-plane. The paper [16] considers the problem (I.1) with $\rho$ known and uniform. It obtains the same result for the sample complexity as this paper and, together with results from computational algebra and invariant theory, determines the sample complexity of the considered cryo-EM model in many cases and bounds it from below more generally. It also considers the problem of heterogeneity in cryo-EM.

II The Main Result
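Since we can only determine $\theta$ up to a group action, we define the best alignment of $\varphi \in \mathbb R^L$ with $\theta$ by
$$\varphi_\theta := g^\ast\varphi, \qquad g^\ast \in \operatorname*{arg\,min}_{g \in G}\|g\varphi - \theta\|, \tag{II.1}$$
where the minimum is attained because $G$ is compact, and the mean square error (MSE) as
$$\mathrm{MSE} := \mathbb E\big[\|\hat\theta_\theta - \theta\|^2\big]. \tag{II.2}$$
The expectation is taken over $\hat\theta$, which is a function of the observations with distribution determined by (I.1). Since we are interested in estimators that converge in probability to the orbit of $\theta$ as $N$ diverges, we only consider estimators that are asymptotically unbiased, i.e. $\mathbb E[\hat\theta_\theta] \to \theta$ as $N \to \infty$. However, the results presented in this paper can be adapted to biased estimators (see Theorem III.2).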
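Let us introduce some notation regarding tensors. For a vector $x \in \mathbb R^K$, we denote by $x^{\otimes n}$ the $n$-dimensional tensor whose entry indexed by $k = (k_1, \dots, k_n) \in \{1, \dots, K\}^n$ is given by $x_{k_1} x_{k_2} \cdots x_{k_n}$. The space of $n$-dimensional tensors forms a vector space, with sum and scalar multiplication defined entrywise. This vector space has inner product and norm defined by $\langle A, B\rangle := \sum_k A_k B_k$ and $\|A\| := \sqrt{\langle A, A\rangle}$, respectively.
Definition II.1.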
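The $n$-th order moment of $\theta$ over $\rho$ is the tensor of order $n$ and dimension $K$ defined by
$$T_n(\theta, \rho) := \mathbb E\big[(P R \theta)^{\otimes n}\big], \tag{II.3}$$
where $R \sim \rho$.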
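For a finite group, the expectation in (II.3) is a weighted sum over the group elements, so the moment tensors can be computed directly. The sketch below is a minimal illustration, assuming $G$ is given as an explicit list of matrices with probabilities `rho`; the function names `moment_tensor` and `tensor_inner` are ours. It builds $x^{\otimes n}$ with repeated outer products and evaluates the entrywise inner product defined above.

```python
import numpy as np
from functools import reduce


def moment_tensor(theta, P, group, rho, n):
    """T_n(theta, rho) = E[(P R theta)^{(x) n}] with R ~ rho on a finite group (n >= 1)."""
    K = P.shape[0]
    T = np.zeros((K,) * n)
    for R, w in zip(group, rho):
        v = P @ R @ theta
        # v^{(x) n}: the entry (k_1, ..., k_n) equals v[k_1] * ... * v[k_n]
        T += w * reduce(np.multiply.outer, [v] * n)
    return T


def tensor_inner(A, B):
    """<A, B>: sum over all index tuples k of A_k * B_k."""
    return float(np.sum(A * B))


# Example: uniform distribution over the cyclic shifts of R^4, with P = identity.
L = 4
theta = np.array([1.0, 2.0, 3.0, 4.0])
group = [np.roll(np.eye(L), k, axis=0) for k in range(L)]
rho = np.full(L, 1 / L)
P = np.eye(L)
T1 = moment_tensor(theta, P, group, rho, 1)   # every entry equals mean(theta)
T2 = moment_tensor(theta, P, group, rho, 2)   # circulant matrix of autocorrelations
print(T1, np.trace(T2))                       # trace(T2) = ||theta||^2
```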
In this paper, we provide lower bounds for the MSE in terms of the noise standard deviation $\sigma$ and the number of observations $N$. We show that the MSE is bounded away from zero when $N/\sigma^{2d}$ is bounded from above, where $d$ is the moment order cutoff, defined as the smallest $n$ such that the moment tensors up to order $n$ define $\theta$ unequivocally (up to a group action). We also show that if $N/\sigma^{2d} \to \infty$, then the marginalized maximum likelihood estimator (MLE) converges in probability to the true signal (up to a group action). We now present the main result of the paper.
Theorem II.2.
Consider the estimation problem given by equation (I.1). For any signal such and for any group distribution , let , and define the moment order cutoff as . Finally let
We have
(II.4) 
thus the MSE is bounded away from zero if $N/\sigma^{2d}$ is bounded from above. Moreover, if $N/\sigma^{2d} \to \infty$, then the MLE converges in probability to $g\theta$, for some element $g \in G$.
II-A Taking the limit
Theorem II.2 is an application of a modified Chapman–Robbins bound, presented later in Theorem III.2. Recall that the classical Cramér–Rao bound [17], which gives a lower bound on the variance of an estimator $\hat\theta$ of a parameter $\theta$, can be obtained from the Chapman–Robbins bound by taking the limit $\varphi \to \theta$. We present an analogous version of Theorem II.2 obtained by taking a similar limit.
Corollary II.3.
The proof of this corollary is given in [13, Appendix C]. It is interesting to compare this bound with (II.4) when $N$ or $\sigma$ diverge. In one regime, (II.5) dominates (II.4), and the lower bound for the MSE is inversely proportional to $N$, which is a behavior typical of estimation problems with continuous model parameters. In the other regime, (II.4) dominates (II.5), and the MSE depends exponentially on $N$, which is a behavior typical of discrete parameter estimation problems [18]. One can show that the latter only happens when the supremum in (II.4) is attained by some $\varphi$ not in the orbit of $\theta$. The exponential decay in $N$ is the same as that of the probability of error of the hypothesis test which decides whether the observations come from $\theta$ or from $\varphi$.
We conjecture that the lower bounds presented in this paper can be achieved asymptotically by the MLE. In fact, when the search space is discrete, the MLE achieves the least probability of error (assuming a uniform prior on the parameters), which behaves like (II.4). Also, when the search space is continuous, the MLE is (under standard regularity conditions) asymptotically efficient, which means that it achieves the Cramér–Rao lower bound. This bound is obtained from the Chapman–Robbins lower bound (which we use in this paper) by taking a limit similar to the one leading to (II.5), and it also scales inversely proportionally to the number of observations.
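For orientation, the scalar version of this limit can be worked out directly. Assume, in our own notation rather than the paper's, a one-dimensional parameter $\vartheta$, a smooth family of densities $p_\vartheta$ with Fisher information $I(\vartheta)$, and an estimator with mean $\psi(\vartheta) := \mathbb E_\vartheta[\hat\vartheta]$. Under the usual regularity conditions, the Chapman–Robbins bound and its small-$h$ behavior read

$$\operatorname{Var}_\vartheta(\hat\vartheta) \;\ge\; \sup_{h \neq 0}\frac{\big(\psi(\vartheta+h)-\psi(\vartheta)\big)^2}{\chi^2(p_{\vartheta+h}\,\|\,p_\vartheta)}, \qquad \chi^2(p_{\vartheta+h}\,\|\,p_\vartheta) = h^2 I(\vartheta) + o(h^2),$$
$$\lim_{h \to 0}\frac{\big(\psi(\vartheta+h)-\psi(\vartheta)\big)^2}{\chi^2(p_{\vartheta+h}\,\|\,p_\vartheta)} = \frac{\psi'(\vartheta)^2}{I(\vartheta)}.$$

The right-hand side is the Cramér–Rao bound; since a supremum is at least any of its limits, the Chapman–Robbins bound is never weaker. For $N$ i.i.d. observations the Fisher information becomes $N I(\vartheta)$, which is where the $1/N$ scaling comes from.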
II-B Prior Knowledge
The result presented can be adapted to improve the bound if we have prior knowledge about the signal and the group distribution. If we know beforehand that $(\theta, \rho) \in \mathcal S$ for some set $\mathcal S$ (for instance, $\theta$ has a zero element or $\rho$ is the uniform distribution on $G$), we can restrict the supremum in (II.4) to $(\varphi, \tau) \in \mathcal S$.
II-C Examples
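1) Let $L = 3$; let $G$ be the group generated by the cyclic shift matrix $R$ that maps $(x_1, x_2, x_3) \mapsto (x_3, x_1, x_2)$; and let $P$ project onto the first two elements, i.e. $P(x_1, x_2, x_3) = (x_1, x_2)$. Furthermore, we know a priori that one, and only one, of the elements of $\theta$ is $0$ (let us assume without loss of generality that $\theta_1 = 0$), that the other two elements are distinct, and that $\rho$ is uniform, i.e. $\rho(R^k) = 1/3$ for $k = 0, 1, 2$. We have
$$T_1(\theta,\rho) = \frac{\theta_2 + \theta_3}{3}\begin{pmatrix}1\\ 1\end{pmatrix}$$
and
$$T_2(\theta,\rho) = \frac{1}{3}\begin{pmatrix}\theta_2^2 + \theta_3^2 & \theta_2\theta_3\\ \theta_2\theta_3 & \theta_2^2 + \theta_3^2\end{pmatrix}.$$
From these two moments we can solve for $\theta_2$ and $\theta_3$; however, all these equations are symmetric in $\theta_2$ and $\theta_3$, so we cannot identify which of the values obtained is $\theta_2$ and which is $\theta_3$. In other words, both $\theta = (0, \theta_2, \theta_3)$ and $\varphi = (0, \theta_3, \theta_2)$ are candidate solutions, and since $\theta_2 \ne \theta_3$ they lie in different orbits. However, $T_3(\varphi,\rho)$ differs from $T_3(\theta,\rho)$: looking at the entry indexed by $(1,1,2)$, we note that
$$T_3(\theta,\rho)_{(1,1,2)} = \frac{\theta_2^2\theta_3}{3},$$
and analogously $T_3(\varphi,\rho)_{(1,1,2)} = \theta_2\theta_3^2/3$. Of the 8 entries of $T_3$, 2 coincide for $\theta$ and $\varphi$, and 6 differ by $|\theta_2\theta_3(\theta_2 - \theta_3)|/3$ in absolute value, so $\|T_3(\varphi,\rho) - T_3(\theta,\rho)\|^2 = \frac{2}{3}\theta_2^2\theta_3^2(\theta_2 - \theta_3)^2 > 0$. This means that the moment order cutoff for this pair is $d = 3$; thus, if the lower bound (II.4) dominates (II.5), the supremum is attained at $\varphi$ and the bound is governed by $N/\sigma^{6}$.
Note that $\|\varphi_\theta - \theta\|^2 = 2\min\{\theta_2^2, \theta_3^2, (\theta_2 - \theta_3)^2\}$.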
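The computations in Example 1 are easy to check numerically. This minimal sketch assumes the concrete values $\theta_2 = 1$ and $\theta_3 = 3$ (our choice) and uses `np.roll`, so that `np.roll(x, k)` equals $R^k x$ for the shift $R$ of the example; the helper name `moments` is ours.

```python
import numpy as np
from functools import reduce


def moments(x, n):
    """T_n for Example 1: L = 3, rho uniform on cyclic shifts, P keeps the first two entries."""
    T = np.zeros((2,) * n)
    for k in range(3):
        v = np.roll(x, k)[:2]                     # P R^k x
        T += reduce(np.multiply.outer, [v] * n) / 3
    return T


theta = np.array([0.0, 1.0, 3.0])   # (0, theta_2, theta_3)
phi = np.array([0.0, 3.0, 1.0])     # (0, theta_3, theta_2)
deltas = [float(np.sum((moments(phi, n) - moments(theta, n)) ** 2)) for n in (1, 2, 3)]
predicted = 2 / 3 * theta[1] ** 2 * theta[2] ** 2 * (theta[1] - theta[2]) ** 2

# Squared distance after the best alignment of phi with theta (minimum over the orbit of phi).
dist2 = min(float(np.sum((np.roll(phi, k) - theta) ** 2)) for k in range(3))
print(deltas, predicted, dist2)
```

The first two squared moment differences vanish, the third matches $\frac23\theta_2^2\theta_3^2(\theta_2-\theta_3)^2$, and the aligned distance matches $2\min\{\theta_2^2, \theta_3^2, (\theta_2-\theta_3)^2\}$.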
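2) Let $L = 2$; let $G$ be the group generated by the cyclic shift matrix $R$ that maps $(x_1, x_2) \mapsto (x_2, x_1)$; and let $P$ project onto the first element, i.e. $P(x_1, x_2) = x_1$. Furthermore, we know a priori that $\rho$ is uniform, i.e. $\rho(I) = \rho(R) = 1/2$. We have
$$T_1(\theta,\rho) = \frac{\theta_1 + \theta_2}{2}$$
and
$$T_2(\theta,\rho) = \frac{\theta_1^2 + \theta_2^2}{2}.$$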
From these two moments we can determine and up to an action of the group. Now take , so that . We have
Here , thus if diverges, (II.5) dominates (II.4), and the lower bound is
III Proof Techniques
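The outline of the proof is as follows. In Section III-A we use an adaptation of the Chapman–Robbins lower bound [19] to derive a lower bound on the MSE in terms of the $\chi^2$ divergence; this is Theorem III.2. Then, in Section III-B, we express the $\chi^2$ divergence in terms of the Taylor expansion of the posterior probability density and the moment tensors, obtaining Lemma III.3. Finally, in Section III-C, we combine Theorem III.2 and Lemma III.3 to obtain (II.4), use Lemma III.3 to obtain a similar Taylor expansion for the Kullback–Leibler (KL) divergence, and use this to show that the MLE is consistent.

Throughout the paper we denote the expectation by $\mathbb E$, and use capital letters for random variables and lower case letters for instances of these random variables. Let $\mathbf Y = [Y_1, \dots, Y_N] \in \mathbb R^{K \times N}$ be the collection of all measurements as columns in a matrix. Let us denote by $f^N_{\theta,\rho}$ the probability density of the posterior distribution of $\mathbf Y$,
$$f^N_{\theta,\rho}(\mathbf y) = \prod_{j=1}^N f_{\theta,\rho}(y_j), \tag{III.1}$$
where $f_{\theta,\rho}$ is the density of a single measurement $Y_j$, and denote the expectation of a function $h$ of the measurements under the measure $f^N_{\theta,\rho}$ by
$$\mathbb E_{\theta,\rho}[h(\mathbf Y)] := \int h(\mathbf y)\, f^N_{\theta,\rho}(\mathbf y)\, d\mathbf y.$$
For ease of notation, we write $f$ and $\mathbb E$ when the signal and distribution are implicit. The bias-variance decomposition of the MSE is given by
$$\mathrm{MSE} = \|b\|^2 + \mathbb E\big[\|\hat\theta_\theta - \mathbb E[\hat\theta_\theta]\|^2\big], \tag{III.2}$$
with
$$b := \mathbb E[\hat\theta_\theta] - \theta. \tag{III.3}$$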
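Our last definition is that of the $\chi^2$ divergence, which measures how "far" two probability distributions are from each other.
Definition III.1.
The $\chi^2$ divergence between two probability densities $f$ and $g$ is defined by
$$\chi^2(f\,\|\,g) := \mathbb E\left[\left(\frac{f(Y)}{g(Y)} - 1\right)^2\right],$$
where $Y \sim g$.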
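Due to equation (III.1), the relation between the $\chi^2$ divergence for $N$ observations and for one observation is given by
$$1 + \chi^2\big(f^N_{\varphi,\tau}\,\|\,f^N_{\theta,\rho}\big) = \Big(1 + \chi^2\big(f_{\varphi,\tau}\,\|\,f_{\theta,\rho}\big)\Big)^N. \tag{III.4}$$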
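Identity (III.4) holds for any pair of product measures, because $1 + \chi^2(f\,\|\,g) = \int f^2/g$ factorizes over independent coordinates. A quick numerical check, assuming discrete distributions as stand-ins for the densities (the helper names `chi2` and `product_pmf` are ours):

```python
import numpy as np
from itertools import product


def chi2(p, q):
    """chi^2(p || q) = E_{Y ~ q}[(p(Y)/q(Y) - 1)^2] = sum (p - q)^2 / q, assuming q > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum((p - q) ** 2 / q))


def product_pmf(p, N):
    """pmf of N i.i.d. draws from p, listed over all N-tuples of outcomes."""
    return np.array([np.prod(c) for c in product(p, repeat=N)])


p = [0.2, 0.5, 0.3]
q = [0.3, 0.3, 0.4]
N = 4
lhs = 1 + chi2(product_pmf(p, N), product_pmf(q, N))
rhs = (1 + chi2(p, q)) ** N
print(lhs, rhs)  # equal up to rounding
```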
Iiia ChapmanRobbins lower bound for an orbit
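The classical Chapman–Robbins bound gives a lower bound on the variance of an estimator, an error metric that does not account for the group action; hence we modified it to accommodate the group-invariant metric defined in (II.2). We point out that the MSE is related to the bias and the variance of $\hat\theta_\theta$ by (III.2).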
Theorem III.2 (Chapman–Robbins for orbits).
For any $\varphi \in \mathbb R^L$ and any distribution $\tau$ on $G$, we have
where .
Proof.
The proof mimics that of the classical Chapman–Robbins bound, and is also presented in [13, Appendix A]. Define
and note that
We have
and by Cauchy–Schwarz
∎
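For comparison, the classical argument that this proof mimics can be written for a scalar parameter as follows. The notation is ours: $p_\vartheta$ is a family of densities, $\hat\vartheta$ is any estimator, and $p_{\vartheta'}$ is assumed absolutely continuous with respect to $p_\vartheta$. With $\Lambda := p_{\vartheta'}(Y)/p_\vartheta(Y)$ and $Y \sim p_\vartheta$,

$$\mathbb E_\vartheta[\Lambda - 1] = 0, \qquad \mathbb E_\vartheta\big[(\Lambda - 1)^2\big] = \chi^2(p_{\vartheta'}\,\|\,p_\vartheta),$$
$$\mathbb E_\vartheta\big[(\hat\vartheta - \mathbb E_\vartheta\hat\vartheta)(\Lambda - 1)\big] = \mathbb E_\vartheta[\hat\vartheta\,\Lambda] - \mathbb E_\vartheta[\hat\vartheta] = \mathbb E_{\vartheta'}[\hat\vartheta] - \mathbb E_\vartheta[\hat\vartheta],$$
$$\big(\mathbb E_{\vartheta'}[\hat\vartheta] - \mathbb E_\vartheta[\hat\vartheta]\big)^2 \le \operatorname{Var}_\vartheta(\hat\vartheta)\,\chi^2(p_{\vartheta'}\,\|\,p_\vartheta),$$

where the last line is Cauchy–Schwarz. For a vector-valued estimator, the same argument applied to $\langle\hat\vartheta, v\rangle$, with $v$ the unit vector along $\mathbb E_{\vartheta'}\hat\vartheta - \mathbb E_\vartheta\hat\vartheta$, gives the analogous bound with $\mathbb E_\vartheta\|\hat\vartheta - \mathbb E_\vartheta\hat\vartheta\|^2$ in place of the variance.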
III-B $\chi^2$ divergence and moment tensors
In this subsection we characterize the $\chi^2$ divergence, which appears in the Chapman–Robbins bound, in terms of the moment tensors.
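Instead of considering the posterior probability density of the measurements $Y$, we will consider that of their normalized version $X := Y/\sigma$. We then have
$$X = \gamma P R \theta + Z, \tag{III.5}$$
where $\gamma := \sigma^{-1}$, $R \sim \rho$ and $Z \sim \mathcal N(0, I_K)$. While this change of variable does not change the $\chi^2$ divergence, we can now take the Taylor expansion of the probability density around $\gamma = 0$; abusing notation, we still denote by $f_{\theta,\rho}$ the density of $X$. That is,
$$f_{\theta,\rho}(x) = f_Z(x)\sum_{n=0}^{\infty}\alpha_n^{\theta,\rho}(x)\,\gamma^n, \tag{III.6}$$
where $f_Z$ is the probability density of $Z$ (since when $\gamma = 0$, $X = Z$) and
$$\alpha_n^{\theta,\rho}(x) := \frac{1}{n!}\,\frac{\partial^n}{\partial\gamma^n}\left.\frac{f_{\theta,\rho}(x)}{f_Z(x)}\right|_{\gamma=0}, \tag{III.7}$$
thus $\alpha_0^{\theta,\rho} \equiv 1$. We note that $f_{\theta,\rho}(x)$ is infinitely differentiable (indeed analytic) in $\gamma$ for all $x$, thus $\alpha_n^{\theta,\rho}$ is always well defined. We now use (III.6) to give an expression of the $\chi^2$ divergence in terms of the moment tensors.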
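In the scalar case $K = 1$ the expansion (III.6) can be made explicit. Writing $t = PR\theta$, the density ratio is $f(x)/f_Z(x) = \mathbb E_R[\exp(\gamma t x - \gamma^2 t^2/2)]$, and the generating function of the probabilists' Hermite polynomials, $\exp(sx - s^2/2) = \sum_n \mathrm{He}_n(x)\,s^n/n!$, shows that the coefficient of $\gamma^n$ is $\mathrm{He}_n(x)\,\mathbb E[t^n]/n!$, i.e. $\mathrm{He}_n(x)\,T_n(\theta,\rho)/n!$. This is our own derivation for illustration; the sketch below (helper names ours, with $t$ taking finitely many values) compares the truncated series with the exact ratio.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval


def ratio_exact(x, gamma, t, w):
    """f(x) / f_Z(x) for K = 1: E_R[exp(gamma t x - gamma^2 t^2 / 2)], t = P R theta w.p. w."""
    return float(np.sum(w * np.exp(gamma * t * x - 0.5 * gamma**2 * t**2)))


def ratio_series(x, gamma, t, w, order):
    """Truncated expansion sum_{n <= order} gamma^n He_n(x) T_n / n!, with T_n = E[t^n]."""
    total = 0.0
    for n in range(order + 1):
        T_n = np.sum(w * t**n)                      # n-th moment (scalar case)
        He_n = hermeval(x, [0.0] * n + [1.0])       # probabilists' Hermite polynomial He_n(x)
        total += gamma**n * He_n * T_n / factorial(n)
    return float(total)


t = np.array([1.0, -0.5, 2.0])   # values of P R theta (illustrative)
w = np.array([0.2, 0.5, 0.3])    # their probabilities under rho
x, gamma = 0.7, 0.1              # gamma = 1 / sigma, i.e. sigma = 10
for order in (1, 2, 4, 8):
    err = abs(ratio_series(x, gamma, t, w, order) - ratio_exact(x, gamma, t, w))
    print(order, err)            # the truncation error shrinks quickly as the order grows
```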
Lemma III.3.
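The $\chi^2$ divergence is expressed in terms of the moment tensors as:
$$\chi^2(f_{\varphi,\tau}\,\|\,f_{\theta,\rho}) = \mathbb E\Big[\big(\alpha_d^{\varphi,\tau}(Z) - \alpha_d^{\theta,\rho}(Z)\big)^2\Big]\,\sigma^{-2d} + O(\sigma^{-2d-1}), \tag{III.8}$$
$$\mathbb E\Big[\big(\alpha_n^{\varphi,\tau}(Z) - \alpha_n^{\theta,\rho}(Z)\big)^2\Big] = \frac{1}{n!}\,\big\|T_n(\varphi,\tau) - T_n(\theta,\rho)\big\|^2, \tag{III.9}$$
where $Z \sim \mathcal N(0, I_K)$ and $d$ is the smallest $n$ such that $T_n(\varphi,\tau) \ne T_n(\theta,\rho)$.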
Proof.
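This proof is presented in more detail in [13, Appendix B]. Equation (III.8) is obtained by Taylor expanding the $\chi^2$ divergence around $\sigma^{-1} = 0$, using (III.6) and the fact that $\alpha_n^{\varphi,\tau}(Z) = \alpha_n^{\theta,\rho}(Z)$ almost surely for all $n < d$, which follows from the definition of $d$ and equation (III.9). Now, to prove (III.9), it is enough to show that, for all signals and group distributions,
$$\mathbb E\big[\alpha_n^{\varphi,\tau}(Z)\,\alpha_n^{\theta,\rho}(Z)\big] = \frac{1}{n!}\big\langle T_n(\varphi,\tau), T_n(\theta,\rho)\big\rangle, \tag{III.10}$$
since (III.9) then follows by expanding the square.
Let $R$ and $R'$ be two independent random variables such that $R \sim \tau$ and $R' \sim \rho$. On one hand, since $\langle x^{\otimes n}, y^{\otimes n}\rangle = \langle x, y\rangle^n$, we have
$$\big\langle T_n(\varphi,\tau), T_n(\theta,\rho)\big\rangle = \mathbb E\big[\langle PR\varphi, PR'\theta\rangle^n\big]. \tag{III.11}$$
On the other hand, we can write $f_{\theta,\rho}$ explicitly by
$$f_{\theta,\rho}(x) = \mathbb E\big[f_Z(x - \gamma PR'\theta)\big] = f_Z(x)\,\mathbb E\Big[\exp\Big(\gamma\langle x, PR'\theta\rangle - \frac{\gamma^2}{2}\|PR'\theta\|^2\Big)\Big], \tag{III.12}$$
where $R' \sim \rho$. Using equation (III.7), differentiating $n$ times in $\gamma$ the factor coming from $(\varphi,\tau)$ and $n$ times in a second variable $\eta$ the factor coming from $(\theta,\rho)$, and using $\mathbb E\big[e^{\langle Z, u\rangle}\big] = e^{\|u\|^2/2}$ for $Z \sim \mathcal N(0, I_K)$ independent of $R$ and $R'$, we can write
$$\mathbb E\big[\alpha_n^{\varphi,\tau}(Z)\,\alpha_n^{\theta,\rho}(Z)\big] = \frac{1}{(n!)^2}\,\frac{\partial^{2n}}{\partial\gamma^n\,\partial\eta^n}\,\mathbb E\Big[e^{\gamma\eta\langle PR\varphi, PR'\theta\rangle}\Big]\Big|_{\gamma=\eta=0} = \frac{1}{n!}\,\mathbb E\big[\langle PR\varphi, PR'\theta\rangle^n\big],$$
where $R$ and $R'$ are defined as in (III.11), and (III.10) finally follows from equation (III.11). ∎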
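Lemma III.3 can be sanity-checked numerically in the scalar case $K = 1$, where $\chi^2(f_{\varphi,\tau}\,\|\,f_{\theta,\rho}) = \mathbb E\big[(r_{\varphi,\tau}(Z) - r_{\theta,\rho}(Z))^2 / r_{\theta,\rho}(Z)\big]$ with $r := f/f_Z$ and $Z \sim \mathcal N(0,1)$. The sketch below is our own construction: it takes two symmetric two-point laws for $t = PR\theta$, namely $t = \pm 1$ and $t = \pm 2$ with probability $1/2$ each, whose first moments agree and whose second moments differ, so $d = 2$. It evaluates the expectation with Gauss–Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`, weight $e^{-x^2/2}$) and compares the result with the leading term $\sigma^{-2d}\,(T_d(\varphi,\tau) - T_d(\theta,\rho))^2/d!$.

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial.hermite_e import hermegauss


def ratio(x, gamma, t, w):
    """r(x) = f(x) / f_Z(x) = E[exp(gamma t x - gamma^2 t^2 / 2)] for K = 1, vectorized over x."""
    return np.exp(gamma * np.outer(x, t) - 0.5 * gamma**2 * t**2) @ w


def chi2_quadrature(gamma, t_phi, w_phi, t_theta, w_theta, deg=80):
    """chi^2(f_phi || f_theta) = E[(r_phi(Z) - r_theta(Z))^2 / r_theta(Z)], Z ~ N(0, 1)."""
    nodes, weights = hermegauss(deg)            # Gauss quadrature for the weight exp(-x^2 / 2)
    r_phi = ratio(nodes, gamma, t_phi, w_phi)
    r_theta = ratio(nodes, gamma, t_theta, w_theta)
    return float(weights @ ((r_phi - r_theta) ** 2 / r_theta)) / sqrt(2 * pi)


def moment(t, w, n):
    """Scalar moment tensor T_n = E[t^n]."""
    return float(np.sum(w * t**n))


# Two symmetric laws for t = P R theta: equal first moments, different second moments, so d = 2.
t_theta, w_theta = np.array([1.0, -1.0]), np.array([0.5, 0.5])
t_phi, w_phi = np.array([2.0, -2.0]), np.array([0.5, 0.5])
d = 2


def leading_term(gamma):
    """gamma^(2d) * (T_d(phi) - T_d(theta))^2 / d!, with gamma = 1 / sigma."""
    diff = moment(t_phi, w_phi, d) - moment(t_theta, w_theta, d)
    return gamma ** (2 * d) * diff**2 / factorial(d)


for sigma in (5.0, 10.0, 20.0):
    gamma = 1 / sigma
    exact = chi2_quadrature(gamma, t_phi, w_phi, t_theta, w_theta)
    print(sigma, exact / leading_term(gamma))   # approaches 1 as sigma grows
```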
III-C Final details of the proof of Theorem II.2
By Theorem III.2, Lemma III.3, and equations (III.3) and (III.4), we obtain
(III.13) 
Equation (II.4) now follows from
and taking the supremum over $\varphi$ and $\tau$.
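To see why the ratio $N/\sigma^{2d}$ is the relevant quantity, one can combine (III.4) with the elementary inequality $1 + x \le e^x$ and the leading-order expansion of Lemma III.3, which, as we read it, gives $\chi^2(f_{\varphi,\tau}\,\|\,f_{\theta,\rho}) = \|T_d(\varphi,\tau) - T_d(\theta,\rho)\|^2/(d!\,\sigma^{2d}) + O(\sigma^{-2d-1})$. The following worked inequality is our own restatement, not a quotation of (II.4):

$$\chi^2\big(f^N_{\varphi,\tau}\,\|\,f^N_{\theta,\rho}\big) = \big(1 + \chi^2(f_{\varphi,\tau}\,\|\,f_{\theta,\rho})\big)^N - 1 \le \exp\!\Big(\frac{N}{\sigma^{2d}}\Big(\frac{\|T_d(\varphi,\tau) - T_d(\theta,\rho)\|^2}{d!} + O(\sigma^{-1})\Big)\Big) - 1.$$

For a fixed $(\varphi,\tau)$ with $\varphi$ outside the orbit of $\theta$, the right-hand side stays bounded whenever $N/\sigma^{2d}$ does, so a bound of Chapman–Robbins type, with this quantity in its denominator, keeps the MSE away from zero in that regime.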
Finally, we prove that the MLE is consistent, i.e. that it converges in probability to the true signal (up to a group action), when $N/\sigma^{2d} \to \infty$. Let
(III.14) 
The MLE is given by
Fix and , and for ease of notation let . We can write
We have
where $D(f\,\|\,g)$ denotes the KL divergence, defined for two probability densities $f$ and $g$ as
$$D(f\,\|\,g) := \mathbb E\left[\log\frac{f(Y)}{g(Y)}\right],$$
where $Y \sim f$.
Using (III.6), with , we have as , which implies by [20, Section F, Theorem 9] that
and
thus by the law of large numbers, since
diverges,As , the maximum of tends to in probability, and is achieved when , thus the MLE must converge in probability to for some .
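The law-of-large-numbers step can be illustrated on Example 1. The sketch below is our own construction and only compares the log-likelihood at two candidate signals instead of maximizing over all of them. It draws $N$ samples from $\theta = (0, 1, 3)$ with $\sigma = 1.5$ and computes the average log-likelihood ratio against the swapped signal $\varphi = (0, 3, 1)$, which estimates $-D(f_{\theta,\rho}\,\|\,f_{\varphi,\rho})$ and should therefore be negative, and against a shifted copy of $\theta$, for which it vanishes because the likelihood is constant on orbits. The helper name `log_lik` is hypothetical.

```python
import numpy as np


def log_lik(Y, x, sigma):
    """log f_{x,rho}(y_j) for each row y_j of Y, up to an additive constant.

    Example 1 channel: rho uniform on the 3 cyclic shifts, P keeps the first two entries.
    """
    means = np.stack([np.roll(x, k)[:2] for k in range(3)])             # P R^k x, shape (3, 2)
    a = -((Y[:, None, :] - means[None]) ** 2).sum(axis=2) / (2 * sigma**2) - np.log(3)
    m = a.max(axis=1, keepdims=True)                                     # stable log-sum-exp
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()


rng = np.random.default_rng(1)
theta = np.array([0.0, 1.0, 3.0])
phi = np.array([0.0, 3.0, 1.0])
sigma, N = 1.5, 200_000

shifts = rng.integers(3, size=N)
clean = np.stack([np.roll(theta, k)[:2] for k in range(3)])[shifts]
Y = clean + sigma * rng.standard_normal((N, 2))

ll_theta = log_lik(Y, theta, sigma)
print(np.mean(log_lik(Y, phi, sigma) - ll_theta))                       # negative, about -KL
print(np.max(np.abs(log_lik(Y, np.roll(theta, 1), sigma) - ll_theta)))  # ~0: same orbit
```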
Acknowledgments
EA was partly supported by the NSF CAREER Award CCF–1552131, ARO grant W911NF–16–1–0051 and NSF Center for the Science of Information CCF–0939370. JP and AS were partially supported by Award Number R01GM090200 from the NIGMS, the Simons Foundation Investigator Award and Simons Collaborations on Algorithms and Geometry, the Moore Foundation Data-Driven Discovery Investigator Award, and AFOSR FA9550-17-1-0291.
We would like to thank Afonso Bandeira, Tamir Bendory, Joseph Kileel, William Leeb and Nir Sharon for many insightful discussions.
References
 [1] W. Park, C. R. Midgett, D. R. Madden, and G. S. Chirikjian, “A stochastic kinematic model of class averaging in single-particle electron microscopy,” The International Journal of Robotics Research, vol. 30, no. 6, pp. 730–754, 2011.
 [2] W. Park and G. S. Chirikjian, “An assembly automation approach to alignment of noncircular projections in electron microscopy,” IEEE Transactions on Automation Science and Engineering, vol. 11, no. 3, pp. 668–679, 2014.
 [3] S. H. Scheres, M. Valle, R. Nuñez, C. O. Sorzano, R. Marabini, G. T. Herman, and J.-M. Carazo, “Maximum-likelihood multi-reference refinement for electron microscopy images,” Journal of Molecular Biology, vol. 348, no. 1, pp. 139–149, 2005.
 [4] J. P. Zwart, R. van der Heiden, S. Gelsema, and F. Groen, “Fast translation invariant classification of HRR range profiles in a zero phase representation,” IEE Proceedings - Radar, Sonar and Navigation, vol. 150, no. 6, pp. 411–418, 2003.

 [5] R. Gil-Pita, M. Rosa-Zurera, P. Jarabo-Amores, and F. López-Ferreras, “Using multilayer perceptrons to align high range resolution radar signals,” in International Conference on Artificial Neural Networks, pp. 911–916, Springer, 2005.
 [6] D. M. Rosen, L. Carlone, A. S. Bandeira, and J. J. Leonard, “A certifiably correct algorithm for synchronization over the special Euclidean group,” arXiv:1611.00128, 2016.
 [7] I. L. Dryden and K. V. Mardia, Statistical shape analysis, vol. 4. J. Wiley Chichester, 1998.
 [8] H. Foroosh, J. B. Zerubia, and M. Berthod, “Extension of phase correlation to subpixel registration,” IEEE transactions on image processing, vol. 11, no. 3, pp. 188–200, 2002.

 [9] D. Robinson, S. Farsiu, and P. Milanfar, “Optimal registration of aliased images using variable projection with applications to super-resolution,” The Computer Journal, vol. 52, no. 1, pp. 31–42, 2009.
 [10] A. Bandeira, P. Rigollet, and J. Weed, “Optimal rates of estimation for multireference alignment,” arXiv preprint 1702.08546, 2017.
 [11] A. Perry, J. Weed, A. Bandeira, P. Rigollet, and A. Singer, “The sample complexity of multireference alignment,” arXiv preprint 1707.00943, 2017.
 [12] T. Bendory, N. Boumal, C. Ma, Z. Zhao, and A. Singer, “Bispectrum inversion with application to multireference alignment,” IEEE Transactions on Signal Processing. To appear.
 [13] E. Abbe, T. Bendory, W. Leeb, J. M. Pereira, N. Sharon, and A. Singer, “Multireference Alignment is Easier with an Aperiodic Translation Distribution,” arXiv:1710.02793, 2017.
 [14] A. Bartesaghi, A. Merk, S. Banerjee, D. Matthies, X. Wu, J. L. Milne, and S. Subramaniam, “2.2 Å resolution cryo-EM structure of β-galactosidase in complex with a cell-permeant inhibitor,” Science, vol. 348, no. 6239, pp. 1147–1151, 2015.
 [15] D. Sirohi, Z. Chen, L. Sun, T. Klose, T. C. Pierson, M. G. Rossmann, and R. J. Kuhn, “The 3.8 Å resolution cryo-EM structure of Zika virus,” Science, vol. 352, no. 6284, pp. 467–470, 2016.
 [16] A. S. Bandeira, B. Blum-Smith, A. Perry, J. Weed, and A. S. Wein, “Estimation under group actions: recovering orbits from invariants,” arXiv:1712.10163, 2017.
 [17] H. Cramér, Mathematical Methods of Statistics (PMS-9), vol. 9. Princeton University Press, 2016.
 [18] E. Abbe, J. M. Pereira, and A. Singer, “Sample complexity of the Boolean multireference alignment problem,” in 2017 IEEE International Symposium on Information Theory (ISIT), pp. 1316–1320, June 2017.
 [19] D. G. Chapman and H. Robbins, “Minimum variance estimation without regularity assumptions,” Ann. Math. Statist., vol. 22, pp. 581–586, Dec. 1951.
 [20] I. Sason and S. Verdú, “f-divergence inequalities,” IEEE Transactions on Information Theory, vol. 62, no. 11, pp. 5973–6006, 2016.