1 Introduction
Singleparticle cryoelectron microscopy (cryoEM) is an imaging method for determining the highresolution 3D structure of biological macromolecules without crystallization [24, 33]. The reconstruction process in cryoEM determines the 3D structure of a molecule from its noisy 2D tomographic projection images. By virtue of the experimental setup, each projection image is taken at an unknown viewing direction and has a very high level of noise, due to the small electron dose one can apply to the specimen before inflicting severe radiation damage, e.g., [11, 23, 39]. The computational pipeline that leads from the raw data, given many large unsegmented micrographs of projections, to the 3D model consists of the following stages. The first step is particle picking, in which 2D projection images are selected from micrographs. The selected particle images typically undergo 2D classification to assess data quality and further improve particle picking. At this point, the 3D reconstruction process begins, where often it is divided into two substeps of lowresolution modeling and 3D refinement. In this paper, we focus on the mathematical aspects of the former, namely the modeling part. In particular, we suggest using the method of moments (MoM) for ab initio modeling. We illustrate this workflow with an overview given in Figure 1.
The last step in the reconstruction, also known as the refinement step, aims to improve the resolution as much as possible. This refinement process is typically a variant of the expectationmaximization (EM) algorithm which seeks the maximum likelihood estimator (MLE) via an efficient implementation,
e.g., [50]. As such, 3D refinement requires an initial structure that is close to the correct target structure [26, 49]. Serving this purpose, an ab initio model is the result of a reconstruction process which depends solely on the data at hand with no a priori assumptions about the 3D structure of the molecule [47]. We remark that the two primary challenges for cryoEM reconstruction are the high level of noise and the unknown viewing directions. Mathematically, without the presence of noise, the unknown viewing directions could be recovered using common lines [58, 59]. Then, the 3D structure follows, for example, by tomographic inversion, see, e.g., [2]. Reliable detection of common lines is limited however to high SNR. As a result, the application of common lines based approaches is often limited to 2D class averages rather than the original raw images [53]. Other alternatives such as frequency marching [7] and optimization using stochastic gradient have been suggested [46]. As optimization processes are designed to minimize highly nonconvex cost functions, methods like SGD are not guaranteed to succeed. In addition, as in the case of EM, it is not a priori clear how many images are required.Approximately forty years ago, Zvi Kam proposed a method for 3D ab initio reconstruction based on computing the mean and covariance of the 2D noisy images [31]. In order to uniquely determine the volume, the third moment (triple correlation) is also used besides the mean and covariance. In this approach, known as Kam’s method, the 3D volume is reconstructed without estimating the viewing directions. In this sense, Kam’s method is strikingly different from common lines based approaches and maximum likelihood and other optimization methods that rely on orientation estimation for each image. Crucially, Kam’s method is effective at arbitrary levels of noise, given sufficiently many picked particles for accurate estimation of the moment statistics. Additionally, Kam’s method does not require any starting model, and it requires only one pass through the data to compute moments (contrary to other approaches needing access to the measurements multiple times). Despite the aforementioned advantages, Kam’s method relies on the restrictive assumption that the viewing directions for the images are distributed uniformly over the sphere. This hypothesis, alongside other technical issues, has so far prevented a direct application of Kam’s method to experimental cryoEM data, for which viewing angles are typically nonuniform [4, 25, 42, 56]. This situation motivates us to explore generalizations of Kam’s method better suited to cryoEM data.^{1}^{1}1We remark that Kam’s method, assuming uniform rotations, is of significant current interest in Xray free electron laser (XFEL) single molecule imaging, where the assumption of uniformity more closely matches experimental reality [20, 43, 62].
In this paper, we generalize Kam’s theory to the case of nonuniform distribution of viewing directions. We regard Kam’s original approach with uniform distribution of viewing angles as a degenerate instance of MoM. In our formulation, we estimate both the 3D structure and the unknown distribution of viewing angles jointly from the first two moments of the Fourier transformed images. More precisely, for
images , the first and second empirical moments of the Fourier transformed images, given in polar coordinates, , are(1) 
which are 2D and 4D tensors, respectively. Our basic rationale for trying to obtain the volume from the first two moments is as follows. Supposing the distribution of rotations of the
image plane to be uniform, then in the limit the first moment is radially symmetric, that is, it is only a function of but is independent of . Therefore,may be regarded as a 1D vector. Similarly, the second moment is a 3D tensor (rather than 4D) since it will only depend on
and through as . Also is linearly related the molecule’s volume via a tomographic projection. Thus, for images of size pixels, the first and second moments should give rise to polynomial equations for the unknown volume and distribution. Assuming the volume is of size (and the distribution is of lower dimensionality), then the first and second moments have just the right number of equations (in terms of leading order) to determine the unknowns. Unfortunately, when the distribution of viewing directions is uniform, as noted by Kam [31], the information encoded in the second moment is algebraically redundant; essentially it is the autocorrelation function (or equivalently, the power spectrum), and this information is insufficient for determining the structure of the molecule. As we will see, a nonuniform distribution of viewing directions introduces additional terms in both the first and second moments, and extends the number of independent equations beyond the autocorrelation case. In particular, we will show that nonuniformity guarantees uniqueness from the analytical counterparts of and in cases of a known distribution, and it guarantees finitely many solutions in other, more realistic, cases of an unknown distribution.Our work is inspired by several earlier studies on simplified models in a setting called MultiReference Alignment (MRA). In MRA, a given group of transformations acts on a vector space of signals [5]. For example, the group acts on the space of bandlimited signals over the unit circle by rotating them counterclockwise (as a 1D analog of cryoEM). The task then is to estimate a ground truth signal from multiple noisy samples, corresponding to unknown group elements of a finite cyclic subgroup of acting on the signal. The papers [6, 9]
show that for a uniform distribution over the group, the signal can be estimated from the third moment, and the number of samples required scales like the third power of the noise variance. On the other hand, for a nonuniform and also aperiodic distributions over the group, the signal can be estimated from the second moment, and the required number of samples scales quadratically with the noise variance
[1].Despite the success of signal recovery in MRA from the first two moments under the action of the cyclic group, it is not apparent that such a strategy is still applicable in the case of cryoEM. First, in cryoEM, each image is obtained from the ground truth volume not just by applying a rotation in , but also a tomographic projection. Moreover, the studies mentioned above (of MRA) consider finite abelian groups, whereas, in the case of cryoEM, the group under consideration is the continuous noncommutative group . The goal of this paper is then to investigate whether the first and second moment of the images is also sufficient for solving the inverse problem of structure determination in the cryoEM setting.
1.1 Our contribution
We formulate the reconstruction problem in cryoEM as an inverse problem of determining the volume and the distribution of viewing directions from the first two moments of the images. Assuming the volume and distribution are bandlimited functions, they are discretized by Prolate Spheroidal Wave Functions (PSWFs) and Wigner matrices, respectively. The moments give rise to a polynomial system in which the unknowns are the coefficients of the volume and the distribution. Using computational algebraic geometry techniques [19, 22, 55], we exhibit a range of band limits for the volume and the distribution such that the polynomial system has only finitely many solutions, pointing to the possibility of exact recovery in these regimes. Additionally, we comment on numerical stability issues, by providing condition number formulas for moment inversion. In the setting where the rotational distribution is known, we prove that the number of solutions is generically 1 and present an efficient algorithm for recovering the volume using ideas from tensor decomposition [29]. For the practical case of an unknown distribution, we rely on methods from nonconvex optimization and demonstrate, with synthetic data, successful ab initio model recovery of a molecule from the first two moments.
1.2 Organization
The paper is organized as the following. In Section 2, we present discretizations for the volume and distribution and derive the polynomial system obtained from the first two moments. In Section 3, we demonstrate that there exists a range of band limits where the polynomial system for the unknown molecule and distribution has only finitely many solutions. In Section 4, we discuss some implementation details on how the system is solved and present numerical and visual results. Proofs and background material are provided in appendices. For research reproducibility, MATLAB code is publicly available at GitHub.com.^{2}^{2}2The full address of the GitHub repository is https://github.com/nirsharon/nonuniformMoM.
2 Method of moments
We begin by introducing the image formation model. Then, convenient basis for discretizing various continuous objects, namely the images and the volume (in the Fourier domain) as well as the distribution for orientations, are introduced. From these, relationships between the moments of the 2D images and the 3D molecular volume can be derived, enabling us to fit the molecular structure to the empirical moments of the images.
2.1 Image formation model and the 3D reconstruction problem
In cryoEM, data is acquired by projecting particles embedded in ice along the direction of the beaming electrons, resulting in tomographic images of the particles. The particles orient themselves randomly with respect to the projection direction. More formally, let be the Coulomb potential of the 3D volume, and the projection operator be denoted by , where
(2) 
Assuming the th particle comes from the same volume but rotated by , the image formation model is [10, 24]
(3) 
where is a random field modeling the noise term and is a point spread function, whose Fourier transform is known as the contrast transfer function (CTF) [40, 48, 57]. Each image is assumed to lie within the box . For size discretized images, we assume the random field . Here denotes an element in the group of rotations , and we define the group action by^{3}^{3}3Here we prefer to write the action of and correspondingly later we use Wigner matrices, instead of and Wigner matrices. While simply notational, these conventions allow us to cite identities from [18] verbatim, which are in terms of Wigner matrices and not Wigner matrices.
(4) 
The rotations ’s are not known since the molecules can take any orientation with respect to projection direction. For the purpose of simplifying the exposition, we shall henceforth disregard the CTF, by assuming
(5) 
The presence of CTF is not expected to have a major impact on our main results, and we will incorporate the CTF in a future work. Typically, it is convenient to consider the Fourier transform of the images, since by projection slice theorem, the Fourier transform of gives a slice of the Fourier coefficients of the volume :
(6) 
The goal of cryoEM is to recover from the Fourier coefficients of the projections . While reconstructing given estimated ’s amounts to solving a standard computed tomography problem, we wish to reconstruct directly from the noisy images without estimating the rotations, for reasons detailed above. To this end, we assume the rotations are sampled from a distribution on , where is a smooth bandlimited function. Then from the empirical moments of the images , we jointly estimate the volume and the distribution .
2.2 Representation of the volume, the distribution of rotations and the images
As mentioned previously, the proposed method of moments consists of fitting the analytical moments
(7) 
to their empirical counterparts and as appears in (1) after debiasing.^{4}^{4}4
By the law of large numbers,
and almost surely as , so is fitted to and to . For notational convenience, we drop in what follows, either assuming has been appropriately debiased already or . Through fitting to the empirical moments, we seek to determine the Fourier volume and also the distribution . In this section, we present discretizations of and by expanding them using convenient bases.2.2.1 Basis for the Fourier volume
Since the image formation model involves rotations of the Fourier volume , it is convenient to represent as an element of a function space closed under rotations; in fact, this is the same as representing using spherical harmonics (see the PeterWeyl theorem [18]):
(8) 
Here are the (complex) spherical harmonics:
(9) 
with associated Legendre polynomials defined by:
(10) 
In Cartesian coordinates, spherical harmonics are polynomials of degree . Continuing, in equation (8), without loss of generality the radial frequency functions should form an orthonormal family (for each fixed ) with respect to , where is referred to as the radial index. Choices of radial functions suitable for molecular densities include spherical Bessel functions [3]
, which are eigenfunctions of the Laplacian on a closed ball with Dirichlet boundary condition, as well as the radial components of 3D prolate spheroidal wave functions
[54], to name a few.We assume the volume is bandlimited with Fourier coefficients supported within a radius of size , i.e., the Nyquist cutoff frequency for the images ’s discretized on a grid of size (over the square . Under this assumption, the maximum degree and radial indices and in (8) are essentially finite. Further details on the particular basis functions and cutoffs and that we choose to use are deferred to Section A in the appendix. The coefficients furnish our representation of using spherical harmonics. Note that since is real valued, its Fourier transform is conjugatesymmetric, which imposes restrictions on the coefficients . The specific constraints are presented in Section 4.1.
The advantage of expanding in terms of spherical harmonics is that the space of degree spherical harmonics is closed under rotation; in grouptheoretic language, this space forms a linear representation of .^{5}^{5}5In fact, this is an irreducible representation of and varying these give all irreps. Thus the action of a rotation on
amounts to a linear transformation on the expansion coefficients
(with a block structure according to and ). More precisely, fixing the vector space spanned by for a specific , the action of a rotation on this vector space is represented by the Wigner matrix (see [18, p. 343]) so that:(11) 
In particular, the matrix is unitary, with entries degree polynomials in the entries of [18]. For all and , the group homomorphism property reads . In light of (11), 3D bases of the form have been called steerable bases.
2.2.2 Basis for the probability distribution of rotations
As we shall see, when expanding the volume in terms of spherical harmonics, the analytical moments (7) involve integrating different monomials of with respect to the measure
. To this end, we assume the probability density
over is a smooth bandlimited function (and in a function space closed under rotation) by expanding(12) 
Here for are the matrix entries of the Wigner matrices above. By PeterWeyl, these form an orthonormal basis for , and for higher they are increasingly oscillatory functions on . Thus, expansion (12) is analogous to using spherical harmonics to expand a smooth function on the sphere, or using Fourier modes for a function on the circle. The cutoff is the band limit of the distribution ; we shall see in the next section that since we use only first and second moments it makes sense to assume . Note that in the special case of a uniform distribution, the only nonzero coefficient is . Also, denotes the Haar measure, which is the unique volume form on the group of total mass one that is invariant under left action. Using the Euler angles parameterization of , the Haar measure is of the form
(13) 
where the normalizing constant ensures .
2.2.3 Basis for the 2D images
At this point, we discuss convenient representations for the images after Fourier transform, . Similarly to volumes, it is desirable to represent images using a function space closed under inplane rotations, i.e., . By the PeterWeyl theorem, this is the same as expanding using Fourier modes, in a 2D steerable basis:
(14) 
Here the radial frequency functions (for fixed ) are taken to be an orthonormal basis with respect to , with referred to as the radial frequency. Comparing to expansion (8) (see Section 2.2), it makes most sense to set . Again, owing to the Nyquist frequency for the discretized images , we may bound the cutoffs . Typical choices for for representing tomographic images include FourierBessel functions [63] and the radial components of 2D prolate spheroidal wave functions [54]. Details on our specific choices are given in Section A.2 in the appendix.
2.2.4 Choice of radial functions
For the finite expansions in (8) and (14) to accurately represent the Fourier transforms of the electric potential and its slices, one should carefully choose the radial functions and , together with the truncationrelated quantities , , , and . In this work, we consider and to be the radial parts of the threedimensional and twodimensional PSWFs [54], respectively. In Appendix A, we describe some of the key properties of the PSWFs, and propose upper bounds for setting , , , and . In practice, band limits would be selected by balancing these expressivity considerations together with the wellposedness and conditioning considerations of Section 3.
2.3 Loworder moments
In this section, we derive the analytical relationship between the first two moments for the observed images , and the coefficients and of the volume and distribution of rotations. These relationships will be used to determine and via solving a nonlinear leastsquares problem.
To this end, we first register a crucial relationship between the coefficients of the 2D images and the 3D volume. By indexing the images in terms of (instead of in (14)), we have:
(15) 
On the other hand, using the Fourier slice theorem and (11):
(16)  
(17)  
(18) 
Multiplying (15) and (16) by and integrating against , then combining the orthogonality relation
with , tells us
(19) 
where are constants depending on the radial functions:
(20)  
(21) 
From the term , we see if (mod ) (and if then ). Also one may check . Equation (19) connects 2D image coefficients with 3D volume coefficients. We note we may as well choose in (15), since if then .
2.3.1 The first moment
In this section, from (19) the relationship between the first moment of the images and the volume is derived. Taking the expectation over , since the expectation of the noise term vanishes, we get
(22)  
(23)  
(24) 
The last equation follows from the orthogonality of the Wigner matrix entries
(25) 
and
(26) 
The first moment gives a set of bilinear forms in unknowns , namely, the expression in (24) for each with and .
It is convenient to provide compact notation for the first moment formula. To this end, we introduce:

, a matrix of size given by

, a vector of size given by

, a matrix of size given by .
Item 2 is zero if and item 3 is zero if either or . In this notation, the first moment formula (24) (with fixed and varying ) reads:
(27) 
Here is nonzero only if .
2.3.2 The second moment
Higher moments require higher powers of the variable, and so in the case of the second moment and for , we have
(28)  
(29) 
where
(30) 
The product of two Wigner matrix entries is expressed as a linear combination of Wigner matrix entries [18, p. 351],
(31) 
where
(32) 
is the product of two ClebschGordan coefficients. This product is nonzero only if satisfy the triangle inequalities. Substituting (31) into (30), and invoking (25) and (26), we obtain:
(33) 
where the sum is over satisfying . Now substituting into (28) gives:
(34) 
where the first sum has the range of (28) and the second sum has range of (33). The second moment thus gives a set of polynomials in unknowns and , quadratic in the volume coefficients and linear in the distribution coefficients, namely, the expression in (34) for each with , , , and . Also, we see one may as well assume , since does not contribute in either (34) or (24).
As for the first moment, it will be convenient to rewrite the second moment in compact notation. Let us further introduce:

, a matrix of size given by
where the sum is over and denotes the product ClebschGordan coefficients in (32).
Item 4 is zero if either or or . Now for fixed and varying , the second moment (34) neatly reads:
(35) 
Here is nonzero only if and .
2.3.3 Remark: uniform rotations
Empirically in [31] and theoretically in [5], it was shown that a similar set of volume coefficients may be reconstructed from the first three moments of ’s, under the assumption that the rotation matrices in (5) are uniformly distributed over , i.e., . In this case, the first two moments of fail to determine . We distinguish our work from this prior art by removing the assumption that the rotations in (5) are uniformly distributed over . First of all, the assumption that the particles take on uniformly distributed orientations is often impractical, limiting the applicability of Kam’s method to cryoEM. Secondly, dealing explicitly with nonuniform distribution may allow us to work with just the first and second order moment (similar to [1]), thus bypassing the need of working with the third moment, which is more difficult to estimate statistically.
3 Uniqueness Guarantees and Conditioning
Here, we derive uniqueness guarantees and comment on intrinsic conditioning for the polynomial system defined by the first and second moments, (27) and (35).
Analysis comes in four cases, according to assumptions on the distribution : whether is known or unknown; and if is invariant to inplane rotations (i.e., only the viewing direction is nonuniform, see subsection 4.2) or is totally nonuniform (as a distribution on ). Throughout, our general finding is wellposedness, i.e., the molecule is uniquely determined by first and second moments up to finitely many solutions, under genericity assumptions, for a range of band limits and . In the case of a known totally nonuniform distribution, we prove the number of solutions is , and give an efficient closedform algorithm to solve for . For all cases, sensitivity of the solution to errors in the moments is quantified by condition number formulas.
3.1 Known, totally nonuniform
For this case, we have a “closedform” provable algorithm that recovers from (27) and (35) (up to the satisfaction of technical genericity and band limit conditions). Remarkably, while the polynomial system is nonlinear (consisting of both quadratic and linear equations), our method is based only on linear algebra. The main technical idea is simultaneous diagonalization borrowed from Jennrich’s wellknown algorithm for thirdorder tensor decomposition [29], that was also used recently for signal recovery in MRA [44].
Theorem 1.
The molecule is uniquely determined by the analytical first and second moments, (27) and (35), in the case the distribution is totally nonuniform, known and , provided it also holds:

The matrices and of size both have full rank, and
has distinct eigenvalues. Likewise
and of size both have full rank, and has distinct eigenvalues. 
Writing and for eigendecompositions, the vectors of size and of size both have no zero entries.

For , the matrix of size has full row rank.

For all , the matrix of size has full column rank.

For with , the matrix of size has full column rank.
Moreover in this case, there is a provable algorithm inverting (27) and (35) to get in time , where .
Proofs for this subsection are in Appendix B. We remark that condition (iv), which just involves the choice of radial bases, appears to always hold for PSWFs using the cutoffs proposed in Appendix A. Conditions (i), () and (iii) just involve the distribution, and are fullrank, spectral and nonvanishing hypotheses. Condition (v) just involves the molecule and in particular requires , which limits to be less than the Nyquist frequency where .
Our algorithm goes by reverse^{6}^{6}6Reverse frequency marching is natural given the sparsity structure of (35): only and with , and , appear in the moments . frequency marching, as we solve for topfrequency coefficients from the second moment (35) where
via eigenvectors (similar to simultaneous diagonalization in Jennrich’s algorithm), and then solving for lowerfrequency coefficients via linear systems. While our conditions in Theorem
1 are certainly not necessary, fortunately for generic^{7}^{7}7This means generic with respect to the Zariski topology [28]. Equivalently, there is a nonzero polynomial in such that implies the conditions in Theorem 1 are met. , those conditions are satisfied, so that the method applies:Lemma 2.
Condition (ii) in Theorem 1 holds for Zariskigeneric . If , then condition (iii) holds for Zariskigeneric . At least for , conditions (i) and () hold for Zariskigeneric .
By uniqueness, is a welldefined function of the first and second moments and almost everywhere. It is useful to quantify the sensitivity of to errors in , as, e.g., in practice one can access only empirical estimates and . An a posteriori (absolute) condition number for
is given by the reciprocal of the least singular value of the Jacobian matrix of the algebraic map:
(36) 
Throughout this section, all condition formulas are in the sense of [16, Section 14.3], for which the domain and image of our moment maps are viewed as Riemannian manifolds. To this end, when is unknown, dense open subsets of the orbit spaces , , naturally identify as Riemmannian manifolds (for the construction, see [15]).
3.2 Known, inplane uniform
For this case, given a particular image size (and other image parameters), together with band limits and , we have code which decides if, for generic and , the molecule is determined by (27) and (35), up to finitely many solutions. The basis for this code is the socalled Jacobian test for algebraic maps, see Appendix C. Below is an illustrative computation.