Single-particle cryo-electron microscopy (cryo-EM) is an imaging method for determining the high-resolution 3-D structure of biological macromolecules without crystallization [24, 33]. The reconstruction process in cryo-EM determines the 3-D structure of a molecule from its noisy 2-D tomographic projection images. By virtue of the experimental setup, each projection image is taken at an unknown viewing direction and has a very high level of noise, due to the small electron dose one can apply to the specimen before inflicting severe radiation damage, e.g., [11, 23, 39]. The computational pipeline that leads from the raw data, given many large unsegmented micrographs of projections, to the 3-D model consists of the following stages. The first step is particle picking, in which 2-D projection images are selected from micrographs. The selected particle images typically undergo 2-D classification to assess data quality and further improve particle picking. At this point, the 3-D reconstruction process begins, where often it is divided into two substeps of low-resolution modeling and 3-D refinement. In this paper, we focus on the mathematical aspects of the former, namely the modeling part. In particular, we suggest using the method of moments (MoM) for ab initio modeling. We illustrate this workflow with an overview given in Figure 1.
The last step in the reconstruction, also known as the refinement step, aims to improve the resolution as much as possible. This refinement process is typically a variant of the expectation-maximization (EM) algorithm which seeks the maximum likelihood estimator (MLE) via an efficient implementation,e.g., . As such, 3-D refinement requires an initial structure that is close to the correct target structure [26, 49]. Serving this purpose, an ab initio model is the result of a reconstruction process which depends solely on the data at hand with no a priori assumptions about the 3-D structure of the molecule . We remark that the two primary challenges for cryo-EM reconstruction are the high level of noise and the unknown viewing directions. Mathematically, without the presence of noise, the unknown viewing directions could be recovered using common lines [58, 59]. Then, the 3-D structure follows, for example, by tomographic inversion, see, e.g., . Reliable detection of common lines is limited however to high SNR. As a result, the application of common lines based approaches is often limited to 2-D class averages rather than the original raw images . Other alternatives such as frequency marching  and optimization using stochastic gradient have been suggested . As optimization processes are designed to minimize highly non-convex cost functions, methods like SGD are not guaranteed to succeed. In addition, as in the case of EM, it is not a priori clear how many images are required.
Approximately forty years ago, Zvi Kam proposed a method for 3-D ab initio reconstruction based on computing the mean and covariance of the 2-D noisy images . In order to uniquely determine the volume, the third moment (triple correlation) is also used besides the mean and covariance. In this approach, known as Kam’s method, the 3-D volume is reconstructed without estimating the viewing directions. In this sense, Kam’s method is strikingly different from common lines based approaches and maximum likelihood and other optimization methods that rely on orientation estimation for each image. Crucially, Kam’s method is effective at arbitrary levels of noise, given sufficiently many picked particles for accurate estimation of the moment statistics. Additionally, Kam’s method does not require any starting model, and it requires only one pass through the data to compute moments (contrary to other approaches needing access to the measurements multiple times). Despite the aforementioned advantages, Kam’s method relies on the restrictive assumption that the viewing directions for the images are distributed uniformly over the sphere. This hypothesis, alongside other technical issues, has so far prevented a direct application of Kam’s method to experimental cryo-EM data, for which viewing angles are typically non-uniform [4, 25, 42, 56]. This situation motivates us to explore generalizations of Kam’s method better suited to cryo-EM data.111We remark that Kam’s method, assuming uniform rotations, is of significant current interest in X-ray free electron laser (XFEL) single molecule imaging, where the assumption of uniformity more closely matches experimental reality [20, 43, 62].
In this paper, we generalize Kam’s theory to the case of non-uniform distribution of viewing directions. We regard Kam’s original approach with uniform distribution of viewing angles as a degenerate instance of MoM. In our formulation, we estimate both the 3-D structure and the unknown distribution of viewing angles jointly from the first two moments of the Fourier transformed images. More precisely, forimages , the first and second empirical moments of the Fourier transformed images, given in polar coordinates, , are
which are 2-D and 4-D tensors, respectively. Our basic rationale for trying to obtain the volume from the first two moments is as follows. Supposing the distribution of rotations of theimage plane to be uniform, then in the limit the first moment is radially symmetric, that is, it is only a function of but is independent of . Therefore,
may be regarded as a 1-D vector. Similarly, the second moment is a 3-D tensor (rather than 4-D) since it will only depend onand through as . Also is linearly related the molecule’s volume via a tomographic projection. Thus, for images of size pixels, the first and second moments should give rise to polynomial equations for the unknown volume and distribution. Assuming the volume is of size (and the distribution is of lower dimensionality), then the first and second moments have just the right number of equations (in terms of leading order) to determine the unknowns. Unfortunately, when the distribution of viewing directions is uniform, as noted by Kam , the information encoded in the second moment is algebraically redundant; essentially it is the autocorrelation function (or equivalently, the power spectrum), and this information is insufficient for determining the structure of the molecule. As we will see, a non-uniform distribution of viewing directions introduces additional terms in both the first and second moments, and extends the number of independent equations beyond the autocorrelation case. In particular, we will show that non-uniformity guarantees uniqueness from the analytical counterparts of and in cases of a known distribution, and it guarantees finitely many solutions in other, more realistic, cases of an unknown distribution.
Our work is inspired by several earlier studies on simplified models in a setting called Multi-Reference Alignment (MRA). In MRA, a given group of transformations acts on a vector space of signals . For example, the group acts on the space of band-limited signals over the unit circle by rotating them counterclockwise (as a 1-D analog of cryo-EM). The task then is to estimate a ground truth signal from multiple noisy samples, corresponding to unknown group elements of a finite cyclic subgroup of acting on the signal. The papers [6, 9]
show that for a uniform distribution over the group, the signal can be estimated from the third moment, and the number of samples required scales like the third power of the noise variance. On the other hand, for a non-uniform and also aperiodic distributions over the group, the signal can be estimated from the second moment, and the required number of samples scales quadratically with the noise variance.
Despite the success of signal recovery in MRA from the first two moments under the action of the cyclic group, it is not apparent that such a strategy is still applicable in the case of cryo-EM. First, in cryo-EM, each image is obtained from the ground truth volume not just by applying a rotation in , but also a tomographic projection. Moreover, the studies mentioned above (of MRA) consider finite abelian groups, whereas, in the case of cryo-EM, the group under consideration is the continuous non-commutative group . The goal of this paper is then to investigate whether the first and second moment of the images is also sufficient for solving the inverse problem of structure determination in the cryo-EM setting.
1.1 Our contribution
We formulate the reconstruction problem in cryo-EM as an inverse problem of determining the volume and the distribution of viewing directions from the first two moments of the images. Assuming the volume and distribution are band-limited functions, they are discretized by Prolate Spheroidal Wave Functions (PSWFs) and Wigner matrices, respectively. The moments give rise to a polynomial system in which the unknowns are the coefficients of the volume and the distribution. Using computational algebraic geometry techniques [19, 22, 55], we exhibit a range of band limits for the volume and the distribution such that the polynomial system has only finitely many solutions, pointing to the possibility of exact recovery in these regimes. Additionally, we comment on numerical stability issues, by providing condition number formulas for moment inversion. In the setting where the rotational distribution is known, we prove that the number of solutions is generically 1 and present an efficient algorithm for recovering the volume using ideas from tensor decomposition . For the practical case of an unknown distribution, we rely on methods from non-convex optimization and demonstrate, with synthetic data, successful ab initio model recovery of a molecule from the first two moments.
The paper is organized as the following. In Section 2, we present discretizations for the volume and distribution and derive the polynomial system obtained from the first two moments. In Section 3, we demonstrate that there exists a range of band limits where the polynomial system for the unknown molecule and distribution has only finitely many solutions. In Section 4, we discuss some implementation details on how the system is solved and present numerical and visual results. Proofs and background material are provided in appendices. For research reproducibility, MATLAB code is publicly available at GitHub.com.222The full address of the GitHub repository is https://github.com/nirsharon/nonuniformMoM.
2 Method of moments
We begin by introducing the image formation model. Then, convenient basis for discretizing various continuous objects, namely the images and the volume (in the Fourier domain) as well as the distribution for orientations, are introduced. From these, relationships between the moments of the 2-D images and the 3-D molecular volume can be derived, enabling us to fit the molecular structure to the empirical moments of the images.
2.1 Image formation model and the 3-D reconstruction problem
In cryo-EM, data is acquired by projecting particles embedded in ice along the direction of the beaming electrons, resulting in tomographic images of the particles. The particles orient themselves randomly with respect to the projection direction. More formally, let be the Coulomb potential of the 3-D volume, and the projection operator be denoted by , where
where is a random field modeling the noise term and is a point spread function, whose Fourier transform is known as the contrast transfer function (CTF) [40, 48, 57]. Each image is assumed to lie within the box . For size discretized images, we assume the random field . Here denotes an element in the group of rotations , and we define the group action by333Here we prefer to write the action of and correspondingly later we use Wigner -matrices, instead of and Wigner -matrices. While simply notational, these conventions allow us to cite identities from  verbatim, which are in terms of Wigner -matrices and not Wigner -matrices.
The rotations ’s are not known since the molecules can take any orientation with respect to projection direction. For the purpose of simplifying the exposition, we shall henceforth disregard the CTF, by assuming
The presence of CTF is not expected to have a major impact on our main results, and we will incorporate the CTF in a future work. Typically, it is convenient to consider the Fourier transform of the images, since by projection slice theorem, the Fourier transform of gives a slice of the Fourier coefficients of the volume :
The goal of cryo-EM is to recover from the Fourier coefficients of the projections . While reconstructing given estimated ’s amounts to solving a standard computed tomography problem, we wish to reconstruct directly from the noisy images without estimating the rotations, for reasons detailed above. To this end, we assume the rotations are sampled from a distribution on , where is a smooth band-limited function. Then from the empirical moments of the images , we jointly estimate the volume and the distribution .
2.2 Representation of the volume, the distribution of rotations and the images
As mentioned previously, the proposed method of moments consists of fitting the analytical moments
to their empirical counterparts and as appears in (1) after debiasing.444 By the law of large numbers,
By the law of large numbers,and almost surely as , so is fitted to and to . For notational convenience, we drop in what follows, either assuming has been appropriately debiased already or . Through fitting to the empirical moments, we seek to determine the Fourier volume and also the distribution . In this section, we present discretizations of and by expanding them using convenient bases.
2.2.1 Basis for the Fourier volume
Since the image formation model involves rotations of the Fourier volume , it is convenient to represent as an element of a function space closed under rotations; in fact, this is the same as representing using spherical harmonics (see the Peter-Weyl theorem ):
Here are the (complex) spherical harmonics:
with associated Legendre polynomials defined by:
In Cartesian coordinates, spherical harmonics are polynomials of degree . Continuing, in equation (8), without loss of generality the radial frequency functions should form an orthonormal family (for each fixed ) with respect to , where is referred to as the radial index. Choices of radial functions suitable for molecular densities include spherical Bessel functions 
, which are eigenfunctions of the Laplacian on a closed ball with Dirichlet boundary condition, as well as the radial components of 3-D prolate spheroidal wave functions, to name a few.
We assume the volume is band-limited with Fourier coefficients supported within a radius of size , i.e., the Nyquist cutoff frequency for the images ’s discretized on a grid of size (over the square . Under this assumption, the maximum degree and radial indices and in (8) are essentially finite. Further details on the particular basis functions and cutoffs and that we choose to use are deferred to Section A in the appendix. The coefficients furnish our representation of using spherical harmonics. Note that since is real valued, its Fourier transform is conjugate-symmetric, which imposes restrictions on the coefficients . The specific constraints are presented in Section 4.1.
The advantage of expanding in terms of spherical harmonics is that the space of degree spherical harmonics is closed under rotation; in group-theoretic language, this space forms a linear representation of .555In fact, this is an irreducible representation of and varying these give all irreps. Thus the action of a rotation on
amounts to a linear transformation on the expansion coefficients(with a block structure according to and ). More precisely, fixing the vector space spanned by for a specific , the action of a rotation on this vector space is represented by the Wigner matrix (see [18, p. 343]) so that:
In particular, the matrix is unitary, with entries degree polynomials in the entries of . For all and , the group homomorphism property reads . In light of (11), 3-D bases of the form have been called steerable bases.
2.2.2 Basis for the probability distribution of rotations
As we shall see, when expanding the volume in terms of spherical harmonics, the analytical moments (7) involve integrating different monomials of with respect to the measure
. To this end, we assume the probability densityover is a smooth band-limited function (and in a function space closed under rotation) by expanding
Here for are the matrix entries of the Wigner matrices above. By Peter-Weyl, these form an orthonormal basis for , and for higher they are increasingly oscillatory functions on . Thus, expansion (12) is analogous to using spherical harmonics to expand a smooth function on the sphere, or using Fourier modes for a function on the circle. The cutoff is the band limit of the distribution ; we shall see in the next section that since we use only first and second moments it makes sense to assume . Note that in the special case of a uniform distribution, the only nonzero coefficient is . Also, denotes the Haar measure, which is the unique volume form on the group of total mass one that is invariant under left action. Using the Euler angles parameterization of , the Haar measure is of the form
where the normalizing constant ensures .
2.2.3 Basis for the 2-D images
At this point, we discuss convenient representations for the images after Fourier transform, . Similarly to volumes, it is desirable to represent images using a function space closed under in-plane rotations, i.e., . By the Peter-Weyl theorem, this is the same as expanding using Fourier modes, in a 2-D steerable basis:
Here the radial frequency functions (for fixed ) are taken to be an orthonormal basis with respect to , with referred to as the radial frequency. Comparing to expansion (8) (see Section 2.2), it makes most sense to set . Again, owing to the Nyquist frequency for the discretized images , we may bound the cutoffs . Typical choices for for representing tomographic images include Fourier-Bessel functions  and the radial components of 2-D prolate spheroidal wave functions . Details on our specific choices are given in Section A.2 in the appendix.
2.2.4 Choice of radial functions
For the finite expansions in (8) and (14) to accurately represent the Fourier transforms of the electric potential and its slices, one should carefully choose the radial functions and , together with the truncation-related quantities , , , and . In this work, we consider and to be the radial parts of the three-dimensional and two-dimensional PSWFs , respectively. In Appendix A, we describe some of the key properties of the PSWFs, and propose upper bounds for setting , , , and . In practice, band limits would be selected by balancing these expressivity considerations together with the well-posedness and conditioning considerations of Section 3.
2.3 Low-order moments
In this section, we derive the analytical relationship between the first two moments for the observed images , and the coefficients and of the volume and distribution of rotations. These relationships will be used to determine and via solving a nonlinear least-squares problem.
To this end, we first register a crucial relationship between the coefficients of the 2-D images and the 3-D volume. By indexing the images in terms of (instead of in (14)), we have:
On the other hand, using the Fourier slice theorem and (11):
with , tells us
where are constants depending on the radial functions:
From the term , we see if (mod ) (and if then ). Also one may check . Equation (19) connects 2-D image coefficients with 3-D volume coefficients. We note we may as well choose in (15), since if then .
2.3.1 The first moment
In this section, from (19) the relationship between the first moment of the images and the volume is derived. Taking the expectation over , since the expectation of the noise term vanishes, we get
The last equation follows from the orthogonality of the Wigner matrix entries
The first moment gives a set of bilinear forms in unknowns , namely, the expression in (24) for each with and .
It is convenient to provide compact notation for the first moment formula. To this end, we introduce:
, a matrix of size given by
, a vector of size given by
, a matrix of size given by .
Item 2 is zero if and item 3 is zero if either or . In this notation, the first moment formula (24) (with fixed and varying ) reads:
Here is nonzero only if .
2.3.2 The second moment
Higher moments require higher powers of the variable, and so in the case of the second moment and for , we have
The product of two Wigner matrix entries is expressed as a linear combination of Wigner matrix entries [18, p. 351],
where the sum is over satisfying . Now substituting into (28) gives:
where the first sum has the range of (28) and the second sum has range of (33). The second moment thus gives a set of polynomials in unknowns and , quadratic in the volume coefficients and linear in the distribution coefficients, namely, the expression in (34) for each with , , , and . Also, we see one may as well assume , since does not contribute in either (34) or (24).
As for the first moment, it will be convenient to rewrite the second moment in compact notation. Let us further introduce:
, a matrix of size given by
where the sum is over and denotes the product Clebsch-Gordan coefficients in (32).
Item 4 is zero if either or or . Now for fixed and varying , the second moment (34) neatly reads:
Here is nonzero only if and .
2.3.3 Remark: uniform rotations
Empirically in  and theoretically in , it was shown that a similar set of volume coefficients may be reconstructed from the first three moments of ’s, under the assumption that the rotation matrices in (5) are uniformly distributed over , i.e., . In this case, the first two moments of fail to determine . We distinguish our work from this prior art by removing the assumption that the rotations in (5) are uniformly distributed over . First of all, the assumption that the particles take on uniformly distributed orientations is often impractical, limiting the applicability of Kam’s method to cryo-EM. Secondly, dealing explicitly with non-uniform distribution may allow us to work with just the first and second order moment (similar to ), thus bypassing the need of working with the third moment, which is more difficult to estimate statistically.
3 Uniqueness Guarantees and Conditioning
Analysis comes in four cases, according to assumptions on the distribution : whether is known or unknown; and if is invariant to in-plane rotations (i.e., only the viewing direction is non-uniform, see subsection 4.2) or is totally non-uniform (as a distribution on ). Throughout, our general finding is well-posedness, i.e., the molecule is uniquely determined by first and second moments up to finitely many solutions, under genericity assumptions, for a range of band limits and . In the case of a known totally non-uniform distribution, we prove the number of solutions is , and give an efficient closed-form algorithm to solve for . For all cases, sensitivity of the solution to errors in the moments is quantified by condition number formulas.
3.1 Known, totally non-uniform
For this case, we have a “closed-form” provable algorithm that recovers from (27) and (35) (up to the satisfaction of technical genericity and band limit conditions). Remarkably, while the polynomial system is nonlinear (consisting of both quadratic and linear equations), our method is based only on linear algebra. The main technical idea is simultaneous diagonalization borrowed from Jennrich’s well-known algorithm for third-order tensor decomposition , that was also used recently for signal recovery in MRA .
The matrices and of size both have full rank, and
has distinct eigenvalues. Likewiseand of size both have full rank, and has distinct eigenvalues.
Writing and for eigendecompositions, the vectors of size and of size both have no zero entries.
For , the matrix of size has full row rank.
For all , the matrix of size has full column rank.
For with , the matrix of size has full column rank.
Proofs for this subsection are in Appendix B. We remark that condition (iv), which just involves the choice of radial bases, appears to always hold for PSWFs using the cutoffs proposed in Appendix A. Conditions (i), () and (iii) just involve the distribution, and are full-rank, spectral and non-vanishing hypotheses. Condition (v) just involves the molecule and in particular requires , which limits to be less than the Nyquist frequency where .
Our algorithm goes by reverse666Reverse frequency marching is natural given the sparsity structure of (35): only and with , and , appear in the moments . frequency marching, as we solve for top-frequency coefficients from the second moment (35) where
via eigenvectors (similar to simultaneous diagonalization in Jennrich’s algorithm), and then solving for lower-frequency coefficients via linear systems. While our conditions in Theorem1 are certainly not necessary, fortunately for generic777This means generic with respect to the Zariski topology . Equivalently, there is a non-zero polynomial in such that implies the conditions in Theorem 1 are met. , those conditions are satisfied, so that the method applies:
Condition (ii) in Theorem 1 holds for Zariski-generic . If , then condition (iii) holds for Zariski-generic . At least for , conditions (i) and () hold for Zariski-generic .
By uniqueness, is a well-defined function of the first and second moments and almost everywhere. It is useful to quantify the sensitivity of to errors in , as, e.g., in practice one can access only empirical estimates and . An a posteriori (absolute) condition number for
is given by the reciprocal of the least singular value of the Jacobian matrix of the algebraic map:
Throughout this section, all condition formulas are in the sense of [16, Section 14.3], for which the domain and image of our moment maps are viewed as Riemannian manifolds. To this end, when is unknown, dense open subsets of the orbit spaces , , naturally identify as Riemmannian manifolds (for the construction, see ).
3.2 Known, in-plane uniform
For this case, given a particular image size (and other image parameters), together with band limits and , we have code which decides if, for generic and , the molecule