Log In Sign Up

Autocorrelation analysis for cryo-EM with sparsity constraints: Improved sample complexity and projection-based algorithms

The number of noisy images required for molecular reconstruction in single-particle cryo-electron microscopy (cryo-EM) is governed by the autocorrelations of the observed, randomly-oriented, noisy projection images. In this work, we consider the effect of imposing sparsity priors on the molecule. We use techniques from signal processing, optimization, and applied algebraic geometry to obtain new theoretical and computational contributions for this challenging non-linear inverse problem with sparsity constraints. We prove that molecular structures modeled as sums of idealized point masses are uniquely determined by the second-order autocorrelation of their projection images, implying that the sample complexity is proportional to the square of the variance of the noise. This theory improves upon the non-sparse case, where the third-order autocorrelation is required for uniformly-oriented particle images and the sample complexity scales with the cube of the noise variance. Furthermore, we build a computational framework to reconstruct molecular structures which are sparse in the wavelet basis. This method combines the sparse representation for the molecule with projection-based techniques used for phase retrieval in X-ray crystallography.


The sample complexity of sparse multi-reference alignment and single-particle cryo-electron microscopy

Multi-reference alignment (MRA) is the problem of recovering a signal fr...

Sparse multi-reference alignment: sample complexity and computational hardness

Motivated by the problem of determining the atomic structure of macromol...

Estimation under group actions: recovering orbits from invariants

Motivated by geometric problems in signal processing, computer vision, a...

A Fast Algorithm for Separated Sparsity via Perturbed Lagrangians

Sparsity-based methods are widely used in machine learning, statistics, ...

Single-particle cryo-electron microscopy: Mathematical theory, computational challenges, and opportunities

In recent years, an abundance of new molecular structures have been eluc...

Orthogonal Matrix Retrieval in Cryo-Electron Microscopy

In single particle reconstruction (SPR) from cryo-electron microscopy (c...

Multi-Reference Alignment for sparse signals, Uniform Uncertainty Principles and the Beltway Problem

Motivated by cutting-edge applications like cryo-electron microscopy (cr...

1. Introduction

Sparsity is a ubiquitous prior in many linear inverse problems, including regression [85, 41], compressed sensing [25, 17, 28], and various image processing applications [27], to name a few. While sparse priors are also used for non-linear inverse problems, their applicability and theoretical foundations are limited to a few specific (usually linear and quadratic) models, e.g., [15, 21, 29, 88]. Motivated by single-particle cryo-EM—an imaging technology for determining the 3-D structure of biological molecules—this paper uses modern techniques from signal processing, optimization, and applied algebraic geometry to provide new theoretical analysis and computational methods for a challenging non-linear inverse problem with sparsity constraints.

Cryo-EM has garnered increasing interest in the past decade due to a series of technological and algorithmic breakthroughs, driving a striking improvement in the obtainable resolution, up to the level where individual atoms can be distinguished. This has in turn opened new scientific horizons and led to many biological discoveries, e.g., [55, 3, 16].

In a cryo-EM experiment, a solution containing molecules to be imaged is rapidly frozen into a thin ice layer, which is then placed in an electron microscope. Next, the microscope acquires an image, called micrograph, which contains multiple 2-D tomographic projection images of the molecules. The 3-D orientations of individual projection images are unknown and random. To avoid damaging the samples, the electron dose must be kept low, resulting in a low signal-to-noise ratio (SNR). The cryo-EM computational problem is reconstructing the 3-D molecular structure from these projection images 


The renewed interest in cryo-EM led to a thorough investigation of its mathematical and statistical foundations [81, 6]. In particular, a crucial challenge from a statistical perspective is understanding the sample complexity of cryo-EM, i.e., the number of images that are required to obtain accurate reconstructions. A remarkable result revealed an intimate connection between the sample complexity of cryo-EM (and related statistical models) and the method of moments

in the low SNR regime. If the distribution of the 3-D rotations is uniform, the method of moments reduces to autocorrelation analysis. In particular, it was shown that if

is the lowest degree moment of the observations (i.e., the randomly oriented tomographic projections) that determines the molecular structure uniquely, a necessary condition for recovery is where is the variance of the noise [67, 1]. Specifically, if the distribution of rotations is uniform, then the third-order autocorrelation is the lowest order autocorrelation that determines a generic 3-D structure, implying a sample complexity of  [4]. This agrees with long-standing empirical evidence [79].

Autocorrelation analysis was first introduced to cryo-EM by Zvi Kam more than 40 years ago [50]

. Kam showed that the second-order autocorrelation of the projection images (which can be estimated with

observations) determines the 3-D structure up to a set of orthogonal matrices, under the assumption that the rotations are drawn from a uniform distribution. A few methods have been proposed to resolve the missing orthogonal matrices based on typically unavailable side information, such as homologous models with known structure, or a few clean projections images, in order to construct ab initio models 

[12, 59, 46]

. These ab initio models can then be refined using expectation-maximization: the prevalent algorithmic framework for the cryo-EM reconstruction problem that aims to maximize the non-convex posterior distribution 

[79, 74, 69]. In addition, it was recently shown that if the distribution of rotations is non-uniform, then there is at most a finite list of structures that agree with the second moment of the observations [76]. Techniques that are inspired by Kam’s method were also proposed as a solution to the molecular reconstruction problem in X-ray free-electron lasers (XFEL), which, akin to the reconstruction problem in cryo-EM, involves recovering a 3-D structure from its randomly oriented diffraction images [71, 56, 24, 57]. In contrast to cryo-EM, in XFEL the rotations are always uniformly distributed and the reconstruction problem is more involved since the measurements consist of the magnitudes of Fourier coefficients without their phases.

The first contribution of this paper is proving that if the sought-for molecular structure can be described as a sparse combination of ideal point masses, then the structure can be determined uniquely from the second-order autocorrelation of the observations, even if the rotations are distributed uniformly. This eliminates the orthogonal matrix ambiguity in Kam’s original paper 

[50]. This result is the first theoretical guarantee for unique recovery from the second moment. It combines the sparsity assumption with proof techniques from real algebraic geometry, substantially reducing the sample complexity of the cryo-EM reconstruction problem from to . The argument is constructive in the sense that it provides a polynomial-time recovery algorithm. However, said algorithm is tailored to the specific model of point mass functions. It is not well-suited to data discretized into pixels or voxels because it hinges on the ability to cluster points in the support of the second moment into certain distinct components, see Remark 11. (Accurate clustering becomes difficult when data is discretized and the components are close to each other.)

The second contribution of this paper is a practical algorithm, fusing the second-order autocorrelation of the projections with a sparse representation of the molecular structure, and requiring only observations. The algorithm builds on the realization that a typical 3-D structure can be represented by only a few coefficients in a suitable basis. Similar sparsity assumptions have been leveraged in a wide variety of scientific and engineering applications, including compressed sensing [25, 17, 28], image processing [27], and phase retrieval [15, 21, 47, 29, 88], to name a few. In cryo-EM, there have been several attempts to represent the 3-D structure as either a sparse mixture of Gaussians [87, 49, 48, 52, 89, 19, 70] or using alternative bases [86]. Yet, the sparsity property has still not been fully harnessed to represent and recover 3-D molecular structures, and it is not part of the standard computational pipeline of cryo-EM.

The technique is based on a new connection between Kam’s theory for cryo-EM and the crystallographic phase retrieval problem—recovering a sparse signal from its Fourier magnitudes. In particular, we adapt projection-based algorithms that were designed for the crystallographic phase retrieval problem to the cryo-EM setting. These algorithms were extensively validated on experimental X-ray crystallography datasets by prior researchers, see for example [34, 31, 60, 30, 29]. Here, we demonstrate on simulated data that they are also useful in constructing ab initio models in cryo-EM. They can be used to mitigate computational and model bias issues associated with the non-convexity of the cryo-EM reconstruction problem [80, 69, 43]. This new computational approach opens the door to merging more aspects of the phase retrieval and cryo-EM fields in future work.

The rest of the paper is organized as follows. In Section 2, we provide background on the reconstruction problem in cryo-EM, the method of moments, Kam’s theory, and the crystallographic phase retrieval problem. In Section 3, we prove that a structure composed of an ensemble of ideal point masses subject to uniform rotations can be recovered from the second order autocorrelation, implying a sample complexity of . Section 4 outlines the practical computational framework and presents numerical results. Section 5 concludes the paper and discusses potential theoretical and computational extensions.

2. Preliminaries

2.1. The cryo-EM problem

Cryo-EM reconstruction seeks to determine a 3-D molecular structure from its 2-D noisy tomographic projections, taken at random viewing angles. In this work, we focus on the case of uniformly random rotations. Uniformity is often taken as a baseline model, was the setting in Kam’s paper [50], and is harder than the case of a non-uniform distribution of rotations in the sense that it requires asymptotically more images (without sparsity priors) [4, 76]. Formally, let

be the Haar probability measure on the compact group

of 3-D rotations, representing the uniform distribution. Assuming that we observe i.i.d. 2-D images of after it has been randomly rotated according to and then tomographically projected to the plane, each projection image is modeled as:


where is a white Gaussian noise with known variance , and denotes the action of the rotation on . Here, typically, the variance of the noise is much greater than the magnitude of the clean projection.

The cryo-EM problem is to estimate the molecular structure from realizations of (1), i.e., from the 2-D observations . In Section 5, we discuss how the proposed framework can be extended to account for additional aspects in the generative model for cryo-EM images.

2.2. The method of moments

Our theoretical and computational contributions are based on the method of moments—a basic statistical inference technique tracing back to the seminal paper of Karl Pearson in the end of the 19th century. Specifically, we use the second moment of the observations (1), and relate it to the sought-for 3-D structure.

The (debiased) second observable moment is given by


where is a bias term that depends only on the noise variance. For large enough , we have




denotes the (debiased) population second moment, which is a function of through (1). More precisely, for it holds with high probability.

The idea of the method of moments is to find a structure which matches the observable moments. It is an alternative to other standard statistical estimation methods, e.g., maximum likelihood estimation (MLE). That said, a recent paper suggests that in the low SNR regime, the method of moments approximates the MLE [51].

2.3. Kam’s method

We detail a specific approach to autocorrelation analysis in cryo-EM, introduced by Kam [50]. To this end, we need to introduce a convenient basis for representing a 3-D structure , the spherical Bessel basis [2]. An expansion of maximum degree

is defined by first expanding the Fourier transform

of in spherical harmonics as


where denotes the radial frequency and are the spherical harmonics basis functions. In addition, the spherical harmonics coefficients are expanded by spherical Bessel functions, up to degree , as


The functions are the normalized spherical Bessel functions. By allowing and to grow unboundedly, any structure of a biological molecule can be represented in this basis. However, when expanding 3-D molecular structures from discretized projection images, the Nyquist criterion applied to the projection images limits the amount of extractable information. This determines bounds on the maximally allowable truncation parameters and ; see Appendix C for a detailed description.

We aim to recover the coefficients (up to rotation and reflection in ). As will be shown next, it is convenient to gather the coefficients into matrices , of size , via , for each .

Provided that the distribution of viewing angles is uniform, Kam [50] showed that the second moment of the Fourier transform of the projection images provides estimates for the following matrices:


Applying the Cholesky decomposition to each in (7) and imposing to be real-valued, knowledge of (7) identifies each matrix of coefficient up to an unknown real, orthogonal transformation, provided . That is, we can compute for some unknown orthogonal matrix in the group . Therefore, the second moment determines up to a set of orthogonal matrices. To recover these matrices, and thus the 3-D structure, additional information is required. In this paper, we suggest using a sparsity assumption.

2.4. Crystallographic phase retrieval

One of the contributions of this paper is to relate the cryo-EM reconstruction problem to crystallographic phase retrieval. Phase retrieval is the main computational challenge in X-ray crystallography, which is still a leading method for elucidating the atomic structure of molecules. The prevalence of crystallography is witnessed by the remarkable fact that 25 Nobel Prizes have been awarded for work directly or indirectly involving crystallography [38]. Although there exist additional important phase retrieval applications (see for example [77, 24, 7, 5]), X-ray crystallography is by far the most widely investigated application.

The crystallographic phase retrieval problem entails recovering a sparse signal from its periodic autocorrelation (or, equivalently, from its Fourier transform magnitudes, namely, its power spectrum). While simply stated, and despite its importance, the theoretical foundations of this problem have been investigated only recently. In particular, it was conjectured that a generic sparse signal can be recovered from its periodic autocorrelation if the number of non-zero entries is smaller than half the signal’s length [9]

. This conjecture was verified for a few cases. The relation of the crystallographic phase retrieval problem with the beltway problem from combinatorial optimization is explored in 

[39]. Our theoretical reconstruction guarantees in the following section can be viewed as analogous results in the setting of cryo-EM.

The standard algorithms for crystallographic phase retrieval build on two projection operators: one onto the measured data (the power spectrum) and the second onto the space of sparse signals. While simple algorithms that alternate between these two projections tend to quickly stagnate, a more sophisticated family of algorithms, based on reflections, shows excellent performance, though their running time is exponential in the sparsity level [29, 11]. These algorithms are tightly related to splitting methods, such as Douglas-Rachford and the alternating direction method of multipliers (ADMM), and have been applied to a wide variety of problems [30]. A main contribution of this paper is a modification of these algorithms to autocorrelation analysis for cryo-EM. In particular, we focus on one such algorithm, called relaxed-reflect-reflect (RRR), but alternative algorithms, such as Fienup’s hybrid input-output algorithm [34], the difference map algorithm [31], and the relaxed averaged alternating reflections algorithm [60], can be adapted to cryo-EM by the same strategy. Importantly, if the model is accurate (e.g., no noise) RRR iterations halt only when they find a solution that satisfies both constraints (defined by the projection operators). Thus, RRR does not suffer from local minima as gradient-based algorithms do.

3. Superior sample complexity: The second moment suffices for sums of point masses

This section presents our main theoretical result: the second moment suffices to recover an idealized sparse volume, i.e., a volume given as a weighted sum of point masses. We deduce that the second moment also suffices for a pixelated and blurred variant of the model. Our theorems imply an associated sample complexity of . This stands in contrast to previous results which do not assume sparsity. There, the third moment is required for recovery, and the associated sample complexity is  [4, 33].

3.1. Models and main theoretical results

We use an atomistic representation of a molecule. In our first idealized model, an atom is specified by a weighted Dirac delta function, and a molecule is a sum of such point masses. Let be the 3-D points representing atom locations, and be positive weights corresponding to the scattering potentials of the individual atoms. Then


is the molecule composed of the atoms . Relabeling if necessary, we assume that the -norms are in descending order.

[width = ]figure_atom.pdf

Figure 1. (A) Example of a structure of the form in (8). For this illustration, the size of each dot is proportional to the weight . (B–C) Two examples of tomographic projections of rotated versions of (A), with no noise (leftmost), signal-to-noise ratio 0.2 (middle) and signal-to-noise ratio 0.1 (rightmost).

We model each projection image as a mixture of Dirac delta functions on plus noise:


Here denotes coordinate projection onto the first two coordinates, and is a white Gaussian noise with (known) variance . Given 2-D images as in (9), the reconstruction problem is to recover the atoms in (8) up to a global rotation and reflection. (The reflection ambiguity exists because a molecule and its reflection in the microscope’s image plane are indistinguishable given cryo-EM data, see e.g. [36].) See Figure 1 for an illustration of the setup.

Under this model the (debiased) second population moment, obtained by substituting (9) into (4), reads:


where is the uniform distribution. Note that is a measure on .

We introduce certain assumptions on the atom locations:

  1. The vectors

    are pairwise linearly independent;

  2. The norms are distinct.

We remark that conditions A1 and A2 are quite restrictive, e.g., ruling out molecules with nontrivial point-group symmetries.

Our first main theoretical result is stated as follows.

Theorem 1.

Consider the model given by (8)-(9). Assume that conditions A1-A2 hold. Then the support of the second moment uniquely determines the set . Therefore, (Eq. (10)) uniquely determines the set of atom locations up to a rotation and reflection in .

Theorem 1 is proven in Subsection 3.3, after auxiliary results are given in Subsection 3.2.

Building on the uniqueness in Theorem 1, we obtain the following constructive result as well.

Theorem 2.

Consider the model given by (8)-(9). Assume that and conditions A1-A2 hold. Then Algorithm 1 (described in Subsection 3.4) recovers the set of atoms up to a rotation and reflection in from the second moment (Eq. (10)) in flops.

Theorem 2 is proven in Subsection 3.4.

Prior works on the sample complexity of cryo-EM [4, 66, 1] do not directly apply to the model above. The principal reason is that the measurement formation defined by (9) is not finite-dimensional. Therefore, we consider a pixelated and blurred variant of the model. The molecule is still specified by a collection of atoms . However, each projection image now consists of pixels:


Here, we discretized into equi-sized squares, where and . Also, denotes convolution and is the isotropic Gaussian kernel with fixed variance , i.e., . Last, the noise satisfies .

In the pixelated model, the (debiased) second population moment equals:


We now state our main result for the pixelated model. Its proof relies on Theorems 1 and 2.

Theorem 3.

Consider the model given by (11). Fix an integer and real numbers and . Assume that A1-A2 hold, and for each we have and . Then there exists with the following property. Whenever and pixels are used in (11), then the second moment (Eq. (12)) uniquely determines the set of atoms up to a rotation and reflection in .

As the details are technical, we prove Theorem 3 in the appendices. We only use two properties of the Gaussian kernel : that it is real-analytic and that its Fourier transform does not vanish.

Corollary 4.

Assume the setting of Theorem 3 with . Then, the sample complexity for generic unique recovery (in the sense of [4]) is as .

The rest of the section provides the proofs of Theorems 1 and 2, with Theorem 3, Corollary 4 and supporting results shown in the appendices. We emphasize that Algorithm 1 is a theoretical algorithm, not intended for use in practice due to its noise sensitivity as explained in Remark 11. By contrast, Algorithm 2 in the subsequent section is built for practical situations.

3.2. Support of

To recover the atoms from , the main information that we use is actually qualitative. Specifically, we rely on the particular structure of the support of the second moment in . To describe this, we need to first understand the possible images of one pair of atoms.

Definition 5.

For , let be the map given by .

Definition 6.

For , let be the image of , i.e., .

The next lemma characterizes as the solution set to a system of polynomial equations and inequalities. This will enable proof techniques from real algebraic geometry.

Lemma 7.

is connected, compact and semialgebraic. Letting be variables on , then is cut out by one quartic equation and two quadratic inequalities:


It has dimension as a semialgebraic set if condition A1 holds.

A few different examples of the sets are illustrated in Figure 2, for varying values of and . There we show the projection of to when is dropped.

[width = ]views_smaller.jpg

Figure 2. Illustration of the semialgebraic sets described in Lemma 7, projected onto -space. Upper row from left to right: and , respectively. Lower row from left to right: , and , respectively. By Lemma 13, these sets make up the support of , which is the basis of our theory.

The next result is immediate from Definitions 5 and 6.

Lemma 8.

The second moment is


where the subscripts indicate the pushforward measure defined by . In particular, the support of is .

3.3. Information-theoretic uniqueness: Proof of Theorem 1

We begin by proving Theorem 1. The key is a converse to Lemma 7. While Lemma 7 implies the quartic equation in (7) (plus the quadratic inequalities there) determine the set , we need that determines the quartic. The proof of this converse uses results from real algebraic geometry [14].

Lemma 9.

Assume that condition A1 holds. Let . Then the ideal of the real Zariski closure of in is principal and generated by the quartic polynomial


Further, is irreducible over .

Corollary 10.

Assume that conditions A1-A2 hold. Then, the irredundant irreducible decomposition of the Zariski closure of the support of is


We can now prove the information-theoretic uniqueness.

Proof of Theorem 1.

The support of determines the real radical prime ideal of each of top-dimensional irreducible component of its Zariski closure. By A1-A2, Corollary 10 and Lemma 9, these ideals are for . The ideal uniquely determines , since the coefficient of in (15) is . Extracting the coefficients of in (15), determines the triple . Ranging over , we have proven that the support of fixes the set:


By A2 and our assumption the norms are descending, knowledge of (17) lets us fill in the Gram matrix:




However determines up to left multiplication by a orthogonal matrix. Indeed considering a truncated rank-3 eigendecomposition, we write


where has orthonormal columns and is diagonal and positive-semidefinite. Then,


for some . Therefore, the atoms’ locations are determined up to a global rotation and reflection. ∎

3.4. Recovery algorithm: Proof of Theorem 2

Now, we move forward and prove Theorem 2. We present Algorithm 1 for efficiently recovering the atoms from . The algorithm is theoretical in that it relies on oracle access to the following information.

Assumption 1.

We assume oracle access to:

  • , where consists of four or more Zariski-generic points on ;

  • the value of the measure on the set for all .

Remark 11.

In principle, O1 and O2 can be estimated from the sample moment  (2) if . It would require the ability to identify points in the support of and cluster them according to the components . However this encounters difficulty when dealing with noisy moments discretized in pixels. The next section is dedicated to a different computational framework, better-suited for practical settings.

Proceeding, Algorithm 1interpolates to recover from Lemma 12.

Lemma 12.

Assume that condition A1 holds, and O1 is known. Let . Consider the matrix


Then it has rank , with kernel spanned by


By Lemma 12, we compute the triples in time, by forming the matrices (22) and computing their kernels. We then fill in the Gram matrix (18) as in the proof of Theorem 1. The atoms’ locations are recovered from the truncated eigendecomposition (21).

The calculation of the weights is based on the following.

Lemma 13.

Assume that conditions A1-A2 hold. Then, for each , the measure of with respect to is


Therefore, O2 tells us all off-diagonal entries of . We complete this uniquely to a rank- matrix by using


where are any indices such that are all distinct. (This step requires .) The weights

are lastly recovered either by computing the leading eigenvector/eigenvalue pair of

or as the square root of the diagonal of , using the fact that the are non-negative.

Remark 14.

We note that (25) is a particular case of the problem of recovering a low-rank matrix with corrupted diagonal entries; see, e.g.,  [65, 72, 73] for more on that problem.

We summarize the procedure of this section in Algorithm 1.

Second population moment as in (10)
Atoms up to a rotation and reflection in
  1. Access O1 in Assumption 1

  2. Recover the unordered set using Lemma 12

  3. Fill in the Gram matrix with from (19)

  4. Recover the up to orthogonal transformation by computing a truncated eigendecomposition of as in (21)

  5. Access O2 in Assumption 1

  6. Fill in the off-diagonal entries of using Lemma 13

  7. Complete using (25)

  8. Recover the from

Algorithm 1 Recovering a sparse structure from its second moment
Proof of Theorem 2.

The considerations above show Algorithm 1 correctly recovers the set of atoms from (up to a rotation and reflection in ). It costs in flops and storage once O1 and O2 are available if we use a randomized algorithm [64] to compute the truncated decomposition in (20). ∎

4. Kam’s method with sparsity constraints

[width = 0.82]main_fig.png

Figure 3. Reconstruction results when applying the proposed algorithm to three example structures. (A) Resolution of the reconstructed volumes as a function of the number of iterations. The resolution is determined by the cutoff of the Fourier Shell Correlation (FSC). (B) Example FSC curves for reconstructed volumes as function of non-dimensionalized frequency. (C,D,E) Visualization of the reconstructed volumes for the three example structures: EMD-0409 [44], EMD-25892 [61] and the Shepp-Logan phantom [78], respectively. (Top) Ground-truth structure. (Middle) Truncation of ground-truth structure into the spherical Bessel basis using , respectively. (Bottom) Reconstructed volume returned by Algorithm 2 using , respectively. The visualizations were rendered by UCSF Chimera [68].

This section introduces a computational framework to leverage sparsity in recovering the underlying molecular structure. The goal is to devise a principled way to compute ab initio approximations of the underlying structures, that can then be improved further in a refinement step (which is typically performed using expectation-maximization [74, 69]). In this section, we maintain the assumption that the distribution of viewing angles is uniformly distributed. If is a non-uniform distribution, it is known that there is at most a finite list of structures that are consistent with the observed second order moment [76]; employing sparsity to aid in the recovery problem with non-uniform distribution will be considered in future work.

We use projection-based optimization techniques from the related problem of crystallographic phase retrieval, coupled with information extracted from the second moment of the projection images. Without imposing the underlying sparsity, the second moment of the projection images determines the structure up to an ambiguity encoded by a set of unknown orthogonal matrices. The key idea of the algorithm is to alternatingly project the molecular structure onto constraints encoded by the sparsity and by the projection image moments, respectively.

Analogously to (10), the moments of the projection images furnish information about the underlying 3-D structure. Unlike our theoretical results, however, we consider a general 3-D structure expanded in a spherical Bessel basis as in Section 2.3, and assume it can be represented by only a few wavelet coefficients. The next section introduces wavelet bases, and later we provide the details of the projection-based algorithm.

4.1. Wavelet bases

We encode sparsity of a 3-D molecular structure by a sparse expansion in wavelets [22]—a popular choice of sparsifying, localized bases in a wide range of applications [63]. Our algorithm can easily be adapted to any specific wavelet basis, and, more generally, to any choice of basis, for instance, sparsifying bases learned through data.

We denote the multilevel wavelet basis by , where denotes the level of the wavelet and the index of the function within the level. As a shorthand, we define as the map sending a 3-D structure to its vector of coefficients when expanded in the wavelet basis, i.e.,


Likewise, then maps a wavelet coefficient vector into its 3-D expansion by


An additional advantage of using wavelet bases is that and can then be applied in linear time using a fast wavelet transform [62].

4.2. Projection-based algorithm

For a discretized 3-D structure  of size , we define the mapping of the structure into its coordinates in the spherical Bessel basis by


The inverse mapping then expands a set of coefficients in the spherical Bessel basis into its corresponding 3-D structure:


As discussed in Section 2, Kam’s method identifies matrices , for an unknown orthogonal matrix, for each with . By possibly reducing the value of to the largest index with this property, we will for ease of notation assume that this property holds for . Therefore, at the onset of the algorithm, we have access to a set of coefficient matrices satisfying


for unknown orthogonal matrices . Our algorithm aims to recover an approximation of these unknown orthogonal matrices, which leads to an approximation of . This orthogonal matrix retrieval problem is an analogue to the problem of the missing phases in the phase retrieval problem [10]. We therefore adapt a popular algorithm from the phase retrieval literature into the problem of cryo-EM. The algorithm repeatedly utilizes two projections onto the set of structures with a given sparsity level and a set determined by the projection images. These two projections are the main pillars of the algorithm and can be used in different ways, as explained next. But first, we define the two projection operators.

4.2.1. First projection: Moment constraint.

We begin by defining the first projection operator, denoted by , as the projection onto the set defined by the matrices in (7). Let in be the ordered collection of matrices of coefficients in the spherical Bessel basis. Define as the projection


where the matrices are defined by


with from (7). (32

) is an instance of the Orthogonal Procrustes problem. Although it is a non-convex optimization problem, it can be solved in closed form in terms of the singular value decomposition of

, see e.g., [75]. In the implementation, the matrices defining the operations and are precomputed. The computational complexity of subsequently solving an instance of (31) is then , since typically .

4.2.2. Second projection: Sparsity constraint.

The projection promotes sparsity in a given local wavelet basis. For a structure  and an integer , define as the structure with wavelet coefficients obtained by retaining the largest components of and replacing the remaining elements by zero, i.e.,


where the coefficients are defined by and if has magnitude among the largest magnitudes of the , and zero otherwise. The computational complexity of this step is . We again emphasize that, generally, any localized basis or frame can be used to define , and we fix a wavelet basis for the sake of definiteness.

4.2.3. Algorithm.

A straightforward algorithm to attempt to recover is through alternating projections. This procedure is described by fixing a sparsity level and iterating the two projections and in turn. The use of the two projections in an alternating fashion is intended to promote convergence to an intersection point of the two sets. In the case of projecting onto convex sets, convergence results are known [20], but convergence is not guaranteed for the non-convex projections in (31) and (33). Indeed, for non-convex sets, alternating projection schemes frequently suffer from convergence to local minima, and a method to escape the local minima is required. To achieve this, the phase retrieval literature details different iteration schemes combining the two projections and in different ways, for instance using the Relaxed-Reflect-Reflect (RRR) algorithm [32, 29]. In terms of the projection operators, this iterative scheme can be written out as



is a scalar hyperparameter. The algorithm is summarized in Algorithm 

2. As aforementioned, other phase retrieval algorithms which are based on two projection operators, such as the difference map algorithm and the relaxed averaged alternating reflections algorithm, can be adapted to cryo-EM similarly.

Projection images, sparsity level , maximum number of iterations , hyperparameter .
Estimated structure
  1. Form the matrices from (7)

  2. Compute the Cholesky factorizations of the to produce matrices in (30)

  3. For do

Algorithm 2 Recovering

[width = 0.8]view_L2to8.png

Figure 4. (Bottom) Reconstruction results for EMD-0409 truncated with truncation parameter varying from (leftmost) to (rightmost). (Top) Ground truth structure truncated with truncation parameter varying from to .

4.3. Simulation results

We apply Algorithm 2 to two structures from the online EM data bank [58], EMD-0409 [44] and EMD-25892 [61], as well as the Shepp-Logan phantom [78]. For each structure, we run Algorithm 2 for a given value of and to obtain a reconstruction. To measure the reconstruction quality, we follow the standard procedure in the cryo-EM community, and compute the Fourier shell correlation (FSC) between the estimated structure and the ground truth. Specifically, the FSC of two structures and is defined by


where one structure is the estimated structure, and the second is the ground truth. The FSC is real-valued because of symmetry of the summation. The resolution is determined when the FSC curve drops below .

EMD-0409 has dimensions , with each voxel having physical length of Å. EMD-25892 has dimensions , and voxel size Å. The volumes were downsampled by a factor of 2 and 5, respectively, to give structures of size . The ground truth matrices were generated exactly and the matrices in (30) were generated using chosen uniformly at random. To fix the units for the Shepp-Logan phantom, we assume the voxels to have side length 1 Å. The simulations used Haar wavelets to define .

The result of applying Algorithm 2 to each structure is shown in Figure 3. For all three example structures, during a run of the algorithm, the resolution initially rapidly improves. Afterwards, the improvement slows down and exhibits an exploratory and oscillating behavior. This is typical for RRR-based algorithms [29].

During the run of the algorithm, the optimal resolutions obtained for the three structures were Å, Å, and Å, respectively, and the resolutions at the initialization of the algorithm were Å, Å, and Å, respectively. As a comparison, the resolutions between the ground truth structures and their truncation into the spherical Bessel bases with the chosen values of are Å, Å and Å, respectively; these resolutions are bounds on the optimally obtainable resolutions.

Figure 4 also shows a comparison of EMD-0409 with its reconstruction, truncated to different values of . Visually, the reconstructed element captures the relevant features of the ground truth for each value of . The resolution increases with .

Movie 1 visualizes the reconstructed volume as a function of the iteration number. Note that the reconstruction at each step is visually similar to the ground truth, although the computed resolution noticeably improves during the run of the algorithm. This implies that even knowledge of the coefficients with the wrong rotation matrices provides some information about the ground truth.

5. Discussion

The contribution of this paper is twofold. As the first contribution, our theoretical results imply that a sparse mixture of point masses can be uniquely recovered from the second order moment, even in the case of a uniform distribution of viewing angles, whereas previous work has only proven recovery using the third order moment. Thus, fewer images are required for reconstruction. This has a number of potential experimental implications. Firstly, since microscope time is expensive, this may greatly reduce the cost of the experimental part of the cryo-EM pipeline. This might be especially important for XFEL, where throughput is a major bottleneck and viewing directions are uniformly distributed [84, 54]. Secondly, it may enable reconstruction of structures where a limited number of projection images can be captured. This might be the case, for example, when the molecule may appear in several conformational states, and a limited number of images will be available for each conformation.

The second contribution is a new algorithm for ab initio modeling. The computational framework introduced in this article opens the door to incorporating a number of promising techniques from crystallographic phase retrieval into cryo-EM algorithms. There is, for instance, flexibility in choosing the projection operators and . It may include biologically-oriented priors, such as minimum atom-atom distance or Wilson statistics [82, 40], or data-driven priors based on previously resolved structures [53]. A systematic study of adapting these techniques will be initiated in coming work. Additional future work includes extending the use of sparsifying priors in other parts of the cryo-EM reconstruction pipeline, for instance in existing approaches to iterative refinement [74] or in autocorrelation analysis using micrographs without particle picking [8]. Yet another important direction is to incorporate the sparsity prior into reconstruction by the method of moments when there is a non-uniform distribution of viewing directions [76].

Appendix A Proofs of auxiliary results for Theorems 1 and 2

a.1. Proof of Lemma 7

Note that is connected and compact, since and is compact and connected, while is continuous. It also is semialgebraic, as is a real algebraic variety and is a polynomial map (see the Tarski-Seidenberg theorem [14]).

Define as the set cut out by the three constraints in (7). Assume . By definition of , there exist and such that and . Then

likewise and

Also, , and similarly . These show that , whence .

For the converse, take