I. Introduction
X-ray free-electron laser (XFEL) imaging is an emerging technique for elucidating the three-dimensional structure of molecules [4, 5]. Single-molecule XFEL imaging collects two-dimensional diffraction patterns of single particles at random orientations. The images are very noisy due to the low photon counts, and the detector count noise follows an approximately Poisson distribution. Since we capture only one diffraction pattern per particle and the particle orientations are unknown, it is challenging to reconstruct the 3D structure at low signal-to-noise ratio (SNR). One approach is to use expectation-maximization (EM)
[6, 7], which has no guarantee of global optimality. Alternatively, assuming that the orientations are uniformly distributed over the special orthogonal group SO(3)
, Kam’s correlation analysis [8, 9, 10, 11, 12, 13, 14, 15] bypasses maximum-likelihood estimation. Instead, it requires a robust estimate of the covariance matrix of the noiseless 2D images. This serves as our main motivation for developing efficient and accurate covariance estimation and denoising methods for Poisson data.

Principal component analysis (PCA) is widely used for dimension reduction and denoising of large datasets [16, 17]. However, it is most naturally designed for Gaussian data, and there is no commonly agreed upon extension to non-Gaussian settings such as exponential families [16, Sec. 14.4]. For denoising with non-Gaussian noise, popular approaches reduce the problem to the Gaussian case by a wavelet transform such as the Haar transform [18]; by adaptive wavelet shrinkage [18, 19]
; or by approximate variance stabilization such as the Anscombe transform
[20]. The latter is known to work well for Poisson signals with large parameters, due to approximate normality. However, the normal approximation breaks down for Poisson distributions with a small parameter, such as in photon-limited XFEL [21, Sec. 6.6]. Other methods are based on singular value thresholding (SVT)
[22, 23] and Bayesian techniques [24]. In addition, many existing methods are computationally intractable for large datasets. Recently, [1] introduced exponential family PCA (ePCA), which extends PCA to a wider class of distributions. It involves the eigendecomposition of a new covariance matrix estimator, constructed in a deterministic and non-iterative way using moment calculations and shrinkage.
ePCA was shown to be more accurate than PCA and its alternatives for exponential families. Its computational cost is similar to that of PCA, it has substantial theoretical justification building on random matrix theory, and its output is interpretable.
In XFEL imaging, the orientations are uniformly distributed over SO(3), and it is therefore equally likely to observe any planar rotation of a given diffraction pattern. Therefore, it makes sense to include all possible in-plane rotations of the images when performing ePCA. To this end, we incorporate steerability in ePCA by adapting the steerable PCA algorithm [3]. We take into account the action of the group O(2) on diffraction patterns by in-plane rotation and reflection. The resulting principal components are invariant to any such transformation of the input images. In the new algorithm, we first estimate the rotationally invariant sample mean of the image dataset and use it to “prewhiten" the images (we refer to this step as prewhitening, since the mechanism is similar to that of whitening colored noise). We then compute the truncated Fourier-Bessel expansion coefficients of the prewhitened images. Since the truncated Fourier-Bessel transform is almost unitary [3], we can compute the rotationally invariant sample covariance matrix in terms of the expansion coefficients. The resulting covariance matrix has a block diagonal structure. For each covariance subblock, we apply shrinkage, recoloring, and scaling as in [1] to estimate the underlying clean covariance matrix.
We illustrate the improvement in estimating the covariance matrix through image denoising. Specifically, we introduce a Wiener-type filtering of the truncated Fourier-Bessel expansion coefficients. Rotation invariance enhances the effectiveness of ePCA in covariance estimation and thus achieves better denoising. In addition, the denoised expansion coefficients are useful for building rotationally invariant image features (i.e., bispectrum-like features [2]). We verify the benefits of steerable ePCA through numerical experiments with simulated XFEL diffraction patterns. As with steerable PCA relative to standard PCA, the computational complexity of steerable ePCA is lower than that of ePCA.
The paper is organized as follows: Section II contains the observation model (II-A) and the computation of the steerable exponential family principal components (II-B). Various numerical examples comparing steerable ePCA with ePCA are presented in Section III. An implementation of steerable ePCA in MATLAB is publicly available at github.com/zhizhenz/sepca/.
II. Methods
II-A. The observation model
We adopt the same observation model as in [1]. The observed noisy images $Y_1, \dots, Y_n \in \mathbb{R}^p$ (i.e., $p$ is the number of pixels) are random vectors with some unknown distribution and follow a hierarchical model. First, a latent vector—or hyperparameter—$\theta_i \in \mathbb{R}^p$ is drawn from a probability distribution $\mathcal{D}$ with mean $\mu_\theta$ and covariance matrix $\Sigma_\theta$. Conditional on $\theta_i$, the coordinates of $Y_i$ are drawn independently from a canonical one-parameter exponential family $p_\theta$, with density

$p_\theta(y) = \exp(\theta y - A(\theta)),$ (1)

where $A(\theta)$ is the log-partition function with respect to a reference measure $\nu$,

$A(\theta) = \log \int \exp(\theta y)\, d\nu(y).$ (2)
The mean and variance of the random variable $y \sim p_\theta$ are $A'(\theta)$ and $A''(\theta)$, respectively. Therefore, the mean of $Y_i$ conditional on $\theta_i$ is $X_i = A'(\theta_i)$, so the noisy data vector can be expressed as $Y_i = X_i + \varepsilon_i$, with $\mathbb{E}[\varepsilon_i \mid \theta_i] = 0$, and the marginal mean of $Y_i$ is $\mu = \mathbb{E}[X_i]$. Thus, one can think of $Y_i$ as a noisy realization of the clean vector $X_i$. However, the latent vector is also random and varies from sample to sample. In the XFEL application, the $X_i$ are the unobserved noiseless images, and their randomness stems from the random (and unobserved) orientation of the molecule. We may write $Y_i = X_i + \mathrm{diag}(A''(\theta_i))^{1/2} Z_i$, where the coordinates of $Z_i$ are conditionally independent and standardized given $\theta_i$. The covariance of $Y_i$ is given by the law of total covariance:

$\Sigma_Y = \mathrm{Cov}[\mathbb{E}(Y \mid \theta)] + \mathbb{E}[\mathrm{Cov}(Y \mid \theta)] = \Sigma_X + \mathbb{E}[\mathrm{diag}(A''(\theta))].$ (3)
For the special case of Poisson observations, $Y_i(j) \mid X_i \sim \mathrm{Poisson}(X_i(j))$, where $X_i$ is random. The natural parameter is the vector $\theta_i$ with $\theta_i(j) = \log X_i(j)$, and $A(\theta) = e^{\theta}$. Therefore, we have $A'(\theta) = A''(\theta) = e^{\theta}$, and $\Sigma_Y = \Sigma_X + \mathrm{diag}(\mu)$. In words, while the mean of the noisy images agrees with the mean of the clean images, their covariance matrices differ by a diagonal matrix that depends solely on the mean image.
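To see the diagonal relationship between the noisy and clean covariances in action, the following small simulation (our own illustrative sketch; the low-rank latent model, sizes, and variable names are not from the paper) compares the raw sample covariance of Poisson data with its diagonally debiased version:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 2000, 50, 3  # samples, pixels, latent rank (illustrative sizes)

# Clean images X with a low-rank latent structure, kept strictly positive.
U = rng.normal(size=(p, r)) / np.sqrt(p)
X = np.exp(U @ rng.normal(size=(r, n))).T  # n x p matrix of positive intensities
Y = rng.poisson(X)                          # Poisson observations

mu_hat = Y.mean(axis=0)
S_Y = np.cov(Y, rowvar=False)               # sample covariance of the noisy data
# Diagonal debiasing: Sigma_X is approximated by Sigma_Y - diag(mean image).
S_X_hat = S_Y - np.diag(mu_hat)

S_X_true = np.cov(X, rowvar=False)
err_raw = np.linalg.norm(S_Y - S_X_true)     # Frobenius error without debiasing
err_deb = np.linalg.norm(S_X_hat - S_X_true) # Frobenius error after debiasing
print(err_deb < err_raw)
```

The debiased estimate is substantially closer to the clean covariance, since the raw sample covariance carries the extra diagonal term diag(mu).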
II-B. Fourier-Bessel Expansion
Under the observation model in Sec. II-A, we develop a method that estimates the rotationally invariant covariance matrix efficiently and accurately from the image dataset. We assume that a digital image is composed of discrete samples from a continuous function with bandlimit $c$. The Fourier transform of the image $I$, denoted $\hat I$, can be expanded in any orthogonal basis for the class of square-integrable functions on a disk of radius $c$. For the purpose of steerable PCA, it is beneficial to choose a basis whose elements are products of radial functions with Fourier angular modes, such as the Fourier-Bessel functions. The scaled Fourier-Bessel functions are given by

$\psi_c^{k,q}(\xi, \theta) = N_{k,q}\, J_k(R_{k,q}\, \xi / c)\, e^{\imath k \theta}$ for $\xi \le c$, and $0$ otherwise, (4)

where $(\xi, \theta)$ are polar coordinates in the Fourier domain (i.e., $\xi_1 = \xi \cos\theta$, $\xi_2 = \xi \sin\theta$, $\xi \ge 0$, and $\theta \in [0, 2\pi)$); $N_{k,q}$ is the normalization factor; $J_k$ is the Bessel function of the first kind of integer order $k$; and $R_{k,q}$ is the $q$th root of the Bessel function $J_k$. We also assume that the functions of interest are concentrated in a disk of radius $R$ in the real domain. In order to avoid aliasing, we only use Fourier-Bessel functions that satisfy the following criterion [25, 3]:

$R_{k, q+1} \le 2\pi c R.$ (5)
For each angular frequency $k$, we denote by $p_k$ the number of components satisfying Eq. (5). The total number of components is $p = \sum_{k=-k_{\max}}^{k_{\max}} p_k$, where $k_{\max}$ is the maximal value of $k$ for which Eq. (5) has a solution. We also denote $p_{-k} = p_k$ for $k = 1, \dots, k_{\max}$.
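The counts $p_k$ can be computed directly from the Bessel roots. A short sketch (assuming the criterion $R_{k,q+1} \le 2\pi c R$ of Eq. (5); the function name and the example parameter values are ours):

```python
import numpy as np
from scipy.special import jn_zeros

def count_components(c, R, k_guess=200, q_guess=200):
    """Count Fourier-Bessel components per angular frequency k under the
    sampling criterion R_{k,q+1} <= 2*pi*c*R."""
    bound = 2 * np.pi * c * R
    p_k = []
    for k in range(k_guess + 1):
        roots = jn_zeros(k, q_guess)          # R_{k,1}, ..., R_{k,q_guess}
        p = int(np.sum(roots[1:] <= bound))   # q is valid iff R_{k,q+1} <= bound
        if p == 0:                            # roots grow with k, so we can stop
            break
        p_k.append(p)
    return p_k

# Example: bandlimit c = 0.5 (Nyquist) and support radius R = 32 pixels.
p_k = count_components(c=0.5, R=32)
print(len(p_k), sum(p_k))  # number of angular frequencies kept, total components
```

Because consecutive Bessel roots interlace ($R_{k,q} < R_{k+1,q}$), the counts $p_k$ are non-increasing in $k$, which is what the loop's early exit relies on.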
The inverse Fourier transform (IFT) of $\psi_c^{k,q}$ is

(6)

Therefore, we can approximate $\hat I$ using the truncated expansion

$\hat I(\xi, \theta) \approx \sum_{k=-k_{\max}}^{k_{\max}} \sum_{q=1}^{p_k} a_{k,q}\, \psi_c^{k,q}(\xi, \theta).$ (7)
We evaluate the Fourier-Bessel expansion coefficients $a_{k,q}$ numerically as in [3], using a quadrature rule that consists of equally spaced points in the angular direction, $\theta_j = 2\pi j / n_\theta$ for $j = 0, \dots, n_\theta - 1$, and a Gaussian quadrature rule in the radial direction for $\xi$. To evaluate the coefficients, we need to sample the discrete Fourier transform of the image at the quadrature nodes,

(8)

which can be evaluated efficiently using the nonuniform discrete Fourier transform [26], and we get

(9)

where the angular summation in Eq. (9) is computed by a 1D FFT on each concentric circle of radius $\xi$. For real-valued images, it is sufficient to evaluate the coefficients with $k \ge 0$, since $a_{-k,q} = (-1)^k\, \overline{a_{k,q}}$. In addition, the coefficients have the following properties: under a counterclockwise rotation of the image by an angle $\alpha$, $a_{k,q}$ changes to $e^{-\imath k \alpha}\, a_{k,q}$; and under reflection, $a_{k,q}$ changes to $a_{-k,q}$.
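The rotation property is inherited from the Fourier shift theorem on each concentric circle: rotating the ring samples circularly shifts them, which multiplies the $k$th angular Fourier coefficient by a unit-modulus phase. A minimal sketch on a single ring (the sampling grid and variable names are illustrative; sign conventions vary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64                       # angular samples on one concentric circle
f = rng.normal(size=n)       # values of an image on the ring
s = 5                        # rotate by alpha = 2*pi*s/n (a grid rotation)

F = np.fft.fft(f)
F_rot = np.fft.fft(np.roll(f, s))   # samples of the rotated ring

k = np.arange(n)
# Shift theorem: coefficients of the rotated ring equal e^{-2*pi*i*k*s/n} * F.
phase = np.exp(-2j * np.pi * k * s / n)
print(np.allclose(F_rot, phase * F))  # True
```

Since each Fourier-Bessel coefficient $a_{k,q}$ is assembled from the $k$th angular Fourier coefficients across the radial quadrature circles, the same phase factor carries over to $a_{k,q}$.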
II-C. Prewhitening
Suppose $I_1, \dots, I_n$ are discretized input images sampled from the model of Sec. II-A. Here, the observational vectors for ePCA are simply the images themselves. In ePCA, the first step is to prewhiten the data using the sample mean. However, the sample mean is not necessarily rotationally invariant. The rotationally invariant sample mean can be evaluated from the expansion coefficients of the images: only the zero angular frequency coefficients survive averaging over all in-plane rotations, and their sample means are

$\bar a_{0,q} = \frac{1}{n} \sum_{i=1}^{n} a^{i}_{0,q}.$ (10)

The rotationally invariant sample mean is circularly symmetric. We denote by $\bar a_0$ a vector that contains all the coefficients $\bar a_{0,q}$ ordered by the radial index $q$. Although the input images are nonnegative, the finite truncation may result in small negative values in the mean estimate, so we threshold any negative entries to zero.

We prewhiten the images by the estimated mean image $\hat\mu$ to create new images $\tilde I_i$, with $\tilde I_i(x) = I_i(x)/\sqrt{\hat\mu(x)}$ when $\hat\mu(x) > 0$, and $\tilde I_i(x) = 0$ otherwise. Combining Eqs. (5) and (9), we compute the truncated Fourier-Bessel expansion coefficients of $\tilde I_i$. Let us denote by $A_k$ the matrix of expansion coefficients with angular frequency $k$, obtained by arranging the coefficients into a matrix whose columns are indexed by the image number and whose rows are ordered by the radial index $q$. The coefficient matrix $A_k$ is of size $p_k \times n$.
Since the truncated Fourier-Bessel transform is almost unitary [3], the rotationally invariant covariance kernel built from the original image data with all possible in-plane rotations and reflections can be computed in terms of the IFT of the Fourier-Bessel basis and the associated expansion coefficients. Subtracting the sample mean is equivalent to subtracting the mean zero angular frequency coefficients, while keeping all other coefficients unchanged. Therefore, we first update the zero angular frequency coefficients by subtracting their sample mean. In terms of the expansion coefficients, the rotationally invariant homogenized sample covariance matrix is block diagonal, with one block $S^{(k)}$ per angular frequency,

(11)

Let $\lambda_{k,1} \ge \lambda_{k,2} \ge \cdots$ and $u_{k,1}, u_{k,2}, \dots$ be the eigenvalues and eigenvectors of $S^{(k)}$, respectively, that is,

$S^{(k)} u_{k,j} = \lambda_{k,j}\, u_{k,j}.$ (12)
II-D. Eigenvalue Shrinkage
For data with independent coordinates and equal variances, previous works [27, 28, 29, 30] show that if the population spike $\ell$ is above the Baik-Ben Arous-Péché (BBP) phase transition, then the top sample eigenvalue pops out from the Marčenko-Pastur distribution of the “noise” eigenvalues. With aspect ratio $\gamma = p/n$, the top eigenvalue converges to the value given by the spike forward map:

$\lambda(\ell) = (1 + \ell)\left(1 + \frac{\gamma}{\ell}\right), \quad \ell > \sqrt{\gamma}.$ (13)
The underlying clean population covariance eigenvalues can be estimated by solving the quadratic equation in Eq. (13). Shrinking the eigenvalues helps to denoise the block sample covariance matrices [31]:

$\hat\ell(\lambda) = \frac{1}{2}\left(\lambda - 1 - \gamma + \sqrt{(\lambda - 1 - \gamma)^2 - 4\gamma}\right)$ for $\lambda > (1 + \sqrt{\gamma})^2$, and $0$ otherwise. (14)

The shrinker sets all noise eigenvalues to zero for $\lambda$ within the support of the Marčenko-Pastur distribution and reduces the other eigenvalues according to Eq. (14). Then the denoised covariance matrices are

$\hat S^{(k)} = \sum_j \hat\ell(\lambda_{k,j})\, u_{k,j}\, u_{k,j}^{*}.$ (15)
In the Gaussian standard spiked model, the empirical and true eigenvectors have an asymptotically deterministic angle: $|\langle \hat u, u \rangle| \to c(\ell)$ almost surely, where $c$ is the cosine forward map given by [29, 30]:

$c^2(\ell) = \dfrac{1 - \gamma/\ell^2}{1 + \gamma/\ell}$ for $\ell > \sqrt{\gamma}$, and $c^2(\ell) = 0$ otherwise. (16)

We denote by $\hat c$ an estimate of $c$ obtained by plugging the estimated clean covariance eigenvalues of Eq. (14) into Eq. (16), and we set $\hat s^2 = 1 - \hat c^2$.
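Under the standard spiked-model conventions used here (unit noise variance, aspect ratio $\gamma = p/n$), the forward map of Eq. (13), its quadratic inverse (the shrinker of Eq. (14)), and the squared cosine of Eq. (16) are short formulas. The following sketch is ours (function names are illustrative, and the exact conventions vary slightly across references):

```python
import numpy as np

def spike_forward(ell, gamma):
    """Spike forward map: population spike ell (> sqrt(gamma)) to the
    limiting top sample eigenvalue."""
    return (1.0 + ell) * (1.0 + gamma / ell)

def spike_inverse(lam, gamma):
    """Invert the forward map for lam above the Marchenko-Pastur edge;
    return 0 inside the bulk (noise eigenvalues are shrunk to zero)."""
    edge = (1.0 + np.sqrt(gamma)) ** 2
    if lam <= edge:
        return 0.0
    b = lam - 1.0 - gamma          # ell satisfies ell^2 - b*ell + gamma = 0
    return 0.5 * (b + np.sqrt(b * b - 4.0 * gamma))

def cos2(ell, gamma):
    """Limiting squared cosine between empirical and true spike eigenvectors."""
    if ell <= np.sqrt(gamma):
        return 0.0
    return (1.0 - gamma / ell**2) / (1.0 + gamma / ell)

gamma = 0.3
for ell in [0.8, 1.5, 4.0]:
    lam = spike_forward(ell, gamma)
    print(np.isclose(spike_inverse(lam, gamma), ell), 0.0 < cos2(ell, gamma) < 1.0)
```

The round trip `spike_inverse(spike_forward(ell))` recovers `ell` exactly above the phase transition, which is the sense in which Eq. (14) "solves the quadratic equation" in Eq. (13).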
II-E. Recoloring
Homogenization changes the direction of the clean eigenvectors. Therefore, after eigenvalue shrinkage, we recolor (heterogenize) the covariance matrix by conjugating it with the recoloring matrix. The recoloring matrix is evaluated with numerical integration,
(17) 
which has a block diagonal structure and is decoupled across the angular frequencies,
(18) 
Therefore, the recoloring step is also decoupled for each angular frequency subblock. The heterogenized covariance estimators are
(19) 
Similarly to Eq. (18), we define a matrix that will be used to scale the heterogenized covariance matrix estimator (see Eq. (21)) and to denoise the expansion coefficients (see Eqs. (23) and (24)),
(20) 
II-F. Scaling
The eigendecomposition of the heterogenized covariance matrix yields its empirical eigenvalues and eigenvectors. The empirical eigenvalues are biased estimates of the true eigenvalues. In [1, Sec. 4.2.3], a scaling rule was proposed to correct the bias. We extend it to the steerable case and scale each eigenvalue by a parameter,

(21)

where the parameter depends on the estimated clean eigenvalues and the cosine estimates. The scaled covariance matrices are
(22) 
The rotationally invariant covariance kernel is well approximated by the recolored and scaled covariance blocks, mapped back through the IFT of the Fourier-Bessel basis functions with the corresponding angular frequencies. The computational complexity of steerable ePCA is $O(nL^3 + L^4)$ for $n$ images of size $L \times L$, the same as steerable PCA, and it is lower than the complexity of ePCA, which is $O(nL^4 + L^6)$.
II-G. Denoising
As an application of steerable ePCA, we develop a method to denoise the photon-limited images. Since the Fourier-Bessel expansion coefficients are computed from the prewhitened images, we first recolor the coefficients by multiplying with the recoloring matrix, and then we apply Wiener-type filtering to denoise the coefficients. For the steerable basis expansion coefficients with angular frequency $k \neq 0$,
(23) 
For $k = 0$, we need to take into account the rotationally invariant mean expansion coefficients,
(24) 
The denoised images sampled on the Cartesian grid in the real domain are computed from the filtered expansion coefficients,
(25) 
where and .
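In coefficient space, a Wiener-type filter takes the familiar form $\hat x = \Sigma_x (\Sigma_x + \Sigma_n)^{-1} y$ per block. The following generic Gaussian sketch (our own illustration, not the exact recolored filter of Eqs. (23)-(24); all sizes and names are illustrative) shows that such filtering reduces the coefficient MSE:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5000, 8                       # samples, coefficients in one k-block

# Synthetic "clean" coefficients with a known covariance.
A = rng.normal(size=(d, d)) / np.sqrt(d)
Sig_x = A @ A.T                      # clean coefficient covariance
x = rng.multivariate_normal(np.zeros(d), Sig_x, size=n)
Sig_n = 0.5 * np.eye(d)              # coefficient noise covariance (assumed known)
y = x + rng.multivariate_normal(np.zeros(d), Sig_n, size=n)

# Wiener-type filter applied blockwise: x_hat = Sig_x (Sig_x + Sig_n)^{-1} y
H = Sig_x @ np.linalg.inv(Sig_x + Sig_n)
x_hat = y @ H.T

mse_raw = np.mean((y - x) ** 2)      # MSE of the unfiltered coefficients
mse_wien = np.mean((x_hat - x) ** 2) # MSE after Wiener-type filtering
print(mse_wien < mse_raw)  # True: filtering reduces the coefficient MSE
```

In the actual algorithm the clean-coefficient covariance blocks are the shrunken, recolored, and scaled estimates of Sec. II-D through II-F rather than a known matrix.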
III. Numerical Results
We apply steerable ePCA and ePCA to a simulated XFEL dataset and compare the results for covariance estimation and denoising. The algorithms are implemented in MATLAB on a machine with 60 cores, running at 2.3 GHz, with a total RAM of 1.5 TB. For steerable PCA and steerable ePCA, only 8 cores were used in our experiments. We simulate noiseless XFEL diffraction intensity maps of a lysozyme (Protein Data Bank 1AKI) with Condor [32]. The average pixel intensity is rescaled to be 0.01 for the given image size such that shot noise dominates [33]. To sample an arbitrary number of noisy diffraction patterns, we draw an intensity map at random, rotate it in-plane by a random angle drawn uniformly from $[0, 2\pi)$, and then sample the photon count of each detector pixel from a Poisson distribution whose mean is the pixel intensity. Figs. 1(a) and 1(b) illustrate the clean intensity maps and the resulting noisy diffraction patterns.
III-A. Covariance estimation and principal components
For experiments using ePCA, we use a permutation-based bootstrap method to estimate the rank of the covariance matrix, following, e.g., [34]. By randomly permuting each column of the mean-subtracted data matrix, we completely destroy structural information, including linear structure, while the noise statistics remain the same (see [35, 36] for an analysis). Singular values of the randomly permuted matrices reveal what should be the largest covariance matrix eigenvalues that correspond to noise, up to a user-selected confidence level. This can replace other rank estimation methods that assume a Gaussian distribution when the noise model is non-Gaussian, such as in our case. Empirically, we observe that this choice of confidence level gives the best performance in covariance estimation. For steerable ePCA, we estimate the number of components using the right edge of the Marčenko-Pastur distribution for the homogenized covariance matrices and include only the components whose scaling factors are above zero.

Steerable ePCA is able to recover more signal principal components from noisy images than PCA, steerable PCA, and ePCA (see Fig. 2(a)). At both sample sizes considered, the mean number of estimated components is larger for steerable ePCA than for ePCA. Fig. 2(b) shows that steerable ePCA is more efficient than PCA and ePCA. Because steerable ePCA contains extra steps such as prewhitening, recoloring, and scaling, its runtime is slightly larger than that of steerable PCA. At the largest sample size considered, steerable ePCA is 8 times faster than ePCA.
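The permutation rank estimate described above can be sketched as follows (a simplified version of the idea: all singular values are compared against the top singular value of the permutation null; the function name, alpha level, and test problem are ours):

```python
import numpy as np

def permutation_rank(Y, n_perm=20, alpha=0.05, rng=None):
    """Estimate the number of signal components by column-wise permutation.

    Permuting each column independently destroys linear structure but keeps
    per-column noise statistics; singular values exceeding the permutation
    null at level alpha are counted as signal."""
    rng = np.random.default_rng(rng)
    Yc = Y - Y.mean(axis=0)
    sv = np.linalg.svd(Yc, compute_uv=False)
    null_top = []
    for _ in range(n_perm):
        Yp = np.column_stack([rng.permutation(Yc[:, j]) for j in range(Y.shape[1])])
        null_top.append(np.linalg.svd(Yp, compute_uv=False)[0])
    thresh = np.quantile(null_top, 1.0 - alpha)
    return int(np.sum(sv > thresh))

# Quick check on a rank-2 signal plus i.i.d. noise.
rng = np.random.default_rng(3)
n, p, r = 400, 60, 2
L = rng.normal(size=(n, r)) @ (5.0 * rng.normal(size=(r, p)) / np.sqrt(p))
Y = L + rng.normal(size=(n, p))
print(permutation_rank(Y, rng=0))
```

With a strong rank-2 signal the two signal singular values sit far above the permutation null, so the estimate recovers the planted rank.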
Fig. 3 shows the top 12 eigenimages for clean XFEL diffraction patterns (Fig. 3(e)) and for noisy diffraction patterns with mean photon count 0.01 (Figs. 3(a)-3(d)) using PCA, steerable PCA, ePCA, and steerable ePCA. The true eigenimages in Fig. 3(e) are computed from 70000 clean diffraction patterns whose orientations are uniformly distributed over SO(3). Figs. 3(a)-3(d) are computed from noisy images. Since the number of samples is much smaller than the size of the image and the noise is non-Gaussian, PCA can only recover the first two or three components. ePCA improves the estimation and is able to extract the top 7 eigenimages. Moreover, steerable PCA and steerable ePCA achieve much better estimation of the underlying true eigenimages for a given sample size. Steerable ePCA achieves the best performance in estimating both the eigenvalues and the eigenimages.
Furthermore, we compare the operator norms and Frobenius norms of the difference between the covariance estimates and the true covariance matrix. Fig. 4 shows that steerable ePCA significantly improves the covariance estimation, especially when the sample size is small.
III-B. Denoising
We compare the denoising performance of ePCA and steerable ePCA by the mean squared error (MSE). We perform “ePCA denoising” using the empirical best linear predictor (EBLP) [1], which has been shown to outperform “PCA denoising,” i.e., orthogonal projection onto the sample or ePCA/heterogenized eigenimages, as well as the exponential family PCA method proposed by [37]. Note that in our implementation, to avoid inverting a singular matrix (when some coordinates have zero sample mean and sample variance), we add a small regularization term proportional to the identity matrix. The number of components is estimated by the permutation rank estimation described in the previous section.
Steerable ePCA is able to recover the images with lower MSE than PCA, steerable PCA, and ePCA (see Fig. 6), especially when the sample size is small. Fig. 5 shows examples of denoised images for two sample sizes. For robustness, we repeated the numerical experiment on another dataset simulated from the small protein chignolin (Protein Data Bank entry 1UAO) and obtained qualitatively similar results.
IV. Conclusion
We presented steerable ePCA, a method for principal component analysis of a set of low-photon-count images and their uniform in-plane rotations. This work has been mostly motivated by its application to XFEL imaging. The computational complexity of the new algorithm is $O(nL^3 + L^4)$, whereas that of ePCA is $O(nL^4 + L^6)$. Incorporating rotational invariance allows more robust estimation of the true eigenvalues and eigenvectors. Our numerical experiments showed that steerable ePCA is able to recover more signal components than ePCA and achieves better covariance estimation and denoising results. Finally, we remark that the Fourier-Bessel basis can be replaced with other suitable bases, for example, the 2D prolate spheroidal wave functions (PSWF) on a disk [38, 39].
Acknowledgments
The authors would like to thank Edgar Dobriban and Boris Landa for helpful discussions. ZZ is partially supported by a National Center for Supercomputing Applications Faculty Fellowship and the University of Illinois at Urbana-Champaign College of Engineering Strategic Research Initiative. AS is partially supported by Award Number R01GM090200 from the NIGMS, Award FA9550-17-1-0291 from AFOSR, a Simons Investigator Award, the Moore Foundation Data-Driven Discovery Investigator Award, and NSF BIGDATA Award IIS-1837992.
References
 [1] L. T. Liu, E. Dobriban, and A. Singer, “ePCA: High dimensional exponential family PCA,” Annals of Applied Statistics, vol. 12, no. 4, pp. 2121–2150, 2018.
 [2] Z. Zhao and A. Singer, “Rotationally invariant image representation for viewing direction classification in cryo-EM,” Journal of Structural Biology, vol. 186, no. 1, pp. 153–166, 2014.
 [3] Z. Zhao, Y. Shkolnisky, and A. Singer, “Fast steerable principal component analysis,” IEEE Transactions on Computational Imaging, vol. 2, no. 1, pp. 1–12, 2016.
 [4] V. Favre-Nicolin, J. Baruchel, H. Renevier, J. Eymery, and A. Borbély, “XTOP: high-resolution X-ray diffraction and imaging,” Journal of Applied Crystallography, vol. 48, no. 3, pp. 620–620, 2015.
 [5] F. R. N. C. Maia and J. Hajdu, “The trickle before the torrent—diffraction data from X-ray lasers,” Scientific Data, vol. 3, 2016.
 [6] S. H. W. Scheres, H. Gao, M. Valle, G. T. Herman, P. P. B. Eggermont, J. Frank, and J.-M. Carazo, “Disentangling conformational states of macromolecules in 3D-EM through likelihood optimization,” Nature Methods, vol. 4, no. 1, pp. 27–29, 2007.
 [7] N.-T. D. Loh and V. Elser, “Reconstruction algorithm for single-particle diffraction imaging experiments,” Phys. Rev. E, vol. 80, p. 026705, Aug 2009.
 [8] Z. Kam, “Determination of macromolecular structure in solution by spatial correlation of scattering fluctuations,” Macromolecules, vol. 10, no. 5, pp. 927–934, 1977.
 [9] D. K. Saldin, V. L. Shneerson, R. Fung, and A. Ourmazd, “Structure of isolated biomolecules obtained from ultrashort X-ray pulses: exploiting the symmetry of random orientations,” Journal of Physics: Condensed Matter, vol. 21, no. 13, 2009.
 [10] D. Starodub, A. Aquila, S. Bajt, M. Barthelmess, A. Barty, C. Bostedt, J. D. Bozek, N. Coppola, R. B. Doak, S. W. Epp et al., “Single-particle structure determination by correlations of snapshot X-ray diffraction patterns,” Nature Communications, vol. 3, p. 1276, 2012.
 [11] K. Pande, P. Schwander, M. Schmidt, and D. K. Saldin, “Deducing fast electron density changes in randomly orientated uncrystallized biomolecules in a pump–probe experiment,” Phil. Trans. R. Soc. B, vol. 369, no. 1647, p. 20130332, 2014.
 [12] R. P. Kurta, J. J. Donatelli, C. H. Yoon, P. Berntsen, J. Bielecki, B. J. Daurer, H. DeMirci, P. Fromme, M. F. Hantke et al., “Correlations in scattered X-ray laser pulses reveal nanoscale structural features of viruses,” Physical Review Letters, vol. 119, no. 15, p. 158102, 2017.
 [13] J. J. Donatelli, J. A. Sethian, and P. H. Zwart, “Reconstruction from limited singleparticle diffraction data via simultaneous determination of state, orientation, intensity, and phase,” Proceedings of the National Academy of Sciences, vol. 114, no. 28, pp. 7222–7227, 2017.
 [14] B. von Ardenne, M. Mechelke, and H. Grubmüller, “Structure determination from single molecule Xray scattering with three photons per image,” Nat Commun., vol. 9, no. 1, p. 2375, 2018.
 [15] K. Pande, J. J. Donatelli, E. Malmerberg, L. Foucar, C. Bostedt, I. Schlichting, and P. H. Zwart, “Ab initio structure determination from experimental fluctuation X-ray scattering data,” Proceedings of the National Academy of Sciences, vol. 115, no. 46, pp. 11772–11777, 2018. [Online]. Available: http://www.pnas.org/content/115/46/11772
 [16] I. Jolliffe, Principal Component Analysis. Wiley Online Library, 2002.
 [17] T. W. Anderson, An Introduction to Multivariate Statistical Analysis. Wiley New York, 2003.
 [18] R. D. Nowak and R. G. Baraniuk, “Wavelet-domain filtering for photon imaging systems,” IEEE Transactions on Image Processing, vol. 8, no. 5, pp. 666–678, 1999.
 [19] F. Luisier, T. Blu, and M. Unser, “A new SURE approach to image denoising: Interscale orthonormal wavelet thresholding,” IEEE Transactions on Image Processing, vol. 16, no. 3, pp. 593–606, 2007.
 [20] F. J. Anscombe, “The transformation of Poisson, binomial and negative-binomial data,” Biometrika, vol. 35, no. 3/4, p. 246, 1948.
 [21] J.-L. Starck, F. Murtagh, and J. M. Fadili, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge University Press, 2010.
 [22] T. Furnival, R. K. Leary, and P. A. Midgley, “Denoising time-resolved microscopy image sequences with singular value thresholding,” Ultramicroscopy, 2016.
 [23] Y. Cao and Y. Xie, “Low-rank matrix recovery in Poisson noise,” in Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on. IEEE, 2014, pp. 384–388.
 [24] M. Sonnleitner, J. Jeffers, and S. M. Barnett, “Local retrodiction models for photon-noise-limited images,” in SPIE Photonics Europe. International Society for Optics and Photonics, 2016, p. 98960V.
 [25] A. Klug and R. A. Crowther, “Three-dimensional image reconstruction from the viewpoint of information theory,” Nature, vol. 238, pp. 435–440, 1972.
 [26] L. Greengard and J. Lee, “Accelerating the Nonuniform Fast Fourier Transform,” SIAM Review, vol. 46, no. 3, pp. 443–454, 2004.
 [27] J. Baik, G. Ben Arous, and S. Péché, “Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices,” Annals of Probability, vol. 33, no. 5, pp. 1643–1697, 2005.
 [28] J. Baik and J. W. Silverstein, “Eigenvalues of large sample covariance matrices of spiked population models,” Journal of Multivariate Analysis, vol. 97, no. 6, pp. 1382–1408, 2006.
 [29] D. Paul, “Asymptotics of sample eigenstructure for a large dimensional spiked covariance model,” Statistica Sinica, vol. 17, no. 4, pp. 1617–1642, 2007.
 [30] F. BenaychGeorges and R. R. Nadakuditi, “The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices,” Advances in Mathematics, vol. 227, no. 1, pp. 494–521, 2011.
 [31] D. L. Donoho, M. Gavish, and I. M. Johnstone, “Optimal shrinkage of eigenvalues in the Spiked Covariance Model,” Annals of Statistics, vol. 46, no. 4, p. 1742, 2018.
 [32] M. F. Hantke, T. Ekeberg, and F. R. N. C. Maia, “Condor: A simulation tool for flash X-ray imaging,” Journal of Applied Crystallography, vol. 49, no. 4, pp. 1356–1362, 2016.
 [33] P. Schwander, D. Giannakis, C. H. Yoon, and A. Ourmazd, “The symmetries of image formation by scattering. II. Applications,” Opt. Express, vol. 20, no. 12, pp. 12827–12849, Jun 2012.
 [34] J. Landgrebe, W. Wurst, and G. Welzl, “Permutationvalidated principal components analysis of microarray data,” Genome Biology, vol. 3, no. 4, 2002.
 [35] E. Dobriban, “Permutation methods for factor analysis and PCA,” arXiv preprint arXiv:1710.00479, Oct. 2017.
 [36] E. Dobriban and A. B. Owen, “Deterministic parallel analysis: an improved method for selecting factors and principal components,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2018.
 [37] M. Collins, S. Dasgupta, and R. E. Schapire, “A generalization of principal component analysis to the exponential family,” Advances in Neural Information Processing Systems (NIPS), 2001.
 [38] D. Slepian, “Prolate spheroidal wave functions, Fourier analysis, and uncertainty–IV: extensions to many dimensions, generalized prolate spheroidal wave functions,” Bell System Technical Journal, vol. 43, pp. 3009–3057, 1964.
 [39] B. Landa and Y. Shkolnisky, “Steerable principal components for spacefrequency localized images,” SIAM Journal on Imaging Sciences, vol. 10, no. 2, pp. 508–534, 2017.