# Multi-Frequency Vector Diffusion Maps

We introduce multi-frequency vector diffusion maps (MFVDM), a new framework for organizing and analyzing high dimensional datasets. The new method is a mathematical and algorithmic generalization of vector diffusion maps (VDM) and other non-linear dimensionality reduction methods. MFVDM combines different nonlinear embeddings of the data points defined with multiple unitary irreducible representations of the alignment group elements that connect pairs of nodes in the graph. We illustrate the efficacy of MFVDM on synthetic data generated according to a random graph model and on a cryo-electron microscopy image dataset. The new method achieves better nearest neighbor search and alignment estimation than the state-of-the-art VDM and diffusion maps (DM) on extremely noisy data.

## 1 Introduction

Nonlinear dimensionality reduction methods, such as locally linear embedding (LLE) (Roweis & Saul, 2000), ISOMAP (Tenenbaum et al., 2000), Hessian LLE (Donoho & Grimes, 2003), Laplacian eigenmaps (Belkin & Niyogi, 2002, 2003), and diffusion maps (DM) (Coifman & Lafon, 2006), are invaluable tools for embedding complex data in a low dimensional space and for regression problems on graphs and manifolds. To this end, those methods assume that the high-dimensional data lie on a low dimensional manifold, and local affinities in a weighted neighborhood graph are used to learn the global structure of the data. Spectral clustering (Nadler et al., 2006; Von Luxburg, 2007), semi-supervised learning (Zhu, 2006; Goldberg et al., 2009; Yang et al., 2016), out-of-sample extension (Belkin et al., 2006), and image denoising (Gong et al., 2010; Singer et al., 2009) share similar geometrical considerations. Those techniques are either directly or indirectly related to the heat kernel for functions on the data. Vector diffusion maps (VDM) (Singer & Wu, 2012) generalize DM to define a heat kernel for vector fields on the data manifold. The corresponding adjacency matrix is based on edge weights and orthogonal transformations between connected nodes. Using the spectral decomposition of the matrix, VDM defines a metric for the data to indicate the closeness of the data points on the manifold. For some applications, the vector diffusion metric is beneficial, since it takes into account linear transformations, and as a result, it provides a better organization of the data. However, for extremely noisy data, VDM nearest neighbor search may fail to identify the true nearby points on the manifold. This results in shortcut edges that connect points with large geodesic distances on the manifold.

To address this issue, we introduce a new algorithm called multi-frequency vector diffusion maps (MFVDM) to represent and organize complex high-dimensional data, exhibiting a non-trivial group invariance. To this end, we augment VDM with multiple irreducible representations of the compact group to improve the rotationally invariant nearest neighbor search and the alignment estimation between nearest neighbor pairs, when the initial estimation contains a large number of outliers due to noise. Specifically, we define a set of kernels, denoted by $W_k$, using multiple irreducible representations of the compact alignment group indexed by integer $k$ and introduce the corresponding frequency-$k$-VDMs. The MFVDM is constructed by concatenating all the frequency-$k$-VDMs up to a cutoff $k_{\max}$. We use the new embeddings to identify nearest neighbors. The eigenvectors of the normalized $W_k$ are used to estimate the pairwise alignments between nearest neighbors. This framework also extends the mathematical theory of cryo-electron microscopy (EM) image analysis (Singer et al., 2011; Hadani & Singer, 2011; Giannakis et al., 2012; Schwander et al., 2012; Dashti et al., 2014). We show that MFVDM outperforms VDM and DM for data sampled from low-dimensional manifolds, when a large proportion of the edge connections are corrupted. MFVDM is also able to improve the nearest neighbor search and rotational alignment for 2-D class averaging in cryo-EM.

## 2 Preliminaries and Problem Setup

Given a dataset $x_i$ for $i = 1, \dots, n$, we assume that the data lie on or close to a low dimensional smooth manifold $\mathcal{X}$ of intrinsic dimension $d$. Suppose that $G$ is a compact Lie group, which has unitary irreducible representations according to the Peter-Weyl theorem. The data space $\mathcal{X}$ is closed under $G$ if for all $g \in G$ and all $x \in \mathcal{X}$, $g \cdot x \in \mathcal{X}$, where '$\cdot$' denotes the group action. The $G$-invariant distance between two data points is defined as

$$d_{ij} = \min_{g \in G} \| x_i - g \cdot x_j \|, \tag{1}$$

and the associated optimal alignment is

$$g_{ij} = \arg\min_{g \in G} \| x_i - g \cdot x_j \|. \tag{2}$$

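We assume that the optimal alignment $g_{ij}$ is unique and construct an undirected graph $\mathcal{G} = (V, E)$ based on the distances in (1) using the $\epsilon$-neighborhood criterion, i.e. $(i,j) \in E$ iff $d_{ij} < \epsilon$, or the $\kappa$-nearest neighbor criterion, i.e. $(i,j) \in E$ iff $j$ is one of the $\kappa$ nearest neighbors of $i$. The edge weights $w_{ij}$ are defined using a kernel function on the $G$-invariant distance, $w_{ij} = K_\sigma(d_{ij})$. For example, the Gaussian kernel leads to weights of the form

$$w_{ij} = K_\sigma(d_{ij}) = \exp\left( -\frac{\min_{g \in G} \| x_i - g \cdot x_j \|^2}{\sigma} \right). \tag{3}$$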
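As a rough illustration of (1)-(3), the sketch below assumes a toy setting that is not taken from the paper: each data point is a complex vector and a rotation by angle $\theta$ acts by multiplication with $e^{\imath\theta}$. Under that assumption the minimization over the group has a closed form. The function names and the value of `sigma` are hypothetical.

```python
import numpy as np

def invariant_distance(xi, xj):
    """Return (d_ij, theta_ij) minimizing ||xi - exp(1j*theta) * xj|| over theta.

    Toy assumption: SO(2) acts on complex vectors by a global phase.
    """
    inner = np.vdot(xj, xi)          # sum(conj(xj) * xi)
    theta = np.angle(inner)          # optimal alignment angle, as in (2)
    d2 = np.linalg.norm(xi) ** 2 + np.linalg.norm(xj) ** 2 - 2.0 * np.abs(inner)
    return np.sqrt(max(d2, 0.0)), theta

def gaussian_weight(d, sigma):
    """Edge weight as in (3): exp(-d^2 / sigma)."""
    return np.exp(-d ** 2 / sigma)

rng = np.random.default_rng(0)
xj = rng.standard_normal(8) + 1j * rng.standard_normal(8)
true_theta = 0.7
xi = np.exp(1j * true_theta) * xj    # xi is an exact rotated copy of xj

d, theta = invariant_distance(xi, xj)
w = gaussian_weight(d, sigma=1.0)
```

For an exact rotated copy, the recovered angle matches the rotation, the invariant distance is numerically zero, and the weight is close to one. For general group actions (for example, image rotation) the minimization would typically be done on a grid of angles instead.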

The resulting graph is defined on the quotient space $\mathcal{M} := \mathcal{X}/G$ and is invariant to the group transformation of the individual data points. Under certain conditions, the quotient space $\mathcal{M}$ is also a smooth manifold. We can identify each data point $x_i$ with a point $v_i \in \mathcal{M}$, and the dimension of $\mathcal{M}$ is lower than the dimension of $\mathcal{X}$. The unitary irreducible representation of a group element $g$ is denoted by $\rho_k(g)$. If $x_i$ and $x_j$ are close on the manifold, then the representation $\rho_k(g_{ij})$ of the optimal alignment $g_{ij}$ is an approximation of the local parallel transport operator (Singer et al., 2011; Singer & Wu, 2012).

Take cryo-EM imaging as an example: each image is a tomographic projection of a 3D object at an unknown orientation represented by a $3 \times 3$ orthogonal matrix $R = [R^1, R^2, R^3]$ satisfying $R^\top R = I$ and $\det R = 1$ (Singer et al., 2011; Hadani & Singer, 2011; Zhao & Singer, 2014). The viewing direction of each image can be represented as a point on the unit sphere, denoted by $v$ ($v = R^3$). The first two columns of the orthogonal matrix, $R^1$ and $R^2$, correspond to the lifted vertical and horizontal axes of the image in the tangent plane $T_v S^2$. Therefore, each image can be represented by a unit tangent vector on the sphere and the base manifold is $\mathcal{M} = S^2$. Images with similar $v$'s are identified as the nearest neighbors and they can be accurately estimated using (1) from clean images. Registering the centered images corresponds to in-plane rotationally aligning the nearest neighbor images according to (2).

In many applications, noise in the observational data affects the estimations of the $G$-invariant distances $d_{ij}$ and optimal alignments $g_{ij}$. This results in shortcut edges in the $\epsilon$-neighborhood graph or $\kappa$-nearest neighbor graph, which connect points on $\mathcal{M}$ where the underlying geodesic distances are large.

## 3 Algorithm

To address this issue of shortcut edges induced by noise, we extend VDM using multiple irreducible representations of the compact alignment group.

### 3.1 Affinity and mapping

We assume the initial graph $\mathcal{G}$ is given along with the optimal alignments on the connected edges. For simplicity and because of our interest in cryo-EM image classification, we focus on $G = SO(2)$ and we denote the optimal alignment angle by $\alpha_{ij}$. The corresponding frequency-$k$ unitary irreducible representation is $e^{\imath k \alpha_{ij}}$, where $\imath = \sqrt{-1}$. For points that are nearby on $\mathcal{M}$, the alignments should have cycle consistency under the clean case, for example, $k(\alpha_{ij} + \alpha_{jl} + \alpha_{li}) \approx 0 \mod 2\pi$ for integers $k$, if nodes $i$, $j$ and $l$ are true nearest neighbors. To systematically incorporate the alignment information and impose the consistency of alignments, for a given graph $\mathcal{G} = (V, E)$, we construct a set of $n \times n$ affinity matrices $W_k$,

$$W_k(i,j) = \begin{cases} w_{ij} e^{\imath k \alpha_{ij}} & (i,j) \in E, \\ 0 & \text{otherwise}, \end{cases} \tag{4}$$

where the edge weights $w_{ij}$ according to (3) are real, and $\alpha_{ij} = -\alpha_{ji}$ for all $(i,j) \in E$. At frequency $k$, the weighted degree of node $i$ is:

$$\deg(i) := \sum_{j:(i,j) \in E} |W_k(i,j)| = \sum_{j:(i,j) \in E} w_{ij}, \tag{5}$$

and the degree is identical through all frequencies. We define a diagonal degree matrix $D$ of size $n \times n$, where the $i$th diagonal entry is $D(i,i) = \deg(i)$.
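The construction in (4) and (5) can be sketched in a few lines of NumPy. This is a minimal dense-matrix illustration under assumed inputs (an edge list with `i < j`, one weight and one angle per edge); the function names and the tiny triangle graph are hypothetical, and a practical implementation would likely use sparse matrices.

```python
import numpy as np

def build_affinity(n, edges, weights, angles, k):
    """Dense W_k from (4): W_k(i,j) = w_ij * exp(1j*k*alpha_ij) on edges.

    edges   : list of (i, j) pairs, each undirected edge listed once
    weights : real weight w_ij for each edge
    angles  : alpha_ij for each edge; alpha_ji = -alpha_ij is filled in here
    """
    W = np.zeros((n, n), dtype=complex)
    for (i, j), w, a in zip(edges, weights, angles):
        W[i, j] = w * np.exp(1j * k * a)
        W[j, i] = w * np.exp(-1j * k * a)   # keeps W_k Hermitian
    return W

def degree(W):
    """Weighted degree (5): row sums of |W_k|, independent of k."""
    return np.abs(W).sum(axis=1)

# Tiny triangle graph with cycle-consistent alignments alpha_ij = a_i - a_j.
node_angles = np.array([0.3, 1.1, 2.0])
edges = [(0, 1), (1, 2), (0, 2)]
weights = [1.0, 0.5, 2.0]
angles = [node_angles[i] - node_angles[j] for i, j in edges]

W1 = build_affinity(3, edges, weights, angles, k=1)
W3 = build_affinity(3, edges, weights, angles, k=3)
deg1, deg3 = degree(W1), degree(W3)
cycle = W3[0, 1] * W3[1, 2] * W3[2, 0]   # phases cancel around a consistent cycle
```

Because the alignments here are cycle consistent, the product of the three entries around the triangle has zero phase at every frequency, which is the property MFVDM exploits.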

We construct the normalized matrix $A_k = D^{-1} W_k$, which is applied to complex vectors $z$ of length $n$, and each entry $z(i) \in \mathbb{C}$ can be viewed as a vector in $\mathbb{R}^2$. The matrix $A_k$ is an averaging operator for vector fields, i.e. $(A_k z)(i) = \frac{1}{\deg(i)} \sum_{j:(i,j) \in E} w_{ij} e^{\imath k \alpha_{ij}} z(j)$. In our framework, we define affinity between $i$ and $j$ by considering the consistency of the transformations over all paths of length $2t$ that connect $i$ and $j$. In addition, we also consider the consistencies in the transported vectors at frequency $k$ (see Fig. 1). Intuitively, this means $A_k^{2t}(i,j)$ sums the transformations of all length-$2t$ paths from $i$ to $j$, and a large value of $|A_k^{2t}(i,j)|$ indicates not only the strength of connection between $i$ and $j$, but also the level of consistency in the alignment along all connected paths.

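We obtain the affinity of $i$ and $j$ by observing the following decomposition:

$$A_k = D^{-1} W_k = D^{-1/2} \underbrace{D^{-1/2} W_k D^{-1/2}}_{S_k} D^{1/2}. \tag{6}$$

Since $S_k$ is Hermitian, it has a complete set of real eigenvalues $\lambda_1^{(k)}, \lambda_2^{(k)}, \dots, \lambda_n^{(k)}$ and eigenvectors $u_1^{(k)}, u_2^{(k)}, \dots, u_n^{(k)}$, where $\lambda_1^{(k)} \geq \lambda_2^{(k)} \geq \dots \geq \lambda_n^{(k)}$. We can express $S_k^{2t}(i,j)$ in terms of the eigenvalues and eigenvectors of $S_k$:

$$S_k^{2t}(i,j) = \sum_{l=1}^{n} \left( \lambda_l^{(k)} \right)^{2t} u_l^{(k)}(i) \overline{u_l^{(k)}(j)}. \tag{7}$$

Therefore the affinity of $i$ and $j$ at the $k$th frequency is given by

$$\left| S_k^{2t}(i,j) \right|^2 = \sum_{l,r=1}^{n} \left( \lambda_l^{(k)} \lambda_r^{(k)} \right)^{2t} u_l^{(k)}(i) \overline{u_r^{(k)}(i)} \, \overline{u_l^{(k)}(j)} u_r^{(k)}(j) = \left\langle V_t^{(k)}(i), V_t^{(k)}(j) \right\rangle, \tag{8}$$

which is expressed by an inner product between two vectors via the mapping $V_t^{(k)}$:

$$V_t^{(k)}: i \mapsto \left( \left( \lambda_l^{(k)} \lambda_r^{(k)} \right)^{t} \left\langle u_l^{(k)}(i), u_r^{(k)}(i) \right\rangle \right)_{l,r=1}^{n}. \tag{9}$$

We call this the frequency-$k$-VDM.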
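The identity in (8) can be checked numerically. The sketch below builds the full mapping (9) from the eigendecomposition of a Hermitian matrix and compares the Gram matrix of the mapping with the entrywise squared magnitude of the matrix power. The random Hermitian test matrix is a stand-in chosen for illustration, not a graph from the paper, and `frequency_k_vdm` is a hypothetical name.

```python
import numpy as np

def frequency_k_vdm(S, t):
    """Full frequency-k-VDM (9) for every node, from a Hermitian matrix S = S_k.

    Row i of the result is V_t^{(k)}(i), flattened over the index pairs (l, r).
    """
    lam, U = np.linalg.eigh(S)                  # real eigenvalues, unitary U
    scale = np.outer(lam, lam) ** t             # (lam_l * lam_r)^t, t an integer
    # entry (i, l, r) = (lam_l lam_r)^t * u_l(i) * conj(u_r(i))
    V = scale[None, :, :] * U[:, :, None] * U.conj()[:, None, :]
    return V.reshape(S.shape[0], -1)

# Hypothetical test matrix: random Hermitian S with spectrum scaled into [-1, 1].
rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
S = (B + B.conj().T) / 2
S /= np.abs(np.linalg.eigvalsh(S)).max()

t = 2
V = frequency_k_vdm(S, t)
S2t = np.linalg.matrix_power(S, 2 * t)
gram = V @ V.conj().T        # gram[i, j] = <V(i), V(j)>
```

Here `gram` agrees with `np.abs(S2t) ** 2` up to floating-point error, which is exactly the statement of (8).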

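Truncated mapping: Notice the matrices $I + S_k$ and $I - S_k$ are both positive semi-definite (PSD) due to the following property: for any $z \in \mathbb{C}^n$, we have

$$z^* (I \pm S_k) z = \sum_{(i,j) \in E} w_{ij} \left| \frac{z(i)}{\sqrt{\deg(i)}} \pm \frac{e^{\imath k \alpha_{ij}} z(j)}{\sqrt{\deg(j)}} \right|^2 \geq 0. \tag{10}$$

Therefore all eigenvalues $\{\lambda_l^{(k)}\}_{l=1}^{n}$ of $S_k$ lie within the interval $[-1, 1]$. Consequently, for large $t$, most terms $\left( \lambda_l^{(k)} \lambda_r^{(k)} \right)^{2t}$ in (8) are close to $0$, and $|S_k^{2t}(i,j)|^2$ can be well approximated by using only a few of the largest eigenvalues and their corresponding eigenvectors. Hence, we truncate the frequency-$k$-VDM mapping using a cutoff $m_k$ for each frequency $k$:

$$\hat{V}_t^{(k)}: i \mapsto \left( \left( \lambda_l^{(k)} \lambda_r^{(k)} \right)^{t} \left\langle u_l^{(k)}(i), u_r^{(k)}(i) \right\rangle \right)_{l,r=1}^{m_k}. \tag{11}$$

The affinity of $i$ and $j$ at the $k$th frequency after truncation is given by

$$\left| \hat{S}_k^{2t}(i,j) \right|^2 = \left\langle \hat{V}_t^{(k)}(i), \hat{V}_t^{(k)}(j) \right\rangle \approx \left| S_k^{2t}(i,j) \right|^2. \tag{12}$$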

Remark 1: The truncated mapping not only has the advantage of computational efficiency, but also enhances the robustness to noise since the eigenvectors with smaller eigenvalues are more oscillatory and sensitive to noise.

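Multi-frequency mapping: Consider the affinity in (8) for $k = 1, \dots, k_{\max}$. If $i$ and $j$ are connected by multiple paths with consistent transformations, the affinity $|\hat{S}_k^{2t}(i,j)|^2$ should be large for all $k$. Then we can combine multiple representations (i.e., combine multiple $k$) to evaluate the consistencies of the group transformations along connected paths. Therefore, a straightforward way is to concatenate the truncated mappings $\hat{V}_t^{(k)}$ for all $k = 1, \dots, k_{\max}$ as:

$$\hat{V}_t(i): i \mapsto \left( \hat{V}_t^{(1)}(i); \hat{V}_t^{(2)}(i); \dots; \hat{V}_t^{(k_{\max})}(i) \right), \tag{13}$$

called multi-frequency vector diffusion maps (MFVDM). We define the new affinity of $i$ and $j$ as the inner product of $\hat{V}_t(i)$ and $\hat{V}_t(j)$:

$$\left| \hat{S}^{2t}(i,j) \right|^2 := \sum_{k=1}^{k_{\max}} \left| \hat{S}_k^{2t}(i,j) \right|^2 = \sum_{k=1}^{k_{\max}} \left\langle \hat{V}_t^{(k)}(i), \hat{V}_t^{(k)}(j) \right\rangle = \left\langle \hat{V}_t(i), \hat{V}_t(j) \right\rangle. \tag{14}$$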

MFVDM systematically incorporates the cycle consistencies on the geometric graph across multiple irreducible representations of the transformation group elements (in-plane rotational alignments in this case, see Fig. 1). Using information from multiple irreducible group representations leads to a more robust measure of rotationally invariant similarity.

Remark 2: Empirically, we find the normalized mapping $\frac{\hat{V}_t(i)}{\|\hat{V}_t(i)\|}$ to be more robust to noise than $\hat{V}_t(i)$. A similar phenomenon was discussed in VDM (Singer & Wu, 2012). The normalized affinity is defined as

$$N_t(i,j) = \left\langle \frac{\hat{V}_t(i)}{\|\hat{V}_t(i)\|}, \frac{\hat{V}_t(j)}{\|\hat{V}_t(j)\|} \right\rangle. \tag{15}$$

Comparison with DM and VDM: Diffusion maps (DM) only consider scalar weights over the edges, and vector diffusion maps (VDM) only take into account consistencies of the transformations along connected edges using a single representation of $SO(2)$, i.e. $e^{\imath \alpha_{ij}}$. In this paper, we generalize VDM and use not only one irreducible representation, i.e. $k = 1$, but also higher order $k$ up to $k_{\max}$.

### 3.2 Nearest neighbor search and rotational alignment

In this section we introduce our method for joint nearest neighbor search and rotational alignment.

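Nearest neighbor search: Based on the extended and normalized mapping $\frac{\hat{V}_t(i)}{\|\hat{V}_t(i)\|}$, we define the multi-frequency vector diffusion distance $d_{\text{MFVDM},t}(i,j)$ between nodes $i$ and $j$ as

$$d^2_{\text{MFVDM},t}(i,j) = \left\| \frac{\hat{V}_t(i)}{\|\hat{V}_t(i)\|} - \frac{\hat{V}_t(j)}{\|\hat{V}_t(j)\|} \right\|_2^2 = 2 - 2 \left\langle \frac{\hat{V}_t(i)}{\|\hat{V}_t(i)\|}, \frac{\hat{V}_t(j)}{\|\hat{V}_t(j)\|} \right\rangle = 2 - 2 N_t(i,j), \tag{16}$$

which is the squared Euclidean distance between the normalized mappings of $i$ and $j$. We define the nearest neighbor for a node $i$ to be the node $j$ with smallest $d^2_{\text{MFVDM},t}(i,j)$. Similarly, for VDM and DM, we define the distances $d_{\text{VDM},t}$ and $d_{\text{DM},t}$, and perform the nearest neighbor search accordingly.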
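Putting (11), (13), (15) and (16) together, a nearest neighbor search could look roughly like the sketch below. It is an illustrative dense implementation, not the authors' code: `truncated_vdm`, `mfvdm_neighbors`, and the ring-graph test case with per-node frame angles are all assumptions made for the example, and the same cutoff `m` is used at every frequency.

```python
import numpy as np

def truncated_vdm(S, m, t):
    """Truncated frequency-k-VDM (11) using the m largest eigenvalues of S."""
    lam, U = np.linalg.eigh(S)                 # eigenvalues in ascending order
    lam, U = lam[-m:], U[:, -m:]
    scale = np.outer(lam, lam) ** t
    V = scale[None] * U[:, :, None] * U.conj()[:, None, :]
    return V.reshape(S.shape[0], -1)

def mfvdm_neighbors(W_of_k, k_max, m, t, n_neighbors):
    """Concatenate as in (13), normalize rows, and rank by the distance (16)."""
    blocks = []
    for k in range(1, k_max + 1):
        W = W_of_k(k)
        d = np.abs(W).sum(axis=1)              # degrees (5); assumes no isolated node
        S = W / np.sqrt(np.outer(d, d))        # S_k = D^{-1/2} W_k D^{-1/2}
        blocks.append(truncated_vdm(S, m, t))
    V = np.hstack(blocks)
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    N = (V @ V.conj().T).real                  # normalized affinity (15)
    dist2 = 2.0 - 2.0 * N                      # squared distance (16)
    np.fill_diagonal(dist2, np.inf)            # exclude each node itself
    return np.argsort(dist2, axis=1)[:, :n_neighbors], dist2

# Toy data: n nodes on a ring, each linked to the 2 nearest nodes on either side,
# with alignments alpha_ij = a_i - a_j from hypothetical per-node frame angles.
n = 20
rng = np.random.default_rng(0)
frame = rng.uniform(0.0, 2.0 * np.pi, n)

def ring_W(k):
    W = np.zeros((n, n), dtype=complex)
    for i in range(n):
        for s in (1, 2):
            j = (i + s) % n
            W[i, j] = np.exp(1j * k * (frame[i] - frame[j]))
            W[j, i] = np.conj(W[i, j])
    return W

nbrs, dist2 = mfvdm_neighbors(ring_W, k_max=3, m=3, t=1, n_neighbors=2)
```

On this clean ring the two nearest neighbors returned for node `i` are its immediate ring neighbors `i - 1` and `i + 1` (modulo `n`). For large graphs, a sparse eigensolver would replace the dense `eigh` call.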

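Rotational alignment: We notice that the eigenvectors of $S_k$ encode the alignment information between neighboring nodes, as illustrated in Fig. 2. Assume that two nodes $i$ and $j$ are located at the same base manifold point, for example, the same point on $S^2$, but their tangent bundle frames are oriented differently, with an in-plane rotational angle $\alpha_{ij}$. Then the corresponding entries of the eigenvectors are vectors in the complex plane and the following holds,

$$u_l^{(k)}(i) = e^{\imath k \alpha_{ij}} u_l^{(k)}(j), \quad \forall l = 1, 2, \dots, n. \tag{17}$$

When $i$ and $j$ are close but not identical, (17) holds approximately. Recalling Remark 1, due to the existence of noise, for each frequency $k$ we approximate the alignment using only the top $m_k$ eigenvectors. We then use weighted least squares to estimate $\alpha_{ij}$, which can be written as the following optimization problem:

$$\begin{aligned} \hat{\alpha}_{ij} &= \arg\min_{\alpha} \sum_{k=1}^{k_{\max}} \sum_{l=1}^{m_k} \left( \lambda_l^{(k)} \right)^{2t} \left| u_l^{(k)}(i) - e^{\imath k \alpha} u_l^{(k)}(j) \right|^2 \\ &= \arg\max_{\alpha} \sum_{k=1}^{k_{\max}} \left( \sum_{l=1}^{m_k} \left( \lambda_l^{(k)} \right)^{2t} u_l^{(k)}(i) \overline{u_l^{(k)}(j)} \right) e^{-\imath k \alpha} \\ &= \arg\max_{\alpha} \sum_{k=1}^{k_{\max}} \hat{S}_k^{2t}(i,j) e^{-\imath k \alpha}, \end{aligned} \tag{18}$$

where the maximization is understood over the real part of the sum. To solve this, we define a sequence $z$ and set $z(k)$ for $k = 1, \dots, k_{\max}$ to be

$$z(k) = \hat{S}_k^{2t}(i,j) = \sum_{l=1}^{m_k} \left( \lambda_l^{(k)} \right)^{2t} u_l^{(k)}(i) \overline{u_l^{(k)}(j)}. \tag{19}$$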

According to (19) and (18), the alignment angles $\hat{\alpha}_{ij}$ can be efficiently estimated by using an FFT on the zero-padded $z$ and identifying its peak. Due to the usage of multiple unitary irreducible representations of $SO(2)$, this approximation is more accurate and robust to noise than VDM. The improvement of the alignment estimation using higher order trigonometric moments is also observed in phase synchronization (Gao & Zhao, 2019).
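The FFT step can be sketched as follows. Placing $z(k)$ at index $k$ of a zero-padded array makes the discrete Fourier transform at bin $q$ equal to the objective in (18) evaluated at $\alpha = 2\pi q / T$, where $T$ is the padded length. The function name, the default grid size, and the synthetic noise-free input are assumptions for illustration.

```python
import numpy as np

def estimate_alignment(z, n_grid=4096):
    """Estimate alpha maximizing Re(sum_k z(k) * exp(-1j*k*alpha)) on a grid.

    z      : complex array holding z(1), ..., z(k_max) as in (19)
    n_grid : FFT length after zero-padding; must exceed k_max
    """
    padded = np.zeros(n_grid, dtype=complex)
    padded[1:len(z) + 1] = z                   # index k carries frequency k
    objective = np.fft.fft(padded).real        # value at alpha = 2*pi*q/n_grid
    q = int(np.argmax(objective))
    return 2.0 * np.pi * q / n_grid

# Synthetic input: z(k) = c_k * exp(1j*k*alpha_true) with positive weights c_k.
alpha_true = 1.0
k = np.arange(1, 11)
z = 0.9 ** k * np.exp(1j * k * alpha_true)

alpha_hat = estimate_alignment(z)
```

The resolution of the estimate is limited by the grid spacing $2\pi / T$, so a longer zero-padded length gives a finer angle estimate at a higher FFT cost.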

Computational complexity: Our joint nearest neighbor search and alignment algorithm is summarized in Alg. 1. The computational complexity is dominated by the eigen-decomposition: Computing the top eigenvectors of the sparse Hermitian matrices , for requires , where is the average number of non-zero elements in each row of (e.g. number of nearest neighbors). If we assume to use an identical truncation (i.e., for all ), and express the above in terms of the mapping dimension , then the complexity is . For large and moderate , the dominant term is , therefore MFVDM and VDM () could have similar computational complexity for generating the mapping. Moreover, MFVDM can be faster by parallelizing for each frequency . Next, searching for -nearest neighbors takes flops. The alignment step requires FFT of zero-padded of length , therefore identifying the alignments takes or .

## 4 Analysis

We use a probabilistic model to illustrate the noise robustness of our embedding using the top eigenvectors and eigenvalues of the $S_k$'s. We start with the clean neighborhood graph, i.e. $(i,j) \in E$ if $i$ is among $j$'s $\kappa$-nearest neighbors or $j$ is among $i$'s $\kappa$-nearest neighbors according to the $G$-invariant distances. We construct a noisy graph based on the following process starting from the existing clean graph edges: with probability $p$, the distance $d_{ij}$ is still small and we keep the edge between $i$ and $j$. With probability $1 - p$ we remove the edge $(i,j)$ and link $i$ to a random vertex, drawn uniformly at random from the remaining vertices that are not already connected to $i$. We assume that if the link between $i$ and $j$ is a random link, then the optimal alignment $\alpha_{ij}$ is uniformly distributed over $[0, 2\pi)$. Our model assumes that the underlying graph of links between noisy data points is a small-world graph (Watts & Strogatz, 1998) on the manifold, with edges being randomly rewired with probability $1 - p$. The alignments take their correct values for true links and random values for the rewired edges. The parameter $p$ controls the signal to noise ratio of the graph connection, where $p = 1$ indicates the clean graph.

The matrix $W_k$ is a random matrix under this model. Since the expected value of the random variable $e^{\imath k \alpha}$ vanishes for $k \neq 0$ when $\alpha$ is uniformly distributed, the expected value of the matrix $W_k$ is

$$\mathbb{E} W_k = p W_k^{\text{clean}}, \tag{20}$$

where $W_k^{\text{clean}}$ is the clean matrix that corresponds to $p = 1$, obtained in the case that all links and angles are set up correctly. At a single frequency $k$, the matrix $W_k$ can be decomposed into

$$W_k = p W_k^{\text{clean}} + R_k, \tag{21}$$

where $R_k$ is a random matrix whose elements are independent and identically distributed (i.i.d.) zero mean random variables with finite moments, since the elements of $R_k$ are bounded for every $k$. The top eigenvectors of $W_k$ approximate the top eigenvectors of $W_k^{\text{clean}}$ as long as the 2-norm of $R_k$ is not too large. Various bounds on the spectral norm of random sparse matrices are proven in (Khorunzhy, 2001; Khorunzhiy, 2003). This ensures the noise robustness for each frequency-$k$-VDM. Combining an ensemble of classifiers is able to boost the performance (Zhou, 2012). Across different frequencies, the entries $R_k(i,j)$ are dependent through the relations of the irreducible representations. We will provide detailed analysis across frequency channels in the future.
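The relation in (20) can be checked by simulation. The sketch below is a simplified Monte Carlo experiment on a small ring graph with unit weights: each clean edge is kept with probability `p`, otherwise it is replaced by a link to a random non-neighbor with a uniformly random angle, and the empirical mean of the resulting matrices is compared with `p` times the clean matrix. The graph, the sizes, and the function names are illustrative assumptions.

```python
import numpy as np

def rewire(n, clean_edges, clean_angles, p, rng):
    """One draw from the rewiring model: keep each clean edge with probability p,
    otherwise link i to a random non-neighbor with a uniformly random angle."""
    neighbors = {i: set() for i in range(n)}
    for i, j in clean_edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    noisy = {}
    for (i, j), a in zip(clean_edges, clean_angles):
        if rng.random() < p:
            noisy[(i, j)] = a
        else:
            candidates = [v for v in range(n) if v != i and v not in neighbors[i]]
            v = int(rng.choice(candidates))
            neighbors[i].add(v)
            neighbors[v].add(i)
            noisy[(i, v)] = rng.uniform(0.0, 2.0 * np.pi)
    return noisy

def affinity(n, edge_angles, k):
    """Unit-weight W_k as in (4) from a dict {(i, j): alpha_ij}."""
    W = np.zeros((n, n), dtype=complex)
    for (i, j), a in edge_angles.items():
        W[i, j] = np.exp(1j * k * a)
        W[j, i] = np.exp(-1j * k * a)
    return W

n, p, k, trials = 30, 0.6, 2, 1000
rng = np.random.default_rng(0)
frame = rng.uniform(0.0, 2.0 * np.pi, n)       # hypothetical per-node angles
clean_edges = [(i, (i + s) % n) for i in range(n) for s in (1, 2)]
clean_angles = [frame[i] - frame[j] for i, j in clean_edges]

W_clean = affinity(n, dict(zip(clean_edges, clean_angles)), k)
mean_W = sum(affinity(n, rewire(n, clean_edges, clean_angles, p, rng), k)
             for _ in range(trials)) / trials
max_dev = np.max(np.abs(mean_W - p * W_clean))
```

With these settings the largest entrywise deviation `max_dev` is small compared with the entries of `p * W_clean`, and it shrinks as `trials` grows, in line with (20).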

Spectral properties for $SO(3)$: Related to the application in cryo-EM image analysis, we assume that the data points $x_i$ are uniformly distributed over $SO(3)$ according to the Haar measure. The base manifold characterized by the viewing directions $v_i$'s is a unit two sphere $S^2$ and the pairwise alignment group is $SO(2)$. Then $e^{\imath k \alpha_{ij}}$ approximates the local parallel transport operator from $T_{v_j} S^2$ to $T_{v_i} S^2$, whenever $x_i$ and $x_j$ have similar viewing directions $v_i$ and $v_j$ that satisfy $\langle v_i, v_j \rangle \geq 1 - h$, where $h$ characterizes the size of the small spherical cap of the neighborhood. The matrices $W_k$ approximate the local parallel transport operators $T_h^{(k)}$, which are integral operators over $SO(3)$. We have the following spectral properties for the integral operators,

###### Theorem 1

The operator $T_h^{(k)}$ has a discrete spectrum $\lambda_l^{(k)}(h)$, $l \in \mathbb{N}$, with multiplicities equal to $2(l + k) - 1$, for every $h \in (0, 2]$. Moreover, in the regime $h \ll 1$, the eigenvalue $\lambda_l^{(k)}(h)$ has the asymptotic expansion

$$\lambda_l^{(k)}(h) = \frac{1}{2} h - \frac{k + (l - 1)(l + 2k)}{8} h^2 + O(h^3). \tag{22}$$

The proof of Theorem 1 is detailed in Appendix A.1 of (Gao et al., 2019b). Each eigenvalue $\lambda_l^{(k)}(h)$, as a function of $h$, is a polynomial of degree $l + k$. This extends Theorem 3 in (Hadani & Singer, 2011) to frequencies $k > 1$. The multiplicities of the eigenvalues can be seen in the last column of Fig. 3 and Fig. 12. A direct consequence of Theorem 1 is that the top spectral gap of $T_h^{(k)}$ for small $h$ can be explicitly obtained. When $h \ll 1$, the top spectral gap is $\lambda_1^{(k)}(h) - \lambda_2^{(k)}(h) \approx \frac{1 + k}{4} h^2$, which increases with the angular frequency. If we use the top $m_k = 2k + 1$ eigenvectors for the frequency-$k$-VDM, then from a perturbation analysis perspective, it is well known (see e.g. (Rohe et al., 2011; Eldridge et al., 2018; Fan et al., 2018) and the references therein) that the stability of the eigenmaps essentially depends on the top spectral gap. Therefore, we are able to jointly achieve more robust embedding and nearest neighbor search under a high level of noise or a large number of outliers. Moreover, we are not restricted to using only the top $2k + 1$ eigenvectors, and incorporating more eigenvectors can improve the results (Singer et al., 2011).
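To make the spectral gap statement concrete, the short derivation below subtracts the $l = 2$ expansion from the $l = 1$ expansion. It assumes that (22) reads $\lambda_l^{(k)}(h) = \frac{1}{2} h - \frac{k + (l-1)(l+2k)}{8} h^2 + O(h^3)$, which is how the flattened formula appears to parse.

```latex
\begin{align*}
\lambda^{(k)}_1(h) &= \tfrac{1}{2}h - \tfrac{k}{8}h^2 + O(h^3), \\
\lambda^{(k)}_2(h) &= \tfrac{1}{2}h - \tfrac{k + (2 + 2k)}{8}h^2 + O(h^3)
                    = \tfrac{1}{2}h - \tfrac{3k+2}{8}h^2 + O(h^3), \\
\lambda^{(k)}_1(h) - \lambda^{(k)}_2(h) &= \tfrac{(3k+2) - k}{8}h^2 + O(h^3)
                    = \tfrac{1+k}{4}h^2 + O(h^3).
\end{align*}
```

Under this reading the leading-order gap grows linearly in $k$, for example from $\tfrac{1}{2} h^2$ at $k = 1$ to $\tfrac{3}{2} h^2$ at $k = 5$.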

## 5 Experiments

### 5.1 Synthetic examples on 2 dimensional sphere and torus

We test MFVDM on two synthetic examples: the 2-D sphere $S^2$ and the torus $T^2$. For the first example, we simulate points $x_i$ uniformly distributed over $SO(3)$ according to the Haar measure. Each $x_i$ can be represented by a $3 \times 3$ orthogonal matrix $R_i$ whose determinant is equal to 1. The third column of the rotation matrices (denoted as $v_i$) forms a point on the manifold $S^2$,

$$S^2 = \{ v \in \mathbb{R}^3 : \|v\| = 1 \}. \tag{23}$$

The pairwise alignment $\alpha_{ij}$ is computed based on (2). The hairy ball theorem (Milnor, 1978) says that a continuous tangent vector field to the two dimensional sphere must vanish at some points on the sphere. Therefore, we cannot identify $\alpha_i \in [0, 2\pi)$ for $i = 1, \dots, n$, such that $\alpha_{ij} = \alpha_i - \alpha_j$, for all $i$ and $j$. As a result, we cannot globally align the tangent vectors. For the torus, we sample points uniformly distributed on the manifold, which are embedded in three dimensional space according to

$$T^2 = \begin{cases} x = (R + r \cos u) \cos v, \\ y = (R + r \cos u) \sin v, \\ z = r \sin u, \end{cases} \tag{24}$$

where $R$ and $r$ are the two radii of the torus and $u, v \in [0, 2\pi)$, and for each node $i$ we assign an angle $\alpha_i$ that is uniformly distributed in $[0, 2\pi)$. Due to the existence of a continuous vector field, we set the pairwise alignment $\alpha_{ij} = \alpha_i - \alpha_j$. For both examples, we connect each node with its top 150 nearest neighbors based on their geodesic distances on the base manifold, then noise is added on edges following the random graph model described in Sec. 4 with parameter $p$. Finally, we build the affinity matrix $W_k$ by setting weights $w_{ij} = 1$ on the edges, with $k = 1, \dots, k_{\max}$.
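A torus data set of this kind could be generated along the following lines. The sketch assumes that "uniformly distributed" means uniform with respect to surface area, which requires rejection sampling in $u$ because the area element of (24) is proportional to $R + r \cos u$. The radii, the sample size, and the function name are hypothetical and are not the values used in the experiments.

```python
import numpy as np

def sample_torus(n, R, r, rng):
    """Draw n points uniformly (with respect to surface area) on the torus (24).

    Requires R > r. Accepts a candidate u with probability (R + r cos u) / (R + r).
    """
    u = np.empty(0)
    while u.size < n:
        cand = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
        accept = rng.uniform(0.0, 1.0, 2 * n) < (R + r * np.cos(cand)) / (R + r)
        u = np.concatenate([u, cand[accept]])
    u = u[:n]
    v = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.stack([(R + r * np.cos(u)) * np.cos(v),
                     (R + r * np.cos(u)) * np.sin(v),
                     r * np.sin(u)], axis=1)

rng = np.random.default_rng(0)
R, r = 1.0, 0.25                       # hypothetical radii
pts = sample_torus(500, R, r, rng)

# Per-node angles and the globally consistent alignments alpha_ij = alpha_i - alpha_j.
node_angle = rng.uniform(0.0, 2.0 * np.pi, 500)
alpha = node_angle[:, None] - node_angle[None, :]
```

Every sampled point satisfies the implicit torus equation $(\sqrt{x^2 + y^2} - R)^2 + z^2 = r^2$, and the alignments are antisymmetric and cycle consistent by construction, unlike on the sphere.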

Parameter setting: For MFVDM, we set the maximum frequency and for each , we select top eigenvectors. For VDM and DM, we set the number of eigenvectors to be . In addition, we set random walk step size .

Spectral property on $S^2$: We numerically verify the spectrum of the graph connection Laplacian on $S^2$ for different frequencies $k$ and random rewiring parameter $p$. Smaller $p$ indicates more edges are corrupted by noise. Fig. 3 shows that the multiplicities of the eigenvalues of $S_k$ (the normalized matrix) agree with Theorem 1. The spectral gaps persist even when 80% of the edges are corrupted (see the right column of Fig. 3).

Multi-frequency vector diffusion distances on : Based on (16), Fig. 4 displays the normalized and truncated multi-frequency vector diffusion distances , vector diffusion distances , and diffusion distances between a reference point (marked in red) and others, on at (clean graph). Moreover, we increase the diffusion step size from to . In this clean case, all three distances are highly correlated to the geodesic distance. Specifically, MFVDM and VDM perform similarly.

To demonstrate the robustness to noise of , we compare , , and against the geodesic distance on in Fig. 5 at different noise levels. When , all the distances are highly correlated with the geodesic distance, e.g., small , , and all correspond to small geodesic distance. However at high noise level as or , both and become more scattered, while remains correlated with the geodesic distance. Here the random walk steps and the results are similar for or .

Nearest neighbor search and rotational alignment: We test the nearest neighbor search (NN search) and rotational alignment results on both sphere and torus, with different noise levels . As mentioned, one advantage of MFVDM is its robustness to noise. Even at a high noise level, the true affinity between nearest neighbors can still be preserved. In our experiments, for each node we identify its nearest neighbors.

We evaluate the NN search by the geodesic distance between each node and its nearest neighbors. A better method should find more neighbors with geodesic distance close to 0. In the top rows of Fig. 6 and Fig. 7 we show the histograms of such geodesic distances. Note that in the low noise regime, MFVDM, VDM and DM all perform well and MFVDM is slightly better. When the noise level increases, both VDM and DM perform poorly while MFVDM still works well. These comparisons show that MFVDM, which benefits from multiple irreducible representations, is very robust to noise.

We evaluate the rotational alignment estimation by computing the alignment errors for all pairs of nearest neighbors , where is the ground truth and is the estimation. In the bottom rows of Fig. 6 and Fig. 7, we show the histograms of such alignment errors. The results demonstrate that for a wide range of , i.e., , the MFVDM alignment errors are closer to than the baseline VDM. At , the VDM errors disperse between 0 to 180 degrees, whereas a large number of the alignment errors of MFVDM are still close to 0.

At each frequency , we individually perform NN search based on frequency--VDM and the corresponding affinity in (12). For the example, we find that all single frequency mappings achieve similar accuracies when ’s are identical (see Fig. 8). MFVDM combines those weak single frequency classifiers into a strong classifier to boost the accuracy of nearest neighbor search

Choice of parameters: The performance of MFVDM depends on two parameters: the maximum frequency cutoff and the number of top eigenvectors . We assume that ’s are the same for all frequencies, that is . In the top row of Fig. 10, we show the average geodesic distances between the nearest neighbor pairs identified by MFVDM, with different values of and . First, we fix and vary . The performance of MFVDM improves with increasing and plateaus when approaches 50 (see the upper left panel of Fig. 10). Then we fix and vary . The upper right panel of Fig. 10 shows that choosing achieves the best performance. Using a larger number of eigenvectors, i.e. , does not lead to higher accuracy in nearest neighbor search, because the eigenvectors of with small eigenvalues are more sensitive to noise and including them will reduce the robustness to noise of the mappings. In addition, we evaluate the performance of VDM and DM under varying number of eigenvectors in the bottom row of Fig. 10. VDM and DM also achieve the best performance at . Comparing the upper left and lower left panels of Fig. 10, we find that MFVDM greatly improves the nearest neighbor search accuracy of VDM when 90% of the true edges are rewired. Note that the solid blue line in the upper left panel of Fig. 10 corresponds to the best performance curve in the lower left panel of Fig. 10 (green line with ).

### 5.2 Application: Cryo-EM 2-D image analysis

MFVDM is motivated by the cryo-EM 2-D class averaging problem. In the experiments, protein samples are frozen in a very thin ice layer. Each image is a tomographic projection of the protein density map at an unknown random orientation. It is associated with a $3 \times 3$ rotation matrix $R_i$, where the third column of $R_i$ indicates the projection direction $v_i$, which can be realized by a point on $S^2$. Projection images $x_i$ and $x_j$ that share the same views look the same up to some in-plane rotation. The goal is to identify images with similar views, then perform local rotational alignment and averaging to denoise the image. Therefore, MFVDM is suitable to perform the nearest neighbor search and rotational alignment estimation.

In our experiment, we simulate projection images from a 3-D electron density map of the 70S ribosome (see Fig. 11). The orientations for the projection images are uniformly distributed over $SO(3)$ and the images are contaminated by additive white Gaussian noise at a signal-to-noise ratio (SNR) equal to 0.05. Note that such a high noise level is commonly observed in real experiments. In Fig. 11, we display samples of such clean and noisy images. We use fast steerable PCA (sPCA) (Zhao et al., 2016) and rotationally invariant features (Zhao & Singer, 2014) to initially identify the images of similar views and the in-plane rotational alignment angles according to (Zhao & Singer, 2014). Then we take the initial graph structure and the estimated optimal alignments as the input of Alg. 1.

In addition to DM, VDM, and MFVDM, we apply another type of kernel introduced in steerable graph Laplacian (SGL) (Landa & Shkolnisky, 2018), which is defined on image pairs considering all possible rotational alignments, to the image datasets. In Fig. 12, we present 30 smallest eigenvalues of the graph connection Laplacian . The spectral gaps are more prominent for both clean and noisy images with MFVDM. We set , , , and for MFVDM, VDM, and DM respectively. Although using SGL kernel achieves slightly better NN search, its performance on alignment estimation is worse than MFVDM (see Fig. 13).

## 6 Discussion

In the current probabilistic model, we only consider independent edge noise, i.e., the entries in $R_k$ for a fixed $k$ are independent. This does not cover the measurement scenarios in some applications. For example, in cryo-EM 2-D image analysis, each image is corrupted by independent noise. Therefore, the entries in $R_k$ become dependent since the edge connections and alignments are affected by the noise in each image node. Empirically, our new algorithm is still applicable and results in improved nearest neighbor search and rotational alignment estimation compared to the state-of-the-art VDM. We leave the analysis of node level noise to future work. In addition, there can be other approaches to define the multi-frequency mapping, such as a weighted average among different frequencies or majority voting. We will explore other ways to integrate multi-frequency information in the future.

The current analysis focuses on data points that are uniformly distributed on the manifold. For non-uniformly distributed data points, different normalization techniques introduced in DM (Coifman & Lafon, 2006; Zelnik-Manor & Perona, 2005) are needed to compensate for the non-uniform sampling density.

Since our framework is motivated by the cryo-EM nearest neighbor image search and alignment, we have so far only considered the compact manifold $\mathcal{M}$ where the intrinsic dimension is $d = 2$ and the local parallel transport operator can be well approximated by the in-plane rotational alignment of the images or the alignment of the local tangent bundles as discussed in VDM (Singer & Wu, 2012). In the future, we will extend the current algorithm to manifolds with higher intrinsic dimension and other compact group alignments with their corresponding irreducible representations, for example, the symmetric group, which is widely used in computer vision (Bajaj et al., 2018).

## 7 Conclusion

In this paper, we have introduced MFVDM for joint nearest neighbor search and rotational alignment estimation. The key idea is to extend VDM using multiple irreducible representations of the compact Lie group. Enforcing the consistency of the transformations at different frequencies allows us to achieve better nearest neighbor identification and accurately estimate the alignments between the updated nearest neighbor pairs. The approach is based on the spectral decomposition of multiple kernel matrices. We use random matrix theory and the rationale of ensemble methods to justify the robustness of MFVDM. Experimental results show the efficacy of our approach compared to the state-of-the-art methods. This general framework can be applied to many other problems, such as joint synchronization and clustering (Gao et al., 2019a) and multi-frame alignment in computer vision.

## References

• Bajaj et al. (2018) Bajaj, C., Gao, T., He, Z., Huang, Q., and Liang, Z. SMAC: Simultaneous mapping and clustering using spectral decompositions. In International Conference on Machine Learning, pp. 334–343, 2018.
• Belkin & Niyogi (2002) Belkin, M. and Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, 2002.
• Belkin & Niyogi (2003) Belkin, M. and Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 2003.
• Belkin et al. (2006) Belkin, M., Niyogi, P., and Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of machine learning research, 7(Nov):2399–2434, 2006.
• Coifman & Lafon (2006) Coifman, R. R. and Lafon, S. Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30, 2006.
• Dashti et al. (2014) Dashti, A., Schwander, P., Langlois, R., Fung, R., Li, W., Hosseinizadeh, A., Liao, H. Y., Pallesen, J., Sharma, G., Stupina, V. A., et al. Trajectories of the ribosome as a Brownian nanomachine. Proceedings of the National Academy of Sciences, 111(49):17492–17497, 2014.
• Donoho & Grimes (2003) Donoho, D. L. and Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591–5596, 2003.
• Eldridge et al. (2018) Eldridge, J., Belkin, M., and Wang, Y. Unperturbed: spectral analysis beyond Davis-Kahan. In Algorithmic Learning Theory, pp. 321–358, 2018.
• Fan et al. (2018) Fan, J., Wang, W., and Zhong, Y. An eigenvector perturbation bound and its application. Journal of Machine Learning Research, 18(207):1–42, 2018.
• Gao & Zhao (2019) Gao, T. and Zhao, Z. Multi-frequency phase synchronization. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 2132–2141, 2019.
• Gao et al. (2019a) Gao, T., Brodzki, J., and Mukherjee, S. The geometry of synchronization problems and learning group actions. Discrete & Computational Geometry, 2019a.
• Gao et al. (2019b) Gao, T., Fan, Y., and Zhao, Z. Representation theoretic patterns in multi-frequency class averaging for three-dimensional cryo-electron microscopy. arXiv preprint arXiv:1906.01082, 2019b.
• Giannakis et al. (2012) Giannakis, D., Schwander, P., and Ourmazd, A. The symmetries of image formation by scattering. I. theoretical framework. Optics express, 20(12):12799–12826, 2012.
• Goldberg et al. (2009) Goldberg, A., Zhu, X., Singh, A., Xu, Z., and Nowak, R. Multi-manifold semi-supervised learning. In Artificial Intelligence and Statistics, pp. 169–176, 2009.
• Gong et al. (2010) Gong, D., Sha, F., and Medioni, G. Locally linear denoising on image manifolds. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 265–272, 2010.
• Hadani & Singer (2011) Hadani, R. and Singer, A. Representation theoretic patterns in three-dimensional cryo-electron microscopy II—the class averaging problem. Foundations of Computational Mathematics, 11(5):589–616, 2011.
• Khorunzhiy (2003) Khorunzhiy, O. Rooted trees and moments of large sparse random matrices. In Discrete Mathematics and Theoretical Computer Science, pp. 145–154. Discrete Mathematics and Theoretical Computer Science, 2003.
• Khorunzhy (2001) Khorunzhy, A. Sparse random matrices: spectral edge and statistics of rooted trees. Advances in Applied Probability, 33(1):124–140, 2001.
• Landa & Shkolnisky (2018) Landa, B. and Shkolnisky, Y. The steerable graph Laplacian and its application to filtering image datasets. SIAM Journal on Imaging Sciences, 11(4):2254–2304, 2018.
• Milnor (1978) Milnor, J. Analytic proofs of the “hairy ball theorem” and the Brouwer fixed point theorem. The American Mathematical Monthly, 85(7):521–524, 1978.
• Nadler et al. (2006) Nadler, B., Lafon, S., Kevrekidis, I., and Coifman, R. R. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. In Advances in neural information processing systems, pp. 955–962, 2006.
• Rohe et al. (2011) Rohe, K., Chatterjee, S., and Yu, B. Spectral clustering and the high-dimensional stochastic block model. The Annals of Statistics, 39(4):1878–1915, 2011.
• Roweis & Saul (2000) Roweis, S. T. and Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
• Schwander et al. (2012) Schwander, P., Giannakis, D., Yoon, C. H., and Ourmazd, A. The symmetries of image formation by scattering. II. applications. Optics express, 20(12):12827–12849, 2012.
• Singer & Wu (2012) Singer, A. and Wu, H.-T. Vector Diffusion Maps and the Connection Laplacian. Communications on Pure and Applied Mathematics, 65(8):1067–1144, 2012.
• Singer et al. (2009) Singer, A., Shkolnisky, Y., and Nadler, B. Diffusion interpretation of nonlocal neighborhood filters for signal denoising. SIAM Journal on Imaging Sciences, 2(1):118–139, 2009.
• Singer et al. (2011) Singer, A., Zhao, Z., Shkolnisky, Y., and Hadani, R. Viewing angle classification of cryo-electron microscopy images using eigenvectors. SIAM Journal on Imaging Sciences, 4(2):723–759, 2011.
• Tenenbaum et al. (2000) Tenenbaum, J. B., De Silva, V., and Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science, 2000.
• Von Luxburg (2007) Von Luxburg, U. A tutorial on spectral clustering. Statistics and computing, 2007.
• Watts & Strogatz (1998) Watts, D. J. and Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440, 1998.
• Yang et al. (2016) Yang, Z., Cohen, W., and Salakhutdinov, R. Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning, pp. 40–48, 2016.
• Zelnik-Manor & Perona (2005) Zelnik-Manor, L. and Perona, P. Self-tuning spectral clustering. In Advances in neural information processing systems, pp. 1601–1608, 2005.
• Zhao & Singer (2014) Zhao, Z. and Singer, A. Rotationally invariant image representation for viewing direction classification in cryo-EM. Journal of structural biology, 186(1):153–166, 2014.
• Zhao et al. (2016) Zhao, Z., Shkolnisky, Y., and Singer, A. Fast steerable principal component analysis. IEEE transactions on computational imaging, 2(1):1–12, 2016.
• Zhou (2012) Zhou, Z.-H. Ensemble methods: foundations and algorithms. Chapman and Hall/CRC, 2012.
• Zhu (2006) Zhu, X. Semi-supervised learning literature survey. Computer Science, University of Wisconsin-Madison, 2(3):4, 2006.