Manifold Alignment with Feature Correspondence

09/30/2018 ∙ by Jay S. Stanley III, et al. ∙ Yale University 0

We propose a novel framework for combining datasets via alignment of their associated intrinsic dimensions. Our approach assumes that two datasets are sampled from a common latent space, i.e., they measure equivalent systems. Thus, we expect there to exist a natural (albeit unknown) alignment of the data manifolds associated with the intrinsic geometry of these datasets, which are perturbed by measurement artifacts in the sampling process. Importantly, we do not assume any individual correspondence (partial or complete) between data points. Instead, we rely on our assumption that a subset of data features have correspondence across datasets. We leverage this assumption to estimate relations between intrinsic manifold dimensions, which are given by diffusion map coordinates over each of the datasets. We compute a correlation matrix between diffusion coordinates of the datasets by considering graph (or manifold) Fourier coefficients of corresponding data features. We then orthogonalize this correlation matrix to form an isometric transformation between the diffusion maps of the datasets. Finally, we apply this transformation to the diffusion coordinates and construct a unified diffusion geometry of the datasets together. We show that this approach successfully corrects misalignment artifacts and enables data integration.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 8

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In many natural science settings, and in particular in biology, an often encountered problem is that while data are measured from the same system, is is collected by different equipment or in different days, where sensors are calibrated differently. This is often termed batch effect in biology and can include, for example, drastic variations between subjects, experimental settings, or even times of day when an experiment is conducted. In such settings, it is important to globally and locally align the datasets such that they can be combined for effective further analysis. Otherwise, measurement artifacts may dominate downstream analysis. For instance, clustering the data will group samples by measurement time or sensor used rather than by biological or meaningful differences between datapoints.

Recent works regard the two datasets as “views” of the same system, and construct a multiview diffusion geometry to analyze them [e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. However, most of these require at least partial bijection, if not full one, between views. Other approaches attempt to directly match data points, either in their ambient space or by local data geometry, and these can be very sensitive to differences in sampling density rather than data geometry [11]. Here, we present a principled approach called harmonic alignment for correcting this type of effect based on the popular manifold assumption.

The manifold assumption holds that high dimensional data originates from an intrinsically low dimensional smoothly varying space that is mapped via nonlinear functions to observable high dimensional measurements. Thus, we assume that the datasets are from transformed versions of the same low dimensional manifold. We learn the manifolds separately from the two datasets using diffusion geometric approaches, and then find an isometric transformation to map from one manifold to the other. Note that we are not aligning points to points. Indeed, there may be sampling differences and density differences in the data. However, our manifold learning approach uses an anisotropic kernel that detects the geometry of the data for alignment purposes, rather than working by point-by-point matching as done in previous methods. Our method involves first embedding each dataset separately into harmonic diffusion components, and then finding an isometric transformation that aligns these diffusion representations. To find such transformation, we utilize the duality between diffusion coordinates and geometric harmonics that act as generalized Fourier harmonics in graph (or manifold) spaces. These harmonic diffusion components are eigenvectors of a Markov-normalized data diffusion operator, whose corresponding eigenvalues encode frequencies of these eigenvectors. We attempt to find a transformation from one set of eigenvectors to another, via feature correspondences in the data.

While data point correspondences may be difficult or impossible to obtain, since many biological measurements are destructive, feature correspondences are often available. For instance, single-cell experiments performed on the same device will have counts from the same genes as their features, even though the measurements are typically affected by batch differences. When these corresponding features are transformed via the graph Fourier transform (GFT) into diffusion components, their representations should be similar, with potentially small frequency-proximal perturbations. For example, the GFT of slowly varying features across the manifold should be relatively localized to the low-frequency harmonics. This insight allows us to create a correlation matrix between the graph harmonics of one dataset to another based on correlation between the GFT of features. Since we are only interested in finding frequency-proximal correlations (i.e., we do not want to correlate high frequency harmonics with low frequency harmonics between datasets), we need not compute the entire correlation matrix but rather only its near-diagonal values. We then find the closest orthogonal approximation to our correlation matrix, yielding an isometric linear transformation that maps the diffusion representation of one space into those of the other. This transformation allows us to align the two datasets with each other. Finally, given an aligned representation, we build a robust unified diffusion geometry that is invariant to batch effects and sample-specific artifacts, and further low-pass filter this geometry to denoise the unified manifold. Thus, in addition to aligning data manifolds, our method also denoises them along the way.

We demonstrate the results of our method on artificial manifolds created from rotated MNIST digits, corrupted MNIST digits, and single-cell biological data measuring peripheral blood cells. In each case, our method successfully aligns the data manifolds such that they have appropriate neighbors both within and across the two datasets. Further, we show an application of our approach in transfer learning by applying a lazy classifier to one unlabeled dataset based on labels provided by another dataset (with batch effects between them), and compare the classification accuracy before and after alignment. Finally, comparisons with recently developed methods such as the MNN-based method from 

Haghverdi et al. [11] show significant improvements in performance and denoising ability are achieved by our harmonic alignment methods.

2 Harmonic alignment

A typical and effective assumption in machine learning is that high dimensional data originates from an intrinsic low dimensional manifold that is mapped via nonlinear functions to observable high dimensional measurements; this is commonly referred to as the manifold assumption. Formally, let

be a hidden dimensional manifold that is only observable via a collection of nonlinear functions that enable its immersion in a high dimensional ambient space as from which data is collected. Conversely, given a dataset of high dimensional observations, manifold learning methods assume its data points originate from a sampling of the underlying manifold via , , and aim to learn a low dimensional intrinsic representation that approximates the manifold geometry of  [see, for example, 12, 13, 14, 15, and references therein].

A popular approach towards this manifold learning task is to construct diffusion geometry from data [16], and then embed data points into diffusion coordinates that provide a natural global coordinate system derived from Laplace operator over manifold geometries, as explained in Section 2.1. However, this approach, as well as other manifold learning ones, implicitly assumes that the feature functions represent data collection technologies (e.g., sensors or markers) that operate in a consistent manner on all samples in . While this assumption may be valid in some fields, biological data collection (and more generally, data collection in natural sciences) is often highly affected by a family of phenomena known as batch effects

, which introduce nonnegligible variance between different data batches due to various uncontrollable factors, as explained in Section 

1. These include, for example, drastic variations between subjects, experimental settings, or even times of day when an experiment is conducted.

Therefore, in such settings, one should consider a collection of samples , each originating from feature functions that aim to measure the same quantities in the data, but are also affected by sample-dependent artifacts. While each sample can be analyzed to find its intrinsic structure, their union into a single dataset often yields an incoherent geometry biased by batch effects, where neither the relations between samples or within each sample can be clearly seen. To address such artifacts, and construct a unified geometry of multiple batches (i.e., samples or datasets) together, we propose to first embed each batch separately in diffusion coordinates, and then find an isometric transformation that aligns these diffusion representations.

In order to find such transformation, we utilize the duality between diffusion coordinates and geometric harmonics that act as generalized Fourier harmonics, as shown in graph signal processing [17]. As explained in Section 2.2, this duality allows us to capture cross-batch correlations between diffusion coordinates, and orthogonalize the resulting correlation matrix to provide a map between batch-specific diffusion representations. Finally, given an aligned representation, we build a robust unified diffusion geometry that is invariant to both batch effects and batch-specific artifacts. While our approach generalizes naturally to any number of batches, for simplicity, we focus our formulation here on the case of two batches. A succinct summary of these steps is presented in Algorithm 1.

2.1 Diffusion geometry & graph harmonics

The first step in our approach is to capture the intrinsic geometry of each batch using the diffusion maps method from Coifman and Lafon [16], which non-linearly embeds the data in a new coordinate system (i.e., diffusion coordinates) that is often considered as representing a data manifold, or more generally a diffusion geometry, over the data. We note that as we are only considering an individual batch in this section, we may drop the superscript batch index notation from time to time, for simplicity, when its existence is clearly implied from context.

The diffusion maps construction starts by considering local similarities defined via a kernel , that capture local neighborhoods in the data. We note that a popular choice for is the Gaussian kernel , where

is interpreted as a user-configurable neighborhood size. Next, these similarities are normalized to defined transition probabilities

that are organized in an

row stochastic matrix

that describes a Markovian diffusion process over the intrinsic geometry of the data. Finally, a diffusion map is defined by taking the eigenvalues and (corresponding) eignevectors of , and mapping each data point to an

dimensional vector

, where represents a diffusion-time (i.e., number of transitions considered in the diffusion process). In this work, we denote the diffusion map for the entire batch as . We refer the reader to Coifman and Lafon [16] for further details and mathematical derivation, but note that in general, as increases, most of the eigenvalue weights , , become numerically negligible, and thus truncated diffusion map coordinates (i.e., using only nonnegligible ones) can be used for dimensionality reduction purposes.

Much work has been done in various fields on applications of diffusion maps as a whole, as well as individual diffusion coordinates (i.e., eigenvectors of ), in data analysis [e.g., 18, 19, 20, 21, 22]. In particular, as discussed in Coifman and Lafon [16] and Nadler et al. [23], the diffusion coordinates are closely related to eigenvectors of Laplace operators on manifolds, as well as their discretizations as eigenvectors of graph Laplacians, which were studied previously, for example, in Belkin and Niyogi [24]. Indeed, the similarities measured in can be considered as determining edge weights of a graph structure defined over the data. Formally, we define this graph by considering every data point in as a vertex on the graph, and then defining weighted edges between them via an adjacency matrix with , . Then, the (normalized) graph Laplacian is defined as with being a diagonal degree matrix (i.e., with ). Finally, it is easy to see that , and thus it can be verified that the eigenvectors of can be written as , with corresponding eigenvalues . It should be noted that if data is uniformly sampled from a manifold [as considered in 24]

, these two sets of eigenvectors coincide and the diffusion coordinates can be considered as Laplacian eigenvectors (or eigenfunctions, in continuous settings).

A central tenet of graph signal processing is that the Laplacian eigenfunctions can be regarded as generalized Fourier harmonics [17, 25], i.e., graph harmonics. Indeed, a classic result in spectral graph theory shows that the discrete Fourier basis can be derived as Laplacian eigenvectors of the ring graphs [see, e.g., 26, Proposition 10]. Based on this interpretation, a graph Fourier transform (GFT) is defined on graph signals (i.e., functions over the vertices of the graph) as , , similar to the definition of the classic discrete Fourier transform (DFT). Further, we can also write the GFT in terms of the diffusion coordinates as , given their relation to graph harmonics. Therefore, up to appropriate weighting, the diffusion coordinates can conceptually be interpreted as intrinsic harmonics of the data, and conversely, the graph harmonics can be considered (conceptually) as intrinsic coordinates of data manifolds. In Section 2.2, we leverage this duality between coordinates and harmonics in order to capture relations between data manifolds of individual batches, which are then used in Section 2.3 to align these intrinsic coordinates and construct a unified data manifold over them.

0:  Datasets where has observations by features
0:  Aligned graph Laplacian .
1:  for  do
2:     Compute the anisotropic weight matrix (Equation 1) and degree matrix
3:     Construct the normalized graph Laplacian and its truncated eigensystem , , with and
4:     Compute the diffusion map
5:

     The spectral domain wavelet transform tensor

(Equation 2).
6:  end for
7:  Compute intraband harmonic correlations between each dataset (Section 2.2).
8:  Compute the total interband correlation
9:  Orthogonalize via SVD,
10:  Construct the transformed matrix .
11:  Embed using a Gaussian kernel to obtain .
Algorithm 1 Manifold Alignment via Feature Correspondence

2.2 Cross-graph harmonic correlation

We now turn our attention to considering the relation between two batches via their their intrinsic data manifold structure, as it is captured by diffusion coordinates or, equivalently, graph harmonics. We note that, as discussed extensively in Coifman and Lafon [16], a naïve construction of an intrinsic data graph with a Gaussian kernel (as described, for simplicity, in Section 2.1) may be severely distorted by density variations in the data. Such distortion would detrimental in our case, as the resulting diffusion geometry and its harmonic structure would not longer reflect a stable (i.e., batch-invariant) intrinsic “shape” of the data. Therefore, we follow the normalization suggested in there to separate data geometry from density, and define a graph structure (i.e., adjacency matrix) over each batch via an anistotropic kernel given by

(1)

where is the previously defined Gaussian kernel. Further, we note that, in practice, we chose to remove self-edges (i.e., set ), in order to conform with standard settings in graph signal processing, but this has no effect on the derivation of the proposed method in this section. The resulting graph structure is then used, as previously described, to construct the intrinsic harmonic structure given by on each batch.

While the intrinsic geometry constructed by our graph structures should describe similar “shapes” for the two batches, there is no guarantee that their computed intrinsic coordinates will match. Indeed, it is easy to see how various permutations of these coordinates can be obtained if some eigenvalues have multiplicities greater than one (i.e., their monotonic order is no longer deterministic), but even beyond that, in practical settings batch effects often result in various misalignments (e.g., rotations and other affine transformations) between derived intrinsic coordinates. Therefore, to properly recover relations between multiple batches, we aim to quantify relations between their coordinates, or more accurately, between their graph harmonics.

We note that if we had even a partial overlap between data points in the two batches, this task would be trivially enabled by taking correlations between these harmonics. However, given that here we assume a setting without such predetermined overlap, we have to rely on other properties that are independent of individual data points. To this end, we now consider the feature functions and our initial assumption that corresponding functions aim to measure equivalent quantities in the batches (or datasets). Therefore, while they may differ in the original raw form, we expect their expression over the intrinsic structure of the data (e.g., as captured by GFT coefficients) to correlate, at least partially, between batches. Therefore, we use this property to compute cross-batch correlations between graph harmonics based on the GFT of corresponding data features. To formulate this, it is convenient to extend the definition of the GFT from functions (or vectors) to matrices, by slight abuse of the inner product notation, as , where consists of data features as columns and has graph harmonics as columns (both with rows representing data points).

Notice that the resulting Fourier matrix , for each batch, no longer depends on individual data points, and instead it expresses the graph harmonics in terms of data features. Therefore, we can now use this matrix to formulate cross-batch harmonic correlations by considering inner products between rows of these matrices. Further, we need not consider all the correlations between graph harmonics, since we also have access to their corresponding frequency information, expressed via the associated Laplacian eigenvalues . Therefore, instead of computing correlations between every pair of harmonics across batches, we only consider them within local frequency bands, defined via appropriate graph filters, as explained in the following.

Let be a smooth window defined on the interval as . Then, by translating this window along the along the real line, we obtain equally spaced wavelet windows that can be applied to the eigenvalues in order to smoothly partition the spectrum of each data graph. This construction is known as the itersine filter bank, which can be shown to be a tight frame [27]. The resulting windows are centered at frequencies . The generating function for these wavelets ensures that each halfway overlaps with . This property implies that there are smooth transitions between the weights of consecutive frequency bands. Furthermore, as a tight frame, this filter bank has the property that for any Laplacian eigenvalue . This, in turn, ensures that any filtering we do using the filter bank will behave uniformly across the spectrum. Together, these two properties imply that cross-batch correlations between harmonics within and between bands across the respective batch spectra will be robust. To obtain such band-limited correlations we construct the following filter bank tensor

(2)

Each of this matrix corresponds to the Fourier matrix with rows scaled by . Then, we use these filter-bank tensors to compute band-limited correlations via , and finally merge these to generate a combined matrix , which we refer to as the harmonic (cross-batch) correlation matrix. This step, when combined with the half-overlaps discussed above, allows flexibility in aligning harmonics across bands, which is demonstrated in practice in Section 3.

2.3 Isometric alignment

Given the harmonic correlation matrix , we now define an isometric transformation between the intrinsic coordinate systems of the two data manifolds representing and . Such transformation ensures our alignment fits the two coordinate systems together without breaking the rigid structure of each batch, thus preserving their intrinsic structure. To formulate such transformation, we recall that isometric transformations are given by orthogonal matrices, and thus we can rephrase our task as finding the best approximation of

by an orthogonal matrix. Such approximation is a well studied problem, dating back to 

Schönemann [28]

, which showed that it can be obtained directly from the singular value decomposition

by taking .

Finally, given the isometric transformation defined by , we can now align of the data manifolds of two batches, and define a unified intrinsic coordinate system for the entire data in . While such alignment could equivalently be phrased in terms of either diffusion coordinates or harmonic coordinates , we opt here for the latter, as they relates more directly to the computed harmonic correlations. Therefore, we construct the transformed embedding matrix as

(3)

where we drop the superscript for (as they are clear from context), are diagonal matrices that consists of the nonzero Laplacian eigenvalues of each view, and consist of the corresponding eigenvectors (i.e., harmonics) as its columns. We note that the truncated zero-valued eigenvalues correspond to zero frequencies (i.e., flat constant harmonics), and therefore they only encode global shifts that we anyway aim to remove in the alignment process. Accordingly, this truncation is also applied to the harmonic correlation matrix prior to its orthogonalization, in order to ensure the orthogonality of as a transformation on truncated harmonic coordinates. Finally, we note that this construction in terms of Laplacian eigenpairs is equivalent to a diffusion map, albeit using a slightly different derivation of a discretized heat kernel [popular, for example, in graph signal processing works such as 25], with the parameter again serving an analogous purpose to diffusion time.

3 Empirical results

3.1 Rotational Alignment

As a first proof of concept, we first demonstrate harmonic alignment of two circular manifolds. To generate these manifolds, we rotated two different MNIST examples of the digit ‘3’ in a full circle (i.e., consisting of degrees), and sampled a point for each degree (see Figure (a)a). As we noted in Section 2, the manifold coordinates obtained by diffusion maps capture intrinisic structure that should be invariant to the phase of this rotation (e.g., the starting angle). However, as seen in this example, it is clear that the two generated manifolds are out of phase with one another, even in their respective diffusion coordinates.

DM2

DM1

DM2

DM1

(a)

ConcatenatedEmbedding

DM2

DM1

DM3

Transformation

DM2

DM1

DM3

Aligned Embedding

Unaligned Embedding

Nearest Neighbors

(b)
Figure 3: Alignment of circular manifolds. LABEL:sub@subfig:rotatedcircles A circular manifold was generated by sampling 360 points per batch from two MNIST examples of the digit ‘3’ (each serving as a batch) rotated in a complete circle. A 2D diffusion geometries obtained from each digit is circular, but the phase of the circle (i.e., rotation) is arbitrary. LABEL:sub@subfig:rotcircles_dm Top: An alignment is obtained by rotating the two embeddings into the latent space of the other. Only three dimensions are displayed here, but alignment is performed based on all nonzero-frequency harmonics. Bottom: The nearest neighbors of a given point in the unaligned embedding are within-sample. Alignment creates connections between samples that are faithful to the phase of the digit, as seen by the presence of out-of-sample nearest neighbors.

Figure (b)b demonstrates the simple rotation that is learned by harmonic alignment between the two embeddings. On the left side, we see the out-of-phase embeddings. Taking nearest neighbors in this space illustrates the disconnection between the embeddings: nearest neighbors are only formed for within-sample points. After alignment, however, we see that the embeddings are in phase with each other because nearest neighbors in the aligned space span both samples and are in the same orientation with each other.

3.2 Feature Corruption

k-Neighborhood recovery after Feature Corruption

Uncorrupted Features

5-Nearest Neighbor classification accuracy
(a)

Input

Corruption

NoAlignment

HarmonicAlignment

Recovery

5-Nearest Neighbor classification accuracy
(b)
Figure 6: Recovery of k-neighborhoods under feature corruption. LABEL:sub@subfig:KNNrecovery At each iteration, two sets and of points were sampled from MNIST. was then distorted by a corruption matrix for various identity percentages (see section 3.2). Subsequently, a lazy classification scheme was used to classify points in using a nearest neighbor vote from . Results for harmonic alignment with different filterbank sizes, mutual nearest neighbors (MNN), and classification without alignment are shown. LABEL:sub@subfig:KNNrecons Reconstruction of digits with only 25% uncorrupted features. Left: Input digits. Left middle: 75% of the pixels in the input are corrupted. Right middle: Reconstruction without harmonic alignment. Right: Reconstruction after harmonic alignment.

Next, we assess the ability of harmonic alignment in recovering -nearest neighborhoods after random feature corruption (see Figure  6). To do this, we drew random samples from MNIST to get batch datasets and , both of size of data points (i.e., digit images). Then, for each trial in this experiment we drew

samples from a unit-variance Normal distribution to create a

random matrix. We orthogonalized this matrix to yield the corruption matrix . To vary the amount of feature corruption, we produced partial corruption matrices (for several values of ) by randomly substituting (i.e., ) of the columns in with columns of the identity matrix. Right multiplication of by these matrices yields corrupted images with only preserved pixels (see Figure (b)b, ‘corrupted’). To assess the recovery of -nearest neighborhoods, we then performed a lazy classification on data points (i.e., rows) in by only using the labels of neighbors from . The results of this experiment, performed for are reported in Figure (a)a. For robustness, at each we sampled three different non-overlapping pairs , and for each pair we sampled three matrices, each with random identity columns, for a total of nine trials per .

In general, both unaligned mutual nearest neighbors (MNN) and harmonic alignment with any filter set cannot recover -nearest neighborhoods under total corruption; note that in our case, it clearly violates our (partial) feature correspondence assumption. Therefore, we start our analysis with overlap, which achieves 10% accuracy, essentially giving the baseline (random chance) accuracy given that MNIST has ten classes. However, when using sufficiently many bandlimited filters (see Section 2.2), our harmonic alignment quickly recovers over accuracy and consistently outperforms both MNN and unaligned classifications, except under under high correspondence (i.e., when ).

Next we examined the ability of harmonic alignment to reconstruct corrupted data (see Figure (b)b). We performed the same corruption procedure as before with and selected ten examples of each digit in MNIST. We show the ground truth from and the corrupted result in Figure (b)b. Then, reconstruction was performed by setting each pixel in a new image to the dominant class average of the nearest neighbors from . In the unaligned case, we see that most examples turn into smeared fives or ones; this is likely a random intersection formed by and (e.g., accounting for the baseline random chance classification accuracy). On the other hand, the reconstructions produced by harmonic alignment resemble their original input examples.

3.3 Comparisons

Alignment Algorithm Performance

Input Size

Runtime

Harmonic Alignment

MNN

Wang
(a)

Lazy Classification Accuracy

Input Size

5-Nearest Neighbor classification accuracy

Harmonic Alignment

MNN

Wang
(b)

Lazy Classification with Training/Test imbalance

Training to test sample ratio

5-Nearest Neighbor classification accuracy

Harmonic Alignment

MNN

Wang
(c)
Figure 10: Comparison to other unsupervised alignment methods. LABEL:sub@subfig:compruntimes Runtime as a function of input size. Method implementations were obtained from github repositories of their authors. Runtime performance was measured on an Intel 3.8 GHz i7-7700HQ laptop with 64 GB Dual-channel DDR4 memory at 2400 MHz running Pop!OS 4.15 and MATLAB R2018a. LABEL:sub@subfig:compclassification Lazy classification accuracy relative to input size. For each input size , the average of three iterations of lazy classification of unlabeled randomly corrupted digits with 35% preserved pixels (see Section 3.2) is reported. LABEL:sub@subfig:transferlearning Transfer learning performance. For each ratio, 1K uncorrupted, labeled digits were sampled from MNIST, and then 1K, 2K, 4K, and 8K (x-axis) unlabeled points were sampled and corrupted with 35% column identity. Mean over three iterations of lazy classification is reported for each method.

In Figure 10, we compare the runtime, k-nn accuracy, and transfer learning capabilities of our method with two other contemporary alignment methods. First, we examine the unsupervised algorithm proposed by Wang and Mahadevan [29] for generating weight matrices between two different samples. The algorithm first creates a local distance matrix of size around each point and its four nearest neighbors. Then, it computes an optimal match between k-nn distance matrices of each pair of points in and by comparing all permutations of the k-nn matrices and computing the minimal Frobenius norm between such permuted matrices. We report runtime results for , as failed to complete after running for eight hours. Because the method in Wang and Mahadevan [29] merely gives a weight matrix that can be used with separate algorithms for computing the final features, we report accuracy results using their implementation. Regardless of input size, we were unable to recover k-nearest neighborhoods for datasets with 35% uncorrupted columns (Figure (b)b), in spite of the computational cost of the method (Figure (a)a).

A more scalable approach to manifold alignment [i.e., 11] has recently emerged in computational biology literature. This approach uses mutual nearest neighbor (MNN) matching to compute mapping between datasets based on the assumption that if two points are truly neighbors they will resemble each other in both datasets. Because this approach amounts to building a k-nearest neighbors graph for each dataset and then choosing a set of neighbors between each dataset, MNN scales comparably to our method (see Figure (a)a). Additionally, MNN is able to recover 20-30% of k-nearest neighborhoods when only 35% of features match (see Figure (b)b). However, while this is an improvement over Wang and Mahadevan [29], its results are substantially lower than those of the proposed harmonic alignment method. We note tha, additionally, the performance of harmonic alignment monotonically correlates (i.e., improves) with input size, whereas MNN did not improve with more points in our tests.

3.4 Transfer Learning

An interesting use of manifold alignment algorithms is transfer learning. In these setting, an algorithm is trained to perform well on a small (e.g., preliminarly collected) dataset, and the goal is to extend the algorithm to a new larger dataset (e.g., as more data is being collected) after alignment. In Figure (c)c we explore the utility of harmonic alignment in transfer learning and compare it to MNN and the method proposed by Wang and Mahadevan [29]. In this experiment, we first randomly selected uncorrupted examples of MNIST digits, and constructed their diffusion map to use as our training set. Next, we took 65% corrupted unlabeled points (see Section 3.2) in batches of , , , and , as a test set to perform lazy classification on using the labels from the uncorrupted examples. As shown in (c)c, as the testing set increases up to a ratio of one training sample to eight testing samples, harmonic alignment consistently outperformed both competing methods, and remains above 60% post alignment lazy-classification accuracy even with ratio. In addition to showing the use of manifold alignment in transfer learning, this demonstrates the robustness of our algorithm to imbalance between batches.

3.5 Biological data

Noisy Inflammatory Response

IFN Abundance

TNF Abundance

IFN

EMD = 2.699

Abundance

P(x)
(a)

Unaligned Inflammatory Response

IFN Abundance

TNF Abundance

IFN

EMD = 3.127

Abundance

P(x)
(b)

Aligned Inflammatory Response

IFN Abundance

TNF Abundance

IFN

EMD = 0.135

Abundance

P(x)
(c)
Figure 14: Batch effect removal in biological data. 4K cells were subsampled from two single-cell immune profiles obtained via mass cytometry on blood samples of two patients infected with Dengue fever. Top: Both patients exhibit heightened IFN (x-axis), a pro-inflammatory cytokine associated with tumor necrosis factor alpha (TNF, y-axis) Bottom: IFN histograms for each batch. EMD: “Earth Mover’s Distance” LABEL:sub@subfig:noisy Data before denoising. LABEL:sub@subfig:biounaligned Denoising of unaligned data enhances a technical effect between samples in IFN. LABEL:sub@subfig:bioaligned Harmonic alignment corrects the IFN shift.

To illustrate the need for robust manifold alignment in computational biology, we turn to a simple real-world example obtained from Amodio et al. [30] (Figure 14). This dataset was collected by mass cytometry (CyTOF) of peripheral blood mononuclear cells (PBMC) from patients who contracted dengue fever. Subsequently, the Montgomery lab at Yale University experimentally introduced these PBMCs to Zika virus strains.

The canonical response to dengue infection is upregulation of interferon gamma (IFN), as discussed in Chesler and Reiss [31], Chakravarti and Kumaria [32], and Braga et al. [33]. During early immune response, IFN works in tandem with acute phase cytokines such as tumor necrosis factor to induce febrile response and inhibit viral replication [34]. In the PBMC dataset, we thus expect to see upregulation of these two cytokines together, which we explore in Figure 14.

In Figure (a)a, we show the relationship between IFN and TNF without denoising. Note that there is a substantial difference between the IFN distributions of sample 1 and sample 2 (”Earth Mover’s Distance” (EMD) = 2.699). In order to identify meaningful relationships in CyTOF data, it is common to denoise it first [15]. We used a graph low-pass filter proposed in van Dijk et al. [35] to denoise the cytokine data. The results of this denoising are shown in Figure (b)b. This procedure introduced more technical artifacts by enhancing the difference between batches, as seen by the increased EMD (3.127) between the IFN distributions of both patients. This is likely due to a substantial connectivity difference between the two batch subgraphs in the total graph of the combined dataset.

Next, we performed harmonic alignment of the two patient profiles. We show the results of this in Figure (c)c. Harmonic alignment corrected the difference between IFN distributions and restored the canonical correlation of IFN and TNF. This example illustrates the utility of harmonic alignment for biological data, where it can be used for integrated analysis of data collected across different experiments, patients, and time points.

4 Conclusion

We presented a novel method for aligning or batch-normalizing datasets, which involves learning and aligning their intrinsic manifold dimensions. Our method leverages the fact that common or corresponding features across datasets should have similar harmonics on the intrinsic geometry of the data, represented by as a graph. Our

harmonic alignment method finds an isometric transformation that maximizes the similarity of graph-based frequency harmonics of common features. Results show that our method successfully aligns artificially misaligned samples, as well as biological data containing batch effects. Our method has the advantages that it aligns manifold geometry rather than density (and thus is insensitive to sampling differences in data). Further, our method inherently denoises the datasets to obtain alignments of significant manifold dimensions rather than noise. We expect future applications of harmonic alignment to include, for example, integration of data from different measurement types performed on the same system, where only a subset of features have known correlations.

References

  • Lafon et al. [2006] Stephane Lafon, Yosi Keller, and Ronald R Coifman. Data fusion and multicue data matching by diffusion maps. IEEE Transactions on pattern analysis and machine intelligence, 28(11):1784–1797, 2006.
  • Ham et al. [2003] Ji Hun Ham, Daniel D Lee, and Lawrence K Saul. Learning high dimensional correspondences from low dimensional manifolds. In ICML Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003.
  • Ham et al. [2005] Jihun Ham, Daniel D Lee, and Lawrence K Saul. Semisupervised alignment of manifolds. In AISTATS, pages 120–127, 2005.
  • Wang and Mahadevan [2008] Chang Wang and Sridhar Mahadevan. Manifold alignment using procrustes analysis. In Proceedings of the 25th international conference on Machine learning, pages 1120–1127. ACM, 2008.
  • Tuia and Camps-Valls [2016] Devis Tuia and Gustau Camps-Valls. Kernel manifold alignment for domain adaptation. PloS one, 11(2):e0148655, 2016.
  • Coifman and Hirn [2014] Ronald R Coifman and Matthew J Hirn. Diffusion maps for changing data. Applied and computational harmonic analysis, 36(1):79–107, 2014.
  • Lindenbaum et al. [2015] Ofir Lindenbaum, Arie Yeredor, Moshe Salhov, and Amir Averbuch. Multiview diffusion maps. arXiv preprint arXiv:1508.05550, 2015.
  • Marshall and Hirn [2016] Nicholas F Marshall and Matthew J Hirn. Time coupled diffusion maps. arXiv preprint arXiv:1608.03628, 2016.
  • Lederman and Talmon [2018] Roy R Lederman and Ronen Talmon. Learning the geometry of common latent variables using alternating-diffusion. Applied and Computational Harmonic Analysis, 44(3):509–536, 2018.
  • Boumal et al. [2018] Nicolas Boumal, Tamir Bendory, Roy R Lederman, and Amit Singer. Heterogeneous multireference alignment: A single pass approach. In Information Sciences and Systems (CISS), 2018 52nd Annual Conference on, pages 1–6. IEEE, 2018.
  • Haghverdi et al. [2018] Laleh Haghverdi, Aaron TL Lun, Michael D Morgan, and John C Marioni. Batch effects in single-cell rna-sequencing data are corrected by matching mutual nearest neighbors. Nature biotechnology, 36(5):421, 2018.
  • Van Der Maaten et al. [2009] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik. Dimensionality reduction: a comparative. J Mach Learn Res, 10:66–71, 2009.
  • Izenman [2012] Alan Julian Izenman. Introduction to manifold learning. Wiley Interdisciplinary Reviews: Computational Statistics, 4(5):439–446, 2012.
  • Lin et al. [2015] Binbin Lin, Xiaofei He, and Jieping Ye. A geometric viewpoint of manifold learning. Applied Informatics, 2(1):3, 2015.
  • Moon et al. [2017] Kevin R Moon, Jay Stanley, Daniel Burkhardt, David van Dijk, Guy Wolf, and Smita Krishnaswamy. Manifold learning-based methods for analyzing single-cell rna-sequencing data. Current Opinion in Systems Biology, 2017.
  • Coifman and Lafon [2006] Ronald R Coifman and Stéphane Lafon. Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30, 2006.
  • Shuman et al. [2013] David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.
  • Farbman et al. [2010] Zeev Farbman, Raanan Fattal, and Dani Lischinski. Diffusion maps for edge-aware image editing. ACM Transactions on Graphics (TOG), 29(6):145, 2010.
  • Barkan et al. [2013] Oren Barkan, Jonathan Weill, Lior Wolf, and Hagai Aronowitz.

    Fast high dimensional vector multiplication face recognition.

    In

    Proceedings of the IEEE International Conference on Computer Vision

    , pages 1960–1967, 2013.
  • Mahmoudi and Sapiro [2009] Mona Mahmoudi and Guillermo Sapiro. Three-dimensional point cloud recognition via distributions of geometric distances. Graphical Models, 71(1):22–31, 2009.
  • Angerer et al. [2015] Philipp Angerer, Laleh Haghverdi, Maren Büttner, Fabian J Theis, Carsten Marr, and Florian Buettner. destiny: diffusion maps for large-scale single-cell data in r. Bioinformatics, 32(8):1241–1243, 2015.
  • Haghverdi et al. [2016] Laleh Haghverdi, Maren Buettner, F Alexander Wolf, Florian Buettner, and Fabian J Theis. Diffusion pseudotime robustly reconstructs lineage branching. Nature methods, 13(10):845, 2016.
  • Nadler et al. [2006] Boaz Nadler, Stephane Lafon, Ioannis Kevrekidis, and Ronald R Coifman.

    Diffusion maps, spectral clustering and eigenfunctions of fokker-planck operators.

    In Advances in neural information processing systems, pages 955–962, 2006.
  • Belkin and Niyogi [2002] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in neural information processing systems, pages 585–591, 2002.
  • Shuman et al. [2016] David I Shuman, Benjamin Ricaud, and Pierre Vandergheynst. Vertex-frequency analysis on graphs. Applied and Computational Harmonic Analysis, 40(2):260–291, 2016.
  • Olfati-Saber [2007] Reza Olfati-Saber. Algebraic connectivity ratio of ramanujan graphs. In American Control Conference, 2007. ACC’07, pages 4619–4624. IEEE, 2007.
  • Perraudin et al. [2014] Nathanaël Perraudin, Nicki Holighaus, Peter L Søndergaard, and Peter Balazs. Designing gabor windows using convex optimization. arXiv preprint arXiv:1401.6033, 2014.
  • Schönemann [1966] Peter H Schönemann. A generalized solution of the orthogonal procrustes problem. Psychometrika, 31(1):1–10, 1966.
  • Wang and Mahadevan [2009] Chang Wang and Sridhar Mahadevan. Manifold alignment without correspondence. In IJCAI, volume 2, page 3, 2009.
  • Amodio et al. [2018] Matthew Amodio, David van Dijk, Krishnan Srinivasan, William S Chen, Hussein Mohsen, Kevin R Moon, Allison Campbell, Yujiao Zhao, Xiaomei Wang, Manjunatha Venkataswamy, Anita Desai, Ravi V., Priti Kumar, Ruth Montgomery, Guy Wolf, and Smita Krishnaswamy.

    Exploring single-cell data with deep multitasking neural networks.

    bioRxiv, 2018. doi: 10.1101/237065.
  • Chesler and Reiss [2002] David A Chesler and Carol Shoshkes Reiss. The role of ifn- in immune responses to viral infections of the central nervous system. Cytokine & growth factor reviews, 13(6):441–454, 2002.
  • Chakravarti and Kumaria [2006] Anita Chakravarti and Rajni Kumaria. Circulating levels of tumour necrosis factor-alpha & interferon-gamma in patients with dengue & dengue haemorrhagic fever during an outbreak. Indian Journal of Medical Research, 123(1):25, 2006.
  • Braga et al. [2001] Elzinandes LA Braga, Patrícia Moura, Luzia MO Pinto, Sonia Ignácio, Maria José C Oliveira, Marly T Cordeiro, and Claire F Kubelka. Detection of circulant tumor necrosis factor-alpha, soluble tumor necrosis factor p75 and interferon-gamma in brazilian patients with dengue fever and dengue hemorrhagic fever. Memorias do Instituto Oswaldo Cruz, 96(2):229–232, 2001.
  • Ohmori et al. [1997] Yoshihiro Ohmori, Robert D Schreiber, and Thomas A Hamilton. Synergy between interferon- and tumor necrosis factor- in transcriptional activation is mediated by cooperation between signal transducer and activator of transcription 1 and nuclear factor b. Journal of Biological Chemistry, 272(23):14899–14907, 1997.
  • van Dijk et al. [2018] David van Dijk, Roshan Sharma, Juoas Nainys, Kristina Yim, Pooja Kathail, Ambrose Carr, Cassandra Burdziak, Kevin R Moon, Christine L Chaffer, Diwakar Pattabiraman, Brian Bierie, Linas Mazutis, Guy Wolf, Krishnaswamy Smita, and Data Pe’er. Recovering gene interactions from single-cell data using data diffusion. Cell, 174(3):716 – 729.e27, 2018. doi: 10.1016/j.cell.2018.05.061.