The geometric notion of a low-dimensional manifold is a common, yet powerful, tool for modeling high-dimensional data. Manifold models arise in cases where (i) a $K$-dimensional parameter $\theta$ can be identified that carries the relevant information about a signal and (ii) the signal $f(\theta) \in \mathbb{R}^N$ changes as a continuous (typically nonlinear) function of these parameters. Some typical examples include a one-dimensional (1-D) signal shifted by an unknown time delay (parameterized by the translation variable), a recording of a speech signal (parameterized by the underlying phonemes spoken by the speaker), and an image of a 3-D object at an unknown location captured from an unknown viewing angle (parameterized by the 3-D coordinates of the object and its roll, pitch, and yaw). In these and many other cases, the geometry of the signal class forms a nonlinear $K$-dimensional manifold in $\mathbb{R}^N$,
$$\mathcal{M} = \{ f(\theta) : \theta \in \Theta \},$$
where $\Theta$ is the $K$-dimensional parameter space [1, 2, 3]. Low-dimensional manifolds have also been proposed as approximate models for nonparametric signal classes such as images of human faces or handwritten digits [4, 5, 6].
In many scenarios, multiple observations of the same event may be performed simultaneously, resulting in the acquisition of multiple manifolds that share the same parameter space. For example, sensor networks (such as camera networks or microphone arrays) typically observe a single event from a variety of vantage points, while the underlying phenomenon can often be described by a set of common global parameters (such as the location and orientation of the objects of interest). Similarly, when sensing a single phenomenon using multiple modalities, such as video and audio, the underlying phenomenon may again be described by a single parameterization that spans all modalities. In such cases, we will show that it is advantageous to model the joint structure contained in the ensemble of manifolds rather than treating each manifold independently. Thus we introduce the concept of the joint manifold: a model for the concatenation of the data vectors observed by the group of sensors. Joint manifolds enable the development of improved manifold-based learning and estimation algorithms that exploit this structure. Furthermore, they can be applied to data of any modality and dimensionality.
In this work we conduct a careful examination of the theoretical properties of joint manifolds. In particular, we compare joint manifolds to their component manifolds to see how quantities like geodesic distances, curvature, branch separation, and condition number are affected. We then observe that these properties lead to improved performance and noise-tolerance for a variety of signal processing algorithms when they exploit the joint manifold structure, as opposed to processing data from each manifold separately. We also illustrate how this joint manifold structure can be exploited through a simple and efficient data fusion algorithm that uses random projections, which can also be applied to multimodal data.
Related prior work has studied manifold alignment, where the goal is to discover maps between several datasets that are governed by the same underlying low-dimensional structure. Lafon et al. proposed an algorithm to obtain a one-to-one matching between data points from several manifold-modeled classes. The algorithm first applies dimensionality reduction using diffusion maps to obtain data representations that encode the intrinsic geometry of the class. Then, an affine function that matches a set of landmark points is computed and applied to the remainder of the datasets. This concept was extended by Wang and Mahadevan, who apply Procrustes analysis on the dimensionality-reduced datasets to obtain an alignment function between a pair of manifolds. Since an alignment function is provided instead of a data point matching, the mapping obtained is applicable to the entire manifold rather than only to the set of sampled points. In our setting, we assume that either (i) the manifold alignment is provided intrinsically via synchronization between the different sensors or (ii) the manifolds have been aligned using one of the approaches described above. Our main focus is a theoretical analysis of the benefits provided by analyzing the joint manifold versus solving our task of interest separately on each of the manifolds observed by individual sensors.
This paper is organized as follows. Section 2 introduces and establishes some basic properties of joint manifolds. Section 3 considers the application of joint manifolds to the tasks of classification and manifold learning. Section 4 then describes an efficient method for processing and aggregating data when it lies on a joint manifold, and Section 5 concludes with discussion.
2 Joint manifolds
In this section we develop a theoretical framework for ensembles of manifolds which are jointly parameterized by a small number of common degrees of freedom. Informally, we propose a data structure for jointly modeling such ensembles; this is obtained by concatenating points from different ensembles that are indexed by the same articulation parameter to obtain a single point in a higher-dimensional space. We begin by defining the joint manifold for the general setting of arbitrary topological manifolds. (A comprehensive introduction to topological manifolds can be found in Boothby.)
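Let $\{\mathcal{M}_j\}_{j=1}^J$ be an ensemble of $J$ topological manifolds of equal dimension $K$. Suppose that the manifolds are homeomorphic to each other, in which case there exists a homeomorphism $\psi_j$ between $\mathcal{M}_1$ and $\mathcal{M}_j$ for each $j$. For a particular set of mappings $\{\psi_j\}_{j=2}^J$, we define the joint manifold as
$$\mathcal{M}^* = \{ (p_1, p_2, \ldots, p_J) \in \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J : p_j = \psi_j(p_1),\ 2 \le j \le J \}.$$
Furthermore, we say that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are the corresponding component manifolds.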
Notice that $\mathcal{M}_1$ serves as a common parameter space for all the component manifolds. Since the component manifolds are homeomorphic to each other, this choice is ultimately arbitrary. In practice it may be more natural to think of each component manifold as being homeomorphic to some fixed $K$-dimensional parameter space $\Theta$. However, in this case one could still define $\mathcal{M}^*$ as is done above by defining $\psi_j$ as the composition of the homeomorphic mappings from $\mathcal{M}_1$ to $\Theta$ and from $\Theta$ to $\mathcal{M}_j$.
As an example, consider the one-dimensional manifolds in Figure 1. Figures 1 (a) and (b) show two homeomorphic manifolds, where $\mathcal{M}_1$ is an open interval and $\mathcal{M}_2 = \psi_2(\mathcal{M}_1)$ for a homeomorphism $\psi_2$, i.e., $\mathcal{M}_2$ is a circle with one point removed (so that it remains homeomorphic to a line segment). In this case the joint manifold $\mathcal{M}^*$, illustrated in Figure 1 (c), is a helix. Notice that there exist other possible homeomorphic mappings from $\mathcal{M}_1$ to $\mathcal{M}_2$, and that the precise structure of the joint manifold as a submanifold of $\mathbb{R}^3$ is heavily dependent on the choice of this mapping.
Figure 1: (a) $\mathcal{M}_1$: line segment; (b) $\mathcal{M}_2$: circle segment; (c) $\mathcal{M}^*$: helix segment.
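The construction in this example can be sketched numerically. The snippet below is only an illustration under stated assumptions: the open interval is taken to be $(0, 2\pi)$ and the homeomorphism is taken to be $\theta \mapsto (\cos\theta, \sin\theta)$; neither choice is specified in the text, and the function names are hypothetical.

```python
import math

def psi(theta):
    """Assumed homeomorphism from the interval (0, 2*pi) onto the unit circle minus one point."""
    return (math.cos(theta), math.sin(theta))

def joint_point(theta):
    """Concatenate the point on the interval (theta itself) with its image on the circle."""
    return (theta,) + psi(theta)

def sample_joint_manifold(num_points):
    """Sample the joint manifold at evenly spaced interior points of (0, 2*pi)."""
    step = 2 * math.pi / (num_points + 1)
    return [joint_point(step * (i + 1)) for i in range(num_points)]

# Each sample lives in R^3 (1 coordinate from the interval, 2 from the circle),
# and the samples trace out one turn of a helix.
helix = sample_joint_manifold(5)
```

A different choice of homeomorphism (for example, traversing the circle at a non-uniform rate) would produce a different curve in $\mathbb{R}^3$, which is the dependence on the mapping noted above.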
Returning to the definition of $\mathcal{M}^*$, observe that although we have called $\mathcal{M}^*$ the joint manifold, we have not shown that it actually forms a topological manifold. To prove that $\mathcal{M}^*$ is indeed a manifold, we will make use of the fact that the joint manifold is a subset of the product manifold $\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$. One can show that the product manifold forms a $JK$-dimensional manifold using the product topology. By comparison, we now show that $\mathcal{M}^*$ has dimension only $K$.
$\mathcal{M}^*$ is a $K$-dimensional submanifold of $\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$.
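We first observe that since $\mathcal{M}^*$ is a subset of the product manifold, we automatically have that $\mathcal{M}^*$ is a second countable Hausdorff topological space. Thus, all that remains is to show that $\mathcal{M}^*$ is locally homeomorphic to $\mathbb{R}^K$. Let $p = (p_1, p_2, \ldots, p_J)$ be an arbitrary point on $\mathcal{M}^*$. Since $p_1 \in \mathcal{M}_1$, we have a pair $(U_1, \phi_1)$ such that $U_1 \subset \mathcal{M}_1$ is an open set containing $p_1$ and $\phi_1 : U_1 \to V$ is a homeomorphism where $V$ is an open set in $\mathbb{R}^K$. We now define for $2 \le j \le J$, $U_j = \psi_j(U_1)$ and $\phi_j = \phi_1 \circ \psi_j^{-1} : U_j \to V$. Note that for each $j$, $U_j$ is an open set and $\phi_j$ is a homeomorphism (since $\psi_j$ is a homeomorphism).
Now define $U^* = (U_1 \times U_2 \times \cdots \times U_J) \cap \mathcal{M}^*$. Observe that $U^*$ is an open set and that $p \in U^*$. Furthermore, let $q = (q_1, q_2, \ldots, q_J)$ be any element of $U^*$. Then $\phi_j(q_j) = \phi_1 \circ \psi_j^{-1}(q_j) = \phi_1(q_1)$ for each $2 \le j \le J$. Thus, since the image of each $q_j \in U_j$ in $V$ under their corresponding $\phi_j$ is the same, we can form a single homeomorphism $\phi^* : U^* \to V$ by assigning $\phi^*(q) = \phi_1(q_1)$. This shows that $\mathcal{M}^*$ is locally homeomorphic to $\mathbb{R}^K$ as desired. ∎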
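Since $\mathcal{M}^*$ is a submanifold of $\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$, it also inherits some desirable properties from its component manifolds.
Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are homeomorphic topological manifolds and $\mathcal{M}^*$ is defined as above.
If $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian, then $\mathcal{M}^*$ is Riemannian.
If $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are compact, then $\mathcal{M}^*$ is compact.
The proofs of these facts are straightforward and follow from the fact that if the component manifolds are Riemannian or compact, then the product manifold will be as well. $\mathcal{M}^*$ then inherits these properties as a submanifold of the product manifold. ∎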
Up to this point we have considered general topological manifolds. In particular, we have not assumed that the component manifolds are embedded in any particular space. If each component manifold $\mathcal{M}_j$ is embedded in $\mathbb{R}^{N_j}$, the joint manifold is naturally embedded in $\mathbb{R}^{N^*}$ where $N^* = \sum_{j=1}^J N_j$. Hence, the joint manifold can be viewed as a model for data of varying ambient dimension linked by a common parametrization. In the sequel, we assume that each manifold $\mathcal{M}_j$ is embedded in $\mathbb{R}^N$, which implies that $\mathcal{M}^* \subset \mathbb{R}^{JN}$. Observe that while the intrinsic dimension of the joint manifold remains constant at $K$, the ambient dimension increases by a factor of $J$. We now examine how a number of geometric properties of the joint manifold compare to those of the component manifolds.
We begin with the simple observation that Euclidean distances between points on the joint manifold are at least as large as the corresponding distances on the component manifolds. In the remainder of this paper, whenever we use the notation $\|\cdot\|$ we mean $\|\cdot\|_{\ell_2}$, i.e., the $\ell_2$ (Euclidean) norm on $\mathbb{R}^N$. When we wish to differentiate this from other norms, we will be explicit.
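Let $p = (p_1, p_2, \ldots, p_J)$ and $q = (q_1, q_2, \ldots, q_J)$ be two points on the joint manifold $\mathcal{M}^*$. Then
$$\|p - q\| = \sqrt{\sum_{j=1}^J \|p_j - q_j\|^2}.$$
This follows from the definition of the Euclidean norm:
$$\|p - q\|^2 = \sum_{i=1}^{JN} \big(p(i) - q(i)\big)^2 = \sum_{j=1}^J \sum_{i=1}^{N} \big(p_j(i) - q_j(i)\big)^2 = \sum_{j=1}^J \|p_j - q_j\|^2. \qquad ∎$$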
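Because a point on the joint manifold is a concatenation of component points, the squared joint distance is the sum of the squared component distances. A minimal numerical check is sketched below; the component points are arbitrary illustrative values, not data from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def concatenate(components):
    """Stack the component vectors into one joint vector."""
    return tuple(x for comp in components for x in comp)

# Hypothetical component points for J = 3 sensors, each in R^2.
p_components = [(0.0, 1.0), (2.0, -1.0), (0.5, 0.5)]
q_components = [(1.0, 1.0), (2.0, 1.0), (-0.5, 0.0)]

joint_distance = euclidean(concatenate(p_components), concatenate(q_components))
component_distances = [euclidean(p, q) for p, q in zip(p_components, q_components)]
root_sum_squares = math.sqrt(sum(d ** 2 for d in component_distances))
```

Here `joint_distance` and `root_sum_squares` agree up to rounding, and `joint_distance` is no smaller than any single entry of `component_distances`.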
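While Euclidean distances are important (especially when noise is introduced), the natural measure of distance between a pair of points on a Riemannian manifold is not Euclidean distance, but rather the geodesic distance. The geodesic distance between points $p, q \in \mathcal{M}$ is defined as
$$d_{\mathcal{M}}(p, q) = \inf \{ L(\gamma) : \gamma(0) = p,\ \gamma(1) = q \},$$
where $\gamma : [0, 1] \to \mathcal{M}$ is a $C^1$-smooth curve joining $p$ and $q$, and $L(\gamma)$ is the length of $\gamma$ as measured by
$$L(\gamma) = \int_0^1 \|\dot{\gamma}(t)\| \, dt.$$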
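In order to see how geodesic distances on $\mathcal{M}^*$ compare to geodesic distances on the component manifolds, we will make use of the following lemma.
Lemma 2.1. Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian manifolds, and let $\gamma : [0, 1] \to \mathcal{M}^*$ be a $C^1$-smooth curve on the joint manifold. Then we can write $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_J)$ where each $\gamma_j : [0, 1] \to \mathcal{M}_j$ is a $C^1$-smooth curve on $\mathcal{M}_j$, and
$$\frac{1}{\sqrt{J}} \sum_{j=1}^J L(\gamma_j) \le L(\gamma) \le \sum_{j=1}^J L(\gamma_j).$$
We are now in a position to compare geodesic distances on $\mathcal{M}^*$ to those on the component manifolds.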
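Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian manifolds. Let $p = (p_1, p_2, \ldots, p_J)$ and $q = (q_1, q_2, \ldots, q_J)$ be two points on the corresponding joint manifold $\mathcal{M}^*$. Then
$$d_{\mathcal{M}^*}(p, q) \ge \frac{1}{\sqrt{J}} \sum_{j=1}^J d_{\mathcal{M}_j}(p_j, q_j). \tag{7}$$
If the mappings $\psi_2, \psi_3, \ldots, \psi_J$ are isometries, i.e., $d_{\mathcal{M}_1}(p_1, q_1) = d_{\mathcal{M}_j}(\psi_j(p_1), \psi_j(q_1))$ for any $j$ and for any pair of points ($p$, $q$), then
$$d_{\mathcal{M}^*}(p, q) = \frac{1}{\sqrt{J}} \sum_{j=1}^J d_{\mathcal{M}_j}(p_j, q_j) = \sqrt{J} \, d_{\mathcal{M}_1}(p_1, q_1). \tag{8}$$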
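If $\gamma$ is a geodesic path between $p$ and $q$, then from Lemma 2.1,
$$d_{\mathcal{M}^*}(p, q) = L(\gamma) \ge \frac{1}{\sqrt{J}} \sum_{j=1}^J L(\gamma_j).$$
By definition $L(\gamma_j) \ge d_{\mathcal{M}_j}(p_j, q_j)$; hence, this establishes (7).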
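Now observe that the lower bound in Lemma 2.1 is derived from the lower inequality of (5). This inequality is attained with equality if and only if each term in the sum is equal, i.e., $L(\gamma_j) = L(\gamma_k)$ for all $j$ and $k$. This is precisely the case when $\psi_2, \psi_3, \ldots, \psi_J$ are isometries. Thus we obtain
$$d_{\mathcal{M}^*}(p, q) = L(\gamma) = \frac{1}{\sqrt{J}} \sum_{j=1}^J L(\gamma_j) = \sqrt{J} \, L(\gamma_1).$$
We now conclude that $L(\gamma_1) = d_{\mathcal{M}_1}(p_1, q_1)$, since if we could obtain a shorter path $\tilde{\gamma}_1$ from $p_1$ to $q_1$ this would contradict the assumption that $\gamma$ is a geodesic on $\mathcal{M}^*$, which establishes (8). ∎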
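The length bounds can be checked numerically on the interval/circle/helix example. The sketch below makes several assumptions that are not taken from the text: the curve is the arc for $\theta \in [0.5, 2.5]$, the circle map is $\theta \mapsto (\cos\theta, \sin\theta)$ (an isometry onto the unit circle), Lemma 2.1 is taken to state $\frac{1}{\sqrt{J}} \sum_j L(\gamma_j) \le L(\gamma) \le \sum_j L(\gamma_j)$, and curve lengths are approximated by polylines.

```python
import math

def polyline_length(points):
    """Approximate the length of a curve by summing distances between consecutive samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def sample_curves(theta_start, theta_end, num_segments):
    """Sample the component curves and the joint curve on a common parameter grid."""
    thetas = [theta_start + (theta_end - theta_start) * i / num_segments
              for i in range(num_segments + 1)]
    gamma_1 = [(t,) for t in thetas]                          # curve on the interval
    gamma_2 = [(math.cos(t), math.sin(t)) for t in thetas]    # curve on the circle
    gamma = [a + b for a, b in zip(gamma_1, gamma_2)]         # concatenated curve on the helix
    return gamma_1, gamma_2, gamma

J = 2
gamma_1, gamma_2, gamma = sample_curves(0.5, 2.5, 2000)
len_1 = polyline_length(gamma_1)
len_2 = polyline_length(gamma_2)
len_joint = polyline_length(gamma)

lower_bound = (len_1 + len_2) / math.sqrt(J)   # lower bound on the joint length
upper_bound = len_1 + len_2                    # upper bound on the joint length
```

Because the assumed mapping is an isometry, the joint length sits essentially at the lower bound and is close to $\sqrt{J}$ times the length of the first component curve, which is the equality case described above.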
Next, we study local smoothness and global self avoidance properties of the joint manifold using the notion of condition number.
Let $\mathcal{M}$ be a Riemannian submanifold of $\mathbb{R}^N$. The condition number is defined as $1/\tau$, where $\tau$ is the largest number satisfying the following: the open normal bundle about $\mathcal{M}$ of radius $r$ is embedded in $\mathbb{R}^N$ for all $r < \tau$.
The condition number of a given manifold controls both local smoothness properties and global properties of the manifold. Intuitively, as $1/\tau$ becomes smaller, the manifold becomes smoother and more self-avoiding. This is made more precise in the following lemmata.
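Suppose $\mathcal{M}$ has condition number $1/\tau$. Let $p, q \in \mathcal{M}$ be two distinct points on $\mathcal{M}$, and let $\gamma(t)$ denote a unit speed parameterization of the geodesic path joining $p$ and $q$. Then
$$\max_t \|\ddot{\gamma}(t)\| \le \frac{1}{\tau}.$$
Suppose $\mathcal{M}$ has condition number $1/\tau$. Let $p, q \in \mathcal{M}$ be two points on $\mathcal{M}$ such that $\|p - q\| = d$. If $d \le \tau/2$, then the geodesic distance $d_{\mathcal{M}}(p, q)$ is bounded by
$$d_{\mathcal{M}}(p, q) \le \tau \left( 1 - \sqrt{1 - 2d/\tau} \right).$$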
We wish to show that if the component manifolds are smooth and self avoiding, the joint manifold is as well. It is not easy to prove this in the most general case, where the only assumption is that there exists a homeomorphism (i.e., a continuous bijective map $\psi$ with a continuous inverse) between every pair of manifolds. However, suppose the manifolds are diffeomorphic, i.e., there exists a continuous bijective map between tangent spaces at corresponding points on every pair of manifolds. In that case, we make the following assertion.
Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian submanifolds of $\mathbb{R}^N$, and let $1/\tau_j$ denote the condition number of $\mathcal{M}_j$. Suppose also that the $\psi_2, \psi_3, \ldots, \psi_J$ that define the corresponding joint manifold $\mathcal{M}^*$ are diffeomorphisms. If $1/\tau^*$ is the condition number of $\mathcal{M}^*$, then
$$\frac{1}{\tau^*} \le \max_{1 \le j \le J} \frac{1}{\tau_j}.$$
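Let $p \in \mathcal{M}^*$, which we can write as $p = (p_1, p_2, \ldots, p_J)$ with $p_j \in \mathcal{M}_j$. Since the $\psi_j$ are diffeomorphisms, we may view $\mathcal{M}^*$ as being diffeomorphic to $\mathcal{M}_1$; i.e., we can build a diffeomorphic map from $\mathcal{M}_1$ to $\mathcal{M}^*$ as
$$p = \psi^*(p_1) := (p_1, \psi_2(p_1), \ldots, \psi_J(p_1)).$$
We also know that given any two manifolds linked by a diffeomorphism $\psi_j : \mathcal{M}_1 \to \mathcal{M}_j$, each vector $v_1$ in the tangent space $T_1(p_1)$ of the manifold $\mathcal{M}_1$ at the point $p_1$ is uniquely mapped to a tangent vector $v_j := \phi_j(v_1)$ in the tangent space $T_j(p_j)$ of the manifold $\mathcal{M}_j$ at the point $p_j = \psi_j(p_1)$ through the map $\phi_j := \mathcal{J} \circ \psi_j(p_1)$, where $\mathcal{J}$ denotes the Jacobian operator.
Consider the application of this property to the diffeomorphic manifolds $\mathcal{M}_1$ and $\mathcal{M}^*$. In this case, the tangent vector $v_1 \in T_1(p_1)$ to the manifold $\mathcal{M}_1$ can be uniquely identified with a tangent vector $v = \phi^*(v_1) \in T^*(p)$ to the manifold $\mathcal{M}^*$. This mapping is expressed as
$$\phi^* = \mathcal{J} \circ \psi^*(p_1) = (\mathbf{I}, \mathcal{J} \circ \psi_2(p_1), \ldots, \mathcal{J} \circ \psi_J(p_1)),$$
since the Jacobian operates componentwise. Therefore, the tangent vector $v$ can be written as
$$v = \phi^*(v_1) = (v_1, \phi_2(v_1), \ldots, \phi_J(v_1)) = (v_1, v_2, \ldots, v_J).$$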
In other words, a tangent vector to the joint manifold can be decomposed into $J$ component vectors, each of which is tangent to the corresponding component manifold.
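Using this fact, we now show that a vector $\eta$ that is normal to $\mathcal{M}^*$ can also be broken down into sub-vectors that are normal to the component manifolds. Consider $p \in \mathcal{M}^*$, and denote $T^*(p)^\perp$ as the normal space at $p$. Suppose $\eta = (\eta_1, \eta_2, \ldots, \eta_J) \in T^*(p)^\perp$. Decompose each $\eta_j$ as a projection onto the component tangent and normal spaces, i.e., for $j = 1, \ldots, J$,
$$\eta_j = x_j + y_j, \qquad x_j \in T_j(p_j), \quad y_j \in T_j(p_j)^\perp,$$
such that $\langle x_j, y_j \rangle = 0$ for each $j$. Let $x = (x_1, x_2, \ldots, x_J)$ and $y = (y_1, y_2, \ldots, y_J)$. Then $\eta = x + y$, and since $x$ is tangent to the joint manifold $\mathcal{M}^*$, we have $\langle \eta, x \rangle = \langle x + y, x \rangle = 0$, and thus
$$\langle y, x \rangle = -\|x\|^2, \qquad \text{while} \qquad \langle y, x \rangle = \sum_{j=1}^J \langle y_j, x_j \rangle = 0.$$
Hence $x = 0$, i.e., each $\eta_j$ is normal to $\mathcal{M}_j$.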
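Armed with this last fact, our goal now is to show that if $r < \min_{1 \le j \le J} \tau_j$ then the normal bundle of radius $r$ is embedded in $\mathbb{R}^{JN}$, or equivalently, that for distinct points $p, q \in \mathcal{M}^*$ with normal vectors $\eta$ at $p$ and $\nu$ at $q$, $p + \eta \ne q + \nu$ provided that $\|\eta\|, \|\nu\| \le r$. Indeed, suppose $\|\eta\|, \|\nu\| \le r < \min_{1 \le j \le J} \tau_j$. Since $\|\eta_j\| \le \|\eta\|$ and $\|\nu_j\| \le \|\nu\|$ for all $1 \le j \le J$, we have that $\|\eta_j\|, \|\nu_j\| < \tau_j$. Since we have proved that $\eta_j, \nu_j$ are vectors in the normal bundle of $\mathcal{M}_j$ and their magnitudes are less than $\tau_j$, then $p_j + \eta_j \ne q_j + \nu_j$ by the definition of condition number. Thus $p + \eta \ne q + \nu$ and the result follows. ∎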
This result states that for general manifolds, the most we can say is that the condition number of the joint manifold is guaranteed to be no worse than that of the worst component manifold. However, in practice the bound is unlikely to be attained. As an example, Figure 2 illustrates the point at which the normal bundle intersects itself for the case of the joint manifold from Figure 1 (c). In this case we obtain . Note that the condition numbers for the manifolds $\mathcal{M}_1$ and $\mathcal{M}_2$ generating $\mathcal{M}^*$ are given by and . Thus, while the condition number in this case is not as good as that of the best manifold, it is still notably better than that of the worst manifold. In general, even this example may be somewhat pessimistic, and it is possible that in many cases the joint manifold may be better conditioned than even the best manifold.
3 Joint manifolds in signal processing
Manifold models can be exploited by a number of algorithms for signal processing tasks such as pattern classification, learning, and control. The performance of such algorithms often depends on geometric properties of the manifold model such as its condition number and geodesic distances along its surface. The theory developed in Section 2 suggests that the joint manifold preserves or improves these properties. We will now see that, when noise is introduced and multiple data sources are available, it can be highly beneficial to use algorithms specifically designed to exploit the joint manifold structure.
We first study the problem of manifold-based classification. The problem is defined as follows: given manifolds $\mathcal{M}$ and $\mathcal{N}$, suppose we observe a signal $y = x + n \in \mathbb{R}^N$ where either $x \in \mathcal{M}$ or $x \in \mathcal{N}$ and $n$ is a noise vector, and we wish to find a function $f : \mathbb{R}^N \to \{\mathcal{M}, \mathcal{N}\}$ that attempts to determine which manifold “generated” $y$. We consider a simple classification algorithm based on the generalized maximum likelihood framework. The approach is to classify by computing the distance from the observed signal $y$ to each of the manifolds, and then classify based on which of these distances is smallest, i.e., our classifier is
$$f(y) = \arg\min_{\mathcal{W} \in \{\mathcal{M}, \mathcal{N}\}} d(y, \mathcal{W}), \qquad d(y, \mathcal{W}) = \inf_{w \in \mathcal{W}} \|y - w\|. \tag{9}$$
We will measure the performance of this algorithm for a particular pair of manifolds by considering the probability of misclassifying a point from $\mathcal{M}$ as belonging to $\mathcal{N}$, which we denote $P_{\mathcal{M}\mathcal{N}}$.
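The nearest-manifold rule can be sketched on finitely sampled manifolds. This is an illustration only: the two circles, the sampling density, and the function names are hypothetical, and the distance to a manifold is approximated by the distance to its nearest sample.

```python
import math

def sample_circle(center, radius, num_points=360):
    """Return evenly spaced samples of a circle, standing in for a manifold."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * i / num_points),
             cy + radius * math.sin(2 * math.pi * i / num_points))
            for i in range(num_points)]

def distance_to_manifold(y, samples):
    """Approximate the distance from y to a manifold by the nearest sample."""
    return min(math.dist(y, s) for s in samples)

def classify(y, manifolds):
    """Return the label of the sampled manifold that lies closest to y."""
    return min(manifolds, key=lambda label: distance_to_manifold(y, manifolds[label]))

manifolds = {
    "M": sample_circle((0.0, 0.0), 1.0),
    "N": sample_circle((5.0, 0.0), 1.0),
}
label_near_m = classify((0.9, 0.3), manifolds)    # noisy observation near the first circle
label_near_n = classify((4.2, -0.1), manifolds)   # noisy observation near the second circle
```

For joint classification, the same `classify` function would be applied to concatenated observations and concatenated manifold samples.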
To analyze this problem, we employ three common notions of separation in metric spaces:
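The minimum separation distance between two manifolds $\mathcal{M}$ and $\mathcal{N}$ is defined as
$$\delta(\mathcal{M}, \mathcal{N}) = \inf_{p \in \mathcal{M},\, q \in \mathcal{N}} \|p - q\|.$$
The Hausdorff distance from $\mathcal{M}$ to $\mathcal{N}$ is defined to be
$$D(\mathcal{M}, \mathcal{N}) = \sup_{p \in \mathcal{M}} \inf_{q \in \mathcal{N}} \|p - q\|,$$
with $D(\mathcal{N}, \mathcal{M})$ defined similarly. Note that $\delta(\mathcal{M}, \mathcal{N}) = \delta(\mathcal{N}, \mathcal{M})$, while in general $D(\mathcal{M}, \mathcal{N}) \ne D(\mathcal{N}, \mathcal{M})$.
The maximum separation distance between manifolds $\mathcal{M}$ and $\mathcal{N}$ is defined as
$$\Delta(\mathcal{M}, \mathcal{N}) = \sup_{p \in \mathcal{M}} \sup_{q \in \mathcal{N}} \|p - q\|.$$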
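On finite point sets the three separation measures reduce to minima and maxima over pairwise distances. The sketch below assumes the standard forms (infimum of pairwise distances, directed sup-inf Hausdorff distance, supremum of pairwise distances); the point sets are hypothetical and chosen so the values can be verified by hand.

```python
import math

def min_separation(set_a, set_b):
    """Smallest distance between any point of set_a and any point of set_b."""
    return min(math.dist(a, b) for a in set_a for b in set_b)

def hausdorff_from(set_a, set_b):
    """Directed Hausdorff distance: the worst-case nearest-neighbour distance from set_a to set_b."""
    return max(min(math.dist(a, b) for b in set_b) for a in set_a)

def max_separation(set_a, set_b):
    """Largest distance between any point of set_a and any point of set_b."""
    return max(math.dist(a, b) for a in set_a for b in set_b)

set_m = [(0.0, 0.0), (1.0, 0.0)]
set_n = [(3.0, 0.0), (3.0, 4.0)]

delta = min_separation(set_m, set_n)        # attained by (1, 0) and (3, 0)
d_m_to_n = hausdorff_from(set_m, set_n)     # worst point of set_m is (0, 0)
d_n_to_m = hausdorff_from(set_n, set_m)     # worst point of set_n is (3, 4)
big_delta = max_separation(set_m, set_n)    # attained by (0, 0) and (3, 4)
```

In this example the two directed Hausdorff distances differ, which illustrates the asymmetry noted above, and the minimum separation never exceeds either of them.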
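As one might expect, $P_{\mathcal{M}\mathcal{N}}$ is controlled by the separation distances. For example, suppose that $x \in \mathcal{M}$; if the noise vector $n$ is bounded and satisfies $\|n\| < \delta(\mathcal{M}, \mathcal{N})/2$, then we have that $d(y, \mathcal{M}) \le \|n\| < \delta(\mathcal{M}, \mathcal{N})/2$ and hence, for any $q \in \mathcal{N}$,
$$\delta(\mathcal{M}, \mathcal{N}) \le \|x - q\| \le \|x - y\| + \|y - q\| = \|n\| + \|y - q\|.$$
Thus we are guaranteed that
$$d(y, \mathcal{N}) \ge \delta(\mathcal{M}, \mathcal{N}) - \|n\| > \delta(\mathcal{M}, \mathcal{N})/2.$$
Therefore, $d(y, \mathcal{M}) < d(y, \mathcal{N})$ and the classifier defined by (9) satisfies $P_{\mathcal{M}\mathcal{N}} = 0$. We can refine this result in two possible ways. First, note that the amount of noise that we can tolerate without making an error depends on $x$. Specifically, for a given $x \in \mathcal{M}$, provided that $\|n\| < d(x, \mathcal{N})/2$ we still have that $P_{\mathcal{M}\mathcal{N}} = 0$. Thus, for a given $x$ we can tolerate noise bounded by $d(x, \mathcal{N})/2 \in [\delta(\mathcal{M}, \mathcal{N})/2,\ D(\mathcal{M}, \mathcal{N})/2]$.
A second possible refinement that we will explore below is to ignore this dependence on $x$, but to extend our noise model to the case where $\|n\| \ge \delta(\mathcal{M}, \mathcal{N})/2$ with non-zero probability. We can still bound $P_{\mathcal{M}\mathcal{N}}$ since
$$P_{\mathcal{M}\mathcal{N}} \le P\big( \|n\| \ge \delta(\mathcal{M}, \mathcal{N})/2 \big). \tag{10}$$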
We provide bounds on this probability for both the component manifolds and the joint manifold as follows: we first compare the separation distances for these cases.
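Consider the joint manifolds $\mathcal{M}^* \subset \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$ and $\mathcal{N}^* \subset \mathcal{N}_1 \times \mathcal{N}_2 \times \cdots \times \mathcal{N}_J$. Then, the following bounds hold:
Joint minimum separation:
$$\sum_{j=1}^J \delta^2(\mathcal{M}_j, \mathcal{N}_j) \le \delta^2(\mathcal{M}^*, \mathcal{N}^*) \le \min_{1 \le k \le J} \Big( \delta^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \Delta^2(\mathcal{M}_j, \mathcal{N}_j) \Big). \tag{11}$$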
Joint Hausdorff separation from $\mathcal{M}^*$ to $\mathcal{N}^*$:
Joint maximum separation from $\mathcal{M}^*$ to $\mathcal{N}^*$:
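For any $p \in \mathcal{M}^*$ and $q \in \mathcal{N}^*$ we have
$$\|p - q\|^2 = \sum_{j=1}^J \|p_j - q_j\|^2 \ge \sum_{j=1}^J \delta^2(\mathcal{M}_j, \mathcal{N}_j),$$
since the distance between two points in any given component space is at least the minimum separation distance corresponding to that space. This establishes the lower bound in (11). We obtain the upper bound by selecting a $k$, and selecting $p \in \mathcal{M}^*$ and $q \in \mathcal{N}^*$ such that $p_k$ and $q_k$ attain the minimum separation distance $\delta(\mathcal{M}_k, \mathcal{N}_k)$. From the definition of $\Delta$, we have that
$$\delta^2(\mathcal{M}^*, \mathcal{N}^*) \le \|p - q\|^2 = \|p_k - q_k\|^2 + \sum_{j \ne k} \|p_j - q_j\|^2 \le \delta^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \Delta^2(\mathcal{M}_j, \mathcal{N}_j),$$
and since this holds for every choice of $k$, (11) follows by taking the minimum over all $k$.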
To prove inequality (12), we follow a similar course. We begin by selecting and that satisfy
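which establishes the upper bound in (12). To obtain the lower bound, we again select a $k$, and now let $p \in \mathcal{M}^*$ be the point whose component $p_k$ attains the Hausdorff separation for the component manifold $\mathcal{M}_k$, i.e., the corresponding point $p_k$ is as far from $\mathcal{N}_k$ as is possible in $\mathcal{M}_k$. Let $q$ be the nearest point in $\mathcal{N}^*$ to $p$. From the definition of the Hausdorff distance, we get that
$$\|p - q\| \le D(\mathcal{M}^*, \mathcal{N}^*),$$
since the Hausdorff distance is the maximal distance between the points in $\mathcal{M}^*$ and their respective nearest neighbors in $\mathcal{N}^*$. Again, it also follows that
$$\|p - q\|^2 = \|p_k - q_k\|^2 + \sum_{j \ne k} \|p_j - q_j\|^2 \ge D^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \delta^2(\mathcal{M}_j, \mathcal{N}_j).$$
Since this again holds for every choice of $k$, (12) follows by taking the maximum over all $k$.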
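As an example, if we consider the case where the separation distances are constant for all $j$, then the joint minimum separation distance satisfies
$$\sqrt{J} \, \delta(\mathcal{M}_j, \mathcal{N}_j) \le \delta(\mathcal{M}^*, \mathcal{N}^*) \le \sqrt{\delta^2(\mathcal{M}_j, \mathcal{N}_j) + (J - 1) \Delta^2(\mathcal{M}_j, \mathcal{N}_j)}.$$
In the case where $\delta(\mathcal{M}_j, \mathcal{N}_j) \ll \Delta(\mathcal{M}_j, \mathcal{N}_j)$ then we observe that $\delta(\mathcal{M}^*, \mathcal{N}^*)$ can be considerably larger than $\sqrt{J} \, \delta(\mathcal{M}_j, \mathcal{N}_j)$. This means that we can potentially tolerate much more noise while ensuring $P_{\mathcal{M}^*\mathcal{N}^*} = 0$. To see this, write $n = (n_1, n_2, \ldots, n_J)$ and recall that we require $\|n_j\| < \epsilon = \delta(\mathcal{M}_j, \mathcal{N}_j)/2$ to ensure that $P_{\mathcal{M}_j\mathcal{N}_j} = 0$. Thus, if we require that $P_{\mathcal{M}_j\mathcal{N}_j} = 0$ for all $j$, then we have that
$$\|n\| = \sqrt{\sum_{j=1}^J \|n_j\|^2} < \sqrt{J} \, \epsilon = \sqrt{J} \, \delta(\mathcal{M}_j, \mathcal{N}_j)/2.$$
However, if we instead only require that $P_{\mathcal{M}^*\mathcal{N}^*} = 0$ we only need $\|n\| < \delta(\mathcal{M}^*, \mathcal{N}^*)/2$, which can be a significantly less stringent requirement.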
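The gain in noise tolerance from joint classification can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: each of the $J = 4$ sensors observes the same pair of parallel line segments (so the mappings between components are identities), the noise is i.i.d. Gaussian rather than bounded, and manifolds are represented by a coarse grid of samples.

```python
import math
import random

def nearest_distance(y, samples):
    """Distance from y to the nearest sample of a manifold."""
    return min(math.dist(y, s) for s in samples)

def classify(y, manifold_m, manifold_n):
    """Return 'M' if y is at least as close to the samples of M as to those of N."""
    return "M" if nearest_distance(y, manifold_m) <= nearest_distance(y, manifold_n) else "N"

def joint_point(t, height, num_sensors):
    """Concatenate num_sensors identical component points (t, height)."""
    return (t, height) * num_sensors

def count_errors(num_sensors, sigma, num_trials, seed):
    """Count misclassifications using one component versus the full joint observation."""
    rng = random.Random(seed)
    grid = [i / 20 for i in range(21)]
    comp_m = [(t, 0.0) for t in grid]    # component manifold M_j: segment at height 0
    comp_n = [(t, 1.0) for t in grid]    # component manifold N_j: segment at height 1
    joint_m = [joint_point(t, 0.0, num_sensors) for t in grid]
    joint_n = [joint_point(t, 1.0, num_sensors) for t in grid]
    component_errors = 0
    joint_errors = 0
    for _ in range(num_trials):
        clean = joint_point(rng.choice(grid), 0.0, num_sensors)   # true point lies on M*
        noisy = tuple(x + rng.gauss(0.0, sigma) for x in clean)
        if classify(noisy[:2], comp_m, comp_n) != "M":            # first sensor only
            component_errors += 1
        if classify(noisy, joint_m, joint_n) != "M":              # all sensors jointly
            joint_errors += 1
    return component_errors, joint_errors

component_errors, joint_errors = count_errors(num_sensors=4, sigma=0.5, num_trials=1000, seed=0)
```

In this toy setting the joint separation is $\sqrt{J}$ times the component separation while the per-coordinate noise level is unchanged, so the joint classifier should make far fewer errors than the single-sensor classifier.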
The benefit of classification using the joint manifold is made more apparent when we extend our noise model to the case where we allow $\|n\| \ge \delta(\mathcal{M}, \mathcal{N})/2$ with non-zero probability and apply (10). To bound the probability in (10), we will make use of the following adaptation of Hoeffding’s inequality.
Suppose that is a random vector that satisfies , for . Suppose also that the are independent and identically distributed (i.i.d.) with . Then if , we have that for any ,
Using this lemma we can relax the assumption on so that we only require that it is finite, and instead make the weaker assumption that for a particular pair of manifolds , . This assumption ensures that , so that we can combine Lemma 3.1 with (10) to obtain a bound on . Note that if this condition does not hold, then this is a very difficult classification problem since the expected norm of the noise is large enough to push us closer to the other manifold, in which case the simple classifier given by (9) makes little sense.
We now illustrate how Lemma 3.1 can be used to compare error bounds between classification using a joint manifold and classification using a particular pair of component manifolds.
Suppose that we observe a vector where and is a random vector such that , for , and that the are i.i.d. with . If
and we classify the observation according to (9), then