Many problems in computer vision and pattern recognition boil down to constructing a Laplacian operator describing some data manifold and finding its eigenvectors. Notable examples include spectral clustering, eigenmaps [4], diffusion maps and distances [12, 25], spectral graph partitioning [14], spectral hashing [36], and image segmentation [31].
Recently, there has been an increased interest in extending spectral geometric constructions to the multimodal setting, involving two or more data spaces. Many data analysis applications involve observations and measurements of data using different modalities, such as multimedia documents [3, 37, 29, 24, 18, 23], audio and video [19, 1, 30], images with different lighting conditions [2], or medical imaging modalities [6].
Multimodal (or ‘multi-view’) clustering has been studied in the computer vision and pattern recognition community [13, 22, 33, 8, 21, 15]. Sindhwani et al. [32] used a convex combination of Laplacians in the ‘co-regularization’ framework. Manifold alignment considers multiple manifolds as a single space with ‘connections’ between points and tries to find an aligned set of eigenvectors [17, 35, 34]. Eynard et al. [16] proposed finding a common eigenbasis of multiple Laplacians by means of joint approximate diagonalization (JADE). Kovnatsky et al. [20] improved this method using subspace parametrization. Bronstein et al. [7] studied the problem of finding closest commuting operators (CCO) and showed its equivalence to joint diagonalization.
One of the main limitations of the JADE and CCO problems is the assumption of a given bijective correspondence (or more generally, functional correspondence [27]) between the underlying manifolds or graphs. In this paper, we consider the setting where such a correspondence is unknown or may not exist, and instead, one is given a set of corresponding functions. We introduce a problem similar to CCO, wherein we try to minimally modify the Laplacians such that the corresponding heat kernels behave consistently. In the limit case with a given bijective correspondence, this heat kernel coupling problem is equivalent to Laplacian averaging.
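Notation and definitions. Let $A, B$ be two $n \times n$ real symmetric matrices. We denote by
$$\|A\|_F = \Big( \sum_{i,j=1}^{n} a_{ij}^2 \Big)^{1/2}$$
the Frobenius norm of $A$. We say that $A$ and $B$ commute if $AB = BA$, and call $[A, B] = AB - BA$ their commutator. If there exists a unitary matrix $U$ such that $U^\top A U$ and $U^\top B U$ are diagonal, we say that $A, B$ are jointly diagonalizable and call such $U$ the joint eigenbasis of $A$ and $B$. Two symmetric matrices are jointly diagonalizable iff they commute.
We denote by
$$\mathrm{diag}(A) = (a_{11}, \ldots, a_{nn})^\top$$
a column vector containing the diagonal elements of the matrix $A$, and by $\mathrm{Diag}(a)$ a diagonal matrix containing on the diagonal the elements of the vector $a = (a_1, \ldots, a_n)^\top$. Furthermore, we use $\mathrm{Diag}(A) = \mathrm{Diag}(\mathrm{diag}(A))$ to denote a diagonal matrix obtained by setting to zero the off-diagonal elements of $A$.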
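Laplacians. Let us be given an undirected weighted graph without loops (i.e., a simple graph) $G = (V, E)$ with vertex set $V = \{1, \ldots, n\}$ and edges $E \subseteq V \times V$ such that $(i,j) \in E$ iff $(j,i) \in E$ for $i \neq j$. There are given non-negative weights $w_{ij} = w_{ji} \geq 0$, satisfying $w_{ij} = 0$ if $i, j$ are not connected (i.e., $(i,j) \notin E$). The matrix $W = (w_{ij})$ is called the adjacency matrix and
$$L = \mathrm{Diag}(W \mathbf{1}) - W \qquad (1)$$
the Laplacian of $G$, where $\mathbf{1}$ denotes the vector of ones.
Hereinafter, we denote by $\mathcal{L}(G)$ the set of all valid Laplacian matrices of a simple graph $G$, which is defined as follows: $L \in \mathcal{L}(G)$ iff (i) $L = L^\top$; (ii) $L \mathbf{1} = 0$; (iii) $l_{ij} \leq 0$ for $i \neq j$, with $l_{ij} = 0$ if $(i,j) \notin E$.
Defining the Laplacian according to (1) through the edge weight matrix $W$, we automatically get properties (i)-(iii) satisfied. Conversely, any valid Laplacian $L$ of a simple graph, in the sense of (i)-(iii), gives rise to a weight matrix of a simple weighted graph by defining $w_{ij} = -l_{ij}$ for $i \neq j$ and $w_{ii} = 0$.
For numerical purposes, we will make use of a proper parametrization of the set of valid Laplacians. Let $m$ denote the number of edges of $G$. For $w = (w_1, \ldots, w_m)^\top \geq 0$ we define the weight matrix $W(w)$ by
$$(W(w))_{ij} = \begin{cases} w_k & \text{if } \{i,j\} \text{ is the } k\text{th edge of } G, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$
Defining $L(w)$ as in (1), the requirements (i)-(iii) of a valid Laplacian are satisfied. In an undirected weighted graph, the matrices $W$ and $L$ are symmetric. Furthermore, $L$ is positive semi-definite. Consequently, $L$ admits the unitary eigendecomposition $L = U \Lambda U^\top$ with orthonormal eigenvectors
$U = (u_1, \ldots, u_n)$
and real eigenvalues $\Lambda = \mathrm{Diag}(\lambda_1, \ldots, \lambda_n)$, $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$.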
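The construction above can be sketched numerically as follows. This is a minimal illustration, assuming the edge-weight parametrization stores one weight per undirected edge; the function name `laplacian_from_weights` and the toy path graph are hypothetical and not part of the original text.

```python
import numpy as np

def laplacian_from_weights(n, edges, w):
    """Build the adjacency matrix W and Laplacian L = Diag(W 1) - W.

    n     : number of vertices
    edges : list of m pairs (i, j) with i != j, one per undirected edge
    w     : length-m array of non-negative edge weights
    """
    W = np.zeros((n, n))
    for (i, j), wk in zip(edges, w):
        W[i, j] = wk
        W[j, i] = wk              # undirected graph: symmetric weights
    L = np.diag(W.sum(axis=1)) - W
    return W, L

# Hypothetical toy graph: a path 0-1-2-3 with arbitrary weights.
edges = [(0, 1), (1, 2), (2, 3)]
w = np.array([1.0, 0.5, 2.0])
W, L = laplacian_from_weights(4, edges, w)

# Eigendecomposition L = U diag(lam) U^T; eigh returns ascending eigenvalues.
lam, U = np.linalg.eigh(L)
```

By construction the result is symmetric, has zero row sums and non-positive off-diagonal entries, and its smallest eigenvalue is zero up to rounding.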
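Heat diffusion on graphs. Let $f : V \to \mathbb{R}$ denote a function defined on the vertex set of the graph. We can identify $f$ with an $n$-dimensional vector $f = (f_1, \ldots, f_n)^\top$, and denote by $\mathcal{F}(V) \cong \mathbb{R}^n$ the space of such functions.
Similarly to the standard heat diffusion equation, one can define a diffusion process on $G$, governed by the following equation:
$$\frac{\partial f(t)}{\partial t} = -L f(t), \qquad f(0) = f_0, \qquad (3)$$
where the solution $f(t)$ is the amount of heat at time $t$ at the vertices $V$. The solution of the heat equation is given by $f(t) = H^t f_0$, where
$$H^t = e^{-tL} = U e^{-t\Lambda} U^\top$$
is the heat operator (or the heat kernel).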
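As an illustration, the heat operator of a small graph can be computed from the eigendecomposition of its Laplacian. The sketch below is not taken from the original text: the helper names, the toy cycle graph, and the particular diffusion distance used (the Euclidean distance between columns of the heat operator, one standard definition) are assumptions made for the example.

```python
import numpy as np

def heat_operator(L, t):
    """Heat operator H^t = exp(-t L) of a symmetric Laplacian via eigendecomposition."""
    lam, U = np.linalg.eigh(L)
    return (U * np.exp(-t * lam)) @ U.T      # scales column k of U by exp(-t lam_k)

def diffusion_distance(L, t):
    """Pairwise distances d_t(i, j) = ||H^t e_i - H^t e_j||_2 (assumed definition)."""
    H = heat_operator(L, t)
    sq = np.sum(H**2, axis=0)                # squared norms of the columns of H
    d2 = sq[:, None] + sq[None, :] - 2.0 * (H.T @ H)
    return np.sqrt(np.maximum(d2, 0.0))

# Hypothetical toy example: unit-weight cycle on 6 vertices, unit heat at vertex 0.
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

f0 = np.zeros(n)
f0[0] = 1.0
f_t = heat_operator(L, 0.5) @ f0             # heat distribution at time t = 0.5
dist = diffusion_distance(L, 0.5)
```

Since the Laplacian has zero row sums, the total amount of heat is conserved, and the operators satisfy the semigroup property (applying the operator for time $s$ and then for time $t$ equals applying it once for time $s + t$).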
3 Multimodal spectral geometry
Consider two graphs $G_1, G_2$ with the same vertices $V$ and edges $E$ but with different weights $W_1, W_2$. We denote their respective Laplacians by $L_1, L_2$. Such graphs are referred to as multi-level graphs, and are used to represent multiple modalities or ‘views’ of the same data. (For simplicity, we consider only two modalities, though extension to more modalities is straightforward.) The main topic of this paper is how to re-define the above spectral geometric constructions (heat kernels, diffusion distances, etc.) in such a way that they account for information from both graphs.
3.1 Laplacian averaging
3.2 Joint diagonalization
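Instead of averaging the Laplacians, Eynard et al. [16] proposed ‘averaging’ their eigenspaces by means of a joint diagonalization approach: construct a common (approximate) eigenbasis $\hat{U}$ that (approximately) diagonalizes the Laplacians $L_1, L_2$, by the following minimization
$$\min_{\hat{U}^\top \hat{U} = I} \; \mathrm{off}(\hat{U}^\top L_1 \hat{U}) + \mathrm{off}(\hat{U}^\top L_2 \hat{U}), \qquad (6)$$
where $\mathrm{off}(A) = \sum_{i \neq j} a_{ij}^2$ denotes the squared norm of the off-diagonal elements of a matrix $A$. The joint basis obtained in this way approximately diagonalizes both Laplacians, i.e., $\hat{U}^\top L_i \hat{U}$ is approximately diagonal for $i = 1, 2$. The approximate matrices
$$\hat{L}_i = \hat{U} \, \mathrm{Diag}(\hat{U}^\top L_i \hat{U}) \, \hat{U}^\top, \qquad i = 1, 2,$$
obtained by setting to zero the off-diagonal elements of $\hat{U}^\top L_i \hat{U}$ are jointly diagonalizable by $\hat{U}$. Importantly, in most cases the matrices $\hat{L}_i$ are not valid Laplacians ($\hat{L}_i \notin \mathcal{L}(G)$), i.e., Laplacian structure does not survive joint diagonalization.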
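The quantities involved in joint diagonalization can be illustrated with a short sketch. It only evaluates the off-diagonal cost for a given candidate basis and does not implement a JADE solver; the helper names and the random test matrices are assumptions made for the example.

```python
import numpy as np

def off(A):
    """Squared Frobenius norm of the off-diagonal part of A."""
    return float(np.sum((A - np.diag(np.diag(A)))**2))

def commutator(A, B):
    """Commutator [A, B] = AB - BA."""
    return A @ B - B @ A

def jade_cost(U, L1, L2):
    """Joint diagonalization objective evaluated at a candidate basis U."""
    return off(U.T @ L1 @ U) + off(U.T @ L2 @ U)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))     # random orthonormal basis
A = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) @ Q.T
B = Q @ np.diag([5.0, 1.0, 4.0, 2.0, 3.0]) @ Q.T     # same eigenvectors as A
E = rng.standard_normal((5, 5))
B_pert = B + 0.1 * (E + E.T)                         # symmetric perturbation of B

cost_exact = jade_cost(Q, A, B)        # A and B commute: cost is zero up to rounding
cost_pert = jade_cost(Q, A, B_pert)    # perturbed pair: Q no longer diagonalizes both
```

For the commuting pair the shared basis gives a vanishing cost, whereas after the perturbation both the commutator and the cost become non-zero, which is the situation joint approximate diagonalization is meant to handle.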
3.3 Closest commuting operators
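In [7], we considered a different problem of finding a pair of commuting symmetric matrices $\tilde{A}, \tilde{B}$ (referred to as closest commuting operators or CCOs) that are closest to the given $A, B$,
$$\min_{\tilde{A}, \tilde{B}} \; \|\tilde{A} - A\|_F^2 + \|\tilde{B} - B\|_F^2 \quad \text{s.t.} \quad \tilde{A}\tilde{B} = \tilde{B}\tilde{A}, \qquad (7)$$
and showed that this problem is equivalent to JADE (6) in the following sense:
Let $A, B$ be symmetric matrices. Then:
$$\min_{\tilde{A}\tilde{B} = \tilde{B}\tilde{A}} \; \|\tilde{A} - A\|_F^2 + \|\tilde{B} - B\|_F^2 \; = \; \min_{U^\top U = I} \; \mathrm{off}(U^\top A U) + \mathrm{off}(U^\top B U).$$
A big advantage of this approach compared to JADE is the possibility to demand that the closest commuting matrices define valid Laplacians, i.e., to restrict the search space to $\mathcal{L}(G)$:
$$\min_{\tilde{L}_1, \tilde{L}_2 \in \mathcal{L}(G)} \; \|\tilde{L}_1 - L_1\|_F^2 + \|\tilde{L}_2 - L_2\|_F^2 \quad \text{s.t.} \quad \tilde{L}_1 \tilde{L}_2 = \tilde{L}_2 \tilde{L}_1. \qquad (8)$$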
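One direction of the relation between closest commuting operators and joint diagonalization can be checked numerically: for any fixed orthonormal basis, the closest pair of matrices diagonalized by that basis commutes, and its squared distance to the original pair equals the off-diagonal cost in that basis. The sketch below assumes an arbitrary illustrative basis (the eigenvectors of the sum of the two matrices), which is not claimed to be the optimal one; the helper names are hypothetical.

```python
import numpy as np

def off(A):
    """Squared Frobenius norm of the off-diagonal part of A."""
    return float(np.sum((A - np.diag(np.diag(A)))**2))

def closest_with_eigenbasis(A, U):
    """Frobenius-closest matrix to A among those diagonalized by the orthonormal basis U."""
    return U @ np.diag(np.diag(U.T @ A @ U)) @ U.T

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
A = A + A.T
B = rng.standard_normal((4, 4))
B = B + B.T

# Hypothetical basis choice for illustration only (not the JADE minimizer).
_, U = np.linalg.eigh(A + B)

A_t = closest_with_eigenbasis(A, U)
B_t = closest_with_eigenbasis(B, U)

distance = np.linalg.norm(A - A_t, "fro")**2 + np.linalg.norm(B - B_t, "fro")**2
off_cost = off(U.T @ A @ U) + off(U.T @ B @ U)
```

Minimizing `off_cost` over all orthonormal bases therefore yields a commuting pair at the same distance, which is the intuition behind the equivalence stated above.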
4 Heat kernel coupling
The methods described in Section 3 rely on the assumption of graphs with equal vertex sets, which may be too restrictive in many cases. More generally, we are given two different graphs $G_1 = (V_1, E_1)$, $G_2 = (V_2, E_2)$, where $|V_1| = n_1$ and $|V_2| = n_2$. The correspondence between the vertices is no longer bijective, but one can consider a functional correspondence [27] $T : \mathcal{F}(V_1) \to \mathcal{F}(V_2)$, represented by the $n_2 \times n_1$ matrix $T$.
Let us consider the heat equation (3) on the graphs $G_1, G_2$, with heat operators $H_i^t = e^{-tL_i}$. We say that the corresponding heat kernels are strongly coupled if the solution of the heat equation on $G_1$ with some initial condition $f$ and the solution of the heat equation on $G_2$ with the corresponding initial condition $g = Tf$ coincide under the correspondence:
$$T H_1^t f = H_2^t g \qquad (9)$$
for $t > 0$. The strong coupling condition implies that the structure of the graphs is similar, in the sense that heat flows on them in the same way. (If the strong coupling condition holds for a set of functions that span the whole $\mathcal{F}(V_1)$, it is equivalent to commutativity of the heat and the functional correspondence operators, $T H_1^t = H_2^t T$. In the case of bijective correspondence ($n_1 = n_2$ and w.l.o.g. $T = I$), having the strong coupling condition hold for one value of $t$ implies that $H_1^t = H_2^t$, hence $L_1 = L_2$, and thus the weighted graphs are isometric.)
If the correspondence $T$ is further assumed to be unknown, we have to replace the strong coupling condition (9) with a weak coupling condition
$$f^\top H_1^t f = g^\top H_2^t g, \qquad (10)$$
where $H_i^t = e^{-tL_i}$ is the heat operator of graph $G_i$, requiring that the projections of the solutions on the corresponding functions be equal. Note that while condition (9) compares vectors (which requires knowledge of the correspondence $T$), condition (10) compares scalars, which does not require knowledge of the correspondence but only of the pair of corresponding functions $f$ and $g$. Obviously, condition (9) implies (10), but not vice versa.
In this weaker setting, we assume that the correspondence $T$ is unknown, but we have a set of $q$ corresponding functions on $G_1$ and $G_2$, represented as columns of matrices $F = (f_1, \ldots, f_q)$ and $G = (g_1, \ldots, g_q)$ such that $G = TF$. Writing the weak coupling condition (10) for every pair of indices $i, j = 1, \ldots, q$, we get the condition on the equality of $q \times q$ matrices
$$F^\top H_1^t F = G^\top H_2^t G, \qquad (11)$$
which we consider on a finite set of values $t \in \{t_1, \ldots, t_K\}$.
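The weak coupling condition can be evaluated without access to the vertex correspondence, as the following sketch illustrates. The helper names, the toy graphs, and the time values are assumptions made for the example: the second graph is a relabelled copy of the first, so the residual vanishes, while removing one edge from it (a ‘cracked’ ring) produces a non-zero residual.

```python
import numpy as np

def heat_operator(L, t):
    """Heat operator exp(-t L) of a symmetric Laplacian via eigendecomposition."""
    lam, U = np.linalg.eigh(L)
    return (U * np.exp(-t * lam)) @ U.T

def coupling_residual(L1, L2, F, G, ts):
    """Sum over t in ts of ||F^T H1^t F - G^T H2^t G||_F^2."""
    total = 0.0
    for t in ts:
        R = F.T @ heat_operator(L1, t) @ F - G.T @ heat_operator(L2, t) @ G
        total += float(np.sum(R**2))
    return total

def cycle_laplacian(n):
    """Laplacian of a unit-weight cycle on n vertices."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
n, q = 8, 3
L1 = cycle_laplacian(n)
P = np.eye(n)[rng.permutation(n)]   # vertex relabelling; never passed to the residual
L2 = P @ L1 @ P.T                   # same graph with relabelled vertices
F = rng.standard_normal((n, q))     # hypothetical functions on the first graph
G = P @ F                           # the corresponding functions on the second graph

ts = [0.5, 1.0, 2.0]                # illustrative time values
res_same = coupling_residual(L1, L2, F, G, ts)

# "Cracked" ring: remove the unit-weight edge between vertices 0 and 1 before relabelling.
e = np.zeros(n)
e[0], e[1] = 1.0, -1.0
L2_cut = P @ (L1 - np.outer(e, e)) @ P.T
res_cut = coupling_residual(L1, L2_cut, F, G, ts)
```

Only the pairs of corresponding functions enter the computation; the permutation is used solely to construct the test data.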
Typically, two different graphs will have their heat kernels uncoupled, violating the coupling conditions (see example in Figure 1 (top), where different behavior of the heat equation stems from topological noise). The problem of heat kernel coupling (HKC) treated in this paper is how to minimally modify the Laplacians of the graphs to make the respective heat kernels (approximately) satisfy the weak coupling condition by enforcing (11); in Figure 1 (bottom) such a modification amounts to disconnecting the rings in both graphs.
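Our HKC problem bears resemblance to the CCO problem described in Section 3: we are looking for new graphs with respective adjacency matrices $\tilde{W}_1, \tilde{W}_2$, such that the new Laplacians $\tilde{L}_1, \tilde{L}_2$ are as close as possible to the original $L_1, L_2$, and the corresponding new heat operators $\tilde{H}_i^t = e^{-t\tilde{L}_i}$ are as coupled as possible,
$$\min_{\tilde{L}_1 \in \mathcal{L}(G_1), \, \tilde{L}_2 \in \mathcal{L}(G_2)} \; \|\tilde{L}_1 - L_1\|_F^2 + \|\tilde{L}_2 - L_2\|_F^2 + \mu \sum_{t} \|F^\top \tilde{H}_1^t F - G^\top \tilde{H}_2^t G\|_F^2, \qquad (12)$$
where $F, G$ are the matrices of corresponding functions from (11), the sum runs over the chosen finite set of time values, and $\mu > 0$ weighs the coupling term against the distance term.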
It is important to observe the following limit case: for graphs with equal vertex and edge sets discussed in Section 3, we have a bijective correspondence between the entries of the heat operators $\tilde{H}_1^t, \tilde{H}_2^t$ of the modified graphs, implying $F = G = I$. In the limit $t \to 0$, we have $\tilde{H}_i^t \approx I - t\tilde{L}_i$, so that coupling of the heat operators requires $\tilde{L}_1 = \tilde{L}_2$. Thus, the HKC problem (12) boils down to the simple Laplacian averaging (5), and can be considered an extension of this technique to the setting where one cannot straightforwardly average Laplacians since the correspondence between the graphs is not given.
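The limit case can be checked numerically. The sketch below assumes identity correspondence and identity function matrices, and uses random dense Laplacians as hypothetical test data: for a small time value, the mismatch of the heat operators is governed to first order by the mismatch of the Laplacians, and once both Laplacians are forced to coincide, the sum of squared distances to the originals is minimized by their arithmetic mean.

```python
import numpy as np

def heat_operator(L, t):
    """Heat operator exp(-t L) of a symmetric Laplacian via eigendecomposition."""
    lam, U = np.linalg.eigh(L)
    return (U * np.exp(-t * lam)) @ U.T

def random_laplacian(n, rng):
    """Laplacian of a complete graph with random positive weights (test data only)."""
    W = np.triu(rng.uniform(0.1, 1.0, size=(n, n)), 1)
    W = W + W.T
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
n = 5
L1, L2 = random_laplacian(n, rng), random_laplacian(n, rng)

# Coupling term with F = G = I at a small time value.
t = 1e-4
coupling = np.sum((heat_operator(L1, t) - heat_operator(L2, t))**2)
first_order = t**2 * np.sum((L1 - L2)**2)    # from H^t ~ I - t L

def distance_term(Lc):
    """Distance term when both modified Laplacians are forced to equal Lc."""
    return np.sum((Lc - L1)**2) + np.sum((Lc - L2)**2)

L_avg = 0.5 * (L1 + L2)                      # arithmetic mean of the two Laplacians
```

The mean of two valid Laplacians on the same edge set is again a valid Laplacian, so no further constraint is violated by this choice.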
5 Numerical optimization
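Using the parametrization (2) of the Laplacians by the edge-weight vectors $w_1, w_2$, the HKC problem (12) is rewritten as the minimization of a cost function (13) over the non-negative edge weights. Its solution is carried out using standard optimization techniques, requiring the gradient of the cost function.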
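We differentiate the cost function (13) w.r.t. the edge weights constituting the vectors $w_1, w_2$, accounting for the symmetric structure of $\tilde{W}_1, \tilde{W}_2$. For either graph, the gradient of the distance term w.r.t. the weight $w_{ij}$ of the edge $(i,j)$ is given by
$$\frac{\partial}{\partial w_{ij}} \|\tilde{L} - L\|_F^2 = 2 \left( D + D^\top - 2(\tilde{L} - L) \right)_{ij},$$
where $D$ is an $n \times n$ matrix with equal columns containing the diagonal of $\tilde{L} - L$.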
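The gradient of the coupling term is computed by applying the chain rule several times, as follows. First, let
$$P_{ij} = (e_i - e_j)(e_i - e_j)^\top,$$
where $e_i$ denotes the $i$th unit vector, be a matrix containing only four non-zero elements in its $i$th and $j$th row and column; it is the derivative of the Laplacian w.r.t. the edge weight $w_{ij}$. Second, for each $t$ and each edge $(i,j)$ compute the matrix exponential
$$\exp \left( -t \begin{pmatrix} \tilde{L}_1 & P_{ij} \\ 0 & \tilde{L}_1 \end{pmatrix} \right)$$
and extract its upper right block, which we denote by $E_{ij}^t$; this block is the derivative of the heat operator $e^{-t\tilde{L}_1}$ w.r.t. $w_{ij}$. Finally,
$$\frac{\partial}{\partial w_{ij}} \|F^\top e^{-t\tilde{L}_1} F - G^\top e^{-t\tilde{L}_2} G\|_F^2 = 2 \, \mathrm{tr} \left( (F^\top e^{-t\tilde{L}_1} F - G^\top e^{-t\tilde{L}_2} G)^\top F^\top E_{ij}^t F \right),$$
and analogously for the edge weights of the second graph.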
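The two gradient computations can be sketched and checked against finite differences as follows. This is a minimal illustration for a single graph and a single time value, under stated assumptions: the contribution of the second graph is frozen into a constant matrix `C`, the toy cycle graph and the function names are hypothetical, and the matrix exponential is a simple truncated Taylor series with scaling and squaring that stands in for a library routine such as `scipy.linalg.expm`.

```python
import numpy as np

def expm(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = A / 2.0**s
    X = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        X = X + term
    for _ in range(s):
        X = X @ X
    return X

def edge_matrix(n, i, j):
    """dL/dw_ij: four non-zero entries, in rows and columns i and j."""
    P = np.zeros((n, n))
    P[i, i] = P[j, j] = 1.0
    P[i, j] = P[j, i] = -1.0
    return P

def laplacian(n, edges, w):
    """Laplacian as a weighted sum of edge matrices."""
    return sum(wk * edge_matrix(n, i, j) for (i, j), wk in zip(edges, w))

def distance_cost(w, n, edges, L0):
    return np.sum((laplacian(n, edges, w) - L0)**2)

def distance_grad(w, n, edges, L0):
    Delta = laplacian(n, edges, w) - L0
    D = np.outer(np.diag(Delta), np.ones(n))     # equal columns holding diag(Delta)
    M = 2.0 * (D + D.T - 2.0 * Delta)
    return np.array([M[i, j] for i, j in edges])

def coupling_cost(w, n, edges, F, C, t):
    R = F.T @ expm(-t * laplacian(n, edges, w)) @ F - C
    return np.sum(R**2)

def coupling_grad(w, n, edges, F, C, t):
    L = laplacian(n, edges, w)
    R = F.T @ expm(-t * L) @ F - C
    g = np.zeros(len(w))
    for k, (i, j) in enumerate(edges):
        block = np.block([[-t * L, -t * edge_matrix(n, i, j)],
                          [np.zeros((n, n)), -t * L]])
        dH = expm(block)[:n, n:]                 # derivative of exp(-t L) w.r.t. w_ij
        g[k] = 2.0 * np.sum(R * (F.T @ dH @ F))
    return g

def finite_difference(cost, w, eps=1e-6):
    """Central finite-difference approximation of the gradient of cost at w."""
    g = np.zeros(len(w))
    for k in range(len(w)):
        d = np.zeros(len(w))
        d[k] = eps
        g[k] = (cost(w + d) - cost(w - d)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
n, t = 5, 0.7
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # hypothetical 5-cycle
w = rng.uniform(0.5, 1.5, size=len(edges))
L0 = laplacian(n, edges, np.ones(len(edges)))      # original Laplacian
F = rng.standard_normal((n, 2))
C = F.T @ expm(-t * L0) @ F                        # stands in for the fixed term G^T H_2^t G
```

In practice one block exponential per edge and per time value is required, so the cost of a gradient evaluation grows with the number of edges; a library implementation of the matrix exponential would normally replace the Taylor-series stand-in used here.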
6 Results
In this section, we demonstrate our HKC approach on several synthetic and real datasets coming from shape analysis, manifold learning, and pattern recognition problems. The experiments closely follow our previous work, and their leitmotif is, given two datasets representing similar objects in somewhat different ways, to reconcile the information of the two modalities, producing a single consistent representation. We should stress that though we know the groundtruth correspondence between the vertices of the graphs representing different modalities, we are not using it in our HKC problem. Instead, we only assume to be given a few corresponding functions that are used to couple the heat kernels.
Circles. We used two graphs shaped as two eccentric circles, containing 64 points and having different connectivity (Figure 1, top). We used four corresponding functions in the HKC optimization. The closest Laplacians that produce coupled heat kernels result in the edge weights shown in Figure 1 (bottom): the optimization performs a ‘surgery’, disconnecting the inconsistent connections and producing two connected components.
Ring. We used a ring and a cracked ring sampled at 70 points and connected using four nearest neighbors (Figure 2, top and bottom). Only three functions were used for coupling (Figure 2, three leftmost columns). Because of the topological difference, the behavior of the heat flow differs dramatically (Figure 2, fourth column from left). The HKC optimization cuts the connections in the first graph, making the two rings topologically equivalent and resulting in the same heat flow (Figure 2, rightmost).
Man. We used two poses of the human shape from the TOSCA dataset [5], uniformly sampled at 500 points and connected using five nearest neighbors. The resulting graphs have different topology (the hands are connected or disconnected, compare Figure 3 top and bottom), resulting in a very different heat flow. Two functions were used for coupling (Figure 3, two leftmost columns) in our HKC problem; our optimization disconnects the links between the hands (Figure 3, right), making the heat flow in both cases behave similarly.
NUS. We used a subset of the NUS-WIDE dataset [10] containing images (represented by 64-dimensional color histograms) and their text annotations (represented by 1000-dimensional distributions of most frequent tags) from seven classes. The classes were selected on purpose in order to be ambiguous in different modalities: for example, in the Tags modality underwater tigers can be similar both to tigers and water animals, as they share many tags. On the other hand, in the Color modality tigers may be similar to the class of nature, containing images with orange-yellow autumn colors.
In each modality, we used Laplacians with Gaussian weights and 25 nearest neighbors, computed with self-tuning scales. Seven functions were used for coupling. We computed diffusion distances (4) on the original and the modified graphs, and used them to rank the dataset entries in a leave-one-out retrieval experiment. Retrieval performance was evaluated using mean average precision (mAP), computed from the following quantities: $\mathrm{rel}(r)$ is the relevance of a given rank $r$ (one if the result at that rank belongs to the same class as the query and zero otherwise), $R$ is the number of retrieved results, and $P(r)$ is precision at $r$, defined as the percentage of relevant results in the first $r$ top-ranked retrieved matches. Respectively, recall is defined as the percentage of relevant results in the first $r$ top-ranked retrieved matches out of all items belonging to the query class.
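A leave-one-out retrieval evaluation of this kind can be sketched as follows. The normalization of average precision by the number of relevant retrieved items is a common convention and an assumption here, since the exact formula used in the experiments is not reproduced in this text; the function names and the toy data are likewise hypothetical.

```python
import numpy as np

def average_precision(relevance):
    """Average of precision-at-r over the ranks r holding relevant results.

    relevance : sequence of 0/1 flags ordered from best to worst match
    """
    rel = np.asarray(relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    ranks = np.arange(1, len(rel) + 1)
    precision_at_r = np.cumsum(rel) / ranks
    return float(np.sum(precision_at_r * rel) / rel.sum())

def mean_average_precision(dist, labels):
    """Leave-one-out retrieval: every item queries all the others, ranked by distance."""
    labels = np.asarray(labels)
    aps = []
    for q in range(len(labels)):
        order = np.argsort(dist[q], kind="stable")
        order = order[order != q]                 # leave the query itself out
        aps.append(average_precision(labels[order] == labels[q]))
    return float(np.mean(aps))

# Hypothetical toy data: four points on a line forming two well-separated classes.
pts = np.array([0.0, 1.0, 10.0, 11.0])
toy_dist = np.abs(pts[:, None] - pts[None, :])
toy_labels = [0, 0, 1, 1]
toy_map = mean_average_precision(toy_dist, toy_labels)
```

In the setting described above, `dist` would hold the pairwise diffusion distances computed on the original or on the modified graph, and `labels` the class of each dataset entry.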
Figure 4 shows the precision-recall curve of different methods, and Table 1 summarizes the mean average precision. We can see that after HKC optimization, performance increases significantly, outperforming each modality on its own. Figure 5 shows examples of first matches corresponding to ambiguous queries. For reference only, we show the performance of Laplacian averaging, which however relies on bijective correspondence between the graphs (which is not used in our HKC problem) and is thus not directly comparable.
Table 1: Mean average precision of the compared methods.
|Tags only|0.75|82.3 %|78.4 %|
| |1.0|81.2 %|77.0 %|
| |1.25|79.7 %|76.2 %|
|Color only|0.75|61.8 %|55.7 %|
| |1.0|61.6 %|53.5 %|
| |1.25|59.6 %|51.5 %|
|HKC Tags|0.75|87.3 %|82.2 %|
| |1.0|86.3 %|81.5 %|
| |1.25|84.4 %|80.3 %|
|HKC Color|0.75|83.2 %|76.2 %|
| |1.0|82.3 %|75.6 %|
| |1.25|80.6 %|74.6 %|
|Average|0.75|68.7 %|64.2 %|
| |1.0|66.5 %|61.0 %|
| |1.25|63.6 %|58.8 %|
7 Conclusions
We introduced the heat kernel coupling problem, whereby we seek to minimally modify a pair of Laplacians so that the corresponding heat kernels become coupled, i.e., the solutions of the heat equation on the two graphs behave consistently. This problem generalizes simple Laplacian averaging to the setting where the correspondence between the two graphs is unknown.
This research was supported by the ERC Starting Grant No. 307047 (COMET).
- [1] X. Alameda-Pineda, V. Khalidov, R. Horaud, and F. Forbes. Finding audio-visual events in informal social gatherings. In Proc. ICMI, 2011.
- [2] M. Bansal and K. Daniilidis. Joint spectral correspondence for disparate image matching. In Proc. CVPR, 2013.
- [3] R. Bekkerman, R. El-Yaniv, and A. McCallum. Multi-way distributional clustering via pairwise interactions. In Proc. ICML, 2005.
- [4] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15:1373–1396, 2002.
- [5] A. M. Bronstein, M. M. Bronstein, and R. Kimmel. Numerical geometry of non-rigid shapes. Springer, 2008.
- [6] M. M. Bronstein, A. M. Bronstein, F. Michel, and N. Paragios. Data fusion through cross-modality metric learning using similarity-sensitive hashing. In Proc. CVPR, 2010.
- [7] M. M. Bronstein, K. Glashoff, and T. A. Loring. Making Laplacians commute. ArXiv:1307.6549, 2013.
- [8] X. Cai, F. Nie, H. Huang, and F. Kamangar. Heterogeneous image feature integration via multi-modal spectral clustering. In Proc. CVPR, 2011.
- [9] J.-F. Cardoso and A. Souloumiac. Jacobi angles for simultaneous diagonalization. SIAM J. Mat. Analysis Appl., 17:161–164, 1996.
- [10] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y.-T. Zheng. NUS-WIDE: A real-world web image database from National University of Singapore. In Proc. CIVR, 2009.
- [11] R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21:5–30, 2006.
- [12] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, F. Warner, and S. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. PNAS, 102(21):7426–7431, 2005.
- [13] V. R. de Sa. Spectral clustering with two views. In Proc. ICML Workshop on Learning with Multiple Views, 2005.
- [14] C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In Proc. ICDM, 2001.
- [15] X. Dong, P. Frossard, P. Vandergheynst, and N. Nefedov. Clustering on multi-layer graphs via subspace analysis on Grassmann manifolds. ArXiv:1303.2221, 2013.
- [16] D. Eynard, K. Glashoff, M. M. Bronstein, and A. M. Bronstein. Multimodal diffusion geometry by joint diagonalization of Laplacians. ArXiv:1209.2295, 2012.
- [17] J. Ham, D. Lee, and L. Saul. Semisupervised alignment of manifolds. In Proc. Conf. Uncertainty in Artificial Intelligence, 2005.
- [18] G. Irie, D. Liu, Z. Li, and S.-F. Chang. A Bayesian approach to multimodal visual dictionary learning. In Proc. CVPR, 2013.
- [19] E. Kidron, Y. Y. Schechner, and M. Elad. Pixels that sound. In Proc. CVPR, 2005.
- [20] A. Kovnatsky, M. M. Bronstein, A. M. Bronstein, K. Glashoff, and R. Kimmel. Coupled quasi-harmonic bases. Computer Graphics Forum, 32:439–448, 2013.
- [21] A. Kumar, P. Rai, and H. Daumé III. Co-regularized multi-view spectral clustering. In Proc. NIPS, 2011.
- [22] C. Ma and C.-H. Lee. Unsupervised anchor shot detection using multi-modal spectral clustering. In Proc. ICASSP, 2008.
- [23] J. Masci, M. M. Bronstein, A. M. Bronstein, and J. Schmidhuber. Multimodal similarity-preserving hashing. Trans. PAMI, 2014.
- [24] B. McFee and G. R. G. Lanckriet. Learning multi-modal similarity. JMLR, 12:491–523, 2011.
- [25] B. Nadler, S. Lafon, R. R. Coifman, and I. G. Kevrekidis. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. In Proc. NIPS, 2005.
- [26] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Proc. NIPS, 2001.
- [27] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. J. Guibas. Functional maps: A flexible representation of maps between shapes. Trans. Graphics, 31(4), 2012.
- [28] M. Ovsjanikov, Q. Mérigot, F. Mémoli, and L. Guibas. One point isometric matching with the heat kernel. Computer Graphics Forum, 29(5):1555–1564, 2010.
- [29] N. Rasiwasia, J. Costa Pereira, E. Coviello, G. Doyle, G. R. G. Lanckriet, R. Levy, and N. Vasconcelos. A new approach to cross-modal multimedia retrieval. In Proc. ICM, 2010.
- [30] A. Sharma, A. Kumar, H. Daume, and D. W. Jacobs. Generalized multiview analysis: A discriminative latent space. In Proc. CVPR, 2012.
- [31] J. Shi and J. Malik. Normalized cuts and image segmentation. Trans. PAMI, 22:888–905, 2000.
- [32] V. Sindhwani, P. Niyogi, and M. Belkin. A co-regularization approach to semi-supervised learning with multiple views. In Proc. ICML Workshop on Learning with Multiple Views, 2005.
- [33] W. Tang, Z. Lu, and I. S. Dhillon. Clustering with multiple graphs. In Proc. Data Mining, 2009.
- [34] C. Wang and S. Mahadevan. Manifold alignment using procrustes analysis. In Proc. ICML, 2008.
- [35] C. Wang and S. Mahadevan. A general framework for manifold alignment. In Proc. Symp. Manifold Learning and its Applications, 2009.
- [36] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In Proc. NIPS, 2008.
- [37] J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1):21–35, 2010.