1 Introduction
Many problems in computer vision and pattern recognition boil down to constructing a Laplacian operator describing some data manifold and finding its eigenvectors. Notable examples include spectral clustering
[26], eigenmaps [4], diffusion maps and distances [12, 25], spectral graph partitioning [14], spectral hashing [36], and image segmentation [31]. Recently, there has been an increased interest in extending spectral geometric constructions to the multimodal setting, involving two or more data spaces. Many data analysis applications involve observations and measurements of data using different modalities, such as multimedia documents [3, 37, 29, 24, 18, 23], audio and video [19, 1, 30], images with different lighting conditions [2], or medical imaging modalities [6].
Multimodal (or ‘multi-view’) clustering was studied in the computer vision and pattern recognition community [13, 22, 33, 8, 21, 15]. Sindhwani et al. [32] used a convex combination of Laplacians in the ‘co-regularization’ framework. Manifold alignment considers multiple manifolds as a single space with ‘connections’ between points and tries to find an aligned set of eigenvectors [17, 35, 34]. Eynard et al. [16] proposed finding a common eigenbasis of multiple Laplacians by means of joint approximate diagonalization (JADE). Kovnatsky et al. [20] improved this method using subspace parametrization. Bronstein et al. [7] studied the problem of finding closest commuting operators (CCO) and showed its equivalence to joint diagonalization.
One of the main limitations of the JADE and CCO problems is the assumption of a given bijective correspondence (or, more generally, a functional correspondence [27]) between the underlying manifolds or graphs. In this paper, we consider the setting where such a correspondence is unknown or may not exist, and instead one is given a set of corresponding functions. We formulate a problem similar to CCO, wherein we minimally modify the Laplacians such that the corresponding heat kernels behave consistently. In the limit case of a given bijective correspondence, this heat kernel coupling problem is equivalent to Laplacian averaging.
2 Background
Notation and definitions. Let $A, B$ be two real symmetric $n \times n$ matrices. We denote by $\|A\|_{\mathrm{F}} = (\sum_{i,j} a_{ij}^2)^{1/2}$ the Frobenius norm of $A$. We say that $A$ and $B$ commute if $AB = BA$, and call $[A, B] = AB - BA$ their commutator. If there exists a unitary matrix $V$ such that $V^\top A V$ and $V^\top B V$ are diagonal, we say that $A, B$ are jointly diagonalizable and call such a $V$ the joint eigenbasis of $A$ and $B$. Two symmetric matrices are jointly diagonalizable iff they commute. We denote by $\mathrm{diag}(A)$ a column vector containing the diagonal elements of the matrix $A$, and by $\mathrm{diag}(a)$ a diagonal matrix containing on the diagonal the elements of the vector $a$. Furthermore, we use $\mathrm{Diag}(A) = \mathrm{diag}(\mathrm{diag}(A))$ to denote the diagonal matrix obtained by setting to zero the off-diagonal elements of $A$.

Laplacians. Let us be given an undirected weighted graph without loops (i.e., a simple graph) $G = (V, E)$ with vertex set $V = \{1, \ldots, n\}$ and edge set $E \subseteq V \times V$ such that $(i, i) \notin E$ for $i = 1, \ldots, n$. There are given non-negative weights $w_{ij} = w_{ji} \ge 0$, satisfying $w_{ij} = 0$ if $i, j$ are not connected (i.e., $(i, j) \notin E$). The matrix $W = (w_{ij})$ is called the adjacency matrix, and
(1)  $L = \mathrm{diag}(W \mathbf{1}) - W$

(where $\mathbf{1} = (1, \ldots, 1)^\top$) is called the (unnormalized) Laplacian of $G$.
Hereinafter, we denote by $\mathcal{L}(G)$ the set of all valid Laplacian matrices of a simple graph $G$, which is defined as follows: $L \in \mathcal{L}(G)$ iff (i) $L = L^\top$; (ii) $l_{ij} \le 0$ for all $i \ne j$, with $l_{ij} = 0$ whenever $(i, j) \notin E$; and (iii) $L \mathbf{1} = \mathbf{0}$.
Defining the Laplacian according to (1) through the edge weight matrix $W$, we automatically get properties (i)–(iii) satisfied. The other way round: any valid Laplacian of a simple graph $G$, in the sense of (i)–(iii), gives rise to a weight matrix of a simple weighted graph by defining $w_{ij} = -l_{ij}$ for $i \ne j$ and $w_{ii} = 0$.
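The construction (1) and its inversion can be checked numerically; the following is a minimal sketch in NumPy (the function name `laplacian` is ours, not from the paper):

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = diag(W 1) - W, as in (1)."""
    return np.diag(W.sum(axis=1)) - W

# Toy simple weighted graph on 3 vertices with edges (1,2) and (2,3).
W = np.array([[0.0, 2.0, 0.0],
              [2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = laplacian(W)

# Properties (i)-(iii) of a valid Laplacian:
assert np.allclose(L, L.T)                     # (i)  symmetry
assert (L - np.diag(np.diag(L)) <= 0).all()    # (ii) non-positive off-diagonal
assert np.allclose(L @ np.ones(3), 0.0)        # (iii) rows sum to zero

# Conversely, the weights are recovered as w_ij = -l_ij for i != j:
W_rec = -(L - np.diag(np.diag(L)))
assert np.allclose(W_rec, W)
```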
For numerical purposes, we will make use of a proper parametrization of the set of valid Laplacians. Let $m = |E|$ denote the number of edges of $G$. For $w = (w_1, \ldots, w_m)^\top \in \mathbb{R}_{\ge 0}^m$ we define the weight matrix $W(w)$ by
(2)  $(W(w))_{ij} = \begin{cases} w_e & \text{if } e = (i, j) \in E, \\ 0 & \text{otherwise.} \end{cases}$
Defining $L(w)$ as in (1), the requirements (i)–(iii) of a valid Laplacian are satisfied. In an undirected weighted graph, the matrices $W$ and $L$ are symmetric. Furthermore, $L$ is positive semidefinite. Consequently, $L$ admits the unitary eigendecomposition $L = \Phi \Lambda \Phi^\top$ with orthonormal eigenvectors $\Phi = (\phi_1, \ldots, \phi_n)$ and non-negative real eigenvalues $0 = \lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.

Heat diffusion on graphs. Let $f : V \to \mathbb{R}$ denote a function defined on the vertex set of the graph. We can identify $f$ with an $n$-dimensional vector $f = (f(1), \ldots, f(n))^\top$, and denote by $\mathcal{F}(V) \cong \mathbb{R}^n$ the space of such functions.
Similarly to the standard heat diffusion equation, one can define a diffusion process on $G$, governed by the following equation:

(3)  $\frac{\mathrm{d} f(t)}{\mathrm{d} t} = -L f(t), \qquad f(0) = f_0,$

where the solution $f(t) = (f_1(t), \ldots, f_n(t))^\top$ is the amount of heat at time $t$ at the vertices $1, \ldots, n$. The solution of the heat equation is given by $f(t) = H^t f_0$, where

$H^t = e^{-tL}$

is the heat operator (or the heat kernel). The heat operator gives rise to the diffusion distance [12]

(4)  $d_t^2(i, j) = \| H^t (\delta_i - \delta_j) \|_2^2 = \sum_{k=1}^{n} e^{-2 t \lambda_k} \big( \phi_k(i) - \phi_k(j) \big)^2,$

where $\delta_i$ denotes the indicator function of vertex $i$.
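A short numerical sketch of the heat operator and its basic properties (heat conservation, the semigroup law, and convergence to the uniform distribution on a connected graph); `heat_operator` is an illustrative helper name:

```python
import numpy as np
from scipy.linalg import expm

# Path graph on 4 vertices with unit weights.
W = np.diag(np.ones(3), 1); W = W + W.T
L = np.diag(W.sum(1)) - W

def heat_operator(L, t):
    """Heat operator H^t = e^{-tL} solving the heat equation (3)."""
    return expm(-t * L)

f0 = np.array([1.0, 0.0, 0.0, 0.0])          # unit heat at the first vertex
f = heat_operator(L, 0.5) @ f0

assert np.isclose(f.sum(), 1.0)              # heat is conserved, since 1^T L = 0
# Semigroup property H^{t+s} = H^t H^s:
assert np.allclose(heat_operator(L, 0.7),
                   heat_operator(L, 0.3) @ heat_operator(L, 0.4))
# On a connected graph, heat spreads to the uniform distribution as t grows:
assert np.allclose(heat_operator(L, 100.0) @ f0, 0.25, atol=1e-6)
```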
3 Multimodal spectral geometry
Consider two graphs $G_1 = (V, E, W_1)$ and $G_2 = (V, E, W_2)$ with the same vertices and edges but different weights $W_1 \ne W_2$. We denote their respective Laplacians by $L_1, L_2$. Such graphs are referred to as multi-layer graphs [15], and are used to represent multiple modalities or ‘views’ of the same data.¹ The main topic of this paper is how to redefine the above spectral geometric constructions (heat kernels, diffusion distances, etc.) in a way that they account for information from both graphs.

¹For simplicity, we consider only two modalities, though the extension to more modalities is straightforward.
3.1 Laplacian averaging

The simplest way of fusing the two modalities is averaging the Laplacians,

(5)  $\bar{L} = \tfrac{1}{2}(L_1 + L_2),$

which is the minimizer of $\|\tilde{L} - L_1\|_{\mathrm{F}}^2 + \|\tilde{L} - L_2\|_{\mathrm{F}}^2$. Since the graphs have equal vertex and edge sets, averaging the Laplacians amounts to averaging the corresponding edge weights, and thus $\bar{L} \in \mathcal{L}(G)$.
3.2 Joint diagonalization
Instead of averaging the Laplacians, Eynard et al. [16] proposed ‘averaging’ their eigenspaces by means of a joint diagonalization approach: construct a common (approximate) eigenbasis $\hat{V}$ that (approximately) diagonalizes the Laplacians $L_1, L_2$, by the following minimization:

(6)  $\min_{\hat{V}^\top \hat{V} = I} \; \mathrm{off}(\hat{V}^\top L_1 \hat{V}) + \mathrm{off}(\hat{V}^\top L_2 \hat{V}),$

where $\mathrm{off}(A) = \sum_{i \ne j} a_{ij}^2$ denotes the squared norm of the off-diagonal elements of a matrix [9]. The joint basis $\hat{V}$ obtained in this way satisfies $\hat{V}^\top L_i \hat{V} \approx \Lambda_i$, $i = 1, 2$. The approximate matrices

$\hat{L}_i = \hat{V} \, \mathrm{Diag}(\hat{V}^\top L_i \hat{V}) \, \hat{V}^\top$

obtained by setting to zero the off-diagonal elements of $\hat{V}^\top L_i \hat{V}$ are jointly diagonalizable by $\hat{V}$. Importantly, in most cases $\hat{L}_i \notin \mathcal{L}(G)$, i.e., Laplacian structure does not survive joint diagonalization.
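The ideal (exactly commuting) case can be illustrated numerically. The sketch below does not implement the Jacobi-type JADE algorithm of [9, 16]; instead it uses the fact that for two circulant Laplacians (which commute exactly) any eigenbasis of their sum is a joint eigenbasis, driving $\mathrm{off}(\cdot)$ to numerical zero:

```python
import numpy as np

def off(A):
    """off(A): squared norm of the off-diagonal elements of A."""
    return float((A ** 2).sum() - (np.diag(A) ** 2).sum())

n = 6
def circ_laplacian(weights):
    """Laplacian of a circulant graph on n vertices; `weights` maps the
    hop distance k to the weight of the edges (i, i+k mod n)."""
    W = np.zeros((n, n))
    for k, w in weights.items():
        for i in range(n):
            W[i, (i + k) % n] = W[(i + k) % n, i] = w
    return np.diag(W.sum(1)) - W

L1 = circ_laplacian({1: 1.0})            # nearest-neighbour ring
L2 = circ_laplacian({1: 0.3, 2: 1.0})    # ring with additional chords
assert np.allclose(L1 @ L2, L2 @ L1)     # circulant matrices commute...

# ...and commuting symmetric matrices are jointly diagonalizable: here every
# eigenspace of L1 + L2 is a common eigenspace of L1 and L2, so an eigenbasis
# of the sum diagonalizes both.
_, V = np.linalg.eigh(L1 + L2)
assert off(V.T @ L1 @ V) < 1e-12 and off(V.T @ L2 @ V) < 1e-12
```

For non-commuting Laplacians, (6) is only approximately minimized and the residual off-diagonal energy is exactly what Theorem 3.1 below relates to the CCO distance.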
3.3 Closest commuting operators
In [7], we considered a different problem of finding a pair of commuting matrices (referred to as closest commuting operators, or CCOs) that are closest to the given $L_1, L_2$,

(7)  $\min_{\tilde{L}_1, \tilde{L}_2} \; \|\tilde{L}_1 - L_1\|_{\mathrm{F}}^2 + \|\tilde{L}_2 - L_2\|_{\mathrm{F}}^2 \quad \text{s.t.} \quad [\tilde{L}_1, \tilde{L}_2] = 0,$
and showed that this problem is equivalent to JADE (6) in the following sense:
Theorem 3.1.
Let $L_1, L_2$ be symmetric matrices. Then:

1. $\displaystyle \min_{[\tilde{L}_1, \tilde{L}_2] = 0} \; \sum_{i=1,2} \|\tilde{L}_i - L_i\|_{\mathrm{F}}^2 \;=\; \min_{\hat{V}^\top \hat{V} = I} \; \sum_{i=1,2} \mathrm{off}(\hat{V}^\top L_i \hat{V}).$
A big advantage of this approach compared to JADE is the possibility to demand that the closest commuting matrices define valid Laplacians, i.e., to restrict the search space to $\mathcal{L}(G)$:

(8)  $\min_{\tilde{L}_1, \tilde{L}_2 \in \mathcal{L}(G)} \; \|\tilde{L}_1 - L_1\|_{\mathrm{F}}^2 + \|\tilde{L}_2 - L_2\|_{\mathrm{F}}^2 \quad \text{s.t.} \quad [\tilde{L}_1, \tilde{L}_2] = 0.$
4 Heat kernel coupling
The methods described in Section 3 rely on the assumption of graphs with an equal vertex set, which may be too restrictive in many cases. More generally, we are given two different graphs $G_1 = (V_1, E_1, W_1)$ and $G_2 = (V_2, E_2, W_2)$, where $|V_1| = n_1$ and $|V_2| = n_2$ may differ. The correspondence between the vertices is not bijective anymore, but one can consider a functional correspondence $T : \mathcal{F}(V_1) \to \mathcal{F}(V_2)$, represented by an $n_2 \times n_1$ matrix $P$ [27].
Let us consider the heat equation (3) on the graphs $G_1, G_2$, with heat operators $H_1^t = e^{-t L_1}$ and $H_2^t = e^{-t L_2}$. We say that the corresponding heat kernels are strongly coupled if the solution of the heat equation on $G_1$ with some initial condition $f_0$ and the solution of the heat equation on $G_2$ with the corresponding initial condition $P f_0$ coincide under the correspondence:

(9)  $P H_1^t f_0 = H_2^t P f_0$

for all $t > 0$. The strong coupling condition implies that the structure of the graphs is similar, in the sense that heat flows on them in the same way.² In the case of a bijective correspondence ($n_1 = n_2$ and, w.l.o.g., $P = I$), having the strong coupling condition hold for one value of $t > 0$ implies that $L_1 = L_2$, and thus the weighted graphs are isometric [28].

²If the strong coupling condition holds for a set of initial conditions $f_0$ that span the whole $\mathcal{F}(V_1)$, it is equivalent to commutativity of the heat and functional correspondence operators, $P H_1^t = H_2^t P$.
If the correspondence is further assumed to be unknown, we have to replace the strong coupling condition (9) with a weak coupling condition

(10)  $\langle f', H_1^t f \rangle = \langle g', H_2^t g \rangle,$

requiring that the projections of the solutions on corresponding functions are equal, where $(f, g)$ and $(f', g')$ are pairs of corresponding functions (i.e., $g = P f$ and $g' = P f'$). Note that while condition (9) compares vectors (which requires the knowledge of the correspondence $P$), condition (10) compares scalars, which does not require the knowledge of the correspondence but rather pairs of corresponding functions $f, f'$ and $g, g'$. Obviously, condition (9) implies (10), but not vice versa.
In this weaker setting, we assume that the correspondence $P$ is unknown, but we have a set of $q$ corresponding functions on $G_1$ and $G_2$, represented as the columns of the $n_1 \times q$ and $n_2 \times q$ matrices $F = (f_1, \ldots, f_q)$ and $G = (g_1, \ldots, g_q)$, such that $G = P F$. Writing the weak coupling condition (10) for every pair of columns, we get the condition on the equality of $q \times q$ matrices

(11)  $F^\top H_1^t F = G^\top H_2^t G,$

which we consider on a finite set of values $t \in \{t_1, \ldots, t_T\}$.
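The deviation from condition (11) can be measured directly; a minimal sketch (the helper name `coupling_residual` is ours) showing that two identical graphs are perfectly coupled while perturbing a single edge weight breaks the coupling:

```python
import numpy as np
from scipy.linalg import expm

def coupling_residual(L1, L2, F, G, ts):
    """Deviation from the weak coupling condition (11), summed over the
    time values in ts: || F^T e^{-t L1} F - G^T e^{-t L2} G ||_F^2."""
    return sum(np.linalg.norm(F.T @ expm(-t * L1) @ F
                              - G.T @ expm(-t * L2) @ G, 'fro') ** 2
               for t in ts)

# 5-vertex path graph and a copy with one perturbed edge weight.
W = np.diag(np.ones(4), 1); W = W + W.T
L = np.diag(W.sum(1)) - W
Wp = W.copy(); Wp[0, 1] = Wp[1, 0] = 0.1
Lp = np.diag(Wp.sum(1)) - Wp

rng = np.random.default_rng(0)
F = rng.standard_normal((5, 3))      # three 'corresponding functions'
ts = [0.5, 1.0, 2.0]

res_same = coupling_residual(L, L, F, F, ts)   # identical graphs: coupled
res_pert = coupling_residual(L, Lp, F, F, ts)  # perturbed graph: uncoupled
assert np.isclose(res_same, 0.0) and res_pert > 1e-3
```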
Typically, two different graphs will have uncoupled heat kernels, violating the coupling conditions (see the example in Figure 1, top, where the different behavior of the heat equation stems from topological noise). The problem of heat kernel coupling (HKC) treated in this paper is how to minimally modify the Laplacians of the graphs to make the respective heat kernels (approximately) satisfy the weak coupling condition by enforcing (11); in Figure 1 (bottom), such a modification amounts to disconnecting the rings in both graphs.
Our HKC problem bears resemblance to the CCO problem described in Section 3: we are looking for new graphs with respective adjacency matrices $\tilde{W}_1, \tilde{W}_2$, such that the new Laplacians $\tilde{L}_1, \tilde{L}_2$ are as close as possible to the original $L_1, L_2$, and the corresponding new heat operators are as coupled as possible,

(12)  $\min_{\tilde{L}_1 \in \mathcal{L}(G_1), \, \tilde{L}_2 \in \mathcal{L}(G_2)} \; \sum_{i=1,2} \|\tilde{L}_i - L_i\|_{\mathrm{F}}^2 \;+\; \alpha \sum_{k=1}^{T} \big\| F^\top e^{-t_k \tilde{L}_1} F - G^\top e^{-t_k \tilde{L}_2} G \big\|_{\mathrm{F}}^2,$

where the parameter $\alpha > 0$ controls the tradeoff between closeness to the original Laplacians and coupling of the heat kernels.
It is important to observe the following limit case: for graphs with equal vertex and edge sets discussed in Section 3, we have a bijective correspondence and may take $F = G = I$, so that the coupling condition (11) becomes the equality of the heat operators, $e^{-t \tilde{L}_1} = e^{-t \tilde{L}_2}$. In the limit $t \to 0$, we have $e^{-t \tilde{L}_i} \approx I - t \tilde{L}_i$, from which it follows that $\tilde{L}_1 = \tilde{L}_2$. Thus, the HKC problem (12) boils down to the simple Laplacian averaging (5), and can be considered an extension of this technique to the setting where one cannot straightforwardly average Laplacians since the correspondence between the graphs is not given.
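This reduction can be written out explicitly, using only the first-order expansion of the matrix exponential and the fact that the average minimizes the sum of squared Frobenius distances:

```latex
% Bijective case (F = G = I): the coupling condition and the
% first-order expansion of the heat operator give
e^{-t\tilde{L}_1} = e^{-t\tilde{L}_2}
\;\xrightarrow{\;t \to 0\;}\;
I - t\tilde{L}_1 + O(t^2) = I - t\tilde{L}_2 + O(t^2)
\;\Longrightarrow\;
\tilde{L}_1 = \tilde{L}_2 =: \tilde{L}.
% Only the fidelity term of (12) remains, whose minimizer is the average (5):
\min_{\tilde{L}} \;\|\tilde{L} - L_1\|_{\mathrm{F}}^2
               + \|\tilde{L} - L_2\|_{\mathrm{F}}^2
\quad\Longrightarrow\quad
\tilde{L} = \tfrac{1}{2}\,(L_1 + L_2).
```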
5 Numerical optimization
We parametrize our problem through the adjacency matrices $W_1(w_1), W_2(w_2)$, where the edge weight vectors $w_1 \in \mathbb{R}_{\ge 0}^{m_1}$ and $w_2 \in \mathbb{R}_{\ge 0}^{m_2}$ are defined according to (2). Problem (12) can be rewritten as

(13)  $\min_{w_1, w_2 \ge 0} \; \sum_{i=1,2} \|L_i(w_i) - L_i\|_{\mathrm{F}}^2 \;+\; \alpha \sum_{k=1}^{T} \big\| F^\top e^{-t_k L_1(w_1)} F - G^\top e^{-t_k L_2(w_2)} G \big\|_{\mathrm{F}}^2.$
The problem is solved using standard optimization techniques, which require the gradient of the cost function.
We differentiate the cost function (13) w.r.t. the edge weights constituting the vectors $w_1, w_2$, accounting for the symmetric structure of $L_i(w_i)$. Since $\partial L(w) / \partial w_e = E_e$ for the edge $e = (i, j)$, where $E_e$ is the $n \times n$ matrix with only four nonzero elements, $(E_e)_{ii} = (E_e)_{jj} = 1$ and $(E_e)_{ij} = (E_e)_{ji} = -1$, the gradient of the distance term is given by

$\frac{\partial}{\partial w_e} \|L(w) - L\|_{\mathrm{F}}^2 = 2 \big( (L(w) - L)_{ii} + (L(w) - L)_{jj} - 2 (L(w) - L)_{ij} \big).$
The gradient of the coupling term is computed by applying the chain rule several times, as follows. First, for the edge $e = (i, j)$, let $E_e = \partial L(w) / \partial w_e$ be the $n \times n$ matrix containing only four nonzero elements in its $i$th and $j$th rows and columns, $(E_e)_{ii} = (E_e)_{jj} = 1$ and $(E_e)_{ij} = (E_e)_{ji} = -1$. Second, for each $t_k$ and each edge $e$, compute the matrix exponential

$\exp\left( -t_k \begin{pmatrix} L(w) & E_e \\ 0 & L(w) \end{pmatrix} \right)$

and extract its upper-right $n \times n$ block, which we denote by $D_{e,k}$; it equals the directional derivative $\partial e^{-t_k L(w)} / \partial w_e$. Finally, for the first graph (and analogously for the second),

$\frac{\partial}{\partial (w_1)_e} \sum_{k=1}^{T} \big\| F^\top e^{-t_k L_1(w_1)} F - G^\top e^{-t_k L_2(w_2)} G \big\|_{\mathrm{F}}^2 = 2 \sum_{k=1}^{T} \big\langle F^\top e^{-t_k L_1(w_1)} F - G^\top e^{-t_k L_2(w_2)} G, \; F^\top D_{e,k} F \big\rangle,$

where $\langle \cdot, \cdot \rangle$ denotes the Frobenius inner product.
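The block-exponential construction above can be verified numerically against a finite-difference approximation of the directional derivative; a minimal sketch:

```python
import numpy as np
from scipy.linalg import expm

# Path graph on 4 vertices; perturbation matrix E for the edge (1, 2)
# (0-based indices), with +1 diagonal and -1 off-diagonal entries.
W = np.diag(np.ones(3), 1); W = W + W.T
L = np.diag(W.sum(1)) - W
n, t = 4, 0.8
i, j = 1, 2
E = np.zeros((n, n))
E[i, i] = E[j, j] = 1.0
E[i, j] = E[j, i] = -1.0

# Upper-right block of the block-triangular exponential equals the
# directional derivative d/de e^{-t(L + e E)} at e = 0.
M = np.block([[L, E], [np.zeros((n, n)), L]])
D = expm(-t * M)[:n, n:]

# Central finite-difference check of the same directional derivative:
eps = 1e-6
D_fd = (expm(-t * (L + eps * E)) - expm(-t * (L - eps * E))) / (2 * eps)
assert np.allclose(D, D_fd, atol=1e-6)
```

The same trick is what makes the gradient tractable: one extra matrix exponential per edge and time value, rather than a full eigendecomposition sensitivity analysis.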
6 Results
In this section, we demonstrate our HKC approach on several synthetic and real datasets coming from shape analysis, manifold learning, and pattern recognition problems. The experiments closely follow our previous work [7], and their leitmotif is, given two datasets representing similar objects in somewhat different ways, to reconcile the information of the two modalities, producing a single consistent representation. We should stress that though we know the ground-truth correspondence between the vertices of the graphs representing different modalities, we are not using it in our HKC problem. Instead, we only assume to be given a few corresponding functions that are used to couple the heat kernels.
In all the experiments, we used unnormalized Laplacians (1) constructed with Gaussian weights, and minimized the cost (13) using the MATLAB Optimization Toolbox.
Circles. We used two graphs shaped as two eccentric circles, each containing 64 points and having different connectivity (Figure 1, top). We used four corresponding functions in the HKC optimization. The closest Laplacians that produce coupled heat kernels result in the edge weights shown in Figure 1 (bottom): the optimization performs a ‘surgery’ disconnecting the inconsistent connections and producing two connected components.
Ring. We used a ring and a cracked ring, each sampled at 70 points and connected using four nearest neighbors (Figure 2, top and bottom). Only three functions were used for coupling (Figure 2, three leftmost columns). Because of the topological difference, the behavior of the heat flow differs dramatically (Figure 2, fourth column from left). The HKC optimization cuts the connections in the first graph, making the two rings topologically equivalent and resulting in the same heat flow (Figure 2, rightmost).
Man. We used two poses of the human shape from the TOSCA dataset [5], uniformly sampled at 500 points and connected using five nearest neighbors. The resulting graphs have different topology (the hands are connected or disconnected; compare Figure 3, top and bottom), resulting in a very different heat flow. Two functions were used for coupling (Figure 3, two leftmost columns) in our HKC problem; the optimization disconnects these links (Figure 3, right), making the heat flow in both cases behave similarly.
NUS. We used a subset of the NUS-WIDE dataset [10] containing images (represented by 64-dimensional color histograms) and their text annotations (represented by 1000-dimensional distributions of most frequent tags) from seven classes. The classes were selected on purpose to be ambiguous in different modalities: for example, in the Tags modality, underwater tigers can be similar both to tigers and to water animals, as they share many tags. On the other hand, in the Color modality, tigers may be similar to the nature class, containing images with orange-yellow autumn colors [16].
In each modality, we used Laplacians with Gaussian weights and 25 nearest neighbors, computed with self-tuning scales. Seven functions were used for coupling. We computed diffusion distances (4) on the original and the modified graphs, and used them to rank the dataset entries in a leave-one-out retrieval experiment. Retrieval performance was evaluated using the mean average precision $\mathrm{mAP} = \frac{1}{R} \sum_{k=1}^{K} P(k)\, r(k)$, where $r(k)$ is the relevance of the result at rank $k$ (one if it belongs to the same class as the query and zero otherwise), $K$ is the number of retrieved results, $R$ is the total number of relevant items, and $P(k)$ is the precision at $k$, defined as the percentage of relevant results among the first $k$ top-ranked retrieved matches. Respectively, recall at $k$ is defined as the percentage of relevant results among the first $k$ top-ranked retrieved matches out of all items belonging to the query class.
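The retrieval metrics are standard; a minimal sketch of precision-at-$k$ and average precision on a toy ranked list (the helper names are ours):

```python
import numpy as np

def precision_at(r, k):
    """P(k): fraction of relevant results among the first k retrieved."""
    return float(np.mean(np.asarray(r, float)[:k]))

def average_precision(r, n_relevant):
    """AP = (1/R) * sum_k P(k) r(k), where R = n_relevant is the total
    number of items relevant to the query."""
    r = np.asarray(r, float)
    ks = np.arange(1, len(r) + 1)
    return float((np.cumsum(r) / ks * r).sum() / n_relevant)

# Toy ranked list: r(k) = 1 if the k-th result has the query's class.
r = [1, 0, 1, 1, 0]
p5 = precision_at(r, 5)                   # 3 relevant among the top 5
ap = average_precision(r, n_relevant=3)   # P(1)=1, P(3)=2/3, P(4)=3/4
assert p5 == 0.6
assert np.isclose(ap, (1 + 2/3 + 3/4) / 3)
```

The mAP reported in Table 1 is this average precision, averaged over all queries.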
Figure 4 shows the precision-recall curves of the different methods, and Table 1 summarizes the mean average precision. We can see that after HKC optimization, performance increases significantly, outperforming each modality on its own. Figure 5 shows examples of the first matches corresponding to ambiguous queries. For reference only, we show the performance of Laplacian averaging, which relies on a bijective correspondence between the graphs (not used in our HKC problem) and is thus not directly comparable.
Table 1: Retrieval performance (Precision@5 and mAP) on the NUS-WIDE dataset for different values of the diffusion time $t$.

Method      |  $t$  | Precision@5 |  mAP
------------|-------|-------------|-------
Tags only   | 0.75  |   82.3 %    | 78.4 %
            | 1.00  |   81.2 %    | 77.0 %
            | 1.25  |   79.7 %    | 76.2 %
Color only  | 0.75  |   61.8 %    | 55.7 %
            | 1.00  |   61.6 %    | 53.5 %
            | 1.25  |   59.6 %    | 51.5 %
HKC Tags    | 0.75  |   87.3 %    | 82.2 %
            | 1.00  |   86.3 %    | 81.5 %
            | 1.25  |   84.4 %    | 80.3 %
HKC Color   | 0.75  |   83.2 %    | 76.2 %
            | 1.00  |   82.3 %    | 75.6 %
            | 1.25  |   80.6 %    | 74.6 %
Average     | 0.75  |   68.7 %    | 64.2 %
            | 1.00  |   66.5 %    | 61.0 %
            | 1.25  |   63.6 %    | 58.8 %
7 Conclusions
We presented the heat kernel coupling (HKC) problem, whereby we seek to minimally modify a pair of Laplacians to make the corresponding heat kernels become coupled, such that the solution of the heat equation on the two graphs behaves consistently. This problem generalizes simple Laplacian averaging to the setting where the correspondence between the two graphs is unknown.
8 Acknowledgements
This research was supported by the ERC Starting Grant No. 307047 (COMET).
References
 [1] X. Alameda-Pineda, V. Khalidov, R. Horaud, and F. Forbes. Finding audio-visual events in informal social gatherings. In Proc. ICMI, 2011.
 [2] M. Bansal and K. Daniilidis. Joint spectral correspondence for disparate image matching. In Proc. CVPR, 2013.
 [3] R. Bekkerman, R. El-Yaniv, and A. McCallum. Multi-way distributional clustering via pairwise interactions. In Proc. ICML, 2005.
 [4] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
 [5] A. M. Bronstein, M. M. Bronstein, and R. Kimmel. Numerical geometry of non-rigid shapes. Springer, 2008.
 [6] M. M. Bronstein, A. M. Bronstein, F. Michel, and N. Paragios. Data fusion through cross-modality metric learning using similarity-sensitive hashing. In Proc. CVPR, 2010.
 [7] M. M. Bronstein, K. Glashoff, and T. A. Loring. Making Laplacians commute. arXiv:1307.6549, 2013.
 [8] X. Cai, F. Nie, H. Huang, and F. Kamangar. Heterogeneous image feature integration via multimodal spectral clustering. In Proc. CVPR, 2011.
 [9] J.-F. Cardoso and A. Souloumiac. Jacobi angles for simultaneous diagonalization. SIAM J. Matrix Analysis Appl., 17:161–164, 1996.
 [10] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y.-T. Zheng. NUS-WIDE: A real-world web image database from National University of Singapore. In Proc. CIVR, 2009.
 [11] R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21:5–30, 2006.
 [12] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, F. Warner, and S. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. PNAS, 102(21):7426–7431, 2005.
 [13] V. R. de Sa. Spectral clustering with two views. In Proc. ICML Workshop on Learning with Multiple Views, 2005.
 [14] C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In Proc. ICDM, 2001.
 [15] X. Dong, P. Frossard, P. Vandergheynst, and N. Nefedov. Clustering on multi-layer graphs via subspace analysis on Grassmann manifolds. arXiv:1303.2221, 2013.
 [16] D. Eynard, K. Glashoff, M. M. Bronstein, and A. M. Bronstein. Multimodal diffusion geometry by joint diagonalization of Laplacians. arXiv:1209.2295, 2012.

 [17] J. Ham, D. Lee, and L. Saul. Semi-supervised alignment of manifolds. In Proc. Conf. Uncertainty in Artificial Intelligence, 2005.
 [18] G. Irie, D. Liu, Z. Li, and S.-F. Chang. A Bayesian approach to multimodal visual dictionary learning. In Proc. CVPR, 2013.
 [19] E. Kidron, Y. Y. Schechner, and M. Elad. Pixels that sound. In Proc. CVPR, 2005.
 [20] A. Kovnatsky, M. M. Bronstein, A. M. Bronstein, K. Glashoff, and R. Kimmel. Coupled quasiharmonic bases. Computer Graphics Forum, 32:439–448, 2013.
 [21] A. Kumar, P. Rai, and H. Daumé III. Co-regularized multi-view spectral clustering. In Proc. NIPS, 2011.
 [22] C. Ma and C.-H. Lee. Unsupervised anchor shot detection using multi-modal spectral clustering. In Proc. ICASSP, 2008.
 [23] J. Masci, M. M. Bronstein, A. M. Bronstein, and J. Schmidhuber. Multimodal similarity-preserving hashing. Trans. PAMI, 2014.
 [24] B. McFee and G. R. G. Lanckriet. Learning multimodal similarity. JMLR, 12:491–523, 2011.

 [25] B. Nadler, S. Lafon, R. R. Coifman, and I. G. Kevrekidis. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. In Proc. NIPS, 2005.
 [26] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Proc. NIPS, 2001.
 [27] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. J. Guibas. Functional maps: A flexible representation of maps between shapes. Trans. Graphics, 31(4), 2012.
 [28] M. Ovsjanikov, Q. Mérigot, F. Mémoli, and L. Guibas. One point isometric matching with the heat kernel. Computer Graphics Forum, 29(5):1555–1564, 2010.
 [29] N. Rasiwasia, J. Costa Pereira, E. Coviello, G. Doyle, G. R. G. Lanckriet, R. Levy, and N. Vasconcelos. A new approach to cross-modal multimedia retrieval. In Proc. ACM Multimedia, 2010.
 [30] A. Sharma, A. Kumar, H. Daumé III, and D. W. Jacobs. Generalized multiview analysis: A discriminative latent space. In Proc. CVPR, 2012.
 [31] J. Shi and J. Malik. Normalized cuts and image segmentation. Trans. PAMI, 22:888–905, 2000.

 [32] V. Sindhwani, P. Niyogi, and M. Belkin. A co-regularization approach to semi-supervised learning with multiple views. In Proc. ICML Workshop on Learning with Multiple Views, 2005.
 [33] W. Tang, Z. Lu, and I. S. Dhillon. Clustering with multiple graphs. In Proc. ICDM, 2009.
 [34] C. Wang and S. Mahadevan. Manifold alignment using Procrustes analysis. In Proc. ICML, 2008.
 [35] C. Wang and S. Mahadevan. A general framework for manifold alignment. In Proc. Symp. Manifold Learning and its Applications, 2009.
 [36] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In Proc. NIPS, 2008.
 [37] J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: Learning to rank with joint word-image embeddings. Machine Learning, 81(1):21–35, 2010.