1 Introduction
Physical diffusion describes how energy, mass, or substances spread over time, i.e., how their densities smooth out in a medium. Simulating physical diffusion on a Euclidean space, a manifold, or their discrete approximations, e.g., grids or graphs, has applications in image processing, computer vision, and machine learning. For instance, diffusion is now a standard tool for removing noise or highlighting salient structures [32]. The graph Laplacian, as a discrete approximation of the generator of the diffusion process on manifolds, i.e., the Laplace-Beltrami operator, is commonly used in spectral clustering and semi-supervised learning, which find applications in object recognition
[7, 33, 10], and segmentation and matting [3, 25]. Similarly, stochastic diffusion processes on graphs find applications in multi-label classification [30] and image retrieval [12]. In these applications, typically we are given a set of objects X and corresponding assignments of variables f_0 at time t = 0. Then, (simulated) diffusion models how f smooths over time t. For instance, when X denotes the vertices of a mesh, f is the coordinate representation of X in an embedding space, leading to mesh fairing. More generally, if X denotes noisy observations of data points lying on a manifold M, diffusion leads to manifold denoising. If f_0 represents class labels of data points in X, diffusion leads to label propagation and facilitates semi-supervised learning. In this case, f_0 is assumed to be a sample from an underlying classification function f on M.
Diffusion is determined by the initial condition f_0 and the diffusivity defined on X or M. Roughly, the diffusivity describes the direction and strength with which f is smoothed at each time instance t. In general, the diffusivity is inhomogeneous, as it varies over X, and anisotropic, as its strength varies over different directions at each point. For instance, in image processing, diffusivity is strong in flat regions but weaker on edges. Further, on an edge, diffusivity is stronger along the direction of the edge than across it. This leads to edge-preserving image smoothing, as pioneered by Weickert [32].
For graph data, diffusion can be seen as label propagation in semi-supervised learning. Thus far, label propagation has mainly focused on isotropic diffusion (i.e., the diffusivity is fixed on the entire data space and on all directions at each point therein), and only recently has anisotropic diffusion been explored: Coifman and Lafon [5] apply anisotropic diffusion to the graph-based dimensionality reduction problem. They control diffusivity by normalizing the (originally isotropic) pairwise similarity with the evaluations of diffused coordinate values. Szlam et al. [29] generalize and extend this framework to semi-supervised learning by controlling diffusivity via evaluations of the class labels f: If f(i) and f(j) are similar, i.e., if the class labels of nodes i and j are likely to be the same, then the diffusivity along the edge joining them is high. Otherwise, the diffusivity becomes low, which prevents label propagation across class boundaries. This leads to significant performance improvements over classical isotropic diffusion. Kim et al. [21]
proposed adapting diffusivity on Riemannian manifolds based on local curvature estimates: Diffusivity is strong in flat regions and weak along the direction of the curvature operator, which leads to an awareness of intersections between manifolds and so improves performance over isotropic equivalents. However, this requires the data to be embedded in an ambient Euclidean space, and so does not apply to inference on general graphs.

We propose two contributions for anisotropic diffusion on graphs. First, we analyze continuous anisotropic diffusion processes on smooth manifolds, and show that anisotropic diffusion is nothing more than isotropic diffusion on a manifold with a new metric. Based on this analysis, we arrive at a new anisotropic graph Laplacian approach which is similar to the stochastic kernel smoothing approach of Szlam et al. [29], but with a new geometric intuition. This provides explicit criteria to define valid diffusivities on graphs and manifolds, and it facilitates nonlinear diffusion on graphs. Second, we explore two possible operators which control the diffusivity of each edge based on local neighborhood contexts, and not just their end vertices. This context-guided diffusion extends to graphs the robust diffusion algorithms originally developed for image enhancement [32], and we demonstrate on 11 different classification problems that this improves semi-supervised learning performance over isotropic diffusion, the stochastic anisotropic diffusion of Szlam et al. [29], and three existing label propagation algorithms [37, 11, 31].
To assist readers and subsequent development, we make our code available on the web.
2 Anisotropic diffusion on graphs
We develop anisotropic analogs to the existing isotropic diffusion process and to the corresponding graph Laplacian. We also introduce context-guided diffusion for semi-supervised learning. These contributions are based on an analysis of continuous positive definite diffusivity operators on Riemannian manifolds, which we leave for Sec. 3.
Existing works [35, 17] establish the (isotropic) graph Laplacian as a discrete approximation of the Laplace-Beltrami operator on a data manifold. We build upon these works to develop isotropic and anisotropic graph Laplacians by combining local diffusivity operators defined on subgraphs centered at each data point. We first explain these existing approaches.
Discrete isotropic diffusion.
A weighted graph G = (V, E, w) consists of a set of nodes V of size n, a set of edges E ⊆ V × V, and non-negative similarities w(i, j) for each edge (i, j) ∈ E, with w(i, j) = 0 if (i, j) ∉ E.
For the subsequent definition of diffusivity operators based on local gradients and divergences, we need spaces with defined inner products (i.e., Hilbert spaces), and so we introduce spaces H(V) and H(E) of functions on V and E, with inner products defined as [35, 17]:

(1) ⟨f, g⟩_{H(V)} := Σ_{i ∈ V} d_i f(i) g(i)

(2) ⟨F, G⟩_{H(E)} := (1/2) Σ_{(i,j) ∈ E} w(i, j) F(i, j) G(i, j)

where f, g ∈ H(V), F, G ∈ H(E), and d_i is the degree of node i:

(3) d_i := Σ_{j : (i,j) ∈ E} w(i, j)
For each node i, a subgraph G_i centered at i is defined as the set of nodes that are connected to i and the corresponding edges, i.e., V_i = {j : (i, j) ∈ E}, E_i = (V_i × V_i) ∩ E, and w_i is obtained by evaluating w on E_i. The inner-product structures on H(V_i) and H(E_i) are induced as restrictions of the corresponding structures on the entire graph G to the subgraph G_i. Given these structures, we define discrete gradient and divergence operators at i. First, the graph gradient operator ∇_i : H(V_i) → H(E_i) is defined as the collection of differences along the edges:

(4) (∇_i f)(j, k) := f(k) − f(j)

for f ∈ H(V_i) and (j, k) ∈ E_i. The graph divergence operator div_i : H(E_i) → H(V_i) is defined as the formal adjoint of ∇_i, i.e., for all f ∈ H(V_i) and F ∈ H(E_i):

(5) ⟨∇_i f, F⟩_{H(E_i)} = ⟨f, div_i F⟩_{H(V_i)}
By combining the local gradient and divergence operators, we can construct the global normalized graph Laplacian L:

(7) (L f)(i) := (div_i (∇_i f))(i)

Our definition of the graph Laplacian is consistent with [35, 17]. In particular, at the i-th node, it is explicitly given as:

(8) (L f)(i) = f(i) − (1/d_i) Σ_{j : (i,j) ∈ E} w(i, j) f(j)
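As a minimal numpy sketch, this normalized graph Laplacian (Eq. 8), acting as (L f)(i) = f(i) − (1/d_i) Σ_j w(i, j) f(j) under the conventions of [35, 17], can be assembled in a few lines; the function and variable names here are ours, not the paper's released code:

```python
import numpy as np

def normalized_laplacian(W):
    """Random-walk-normalized graph Laplacian L = I - D^{-1} W (cf. Eq. 8):
    (L f)(i) = f(i) - (1 / d_i) * sum_j w(i, j) f(j)."""
    d = W.sum(axis=1)                 # degrees d_i = sum_j w(i, j) (Eq. 3)
    if np.any(d == 0):
        raise ValueError("every node needs at least one weighted edge")
    return np.eye(len(W)) - W / d[:, None]

# A 3-node path graph; constant functions lie in the null space of L.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = normalized_laplacian(W)
```

Multiplying L by a constant vector returns zero, which is the discrete analog of constants being harmonic.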
If the nodes of G are sampled from an underlying data generating manifold M, i.e., the probability distribution generating the data is supported on M, the graph Laplacian L converges to the Laplace-Beltrami operator on M as n → ∞ [17, 1]. This is often regarded as the reason for using the graph Laplacian as a regularizer in many applications: The semi-norm induced by L is equivalent to the norm of the gradient of a function on M (see Sec. 3). Then, ⟨f, L f⟩ is obtained as a discrete approximation of the first-order regularizer on graphs. Further, the Laplace-Beltrami operator is the generator of the isotropic diffusion process on M and, accordingly, L is also a discrete approximation of the isotropic diffusion generator on M.

Anisotropic diffusion on graphs.
Next, we extend the isotropic graph Laplacian to be anisotropic. Our derivation is based on Weickert's definition of positive definite (PD) diffusivity operators in the image domain [32]. In Section 3, we introduce an extension of these operators to general Riemannian manifolds and, based on that, establish a rigorous connection between our anisotropic diffusion process on G and that of the data generating manifold M.
First, we formally introduce the local diffusivity operator D_i : H(E_i) → H(E_i):

(9) D_i := Σ_{(j,k) ∈ E_i} λ_{(j,k)} (e_{(j,k)} ⊗ e_{(j,k)})

where ⊗ is the tensor product and the basis function e_{(j,k)} ∈ H(E_i) is defined as the indicator of (j, k), i.e., e_{(j,k)}(l, m) = 1 if (l, m) = (j, k) and 0 otherwise. Similar to the construction of diffusivity operators in the image domain [32], our diffusivity operator is constructed based on its spectral decomposition: λ_{(j,k)} is an eigenvalue of the operator D_i corresponding to the eigenfunction e_{(j,k)}. This enables us to straightforwardly define a globally PD diffusivity operator D on G: Our global diffusivity operator is obtained by identifying D_i as the restriction of D on G_i. In this case, D is positive definite if and only if λ is symmetric and positive, i.e., λ_{(j,k)} = λ_{(k,j)} > 0. Furthermore, D is uniformly PD if all eigenvalues are lower-bounded by a positive constant.

Now we are ready to define an anisotropic diffusion process on G. We construct an anisotropic graph Laplacian L_A:
(10) (L_A f)(i) := (div_i (D_i ∇_i f))(i) = (1/d_i) Σ_{j : (i,j) ∈ E} λ_{(i,j)} w(i, j) (f(i) − f(j))

where the second equality is obtained by substituting Eqs. 4, 5, and 9 into the first expression.

Except for the normalization term 1/d_i, which is constructed from the original weights, the construction of L_A is identical to the isotropic graph Laplacian case: The original weights w(i, j) are replaced by new weights w_A(i, j):

(11) w_A(i, j) := λ_{(i,j)} w(i, j)
Given the anisotropic graph Laplacian L_A, we can define the corresponding anisotropic diffusion process on G. For instance, for label propagation applications, we propose using the explicit Euler approximation (cf. Eq. 20 for the continuous counterpart):

(12) f^{t+δt} = f^{t} − δt L_A f^{t}
where f^t denotes the value of f at time t and δt is the time discretization interval. The uniform positive definiteness of the diffusivity operators is crucial to the well-posedness of the corresponding continuous diffusion process [32]. The same applies to the positive definiteness of our discrete diffusivity operator D: This is the only way that L_A is a conditionally PD matrix and therefore a valid regularizer on G:

(13) ⟨f, L_A f⟩_{H(V)} = (1/2) Σ_{(i,j) ∈ E} λ_{(i,j)} w(i, j) (f(i) − f(j))² ≥ 0

where f = [f(1), …, f(n)]^⊤. For simplicity, we assume that each f(i) is a scalar. When f(i) is a vector, e.g., for multi-class classification, the regularizer is summed over the output dimensions. If λ is fixed throughout diffusion, the difference equation (12) is linear and the corresponding analytical solution exists for any t given f^0. However, in general, λ depends on f^t (e.g., Eq. 15), and so Eq. 12 becomes nonlinear; the solution can then be obtained by iteratively updating f^t with the right-hand side of Eq. 12.
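The nonlinear iteration can be sketched concisely in numpy; this is our own minimal rendering, assuming the Gaussian eigenvalues of Eq. 15, dense matrices, and helper names of our choosing:

```python
import numpy as np

def gaussian_eigenvalues(f, sigma):
    """Edge eigenvalues lam(i, j) = exp(-||f(i) - f(j)||^2 / sigma^2) (Eq. 15)."""
    sq = ((f[:, None, :] - f[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / sigma ** 2)

def euler_step(f, W, dt=1.0, sigma=1.0):
    """One explicit Euler update f <- f - dt * (L_A f) (Eq. 12).

    The anisotropic weights lam(i, j) * w(i, j) (Eq. 11) are recomputed from
    the current f, which is what makes the overall scheme nonlinear; the
    normalization keeps the original degrees d_i (Eq. 10)."""
    WA = gaussian_eigenvalues(f, sigma) * W   # zero wherever w(i, j) = 0
    d = W.sum(axis=1)
    LAf = (f * WA.sum(axis=1)[:, None] - WA @ f) / d[:, None]
    return f - dt * LAf
```

On a two-node graph, a constant f is stationary, while differing values move toward each other at a rate damped by the Gaussian eigenvalue.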
Anisotropic diffusion for semi-supervised learning.
With proper choices of λ, our diffusion equation (Eq. 12) can be used in various applications, including label propagation for semi-supervised learning. Assume we are given a set of data points X = {x_1, …, x_n} where only the first l data points are provided with ground-truth class labels y_1, …, y_l. Our goal is to propagate these labels to the entire dataset X. We approach this problem by first building a graph G with:

(14) w(i, j) = exp(−‖x_i − x_j‖² / σ_w²) if x_j ∈ N(x_i), and w(i, j) = 0 otherwise,
where N(x_i) is the nearest neighborhood of x_i and σ_w is a hyperparameter. Then, we diffuse the labels on G. Specifically, our label propagation algorithm adopts the approach of Zhou et al. [34]: For a C-class classification problem, each label is given as a C-dimensional row vector. When the ground-truth class of x_i is c, the elements of y_i are all zero except for the c-th element, which is assigned one. The label propagation is then performed by building the initial f^0, whose i-th row is y_i if x_i is labeled (i ≤ l) and zero otherwise, and running the difference equation (explicit Euler scheme; Eq. 12) until a stopping criterion is met: As suggested by the form of the regularizer (Eq. 13), similarly to the isotropic graph Laplacian, the only null space of the anisotropic graph Laplacian is the space of constant functions. This implies that the difference equation (Eq. 12) converges to a constant function as t → ∞. Accordingly, for practical applications, we stop diffusion at a finite time step T and obtain the resulting function f^T as the output. The final class label for each data point x_i is obtained as the index of the largest element of the i-th row of f^T.
The best choice for the eigenvalues λ of the diffusivity operator depends on the application. Intuitively, the diffusivity should be high when the corresponding function evaluations f(i) and f(j) are similar, i.e., their difference is small. One way to define such diffusivity is to use a Gaussian weight function, as is common in image enhancement:

(15) λ_{(i,j)} = exp(−‖f(i) − f(j)‖² / σ_λ²)

where σ_λ is the scale hyperparameter. Algorithm 1 shows pseudocode to construct the corresponding anisotropic graph Laplacian L_A on G.
The resulting anisotropic graph Laplacian can be immediately applied to any label propagation problem. However, for semi-supervised learning, naïvely applying L_A in the difference equation (12) may require many iterations before it actually starts propagating labels. The progress of diffusion can be very slow in the early stage (when t is small) in the vicinity of labeled points: If a point x_i is labeled and its neighbors are all unlabeled (as is typical for semi-supervised learning), the corresponding eigenvalues λ_{(i,j)} (Eq. 15) are all small, and accordingly, the weights w_A(i, j) are also small for all j. To speed up the process, we first run isotropic diffusion (with the isotropic graph Laplacian L) to smooth out the initial distribution of f^0. For all experiments, the initial diffusion runs for 20 time steps, while the length of the anisotropic diffusion is treated as a hyperparameter.
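The resulting two-phase schedule, a short isotropic warm start followed by nonlinear anisotropic diffusion, can be sketched as below; this is a dense-matrix illustration under our assumptions (Gaussian eigenvalues from Eq. 15, argument names and defaults ours):

```python
import numpy as np

def two_phase_diffusion(W, f0, sigma, n_iso=20, n_aniso=50, dt=1.0):
    """Isotropic warm start followed by nonlinear anisotropic diffusion.

    Around a labeled point whose neighbors are all unlabeled, the Gaussian
    eigenvalues (Eq. 15) are small, so purely anisotropic diffusion starts
    slowly; a short isotropic phase first smooths the initial f^0."""
    d = W.sum(axis=1)
    f = np.array(f0, dtype=float)
    for _ in range(n_iso):            # isotropic phase, L = I - D^{-1} W
        f = f - dt * (f - (W @ f) / d[:, None])
    for _ in range(n_aniso):          # anisotropic phase, lam recomputed from f
        sq = ((f[:, None, :] - f[None, :, :]) ** 2).sum(axis=-1)
        WA = np.exp(-sq / sigma ** 2) * W
        f = f - dt * (f * WA.sum(axis=1)[:, None] - WA @ f) / d[:, None]
    return f
```

Both phases preserve constant functions, and on a symmetric configuration the propagated scores remain symmetric.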
Discussion.
Our derivation of the anisotropic graph Laplacian is strongly connected to the kernel-based anisotropic diffusion approach of Szlam et al. [29], yet the motivating ideas are different: their anisotropic kernel is based on stochastic Markov diffusion processes on graphs, while our anisotropic graph Laplacian is obtained from a formulation of geometric diffusion on manifolds: L_A is obtained by extending Weickert's diffusivity operators [32] to Riemannian manifolds and then discretizing them onto a graph (see Sec. 3).
Since kernel smoothing corresponds to calculating the analytic solution at each time step of diffusion, and the anisotropic weights w_A used in constructing L_A can be regarded as an instance of such kernels, the final diffusion algorithms of Szlam et al. [29] and ours are very similar when applied to linear diffusion: Kernel smoothing is given by first obtaining the continuous Gaussian smoothing as an analytical solution of the linear diffusion equation, and then discretizing it, while our explicit Euler scheme is obtained by directly discretizing both the manifold and the Laplace-Beltrami operator. In preliminary linear diffusion experiments, minor differences in weight normalization^1 led to only negligible differences in semi-supervised learning performance.

^1 In L_A, the normalization coefficients are constructed from the original degrees (see Eq. 10), while the diffusion kernel in Szlam et al. [29] is normalized so that it leads to a stochastic matrix.

The major differences between the two diffusion algorithms are that 1) our algorithm is nonlinear, i.e., λ depends on f^t at each time t, while the anisotropic kernel of [29] is obtained as an analytic solution of a linear diffusion equation and is therefore fixed a priori for the entire diffusion process. In our experiments, we demonstrate that extending the approach of Szlam et al. [29] to nonlinear diffusion already significantly improves semi-supervised learning performance. Furthermore, unlike Szlam et al., 2) our construction explicitly states sufficient conditions (λ symmetric and positive) for the well-posedness of the resulting diffusion on G as a discretization of the underlying manifold. This enables exploring various possibilities of inducing new diffusion processes on G.
2.1 Context-guided diffusion.
We have seen how defining positive eigenvalues λ leads to a PD diffusivity operator and to the corresponding anisotropic graph Laplacian L_A. This can be regarded as updating the similarity measure between data points in X: The isotropic graph Laplacian matrix is constructed from the positive weights w, which are the pairwise similarities of data points measured by the original Euclidean metric of the input space (see Eq. 14). By construction, the information in L is precisely the same as the pairwise similarities and, therefore, defining a graph Laplacian corresponds to defining a similarity measure. Now, defining the anisotropic diffusivity operator D, which is constructed from the original similarity measure plus the eigenvalues λ, can be interpreted as introducing a new similarity measure on X.^2

^2 This intuition holds rigorously for the Laplace-Beltrami operator on a Riemannian manifold M: 1) The operator uniquely defines a Riemannian metric on M [27], and 2) Section 3 shows that defining a diffusivity operator on M corresponds to defining a corresponding new metric.
In particular, we have seen how the Gaussian function (Eq. 15) measures the deviation between the two function evaluations f(i) and f(j) at each edge (i, j). This is only an example, and there are various possibilities given the positivity constraint. Furthermore, λ does not have to depend only on f(i) and f(j); it can take the neighborhood context into account as well. For instance, in image processing, spatially smoothing the diffusivity operator, e.g., by convolving it with a Gaussian kernel, leads to much more stable image enhancement than using the original diffusivity operators (which are commonly constructed from gradient vectors): Theoretically, the smoothing operation guarantees the well-posedness of the resulting diffusion equation even when the corresponding original version is not well-posed. From a practical perspective, this operation offers robustness against noise in the image, since the gross effect of smoothing the diffusivity is to take a spatial average of the image gradients [32].
The spatial smoothing of the diffusivity operator can be regarded as an instance of controlling the diffusivity based on local context. We investigate two ways of exploiting this local context. The first adapts the idea of Gaussian smoothing on images to graphs: For a given edge (i, j) and the corresponding local neighborhoods N_i and N_j at each end node, the smooth diffusivity λ^S is obtained as a weighted average of the diffusivities in the mutual neighborhood N_i × N_j:

(16) λ^S_{(i,j)} = (1/Z_{(i,j)}) Σ_{k ∈ N_i} Σ_{l ∈ N_j} w(i, k) w(j, l) λ_{(k,l)}

where Z_{(i,j)} = Σ_{k ∈ N_i} Σ_{l ∈ N_j} w(i, k) w(j, l). The interpretation of our smooth diffusivity is straightforwardly transferred from the smooth diffusivity operators in the image domain: The resulting diffusion process is robust against noise in the edge weights.
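A dense-matrix sketch of this context averaging follows; it reflects our reading of the construction, and including each node in its own neighborhood (the added identity) is our assumption:

```python
import numpy as np

def smooth_diffusivity(W, lam):
    """Context-smoothed eigenvalues in the spirit of Eq. 16: lam_s(i, j) is
    the w(i, k) * w(j, l)-weighted average of lam(k, l) over the mutual
    neighborhood N_i x N_j, a graph analog of Gaussian-smoothing the
    diffusivity in image enhancement [32]."""
    A = W + np.eye(len(W))            # include each node in its own neighborhood
    num = A @ lam @ A.T               # sum_{k, l} a(i, k) a(j, l) lam(k, l)
    den = np.outer(A.sum(axis=1), A.sum(axis=1))
    return num / den
```

Averaging preserves a constant diffusivity field and shrinks an outlying edge value toward its context, which is the intended robustness to noisy weights.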
Another example of exploiting context is to adopt the intuitive notion of matching between two entities in context: If a pair of objects i and j matches, then spatial neighbors of i often have corresponding matching elements in the neighborhood of j, i.e., the match of (i, j) is supported if the neighborhoods of i and j find matches among their pairs of elements. Our local match diffusivity λ^M is defined as a smooth version of λ that takes this match context into account:

(17) λ^M_{(i,j)} = (1/Z_i) Σ_{k ∈ N_i} max_{l ∈ N_j} λ_{(k,l)}

where Z_i = |N_i|. The max in the definition of λ^M implies that if there is any entity in N_j that matches k, the corresponding diffusivity between i and j is supported. The normalization factor Z_i is obtained as |N_i| times the maximum possible value of each summand (which corresponds to the perfect match case), which is 1 (Eq. 15).
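One plausible rendering of this local match score is sketched below; note that the raw score need not be symmetric in i and j, so the final symmetrization is our addition (symmetric eigenvalues being required for a PD diffusivity operator, as above):

```python
import numpy as np

def local_match_diffusivity(W, lam):
    """Local match eigenvalues in the spirit of Eq. 17: for each k in N_i,
    keep the best match max_{l in N_j} lam(k, l), then average over N_i;
    the normalizer |N_i| is the largest possible value of the sum, since
    lam <= 1 (Eq. 15). The symmetrization at the end is our addition."""
    n = len(W)
    nbrs = [np.flatnonzero(W[i] > 0) for i in range(n)]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if W[i, j] > 0:           # diffusivity lives only on graph edges
                best = lam[np.ix_(nbrs[i], nbrs[j])].max(axis=1)
                out[i, j] = best.sum() / len(nbrs[i])
    return 0.5 * (out + out.T)
```

When every candidate pair matches perfectly (all eigenvalues one), each edge receives full support, while non-edges carry no diffusivity.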
3 Connection to continuous operators
As we have seen in Sec. 2.1, our anisotropic diffusion process on G is nothing more than isotropic diffusion on a new graph (see the regularization-form definition of L_A in Eq. 13 and the corresponding diffusion process in Eq. 12), since our (discrete) diffusivity operator (Eq. 9) changes the notion of similarity. In this section, we first show that this intuition applies to the continuous limit case of the Laplace-Beltrami operator on a data generating manifold M, i.e., anisotropic diffusion on M is isotropic diffusion with a new metric. Then, we discuss the convergence properties of our anisotropic graph Laplacian to the continuous anisotropic Laplace-Beltrami operator.
Anisotropic diffusion on Riemannian manifolds.
On a Riemannian manifold (M, g), with g being a Riemannian metric on M, the isotropic diffusion of a smooth function f is described by a partial differential equation:

(18) ∂f/∂t = −Δ_g f

where ∇f is the gradient of f, div is the formal adjoint of ∇, and Δ_g is the Laplace-Beltrami operator defined by Δ_g := div ∘ ∇.
If we extend Weickert's diffusivity operator, originally defined for images [32], to a manifold M, we introduce a smooth positive definite operator D : TM → TM, with TM being the tangent bundle of M, i.e., D is a smooth field of symmetric positive definite operators, each defined on the tangent space at a point x ∈ M. The corresponding anisotropic diffusion process is given as:

(19) ∂f/∂t = −div(D ∇f)

Defining an anisotropic Laplacian operator Δ_D := div ∘ D ∘ ∇, we restate Eq. 19 similarly to the isotropic case:

(20) ∂f/∂t = −Δ_D f
We show that our anisotropic diffusion (Eq. 20) boils down to isotropic diffusion on M with a new metric g_D:

Proposition 1 (The equivalence of Δ_D and Δ_{g_D}).

The anisotropic Laplacian operator Δ_D on a compact Riemannian manifold (M, g) is equivalent to the Laplace-Beltrami operator on M with a new metric g_D depending on D. Specifically, when the diffusivity operator D is uniformly positive definite, g_D is explicitly obtained from the coordinate representations (matrices) of g, D, and g_D at each point x, together with a scale factor, which is a smooth function on M.
Proof.
The proof is obtained by applying the techniques developed for analyzing maps between general weighted manifolds [14]. For any smooth test function on M, we have:
(21) 
where the integration is with respect to the natural volume element [23] of the respective metric, and the second equality is obtained by applying the divergence theorem on M. The third equality corresponds to the definition of the gradient based on the differential operator [23]. Applying Green's theorem to Δ_{g_D}, we obtain:
(22) 
Now, identifying the two integrals, and using the coordinate representations of the two metrics, we obtain
(23)  
(24) 
It is always possible to find a coordinate representation of the Riemannian metric at each point such that it becomes Euclidean (up to second order) [19]. This implies that, up to scale,^3 the metric in Eq. 24 boils down to the well-established Mahalanobis distance, with D being the corresponding covariance matrix. This greatly helps the understanding of the anisotropic diffusion process: For any PD diffusivity operator D, there is a corresponding isotropic Laplace-Beltrami operator Δ_{g_D} on (M, g_D). If we discretize in time the differential equation of the isotropic diffusion process (Eq. 18) on (M, g_D) (see [18] for the derivation):

^3 Note that the scale ratio is coordinate independent.
(25) (f^{t+δt} − f^t) / δt = −Δ_{g_D} f^{t+δt}

then the solution f^{t+δt} at time t + δt is obtained as the minimizer of the following regularization energy:^4

(26) E(f) = ∫_M (f − f^t)² dV + δt ∫_M ⟨∇f, ∇f⟩_{g_D} dV

^4 This applies even when Eq. 25 is nonlinear, i.e., when the diffusivity depends on f.
which is now equivalent to:

(27) E(f) = ∫_M (f − f^t)² dV + δt ∫_M g(D ∇f, ∇f) dV
Accordingly, the anisotropic diffusion process (Eq. 19) can be regarded as continuously solving a regularized regression problem in which the regularizer penalizes, at each point x, the first-order deviation heavily along the directions where the covariance matrix D is less spread, i.e., where the corresponding diffusivity is weak.
This perspective provides a connection to the problem of inducing anisotropic diffusion as a special instance of metric learning on Riemannian manifolds and, as the corresponding discretization, learning a graph structure from data. See [2] for an example of datadriven graph construction which relies on the known dimensionality of the underlying manifold.
On the convergence of L_A to Δ_D.
It is well known that when data points are generated from an underlying Riemannian manifold M embedded in an ambient Euclidean space, the isotropic graph Laplacian on G converges to the Laplace-Beltrami operator on M as n → ∞, with the neighborhood size controlled accordingly [1, 17]. However, despite its strong connection to the (continuous) anisotropic Laplacian Δ_D on M, our discrete anisotropic graph Laplacian L_A is not, by itself, consistent, i.e., it does not converge to Δ_D as n → ∞. This is because, by design, our diffusivity operator is agnostic to the dimensionality of the manifold M. To elaborate further, note that given fixed data points and the corresponding local neighborhood size, our local diffusivity operator D_i (Eq. 9) defines a (new) inner product on H(E_i):

(28) ⟨F, G⟩_{D_i} := ⟨D_i F, G⟩_{H(E_i)}
The convergence of L_A to Δ_D requires a certain form^5 of convergence of the local diffusivity operators D_i to D at each point x. In particular, the continuum limit (as n → ∞) of D_i should induce an inner product on the tangent space at x. However, in general, the limit of D_i cannot induce any inner product, since D has infinite degrees of freedom (i.e., λ has infinitely many parameters): D_i has one eigenvalue per edge of G_i, and the number of such edges grows unboundedly as n → ∞. Actually, for a given fixed G with a corresponding global operator D, D_i can be defined as the restriction of D on G_i. On the other hand, the continuous diffusivity operator at x has only finitely many degrees of freedom, determined by the dimensionality m of M. This implies that the limit of D_i cannot be a bilinear operator on the tangent space. Actually, this is the only property that prevents it from being an inner product: By construction, the limit of ⟨·, ·⟩_{D_i} is non-negative and positive definite. The relation between this limit and D is exactly the same as the relationship between the inner product in a Euclidean space and a nonlinear positive definite kernel as commonly used in kernel machines: A kernel induces a similarity measure on the input space. However, in general, it is not bilinear and therefore does not correspond to an inner product there. Instead, it induces an inner product in a (potentially infinite-dimensional) feature space reached by a nonlinear map.

^5 The convergence of D_i to D cannot be uniquely defined (see [16] for details), and therefore the convergence of L_A (which depends on D_i) to Δ_D is also not uniquely defined.
This insight leads to an algorithm to build consistent local graph diffusivity operators (and the corresponding global operator) by reducing the degrees of freedom of each D_i to those of the continuous operator. In the accompanying supplemental material, we show how such an operator can be explicitly constructed and that it converges to Δ_D.
Discussion.
While the consistent diffusivity operators might be of theoretical interest and may deserve further analysis, in this paper we focus on using the inconsistent diffusivity operator (Eq. 9). This design choice is based on two facts: 1) In general, estimating the dimensionality of a manifold and the corresponding tangent bundle from a finite sample is a difficult problem [20]. Therefore, existing approaches that involve estimating the dimensionality make it a hyperparameter, and optimizing many hyperparameters is difficult in semi-supervised learning due to the limited number of labeled points. 2) More importantly, some semi-supervised learning problems are inherently formulated as inference on a graph that may not have any explicit connection to a manifold or a corresponding ambient space. For instance, if each node i represents an image, and each edge (i, j) and corresponding weight w(i, j) represent the possibility of a match and the match score between i and j, respectively, then there is no natural manifold or ambient space structure defined on G. Accordingly, our algorithm is a design choice that favors general applicability over theoretical consistency.
Lastly, we would like to add that it is tempting to build a consistency argument based on the fact that any graph with positive weights can be embedded into a manifold of sufficiently high dimensionality, so that any data and the corresponding PD graph diffusivity operator could be regarded as a sample from such a manifold and the operators on it, respectively. Unfortunately, this does not lead to a useful interpretation.
4 Experiments
Algorithm | USPS | BCI | MNIST | COIL1 | COIL2 | RealSim | Pcmac | MPEG7 | SWDLEAF | ETH80 | CPASCAL | Avg. %
Isotropic diffusion | 8.76 | 41.60 | 10.65 | 7.32 | 4.37 | 23.61 | 11.77 | 3.36 | 2.39 | 11.49 | 54.54 | 148.1
Szlam et al. [29] | 5.55 | 41.80 | 8.47 | 7.36 | 4.11 | 25.02 | 12.58 | 3.01 | 2.54 | 11.30 | 54.47 | 137.0
Nonlinear ext. of [29] | 4.48 | 39.53 | 7.62 | 6.85 | 2.98 | 23.46 | 11.88 | 2.63 | 2.47 | 9.91 | 52.22 | 120.8
Local match (ours) | 4.31 | 42.00 | 7.55 | 6.48 | 2.22 | 19.55 | 11.47 | 2.54 | 2.17 | 10.05 | 51.19 | 111.7
Smooth (ours) | 3.93 | 42.13 | 7.18 | 6.21 | 2.13 | 20.08 | 11.34 | 2.59 | 2.33 | 10.01 | 51.30 | 110.5
GRF [37] | 6.13 | 42.68 | 10.96 | 4.93 | 1.65 | 28.09 | 11.78 | 2.96 | 2.76 | 12.16 | 61.91 | 127.6
FLAP [11] | 5.66 | 44.63 | 10.99 | 6.97 | 2.73 | 20.08 | 14.49 | 2.16 | 2.84 | 12.59 | 57.97 | 131.0
LNP [31] | 7.27 | 44.33 | 13.25 | 5.53 | 3.12 | 16.02 | 14.39 | N/A | N/A | 11.94 | 62.36 | 139.1
We evaluate our anisotropic diffusion algorithm in classification on seven standard semi-supervised learning datasets [15, 36, 4] and four object recognition datasets for which semi-supervised learning has been successful in the literature in retrieval contexts. We report performance for isotropic diffusion and the original kernel smoothing-type anisotropic diffusion approach of Szlam et al. [29]. We also report the performance of three existing semi-supervised learning algorithms: Zhu et al.'s Gaussian random fields (GRF)-based algorithm [37]; Gong and Tao's label propagation algorithm (FLAP: Fick's Law Assisted Propagation [11]), inspired by Fick's first law, which describes the diffusion process at a steady state; and Wang and Zhang's linear neighborhood propagation (LNP) algorithm [31], which automatically determines the edge weights by representing each input point as a convex combination of its neighbors.
Datasets.
The MPEG7 shape dataset [22] consists of 1,400 images which show silhouettes of objects from 70 different categories. Adopting the experimental setting of data retrieval experiments [6], with 280 labels, we use shape matching [12] to infer pairwise distances from which the (isotropic) weight matrix is constructed. In this dataset, each data point is not explicitly represented as a feature vector, and so the data generating manifold is not explicitly considered. Our algorithm is applicable even in this case, which justifies the use of the inconsistent diffusivity operator.^6

^6 For consistent diffusivity operators, we would have to explicitly estimate the dimensionality of the data manifold; see Sec. 3.
The ETH80 dataset consists of 3,280 photographs of objects from 8 different classes [24]. The CPASCAL dataset (a subset of the PASCAL VOC challenge 2008 data, where single objects are extracted based on bounding box annotations) contains 4,450 images of 20 classes [9]. For both the ETH80 and CPASCAL datasets, each data point is represented by HOG (histogram of oriented gradients) descriptors and the number of labels is set to 50 [8]. The SWDLEAF (Swedish leaf) dataset contains 15 different tree species with 75 leaves per species [28]. For this dataset, we use 50 labels per class, with Fourier descriptors to represent each entry [26].
Results.
In Table 1, the first row refers to isotropic diffusion, and the second row is the algorithm of Szlam et al. [29]. The third row is an extension of [29] to nonlinear diffusion based on our diffusion approach (see Sec. 2), while the fourth and fifth rows are our local match and smooth anisotropic diffusion, respectively.
Overall, all four anisotropic diffusion algorithms significantly improve classification accuracy over isotropic diffusion. However, for some datasets (SWDLEAF, RealSim, Pcmac), the performance of linear anisotropic diffusion [29] is equal to or even worse than that of isotropic diffusion. In contrast, all three nonlinear diffusion algorithms outperformed both, while the local match and smooth versions of the context-guided diffusion led to further improvement over the plain nonlinear extension on all but the ETH80 and BCI datasets. These results are in accordance with the superior performance of the smooth diffusivity operators (an example of exploiting context) in image processing, and demonstrate the effectiveness of exploiting context information in anisotropic diffusion on graphs. For the BCI dataset, the nonlinear extension and the smooth variant showed the best and the worst performances, respectively, while essentially all four anisotropic diffusion algorithms did not show any noticeable improvement over the isotropic case. This is because the initial labeling based on isotropic diffusion is almost random (around a 40% error rate for binary classification), and so it is a poor initialization for anisotropic diffusion and does not lead to better label propagation. Similar observations were reported in [29]. The anisotropic diffusion algorithms also demonstrated their competence in comparison with state-of-the-art label propagation algorithms [37, 11, 31]: GRF is best on COIL1 and COIL2, and FLAP and LNP are the best for MPEG7 and RealSim, respectively. However, except for a few cases, the results of our local match and smooth algorithms are among the three best results for each dataset, demonstrating steady overall performance improvements over existing algorithms. Lastly, all three existing algorithms are designed for data graphs constructed from input features rather than from function evaluations. Therefore, they could potentially benefit from our proposed anisotropic diffusion approach.
Parameters.
Isotropic diffusion has three parameters: the weight parameter (Eq. 14), the size of the local neighborhood, and the number of diffusion steps. We automatically determine the weight parameter based on the average Euclidean distance of each point to its neighbors [29, 18]. We determine the two other parameters with a separate validation label set of the same size as the training label set.
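As an illustration of this neighborhood-based weight selection, the sketch below sets a Gaussian kernel bandwidth from average k-nearest-neighbor distances. The function name `knn_bandwidth` and the exact averaging rule are our assumptions for illustration, not the paper's precise Eq. 14.

```python
import numpy as np

def knn_bandwidth(X, k=10):
    """Set the Gaussian kernel bandwidth to the average Euclidean distance
    from each point to its k-th nearest neighbor (a common heuristic; the
    paper's exact rule may differ)."""
    # Pairwise Euclidean distances (brute force; fine for small n).
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # Distance to the k-th neighbor; index 0 is the point itself.
    kth = np.sort(d, axis=1)[:, k]
    return kth.mean()

X = np.random.default_rng(0).normal(size=(50, 3))
sigma = knn_bandwidth(X, k=5)
# Gaussian affinities with the automatically chosen bandwidth.
W = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / (2 * sigma ** 2))
```

The remaining parameters (neighborhood size and number of steps) would then be chosen on the held-out validation label set.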
For all anisotropic diffusion algorithms, an additional hyperparameter (Eq. 15) is determined in the same way. The step size of the explicit Euler approximation in our algorithms (Eq. 12) is fixed at 1; in general, it could also be tuned per dataset to improve performance. The hyperparameters of GRF, FLAP, and LNP are all determined in the same way based on the validation set.
Computational complexity.
This depends upon the number of data points n, the size k of the local neighborhood, and the number of diffusion iterations (Eq. 12). Each diffusion iteration requires multiplying a matrix of size n x n with a matrix of size n x c, where c is the number of classes. Accordingly, in theory, the complexity of each step is O(n^2 c). However, typically k is much smaller than n, which leads to a sparse matrix: in practice, the computational complexity of each step is subquadratic. For the USPS dataset, running 100 iterations of the local match diffusion process takes 0.3 seconds on an Intel Xeon 3.4GHz CPU in MATLAB.
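The subquadratic cost of sparse iterations can be sketched as follows. The randomly generated matrix A below merely stands in for a k-nearest-neighbor diffusion matrix; it is purely illustrative and does not reproduce the paper's graph construction.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, k, c = 1000, 10, 3          # points, neighbors per point, classes

# Sparse nonnegative matrix with ~k nonzeros per row, standing in for
# the k-NN graph weights; rows are normalized to be stochastic.
rows = np.repeat(np.arange(n), k)
cols = rng.integers(0, n, size=n * k)
vals = rng.random(n * k)
A = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
A = sp.diags(1.0 / np.asarray(A.sum(1)).ravel()) @ A

F = rng.random((n, c))          # one label-indicator column per class
for _ in range(100):            # each step costs O(n k c), not O(n^2 c)
    F = A @ F                   # sparse matrix-matrix product
```

With k fixed, the per-step cost grows linearly rather than quadratically in n, which is what makes 100 iterations on datasets of this size take well under a second.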
5 Discussion and conclusion
We show two ways to exploit local contexts: smooth and local match. These could be extended to consider the full topological features of the function evaluated on the neighborhoods of each pair of adjacent points. For instance, one could perform spectral analysis on the two neighborhoods and measure the similarity of the corresponding eigenspectra to define a new diffusivity operator. This differs from pre-calculating topological features, as is common in graph matching, since there the features are extracted from the input rather than from function evaluations, and therefore stay constant during the diffusion process. We briefly explored this possibility in preliminary experiments, which indicate that full topological analysis is promising. However, due to the significantly increased computational complexity, we focus on the smooth and local match operators and leave this extension for future work.
We adopted an explicit Euler scheme (Eq. 12) to discretize the continuous diffusion equation (Eq. 20). This scheme can be obtained as a gradient descent step on the convex regularization functional (Eq. 27). An alternative implicit Euler scheme (Eq. 25) can be obtained as the analytic solution of the corresponding minimization problem. Since our diffusion equation (Eq. 20) is nonlinear, both approaches eventually lead to iterative algorithms. A major advantage of the implicit Euler scheme is that it is uniformly stable with respect to the step size, whereas our explicit Euler scheme is stable only for sufficiently small step sizes, which we treat as a hyperparameter. On the other hand, the implicit Euler approximation is computationally less favorable, as each iteration requires explicitly solving a (sparse) linear system in the number of data points, while our explicit counterpart requires only a matrix-vector multiplication. We chose the explicit scheme for its fast convergence in experiments and its applicability to large-scale problems. Future work should carefully analyze the trade-off between the two approaches, especially on smaller-scale problems.
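The contrast between the two schemes can be sketched on a toy linear problem; a path graph is used here purely for illustration, and in the nonlinear case the operator would be re-assembled at every step. The explicit update is a single sparse matrix-vector product, while the implicit update solves a sparse linear system.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy graph: a path with unit edge weights (hypothetical example, not
# the paper's data graph). L = D - W is the combinatorial Laplacian.
n = 100
W = sp.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1], format="csr")
L = sp.diags(np.asarray(W.sum(1)).ravel()) - W

tau = 0.5                       # step size; the explicit scheme needs it small
u0 = np.zeros(n)
u0[n // 2] = 1.0                # unit mass at the center vertex

# Explicit Euler: u_{t+1} = u_t - tau * L u_t, one sparse mat-vec.
u_exp = u0 - tau * (L @ u0)

# Implicit Euler: solve (I + tau * L) u_{t+1} = u_t at each step.
A_imp = (sp.eye(n) + tau * L).tocsc()
u_imp = spla.spsolve(A_imp, u0)
```

Both updates conserve the total mass (constant vectors lie in the Laplacian's null space), but only the implicit step remains stable for arbitrarily large tau, at the cost of a linear solve per iteration.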
For simplicity of exposition, in Sec. 3 we assumed that the underlying probability distribution on the manifold is uniform. However, our interpretation applies to the more general case of a non-uniform distribution. If the sampling distribution is non-uniform, the isotropic Laplace-Beltrami operator is locally weighted by the corresponding probability density, yielding the weighted Laplacian. In particular, if the density p is differentiable, the weighted Laplacian is explicitly given as [17, 14]:
$$\Delta_p f \;=\; \frac{1}{p}\,\mathrm{div}\!\left(p\,\nabla f\right) \;=\; \Delta f + \frac{1}{p}\,\langle \nabla p, \nabla f\rangle. \qquad (29)$$
The weighted Laplacian satisfies Green's theorem, and the divergence theorem holds analogously [13]. Accordingly, the corresponding weighted anisotropic Laplacian based on our diffusivity operator is obtained as in Proposition 1.
Conclusion.
We have presented an approach for anisotropic diffusion on graphs, first extending well-established geometric diffusion on images to Riemannian manifolds and then discretizing it onto graphs. The resulting positive-definite diffusivity operators on graphs enable new diffusion processes that take local neighborhood structures into account and thereby make diffusion more robust. Applied to semi-supervised learning, our algorithms demonstrate improved accuracy over existing isotropic diffusion and anisotropic diffusion-based algorithms.
Acknowledgements
The authors thank the reviewers for their constructive feedback. Kwang In Kim thanks EPSRC EP/M00533X/1. James Tompkin and Hanspeter Pfister thank NSF CGV-1110955, the Air Force Research Laboratory, and the DARPA Memex program. Christian Theobalt thanks the Intel Visual Computing Institute.
References
 [1] M. Belkin and P. Niyogi. Towards a theoretical foundation for Laplacian-based manifold methods. Journal of Computer and System Sciences, 74(8):1289–1308, 2008.
 [2] C. Carey and S. Mahadevan. Manifold spanning graphs. In Proc. AAAI, pages 1708–1714, 2014.
 [3] W. Casaca, L. G. Nonato, and G. Taubin. Laplacian coordinates for seeded image segmentation. In Proc. IEEE CVPR, pages 384–391, 2014.
 [4] O. Chapelle, B. Schölkopf, and A. Zien. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2010.
 [5] R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
 [6] M. Donoser and H. Bischof. Diffusion processes for retrieval revisited. In Proc. IEEE CVPR, pages 1320–1327, 2013.
 [7] S. Ebert, M. Fritz, and B. Schiele. Active metric learning for object recognition. In Proc. DAGM-OAGM, pages 327–336, 2012.

 [8] S. Ebert, M. Fritz, and B. Schiele. RALF: a reinforced active learning formulation for object class recognition. In Proc. IEEE CVPR, pages 3626–3633, 2012.
 [9] S. Ebert, D. Larlus, and B. Schiele. Extracting structures in image collections for object recognition. In Proc. ECCV, pages 720–733, 2010.
 [10] E. Elboer, M. Werman, and Y. Hel-Or. The generalized Laplacian distance and its applications for visual matching. In Proc. IEEE CVPR, pages 2315–2322, 2013.
 [11] C. Gong and D. Tao. Fick's law assisted propagation for semi-supervised learning. IEEE TNNLS, PP(99):1–1, 2014 (Early Access).
 [12] R. Gopalan, P. Turaga, and R. Chellappa. Diffusion processes for retrieval revisited. In Proc. ECCV, pages 286–299, 2010.
 [13] A. Grigor'yan. Heat kernels on weighted manifolds and applications. In J. Jorgenson and L. Walling, editors, The Ubiquitous Heat Kernel, Contemporary Mathematics. American Mathematical Society, 2006.
 [14] A. Grigor'yan. Heat Kernel and Analysis on Manifolds (AMS/IP Studies in Advanced Mathematics). American Mathematical Society, 2013.
 [15] Y. Guo and X. N. H. Zhang. An extensive empirical study on semi-supervised learning. In Proc. ICDM, pages 186–195, 2010.

 [16] M. Hein. Geometrical Aspects of Statistical Learning Theory. PhD thesis, Fachbereich Informatik, Technische Universität Darmstadt, Germany, 2005.
 [17] M. Hein, J.-Y. Audibert, and U. von Luxburg. From graphs to manifolds – weak and strong pointwise consistency of graph Laplacians. In Proc. COLT, pages 470–485, 2005.
 [18] M. Hein and M. Maier. Manifold denoising. In NIPS, pages 561–568, 2007.
 [19] J. Jost. Riemannian Geometry and Geometric Analysis. Springer, New York, 6th edition, 2011.
 [20] B. Kégl. Intrinsic dimension estimation using packing numbers. In NIPS, pages 681–688, 2002.
 [21] K. I. Kim, J. Tompkin, and C. Theobalt. Curvatureaware regularization on Riemannian submanifolds. In Proc. IEEE ICCV, pages 881–888, 2013.
 [22] L. J. Latecki, R. Lakämper, and U. Eckhardt. Shape descriptors for nonrigid shapes with a single closed contour. In Proc. IEEE CVPR, pages 424–429, 2000.
 [23] J. M. Lee. Riemannian Manifolds: An Introduction to Curvature. Springer, New York, 1997.
 [24] B. Leibe and B. Schiele. Analyzing appearance and contour based methods for object categorization. In Proc. IEEE CVPR, 2003.

 [25] D. Li, Q. Chen, and C.-K. Tang. Motion-aware KNN Laplacian for video matting. In Proc. IEEE ICCV, pages 3599–3606, 2013.
 [26] H. Ling and D. W. Jacobs. Shape classification using the inner-distance. IEEE TPAMI, 29(2):286–299, 2007.
 [27] S. Rosenberg. The Laplacian on a Riemannian Manifold. Cambridge University Press, 1997.
 [28] O. Söderkvist. Computer Vision Classification of Leaves from Swedish Trees. Master's thesis, Linköping University, Sweden, 2001.
 [29] A. D. Szlam, M. Maggioni, and R. R. Coifman. Regularization on graphs with functionadapted diffusion processes. JMLR, 9:1711–1739, 2008.
 [30] B. Wang, Z. Tu, and J. K. Tsotsos. Dynamic label propagation for semi-supervised multi-class multi-label classification. In Proc. IEEE ICCV, pages 425–432, 2013.
 [31] F. Wang and C. Zhang. Label propagation through linear neighborhoods. In Proc. ICML, pages 985–992, 2006.
 [32] J. Weickert. Anisotropic Diffusion in Image Processing. ECMI Series, Teubner-Verlag, Stuttgart, 1998.
 [33] R. Wu, Y. Yu, and W. Wang. SCaLE: supervised and cascaded Laplacian eigenmaps for visual object recognition based on nearest neighbors. In Proc. IEEE CVPR, pages 867–874, 2013.
 [34] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In NIPS, pages 321–328, 2003.
 [35] D. Zhou and B. Schölkopf. Discrete regularization. In Semi-Supervised Learning, pages 237–250. MIT Press, Cambridge, MA, USA, 2006.
 [36] X. Zhou and M. Belkin. Semi-supervised learning by higher order regularization. JMLR W&CP (Proc. AISTATS), pages 892–900, 2011.
 [37] X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proc. ICML, pages 912–919, 2003.