1. Introduction
Graph Diffusion Distance (GDD) is a measure of similarity between graphs originally introduced by Hammond et. al (Hammond et al., 2013) and greatly expanded by Scott et. al (Scott and Mjolsness, 2021)
. This metric measures the similarity of two graphs by comparing their respective spectra (the eigenvalues of the graph Laplacian). However, it is wellknown that there exist pairs of
cospectralgraphs which are not isomorphic but have identical spectra. Furthermore, because even a small change to entries of a matrix may change its eigenvalues, another limitation of GDD is that it is sensitive to small changes in the topology of the graph (as well as small variations in edge weights). Finally, since GDD does not make use of edge or node attributes, it cannot distinguish between two different signals on the same source graph, diminishing its applicability in data science. In this work, we provide several generalizations to GDD which resolve these issues and make it a powerful machine learning tool for datasets of graphs.
2. Graph Diffusion Distance
We use the definition of Graph Diffusion Distance (GDD) first given by Hammond et. al and later expanded (to cover differentlysized graphs) by Scott et. al. Given two graphs and , let and be their respective graph Laplacians. Furthermore, let be the diagonalizations of each Laplacian, so that is a diagonal matrix whose th diagonal entry, , is the th eigenvalue of . Then the graph diffusion distance between these graphs is given by
(1) 
This simplification relies on several properties of the Frobenius norm and the exponential map (rotation invariance and continuity, respectively) which we shall not detail here. It is clear that this distance measure requires the two graphs to be the same size, since otherwise this matrix difference is not defined.
The generalization to differentsized graphs given by Scott et. al can also be modified in the way we discuss in Section 3, but we do not consider this version of GDD in this paper. Given two graphs , of differing sizes , we can define graph diffusion distance similarly to Equation 2:
(2) 
In Equation 2, is a timedilation factor which dilates the passage of time in one graph with respect to the other. P is a rectangular matrix which is optimized according to some set of constraints . In the cited paper by Scott et. al, is taken to be orthogonality: .
is a change of basis from graphspace to eigenspace, allowing us to again represent the equation for varyingsize GDD as a comparison between lists of eigenvalues.
2.1. GDD is a differentiable function of and edge weights
Once all of the eigenvalues
and eigenvectors
(of a matrix) are computed, we may backpropagate through the eigendecomposition as described in
(Nelson, 1976) and (Andrew et al., 1993). If our edge weights (and therefore the values in the Laplacian matrix ) are parametrized by some value , and our loss function is dependent on the eigenvalues of , then we can collect the gradient as:(3) 
In practice, if the entries of are computed as a function of
using an automatic differentiation package (such as PyTorch
(Paszke et al., 2019)) the gradient matrix is already known before eigendecomposition. We note here that for any fixed value of , all of the operations needed to compute GDD are either simple linear algebra or continuous or both. Therefore, for any loss function which takes the GDD between two graphs as input, we may optimize by backpropagation through the calculation of GDD using Equation 3. We note here that although all numerical experiments in this paper use the samesized version of GDD, this backpropagation will work for the varyingsized version as well, allowing gradients to be used to adjust any of the inputs to the GDD equation.3. Learning Parameters for Diffusion Kernels
In this section, we describe our method for learning edge weights for Laplacian diffusion kernels, beginning with our generalization of GDD to make it trainable, and then introducing a method for learning edge weights.
3.1. Diff2Dist: Differentiable Graph Diffusion Distance
We make two main changes to GDD to make it capable of being tuned to specific graph data. First, we replace the realvalued optimization over with a maximum over an explicit list of values
. This removes the need for an optimization step inside the GDD calculation. Second, we reweight the Frobenius norm in the GDD calculation with a vector of weights
which is the same length as the list of eigenvalues (these weights are normalized to sum to 1). The resulting GDD calculation is then:(4) 
We call this version of GDD Differentiable Graph Diffusion Distance, or Diff2Dist. Because this distance calculation is comprised entirely of differentiable components and linear algebra, it may be explicitly included in the computation graph (e.g. in PyTorch) of a machine learning model, without needing to invoke some external optimizer to find the supremum over all . and may be tuned by gradient descent or some other optimization algorithm to minimize a loss function which takes as input. Tuning the values results in a list of values of for which GDD is most informative for a given dataset, while tuning reweights GDD to pay most attention to the eigenvalues which are most discriminative. In the experiments in Section 4.1.1 we demonstrate the efficacy of tuning these parameters using contrastive loss.
3.2. Learning Edge Weighting Functions
Here, we note that if graph edge weights are determined by some differentiable function parametrized by parameters , we may still apply all of the machinery of Sections 2.1 and 3.1. A common edge weighting function for graphs embedded in Euclidean space is the Gaussian Distance Kernel, , where is the distance between nodes and in the embedding. is the ‘radius’ of the distance kernel and can be chosen a priori or tuned in the same way as and , using a numerical optimization procedure. However, the Gaussian distance kernel, while mathematically wellmotivated for embedded graphs, is a somewhat arbitrary choice of edge weight, especially in cases where the edges of our graphs have more complicated edge labels. In cases like the data discussed in this paper, our edge labels are vectorvalued, and it is therefore advantageous to replace this handpicked edge weight with weights chosen by a general function approximator, e.g. an artificial neural network (Fukushima and Miyake, 1982). As before, the parameters of this ANN could be tuned using gradients backpropagated through the GDD calculation and eigendecomposition.
4. Numeric Experiments
4.1. Data Description
4.1.1. Arabidopsis Cell Graph Dataset
The species Arabidopsis thaliana is of high interest in plant morphology studies, since 1) its genome was fully sequenced in 1996, relatively early (Kaul et al., 2000), and 2) its structure makes it relatively easy to capture images of the area of active cell division: the shoot apical meristem (SAM). Recent work (Schaefer et al., 2017) has found that mutant Arabidopsis specimens with a loss of function of genes trm6,trm7, and trm8
demonstrate more variance in the placement of new cell walls during cell division. This is thought to be the result of the
trmmutants having abnormal or absent preprophase bands (PPBs) during cell divison. The PBB is a cellular substructure which finetunes the placement of cell walls during division.We take 20 confocal microscope images (13 from wildtype plants and 7 from trm678 mutants) of shoot apical meristems, and process them to extract graphs representing local neighborhoods of cells. Each graph consists of a cell and its 63 closest neighbors (64 cells total). Cell neighborhood selection was limited to the central region of each SAM image, since the primordia surrounding the SAM are known to have different morphological properties. For each cell neighborhood, we produce a graph by connecting two cells if and only if their shared boundary is 30 pixels or longer (pixel size : 1 px = 0.1818 µm). For each edge, we save the length of this shared boundary, as well as the angle of the edge from horizontal and the edge length. We extracted 2318 cell neighborhoods in this way, resulting in a dataset of 13. See Figure 2 for an example morphological graph.
We test each of the GDD generalizations proposed, on the task of classifiying wildtype vs. mutant morphological graphs. We split our dataset 85%/15% train/validation; all metrics we report are calculated on the validation set. We compare the following four methods:

Original GDD of unweighted graphs, with no tuning of or other parameters;

Diff2Dist of graphs with Gaussian kernel edge weights (fixed ), with and tuned;

Diff2Dist of graphs with Gaussian kernel edge weights, with , , and all tuned;

Diff2Dist of graphs with general edge weights parametrized by a small neural network. Input to this neural network was a vector of all three edge attributes.
For methods 2 and 3, the input to the distance kernel was the distance between nodes in the original image. All parameters were tuned using ADAMOpt (Kingma and Ba, 2014)
(with default PyTorch hyperparameters and batch size 256) to minimize the
contrastive loss function (Hadsell et al., 2006):(5) 
This loss function encourages and to be closer than if they have the same label, and further apart than if they differ ( is a binary indicator of label agreement). These margins were set (,
) by trialanderror on the training set. Training took 600 epochs. For the neural network approach, edge weights were chosen as the final output of a neural network with seven layers of sizes
with SiLU activations on the first six layers and no activation function on the last layer. Results of these experiments can be found in Table
1. We also present distance matrices for each approach, as well as Isomap (Tenenbaum et al., 2000) embeddings of each (we used the scikitlearn implementation of Isomap with 15 neighbors and default hyperparameters). The distance matrices developed using the ANN approach clearly show better separation between the two categories. The distance matrices and resulting embeddings for methods 1 and 2 do not result in a clear separation between the two categories, while methods 3 and 4 do. Furthermore, method 4 (Diff2Dist) demonstrates a dramatic improvement over method 3, showing that this version of GDD can be tuned so that the resulting graph distance minimizes an arbitrary loss function (e.g. separates classes of graphs).Method  Accuracy % 

GDD only  89.4 
tuning and weights  90.6 
and tuning,weights  94.7 
ANN Parametrization  99.1 
5. Analysis of Simulation Hyperparameters
In this section, we use the distance metric trained in the previous section to characterize the differences in cell growth between the two types of cells, by comparing the biological graphs to graphs generated by a simulation.
We generated a population of artificial morphological graphs using the simulation code Tissue (Hamant et al., 2008). In our model, cells divide when they reach a volume of 40 arbitrary units, with a small random chance
of dividing at any timestep if they are larger than 20 units. Cell division planes are usually placed along the shortest path between two nonadjacent cell walls, but are randomly oriented with probability
. These parameters were implemented in a new division rule we contributed to the Tissue simulator (our modified version of Tissue’s source code is available in the supplementary material of this manuscript). We also varied two parameters from the original version of Tissue, representing the spring constant of each cell wall and the exclusion radius around each vertex (where new vertices are not allowed to be placed during division). A summary of the values swept over for each parameter is in Table 2. There were combinations of parameters, resulting in that many simulations (all simulations were started from the meristem.init file included with Tissue). Each simulation resulted in a mesh file representing the final positions of cell vertices and cell walls after 10000 timesteps. Each mesh was converted into an image and processed into a set of graphs (one centered on each cell) exactly as described in Section 4.1.1.We used the Diff2Dist model (trained on the biological graphs only) to compute the distance between each biological graph and all of the graphs which originated from a simulation. Each biological graph can then be assigned a numerical label for each parameter, by taking the mean of the parameters for the 100 closest simulations. This gives us an estimate of which parameter values a given biological graph is most similar to (under our learned distance estimate, which is shown to separate
trm678 and wildtype graphs). The results of this experiment can be seen in Figure 7. Comparing biologically derived graphs to syntheticallygenerated ones in this way allows us to see that wildtype graphs are characterized by high vertex avoidance and low rate of random cell division direction, while trm678 graphs are more likely to have cell divisions in random directions.Parameter Name  Values used 

Spring Constant  0.1, 0.3, 1.0, 3.0 
Vertex Exclusion Size  0.1, 0.3, 0.6 
Random Div Freq  0.0, 0.00001, 0.00003, 0.0001 
Random Div Direction  0.0, 0.01, 0.03, 0.1, 0.5, 1.0 
6. Conclusion and Future Work
This paper presents a method to compute distance metrics between edgelabelled graphs, in such a way as to respect class labels. This approach is flexible and can be implemented entirely in PyTorch, making it possible to learn a distance metric between graphs that were previously not able to be discriminated by Graph Diffusion Distance. In the future hope to apply this method to more heterogenous graph datasets by including the varyingsize version of GDD. We also note here that our neural network approach, as described, is not a Graph Neural Network in the sense described by prior works like (Kipf and Welling, 2016; Bacciu et al., 2020), as there is no messagepassing step. We expect messagepassing layers to directly improve these results and hope to include them in a future version of differentiable GDD.
7. Acknowledgements
Research presented in this article was supported by the Laboratory Directed Research and Development program of Los Alamos National Laboratory under project number 20200041ER. This work was also supported by the UC Irvine Donald Bren School of Information and Computer Sciences Endeavor grant, and by grants from the Human Frontiers Science Program [grant HFSP  RGP0023/2018] and National Institute of Health [grant R01 HD073179]. The authors would also like to thank (in no particular order) Jaques Dumais, Olivier Hamant, And Henrik Jönsson for their advice and guidance.
References
 Derivatives of eigenvalues and eigenvectors of matrix functions. SIAM journal on matrix analysis and applications 14 (4), pp. 903–926. Cited by: §2.1.
 A gentle introduction to deep learning for graphs. Neural Networks. Cited by: §6.

Neocognitron: a selforganizing neural network model for a mechanism of visual pattern recognition
. In Competition and cooperation in neural nets, pp. 267–285. Cited by: §3.2. 
Dimensionality reduction by learning an invariant mapping.
In
2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06)
, Vol. 2, pp. 1735–1742. Cited by: §4.1.1.  Developmental patterning by mechanical signals in arabidopsis. Science 322 (5908), pp. 1650–1655. External Links: Document, ISSN 00368075, Link, https://science.sciencemag.org/content/322/5908/1650.full.pdf Cited by: §5.
 Graph diffusion distance: a difference measure for weighted graphs based on the graph laplacian exponential kernel. In 2013 IEEE Global Conference on Signal and Information Processing, pp. 419–422. Cited by: §1.
 Analysis of the genome sequence of the flowering plant arabidopsis thaliana. nature 408 (6814), pp. 796–815. Cited by: §4.1.1.
 Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.1.1.
 Semisupervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §6.
 Simplified calculation of eigenvector derivatives. AIAA journal 14 (9), pp. 1201–1205. Cited by: §2.1.
 Pytorch: an imperative style, highperformance deep learning library. arXiv preprint arXiv:1912.01703. Cited by: §2.1.
 The preprophase band of microtubules controls the robustness of division orientation in plants. Science 356 (6334), pp. 186–189. Cited by: §4.1.1.
 Graph diffusion distance: properties and efficient computation. Plos one 16 (4), pp. e0249624. Cited by: §1.
 A global geometric framework for nonlinear dimensionality reduction. science 290 (5500), pp. 2319–2323. Cited by: §4.1.1.