Diff2Dist: Learning Spectrally Distinct Edge Functions, with Applications to Cell Morphology Analysis

06/29/2021 · by Cory Braker Scott, et al. · University of California, Irvine; Los Alamos National Laboratory

We present a method for learning "spectrally descriptive" edge weights for graphs. We generalize a previously known distance measure on graphs (Graph Diffusion Distance), thereby allowing it to be tuned to minimize an arbitrary loss function. Because all steps involved in calculating this modified GDD are differentiable, we demonstrate that it is possible for a small neural network model to learn edge weights which minimize loss. GDD alone does not effectively discriminate between graphs constructed from shoot apical meristem images of wild-type vs. mutant Arabidopsis thaliana specimens. However, training edge weights and kernel parameters with contrastive loss produces a learned distance metric with large margins between these graph categories. We demonstrate this by showing improved performance of a simple k-nearest-neighbors classifier on the learned distance matrix. We also demonstrate a further application of this method to biological image analysis: once trained, we use our model to compute the distance between the biological graphs and a set of graphs output by a cell division simulator. This allows us to identify simulation parameter regimes which are similar to each class of graph in our original dataset.


1. Introduction

Graph Diffusion Distance (GDD) is a measure of similarity between graphs originally introduced by Hammond et al. (Hammond et al., 2013) and greatly expanded by Scott et al. (Scott and Mjolsness, 2021). This metric measures the similarity of two graphs by comparing their respective spectra (the eigenvalues of the graph Laplacian). However, it is well known that there exist pairs of cospectral graphs: graphs which are not isomorphic yet have identical spectra, and which GDD therefore cannot distinguish. Furthermore, because even a small change to the entries of a matrix may change its eigenvalues, another limitation of GDD is that it is sensitive to small changes in the topology of the graph (as well as to small variations in edge weights). Finally, since GDD does not make use of edge or node attributes, it cannot distinguish between two different signals on the same source graph, diminishing its applicability in data science. In this work, we provide several generalizations of GDD which resolve these issues and make it a powerful machine learning tool for datasets of graphs.

2. Graph Diffusion Distance

We use the definition of Graph Diffusion Distance (GDD) first given by Hammond et al. and later expanded (to cover differently-sized graphs) by Scott et al. Given two graphs $G_1$ and $G_2$ of the same size, let $L_1$ and $L_2$ be their respective graph Laplacians. Furthermore, let $L_i = U_i \Lambda_i U_i^T$ be the diagonalizations of each Laplacian, so that $\Lambda_i$ is a diagonal matrix whose $j$th diagonal entry, $\lambda_j^{(i)}$, is the $j$th eigenvalue of $L_i$. Then the graph diffusion distance between these graphs is given by

$$ D(G_1, G_2) = \sup_{t > 0} \left\lVert e^{-t \Lambda_1} - e^{-t \Lambda_2} \right\rVert_F = \sup_{t > 0} \sqrt{\sum_j \left( e^{-t \lambda_j^{(1)}} - e^{-t \lambda_j^{(2)}} \right)^2}. \qquad (1) $$

This simplification relies on several properties of the Frobenius norm and the exponential map (rotation invariance and continuity, respectively) which we shall not detail here. It is clear that this distance measure requires the two graphs to be the same size, since otherwise this matrix difference is not defined.
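For intuition, the following minimal NumPy sketch (our own illustration, not the authors' released code; function and variable names are assumptions) evaluates Equation 1 for two same-sized graphs by a grid search over the diffusion time $t$.

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian L = D - A of a symmetric adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

def gdd_same_size(A1, A2, t_grid=np.logspace(-2, 2, 200)):
    """Same-size GDD (Equation 1): largest Frobenius distance between the
    two heat-kernel spectra over a grid of diffusion times t."""
    lam1 = np.sort(np.linalg.eigvalsh(laplacian(A1)))
    lam2 = np.sort(np.linalg.eigvalsh(laplacian(A2)))
    return max(np.linalg.norm(np.exp(-t * lam1) - np.exp(-t * lam2))
               for t in t_grid)

# Toy example: a 4-cycle versus a 4-node path.
A_cycle = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
A_path  = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
print(gdd_same_size(A_cycle, A_path))
```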

The generalization to different-sized graphs given by Scott et al. can also be modified in the way we discuss in Section 3, but we do not consider this version of GDD in this paper. Given two graphs $G_1$, $G_2$ of differing sizes $n_1 < n_2$, we can define graph diffusion distance similarly to Equation 1:

$$ D(G_1, G_2) = \sup_{t > 0} \; \inf_{\alpha > 0} \; \inf_{P \,:\, \mathcal{C}(P)} \left\lVert P \, e^{-\frac{t}{\alpha} L_1} - e^{-\alpha t L_2} \, P \right\rVert_F. \qquad (2) $$

In Equation 2, $\alpha$ is a time-dilation factor which dilates the passage of time in one graph with respect to the other. $P$ is a rectangular matrix which is optimized according to some set of constraints $\mathcal{C}$. In the cited paper by Scott et al., $\mathcal{C}$ is taken to be orthogonality: $P^T P = I$. The substitution $\tilde{P} = U_2^T P U_1$ is a change of basis from graph-space to eigenspace, allowing us to again represent the equation for varying-size GDD as a comparison between lists of eigenvalues.

2.1. GDD is a differentiable function of $t$ and edge weights

Once all of the eigenvalues $\lambda_i$ and eigenvectors $u_i$ of a matrix $L$ are computed, we may backpropagate through the eigendecomposition as described in (Nelson, 1976) and (Andrew et al., 1993). If our edge weights (and therefore the entries of the Laplacian matrix $L$) are parametrized by some value $\theta$, and our loss function $\mathcal{L}$ is dependent on the eigenvalues of $L$, then we can collect the gradient as

$$ \frac{\partial \mathcal{L}}{\partial \theta} = \sum_i \frac{\partial \mathcal{L}}{\partial \lambda_i} \frac{\partial \lambda_i}{\partial \theta} = \sum_i \frac{\partial \mathcal{L}}{\partial \lambda_i} \; u_i^T \, \frac{\partial L}{\partial \theta} \, u_i. \qquad (3) $$

In practice, if the entries of $L$ are computed as a function of $\theta$ using an automatic differentiation package (such as PyTorch (Paszke et al., 2019)), the gradient matrix $\partial L / \partial \theta$ is already known before eigendecomposition. We note here that for any fixed value of $t$, all of the operations needed to compute GDD are simple linear algebra operations, continuous functions, or both. Therefore, for any loss function $\mathcal{L}$ which takes the GDD between two graphs as input, we may optimize $\mathcal{L}$ by backpropagating through the calculation of GDD using Equation 3. Although all numerical experiments in this paper use the same-sized version of GDD, this backpropagation works for the varying-sized version as well, allowing gradients to be used to adjust any of the inputs to the GDD equation.
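As a minimal illustration of this idea (a sketch under our own toy setup, not the authors' code), the PyTorch fragment below parametrizes all edge weights of a small graph by a single scalar $\theta$, builds the Laplacian, and backpropagates a loss on its eigenvalues; `torch.linalg.eigvalsh` supplies the eigenvalue derivatives appearing in Equation 3 automatically.

```python
import torch

# Toy graph: a 4-cycle, with every edge weight given by exp(theta).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
theta = torch.tensor(0.5, requires_grad=True)

def laplacian(theta, n=4):
    A = torch.zeros(n, n)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    W = torch.exp(theta) * A              # differentiable edge weights
    return torch.diag(W.sum(dim=1)) - W   # L = D - W

lam = torch.linalg.eigvalsh(laplacian(theta))   # eigenvalues (differentiable)
loss = ((torch.exp(-lam) - 0.5) ** 2).sum()     # any loss on the spectrum
loss.backward()                                 # gradient of Equation 3
print(theta.grad)
```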

3. Learning Parameters for Diffusion Kernels

In this section, we describe our method for learning edge weights for Laplacian diffusion kernels, beginning with our generalization of GDD to make it trainable, and then introducing a method for learning edge weights.

3.1. Diff2Dist: Differentiable Graph Diffusion Distance

We make two main changes to GDD to make it capable of being tuned to specific graph data. First, we replace the real-valued optimization over $t$ with a maximum over an explicit list of values $\{t_1, \ldots, t_k\}$. This removes the need for an optimization step inside the GDD calculation. Second, we re-weight the Frobenius norm in the GDD calculation with a vector of weights $w$ which is the same length as the list of eigenvalues (these weights are normalized to sum to 1). The resulting GDD calculation is then:

$$ D(G_1, G_2) = \max_{t \in \{t_1, \ldots, t_k\}} \sqrt{\sum_j w_j \left( e^{-t \lambda_j^{(1)}} - e^{-t \lambda_j^{(2)}} \right)^2}. \qquad (4) $$

We call this version of GDD Differentiable Graph Diffusion Distance, or Diff2Dist. Because this distance calculation is composed entirely of differentiable components and linear algebra, it may be explicitly included in the computation graph (e.g. in PyTorch) of a machine learning model, without needing to invoke an external optimizer to find the supremum over all $t$. The values $\{t_i\}$ and $w$ may be tuned by gradient descent or some other optimization algorithm to minimize a loss function which takes $D(G_1, G_2)$ as input. Tuning the $t_i$ yields a list of values of $t$ for which GDD is most informative for a given dataset, while tuning $w$ reweights GDD to pay the most attention to the eigenvalues which are most discriminative. In the experiments in Section 4.1.1 we demonstrate the efficacy of tuning these parameters using contrastive loss.

3.2. Learning Edge Weighting Functions

Here, we note that if graph edge weights are determined by some differentiable function parametrized by parameters $\theta$, we may still apply all of the machinery of Sections 2.1 and 3.1. A common edge weighting function for graphs embedded in Euclidean space is the Gaussian distance kernel, $w_{ij} = \exp(-d_{ij}^2 / \sigma^2)$, where $d_{ij}$ is the distance between nodes $i$ and $j$ in the embedding. Here $\sigma$ is the 'radius' of the distance kernel and can be chosen a priori or tuned in the same way as $t$ and $w$, using a numerical optimization procedure. However, the Gaussian distance kernel, while mathematically well-motivated for embedded graphs, is a somewhat arbitrary choice of edge weight, especially in cases where the edges of our graphs have more complicated edge labels. In cases like the data discussed in this paper, our edge labels are vector-valued, and it is therefore advantageous to replace this hand-picked edge weight with weights chosen by a general function approximator, e.g. an artificial neural network (Fukushima and Miyake, 1982). As before, the parameters of this ANN can be tuned using gradients backpropagated through the GDD calculation and eigendecomposition.
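As a sketch of this approach (the layer sizes, feature ordering, and the softplus positivity constraint below are our assumptions, not the exact architecture used in Section 4), an edge-weight network and the resulting weighted Laplacian might look like:

```python
import torch
import torch.nn as nn

class EdgeWeightNet(nn.Module):
    """Maps a vector of edge attributes (e.g. shared-boundary length, edge
    angle, edge length) to a positive scalar edge weight."""
    def __init__(self, in_dim=3, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, edge_feats):                      # (num_edges, in_dim)
        return nn.functional.softplus(self.net(edge_feats)).squeeze(-1)

def weighted_laplacian(edge_index, edge_feats, num_nodes, model):
    """Assemble L = D - W, with W predicted by the edge-weight network.
    edge_index: (2, num_edges) long tensor, each undirected edge listed once."""
    w = model(edge_feats)                               # (num_edges,)
    W = torch.zeros(num_nodes, num_nodes)
    W[edge_index[0], edge_index[1]] = w
    W = W + W.T                                          # symmetrize
    return torch.diag(W.sum(dim=1)) - W
```

The Gaussian kernel above is recovered as the special case in which the network is replaced by a fixed function of the node-to-node distance alone.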

4. Numeric Experiments

4.1. Data Description

4.1.1. Arabidopsis Cell Graph Dataset

The species Arabidopsis thaliana is of high interest in plant morphology studies, since 1) its genome was fully sequenced relatively early (Kaul et al., 2000), and 2) its structure makes it relatively easy to capture images of the area of active cell division: the shoot apical meristem (SAM). Recent work (Schaefer et al., 2017) has found that mutant Arabidopsis specimens with a loss of function of the genes trm6, trm7, and trm8 demonstrate more variance in the placement of new cell walls during cell division. This is thought to be the result of the trm-mutants having abnormal or absent pre-prophase bands (PPBs) during cell division. The PPB is a cellular substructure which fine-tunes the placement of cell walls during division.

We take 20 confocal microscope images (13 from wild-type plants and 7 from trm678 mutants) of shoot apical meristems and process them to extract graphs representing local neighborhoods of cells. Each graph consists of a cell and its 63 closest neighbors (64 cells total). Cell neighborhood selection was limited to the central region of each SAM image, since the primordia surrounding the SAM are known to have different morphological properties. For each cell neighborhood, we produce a graph by connecting two cells if and only if their shared boundary is 30 pixels or longer (pixel size: 1 px = 0.1818 µm). For each edge, we save the length of this shared boundary, as well as the angle of the edge from horizontal and the edge length. We extracted 2318 cell neighborhoods in this way, each yielding one 64-node graph. See Figure 2 for an example morphological graph.

Figure 2. Left: an image of the SAM of a mutant Arabidopsis specimen. The original 3D confocal microscope image is here represented as a 2D skeleton. Right: a zoomed-in view of the same specimen, with separate cells false-colored and an example extracted cell neighborhood graph overlaid.
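As a simplified stand-in for the extraction rule described above (a 4-neighbourhood boundary count on a 2-D label image; the real pipeline may differ in details such as connectivity and 3-D handling, and all names below are illustrative):

```python
import numpy as np

def shared_boundaries(labels):
    """Count boundary pixel pairs between adjacent cell labels in a 2-D
    segmentation image (4-neighbourhood)."""
    counts = {}
    for a, b in [(labels[:, :-1], labels[:, 1:]),    # horizontal neighbours
                 (labels[:-1, :], labels[1:, :])]:   # vertical neighbours
        mask = a != b
        for u, v in zip(a[mask].ravel(), b[mask].ravel()):
            key = (min(u, v), max(u, v))
            counts[key] = counts.get(key, 0) + 1
    return counts

def neighborhood_adjacency(counts, cells, min_boundary=30):
    """Connect two cells iff their shared boundary is >= min_boundary pixels
    (1 px = 0.1818 um here); `cells` lists the 64 cells of one neighborhood."""
    index = {c: k for k, c in enumerate(cells)}
    A = np.zeros((len(cells), len(cells)))
    for (u, v), length in counts.items():
        if u in index and v in index and length >= min_boundary:
            A[index[u], index[v]] = A[index[v], index[u]] = 1.0
    return A
```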

We test each of the proposed GDD generalizations on the task of classifying wild-type vs. mutant morphological graphs. We split our dataset 85%/15% into train/validation sets; all metrics we report are calculated on the validation set. We compare the following four methods:

  1. Original GDD of unweighted graphs, with no tuning of $t$ or any other parameters;

  2. Diff2Dist of graphs with Gaussian kernel edge weights (fixed $\sigma$), with $t$ and $w$ tuned;

  3. Diff2Dist of graphs with Gaussian kernel edge weights, with $\sigma$, $t$, and $w$ all tuned;

  4. Diff2Dist of graphs with general edge weights parametrized by a small neural network. The input to this neural network was a vector of all three edge attributes.

For methods 2 and 3, the input to the distance kernel was the distance between nodes in the original image. All parameters were tuned using the Adam optimizer (Kingma and Ba, 2014) (with default PyTorch hyperparameters and batch size 256) to minimize the contrastive loss function (Hadsell et al., 2006):

$$ \mathcal{L}(G_1, G_2) = y \, \max\!\left(0,\; D(G_1, G_2) - m_1\right)^2 + (1 - y) \, \max\!\left(0,\; m_2 - D(G_1, G_2)\right)^2. \qquad (5) $$

This loss function encourages $G_1$ and $G_2$ to be closer than $m_1$ (in the learned distance) if they have the same label, and further apart than $m_2$ if their labels differ ($y$ is a binary indicator of label agreement). The margins $m_1$ and $m_2$ were set by trial-and-error on the training set. Training took 600 epochs. For the neural network approach, edge weights were taken to be the final output of a seven-layer fully-connected neural network with SiLU activations on the first six layers and no activation function on the last layer. Results of these experiments can be found in Table 1. We also present distance matrices for each approach, as well as Isomap (Tenenbaum et al., 2000) embeddings of each (we used the scikit-learn implementation of Isomap with 15 neighbors and default hyperparameters). The distance matrices obtained with the ANN approach clearly show better separation between the two categories. The distance matrices and resulting embeddings for methods 1 and 2 do not show a clear separation between the two categories, while those for methods 3 and 4 do. Furthermore, method 4 (ANN-parametrized edge weights) demonstrates a dramatic improvement over method 3, showing that this version of GDD can be tuned so that the resulting graph distance minimizes an arbitrary loss function (e.g. separates classes of graphs).
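A sketch of this training objective and a single optimization step follows (Equation 5; the margin values shown are placeholders rather than the ones used in our experiments, `edge_net` is an instance of the `EdgeWeightNet` sketched earlier, and `t_grid`, `w_logits`, and `diff2dist` refer to the Section 3.1 sketch):

```python
import torch

def contrastive_loss(dist, same_label, m_same, m_diff):
    """Equation 5: pull same-class pairs closer than m_same,
    push different-class pairs farther apart than m_diff."""
    y = same_label.float()
    pull = y * torch.clamp(dist - m_same, min=0.0) ** 2
    push = (1.0 - y) * torch.clamp(m_diff - dist, min=0.0) ** 2
    return (pull + push).mean()

# One illustrative Adam step over a batch of spectrum pairs:
# opt = torch.optim.Adam(list(edge_net.parameters()) + [t_grid, w_logits])
# d = torch.stack([diff2dist(l1, l2, t_grid, w_logits) for l1, l2 in batch_pairs])
# loss = contrastive_loss(d, batch_same_label, m_same=0.5, m_diff=2.0)  # placeholder margins
# opt.zero_grad(); loss.backward(); opt.step()
```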

Method                                    Accuracy %
GDD only                                  89.4
$t$-tuning and $w$-weights                90.6
$\sigma$- and $t$-tuning, $w$-weights     94.7
ANN parametrization                       99.1
Table 1. Validation set accuracy of a simple k-nearest-neighbors classifier for all four methods. The validation set was the same for each of these tests. The value reported is the highest accuracy over all tested values of $k$.
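The k-nearest-neighbors evaluation can be reproduced with scikit-learn's support for precomputed distances; the matrix and variable names below are illustrative rather than taken from our code.

```python
from sklearn.neighbors import KNeighborsClassifier

def best_knn_accuracy(D_train, D_val_to_train, y_train, y_val, ks=range(1, 21)):
    """D_train:        (n_train, n_train) learned distances among training graphs.
    D_val_to_train:    (n_val, n_train) distances from validation to training graphs.
    Returns the highest validation accuracy over the tested values of k."""
    best = 0.0
    for k in ks:
        clf = KNeighborsClassifier(n_neighbors=k, metric="precomputed")
        clf.fit(D_train, y_train)
        best = max(best, clf.score(D_val_to_train, y_val))
    return best
```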
Figure 3. Top: distance matrix for morphological graphs, generated with method 1 (GDD on unweighted graphs). Bottom: Isomap embedding of this distance matrix, which ensures that points with small distance are placed near each other. Blue points represent wild type, and orange points represent mutants. Transparent points represent training set graphs; solid points represent those from the validation set. We see that naive GDD leads to an embedding where the two categories of graph overlap, indicating that GDD by itself is not capturing the distinction between these two classes of graph.
Figure 4. As in Figure 3, but for method 2 (Gaussian kernel edge weights with tuned $t$ and $w$). This distance matrix and embedding have the same flaws as those in Figure 3.
Figure 5. As in Figure 3, but for method 3 (Gaussian kernel with tuned $\sigma$). We see that using a Gaussian distance kernel for edge weights, with radius $\sigma$ tuned by gradient descent, results in a graph distance metric which better separates the two categories.
Figure 6. As in Figure 3, but for the ANN-determined edge weights. Replacing the arbitrary Gaussian distance kernel with weights chosen by a machine-learning model makes this approach fully general and produces a distance metric which fully separates the two graph categories.

5. Analysis of Simulation Hyperparameters

In this section, we use the distance metric trained in the previous section to characterize the differences in cell growth between the two types of cells, by comparing the biological graphs to graphs generated by a simulation.

We generated a population of artificial morphological graphs using the simulation code Tissue (Hamant et al., 2008). In our model, cells divide when they reach a volume of 40 arbitrary units, with a small random chance (the Random Div Freq parameter of Table 2) of dividing at any timestep once they are larger than 20 units. Cell division planes are usually placed along the shortest path between two non-adjacent cell walls, but are randomly oriented with probability given by the Random Div Direction parameter. These parameters were implemented in a new division rule we contributed to the Tissue simulator (our modified version of Tissue's source code is available in the supplementary material of this manuscript). We also varied two parameters from the original version of Tissue, representing the spring constant of each cell wall and the exclusion radius around each vertex (where new vertices are not allowed to be placed during division). A summary of the values swept over for each parameter is given in Table 2. There were $4 \times 3 \times 4 \times 6 = 288$ combinations of parameters, resulting in that many simulations (all simulations were started from the meristem.init file included with Tissue). Each simulation produced a mesh file representing the final positions of cell vertices and cell walls after 10000 timesteps. Each mesh was converted into an image and processed into a set of graphs (one centered on each cell) exactly as described in Section 4.1.1.

We used the Diff2Dist model (trained on the biological graphs only) to compute the distance between each biological graph and all of the graphs which originated from a simulation. Each biological graph can then be assigned a numerical label for each parameter by taking the mean of that parameter over the 100 closest simulations. This gives us an estimate of which parameter values a given biological graph is most similar to (under our learned distance metric, which was shown above to separate trm678 and wild-type graphs). The results of this experiment can be seen in Figure 7. Comparing biologically-derived graphs to synthetically-generated ones in this way allows us to see that wild-type graphs are characterized by high vertex avoidance and a low rate of randomly-oriented cell division, while trm678 graphs are more likely to have cell divisions in random directions.
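The parameter assignment described above reduces to a nearest-neighbor average; a minimal sketch (array names are our own):

```python
import numpy as np

def assign_simulation_parameters(D_bio_to_sim, sim_params, k=100):
    """D_bio_to_sim: (n_bio, n_sim) learned distances from each biological graph
    to every simulation-derived graph.
    sim_params:     (n_sim, n_params) parameters of the simulation that produced
    each simulated graph.
    Returns (n_bio, n_params): mean parameters of the k closest simulations."""
    nearest = np.argsort(D_bio_to_sim, axis=1)[:, :k]
    return sim_params[nearest].mean(axis=1)
```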

Parameter Name Values used
Spring Constant 0.1, 0.3, 1.0, 3.0
Vertex Exclusion Size 0.1, 0.3, 0.6
Random Div Freq 0.0, 0.00001, 0.00003, 0.0001
Random Div Direction 0.0, 0.01, 0.03, 0.1, 0.5, 1.0
Table 2. Summary of input parameters and values used during comparison of simulations to biologically-derived graphs. Spring Constant controls the stiffness of cell walls during the simulation. Vertex Exclusion Size is the proportional size of an envelope around each vertex where new vertices may not be placed during division. Random Div Freq is the random chance of a cell dividing at a given timestep even when it has not reached 40 units of volume. Random Div Direction is the probability (given that division has occurred) that the placement of the new cell wall will be random instead of optimal.
Figure 7. Visualizing simulation parameters using Diff2Dist. Each plot shows one point for each morphological graph in our Arabidopsis dataset. Points are placed in 2D using Isomap, exactly as in Figure 6. Points are colored according to the parameter values of the nearest simulation-derived graphs, where ’nearest’ means in the sense of our trained distance metric.

6. Conclusion and Future Work

This paper presents a method to compute distance metrics between edge-labelled graphs in such a way as to respect class labels. The approach is flexible and can be implemented entirely in PyTorch, making it possible to learn a distance metric between graphs which Graph Diffusion Distance alone could not previously discriminate. In the future, we hope to apply this method to more heterogeneous graph datasets by incorporating the varying-size version of GDD. We also note that our neural network approach, as described, is not a Graph Neural Network in the sense described by prior works like (Kipf and Welling, 2016; Bacciu et al., 2020), as there is no message-passing step. We expect message-passing layers to directly improve these results and hope to include them in a future version of differentiable GDD.

7. Acknowledgements

Research presented in this article was supported by the Laboratory Directed Research and Development program of Los Alamos National Laboratory under project number 20200041ER. This work was also supported by the UC Irvine Donald Bren School of Information and Computer Sciences Endeavor grant, and by grants from the Human Frontier Science Program [grant HFSP - RGP0023/2018] and the National Institutes of Health [grant R01 HD073179]. The authors would also like to thank (in no particular order) Jacques Dumais, Olivier Hamant, and Henrik Jönsson for their advice and guidance.

References

  • A. L. Andrew, K. E. Chu, and P. Lancaster (1993) Derivatives of eigenvalues and eigenvectors of matrix functions. SIAM Journal on Matrix Analysis and Applications 14 (4), pp. 903–926. Cited by: §2.1.
  • D. Bacciu, F. Errica, A. Micheli, and M. Podda (2020) A gentle introduction to deep learning for graphs. Neural Networks. Cited by: §6.
  • K. Fukushima and S. Miyake (1982) Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and cooperation in neural nets, pp. 267–285. Cited by: §3.2.
  • R. Hadsell, S. Chopra, and Y. LeCun (2006) Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), Vol. 2, pp. 1735–1742. Cited by: §4.1.1.
  • O. Hamant, M. G. Heisler, H. Jönsson, P. Krupinski, M. Uyttewaal, P. Bokov, F. Corson, P. Sahlin, A. Boudaoud, E. M. Meyerowitz, Y. Couder, and J. Traas (2008) Developmental patterning by mechanical signals in Arabidopsis. Science 322 (5908), pp. 1650–1655. Cited by: §5.
  • D. K. Hammond, Y. Gur, and C. R. Johnson (2013) Graph diffusion distance: a difference measure for weighted graphs based on the graph Laplacian exponential kernel. In 2013 IEEE Global Conference on Signal and Information Processing, pp. 419–422. Cited by: §1.
  • S. Kaul, H. L. Koo, J. Jenkins, M. Rizzo, T. Rooney, L. J. Tallon, T. Feldblyum, W. Nierman, M. I. Benito, X. Lin, et al. (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408 (6814), pp. 796–815. Cited by: §4.1.1.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.1.1.
  • T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §6.
  • R. B. Nelson (1976) Simplified calculation of eigenvector derivatives. AIAA journal 14 (9), pp. 1201–1205. Cited by: §2.1.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019) PyTorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703. Cited by: §2.1.
  • E. Schaefer, K. Belcram, M. Uyttewaal, Y. Duroc, M. Goussot, D. Legland, E. Laruelle, M. de Tauzia-Moreau, M. Pastuglia, and D. Bouchez (2017) The preprophase band of microtubules controls the robustness of division orientation in plants. Science 356 (6334), pp. 186–189. Cited by: §4.1.1.
  • C. Scott and E. Mjolsness (2021) Graph diffusion distance: properties and efficient computation. PLOS ONE 16 (4), pp. e0249624. Cited by: §1.
  • J. B. Tenenbaum, V. De Silva, and J. C. Langford (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290 (5500), pp. 2319–2323. Cited by: §4.1.1.