The Weighted Euler Curve Transform for Shape and Image Analysis

04/23/2020 ∙ by Qitong Jiang, et al. ∙ Florida State University, The Ohio State University

The Euler Curve Transform (ECT) of Turner et al. is a complete invariant of an embedded simplicial complex, which is amenable to statistical analysis. We generalize the ECT to provide a similarly convenient representation for weighted simplicial complexes, objects which arise naturally, for example, in certain medical imaging applications. We leverage work of Ghrist et al. on Euler integral calculus to prove that this invariant—dubbed the Weighted Euler Curve Transform (WECT)—is also complete. We explain how to transform a segmented region of interest in a grayscale image into a weighted simplicial complex and then into a WECT representation. This WECT representation is applied to study Glioblastoma Multiforme brain tumor shape and texture data. We show that the WECT representation is effective at clustering tumors based on qualitative shape and texture features and that this clustering correlates with patient survival time.


1 Introduction

Tools from algebraic topology have become increasingly popular in shape analysis applications over the past several years. At an intuitive level, the topological perspective is appealing because algebraic topology is, at its core, designed to extract tractable algebraic invariants from complex shape data. The dominant technique in topological shape analysis is persistent homology, which summarizes multiscale topological features of a shape, where scale is measured relative to some filtration function. Roughly, for a continuous function $f: X \to \mathbb{R}$ on a topological space $X$ (satisfying certain tameness conditions), one computes the degree-$k$ homology of the sublevel sets $X_t = f^{-1}((-\infty, t])$ and tracks “births” and “deaths” of homological features as the filtration value $t$ is increased. This produces a summary statistic for the pair $(X, f)$ called a persistence diagram (see standard references [19, 10]), which can be used as a proxy for $X$ in shape analysis applications. This approach has been taken in several shape analysis tasks, with shape data coming from cortical surfaces [13], brain artery systems [3], proteins [29] and leaf contours [37]. While the persistence diagram of a pair $(X, f)$ provides a computationally tractable shape summary, the complex structure of the invariant makes it difficult to incorporate into statistical models. A simpler invariant is the Euler curve of $(X, f)$: this is an integer-valued function on $\mathbb{R}$ whose value at $t$ is the Euler characteristic (i.e., the alternating sum of ranks of the homology groups) of the sublevel set $X_t$.

Given shape data, one must decide which filtration function to use in order to apply these topological methods. For a shape represented as a simplicial complex $K$ embedded in a Euclidean space $\mathbb{R}^d$, recent work has advocated for using an ensemble of filtration functions given by the height functions along directions sampled from the unit sphere $S^{d-1}$ [41, 24, 20, 16, 4, 14, 21]. The collection of all persistence diagrams for these height filtrations is referred to as the persistent homology transform of $K$. Likewise, the collection of Euler curves for all filtration directions is called the Euler curve transform (ECT) of $K$. The ECT provides a particularly attractive shape representation, as its simple structure allows it to be easily incorporated into statistical models. This was the approach taken in [14], where the ECTs of Glioblastoma Multiforme (GBM) brain tumor shapes were used as covariates in a model for survival prediction.

In this paper, we consider a variant of the ECT, which we dub the Weighted Euler Curve Transform (WECT). This object is defined for shape data consisting of an embedded simplicial complex $K$ endowed with an extra weight function $\omega$. The pair $(K, \omega)$ is referred to as a weighted simplicial complex. The WECT incorporates both the shape of $K$ and the weight function $\omega$ into a single topological summary. Our motivation for defining this summary also comes from the analysis of brain tumor data, which is naturally given as a segmented grayscale image. The segmented shape is used to construct a simplicial complex $K$ embedded in $\mathbb{R}^2$, and the grayscale pixel values inside the shape define the weight function $\omega$. While the WECT is a simple generalization of the ECT, it efficiently incorporates information that the ECT ignores, such as the pixel intensities inside the segmented region.

1.1 Contributions and Organization of Paper

The proposed mathematical framework is laid out in detail in Section 2. There, we give a precise definition of the WECT as a generalization of ideas appearing in [41, 4]. We show that recent work of Ghrist, Levanger and Mai implies that the WECT is a complete descriptor of weighted simplicial complexes, i.e., two weighted simplicial complexes have the same WECT if and only if they are equal. In this section, we also compare the WECT with other techniques from the topological shape analysis literature. In Section 3, we demonstrate some applications of the WECT framework. We begin with a toy example exploring the utility of the WECT in classifying and registering MNIST digit images. Next, we explore a real application in which we study the shape and appearance of Glioblastoma Multiforme tumors using WECT representations. Using a simple distance-based clustering scheme, we are able to distinguish clusters of tumors with low survival times, purely from imaging data. Open source code for producing and analyzing WECTs has been made publicly available [27].

2 Mathematical Framework

In this section, we lay out the mathematical framework for the WECT. We begin by reviewing some basic definitions in order to set notation.

2.1 Simplicial Complexes and the Euler Characteristic

Let $K$ be a simplicial complex embedded in some Euclidean space $\mathbb{R}^d$. That is, $K$ is a set of embedded simplices $\sigma$. Each $\sigma \in K$ is the convex hull of a set of $k+1$ points in general position in $\mathbb{R}^d$, where $k$ is the dimension of the simplex; we write $\dim(\sigma) = k$. For example, a $0$-dimensional simplex is a point, a $1$-dimensional simplex is a closed line segment and a $2$-dimensional simplex is a triangle. The $k+1$ points defining $\sigma$ are called its vertices. The convex hull of $\ell+1$ of these vertices is also a simplex of $K$ and is called an $\ell$-dimensional face of $\sigma$. If $\tau$ is a face of $\sigma$, we write $\tau \le \sigma$. If $\sigma$ and $\sigma'$ are simplices of $K$, we require that $\sigma \cap \sigma'$ is either empty or a face of both $\sigma$ and $\sigma'$. The maximum dimension of a simplex in $K$ is called the dimension of $K$, denoted $\dim(K)$. A collection of simplices of $K$ which itself forms a simplicial complex is called a subcomplex of $K$. The union of all simplices of $K$ of dimension less than or equal to $k$ is a subcomplex called the $k$-skeleton of $K$, denoted $K^{(k)}$. The set of simplices of $K$ of dimension exactly $k$ is denoted $K_k$; note that $K_k$ is not a simplicial complex in general.

Abusing notation, we will alternate between treating each embedded simplicial complex as a combinatorial object (a set of simplices) and as a geometric object (a set of points in $\mathbb{R}^d$). The interpretation should always be clear from context.

Figure 1: Examples of embedded simplicial complexes commonly arising in computer vision. A triangulated surface is a two-dimensional simplicial complex embedded in $\mathbb{R}^3$. An embedded planar graph is a one-dimensional simplicial complex in $\mathbb{R}^2$.

A simple combinatorial invariant of a simplicial complex $K$ is its Euler characteristic, denoted $\chi(K)$. The Euler characteristic is defined as

$$\chi(K) = \sum_{k=0}^{\dim(K)} (-1)^k \, |K_k|,$$

where $|S|$ will generally be used to denote the cardinality of a set $S$. The concept of the Euler characteristic generalizes to more flexible classes of spaces, and it is a basic fact of algebraic topology that $\chi$ is a homotopy invariant. Simplicial complexes form a convenient category for computation, since they can be represented abstractly in a purely combinatorial way by keeping track of all simplices and their inclusions. In this paper, we focus on the geometrically motivated case where our simplicial complexes are specified by an embedding into a Euclidean space. While not strictly necessary, the invariants we describe are most interesting when $K \subset \mathbb{R}^d$ is a $d$-dimensional simplicial complex. Moreover, we restrict our attention to the finite setting, i.e., $|K_k|$ is finite for all $k$.
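As a small illustration of the formula for $\chi(K)$, the following Python sketch counts simplices by dimension. It assumes, for illustration only, that a complex is stored as a set of sorted vertex-index tuples (a $k$-simplex has $k+1$ indices); the helper names `closure` and `euler_characteristic` are hypothetical and are not taken from the paper's released code [27].

```python
from collections import Counter
from itertools import combinations


def closure(top_simplices):
    """All faces of the given simplices, as sorted vertex-index tuples."""
    faces = set()
    for simplex in top_simplices:
        simplex = tuple(sorted(simplex))
        for k in range(1, len(simplex) + 1):
            faces.update(combinations(simplex, k))
    return faces


def euler_characteristic(simplices):
    """chi(K) = sum_k (-1)^k |K_k|; a tuple with k + 1 vertices is a k-simplex."""
    counts = Counter(len(s) - 1 for s in simplices)
    return sum((-1) ** k * n for k, n in counts.items())


filled_triangle = closure([(0, 1, 2)])               # 3 vertices, 3 edges, 1 triangle
hollow_triangle = closure([(0, 1), (1, 2), (0, 2)])  # 3 vertices, 3 edges
print(euler_characteristic(filled_triangle))  # 1
print(euler_characteristic(hollow_triangle))  # 0
```

The filled triangle is contractible and the hollow triangle is a circle, consistent with $\chi = 1$ and $\chi = 0$ respectively.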

2.2 Euler Curve Transform

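Consider a function $f: K \to \mathbb{R}$ as an assignment of a real number to each simplex of $K$; that is, $f$ takes a single value on each simplex. The function $f$ is a filtration function if each sublevel set $K_{f \le t} = \{\sigma \in K : f(\sigma) \le t\}$ is a subcomplex of $K$. A filtration function induces a chain of inclusions of simplicial complexes $K_{f \le t_1} \subseteq K_{f \le t_2} \subseteq \cdots \subseteq K_{f \le t_n} = K$, where $t_1 < t_2 < \cdots < t_n$ are the finitely many (using the assumption that $K$ is finite) values in the range of $f$. From this data, one obtains the Euler curve $\chi_f: \mathbb{R} \to \mathbb{Z}$ defined as $\chi_f(t) = \chi(K_{f \le t})$.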
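The Euler curve can be computed directly from this definition by keeping, at each threshold, only the simplices whose value is at most $t$. The sketch below assumes that values are given on vertices and extended to each simplex by the maximum over its vertices, which guarantees that every sublevel set is a subcomplex; the data layout and function names are illustrative assumptions.

```python
def extend_by_max(vertex_values, simplices):
    """Assign each simplex the largest value among its vertices (a filtration)."""
    return {s: max(vertex_values[v] for v in s) for s in simplices}


def euler_curve(f, thresholds):
    """Return [chi(K_{f <= t}) for t in thresholds], where f maps simplex -> value."""
    curve = []
    for t in thresholds:
        curve.append(sum((-1) ** (len(s) - 1) for s, value in f.items() if value <= t))
    return curve


# A square cycle 0-2-1-3-0 with two local minima (vertices 0 and 1).
square = [(0,), (1,), (2,), (3,), (0, 2), (1, 2), (1, 3), (0, 3)]
heights = {0: 0.0, 1: 0.0, 2: 1.0, 3: 2.0}
f = extend_by_max(heights, square)
print(euler_curve(f, [-1.0, 0.0, 1.0, 2.0]))  # [0, 2, 1, 0]
```

Two components appear at $t = 0$, merge at $t = 1$, and the loop closes at $t = 2$, which the curve records as the values $2$, $1$ and $0$.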

Figure 2: Glioblastoma multiforme tumor image data. From left to right: axial slice with largest tumor area selected from a 3D MRI image; binary tumor segmentation mask; segmented tumor image; weighted simplicial complex created from the segmented tumor image. Observe that the tumor shape data from the segmentation mask is enriched by the overlaid pixel value function extracted from the original image: the level sets of the pixel value function have interesting shape and topological features.

Given data consisting of an embedded simplicial complex and a relevant function (or more general space and function where similar concepts can be defined), the Euler curve produces a multiscale topological summary which is amenable to classical analysis, and can be viewed as a simplification of the richer but more computationally taxing persistence diagram [19, 10]. On the other hand, if a relevant function is not provided, one is left with the question of how to filter the simplicial complex.

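It was observed in [41] that for an embedded complex $K \subset \mathbb{R}^d$, there is a family of natural filtration functions: orthogonal projections onto the oriented one-dimensional subspaces of $\mathbb{R}^d$, which can be parameterized by the unit sphere $S^{d-1}$. The Euler Curve Transform (ECT) of an embedded simplicial complex $K$ is the function $\mathrm{ECT}(K): S^{d-1} \times \mathbb{R} \to \mathbb{Z}$ defined as

$$\mathrm{ECT}(K)(v, t) = \chi_{f_v}(t) = \chi(K_{f_v \le t}),$$

with $f_v$ defined on the vertex set by the dot product

$$f_v(x) = \langle x, v \rangle. \tag{1}$$

The function $f_v$ is extended inductively to higher-dimensional simplices as

$$f_v(\sigma) = \max_{\tau < \sigma} f_v(\tau), \tag{2}$$

where the maximum is taken over the proper faces $\tau$ of $\sigma$.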

In practical computations, one uses an approximation of the ECT given by sampling finitely many projection directions from $S^{d-1}$ and finitely many filtration values from $\mathbb{R}$.
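A minimal sketch of this sampling for a complex in $\mathbb{R}^2$, assuming $m$ directions evenly spaced on $S^1$ and a user-supplied grid of thresholds. The function name `ect_matrix` and the data layout (an array of vertex coordinates plus vertex-index tuples) are our own illustrative choices, not the released implementation [27].

```python
import numpy as np
from itertools import combinations


def ect_matrix(coords, simplices, num_directions, thresholds):
    """Sampled ECT: entry (i, j) is chi(K_{f_v <= t_j}) for the i-th direction v on S^1."""
    angles = 2 * np.pi * np.arange(num_directions) / num_directions
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    simplices = list(simplices)
    signs = np.array([(-1) ** (len(s) - 1) for s in simplices])
    out = np.zeros((num_directions, len(thresholds)), dtype=int)
    for i, v in enumerate(directions):
        heights = coords @ v                                        # Equation (1)
        f = np.array([heights[list(s)].max() for s in simplices])   # Equation (2)
        for j, t in enumerate(thresholds):
            out[i, j] = signs[f <= t].sum()
    return out


coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
triangle = [s for k in (1, 2, 3) for s in combinations(range(3), k)]
M = ect_matrix(coords, triangle, num_directions=8, thresholds=np.linspace(-2, 2, 5))
print(M.shape)  # (8, 5)
```

Because every vertex height lies in $[-1, 1]$ here, the first column of `M` is zero and the last column equals $\chi(K) = 1$ in every direction.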

One can also apply a smoothing operator to each single-variable function $\chi_{f_v}$ to obtain the Smooth Euler Curve Transform (SECT). The SECT was applied in [14] to study Glioblastoma Multiforme tumor imaging data. In particular, the SECT served as a shape covariate in a Gaussian process regression model for survival prediction. Another variant of the ECT, very closely related to the one that we consider in subsequent sections, was applied in [4] to provide a topological signature for grayscale image data.

2.3 Weighted Euler Characteristic

Next, suppose that our data consists of an embedded simplicial complex $K$ together with a function $\omega: K \to \mathbb{Z}$ assigning an integer weight to each simplex of $K$. We refer to the pair $(K, \omega)$ as a weighted simplicial complex. The goal is to define a variant of $\mathrm{ECT}(K)$ which also incorporates data from $\omega$. We note that weighted simplicial complexes have already appeared in the literature in various contexts. To the best of our knowledge, they were first studied in [18], where a homology theory was developed. Abstract weighted simplicial complexes, i.e., those which do not come with a preferred embedding into a Euclidean space, serve as models for collaboration networks [11] and Vietoris-Rips complexes for weighted point clouds [38]. We provide some examples of embedded weighted simplicial complexes next.

Example 1.

Our main motivating example comes from grayscale images containing a region of interest, e.g., a tumor image with a segmentation mask, which can be converted into weighted simplicial complexes using Algorithm 1. An example of this process is described in Figure 2.

Example 2.

Although the main examples considered in this paper will be of the form described in Example 1, we note that there are many other situations where one might wish to consider weighted simplicial complexes. Given shape data as a simplicial complex $K$, one could consider the weight function $\omega$ as an annotation or a measure of importance. For example, if $K$ is a complex representing a molecule shape, the weight function $\omega$ could be used to annotate different atom types. If $K$ is an anatomical surface, $\omega$ can be used to indicate regions of importance landmarked by a radiologist.

Algorithm 1: Grayscale Image to Weighted Complex

function ImageToWeightedComplex(image)
    I ← grayscale image matrix of image
    P ← coordinates of the nonzero pixels of I        ▷ treat nonzero pixels as coords for vertices
    V ← [ ]                                           ▷ initialize vertex list
    for p ∈ P do                                      ▷ add corner vertices
        append the corners of the pixel square at p to V
    end for
    V ← V with duplicates removed
    F ← [ ]                                           ▷ initialize face list
    for p ∈ P do
        append the triangles containing p to F
    end for
    E ← all resulting edges
    for each p ∈ P and each f ∈ F containing p do
        ω(f) ← weight of the corresponding pixel value I(p)
    end for
    for e ∈ E do
        ω(e) ← largest weight of a face containing e
    end for
    for v ∈ V do
        ω(v) ← largest weight of a face containing v
    end for
    return (V, E, F, ω)
end function
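The following Python sketch is one possible reading of Algorithm 1, under assumptions that the listing leaves open: pixel $(i, j)$ is taken to occupy the unit square with corners $(i, j)$, $(i+1, j)$, $(i, j+1)$, $(i+1, j+1)$, each such square is split into two triangles along a fixed diagonal, and both triangles receive the pixel value as weight. The function name `image_to_weighted_complex` is hypothetical, and the released code [27] may triangulate differently.

```python
import numpy as np
from itertools import combinations


def image_to_weighted_complex(image):
    """Triangulate the nonzero pixels of a 2D array; return (coords, weights).

    weights maps sorted vertex-index tuples (vertices, edges, triangles) to weights.
    """
    image = np.asarray(image)
    vertex_index = {}  # corner coordinate -> vertex id; shared corners are deduplicated

    def vertex_id(corner):
        if corner not in vertex_index:
            vertex_index[corner] = len(vertex_index)
        return vertex_index[corner]

    weights = {}
    for i, j in zip(*np.nonzero(image)):
        i, j = int(i), int(j)
        a = vertex_id((i, j))
        b = vertex_id((i + 1, j))
        c = vertex_id((i, j + 1))
        d = vertex_id((i + 1, j + 1))
        for tri in ((a, b, d), (a, c, d)):  # split the pixel square along the a-d diagonal
            weights[tuple(sorted(tri))] = image[i, j].item()

    # Edges and vertices get the largest weight of a triangle containing them,
    # so the resulting weight function is admissible.
    for tri in [s for s in weights if len(s) == 3]:
        w = weights[tri]
        for k in (1, 2):
            for face in combinations(tri, k):
                weights[face] = max(weights.get(face, w), w)

    coords = np.zeros((len(vertex_index), 2))
    for corner, idx in vertex_index.items():
        coords[idx] = corner
    return coords, weights


coords, weights = image_to_weighted_complex([[5]])
print(len(coords), sum(len(s) == 2 for s in weights), sum(len(s) == 3 for s in weights))  # 4 5 2
```

For a single pixel of value 5, this yields 4 vertices, 5 edges and 2 triangles, all of weight 5. Neighboring pixels share corners and edges, and a shared edge takes the larger of the two pixel values.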

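For a simplicial complex $K$ and a function $\omega: K \to \mathbb{Z}$, we define the weighted Euler characteristic

$$\chi_\omega(K) = \sum_{k=0}^{\dim(K)} (-1)^k \sum_{\sigma \in K_k} \omega(\sigma).$$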

Remark 1.

If $\omega(\sigma) = 1$ for all $\sigma \in K$, then $\chi_\omega(K) = \chi(K)$. The weighted Euler characteristic is therefore a direct generalization of the classical version.
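The weighted Euler characteristic only replaces each simplex count by a sum of weights. The sketch below assumes weights are stored in a dictionary keyed by sorted vertex-index tuples (an illustrative layout) and checks Remark 1 on a small filled triangle.

```python
def weighted_euler_characteristic(weights):
    """chi_omega(K): sum over simplices of (-1)^dim(sigma) * omega(sigma).

    weights maps each simplex (a sorted vertex-index tuple; k + 1 entries
    for a k-simplex) to its weight.
    """
    return sum((-1) ** (len(s) - 1) * w for s, w in weights.items())


# A filled triangle with weights on every vertex, edge and face.
weights = {
    (0,): 3, (1,): 3, (2,): 2,
    (0, 1): 3, (0, 2): 2, (1, 2): 2,
    (0, 1, 2): 2,
}
print(weighted_euler_characteristic(weights))                  # 8 - 7 + 2 = 3
print(weighted_euler_characteristic({s: 1 for s in weights}))  # chi(K) = 1
```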

Remark 2.

The same definition essentially appears in [4]; the only difference is that [4] considers only cubical complexes built on finite axis-aligned lattices.

Remark 3.

A generalization of the weighted Euler characteristic is a classical object of study in algebraic geometry; see, e.g., [28].

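We are particularly interested in functions $\omega$ which satisfy the consistency condition

$$\omega(\tau) \ge \omega(\sigma) \quad \text{whenever } \tau \le \sigma.$$

Note that this condition is satisfied by the construction given in Algorithm 1, since each edge and vertex receives the largest weight of a face containing it. If a function satisfies this condition, we say that it is admissible. For functions of this type, the weighted Euler characteristic has a natural interpretation.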

Proposition 1.

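Suppose that $\omega: K \to \mathbb{Z}_{\ge 0}$ is an admissible function. Then, each superlevel set $K_{\omega \ge s} = \{\sigma \in K : \omega(\sigma) \ge s\}$ is a subcomplex of $K$. The weighted Euler characteristic is the sum of Euler characteristics of all superlevel complexes of $\omega$; that is,

$$\chi_\omega(K) = \sum_{s=1}^{\infty} \chi(K_{\omega \ge s}). \tag{3}$$

Proof.

We first show that the superlevel sets are subcomplexes of $K$. It suffices to show that for any $\sigma \in K_{\omega \ge s}$ and any face $\tau \le \sigma$, we have $\tau \in K_{\omega \ge s}$. This is easy to see from the definition of an admissible function, since $\tau \le \sigma$ implies $\omega(\tau) \ge \omega(\sigma) \ge s$, which implies $\tau \in K_{\omega \ge s}$. It remains to show that Equation (3) is true. In what follows, for a logical statement $P$, let $\mathbb{1}[P]$ denote the indicator function taking the value $1$ if $P$ is true, and $0$ if $P$ is false. Since $\omega(\sigma) = \sum_{s=1}^{\infty} \mathbb{1}[\omega(\sigma) \ge s]$ for every $\sigma \in K$, and only finitely many terms are nonzero because $K$ is finite,

$$\chi_\omega(K) = \sum_{\sigma \in K} (-1)^{\dim \sigma}\, \omega(\sigma) = \sum_{s=1}^{\infty} \sum_{\sigma \in K} (-1)^{\dim \sigma}\, \mathbb{1}[\omega(\sigma) \ge s] = \sum_{s=1}^{\infty} \chi(K_{\omega \ge s}). \qquad \blacksquare$$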
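Equation (3) is easy to check numerically. The sketch below assumes nonnegative integer weights stored per simplex (as in the statement of Proposition 1) and compares both sides on a small admissible filled triangle; all function names are illustrative. Note that the counting identity itself holds for any nonnegative integer weights; admissibility is what makes each $K_{\omega \ge s}$ a genuine subcomplex.

```python
from itertools import combinations


def is_admissible(weights):
    """Check omega(tau) >= omega(sigma) for every proper face tau of every sigma."""
    return all(
        weights[face] >= w
        for s, w in weights.items()
        for k in range(1, len(s))
        for face in combinations(s, k)
    )


def weighted_euler_characteristic(weights):
    return sum((-1) ** (len(s) - 1) * w for s, w in weights.items())


def superlevel_sum(weights):
    """Right-hand side of Equation (3): sum over s >= 1 of chi(K_{omega >= s})."""
    total = 0
    for level in range(1, max(weights.values()) + 1):
        total += sum((-1) ** (len(s) - 1) for s, w in weights.items() if w >= level)
    return total


weights = {
    (0,): 3, (1,): 3, (2,): 2,
    (0, 1): 3, (0, 2): 2, (1, 2): 2,
    (0, 1, 2): 2,
}
print(is_admissible(weights))                                           # True
print(weighted_euler_characteristic(weights), superlevel_sum(weights))  # 3 3
```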

2.4 Weighted Euler Curve Transform

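We now define the Weighted Euler Curve Transform (WECT) as a straightforward generalization of the ECT; the WECT is specifically designed to treat weighted simplicial complexes. Let $(K, \omega)$ be a weighted simplicial complex, and let $f: K \to \mathbb{R}$ be a filtration function. The weighted Euler curve associated to $f$ is the function $\chi_{\omega, f}: \mathbb{R} \to \mathbb{Z}$ defined as

$$\chi_{\omega, f}(t) = \chi_\omega(K_{f \le t}),$$

where $\omega$ is understood by context to be the restriction of $\omega$ to the subcomplex $K_{f \le t}$. We then define the WECT of a weighted simplicial complex $(K, \omega)$ with $K \subset \mathbb{R}^d$ as the function $\mathrm{WECT}(K, \omega): S^{d-1} \times \mathbb{R} \to \mathbb{Z}$ defined as

$$\mathrm{WECT}(K, \omega)(v, t) = \chi_{\omega, f_v}(t),$$

with the projection function $f_v$ as defined in Equations (1) and (2). Clearly, if the weight function $\omega$ is constant and equal to one, then $\mathrm{WECT}(K, \omega) = \mathrm{ECT}(K)$.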

As in the case of the ECT, a WECT is represented in practice by sampling a finite number of directions on the sphere $S^{d-1}$. An example of a WECT is shown in Figure 3. As in [14], when analyzing WECTs, we often preprocess them by applying a smoothing operator to improve robustness. Unlike [14], we do not specify a particular smoothing operation, and instead treat the choice of method as a hyperparameter in the data analysis pipeline.
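Mirroring the ECT computation, a sampled WECT in $\mathbb{R}^2$ only changes the contribution of each simplex from $(-1)^{\dim\sigma}$ to $(-1)^{\dim\sigma}\,\omega(\sigma)$. The sketch below assumes evenly spaced directions on $S^1$ and a fixed threshold grid, and omits smoothing; the $25 \times 50$ grid in the usage line echoes the parameters reported in Section 3, but the function name `wect_matrix` and the data layout are our own.

```python
import numpy as np


def wect_matrix(coords, weights, num_directions, thresholds):
    """Sampled WECT in R^2: entry (i, j) is chi_omega(K_{f_v <= t_j}) for direction v_i.

    coords: (num_vertices, 2) array; weights: dict simplex-tuple -> weight.
    """
    angles = 2 * np.pi * np.arange(num_directions) / num_directions
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    simplices = list(weights)
    signed = np.array([(-1) ** (len(s) - 1) * weights[s] for s in simplices], dtype=float)
    out = np.zeros((num_directions, len(thresholds)))
    for i, v in enumerate(directions):
        heights = coords @ v                                        # Equation (1)
        f = np.array([heights[list(s)].max() for s in simplices])   # Equation (2)
        for j, t in enumerate(thresholds):
            out[i, j] = signed[f <= t].sum()
    return out


coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
weights = {(0,): 3, (1,): 3, (2,): 2, (0, 1): 3, (0, 2): 2, (1, 2): 2, (0, 1, 2): 2}
W = wect_matrix(coords, weights, num_directions=25, thresholds=np.linspace(-1.5, 1.5, 50))
print(W.shape)  # (25, 50)
```

Each row starts at $0$ (below every vertex) and ends at $\chi_\omega(K) = 3$ once the whole complex has entered the filtration; with all weights equal to one, the same function reproduces the sampled ECT.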

Figure 3: The WECT for a weighted simplicial complex constructed from an MNIST digit. Each panel shows a single weighted Euler curve; the red curve on the left corresponds to filtering by projection onto one direction vector, and the other curves are constructed similarly by projection onto other directions.

2.5 Distance Between WECTs

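The WECT of a weighted simplicial complex $(K, \omega)$ in $\mathbb{R}^d$ is naturally viewed as a family of integer-valued functions $\chi_{\omega, f_v}: \mathbb{R} \to \mathbb{Z}$, parameterized by $v \in S^{d-1}$. Since $K$ is assumed to be compact, each function $\chi_{\omega, f_v}$ is constant outside of a compact interval: with $R = \max_{x \in K} \|x\|$, it vanishes for $t < -R$ and equals $\chi_\omega(K)$ for $t \ge R$. We may therefore restrict each function to the common compact domain $[-R, R]$; moreover, given a dataset of weighted simplicial complexes, one may assume without loss of generality that all WECT functions are defined on the same compact domain. After applying a smoothing operator, the smoothed WECT is likewise identified with a parameterized family of functions of higher regularity on this compact domain. Any metric $\delta$ on such functional data gives rise to a metric on WECT data, for example by integrating the function

$$v \mapsto \delta\big(\mathrm{WECT}(K, \omega)(v, \cdot), \mathrm{WECT}(K', \omega')(v, \cdot)\big)^2$$

over $S^{d-1}$ with respect to its standard volume form and taking the square root.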

The most convenient metric on such functions is the one induced by the standard $L^2$ norm (with respect to Lebesgue measure), denoted $\|\cdot\|_{L^2}$. We abuse notation slightly and denote the induced metric on the space of WECTs also using norm notation as follows:

$$\big\|\mathrm{WECT}(K, \omega) - \mathrm{WECT}(K', \omega')\big\|_{L^2} = \left( \int_{S^{d-1}} \big\|\mathrm{WECT}(K, \omega)(v, \cdot) - \mathrm{WECT}(K', \omega')(v, \cdot)\big\|_{L^2}^2 \, dv \right)^{1/2}. \tag{4}$$

This notation is in fact warranted, since this metric coincides with the one induced by the $L^2$ norm on $L^2(S^{d-1} \times I)$, where $I$ is a compact interval, with respect to the product of the standard measure on $S^{d-1}$ with Lebesgue measure on $I$. With this metric, the space of WECTs has a Euclidean structure, meaning that WECTs are amenable to methods from functional data analysis and machine learning.

Computationally, a WECT is represented by a finite number of samples. Taking $m$ samples from $S^{d-1}$ and $n$ samples from $I$, the values of the WECT can be arranged in an $m \times n$ matrix. The distance in Equation (4) can then be approximated, up to a constant factor determined by the sampling grid, by the Frobenius norm of the difference of two such matrices, which makes comparing WECTs numerically efficient.
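A minimal sketch of this approximation as a Riemann sum, assuming each of the $m$ direction samples represents an equal share of the measure of $S^{d-1}$ (total $2\pi$ for $S^1$) and each of the $n$ threshold samples an equal share of $I$. The function name and parameters are illustrative assumptions.

```python
import numpy as np


def wect_distance(A, B, interval_length, sphere_measure=2 * np.pi):
    """Riemann-sum approximation of Equation (4) for two sampled WECTs.

    A, B: (m, n) arrays sampled on m equal-measure directions of S^{d-1}
    (total measure sphere_measure; 2*pi for S^1) and n evenly spaced points of I.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    m, n = A.shape
    cell_area = (sphere_measure / m) * (interval_length / n)
    return np.sqrt(cell_area) * np.linalg.norm(A - B, ord="fro")
```

Because the scale factor is the same for every pair sampled on the same grid, it does not change nearest neighbors or clustering results; the plain Frobenius norm can be used directly for those purposes.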

2.6 Injectivity of the WECT

Inverse problems in topological data analysis have recently become an active topic of research [36]. The basic general question is: Is it possible for inequivalent spaces to be mapped to the same topological summary statistic? This question has recently been tackled for various flavors of topological signatures [22, 35, 17, 12] including Persistent Homology and Euler Curve Transforms [41, 24, 20, 16, 4, 14, 21].

The original paper on the ECT [41] demonstrated a uniqueness result for ECT representations of compact embedded simplicial complexes with an algorithmic proof. This perspective has been pushed further to provide a sufficient number of direction samples to guarantee injectivity [16]. It is shown in [4] that for weighted cubical complexes defined on a regular axis-aligned lattice in $\mathbb{R}^d$, only generic samples are sufficient, and an explicit reconstruction algorithm is provided. Algorithm 1 produces simplicial complexes which are essentially equivalent to the cubical complexes of [4], so the reconstruction results there can be ported over directly to weighted simplicial complexes constructed via Algorithm 1.

In anticipation of studying non-axis-aligned weighted simplicial complexes through the WECT, one might hope for a more general injectivity result. An alternative approach to the injectivity question for ECTs is given in [24, 16]. In these articles, the theory of Euler integral calculus is employed to prove injectivity. This approach is more theoretical and yields a less explicit inversion algorithm, but it applies much more generally. In particular, one has the following general result.

Theorem 1 (Theorem 1, [24]).

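The map $\mathcal{E}: CF(\mathbb{R}^d) \to CF(S^{d-1} \times \mathbb{R})$ defined by

$$\mathcal{E}(\varphi)(v, t) = \int_{\{x \in \mathbb{R}^d \,:\, \langle x, v \rangle \le t\}} \varphi \, d\chi \tag{5}$$

is injective.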

We use $CF(\mathbb{R}^d)$ to denote the space of constructible functions on $\mathbb{R}^d$; these are integer-valued functions whose level sets satisfy a certain tameness condition, defined nowadays in the technical language of o-minimal set theory [2, 15, 24]. The set $CF(S^{d-1} \times \mathbb{R})$ is defined similarly. We restrict to compactly supported constructible functions $\varphi$. This space contains, in particular, the admissible weight functions defined on embedded simplicial complexes in $\mathbb{R}^d$ (a weight function $\omega$ on $K$ is viewed as the function taking the value $\omega(\sigma)$ on the relative interior of each simplex $\sigma$ and $0$ off $K$). The right side of Equation (5) is defined in terms of Euler integration. Roughly, one treats the Euler characteristic formally as a measure, allowing for integration of sufficiently well-behaved functions. The transform $\mathcal{E}$ can be understood as a topological version of the classical Radon transform used in tomography applications [25]. Theorem 1 is proved by appealing to a general result of Schapira on inverting topological Radon transforms of this type [40]. The authors of [24] observe that if $\varphi = \mathbb{1}_K$ is the indicator function of an embedded simplicial complex $K$, then $\mathcal{E}(\mathbb{1}_K)$ is exactly the ECT of $K$, whence the ECT is injective [24, Corollary 1]. On the other hand, if we consider functions $\varphi$ which are admissible weight functions on embedded simplicial complexes, we obtain the following result as an immediate corollary.

Theorem 2.

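The Weighted Euler Curve Transform is injective on the space of weighted simplicial complexes with admissible weight functions. That is, if $(K, \omega)$ and $(K', \omega')$ are such weighted simplicial complexes in $\mathbb{R}^d$ with $\mathrm{WECT}(K, \omega) = \mathrm{WECT}(K', \omega')$, then $(K, \omega) = (K', \omega')$.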

2.7 Comparison to Other Methods

The WECT provides a topological signature which simultaneously incorporates shape data and non-geometric weight data. In the case of image data, by discretely sampling the domain $S^{d-1} \times \mathbb{R}$ one obtains a discrete signature with a memory footprint similar to that of the original image. However, we show experimentally in Section 3 that the WECT representation is more effective than the raw image at distinguishing shape features. In this subsection, we compare the WECT representation to other shape descriptors appearing in the topological data analysis literature.

Persistent Homology. The WECT representation has several benefits over the commonly used persistence diagram signature. Foremost, it is a nontrivial task to simultaneously incorporate geometric and non-geometric features into a persistence diagram. One approach is to use a multiparameter filtration of the dataset [23, 9]. The major drawback of such an approach is that multiparameter persistent homology does not in general admit a convenient analogue of the persistence diagram statistics used in classical persistent homology. An alternative approach to incorporating geometric and non-geometric features into persistent cohomology was recently proposed in [8], where an enriched barcode representation is obtained through least squares optimization of persistent cohomology cycle representatives.

The simple WECT representation for weighted simplicial complex data also has the benefit of immediately providing a vectorized topological signature. This allows straightforward usage of WECT summaries as covariates in statistical models; this was the main idea of [14], where ECT summaries were used as covariates in a Gaussian process regression for predicting survival times of subjects with Glioblastoma Multiforme brain tumors. This is in stark contrast to analysis using persistence diagrams or barcodes from persistent homology. Indeed, a persistence diagram is an unstructured point cloud in $\mathbb{R}^2$, and care must be taken to vectorize this signature in order to incorporate it into statistical models. There are several existing vectorization methods in the literature, including persistence landscapes [7] and persistence images [1], as well as more straightforward feature aggregation [3]. Any vectorization of the persistence diagram space necessarily distorts its natural latent geometry, since the canonical metric on persistence diagrams, the bottleneck distance, is non-Euclidean [6].

Variants of the ECT. When studying simplicial complexes arising from grayscale image data, one could imagine other relevant simplicial complexes to which one could apply the standard ECT. Examples include thresholding pixel values in the image and building restricted two-dimensional complexes or using the pixel values to build a three-dimensional simplicial complex. We found these approaches to give unsatisfactory performance on our tumor dataset, although they may be viable approaches for other applications.

3 Applications

3.1 Classification of MNIST Digit Images

To understand the descriptive power of the WECT representation of image data, we first explore its ability to classify images from the ubiquitous MNIST handwritten digit dataset [30]. We use a small subset of 1000 grayscale images, evenly distributed over the 10 digits $0, 1, \ldots, 9$. As a baseline, we treat each $28 \times 28$ image as a vector in $\mathbb{R}^{784}$ and classify the images using Support Vector Machines (SVM) with a linear kernel. Next, we produce WECT representations of all digit images. In this experiment, we discretize $S^1 \times I$ (for a compact interval $I$) into a $25 \times 50$ grid (i.e., 25 Euler curve directions and 50 points along each curve domain). We also smooth the Euler curves with a Gaussian kernel to improve robustness. These particular parameters, including the kernel window size, were chosen in a tuning step, but we found that the results are generally insensitive to the parameter choice. We then considered each WECT representation as a vector in $\mathbb{R}^{1250}$ and classified using SVM with a linear kernel. We also produced smoothed ECT representations with similar parameters and ran an SVM classification. The ten-fold cross-validated classification rates from these experiments are displayed in Table 1.
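A minimal sketch of the smoothing and vectorization steps, assuming each row of a sampled WECT (or ECT) matrix is one Euler curve. The Gaussian width `sigma` and the edge padding at the ends of the curve domain are our own assumptions; the text does not specify how the reported window size maps to a kernel width, and the helper names are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma, radius=None):
    """Normalized, symmetric 1D Gaussian kernel truncated at about 3 sigma."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_curves(M, sigma):
    """Smooth each row (one Euler curve) of M; output has the same shape as M."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    # Euler curves are constant beyond the ends of the sampled interval,
    # so pad each row by repeating its end values before convolving.
    padded = np.pad(np.asarray(M, dtype=float), ((0, 0), (r, r)), mode="edge")
    return np.stack([np.convolve(row, kernel, mode="valid") for row in padded])


M = np.zeros((25, 50))
M[:, 25:] = 1.0  # toy step-shaped Euler curves
features = smooth_curves(M, sigma=2.0).ravel()
print(features.shape)  # (1250,)
```

The flattened vectors can then be passed to any linear classifier, such as a linear-kernel SVM, as in the experiment described above.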

Representation    Classification rate
Image             87.84 ± 1.42 %
ECT               89.88 ± 1.66 %
WECT              94.68 ± 1.57 %

Table 1: Ten-fold cross-validated SVM classification rates of the vectorized image, ECT and WECT representations of the MNIST digit data.

The classification results show that the WECT representation of the digit images is adept at encoding and distinguishing shape features, while having a similar memory footprint to the original image representation. It also outperforms the classification using smoothed ECT representations. We stress that this classification result is, of course, not meant to be competitive with those obtained by deep learning methods. Rather, this simple experiment suggests that the WECT representation produces an interesting shape summary for this type of image data, which is computationally efficient and can be trivially incorporated into various statistical models.

To get a more detailed qualitative picture of the differences between the raw image, ECT and WECT representations of the MNIST image data, we also computed t-SNE embeddings [31] for each representation; see Figure 4. While class separation is apparent in all three embeddings, it is immediately evident that the embeddings of the ECTs and WECTs are much more distinctly clustered. On the other hand, one can easily see how classification errors arise in the ECT embedding. We believe that these errors occur because the ECT is more sensitive to topological differences between digits, while the WECT smooths these differences using weight data.

Figure 4: t-SNE embeddings of the MNIST image dataset. Left: Raw image vectors. Middle: Smoothed ECTs. Right: Smoothed WECTs.

3.2 Rigid and Scale Registration

Figure 5: Top: MNIST digit images randomly rotated and translated. Bottom: The same digits after rigid registration to a template digit via the process described in Section 3.2.

One benefit of the simplicial complex representation of image data is that registering over scale and rigid transformations (translations and rotations) becomes trivial. Once a pair of images has been converted to weighted simplicial complexes $(K, \omega)$ and $(K', \omega')$, they can be immediately registered with respect to translation and scaling by centering each complex at the origin and normalizing its scale (treating vertex locations as vectors). To register over rotations, one then computes the Weighted Euler Curve Transforms and solves the optimization problem

$$\min_{\rho \in SO(d)} \big\| \mathrm{WECT}(K, \omega) - \rho \cdot \mathrm{WECT}(K', \omega') \big\|_{L^2}, \tag{6}$$

where the rotation group $SO(d)$ acts on a WECT by precomposition in the $S^{d-1}$-coordinate. As noted above, the $L^2$ distance is numerically trivial to compute for finite approximations of WECTs. Thus, for planar images ($d = 2$) with directions sampled evenly on $S^1$, the optimization problem in Equation (6) can be solved quickly by an exhaustive search over cyclic permutations of the rows of the WECT matrix. The minimizing rotation can then be used to register $(K', \omega')$ to $(K, \omega)$ with respect to rotations; see Figure 5.
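Equation (6) reduces to a few lines for planar images. The sketch below assumes that both WECT matrices were sampled on the same $m$ evenly spaced directions of $S^1$ (rows) and the same threshold grid (columns), so that a rotation by $2\pi s/m$ corresponds to a cyclic shift of $s$ rows; whether a given shift is a clockwise or counterclockwise rotation depends on how directions are indexed. Centering at the vertex mean and scaling to unit radius in `normalize_vertices` are our own illustrative choices, and both helper names are hypothetical.

```python
import numpy as np


def normalize_vertices(coords):
    """Center vertex coordinates at their mean and scale the farthest vertex to radius 1."""
    centered = coords - coords.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=1).max()


def register_rotation(A, B):
    """Exhaustive search over cyclic row shifts of B (rows = evenly spaced directions on S^1).

    Returns (shift, angle, error) minimizing ||A - roll(B, shift)||_F.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    m = A.shape[0]
    errors = [np.linalg.norm(A - np.roll(B, s, axis=0)) for s in range(m)]
    best = int(np.argmin(errors))
    return best, 2 * np.pi * best / m, errors[best]


rng = np.random.default_rng(0)
A = rng.random((8, 5))
B = np.roll(A, -3, axis=0)         # A with its directions cyclically relabeled
print(register_rotation(A, B)[0])  # 3
```

The search costs $m$ Frobenius norms per pair, which is why exhaustive search over all sampled rotations is affordable here.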

3.3 Analysis of GBM Tumor Data

Glioblastoma Multiforme (GBM) is the most common malignant brain tumor in adults [26]; for most patients, the prognosis is very poor: only a small fraction of individuals survive longer than five years, and the median survival time is approximately 12 months [42, 34, 33]. GBM is a morphologically heterogeneous disease. GBM tumors exhibit complex structure in terms of both their overall shape and their internal makeup. Often, dead cells are present inside the tumor, and blood flow is increased near the tumor boundary [32]. These features result in varied pixel value patterns in GBM tumor images. Thus, characterizing both the shape and the texture of GBM tumors from medical imaging data is important for disease prognosis as well as survival prediction. While previous studies have analyzed these two features separately [5, 39], our approach is to analyze them jointly under a unified representation.

In this study, we use T1-weighted post-contrast magnetic resonance images (MRIs) of GBM tumors from 63 subjects. For our analysis, we select the single axial slice with the largest tumor area from each 3D image (the same approach was taken in [5, 39]), and summarize the tumors’ shapes and textures via the WECT. For details on the image pre-processing steps that were used prior to our analysis, see [39].

We use a simple distance-based clustering approach to analyze the tumor data. First, each of the 63 tumor images is converted into a weighted simplicial complex using Algorithm 1. To isolate the shape and weight information, all simplicial complexes are centered at the origin and scaled so that the vertex farthest from the origin lies at a common fixed distance. The weights of the simplicial complexes are then normalized to have maximum weight one; this accounts for the varying pixel value distributions of the MRIs across subjects. Next, each weighted simplicial complex is given a smoothed WECT representation. Specifically, for each tumor image, we use 25 directions and 50 points along the domain of the Euler curve for each direction. The Euler curves were smoothed using a Gaussian kernel with a smoothing window of ten. Next, the distance between each pair of smoothed WECT representations was computed, with registration of the tumor images over rotations (see Section 3.2). We applied hierarchical clustering with Ward linkage to the resulting distance matrix, which first suggested three natural clusters. The clusterwise mean and median survival times (in months) are reported in Table 2.

           Cluster 1   Cluster 2   Cluster 3
Mean          6.7        12.9        20.2
Median        6.2         9.6        15.2

Table 2: Clusterwise mean and median survival times (months).
Figure 6: Weighted simplicial complex representations of tumors from the low survival time cluster in Table 2.

These statistics suggest that the clusters are roughly characterized as low, medium and high survival. Figure 6 shows tumors from the low survival cluster; they are visibly irregular in shape and intensity distribution, which is consistent with their separation as a distinct cluster. To explore the data in more depth, we consider the clustering dendrogram with this cluster of tumors removed. Figure 7 shows this dendrogram on the remaining 58 tumors, with six highlighted clusters; mean and median survival times for patients in these clusters are shown in Table 3. Inspecting the tumors in these clusters, one can observe various common qualitative shape and intensity features. For example, the tumors in the blue and cyan clusters both tend to have intensity patterns with a ring-like structure near the boundary, but the tumors in the blue cluster tend to be more irregular in shape and/or intensity pattern than those in the cyan cluster; see Figure 8.

Figure 7: Clustering dendrogram for the tumor dataset with low survival cluster tumors removed.
          Blue   Cyan   Red    Magenta   Yellow   Green
Mean      18.1   28.0   17.9   19.4      5.0      12.6
Median    14.9   22.3   14.3   20.4      4.5      10.7

Table 3: Clusterwise mean and median survival times (months) for the clusters of Figure 7.
Figure 8: Samples of weighted simplicial complex representations of tumors from cyan (top) and blue (bottom) clusters of Figure 7.

4 Future Work

Our work suggests several directions for future research. Motivated by the qualitative distance-based clustering results presented here, we next plan to incorporate WECT representations into more sophisticated statistical models for tumor survival prediction. The WECT representation is flexible in the sense that it provides a summary of any weighted simplicial complex, and we plan to apply this type of analysis to other shape data, such as weighted simplicial complexes representing annotated molecule shapes. On the theoretical side, several interesting questions remain open. Principally, one could attempt to strengthen Theorem 2 on injectivity of the WECT in several ways. In its current form it is mainly a theoretical result, and an implementation of an inversion algorithm would be desirable. A practical version of such a construction would require weighted Euler curve measurements in only finitely many directions, along the lines of the results in [16] for the ECT. It would also be interesting to have a quantitative version of the injectivity theorem: if the WECTs of two weighted simplicial complexes are close, does this imply that the weighted complexes themselves are close in some reasonable metric, such as a Wasserstein distance (treating a normalization of each weight function as a probability measure supported on its underlying complex)?

Acknowledgments: We thank Arvind Rao for sharing the GBM dataset. SK was partially supported by NSF DMS-1613054, NSF CCF-1740761, NSF CCF-1839252 and NIH R37-CA214955.

References

  • [1] H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, S. Chepushtanova, E. Hanson, F. Motta, and L. Ziegelmeier (2017) Persistence images: a stable vector representation of persistent homology. The Journal of Machine Learning Research 18 (1), pp. 218–252.
  • [2] Y. Baryshnikov, R. Ghrist, and D. Lipsky (2011) Inversion of Euler integral transforms with applications to sensor data. Inverse Problems 27 (12), pp. 124001.
  • [3] P. Bendich, J. S. Marron, E. Miller, A. Pieloch, and S. Skwerer (2016) Persistent homology analysis of brain artery trees. The Annals of Applied Statistics 10 (1), pp. 198.
  • [4] L. M. Betthauser (2018) Topological reconstruction of grayscale images. Ph.D. Thesis, University of Florida.
  • [5] K. Bharath, S. Kurtek, A. Rao, and V. Baladandayuthapani (2018) Radiologic image-based statistical shape analysis of brain tumours. Journal of the Royal Statistical Society, Series C 67 (5), pp. 1357–1378.
  • [6] P. Bubenik and A. Wagner (2019) Embeddings of persistence diagrams into Hilbert spaces. arXiv preprint arXiv:1905.05604.
  • [7] P. Bubenik (2015) Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research 16 (1), pp. 77–102.
  • [8] Z. Cang and G. Wei (2018) Persistent cohomology for data with multicomponent heterogeneous information. arXiv preprint arXiv:1807.11120.
  • [9] G. Carlsson and A. Zomorodian (2009) The theory of multidimensional persistence. Discrete & Computational Geometry 42 (1), pp. 71–93.
  • [10] G. Carlsson (2014) Topological pattern recognition for point cloud data. Acta Numerica 23, pp. 289–368.
  • [11] C. J. Carstens and K. J. Horadam (2013) Persistent homology of collaboration networks. Mathematical Problems in Engineering 2013.
  • [12] M. J. Catanzaro, J. Curry, B. T. Fasy, J. Lazovskis, G. Malen, H. Riess, B. Wang, and M. Zabka (2019) Moduli spaces of Morse functions for persistence. arXiv preprint arXiv:1909.10623.
  • [13] M. K. Chung, P. Bubenik, and P. T. Kim (2009) Persistence diagrams of cortical surface data. In International Conference on Information Processing in Medical Imaging, pp. 386–397.
  • [14] L. Crawford, A. Monod, A. X. Chen, S. Mukherjee, and R. Rabadán (2019) Predicting clinical outcomes in glioblastoma: an application of topological and functional data analysis. Journal of the American Statistical Association, pp. 1–12.
  • [15] J. Curry, R. Ghrist, and M. Robinson (2012) Euler calculus with applications to signals and sensing. In Proceedings of Symposia in Applied Mathematics, Vol. 70, pp. 75–146.
  • [16] J. Curry, S. Mukherjee, and K. Turner (2018) How many directions determine a shape and other sufficiency results for two topological transforms. arXiv preprint arXiv:1805.09782.
  • [17] J. Curry (2018) The fiber of the persistence map for functions on the interval. Journal of Applied and Computational Topology 2 (3-4), pp. 301–321.
  • [18] R. J. M. Dawson (1990) Homology of weighted simplicial complexes. Cahiers de Topologie et Géométrie Différentielle Catégoriques 31 (3), pp. 229–243.
  • [19] H. Edelsbrunner and J. Harer (2010) Computational topology: an introduction. American Mathematical Society.
  • [20] B. T. Fasy, S. Micka, D. L. Millman, A. Schenfisch, and L. Williams (2018) Challenges in reconstructing shapes from Euler characteristic curves. arXiv preprint arXiv:1811.11337.
  • [21] B. T. Fasy, S. Micka, D. L. Millman, A. Schenfisch, and L. Williams (2019) Persistence diagrams for efficient simplicial complex reconstruction. arXiv preprint arXiv:1912.12759.
  • [22] P. Frosini and C. Landi (2011) Uniqueness of models in persistent homology: the case of curves. Inverse Problems 27 (12), pp. 124005.
  • [23] P. Frosini, M. Mulazzani, et al. (1999) Size homotopy groups for computation of natural size distances. Bulletin of the Belgian Mathematical Society-Simon Stevin 6 (3), pp. 455–464.
  • [24] R. Ghrist, R. Levanger, and H. Mai (2018) Persistent homology and Euler integral transforms. Journal of Applied and Computational Topology 2 (1-2), pp. 55–60.
  • [25] S. Helgason (1980) The Radon transform. Vol. 2, Springer.
  • [26] E. C. Holland (2000) Glioblastoma multiforme: the terminator. In Proceedings of the National Academy of Sciences, Vol. 97, pp. 6242–6244.
  • [27] Q. Jiang, S. Kurtek, and T. Needham. Weighted Euler curve transform GitHub repository. https://github.com/trneedham/Weighted-Euler-Curve-Transform
  • [28] M. Kashiwara (1985) Index theorem for constructible sheaves. Astérisque 130, pp. 193–209.
  • [29] V. Kovacev-Nikolic, P. Bubenik, D. Nikolić, and G. Heo (2016) Using persistent homology and dynamical distances to analyze protein binding. Statistical Applications in Genetics and Molecular Biology 15 (1), pp. 19–38.
  • [30] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
  • [31] L. v. d. Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605.
  • [32] A. Marusyk, V. Almendro, and K. Polyak (2012) Intra-tumour heterogeneity: a looking glass for cancer? Nature Reviews Cancer 12 (5), pp. 323–334.
  • [33] R. McLendon, A. Friedman, D. Bigner, E. G. Van Meir, D. J. Brat, G. M. Mastrogianakis, J. J. Olson, T. Mikkelsen, N. Lehman, K. Aldape, et al. (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455 (7216), pp. 1061–1068.
  • [34] M. G. McNamara, S. Sahebjam, and W. P. Mason (2013) Emerging biomarkers in glioblastoma. Cancers 5 (3), pp. 1103–1119.
  • [35] S. Oudot and E. Solomon (2017) Barcode embeddings for metric graphs. arXiv preprint arXiv:1712.03630.
  • [36] S. Oudot and E. Solomon (2019) Inverse problems in topological persistence: a survey. In Abel Symposia.
  • [37] V. Patrangenaru, P. Bubenik, R. L. Paige, and D. Osborne (2018) Challenges in topological object data analysis. Sankhya A, pp. 1–28.
  • [38] S. Ren, C. Wu, J. Wu, et al. (2018) Weighted persistent homology. Rocky Mountain Journal of Mathematics 48 (8), pp. 2661–2687.
  • [39] A. Saha, S. Banerjee, S. Kurtek, S. Narang, J. Lee, G. Rao, J. Martinez, K. Bharath, A.U.K. Rao, and V. Baladandayuthapani (2016) DEMARCATE: density-based magnetic resonance image clustering for assessing tumor heterogeneity in cancer. NeuroImage: Clinical 12, pp. 132–143.
  • [40] P. Schapira (1995) Tomography of constructible functions. In International Symposium on Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes, pp. 427–435.
  • [41] K. Turner, S. Mukherjee, and D. M. Boyer (2014) Persistent homology transform for modeling shapes and surfaces. Information and Inference: A Journal of the IMA 3 (4), pp. 310–344.
  • [42] B. Tutt (2011) Glioblastoma cure remains elusive despite treatment advances. OncoLog 56 (3), pp. 1–8.