Jaccard Filtration and Stable Paths in the Mapper

06/19/2019 ∙ by Dustin L. Arendt, et al. ∙ Washington State University, PNNL

The contributions of this paper are two-fold. We define a new filtration called the cover filtration built from a single cover based on a generalized Jaccard distance. We provide stability results for the cover filtration and show how the construction is equivalent to the Cech filtration under certain settings. We then develop a language and theory for stable paths within this filtration, inspired by ideas of persistent homology. We demonstrate how the filtration and paths can be applied to a variety of applications in which defining a metric is not obvious but a cover is readily available. We demonstrate the usefulness of this construction by employing it in the context of recommendation systems and explainable machine learning. We demonstrate a new perspective for modeling recommendation system data sets that does not require manufacturing a bespoke metric. This extends work on graph-based recommendation systems, allowing a topological perspective. For an explicit example, we look at a movies data set and we find the stable paths identified in our framework represent a sequence of movies constituting a gentle transition and ordering from one genre to another. For explainable machine learning, we apply the Mapper for model induction, providing explanations in the form of paths between subpopulations or observations. Our framework provides an alternative way of building a filtration from a single mapper that is then used to explore stable paths. As a direct illustration, we build a mapper from a supervised machine learning model trained on the FashionMNIST data set. We show that the stable paths in the cover filtration provide improved explanations of relationships between subpopulations of images.

1 Introduction and Motivation

The need to rigorously seed a solution with a notion of stability in topological data analysis (TDA) has been addressed primarily using topological persistence [5, 15]. Persistence arises when we work with a sequence of objects built on a data set, a filtration, rather than with a single object. One line of focus of this work has been on estimating the homology of the data set. This typically manifests itself as examining the persistent homology represented as a diagram or barcode, with interpretations of zeroth and first homology as capturing significant clusters and holes, respectively [1, 13, 12, 36]. In practice it is not always clear how to interpret higher dimensional homology (even holes might not make obvious sense in certain cases). A growing focus is to use persistence diagrams as a form of feature engineering to help compare different data sets rather than interpret individual homology groups [2, 9, 33].

The implicit assumption in most such TDA applications is that the data is endowed with a natural metric, e.g., points exist in a high-dimensional space or pairwise distances are available. In certain applications, however, it is not clear how one could assign a meaningful metric. For example, memberships of people in groups of interest are captured simply as sets specifying who belongs in each group. An instance of such data is that of recommendation systems, e.g., as used in Netflix to recommend movies to the customer. Graph-based recommendation systems have been an area of recent research. Usually these systems are modeled as a bipartite graph with one set of nodes representing recommendees and the other representing recommendations. In practice, these systems are augmented in bespoke ways to accommodate whichever type of data is available. It is highly desirable to analyze the structure without prescribing some ill-fit or incomplete metric to the data.

Another TDA approach for structure discovery and visualization of high-dimensional data is based on a construction called Mapper [31]. Defined as the nerve of a refined pullback cover of the data, Mapper has found increasing use in diverse applications in the past several years [23]. Attention has recently focused on interpreting parts of the 1-skeleton of the Mapper, which is a simplicial complex, as significant features of the data. Paths, flares, and cycles have been investigated in this context [20, 25, 32]. The framework of persistence has been applied to this construction to define a multi-scale Mapper, which permits one to derive results on stability of such features [11]. At the same time, the associated computational framework remains unwieldy and most applications still base their interpretations on a single Mapper object.

Note that the Mapper construction works with covers. The default approach is to start with overlapping hypercubes that cover a parameter space, which is usually a subset of $\mathbb{R}^d$ for some dimension $d$, and consider the pullback of this cover to the space of data. In recommendation systems, the cover is just a collection of abstract sets providing membership information. Could we define a topological construction on such abstract covers that still reveals the topology of the data set?

We could study paths in this construction, but as the topological constructions are noisy, we would want to define a notion of stability for such paths. With this goal in mind, could we define a filtration from the abstract cover? But unlike in the setting of, e.g., multiscale mapper [11], we do not have a sequence of covers (called a tower of covers)—we want to work with a single cover. How do we define a filtration on a single abstract cover? Could we prove stability results for such a filtration? Finally, could we demonstrate the usefulness of our construction on real data?

1.1 Our Contributions

We introduce a new type of filtration defined on a single abstract cover. Termed cover filtration, our construction uses Jaccard distances between elements of the cover. We generalize the Jaccard distance between two elements to those of multiple elements in the cover, and define a filtration on a single cover using the generalized Jaccard distance as the filtration index. Working with a bottleneck distance on covers, we show a stability result on the cover filtration: the cover filtrations of two covers are interleaved, with the interleaving controlled by a bound on the bottleneck distance between the covers and by the cardinality of the smallest element in either cover (see Theorem 3.5). We conjecture that in Euclidean space, the cover filtration is isomorphic to the standard Čech filtration built on the data set. We prove that the conjecture holds in dimension 1 and, independently, that the Vietoris-Rips filtration completely determines the cover filtration in arbitrary dimensions.

This filtration is quite general, and enables the computation of persistent homology for data sets without requiring strong assumptions or defining ill-fit metrics. With real life applications in mind, we study paths in our construction. Paths provide intuitive explanations of the relationships between the objects that the terminal vertices represent. Our perspective on path analysis is that the shortest path may not be the most descriptive one; see Figure 1 for an illustration. Instead, we define a notion of stability of paths in the cover filtration. Under this notion, a stable path is analogous to a highly persistent feature as identified by persistent homology.

Figure 1: A cover and the corresponding nerve (left column). The cyan and green vertices are connected by a single edge. But this edge is generated by a single point in the intersection of the cyan and green cover elements. Removing this point from the data set gives the cover and nerve shown in the right column. The path from the cyan to the green node now has six edges.

We demonstrate the utility of stable paths in cover filtrations on two real life applications: a movie recommendation system and Mapper. We first show how recommendation systems can be modeled using the cover filtration, and then show how stable paths within this filtration suggest a sequence of movies that represent a “smooth” transition from one genre to another (Section 6.1). We then define an extension of the traditional Mapper [31] termed the Jaccard Mapper Filtration, and show how stable paths within this filtration can provide valuable explanations of populations in the Mapper, focusing on the case of explainable machine learning (Section 6.2).

1.2 Related Work

Cavanna and Sheehy [8] developed theory for a cover filtration, built from a cover of a filtered simplicial complex. But we work from more general covers of arbitrary spaces.

Our work is inspired by similar goals as those of Dey et al. [11] and Carrière et al. [6], who addressed the question of stability in the Mapper construction. Our goal is to provide some consistency, and thus interpretability, to the Mapper. We incorporate ideas of persistence in a different manner into our construction using a single cover, which considerably reduces the effort of generating results.

The multi-scale Mapper defined by Dey et al. [11] builds a filtration on the Mapper by varying the parameters of a cover. This construction yields nice stability properties, but is unwieldy in practice and difficult to interpret. Carrière et al. develop ways based on extended persistence to automatically select a hypercube cover that best captures the topology of the data [6]. This approach constructs one final Mapper that is easy to interpret, but is restricted to the use of hypercube covers, which is just one option of myriad potential covering schemes.

Krishnamoorthy and coworkers have developed methods for tracking populations within the Mapper by identifying interesting paths [18] and interesting flares [19]. Interesting paths maximize an interestingness score, and are manifested in the Mapper as long paths that track particular populations that show trending behavior. Flares capture subpopulations that diverge, i.e., show branching behavior. In our context, we are interested in shorter paths, under the assumption that they provide the most succinct explanations for relationships between subpopulations.

Our work is similar to that of Parthasarathy et al. [27] in that they use the Jaccard Index of an observed graph to estimate the geodesic distance of the underlying graph. We take an approach more akin to persistence and make far fewer assumptions about properties of the underlying data. As a result, we are unable to make rigorous estimates of distances and instead provide many possible representative paths.

S-paths defined by Purvine et al. [29] are similar to stable paths when we realize that covers can be modeled as hypergraphs, and vice versa. Stable paths incorporate the size of each cover element (or hyperedge), normalizing the weights by relative size. This perspective allows us to compare different parts of the resulting structure, which may have wildly different sizes of cover elements. In this context, a large overlap of small elements is considered more meaningful than a proportionally small intersection of large elements.

In Section 6.1, we show how the cover filtration and stable paths can be applied in the context of recommendation systems. Our viewpoint on recommendation systems is similar to work on graph-based recommendation systems. This is an active area of research and we believe our new perspective of interpreting such systems as covers and filtrations will yield useful tools for advancing the field. The general approach of graph-based recommendation systems is to model the data as a bipartite graph, with one set of nodes representing the recommendation items and the other set representing the recommendees. We can interpret a bipartite graph as a cover, either with elements being the recommendees covering the items, or elements being the items covering the recommendees.

Organization

In Section 2, we define the cover filtration. Sections 3 and 4 provide the stability and equivalence results, respectively. Section 5 develops the theory of stable paths. Section 6 demonstrates the applicability of both the cover filtration and the stable paths to recommendation systems and the Mapper.

2 Cover Filtrations

We introduce the notions of distance on covers required to construct our filtrations and then provide the general definition of the cover filtration. We start with the standard Jaccard distance between two sets, and generalize it to an arbitrary collection of subsets of a cover.

Definition 2.1 (Jaccard Distance [17]).

The Jaccard distance between two sets $A$ and $B$ is

$$ d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}. $$

This distance is bounded on $[0, 1]$, i.e., two sets have Jaccard distance $0$ when they are equal and distance $1$ when they do not intersect. The Jaccard distance is a metric on the collection of all finite sets [22].

We extend the Jaccard distance from an operator on a pair of elements to an operator on a set of elements.

Definition 2.2 (Generalized Jaccard Distance).

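Define the generalized Jaccard distance of a collection of sets $\{U_i\}_{i \in I}$ as

$$ d_J\left(\{U_i\}_{i \in I}\right) = 1 - \frac{\left|\bigcap_{i \in I} U_i\right|}{\left|\bigcup_{i \in I} U_i\right|}. $$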

The definition could be made for infinite collections and for infinite sets, but we will work with finite collections of finite sets in this paper. We make use of this generalized distance to associate birth times to simplices in a nerve. Given a cover, we define the cover filtration as the filtration induced from sublevel sets of the generalized Jaccard distance function. In other words, consider a cover of the space and the nerve of this cover. For each simplex in the nerve, we assign as birth time the value of its Jaccard distance. This filtration captures information about similarity of cover elements, and the overall structure of the cover.
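As a minimal sketch of the generalized Jaccard distance for finite sets, the function below computes one minus the ratio of the size of the common intersection to the size of the union. The function name and the convention of returning 0 when every set is empty are our own assumptions for illustration.

```python
def generalized_jaccard_distance(sets):
    """Return 1 - |intersection| / |union| for a finite collection of finite sets."""
    sets = [set(s) for s in sets]
    if not sets:
        raise ValueError("need at least one set")
    union = set().union(*sets)
    if not union:
        # Assumed convention: a collection of empty sets is treated as identical.
        return 0.0
    intersection = set.intersection(*sets)
    return 1.0 - len(intersection) / len(union)


# Two sets recover the standard Jaccard distance; three sets use the common intersection.
pair = generalized_jaccard_distance([{1, 2, 3}, {2, 3, 4}])
triple = generalized_jaccard_distance([{1, 2, 3}, {2, 3, 4}, {3, 4, 5}])
```

For two sets this reduces to the standard Jaccard distance of Definition 2.1, and adding a set to the collection can only shrink the intersection and grow the union, so the value never decreases.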

Definition 2.3 (Nerve).

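A nerve of a cover $\mathcal{U} = \{U_i\}_{i \in I}$ is an abstract simplicial complex defined such that each subset $\{U_j\}_{j \in J} \subseteq \mathcal{U}$, i.e., with $J \subseteq I$, defines a simplex if $\bigcap_{j \in J} U_j \neq \emptyset$. In this construction, each cover element $U_i$ defines a vertex.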

Definition 2.4 (Jaccard Nerve).

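The Jaccard nerve of a cover $\mathcal{U}$, denoted $\mathrm{Jac}(\mathcal{U})$, is defined as the nerve of $\mathcal{U}$ with each simplex $\sigma = \{U_j\}_{j \in J}$ assigned its generalized Jaccard distance as weight:

$$ w_\sigma = 1 - \frac{\left|\bigcap_{j \in J} U_j\right|}{\left|\bigcup_{j \in J} U_j\right|}. $$

Note that by definition $w_\sigma < 1$ for every simplex $\sigma$, since the cover elements generating a simplex have a nonempty intersection. We will use $\mathrm{Jac}$ when the cover is evident from context. This can be thought of as a weighted nerve, but the weighting scheme satisfies the conditions of a filtration.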

Theorem 2.5.

The Jaccard nerve of a cover is a filtered simplicial complex.

Proof.

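This proof makes use of standard set theory results. Let $\mathcal{U}$ be an arbitrary cover of some set $X$ and let $\mathrm{Jac}(\mathcal{U})$ be its Jaccard nerve. We consider $\mathrm{Jac}(\mathcal{U})$ as a filtration by assigning as the birth time of simplex $\sigma$ its weight $w_\sigma$. To show this is indeed a filtration, we focus on a single simplex $\sigma$ and a face $\tau \subseteq \sigma$ to show that the face always appears in the filtration before the simplex.

Suppose $\sigma$ is generated from cover elements $\{U_i\}_{i \in I}$ over some index set $I$. Let a face $\tau$ be generated by cover elements indexed by a subset $J \subseteq I$. The birth time of $\sigma$ is

$$ w_\sigma = 1 - \frac{\left|\bigcap_{i \in I} U_i\right|}{\left|\bigcup_{i \in I} U_i\right|}, $$

and the birth time of $\tau$ is

$$ w_\tau = 1 - \frac{\left|\bigcap_{j \in J} U_j\right|}{\left|\bigcup_{j \in J} U_j\right|}. $$

Clearly, with $J \subseteq I$, we have that $\bigcap_{i \in I} U_i \subseteq \bigcap_{j \in J} U_j$ and $\bigcup_{j \in J} U_j \subseteq \bigcup_{i \in I} U_i$. It follows then that $w_\tau \le w_\sigma$. With $K_t$ denoting the subcomplex that includes all simplices in $\mathrm{Jac}(\mathcal{U})$ with birth time at most $t$, for any $s, t$ with $s \le t$, we have $K_s \subseteq K_t$. Hence $\mathrm{Jac}(\mathcal{U})$ is a monotonic filtration. ∎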

Following this result, we refer to the construction as the Jaccard filtration.
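The construction can be sketched in a few lines of Python. The sketch below enumerates subsets of a small cover, keeps those with nonempty intersection, and records the generalized Jaccard distance as birth time. The names `jaccard_nerve` and `max_dim`, and the representation of a cover as a dictionary of sets, are assumptions made for illustration; the brute-force enumeration is only practical for small covers.

```python
from itertools import combinations


def jaccard_nerve(cover, max_dim=2):
    """Map each simplex (a sorted tuple of cover-element names) to its birth time.

    `cover` is assumed to be a dict from names to finite sets.
    """
    names = sorted(cover)
    nerve = {}
    for size in range(1, max_dim + 2):
        for simplex in combinations(names, size):
            members = [cover[name] for name in simplex]
            common = set.intersection(*members)
            if common:  # a simplex exists only when the intersection is nonempty
                union = set().union(*members)
                nerve[simplex] = 1.0 - len(common) / len(union)
    return nerve


cover = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 4, 5}}
nerve = jaccard_nerve(cover)
```

In this toy cover each vertex is born at 0, the edges at 0.5 or 0.8, and the triangle at 0.8, so every face appears no later than the simplex containing it, as Theorem 2.5 requires.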

We could study an adaptation of the cover filtration to an analog of the Vietoris-Rips (VR) complex by building a weight rank clique filtration from the 1-skeleton of the cover filtration [28]. The weight rank clique filtration is a way of generating a flag filtration from a weighted graph [28]. Applying this technique to the cover filtration drastically reduces the number of intersection and union checks required for the construction.
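A hedged sketch of this VR-style adaptation follows. Given only the edge weights of the 1-skeleton, a higher simplex is included when all of its edges are present and is assigned the maximum of their weights. The function name, the `max_dim` cutoff, and the brute-force clique enumeration are illustrative assumptions and not the implementation of [28].

```python
from itertools import combinations


def clique_filtration(vertices, edge_weights, max_dim=2):
    """Flag-style filtration: a clique is born at the largest weight among its edges."""
    weights = {frozenset(edge): w for edge, w in edge_weights.items()}
    ordered = sorted(vertices)
    births = {(v,): 0.0 for v in ordered}
    for size in range(2, max_dim + 2):
        for simplex in combinations(ordered, size):
            pairs = [frozenset(p) for p in combinations(simplex, 2)]
            if all(p in weights for p in pairs):
                births[simplex] = max(weights[p] for p in pairs)
    return births


births = clique_filtration(
    ["a", "b", "c"],
    {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.8},
)
```

Only pairwise intersections and unions are needed to obtain the edge weights, which is the source of the savings mentioned above.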

Note on Complexity

The complexity of constructing the cover filtration is by and large inherited directly from the computational complexity of the nerve. Given a cover $\mathcal{U}$ with $n = |\mathcal{U}|$ elements, the nerve could have at most $2^n - 1$ simplices and dimension at most $n - 1$ [26]. These bounds are equivalent to the corresponding worst case bounds for VR and Čech complexes.

The work involved for each simplex in constructing the Jaccard nerve includes computing the volume of intersection and volume of union of the elements in the simplex. The complexity of union and intersection operations is largely dependent on the type of data being used. Let $C_\cup$ and $C_\cap$ be the costs of computing the union and intersection, respectively, of a set of cover elements. In the worst case, we have to do $C_\cup + C_\cap$ operations per simplex, leading to an overall worst case computational complexity of $O\left((C_\cup + C_\cap)\, 2^n\right)$. For instance, if we assume that a hashing-based dictionary could be produced for each set in $\mathcal{U}$, both $C_\cup$ and $C_\cap$ will be at most linear in the total size of the sets involved [4].

3 Stability

We consider notions of stability in the cover filtration with respect to changes in the cover. We first modify the standard edit distance to define a bottleneck distance on the space of covers of a finite set that have the same cardinality. Under this setting, we show that the cover filtration is interleaved with respect to this distance.

Definition 3.1 (Bottleneck metric on covers).

Let $\mathcal{U}$ and $\mathcal{V}$ be two finite covers of a finite set $X$ with the same cardinality, and let $M$ be the set of all possible matchings between them. Let $\triangle$ denote the symmetric difference. Then the bottleneck distance between two covers is defined as

$$ d_B(\mathcal{U}, \mathcal{V}) = \min_{\mu \in M} \; \max_{(U, V) \in \mu} |U \triangle V|. $$

We first verify that $d_B$ is indeed a metric.

Proposition 3.2.

Let $\mathcal{U}$, $\mathcal{V}$, and $\mathcal{W}$ be finite covers of a finite set $X$ with equal cardinalities, and let $d_B$ denote the bottleneck distance between any pair of these covers as specified in Definition 3.1. Then $d_B$ is a metric.

Proof.

We make the following observations.

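  1. $d_B(\mathcal{U}, \mathcal{U}) = 0$, as matching each set in the cover to itself gives a distance of $0$ and the smallest cardinality of a symmetric difference is $0$, hence this matching gives the minimum possible symmetric difference. Likewise, if $d_B(\mathcal{U}, \mathcal{V}) = 0$, there is a matching where the symmetric difference between each matched pair has cardinality $0$. Hence the sets are equal for each pair in the matching, and hence $\mathcal{U} = \mathcal{V}$.

  2. $d_B(\mathcal{U}, \mathcal{V}) \ge 0$, since the smallest cardinality of a symmetric difference is $0$.

  3. $d_B(\mathcal{U}, \mathcal{V}) = d_B(\mathcal{V}, \mathcal{U})$, since the symmetric difference is symmetric in its arguments.

  4. Let $d_B(\mathcal{U}, \mathcal{V}) = a$ and $d_B(\mathcal{V}, \mathcal{W}) = b$. Then the number of points with different inclusion states between $U_i$ and its matched set $V_i$ is at most $a$, and the number of points with different inclusion states between $V_i$ and its matched set $W_i$ is at most $b$. Then the number of points with different inclusion states between $U_i$ and $W_i$ is at most $a + b$. Since this result holds for all $i$, there is a matching between $\mathcal{U}$ and $\mathcal{W}$ with a maximum symmetric difference of at most $a + b$. Hence $d_B(\mathcal{U}, \mathcal{W}) \le d_B(\mathcal{U}, \mathcal{V}) + d_B(\mathcal{V}, \mathcal{W})$.

Hence the bottleneck distance $d_B$ is a metric with respect to covers of equal cardinality. ∎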
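For small covers, the bottleneck distance can be evaluated directly from the definition by trying every matching. The sketch below does exactly that; the function name is our own, and the factorial cost of enumerating permutations means it is meant only as an illustration on toy inputs.

```python
from itertools import permutations


def cover_bottleneck_distance(cover_u, cover_v):
    """Minimum over matchings of the largest symmetric difference in a matched pair."""
    u = [set(s) for s in cover_u]
    v = [set(s) for s in cover_v]
    if len(u) != len(v):
        raise ValueError("covers must have the same cardinality")
    if not u:
        return 0
    return min(
        max(len(a ^ b) for a, b in zip(u, matched))  # ^ is the symmetric difference
        for matched in permutations(v)
    )


U = [{1, 2}, {3, 4}]
V = [{3, 4, 5}, {1, 2}]
W = [{1, 2, 6}, {3, 4, 5}]
d_uv = cover_bottleneck_distance(U, V)
```

Here the best matching pairs `{1, 2}` with itself and `{3, 4}` with `{3, 4, 5}`, so the two covers differ by a single point in the worst matched pair.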

We now present two somewhat technical lemmas, which we subsequently employ in the proof of the main stability result.

Lemma 3.3.

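Let $a, b, c, c_1, c_2$ be real numbers with $0 < a \le b$, $c_1 \ge 0$, $c_2 \ge 0$, and $c \ge 0$. Then we have that $\frac{a - c}{b} \le \frac{a - c_1}{b + c_2}$ when $c_1 + c_2 = c$. In words, if we have a total weight that we can distribute between decreasing the numerator of a proper fraction and increasing its denominator, the greatest decrease will come from decreasing the numerator by the entire weight.

Proof.

We will show that, with the given conditions, $(a - c)(b + c_2) \le (a - c_1)\, b$.

Since $c_1 + c_2 = c$, $c - c_1 = c_2$, so $(a - c_1)\, b - (a - c)(b + c_2) = (c - c_1)\, b - (a - c)\, c_2 = c_2 (b - a + c)$. Also, since $a/b$ is proper, $a \le b$, so $b - a + c \ge 0$. Thus $c_2 (b - a + c) \ge 0$. Since $b (b + c_2) > 0$, we get that $(a - c)(b + c_2) \le (a - c_1)\, b$, which in turn shows that $\frac{a - c}{b} \le \frac{a - c_1}{b + c_2}$, as desired. ∎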

Corollary 3.4.

Similar to Lemma 3.3, the greatest increase possible in such a scenario comes from assigning the negative of the total weight to the numerator.

Proof.

Let be numbers as specified. Then , and , and , so fulfill the hypotheses of Lemma 3.3. Then we get that . ∎

We now present a theorem that provides basic stability guarantees for the constructed filtration, assuming that each element is not too small.

Theorem 3.5.

Suppose that and are two covers of with such that , a positive integer. Given , and are interleaved filtrations.

Proof.

Given two covers and , we use the notation that and , for generic indices and , are paired in a matching that minimizes the bottleneck distance between the two covers. We assume that the bottleneck distance is , a positive integer.

We consider the following question: what is the largest change in generalized Jaccard distance possible between a collection and , where index sets and are paired elementwise in a matching? To answer this question, we keep fixed and consider how large a difference in generalized Jaccard distance we can achieve by altering the inclusion status of up to points in each . That is, we want to maximize change in . To get the maximum increase in , we must increase the numerator and/or decrease the denominator. Likewise, we must decrease the numerator and/or increase the denominator to get maximum decrease.

First, we note that changing the inclusion status of a point in cover elements cannot both increase the size of the intersection and decrease the size of the union. Similarly, it cannot decrease the size of the intersection and increase the size of the union. We also note that altering the inclusion of a point in various cover elements can at most change the size of the intersection by 1 and the size of the union by 1.

The greatest possible change would occur if it were possible to select points for each element of the cover such that changing the status of each selected point with respect to the cover element increased or decreased the size of the intersection by 1, or did the opposite to the size of the union.

Lemma 3.3 and Corollary 3.4 imply that the maximum possible change in those situations will be achieved when all weight is directed toward increasing or decreasing the size of the intersection, since must be between 0 and 1. As we want to bound the possible change in the Jaccard distance, it will suffice to use the observation that

to obtain bounds on the change of Jaccard distance between covers with maximum bottleneck distance of and with as the cover cardinality.

Then we have

Similarly, since

Hence , giving that and are interleaved. ∎

Remark 3.6.

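Consider the case when $|\mathcal{U}| \neq |\mathcal{V}|$. Assume without loss of generality that $|\mathcal{U}| > |\mathcal{V}|$. Then there is a vertex in the Jaccard nerve of $\mathcal{U}$ that is not present in the Jaccard nerve of $\mathcal{V}$. Hence the two filtrations cannot be interleaved in the current setting. We need to first generalize matchings and the bottleneck distance to allow covers with unequal cardinalities.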

4 Equivalence

To situate the cover filtration, we wish to show that it is isomorphic to the Čech and VR filtrations under certain conditions. We conjecture that the Čech filtration on a finite set of points $X$, i.e., the nerve of balls with radius $r$ around each point and over a sequence of $r$, and the Jaccard Nerve constructed from the terminal cover of the Čech filtration are isomorphic. More precisely, the insertion order of simplices is equivalent between the two cases, and there exists a continuous bijection between insertion times of the Jaccard Nerve and insertion times of the Čech filtration. We prove this result for $d = 1$, i.e., when $X$ is drawn from the real line. We also provide experimental evidence for the 1-skeleton equivalence, and prove one direction of this equivalence (that the VR filtration completely determines the cover filtration).

Let $\mathcal{U}_r(X)$ represent the cover of $X$ by balls of radius $r$ centered on points in $X$. The Čech complex is defined as the nerve of this cover. The Čech filtration is the sequence of simplicial complexes for all values of $r$.

Conjecture 4.1 (Čech equivalence).

Given a finite data set $X \subset \mathbb{R}^d$ and some radius $R$, the Čech filtration constructed from $X$ is isomorphic to the cover filtration on $X$ constructed from the cover of $X$ by balls of radius $R$.

A comprehensive proof for arbitrary order of intersections and arbitrary dimension is incomplete. We provide the proof in dimension 1 and provide a mapping from the Čech filtration to the cover filtration for the 1-skeleton in arbitrary dimension.

Proof.

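Let $X$ be a set of points in $\mathbb{R}$. For some subset $S \subseteq X$, let $r_S$ be the birth radius of the simplex defined by the subset of points. In $\mathbb{R}$, this can be computed as

$$ r_S = \frac{\max S - \min S}{2}. $$

The Jaccard distance given some large radius $R \ge r_S$ for the same set is defined as

$$ d_J(S) = 1 - \frac{2R - (\max S - \min S)}{2R + (\max S - \min S)}, $$

since the balls are intervals whose common intersection is $[\max S - R, \min S + R]$ and whose union is $[\min S - R, \max S + R]$.

We find the equivalence as

$$ d_J(S) = \frac{2 r_S}{R + r_S} $$

and

$$ r_S = \frac{R\, d_J(S)}{2 - d_J(S)}. $$

Both maps are continuous and strictly increasing, so the insertion order of simplices is the same in the two filtrations.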

We now address the case of the 1-skeleton of the Čech and cover filtrations in arbitrary dimension. It is clear that if the 1-skeletons are isomorphic, then the cover filtration is isomorphic to the Vietoris-Rips filtration. We prove one direction of this isomorphism.

Lemma 4.2.

The Vietoris-Rips filtration completely determines the cover filtration in arbitrary dimensions.

Proof.

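The intersection of two hyperspheres was derived by Li [21]. The volume of intersection of two hyperspheres of equal radius $r$ in $\mathbb{R}^n$ with centers distance $d \le 2r$ apart is defined as

$$ V_\cap(n, r, d) = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}\, r^n\, I_{1 - \frac{d^2}{4 r^2}}\left(\frac{n + 1}{2}, \frac{1}{2}\right), $$

where $\Gamma$ is the gamma function and $I_x(a, b)$ is the regularized incomplete beta function:

$$ I_x(a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\, \Gamma(b)} \int_0^x t^{a - 1} (1 - t)^{b - 1}\, dt. $$

The volume of an $n$-sphere of radius $r$ is

$$ V_n(r) = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}\, r^n, $$

so we can reduce the intersection volume to

$$ V_\cap(n, r, d) = V_n(r)\, I_{1 - \frac{d^2}{4 r^2}}\left(\frac{n + 1}{2}, \frac{1}{2}\right), $$

and the volume of union of two $n$-spheres is

$$ V_\cup(n, r, d) = 2 V_n(r) - V_\cap(n, r, d). $$

We then compute the Jaccard distance of two spheres in $\mathbb{R}^n$ of radius $r$ with centers Euclidean distance $d$ apart as

$$ d_J(n, r, d) = 1 - \frac{V_\cap(n, r, d)}{V_\cup(n, r, d)} = 1 - \frac{I_{1 - \frac{d^2}{4 r^2}}\left(\frac{n + 1}{2}, \frac{1}{2}\right)}{2 - I_{1 - \frac{d^2}{4 r^2}}\left(\frac{n + 1}{2}, \frac{1}{2}\right)}. $$

This equation provides a mapping from the birth time of the edge in the Čech filtration to the birth time of the edge in the cover filtration. Once an $n$ and $r$ are chosen, the equation readily reduces, producing the birth times of a simplex in the cover filtration. ∎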

This result suggests that one can derive the cover filtration from the Vietoris-Rips filtration.

We finish by detailing experimental results suggesting that the 1-skeleton of the Jaccard filtration and the 1-skeleton of the Čech filtration (i.e., the Vietoris-Rips filtration) are isomorphic. To estimate the area of intersection of 1-spheres, we use Monte Carlo integration with uniform sampling. The first plot in Figure 2 shows the 50 landmark points along with 20,000 points uniformly sampled around the landmarks. The middle plot shows the persistence diagrams of dimension 0 and 1 for the Vietoris-Rips filtration on the landmarks. Finally, we show an approximated Jaccard filtration on the landmarks, using the balls with radii 0.5 as the covers. We approximate the Jaccard filtration similarly to how the Vietoris-Rips filtration approximates the Čech filtration, i.e., by only computing the 1-skeleton of the nerve, and including any higher order simplices for which all faces are already contained in the filtration, taking the maximum birth time of all faces. We note that the two persistence diagrams have only minor differences (only in dimension 1).

Figure 2: Persistence diagrams for the Vietoris-Rips filtration and the approximate Jaccard filtration of a set of uniformly sampled points in the plane.
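To illustrate the Monte Carlo step on a single edge, the sketch below estimates the Jaccard distance between two disks of equal radius in the plane and compares it with the closed-form lens area. This is a simplified stand-in for the experiment described above: it samples uniformly from the bounding box of the two disks rather than around landmark points, and the function names, sample count, and seed are assumptions for illustration.

```python
import math
import random


def disk_jaccard_exact(radius, dist):
    """Jaccard distance of two disks of equal radius whose centers are `dist` apart."""
    if dist >= 2 * radius:
        return 1.0
    # Area of the lens formed by two intersecting circles of equal radius.
    lens = (2 * radius**2 * math.acos(dist / (2 * radius))
            - 0.5 * dist * math.sqrt(4 * radius**2 - dist**2))
    union = 2 * math.pi * radius**2 - lens
    return 1.0 - lens / union


def disk_jaccard_monte_carlo(radius, dist, n_samples=100_000, seed=0):
    """Estimate the same quantity by uniform sampling in the bounding box."""
    rng = random.Random(seed)
    in_both = in_either = 0
    for _ in range(n_samples):
        x = rng.uniform(-radius, dist + radius)
        y = rng.uniform(-radius, radius)
        in_first = x * x + y * y <= radius**2
        in_second = (x - dist) ** 2 + y * y <= radius**2
        in_both += in_first and in_second
        in_either += in_first or in_second
    if in_either == 0:
        raise ValueError("no samples landed in either disk")
    return 1.0 - in_both / in_either


exact = disk_jaccard_exact(0.5, 0.4)
estimate = disk_jaccard_monte_carlo(0.5, 0.4)
```

Because the exact value is strictly increasing in the distance between centers for a fixed radius, sorting edges by Jaccard distance or by Euclidean distance gives the same order, which is what the comparison of persistence diagrams probes.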

5 Stable Paths

We develop a theory of stable paths within a cover filtration. We provide an algorithm for computing a most stable path from one vertex to another. Note that a most stable path might not be a shortest path in terms of number of edges. Conversely, a shortest path might not be highly stable. Since the two objectives are at odds with each other, we provide an algorithm to identify a family of shortest paths as we vary the stability level, akin to computing persistent homology.

We were studying shortest paths in a Mapper constructed on a machine learning model as ways to illustrate the relations between the data as identified by the model. In this context, the shortest paths found could contain edges with high Jaccard distance, i.e., edges supported by very little overlap between cover elements, and thus could be considered noise. This motivated our desire to find stable paths, as they would intuitively be most representative of the data set and stable with respect to changing parameters in Mapper or changing data.

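Definition 5.1 ($\rho$-Stable Path).

Given a Jaccard distance $\rho$, a path $P$ is defined to be $\rho$-stable if

$$ \max_{e \in P} w_e \le \rho, $$

where $w_e$ is the weight (birth time) of edge $e$ in the cover filtration.

It follows that a $\rho$-stable path is also $\rho'$-stable for any $\rho' \ge \rho$. Also, a $\rho$-stable path $P$ is more stable than a $\rho'$-stable path $P'$ when $\rho < \rho'$. In this case, we have a higher confidence that the edges in $P$ do exist, and are not due to noise, than the edges in $P'$. We define most stable paths between a pair of vertices as follows.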

Definition 5.2 (Most Stable Path).

Given a pair of vertices $s$ and $t$, a most stable $s$-$t$ path is a $\rho$-stable path between $s$ and $t$ for the smallest value of $\rho$. If there are multiple $s$-$t$ paths at the same minimum $\rho$ value, a shortest path among them is defined as a most stable path.

The problem of finding the most stable $s$-$t$ path can be solved as a minimax path problem on an undirected graph, which can be solved efficiently using, e.g., range minimum queries [10].
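As one possible way to compute a most stable path, the sketch below uses a Dijkstra-style search that minimizes the largest edge weight along the path, followed by a breadth-first search restricted to edges at or below that weight to pick a shortest path among the most stable ones. This is a swapped-in technique chosen for brevity, not the range-minimum-query approach cited above; the function name and the edge dictionary format are assumptions.

```python
import heapq
from collections import deque


def most_stable_path(edges, s, t):
    """Return (rho, path) for a most stable s-t path, or None if s and t are disconnected.

    `edges` maps vertex pairs (u, v) to Jaccard weights in an undirected graph.
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    if s not in adj or t not in adj:
        return None

    # Step 1: smallest achievable bottleneck (largest edge weight) from s to each vertex.
    best = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        rho, u = heapq.heappop(heap)
        if rho > best.get(u, float("inf")):
            continue  # stale heap entry
        if u == t:
            break
        for v, w in adj[u]:
            candidate = max(rho, w)
            if candidate < best.get(v, float("inf")):
                best[v] = candidate
                heapq.heappush(heap, (candidate, v))
    if t not in best:
        return None
    rho_star = best[t]

    # Step 2: fewest edges among paths that only use edges of weight <= rho_star.
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v, w in adj[u]:
            if w <= rho_star and v not in parent:
                parent[v] = u
                queue.append(v)
    path, node = [], t
    while node is not None:
        path.append(node)
        node = parent[node]
    return rho_star, path[::-1]


toy_edges = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.2, ("c", "d"): 0.3, ("d", "b"): 0.2,
    ("a", "e"): 0.3, ("e", "f"): 0.1, ("f", "g"): 0.1, ("g", "b"): 0.3,
}
result = most_stable_path(toy_edges, "a", "b")
```

In the toy graph the direct edge has weight 0.9, while two detours achieve a bottleneck of 0.3; the breadth-first pass selects the detour with fewer edges, matching the tie-breaking rule of Definition 5.2.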

We are then left with two paths between vertices $s$ and $t$, the shortest and the most stable. It should be clear that the shortest path is not necessarily stable and the stable path is not necessarily short. As these two notions, stable and short, are at odds with each other, we are interested in computing the entire Pareto frontier between the short and stable path. We present an algorithm to identify the Pareto frontier in Figure 3, and a visualization of the output from this algorithm in Figure 4.

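Input: 1-skeleton $G$ of cover filtration and vertices $s, t$
set LIST $= \emptyset$    // stores $(|P|, \rho_P)$ pairs
while $s, t$ are connected in $G$
    compute shortest path $P$ between $s$ and $t$
    find $\rho_P = \max \{ w_e : e \in P \}$
    if LIST has no pair $(\ell, \rho)$ with $\ell = |P|$
       add $(|P|, \rho_P)$ to LIST
    else if $\rho_P < \rho$ for $(\ell, \rho) \in$ LIST with $\ell = |P|$
       replace $(\ell, \rho)$ with $(|P|, \rho_P)$ in LIST
    remove all edges $e$ from $G$ with $w_e \ge \rho_P$
Return: LIST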
Figure 3: Algorithm to identify the Pareto frontier between shortest and most stable paths.

In this algorithm, we repeatedly compute the shortest path, while essentially sweeping over the Jaccard Distance. This process results in a Pareto frontier balancing the shortest paths with the stability of those paths.
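The sweep in Figure 3 can be sketched in Python as follows. We assume that path length is counted in edges, that shortest paths are found by breadth-first search, and that after each round every edge at least as heavy as the current path's heaviest edge is removed; these readings, along with the function names and the edge dictionary format, are our assumptions for illustration.

```python
from collections import deque


def bfs_shortest_path(adj, s, t):
    """Fewest-edge path from s to t as a list of vertices, or None if disconnected."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def pareto_frontier(edges, s, t):
    """Return sorted (length, rho) pairs trading off path length against stability."""
    weights = {frozenset(e): w for e, w in edges.items()}  # undirected, no self-loops
    frontier = {}
    while True:
        adj = {}
        for e in weights:
            u, v = tuple(e)
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
        path = bfs_shortest_path(adj, s, t)
        if path is None or len(path) < 2:
            break
        rho = max(weights[frozenset(pair)] for pair in zip(path, path[1:]))
        length = len(path) - 1
        if length not in frontier or rho < frontier[length]:
            frontier[length] = rho  # add a new pair, or replace a less stable one
        # Sweep: discard every edge at least as unstable as the current path.
        weights = {e: w for e, w in weights.items() if w < rho}
    return sorted(frontier.items())


toy_edges = {
    ("a", "b"): 0.9,
    ("a", "x"): 0.5, ("x", "b"): 0.6,
    ("a", "y"): 0.4, ("y", "b"): 0.45,
    ("a", "c"): 0.2, ("c", "d"): 0.3, ("d", "b"): 0.2,
}
frontier = pareto_frontier(toy_edges, "a", "b")
```

Each round removes at least the heaviest edge of the current path, so the loop terminates after at most as many rounds as there are edges, and the recorded stability values strictly decrease as the paths grow longer.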

The blue points in Figure 4 are on the Pareto frontier, while the orange points are the pairs that get replaced from the LIST in the course of the algorithm. We then visualize the paths on the Pareto frontier in Figure 5. Continuing our analogy to persistence, the path corresponding to a point on the Pareto frontier which sees a steep rise to the left is considered highly persistent, e.g., the path with length on the frontier.

Figure 4: Pareto frontier between length of path and stability of path. The graph is a triangulation of the plane and the weights are drawn at random from an inverted exponential distribution, i.e., each weight is obtained by inverting a value taken from an exponential distribution.

Figure 5: Visualization of each path on the Pareto frontier shown in Figure 4.

6 Applications

We apply the cover filtration and stable paths to two applications, recommendation systems and Mapper. We first show how recommendation systems can be modeled using the cover filtration and then show how stable paths within this filtration can answer the question “What movies should I show my friend first, to wean them into my favorite (but potentially weird) movie?” We then define an extension of the traditional Mapper [31, 7] called the Jaccard Mapper Filtration, and show how stable paths within this filtration can provide valuable explanations of populations in the Mapper. As a direct illustration, we focus on the case of explainable machine learning, where the Mapper is constructed with a supervised machine learning model as the filter function, and address the question “What can we learn about the model?”

The applications of cover filtration and stable paths are not limited to these two situations. Another possibility not explored is for sensor networks. Sensor coverage areas are often not uniform balls, and the cover filtration is aptly suited for developing a filtration. In the context of communication networks, stable paths could be interpreted as reliable routes. Another direct application could be in finding driving directions that take not only short, but also “easy” routes.

6.1 Recommendation Systems

In this application, we apply the cover filtration to a recommendation system data set and employ the stable paths analysis to compute sequences of movies that ease viewers from one title to another title. For an example that we will see more of shortly, suppose you have only ever seen the movie Mulan and your partner wants to show you Moulin Rouge. It would be jarring to just watch the movie, so your partner might gently build up to Moulin Rouge by showing you movies similar to both Mulan and Moulin Rouge. We compute stable paths that identify such a feasible gentle sequence.

Figure 6: Pareto frontier of stable paths between Mulan and Moulin Rouge. Nonoptimal paths are not shown.

We use the MovieLens-20m data set [16]. This data set consists of 20 million ratings by 138,493 users of 27,278 movies. Often, these types of data sets are interpreted as bipartite graphs. Once we realize that a bipartite graph can be equivalently represented as a covering of one node set with the other, we can apply the cover filtration to build a filtration. In our case, we interpret each movie as a cover element consisting of the users who have rated the movie. To reduce computational expense and noise, we remove all movies with fewer than 10 ratings and then sample 4000 movies at random from the remaining movies.
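The preprocessing described above can be sketched in a few lines. This is a minimal illustration, not the code used for the experiments: the function names `build_movie_cover` and `jaccard_distance` are hypothetical, the input is assumed to be an iterable of `(user_id, movie_id)` pairs, and the pairwise Jaccard distance stands in for the generalized distance used by the cover filtration.

```python
import random
from collections import defaultdict


def build_movie_cover(ratings, min_ratings=10, sample_size=None, seed=0):
    """Interpret each movie as a cover element: the set of users who rated it.

    `ratings` is assumed to be an iterable of (user_id, movie_id) pairs.
    """
    cover = defaultdict(set)
    for user_id, movie_id in ratings:
        cover[movie_id].add(user_id)

    # Drop movies rated by fewer than `min_ratings` distinct users.
    kept = {m: users for m, users in cover.items() if len(users) >= min_ratings}

    # Optionally keep a random sample of the remaining movies.
    if sample_size is not None and sample_size < len(kept):
        rng = random.Random(seed)
        chosen = rng.sample(sorted(kept), sample_size)
        kept = {m: kept[m] for m in chosen}
    return kept


def jaccard_distance(a, b):
    """Pairwise Jaccard distance 1 - |A ∩ B| / |A ∪ B| between two sets."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty sets
    return 1.0 - len(a & b) / len(union)


# Toy example: movie "C" is dropped because it has a single rating.
toy_ratings = [(1, "A"), (2, "A"), (3, "A"), (2, "B"), (3, "B"), (4, "B"), (5, "C")]
toy_cover = build_movie_cover(toy_ratings, min_ratings=2)
print(sorted(toy_cover), jaccard_distance(toy_cover["A"], toy_cover["B"]))
```

Two movies rated by largely the same users end up close together, which is what lets a path of small distances read as a gentle transition between titles.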

Figure 6 shows the computed Pareto frontier of stable paths for the case of Mulan and Moulin Rouge. In Table 1, we show two stable paths that might be chosen. The stable path with length 4 is found after a large drop in instability. Since length and stability must be traded off against each other, we think this is a reasonable path to choose if you want to optimize both. The second path shown is the most stable. For readers who have seen the movies in this path, the relationship along each edge is clear, but the path as a whole seems to wander farther than necessary.

Shortest Path
  1. Mulan (1998)
  2. Dumbo (1941)
  3. Sound of Music, The (1965)
  4. Moulin Rouge (2001)

Most Stable Path
  1. Mulan (1998)
  2. Robin Hood (1973)
  3. Dumbo (1941)
  4. Sound of Music, The (1965)
  5. Gone with the Wind (1939)
  6. Psycho (1960)
  7. High Fidelity (2000)
  8. Moulin Rouge (2001)

Table 1: Two sequences of movie transitions.
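To make the trade-off between length and stability concrete, the following sketch computes a Pareto frontier on a small weighted graph. It is an illustration under stated assumptions rather than the algorithm from Section 5: the instability of a path is taken to be its largest edge weight, length is the number of edges, and the helper names `shortest_path` and `pareto_frontier` are hypothetical.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Breadth-first search; returns the vertex list of a fewest-edge path or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj.get(v, ()):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None


def pareto_frontier(edges, source, target):
    """Sweep the threshold over the edge weights of an undirected graph.

    `edges` maps (u, v) to a weight (e.g. a Jaccard distance). A frontier point
    (length, instability, path) is recorded each time raising the threshold
    makes a strictly shorter path available.
    """
    frontier = []
    best_length = None
    for threshold in sorted(set(edges.values())):
        # Subgraph containing only edges present at this filtration value.
        adj = {}
        for (u, v), weight in edges.items():
            if weight <= threshold:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        path = shortest_path(adj, source, target)
        if path is None:
            continue
        length = len(path) - 1
        if best_length is None or length < best_length:
            frontier.append((length, threshold, path))
            best_length = length
    return frontier


# Toy graph: longer routes use only low-weight (stable) edges.
toy_edges = {("a", "b"): 0.2, ("b", "c"): 0.3, ("c", "d"): 0.2,
             ("a", "c"): 0.6, ("a", "d"): 0.9}
for length, instability, path in pareto_frontier(toy_edges, "a", "d"):
    print(length, instability, path)
```

Rebuilding the subgraph at every threshold keeps the sketch short; a practical implementation would add edges incrementally.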

6.2 Jaccard Mapper Filtration

As supervised learning has become more powerful, the need for explanations has also grown. We develop a method of model induction for inspecting a machine learning model. The goal is to develop an understanding of the model structure by characterizing the relationship between the feature space and the prediction space. The gleaned understanding can help non-experts make sense of algorithmic decisions and is essential when models are too complex to fully understand in a white-box fashion. The Mapper [31] is aptly suited for visualizing this functional structure.

Figure 7: Constructed mapper from logistic regression model of Fashion-MNIST data set. The window marks the frame of Figure 8.

This application extends our work on using paths in the Mapper to provide explanations for supervised machine learning models [30]. Following this work [30], we build the Mapper from the predicted probability space of a logistic regression model. We then extend the constructed Mapper to be a Jaccard Mapper Filtration and proceed to analyze the stable paths in that object.

Given topological spaces $X$ and $Z$, a function $f : X \to Z$, and a cover $\mathcal{U}$ of $Z$, we define Mapper to be the nerve of the refined pullback cover $f^*(\mathcal{U})$. A refined cover is one such that each cover element is split into its path-connected components.

Definition 6.1 (Jaccard Mapper filtration).

Given data $X$, a function $f : X \to Z$, and a cover $\mathcal{U}$ of $Z$, we define the Jaccard Mapper as the Jaccard nerve (Definition 2.4) of the refined pullback cover $f^*(\mathcal{U})$.

By incorporating information about the amount of overlap between the cover elements produced by the Mapper, our analysis is robust to noise and largely insensitive to the chosen parameters of the Mapper construction.
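As a rough illustration of how overlap enters the construction, the sketch below assigns a filtration value to every simplex of the nerve of a finite cover. Definition 2.4 is not reproduced in this section, so the code assumes the generalized Jaccard distance of a collection of sets is one minus the size of their common intersection divided by the size of their union; the function names are hypothetical.

```python
from itertools import combinations


def generalized_jaccard(sets):
    """Assumed generalized Jaccard distance: 1 - |intersection| / |union|."""
    sets = [set(s) for s in sets]
    union = set().union(*sets)
    if not union:
        return 1.0
    intersection = set.intersection(*sets)
    return 1.0 - len(intersection) / len(union)


def jaccard_nerve(cover, max_dim=1):
    """Map each simplex (a tuple of cover-element names) to its filtration value.

    Only simplices whose cover elements share at least one point are kept,
    so the underlying complex is the usual nerve up to dimension `max_dim`.
    """
    names = sorted(cover)
    simplices = {}
    for size in range(1, max_dim + 2):
        for combo in combinations(names, size):
            value = generalized_jaccard(cover[name] for name in combo)
            if value < 1.0:
                simplices[combo] = value
    return simplices


# Toy cover: "a" and "b" overlap heavily, "b" and "c" overlap slightly.
toy_cover = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {4, 5}}
for simplex, value in jaccard_nerve(toy_cover).items():
    print(simplex, value)
```

Because intersections shrink and unions grow as elements are added, every face enters no later than the simplices containing it, so the values define a filtration. Edges between heavily overlapping vertices appear early, which is why a small change in the cover only shifts the values slightly.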

Figure 8: Depiction of stable paths found along Pareto frontier in Figure 9.

Figure 7 shows the Jaccard Mapper filtration constructed from a logistic regression model built from the Fashion-MNIST data set [35]. This data set consists of 70,000 images of clothing items from 10 classes. Each image is 28 × 28 pixels. It is widely regarded as a more difficult drop-in replacement for the ubiquitous MNIST handwritten digits data set.

The dimensionality of the data set is first reduced to 100 dimensions using Principal Component Analysis, and then a logistic regression classifier with regularization is trained on the reduced data using 5-fold cross validation on a training set of 60,000 images. The model is evaluated at 93% accuracy on the remaining 10,000 images. We then extract the 10-dimensional predicted probability space and use UMAP [24] to reduce the space to 2 dimensions. This 2-dimensional space is taken as the filter function of the Mapper, using a cover consisting of 40 bins along each dimension with 50% overlap between adjacent bins. DBSCAN is used as the clustering algorithm in the refinement step [14]. KeplerMapper is used for constructing the Mapper [34]. Finally, the cover is extracted and the Jaccard Mapper filtration is constructed.
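The cover step of this pipeline can be sketched without any external library. The code below is only an approximation of what a Mapper implementation does: the overlap convention (adjacent bins share a fixed fraction of their length) is an assumption and may differ from the one used by KeplerMapper, the DBSCAN refinement step is omitted, and the names `interval_cover` and `pullback_cover` are hypothetical.

```python
from itertools import product


def interval_cover(lo, hi, n_bins, overlap):
    """Cover [lo, hi] with `n_bins` equal-length intervals.

    Adjacent intervals share a fraction `overlap` of their length
    (an assumed convention for "50% overlap").
    """
    width = (hi - lo) / (1 + (n_bins - 1) * (1 - overlap))
    step = width * (1 - overlap)
    bins = [(lo + i * step, lo + i * step + width) for i in range(n_bins)]
    # Pin the final endpoint to `hi` to guard against rounding error.
    bins[-1] = (bins[-1][0], hi)
    return bins


def pullback_cover(lens, n_bins=40, overlap=0.5):
    """Pull a 2-D grid cover back to the data.

    `lens` is a list of (x, y) filter values, one per data point. Returns a
    dict mapping a bin index (i, j) to the set of point indices inside it.
    """
    xs = [p[0] for p in lens]
    ys = [p[1] for p in lens]
    x_bins = interval_cover(min(xs), max(xs), n_bins, overlap)
    y_bins = interval_cover(min(ys), max(ys), n_bins, overlap)
    cover = {}
    for (i, (x0, x1)), (j, (y0, y1)) in product(enumerate(x_bins), enumerate(y_bins)):
        members = {k for k, (x, y) in enumerate(lens)
                   if x0 <= x <= x1 and y0 <= y <= y1}
        if members:
            cover[(i, j)] = members
    return cover


# Toy lens with three points and a 3 x 3 grid of half-overlapping bins.
toy_lens = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
print(pullback_cover(toy_lens, n_bins=3, overlap=0.5))
```

In the full construction each non-empty bin would additionally be split into clusters before taking the nerve, so one bin can contribute several vertices.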

Figure 9: Pareto frontier of stable paths between predominately sneaker vertex and predominately ankle boot vertex.

To illustrate the power of the path explanations, we start with two vertices selected from the sneaker and ankle boot regions of the resulting graph. The three regions of shoes (sneaker, ankle boot, and sandals) are understandably confusing to the machine learning model, and so we are interested in where these confusions arise. Figure 8 shows the paths associated with the Pareto frontier (Figure 9).

Figure 10: Path visualizations for shortest path (left) and stable path with length 12 (right).

In Figure 9, we show the Pareto frontier between the two chosen vertices. This frontier shows a large decrease in instability value (thus increase in stability) when moving to a path length of 12. As noted in Section 5, paths found after a large increase in stability correspond to highly stable paths, i.e., the path remains the shortest path while sweeping the instability value over a comparatively large range.
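One way to read this off a computed frontier is to measure, for each frontier point, how far the instability value can be raised before a shorter path takes over. The sketch below assumes the frontier is given as `(length, instability)` pairs and that the last point persists up to an upper bound of 1.0, the largest possible Jaccard distance; both the function name and that bound are illustrative assumptions.

```python
def stability_ranges(frontier, upper=1.0):
    """Return (length, instability, range) for each frontier point.

    The range is the gap to the next larger instability value on the
    frontier, i.e. how long the path stays the shortest available one
    while the threshold is swept upward.
    """
    points = sorted(frontier, key=lambda p: p[1])  # increasing instability
    ranges = []
    for i, (length, instability) in enumerate(points):
        next_value = points[i + 1][1] if i + 1 < len(points) else upper
        ranges.append((length, instability, next_value - instability))
    return ranges


# Toy frontier: the length-3 path follows a large drop in instability.
toy_frontier = [(1, 0.875), (2, 0.75), (3, 0.25)]
most_persistent = max(stability_ranges(toy_frontier), key=lambda r: r[2])
print(most_persistent)
```

A point with a large range corresponds to a visible step in the frontier plot, which is the visual cue used to pick the path of length 12 here.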

Figure 10 shows representatives from each vertex in the path for the shortest path and the stable path with length 12. Each row corresponds to one vertex and the columns show a representative from each class represented in the vertex. Each image shows the multiplicity of that type of shoe in the vertex.

In both paths, the vertices start predominately containing sneakers and sneaker-like sandals. They then transition to containing a larger proportion of ankle boots, with all three classes showing higher cut tops or high heels.

Along each path we can see the relationships between nodes change. In the most stable path on the right, we see a slow transition from sneaker space to ankle boot space, with some amount of sandals spread throughout. Along the path, shoes from each of the three classes become taller. Near the middle of the path, the images of sneakers and ankle boots are nearly indistinguishable. Earlier in the path, we see how some white strips in the sneakers and boots might easily be confused with negative space in the sandals.

These two paths provide a holistic representation of how the trained logistic model interprets the data. By exploring these paths, we gain valuable insight into why a model is making a decision. This can help either reinforce our trust in the model or reject the prediction. In either outcome, these explanations can strengthen the results of the predictions by including humans in the loop. Though predicting clothing type is a very low-stakes application, this framework is readily applicable to much more important data sets.

7 Conclusion

In this work, we established the cover filtration, a new kind of filtration that enables the application of TDA to previously inaccessible types of data. We then developed a theory of stable paths in the cover filtration and provided algorithms for computing the Pareto frontier between short and stable paths. As proof of their utility in real-world applications, we showed how these two ideas can be applied to the analysis of recommendation systems and of Mapper in the context of explainable machine learning.

The results in this paper suggest many new questions. Most obvious is the development of a proof for Conjecture 4.1. The application of recommendation systems leaves us curious if the cover filtration along with new results such as the one on predicting links in graphs using persistent homology [3] could provide methods for answering the main question in recommendation system research: what item to recommend to the user next?

While paths and connected components are most amenable to practical interpretations, could other structures in the cover filtration also suggest insights? What would holes and loops in the cover filtration for recommendation systems mean?

We showed that the cover filtration is stable to small changes within the cover. Does this result imply that the persistence diagram of the Jaccard Mapper filtration is stable with respect to changes in the data, cover parameters, or filter functions?

8 Acknowledgment

Broussard, Krishnamoorthy, and Saul acknowledge funding from the US National Science Foundation through grants 1661348 and 1819229. Part of the research described in this paper was conducted under the Laboratory Directed Research and Development Program at PNNL, a multi-program national laboratory operated by Battelle for the U.S. Department of Energy.

References