HELOC Applicant Risk Performance Evaluation by Topological Hierarchical Decomposition

11/26/2018 ∙ by Kyle Brown, et al. ∙ Wright State University

Strong regulations in the financial industry mean that any decisions based on machine learning need to be explained. This precludes the use of powerful supervised techniques such as neural networks. In this study we propose a new unsupervised and semi-supervised technique known as the topological hierarchical decomposition (THD). This process breaks a dataset down into ever smaller groups, where each group is associated with a simplicial complex that approximates the underlying topology of the dataset. We apply THD to the FICO machine learning challenge dataset, which consists of anonymized home equity loan applications, using the MAPPER algorithm to build simplicial complexes. We identify different groups of individuals unable to pay back loans, and illustrate how the distribution of feature values in a simplicial complex can be used to explain the decision to grant or deny a loan by extracting illustrative explanations from two THDs on the dataset.


1 Introduction

Because of heavy regulation in the financial services industry, there are stringent requirements for financial decisions made by algorithm to be explainable. In particular, it is important for credit institutions to be able to explain their decisions to creditors. This presents a challenge to the adoption of artificial intelligence techniques in the industry, as many AI algorithms act as "black boxes": it is difficult for a human to interpret them or to explain why the algorithm made a certain decision Kumar et al. (2017); Došilović et al. (2018). Unfortunately, the most powerful machine learning techniques, such as deep neural networks, are not inherently explainable. The goal of explainable AI (XAI) is to remedy this issue by one of two broad approaches: the first takes existing (unexplainable) algorithms and makes them explainable; the second develops new, powerful explainable techniques from scratch. This study takes the latter approach.

We propose a new unsupervised method for decomposing a dataset into groups sharing similar features, which we call a topological hierarchical decomposition (THD). This is done by repeated applications of the MAPPER algorithm Singh et al. (2007) to smaller and smaller subsets of a dataset. MAPPER estimates the nerve of an open cover constructed on a point-cloud dataset. It was proposed in the context of topological data analysis (TDA), a field which develops tractable algorithms to compute algebraic structures from topology. Topology is a branch of mathematics which studies the qualitative structure of spaces, such as dimension, shape, openness, connectedness, and other complex properties. A good overview of TDA is given by Carlsson (2009).

We apply THD to an anonymized dataset of home equity line of credit (HELOC) loan applications made available by FICO FICO (2018) and illustrate how it provides insight, based on groups with distinct distributions of data features, into why applicants may be unable to pay back a loan within 90 days, making them risky to loan to. The goal is not to provide an exhaustive explanation for every individual from the dataset. Instead, we will provide illustrative explanations extracted from two THDs and describe how they could be used in an explainable AI approach. Examples include use by a lending institution to explain to a potential creditor why they are denied or granted a home equity line of credit, or to explain a decision made by a black-box supervised machine learning algorithm, simply based on records of past loan performances rather than by training a transparent supervised learning algorithm that requires a historical account of whether customers’ past loans were approved or denied.

2 Related work

Much work on explainable AI (XAI) has been done in recent years, some of it concerning financial services. Kumar et al. (2017) introduce CLEAR-Trade, a system similar to techniques for visualizing image-classification neural networks: instead of finding regions of images that have high activation values, it visualizes time intervals of stock market data causing intense network activations. Relevant to our work here are studies on the prediction of credit risk, with varying levels of explainability. These include the use of fuzzy support vector machines Wang et al. (2005), non-parametric models Khandani et al. (2010), neural networks Khashman (2010), and one study which compares multiple algorithms including neural networks, k-nearest neighbors and decision trees Galindo and Tamayo (2000). These studies focus on the performance of the algorithms for classification and regression, rather than the explanation of the predictions they make.

Many of the practical applications of TDA have been in the biological sciences and related fields such as chemistry Offroy and Duponchel (2016). We were unable to find applications of TDA to financial services in the literature, so the rest of this section will discuss applications of TDA to other fields. Nicolau et al. (2011) applied TDA to a bioinformatic dataset of gene expression data to identify a new subtype of Estrogen Receptor-positive breast cancers. This subtype was invisible to traditional cluster analysis, but was made visible with the application of MAPPER.

Offroy and Duponchel (2016) apply MAPPER to the Raman spectra of various bacteria and contrast the results with traditional data analysis techniques. They found that TDA is able to extract useful information from samples with high signal-to-noise ratio. Nielson et al. (2015) use TDA for data-driven discovery in preclinical traumatic brain injury and spinal cord injury datasets. They mention the possible utility of TDA in clinical decision making, treatment planning, and rapid diagnosis.

3 Methodology

In this section we first review the MAPPER construction for approximating the nerve of a dataset, and then describe the topological hierarchical decomposition (THD) in terms of MAPPER. For background on general topology and metric spaces, consult Munkres (2000), and for algebraic topology and simplicial complexes see Hatcher (2002). Let $X$ and $Z$ be two metric spaces, and $f : X \to Z$ a continuous map between them. We will call $X$ the data space, $Z$ the filter space, and $f$ a filter function or lens. Given an open covering $\mathcal{U}$ on $Z$, let $f^{*}(\mathcal{U})$ denote the pullback cover in $X$, which is obtained by splitting the preimage $f^{-1}(U)$ of each $U \in \mathcal{U}$ into its connected components and collecting them to obtain an open cover of $X$. Then the MAPPER construction is the simplicial complex obtained by taking the nerve $N$ of the pullback cover Singh et al. (2007):

$$M(\mathcal{U}, f) = N\big(f^{*}(\mathcal{U})\big) \qquad (1)$$

An illustration of taking the nerve of a simple open cover is given in Figure 1. The left panel shows a collection of open sets forming a cover. Open sets become vertices in the resulting simplicial complex in the right panel, and whenever two open sets intersect, there is an edge between their corresponding vertices in the complex. When three sets intersect, there is a 2-simplex between their vertices in the complex, represented as a triangle in the figure.

Figure 1: Illustration of the nerve of a simple collection of open sets
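As a rough illustration of the nerve construction, the following sketch uses finite sets of point labels as stand-ins for open sets. The function name `nerve` and the example cover are assumptions made for illustration only and do not come from the paper.

```python
from itertools import combinations


def nerve(cover, max_dim=2):
    """Return the simplices of the nerve of `cover` up to dimension `max_dim`.

    `cover` is a list of Python sets standing in for open sets (an assumption
    of this sketch). A simplex is reported as a tuple of indices into `cover`
    whenever the corresponding sets have a common element.
    """
    simplices = []
    for k in range(1, max_dim + 2):  # k sets span a (k - 1)-simplex
        for idx in combinations(range(len(cover)), k):
            if set.intersection(*(cover[i] for i in idx)):
                simplices.append(idx)
    return simplices


# Hypothetical cover: sets 0, 1 and 2 all contain point 3; set 3 is isolated.
cover = [{1, 2, 3}, {3, 4, 5}, {3, 5, 6}, {7, 8}]
simplices = nerve(cover)
edges = [s for s in simplices if len(s) == 2]      # the 1-skeleton's edges
triangles = [s for s in simplices if len(s) == 3]  # 2-simplices
```

Keeping only the vertices and `edges` gives the undirected graph used as a topological network in the rest of the paper.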

For our purposes, we only need the 1-skeleton of this simplicial complex, which can be thought of as an undirected graph which we call a topological network. For an example of such a topological network, see Figure 4. In practice, the pullback cover is constructed by binning points in the filter space $Z$ and running a clustering algorithm on the inverse image of each bin in the data space $X$. A common binning technique to generate a cover is to use generalized intervals, which are Cartesian products of the form $\prod_{i=1}^{n} (a_i, b_i)$, where $n$ is the dimension of the filter space $Z$. This is most easily understood in one dimension, where typically bins of the form $(c_i - \epsilon, c_i + \epsilon)$ are used, where $\epsilon$ can be viewed as a parameter controlling the size of the bins, and therefore their overlap, and the $c_i$ are evenly spaced centers of the bins. Alternatively, we can choose a number of bins $r$ and an overlap parameter $g$ and determine $\epsilon$ from these. We will denote the resulting open cover on $Z$ by $\mathcal{U}_{r,g}$ and call $r$ the resolution of the covering and $g$ the overlap parameter. For more discussion of the mathematics of MAPPER and its properties, see Dey et al. (2016).
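The one-dimensional binning described above can be sketched as follows. This is a minimal illustration under a stated assumption: the overlap is expressed as the fraction of each bin's length shared with its neighbor, which may not match how the overlap (gain) parameter is defined in the paper or in the software used there. The function name `interval_cover` is hypothetical.

```python
def interval_cover(lo, hi, n_bins, overlap):
    """Build evenly spaced overlapping intervals (c_i - eps, c_i + eps).

    `n_bins` plays the role of the resolution. `overlap` is ASSUMED here to be
    the fraction of a bin's length shared with the next bin, in [0, 1).
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must lie in [0, 1)")
    width = (hi - lo) / n_bins             # spacing between bin centers
    eps = width / (2 * (1 - overlap))      # half-width giving that overlap
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    return [(c - eps, c + eps) for c in centers]


# Five bins on [0, 10] where each bin shares half its length with the next.
cover = interval_cover(0.0, 10.0, n_bins=5, overlap=0.5)
```

With these settings the bin centers are 1, 3, 5, 7, 9 and the half-width is 2, so neighboring intervals such as (-1, 3) and (1, 5) share an interval of length 2.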

A topological hierarchical decomposition (THD) is an algorithm for decomposing a dataset into smaller groups based on iterative applications of MAPPER. We start with the entire dataset $X$ and an initial resolution $r_0$, which is usually very small. Keeping the overlap parameter $g$ fixed throughout the entire process, we run MAPPER on $X$ to obtain the topological network $G$. This network becomes the root node of a tree structure called the THD tree. We then look at the connected components of this network, say $C_1, \dots, C_k$. For each $C_i$, if the number of data points in this connected component is above some threshold $t$, we say that we have observed a split. For each split, there will be a branch in the THD tree starting from the current network $G$. We then proceed to recursively split each subset of the data with a size above the threshold using the same THD process. If for any network we do not observe a split, then we increase the resolution by some increment $\Delta r$ and run MAPPER again on the corresponding dataset. This creates a linear (i.e. non-branching) path in the THD tree until the next split is observed. An example of such a path in a THD tree between splits can be seen in the bottom-center of Figure 2.
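The recursion described above can be sketched on toy data. This is not the authors' implementation: `components` is a deliberately simplified stand-in for MAPPER on one-dimensional values (identity lens, no clustering inside bins). The sketch also assumes that a split means at least two connected components reach the threshold, that a `max_resolution` cap stops the search, and that only the resolution at which a split occurs is recorded (the intermediate networks on a linear path are omitted). All names are hypothetical.

```python
def components(values, ids, n_bins, overlap=0.5):
    """Toy MAPPER stand-in: groups of point ids linked through shared bins."""
    xs = [values[i] for i in ids]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0      # avoid zero width for constant data
    eps = width / (2 * (1 - overlap))
    comps = []
    for b in range(n_bins):
        center = lo + (b + 0.5) * width
        node = {i for i in ids if abs(values[i] - center) <= eps}
        if not node:
            continue
        keep = []
        for comp in comps:                 # merge components sharing a point
            if comp & node:
                node |= comp
            else:
                keep.append(comp)
        comps = keep + [node]
    return comps


def thd(values, ids, resolution=1, step=1, threshold=20, max_resolution=50):
    """Recursively decompose `ids`; returns a nested dict describing the tree."""
    tree = {"size": len(ids), "split_resolution": None, "children": []}
    while resolution <= max_resolution:
        large = [c for c in components(values, ids, resolution)
                 if len(c) >= threshold]
        if len(large) >= 2:                # ASSUMED definition of a split
            tree["split_resolution"] = resolution
            tree["children"] = [
                thd(values, sorted(c), resolution, step, threshold,
                    max_resolution)
                for c in large
            ]
            break
        resolution += step                 # no split yet: refine the cover
    return tree


# Hypothetical data: two well-separated groups of 30 points each.
values = [i / 29 for i in range(30)] + [10 + i / 29 for i in range(30)]
tree = thd(values, list(range(len(values))))
```

On this toy data the root does not split at resolution 1 (a single bin holds everything) and splits into two groups of 30 at resolution 2. Neither child can split further, because no two components of at least 20 points fit in 30 points.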

The idea of a THD should not be confused with the similar idea of persistent homology Zomorodian and Carlsson (2005), of which the purpose is to estimate the topological structure of a space from a sample of points in it Edelsbrunner and Harer (2008). Homology is a global structure of a space, determined by the totality of its points. In contrast, THD aims to decompose a space and understand its local features in terms of the resulting groups or clusters. Furthermore, since at each split the dataset fed to MAPPER changes, there is no persistent structure (such as a persistence module) defined for THD. For a method of extracting a persistence module from a dataset using MAPPER, consult the multi-scale MAPPER of Dey et al. (2016).

In fact, a THD has more in common with the cluster tree Stuetzle (2003); Chaudhuri and Dasgupta (2010), a hierarchy constructed from a density function that can be estimated with clustering algorithms such as single-linkage Gower and Ross (1969). However, unlike these algorithms, which fully partition the dataset, not all data points will be present in the leaf nodes of a THD tree. This is similar to density-based algorithms such as DBSCAN Ester et al. (1996), which is able to classify points as outliers that are not included in the high-density clusters Kriegel et al. (2011). However, in a THD the outliers are not based on an (estimated) density but on small connected components of a topological graph obtained from the MAPPER construction.

We implemented the THD process described above to decompose a dataset and examine the distribution of feature values at each split to determine the significant differences between groups that cause splits. We also look at the distribution of a label feature in each group which wasn’t used in building the THD, and use the significant features to explain the deviation of this label feature in a group from the global average. From this process we extract a structure superficially similar to a decision tree, where branches are not determined by the values of a single feature, but instead by significant differences in the values of a group of features at each split in the THD. Moreover, these splits are constructed in a completely unsupervised way as only the non-label features were used in building the THD, whereas a decision tree is constructed using labels in the scoring function to decide which feature to split on. This unsupervised decision-structure may be combined with the decisions made by a supervised learning algorithm to explain those decisions by tracing the path of a single individual through the THD and looking at significant characteristics of the groups the individual inhabits.

4 Results

This study uses the FICO Explainable Machine Learning dataset (hereafter the HELOC dataset) made available by FICO FICO (2018). The dataset consists of anonymized home equity line-of-credit (HELOC) loan applications made by homeowners requesting a loan in the range of $5,000 to $150,000. The target (label) feature is called RiskPerformance, and is a categorical value: "Bad" if the consumer was more than 90 days past the due date on a payment in the 24 months after their credit account was opened, and "Good" otherwise. The dataset contains 5,000 "Good" individuals and 5,459 "Bad" individuals for a total of 10,459 samples, giving a distribution of 48% "Good" individuals and 52% "Bad" individuals.

We did not seek to predict RiskPerformance, but merely to explain its value given the other features in an unsupervised way using a THD. To establish the difference between groups, we compared them statistically, using the Kolmogorov-Smirnov (KS) score for continuous variables and a test based on the hypergeometric distribution for categorical ones. By doing this for several splits in the THD, we can then track the path of an individual throughout the THD, using the "choice" of which branch to follow at each split to "tell a story" about why this person was or was not able to pay back a loan on time. From these branches we were able to extract illustrative explanations for whether individuals in a group were 90 or more days delinquent 24 months after taking out a loan.
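The two kinds of group comparison mentioned above can be sketched with the standard library alone. This is an illustration under assumptions, not the procedure actually run in the study: the function names are hypothetical, the KS value is the plain two-sample statistic (no p-value), and the group of 200 applicants with 177 "Bad" labels (88.5%) is a made-up example. Only the totals of 10,459 samples and 5,459 "Bad" come from the dataset description.

```python
from bisect import bisect_right
from math import comb


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement."""
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return hits / total


# Continuous feature: fully separated samples give a KS statistic of 1.
separated = ks_statistic([0.1, 0.2, 0.3], [0.7, 0.8, 0.9])
identical = ks_statistic([1, 2, 3], [1, 2, 3])

# Categorical value: chance of seeing at least 177 "Bad" labels in a
# HYPOTHETICAL group of 200 drawn from 10,459 samples with 5,459 "Bad".
p_value = hypergeom_upper_tail(177, 10459, 5459, 200)
```

A very small `p_value` indicates that the category is over-represented in the group relative to the whole dataset, which is the kind of difference used to characterize a split.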

Figure 2: THD tree for VNE metric, NHL as filter with networks for selected groups shown

Two THDs were computed from the entire dataset, using all features but RiskPerformance and ExternalRiskEstimate. We excluded the ExternalRiskEstimate because it was obtained from an outside source and may not be as useful in an explanation. For our initial resolution we always used 1, which gives a topological network with one node containing the entire dataset. This resolution is increased until the first split occurs, and then THD is run recursively on each branch until no connected components with a number of points above the threshold remain. The gain (overlap parameter) was 2.7 for all THDs, and remains fixed throughout the whole process. The split threshold was set to 20 points, i.e. a connected component with 20 data points (not just nodes) would be considered a split.

We used the variance-normalized Euclidean distance as the metric for both THDs, and for filters we built one THD with a neighborhood lens (NHL), which is analogous to the first two components of t-SNE Maaten and Hinton (2008), and the other with the first two multi-dimensional scaling (MDS) Kruskal (1964a, b) coordinates. The Ayasdi Platform Ayasdi (2018) was used to build the topological networks in the THD with MAPPER. The portion of the THD tree for the NHL filter showing splits is given in Figure 2. Topological networks for selected nodes are shown as well, colored by RiskPerformance where red means more "Good" individuals and blue more "Bad" individuals. The tree for the MDS filter, not shown, exhibits different behavior in splitting, where there are a lot of small splits until the last few large splits are reached. In the NHL filter THD, there is a significant split early on which is not observed with the MDS filter. In both THDs, there seems to be a large split at the end, with two large groups that are able to be further decomposed.
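The text does not spell out the formula for the variance-normalized Euclidean metric. A minimal sketch follows, assuming the common definition in which each squared coordinate difference is divided by that feature's variance over the dataset. The use of the population variance and the function name `vne_distance` are assumptions, and the platform used in the study may compute the metric differently.

```python
from math import sqrt
from statistics import pvariance


def vne_distance(x, y, variances):
    """Euclidean distance after scaling each feature by its variance.

    ASSUMPTION: d(x, y) = sqrt(sum_i (x_i - y_i)^2 / var_i). Features with
    zero variance would need to be dropped first to avoid division by zero.
    """
    return sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, variances)))


# Hypothetical rows with two features on very different scales.
data = [[0.0, 0.0], [2.0, 200.0], [4.0, 400.0]]
variances = [pvariance(column) for column in zip(*data)]
d = vne_distance(data[0], data[1], variances)
```

In this example both features contribute equally to `d` even though the second one is a hundred times larger in raw units, which is the point of the normalization.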

Figure 3: Summary of significant features at high-level splits in the THD tree with VNE metric, NHL as filter

A simplified view of the NHL THD, showing the early splits, is given in Figure 3. The root node gives the number of points and RiskPerformance distribution for the entire dataset. At each split, a summary of the most important features distinguishing this group from other groups in the split is given. Finally, the number of points and the distribution of RiskPerformance values in the group is given. This diagram could be extended to include all splits in a THD, and then used to explain an individual’s performance based on their path through the THD. For example, we observe that individuals falling in Split 1.1.2 could be turned down for a loan as the group has 88.5% of its members unable to pay back on time. Further investigation reveals that most individuals in the group exhibit high credit card utilization, suggesting an explanation for these individuals. We summarize the explanations extracted from Figure 3:

  • (a) Individuals in Split 1.1.2 were unable to pay back a loan on time due to high credit card utilization.

  • (b) Individuals in Split 1.1.3 were unable to pay a loan due to a past history of delinquency, despite low credit card utilization.

  • (c) Individuals in Split 1.1.1.2 were unable to pay a loan due to having few trades, meaning they have less of a credit history, together with a history of delinquency on past trades. This can make such users riskier to lend to.

  • (d) Individuals in Split 1.2.1.2 paid their loan despite a short credit history and few trades; they have a very low rate of delinquency.

We extracted explanations in a similar way from the MDS THD as well. Here the group names such as "Split 1.2" refer to groups in the MDS THD (not shown), and not in the NHL THD:

  • (e) Individuals in Split 2 would be denied a loan due to having a large number of loans with balance, and a high number of inquiries.

  • (f) Individuals in Split 1.2 would be denied a loan due to a history of delinquency over 120 days and a large number of trades with balance.

  • (g) Individuals in Split 1.1.1.1.2 would be denied a loan due to a history of delinquency and a high number of revolving trades with balance. This is in spite of the fact that these individuals have a better external risk estimate than individuals in their sister group of the split.

Note in explanation (d) how the THD is able to identify fine grained groups of customers who could be seen as good loan customers, even though they have a shallow credit history. The THD is further able to find a set of customers likely to be poor loan customers by explanation (b), even though they have relatively low credit utilization. These customers may be counterintuitive in nature, in the sense that their features intuitively suggest that the customer should (not) be granted credit.

Predicting the group of a split where a new applicant would fall within the THD would thus provide both a decision and reason for granting or denying the applicant even when applicant features take on surprising or contradictory values, and the explanation can be presented at a finer or coarser grain depending on the split depth an analyst chooses to select an explanation from. Specifically, the denial of a loan can be explained by an applicant that falls into a group with a high percentage (greater than the global average of 52%) of "Bad" RiskPerformance values, where membership in a group is defined by a distribution of feature values that distinguish the group from others at a split in the THD. These explanations are only based on the features of the applicant, and have nothing to do with the past loaning behavior of the organization. Note that using different settings for the THD will result in different explanations, although there are some similarities such as a history of delinquency and a large number of loans with balance correlating with "Bad" RiskPerformance, and hence these individuals would be denied a loan.
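The decision rule described above (deny when the applicant's group has a "Bad" rate above the global average) can be sketched as follows. The data structure and the function name `explain` are hypothetical. The only figures used are those quoted in the text: 5,459 "Bad" out of 10,459 samples, and the 88.5% "Bad" rate and high credit card utilization reported for Split 1.1.2.

```python
GLOBAL_BAD_RATE = 5459 / 10459  # about 52% of the dataset is "Bad"


def explain(path):
    """Return a decision and the distinguishing features along a THD path.

    `path` lists the groups an applicant falls into, from the root down to
    the deepest group chosen by the analyst (a hypothetical representation).
    """
    leaf = path[-1]
    decision = "deny" if leaf["bad_rate"] > GLOBAL_BAD_RATE else "grant"
    reasons = [
        f'{group["name"]}: {group["feature"]}'
        for group in path
        if group.get("feature")
    ]
    return decision, reasons


path = [
    {"name": "Root", "bad_rate": GLOBAL_BAD_RATE},
    {"name": "Split 1.1.2", "bad_rate": 0.885,
     "feature": "high credit card utilization"},
]
decision, reasons = explain(path)
```

Truncating `path` at a shallower group gives a coarser explanation, while extending it with deeper splits gives a finer one.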

Figure 4: Topological network for the first split in the NHL THD, colored by different features: (a) RiskPerformance (blue=bad, red=good); (b) credit cards with high utilization (blue=less, red=more); (c) revolving trades with balance (blue=less, red=more); (d) percentage installment trades (blue=0%, red=100%)

It is also instructive to look at individual topological networks where a split occurs. An example for the first split in the NHL THD is given in Figure 4. The connected component at the top of the network corresponds to the smaller group labeled "Split 1.2" in Figure 3. The further split in this group can already be seen, as there is a vertex in the upper component that can be removed to split it into two more components. "Split 1.2.1," which has few credit cards with high utilization, can be seen as the right component of this upper network, based on the coloring in Figure 4(b). The topological networks can be used for an even more granular explanation, as we can consider the nodes an individual belongs to in the network as containing similar individuals. We can also look at the immediate neighbors of these nodes to get a very local group of individuals similar to the one under consideration. This ability to go from a high-level, group-based representation, to individual topological networks, and then to just points from a group of related nodes in a network is a novel and useful feature of THD.

Comparison to transparent supervised models

It should be noted that Figure 3 does not appear all that different from a decision tree. Each split in the THD is based on a set of feature properties that differentiate one group from another, which is not unlike a decision tree that makes classification decisions by learning a hierarchy of heuristics to bin data. Moreover, decision trees are inherently transparent in the sense that each path down a tree from root to leaf describes a series of conditions explaining why data is classified.

The key difference between using splits of a THD to provide explanations rather than a decision tree is that THD is an entirely unsupervised technique; in constructing a THD the target feature RiskPerformance is never used. A decision tree, in contrast, is a supervised approach where the target feature is used directly during learning. When this training data is collected based on credit award decisions made by an organization in the past, the decision tree essentially learns a model describing how and why a firm awards credit to applicants. The learned model thus incorporates any potential historical biases or priorities of the organization the training data is from. In taking an unsupervised approach, the THD becomes decoupled from the organization or institution that issues credit: splits in a THD are based on distinguishing features between sets of past applications conditioned on whether they successfully paid their loan. Thus, the THD can lead to automatic loan decisions based solely on the merits of the applicant, instead of a combination of applicant merit and historical firm behavior. Moreover, a THD is theoretically grounded in that it exploits the shape and structure of the underlying manifold of data about applicants, whose characteristics are more likely to be shared across applicants for all forms of credit, not only HELOC. Insights from a THD are thus more likely to be transferable across domains (e.g., to support decisions for other lines of credit besides HELOC), compared to decision tree heuristics that are (over)fitted to a single, specific dataset.

We further note that the THD requires no a priori information about the meaning or importance of each feature. Since these explanations are independent of any machine learning model used in classification, they could thus be used to supplement and explain decisions made by the algorithm. For example, a linear regression may give larger magnitude to weights that were found to correlate with RiskPerformance in THD groups, such as percentage of trades never delinquent and number of trades with balance. Finally, these explanations could also be used to understand a misclassification made by a classifier. The classifier may be weighting the wrong features, i.e. features that correlate with RiskPerformance in a different THD group than the one the point being classified belongs to. Another possibility is that the data point being classified is an outlier: it is in a THD group but has unusual features for that group. THD provides a framework for identifying such points automatically.

5 Conclusion

In this paper we introduced the topological hierarchical decomposition, which constructs a tree-like structure of topological networks by iteratively applying the MAPPER construction to smaller subsets of a dataset, based on connected components of previously computed networks. We described THD based on the parametrization of a family of open covers on the filter space in terms of a resolution parameter, and contrasted it with the ideas of persistent homology and hierarchical clustering. We then constructed two THDs on the FICO Machine Learning Challenge dataset consisting of anonymized HELOC applications. We showed how these THDs can be used to explain related groups of features in the dataset and their influence on the target RiskPerformance feature. We showed examples of THD trees and topological networks that can be used to understand a dataset, and how colorings of a topological network can be used to understand splits in a THD.

An important topic that future studies on THD should cover is an understanding of its mathematical and algorithmic properties. In order to tie THD more closely to persistent homology, it may become necessary to apply some notion of zigzag persistence Carlsson et al. (2009), which allows one to "study the persistence of topological features across a family of spaces or point-cloud sets" Carlsson and de Silva (2010). The "sequence of topological spaces" would be the groups obtained from a THD and the functions between them just inclusion from the smaller group to its parent, since the smaller group is always a subset of the parent group. This describes a hierarchical structure of inclusions, proceeding from leaf nodes in the THD back up to the group containing the entire dataset. Whether one actually obtains a persistent structure from this remains to be seen.

Other future work could look at applications of THD. One could construct a classifier from a THD by building the THD from both training and testing data, looking at the leaf-level network a testing point falls in, and taking a majority vote on the labels of its nearest training points based on nodes in the network. This classifier could then be compared with the state of the art and, if competitive, could provide a more explainable alternative to it. The number of hyperparameters that need to be selected by a human to construct a THD is large. It would be useful to have some way to automatically select these parameters to obtain the "optimal" THD. One approach in a classification context would be to construct several THDs on the training set and then to select the one with the largest information gain based on the label.
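The information-gain criterion suggested above could be computed along the following lines. This is a sketch under assumptions: the function names are hypothetical, the labels are toy data, and the groups are assumed to partition the parent set, whereas a real THD may leave some points out of its leaf groups as outliers.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, groups):
    """Entropy of `parent` minus the size-weighted entropy of `groups`.

    ASSUMPTION: `groups` partition `parent`; points dropped as outliers by a
    THD would need separate handling.
    """
    n = len(parent)
    return entropy(parent) - sum(len(g) / n * entropy(g) for g in groups)


# Toy labels: a decomposition into pure groups versus one that changes nothing.
parent = ["Good"] * 4 + ["Bad"] * 4
pure = information_gain(parent, [["Good"] * 4, ["Bad"] * 4])
mixed = information_gain(
    parent,
    [["Good", "Good", "Bad", "Bad"], ["Good", "Good", "Bad", "Bad"]],
)
```

Among several candidate THDs built with different settings, the one whose leaf groups yield the largest gain would be preferred under this criterion.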

Topological hierarchical decomposition is a versatile tool that can be used for an unsupervised or semi-supervised approach to data analysis. It also provides a new method for understanding predictions made by existing supervised algorithms. Networks in a THD tree can be queried at multiple levels of granularity, providing corresponding levels of explainability. Since it makes use of topological data analysis, it comes with built-in robustness to noisy data. These features make it useful in almost any field where analysis of big data is required.

Acknowledgments

This work was supported by the Air Force Research Laboratory, 711th Human Performance Wing, Airman Systems Directorate with funding provided through Oak Ridge Institute for Science and Education (ORISE). Our work has also been supported by the Ohio Federal Research Network project Human-Centered Big Data. Any opinions, findings, and conclusions or recommendations expressed in this article are those of the author(s) and do not necessarily reflect the views of the Ohio Federal Research Network.

The authors would also like to thank Matthew Piekenbrock for discussions on multiscale MAPPER and hierarchical clustering that were useful in preparing the discussion of THD and comparisons with other techniques in Section 3.

References

  • Ayasdi [2018] Ayasdi. Ayasdi platform, 2018. URL https://www.ayasdi.com/platform/.
  • Carlsson [2009] Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, 2009.
  • Carlsson and de Silva [2010] Gunnar Carlsson and Vin de Silva. Zigzag persistence. Foundations of Computational Mathematics, 10(4):367–405, Aug 2010. ISSN 1615-3383. doi: 10.1007/s10208-010-9066-0. URL https://doi.org/10.1007/s10208-010-9066-0.
  • Carlsson et al. [2009] Gunnar Carlsson, Vin De Silva, and Dmitriy Morozov. Zigzag persistent homology and real-valued functions. In Proceedings of the twenty-fifth annual symposium on Computational geometry, pages 247–256. ACM, 2009.
  • Chaudhuri and Dasgupta [2010] Kamalika Chaudhuri and Sanjoy Dasgupta. Rates of convergence for the cluster tree. In Advances in Neural Information Processing Systems, pages 343–351, 2010.
  • Dey et al. [2016] Tamal K Dey, Facundo Mémoli, and Yusu Wang. Multiscale mapper: Topological summarization via codomain covers. In Proceedings of the Twenty-seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 997–1013. SIAM, 2016.
  • Došilović et al. [2018] F. K. Došilović, M. Brčić, and N. Hlupić. Explainable artificial intelligence: A survey. In 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pages 0210–0215, May 2018. doi: 10.23919/MIPRO.2018.8400040.
  • Edelsbrunner and Harer [2008] Herbert Edelsbrunner and John Harer. Persistent homology-a survey. Contemporary mathematics, 453:257–282, 2008.
  • Ester et al. [1996] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd, volume 96, pages 226–231, 1996.
  • FICO [2018] FICO. Explainable machine learning challenge, 2018. URL https://community.fico.com/s/explainable-machine-learning-challenge.
  • Galindo and Tamayo [2000] Jorge Galindo and Pablo Tamayo. Credit risk assessment using statistical and machine learning: Basic methodology and risk modeling applications. Computational Economics, 15(1-2):107–143, 2000.
  • Gower and Ross [1969] John C Gower and Gavin JS Ross. Minimum spanning trees and single linkage cluster analysis. Applied statistics, pages 54–64, 1969.
  • Hatcher [2002] Allen Hatcher. Algebraic Topology. Cambridge University Press, 2002.
  • Khandani et al. [2010] Amir E Khandani, Adlar J Kim, and Andrew W Lo. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11):2767–2787, 2010.
  • Khashman [2010] Adnan Khashman. Neural networks for credit risk evaluation: Investigation of different neural models and learning schemes. Expert Systems with Applications, 37(9):6233–6239, 2010.
  • Kriegel et al. [2011] Hans-Peter Kriegel, Peer Kröger, Jörg Sander, and Arthur Zimek. Density-based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3):231–240, 2011.
  • Kruskal [1964a] Joseph B Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27, 1964a.
  • Kruskal [1964b] Joseph B Kruskal. Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29(2):115–129, 1964b.
  • Kumar et al. [2017] Devinder Kumar, Graham W. Taylor, and Alexander Wong. Opening the black box of financial AI with clear-trade: A class-enhanced attentive response approach for explaining and visualizing deep learning-driven stock market prediction. CoRR, abs/1709.01574, 2017. URL http://arxiv.org/abs/1709.01574.
  • Maaten and Hinton [2008] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
  • Munkres [2000] James R Munkres. Topology. Prentice Hall, 2000.
  • Nicolau et al. [2011] Monica Nicolau, Arnold J Levine, and Gunnar Carlsson. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences, page 201102826, 2011.
  • Nielson et al. [2015] Jessica L Nielson, Jesse Paquette, Aiwen W Liu, Cristian F Guandique, C Amy Tovar, Tomoo Inoue, Karen-Amanda Irvine, John C Gensel, Jennifer Kloke, Tanya C Petrossian, et al. Topological data analysis for discovery in preclinical spinal cord injury and traumatic brain injury. Nature Communications, 6:8581, 2015.
  • Offroy and Duponchel [2016] Marc Offroy and Ludovic Duponchel. Topological data analysis: A promising big data exploration tool in biology, analytical chemistry and physical chemistry. Analytica Chimica Acta, pages 1–11, 2016.
  • Singh et al. [2007] Gurjeet Singh, Facundo Mémoli, and Gunnar E Carlsson. Topological methods for the analysis of high dimensional data sets and 3d object recognition. In SPBG, pages 91–100, 2007.
  • Stuetzle [2003] Werner Stuetzle. Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. Journal of classification, 20(1):025–047, 2003.
  • Wang et al. [2005] Yongqiao Wang, Shouyang Wang, and Kin Keung Lai. A new fuzzy support vector machine to evaluate credit risk. IEEE Transactions on Fuzzy Systems, 13(6):820–831, 2005.
  • Zomorodian and Carlsson [2005] Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. Discrete & Computational Geometry, 33(2):249–274, 2005.