Sensitivity Analysis with Manifolds

09/03/2018
by Alberto Hernández, et al.
Universidad de Costa Rica

The curse of dimensionality is a common problem in statistics and data analysis. Variable sensitivity analysis methods are a well-studied and established set of tools designed to overcome these sorts of problems. However, as this work shows, these methods fail to capture relevant features and patterns hidden within the geometry of the enveloping manifold projected onto a variable. We propose a sensitivity index that captures and reflects the relevance of distinct variables within a model by focusing on the geometry of their projections.


1 Introduction

Let $X = (X_1, \ldots, X_p) \in \mathbb{R}^p$ and $Y \in \mathbb{R}$ be two random variables. Define the non-linear regression model as

\[ Y = m(X) + \varepsilon. \tag{1.1} \]

Here $\varepsilon$ is a random noise independent of $X$. The unknown function $m$ describes the conditional expectation of $Y$ given $X$. Suppose as well that $(X^{(i)}, Y^{(i)})$, for $i = 1, \ldots, n$, is a size-$n$ sample of the random vector $(X, Y)$.

If $p$ is large, the model (1.1) suffers from the curse of dimensionality, a term introduced by Bellman (1957) and Bellman (1961), who showed that the sample size required to fit a model increases with the number of variables. In a statistical context, model selection techniques address the problem using indicators such as the AIC or BIC, or more advanced techniques such as Ridge or Lasso regression. The interested reader can find a comprehensive survey of these methodologies in Hastie et al. (2009).

Professionals in computational modelling deal with these problems all the time; choosing relevant variables is a recurring task for them. Another way to approach the problem is through variable importance measures, or indices. These indices rely on indicators to quantify the relevance or sensitivity of the variables in a model. The works of Saltelli et al. (2004), Saltelli et al. (2008), Saltelli et al. (2009) and Wei et al. (2015) compile different approaches to estimate those indicators. They present techniques based on difference methods in parametric and non-parametric settings, variance-based measures, moment-independent measures, and graphical techniques.

These techniques overlook the geometric arrangement of the data when building the indices. The algorithms simplify the model by simulating or reconstructing the complete relation between $X$ and $Y$ and then present that information in the form of an indicator. Depending on this simplification, they may not consider the geometric properties of the data. For example, most indices will fail to recognize structure when the input variable is of zero sum, treating it as random noise.

Topological data analysis is a recent field of research that aims to overcome these shortcomings. Given a set of points generated in space, it tries to reconstruct the model through an embedded manifold that covers the data set. With this manifold, we can study the characteristics of the model using topological and geometrical tools instead of classic statistical tools.

Two classic tools used to discover the intrinsic geometry of the data are Principal Components Analysis (PCA) and Multidimensional Scaling (MDS). PCA transforms the data into a smaller linear space preserving the statistical variance. The other approach, MDS, performs the same task but preserves the distances between points. Recent methods like the isomap algorithm, developed by Tenenbaum (2000) and expanded by Bernstein et al. (2000) and Balasubramanian (2002), unify these two concepts to allow the reconstruction of a low-dimensional manifold for non-linear functions. Using the geodesic distance, the algorithm identifies the corresponding manifold and searches for lower-dimensional spaces onto which to project it.
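As a small illustration of these two classical tools, the following base-R sketch (with simulated data, purely illustrative) computes a two-dimensional PCA projection and a two-dimensional classical MDS embedding of the same point cloud.

# Illustrative base-R sketch of PCA and classical MDS on simulated data.
set.seed(1)
X   <- matrix(rnorm(100 * 5), ncol = 5)   # toy 5-dimensional point cloud
pca <- prcomp(X)$x[, 1:2]                 # variance-preserving linear projection
mds <- cmdscale(dist(X), k = 2)           # distance-preserving 2-D embedding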

In recent years, new theoretical developments use tools such as persistent homology, simplicial complexes, and Betti numbers to reconstruct manifolds; the reconstruction works for clouds of random data and for functional data, see Ghrist (2008) and Carlsson (2009, 2014). Some examples are presented in Gallón et al. (2013) and Dimeglio et al. (2014). This approach allows handling “Big Data” quickly and efficiently; see, for example, Snášel et al. (2017).

In this work, we aim to connect the concepts of sensitivity analysis with the analysis of topological data. By doing this, it will be possible to create relevance indicators for statistical models using the geometric information extracted from the data.

The outline of this paper is as follows. Section 2 deals with basic notions, both in sensitivity analysis and in topology. In Subsection 2.1 some of the most used and well-known statistical methods are reviewed and commented on; we finish this subsection with an example that motivates this work. Subsection 2.2 deals with preliminaries in homology. Section 3 explains the method used to create our sensitivity index: Subsection 3.1 describes the construction of the neighborhood graph, deals with topics such as the importance of scale and the Ishigami model, and presents a code sketch to determine the radius of proximity; Subsection 3.2 describes the algorithm used to construct the complex through the Vietoris–Rips expansion; and Subsection 3.3 explains our proposed sensitivity index. Section 4 describes our results, including the software and packages used to run our theoretical examples. Subsection 4.1 gives a full description of each theoretical example together with visual aids, such as graphics and tables describing the results. Finally, Section 5 contains our conclusions and explores scenarios for future research.

2 Preliminary Aspects

2.1 Sensitivity Analysis

According to Saltelli et al. (2009), sensitivity analysis (SA) is “the study of how uncertainty in the output of a model can be apportioned to different sources of uncertainty in the model input”. The validation step could be seen as optional, but a complete analysis requires it in order to find hidden patterns and uncertainties in the data, to identify relevant factors, and to validate simplifications of the problem.

We can classify sensitivity analysis methods into one-at-a-time and global sensitivity analysis. In the one-at-a-time methods, one variable is sampled at a time while keeping the rest constant. These methods reveal the pattern of the model; however, they only work if the problem is linear. Otherwise, features of the data can be missed and misleading conclusions can arise. Correlation and regression sensitivity analyses share this problematic behavior. We refer the reader to Saltelli et al. (2006) for a further review of the flaws and shortcomings of linear sensitivity analysis models.

To discover the nonlinear patterns of the data one could use other techniques. Some popular and well-documented ones are screening, the Morris method, global variance-based measures, and the moment-independent measures. For instance, the Morris method creates an approximation of the variation over the variable we want to study. This method can detect non-linearity but, due to its nature, assumes monotonicity, a characteristic that is not always present in complex models (Hamby (1995)). Higher-order effects are difficult to estimate due to their demanding computational requirements (Saltelli et al. (2006)).

One popular method to assess relevant variables is variance-based global sensitivity analysis. The method was proposed by Sobol’ (1993) based on the ANOVA decomposition. He proved that if $f$ is a square integrable function, then it can be decomposed on the unit cube as

\[ f(x_1, \ldots, x_p) = f_0 + \sum_{i=1}^{p} f_i(x_i) + \sum_{i < j} f_{ij}(x_i, x_j) + \cdots + f_{1 2 \cdots p}(x_1, \ldots, x_p), \tag{2.1} \]

where each term is also square integrable over the domain and depends only on the variables indicated by its indices. This decomposition has $2^p$ terms and the first one, $f_0$, is constant; the remaining terms are non-constant functions. Sobol’ also proved that this representation is unique if each term has zero mean and the functions are pairwise orthogonal. Equation (2.1) can be seen as the decomposition of the output variable into the effects due to the interaction of none, one or multiple variables. Taking expectations in Equation (2.1) and simplifying the expression we get

\[ f_0 = \mathbb{E}[Y], \qquad f_i(x_i) = \mathbb{E}[Y \mid X_i = x_i] - f_0, \qquad f_{ij}(x_i, x_j) = \mathbb{E}[Y \mid X_i = x_i, X_j = x_j] - f_i(x_i) - f_j(x_j) - f_0, \]

and so on for all combinations of variables.

With this orthogonal decomposition, we measure the variance of each element. The global-variance method estimates the regression curves (or surfaces) for each dimension (or set of dimensions), removing the effects due to variables in lower dimensions. Then it gauges the variance of each curve (surface), normalized by the total variance of the model. For the first and second-order effects the formulas are

\[ S_i = \frac{\operatorname{Var}\!\big(\mathbb{E}[Y \mid X_i]\big)}{\operatorname{Var}(Y)}, \tag{2.2} \]

\[ S_{ij} = \frac{\operatorname{Var}\!\big(\mathbb{E}[Y \mid X_i, X_j]\big) - \operatorname{Var}\!\big(\mathbb{E}[Y \mid X_i]\big) - \operatorname{Var}\!\big(\mathbb{E}[Y \mid X_j]\big)}{\operatorname{Var}(Y)}. \tag{2.3} \]

To quantify how much of the whole model we can apportion to each variable, the total effect is estimated by

\[ S_{T_i} = 1 - \frac{\operatorname{Var}\!\big(\mathbb{E}[Y \mid X_{\sim i}]\big)}{\operatorname{Var}(Y)}, \]

where $X_{\sim i}$ denotes all the input variables except $X_i$.
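A minimal Monte Carlo sketch in R of the first-order index in Equation (2.2), using the standard pick-freeze estimator; the test function f, the uniform inputs and the sample size are illustrative assumptions and are not the models studied later in the paper.

set.seed(1)
f <- function(x) x[, 1] + 0.5 * x[, 2]^2 + 0.1 * x[, 3]   # hypothetical test model
N <- 10000; p <- 3
A <- matrix(runif(N * p), ncol = p)        # two independent input designs
B <- matrix(runif(N * p), ncol = p)
yA <- f(A)
S  <- numeric(p)
for (i in 1:p) {
  Ci <- B
  Ci[, i] <- A[, i]                        # "freeze" the i-th column from A
  yC <- f(Ci)
  # Cov(yA, yC) estimates Var(E[Y | X_i]); divide by an estimate of Var(Y)
  S[i] <- (mean(yA * yC) - mean(yA) * mean(yC)) / var(yA)
}
round(S, 3)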

The moment-independent indices estimate relevant inputs in another way. They estimate the average distance between the random variable $Y$ and $Y \mid X_i = x_i$ over the values $x_i$. The initial ideas come from Borgonovo et al. (2014), who used monotonic invariant transformations to gauge the distance between two probability measures. Therefore, if the variable $X_i$ is irrelevant or independent of the model, the random variables $Y$ and $Y \mid X_i$ will be almost identical and the impact will be small; otherwise the impact will be large. The following are popular measures using this technique:

  • Kolmogorov–Smirnov: $\mathbb{E}_{X_i}\big[\sup_{y} | F_Y(y) - F_{Y \mid X_i}(y) |\big]$,

  • Radon–Nikodym: $\mathbb{E}_{X_i}\big[\int | f_Y(y) - f_{Y \mid X_i}(y) | \, dy\big]$,

  • Kullback–Leibler: $\mathbb{E}_{X_i}\big[\int f_{Y \mid X_i}(y) \log\!\big(f_{Y \mid X_i}(y)/f_Y(y)\big) \, dy\big]$,

where $F_Y$ and $F_{Y \mid X_i}$ represent the distribution functions of $Y$ and $Y \mid X_i$ respectively, and the corresponding density functions are denoted $f_Y$ and $f_{Y \mid X_i}$.
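As a rough illustration of the Kolmogorov–Smirnov type measure above, the following R sketch approximates the conditional distributions by binning each input; the simulated model, the number of bins and the sample size are illustrative assumptions.

set.seed(1)
n  <- 5000
x1 <- runif(n, -1, 1); x2 <- runif(n, -1, 1)
y  <- x1^2 + 0.1 * x2                      # hypothetical model
ks_index <- function(x, y, bins = 20) {
  Fy  <- ecdf(y)                           # marginal distribution of Y
  grp <- cut(x, breaks = bins)             # crude conditioning on X
  d   <- tapply(y, grp, function(yb) max(abs(Fy(yb) - ecdf(yb)(yb))))
  mean(d, na.rm = TRUE)                    # average distance over the bins
}
c(ks_index(x1, y), ks_index(x2, y))        # larger value = more influential input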

All these methods try to compact the features of the space into a single measure and then average over the set of sampled values.

However, the sample could have particularities not detected by the method. For example, Figure 1 below presents a toy example where the data points are arranged into a circle with a hole in the middle (further details in Section 4). The first variable contains all the information about the model while the second one is a noisy arrangement.

We estimated this example with the algorithms developed by Ratto and Pagano (2010) and Solís (2018). Those algorithms work without specifying the explicit formulation of the model. Both results agree in declaring the two variables irrelevant to the model. In this paper we propose an algorithm that handles such examples and notices these geometric and topological details within the data.

Method                     X1      X2
Solís (2018)               0.007   0.000
Ratto and Pagano (2010)    0.002   0.000
Figure 1: Circle with hole

2.2 Homology

In this subsection we recall some of the topological definitions needed to understand the frame of our work.

According to Barden and Thomas (2003), an $n$-dimensional topological manifold is a Hausdorff space with a countable basis such that each point has an open neighborhood that is homeomorphic to an open subset of $\mathbb{R}^n$.

We recall that Hausdorff means that any two distinct points lie in disjoint open subsets. A countable basis is a countable family of open subsets such that every open subset is the union of a subfamily. The reader may see Barden and Thomas (2003) for details.

In terms of differential manifolds, a chart on a topological manifold $M$ is a pair $(U, \varphi)$ where $U \subseteq M$ is an open subset and $\varphi$ is a homeomorphism that sends $U$ to an open subset $\varphi(U) \subseteq \mathbb{R}^n$. The set $U$ is called a coordinate neighborhood and the set $\varphi(U)$ is called its coordinate space. An atlas for a topological manifold $M$ is a collection of charts $\{(U_\alpha, \varphi_\alpha)\}_\alpha$ such that the coordinate neighborhoods cover the whole manifold:

\[ M = \bigcup_\alpha U_\alpha. \]

To avoid ambiguity, we introduce the coordinate transformations: if two distinct coordinate neighborhoods $U_\alpha$ and $U_\beta$ are not disjoint, then we can define the following functions between the images of the overlapping open subset:

\[ \varphi_\beta \circ \varphi_\alpha^{-1} : \varphi_\alpha(U_\alpha \cap U_\beta) \to \varphi_\beta(U_\alpha \cap U_\beta) \quad \text{and} \quad \varphi_\alpha \circ \varphi_\beta^{-1} : \varphi_\beta(U_\alpha \cap U_\beta) \to \varphi_\alpha(U_\alpha \cap U_\beta). \]

An $n$-differential manifold is a topological manifold of dimension $n$ together with a maximal smooth atlas on it. The reader may consult Lang (1963) for details on differential manifolds.

To define a simplex, consider the unit interval $I = [0, 1]$ and two topological spaces $X$ and $Y$. Consider the triple product $X \times I \times Y$ and the quotient $(X \times I \times Y)/\!\sim$, where the relation is given by the identifications

\[ (x, 0, y_1) \sim (x, 0, y_2) \quad \text{and} \quad (x_1, 1, y) \sim (x_2, 1, y). \]

In other words, we are collapsing the subspaces $X \times \{0\} \times Y$ and $X \times \{1\} \times Y$ to $X$ and $Y$ respectively. To visualize it, take for instance two real closed intervals. Then we are collapsing two opposite faces of a cube onto line segments so that the cube becomes a tetrahedron:


According to Hatcher (2002) (see also Munkres (1984)), a $\Delta$-complex is a generalization built from simplices. Recall that $0$-simplices are points, called vertices. Likewise, $1$-simplices are line segments, called edges, $2$-simplices are called faces, and $3$-simplices are called tetrahedra. The generalization to dimension $n$ is a convex set in $\mathbb{R}^m$ spanned by $n + 1$ distinct points $v_0, v_1, \ldots, v_n$ that do not lie in a common hyperplane of dimension less than $n$ or, equivalently, such that the vectors $v_1 - v_0, \ldots, v_n - v_0$ are linearly independent. In such a case, we call the points $v_0, \ldots, v_n$ vertices, and the usual notation is $[v_0, v_1]$ for edges, $[v_0, v_1, v_2]$ for faces, $[v_0, v_1, v_2, v_3]$ for tetrahedra, and $[v_0, \ldots, v_n]$ for $n$-simplices.

A $\Delta$-complex is a quotient topological space of a collection of disjoint simplices obtained by identifying some of their faces via certain homeomorphisms that preserve the order of the vertices. Those homeomorphisms are linear, and the ordering gives a consistent notation $[v_0, \ldots, v_n]$ for $n$-simplices.

Now we may define the simplicial homology groups of a $\Delta$-complex $X$ as follows. Consider the free abelian group $\Delta_n(X)$ with the open $n$-simplices $e^n_\alpha$ of $X$ as basis elements. The elements of this group, known as $n$-chains, are finite linear combinations of the form

\[ \sum_\alpha n_\alpha \, e^n_\alpha \tag{2.4} \]

with integer coefficients $n_\alpha$. We can also write chains as linear combinations of characteristic maps,

\[ \sum_\alpha n_\alpha \, \sigma_\alpha, \tag{2.5} \]

where every $\sigma_\alpha : \Delta^n \to X$ is the characteristic map of the corresponding $e^n_\alpha$, with image the closure of $e^n_\alpha$. So a chain is a finite collection of $n$-simplices in $X$ with integer multiplicities. The boundary of the $n$-simplex $[v_0, \ldots, v_n]$ consists of the various $(n-1)$-simplices $[v_0, \ldots, \hat{v}_i, \ldots, v_n]$, where the hat indicates that the vertex $v_i$ is omitted. For chains, the boundary of $[v_0, \ldots, v_n]$ is the oriented $(n-1)$-chain

\[ \partial [v_0, \ldots, v_n] = \sum_{i=0}^{n} (-1)^i \, [v_0, \ldots, \hat{v}_i, \ldots, v_n], \tag{2.6} \]

which is a linear combination of faces. This allows us to define the boundary homomorphisms $\partial_n : \Delta_n(X) \to \Delta_{n-1}(X)$ for a general $\Delta$-complex $X$ as follows:

\[ \partial_n(\sigma_\alpha) = \sum_{i=0}^{n} (-1)^i \, \sigma_\alpha \big|_{[v_0, \ldots, \hat{v}_i, \ldots, v_n]}. \tag{2.7} \]

Thence, we get a sequence of homomorphisms of abelian groups

\[ \cdots \longrightarrow \Delta_{n+1}(X) \xrightarrow{\partial_{n+1}} \Delta_n(X) \xrightarrow{\partial_n} \Delta_{n-1}(X) \longrightarrow \cdots \longrightarrow \Delta_0(X) \longrightarrow 0, \]

where $\partial_n \circ \partial_{n+1} = 0$ for all $n$; this is usually known as a chain complex. Since $\partial_n \circ \partial_{n+1} = 0$, we have $\operatorname{Im} \partial_{n+1} \subseteq \operatorname{Ker} \partial_n$, and so we define the simplicial homology group of $X$ as the quotient

\[ H_n^{\Delta}(X) = \operatorname{Ker} \partial_n / \operatorname{Im} \partial_{n+1}. \tag{2.8} \]

The elements of the kernel are known as cycles and the elements of the image are known as boundaries.
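For instance, for a single 2-simplex the composition of two boundary maps vanishes, which is the computation behind the relation $\partial_n \circ \partial_{n+1} = 0$:

\begin{align*}
\partial_2 [v_0, v_1, v_2] &= [v_1, v_2] - [v_0, v_2] + [v_0, v_1], \\
\partial_1 \partial_2 [v_0, v_1, v_2] &= (v_2 - v_1) - (v_2 - v_0) + (v_1 - v_0) = 0.
\end{align*}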

Easy computations give us the simplicial homology of some examples: for the circle,

\[ H_0(S^1) \cong \mathbb{Z}, \qquad H_1(S^1) \cong \mathbb{Z}, \qquad H_n(S^1) = 0 \ \text{for } n \geq 2, \]

and for the torus,

\[ H_0(T^2) \cong \mathbb{Z}, \qquad H_1(T^2) \cong \mathbb{Z} \oplus \mathbb{Z}, \qquad H_2(T^2) \cong \mathbb{Z}, \qquad H_n(T^2) = 0 \ \text{for } n \geq 3. \]

In a very natural way, one can extend this process to define the singular homology groups $H_n(X)$. This extension is not trivial, but it is natural. If $X$ is a $\Delta$-complex with finitely many $n$-simplices, then $H_n^{\Delta}(X)$ (and of course $H_n(X)$) is finitely generated. The $n$-th Betti number of $X$ is the number of summands isomorphic to the additive group $\mathbb{Z}$. The reader may see Hatcher (2002) or Munkres (1984) for details.

Definition 2.1 (VR neighborhood graph).

Given a set of points $S \subseteq \mathbb{R}^d$ and a scale $\epsilon > 0$, the VR neighborhood graph is the graph $G_\epsilon(S) = (V, E)$ with $V = S$ and

\[ E = \big\{ \{u, v\} : u, v \in S, \ u \neq v, \ \| u - v \| \leq \epsilon \big\}. \]

Definition 2.2 (VR expansion).

Given a neighborhood graph $G$, its Vietoris–Rips complex $\mathrm{VR}(G)$ is defined by the rule that a simplex belongs to $\mathrm{VR}(G)$ whenever all of its edges are in $G$; in that case all of its faces belong to $\mathrm{VR}(G)$ as well. For $\sigma = \{v_0, \ldots, v_k\}$ we have

\[ \sigma \in \mathrm{VR}(G) \iff \{v_i, v_j\} \in E(G) \ \text{for all } i \neq j, \]

where $\sigma$ is a simplex on the vertices of $G$.
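A minimal R sketch of Definitions 2.1 and 2.2 using the igraph package: the neighborhood graph is built from the pairwise distances at a scale $\epsilon$, and the simplices of the Vietoris–Rips complex are read off as the cliques of that graph. The toy point cloud and the value of $\epsilon$ are illustrative assumptions.

library(igraph)
set.seed(1)
S   <- cbind(runif(30), runif(30))          # toy point cloud in the unit square
eps <- 0.25                                 # scale of the neighborhood graph
D   <- as.matrix(dist(S))                   # pairwise Euclidean distances
A   <- 1 * (D <= eps & D > 0)               # adjacency at scale eps (Definition 2.1)
G   <- graph_from_adjacency_matrix(A, mode = "undirected")
simplices <- cliques(G, min = 1, max = 3)   # vertices, edges and triangles (Definition 2.2)
length(simplices)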

3 Methodology

Recall the model (1.1). Here the random variables $X_1, \ldots, X_p$ are distorted by the function $m$ and its topology. Our aim is to measure how much each of the $X_i$ influenced this distortion, i.e., we want to determine which variables influence the model the most.

In Section 2.1 we reviewed the most common indices to estimate the sensitivity of $Y$ with respect to each $X_i$. They are based on statistical measures to account for that sensitivity. In our case, we want to consider the geometry of the point cloud and its enveloping manifold, and create an index that reveals information about the model.

The first step is to create a neighborhood graph for the point cloud formed by each pair $(X_i, Y)$, where an edge is set if two nodes are within a Euclidean distance of $\epsilon$. In this way, we connect only the nodes within a fixed distance of each other. With this neighborhood graph, we construct the persistent homology using the method of Zomorodian (2010) for the Vietoris–Rips (VR) complex.

The algorithm of Zomorodian (2010) works in two phases: first it creates a VR neighborhood graph (Definition 2.1) and then it builds, step by step, the VR complex (Definition 2.2).

These definitions yield the two-phase procedure to construct the Vietoris–Rips complex at resolution $\epsilon$:

  1. Using Definition 2.1, compute the neighborhood graph $G_\epsilon$ with parameter $\epsilon$.

  2. Using Definition 2.2, compute the VR expansion of $G_\epsilon$. From now on we work with the resulting complex.

This scheme provides us with a geometric skeleton for the data point cloud at resolution $\epsilon$. If $\epsilon$ is large, there will be more edges connecting points, and we could end up with a large, densely interconnected graph carrying little information. Otherwise, if $\epsilon$ is small, there will be fewer edges connecting points, resulting in a sparse graph that misses relevant topological features within the data cloud.

In the second step, we unveil the topological structure of the neighborhood graph through the Vietoris–Rips complex. The expansion builds the cofaces related to our simplicial complex. In Section 3.2 we discuss further the algorithm used for this purpose.

3.1 Neigborhood graph

The neighborhood graph collects the vertices and, for each vertex, adds all the edges to points within distance $\epsilon$. This brute-force operation works in $O(n^2)$ time. After considering a variety of theoretical examples, it becomes clear that the scale factor of the data set is relevant. The scale of one variable may differ from the scale of the output by many orders of magnitude. Thus, proximity is relative to the scale of the axis on which the information is presented, and the proximity neighborhood could be misjudged.

The Ishigami model, presented in Figure 2 below, shows how the proximity neighborhood becomes distorted when the true scale of the data is taken into consideration.

Figure 2: The second variable of the Ishigami model rescaled, with a circle of radius 1 (left). The same circle drawn in the true scale of the data (right).

We conclude that the aspect ratio between both variables defines how the algorithm constructs the neighborhood graph. Therefore, to use circles to build the neighborhood graph, we need to set both variables on the same scale. The following algorithm constructs the VR neighborhood graph for a cloud of points with arbitrary scales; a runnable sketch in R is given after the numbered steps below.

Data: a set of points $(x_i, y_i)$, $i = 1, \ldots, n$, and a quantile level $q$.
Result: the neighborhood graph.
Function Create-VR-Neighborhood(points, q):
    rescale the x and y coordinates onto [0, 1]
    for i = 1:n do
        for j = 1:n do
            DistanceMatrix[i, j] <- distance between points i and j
        end for
    end for
    epsilon <- Quantile(DistanceMatrix, q)
    AdjacencyMatrix <- (DistanceMatrix <= epsilon)
    NeighborhoodGraph <- CreateGraph(AdjacencyMatrix, xCoordinates, yCoordinates)
    return NeighborhoodGraph
end
  1. Rescale the points onto the unit square $[0, 1] \times [0, 1]$.

  2. Estimate the distance matrix between points.

  3. From the distance matrix, estimate the chosen quantile of the distances and declare the radius to be this quantile.

  4. Using Definition 2.1, build the VR neighborhood graph with this radius for each projection.

  5. Rescale the data points back to their original scale.
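The following R sketch implements the five steps above; the function name, the default quantile level and the use of igraph are our own illustrative choices rather than the exact implementation of the paper.

library(igraph)
create_vr_neighborhood <- function(x, y, q = 0.05) {
  rescale <- function(v) (v - min(v)) / (max(v) - min(v))
  pts <- cbind(rescale(x), rescale(y))                         # step 1: unit square
  D   <- as.matrix(dist(pts))                                  # step 2: distance matrix
  eps <- quantile(D[upper.tri(D)], q)                          # step 3: radius as a quantile
  A   <- 1 * (D <= eps & D > 0)
  g   <- graph_from_adjacency_matrix(A, mode = "undirected")   # step 4: Definition 2.1
  list(graph = g, radius = eps, points = pts)                  # points can be mapped back (step 5)
}
# Example use for one projection: out <- create_vr_neighborhood(X1, Y)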

3.2 VR expansion

In Zomorodian (2010), Zomorodian describes three methods to build the Vietoris–Rips complex. The first approach builds the complex by adding the vertices, then the edges, and then increasing the dimension to create triangles, tetrahedra, and so on. The second method starts from an empty complex and adds, step by step, all the simplices, stopping at the desired dimension. The third one takes advantage of the fact that the VR complex is the collection of cliques of the graph up to the desired dimension.

Due to its simplicity, we adopt the third approach and detect the cliques of the graph. We use the algorithm in Eppstein et al. (2010), which is a variant of the classic algorithm of Bron and Kerbosch (1973). This algorithm orders the vertices of the graph and then computes the cliques using the Bron–Kerbosch method without pivoting. This procedure reduces the worst-case running time from $O(3^{n/3})$ to $O(d\,n\,3^{d/3})$, where $n$ is the number of vertices and $d$ is the smallest value such that every nonempty subgraph of $G$ contains a vertex of degree at most $d$ (the degeneracy of the graph).
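In R, clique detection is directly available through igraph; the short, self-contained sketch below (the simulated neighborhood graph and the scale 0.3 are assumptions) lists the triangles and the maximal cliques of such a graph.

library(igraph)
set.seed(1)
P <- cbind(runif(40), runif(40))                 # toy point cloud
D <- as.matrix(dist(P))
g <- graph_from_adjacency_matrix(1 * (D <= 0.3 & D > 0), mode = "undirected")
tri <- cliques(g, min = 3, max = 3)              # 2-simplices (triangles) of the VR complex
mx  <- max_cliques(g)                            # maximal cliques of the neighborhood graph
length(tri); max(sapply(mx, length))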

Constructing the manifold via the VR complex is not efficient in the sense that the cofaces may overlap, increasing the computational time required. One can overcome this issue by creating an ordered neighborhood graph.

3.3 Sensitivity Index construction

One of the main objectives of a sensitivity analysis is to discover patterns between the inputs and the output and to determine which of those patterns are more influential for the model in consideration. In our case, the patterns are described through empty spaces in the projection space generated by each individual variable. If the point cloud fills the whole domain, then the unknown function applied to that variable produces erratic values of $Y$. Otherwise, the function sheds a structural pattern which can be recognized geometrically.

The VR complex estimates the geometric structure of the data by filling the voids between close points. We then estimate the area of the created object. This number alone does not give much information about the influence that the variable has within the model, so we also estimate the area of the minimum rectangle containing the whole object. If a variable has a weak influence on the model, its behavior will be almost random, with uniformly distributed points filling the rectangular box. However, if the variable has some relevant influence, it will create a pattern causing empty spaces to appear across the box. Therefore, we can define the measure

\[ S_i = 1 - \frac{\operatorname{Area}(\text{VR object})}{\operatorname{Area}(\text{bounding box})}. \]

Notice that if the areas of the object and the box are similar, then the index is close to zero. Otherwise, if there are many empty spaces, the two areas differ and the index approaches 1.
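A sketch of this index with the sp and rgeos packages used later in the paper: the VR triangles are turned into polygons, their union approximates the object, and the index compares its area with that of the bounding rectangle. The toy annulus data, the scale 0.3 and the helper structure are our own assumptions, not the authors' exact implementation.

library(igraph); library(sp); library(rgeos)
set.seed(1)
theta <- runif(200, 0, 2 * pi); r <- runif(200, 1, 2)
P <- cbind(r * cos(theta), r * sin(theta))                 # toy "circle with a hole"
D <- as.matrix(dist(P))
g <- graph_from_adjacency_matrix(1 * (D <= 0.3 & D > 0), mode = "undirected")
tri <- cliques(g, min = 3, max = 3)                        # triangles of the VR complex
polys <- SpatialPolygons(lapply(seq_along(tri), function(k) {
  idx <- as.integer(tri[[k]])
  Polygons(list(Polygon(P[c(idx, idx[1]), ])), ID = as.character(k))
}))
obj   <- gUnaryUnion(polys)                                # union of all triangles
box   <- gEnvelope(obj)                                    # minimum bounding rectangle
index <- 1 - gArea(obj) / gArea(box)                       # proposed sensitivity index
index                                                      # well above zero: the hole is detected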

4 Results

To assess the quality of the index described above, we performed numerical experiments. The software used was R (R Core Team (2016)), along with the package igraph (Csárdi and Nepusz (2006)) for all the graph manipulations, and the packages rgeos and sp (Bivand et al. (2013); Pebesma and Bivand (2005); Bivand and Rundel (2013)) for all the geometric estimations. A package containing all these algorithms will be available soon on CRAN.

For all the settings, we sample points with the distribution specified in each case. Given the number of points, we choose the quantile used to determine the radius of the neighborhood graph accordingly for each case. Further insights about this choice are presented in the conclusions section.

We consider five settings, each one with different topological features. The cases are not exhaustive, and there are other settings with interesting features as well. However, through this sample we show how the method captures the sensitivity of the variables where other classic methods have failed, as well as a case for which our method fails to retrieve the desired information.

4.1 Theoretical examples

The examples considered are the following:

Linear

This is a simple setting with

and , and independent random variables. We set for .

Quartic:

This is another simple scheme with

and , an independent random variable. For , .

Circle with hole:

The model in this case is

with and . This creates a circle with a hole in the middle.

Connected circles with holes:

The model is set in two sections, where in both parts we set :

  1. Circle centered in with radius between 1 and 2:

    where .

  2. Circle centered in with radius between 0.5 and 1:

    where .

Ishigami:

The final model is

where for , and .

4.2 Numerical results

The figures referenced in this section represent the estimated manifold for each input variable with respect to the output variable $Y$. The accompanying tables present the radius used to build the neighborhood graph, the estimated area of the manifold object, the area of the reference square, and the proposed index.

The linear model in Figure 3 is simple enough to allow us to declare that one variable has double the relevance of another, while the remaining variables have less relevant indices. As we expected, the index for the most relevant variable almost doubles the counterpart for the second one (0.67 against 0.34 in the table). The example shows how the empty spaces appear according to the relevance level of each variable.

For the Quartic model in Figure 4, we can compute the theoretical Sobol indices according to Equation (2.2). We observe in Figure 4 how the ranking given by the Sobol indices matches that of our algorithm, ordering the variables by relevance.

The Circle-with-hole model in Figure 5 was discussed in the preliminaries. Recall that in this case both variables were declared irrelevant to the model, even though the geometric shape suggests the contrary. Figure 5 presents the results. Observe how the first variable has an index equal to 0.48 while the second one has 0.06. These indices allow us to say that the first variable has a much more relevant impact on the model than the second.

To test our algorithm further, we present the Connected-circles-with-holes model in Figure 6. Here we created two circles with different scales and positions. Even though this choice is an arguable point for the method, we capture the most relevant features of each projection. Again, it is possible to rank the variables: the first with an index equal to 0.58 and the second with an index of 0.08.

The final model is produced by the Ishigami function, shown in Figure 7. This is a popular model in sensitivity analysis because it presents strong nonlinearity and non-monotonicity, with interactions among its variables. With other sensitivity estimators, two of the variables have great relevance to the model, while the third one has almost none. For a further explanation of this function, we refer the reader to Sobol’ and Levitan (1999). In our case, all the variables yield an index near 0.5. This is relevant because our index captures the presence of structure, as in the Circle-with-hole model, whereas other tools treat it as noise.

Variable   Radius   Obj Area   Square Area   Index
X1         0.08     3.78       11.61         0.67
X2         0.11     7.67       11.61         0.34
X3         0.12     10.05      11.61         0.13
X4         0.12     10.19      11.59         0.12
X5         0.12     10.24      11.62         0.12
Figure 3: Results for the Linear case.
Variable   Radius   Obj Area   Square Area   Index
X1         0.06     1.48       5.52          0.73
X2         0.11     3.77       5.80          0.35
X3         0.12     4.80       5.80          0.17
Figure 4: Results for the Quartic case.
Variable   Radius   Obj Area   Square Area   Index
X1         0.10     2.05       3.95          0.48
X2         0.13     3.73       3.97          0.06
Figure 5: Results for the Circle with 1 hole case.
Variable   Radius   Obj Area   Square Area   Index
X1         0.08     8.33       19.98         0.58
X2         0.08     8.27       8.97          0.08
Figure 6: Results for the Circle with 2 holes case.
Variable   Radius   Obj Area   Square Area   Index
X1         0.09     64.06      141.44        0.55
X2         0.07     54.34      133.08        0.59
X3         0.09     69.31      141.42        0.51
Figure 7: Results for the Ishigami case.

5 Conclusions and further research

As mentioned above, the aim of this paper was to build a sensitivity index that relies solely on the topological and geometrical features of a given data cloud, since purely analytic or statistical methods fail to recognize the structure within the projection of certain variables, primarily when the input is of zero sum, that is, what might be considered artificial noise. In such cases, those projections, or the variables in question, have positive conditional variance that might contribute to the model in ways that had not been explored so far.

Our index proved to be reliable in detecting this conditional variance when the variable is of zero sum, differentiating between pure random noise and well-structured inputs, even those of zero sum. In the cases where the model presents pure noise, our index fully coincides with other methods' indices in detecting the relevant structured inputs; in the other cases, our index reflects the presence of structure in all the variables, which was the case we wanted to explore.

As for sensitivity, we cannot yet fully claim that our index measures it accurately in the conventional sense. To achieve the confidence for such a claim, we have identified a series of research problems to be dealt with in the near future, namely: improving the algorithm for the construction of the base graph, or changing it completely, to make the process more efficient and hence allow us to run more sophisticated examples, both theoretical and real-data examples from well-known models. One of the central points to be studied further is the determination of the radius of proximity, which we believe must be given by the data set itself, probably through a more detailed application of persistent homology. Finally, we look forward to extending our method to more than one variable at a time, in order to check for crossed relevance.

We do not claim this to be an exhaustive list of problems related to the improvement of our method, but we are confident that addressing them would help us run more, and more sophisticated, examples, as well as gather more data to compare our results with other methods.

Acknowledgements

We thank Santiago Gallón for enlightening discussions about the subject. His help has been very valuable.

The first and second authors acknowledge the financial support of CIMPA, Centro de Investigaciones en Matemática Pura y Aplicada, through projects 821-B7-254 and 821-B8-221, respectively.

The third author acknowledges the financial support of CIMM, Centro de Investigaciones Matemáticas y Metamatemáticas, through project 820-B8-224.

All three authors also acknowledge the Escuela de Matemática, Universidad de Costa Rica, for its support.

References

  • Balasubramanian (2002) Balasubramanian, M. (2002). The Isomap Algorithm and Topological Stability. Science, 295(5552):7a–7.
  • Barden and Thomas (2003) Barden, D. and Thomas, C. (2003). An Introduction to Differential Manifolds. Imperial College Press.
  • Bellman (1957) Bellman, R. (1957). Dynamic Programming. Princeton University Press.
  • Bellman (1961) Bellman, R. (1961). Adaptive Control Processes: A Guided Tour. Princeton University Press.
  • Bernstein et al. (2000) Bernstein, M., de Silva, V., Langford, J. C., and Tenenbaum, J. B. (2000). Graph approximations to geodesics on embedded manifolds. Igarss 2014, 01(1):1–5.
  • Bivand et al. (2013) Bivand, R., Pebesma, E. J., and Gómez-Rubio, V. (2013). Applied Spatial Data Analysis with R. Springer, New York.
  • Bivand and Rundel (2013) Bivand, R. and Rundel, C. (2013). rgeos: Interface to Geometry Engine - Open Source (GEOS). R package version 0.3-2.
  • Borgonovo et al. (2014) Borgonovo, E., Tarantola, S., Plischke, E., and Morris, M. D. (2014). Transformations and invariance in the sensitivity analysis of computer experiments. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(5):925–947.
  • Bron and Kerbosch (1973) Bron, C. and Kerbosch, J. (1973). Algorithm 457: finding all cliques of an undirected graph. Communications of the ACM, 16(9):575–577.
  • Carlsson (2009) Carlsson, G. (2009). Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308.
  • Carlsson (2014) Carlsson, G. (2014). Topological pattern recognition for point cloud data. Acta Numerica, 23:289–368.
  • Csárdi and Nepusz (2006) Csárdi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695.
  • Dimeglio et al. (2014) Dimeglio, C., Gallón, S., Loubes, J. M., and Maza, E. (2014). A robust algorithm for template curve estimation based on manifold embedding. Computational Statistics and Data Analysis, 70:373–386.
  • Eppstein et al. (2010) Eppstein, D., Löffler, M., and Strash, D. (2010). Listing all maximal cliques in sparse graphs in near-optimal time. In Lecture Notes in Computer Science, volume 6506, pages 403–414.
  • Gallón et al. (2013) Gallón, S., Loubes, J.-M., and Maza, E. (2013). Statistical properties of the quantile normalization method for density curve alignment. Mathematical Biosciences, 242(2):129–142.
  • Ghrist (2008) Ghrist, R. (2008). Barcodes: The persistent topology of data. Bulletin of the American Mathematical Society, 45(1):61–75.
  • Hamby (1995) Hamby, D. M. (1995). A comparison of sensitivity analysis techniques. Health physics, 68(2):195–204.
  • Hastie et al. (2009) Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning, volume 1 of Springer Series in Statistics. Springer-Verlag New York, New York, NY.
  • Hatcher (2002) Hatcher, A. (2002). Algebraic Topology. Cambridge University Press.
  • Lang (1963) Lang, S. (1963). Introduction to differentiable manifolds, volume 275.
  • Munkres (1984) Munkres, J. R. (1984). Elements of Algebraic Topology. Addison-Wesley.
  • Pebesma and Bivand (2005) Pebesma, E. and Bivand, R. (2005). Classes and methods for spatial data in R. The Newsletter of the R Project, 5(2):9–13.
  • R Core Team (2016) R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
  • Ratto and Pagano (2010) Ratto, M. and Pagano, A. (2010). Using recursive algorithms for the efficient identification of smoothing spline ANOVA models. AStA Advances in Statistical Analysis, 94(4):367–388.
  • Saltelli et al. (2009) Saltelli, A., Chan, K., and Scott, E. M. (2009). Sensitivity Analysis. Wiley, New York, 1st edition.
  • Saltelli et al. (2008) Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., and Tarantola, S. (2008). Global Sensitivity Analysis: The Primer. Wiley.
  • Saltelli et al. (2006) Saltelli, A., Ratto, M., Tarantola, S., and Campolongo, F. (2006). Sensitivity analysis practices: Strategies for model-based inference. Reliability Engineering & System Safety, 91(10-11):1109–1125.
  • Saltelli et al. (2004) Saltelli, A., Tarantola, S., Campolongo, F., and Ratto, M. (2004). Sensitivity Analysis in Practice. John Wiley & Sons, Ltd, Chichester, UK.
  • Snášel et al. (2017) Snášel, V., Nowaková, J., Xhafa, F., and Barolli, L. (2017). Geometrical and topological approaches to Big Data. Future Generation Computer Systems, 67:286–296.
  • Sobol’ (1993) Sobol’, I. (1993). Sensitivity Estimates for Nonlinear Mathematical Models. Mathematical Modeling and Computational experiment, 1(4):407–414.
  • Sobol’ and Levitan (1999) Sobol’, I. M. and Levitan, Y. (1999). On the use of variance reducing multipliers in Monte Carlo computations of a global sensitivity index. Computer Physics Communications, 117(1-2):52–61.
  • Solís (2018) Solís, M. (2018). Nonparametric estimation of the first order Sobol indices with bootstrap bandwidth. Manuscript submitted for publication.
  • Tenenbaum (2000) Tenenbaum, J. B. (2000). A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290(5500):2319–2323.
  • Wei et al. (2015) Wei, P., Lu, Z., and Song, J. (2015). Variable importance analysis: A comprehensive review. Reliability Engineering and System Safety, 142:399–432.
  • Zomorodian (2010) Zomorodian, A. (2010). Fast construction of the Vietoris-Rips complex. Computers & Graphics, 34(3):263–271.