1 Persistence pipeline
1.1 Modelling data spaces
Topological data analysis (TDA) and particularly its subfield persistent homology, or persistence, aim at quantifying the global connectivity structure of data sets [1, 2, 3]. Given a set of data points it is often possible to endow it with some reasonable notion of relation between points, e.g. distance measure or correlation. Study of the connectivity is facilitated by first combining points into larger entities called simplices. A -simplex is a declared subset of related points from the data set. Collection of simplices makes up a simplicial complex , namely it is a collection of certain subsets of the data. Requirements are that if is a simplex in then any subset of is also a simplex in and that the intersection of two simplices is a simplex or the empty set. Above we have described an abstract simplicial complex. Simplices can always be realized geometrically in some as convex hulls of their vertices: 0-simplices as points, 1-simplices as line segments, 2-simplices as filled triangles, 3-simplices as filled tetrahedra etc.
Simplicial complex is hence a model of the relational structure in the data. Relational structure can be modelled by a graph but graphs only consider pairwise relations between points. In many cases it makes sense to use higher-dimensional connectivity instead modelled with simplices. As a justification consider the example explained in Fig. 1. More fundamental reason is that the simpicial approach views data as spaces spanned by their points and enables the use of powerful mathematical machinery of algebraic topology for the analysis of these spaces, as will be outlined in the following section.
For persistence analysis we define the relation to be a function on the data with values in , i.e. for data points and . Concretely we say that data points create a -simplex at scale if the points satisfy pairwise This construction is called the Vietoris-Rips simplicial complex at scale . At fixed scale we can then study the connectivity structure. As a standard example, when data is endowed with a distance measure, clustering at some fixed scale corresponds to the 0-dimensional connectivity by only looking at the connected components of the simplicial complex. Simplicial complexes can also contain 1-dimensional connectivity information in the form of loops and holes (see Fig. 1), 2-dimensional information in the form of voids or cavities, etc. These are collectively called topological features.
Persistence aims to quantify the topological features in a data set and use this information for data analysis. Loop structure might signal about a recurrent dynamics of the phenomenon behind the data. Various dimensional voids can mark lack of information and connectivity or insufficient data collection. Finding such voids in data sets has aroused interest in different areas of data analysis community, see for example  and references therein. As noted in 
, voids can also indicate non-allowed combinations of feature values of data vectors.
One immediate difficulty arises in the simplicial modelling above: what is the appropriate scale of to capture the connectivity in various dimensions of an arbitrary set of points? Persistence circumvents this by forming simplicial complexes at all scales and capturing the evolution of topological features. If a simplex is generated at scale it is then present at any subsequent scale and the simplicial complexes are connected by inclusions: for The end result of the modelling step is then a mapping called filtration, , where denotes a data set with real-valued relation and denotes an -parameterized sequence of simplicial complexes and inclusions.
1.2 Algebraic fingerprinting
Filtration contains all the information about the relations in the data set on various scales. It is therefore very complicated object for infering the global structure of data and simplification is thus necessary. TDA employs tools from mathematical field of algebraic topology, essentially it uses homology of simplicial complexes which transforms the geometric information into algebraic information. We will outline the algorithm for computing homology to illustrate its very implementable nature and to gain intuition on why we are interested in homology in data analysis. For details into homology and its computation see [5, 6, 7]. For simplicity we fix the field of coefficients to be , the field with two elements 0 and 1. Let be a simplicial complex and denote by its set of -simplices. Concretely, consists of the points of the original data set.
1) Choose an ordering (starting from zero) on and use it to order elements in any simplex. If is a 2-simplex in Fig. 1, fix the order in which the points are listed and denote this ordered simplex by
2) For natural numbers and and a simplex in , define a function such that is a simplex in formed by removing from its -th element. The ordering on was needed to specify the -th element in a simplex. For example,
3) For any natural number , let be the vector space over with a base given by all simplices in . An element in is then given by a linear combination The base for of the simplicial complex in Fig. 1 would be [Arya, Bran, Jon] whereas [Arya, Bran]+[Bran, Jon]+[Arya, Jon] would be linear combination of three basis elements in .
4) Define to be the linear function assigning to a base element given by a simplex in the linear combination of -1-simplices. The map is called the boundary operator. Then The boundary operator thus formalizes the intuition that forms the boundary of Define and for , where 0 denotes the zero vector space.
5) The boundary operators connect the various simplices of a simplicial complex together. Computationally the matrices of boundary operators store the global connectivity information in their elements, with coefficient field these are just binary matrices. Homology on degree of a simplicial complex (over coefficients ) is then defined as a quotient vector space:
As noted in step 4) above, some 1-simplices might form the boundary of a 2-simplex. Some 1-simplices on the other hand might form the boundary of an actual hole in the simplicial complex as in Fig. 1. Similarly some -1-simplices might form the boundary of a -simplex and some might form the boundary of a -dimensional hole. By its definition homology quotients out linear combinations of simplices that are boundaries and we are left with those that actually represent linearly independent -dimensional holes in the complex. For , measures the number of linearly independent points that make up boundaries of 1-simplices, effectively the number of connected components.
Homology thus gives us exactly the global connectivity information of the relational structure of data that we seek. The full complexity of a filtration is now simplified by applying homology on degree . Each simplicial complex is turned into a homology vector space and the inclusion functions are turned into linear maps. The result is an -parameterized sequence of vector spaces and linear maps: We will abbreviate as . In this parameterized sequence the dimensions of homology vector spaces encode topological information: effectively measuring the number of connected components, measuring the number of one-dimensional holes and those of -dimensional voids at scale .
This algebraic step gives a mapping The obtained result is not an arbitrary -parameterized vector space. The vector spaces are finite dimensional and there are finitely many numbers in such that the map may not be an isomorphism only if , for in . These considerations follow from the fact that data sets always contain only finite number of points so topological changes in the relational structure can only occur in discrete steps. Such parameterized vector spaces are called tame . An essential result in persistence theory is that any tame -parameterized vector space decomposes into interval indecomposables called bars and the collection of bars in such a decomposition is unique . Bars are enumerated by pairs of numbers in . The bar at scale is either a one dimensional vector space, if , and the zero vector space otherwise. The maps between any non-zero vector spaces in a bar are isomorphisms. For a bar , some topological feature is understood to have appeared in the simplicial complex at filtration value . It is then present in the subsequent simplicial complexes until filtration value . For example, points in the data might connect to create a 1-dimensional loop. This loop persists until at some larger filtration value the points connect further to higher dimensional simplices and the loop vanishes. The bar decomposition can be visualized in a stem plot on a -coordinate system as shown later in Fig. 3.
2 Topological learning
The actual data analysis step in persistence pipeline is to infer information from the -parameterized sequence of homology vector spaces and linear maps obtained from the map constructed above. To simplify notation we let -Vec denote the space of tame -parameterized sequences of vector spaces Our framework of extracting information from objects in this space is through stabilizing a rank invariant attached to them. Aim of the paper is on the practical data analysis aspects and we only outline the theoretical backgound. For more details we refer to [8, 14, 15].
2.1 Rank invariant
The rank, or the dimension, is the fundamental invariant characterizing vector spaces. Similarly we want to assign rank for sequences of vector spaces in -Vec. Let be in -Vec. Due to tameness there is a sequence in such that is not an isomorphism only if . Recall that for a linear map its cokernel is the quotient vector space of by the image of : We then define
where is the homology vector space in at filtration value 0. Let us consider what information carries. Since the maps are not isomorphisms the cokernels may not be zero. The quotient by the image removes from the homology vector space the generators, or basis elements, which come from previous non-isomorphic homology vector space. is thus a vector space of the new homology generators that appear in the sequence of homology vector spaces. In the context of filtrations of input data sets, this is a way of keeping track of how topological features created by the relational structure evolve in the simplicial complexes of the filtration.
For in -Vec, its rank is now defined to be a discrete invariant given by the number
2.2 Hierarchical stabilization and contour metrics
The rank defined above is not a stable invariant. Effectively the number measures the smallest number of homology generators of . A small perturbation of input data can result in a number of non-essential homology generators. We therefore seek to stabilize the rank invariant to deal with inherent noise in data. Our approach is a general framework for stabilizing discrete invariants.
Let be a set of interesting objects and the attached invariant. For us is of course a collection of -parameterized vector spaces associated to data sets with -valued relation and is the rank. The key in converting a discrete invariant into a stable one is to choose a (pseudo)metric on . Once a metric is chosen, we can define an -radius ball around , , and look at the function taking the minimum value of on balls around with increasing radii :
Since we are minimizing the invariant in larger and larger balls around , the function is decreasing and piecewise constant, namely a simple function. Due to being a decreasing function with non-negative values, there is some such that for all in , . The function is thus eventually constant with a limit, .
The needed metrics in the stabilization can be shown  to arise from so called contours. Contour is function satisfying the following inequalities for all in :
, for and
For example, , and with a positive number are all examples of contours. The contour is called the standard contour. There is a generic way of producing contours. Let be a function with strictly positive values which we refer to as density. Then it can be shown that the function given by
where for in , we have taken the unique in such that . For more background on contours we refer to .
It is also shown in  how the choice of a contour leads to a pseudometric in -Vec. The stabilization of the rank invariant with respect to the chosen contour is then defined as
As noted above, the stable rank function is decreasing and piecewise constant and from to .
Our approach does not conceptually rely on the bar decomposition of in -Vec. Computation of the decomposition is however standard procedure in persistence analysis with various dedicated implementations  and when the decomposition is given, the stable rank can be computed algorithmically in a very efficient way:
The stable rank of at is thus the number of those bars in the decomposition that satisfy the relation between the start and end points given by the contour. In practical computations the limit of is always zero, or can be set to zero.
By fixing some values of the contour reduces to a single variable function and we can plot it. In Fig. 3 this is illustrated with few values of in the stem plot of a bar decomposition. This visualization is helpful in understanding how the contour affects the stable rank in Eq. 2: the value of stable rank at is the number of bars that reach over the function . If the function has lower values it therefore makes bars relatively longer and vice versa with larger values. The contour can thus be seen as controlling pointwise with respect to the length scale that we use to measure bars.
2.3 Topological learning with stable ranks
The stable rank attached to an input data set is a topological fingerprint of the data. In the actual data analysis task these fingerprints are used in, for example, classifying various data sets. Recall from the construction above that the stable rank is derived by choosing a contour functionwhich induces a metric needed for the stabilization in Eq. 1. Each choice of a contour gives a different stable rank capturing different aspects of the data. The learning step in our pipeline is then to choose an appropriate contour for the analysis at hand and we explore this in Section 3.
As stable ranks are -valued functions we have various choices of metrics for comparing them. In particular we have standard -metrics for :
We can also define interleaving distance between functions and . We first define the set of horizontal shifts of the functions satisfying the indicated inequalities:
The interleaving distance is then defined as the minimum of those shifts:
In Section 3 we use these constructions in demonstrating our approach with concrete data analyses. We emphasize that our approach does not rely on any algebraic decomposition of persistence and is thus applicable to multiparameter persistence . The initial theory behind our pipeline was indeed formulated for multiparameter persistence in  and later specialized for 1-parameter persistence in . In the case of one parameter we obtain the convenient algorithm, Eq. 2, for computing stable rank.
Traditional view in persistence analysis has been that long bars in the bar decomposition are of importance and smaller bars are noise. This view, however, is challenged by many recent studies showing that smaller features carry important information: study of brain artery trees in , functional networks of , analysis of protein structure in  and the relation of observed diffraction peaks to small loops in atomic configurations of amorphous silica in . With our pipeline we can flexibly choose different contours to learn what are in fact the essential features in the data. To produce the bar decompositions we used Ripser software .
3.1 Classifying physical activities
We studied PAMAP2 data obtained from  to classify different physical activities. The data consisted of seven persons performing different activities such as walking, cycling or sitting. Test subjects were fitted with three Inertial Measurements Units (IMUs) and a heart rate monitor. Measurements were registered every 0.1 seconds. Each IMU measured 3D acceleration, 3D gyroscopic and 3D magnetometer data. One data set thus consisted 28-dimensional data points indexed by 0.1 second timesteps.
We looked at two activities which from the outset are very similar and expected to be difficult to distinguish: ascending and descending stairs. For the analysis we randomly sampled without replacement 100 points from each data set, repeated 100 times. For each subject we thus obtained 100 resamplings from the activity data and computed their stable ranks with respect to a chosen contour. Out of these we computed the point-wise means of 40 stable ranks in and . These means were used as classifiers, denoted by and . Altogether we had 14 classifier pairs corresponding to all (subject, activity) combinations. Remaining 60 stable ranks in and were used as test data and denoted by and . For a test pair we found
by computing distances between the test pair and all classifier pairs. The classification is successful if the minimum is obtained with and belonging to the same (subject, activity) class in both and .
For cross validation we randomly sampled which of the stable ranks constitute classifier and which are test data for the class. Result for 20-fold cross validation is shown in the confusion matrix on the left in Fig.2 for the standard contour. Each cell of the confusion matrix is the number of classifications in the corresponding classifier (columns) and test data (rows) pair relative to the total number of test stable ranks which was 60. Correct classifications are on the diagonal. Overall accuracy (mean over diagonal of the confusion matrix) with standard contour was 60%.
We then repeated the above cross validation process but using a different contour in computing stable rank. Contour was obtained from the density function on the left side of Fig. 3. Contour lines and the bars from persistence computation are visualized on the right side of Fig. 3. This contour puts more weight on topological features appearing with larger filtration scales. Cross-validation results are shown on the right in Fig. 2. Overall accuracy increased to 65%. Note particularly increase in the accuracy of subject 4. Also noteworthy is that ascendings mainly get confused with ascendings of different subjects and the same for descendings. These (subject,activity) data thus exhibit different character and changing the contour we could make this difference more pronounced.
3.2 Cloud pattern characterization
We analysed the spatial distribution of shallow cumulus clouds. These clouds form in fair-weather conditions due to the convective transport of heat and moisture in the atmosphere. Convection is a classic example of a pattern-forming system [12, 13]. Cloud formation is known to be influenced by diverse physical processes across spatial scales ranging from molecular sizes to kilometers. Such spatial scales and all their physical variables cannot be explicitly resolved in numerical climate models, which calls for the development of cloud parametrization schemes. Moreover, the spatial distribution of clouds influences their formation processes. It is therefore important to include this distribution in parametrization schemes. This problem has been studied from different perspectives, notably the influence of land surface conditions on cloud formation . Here we describe an approach based on persistence and the use of stable ranks as descriptors of the spatial distribution of clouds. See  for further results and references.
The data was produced by the Dutch Atmospheric Large-Eddy Simulation model and covered the time period between 09:00h and 18:00h during one day, saved for analysis at 15 minute intervals, with model setup similar to that in . We simulated 10 days with different initial conditions. The data consists of large amount of physical information from which cloud fields can be extracted. The spatial simulation domain in coordinates is in kilometers with horizontal resolution of 50 meters and vertical resolution of 40 meters. The computation domain thus consists of cells. A homogeneous land surface is prescribed and the lateral boundaries are periodic. The 3D cloud field from the simulation domain was then flattened in the -direction onto a 2D plane by taking the maximum liquid water content, , values in the vertical direction. The resulting cloud fields are then as visualized in Fig. 4(b).
An important issue in the study of cloud formation is the quantification of spatial organization, or lack thereof, in a given cloud field. While methods to study spatial distributions exist in the statistical literature for objects which can be idealized as points, it is harder to work with objects that possess a spatial extent (i.e. area or volume), as clouds do. This leads to the necessity of computing a point representation for a cloud before being able to assess the spatial distribution of the cloud field. Here we consider three different representations: assigning to each cloud its geometric centroid, its point with maximum value, and a set of its points chosen at random.
A common metric in the assessment of spatial organization is the index , defined as follows. For a two-dimensional cloud field, such as the one shown in Fig. 4(b), index the connected components (the individual clouds) as , and compute their geometric centroids, . We are interested in how the spatial distribution of the compares to what we would expect under complete spatial randomness (CSR), that is, if the centroids represent a realization of a homogeneous Poisson point process. To that end, we consider the nearest-neighbor distances , which are defined as , where
represents the set of all centroids. The cumulative distribution function (CDF) of theis
which in the case of a Poisson point process has the analytic expression
where is the Poisson intensity parameter. The value of is then defined to be the area under the graph , where
is the empirical estimator of. If matches well with , the value of will be close to . A value larger than this suggests spatial clustering, while a smaller one suggests dispersion or regularity.
Let denote the stable rank of with respect to the standard contour (Eq. 2), normalized by its value at . If we define the function , we note that it increases monotonically towards . In fact, since the normalized stable rank at is an indication of the relative amount of homological features that persist beyond , the function can be understood as the empirical CDF of homological persistence.
For realizations of a Poisson point process with intensity parameter , we find that their normalized stable ranks , and therefore also , oscillate within a narrow band (see Fig. 5). At this point we do not have an analytic expression for the stable rank functions obtained from a Poisson point process, but we can define persistent homology analogues to the index via a Monte Carlo procedure by taking the area under the curves defined by . In the case of a point process in the plane we would then get two values and . We define the index as their arithmetic mean,
We tested the performance of the index defined above, and compared it to the corresponding values of in the dataset consisting of 360 distinct cloud fields (36 per simulation day). The values of both indices are shown in Fig. 6. Each panel shows the values of each index for all cloud fields, computed using 4 different point representations. Panel A shows the values obtained from assigning to each connected component its point with maximum value (local maxima); panel B shows the indices obtained when using the local maxima but only of those components with size at least 3 grid cells (all smaller components are ignored). Panel C shows the results of using the geometric centroid of each connected component. Finally, for panel D the geometric centroids were used after discarding the smaller components. These small components can be attributed to numerical imprecision in the underlying model, and hence are not physically meaningful.
As discussed above, if these indices have a value close to , it would indicate that the point process that they are evaluated on is close to complete spatial randomness, or a Poisson point process. In the simulations used here, we have cause to expect spatially random behavior: the domain size is too small to allow for deep convection and spatial organization to happen. Moreover, the lack of land surface features or patterns means there are no forcings at different spatial scales. Thus the spatial distribution of physical variables is dominated by the characteristic patterns present in atmospheric turbulence, itself an essentially random process. The values of the persistent homology index strongly support this hypothesis, while exhibits values in general larger than . This can be attributed to the fact that it is based on nearest-neighbor distances only, whereas the stable rank functions reflect the spatial relationships of the points throughout all spatial scales. This is confirmed by the fact that removing the smaller structures in the fields (those less than 3 grid cells in size) brings the values of closer to on average, whereas the average for is barely affected. This highlights the fact that, by virtue of using all the spatial information available, the persistent homology based method is inherently more robust than any nearest-neighbor method.
This result has been arrived at by using the standard contour only, which implies that spatial randomness in these cloud fields is obtained when all spatial scales present in the data are given the same weight. It is possible to obtain different morphological classifications of the same fields by using alternative contours, which emphasize spatial features differently at varying scales, as presented with the classification in Section 3.1. We used standard contour and contours visualized in Fig. 7. These contours are referred to as contour 1, denoted , and contour 2, denoted .
To reduce the effect of sampling, 10 random samples were drawn from each of the 360 cloud fields, with sample rate 5% of cloud size. To each cloud field we assign the mean stable rank of these 10 samples. Stable ranks were computed in with respect to standard contour, contour 1 and contour 2 and normalized to give function as explained above. After removing those cloud fields without features, we have 254 normalized stable ranks for each class of contours. Distance matrices using interleaving, - and -metrics (see Section 2.3) were then computed for the three different classes of stable ranks. Dendrograms from the distance matrices were visually analyzed to decide on a number of clusters of stable ranks. From these computations the interleaving distance gave the clearest clustering results. With respect to contours, and gave better clustering than standard contour.
An example of diverging morphological characteristics educed from the clustering schemes is shown in Fig. 8: (a) and (b) are representatives of two different clusters obtained by using contour , while (c) and (d) stem from clusters in the classification. As expected from the definition of the contours, the classifications they induce are influenced by different spatial scales. Namely, despite the fact that cloud fields a) and b) have identical cloud cover, and their values are very similar, the large-scale distribution of the individual clouds is significantly different for both. In similar fashion, both c) and d) are indistinguishable in terms of cloud cover and , yet are distinguished by the spatial pattern of smaller structures, even if the large-scale distribution is similar in both.
This study of cloud fields shows that the use of stable rank functions as descriptors for spatial distributions can reveal morphological properties which other methods cannot. Crucially, the possibility of changing the contour enriches the scope for determining such properties. Future investigation in this direction will address questions such as: what the optimal contour is for a given problem, what these methods can reveal about the temporal evolution of cloud formation, and how the homological properties thus discovered can be related to different physical variables in the system. From general data analysis point of view, particularly the optimization of contours is crucial for making our pipeline a full scale machine learning approach.
We gratefully acknowledge Roel Neggers for providing the DALES simulation data. JLS acknowledges support by the DFG-funded transregional research collaborative TR32 on Patterns in Soil–Vegetation–Atmosphere Systems.
Carlsson, G.: Topological pattern recognition for point cloud data. Acta Numerica23, pp. 289–368 (2014)
-  Oudot, S.: Persistence Theory: From Quiver Representations to Data Analysis. American Mathematical Society, Providence, RI (2015)
Otter, N., Porter, M., Tillmann, U., Grindrod, P., Harrington, H.: A Roadmap for the Computation of Persistent Homology. EPJ Data Science6(17), (2017)
Lemley, J., Jagodzinski, F., Andonie, R.: Big Holes in Big Data: A Monte Carlo Algorithm for Detecting Large Hyper-rectangles in High Dimensional Data. In: IEEE 40th Annual Computer Software and Applications Conference, pp. 563–571. IEEE Computer Society, Los Alamitos, CA (2016)
-  Rotman, J.: An Introduction to Algebraic Topology. Springer, New York, NY (1998)
-  Edelsbrunner, H., Harer, J.: Computational Topology: An Introduction. American Mathematical Society, Providence, RI (2010)
-  Kaczynski, T., Mischaikow, K., Mrozek, M.: Computational Homology. Springer, New York, NY (2004)
-  Chachólski, W., Lundman, A., Ramanujam, R:, Scolamiero, M., Öberg, S.: Multidimensional Persistence and Noise. Foundations of Computational Mathematics 17(6), pp. 1367–1406 (2017)
-  Zomorodian, A., Carlsson, G.: Computing Persistent Homology. Discrete & Computational Geometry 33(2), pp. 249–274 (2005)
-  PAMAP, Physical Activity Monitoring for Aging People homepage, http://www.pamap.org
-  Licón-Saláiz, J., Riihimäki, H., van Laar, T.: Topological characterization of shallow cumulus cloud fields using persistent homology. In: Proceedings of the 8th International Workshop on Climate Informatics, pp. 107–110. National Center for Atmospheric Research, Boulder, CO (2018)
-  Mizushima, J.: Mechanism of the Pattern Formation in Rayleigh–Bénard Convection. Journal of the Physical Society of Japan 63(1), pp. 101–110 (1994)
-  Cerisier, P., Rahal, S., Rivier, N.: Topological correlations in Bénard–Marangoni convective structure. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 54(5), pp. 5086–5094 (1996)
-  Chachólski, W., Riihimäki, H.: Metrics and stabilization in one parameter persistence. arXiv:1904.02905, (2019)
-  Gäfvert, O., Chachólski, W.: Stable Invariants for Multidimensional Persistence. arXiv:1703.03632, (2017)
-  Carlsson, G., Zomorodian, A.: The theory of multidimensional persistence. Discrete & Computational Geometry 42(1), pp. 71–93 (2009)
-  Bendich, P., Marron, J., Miller, E., Pieloch, A., Skwerer, S.: Persistent homology analysis of brain artery trees. The Annals of Applied Statistics 10(1), pp. 198–218 (2016)
-  Stolz, B., Harrington, H., Porter, M.: Persistent homology of time-dependent functional networks constructed from coupled time series. Chaos 27(4), pp. 047410-1–047410-17 (2017)
-  Xia, K., Wei, G.-W.: Persistent homology analysis of protein structure, flexibility, and folding. International Journal for Numerical Methods in Biomedical Engineering 30(8), pp. 814–844 (2014)
-  Hiraoka, Y., Nakamura, T., Hirata, A., Escolar, E., Matsue, K., Nishiura, Y.: Hierarchical structures of amorphous solids characterized by persistent homology. Proceedings of National Academy of Sciences 113(26), pp. 7035–7040 (2016)
-  Tompkins, A., Semie, A.: Organization of tropical convection in low vertical wind shears: Role of updraft entrainment. Journal of Advances in Modeling Earth Systems 9(2), pp. 1046–1068 (2017)
-  Bauer, U., Ripser software. github.com/Ripser/ripser
-  Rieck, M., Hohenegger, C., van Heerwaarden, C. C.: The Influence of Land Surface Heterogeneities on Cloud Size Development. Monthly Weather Review 142(10), pp. 3830–3846 (2014)
-  Neggers, R. A., Siebesma, A., and Heus, T.: Continuous single-column model evaluation at a permanent meteorological supersite. Bulletin of the American Meteorological Society 93(9), pp. 1389–1400 (2012)