Elastic depths for detecting shape anomalies in functional data

07/15/2019 ∙ by Trevor Harris, et al. ∙ University of Illinois at Urbana-Champaign 0

We propose a new depth metric called elastic depth that can be used to greatly improve shape anomaly detection in functional data. Shape anomalies are functions that have considerably different geometric forms or features than the rest of the data. Identifying them is far more difficult than identifying magnitude anomalies because shape anomalies are often not separable from the bulk of the data with visualization methods. The proposed elastic depths use the recently developed elastic distances to directly measure the centrality of functions in the amplitude and phase spaces. Measuring shape outlyingness in these spaces provides a rigorous quantification of shape which in turn gives the elastic depths a strong practical advantage over other methods in detecting anomalies. A simple boxplot and thresholding method are introduced to identify shape anomalies using the elastic depths. We assess the elastic depth's detection skill on simulated shape outlier scenarios and compare it against popular shape anomaly detectors. Finally, bond yields, image outlines, and hurricane tracks are used to demonstrate our methods applicability to functional data on three different manifolds.



There are no comments yet.


page 17

page 25

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

As data collection methods are rapidly advancing, functional data is becoming more prevalent. Functional data refers to data collected continuously across time and/or space where an observation is an entire curve or surface, rather than a single value. Common examples of functional data include Berkeley growth rate curves, electrocardiogram (ECG) data, imaging data containing geometric shapes, hurricane tracks, etc. Due to the upsurge of available data, functional data analysis (FDA) has become a popular and quickly growing field of study. As with traditional data analysis methods, it is critical to perform exploratory data analysis with functional data to identify significant trends or anomalies which could bias any post-processing analysis. The functional anomalies are also interesting in their own right and can be the primary focus of study, for example, anomalously shaped bond yield curves (Section 5) can be indicators of an impending recession.

In the functional data setting, the identification of functional anomalies, i.e. identifying an entire function as an outlier, is not as straightforward as visualizing scalar outliers with univariate boxplots. By definition, a functional anomaly is a function that is significantly more extreme (either negatively or positively) than the majority of the collection of functions. Since functional data lacks the natural order inherent to univariate data, a data ranking procedure which can induce a functional ordering is necessary to identify which functions are most central, e.g. identify a median or mean function, and to identify anomalies. Generally, functional outliers are categorized into two types: magnitude or shape. Magnitude outliers are functions that clearly lie outside all other functions and are usually detected through visual data visualization methods

(e.g., Hyndman and Shang, 2010; Sun and Genton, 2011; Myllymäki et al., 2017). Shape outliers, on the other hand, are difficult to identify using data visualization methods. Shape outliers take on a different shape or pattern than all other functions and are easily hiding among them.

Popular methods for identifying shape outliers rely on the notion of data depth, a method used to define the notion of centrality and induce ordering among a set of functions. Many outlier detection methods decomposes total depth/outlyingness into magnitude and shape outlyingness. For example,

Arribas-Gil and Romo (2014) proposed the outliergram visualization tool based on half-space depth (Tukey, 1977)) and band depth (Lopez-Pintado and Romo, 2009). Huang and Sun (2019) decomposed their total variation depth, to accounting for the correlation in functional data. These methods rely on integrated functional depths which as Dai and Genton (2019) pointed out, cannot efficiently capture the centrality of functions. To remedy this, Rousseeuw et al. (2018) and Dai and Genton (2019) proposed the concept of directional outlyingess, which has since been used to modify the functional outlier map (Rousseeuw et al., 2018) and to construct the magnitude-shape plot tool (Dai and Genton, 2018).

Other outlier detection methods proposed in recent literature account for the geometry of the functions. Kuhnt and Rehage (2016) developed functional tangential angle (FUNTA) pseudo-depth based on the tangential angles of the intersections of the centered data while Nagy et al. (2017) proposed versatile modifications of previous depth notions to better identify shape outliers. Xie et al. (2017) separated the variability of functional data into amplitude and phase components using the method of Srivastava et al. (2011) and displayed this variability using independent boxplots for each component.

In this paper, we introduce a new depth measure called elastic depth, which is based on elastic distances, and show how it can be used to vastly improve shape outlier detection. As discussed in Xie et al. (2017), elastic data analysis takes advantage of phase-amplitude separation to enable independent inference for two key components of a function’s shape: phase and amplitude. We show that separating the phase and amplitude of the functions of interest allows for an improved detection of shape outliers over current shape detection approaches. Unlike Xie et al. (2017) we compute amplitude and phase distances between functions instead of pre-aligning them to their Karcher median. This allows us to directly compare their proximity in amplitude and phase space. Another key feature of our method is that it accounts for the underlying geometry of the function space and can be applied to a set of functions on any Riemannian manifold.

This paper is organized in the following way. Section 2 introduces phase and amplitude variability on a manifold and outlines the general framework of elastic shape analysis. Section 3 introduces our new elastic depth and its theoretical properties for the specific manifolds , , and . Section 4 outlines how to implement elastic depth using depth boxplots and depth thresholding as detection tools for shape outliers. These detection methods are compared against nine competing shape outlier detectors on through simulation in Section 5. In Section 6 we then demonstrate elastic depths on real data arising from the three manifolds , , and . Finally we conclude with a brief discussion of the current results and future work in Section 7.

2 Background

We will introduce some concepts that will lay the foundation for the remainder of the paper. First, we will give an overview of elastic shape analysis and general elastic distances for curves and second, we will specifically define amplitude and phase distances for the three specific manifolds , , and .

2.1 Elastic shape analysis

Elastic shape analysis (ESA) is a collection of techniques that rely on phase-amplitude separation to align functions prior to analysis. Phase and amplitude represent two orthogonal components of a function’s shape that can be uncoupled through phase-amplitude separation (Srivastava et al. (2011); Kurtek et al. (2011); Tucker et al. (2013)). Phase-amplitude separation is attractive as a function alignment tool because it is shown to be invariant to the sampling of a trajectory as well as to shape preserving transformations such as rotation, translation, and scale. The alignment process is termed elastic since trajectories may be stretched and compressed but cannot be broken, rearranged, or lengthened.

Consider a simple pairwise example where without loss of generality we wish to align to where and is the class of smooth functions defined on manifold

The goal of phase-amplitude separation is to find a function which minimizes the distance on between and where

We refer to as the phase space and a specific as the phase (or warping) function which aligns to through composition. For more detail on phase-amplitude separation, see Srivastava and Klassen (2016).

ESA is distinguished from other alignment methods by its use of a proper distance metric to align the trajectories. A proper distance metric is constructed by introducing the Transported Square Root Vector Fields (TRSVF) transformation, defined in

2.1 and aligning the TRSVF of each trajectory instead of the trajectories directly. A TSRVF representation, say , of a trajectory is a standardized gradient of . The TSRVF transformation bijectively maps, up to an additive constant, to a subspace of . The exact subspace depends on the manifold . The distance between trajectories and can then be defined by the distance between their respective TSRVF representations,

Definition 2.1.

Let be a smooth trajectory in for some Riemannian manifold with norm . The Transported Square Root Vector Field (TRSVF) of at the point is


Where represents the parallel transport of the tangent vector from to .

TRSVFs rely on parallel transport (Lee, 2001) to “transport” the tangent spaces of each to a common point, allowing for direct comparison of the gradients of trajectories on Because the tangent space of every point in and are identical, TSRVF transformations are unnecessary if or . On there is a simple closed form expression for transporting tangent vectors (gradients) (Su et al., 2014).

Following the simple example above, given two trajectories the optimal warping function is found by aligning to


where is the norm on and also known as the amplitude distance. The warping function found in 2 isolates the misalignment between the phases of to As a result, the aligned functions and capture the difference in amplitude between and .

2.2 Amplitude distances for specific manifolds

In general, the amplitude distance on is defined as


We now present the specific forms of the amplitude distance on the manifolds , , and . Amplitude captures the most distinguishing features of the shape of a trajectory, so having the proper amplitude metric is critical for identifying shape outliers. Phase may also carry pertinent shape information so we discuss its metric in the next section.

Amplitude distance in

Let be the class of smooth functions on mapping to . On the explicit transport of vectors is not necessary so the transport step in the TSRVF can be dropped. The resulting simplified form of the TSRVF is called the Square Root Slope Function (SRSF).

Definition 2.2.

Let be a smooth trajectory in , the Square Root Slope Function (SRSF) of is

The SRSF mapping has the effect of mapping to so the natural metric between two SRSFs is an norm. Following Equation (3), the distance in amplitude space is


Amplitude distance in

Let . For curves in there are two additional sources of variability over those in : scale and rotation. Scale variability is removed by standardizing each curve to have length where is the Euclidean norm. Rotation variability is removed by finding an optimal rotation term , the Special Orthogonal group of matrices. The amplitude space is then the quotient space . Analogous to the SRSF in the previous section we define the Square Root Velocity Function (SRVF) as its multidimensional counterpart.

Definition 2.3.

Let be a smooth trajectory in , then Square Root Velocity Function (SRVF) of is

where is the Euclidean norm.

Because of the length restriction, , this transformation bijectively maps to a unit Hilbert sphere. On this space the intrinsic metric is the arclength and so the amplitude distance is defined as


Amplitude distance in

Let . For smooth functions constrained to live on a sphere, rotation and scaling are not possible. However, explicit parallel transport is necessary because is a non linear manifold. On a sphere, the parallel transport of vectors has a convenient analytical form.

Definition 2.4.

Let be a smooth trajectory in and let then the parallel transport of to the tangent space of the point along the shortest geodesic path is

The TSRVF maps to so we again use the norm to define distances between TSRVFs


where is as defined in Definition 2.1.

2.3 Phase distance

The phase distance is the same for each of the three manifolds and is, in fact, the same for all univariate functions mapping to any Riemannian manifold. This is because the phase space is not defined with respect to the range of the functions, , but only to the domain . To define phase distance we use the optimal that defines the amplitude distance. The phase space is a non-linear manifold with no known geometry so we use the SRSF to map to a known geometry. Phase functions are positive for all and , so the SRSF maps onto the positive orthant of a unit Hilbert Sphere. Thus the phase distance is defined as


where is the identity function. The metric is essentially measuring the amount of elastic deformation needed to compare the amplitudes of and . See Tucker et al. (2013) for more information.

3 Elastic Depth

In the FDA literature, data depth is the dominant method used to define the notion of centrality and induce ordering on a function space. Data depth is the general notion measuring the centrality of observations with respect to a distribution. On a function space , a depth function would map onto such that functions with larger depth values are more central. This mapping induces an ordering on functional data since depths decrease monotonically with centrality. The notion of data depth was initially only feasible for low-dimensional data (p ), e.g. Tukey (1975) defined location depth in the case of bivariate data and Liu (1990) took advantage of the underlying geometry and proposed simplicial depth. Data depth has since been extended to the high-dimensional ( 10 dimensions) and even infinite-dimensional settings. Lopez-Pintado and Romo (2009) proposed band depth (BD) and modified band width (MBD) which define data depth through a graph-based approach and was later made more computationally feasible by Sun et al. (2012). López-Pintado and Romo (2011) introduced the Modified Half-region Depth (MHD) that extended half space depth of Tukey (1977) to the functional case. Depth measures such as random Tukey depth (RTD) (Cuesta-Albertos and Nieto-Reyes, 2008), spatial depth (Chakraborty and Chaudhuri, 2014; Sguera et al., 2014) and (Long and Huang, 2015) were adapted from multivariate data to fit the functional data setting. Most recently, extremal depth measures have been proposed by Myllymäki et al. (2017) and Narisetty and Nair (2016).

In this section, we will introduce the elastic depths and their properties. The concept of an elastic depth builds upon the elastic shape analysis framework and the elastic distances for trajectories outlined in Section 2.1 and 2.2.

3.1 Definition of Elastic Depth

Let be as in Section 2.1, let be a continuous distribution on , and suppose we observe . We define ’s outlyingness in amplitude and outlyingness in phase with respect to as the median distance between and all other functions using their respective metrics. We denote and as the amplitude and phase outlyingness respectively:

where . To convert these outlyingness functions into depth measures we invert them with the type B depth construction of Zuo and Serfling (2000):


and are respectively called the amplitude depth and phase depth of with respect to . Together we denote them the elastic depths.

In general the elastic depths are not properly scaled between 0 and 1 because the elastic distances are in general finitely bounded. The phase distance is bounded below by due to using an arccosine based distance and phase functions being strictly positive. The amplitude distances on and are similarly bounded below by . Only the amplitude depth on is properly scaled because the amplitude distance on is unbounded. Rescaling the depths to be properly supported on is straightforward but unnecessary for outlier detection since ordering remains the same.

3.2 Properties

Within the depth literature there have been many desirable properties discussed for both multivariate and functional data depths, see Zuo and Serfling (2000) and Mosler and Polyakova (2012) for comprehensive reviews. These properties ensure that a depth function properly measures the notion of depth or centrality. For instance, a depth function needs to be location and scale invariant (or equivariant) and it should decrease monotonically from a natural point of symmetry. Since our depth is purely for functional data we concentrate on the central properties of Mosler and Polyakova (2012). These properties are established for the amplitude depths because amplitude is the primary concern of shape analysis.

The elastic depths are based on proper distance metrics so they inherit certain properties such as translation invariance and scale equivariance automatically. On some manifolds, such as or , scale equivariance can be promoted to scale invariance since the trajectories and warping functions are constrained to live on an ball. Invariance to simultaneous reparameterization (rearrangement invariance) was shown in Srivastava et al. (2011) for amplitude distances in and and then later extended to in Su et al. (2014). Consequently the Amplitude depths are also invariant to simultaneous reparameterization. We outline some of the key properties useful for shape outlier detection below.

Phase invariance: Let be a random phase function from , then

This property is unique to the elastic depths and ensures that no matter which warping function the data are observed under, the amplitude depths will remain the same. This follows by the definition of amplitude distance as a minimizer over .

Maximality of the center: Let be a distribution on with a single point of symmetry, i.e. Fréchet median, . Then .

Maximality of the center guarantees that the maximizer of the amplitude depths, denoted the amplitude depth median, is the true Fréchet median of the distribution. The Fréchet median is the trajectory which minimizes the expected distance between itself and all other points in the space. Since is symmetric this trajectory also minimizes the median distance between itself and all over points. The minimizer of the median amplitude distance is the maximizer of amplitude depth so Maximality of the center holds.

The depth median also has a finite sample breakdown of 1/2, which is the highest that can be achieved. An estimator’s breakdown point describes the fraction of the data that can be made arbitrarily large before the estimator becomes arbitrarily poor. Usually this indicates a level of robustness in the estimator with high breakdown points corresponding to greater robustness. This holds because the depth functions are bijective transformations of a median which has a finite sample breakdown of 1/2.

Level set convexity: Let be the upper level sets for the amplitude elastic depth, then for all , is a convex set. Similarly the upper level sets for the phase elastic depth are convex for all . See Appendix for proof.

Convexity of the level sets implies that depths decrease monotonically from the center of the distribution. In conjunction with Maximality of the Center, level set convexity guarantees that the elastic depth is measuring centrality in amplitude space or phase space respectively. This property further distinguishes elastic depth from previous depth notions because they do not directly characterize centrality in the appropriate shape spaces. It is also desirable for constructing valid level central regions which leads to our notion of depth thresholding or outlier thresholding in Section 4.2.

3.3 Estimating Elastic Depths

Let and let . The elastic depths of can be estimated empirically by using the sample median in the outlyingness function. For a sample size we denote the empirical outlyingness functions as


The empirical elastic depths of are in turn noted as and for amplitude and phase respectively. The following proposition asserts the uniform consistency of this depth estimator. Proof is deferred until the appendix.

Proposition 3.1 (Uniform Consistency).

Let P be a distribution on , let , and let represent the empirical distribution of the sample. Then

for both amplitude and phase elastic depths.

4 Identifying outliers

Data depth is a natural framework for outlier detection because it provides a center-outwards ordering of the data. Functions with very low depth values are strong candidates for outliers. As mentioned in Section 1 there have been many methods, all based on functional depth in some way, for detecting shape outliers proposed in the literature. These methods typically involve some procedure for constructing an outlier cutoff point on the depths. In the next two sections we introduce two simple ways of defining a cutoff point and how they may be combined to improve outlier detection.

4.1 Depth Boxplots

To detect outliers we introduce a type of half-boxplot, called the Depth Boxplot, which is computed directly on the depth values instead of on the observed trajectories. The idea is that while shape anomalies may not be separable in , they are separable in either phase space or amplitude space . We have shown in Section 3.2 that amplitude and phase depths monotonically decrease as trajectories become more outlying in amplitude and phase. Therefore, shape anomalies can be identified as trajectories with outlying depth values, i.e. depth values that are much smaller than the rest.

To construct the depth boxplot we set the median point to be the largest observed depth value. The inner quartile range (IQR) (or box) is the set of depths values corresponding to the 50% central region and the whisker is 1.5 times the length of the IQR. An example is presented in Figure

1, where boxplots are constructed over the amplitude and the phase depths separately. Trajectories with depths lower than the whiskers are taken to be potential anomalies in either amplitude or phase.

Figure 1: Example of depth boxplots on both phase and amplitude depths.

Depth boxplots provide a principled and consistent way of identifying shape anomalies but they suffer from a few limitations. When the functional distribution has very low variance or a heavy tail then the boxplot tends to overestimate the number of outliers. More generally there is no guarantee that 1.5 times the depth IQR will isolate any and all anomalies. More sophisticated schemes based on smoothed bootstrapping have been proposed such as

Kuhnt and Rehage (2016) that define a data dependent cutoff. We found through simulations that similar performance can be achieved when the boxplot is used in conjunction with the depth thresholding defined next.

4.2 Depth Thresholding

Because the Elastic depths induce a proper centrality ranking in amplitude and phase space, trajectories with the smallest depths must be the most outlying. We could therefore identify potential anomalies as those trajectories with depths below the quantile of the observed depth values. Thresholding the depth values in this way is useful for examining the most extreme shapes present in the data. However, though these trajectories are guaranteed to be the least central they are not necessarily outlying. Even if no outliers are present, thresholding above the quantile will still return at least one trajectory. For automatic detection, thresholding is more effective in conjunction with the depth boxplots. We consider a trajectory a potential shape outlier if its depth value is below both the quantile of the depths and below the cutoff value of the depth boxplot. More formally, let and let be their amplitude (or phase) depths. Let be the cutoff value from the depth boxplot over and let be the quantile of . A trajectory is considered outlying if it’s amplitude (or phase) depth . This definition effectively extends the boxplot cutoff value depending on the data.

5 Simulation Study

A simulation study was conducted to comprehensively asses the performance of the elastic depths and the depth boxplots. We compared our method against nine other shape anomaly detectors: the Outliergram (OG), Sequential Transformations (ST-T1, ST-T2, ST-D1), the Functional Outlier Map (FOM), Total Variation Depth (TVD), the Magnitude-Shape (MS) plot, the Functional Tangential Angle Pseudo-depth (rFUNTA), Order Extended Integrated Depth (FDJ and IDJ), Geometric boxplots (GEOM), and Directional Outlyingness (DIR).

Under Sequential transformations we used T1, T2, and D1, which correspond to mean removal, standardized mean removal and the first order derivative, respectively. The directional quantiles were used as the underlying depth measure. The MS plot used the projection depth, FOM used directional outlyingness, and the Order Extended Integrated Depth used the Integrated Halfspace depth. For FUNTA we used the robustified version rFUNTA. The Elastic depths were subdivided into Amplitude Depth (ED-A), Phase Depth (ED-P), and a combined depth (ED-B). The ED-B method identified outliers as those trajectories that were outliers under either ED-A or ED-P.

We define six different shape outlyingness scenarios to test the effectiveness of the above shape outlier detectors. Each of these scenarios is represented by one of the six models detailed below.

Model 1 (Amplitude Increase): Main model: and Contamination model: , where , is a centered Gaussian process with covariance function , and is a random additive translation term. The purpose of is to shift each curve by a random amount so as to mask shape outliers which could accidentally be identified as magnitude outliers.

Model 2 (Amplitude Decrease): Main model: and Contamination model: , where , and is the Gaussian process from Model 1.

Model 3 (Phase Contamination): Main model: and Contamination model: , where and is a random phase function from . The functions are generated from the first two Fourier basis functions with random amplitudes distributed as on the tangent space to the unit Hilbert sphere. We use to impose a large amount of phase variability on the contamination model.

Model 4 (Covariance change): Main model: and Contamination model: , where and and are centered Gaussian processes with covariance functions and , respectively.

Model 5 (Frequency Increase): Main model: and Contamination model: where and is the Gaussian process from Model 1.

Model 6 (Jump contamination): Main model: and Contamination model: , where and T is distributed uniformly on .

Figure 2: Main model (solid lines) v.s. Contamination model (dashed lines) in each of the outlier models.

Each of these models, except model 3, were then further contaminated with two additional sources of noise: compositional (phase) noise and magnitude outliers. Model 3 was only contaminated with magnitude outliers because adding phase noise would destroy the difference in phases that we are trying to detect. These two noise sources introduce a level of realism to our simulations because nuisance phase and magnitude outlyingness are often present when analyzing shapes. The base models without additional noise are pictured in Figure 2.

5.1 Contamination by a Single Anomaly

We first considered the case when a single outlier is present in the data. We compared each method on their ability to rank the outlier as the most outlying function. For each of the six outlier models we sampled 99 functions from the main model and 1 function from the contamination model, with compositional noise and magnitude outliers added to both. Each trajectory was sampled on the same equidistant 30 point grid over . 1000 simulations were used for each outlier model to estimate the average ranks in Table 1.

Method Model 1 Model 2 Model 3 Model 4 Model 5 Model 6
1 ED-B 1.000 1.001 2.146 1.008 1.000 1.001
2 ED-A 1.000 1.002 8.808 1.016 1.000 1.001
3 rFUNTA 1.430 23.192 1.802 7.564 1.000 2.044
4 TVD 1.000 26.341 2.355 6.228 1.000 1.226
5 ST-T1 1.000 20.125 1.979 15.361 1.058 1.000
6 ST-T2 1.285 13.582 1.591 15.700 1.712 10.909
7 FOM 1.976 20.389 3.014 16.464 2.035 7.014
8 ED-P 62.832 1.027 2.689 2.012 2.831 11.151
9 DIR 1.024 40.764 10.082 33.504 4.380 1.011
10 MS 1.000 44.306 9.412 36.242 2.370 1.000
11 OG 9.887 35.803 17.344 29.691 12.083 10.004
12 FDJ 9.576 40.953 12.002 38.746 14.959 11.190
13 GEOM 1.000 99.995 92.303 30.776 1.002 7.590
14 IDJ 33.140 40.763 42.730 38.871 40.575 38.972
15 ST-D1 51.031 50.828 53.573 51.779 52.746 52.656
Table 1: Average rank of the single outlier by detection method and outlier model. Lower ranks (closer to 1) indicate the detection method ranked the outlier as more outlying. Bold font indicates the top results in each outlier model.

The results are sorted so that the methods with the lowest average ranking (ED-B and ED-A) across all models are listed first and those with the highest are last. On each of the six outlier models, except model 3, both ED-B and ED-A had average rankings very near 1.000. This means that in most scenarios both of these methods can correctly rank the outlier as the most outlying function. Other methods such as TVD and ST-T1 succeed on models 1,5, and 6 where they almost always correctly rank the outlier but fail on models 2 and 4 where their average ranks are far from 1.000. The only meaningful difference between ED-B and ED-A is on model 3. Because ED-B can leverage phase information from the phase depth, ED-B tends to identify the phase outlier as more outlying than amplitude information alone could.

The results on models 2 and 5, amplitude decrease and frequency increase respectively, demonstrate an important feature of the elastic depths, which is not shared by other methods. Because the elastic depths are based on proper distance metrics in the amplitude and phase spaces, they can detect trajectories that lack amplitude and phase characteristics present in the rest of the data. Having either too low of an amplitude (phase) or too high of an amplitude (phase) are both indicative of shape outlyingness. Prior methods have primarily focused on trajectories with too high of an amplitude and thus struggle to adequately handle models 2 and 4.

5.2 Contamination by Multiple Anomalies

Next we considered the case when 10% of the data is outlying in shape. We compared the performance of the detection methods on the six outlier models through their False Positive Rates (FPR) and False Negative Rates (FNR) of outlier classification. The FPR and FPR are defined as:

Where are the number of false positives, true positives, false negatives, and true negatives respectively. Compositional noise and magnitude outliers were again added to each of the models save for model 3, where only magnitude outliers were added. This time 90 inlying trajectories and 10 outlying trajectories were sampled from the main model and contamination model respectively. Trajectories were sampled on an equidistant 30 point grid over and 1000 simulations were performed for each of the six models. The results are summarized in Tables 2 (False Positives) and 3 (False Negatives). The elastic depths and sequential transformations were each subdivided as in Section 5.1.

Method Model 1 Model 2 Model 3 Model 4 Model 5 Model 6
1 ST-D1 0.000 0.000 0.000 0.000 0.000 0.000
2 GEOM 0.000 0.000 0.000 0.000 0.000 0.000
3 IDJ 0.000 0.000 0.002 0.000 0.000 0.000
4 rFUNTA 0.000 0.016 0.000 0.005 0.000 0.000
5 FDJ 0.002 0.007 0.004 0.004 0.003 0.003
6 ST-T1 0.001 0.001 0.000 0.024 0.000 0.001
7 ST-T2 0.004 0.003 0.000 0.030 0.002 0.003
8 TVD 0.012 0.032 0.001 0.013 0.001 0.012
9 ED-A 0.038 0.036 0.023 0.009 0.003 0.039
10 OG 0.019 0.029 0.015 0.048 0.023 0.015
11 ED-P 0.045 0.026 0.021 0.129 0.010 0.027
12 ED-B 0.076 0.059 0.037 0.136 0.012 0.062
13 FOM 0.100 0.101 0.100 0.100 0.100 0.100
14 DIR 0.108 0.107 0.111 0.121 0.109 0.105
15 MS 0.147 0.138 0.145 0.179 0.139 0.138
Table 2: False Positive Rate (FPR) for each of the fifteen detection methods on the six outlier models. Results are generally near 0 across the board indicating that most methods incorrectly flag inliers as outliers under any of the models.

The entries in Table 2 were sorted by average FPR across all models, revealing that ST-T1, GEOM, IDJ, and rFUNTA had the lowest overall rates while MS and FOM had the highest. The zero or nearly zero false positives of ST-D1, IDJ, and rFUNTA are largely counteracted by their uniformly high FNRs, see Table 3, indicating that they are overly conservative in their detection. GEOM had a generally low FNR but still struggled on models 3 and 6.

Each of the elastic depth based methods tended to have a slightly higher false positive rates relative to other methods; though they still maintained low rates overall. This is particularly true when the amplitude depth was used to identify amplitude outliers (models 1, 2, 4, 5, 6) and the phase depth was used to identify phase outliers (model 3). In practice the elastic depth’s higher positive rates can be mitigated by employing thresholding (which was not used here) so that only the top m% most outlying functions are considered.

Method Model 1 Model 2 Model 3 Model 4 Model 5 Model 6
1 ED-B 0.000 0.000 0.053 0.000 0.000 0.000
2 ED-A 0.000 0.000 0.294 0.005 0.000 0.000
3 MS 0.000 0.873 0.024 0.670 0.002 0.000
4 DIR 0.000 0.897 0.044 0.814 0.036 0.000
5 TVD 0.000 0.995 0.043 0.758 0.000 0.041
6 ED-P 0.987 0.002 0.057 0.015 0.705 0.639
7 GEOM 0.000 0.191 0.824 0.654 0.000 0.821
8 ST-T1 0.000 1.000 0.414 0.891 0.520 0.011
9 OG 0.115 0.973 0.559 0.828 0.294 0.127
10 FDJ 0.411 0.999 0.482 0.981 0.533 0.921
11 ST-T2 0.886 0.996 0.146 0.799 0.957 0.993
12 rFUNTA 0.811 1.000 0.777 0.865 0.822 0.744
13 ST-D1 0.838 1.000 0.987 0.959 0.893 0.384
14 FOM 0.894 0.898 0.748 0.897 0.897 0.894
15 IDJ 1.000 1.000 1.000 1.000 1.000 1.000
Table 3: False Negative Rate (FNR) for each of the detection methods on the six outlier models. Unlike the FPR there are strong differences between the various methods. High FNR indicates that the method failed to flag outliers as outlying, indicating overly conservative behavior in light of the consistently low FPRs.

The entries in Table 3 were sorted by their average false negative rate across all models. The false negatives tended to be more concentrated about 0 and 1, indicating that most often either no outliers or all outliers were correctly identified in the simulations. The elastic depth methods, in particular ED-A and ED-B, had the lowest false positive rates across the six outlier models. In fact, they were both able to correctly identify almost every outlier in every simulation for all outlier models except Model 3 (phase contamination). In both of these cases ED-B was able to outperform ED-A since the information from the phase depth was being accounted for.

While ED-B maintains the lowest overall false negative rate, it consistently had a higher false positive rate than ED-A. The ED-A, or amplitude depth, therefore strikes the best balance between false positives and false negatives. In general ED-A should be preferred over a combined metric for detecting shape outliers and ED-P should only be used for detecting phase outliers.

6 Examples

To demonstrate the real-world performance of elastic Depth we applied it to three different data sets, one for each manifold. The first data set is a collection of U.S. Treasury yield curves (), the second is a sample from the MPEG-7 image data set (), and the last are hurricane tracks across the Atlantic Ocean (). We show that the elastic depths can be applied consistently across each of the three manifolds and analyzed using the same boxplot methodology.

6.1 U.S Daily Treasury Yield Curve Rates ()

We first consider the daily U.S. Treasury yield curves from January 2017 to April 2019 (United States Department of the Treasury, 2019). The daily yield curve is a plot of bond terms, or time to maturity, against the associated interest rate on a given day. The shape of the yield curve has long been taken as an indicator of economic activity. Ordinarily the yield curve is monotonically increasing as a function of time to maturity. However, in the months preceding recessions, the yield curve often becomes “flattened” and then “inverted” meaning that bonds with shorter maturity dates start to command higher interest rates than bonds with longer maturity dates.

We collected yield curve data for each day between January 1st 2017 to April 2019 with terms spanning from 1 month to 30 years. Though the yield curve is only supported on a finite set of points, we treat it as though it were a continuous trajectory supported on a compact subset of . This assumption is not altogether unreasonable given the relatively smooth relationship between time to maturity and interest rate. We then applied the elastic depth based boxplots with no thresholding to perform outlier detection, see Figure 5.

Figure 5: Left: Shape (amplitude) outlying daily U.S. Treasury yield curves (dashed lines) against all daily yield curves from Jan. 2017 to Apr. 2019. Right: Amplitude depths for each curve plotted over time. The bulk of the outlying curves ( triangles), i.e. “flat” curves occur during the end of the observation period.

There are two sets of outliers identified in Figure 5. The first is the yield curve on 09/05/17 where the 1 Month interest rate spiked over the 3 Month interest rate. Though the spike was enough to force that day’s yield curve to become outlying, the overall shape of the curve is still monotonic, so this wouldn’t signal a true inversion. The second set of outliers is the group of 26 yield curves, also highlighted in red, that correspond to the end of the observation period. Though these curves are not complete inversions they are considered “flat” since the 3 Month and 30 Year interest rates are similar. Because these flat curves follow a period of regular yield curve behavior they may be taken as a sign of an oncoming yield curve inversion and potentially an economic recession.

6.2 MPEG-7 Shape Data ()

Our next data example comes from the MPEG-7 shape data set (Manjunath et al., 2002). This data set consists of 1300 trajectories in , corresponding to 65 shape classes with 20 observations each. Each trajectory is an outline of some object such as an apple, turtle, or a butterfly sampled from frames in a video. Objects may be rotated, distorted, or magnified with respect to other objects of the same class. For this task we would typically not be interested in studying the phase or rotation variability; but instead try to remove them to consider amplitude alone.

To illustrate our method, we selected 16 shape classes from the available 65. We used a depth thresholding of 0.05 so that at most one outlier would be detected within each shape class. The identified amplitude outliers are displayed in red against the inlying shapes in blue on the left hand side of Figure 8 We can see that, depending on the class, elastic depth is able to identify anomalous trajectories with both too high and too low amplitudes with respect to the rest of the set. For instance, the outlying jellyfish in the second column of row four has its tentacles spreading out from all sides of its body, whereas the inlying jellyfish all have their tentacles on the right. On the other hand, the star in the fourth column of row two lacks the sharp features of the inlying set which is why it too was identified as outlying.

Figure 8: Left: 16 shape classes from MPEG-7 along with their amplitude outliers (dashed lines). Right: Most central function in each shape class (solid line) overlaid with the most outlying function (dashed line) for each of the 16 classes.

In some of the classes, such as the bone or the pentagon, it is not visually obvious why the identified outliers are outlying in shape. To understand why these trajectories may be shaped differently than the rest, we plotted the most outlying trajectory (red) against the most central trajectory (blue) for each class in the right hand side of Figure 8. This contrast makes clear that the bone outlier lacks a protrusion on the top and the pentagon outlier lacks spokes, features which would make them far in amplitude from other trajectories in their class. In other classes, such as the butterfly, there is no single feature which distinguishes the outlies so its outlyingness results from the culmination of numerous small shape differences.

6.3 HURDAT2 Hurricane Tracks ()

Our last data example comes from the National Hurricane Center’s (NHC) Atlantic Hurricane Database (HURDAT2) (Landsea and Franklin, 2013). The NHC assimilates all observations, real time and post-storm, for each tropical cyclone to estimate and record its characteristics and path across the Atlantic Ocean. The HURDAT2 database contains records for 979 tropical cyclone paths of various lengths, shapes, sizes, orientations, and placements. We only consider storms with at least 25 observations because only those paths had sufficient time to develop. The storms were then further subset to include only those originating in the ocean and north of South America.

The typical path of a hurricane is “U” shaped, starting in Africa then cutting across the eastern United States and finally heading back east towards Europe. Due to the great distances hurricanes travel it would be inappropriate to treat them as lying on a Euclidean plane. Instead, we consider them as trajectories on the surface of a unit sphere . We used the elastic depth boxplots with a depth thresholding of 0.05 to limit the number of amplitude outliers to fourteen; the top four of which are pictured in Figure 9.

Figure 9: Four most shape outlying hurricane tracks from the HURDAT2 data overlaid on the entire data set. The starting point for each track is marked by a point and tracks become progressively darker as they develop.

Each of the top four outliers are markedly different from the standard “U” shape. They exhibit an atypical spiraling behavior as they meander across the Atlantic. The identification of shape outliers helps climate scientists further investigate what causes the trajectories to be anomalous. It can be very important for improving the accuracy of hurricane prediction algorithms if the dynamics which produce anomalies are well understood.

7 Discussion

In this paper we proposed a new class of functional depths based on the elastic distance metrics and showed how they may be used to detect shape outliers. The theoretical properties of our new elastic depth were investigated, and it was shown that they satisfy most key properties required for depth metric. These include translation, scale, rearrangement, and phase invariance (equivariance), maximality of the center, convex level sets and monotonicity from the center. Rearrangement invariance and phase invariance were particularly important for detecting shape anomalies because they allowed the amplitude depth to measure centrality independent of phase.

We demonstrated the empirical performance of our method together we nine competing methods using extensive simulation studies. It was shown that our method had the lowest, nearly 0, false negative rates among all methods across all models. The elastic depths, in particular the amplitude depths, were very powerful in their ability to identify of anomalous trajectories under a wide variety of scenarios. However, the exhibited a slightly higher false positive rate compared to other methods. The relatively higher false positive rate, though still low overall, may be further mitigated by the depth thresholding outlined in Section 4.2.

Finally, we showed how the elastic depths may be used to identify outliers on three common data manifolds: , and . The data on were U.S. Treasury yield curves in which we identified a “flattening” pattern developing over April of 2019. On we identified shape outliers in sixteen different shape classes taken from the outlines of objects in a video. Finally, on we looked anomalous hurricane tracks and identified four tacks with wildly different shapes than the typical hurricane. These data examples illustrated the simplicity and consistency with which the Elastic depths may be applied, regardless of the underlying geometry.

Appendix A Appendix

Proof of Proposition 2.1 (Convexity of Level Sets)


Let and be in with amplitudes and in . Let such that the amplitude of is for some . The amplitude distance between and a random in with amplitude can be upper bounded as follows

by the convexity of the amplitude distance. Since this is a convex combination of positive real numbers, , we have that

Since and , by virtue of and in , we see that the outlyingness functions are equivalently bounded, i.e.

Consequently the outlyingness function is also bounded

hence and so is in . Thus the level sets induced on the amplitudes are convex. A similar proof shows that phase level sets are convex as well because the phase distance is convex. ∎

Proof of Proposition 2.2 (Uniform Consistency)


Define the -bracket as the set of all functions such that and . Because is a set of smooth functions, only a finite number of -brackets are needed to cover . Therefore for any there exists bracket such that


By the strong law for sample quantiles for any fixed so the right hand side goes to almost surely. Therefore if we take an sequence converging to 0 we get . Since forms an upper bound on we get the result

Similarly for the phase depth ,


  • Arribas-Gil and Romo (2014) Arribas-Gil, A. and Romo, J. (2014). Shape outlier detection and visualization for functional data: the outliergram. Biostatistics, 15:603–619.
  • Chakraborty and Chaudhuri (2014) Chakraborty, A. and Chaudhuri, P. (2014). On data depth in infinite dimensional spaces. Annals of the Institute of Statistical Mathematics, 66(2):303–324.
  • Cuesta-Albertos and Nieto-Reyes (2008) Cuesta-Albertos, J. A. and Nieto-Reyes, A. (2008). The random tukey depth. Computational Statistics Data Analysis, 52(11):4979–4988.
  • Dai and Genton (2018) Dai, W. and Genton, M. G. (2018). Multivariate functional data visualization and outlier detection. Journal of Computational and Graphical Statistics, 27(4):923–934.
  • Dai and Genton (2019) Dai, W. and Genton, M. G. (2019). Directional outlyingness for multivariate functional data. Computational Statistics Data Analysis, 131:50–65.
  • Huang and Sun (2019) Huang, H. and Sun, Y. (2019). A decomposition of total variation depth for understanding functional outliers. Technometrics, (just-accepted):1–21.
  • Hyndman and Shang (2010) Hyndman, R. J. and Shang, H. L. (2010). “rainbow plots, bagplots, and boxplots for functional data. Journal of Computational and Graphical Statistics, 19:29–45.
  • Kuhnt and Rehage (2016) Kuhnt, S. and Rehage, A. (2016). An angle-based multivariate functional pseudo-depth for shape outlier detection.

    Journal of Multivariate Analysis

    , 146:325–340.
  • Kurtek et al. (2011) Kurtek, S., Srivastava, A., and Wu, W. (2011). Signal estimation under random time-warpings and nonlinear signal alignment. In Proceedings of Neural Information Processing Systems (NIPS).
  • Landsea and Franklin (2013) Landsea, C. W. and Franklin, J. L. (2013). Atlantic hurricane database uncertainty and presentation of a new database format. Monthly Weather Review, 141(10):3576–3592.
  • Lee (2001) Lee, J. M. (2001). Introduction to smooth manifolds. Springer.
  • Liu (1990) Liu, R. (1990). On a notion of data depth based on random simplices. Annals of Statistics, 18(1):405–414.
  • Long and Huang (2015) Long, J. P. and Huang, J. Z. (2015). A study of functional depths. arXiv preprint:1506.01332.
  • Lopez-Pintado and Romo (2009) Lopez-Pintado, S. and Romo, J. (2009). On the concept of depth for functional data. Journal of the American Statistical Association, 104:718–734.
  • López-Pintado and Romo (2011) López-Pintado, S. and Romo, J. (2011). A half-region depth for functional data. Computational Statistics & Data Analysis, 55:1679–1695.
  • Manjunath et al. (2002) Manjunath, B. S., Salembier, P., and Sikora, T. (2002). Introduction to MPEG-7: multimedia content description interface. John Wiley & Sons.
  • Mosler and Polyakova (2012) Mosler, K. and Polyakova, Y. (2012). General notions of depth for functional data. arXiv preprint arXiv:1208.1981.
  • Myllymäki et al. (2017) Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H., and Hahn, U. (2017). Global envelope tests for spatial processess. Journal of the Royal Statistical Society: Series B, 79:381–404.
  • Nagy et al. (2017) Nagy, S., Gijbels, L., and Hlubinka, D. (2017). Depth-based recognition of shape outlying functions. Journal of Computational and Graphical Statistics, 26:883–893.
  • Narisetty and Nair (2016) Narisetty, N. and Nair, V. (2016). Extremal depth for functional data and applications. Journal of the American Statistical Association, 111:1705–1714.
  • Rousseeuw et al. (2018) Rousseeuw, P. J., Raymaekers, J., and Hubert, M. (2018). A measure of directional outlyingness with applications to image data and video. Journal of Computational and Graphical Statistics, 27:345–359.
  • Sguera et al. (2014) Sguera, C., Galeano, P., and Lillo, R. (2014). Spatial depth-based classification for functional data. Test, 23(4):725–750.
  • Srivastava et al. (2011) Srivastava, A., Klassen, E., Joshi, S. H., and Jermyn, I. H. (2011). Shape analysis of elastic curves in euclidean spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(7):1415–1428.
  • Srivastava and Klassen (2016) Srivastava, A. and Klassen, E. P. (2016). Functional and shape data analysis. Springer.
  • Su et al. (2014) Su, J., Kurtek, S., Klassen, E., Srivastava, A., et al. (2014). Statistical analysis of trajectories on riemannian manifolds: bird migration, hurricane tracking and video surveillance. The Annals of Applied Statistics, 8(1):530–552.
  • Sun and Genton (2011) Sun, Y. and Genton, M. (2011). Functional boxplots. Journal of Computational and Graphical Statistics, 20:316–334.
  • Sun et al. (2012) Sun, Y., Genton, M. G., and Nychka, D. W. (2012). Exact fast computation of band depth for large functional datasets: How quickly can one million curves be ranked? Stat, 1(1):68–74.
  • Tucker et al. (2013) Tucker, J. D., Wu, W., and Srivastava, A. (2013). Generative models for functional data using phase and amplitude separation. Computational Statistics and Data Analysis, 61:50–66.
  • Tukey (1977) Tukey, J. (1977). Exploratory Data Analysis. Addison-Wesley.
  • Tukey (1975) Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1975, volume 2, pages 523–531.
  • United States Department of the Treasury (2019) United States Department of the Treasury (2019). Daily treasury yield curve rates. https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yield.
  • Xie et al. (2017) Xie, W., Kurtek, S., Bharath, K., and Sun, Y. (2017). A geometric approach to visualization of variability in functional data. Journal of the American Statistical Association, 112(519):979–993.
  • Zuo and Serfling (2000) Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. Annals of Statistics, 28(2):461–482.