
Computing the Planar β-skeleton Depth

03/15/2018
by   Rasoul Shahsavarifar, et al.
University of New Brunswick

For β ≥ 1, the β-skeleton depth (SkD_β) of a query point q ∈ R^d with respect to a distribution function F on R^d is defined as the probability that q is contained within the β-skeleton influence region of a random pair of points from F. The β-skeleton depth of q ∈ R^d can also be defined with respect to a given data set S ⊆ R^d. In this case, computing the β-skeleton depth is based on counting all of the β-skeleton influence regions, obtained from pairs of points in S, that contain q. The β-skeleton depth introduces a family of depth functions that contains spherical depth and lens depth for β = 1 and β = 2, respectively. The straightforward algorithm for computing the β-skeleton depth in dimension d takes O(dn^2) time. This is a significant advantage of using the β-skeleton depth in multivariate data analysis because, unlike most other data depths, the time complexity of the β-skeleton depth grows linearly rather than exponentially in the dimension d. The main results of this paper include two algorithms. The first is an optimal algorithm that takes Θ(n log n) time for computing the planar spherical depth, and the second, with time complexity O(n^(3/2+ε)), computes the planar β-skeleton depth for β > 1. By a reduction from the problem of Element Uniqueness, we prove that computing the β-skeleton depth requires Ω(n log n) time. Some geometric properties of β-skeleton depth are also investigated in this paper. These properties indicate that simplicial depth (SD) is linearly bounded by β-skeleton depth. Some experimental bounds for different depth functions are also obtained in this paper.


1 Introduction

Rank statistic tests play an important role in univariate non-parametric statistics. Generalizing rank tests to the multivariate case raises the problem of defining a multivariate order statistic, since it is not clear how to define a multivariate order or rank statistic in a meaningful way. One approach to overcome this problem is to use the notion of data depth. In non-parametric multivariate data analysis, data depth measures the centrality of a point in a given data set. In other words, it indicates how deep a point is located with respect to the data set.

Over the last decades, various notions of data depth such as halfspace depth [11, 19, 21], simplicial depth [14], Oja depth [17], regression depth [18], and others have emerged as powerful tools for non-parametric multivariate data analysis. Most of them have been defined to solve specific problems in data analysis. They differ in application, definition, and the geometry of their central regions (regions with the maximum depth). For planar data depth functions, research on their algorithmic aspects can be found in [2, 3, 5, 6, 7, 8, 13, 16, 18].

In 2006, Elmore, Hettmansperger, and Xuan [9] defined another notion of data depth named spherical depth. It is defined as the probability that a point q is contained in a closed random hyperball with the diameter x_i x_j, where x_i and x_j are two random points from a common distribution function F. These closed hyperballs are known as the influence regions of the spherical depth function. In 2011, Liu and Modarres [15] modified the definition of the influence region and defined lens depth. Each lens depth influence region is defined as the intersection of two hyperballs B(x_i, ‖x_i − x_j‖) and B(x_j, ‖x_i − x_j‖). These influence regions of spherical depth and lens depth are the multidimensional generalizations of Gabriel circles and lunes in the definitions of the Gabriel Graph [10] and the Relative Neighbourhood Graph [20], respectively. In 2017, Yang [23] generalized the definition of the influence region and introduced a family of depth functions called β-skeleton depth, indexed by a single parameter β ≥ 1. The influence region of β-skeleton depth is defined to be the intersection of two hyperballs given by B(c_i, r) and B(c_j, r), where c_i and c_j are some combinations of x_i and x_j. Spherical depth and lens depth can be obtained from β-skeleton depth by considering β = 1 and β = 2, respectively. The β-skeleton depth has some nice properties including symmetry about the center, maximality at the center, vanishing at infinity, and monotonicity. Depending on whether Euclidean distance or Mahalanobis distance is used to construct the influence regions, the β-skeleton depth can be orthogonally invariant or affinely invariant. All of these properties are explored in [9, 15, 22, 23].

Although we focus on the planar case here, a notable characteristic of the β-skeleton depth is that its time complexity grows linearly in the dimension d, while for most other data depths the time complexity grows exponentially. To the best of our knowledge, the current best algorithm for computing the β-skeleton depth is the straightforward algorithm, which takes O(dn^2) time.

In this paper, we present an optimal algorithm for computing the spherical depth (SphD) in R^2 that takes Θ(n log n) time. We also introduce an algorithm for computing the planar β-skeleton depth, β > 1. Furthermore, we reduce the problem of Element Uniqueness to prove that computing the β-skeleton depth (SkD_β) of a query point requires Ω(n log n) time. We also investigate some geometric properties of β-skeleton depth. These properties lead us to bound the simplicial depth, spherical depth, and lens depth of a point in terms of one another. Finally, some experiments are provided to illustrate the relationships between β-skeleton depth and simplicial depth.

2 β-skeleton Depth

Definition 1.

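The β-skeleton influence region of x_i and x_j (S_β(x_i, x_j)) for β ≥ 0 is defined as follows:

  • for β = 0, S_β(x_i, x_j) is equivalent to the line segment x_i x_j.

  • for 0 < β < 1, S_β(x_i, x_j) is the intersection of two balls with the radius ‖x_i − x_j‖ / (2β) such that the boundaries contain both points x_i and x_j.

  • for β ≥ 1, the lune based version of S_β(x_i, x_j) is defined as:

    $$S_\beta(x_i, x_j) = B(c_i, r) \cap B(c_j, r) \tag{1}$$

    where r = (β/2)‖x_i − x_j‖, c_i = (β/2)x_i + (1 − β/2)x_j, and c_j = (1 − β/2)x_i + (β/2)x_j.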

Figure 1 illustrates the β-skeleton influence regions for different values of β.

Note 1.

It seems that for 0 < β < 1, the β-skeleton influence region in dimension d > 2 is not well defined because the two balls in Definition 1 are not unique. Since the β-skeleton depth is defined for β ≥ 1 [23], we ignore the case β < 1 in our study of the β-skeleton depth and its influence region.

Note 2.

In the literature, the ball based version of S_β is also defined. In this case, S_β is given by the union of the balls, instead of their intersection as in Equation (1). For example, the hatched area in Figure 1 denotes a ball based influence region. Since the definition of the β-skeleton depth is based on the lune based version alone, by S_β we only mean its lune based version hereafter in this paper.

Figure 1: The β-skeleton influence regions defined by x_i and x_j for β = 0, 0.5, 1, 2, 3, and ∞
Definition 2.

For integers and (), we use the shorter notation to represent the set .

Definition 3.

For a given data set S of points in general position in R^d and the parameter β, the β-skeleton of S is defined as the graph on S in which two points x_i and x_j are joined by an edge if and only if no other point of S belongs to S_β(x_i, x_j).

Definition 4.

The β-skeleton depth is defined as the probability that a point is contained within the β-skeleton influence region of two random vectors from a common distribution. For a distribution function F on R^d, and a vector q in R^d, the β-skeleton depth of q with respect to F is defined by Equation (2), where X_1 and X_2 are two random observations from F.

$$SkD_\beta(q; F) = P\big(q \in S_\beta(X_1, X_2)\big) \tag{2}$$
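
The probability in Equation (2) can be approximated by simulation. The following is a minimal sketch, assuming the lune based region with centres (β/2)x_1 + (1 − β/2)x_2 and (1 − β/2)x_1 + (β/2)x_2 and radius (β/2)‖x_1 − x_2‖; the function names, the sampler interface, and the choice of a standard normal distribution are illustrative assumptions rather than part of the paper.

```python
import math
import random

def in_region(q, x1, x2, beta):
    """Membership test for the lune based region S_beta(x1, x2), beta >= 1."""
    h = beta / 2.0
    c1 = [h * a + (1.0 - h) * b for a, b in zip(x1, x2)]
    c2 = [(1.0 - h) * a + h * b for a, b in zip(x1, x2)]
    r = h * math.dist(x1, x2)
    return max(math.dist(q, c1), math.dist(q, c2)) <= r

def estimate_population_depth(q, beta, sampler, trials=20000, seed=0):
    """Monte Carlo estimate of P(q in S_beta(X1, X2)) for X1, X2 drawn by `sampler`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        hits += in_region(q, sampler(rng), sampler(rng), beta)
    return hits / trials

def standard_normal_2d(rng):
    """Hypothetical example distribution F: the standard normal on R^2."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
```

For a distribution that is symmetric about the origin, such as the one assumed here, the spherical depth (β = 1) of the origin is 1/2, because the origin lies in the ball with diameter X_1 X_2 exactly when X_1 · X_2 ≤ 0. The estimate should therefore be close to 0.5 at the center and close to 0 far away from it.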
Definition 5.

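Let S = {x_1, ..., x_n} be a set of points in R^d. The β-skeleton depth of a point q ∈ R^d with respect to S is defined as the proportion of the β-skeleton influence regions S_β(x_i, x_j), 1 ≤ i < j ≤ n, that contain q. Using the indicator function I, this definition can be represented by Equation (3).

$$SkD_\beta(q; S) = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} I\big(q \in S_\beta(x_i, x_j)\big) \tag{3}$$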

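Referring to Equation (1), it can be verified that q ∈ S_β(x_i, x_j) is equivalent to the inequality (β/2)‖x_i − x_j‖ ≥ max{‖q − c_i‖, ‖q − c_j‖}, where c_i = (β/2)x_i + (1 − β/2)x_j and c_j = (1 − β/2)x_i + (β/2)x_j. To compute SkD_β(q; S) in a straightforward way, it is sufficient to check this inequality for all 1 ≤ i < j ≤ n. As such, the computational complexity of the β-skeleton depth in R^d is O(dn^2).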
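
The straightforward computation described above can be sketched as follows. This is a minimal illustration in Python, assuming points are given as coordinate tuples of equal dimension; the names `in_influence_region` and `skeleton_depth` and the small boundary tolerance are our own assumptions, not the pseudocode of the paper.

```python
import math
from itertools import combinations

def in_influence_region(q, xi, xj, beta, tol=1e-12):
    """Return True if q lies in the lune based region S_beta(xi, xj), beta >= 1."""
    half = beta / 2.0
    # Centres and radius of the two balls whose intersection is the region.
    ci = [half * a + (1.0 - half) * b for a, b in zip(xi, xj)]
    cj = [(1.0 - half) * a + half * b for a, b in zip(xi, xj)]
    r = half * math.dist(xi, xj)
    # The balls are closed; tol keeps boundary points inside despite rounding.
    return max(math.dist(q, ci), math.dist(q, cj)) <= r + tol

def skeleton_depth(q, points, beta=1.0):
    """Brute force beta-skeleton depth: O(d n^2) time for n points in R^d."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two data points")
    hits = sum(in_influence_region(q, xi, xj, beta)
               for xi, xj in combinations(points, 2))
    return hits / math.comb(n, 2)
```

With β = 1 the function returns the spherical depth and with β = 2 the lens depth. Each of the n(n − 1)/2 pairs costs O(d) arithmetic operations, which gives the O(dn^2) bound.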

In [15, 22, 23], it is proved that the β-skeleton depth functions satisfy the data depth framework provided by Zuo and Serfling [24] because these depth functions are monotonic, maximized at the center, and vanishing at infinity. The β-skeleton depth functions are also orthogonally (affinely) invariant if the Euclidean (Mahalanobis) distance is used to construct the β-skeleton influence regions.

2.1 Spherical Depth and Lens Depth

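As we discussed in Section 2, the β-skeleton depth is a family of statistical depth functions that includes the spherical depth when β = 1, and the lens depth when β = 2. From Equations (1) and (3), the definitions of spherical depth (SphD) and lens depth (LD) of a query point q with respect to a given data set S in R^d are as follows:

$$SphD(q; S) = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} I\big(q \in Sph(x_i, x_j)\big) \tag{4}$$
$$LD(q; S) = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} I\big(q \in L(x_i, x_j)\big) \tag{5}$$

where the influence regions Sph(x_i, x_j) and L(x_i, x_j) are equal to S_1(x_i, x_j) and S_2(x_i, x_j), respectively.
Figure 2 shows the spherical depth and lens depth of points in the plane with respect to a set of three points in R^2.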

Figure 2: Spherical depth (left figure) and lens depth (right figure) of points in the plane

3 Algorithms

The current best algorithm for computing the β-skeleton depth of a point q with respect to a data set S is the brute force algorithm. This naive algorithm needs to check all of the β-skeleton influence regions obtained from the data points to figure out how many of them contain q. Checking all of these influence regions causes the naive algorithm to take O(n^2) time in the plane. In this section, we present an optimal algorithm for computing the planar spherical depth (SphD) and an algorithm to compute the planar β-skeleton depth when β > 1. In these algorithms, we need to solve some halfspace and some circle range counting problems, where all of the halfspaces have one common point. The circles also have the same characteristic. In the spherical depth algorithm, we have the halfspace range counting problems alone, whereas in computing the β-skeleton depth for β > 1, we need to solve both circle and halfspace range counting problems.

3.1 Optimal Algorithm for Computing the Planar Spherical Depth of a Query Point

Instead of checking all of the spherical influence regions, we focus on the geometric aspects of such regions in R^2. The geometric properties of these regions lead us to develop an algorithm for the computation of the planar spherical depth of q.

Lemma 1.

For arbitrary points a, b, and t in R^2, t ∈ Sph(a, b) if and only if ∠atb ≥ π/2.

Proof.

If t is on the boundary of Sph(a, b), Thales' Theorem (a special case of the Inscribed Angle Theorem: if a, b, and c are points on a circle where the segment ab is a diameter of the circle, then ∠acb is a right angle) suffices as the proof in both directions. For the rest of the proof, by t ∈ Sph(a, b) we mean that t lies in the interior of Sph(a, b).
⇒) For t ∈ Sph(a, b), suppose that ∠atb < π/2 (proof by contradiction). We continue the line segment at to cross the boundary of Sph(a, b). Let t' be the crossing point (see the left figure in Figure 3). Since ∠atb < π/2, the supplementary angle ∠btt' is greater than π/2. From Thales' Theorem, we know that ∠at'b is a right angle. The angle ∠tbt' > 0 because t ≠ t'. Summing up the angles in the triangle tbt', as computed in (6), leads to a contradiction. So, this direction of the proof is complete.

$$\angle btt' + \angle tt'b + \angle tbt' > \pi \tag{6}$$

⇐) If ∠atb ≥ π/2, we prove that t ∈ Sph(a, b). Suppose that t ∉ Sph(a, b) (proof by contradiction). Since t ∉ Sph(a, b), at least one of the line segments at and bt crosses the boundary of Sph(a, b). Without loss of generality, assume that at is the one that crosses the boundary of Sph(a, b) at the point t' (see the right figure in Figure 3). Considering Thales' Theorem, we know that ∠at'b = π/2 and consequently, ∠bt't = π/2. The angle ∠tbt' > 0 because t ≠ t'. If we sum up the angles in the triangle tt'b, the same contradiction as in (6) will be implied. ∎
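
The criterion of Lemma 1 can be checked without computing any angle, because an angle at t is at least a right angle exactly when the dot product of the two vectors leaving t is non-positive. The following is a small sketch under that observation; the function names are illustrative, and the degenerate case where t coincides with one endpoint is treated as a boundary point.

```python
import math

def in_sph_by_distance(t, a, b):
    """t lies in the closed disk whose diameter is the segment ab."""
    centre = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    return math.dist(t, centre) <= math.dist(a, b) / 2.0

def in_sph_by_angle(t, a, b):
    """Lemma 1: the angle at t is >= pi/2 iff (a - t) . (b - t) <= 0."""
    dot = (a[0] - t[0]) * (b[0] - t[0]) + (a[1] - t[1]) * (b[1] - t[1])
    return dot <= 0
```

Both tests agree because ‖t − (a + b)/2‖^2 − ‖a − b‖^2 / 4 expands to (a − t) · (b − t). The dot product form avoids square roots and is the one used when angle comparisons are carried out in Cartesian coordinates.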

Figure 3: Point t and spherical influence region Sph(a, b)

Algorithm 1:

Using Lemma 1, we present an algorithm to compute the spherical depth of a query point q with respect to S. This algorithm is summarized in the following steps. The pseudocode of this algorithm is provided in the Appendix.

  • Translating the points: Suppose that T is the translation by −q. We apply T to translate q and all data points into their new coordinates. Obviously, T(q) is the origin.

  • Sorting the translated data points: In this step we sort the translated data points based on their angles in their polar coordinates. After doing this step, we have a sorted array of the translated data points.

  • Calculating the spherical depth: For the element in , we define and as follows:

    (7)

    Thus the spherical depth of with respect to , can be computed by:

    (8)

    To present a formula for computing , we define and as follows:

    Figure 4 illustrates , , , and in two different cases. Considering the definitions of and ,

This allows us to compute using a pair of binary searches.

Figure 4: (left figure), and (right figure)

Time complexity of Algorithm 1:

The first procedure in the algorithm takes O(n) time to translate q and all data points into the new coordinate system. The second procedure takes O(n log n) time. In this procedure, the loop iterates n times, and the sorting algorithm takes O(n log n) time. Because a binary search is used for every element of the sorted array, the running time of the last procedure is also O(n log n). The rest of the algorithm contributes some constant time. In total, the running time of the algorithm is O(n log n).
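
The three steps of Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the pseudocode from the Appendix: it assumes planar points given as (x, y) tuples, uses floating-point polar angles with a small tolerance `eps` (exact predicates would be preferable near right angles), and counts, for each translated point, the points whose direction differs by at least π/2 using two binary searches on a doubled angle array. All names are our own.

```python
import math
from bisect import bisect_left, bisect_right

def spherical_depth_planar(q, points, eps=1e-12):
    """O(n log n) planar spherical depth of q with respect to points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two data points")
    # Step 1: translate so that q becomes the origin.
    shifted = [(x - q[0], y - q[1]) for x, y in points]
    # Points equal to q lie on the boundary of every region they span.
    nonzero = [p for p in shifted if p[0] != 0 or p[1] != 0]
    zeros = n - len(nonzero)
    # Step 2: sort by polar angle in [0, 2*pi).
    two_pi = 2.0 * math.pi
    angles = sorted(math.atan2(y, x) % two_pi for x, y in nonzero)
    # Doubling the array turns circular angle ranges into ordinary ranges.
    ext = angles + [a + two_pi for a in angles]
    # Step 3: by Lemma 1, a pair counts iff the angle at the origin is >= pi/2.
    twice = 0
    for theta in angles:
        lo = theta + math.pi / 2.0 - eps
        hi = theta + 3.0 * math.pi / 2.0 + eps
        twice += bisect_right(ext, hi) - bisect_left(ext, lo)
    # Every qualifying pair was counted once from each endpoint.
    pairs = twice / 2.0 + zeros * (n - zeros) + math.comb(zeros, 2)
    return pairs / math.comb(n, 2)
```

As a usage example, for the points (2, 0), (−2, 1), (1, −3) and the query point at the origin, two of the three pairs subtend an angle of at least π/2 at the origin, so the function returns 2/3 up to rounding.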

Coordinate system:

In practice it may be preferable to work in the Cartesian coordinate system. Sorting by angle can be done using some appropriate right-angle tests (determinants). Regarding the other angle comparisons, they can be done by checking the sign of dot products.
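
A trigonometry-free angular sort along these lines can be sketched as follows. The comparator first separates directions into the half-plane of angles [0, π) and the half-plane of angles [π, 2π), and then orders two directions in the same half-plane by the sign of a 2×2 determinant. The names `half`, `angle_cmp`, `sort_by_angle`, and `angle_at_least_right` are illustrative assumptions; points are assumed to be non-zero after translation.

```python
from functools import cmp_to_key

def half(p):
    """0 for polar angles in [0, pi), 1 for polar angles in [pi, 2*pi)."""
    x, y = p
    return 0 if (y > 0 or (y == 0 and x > 0)) else 1

def angle_cmp(p, r):
    """Compare two non-zero points by polar angle using only sign tests."""
    hp, hr = half(p), half(r)
    if hp != hr:
        return hp - hr
    cross = p[0] * r[1] - p[1] * r[0]
    # cross > 0 means r is counter-clockwise from p, so p comes first.
    return -1 if cross > 0 else (1 if cross < 0 else 0)

def sort_by_angle(points):
    """Sort non-zero planar points by polar angle without trigonometry."""
    return sorted(points, key=cmp_to_key(angle_cmp))

def angle_at_least_right(p, r):
    """True iff the angle between p and r at the origin is >= pi/2."""
    return p[0] * r[0] + p[1] * r[1] <= 0
```

With integer or rational coordinates every test above is exact, which avoids the rounding issues of comparing floating-point angles near π/2.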

3.2 Algorithms for Computing the Planar β-skeleton Depth (β > 1) of a Query Point

As illustrated in Figure 1, S_β(x_i, x_j) forms some lenses, and S_∞(x_i, x_j) forms some slabs for different x_i and x_j in R^2. Using some geometric properties of such lenses and slabs, we prove Lemma 2. This lemma, along with some results on range counting problems studied by Agarwal in [1], helps us to compute SkD_β(q; S) in O(n^(3/2+ε)) time, where q and S are in R^2.

Definition 6.

For an arbitrary non-zero point and parameter , is a line that is perpendicular to at the point . This line forms two halfspaces and . The one that includes the origin is and the other one that includes is .

Definition 7.

For a disk with the center and radius , is the intersection of and , and is the intersection of and , where and is an arbitrary non-zero point in .

Figure 5 is an illustration of these definitions for different values of parameter .

Figure 5: The and defined by for , where
Lemma 2.

For arbitrary non-zero points , in and parameter , if and only if the origin is contained in , where , , and .

Proof.

First, we show that is a well-defined set meaning that intersects . We compute , the distance of from , and prove that this value is not greater than . It can be verified that . Let ; the following calculations complete this part of the proof.



We recall Definition 1 for , , where and . Using this definition, following equivalencies can be derived from .


By solving these inequalities for which is equal to , we have:

(9)

For a fixed point , the inequalities in Equation (9) determine one halfspace and one disk given by (10) and (11), respectively.

(10)
(11)

The proof is complete because for a point , the set of all points containing in the feasible region defined by Equations (10) and (11) is equal to . ∎
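
The algebra behind the halfspace and the disk can be written out explicitly. The following derivation is a sketch under our own notation, which is an assumption and may differ from the symbols used in Lemma 2: a and b are non-zero translated data points, the query point is the origin, and c_a, c_b, r are the centres and radius of Equation (1) for the pair (a, b).

```latex
\begin{align*}
r^2 &= \tfrac{\beta^2}{4}\left(\|a\|^2 + \|b\|^2 - 2\,a\cdot b\right), \\
\|c_a\|^2 &= \left\|\tfrac{\beta}{2}a + \left(1-\tfrac{\beta}{2}\right)b\right\|^2
  = \tfrac{\beta^2}{4}\|a\|^2 + \left(1-\tfrac{\beta}{2}\right)^2\|b\|^2
    + \beta\left(1-\tfrac{\beta}{2}\right) a\cdot b, \\
r^2 - \|c_a\|^2 &= (\beta-1)\,\|b\|^2 - \beta\, a\cdot b, \\
r^2 - \|c_b\|^2 &= (\beta-1)\,\|a\|^2 - \beta\, a\cdot b
  \qquad\text{(by symmetry)}.
\end{align*}
% Hence the origin lies in S_beta(a, b) iff
%   beta * (a . b) <= (beta - 1) * min(|a|^2, |b|^2).
% For fixed a and beta > 1 this splits into two conditions on b:
\begin{align*}
a\cdot b &\le \tfrac{\beta-1}{\beta}\,\|a\|^2
  && \text{(halfspace bounded by the line perpendicular to } a
     \text{ through } \tfrac{\beta-1}{\beta}\,a\text{)}, \\
\left\|b - \tfrac{\beta}{2(\beta-1)}\,a\right\|^2 &\ge
  \left(\tfrac{\beta}{2(\beta-1)}\right)^2\|a\|^2
  && \text{(outside the open disk through the origin centred at }
     \tfrac{\beta}{2(\beta-1)}\,a\text{)}.
\end{align*}
```

For β = 1 the disk condition disappears and the halfspace condition reduces to a · b ≤ 0, which is the right-angle criterion of Lemma 1.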

Algorithm 2:

Using Lemma 2, we present an algorithm to compute the β-skeleton depth of q with respect to S. This algorithm is summarized in two steps. Pseudocode for this algorithm can be found in the Appendix.

  • Translating the points: This step is exactly the same step as in Algorithm 1.

  • Calculating the -skeleton depth: Suppose that is an element in (translated ). We consider a disk and a line as follows:

    where , , , and are defined in Lemma 2. From Theorem 1.2 proved in [1], we can compute with storage, expected preprocessing time, and query time, where is the number of all elements of that are contained in . For the elements of , which is defined as the number of elements containing in the interior of can also be computed with the same storage, expected preprocessing time, and query time. We recall that is the intersection of halfspace and disk , where , , and are some functions of . Finally, which is equal to can be computed by Equation (12).

    (12)

    Note that and , referring to Definitions 6 and 7, can be obtained from and , respectively in constant time.

Lemma 2 and Algorithm 2 are valid for . However, the case (Algorithm 1 for spherical depth) can also be included if we replace with an empty set for . In this case, we need to solve the halfspace range counting problems alone and therefore,

Time complexity of Algorithm 2:

The translating procedure, as discussed in Algorithm 1, takes O(n) time. After the expected preprocessing time of the range counting data structure, the second procedure takes O(n^(3/2+ε)) time. In this procedure, the loop iterates n times, and the range counting queries take O(n^(1/2+ε)) time each. The expected preprocessing time is required to obtain a data structure for the aforementioned range counting algorithms. The rest of the algorithm takes some constant time per loop iteration, and therefore the total expected running time of the algorithm is O(n^(3/2+ε)).
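
The counting structure of Algorithm 2 can be illustrated with a simplified sketch. The code below translates the points and then, for every translated point a, counts the points b that satisfy one halfspace condition and one disk condition. It replaces the range counting data structure of [1] with a plain inner loop, so it runs in O(n^2) time and is only meant to show what each query counts. The explicit form of the two conditions, β(a · b) ≤ (β − 1)‖a‖^2 and β(a · b) ≤ (β − 1)‖b‖^2, is our own derivation from Equation (1), and the function name is hypothetical.

```python
import math

def skeleton_depth_by_counting(q, points, beta):
    """Planar beta-skeleton depth via per-point halfspace/disk counting.

    A quadratic stand-in for the range counting queries: the inner loop
    plays the role of one halfspace query and one disk query.
    """
    if beta < 1:
        raise ValueError("beta must be at least 1")
    n = len(points)
    if n < 2:
        raise ValueError("need at least two data points")
    # Translate so that the query point is the origin.
    shifted = [(x - q[0], y - q[1]) for x, y in points]
    norms = [x * x + y * y for x, y in shifted]
    twice = 0
    for i, a in enumerate(shifted):
        for j, b in enumerate(shifted):
            if i == j:
                continue
            dot = a[0] * b[0] + a[1] * b[1]
            # b lies in the halfspace determined by a and beta.
            in_halfspace = beta * dot <= (beta - 1.0) * norms[i]
            # b lies outside the open disk determined by a and beta.
            outside_disk = beta * dot <= (beta - 1.0) * norms[j]
            if in_halfspace and outside_disk:
                twice += 1
    # Each qualifying pair is counted once from each of its endpoints.
    return twice / 2.0 / math.comb(n, 2)
```

For β = 1 both conditions collapse to a · b ≤ 0, so the same routine also covers the spherical depth, in line with the remark that only halfspace counting is needed in that case.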

4 Lower Bounds for Computing the β-skeleton Depth of a Point in the Plane

We reduce the problem of Element Uniqueness (given a set A = {a_1, ..., a_n}, is there a pair of indices i, j with i ≠ j such that a_i = a_j?) to the problem of computing the β-skeleton depth. It is known that the question of Element Uniqueness has a lower bound of Ω(n log n) in the algebraic decision tree model of computation proposed in [4].
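
For reference, Element Uniqueness itself can be decided in O(n log n) time by sorting and comparing neighbours, which matches the lower bound in the comparison-based setting. The sketch below only illustrates the problem being reduced from; the function name is our own and it is not part of the reduction.

```python
def all_unique(values):
    """Decide Element Uniqueness by sorting: O(n log n) comparisons."""
    ordered = sorted(values)
    # After sorting, equal elements must be adjacent.
    return all(x != y for x, y in zip(ordered, ordered[1:]))
```

The reductions in this section work in the opposite direction: a fast depth algorithm would yield a fast uniqueness test, so the depth computation cannot be asymptotically faster than Element Uniqueness.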

Theorem 1.

Computing the spherical depth of a query point in the plane takes Ω(n log n) time.

Proof.

We show that finding the spherical depth allows us to answer the question of Element Uniqueness. Suppose that , for is a given set of real numbers. We suppose all of the numbers to be positive (negative), otherwise we shift the points onto the positive X-axis. For every we construct four points , , , and in the polar coordinate system as follows:

where and . Thus we have a set of points , for . See Figure 6. The Cartesian coordinates of the points can be computed by:

We select the query point , and present an equivalent form of Equation (7) for as follows:

(13)

We compute in order to answer the Element Uniqueness problem. Suppose that every is a unique element. In this case, because, from (13), it can be figured out that the expanded is as follows:



Referring to Lemma 1 and Equation (8),

Now suppose that there exist some such that in . In this case, from Equation  (13), it can be seen that:

where (see Figure 6). As an example, for , because the expanded form of these two sets is as follows: (without loss of generality, assume )

Lemma 1 and Equation (8) imply that:

Therefore the elements of the input set are unique if and only if the spherical depth of the query point with respect to the constructed point set equals the value obtained above for the duplicate-free case. This implies that the computation of spherical depth requires Ω(n log n) time. It is necessary to mention that the only computation in the reduction is the construction of the point set, which takes O(n) time. Finally, we mention that the reduction does not depend on the sorted order of the elements. ∎

Figure 6: A representation of , , and duplications in these sets
Note 3.

Instead of four copies of the elements of , we could consider two copies of such elements to construct . However, the depth calculation becomes more complicated in this case.

Lemma 3.

Suppose that and are two sets defined as follows:

For a unique element in , .

Proof.

Suppose that for some (). We prove that such does not exist. If , it is obvious that and cannot be an element of . For the case , we assume that which means that

(14)

From the cosine formula (for a triangle abc, ‖a − b‖^2 = ‖a − c‖^2 + ‖b − c‖^2 − 2‖a − c‖‖b − c‖ cos(∠acb)) in triangle , we have

(15)

Equations (14) and (15) imply that

and

This means that which contradicts the assumption of

Theorem 2.

Computing the lens depth of a query point in the plane takes Ω(n log n) time.

Proof.

Suppose that , for is a given set of real numbers. Without loss of generality, we let these numbers to be positive (see the proof of Theorem 1). For , we construct set of points in the polar coordinate system such that and . See Figure 7. We select the query point , and define as follows:

(16)

Using Equation (16), the unnormalized form of Equation (5) can be presented by:

(17)

We solve the problem of Element Uniqueness by computing . Suppose that every is a unique element. In this case, it can be verified that (see Lemma 3). Equation (17) implies that:

Now assume that there exists some such that in . In this case,

As such,

For the case of having more duplications among the elements of ,

(18)

where the quantity appearing in Equation (18) is the number of duplications. Therefore the elements of the input set are unique if and only if this number is zero in Equation (18). This implies that the computation of lens depth requires Ω(n log n) time. Note that all of the other computations in this reduction take O(n) time. ∎

Note 4.

This reduction technique can be generalized to prove Theorem  3. It is enough to choose the rotation angle in the construction of . For the case of , where , we construct as . Note that we use the real RAM model of computation, where we can compute the square root of a real number in constant time.

Theorem 3.

For β ≥ 1, computing the β-skeleton depth of a query point in the plane requires Ω(n log n) time.

Figure 7: A representation of , , and duplications in these sets

5 Relationships Among Spherical Depth, Lens Depth, and Simplicial Depth

Theorem 4.

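For q ∈ R^d and S ⊆ R^d, SphD(q; S) ≤ LD(q; S).

Proof.

From Definition 1 for the spherical and lens influence regions of any arbitrary pair of points x_i and x_j in S, it can be seen that Sph(x_i, x_j) is contained in L(x_i, x_j). Hence Equation (19) is sufficient to complete the proof.

$$SphD(q; S) = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} I\big(q \in Sph(x_i, x_j)\big) \le \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} I\big(q \in L(x_i, x_j)\big) = LD(q; S) \tag{19}$$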
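
The containment used in this proof can be observed numerically. The sketch below computes both depths by brute force, testing the spherical region as the ball whose diameter is the segment between the two points and the lens as the intersection of the two balls centred at the points with radius equal to their distance. The function name is an illustrative assumption.

```python
import math
from itertools import combinations

def sphd_and_ld(q, points):
    """Return (spherical depth, lens depth) of q by checking every pair."""
    sph_hits = lens_hits = 0
    for a, b in combinations(points, 2):
        d = math.dist(a, b)
        mid = [(u + v) / 2.0 for u, v in zip(a, b)]
        # Spherical region: ball with diameter ab.
        sph_hits += math.dist(q, mid) <= d / 2.0
        # Lens region: intersection of B(a, d) and B(b, d).
        lens_hits += max(math.dist(q, a), math.dist(q, b)) <= d
    pairs = math.comb(len(points), 2)
    return sph_hits / pairs, lens_hits / pairs
```

For example, with the data points (0, 0) and (2, 0) and the query point (1, 1.2), the query lies outside the ball with diameter 2 but inside the lens, so the spherical depth is 0 while the lens depth is 1.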

Definition 8.

The simplicial depth of a query point q with respect to a data set S is defined by: