 # Sketched MinDist

We consider sketch vectors of geometric objects $J$ through the MinDist function $v_i(J) = \inf_{p \in J} \|p - q_i\|$ for $q_i \in Q$ from a point set $Q$. Collecting the vector of these sketch values induces a simple, effective, and powerful distance: the Euclidean distance between these sketched vectors. This paper shows how large this set $Q$ needs to be under a variety of shapes and scenarios. For hyperplanes we provide a direct connection to the sensitivity sampling framework, so relative error can be preserved in $d$ dimensions using $|Q| = O(d/\varepsilon^2)$. However, for other shapes, we show we need to enforce a minimum distance parameter $\rho$, and a domain size $L$. For $d=2$ the sample size $|Q|$ can then be $\tilde{O}((L/\rho) \cdot 1/\varepsilon^2)$. For objects (e.g., trajectories) with at most $k$ pieces this can provide stronger "for all" approximations with $\tilde{O}((L/\rho) \cdot k^3/\varepsilon^2)$ points. Moreover, with similar size bounds and restrictions, such trajectories can be reconstructed exactly using only these sketch vectors.


## 1 Introduction

In this paper (and a more empirically-focused companion paper) we introduce a new distance between geometric objects, $d_Q$. For an object $J \in \mathcal{J}$, where $J \subset \mathbb{R}^d$, this depends on a set of landmarks $Q \subset \mathbb{R}^d$; for now let $n = |Q|$. These landmarks induce a sketched representation $v_Q(J) \in \mathbb{R}^n$ where the $i$th coordinate $v_i(J)$ is defined via a MinDist operation

$$v_i(J) = \inf_{p \in J} \|p - q_i\|,$$

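using the $i$th landmark $q_i \in Q$. When the object $J$ is implicit, we simply write $v_i$. Then our new distance $d_Q$ between two objects $J_1, J_2$ is simply the (normalized) Euclidean distance between the sketched representations

$$d_Q(J_1, J_2) = \|\bar{v}_Q(J_1) - \bar{v}_Q(J_2)\|,$$

where $\bar{v}_Q = \frac{1}{\sqrt{n}} v_Q$.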
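As a concrete illustration of the definitions above, the following is a minimal sketch for the case where each object is a piecewise-linear curve given by its vertices. The function names (`point_segment_dist`, `mindist_sketch`, `d_Q`) and the polyline representation are assumptions made for this example, not notation from the paper.

```python
import numpy as np

def point_segment_dist(q, a, b):
    """Euclidean distance from point q to the closed segment [a, b]."""
    ab = b - a
    denom = float(ab @ ab)
    # Degenerate segment (a == b): the closest point is a itself.
    t = 0.0 if denom == 0.0 else float(np.clip((q - a) @ ab / denom, 0.0, 1.0))
    return float(np.linalg.norm(q - (a + t * ab)))

def mindist_sketch(vertices, Q):
    """v_Q(J): distance from each landmark q_i to the closest point of polyline J."""
    V = np.asarray(vertices, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if len(V) == 1:  # a single point is a valid (degenerate) object
        return np.linalg.norm(Q - V[0], axis=1)
    return np.array([
        min(point_segment_dist(q, V[j], V[j + 1]) for j in range(len(V) - 1))
        for q in Q
    ])

def d_Q(J1, J2, Q):
    """Normalized Euclidean distance between the two sketch vectors."""
    diff = mindist_sketch(J1, Q) - mindist_sketch(J2, Q)
    return float(np.sqrt(np.mean(diff ** 2)))

# Example: two horizontal segments and three landmarks in the plane.
Q = [[0.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
J1 = [[-1.0, 1.0], [1.0, 1.0]]
J2 = [[-1.0, 3.0], [1.0, 3.0]]
print(mindist_sketch(J1, Q), d_Q(J1, J2, Q))
```

Once the sketches are stored, comparing two objects costs one pass over $n$ numbers, independent of the complexity of the objects themselves.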

Our companion paper introduces other variants of this distance (using other norms or using the points on each object). We focus on this version as it is the simplest, cleanest, easiest to use, and was the best or competitive with the best on all empirical tasks. Indeed, for the pressing case of measuring a distance between trajectories, this new distance measure dominates a dozen other distance measures (including dynamic time warping, discrete Fréchet distance, edit distance for real sequences) in terms of classification performance, and is considerably more efficient in clustering and nearest neighbor tasks.

The goal of this paper is to formally understand how many landmarks in $Q$ are needed for various error guarantees, and how to choose the locations of these points.

Our aims in the choice of $Q$ are two-fold: first, we would like to approximate $d_Q$ with $d_{\tilde{Q}}$, and second we would like to recover $J \in \mathcal{J}$ exactly using only $v_Q(J)$. The specific results vary depending on the initial set $Q$ and the object class $\mathcal{J}$. More precisely, the approximation goal aims to preserve $d_Q$ for all objects in some class $\mathcal{J}$ with a subset $\tilde{Q} \subset Q$ of landmarks, or possibly a weighted set of landmarks $(\tilde{Q}, W)$ with $|\tilde{Q}| = N$, so each $q_i \in \tilde{Q}$ is associated with a weight $w_i \in W$ and the weighted distance is defined

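$$d_{\tilde{Q},W}(J_1, J_2) = \sqrt{\sum_{i=1}^{N} w_i \cdot \left(v_i(J_1) - v_i(J_2)\right)^2} = \left\|\tilde{v}_{\tilde{Q}}(J_1) - \tilde{v}_{\tilde{Q}}(J_2)\right\|,$$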

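where $\tilde{v}_{\tilde{Q}}(J) = (\tilde{v}_1(J), \ldots, \tilde{v}_N(J))$ with $\tilde{v}_i(J) = \sqrt{w_i}\, v_i(J)$. Specifically, our aim is a $(\rho, \varepsilon, \delta)$-approximation of $d_Q$ over $\mathcal{J}$, so that when $(\tilde{Q}, W)$ is selected by a random process that succeeds with probability at least $1 - \delta$, then for a pair $J_1, J_2 \in \mathcal{J}$ with $d_Q(J_1, J_2) \geq \rho$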

$$(1-\varepsilon)\, d_Q(J_1, J_2) \leq d_{\tilde{Q},W}(J_1, J_2) \leq (1+\varepsilon)\, d_Q(J_1, J_2).$$

When this holds for all pairs in $\mathcal{J}$, we say it is a strong $(\rho, \varepsilon, \delta)$-approximation of $d_Q$ over $\mathcal{J}$. In some cases we can set $\delta = 0$ (the process is deterministic) or $\rho = 0$ (this preserves even arbitrarily small distances), and may be able to use uniform weights for all selected points.
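To make the approximation goal concrete, here is a small sketch of the weighted distance and of the two-sided relative-error check for a single pair. The helper names (`weighted_sketch_distance`, `is_eps_approx`) and the toy sketch vectors are hypothetical, chosen only for illustration.

```python
import numpy as np

def weighted_sketch_distance(v1, v2, idx, w):
    """Weighted distance using only the landmarks selected by idx, with weights w."""
    v1, v2, w = (np.asarray(a, dtype=float) for a in (v1, v2, w))
    diff = v1[idx] - v2[idx]
    return float(np.sqrt(np.sum(w * diff ** 2)))

def is_eps_approx(d_full, d_weighted, eps):
    """Check (1 - eps) * d_Q <= weighted distance <= (1 + eps) * d_Q for one pair."""
    return (1 - eps) * d_full <= d_weighted <= (1 + eps) * d_full

# Toy sketch vectors over n = 4 landmarks.
v1 = np.array([1.0, 1.0, 2.0, 0.0])
v2 = np.array([3.0, 1.0, 2.0, 1.0])

# Using every landmark with uniform weight 1/n recovers the normalized distance.
d_full = weighted_sketch_distance(v1, v2, np.arange(4), np.full(4, 0.25))

# A two-landmark subset with uniform weights 1/2.
d_sub = weighted_sketch_distance(v1, v2, np.array([0, 3]), np.array([0.5, 0.5]))
print(d_full, d_sub, is_eps_approx(d_full, d_sub, 0.5))
```

A strong approximation would require this check to pass simultaneously for every pair in the class, not just the one pair tested here.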

### 1.1 Our Results

We begin with a special signed variant of the distance associated with the class $\mathcal{H}$ of $(d-1)$-dimensional hyperplanes (which for instance could model linear separators or linear regression models). The signed variant provides $v_i(h)$ a negative value on one side of the separator. In this variant, we show that if $Q$ is full rank, then we can recover $h$ from $v_Q(h)$, and a variant of sensitivity sampling can be used to select points to provide a $(0, \varepsilon, \delta)$-approximation. Alternatively, a suitably sized sample results in a strong $(0, \varepsilon, \delta)$-approximation (Theorem 2.2).

Next we consider the more general case where the objects are bounded geometric objects $\mathcal{S}$. For such objects it is useful to consider a bounded domain of size $L$ (for $d$ a fixed constant), and consider the case where each object in $\mathcal{S}$ and the landmarks $Q$ lie in this domain. In this case, the number of samples required for a $(\rho, \varepsilon, \delta)$-approximation depends on the quantity $S_Q$, where

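$$S_Q = O\left( \left(\frac{L}{\rho}\right)^{\frac{2d}{2+d}} \min\left(\log\frac{L}{\eta},\ \log n,\ \left(\frac{L}{\rho}\right)^{2}\right)^{\frac{2}{2+d}} \right), \tag{1.1}$$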

where . A few special cases are worth expanding upon. When is continuous and uniform over then , and this is tight in at . That is, we can show that may be needed in general. When but not necessarily uniform on , then . And when is on a grid over in of resolution , then , just a more than the lower bound.

We conclude with some specific results for trajectories. When considering the class with at most $k$ segments, then samples is sufficient for a strong $(\rho, \varepsilon, \delta)$-approximation. Then when considering trajectories where the critical points are at distance at least apart from any non-adjacent part of the curve, we can exactly reconstruct the trajectory from $v_Q$ as long as $Q$ is a grid of side length . It is much cleaner to describe the results for trajectories and $Q$ precisely on a grid, but these results should extend to any object with piecewise-linear boundaries and critical points sufficiently separated, or to $Q$ having a point in each sufficiently dense grid cell, as opposed to exactly on the grid lattice.

### 1.2 Connections to other Domains, and Core Challenges

Before deriving these results, it is useful to lay out the connection to related techniques, including ones that our results will build on, and the challenges in applying them.

#### Sensitivity sampling.

Sensitivity sampling [21, 15, 17, 29] is an important technique for our results. This typically considers a dataset $X$ (a subset of a metric space), endowed with a measure $\mu$, and a family of cost functions $F$. These cost functions are usually related to the fitting of a data model or a shape $S$ to $X$, and for instance on a single point $x \in X$, for $f \in F$, where

$$f(x) = \inf_{p \in S} \|x - p\|^2$$

is the squared distance from $x$ to the closest point on the shape $S$. And then $\bar{f} = \int_X f(x)\,\mathrm{d}\mu(x)$. The sensitivity of $x \in X$ w.r.t. $(F, X, \mu)$ is defined as:

$$\sigma_{F,X,\mu}(x) := \sup_{f \in F} \frac{f(x)}{\bar{f}},$$

and the total sensitivity of $F$ is defined as $S(F) = \int_X \sigma_{F,X,\mu}(x)\,\mathrm{d}\mu(x)$. This concept is quite general, and has been widely used in applications ranging from various forms of clustering [15, 17] to dimensionality reduction to shape-fitting. In particular, this will allow us to draw samples $\tilde{X}$ iid from $X$ proportional to $\sigma_{F,X,\mu}(x)$, and weighted by $\tilde{w}$; we call this $\sigma_{F,X,\mu}$-sensitive sampling. Then $(\tilde{X}, \tilde{w})$ is a coreset; that is, with probability at least $1-\delta$, for each $f \in F$

$$(1-\varepsilon)\,\bar{f} \leq \int_{\tilde{X}} f(\tilde{x})\,\mathrm{d}\tilde{w}(\tilde{x}) \leq (1+\varepsilon)\,\bar{f},$$

using  . The same error bound holds for all (then it is called a -strong coreset) with where is the shattering dimension of the range space  . Specifically, each range is defined as those points in a sublevel set of a specific cost function for some and .
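The sampling step can be sketched for a finite dataset as follows. The weight formula used here, $\tilde{w} = S(F) / (N \sigma(\tilde{x}))$, is the standard importance-sampling choice that makes the weighted sum an unbiased estimate of $\bar{f}$; it is an assumption of this example, since the exact weights and sample sizes are not spelled out in the text above. The function name `sensitivity_sample` is likewise hypothetical.

```python
import numpy as np

def sensitivity_sample(sigma, mu, N, rng):
    """Draw N indices i.i.d. with probability proportional to mu_i * sigma_i.

    Returns (idx, w), where the weights w are chosen (an assumption of this
    sketch) so that sum_j w_j * f(x_{idx_j}) is an unbiased estimate of
    f_bar = sum_i mu_i * f(x_i).
    """
    sigma = np.asarray(sigma, dtype=float)
    mu = np.asarray(mu, dtype=float)
    total = float(np.sum(mu * sigma))      # total sensitivity S(F)
    p = mu * sigma / total                 # sampling distribution over the data
    idx = rng.choice(len(sigma), size=N, replace=True, p=p)
    w = mu[idx] / (N * p[idx])             # equals total / (N * sigma[idx])
    return idx, w

# Toy usage with made-up sensitivities on four points and uniform measure.
sigma = np.array([1.0, 2.0, 4.0, 1.0])
mu = np.full(4, 0.25)
rng = np.random.default_rng(0)
idx, w = sensitivity_sample(sigma, mu, 50, rng)
```

Because each sampled term $\tilde{w} f(\tilde{x})$ is bounded by $S(F)\,\bar{f}/N$, the variance of the estimate is controlled by the total sensitivity, which is why bounding $S(F)$ is the central task.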

It seems natural that a form of our results would follow directly from these approaches. However, two significant and intertwined challenges remain. First, our goal is to approximate the distance between a pair of sketches $\|v_Q(J_1) - v_Q(J_2)\|$, whereas these results effectively only preserve the norm of a single sketch $\|v_Q(J)\|$; this prohibits many of the geometric arguments in the prior work on this subject. Second, the total sensitivity associated with unrestricted $Q$ and pairs $J_1, J_2$ is in general unbounded (as we prove in Lemma 3.1). Indeed, if the total sensitivity were bounded, it would imply a mapping to a bounded vector space, wherein the subtraction of the two sketches would still be an element of this space, and the norm bound would be sufficient.

We circumvent these challenges in two ways. First, we identify a special case in Section 2 (with negative distances, for hyperplanes) under which there is a mapping of the sketch to a metric space independent of the size and structure of $Q$. This induces a bound for total sensitivity related to a single object, and allows the subtraction of two sketches to be handled within the same framework.

Second, we enforce a lower bound $\rho$ on the distance $d_Q(J_1, J_2)$ and an upper bound $L$ on the domain. This induces a restricted class of pairs where $L/\rho$ is a scaleless parameter, and it shows up in the bounds we are then able to produce for the total sensitivity with respect to $Q$ and this restricted class.

#### Leverage scores, and large scales.

Let $(\cdot)^+$ denote the Moore-Penrose pseudoinverse of a matrix, so $(AA^\top)^+ = (AA^\top)^{-1}$ when $AA^\top$ is full rank. The leverage score of the $i$th column $a_i$ of matrix $A$ is defined as $a_i^\top (AA^\top)^+ a_i$. This definition is more specific and linear-algebraic than sensitivity, but has received more attention for scalable algorithm development and approximation [13, 3, 12, 9, 25, 10].
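For reference, column leverage scores can be computed directly from the pseudoinverse. This sketch assumes the standard definition $a_i^\top (AA^\top)^+ a_i$ for the $i$th column $a_i$ of $A$; the function name `column_leverage_scores` is made up for the example.

```python
import numpy as np

def column_leverage_scores(A):
    """Leverage score of each column a_i of A, computed as a_i^T (A A^T)^+ a_i."""
    A = np.asarray(A, dtype=float)
    G_pinv = np.linalg.pinv(A @ A.T)
    # einsum evaluates the quadratic form a_i^T G_pinv a_i for every column i.
    return np.einsum("ji,jk,ki->i", A, G_pinv, A)

# Example: a 2 x 3 matrix of rank 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
tau = column_leverage_scores(A)
print(tau, tau.sum())
```

The scores always sum to the rank of $A$, which is the linear-algebraic analogue of a bounded total sensitivity.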

However, Theorem B.1 (in the Appendix B.1) shows that if is the collection of some functions defined on a set of points ( for all ), where each is the square of some function in a finite dimensional space spanned by a basis , then we can build a matrix where the th column is , and have is precisely the leverage score of the th column of the matrix . A similar observation has been made by Varadarajan and Xiao .

A concrete implication of this connection is that we can invoke an online row sampling algorithm of Cohen et al. In our context, this algorithm would stream over $Q$, maintaining (ridge) estimates of the sensitivity of each $q_i$ from a sample $\tilde{Q}$, and retaining each $q_i$ in that sample based on this estimate. Even in this streaming setting, this provides an approximation bound not much weaker than the sampling or gridding bounds we present; see Appendix B.1.

#### Connection from MinDist to shape reconstruction.

The fields of computational topology and surface modeling have extensively explored [5, 28, 6] the distance function to a compact set

$$d_J(x) = \inf_{p \in J} \|x - p\|,$$

their approximations, and the offsets . For instance the Hausdorff distance between two compact sets is . The gradient of implies stability properties about the medial axis . And most notably, this stability of with respect to a sample or is closely tied to the development of shape reconstruction (aka geometric and topological inference) through -shapes , power crust , and the like. The intuitive formulation of this problem through (as opposed to Voronoi diagrams of ) has led to more statistically robust variants [6, 28] which also provide guarantees in shape recovery up to small feature size , essentially depending on the maximum curvature of .

Our formulation flips this around. Instead of considering samples from (or ) we consider samples from some domain . This leads to new but similar sampling theory, still depending on some feature size (represented by various scale parameters , , and ), and still allowing recovery properties of the underlying objects. While the samples from can be used to estimate Hausdorff distance via an all-pairs -time comparison, our formulation requires only a -time comparison to compute . We leave as open questions the recovering of topological information about an object from .

#### Function space sketching.

While most geometric inference sampling bounds focus on low-level geometric parameters (e.g., weak local feature size, etc), a variant based on the kernel distance   can be approximated (including useful level sets) using a uniform sample . The kernel distance in this setting is defined

where the kernel density estimate is defined

with and . This sampling mechanism can be used to analyze (and thus also )   by considering a reproducing kernel Hilbert space (RKHS) associated with ; this is a function space so each element is a function. And averages are kernel density estimates. Ultimately, samples yields  with probability that which implies , and hence also . Notably, the natural -norm is an -norm when restricted to any finite dimensional subspace (e.g., the basis defined by ).

Similarly, our approximations of using a sample result in a similar function space approximation. Again the main difference is that is bivariate (so it takes in a pair , which is hard to interpret geometrically), and we seek a relative error (not an additive error). This connection leads us to realize that there are JL-type approximations  of this feature space. That is, given a set of objects , and their representations , there is a mapping to with , so with probability at least so for any pair . However, for such a result to hold for all pairs in , there likely requires a lower bound on the distance and/or upper bound on the underlying space , as with the kernels [8, 26]. Moreover, such an approach would not provide an explicit coreset that is interpretably in the original space .

## 2 The Distance Between Two Hyperplanes

In this section, we define a distance between two hyperplanes. Let $\mathcal{H}$ represent the space of all hyperplanes in $\mathbb{R}^d$.

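Suppose $Q = \{q_1, \ldots, q_n\} \subset \mathbb{R}^d$, where $q_i$ has the coordinate $(x_{i,1}, \ldots, x_{i,d})$. Unless otherwise specified, in this paper $Q$ is a multiset, which means two points in $Q$ can be at the same location, and $\|\cdot\|$ represents the $\ell^2$ norm.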

Any hyperplane can be uniquely expressed in the form

$$h = \Big\{x = (x_1, \cdots, x_d) \in \mathbb{R}^d \;\Big|\; \sum_{j=1}^{d} u_j x_j + u_{d+1} = 0\Big\},$$

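where $u = (u_1, \ldots, u_{d+1})$ is a vector in $\mathbb{R}^{d+1}$, i.e. $(u_1, \ldots, u_d)$ is the unit normal vector of $h$, and $u_{d+1}$ is the offset. A sketched halfspace $h$ has an $n$-dimensional vector $v_Q(h) = (v_1(h), \ldots, v_n(h))$ where each coordinate $v_i(h)$ is defined as the signed distance from $q_i$ to the closest point on $h$, which can be calculated as $v_i(h) = \langle q_i, (u_1, \ldots, u_d) \rangle + u_{d+1}$; the dot-product with the unit normal of $h$, plus offset $u_{d+1}$. As before, the distance $d_Q(h_1, h_2)$ is defined as the normalized Euclidean distance between $v_Q(h_1)$ and $v_Q(h_2)$. When $Q$ is full rank (that is, there are $d+1$ points in $Q$ which are not on a common hyperplane) then our companion paper shows $d_Q$ is a metric on $\mathcal{H}$.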

### 2.1 Estimation of dQ by Sensitivity Sampling on Q

We use sensitivity sampling to estimate $d_Q$ with respect to a tuple $(F, X, \mu)$. First suppose $Q$ is full rank and $n \geq d+1$. Then we can let $X = Q$ and $\mu = \frac{1}{n}$; what remains is to define the appropriate $F$. Roughly, $F$ is defined with respect to a $(d+1)$-dimensional vector space $V$, where for each $f \in F$, $f = v^2$ for some $v \in V$; and $V$ is the set of all linear functions on $Q$.

We now define in more detail. Recall each can be represented as a vector . This defines a function , and these functions are elements of . The vector space is however larger and defined

so that there can be for which ; rather it can more generally be in . Then the desired family of real-valued functions is defined

$$F = \{f : Q \to [0, \infty) \mid \exists\, v \in V \text{ s.t. } f(q) = v(q)^2,\ \forall q \in Q\}.$$

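To see how this can be applied to estimate $d_Q$, consider two hyperplanes $h_1, h_2$ in $\mathcal{H}$ and the two unique vectors $u^{(1)}, u^{(2)}$ which represent them. Now introduce the vector $u = u^{(1)} - u^{(2)}$; note that $u \in \mathbb{R}^{d+1}$, but its first $d$ coordinates are not necessarily a unit vector. Now for $q \in Q$ define a function $f_{h_1,h_2}$ as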

$$f_{h_1,h_2}(q) = f_{h_1,h_2}(x_1, \cdots, x_d) = \Big(\sum_{i=1}^{d} u_i x_i + u_{d+1}\Big)^2,$$

so $d_Q(h_1, h_2)^2 = \bar{f}_{h_1,h_2}$. And thus an estimation of $\bar{f}_{h_1,h_2}$ provides an estimation of $d_Q(h_1, h_2)$. From Lemma B.1, we know the total sensitivity of $F$ is $d+1$. In particular, given the sensitivity score for each $q \in Q$, we can invoke [Lemma 2.1] to reach the following theorem.
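The reduction above can be checked numerically: the difference of two signed sketches is itself a linear function of the landmarks, with coefficient vector $u^{(1)} - u^{(2)}$. The following is a minimal sketch; the helper names (`hyperplane_vector`, `signed_sketch`, `d_Q_hyperplanes`) are assumptions for this example, and the sign convention of the normal is left to the caller.

```python
import numpy as np

def hyperplane_vector(normal, offset):
    """Return u = (u_1..u_d, u_{d+1}) with the normal rescaled to unit length."""
    normal = np.asarray(normal, dtype=float)
    s = np.linalg.norm(normal)
    return np.append(normal / s, offset / s)

def signed_sketch(u, Q):
    """v_i(h) = <q_i, (u_1..u_d)> + u_{d+1}: signed distance from q_i to h."""
    Q = np.asarray(Q, dtype=float)
    return Q @ u[:-1] + u[-1]

def d_Q_hyperplanes(u1, u2, Q):
    """Distance via f_{h1,h2}: evaluate the linear function (u1 - u2) on Q."""
    f = signed_sketch(u1 - u2, Q) ** 2     # f_{h1,h2}(q) for every q in Q
    return float(np.sqrt(np.mean(f)))      # square root of f-bar with mu = 1/n

Q = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
u1 = hyperplane_vector([1.0, 0.0], 0.0)    # the line x = 0
u2 = hyperplane_vector([0.0, 2.0], -2.0)   # the line y = 1
direct = np.sqrt(np.mean((signed_sketch(u1, Q) - signed_sketch(u2, Q)) ** 2))
print(direct, d_Q_hyperplanes(u1, u2, Q))
```

Both computations agree, which is what allows a guarantee on the single function $f_{h_1,h_2}$ to transfer to the distance between a pair of hyperplanes.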

###### Theorem 2.1.

Consider full rank and halfspaces with . A -sensitive sampling of of size results in a -coreset. And thus an -approximation so with probability at least , for each pair

$$(1-\varepsilon)\, d_Q(h_1, h_2) \leq d_{\tilde{Q},W}(h_1, h_2) \leq (1+\varepsilon)\, d_Q(h_1, h_2).$$
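A rough numerical sketch of this sampling scheme follows. It assumes, in line with the leverage-score connection described in Section 1.2, that the sensitivity of $q_i$ equals $n$ times the leverage score of the row $(q_i, 1)$ of the $n \times (d+1)$ matrix $[Q \; 1]$, and it uses standard importance-sampling weights; the function names and the sample size are illustrative choices, not values from the theorem.

```python
import numpy as np

def hyperplane_sensitivities(Q):
    """Assumed sensitivities: n times the row leverage scores of A = [Q 1]."""
    Q = np.asarray(Q, dtype=float)
    A = np.hstack([Q, np.ones((len(Q), 1))])          # n x (d+1)
    G_pinv = np.linalg.pinv(A.T @ A)
    lev = np.einsum("ij,jk,ik->i", A, G_pinv, A)      # a_i^T (A^T A)^+ a_i per row
    return len(Q) * lev, A

def sample_landmarks(Q, N, rng):
    """Sample N landmarks proportional to sensitivity, with unbiased weights."""
    sigma, _ = hyperplane_sensitivities(Q)
    n = len(sigma)
    p = sigma / sigma.sum()
    idx = rng.choice(n, size=N, replace=True, p=p)
    w = 1.0 / (n * N * p[idx])                        # E[sum_j w_j f_j] = mean(f)
    return idx, w

rng = np.random.default_rng(1)
Q = rng.normal(size=(500, 2))                         # d = 2, n = 500
sigma, A = hyperplane_sensitivities(Q)

du = np.array([0.3, -1.2, 0.5])                       # a difference u1 - u2
f = (A @ du) ** 2                                     # f_{h1,h2} on every landmark
idx, w = sample_landmarks(Q, 4000, rng)
est = np.sqrt(np.sum(w * f[idx]))                     # weighted distance estimate
exact = np.sqrt(f.mean())                             # d_Q(h1, h2)
print(sigma.mean(), est / exact)
```

Under these assumptions the average sensitivity equals $d + 1$ regardless of $n$, so the sample size needed does not grow with the number of landmarks.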

Now, we use the framework in Braverman   to construct a strong -approximation for over . In the remaining part of this subsection, we assume is a set (not a multiset), each has a weight , and . Recall that for a range space the shattering dimension is the smallest integer so that for all . We introduce ranges where each range is defined by two halfspaces and a threshold . This is defined with respect to and a weighting , specifically

$$X_{h_1,h_2,\eta} = \{q \in Q \mid w(q)\, f_{h_1,h_2}(q) \leq \eta\}.$$

Next we use the sensitivity to define an adjusted range space with adjusted weights and adjusted ranges defined using as

$$X'_{h_1,h_2,\eta} = \{q \in Q \mid w'(q)\, g_{h_1,h_2}(q) \leq \eta\}.$$

Recall that . To apply [Theorem 5.5] we only need to bound the shattering dimension of the adjusted range space .

###### Lemma 2.1.

The shattering dimension of adjusted range space is bounded by .

###### Proof.

We start by rewriting any element of the adjusted range space as

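$$\begin{aligned}
X'_{h_1,h_2,\eta} &= \{q \in Q \mid w'(q)\, g_{h_1,h_2}(q) \leq \eta\} \\
&= \{q \in Q \mid w(q)\, f_{h_1,h_2}(q) \leq \eta (d+1) \bar{f}_{h_1,h_2}\} \\
&= \Big\{q \in Q \;\Big|\; \sqrt{w(q)} \Big(\sum_{i=1}^{d} u_i x_i + u_{d+1}\Big) \leq \big(\eta (d+1) \bar{f}_{h_1,h_2}\big)^{\frac{1}{2}}\Big\} \\
&\quad \cap \Big\{q \in Q \;\Big|\; -\sqrt{w(q)} \Big(\sum_{i=1}^{d} u_i x_i + u_{d+1}\Big) \leq \big(\eta (d+1) \bar{f}_{h_1,h_2}\big)^{\frac{1}{2}}\Big\},
\end{aligned}$$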

where is the coordinates of . This means each set can be decomposed as the intersection of sets in two ranges over from:

By Lemma A.1, we only need to bound the dimension of each associated range space and . We introduce new variables :

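$$z_i = \sqrt{w(q)}\, x_i \ \text{ for } i \in [d], \quad z_{d+1} = \sqrt{w(q)}, \quad c_i = u_i \ \text{ for } i \in [d+1], \quad c_0 = -\big(\eta (d+1) \bar{f}_{h_1,h_2}\big)^{\frac{1}{2}}.$$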

Since is a fixed set, we know only depends on , and , only depend on and . By introducing new variables we construct an injective map , s.t. . So, there is also an injective map from to . Since the shattering dimension of the range space , where , is , we have , and similarly . Thus, we obtain an bound for the shattering dimension of . ∎

From Lemma 2.1 and [Theorem 5.5] we can directly obtain a strong -approximation for over .

###### Theorem 2.2.

Consider full rank and halfspaces with . A -sensitive sampling of of size results in a strong -coreset. And thus a strong -approximation so with probability at least , for all

$$(1-\varepsilon)\, d_Q(h_1, h_2) \leq d_{\tilde{Q},W}(h_1, h_2) \leq (1+\varepsilon)\, d_Q(h_1, h_2).$$

## 3 Sketched MinDist for Two Geometric Objects

In this section, we mildly restrict $d_Q$ to the distance between any two geometric objects, in particular bounded closed sets. Let $\mathcal{S}$ be the space of objects we consider.

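As before define $v_i(S) = \inf_{p \in S} \|p - q_i\|$, and then for $S_1, S_2 \in \mathcal{S}$ define $f_{S_1,S_2}(q_i) = (v_i(S_1) - v_i(S_2))^2$. The associated function space is $F(\mathcal{S}) = \{f_{S_1,S_2} \mid S_1, S_2 \in \mathcal{S}\}$. Setting $\mu(q) = \frac{1}{n}$ for all $q \in Q$, then $d_Q(S_1, S_2)^2 = \bar{f}_{S_1,S_2}$. Using sensitivity sampling to estimate $d_Q(S_1, S_2)$ requires a bound on the total sensitivity of $F(\mathcal{S})$.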

In this section we show that while unfortunately the total sensitivity is unbounded in general, it can be tied closely to the ratio between the diameter of the domain $L$, and the minimum allowed distance between objects $\rho$. In particular, it can be at least proportional to this ratio, and in most cases (e.g., for near-uniform $Q$) is at most proportional to $L/\rho$ or not much larger for any $Q$.

### 3.1 Lower Bound on Total Sensitivity

Figure 1: $Q$ is the set of blue points, $\gamma_1$ is the red curve, $\gamma_2$ is the green curve, and they coincide with each other on the boundary of the square.

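Suppose $Q$ is a set of $n$ points in $\mathbb{R}^2$ and no two points are at the same location. Then for any $q_0 \in Q$ we can draw two curves $\gamma_1, \gamma_2$ as shown in Figure 1, where $\gamma_1$ is composed of five line segments and $\gamma_2$ is composed of four line segments. The four line segments of $\gamma_2$ form a square; on its boundary $\gamma_1$ and $\gamma_2$ coincide with each other, and inside this square, $q_0$ is the endpoint of $\gamma_1$. We can make this square small enough that all other points of $Q$ are outside this square. So, we have $v_{q_0}(\gamma_1) = 0$ and $v_{q_0}(\gamma_2) > 0$, and $v_q(\gamma_1) = v_q(\gamma_2)$ for all $q \in Q \setminus \{q_0\}$. Thus, we have $f_{\gamma_1,\gamma_2}(q_0) > 0$ and $f_{\gamma_1,\gamma_2}(q) = 0$ for all $q \in Q \setminus \{q_0\}$, which implies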

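$$\sigma_{F(\mathcal{S}),Q,\mu}(q_0) \geq \frac{f_{\gamma_1,\gamma_2}(q_0)}{\bar{f}_{\gamma_1,\gamma_2}} = \frac{f_{\gamma_1,\gamma_2}(q_0)}{\frac{1}{n}\sum_{q \in Q} f_{\gamma_1,\gamma_2}(q)} = \frac{n\, f_{\gamma_1,\gamma_2}(q_0)}{f_{\gamma_1,\gamma_2}(q_0)} = n.$$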

Since this construction of two curves can be repeated around any point $q \in Q$,

$$S(F(\mathcal{S})) = \sum_{q \in Q} \mu(q)\, \sigma_{F(\mathcal{S}),Q,\mu}(q) \geq \sum_{q \in Q} \frac{1}{n}\, n = n.$$

We can refine this bound by introducing two parameters for . Given and a set of points, we define and . The following lemma gives a lower bound for the total sensitivity of in the case , which directly holds for larger .

###### Lemma 3.1.

Suppose , then can construct a set such that .

###### Proof.

We uniformly partition into grid cells, such that for constants . The side length of each grid is . We take as the grid points, and for each point we can choose two curves and (similar to curves in Figure 1) such that , , and for all . Thus, we have . So, and we have for all and , which implies . ∎

### 3.2 Upper Bound on the Total Sensitivity

A simple upper bound of is follows from the constraint. The sensitivity of each point is defined as , where for all and , and the denominator by assumption for all . Hence, the sensitivity of each point in is