Learning random points from geometric graphs or orderings

09/26/2018 ∙ by Josep Diaz, et al. ∙ University of Oxford, Universitat Politècnica de Catalunya, Jean Monnet University

Suppose that there is a family of n random points X_v for v ∈ V, independently and uniformly distributed in the square [−√n/2, √n/2]^2. We do not see these points, but learn about them in one of the following two ways. Suppose first that we are given the corresponding random geometric graph G, where distinct vertices u and v are adjacent when the Euclidean distance d_E(X_u, X_v) is at most r. Assume that the threshold distance r satisfies n^{3/14} ≪ r ≪ n^{1/2}. We shall see that the following holds with high probability. Given the graph G (without any geometric information), in polynomial time we can approximately reconstruct the hidden embedding, in the sense that, ‘up to symmetries’, for each vertex v we find a point within distance about r of X_v; that is, we find an embedding with ‘displacement’ at most about r. Now suppose that, instead of being given the graph G, we are given, for each vertex v, the ordering of the other vertices by increasing Euclidean distance from v. Then, with high probability, in polynomial time we can find an embedding with the much smaller displacement error O(√(log n)).


1. Introduction

In this section, we first introduce geometric graphs and random geometric graphs, the approximate realization problem for such graphs, and families of vertex orderings; and we then present our main theorems, give an outline sketch of their proofs, and finally give an outline of the rest of the paper.

1.1. Random geometric graphs

Suppose that we are given a non-empty finite set V, and an embedding x: V → ℝ², or equivalently a family x = (x_v : v ∈ V) of points in ℝ², where x_v = x(v). Given also a real threshold distance r > 0, we may form the geometric graph G = G(x, r) with vertex set V by, for each pair u, v of distinct elements of V, letting u and v be adjacent if and only if d_E(x_u, x_v) ≤ r. Here d_E denotes Euclidean distance, d_E(p, q) = ‖p − q‖₂. Note that the (abstract) graph G consists of its vertex set V and its edge set E (with no additional geometric information). A graph G is called geometric if it may be written as G(x, r) as above, and then x is called a realization of the graph. Since we may rescale so that r = 1, a geometric graph may also be called a unit disk graph (UDG) [11].

Given a positive integer n, and a real r > 0, the random geometric graph G ∈ 𝒢(n, r) with vertex set V = [n] = {1, …, n} is defined as follows. Start with n random points X_1, …, X_n independently and uniformly distributed in the square S_n = [−√n/2, √n/2]² of area n; let X(v) = X_v for each v ∈ V; and form the geometric graph G = G(X, r).
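A minimal Python sketch of the sampling procedure just described, using only the standard library. The function name `sample_rgg` is hypothetical; vertices are indexed 0, …, n−1 rather than 1, …, n, and the adjacency matrix is represented as a plain list of lists.

```python
import math
import random


def sample_rgg(n, r, seed=0):
    """Sample the random geometric graph G(X, r) on vertices 0..n-1.

    Points are independent and uniform in S_n = [-sqrt(n)/2, sqrt(n)/2]^2
    (a square of area n). Returns the hidden embedding X as a list of
    (x, y) tuples and the adjacency matrix A as a list of 0/1 lists.
    """
    rng = random.Random(seed)
    half = math.sqrt(n) / 2.0
    X = [(rng.uniform(-half, half), rng.uniform(-half, half)) for _ in range(n)]
    A = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            # Distinct vertices are adjacent iff d_E(X_u, X_v) <= r.
            if math.dist(X[u], X[v]) <= r:
                A[u][v] = A[v][u] = 1
    return X, A


X, A = sample_rgg(200, r=3.0, seed=1)
num_edges = sum(map(sum, A)) // 2
```

Only `A` would be handed to a reconstruction algorithm; `X` plays the role of the hidden embedding.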

Random geometric graphs were first introduced by Gilbert [10] to model communications between radio stations. Since then, several related variants of these graphs have been widely used as models for wireless communication, and have also been extensively studied from a mathematical point of view. The basic reference on random geometric graphs is the monograph by Penrose [17]; see also the survey of Walters [24]. The properties of G ∈ 𝒢(n, r) are usually investigated from an asymptotic perspective, as n grows to infinity and r = r(n).

A sequence of events A_n holds with high probability (whp) if P(A_n) → 1 as n → ∞. For example, it is well known that r_c = √(log n / π) is a sharp threshold function for the connectivity of the random geometric graph G ∈ 𝒢(n, r). This means that, for every ε > 0, if r ≤ (1 − ε) r_c, then G is whp disconnected, whilst if r ≥ (1 + ε) r_c, then G is whp connected (see [17] for a more precise result). We shall work with much larger r, so our random graphs will whp be (highly) connected.
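The connectivity threshold can be explored numerically. The sketch below, with hypothetical helper names, samples points uniformly in S_n, tests connectivity of G(X, r) by breadth-first search, and reports the fraction of connected samples for r = factor · r_c. For moderate n the transition around r_c is noticeably blurred, so this is only a rough illustration of the asymptotic statement.

```python
import math
import random
from collections import deque


def connectivity_radius(n):
    """r_c = sqrt(log n / pi), the sharp connectivity threshold for G(n, r)
    when the points have density 1 on a square of area n."""
    return math.sqrt(math.log(n) / math.pi)


def is_connected(points, r):
    """Breadth-first search from vertex 0 in the geometric graph G(points, r)."""
    n = len(points)
    if n == 0:
        return True
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if not seen[v] and math.dist(points[u], points[v]) <= r:
                seen[v] = True
                queue.append(v)
    return all(seen)


def connected_fraction(n, factor, trials=20, seed=0):
    """Fraction of sampled graphs with r = factor * r_c that are connected."""
    rng = random.Random(seed)
    half = math.sqrt(n) / 2.0
    r = factor * connectivity_radius(n)
    hits = 0
    for _ in range(trials):
        pts = [(rng.uniform(-half, half), rng.uniform(-half, half)) for _ in range(n)]
        hits += is_connected(pts, r)
    return hits / trials
```

Comparing, say, `connected_fraction(300, 0.7)` with `connected_fraction(300, 1.5)` gives a feel for the transition; no particular values are claimed here.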

Given a graph G, we define the graph distance d_G(u, v) between two vertices u and v to be the least number of edges on a path from u to v if u and v are in the same component, and if not then we let the distance be ∞. Observe that in a geometric graph G with a given realization x, each pair of vertices u and v must satisfy d_E(x_u, x_v) ≤ r d_G(u, v), since each edge of the embedded geometric graph has length at most r. For a finite simple graph G with n vertices, let A(G) denote its adjacency matrix, the n × n symmetric matrix with A_{uv} = 1 if uv is an edge, and A_{uv} = 0 otherwise. (We write uv for an edge rather than the longer form {u, v}.)
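The inequality d_E(x_u, x_v) ≤ r d_G(u, v) can be checked directly on a small example. The sketch computes graph distances by breadth-first search (returning infinity for vertices in other components) and verifies the bound for every pair; `graph_distances` and `check_distance_bound` are hypothetical names.

```python
import math
from collections import deque


def graph_distances(A, source):
    """d_G(source, v) for every v, with math.inf for vertices in other components."""
    n = len(A)
    dist = [math.inf] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if A[u][v] and dist[v] == math.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def check_distance_bound(X, A, r):
    """Check d_E(X_u, X_v) <= r * d_G(u, v) for all pairs; the bound holds
    trivially when d_G is infinite. A small tolerance absorbs rounding."""
    n = len(X)
    for u in range(n):
        dG = graph_distances(A, u)
        for v in range(n):
            if math.dist(X[u], X[v]) > r * dG[v] + 1e-12:
                return False
    return True
```

For points on a line at 0, 1, 2 and 10 with r = 1.5, the first three form a path and the fourth is isolated, so the distances from vertex 0 are 0, 1, 2 and ∞.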

1.2. Approximate realization for geometric graphs

For a geometric graph G with vertex set V, the realization problem for G has as input the adjacency matrix A(G), and consists of finding some realization x. It is known that for UD graphs, the realization problem (also called the unit disk graph reconstruction problem) is NP-hard [3], and it remains NP-hard even if we are given all the distances between pairs of vertices in some realization [2], or if we are given all the angles between incident edges in some realization [4]. Given that these results indicate the difficulty of finding exact polynomial-time algorithms, researchers naturally turned their attention to finding good approximate realizations (for deterministic problems).

Previous work on approximate realization

There are different possible measures of ‘goodness’ of an embedding. Motivated by the localization problem for sensor networks, see for example [6], (essentially) the following scale-invariant measure of quality of embedding was introduced in [15]: given a geometric graph , and an embedding and threshold distance , if is not a clique we let

(where we insist that ); and let if is a clique. Observe that if is a realization of then . The aim is to find an embedding with say which minimizes , or at least makes it small. The random projection method [22] was used in [15] to give an algorithm that, for an n-vertex UD graph , outputs an embedding with ; that is, it approximates feasibility in terms of the measure up to a factor of . On the other hand, regarding inapproximability, it was shown in [13] that it is NP-hard to compute an embedding with .

In this paper we do not aim to control a goodness measure like the one above (though see the discussion following Theorem 1.3). Instead, we find whp a ‘good’ embedding x, which is ‘close’ to the hidden original random embedding X. We investigate the approximate realization problem for a random geometric graph, and for a family of vertex orderings (see later).

What we achieve for random geometric graphs is roughly as follows. We describe a polynomial time algorithm which, for a suitable range of values for r, whp finds an embedding x which ‘up to symmetries’ (see below for a detailed definition) maps each vertex v to within about distance r of the original random point X_v. Observe that the mapping x must then satisfy the following properties whp: for each pair of vertices with we have , and for each pair of vertices with we have . Thus, adjacent pairs of vertices remain quite close to being adjacent under x, and non-adjacent pairs of vertices that are sufficiently far apart remain non-adjacent under x.

For maps x, y: V → ℝ², the familiar max or sup distance is defined by

$$d_\infty(x, y) = \max_{v \in V} d_E(x_v, y_v).$$

Since there is no way for us to distinguish embeddings which are equivalent up to symmetries, we cannot hope to find an embedding x such that whp d_∞(x, X) is small. There are 8 symmetries (rotations or reflections) of the square S_n. We define the symmetry-adjusted sup distance by

$$d^*(x, y) = \min_{\sigma} d_\infty(\sigma \circ x, y) = \min_{\sigma} d_\infty(x, \sigma \circ y),$$

where the minima are over the 8 symmetries σ of the square S_n. This is the natural way of measuring distance ‘up to symmetries’. If we let x ∼ y when x = σ ∘ y for some symmetry σ of S_n, then it is easy to check that ∼ is an equivalence relation on the set of embeddings x: V → S_n, and d* is the natural sup metric on the set of equivalence classes.
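The symmetry-adjusted distance d* is easy to compute, since the square S_n is centred at the origin and each of its 8 symmetries maps (x, y) to a signed permutation of the coordinates. A minimal sketch, with embeddings represented (an assumption for illustration) as lists of (x, y) tuples indexed by vertex:

```python
import math

# The 8 symmetries of a square centred at the origin: four rotations and
# four reflections, each a signed permutation of the coordinates.
SYMMETRIES = [
    lambda p: (p[0], p[1]),    # identity
    lambda p: (-p[1], p[0]),   # rotation by 90 degrees
    lambda p: (-p[0], -p[1]),  # rotation by 180 degrees
    lambda p: (p[1], -p[0]),   # rotation by 270 degrees
    lambda p: (p[0], -p[1]),   # reflection in the x-axis
    lambda p: (-p[0], p[1]),   # reflection in the y-axis
    lambda p: (p[1], p[0]),    # reflection in the diagonal y = x
    lambda p: (-p[1], -p[0]),  # reflection in the diagonal y = -x
]


def sup_distance(x, y):
    """d_inf(x, y) = max over vertices v of d_E(x_v, y_v)."""
    return max(math.dist(p, q) for p, q in zip(x, y))


def sym_sup_distance(x, y):
    """d*(x, y) = min over the 8 symmetries sigma of d_inf(sigma o x, y)."""
    return min(sup_distance([s(p) for p in x], y) for s in SYMMETRIES)
```

A rotated copy of an embedding is far from it in d_∞ but at distance 0 in d*, which is exactly the ambiguity that d* is designed to ignore.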

Given δ > 0, we say that an embedding x has displacement at most δ (from the hidden embedding X) if d*(x, X) ≤ δ. Consider the graph with three vertices u, v, w and exactly two edges uv and vw: if this is the geometric graph G(X, r), then d_E(X_u, X_w) could be any value in (r, 2r]. Examples like this suggest that we should be happy to find an embedding with displacement at most about r; and since our methods rely on graph distances, it is natural that we do not achieve displacement below r.

1.3. Vertex orderings

We also consider a related approximate realization problem, with different information. As for a random geometric graph, we start with a family of unseen points X_v for v ∈ V = [n], independently and uniformly distributed in the square S_n, forming the hidden embedding X. (There is no radius r here, and there is no graph.) We are given, for each vertex v, the ordering of the other vertices by increasing Euclidean distance from X_v. This is the family of vertex orderings corresponding to X. Notice that with probability 1 no two distances will be equal. Notice also that, if we had access to the complete ordering of the Euclidean distances between all pairs of distinct vertices in the hidden embedding X, then we could read off the family of vertex orderings.
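Computing the family of vertex orderings from a (hidden) embedding amounts to one sort per vertex. A sketch with the hypothetical name `vertex_orderings`, returning for each vertex the list of other vertices by increasing Euclidean distance; in the setting of Theorem 1.3 only these lists, not the coordinates, would be passed to a reconstruction algorithm.

```python
import math


def vertex_orderings(X):
    """For each vertex v, the other vertices sorted by increasing d_E(X_v, X_u).

    For points in general position (probability 1 for uniform points) there
    are no ties, so each ordering is well defined.
    """
    n = len(X)
    return [
        sorted((u for u in range(n) if u != v),
               key=lambda u, v=v: math.dist(X[v], X[u]))
        for v in range(n)
    ]
```

For example, for the points (0, 0), (1, 0), (3, 0) and (0, 2), vertex 0 sees the others in the order 1, 3, 2.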

We shall see that, by using the family of vertex orderings, we can with high probability find an embedding with displacement error dramatically better than the bound we obtain for random geometric graphs.

1.4. Main results

Suppose first that we are given a random geometric graph G ∈ 𝒢(n, r), with hidden original embedding X, for example by being given the adjacency matrix A(G), with no geometric information. Our goal is to find an embedding x such that whp it has displacement at most about r, for as wide as possible a range of values for r. However, first we need to consider how to estimate r. We shall see that adding up the first few vertex degrees gives us a good enough estimator for our current purposes.

Proposition 1.1.

Let as , with . Let (so as ). Fix a small rational constant , say . Then in polynomial time we may compute an estimator such that

(1)

Our first theorem presents an algorithm to find an embedding for a random geometric graph (given without any further information), which whp achieves displacement at most about , for the range . Note that .

Theorem 1.2.

Let satisfy , and consider the random geometric graph (given say by the adjacency matrix ), corresponding to the hidden embedding . Let be an arbitrarily small rational constant. There is an algorithm which in polynomial time outputs an embedding which whp has displacement at most , that is, whp .

For a related recent result concerning estimating Euclidean distances between points (rather than estimating the points themselves), and for other recent related work, see Subsection 1.5 below.

In practice, after running the algorithm in this theorem, we would run a local improvement heuristic, even though this would not lead to a provable decrease in d*(x, X). For example, we might simulate a dynamical system where each point x_v (which is not close to the boundary of S_n) tends to move towards the centre of gravity of the points x_u corresponding to the neighbours u of v.

Our second theorem concerns the case when we are given not the random geometric graph but the family of vertex orderings; that is, for each vertex v, we are given the ordering of the other vertices by increasing Euclidean distance from X_v.

Theorem 1.3.

Suppose that we are given the family of vertex orderings corresponding to the hidden embedding X. There is a polynomial-time algorithm that outputs an embedding x which whp has displacement O(√(log n)); that is, whp d*(x, X) = O(√(log n)).

Now suppose that, as well as being given the family of vertex orderings, for some unknown value we are given the corresponding random geometric graph . Assume that . Then the constructed embedding does well in terms of the measure introduced earlier: we have

(2)

Also, from the constructed embedding we may form a second geometric graph . Then is close to in the sense that ‘we get only a small proportion of edges wrong’. We make this more precise in the inequality (3) below. It is easy to see that whp has edges (and many more non-edges). We know from Theorem 1.3 that whp has displacement : assume that this event holds. If then so is an edge in ; and similarly, if then , so is not an edge in . Thus there could be a mistake with only if

But whp the number of unordered pairs of distinct vertices such that these inequalities hold is . Hence, whp the symmetric difference of the edge sets of and satisfies

(3)

Outline sketch of the proofs of Theorems 1.2 and 1.3

In order to prove these theorems, we first identify ‘corner vertices’ v such that the corresponding points X_v are close to the corners of S_n. To do this, for Theorem 1.2 we are guided by vertex degrees; and for Theorem 1.3 we look at the set of ‘extreme’ pairs (u, v) such that v is farthest from u in the ordering for u, and u is farthest from v in the ordering for v.
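The ‘extreme’ pairs can be read off directly from the orderings: a pair qualifies when each vertex is last in the other's ordering. A sketch with a hypothetical function name, taking `orderings[v]` to be the list of other vertices by increasing distance from vertex v. The text does not spell out how such pairs are then turned into corner vertices, so the sketch stops at listing them.

```python
def extreme_pairs(orderings):
    """Pairs (u, v), u < v, such that v is last (farthest) in u's ordering
    and u is last in v's ordering."""
    pairs = set()
    for u, order in enumerate(orderings):
        v = order[-1]  # the vertex farthest from u
        if orderings[v][-1] == u:
            pairs.add((min(u, v), max(u, v)))
    return sorted(pairs)
```

Intuitively, two points that are each other's farthest neighbour in a square tend to lie near opposite corners, which is why these pairs are a natural starting point.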

To prove Theorem 1.2, we continue as follows. For a vertex v, we approximate the Euclidean distance between X_v and a corner by using the graph distance from v to the corresponding corner vertex, together with the estimate of r; and then we place our estimate x_v for X_v at the intersection of circles centred on a chosen pair of the corners. For each of the circles, whp X_v lies within a narrow annulus around it, so x_v is close to X_v.
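The placement step can be sketched as follows, under two assumptions that are ours rather than the paper's: the distance from X_v to a corner is approximated simply by r̂ times the graph distance to the corresponding corner vertex (r̂ being an estimate of r), and the chosen pair of corners is the bottom-left and bottom-right corners of the square [−s, s]², with s = √n/2. Subtracting the two circle equations gives x = (a² − b²)/(4s) in closed form, and clamping keeps noisy estimates inside the square. Function names are hypothetical.

```python
import math


def locate_from_corners(a, b, half):
    """Intersect the circle of radius a about (-half, -half) with the circle
    of radius b about (half, -half), returning the intersection point in the
    square [-half, half]^2. Clamping handles inconsistent (noisy) radii."""
    s = half
    # (x + s)^2 - (x - s)^2 = a^2 - b^2, i.e. 4 s x = a^2 - b^2.
    x = (a * a - b * b) / (4.0 * s)
    x = max(-s, min(s, x))
    # Then (y + s)^2 = a^2 - (x + s)^2; take the root with y >= -s.
    h2 = max(0.0, a * a - (x + s) ** 2)
    y = -s + math.sqrt(h2)
    y = max(-s, min(s, y))
    return (x, y)


def estimate_point(dG_a, dG_b, r_hat, half):
    """Hypothetical estimator: approximate each corner distance by r_hat * d_G."""
    return locate_from_corners(r_hat * dG_a, r_hat * dG_b, half)
```

With exact radii the true point is recovered; with radii that are only accurate to within a narrow annulus, the estimate moves correspondingly, which is the mechanism described in the text.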

In the proof of Theorem 1.3, we obtain a much better approximation to the Euclidean distance between X_v and a corner, by using the rank of v in the ordering from the corresponding corner vertex, and the fact that the number of points at most a given distance from a given corner is concentrated around its mean. In this way, we obtain much narrower annuli, and a correspondingly much better estimate for X_v.
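The idea of turning a rank into a distance can be illustrated as follows. Since the points have density 1 on S_n, the expected number of points within distance ρ of a corner is roughly the area of the part of S_n within distance ρ of that corner. For a square of side L = √n this area is πρ²/4 when ρ ≤ L, and Lw + (ρ²/2)(arcsin(L/ρ) − arcsin(w/ρ)), with w = √(ρ² − L²), when L < ρ ≤ L√2. The sketch inverts this area function by bisection. It illustrates the principle described above and is not the estimator used in the proof; the function names are hypothetical.

```python
import math


def corner_area(rho, side):
    """Area of {p in [0, side]^2 : |p| <= rho}, the part of the square
    within distance rho of one corner."""
    if rho <= 0:
        return 0.0
    if rho <= side:
        return math.pi * rho * rho / 4.0  # a full quarter disk
    if rho >= side * math.sqrt(2):
        return side * side  # the whole square
    w = math.sqrt(rho * rho - side * side)
    return side * w + 0.5 * rho * rho * (math.asin(side / rho) - math.asin(w / rho))


def distance_from_rank(k, n, tol=1e-9):
    """Solve corner_area(rho, sqrt(n)) = k for rho by bisection, i.e. the
    distance at which the expected number of closer points equals k."""
    side = math.sqrt(n)
    lo, hi = 0.0, side * math.sqrt(2)
    k = min(max(k, 0.0), float(n))
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if corner_area(mid, side) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

As a rough heuristic only (not the paper's argument): for ρ ≤ L the count of closer points fluctuates by about √k around its mean, while the area grows at rate πρ/2, so inverting a rank k ≈ πρ²/4 misplaces the distance by an amount of order 1. This is why the annuli here are much narrower than those obtained from graph distances, whose resolution is about r.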

1.5. Further related work

In this section we mention further related work.

Theorem 1 of [1] estimates Euclidean distances between points by times the graph distance in the corresponding geometric graph. It is assumed that is known, and the error is at most plus a term involving the maximum radius of an empty ball. In the case of points distributed uniformly and independently in , the authors of [1] need in order to keep the error bound down to whp (so they need a little larger than we do in Theorem 1.2).

In [16] the authors assume that they are given a slightly perturbed adjacency matrix (some edges were inserted, some were deleted) of points in some metric space. Using fairly general conditions on insertion and deletion, the authors use the Jaccard index (the size of the intersection of the neighborhood sets of the endpoints of an edge divided by the size of their union) to compute a -approximation to the graph distances.

The use of graph distances for predicting links in a dynamic social network such as a co-authorship network was experimentally analyzed in [14]: it was shown that graph distances (and other approaches) can provide useful information to predict the evolution of such a network. In [19] the authors consider a deterministic and also a non-deterministic model, and show that using graph distances, and also using common neighbors, they are able to predict links in a social network. The use of shortest paths in graphs for embedding points was also experimentally analyzed in [20].

In [23] the authors consider a k-nearest neighbour graph on n points that have been sampled iid from some unknown density in Euclidean space. They show how shortest paths in the graph can be used to estimate the unknown density. In [21] the authors consider the following problem: given a set of indices , together with constraints (without knowing the distances), construct a point configuration that preserves these constraints as well as possible. The authors propose a ‘soft embedding’ algorithm which not only counts the number of violated constraints, but also takes into account the amount of violation of each constraint. Furthermore, the authors also provide an algorithm for reconstructing points when only knowing the nearest neighbours of each data point, and they show that the obtained embedding converges as n → ∞ to the real embedding (w.r.t. a metric defined by the authors), as long as . This setup is similar to our Theorem 1.3 in the sense that we are given the ordinal ranking of all distances from a point (for each point), though note that we estimate points up to an error O(√(log n)) rather than (recall that our points are sampled from the square S_n).

In a slightly different context, the algorithmic problem of computing the embedding of points in Euclidean space given some or all pairwise distances has also been considered. If all pairwise distances are known, then one can easily find exact positions in O(n) arithmetic operations: pick three points forming a triangle T, and then for each other point separately find its location with respect to T, using O(1) arithmetic operations. In this way we use only the distances involving at least one of the points in T. In [7, 8] the authors consider the problem of knowing only a subset of the distances (they know only small distances, as is typical in sensor networks) and show that by patching together local embeddings of small subgraphs a fast approximate embedding of the points can be found.
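The O(n) procedure for the full-distance case can be sketched as follows, assuming for simplicity that vertices 0, 1 and 2 form a non-degenerate triangle. Vertex 0 is placed at the origin, vertex 1 on the positive x-axis, and vertex 2 above it via the law of cosines; each further point then needs only its three distances to the triangle, since subtracting the circle equations leaves a linear system. Positions are recovered up to an isometry, which is all that the distances determine. The function name is hypothetical.

```python
import math


def embed_from_distances(D):
    """Recover points (up to an isometry) from a full distance matrix D,
    using only distances to the anchor triangle on vertices 0, 1, 2,
    which is assumed to be non-collinear. O(1) arithmetic per point."""
    d01, d02, d12 = D[0][1], D[0][2], D[1][2]
    p0 = (0.0, 0.0)
    p1 = (d01, 0.0)
    # Place vertex 2 above the x-axis using the law of cosines.
    x2 = (d02 ** 2 - d12 ** 2 + d01 ** 2) / (2.0 * d01)
    y2 = math.sqrt(max(0.0, d02 ** 2 - x2 ** 2))
    pts = [p0, p1, (x2, y2)]
    for v in range(3, len(D)):
        a, b, c = D[0][v], D[1][v], D[2][v]
        # Circle about p0 minus circle about p1 gives x directly.
        x = (a * a - b * b + d01 * d01) / (2.0 * d01)
        # Circle about p0 minus circle about p2 then gives y.
        y = (a * a - c * c + x2 * x2 + y2 * y2 - 2.0 * x2 * x) / (2.0 * y2)
        pts.append((x, y))
    return pts
```

The reconstructed configuration may be a rotated or reflected copy of the original, but all of its pairwise distances agree with D.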

The related problem of detecting latent community information in a geometric framework was studied in [18]. In this case, points of a Poisson process in the unit square are equipped with an additional label indicating to which of two hidden communities they belong. The probability that two vertices are joined by an edge naturally depends on the distance between them, but edges between vertices with the same label are also more likely to be present than edges between vertices with different labels. The paper gives exact recovery results for a dense case, and also shows the impossibility of recovery in a sparse case.

1.6. Organisation of the paper

In Section 2 we recall or establish preliminaries; in Section 3 we see how to estimate the threshold distance using vertex degrees, and estimate Euclidean distances using graph distances; in Section 4 we complete the proof of Theorem 1.2; in Section 5 we prove Theorem 1.3; and in Section 6 we conclude with some open questions.

2. Preliminaries

In this section we gather simple facts and lemmas that are used in the proofs of the main results. We start with a standard version of the Chernoff bounds for binomial random variables, see for example Theorem 2.1 and inequality (2.9) in [12].

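Lemma 2.1.

(Chernoff bounds) Let X have the binomial distribution Bin(m, p) with mean μ = mp. For every t ≥ 0 we have

$$\mathbb{P}(X \ge \mu + t) \le \exp\left(-\frac{t^2}{2(\mu + t/3)}\right)$$

and

$$\mathbb{P}(X \le \mu - t) \le \exp\left(-\frac{t^2}{2\mu}\right),$$

and it follows that, for each 0 < ε ≤ 3/2,

$$\mathbb{P}(|X - \mu| \ge \varepsilon\mu) \le 2\exp\left(-\frac{\varepsilon^2 \mu}{3}\right).$$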

For x ∈ ℝ² and ρ > 0, let B(x, ρ) denote the closed ball of radius ρ around x. We shall repeatedly use the following fact.

Fact 2.2.

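Let G ∈ 𝒢(n, r) be a random geometric graph. For each x ∈ S_n let a(x) be the area of the set of points of S_n within distance r of x, and let p(x) = a(x)/n. Then for each vertex v and each point x ∈ S_n, conditional on X_v = x, the degree deg(v) has distribution Bin(n − 1, p(x)). More precisely, this gives a density function: for any Borel set B ⊆ S_n and each integer k ≥ 0,

$$\mathbb{P}\big(X_v \in B,\ \deg(v) = k\big) = \frac{1}{n}\int_B \mathbb{P}\big(\mathrm{Bin}(n-1, p(x)) = k\big)\,dx.$$

In particular, if p_− ≤ p(x) ≤ p_+ for each x ∈ B, then, conditional on X_v ∈ B, deg(v) is stochastically at least Bin(n − 1, p_−) and stochastically at most Bin(n − 1, p_+).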
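Because the points have density 1/n on S_n, conditional on X_v = x each of the other n − 1 points lands within distance r of x independently, with probability equal to the area of the part of S_n within distance r of x, divided by n; this is the binomial parameter for the degree of a vertex placed at x. The sketch below estimates that area numerically with a midpoint rule over vertical strips, which is one simple choice among many; the function names are hypothetical.

```python
import math


def disk_square_area(cx, cy, r, half, steps=20000):
    """Area of the disk of radius r about (cx, cy) intersected with the
    square [-half, half]^2, by the midpoint rule over vertical strips."""
    lo, hi = max(cx - r, -half), min(cx + r, half)
    if lo >= hi:
        return 0.0
    dx = (hi - lo) / steps
    area = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        h = math.sqrt(max(0.0, r * r - (x - cx) ** 2))
        # Length of the vertical chord at x, clipped to the square.
        top, bottom = min(cy + h, half), max(cy - h, -half)
        area += max(0.0, top - bottom) * dx
    return area


def degree_parameter(cx, cy, r, n):
    """Binomial parameter p(x) = area / n for a vertex placed at (cx, cy) in S_n."""
    return disk_square_area(cx, cy, r, math.sqrt(n) / 2.0) / n
```

Far from the boundary the area is the full πr², while at a corner of the square it drops to πr²/4, which is the effect that makes low-degree vertices useful for locating corners.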

The next lemma gives elementary bounds on the area of the part of S_n within distance r of a point x ∈ S_n, in terms of the distance from x to a corner of S_n or to the boundary of S_n.

Lemma 2.3.

Let , and let .

  1. If is at distance at most from some corner, then .

  2. If is at distance at least from each corner, then .

  3. If is at distance at most from the boundary, then .

  4. If is at distance at least from the boundary and at distance at most from at most one side of the boundary, then .

Proof.

Parts (i) and (iii) are easy. To prove parts (ii) and (iv), we observe first that, in the disk with centre and radius , the set of points in the disk with and has area at least . For if is the point on the bounding circle with , then , so the quadrilateral with corners and has area , and .

To prove part (ii) of the lemma, it suffices to consider points at distance equal to from a corner, wlog from the bottom left corner . Suppose that . Then, by the observation in the first paragraph,

since . Part (iv) follows similarly from the initial observation. ∎

We shall depend heavily on the following result on the relation between graph distance and Euclidean distance for random geometric graphs (with slightly worse constants than the ones given in the original paper to make the expression cleaner).

Lemma 2.4.

[9, Theorem 1.1] Let G ∈ 𝒢(n, r) be a random geometric graph with . Then, whp, for every pair of vertices u, v we have:

where

We observed earlier that always d_E(X_u, X_v) ≤ r d_G(u, v); we next give a corollary of the last lemma which shows that whp this bound is quite tight.

Corollary 2.5.

There is a constant () such that, if for sufficiently large, then whp, for every pair of vertices we have:

Proof.

By Lemma 2.4

(4)

But, for , the second term in the maximum in the definition of is at most ; and letting denote the first term we have

Thus

and the lemma follows from (4). ∎

In fact, all we shall need from the last two results is the following immediate consequence of the last one.

Corollary 2.6.

If , then there exists such that whp, for every pair of vertices, we have

We consider the four corner points of S_n in clockwise order from the bottom left: (−√n/2, −√n/2) (already defined), (−√n/2, √n/2), (√n/2, √n/2) and (√n/2, −√n/2). See Figure 1 for an illustration of the points used in the following lemma.

Lemma 2.7.

Let satisfy and consider the random geometric graph . Let tend to infinity with arbitrarily slowly, and in particular assume that and . Then whp the following holds: (a) for each , there exists such that and ; and (b) for each such that we have .

Proof.

(a) Fix . Note first that

so whp there exists such that . Let be the number of vertices such that . Then . For each , by Lemma 2.3 (i),

for sufficiently large; and then, by Lemma 2.1 and Fact 2.2,

Hence

Thus whp there exists such that and . This gives part (a) of the lemma.

(b) Let and . For all integers and , let . Consider first the central part of the square , omitting parts near the corners: let . By Lemma 2.3 (ii), for each we have

for sufficiently large. Hence, by Lemma 2.1 and Fact 2.2,

Since , we have . Thus , and so whp there is no vertex such that and .

We need a little more care near the corners. Let and let be an integer with . The area of is . For each point , is at distance at least from each corner of , so by Lemma 2.3 (ii) we have

Also,

Thus, by Lemma 2.1 and Fact 2.2,

Therefore, for each ,

Hence whp for each vertex such that is not in one of the four corner regions ; and so we have completed the proof of part (b). ∎

Figure 1. Choosing points in the 4 corners of the square

The above lemma shows us how to find vertices such that whp the corresponding points are close to the four corner points of .

Lemma 2.8.

Let satisfy , and consider the random geometric graph . Let be any function tending to infinity as . There is a polynomial-time (in ) algorithm which, on input , finds four vertices such that whp the following holds: for some (unknown) symmetry of ,

Proof.

Consider the following algorithm: pick a vertex of minimal degree, call it , and mark and all its neighbors. Continue iteratively on the set of unmarked vertices, until we have found four vertices . (Whp each vertex has degree at most ; so after at most 3 steps, at most vertices are marked, and so whp we will find .) Let be a vertex amongst maximising the graph distance from , and list the four vertices as where and (and and are the other two of the vertices listed in some order). We shall see that whp are as required.

By Lemma 2.7, whp the vertices are each within distance of a corner of , and the marking procedure ensures that the four corners involved are distinct. If and are such that and are within distance of opposite corners of , then and so . If and are within distance of adjacent corners, then ; and so, since we may assume wlog that , whp by Corollary 2.5. Hence, whp and are within distance of opposite corners, as are the other two of the chosen vertices. For each , denote the corner closest to by . Then whp is a permutation of , and and are opposite corners; and so lists the corners of in either clockwise or anticlockwise order. Thus extends to a (unique) symmetry of , and we are done. ∎
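The algorithm in the proof above can be sketched in Python, assuming the graph is given as a 0/1 adjacency matrix, that degrees are taken in the whole graph G, and that ties in the minimum-degree choice are broken by vertex index (the text does not specify tie-breaking). Function names are hypothetical. The returned list starts with the first pick, places the pick farthest from it in graph distance third, and puts the other two picks second and fourth, so that whp it runs through the corners in cyclic order.

```python
import math
from collections import deque


def bfs_distances(A, s):
    """Graph distances from s, with math.inf for unreachable vertices."""
    n = len(A)
    dist = [math.inf] * n
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if A[u][v] and dist[v] == math.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def find_corner_vertices(A):
    """Four times, pick an unmarked vertex of minimum degree and mark it and
    its neighbours; then order the picks so that the one farthest from the
    first pick comes third (opposite corner)."""
    n = len(A)
    deg = [sum(row) for row in A]
    marked = [False] * n
    picks = []
    while len(picks) < 4:
        unmarked = [v for v in range(n) if not marked[v]]
        if not unmarked:
            raise ValueError("could not find four separated low-degree vertices")
        w = min(unmarked, key=lambda v: deg[v])
        picks.append(w)
        marked[w] = True
        for u in range(n):
            if A[w][u]:
                marked[u] = True
    d1 = bfs_distances(A, picks[0])
    w3 = max(picks[1:], key=lambda v: d1[v])
    w2, w4 = [v for v in picks[1:] if v != w3]
    return [picks[0], w2, w3, w4]
```

On a 5 × 5 grid of unit-spaced points with threshold 1, for example, the four degree-2 corner vertices are picked and returned in cyclic order.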

Having found four vertices whose points are close to the four corners of S_n, for each other vertex v we will be able to use the graph distances from v to each of these four vertices to obtain an approximation to X_v.

3. Estimating r and Euclidean distances

In this section, we use the preliminary results from the last section to see how to estimate the threshold distance r, and Euclidean distances between points, sufficiently accurately to be able to prove Theorem 1.2 in the next section. Given a vertex and a set of vertices with , let denote the number of edges between and .

Lemma 3.1.

Let , with as and . Let , so as . Fix a small rational constant , say , and let for .

Let for . Let (so ), let , and let . Finally, let

Then

(5)