Dense Peelable Random Uniform Hypergraphs

07/10/2019 ∙ by Martin Dietzfelbinger, et al. ∙ TU Ilmenau

We describe a new family of k-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e. of admitting no sub-hypergraph of minimum degree 2, even when the edge density (number of edges over vertices) is close to 1. In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices of k consecutive segments. Quite surprisingly, the linear geometry allows our graphs to be peeled "from the outside in". The density thresholds f_k for peelability of our hypergraphs (f_3 ≈ 0.918, f_4 ≈ 0.977, f_5 ≈ 0.992, ...) are well beyond the corresponding thresholds (c_3 ≈ 0.818, c_4 ≈ 0.772, c_5 ≈ 0.702, ...) of standard k-uniform random hypergraphs. To get a grip on f_k, we analyse an idealised peeling process on the random weak limit of our hypergraph family. The process can be described in terms of an operator on functions and f_k can be linked to thresholds relating to the operator. These thresholds are then tractable with numerical methods. Random hypergraphs underlie the construction of various data structures based on hashing. These data structures frequently rely on peelability of the hypergraph for correct operation, or peelability enables simple linear-time algorithms. To demonstrate the usefulness of our construction, we used our 3-uniform hypergraphs as a drop-in replacement for the standard 3-uniform hypergraphs in a retrieval data structure by Botelho et al. This reduces memory usage from 1.23m bits to 1.12m bits (m being the input size) with almost no change in running time.

1 Introduction

The core of a hypergraph H is the largest sub-hypergraph of H with minimum degree at least 2. The core can be obtained by peeling, which means repeatedly choosing a vertex of degree 0 or 1 and removing it (and the incident edge if present) from the hypergraph, until no such vertex exists. If the core of H is empty, then H is called peelable.
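To make the definition concrete, here is a minimal Python sketch of the peeling procedure (representation and names are our own, not from the paper): edges are given as tuples of vertex ids, and the returned set of edge indices is the core, so an empty result means the hypergraph is peelable.

```python
from collections import defaultdict

def peel(num_vertices, edges):
    """Peel a hypergraph: repeatedly delete vertices of degree 0 or 1,
    together with the incident edge, if present. Returns the indices of
    the edges forming the core (empty iff the hypergraph is peelable)."""
    incidence = defaultdict(set)              # vertex -> indices of live edges
    for j, e in enumerate(edges):
        for v in e:
            incidence[v].add(j)
    alive = set(range(len(edges)))
    stack = [v for v in range(num_vertices) if len(incidence[v]) <= 1]
    deleted = [False] * num_vertices
    while stack:
        v = stack.pop()
        if deleted[v]:
            continue
        deleted[v] = True
        for j in list(incidence[v]):          # at most one live edge
            alive.discard(j)
            for u in edges[j]:
                incidence[u].discard(j)
                if not deleted[u] and len(incidence[u]) <= 1:
                    stack.append(u)
    return alive
```

Since every edge is inserted and removed at most once, the running time is linear in the size of the hypergraph.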

The significance of peelability.

Hypergraphs underlie many hashing-based data structures, and peelability is often necessary for proper operation or allows for simple linear-time algorithms. We list a few examples.

  • Invertible Bloom Lookup Tables.  IBLTs [23] are based on Bloomier filters [10], which are based on Bloom filters [4]. Each element is inserted at several random positions in a hash table. Each cell stores the XOR of all elements that have been inserted into it. A List-Entries query on an IBLT can recover all elements of the table precisely if the underlying hypergraph is peelable. Among other things, IBLTs have been used to construct error correcting codes [35] and to solve the set reconciliation and straggler identification problems [17].

  • Erasure Correcting Codes.  To construct capacity-achieving erasure codes, the authors of [29] consider a hypergraph where the vertices correspond to parity check bits and the edges to message bits that were lost during transmission. A message bit is incident to precisely those check bits to which it contributed. Correct decoding hinges on peelability of the hypergraph.

  • Cuckoo Hashing and XORSAT.  In the context of cuckoo hash tables [15, 32, 37] and solving random XORSAT formulas [16, 20, 38], (partial) peelability of the underlying hypergraph makes placing all (some) keys or solving the linear system (eliminating some variables) particularly simple.

  • Retrieval and Perfect Hashing.  The retrieval problem (considered later in Section 7) occurs in the context of constructing perfect hash functions [3, 6, 7, 8, 31]. The known approaches involve finding a solution b ∈ R^V for a system of equations (⊕_{v ∈ e(x)} b_v = f(x))_{x ∈ S}, where H = (V, E) is a hypergraph, f: S → R a function and R a small set. If R is a field, then the incidence matrix of H needs to have full rank over R to guarantee the existence of a solution. If H is peelable, however, then the existence of a solution is guaranteed even if R only has a group structure. Moreover, it can be computed in linear time, as the sketch after this list shows.
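To make the last point concrete, here is a simplified sketch (our own, not the data structure of Botelho et al.) of how such a system is solved by peeling when R = {0,1}^w, with bit strings represented as Python integers under XOR; `edges[j]` plays the role of e(x) and `values[j]` of f(x):

```python
from collections import defaultdict

def solve_by_peeling(num_vertices, edges, values):
    """Find b such that XOR of b[v] over v in edges[j] equals values[j]
    for every j, assuming the hypergraph is peelable (else returns None)."""
    incidence = defaultdict(set)
    for j, e in enumerate(edges):
        for v in e:
            incidence[v].add(j)
    stack = [v for v in range(num_vertices) if len(incidence[v]) <= 1]
    deleted = [False] * num_vertices
    order = []                                # (vertex, unique edge) in peeling order
    alive = set(range(len(edges)))
    while stack:
        v = stack.pop()
        if deleted[v]:
            continue
        deleted[v] = True
        for j in list(incidence[v]):
            order.append((v, j))
            alive.discard(j)
            for u in edges[j]:
                incidence[u].discard(j)
                if not deleted[u] and len(incidence[u]) <= 1:
                    stack.append(u)
    if alive:
        return None                           # non-empty core: peeling failed
    b = [0] * num_vertices
    for v, j in reversed(order):              # back-substitution in reverse order
        b[v] = values[j]
        for u in edges[j]:
            if u != v:
                b[v] ^= b[u]
    return b
```

Peelability guarantees that each equation gets a "fresh" variable b[v] when its edge is peeled, so back-substitution in reverse peeling order never revisits a finished entry and the whole solve runs in linear time.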

In these contexts, the hypergraph typically has vertex set V = [n] and for each element x of an input set S, an edge e(x) ⊆ V is created with incidences chosen via hash functions. For theoretical considerations, the edges (e(x))_{x ∈ S} are often assumed to be independent random variables. This has proven to be a good model for practical settings, even though perfect independence is not achieved by most practical hash functions. An important choice left to the algorithm designer is the distribution of e(x).

Previous work.

If the distribution is such that edges have size 2 or less (in particular if H is a graph with Θ(n) edges), then – due to the well-known "birthday paradox" – there is a constant probability that an edge is repeated. In that case, H is clearly not peelable, since a repeated edge forms a sub-hypergraph of minimum degree 2. The simplest workable candidate for the distribution of e(x) is therefore to pick a constant k ≥ 3 and let e(x) contain k vertices chosen independently and uniformly at random. We refer to these standard hypergraphs as k-uniform Erdős–Rényi hypergraphs H^k_{n,cn}, where c is the edge density, i.e. the number of edges over the number of vertices. Corresponding peelability thresholds c_k have been determined in [36], meaning if c < c_k then H^k_{n,cn} is peelable with high probability (whp), i.e. with probability approaching 1 as n → ∞, and if c > c_k then H^k_{n,cn} is not peelable whp. The largest threshold is c_3 ≈ 0.818. Since the edge density is often tightly linked to a performance metric (e.g. memory efficiency of a dictionary, rate of a code), a density closer to 1 would be desirable, but we know of only two alternative constructions.

To obtain erasure codes with high rates, the authors of [29] construct, for any ε > 0, non-uniform hypergraphs with edge sizes up to O(log(1/ε)), average edge size O(log(1/ε)) and edge density 1 − ε that are peelable whp. In particular, this yields peelable hypergraphs with edge densities arbitrarily close to 1. A downside is that the high maximum edge size can lead to super-constant worst-case query times in certain contexts. Motivated by this, the author of [40] looked into non-uniform hypergraphs with constant maximum edge size. Focusing on hypergraphs with two admissible edge sizes, he found for example that mixing edges of size 3 and size 21 yields a family of hypergraphs with peelability threshold ≈ 0.92.

Our construction.

In this paper we introduce and analyse a new distribution on edges that yields k-uniform hypergraphs with high peelability thresholds and good performance in practical algorithms.

We call our hypergraphs fuse graphs (as in the cord attached to a firecracker). There is an underlying linear geometry and similar to how fire proceeds linearly through a lit fuse, the peeling process proceeds linearly through our hypergraphs, in the sense that vertices on the inside of the line tend to only become peelable after vertices closer to the end of the line have already been removed.

Formally, for k, ℓ ∈ ℕ and c ∈ ℝ⁺ we define the family of k-uniform fuse graphs F = F(k, ℓ, c) as follows. The vertex set is V = [ℓ+k−1] × [n], where for i ∈ [ℓ+k−1] the vertices {i} × [n] form the i-th segment.¹ The edge set E = {e_1, …, e_m} has size m = cℓn. Each edge e_j is independently determined by one uniformly random variable t_j ∈ [ℓ] denoting the type of e_j and k independent random variables x_j^{(1)}, …, x_j^{(k)} uniformly distributed in [n], yielding e_j = {(t_j + i − 1, x_j^{(i)}) | i ∈ [k]}. In other words, e_j contains one uniformly random vertex from each of the k consecutive segments t_j, …, t_j + k − 1. There may be repeating edges, but the probability that this happens is o(1). The edge density m/|V| = cℓ/(ℓ+k−1) approaches c for ℓ → ∞.

¹ Denoting the segment size by n instead of the number of vertices is more convenient. Note that |V| = Θ(n) still holds.
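A sampler for this distribution is straightforward. The following sketch uses 0-based indices and encodes vertex (s, x) as the integer s·n + x, so that its output can be fed directly into the `peel` routine sketched in the introduction (encoding and names are ours):

```python
import random

def sample_fuse_graph(k, l, c, n):
    """Sample a k-uniform fuse graph with l + k - 1 segments of n vertices.
    Each edge has a uniform type t in {0, ..., l-1} and contains one
    uniform vertex from each of the segments t, ..., t + k - 1."""
    m = int(c * l * n)                                    # size of the edge set
    edges = []
    for _ in range(m):
        t = random.randrange(l)                           # type of the edge
        edges.append(tuple((t + i) * n + random.randrange(n) for i in range(k)))
    return (l + k - 1) * n, edges

# num_vertices, edges = sample_fuse_graph(3, 50, 0.90, 10_000)
# peel(num_vertices, edges) should usually be empty here (0.90 < f_3), while an
# Erdős–Rényi 3-uniform hypergraph of the same density cl/(l+k-1) ≈ 0.87 would
# need density below c_3 ≈ 0.818 to be peelable.
```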

Results.

Let the peelability threshold for k-ary fuse graphs be defined as

    f_k := sup{ c > 0 : ∃ℓ ∈ ℕ such that F(k, ℓ, c) is peelable whp }.

Our Main Theorem relates f_k to the orientability threshold c^*_k of k-ary Erdős–Rényi hypergraphs and the erosion threshold ĉ_k defined in the technical part of our paper.

Main Theorem. For any k ≥ 3 we have min(ĉ_k, c^*_k) ≤ f_k ≤ c^*_k.

The orientability thresholds c^*_k are known exactly [11, 20, 21] and we determine lower bounds on the erosion thresholds ĉ_k. As shown in Table 1, this makes it possible to narrow down f_k to an interval of width less than 2·10⁻⁵ for all k ∈ {3, …, 7}.

k       3             4             5             6             7
f_k ≥   0.9179352469  0.9767692112  0.9924345766  0.9973757381  0.9990561294
f_k ≈   0.9179352767  0.9767701649  0.9924383913  0.9973795528  0.9990637588
f_k ≤   0.9179353065  0.9767711186  0.9924422067  0.9973833675  0.9990713882
        0.917935      0.97677       0.99243       0.99738       0.99906

Table 1: The erosion thresholds ĉ_k and peelability thresholds f_k for k-ary fuse graphs satisfy min(ĉ_k, c^*_k) ≤ f_k ≤ c^*_k; the first three rows bracket f_k. The values in the last row play a role in Section 5.

Outline.

The paper is organised as follows. In Section 2 we idealise the peeling process by switching to the random weak limit of our hypergraphs, and capture the essential behaviour of the process in terms of an operator Â acting on functions q: ℤ → [0,1]. For this operator, we identify the properties of being eroding and consolidating as well as corresponding thresholds ĉ_k and č_k in Section 3. We then prove the "f_k ≥ min(ĉ_k, c^*_k)" part of our theorem in Section 4 and give numerical approximations of ĉ_k and c^*_k in Section 5. The comparatively simple "f_k ≤ c^*_k" part of our theorem is independent of these considerations and is proved in Section 6. Finally, in Section 7 we demonstrate how using our hypergraphs can improve the performance of practical retrieval data structures.

2 The Peeling Process and Idealised Peeling Operators

In this section we consider how the probabilities for vertices to "survive" r rounds of peeling change from one round to the next. In the classical setting this could be described by a function, mapping the old probability to the new one [36]. In our case, however, there are distinct probabilities for each segment of the graph. Thus we need a corresponding operator that acts on sequences of probabilities, one per segment. Conveniently, it will be independent of n and ℓ.

We almost always suppress n in notation outside of definitions, assuming n to be large. Big-O notation refers to n → ∞ while k, ℓ and c are constant.

Consider the parallel peeling process PP on F. In each round of PP, all vertices of degree 0 or 1 are determined and then deleted simultaneously. Deleting a vertex implicitly deletes incident edges. We also define the rooted peeling process PP_v for any vertex v, which behaves exactly like PP except that the special vertex v may only be deleted if it has degree 0, not if it has degree 1. For any r ∈ ℕ₀ and i ∈ [ℓ+k−1] we let p_i^{(r)} be the probability that a vertex of segment i survives r rounds of PP, i.e. is not deleted. Note that the probability is well-defined as vertices of the same segment are symmetric.

By definition, p_i^{(0)} = 1 for all i ∈ [ℓ+k−1]. Whether a vertex v of segment i survives r rounds is a function of its r-neighbourhood N_r(v), i.e. the set of vertices and edges of F that can be reached from v by traversing at most r hyperedges.
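The per-segment survival probabilities p_i^{(r)} can be estimated empirically with a direct simulation of PP; a sketch (our own code, reusing the edge format of the sampler above):

```python
from collections import defaultdict

def parallel_peel(num_vertices, edges, rounds):
    """Run `rounds` rounds of the parallel peeling process PP. Returns a
    list with survived[v] == True iff v was not deleted in these rounds."""
    incidence = defaultdict(set)
    for j, e in enumerate(edges):
        for v in e:
            incidence[v].add(j)
    survived = [True] * num_vertices
    for _ in range(rounds):
        # determine all vertices of degree 0 or 1 ...
        doomed = [v for v in range(num_vertices)
                  if survived[v] and len(incidence[v]) <= 1]
        if not doomed:
            break
        # ... then delete them and their incident edges simultaneously
        dead_edges = set()
        for v in doomed:
            survived[v] = False
            dead_edges.update(incidence[v])
        for j in dead_edges:
            for u in edges[j]:
                incidence[u].discard(j)
    return survived
```

Averaging `survived` over the n vertices of segment i estimates p_i^{(r)}; in line with the fuse intuition, the survivors should vanish first in segments near the two ends of the line.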

It is standard to consider the random weak limit of F to get a grip on the distribution of N_r(v) and thus on p_i^{(r)}. Intuitively, we identify a (possibly infinite) random tree that captures the local characteristics of F for n → ∞. See [1] for a good survey with examples and details on how to formally define the underlying topology and metric space. In the limit, the binomially distributed vertex degrees (e.g. Bin(cℓn, k/(ℓn)) for vertices of the middle segments k ≤ i ≤ ℓ) become Poisson distributed (Po(kc) for those segments). Short cycles are not only rare but non-existent and certain weakly correlated random variables become perfectly independent.

Definition (Limiting Tree). Let k, ℓ ∈ ℕ, c ∈ ℝ⁺ and i ∈ [ℓ+k−1]. The random (possibly infinite) hypertree T = T(k, ℓ, c, i) is distributed as follows. T has a root vertex of segment² i which, for each type t ∈ {i−k+1, …, i} ∩ [ℓ], has Po(c) child edges of type t. Each child edge of type t is incident to k−1 (fresh) child vertices of its own, one for each segment j ∈ {t, …, t+k−1} \ {i}. The sub-hypertree at such a child vertex of segment j is distributed recursively (and independently of its sibling-subtrees) according to T(k, ℓ, c, j).

² In the current context, the segment of a vertex is an abstract label. There can be an unbounded number of vertices of each segment.

Since all arguments are standard in contexts where local weak convergence plays a role, we state the following lemma without proof. For instance, a full argument to show a similar convergence is given in [26]. See also [25] for the related technique of Poissonisation.

Lemma. Let r ∈ ℕ be constant. Let further N_r be the r-neighbourhood of a vertex of segment i in F and T_r the r-neighbourhood of the root of T(k, ℓ, c, i), both viewed as undirected and unlabelled hypergraphs. Then N_r converges in distribution to T_r as n → ∞.

We now direct our attention to survival probabilities in the idealised peeling processes on T, which are easier to analyse than those of PP on F.

Lemma. Let r ∈ ℕ₀ be constant and let q_i^{(r)} be the probability that the root of T(k, ℓ, c, i) survives r rounds of its rooted peeling process, for i ∈ [ℓ+k−1]. Then q_i^{(0)} = 1 and

    q_i^{(r+1)} = 1 − exp(−c · Σ_{t ∈ {i−k+1,…,i} ∩ [ℓ]} Π_{j ∈ {t,…,t+k−1} \ {i}} q_j^{(r)}).

Proof.

Let i ∈ [ℓ+k−1], let v be the root of T = T(k, ℓ, c, i) and assume t ∈ {i−k+1, …, i} ∩ [ℓ] is the type of some edge e incident to v. Edge e survives r rounds of PP_v if and only if all of its incident vertices survive these rounds. Since v itself may not be deleted by PP_v as long as e exists, the relevant vertices are the k−1 child vertices, one for each segment j ∈ {t, …, t+k−1} \ {i}. Call these (v_j)_j and denote the subtrees rooted at those vertices by (T_j)_j. Now consider the peeling processes (PP_{v_j})_j on these subtrees. Assume one of them, say PP_{v_j}, deletes v_j in round r′ ≤ r, meaning v_j has degree 0 in T_j before round r′. It follows that v_j has degree at most 1 before round r′ in PP_v, meaning PP_v deletes v_j in round r′ (or earlier). Conversely, if none of the (PP_{v_j})_j delete their root vertex within r rounds, then the (v_j)_j have degree at least 1 after round r of PP_v and e survives round r of PP_v. This makes the probability for e to survive r rounds of PP_v equal to Π_j q_j^{(r)}. Since the number of edges of type t incident to v has distribution Po(c), the number of edges of type t incident to v surviving r rounds of PP_v is a correspondingly thinned out variable, namely Po(c · Π_j q_j^{(r)}), which means that the total number of surviving child edges has distribution Po(c · Σ_{t ∈ {i−k+1,…,i} ∩ [ℓ]} Π_{j ∈ {t,…,t+k−1} \ {i}} q_j^{(r)}).

The claim now follows by observing that v survives r+1 rounds of PP_v if and only if at least one of its child edges survives r rounds of PP_v:

    q_i^{(r+1)} = Pr[Po(λ) ≥ 1]   with   λ = c · Σ_{t ∈ {i−k+1,…,i} ∩ [ℓ]} Π_{j ∈ {t,…,t+k−1} \ {i}} q_j^{(r)}.

Replacing Pr[Po(λ) ≥ 1] with its value 1 − e^{−λ} completes the proof. ∎

For convenience we define, for k, ℓ ∈ ℕ and c ∈ ℝ⁺, the operator A = A(k, ℓ, c), which maps any q ∈ [0,1]^{[ℓ+k−1]} to Aq ∈ [0,1]^{[ℓ+k−1]} with

    (Aq)_i = 1 − exp(−c · Σ_{t ∈ {i−k+1,…,i} ∩ [ℓ]} Π_{j ∈ {t,…,t+k−1} \ {i}} q_j)   for i ∈ [ℓ+k−1].

Together, the two preceding lemmas imply that A can be used to approximate survival probabilities.

Corollary. Let r ∈ ℕ be constant. Then for all i ∈ [ℓ+k−1] we have p_i^{(r)} ≤ (A^r q^{(0)})_i + o(1), where q^{(0)} ≡ 1.
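The corollary suggests a direct numerical scheme: iterate A on the all-ones vector to obtain (approximate upper bounds on) the survival probabilities. A sketch with 0-based segment indices (function names are ours):

```python
import math

def apply_A(q, k, l, c):
    """One application of the operator A(k, l, c); q is indexed by the
    segments 0, ..., l + k - 2."""
    result = []
    for i in range(l + k - 1):
        total = 0.0
        # types t with t <= i <= t + k - 1, restricted to valid types 0..l-1
        for t in range(max(0, i - k + 1), min(i, l - 1) + 1):
            prod = 1.0
            for j in range(t, t + k):
                if j != i:
                    prod *= q[j]
            total += prod
        result.append(1.0 - math.exp(-c * total))
    return result

def survival_estimates(k, l, c, rounds):
    """Iterate A on the all-ones vector, approximating p_i^(rounds)."""
    q = [1.0] * (l + k - 1)
    for _ in range(rounds):
        q = apply_A(q, k, l, c)
    return q
```

For c below the thresholds of Table 1 (say k = 3, ℓ = 100, c = 0.9), the iterates should collapse to 0 working inwards from both ends of the line; above the threshold they stabilise at a non-trivial profile.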

To obtain upper bounds on survival probabilities, we may remove the awkward restriction "t ∈ [ℓ]" in the definition of A. We define Â = Â(k, c) as mapping q ∈ [0,1]^ℤ to Âq ∈ [0,1]^ℤ with

    (Âq)_i = 1 − exp(−c · Σ_{t=i−k+1}^{i} Π_{j ∈ {t,…,t+k−1} \ {i}} q_j)   for i ∈ ℤ.

Note that Â does not depend on ℓ or n. To simplify notation, we assume that the old operator A also acts on functions q: ℤ → [0,1], ignoring q_i for i ∉ [ℓ+k−1] and producing (Aq)_i = 0 for i ∉ [ℓ+k−1]. We also extend q^{(0)} to be 1_{[ℓ+k−1]}, i.e. the characteristic function on [ℓ+k−1], essentially introducing vertices of segments i ∈ ℤ \ [ℓ+k−1] which are, however, already deleted with probability 1 before the first round begins. Note that while (A^r q^{(0)})_i and q_i^{(r)} are by definition non-increasing in r, this is not the case for (Â^r q^{(0)})_i. For instance, Â^r q^{(0)} has support {1−r, …, ℓ+k−1+r}, which grows with r.³ The following lemma lists a few easily verified properties of Â. All inequalities between functions should be interpreted point-wise.

³ It is still possible to interpret Â^r q^{(0)} as survival probabilities in more symmetric extended versions of the tree T, but we will not pursue this.

  • Aq ≤ Âq for all q: ℤ → [0,1].
  • Â commutes with the shift operators S⁺ and S⁻ defined via (S⁺q)_i = q_{i−1} and (S⁻q)_i = q_{i+1}. In other words, we have Â ∘ S^± = S^± ∘ Â.
  • Â is monotonic, i.e. q ≤ q′ implies Âq ≤ Âq′.
  • Â respects monotonicity, i.e. if q_i is (strictly) increasing in i, then so is (Âq)_i.

3 Two Fixed Points Battling for Territory

In this section we define the erosion and consolidation thresholds ĉ_k and č_k at which the behaviour of Â changes in crucial ways.

First, we require a few facts about the function f_{k,c}: [0,1] → [0,1] mapping x ↦ 1 − exp(−kc · x^{k−1}). It appears in the analysis of cores in k-ary Erdős–Rényi hypergraphs H^k_{n,cn}, essentially mapping the probability for a vertex to survive r rounds of peeling to the probability to survive r+1 rounds of peeling, see [36, page 5].⁴

⁴ Our setting corresponds to the parameter choices in [36] that concern the 2-core of a k-uniform hypergraph.

The threshold c_k for the appearance of a core in H^k_{n,cn} turns out to be the threshold for the appearance of a non-zero fixed point of f_{k,c}. The following is implicit in the analysis.

Fact ([36, Proofs of Lemmas 3 and 4]).
  • For c < c_k, f_{k,c} has only the fixed point 0, with f_{k,c}(x) < x for all x ∈ (0, 1].
  • For c > c_k, there are exactly three fixed points 0, x₁ = x₁(k, c) and x₂ = x₂(k, c) with 0 < x₁ < x₂ < 1, while f_{k,c}(x) > x holds precisely for x ∈ (x₁, x₂).

This implies the following behaviour when applying f_{k,c} repeatedly to a starting value x ∈ [0,1], which should be immediately clear from the corresponding sketches: for c < c_k the iterates converge to 0 from any starting value, while for c > c_k they converge to x₂ from any starting value x > x₁ (and to 0 from any x < x₁).
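The dichotomy also gives a numerical handle on c_k, which we sketch here for illustration (this bisection is our own, not the method of [36]): iterating f_{k,c} from x = 1 converges to the largest fixed point, which is 0 exactly when c < c_k.

```python
import math

def largest_fixed_point(k, c, iters=20_000):
    """Iterate f_{k,c}(x) = 1 - exp(-k*c*x^(k-1)) starting from x = 1;
    the iterates decrease towards the largest fixed point."""
    x = 1.0
    for _ in range(iters):
        x = 1.0 - math.exp(-k * c * x ** (k - 1))
    return x

def core_threshold(k, lo=0.5, hi=1.0, steps=40):
    """Bisect for c_k, the smallest c admitting a non-zero fixed point."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if largest_fixed_point(k, mid) > 1e-9:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

print(round(core_threshold(3), 3))   # 0.818, matching c_3 from the introduction
```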