Compact I/O-Efficient Representation of Separable Graphs and Optimal Tree Layouts

by Tomáš Gavenčiak, et al.
Charles University in Prague

Compact and I/O-efficient data representations play an important role in efficient algorithm design, as memory bandwidth and latency can present a significant performance bottleneck, slowing the computation by orders of magnitude. While this problem is very well explored in e.g. uniform numerical data processing, structural data applications (e.g. on huge graphs) require different algorithm-dependent approaches. Separable graph classes (i.e. graph classes with balanced separators of size O(n^c) with c < 1) include planar graphs, bounded genus graphs, and minor-free graphs. In this article we present two generalizations of the separator theorem, to partitions with small regions only on average and to weighted graphs. Then we propose an I/O-efficient succinct representation and memory layout for random walks in (weighted) separable graphs in the pointer machine model, including an efficient algorithm to compute them. Finally, we present a worst-case I/O-optimal tree layout algorithm for root-leaf path traversal, show an additive (+1)-approximation of the optimal compact layout and contrast this with an NP-completeness proof for finding an optimal compact layout.




1 Introduction

Modern computer memory consists of several layers that together constitute a memory hierarchy, with every level further from the CPU being larger and slower [2], usually by more than an order of magnitude: CPU registers, L1 – L3 caches, main memory, disk drives etc. To simplify the model, commonly only two levels are considered at once, called main memory and cache, the latter of size $M$. There, main memory access is block-oriented, assuming unit time for reading or writing a block of size $B$, which makes random byte access very inefficient. While some I/O-efficient algorithms need to know the values of $B$ and $M$ (generally called cache-aware) [3], cache-oblivious algorithms [13] operate efficiently without this knowledge.
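The cost model above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original text: the function name `count_block_transfers`, the LRU replacement policy, and the concrete values of `n`, `B` and `M` are all assumptions chosen for illustration. It counts how many blocks are loaded for a sequential scan and for a strided access pattern over the same addresses.

```python
from collections import OrderedDict

def count_block_transfers(addresses, block_size, cache_size):
    """Count block loads for a sequence of word addresses (LRU eviction is an assumption)."""
    capacity = cache_size // block_size   # number of blocks that fit in the cache
    cache = OrderedDict()                 # block id -> None, kept in LRU order
    transfers = 0
    for address in addresses:
        block = address // block_size
        if block in cache:
            cache.move_to_end(block)      # hit: mark block as most recently used
            continue
        transfers += 1                    # miss: one block transfer
        cache[block] = None
        if len(cache) > capacity:
            cache.popitem(last=False)     # evict the least recently used block
    return transfers

n, B, M = 1024, 16, 64                    # illustrative sizes, counted in words
sequential = count_block_transfers(range(n), B, M)
# Visit every address exactly once, but jump a whole block ahead at every step.
strided_addresses = [(i * B) % n + (i * B) // n for i in range(n)]
strided = count_block_transfers(strided_addresses, B, M)
print(sequential, strided)
```

Both patterns touch the same `n` words, yet the sequential scan loads each block once while the strided pattern misses on every access, which is the gap that a good memory layout tries to close.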

Computations that process medium to large volumes of data therefore call for space-efficient data representations (to utilize the memory capacity and bandwidth) and strongly benefit from optimized memory access patterns and layouts (to utilize the data in fast caches and read-ahead mechanisms). While this area is very well explored in e.g. numerical data processing and analysis (e.g. [24]), structural data applications (e.g. huge graphs) require different and application-dependent approaches. We describe representations that address these issues in separable graphs and trees.

Separable graphs satisfy the $n^c$-separator theorem for some $c < 1$, shown for planar graphs in 1979 by Lipton and Tarjan [29] (with $c = 1/2$), where every such graph on $n$ vertices has a vertex subset of size $O(n^c)$ that is a balanced separator (i.e. it separates the graph into two subgraphs each having at most a constant fraction of the vertices). These graphs include not only planar graphs [29] but also bounded genus graphs [17] and minor-free graph classes in general [22]. Small separators are also found in random graph models of small-world networks (e.g. geometric inhomogeneous random graphs by Bringmann et al. [7] have sublinear separators w.h.p. for all sufficiently large subgraphs). Some graphs that come from real-world applications are also separable, such as road network graphs [35, 33]. Separable graph classes have linear information entropy (i.e. a separable class can contain only $2^{O(n)}$ graphs of size $n$) and have efficient representations using only $O(1)$ bits per vertex on average [4], and therefore utilize the memory capacity and bandwidth very efficiently.

This paper is organized as follows: Sections 1.1 and 1.2 give an overview of the prior work and our contribution. Section 2 recalls the concepts and notation used. Section 3 contains our results on random walks in separable graphs. Section 4 generalizes the separator theorem. Section 5 discusses the layout of trees.

1.1 Related work

Turán [34] introduced a succinct representation of planar graphs (a succinct, resp. compact, data representation uses $H + o(H)$, resp. $O(H)$, bits, where $H$ is the class information entropy), Blandford et al. [4] introduced compact representations for separable graphs and Blelloch and Farzan [5] presented a succinct representation of separable graphs. However, none of those representations is cache-efficient (or can be easily made so). Analogous representations for general graphs suffer similar drawbacks [32, 12].

Agarwal et al. [1] developed a representation of planar graphs allowing I/O-efficient path traversal, requiring $O(K / \log B)$ block accesses for an arbitrary path of length $K$ (note that $\Omega(K / \log B)$ blocks may be required even for trees, while a standard graph representation would access $O(K)$ blocks). This has been extended to a succinct planar graph representation by Dillabaugh et al. [11] with the same result for arbitrary path traversal. It appears unlikely that the representation of [11] could be easily modified to match the I/O complexity of our random-walk algorithm due to their use of a global indexing structure.

Dillabaugh et al. [10] describe a succinct data structure for trees that uses $O(K/B)$ I/O operations for traversal of a leaf-to-root path of length $K$. For root-to-leaf traversal, they offer a similar but only compact structure.

Among other notable I/O-efficient algorithms, Maheshwari and Zeh [30] develop I/O-efficient algorithms for computing vertex separators, shortest paths and several other problems in planar and separable graphs. Jampala and Zeh [20] extend this to a cache-oblivious algorithm for planar shortest paths. While there are representations even more efficient than succinct (e.g. implicit representations, which use only $O(1)$ bits more than the class information entropy, see Kannan et al. [21] for an implicit graph representation), these do not seem to admit I/O-efficient access.

Random walks on graphs are commonly used in Monte Carlo sampling methods, among others in Markov Chain Monte Carlo methods for inference on graphical models, Markov decision process (MDP) inference and even in partial-information game theory algorithms.

1.2 Our contribution

Random walks on separable graphs. We present a compact cache-oblivious representation of graphs satisfying the edge separator theorem. We also present a cache-oblivious representation of weighted graphs satisfying the weighted edge separator theorem, where the transition probabilities depend on the weights. The representations are I/O-efficient when performing random walks of any length on the graph, starting from a vertex selected according to the stationary distribution. In the weighted representation, the transition probabilities at each step are proportional to the weights on the incident edges; in the unweighted compact representation, a neighbor is chosen uniformly at random.

Namely, if every vertex contains bits of extra (user) information, the representation uses bits and a random path of length (sampled w.r.t. edge weights) uses I/O operations with high probability.

The graph representation is compact (as the structure entropy including the extra bits is . The amount of memory used for the representation of the graph is asymptotically strictly smaller than the memory used by the user data already for the common case of , in which case only I/O operations are used. For , the representation uses bits.

In contrast with previous I/O-efficient results for planar graphs, our representation is only compact (and not succinct) but works for all separable graph classes, is cache-oblivious (in contrast to only cache-aware in prior work), and, most importantly, comes with a much better bound on the number of I/O operations for randomly sampled paths (order of rather than ).

Fast tree path traversal is a ubiquitous requirement for tree-based structures used in external storage systems, database indexes and many other applications. With Theorem 5.1, we present a linear-time algorithm to compute a layout of the vertices in memory minimizing the worst-case number of I/O operations for leaf-to-root paths in general trees and root-to-leaf paths in trees with unit vertex size. We show an additive (+1)-approximation of an optimal compact layout (i.e. one that fully uses a consecutive block of memory) and show that finding an optimal compact layout is NP-hard.

The above layout optimality is well defined assuming unit vertex size, an assumption that is often made and is often satisfied in practice. Using techniques from Section 3 we can turn the layout into a compact representation using bits of memory, requiring at most I/O operations for leaf-to-root paths in general trees and root-to-leaf paths in trees of fixed degree where is the I/O complexity of the optimal layout, i.e. I/O-optimal layout with the vertices using any conventional vertex representation with bits for inter-vertex pointers. See Theorem 5.2.

Compared to previous results [10], our representation is compact and we present the exact optimum over all layouts while they provide the asymptotic optimum . However, this does not guarantee that our representation has lower I/O complexity, since our notion of optimality only considers different layouts with each vertex stored by a structure of unit size.

Separable graph theorems. We prove two natural generalizations of the separator theorem (Theorem 4.2) and show that their natural joint generalization does not hold by providing a counterexample (Theorem 4.3). The Recursive Separator Theorem involves graph partitions coming from recursive applications of the Separator Theorem. Let and denote the maximum and average size of a region in the partition, respectively. We prove stronger bound on number of edges going between regions – instead of . The second generalization is for weighted graphs, showing that in the bound can be replaced by the total weight to get . We show that the bound does not hold in general by providing a counterexample.

2 Preliminaries

Throughout this paper, we use standard graph theory notation and terminology as in Bollobás [6]. We denote the subtree of rooted in vertex by , the root of tree by and the set of children of a vertex as . All the logarithms are binary unless noted otherwise.

We use standard notation and results for Markov chains as introduced in the book by Grinstead and Snell [19] (chapter 11) and mixing in Markov chains, as introduced in the chapter on mixing times in a book by Levin and Peres [27].

2.1 Separators

Let $S$ be a class of graphs closed under the subgraph relation. We say that $S$ satisfies the vertex (edge) $f(n)$-separator theorem iff there exist constants $\alpha < 1$ and $\beta > 0$ such that any graph in $S$ has a vertex (edge) cut of size at most $\beta f(n)$ that separates the graph into components of size at most $\alpha n$. We define a weighted version of the vertex (edge) separator theorem, which requires that there is a balanced vertex (edge) separator of total weight at most $\beta f(W)$, where $W$ is the sum of weights of all the edges. Note that these definitions make sense even for directed graphs. When an $f(n)$-separator theorem is stated without specifying whether it concerns edge or vertex separators, the vertex separator theorem is meant.
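The definition of a balanced edge separator can be made concrete with a small checker. This is an illustrative sketch only: the function names and the parameters `alpha` (balance constant) and `max_cut_size` (allowed cut size) are hypothetical names introduced here, and the path graph is a toy example.

```python
from collections import deque

def components_after_cut(vertices, edges, cut):
    """Return the connected components that remain after deleting the edges in `cut`."""
    cut_set = {frozenset(e) for e in cut}
    adjacency = {v: [] for v in vertices}
    for u, v in edges:
        if frozenset((u, v)) not in cut_set:
            adjacency[u].append(v)
            adjacency[v].append(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:                      # breadth-first search from `start`
            u = queue.popleft()
            component.append(u)
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(component)
    return components

def is_balanced_edge_separator(vertices, edges, cut, alpha, max_cut_size):
    """Check that the cut is small and leaves only components of size <= alpha * n."""
    components = components_after_cut(vertices, edges, cut)
    balanced = all(len(c) <= alpha * len(vertices) for c in components)
    return balanced and len(cut) <= max_cut_size

# Toy example: a path on 8 vertices.
path_vertices = list(range(8))
path_edges = [(i, i + 1) for i in range(7)]
print(is_balanced_edge_separator(path_vertices, path_edges, [(3, 4)], 2 / 3, 1))
print(is_balanced_edge_separator(path_vertices, path_edges, [(0, 1)], 2 / 3, 1))
```

Cutting the middle edge of the path splits it into two halves and passes the check, while cutting the first edge leaves a component of 7 vertices and fails the balance condition.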

Many graphs that arise in real-world applications satisfy vertex or edge separator theorem.

It has been extensively studied how to find balanced separators in graphs. In planar graphs, a separator of size $O(\sqrt{n})$ can be found in linear time [29]. Separators of the same size can be found in minor-closed families in time $O(n^{1+\epsilon})$ for any $\epsilon > 0$ [22]. A balanced separator of size $O(n^{1-1/d})$ can be found in a $d$-dimensional finite-element mesh in expected linear time [31]. Good heuristics are known for some graphs which arise in real-world applications, such as road networks [33]. A poly-logarithmic approximation which works on any graph class is known [26]. A poly-logarithmic approximation of the separators will be sufficient to achieve almost the same bounds in our representation (differing by a factor at most poly-logarithmic in $n$).

We define a recursive separator partition to be a partition of the vertex set of a graph, obtained by the following recursive process. Given a graph, we either take its whole vertex set to be one set of the partition or do the following:

  1. Apply the separator theorem. This partitions the vertex set into two parts.

  2. Recursively obtain recursive separator partitions of the two subgraphs induced by these parts.

  3. Return the union of the two partitions as the partition of the original graph.

We call the sets in a recursive separator partition regions.
If there is an algorithm that computes a balanced separator in time , there is an algorithm that computes a recursive separator partition with region size in time for any . A stronger version called $r$-division can be computed in linear time on planar graphs [18].
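The recursive process defining a recursive separator partition can be sketched as follows. This is a hedged illustration, not the authors' algorithm: the separator routine is passed in as a hypothetical callback `split`, the stopping rule (a size threshold `max_region_size`) is just one of the choices the definition permits, and `split_path` is a toy separator that only works for a path graph listed in path order.

```python
def recursive_separator_partition(vertices, split, max_region_size):
    """Split `vertices` recursively until every region has at most max_region_size vertices."""
    if len(vertices) <= max_region_size:
        return [list(vertices)]           # stop: the whole set becomes one region
    left, right = split(vertices)         # apply the (assumed) separator routine
    return (recursive_separator_partition(left, split, max_region_size)
            + recursive_separator_partition(right, split, max_region_size))

def split_path(vertices):
    """Toy separator for a path graph: cut the single edge in the middle."""
    middle = len(vertices) // 2
    return vertices[:middle], vertices[middle:]

regions = recursive_separator_partition(list(range(10)), split_path, 3)
print(regions)
```

Because the partitions of the two sides are concatenated in order, each region ends up as a contiguous run of the final list, which is the property the memory layouts in Section 3 rely on.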

2.2 I/O complexity

For definitions related to I/O complexity, refer to Demaine [8]. We use the standard notation with $B$ being the block size and $M$ the cache size. Both $B$ and $M$ are counted in words. Each word has bits and it is assumed that .

3 Representation for Random Walks

In this section, we present our cache-oblivious representation of separable graphs optimized for random walks and related results.

Theorem 3.1

Let be a graph from a graph class satisfying the edge separator theorem where every vertex contains extra bits of information. Then there is a cache-oblivious representation of using bits in which a random walk of length starting in a vertex sampled from the stationary distribution uses in expectation I/O operations. Moreover, such representation can be computed in time for any .

For other random walks and weighted graphs where the transition probabilities are proportional to the random walk stationary distribution, we can show a weaker result. Namely, we can no longer guarantee a compact representation.

Theorem 3.2

Let be any Markov chain of random walks on a graph and assume has a unique stationary distribution . Assume satisfies the edge separator theorem with respect to the edges-traversal probabilities in . Let be a Markov chain of random walks on with transition probabilities proportional to , e.g. . Then there is a layout of vertices of into blocks with vertices each such that a random walk in of length crosses memory block boundary in expectation times.

Note that this gives an efficient memory representation when and the probabilities on incident edges can be represented by (or computed from) words, which is the case for bounded degree graphs with some chains . We also note that such partially-implicit graph representations are present in the state graphs of some MCMC probabilistic graphical model inference algorithms.

Additionally, we present a result on the concentration of the number of I/O operations which applies to both Theorems 3.1 and 3.2.

Theorem 3.3

Let be a fixed graph, the mixing time of and the number of edges going between blocks crossed during the random walk. Then the probability that does not hold is for some value and , where the variable indicates if the walk crossed an edge between two different blocks in step .

The following lemma is implicit in [4], as the authors use the same layout to obtain a compact representation of separable graphs and rely on the following property.

Lemma 1 (Blandford et al. [4])

If in Theorem 3.2 gives the same traversal probability to all edges, the representation induces a vertex order such that .

3.1 Proofs of Theorems 3.1 – 3.3

Proof (Proof of Theorem 3.1)

Since the stationary distribution on an undirected graph assigns equal probability to every edge, we can apply Lemma 1 on to obtain vertex ordering such that . We could therefore compactly store the edges as variable-width vertex order differences (offsets). However, it is not straightforward to find the memory location of a given vertex when a variable-width encoding is used. To avoid an external (and I/O inefficient) index used in some other approaches, we replace the edge offset information with relative bit-offsets, directly pointing to the start of the target vertex, using Theorem 3.4 on the edge offsets. We expand the representation by inserting the bits of extra information to every vertex, adjusting the pointers and thus widening each by bits.

To prove the bound on I/O complexity, we use the same argument as in the proof of Theorem 3.2. Average of bits is used for representation of single vertex and, therefore, average of vertices fit into one cache line. By Theorem 4.2, part i, the total probability on edges going between memory blocks is . Again, by linearity of expected value, this proves the claimed I/O complexity.

A compact representation as in Theorem 3.4 can be computed within the claimed time bound, as shown in Theorem 3.5. ∎
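The first step of the proof of Theorem 3.1 uses the fact that, in the stationary distribution of a simple random walk on an undirected graph, every edge is traversed with the same probability. The following sketch checks this on a small example with exact arithmetic; the graph and the function names are illustrative assumptions, and the graph is assumed to be connected with no isolated vertices.

```python
from fractions import Fraction

def stationary_distribution(adjacency):
    """Stationary distribution of the simple random walk: proportional to vertex degree."""
    total_degree = sum(len(neighbors) for neighbors in adjacency.values())
    return {v: Fraction(len(neighbors), total_degree) for v, neighbors in adjacency.items()}

def step(distribution, adjacency):
    """Apply one step of the simple random walk to a distribution over vertices."""
    result = {v: Fraction(0) for v in adjacency}
    for u, neighbors in adjacency.items():
        for v in neighbors:
            result[v] += distribution[u] / len(neighbors)
    return result

# A small undirected graph given by symmetric adjacency lists.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
pi = stationary_distribution(adjacency)

# Probability that a stationary step goes from u to v, for every directed edge (u, v).
edge_probabilities = {(u, v): pi[u] / len(adjacency[u])
                      for u in adjacency for v in adjacency[u]}
print(pi, set(edge_probabilities.values()))
```

Since every directed edge carries the same probability, the chance that a stationary step crosses between two memory blocks equals the fraction of edges that cross between blocks, which is what the separator bounds control.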

Proof (Proof of Theorem 3.2)

We use the following recursive layout. Let be an edge separator with respect to edge-traversal probabilities in . Then partitions into two subgraphs and . We recursively lay out and and concatenate the layouts. Note that and are stored in memory contiguously. At some level of recursion, we get partition into subgraphs represented by between and words for constant. We call these subgraphs block regions. Since the average degree in graphs satisfying edge separator theorem is [28], the average vertex representation size is also and the average number of vertices in a block region is, therefore, . It follows from Theorem 4.2, part ii, that the total probability on edges going between block regions is . From linearity of expectation, -fraction of steps in the random walk cross between block regions in expectation. Moreover, each of the block regions in the partition is stored in memory blocks, which proves the claimed bound on I/O complexity. ∎
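The effect of the recursive layout can be seen on a toy instance. The sketch below is an illustration under several assumptions that are not in the original: the graph is a 16 x 16 grid, the separator is simply the middle line of the longer side, each vertex occupies one memory slot, and a block holds `B = 16` vertices. It counts the edges whose endpoints lie in different blocks, for a row-major layout and for the recursive layout.

```python
def recursive_grid_layout(rows, cols, region_cells):
    """Order grid cells by recursive bisection, stopping at regions of <= region_cells cells."""
    order = []

    def lay_out(r0, r1, c0, c1):
        height, width = r1 - r0, c1 - c0
        if height * width <= region_cells:
            # Small enough: store the whole region contiguously.
            order.extend((r, c) for r in range(r0, r1) for c in range(c0, c1))
        elif height >= width:
            middle = (r0 + r1) // 2       # cut the horizontal middle line
            lay_out(r0, middle, c0, c1)
            lay_out(middle, r1, c0, c1)
        else:
            middle = (c0 + c1) // 2       # cut the vertical middle line
            lay_out(r0, r1, c0, middle)
            lay_out(r0, r1, middle, c1)

    lay_out(0, rows, 0, cols)
    return order

def crossing_edges(order, rows, cols, block_size):
    """Count grid edges whose two endpoints are stored in different memory blocks."""
    block = {cell: index // block_size for index, cell in enumerate(order)}
    crossing = 0
    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols and block[(r, c)] != block[(nr, nc)]:
                    crossing += 1
    return crossing

rows = cols = 16
B = 16
row_major = [(r, c) for r in range(rows) for c in range(cols)]
recursive = recursive_grid_layout(rows, cols, B)
total_edges = rows * (cols - 1) + cols * (rows - 1)
print(crossing_edges(row_major, rows, cols, B), crossing_edges(recursive, rows, cols, B), total_edges)
```

In this toy setting the recursive layout groups vertices into 4 x 4 tiles, so far fewer edges leave a block than in the row-major layout, where every vertical edge does. For a stationary random walk the fraction of crossing edges is the per-step probability of touching a new block.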

Proof (Proof of Theorem 3.3)

Let be the number of edges crossed during the random walk that go between blocks. We are assuming that there is at least one edge going between two blocks in the graph.

We choose (arbitrary constant would work). Note that is a number of steps, after which the probabilities on edges differ from those in stationary distribution by at most , regardless from what distribution we started the random walk since [27]. This means that the probability that an edge going between two blocks is crossed after steps differs by at most -fraction from the probability in stationary distribution.


Let be an indicator random variable that is 1 iff the random walk crosses an edge going between blocks in step . We consider the following sets of random variables for (not conditioning on variables with nonpositive indices). Note that the random variables in each of sets are independent and , as mentioned above. Let be and . Note that for each . By applying the Chernoff inequality, we get that the following bounds hold for all for some for each :

The probability that there exists such that either or is by the union bound for some value of at most the following:

Note that converges to , which is the value that we are showing concentration of around. The asymptotic bound on the probability follows. ∎

3.2 Expanding relative offsets to relative bit-offsets

Having the edges of a graph encoded as relative offsets to the target vertex and having these numbers encoded by a variable-length encoding, we need a way to find the exact location of the encoded vertex. Others have used a global index for this purpose but this is generally not I/O-efficient.

Our approach encodes the relative offsets as slightly wider numbers that directly give the relative bit-index of the target. However, this is not straightforward as expanding just one relative offset to a relative bit-offset can make other bit-offsets (spanning over this value) larger and even requiring more space, potentially cascading the effect.

Note that one simple solution would be to widen every offset representation by bits where is the total number of bits required to encode all the offsets, yielding encoding. bits are sufficient to store each offset. Therefore, by expanding the offsets, they increase at most times. By adding bits, we can encode increase of offsets by factor of up to .

However, we propose more efficient encoding with the following theorem. We interpret the numbers as relative pointers, -th number pointing to the location of the -th value. In the proof, we use a dynamic width gamma number encoding in the form , where -th bit encodes whether is the last bit encoded.
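To make the idea of a self-delimiting variable-width code concrete, here is a minimal sketch of one such code in which every data bit is followed by a flag bit saying whether it was the last one, preceded by a sign bit. The exact form of the encoding used in the proof is not recoverable from the text above, so this particular bit layout and the function names `encode`, `decode` and `encoded_width` are assumptions made for illustration only.

```python
def encode(value):
    """Encode a signed integer as: sign bit, then (data bit, is-last flag) pairs."""
    bits = ["1" if value < 0 else "0"]
    digits = bin(abs(value))[2:]          # binary digits, most significant first
    for index, digit in enumerate(digits):
        bits.append(digit)
        bits.append("1" if index == len(digits) - 1 else "0")
    return "".join(bits)

def decode(bits, start=0):
    """Decode one value starting at position `start`; return (value, next position)."""
    negative = bits[start] == "1"
    position, magnitude = start + 1, 0
    while True:
        magnitude = 2 * magnitude + int(bits[position])
        is_last = bits[position + 1] == "1"
        position += 2
        if is_last:
            break
    return (-magnitude if negative else magnitude), position

def encoded_width(value):
    """Number of bits produced by `encode`: one sign bit plus two bits per binary digit."""
    return 1 + 2 * max(1, abs(value).bit_length())

# Two values written back to back can be read without any external index.
stream = encode(5) + encode(-3)
first, position = decode(stream)
second, end = decode(stream, position)
print(stream, first, second)
```

The width of such a code grows with twice the logarithm of the encoded number, so a relative bit-offset can be stored in a field whose size depends only on how far away the target is.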

Theorem 3.4

Let be a sequence of numbers such that and . Then there are -element sequences (the encoded bit-widths) and (the bit-offsets) of numbers such that for all , (i.e.  can be gamma-encoded in bits), where (so is a relative bit-offset of encoded position ) and .


There are certainly some non-optimal valid choices for ’s and ’s, and we can improve upon them iteratively by shrinking ’s to fit gamma-encoded with sign (i.e. ), which may, in turn, decrease some ’s. Being monotonic, this process certainly has a fixpoint and and we assume arbitrary such fixpoint.

Let and be constants to be fixed below. Denote and (resp. when ). Intuitively, when expanding offsets to bit offsets , it may happen that contains with , forcing . We amortize such cases by distributing "extra bits" to such "smaller" offsets.

Let and let (or undefined if there is no such ) and let . Observe that since all have . We also note that implies since would imply and leading to and , which gives the desired contradiction with large enough (depending only on ).

We will distribute the extra bits starting from the largest ’s. Every uses bits for its encoding and distributes another bits to . Let be the number of extra bits received from in this way.

For every offset we use bits and the received bits . Since the received bits are accounted for in other offsets, this uses bits in total. Therefore we only need to show that the number of bits thus available at is sufficient, i.e. that (one to represent , one to distribute to ).

Now either there is and we have so and noting that for large enough only depending on : , so we obtain as desired.

On the other hand, undefined implies that . Therefore and . Now we may fix , obtaining as required for . This finishes the proof for any fixpoint and . ∎

The algorithm from the beginning of the proof can be shown to run in polynomial time. We start with e.g.  and . Then we iteratively update and recompute as above. Since every iteration takes time and in every iteration at least one decreases, the total time is at most . In the following section, we show an algorithm that computes a representation with the same asymptotic bounds, running in time for any .

3.2.1 Constructing the compact representation

In this section, we use the notation defined in Section 3.2, specifically and . Recall that is the set of edges of spanned by the edge in the representation and is the relative offset of edge in the (expanded) representation. Let be the graph we want to represent. We assume that satisfies the edge separator theorem.

We find a representation using bits, as mentioned above by expanding all pointers and then modify it to make it compact.

We define a directed graph on the set with arc going from to iff . Let us fix a recursive separator hierarchy of . We call the level of recursion on which the edge is part of the separator. We define a graph to be the subgraph of induced by vertices corresponding to edges of which appear in the recursive separator hierarchy in a separator of subgraph of size at most .

The following lemma will be used to bound the running time of the algorithm:

Lemma 2

The maximum out-degree of is . For any fixed , where is some constant depending only on and .


We first prove that maximum out-degree of is .

There are edges with spanning any single vertex. The number of edges spanning some vertex with decreases exponentially with , resulting in a geometric sequence summing to .

The maximum out-degree of is the same as that of graph corresponding to a subgraph of of size at most . Maximum out-degree of is, therefore, .

The number of vertices in is equal to the number of edges in going between blocks of size . This number is, by Theorem 4.2, equal to , which is for some . ∎

Theorem 3.5

Given a separator hierarchy, the representation from Theorem 3.1 can be computed in time for any .


We first describe an algorithm running in time , where is the constant from the separator theorem, and then improve it.

Just as in the proof of Theorem 3.4, denotes the relative offset of edge in the representation. We store a counter for each vertex equal to the decrease of required to shrink its representation by at least one bit. That is, , where is rounded down to closest power of two. When we shrink the representation of edge corresponding to vertex , we have to update counters for all , such that . Since the out-degree of is , the updates take time. We start with representation with bits and at each step, we shorten the representation by at least one bit. This gives the running time of .

To get the running time of , we consider the graph for some sufficiently small epsilon. Note that the maximum out-degree of is . We can fix small enough to decrease the maximum out-degree to . Therefore, by using the same algorithm as above on graph for sufficiently small, we can get a running time of for any fixed . The representations of edges corresponding to vertices not in the graph are not shrunk.

Note that the presumptions of Theorem 3.4 are fulfilled by the edges corresponding to vertices in and the obtained representation of graph , is therefore compact. The edges not in are then added, increasing some offsets. The representation of an offset of length at least for is never increased asymptotically by inserting edges since it already has bits. There are at most edges of shorter than that span any single inserted edge. Lengthening of offsets shorter than , therefore, contributes at most for some sufficiently small. The inserted edges themselves have representations of total length . Additional bits are used after the insertion of edges and the representation, therefore, remains compact. ∎

4 Separator hierarchy

In this section, we prove two generalizations of the separator hierarchy theorem. Our proof is based on the proof from [23]. Most importantly, we show that the recursive separator theorem also holds if we want the regions to have small size on average and not in the worst case. We also prove the theorem for weighted separator theorem with weights on edges. We show that the natural generalization of our two generalizations does not hold by presenting a counterexample.

Since the two theorems are very similar and their proofs differ in only one step, we present them as one theorem with two variants and give a single proof covering both. The difference lies in the reason why Inequality 1 holds. The following lemma and observation prove the inequality under some assumptions and will be used in the proof of the theorem.

Observation 4.1

The Inequality 1 holds for .

Lemma 3

The Inequality 1 holds for and , and satisfying the following.


Let . We simplify the inequality

for and satisfying the equality (2). By substituting for and rearranging the inequality, we get

We substitute . Note that this holds for and that we may assume by symmetry. Since the inequality holds for , it is sufficient to show the inequality for with both sides differentiated with respect to . By differentiating both sides and simplifying the inequality, we get

which obviously holds, since and .

Now we proceed to prove the two generalizations of the recursive separator theorem. Note that in the following, is the average or maximum region size, depending on whether the graph is weighted or not.

Theorem 4.2

Let be a (possibly weighted) graph satisfying the separator theorem with respect to its weights and let be its recursive balanced separator partition. Then if either

  1. the graph is not weighted and is the average size of a region in the partition , or

  2. the graph is weighted and is the maximum size of a region in the partition .

Then the total weight of edges not contained inside a region of is , where is the total weight (resp. number if unweighted) of all edges of .

In this proof, let be the total weight of the edges in with denoting the weight of the single edge .


We use induction on the number of vertices to prove the following claim.


Let us have a recursive separator partition of -vertex graph of average region size . Then for some and .

Before the actual proof of this claim, let us define some notation. Let , and be the constants from the separator theorem (recall that separator theorem ensures existence of a partition of into two sets of size at least with edges of total weight at most going across). Let be the maximum value of over all -vertex graphs of total weight and all their recursive separator partitions with average region size . We use to denote a fraction of the number of vertices and to denote a fraction of the total weight.

Proof (Proof of the claim)

We defer the proof of the base case until we fix the constant .

By the separator theorem, satisfies the following recurrence.

where are the respective average region sizes in the two subgraphs. It, therefore, holds that .

From the inductive hypothesis, we get the first of the following inequalities. The second inequality follows from Observation 4.1 in case i and from Lemma 3 in case ii.


It holds that , where is a constant depending only on , since for . We can therefore set such that

This completes the induction step.

For large enough, the claimed bound in the base case is negative and it, therefore, holds. ∎

We conclude this section by showing that the following natural generalization of Theorem 4.2 does not hold:

Theorem 4.3

The following generalization does not hold: Let be a weighted graph satisfying the separator theorem with respect to its weights and let be its recursive separator partition. Let be the average size of a region in the partition . Then the total weight of edges not contained in a region of is , where is the total weight of all edges of .


We show that there is a weighted graph satisfying the -separator theorem with respect to its weight and a recursive partition of with edges going between partition regions of that have total weight , where is the total weight of all edges, and with average region size of .

Let be an unweighted graph of bounded degree satisfying the -separator theorem. We set weights of all its edges to be , except for one arbitrary edge with weight , where is the number of edges of . Note that . We denote this weighted graph by .

Let be a separator in from the separator theorem. We modify in order to obtain a balanced separator in of weight . If , we set . Otherwise, we remove from and add all other edges incident to its endpoints. This gives us which is a separator and its weight differs from the weight of only by an additive constant, since the graph has bounded degree. It follows that satisfies the -separator theorem with respect to its weights.

We consider a partition constructed by the following process. Let be a separator from the separator theorem on , partitioning into vertex sets and . If , we stop and set and as the regions of . Otherwise, without loss of generality, . We set as a region of and recursively partition .

At the end of this process, we get with edges of total weight at least between regions (as is not contained within any region). The partition has regions, so the average region size is . ∎

5 Representation for Paths in Trees

In this section, we show a linear-time algorithm that computes a cache-optimal layout of a given tree. We assume that the vertices have unit size and is the number of vertices that fit into a memory block. The same assumption has been used previously by Gil and Itai [16]. This is a reasonable assumption for trees of fixed degree and for trees in which each vertex only has a pointer to its parent. It does not matter in which direction the paths are traversed and we may, therefore, assume that the paths are root-to-leaf.

We also show that it is NP-hard to find an optimal compact layout of a tree and show an algorithm which gives a compact layout with I/O complexity at most .

Definition 1

Laid out tree: A laid out tree is an ordered triplet , where is a rooted tree and assigns to each vertex the memory block that it is in. We require that at most vertices are assigned to any block. We treat the block 0 specially as the block already in the cache.

We define to be the cost of path in a given layout . We define , the worst-case I/O complexity given free slots, as

where ranges over all root-to-leaf paths and over all layouts that assign at most vertices to block . Since block is assumed to be already in cache, accessing these vertices does not count towards the I/O complexity. We define , the worst-case I/O complexity of laid out tree , to be . This means is the maximum number of blocks on a root-to-leaf path. We define a worst-case optimal layout of a tree given free memory slots as a layout attaining .
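To make the definition concrete, the following minimal sketch evaluates the worst-case I/O complexity of a given layout. It assumes that the cost of a path is the number of distinct blocks other than block 0 that the path touches. The dictionary encoding of the tree and the names `children`, `block_of` and `block_size` are our own illustrative choices, not notation from the text.

```python
def worst_case_io(children, root, block_of, block_size):
    """Maximum number of distinct non-zero blocks on any root-to-leaf path.

    children:   dict mapping a vertex to the list of its children (assumed encoding)
    block_of:   dict mapping each vertex to its block index; block 0 counts as cached
    block_size: maximum number of vertices allowed in one block
    """
    # Check the capacity constraint: no block may hold more than block_size vertices.
    counts = {}
    for b in block_of.values():
        counts[b] = counts.get(b, 0) + 1
    if any(c > block_size for c in counts.values()):
        raise ValueError("some block holds more than block_size vertices")

    worst = 0
    # Iterative DFS; each stack entry carries the set of blocks seen on the path so far.
    stack = [(root, frozenset())]
    while stack:
        v, seen = stack.pop()
        b = block_of[v]
        if b != 0:  # block 0 is assumed to be in cache and is free
            seen = seen | {b}
        kids = children.get(v, [])
        if not kids:  # v is a leaf, so the path is complete
            worst = max(worst, len(seen))
        for c in kids:
            stack.append((c, seen))
    return worst
```

For a tree with edges 1→2, 1→3, 2→4 and a layout placing vertex 1 in block 0, vertices 2 and 3 in block 1 and vertex 4 in block 2, the path 1, 2, 4 touches two non-zero blocks, so the worst-case cost is 2.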

We can observe that . It follows from the lemmas below that only depends on the subtrees rooted in children of with the maximum value of .

Lemma 4

For any , and is non-increasing in .


The function is monotone in since a layout given free slots is a valid layout given slots for . Moreover , since we can map vertices in the root’s block to block instead. From this and the monotonicity, the lemma follows. ∎

We define the deficit of a tree . Note that . It follows from Lemma 4 that for all and for .

Lemma 5

For , there is a worst-case optimal layout attaining such that root is in block .


Let be a layout that does not assign block to the root. If no vertex is mapped to block , we can move the root to block . Since block does not count towards the I/O complexity, doing this can only improve the layout. Otherwise, let be a vertex that is mapped to block . We construct a layout such that , and for all other vertices . For any path , , since any path which contains in layout already contained it in and block does not count towards the I/O complexity. ∎

It is natural to consider layouts in which blocks form connected subgraphs. This motivates the following definition.

Definition 2

A partition of a rooted tree is convex if the intersection of any root-to-leaf path with any set of the partition is a (possibly empty) path.
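The convexity condition can be checked directly: a partition is convex exactly when no root-to-leaf path re-enters a set after leaving it. The sketch below illustrates this under an assumed encoding, where `children` maps a vertex to its children and `block_of` maps a vertex to the label of its partition set. Both names are hypothetical.

```python
def is_convex(children, root, block_of):
    """Return True iff every partition set meets every root-to-leaf path contiguously."""
    # Each stack entry: (vertex, block of its parent, blocks the path has already left).
    stack = [(root, None, frozenset())]
    while stack:
        v, parent_block, closed = stack.pop()
        b = block_of[v]
        if b != parent_block:
            if b in closed:
                return False  # the path re-enters a set it has already left
            if parent_block is not None:
                closed = closed | {parent_block}  # the parent's set is now left behind
        for c in children.get(v, []):
            stack.append((c, b, closed))
    return True
```

On the path 1, 2, 3, the assignment a, b, a is not convex, since set a meets the path in two separate pieces, whereas a, a, b is convex.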

Let be the set of successors of vertex with maximum value of .

Lemma 6

The function satisfies the following recursive formula for .

where the is over all sequences such that .


By Lemma 5, we may assume that an optimal layout attaining for puts the root in block and allocates the remaining slots of block to the root’s subtrees, slots to the subtree . On the other hand, from values of , we can construct a layout with cost . ∎
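The recursion can be illustrated by a small reference implementation. This is a sketch under our reading of the lemmas above: with at least one free slot the root takes one slot of the free block and the remaining slots are split among its children, while with no free slot a new block is opened at the cost of one I/O. It is a plain memoized dynamic program meant for checking small instances, not the linear-time algorithm of Theorem 5.1. The names `make_cost_function`, `children` and `block_size` are assumptions of this example.

```python
from functools import lru_cache

def make_cost_function(children, block_size):
    """Return w(v, k): minimum worst-case I/O for the subtree of v given k free slots."""

    @lru_cache(maxsize=None)
    def w(v, k):
        if k == 0:
            # No free slot: open a new block (one I/O) offering block_size slots.
            return 1 + w(v, block_size)
        # The root of the subtree uses one slot; k - 1 slots remain for the children.
        # best[j] = minimum over splits of j slots among the children processed so far
        #           of the maximum cost of those children.
        best = [0] * k
        for c in children.get(v, ()):
            best = [
                min(max(best[j - i], w(c, i)) for i in range(j + 1))
                for j in range(k)
            ]
        return best[k - 1]

    return w
```

For a path on five vertices with blocks of size two, `make_cost_function(children, 2)(1, 0)` evaluates to 3, matching the three blocks needed to store five vertices.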

Problem 1

Input: Rooted tree
Output: Worst-case optimal memory layout of .

Theorem 5.1

There is an algorithm which computes a worst-case optimal layout in time . Moreover, this algorithm always outputs a convex layout.


We solve the problem using a recursive algorithm. For each vertex, we compute and . First, we define and .

If , we let and . Otherwise and . As a base case, we use that when . For , we use that .

Using the values and calculated using the above recurrence, we reconstruct the worst-case optimal layout in a recursive manner. When laying out a subtree given free slots, we check whether . If it is, we distribute the empty slots (one is used for the root) in a way that subtrees for get at least empty slots. Otherwise, distribute them arbitrarily. We put the root of a subtree into a newly created block if the subtree gets free slots. Otherwise, we put the root into the same block as its parent. It follows from the way we construct the solution that it is convex.

It follows from Lemmas 4 and 6 that if and only if free slots can be allocated among the subtrees such that subtree gets at least of them. It can be easily proven by induction that the algorithm finds for each vertex the smallest number of free slots required to make the allocation possible and calculates the correct value of . ∎

If the subtree sizes are computed beforehand, we spend time in vertex . By charging this time to the children, we show that the algorithm runs in linear time.

This algorithm can easily be modified to give a compact layout that guarantees the I/O complexity of traversing a root-to-leaf path to be at most . This is especially relevant since finding a worst-case optimal compact layout is NP-hard, as we show in Section 5.1. To obtain a compact layout, we change the reconstruction phase so that we never give more than free slots to the subtree of rooted in unless . Note that only the last block on a path can have unused slots. We place the blocks that are not full consecutively in memory, ignoring the block boundaries. Any path goes through at most blocks, of which at most one is not aligned, which gives a total I/O complexity of .

The following has been proven before in [9] and follows directly from Theorem 5.1.

Corollary 1

For any tree , there is a convex partition of which is worst-case optimal.


The corollary follows from Theorem 5.1, since the algorithm given in the proof is correct and always gives a convex solution. ∎

Since the layout computed by the algorithm is always convex, we never re-enter a block after leaving it. This means that really is the worst-case I/O complexity.

Finally, we show how to construct a compact representation with similar properties. Note that we do not claim optimality among all compact representations but only relative to the tree layout optimality as in Theorem 5.1.

Theorem 5.2

For a given tree with bits of extra data per vertex, there is a compact memory representation of using bits of memory requiring at most I/O operations for leaf-to-root paths in general trees and root-to-leaf paths in bounded degree trees. Here is the I/O complexity of the optimal layout from Theorem 5.1 when we set the vertex size to be for leaf-to-root paths, or to for root-to-leaf paths.


The theorem is an indirect corollary of Theorems 5.1 and 3.4. We set the vertex size as indicated in the theorem statement (depending on the desired direction of paths) and obtain an assignment of vertices to blocks by Theorem 5.1. We call the set of the blocks . Note that for , this is already a compact representation.

For smaller , we construct an auxiliary tree on the blocks representing their adjacency in . We can assume that is a tree due to the convexity of the blocks of . We apply the separator decomposition to obtain an ordering of with a short offset edge representation (Lemma 1). Similarly, we obtain an ordering for each block in . We order the vertices of according to , ordering the vertices within blocks according to the orderings of the individual blocks. We obtain an ordering having an offset edge representation of total length , as there are edges going between blocks with offset edge representations of total length and edges within blocks with offset edge representations of total length .

We now apply Theorem 3.4 on the edge offsets still split in memory blocks according to , obtaining a bit-offset edge representation where the vertex representation of every block of still fits within one memory block, as we have previously reserved memory for every pointer and . We merge consecutive blocks whose vertices fit together into one block. This ensures that every block has at least vertices. ∎

5.1 Hardness of worst-case optimal compact layouts

In this section, we prove that it is NP-hard to find a worst-case optimal compact layout (that is, the packing with minimum I/O complexity out of all compact layouts). We show this by reduction from the 3-partition problem, which is strongly NP-hard [15] (i.e. it is NP-hard even if all input numbers are written in unary).

Problem 2 (3-partition)

Input: Natural numbers .
Output: Partition of into sets such that for each .
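For illustration, the following brute-force sketch solves small instances of the problem. It assumes the standard formulation, in which the input consists of 3m numbers that must be split into m triples with equal sums. The function name `three_partition` is our own, and the exhaustive search is only meant to make the problem statement concrete.

```python
def three_partition(numbers):
    """Return a list of index triples with equal sums, or None if no such split exists."""
    n = len(numbers)
    if n % 3 != 0:
        return None
    m = n // 3
    if m == 0:
        return []
    total = sum(numbers)
    if total % m != 0:
        return None
    target = total // m  # required sum of every triple
    used = [False] * n
    triples = []

    def solve():
        if all(used):
            return True
        # Always extend from the first unused index to avoid permuted duplicates.
        first = used.index(False)
        used[first] = True
        for j in range(first + 1, n):
            if used[j]:
                continue
            used[j] = True
            for k in range(j + 1, n):
                if not used[k] and numbers[first] + numbers[j] + numbers[k] == target:
                    used[k] = True
                    triples.append((first, j, k))
                    if solve():
                        return True
                    triples.pop()  # backtrack
                    used[k] = False
            used[j] = False
        used[first] = False
        return False

    return triples if solve() else None
```

For example, the input 1, 2, 3, 1, 2, 3 admits two triples of sum 6, while 1, 1, 1, 2, 2, 5 has no valid split, because no two of the remaining numbers sum to 1 alongside the 5.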

Theorem 5.3

It is NP-hard to find a worst-case optimal compact layout of a given tree .


We let . We construct the following tree. It consists of a path of length rooted in . For each number from the 3-partition instance, we create a path of length . We connect one of the end vertices of each of these paths to .

Next, we prove the following claim: there is a layout of I/O complexity 2 iff the instance of 3-partition is a yes instance. We can easily obtain such a layout from a valid partition by putting into one memory block exactly the paths corresponding to ’s that are in the same partition set. For the other implication, we first prove that is stored in one memory block. If it were not, we would visit at least two different memory blocks while traversing , and there would be a root-to-leaf path that visits three memory blocks. If is stored in one memory block, the I/O complexity of the tree is 2 iff the paths can be partitioned such that no part is stored in multiple memory blocks. There is such a partition iff the instance of 3-partition is a yes instance. ∎

6 Further research

Finally, we propose several open problems and future research directions.

Experimental comparison of traditional graph layouts with the layouts presented in our work and layouts proposed in prior work could both direct and motivate further research in this area.

While we optimize the separable graph layout for random walks, it is conceivable that a minor modification would also match the worst-case performance of the previous results.

The worst-case performance of the algorithm for finding the bit-offsets in Section 3.2 is most likely not optimal, and we suspect that the practical performance would be much better.

For the sake of simplicity, both our and prior representations of trees assume fixed vertex size (e.g. implicitly in the results on layouts) or allow extra bits per vertex in the compact separable graph representation. This could be generalized to vertices of different sizes and unbounded degrees.


References

  • Agarwal et al. [1998] Agarwal, P.K., Arge, L., Murali, T., Varadarajan, K.R., Vitter, J.S.: I/O-efficient algorithms for contour-line extraction and planar graph blocking. In: SODA. pp. 117–126 (1998)
  • Aggarwal et al. [1987] Aggarwal, A., Chandra, A.K., Snir, M.: Hierarchical memory with block transfer. In: Foundations of Computer Science. pp. 204–216. IEEE (1987)
  • Aggarwal and Vitter [1988] Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (Sep 1988)
  • Blandford et al. [2003] Blandford, D.K., Blelloch, G.E., Kash, I.A.: Compact representations of separable graphs. In: Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms. pp. 679–688. SIAM (2003)
  • Blelloch and Farzan [2010] Blelloch, G.E., Farzan, A.: Succinct representations of separable graphs. In: Annual Symposium on Combinatorial Pattern Matching. pp. 138–