Storing Set Families More Compactly with Top ZDDs

04/09/2020 ∙ by Kotaro Matsuda, et al. ∙ The University of Tokyo

Zero-suppressed Binary Decision Diagrams (ZDDs) are data structures for representing set families in a compressed form. With ZDDs, many valuable operations on set families can be done in time polynomial in ZDD size. In some cases, however, the size of ZDDs for representing large set families becomes too large to store them in main memory. This paper proposes the top ZDD, a novel representation of ZDDs which uses less space than existing ones. The top ZDD extends the top tree, which compresses trees, to compress directed acyclic graphs by sharing identical subgraphs. We prove that navigational operations on ZDDs can be done in time poly-logarithmic in ZDD size, and show that there exist set families for which the size of the top ZDD is exponentially smaller than that of the ZDD. We also show experimentally that our top ZDDs are smaller than ZDDs for real data.


1 Introduction

Zero-suppressed Binary Decision Diagrams (ZDDs) [Minato] are data structures which are derived from Binary Decision Diagrams (BDDs) [Bryant]

and which represent a family of sets (combinatorial sets) in a compressed form as Directed Acyclic Graphs (DAGs). ZDDs are data structures specialized for processing set families, and it is known that sparse set families can be compressed well. ZDDs support binary operations between two set families in time polynomial in the ZDD size. Because of these advantages, ZDDs are used for combinatorial optimization and enumeration problems.

Though ZDDs can store set families compactly, their size may grow for some set families, and we need further compression. DenseZDDs [denseZDD] are data structures for storing ZDDs in a compressed form while supporting operations on the compressed representation. A DenseZDD represents a ZDD by a spanning tree of the DAG representing it, and an array of pointers between nodes on the spanning tree. Its size is therefore always linear in the original size; to compress further, we need another representation.

Our basic idea for compression is as follows. In a ZDD, identical sub-structures are shared and replaced by pointers. However, identical sub-structures cannot be shared if they appear at different heights in the ZDD. As a result, even if the DAG of a ZDD contains structures that repeat in the height direction, they cannot be shared.

For trees, though not for DAGs, there exists a data structure called top DAG compression [Bille], which can capture repetitive structures in the height direction. We extend it to DAGs and apply it to compress ZDDs, while supporting operations directly on the compressed ZDDs.

1.1 Our contribution

We propose top ZDDs, which partition the edges of a ZDD into a spanning tree and remaining edges called complement edges, and store each of them in a compressed form. For the spanning tree, we use top DAG compression, which represents a tree by a DAG with fewer nodes. The complement edges are stored in nodes of the top DAG, with identical edges shared. We show that basic operations on ZDDs can be supported in time poly-logarithmic in the number of nodes of the ZDD. For further compression we use succinct data structures for trees [Navarro14] and for bitvectors [Raman07, Grossi05].

We show experimental results on the size of our top ZDDs and existing data structures, and the query time on them. The results show that top ZDDs use less space for most of the input data.

2 Preliminaries

Here we explain notations and basic data structures.

Let U = {1, 2, …, n} be the universal set. Any set in this paper is a subset of U. The empty set is denoted by ∅. For a set S, its size is denoted by |S|. The size of the empty set is 0. A subset F of the power set of U is called a set family. If a set family F satisfies either (S ∈ F and S′ ⊆ S imply S′ ∈ F) or (S ∈ F and S ⊆ S′ ⊆ U imply S′ ∈ F), F is said to be monotone. If the former is satisfied, F is monotone decreasing, and if the latter, monotone increasing.
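The monotone-decreasing property above can be checked by brute force on a small family. The following sketch assumes families are given as Python sets of frozensets; this toy representation is ours for illustration, not the paper's.

```python
from itertools import combinations

def is_monotone_decreasing(family):
    """True iff every subset of every member set also belongs to the family."""
    for s in family:
        for r in range(len(s)):                 # all proper subset sizes
            for sub in combinations(s, r):
                if frozenset(sub) not in family:
                    return False
    return True

# The power set of {1, 2} is monotone decreasing.
F = {frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})}
```

Dropping frozenset({2}) from F breaks the property, since {1, 2} then has a subset outside the family.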

2.1 Zero-suppressed Binary Decision Diagrams

Zero-suppressed Binary Decision Diagrams (ZDDs) [Minato] are data structures for manipulating finite set families. A ZDD is a directed acyclic graph (DAG) with a root node satisfying the following properties. A ZDD has two types of nodes: branching nodes and terminal nodes. There are two terminal nodes, ⊥ and ⊤, which have no outgoing edges. Each branching node v has an integer label ℓ(v) ∈ {1, …, n}, and two outgoing edges, the 0-edge and the 1-edge. The node pointed to by the 0-edge (1-edge) of v is denoted by v₀ (v₁). If for every branching node v it holds that ℓ(v) < ℓ(v₀) and ℓ(v) < ℓ(v₁), the ZDD is said to be ordered. In this paper, we consider only ordered ZDDs. For convenience, we assume ℓ(v) = n + 1 for terminal nodes v. We divide the nodes of the ZDD into layers according to the labels of the nodes; note that there are no edges from layer i to a layer j with j ≤ i. The number of nodes in a ZDD G is denoted by |G| and called the size of the ZDD. On the other hand, the data size of a ZDD stands for the number of bits used in the data structure representing the ZDD.

The set family represented by a ZDD is defined as follows. [The set family represented by a ZDD] Let v be a node of a ZDD, and v₀, v₁ the nodes pointed to by its 0-edge and 1-edge. Then the set family F(v) represented by v is defined as follows.

  1. If v is a terminal node: if v = ⊥, then F(v) = ∅; if v = ⊤, then F(v) = {∅}.

  2. If v is a branching node: F(v) = F(v₀) ∪ { S ∪ {ℓ(v)} : S ∈ F(v₁) }.

For the root node r of a ZDD G, F(r) corresponds to the set family represented by the ZDD G. This set family is also denoted by F(G).

All the paths from the root to the terminal ⊤ in a ZDD G are in one-to-one correspondence with the sets in the set family represented by G. Consider a traversal from the root towards the terminals determined by a set S: for each branching node v on the path, we go to v₁ from v if ℓ(v) ∈ S, and to v₀ from v if ℓ(v) ∉ S. By repeating this process, S ∈ F(G) if we arrive at ⊤, and S ∉ F(G) if we arrive at ⊥ or the branching node for some element of S does not exist.
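The traversal just described can be sketched with a toy node representation — tuples (label, zero_child, one_child) with the strings "bot" and "top" as terminals. This is an illustration of the semantics only, not the paper's data structure.

```python
def member(root, s):
    """Return True iff the set s belongs to the family rooted at `root`."""
    node = root
    remaining = sorted(s)
    while node not in ("bot", "top"):
        label, zero, one = node
        if remaining and remaining[0] == label:
            remaining.pop(0)
            node = one        # label is in s: follow the 1-edge
        elif remaining and remaining[0] < label:
            return False      # an element of s has no branching node
        else:
            node = zero       # label is not in s: follow the 0-edge
    return node == "top" and not remaining

# ZDD for the family {{1}, {1, 2}} over U = {1, 2}:
n1 = (1, "bot", (2, "top", "top"))
```

For example, the query for {1, 2} follows two 1-edges and reaches ⊤, while the query for {2} fails at the root because no branching node for element 2 is reachable without taking element 1 first.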

2.2 Succinct data structures

Succinct data structures are data structures whose size matches the information-theoretic lower bound. Formally, a data structure is succinct if any element of a finite set with cardinality L is encoded in log₂ L + o(log₂ L) bits. In this paper we use succinct data structures for bitvectors and trees.

2.2.1 Bitvectors

Bitvectors are the most basic succinct data structures. A length-u sequence B of 0's and 1's is called a bitvector. On this bitvector we consider the following operations:

  • access(B, i): returns B[i], the i-th entry of B.

  • rank_c(B, i): returns the number of occurrences of c ∈ {0, 1} in the first i bits of B.

  • select_c(B, i): returns the position of the i-th occurrence of c ∈ {0, 1} in B.

The following result is known. ([Raman07]) For a bitvector of length u, using a u + o(u)-bit data structure constructed in O(u) time, access, rank, and select are computed in constant time on the word-RAM with word length Ω(log u).

Consider a bitvector of length u with m ones. For a sparse bitvector, namely, one with m ≪ u, we can obtain a more space-efficient data structure. ([Grossi05]) For a bitvector of length u with m ones, select is computed in constant time on the word-RAM with word length Ω(log u) using a data structure of m log₂(u/m) + O(m) bits. Note that on this data structure, rank takes O(log(u/m)) time.
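For intuition, the semantics of rank and select can be pinned down by a naive linear-time sketch; real succinct bitvectors answer the same queries in constant time within the space bounds above.

```python
def rank1(B, i):
    """Number of 1s among the first i bits B[0..i-1]."""
    return sum(B[:i])

def select1(B, j):
    """0-based position of the j-th occurrence of 1 in B (j >= 1)."""
    seen = 0
    for pos, bit in enumerate(B):
        seen += bit
        if bit and seen == j:
            return pos
    raise ValueError("fewer than j ones in B")

B = [1, 0, 1, 1, 0, 0, 1]
```

Note the duality used throughout such structures: rank1(B, select1(B, j) + 1) = j whenever the j-th one exists.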

2.2.2 Trees

Consider a rooted ordered tree with n nodes. An information-theoretic lower bound for representing such trees is 2n − Θ(log n) bits. We want to support the following operations: (1) parent(v): returns the parent of node v, (2) firstchild(v)/lastchild(v): returns the first/last child of node v, (3) nextsibling(v)/prevsibling(v): returns the next/previous sibling of node v, (4) isleaf(v): returns whether node v is a leaf or not, (5) preorder(v): returns the preorder rank of node v, (6) preorderselect(i): returns the node with preorder i, (7) leafrank(v): returns the number of leaves whose preorders are smaller than that of node v, (8) leafselect(i): returns the i-th leaf in preorder, (9) depth(v): returns the depth of node v, that is, the distance from the root to v, (10) subtreesize(v): returns the number of nodes in the subtree rooted at node v, (11) lca(u, v): returns the lowest common ancestor (LCA) of nodes u and v.

([Navarro14]) On the word-RAM with word length Ω(log n), the above operations are done in constant time using a 2n + o(n)-bit data structure. We call this the BP representation in this paper.

2.3 DenseZDD

A DenseZDD [denseZDD] is a static representation of a ZDD with attributed edges [Minato90] using succinct data structures. Compared with an ordinary ZDD, a DenseZDD provides a much faster membership operation and uses less memory in most cases. When we construct a DenseZDD from a given ZDD, dummy nodes are inserted so that ℓ(v₀) = ℓ(v) + 1 holds for each internal node v, for fast traversal. The spanning tree consisting of all reversed 0-edges is represented by a straightforward BP representation. The DenseZDD is a combination of this BP representation and other succinct data structures that represent the remaining information of the given ZDD.

3 Top Tree and Top DAG

We explain top DAG compression [Bille] to compress labeled rooted trees.

Top DAG compression is a compression scheme for labeled rooted trees: the input tree is converted into a top tree [toptree], which is then compressed by DAG compression [Buneman03, Downey80, Frick03]. DAG compression is a scheme to represent a labeled rooted tree by a smaller DAG obtained by merging identical subtrees of the tree. Top DAG compression can compress repeated sub-structures, not only subtrees. For example, a path of length n with identical labels can be represented by a top DAG with O(log n) nodes. Also, for a tree with n nodes, accessing a node label, computing the subtree size, and tree navigational operations such as first child and parent are done in O(log n) time. Here we explain the top tree and its greedy construction algorithm. We also explain operations on top DAGs.

The top tree [toptree] for a labeled rooted tree T is a binary tree representing the merging process of clusters of T, defined as follows. We assume that all edges in the tree are directed from the root towards the leaves, and an edge (u, v) denotes the edge from node u to node v. Clusters are subsets of the nodes of T with the following properties.

  • A cluster C is a subset of the nodes of the original tree T such that the nodes in C are connected in T.

  • C forms a tree, and we regard the node in C closest to the root of T as the root of this tree. We call the root of C the top boundary node.

  • C contains at most one node having directed edges to nodes outside of C. If there is such a node, it is called the bottom boundary node.

A boundary node is either a top boundary node or a bottom boundary node.

By merging two adjacent clusters, we obtain a new cluster, where merging means taking the union of the node sets of the two clusters as the node set of the new cluster. There are five types of merges, as shown in Figure 1. In the figure, ellipses are clusters before the merge, black circles are boundary nodes of new clusters, and white circles are non-boundary nodes in new clusters.

Figure 1: Merging clusters.

These five merges are divided into two.

  1. (a)(b) Vertical merge: two clusters can be merged vertically if the top boundary node of one cluster coincides with the bottom boundary node of the other cluster, and there are no edges from the common boundary node to nodes outside the two clusters.

  2. (c)(d)(e) Horizontal merge: two clusters can be merged horizontally if the top boundary nodes of the two clusters are the same and at least one cluster does not have a bottom boundary node.

The top tree of the tree T is a binary tree satisfying the following conditions.

  • Each leaf of the top tree corresponds to a cluster consisting of the endpoints of an edge of T.

  • Each internal vertex of the top tree corresponds to the cluster made by merging the clusters of its two children. This merge is one of the five types in Figure 1.

  • The cluster of the root of the top tree is T itself.

We call the DAG obtained by DAG compression of the top tree the top DAG, and the operation to compute the top DAG from the tree is called top DAG compression [Bille].

We define labels of vertices in the top tree to apply DAG compression as follows. For a leaf of the top tree, we define its label as the pair of labels of both endpoints of the corresponding edge in T. For an internal vertex of the top tree, its label must carry the information about the cluster merge. It is enough to distinguish three types of merges, not five as in Figure 1. For vertical merges, it is not necessary to store whether the merged cluster has a bottom boundary node. For horizontal merges, it is enough to store whether the left cluster has a bottom boundary node. From this observation, we define labels of internal vertices as follows.

  • For vertices corresponding to a vertical merge: we set their labels to V.

  • For vertices corresponding to a horizontal merge: we set their labels to H_L if the left child cluster has the bottom boundary node, or H_R if the right child cluster has the bottom boundary node. If neither child has a bottom boundary node, the label can be either.

Top trees created by a greedy algorithm satisfy the following. ([Bille]) Let n be the number of nodes of a tree T. Then the height of the top tree created by the greedy algorithm is O(log n).

Consider supporting operations on a tree T which is represented by a top DAG. From now on, node i in T stands for the node with preorder i in T. By storing additional information in each vertex of the top DAG, many tree operations can be supported [Bille]. For example, access(i) returns the label of node i and decompress(i) returns the subtree of T rooted at node i. For a tree with n nodes, all operations except decompress are done in O(log n) time, and decompress is done in time linear in the size of the extracted subtree. Algorithm 1 shows a pseudo code.

4 top ZDD

We explain our top ZDD, which is a representation of a ZDD by top DAG compression. Though it is easy to apply our compression scheme to general rooted DAGs, we consider only the compression of ZDDs.

A ZDD is a directed acyclic graph in which nodes have labels (the terminal nodes have ⊥ and ⊤) and edges have labels 0 or 1. We can regard it as a graph in which only edges are labeled. For each edge (u, v) of a ZDD, we define its label as a pair (edge label 0/1, ℓ(v) − ℓ(u)) if v is a branching node, or a pair (edge label 0/1, ⊥/⊤) if v is a terminal node. In practice, we can use integers instead of ⊥ and ⊤ for the second element. Below we assume ZDDs have labels only on edges, and the 0-edge comes before the 1-edge for each node.

Next we consider top trees for edge-labeled trees. The only difference from node-labeled trees is how to store the information for single-edge clusters. In top trees, we stored the labels of both endpoints of an edge; we change this to storing only the edge label.

The top ZDD is constructed from a ZDD as follows.

  1. We perform a depth-first traversal from the root of the ZDD G and obtain a spanning tree T of all branching nodes. During the traversal, we do not distinguish 0-edges and 1-edges, and terminal nodes are not included in the tree. Nodes of the tree are identified with their preorders in T; if we say node i, it means the node in T with preorder i. We call the edges of G not included in T complement edges.

  2. We convert the spanning tree to a top tree by the greedy algorithm.

  3. For each complement edge (u, v), we store its information in a vertex of the top tree as follows. If v is a terminal, let w be the vertex of the top tree corresponding to the cluster of the single edge between u and its parent in T. Note that w is uniquely determined. Then we store a triple (u, edge label 0/1, ⊥/⊤) in w. If v is a branching node, we store the information of the complement edge in a vertex of the top tree corresponding to a cluster containing both u and v. The information to store is a triple (u, edge label 0/1, v). We decide the vertex to store it in as follows. Let w_u and w_v be the vertices of the top tree corresponding to the clusters of the single edges towards u and v in T, respectively. Then we store the triple in the lowest common ancestor w = lca(w_u, w_v) in the top tree. Here the components u and v of the triple represent local preorders inside the cluster corresponding to w. Note that w may not be the minimal cluster including both u and v.

  4. We create a top DAG by DAG compression, sharing identical clusters. To determine whether two clusters are identical, we compare them together with the information on complement edges stored in them in step 3. Complement edges which do not appear in multiple clusters are moved to the root of the top DAG.
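Step 1 above can be sketched as a depth-first traversal that assigns preorder numbers to branching nodes and separates spanning-tree edges from complement edges. The node representation (tuples with string terminals, distinguished by object identity) is a toy assumption of ours, not the paper's structure.

```python
def split_edges(root):
    """DFS from the root: preorder-number branching nodes, and classify each
    edge as a spanning-tree edge or a complement edge. Nodes are tuples
    (label, zero_child, one_child); terminals are "bot"/"top" and are
    excluded from the tree. Shared nodes must be the same Python object."""
    preorder = {}                       # id(node) -> preorder number
    tree_edges, comp_edges = [], []

    def dfs(node):
        preorder[id(node)] = len(preorder)
        for tag, child in ((0, node[1]), (1, node[2])):
            if child in ("bot", "top"):
                comp_edges.append((id(node), tag, child))   # edge to terminal
            elif id(child) not in preorder:
                tree_edges.append((id(node), tag, id(child)))
                dfs(child)              # first visit: spanning-tree edge
            else:
                comp_edges.append((id(node), tag, id(child)))  # back/cross edge

    dfs(root)
    return preorder, tree_edges, comp_edges
```

On the two-node power-set ZDD of {1, 2} (both edges of the root pointing to the same child), the DFS keeps one tree edge and records the other three edges as complement edges.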

Figure 2 shows an example of a top ZDD. The left is the original ZDD and the right is the corresponding top ZDD. Red and green edges show edges in the spanning tree and complement edges, respectively. In this figure we show, for each vertex of the top DAG, the corresponding cluster, but the clusters are not stored explicitly.

Figure 2: An example of a top ZDD. Terminal nodes and branching nodes are depicted by squares and circles, respectively, and 0-edges and 1-edges are depicted by dotted and solid lines, respectively. Red edges are spanning tree edges and green edges are complement edges. For each vertex of the top DAG, the corresponding cluster and the information stored in the vertex are shown.

To achieve small space, it is important to choose appropriate data structures for representing each piece of information. For example, although we said that each vertex of the top DAG stores the cluster size and similar values, storing them explicitly is redundant and the space can be reduced. Next we explain in detail our space-efficient data structure, which suffices to support efficient queries.

4.1 Details of the data structure

We need the following information to recover the original ZDD from a top ZDD.

  • Information about the structure of the top DAG.

  • Information about each vertex of the top DAG. There are three types of vertices: vertices corresponding to a leaf of the top tree, vertices representing a vertical merge, and vertices representing a horizontal merge. For each type we store different information.

  • Information about complement edges. They are stored in the root or other vertices of the top DAG.

We show space-efficient data structures for storing this information. In theory, we use the succinct bitvector with constant-time rank/select support [Raman07]. In practice, we use the SparseArray [OkanoharaS07] to compress a bitvector if the ratio of ones in the bitvector is below a threshold, and use the SparseArray for the bitvector with 0/1 flipped if the ratio of zeros is below the threshold. To store an array of non-negative integers, we use ⌈log₂(M + 1)⌉ bits for each entry, where M is the maximum value in the array. Let m denote the number of internal nodes of a ZDD. We use m and m + 1 to represent the terminals ⊥ and ⊤, respectively.
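The fixed-width array encoding can be illustrated by packing entries into a single integer at ⌈log₂(M + 1)⌉ bits each. This is a sketch of the encoding idea only; the names and layout are ours, not the paper's implementation.

```python
def pack(values):
    """Pack non-empty list of non-negative ints at ceil(log2(M+1)) bits each.
    For M >= 1, ceil(log2(M+1)) equals M.bit_length()."""
    width = max(1, max(values).bit_length())
    word = 0
    for v in values:
        word = (word << width) | v      # append each entry in `width` bits
    return word, width

def unpack(word, width, count):
    """Recover the `count` packed values."""
    mask = (1 << width) - 1
    return [(word >> (width * (count - 1 - i))) & mask for i in range(count)]
```

For instance, [5, 0, 3] has maximum 5, so each entry takes 3 bits and the whole array takes 9 bits instead of three machine words.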

4.1.1 The data structure for the structure of top DAG

We store the top DAG after converting it to a tree. We make a tree from the top DAG by adding dummy vertices. For each vertex v of the top DAG whose in-degree is two or more, we do the following.

  1. Let u₁, …, u_k be the vertices of the top DAG from which there are edges towards v. Note that there may exist identical vertices among them, corresponding to different edges. We create dummy vertices d₁, …, d_k.

  2. For each i, we remove the edge (u_i, v) and add the edge (u_i, d_i).

  3. For each dummy vertex d_i, we store a pointer to v. In our implementation, we store the preorder of v in the tree from which the dummy vertices are removed.

Then we can represent the structure of the top DAG by the tree and the pointers from the dummy vertices.

Next we explain how to store the tree and the information about the dummy vertices. The structure of the tree is represented by its BP sequence [Navarro14]. There are two types of leaves in the tree: those which exist in the original top DAG, and those for the dummy vertices. To distinguish them, we use a bitvector. Let L be the number of leaves in the tree. We create a bitvector of length L whose i-th bit corresponds to the i-th leaf in preorder: we set the bit to 1 if the i-th leaf is a dummy vertex, and to 0 otherwise.

We add additional information to dummy vertices to support efficient queries. We define an array D of length d, where d is the number of dummy vertices. For the i-th dummy vertex in preorder, let v_i be the vertex pointed to by that dummy vertex. We define D[i] as the sum of the cluster sizes of v₁, …, v_i; that is, D[i] stores the cumulative sum of cluster sizes up to the i-th dummy vertex. This array is used to compute the cluster size for each vertex efficiently.

4.1.2 Information on vertices

We explain how to store information on the vertices of the tree, except for dummy vertices.

Each vertex corresponding to a leaf in the original top tree is a cluster of a single edge in the spanning tree, and it is a non-dummy leaf in the tree. We sort these vertices in preorder and store information on their edges in the following two arrays. One array stores the differences of levels between the endpoints of the edges: let u_i and v_i be the starting and ending points of the edge corresponding to the i-th leaf; then the i-th entry is ℓ(v_i) − ℓ(u_i). The other array stores whether each edge is a 0-edge or a 1-edge: the i-th entry is 0 if the edge corresponding to the i-th leaf is a 0-edge, and 1 otherwise.

Each vertex corresponding to a vertical merge or a horizontal merge is an internal vertex. We sort the internal vertices in preorder. Then we make a bitvector whose i-th bit is 1 if the i-th vertex stands for a vertical merge, and 0 if it stands for a horizontal merge. For vertices corresponding to horizontal merges, we store no additional information. For vertices corresponding to vertical merges, we use two arrays to store the differences of preorders and levels between the top and the bottom boundary nodes of the merged cluster. Let w_i be the i-th vertex in preorder corresponding to a vertical merge, C_i the cluster corresponding to w_i, t_i the top boundary node of C_i, and b_i the bottom boundary node of C_i. Note that t_i and b_i are nodes of the ZDD. Then the i-th entries of the two arrays are the difference of the preorders of b_i and t_i, and ℓ(b_i) − ℓ(t_i), respectively.

4.1.3 Information on complement edges

Complement edges are divided into two groups: those stored in the root of the top DAG and those stored in other vertices. We represent them in different ways.

First we explain the data structure for storing complement edges in the root of the top DAG. Let E_r be the set of all complement edges stored in the root. We sort the edges of E_r in preorder of their starting points. The order among edges with the same starting point is arbitrary.

For the complement edges stored in the root, we store the preorders of their starting points using a bitvector, the preorders of their ending points using an array, and the edge labels 0/1 using another array. The cluster corresponding to the root of the top DAG is the spanning tree of the ZDD. For each node of the spanning tree, we represent the number of complement edges in E_r whose starting point is that node using a unary code, and we concatenate these codes in preorder into the bitvector. For the edges of E_r sorted in preorder of their starting points, the i-th entry of the first array is the preorder of the ending point of the i-th edge, and the i-th entry of the second array is 0 if the i-th edge is a 0-edge, and 1 otherwise.

Next we explain the data structure for storing complement edges in vertices other than the root. Let E_c be the set of those edges. We sort the edges as follows.

  1. We divide the edges of E_c into groups based on the clusters containing the edges. These groups are sorted in preorder of the vertices for the clusters.

  2. Inside each cluster, we sort the edges in preorder of the starting points of the edges. For edges with the same starting point, the order is arbitrary.

We store the sorted edges of E_c using a bitvector and three arrays. The bitvector stores the numbers of complement edges in the vertices of the top DAG by unary codes. The three arrays store, for the i-th edge, the local preorder of its starting point inside the cluster, the local preorder of its ending point inside the cluster, and a bit that is 0 if the i-th edge is a 0-edge and 1 otherwise.
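The unary-coded counts and the per-cluster interval lookup they support can be sketched as follows; the linear scan stands in for the constant-time select queries on the real bitvector.

```python
def encode_counts(counts):
    """Unary-encode a list of counts: c ones followed by a 0 terminator each."""
    bits = []
    for c in counts:
        bits.extend([1] * c)
        bits.append(0)
    return bits

def interval(bits, i):
    """Half-open interval [lo, hi) of edge-array indices for the i-th
    group (0-based) in a unary-coded bitvector."""
    zeros, ones, lo = 0, 0, 0
    for b in bits:
        if b == 0:
            if zeros == i:
                return lo, ones     # hi = ones seen before the i-th 0
            zeros += 1
            lo = ones               # next group starts here
        else:
            ones += 1
    raise IndexError("group index out of range")
```

For counts [2, 0, 1], the code is 110010, and the groups occupy edge-array intervals [0, 2), [2, 2), and [2, 3).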

Table 9 summarizes the components of the top ZDD.

4.2 The size of top ZDDs

The size of top ZDDs depends heavily not only on the number of vertices remaining after top DAG compression of the spanning tree, but also on the number of complement edges for which we store information. Therefore the size of top ZDDs becomes small if the number of vertices is reduced by top DAG compression and many common complement edges are shared.

In the best case, top ZDDs are exponentially smaller than ZDDs: there exists a ZDD with n nodes for which the corresponding top ZDD has O(log n) vertices.

Proof.

A ZDD storing the power set of a set with n elements satisfies the claim. Figure 3 shows this ZDD and its top ZDD. A spanning tree of the ZDD is a path of 0-edges. Its top tree has leaves corresponding to single 0-edges, and its internal vertices form a complete binary tree with height O(log n). If we apply DAG compression to this top tree, we obtain a DAG with O(log n) vertices, as shown in Figure 3. Sharing complement edges also works very well: each vertex representing a vertical merge stores a 1-edge connecting a node to the node with the corresponding local preorder inside its cluster, and these stored triples coincide between shared clusters. The root of the top DAG stores the 0-edge and the 1-edge to the terminal ⊤. Because the height of the top DAG is O(log n), the claim holds. ∎

Figure 3: A top ZDD with O(log n) vertices for the power set of an n-element set.
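The ZDD in the proof above can be built explicitly: for the power set, the 0-edge and the 1-edge of each branching node lead to the same next node, so the structure degenerates to a chain of n branching nodes. The tuple representation below is a toy illustration, not the paper's data structure.

```python
def power_set_zdd(n):
    """Chain ZDD for the power set of {1..n}: node i's 0- and 1-edges both
    lead to the node for i+1 (or to the top terminal for i = n)."""
    node = "top"
    for label in range(n, 0, -1):
        node = (label, node, node)      # both edges share the same child
    return node

def count_nodes(root):
    """Number of branching nodes, following the 0-edge chain."""
    count = 0
    while root != "top":
        count += 1
        root = root[1]
    return count
```

The chain has exactly n branching nodes, while by the proof above its top ZDD needs only O(log n) vertices.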

4.3 Operations on top ZDDs

We give algorithms for supporting operations on the original ZDD using the top ZDD. We consider the following three basic operations. We identify a node i of the ZDD with the vertex of the spanning tree used to create the top ZDD whose preorder is i.

  • label(i): returns the label of branching node i.

  • child0(i): returns the preorder of the node pointed to by the 0-edge of node i, or returns ⊥ or ⊤ if that node is a terminal.

  • child1(i): returns the preorder of the node pointed to by the 1-edge of node i, or returns ⊥ or ⊤ if that node is a terminal.

We show that label is computed in O(log n) time and the other operations are computed in O(log² n) time, where n is the number of nodes of the ZDD. Below we denote the vertex of the tree representing the top DAG with preorder p by "vertex p".

First we explain how to compute label(i) in O(log n) time. We can compute it recursively using an algorithm similar to those on the top DAG. A difference is that each vertex of the top DAG was assumed to store its cluster size, while in the top ZDD it is not stored, to reduce the space requirement. Therefore we have to compute cluster sizes using the information in Table 9.

To make the recursive computation work, we need to compute the size of the cluster represented by a vertex v of the tree efficiently. We can compute it from the number of non-dummy leaves in the subtree rooted at v, and the sizes of the clusters corresponding to the dummy leaves in that subtree. If we merge two clusters of sizes s₁ and s₂, the resulting cluster has size s₁ + s₂ − 1, because the merged clusters share one boundary node. Therefore, if we merge clusters whose total size is s using k merges, the resulting cluster has size s − k. These values can be computed from the BP sequence of the tree, the array D, and the bitvector marking dummy leaves. Using the BP sequence, we can compute the interval of leaf ranks in the subtree rooted at v. Then, using the bitvector, we can find the number of non-dummy leaves and the interval of non-dummy leaf ranks in the subtree of v. Because D stores cumulative sums of cluster sizes for dummy leaves, the sum of the sizes of the clusters corresponding to any consecutive range of dummy leaves is obtained as a difference of two entries of D. Because the size of a cluster for a non-dummy leaf is always 2, the sum of cluster sizes for non-dummy leaves is also obtained. Algorithm 2 gives a pseudo code for computing the cluster size. This can be done in constant time.

Using this cluster-size function, we can run a recursive computation similar to Algorithm 1, using the computed cluster sizes in place of the stored ones. When we arrive at a dummy leaf, we use its stored pointer to move to the corresponding internal vertex of the tree and restart the recursive computation. Then, for the node of the original ZDD whose preorder in the spanning tree is i, we can obtain the leaf of the tree corresponding to the cluster of a single edge containing node i.

To compute label(i), we traverse the path from the root of the tree to the leaf corresponding to the cluster containing node i, accumulating level differences starting from zero. During the traversal, if the current vertex is a vertical merge and the next vertex is its right child, that is, the next cluster is the bottom one, we add the level difference stored for the top cluster to the accumulated value; the index into the level-difference array is computed by rank on the vertex-type bitvector. When we reach the leaf, if node i is its top boundary node, the accumulated value is the label; otherwise, we also add the level difference stored for the leaf edge. Because each step is done in constant time and the height of the top DAG is O(log n), label(i) is computed in O(log n) time.

Next we show how to compute child0(i); child1(i) can be computed in a similar way. We do a recursive computation as in the operations on the top DAG. A difference is how to process complement edges. There are two cases: the 0-edge from node i is in the spanning tree, or it is not. If the 0-edge from node i is in the spanning tree, the edge is stored in a cluster of a single edge whose top boundary node is node i. Therefore we search for clusters whose top boundary node is node i. If the 0-edge from node i is not in the spanning tree, it is a complement edge, and it is stored in some vertex on the path from a cluster of a single edge whose bottom boundary node is node i to the root. Therefore we search that path for the edge.

First we recursively find a non-dummy leaf of the tree whose top boundary node is node i. During this process, if there is a vertex whose top boundary node is node i, whose cluster contains more than one edge, and which corresponds to a horizontal merge, we move to its left child, because the 0-edge from node i must exist in the left cluster. If we find a non-dummy leaf which corresponds to a cluster of a single 0-edge whose top boundary node is node i, its bottom boundary node is the answer. We climb up the tree to the root to compute the global preorder of that node. If there does not exist such a leaf, the 0-edge from node i is not in the spanning tree. We then find a cluster of a single edge whose bottom boundary node is node i. From the definition of the top ZDD, the 0-edge from node i is stored in some vertex visited during the traversal. Because the complement edges stored in a cluster are sorted by the local preorders of their starting points inside the cluster, we can check whether there exists a 0-edge whose starting point is node i by binary search in O(log n) time. If it exists, we obtain the local preorder of its ending point inside the cluster. By going back to the root, we obtain the global preorder. Note that the complement edges of all clusters are stored in one array, so we need to obtain the interval of array indices corresponding to a cluster; this can be done using the unary-coded bitvector. In the worst case, we perform a binary search in each cluster on the search path. Therefore the time complexity of child0 is O(log² n).

5 Experimental Comparison

We compare our top ZDD with existing data structures. We implemented the top ZDD in C++ and measured the space required for storing the data structure. For comparison, we used the following three data structures.

  • top ZDD (proposed): we measured the space for storing the data structures in Table 9.

  • DenseZDD [denseZDD]: a data structure for representing ZDDs using succinct data structures. Two variants are proposed; one supports constant-time queries and the other has higher time complexity but uses less space. We used the latter.

  • a standard ZDD: a data structure which naively represents ZDDs. We store for each node its label and two pointers corresponding to the 0-edge and the 1-edge. The space is m(2⌈log₂ m⌉ + ⌈log₂ n⌉) bits, where m is the number of nodes of the ZDD and n is the size of the universe of the set family.

We constructed ZDDs of the following set families.

  • The power set of a set with n elements.

  • For the set with n elements, the family of all the sets satisfying a given condition.

  • For the set with n elements, the family of all the sets with cardinality at most k.

  • Knapsack set families with random weights. That is, for the i-th element of a set with n elements (1 ≤ i ≤ n), we define its weight as a uniformly random integer in a fixed range, sort the elements in decreasing order of weights, and construct the set family consisting of all sets whose total weight is at most the capacity.

  • The family of edge sets which are matchings of a given graph. As graphs, we used a grid graph, a complete graph, and a real communication network "Interoute".

  • Set families of frequent item sets.

  • Families of edge sets which are paths from the bottom left vertex to the top right vertex in an n × n grid graph, for several values of n.

  • Families of solutions of the n-queen problem, for several values of n.

We used several values for the parameters . The results are shown in Tables 1 to 8. The unit of size is bytes.

top ZDD DenseZDD ZDD
2,297 4,185 3,750
2,507 178,764 300,000
Table 1: The power set of a set with n elements.
top ZDD DenseZDD ZDD
2,471 227,798 321,594
2,551 321,594 1,440,375
Table 2: For the set with n elements, the family of all the sets satisfying a given condition.
top ZDD DenseZDD ZDD
3,863 9,544 9,882
13,654 146,550 206,025
43,191 966,519 1,440,375
Table 3: For the set with n elements, the family of all the sets with cardinality at most k.
top ZDD DenseZDD ZDD
1,659,722 1,730,401 2,444,405
1,032,636 1,516,840 2,181,688
2,080,965 2,929,191 4,491,025
1,135,653 1,740,841 2,884,279
1,383,119 2,618,970 3,990,350
565,740 656,728 1,056,907
Table 4: Knapsack set families with random weights. n is the number of elements, w is the maximum weight of an element, and c is the capacity of the knapsack.
  top ZDD DenseZDD ZDD
grid 12,246 16,150 18,014
complete graph 23,078 16,304 25,340
Interoute 30,844 39,831 50,144
Table 5: The family of edge sets which are matchings of a given graph.
  top ZDD DenseZDD ZDD
mushroom 104,774 91,757 123,576
retail 59,894 65,219 62,766
T40I10D100K 177,517 188,400 248,656
Table 6: Set families of frequent item sets.
top ZDD DenseZDD ZDD
17,194 28,593 37,441
49,770 107,529 143,037
157,103 401,251 569,908
503,265 1,465,984 2,141,955
Table 7: Families of paths in the n×n grid graph.
top ZDD DenseZDD ZDD
40,792 35,101 45,950
183,443 167,259 229,165
866,749 799,524 1,126,295
Table 8: Families of solutions of the n-queens problem.

We found that for all data sets, the top ZDD uses less space than the naive representation of the standard ZDD. We also confirmed that the data sets in Tables 1, 2, and 3 can be compressed very well by top ZDDs. Table 4 shows the results for the sets of solutions of knapsack problems. In every case the top ZDD uses less space than the DenseZDD, and in some cases the memory usage of the top ZDD is almost half that of the DenseZDD. Tables 5 and 6 show the results for families of matchings in a graph and for frequent item sets, respectively. There are a few cases in which the DenseZDD uses less space than the top ZDD.

The results above are for monotone set families, that is, families in which any subset of a member set is also a member. Tables 7 and 8 show results on non-monotone set families. For the sets of edges on paths from the bottom-left corner to the top-right corner of an n×n grid graph, the top ZDD uses less space than the DenseZDD, and for the largest n it uses about a third of the memory of the DenseZDD. On the other hand, for the sets of all the solutions of the n-queens problem, the top ZDD uses about 10% more space than the DenseZDD. From these experiments we confirmed that the top ZDD uses less space than the DenseZDD for many set families.

Next we show the construction time and edge-traversal time of the top ZDD and the DenseZDD in Tables 10 to 17. For edge-traversal time, we traversed from the root of a ZDD towards the terminals, choosing the 0-edge or 1-edge at random, 65,536 times, and took the average. When we arrived at a terminal, we restarted from the root.

The results show that the DenseZDD is faster than the top ZDD for both construction and traversal, except for the construction time on the data set retail. The traversal algorithm of the top ZDD is recursive, and in the worst case its recursion depth is polylogarithmic in the ZDD size, whereas that of the DenseZDD is not recursive.
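The traversal benchmark described above can be sketched as follows, with an illustrative node encoding of our own (terminals as Python booleans):

```python
# Sketch of the edge-traversal benchmark: repeatedly follow a randomly
# chosen 0-edge or 1-edge from the current node, restarting from the root
# whenever a terminal is reached. The node encoding is illustrative only.
import random

class Node:
    def __init__(self, label, zero, one):
        self.label, self.zero, self.one = label, zero, one

def random_traverse(root, steps, seed=0):
    """Follow `steps` random edges; return how many times a terminal was hit."""
    rng = random.Random(seed)
    node, terminal_hits = root, 0
    for _ in range(steps):
        node = node.one if rng.random() < 0.5 else node.zero
        if isinstance(node, bool):  # reached terminal 0 or 1
            terminal_hits += 1
            node = root             # restart from the root
    return terminal_hits

# Two-level chain: every walk reaches a terminal after exactly two steps.
leaf = Node(2, False, True)
root = Node(1, leaf, leaf)
hits = random_traverse(root, 65_536)
```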

6 Concluding Remarks

We have proposed the top ZDD, which compresses a ZDD by regarding it as a DAG. We compress a spanning tree of the ZDD by top DAG compression, and compress the remaining edges by sharing them as much as possible. We showed that the size of a top ZDD can be logarithmic in that of the standard ZDD. We also showed that navigational operations on a top ZDD run in time polylogarithmic in the size of the original ZDD. Experimental results show that the top ZDD always uses less space than the standard ZDD, and uses less space than the DenseZDD for most of the data.
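The first step of this scheme, extracting a spanning tree from the ZDD's DAG so that the remaining edges become complement edges, can be sketched as follows (the DFS policy and all naming are ours):

```python
# Sketch of spanning-tree extraction by depth-first search: the first edge
# that reaches a node becomes a tree edge; every later edge into an
# already-visited node becomes a "complement" edge stored separately.

def dfs_spanning_tree(root, children):
    """Return (tree_edges, complement_edges) for the DAG under `root`."""
    seen = {root}
    tree, comp = [], []
    stack = [root]
    while stack:
        u = stack.pop()
        for v in children(u):
            if v in seen:
                comp.append((u, v))   # already reached: complement edge
            else:
                seen.add(v)
                tree.append((u, v))   # first visit: tree edge
                stack.append(v)
    return tree, comp

# Diamond DAG: r -> a, r -> b, a -> t, b -> t.
edges = {"r": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}
tree, comp = dfs_spanning_tree("r", edges.__getitem__)
```

As the future-work discussion notes, different edge-selection policies yield different spanning trees, and hence top ZDDs of different sizes.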

Future work will be as follows. First, in the current construction algorithm, we create a spanning tree of the ZDD by a depth-first search, but this may not produce the smallest top ZDD; for example, a spanning tree built by always choosing the same type of edge, rooted at a terminal, might be better. Next, in this paper we considered only traversal operations and did not give advanced operations such as choosing the best solution among all feasible solutions based on an objective function. Lastly, we considered only compressing ZDDs, but our compression algorithm can be used for compressing any DAG. We will look for further applications of our compression scheme.

References

Appendix A Pseudocode

1:Input: preorder i
2:Output: the label of the node with preorder i
3:u ← the root of the top DAG
4:return sub(u, i)
5:procedure sub(u, i)
6:     if vertex u corresponds to a cluster with a single edge then
7:         if i = 1 then
8:              return (the label of the starting point of the edge)
9:         else
10:              return (the label of the ending point of the edge)
11:     else
12:         u_L ← the left child of u
13:         u_R ← the right child of u
14:         s_L ← Size(u_L)
15:         s_R ← Size(u_R)
16:         if vertex u is a horizontal merge then
17:              if i ≤ s_L then
18:                  return sub(u_L, i)
19:              else
20:                  return sub(u_R, i − s_L + 1)
21:         else
22:              p ← the local preorder of the bottom boundary node of u_L
23:              if i < p then
24:                  return sub(u_L, i)
25:              else if i < p + s_R then
26:                  return sub(u_R, i − p + 1)
27:              else
28:                  return sub(u_L, i − s_R + 1)
Algorithm 1 Label(i): computes the label of the node whose preorder in the tree represented by the top DAG is i.
1:Input: preorder i
2:Output: the size of the cluster for the vertex with preorder i
3:
4:
5:
6:
7:
8:
9:if  then
10:     return
11:else
12:     return
Algorithm 2 Size(i): computes the size of the cluster corresponding to the vertex with preorder i
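Because top-DAG vertices are shared, cluster sizes can be computed for all vertices with memoized recursion; the following sketch (our own naming) illustrates this idea, counting the edges of each cluster, since edge counts add exactly under both kinds of merge:

```python
# Memoized size computation over a DAG of merge vertices: a leaf cluster
# contributes one edge, and a merge vertex's size is the sum of the sizes
# of its two children. The vertex representation is illustrative only.
from functools import lru_cache

# DAG as a dict: leaves map to None, internal vertices to (left, right).
dag = {
    "a": None, "b": None,
    "m1": ("a", "b"),
    "m2": ("m1", "m1"),   # shared subcluster: "m1" is referenced twice
}

@lru_cache(maxsize=None)
def size(v):
    children = dag[v]
    if children is None:
        return 1  # a single-edge cluster
    left, right = children
    return size(left) + size(right)
```

Memoization makes the cost proportional to the number of DAG vertices, not the (possibly exponentially larger) number of tree nodes they represent.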

Appendix B Components of Top ZDD

BP sequence representing the structure of the tree represented by the top DAG
bitvector showing whether the i-th leaf is a dummy vertex
array storing the cumulative sum of the cluster sizes of the first to the i-th dummy leaves
array storing the differences of the labels of the ending points of the i-th non-dummy leaf
array showing whether the edge corresponding to the i-th non-dummy leaf is a 0-edge or a 1-edge
bitvector showing whether the i-th internal vertex is a vertical merge
array storing the differences of the preorders between the top and the bottom boundary nodes of the vertex corresponding to the i-th vertical merge
array storing the differences of the labels between the top and the bottom boundary nodes of the vertex corresponding to the i-th vertical merge
bitvector storing in unary code the number of complement edges from each vertex
array storing the preorders of the ending points of the i-th complement edge stored in the root
array showing whether the i-th complement edge stored in the root is a 0-edge or a 1-edge
bitvector storing in unary code the number of complement edges from each vertex stored in a non-root cluster
array storing the local preorders of the starting points of the i-th complement edge stored in a non-root cluster
array storing the local preorders of the ending points of the i-th complement edge stored in a non-root cluster
array showing whether the i-th complement edge stored in a non-root cluster is a 0-edge or a 1-edge
Table 9: Components of the top ZDD
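The unary-coded bitvectors above store, for each vertex, a run of 1s of length equal to its number of complement edges, terminated by a 0. A minimal sketch of this encoding (naming is ours):

```python
# Unary coding of a sequence of counts: each count c becomes c ones
# followed by a terminating zero, so per-vertex counts can be recovered
# by scanning runs (or, with rank/select support, in constant time).

def encode_unary(counts):
    """Encode a list of non-negative counts as a unary bit string."""
    return "".join("1" * c + "0" for c in counts)

def decode_unary(bits):
    """Recover the list of counts from a unary bit string."""
    counts, run = [], 0
    for b in bits:
        if b == "1":
            run += 1
        else:
            counts.append(run)
            run = 0
    return counts

bv = encode_unary([2, 0, 3, 1])
```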

Appendix C Construction and Traversal Time

Construction Time (s) Traverse Time (µs)
  top ZDD DenseZDD   top ZDD DenseZDD
0.006 0.004 13.546 0.458
0.217 0.116 11.768 0.198
Table 10: The power set of an n-element set.
Construction Time (s) Traverse Time (µs)
  top ZDD DenseZDD   top ZDD DenseZDD
0.264 0.078 9.082 0.244
0.776 0.412 10.419 0.229
Table 11: For the set with n elements, the family of all the sets satisfying a fixed condition.
Construction Time (s) Traverse Time (µs)
  top ZDD DenseZDD   top ZDD DenseZDD
0.011 0.006 10.892 0.320
0.269 0.123 16.019 0.534
2.013 0.878 20.101 0.412
Table 12: For the set with n elements, the family of all the sets with cardinality at most k.
Construction Time (s) Traverse Time (µs)
  top ZDD DenseZDD   top ZDD DenseZDD
2.974 1.210 16.716 0.259
2.033 1.019 23.215 0.290
7.010 1.481 21.698 0.534
2.084 0.954 7.365 0.519
2.597 1.712 14.127 0.244
7.010 1.481 21.698 0.534
Table 13: Knapsack set families with random weights. n is the number of elements, w is the maximum weight of an element, c is the capacity of the knapsack.
Construction Time (s) Traverse Time (µs)
  top ZDD DenseZDD   top ZDD DenseZDD
grid 0.030 0.020 11.678 1.053
complete graph 0.019 0.009 14.864 0.290
Interoute 0.028 0.016 15.588 0.397
Table 14: The family of edge sets which are matchings of a given graph.
Construction Time (s) Traverse Time (µs)
  top ZDD DenseZDD   top ZDD DenseZDD
mushroom 0.093 0.037 14.100 0.198
retail 0.099 0.134 12.857 0.702
T40I10D100K 0.198 0.117 13.788 0.183
Table 15: Set families of frequent item sets.
Construction Time (s) Traverse Time (µs)
  top ZDD DenseZDD   top ZDD DenseZDD
0.022 0.011 15.491 0.793
0.082 0.036 12.039 1.022
0.536 0.153 12.229 1.144
1.821 0.944 14.233 1.404
Table 16: Families of paths in the n×n grid graph.
Construction Time (s) Traverse Time (µs)
  top ZDD DenseZDD   top ZDD DenseZDD
0.038 0.015 17.184 0.778
0.335 0.065 21.581 0.900
1.722 0.419 20.173 1.099
Table 17: Families of solutions of the n-queens problem.