1 Introduction
Boolean functions are functions that take Boolean variables as arguments and return Boolean values. They are widely accepted as the modeling formalism for the design, verification and synthesis of digital computers [Crama and Hammer2011]. Many other real-world problems, for example in cryptography and social choice theory, can also be formulated using Boolean functions.
Ordered binary decision diagrams (OBDDs) [Bryant1992] are a standard data structure for representing and manipulating Boolean formulas. They are compact to store and efficient to operate. More importantly, they provide a canonical representation of Boolean functions. Given any two logically equivalent Boolean functions, their OBDDs are isomorphic.
A central problem in the application of OBDDs is to find a proper decision order of the Boolean variables. Depending on the decision order, the OBDD’s size may vary from linear to exponential in the number of variables [Bryant1986]. Naturally, we prefer the decision order that minimizes the OBDD’s size.
However, the problem of finding the optimal order of an OBDD is NP-complete [Bollig and Wegener1996]. Many heuristics have been proposed to find a near-optimal solution to this problem, but the existing techniques do not achieve a good balance between efficiency and effectiveness: methods that significantly reduce the OBDD’s size tend to take a long time, while methods that are fast tend not to achieve significant reductions. Although the problem is NP-complete, the Boolean formulas generated from real-world sources (e.g. circuits and programs) do exhibit patterns. If we can exploit these patterns, it is possible to develop a technique that is both efficient and effective.
Neural networks (NNs) have been applied to many areas, including computer vision, natural language processing and recommendation systems. Surprisingly, no prior work applies deep learning to the OBDD reordering problem. One possible reason is that most NN frameworks are not suitable for learning on Boolean formulas. For example, the Recurrent Neural Network (RNN) [Medsker and Jain1999] family can handle sequence learning problems. To apply RNNs, we must serialize Boolean formulas (usually in 3-conjunctive normal form (3CNF)) into sequences. However, Boolean formulas have rich invariances that such a sequential model would ignore, such as the permutation invariance of clauses [Selsam et al.2018]. For example, two CNFs that differ only in the order of their clauses are syntactically different but semantically equivalent. A sequential model ignores this permutation invariance and treats them as different inputs.
CNFs can be viewed as hypergraphs [Kolany1993]. If we can apply deep learning directly on hypergraphs, the semantics of the Boolean formula is preserved. The message passing neural network (MPNN) framework [Gilmer et al.2017] is a powerful deep learning technique for graphs. However, it cannot be applied directly to hypergraphs, since its message function is defined on ordinary edges. In this paper, we lift the MPNN framework to 3-hypergraphs by defining a message function on hyperedges.
We implement the framework on top of the gated graph neural network (GGNN). On ordinary graphs, GGNN models edges by square matrices and uses matrix multiplication to generate messages. We use non-square matrices to model hyperedges, so that messages can be generated and passed along hyperedges.
Compared with the existing techniques, our approach can give a near-optimal solution in an extremely short time, even for some hard examples that are unsolvable with the existing techniques. The main technical contributions of this paper are summarized as follows:

We view OBDD reordering as deep learning on 3-hypergraphs. To the best of our knowledge, this is the first neural-network-based approach to OBDD reordering.

Following the main idea of message passing, we lift the MPNN to 3-hypergraphs.

Experimental results show that our approach can find a near-optimal order in an extremely short time.
1.1 Related Work
Many OBDD reordering algorithms have been proposed in the literature. [Fujita, Matsunaga, and Kakuda1991] and [Ishiura, Sawada, and Yajima1991] propose the window permutation algorithm, which exchanges a variable with its neighbor in the ordering. [Rudell1993] proposes the sifting algorithm, which finds the best position of a variable by repeatedly moving it forward or backward. [Günther and Drechsler1998] apply linear transformations to minimize the OBDD’s size. [Bollig, Löbbing, and Wegener1995] apply simulated annealing to find a near-optimal order; instead of a swap or exchange operation, this method defines a jump operation. [Drechsler, Becker, and Göckel1996] use a genetic algorithm to optimize the OBDD’s size. Among all these heuristics, the genetic algorithm and the simulated annealing algorithm attain the best results but are also the most time-consuming.
[Grumberg, Livne, and Markovitch2003] use decision trees to learn which variable pair permutations are more likely to lead to a good order. In contrast, our NN approach directly produces a total order of all variables rather than pairwise preferences.
There is some work on applying neural networks to OBDD-related topics. [Beg, Prasad, and Beg2008] apply feedforward and recurrent neural networks to predict the OBDD’s size for a given Boolean function. [Bartlett and Andrews2002] study the problem of converting fault trees to OBDDs. They propose a neural network approach for selecting one among several existing heuristics to construct the OBDD. Their approach is essentially a heuristic selection mechanism and depends heavily on the available heuristics. In contrast, our approach directly produces an OBDD variable order. [Selsam et al.2018] use message passing neural networks to learn to solve SAT problems. They convert CNFs into graphs in which both literals and clauses are nodes. In our method, clauses are treated as hyperedges.
2 Preliminaries
2.1 Boolean Functions
Let $\mathbb{B} = \{0, 1\}$ be the Boolean domain. Let $X = \{x_1, \dots, x_n\}$ be a set of Boolean variables over $\mathbb{B}$. A truth assignment decides a truth value (either 0 or 1) for each variable in $X$. A Boolean function $f$ over $X$ is a function that takes $x_1, \dots, x_n$ as its arguments and returns either 0 or 1. The size $|X|$ is called the arity of $f$. The formula $f(x_1, \dots, x_n)$ is also called a Boolean formula. A truth assignment satisfies $f$ iff $f$ returns 1 when taking this truth assignment as its arguments.
Let $x$ be a Boolean variable. A literal of $x$ is either its positive form (i.e. $x$) or its negative form (i.e. $\neg x$). A clause is a disjunction of several literals. A conjunctive normal form (CNF) is a conjunction of several clauses. For example, $(x_1 \lor \neg x_2) \land (x_2 \lor x_3)$ is a CNF. For simplicity, we often ignore the operators in a formula.
A 3CNF is a CNF in which every clause has three or fewer literals. Any Boolean formula can be transformed into an equisatisfiable 3CNF formula [Tseitin1983]. In the remainder of this paper, we assume all Boolean formulas are in 3CNF.
2.2 Binary Decision Diagrams
A binary decision diagram (BDD) is a rooted, directed acyclic graph with a node set $V$ and an edge set $E$. $V$ contains two types of nodes: terminal nodes and nonterminal nodes. A terminal node has no outgoing edge and is labelled with either 0 or 1. A nonterminal node $v$ is labelled with a variable (called the decision variable at this node) and has two successors, a low successor and a high successor, which are taken when the decision variable is decided to be 0 and 1, respectively. Figure 1 shows two BDDs, where the edges to the low and high successors are drawn as dotted and solid lines, respectively.
Let $G$ be a BDD of $f$. Let $\sigma$ be a truth assignment to $X$. We can easily decide whether $\sigma$ satisfies $f$ by traversing $G$ from its root to one of its terminal nodes. Let $v$ be the current node. If the variable labelling $v$ is assigned 0 in $\sigma$, the next node on the path is the low successor of $v$; otherwise, if the variable is assigned 1, the next node is the high successor of $v$. The value labelling the terminal node that is finally reached gives the value of the function. Taking the left BDD in Figure 1 as an example, with a truth assignment of and , the value of $f$ can be quickly decided to be 0 by traversing the graph.
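The traversal described above can be sketched in a few lines of Python. This is an illustration only: the `Node` class, the `evaluate` function and the example BDD for the conjunction of two variables are assumptions made here, not part of any BDD package.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Node:
    """A BDD node (hypothetical layout): terminals set `value`,
    nonterminals set `var`, `low` and `high`."""
    var: Optional[str] = None
    low: Optional["Node"] = None
    high: Optional["Node"] = None
    value: Optional[int] = None


ZERO = Node(value=0)
ONE = Node(value=1)


def evaluate(root: Node, assignment: Dict[str, int]) -> int:
    """Follow low/high successors from the root until a terminal is reached."""
    node = root
    while node.value is None:
        # Assigned 1 -> high successor, assigned 0 -> low successor.
        node = node.high if assignment[node.var] == 1 else node.low
    return node.value


# Illustrative BDD of f = x1 AND x2 under the order x1 < x2.
x2_node = Node(var="x2", low=ZERO, high=ONE)
root = Node(var="x1", low=ZERO, high=x2_node)
```

The number of steps is bounded by the number of variables, since each variable is tested at most once on any path.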
Let $\pi$ be a total order on $X$. An ordered BDD (OBDD) with respect to $\pi$ is a BDD such that the decision variables on every path of this OBDD follow $\pi$. A reduce algorithm [Bryant1986] can be applied repeatedly to eliminate redundancies in an OBDD. The resulting structure is called a reduced OBDD. Note that both BDDs in Figure 1 are reduced OBDDs.
The reduced OBDD is a canonical representation of Boolean functions. Given any two logically equivalent Boolean functions, their reduced OBDDs with respect to the same variable order are isomorphic [Bryant1986]. In this paper, we assume all OBDDs are reduced OBDDs.
2.3 Variable Reordering Problem
Given an OBDD , we denote the size of by , i.e., the number of nodes in . The size of an OBDD is highly sensitive to its variable order. Consider a Boolean function . If we choose a variable order of , the OBDD’s size is (the left OBDD in Figure 1). In contrast, if we choose another variable order of , the corresponding OBDD’s size is 16 (the right OBDD in Figure 1).
In general, for a Boolean function of the form , with the variable order , its OBDD’s size is ; while with the variable order (assume ), its OBDD’s size becomes . In other words, with respect to different variable orders, the OBDD’s size of a Boolean function may vary from linear to exponential in the number of variables [Bryant1986].
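The sensitivity to the variable order can be checked by brute force on a small function. The sketch below uses the classic function $x_1 x_2 + x_3 x_4 + x_5 x_6$ discussed in [Bryant1986]; the choice of this function, the helper `robdd_size` and the two orders are assumptions made for illustration. The helper counts reduced-OBDD nodes through the standard characterization that each level has one node per distinct subfunction that depends on that level's variable.

```python
from itertools import product


def robdd_size(f, order):
    """Count nodes of the reduced OBDD of f under `order`, terminals included.

    Brute force over truth tables: exponential in len(order), so only
    suitable for tiny illustrative functions.
    """
    n = len(order)
    nonterminals = 0
    for level in range(n):
        earlier, rest = order[:level], order[level:]
        distinct = set()
        for prefix in product((0, 1), repeat=level):
            fixed = dict(zip(earlier, prefix))
            # Truth table of the subfunction over the remaining variables.
            table = tuple(
                f({**fixed, **dict(zip(rest, tail))})
                for tail in product((0, 1), repeat=len(rest))
            )
            half = len(table) // 2  # first half: rest[0] = 0, second half: rest[0] = 1
            if table[:half] != table[half:]:  # depends on this level's variable
                distinct.add(table)
        nonterminals += len(distinct)
    return nonterminals + 2  # two terminals, assuming f is not constant


def f(a):
    return (a[1] & a[2]) | (a[3] & a[4]) | (a[5] & a[6])


good = robdd_size(f, [1, 2, 3, 4, 5, 6])  # -> 8
bad = robdd_size(f, [1, 3, 5, 2, 4, 6])   # -> 16
```

With the paired variables adjacent, the count grows linearly; with all first factors placed before all second factors, the OBDD has to remember every first factor and the count grows exponentially.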
The OBDD reordering problem is to find an optimal variable order for a given Boolean function, such that its OBDD’s size is minimal. This problem has been proved NP-complete [Bollig and Wegener1996]. Many heuristics [Fujita, Matsunaga, and Kakuda1991, Ishiura, Sawada, and Yajima1991, Rudell1993, Drechsler, Becker, and Göckel1996, Bollig, Löbbing, and Wegener1995] have been proposed to find a near-optimal solution to this problem. Among all these heuristics, the genetic algorithm and the simulated annealing algorithm often attain the best results [Drechsler, Becker, and Göckel1996, Bollig, Löbbing, and Wegener1995]. However, both algorithms are quite time-consuming. We seek a reordering algorithm that not only finds a near-optimal solution but is also time-efficient.
2.4 Graph and Hypergraph
A graph $G = (V, E)$ is defined by a set of vertices $V$ (also called nodes) and a set of edges $E \subseteq V \times V$ which defines the relation between nodes. A hypergraph is a generalization of a graph in which an edge can connect more than two vertices, and thus $E \subseteq \mathcal{P}(V)$, where $\mathcal{P}$ means power set. A $k$-uniform hypergraph is a hypergraph in which every hyperedge has exactly $k$ nodes. We use $k$-hypergraph to denote the set of all $k$-uniform hypergraphs. In this paper, we consider only 3-hyperedges, i.e. $k = 3$.
Let $v$ be a node in a graph. The neighbors of $v$ are the nodes that point to (or pass messages to) $v$, formally: $N(v) = \{u \mid (u, v) \in E\}$.
In the next subsection, we will discuss how a neighbor passes messages to $v$. First, however, we need to lift the definition of neighbors to hypergraphs. We split the notion of neighbors into left and right neighbors, which means that each node can receive messages from both sides in a hyperedge. Formally:
A machine learning task on the graph domain can be either graph-level or node-level. In a graph-level task, a graph is mapped to a vector of reals. In a node-level task, the output also depends on a node of the graph. For example, computing the size of a graph is a graph-level task, while computing the degree of a vertex is a node-level task.
2.5 The Message Passing Framework
The Message Passing Neural Network (MPNN) [Gilmer et al.2017] is a general framework for supervised learning on graphs. It was originally a graph-level prediction framework for chemical compounds; we slightly modify it for node-level prediction. The main idea of message passing is to embed each node into a vector space and then iteratively refine the embedding. In each iteration, every node receives messages from its neighbors and updates its embedding accordingly. In this paper, we also call the embedding of a node its state.
Let $h_v^t$ be the embedding of node $v$ at time $t$, $e_{wv}$ be the embedding of edge $(w, v)$, and $x_v$ be a handcrafted feature of $v$. The initial state $h_v^0$ is initialized by the zero-padding of $x_v$. Formally, message passing is defined by a message function $M$ and a vertex update function $U$:
$$m_v^{t+1} = \underset{w \in N(v)}{\mathrm{AGG}}\; M(h_v^t, h_w^t, e_{wv}), \qquad h_v^{t+1} = U(h_v^t, m_v^{t+1}),$$
where $m_v^{t+1}$ is the message sent to $v$. The aggregation $\mathrm{AGG}$ can be the mean or the sum, giving different message aggregation strategies: with the mean, all incoming messages of $v$ are averaged; with the sum, they are summed up. After message passing finishes at the final time step $T$, we read out the prediction $\hat{y}_v$ of each node $v$ from its final refined embedding and its handcrafted feature with a readout function $R$:
$$\hat{y}_v = R(h_v^T, x_v).$$
We collect the predictions of all nodes as the output of the neural network.
Notice that the message function, the update function and the readout function are all left undefined so far. The MPNN is a framework: each design of these functions defines a concrete neural network. For example, Gated Graph Neural Networks (GGNN) [Li et al.2016] and Deep Tensor Neural Networks (DTNN) [Schütt et al.2017] are both instances of MPNN that define these functions differently. In fact, the MPNN originally came from the abstraction of at least eight notable NNs that operate on graphs. Our work of lifting message passing concerns the message function. For implementation, we use GGNN as the instance of MPNN in this paper.
2.6 Gated Graph Neural Network
GGNN assumes that the labels on edges are finite and discrete, and calls these labels types. Take molecules as an example. If we view atoms as vertices, the type of an edge can be the kind of chemical bond (e.g. a single bond). However, the distance between atoms cannot be an edge type, since distance is not discrete.
In GGNN, the embedding of a node lies in the vector space $\mathbb{R}^d$, and the embedding of an edge lies in the matrix space $\mathbb{R}^{d \times d}$. We use a matrix to model each type of edge; the entries of this matrix are parameters learned during the training of the neural network. Let $c$ be the type of edge $(w, v)$; the embedding of the edge is determined by its type, i.e. it is the matrix $A_c$.
The message function is designed as a matrix multiplication: the message sent from a neighbor $w$ to $v$ is $A_c h_w^t$, where $h_w^t$ is the state of $w$ at time step $t$. The update function is $h_v^{t+1} = \mathrm{GRU}(h_v^t, m_v^{t+1})$, where $m_v^{t+1}$ is the aggregated incoming message of $v$ and GRU is the Gated Recurrent Unit introduced in [Cho et al.2014]. The same update function is used at each time step $t$. Finally, the readout is $\tau([h_v^T; x_v])$, where $h_v^T$ is the final state of $v$, $x_v$ is its handcrafted feature, $\tau$ is a fully connected neural network, and $[\cdot\,;\cdot]$ means concatenation of vectors.
Several GGNNs can be composed successively as layers [Li et al.2016], in such a way that the output (i.e. final state) of the current message passing process is used as the initial state of the next message passing process. In each layer (i.e. message passing process), message passing is repeated several times with the same NN parameters, but different layers have different parameters. We denote by $h_v^{(l,t)}$ the state of node $v$ at time step $t$ in layer $l$, and by $T_l$ the number of time steps in layer $l$. The layered GGNN can be formalized as $h_v^{(l+1,0)} = h_v^{(l,T_l)}$.
The idea of residual connections (i.e. skipping over layers) [He et al.2016] can also be introduced into the connection of GGNN layers. The incoming message of each node can be concatenated with the final states of several previous layers before it is fed into the GRU. For example, the message of each node in one GGNN layer can be concatenated with the final states of two earlier layers.
Residual connections are used to mitigate the problem of vanishing gradients in backpropagation.
3 OBDD Reordering as DL on Hypergraph
Neural networks (NNs) have proven to be a powerful machine learning technique for nonlinear data-fitting problems [Hornik, Stinchcombe, and White1989]. In this section, we show how the OBDD reordering problem can be reduced to a deep learning problem on 3-uniform hypergraphs. We use an NN to learn the patterns of “good” OBDD variable orders from real-world examples. After the training phase, the NN can predict a good variable order for a given 3CNF formula in a short time.
3.1 Inputs
The input of the neural network is a 3-hypergraph. The labels on hyperedges are finite and discrete; we call them types, just as we did for ordinary graphs. Each 3CNF is converted to a 3-hypergraph. Let $X$ be the variable set of a given CNF. The vertex set of the converted hypergraph is , where the is a special node that represents . Each clause in the 3CNF is converted to a hyperedge directly from its variables. The type of each hyperedge is decided by the type of each literal (i.e. and ). Especially, the type of literal is . For example: the clause is converted to the hyperedge with the type , and the clause is converted to the hyperedge with the type . For simplicity, we use to represent the hyperedge .
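The conversion can be sketched as follows. Several details here are assumptions made for illustration rather than the exact encoding used in this paper: clauses are given as tuples of signed integers (DIMACS style), the special node is given the id 0 and is used to pad clauses with fewer than three literals, and a hyperedge type is the tuple of its literal types (`"+"`, `"-"` or `"pad"`).

```python
PAD = 0  # assumed id of the special node (here: padding for short clauses)


def cnf_to_hypergraph(clauses):
    """Turn a 3CNF into (vertices, typed hyperedges).

    Each hyperedge is a pair (nodes, types): three node ids and the
    polarity of the literal at each position.
    """
    vertices = {PAD}
    hyperedges = []
    for clause in clauses:
        if not 1 <= len(clause) <= 3:
            raise ValueError("expected a 3CNF clause with 1 to 3 literals")
        nodes = [abs(lit) for lit in clause]
        types = ["+" if lit > 0 else "-" for lit in clause]
        while len(nodes) < 3:  # pad short clauses with the special node
            nodes.append(PAD)
            types.append("pad")
        vertices.update(nodes)
        hyperedges.append((tuple(nodes), tuple(types)))
    return vertices, hyperedges


# (x1 or not x2 or x3) and (not x1 or x2)
vertices, hyperedges = cnf_to_hypergraph([(1, -2, 3), (-1, 2)])
```

Because a type is a tuple over a three-valued alphabet, the set of hyperedge types stays finite and discrete, which is what a GGNN-style model with one parameter matrix per type requires.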
To start the message passing, each node needs a handcrafted feature to initialize its state. We sort the variables primarily by their frequency of occurrence and secondarily, if variables appear the same number of times, by the frequency of their positive literals. Lastly, we use the lexicographic order of variable names if they are still tied. We use the position of a variable in the sorted order to construct a one-hot vector as its feature: if the variable is the $i$-th in the order, the $i$-th element of the vector is 1 and the other elements are 0. For the special node, we use the zero vector. Let us take as an example: we use as handcrafted features and use zero-padding to initialize . This encoding method makes the features almost independent of the names of the Boolean variables.
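The three-level sort and the one-hot encoding can be sketched as below. The function name `onehot_features`, the clause encoding as `(name, is_positive)` pairs, and the choice of descending frequency are assumptions for illustration; the direction of the sort is not fixed by the description above.

```python
from collections import Counter

import numpy as np


def onehot_features(clauses, dim):
    """Rank variables and return (ranking, {name: one-hot vector of length dim}).

    Sort keys, in order: occurrences (descending, assumed), positive
    occurrences (descending, assumed), then variable name.
    """
    occurrences = Counter()
    positives = Counter()
    for clause in clauses:
        for name, is_positive in clause:
            occurrences[name] += 1
            if is_positive:
                positives[name] += 1
    ranked = sorted(occurrences, key=lambda v: (-occurrences[v], -positives[v], v))
    features = {}
    for position, name in enumerate(ranked):
        vec = np.zeros(dim)  # zero-padded up to the state dimension
        vec[position] = 1.0
        features[name] = vec
    return ranked, features


clauses = [
    [("a", True), ("b", False), ("c", True)],
    [("a", False), ("b", True)],
    [("c", True), ("d", True)],
]
ranked, features = onehot_features(clauses, dim=8)
```

Variable names only enter as the last tie-breaker, so renaming variables changes the features only when two variables are otherwise indistinguishable by these counts.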
It should be noted that the hypergraph is converted only from the CNF and is independent of the graph of the BDD. The graph of the OBDD is never given to the NN in this paper.
3.2 Outputs
Outputs of the OBDD reordering problem are variable orders. We want the neural network to find a near-optimal order in a short time.
A variable order can be specified as a permutation of variables. For example, the variable orders of the two OBDDs in Figure 1 are and , respectively. However, a variable permutation is not a suitable format for the neural network’s output. Generally, a neural network requires its output to be a differentiable structure on which the gradient descent algorithm can work [Rumelhart, Hinton, and Williams1986].
To this end, we let the output of the OBDD reordering problem be a vector of real numbers, called the depth vector. Formally, given a variable , denote , For example, a depth vector of the right OBDD in Figure 1 is . The smaller the depth value, the earlier the corresponding variable appears in the order. With the above depth vector, we can quickly decide the variable order: .
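Recovering a variable order from a depth vector amounts to sorting the variables by their depth values. A minimal sketch, in which the function name `order_from_depths` and the example values are hypothetical:

```python
import numpy as np


def order_from_depths(variables, depths):
    """Return the variables sorted by increasing depth value.

    A stable sort keeps the input order for equal depths.
    """
    ranking = np.argsort(np.asarray(depths), kind="stable")
    return [variables[i] for i in ranking]


order = order_from_depths(["x1", "x2", "x3"], [2.5, 0.1, 1.7])
# -> ["x2", "x3", "x1"]
```

Only the relative order of the depth values matters, so rescaling the vector by a positive factor yields the same variable order.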
3.3 Loss Function
After defining the input and output, we also need a suitable loss function for our task. Since the final order computed from the depth vector depends only on the relative order of its values, not on their exact magnitudes, we use the angle between the predicted vector and the expected vector to measure the error, i.e., the angle whose cosine is $\frac{\hat{y} \cdot y}{\|\hat{y}\|_2 \, \|y\|_2}$, where $\hat{y}$ is the prediction of the NN, $y$ is the target vector, and $\|\cdot\|_2$ means the 2-norm. Notice that each element of $y$ is the near-optimal OBDD depth of the corresponding variable (i.e. node in the hypergraph). We have thus converted the OBDD reordering task into a node-level learning task on hypergraphs. We do not care about the state of the whole hypergraph, but focus on the prediction for each node (i.e. the depth of each variable).
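An angle-based loss of this kind can be sketched with NumPy as below. The function name `angle_loss`, the use of `arccos` on the clipped cosine, and the small `eps` guard are assumptions of this sketch; a training implementation would use the differentiable operations of the chosen deep learning framework instead.

```python
import numpy as np


def angle_loss(pred, target, eps=1e-12):
    """Angle (in radians) between the predicted and the target depth vector."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    cos = pred @ target / (np.linalg.norm(pred) * np.linalg.norm(target) + eps)
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


orthogonal = angle_loss([1.0, 0.0], [0.0, 1.0])         # about pi / 2
rescaled = angle_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # about 0
```

The second call shows the intended invariance: a prediction that is a positive multiple of the target induces the same variable order and incurs (almost) no loss.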
4 Neural Network for 3Hypergraph
In this section, we discuss how to generate messages on hyperedges. First we define a message function on hyperedges, then we introduce non-square matrices to model each hyperedge. We also find a method to convert one hypergraph into two ordinary graphs, a derived graph and its reverse graph. The incoming message of a node in the hypergraph equals the sum of its messages in these two graphs. Our network can thus be implemented on top of MPNN.
4.1 Lifting Message Passing to HyperEdge
As discussed in Section 2.5, message functions are used to generate messages on edges. Messages are used to update the states of the vertices in the graph, and to learn the embedding of each type of edge from massive data. Following this idea, the first thing we need to do is to design a form of message function that can generate messages on 3-hyperedges. To achieve this goal, the message function should be defined on hyperedges:
where can be either or . It should be emphasized that the modification of the message function is the only modification to the MPNN framework. The update function and the readout function remain the same.
Our motivation is to keep the framework of message passing unchanged, but lift the message generation to hyperedges.
4.2 Hyperedge Message Functions
We have extended the MPNN framework to hypergraphs; now we discuss how to implement the message functions in GGNN. The key idea of GGNN is to use square matrices to model ordinary edges. Each edge type is modeled by a matrix in $\mathbb{R}^{d \times d}$, where $d$ is the dimension of the node states. GGNN then uses a matrix multiplication to implement the message function and generate messages. The matrix can be seen as a mapping from node state to message: $\mathbb{R}^d \to \mathbb{R}^d$. We need to lift this mapping to $\mathbb{R}^{2d} \to \mathbb{R}^d$, since one node gets two neighbors in a hyperedge. So we lift the square matrix into a non-square matrix in $\mathbb{R}^{d \times 2d}$ and implement the message function by multiplying it with the concatenation of the states of the two neighbors.
Now we have a method to generate and pass messages on hypergraphs. We also find a way to implement it on top of the existing MPNN.
4.3 Implementation on The Top of MPNN
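Since the matrix can be partitioned into blocks, the hyperedge message function can also be rewritten as a block matrix multiplication. Following this idea, we found that hyperedge message passing can be reduced to two ordinary message passings on ordinary graphs. This makes it possible to implement our network on top of the existing MPNN. The key is to decompose the hyper message passing. First we divide the matrix $A \in \mathbb{R}^{d \times 2d}$ into 2 blocks, i.e. $A = [A_1, A_2]$ with $A_1, A_2 \in \mathbb{R}^{d \times d}$. Then, for the states $h_u$ and $h_w$ of the two neighbors: $A \begin{bmatrix} h_u \\ h_w \end{bmatrix} = A_1 h_u + A_2 h_w$.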
We call a graph a derived graph of when
and denote as the reverse graph of where . We get that
where is the message of on , is the message of on . Finally we get
which means that the MPNN can be used to implement message passing on hypergraphs by decomposing a hypergraph into the derived graph and its reverse.
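The block decomposition behind this reduction can be checked numerically. In the sketch below, the state dimension, the random matrix and the names `A_left` and `A_right` are illustrative assumptions; the point is only that one multiplication by a non-square matrix equals the sum of two ordinary square-matrix messages.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # illustrative state dimension

A = rng.standard_normal((d, 2 * d))  # non-square matrix of one hyperedge type
h_left = rng.standard_normal(d)      # state of the first neighbor
h_right = rng.standard_normal(d)     # state of the second neighbor

# Message generated directly on the hyperedge.
hyper_message = A @ np.concatenate([h_left, h_right])

# The same message as the sum of two ordinary-edge messages.
A_left, A_right = A[:, :d], A[:, d:]
split_message = A_left @ h_left + A_right @ h_right
```

Each of the two terms has exactly the form of a GGNN message on an ordinary edge, which is why an existing MPNN implementation can be reused with sum aggregation over the two derived graphs.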
5 Implementation and Evaluation
We implement our algorithm on top of TensorFlow [Abadi et al.2016] and use ADAM [Kingma and Ba2015] for learning rate control. All experiments were performed on a GeForce GTX 1080 Ti GPU and an Intel Xeon E5 CPU.
To evaluate the efficiency of our approach, we compare 7 typical OBDD reordering algorithms:

: our neural network approach;

GA: the genetic algorithm [Drechsler, Becker, and Göckel1996] for OBDD reordering.

LINEAR: a combination of sifting variables up and down and linear transformations of two adjacent variables [Günther and Drechsler1998].

RAND: randomly choose pairs of variables and swap them in the order [Somenzi2015].

GSIFT: the group sifting method [Panda and Somenzi1995].

WIN2: the iterating window algorithm [Felt et al.1993], with the window size set to 2.

WIN3: the same iterating window algorithm [Felt et al.1993], with the window size set to 3.
In our network, we embed each node into a 500-dimensional vector space. We have GGNN layers, the each layer correspondingly propagate times. The layer has residual connections from layer, and the layer has residual connections from both and layer. We use the average function for message aggregation.
5.1 Data Set and Benchmark
We choose the LGSynth91 benchmark [Yang1991] as our data set. We collect the circuits in Berkeley Logic Interchange Format (BLIF) [Berkeley1992], convert them to the And-Inverter Graph (AIG) format [Biere2007], and extract the Boolean formula of the transition relation as an equisatisfiable 3CNF. Note that the genetic algorithm [Drechsler, Becker, and Göckel1996] attains the best result among all OBDD reordering algorithms. For each sample, we run the genetic algorithm to compute a near-optimal variable order and use this order as the label. We set a timeout of 30 minutes, which covers both building the initial BDD and the reordering. There are 28 samples for which GA labeling finishes within 30 minutes.
This is far from enough to train a neural network, so we randomly mutate the circuits at the AIG level: randomly negate a variable of an and-gate 13 times. For each sample we make 200 distinct mutations and then run GA labeling again on them. Finally we get 5138 labeled samples, of which 80% are used for training and the rest for testing. The evaluation is performed only on the test set. The number of variable and clause varies from and . The phase (i.e. clause/variable) varies from . The average numbers of variables, clauses and phase are 59.3, 139.3 and 2.3, respectively. After the training, the best loss we can get on test set is .
We also go a step further and run a more challenging stress test on our approach. We collect some samples of LGSynth91 (with fewer than 300 variables) that cannot finish GA labeling within the time limit, and call them the hard benchmark. The samples in the hard benchmark are all challenging for OBDD construction. We are curious about the performance of our approach on the hard benchmark.
5.2 Results on Time
To evaluate the efficiency of the algorithms, we compare the computation time each needs to produce a near-optimal order. We only consider the time for running the algorithms; the time for building the initial OBDD is not included. The average computation times (in seconds) are shown in Table 1.
Algorithm  GA  LINEAR  GSIFT  RAND  WIN3  WIN2  

Time(sec)  43.50  12.29  0.01  12.92  9.65  0.58  0.24 
WIN2 and WIN3 are quite efficient among the traditional methods. GA takes the longest time to give the best result. RAND strikes a balance between compression ratio and time. However, our approach is the fastest algorithm. We go further and fit a curve of time for each algorithm in Figure 3. The horizontal axis lists the sizes of the input OBDDs, and the vertical axis shows the average computation time of the different reordering algorithms. Note that the vertical axis is logarithmic.
We observe that GA slows down quickly as the OBDD’s size increases. In contrast, our approach is not sensitive to the size of the input OBDD. Recall that the inputs of our approach are CNFs instead of OBDDs. To conclude, our approach can obtain a near-optimal variable order in a short time. But does this speed affect the quality of its solution?
5.3 Results on Compression Ratio
To evaluate the accuracy of reordering algorithms, we compare their compression ratios.
Given a Boolean function and a variable order, we denote the original OBDD by $G$. All reordering algorithms are respectively applied to $G$ to produce a new variable order. The OBDD with respect to the new order is denoted by $G'$. In the experiments, we use CUDD [Somenzi2015] to evaluate the corresponding OBDD’s size. If $|G'| < |G|$, we adopt the new order. Let be a reordering algorithm, we use to measure the compression ratio.
The average compression ratios of each algorithm on the test set are shown in Figure 4. The horizontal axis indicates the 7 algorithms and the vertical axis shows their compression ratios. From Figure 4, observe that GA always gets the best compression ratio. This conforms to the existing results in the literature [Drechsler, Becker, and Göckel1996, Jindal and Bansal2017]. The achieves best result, and the results of the top-4 algorithms are close.
To see more details of the samples, we fit a curve of compression ratio for each algorithm in Figure 5. The horizontal axis lists the sizes of the input OBDDs, and the vertical axis shows the average compression ratio of the different reordering algorithms. Note that the horizontal axis is logarithmic. We find that smaller OBDDs are harder to compress. This is understandable, since the difference between linear and exponential OBDD size is smaller when the number of variables is smaller. The curves of the top-4 algorithms are close, and GA is always better than the other algorithms. WIN2 and WIN3 are fast but less effective. How does our approach perform on the hard benchmark? We discuss this in the next subsection.
5.4 Results on Stress Test
BDD-based methods are challenging to apply to large circuits. We set the timeout to 12 hours and give 110GB of memory to each sample. First, for 46.2% of the hard benchmark, an initial OBDD cannot even be built; we call these samples the very-hard benchmark for simplicity. 50% of the very-hard benchmark run out of time at 12 hours (OOT); the others run out of memory at 110GB (OOM). The traditional methods operate on the initial OBDD, so they fail on these tasks. However, recall that the prediction of our approach does not rely on the initial OBDD. We directly use the predicted order to build the OBDD. For 41.7% of the samples in the very-hard benchmark, the OBDD can be built using the order predicted by our approach; the others are all OOT, not OOM, which means that it may still be possible to build their OBDDs given more time.
For the rest of the hard benchmark, on which the traditional methods can be run, we compare our approach with WIN2, WIN3 and RAND. There are 2 samples that are OOT for RAND. We list some results in Table 2.
Name  Vars  Nodes  Ours (ratio, time)  WIN2 (ratio, time)  WIN3 (ratio, time)  RAND (ratio, time)

cordic  106  9M  99%  0.01  24%  17  47%  50  96%  1701 
s298  133  2M  80%  0.01  42%  1  67%  4  93%  122 
s344  144  41M  97%  0.01  24%  71  34%  271  98%  11194 
s349  148  47M  90%  0.01  25%  73  36%  279  99%  6658 
mux  153  10M  85%  0.01  30%  9  47%  37  99%  537 
sct  159  3M  83%  0.01  17%  2  53%  6  96%  61 
lal  164  219M  99.7%  0.01  14%  444  47%  1807    OOT 
s382  185  12M  52%  0.01  44%  13  64%  31  91%  1232 
s386  185  0.5M  40%  0.01  12%  0.4  20%  1  75%  17 
s400  193  13M  62%  0.01  43%  12  63%  30  92%  2580 
s444  193  11M  89%  0.01  16%  17  18%  58  97%  1123 
s420  210  43M  93%  0.01  27%  69  43%  197  96%  5732 
s510  244  7M  99%  0.01  38%  5  41%  22  99%  438 
s526  248  594M  92%  0.01  10%  1433  52%  5251    OOT 
The first column is the name of the sample and the second column is the number of variables. The third column is the size of the initial OBDD, where ‘M’ means million ($10^6$). The remaining columns are the results of each algorithm: for each algorithm, the first column is the compression ratio and the second is the time in seconds. The results show that our approach performs very well in the stress test: it clearly beats WIN2 and beats WIN3 in most cases. The compression ratios of our approach are also competitive with RAND, which cannot finish within 12 hours in 2 cases. Our approach is also extremely fast.
6 Conclusions
In this paper, we apply neural networks to minimize OBDDs and lift neural message passing to 3-hypergraphs in order to receive 3CNF as input. We perform experiments to compare our approach with classical algorithms for variable reordering of OBDDs. Experimental results show that our approach achieves a competitive compression ratio in an extremely short time. Many complex relationships in the real world can be modeled by hypergraphs. In the future, we plan to apply our approach to more fields.
References
 [Abadi et al.2016] Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. 2016. TensorFlow: A system for large-scale machine learning. In OSDI, volume 16, 265–283.
 [Bartlett and Andrews2002] Bartlett, L. M., and Andrews, J. D. 2002. Choosing a heuristic for the "fault tree to binary decision diagram" conversion, using neural networks. IEEE Transactions on Reliability 51(3):344–349.
 [Beg, Prasad, and Beg2008] Beg, A.; Prasad, P. C.; and Beg, A. 2008. Applicability of feedforward and recurrent neural networks to boolean function complexity modeling. Expert Systems with Applications 34(4):2436 – 2443.
 [Berkeley1992] Berkeley, U. 1992. Berkeley logic interchange format (blif). Oct Tools Distribution 2:197–247.
 [Biere2007] Biere, A. 2007. The AIGER And-Inverter Graph (AIG) format. Available at fmv.jku.at/aiger.
 [Bollig and Wegener1996] Bollig, B., and Wegener, I. 1996. Improving the variable ordering of OBDDs is NP-complete. IEEE Transactions on Computers 45(9):993–1002.
 [Bollig, Löbbing, and Wegener1995] Bollig, B.; Löbbing, M.; and Wegener, I. 1995. Simulated annealing to improve variable orderings for OBDDs. In Int’l Workshop on Logic Synth. Citeseer.
 [Bryant1986] Bryant, R. E. 1986. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers C-35(8):677–691.
 [Bryant1992] Bryant, R. E. 1992. Symbolic Boolean manipulation with ordered binary-decision diagrams. ACM Comput. Surv. 24(3):293–318.
 [Cho et al.2014] Cho, K.; Van Merriënboer, B.; Bahdanau, D.; and Bengio, Y. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.
 [Crama and Hammer2011] Crama, Y., and Hammer, P. L. 2011. Boolean functions: theory, algorithms, and applications. Cambridge University Press.
 [Drechsler, Becker, and Göckel1996] Drechsler, R.; Becker, B.; and Göckel, N. 1996. Genetic algorithm for variable ordering of OBDDs. IEE Proceedings - Computers and Digital Techniques 143(6):364–368.
 [Felt et al.1993] Felt, E.; York, G.; Brayton, R.; and Sangiovanni-Vincentelli, A. 1993. Dynamic variable reordering for BDD minimization. In Proceedings of the 1993 Design Automation Conference, 130–135. IEEE.
 [Fujita, Matsunaga, and Kakuda1991] Fujita, M.; Matsunaga, Y.; and Kakuda, T. 1991. On variable ordering of binary decision diagrams for the application of multilevel logic synthesis. In Proceedings of the Conference on European Design Automation, 50–54. IEEE Computer Society Press.
 [Gilmer et al.2017] Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; and Dahl, G. E. 2017. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, volume 70, 1263–1272.
 [Grumberg, Livne, and Markovitch2003] Grumberg, O.; Livne, S.; and Markovitch, S. 2003. Learning to order BDD variables in verification. Journal of Artificial Intelligence Research 18:83–116.
 [Günther and Drechsler1998] Günther, W., and Drechsler, R. 1998. BDD minimization by linear transformations.
 [He et al.2016] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778.
 [Hornik, Stinchcombe, and White1989] Hornik, K.; Stinchcombe, M.; and White, H. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2(5):359–366.
 [Ishiura, Sawada, and Yajima1991] Ishiura, N.; Sawada, H.; and Yajima, S. 1991. Minimization of binary decision diagrams based on exchanges of variables. In ICCAD, volume 91, 472–475.
 [Jindal and Bansal2017] Jindal, S., and Bansal, M. 2017. A novel and efficient variable ordering and minimization algorithm based on evolutionary computation. Indian Journal of Science and Technology 9(48).
 [Kingma and Ba2015] Kingma, D., and Ba, J. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations.
 [Kolany1993] Kolany, A. 1993. Satisfiability on hypergraphs. Studia Logica 52(3):393–404.
 [Li et al.2016] Li, Y.; Tarlow, D.; Brockschmidt, M.; and Zemel, R. 2016. Gated graph sequence neural networks. International Conference on Learning Representations.
 [Medsker and Jain1999] Medsker, L., and Jain, L. C. 1999. Recurrent neural networks: design and applications. CRC press.
 [Panda and Somenzi1995] Panda, S., and Somenzi, F. 1995. Who are the variables in your neighborhood. In Proceedings of the 1995 IEEE/ACM International Conference on Computer-Aided Design, 74–77. IEEE Computer Society.
 [Rudell1993] Rudell, R. 1993. Dynamic variable ordering for ordered binary decision diagrams. In Proceedings of the 1993 IEEE/ACM International Conference on Computer-Aided Design, 42–47. IEEE Computer Society Press.
 [Rumelhart, Hinton, and Williams1986] Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 1986. Learning representations by back-propagating errors. Nature 323(6088):533.
 [Schütt et al.2017] Schütt, K. T.; Arbabzadah, F.; Chmiela, S.; Müller, K. R.; and Tkatchenko, A. 2017. Quantum-chemical insights from deep tensor neural networks. Nature communications 8:13890.
 [Selsam et al.2018] Selsam, D.; Lamm, M.; Bunz, B.; Liang, P.; de Moura, L.; and Dill, D. L. 2018. Learning a SAT solver from single-bit supervision. arXiv preprint arXiv:1802.03685.
 [Somenzi2015] Somenzi, F. 2015. CUDD: CU decision diagram package release 3.0.0. University of Colorado at Boulder.
 [Tseitin1983] Tseitin, G. S. 1983. On the complexity of derivation in propositional calculus. In Automation of Reasoning. Springer. 466–483.
 [Yang1991] Yang, S. 1991. Logic synthesis and optimization benchmarks user guide: version 3.0. Microelectronics Center of North Carolina (MCNC).