Graph Neural Reasoning for 2-Quantified Boolean Formula Solvers

04/27/2019 ∙ by Zhanfu Yang, et al.

In this paper, we investigate the feasibility of learning GNN (Graph Neural Network) based solvers and GNN-based heuristics for specified QBF (Quantified Boolean Formula) problems. We design and evaluate several GNN architectures for 2QBF formulae, and conjecture that GNNs have limitations in learning 2QBF solvers. We then show how to learn a heuristic CEGAR 2QBF solver. We further explore generalizing GNN-based heuristics to larger unseen instances, and uncover some interesting challenges. In summary, this paper surveys the application of GNN embeddings to specified QBF solvers, and aims to offer guidance on applying ML to more complicated symbolic reasoning problems.




1 Introduction

A propositional formula consists of Boolean constants ($\top$: true, $\bot$: false), Boolean variables ($x_i$), and propositional connectives such as $\wedge$, $\vee$, and $\neg$. The SAT (Boolean Satisfiability) problem, which asks whether a given formula can be satisfied (evaluated to $\top$) by assigning suitable Boolean values to its variables, was the first problem proven NP-complete (Cook, 1971). As an extension of propositional formulae, QBF (Quantified Boolean Formula) allows quantifiers ($\forall$ and $\exists$) over the Boolean variables. In general, a quantified Boolean formula can be expressed as:

$$Q_i X_i \, Q_{i-1} X_{i-1} \dots Q_0 X_0 .\ \phi$$

where each $Q_i$ denotes a quantifier that differs from its neighboring quantifiers, the $X_i$ are disjoint sets of variables, and $\phi$ is a propositional formula in which all Boolean variables are bound. The QBF problem is PSPACE-complete (Savitch, 1970). To tackle it, researchers have proposed solvers based on incremental determinization (Rabe & Seshia, 2016; Rabe et al., 2018) or on CEGAR (Janota et al., 2016). These solvers are non-deterministic, e.g., they rely on heuristic guidance when searching for a solution. Recently, MaxSAT-based (Janota & Marques-Silva, 2011) and ML-based (Janota, 2018) heuristics have been incorporated into CEGAR-based solvers. Without relying on an existing decision procedure, Selsam et al. (2018) presented a GNN architecture that embeds propositional formulae. Amizadeh et al. (2019) adapted an RL-style explore-exploit mechanism to this problem, but considered circuit-SAT problems. However, these solvers do not tackle unsatisfiable formulae. In light of the above, no satisfactory general solver for the QBF problem exists in practice. We therefore focus on 2QBF formulae in this paper, a special case of QBF with only one alternation of quantifiers.

Extended from SAT, 2QBF problems continue to attract attention due to their practical uses (Mishchenko et al., 2015; Mneimneh & Sakallah, 2003; Remshagen & Truemper, 2005), yet they remain very challenging, like QBF. Formally, a 2QBF has the form $\forall X \exists Y.\ \phi(X, Y)$, where $X$ and $Y$ are sets of variables and $\phi(X, Y)$ is a quantifier-free formula. The quantifier-free formula $\phi$ can be in Conjunctive Normal Form (CNF), where $\phi$ is a conjunction of clauses, clauses are disjunctions of literals, and each literal is either a variable or its negation. For example, the following term is a well-formed 2QBF in CNF: . If $\phi$ is in CNF, the $\forall$ quantifier is required to be on the outside and the $\exists$ quantifier on the inside. Briefly, the 2QBF problem asks whether the formula evaluates to $\top$ under the $\forall$ and $\exists$ quantifications. Solving 2QBF is presumably exponentially harder than solving SAT, because 2QBF characterizes the second level of the polynomial hierarchy.
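The semantics of a 2QBF in CNF can be made concrete with a brute-force evaluator. The following is only an illustrative sketch: the signed-integer literal encoding (variable v as `v`, its negation as `-v`) and all function names are our assumptions, and exhaustive enumeration is viable only for very small formulae.

```python
from itertools import product

def satisfies(clauses, assignment):
    """True if every clause has at least one literal made true by `assignment`."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def assignments(variables):
    """Yield every total Boolean assignment over `variables` as a dict."""
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))

def eval_2qbf(forall_vars, exists_vars, clauses):
    """Decide  forall X exists Y. phi(X, Y)  by exhaustive enumeration.

    Returns (True, None) if the formula holds, otherwise (False, x) where x
    is a forall-assignment for which no exists-assignment satisfies phi.
    """
    for x in assignments(forall_vars):
        if not any(satisfies(clauses, {**x, **y}) for y in assignments(exists_vars)):
            return False, x  # x is a witness of unsatisfiability
    return True, None
```

For instance, with variable 1 universally quantified and variable 2 existentially quantified, `eval_2qbf([1], [2], [[1, 2], [-1, -2]])` holds (choose variable 2 as the negation of variable 1), whereas `[[1, 2], [1, -2]]` fails when variable 1 is false.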

Our work explores several 2QBF solvers based on graph neural-symbolic reasoning. In Section 2, we investigate well-known GNN-based SAT solvers (Selsam et al., 2018; Amizadeh et al., 2019). We find these architectures hard to extend to 2QBF problems, because the GNNs are unable to reason about unsatisfiability. We therefore reconfigure how the GNN is used. In Section 3, building on a traditional CEGAR-based solver, we propose three ways to learn GNN-based heuristics: ranking the candidates, ranking the counterexamples, and combining the two. They aim to avoid multiple GNN embeddings per formula, which reduces the GNN inference overhead. Our experiments show that these heuristics reduce the average number of CEGAR steps needed to solve 2QBF instances.

2 GNN-based QBF Solver Failed

Let us first revisit the existing GNN-based SAT solvers and analyze why they fail to suit the 2QBF problem.

2.1 GNN for QBF

Embedding of SAT

SAT formulae are translated into bipartite graphs (Selsam et al., 2018), where literals ($L$) form one kind of node and clauses ($C$) form the other. We denote by EdgeMatrix ($E$) the edges between literal and clause nodes, with dimension $|L| \times |C|$. [Figure: bipartite graph of an example formula.]
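The literal-clause incidence structure can be illustrated by building the EdgeMatrix from a CNF clause list. This is a sketch under our own assumptions: literals are signed integers, positive literals occupy the first rows and negative literals the following rows, and the helper names are hypothetical.

```python
def literal_index(lit, num_vars):
    """Map literal v to row v - 1 and literal -v to row num_vars + v - 1."""
    return abs(lit) - 1 if lit > 0 else num_vars + abs(lit) - 1

def edge_matrix(clauses, num_vars):
    """Build the |L| x |C| 0/1 matrix with entry 1 iff the literal occurs in the clause."""
    rows = 2 * num_vars  # one row per literal: every variable and its negation
    matrix = [[0] * len(clauses) for _ in range(rows)]
    for c, clause in enumerate(clauses):
        for lit in clause:
            matrix[literal_index(lit, num_vars)][c] = 1
    return matrix
```

For the two clauses `[1, -2]` and `[2]` over two variables, the rows correspond to the literals 1, 2, -1, -2 in that order.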

In the update rules, $Emb_L$ and $Emb_C$ denote the embedding matrices of literals and clauses respectively, $Msg_{X \to Y}$ denotes messages from $X$ to $Y$, $MLP_X$ denotes the MLP of $X$ for generating messages, $LSTM_X$ denotes the LSTM of $X$ for digesting incoming messages and updating embeddings, $X \cdot Y$ denotes matrix multiplication of $X$ and $Y$, $X^T$ denotes the matrix transpose of $X$, $[X, Y]$ denotes matrix concatenation, and $Emb_{\neg L}$ denotes the embeddings of the negations of the literals in $L$.

The number of iterations is fixed during training but can be unbounded during testing.
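One message-passing iteration of this kind of embedding can be written as follows. This is a reconstruction in the style of Selsam et al. (2018), not a verbatim copy of the original equations, and the symbol names are our assumptions: $Emb_L$ and $Emb_C$ are the literal and clause embedding matrices, $E$ is the $|L| \times |C|$ EdgeMatrix, and $Emb_{\neg L}$ holds, for each literal, the embedding of its negation.

```latex
\begin{aligned}
Msg_{L \to C} &= E^{T} \cdot MLP_{L}(Emb_{L}) \\
Emb_{C} &= LSTM_{C}(Emb_{C},\ Msg_{L \to C}) \\
Msg_{C \to L} &= E \cdot MLP_{C}(Emb_{C}) \\
Emb_{L} &= LSTM_{L}(Emb_{L},\ [Msg_{C \to L},\ Emb_{\neg L}])
\end{aligned}
```

Because $E$ has one row per literal and one column per clause, $E^{T}$ aggregates literal messages per clause and $E$ aggregates clause messages per literal.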

Embedding of 2QBF

We separate $\forall$-literals and $\exists$-literals into different groups and embed them via different NN modules. [Figure: graph representation of an example 2QBF.]

We use $\forall$ and $\exists$ to denote all $\forall$-literals and all $\exists$-literals respectively. We use $E_X$ to denote the EdgeMatrix between $X$ and $C$, and $MLP_{X \to Y}$ to denote the MLPs that generate $Msg_{X \to Y}$.

We designed multiple architectures (details in the supplementary material) and use the best one, described above, for the rest of the paper.
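One plausible form of such a two-group embedding is sketched below. It is an assumption on our part, obtained by duplicating the SAT update rules for each quantifier group, and is not necessarily the architecture the authors selected. Here $E_\forall$ and $E_\exists$ are the EdgeMatrices between each literal group and the clauses $C$, and $Emb_{\neg\forall}$, $Emb_{\neg\exists}$ hold the embeddings of the negated literals of each group.

```latex
\begin{aligned}
Msg_{\forall \to C} &= E_{\forall}^{T} \cdot MLP_{\forall \to C}(Emb_{\forall}) \\
Msg_{\exists \to C} &= E_{\exists}^{T} \cdot MLP_{\exists \to C}(Emb_{\exists}) \\
Emb_{C} &= LSTM_{C}(Emb_{C},\ [Msg_{\forall \to C},\ Msg_{\exists \to C}]) \\
Msg_{C \to \forall} &= E_{\forall} \cdot MLP_{C \to \forall}(Emb_{C}) \\
Msg_{C \to \exists} &= E_{\exists} \cdot MLP_{C \to \exists}(Emb_{C}) \\
Emb_{\forall} &= LSTM_{\forall}(Emb_{\forall},\ [Msg_{C \to \forall},\ Emb_{\neg \forall}]) \\
Emb_{\exists} &= LSTM_{\exists}(Emb_{\exists},\ [Msg_{C \to \exists},\ Emb_{\neg \exists}])
\end{aligned}
```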

Data Preparation

For training and testing, we follow Chen & Interian (2005), whose model generates QBFs in conjunctive normal form. Specifically, we generate problems of specs (2,3) and sizes (8,10): each clause has 5 literals, 2 of which are randomly chosen from a set of 8 $\forall$-quantified variables and 3 of which are randomly chosen from a set of 10 $\exists$-quantified variables. We modify the generation procedure so that it generates clauses until the formula becomes unsatisfiable. We then randomly negate one $\exists$-quantified literal per formula to make it satisfiable.
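The generation procedure can be sketched as follows. Several details are our assumptions rather than the authors' procedure: variables within a clause are distinct, each literal is negated with probability 1/2, the satisfiable sibling is found by trying single flips of $\exists$-literals in random order, and truth is checked by brute force. A real pipeline would call a SAT or QBF solver instead, since exhaustive checking at sizes (8,10) is slow in pure Python.

```python
import random
from itertools import product

def random_clause(rng, n_forall, n_exists, k_forall, k_exists):
    """Draw k_forall distinct forall-variables and k_exists distinct exists-variables,
    each negated with probability 1/2 (the polarity rule is an assumption)."""
    xs = rng.sample(range(1, n_forall + 1), k_forall)
    ys = rng.sample(range(n_forall + 1, n_forall + n_exists + 1), k_exists)
    return [v if rng.random() < 0.5 else -v for v in xs + ys]

def is_true_2qbf(clauses, n_forall, n_exists):
    """Brute-force check of  forall X exists Y. phi ; variables 1..n_forall are universal."""
    for x in product([False, True], repeat=n_forall):
        found = False
        for y in product([False, True], repeat=n_exists):
            a = x + y  # a[v - 1] is the value of variable v
            if all(any(a[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
                found = True
                break
        if not found:
            return False
    return True

def generate_pair(rng, n_forall=8, n_exists=10, k_forall=2, k_exists=3):
    """Return (unsat_formula, sat_formula); sat_formula is None if no single flip works."""
    unsat = []
    while is_true_2qbf(unsat, n_forall, n_exists):
        unsat.append(random_clause(rng, n_forall, n_exists, k_forall, k_exists))
    # Positions (clause index, literal index) of every exists-literal.
    positions = [(i, j) for i, c in enumerate(unsat)
                 for j, l in enumerate(c) if abs(l) > n_forall]
    rng.shuffle(positions)
    for i, j in positions:
        sat = [list(c) for c in unsat]
        sat[i][j] = -sat[i][j]  # negate one exists-literal
        if is_true_2qbf(sat, n_forall, n_exists):
            return unsat, sat
    return unsat, None
```

Each returned pair differs in exactly one literal, which matches the paired SAT/UNSAT datasets used in the experiments.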


We use MLPs to compute votes from the $\forall$-variables and use the average vote as the logit for SAT/UNSAT prediction.

In Table 1, each block of entries gives the accuracy on UNSAT and SAT formulae respectively, for training data (upper row) and testing data (lower row). The models are tested on 600 pairs of formulae, and we allow up to 1000 message-passing iterations. The GNNs fit the smaller training datasets well but have trouble fitting 160 pairs of formulae. Performance deteriorates as the number of embedding iterations increases, and most GNNs become heavily biased at high iteration counts.

Dataset             40 pairs       80 pairs       160 pairs
8 iters   training  (0.98, 0.94)   (1.00, 0.92)   (0.84, 0.76)
          testing   (0.40, 0.64)   (0.50, 0.48)   (0.50, 0.50)
16 iters  training  (1.00, 1.00)   (0.96, 0.96)   (0.88, 0.70)
          testing   (0.54, 0.46)   (0.52, 0.52)   (0.54, 0.48)
32 iters  training  (1.00, 1.00)   (0.98, 0.98)   (0.84, 0.80)
          testing   (0.32, 0.68)   (0.52, 0.50)   (0.52, 0.50)
Table 1: GNN Performance to Predict SAT/UNSAT (accuracy on UNSAT, accuracy on SAT)

Dataset             160 unsat      320 unsat      640 unsat
8 iters   training  (1.00, 0.99)   (0.95, 0.72)   (0.82, 0.28)
          testing   (0.64, 0.06)   (0.67, 0.05)   (0.69, 0.05)
16 iters  training  (1.00, 1.00)   (0.98, 0.87)   (0.95, 0.69)
          testing   (0.64, 0.05)   (0.65, 0.05)   (0.65, 0.06)
32 iters  training  (1.00, 1.00)   (0.99, 0.96)   (0.91, 0.57)
          testing   (0.63, 0.05)   (0.64, 0.05)   (0.63, 0.05)
Table 2: GNN Performance to Predict Witness of UNSAT (accuracy per variable, accuracy per formula)

$\forall$-Witnesses of UNSAT

Proving unsatisfiability of a 2QBF requires a witness of unsatisfiability, which is an assignment to the $\forall$-variables that eventually leads to UNSAT. We use logistic regression in this experiment. Specifically, the final embeddings of the $\forall$-variables are transformed into logits via an MLP and used to compute the cross-entropy loss against the known witnesses of unsatisfiability of the formulae.

This training task is very similar to that of Amizadeh et al. (2019), except that our GNN has to reason about the unsatisfiability of the simplified SAT formulae, which we believe to be infeasible. We summarize the results in Table 2. In each block of entries, we list the accuracy per variable on the left and the accuracy per formula on the right. The upper half of each block is for training data, and the lower half for testing data. The table shows that the GNNs fit the training data well, and more message-passing iterations give a better fit. However, performance on the testing data is only slightly better than random, and more iterations at test time do not help.
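The two accuracy figures reported per block can be computed as in the following sketch. We assume that per-formula accuracy means an exact match with the recorded witness on every $\forall$-variable; the text does not spell out the definition, and the function name is hypothetical.

```python
def witness_accuracies(predictions, targets):
    """predictions, targets: one list of booleans (over the forall-variables) per formula.

    Returns (accuracy per variable, accuracy per formula), where a formula counts
    as correct only if every variable matches the recorded witness (an assumption).
    """
    total_vars = sum(len(tgt) for tgt in targets)
    correct_vars = sum(
        p == t for pred, tgt in zip(predictions, targets) for p, t in zip(pred, tgt)
    )
    correct_formulas = sum(pred == tgt for pred, tgt in zip(predictions, targets))
    return correct_vars / total_vars, correct_formulas / len(targets)
```

Under this definition the per-formula accuracy can be far below the per-variable accuracy, since a single wrong variable invalidates the whole prediction.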

2.2 Why GNN-based QBF Solver Failed

We conjecture that current GNN architectures and embedding processes are unlikely to prove unsatisfiability or to reason about $\forall$-assignments. Even for the SAT problem (Selsam et al., 2018), GNNs are good at finding solutions for satisfiable formulae but not at confidently proving unsatisfiability. Similarly, Amizadeh et al. (2019) had little success in proving unsatisfiability with DAG-embedding, because showing SAT only needs a witness, whereas proving UNSAT needs more complete reasoning about the search space. A DPLL-based approach would iterate over all possible assignments and construct a proof of UNSAT. However, a GNN embedding process neither follows a strict order of assignments nor learns new knowledge indicating that some assignments should be avoided. In fact, the GNN embedding may be most similar to vanilla WalkSAT approaches, with randomly initialized assignments and stochastic local search, which cannot prove unsatisfiability.

If this conjecture holds, it is a major obstacle to learning 2QBF solvers with GNNs, because proving either satisfiability or unsatisfiability of a 2QBF problem needs more than a witness. If the formula is satisfiable, the proof needs to provide assignments to the $\exists$-variables under all possible assignments of the $\forall$-variables or, in a CEGAR-based solver, show that no candidate remains. If the formula is unsatisfiable, the procedure should find an assignment to the $\forall$-variables under which the remaining propositional formula is unsatisfiable.

3 Learn GNN-based Heuristics

Section 2 suggests that GNN-based 2QBF solvers are unlikely to be learned; the success in learning SAT solvers (Selsam et al., 2018; Amizadeh et al., 2019) therefore cannot simply be extended to 2QBF or more expressive logics. We instead consider the CEGAR-based solving algorithm, in which a GNN serves as a heuristic, to reduce the GNN inference overhead. We first present the CEGAR-based solving procedure in Algorithm 1 (Janota & Marques-Silva, 2011).

  Input: $\forall X \exists Y.\ \phi$
  Output: (sat, -) or (unsat, witness)
  Initialize constraints $\omega$ as empty set.
  while true do
     (has-candidate, candidate) = SAT-solver($\omega$)
     if not has-candidate then
         return (sat, -)
     end if
     (has-counter, counter) = SAT-solver($\phi[X \mapsto \text{candidate}]$)
     if not has-counter then
         return (unsat, candidate)
     end if
     add counter to constraints $\omega$
  end while
Algorithm 1: CEGAR 2QBF solver

Note that $\omega$ denotes the constraints on candidates. Initially, $\omega$ is $\emptyset$, and any assignment of the $\forall$-variables can be proposed as a candidate, which reduces the problem to a smaller propositional formula. If we can find an assignment to the $\exists$-variables that satisfies this propositional formula, the assignment is called a counterexample to the candidate. We denote by $\phi_k$ all clauses in $\phi$ that are satisfied by the counterexample. The counterexample can be transformed into a constraint stating that subsequent candidates cannot simultaneously satisfy the remaining clauses ($\phi \setminus \phi_k$), since such candidates are already rejected by the current counterexample. This constraint is added to $\omega$ as a propositional term, so finding new candidates amounts to solving the constraint-derived propositional term $\omega$.
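The CEGAR loop and its refinement step can be sketched in a few lines. This is an illustrative toy, not the solver used in the experiments: both SAT-solver calls are replaced by brute-force enumeration, literals are signed integers, and each constraint is stored as the list of clauses that a new candidate must not satisfy all at once. All names are our own.

```python
from itertools import product

def assignments(variables):
    """Yield every Boolean assignment over `variables` as a dict."""
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))

def clause_true(clause, assignment):
    """True if some literal of `clause` over the assigned variables is true."""
    return any(abs(l) in assignment and assignment[abs(l)] == (l > 0) for l in clause)

def cegar_2qbf(forall_vars, exists_vars, clauses):
    """Return ('sat', None) or ('unsat', witness) for  forall X exists Y. phi."""
    constraints = []  # each entry: clauses a new candidate must NOT satisfy simultaneously
    while True:
        # Candidate: a forall-assignment not rejected by any stored constraint.
        candidate = next(
            (x for x in assignments(forall_vars)
             if all(not all(clause_true(c, x) for c in blocked)
                    for blocked in constraints)),
            None)
        if candidate is None:
            return 'sat', None
        # Counterexample: an exists-assignment satisfying phi under the candidate.
        counter = next(
            (y for y in assignments(exists_vars)
             if all(clause_true(c, {**candidate, **y}) for c in clauses)),
            None)
        if counter is None:
            return 'unsat', candidate
        # Clauses the counterexample does not satisfy must be satisfied by X alone;
        # every X that satisfies all of them is rejected by this counterexample.
        constraints.append([c for c in clauses if not clause_true(c, counter)])
```

The loop terminates because each candidate satisfies all clauses left over by its own counterexample and is therefore excluded by the constraint it produces.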

3.1 Ranking the Candidates

To decide which candidate returned by SAT-solver($\omega$) to use, we can rank solutions in MaxSAT style, simplifying the formula with each candidate and ranking the candidates by the number of clauses they satisfy. We use this as a baseline for comparison. Alternatively, the hardness of a candidate can be evaluated as the number of solutions of the simplified propositional formula. The training data for our ranking GNN thus consists of all possible assignments of the $\forall$-variables, together with ranking scores that are negatively related to the number of solutions of each assignment-propagated propositional formula (details on computing the ranking scores are in the supplementary material).
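Both ranking signals can be computed directly on small instances. The sketch below is an assumption-laden illustration: it counts solutions by brute force, uses signed-integer literals, and orders candidates so that those leaving the fewest solutions come first, which is one way to realize scores that are negatively related to the solution count. The exact scoring formula is in the supplementary material and is not reproduced here.

```python
from itertools import product

def satisfied_count(clauses, candidate):
    """Number of clauses already satisfied by the forall-assignment `candidate`."""
    return sum(
        any(abs(l) in candidate and candidate[abs(l)] == (l > 0) for l in c)
        for c in clauses
    )

def solution_count(clauses, candidate, exists_vars):
    """Number of exists-assignments satisfying the formula simplified by `candidate`."""
    count = 0
    for values in product([False, True], repeat=len(exists_vars)):
        a = {**candidate, **dict(zip(exists_vars, values))}
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            count += 1
    return count

def rank_by_hardness(clauses, candidates, exists_vars):
    """Order candidates so that those leaving the fewest solutions come first."""
    return sorted(candidates, key=lambda cand: solution_count(clauses, cand, exists_vars))
```

With clauses `[[1, 2], [-1, -2], [1, 3]]`, universal variable 1 and existential variables 2 and 3, the candidate that sets variable 1 to false leaves a single solution and is ranked ahead of the candidate that sets it to true.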

We extend the GNN embedding architecture so that the final embeddings of the $\forall$-variables are transformed into a scoring matrix for candidates via an MLP. A batch of candidates is ranked by passing it through a two-layer MLP without biases, where the weights of the first layer are the scoring matrix and the weights of the second layer are a weight vector.
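The scoring step described above can be sketched without any framework. Several choices here are assumptions that the text does not fix: candidates are encoded as vectors of +1/-1 over the $\forall$-variables, the hidden activation is a ReLU, and the function name is hypothetical.

```python
def score_candidates(candidates, scoring_matrix, weight_vector):
    """Two-layer, bias-free scoring of a batch of candidates.

    candidates:     B x n list of +1/-1 encodings (encoding is an assumption)
    scoring_matrix: n x h first-layer weights, produced from the GNN embeddings
    weight_vector:  h second-layer weights
    Returns one score per candidate.
    """
    scores = []
    for cand in candidates:
        hidden = [
            # ReLU activation is an assumption; the source does not name one.
            max(0.0, sum(cand[i] * scoring_matrix[i][j] for i in range(len(cand))))
            for j in range(len(weight_vector))
        ]
        scores.append(sum(h * w for h, w in zip(hidden, weight_vector)))
    return scores
```

Because the scoring matrix is computed once per formula, many candidates can be scored without re-running the GNN embedding, which is what keeps the inference overhead low.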


We use the TensorFlow Ranking library (Pasumarthi et al., 2018) to compute the pairwise logistic loss with NDCG lambda weights for supervised training. We then evaluate our ranking heuristics by adding them to the CEGAR cycle and measuring the average number of steps needed to solve the problems. This requires replacing the SAT-solver subroutine with a multi-solution variant: once a solution is found, it is added back to the formula as a constraint and a different solution is searched for, until no more solutions can be found or the maximal number of solutions is reached. The heuristic then ranks the solutions and proposes the best one as the candidate. We use 4 datasets: (1) TrainU: 1000 unsatisfiable formulae used for training; (2) TrainS: 1000 satisfiable formulae used for training; (3) TestU: 600 unsatisfiable formulae used for testing; (4) TestS: 600 satisfiable formulae used for testing; and 4 ranking heuristics: (1) -: no ranking; (2) MaxSAT: ranking by the number of satisfied clauses via on-the-fly formula simplification; (3) GNN1: ranking by hardness via GNN model inference; (4) GNN2: ranking by the number of satisfied clauses via GNN model inference.
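For intuition, the unweighted pairwise logistic loss can be written out by hand. This sketch is not the TensorFlow Ranking implementation: it omits the NDCG lambda weights and any normalization the library may apply, and the function name is our own.

```python
import math

def pairwise_logistic_loss(scores, labels):
    """Sum of log(1 + exp(-(s_i - s_j))) over all pairs with labels[i] > labels[j]."""
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                # Penalize the pair unless item i is scored well above item j.
                loss += math.log1p(math.exp(-(scores[i] - scores[j])))
    return loss
```

The loss shrinks as higher-labelled items receive higher scores, so a correctly ordered pair contributes less than a mis-ordered one.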

As shown in Table 3, all 3 ranking heuristics improve the solving process on all 4 datasets. Unsatisfiable formulae benefit more from the heuristics, and the heuristics generalize very well from training formulae to testing formulae. Machine learning experiments are repeated twice with different random seeds, and the numbers shown are from the models with the best performance on training data.

DataSet   TrainU    TrainS    TestU     TestS
-         21.976    34.783    21.945    33.885
MaxSAT    13.144    30.057    12.453    28.863
GNN1      13.843    31.704    13.988    30.573
GNN2      15.287    32.0      14.473    30.788
Table 3: Performance of CEGAR Candidate Ranking (average number of CEGAR steps)

3.2 Ranking the Counterexamples

We also consider GNN-based heuristics for ranking counterexamples. Each counterexample contributes a constraint to $\omega$, which either shrinks the search space of the witnesses of unsatisfiability or, once added to the constraints, indicates that no candidate is a witness of unsatisfiability.

We compute ranking scores for our training data as follows. For satisfiable 2QBF instances in the training data, we list all possible assignments of the $\exists$-variables and collect all constraining clauses in $\omega$. We then solve $\omega$ with hmucSAT (Nadel et al., 2013), seeking unsatisfiability cores. Initially we planned to give a high ranking score (10) to $\exists$-assignments corresponding to clauses in unsatisfiability cores, and a low ranking score (1) to all other $\exists$-assignments. Later, because unsatisfiability cores are often small, we chose to give the other $\exists$-assignments ranking scores based on the number of satisfied clauses, within a fixed range.

For unsatisfiable 2QBF instances, we likewise collect all constraining clauses in $\omega$. Here $\omega$ is satisfiable, and its solutions are exactly the witnesses of unsatisfiability. To obtain unsatisfiability cores, we add solutions to $\omega$ as extra constraints until $\omega$ becomes unsatisfiable. We then compute the ranking scores.

For comparison, we use another dataset whose ranking scores are based entirely on the number of satisfied clauses. To add the ranking modules, we extend the GNN embedding architecture as for candidates, with a scoring matrix, an MLP that derives the scoring matrix from the final embeddings of the $\exists$-variables, a batch of counterexamples, and a weight vector.

DataSet   TrainU    TrainS    TestU     TestS
-         21.976    34.783    21.945    33.885
MaxSAT    14.754    22.265    14.748    21.638
GNN1      17.492    26.962    17.198    26.598
GNN2      16.95     26.717    16.743    26.325
Table 4: Performance of CEGAR Counterexample Ranking (average number of CEGAR steps)

After supervised training, we evaluate the trained GNN-based ranking heuristics in a CEGAR-based solver. The results are shown in Table 4. Judging by the MaxSAT heuristic, ranking counterexamples benefits satisfiable formulae more than unsatisfiable formulae. However, GNN1 performs worse than GNN2. The likely explanation is that predicting unsatisfiability cores is far too complicated for a GNN. Moreover, knowledge of unsatisfiability cores cannot be obtained from each counterexample alone but requires analyzing all counterexamples collectively. This may trace back to the limitation of GNNs in reasoning about “all possible solutions”: the added score information then acts as interference rather than knowledge for the GNN-based ranking heuristics. Machine learning experiments are repeated twice, and we report the models with the best performance on training data.

DataSet   TrainU    TrainS    TestU     TestS
-         21.976    34.783    21.945    33.885
MaxSAT    9.671     20.777    9.425     19.883
GNN1      11.686    25.021    11.605    24.518
GNN2      12.505    25.505    12.22     24.938
GNN3      11.25     24.76     12.008    24.295
Table 5: Performance of CEGAR Candidate and Counterexample Ranking Combined (average number of CEGAR steps)

3.3 Combination of the Heuristics

To combine candidate ranking and counterexample ranking in a single solver, we extend the GNN embedding architecture with ranking data for both candidates and counterexamples. GNN1 is trained with ranking scores from hardness and unsatisfiability cores, GNN2 with ranking scores from the number of satisfied clauses for both candidates and counterexamples, and GNN3 with ranking scores from hardness for candidates and the number of satisfied clauses for counterexamples. As shown in Table 5, GNN3 is arguably the best model we obtained from supervised learning with this ranking method. All machine learning experiments are repeated twice with different random seeds, and we report the models with the best performance on training data.

4 Conclusion

In this paper, we show that learning GNN-based 2QBF solvers is hard with current GNN architectures, due to their inability to reason about unsatisfiability. Our work instead applies GNN embeddings to 2QBF solving through CEGAR-based heuristics. We developed a suite of GNN-based ranking techniques for candidates and counterexamples, and our experiments show that they reduce the number of CEGAR steps relative to a solver without ranking.


References

  • Amizadeh et al. (2019) Amizadeh, S., Matusevych, S., and Weimer, M. Learning to solve circuit-SAT: An unsupervised differentiable approach. In International Conference on Learning Representations, 2019.
  • Chen & Interian (2005) Chen, H. and Interian, Y. A model for generating random quantified boolean formulas. In IJCAI, pp. 66--71. Professional Book Center, 2005.
  • Cook (1971) Cook, S. A. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, STOC ’71, pp. 151--158, New York, NY, USA, 1971. ACM. doi: 10.1145/800157.805047.
  • Janota (2018) Janota, M. Towards generalization in QBF solving via machine learning. In AAAI, pp. 6607--6614. AAAI Press, 2018.
  • Janota & Marques-Silva (2011) Janota, M. and Marques-Silva, J. P. Abstraction-based algorithm for 2qbf. In SAT, volume 6695 of Lecture Notes in Computer Science, pp. 230--244. Springer, 2011.
  • Janota et al. (2016) Janota, M., Klieber, W., Marques-Silva, J., and Clarke, E. M. Solving QBF with counterexample guided refinement. Artif. Intell., 234:1--25, 2016.
  • Mishchenko et al. (2015) Mishchenko, A., Brayton, R. K., Feng, W., and Greene, J. W. Technology mapping into general programmable cells. In FPGA, pp. 70--73. ACM, 2015.
  • Mneimneh & Sakallah (2003) Mneimneh, M. N. and Sakallah, K. A. Computing vertex eccentricity in exponentially large graphs: QBF formulation and solution. In SAT, volume 2919 of Lecture Notes in Computer Science, pp. 411--425. Springer, 2003.
  • Nadel et al. (2013) Nadel, A., Ryvchin, V., and Strichman, O. Efficient MUS extraction with resolution. In FMCAD, pp. 197--200. IEEE, 2013.
  • Pasumarthi et al. (2018) Pasumarthi, R. K., Wang, X., Li, C., Bruch, S., Bendersky, M., Najork, M., Pfeifer, J., Golbandi, N., Anil, R., and Wolf, S. Tf-ranking: Scalable tensorflow library for learning-to-rank. CoRR, abs/1812.00073, 2018.
  • Rabe & Seshia (2016) Rabe, M. N. and Seshia, S. A. Incremental determinization. In SAT, volume 9710 of Lecture Notes in Computer Science, pp. 375--392. Springer, 2016.
  • Rabe et al. (2018) Rabe, M. N., Tentrup, L., Rasmussen, C., and Seshia, S. A. Understanding and extending incremental determinization for 2qbf. In Chockler, H. and Weissenbacher, G. (eds.), Computer Aided Verification, pp. 256--274, Cham, 2018. Springer International Publishing. ISBN 978-3-319-96142-2.
  • Remshagen & Truemper (2005) Remshagen, A. and Truemper, K. An effective algorithm for the futile questioning problem. J. Autom. Reasoning, 34(1):31--47, 2005.
  • Savitch (1970) Savitch, W. J. Relationships between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences, 4(2):177 -- 192, 1970. ISSN 0022-0000.
  • Selsam et al. (2018) Selsam, D., Lamm, M., Bünz, B., Liang, P., de Moura, L., and Dill, D. L. Learning a SAT solver from single-bit supervision. CoRR, abs/1802.03685, 2018.