1 Introduction
A propositional formula consists of Boolean constants (⊤: true, ⊥: false), Boolean variables (x_1, x_2, …), and propositional connectives such as ∧, ∨, ¬, and →. The SAT (Boolean satisfiability) problem, which asks whether a given formula can be satisfied (evaluate to ⊤) by assigning proper Boolean values to its variables, is the first problem proven NP-complete (Cook, 1971). As an extension of propositional formulae, QBF (Quantified Boolean Formula) allows quantifiers (∀ and ∃) over the Boolean variables. In general, a quantified Boolean formula can be expressed as:
Q_1 X_1 Q_2 X_2 … Q_n X_n. φ(X_1, …, X_n)
where each Q_i ∈ {∀, ∃} is a quantifier that differs from its neighboring quantifiers, X_1, …, X_n are disjoint sets of variables, and φ is a propositional formula with all Boolean variables bound. The QBF problem is PSPACE-complete (Savitch, 1970). To solve it, researchers have proposed incremental-determinization (Rabe & Seshia, 2016; Rabe et al., 2018) and CEGAR-based (Janota et al., 2016) solvers. These solvers are non-deterministic, e.g., they employ heuristic guidance to search for a solution. Recently, MaxSAT-based (Janota & Marques-Silva, 2011) and ML-based (Janota, 2018) heuristics have been incorporated into CEGAR-based solvers. Without relying on an existing decision procedure, Selsam et al. (2018) presented a GNN architecture that embeds propositional formulae, and Amizadeh et al. (2019) adopted an RL-style explore-exploit mechanism, though for circuit-SAT problems. However, these learned solvers do not handle unsatisfiable formulae. In light of the above, there is still no satisfactory general learned solver for QBF in practice. To this end, we focus in this paper on 2QBF formulae, the special case of QBF with only one alternation of quantifiers.
Extended from SAT, 2QBF problems keep attracting attention due to their practical uses (Mishchenko et al., 2015; Mneimneh & Sakallah, 2003; Remshagen & Truemper, 2005), yet they remain very challenging, like QBF in general. Formally, a 2QBF has the form ∀X∃Y. φ(X, Y), where X and Y are disjoint sets of variables and φ is a quantifier-free formula. The quantifier-free formula can be in Conjunctive Normal Form (CNF), where φ is a conjunction of clauses, clauses are disjunctions of literals, and each literal is either a variable or its negation. For example, ∀x_1 x_2 ∃y_1. (x_1 ∨ y_1) ∧ (¬x_2 ∨ ¬y_1) is a well-formed 2QBF in CNF. If φ is in CNF, we require that the ∀ quantifier is on the outside and the ∃ quantifier on the inside. Briefly, the 2QBF problem asks whether the formula evaluates to ⊤ under the ∀ and ∃ quantifications. 2QBF is presumably exponentially harder to solve than SAT, because it characterizes the second level of the polynomial hierarchy.
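To make the semantics concrete, the following sketch decides a small 2QBF by exhaustive enumeration. This is an illustrative helper of ours, not part of any solver discussed here; clauses are lists of signed integers in DIMACS style.

```python
from itertools import product

def clause_true(clause, assign):
    # a clause (list of signed DIMACS-style ints) holds if any literal is true
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)

def qbf2_true(cnf, xvars, yvars):
    """Evaluate 'forall X exists Y. cnf' by exhaustive enumeration (toy sizes only)."""
    for xs in product([False, True], repeat=len(xvars)):
        assign = dict(zip(xvars, xs))
        if not any(
            all(clause_true(c, {**assign, **dict(zip(yvars, ys))}) for c in cnf)
            for ys in product([False, True], repeat=len(yvars))
        ):
            return False  # this X-assignment witnesses unsatisfiability
    return True
```

For instance, `qbf2_true([[1, 3], [-1, -3]], [1], [3])` is true (choose y = ¬x), while adding the clause `[2, 3]` over a second universal variable makes the formula false.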
Our work explores several different designs for 2QBF solving via graph neural-symbolic reasoning. In Section 2, we investigate well-known GNN-based SAT solvers (Selsam et al., 2018; Amizadeh et al., 2019). We find these architectures hard to extend to 2QBF, because GNNs appear unable to reason about unsatisfiability; we adapt the GNN embedding accordingly. In Section 3, instead of an end-to-end solver, we propose three ways to learn GNN-based heuristics on behalf of a traditional CEGAR-based solver: ranking the candidates, ranking the counterexamples, and their combination. The heuristics are designed to avoid multiple GNN embeddings per formula, reducing the GNN inference overhead. Experiments showcase their benefits for 2QBF solving.
2 GNN-based 2QBF Solvers Fail
Let us first revisit the existing GNN-based SAT solvers and analyze why they fail to fit the 2QBF problem.
2.1 GNN for QBF
Embedding of SAT
SAT formulae are translated into bipartite graphs (Selsam et al., 2018), where literal nodes (L) form one kind of node and clause nodes (C) the other. We denote by the EdgeMatrix (M) the adjacency between literal and clause nodes, a matrix of dimension |L| × |C|. The graph of an example formula is given below.
In the following, L and C denote the embedding matrices of literals and clauses respectively; Msg_{L→C} denotes the messages from literals to clauses; MLP_L denotes the MLP of the literals for generating messages; LSTM_C denotes the LSTM of the clauses for digesting incoming messages and updating embeddings; A · B denotes matrix multiplication; Aᵀ denotes the transpose of A; [A, B] denotes matrix concatenation; and L¬ denotes the embeddings of the literals' negations.
The number of message-passing iterations is fixed during training but can be unbounded at test time.
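As a rough illustration of the message flow only (not the actual architecture: the MLP and LSTM updates are replaced by plain sums, and the function and argument names are ours), one round of literal-clause message passing can be sketched as:

```python
def mp_round(L_emb, C_emb, edges, neg_of):
    """One toy message-passing round on the literal-clause bipartite graph.
    L_emb, C_emb: per-node embeddings (lists of float lists);
    edges: (literal_index, clause_index) pairs; neg_of[i]: index of literal i's negation."""
    d = len(L_emb[0])
    # clauses aggregate messages from adjacent literals (stand-in for MLP_L / LSTM_C)
    C_new = [[0.0] * d for _ in C_emb]
    for lit, cl in edges:
        for k in range(d):
            C_new[cl][k] += L_emb[lit][k]
    # literals aggregate messages from adjacent clauses, plus their negation's embedding
    L_new = [[0.0] * d for _ in L_emb]
    for lit, cl in edges:
        for k in range(d):
            L_new[lit][k] += C_new[cl][k]
    for i in range(len(L_emb)):
        for k in range(d):
            L_new[i][k] += L_emb[neg_of[i]][k]
    return L_new, C_new
```

The flip-connection to each literal's negation is what lets information flow between x and ¬x even though they are distinct graph nodes.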
Embedding of 2QBF
We separate the ∀-literals and the ∃-literals into different groups and embed them via different NN modules. The graph representation of an example 2QBF is shown below:
We use L_∀ and L_∃ to denote all ∀-literals and all ∃-literals respectively, M_∀ and M_∃ to denote the EdgeMatrices between each literal group and the clauses, and separate MLPs to generate their messages.
We designed multiple architectures (details in the supplementary material) and use the best one, as described above, in the rest of the paper.
Data Preparation
For training and testing, we follow Chen & Interian (2005), who generate QBFs in conjunctive normal form. Specifically, we generate problems of specs (2, 3) and sizes (8, 10): each clause has 5 literals, 2 of them randomly chosen from a set of 8 ∀-quantified variables and 3 randomly chosen from a set of 10 ∃-quantified variables. We modify the generation procedure so that it keeps adding clauses until the formula becomes unsatisfiable. We then randomly negate one ∃-quantified literal per formula to make it satisfiable.
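A shrunken sketch of this generation loop, with our own illustrative parameters (1 universal + 2 existential literals per clause over 3/4 variables, so the brute-force 2QBF check stays cheap); note the final literal flip usually, though not provably, restores satisfiability:

```python
import random
from itertools import product

def qbf2_false(cnf, nx, ny):
    """True iff 'forall X exists Y. cnf' is false; vars 1..nx universal, rest existential."""
    for xs in product([False, True], repeat=nx):
        a = {i + 1: v for i, v in enumerate(xs)}
        if not any(
            all(any({**a, **{nx + j + 1: w for j, w in enumerate(ys)}}[abs(l)] == (l > 0)
                    for l in c)
                for c in cnf)
            for ys in product([False, True], repeat=ny)
        ):
            return True
    return False

def gen_pair(nx=3, ny=4, seed=0):
    rng = random.Random(seed)
    cnf = []
    # add random clauses until the 2QBF becomes unsatisfiable (evaluates to false)
    while not qbf2_false(cnf, nx, ny):
        xlit = rng.choice(range(1, nx + 1)) * rng.choice([1, -1])
        ylits = [v * rng.choice([1, -1]) for v in rng.sample(range(nx + 1, nx + ny + 1), 2)]
        cnf.append([xlit] + ylits)
    unsat = [c[:] for c in cnf]
    sat = [c[:] for c in cnf]
    # negate one existential literal of a random clause
    i = rng.randrange(len(sat))
    sat[i][-1] = -sat[i][-1]
    return unsat, sat
```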
SAT/UNSAT Prediction
As shown in Table 1, each block of entries gives the accuracy on UNSAT and SAT formulae respectively; the upper row of each block is training accuracy, and the lower row ("test") is accuracy on 600 pairs of test formulae, with up to 1000 message-passing iterations allowed. GNNs fit the smaller training datasets well but struggle with 160 pairs of formulae. Test performance deteriorates as embedding iterations increase, and most GNNs become very biased at high iteration counts.
Table 1: Accuracy as (UNSAT, SAT) on training (train) and testing (test) formulae.

Dataset            40 pairs      80 pairs      160 pairs
8 iters  (train)   (0.98, 0.94)  (1.00, 0.92)  (0.84, 0.76)
         (test)    (0.40, 0.64)  (0.50, 0.48)  (0.50, 0.50)
16 iters (train)   (1.00, 1.00)  (0.96, 0.96)  (0.88, 0.70)
         (test)    (0.54, 0.46)  (0.52, 0.52)  (0.54, 0.48)
32 iters (train)   (1.00, 1.00)  (0.98, 0.98)  (0.84, 0.80)
         (test)    (0.32, 0.68)  (0.52, 0.50)  (0.52, 0.50)
Table 2: Accuracy as (per-variable, per-formula) for predicting witnesses of unsatisfiability, on training (train) and testing (test) formulae.

Dataset            160 unsat     320 unsat     640 unsat
8 iters  (train)   (1.00, 0.99)  (0.95, 0.72)  (0.82, 0.28)
         (test)    (0.64, 0.06)  (0.67, 0.05)  (0.69, 0.05)
16 iters (train)   (1.00, 1.00)  (0.98, 0.87)  (0.95, 0.69)
         (test)    (0.64, 0.05)  (0.65, 0.05)  (0.65, 0.06)
32 iters (train)   (1.00, 1.00)  (0.99, 0.96)  (0.91, 0.57)
         (test)    (0.63, 0.05)  (0.64, 0.05)  (0.63, 0.05)
Witnesses of UNSAT
Proving unsatisfiability of a 2QBF requires a witness of unsatisfiability: an assignment to the ∀ variables under which the simplified propositional formula is unsatisfiable. We use logistic regression in this experiment: the final embeddings of the ∀ variables are transformed into logits via an MLP and used to compute a cross-entropy loss against the known witnesses of unsatisfiability of the formulae. This training task is very similar to Amizadeh et al. (2019), except that our GNN has to reason about unsatisfiability of the simplified SAT formulae, which we believe is infeasible. We summarize the results in Table 2. In each block of entries, we list the accuracy per variable on the left and the accuracy per formula on the right; the upper half of each block is for training data, the lower half for testing data. The table shows that GNNs fit the training data well, and more iterations of message-passing give a better fit. However, performance on testing data is only slightly better than random, and more iterations at test time do not help.
2.2 Why GNN-based QBF Solvers Fail
We conjecture that current GNN architectures and embedding processes are unlikely to prove unsatisfiability or to reason about all possible assignments. Even for SAT (Selsam et al., 2018), GNNs are good at finding solutions for satisfiable formulae but cannot confidently prove unsatisfiability. Similarly, Amizadeh et al. (2019) had little success in proving unsatisfiability with DAG embeddings, because showing SAT only needs a witness, while proving UNSAT requires more complete reasoning about the search space. A DPLL-based approach would iterate over all possible assignments and construct a proof of UNSAT. A GNN embedding process, however, neither follows a strict order of assignments nor learns new knowledge indicating that certain assignments should be avoided. In fact, GNN embedding may be closest to vanilla WalkSAT approaches, with randomly initialized assignments and stochastic local search, which cannot prove unsatisfiability.
This conjecture poses a great obstacle to learning 2QBF solvers with GNNs, because proving either satisfiability or unsatisfiability of a 2QBF needs more than a single witness. If the formula is satisfiable, a proof must provide assignments to the ∃ variables under all possible assignments of the ∀ variables, or reject all candidate witnesses in a CEGAR-based solver. If the formula is unsatisfiable, the procedure must find an assignment to the ∀ variables under which the simplified formula is unsatisfiable.
3 Learning GNN-based Heuristics
Section 2 indicates that end-to-end GNN-based 2QBF solvers are unlikely to be learned with current architectures, so the success of learning SAT solvers (Selsam et al., 2018; Amizadeh et al., 2019) does not simply extend to 2QBF or more expressive logics. We therefore consider learning heuristics for a CEGAR-based solving algorithm, while keeping the GNN inference overhead low. We first present the CEGAR-based solving procedure in Algorithm 1 (Janota & Marques-Silva, 2011).
Note that ω is the set of constraints on candidates. Initially, ω is empty (⊤), and any assignment of the ∀ variables can be proposed as a candidate, which reduces the problem to a smaller propositional formula over the ∃ variables. If we can find an assignment to the ∃ variables that satisfies this propositional formula, that assignment is called a counterexample to the candidate. We denote by S the set of all clauses in φ that are satisfied by the counterexample alone. The counterexample can then be transformed into a constraint stating that the next candidates must not simultaneously satisfy all the remaining clauses (φ \ S), since such candidates are already rejected by the current counterexample. This constraint is added to ω as a propositional term, so finding new candidates amounts to solving the propositional formula derived from the constraints in ω.
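A minimal sketch of this loop, with brute-force enumeration standing in for both SAT calls, and with the constraint set represented implicitly as the list of collected counterexamples (a candidate is admissible iff no recorded counterexample satisfies the formula together with it):

```python
from itertools import product

def cnf_sat(cnf, assign):
    # CNF over signed DIMACS-style ints; assign maps variable -> bool
    return all(any(assign[abs(l)] == (l > 0) for l in c) for c in cnf)

def cegar_2qbf(cnf, xvars, yvars):
    """Decide 'forall X exists Y. cnf'. Returns (is_true, cegar_steps)."""
    counterexamples = []  # each blocks every candidate it satisfies the formula with
    steps = 0
    while True:
        # candidate proposal: an X-assignment not refuted by any stored counterexample
        candidate = None
        for xs in product([False, True], repeat=len(xvars)):
            x = dict(zip(xvars, xs))
            if all(not cnf_sat(cnf, {**x, **y}) for y in counterexamples):
                candidate = x
                break
        if candidate is None:
            return True, steps   # every candidate refuted: the 2QBF is true
        steps += 1
        # counterexample search on the formula simplified by the candidate
        for ys in product([False, True], repeat=len(yvars)):
            y = dict(zip(yvars, ys))
            if cnf_sat(cnf, {**candidate, **y}):
                counterexamples.append(y)
                break
        else:
            return False, steps  # candidate is a witness of unsatisfiability
```

Counting `steps` this way corresponds to the solving-step metric the later experiments report.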
3.1 Ranking the Candidates
To decide which candidate to propose among the solutions returned by the SAT solver, we can rank them MaxSAT-style: simplify the formula with each candidate and rank by the number of clauses the candidate satisfies. We use this as a benchmark for comparison. Alternatively, the hardness of a candidate can be measured by the number of solutions of the simplified propositional formula: the fewer solutions, the harder the candidate is to refute. The training data for our ranking GNN is therefore all possible assignments of the ∀ variables, with ranking scores negatively related to the number of solutions of each assignment-simplified propositional formula (details on computing the ranking scores are given in the supplementary material).
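Computed by brute force, the hardness ordering looks as follows (an illustrative sketch of ours; the actual score construction is in the supplementary material):

```python
from itertools import product

def num_extensions(cnf, yvars, candidate):
    """Number of Y-assignments satisfying the formula simplified by this candidate."""
    count = 0
    for ys in product([False, True], repeat=len(yvars)):
        a = {**candidate, **dict(zip(yvars, ys))}
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in cnf):
            count += 1
    return count

def hardness_ranking(cnf, xvars, yvars):
    """All candidate X-assignments, hardest first (fewest surviving Y-solutions)."""
    cands = [dict(zip(xvars, xs)) for xs in product([False, True], repeat=len(xvars))]
    return sorted(cands, key=lambda c: num_extensions(cnf, yvars, c))
```

A candidate with zero extensions is itself a witness of unsatisfiability, so ranking it first ends the CEGAR loop immediately.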
We extend the GNN embedding architecture so that the final embeddings of the ∀ variables are transformed into a scoring matrix for candidates via an MLP. A batch of candidates is then ranked by passing it through a two-layer MLP without biases, where the weights of the first layer are the scoring matrix and the weights of the second layer are a learned weight vector. We use the TensorFlow Ranking library (Pasumarthi et al., 2018) to compute the pairwise logistic loss with NDCG lambda weights for supervised training. We evaluate the ranking heuristics by adding them to the CEGAR cycle and measuring the average number of steps needed to solve the problems. This requires changing the SAT subroutine to a multi-solution subroutine: once a solution is found, it is added back to the formula as a constraint and the search continues for a different solution, until no more solutions exist or a maximal number of solutions is reached. The heuristic then ranks the solutions and proposes the best one as the candidate. We use 4 datasets: (1) TrainU: 1000 unsatisfiable formulae for training; (2) TrainS: 1000 satisfiable formulae for training; (3) TestU: 600 unsatisfiable formulae for testing; (4) TestS: 600 satisfiable formulae for testing. We compare 4 ranking heuristics: (1) none: no ranking; (2) MaxSAT: ranking by the number of satisfied clauses via on-the-fly formula simplification; (3) GNN1: ranking by hardness via GNN model inference; (4) GNN2: ranking by the number of satisfied clauses via GNN model inference. As shown in Table 3, all 3 ranking heuristics speed up solving on all 4 datasets. Unsatisfiable formulae benefit more from the heuristics, and the heuristics generalize very well from training formulae to testing formulae. Machine-learning results are repeated twice with different random seeds, and the numbers shown are from the models with the best performance on the training data.
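At inference time, the bias-free two-layer scorer described above reduces to two matrix products. A plain-Python sketch (shapes and names are our assumptions):

```python
def rank_batch(candidates, scoring_matrix, weight_vector):
    """candidates: B rows of length n; scoring_matrix: n x d (first-layer weights);
    weight_vector: length d (second-layer weights). Returns indices, best first."""
    n, d = len(scoring_matrix), len(scoring_matrix[0])
    scores = []
    for cand in candidates:
        # first layer: candidate row times scoring matrix, no bias
        hidden = [sum(cand[i] * scoring_matrix[i][j] for i in range(n)) for j in range(d)]
        # second layer: dot product with the weight vector, no bias
        scores.append(sum(h * w for h, w in zip(hidden, weight_vector)))
    return sorted(range(len(candidates)), key=lambda i: -scores[i])
```

During training, the scoring matrix and weight vector are fit with TensorFlow Ranking's pairwise logistic loss; here they are simply inputs.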
Table 3: Average CEGAR steps per formula when ranking candidates.

DataSet   TrainU   TrainS   TestU    TestS
none      21.976   34.783   21.945   33.885
MaxSAT    13.144   30.057   12.453   28.863
GNN1      13.843   31.704   13.988   30.573
GNN2      15.287   32.000   14.473   30.788
3.2 Ranking the Counterexamples
We now consider a GNN-based heuristic for ranking counterexamples. Each counterexample contributes a constraint to ω, which either shrinks the search space of witnesses of unsatisfiability or, once ω becomes unsatisfiable, shows that no candidate is a witness of unsatisfiability.
We compute ranking scores for our training data as follows. For satisfiable 2QBF instances in the training data, we list all possible assignments of the ∃ variables and collect all the constraining clauses in ω. We then solve ω with hmucSAT (Nadel et al., 2013), seeking unsatisfiability cores. Initially we planned to give a high ranking score (10) to assignments whose corresponding clauses appear in the unsatisfiability core, and a low ranking score (1) to all other assignments. Since unsatisfiability cores are often small, we instead give the other assignments ranking scores based on the number of clauses they satisfy.
For unsatisfiable 2QBF instances, we likewise collect all the constraining clauses in ω. In this case ω is satisfiable, and its solutions are witnesses of unsatisfiability. To obtain unsatisfiability cores, we add solutions of ω back to ω as extra constraints until ω becomes unsatisfiable, and then compute the ranking scores as above.
For comparison, we use another dataset whose ranking scores are based entirely on the number of clauses satisfied. To add the ranking module, we extend the GNN embedding architecture as in Section 3.1: a scoring matrix is produced by an MLP from the final embeddings of the ∃ variables, and a batch of counterexamples is scored with it and a second-layer weight vector.
Table 4: Average CEGAR steps per formula when ranking counterexamples.

DataSet   TrainU   TrainS   TestU    TestS
none      21.976   34.783   21.945   33.885
MaxSAT    14.754   22.265   14.748   21.638
GNN1      17.492   26.962   17.198   26.598
GNN2      16.950   26.717   16.743   26.325
After supervised training, we evaluate the trained GNN-based ranking heuristics in a CEGAR-based solver; the results are shown in Table 4. Judging from the MaxSAT rows, ranking counterexamples benefits satisfiable formulae more than unsatisfiable ones. However, GNN1 performs worse than GNN2. The likely explanation is that predicting unsatisfiability cores is far too complicated for a GNN: knowledge of the cores cannot be obtained from each counterexample alone, but requires analyzing all counterexamples collectively. This again points to the limitation of GNNs in reasoning about "all possible solutions", and the added core information acts as interference rather than useful knowledge for the GNN-based ranking heuristic. Machine-learning results are repeated twice, and we report the models with the best performance on the training data.
Table 5: Average CEGAR steps per formula when ranking both candidates and counterexamples.

DataSet   TrainU   TrainS   TestU    TestS
none      21.976   34.783   21.945   33.885
MaxSAT     9.671   20.777    9.425   19.883
GNN1      11.686   25.021   11.605   24.518
GNN2      12.505   25.505   12.220   24.938
GNN3      11.250   24.760   12.008   24.295
3.3 Combination of the Heuristics
To combine the candidate-ranking and counterexample-ranking heuristics in a single solver, we extend the GNN embedding architecture and train it with ranking data for both candidates and counterexamples. GNN1 is trained with ranking scores from hardness (candidates) and unsatisfiability cores (counterexamples); GNN2 with ranking scores from the number of satisfied clauses for both; and GNN3 with ranking scores from hardness for candidates and the number of satisfied clauses for counterexamples. As shown in Table 5, GNN3 is arguably the best model we obtained from supervised learning via this ranking method. All machine-learning results are repeated twice with different random seeds, and the models with the best performance on the training data are reported.
4 Conclusion
In this paper, we showed that learning end-to-end GNN-based 2QBF solvers is hard with current GNN architectures, due to their inability to reason about unsatisfiability. We instead extended GNN-based solving to 2QBF through CEGAR-based heuristics, developing a suite of GNN-based ranking techniques on top of the GNN embeddings of 2QBF formulae. Their benefits are demonstrated in our experiments.
References
 Amizadeh et al. (2019) Amizadeh, S., Matusevych, S., and Weimer, M. Learning to solve circuit-SAT: An unsupervised differentiable approach. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=BJxgz2R9t7.
 Chen & Interian (2005) Chen, H. and Interian, Y. A model for generating random quantified boolean formulas. In IJCAI, pp. 66–71. Professional Book Center, 2005.

 Cook (1971) Cook, S. A. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, STOC '71, pp. 151–158, New York, NY, USA, 1971. ACM. doi: 10.1145/800157.805047. URL http://doi.acm.org/10.1145/800157.805047.
 Janota (2018) Janota, M. Towards generalization in QBF solving via machine learning. In AAAI, pp. 6607–6614. AAAI Press, 2018.
 Janota & Marques-Silva (2011) Janota, M. and Marques-Silva, J. P. Abstraction-based algorithm for 2QBF. In SAT, volume 6695 of Lecture Notes in Computer Science, pp. 230–244. Springer, 2011.
 Janota et al. (2016) Janota, M., Klieber, W., Marques-Silva, J., and Clarke, E. M. Solving QBF with counterexample guided refinement. Artif. Intell., 234:1–25, 2016.
 Mishchenko et al. (2015) Mishchenko, A., Brayton, R. K., Feng, W., and Greene, J. W. Technology mapping into general programmable cells. In FPGA, pp. 70–73. ACM, 2015.
 Mneimneh & Sakallah (2003) Mneimneh, M. N. and Sakallah, K. A. Computing vertex eccentricity in exponentially large graphs: QBF formulation and solution. In SAT, volume 2919 of Lecture Notes in Computer Science, pp. 411–425. Springer, 2003.
 Nadel et al. (2013) Nadel, A., Ryvchin, V., and Strichman, O. Efficient MUS extraction with resolution. In FMCAD, pp. 197–200. IEEE, 2013.
 Pasumarthi et al. (2018) Pasumarthi, R. K., Wang, X., Li, C., Bruch, S., Bendersky, M., Najork, M., Pfeifer, J., Golbandi, N., Anil, R., and Wolf, S. TF-Ranking: Scalable TensorFlow library for learning-to-rank. CoRR, abs/1812.00073, 2018.
 Rabe & Seshia (2016) Rabe, M. N. and Seshia, S. A. Incremental determinization. In SAT, volume 9710 of Lecture Notes in Computer Science, pp. 375–392. Springer, 2016.
 Rabe et al. (2018) Rabe, M. N., Tentrup, L., Rasmussen, C., and Seshia, S. A. Understanding and extending incremental determinization for 2QBF. In Chockler, H. and Weissenbacher, G. (eds.), Computer Aided Verification, pp. 256–274, Cham, 2018. Springer International Publishing. ISBN 978-3-319-96142-2.
 Remshagen & Truemper (2005) Remshagen, A. and Truemper, K. An effective algorithm for the futile questioning problem. , 34(1):31–47, 2005.
 Savitch (1970) Savitch, W. J. Relationships between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences, 4(2):177–192, 1970. ISSN 0022-0000.
 Selsam et al. (2018) Selsam, D., Lamm, M., Bünz, B., Liang, P., de Moura, L., and Dill, D. L. Learning a SAT solver from singlebit supervision. CoRR, abs/1802.03685, 2018.