1 Introduction
The research community has been witnessing increasing attention devoted to the integration of learning and reasoning, especially through the combination of neural networks and symbolic AI [10, 3] into neurosymbolic systems. At the core of a neurosymbolic system there is an algorithm to represent symbolic knowledge in a neural network. One of the goals is to leverage the parallel and distributed properties of the network to perform reasoning. In many neurosymbolic approaches, the most used form of knowledge representation is if-then rules, whereby logical reasoning is built upon Modus Ponens as the only rule of inference [18, 5, 19, 4, 20, 9]. Given a formula $B \leftarrow A$ (read "B if A" following a logic programming notation), a neural network would either infer approximately $B = True$ given $A = True$ by forward chaining, or search for the value of $A$ to confirm or refute the hypothesis $B$ (backward chaining). This has two shortcomings. First, Modus Ponens alone may not capture entirely the power of logical reasoning as required by an application. For example, it may be the case in an application that if $B = False$, the neural network is expected to infer approximately $A = False$ (Modus Tollens). Second, one may wish to allow other forms of rules to be represented by the neural network, such as disjunctive normal form (DNF) with any number of negative literals.
In this paper, we introduce Logical Boltzmann Machines (LBM), a neurosymbolic system that can represent any propositional logic formula in a neural network and achieve efficient reasoning using restricted Boltzmann machines. We introduce an algorithm to translate any logical formula described in DNF into a Boltzmann machine, and we show equivalence between the logical formula and the energy-based connectionist model; in other words, we show soundness of the translation algorithm. Specifically, the connectionist model will assign minimum energy to the assignments of truth-values that satisfy the formula. This produces a new way of performing reasoning in neural networks, by employing the network to search for the models of the logical formula, that is, to search for assignments of truth-values that map the logical formula to true. We show that Gibbs sampling can be applied efficiently to this search when the number of variables in the logical formula is large. If the number of variables is small, inference can be carried out analytically by sorting the free energy of all possible truth-value assignments.
Returning to our example with formula $B \leftarrow A$, Logical Boltzmann Machines can infer approximately $B = True$ given $A = True$, and infer approximately $A = False$ given $B = False$, since both truth-value assignments $(A = True, B = True)$ and $(A = False, B = False)$ would minimise the energy of the network. As for the representation of rules other than if-then rules, we propose a new way of converting any logical formula into strict DNF (SDNF), which is shown to map conveniently onto Restricted Boltzmann Machines (RBMs). In the experiments reported in this paper, this new mapping into SDNF and RBMs is shown to enable approximate reasoning with a large number of variables. The proposed approach is evaluated in a logic programming benchmark task whereby machine learning models are trained from data and background knowledge. Logical Boltzmann Machines achieved a better training performance (higher test set accuracy) in five out of seven data sets when evaluated empirically on this benchmark, in comparison with a purely-symbolic learning system (the Inductive Logic Programming system Aleph [17]), a neurosymbolic system for Inductive Logic Programming (CILP++ [5]) and a purely-connectionist system (standard RBMs [15]).
The contribution of this work is twofold:

A theoretical proof of the equivalence between logical formulae and a class of probabilistic neural networks, namely Restricted Boltzmann Machines, which can facilitate neurosymbolic learning and reasoning.

A foundation for the employment of statistical inference methods to perform logical reasoning.
The remainder of the paper is organised as follows. In the next section, we review the related work. Section 3 describes and proves correctness of the mapping from any logical formula into SDNF and then RBMs. Section 4 defines reasoning by sampling and energy minimization in RBMs. Section 5 introduces the LBM system and evaluates scalability of reasoning with an increasing number of variables. Section 6 contains the experimental results on learning and the comparison with a symbolic, a neurosymbolic, and a purely-neural learning system. We then conclude the paper and discuss directions for future work.
2 Related Work
One of the earliest works on the integration of neural networks and symbolic knowledge is the Knowledge-based Artificial Neural Network [18], which encodes if-then rules into a hierarchy of perceptrons. In another early approach [6], a single-hidden-layer neural network with recurrent connections is proposed to support logic programming rules. An extension of that approach to work with first-order logic programs, called CILP++ [5], uses the concept of propositionalisation from Inductive Logic Programming (ILP), whereby first-order variables can be treated as propositional atoms in the neural network. Also based on first-order logic programs, [4] proposes a differentiable ILP approach that can be implemented by neural networks, while [2] maps stochastic logic programs into a differentiable function also trainable by neural networks. These are all supervised learning approaches.
Among unsupervised learning approaches, Penalty Logic [12] was the first work to integrate propositional and nonmonotonic logic formulae into symmetric networks. However, it required the use of higher-order Hopfield networks, which can become complicated to construct and inefficient to train with the learning algorithm of Boltzmann machines (BM). Such higher-order networks require transforming the energy function into a quadratic form by adding hidden variables not present in the original logic formulae for the purpose of building the network. More recently, several attempts have been made to extract and encode symbolic knowledge into RBMs [11, 19]. These are based on the structural similarity between symmetric networks and biconditional logical statements and do not contain soundness results. By contrast, and similarly to Penalty Logic, the approach introduced in this paper is based on a proof of equivalence between the logic formulae and the symmetric networks, but without requiring higher-order networks.

Alongside the above approaches, which translate from a symbolic to a neural representation, normally from if-then rules to a feedforward or recurrent neural network, there are also hybrid approaches, which combine neural networks with symbolic AI systems and logic operators, such as DeepProbLog [9] and Logic Tensor Networks (LTN) [14]. While DeepProbLog adds a neural network module to probabilistic logic programming, LTN represents the level of truth of first-order logic statements in the neural network. LTN employs a discriminative approach to infer the level of truth, rather than the generative approach adopted in this paper by LBM. Although a discriminative approach can be very useful for evaluating assignments of variables to truth-values, we argue that it is not adequate for the implementation of a search for the satisfying assignments of logical formulae. For this purpose, a generative approach is needed, as proposed in this paper with LBM.

3 Knowledge Representation in RBMs
An RBM [15] can be seen as a two-layer neural network with bidirectional (symmetric) connections, which is characterised by a function called the energy of the RBM:

$E(\mathbf{x}, \mathbf{h}) = -\sum_{i,j} w_{ij} x_i h_j - \sum_i a_i x_i - \sum_j b_j h_j$   (1)

where $a_i$ and $b_j$ are the biases of input unit $x_i$ and hidden unit $h_j$, respectively, and $w_{ij}$ is the connection weight between $x_i$ and $h_j$. This RBM represents a joint distribution $p(\mathbf{x}, \mathbf{h}) = \frac{1}{Z} e^{-E(\mathbf{x}, \mathbf{h})/\tau}$, where $Z = \sum_{\mathbf{x}, \mathbf{h}} e^{-E(\mathbf{x}, \mathbf{h})/\tau}$ is the partition function, the parameter $\tau$ is called the temperature of the RBM, $\mathbf{x}$ is the set of visible units and $\mathbf{h}$ is the set of hidden units in the RBM.

In propositional logic, any well-formed formula (WFF) can be mapped into Disjunctive Normal Form (DNF), i.e. disjunctions ($\vee$) of conjunctions ($\wedge$), as follows:

$\varphi = \bigvee_j \Big( \bigwedge_{t \in \mathcal{S}^+_j} x_t \wedge \bigwedge_{k \in \mathcal{S}^-_j} \neg x_k \Big)$

where $\bigwedge_{t \in \mathcal{S}^+_j} x_t \wedge \bigwedge_{k \in \mathcal{S}^-_j} \neg x_k$ is called a conjunctive clause, e.g. $x_1 \wedge \neg x_2 \wedge x_3$. Here, we denote the propositional variables (literals) as $x_t$ for positive literals (e.g. $x_1$) and $\neg x_k$ for negative literals (e.g. $\neg x_2$); $\mathcal{S}^+_j$ and $\mathcal{S}^-_j$ denote, respectively, the sets of indices of the positive literals and of the negative literals in conjunctive clause $j$ of the formula. This notation may seem overcomplicated but it will be useful in the proof of soundness of our translation from SDNF to RBMs.
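To make the notation concrete, the following minimal NumPy sketch (our illustration, not the authors' implementation; the weights and biases are arbitrary) evaluates the RBM energy in (1) for binary vectors:

```python
import numpy as np

def rbm_energy(x, h, W, a, b):
    """Energy of an RBM: E(x, h) = -x^T W h - a^T x - b^T h."""
    return -(x @ W @ h) - a @ x - b @ h

# Tiny RBM: 2 visible units, 1 hidden unit (illustrative parameters only).
W = np.array([[1.0], [-1.0]])   # w_ij: weight between visible x_i and hidden h_j
a = np.zeros(2)                 # visible biases a_i
b = np.array([-0.5])            # hidden biases b_j
x = np.array([1.0, 0.0])
h = np.array([1.0])
print(rbm_energy(x, h, W, a, b))  # -> -0.5
```

The joint distribution then weights each configuration by $e^{-E/\tau}$, so low-energy configurations are the most probable ones.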
Definition 1
Let $s_\varphi(\mathbf{x}) \in \{0, 1\}$ denote the truth-value of a WFF $\varphi$ given an assignment of truth-values $\mathbf{x}$ to the literals of $\varphi$, with truth-value True mapped to 1 and truth-value False mapped to 0. Let $E_\mathcal{N}(\mathbf{x}, \mathbf{h})$ denote the energy function of an energy-based neural network $\mathcal{N}$ with visible units $\mathbf{x}$ and hidden units $\mathbf{h}$. $\varphi$ is said to be equivalent to $\mathcal{N}$ if and only if for any assignment $\mathbf{x}$ there exists a function $\psi$ such that $s_\varphi(\mathbf{x}) = \psi(\min_{\mathbf{h}} E_\mathcal{N}(\mathbf{x}, \mathbf{h}))$.
This definition of equivalence is similar to that of Penalty Logic [12], whereby all assignments of truthvalues satisfying a WFF are mapped to global minima of the energy function of network . In our case, by construction, assignments that do not satisfy the WFF are mapped to maxima of the energy function.
Definition 2
A strict DNF (SDNF) is a DNF with at most one conjunctive clause that maps to True for any assignment of truth-values $\mathbf{x}$. A full DNF is a DNF where each propositional variable must appear at least once in every conjunctive clause.
Lemma 1
Any SDNF $\varphi$ can be mapped onto an energy function:

$E_\varphi(\mathbf{x}) = -\sum_j \prod_{t \in \mathcal{S}^+_j} x_t \prod_{k \in \mathcal{S}^-_j} (1 - x_k)$

where $\mathcal{S}^+_j$ (resp. $\mathcal{S}^-_j$) is the set of indices of the positive (resp. negative) literals in conjunctive clause $j$ of $\varphi$.
Proof. Each conjunctive clause $j$ in $\varphi$ can be represented by the product $\prod_{t \in \mathcal{S}^+_j} x_t \prod_{k \in \mathcal{S}^-_j} (1 - x_k)$, which maps to 1 if and only if $x_t = 1$ (i.e. True) and $x_k = 0$ (i.e. False) for all $t \in \mathcal{S}^+_j$ and $k \in \mathcal{S}^-_j$. Since $\varphi$ is an SDNF, it is True if and only if exactly one conjunctive clause is True. Then, the sum $\sum_j \prod_{t \in \mathcal{S}^+_j} x_t \prod_{k \in \mathcal{S}^-_j} (1 - x_k) = 1$ if and only if the assignment of truth-values $\mathbf{x}$ is a model of $\varphi$. Hence, the neural network with energy function $E_\varphi$ is such that $s_\varphi(\mathbf{x}) = -E_\varphi(\mathbf{x})$.
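Lemma 1 can be checked mechanically on a small formula. The sketch below (our illustration; the clause encoding as pairs of index sets is an assumption of ours) represents each conjunctive clause by its positive and negative index sets and verifies $s_\varphi(\mathbf{x}) = -E_\varphi(\mathbf{x})$ for the two-variable exclusive-or $(x \wedge \neg y) \vee (\neg x \wedge y)$, used here as a small stand-in that is already an SDNF:

```python
from itertools import product

def clause_term(x, pos, neg):
    """prod_{t in pos} x_t * prod_{k in neg} (1 - x_k): 1 iff the clause is satisfied."""
    term = 1
    for t in pos:
        term *= x[t]
    for k in neg:
        term *= 1 - x[k]
    return term

def sdnf_energy(x, clauses):
    """E_phi(x) = -sum_j clause_term_j(x); equals -1 iff x is a model of the SDNF."""
    return -sum(clause_term(x, pos, neg) for pos, neg in clauses)

# (x AND NOT y) OR (NOT x AND y), with variable indices x = 0, y = 1.
xor = [({0}, {1}), ({1}, {0})]
for x in product((0, 1), repeat=2):
    s = max(clause_term(x, p, n) for p, n in xor)   # truth-value s_phi(x)
    assert s == -sdnf_energy(x, xor)                # Lemma 1: s_phi(x) = -E_phi(x)
    print(x, sdnf_energy(x, xor))                   # -1 on models, 0 otherwise
```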
Example 1
The XOR formula $\varphi = z \leftrightarrow (x \oplus y)$, cf. Table 1, can be converted into an SDNF as follows:

$\varphi \equiv (x \wedge \neg y \wedge z) \vee (\neg x \wedge y \wedge z) \vee (x \wedge y \wedge \neg z) \vee (\neg x \wedge \neg y \wedge \neg z)$

For each conjunctive clause in $\varphi$, a corresponding expression is added to the energy function, e.g. $-x(1-y)z$ corresponding to clause $x \wedge \neg y \wedge z$. Hence, the energy function equivalent to $\varphi$ becomes:

$E_\varphi = -x(1-y)z - (1-x)yz - xy(1-z) - (1-x)(1-y)(1-z)$
We now show that any SDNF can be mapped onto an RBM.
Theorem 1
Any SDNF $\varphi$ can be mapped onto an equivalent RBM with energy function

$E(\mathbf{x}, \mathbf{h}) = -\sum_j h_j \Big( \sum_{t \in \mathcal{S}^+_j} x_t - \sum_{k \in \mathcal{S}^-_j} x_k - |\mathcal{S}^+_j| + \epsilon \Big)$   (2)

where $0 < \epsilon < 1$; $\mathcal{S}^+_j$ and $\mathcal{S}^-_j$ are, respectively, the sets of indices of the positive and negative literals in conjunctive clause $j$ of the SDNF, and $|\mathcal{S}^+_j|$ is the number of positive literals in conjunctive clause $j$.
Proof. We have seen in Lemma 1 that any SDNF $\varphi$ can be mapped onto the energy function $E_\varphi(\mathbf{x}) = -\sum_j \prod_{t \in \mathcal{S}^+_j} x_t \prod_{k \in \mathcal{S}^-_j} (1 - x_k)$. For each product $\prod_{t \in \mathcal{S}^+_j} x_t \prod_{k \in \mathcal{S}^-_j} (1 - x_k)$, we define an energy expression associated with hidden unit $h_j$ as $E_j = -h_j \big( \sum_{t \in \mathcal{S}^+_j} x_t - \sum_{k \in \mathcal{S}^-_j} x_k - |\mathcal{S}^+_j| + \epsilon \big)$. $E_j$ is minimised with value $-\epsilon$ when $h_j = 1$ and $\sum_{t \in \mathcal{S}^+_j} x_t - \sum_{k \in \mathcal{S}^-_j} x_k - |\mathcal{S}^+_j| + \epsilon = \epsilon$, which holds if and only if $x_t = 1$ and $x_k = 0$ for all $t \in \mathcal{S}^+_j$ and $k \in \mathcal{S}^-_j$. Otherwise, $\sum_{t \in \mathcal{S}^+_j} x_t - \sum_{k \in \mathcal{S}^-_j} x_k - |\mathcal{S}^+_j| + \epsilon \leq \epsilon - 1 < 0$ and $\min_{h_j} E_j = 0$, with $h_j = 0$. By repeating this process for each product we obtain that any SDNF $\varphi$ is equivalent to an RBM with the energy function in (2), such that $s_\varphi(\mathbf{x}) = -\frac{1}{\epsilon} \min_{\mathbf{h}} E(\mathbf{x}, \mathbf{h})$.
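Theorem 1 is constructive: each conjunctive clause yields one hidden unit with $\pm 1$ visible-to-hidden weights and bias $-(|\mathcal{S}^+_j| - \epsilon)$. The sketch below (our illustration, again using the two-variable exclusive-or as a stand-in) builds such an RBM and checks that $\min_{\mathbf{h}} E(\mathbf{x}, \mathbf{h})$ equals $-\epsilon$ exactly on the models of the formula and $0$ elsewhere:

```python
from itertools import product
import numpy as np

def build_rbm(clauses, n_vars, eps=0.5):
    """One hidden unit per conjunctive clause (pos, neg) of an SDNF."""
    W = np.zeros((n_vars, len(clauses)))
    b = np.zeros(len(clauses))
    for j, (pos, neg) in enumerate(clauses):
        for t in pos:
            W[t, j] = 1.0
        for k in neg:
            W[k, j] = -1.0
        b[j] = -(len(pos) - eps)   # bias: -(|S+_j| - eps)
    return W, b

def min_energy(x, W, b):
    """min over h in {0,1}^H: each h_j contributes min(0, -(x . W_j + b_j))."""
    return -np.maximum(x @ W + b, 0.0).sum()

xor = [({0}, {1}), ({1}, {0})]          # SDNF of x XOR y
W, b = build_rbm(xor, n_vars=2)
for x in product((0.0, 1.0), repeat=2):
    print(x, min_energy(np.array(x), W, b))   # -0.5 on models, 0.0 otherwise
```

Because each hidden unit's contribution is independent, the minimisation over $\mathbf{h}$ reduces to a per-unit `max(0, ...)`, which is what makes reasoning in the resulting RBM tractable.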
Applying Theorem 1, an RBM for the XOR formula can be built as shown in Figure 1. We choose $\epsilon = 0.5$. The energy function of this RBM is:

$E = -h_1(x - y + z - 1.5) - h_2(-x + y + z - 1.5) - h_3(x + y - z - 1.5) - h_4(-x - y - z + 0.5)$

For comparison, one may construct an RBM for XOR using Penalty Logic, as follows. First, a higher-order energy function is computed; then it is transformed into a quadratic form by adding a hidden variable. The result is still not the energy function of an RBM, so hidden variables keep being added until the energy function of an RBM may be obtained.
The above example illustrates in a simple case the value of using SDNF, in that it produces a direct translation into efficient RBMs by contrast with existing approaches. Next, we discuss the challenges of the conversion of WFFs into SDNF.
Representation Capacity: It is well-known that any formula can be converted into DNF. If the DNF is not strict then by definition there is a group of conjunctive clauses in it which map to True when the formula is satisfied. This group of conjunctive clauses can always be converted into a full DNF, which is also an SDNF. Therefore, any WFF can be converted into an SDNF and, from Theorem 1, it follows that any WFF can be represented by the energy of an RBM. For example, the DNF $x \vee y$ becomes the SDNF $(x \wedge y) \vee (x \wedge \neg y) \vee (\neg x \wedge y)$. We now describe a method for converting logical formulae into SDNF, which we use in the empirical evaluations that follow.
Let us consider a disjunctive clause:

$\varphi_c = l_1 \vee l_2 \vee \dots \vee l_n$   (3)

which can be rearranged as $\varphi_c = l_i \vee \varphi_{c \setminus i}$, where $\varphi_{c \setminus i}$ is the disjunctive clause obtained by removing the literal $l_i$ from $\varphi_c$. Each $l_i$ can be either $x_i$ or $\neg x_i$. We have:

$\varphi_c \equiv \varphi_{c \setminus i} \vee (l_i \wedge \neg \varphi_{c \setminus i})$   (4)

because $\varphi_{c \setminus i} \vee (l_i \wedge \neg \varphi_{c \setminus i}) \equiv (\varphi_{c \setminus i} \vee l_i) \wedge (\varphi_{c \setminus i} \vee \neg \varphi_{c \setminus i}) \equiv \varphi_c$. By De Morgan's law ($\neg(a \vee b) \equiv \neg a \wedge \neg b$), we can always convert $\neg \varphi_{c \setminus i}$ (and therefore $l_i \wedge \neg \varphi_{c \setminus i}$) into a conjunctive clause.

By applying (4) repeatedly, each time we can eliminate a variable from a disjunctive clause by moving it into a new conjunctive clause. The disjunctive clause $\varphi_c$ holds true if and only if either the disjunctive clause $\varphi_{c \setminus i}$ holds true or the conjunctive clause $(l_i \wedge \neg \varphi_{c \setminus i})$ holds true, and never both, which is what makes the resulting DNF strict.
As an example, consider the application of the transformation above to an if-then rule (logical implication):

$y \leftarrow x_1 \wedge x_2 \wedge \dots \wedge x_n$   (5)

The logical implication is converted to DNF:

$y \vee \neg x_1 \vee \neg x_2 \vee \dots \vee \neg x_n$   (6)

Applying the variable elimination method in (4) to all variables in this clause, we obtain the SDNF of the logical implication as:

$\big( y \wedge \bigwedge_{t=1}^{n} x_t \big) \vee \bigvee_{j=1}^{n} \big( \neg x_j \wedge \bigwedge_{t > j} x_t \big)$   (7)

where $\bigwedge_{t > j} x_t$ denotes the conjunction of the variables whose index has not yet been removed (for $j = n$ this conjunction is empty, i.e. True). This SDNF only has $n+1$ clauses, making the translation to an RBM very efficient. For example, using this method, the SDNF of $y \leftarrow x_1 \wedge x_2 \wedge x_3$ is $(y \wedge x_1 \wedge x_2 \wedge x_3) \vee (\neg x_1 \wedge x_2 \wedge x_3) \vee (\neg x_2 \wedge x_3) \vee (\neg x_3)$. We need an RBM with only 4 hidden units to represent this SDNF.^1

^1 Of course, the number of hidden units will grow exponentially with the number of disjuncts in the body of a rule (typically not allowed in logic programming), e.g. the full DNF of $x_1 \vee x_2 \vee x_3$ has seven conjunctive clauses.
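The elimination scheme in (4)-(7) is easy to mechanise. The sketch below (our illustration; the clause encoding and variable indexing are assumptions of ours) converts an implication $y \leftarrow x_1 \wedge \dots \wedge x_n$ into its $n{+}1$-clause SDNF and exhaustively checks both strictness (at most one clause true under any assignment) and logical equivalence with the implication:

```python
from itertools import product

def implication_to_sdnf(body, head):
    """y <- x_1 ^ ... ^ x_n  ==>
       (y ^ x_1 ^ ... ^ x_n) v (~x_1 ^ x_2 ^ ... ^ x_n) v ... v (~x_n):
       n + 1 mutually exclusive conjunctive clauses, as in (7)."""
    clauses = [(set(body) | {head}, set())]          # y ^ x_1 ^ ... ^ x_n
    for j, v in enumerate(body):
        clauses.append((set(body[j + 1:]), {v}))     # ~x_j ^ x_{j+1} ^ ... ^ x_n
    return clauses

def clause_true(x, pos, neg):
    return all(x[i] for i in pos) and not any(x[i] for i in neg)

body, head = [0, 1, 2], 3        # y <- x1 ^ x2 ^ x3, with y as variable index 3
sdnf = implication_to_sdnf(body, head)
for x in product((0, 1), repeat=4):
    n_true = sum(clause_true(x, p, n) for p, n in sdnf)
    implication = bool(x[head]) or not all(x[i] for i in body)
    assert n_true <= 1                    # strictness: at most one clause true
    assert (n_true == 1) == implication   # equivalence with the implication
print("number of clauses:", len(sdnf))   # -> 4
```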
4 Reasoning in RBMs
Reasoning as Sampling
There is a direct relationship between inference in RBMs and logical satisfiability, as follows.
Proposition 1
Let $\mathcal{N}$ be an RBM constructed from a formula $\varphi$. Let $\mathcal{A}$ be the set of indices of the variables that have been assigned either True or False (we use $\mathbf{x}_\mathcal{A}$ to denote the set $\{x_i \mid i \in \mathcal{A}\}$). Let $\mathcal{U}$ be the set of indices of the variables that have not been assigned a truth-value (we use $\mathbf{x}_\mathcal{U}$ to denote $\{x_u \mid u \in \mathcal{U}\}$). Performing Gibbs sampling on $\mathbf{x}_\mathcal{U}$ given $\mathbf{x}_\mathcal{A}$ is equivalent to searching for an assignment of truth-values for $\mathbf{x}_\mathcal{U}$ that satisfies $\varphi$.
Proof. Theorem 1 has shown that the truth-value of $\varphi$ is inversely proportional to the RBM's rank (energy) function, that is:

$s_\varphi(\mathbf{x}) = -\frac{1}{\epsilon} \min_{\mathbf{h}} E(\mathbf{x}, \mathbf{h})$   (8)

Therefore, a value of $\mathbf{x}_\mathcal{U}$ that minimises the energy function also maximises the truth-value, because:

$\min_{\mathbf{x}_\mathcal{U}, \mathbf{h}} E(\mathbf{x}_\mathcal{A}, \mathbf{x}_\mathcal{U}, \mathbf{h}) = -\epsilon \max_{\mathbf{x}_\mathcal{U}} s_\varphi(\mathbf{x}_\mathcal{A}, \mathbf{x}_\mathcal{U})$   (9)

Now, we can consider an iterative process to search for truth-values by minimising the RBM's energy function. This can be done by using gradient descent to update the values of $\mathbf{h}$ and then $\mathbf{x}_\mathcal{U}$ one at a time (similarly to the contrastive divergence algorithm) to minimise $E$, while keeping the other variables fixed. The alternating updates are repeated until convergence. Notice that the gradients amount to:

$-\frac{\partial E}{\partial h_j} = \sum_i w_{ij} x_i + b_j \qquad -\frac{\partial E}{\partial x_i} = \sum_j w_{ij} h_j + a_i$   (10)

In the case of Gibbs sampling, given the assigned variables $\mathbf{x}_\mathcal{A}$, the process starts with a random initialisation of $\mathbf{x}_\mathcal{U}$, and proceeds to infer values for the hidden units and then for the unassigned variables in the visible layer of the RBM, using the conditional distributions $p(\mathbf{h} \mid \mathbf{x})$ and $p(\mathbf{x} \mid \mathbf{h})$, respectively, where $\sigma$ denotes the logistic sigmoid and

$p(h_j = 1 \mid \mathbf{x}) = \sigma\Big(\frac{1}{\tau}\big(\sum_i w_{ij} x_i + b_j\big)\Big) \qquad p(x_i = 1 \mid \mathbf{h}) = \sigma\Big(\frac{1}{\tau}\big(\sum_j w_{ij} h_j + a_i\big)\Big)$   (11)
It can be seen from (11) that the distributions are monotonic functions of the negative energy’s gradient over and . Therefore, performing Gibbs sampling on those functions can be seen as moving randomly towards a local point of minimum energy, or equivalently to an assignment of truthvalues that satisfies the formula. Since the energy function of the RBM and the satisfiability of the formula are inversely proportional, each step of Gibbs sampling to reduce the energy should intuitively generate a sample that is closer to satisfying the formula.
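The reasoning-as-sampling procedure can be sketched as follows (our illustration, not the authors' code). It reuses the Theorem 1 construction for the two-variable exclusive-or, with all weights scaled by a confidence value $c$ (an assumption of ours, used only to sharpen the distribution). Clamping $x = True$ and sampling $y$ should make the sampler strongly favour the satisfying completion $y = False$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_search(W, b, clamped, n_vars, steps=30, rng=None):
    """Alternate h ~ p(h|x) and x_U ~ p(x|h), keeping clamped visible units fixed."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.integers(0, 2, n_vars).astype(float)
    for i, v in clamped.items():
        x[i] = v
    for _ in range(steps):
        h = (rng.random(W.shape[1]) < sigmoid(x @ W + b)).astype(float)
        x_new = (rng.random(n_vars) < sigmoid(W @ h)).astype(float)  # biases a_i = 0
        for i in range(n_vars):
            if i not in clamped:
                x[i] = x_new[i]
    return x

# RBM for x XOR y from Theorem 1, weights scaled by confidence c (assumed value).
c, eps = 10.0, 0.5
W = c * np.array([[1.0, -1.0], [-1.0, 1.0]])
b = c * np.array([-(1 - eps), -(1 - eps)])

rng = np.random.default_rng(42)
# Clamp x = True (index 0) and sample y (index 1); y = False satisfies the formula.
samples = [gibbs_search(W, b, {0: 1.0}, 2, rng=rng)[1] for _ in range(200)]
print("fraction of y = 1 samples:", np.mean(samples))  # should be close to 0
```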
Reasoning as Lowering Free Energy
When the number of unassigned variables is small enough that the partition function can be calculated directly, one can infer the assignments of $\mathbf{x}_\mathcal{U}$ using the conditional distribution:

$p(\mathbf{x}_\mathcal{U} \mid \mathbf{x}_\mathcal{A}) = \frac{e^{-F(\mathbf{x}_\mathcal{U}, \mathbf{x}_\mathcal{A})}}{\sum_{\mathbf{x}'_\mathcal{U}} e^{-F(\mathbf{x}'_\mathcal{U}, \mathbf{x}_\mathcal{A})}}$   (12)

where $F(\mathbf{x}) = -\sum_j \log\big(1 + e^{\sum_i w_{ij} x_i + b_j}\big) - \sum_i a_i x_i$ is known as the free energy. Each term $-\log(1 + e^{c \cdot z_j})$ in the free energy is a negative softplus function scaled by a non-negative value $c$, called the confidence value. It returns a large negative output for a positive input $z_j$ and a close-to-zero output for a negative input. One can change the value of $c$ to make the function smooth, as shown in Figure 2.

Each free-energy term is associated with a conjunctive clause $j$ in the SDNF through the weighted sum $z_j = \sum_{t \in \mathcal{S}^+_j} x_t - \sum_{k \in \mathcal{S}^-_j} x_k - |\mathcal{S}^+_j| + \epsilon$. Therefore, if a truth-value assignment of $\mathbf{x}$ does not satisfy the formula, all free-energy terms will be close to zero. Otherwise, exactly one free-energy term will be approximately $-c\epsilon$, for a choice of $\epsilon$ obtained from Theorem 1. Thus, the more likely a truth assignment is to satisfy the formula, the lower the free energy.
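When the variables can be enumerated, the free energy can be evaluated exhaustively and assignments ranked directly. The sketch below (our illustration, again for the two-variable exclusive-or RBM with an assumed confidence value $c = 5$ and zero visible biases) shows that the models of the formula come out with the lowest free energy:

```python
from itertools import product
import numpy as np

def free_energy(x, W, b):
    """F(x) = -sum_j log(1 + exp(x . W_j + b_j)); visible biases a_i omitted (= 0)."""
    return -np.log1p(np.exp(x @ W + b)).sum()

c, eps = 5.0, 0.5
W = c * np.array([[1.0, -1.0], [-1.0, 1.0]])   # XOR RBM from Theorem 1
b = c * np.array([-(1 - eps), -(1 - eps)])

ranked = sorted(product((0.0, 1.0), repeat=2),
                key=lambda x: free_energy(np.array(x), W, b))
for x in ranked:
    print(x, round(free_energy(np.array(x), W, b), 3))
# The satisfying assignments (1, 0) and (0, 1) are ranked first (lowest F).
```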
5 Logical Boltzmann Machines
Based on the previous theoretical results, we are now in position to introduce Logical Boltzmann Machines (LBM). LBM is a neurosymbolic system that uses Restricted Boltzmann Machines for distributed reasoning and learning from data and knowledge.
The LBM system converts any set of formulae $\{\varphi_i\}$ into an RBM by applying Theorem 1 to each formula after conversion to SDNF. In the case of Penalty Logic, formulae are weighted. Given a set of weighted formulae $\{(w_i, \varphi_i)\}$, one can also construct an equivalent RBM where each energy term generated from formula $\varphi_i$ is multiplied by $w_i$. In both cases, the assignments that minimise the energy of the RBM are the assignments that maximise the satisfiability of the formulae, i.e. the (weighted) sum of their truth-values.
Proposition 2
Given a weighted knowledge-base $\mathcal{K} = \{(w_1, \varphi_1), \dots, (w_m, \varphi_m)\}$, there exists an equivalent RBM $\mathcal{N}$ such that $\min_{\mathbf{h}} E(\mathbf{x}, \mathbf{h}) = -\epsilon \, s_\mathcal{K}(\mathbf{x})$, where $s_\mathcal{K}(\mathbf{x})$ is the sum of the weights of the formulae in $\mathcal{K}$ that are satisfied by assignment $\mathbf{x}$.
A formula $\varphi_i$ can be decomposed into a set of weighted conjunctive clauses from its SDNF, each clause carrying the weight $w_i$. If there exist two conjunctive clauses such that one is subsumed by the other, then the subsumed clause is removed and the weight of the remaining clause is replaced by the sum of their weights. Identical conjunctive clauses are treated in the same way: one of them is removed and the weights are added. From Theorem 1, we know that a conjunctive clause $j$ is equivalent to an energy term $-h_j \big( \sum_{t \in \mathcal{S}^+_j} x_t - \sum_{k \in \mathcal{S}^-_j} x_k - |\mathcal{S}^+_j| + \epsilon \big)$. A conjunctive clause with weight $w$, therefore, is equivalent to the energy term $-w h_j \big( \sum_{t \in \mathcal{S}^+_j} x_t - \sum_{k \in \mathcal{S}^-_j} x_k - |\mathcal{S}^+_j| + \epsilon \big)$. For each weighted conjunctive clause, we can add a hidden unit $h_j$ to an RBM with connection weights $w$ for all $t \in \mathcal{S}^+_j$ and $-w$ for all $k \in \mathcal{S}^-_j$. The bias for this hidden unit will be $-w(|\mathcal{S}^+_j| - \epsilon)$. The weighted knowledge-base and the RBM are equivalent because $\min_{\mathbf{h}} E(\mathbf{x}, \mathbf{h}) = -\epsilon \, s_\mathcal{K}(\mathbf{x})$, where $s_\mathcal{K}(\mathbf{x})$ is the sum of the weights of the clauses that are satisfied by $\mathbf{x}$.
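The weighted construction can be sketched as follows (our illustration; the example clauses are arbitrary stand-ins, not the Nixon diamond of Example 2). Duplicate clauses are merged by summing their weights, and each surviving clause yields a hidden unit with $\pm w$ weights and bias $-w(|\mathcal{S}^+| - \epsilon)$:

```python
import numpy as np

def merge_clauses(weighted_clauses):
    """Sum the weights of identical conjunctive clauses, as in Proposition 2."""
    merged = {}
    for w, pos, neg in weighted_clauses:
        key = (frozenset(pos), frozenset(neg))
        merged[key] = merged.get(key, 0.0) + w
    return [(w, set(p), set(n)) for (p, n), w in merged.items()]

def build_weighted_rbm(weighted_clauses, n_vars, eps=0.5):
    """One hidden unit per merged clause: weights +w / -w, bias -w(|S+| - eps)."""
    clauses = merge_clauses(weighted_clauses)
    W = np.zeros((n_vars, len(clauses)))
    b = np.zeros(len(clauses))
    for j, (w, pos, neg) in enumerate(clauses):
        for t in pos:
            W[t, j] = w
        for k in neg:
            W[k, j] = -w
        b[j] = -w * (len(pos) - eps)
    return W, b

# Stand-in weighted clauses: (x0 ^ ~x1) with weight 2.0, appearing twice, and
# (x1) with weight 1.0; the duplicate should merge into a single unit of weight 4.0.
kb = [(2.0, {0}, {1}), (2.0, {0}, {1}), (1.0, {1}, set())]
W, b = build_weighted_rbm(kb, n_vars=2)
print(W.shape)   # -> (2, 2): two hidden units after merging
```

For the assignment $x_0 = 1, x_1 = 0$, which satisfies only the merged clause of weight 4, the minimum energy is $-\epsilon \cdot 4 = -2$, matching the proposition.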
Example 2
(Nixon diamond problem) Consider the following weighted knowledge-base:
Converting all formulae to SDNFs produces a set of weighted conjunctive clauses. After combining the weights of the conjunctive clause which appears twice, an RBM is created (Figure 3) from the resulting unique conjunctive clauses and their confidence values.
6 Experimental Results
Reasoning
In this experiment we apply LBM to effectively search for satisfying truth assignments of variables in large formulae. Let us define a class of formulae:
(13) 
A formula in this class admits a number of possible truth assignments of its variables, a subset of which maps the formula to True (call this the satisfying set). Converting to SDNF, as done before but now for the class of formulae, we obtain:
(14) 
Applying Theorem 1 to construct an RBM from the SDNF, we use Gibbs sampling to infer the truth-values of all variables. A sample is accepted as a satisfying assignment if its free energy is lower than or equal to a given threshold. We evaluate the coverage and accuracy of the accepted samples. Coverage is measured as the proportion of the satisfying set that is accepted; in this experiment, this is the number of satisfying assignments in the set of accepted samples divided by the size of the satisfying set. Accuracy is measured as the percentage of accepted samples that satisfy the formula.
We test different values of M and N. LBM achieves 100% accuracy in all cases, meaning that all accepted samples do satisfy the formula. Figure 4 shows the coverage as Gibbs sampling progresses (after each time that a number of samples is collected). Four cases are considered: M=20 and N=5, M=20 and N=10, M=25 and N=10, M=30 and N=10. In each case, we run the sampling process 100 times and report the average results with standard deviations. The number of samples needed to achieve full coverage is much lower than the number of possible assignments. For example, when M=20 and N=10, all satisfying assignments are found after a few million samples are collected, whereas the number of possible assignments is about a billion, so the ratio of sample-set size to search-space size is very small. The ratio for M=30, N=10 is even lower. As far as we know, this is the first study of reasoning in neurosymbolic AI to produce these results with such low ratios.

Figure 5 shows the time needed to collect all satisfying assignments for different N with M fixed. LBM needs only around 10 seconds in the smallest case; the curve grows exponentially with N, similarly to the search-space size, but at a much lower scale.
Learning from Data and Knowledge
In this experiment, we evaluate LBM at learning the same Inductive Logic Programming (ILP) benchmark tasks used by the neurosymbolic system CILP++ [5], in comparison with the state-of-the-art ILP system Aleph [17]. An initial LBM is constructed from a subset of the available clauses, used as background knowledge;^2 more hidden units with random weights are then added to the RBM, and the system is trained further from examples, following the methodology used in the evaluation of CILP++. Both confidence values and network weights are free parameters for learning.

^2 The rest of the clauses are used for training and validation in the usual way, whereby satisfying assignments selected from each clause are converted into training vectors.

We carry out experiments on 7 data sets: Mutagenesis [16], KRK [1], UWCSE [13], and the Alzheimer's benchmark: Amine, Acetyl, Memory and Toxic [7]. For Mutagenesis and KRK, we use a fraction of the clauses as background knowledge to build the initial LBM. For the larger data sets, UWCSE and the Alzheimer's benchmark, we use a smaller fraction of the clauses as background knowledge. The number of hidden units added to the LBM was chosen arbitrarily. For a fair comparison, we also evaluate LBM against a fully-connected RBM with more hidden units, so as to offer the RBM more parameters than the LBM, since the RBM does not use background knowledge. Both the RBM and LBM are trained in discriminative fashion [8], using the conditional distribution of the target variable given the others for inference. The code for these experiments will be made available.
The results using 10-fold cross-validation are shown in Table 2, except for UWCSE which uses 5 folds. The results for Aleph and CILP++ were collected from [5]. It can be seen that LBM has the best performance in 5 out of 7 data sets. On the alzacetyl data set, Aleph performs better than all other models in this evaluation, and the RBM is best on the alzamine data set, despite not using background knowledge (although such knowledge is provided in the form of training examples).
Data set  Aleph  CILP++  RBM  LBM

Mutagenesis  80.85  91.70  95.55  96.28 
KRK  99.60  98.42  99.70  99.80 
UWCSE  84.91  70.01  89.14  89.43 
alzamine  78.71  78.99  79.13  78.25 
alzacetyl  69.46  65.47  62.93  66.82 
alzmemory  68.57  60.44  68.54  71.84 
alztoxic  80.50  81.73  82.71  84.95 
7 Conclusion and Future Work
We introduced an approach and neurosymbolic system to reason about symbolic knowledge at scale in an energybased neural network. We showed equivalence between minimising the energy of RBMs and satisfiability of Boolean formulae. We evaluated the system at learning and showed its effectiveness in comparison with stateoftheart approaches. As future work we shall analyse further the empirical results and seek to pinpoint the benefits of LBM at combining reasoning and learning at scale.
References
[1] (1995). Machine Intelligence 13, K. Furukawa, D. Michie, and S. Muggleton (Eds.), pp. 291–309. ISBN 0198538502.

[2] W. W. Cohen, F. Yang, and K. Mazaitis (2017). TensorLog: deep learning meets probabilistic DBs. CoRR abs/1707.05390.

[3] A. d'Avila Garcez and L. C. Lamb (2020). Neurosymbolic AI: the 3rd wave. arXiv:2012.05876.

[4] R. Evans and E. Grefenstette (2018). Learning explanatory rules from noisy data. JAIR 61, pp. 1–64.

[5] M. V. M. França, G. Zaverucha, and A. S. d'Avila Garcez (2014). Fast relational learning using bottom clause propositionalization with artificial neural networks. Machine Learning 94(1), pp. 81–104.

[6] A. S. d'Avila Garcez, K. Broda, and D. M. Gabbay (2001). Symbolic knowledge extraction from trained neural networks: a sound approach. Artificial Intelligence 125(1–2), pp. 155–207.

[7] R. D. King and A. Srinivasan (1995). Relating chemical activity to structure: an examination of ILP successes. New Generation Computing 13(3).

[8] H. Larochelle, M. Mandel, R. Pascanu, and Y. Bengio (2012). Learning algorithms for the classification restricted Boltzmann machine. Journal of Machine Learning Research 13(1), pp. 643–669.

[9] R. Manhaeve, S. Dumančić, A. Kimmig, T. Demeester, and L. De Raedt (2018). DeepProbLog: neural probabilistic logic programming. In Advances in Neural Information Processing Systems 31, pp. 3749–3759.

[10] G. Marcus (2018). Deep learning: a critical appraisal. CoRR abs/1801.00631.

[11] L. de Penning, A. S. d'Avila Garcez, L. C. Lamb, and J.-J. C. Meyer (2011). A neural-symbolic cognitive agent for online learning and reasoning. In IJCAI, pp. 1653–1658.

[12] G. Pinkas (1995). Reasoning, nonmonotonicity and learning in connectionist networks that capture propositional knowledge. Artificial Intelligence 77(2), pp. 203–247.

[13] M. Richardson and P. Domingos (2006). Markov logic networks. Machine Learning 62(1–2), pp. 107–136.

[14] L. Serafini and A. S. d'Avila Garcez (2016). Learning and reasoning with logic tensor networks. In AI*IA, pp. 334–348.

[15] P. Smolensky (1995). Constituent structure and explanation in an integrated connectionist/symbolic cognitive architecture. In Connectionism: Debates on Psychological Explanation.

[16] A. Srinivasan, S. Muggleton, M. J. E. Sternberg, and R. D. King (1994). Mutagenesis: ILP experiments in a non-determinate biological domain. In Proceedings of the 4th International Workshop on Inductive Logic Programming, volume 237 of GMD-Studien, pp. 217–232.

[17] A. Srinivasan (2007). The Aleph system, version 5. http://www.cs.ox.ac.uk/activities/machlearn/Aleph/aleph.html

[18] G. G. Towell and J. W. Shavlik (1994). Knowledge-based artificial neural networks. Artificial Intelligence 70, pp. 119–165.

[19] S. N. Tran and A. S. d'Avila Garcez (2018). Deep logic networks: inserting and extracting knowledge from deep belief networks. IEEE Transactions on Neural Networks and Learning Systems 29, pp. 246–258.

[20] F. Yang, Z. Yang, and W. W. Cohen (2017). Differentiable learning of logical rules for knowledge base reasoning. In Advances in Neural Information Processing Systems 30, pp. 2319–2328.