# Necessary and Sufficient Condition for Satisfiability of a Boolean Formula in CNF and its Implications on P versus NP problem

In this paper, a necessary and sufficient condition for the satisfiability of a boolean formula in CNF is determined. It is found that the maximum cardinality of a satisfiable boolean formula increases exponentially with the number of variables. Because of this, any algorithm requires exponential time in the worst case, as a function of the number of variables in a boolean formula, to check the satisfiability of the given formula. This proves the non-existence of a polynomial-time algorithm for the satisfiability problem. As satisfiability is an NP-complete problem, the non-existence of a polynomial-time algorithm for it excludes satisfiability from the class P, which implies that P is not equal to NP. Further, the necessary and sufficient condition can be used to optimise existing algorithms; in some cases, the unsatisfiability of a given boolean formula can be determined in polynomial time. For this purpose, a novel function is defined that can be used to determine, in polynomial time, the cardinality of a given boolean formula and the number of occurrences of a literal in it.

## 1 Introduction

The Boolean Satisfiability problem is an NP-complete problem [Karp1972]. This implies that all other NP-complete problems can be reduced to the Boolean Satisfiability problem in polynomial time. So, if there exists an algorithm that can solve the Boolean Satisfiability problem in polynomial time, then every other NP-complete problem can be solved in polynomial time. However, no algorithm has yet been proved to solve the problem in polynomial time. This has led to the formulation of the P versus NP problem, as stated by Stephen Cook in [Cook2006], where the history and importance of the problem are discussed in detail.

Various attempts have been made to create efficient algorithms and systems to solve Boolean Satisfiability problem.

In 1971, it was shown in [Cook1971] that any boolean formula in CNF can be converted in polynomial time into a formula with at most three literals per clause, on the assumption that the number of clauses in the given formula is of polynomial length.

In 1992, a greedy local search procedure called GSAT was introduced in [Selman1992]. It was shown that GSAT can solve structured satisfiability problems quickly. However, the inputs used to test the algorithm contained considerably fewer clauses than are possible for the given number of variables; for example, formulas with 50 variables and only 215 clauses were used. By contrast, it is found in this paper that a satisfiable boolean formula with 50 variables can have $3^{50} - 2^{50}$ clauses (from Theorem 3 and Theorem 5).

In 2001, the development of a new complete solver, Chaff, was described in [Moskewicz2001]. It was shown that Chaff obtains a one to two orders of magnitude performance improvement on difficult SAT benchmarks in comparison with other solvers. The experiments in [Moskewicz2001] used benchmark problems, but again these problems contained considerably fewer clauses than are possible for the given number of variables.

In 2018, [Yin2018] used problems with 50 variables and 212 clauses for testing, which again contain considerably fewer clauses than are possible for the given number of variables.

In 1999, sharp thresholds for graph properties and the $k$-SAT problem were presented in [Friedgut1999]. $k$-SAT is a special case of the SATISFIABILITY problem defined in [Karp1972], and any boolean formula in $k$-SAT is a special case of a general boolean formula in CNF.

Most of the results presented in the above works used far fewer clauses than are possible for a given number of variables. Further, no study has established a relationship between the number of variables, the number of clauses, and the satisfiability of a general boolean formula in CNF.

In this paper, properties of clauses are studied, novel relationships among clauses are defined, and a necessary and sufficient condition is established that determines the satisfiability of any boolean formula in CNF. Further, it is found that any algorithm that solves the Boolean Satisfiability problem can be divided into two parts: one part generates possible solutions and has exponential complexity, and the other part is similar to linear search. Thus the combined complexity of any such algorithm is of exponential order, which implies that satisfiability cannot be solved in polynomial time, which in turn implies $P \neq NP$ [Karp1972].

However, the necessary and sufficient condition for satisfiability can be used to optimise existing algorithms, such as DPLL [dpll], by improving the complexity of best-case scenarios.

## 2 Boolean Satisfiability Problem

As defined in [Karp1972], given the clauses $C_1, C_2, \dots, C_p$, we need to determine whether the conjunction of the given clauses is satisfiable.

## 3 Terminology used

The terms literal, boolean variable, and clause are used with the same meaning as defined in [Heule2015]. A boolean formula in CNF is a conjunction of clauses. It can be represented as a finite set of clauses [Heule2015]. A set of boolean variables is called a variable set.

## 4 Notations used

The notation for basic relations between sets is used as defined in [jech2013set].

### 4.1 Variable cases

If a variable, $X$, can independently be assigned any of the values $x_1$, $x_2$, and $x_3$, this is written as:

$$X = \begin{cases} x_1 \\ x_2 \\ x_3 \end{cases}$$

## 5 Tautology Clause

A clause that evaluates to true for every valuation is called a tautology clause. If a clause contains a complemented pair of literals, it is a tautology [Heule2015]. In other words, if $x \in T$ and $\neg x \in T$ for some variable $x$, then $T$ is a tautology clause.

### 5.1 Significance of tautology clause in satisfiability problem

A tautology clause always evaluates to true, which is represented by 1 in boolean algebra. Let $F$ be a boolean formula in CNF that contains a tautology clause. We can write

$$F = C_1 \wedge T \tag{1}$$

where $T$ is a tautology clause.

$$\implies F = C_1 \wedge 1$$

By using the identity property of boolean algebra,

$$\implies F = C_1 \tag{2}$$

Hence, a tautology clause has no effect on the satisfiability of a boolean formula, so it can be ignored while solving the satisfiability problem.
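The simplification above can be illustrated with a minimal Python sketch. The encoding is an assumption made only for this illustration: a literal $x_i$ is the integer `+i`, $\neg x_i$ is `-i`, and a clause is a `frozenset` of such integers. The helper names `is_tautology` and `drop_tautologies` are hypothetical.

```python
from typing import FrozenSet, List

Clause = FrozenSet[int]  # assumed encoding: +i stands for x_i, -i for ¬x_i


def is_tautology(clause: Clause) -> bool:
    """True when the clause holds some literal together with its complement."""
    return any(-lit in clause for lit in clause)


def drop_tautologies(formula: List[Clause]) -> List[Clause]:
    """Return the formula with every tautology clause removed (F = C1 ∧ T becomes C1)."""
    return [clause for clause in formula if not is_tautology(clause)]


# (x1 ∨ ¬x2) ∧ (x2 ∨ ¬x2 ∨ x3): the second clause is a tautology.
formula = [frozenset({1, -2}), frozenset({2, -2, 3})]
reduced = drop_tautologies(formula)
print(reduced)
```

Each clause is scanned once, so the filter runs in time proportional to the total number of literals in the formula.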

### 5.2 Non-Tautology Clause

A clause which is not a tautology is called a non-tautology clause.

###### Lemma 1

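If $N$ is a non-tautology clause, then $\forall x \in N \implies \neg x \notin N$.

###### Proof

Given that $N$ is not a tautology clause. Suppose, for the sake of contradiction,

$$\exists x \in N \mid \neg x \in N$$

$\implies N$ is a tautology clause (from the definition), which is not true. So, our assumption is wrong. Hence,

$$\forall x \in N \implies \neg x \notin N$$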

###### Lemma 2

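If $C$ is a clause with $n$ literals such that $x_i = 1$ for some $x_i \in C$, then $C = 1$.

###### Proof

Given that $C$ is a clause, by definition $C$ is a disjunction of literals, so we can write

$$C = (x_1 \vee x_2 \vee \dots \vee x_i \vee \dots \vee x_n)$$

Also, given that $x_i = 1$, we can write

$$C = (x_1 \vee x_2 \vee \dots \vee 1 \vee \dots \vee x_n)$$

By using the dominance law of boolean algebra,

$$\implies C = 1$$

Hence proved.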

###### Lemma 3

If $C$ is a clause with $n$ literals such that $C = 0$, then

$$x_i = 0 \quad \forall x_i \in C$$

###### Proof

Given that $C$ is a clause, by definition $C$ is a disjunction of literals, so we can write

$$C = (x_1 \vee x_2 \vee \dots \vee x_n)$$

Also, given that

$$C = 0 \ (\text{false})$$

Suppose, for the sake of contradiction, that $x_i = 1$ for some $x_i \in C$. Using Lemma 2, we get

$$C = 1$$

which is a contradiction. Therefore the assumption $x_i = 1$ is not true, hence

$$x_i = 0 \quad \forall x_i \in C \tag{3}$$

Hence proved.

###### Theorem 1

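If $C$ and $D$ are clauses such that

$$D \subseteq C$$

and

$$C = 0$$

then

$$D = 0$$

###### Proof

Given that

$$D \subseteq C$$

and

$$C = 0 \tag{4}$$

Using Lemma 3,

$$x_i = 0 \quad \forall x_i \in C \tag{5}$$

As $D \subseteq C$,

$$\forall x \in D \implies x \in C \tag{6}$$

From Eq. 5 and Eq. 6 we have

$$x_j = 0 \quad \forall x_j \in D$$

We can write

$$D = (0 \vee 0 \vee 0 \vee \dots \vee 0)$$
$$\implies D = 0 \tag{7}$$

Hence proved.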

## 6 Clause over a variable set

A non-tautology clause, $C$, is called a clause over a variable set, $V$, if

$$(\forall x)(x \in C \text{ or } \neg x \in C \implies x \in V)$$

For example, the clauses $(x_1 \vee \neg x_2)$ and $(x_3)$ are clauses over the variable set $V = \{x_1, x_2, x_3\}$.

### 6.1 Fully Populated Clause over a variable set

A clause, $C_{full}$, is called a fully populated clause over a variable set, $V$, if

$$(\forall x)(x \in V \iff x \in C_{full} \text{ or } \neg x \in C_{full})$$

For example, the clause $(x_1 \vee \neg x_2 \vee x_3)$ is a fully populated clause over the variable set $V = \{x_1, x_2, x_3\}$.

###### Lemma 4

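If $C$ is a clause over a variable set, $V$, then $\exists V_{sub} \subseteq V$ such that $C$ is a fully populated clause over $V_{sub}$.

###### Proof

Given that $C$ is a clause over a variable set, $V$, from the definition,

$$\implies (\forall x)(x \in C \text{ or } \neg x \in C \implies x \in V) \tag{8}$$

We define a variable set,

$$V_{sub} = \{x \mid x \in C \text{ or } \neg x \in C\} \tag{9}$$
$$\implies (\forall x)(x \in V_{sub} \iff x \in C \text{ or } \neg x \in C)$$

Also, from Eq. 8 and Eq. 9,

$$\forall x \in V_{sub} \implies x \in V$$
$$\implies V_{sub} \subseteq V$$

Hence, $\exists V_{sub} \subseteq V$ such that $C$ is a fully populated clause over the variable set $V_{sub}$.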

###### Lemma 5

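If $C$ is a fully populated clause over a variable set, $V$, then $\forall D \subseteq C$, $\exists V_{sub} \subseteq V$ such that $D$ is a fully populated clause over the variable set $V_{sub}$.

###### Proof

Given that $C$ is a fully populated clause over a variable set, $V$,

$$\implies (\forall x)(x \in V \iff x \in C \text{ or } \neg x \in C)$$

Suppose $D \subseteq C$,

$$\implies \forall x \in D \implies x \in C$$
$$\implies (\forall x)(x \in D \text{ or } \neg x \in D \implies x \in V)$$

$\implies D$ is a clause over $V$.

From Lemma 4, $\exists V_{sub} \subseteq V$ such that $D$ is a fully populated clause over $V_{sub}$.

Hence, $\forall D \subseteq C$, $\exists V_{sub} \subseteq V$ such that $D$ is a fully populated clause over $V_{sub}$.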

###### Lemma 6

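For any given valuation to a variable set, $V$, there exists a fully populated clause, say $C_k$, over $V$, such that

$$C_k = 0 \ (\text{false})$$

###### Proof

Let the variable set, $V$, be given by

$$V = \{x_1, x_2, \dots, x_n\}$$

where

$$x_i = \begin{cases} 0 \\ 1 \end{cases} \quad \forall x_i \in V$$

Let each $x_i$ be assigned one of the values given above.

Now we define a clause, $C_k$, depending upon the valuation assigned above,

$$C_k = \left\{ y \;\middle|\; y = \begin{cases} \neg x_i & \text{if } x_i = 1 \\ x_i & \text{if } x_i = 0 \end{cases} \ \forall x_i \in V \right\}$$
$$\implies (\forall x)(x \in V \iff x \in C_k \text{ or } \neg x \in C_k)$$

$\implies C_k$ is a fully populated clause.

By putting the values assigned to the variables in $V$ into $C_k$, we get

$$y = 0 \quad \forall y \in C_k$$
$$\implies C_k = 0 \ (\text{false})$$

Hence, for any given valuation to the variable set, $V$, there exists a fully populated clause, $C_k$, such that $C_k = 0$.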
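The construction in the proof of Lemma 6 can be sketched in Python. The integer encoding of literals (`+i` for $x_i$, `-i` for $\neg x_i$) and the function names `falsified_clause` and `evaluate` are assumptions made for this illustration only.

```python
from typing import Dict, FrozenSet

Clause = FrozenSet[int]  # assumed encoding: +i stands for x_i, -i for ¬x_i


def falsified_clause(valuation: Dict[int, bool]) -> Clause:
    """Build C_k: take ¬x_i where x_i = 1 and x_i where x_i = 0."""
    return frozenset(-var if value else var for var, value in valuation.items())


def evaluate(clause: Clause, valuation: Dict[int, bool]) -> bool:
    """A clause is true when at least one of its literals is true."""
    return any(valuation[abs(lit)] == (lit > 0) for lit in clause)


valuation = {1: True, 2: False, 3: True}   # x1 = 1, x2 = 0, x3 = 1
c_k = falsified_clause(valuation)          # (¬x1 ∨ x2 ∨ ¬x3)
print(sorted(c_k), evaluate(c_k, valuation))
```

Every literal placed in `c_k` is false under the valuation it was built from, so the clause as a whole evaluates to false.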

###### Theorem 2

For a given set of variables, $V$, with $n$ variables, there exist $2^n$ fully populated clauses.

###### Proof

For a given set of variables, $V$, with $n$ variables, we can write a fully populated clause in the general form

$$C = \{X_1, X_2, \dots, X_n\}$$

where

$$X_i = \begin{cases} x_i \\ \neg x_i \end{cases}$$

i.e. each $X_i$ can be assigned a value in two ways, independently. As there are $n$ variables in a clause, by the basic principle of counting there are $2^n$ ways in which a clause can be selected. Hence, for a given variable set, $V$, with $n$ variables, there exist $2^n$ fully populated clauses.
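The count in Theorem 2 can be checked for small $n$ by enumeration. The sketch below assumes the integer encoding `+i` for $x_i$ and `-i` for $\neg x_i$; the function name `fully_populated_clauses` is hypothetical.

```python
from itertools import product
from typing import FrozenSet, List

Clause = FrozenSet[int]  # assumed encoding: +i stands for x_i, -i for ¬x_i


def fully_populated_clauses(n: int) -> List[Clause]:
    """Each variable i contributes either +i (x_i) or -i (¬x_i)."""
    return [
        frozenset(sign * var for var, sign in zip(range(1, n + 1), signs))
        for signs in product((1, -1), repeat=n)
    ]


for n in range(1, 6):
    clauses = fully_populated_clauses(n)
    # All generated clauses are distinct and there are 2^n of them.
    assert len(set(clauses)) == 2 ** n

print(len(fully_populated_clauses(3)))
```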

## 7 Sibling Clause

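Two unequal fully populated clauses over a common variable set, $V$, are called sibling clauses. In other words, if $A$ and $B$ are two non-tautology clauses such that

$$(\forall x)(x \in A \text{ or } \neg x \in A \iff x \in B \text{ or } \neg x \in B) \quad \text{and} \quad A \neq B$$

then $A$ is a sibling clause of $B$ and vice versa. For example, $(x_1 \vee \neg x_2)$ and $(\neg x_1 \vee \neg x_2)$ are sibling clauses over the variable set $V = \{x_1, x_2\}$.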

###### Lemma 7

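If $A$ and $B$ are two sibling clauses, and $A = 0$, then $B = 1$.

###### Proof

Given that $A$ and $B$ are sibling clauses,

$$\implies \exists x_i \in A \mid \neg x_i \in B \tag{10}$$

Also, given that

$$A = 0$$

Using Lemma 3,

$$\implies x_i = 0 \quad \forall x_i \in A$$

From Eq. 10,

$$\implies \exists \neg x_i \in B \mid x_i = 0$$

Put $y = \neg x_i$,

$$\implies \exists y \in B \mid y = 1$$

From Lemma 2,

$$B = 1$$

Hence proved.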

###### Lemma 8

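If $C_i$ and $C_j$ are two sibling clauses over a variable set, $V$, and $P(C_i)$ and $P(C_j)$ are the power sets of $C_i$ and $C_j$, respectively, then

$$\forall D \in P(C_i) \mid D \notin P(C_j)$$
$$\exists E \in P(C_j)$$

such that $D$ and $E$ are sibling clauses.

###### Proof

Given that $C_i$ and $C_j$ are sibling clauses over a variable set, $V$. By the definition of sibling clauses, this implies that $C_i$ and $C_j$ are fully populated clauses over $V$.

From Lemma 5,

$$\forall D \subseteq C_i, \ \exists V_{sub} \subseteq V \tag{11}$$

such that $D$ is a fully populated clause over $V_{sub}$. Or we can write

$$\forall D \in P(C_i), \ \exists V_{sub} \subseteq V \tag{12}$$

such that $D$ is a fully populated clause over $V_{sub}$.

Now we define a set, $E$,

$$E = \left\{ y \;\middle|\; y = \begin{cases} x & \text{if } x \in C_j \\ \neg x & \text{if } \neg x \in C_j \end{cases} \text{ and } x \in V_{sub} \right\} \tag{13}$$
$$\implies \forall x \in E \implies x \in C_j$$
$$\implies E \subseteq C_j \tag{14}$$

As $C_j$ is a fully populated clause over $V$, and $V_{sub} \subseteq V$,

$$(\forall x)(x \in V \iff x \in C_j \text{ or } \neg x \in C_j)$$
$$\implies \forall x \in V_{sub} \implies x \in C_j \text{ or } \neg x \in C_j$$
$$\implies (\forall x)(x \in V_{sub} \iff x \in E \text{ or } \neg x \in E)$$

Thus, $E$ is a fully populated clause over $V_{sub}$.

From Eq. 12 and Eq. 14,

$$\forall D \in P(C_i), \ \exists E \in P(C_j) \tag{15}$$

such that $D$ and $E$ are fully populated clauses over a common variable set, $V_{sub}$.

$$\implies (\forall x)(x \in V_{sub} \iff x \in D \text{ or } \neg x \in D \iff x \in E \text{ or } \neg x \in E) \tag{16}$$

Now, there can be two cases, either $D = E$ or $D \neq E$.

Suppose $D = E$. From Eq. 14,

$$\implies D \in P(C_j)$$

But it is given that $D \notin P(C_j)$,

$$\implies D \neq E$$

Thus, from Eq. 16, $D$ and $E$ are two unequal fully populated clauses over a common variable set, $V_{sub}$,

$\implies D$ and $E$ are sibling clauses.

Hence,

$$\forall D \in P(C_i) \mid D \notin P(C_j)$$
$$\exists E \in P(C_j)$$

such that $D$ and $E$ are sibling clauses.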

## 8 Cardinality of a Boolean formula in CNF

As we know, a clause is a set of literals. For a variable $x$, there are two literals, i.e. $x$ and $\neg x$. Let us represent each literal as $l_i$. So, for each variable $x_i$, there are two literals, $l_i$ and $l_{n+i}$. For a variable set, $V$, of $n$ variables, there are $2n$ literals. So, a general clause in a boolean formula can be written in the form

$$C = \{l_1, l_2, \dots, l_n, l_{n+1}, l_{n+2}, \dots, l_{2n}\}$$

where

$$l_i = \begin{cases} l_i \\ \text{null} \end{cases} \quad \forall l_i \in C$$

As each $l_i$ can be selected in two ways, independently, and there are $2n$ literals in $C$, by the fundamental counting principle the total number of possible clauses is given by

$$|F_{gen}| = \underbrace{2 \times 2 \times 2 \times \dots}_{2n \text{ times}}$$
$$\implies |F_{gen}| = 2^{2n}$$

Hence, the maximum possible cardinality of a boolean formula in CNF is $2^{2n}$, including a null clause, $\phi$.

## 9 Boolean formula in effective CNF

A boolean formula in CNF, $F$, is called a boolean formula in effective CNF if it does not contain a tautology clause. We can write

$$\forall C \in F \implies C \text{ is not a tautology} \tag{17}$$

Or, every $C \in F$ is a non-tautology clause. As discussed in Section 5.1, a tautology clause has no effect on the satisfiability of a CNF. So, for any given boolean formula, if we can identify the tautology clauses and ignore them, we get an effective CNF.

## 10 A complete boolean formula

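A boolean formula, $F_n$, containing every possible non-tautology clause over a set of variables, $V$, including the $\phi$ clause, is called a complete boolean formula. For example, for the variable set $V = \{x_1, x_2\}$, the complete boolean formula is given by

$$F_2 = (x_1 \vee x_2) \wedge (x_1 \vee \neg x_2) \wedge (\neg x_1 \vee x_2) \wedge (\neg x_1 \vee \neg x_2) \wedge (x_1) \wedge (\neg x_1) \wedge (x_2) \wedge (\neg x_2) \wedge \phi$$

where $\phi$ is a null clause.

In set notation, it can be written as:

$$F_2 = \{\{x_1, x_2\}, \{x_1, \neg x_2\}, \{\neg x_1, x_2\}, \{\neg x_1, \neg x_2\}, \{x_1\}, \{\neg x_1\}, \{x_2\}, \{\neg x_2\}, \phi\}$$

###### Theorem 3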

If $F_n$ is a complete boolean formula over a variable set, $V$, of $n$ variables, then $F_n$ contains $3^n$ clauses, including a $\phi$ clause.

###### Proof

Given that $F_n$ is a complete boolean formula over $V$, and $V$ contains $n$ variables,

$$\implies |V| = n$$

From the definition of a complete boolean formula, we know that

$$\forall C \in F_n \implies C \text{ is a non-tautology clause}$$
$$\implies \forall x \in C \implies \neg x \notin C$$

We can write a general clause in $F_n$ as

$$C = (X_1, X_2, X_3, \dots, X_n) \tag{18}$$

where

$$X_i = \begin{cases} x_i \\ \neg x_i \\ \text{null} \end{cases}$$

$X_i$ is a variable which can be assigned the values $x_i$, $\neg x_i$, or null independently. A null value for $X_i$ means that the clause $C$ contains neither $x_i$ nor $\neg x_i$.

Now, each $X_i$ can be assigned a value in 3 different ways. By the fundamental counting principle, the total number of possible clauses is given by

$$n(C) = \underbrace{3 \times 3 \times 3 \times \dots}_{n \text{ times}} \implies n(C) = 3^n$$

Also, there will be a clause in which every $X_i$ is assigned the value null. It results in the clause $\phi$. Hence, $F_n$ contains $3^n$ clauses, including a $\phi$ clause.
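The counting argument of Theorem 3 can be illustrated by generating the complete boolean formula for small $n$. In this sketch, which is only an illustration, each $X_i$ is encoded as `+i` ($x_i$), `-i` ($\neg x_i$), or omitted (null); the name `complete_formula` is hypothetical.

```python
from itertools import product
from typing import FrozenSet, Set

Clause = FrozenSet[int]  # assumed encoding: +i stands for x_i, -i for ¬x_i


def complete_formula(n: int) -> Set[Clause]:
    """Each variable i is present as +i, present as -i, or absent (choice 0)."""
    return {
        frozenset(c * v for v, c in zip(range(1, n + 1), choice) if c != 0)
        for choice in product((1, -1, 0), repeat=n)
    }


f2 = complete_formula(2)
print(len(f2))            # 9 clauses, matching F_2 above
print(frozenset() in f2)  # the null clause is included
```

The enumeration itself takes time proportional to $3^n$, so it is only practical for small variable sets.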

###### Corollary 1

If $F_n$ is a complete boolean formula over a variable set, $V$, with $n$ variables, then, for any given variable $x_i \in V$,

$$n(x_i) = n(\neg x_i) = n(x_i\text{-null}) = 3^{n-1}$$

where $n(x_i)$ is the number of clauses containing $x_i$,

$n(\neg x_i)$ is the number of clauses containing $\neg x_i$, and

$n(x_i\text{-null})$ is the number of clauses containing neither $x_i$ nor $\neg x_i$.

###### Proof

As explained in Theorem 3, for the given complete boolean formula $F_n$, with $n$ variables, we can write a clause in the form

$$C = (X_1, X_2, X_3, \dots, X_n) \tag{19}$$

where

$$X_i = \begin{cases} x_i \\ \neg x_i \\ \text{null} \end{cases}$$

Now suppose we put $X_i = x_i$ for some $i$ in Eq. 19. We get

$$C = (X_1, X_2, X_3, \dots, x_i, \dots, X_n)$$
$$C = (x_i, X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$$

We have assigned the value $x_i$ to one of the variables. There are $n - 1$ variables remaining, to which we can assign values independently. Each variable can be assigned 3 values independently. Thus, by the fundamental counting principle, the total number of clauses with $x_i$ is given by

$$n(x_i) = \underbrace{3 \times 3 \times 3 \times \dots}_{(n-1) \text{ times}}$$
$$n(x_i) = 3^{n-1}$$

Similarly, by assigning $X_i = \neg x_i$ and $X_i = \text{null}$, we find

$$n(\neg x_i) = 3^{n-1}$$

and

$$n(x_i\text{-null}) = 3^{n-1}$$

Hence, we get

$$n(x_i) = n(\neg x_i) = n(x_i\text{-null}) = 3^{n-1}$$

###### Corollary 2

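A complete boolean formula, $F_n$, can be written as:

$$F_n = P(C_1) \cup P(C_2) \cup \dots \cup P(C_p)$$

where $\{C_1, C_2, \dots, C_p\}$ is the set of all possible fully populated clauses over a set of variables, $V$.

###### Proof

As explained in Theorem 3, for the given complete boolean formula $F_n$, with $n$ variables, we can write a clause in the form

$$C = (X_1, X_2, X_3, \dots, X_n) \tag{20}$$

where

$$X_i = \begin{cases} x_i \\ \neg x_i \\ \text{null} \end{cases}$$

But first, if we assign

$$X_i = \begin{cases} x_i \\ \neg x_i \end{cases} \quad \forall X_i \in C$$

we get the set of all fully populated clauses over $V$, say $F_{full}$, given by

$$F_{full} = \{C_1, C_2, \dots, C_p\}$$

Then, for any $C_j \in F_{full}$, we assign

$$X_i = \begin{cases} x \mid x \in C_j \\ \text{null} \end{cases}$$

We get the power set of the clause $C_j$. By assigning values as above for every $C_j \in F_{full}$, we get all possible clauses over $V$. Thus, we can write:

$$F_n = P(C_1) \cup P(C_2) \cup \dots \cup P(C_p) \tag{21}$$

Hence proved.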

###### Theorem 4

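If $P(C)$ is the power set of $C$, where $C$ is a fully populated clause over a variable set, $V$, with $n$ variables, then

$$n(x_i) = n(x_i\text{-null}) = 2^{n-1} \quad \forall x_i \in C$$

where $n(x_i)$ is the number of clauses containing $x_i$ in $P(C)$, and
$n(x_i\text{-null})$ is the number of clauses not containing $x_i$ in $P(C)$.

###### Proof

Given that $C$ is a fully populated clause over a variable set, $V$, with $n$ variables, and $P(C)$ is the power set of $C$. Let $D \in P(C)$. We can write $D$ in the general form

$$D = (X_1, X_2, X_3, \dots, X_n)$$

where

$$X_i = \begin{cases} x_i \mid x_i \in C \\ \text{null} \end{cases}$$

Now, if we put $X_i = x_i$ for some $i$, we get

$$D = (X_1, X_2, X_3, \dots, x_i, \dots, X_n)$$
$$D = (x_i, X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$$

We have assigned the value $x_i$ to one of the variables. There are $n - 1$ variables remaining, to which we can assign values independently. Each variable can be assigned 2 values independently. Thus, by the fundamental counting principle, the total number of clauses with $x_i$ is given by

$$n(x_i) = \underbrace{2 \times 2 \times 2 \times \dots}_{(n-1) \text{ times}}$$
$$n(x_i) = 2^{n-1}$$

Similarly, by assigning $X_i = \text{null}$, we find

$$n(x_i\text{-null}) = 2^{n-1}$$

Hence, we get

$$n(x_i) = n(x_i\text{-null}) = 2^{n-1}$$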

###### Theorem 5

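If there exists a fully populated clause, $C_k$, over $V$, such that

$$F = F_n \setminus P(C_k)$$

where $F_n$ is a complete boolean formula over $V$, then $F$ is satisfiable.

###### Proof

Given that

$$F = F_n \setminus P(C_k)$$

Suppose $D$ is any clause in $F$, i.e. $D \in F$,

$$\implies D \in F_n \setminus P(C_k)$$
$$\implies D \in F_n \mid D \notin P(C_k)$$

Let $\{C_1, C_2, \dots, C_p\}$ be the set of all fully populated clauses over $V$. Then, from Corollary 2,

$$\implies D \in P(C_1) \cup P(C_2) \cup \dots \cup P(C_p) \mid D \notin P(C_k)$$

As $C_1, C_2, \dots, C_p$ are unequal fully populated clauses over $V$, it follows from the definition of sibling clauses that $C_1, C_2, \dots, C_p$ are sibling clauses, including $C_k$.

Let $D \in P(C_i)$, where $C_i$ is any clause in $\{C_1, C_2, \dots, C_p\}$ but $C_i \neq C_k$,

$$\implies D \in P(C_i) \mid D \notin P(C_k)$$

From Lemma 8,

$$\exists E \in P(C_k)$$

such that $D$ and $E$ are sibling clauses.

As $D$ is any clause in $F$,

$$\implies \forall D \in F, \ \exists E \in P(C_k)$$

such that $D$ and $E$ are sibling clauses.

As $C_k$ is a fully populated clause, for a valuation given by

$$C_k = 0 \ (\text{false})$$

from Theorem 1,

$$\forall E \subseteq C_k \implies E = 0 \ (\text{false})$$
$$\implies \forall E \in P(C_k) \implies E = 0 \ (\text{false})$$

As $D$ and $E$ are sibling clauses, from Lemma 7,

$$\implies \forall D \in F \implies D = 1 \ (\text{true})$$

for the valuation $C_k = 0$,

$\implies F$ is satisfiable.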
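Theorem 5 can be checked by brute force for a small variable set. The following sketch is illustrative only: it assumes the integer encoding `+i` for $x_i$ and `-i` for $\neg x_i$, and the helper names `complete_formula`, `power_set`, and `evaluate` are hypothetical.

```python
from itertools import product


def complete_formula(n):
    """All 3^n non-tautology clauses over x_1..x_n (+i is x_i, -i is ¬x_i)."""
    return {
        frozenset(c * v for v, c in zip(range(1, n + 1), choice) if c)
        for choice in product((1, -1, 0), repeat=n)
    }


def power_set(clause):
    """All subsets of a clause, including the null clause."""
    lits = sorted(clause)
    return {
        frozenset(l for l, keep in zip(lits, mask) if keep)
        for mask in product((False, True), repeat=len(lits))
    }


def evaluate(clause, valuation):
    """A clause is true when at least one of its literals is true."""
    return any(valuation[abs(l)] == (l > 0) for l in clause)


n = 3
c_k = frozenset({1, -2, 3})                      # a fully populated clause
formula = complete_formula(n) - power_set(c_k)   # F = F_n \ P(C_k)
valuation = {abs(l): l < 0 for l in c_k}         # the valuation with C_k = 0

print(len(formula), all(evaluate(c, valuation) for c in formula))
```

For $n = 3$ the formula has $3^3 - 2^3 = 19$ clauses, and every one of them is true under the valuation that makes $C_k$ false.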

###### Theorem 6

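If $F$ is satisfiable, and $F_{sub} \subseteq F$, then $F_{sub}$ is satisfiable.

###### Proof

Given that

$$F_{sub} \subseteq F$$
$$\implies \forall C \in F_{sub} \implies C \in F$$

As $F$ is satisfiable, there exists a valuation under which every clause of $F$ is true. Since every clause of $F_{sub}$ is also a clause of $F$, under the same valuation,

$$\implies \forall C \in F_{sub} \implies C = 1 \ (\text{true})$$

Hence, $F_{sub}$ is satisfiable.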

###### Theorem 7

If $F$ is satisfiable, then there exists a fully populated clause, $C_k$, such that

$$\forall E \in P(C_k) \implies E \notin F$$

###### Proof

Given that $F$ is satisfiable.

Now suppose, for the sake of contradiction, that there does not exist a fully populated clause, $C_k$, such that

$$\forall E \in P(C_k) \implies E \notin F$$
$$\implies \forall C_k, \ \exists E \in P(C_k) \mid E \in F$$

From Lemma 6, for any valuation to the variable set, $V$, $\exists C_k \mid C_k = 0$

$\implies$ for any valuation to the variable set, $V$, in which $C_k = 0$,

$$\exists E \in P(C_k) \mid E \in F$$

$\implies$, from Theorem 1, for any valuation to the variable set, $V$, in which $C_k = 0$, $E = 0$

$\implies$ for any valuation to the variable set, $V$,

$$\exists E \in F \mid E = 0$$

$\implies F$ is unsatisfiable, which is a contradiction, so our assumption was wrong. Hence, there exists a fully populated clause, $C_k$, such that

$$\forall E \in P(C_k) \implies E \notin F$$

Hence proved.

###### Theorem 8

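$F$ is satisfiable if and only if there exists a fully populated clause, $C_k$, such that

$$\forall E \in P(C_k) \implies E \notin F$$

###### Proof

Let $F$ be satisfiable. From Theorem 7, this implies that there exists a fully populated clause, $C_k$, such that

$$\forall E \in P(C_k) \implies E \notin F$$

Conversely, let there exist a fully populated clause, $C_k$, such that

$$\forall E \in P(C_k) \implies E \notin F$$

Let $F_n$ be a complete boolean formula over $V$. Then we can write

$$F \subseteq F_n \setminus P(C_k)$$

From Theorem 5, $F_n \setminus P(C_k)$ is satisfiable, and so, from Theorem 6, $F$ is satisfiable.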
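The condition of Theorem 8 can be compared against a plain truth-table check on small formulas. The sketch below is an illustration under stated assumptions: literals are encoded as `+i` and `-i`, clauses are `frozenset`s, and the function names are hypothetical. Both routines enumerate $2^n$ candidates, so neither is meant as an efficient solver.

```python
from itertools import product


def satisfiable_by_condition(formula, n):
    """Theorem 8: look for a fully populated C_k none of whose subsets is in F."""
    for signs in product((1, -1), repeat=n):
        c_k = frozenset(s * v for v, s in zip(range(1, n + 1), signs))
        if not any(clause <= c_k for clause in formula):
            return True
    return False


def satisfiable_by_truth_table(formula, n):
    """Reference check: try every valuation of x_1..x_n."""
    for bits in product((False, True), repeat=n):
        valuation = dict(zip(range(1, n + 1), bits))
        if all(any(valuation[abs(l)] == (l > 0) for l in c) for c in formula):
            return True
    return False


sat = [frozenset({1, 2, -3}), frozenset({-1, 2, 3})]   # the formula of Eq. 22
unsat = [frozenset({1}), frozenset({-1})]              # x1 ∧ ¬x1

for formula, n in ((sat, 3), (unsat, 1)):
    print(satisfiable_by_condition(formula, n),
          satisfiable_by_truth_table(formula, n))
```

Testing `clause <= c_k` for each clause of $F$ is equivalent to checking whether any member of $P(C_k)$ occurs in $F$, without materialising the power set.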

## 11 Time complexity of an algorithm to solve boolean satisfiability problem

###### Theorem 9

The time complexity of an algorithm to solve the boolean satisfiability problem is $O(2^n)$.

###### Proof

Theorem 8 provides the necessary and sufficient condition for the satisfiability of a boolean formula. It states that a boolean formula is satisfiable if and only if there exists a fully populated clause, $C_k$, such that the clause and all its subsets are absent from the given boolean formula.

Moreover, the valuation in which $C_k = 0$ makes that formula satisfiable. So, an algorithm is not required to check for the absence of each subset of $C_k$ individually, as all subsets are related to each other by the result of Theorem 1. However, if an algorithm processes the individual clauses of a given boolean formula, it would need to process an exponential number of clauses (up to $3^n$, by Theorem 3) in worst-case scenarios. But the existence of each fully populated clause is independent of the existence of the other clauses. So, any algorithm must search for a fully populated clause, $C_k$, in the list of all possible fully populated clauses for the variable set, $V$, over which the boolean formula has been defined. This implies that the boolean satisfiability problem is basically a searching problem.

Further, the input of the satisfiability problem is not stated to be in sorted form [Karp1972]. The algorithm may attempt to sort the input, but this leads to an additional complexity of $O(N \log N)$, where $N$ is the number of elements in the list. From Theorem 2, we know that the number of fully populated clauses for a variable set, $V$, of $n$ variables is $2^n$. This implies an additional complexity of $O(2^n \log 2^n)$, or $O(n 2^n)$. It follows that the problem can be solved using linear search only. The time complexity of linear search is $O(N)$, where $N$ is the number of items in the list. From Theorem 2, we know that the number of fully populated clauses for a variable set, $V$, of $n$ variables is $2^n$. This implies that the time complexity of an algorithm to solve the boolean satisfiability problem is $O(2^n)$. Hence proved.

## 12 Implications on P vs NP Problem

In [Karp1972] it was established that SATISFIABILITY is an NP-complete problem. In Theorem 9, it has been proved that the time complexity of an algorithm to solve the boolean satisfiability problem is $O(2^n)$, which is not polynomial. This implies

$$\text{SATISFIABILITY} \notin P$$

From Corollary 1 in [Karp1972],

$$\implies P \neq NP$$

## 13 Cardinality Function

The cardinality function can be used to determine the cardinality of a given boolean formula. It can be used to optimise existing algorithms. For a given boolean formula in CNF, we define a function $f(X, X^c)$, where $X = \{x_1, \dots, x_n\}$ and $X^c = \{\neg x_1, \dots, \neg x_n\}$, by replacing disjunction with multiplication and conjunction with addition. For example, let the boolean formula, $F$, be given by

$$F = (x_1 \vee x_2 \vee \neg x_3) \wedge (\neg x_1 \vee x_2 \vee x_3) \tag{22}$$

We define the function, $f$, given by:

$$f(X, X^c) = (x_1 * x_2 * \neg x_3) + (\neg x_1 * x_2 * x_3) \tag{23}$$

In general, $f$ can be defined as:

$$f(X, X^c) = \sum_{i=1}^{p} M_i \quad \forall i \in [1, p] \tag{24}$$

where $p$ is the number of clauses in $F$ and

$$M_i = \prod x_j \quad \forall x_j \in C_i \tag{25}$$

where $C_i$ is the $i$-th clause in the given boolean formula, $F$. It is to be noted that $f$ is a function on integers.

An algorithm for evaluating $f$ for the boolean formula given in Eq. 22 follows directly from Eq. 23.
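A minimal Python sketch of Eq. 24 and Eq. 25 is shown here as an illustration; it is not the original listing. It assumes that literals are encoded as `+i` for $x_i$ and `-i` for $\neg x_i$, and that each literal is given its own integer value through a dictionary. The name `cardinality_function` is hypothetical.

```python
from math import prod
from typing import Dict, List, Tuple

Clause = Tuple[int, ...]  # assumed encoding: +i stands for x_i, -i for ¬x_i


def cardinality_function(formula: List[Clause], value: Dict[int, int]) -> int:
    """Eq. 24: sum over clauses of the product of their literal values (Eq. 25)."""
    return sum(prod(value[lit] for lit in clause) for clause in formula)


# Eq. 22: F = (x1 ∨ x2 ∨ ¬x3) ∧ (¬x1 ∨ x2 ∨ x3)
F = [(1, 2, -3), (-1, 2, 3)]

# Arbitrary integer values; x_i and ¬x_i are treated as independent integers.
value = {1: 2, 2: 3, 3: 5, -1: 7, -2: 11, -3: 13}
print(cardinality_function(F, value))  # 2*3*13 + 7*3*5 = 183
```

Each literal occurrence is visited once, so a single evaluation is linear in the total size of the formula.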

### 13.1 Total number of clauses

An algorithm based on $f$ can be used to determine the total number of clauses in $F$.
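One possible reading, offered as an assumption rather than as the original algorithm, is to evaluate $f$ with every literal set to 1: each product $M_i$ then equals 1, so the sum equals the number of clauses $p$. The names below are hypothetical and the integer encoding (`+i`, `-i`) is assumed.

```python
from math import prod


def cardinality_function(formula, value):
    """f = sum over clauses of the product of the literal values."""
    return sum(prod(value[lit] for lit in clause) for clause in formula)


def total_clauses(formula, n):
    """Set x_i = ¬x_i = 1 for every variable so that each clause contributes 1."""
    ones = {lit: 1 for var in range(1, n + 1) for lit in (var, -var)}
    return cardinality_function(formula, ones)


F = [(1, 2, -3), (-1, 2, 3)]   # the formula of Eq. 22
print(total_clauses(F, 3))
```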

### 13.2 Total number of clauses containing xi

An algorithm based on $f$ can be used to determine the total number of clauses containing $x_i$ in $F$.
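A sketch of one way this count could be obtained from $f$, stated as an assumption: with all literals set to 1, $f$ gives the total number of clauses; setting only the chosen literal to 0 removes exactly the clauses that contain it, so the difference is the required count. The function names and the integer encoding are hypothetical.

```python
from math import prod


def cardinality_function(formula, value):
    """f = sum over clauses of the product of the literal values."""
    return sum(prod(value[lit] for lit in clause) for clause in formula)


def clauses_containing(formula, n, literal):
    """Count clauses containing `literal` (+i for x_i, -i for ¬x_i)."""
    ones = {lit: 1 for var in range(1, n + 1) for lit in (var, -var)}
    zeroed = dict(ones)
    zeroed[literal] = 0   # clauses with this literal now contribute 0
    return cardinality_function(formula, ones) - cardinality_function(formula, zeroed)


F = [(1, 2, -3), (-1, 2, 3)]   # the formula of Eq. 22
print(clauses_containing(F, 3, 1), clauses_containing(F, 3, 2))
```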

### 13.3 Total number of clauses containing ¬xi

An algorithm based on $f$ can be used to determine the total number of clauses containing $\neg x_i$ in $F$.

### 13.4 Total number of clauses containing xi or ¬xi

An algorithm based on $f$ can be used to determine the total number of clauses containing $x_i$ or $\neg x_i$ in $F$.
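As an illustrative assumption, this count can be obtained by setting both $x_i$ and $\neg x_i$ to 0 while every other literal is 1, and subtracting the result from the total number of clauses. The names and the integer encoding (`+i`, `-i`) below are hypothetical.

```python
from math import prod


def cardinality_function(formula, value):
    """f = sum over clauses of the product of the literal values."""
    return sum(prod(value[lit] for lit in clause) for clause in formula)


def clauses_mentioning(formula, n, var):
    """Count clauses that contain x_var or ¬x_var."""
    ones = {lit: 1 for v in range(1, n + 1) for lit in (v, -v)}
    zeroed = dict(ones)
    zeroed[var] = 0
    zeroed[-var] = 0
    return cardinality_function(formula, ones) - cardinality_function(formula, zeroed)


F = [(1, 2, -3), (-1, 2, 3), (2,)]
print([clauses_mentioning(F, 3, v) for v in (1, 2, 3)])
```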

### 13.5 Checking tautology clauses in F

An algorithm based on $f$ can be used to check for the existence of tautology clauses in a given formula, $F$, in polynomial time.
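One way such a check could be built from $f$, given here as an assumption rather than as the original algorithm, uses inclusion and exclusion: the number of clauses containing both $x_i$ and $\neg x_i$ equals $n(x_i) + n(\neg x_i)$ minus the number of clauses containing either. A positive value for some $i$ indicates a tautology clause. The function names and integer encoding are hypothetical.

```python
from math import prod


def cardinality_function(formula, value):
    """f = sum over clauses of the product of the literal values."""
    return sum(prod(value[lit] for lit in clause) for clause in formula)


def has_tautology(formula, n):
    """True when some clause contains both x_i and ¬x_i for some variable i."""
    ones = {lit: 1 for v in range(1, n + 1) for lit in (v, -v)}
    total = cardinality_function(formula, ones)
    for v in range(1, n + 1):
        with_pos = total - cardinality_function(formula, {**ones, v: 0})
        with_neg = total - cardinality_function(formula, {**ones, -v: 0})
        with_either = total - cardinality_function(formula, {**ones, v: 0, -v: 0})
        if with_pos + with_neg - with_either > 0:   # clauses holding both literals
            return True
    return False


print(has_tautology([(1, 2, -3), (-1, 2, 3)], 3))   # the formula of Eq. 22
print(has_tautology([(1, 2, -3), (2, -2, 3)], 3))   # second clause is a tautology
```

The loop performs three evaluations of $f$ per variable, each linear in the size of the formula, so the running time is polynomial in the input size.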

## 14 Optimisations

The results of the above theorems, together with the algorithms based on the novel cardinality function, can be used to optimise any algorithm that solves the satisfiability problem.

From Section 8, we know that the maximum possible cardinality of a boolean formula in CNF is $2^{2n}$, including a null clause, $\phi$.

We know that the cardinality of the power set of $C_k$ is $2^{|C_k|}$,

$$|P(C_k)| = 2^n$$

where $|C_k| = n$, as $C_k$ is a fully populated clause.

From Theorem 8, we know that $F = F_n \setminus P(C_k)$ is satisfiable. However, if $F$ contains any more clauses,

$$\implies \exists D \in F \mid D \in P(C_k)$$

$\implies F$ is unsatisfiable.

So, we can use following constraint for optimisation:

• If $|F| > 2^{2n} - 2^n$ then $F$ is unsatisfiable.

If $F$ is in effective CNF, which can be checked using the algorithm given in Algorithm 6, then we have:

• If $|F| > 3^n - 2^n$ then $F$ is unsatisfiable.

• If then is unsatisfiable.

• If then , belong to the solution

• If then , belong to the solution

## 15 Conclusion

A necessary and sufficient condition has been established to determine the satisfiability of a boolean formula in CNF. It has been found that a satisfiable boolean formula with $n$ variables can have $3^n - 2^n$ clauses. This property can be used to improve encryption algorithms [Cook2006], while the same condition can be used to optimise existing algorithms for solving the Boolean Satisfiability problem, which has applications in automatic theorem-proving procedures [dpll].