Towards Bit-Width-Independent Proofs in SMT Solvers

05/24/2019 ∙ by Aina Niemetz, et al. ∙ 0

Many SMT solvers implement efficient SAT-based procedures for solving fixed-size bit-vector formulas. These approaches, however, cannot be used directly to reason about bit-vectors of symbolic bit-width. To address this shortcoming, we propose a translation from bit-vector formulas of non-fixed bit-width to formulas in a logic supported by SMT solvers that includes non-linear integer arithmetic, uninterpreted functions, and universal quantification. While this logic is undecidable, this approach can still solve many formulas by capitalizing on advancements in SMT solving for non-linear arithmetic and universally quantified formulas. We provide several case studies in which we have applied this approach with promising results, including the bit-width independent verification of invertibility conditions, compiler optimizations, and bit-vector rewrites.

1 Introduction

Satisfiability Modulo Theories (SMT) solving for the theory of fixed-size bit-vectors has received a lot of interest in recent years. Many applications rely on bit-precise reasoning as provided by SMT solvers, and the number of solvers that participate in the corresponding divisions of the annual SMT competition is high and increasing. Although theoretically difficult (e.g., [14]), bit-vector solvers are in practice highly efficient and typically implement SAT-based procedures. Reasoning about fixed-size bit-vectors suffices for many applications. In hardware verification, the size of a circuit is usually known, and in software verification, machine integers are treated as fixed-size bit-vectors, where the width depends on the underlying architecture. Current solving approaches, however, do not generalize beyond this limitation, i.e., they cannot reason about parametric circuits or machine integers of arbitrary size.

In this paper we focus on bit-vector formulas with parametric bit-width. In essence, we replace the translation from fixed-size bit-vectors to propositional logic (which is at the core of state-of-the-art bit-vector solvers) with a translation to the quantified theories of integer arithmetic and uninterpreted functions. We use a completely automated verification process capitalizing on recent advances in SMT solving for these theories. The reliability of our approach depends on the correctness of the SMT solvers in use. Interactive theorem provers such as Isabelle and Coq [19, 27], on the other hand, target applications where trust is of higher importance than automation, although substantial progress towards incorporating automation has been made in recent years [5]. Our long-term goal is an efficient automated framework for proving bit-width independent properties in a trusted proof assistant, which requires both a formalization of such properties in a proof assistant and the development of efficient automated techniques to reason about these properties. This work shows that state-of-the-art SMT solving combined with our encoding techniques make the latter feasible. The next steps towards this goal are described in the final section of this paper.

Translating a formula from the theory of fixed-size bit-vectors to the theory of integer arithmetic is not straightforward. This is due to the fact that the semantics of bit-vector operators are defined modulo the bit-width, which must be expressed using exponentiation terms $2^n$ (with $n$ as the bit-width) when translated to integer arithmetic. Most SMT solvers, however, do not support unrestricted exponentiation. Further, bit-wise operators such as bit-wise and, or, and left and right shifts do not have a natural representation in integer arithmetic. These operations, as well as concatenation and extraction, seem more suitable for the theory of strings, whose combination with quantified formulas is not well supported by current SMT solvers. Although these operators are definable in the theory of integer arithmetic using a $\beta$-function encoding (e.g. [10]), such a translation is expensive as it requires an encoding of sequences into natural numbers. Instead, we introduce an uninterpreted function (UF) for each of these operators and axiomatize them, which shifts some of the burden from integer arithmetic to UF reasoning. We consider two approaches for axiomatization: a complete axiomatization using induction, and a partial (hand-crafted) axiomatization as an under-approximation.

To evaluate the potential of our approach, we examine three case studies that arise from real applications where reasoning about bit-width independent properties is essential. Niemetz et al. [18] defined invertibility conditions for bit-vector operators. The authors utilized these conditions to solve quantified bit-vector formulas and verified their correctness up to bit-width 65. As a first case study, we consider the bit-width independent verification of these invertibility conditions, which [18] left to future work. As a second case study, we examine the bit-width independent verification of compiler optimizations of LLVM. For that, we use the Alive tool [16], which generates verification conditions for such optimizations in the theory of fixed-size bit-vectors. Proving the correctness of these optimizations for arbitrary bit-widths ensures their correctness for any language and underlying architecture rather than specific ones. As a third case study, we consider the bit-width independent verification of rewrite rules for the theory of fixed-size bit-vectors. SMT solvers for this theory heavily rely on such rules to simplify the input. Verifying their correctness is essential and typically done by hand.

To summarize, this paper makes the following contributions.

  • In Section 3, we study complete and incomplete encodings of bit-vector formulas with parametric bit-width using integer arithmetic.

  • In Section 4, the effectiveness of each approach is evaluated in three case studies.

  • As part of the invertibility condition case study in Section 4, we introduce conditional inverses for bit-vector constraints, thus augmenting the invertibility conditions from [18] with more concrete parametric solutions.

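Symbol | SMT-LIB Syntax | Sort
$\approx$, $\not\approx$ | =, distinct | $\sigma_{[n]} \times \sigma_{[n]} \to \mathsf{Bool}$
$<_u$, $>_u$, $<_s$, $>_s$ | bvult, bvugt, bvslt, bvsgt | $\sigma_{[n]} \times \sigma_{[n]} \to \mathsf{Bool}$
$\leq_u$, $\geq_u$, $\leq_s$, $\geq_s$ | bvule, bvuge, bvsle, bvsge | $\sigma_{[n]} \times \sigma_{[n]} \to \mathsf{Bool}$
$\sim$, $-$ | bvnot, bvneg | $\sigma_{[n]} \to \sigma_{[n]}$
$\&$, $\mid$, $\oplus$ | bvand, bvor, bvxor | $\sigma_{[n]} \times \sigma_{[n]} \to \sigma_{[n]}$
$\ll$, $\gg$, $\gg_a$ | bvshl, bvlshr, bvashr | $\sigma_{[n]} \times \sigma_{[n]} \to \sigma_{[n]}$
$+$, $\cdot$, $\mathrm{mod}$, $\mathrm{div}$ | bvadd, bvmul, bvurem, bvudiv | $\sigma_{[n]} \times \sigma_{[n]} \to \sigma_{[n]}$
$[u:l]$ | extract ($0 \leq l \leq u < n$) | $\sigma_{[n]} \to \sigma_{[u-l+1]}$
$\circ$ | concat (concatenation) | $\sigma_{[n]} \times \sigma_{[m]} \to \sigma_{[n+m]}$
Table 1: Considered bit-vector operators with SMT-LIB 2 syntax.

Related Work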

Bit-width independent bit-vector formulas were studied in [21], where a formal language for bit-vectors of parametric width is introduced, along with a semantics and a decision procedure. The formal language that we use here can be seen as a simplified variant of that language. A unification-based algorithm for fixed-sized bit-vectors was introduced in [4], and was elevated to handle symbolic lengths for some common cases. Bit-width independent formulas are related to parametric Boolean functions and circuits. An inductive approach for reasoning about such formalisms was developed in [12, 11], by considering a Boolean function for the base case of a circuit and a Boolean function for its inductive step. Reasoning about equivalence of such circuits can be embedded in the framework of parametric width bit-vectors (see [21]).

2 Preliminaries

We briefly review the usual notions and terminology of many-sorted first-order logic with equality (denoted by $\approx$). See [10, 28] for more detailed information. Let $S$ be a set of sort symbols, and for every sort $\sigma \in S$, let $X_\sigma$ be an infinite set of variables of sort $\sigma$. We assume that sets $X_\sigma$ are pairwise disjoint and define $X$ as the union of sets $X_\sigma$. A signature $\Sigma$ consists of a set $\Sigma^s \subseteq S$ of sort symbols and a set $\Sigma^f$ of function symbols. Arities of function symbols are defined in the usual way. Constants are treated as $0$-ary functions. We assume that $\Sigma$ includes a Boolean sort $\mathsf{Bool}$ and the Boolean constants $\top$ (true) and $\bot$ (false). Functions returning $\mathsf{Bool}$ are also called predicates.

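We assume the usual definitions of well-sorted terms, literals, and formulas, and refer to them as $\Sigma$-terms, $\Sigma$-literals, and $\Sigma$-formulas, respectively. We define $\mathbf{x} = (x_1, \ldots, x_n)$ as a tuple of variables and write $Q\mathbf{x}\,\varphi$ with $Q \in \{\forall, \exists\}$ for a quantified formula $Qx_1 \cdots Qx_n\,\varphi$. For a $\Sigma$-term or $\Sigma$-formula $e$, we denote the free variables of $e$ (defined as usual) as $FV(e)$ and use $e[\mathbf{x}]$ to denote that the variables in $\mathbf{x}$ occur free in $e$. For a tuple of $\Sigma$-terms $\mathbf{t} = (t_1, \ldots, t_n)$ and a tuple of $\Sigma$-variables $\mathbf{x} = (x_1, \ldots, x_n)$, we write $e\{\mathbf{x} \mapsto \mathbf{t}\}$ for the term or formula obtained from $e$ by simultaneously replacing each occurrence of $x_i$ in $e$ by $t_i$.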

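A $\Sigma$-interpretation $\mathcal{I}$ maps: each $\sigma \in \Sigma^s$ to a distinct non-empty set of values $\sigma^{\mathcal{I}}$ (the domain of $\sigma$ in $\mathcal{I}$); each $x \in X_\sigma$ to an element $x^{\mathcal{I}} \in \sigma^{\mathcal{I}}$; and each $f^{\sigma_1 \cdots \sigma_n \sigma} \in \Sigma^f$ to a total function $f^{\mathcal{I}} : \sigma_1^{\mathcal{I}} \times \cdots \times \sigma_n^{\mathcal{I}} \to \sigma^{\mathcal{I}}$ if $n > 0$, and to an element in $\sigma^{\mathcal{I}}$ if $n = 0$. We use the usual inductive definition of a satisfiability relation $\models$ between $\Sigma$-interpretations and $\Sigma$-formulas.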

A theory $T$ is a pair $(\Sigma, \mathbf{I})$, where $\Sigma$ is a signature and $\mathbf{I}$ is a non-empty class of $\Sigma$-interpretations that is closed under variable reassignment, i.e., if interpretation $\mathcal{I}'$ only differs from an $\mathcal{I} \in \mathbf{I}$ in how it interprets variables, then also $\mathcal{I}' \in \mathbf{I}$. A $\Sigma$-formula $\varphi$ is $T$-satisfiable (resp. $T$-unsatisfiable) if it is satisfied by some (resp. no) interpretation in $\mathbf{I}$; it is $T$-valid if it is satisfied by all interpretations in $\mathbf{I}$. We will sometimes omit $T$ when the theory is understood from context.

The theory $T_{BV} = (\Sigma_{BV}, \mathbf{I}_{BV})$ of fixed-size bit-vectors as defined in the SMT-LIB 2 standard [3] consists of the class of interpretations $\mathbf{I}_{BV}$ and signature $\Sigma_{BV}$, which includes a unique sort for each positive integer $n$ (representing the bit-vector width), denoted here as $\sigma_{[n]}$. For a given positive integer $n$, the domain $\sigma_{[n]}^{\mathcal{I}}$ of sort $\sigma_{[n]}$ in $\mathcal{I}$ is the set of all bit-vectors of size $n$. We assume that $\Sigma_{BV}$ includes all bit-vector constants of sort $\sigma_{[n]}$ for each $n$, represented as bit-strings. However, to simplify the notation we will sometimes denote them by the corresponding natural number in $\{0, \ldots, 2^n - 1\}$. All interpretations $\mathcal{I} \in \mathbf{I}_{BV}$ are identical except for the value they assign to variables. They interpret sort and function symbols as specified in SMT-LIB 2. All function symbols in $\Sigma_{BV}^f$ are overloaded for every $\sigma_{[n]} \in \Sigma_{BV}^s$. We denote a $\Sigma_{BV}$-term (or bit-vector term) $t$ of width $n$ as $t_{[n]}$ when we want to specify its bit-width explicitly. We refer to the $i$-th bit of $t_{[n]}$ as $t[i]$ with $0 \leq i < n$. We interpret $t[0]$ as the least significant bit (LSB), and $t[n-1]$ as the most significant bit (MSB), and denote bit ranges over $t$ from index $j$ down to $i$ as $t[j:i]$. The unsigned interpretation of a bit-vector $t_{[n]}$ as a natural number is given by $[t]_{\mathbb{N}} = \sum_{i=0}^{n-1} t[i] \cdot 2^i$, and its signed interpretation as an integer is given by $[t]_{\mathbb{Z}} = -t[n-1] \cdot 2^{n-1} + [t[n-2:0]]_{\mathbb{N}}$.
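The two interpretations can be made concrete with a small sketch. The function names and the choice of representing a bit-vector as a Python string written MSB first are assumptions made for illustration only; they are not part of the paper or of SMT-LIB.

```python
def unsigned_value(bits: str) -> int:
    """Unsigned interpretation; `bits` is written MSB first, e.g. "1000"."""
    n = len(bits)
    # bits[n - 1 - i] holds bit i, so bit i contributes 2**i.
    return sum(int(bits[n - 1 - i]) * 2**i for i in range(n))


def signed_value(bits: str) -> int:
    """Two's-complement interpretation of a non-empty bit-string."""
    n = len(bits)
    msb = int(bits[0])
    # The MSB has weight -2**(n-1); the remaining bits are read as unsigned.
    return -msb * 2 ** (n - 1) + unsigned_value(bits[1:])


if __name__ == "__main__":
    for s in ("0111", "1000", "1111"):
        print(s, unsigned_value(s), signed_value(s))
```

For width 4, the bit-string 0111 is the largest signed value and 1000 the smallest, which is why the two interpretations differ exactly when the MSB is set.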

Without loss of generality, we consider a restricted set of bit-vector function and predicate symbols (or bit-vector operators) as listed in Table 1. The selection of operators in this set is arbitrary but complete in the sense that it suffices to express all bit-vector operators defined in SMT-LIB 2. We use $\mathrm{max}_{s[k]}$ ($\mathrm{min}_{s[k]}$) for the maximum or minimum signed value of width $k$, e.g., $\mathrm{max}_{s[4]} = 0111$ and $\mathrm{min}_{s[4]} = 1000$.

The theory $T_{IA} = (\Sigma_{IA}, \mathbf{I}_{IA})$ of integer arithmetic is also defined as in the SMT-LIB 2 standard. The signature $\Sigma_{IA}$ includes a single sort $\mathsf{Int}$, function and predicate symbols $\{+, -, \cdot, \mathrm{div}, \mathrm{mod}, |\ldots|, <, \leq, >, \geq\}$, and a constant symbol for every integer value. We further extend $\Sigma_{IA}$ to include exponentiation, denoted in the usual way as $a^b$. All interpretations $\mathcal{I} \in \mathbf{I}_{IA}$ are identical except for the values they assign to variables. We write $T_{UFIA}$ to denote the (combined) theory of uninterpreted functions with integer arithmetic. Its signature is the union of the signature of $T_{IA}$ with a signature containing a set of (freely interpreted) function symbols, called uninterpreted functions.

2.1 Parametric Bit-Vector Formulas

We are interested in reasoning about (classes of) $\Sigma_{BV}$-formulas that hold independently of the sorts assigned to their variables or terms. We formalize the notion of parametric $\Sigma_{BV}$-formulas in the following.

In the remainder of the paper, we fix sets $X^*$ and $Z^*$ of variable and constant symbols (respectively) of bit-vector sort. We use the symbols in $X^*$ (resp. $Z^*$) for representing variables (resp. constants) of bit-vector sort whose bit-width is not fixed. The bit-width is instead represented using a map $\omega^b$, which maps from symbols $x \in X^* \cup Z^*$ to $\Sigma_{IA}$-terms, where we refer to $\omega^b(x)$ as the symbolic bit-width assigned by $\omega^b$ to $x$. We also define the map $\omega^N$ from symbols $z \in Z^*$ to $\Sigma_{IA}$-terms, where we call $\omega^N(z)$ the symbolic value assigned by $\omega^N$ to $z$. Let $\omega = (\omega^b, \omega^N)$, and let $FV(\omega)$ be the set of (integer) free variables occurring in the range of either $\omega^b$ or $\omega^N$. We say that $\omega$ is legal if for every interpretation $\mathcal{I}$ that interprets each variable in $FV(\omega)$ as a positive integer, and for every $x \in X^* \cup Z^*$, $\mathcal{I}$ also interprets $\omega^b(x)$ as a positive integer.

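Let $\varphi$ be a formula built from the symbols of $\Sigma_{BV}$ and $X^* \cup Z^*$, ignoring their sorts. We refer to $\varphi$ as a parametric $\Sigma_{BV}$-formula. We may interpret $\varphi$ as a class of fixed-size bit-vector formulas as follows. For each symbol $x \in X^*$ and positive integer $n$, we associate a unique variable $x_n$ of (fixed) bit-vector sort $\sigma_{[n]}$. Given a legal $\omega$ with $\omega = (\omega^b, \omega^N)$ and an interpretation $\mathcal{I}$ that maps each variable in $FV(\omega)$ to a positive integer, let $\varphi|_{\omega,\mathcal{I}}$ be the result of replacing all symbols $x \in X^*$ in $\varphi$ by the corresponding bit-vector variable $x_k$ and all symbols $z \in Z^*$ in $\varphi$ by the bit-vector constant of sort $\sigma_{[k]}$ corresponding to $\omega^N(z)^{\mathcal{I}} \bmod 2^k$, where in both cases $k$ is the value of $\omega^b$ on that symbol under $\mathcal{I}$. We say a formula $\varphi$ is well-sorted under $\omega$ if $\omega$ is legal and $\varphi|_{\omega,\mathcal{I}}$ is a well-sorted $\Sigma_{BV}$-formula for all $\mathcal{I}$ mapping variables in $FV(\omega)$ to positive integers.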

Example 1

Let be the set and be the set where and . Let be the formula . We have that is well-sorted under where or . It is not well-sorted when since is not a well-sorted -formula whenever . An where is not legal, since is possible even when and .

Notice that symbolic constants such as the maximum unsigned constant of a symbolic length $w$ can be represented by introducing $z \in Z^*$ with $\omega^b(z) = w$ and $\omega^N(z) = 2^w - 1$. Furthermore, recall that signature $\Sigma_{BV}$ includes the bit-vector extract operator, which is parameterized by two natural numbers $u$ and $l$. We do not lift the above definitions to handle extract operations having symbolic ranges, e.g. where $u$ and $l$ are $\Sigma_{IA}$-terms. This is for simplicity and comes at no loss of expressive power since constraints involving extract can be equivalently expressed using constraints involving concatenation. For example, showing that every instance of a constraint $s \approx t[u:l]$ holds where $0 < l \leq u < n - 1$ is equivalent to showing that $t \approx y_1 \circ (y_2 \circ y_3) \Rightarrow s \approx y_2$ holds for all $y_1, y_2, y_3$ where $y_1, y_2, y_3$ have sorts $\sigma_{[n-1-u]}$, $\sigma_{[u-l+1]}$, $\sigma_{[l]}$, respectively. We may reason about a formula involving a symbolic range $[u:l]$ of $t$ by considering a parametric bit-vector formula that encodes a formula of the latter form, where the appropriate symbolic bit-widths are assigned to symbols introduced for $y_1, y_2, y_3$.
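The replacement of extract by concatenation can be sanity-checked on small widths. The sketch below is only a bounded brute-force check, not a proof, and it assumes the decomposition of an $n$-bit vector $t$ into three pieces of widths $n-1-u$, $u-l+1$ and $l$ (from most to least significant); the helper names are hypothetical.

```python
from itertools import product


def concat(x: int, y: int, y_width: int) -> int:
    """Concatenate x (high part) with y (low part of width y_width)."""
    return x * 2**y_width + y


def extract(t: int, u: int, l: int) -> int:
    """Bits u down to l of t, read as an unsigned integer."""
    return (t // 2**l) % 2 ** (u - l + 1)


def extract_matches_concat(n: int) -> bool:
    """Check t[u:l] == y2 whenever t == y1 o (y2 o y3), for 0 < l <= u < n-1."""
    for u in range(1, n - 1):
        for l in range(1, u + 1):
            w1, w2, w3 = n - 1 - u, u - l + 1, l
            for y1, y2, y3 in product(range(2**w1), range(2**w2), range(2**w3)):
                t = concat(y1, concat(y2, y3, w3), w2 + w3)
                if extract(t, u, l) != y2:
                    return False
    return True


if __name__ == "__main__":
    print(all(extract_matches_concat(n) for n in range(3, 8)))
```

The strict bounds $0 < l$ and $u < n - 1$ keep all three pieces at positive width, which matters because bit-vector sorts of width zero do not exist.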

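We assume the above definitions for parametric $\Sigma_{BV}$-formulas are applied to parametric $\Sigma_{BV}$-terms as well. Furthermore, for any legal $\omega$, we assume $\omega^b$ can be extended to terms $t$ of bit-vector sort that are well-sorted under $\omega$ such that $t|_{\omega,\mathcal{I}}$ has sort $\sigma_{[\omega^b(t)^{\mathcal{I}}]}$ for all $\mathcal{I}$ mapping variables in $FV(\omega)$ to positive integers. Such an extension of $\omega^b$ to terms can be easily computed in a bottom-up fashion by computing $\omega^b$ for each child and then applying the typing rules of the operators in $\Sigma_{BV}$. For example, we may assume $\omega^b(t) = \omega^b(t_2)$ if $t$ is of the form $t_1 + t_2$ and is well-sorted under $\omega$, and $\omega^b(t) = \omega^b(t_1) + \omega^b(t_2)$ if $t$ is of the form $t_1 \circ t_2$.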
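A minimal sketch of this bottom-up computation follows. It simplifies the setting by working under one concrete interpretation, so widths are plain integers rather than symbolic integer terms, and it covers only addition and concatenation. The classes `Sym` and `App` and the function `width` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Sym:
    name: str  # a symbol whose width is looked up in the width map


@dataclass(frozen=True)
class App:
    op: str  # "add" or "concat" in this sketch
    left: "Term"
    right: "Term"


Term = Union[Sym, App]


def width(t: Term, widths: Dict[str, int]) -> int:
    """Compute the bit-width of t bottom-up; raise if t is ill-sorted."""
    if isinstance(t, Sym):
        w = widths[t.name]
        if w <= 0:
            raise ValueError(f"width of {t.name} must be positive")
        return w
    lw, rw = width(t.left, widths), width(t.right, widths)
    if t.op == "concat":
        return lw + rw  # widths add up under concatenation
    if t.op == "add":
        if lw != rw:
            raise ValueError("addition needs operands of equal width")
        return rw
    raise ValueError(f"unsupported operator {t.op}")


if __name__ == "__main__":
    term = App("concat", Sym("x"), App("add", Sym("y"), Sym("z")))
    print(width(term, {"x": 3, "y": 2, "z": 2}))
```

A term such as `App("add", Sym("x"), Sym("y"))` with different widths for `x` and `y` is rejected, mirroring a formula that is not well-sorted under the chosen interpretation.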

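Finally, we extend the notion of validity to parametric bit-vector formulas. Given a formula $\varphi$ that is well-sorted under $\omega$, we say $\varphi$ is $T_{BV}$-valid under $\omega$ if $\varphi|_{\omega,\mathcal{I}}$ is $T_{BV}$-valid for all $\mathcal{I}$ mapping variables in $FV(\omega)$ to positive integers.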
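To illustrate what validity for all bit-widths demands, the following sketch enumerates every value for every width up to a bound. This is only a bounded check, so it can refute a parametric property but never prove it; the two example properties and all function names are assumptions chosen for illustration.

```python
from typing import Callable


def valid_up_to(prop: Callable[[int, int], bool], max_width: int) -> bool:
    """Check prop(k, x) for every width 1..max_width and every k-bit value x."""
    for k in range(1, max_width + 1):
        for x in range(2**k):
            if not prop(k, x):
                return False
    return True


def add_not_is_all_ones(k: int, x: int) -> bool:
    """x + ~x equals the all-ones vector, for k-bit x."""
    not_x = 2**k - 1 - x
    return (x + not_x) % 2**k == 2**k - 1


def double_is_zero(k: int, x: int) -> bool:
    """x + x equals 0; this holds at width 1 but fails from width 2 on."""
    return (x + x) % 2**k == 0


if __name__ == "__main__":
    print(valid_up_to(add_not_is_all_ones, 8))
    print(valid_up_to(double_is_zero, 1), valid_up_to(double_is_zero, 2))
```

The second property shows why checking a single width is not enough: it holds for every 1-bit value yet is not valid once the width is left open.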

3 Encoding Parametric Bit-Vector Formulas in SMT

Current state-of-the-art SMT solvers do not support reasoning about parametric bit-vector formulas. In this section, we present a technique for encoding such formulas as formulas involving non-linear integer arithmetic, uninterpreted functions and universal quantifiers. In SMT parlance, these are formulas in the UFNIA logic. Given a formula $\varphi$ that is well-sorted under some mapping $\omega$, we describe this encoding in terms of a translation $\mathrm{TR}$, which returns a formula $\psi$ such that if $\psi$ is valid in the theory of uninterpreted functions with integer arithmetic, then $\varphi$ is $T_{BV}$-valid under $\omega$. We describe several variations on this translation and discuss their relative strengths and weaknesses.

3.0.1 Overall Approach

At a high level, our translation produces an implication whose antecedent requires the integer variables to be in the correct ranges (e.g., $k > 0$ for every bit-width variable $k$), and whose conclusion is the result of converting each (parametric) bit-vector term of bit-width $k$ to an integer term. Operations on parametric bit-vector terms are converted to operations on the integers modulo $2^k$, where $k$ need not be a constant.

We first introduce necessary uninterpreted functions that will be used in our translation. Note that SMT solvers do not support the full set of functions in our extended signature $\Sigma_{IA}$, since they do not support exponentiation. The translation requires only a limited form of exponentiation, with base $2$. We thus introduce an uninterpreted function symbol $\mathsf{pow2}$ of sort $\mathsf{Int} \to \mathsf{Int}$, whose intended semantics is two raised to the power of its argument (when its argument is non-negative). Second, for each (non-predicate) $n$-ary function $f^{BV}$ of sort $\sigma_1 \times \cdots \times \sigma_n \to \sigma$ in the signature of fixed-size bit-vectors (excluding bit-vector extraction), we introduce an uninterpreted function $f^{\mathbb{N}}$ of arity $n + 1$ and sort $\mathsf{Int} \times \cdots \times \mathsf{Int} \to \mathsf{Int}$, where the extra argument is used to specify the bit-width. For example, for $+$ with sort $\sigma_{[k]} \times \sigma_{[k]} \to \sigma_{[k]}$, we introduce $+^{\mathbb{N}}$ of sort $\mathsf{Int} \times \mathsf{Int} \times \mathsf{Int} \to \mathsf{Int}$. In its intended semantics, this function adds the second and third arguments, both integers, and returns the result modulo $2^k$, where $k$ is the first argument.

The signature $\Sigma_{BV}$ contains one function, bit-vector concatenation $\circ$, whose two arguments may have different sorts. For this case, the first argument of $\circ^{\mathbb{N}}$ indicates the bit-width of the third argument, i.e., $\circ^{\mathbb{N}}(k, x, y)$ is interpreted as the concatenation of $x$ and $y$, where $y$ is an integer that encodes a bit-vector of bit-width $k$; the bit-width for $x$ is not specified by an argument, as it is not needed for the elimination of this operator that we perform later. We introduce uninterpreted functions for each bit-vector predicate in a similar fashion. For instance, $\geq_u^{\mathbb{N}}$ has sort $\mathsf{Int} \times \mathsf{Int} \times \mathsf{Int} \to \mathsf{Bool}$ and encodes whether its second argument is greater than or equal to its third argument, when these two arguments are interpreted as unsigned bit-vector values whose bit-width is given by its first argument. Depending on the variation of the encoding, our translation will either introduce quantified formulas that fully axiomatize the behavior of these uninterpreted functions or add (quantified) lemmas that state key properties about them, or both.

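$\mathrm{TR}_A(\varphi, \omega)$:

  1. Return $\mathrm{AX}_A(\varphi, \omega) \Rightarrow \mathrm{Conv}(\varphi, \omega)$.

Conv ($e$, $\omega$):

  1. Match $e$:
       $x \;\to\; \chi(x)$   if $x \in X^*$
       $z \;\to\; \omega^N(z) \bmod \mathsf{pow2}(\omega^b(z))$   if $z \in Z^*$
       $t_1 \approx t_2 \;\to\; \mathrm{Conv}(t_1, \omega) \approx \mathrm{Conv}(t_2, \omega)$
       $f^{BV}(t_1, \ldots, t_n) \;\to\; \mathrm{Elim}(f^{\mathbb{N}}(\omega^b(t_n), \mathrm{Conv}(t_1, \omega), \ldots, \mathrm{Conv}(t_n, \omega)))$
       $\bowtie(\varphi_1, \ldots, \varphi_n) \;\to\; \bowtie(\mathrm{Conv}(\varphi_1, \omega), \ldots, \mathrm{Conv}(\varphi_n, \omega))$   for Boolean connectives $\bowtie$

Elim ($e$):

  1. Match $e$:
       $+^{\mathbb{N}}(k, x, y) \;\to\; (x + y) \bmod \mathsf{pow2}(k)$
       $\cdot^{\mathbb{N}}(k, x, y) \;\to\; (x \cdot y) \bmod \mathsf{pow2}(k)$
       $\mathrm{div}^{\mathbb{N}}(k, x, y) \;\to\; \mathrm{ite}(y \approx 0,\ \mathsf{pow2}(k) - 1,\ x \;\mathrm{div}\; y)$
       $\mathrm{mod}^{\mathbb{N}}(k, x, y) \;\to\; \mathrm{ite}(y \approx 0,\ x,\ x \bmod y)$
       $-^{\mathbb{N}}(k, x) \;\to\; (\mathsf{pow2}(k) - x) \bmod \mathsf{pow2}(k)$
       $\sim^{\mathbb{N}}(k, x) \;\to\; \mathsf{pow2}(k) - (x + 1)$
       $\ll^{\mathbb{N}}(k, x, y) \;\to\; (x \cdot \mathsf{pow2}(y)) \bmod \mathsf{pow2}(k)$
       $\gg^{\mathbb{N}}(k, x, y) \;\to\; x \;\mathrm{div}\; \mathsf{pow2}(y)$
       $\circ^{\mathbb{N}}(k, x, y) \;\to\; x \cdot \mathsf{pow2}(k) + y$
       $\bowtie_u^{\mathbb{N}}(k, x, y) \;\to\; x \bowtie y$   for $\bowtie \in \{<, \leq, >, \geq\}$
       $\bowtie_s^{\mathbb{N}}(k, x, y) \;\to\; \mathrm{uts}_k(x) \bowtie \mathrm{uts}_k(y)$   for $\bowtie \in \{<, \leq, >, \geq\}$
       $e \;\to\; e$   otherwise

Figure 1: Translation $\mathrm{TR}_A$ for parametric bit-vector formulas, parametrized by axiomatization mode $A$. We use $\mathrm{uts}_k(x)$ as shorthand for $2 \cdot (x \bmod \mathsf{pow2}(k - 1)) - x$.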

3.0.2 Translation Function

Figure 1 defines our translation function $\mathrm{TR}_A$, which is parameterized by an axiomatization mode $A$. Given an input formula $\varphi$ that is well-sorted under $\omega$, it returns the implication whose antecedent is an axiomatization formula $\mathrm{AX}_A(\varphi, \omega)$ and whose conclusion is the result of converting $\varphi$ to its encoded version via the conversion function Conv. The former depends on the axiomatization mode $A$, which we discuss later. We assume without loss of generality that $\varphi$ contains no applications of bit-vector extract, which can be eliminated as described in the previous section, nor does it contain concrete bit-vector constants, since these can be equivalently represented by introducing a symbol in $Z^*$ with the appropriate concrete mappings in $\omega^b$ and $\omega^N$.

In the translation, we use an auxiliary function Conv which converts parametric bit-vector expressions into integer expressions with uninterpreted functions. Parametric bit-vector variables $x$ (that is, symbols from $X^*$) are replaced by unique integer variables of type $\mathsf{Int}$, where we assume a mapping $\chi$ maintains this correspondence, such that the range of $\chi$ does not include any variable that occurs in $FV(\omega)$. Parametric bit-vector constants $z$ (that is, symbols from set $Z^*$) are replaced by the term $\omega^N(z) \bmod \mathsf{pow2}(\omega^b(z))$. The ranges of the maps in $\omega$ may contain arbitrary $\Sigma_{IA}$-terms. In practice, our translation handles only cases where these terms contain symbols supported by the SMT solver, as well as terms of the form $2^t$, which we assume are replaced by $\mathsf{pow2}(t)$ during this translation. For instance, if $\omega^b(z) = w$ and $\omega^N(z) = 2^w - 1$, then $\mathrm{Conv}(z)$ returns $(\mathsf{pow2}(w) - 1) \bmod \mathsf{pow2}(w)$. Equalities are processed by recursively running the translation on both sides. The next case handles symbols from the signature $\Sigma_{BV}$, where symbols $f^{BV}$ are replaced with the corresponding uninterpreted function $f^{\mathbb{N}}$. We take as the first argument $\omega^b(t_n)$, indicating the symbolic bit-width of the last argument of $e$, and recursively call Conv on $t_1, \ldots, t_n$. In all cases, $\omega^b(t_n)$ corresponds to the bit-width that the uninterpreted function $f^{\mathbb{N}}$ expects based on its intended semantics (the bit-width of the second argument for bit-vector concatenation, or of an arbitrary argument for all other functions and predicates). Finally, if the top symbol of $e$ is a Boolean connective we apply the conversion function recursively to all children.

We run Elim for all applications of uninterpreted functions introduced during the conversion, which eliminates functions that correspond to a majority of the bit-vector operators. These functions can be equivalently expressed using integer arithmetic and $\mathsf{pow2}$. The ternary addition operation $+^{\mathbb{N}}$, which represents addition of two bit-vectors with their width $k$ specified as the first argument, is translated to integer addition modulo $\mathsf{pow2}(k)$. Similar considerations apply to the other modular arithmetic operators, such as multiplication $\cdot^{\mathbb{N}}$. For $\mathrm{div}^{\mathbb{N}}$ and $\mathrm{mod}^{\mathbb{N}}$, our translation handles the special case where the second argument is zero: following SMT-LIB, division then returns the maximum value for the given bit-width, i.e. $\mathsf{pow2}(k) - 1$, and remainder returns its first argument. The integer operators corresponding to unary (arithmetic) negation and bit-wise negation can be eliminated in a straightforward way. The semantics of various bitwise shift operators can be defined arithmetically using division and multiplication with $\mathsf{pow2}$. Concatenation $\circ^{\mathbb{N}}(k, x, y)$ can be eliminated by multiplying $x$ by $\mathsf{pow2}(k)$ and adding $y$, where recall $k$ is the bit-width of the second bit-vector argument $y$. In other words, it has the effect of shifting $x$ left by $k$ bits, as expected. The unsigned relation symbols can be directly converted to the corresponding integer relation. For the elimination of signed relation symbols we use an auxiliary helper $\mathrm{uts}$ (unsigned to signed), defined in Figure 1, which returns the interpretation of its argument when seen as a signed value. The definition of $\mathrm{uts}$ can be derived based on the semantics for signed and unsigned bit-vector values in the SMT-LIB standard. Based on this definition, we have that integers $x$ and $y$ that encode bit-vectors of bit-width $k$ satisfy $<_s^{\mathbb{N}}(k, x, y)$ if and only if $\mathrm{uts}_k(x) < \mathrm{uts}_k(y)$.
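The arithmetic eliminations described above can be compared against ordinary machine-style bit operations on small widths. The sketch below is an illustration under assumptions: the function names are hypothetical, $\mathsf{pow2}$ is given its intended meaning $2^k$ instead of being uninterpreted, the unsigned-to-signed helper is taken to be $2 \cdot (x \bmod 2^{k-1}) - x$, and the exhaustive loop is a bounded test rather than a proof.

```python
def pow2(k: int) -> int:
    return 2**k


# Integer versions of bit-vector operators; k is the bit-width argument.
def add_n(k, x, y): return (x + y) % pow2(k)
def mul_n(k, x, y): return (x * y) % pow2(k)
def neg_n(k, x): return (pow2(k) - x) % pow2(k)
def not_n(k, x): return pow2(k) - (x + 1)
def udiv_n(k, x, y): return pow2(k) - 1 if y == 0 else x // y
def urem_n(k, x, y): return x if y == 0 else x % y
def shl_n(k, x, y): return (x * pow2(y)) % pow2(k)
def lshr_n(k, x, y): return x // pow2(y)
def concat_n(k, x, y): return x * pow2(k) + y  # k is the width of y


def uts(k, x):
    """Signed reading of an integer that encodes a k-bit vector."""
    return 2 * (x % pow2(k - 1)) - x


def slt_n(k, x, y): return uts(k, x) < uts(k, y)


def check_elim(max_width: int) -> bool:
    """Compare the arithmetic definitions with masked bit operations."""
    for k in range(1, max_width + 1):
        m = pow2(k)
        mask = m - 1
        signed = lambda v: v - m if v >= m // 2 else v
        for x in range(m):
            if neg_n(k, x) != (-x) & mask or not_n(k, x) != (~x) & mask:
                return False
            if uts(k, x) != signed(x):
                return False
            for y in range(m):
                if add_n(k, x, y) != (x + y) & mask:
                    return False
                if mul_n(k, x, y) != (x * y) & mask:
                    return False
                if shl_n(k, x, y) != (x << y) & mask:
                    return False
                if lshr_n(k, x, y) != x >> y:
                    return False
                if slt_n(k, x, y) != (signed(x) < signed(y)):
                    return False
    return True


if __name__ == "__main__":
    print(check_elim(4))
```

Only the bit-wise and, or and xor operators are absent here: they have no comparably simple arithmetic definition, which is why they remain uninterpreted and are handled by axioms instead.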

As an example of our translation, let , , , and from Example 1. is . After simplification, running Elim results with the formula .

Thanks to Elim, we can assume that all formulas generated by Conv contain only uninterpreted function symbols in the set $\{\mathsf{pow2}, \&^{\mathbb{N}}, \mid^{\mathbb{N}}, \oplus^{\mathbb{N}}\}$. Thus, we restrict our attention to these symbols only in our axiomatization $\mathrm{AX}_A$, described next.

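Table 2: Full axiomatization of $\mathsf{pow2}$, $\&^{\mathbb{N}}$, and $\oplus^{\mathbb{N}}$. The axiomatization of $\mid^{\mathbb{N}}$ is omitted, and is dual to that of $\&^{\mathbb{N}}$. We use $\mathrm{max}_k$ for $\mathsf{pow2}(k) - 1$.

operator | axiom
$\mathsf{pow2}$ | base cases
$\mathsf{pow2}$ | weak monotonicity
$\mathsf{pow2}$ | strong monotonicity
$\mathsf{pow2}$ | modularity
$\mathsf{pow2}$ | never even
$\mathsf{pow2}$ | always positive
$\mathsf{pow2}$ | div 0
$\&^{\mathbb{N}}$ | base case
$\&^{\mathbb{N}}$ | max
$\&^{\mathbb{N}}$ | min
$\&^{\mathbb{N}}$ | idempotence
$\&^{\mathbb{N}}$ | contradiction
$\&^{\mathbb{N}}$ | symmetry
$\&^{\mathbb{N}}$ | difference
$\&^{\mathbb{N}}$ | range
$\oplus^{\mathbb{N}}$ | base case
$\oplus^{\mathbb{N}}$ | zero
$\oplus^{\mathbb{N}}$ | one
$\oplus^{\mathbb{N}}$ | symmetry
$\oplus^{\mathbb{N}}$ | range
Table 3: Partial axiomatization of $\mathsf{pow2}$, $\&^{\mathbb{N}}$, and $\oplus^{\mathbb{N}}$. The axioms for $\mid^{\mathbb{N}}$ are omitted, and are dual to those for $\&^{\mathbb{N}}$. We use $\mathrm{max}_k$ for $\mathsf{pow2}(k) - 1$.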

3.0.3 Axiomatization Modes

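We consider four different axiomatization modes $A$, which we call $\mathrm{full}$, $\mathrm{partial}$, $\mathrm{combined}$, and $\mathrm{qf}$ (quantifier-free). For each of these axiomatizations, we define $\mathrm{AX}_A(\varphi, \omega)$ as the conjunction:

$$\mathrm{AX}_A(\varphi, \omega) \;=\; \bigwedge_{x \in X^*} 0 \leq \chi(x) < \mathsf{pow2}(\omega^b(x)) \;\wedge\; \bigwedge_{w \in FV(\omega)} w > 0 \;\wedge\; \mathrm{AX}_A^{\mathsf{pow2}} \wedge \mathrm{AX}_A^{\&} \wedge \mathrm{AX}_A^{\mid} \wedge \mathrm{AX}_A^{\oplus}$$

Here $\chi(x)$ is the integer variable introduced for the parametric bit-vector variable $x$ occurring in $\varphi$. The first conjunction states that all integer variables introduced for parametric bit-vector variables reside in the range specified by their bit-width. The second conjunction states that all free variables in $\omega$ (denoting bit-widths) are positive. The remaining four conjuncts denote the axiomatization for the four uninterpreted functions that may occur in the output of our conversion function. The definitions of these formulas are given in Tables 2 and 3 for $\mathrm{full}$ and $\mathrm{partial}$ respectively. For each axiom, $i, j, k$ denote bit-widths and $x, y$ denote integers that encode bit-vectors of size $k$. We assume guards on all quantified formulas (omitted for brevity) that constrain $i, j, k$ to be positive and $x, y$ to be in the range $\{0, \ldots, \mathsf{pow2}(k) - 1\}$. Each table entry lists a set of formulas (interpreted conjunctively) that state properties about the intended semantics of these operators. The formulas for $\mathrm{AX}_{\mathrm{full}}$ assert the intended semantics of these operators, whereas those for $\mathrm{AX}_{\mathrm{partial}}$ assert several properties of them. $\mathrm{AX}_{\mathrm{combined}}$ asserts both, and the quantifier-free axiomatization mode $\mathrm{AX}_{\mathrm{qf}}$ takes only the formulas in $\mathrm{AX}_{\mathrm{partial}}$ that are quantifier-free. In particular, $\mathrm{AX}_{\mathrm{qf}}^{\mathsf{pow2}}$ corresponds to the base cases listed in $\mathrm{AX}_{\mathrm{partial}}^{\mathsf{pow2}}$, and for the other operators it is simply $\top$. The partial axiomatization of these operations mainly includes natural properties of them. For example, we include some base cases for each operation, and also the ranges of its inputs and output. For some proofs, these are sufficient. For $\&^{\mathbb{N}}$, $\mid^{\mathbb{N}}$ and $\oplus^{\mathbb{N}}$, we have also included their behavior for specific cases, e.g., $\&^{\mathbb{N}}(k, x, 0) \approx 0$ and its variants. Other axioms (e.g., "never even") were added after analyzing specific benchmarks and identifying axioms that were missing but sufficient for their proofs.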
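To give a feel for what a complete, inductive definition of a bit-wise operator over the integers can look like, here is one possible recursion on the bit-width. It is a sketch under assumptions and is not claimed to be the formulation used in Table 2: the names `and_n`, `or_n`, `xor_n` and `bit` are hypothetical, and the final check is a bounded comparison against Python's built-in operators.

```python
def pow2(k: int) -> int:
    return 2**k


def bit(i: int, x: int) -> int:
    """Bit i of the non-negative integer x."""
    return (x // pow2(i)) % 2


def bitwise_n(k: int, x: int, y: int, combine) -> int:
    """Define a k-bit bit-wise operator by induction on k.

    The top bit is computed with `combine`; the rest is obtained from the
    same operator at width k - 1 applied to the lower k - 1 bits.
    """
    top = pow2(k - 1) * combine(bit(k - 1, x), bit(k - 1, y))
    if k == 1:
        return top
    low = bitwise_n(k - 1, x % pow2(k - 1), y % pow2(k - 1), combine)
    return low + top


def and_n(k, x, y): return bitwise_n(k, x, y, min)
def or_n(k, x, y): return bitwise_n(k, x, y, max)
def xor_n(k, x, y): return bitwise_n(k, x, y, lambda a, b: int(a != b))


def agrees_with_builtin(max_width: int) -> bool:
    for k in range(1, max_width + 1):
        for x in range(pow2(k)):
            for y in range(pow2(k)):
                if and_n(k, x, y) != (x & y):
                    return False
                if or_n(k, x, y) != (x | y):
                    return False
                if xor_n(k, x, y) != (x ^ y):
                    return False
    return True


if __name__ == "__main__":
    print(agrees_with_builtin(5))
```

Properties such as idempotence (`and_n(k, x, x) == x`) and symmetry follow from this recursion by induction on `k`, but a solver usually cannot find that induction on its own, which motivates stating such properties directly as lemmas.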

Our translation satisfies the following key properties.

Theorem 3.1

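Let $\varphi$ be a parametric bit-vector formula that is well-sorted under $\omega$, and has no occurrences of bit-vector extract or concrete bit-vector constants. Then:

  1. $\varphi$ is $T_{BV}$-valid under $\omega$ if and only if $\mathrm{TR}_{\mathrm{full}}(\varphi, \omega)$ is $T_{UFIA}$-valid.

  2. $\varphi$ is $T_{BV}$-valid under $\omega$ if and only if $\mathrm{TR}_{\mathrm{combined}}(\varphi, \omega)$ is $T_{UFIA}$-valid.

  3. $\varphi$ is $T_{BV}$-valid under $\omega$ if $\mathrm{TR}_{\mathrm{partial}}(\varphi, \omega)$ is $T_{UFIA}$-valid.

  4. $\varphi$ is $T_{BV}$-valid under $\omega$ if $\mathrm{TR}_{\mathrm{qf}}(\varphi, \omega)$ is $T_{UFIA}$-valid.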

The proof of Property 1 is carried out by translating every interpretation of $T_{BV}$ into a corresponding interpretation of $T_{UFIA}$ such that