1 Introduction
Satisfiability Modulo Theories (SMT) solving for the theory of fixed-size bitvectors has received a lot of interest in recent years. Many applications rely on bit-precise reasoning as provided by SMT solvers, and the number of solvers that participate in the corresponding divisions of the annual SMT competition is high and increasing. Although the problem is theoretically difficult (e.g., [14]), bitvector solvers are highly efficient in practice and typically implement SAT-based procedures. Reasoning about fixed-size bitvectors suffices for many applications. In hardware verification, the size of a circuit is usually known, and in software verification, machine integers are treated as fixed-size bitvectors, where the width depends on the underlying architecture. Current solving approaches, however, do not generalize beyond this limitation, i.e., they cannot reason about parametric circuits or machine integers of arbitrary size.
In this paper we focus on bitvector formulas with parametric bitwidth. In essence, we replace the translation from fixed-size bitvectors to propositional logic (which is at the core of state-of-the-art bitvector solvers) with a translation to the quantified theories of integer arithmetic and uninterpreted functions. We use a completely automated verification process that capitalizes on recent advances in SMT solving for these theories. The reliability of our approach depends on the correctness of the SMT solvers in use. Interactive theorem provers such as Isabelle and Coq [19, 27], on the other hand, target applications where trust is of higher importance than automation, although substantial progress towards incorporating automation has been made in recent years [5]. Our long-term goal is an efficient automated framework for proving bitwidth-independent properties in a trusted proof assistant, which requires both a formalization of such properties in a proof assistant and the development of efficient automated techniques to reason about these properties. This work shows that state-of-the-art SMT solving combined with our encoding techniques makes the latter feasible. The next steps towards this goal are described in the final section of this paper.
Translating a formula from the theory of fixed-size bitvectors to the theory of integer arithmetic is not straightforward. The semantics of bitvector operators are defined modulo the bitwidth, which must be expressed using exponentiation terms $2^n$ (with $n$ as the bitwidth) when translated to integer arithmetic. Most SMT solvers, however, do not support unrestricted exponentiation. Further, bitwise operators such as bitwise and, bitwise or, and left and right shifts do not have a natural representation in integer arithmetic. These operations, as well as concatenation and extraction, seem more suitable for the theory of strings, whose combination with quantified formulas is not well supported by current SMT solvers. Although these operators are definable in the theory of integer arithmetic using $\beta$-function encoding (e.g., [10]), such a translation is expensive as it requires an encoding of sequences into natural numbers. Instead, we introduce an uninterpreted function (UF) for each of these operators and axiomatize them, which shifts some of the burden from integer arithmetic to UF reasoning. We consider two approaches for axiomatization: a complete axiomatization using induction, and a partial (hand-crafted) axiomatization as an underapproximation.
To evaluate the potential of our approach, we examine three case studies that arise from real applications where reasoning about bitwidth-independent properties is essential. Niemetz et al. [18] defined invertibility conditions for bitvector operators. The authors used these conditions to solve quantified bitvector formulas and verified their correctness only up to a bounded bitwidth. As a first case study, we consider the bitwidth-independent verification of these invertibility conditions, which [18] left to future work. As a second case study, we examine the bitwidth-independent verification of compiler optimizations of LLVM. For that, we use the Alive tool [16], which generates verification conditions for such optimizations in the theory of fixed-size bitvectors. Proving the correctness of these optimizations for arbitrary bitwidths ensures their correctness for any language and underlying architecture rather than specific ones. As a third case study, we consider the bitwidth-independent verification of rewrite rules for the theory of fixed-size bitvectors. SMT solvers for this theory rely heavily on such rules to simplify the input. Verifying their correctness is essential and is typically done by hand.
To summarize, this paper makes the following contributions.
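Table 1: Bitvector operators considered in this paper, with their SMT-LIB 2 syntax.

| Symbol | SMT-LIB Syntax | Sort |
|---|---|---|
| $\approx$, $\not\approx$ | =, distinct | $\sigma_{[n]} \times \sigma_{[n]} \to \mathsf{Bool}$ |
| $<_u$, $>_u$, $<_s$, $>_s$ | bvult, bvugt, bvslt, bvsgt | $\sigma_{[n]} \times \sigma_{[n]} \to \mathsf{Bool}$ |
| $\le_u$, $\ge_u$, $\le_s$, $\ge_s$ | bvule, bvuge, bvsle, bvsge | $\sigma_{[n]} \times \sigma_{[n]} \to \mathsf{Bool}$ |
| $\sim$, $-$ | bvnot, bvneg | $\sigma_{[n]} \to \sigma_{[n]}$ |
| $\&$, $\mid$, $\oplus$ | bvand, bvor, bvxor | $\sigma_{[n]} \times \sigma_{[n]} \to \sigma_{[n]}$ |
| $\ll$, $\gg$, $\gg_a$ | bvshl, bvlshr, bvashr | $\sigma_{[n]} \times \sigma_{[n]} \to \sigma_{[n]}$ |
| $+$, $\cdot$, $\bmod$, $\div$ | bvadd, bvmul, bvurem, bvudiv | $\sigma_{[n]} \times \sigma_{[n]} \to \sigma_{[n]}$ |
| $[u:l]$ | extract ($0 \le l \le u < n$) | $\sigma_{[n]} \to \sigma_{[u-l+1]}$ |
| $\circ$ | concat | $\sigma_{[n]} \times \sigma_{[m]} \to \sigma_{[n+m]}$ |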
Related Work
Bitwidth-independent bitvector formulas were studied in [21], where a formal language for bitvectors of parametric width is introduced, along with a semantics and a decision procedure. The formal language that we use here can be seen as a simplified variant of that language. A unification-based algorithm for fixed-size bitvectors was introduced in [4], and was extended to handle symbolic lengths for some common cases. Bitwidth-independent formulas are related to parametric Boolean functions and circuits. An inductive approach for reasoning about such formalisms was developed in [12, 11], by considering a Boolean function for the base case of a circuit and a Boolean function for its inductive step. Reasoning about equivalence of such circuits can be embedded in the framework of parametric-width bitvectors (see [21]).
2 Preliminaries
We briefly review the usual notions and terminology of many-sorted first-order logic with equality (denoted by $\approx$). See [10, 28] for more detailed information. Let $S$ be a set of sort symbols, and for every sort $\sigma \in S$, let $X_\sigma$ be an infinite set of variables of sort $\sigma$. We assume that the sets $X_\sigma$ are pairwise disjoint and define $X$ as the union of the sets $X_\sigma$. A signature $\Sigma$ consists of a set $\Sigma^s \subseteq S$ of sort symbols and a set $\Sigma^f$ of function symbols. Arities of function symbols are defined in the usual way. Constants are treated as $0$-ary functions. We assume that $\Sigma$ includes a Boolean sort $\mathsf{Bool}$ and the Boolean constants $\top$ (true) and $\bot$ (false). Functions returning $\mathsf{Bool}$ are also called predicates.
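We assume the usual definitions of well-sorted terms, literals, and formulas, and refer to them as $\Sigma$-terms, $\Sigma$-literals, and $\Sigma$-formulas, respectively. We define $\mathbf{x} = (x_1, \ldots, x_n)$ as a tuple of variables and write $Q\mathbf{x}\,\varphi$ with $Q \in \{\forall, \exists\}$ for a quantified formula $Qx_1 \cdots Qx_n\,\varphi$. For a $\Sigma$-term or $\Sigma$-formula $e$, we denote the free variables of $e$ (defined as usual) as $FV(e)$ and use $e[\mathbf{x}]$ to denote that the variables in $\mathbf{x}$ occur free in $e$. For a tuple of $\Sigma$-terms $\mathbf{t} = (t_1, \ldots, t_n)$ and a tuple of variables $\mathbf{x} = (x_1, \ldots, x_n)$, we write $e[\mathbf{t}/\mathbf{x}]$ for the term or formula obtained from $e$ by simultaneously replacing each occurrence of $x_i$ in $e$ by $t_i$.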
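A $\Sigma$-interpretation $\mathcal{I}$ maps: each $\sigma \in \Sigma^s$ to a distinct non-empty set of values $\sigma^{\mathcal{I}}$ (the domain of $\sigma$ in $\mathcal{I}$); each $x \in X_\sigma$ to an element $x^{\mathcal{I}} \in \sigma^{\mathcal{I}}$; and each $f^{\sigma_1 \cdots \sigma_n \sigma} \in \Sigma^f$ to a total function $f^{\mathcal{I}} : \sigma_1^{\mathcal{I}} \times \cdots \times \sigma_n^{\mathcal{I}} \to \sigma^{\mathcal{I}}$ if $n > 0$, and to an element in $\sigma^{\mathcal{I}}$ if $n = 0$. We use the usual inductive definition of a satisfiability relation $\models$ between $\Sigma$-interpretations and $\Sigma$-formulas.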
A theory $T$ is a pair $(\Sigma, I)$, where $\Sigma$ is a signature and $I$ is a non-empty class of $\Sigma$-interpretations that is closed under variable reassignment, i.e., if a $\Sigma$-interpretation $\mathcal{I}'$ only differs from an $\mathcal{I} \in I$ in how it interprets variables, then also $\mathcal{I}' \in I$. A $\Sigma$-formula $\varphi$ is $T$-satisfiable (resp. $T$-unsatisfiable) if it is satisfied by some (resp. no) interpretation in $I$; it is $T$-valid if it is satisfied by all interpretations in $I$. We will sometimes omit $T$ when the theory is understood from context.
The theory $T_{BV}$ of fixed-size bitvectors as defined in the SMT-LIB 2 standard [3] consists of the class of interpretations $I_{BV}$ and signature $\Sigma_{BV}$, which includes a unique sort for each positive integer $n$ (representing the bitvector width), denoted here as $\sigma_{[n]}$. For a given positive integer $n$, the domain of sort $\sigma_{[n]}$ in $\mathcal{I}$ is the set of all bitvectors of size $n$. We assume that $\Sigma_{BV}$ includes all bitvector constants of sort $\sigma_{[n]}$ for each $n$, represented as bit-strings. However, to simplify the notation we will sometimes denote them by the corresponding natural number in $\{0, \ldots, 2^n - 1\}$. All interpretations $\mathcal{I} \in I_{BV}$ are identical except for the value they assign to variables. They interpret sort and function symbols as specified in SMT-LIB 2. All function symbols in $\Sigma^f_{BV}$ are overloaded for every $\sigma_{[n]} \in \Sigma^s_{BV}$. We denote a $\Sigma_{BV}$-term (or bitvector term) $t$ of width $n$ as $t_{[n]}$ when we want to specify its bitwidth explicitly. We refer to the $i$-th bit of $t_{[n]}$ as $t[i]$ with $0 \le i < n$. We interpret $t[0]$ as the least significant bit (LSB) and $t[n-1]$ as the most significant bit (MSB), and denote bit ranges over $t$ from index $j$ down to $i$ as $t[j:i]$. The unsigned interpretation of a bitvector $t_{[n]}$ as a natural number is given by $[t]_{\mathbb{N}} = \sum_{i=0}^{n-1} t[i] \cdot 2^i$, and its signed interpretation as an integer is given by $[t]_{\mathbb{Z}} = -t[n-1] \cdot 2^{n-1} + [t[n-2:0]]_{\mathbb{N}}$.
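The unsigned and signed readings of a bit-string can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names `unsigned_value` and `signed_value` are ours, and bit-strings are assumed to be written with the MSB first.

```python
def unsigned_value(bits: str) -> int:
    """Natural number denoted by a bit-string written MSB first."""
    n = len(bits)
    # Bit i (counted from the LSB) contributes bits[i] * 2**i.
    return sum(int(bits[n - 1 - i]) << i for i in range(n))


def signed_value(bits: str) -> int:
    """Two's-complement integer denoted by a bit-string written MSB first."""
    n = len(bits)
    msb = int(bits[0])
    # The MSB has weight -2**(n-1); the remaining bits are read as unsigned.
    return -msb * 2 ** (n - 1) + unsigned_value(bits[1:])


print(unsigned_value("1000"), signed_value("1000"))  # 8 -8
print(unsigned_value("0111"), signed_value("0111"))  # 7 7
```

The two readings agree exactly when the MSB is zero, which is why signed comparisons need separate treatment in an integer encoding.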
Without loss of generality, we consider a restricted set of bitvector function and predicate symbols (or bitvector operators) as listed in Table 1. The selection of operators in this set is arbitrary but complete in the sense that it suffices to express all bitvector operators defined in SMT-LIB 2. We use $\max_{s[k]}$ ($\min_{s[k]}$) for the maximum (minimum) signed value of width $k$, e.g., $\max_{s[4]} = 0111$ and $\min_{s[4]} = 1000$.
The theory $T_{IA}$ of integer arithmetic is also defined as in the SMT-LIB 2 standard. The signature $\Sigma_{IA}$ includes a single sort $\mathsf{Int}$, function and predicate symbols $\{+, -, \cdot, \mathrm{div}, \mathrm{mod}, |\cdot|, <, \le, >, \ge\}$, and a constant symbol for every integer value. We further extend $\Sigma_{IA}$ to include exponentiation, denoted in the usual way as $a^b$. All interpretations $\mathcal{I} \in I_{IA}$ are identical except for the values they assign to variables. We write $T_{UFIA}$ to denote the (combined) theory of uninterpreted functions with integer arithmetic. Its signature is the union of the signature of $T_{IA}$ with a signature containing a set of (freely interpreted) function symbols, called uninterpreted functions.
2.1 Parametric BitVector Formulas
We are interested in reasoning about (classes of) formulas that hold independently of the sorts assigned to their variables or terms. We formalize the notion of parametric formulas in the following.
In the remainder of the paper, we fix sets $X^*$ and $Z^*$ of variable and constant symbols (respectively) of bitvector sort. We use the symbols in $X^*$ (resp. $Z^*$) for representing variables (resp. constants) of bitvector sort whose bitwidth is not fixed. The bitwidth is instead represented using a map $\omega^b$, which maps from symbols in $X^* \cup Z^*$ to $\Sigma_{IA}$-terms, where we refer to $\omega^b(x)$ as the symbolic bitwidth assigned by $\omega^b$ to $x$. We also define the map $\omega^N$ from symbols in $Z^*$ to $\Sigma_{IA}$-terms, where we call $\omega^N(z)$ the symbolic value assigned by $\omega^N$ to $z$. Let $\omega = (\omega^b, \omega^N)$, and let $FV(\omega)$ be the set of (integer) free variables occurring in the range of either $\omega^b$ or $\omega^N$. We say that $\omega$ is legal if for every interpretation $\mathcal{I}$ that interprets each variable in $FV(\omega)$ as a positive integer, and for every $x \in X^* \cup Z^*$, $\mathcal{I}$ also interprets $\omega^b(x)$ as a positive integer.
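Let $\varphi$ be a formula built from the symbols of $\Sigma_{BV}$ and $X^* \cup Z^*$, ignoring their sorts. We refer to $\varphi$ as a parametric formula. We may interpret $\varphi$ as a class of fixed-size bitvector formulas as follows. For each symbol $x \in X^*$ and positive integer $k$, we associate a unique variable $x_k$ of (fixed) bitvector sort $\sigma_{[k]}$. Given a legal $\omega$ with $\omega = (\omega^b, \omega^N)$ and an interpretation $\mathcal{I}$ that maps each variable in $FV(\omega)$ to a positive integer, let $\varphi|_{\omega\mathcal{I}}$ be the result of replacing all symbols $x \in X^*$ in $\varphi$ by the corresponding bitvector variable $x_k$ and all symbols $z \in Z^*$ in $\varphi$ by the bitvector constant of sort $\sigma_{[k]}$ corresponding to $\mathcal{I}(\omega^N(z)) \bmod 2^k$, where in both cases $k$ is the value that $\mathcal{I}$ assigns to the symbolic bitwidth of the replaced symbol. We say a formula $\varphi$ is well-sorted under $\omega$ if $\omega$ is legal and $\varphi|_{\omega\mathcal{I}}$ is a well-sorted $\Sigma_{BV}$-formula for all $\mathcal{I}$ mapping variables in $FV(\omega)$ to positive integers.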
Example 1
Let be the set and be the set where and . Let be the formula . We have that is wellsorted under where or . It is not wellsorted when since is not a wellsorted formula whenever . An where is not legal, since is possible even when and .
Notice that symbolic constants such as the maximum unsigned constant of a symbolic length $w$ can be represented by introducing $z \in Z^*$ with $\omega^b(z) = w$ and $\omega^N(z) = 2^w - 1$. Furthermore, recall that signature $\Sigma_{BV}$ includes the bitvector extract operator, which is parameterized by two natural numbers $u$ and $l$. We do not lift the above definitions to handle extract operations having symbolic ranges, e.g., $[u:l]$ where $u$ and $l$ are $\Sigma_{IA}$-terms. This is for simplicity and comes at no loss of expressive power, since constraints involving extract can be equivalently expressed using constraints involving concatenation. For example, showing that every instance of a constraint $x[u:l] \approx t$ holds, where $0 < l \le u < n - 1$, is equivalent to showing that $x \approx x_1 \circ x_2 \circ x_3 \Rightarrow x_2 \approx t$ holds for all $x_1, x_2, x_3$, where $x_1, x_2, x_3$ have sorts $\sigma_{[n-1-u]}$, $\sigma_{[u-l+1]}$, $\sigma_{[l]}$, respectively. We may reason about a formula involving a symbolic range $[u:l]$ of $x$ by considering a parametric bitvector formula that encodes a formula of the latter form, where the appropriate symbolic bitwidths are assigned to symbols introduced for $x_1, x_2, x_3$.
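The claim that an extract is just the middle part of a three-way concatenation can be checked exhaustively for small widths. The sketch below works on the unsigned integer values of bitvectors; the helper names `concat`, `extract`, and `middle_part_is_extract` are ours and only illustrate the idea.

```python
def concat(hi: int, lo: int, lo_width: int) -> int:
    """Value of hi concatenated with lo, where lo has lo_width bits."""
    return hi * 2 ** lo_width + lo


def extract(x: int, u: int, l: int) -> int:
    """Value of bits u down to l of x (bit 0 is the LSB)."""
    return (x // 2 ** l) % 2 ** (u - l + 1)


def middle_part_is_extract(w1: int, w2: int, w3: int) -> bool:
    """Check x = x1 . x2 . x3 implies x[u:l] = x2 for all values of x1, x2, x3."""
    u, l = w2 + w3 - 1, w3  # the range occupied by x2 inside x
    for x1 in range(2 ** w1):
        for x2 in range(2 ** w2):
            for x3 in range(2 ** w3):
                x = concat(concat(x1, x2, w2), x3, w3)
                if extract(x, u, l) != x2:
                    return False
    return True


print(all(middle_part_is_extract(a, b, c)
          for a in range(1, 4) for b in range(1, 4) for c in range(1, 4)))
```

Such a check covers only the widths it enumerates; the point of the parametric encoding is to replace the concrete widths `w1`, `w2`, `w3` with symbolic ones.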
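We assume the above definitions for parametric formulas are applied to parametric terms as well. Furthermore, for any legal $\omega$, we assume $\omega^b$ can be extended to terms $t$ of bitvector sort that are well-sorted under $\omega$ such that $t|_{\omega\mathcal{I}}$ has sort $\sigma_{[\mathcal{I}(\omega^b(t))]}$ for all $\mathcal{I}$ mapping variables in $FV(\omega)$ to positive integers. Such an extension of $\omega^b$ to terms can be easily computed in a bottom-up fashion by computing $\omega^b$ for each child and then applying the typing rules of the operators in $\Sigma_{BV}$. For example, we may assume $\omega^b(t) = \omega^b(t_2)$ if $t$ is of the form $t_1 + t_2$ and is well-sorted under $\omega$, and $\omega^b(t) = \omega^b(t_1) + \omega^b(t_2)$ if $t$ is of the form $t_1 \circ t_2$.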
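Finally, we extend the notion of validity to parametric bitvector formulas. Given a formula $\varphi$ that is well-sorted under $\omega$, we say $\varphi$ is $T_{BV}$-valid under $\omega$ if $\varphi|_{\omega\mathcal{I}}$ is $T_{BV}$-valid for all $\mathcal{I}$ mapping variables in $FV(\omega)$ to positive integers.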
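To see why validity for all bitwidths is a stronger requirement than validity at a few chosen widths, one can brute-force small instances. The following sketch is not part of the approach described in this paper; it only instantiates a parametric property at concrete widths, and the names `valid_for_width`, `neg_is_not_plus_one`, and `cube_is_identity` are hypothetical.

```python
from itertools import product


def valid_for_width(formula, k: int, num_vars: int) -> bool:
    """True if formula(k, *values) holds for every assignment of k-bit values."""
    return all(formula(k, *vals)
               for vals in product(range(2 ** k), repeat=num_vars))


def neg_is_not_plus_one(k: int, x: int) -> bool:
    """-x = ~x + 1 on k-bit values (unsigned integer encodings)."""
    neg = (2 ** k - x) % 2 ** k
    bit_not = 2 ** k - 1 - x
    return neg == (bit_not + 1) % 2 ** k


def cube_is_identity(k: int, x: int) -> bool:
    """x * x * x = x on k-bit values; holds only for very small widths."""
    return (x * x * x) % 2 ** k == x


widths = range(1, 7)
print([k for k in widths if valid_for_width(neg_is_not_plus_one, k, 1)])
print([k for k in widths if valid_for_width(cube_is_identity, k, 1)])
```

The second property holds at width 1 but fails from width 2 on, so checking a single instance says nothing about the parametric formula; conversely, no finite enumeration can establish the first property for every width.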
3 Encoding Parametric BitVector Formulas in SMT
Current state-of-the-art SMT solvers do not support reasoning about parametric bitvector formulas. In this section, we present a technique for encoding such formulas as formulas involving non-linear integer arithmetic, uninterpreted functions and universal quantifiers. In SMT parlance, these are formulas in the UFNIA logic. Given a formula $\varphi$ that is well-sorted under some mapping $\omega$, we describe this encoding in terms of a translation $TR$, which returns a formula $\psi$ such that if $\psi$ is valid in the theory of uninterpreted functions with integer arithmetic, then $\varphi$ is valid under $\omega$. We describe several variations on this translation and discuss their relative strengths and weaknesses.
3.0.1 Overall Approach
At a high level, our translation produces an implication whose antecedent guards the integer variables to be in the correct ranges (e.g., $k > 0$ for every bitwidth variable $k$), and whose conclusion is the result of converting each (parametric) bitvector term of bitwidth $k$ to an integer term. Operations on parametric bitvector terms are converted to operations on the integers modulo $2^k$, where $k$ need not be a constant.

We first introduce the uninterpreted functions that will be used in our translation. Note that SMT solvers do not support the full set of functions in our extended signature $\Sigma_{IA}$, since they do not support exponentiation. The translation requires only a limited form of exponentiation, $2^k$. We thus introduce an uninterpreted function symbol $pow2$ of sort $\mathsf{Int} \to \mathsf{Int}$, whose intended semantics is two raised to the power of its argument (when its argument is non-negative). Second, for each (non-predicate) $n$-ary function $f^{BV}$ of sort $\sigma_1 \times \cdots \times \sigma_n \to \sigma$ in the signature of fixed-size bitvectors (excluding bitvector extraction), we introduce an uninterpreted function $f^{\mathbb{N}}$ of arity $n + 1$ and sort $\mathsf{Int} \times \cdots \times \mathsf{Int} \to \mathsf{Int}$, where the extra argument is used to specify the bitwidth. For example, for $+^{BV}$ with sort $\sigma_{[k]} \times \sigma_{[k]} \to \sigma_{[k]}$, we introduce $+^{\mathbb{N}}$ of sort $\mathsf{Int} \times \mathsf{Int} \times \mathsf{Int} \to \mathsf{Int}$. In its intended semantics, this function adds the second and third arguments, both integers, and returns the result modulo $2^k$, where $k$ is the first argument.

The signature $\Sigma_{BV}$ contains one function, bitvector concatenation $\circ$, whose two arguments may have different sorts. For this case, the first argument of $\circ^{\mathbb{N}}$ indicates the bitwidth of the third argument, i.e., $\circ^{\mathbb{N}}(k, x, y)$ is interpreted as the concatenation of $x$ and $y$, where $y$ is an integer that encodes a bitvector of bitwidth $k$; the bitwidth for $x$ is not specified by an argument, as it is not needed for the elimination of this operator that we perform later. We introduce uninterpreted functions for each bitvector predicate in a similar fashion. For instance, $\ge_u^{\mathbb{N}}$ has sort $\mathsf{Int} \times \mathsf{Int} \times \mathsf{Int} \to \mathsf{Bool}$ and encodes whether its second argument is greater than or equal to its third argument, when these two arguments are interpreted as unsigned bitvector values whose bitwidth is given by its first argument. Depending on the variation of the encoding, our translation will either introduce quantified formulas that fully axiomatize the behavior of these uninterpreted functions, or add (quantified) lemmas that state key properties about them, or both.
Figure 1: The translation function and its auxiliary functions Conv and Elim.
3.0.2 Translation Function
Figure 1 defines our translation function $TR_A$, which is parameterized by an axiomatization mode $A$. Given an input formula $\varphi$ that is well-sorted under $\omega$, it returns the implication whose antecedent is an axiomatization formula $AX_A(\varphi, \omega)$ and whose conclusion is the result of converting $\varphi$ to its encoded version via the conversion function Conv. The former depends on the axiomatization mode $A$, which we discuss later. We assume without loss of generality that $\varphi$ contains no applications of bitvector extract, which can be eliminated as described in the previous section, nor does it contain concrete bitvector constants, since these can be equivalently represented by introducing a symbol in $Z^*$ with the appropriate concrete mappings in $\omega^b$ and $\omega^N$.
In the translation, we use an auxiliary function Conv which converts parametric bitvector expressions into integer expressions with uninterpreted functions. Parametric bitvector variables $x$ (that is, symbols from $X^*$) are replaced by unique integer variables of type $\mathsf{Int}$, where we assume a mapping $\chi$ maintains this correspondence, such that the range of $\chi$ does not include any variable that occurs in $FV(\omega)$. Parametric bitvector constants $z$ (that is, symbols from set $Z^*$) are replaced by the term $\omega^N(z) \bmod pow2(\omega^b(z))$. The ranges of the maps in $\omega$ may contain arbitrary $\Sigma_{IA}$-terms. In practice, our translation handles only cases where these terms contain symbols supported by the SMT solver, as well as terms of the form $2^t$, which we assume are replaced by $pow2(t)$ during this translation. For instance, if $\omega^b(z) = k$ and $\omega^N(z) = 2^k - 1$, then $\mathrm{Conv}(z)$ returns $(pow2(k) - 1) \bmod pow2(k)$. Equalities are processed by recursively running the translation on both sides. The next case handles symbols from the signature $\Sigma_{BV}$, where symbols $f^{BV}$ are replaced with the corresponding uninterpreted function $f^{\mathbb{N}}$. For an expression $e$ of the form $f^{BV}(t_1, \ldots, t_n)$, we take as the first argument $\omega^b(t_n)$, indicating the symbolic bitwidth of the last argument of $e$, and recursively call Conv on $t_1, \ldots, t_n$. In all cases, $\omega^b(t_n)$ corresponds to the bitwidth that the uninterpreted function $f^{\mathbb{N}}$ expects based on its intended semantics (the bitwidth of the second argument for bitvector concatenation, or of an arbitrary argument for all other functions and predicates). Finally, if the top symbol of $e$ is a Boolean connective, we apply the conversion function recursively to all children.
We run Elim for all applications of uninterpreted functions introduced during the conversion, which eliminates the functions that correspond to a majority of the bitvector operators. These functions can be equivalently expressed using integer arithmetic and $pow2$. The ternary addition operation $+^{\mathbb{N}}$, which represents addition of two bitvectors with their width $k$ specified as the first argument, is translated to integer addition modulo $pow2(k)$. Similar considerations apply to the remaining modular arithmetic operators, e.g., multiplication. For $\div^{\mathbb{N}}$ and $\bmod^{\mathbb{N}}$, our translation handles the special case where the second argument is zero: following SMT-LIB, division by zero returns the maximum value for the given bitwidth, i.e., $pow2(k) - 1$, and remainder by zero returns the first operand. The integer operators corresponding to unary (arithmetic) negation and bitwise negation can be eliminated in a straightforward way. The semantics of the various bitwise shift operators can be defined arithmetically using division and multiplication with $pow2$. Concatenation $\circ^{\mathbb{N}}(k, x, y)$ can be eliminated by multiplying $x$ by $pow2(k)$, where recall $k$ is the bitwidth of the second bitvector argument $y$, and adding $y$. In other words, it has the effect of shifting $x$ left by $k$ bits, as expected. The unsigned relation symbols can be directly converted to the corresponding integer relation. For the elimination of signed relation symbols we use an auxiliary helper $uts$ (unsigned to signed), defined in Figure 1, which returns the interpretation of its argument when seen as a signed value. The definition of $uts$ can be derived from the semantics for signed and unsigned bitvector values in the SMT-LIB standard. Based on this definition, integers $x$ and $y$ that encode bitvectors of bitwidth $k$ satisfy $<_s^{\mathbb{N}}(k, x, y)$ if and only if $uts_k(x) < uts_k(y)$.
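The arithmetic readings described above can be sanity-checked against bit-level operations for small widths. The sketch below is our own rendering of such eliminations on Python integers, not a transcription of Figure 1: all function names are hypothetical, `pow2` is an ordinary function here rather than an uninterpreted symbol, and the formula used for `uts` is one possible definition consistent with two's complement.

```python
def pow2(k: int) -> int:
    return 2 ** k


# Integer encodings; the first argument k is always the bitwidth.
def bv_add(k, x, y): return (x + y) % pow2(k)
def bv_mul(k, x, y): return (x * y) % pow2(k)
def bv_udiv(k, x, y): return pow2(k) - 1 if y == 0 else x // y
def bv_urem(k, x, y): return x if y == 0 else x % y
def bv_not(k, x): return pow2(k) - (x + 1)
def bv_neg(k, x): return (pow2(k) - x) % pow2(k)
def bv_shl(k, x, y): return (x * pow2(y)) % pow2(k)
def bv_lshr(k, x, y): return (x // pow2(y)) % pow2(k)
def bv_concat(k, x, y): return x * pow2(k) + y  # k is the width of y


def uts(k, x):
    """Signed reading of the k-bit value x (assumed definition)."""
    return 2 * (x % pow2(k - 1)) - x


def bv_slt(k, x, y): return uts(k, x) < uts(k, y)


def agrees_with_bit_level(k: int) -> bool:
    """Compare the encodings with Python's bit operations on all k-bit values."""
    mask = pow2(k) - 1
    signed = lambda v: v - pow2(k) if v >= pow2(k - 1) else v
    for x in range(pow2(k)):
        if bv_not(k, x) != (~x & mask) or bv_neg(k, x) != (-x & mask):
            return False
        for y in range(pow2(k)):
            if (bv_add(k, x, y) != ((x + y) & mask)
                    or bv_mul(k, x, y) != ((x * y) & mask)
                    or bv_shl(k, x, y) != ((x << y) & mask)
                    or bv_lshr(k, x, y) != (x >> y)
                    or bv_slt(k, x, y) != (signed(x) < signed(y))):
                return False
    return True


print(all(agrees_with_bit_level(k) for k in range(1, 5)))
```

Bitwise and, or, and xor are deliberately absent: they have no comparably direct arithmetic reading, which is why they remain uninterpreted and are axiomatized instead.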
As an example of our translation, let , , , and from Example 1. is . After simplification, running Elim results with the formula .
Thanks to Elim, we can assume that all formulas generated by Conv contain only uninterpreted function symbols in the set $\{pow2, \&^{\mathbb{N}}, \mid^{\mathbb{N}}, \oplus^{\mathbb{N}}\}$. Thus, we restrict our attention to these symbols only in our axiomatization $AX_A$, described next.
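Table 3: Axiom names of the partial axiomatization, one group per uninterpreted function symbol.

Group 1: base cases, weak monotonicity, strong monotonicity, modularity, never even, always positive, div 0
Group 2: base case, max, min, idempotence, contradiction, symmetry, difference, range
Group 3: base case, zero, one, symmetry, range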
3.0.3 Axiomatization Modes
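We consider four different axiomatization modes $A$, which we call full, partial, combined, and qf (quantifier-free). For each of these axiomatizations, we define $AX_A(\varphi, \omega)$ as the conjunction:

$$AX_A(\varphi, \omega) \;=\; \bigwedge_{x \in X^*} 0 \le \chi(x) < pow2(\omega^b(x)) \;\wedge\; \bigwedge_{w \in FV(\omega)} w > 0 \;\wedge\; AX_A^{pow2} \wedge AX_A^{\&} \wedge AX_A^{\mid} \wedge AX_A^{\oplus}$$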
The first conjunction states that all integer variables introduced for parametric bitvector variables reside in the range specified by their bitwidth. The second conjunction states that all free variables in $\omega$ (denoting bitwidths) are positive. The remaining four conjuncts denote the axiomatization for the four uninterpreted functions that may occur in the output of our conversion function. The definitions of these formulas are given in Tables 2 and 3 for full and partial, respectively. For each axiom, $i, j, k$ denote bitwidths and $x, y$ denote integers that encode bitvectors of size $k$. We assume guards on all quantified formulas (omitted for brevity) that constrain $i, j, k$ to be positive and $x, y$ to be in the range $\{0, \ldots, pow2(k) - 1\}$. Each table entry lists a set of formulas (interpreted conjunctively) that state properties about the intended semantics of these operators. The formulas for full assert the intended semantics of these operators, whereas those for partial assert several properties of them. The combined mode asserts both, and the quantifier-free mode qf takes only the formulas in full that are quantifier-free. In particular, $AX_{qf}^{pow2}$ corresponds to the base cases listed in $AX_{full}^{pow2}$, and for the other operators it is simply $\top$. The partial axiomatization of these operations mainly includes natural properties of them. For example, we include some base cases for each operation, and also the ranges of its inputs and output. For some proofs, these are sufficient. For $\&^{\mathbb{N}}$, $\mid^{\mathbb{N}}$, and $\oplus^{\mathbb{N}}$, we have also included their behavior for specific cases, e.g., when one argument is the minimum or maximum value, and variants thereof. Other axioms (e.g., "never even") were added after analyzing specific benchmarks and identifying axioms that were sufficient for, but missing from, their proofs.
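The difference between a complete, inductive definition and a list of derived properties can be illustrated for bitwise and. Since the axiom formulas themselves are not reproduced in this text, the sketch below makes two assumptions that should be read as ours: the recursion in `and_n` is one natural way to define bitwise and by induction on the bitwidth, and the properties in `partial_properties_hold` are our reading of some of the axiom names (min, max, idempotence, contradiction, symmetry, range).

```python
def pow2(k: int) -> int:
    return 2 ** k


def bit(i: int, x: int) -> int:
    """The i-th bit of x (bit 0 is the LSB)."""
    return (x // pow2(i)) % 2


def and_n(k: int, x: int, y: int) -> int:
    """Bitwise and of two k-bit values, defined by induction on k (assumed form)."""
    top = pow2(k - 1) * min(bit(k - 1, x), bit(k - 1, y))
    if k == 1:
        return top
    return and_n(k - 1, x % pow2(k - 1), y % pow2(k - 1)) + top


def partial_properties_hold(k: int) -> bool:
    """Check a few properties that follow from the inductive definition."""
    max_val = pow2(k) - 1
    for x in range(pow2(k)):
        if and_n(k, x, 0) != 0:                   # min
            return False
        if and_n(k, x, max_val) != x:             # max
            return False
        if and_n(k, x, x) != x:                   # idempotence
            return False
        if and_n(k, x, max_val - x) != 0:         # contradiction (x and not x)
            return False
        for y in range(pow2(k)):
            r = and_n(k, x, y)
            if r != and_n(k, y, x):               # symmetry
                return False
            if not 0 <= r <= min(x, y):           # range
                return False
    return True


print(all(and_n(k, x, y) == (x & y)
          for k in range(1, 5) for x in range(pow2(k)) for y in range(pow2(k))))
print(all(partial_properties_hold(k) for k in range(1, 5)))
```

A solver given only the properties can prove less than one given the inductive definition, but it avoids inductive reasoning over the bitwidth, which is the trade-off between the partial and full modes.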
Our translation satisfies the following key properties.
Theorem 3.1
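Let $\varphi$ be a parametric bitvector formula that is well-sorted under $\omega$ and has no occurrences of bitvector extract or concrete bitvector constants. Then:

1. $\varphi$ is $T_{BV}$-valid under $\omega$ if and only if $TR_{full}(\varphi, \omega)$ is $T_{UFIA}$-valid.

2. $\varphi$ is $T_{BV}$-valid under $\omega$ if and only if $TR_{combined}(\varphi, \omega)$ is $T_{UFIA}$-valid.

3. $\varphi$ is $T_{BV}$-valid under $\omega$ if $TR_{partial}(\varphi, \omega)$ is $T_{UFIA}$-valid.

4. $\varphi$ is $T_{BV}$-valid under $\omega$ if $TR_{qf}(\varphi, \omega)$ is $T_{UFIA}$-valid.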
The proof of Property 1 is carried out by translating every interpretation of into a corresponding interpretation of such that