# On Transforming Functions Accessing Global Variables into Logically Constrained Term Rewriting Systems

In this paper, we show a new approach to transformations of an imperative program with function calls and global variables into a logically constrained term rewriting system. The resulting system represents transitions of the whole execution environment with a call stack. More precisely, we prepare a function symbol for the whole environment, which stores values for global variables and a call stack as its arguments. For a function call, we prepare rewrite rules to push the frame to the stack and to pop it after the execution. Any running frame is located at the top of the stack, and statements accessing global variables are represented by rewrite rules for the environment symbol. We show a precise transformation based on the approach and prove its correctness.

## 1 Introduction

Recently, analyses of imperative programs (written in C, Java Bytecode, etc.) via transformations into term rewriting systems have been investigated [3, 4, 7, 12]. In particular, constrained rewriting systems are popular for these transformations, since logical constraints used for modeling the control flow can be separated from terms expressing intermediate states [3, 4, 7, 10, 14]. To capture the existing approaches for constrained rewriting in one setting, the framework of a logically constrained term rewriting system (an LCTRS, for short) has been proposed [8]. Transformations of C programs with integers, characters, arrays of integers, global variables, and so on into LCTRSs have been discussed in [6].

A basic idea of transforming functions defined in simple imperative programs over the integers, so-called while programs, is to represent transitions of parameters and local variables as rewrite rules with auxiliary function symbols. The resulting rewriting system can be considered a transition system w.r.t. parameters and local variables. Consider the function in Figure 1, which is written in the C language. The function sum1 computes the summation from 0 to a given non-negative integer x. The execution of the body of this function can be considered a transition of values for x, i, and z, respectively. For example, we have the following transition for sum1(3):

 (3,0,0)→(3,0,1)→(3,1,1)→(3,1,3)→(3,2,3)→(3,2,6)→(3,3,6)→(3,3,6)

This transition for the execution of the function can be modeled by an LCTRS as follows [7, 6]:

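$$
\mathcal{R}_1 = \left\{
\begin{array}{l}
\mathsf{sum1}(x) \to \mathsf{u}_1(x, 0, 0), \\
\mathsf{u}_1(x, i, z) \to \mathsf{u}_1(x, i+1, z+i+1) \quad [\, i < x \,], \\
\mathsf{u}_1(x, i, z) \to \mathsf{return}(z) \quad [\, \neg (i < x) \,]
\end{array}
\right\}
$$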

Note that the auxiliary function symbol u1 can be regarded as a location stored in the program counter. The transformed LCTRS is useful for verifying the original program [6]. For example, the theorem proving method based on rewriting induction [13] can automatically prove the correctness of the C program [14, 9, 6].
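Since Figure 1 is not reproduced in this text, the following C sketch is only an assumed reconstruction of sum1 that agrees with the transition sequence for sum1(3) shown above. It is paired with a hypothetical helper, `sum1_by_rules`, that applies the rules for u1 one step at a time. The names `state_t`, `u1_step`, and `sum1_by_rules` are illustrative and do not come from the original, and the final return rule is an assumption.

```c
#include <assert.h>

/* Assumed reconstruction of the function in Figure 1. */
int sum1(int x) {
    int i = 0;
    int z = 0;
    assert(x >= 0);              /* sum1 is specified for non-negative input */
    while (i < x) {
        z = z + i + 1;           /* z is updated first, as in (3,0,0) -> (3,0,1) */
        i = i + 1;               /* then i, as in (3,0,1) -> (3,1,1) */
    }
    return z;
}

/* Values of x, i, z: the arguments of the auxiliary symbol u1. */
typedef struct { int x, i, z; } state_t;

/* Applies one u1 rule. Returns 1 if the loop rule fired, 0 if the
   (assumed) return rule u1(x,i,z) -> return(z) [not (i < x)] applies. */
static int u1_step(state_t *s) {
    if (s->i < s->x) {           /* u1(x,i,z) -> u1(x,i+1,z+i+1) [i < x] */
        s->z = s->z + s->i + 1;
        s->i = s->i + 1;
        return 1;
    }
    return 0;
}

/* Hypothetical driver: rewrites sum1(x) until a return term is reached. */
int sum1_by_rules(int x) {
    state_t s = { x, 0, 0 };     /* sum1(x) -> u1(x,0,0) */
    while (u1_step(&s)) { }
    return s.z;
}
```

Under these assumptions, `sum1(3)` and `sum1_by_rules(3)` both yield 6, the last component of the final state in the transition sequence.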

A function call is added as an extra argument of the auxiliary symbol that corresponds to the statement of the call. Let us consider the following function g in addition to sum1 in Figure 1:

    int g(int x){
        int z = 0;
        z = sum1(x);
        return x * z;
    }


This function is transformed into the following rules:

 { g(x)→u2(x,0),  u2(x,z)→u3(x,z,sum1(x)),  u3(x,z,return(y))→u4(x,y),  u4(x,z)→return(x×z) }

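The rule for u2 calls sum1 in the third argument of the auxiliary function symbol u3, and the rule for u3 receives the result of the call once that argument has been reduced to a term of the form return(y).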
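The four rules for g above can be traced with a small C sketch. It is a minimal illustration, not the authors' implementation: the type `term_t`, the helper `g_step`, and the driver `g_by_rules` are hypothetical names, the body of `sum1` is assumed, and the nested call sum1(x) is reduced in a single step rather than rule by rule.

```c
#include <assert.h>

/* Assumed body of sum1; only its result matters here. */
static int sum1(int x) {
    int i = 0, z = 0;
    assert(x >= 0);
    while (i < x) { z = z + i + 1; i = i + 1; }
    return z;
}

/* Root symbol of the current term together with its integer arguments. */
typedef enum { U2, U3, U4, RETURN } symbol_t;
typedef struct { symbol_t head; int x, z, y; } term_t;

/* Applies the rule that matches the root symbol of t. */
static term_t g_step(term_t t) {
    switch (t.head) {
    case U2:                     /* u2(x,z) -> u3(x,z,sum1(x)) */
        t.head = U3;
        t.y = sum1(t.x);         /* third argument, already reduced to return(y) */
        break;
    case U3:                     /* u3(x,z,return(y)) -> u4(x,y) */
        t.head = U4;
        t.z = t.y;
        break;
    case U4:                     /* u4(x,z) -> return(x*z) */
        t.head = RETURN;
        t.y = t.x * t.z;
        break;
    case RETURN:                 /* normal form: nothing to do */
        break;
    }
    return t;
}

/* Hypothetical driver: rewrites g(x) to return(v) and yields v. */
int g_by_rules(int x) {
    term_t t = { U2, x, 0, 0 };  /* g(x) -> u2(x,0) */
    while (t.head != RETURN) {
        t = g_step(t);
    }
    return t.y;
}
```

For example, `g_by_rules(3)` passes through u2, u3, and u4 and ends with the value 3 * 6 = 18.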

To deal with a global variable under sequential execution, it suffices to pass the value stored in the global variable to a function call as an extra argument and to receive from the called function the value of the global variable, which may have been updated during the call, so that the caller can restore it in the global variable. Let us add a global variable num that counts the total number of function calls to the above program, as in Figure 2. This program is transformed into the following LCTRS [6]:

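$$
\mathcal{R}_2 = \left\{
\begin{array}{l}
\mathsf{sum1}(x, \mathit{num}) \to \mathsf{u}_1(x, 0, 0, \mathit{num}+1), \\
\mathsf{u}_1(x, i, z, \mathit{num}) \to \mathsf{u}_1(x, i+1, z+i+1, \mathit{num}) \quad [\, i < x \,], \\
\quad \vdots
\end{array}
\right\}
$$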
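The idea of passing the global variable in and out of each call can be sketched in C. Figure 2 is not reproduced here, so the program below is an assumed reconstruction in which both functions increment num on entry. The type `result_t` and the functions `sum1_threaded` and `g_threaded` are hypothetical names that model terms of the form return(value, num).

```c
#include <assert.h>

/* Assumed reconstruction of the program with a global call counter. */
int num = 0;

int sum1(int x) {
    int i = 0;
    int z = 0;
    num = num + 1;               /* count this call */
    while (i < x) { z = z + i + 1; i = i + 1; }
    return z;
}

int g(int x) {
    int z = 0;
    num = num + 1;               /* count this call */
    z = sum1(x);
    return x * z;
}

/* Result of a call together with the possibly updated global value. */
typedef struct { int value; int num; } result_t;

/* sum1 with num passed in as an extra argument and handed back. */
result_t sum1_threaded(int x, int num_in) {
    int n = num_in + 1;          /* sum1(x,num) -> u1(x,0,0,num+1) */
    int i = 0, z = 0;
    while (i < x) { z = z + i + 1; i = i + 1; }
    result_t r = { z, n };
    return r;
}

/* g passes its current num to sum1 and restores the returned one. */
result_t g_threaded(int x, int num_in) {
    int n = num_in + 1;
    result_t s = sum1_threaded(x, n);
    result_t r = { x * s.value, s.num };
    return r;
}
```

Under these assumptions, calling `g(3)` with `num` initially 0 and calling `g_threaded(3, 0)` give the same value and the same final counter, which is the sense in which the threaded version simulates the global variable under sequential execution.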

The above approach to transformations of function calls is simple but not general. For example, to model parallel execution, a value stored in a global variable should not be passed to a particular function or process, because another function or process may access the global variable.

In this paper, we show another approach to transformations of imperative programs with function calls and global variables into LCTRSs. Our target languages are call-by-value imperative languages such as C. For this reason, we use a small subclass of C programs over the integers as fundamental imperative programs. We show a precise transformation based on this approach and prove its correctness.

Our idea for treating global variables in function calls is to prepare a new symbol that represents the whole execution environment. Values of global variables are stored in arguments of the new symbol, and transitions accessing global variables are represented as transitions of the environment. In reduction sequences of LCTRSs obtained by the original transformation, positions of function calls are not unique, and thus we may need (possibly infinitely) many rules for a transition related to a global variable. To solve this problem, we prepare a so-called call stack and transform programs into LCTRSs that specify statements as rewrite rules not only for user-defined functions but also for the introduced environment symbol. When a function is called, a frame of the called function is pushed onto the stack, and it is popped from the stack when the execution halts successfully. This implies that any running frame is located at the top of the stack, i.e., positions of function calls are unique. We transform statements not accessing global variables into rewrite rules for called functions in the same way as the previous transformation, and transform statements accessing global variables into rewrite rules for the introduced environment symbol.
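The environment-with-call-stack idea can be illustrated with a small-step machine in C. This is a minimal sketch under stated assumptions, not the transformation defined in the paper: the types `env_t` and `frame_t`, the location tags, and the functions `push`, `step`, and `run_g` are hypothetical, and the program being executed is an assumed version of sum1 and g in which both functions increment a global counter num on entry.

```c
#include <assert.h>

/* Program locations, playing the role of auxiliary function symbols. */
enum loc { SUM1_ENTRY, SUM1_LOOP, G_ENTRY, G_CALL, G_WAIT, G_RET };

/* A stack frame: current location plus parameter and local variables. */
typedef struct { enum loc pc; int x, i, z; } frame_t;

#define STACK_MAX 16

/* The whole environment: global variable, call stack, last return value. */
typedef struct {
    int num;
    frame_t stack[STACK_MAX];
    int top;                     /* number of frames on the stack */
    int result;                  /* value returned by the last popped frame */
} env_t;

static void push(env_t *e, enum loc pc, int x) {
    assert(e->top < STACK_MAX);
    frame_t f = { pc, x, 0, 0 };
    e->stack[e->top++] = f;
}

/* One transition of the environment. Only the top frame ever runs.
   Returns 0 when the stack is empty, 1 otherwise. */
static int step(env_t *e) {
    if (e->top == 0) return 0;
    frame_t *f = &e->stack[e->top - 1];
    switch (f->pc) {
    case SUM1_ENTRY:             /* accesses num: a rule on the environment */
        e->num += 1; f->pc = SUM1_LOOP; break;
    case SUM1_LOOP:              /* local only: a rule on the frame */
        if (f->i < f->x) { f->z += f->i + 1; f->i += 1; }
        else { e->result = f->z; e->top--; }          /* pop on return */
        break;
    case G_ENTRY:                /* accesses num: a rule on the environment */
        e->num += 1; f->pc = G_CALL; break;
    case G_CALL:                 /* push the frame of the called function */
        f->pc = G_WAIT; push(e, SUM1_ENTRY, f->x); break;
    case G_WAIT:                 /* callee was popped; receive its result */
        f->z = e->result; f->pc = G_RET; break;
    case G_RET:
        e->result = f->x * f->z; e->top--; break;
    }
    return 1;
}

/* Hypothetical driver: runs g(x) from num = 0 and reports the final num. */
int run_g(int x, int *num_out) {
    env_t e = { 0 };
    push(&e, G_ENTRY, x);
    while (step(&e)) { }
    *num_out = e.num;
    return e.result;
}
```

In this sketch the global variable is never passed to a callee. Every step inspects only the top frame, so a statement that reads or writes num needs a single transition on the environment regardless of how deeply the call is nested.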

This paper is organized as follows. In Section 2, we recall LCTRSs and a small imperative language. In Section 3, using an example, we show a new approach to transformations of imperative programs into LCTRSs. In Section 4, we precisely define a transformation and show its correctness. In Section 5, we describe a future direction of this research.

## 2 Preliminaries

In this section, we recall LCTRSs, following the definitions in [8, 6]. We also recall a small imperative language SIMP+ with global variables and function calls. Familiarity with basic notions on term rewriting [2, 11] is assumed.

### 2.1 Logically Constrained Term Rewriting Systems

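Let $\mathcal{S}$ be a set of sorts and $\mathcal{V}$ a countably infinite set of variables, each of which is equipped with a sort. A signature $\Sigma$ is a set, disjoint from $\mathcal{V}$, of function symbols $f$, each of which is equipped with a sort declaration $\iota_1 \times \cdots \times \iota_n \Rightarrow \iota$ where $\iota_1, \ldots, \iota_n, \iota \in \mathcal{S}$. For readability, we often write $\iota$ instead of $\iota_1 \times \cdots \times \iota_n \Rightarrow \iota$ if $n = 0$. We denote the set of well-sorted terms over $\Sigma$ and $\mathcal{V}$ by $T(\Sigma, \mathcal{V})$. In the rest of this section, we fix $\mathcal{S}$, $\Sigma$, and $\mathcal{V}$. The set of variables occurring in $s_1, \ldots, s_n$ is denoted by $\mathcal{V}ar(s_1, \ldots, s_n)$. Given a term $s$ and a position $p$ (a sequence of positive integers) of $s$, $s|_p$ denotes the subterm of $s$ at position $p$, and $s[t]_p$ denotes $s$ with the subterm at position $p$ replaced by $t$. A context $C[\,]$ is a term containing one hole $\Box$. For a term $s$, $C[s]$ denotes the term obtained from $C[\,]$ by replacing $\Box$ by $s$.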

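A substitution $\gamma$ is a sort-preserving total mapping from $\mathcal{V}$ to $T(\Sigma, \mathcal{V})$, and is naturally extended to a mapping from $T(\Sigma, \mathcal{V})$ to $T(\Sigma, \mathcal{V})$: the result $s\gamma$ of applying a substitution $\gamma$ to a term $s$ is $s$ with all occurrences of a variable $x$ replaced by $\gamma(x)$. The domain $\mathcal{D}om(\gamma)$ of $\gamma$ is the set of variables $x$ with $\gamma(x) \neq x$. The notation $\{x_1 \mapsto s_1, \ldots, x_n \mapsto s_n\}$ denotes a substitution $\gamma$ with $\gamma(x_i) = s_i$ for $1 \leq i \leq n$, and $\gamma(y) = y$ for $y \notin \{x_1, \ldots, x_n\}$.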

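To define LCTRSs, we consider different kinds of symbols and terms: (1) two signatures $\Sigma_{terms}$ and $\Sigma_{theory}$ such that $\Sigma = \Sigma_{terms} \cup \Sigma_{theory}$, (2) a mapping $\mathcal{I}$ which assigns to each sort $\iota$ occurring in $\Sigma_{theory}$ a set $\mathcal{I}_\iota$, (3) a mapping $\mathcal{J}$ which assigns to each $f : \iota_1 \times \cdots \times \iota_n \Rightarrow \iota \in \Sigma_{theory}$ a function in $\mathcal{I}_{\iota_1} \times \cdots \times \mathcal{I}_{\iota_n} \Rightarrow \mathcal{I}_\iota$, and (4) a set $\mathcal{V}al_\iota \subseteq \Sigma_{theory}$ of values (function symbols $a : \iota$ such that $\mathcal{J}$ gives a bijective mapping from $\mathcal{V}al_\iota$ to $\mathcal{I}_\iota$) for each sort $\iota$ occurring in $\Sigma_{theory}$. We require that $\Sigma_{terms} \cap \Sigma_{theory} \subseteq \mathcal{V}al = \bigcup_{\iota} \mathcal{V}al_\iota$. The sorts occurring in $\Sigma_{theory}$ are called theory sorts, and the symbols theory symbols. Symbols in $\Sigma_{theory} \setminus \mathcal{V}al$ are calculation symbols. A term in $T(\Sigma_{theory}, \mathcal{V})$ is called a theory term. For ground theory terms, we define the interpretation as $[\![ f(s_1, \ldots, s_n) ]\!] = \mathcal{J}(f)([\![ s_1 ]\!], \ldots, [\![ s_n ]\!])$. For every ground theory term $s$, there is a unique value $c$ such that $[\![ s ]\!] = [\![ c ]\!]$. We use infix notation for theory and calculation symbols.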

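A constraint is a theory term $\varphi$ of some sort $\mathit{bool}$ with $\mathcal{I}_{\mathit{bool}} = \mathbb{B} = \{\top, \bot\}$, the set of booleans. A constraint $\varphi$ is valid if $[\![ \varphi\gamma ]\!] = \top$ for all substitutions $\gamma$ which map $\mathcal{V}ar(\varphi)$ to values, and satisfiable if $[\![ \varphi\gamma ]\!] = \top$ for some such substitution. A substitution $\gamma$ respects $\varphi$ if $\gamma(x)$ is a value for all $x \in \mathcal{V}ar(\varphi)$ and $[\![ \varphi\gamma ]\!] = \top$. We typically choose a theory signature with $\Sigma_{theory} \supseteq \Sigma^{core}_{theory}$, where $\Sigma^{core}_{theory}$ contains $\mathsf{true}, \mathsf{false} : \mathit{bool}$, $\wedge, \vee, \Rightarrow\ : \mathit{bool} \times \mathit{bool} \Rightarrow \mathit{bool}$, $\neg : \mathit{bool} \Rightarrow \mathit{bool}$, and, for all theory sorts $\iota$, symbols $=_\iota, \neq_\iota\ : \iota \times \iota \Rightarrow \mathit{bool}$, and an evaluation function $\mathcal{J}$ that interprets these symbols as expected. We omit the sort subscripts from $=$ and $\neq$ when they are clear from context.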

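The standard integer signature $\Sigma^{int}_{theory}$ is $\Sigma^{core}_{theory} \cup \{+, -, *, \mathsf{exp}, \mathsf{div}, \mathsf{mod} : \mathit{int} \times \mathit{int} \Rightarrow \mathit{int}\} \cup \{\geq, > \ : \mathit{int} \times \mathit{int} \Rightarrow \mathit{bool}\} \cup \{\mathsf{n} : \mathit{int} \mid n \in \mathbb{Z}\}$ with values $\mathsf{true}$, $\mathsf{false}$, and $\mathsf{n}$ for all integers $n \in \mathbb{Z}$. Thus, we use $\mathsf{n}$ (in sans-serif font) as the function symbol for $n \in \mathbb{Z}$ (in math font). We define $\mathcal{J}$ in the natural way, except: since all $\mathcal{J}(f)$ must be total functions, we set $\mathcal{J}(\mathsf{div})(n, 0) = \mathcal{J}(\mathsf{mod})(n, 0) = \mathcal{J}(\mathsf{exp})(n, k) = 0$ for all $n$ and all $k < 0$. When constructing LCTRSs from, e.g., while programs, we can add explicit error checks for, e.g., “division by zero”, to constraints (cf. [6]).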

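A constrained rewrite rule is a triple $\ell \to r\ [\varphi]$ such that $\ell$ and $r$ are terms of the same sort, $\varphi$ is a constraint, and $\ell$ has the form $f(\ell_1, \ldots, \ell_n)$ and contains at least one symbol in $\Sigma_{terms} \setminus \Sigma_{theory}$ (i.e., $\ell$ is not a theory term). If $\varphi = \mathsf{true}$ with $\mathcal{J}(\mathsf{true}) = \top$, we may write $\ell \to r$. We define $\mathcal{LV}ar(\ell \to r\ [\varphi])$ as $\mathcal{V}ar(\varphi) \cup (\mathcal{V}ar(r) \setminus \mathcal{V}ar(\ell))$. We say that a substitution $\gamma$ respects $\ell \to r\ [\varphi]$ if $\gamma(x) \in \mathcal{V}al$ for all $x \in \mathcal{LV}ar(\ell \to r\ [\varphi])$, and $[\![ \varphi\gamma ]\!] = \top$. Note that it is allowed to have $\mathcal{V}ar(r) \not\subseteq \mathcal{V}ar(\ell)$, but fresh variables in the right-hand side may only be instantiated with values. Given a set $\mathcal{R}$ of constrained rewrite rules, we let $\mathcal{R}_{calc}$ be the set $\{ f(x_1, \ldots, x_n) \to y\ [y = f(x_1, \ldots, x_n)] \mid f : \iota_1 \times \cdots \times \iota_n \Rightarrow \iota \in \Sigma_{theory} \setminus \mathcal{V}al \}$. We usually call the elements of $\mathcal{R}_{calc}$ constrained rewrite rules (or calculation rules) even though their left-hand side is a theory term. The rewrite relation $\to_{\mathcal{R}}$ is a binary relation on terms, defined by: $s[\ell\gamma]_p \to_{\mathcal{R}} s[r\gamma]_p$ if $\ell \to r\ [\varphi] \in \mathcal{R} \cup \mathcal{R}_{calc}$ and $\gamma$ respects $\ell \to r\ [\varphi]$. We may say that the reduction occurs at position $p$. A reduction step with $\mathcal{R}_{calc}$ is called a calculation.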

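Now we define a logically constrained term rewriting system (an LCTRS, for short) as the abstract rewriting system $(T(\Sigma, \mathcal{V}), \to_{\mathcal{R}})$, which is simply denoted by $\mathcal{R}$. An LCTRS is usually given by supplying $\Sigma$, $\mathcal{R}$, and an informal description of $\mathcal{I}$ and $\mathcal{J}$ if these are not clear from context. An LCTRS $\mathcal{R}$ is said to be left-linear if for every rule in $\mathcal{R}$, the left-hand side is linear. $\mathcal{R}$ is said to be non-overlapping if for every term $s$ and rule $\ell \to r\ [\varphi]$ such that $s$ reduces with $\ell \to r\ [\varphi]$ at the root position: (a) there are no other rules $\ell' \to r'\ [\varphi']$ such that $s$ reduces with $\ell' \to r'\ [\varphi']$ at the root position, and (b) if $s$ reduces with any rule at a non-root position $q$, then $q$ is not a position of $\ell$. $\mathcal{R}$ is said to be orthogonal if $\mathcal{R}$ is left-linear and non-overlapping. For $f(\ell_1, \ldots, \ell_n) \to r\ [\varphi] \in \mathcal{R}$, we call $f$ a defined symbol of $\mathcal{R}$, and non-defined elements of $\Sigma_{terms}$ and all values are called constructors of $\mathcal{R}$. Let $\mathcal{D}_{\mathcal{R}}$ be the set of all defined symbols and $\mathcal{C}_{\mathcal{R}}$ the set of constructors. A term in $T(\mathcal{C}_{\mathcal{R}}, \mathcal{V})$ is a constructor term of $\mathcal{R}$. We call $\mathcal{R}$ a constructor system if the left-hand side of each rule is of the form $f(t_1, \ldots, t_n)$ with $t_1, \ldots, t_n$ constructor terms.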

###### Example 2.1 ([6])

Let , and , where . Then both and are theory sorts. We also define set and function interpretations, i.e., , , and is defined as above. Examples of theory terms are and that are constraints. is also a (ground) theory term, but not a constraint. Using calculation steps, a term reduces to in one step with the calculation rule , and reduces to in three steps. To implement an LCTRS calculating the factorial function, we use the signature above and the following rules: . Expected starting terms are, e.g., or . Using the constrained rewrite rules in , reduces in ten steps to .
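The interplay between constrained rules and calculation steps can be made concrete with a short C sketch of a factorial LCTRS. It assumes the usual two-rule definition, fact(x) → 1 [x ≤ 0] and fact(x) → x × fact(x − 1) [¬(x ≤ 0)], and an innermost evaluation order. Both the rules and the helper name `fact_rewrite` are assumptions made for illustration, not taken from the example above.

```c
#include <assert.h>

/* Evaluates fact(x) by simulating rewrite steps and counts them.
   Each application of a fact rule and each calculation step
   (x - 1 and x * r on values) is counted as one step. */
long fact_rewrite(long x, int *steps) {
    assert(steps != 0);
    if (x <= 0) {
        (*steps)++;              /* assumed rule: fact(x) -> 1 [x <= 0] */
        return 1;
    }
    (*steps)++;                  /* assumed rule: fact(x) -> x * fact(x - 1) [not (x <= 0)] */
    (*steps)++;                  /* calculation step: x - 1 reduces to a value */
    long r = fact_rewrite(x - 1, steps);
    (*steps)++;                  /* calculation step: x * r reduces to a value */
    return x * r;
}
```

Under these assumptions, evaluating fact(3) takes three steps per recursive unfolding plus one step for the base case, and the constraint of each rule is checked on values only, which is why the subtraction must be calculated before the next fact rule can fire.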

### 2.2 SIMP+: a Small Imperative Language with Global Variables and Function Calls

In this section, we recall the syntax of SIMP, a small imperative language (cf. [5]). To deal with global variables and function calls, we add them to the ordinary syntax and semantics of SIMP in a natural way. We refer to the extended language as SIMP+.

We first show the syntax, adopting a C-like notation. A program of SIMP+ is defined by the following BNF:

    P ::= D  F
    D ::= ϵ ∣ int v = n; D
    F ::= ϵ ∣ int f(int x_1,…,int x_m) = { D  S  return E; }  F
    S ::= ϵ ∣ v = E; S ∣ v = f(E,…,E); S ∣ if(B){S}else{S} S ∣ while(B){S} S
    E ::= n ∣ v ∣ (E+E) ∣ (E−E)
    B ::= true ∣ false ∣ (E==E) ∣ (E

where $n \in \mathbb{Z}$, $v$ is a variable, $f$ is a function name, and we may omit brackets in the usual way. The empty sequence “ϵ” is used instead of the “skip” command. To simplify the discussion, we do not use other operators such as multiplication and division, but we use some further C operators as syntactic sugar. We also use the for-statement as syntactic sugar. We assume that a function name $f$ has a fixed arity, and that the definition and every call of $f$ are consistent with the arity. A program $P$ consists of declarations of global variables (with initialization) and functions. The global variables of $P$ are the variables declared in the declaration part $D$ at the beginning of $P$. We assume that each function is defined at most once in a program and that any function called in a function defined in $P$ is also defined in $P$. To simplify the semantics, we assume that local variables in function declarations are different from global variables and parameters of functions. An assignment is defined by a substitution whose range is over the integers, which may be used for terms in the setting of LCTRSs. We deal with SIMP+ programs that can be successfully compiled as C programs.

###### Example 2.2

The program in Figure 3 is a SIMP program, and we have that .

The semantics of integer and boolean expressions is defined as usual (see Figure 4): given an expression and an assignment with , we write where is the resulting value obtained by evaluating with . The transition system defining the semantics of a SIMP program is defined by

• configurations of the form , where

• is of the form “” with variable declarations (possibly the empty sequence) and a statement , and

• are assignments for global and local variables, respectively, which are represented by partial functions from variables to integers—the update of an assignment w.r.t.  for an integer is defined as follows: if then , and otherwise, ,

and

• a transition relation between configurations, which is defined as a big-step semantics by the inference rules illustrated in Figure 5.

We assume that for any configuration for a program , the assignment is defined for all global variables of . To compute the result of a function call under assignments for and , given a fresh variable , we start with the configuration . When holds, the execution halts and the result of the function call under is .

## 3 A New Approach to Transformations of Imperative Programs

In this section, using an example, we introduce a new approach to transformations of imperative programs with function calls and global variables.

### 3.1 The Existing Transformation of Functions Accessing Global Variables

In this section, we briefly recall the transformation of imperative programs with functions accessing global variables [6] using the program in Figure 3. Unlike in Section 1, in the following, we do not optimize generated rewrite rules in LCTRSs in order to make it easier to understand how to precisely transform programs. The program is transformed into the following LCTRS with the sort set and the standard integer signature  [6]:

where ,