KRust: A Formal Executable Semantics of Rust

04/28/2018 ∙ by Feng Wang, et al. ∙ 0

Rust is a new and promising high-level system programming language. It provides both memory safety and thread safety through novel mechanisms such as ownership, moves and borrows. The ownership system ensures that at any point there is only one owner of any given resource. The ownership of a resource can be moved or borrowed according to lifetimes. Because the ownership system establishes a clear lifetime for each value, garbage collection is not needed. These features give Rust high performance and the fine low-level control of C and C++ without a garbage collector, which distinguishes Rust from other prevalent languages. To support formal analysis of Rust programs and to help programmers learn its new mechanisms and features, a formal semantics of Rust is desirable as a foundation for developing related tools. In this paper, we present a formal executable operational semantics of a realistic subset of Rust, called KRust. The semantics is defined in K, a rewriting-based executable semantic framework for programming languages. The executable semantics automatically yields a formal interpreter and verification tools for Rust programs. KRust has been thoroughly validated by testing with hundreds of tests, including the official Rust test suite.

I Introduction

Recently, a new system programming language, Rust, was designed and implemented by Mozilla [1], aiming at achieving both high-level safety and low-level control in developing system software, such as operating systems, device drivers, game engines and web browsers. Like most modern high-level languages, Rust guarantees memory safety and thread safety; meanwhile, it supports zero-cost abstractions for many common system programming idioms and provides fine low-level control over the use of memory, without needing a garbage collector. One key to meeting all these promises is Rust’s novel system of ownership, moves and borrows. The ownership system establishes a clear lifetime for each value, making garbage collection unnecessary in the core language. Moreover, it also prevents data races at compile time. The three mechanisms are checked at compile time and carefully designed to complement its static linear type system. Rust has been used to implement an operating system [2], a parallel browser engine [3], an Intel SGX Enclave [4], etc. (for organizations running Rust in production, see https://www.rust-lang.org/en-US/friends.html).

The new features make the semantics of Rust different from other common languages, which brings new difficulties in reasoning about Rust programs. These difficulties constantly baffle developers and have become a common topic on question-and-answer websites (for instance, [5]). The ownership of an object is a variable binding. When a variable goes out of its scope, Rust will free the bound resources. The ownership of an object can be transferred, after which the object cannot be accessed via the outdated owner. This ensures that there is exactly one binding to any object. Instead of transferring ownership, Rust provides borrows, which allow an object to be shared by multiple references. There are two kinds of borrows in Rust: mutable references and immutable references. A mutable reference can mutate the borrowed object if the object is mutable, while an immutable reference cannot mutate the borrowed object even if the object itself is mutable. The basic ownership discipline enforced by Rust is that an object shared by multiple references is immutable. This property eliminates a wide range of common low-level programming errors, such as use-after-free, data races, and iterator invalidation. These specific semantic rules are unusual, and hence the semantics of other modern programming languages such as C/C++, Java and JavaScript cannot be directly adapted to Rust.

To remedy this situation, a formal semantics of Rust is desirable as a foundation for reasoning about Rust programs in a formal way and for developing related computer-aided tools. To our knowledge, the formal semantics of Rust has not yet been well studied, which impedes further development of formal analysis and verification tools for Rust programs. In this paper, we make a major step toward rectifying this situation by giving the first formal operational semantics of a realistic subset of Rust. We design a formal operational semantics of Rust capturing ownership, ownership moves and borrows. To prevent our semantics from being just “paper work”, we formalize the semantics in the K framework [6] (http://kframework.org), a scalable semantic framework for programming languages which has been successfully applied to C [7] and Java [8]. We call the definition of the semantics KRust (KRust and all the related examples are available for download from http://sist.shanghaitech.edu.cn/faculty/songfu/Projects/KRust). To the best of our knowledge, it is the first formal executable semantics for Rust.

There are several benefits of a formal semantics defined in the K framework. Firstly, a semantics defined in K is both machine readable and executable, and an interpreter of Rust is generated from it automatically. Being executable, the semantics has been thoroughly tested with hundreds of tests, including the official Rust test suite. Secondly, K provides a simple notation for modular semantics of languages, making the semantics easy to define and extensible. The semantics offers a formal reference and a correctness definition for implementers of tools such as parsers, compilers, interpreters and debuggers, which would greatly facilitate developers’ understanding, freeing them from lengthy, ambiguous, elusive Rust documentation. Moreover, the semantics can automatically yield formal analysis tools, such as a state-space explorer for reachability, a model checker, a symbolic execution engine and a deductive program verifier, through the language-independent tools provided in K [9].

Organization. Section II gives a brief overview of Rust. In Section III, after introducing the basic notations of K, we present the formal semantics of Rust in K. Conformance testing and applications of KRust are described in Section IV. Section V discusses related work. Finally, we conclude the work with a discussion in Section VI.

II A Tour of Rust

In this section, we give a brief overview of Rust. Rust is a C-like programming language which contains common constructs from modern programming languages such as if/else branches, while/for loops, functions, compound data structures, etc. We will mainly point out distinct features of Rust, compared with other well-known modern programming languages such as C/C++ and Java.

II-A Mutability

Variables are declared using let statements. By default, a variable is immutable, which means that its value cannot be mutated. To declare a mutable variable, mut is required in the declaration statement.

1  fn main() {
2    let x = 9;
3    x = 10; // Error!
4    let mut y = 0;
5    let mut z: bool;
6  }

For instance, the above code declares an immutable variable x at Line 2 and a mutable variable y at Line 4, whose types are inferred at compile time. The type can also be explicitly specified in the program, as for the mutable variable z at Line 5. The Rust compiler will issue an error at Line 3, as the immutable variable x is reassigned. This is different from other modern programming languages like C/C++ and Java. The semantic rules of Rust should take the mutability of variables into account.

II-B Functions

Functions are declared with the keyword fn and each of them returns exactly one value. There are two ways to return a value if the function definition declares a return type. The first one returns a value in the body of the function explicitly using the return statement, which is similar to existing prevalent languages. The other one returns the value of the last expression in the body of the function, if there is no explicit return statement. However, if the function definition does not declare a return type, Rust will implicitly return the unit value (). Indeed, a function definition without a declared return type is just syntactic sugar for the same function definition with return type (). Furthermore, Rust has no restriction on the order of function definitions, namely, Rust programs can invoke functions which are defined later. These features are unusual and introduce tricky corner cases.

fn foo(x: i32, y: i32) -> i32 {
  x + y // return x+y;
}

The above program defines a function foo which takes two 32-bit integers x and y as arguments and returns x+y. This function behaves the same as the function which replaces the last expression x+y in foo with the return statement return x+y;.
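The three conventions described in this subsection (explicit return, return by last expression, and an omitted return type) can be seen side by side in the small sketch below. All function names are hypothetical and chosen only for illustration; the last function also shows that a function may call another one that is defined later in the file.

```rust
// Explicit `return` and a trailing expression are interchangeable.
fn add_explicit(x: i32, y: i32) -> i32 {
    return x + y;
}

fn add_implicit(x: i32, y: i32) -> i32 {
    x + y // no semicolon: this expression is the return value
}

// No declared return type: the function returns the unit value `()`.
fn discard_sum(x: i32, y: i32) {
    // Calls a function that is defined further down in the file.
    let _ = doubled_sum(x, y);
}

fn doubled_sum(x: i32, y: i32) -> i32 {
    add_implicit(x, y) * 2
}
```

Calling add_explicit(1, 2) and add_implicit(1, 2) gives the same result, and discard_sum(1, 2) evaluates to ().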

II-C Ownership

Ownership is the key feature of Rust, which guarantees memory safety and thread safety without garbage collection. The basic form of ownership is exclusive ownership, namely, each object has a unique owner at any time. This ensures that at most one reference is allowed to mutate a given location. When an object is created and assigned to a variable x, the variable x becomes the owner of the object. If the object is then assigned to another variable y (or passed as a parameter, etc.), the ownership of the object is transferred from x to y, namely, y becomes the owner of the object and x is no longer its owner. This is the so-called move semantics of Rust. This ownership discipline rules out aliasing entirely, and thus prevents data races. Moreover, if the owner of the object goes out of scope, the object is destroyed automatically without garbage collection. This is implemented by the Rust compiler, which inserts a call to a destructor, called drop in Rust, at the end of the owner’s scope. This ownership discipline enforces automatic memory management and prevents other errors commonplace in low-level pointer-manipulating programs, like use-after-free or double-free. To see this principle in practice, consider the following sample program.

 1  struct Point {
 2     x: i32,
 3     y: i32,
 4  }
 5  fn main() {
 6    let p = Point {x:1, y:2};
 7    {
 8      let mut q = p; // q becomes the owner
 9      q.x = 2;
10      println!("{}", p.x);  // Error!
11    }
12    println!("{}", q.x);  // Error!
13  }

Point is a compound data structure consisting of two 32-bit integer fields x and y. At Line 6, a Point object is created and assigned to the immutable variable p. The Rust compiler will issue an error at Line 10, which accesses the x field of the Point object via the variable p, as the ownership of the Point object was already transferred from p to q at Line 8. Moreover, the Rust compiler will also issue another error at Line 12, which accesses the x field of the Point object via the variable q, as the owner q went out of its scope and the Point object was already destroyed. These scenarios rarely happen in existing prevalent programming languages, which makes the Rust semantics unusual.
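For contrast with the rejected program above, the following is a minimal sketch of a variant that compiles, making the move and the automatic drop observable at run time. The drops counter field and the function move_then_drop are hypothetical additions for illustration only; they are not part of the original example.

```rust
use std::cell::Cell;
use std::rc::Rc;

struct Point {
    x: i32,
    y: i32,
    drops: Rc<Cell<u32>>, // hypothetical counter, incremented when a Point is dropped
}

impl Drop for Point {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns (q.x + q.y, drop count inside the inner scope, drop count after it).
fn move_then_drop() -> (i32, u32, u32) {
    let drops = Rc::new(Cell::new(0));
    let p = Point { x: 1, y: 2, drops: Rc::clone(&drops) };
    let seen_sum;
    let before_scope_end;
    {
        let mut q = p; // ownership moves from p to q; p is unusable from here on
        q.x = 2;
        seen_sum = q.x + q.y;
        before_scope_end = drops.get();
    } // q goes out of scope here, so the Point is dropped exactly once
    (seen_sum, before_scope_end, drops.get())
}
```

The object is accessed only through its current owner q, and the destructor runs once, at the end of the scope of q rather than that of p.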

II-D Borrowing

The ownership discipline is a fairly straightforward mechanism for guaranteeing memory safety and thread safety. However, it is too restrictive for developing industrial programs. To address this issue, Rust provides a mechanism for handling references, called borrowing. There are two kinds of borrowing (i.e., reference types) in Rust: mutable references and immutable references. A mutable reference grants temporary exclusive read and write access to the object, i.e., each object has at most one mutable reference (and then no immutable references). This ensures that mutable references are always unique pointers. A mutable reference can be reborrowed by someone else. In contrast, an immutable reference grants temporary read-only access to the object, and multiple immutable references are allowed to refer to the same object. The design of borrowing in Rust also guarantees memory safety and thread safety.

 1  fn main() {
 2     let x1 = 1;
 3     let p1 = &x1; // x1 is borrowed immutably
 4     let q1 = &x1; // x1 is borrowed immutably
 5     *p1 = 2; // Error!
 6     let y = &mut x1; // Error!
 7
 8     let mut x2 = 1;
 9     let p2 = &x2; // x2 is borrowed immutably
10     let q2 = &x2; // x2 is borrowed immutably
11     *p2 = 2; // Error!
12
13     let mut x3 = 1;
14     let p3 = &mut x3; // x3 is borrowed mutably
15     x3 = 2; // Error!
16     *p3 = 2; // OK!
17     let p4 = &mut x3;  // Error!
18  }

We can see borrowing in action in the above example. The immutable variable x1 is immutably borrowed twice, at Lines 3 and 4. The compiler will issue an error at Line 5, which tries to mutate the value of x1 through the immutable reference p1. It also issues another error at Line 6, as the immutable variable x1 cannot be mutably borrowed. The code at Lines 8-11 shows that the mutable variable x2 can also be immutably borrowed multiple times, but cannot be mutated via an immutable reference. Line 15 demonstrates that the mutable variable x3 cannot be mutated via x3 once it is borrowed. Instead, it can be mutated via the mutable reference p3 at Line 16. Line 17 shows that the mutable variable x3 cannot be mutably borrowed more than once.

II-E Lifetime

Borrowing grants temporary access to the object. Rust associates each reference with a lifetime to specify how long this temporary access lasts. Intuitively, lifetimes are names for scopes somewhere in the program, although the two notions are not the same. The lifetime of a reference should be included in the lifetime of the borrowed variable. Rust provides a convention so that lifetimes can be elided in general, which is why they did not show up in the above examples. Rust also supports named lifetimes, which help the Rust compiler to aggressively infer lifetimes and make sure all borrows are valid.

1  fn main() {
2     let mut x;
3     {
4       let y = 1;
5       x = &y;   // Error!
6     }
7     let z = 1;
8     let p = &z; // OK!
9  }

The above example illustrates the intuition behind lifetimes. There is an error at Line 5, as the lifetime of x is not included in the lifetime of y. In contrast, it is fine to borrow z at Line 8, as the lifetime of p is included in the lifetime of z. Therefore, Rust’s variable context is substructural.

Syntax    Description

Id ::= [a-zA-Z_][a-zA-Z0-9_]*    Identifier
Type ::= i8 | u8 | i16 | u16 | i32 | u32 | i64 | u64 | f32 | f64 | isize | usize | char
    | &str | bool | Id | () | [Type;Exp] | fn (Types) -> Type
Types ::= Type*    Variable types
TypedId ::= Id : Type
TypedIds ::= TypedId*    Auxiliary types
ConstAndStatic ::= const Id : Type = Exp ; | static mut ? Id : Type = Exp ;
DeclExp ::= let mut ? Id [: Type] ? [= Exp] ? | ConstAndStatic;    Variable declaration
Op ::= “+” | “-” | “*” | “/” | “%” | “&” | “==” | “!=” | “&&”
Exp ::= Int | Bool | Float | String | Char | Id | *Id | [Exps] | [Exp;Exp] | vec![Exps] | (Exp) | Exp[Exp]
    | {Exp} | Ref Exp | Id { StructValues } | Exp(Exp) | -Exp | ! Exp | Exp Op Exp
Exps ::= Exp*    Expressions
AssignOp ::= “=” | “+=” | “-=” | “*=” | “/=”
AssignmentStmt ::= Id AssignOp Exp ; | Id[Exp] AssignOp Exp ; | *Id AssignOp Exp ; | Id . Id AssignOp Exp ;    Assignment statement
If ::= if Exp else ? Block
While ::= while Exp Block
Loop ::= loop Block | If | While    If, while and loop statement
Block ::= { } | { Stmts } | { Stmts Exp }    Block
Ref ::= & | &mut    Two types of references
Struct ::= struct Id { TypedIds }
StructValue ::= Id : Exp
StructValues ::= StructValue*
StructInstance ::= Id { StructValues }    Struct
For ::= for Id in Int..Int Block    For statement
Function ::= fn Id (TypedIds) [-> Type] ? Block    Function
Stmts ::= DeclExp | AssignStmt | Block | Exp ; | return ; | return Exp ; | Loop ; | Loop | Function | Struct | For    Statements

TABLE I: The syntax of KRust

III KRust: The Formal Semantics of Rust in K

In this section, we first introduce the basic notations of K, and then define the formal syntax of the Rust subset that we address. Finally, we define the configurations and formal semantics of Rust in K.

III-A K Framework

The K framework is a rewriting-based executable semantic framework. The operational semantics of a programming language can be formally defined with the state of an executing program represented as a configuration and the semantics of each program statement defined as rules. From the defined operational semantics, K automatically generates an interpreter which can execute programs of the language. In addition, K provides formal analysis functionalities such as model checking, symbolic execution, and theorem proving [6]. A number of programming languages have been formalized using K, such as C [7], Python [10], PHP [11] and Java [8].

In this section, we briefly introduce how to define operational semantics in K with an example. Basically, the syntax of a language is defined using BNF with semantic attributes, and the operational semantics is defined by a set of rules which describe the effect of atomic program statements on configurations. A configuration is essentially a nested cell defined in XML style, which specifies a state of an executing program.

For a better understanding of K, let us consider the semantic rule for the lookup function in Rust, which is used to look up the value of a name (e.g., a variable) when a statement containing that name is being executed.

There are three cells related to the lookup function. The cell k (i.e., the cell labeled with k) is used to store the computations of a program to be executed. In this example, the item at the front of the k cell is the next expression/statement to evaluate/execute; here it is a label representing a program name. Both the env and store cells are of Map type and store key-value pairs: env stores the mapping from program names to locations, and store stores the mapping from locations to values. The dots “...” in the cells are structural frames, denoting irrelevant pairs or computations.

For instance, given the following two statements:

let a = 1;
let b = a; // b = 1

Variable a should be first evaluated to 1 using the lookup rule when the second statement is being executed.
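The two-step nature of the lookup (name to location through env, then location to value through store) can be mimicked outside K. The sketch below is a hypothetical Rust stand-in for the two cells, not the K rule itself; the names Config, lookup and example_config, and the choice of location 0, are assumptions made only for this illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the env and store cells.
struct Config {
    env: HashMap<String, usize>, // program name -> location
    store: HashMap<usize, i64>,  // location -> value
}

impl Config {
    /// Resolves a name in two steps: name -> location -> value.
    fn lookup(&self, name: &str) -> Option<i64> {
        let loc = self.env.get(name)?;
        self.store.get(loc).copied()
    }
}

/// Builds the state reached after executing `let a = 1;`.
fn example_config() -> Config {
    let mut env = HashMap::new();
    let mut store = HashMap::new();
    env.insert("a".to_string(), 0); // a is bound to location 0 (arbitrary choice)
    store.insert(0, 1);             // location 0 holds the value 1
    Config { env, store }
}
```

With this state, looking up a yields the value 1, which is what the second statement needs in order to initialize b; looking up an unbound name yields no value.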

III-B Syntax

Table I presents the syntax of a realistic subset of Rust defined in K. The syntax is described in a dialect of Extended Backus-Naur Form (EBNF) according to the grammar of Rust [12]. We use two repetition forms in the definition of the syntax: “?” means zero or one repetition, and “*” means zero or more repetitions. An option is represented by square brackets [ … ] followed by “?”. For instance, in the syntax for function declaration, “[-> Type] ?” means that “-> Type” may be present just once, or not at all. Since Rust shares many common conventions with prevalent functional and imperative programming languages, most of its syntax is easy to understand and hence we omit the corresponding explanations. We remark that Table I is not a full syntax of Rust, e.g., traits and pattern matching are excluded (cf. Section VI).

III-C Configuration

A configuration is represented by nested multisets of labeled cells. Figure 1 shows the 13 main cells in a configuration for the representation of the state of programs.

Fig. 1: The configuration for the states of programs

TABLE II: The partial semantic rules of KRust, grouped into: variable declaration and assignment; borrow, reference, lifetime and dereference; function definition and function call; struct and ownership

The cell T is the top-level cell, which contains all the other cells. As mentioned above, the k cell contains the computations of a program. The env cell is the local environment, recording the map from variables to their locations. Inside the control cell, there is an fstack cell which encodes the stack frames. The fstack cell is a list, in which each element contains an env cell, some computations and a return type. The genv cell represents the global environment. The typeEnv cell records the type of a given variable’s location. The values of all the defined variables are stored in the store cell.

Since a variable is either mutable or immutable in Rust, we add a mutType cell to record whether the variable is mutable or immutable. When a new variable is declared, a new location, which is an integer value, is allocated from the nextLoc cell. After that, the integer value in nextLoc is increased by one. The borrow cell records whether a variable is mutably or immutably borrowed if there exists a live reference to it. The ref cell records reference relations. The refType cell contains the types of references, i.e., immutable or mutable. The moved cell is a map from the locations of variables to a flag that denotes whether the variable has been moved.

III-D Formalization of the semantics

In this section, we present the formal semantics in K, emphasizing key features of Rust such as mutability, borrows and ownership. Table II shows the partial semantic rules of KRust, which specify the semantics of Rust statements by modifying the content of the relevant cells. In the semantic rules, a dedicated operator is used to add a new pair to its corresponding cell, for instance, to insert a new pair into the env cell. Another symbol denotes the undefined value. The dot is a special character in K, representing the identity element in list and map structures. Note that curly braces are used only for the compactness of the rules, where we leave out all the dots “...” in cells. The complete semantic rules of KRust can be found in its source code.

III-D1 Variable declaration and assignment

The semantic rule [Declaration-of-Immutable-Variable] specifies how cells are affected after the execution of a declaration statement in the k cell that declares a variable of a primitive type. A new pair mapping the freshly allocated location to the undefined value is added to store, indicating that the variable is uninitialized. A pair mapping the location to 0 is added to mutType, where 0 means that the variable is immutable. The next available location is increased by one, since the variable has consumed the current location. Some other initialization operations are made in the relevant cells as shown in the rule. The rule for the declaration of a mutable integer variable is defined likewise with a modification of the mutType cell.

The semantic rule [Assignment] is defined for ordinary assignment, e.g., assigning a new value to an integer variable. It says that the assignment can be successfully executed only if the assigned variable is mutable and the type of the variable agrees with the type of the assigned value, where a pre-defined function is used to get the type of a value. The execution updates the value in store using a rewrite whose left-hand side is a wildcard, denoting that the current value of the variable can be anything. This rewrite notation is used regularly in this work; it stands for replacing one pair by another in a cell. A declaration with an initial value can be handled similarly by combining the above semantic rules.
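To make the effect of the declaration and assignment rules on the cells concrete, here is a minimal sketch in Rust, not the K rules themselves. The names Cells, declare, assign and AssignError are hypothetical; the sketch assumes that locations start at 0 and that all values are i32, so the type check of the [Assignment] rule is omitted.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum AssignError {
    Unbound,
    Immutable,
}

/// Hypothetical stand-in for the env, mutType, store and nextLoc cells.
#[derive(Default)]
struct Cells {
    env: HashMap<String, usize>,        // name -> location
    mut_type: HashMap<usize, u8>,       // location -> 0 (immutable) or 1 (mutable)
    store: HashMap<usize, Option<i32>>, // location -> value, None = undefined
    next_loc: usize,
}

impl Cells {
    /// Declaration: allocate a fresh location, mark it uninitialized, bump nextLoc.
    fn declare(&mut self, name: &str, mutable: bool) -> usize {
        let loc = self.next_loc;
        self.env.insert(name.to_string(), loc);
        self.mut_type.insert(loc, mutable as u8);
        self.store.insert(loc, None);
        self.next_loc += 1;
        loc
    }

    /// Assignment: allowed only if the variable is bound and mutable.
    fn assign(&mut self, name: &str, value: i32) -> Result<(), AssignError> {
        let loc = *self.env.get(name).ok_or(AssignError::Unbound)?;
        if self.mut_type.get(&loc) != Some(&1) {
            return Err(AssignError::Immutable);
        }
        self.store.insert(loc, Some(value)); // the previous value can be anything
        Ok(())
    }
}
```

Declaring x as immutable and y as mutable and then assigning to both rejects the assignment to x and updates the store entry of y.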

The semantic rule [Declaration-of-Mutable-Array] is defined for the declaration of a mutable array, where a new pair is added to env, i.e., a location is allocated for the array variable. In typeEnv, this location is mapped to the array type. In store, the locations reserved for the array are marked as uninitialized. The integer in the nextLoc cell is increased accordingly, as consecutive locations are used to store the content of the array. The rules for the evaluation and updating of an array are defined likewise, as depicted in the table.

III-D2 Borrowing, reference, lifetime and dereference

Both immutable references and mutable references for borrowing are implemented in KRust. [Mutable-Reference] and [Immutable-Reference] are two of the semantic rules, for mutable and immutable references respectively. The [Mutable-Reference] rule expresses that if both the reference variable and the borrowed variable are mutable and have not been moved, and the borrowed variable has not been borrowed yet, then: the entry of the reference variable in the ref cell is updated to refer to the borrowed variable, which may be used for dereference; the reference type of the reference variable in refType is set to mutable; and the entry of the borrowed variable in the borrow cell is updated to record that it is now mutably borrowed. A side condition ensures that the reference variable has to be declared later than the borrowed variable, i.e., the lifetime of the reference is included in the lifetime of the borrowed variable. The [Immutable-Reference] rule is defined likewise, except that the borrowed variable is already immutably borrowed and the reference type in refType is set to immutable. The intuition behind the [Dereference] rule is straightforward, namely, the value of a dereference is evaluated via the reference relation in the ref cell.

III-D3 Function definition and function call

A standard function definition that has an explicit return type is handled by the [Function-Definition-with-Return-Type] rule, which allocates a location for the function name in the env cell and binds the location to an auxiliary function value in store. The auxiliary function value records the type and body of the function, which will be used for function calls, i.e., by the [Function-Call] rule.

A function call is processed in two steps: first, the function name is replaced by its function value in store using the [Lookup-of-Variable] rule; then, the [Function-Call] rule is applied. The [Function-Call] rule first stores the current local environment together with the return type and the remaining computations into the fstack cell, allocates a new empty local environment, and sets the new computations. The latter first execute the helper function mkDecls, which is used to declare formal parameters initialized with actual parameters recursively (cf. the [mkDecls-Helper-Function] rule), and then execute the function body followed by an additional return statement return ();. We remark that return (); is added to handle the case that the function definition does not have a return statement, and it will be executed only if there is no return statement at the end of the function body.

The semantic rule [Function-Definition-without-Return-Type] handles the corner case that the function definition does not have a return type, for which the unit type () is added as the return type. The semantic rule [Function-Definition-Return-by-Last-Expression] handles another corner case, in which the function definition returns the value of the last expression; here the last expression is rewritten as a return statement.

The rule [Return] first checks the type of the return value, then returns the value, and finally restores the local environment Env and the computations from the top of the fstack cell at the same time.
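The save-and-restore discipline of the [Function-Call] and [Return] rules can be sketched as follows. This is a simplified Rust model, not the K rules: the names Frame, Control, call and ret are hypothetical, computations and types are represented as plain strings, and parameter declaration via mkDecls is left out.

```rust
use std::collections::HashMap;

type Env = HashMap<String, usize>;

/// One saved frame: caller environment, pending computations, return type.
struct Frame {
    env: Env,
    rest: Vec<String>,
    ret_type: String,
}

/// Hypothetical stand-in for the env, k and fstack cells.
#[derive(Default)]
struct Control {
    env: Env,
    k: Vec<String>,
    fstack: Vec<Frame>,
}

impl Control {
    /// Function call: push the caller state, start with an empty local env.
    fn call(&mut self, body: Vec<String>, ret_type: &str) {
        let frame = Frame {
            env: std::mem::take(&mut self.env),         // leaves a fresh empty env
            rest: std::mem::replace(&mut self.k, body), // body becomes the computation
            ret_type: ret_type.to_string(),
        };
        self.fstack.push(frame);
    }

    /// Return: check the type against the saved return type, then restore.
    fn ret(&mut self, value_type: &str) -> bool {
        let type_ok = self
            .fstack
            .last()
            .map_or(false, |frame| frame.ret_type == value_type);
        if !type_ok {
            return false;
        }
        let frame = self.fstack.pop().expect("a frame exists, checked above");
        self.env = frame.env;
        self.k = frame.rest;
        true
    }
}
```

A call hides the caller's bindings behind a fresh environment, and a return with a matching type brings back both the environment and the pending computations in one step.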

III-D4 Struct and ownership

The rule [Struct-Definition] is used for struct definitions; it adds a new pair for the struct into the relevant cell. The value in this pair is a helper function that is used to record the fields of the struct.

The rule [Declaration-of-Struct-Instance] is defined for the declaration of a mutable struct instance. It allocates a location for the instance (i.e., adds a pair into env) and sets the type of the instance to the struct name (i.e., adds a pair into typeEnv). Other related cells are initialized accordingly. The declaration and initialization of fields are handled by a helper computation, which is defined by its own helper-function rule. This rule recursively allocates a location for each field, which is initialized with the corresponding value from Vs.

The rule [Ownership-Move] specifies Rust’s move semantics, in which the assignment statement is encoded as the computations of two helper functions and , if and have the same type and is immutable. Notice that the pair in store ensures that is a struct instance. The semantics of the helper functions and are expressed by the rules [-Helper-Function] and [-Helper-Function] respectively. Intuitively, [-Helper-Function] helps to copy the fields from to . [-Helper-Function] is used to update the moved cell. We remark that is not added in the [Ownership-Move] rule, as “” in [-Helper-Function] ensures that both variables are not moved yet. The semantic rules [Evaluation-of-Struct-Field] and [Updating-of-Struct-Field] are defined for the evaluation and updating of a struct field, in which it is required that the struct variable is not moved, i.e., the pair should occur in moved.
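The interplay between the field copy, the moved cell and field access can be sketched in Rust as below. This is a hypothetical model for illustration only: the names Heap, new_struct, move_struct and read_field are not taken from KRust, struct instances are identified by plain integer locations, and the type and mutability side conditions of the [Ownership-Move] rule are omitted.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the struct fields in store and the moved cell.
#[derive(Default)]
struct Heap {
    fields: HashMap<(usize, String), i32>, // (struct location, field name) -> value
    moved: HashMap<usize, bool>,           // struct location -> has it been moved?
}

impl Heap {
    fn new_struct(&mut self, loc: usize, fields: &[(&str, i32)]) {
        for (name, value) in fields {
            self.fields.insert((loc, (*name).to_string()), *value);
        }
        self.moved.insert(loc, false);
    }

    /// Copies every field from `src` to `dst`, then marks `src` as moved.
    fn move_struct(&mut self, src: usize, dst: usize) -> bool {
        if self.moved.get(&src) != Some(&false) {
            return false; // unknown source, or source already moved
        }
        let copied: Vec<(String, i32)> = self
            .fields
            .iter()
            .filter(|((loc, _), _)| *loc == src)
            .map(|((_, name), value)| (name.clone(), *value))
            .collect();
        for (name, value) in copied {
            self.fields.insert((dst, name), value);
        }
        self.moved.insert(src, true);
        self.moved.insert(dst, false);
        true
    }

    /// Field evaluation requires that the struct has not been moved.
    fn read_field(&self, loc: usize, field: &str) -> Option<i32> {
        if self.moved.get(&loc) != Some(&false) {
            return None;
        }
        self.fields.get(&(loc, field.to_string())).copied()
    }
}
```

After a move, the fields are readable through the new owner only; reading through the old location, or moving it a second time, is rejected.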

IV Testing and Applications

In this section, we validate our semantics and show some potential applications of KRust.

IV-A Conformance Testing

Following previous work [7, 11, 8, 13], which used test suites to validate executable language semantics, we validated our semantics by testing the interpreter (automatically generated from the semantics by the K framework) against both the official test suite of Rust (https://github.com/rust-lang/rust/tree/master/src/test) and hand-crafted tests.

The official test suite of Rust is used to test the Rust compiler. It is already split into folders containing different categories of tests. We chose tests from the “run-pass” folder, as the others were designed for different purposes such as checking error messages. There are 3119 tests in the “run-pass” folder: some of them can be compiled by the nightly version or the stable version of the compiler, and some may be ignored during compiler testing, but it is unclear from the official documents how these tests were used. Therefore, we parsed all 3119 tests, of which 195 are supported by our syntax. Of these 195 tests, 38 either do not have a main function or call standard library functions. We therefore chose the other 157 tests.

Because these 157 tests might not cover all the supported constructs, we hand-crafted 25 tests according to the syntax defined in Table I. Together, the 25 tests cover all the primitive constructs.

We have tested the interpreter against all these 182 tests. KRust successfully parsed all of them, and the results produced by the interpreter are the same as those produced by the programs compiled with the Rust compiler.

We remark that the semantic coverage of the test suite has not been well studied; we leave this to future work.

IV-B Applications

One of the main goals of our work is to provide a formal semantics for Rust. Beyond just giving a formal reference for the defined language, there are many applications of our formal semantics using the language-independent tools provided by K. In this work, we demonstrate this by showing two applications, debugging and verification, which are automatically derived from the semantics KRust.

Debugging. We can turn the K debugger into a debugger for Rust which allows users to inspect program states. We demonstrate this by the following example.

1  fn main() {
2      let mut x: i32 = 10;
3      while x > 0 {
4          x = x - 1;
5      }
6  }

We can debug this program using the following command.

krun test.rs --debugger

Users can step through one or more semantic rules individually from the current point and print the current state. For instance, after executing Line 4 once, part of the state will look like below:

<env> x |-> 1 </env>
<typeEnv> 1 |-> i32 </typeEnv>
<mutType> 1 |-> 1 </mutType>
<store> 1 |-> 9 </store>

Verification. A sound-by-construction program verifier for Rust can be automatically derived from KRust without additional effort. The verifier allows us to automatically check reachability properties, including all Hoare-style functional correctness claims and the time complexity of a computation. As an example, we will verify the time complexity of the subtraction-based Euclidean algorithm, which computes the greatest common divisor.

fn gcd(a: i32, b: i32) -> i32 {
    if a != b {
        if a > b { return gcd(a - b, b); }
        else     { return gcd(a, b - a); }
    } else { return a; }
}

The Euclidean algorithm is implemented in Rust as shown above. We will prove that its time complexity is indeed O(max(a, b)). To verify the time complexity, an extra cell time is added to the configuration, which holds a counter that is increased each time gcd is called. The core part of the specification to be proved is:

gcd(X:Int, Y:Int)
<time> T1 => T2 </time>
requires X > 0 , Y > 0
ensures T2 - T1 <= maxInt(X,Y)

where T1 and T2 respectively denote the value of the counter in the pre-state and post-state of the function, requires and ensures respectively denote the pre-condition and post-condition, T2 - T1 denotes the number of calls to gcd, and maxInt is a built-in function which returns the larger of two integers. The verifier outputs true, which proves that T2 - T1 <= maxInt(X,Y) holds, i.e., the time complexity is O(max(a, b)).
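The bound stated by the specification can also be checked empirically on concrete inputs. The sketch below instruments the same subtraction-based algorithm with a call counter that plays the role of the time cell; gcd_counted and bound_holds are hypothetical names, and running such a check over a range of inputs is a test, not a proof like the one produced by the verifier.

```rust
/// Subtraction-based gcd, with a call counter standing in for the `time` cell.
fn gcd_counted(a: i32, b: i32, calls: &mut u32) -> i32 {
    *calls += 1; // one tick per call to gcd
    if a != b {
        if a > b {
            gcd_counted(a - b, b, calls)
        } else {
            gcd_counted(a, b - a, calls)
        }
    } else {
        a
    }
}

/// Checks `T2 - T1 <= maxInt(X, Y)` for one pair of positive inputs.
fn bound_holds(x: i32, y: i32) -> bool {
    assert!(x > 0 && y > 0, "the specification requires positive inputs");
    let mut calls = 0;
    gcd_counted(x, y, &mut calls);
    calls <= x.max(y) as u32
}
```

The bound is plausible because every recursive call strictly decreases the larger of the two arguments, so at most max(a, b) calls can occur before the arguments become equal.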

V Related work

A multitude of formal semantics for real programming languages have been proposed in the literature. Due to space restrictions, we only discuss large semantics defined in K and other works related to Rust.

V-A Other formal semantics in K

Ellison and Rosu defined an executable formal semantics for C11, which has been extensively tested against the GCC torture suite [7] and evaluated by debugging, monitoring, and (LTL) model checking of C programs using the built-in capabilities of K. Hathhorn et al. formalized undefined behavior in C11 [14], complementing the semantics of [7].

Filaretti and Maffeis defined a formal semantics for PHP [11]. As there is no official language standard for PHP, they had to rely heavily on testing against a test suite. Their semantics has been evaluated by model checking certain properties of some programs.

Bogdanas and Rosu [8] gave a formal semantics for Java 1.4, which is split into two phases: a static semantics and a dynamic semantics. The static semantics enriches the original Java program by annotating it with statically inferred information, while the dynamic semantics gives the executable semantics. Their semantics has been evaluated by model checking multi-threaded programs.

Park et al. [13] presented a formal semantics for JavaScript, which has been tested against the ECMAScript 5.1 conformance test suite and passes all core language tests.

Besides the above languages, formal semantics of Python 3.3, Verilog, Scheme, LLVM IR and Esolang have also been defined in K [10, 15, 16, 17, 18].

Compared to other language semantics in K, the most distinctive aspect of our semantics is the formalization of the features specific to Rust, namely ownership, borrowing and lifetimes. To address these features, we had to redesign the program state instead of reusing the one from existing works. Our semantics has been validated using the Rust standard test suite as well as hand-crafted tests.

V-B Works related to Rust

Rust has attracted the attention of researchers, and there are several works on the Rust type system and on Rust program verification.

Reed presented a formal semantics for Rust that captures the key features relevant to memory safety, unique pointers and borrowed references, described the operation of the Borrow Checker, and formalized the Rust type system [19]. The main goal of that work is to provide a framework for borrow checking. However, Rust has evolved considerably since then, and the semantics has not been implemented, hence it is not executable. Very recently, Jung et al. defined a formal semantics and type system of a language λRust, which incorporates Rust’s notions [20]. Their work has been implemented in Coq. Its main goal is to study Rust’s ownership discipline in the presence of unsafe code. However, λRust is close to Rust’s Mid-level Intermediate Representation (MIR) rather than the actual Rust language, and their semantics defines more behaviors than Rust does. Our semantics addresses the exact behavior of the actual Rust language and is executable.

Dewey et al. proposed a technique to fuzz type checker implementations and applied it to test Rust’s type checker [21], identifying 18 bugs. This shows that formalizing the Rust type system is important, and formalizing the Rust semantics is a first step toward it.

Two model checkers for verifying unsafe Rust programs have been proposed [22, 23]; they verify Rust programs by translating them into the input languages of existing model checkers. In contrast, our semantics can be automatically turned into formal analysis tools, such as a state-space explorer for reachability, a model checker and a symbolic execution engine, using the language-independent tools provided by K [9].

VI Conclusion, Limitations and Future Work

In this work, we proposed KRust, a formal executable semantics for Rust defined in the K framework. KRust captures (1) all the primitive types and their operations, (2) compound data types: struct, array and vector, (3) all the basic control flow constructs: for, while, loop, if, if/else, function definition/call and (4) the three most distinctive and important features: ownership, borrowing and lifetimes. We tested KRust on many hand-crafted tests and on Rust’s official test suite; all tests within the supported syntax pass. We also demonstrated potential applications of KRust for debugging and verification of Rust programs.

However, KRust does not cover all features of Rust, as Rust is being actively developed and some of its features are not yet stable. For instance, although the Rust community provides some of the syntax in EBNF [12], this grammar is still far from complete. This makes the formalization of Rust much more difficult, as mentioned in [20]. The following is a list of features that we have not implemented yet but plan to implement: (1) structs with reference fields, (2) pattern matching, which can be seen as a generalization of switch-case, (3) trait objects, which are like interfaces in Java, (4) lifetime annotations, which are used to mark explicit lifetimes in functions or structs, (5) complex closures, which use outside variables, (6) concurrency for writing multi-threaded programs, (7) crates and modules, which are used to call external library code, (8) unsafe, which is used to write code whose safety the Rust compiler is unable to prove, etc. A long-term goal is to develop an almost complete formal executable semantics for Rust and to formally verify Rust programs using formal analysis tools derived from the semantics; the work reported in this paper is the first cornerstone toward that goal.
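To make the list above concrete, the following short Rust sketch exercises features (1) to (5). It is ordinary Rust written for illustration and is not taken from the KRust test suite; all names (`Holder`, `Shape`, `Square`, `describe`, `demo`) are hypothetical.

```rust
// (1) and (4): a struct with a reference field needs an explicit lifetime annotation.
struct Holder<'a> {
    value: &'a i32,
}

// (3): a trait that will be used through a trait object.
trait Shape {
    fn area(&self) -> i32;
}

struct Square(i32);

impl Shape for Square {
    fn area(&self) -> i32 {
        self.0 * self.0
    }
}

// (2): pattern matching with a guard and a wildcard arm.
fn describe(n: i32) -> &'static str {
    match n {
        0 => "zero",
        x if x < 0 => "negative",
        _ => "positive",
    }
}

fn demo() -> (i32, i32, &'static str, i32) {
    let n = 7;
    let holder = Holder { value: &n };
    // Dynamic dispatch through a boxed trait object.
    let shape: Box<dyn Shape> = Box::new(Square(3));
    let offset = 10;
    // (5): a closure that captures the outside variable `offset`.
    let add_offset = |v: i32| v + offset;
    (*holder.value, shape.area(), describe(-2), add_offset(1))
}

fn main() {
    println!("{:?}", demo());
}
```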

References

  • [1] N. D. Matsakis and F. S. Klock II, “The Rust language,” in ACM SIGAda Ada Letters, vol. 34, no. 3, 2014, pp. 103–104.
  • [2] Redox, “Redox: a unix-like operating system written in rust,” https://www.redox-os.org, 2018.
  • [3] B. Anderson, L. Bergstrom, M. Goregaokar, J. Matthews, K. McAllister, J. Moffitt, and S. Sapin, “Engineering the servo web browser engine using rust,” in ICSE’16, 2016, pp. 81–89.
  • [4] Y. Ding, R. Duan, L. Li, Y. Cheng, Y. Zhang, T. Chen, T. Wei, and H. Wang, “POSTER: rust SGX SDK: towards memory safety in intel SGX enclave,” in CCS’17, 2017, pp. 2491–2493.
  • [5] Stackoverflow, “https://stackoverflow.com/search?q=rust+,” 2018.
  • [6] G. Roşu and T. F. Şerbănuţă, “An overview of the K semantic framework,” The Journal of Logic and Algebraic Programming, vol. 79, no. 6, pp. 397–434, 2010.
  • [7] C. Ellison and G. Rosu, “An executable formal semantics of C with applications,” in POPL’12, 2012, pp. 533–544.
  • [8] D. Bogdanas and G. Roşu, “K-Java: a complete semantics of Java,” in POPL’15, 2015, pp. 445–456.
  • [9] A. Ştefănescu, D. Park, S. Yuwen, Y. Li, and G. Roşu, “Semantics-based program verifiers for all languages,” in OOPSLA’16. ACM, 2016, pp. 74–91.
  • [10] D. Guth, “A formal semantics of Python 3.3,” Master’s thesis, University of Illinois at Urbana-Champaign, 2013.
  • [11] D. Filaretti and S. Maffeis, “An executable formal semantics of PHP,” in ECOOP’14, 2014, pp. 567–592.
  • [12] https://doc.rust-lang.org/grammar.html.
  • [13] D. Park, A. Stefanescu, and G. Rosu, “KJS: a complete formal semantics of javascript,” in PLDI’15, 2015, pp. 346–356.
  • [14] C. Hathhorn, C. Ellison, and G. Rosu, “Defining the undefinedness of C,” in PLDI’15, 2015, pp. 336–345.
  • [15] P. O. Meredith, M. Katelman, J. Meseguer, and G. Rosu, “A formal executable semantics of Verilog,” in MEMOCODE’10, 2010, pp. 179–188.
  • [16] P. Meredith, M. Hills, and G. Rosu, “A K definition of Scheme,” University of Illinois at Urbana-Champaign, Tech. Rep., 2007.
  • [17] LLVM IR in K, http://github.com/davidlazar/llvm-semantics.
  • [18] Esolang in K, http://esolang-semantics.googlecode.com.
  • [19] E. Reed, “Patina: A formalization of the rust programming language,” University of Washington, Tech. Rep., 2015.
  • [20] R. Jung, J. Jourdan, R. Krebbers, and D. Dreyer, “RustBelt: securing the foundations of the Rust programming language,” PACMPL, vol. 2, no. POPL, pp. 66:1–66:34, 2018.
  • [21] K. Dewey, J. Roesch, and B. Hardekopf, “Fuzzing the Rust typechecker using CLP (T),” in ASE’15, 2015, pp. 482–493.
  • [22] J. Toman, S. Pernsteiner, and E. Torlak, “Crust: A bounded verifier for Rust,” in ASE’15, 2015, pp. 75–80.
  • [23] F. Hahn, “Rust2viper: Building a static verifier for rust,” Master’s thesis, ETH Zürich, 2016.