K-Rust: An Executable Formal Semantics for Rust

04/17/2018 ∙ by Shuanglong Kan, et al. ∙ Nanyang Technological University

Rust is a systems programming language designed to provide better memory safety while maintaining performance. Formalizing Rust is a necessary step toward proving its memory safety and constructing formal analysis tools for Rust. In this paper, we introduce an executable formal semantics of Rust using the K-Framework (K), called K-Rust. K-Rust includes two parts: (1) the formal model of the ownership system, which is one of Rust's most compelling features for realizing its memory safety and zero-cost abstraction; (2) the formal operational semantics of Rust based on a core-language. The formal models are tested against various programs and compared with Rust's compiler to ensure semantic consistency between K-Rust and the compiler. Through the construction of K-Rust we detected inconsistencies in the ownership mechanism between the Rust compiler and the specification in The Rust Programming Language.

1 Introduction

Rust [1] is a systems programming language designed for highly safe systems. Its features emphasize memory safety without losing performance. It fulfills this goal by exploiting the ownership type system, which ensures that any Rust program satisfies “No mutation by aliased pointers”. This prevents various memory-unsafety problems, such as dangling pointers and data races. In addition, in Rust’s ownership system, only the owner of a resource is in charge of deallocating the memory. Therefore memory deallocation can be decided at compile time. This helps to avoid memory leaks without garbage collection, and thus yields high performance (footnote 1: garbage collection, as in Java, decides when to deallocate memory at runtime instead of at compile time).

The complexity of Rust’s ownership and borrowing mechanisms makes Rust compilers prone to bugs that may compromise memory safety. A formal semantics of Rust is therefore necessary, not only to reason about Rust programs, but also to check the correctness of the safety mechanisms in the language and in the implementations of Rust compilers. Some pioneering work on laying a formal foundation for Rust has been done. Reed [2] provides a formal model of Rust that includes an ownership model and a memory model, and proves the memory safety of Rust ensured by its ownership system. However, the models and proofs are not mechanized. Another work is Rustbelt by Ralf Jung et al. [3], which provides a formal semantics of Rust using Coq [4] and Iris [5]. The semantics models the type system and operational semantics of a subset of Rust. The memory safety mechanisms of Rust in this model are proven with machine-checked proofs. Despite this solid achievement, the Rustbelt semantics is not executable and no evaluation is provided. These two models concentrate on the abstract level of Rust semantics. Toman et al. [14] construct a bounded verifier for Rust’s unsafe library, called CRust, by translating Rust code to C code, which is then verified by the CBMC model checker [15]. Florian Hahn et al. [16] build a verifier for Rust by translating Rust to the intermediate language of Viper [17], which is a verification infrastructure for permission-based reasoning.

To the best of our knowledge, K-Rust is the first formal executable semantics for Rust. K-Rust has been formalized in the K-framework (K) [6], a rewriting-logic-based framework for defining executable formal semantics. The semantics of various programming languages have been defined using K, such as Java [7], C [8, 9], and JavaScript [10]. K backends, such as the Isabelle theory generator, the model checker, and the deductive verifier, can help us to prove properties of the semantics and construct verification tools for Rust.

The K-Rust semantics covers all the safe constructs in Rust’s ownership system and the totality of the Rust type system; only Rust’s unsafe constructs are not covered by this work. The formalization of K-Rust is organized into three levels: the memory level, the core-language level, and the surface-Rust level. At the memory level, we present a memory model together with memory operations. Based on this model, we formalize the operational semantics of a core-language, which is a functional programming language together with memory operations. Surface-Rust programs can be translated into the core-language. The type system is formalized at the surface-Rust level. Together, these levels work like a compiler: a surface-Rust program is checked against K-Rust’s type system. If the program is correct, it is translated into the core-language level, where it can be executed according to the operational semantics. The semantics is composed of about 300 rewriting rules (footnote 2: for space reasons we do not show them all; the semantics and tests can be obtained at securify.sce.ntu.edu.sg/SoftVer/KRUST/). We test our semantics by executing 50 programs in the K environment and comparing the final state with the results of the Rust execution environment. This led to finding inconsistencies between the Rust compiler and the rules described in [11].

Compared with related work, our semantics is executable and more detailed, strictly following the Rust specification and the compiler implementation for the borrowing and ownership mechanisms. Firstly, programs can be executed within the K environment using the K-Rust formal semantics. Secondly, we closely follow the semantics of the ownership system in Rust’s compiler, which is stronger than the one in [2, 3]; we are therefore able to execute and verify real-world Rust programs and to detect inconsistencies between the specification and the compiler. In [3], predicates are used to model the type system. This model is weaker than Rust’s compiler because of the gap between predicates and implementation, and it would lead to spurious executions. The compiler selects a stronger condition, which may reject some programs even if they satisfy the weaker condition. In other words, the ownership system in K-Rust and in the Rust compiler is a refinement of [2, 3]. This is relevant because our work is the first step toward proving this refinement relation and proving memory safety of the ownership mechanisms in Rust and their implementation in the compiler. In addition, the Rust compiler is developed using Rust itself, so our formal semantics and the tools derived from it can be used in the verification of the compiler. Indeed, while constructing K-Rust we detected that the Rust compiler is not consistent with the specification of the ownership mechanism described in [11]: the rules for ownership state that a resource cannot have a mutable and an immutable borrow simultaneously, yet the compiler allows this, using additional mechanisms to prevent any possible data race. Our work focuses on the safe constructs of Rust, which are not addressed by the work in [14]. Unlike the work in [16], we consider type-checking semantics and provide a complete formal semantics of the safe constructs of Rust rather than a mapping between languages.

The paper is organized as follows: Section 2 gives the background, Section 3 introduces the memory and core-language level semantics, Section 4 presents the ownership system, and Sections 5 and 6 present the evaluation and conclusions.

2 Background

In this section, we give a brief description of Rust’s ownership system, the mechanism in charge of providing memory safety in Rust, and of the basic notions and notations of K.

2.1 Rust ownership system

Rust’s ownership system consists of three parts [11]: (1) ownership, (2) borrowing, and (3) lifetimes.

Ownership. The principles of ownership include: (1) a variable binding in Rust has ownership of what it is bound to. For instance, “let v = vec![1,2]” declares a binding such that v is the owner of the vector allocated in the heap. The owner of a resource is responsible for deallocating it, i.e., when v goes out of scope the vector will be deallocated. (2) Rust ensures that there is exactly one binding to any given resource. Therefore the binding “let v’ = v” transfers the ownership from v to v’, and v is set to uninitialized.
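The transfer of ownership described above can be observed directly in Rust. The following is a minimal sketch; the function name `move_demo` and the binding name `v2` (standing in for v’) are illustrative choices, not part of K-Rust.

```rust
/// Creates a vector, moves its ownership to a second binding, and returns it.
fn move_demo() -> Vec<i32> {
    let v = vec![1, 2]; // `v` owns the heap-allocated vector
    let v2 = v;         // ownership moves from `v` to `v2`; `v` is now uninitialized
    // println!("{:?}", v); // would be rejected by the compiler: use of a moved value
    v2                  // the vector is deallocated when its final owner goes out of scope
}
```

Uncommenting the `println!` line makes the compiler reject the program, which is the compile-time effect of principle (2).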

Borrowing. Borrowings are also known as references, which create aliases to resources. A variable that borrows a resource from an owner can read or write the resource, but cannot deallocate it. Therefore, no borrowing can live longer than the owner that it was borrowed from. Two further principles are that (1) one or more immutable borrows of a resource may be shared and (2) exactly one mutable borrow is allowed per resource (footnote 3: immutable borrows can only read the resource, whereas mutable borrows can both read and write it). These two rules ensure “No mutation by aliased pointers”, which helps to rule out dangling pointers and data races in Rust.
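As a small illustration of the two borrowing principles, the sketch below takes two shared immutable borrows and then a single mutable borrow of the same variable. The function name `borrow_demo` is an assumption made for this example.

```rust
/// Shows shared immutable borrows followed by one exclusive mutable borrow.
fn borrow_demo() -> (i32, i32) {
    let mut x = 10;
    let sum = {
        let a = &x; // first immutable borrow
        let b = &x; // second immutable borrow, allowed alongside the first
        *a + *b
    }; // both immutable borrows end here
    {
        let m = &mut x; // exactly one mutable borrow at a time
        *m += 1;
    } // the mutable borrow ends here
    (sum, x)
}
```

The borrows are placed in inner blocks so that each ends before the next kind of borrow starts.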

Lifetimes. Lifetimes define the scope in which a variable is alive. For instance, the code { let x = E; { let y = x; } } creates two lifetimes, delimited by the paired “{” and “}”: x is in the first lifetime, y is in the second lifetime, and x lives longer than y since the second lifetime is nested in the first one.

2.2 K-framework

K-framework is a rewriting-logic-based semantics definition framework. A K model consists of three parts: configurations, computations, and rules. Configurations represent the states of programs, which can be used to store execution environments, function call stacks, and heaps, among other structures. Configurations in K are denoted as nested cells. The content of each cell can be another cell or a basic type, such as a list or a map. For instance, it is possible to model a thread in K as follows, where each cell is written as ⟨ content ⟩ followed by the cell name:

⟨ ⟨ Program ⟩k ⟨ x ↦ 1, y ↦ 2, ... ⟩env ⟨ .List ⟩stack ⟩

The outer cell is composed of three cells: the k cell stores the program to be executed, the env cell stores the map from variables to values, and the stack cell is a function call stack, which is modeled as a List. Here, the content of env is a map, and the content of stack is a list. Types prefixed with “.” mean empty structures. For example, .Map is an empty map.

Computations sequentialize Abstract Syntax Trees (ASTs) into a list of computation tasks. For instance, an assignment x := y + 1 can be sequentialized into tasks, separated by the operator “↷”, as y ↷ val(y) + 1 ↷ x := val(y+1). It means that in order to compute x := y+1, we need to: (1) compute the value of y, (2) use the value of y to compute the value of y+1, (3) assign the value of y+1 to x.

Rules are a set of rewriting rules triggering actions on configurations, i.e., state transitions. K rules make explicit which parts of a configuration they read, write, or do not care about. The rewriting is represented as two terms separated by a horizontal line. The term above the line is rewritten to the term below the line when the rule is triggered. A rule is triggered when the parts it reads match the current configuration.

The following are two rules for illustrating the rewriting.

(Rule 1) ⟨ x := y ⇒ y ↷ x := HOLE ... ⟩k

(Rule 2) ⟨ x := 1 ⇒ .K ... ⟩k ⟨ ... x ↦ (_ ⇒ 1) ... ⟩env

The k cell in K-framework represents the default cell for storing computation sequences, i.e. a sequence separated by ↷. Rule 1 rewrites the computation x:=y, which is on the top of k, into two actions: the evaluation of y and the assignment of the value of y to x. When the value of y is obtained in the first task, K puts the value in HOLE for the second task. HOLE is a K built-in placeholder. The symbol “...” denotes the part we do not care about.

Rule 2 is a statement that assigns 1 to x. In the k cell, the assignment is rewritten to an empty item .K, i.e., the assignment is consumed. In env, the value of x is rewritten to 1. The symbol “_” denotes any value. It means that after executing the assignment, the value of x is rewritten to 1. Note that we use two styles to denote rewriting. The first one is a horizontal line between two terms. The second one is the symbol “⇒”, denoting that the action rewrites a configuration satisfying the left side of “⇒” with the right side of “⇒”.
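To make the interplay of configurations, computations, and rules more concrete, the following Rust sketch mimics a configuration with a k cell and an env cell and applies rewriting steps in the spirit of Rule 1 and Rule 2. It is only an analogy under stated assumptions: the names `Task`, `Config`, `step`, and `run` are hypothetical, real K performs matching and HOLE plugging generically, and this sketch inserts x into env even when it is unbound, whereas Rule 2 matches an existing entry.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical computation tasks stored in the k cell.
#[derive(Debug, Clone, PartialEq)]
enum Task {
    AssignVar(String, String), // x := y
    Lookup(String),            // evaluate the variable y
    Value(i64),                // an already computed value
    AssignHole(String),        // x := HOLE
    AssignVal(String, i64),    // x := 1
}

/// A configuration with two cells: the computation sequence and the environment.
#[derive(Debug, Default)]
struct Config {
    k: VecDeque<Task>,
    env: HashMap<String, i64>,
}

/// Applies one rewriting step to the top of k; returns false if no rule matches.
fn step(cfg: &mut Config) -> bool {
    let top = match cfg.k.pop_front() {
        Some(task) => task,
        None => return false,
    };
    match top {
        // In the spirit of Rule 1: x := y  =>  y ~> x := HOLE
        Task::AssignVar(x, y) => {
            cfg.k.push_front(Task::AssignHole(x));
            cfg.k.push_front(Task::Lookup(y));
            true
        }
        Task::Lookup(y) => {
            let found = cfg.env.get(&y).copied();
            match found {
                Some(v) => {
                    cfg.k.push_front(Task::Value(v));
                    true
                }
                None => {
                    cfg.k.push_front(Task::Lookup(y)); // stuck: unbound variable
                    false
                }
            }
        }
        // Plug a computed value into the HOLE of the next task.
        Task::Value(v) => match cfg.k.pop_front() {
            Some(Task::AssignHole(x)) => {
                cfg.k.push_front(Task::AssignVal(x, v));
                true
            }
            other => {
                if let Some(task) = other {
                    cfg.k.push_front(task);
                }
                cfg.k.push_front(Task::Value(v));
                false
            }
        },
        // In the spirit of Rule 2: the assignment is consumed and env is updated.
        Task::AssignVal(x, v) => {
            cfg.env.insert(x, v);
            true
        }
        Task::AssignHole(x) => {
            cfg.k.push_front(Task::AssignHole(x)); // stuck: nothing to plug in
            false
        }
    }
}

/// Rewrites until no rule applies.
fn run(cfg: &mut Config) {
    while step(cfg) {}
}
```

Starting from a k cell holding the single task x := y and an env with y bound to 1, the first step yields the sequence y ↷ x := HOLE, and running to completion empties k and binds x to 1.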

K provides a number of user tools that help users work with the framework. In particular, the command krun executes programs against a compiled semantics specified in K (compilation of the semantics is carried out using another command, kompile). krun outputs configuration transitions from an initial configuration in the semantics to a final configuration.

3 Operational Semantics of Core-language

In this section, we introduce the operational semantics of the core-language. Since the memory model is crucial for the semantics and the operational semantics is built on it, we present the memory model first.

3.1 Memory Model

The memory model stores the values of declared variables in blocks containing a number of units of primitive data types; the number of units depends on the base type of the variable. The use of blocks makes it possible to create compound datatypes such as arrays and structs. Additionally, the memory model records the status of memory addresses to keep track of the ownership of variables. A detailed explanation of the semantics of the modelled datatypes and their syntax can be found in Section 4. The configuration for the memory model in K-Rust is represented by:

⟨ ⟨ 0:Int ⟩ ⟨ .Map ⟩ ⟨ ⟨ .K ⟩ ⟨ 0:Int ⟩ ⟨ .Map ⟩ ⟩* ⟩

The outermost cell contains all the elements of the memory layout. Its first inner cell denotes the number of allocated blocks. Its second inner cell is a map from memory addresses to their corresponding statuses. A memory status is a pair of integers memstatus(r, w), where r is the number of operations reading the memory block and w is the number of operations writing the block. The third inner cell is a block, modeled as a triple (a, n, m), where a is the memory address of the block, n is the number of units in the block, and m is the map from the indexes of units to their corresponding values. The cell marked with “*” has multiplicity, i.e., we can create multiple blocks.

This memory model gives a uniform representation for different types, which helps to model the operational semantics (Appendix 0.A illustrates the uniform memory representations of four different types). It is also easy to append values to the end of a block, which will help to model vectors in Rust. The memory operations include: (1) allocating a block, (2) memory reading and writing atomically or non-atomically, (3) appending a block, (4) freeing a block, and (5) compare and swap operations.

Memory Allocation. Allocation operations are responsible for allocating a block in memory for a variable. The term allocate(I:Int) allocates a block of size I in memory. Rule Allocate-Int is its corresponding K rule.

RULE Allocate-Int

⟨ allocate(I:Int) ⇒ createUnits(N,I) ... ⟩k ⟨ ... .Map ⇒ N ↦ memstatus(0,1) ... ⟩ ⟨ N:Int ⇒ N +Int 1 ⟩

⟨ ... .Bag ⇒ ⟨ ⟨ addr(N) ⟩ ⟨ I ⟩ ⟨ .Map ⟩ ⟩ ... ⟩

The integer N in the block-counter cell is consumed as the memory address of the allocated block, and its value increases by one for the next allocation. allocate(I:Int) is rewritten to createUnits(N,I) to create all units in the block. In the status map, N is mapped to memstatus(0,1), i.e. the block is being written and no one else has access to it. The last cell creates a new block from an empty Bag (footnote 4: Bags are multisets). The address of the new block is addr(N) and the number of units is I. The units of the block will be created by the term createUnits. Atomic operations are straightforward; we therefore focus exclusively on non-atomic operations, omitting Compare and Swap (CAS) and atomic reading and writing operations.
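The effect of Rule Allocate-Int can be approximated by the following Rust sketch. All type and field names here (`Memory`, `MemStatus`, `Block`, `next_addr`, and so on) are hypothetical stand-ins for the K cells, and the units of the new block are left empty, as they are in the rule before createUnits runs.

```rust
use std::collections::HashMap;

/// Counterpart of memstatus(r, w): number of readers and writers of a block.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MemStatus {
    readers: u32,
    writers: u32,
}

/// A block: its number of units and a map from unit indexes to values.
#[derive(Debug)]
struct Block {
    size: usize,
    units: HashMap<usize, i64>,
}

/// Hypothetical counterpart of the memory configuration.
#[derive(Debug, Default)]
struct Memory {
    next_addr: usize,                  // number of allocated blocks
    status: HashMap<usize, MemStatus>, // address -> status
    blocks: HashMap<usize, Block>,     // address -> block
}

impl Memory {
    /// Mirrors allocate(I): consumes the counter as the new address,
    /// marks the block as being written, and creates an empty block.
    fn allocate(&mut self, size: usize) -> usize {
        let addr = self.next_addr;
        self.next_addr += 1;
        self.status.insert(addr, MemStatus { readers: 0, writers: 1 });
        self.blocks.insert(addr, Block { size, units: HashMap::new() });
        addr
    }
}
```

Two successive allocations receive the addresses 0 and 1, which reflects how the counter doubles as a fresh-address generator.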

Non-Atomic Memory Reading. Non-atomic readings consist of two steps, readna and readnac, specified by Rules Non-Atomic-Read and Non-Atomic-Read-Finish, respectively. The first step, Rule Non-Atomic-Read, increases the number of reading operations in the memory status of the block by 1 and rewrites readna to readnac. The second step, Rule Non-Atomic-Read-Finish, reads the value at the address N with the offset I and decreases the number of reading operations by 1. The two steps indicate that the reading is non-atomic.

Writing operations are similar to reading operations, except that they update units in a block. Appending operations increase the size of a block, and Free operations deallocate a block so that the block cannot be used again. All these rules can be found in our code.

RULE Non-Atomic-Read

⟨ readna(addr(N:Int),I:Int) ⇒ readnac(addr(N),I) ... ⟩k ⟨ ... N ↦ memstatus(K:Int ⇒ K +Int 1, _) ... ⟩

RULE Non-Atomic-Read-Finish

⟨ readnac(addr(N:Int),I:Int) ⇒ V ... ⟩k ⟨ addr(N) ⟩ ⟨ ... I ↦ V:Value ... ⟩ ⟨ ... N ↦ memstatus(K:Int ⇒ K -Int 1, _) ... ⟩
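The two-step structure of a non-atomic read can be sketched in Rust as follows. This is a simplified illustration, not the K-Rust implementation: the names `Memory`, `PendingRead`, `read_na_begin`, and `read_na_finish` are hypothetical, and the memory is flattened into a map keyed by (block address, offset).

```rust
use std::collections::HashMap;

/// Simplified memory: reader counts per block and values per (address, offset).
#[derive(Debug, Default)]
struct Memory {
    readers: HashMap<usize, u32>,
    units: HashMap<(usize, usize), i64>,
}

/// Plays the role of the intermediate term readnac(addr(N), I).
#[derive(Debug, PartialEq)]
struct PendingRead {
    addr: usize,
    offset: usize,
}

impl Memory {
    /// Step 1 (readna): register the reader and return the pending read.
    fn read_na_begin(&mut self, addr: usize, offset: usize) -> PendingRead {
        *self.readers.entry(addr).or_insert(0) += 1;
        PendingRead { addr, offset }
    }

    /// Step 2 (readnac): fetch the value and unregister the reader.
    /// Returns None, leaving the reader registered, if the unit does not exist.
    fn read_na_finish(&mut self, pending: PendingRead) -> Option<i64> {
        let value = self.units.get(&(pending.addr, pending.offset)).copied()?;
        let count = self.readers.get_mut(&pending.addr)?;
        *count = count.saturating_sub(1);
        Some(value)
    }
}
```

Between the two steps the reader count of the block is non-zero, which is the window in which another operation on the same block can be observed as concurrent.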

3.2 Operational Semantics

The core-language is a pure functional language expressive enough to capture the behaviour of Rust constructs. Fig. 1 shows a subset of the syntax of the core-language, selected to illustrate the operational semantics. It shows the basic computation structures in the core-language: variables, dereferences, arithmetic expressions, branches, function definitions and calls, assignments, and creation of new threads. At this level there is no notion of variable declarations. New variables are introduced by function arguments, and the environment keeps the values passed to functions when they are invoked. Function arguments can be addresses in the global memory, which is accessed through the rules defined in Section 3.1. The configuration for the operational semantics is described by:

⟨ ⟨ ⟨ ⟨ $PGM:Exp ⟩k ⟨ Map ⟩env ⟨ List ⟩stack ⟩ ⟩ ⟨ ⟨ Int ⟩ ⟨ Map ⟩ ⟨ ⟨ Int ⟩ ⟨ Map ⟩ ⟨ FnParams ⟩ ⟨ K ⟩ ⟩ ⟩ ⟨ Mem ⟩ ⟩

The configuration for the core-language includes the memory configuration together with: (1) a threads cell, which defines the configuration of program threads, and (2) a closures cell, which defines the configuration of functions. The threads cell is a multiset of thread configurations. Each thread consists of (1) the code to be executed (wrapped in the k cell); (2) the environment cell, which maps the variables passed to functions to their corresponding values, which can be of any type defined by the type system; (3) the stack cell, which stores the environments in the function call stack and is used to recover the environment when returning from a function call.

Figure 1: The syntax of core language

Functions. In K-Rust, a function is viewed as an expression and its value is a closure that can be evaluated. A closure represents a functional value. It consists of four cells, which store: (1) the unique identifier of the closure; (2) the map from variables to their values (the context), which will be used in the computation of the function body; (3) the parameters of the closure; (4) the function body. There are two additional cells in the closure configuration: (1) a counter that stores the number of created closures; (2) a map from function names to the corresponding closure identifiers.

Rule Function-Definition shows the creation of a closure from a function definition. It creates a new integer C as the identifier of the closure and stores the map from the function name F to C. It creates a new closure in the closure configuration. The context, parameters, and function body are copied to the corresponding cells of the new closure.

RULE Function-Definition

⟨ fn F:Id (Ps:FnParams) {E:Exp} ⇒ cr(C) ... ⟩k ⟨ C ⇒ C +Int 1 ⟩ ⟨ Rho ⟩env ⟨ ... .Map ⇒ F ↦ C ... ⟩ ⟨ ... .Bag ⇒ ⟨ ⟨ C ⟩ ⟨ Rho ⟩ ⟨ Ps ⟩ ⟨ E ⟩ ⟩ ... ⟩

RULE Function-Call

⟨ cr(I:Int) (VL:Values) ⇒ fnCalls(I,P,VL) ... ⟩k ⟨ ⟨ I ⟩ ⟨ Rho:Map ⟩ ⟨ P:FnParams ⟩ ... ⟩ ⟨ PRho:Map ⇒ Rho ⟩env ⟨ .List ⇒ ListItem(PRho) ... ⟩stack

RULE Full-Application

⟨ fnCalls(I,.Values,.FnParams) ⇒ computeFunBody(B) ... ⟩k ⟨ ⟨ I ⟩ ⟨ B:Exp ⟩ ... ⟩

RULE Partial-Application

⟨ fnCalls(I,Ps,.Values) ⇒ cr(C) ... ⟩k ⟨ ⟨ I ⟩ ⟨ E ⟩ ... ⟩ ⟨ ListItem(Rho1) ⇒ .List ... ⟩stack ⟨ Rho ⇒ Rho1 ⟩env (.Bag ⇒ ⟨ ⟨ C ⟩ ⟨ Ps ⟩ ⟨ Rho ⟩ ⟨ E ⟩ ⟩) ⟨ C:Int ⇒ C +Int 1 ⟩

RULE Fork

⟨ fork{E:Exp} ⇒ .K ... ⟩k ⟨ R ⟩env ⟨ ... .Bag ⇒ ⟨ ⟨ E ⟩k ⟨ R ⟩env ⟨ .List ⟩stack ⟩ ... ⟩

The application of a closure to some arguments yields another closure. Rule Function-Call rewrites cr(I)(VL), where I is the identifier of a closure and VL is an argument list, to fnCalls. The term fnCalls is an intermediate term for binding arguments to parameters. The current environment is first pushed on top of the stack and then replaced with the context of the closure I. Rule Full-Application deals with the full application of a closure, i.e., all parameters have corresponding arguments, where .Values and .FnParams are empty structures. In this case, the term fnCalls is rewritten to the computation of the body, i.e., the expression B. The term computeFunBody stands for the computation of the body. Rule Partial-Application deals with the partial application of a closure. It creates a new closure, identified by a fresh integer C, from an empty bag. It requires Ps to be non-empty. After the new closure is created, the stack is popped to restore the current environment. In addition, we also have a tail function call, which avoids allocating space on the stack. It is designed to translate sequential computations into tail function calls.
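The distinction between full and partial application can be illustrated with a small Rust sketch. It is an analogy only: `Closure`, `Applied`, `apply`, and `add_body` are hypothetical names, values are restricted to integers, the body is an ordinary Rust function over the environment, and the environment stack handled by the K rules is not modeled.

```rust
use std::collections::HashMap;

type Env = HashMap<String, i64>;

/// Hypothetical counterpart of a closure: identifier, context, parameters, body.
struct Closure {
    id: usize,
    env: Env,
    params: Vec<String>,
    body: fn(&Env) -> i64,
}

/// Result of applying a closure to some arguments.
enum Applied {
    Value(i64),       // full application: the body was computed
    Partial(Closure), // partial application: a new closure awaits more arguments
}

/// Binds arguments to parameters; extra arguments are ignored in this sketch.
fn apply(closure: &Closure, args: &[i64], next_id: &mut usize) -> Applied {
    let mut env = closure.env.clone();
    for (param, arg) in closure.params.iter().zip(args) {
        env.insert(param.clone(), *arg);
    }
    let remaining: Vec<String> =
        closure.params.iter().skip(args.len()).cloned().collect();
    if remaining.is_empty() {
        Applied::Value((closure.body)(&env))
    } else {
        let id = *next_id; // fresh closure identifier
        *next_id += 1;
        Applied::Partial(Closure { id, env, params: remaining, body: closure.body })
    }
}

/// Example body: adds the values bound to `a` and `b`.
fn add_body(env: &Env) -> i64 {
    env["a"] + env["b"]
}
```

Applying a two-parameter closure to one argument returns a new closure with a fresh identifier and one remaining parameter; applying that closure to a second argument computes the body.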

The fork expression fork{E:Exp} creates a new thread executing the expression E. Rule Fork creates a new thread from an empty bag.

Dereferences and assignments are supported by the memory model rules. Dereferences read values from the memory. For instance, “* na x” non-atomically reads a value from the location of the variable x. Assume that, in the configuration, x points to the address location(A,I) in the environment cell. This means that x is a pointer to the memory address location(A,I), where A is the address of a block and I is the offset within the block. Therefore * na x is rewritten to the memory operation readna(addr(A),I). Assignments are of the form Exp := Order Exp, where Order indicates whether the writing is atomic or non-atomic. For instance, “x := na V” is rewritten to writena(location(A,C),V), where location(A,C) is the address of x.

4 Type System of K-Rust

In this section, we introduce the type system of K-Rust and its ownership system. The type system is defined on the surface-Rust language in K-Rust. Surface-Rust can be translated into the core-language. Surface-Rust extends the core-language with variable modifiers to carry out ownership checking. This includes mutabilities, bindings, and lifetimes, among others. These modifiers do not have effect on the execution of programs (operational semantics), but help to ensure type-checking correctness.

The types in K-Rust are defined in Fig. 2 and include scalar types such as i32 and bool, pointer types, and compound types. A pointer type can be a reference type or an own type. A compound type can be a product type, a sum type, or a function type (footnote 5: List in the definition of types is a built-in symbol in K; List{S,","} denotes a list of the symbol “S” separated by “,”). The reference type ref(L,M,T) has three elements, where L and T denote the lifetime and type of the owner that it borrows, respectively, and M indicates whether this borrow is mutable. The owner type own(T) represents the owner of a resource of the type T. The product type is a tuple or record. The lifetime list of a product type is only used in the declarations of reference types in its list of field types, which gives the types of all fields of the product type. Sum types are the dual of product types and are also known as discriminated unions. The three elements wrapped in the function type fnTy represent, from left to right, lifetime variables, parameter types, and the return type.

Figure 2: The Syntax of Types in Rust

4.1 Configuration of Type System

The configuration of the type system is shown in the following:

⟨ ⟨ $PGM:Rust ⟩k ⟨ ⟨ 0 ⟩ ⟨ .Map ⟩ ⟩ ⟨ .Map ⟩ ⟨ .List ⟩ ⟨ 0 ⟩ ⟨ .Map ⟩ ⟨ ⟨ ⟨ 0 ⟩ ⟨ 0 ⟩ ⟨ 0 ⟩ ⟨ .Map ⟩ ⟩* ⟨ 0 ⟩ ⟩ ⟩

The first cell (k) initially stores a surface-Rust program. The second cell (the variable context) consists of two nested cells. The first nested cell stores the number of variables created so far and is used to generate indexes that uniquely identify variables. Consider the following 3 bindings: let x = Box::new(1); let y = &x; let x = Box::new(2); There are two bindings to x, and the second binding shadows the first one. But the binding of y always points to the first x even though it is shadowed. Therefore we use a unique integer as the index of a variable to identify it. The second nested cell is a map from the indexes of variables to their corresponding type information.
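The shadowing behaviour that motivates unique variable indexes can be checked in plain Rust. The sketch below wraps the three bindings in a function, whose name `shadowing_demo` is chosen for this example, and returns the value seen through y together with the value of the second x.

```rust
/// Returns (value reached through y, value of the shadowing x).
fn shadowing_demo() -> (i32, i32) {
    let x = Box::new(1);
    let y = &x;          // y borrows the first x
    let x = Box::new(2); // shadows the first x, which stays alive until the scope ends
    (**y, *x)            // y still refers to the first binding
}
```

The name x alone is therefore not enough to identify which binding a borrow refers to, which is why a per-variable index is needed.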

The third cell stores the map from available variables to their corresponding indexes. The fourth cell is a list, treated as a stack. When a new lifetime starts, the current variable-to-index map is stored in the stack, and it is restored when the lifetime ends. The fifth cell stores the current lifetime, which is an integer.

The last cell stores the definitions of sum types and product types. One of its sub-cells stores the number of compound types. The other sub-cell has the multiplicity property, which enables us to store multiple sum or product type definitions. Each definition has four elements: (1) the unique Id of the type, (2) an indicator of whether this type is a sum type or a product type, (3) all fields of the type, (4) the number of fields of the type.

4.2 K Rules for the Type System

In this subsection, we introduce the K rules for the type system (the syntax of surface-Rust can be found in Appendix 0.B). Fig. 3 shows the architecture of the K-Rust type system. “TC” in the figure is short for “Type Checking”. The arrows in the figure are decomposition relations. As shown in Fig. 3, Function TC is decomposed into four parts: (1) lifetime TC, (2) parameters TC, (3) expressions TC (the expression is the body of the function), and (4) return TC. Rule Decompose-Function shows this decomposition. The function-type cell in the configuration stores the type of the function F (footnote 6: in surface-Rust, function type and function definition are separated). The terms newlft and endlft correspond to the creation and ending of a lifetime, respectively. The term bindParamTys performs parameter TC first and then binds types to the parameters. The term rtTyCk(E,T) has a “strict” attribute on E in K, which means that the expression E (the function body) is computed first and its type is then compared with the type T for return TC. Rule Binding-Decompose illustrates the decomposition of variable binding TC in Fig. 3. A binding from an RValue R to the variable X is decomposed into creating the variable (createVar), passing binding information from right to left (processLR), and continuing with the computation of E.

RULE Decompose-Function

⟨ fun F:Ident (Ps:CIdents) newlft E:Exp endlft ⇒ newlft ↷ bindParamTys(Ps,Ts,Ls) ↷ rtTyCK(E,T) ↷ endlft ... ⟩k

⟨ ... F ↦ fnTy(Ls:LftVars;Ts:RTypes;T:RType) ... ⟩

RULE Binding-Decompose

⟨ let X:Ident = R:RValue in E:Exp ⇒ createVar(X,imm) ↷ processLR(lhs(X),R) ↷ E ... ⟩k
Figure 3: The architecture of K-Rust type system

The type system consists of a large number of K rules. We therefore introduce the type system only for the modules RValue Type Evaluation, Branch Checking, and Lifetime TC in Fig. 3, which are closest to the ownership system.

Firstly, we discuss some modeling strategies for variables. In K-Rust, we need to maintain the type information of all variables. The information of a variable consists of two parts: (1) a static part, which is never modified during the type checking procedure and includes the lifetime, mutability, and type of the variable; (2) a dynamic part, which might be modified during the type checking procedure and includes the status of borrows and initialization. The information is therefore a 6-tuple whose components are, in order: the lifetime of the variable; the mutability of the variable; the type of the variable; the biggest lifetime in which the variable is immutably borrowed; the biggest lifetime in which the variable is mutably borrowed; and the initialization status. In K, we use the term varInfo(Int,Mutability,RType,Int,Int,Bool) to store the information of a variable.

There is a total order over variables in K-Rust, corresponding to the creation order of variables. Since we use natural numbers as the indexes of variables, the indexes indicate the creation order of variables (footnote 7: if the index of a variable is n, then the variable is the n-th created variable in the system). The total order of variables inherits the total order of natural numbers defined by the less-than operator “<”. This total order is also compatible with the order of variables in the stack. In Rust, a newly created variable is always on top of the stack. If a variable x is on top of another variable y in the stack, then the index of y is always less than the index of x in K-Rust, and x cannot be borrowed by y. This order is used for borrow checking.


1. RValue Type Evaluation

An RValue can be a variable, a dereference (* RValue), a borrow (& Mutability RValue), or a field reference (RValue.Int). RValues of scalar types are easy to deal with. We focus on RValues of pointers. Consider the following binding:

let x = & mut p in ...

The binding is from the RValue “& mut p” to x. In order to type check the RValue, the following information should be inferred: (1) whether p is initialized, since only initialized variables can be read in an RValue, (2) whether p is mutable, so that it can be borrowed mutably, (3) whether p has been borrowed before, (4) the type and lifetime of p. A more complex RValue could be “& mut * * p”. In order to type check it, we need to infer the above information for * * p.

In order to reason about the information of an RValue, we need the following contexts: (1) Γ is the type context for RValues, where Γ ⊢ R : T means that R is of type T in Γ, (2) B_m is the set of mutably borrowed RValues, (3) B_i is the set of immutably borrowed RValues, (4) M is the set of mutable RValues, (5) ℒ is a map from RValues to their corresponding lifetimes. Fig. 4 introduces inference rules for the type evaluation of an RValue.

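(1) $$\frac{\Gamma \vdash R : \mathrm{own}(T) \qquad R \in M \cap \overline{B_m} \cap \overline{B_i} \qquad \mathcal{L}(R) = L}{\Gamma \vdash \&\, m\, R : \mathrm{ref}(L, m, \mathrm{own}(T)) \;\leadsto\; R \in B_m,\ \mathcal{L}(\&\, m\, R) = CL}$$

(2) $$\frac{\Gamma \vdash R : \mathrm{own}(T) \qquad R \in \overline{B_m}}{\Gamma \vdash \&\, i\, R : \mathrm{ref}(L, i, \mathrm{own}(T)) \;\leadsto\; R \in B_i,\ \mathcal{L}(\&\, i\, R) = CL}$$

(3) $$\frac{\Gamma \vdash R : \mathrm{ref}(L', M, T) \qquad \mathcal{L}(R) = L \qquad R \in M \cap \overline{B_m} \cap \overline{B_i}}{\Gamma \vdash \&\, m\, R : \mathrm{ref}(L, m, \mathrm{ref}(L', M, T)) \;\leadsto\; \mathcal{L}(\&\, m\, R) = CL,\ R \in B_m}$$

(4) $$\frac{\Gamma \vdash R : \mathrm{ref}(L, i, T)}{\Gamma \vdash {*R} : T \;\leadsto\; {*R} \in B_i,\ \mathcal{L}({*R}) = L}$$

(5) $$\frac{\Gamma \vdash R : \mathrm{ref}(L', M, T) \qquad \mathcal{L}(R) = L \qquad R \in \overline{B_m}}{\Gamma \vdash \&\, i\, R : \mathrm{ref}(L, i, \mathrm{ref}(L', M, T)) \;\leadsto\; \mathcal{L}(\&\, i\, R) = CL,\ R \in B_i}$$

(6) $$\frac{\Gamma \vdash X : \mathrm{own}(T)}{\Gamma \vdash {*X} : T}$$

(7) $$\frac{\Gamma \vdash R : \mathrm{ref}(L, m, T)}{\Gamma \vdash {*R} : T \;\leadsto\; {*R} \in B_m,\ \mathcal{L}({*R}) = L}$$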
Figure 4: The inference rules for RValue

Rules (1) and (7) are selected for explaining the evaluation. In Rule (1), the premises are: the RValue R is of an own type own(T); R is mutable and neither mutably nor immutably borrowed (R ∈ M ∩ B̄_m ∩ B̄_i is equivalent to R ∈ M, R ∉ B_m, and R ∉ B_i, where B̄ is the complement of B); and the lifetime of R is L. The conclusion is that & m R is well-typed in Γ (& m R and & i R denote mutable and immutable borrows of R, respectively). The symbol ↝ means that if “& m R : ref(L,m,own(T))” is added to Γ, then the contexts are modified as follows: (1) R is mutably borrowed now, i.e., R ∈ B_m, (2) the lifetime of & m R is the current lifetime CL. Rule (7) is for dereferencing a reference type. Its premise requires that the RValue R is of a reference type ref(L,m,T). We can obtain that *R is of type T. After adding the type of *R to Γ, we can infer that *R has been mutably borrowed, i.e., *R ∈ B_m.

In order to model these rules in K, we use the following term: expTy(RValue,RType,Int,Int,Int,Mutability), where RValue is the expression, RType is the type of the expression, and the three integers denote, from left to right, the lifetime of the expression, whether it is immutably borrowed, and whether it is mutably borrowed. Mutability indicates whether this expression is mutable.

Rule Mut-Bor-Own illustrates the modeling of Rule (1) in Fig. 4. The type own(T) in expTy means that V has the own type. The two integers -1,-1 indicate that V is neither immutably nor mutably borrowed. These are the premises of Rule Mut-Bor-Own. If all these premises are satisfied, then the mutable borrow is rewritten to another expTy, whose RValue is & mut V, with type ref(L,mut,own(T)). The lifetime of & mut V is the current lifetime. The RValue “V” is set as mutably borrowed by the term setBorrow. Rule Deref-Mut-Borrow corresponds to Rule (7) in Fig. 4.

RULE Mut-Bor-Own

⟨ & mut expTy(V:RValue, own(T),L,-1,-1,mut) ⇒ expTy(& mut V,ref(L,mut,own(T)),CL,-1,-1,mut) ~> setBorrow(V,mut,CL) ⟩

⟨ CL:Int ⟩ ⟨ V |-> varInfo(L,_) ⟩

RULE Deref-Mut-Borrow

⟨ * expTy(V:RValue,ref(L,mut,T),_,_,_,_) ⇒ expTy(*V,T,L,-1,1,imm) ⟩
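The two rules can be mirrored outside K as ordinary functions over a record that plays the role of expTy. The following is a minimal sketch, not the K-Rust implementation: the names `ExpTy`, `RType`, `mut_bor_own` and `deref_mut_borrow` are hypothetical, and the encoding of setBorrow as "set the mutable-borrow flag to 1" is an assumption based on the flag values that appear in Rule Deref-Mut-Borrow.

```rust
/// Mutability flag carried by types and expressions.
#[derive(Clone, Debug, PartialEq)]
enum Mutability {
    Imm,
    Mut,
}

/// A small fragment of the type language: i32, own(T) and ref(L, m, T).
#[derive(Clone, Debug, PartialEq)]
enum RType {
    I32,
    Own(Box<RType>),
    Ref(i64, Mutability, Box<RType>),
}

/// Counterpart of expTy(RValue, RType, Int, Int, Int, Mutability).
/// Both borrow flags use -1 for "not borrowed", as in the rules above.
#[derive(Clone, Debug, PartialEq)]
struct ExpTy {
    rvalue: String,
    rtype: RType,
    lifetime: i64,
    imm_borrowed: i64,
    mut_borrowed: i64,
    mutability: Mutability,
}

/// Sketch of Rule Mut-Bor-Own: `& mut V` is typed only if V has an own type,
/// is mutable, and is neither immutably nor mutably borrowed.
fn mut_bor_own(v: &mut ExpTy, current_lifetime: i64) -> Option<ExpTy> {
    let is_own = matches!(v.rtype, RType::Own(_));
    let unborrowed = v.imm_borrowed == -1 && v.mut_borrowed == -1;
    if !(is_own && v.mutability == Mutability::Mut && unborrowed) {
        return None; // no rule applies: the type checker would be stuck here
    }
    let borrowed = ExpTy {
        rvalue: format!("& mut {}", v.rvalue),
        rtype: RType::Ref(v.lifetime, Mutability::Mut, Box::new(v.rtype.clone())),
        lifetime: current_lifetime,
        imm_borrowed: -1,
        mut_borrowed: -1,
        mutability: Mutability::Mut,
    };
    // Stands in for setBorrow(V, mut, CL); the value 1 is an assumed encoding.
    v.mut_borrowed = 1;
    Some(borrowed)
}

/// Sketch of Rule Deref-Mut-Borrow: `*V` for V of type ref(L, mut, T)
/// has type T, lifetime L, and is recorded as mutably borrowed.
fn deref_mut_borrow(v: &ExpTy) -> Option<ExpTy> {
    match &v.rtype {
        RType::Ref(l, Mutability::Mut, t) => Some(ExpTy {
            rvalue: format!("*{}", v.rvalue),
            rtype: (**t).clone(),
            lifetime: *l,
            imm_borrowed: -1,
            mut_borrowed: 1,
            mutability: Mutability::Imm,
        }),
        _ => None,
    }
}
```

Returning `None` when the premises fail corresponds to the absence of an applicable rewrite rule; a second call to `mut_bor_own` on the same value fails because the first call marked it as borrowed.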

The evaluation becomes more complex when dealing with branches and product types. Two branches may result in different variable statuses. For instance, an uninitialized variable may be initialized in one branch and remain uninitialized in the other. A variable may also borrow different resources in different branches. For instance, Listing 1 defines a product type Point with two fields of reference types. Three variables x, y, z are borrowed by two variables p1, p2 of the type Point. At line 6, the if-then-else branch makes it possible that k points to p1 or p2. The RValue (*k).2 can therefore point to either y or z, which leads to different lifetimes for the type of (*k).2. To handle this situation, we introduce Reference Dependence Graphs (RDGs). An RDG describes all possible reference relations between variables. Figure 5 illustrates the RDG of Listing 1. The solid lines with arrows denote the reference dependences. The dashed lines denote the field relations of product types, which are not represented in RDGs explicitly but can be easily inferred. The variable k points to p1 and p2, which means that k may borrow p1 or p2 depending on the branch taken. Therefore (*k).2 can be a reference to y or z, and the type of (*k).2 is ref(L,imm,own(i32)), where L is the larger lifetime of y and z.

1 Point :=: {'a, ref('a,imm,own(i32)),ref('a,imm,own(i32))}
2 fn main() newlft
3  let x = new(i32) in let y = new(i32) in newlft
4  let z = new(i32) in let p1 = new(Point,{&imm x, &imm y}) in
5  let p2 = new(Point,{&imm x, &imm z}) in let k in
6  if (…) then { k := p1 } else {k := p2}; (*k).2 endlft endlft
Listing 1: Example 1
Figure 5: The RDG of the running Example

We write a library for RDG operations, where RDG edges are represented as a map. For each element I:Int |-> S:Set in the map, I is the index of a variable and S is the set of variables that I points to. The operations for RDGs include: adding new edges, modifying edges, merging two RDGs, calculating the direct successor nodes of a node, etc. The term next(S:Set,Rho:Map) computes the successors of the nodes in S with respect to the RDG Rho. Rule Next-RDG illustrates the computation of next. “SetItem(V) S:Set” is a set, where V is one of its elements and S contains the remaining elements. The term getbyKey(V,Rho) computes the successors of the single node V. The term next(SetItem(V) S,Rho) returns the union of getbyKey(V,Rho) and the successors of the other elements, i.e., next(S,Rho).

RULE Next-RDG

⟨ next(SetItem(V) S:Set, Rho:Map) ⇒ setUnion(getbyKey(V,Rho),next(S,Rho)) ⟩
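The RDG operations described above can be sketched with standard collections. This is an illustrative sketch only: the functions `get_by_key`, `next` and `merge` mirror the terms named in the text, but their signatures, the use of `BTreeMap`, and the choice of merging by uniting edge sets are assumptions rather than the K-Rust library itself.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// RDG edges: variable index -> set of variable indices it may point to.
type Rdg = BTreeMap<u32, BTreeSet<u32>>;

/// Successors of a single node; plays the role of getbyKey(V, Rho).
fn get_by_key(v: u32, rho: &Rdg) -> BTreeSet<u32> {
    rho.get(&v).cloned().unwrap_or_default()
}

/// Union of the successors of every node in `s`; plays the role of next(S, Rho).
fn next(s: &BTreeSet<u32>, rho: &Rdg) -> BTreeSet<u32> {
    s.iter().flat_map(|v| get_by_key(*v, rho)).collect()
}

/// Merges the RDGs of two branches by uniting the edge sets of each node
/// (assumed merge strategy for an if-then-else).
fn merge(a: &Rdg, b: &Rdg) -> Rdg {
    let mut out = a.clone();
    for (node, targets) in b {
        out.entry(*node).or_default().extend(targets.iter().copied());
    }
    out
}
```

For Listing 1, if k, p1 and p2 are given the (hypothetical) indices 0, 1 and 2, the then-branch contributes the edge 0 to 1 and the else-branch the edge 0 to 2; after `merge`, `next` applied to the set containing 0 yields both 1 and 2, which is why (*k).2 may refer to y or z.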

K rules for lifetimes are responsible for the creation and ending of a lifetime. The creation of a lifetime generates a unique index for the new lifetime. In addition, the current environment is pushed onto the stack. The ending of a lifetime (Rule Lifetime-End) is rewritten to removeLifetime. This term removes the variables that die when the lifetime ends, both from the cell that records them and from the RDG. In addition, the saved environment is popped from the stack and restored, and the current lifetime index is decremented.

RULE Lifetime-End

⟨ endlft ⇒ removeLifetime(L,R,.Map,.Set) ⟩ ⟨ L:Int ⇒ L -Int 1 ⟩

⟨ ListItem(Rho) ⇒ .List ⟩ ⟨ _ ⇒ Rho:Map ⟩ ⟨ R:Map ⟩
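The bookkeeping performed at lifetime creation and ending can be sketched as a small state machine. This is a simplified illustration under stated assumptions: the `State` struct, its field names, and the use of variable names as keys are hypothetical, and the removal of dead variables is reduced to pruning RDG nodes and edges before the saved environment is restored.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical checker state: current lifetime index, environment
/// (variable -> lifetime of declaration), stack of saved environments, and RDG.
struct State {
    current_lifetime: i64,
    env: HashMap<String, i64>,
    env_stack: Vec<HashMap<String, i64>>,
    rdg: HashMap<String, HashSet<String>>,
}

impl State {
    fn new() -> Self {
        State {
            current_lifetime: 0,
            env: HashMap::new(),
            env_stack: Vec::new(),
            rdg: HashMap::new(),
        }
    }

    /// newlft: save the current environment and move to a fresh lifetime index.
    fn new_lft(&mut self) {
        self.env_stack.push(self.env.clone());
        self.current_lifetime += 1;
    }

    /// Declares a variable in the current lifetime.
    fn declare(&mut self, name: &str) {
        self.env.insert(name.to_string(), self.current_lifetime);
    }

    /// Records that `from` may point to `to`.
    fn add_edge(&mut self, from: &str, to: &str) {
        self.rdg.entry(from.to_string()).or_default().insert(to.to_string());
    }

    /// endlft: drop variables of the ending lifetime from the RDG,
    /// restore the saved environment, and decrement the lifetime index.
    fn end_lft(&mut self) {
        let ending = self.current_lifetime;
        let dead: Vec<String> = self
            .env
            .iter()
            .filter(|(_, l)| **l == ending)
            .map(|(name, _)| name.clone())
            .collect();
        for name in &dead {
            self.rdg.remove(name);
        }
        for targets in self.rdg.values_mut() {
            for name in &dead {
                targets.remove(name);
            }
        }
        if let Some(saved) = self.env_stack.pop() {
            self.env = saved;
        }
        self.current_lifetime -= 1;
    }
}
```

Restoring a saved copy of the environment is the simplest way to mimic the stack of environments; it discards any status changes made to outer variables inside the lifetime, which a faithful model would need to propagate.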

4.3 Translation to Core-Language

Translation from surface-rust to core-language is almost straightforward. Table 1 presents a subset of the translation rules from surface-rust to core-language. The translation for needs to compute the size of the type . The sequential composition separated by ”;” is translated to a tail function call, which means that this function will not allocate any stack space for the call. The evaluation order of arguments mimics the execution order of and .

surface-rust core-language
, where is the size of the type
, where is a variable
Table 1: A subset of the translation from surface-rust to core-language

5 Applications and Testing

Based on the operational semantics of K-Rust, K back-ends provide various tools for formal analysis. K provides pre- and post-condition verification by Matching Logic [12]. K also supports model checking and symbolic execution. In fact, K aims to provide a semantics-based program verifier for all languages [13] by defining K semantics for languages.

Due to space limitations, we put a K-Rust program implementing the queue datatype in Appendix 0.C to illustrate how to execute our type checking and operational semantics. The K-Rust models and tests can be downloaded from securify.sce.ntu.edu.sg/SoftVer/KRUST/.

The correctness of a formal semantics is critical to subsequent work based on it, so the semantics must be evaluated to gain a certain degree of confidence in its correctness. K-Rust can be tested thanks to K's executable character. The testing procedure includes the following three steps: first, a surface-rust program is type checked with K-Rust's type system; second, if the program is well-typed, it is translated into a core-language representation; third, the core-language program is executed to test the operational semantics. The testing benchmarks can be classified as follows: (1) ownership testing, (2) branch testing, (3) product type testing, (4) sum type testing, (5) lifetime testing, (6) function testing. We model examples from The Rust Programming Language [11] in K-Rust and compare the results with Rust's compiler to ensure both executions are equivalent. During the construction of K-Rust we detected an inconsistency in the ownership mechanism as specified in The Rust Programming Language [11].

In let mut x = new(i32) in let y = & mut x in let z = & imm (* y), y is a mutable borrow of x and z is an immutable borrow of x (this borrow is obtained indirectly through y). This is not allowed by the rules described in [11]. However, the compiler accepts it: it implements a freezing mechanism on y when binding z to avoid possible data races, and hence conflicts with the official description. Indeed, this kind of inconsistency may be a source of bugs in the implementation. Another conflict between the specification and the compiler can be found in Appendix 0.D.
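The surface-rust term above corresponds to the following plain Rust, which the compiler accepts. The function name `reborrow_through_mut` and the explicit inner blocks are choices made for this illustration; the blocks only make the extent of each borrow visible.

```rust
/// Mutably borrows `x` as `y`, then immutably reborrows `x` through `*y`.
fn reborrow_through_mut() -> (i32, i32) {
    let mut x: i32 = 5;
    let seen;
    {
        let y = &mut x; // y is a mutable borrow of x
        {
            let z = &*y; // z immutably borrows x indirectly through y
            seen = *z;   // while z is alive, y is frozen and cannot be written
        }
        *y += 1; // once z is gone, y can be used for mutation again
    }
    (seen, x)
}
```

Calling `reborrow_through_mut()` returns `(5, 6)`: the value read through z, followed by the value of x after the write through y.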

6 Conclusion

Formal semantics is a prerequisite for formal program verification and for building reliable programming tools, such as compilers and type checkers. In this paper, we introduce a formal semantics for Rust using K. The formalization is close to the implementation of Rust's compiler; it can perform ownership checking and execute real programs according to the semantics. This work is the first step towards a safe and reliable Rust programming environment. Future work includes providing automatic translation between the surface-language and the core-language, semantics for unsafe Rust constructs, and a proof of refinement between the K-Rust ownership system and existing abstract principles of the ownership system. Since K provides back-ends to theorem provers such as Isabelle, this work can be done using these theorem provers.

References

  • [1] Rust Team. The rust language homepage. https://www.rust-lang.org/en-US/.
  • [2] Eric Reed. Patina: A formalization of the rust programming language. Technical report, University of Washington, 2015.
  • [3] Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. RustBelt: Securing the Foundations of the Rust Programming Language. Proc. ACM Program. Lang. 2, POPL, Article, January 2018.
  • [4] The coq proof assistant. http://coq.inria.fr, 1999-2018.
  • [5] Ralf Jung, David Swasey, Filip Sieczkowski, Kasper Svendsen, Aaron Turon, Lars Birkedal, and Derek Dreyer. Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’15, pages 637–650, New York, NY, USA, 2015. ACM.
  • [6] Grigore Roşu and Traian Florin Şerbănuţă. An overview of the K semantic framework. Journal of Logic and Algebraic Programming, 79(6):397–434, 2010.
  • [7] Denis Bogdănaş and Grigore Roşu. K-Java: A Complete Semantics of Java. In Proceedings of the 42nd Symposium on Principles of Programming Languages (POPL’15), pages 445–456. ACM, January 2015.
  • [8] Chris Hathhorn, Chucky Ellison, and Grigore Roşu. Defining the undefinedness of C. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’15), pages 336–345. ACM, June 2015.
  • [9] Chucky Ellison and Grigore Roşu. An executable formal semantics of C with applications. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’12), pages 533–544. ACM, January 2012.
  • [10] Daejun Park, Andrei Ştefănescu, and Grigore Roşu. KJS: A complete formal semantics of JavaScript. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’15), pages 346–356. ACM, June 2015.
  • [11] Rust Team. The Rust Programming Language. Mozilla Research, 2016. https://doc.rust-lang.org/book/second-edition/.
  • [12] Grigore Roşu. Matching logic. Logical Methods in Computer Science, 13(4):1–61, December 2017.
  • [13] Andrei Ştefănescu, Daejun Park, Shijiao Yuwen, Yilong Li, and Grigore Roşu. Semantics-based program verifiers for all languages. In Proceedings of the 31st Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’16), pages 74–91. ACM, Nov 2016.
  • [14] John Toman, Stuart Pernsteiner, and Emina Torlak. Crust: A bounded verifier for rust (N). In Myra B. Cohen, Lars Grunske, and Michael Whalen, editors, 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015, pages 75–80. IEEE Computer Society, 2015.
  • [15] Edmund M. Clarke, Daniel Kroening, and Flavio Lerda. A tool for checking ANSI-C programs. In Kurt Jensen and Andreas Podelski, editors, Tools and Algorithms for the Construction and Analysis of Systems, 10th International Conference, TACAS 2004, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2004, Barcelona, Spain, March 29 - April 2, 2004, Proceedings, volume 2988 of Lecture Notes in Computer Science, pages 168–176. Springer, 2004.
  • [16] Florian Hahn. Rust2viper: Building a static verifier for rust. Master’s thesis, ETH Zürich, 2016.
  • [17] Peter Müller, Malte Schwerhoff, and Alexander J. Summers. Viper: A verification infrastructure for permission-based reasoning. In Barbara Jobstmann and K. Rustan M. Leino, editors, Verification, Model Checking, and Abstract Interpretation - 17th International Conference, VMCAI 2016, St. Petersburg, FL, USA, January 17-19, 2016. Proceedings, volume 9583 of Lecture Notes in Computer Science, pages 41–62. Springer, 2016.
  • [18] Thomas Bracht Laumann Jespersen, Philip Munksgaard, and Ken Friis Larsen. Session types for rust. In Patrick Bahr and Sebastian Erdweg, editors, Proceedings of the 11th ACM SIGPLAN Workshop on Generic Programming, WGP@ICFP 2015, Vancouver, BC, Canada, August 30, 2015, pages 13–22. ACM, 2015.

Appendix 0.A Examples for Memory Model

The following block representations illustrate how to use the memory model to express scalar type, array, product type, and sum type values.

(M1) ⟨ ⟨ 0 ⟩_baddress ⟨ 1 ⟩_bnum ⟨ 0|->2 ⟩_bstore ⟩
(M2) ⟨ ⟨ 1 ⟩_baddress ⟨ 2 ⟩_bnum ⟨ 0|->6, 1|->9 ⟩_bstore ⟩
(M3) ⟨ ⟨ 2 ⟩_baddress ⟨ 3 ⟩_bnum ⟨ 0|->0, 1|->addr(0), 2|->9 ⟩_bstore ⟩
(M4) ⟨ ⟨ 3 ⟩_baddress ⟨ 2 ⟩_bnum ⟨ 0|->1, 1|->"Hi" ⟩_bstore ⟩

We start with blkNum being zero. Firstly, a block is created for an integer (scalar type) 2, which is shown in (M1). The memory address of the block is 0, and the value is stored in the map 0|->2. (M2) is the representation of an integer array [6,9]. The address (baddress) of the array is 1, and bnum indicates that there are two elements in the array, which are stored in bstore. (M3) shows the value of a product type, which can be defined in the style of a C struct: struct X { x: Bool; y: Address; z: Int; }. The block address is 2 and there are three elements in bstore. The pair 0|->0 in bstore indicates that the value of x in the product type is 0, which denotes false. The pair 1|->addr(0) indicates that the value of y is the address 0, i.e., the block shown in (M1). The pair 2|->9 indicates that the value of z is 9. (M4) is the representation of a sum type. Consider the sum type option<String> in Rust. A value of this type can be None (no data) or a string some(String). The memory representation of such a value has only two fields: one indicates which case the value selects, and the other stores the value of that case. For option<String>, we use 0 and 1 to indicate the two cases: 0 for case 1 and 1 for case 2. The pair 0|->1 in bstore indicates that the second case is selected. The pair 1|->"Hi" stores the value.
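The four blocks (M1) to (M4) can be reproduced with a small allocator. This is a sketch for illustration: the field names follow the cells baddress, bnum and bstore and the counter blkNum from the text, while the `Value` enum, the `Memory` struct and the `alloc` function are hypothetical.

```rust
use std::collections::BTreeMap;

/// Values that may be stored in a block (a simplified, assumed value domain).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Addr(usize),
    Str(String),
}

/// One memory block: its address, number of elements, and indexed store.
#[derive(Clone, Debug, PartialEq)]
struct Block {
    baddress: usize,
    bnum: usize,
    bstore: BTreeMap<usize, Value>,
}

/// The memory: a block counter (blkNum) and the allocated blocks.
struct Memory {
    blk_num: usize,
    blocks: Vec<Block>,
}

impl Memory {
    fn new() -> Self {
        Memory { blk_num: 0, blocks: Vec::new() }
    }

    /// Allocates a block holding `values` at indices 0, 1, ... and returns its address.
    fn alloc(&mut self, values: Vec<Value>) -> usize {
        let baddress = self.blk_num;
        let bstore: BTreeMap<usize, Value> = values.into_iter().enumerate().collect();
        self.blocks.push(Block { baddress, bnum: bstore.len(), bstore });
        self.blk_num += 1;
        baddress
    }
}
```

Allocating `[Int(2)]`, then `[Int(6), Int(9)]`, then `[Int(0), Addr(0), Int(9)]`, then `[Int(1), Str("Hi")]` in this order produces blocks at addresses 0 to 3 with the same contents as (M1) to (M4).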

Appendix 0.B The Syntax of Surface Rust

A subset of the syntax of Surface Rust used in this paper. Type checking is performed on these structures.

Appendix 0.C Queue Example

The data structure for queues is defined as a product type with four fields: the first field, of type i32, is the capacity of the queue; the second and third fields, of type i32, are the head and tail of the queue, respectively; the last field, of type own(array(i32)), is an array for storing the elements of the queue. The type Option is a sum type that can be a bool value or an integer value. It is used in the function get, which gets an element from the head of a queue. If the queue is empty, get returns an Option value with false; otherwise it returns an Option value with the integer at the head of the queue. The function put puts an element at the tail of a queue. The function for mimics loops with recursive functions. The main function first creates a queue with a capacity of 5 and then tries to put 6 elements into the queue. Finally, it gets the first element of the queue.

Queue :=: prodTy(i32,i32,i32,own(array(i32)))
Option:=: sumTy(bool,i32)
get   :=: fnTy('a;ref('a,mut,own(ty(Queue)));own(ty(Option)))
put   :=: fnTy('a;i32,ref('a,mut,own(ty(Queue)));bool)
for   :=: fnTy(;ref('a,mut,own(ty(Queue))),i32;void)
main  :=: fnTy(;;void)}
fun get(q) newlft
 let return = new(ty(Option)) in {
  if ((*q).3-(*q).2 = 0) then {return :=inj 1 false}
  else { return :=inj 2 (*q).4.((*q).2 mod (*q).1)};
  (*q).2 := (*q).2 + 1;return} endlft
fun put(e,q) newlft
 let return in {
  if ((*q).3-(*q).2 = (*q).1) then {return := false}
  else {(*q).4.((*q).3 mod (*q).1) := e; (*q).3 := (*q).3 + 1;
   return := true}; return} endlft
fun for(q,i) newlft
 if (i > 0) then { call put(i,q); call for(q, i - 1)} else {void}
endlft
fun main() newlft
 let mut q = new(ty(Queue)) in {
  q.1 := 5;q.2 := 0;q.3 := 0;q.4 := new(i32,5);call for(& mut q, 6);
  let re = call get(& mut q) in{case re of {false, re.1 = 6}}}
endlft
Listing 2: A surface-rust program for queues

K-Rust can check whether the program is well-typed by executing the type checker semantics using K's “krun” command. The program in Listing 2 passes the K-Rust type system. If we change the type ref('a,mut,own(ty(Queue))) in the functions “put” and “for” to own(ty(Queue)), the program violates Rust's ownership system and type checking fails: in the function “for”, q is moved when we call “put”, so it cannot be used again.

If a Rust program is correct then it is translated into a corresponding core-language program. Listing 3 shows the core-language program.

fn get (q){
 (fn  (return)
  {case (* na (q.2))-(*na(q.1))==0 of
    {(return.0:=na 2;
      return.1:=na *na(((*na(q.3)).((*na(q.1))mod(*na(q.0)))))),
     (return.0:=na 1; return.1 := na 1)};
     q.1 := na ((* na (q.1)) + 1) ;
     return }) ( allocate(2) ) };
fn put (e, q){
 (fn  (return)
 {case (((* na (q.2)) - (*na (q.1))) == (* na (q.0))) of
  {((* na (q.3)).((* na (q.2)) mod (* na (q.0))) :=na e;
   q.2 :=na (* na (q.2)) + 1;
   return:= 1),
   (return := 0)};
   return}) (0) } ;
fn for (q, i) { case (i > 0) of { clskip,(put(i, q); for(q, i - 1))}};
fn main (){
(fn (q) { q.0:=na 5; q.1:=na 0; q.2:=na 0; q.3:=na allocate(5);
 for (q,6);let re = get (q) in
  (case (*na(re.0)) - 1 of {0, ((*na(re.1))==6)})})(allocate(4))};
main()
Listing 3: A core-language program corresponding to Listing 2

Fig. 6 illustrates the memory layout after executing the core-language program. The addresses addr(0), addr(1), and addr(2) correspond to the variables q, q.4, and re, respectively.

<baddress>    addr( 0 ) </baddress> <bnum>4</bnum> <bstore>   0|->5   1|->1   2|->5   3|->location(1,0)demo </bstore> <baddress>   addr( 1 ) </baddress> <bnum>5</bnum> <bstore>  0 |-> 6  1 |-> 5  2 |-> 4  3 |-> 3  4 |-> 2 </bstore> <baddress>  addr( 2 ) </baddress> <bnum>  2 </bnum> <bstore>   0 |-> 2   1 |-> 6 </bstore>
Figure 6: The resulting memory after executing the program
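As a cross-check of the memory in Fig. 6, the queue of Listing 3 can be re-expressed in plain Rust. This is a sketch rather than a translation produced by K-Rust: the `Queue` struct and the function `fill` (standing in for the recursive function for) are hypothetical names, Rust's built-in `Option` replaces the sum type, and, unlike Listing 3, `get` advances the head only when an element is actually returned.

```rust
/// Bounded FIFO queue with the four fields of the Queue product type.
struct Queue {
    capacity: usize,
    head: usize,
    tail: usize,
    items: Vec<i32>,
}

impl Queue {
    fn new(capacity: usize) -> Self {
        Queue { capacity, head: 0, tail: 0, items: vec![0; capacity] }
    }

    /// Stores `e` at the tail; returns false if the queue is full.
    fn put(&mut self, e: i32) -> bool {
        if self.tail - self.head == self.capacity {
            return false;
        }
        let slot = self.tail % self.capacity;
        self.items[slot] = e;
        self.tail += 1;
        true
    }

    /// Removes and returns the element at the head, or None if the queue is empty.
    fn get(&mut self) -> Option<i32> {
        if self.tail - self.head == 0 {
            return None;
        }
        let e = self.items[self.head % self.capacity];
        self.head += 1;
        Some(e)
    }
}

/// Recursive loop that tries to put i, i - 1, ..., 1 into the queue.
fn fill(q: &mut Queue, i: i32) {
    if i > 0 {
        q.put(i);
        fill(q, i - 1);
    }
}
```

With a capacity of 5, `fill(&mut q, 6)` stores 6, 5, 4, 3, 2 and rejects 1 because the queue is full; a following `get` returns 6 and leaves head at 1 and tail at 5, matching the blocks addr(0) and addr(1) and the payload of addr(2) in Fig. 6.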

Based on the operational semantics, some verification can be performed by using K back-ends. It is easy to write auxiliary operations to support verification. For instance, the operation assert(E:Exp) can be added to a verification library. K rules can be defined for assert(E) when E is true, with no rules for E being false. Therefore, when E is false, the execution gets stuck. The krun command with --search explores all possible execution paths. In addition, K provides pre- and post-condition verification by Matching Logic. K also supports symbolic execution. In fact, K aims to provide a semantics-based program verifier for all languages [13].

Appendix 0.D An Example That Satisfies the Abstract Description of Ownership but Is Rejected by the Compiler

Consider the following surface-rust program:

1       let x = new(i32) in{
2       let y = & imm x in{
3       let z = new(i32) in{
4               y := & imm z;
5               let k = & mut x;}}}

In this example, line 2 makes y point to x. Line 4 makes y point to z, so x is no longer borrowed by any variable. Therefore, line 5 could be executed. In fact, however, line 5 is rejected by Rust's compiler, which considers that y still points to x at line 5.