Executable Operational Semantics of Solidity

04/04/2018 ∙ by Jiao Jiao, et al. ∙ 0

Bitcoin has attracted widespread attention and interest recently. Ethereum (ETH), a second-generation cryptocurrency, extends Bitcoin's design by offering a Turing-complete programming language called Solidity for developing smart contracts. Smart contracts allow credible execution of contracts on the EVM (Ethereum Virtual Machine) without third parties. Developing correct smart contracts is challenging due to their decentralized computation nature. Buggy smart contracts may lead to huge financial loss. Furthermore, smart contracts are very hard, if not impossible, to patch once they are deployed. Thus, there is a recent surge of interest in analyzing and verifying smart contracts. While existing work focuses on EVM opcode, we argue that it is equally important to understand and define the semantics of Solidity, since programmers write and reason about smart contracts at the level of source code. In this work, we develop the structural operational semantics for Solidity, which allows us to identify multiple design issues that underlie many problematic smart contracts. Furthermore, our semantics is executable in the K framework, which allows us to verify or falsify contracts automatically.


1 Introduction

The success of Bitcoin since 2009 has stimulated the development of other blockchain-based applications such as Ethereum. Ethereum is a second-generation cryptocurrency which supports the revolutionary idea of smart contracts. A smart contract [3] is a computer program written in a Turing-complete programming language called Solidity, which is stored on the blockchain to achieve certain functionality. Smart contracts benefit from the features of blockchain in various aspects. For instance, no external trusted authority is needed to achieve consensus, and transactions through smart contracts are always traceable and credible.

Smart contracts must be verified for multiple reasons. Firstly, due to the decentralized nature of blockchain, smart contracts differ from conventional programs (e.g., in C or Java). For instance, in addition to the stack and heap, smart contracts operate on a third ‘memory’ called storage, which consists of permanent addresses on the blockchain. Programming smart contracts is thus error-prone without a proper understanding of the underlying semantic model. This is further worsened by several language design choices made by Solidity (e.g., fallback functions). In the following, we illustrate the difference between smart contracts and conventional programs using the contract named Test2 shown in Fig. 1 (b), where b is a global two-dimensional array, i.e., b[0] = [1,2,3] and b[1] = [4,5,6]. In function foo2(), a local array d of three elements is declared. Its second and third elements are then set to 10 and 11, respectively. After the execution of foo2(), the global array b is changed to b[0] = [0,10,0] and b[1] = [4,5,6]. To understand this surprising behavior, we must understand the storage/memory model of Solidity, and make sure it is formally defined so that programmers can write contracts accordingly. If a programmer implements a smart contract with an intention that is inconsistent with the Solidity semantics, vulnerabilities are very likely to be introduced.

    contract Test {
       uint128 a = 1;
       uint256 b = 2;

       function foo() public {
          uint256[2] d;
          d[0] = 7;
          d[1] = 8;
       }
    }
(a)

    contract Test2 {
       uint128 a = 9;
       uint128[3][2] b = [[1,2,3],[4,5,6]];

       function foo2() public {
          uint256[3] d;
          d[1] = 10;
          d[2] = 11;
       }
    }
(b)
Figure 1: Strange Contracts

Secondly, a smart contract can be created and called by any user in the network. A bug in the contract potentially threatens the security properties of smart contracts. Verifying smart contracts against such bugs is crucial for protecting digital assets. One well-known attack on smart contracts is the DAO attack [4]. The attacker exploited a vulnerability associated with fallback functions and the reentrancy property in the DAO smart contract [8], and managed to drain more than 3.6 million ETH (i.e., the Ethereum coin, which has a value of about $1000 at the time of writing). Thirdly, unlike traditional software, which can be patched, it is very hard if not impossible to patch a smart contract once it is deployed on the blockchain, due to the very nature of blockchain. For instance, the team behind Ethereum decided to conduct a soft-fork of the Ethereum network in view of the DAO attack, which caused a lot of controversy. It is thus extremely important that a smart contract is verified before it is deployed, as otherwise it will forever be at risk of being attacked.

There has been a surge of interest in developing analysis and verification techniques for smart contracts [16, 9, 2, 1]. For instance, the authors in [16] developed a symbolic execution engine called Oyente which targets bytecode running on the Ethereum Virtual Machine (EVM). Since Solidity programs are compiled into bytecode and run on the EVM, Oyente can be used to analyze Solidity programs. In addition, the authors in [9] developed a semantic encoding of EVM bytecode in the K-framework. To the best of our knowledge, all existing approaches focus on bytecode. We believe that it is equally important to formally understand the semantics of Solidity, since programmers write and reason about smart contracts at the level of source code. Otherwise, programmers are required to understand how Solidity programs are compiled into bytecode in order to understand them, which is far from trivial.

In this work, we develop the structural operational semantics (SOS) for the Solidity programming language so that smart contracts written in Solidity can be formally reasoned about. The contributions of this work are twofold. Firstly, our work is the first approach, to our knowledge, on the formal semantics of the Solidity programming language other than the Solidity compiler itself. Our executable semantics covers most of the semantics specified by the official Solidity documentation [7]. Secondly, we implement the proposed SOS in K-framework [5], which provides a Reachability Logic prover [11]. With the proposed SOS and its implementation, we are able to detect vulnerabilities or reason about the correctness of smart contracts written in Solidity systematically.

The remaining part of this paper is organized as follows. In Section 2, we introduce the background of Solidity smart contracts. The proposed executable operational semantics is introduced in Section 3. In Section 4, we introduce the implementation of Solidity semantics in K-framework by illustrating some important rules. Section 5 shows the evaluation results of our Solidity semantics in K-framework. In Section 6, we review related works. Section 7 concludes this work and discusses our future directions.

2 Background of Solidity Smart Contracts

Ethereum, proposed in late 2013 by Vitalik Buterin, is a blockchain-based distributed computing platform supporting smart contract functionality. It provides a decentralized international network where each participant node (also known as a miner) equipped with the EVM can execute smart contracts. Ethereum also provides a cryptocurrency called “ether” (ETH), which can be transferred between different accounts and used to compensate participant nodes for their computations on smart contracts.

Solidity is one of the high-level programming languages for implementing smart contracts on Ethereum. A smart contract written in Solidity can be compiled into EVM bytecode and then executed by any participant node equipped with the EVM. Fig. 2 shows an example of a Solidity smart contract, named Coin, implementing a very simple cryptocurrency. A Solidity smart contract is a collection of code (its functions) and data (its state) that resides at a specific address on the Ethereum blockchain. In line 2, the public state variable minter of type address is declared to store the minter of the cryptocurrency, i.e., the owner of the smart contract. The constructor Coin(), which has the same name as the smart contract, is defined in lines 5 to 7. Once the smart contract is created and deployed (how to create and deploy a smart contract is out of scope; see https://solidity.readthedocs.io/), its constructor is invoked automatically, and minter is set to the address of its creator (owner), represented by the built-in keyword msg.sender. In line 3, the public state variable balances is declared to store the balances of users. It is of type mapping, which can be considered as a hash table mapping from keys to values. In this example, balances maps from a user (represented as an address) to his/her balance (represented as an unsigned integer value).

The mint() function, defined in lines 9 to 12, is supposed to be invoked only by the owner to mint amount coins for the user located at the receiver address. If mint() is called by anyone except the owner of the contract, nothing will happen because of the guarding if statement in line 10. The send() function, defined in lines 14 to 18, can be invoked by any user to transfer amount coins to another user located at the receiver address. If the balance is not sufficient, nothing will happen because of the guarding if statement in line 15; otherwise, the balances of both sides will be updated accordingly.

     1  contract Coin {
     2     address public minter;
     3     mapping (address => uint) public balances;
     4
     5     function Coin() public {
     6        minter = msg.sender;
     7     }
     8
     9     function mint(address receiver, uint amount) public {
    10        if (msg.sender != minter) return;
    11        balances[receiver] += amount;
    12     }
    13
    14     function send(address receiver, uint amount) public {
    15        if (balances[msg.sender] < amount) return;
    16        balances[msg.sender] -= amount;
    17        balances[receiver] += amount;
    18     }
    19  }
Figure 2: Smart Contract Example
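The behaviour of the two guards in Coin can be replayed outside the EVM. The following is a minimal Python sketch, not a translation of the contract: the explicit `sender` argument stands in for msg.sender, the string account names are hypothetical, and unsigned integer overflow is not modeled.

```python
from collections import defaultdict


class Coin:
    """Toy model of the Coin contract's state and guards (not EVM-accurate)."""

    def __init__(self, creator):
        # The constructor records the deployer as the minter.
        self.minter = creator
        # Keys that were never written read as 0, like a Solidity mapping.
        self.balances = defaultdict(int)

    def mint(self, sender, receiver, amount):
        if sender != self.minter:
            return  # guard: only the minter may mint
        self.balances[receiver] += amount

    def send(self, sender, receiver, amount):
        if self.balances[sender] < amount:
            return  # guard: insufficient balance, nothing happens
        self.balances[sender] -= amount
        self.balances[receiver] += amount


coin = Coin("alice")
coin.mint("bob", "bob", 50)      # ignored: bob is not the minter
coin.mint("alice", "bob", 50)    # bob now holds 50
coin.send("bob", "carol", 70)    # ignored: bob holds only 50
coin.send("bob", "carol", 20)    # bob keeps 30, carol receives 20
print(coin.balances["bob"], coin.balances["carol"])
```

Both rejected calls leave the state untouched, which mirrors the silent `return` in lines 10 and 15 of Fig. 2.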

A blockchain is a globally shared transactional database or ledger. Every participant node can read the information in the blockchain. To make any state change in the blockchain, one has to create a so-called transaction, which has to be accepted and validated by all other participant nodes. Furthermore, once a transaction is applied to the blockchain, no other transaction can alter it. For example, deploying the Coin smart contract generates a transaction because the state of the blockchain is going to be changed, i.e., one more smart contract instance will be included. Similarly, any invocation of the functions mint() and send() also generates a transaction because the state of the contract instance, which is a part of the whole blockchain, is going to be changed. As mentioned earlier, each transaction has to be validated by other participant nodes. This validation procedure is called mining, and the participant nodes validating transactions are called miners. To motivate the miners to execute and validate transactions on the blockchain, they are rewarded with ETH, which keeps the blockchain functioning. Thus, each transaction is charged a transaction fee to reward the miners. The transaction fee is the product of the gas price and the amount of gas. When one requests a transaction to be mined, he or she can decide the gas price to be paid. The higher the price, the more likely miners are to validate the transaction. The amount of gas is determined by the computation required by the transaction: the more computation a transaction requires, the more gas it is charged.
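As a small worked example of the fee computation, the sketch below multiplies a gas amount by a gas price. The units (wei, with 10^18 wei per ether) and the sample price of 20 * 10^9 wei are assumptions added for illustration; 21,000 is the gas charged for a plain ether transfer.

```python
WEI_PER_ETHER = 10**18


def transaction_fee(gas_used: int, gas_price_wei: int) -> int:
    """Fee in wei: the amount of gas consumed times the price paid per unit of gas."""
    return gas_used * gas_price_wei


# Hypothetical gas price chosen by the sender of the transaction.
fee = transaction_fee(gas_used=21_000, gas_price_wei=20 * 10**9)
print(fee, "wei =", fee / WEI_PER_ETHER, "ether")
```

Doubling the gas price doubles the fee for the same computation, which is the lever a sender has to make a transaction more attractive to miners.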

A smart contract instance has two memory areas to store data, namely, storage and memory. Storage is an array of storage slots, each of which is 32 bytes long and addressable by a 256-bit address. When a contract is created, its static data is allocated in storage starting from slot 0 and growing to higher slots. Dynamic data, such as a new element pushed into a dynamic array, is allocated based on a hash algorithm. Complex datatypes such as arrays or structs are aligned to a storage slot (32 bytes), with padding added to fill the necessary space. Primitive datatypes allocated in adjacent positions are packed together to save space. The storage/memory model is formally introduced in Section 3.

3 Formal Specification of Solidity

We build an abstract model of the Solidity semantics. These representations of rules omit the details in the rule formation in K-framework, but reveal the idea of the semantics from a general as well as abstract perspective.

3.1 Notations

3.1.1 Type Specification

Types are inductively defined in , as follows:

Implicitly, types in Solidity have associated a memory space indicating whether a variable, and hence an expression, is allocated in the storage or memory. To consider this, we extend the definition of type as a tuple , where refers to the storage and to the memory.

includes usual data type constructors of high level languages such as arrays and dynamic arrays represented by and , respectively. Structures are represented by . Keys for mappings are constrained to types included in type . The type defines -bit unsigned integers for . We do not provide specification for other primitive types other than because we can construct them based on . The type is not an explicit type in Solidity, but the compiler uses it implicitly when allocating in memory variables with complex data types. Solidity includes specific types such as contracts allocated in the ledger represented by the type .

3.1.2 State Specification

We represent the storage as , a function from addresses in the domain of positive numbers to bytes. Since static variables are allocated in the storage sequentially from address 0, we attach a next address to to indicate where a new declared variable is allocated into. In addition, is also associated with a name space and a type space , respectively mapping variables to memory addresses and types. We denote with or to represent that is mapped to in the name space or type space, respectively.

Memory, denoted by , is also a function from addresses to bytes. It is used to store local variables in functions. Similar to storage, is also associated with a name space and a type space . Differently from the spaces in the storage, the memory contains a stack of name and type spaces to model new scopes when calling a function. For simplicity, and access the top spaces in and we omit and when it is clear in the context.

We use to denote the storage and memory configuration of a smart contract instance, and is a stack of storage configuration for handling external function calls. We access the elements of with , , and . The overall configuration of a smart contract instance is denoted by , where is the set of program statements. After a smart contract instance is deployed on the blockchain, it is identified by a unique -bit address. We denote the configuration of the blockchain by , which is a function mapping , a -bit address of a contract instance, to its configuration , i.e., .

To access locations in and , we use to represent a value stored in position of bytes stored in . Thus, given a variable allocated in , the notation represents the stored value of . Function gives the number of bytes a datatype, considering packing and padding applied in Solidity during variable allocation. We provide a detailed specification in Appendix 0.D.

3.1.3 State Modifications in Rules

For a configuration , we use to denote the configuration after we apply changes on . For example, the following rule does two changes in : (1) it modifies the type space in the storage for the element to value , and (2) it modifies the name space in the storage for the element to value . Any other state component such as the storage , the memory , or elements different from in and are not changed.

3.2 Structural Operational Semantics of Solidity

We abstract the semantics of solidity in rules for statements, expressions, and types. Most of the statements and semantics in Solidity are very similar to those used in high level languages like Java and C. For space reasons, we focus on the semantic for the evaluation of expressions involving storage/memory access, e.g., variable access, arrays, and mapping. We also focus on statements for variable declaration and function calls. We define the rules inductively based on Solidity type constructors and statements.

We start with the semantics for instructions defining variables, represented by two VD rules, one for state variables allocated in storage and one for local variables allocated in memory. Notice that if a state variable is declared without an initial value, it is initialized to zero.

Let us take the example in Fig. 1 (a) to illustrate the rules for variable declarations. Here, we have two state variable declaration statements, for a and b, respectively. To allocate a, the rule performs six steps. (1) The rule calculates the value a is going to be initialized to, using the R-Value of the initializing expression in the current state. Note that an expression can be a function call, so its evaluation may have side effects and modify the state. (2-3) The rule updates the name space and type space for the declared variable. The type space is updated to the declared type, i.e., uint128 in this example. The name space is updated using an allocation function, which applies the Solidity rules for variable allocation to decide whether the declared variable must be allocated at the current free position or at the next position aligned to a slot boundary. If the declared variable has a complex datatype, it must be allocated at an address aligned to a slot boundary. Otherwise, the function moves to the next aligned address only if the current position plus the size of the datatype exceeds the next aligned address. In Solidity the alignment is 32 bytes. In the example, the next free address has the initial value of zero, which is already aligned. Note that this function refers only to storage variables, so the rule omits the storage qualifier when accessing the name and type spaces. (4) The next free address is updated by first aligning it if necessary and then increasing it by the size of the type of the variable, which in the example is 16 bytes, the size of uint128 (128/8). The rules that calculate the size of a type consider padding and packing of types (see Appendix D for more details). (5) The address where the variable is allocated receives the R-Value of the expression, which is 1. (6) The rule checks that the current id does not belong to the name space of the storage. After the allocation, variable a is allocated at the beginning of slot 0 and occupies 16 bytes, as shown in the blue box of Fig. 3 (a).

For state variable b, its type is uint256, which is the basic type of 256-bit unsigned integers. Since variable b requires 32 bytes, we are not able to allocate b before the next address aligned to 32 bytes, that is, within slot 0. Instead, we need to allocate it in the next storage slot. After the allocation, variable b is allocated at address 32 (the beginning of slot 1) and occupies 32 bytes, as shown in the red box of Fig. 3 (a). The auxiliary next-address variable is updated to 64 accordingly.

Let us look at another smart contract example in Fig. 1 (b). After allocating the state variable a, the auxiliary next-address variable becomes 16, as shown in the blue part of Fig. 3 (c). Notice that b is an array of two elements, each of which is an array of three uint128 integers. That is, b[0] = [1,2,3] and b[1] = [4,5,6]. To allocate the state variable b, rule Size, used in the allocation function, packs together the first two unsigned integers of the first dimension of the array and adds padding after the third so that the array is aligned to 32 bytes. The size of the first dimension of b is therefore 64 bytes and the total size of the type is 128 bytes. So, in total, four storage slots are allocated to b, as shown in the red part of Fig. 3 (c).
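The allocation steps walked through for Fig. 1 (a) and (b) can be checked numerically. The following Python sketch is an illustration under simplifying assumptions, not the paper's allocation rule: the names `align`, `array_size`, and `allocate` are hypothetical, only unsigned integers and fixed-size arrays are handled, and addresses are byte offsets from the start of storage.

```python
SLOT = 32  # bytes per storage slot


def align(addr: int) -> int:
    """Round addr up to the next slot boundary."""
    return -(-addr // SLOT) * SLOT


def array_size(elem_size: int, length: int) -> int:
    """Bytes used by a fixed-size array: small elements are packed into
    slots and the array as a whole is padded to a slot boundary."""
    if elem_size >= SLOT:
        return align(elem_size) * length
    per_slot = SLOT // elem_size
    slots = -(-length // per_slot)  # ceiling division
    return slots * SLOT


def allocate(next_addr: int, size: int, is_complex: bool):
    """Return (address of the new variable, updated next free address)."""
    crosses_slot = next_addr % SLOT + size > SLOT
    if is_complex or crosses_slot:
        next_addr = align(next_addr)
    return next_addr, next_addr + size


# Contract Test: uint128 a; uint256 b;
a_addr, nxt = allocate(0, 16, is_complex=False)
b_addr, nxt = allocate(nxt, 32, is_complex=False)
print(a_addr, b_addr, nxt)

# Contract Test2: uint128 a; uint128[3][2] b;
a2_addr, nxt2 = allocate(0, 16, is_complex=False)
b2_size = array_size(array_size(16, 3), 2)
b2_addr, nxt2 = allocate(nxt2, b2_size, is_complex=True)
print(a2_addr, b2_addr, b2_size // SLOT)
```

In the first contract b cannot share slot 0 with a because 16 + 32 exceeds the slot size; in the second, b is moved to slot 1 because it is a complex type, and its 128 bytes span four slots.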

The VD rule for local variables is similar to the rule for state variable declarations, except that the target is memory instead of storage and that we allocate the position and the R-Value of the expression in a fresh location. Function fr updates id in the name space and type space of the memory with a fresh address in memory and the declared type, respectively. Additionally, fr copies the R-Value of the expression to the new location. After all the state variables are declared, whenever they need to be evaluated in statements, the following two rules are applicable to get their L-values and R-values.

Rule E_RV returns the value of address addr in the configuration obtained after the l-valuation of expression exp in the state . The accessed memory space, i.e., M or , is calculated from the type of exp. This behaviour is abstracted in the function ST.


Figure 3: State Variable Declaration (panels (a) to (d))

Let us look at function foo() in Fig. 1 (a). Here, we declare a local array d, whose type is uint256[2] storage. Notice that, in Solidity, if an array is declared in a function without specifying the area (either storage or memory), its default area is storage. However, since we did not initialize the array d, it references storage slot 0 by default. Thus, changing the values of d[0] and d[1] overwrites the content of storage slots 0 and 1, as shown in the red part of Fig. 3 (b). That is why the values of a and b are changed to 7 and 8 after the execution of function foo(). Similarly, the local array d in function foo2() in Fig. 1 (b) references storage slot 0 by default as well. Thus, changing d[1] and d[2] overwrites the content of storage slots 1 and 2, as shown in the red part of Fig. 3 (d). That is why the global array b becomes [0,10,0] after function foo2() is executed. Arrays are evaluated inductively by corresponding rules.
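Setting the formal rules aside, the overwrite described for foo() in Fig. 1 (a) can be replayed with a small simulation. This is a sketch under assumptions: storage is modeled as a Python dict from slot number to a 256-bit integer, the helper names are hypothetical, and a is read from the low-order 128 bits of slot 0, where Solidity places the first item of a slot.

```python
# Storage of contract Test after deployment:
# slot 0 holds a (uint128 = 1), slot 1 holds b (uint256 = 2).
storage = {0: 1, 1: 2}


def read_a(storage):
    # a occupies 16 bytes of slot 0; modeled as the low-order 128 bits.
    return storage[0] & (2**128 - 1)


def read_b(storage):
    # b fills slot 1 entirely.
    return storage[1]


def write_uint256_element(storage, base_slot, index, value):
    """Each uint256 element fills one whole slot, so element i of an
    array based at base_slot lives in slot base_slot + i."""
    storage[base_slot + index] = value % 2**256


# foo(): the uninitialized storage array d references slot 0 by default.
d_base = 0
write_uint256_element(storage, d_base, 0, 7)   # d[0] = 7 lands on slot 0
write_uint256_element(storage, d_base, 1, 8)   # d[1] = 8 lands on slot 1
print(read_a(storage), read_b(storage))
```

The two writes never touch a dedicated location for d; they land directly on the slots that hold the state variables, which is the behaviour shown in Fig. 3 (b).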

    contract Test3 {
       uint256[] a;

       function foo3() public {
          a.push(10);
          a.push(11);
       }
    }
(a)

    contract Test4 {
       mapping(uint=>uint) a;

       function foo4() public {
          a[100] = 10;
          a[200] = 11;
       }
    }
(b)
Figure 4: Dynamic Arrays and Mapping

Figure 5: Dynamic Array (panels (a) and (b))

Figure 6: Mapping (panels (a) and (b))

In addition, Solidity provides two special data structures, dynamic arrays and mappings, whose allocation is based on hash functions. Fig. 4 (a) shows a simple contract using a dynamic array a. When a dynamic array is declared, it has no elements, and one storage slot is allocated to it as the base slot to store the number of elements it has so far, as shown in Fig. 5 (a). We can push elements into a dynamic array, e.g., function foo3() in Fig. 4 (a) pushes two integers, 10 and 11, into the dynamic array a. The unique characteristic of dynamic arrays is that the location of a pushed element is decided by a hash function, denoted by HASH. The first element is stored in the storage slot obtained by applying HASH to the base slot number of the dynamic array padded to 32 bytes. The second element is stored in the slot immediately after it, and so on. Fig. 5 (b) shows the locations of the two elements pushed into a. Notice that the value of the base slot is updated to 2 because a now has two elements. Dynamic arrays are evaluated by the following rule. We use rule exp to check that the current expression is indeed a dynamic array. First, we obtain the L-value of the base expression to calculate the address where the expression is allocated. Second, we access its value to check that the R-Value of the index expression is within the number of elements allocated in the dynamic array. With this information we calculate the final address from the hash of the base address and the accessed index.
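The slot computation for dynamic arrays can be sketched as follows. This is an illustration only: Ethereum uses Keccak-256 for HASH, which Python's standard library does not provide, so `hashlib.sha3_256` is used here as a stand-in that produces different slot numbers than a real node would. The helper names and the dict-based storage are hypothetical, and one element per slot is assumed (as for uint256).

```python
import hashlib


def pad32(n: int) -> bytes:
    """Encode n as a 32-byte big-endian word."""
    return n.to_bytes(32, "big")


def HASH(data: bytes) -> int:
    # Stand-in only: NOT Keccak-256, so the slot numbers differ from the EVM's.
    return int.from_bytes(hashlib.sha3_256(data).digest(), "big")


def element_slot(base_slot: int, index: int) -> int:
    """Slot of element `index`: HASH(padded base slot) plus the index."""
    return (HASH(pad32(base_slot)) + index) % 2**256


def push(storage: dict, base_slot: int, value: int) -> None:
    length = storage.get(base_slot, 0)          # base slot holds the length
    storage[element_slot(base_slot, length)] = value
    storage[base_slot] = length + 1


def read(storage: dict, base_slot: int, index: int) -> int:
    if index >= storage.get(base_slot, 0):      # bounds check against the length
        raise IndexError("index out of range")
    return storage[element_slot(base_slot, index)]


storage = {}
push(storage, 0, 10)   # a.push(10)
push(storage, 0, 11)   # a.push(11)
print(storage[0], read(storage, 0, 0), read(storage, 0, 1))
```

The bounds check in `read` corresponds to the second step of the rule, where the index is compared with the length stored in the base slot before the final address is computed.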

Fig. 4 (b) shows another contract using a mapping a, which maps an unsigned integer to another. After it is declared, one storage slot is allocated to it as the base slot, but nothing is stored there, as shown in Fig. 6 (a). We can add key/value pairs to a mapping, e.g., function foo4() in Fig. 4 (b) adds the two key/value pairs (100, 10) and (200, 11), where 100 and 200 are keys and 10 and 11 are their corresponding values. The unique characteristic of mappings is that the location of a value is decided by the hash function as well. For a key/value pair of a mapping, the value is stored in the storage slot obtained by applying HASH to the concatenation of the key, padded to 32 bytes, and the base slot number of the mapping. Fig. 6 (b) shows the locations of the two values 10 and 11. The evaluation of mappings is performed by a corresponding rule.
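Independently of the formal rule, the slot computation for mappings can be sketched in a few lines. As before, this is an illustration under assumptions: `hashlib.sha3_256` stands in for Ethereum's Keccak-256 (so the resulting slot numbers are not those of a real node), and `pad32`, `HASH`, `value_slot`, and the dict-based storage are hypothetical names.

```python
import hashlib


def pad32(n: int) -> bytes:
    """Encode n as a 32-byte big-endian word."""
    return n.to_bytes(32, "big")


def HASH(data: bytes) -> int:
    # Stand-in only: NOT Keccak-256, so the slot numbers differ from the EVM's.
    return int.from_bytes(hashlib.sha3_256(data).digest(), "big")


def value_slot(base_slot: int, key: int) -> int:
    """Slot of the value for `key`: HASH(padded key ++ padded base slot)."""
    return HASH(pad32(key) + pad32(base_slot))


BASE = 0       # base slot of mapping a in Test4
storage = {}   # nothing is ever written to the base slot itself

storage[value_slot(BASE, 100)] = 10   # a[100] = 10
storage[value_slot(BASE, 200)] = 11   # a[200] = 11

# Keys that were never written read as zero.
print(storage.get(value_slot(BASE, 100), 0),
      storage.get(value_slot(BASE, 200), 0),
      storage.get(value_slot(BASE, 300), 0))
```

Because the base slot is part of the hashed input, two mappings declared in different slots place the value for the same key in different locations.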

The semantics of an internal function call is captured by the rules I-FUN and E-FUN, which model a function call used as an instruction and as an expression, respectively. Every internal function call has its own name and type spaces to store local variables, arguments, and return values. Thus, fresh name and type spaces are pushed onto the stack in memory, and the internal function call is rewritten into a sequence of memory variable declaration statements that link arguments with function parameters, followed by the declaration of the return variable, if any, and the function body. When behaving as an expression, the function call returns the L-Value of the return variable, whose value is set by the execution of the return instruction. The evaluation of both the instruction and the expression removes the name and type spaces from the stack that the memory contains.
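The push/bind/execute/pop discipline described above can be sketched as follows. This is a loose illustration, not the I-FUN or E-FUN rule: type spaces are omitted, the call returns the R-value rather than the L-value of the return variable, and the class `Memory`, the function `call`, and the reserved name `$ret` are hypothetical.

```python
class Memory:
    """Memory cells plus a stack of name spaces (one per active call)."""

    def __init__(self):
        self.cells = {}       # address -> value
        self.scopes = [{}]    # stack of name spaces: identifier -> address
        self.next_free = 0

    def declare(self, name, value):
        if name in self.scopes[-1]:
            raise NameError(f"{name} already declared in this scope")
        self.scopes[-1][name] = self.next_free   # fresh location
        self.cells[self.next_free] = value
        self.next_free += 1

    def assign(self, name, value):
        self.cells[self.scopes[-1][name]] = value

    def lookup(self, name):
        return self.cells[self.scopes[-1][name]]


def call(memory, params, args, body):
    """Run `body` in a fresh name space and return the return variable's value."""
    memory.scopes.append({})              # push a fresh name space
    try:
        for p, a in zip(params, args):    # link arguments with parameters
            memory.declare(p, a)
        memory.declare("$ret", 0)         # return variable, zero-initialized
        body(memory)
        return memory.lookup("$ret")
    finally:
        memory.scopes.pop()               # remove the name space again


def add_body(m):
    # Plays the role of: return x + y;
    m.assign("$ret", m.lookup("x") + m.lookup("y"))


mem = Memory()
mem.declare("x", 100)                     # caller's own x, in the outer scope
result = call(mem, ["x", "y"], [2, 3], add_body)
print(result, mem.lookup("x"), len(mem.scopes))
```

The caller's `x` is unaffected by the callee's parameter of the same name, because the callee only ever sees the name space on top of the stack.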

The rules for executing the statements in the function body are similar to those used in high-level programming languages and can be found in Appendix C. Here, we only highlight some important rules. For the return statement, we obtain the R-Value of the returned expression and assign it to the return variable declared when calling the function. The returned value is therefore available to the caller after returning from the call when the function is used in an expression.

The semantics of external function calls is captured by two E-FUN rules. The first is applicable when one contract instance calls an external function of another contract instance located at a given address. After looking up the configuration of the callee contract, we push the caller’s configuration onto the callee’s configuration stack so that the caller’s configuration can be restored when the external function call is finished. Then, the external function call is translated into an internal function call under the callee’s configuration. The second rule is applicable when one contract instance just sends ether to another contract instance at a given address without calling any function. In this case, the fallback function of the callee contract is invoked.

4 Solidity Semantics in K-framework

In this section, we introduce the Solidity semantics we have implemented in the K-framework [5] by illustrating some important rules. This implementation reflects the formal specification introduced in Section 3 and involves over 200 rules. The K definition of the Solidity semantics takes up more than 2000 lines and consists of three main parts, namely syntax, configuration and rules. The syntax can be found in [7], and the configuration of the semantics is given in Appendix A, so we do not explain these parts in detail when presenting the rules. Due to space limitations, we only show some important rules here.

RULE Elementary-TypeName

< pcsContractPart(C:Id, X:ElementaryTypeName Y:Specifiers Z:Id = E;) .K >

< < C > < N:Int N +Int 1 > < .Map N Z > < .Map Z !Num:Int > < .Map Z X > < .Map !Num E > >

Let us start with the state variable declaration for elementary type names (shown in Rule Elementary-TypeName). When there is a state variable declaration for elementary type names in contract parts, we take a record of it in the cell. The number of variables will be increased by one (In cell). The symbol !Num means generating a fresh integer number as the address of the variable. The two pairs: (1) Z to its address !Num and (2) the address !Num to its value E are added to the and cells, respectively.

RULE Function-Definition

< pcsContractPart(C:Id, function F:Id (Ps1:Parameters) FQ:FunQuantifiers returns (Ps2:Parameters)B) .K >

< C > < .Map F CF >

< CF:Int CF +Int 1 >

.Bag < < CF > < Ps1 > < Ps2 > < FQ > < true > < 0 > < B > >

Rule Function-Definition is how we deal with function definitions. A mapping from the function name to function Id is generated in the cell cfunction, which enables us to identify each function by using function Id. A bag of the cell function containing function Id, input and output parameters, function quantifiers, function body, etc, is created for this function definition. In this way, we can retrieve the details of this function in the cell function.

RULE Internal-Function-Call

< functionCall(F:Id ; Es:Values) FunQs(FQ,F) Call(F,Es) >

< ListItem(CI:Int) >

< < CI > < Cn:Id > >

< < < Cn > < FCT:Int > > >

< < CT > < Ps > < FQ > < Con > < B > >

As for internal function call (shown in Rule Internal-Function-Call), we need to process the function quantifiers first, and then Call which deals with the execution of the function body. The function Id is obtained from the Id of current contract instance and the name of the contract defining the function, to retrieve the details of this function.

RULE Call

< Call(F:Id,Es:Values) BindParam(Ps,Es) if(Con)B >

< ListItem(CI:Int) >

< < CI > < Cn:Id > >

< < < Cn > < FCT:Int > > >

< < CT > < Ps > < FQ > < Con > < B > >

Rule Call deals with the execution of function body. It first binds the parameters of function call in current execution environment. After that, the body of the function is executed with a condition specified by the function quantifiers. If there is no modifier invocation in the function quantifiers, the condition is always true.

RULE External-Function-Call

< functionCall(C:Int ; F:Id ; Es:Values ; M:Msg) createTransaction(L) functionCall(F ; Es)returnContext(C) >

< (.List ListItem(C)) L:List ListItem(-1) >

< M1M > < (.List ListItem(M1)) >

< (.List => ListItem(F)) >

Rule External function call is associated with transactions. The input parameters of External function call are the Id of the contract instance to be called, the name and parameters of the function and Msg which contains information about this transaction. The number of transactions that have been executed is counted in createTransaction(L), followed by an internal function call and context return. Meanwhile, the Id of contract instance, current Msg and the function to be called are stored in the corresponding stacks, and the cell Msg is updated.

5 Evaluation

We evaluate the proposed Solidity semantics from two perspectives: the first one is its coverage, and the second is the ability to detect vulnerabilities in smart contracts. Our test set is obtained from  [7]. In Section 5.1, we show that the proposed Solidity semantics covers most of the important semantics specified by the official Solidity document [7] and is consistent with the official Solidity compiler [6]. In Section 5.2, we show that some variants of DAO attacks can be detected by using the proposed semantics, which facilitates the verification of smart contracts.

5.1 Coverage and Testing

We evaluate and test the proposed Solidity semantics by using the official Solidity compiler Remix [6]. The evaluation is done by manually comparing the results of our implementation in K-framework with the results of the Remix compiler. We consider the proposed semantics is correct if the result is consistent with that of the Remix compiler. We list the coverage of our Solidity semantics in Table 1 from a variety of perspectives specified by the syntax provided by the official Solidity document[7].

Perspectives                        Coverage
Syntax
  Basic Syntax                      FC
  Hex Number/Hex Literal            N
  Assembly                          N
  Using For                         N
  Event                             N
  Inheritance                       N
Storage
  Elementary TypeName
    address                         FC
    bool                            FC
    string                          FC
    var                             FC
    int256                          FC
    Other Int Size                  N
    uint256                         FC
    Other Uint Size                 N
    Byte                            N
    Fixed                           N
    Ufixed                          N
  User Defined TypeName             P
  Mapping                           FC
  Array TypeName                    FC
  Function TypeName                 N
Functions
  Function Definition
    Constructor                     FC
    Normal Functions                FC
    Fallback Functions              FC
    Modifier                        FC
    StateMutability                 N
    Specifier                       N
  Function Call
    Internal Function Call          FC
    External Function Call          FC
Statements
  If Statement                      FC
  While Statement                   FC
  For Statement                     FC
  Block                             FC
  Inline Assembly Statement         N
  Do While Statement                FC
  Place Holder Statement            FC
  Continue                          N
  Break                             N
  Return                            FC
  Throw                             N
  Simple Statement                  FC
Expressions
  Bitwise Operations                N
  Other Expressions                 FC
  • FC: Fully Covered and Consistent with Solidity IDE

  • P: Partially Covered and Consistent with Solidity IDE for Covered Parts

  • N: Not Covered

Table 1: Coverage of The Proposed Solidity Semantics

From Table 1, we can observe that the proposed Solidity semantics covers most of the syntax except hex numbers and literals, and Solidity assembly code. As for storage, our semantics implementation in K-framework covers the following elementary types: address, bool, string, var, int256 and uint256. User-defined types are partially covered, including struct and contract instances. Mappings and arrays are covered, while function types are not. In addition, most of the semantics associated with functions are covered, except state mutability and specifiers, which our current implementation ignores during execution. Furthermore, a majority of statements and expressions are covered. All covered parts of the semantics are considered correct, since the execution behaviours involved are consistent with the official Solidity compiler.

Although our semantics implementation in K-framework is not yet complete, it covers most of the semantics used in smart contracts. In particular, the part of the semantics in which the vulnerabilities of smart contracts lie is already covered. Taking the DAO attack (cf. Appendix 0.B) as an example, its two vulnerabilities, reentrancy and call to the unknown, are mainly associated with the semantics of function calls. The uncovered parts can either be ignored or be transformed into semantics that are covered, so that the missing semantics does not have a large impact on the execution behaviours. Thus, our implementation of the Solidity semantics can be used in the verification of smart contracts.

5.2 Detecting DAO Attacks

We briefly introduce the DAO attack in Appendix 0.B, where interested readers can find the details. We evaluate four variants of DAO attacks using our implementation of the Solidity semantics in K-framework. We simulate the behaviour of users on the blockchain with a Main contract in which transactions are generated. Note that the mining process on the blockchain is not modeled. The evaluation shows that these DAO attacks can be fully executed, and that the non-reentrant behaviour can be detected from the values of cells in the configuration. The former indicates that our implementation of the Solidity semantics is executable, while the latter shows that some vulnerabilities in smart contracts can be detected with the executable semantics, which contributes to the verification of security properties in smart contracts.
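One way such a check on the configuration could look is sketched below in Python. The procedure is an assumption for illustration: it supposes that the contract and function stacks can be read out as a list of (contract instance Id, function name) frames, and it flags any frame that occurs more than once, which means the function was entered again before its earlier invocation returned. The frame values in the example are hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[int, str]  # (contract instance Id, function name)


def reentered_frames(call_stack: List[Frame]) -> List[Frame]:
    """Return the frames that appear more than once in the call stack."""
    seen = set()
    repeated: List[Frame] = []
    for frame in call_stack:
        if frame in seen and frame not in repeated:
            repeated.append(frame)
        seen.add(frame)
    return repeated


# Hypothetical snapshot taken during the attack: Bank (Id 1) is executing
# withdraw, which triggered the fallback of Attack (Id 2), which called
# withdraw on Bank again.
snapshot = [(1, "withdraw"), (2, "fallback"), (1, "withdraw")]
suspicious = reentered_frames(snapshot)

# A snapshot of a benign call chain contains no repeated frame.
benign = reentered_frames([(1, "deposit"), (3, "log")])
```

Here suspicious contains the withdraw frame of the Bank instance, while benign is empty.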

6 Related Works

K-framework (K) [17] is a rewriting-logic-based framework for defining formal executable semantics. The semantics of various programming languages have been defined in K, such as Java [10], C [13, 12], and JavaScript [14]. In particular, an executable semantics of the EVM (Ethereum Virtual Machine), the bytecode language of smart contracts, has been created in K-framework [9]. K backends, such as the Isabelle theory generator, model checker, and deductive verifier, can be used to prove properties of the semantics and to construct verification tools. For instance, K provides pre- and post-condition verification using Matching Logic [15]. The Reachability Logic prover in K can also be used to verify properties specified as reachability claims. In fact, K aims to provide a semantics-based program verifier for all languages [11].

7 Conclusion And Future Work

In this paper, we introduce our executable operational semantics of Solidity in K-framework. We present an abstract model of the semantics and illustrate some important rules implemented in K-framework. Experimental results show that our Solidity semantics already covers most of the semantics specified by the official Solidity documentation [7], and that the covered semantics are consistent with the official Solidity compiler [6]. Furthermore, we show that our semantics can be used to verify certain properties of smart contracts.

For future work, we plan to complete the Solidity semantics in K-framework so that it covers all the features of Solidity. Additionally, we plan to address the verification of attacks on Solidity contracts by identifying different kinds of vulnerabilities in smart contracts [8] and constructing verification properties against attacks.

References

  • [1] Yoichi Hirai. Defining the Ethereum Virtual Machine for Interactive Theorem Provers. Financial Cryptography and Data Security 2017.
  • [2] Sidney Amani, Myriam Bégel, Maksym Bortin and Mark Staples. Towards Verifying Ethereum Smart Contract Bytecode in Isabelle/HOL. Proceedings of the 7th ACM SIGPLAN International Conference on Certified Programs and Proofs
  • [3] Kevin Delmolino, Mitchell Arnett, Ahmed E. Kosba, Andrew Miller and Elaine Shi. Step by Step Towards Creating a Safe Smart Contract: Lessons and Insights from a Cryptocurrency Lab. Financial Cryptography and Data Security - FC 2016 International Workshops, BITCOIN, VOTING, and WAHC, Christ Church, Barbados, February 26, 2016, Revised Selected Papers.
  • [4] Understanding the DAO attack. http://www.coindesk.com/understanding-dao-hack-journalists/.
  • [5] K-framework. http://www.kframework.org/index.php/.
  • [6] Remix - Solidity IDE. http://remix.readthedocs.io/en/latest/.
  • [7] Solidity 0.4.20 documentation. https://solidity.readthedocs.io/en/develop/
  • [8] Nicola Atzei, Massimo Bartoletti and Tiziana Cimoli. A Survey of Attacks on Ethereum Smart Contracts (SoK). Principles of Security and Trust - 6th International Conference.
  • [9] Everett Hildenbrandt, Manasvi Saxena, Xiaoran Zhu, Nishant Rodrigues, Philip Daian, Dwight Guth and Grigore Rosu. KEVM: A Complete Semantics of the Ethereum Virtual Machine. http://hdl.handle.net/2142/97207.
  • [10] Denis Bogdănaş and Grigore Roşu. K-Java: A Complete Semantics of Java. Proceedings of the 42nd Symposium on Principles of Programming Languages (POPL’15).
  • [11] Andrei Ştefănescu, Daejun Park, Shijiao Yuwen, Yilong Li and Grigore Roşu. Semantics-Based Program Verifiers for All Languages. Proceedings of the 31st Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’16).
  • [12] Chucky Ellison and Grigore Rosu. An Executable Formal Semantics of C with Applications. Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’12).
  • [13] Chris Hathhorn, Chucky Ellison and Grigore Roşu. Defining the Undefinedness of C. Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’15).
  • [14] Daejun Park, Andrei Ştefănescu and Grigore Roşu. KJS: A Complete Formal Semantics of JavaScript. Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’15).
  • [15] Grigore Roşu. Matching logic. Logical Methods in Computer Science, 13(4):1-61, December 2017.
  • [16] Loi Luu, Duc-Hiep Chu, Hrishi Olickel, Prateek Saxena and Aquinas Hobor. Making Smart Contracts Smarter. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016.
  • [17] Grigore Roşu and Traian Florin Şerbănuţă. An Overview of the K Semantic Framework. Journal of Logic and Algebraic Programming, 79(6):397-434, 2010.
  • [18] The DAO raises more than 117 million dollars in world’s largest crowd funding to date. https://bitcoinmagazine.com/articles/the-dao-raises-more-than-million-in-world-s-largest-crowdfunding-to-date-1463422191.

Appendix 0.A Solidity Configuration

⟨
  ⟨
    ⟨
      ⟨ $PGM:SourceUnit ⟩ ⟨ Map ⟩ ⟨ ListItem(-1) ⟩ ⟨ List ⟩ ⟨ List ⟩
    ⟩
  ⟩
  ⟨ Map ⟩ ⟨ 1:Int ⟩
  ⟨
    ⟨ 0:Int ⟩
    ⟨
      ⟨ K ⟩ ⟨ (-1):Int ⟩ ⟨ Map ⟩ ⟨ 0:Int ⟩ ⟨ Map ⟩ ⟨ Map ⟩ ⟨ Map ⟩ ⟨ Map ⟩ ⟨ Map ⟩ ⟨ Map ⟩ ⟨ 1:Int ⟩ ⟨ Map ⟩
    ⟩
  ⟩
  ⟨
    ⟨
      ⟨ 0:Int ⟩ ⟨ Parameters ⟩ ⟨ Parameters ⟩ ⟨ K ⟩ ⟨ K ⟩ ⟨ K ⟩ ⟨ true ⟩
    ⟩
    ⟨ 0:Int ⟩
  ⟩
  ⟨
    ⟨
      ⟨ (-1):Int ⟩ ⟨ K ⟩ ⟨ 0:Int ⟩ ⟨ Map ⟩ ⟨ Map ⟩ ⟨ Map ⟩ ⟨ Map ⟩ ⟨ Map ⟩
    ⟩
    ⟨ 0:Int ⟩
  ⟩
  ⟨
    ⟨ 1:Int ⟩ ⟨ 0 |-> "Main" ⟩
  ⟩
  ⟨ K ⟩ ⟨ List ⟩
⟩

Appendix 0.B DAO Attack

DAO [4] is a contract that implements a platform for crowd-funding. As reported, about 60M USD could be taken under the control of the attacker in the DAO attack [18], which had a huge financial impact. A simplified version of the DAO attack is shown in Fig. 7. The smart contract Bank is used to collect funding from different clients. A client can invoke the function deposit() to deposit ETH to its account in the Bank contract, or invoke the function withdraw() to withdraw its credit. The malicious contract Attack can be used to steal ETH from the contract Bank. Let us assume that the contract Bank has accumulated a certain amount of ETH.

The attack can be launched as follows. First, the contract Attack is created and deployed, with its state variable target set in its constructor to point to the victim Bank. After that, the attacker invokes the function addToBalance() to deposit 2 wei (wei is the smallest unit of ETH; 1 ETH = 10^18 wei) to the contract Bank. As a result, the balance of Bank is increased by 2 wei. Since the contract Attack has just been created and deployed on the blockchain, its initial credit in Bank is 0. After the deposit, the credit of the sender, in this case the contract Attack, becomes 2 wei. Subsequently, the attacker invokes the function withdrawBalance() of contract Attack to withdraw 2 wei from the Bank contract. In the function withdraw() of contract Bank, the amount to be withdrawn is first sent to contract Attack (line 15), and only then deducted from the credit of contract Attack (line 16). When contract Attack receives the withdrawn amount (due to line 15 of Bank), its fallback function (lines 16-18) is invoked. Inside its fallback function, it maliciously invokes the withdraw() function of Bank again. At this point, the amount to be withdrawn has not yet been deducted from Attack’s credit (line 16 of Bank), so the condition checked in line 14 of Bank still holds. Thus, contract Attack is able to withdraw money from contract Bank recursively until the balance of Bank becomes zero.

The vulnerability comes from the fact that the withdraw() function of contract Bank is not reentrant, owing to the wrong order of lines 15 and 16. The amount to be withdrawn should be deducted from the credit first and only then sent to the withdrawer. If we swap lines 15 and 16 of contract Bank, the function withdraw() becomes reentrant.

 1 contract Bank {
 2   mapping(address=>uint) credit;
 3
 4   function getUserBalance(address user)
 5     constant returns(uint) {
 6     return credit[user];
 7   }
 8
 9   function deposit() payable{
10     credit[msg.sender] += msg.value;
11   }
12
13   function withdraw(uint amount){
14     if(credit[msg.sender] >= amount){
15       msg.sender.call.value(amount)();
16       credit[msg.sender] -= amount;
17     }
18   }
19 }
(a) The Bank Smart Contract

 1 contract Attack {
 2   Bank target;
 3
 4   function Attack(address addr){
 5     target = Bank(addr);
 6   }
 7
 8   function addToBalance(){
 9     target.deposit.value(2)();
10   }
11
12   function withdrawBalance(){
13     target.withdraw(2);
14   }
15
16   function() payable{
17     target.withdraw(2);
18   }
19 }
(b) The Attack Smart Contract
Figure 7: DAO Attack
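The attack on the contracts of Fig. 7 can be replayed in a small Python model, given here as a sketch rather than as a faithful EVM simulation. The following are assumptions of the model: gas limits are ignored, the attacker's own funding of the 2 wei deposit is not tracked, a transfer from an underfunded Bank simply fails without raising an error, and Python integers go negative where a Solidity uint would wrap around. The flag safe_order, the helper run_attack, and the 10 wei deposited by an honest client are hypothetical.

```python
class Bank:
    def __init__(self, safe_order: bool = False):
        self.credit = {}
        self.balance = 0
        self.safe_order = safe_order  # True: deduct the credit before sending

    def deposit(self, sender, value: int) -> None:
        self.credit[sender] = self.credit.get(sender, 0) + value
        self.balance += value

    def withdraw(self, sender, amount: int) -> None:
        if self.credit.get(sender, 0) >= amount:      # check of line 14
            if self.safe_order:
                self.credit[sender] -= amount         # line 16 moved first
                self._send(sender, amount)            # then line 15
            else:
                self._send(sender, amount)            # line 15
                self.credit[sender] -= amount         # line 16

    def _send(self, receiver, amount: int) -> bool:
        if self.balance < amount:
            return False                              # transfer fails silently
        self.balance -= amount
        receiver.receive(amount)                      # triggers the fallback
        return True


class Attack:
    def __init__(self, target: Bank):
        self.target = target
        self.balance = 0

    def add_to_balance(self) -> None:
        self.target.deposit(self, 2)

    def withdraw_balance(self) -> None:
        self.target.withdraw(self, 2)

    def receive(self, amount: int) -> None:
        """Fallback function: re-enters Bank.withdraw on every payment."""
        self.balance += amount
        self.target.withdraw(self, 2)


def run_attack(safe_order: bool):
    bank = Bank(safe_order)
    bank.deposit("honest client", 10)   # funds accumulated before the attack
    attacker = Attack(bank)
    attacker.add_to_balance()
    attacker.withdraw_balance()
    return bank.balance, attacker.balance


vulnerable_result = run_attack(safe_order=False)
fixed_result = run_attack(safe_order=True)
```

With the original statement order, the model drains the Bank completely, so the attacker ends up with the 10 wei of the honest client in addition to its own 2 wei. With the two statements swapped, the second entry into withdraw() fails the credit check and the attacker recovers only its own deposit.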

Appendix 0.C Rules of Statements

Appendix 0.D Rules of Evaluations

[SR1]SizeR n ⟨⟩= n
[SR2]SizeR n uint_m@Tl = m
[SR3]SizeR n T@Tl = m
[Size1]Size uint_m = 2^m-3
[Size2]Size T[n] = ⌈( n*(Size T))⌉^l
[Size3]Size T@Tt = ⌈n⌉^l
[Size4]Size T[] = l
[Size5]Size map K T = l
[Size6]Size Call fid() = Size T
[Size7]Size T ref = l
[Type1]Type_σ exp[exp_i] =T
[Type2]Type_σ exp.k = T_k
[Type3]Type_σ id = σ_t v
[Type4]Type_σ map_acc exp exp_i = T
[Type5]Type_σ Call fid() = T
[Type6]Type_σ map_acc exp exp_i = T
[Type7]Type_σ exp[exp_i] =T
[Type8]Type_σ exp.k = T_k

Accessing bytes memory :