An Empirical Study of Ownership, Typestate, and Assets in the Obsidian Smart Contract Language

03/27/2020 ∙ by Michael Coblenz, et al. ∙ Carnegie Mellon University

Some blockchain programs (smart contracts) have included serious security vulnerabilities. Obsidian is a new typestate-oriented programming language that uses a strong type system to rule out some of these vulnerabilities. Although Obsidian was designed to promote usability to make it as easy as possible to write programs, strong type systems can cause a language to be difficult to use. In particular, ownership, typestate, and assets, which Obsidian uses to provide safety guarantees, have not seen broad adoption in popular languages and result in significant usability challenges. We performed an empirical study with 20 participants comparing Obsidian to Solidity, which is the language most commonly used for writing smart contracts today. We observed that most of the Obsidian participants were able to successfully complete most of the programming tasks we gave them. We also found that asset-related bugs, which Obsidian detects at compile time, were commonly accidentally inserted by the Solidity participants. We identified potential opportunities to improve the usability of typestate as well as to apply the usability benefits of Obsidian's ownership system to other languages.

1. Introduction

Obsidian (Coblenz et al., 2019b) is a new programming language for writing smart contracts (Szabo, 1997), which are programs for blockchain platforms. Blockchains aim to provide a trusted computing medium among users who have not necessarily established trust. By decentralizing computation, users can establish strong security properties. Unfortunately, these properties can only be obtained when the smart contracts themselves implement the intended behavior. Through bugs and security vulnerabilities in blockchains, attackers have stolen millions of dollars’ worth of cryptocurrencies (Sirer, 2016; Graham, 2017). Most public smart contracts have been written in Solidity (Ethereum Foundation, 2020b), a domain-specific language.

To improve security, researchers have developed new languages with stronger security properties. In this work, we focus on Obsidian, which was designed to provide safety properties that would be relevant to many blockchain applications (Coblenz et al., 2019c). In addition, Obsidian was designed to be usable through iterative user testing with 44 participants (Coblenz et al., 2019a) so that programmers would be able to use it effectively. In prior work, Coblenz et al. showed that most of the participants in a six-participant qualitative study were able to complete relevant programming tasks (Coblenz et al., 2019a). However, that small study did not compare Obsidian to any other languages, so we wanted to assess whether Obsidian is an improvement over Solidity. We are only aware of two other languages (HANDS and Quorum) that have been evaluated empirically in a summative evaluation, and those had relatively conventional type systems. A key contribution of this work, then, is showing how to conduct a quantitative user evaluation of a programming language with a type system that is unfamiliar to most available participants.

    asset contract Medicine {}

    asset contract Pharmacy {
        Medicine@Owned med;

        transaction getNewMedicine(Medicine@Owned >> Unowned m)
        {
            med = m;
        }
    }

Figure 1. An example practice question on assets from the Obsidian tutorial, showing the correct answer.

Obsidian uses ownership types (Clarke et al., 1998) to express in the type system that each asset (an object that has value and therefore should not be lost) has exactly one owning reference. Ownership may be transferred between references, but if the owning reference goes out of scope, then the compiler reports an error. This ensures that owning references to assets cannot be lost unless they are explicitly disowned by the programmer.
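As a rough analogy only, the single-owner discipline can be sketched in Rust, whose move semantics also invalidate the source of a transfer. This is not Obsidian code, and the two languages differ in an important way: Rust lets an owned value go out of scope silently, whereas Obsidian reports an error when an owning reference to an asset is lost. The `Medicine` and `Pharmacy` names follow Fig. 1; the `receive` method, the `Option` field, and `transfer_demo` are assumptions made for illustration.

```rust
/// An asset-like value. It deliberately implements neither `Clone` nor
/// `Copy`, so there is never more than one owner at a time.
#[derive(Debug)]
struct Medicine {
    name: String,
}

struct Pharmacy {
    med: Option<Medicine>,
}

impl Pharmacy {
    /// Takes `m` by value, so ownership moves from the caller to the pharmacy.
    /// If the pharmacy already holds a medicine, the old one is handed back
    /// to the caller rather than being silently overwritten.
    fn receive(&mut self, m: Medicine) -> Option<Medicine> {
        self.med.replace(m)
    }
}

fn transfer_demo() -> (Pharmacy, Option<Medicine>) {
    let mut pharmacy = Pharmacy { med: None };
    let aspirin = Medicine { name: String::from("aspirin") };
    let previous = pharmacy.receive(aspirin);
    // `aspirin` has been moved. Using it here would be rejected at compile time:
    // println!("{:?}", aspirin); // error[E0382]: borrow of moved value
    (pharmacy, previous)
}
```

Returning the displaced value from `receive` is a design choice of this sketch: it approximates by convention what Obsidian enforces in the type system, namely that an owned asset must go somewhere.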

Obsidian extends ownership types with typestate (Aldrich et al., 2009) because blockchain programs are typically stateful with objects supporting different operations depending on their state (Ethereum Foundation, 2020c). Thus, owning references can also specify what state the referenced object is in. For example, Auction@Open is the type of a reference to an Auction object that is in state Open. Because only owning references can specify typestate, ownership is implied by the presence of a typestate specification. Alternatively, Auction@Owned is the type of an owned reference to an Auction object, which may be in any state. Ownership types are one example of linear types (Wadler, 1990): types that are consumed when used rather than being duplicated, and which must be consumed rather than merely dropped.
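The idea of a state-carrying type such as Auction@Open can be approximated in Rust with the typestate pattern, in which each state is a separate type and a transition consumes the old value. This is a minimal sketch under stated assumptions, not a rendering of Obsidian's semantics: the state names come from the text, while the generic `Auction<S>` encoding and the `new`, `first_bid`, `max_bid`, and `close` methods are hypothetical.

```rust
/// Marker types for the three states named in the text.
struct Open;
struct BidsMade {
    max_bidder: String,
    max_bid: u64,
}
struct Closed;

/// The state is part of the type, so the compiler tracks it statically.
struct Auction<S> {
    seller: String,
    state: S,
}

impl Auction<Open> {
    fn new(seller: &str) -> Self {
        Auction { seller: seller.to_string(), state: Open }
    }

    /// Consumes the open auction and returns one in state `BidsMade`.
    fn first_bid(self, bidder: &str, amount: u64) -> Auction<BidsMade> {
        Auction {
            seller: self.seller,
            state: BidsMade { max_bidder: bidder.to_string(), max_bid: amount },
        }
    }
}

impl Auction<BidsMade> {
    /// Only available once a bid exists; `Auction<Open>` has no such method.
    fn max_bid(&self) -> u64 {
        self.state.max_bid
    }

    /// Hypothetical transition to the final state.
    fn close(self) -> Auction<Closed> {
        Auction { seller: self.seller, state: Closed }
    }
}
```

In this encoding, calling `max_bid` or `close` on an `Auction<Open>` does not compile, which mirrors the way a typestate specification restricts the operations available through a reference.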

To improve flexibility, Obsidian also allows Shared references, which can refer to an object that has no owner, and Unowned references, which refer to an object that may have an owner. These additional options allow users to avoid establishing ownership structures when none are needed. Unlike Owned and Shared references, Unowned references cannot be used to change the nominal state that an object is in. This is required for soundness, since a change in state through an Unowned reference might violate a typestate specification on the owning reference. These annotations (Owned, Shared, and Unowned) are referred to as permissions because they denote what operations can be done via references with those annotations.

Although Rust (Developers, 2017) supports a version of ownership, permissions and typestate features have not been included in popular programming languages, so we lack even qualitative data regarding their broader usability. None of these aspects has been formally studied in a quantitative user study before, either. In order to gather evidence regarding the usability of these powerful, but novel and potentially confusing, constructs, we conducted a quantitative user study comparing Obsidian to Solidity.

We randomly assigned 20 participants to use either Obsidian or Solidity, taught them their assigned language (with a tutorial; one example practice question is shown in Fig. 1), and asked them to complete programming tasks. We were interested in investigating three research questions. First, Obsidian was designed to enable static detection of asset loss bugs (in which a valuable asset, such as a quantity of cryptocurrency, is lost). We wanted to assess whether the kind of asset loss bugs that Obsidian detects statically are ones that programmers are likely to insert accidentally when using Solidity. Second, we wanted to generalize and quantify the previous usability result: can programmers use Obsidian at all, or does the type system get in the way of doing work (Coblenz et al., 2019a)? What could we learn about the usability of the key aspects of the type system: ownership, typestate, and assets? Finally, in many cases, even if programmers can achieve their goals, strong type systems can impose a usability burden on programmers because the compiler forces programmers to write code so that the compiler can verify various safety properties. We were interested in comparing Obsidian to Solidity to see whether the type system in Obsidian imposes a significant burden on programmers as measured by task completion rates or times.

Prior empirical work in programming language evaluation found that testing and debugging programs in the context of empirical studies substantially increases variance (Coblenz et al., 2017): different participants have different levels of thoroughness in writing tests and different levels of debugging skill. When allowed to test and debug, participants frequently spend large amounts of time debugging issues that are not relevant to the experiment. To focus our participants’ time on work related to our research questions, we allowed them to edit their code until they were satisfied, but did not give them an opportunity to test their code. Then, instead of assessing programs on the basis of tests, we inspected their code. We looked both for specific bugs that corresponded to the research questions we were interested in as well as for unrelated bugs. This approach was made feasible by the relatively small codebases (see Table 9). Although it is possible that we missed particular bugs in the code, the same would be true even if we provided unit tests.

Because Obsidian’s design is focused on identifying more bugs at compile time, this approach allowed us to focus on our research questions about the usability and effectiveness of the type system while using our participants’ time as effectively as possible. Although some of the bugs that participants inserted might eventually have been found via testing, the bugs we were looking for are ones that are easy to miss when testing main use cases (as evidenced by their past deployment in real systems and difficulties observed in past user studies, e.g., (Delmolino et al., 2016)).

Although Solidity and Obsidian target different blockchain platforms (Ethereum (Ethereum Foundation, 2020a) and Hyperledger Fabric (The Linux Foundation, 2020b), respectively), we chose Solidity for comparison because it is the dominant programming language used in public smart contract development and because it is the language that was used to develop popular smart contracts that had serious security vulnerabilities (Sirer, 2016; Graham, 2017).

In this paper, we describe the first formal, quantitative user study (of which we are aware) of a type system that supports linear types, ownership, or typestate, or any combination of these together. We found that after a brief training period (about 90 minutes), most of the Obsidian participants were able to successfully use Obsidian to finish implementing a small auction program. In contrast, most of the participants who were assigned the Solidity condition inserted serious bugs involving asset loss. Likewise, most of the Obsidian participants were able to use ownership to fix a security vulnerability in a prescription-tracking program, which, in contrast, required writing a significant amount of code in Solidity. However, in a third task, although Obsidian participants were generally able to implement the requested application, some participants abused the disown operator to allow asset loss. Furthermore, they did not take full advantage of typestate. This suggests that although ownership types can be used successfully after only a small amount of training, additional training or language changes may be needed in order for people to take full advantage of typestate and for people to use the asset-retention guarantee safely.

2. The Obsidian Language

Obsidian (Coblenz et al., 2019b) is a class-based object-oriented language that uses a syntax similar to Java. Obsidian uses the keyword contract instead of class due to conventions on blockchain platforms, and calls methods transactions because invocations of blockchain code from outside the blockchain have transactional semantics: either the execution finishes successfully and new state is stored in the blockchain, or the transaction is reverted and any state changes are not preserved. Currently, Obsidian supports the Hyperledger Fabric blockchain platform (The Linux Foundation, 2020a).

Coblenz et al. provided a full description and formal treatment of Obsidian (Coblenz et al., 2019b); here, we explain Obsidian by example. The left column of Fig. 3 shows part of an Auction contract. An Auction instance is always in one of three states: Open, BidsMade, or Closed. A field, seller, is always in scope, but when the object is in state BidsMade, fields maxBidder and maxBid are also in scope. Transactions specify types for their parameters, but in addition to the normal parameters, when this is specified as the first parameter, the program can specify a type for the this reference. The bid transaction, for example, can only be invoked via a Shared reference to the receiver. Formal parameters use >> to denote changes in permission or state. For example, the money parameter of bid is specified with permission Owned >> Unowned, meaning that the caller must pass an Owned reference, and when bid returns, from the perspective of the caller, the money is now Unowned. In other words, ownership was passed from the caller to the transaction.

Dynamic state tests, via the in operator, check the dynamic state of a referenced object. For example, line 18 checks to see if this is in state Open. If the test passes, lines 21-22 initialize the maxBidder and maxBid fields of the BidsMade state, to which the object transitions on line 23. Line 22 transfers ownership of an object initially referenced by money to the maxBid field, leaving money with type Money@Unowned.

Fields can temporarily have types that differ from their declarations as long as by the end of the transaction, all fields have types consistent with their declarations. For example, line 31 returns the money from the previous maxBidder which causes field maxBid to temporarily have type Money@Unowned; this is corrected on line 33, which transfers ownership from money to maxBid.

The revert statement (in the final else branch of bid) discards all changes that have been made to state in the transaction and reports an error.

3. The Solidity Language

Solidity (Ethereum Foundation, 2020b), like Obsidian, is a class-based object-oriented language. Solidity targets the Ethereum blockchain platform (Ethereum Foundation, 2020a). The function keyword denotes methods, though methods have transactional semantics. It has no built-in notion of states, but programs can declare enums and use them to represent states. Solidity supports a built-in cryptocurrency called ether. Functions that are annotated payable can receive quantities of ether; the ether is conceptually sent with the invocation, and the amount is stored in the variable msg.value. Each contract instance can own a quantity of ether; this quantity is automatically updated by the runtime when a function receives a payment. Every contract instance is stored on the blockchain at a particular address. The language has a built-in type called address to represent these addresses.

Programs typically implement their own fine-grained accounting mechanism. For example, the pendingReturns structure records how much money is owed to each of a number of addresses. Without this mapping, although the contract would still record how much ether it held, the implementation would not be able to track for whom it is being kept.

The pendingReturns mapping supports the withdrawal pattern (Ethereum Foundation, 2020d), which is an Ethereum coding convention that protects against re-entrancy attacks. The possibility of attacks arises because sending ether to a contract can cause the recipient to execute arbitrary code: the recipient can invoke a function on the sender, to which there is already an invocation on the stack. This is dangerous if the funds were sent while the sender was in an inconsistent state. Instead, it is recommended that contracts merely record that they owe ether to the intended recipient and provide a withdraw function that recipients can call to retrieve their money. The withdrawal pattern is not used in Obsidian, since Obsidian targets Hyperledger Fabric, which does not have this vulnerability. Because this was a summative study of the programming environments that users would encounter when using each language, we expected participants to use the withdrawal pattern with Solidity and not with Obsidian. If Obsidian were used with Ethereum, we expect Obsidian’s asset-based approach would guard against bugs in using the withdrawal pattern, regardless whether the opportunity for it arises from the platform design or the particular API being used.
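The bookkeeping side of the withdrawal pattern can be sketched as follows. This is an illustrative model in Rust rather than Solidity, it does not model re-entrancy itself, and the `Ledger` type with its `credit` and `withdraw` methods is hypothetical; only the role of `pendingReturns` is taken from the text.

```rust
use std::collections::HashMap;

/// Bookkeeping for the withdrawal pattern: debts are recorded, not pushed.
struct Ledger {
    /// Plays the role of `pendingReturns`: the amount owed per address.
    pending_returns: HashMap<String, u64>,
}

impl Ledger {
    fn new() -> Self {
        Ledger { pending_returns: HashMap::new() }
    }

    /// Records that `amount` is owed to `who`, adding to any earlier debt.
    /// Calling `insert(who, amount)` here instead would overwrite that debt.
    fn credit(&mut self, who: &str, amount: u64) {
        *self.pending_returns.entry(who.to_string()).or_insert(0) += amount;
    }

    /// Called by the recipient. The debt is cleared before the amount is
    /// returned, so a nested call during the payout would see a zero balance.
    fn withdraw(&mut self, who: &str) -> u64 {
        self.pending_returns.remove(who).unwrap_or(0)
    }
}
```

For example, crediting the same address with 5 and then 7 leaves 12 to withdraw, and a second withdrawal returns 0.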

4. Participants

We recruited participants to our four-hour study with posters at our university, with emails to lists of appropriate degree programs (such as the Master of Software Engineering program), and by advertising to students in relevant courses (e.g., a software engineering course). The protocol was approved by the university IRB. We compensated participants with a $75 Amazon gift card.

We pre-screened participants with an online survey that asked them basic Java questions; we invited respondents who answered five of six questions correctly to participate in the study. Table 1 summarizes the previous experience of the experiment participants in each condition. Overall, 14 of the participants identified as male, and six as female. We excluded an additional participant who took too long on the training phase (that participant spent 3 hours and 11 minutes on the tutorial, which was three standard deviations above the mean).

                                 Solidity   Obsidian
Programming experience, years    8.6        5.0
Professional experience, years   1.8        1.8
Java experience, years           2.4        2.2

Table 1. Participant experience. N=10 in each condition.

5. Training

We provided participants with a web-based tutorial (implemented with Qualtrics) for their assigned condition, which stepped them through web-based documentation and exercises (the tutorial materials are included in the supplement). Some of the exercises were to be completed in Visual Studio Code, which was configured with a compiler for their assigned language. Some of the questions were multiple-choice; for these, the tool automatically showed participants if they had entered an incorrect answer. An experimenter was available to answer questions. Participants were told that they should try to get all of their questions addressed during the training phase, since no questions could be answered after training was completed.

We included practice questions in the tutorial to ensure that participants absorbed the material (prior studies had found that without practice questions, participants skimmed the material without mastering the concepts). Fig. 1 shows an example practice question. Fig. 2 shows a sample from the Obsidian documentation.

Figure 2. An example from the web-based Obsidian tutorial.

To make the two experimental conditions as similar as possible, even though Solidity includes no support for ownership, typestate, or assets, Solidity participants received a tutorial that explained these concepts and recommended using comment-based annotations. If participants asked about the utility of these annotations, we argued that this was similar to how one might write preconditions or postconditions in comments. Table 2 summarizes the distribution of times participants spent on the tutorial in the two conditions.

                               Solidity         Obsidian
Average (standard deviation)   86 (28) min.     98 (31) min.
Range                          39 to 138 min.   50 to 148 min.

Table 2. Training times in Solidity and Obsidian conditions.

6. Auction Task

In the Auction task, we asked participants to fill in missing code in an implementation of an English auction, in which bids are made openly and the highest bidder wins. To increase external validity, we modeled the task after a Solidity example (Foundation, 2020). We required that all bids be accompanied by funds to ensure that the winning bidder will pay for the item. When a bid is exceeded, the original bidder should receive a refund of their bid. We gave the participants 30 minutes to complete the task.

Fig. 3 shows the bid transaction that was provided to participants as well as a sample solution. In the first subtask, marked by // 1. TODO, participants needed to write code to refund the existing bid to the previous bidder, whose address was stored in maxBidder, and record the new bid (money). In the second subtask (// 2. TODO), participants needed to refund the bid to the bidder. The code in yellow shows a correct answer. In both cases, there was an opportunity for asset loss: if participants overwrote the old Money reference (stored in maxBid), then the old bid would be lost. In Obsidian, the compiler would report an error if this happened; in Solidity, there was no protection against that mistake.

 1  main asset contract Auction {
 2    Participant@Unowned seller;
 3
 4    state Open;
 5    state BidsMade {
 6      // the bidder who made the highest bid so far
 7      Participant@Unowned maxBidder;
 8      Money@Owned maxBid;
 9    }
10    state Closed;
11
12
13
14
15    transaction bid(Auction@Shared this,
16                    Money@Owned >> Unowned money,
17                    Participant@Unowned bidder) {
18        if (this in Open) {
19          // Initialize destination state,
20          // and then transition to it.
21          BidsMade::maxBidder = bidder;
22          BidsMade::maxBid = money;
23          ->BidsMade;
24        }
25        else {
26          if (this in BidsMade) {
27            //if the newBid is higher than the current Bid
28            if (money.getAmount() > maxBid.getAmount()) {
29              //1. TODO: fill this in.
30              // You may call any other transactions as needed.
31              maxBidder.receivePayment(maxBid);
32              maxBidder = bidder;
33              maxBid = money;
34            }
35            else {
36              //2. TODO: return the money to the bidder,
37              //   since the new bid wasn't high enough.
38              //You may call any other transactions as needed.
39              bidder.receivePayment(money);
40            }
41          }
42          else {
43            revert ("Can only make a bid on an open auction.");
44          }
45        }
46      }
47  }

Obsidian condition.

    contract Auction {
      // the bidder who made the highest bid so far
      address maxBidder;
      uint maxBidAmount;
      // 'payable' indicates we can transfer money to this address
      address payable seller;
      // Allow withdrawing previous money for bids that were outbid
      mapping(address => uint) pendingReturns;
      enum State { Open, BidsMade, Closed }
      State state;

      function bid() public payable {
        if (state == State.Open) {
          maxBidder = msg.sender;
          maxBidAmount = msg.value;
          state = State.BidsMade;
        }
        else {
          if (state == State.BidsMade) {
            //if the newBid is higher than the current Bid
            if (msg.value > maxBidAmount) {
              //1. TODO: fill this in.
              // You may call any other functions as needed.
              pendingReturns[maxBidder] += maxBidAmount;
              maxBidder = msg.sender;
              maxBidAmount = msg.value;
            }
            else {
              //2. TODO: return the newBid money to the bidder,
              //   since the newBid wasn't high enough.
              //You may call any other functions as needed.
              pendingReturns[msg.sender] += msg.value;
            }
          }
          else {
            revert ("Can only make a bid on an open auction.");
          }
        }
      }
    }

Solidity condition.

Figure 3. Code for the two Auction tasks. In each listing, the statements that follow the two TODO comments (highlighted in yellow in the original figure) represent a correct solution; the rest was given to participants as starter code.

RQ A.1: Overall, do more participants complete the Auction task correctly with Obsidian than with Solidity?

RQ A.2: How frequently do Solidity participants accidentally lose assets in the Auction task?

6.1. Results and Discussion

Table 3 summarizes the results of the Auction task; errors are shown in Table 4. Nine Solidity participants said they were done with the task before the 30 minutes expired; among these nine, the average time was 12 minutes. Eight Obsidian participants said they were done with the task before running out of time; among these eight, the average time was also 12 minutes. Overall, two participants completed the task correctly in the Solidity condition; seven completed the task correctly in the Obsidian condition. The difference, summarized in the first two rows of Table 3, is statistically significant (Fisher’s exact test; odds ratio 0.053). We conclude for RQ A.1 that participants who finished were more likely to finish correctly if they used Obsidian than if they used Solidity. We did not observe a significant difference in completion times across the two groups.
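For readers who want to see how such a test works, the sketch below computes a two-sided Fisher's exact test from the first two rows of Table 3 (Solidity: 2 correct, 7 with bugs; Obsidian: 7 correct, 1 with bugs). It is an illustrative recomputation in Rust, not the authors' analysis script. The function names are hypothetical, and summing all tables that are no more probable than the observed one is one common convention for the two-sided variant.

```rust
/// Binomial coefficient C(n, k), computed exactly with integer arithmetic.
fn binomial(n: u64, k: u64) -> u64 {
    if k > n {
        return 0;
    }
    let k = k.min(n - k);
    let mut result: u64 = 1;
    for i in 0..k {
        // After step i the value is C(n, i + 1), so the division is exact.
        result = result * (n - i) / (i + 1);
    }
    result
}

/// Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
/// With the margins held fixed, the top-left cell follows a hypergeometric
/// distribution; the p-value sums every table whose probability does not
/// exceed that of the observed table.
fn fisher_exact_two_sided(a: u64, b: u64, c: u64, d: u64) -> f64 {
    let row1 = a + b;
    let row2 = c + d;
    let col1 = a + c;
    let total = binomial(row1 + row2, col1) as f64;
    // Unnormalized probability of a table whose top-left cell is x.
    let weight = |x: u64| (binomial(row1, x) * binomial(row2, col1 - x)) as f64;
    let observed = weight(a);
    let lowest = col1.saturating_sub(row2);
    let highest = col1.min(row1);
    let mut p = 0.0;
    for x in lowest..=highest {
        let w = weight(x);
        if w <= observed * (1.0 + 1e-9) {
            p += w;
        }
    }
    p / total
}
```

Under these assumptions, `fisher_exact_two_sided(2, 7, 7, 1)` evaluates to 370/24310, or about 0.015.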

Of the two Obsidian participants who did not finish the Auction task in time, one was confused about the semantics of the :: field initialization operator, attempting to use BidsMade::maxBid to refer to the current value of the maxBid field rather than the future value after a state transition. This misconception led to a compiler error that the participant did not find helpful. The other participant also received a confusing error message: although the code invoked a transaction that did not exist, the error message pertained to ownership of the transaction’s parameter. A more mature compiler might have helped the participant finish the task.

In subtask 1 (starting at line 29 in Fig. 3), participants needed to record the new bid and refund the old bid. We found the following errors among the Solidity participants who said they were done:

  1. Loss of previous refunds: the correct implementation added the new refund to any prior refund. Four participants used = instead of +=, overwriting any old refund (in line 31).

  2. Omission of refund: three participants neglected to refund the previous bid (e.g., omitting line 31).

All eight of the Obsidian participants who said they were done did so without losing any assets, since otherwise the compiler would have given an error. However, one participant refunded the old money to the new bidder instead of the previous bidder.

While doing the task, two of the Obsidian participants received a compiler error indicating that they had lost an asset. For example:

auction.obs 37.28: Variable 'maxBid' is an owning reference to an asset, so it cannot be overwritten.

Both of these participants successfully fixed the error.

In subtask 2 (starting at line 39 of Fig. 3), participants needed to refund the new bid, since it was not larger than the previous bid. Among the nine Solidity participants who said they finished the task, two refunded the bid properly (using pendingReturns); four refunded via transfer, which would not have resulted in asset loss but was inconsistent with the documentation we gave them; four attempted to refund via pendingReturns but, as in the first subtask, overwrote any previous refund, potentially losing money.

One might argue that the potential for asset loss due to improper use of pendingReturns was due to the need to use the withdrawal pattern (Ethereum Foundation, 2020d), as discussed above. However, the particular bug we observed was due to participants overwriting an integer rather than adding to it, and we infer that arithmetic errors are likely common when manipulating assets manually. Obsidian protects against these bugs by using assets to represent money.

                                                          Solidity   Obsidian
Completed task correctly                                  2          7
Completed task with bugs                                  7          1
Time in min., completed tasks only (standard deviation)   12 (6.9)   12 (7.13)
Did not complete the task                                 1          2

Table 3. Auction task results.

                                                          Solidity   Obsidian
Ran out of time                                           1          2
Lost an asset in either subtask                           7          0
Subtask 1
      omitted refund of old bid                           3          0
      overwrote old refund                                4          0
      refunded to wrong bidder                            0          1
Subtask 2
      overwrote old refund                                4          0
      refunded via transfer() instead of pendingReturns   4          N/A

Table 4. Errors in Auction task.

We conclude (RQ A.2) that asset loss was frequent among Solidity users, and more frequent than among Obsidian users, who did not lose any assets (Fisher’s exact test).

7. Prescription Task

We gave participants a short Pharmacy contract (43 lines in Obsidian or 46 lines in Solidity including whitespace). The code included an example to show how the contract was vulnerable to attack. Although a Prescription was specified to only permit a fixed number of refills, a Patient could invoke depositPrescription on more than one Pharmacy object, resulting in the patient being able to refill the prescription the given number of times at each pharmacy. We asked participants to fix the bug, avoiding runtime checks if possible. In Solidity, for example, depositPrescription had the signature below:

function deposit(Prescription p) public returns (int);

In Obsidian, the starter code provided this signature:

transaction deposit(Prescription@Shared p) returns int;

In Obsidian, it sufficed for participants to change the signature so that the Pharmacy acquired ownership of the prescription object:

transaction deposit(Prescription@Owned >> Unowned p)
            returns int;

In Solidity, in contrast, since there is no static feature that would make the above safe, participants had to implement a global tracking mechanism across all Pharmacy objects.
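The kind of global, runtime tracking that the dynamic approach requires can be sketched as follows. This is an illustration in Rust, not a participant's Solidity solution: the `DepositRegistry` type, the `id` field, and the `accepted` list are all hypothetical.

```rust
use std::collections::HashSet;

struct Prescription {
    /// Hypothetical unique identifier used for the runtime check.
    id: u32,
}

/// Hypothetical registry shared by every pharmacy. It remembers which
/// prescriptions have already been deposited anywhere.
struct DepositRegistry {
    deposited: HashSet<u32>,
}

impl DepositRegistry {
    fn new() -> Self {
        DepositRegistry { deposited: HashSet::new() }
    }
}

struct Pharmacy {
    accepted: Vec<u32>,
}

impl Pharmacy {
    fn new() -> Self {
        Pharmacy { accepted: Vec::new() }
    }

    /// Dynamic check: every deposit consults the shared registry.
    fn deposit(
        &mut self,
        registry: &mut DepositRegistry,
        p: &Prescription,
    ) -> Result<(), String> {
        // `HashSet::insert` returns false when the id was already present.
        if !registry.deposited.insert(p.id) {
            return Err(format!("prescription {} was already deposited", p.id));
        }
        self.accepted.push(p.id);
        Ok(())
    }
}
```

Every pharmacy must be handed the same registry and every deposit path must remember to consult it. That is the burden the ownership-based signature avoids, because there a second deposit of the same prescription is rejected at compile time.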

The task was based on a task from a prior study, which found that users of an earlier version of Obsidian had great difficulty using ownership to fix this problem (Coblenz et al., 2019a). For example, some participants in that study thought about ownership in a dynamic way (for example, writing if statements to test ownership) or were confused about when ownership was transferred between references. The version of the language used in the present study includes changes that resulted from that work, such as fusing typestate and ownership in the language syntax, making ownership transfer explicit in transaction signatures, and removing local variable ownership annotations. Our research questions were centered around evaluating the revised language:

RQ P.1: How effectively could Obsidian participants use ownership to fix the multiple-deposit vulnerability?

RQ P.2: Does using ownership to prevent the multiple-deposit vulnerability take less time than using a traditional dynamic approach?

We gave participants 35 minutes to complete the task. Because ownership-based approaches are not checked by the Solidity compiler, Solidity participants who proposed ownership approaches were permitted to continue working in the remaining time.

7.1. Results

Table 5 summarizes the results. Regarding RQ P.1, six of the ten Obsidian participants successfully used ownership to solve the problem. Although the previous study (Coblenz et al., 2019a) was not quantitative, based on the data from that study, we believe this is a substantial improvement. The dynamic Obsidian solution that we judged to be correct tracked global state by making Prescription mutable, despite a comment indicating that Prescription should be immutable.

Five of the ten Solidity participants tried to use ownership, even though Solidity does not check ownership. Only three of the Solidity participants said that they were done within the time limit, and of those, only two had a correct solution. The incorrect Solidity solution attempted to solve the problem by making Prescription mutable to track remaining refills globally, but in addition, although the participant tried to track the number of refills across all pharmacies, the code did not update the global number of refills when refilling a prescription.

                                                      Solidity      Obsidian
Attempted a static solution                           5             6
Correct static solution                               N/A           6
Attempted a dynamic solution                          6             3
Correct dynamic solution                              2             1
Made Prescription mutable                             2             1
Completed within time limit                           3             9
Time among participants who did not run out of time   22 (3) min.   25 (12) min.

Table 5. Summary of Prescription task results. Times are shown as mean (standard deviation).

Regarding RQ P.2, we did not observe a significant difference in completion times. However, due to the small amount of code required for the static solution, we suspect that there is a significant learning effect to be leveraged here, and the participants who succeeded could likely do so again in a similar situation much faster.

7.2. Discussion

One might have expected that applying ownership would be challenging, since the concept was new to the participants, but since only half of those who said they had completed a dynamic solution had correct solutions, it would appear that using the new static construct may not be harder than writing global state tracking code. In fact, using ownership to solve programming problems is teachable: six of nine Obsidian participants who completed the task used ownership to do so; we expect that the remaining three could be taught to do so with additional practice.

Security experts have long argued in favor of immutable data structures (Oracle Corp., 2019; Seacord, 2013), which is one reason why we specified that Prescription was immutable. However, these results point out that this approach may not be tenable: specifications of immutability may be ignored or removed, and attempts to maintain immutability require substantial work, which may itself be bug-prone. Indeed, when language-based mechanisms do not provide the required safety properties (Coblenz et al., 2017), it may be safer and cheaper to use a mutable design than to bear the cost of immutability.

The benefits of Obsidian’s ownership system in the Prescription task contrast with the benefits of other kinds of ownership. For example, ownership in Rust (Mozilla Research, 2015) introduces constraints on mutation. However, in Rust, only owning references can mutate objects, whereas in Obsidian, only modification of nominal state is restricted, and then only through Unowned references. In Obsidian, mutation is restricted only as much as is needed to provide sound typestate specifications, since concurrency is not a concern. Likewise, ownership types are sometimes used for expressing encapsulation (Clarke et al., 1998; Boyapati et al., 2002), in which case restricting mutation is also not relevant. A significant fraction of participants were able to use the linear aspects of ownership alone in the Prescription task, suggesting that languages that adopt just linearity (and not mutability restrictions) may be usable.

Ownership is typically understood to be one of the more challenging aspects of learning Rust (Yegulalp, 2018). Our usability results for Obsidian suggest that perhaps by integrating a more flexible permissions system, languages such as Rust could be made more convenient for common cases, and thus have a more gradual learning curve.

Although a majority of the Obsidian participants used ownership successfully, four participants did not. One of the four participants did not complete any of the three tasks. Another “fixed” the issue by modifying code in Patient that we had provided as an example of how a nefarious patient might exploit the bug; perhaps this participant did not really understand the task. The other two seemed to need more time studying ownership in order to use it effectively.

The instructions included: “Please use what you have learned today to fix this problem (avoiding runtime checks if possible).” Perhaps as a result, a similar fraction of participants tried to use ownership in both conditions. We wanted to encourage the Obsidian participants to use ownership so that we could assess to what extent they could use it effectively, but perhaps the instructions persuaded some of the Solidity participants to use it even though doing so was not checked by the compiler.

8. Casino Task

Nine Solidity and five Obsidian participants attempted the Casino task. We excluded one Obsidian participant who should have received an error from the compiler but, due to a bug, did not. We also excluded one Solidity participant and four Obsidian participants who did not have enough of their four hours remaining and, in the time available, did not feel they had finished the task. This left nine Solidity participants and five Obsidian participants for analysis.

The results of a comparison between conditions on this task may be biased because the Obsidian participants who were included for analysis in this task are those who completed the earlier tasks fastest. As a result, the Obsidian programmers may have been stronger programmers on average than those in the Solidity condition (in which almost all the participants had time to try the task).

Casino was more open-ended than the other tasks, resulting in more variance, as participants made varying implementation choices. Given this, the small numbers, and the potential bias, rather than focusing on a statistical analysis, we use this primarily as an opportunity to develop hypotheses and design insights.

We gave participants a web page with a diagram showing invocations that needed to be supported (Fig. 4). The web page included a list of requirements:

  1. If a Bettor predicts the outcome correctly, the Bettor gets twice the Money they put down. For example, if Bettor b puts down 5 tokens on the correct outcome, they should receive 10 tokens after the Game is played.

  2. If the Bettor predicted incorrectly, the Casino keeps their tokens.

  3. Bets can only be made before the Game starts.

  4. Winnings can only be distributed after the Game is finished.

  5. Bettors must collect winnings themselves from the Casino after a Game by calling code, which you need to write. Until winnings are collected, the Casino keeps track of them.

  6. A Bettor can have one active bet per game. If a Bettor bets more than once, their original bet should be replaced by the new one and any previous bet should be refunded.

  7. A Bettor MUST put down tokens at the same time that they’re making a Bet.

  8. If the Casino does not have enough tokens available to pay out winnings, the invocation to collect winnings can fail.

Figure 4. Sequence diagram given to participants to show what operations the Casino contract should support.

We provided starter code for Casino, Game, and Bet. Obsidian participants also received implementations of appropriate containers (Solidity has built-in containers that participants could use).

We used the Casino task to investigate four research questions:

RQ C.1: To what extent do Obsidian participants leverage typestate in transaction signatures to avoid dynamic checks?

RQ C.2: In both versions, programs represented funds with Token objects. Does Obsidian’s type system help participants avoid losing Token objects compared with Solidity?

RQ C.3: Do Obsidian participants view Token objects as resources that should not be created or destroyed, or as data, which could be created and destroyed as needed?

RQ C.4: How do task completion times compare between Solidity and Obsidian participants?

8.1. Results and Discussion

One participant in the Obsidian condition gave up after 1 hour, 15 minutes. That participant had chosen an unnecessarily difficult implementation strategy, requiring implementing a new container (implemented as a linked list). Also, the participant delayed trying to compile until after writing a lot of code, resulting in a large collection of compiler errors. The remaining four Obsidian participants all wrote code that compiled successfully. One participant in the Solidity condition gave up after 39 minutes, having received a parser error that they were not sure how to fix.

Regarding RQ C.1, we had expected that some participants would define states in Casino to correspond with the states in Game (BeforePlay, Playing, FinishedPlaying). By doing so, they could avoid dynamic state checks that would otherwise be needed to implement requirements 3 and 4. Instead, all five of the Obsidian participants wrote code in the corresponding Casino transactions to check the current state of the Game. The participant who gave up tried to check the state with a static assertion, evidently not understanding that the assertion was static, not dynamic.
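The dynamic style that participants chose can be sketched as follows. This is an illustrative Python analogue, not participant code: the class and method names are assumptions, tokens are plain integers, and the payout ignores the casino's pot. The point is only that requirements 3 and 4 are enforced by runtime checks on the Game's state, so a call made in the wrong state is caught when it executes rather than at compile time.

```python
from enum import Enum, auto


class GameState(Enum):
    BEFORE_PLAY = auto()
    PLAYING = auto()
    FINISHED_PLAYING = auto()


class Game:
    def __init__(self):
        self.state = GameState.BEFORE_PLAY

    def start(self):
        self.state = GameState.PLAYING

    def finish(self):
        self.state = GameState.FINISHED_PLAYING


class Casino:
    def __init__(self, game):
        self.game = game
        self.bets = {}  # bettor name -> tokens wagered

    def place_bet(self, bettor, tokens):
        # Requirement 3, enforced by a runtime check on the Game's state.
        if self.game.state is not GameState.BEFORE_PLAY:
            raise RuntimeError("bets can only be made before the Game starts")
        self.bets[bettor] = tokens

    def collect_winnings(self, bettor, predicted_correctly):
        # Requirement 4, again as a runtime check.
        if self.game.state is not GameState.FINISHED_PLAYING:
            raise RuntimeError("winnings can only be distributed after the Game is finished")
        wager = self.bets.pop(bettor, 0)
        # Requirement 1: a correct prediction pays twice the wager.
        return 2 * wager if predicted_correctly else 0
```

In the typestate-based design we had anticipated, the state requirement would instead appear in the transaction signatures, so that callers invoking place_bet after the Game had started would be rejected by the compiler.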

The lack of usage of static state information is unfortunate because it represents a missed opportunity to rule out bugs in calling code. Perhaps more-experienced programmers would be more interested in leveraging this language feature; alternatively, it might require more training or a more-convenient language design. The design the participants chose may have been best given their incentives; the typestate-based approach would have required adding more structure to reflect the typestate relationships between Casino and Game. However, motivated by the results of this study, we hope to consider future language design changes that make typestate coupling of different objects convenient.

The results here contrast with the results from the Auction task, in which participants did leverage state constructs that were already present. Perhaps creating new interfaces that use novel verification-related features (such as typestate) is harder than consuming them, in which case further research should consider novel ways of scaffolding interface design and creation.

Surprisingly, regarding RQ C.2, we observed that Solidity participants were probably more likely to have the casino keep tokens when a bet is lost (Table 7; Fisher’s exact test). Likewise, Solidity participants may be more likely to successfully issue refunds for bets after the first bet (Fisher’s exact test). We believe this is related to abuse of disown by Obsidian participants.
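The counts in Table 7 are enough to recompute the two Fisher’s exact tests mentioned above. The sketch below does so in Python under two assumptions that the text does not confirm: that the denominators are the 8 Solidity and 4 Obsidian programs that compiled (per the caption of Table 7), and that the tests were two-sided. The function name is hypothetical and the implementation uses only the standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    observed = prob(a)
    # Small tolerance so that ties with the observed table are included.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


# Rows: Solidity, Obsidian. Columns: behaved correctly, did not.
# Assumed denominators: 8 Solidity and 4 Obsidian compiled programs.
p_keeps_tokens = fisher_exact_two_sided(8, 0, 2, 2)
p_refunds = fisher_exact_two_sided(8, 0, 3, 1)
```

Under these assumptions the recomputed values are roughly 0.09 for keeping tokens on a lost bet and 0.33 for refunding replaced bets, which is consistent with the hedged wording ("probably" and "may be") used above. These are recomputations from the table, not values quoted from the study.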

Three Obsidian participants who finished the Casino task used disown improperly to throw away assets. We found this surprising, since we had warned the participants against improper use of disown. The tutorial included an example of how disown might be needed inside the implementation of a Money contract, and wrote below the example:

IMPORTANT: disown should be used only when you really want to throw something out. Above, disown is required because of the manual arithmetic used to manipulate amount inside the implementation of Money, but it is not needed in most normal code.

It would not have sufficed to remove disown from the language; in addition to the fact that it is needed in certain (rare) cases, programmers could build a Trash contract to hold discarded objects, thus suppressing any errors the compiler would emit. We have several hypotheses regarding why disown was abused:

  • Some participants may have used disown to silence the compiler when they felt they had a correct solution. One participant discarded the old wager, using money from the casino’s pot to pay out bets. The participant also disowned the bet when a losing bettor tried to retrieve their winnings (a correct solution would have put the tokens in the casino’s pot).

  • Some participants may not have read or understood the tutorial’s warning about disown; in retrospect, we should have assessed understanding explicitly.

  • Some participants did not sufficiently understand the notion of assets. For example, when disbursing winnings, one solution disowned the previous wager and created new Tokens when needed, rather than reusing the tokens from the wager.

  • Some participants may have used disown as a workaround for an unsolved problem. In code to accept a new bet, one participant disowned any previous bet by that bettor, and then wrote: // Currently just throws the bettor’s money away and hopes they find it eventually.
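The asset-loss pattern in these examples can be illustrated with a small conservation check. The following Python sketch is hypothetical (the class CasinoAccounts and its methods are not from the study, and token amounts are plain integers because Python has no asset types). It contrasts moving a lost wager into the pot with discarding it, which is the analogue of misusing disown.

```python
class CasinoAccounts:
    """Tracks the casino's pot and outstanding wagers as integer token counts."""

    def __init__(self, pot):
        self.pot = pot
        self.wagers = {}  # bettor name -> tokens currently wagered

    def accept_bet(self, bettor, tokens):
        # Requirement 6: hand back any earlier wager instead of discarding it.
        refund = self.wagers.pop(bettor, 0)
        self.wagers[bettor] = tokens
        return refund

    def settle_losing_bet(self, bettor):
        # Requirement 2: the lost wager moves into the casino's pot.
        self.pot += self.wagers.pop(bettor)

    def settle_losing_bet_discarding(self, bettor):
        # Analogue of the disown misuse: the wager is dropped, not moved,
        # so the tokens vanish from the system.
        self.wagers.pop(bettor)

    def tokens_held(self):
        return self.pot + sum(self.wagers.values())
```

If the bettor's wallet plus tokens_held() is computed before and after each call, the total stays constant for accept_bet and settle_losing_bet but drops by the wager for settle_losing_bet_discarding. In Obsidian, the compiler flags the loss of an owned asset unless disown is written explicitly, which is why disown acts as an escape hatch.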

This motivates a question for future research to characterize the use of these escape hatch constructs, which can be used in both safe and dangerous ways. Some languages include warnings in the names of such constructs, as in unsafePerformIO in Haskell (University of Glasgow, 2001), but such approaches may not be effective in explaining the danger.

Risk compensation (Contributors, 2020) refers to the idea that people compensate for safety features by taking additional risks. For example, drivers may drive faster when wearing seat belts. Further study is needed to consider the question of whether risk compensation occurs with strong type systems. Perhaps some participants who used disown assumed that if there were a bug, the compiler would report it.

We hope that showing how we identified usability problems with Obsidian in this study will show others how they might do so for other languages. As another example, a study might have identified NULL as a design error before it became a common feature (Hoare, 2009).

Regarding RQ C.3, all four of the participants who wrote solutions that compiled treated tokens as data rather than assets, i.e., at some point in their code, they either created new tokens or disowned tokens. The Obsidian participant who gave up did not do either of those things, likely viewing tokens more as assets, but became mired in a list of type errors. We conclude that use of assets was likely not natural for our participants. This interpretation is consistent with the post-study survey results (§9), in which we observed that participants said they felt ownership and states were more useful than assets.

For RQ C.4, Obsidian participants spent significantly longer on the Casino task than Solidity participants did (Mann-Whitney U test; see Table 7 for mean completion times). Therefore, the stronger type system provided by Obsidian likely has a significant cost in development time. We hypothesize that this cost is greater with more open-ended tasks, which would explain why we did not observe this difference in the two prior tasks. Of course, the additional cost may be worthwhile since Obsidian rules out some classes of bugs statically, particularly when used by skilled programmers who do not abuse disown.

                                                Solidity   Obsidian
Had enough time to try Casino                   9          5
Completed Casino with a program that compiled   8          4
Table 6. Summary of Casino task completions.

                                         Solidity   Obsidian
Casino keeps tokens when a bet is lost   8          2
Bettor’s extra bets result in refunds    8          3
Mean completion time                     37 min.    64 min.
Table 7. Summary of Casino task results among completed programs that compiled.

9. Post-study Survey Results

We conducted a post-study survey asking participants about their opinions. Several of the Solidity participants expected that the language would have included constructs for states or ownership. For example, one participant wrote in the post-study survey:

It also seemed like there should be some syntactic sugar for writing things like:

enum State { Foo, Bar, Buzz }
State s

since they are so common.

Three Solidity participants expected that the compiler would check ownership. For example:

On [semantics] — I was hoping ownership / assets / states would be statically verified. When I wrote code during the 2nd phase of the study, I found that I didn’t really document ownerships / asset status.

Similarly, from another Solidity participant:

I think it would be nice to have some static analysis to check ownership information rather than relying on the programmer to have good comments documenting ownership because in practice documentation is never perfect and often overlooked.

Participants using Obsidian had differing opinions regarding how easy it was to learn. One wrote:

The tutorial and the exercises are well-written and they helped me a lot in understanding the concepts of new language!

Another Obsidian participant wrote:

The smaller coding exercises were nice to follow and complete. The open-ended part was a little overwhelming to finish, for somebody that just got introduced to new concepts of ownership, states and assets.

One Obsidian participant commented on how ownership seemed natural after some practice:

…the general concepts of ownership [were] a little unintuitive but after working with the language they started to make more sense and seem more natural….

We asked participants (on a 5-point scale) how well they understood particular concepts and how useful they thought those concepts were. The results from participants who completed the survey before being debriefed from the study are summarized in Table 8. Although the Obsidian participants said they thought ownership was significantly more useful than the Solidity participants did, the Solidity participants indicated that they felt they understood states better. Perhaps the existence of an unfamiliar state construct in Obsidian, or the unfamiliarity of the relationship between states and types, led to less confidence. There was no significant difference in views of the utility of states. This may be due to the tasks we gave, which did not particularly rely on the static aspects of states.

                                                                Solidity (N=6)   Obsidian (N=8)
How much did you like the language you used?                    3.7 (0.82)       4.0 (0.53)
How well do you feel you understand the concept of ownership?   3.8 (0.98)       3.75 (0.99)
*How useful do you think ownership is?                          3.0 (1.1)        4.88 (0.36)
*How well do you feel you understand the concept of states?     4.8 (0.41)       4.1 (0.64)
How useful do you think states are?                             4.3 (0.81)       4.1 (0.64)
How well do you feel you understand the concept of assets?      3.2 (0.98)       3.4 (1.3)
How useful do you think assets are?                             2.7 (0.52)       3 (1.2)
Table 8. Perceptions of ownership, states, and assets. Cells show average (standard deviation). * indicates that a Mann-Whitney U test shows a significant difference.

We also compared across questions. Obsidian participants said that they felt both ownership and states were more useful than assets (according to Mann-Whitney U tests, with effect sizes measured by Cohen’s d). The differences between perceptions of understanding between ownership and assets and between states and assets were not significant. Of course, perceptions of understanding and utility were influenced by the particular tasks the participants did and their perceived success at doing those tasks, but these results correspond with the Obsidian participants’ failure to regard tokens as assets in the Casino task.
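For readers who want a sense of the effect sizes, Cohen’s d can be approximated from the Obsidian column of Table 8. The Python sketch below is an assumption-laden recomputation, not the study’s analysis: it uses the rounded means and standard deviations from the table, treats the two sets of ratings as independent groups of equal size, and uses the simple pooled standard deviation. The function and variable names are hypothetical.

```python
from math import sqrt


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using a pooled SD for two groups of equal size."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Obsidian column of Table 8 (N=8): (mean, standard deviation).
ownership_useful = (4.88, 0.36)
states_useful = (4.1, 0.64)
assets_useful = (3.0, 1.2)

d_ownership_vs_assets = cohens_d(*ownership_useful, *assets_useful)
d_states_vs_assets = cohens_d(*states_useful, *assets_useful)
```

Under these assumptions the approximations come out near 2.1 for ownership versus assets and near 1.1 for states versus assets, both large by conventional thresholds. Because the same participants answered all three questions and the inputs are rounded, values computed from the raw data may differ.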

10. Limitations

The student participants may not be representative of the population of smart contract programmers. However, since most of the students had some professional experience, they were likely representative of entry-level programmers in industry (Stack Overflow, 2019). The tasks were more constrained than real-world programming tasks, although smart contracts tend to be small in practice, averaging 322 lines (Pinna et al., 2019). Table 9 describes solution lengths in our study.

               Solidity      Obsidian
               Min    Max    Min    Max
Auction        291    355    352    395
Prescription   455    570    457    518
Casino         257    425    135    416
Table 9. Ranges of solution lengths in lines of code.

The participants were new to the programming languages and to smart contract development in general, so it is possible that experienced programmers would have behaved differently. Although we tried to infer how particular aspects of the languages and their type systems affected participants’ behavior and performance, because this was a summative study, the results may have been influenced by other aspects of the experience, such as the way in which different aspects of the type systems interacted or the details of the particular tasks we gave participants.

11. Related Work

Other programming languages were designed to improve smart contract safety. Flint (Schrans et al., 2019) is a typestate-oriented language that supports linear assets. Pact (Kadena, 2019) is Turing-incomplete, avoiding nonterminating behavior. Scilla (Sergey et al., 2019) is an intermediate language whose semantics were formalized in Coq; it represents programs as communicating automata, avoiding complex inter-contract transactions (instead requiring that these be implemented as continuations). None of the above languages were evaluated empirically.

Delmolino et al. (Delmolino et al., 2016) described a user study of Serpent, which was a precursor to Solidity, showing several classes of bugs that occurred in the lab. Among these was asset loss, which motivated our study to see whether our participants would avoid these bugs when using Obsidian.

Coblenz et al. argued for using a variety of methods when designing programming languages (Coblenz et al., 2018). Stefik and Hanenberg focused on the need for empirical evaluation of programming languages (Stefik and Hanenberg, 2014). For example, Stefik et al. developed methodology to evaluate syntax choices for novice programmers (Stefik and Siebert, 2013). Hanenberg et al. compared static and dynamic type systems (Hanenberg et al., 2014), finding benefits of static type systems in both documentation and compile-time checking. Uesbeck et al. evaluated the benefit of C++ lambdas (Uesbeck et al., 2016), finding no benefit even for the purposes for which lambdas were created. The only other quantitative empirical studies we are aware of for complete, novel programming languages (as opposed to ones that were already familiar to the participants) were of Quorum (Stefik and Siebert, 2013) and HANDS (Pane et al., 2002). Other work has focused on language extensions, such as for software transactional memory (Pankratius and Adl-Tabatabai, 2014) and immutability (Coblenz et al., 2017).

Several languages have integrated support for linearity. Wadler (Wadler, 1990) proposed the use of linear types for programming languages. In functional languages, linearity may take the form of session types (Caires and Pfenning, 2010). This approach mirrors typestate, since channel types (expressed as session types) change as messages are sent through them. Typestate was originally proposed by Strom and Yemini (Strom and Yemini, 1986), but more recent work by DeLine, Aldrich, Bierhoff, and others describes how object-oriented languages can be used with typestate (DeLine and Fähndrich, 2004; Aldrich et al., 2009; Bierhoff and Aldrich, 2007). Garcia et al. gave a formalization of typestate (Garcia et al., 2014). None of these languages were empirically evaluated with users. Coblenz et al. designed Obsidian (Coblenz et al., 2019b) using user-centered design (Coblenz et al., 2019a); this paper focuses not on the design or design methodology but on an empirical comparison between Obsidian and Solidity.

12. Future Work

Casino, which included significant use of nominal states, resulted in Obsidian participants writing dynamic checks. Future work should investigate the extent to which typestate, as provided in Obsidian and other typestate-oriented languages, can be made more compelling for programmers. For example, when pairs of objects have states that are coupled, the language could provide features to make representing and using this relationship convenient. Likewise, the existence of risk compensation (Contributors, 2020) among programmers should be investigated in future studies.

A study that included testing and debugging would be more representative of real-world use. Also, the participants were new to the language they used in the study. In a study of experienced Obsidian and Solidity programmers, we hypothesize that the task completion time difference would diminish significantly but not disappear, and the Obsidian users would take more advantage of Obsidian’s security features.

13. Conclusion

As programming languages are tools for empowering programmers, empirical methods offer an opportunity for designers to provide evidence of the benefits of their work. For instance, in our study, we showed that ownership alone can be used effectively with a short training period, that assets can be used to detect bugs that would otherwise likely be inserted, and that typestate may need additional training or refinement to be used effectively by some programmers. Although few language designs have been evaluated in this way, our work shows that it is possible to empirically evaluate a novel language, to both support hypotheses of usability as well as identify areas for potential improvement. We also hope that our findings of usability for the less-common type system features we analyzed will lead to more adoption of safer, more sophisticated type systems in future languages.

Acknowledgements.
We would like to thank the many anonymous participants in our experiments. This material is based upon work supported by grants from the National Science Foundation (http://dx.doi.org/10.13039/100000001), by the U.S. Department of Defense, and by Ripple (https://www.ripple.com). In addition, the first author is supported by an IBM PhD Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

References