A formalisation of the SPARC TSO memory model for multi-core machine code

06/24/2019 · Zhé Hóu, et al.

SPARC processors have many applications in mission-critical industries such as aviation and space engineering. Hence, it is important to provide formal frameworks that facilitate the verification of hardware and software that run on or interface with these processors. This paper presents the first mechanised SPARC Total Store Ordering (TSO) memory model which operates on top of an abstract model of the SPARC Instruction Set Architecture (ISA) for multi-core processors. Both models are specified in the theorem prover Isabelle/HOL. We formalise two TSO memory models: one is an adaptation of the axiomatic SPARC TSO model, the other is a novel operational TSO model which is suitable for verifying execution results. We prove that the operational model is sound and complete with respect to the axiomatic model. Finally, we give verification examples with two case studies drawn from the SPARCv9 manual.


1 Introduction

As multi-core processors prevail in computers, it is important to provide a formal specification of the instruction set architecture (ISA) and of the weak memory model, which together establish the precise principles of concurrent low-level programs and the contract between hardware and software. The ISA provides the semantics of instructions and processor operations, and it is essential in the formal verification of the correctness and security of micro-kernels [17, 14]. Weak memory behaviour is particularly important for low-level system code such as synchronisation libraries, concurrent data structures, and concurrent program compilers [30]. The main purpose of such a specification is to [31]

“Allow hardware designers and programmers to work independently, while still ensuring that any program will work as intended on any implementation.”

Sindhu and Frailong also point out that a specification should be formal so that conformance to the specification can be verified at some level [31]. Interactive theorem proving allows one to specify theories in rigorous mathematics and logic, and to reason about the specification with machine-assisted tools. Deductive verification methods used in theorem provers enable the verification of complex infinite-state systems, where automatic techniques such as model checking struggle. As a result, formal verification projects such as the renowned seL4 [17] and CertiKOS [14] rely on theorem provers and mechanised models to provide a higher level of confidence that the formalisation is correct. In our context, “formal” means that the model is not only specified in mathematics, but also mechanised in a theorem prover.

State-of-the-art ISA models cover different architectures such as Intel, AMD, SPARC, and PowerPC (see Section 2). Some of these formalisations also include weak memory models for multi-core architectures, but as far as we are aware, there is no formalisation of a weak memory model for the SPARC ISA. The multiprocessor SPARC architecture is adopted by the European Space Agency (ESA), which develops SPARC-based LEON multi-core processors for its spacecraft in critical missions [6]. In order to formally verify concurrent software running on top of these CPUs down to the lowest layers of the execution stack, it is necessary to formalise the SPARC ISA and its weak memory model. To assist with the verification tasks, we need a model that (1) supports the SPARC ISA for multi-core processors, and (2) is formalised in a theorem prover. We focus on the SPARC TSO memory model since the critical software in our application uses TSO, avoiding the additional program complexity that PSO would entail. This work addresses the above problems and serves as a case study, driven by our specific needs, for the verification community.

We build upon the single-core SPARCv8 ISA model of Hóu et al. [16], which has been tested for correctness against a LEON3 simulation FPGA board, and develop a new SPARC ISA model for multi-core processors. The new ISA model abstracts the detailed operational semantics of the SPARCv8 ISA model into more general semantics while retaining the same operations in successful executions; therefore, the previous experimental validation still holds for successful executions of the abstracted semantics. The new semantics is more suitable as an interface for memory operations. The new ISA model is also an adaptation, because various considerations for multi-core processors are taken into account. We drop the suffix “v8” for the abstract ISA model because we extend the SPARCv8 model with features and instructions from the SPARCv9 architecture. Specifically, we include the SPARCv9 atomic load-store instruction Compare and Swap (CASA), which is not present in the SPARCv8 manual but is implemented on certain SPARCv8 processors. CASA is crucial for symmetric multi-processing (SMP).

On top of the abstract ISA model, we give two TSO models: the first is a formalisation of the axiomatic SPARC TSO model [31, 32]; the second is an operational TSO model which can be used to reason about program executions. The integration of instruction semantics and a weak memory model is essential for supporting formal reasoning about concurrent programs, but this problem is sometimes neglected in the weak memory literature [30]. We show that the operational TSO model is sound and complete with respect to the axiomatic model: every execution given by the operational model conforms with the axioms, and every sequence of memory operations that conforms with the axioms can be executed by the operational model. Finally, we give two case studies, based on the “Indirection Through Processors” program and on a spin lock with CASA, both drawn from the SPARCv9 manual, to exemplify verification of both the order of memory operations and the results of execution. All the models and proofs in this paper are formalised in Isabelle/HOL (available at http://securify.sce.ntu.edu.sg/MicroVer/SparcTSO/TSO.zip).

2 Related Work

An essential part of our work is the formal model of SPARC instruction semantics. There has been much work on formalising various instruction set architectures, but most of it focuses on instruction-level modelling rather than memory operations. A model of the SPARCv9 architecture is given by Santoro et al. [28], but their model is not formalised in a theorem prover. Hóu et al. [16] formalise the ISA for the integer unit of SPARCv8 single-core processors; their model can be exported for execution, and they have proven an instruction-level noninterference property for the SPARCv8 architecture. Fox et al. give various models for ARM [8, 11] and also build a framework for specifying and verifying ISAs [9, 10]. Goel et al. have a framework for building ISA models in ACL2 [12]. There are also formalisations of compilers for PowerPC, ARM, and IA32 processors [18, 19], and of the JVM [20, 3]. Our ISA model differs from the above work in that we model multi-core processors.

There is a large body of literature on relaxed memory models, but most of it does not consider machine code semantics; here we only discuss the most closely related work. Memory models typically appear in two forms: axiomatic models and operational models. The axiomatic TSO memory model for SPARC is given by Sindhu and Frailong [31]. This model is used in the SPARCv8 manual [32] and is later referred to as the “golden memory model” [21]. Petri and Boudol [26, 4] give a comprehensive study of various weak memory models, including SPARC TSO, PSO, and RMO; they show that the store buffer semantics of TSO and PSO corresponds to their semantics of “speculations”. Gray and Flur et al. [13, 7] have established axiomatic and operational models for TSO, and proven their equivalence. Their work is also integrated with detailed instruction semantics for x86, IBM Power, ARM, MIPS, and RISC-V. They have developed a language called Sail for expressing sequential ISA descriptions with relaxed memory models, which can later be translated into Isabelle/HOL. However, the current set of modelled ISAs does not include any variant of the SPARC ISA. Although it would have been possible to rewrite the semantics of [16] in Sail, this language lacks some features that are important for our work: first, Sail does not provide some low-level system semantics such as exceptions and interrupts; second, the framework does not include an execution model for multi-core processors.

Besides the above, there are other tools and techniques developed for verifying memory operations. Notably, Hangal et al.'s TSOtool [15] is a program for checking the behaviour of the memory subsystem in a shared-memory multiprocessor against the TSO specification. Although verifying TSO compliance is an NP-complete problem, the authors give an incomplete polynomial-time algorithm that efficiently checks for memory errors. Companies such as Intel also actively work on tools for efficient memory consistency verification [27]; Roy et al.'s tool likewise runs in polynomial time and is deployed across multiple groups at Intel. A tool specialised for SPARC instructions was developed by Park and Dill [25].

There are also memory models formalised in theorem provers, such as Yang et al.'s axiomatic Itanium model in the Nemos framework, executable via SAT solvers and Prolog [34], and the Java Memory Model in Isabelle/HOL [2]. Alglave et al. formalised a class of axiomatic relaxed memory models in Coq [1]. Crary and Sullivan formalise a calculus for relaxed memory models in Coq [5]; their calculus is more relaxed than existing architectures, and their work is intended to serve as a programming language. More closely related is Owens, Sarkar, Sewell, et al.'s formalisation of the x86 ISA and memory models [29, 24, 30]: they formalise both the ISA and relaxed memory models such as x86-CC and x86-TSO in HOL, and show the correspondence between different styles of memory models. It is possible to translate Gray and Flur et al.'s work [13, 7] to Isabelle/HOL or Coq code; however, the resulting formal model would rely on the correctness of translation tools such as Lem [22], which adds another layer of complication to our verification tasks.

3 SPARC Abstract Instruction Set Architecture

This section presents our abstract SPARC ISA model, which is an abstraction and adaptation of that of Hóu et al. [16]. The previous model is suitable for reasoning about operations at the instruction level, but it is too complex and detailed for reasoning about memory operations. Hence we abstract their work into a more general model with big-step semantics and fewer SPARC-specific features. Besides the non-memory-access instructions in the integer unit, we focus on the following instructions for memory access: load (LD), store (ST), swap memory and register content (SWAP), and compare and swap (CASA). The latter two are atomic load-store instructions.

3.1 Mapping from Instructions to Memory Operation Blocks

To bridge the gap between the instruction semantics level and the memory operation level, we define the concept of program block as a list of instructions where there can be at most one instruction for memory access (load, store, etc.), and the memory access instruction must be the last instruction in the list. Intuitively, a block of instructions in the ISA model corresponds to a memory operation in the memory model, with an exception discussed below. We illustrate program blocks with the example in Figure 1.

Figure 1: Illustration of memory operation blocks.

Given a list of instructions for the processor to execute, we identify the memory access instructions (in bold font, such as LD, ST) and divide the list into several program blocks. In the example in Figure 1, there are instructions after the last memory access instruction; they form a block as well (block 6), although strictly speaking that block performs no memory operation. In the SPARC TSO axiomatic model [31], an atomic load-store instruction is viewed as two memory operations whose load part and store part have to be executed atomically. Correspondingly, we split an atomic load-store instruction, such as SWAP, into two parts and put them in two consecutive program blocks (blocks 3 and 4 in Figure 1). We assume that each program block can be uniquely identified, which gives rise to a mapping from an identifier (a natural number) to a program block. The latter is a tuple consisting of a list of instructions, the processor (a natural number) in charge of executing the code, and, optionally, the identifier of the load part of the corresponding atomic load-store instruction.
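To make the mapping concrete, here is a minimal Python sketch (not the Isabelle/HOL model; the mnemonics, tuple layout, and numbering scheme are our own assumptions) of how a flat instruction list can be split into program blocks, with atomic load-store instructions split into a load part and a store part:

    MEM_ACCESS = {"LD", "ST", "SWAP", "CASA"}   # memory-access mnemonics (illustrative)
    ATOMIC = {"SWAP", "CASA"}                   # atomic load-store instructions

    def to_program_blocks(instructions, proc):
        """Map block id -> (instruction list, processor, id of the atomic load part or None)."""
        blocks, current, next_id = {}, [], 0
        for ins in instructions:
            mnemonic = ins[0]
            if mnemonic in ATOMIC:
                # Split the atomic instruction: its load part closes the current block,
                # its store part forms the next block on its own.
                load_id = next_id
                blocks[load_id] = (current + [(mnemonic + "_LD",) + ins[1:]], proc, None)
                blocks[load_id + 1] = ([(mnemonic + "_ST",) + ins[1:]], proc, load_id)
                current, next_id = [], next_id + 2
            elif mnemonic in MEM_ACCESS:
                blocks[next_id] = (current + [ins], proc, None)  # block ends at the memory access
                current, next_id = [], next_id + 1
            else:
                current.append(ins)
        if current:                                              # trailing non-mem block (block 6 in Fig. 1)
            blocks[next_id] = (current, proc, None)
        return blocks

    example = [("OR", "r1", "r2", "r3"), ("LD", "r3", "r4"),
               ("ADD", "r4", "r5", "r6"), ("SWAP", "r6", "r7")]
    for bid, blk in to_program_blocks(example, proc=0).items():
        print(bid, blk)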

We distinguish the types of program blocks by the memory operation involved. Program blocks without memory operations are called non-mem blocks, whilst program blocks with memory operations are called memory operation blocks. A memory operation block is a load block when it contains an LD, and a store block when it contains an ST. An atomic load block contains either SWAP_LD or CASA_LD, whereas an atomic store block contains either SWAP_ST or CASA_ST.

In contrast to the SPARCv8 ISA model, here we lift processor execution to operate on program blocks, following the program order. A program order is the order in which a processor executes instructions [31]. Since program blocks can be identified by their identifiers, we define the program order as a mapping from each processor to a list of program block identifiers.

Given a program order and a processor, the program blocks in this program order are related by a before relation as follows:

Definition 1 (Program Order Before)

A program block is program order before another program block iff it appears earlier in the list of program block identifiers given by the program order for the processor.

We omit the program order and/or the processor in the notation of program order before when the context is obvious. Only program blocks issued by the same processor can be related by program order; thus the relation implicitly identifies a processor.
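As an illustration (in Python, with the program order represented, as an assumption, by a map from processor ids to lists of block identifiers), the before relation simply compares positions in a processor's list:

    # Hypothetical program order: processor id -> list of program block identifiers.
    program_order = {0: [0, 1, 2, 3, 4, 5, 6]}

    def program_order_before(po, proc, b1, b2):
        """b1 is program order before b2 iff b1 occurs earlier in po[proc]; undefined otherwise."""
        order = po.get(proc, [])
        if b1 not in order or b2 not in order:
            return None        # only blocks issued by the same processor are related
        return order.index(b1) < order.index(b2)

    print(program_order_before(program_order, 0, 2, 5))   # True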

We divide program execution into two levels: the processors execute instructions and issue memory operations in a given program order; the memory executes memory operations in its own memory order, which will be described in Section 4.

3.2 State and Instruction Semantics

The state of a multi-core processor is a tuple with the following components:

The control registers (per processor) include the Processor State Register (PSR), which records the current set of registers, whether the processor is in user mode or supervisor mode, etc.; the Program Counter (PC); and the Next Program Counter (nPC); among others. Formally, the control registers are a function that maps a processor and a control register to the value held by that register (a 32-bit word).

The general registers (per processor) are formally a function that maps a processor and a register address (a 32-bit word) to the value of that register. SPARC instructions often use three general registers: two source registers, referred to as rs1 and rs2, and a destination register, referred to as rd. For instance, the addition instruction takes the two values in rs1 and rs2 and stores the sum in rd. We shall refer to the value of a register simply by the register name when the processor and the state are clear from the context. SPARC fixes the value at register address 0 to be 0, so a read of the register at address 0 always returns 0.

A main memory is shared by all processors. Similar to the machine code semantics for x86 [30], we focus on word (32-bit) memory accesses only: we assume that each memory address points to a word, and that data are always well-aligned. Memory is a (partial) mapping from 32-bit addresses to 32-bit words.

Each processor has a local Boolean variable that records whether the next instruction should be skipped after executing a branching instruction. We refer to this variable as the annul flag.

All processors share a global variable which is a pair. The first component, which we call the atomic flag, is the id of the atomic load block when a processor is in the middle of executing the corresponding atomic load-store instruction, and is undefined otherwise. The second component temporarily stores the value of the destination general register rd used in atomic load-store instructions.

The state also records memory operations: formally, it maps the identifier of the program block of a memory operation to the address and the value of that operation. For instance, a store operation writes the value at the address, whereas a load operation loads the value from the address. For a given block identifier, the address and the value are initially undefined; they are computed during the execution of memory operation blocks.

Finally, a flag indicates whether the state is undefined, and a per-processor counter gives the index (in the list given by the program order) of the next memory operation to be issued by each processor. Formally, the counter is a function from processors to indices (natural numbers).
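The state components described above can be pictured with the following Python sketch; the field names and types are our own simplifications of the Isabelle/HOL record:

    from dataclasses import dataclass, field
    from typing import Dict, Optional, Tuple

    @dataclass
    class State:
        # Control registers: (processor, register name) -> 32-bit value (PSR, PC, nPC, ...).
        ctl_regs: Dict[Tuple[int, str], int] = field(default_factory=dict)
        # General registers: (processor, register address) -> 32-bit value; address 0 reads 0.
        gen_regs: Dict[Tuple[int, int], int] = field(default_factory=dict)
        # Shared main memory: word-aligned address -> 32-bit word (partial mapping).
        memory: Dict[int, int] = field(default_factory=dict)
        # Per-processor annul flag: skip the next instruction after a branch?
        annul: Dict[int, bool] = field(default_factory=dict)
        # Shared pair: id of the atomic load block in progress (or None), and the saved rd value.
        atomic_flag: Optional[int] = None
        atomic_rd: int = 0
        # Per memory operation block: its address and value, undefined until computed.
        op_addr: Dict[int, int] = field(default_factory=dict)
        op_val: Dict[int, int] = field(default_factory=dict)
        # Whether the state is undefined, and per-processor index of the next operation to issue.
        undefined: bool = False
        next_op_index: Dict[int, int] = field(default_factory=dict)

        def reg(self, proc: int, addr: int) -> int:
            return 0 if addr == 0 else self.gen_regs.get((proc, addr), 0)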

To provide consistency w.r.t. the memory model, we split the semantics of atomic load-store instructions into the load part and the store part. The processor executes them separately, but the memory model guarantees that their executions are “atomic”.

We give an example of the formalisation of the CASA instruction below. The SPARC manual [33] specifies the semantics of CASA as follows, where we adapt the setting from 64-bit registers in SPARCv9 to 32-bit registers in the SPARCv8 model: the CASA instruction compares register rs2 with the memory word pointed to by the address in rs1. If the values are equal, the value in register rd is swapped with the contents of the memory word pointed to by the address in rs1. If the values are not equal, the memory location remains unchanged, but the memory word pointed to by rs1 replaces the value in rd. We formalise the core of the load part as below, presented in pseudo-code:

Definition 2 (CASA Load)


     if rs2 = val then
         atomic_rd := rd; rd := val; op_addr := addr; op_val := atomic_rd
     else
         rd := val; op_addr := addr; op_val := val

Given a processor and the id of a CASA load block, we can obtain the value of rs2 in that processor, and the (addr, val) pair of the operation. When rs2 = val, we store rd in the temporary global variable atomic_rd, and write val into rd. We then store addr and atomic_rd as the address (op_addr) and the value (op_val) of the operation respectively. When rs2 ≠ val, we do not have to save rd, because the value to be stored back by the store part must be val itself. In this definition, addr is obtained from rs1, and val (the value at address addr) is obtained from Axiom Value of the TSO model, which is described in Section 4.1. The store part is given below:

Definition 3 (CASA Store)


     if rs2 = rd then op_addr := addr; op_val := atomic_rd

We check whether rs2 has the same value as rd, which corresponds to the condition rs2 = val in the load part, since the load part has written val into rd. If this is the case, we set the address and the value of the store operation to addr and atomic_rd respectively, where addr is the same as the address in the load part. Note that instruction semantics only covers processor execution, which does not update the memory; the memory write occurs in the store operation defined in the operational semantics of the TSO model, introduced in Section 4.2.
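The split semantics can also be pictured with a small Python sketch (a simplification of the pseudo-code above; the dictionary-based state, the parameter names, and the way the memory value val is supplied are our assumptions, and the atomicity of the load/store pair is guaranteed by the memory model rather than by this code):

    def casa_load(regs, op_addr, op_val, atomic, rs1, rs2, rd, op_id, val):
        """Load part of CASA. regs: register file; val: the memory word at address regs[rs1],
        supplied by Axiom Value; atomic: dict holding the shared saved-rd value."""
        addr = regs[rs1]
        if regs[rs2] == val:
            atomic["rd"] = regs[rd]              # save old r[rd]; the store part writes it back
            regs[rd] = val                       # r[rd] receives the memory word
            op_addr[op_id], op_val[op_id] = addr, atomic["rd"]
        else:
            regs[rd] = val                       # memory unchanged, r[rd] still gets the word
            op_addr[op_id], op_val[op_id] = addr, val

    def casa_store(regs, op_addr, op_val, atomic, rs2, rd, st_id, ld_id):
        """Store part of CASA: records what the later memory commit will write."""
        if regs[rs2] == regs[rd]:                # same condition as rs2 = val in the load part
            op_addr[st_id], op_val[st_id] = op_addr[ld_id], atomic["rd"]

    # Example: the word at the CASA address holds 0 and rs2 holds 0, so the swap succeeds.
    regs = {"rs1": 0x100, "rs2": 0, "rd": 7}     # rd holds the (hypothetical) process id 7
    op_addr, op_val, atomic = {}, {}, {"rd": 0}
    casa_load(regs, op_addr, op_val, atomic, "rs1", "rs2", "rd", op_id=0, val=0)
    casa_store(regs, op_addr, op_val, atomic, "rs2", "rd", st_id=1, ld_id=0)
    print(regs["rd"], op_addr, op_val)           # rd = 0; block 1 will write 7 at 0x100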

3.3 Processor Execution

Processor execution includes three stages: fetch, decode, and dispatch. Since this model is built for analysing memory operations, we assume that there is a given program order from which we fetch the instructions. This is similar to the concept of “run skeletons” in the x86 weak memory models [29]. Decoding facilities are provided by the SPARCv8 ISA model [16]. Dispatching and executing the instructions require more care because we will be executing blocks (lists) of instructions at a time. For simplicity we only discuss three interfaces in prose here.

Definition 4 (Execution up to a memory operation block)

Given a processor, a program order, a program block map, a memory operation block identifier, and a state, the function executes the program blocks in the list given by the program order for that processor, from the position given by the processor's next-operation index up to the position of the given block (inclusive), and returns the state after this execution. We abbreviate the application of this function when the context is clear.

This function is used for executing store blocks, atomic store blocks, and non-mem blocks. Load and atomic load blocks require more execution steps; we define the following two functions, with the same parameters, to handle them:

Definition 5 (Execution up to the last instruction of a block)

The function executes as above but stops at the instruction before the last one in the given block, and returns the state after this execution.

Take Figure 1 for example: if the given block is the atomic load block (block 3), then the function executes up to the OR instruction and stops without executing the SWAP_LD instruction.

Definition 6 (Execution of the last instruction of a block)

The function, which takes an additional 32-bit word value as input, executes the last instruction in the given block and returns the state after this execution.

The function essentially executes the load (or atomic load) instruction by loading the value from memory. Again, take Figure 1 for example: when the given block is block 3, this function executes the SWAP_LD instruction. Note that we do not need the extra input value for executing store instructions, because both the address and the value of a store can be pre-computed from the instruction code. For load instructions, however, only the address can be pre-computed: we need to execute until the instruction before the load instruction, then invoke the memory model to determine the value to be loaded, which is why executing a load (or atomic load) block takes two steps.

In this setup, when executing a memory load operation, all previous memory operations in the program order have been executed, and their corresponding addresses and values have been updated in the state. This allows us to directly use the SPARC TSO Axiom Value (cf. Section 4.1) to obtain the value of the load operation.
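A condensed sketch of this two-step load execution (the callbacks stand in for the interface functions of Definitions 5 and 6 and for Axiom Value; all names are our own):

    def execute_load_block(state, proc, po, pbm, op_id, exec_pre, axiom_value, exec_last):
        """Execute a load (or atomic load) block in two steps:
        exec_pre    - run preceding blocks and this block's instructions except the load itself;
        axiom_value - the memory model's oracle for the value to be loaded by op_id;
        exec_last   - run the load instruction with that value."""
        state = exec_pre(state, proc, po, pbm, op_id)    # addresses of earlier operations are now known
        value = axiom_value(state, proc, op_id)          # most recent store to the load's address
        return exec_last(state, proc, po, pbm, op_id, value)

    # Dummy stubs, just to show the control flow.
    pre  = lambda s, p, po, pbm, i: s
    aval = lambda s, p, i: 42
    last = lambda s, p, po, pbm, i, v: {**s, "op_val": {**s.get("op_val", {}), i: v}}
    print(execute_load_block({}, 0, {}, {}, 3, pre, aval, last))    # {'op_val': {3: 42}}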

4 SPARC TSO Memory Model

Details of the SPARC TSO model can be found in [31, 32]. This section formalises the axiomatic model in Isabelle/HOL. More importantly, we give a novel operational model, and show that the operational model corresponds to the axiomatic model.

4.1 Axiomatic TSO Model

The complete semantics of TSO are captured by six axioms [31, 32], which specify the ordering of memory operations. The semantics of loads and stores to I/O addresses are implementation-dependent and are not covered by the TSO model; the SPARCv8 manual only specifies that loads and stores to I/O addresses must be strongly ordered among themselves. We adapt these axioms to our abstract SPARC ISA model and formalise them in Isabelle/HOL. Similar to the x86-TSO model [24], we focus on data memory; thus our memory model does not consider instruction fetch and flush.

Besides the program order before relation (cf. Definition 1), the axiomatic model also relies on a before relation over operations in memory order, which is the order in which the memory executes load and store operations. Given a partial/final memory execution represented by a sequence x of executed operation ids, the before relation over two operations op1 and op2 in memory order is defined below as a partial function from the pair (op1, op2) to a Boolean, where we write op ∈ x when op is in the sequence x:

Definition 7 (Memory Order Before)


     if op1 ∈ x and op2 ∈ x then
         (if op1 is before op2 in x then True else False)
     else if op1 ∈ x then True
     else if op2 ∈ x then False
     else undefined

We may loosely refer to a memory order by the corresponding partial/final memory execution sequence x, and we omit x in the notation when the context is clear. Note that any memory operation id in the sequence of executed operations x has already been executed by the processor, and thus its address and value are defined in the current state.
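A minimal Python rendering of Definition 7 (our own encoding of the partial function as True/False/None):

    def memory_order_before(x, op1, op2):
        """Partial function: True/False when determined by the executed sequence x, else None."""
        if op1 in x and op2 in x:
            return x.index(op1) < x.index(op2)   # both executed: compare positions in x
        if op1 in x:
            return True                          # op1 executed, op2 not yet: op1 is before op2
        if op2 in x:
            return False                         # op2 executed, op1 not yet
        return None                              # neither executed: undefined

    x = [0, 2, 1]                                # memory has executed blocks 0, 2, 1 in this order
    print(memory_order_before(x, 2, 1), memory_order_before(x, 1, 5), memory_order_before(x, 7, 8))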

The axiom Order states that in a final execution sequence, every pair of store operations is related by the memory order before relation. This axiom is formalised as below:

Definition 8 (Axiom Order)


If op1 and op2 are each either a store or an atomic store block, both op1 and op2 are in x, and op1 ≠ op2, then either op1 is memory order before op2 or op2 is memory order before op1.

The axiom Atomicity ensures that, for an atomic load-store instruction, the load part is executed by the memory before the store part, and that no other store operation is executed by the memory between the two.

Definition 9 (Axiom Atomicity)


If l and s are from the same instruction instance, l ≠ s, l is an atomic load block, and s is an atomic store block, then l is memory order before s, and for every store or atomic store block s' with s' ≠ l and s' ≠ s, either s' is memory order before l or s is memory order before s'.

The axiom Termination states that all store operations eventually terminate. We capture this by ensuring that after the execution is completed, every store operation that appears in the program list of some processor is in the sequence of executed operations. We formalise this axiom as follows:

Definition 10 (Axiom Termination)


If there exists a processor p such that an operation op appears in the program order of p, and op is a store or atomic store block, then op is in the final sequence x of executed operations.

The axiom Value states that the value of a load operation issued by a processor at an address is the value written by the most recent store to that address. The most recent store could be: (1) the most recent store to that address issued by the same processor, or (2) the most recent store to that address (issued by any processor) that has been executed by the memory.

Definition 11 (Axiom Value)


Let last_of denote a function that outputs the last element, in the order defined by memory order before, of a set of operation block ids. Let the first set contain every block s such that s is memory order before the load l, s is a store or atomic store block, and the address of s is equal to the address of l; let the second set contain every block s' such that s' is program order before l, s' is a store or atomic store block, and the address of s' is equal to the address of l. The value to be loaded by l is the value of the output of last_of applied to the union of these two sets.

Intuitively, the output of last_of is the last element, in the order given by memory order before, of the union of two sets of block ids: the first set includes all the store operations that are before the load in the memory order and write values at the load's address; the second set includes all the store operations that are before the load in the program order and write values at that address. Therefore last_of returns the most recent store operation at that address in memory order. We refer to the value of this store as the value to be loaded for the operation based on Axiom Value.
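A sketch of Axiom Value in Python (the representations of the execution sequence x, the program order po, and the per-operation address/value tables follow the earlier sketches and are our assumptions):

    def axiom_value(x, po, proc, l, op_addr, op_val, is_store):
        """Axiom Value sketch: the value loaded by block l is the value of the most recent
        store to the same address, in memory order (x) or issued earlier by proc itself (po)."""
        addr, order = op_addr[l], po[proc]
        before_in_mem = [s for s in x if is_store(s) and op_addr.get(s) == addr
                         and (l not in x or x.index(s) < x.index(l))]
        before_in_prog = [s for s in order if is_store(s) and op_addr.get(s) == addr
                          and order.index(s) < order.index(l)]
        candidates = set(before_in_mem) | set(before_in_prog)
        if not candidates:
            return None                          # no store to this address yet
        # "Last" w.r.t. memory order before: executed stores ordered by x, the processor's
        # own not-yet-executed stores come after them, ordered by program order.
        last = max(candidates, key=lambda s: (0, x.index(s)) if s in x else (1, order.index(s)))
        return op_val[last]

    # Example: processor 0 issued store 0 (addr 0x10, value 1) but it has not reached memory;
    # the memory executed store 2 (by another processor, value 9) at the same address.
    # The load (block 1) by processor 0 reads 1, because its own newer store wins.
    op_addr = {0: 0x10, 1: 0x10, 2: 0x10}
    op_val = {0: 1, 2: 9}
    print(axiom_value(x=[2], po={0: [0, 1]}, proc=0, l=1,
                      op_addr=op_addr, op_val=op_val, is_store=lambda s: s in (0, 2)))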

The axiom LoadOp requires that any operation issued after a load in the program order must be executed by the memory after the load. This is formalised as below:

Definition 12 (Axiom LoadOp)


If l is a load or atomic load block and l is program order before an operation op, then l is memory order before op.

The axiom StoreStore states that if a store operation is before another store operation in the program order, then the former is before the latter in the memory order.

Definition 13 (Axiom StoreStore)


If s1 and s2 are store or atomic store blocks and s1 is program order before s2, then s1 is memory order before s2.
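To see how these axioms rule sequences in or out, here is a small Python checker (our own simplification, covering Atomicity, LoadOp, and StoreStore over a candidate final sequence; Order is trivially satisfied by any sequence, and the test data mirrors the block layout we use for the case study in Section 5.1):

    def check_ordering_axioms(x, po, kind, atomic_pair):
        """x: candidate final execution sequence of block ids; po: processor -> list of block ids;
        kind: block id -> 'ld' | 'ald' | 'st' | 'ast'; atomic_pair: atomic store id -> its load id."""
        pos = {op: i for i, op in enumerate(x)}
        store = lambda o: kind[o] in ("st", "ast")
        load = lambda o: kind[o] in ("ld", "ald")
        # Atomicity: the load part is executed before the store part, with no other store in between.
        for s, l in atomic_pair.items():
            if s in pos and (l not in pos or pos[l] > pos[s]):
                return False
            if s in pos and l in pos:
                for o in x:
                    if store(o) and o not in (l, s) and pos[l] < pos[o] < pos[s]:
                        return False
        # LoadOp and StoreStore: program order constraints carried over into memory order.
        for order in po.values():
            for i, a in enumerate(order):
                for b in order[i + 1:]:
                    if a in pos and b in pos:
                        if load(a) and pos[a] > pos[b]:
                            return False          # LoadOp violated
                        if store(a) and store(b) and pos[a] > pos[b]:
                            return False          # StoreStore violated
        return True

    # Processor 1 stores twice, processor 2 loads then stores, processor 3 loads twice.
    kind = {0: "st", 1: "st", 2: "ld", 3: "st", 4: "ld", 5: "ld"}
    po = {1: [0, 1], 2: [2, 3], 3: [4, 5]}
    print(check_ordering_axioms([1, 0, 2, 3, 4, 5], po, kind, {}))   # False: StoreStore violated
    print(check_ordering_axioms([0, 1, 2, 3, 4, 5], po, kind, {}))   # True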

Figure 2: Rules for the operational TSO model.

4.2 Operational TSO Model

Compared with other operational memory models such as the x86-TSO model [30], our ISA model enables us to develop a more abstract operational memory model without using concrete components such as a store buffer, which effectively buffers the addresses and values of the most recent store operations. This alleviates the burden of modelling complicated operations and interactions between the processor and the store buffer, and results in a simple and elegant operational memory model. Our operational TSO model is defined via inference rules. An operation is a transition from the partial execution sequence and the state before the operation to the partial execution sequence and the state after the operation.

We shall use the following notational conventions. Each memory operation block has a type, abbreviated ld (load), ald (atomic load), st (store), ast (atomic store), or non (non-mem). A partial execution sequence is extended by appending the newly executed operation to it. A memory commit writes the value of an operation at its address in the memory of the given state. One auxiliary operation sets the atomic flag to a given block id in a state and returns the new state; another sets the flag back to undefined. Finally, when an operation is an atomic store, an auxiliary function returns the corresponding atomic load operation of the same instruction; this function is otherwise undefined.

The operational TSO model consists of four rules, which are given in Figure 2. The first rule, for load operations, has two premises: (1) the type of the operation is load; (2) every load operation that precedes it in the program order has been executed by the memory. The operation first executes all instructions in the program order up to, but excluding, the last instruction (which must be the load instruction) of the block, then uses Axiom Value to determine the value to be loaded, and finally executes the load instruction.

The rule for store operations requires that the atomic flag in the pre-state is undefined; that is, the memory is not in the middle of executing an atomic load-store operation. The rule also requires that every load or store operation preceding the store in the program order has been executed by the memory. Together with the corresponding premises of the load, atomic load, and atomic store rules, these requirements ensure that the axioms LoadOp and StoreStore are respected in execution. For instance, it is possible that a store is issued (by a processor) before a load but executed (by the memory) after the load; but it is not possible that a load is issued before a store yet executed after the store. The final step of the store operation commits the store in memory: it fetches the value and the address of the operation from the state, and writes the value at the address in the memory.

The premises of the atomic load rule can be read similarly. The final step of the atomic load operation sets the atomic flag to the id of the atomic load block. Accordingly, the atomic store rule requires that the memory has executed the atomic load part, recorded in the atomic flag, but has not yet executed the store part; the rule also ensures that the load part recorded in the flag is indeed the load part of the same instruction as the atomic store. The atomic store operation eventually sets the atomic flag back to undefined and commits the operation in the memory. The premises concerning the atomic flag ensure that the axiom Atomicity holds in execution.
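The four rules can be summarised with the following Python sketch; the guards correspond to our reading of the premises in Figure 2, and the helper callbacks (exec_block, exec_pre, exec_last, axiom_value, mem_commit, the done-predicates, and atomic_pair) stand in for the interfaces and notation introduced above and are assumptions of this sketch:

    def tso_step(x, s, op, kind, atomic_pair, prev_loads_done, prev_mem_ops_done,
                 exec_block, exec_pre, exec_last, axiom_value, mem_commit):
        """One transition (x, s) -> (x + [op], s') of the operational TSO model,
        or None if no rule applies to op."""
        if kind[op] == "ld" and prev_loads_done(op, x):
            s = exec_pre(s, op)                           # run the block up to the load instruction
            s = exec_last(s, op, axiom_value(s, x, op))   # load the value given by Axiom Value
            return x + [op], s
        if kind[op] == "st" and s["atomic_flag"] is None and prev_mem_ops_done(op, x):
            s = exec_block(s, op)                         # pre-compute the store's address and value
            return x + [op], mem_commit(s, op)            # write the value at the address in memory
        if kind[op] == "ald" and s["atomic_flag"] is None and prev_mem_ops_done(op, x):
            s = exec_pre(s, op)
            s = exec_last(s, op, axiom_value(s, x, op))
            return x + [op], dict(s, atomic_flag=op)      # remember the pending atomic load
        if (kind[op] == "ast" and op in atomic_pair
                and s["atomic_flag"] == atomic_pair[op] and prev_mem_ops_done(op, x)):
            s = exec_block(s, op)
            s = dict(s, atomic_flag=None)                 # the atomic load-store pair is complete
            return x + [op], mem_commit(s, op)
        return None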

In addition to the rules for memory operations, to obtain the final result of processor execution we may need a rule that executes the block after the last memory operation (e.g., block 6 in Figure 1), if there is one. This rule is not related to the memory model because it does not involve memory operations, and it plays no role in the proofs in the remainder of this section.

4.3 Soundness and completeness of the operational model

We are now ready to present the main results of this work: the operational TSO model is sound and complete w.r.t. the TSO axioms. The previous subsection briefly discussed how the design of the operational rules respects axioms such as LoadOp, StoreStore, and Atomicity. Axiom Value trivially holds in the operational model because the load rule directly uses Axiom Value to obtain the load result. Axiom Termination is satisfied by the construction of the execution witness sequences, because the sequence part of the final witness is guaranteed to contain all the store operations, which means that the execution of these operations has been completed by the memory. Axiom Order holds because all the executed store operations are recorded in a list, which means that every pair of them is ordered. The formal proof of the correspondence between the axiomatic model and the operational model is rather involved, and here we only discuss the results. Interested readers can check the Isabelle/HOL formalisation and proofs (an appendix with the proofs is available at http://securify.sce.ntu.edu.sg/MicroVer/SparcTSO/appendix.pdf) for more details.

Theorem 4.1 (Soundness)

Every memory operation sequence generated by the operational model satisfies the axioms in the axiomatic model.

Theorem 4.2 (Completeness)

Every memory operation sequence that satisfies the axioms in the axiomatic model can be generated by the operational model.

5 Case Studies

With the above work, we can now formally reason about concurrent machine code. The axiomatic model can be used to reason about the order of memory operations, while the operational model is better suited for reasoning about properties of the execution flow. We present two case studies drawn from examples in the SPARCv9 manual [33]. We use the terms process and processor interchangeably. See Owens's work [23] for a semantic foundation for reasoning about programs under TSO-like relaxed memory models.

5.1 Indirection Through Processors

Processor   Blocks
1           0, 1
2           2, 3
3           4, 5
Table 1: “Indirection Through Processors”.

The “Indirection Through Processors” program is taken from Figure 46 of the SPARCv9 manual [33]. This example reflects the TSO property that causal update relations are preserved. The original program involves three processors, and each processor issues two memory operations. A memory operation is given in an “instruction-like” style, e.g., a store is written as a value being stored into an address of the memory. Unfortunately, in real SPARC store instructions the value to be stored and the memory address must be taken from registers, so we need to add a few instructions that initialise the registers for this example to work. Our formalised “Indirection Through Processors” example is shown in Table 1. The global register %g0 in SPARC always contains 0. The first instruction in block 0 adds an immediate value to %g0 and puts the result in a register; the ST in block 0 then stores that value at a memory address, say A. The ST in block 1 stores a value at another address B. The LD in block 2 loads the value at address B into a register, and block 3 then stores the value in that register at a third address C. Finally, processor 3 loads the values at addresses A and C into two of its registers.

Reasoning about memory operation order.

It is intuitive to use the axiomatic TSO model to reason about the order of memory operations. For the program in Table 1, the SPARCv9 manual gives some example sequences of memory operations that are allowed under TSO, and an example sequence that is not allowed under TSO. The disallowed sequence executes the two store operations of processor 1 in reversed order: one of them must be before the other in the program order given by Table 1, whereas the sequence places them in the opposite memory order, which falsifies the axiom StoreStore.

Alternatively, the completeness of the operational TSO model enables us to use the operational model to reason about the possible next step operations. The above reasoning can be confirmed by our operational model in the lemma below:

Lemma 1

Lemma 1 states that, given a partial execution sequence that contains only an initialisation step in which memory addresses are set to 0 and registers are set to 0, the later of processor 1's two store blocks in Table 1 cannot be the first memory operation to be executed.

Reasoning about execution result.

Besides eliminating illegal executions, one can also use our operational model to reason about the results of legal executions. For instance, the SPARCv9 manual lists a particular sequence as a legal execution under TSO. For simplicity, here we only show that after a partial execution of this sequence, the destination register of one of the load operations contains the value previously stored by processor 1 at the corresponding address. This shows that a processor can observe the memory updates made by other processors. This is formalised in the following lemma:

Lemma 2

The right-hand side of the implication means that, in the resulting state, the general register in question contains the value stored by processor 1. The proof of an execution result usually involves a “simulation” of the execution using the abstract ISA model and the operational TSO model. For this example, we start from the initial witness and prove a series of lemmas about the execution witnesses of the intermediate execution steps. It is then straightforward to complete this series of proofs and obtain the result of a final execution.

5.2 Spin Lock with Compare and Swap

Lock
retry:
  mov   ID, %l0
  cas   [lock], %g0, %l0
  tst   %l0
  be    out
  nop
loop:
  ld    [lock], %l0
  tst   %l0
  bne   loop
  nop
  ba,a  retry
out:
  code in critical region
Unlock
  st    %g0, [lock]
(a) Spin lock using CASA.
Processor   Blocks
1           0, 1, 5
2           2, 3, 6
3           4
(b) A fragment of the formalised spin lock code.
Figure 3: The spin lock example.

Section J.6 of the SPARCv9 manual [33] gives an example of a spin lock implemented using the CASA instruction; the code is shown in Figure 3(a). Note that the code in Figure 3(a) is in synthetic instruction format; the SPARCv8/v9 manuals provide a straightforward mapping from this format to the SPARC instruction format, which is what our ISA model supports. For instance, in the retry fragment, the first instruction (mov) corresponds to an arithmetic instruction that adds the ID of the current process to %g0 (which is always 0) and stores the result in the destination register; after executing this line, the destination register contains the ID of the current process. The second line is the CASA instruction. It checks whether the memory word at the lock address is equal to the value in %g0 (which must be 0), and swaps the word at the lock address with the destination register when the check is positive. Otherwise, the word at the lock address is copied into the destination register. Therefore, when no process holds the lock, the word at the lock address is 0, and after executing the second line the destination register contains 0 and the lock address contains the ID of the current process. On the other hand, when the lock is held by another process, after executing CASA the lock address is unchanged and the destination register contains the ID of the process that holds the lock. The tst instruction corresponds to a comparison that checks whether the destination register equals 0. If it does, the program branches to out and starts executing the critical region. Otherwise, the program goes to loop and keeps reading the lock address until it contains 0.

We give the fragment of instructions before entering the critical region in Figure 3(b), and consider a concrete situation where two processes (processors) 1 and 2 are competing for the lock, and process 3 initialises the lock to 0. Assume that process 3 executes its store operation first for initialisation; assume also, without loss of generality, that the atomic load-store operation of process 1 is executed by the memory earlier than process 2's operations. We show that process 1 then enters the critical region; the case where process 2's atomic load-store is executed earlier by the memory is symmetric. In this example, the entry address of the critical region is computed relative to the address of the branch instruction be, by adding the branch displacement, sign extended and shifted left by two bits.
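A tiny Python sketch of this race (the compare-and-swap is made atomic here by construction, which is what the Atomicity axiom and the atomic flag guarantee in the model; the lock address and process ids are arbitrary):

    def cas(mem, addr, expected, new):
        """Atomic compare-and-swap on the lock word: returns the old word and
        writes 'new' only if the old word equals 'expected'."""
        old = mem[addr]
        if old == expected:
            mem[addr] = new
        return old

    LOCK = 0x100
    mem = {LOCK: 0}                      # process 3 has initialised the lock to 0
    # Without loss of generality, process 1's CASA reaches the memory first.
    got1 = cas(mem, LOCK, expected=0, new=1) == 0    # True: process 1 enters the critical region
    got2 = cas(mem, LOCK, expected=0, new=2) == 0    # False: process 2 spins on the loop
    print(got1, got2, mem[LOCK])                     # True False 1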

The proof uses a mixture of the techniques in the previous subsection to obtain valid memory operation sequences and reason about the results. We omit the intermediate steps and show the final lemma below:

Lemma 3

The right-hand side of the implication shows that the nPC (next program counter) of processor 1 is the entry point of the critical region, while the nPC of processor 2 points to an instruction that will lead processor 2 to the loop in Figure 3(a).

6 Conclusion and Future Work

This paper gives an abstraction of the SPARCv8 ISA model in Isabelle/HOL [16]. The new model is suitable for formal modelling and verification at the memory operation level. We also extend the ISA model with semantics for the SPARCv9 instruction Compare and Swap, which is useful in concurrent programs. The more abstract ISA model splits the semantics for atomic load-store instructions into two parts: the load part and the store part, which correspond to the operations in the memory model.

On top of the abstract ISA model, we formalise the SPARC TSO axiomatic memory model in Isabelle/HOL. This model is useful for reasoning about the order of memory operations. We also give a novel operational TSO memory model as a system that consists of four rules. We show that the operational TSO model is sound and complete with respect to the axiomatic model. Finally, we demonstrate the use of our memory models with two examples in the SPARCv9 manual.

All the models and proofs in this paper are formalised in Isabelle/HOL. The abstract SPARC ISA model measures 1960 lines of code; the two memory models together with the soundness and completeness proofs constitute 4753 lines; and the case studies take up 1750 lines.

One of our next steps is to generate executable code from our operational TSO model and conduct experiments against real hardware. One can view this as a “validation” step. However, our understanding is that the TSO axiomatic model came as a part of the SPARCv8 manual before the implementation of the actual hardware; thus the TSO axiomatic model should be seen as a standard with which the hardware must comply, rather than the other way around. A better validation is therefore to show that our formalisation of the TSO axiomatic model is consistent with the definitions in the SPARCv8 manual, which is easy to verify.

Our ongoing work is the development of a Hoare-style logic for SPARC machine code. The current framework, which includes the abstract ISA model and the memory models, provides the foundation for the verification of concurrent machine code. However, if a program involves complex control flow with branches and loops, it is tedious to use the current models to reason about it; a Hoare-style logic is much desired to make the reasoning task easier. We envision that this new work will make it easier to prove properties such as reachability, safety, and non-interference.

References

  • [1] Alglave, J., Maranget, L., Sarkar, S., Sewell, P.: Fences in Weak Memory Models, pp. 258–272. Springer Berlin Heidelberg (2010)
  • [2] Aspinall, D., Ševčík, J.: Formalising Java’s Data Race Free Guarantee, pp. 22–37. Springer Berlin Heidelberg (2007)
  • [3] Atkey, R.: CoqJVM: An executable specification of the Java virtual machine using dependent types. In: TYPES. pp. 18–32. LNCS, Springer (2005)
  • [4] Boudol, G., Petri, G.: Relaxed memory models: an operational approach. In: Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009, Savannah, GA, USA, January 21-23, 2009. pp. 392–403 (2009)
  • [5] Crary, K., Sullivan, M.J.: A calculus for relaxed memory. In: Proceedings of the 42Nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. pp. 623–636. POPL ’15, ACM (2015)
  • [6] ESA: ESA LEON processor. http://www.esa.int/Our_Activities/Space_Engineering_Technology/LEON_the_space_chip_that_Europe_built (2017), [Online; accessed 19/06/2016]
  • [7] Flur, S., Gray, K.E., Pulte, C., Sarkar, S., Sezgin, A., Maranget, L., Deacon, W., Sewell, P.: Modelling the armv8 architecture, operationally: Concurrency and ISA. SIGPLAN Not. 51(1), 608–621 (Jan 2016)
  • [8] Fox, A.: Formal specification and verification of ARM6. In: Theorem Proving in Higher Order Logics, LNCS, vol. 2758, pp. 25–40. Springer (2003)
  • [9] Fox, A.: Directions in ISA specification. In: Interactive Theorem Proving, LNCS, vol. 7406, pp. 338–344. Springer Berlin Heidelberg (2012)
  • [10] Fox, A.: Improved tool support for machine-code decompilation in HOL4. In: Interactive Theorem Proving 2015. pp. 187–202 (2015)
  • [11] Fox, A., Myreen, M.O.: A trustworthy monadic formalization of the ARMv7 instruction set architecture. In: Interactive Theorem Proving. pp. 243–258 (2010)
  • [12] Goel, S., Hunt, W.A., Kaufmann, M.: Abstract stobjs and their application to ISA modeling. In: ACL2 2013. pp. 54–69 (2013)
  • [13] Gray, K.E., Kerneis, G., Mulligan, D., Pulte, C., Sarkar, S., Sewell, P.: An integrated concurrency and core-ISA architectural envelope definition, and test oracle, for IBM POWER multiprocessors. In: Proceedings of the 48th International Symposium on Microarchitecture. pp. 635–646. MICRO-48, ACM (2015)
  • [14] Gu, R., Shao, Z., Chen, H., Wu, X., Kim, J., Sjöberg, V., Costanzo, D.: Certikos: An extensible architecture for building certified concurrent os kernels. In: OSDI’16. pp. 653–669. OSDI’16 (2016)
  • [15] Hangal, S., Vahia, D., Manovit, C., Lu, J.Y.J.: TSOtool: A program for verifying memory systems using the memory consistency model. SIGARCH Comput. Archit. News 32(2), 114– (2004)
  • [16] Hou, Z., Sanán, D., Tiu, A., Liu, Y., Hoa, K.C.: An executable formalisation of the SPARCv8 instruction set architecture: A case study for the LEON3 processor. In: FM 2016: Formal Methods - 21st International Symposium, 2016, Proceedings. pp. 388–405 (2016)
  • [17] Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., Winwood, S.: seL4: Formal verification of an OS kernel. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. pp. 207–220. ACM (2009)
  • [18] Leroy, X.: Formal certification of a compiler back-end, or: programming a compiler with a proof assistant. In: In Proceedings. 33rd ACM Symposium on Principles of Programming Languages (2006)
  • [19] Leroy, X.: The CompCert C verified compiler. http://compcert.inria.fr/man/manual.pdf (2015), [Online; accessed 29/01/2016]
  • [20] Liu, H., Moore, J.S.: Executable JVM model for analytical reasoning: A study. In: Proceedings of the 2003 Workshop on Interpreters, Virtual Machines and Emulators. pp. 15–23. ACM (2003)
  • [21] Loewenstein, P., Chaudhry, S.: Multiprocessor memory model verification. In: Proc. Automated Formal Methods. FLoC workshop (2006)
  • [22] Mulligan, D.P., Owens, S., Gray, K.E., Ridge, T., Sewell, P.: Lem: reusable engineering of real-world semantics. In: Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming. pp. 175–188 (2014)
  • [23] Owens, S.: Reasoning about the implementation of concurrency abstractions on x86-tso. In: Proceedings of the 24th European Conference on Object-oriented Programming. pp. 478–503. ECOOP’10 (2010)
  • [24] Owens, S., Sarkar, S., Sewell, P.: A Better x86 Memory Model: x86-TSO, pp. 391–407. Springer Berlin Heidelberg (2009)
  • [25] Park, S., Dill, D.L.: An executable specification, analyzer and verifier for rmo (relaxed memory order). In: Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms and Architectures. pp. 34–41. SPAA ’95, ACM (1995)
  • [26] Petri, G.: Operational semantics of relaxed memory models (2010), thesis
  • [27] Roy, A., Zeisset, S., Fleckenstein, C.J., Huang, J.C.: Fast and Generalized Polynomial Time Memory Consistency Verification, pp. 503–516. Springer Berlin Heidelberg (2006)
  • [28] Santoro, A., Park, W., Luckham, D.: SPARC-V9 architecture specification with Rapide. Tech. rep., Stanford, CA, USA (1995)
  • [29] Sarkar, S., Sewell, P., Nardelli, F.Z., Owens, S., Ridge, T., Braibant, T., Myreen, M.O., Alglave, J.: The semantics of x86-CC multiprocessor machine code. In: Proceedings of the 36th Annual ACM Symposium on Principles of Programming Languages. pp. 379–391. ACM (2009)
  • [30] Sewell, P., Sarkar, S., Owens, S., Nardelli, F.Z., Myreen, M.O.: X86-tso: A rigorous and usable programmer’s model for x86 multiprocessors. Commun. ACM 53(7), 89–97 (Jul 2010)
  • [31] Sindhu, P.S., Frailong, J.M., Cekleov, M.: Formal Specification of Memory Models, pp. 25–41. Springer US, Boston, MA (1992)
  • [32] SPARC: The SPARC architecture manual version 8. http://gaisler.com/doc/sparcv8.pdf (1992), [Online; accessed 27/10/2015]
  • [33] SPARC: The SPARC architecture manual version 9. https://cr.yp.to/2005-590/sparcv9.pdf (1994), [Online; accessed 12/06/2017]
  • [34] Yang, Y., Gopalakrishnan, G., Lindstrom, G., Slind, K.: Nemos: a framework for axiomatic and executable specifications of memory consistency models. In: 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings. (April 2004)