The new golden age of computer architecture relies on advances in the design and implementation of computer-aided design (CAD) tools that enhance productivity [10, 21]. While hardware generators have become much more powerful in recent years, the capabilities of verification tools have not improved at the same pace. This paper introduces fault (https://github.com/leonardt/fault), a domain-specific language (DSL) that aims to enable the construction of flexible and portable verification components, thus helping to realize the full potential of hardware generators.
Using flexible hardware generators [16, 1] drastically improves the productivity of the hardware design process, but simultaneously increases verification cost. A generator is a program that consumes a set of parameters and produces a hardware module. The scope of the verification task grows with the capabilities of the generator, since more sophisticated generators can produce hardware with varying interfaces and behavior. To reduce the cost of attaining functional coverage of a generator, verification components must be as flexible as their design counterparts. To achieve flexibility, hardware verification languages must provide the metaprogramming facilities found in hardware construction languages.
However, flexibility alone is not enough to match the power of generators; verification tools must also enable the construction of portable components. Generators facilitate the development of hardware libraries and promote the integration of components from external sources. Underlying the utility of these libraries is the ability for components to be reused in a diverse set of environments. The dominance of commercial hardware verification tools with strict licensing requirements presents a challenge in the development of portable verification components. To encourage the proliferation of verification libraries, hardware verification languages must design for portability across verification tools. Design for portability will also promote innovation in tools by simplifying the adoption of new technologies, as well as enable new verification methodologies based on unified interfaces to multiple technologies.
This paper presents fault, a domain-specific language (DSL) embedded in Python designed to enable the flexible construction of portable verification components. Because fault is an embedded DSL, its users can employ all of Python’s rich metaprogramming capabilities in the description of verification components. Integration with magma [15], a hardware construction language embedded in Python, is an essential feature of fault that enables full introspection of the hardware circuit under test. Because fault uses a staged metaprogramming architecture, its verification components are portable across a wide variety of open-source and commercial verification tools. A key benefit of this architecture is the ability to provide a unified interface to constrained random and formal verification, enabling engineers to reuse the same component in simulation and model checking environments.
fault is actively used by academic and industrial teams to verify digital, mixed-signal, and analog designs for use in research and production chips. This paper demonstrates fault’s capabilities by evaluating the runtime performance of different tools on a variety of applications ranging in complexity from unit tests of a single module to integration tests of a complex design. These experiments leverage fault’s portability by reusing the same source input across separate trials for each target tool.
We had three goals in designing fault: enable the construction of flexible test components through metaprogramming, provide portable abstractions that allow test component reuse across multiple target environments, and support direct integration with standard programming language features. The ability to metaprogram test components is a vital requirement for scaling verification efforts to cover the space of functionality utilized by hardware generators. Portability widens the target audience of a reusable component and enhances a design team’s productivity by enabling simple migration to different technologies. Integration with a programming language enables design teams to leverage standard software patterns for reuse as well as feature-rich test automation frameworks.
Figure 1 provides an overview of the system architecture. fault is a DSL embedded in Python, a widely used dynamic language with rich support for metaprogramming and a large ecosystem of libraries. fault is designed to work with magma [15], a hardware construction language embedded in Python that represents circuits as introspectable Python objects containing ports, connections, and instances of other circuits. While fault and magma separate the concerns of design and verification into separate DSLs, they are embedded in the same host language for simple interoperability. This multi-language design avoids the complexity of specifying and implementing a single general-purpose language without sacrificing the benefits of tightly integrating design and verification code.
To construct fault test components, the user first instantiates a Tester object with a magma circuit as an argument. The user then records a sequence of test actions using an API provided by the Tester class. Here is an example of constructing a test for a 16-bit Add circuit:
tester = Tester(Add16)
The poke action (method) sets an input value, the eval action triggers evaluation of the circuit (the effects of poke actions are not propagated until an eval action occurs), and the expect action asserts the value of an output. Attributes of the Add16 object refer to circuit ports by name.
fault’s design is based on the concept of staged metaprogramming [20]: the user writes a program that constructs another program to be executed in a subsequent stage. In fault, the first stage executes Python code to construct a test specification; the second stage invokes a target runtime that executes this specification. To run the test for the 16-bit Add, the user calls a single method and provides the desired target, for example tester.compile_and_run("verilator").
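The two stages can be illustrated with a minimal sketch in plain Python that does not depend on fault or magma. Stage one records actions in a list, and stage two replays them against a software model. The names MiniTester, run_on_model, and add16_model are hypothetical stand-ins chosen for illustration; they are not fault's classes or targets.

```python
from dataclasses import dataclass, field


@dataclass
class MiniTester:
    """Stage 1: record actions instead of executing them."""
    actions: list = field(default_factory=list)

    def poke(self, port, value):
        self.actions.append(("poke", port, value))

    def eval(self):
        self.actions.append(("eval",))

    def expect(self, port, value):
        self.actions.append(("expect", port, value))


def add16_model(inputs):
    """Hypothetical software model of a 16-bit adder."""
    return {"out": (inputs["in0"] + inputs["in1"]) & 0xFFFF}


def run_on_model(actions, model):
    """Stage 2: replay the recorded actions against a model."""
    inputs, outputs = {}, {}
    for action in actions:
        if action[0] == "poke":
            inputs[action[1]] = action[2]   # buffered until the next eval
        elif action[0] == "eval":
            outputs = model(inputs)         # propagate the buffered pokes
        elif action[0] == "expect":
            _, port, value = action
            assert outputs[port] == value, f"{port}: {outputs[port]} != {value}"
    return True


tester = MiniTester()
tester.poke("in0", 3)
tester.poke("in1", 2)
tester.eval()
tester.expect("out", 5)
passed = run_on_model(tester.actions, add16_model)
```

Because the recorded list is plain data, the same list could in principle be handed to a different interpreter or code generator, which is the property that makes a staged design portable.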
By applying staged metaprogramming, fault allows the user to leverage the full capabilities of the Python host language in the programmatic construction of test components. For example, a test can use a native for loop to construct a sequence of actions using the built-in random number library and integer type:
for _ in range(32):
    N = (1 << 16) - 1
    in0, in1 = random.randint(0, N), random.randint(0, N)
    tester.poke(Add16.in0, in0)
    tester.poke(Add16.in1, in1)
    tester.eval()
    tester.expect(Add16.out, (in0 + in1) & N)
Python for loops are executed during the first stage of computation and are effectively “unrolled” into a flat sequence of actions. Other control structures such as while loops, if statements, and function calls are handled similarly.
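The unrolling can be made concrete with a small sketch that uses only the standard library. The record helper and the string port names are assumptions made for illustration; the point is that the loop leaves no trace in the recorded result.

```python
import random

actions = []  # flat action list built during the first stage


def record(kind, *args):
    """Append one action tuple; stands in for a tester method call."""
    actions.append((kind, *args))


random.seed(0)  # fixed seed so the recorded test is reproducible
N = (1 << 16) - 1
for _ in range(32):
    in0, in1 = random.randint(0, N), random.randint(0, N)
    record("poke", "in0", in0)
    record("poke", "in1", in1)
    record("eval")
    record("expect", "out", (in0 + in1) & N)

# Only 32 * 4 flat actions remain; the for loop itself is gone.
num_actions = len(actions)
```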
Python’s object introspection capabilities greatly enhance the flexibility of fault tests. For example, the core logic of the above test can be generalized to support an arbitrary width Add circuit by inspecting the interface:
# compute max value based on port width (length)
N = (1 << len(Add.in0)) - 1
in0, in1 = random.randint(0, N), random.randint(0, N)
tester.poke(Add.in0, in0)
tester.poke(Add.in1, in1)
tester.eval()
tester.expect(Add.out, (in0 + in1) & N)
This ability to metaprogram components as a function of the design under test is an essential aspect of fault’s design. It allows the construction of generic components that can be reused across designs with varying interfaces and behavior.
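As a rough illustration of a component written as a function of the design under test, the sketch below derives test vectors from port widths. Port, AddCircuit, make_adder, and random_add_vectors are hypothetical stand-ins, not magma or fault types; only the use of len() on a port mirrors the listing above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """Hypothetical stand-in for a circuit port; len() gives its width."""
    name: str
    width: int

    def __len__(self):
        return self.width


@dataclass(frozen=True)
class AddCircuit:
    """Hypothetical stand-in for a generated adder of some width."""
    in0: Port
    in1: Port
    out: Port


def make_adder(width):
    return AddCircuit(Port("in0", width), Port("in1", width), Port("out", width))


def random_add_vectors(circuit, count, seed=0):
    """Build (in0, in1, expected) triples sized to the circuit's ports."""
    rng = random.Random(seed)
    n = (1 << len(circuit.in0)) - 1  # max value from the port width
    vectors = []
    for _ in range(count):
        a, b = rng.randint(0, n), rng.randint(0, n)
        vectors.append((a, b, (a + b) & n))
    return vectors


vectors8 = random_add_vectors(make_adder(8), 16)
vectors32 = random_add_vectors(make_adder(32), 16)
```

The same function serves both the 8-bit and the 32-bit adder without modification, since the only width-dependent quantity is read from the interface.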
fault’s embedding in Python’s class system provides an opportunity for reuse through inheritance. For example, a design team could subclass the generic Tester class and add a new method to perform an asynchronous reset sequence:
class ResetTester(Tester):
    def __init__(self, circuit, clock, reset_port):
        super().__init__(circuit, clock)
        self.reset_port = reset_port

    # asynchronous reset, negative edge
Combining inheritance with introspection, we can augment the ResetTester to automatically discover the reset port by inspecting port types:
# constructor of a subclass of ResetTester
def __init__(self, circuit, clock):
    # iterate over interface to find reset (assumes exactly one)
    for port in circuit.interface.ports.values():
        if isinstance(port, AsyncResetN):
            reset_port = port
    super().__init__(circuit, clock, reset_port)
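The inheritance and introspection pattern can be exercised end to end with stand-in classes. Everything in this sketch is an assumption made for illustration: the minimal Tester, the Bit and AsyncResetN marker types, the subclass name AutoResetTester, the method name reset, and the high-low-high pulse used as the reset sequence. The sketch also checks the "exactly one reset port" assumption explicitly instead of relying on it.

```python
class AsyncResetN:
    """Hypothetical marker type for an active-low asynchronous reset port."""
    def __init__(self, name):
        self.name = name


class Bit:
    """Hypothetical marker type for an ordinary one-bit port."""
    def __init__(self, name):
        self.name = name


class Interface:
    def __init__(self, ports):
        self.ports = {p.name: p for p in ports}


class Circuit:
    def __init__(self, ports):
        self.interface = Interface(ports)


class Tester:
    """Minimal action recorder standing in for the generic tester."""
    def __init__(self, circuit, clock):
        self.circuit, self.clock, self.actions = circuit, clock, []

    def poke(self, port, value):
        self.actions.append(("poke", port.name, value))

    def eval(self):
        self.actions.append(("eval",))


class ResetTester(Tester):
    def __init__(self, circuit, clock, reset_port):
        super().__init__(circuit, clock)
        self.reset_port = reset_port

    def reset(self):
        # Pulse the active-low reset: high, low, high (assumed sequence).
        for value in (1, 0, 1):
            self.poke(self.reset_port, value)
            self.eval()


class AutoResetTester(ResetTester):
    def __init__(self, circuit, clock):
        # Discover the reset port by type instead of taking it as an argument.
        resets = [p for p in circuit.interface.ports.values()
                  if isinstance(p, AsyncResetN)]
        if len(resets) != 1:
            raise ValueError(f"expected exactly one reset port, found {len(resets)}")
        super().__init__(circuit, clock, resets[0])


clk, rst = Bit("CLK"), AsyncResetN("ARESETN")
tester = AutoResetTester(Circuit([clk, Bit("I"), rst, Bit("O")]), clk)
tester.reset()
```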
2.1 Frontend: Tester API
fault’s Python embedding is implemented by the Tester class, which provides various interfaces for recording test actions as well as methods for compiling and running tests using a specific target. By using Python’s class system to perform a shallow embedding [4], fault avoids the complexity of processing abstract syntax trees and simply uses Python’s standard execution to construct test components. As a result, programming in fault is much like programming with a standard Python library. This design choice reduces the overhead of learning the DSL and simplifies aspects of implementation such as error messages, but comes at the cost of limited capabilities for describing control flow. The fault frontend described in this paper focuses on implementation simplicity, but the system is designed to be easily extended with new frontends using alternative embeddings.
2.1.1 Action Methods
The Tester class provides a low-level interface for recording actions using methods. The basic action methods are poke (set a port to a value), expect (assert a port equals a value), step (invert the value of the clock), peek (read the value of a port), and eval (evaluate the circuit). The peek method returns an object containing a reference to the value of a circuit port in the current simulation state. Using logical and arithmetic operators, the user can construct expressions with this object and pass the result to other actions. For example, to expect that the value of the port O0 is equal to the inverse of the value of port O1, the user would write tester.expect(circuit.O0, ~tester.peek(circuit.O1)). The Tester also provides a print action to display simulation runtime information, including peeked values.
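One way to picture a peek object is as a node in a small expression tree built through Python operator overloading. The sketch below is an assumption about how such objects could be structured, not fault's implementation; Expr, Peek, Const, Op, and wrap are hypothetical names.

```python
class Expr:
    """Symbolic expression evaluated later, when the test actually runs."""
    def __invert__(self):
        return Op("~", self)

    def __and__(self, other):
        return Op("&", self, wrap(other))

    def __add__(self, other):
        return Op("+", self, wrap(other))


class Peek(Expr):
    """Reference to the runtime value of a port."""
    def __init__(self, port):
        self.port = port

    def __str__(self):
        return self.port


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return str(self.value)


class Op(Expr):
    def __init__(self, symbol, *operands):
        self.symbol, self.operands = symbol, operands

    def __str__(self):
        if len(self.operands) == 1:
            return f"({self.symbol}{self.operands[0]})"
        left, right = self.operands
        return f"({left} {self.symbol} {right})"


def wrap(value):
    """Lift plain Python values into constant expression nodes."""
    return value if isinstance(value, Expr) else Const(value)


inverse = ~Peek("O1")
masked_sum = (Peek("a") + Peek("b")) & 0xFF
```

Nothing is computed when the operators run; each one only builds a tree that a backend could later print into target code, as the str() methods here do.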
2.1.2 Metaprogramming Control Flow
Notably absent from the basic method interface described above are control flow abstractions. As noted before, standard Python control structures such as loops and if statements are executed in the first stage of computation as part of the metaprogram. However, there are cases where the user intends to preserve the control structure in the generated code, such as long-running loops that should not be unrolled at compile time or loops that are conditioned on dynamic values from the circuit state. For example, consider a while loop that executes until it receives a ready signal:
# Construct while loop conditioned on circuit.ready.
loop = tester._while(~tester.peek(circuit.ready))
loop.expect(circuit.ready, 0)  # executes inside loop
loop.step(2)  # executes inside loop
# Check final state after loop has exited
This logic could not be encoded in the metaprogram, because the metaprogram is evaluated before the test is run, and thus does not know anything about the runtime state of the circuit. To capture this dynamic control flow, the Tester provides methods for inserting if-else statements, for loops, and while loops. Each of these methods returns a new instance of the current Tester object which provides the same API, allowing the user to record actions corresponding to the body of the control construct. The Tester class provides convenience functions for using these control structures to generate common patterns, such as wait_on, wait_until_low, and wait_until_posedge.
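The idea that a control-flow method returns a tester for the loop body can be sketched in a few lines. Recorder is a hypothetical stand-in, the loop condition is kept as a plain string for brevity, and the final expect on a port named count is invented for the example.

```python
class Recorder:
    """Records actions; control-flow methods return a nested Recorder."""
    def __init__(self):
        self.actions = []

    def expect(self, port, value):
        self.actions.append(("expect", port, value))

    def step(self, count=1):
        self.actions.append(("step", count))

    def _while(self, condition):
        body = Recorder()  # exposes the same API as its parent
        self.actions.append(("while", condition, body.actions))
        return body


tester = Recorder()
loop = tester._while("!ready")   # condition stays symbolic until runtime
loop.expect("ready", 0)          # recorded into the loop body
loop.step(2)                     # recorded into the loop body
tester.expect("count", 4)        # recorded after the loop
```

The loop survives as a single node holding its own action list, so a backend can emit a real loop in the generated test bench instead of an unrolled sequence.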
2.1.3 Attribute Interface
While the low-level method interface is useful for writing complex metaprograms, simple components are rather verbose to construct. To simplify the handling of basic actions like poke and peek, the Tester object exposes an interface for referring to circuit ports and internal signals using Python’s object attribute syntax. For example, to poke the input port I of a circuit with value 1, one would write tester.circuit.I = 1. This interface supports referring to internal signals using a hierarchical syntax. For example, referring to port Q of an instance ff can be done with tester.circuit.ff.Q.
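An attribute interface of this kind can be built on Python's __getattr__ and __setattr__ hooks. The sketch below is one possible structure under that assumption, not fault's implementation; PortView and AttrTester are hypothetical names, and ports are identified by dotted strings.

```python
class PortView:
    """Attribute view that turns assignments into recorded pokes."""
    def __init__(self, actions, path=()):
        # Store internal state without triggering our own __setattr__.
        object.__setattr__(self, "_actions", actions)
        object.__setattr__(self, "_path", path)

    def __getattr__(self, name):
        # Only called for names not found normally: descend one level.
        if name.startswith("_"):
            raise AttributeError(name)
        return PortView(self._actions, self._path + (name,))

    def __setattr__(self, name, value):
        target = ".".join(self._path + (name,))
        self._actions.append(("poke", target, value))

    def peek(self):
        return ("peek", ".".join(self._path))


class AttrTester:
    def __init__(self):
        self.actions = []
        self.circuit = PortView(self.actions)


tester = AttrTester()
tester.circuit.I = 1             # records a poke of port I
q = tester.circuit.ff.Q.peek()   # hierarchical reference to instance ff
```

A limitation of this simple version is that a port named peek would be shadowed by the method of the same name; a fuller design would need a rule for such collisions.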
The Tester object provides methods for specifying assumptions and guarantees that are abstracted over constrained random and formal model checking runtime environments. An assumption is a constraint on input values, and a guarantee is an assertion on output values. Assumptions and guarantees are specified using Python lambda functions that return symbolic expressions referring to the input and output ports of a circuit. For example, the guarantee lambda a, b, c: (c >= a) and (c >= b) states that the output c is always greater than or equal to the inputs a and b. Here is an example of verifying a simple ALU using the assume/guarantee interface:
# Configuration sequence for opcode register
tester.circuit.opcode_en = 1
tester.circuit.opcode = 0  # opcode for add (+)
tester.step(2)  # two clock inversions, i.e. one full cycle
tester.circuit.opcode_en = 0
# Verify add does not overflow
tester.circuit.a.assume(lambda a: a < BitVector(32768))
tester.circuit.b.assume(lambda b: b < BitVector(32768))
tester.circuit.c.guarantee(
    lambda a, b, c: (c >= a) and (c >= b)
)
Note that this example demonstrates the use of poke and step to initialize circuits not only for constrained random testing, but also for formal verification.
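The constrained random half of this interface can be sketched with rejection sampling over plain Python integers. The 16-bit width, the software models add and sub, and the function names are assumptions made for illustration; the formal half would need an SMT solver and is not shown.

```python
import inspect
import random


def rejection_sample(assumptions, width, rng, max_tries=10_000):
    """Draw one input assignment that satisfies every assumption."""
    for _ in range(max_tries):
        candidate = {name: rng.randrange(1 << width) for name in assumptions}
        if all(check(candidate[name]) for name, check in assumptions.items()):
            return candidate
    raise RuntimeError("assumptions rejected every sample")


def check_guarantee(model, assumptions, guarantee, width=16, samples=100, seed=0):
    """Constrained random check; returns a counterexample dict or None."""
    rng = random.Random(seed)
    names = list(inspect.signature(guarantee).parameters)
    for _ in range(samples):
        values = rejection_sample(assumptions, width, rng)
        values.update(model(values))
        if not guarantee(*(values[n] for n in names)):
            return values
    return None


def add(v):
    return {"c": (v["a"] + v["b"]) & 0xFFFF}


def sub(v):  # models an injected bug: subtraction where add was intended
    return {"c": (v["a"] - v["b"]) & 0xFFFF}


assumptions = {"a": lambda a: a < 32768, "b": lambda b: b < 32768}
guarantee = lambda a, b, c: (c >= a) and (c >= b)

ok = check_guarantee(add, assumptions, guarantee)
bug = check_guarantee(sub, assumptions, guarantee)
```

With both inputs below 32768 a 16-bit add cannot overflow, so the guarantee holds for the correct model, while the buggy model violates it whenever a >= b > 0. A passing random run is still only evidence over the sampled inputs, which is the gap a model checker closes.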
2.2 Actions IR
In using the Tester API, users construct a sequence of Action objects that are used as an intermediate representation (IR) for the compiler. Basic port action objects, such as Poke and Expect, simply store references to ports and values. Control flow action objects, such as While and If, contain sub-sequences of actions, resulting in a hierarchical data structure similar to an abstract syntax tree. This view of the compiler internals reveals that the metaphor of recording actions is really an abstraction over the construction of program fragments.
2.3 Backend Targets
fault supports a variety of open-source and commercial backend targets for running tests. A target is responsible for consuming an action sequence, compiling it into a format compatible with the target runtime, and providing an API for invoking the runtime. Targets must also report the result of the test either by reading the exit code of running the process or processing the test output.
2.3.1 Verilog Simulation Targets
The fault compiler includes support for the open-source Verilog simulators verilator [17] and iverilog [23], plus three commercial simulators. To compile fault programs to a verilator test bench, the backend lowers the action sequence into a C++ program that interacts with the software simulation object produced by the verilator compiler. For iverilog and the commercial simulators, the backend lowers the action sequence into a SystemVerilog test bench that interacts with the test circuit through an initial block inside the top-level module. One useful aspect of the SystemVerilog backend is its handling of variations in the feature support of target simulators. For example, the commercial simulators use different commands for enabling waveform tracing and iverilog uses a non-standard API for interacting with files. Constrained random inputs are generated using rejection or SMT sampling [8].
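To give a feel for what lowering an action sequence into an initial block might involve, here is a minimal sketch. The Poke, Expect, and Eval dataclasses, the fixed unit delay used for eval, and the exact text emitted are assumptions made for illustration; the output of fault's backend is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Poke:
    port: str
    value: int


@dataclass
class Expect:
    port: str
    value: int


@dataclass
class Eval:
    pass


def lower(actions, delay=1):
    """Lower a flat action list to the text of an initial block."""
    lines = ["initial begin"]
    for action in actions:
        if isinstance(action, Poke):
            lines.append(f"    {action.port} = {action.value};")
        elif isinstance(action, Eval):
            lines.append(f"    #{delay};")  # let poked values propagate
        elif isinstance(action, Expect):
            lines.append(
                f"    if ({action.port} !== {action.value}) "
                f'$error("{action.port} mismatch");'
            )
        else:
            raise TypeError(f"unsupported action: {action!r}")
    lines += ["    $finish;", "end"]
    return "\n".join(lines)


testbench = lower([Poke("in0", 3), Poke("in1", 2), Eval(), Expect("out", 5)])
```

Handling simulator differences would then amount to swapping small pieces of this printer per target, while the action list itself stays unchanged.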
The CoreIR Symbolic Analyzer (CoSA) is a solver-agnostic SMT-based hardware model checker [13]. fault’s CoSA target relies on magma’s ability to compile Python circuit descriptions to CoreIR [7], a hardware intermediate representation. CoreIR’s formal semantics are based on finite-state machines and the SMT theory of fixed-size bitvectors [2]. fault action sequences are lowered into CoSA’s custom explicit transition system format (ETS) and combined with the CoreIR representation of the circuit to produce a model. CoSA allows the user to specify assumptions and properties, providing a straightforward lowering of fault assumptions and guarantees.
In addition to being able to test designs with Verilog simulators, fault supports analog and mixed-signal simulators. Compared to the traditional approach of maintaining separate implementations for digital and analog tests, this is a significantly easier way to write tests for mixed-signal circuits. Basic actions such as poke and expect are supported in the SPICE simulation mode, but they are implemented quite differently than they are in Verilog-based tests. Rather than emitting a sequential list of actions in an initial block, fault compiles poke actions into piecewise-linear (PWL) waveforms. Other actions, such as expect, are implemented by post-processing the simulation data.
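The translation from pokes to a piecewise-linear source can be sketched as follows. The supply voltage, the rise time, the timestamps attached to each poke, and the function names are assumptions made for illustration; the sketch also assumes that consecutive pokes are spaced further apart than the rise time.

```python
def pokes_to_pwl(pokes, vdd=1.0, t_rise=1e-10, initial=0):
    """Convert timed digital pokes into (time, voltage) PWL breakpoints.

    pokes: list of (time_in_seconds, bit) pairs sorted by time.
    """
    points = [(0.0, initial * vdd)]
    for time, bit in pokes:
        previous = points[-1][1]
        points.append((time, previous))            # hold until the poke
        points.append((time + t_rise, bit * vdd))  # then ramp to the new level
    return points


def to_spice(name, node, points):
    """Format breakpoints as a SPICE independent voltage source."""
    body = " ".join(f"{t:.4g} {v:.4g}" for t, v in points)
    return f"V{name} {node} 0 PWL({body})"


points = pokes_to_pwl([(1e-9, 1), (3e-9, 0)])
source = to_spice("in0", "in0", points)
```

Each poke contributes two breakpoints, one that holds the previous level and one that completes the ramp, which is why the whole stimulus has to be known before the simulation starts.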
For designs containing a mixture of SPICE and Verilog blocks, fault supports testing with a Verilog-AMS simulator. This mode is more similar to running SystemVerilog-based tests than SPICE-based tests. In particular, the test bench is implemented using a top-level SystemVerilog module, meaning that a wide range of actions are supported including loops and conditionals. This is a key benefit of using a Verilog-AMS simulator as opposed to a SPICE simulator.
To demonstrate fault’s capabilities, we evaluate the runtime performance of four different testing tasks from the domain of hardware verification. Each task highlights the utility of fault’s portability by reusing the same source input across separate trials of different targets. Due to licensing restrictions, we omit the names of the commercial simulators and replace them with generic names. The code to reproduce these experiments is available in the artifact (https://github.com/leonardt/fault_artifact/blob/master/README.md). Each experiment involves at least one open-source simulator, but reproducing all the results requires access to commercial simulators.
3.0.1 CGRA Processing Element Unit Tests
To demonstrate the capability of fault as a tool for writing portable tests for digital verification, Figure 2 reports the runtime performance of a subset of the lassen test suite. lassen [19] is an open-source implementation of a CGRA processing element that contains a large suite of unit tests using fault. Interestingly, we see comparable performance between verilator and commercial simulator 1, while commercial simulator 2 is consistently 5x slower than the others. One important property of the lassen test suite is that it generates a new test bench for each operation and input/output pair. This stresses a simulator’s ability to efficiently handle incremental changes, since each invocation involves a new top-level test bench file, but an unchanged design under test.
Figure 2: Runtime performance of a subset of the lassen test suite (columns: Test, verilator, commercial sim 1, commercial sim 2).
3.0.2 SRAM Array
To demonstrate the capability of fault as a tool for writing portable tests for analog and mixed-signal verification, we used OpenRAM to generate a 16x16 SRAM and then ran a randomized readback test of the design with SPICE, Verilog-AMS, and SystemVerilog simulators. OpenRAM [9] is an open-source memory compiler that produces a SPICE netlist and Verilog model.
The results shown in Figure 3(a) reveal two interesting trends. First, as expected, SPICE simulations of the array were significantly slower than Verilog simulations (100-1000x). Since fault allows the user to prototype tests with fast Verilog simulations, and then seamlessly switch to SPICE for signoff verification, our tool may reduce the latency in developing mixed-signal tests by orders of magnitude. Second, even for simulations of the same type, there was significant variation in the runtime of different simulators. SPICE simulation time varied by about 2x, while Verilog simulation time varied by about 10x. One of the advantages of using fault is that it is easy to switch between simulators to find the one that works best for a particular scenario.
We also looked at the amount of human effort required to use fault to implement this test as compared to the traditional approach of writing separate test benches for each simulation language. Since “human effort” is subjective, we used lines of code as a rough metric, as measured from handwritten implementations of the same test in SystemVerilog, Verilog-AMS, and SPICE. Figure 3(b) shows the results of this experiment: the fault-based approach used 136 LoC as compared to 412 LoC for the traditional approach, a reduction of about 3x (412/136 ≈ 3.03).
3.0.3 CGRA Integration Test Bench
To observe how fault scales to more complex testing tasks, we report numbers for an integration test of the Stanford Garnet CGRA [18]. This test generates an instance of the CGRA chip, runs a simulation that programs the chip for an image processing application, streams the input image data onto the chip, and streams the output image data to a file. The output is compared to a reference software model. Running the test took 232 minutes with the verilator target, 185 minutes with commercial simulator 1, and 221 minutes with commercial simulator 2. Leveraging the portability of fault-based tests could therefore save up to 47 minutes in testing time. These results were collected using the same machine as the SRAM experiment (see Figure 3(a)).
3.0.4 Unified Constrained Random and Formal
To demonstrate the utility of the assume/guarantee interface as a unified abstraction for constrained random and formal verification, we compared the runtime performance of using a constrained random target versus a formal model checker to verify the simple ALU property shown in Section 2.1.4. The first test evaluated the runtime performance of verifying correctness of the property on 100 constrained random inputs versus using a formal model checker. The formal model checker provided a complete proof of correctness using interpolation-based model checking [14] in 1.613 s, while constrained random verified 100 samples in 2.269 s (rejection sampling) and 2.799 s (SMT sampling). The second test injected a bug into the ALU by swapping the opcodes for addition and subtraction. The model checker found a counterexample in 1.154 s with bounded model checking [3], while constrained random failed in 2.947 s (rejection sampling) and 1.230 s (SMT sampling). In both cases the model checker was at least as fast as the constrained random equivalent while providing better coverage in the case of no bug. These results were collected using a MacBook Pro (13-inch, 2017, four Thunderbolt ports, macOS 10.15.2), with a 3.5 GHz dual-core Intel i7 CPU and 16 GB RAM.
4 Related Work
Prior work has used a generic API to Verilog simulators to build portability into testing infrastructures. The ChiselTest library [22] and cocotb [6] provide this capability for Scala and Python respectively. Using a generic API offers many of the same advantages with regard to test portability, simplicity, and automation, but the lack of multi-stage execution limits its applicability to more diverse backend targets such as SPICE simulations and formal model checkers. However, because these libraries interact with the simulator directly, they do allow user code to immediately respond to the simulator state, enabling interactive debugging through the host language. cocotb also presents a coroutine abstraction that naturally models the concurrency found in hardware simulation. Future work could investigate using cocotb as a runtime target for fault’s frontend, enabling a similar concurrent, interactive style of testing. Another interesting avenue of work would be to extend fault’s backend targets to support lowering cocotb’s coroutine abstraction.
The ethos of fault is to enable the construction of flexible, portable test components that are simple to integrate and scale for testing complex applications. The ability to metaprogram test components is essential for enabling verification teams to match the productivity of design teams using generators. fault’s portability enables teams to easily transition to different tools for different use cases, and enables the proliferation of reusable verification libraries that are applicable in a diverse set of tooling environments.
While fault has already demonstrated utility to design teams in academia and industry, there remain many opportunities to improve the system. Extending the assume/guarantee interface to support temporal properties and constraints and to leverage compositional reasoning [5] is essential for scaling the approach to more complex systems. Adding concurrent programming abstractions such as coroutines is essential for capturing the common patterns used in the testing of parallel hardware. Using a deep embedding architecture could significantly improve the performance of generating fault test benches.
The authors would like to thank the DARPA DSSoC (FA8650-18-2-7861) and POSH (FA8650-18-2-7854) programs, the Stanford AHA and SystemX affiliates, Intel’s Agile ISTC, the Hertz Foundation Fellowship, and the Stanford Graduate Fellowship for supporting this work.
[1] (2012-06) Chisel: constructing hardware in a Scala embedded language. In DAC Design Automation Conference 2012, pp. 1212–1221.
[2] (2016) The Satisfiability Modulo Theories Library (SMT-LIB). Note: www.SMT-LIB.org
[3] (1999) Symbolic model checking without BDDs. In Tools and Algorithms for the Construction and Analysis of Systems, W. R. Cleaveland (Ed.), Berlin, Heidelberg, pp. 193–207.
[4] (1992) Experience with embedding hardware description languages in HOL. In Proceedings of the IFIP TC10/WG 10.2 International Conference on Theorem Provers in Circuit Design: Theory, Practice and Experience, NLD, pp. 129–156.
[5] (1989) Compositional model checking. In LICS, pp. 353–362.
[6] (2019) Cocotb. GitHub. Note: https://github.com/cocotb/cocotb
[7] (2017) CoreIR: A simple LLVM-style hardware compiler. GitHub. Note: https://github.com/rdaly525/coreir
[8] (2018) SMTSampler: efficient stimulus generation from complex SMT constraints. In 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–8.
[9] (2016-11) OpenRAM: an open-source memory compiler. In 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–6.
[10] (2019-01) A new golden age for computer architecture. Commun. ACM 62 (2), pp. 48–60.
[11] B. S. Lerner, R. Bodík, and S. Krishnamurthi (Eds.) (2019) 3rd Summit on Advances in Programming Languages, SNAPL 2019, May 16-17, 2019, Providence, RI, USA. LIPIcs, Vol. 136, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik.
[12] (2018) Experiences building Edge TPU with Chisel. In 2018 Chisel Community Conference (CCC).
[13] (2018-10) CoSA: integrated verification for agile hardware design. In 2018 Formal Methods in Computer Aided Design (FMCAD), pp. 1–5.
[14] (2003) Interpolation and SAT-based model checking. In Computer Aided Verification, W. A. Hunt and F. Somenzi (Eds.), Berlin, Heidelberg, pp. 1–13.
[15] (2019) Magma. GitHub. Note: https://github.com/phanrahan/magma
[16] (2010-11) Rethinking digital design: why design must change. IEEE Micro 30 (6), pp. 9–24.
[17] (2004) Verilator and SystemPerl. In North American SystemC Users’ Group, Design Automation Conference.
[18] (2019) GarnetFlow. GitHub. Note: https://github.com/StanfordAHA/GarnetFlow
[19] (2019) Lassen. GitHub. Note: https://github.com/StanfordAHA/lassen
[20] (2000) MetaML and multi-stage programming with explicit annotations. Theoretical Computer Science 248 (1-2), pp. 211–242.
[21] (2019) A golden age of hardware description languages: applying programming language techniques to improve design productivity. In [11], pp. 7:1–7:21.
[22] (2019) Chisel-testers2. GitHub. Note: https://github.com/ucb-bar/chisel-testers2
[23] (2006) Icarus Verilog. Note: http://iverilog.icarus.com