Complete Test of Synthesised Safety Supervisors for Robots and Autonomous Systems

by Mario Gleirscher, et al.
Universität Bremen

Verified controller synthesis uses world models that comprise all potential behaviours of humans, robots, further equipment, and the controller to be synthesised. A world model enables quantitative risk assessment, for example, by stochastic model checking. Such a model describes a range of controller behaviours, some of which (when implemented correctly) guarantee that the overall risk in the actual world is acceptable, provided that the stochastic assumptions err on the safe side. Synthesis then selects an acceptable-risk controller behaviour. However, because it crosses abstraction, formalism, and tool boundaries, verified synthesis for robots and autonomous systems has to be accompanied by rigorous testing. In general, standards and regulations for safety-critical systems require testing as a key element to obtain certification credit before entry into service. This work-in-progress paper presents an approach to the complete testing of synthesised supervisory controllers that enforce safety properties in domains such as human-robot collaboration and autonomous driving. Controller code is generated from the selected controller behaviour. The code generator, however, is hard, if not infeasible, to verify in a formal and comprehensive way. Instead, we rely on testing: an abstract test reference is generated, a symbolic finite state machine with simpler semantics than the code. From this reference, a complete test suite is derived and applied to demonstrate the observational equivalence between the synthesised abstract test reference and the generated concrete controller code running on a control system platform.



1 Introduction

In verified controller synthesis, world models are used that comprise all potential behaviours of humans, robots, further equipment, and the controller to be synthesised. A world model enables quantitative risk assessment, for example, by stochastic model checking. Such a model describes a range of controller behaviours, some of which (when implemented correctly) guarantee that the overall risk in the actual world is acceptable, provided that the stochastic assumptions err on the safe side. The objective of the synthesis step is to select a controller behaviour from this range that meets requirements given as constraints, for example, to stay within an acceptable risk bound. Within such constraints, the synthesis can optimise further objectives, for example, maximal performance or minimal cost and risk. Because it crosses the boundaries between different abstractions, formalisms, and tools, verified controller synthesis for safety-critical systems naturally has to be accompanied by rigorous testing. Indeed, standards and regulations for safety-critical systems (e.g. [ISOTS15066, ISO26262, DO178C, DO330]) require testing as a key element to obtain certification credit before entry into service. Hence, a key methodological aim is to bridge the gap between verified controller synthesis and the generation of executable code that is deployed on a control system platform and integrated into the wider system to be put into service.


Figure 1: Workflow and artefacts of the proposed tool-supported approach to complete testing

Following this aim, we propose an integrated formal approach to the complete testing of synthesised supervisory discrete-event controllers that enforce safety properties in domains such as human-robot collaboration and autonomous driving. Our tool-supported approach works as follows.

1. Controller Synthesis. The verified synthesis step is based on policy synthesis for Markov decision processes [Kwiatkowska2011-PRISM4Verification, Kwiatkowska2007-StochasticModelChecking]. A conceptual world model is constructed that defines all the behaviours of all the relevant actors (e.g. humans, robots, other equipment) and the controller under consideration. The range of controller behaviours is called the controller design space. Then, the relevant temporal logic properties are formally verified of the world model, and an appropriate (optimal) controller behaviour is selected (synthesised) from the design space. For this step, we adopt the approach described in [Gleirscher2021-VerifiedSynthesisSafety, Gleirscher2020-SafetyControllerSynthesis].

2. Abstraction. Then, the selected (verified) controller behaviour is abstracted into a test reference model. This model is described as a symbolic finite state machine (SFSM) [DBLP:conf/icst/Petrenko16], where the control states are called risk states. Symbols correspond to subsets of the world model's state space. The input alphabet corresponds to the events monitored (observed) by the controller, the output alphabet to the signals that the controller can issue to the controlled process. An event is triggered by a guard condition whose valuation over the inputs changes from false to true, so that a transition labelled with (or fulfilling) this guard can be taken. Transitions of the test reference are labelled with such input/output (I/O) pairs and derived from the selected controller behaviour.

3. Code Generation. The selected controller behaviour is also translated into a software component executable on the control system platform of a robotic or autonomous system. Following an embedded systems tradition, we use C/C++ as the target language for this component, making the reasonable assumption that the type of FSM used has a simpler semantics than the executable code. Abstraction and code generation are explained in Sect. 3.

4. Test Suite Derivation. In this step, a complete test suite for I/O conformance testing is derived with the H-Method [DBLP:conf/forte/DorofeevaEY05] for a finite state machine (FSM) abstraction of the test reference. This abstraction maps the SFSM guard conditions to atomic input labels; otherwise it adopts the SFSM structure without changes. It has been shown in [peleska_sttt_2014, Huang2017] that complete FSM test suites can be mapped to likewise complete suites on SFSMs, when the FSM input events are considered as input equivalence classes of the SFSM, and each is refined to a concrete SFSM input data tuple solving the equivalence class constraint (this is just a refined guard condition).

5. Conformance Test. Based on a generated test harness emulating the target platform, the test suite is run against the controller code to record outputs and obtain a complete set of verdicts. A complete pass shown by the verdicts demonstrates the observational equivalence between the test reference and the controller code. Test suite derivation and conformance test execution are explained in Sect. 4. There, it is also explained how potential errors in the reference model, the test suite generator, or the test harness can be uncovered. This is required according to standards for safety-critical control applications (see, e.g. [DO178C, DO330]), because faulty tool chains might mask “real” errors in the software under test.
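Step 2 above reads an event as a guard condition changing from false to true. A minimal Python sketch of that rising-edge reading is given below (Python is used for brevity rather than the C/C++ target of step 3). The guard, the variable names, and the trace are hypothetical and only loosely follow the welding example; the assumption that the guard is false before the trace starts is ours.

```python
from typing import Callable, Dict, List

Valuation = Dict[str, bool]

def rising_edges(guard: Callable[[Valuation], bool],
                 trace: List[Valuation]) -> List[int]:
    """Return the trace indices at which `guard` changes from false to true."""
    events = []
    previous = False  # assumption: the guard is false before the trace starts
    for index, valuation in enumerate(trace):
        current = guard(valuation)
        if current and not previous:
            events.append(index)
        previous = current
    return events

# Hypothetical guard: operator near the spot welder while it is active.
def near_active_welder(v: Valuation) -> bool:
    return v["operator_near_welder"] and v["welder_active"]

trace = [
    {"operator_near_welder": False, "welder_active": True},
    {"operator_near_welder": True, "welder_active": True},   # guard becomes true
    {"operator_near_welder": True, "welder_active": True},   # still true, no new event
    {"operator_near_welder": True, "welder_active": False},
    {"operator_near_welder": True, "welder_active": True},   # guard becomes true again
]
event_indices = rising_edges(near_active_welder, trace)
```

Only the two steps at which the guard switches from false to true count as events; the step at which it merely stays true does not.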

Related Work.

In the rich body of literature on verified controller synthesis, the approaches in [Orlandini2013-ControllerSynthesisSafety, Bersani2020-PuRSUEspecificationrobotic] from collaborative robotics are perhaps closest to the one presented here as they include a platform deployment stage. While these authors focus on the synthesis of complete robot controllers, our approach focuses on safety supervisors but includes a testing step that establishes the correctness of platform code generation. The authors of [Villani2019-Integratingmodelchecking] propose a general integration of quantitative model checking (with Uppaal [Behrmann2004-TutorialUppaal]) with model-based conformance testing and fault injection. Apart from using the switch cover method for test suite generation, their approach is highly similar to our Mealy-type test reference generation, conformance testing, and mutation approach for test suite evaluation. However, while their focus is more on cross-validation of Uppaal and FSM models, we concentrate on code robustness tests, assuming that the underlying model has been validated and verified beforehand.

The investigation of complete testing methods is a very active research field [Petrenko:2012:MTS:2347096.2347101]. The H-Method [DBLP:conf/forte/DorofeevaEY05] applied for testing in this paper has been selected because (1) it produces far fewer test cases than the “classical” W-Method [chow:wmethod], but (2) it is still very intuitive with regard to the test case selection principles. This facilitates the qualification of the test case generator, as discussed in Sect. 4. If the main objective of a testing campaign were just to provide complete suites with a minimal number of test cases, then the SPYH-Method [DBLP:conf/icst/SouchaB18] should be preferred to the H-Method.

Whereas hazard- or failure-oriented testing [Gleirscher2011-HazardbasedSelection, Lesage2021-SASSISafetyAnalysis] and requirements falsification based on negative scenarios [Uchitel2002-Negativescenariosimplied, Gleirscher2014-BehavioralSafetyTechnical, Stenkova2019-GenericNegativeScenarios] are highly useful if no complete test reference is available or if it still needs to be validated and revised, our approach is complete once the test reference is successfully validated. That is, any deviation from the test reference detectable by these techniques is also uncovered by at least one test case generated by our approach. Moreover, our approach is usable to test controller robustness without a realistic simulator for the world model.


We propose a solution to the generation of well-defined test references used in techniques such as the H-Method [DBLP:conf/forte/DorofeevaEY05]. In particular, we connect test reference generation with the H-Method to derive complete test suites and demonstrate that this form of robustness testing yields a correctness proof of a controller under certain assumptions. We provide tool support for both these steps. Our proof of concept indicates that complete test suites are a feasible and practically attractive means to verify correctness of implementations of the considered class of discrete-event control modules. In Sect. 2, we explain the safety supervisor concept by means of an example. In Sects. 3 and 4, we explain code and test reference generation and test suite derivation. We add concluding remarks in Sect. 5.

2 The Safety Supervisor Concept with an Illustrative Example

To illustrate our approach, we reuse our example from the domain of human-robot collaboration in industrial manufacturing as discussed in [Gleirscher2020-SafetyControllerSynthesis, Gleirscher2021-VerifiedSynthesisSafety]. In this example, a human operator collaborates with a robot on a welding and assembly task in a work cell equipped with a spot welder. This setting involves several actors performing potentially dangerous actions (e.g. robot arm movements, welding steps) and, thus, entails that hazardous states can be reached (e.g. operator near the active spot welder, or operator and robot on the work bench). Such states need to be either avoided or reacted to in order to prevent accidents from happening or at least to reduce the likelihood of such undesired events.

This task of risk mitigation is, by design, put under the responsibility of a supervisory discrete-event controller. This controller is supposed to enforce probabilistic safety properties of the kind “the probability of an accident is less than a given bound” or “a given hazard occurs with less than a given probability”. The underlying conceptual controller behaviour comprises (i) the detection of critical events, (ii) the performance of corresponding mitigation actions to react to such events and reach a safe risk state, and (iii), to avoid a paused task or degraded task performance, the execution of resumption actions to resolve the event and to return to a safe but productive risk state. For the sake of brevity, we call this controller a safety supervisor.

3 Derivation of the Software Module and the Test Reference

We summarise [Gleirscher2021-VerifiedSynthesisSafety] on how to obtain the world model, the controller design space, and the controller behaviour. Then, we describe in more detail the generation of the controller software component (for deployment) and the abstraction into the test reference (for test suite derivation, see Sect. 4).

Figure 2: Interface between and

The world model is a Markov decision process (MDP), the result of a fixed-point application of actions given as probabilistic guarded commands to an initial state [Kwiatkowska2011-PRISM4Verification]. MDPs are models containing uncertainties about aspects not under control (or agency) or not to be modelled explicitly. The world state space is defined using a set of finite-sorted variables. The MDP is a labelled transition system where the transition relation encodes non-deterministic and probabilistic choice in a compound manner and states are labelled with atomic propositions holding of the variable valuations that define the states. Non-deterministic decisions encode freedom of choice of the actors in the world model, in particular, the controller design space. This freedom can be resolved by picking an appropriate policy, a choice resolution for each state, which yields a Markov chain (MC), a labelled transition system without indeterminacy in the controller (and the other considered actors). Policy appropriateness can be thought of as sub-setting the controller design space and is defined by constraints to be specified in probabilistic computation tree logic [Kwiatkowska2011-PRISM4Verification]. The resulting MC is verified against these constraints and includes the selected behaviour.
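To make the step from an MDP to a Markov chain concrete, the following Python sketch resolves the non-deterministic choice of a toy MDP with a policy and then approximates the probability of eventually reaching an accident state by plain fixed-point iteration. It is an illustration only: the states, actions, probabilities, and the risk bound of 0.01 are invented, and the iteration stands in for what a stochastic model checker such as PRISM would compute for a probabilistic reachability property.

```python
from typing import Dict, List, Tuple

Distribution = List[Tuple[float, str]]       # (probability, successor state)
MDP = Dict[str, Dict[str, Distribution]]     # state -> action -> distribution

# Hypothetical world model; all numbers are made up for illustration.
mdp: MDP = {
    "work": {"run": [(0.8, "work"), (0.1, "hazard"), (0.1, "done")]},
    "hazard": {
        "mitigate": [(0.99, "done"), (0.01, "accident")],
        "ignore": [(0.5, "work"), (0.5, "accident")],
    },
    "accident": {"stay": [(1.0, "accident")]},
    "done": {"stay": [(1.0, "done")]},
}

def induce_chain(model: MDP, policy: Dict[str, str]) -> Dict[str, Distribution]:
    """Resolve each state's choice; states with one action need no policy entry."""
    chain = {}
    for state, actions in model.items():
        action = policy.get(state, next(iter(actions)))
        chain[state] = actions[action]
    return chain

def reach_probability(chain: Dict[str, Distribution], target: str,
                      start: str, sweeps: int = 2000) -> float:
    """Approximate Pr(eventually target) by iterating p <- P p with p[target] = 1."""
    p = {s: 1.0 if s == target else 0.0 for s in chain}
    for _ in range(sweeps):
        p = {s: 1.0 if s == target else sum(q * p[t] for q, t in chain[s])
             for s in chain}
    return p[start]

RISK_BOUND = 0.01  # hypothetical acceptable accident probability
risk = {name: reach_probability(induce_chain(mdp, {"hazard": name}), "accident", "work")
        for name in ("mitigate", "ignore")}
acceptable = [name for name, value in risk.items() if value < RISK_BOUND]
```

In this toy model only the policy that mitigates in the hazard state stays below the bound, so it is the only member of the acceptable subset of the design space.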

Figure 3: Phase transitions of factor

Now, the selected controller behaviour has to be translated into two forms: the controller code and the test reference. For this step, we define the variables to be monitored and the variables that can be controlled, resulting in what we call the syntactic interface (alphabet) of the controller (see Figure 2) [Broy2010-LogicalBasisComponent]. Here, the alphabet of a set of sorted variables is the set of tuples in the Cartesian product of the sorts of these variables. This interface defines the nature of the changes in the world model that any controller can observe and perform.

The control states of both and are derived from the notion of risk states [Gleirscher2021-RiskStructuresDesign], which is defined over a set of -sorted variables modelling the critical events considered in  as risk factors. We require . The sort  models life-cycle stages for handling a factor  (Figure 3), for example, from inactive (), active (a), and mitigated (m) back to inactive [Gleirscher2021-RiskStructuresDesign]. In the example in Sect. 2, we consider three factors, hence . Each can then be associated with a control state space .
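As an illustration of how risk states arise from risk factors, the Python sketch below enumerates the product of factor phases for three factors. It assumes exactly the three phases named in the text (inactive, active, mitigated) and a single cyclic phase order; the actual phase model in Figure 3 may allow further transitions, and the set of risk states shown in Figure 4 may be smaller than the full product. The factor names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

PHASES = ("inactive", "active", "mitigated")
# Hypothetical factor names; the third factor is a placeholder.
FACTORS = ("near_welder", "shared_workbench", "factor_3")

# Assumed life cycle: inactive -> active -> mitigated -> inactive.
NEXT_PHASE = {"inactive": "active", "active": "mitigated", "mitigated": "inactive"}

RiskState = Tuple[str, ...]

# A risk state assigns one phase to every factor.
risk_states: List[RiskState] = list(product(PHASES, repeat=len(FACTORS)))

def successors(state: RiskState) -> List[RiskState]:
    """Risk states reachable by advancing exactly one factor by one phase."""
    return [state[:i] + (NEXT_PHASE[phase],) + state[i + 1:]
            for i, phase in enumerate(state)]

initial: RiskState = ("inactive",) * len(FACTORS)
first_steps = successors(initial)
```

Under these assumptions the product contains 3^3 = 27 candidate risk states, and from the all-inactive state each of the three factors can become active.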

We then translate the controller fragment of the MC transition relation (resulting from policy synthesis over the world model) into C++ code. Basically, every transition of this fragment is translated into a guarded action with a derived action name. For that, the source state of each transition is mapped into two parts: one corresponding to the input (the observed event) and one corresponding to a risk state. The control and state updates to be associated with this action are derived from the difference in the controlled variables between source and target states. The generated code implements Algorithm 1, which is intentionally simple (not using action names) and wrapped into platform-specific code (not shown) for data processing and communication.

Algorithm 1 Safety supervisor
Ctr(in Event, out Signal)
1: init (initialise the control/risk state)
   while true do (should implement the test reference)
2:   step
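A possible reading of Algorithm 1 as a table of guarded actions is sketched below in Python (the paper generates C++; Python is used here for brevity). The class names, risk states, guards, and signals are hypothetical, and the behaviour when no guard is enabled (stay in the current risk state and emit an empty signal) is our assumption, not something stated in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, bool]
Signal = Dict[str, str]

@dataclass
class GuardedAction:
    source: str                      # risk state in which the action is enabled
    guard: Callable[[Event], bool]   # condition on the observed event
    signal: Signal                   # control update issued to the process
    target: str                      # risk state after the action

class Supervisor:
    def __init__(self, initial: str, actions: List[GuardedAction]) -> None:
        self.state = initial         # line 1: initialise the control/risk state
        self.actions = actions

    def step(self, event: Event) -> Signal:
        """Line 2: take the first enabled guarded action, if any."""
        for action in self.actions:
            if action.source == self.state and action.guard(event):
                self.state = action.target
                return action.signal
        return {}                    # assumption: no enabled guard, no output

supervisor = Supervisor("safe", [
    GuardedAction("safe", lambda e: e["operator_near"], {"welder": "off"}, "mitigated"),
    GuardedAction("mitigated", lambda e: not e["operator_near"], {"welder": "on"}, "safe"),
])

events = [{"operator_near": False}, {"operator_near": True},
          {"operator_near": True}, {"operator_near": False}]
outputs = [supervisor.step(e) for e in events]
```

The finite event list replaces the endless loop of the algorithm so that the sketch terminates; a deployed supervisor would call `step` once per control cycle.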

In order to obtain the test reference, we then translate the controller fragment of this transition relation into a deterministic Mealy-type FSM whose initial state is congruent with the one in the MC. Figure 4 shows the test reference for the example in Sect. 2; it is an operation refinement of the composition of the factors (Figure 3). The translations into code and test reference, including the generation of the test harness, are carried out with the Yap tool. (The discussed features are available in Yap version 0.8+.)

Figure 4: Visual representation of the test reference. Nodes are risk states, edges are transitions. Because the valuation expressions for inputs and outputs are too long, edge labels are hashed and prefixed with action names for readability, following the pattern "action name : input hash / output hash" with an integer-valued hash function.

Regarding the difference between the controller code and the test reference, the semantics of the code can vary significantly. The test reference is converted into an input format suitable for test suite generation via libfsmtest [libfsmtest]. Here, we consider a C++ component for a low-level real-time implementation, for example, a field-programmable gate array (FPGA) synthesised from VHDL or Verilog HDL (VHSIC or Verilog hardware description language) generated from C++. In [Gleirscher2021-VerifiedSynthesisSafety], we consider a C# component used in a simulation of the world model in a “Robot Operating System”-enabled digital twinning environment. While the semantics of the C++ and C# implementations may dramatically differ, the test reference can be shared between the two. The only difference on the testing side is in the mappings used in the test harness (Sect. 4) to deliver the inputs to and record the outputs of the implementation.

4 Complete Test Strategy

In this section, we briefly describe the characteristics of complete test suites, sketch their derivation, outline the chosen recipe for test suite derivation (Sect. 4.1), and discuss typical error possibilities to be taken into account during standards-oriented controller and tool certification (Sect. 4.2).

4.1 Strategy Application

A test suite is complete if, under certain hypotheses, it guarantees that non-conforming implementations will fail in at least one test case, while equivalent implementations will always pass the suite [Huang2017]. Here, these hypotheses are (a) the number of control states implemented in the controller code, and (b) assumptions about potential mutations of guards and output assignments. Since the verification of safety-critical systems requires the code to be open for analysis, these hypotheses can be checked using static analysis. We check that the code has the same number of control states as the test reference and that the guard conditions in the test reference have been correctly translated to corresponding branching conditions in the code (cf. line 2 in Algorithm 1).

Since the reference is modelled as an SFSM, we need a method to construct complete test suites for SFSMs. We follow the recipe from [peleska_sttt_2014, Huang2017], which allows us to use test generation strategies for the simpler class of FSMs and translate the resulting test suite to an SFSM suite as follows: (1) For the SFSM, input equivalence classes are calculated. This is performed by creating all conjunctions of positive and negated SFSM guard conditions which have at least one solution. (2) An FSM is created as an abstraction of the SFSM. The input alphabet of this FSM consists of the identifiers for the input equivalence classes calculated in (1). Control states, output events, and transition arrows are directly adopted from the SFSM. (3) For this FSM, a complete test suite is created using the H-Method [DBLP:conf/forte/DorofeevaEY05]. Its test cases consist of input traces, where each input is an identifier of an SFSM input equivalence class. The expected results are obtained by running this input trace against the FSM. (4) The FSM test suite is refined to an SFSM test suite by calculating concrete input representatives from the constraints specifying the referenced input classes. (5) The concrete SFSM test suite is executed in a test harness: this is an executable running the test cases one by one against the controller code and checking its responses against the FSM test oracle.
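Steps (1) and (4) of this recipe can be illustrated with a small Python sketch that enumerates all conjunctions of positive and negated guards, keeps the satisfiable ones as input equivalence classes, and picks one concrete representative per class. The guards and the finite input domain are hypothetical, and satisfiability is decided here by brute-force enumeration; for realistic guard conditions a constraint solver would take that role.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Input = Tuple[int, bool]  # hypothetical input: (distance to welder, welder active)

# Small finite input domain, chosen only so that enumeration is possible.
DOMAIN: List[Input] = [(d, w) for d in range(6) for w in (False, True)]

# Hypothetical SFSM guard conditions.
GUARDS: Dict[str, Callable[[Input], bool]] = {
    "near_active_welder": lambda x: x[0] <= 1 and x[1],
    "in_cell": lambda x: x[0] <= 3,
}

def input_classes(guards: Dict[str, Callable[[Input], bool]],
                  domain: List[Input]) -> Dict[Tuple[bool, ...], List[Input]]:
    """Map each satisfiable sign vector (guard true/false) to its inputs."""
    names = list(guards)
    classes = {}
    for signs in product((True, False), repeat=len(names)):
        members = [x for x in domain
                   if all(guards[n](x) == s for n, s in zip(names, signs))]
        if members:  # keep only conjunctions with at least one solution
            classes[signs] = members
    return classes

classes = input_classes(GUARDS, DOMAIN)
# Step (4): one concrete input per class serves as its representative.
representatives = {signs: members[0] for signs, members in classes.items()}
```

Of the four sign vectors, the one requiring `near_active_welder` to hold while `in_cell` fails has no solution and is dropped, which leaves three classes that together partition the input domain.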

The theory elaborated in [peleska_sttt_2014, Huang2017] confirms that the concrete SFSM test suite is also complete, if this holds for the abstract FSM test suite. Since we know that the controller code has the same number of control states and the same guards as the test reference, passing the test suite is equivalent to proving observational equivalence between the code and the test reference. For tool support, the libfsmtest library [libfsmtest] is used, which provides an implementation of the H-Method and a template for the test harness.
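The role of the FSM as test oracle in step (5) can be sketched in Python as follows. This is not the H-Method and does not use libfsmtest: as a stand-in, the suite simply contains all input traces up to length 3, and both the reference and the implementation under test are hypothetical Mealy machines given as transition tables.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Mealy machine: (state, input) -> (output, next state)
Machine = Dict[Tuple[str, str], Tuple[str, str]]

REFERENCE: Machine = {
    ("safe", "near"): ("stop", "mitigated"),
    ("safe", "far"): ("none", "safe"),
    ("mitigated", "near"): ("none", "mitigated"),
    ("mitigated", "far"): ("resume", "safe"),
}

def run(machine: Machine, initial: str, inputs: Sequence[str]) -> List[str]:
    """Feed an input trace to the machine and record the output trace."""
    state, outputs = initial, []
    for symbol in inputs:
        output, state = machine[(state, symbol)]
        outputs.append(output)
    return outputs

def verdicts(reference: Machine, implementation: Machine, initial: str,
             suite: Sequence[Tuple[str, ...]]) -> Dict[Tuple[str, ...], bool]:
    """A test case passes if the implementation reproduces the oracle's outputs."""
    return {trace: run(implementation, initial, trace) == run(reference, initial, trace)
            for trace in suite}

INPUTS = ("near", "far")
# Stand-in suite: every input trace of length 1 to 3 (not an H-Method suite).
suite = [trace for n in range(1, 4) for trace in product(INPUTS, repeat=n)]

# Hypothetical faulty implementation that forgets to resume the task.
faulty: Machine = dict(REFERENCE)
faulty[("mitigated", "far")] = ("none", "safe")

pass_verdicts = verdicts(REFERENCE, REFERENCE, "safe", suite)
fail_verdicts = verdicts(REFERENCE, faulty, "safe", suite)
```

The exhaustive suite grows exponentially with the trace length; a complete method such as the H-Method selects far fewer traces while retaining the fault-detection guarantee under the stated hypotheses.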

4.2 Verification of Verification Results

For automated verification/testing of safety-critical system components, applicable standards require a verification that the tool chain involved does not mask any errors inside the software under test. This process is usually called verification of the verification results. We consider the possible errors in the testing environment one by one. (1) Error in the generation of the test reference: The complete test suite created as described above characterises the test reference up to observational equivalence. By checking whether the test suite is compatible with the computations of the selected controller behaviour, it is shown that the test reference is correct. (2) Error in the testing theory: It has been shown in [DBLP:conf/pts/SachtlebenHH019] that methods of similar complexity as the H-Method can be mechanically verified using a proof assistant (e.g. Isabelle/HOL). (3) H-Method implementation error: Here, we have two options. In [DBLP:conf/pts/Sachtleben20], it has been demonstrated that correct algorithms can be generated while proving a testing theory to be correct. Alternatively, the generated test suite can be checked automatically for completeness: from the specification of the test cases required for the H-Method given in [DBLP:journals/sqj/HuangOP19], a checking tool can be derived which verifies that the generated suite really contains the test cases required according to the theory. This checking algorithm would be orthogonal to the test generation algorithm, which makes it highly unlikely that test generator and completeness checker contain complementary errors masking each other. (4) Test harness error: The test harness could execute the suite in a faulty way that masks an error in the software under test. To ensure that this is not the case, we apply mutation testing. Using the clang compiler functions for static code analysis, a set of mutants of the controller code is created in an automated way. Then, it is checked for each mutant whether it is uncovered by the test suite or whether it is semantically equivalent to the original version of the code.
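The mutation check in item (4) can be sketched in Python on the level of transition tables. This is a simplification under stated assumptions: the paper mutates the generated C++ code with clang-based tooling, whereas the sketch mutates a hypothetical Mealy machine directly, uses all input traces up to length 3 as a stand-in for the generated suite, and decides semantic equivalence by exploring the product of original and mutant.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

Machine = Dict[Tuple[str, str], Tuple[str, str]]  # (state, input) -> (output, next)

STATES, INPUTS, OUTPUTS = ("safe", "mitigated"), ("near", "far"), ("stop", "resume", "none")
ORIGINAL: Machine = {
    ("safe", "near"): ("stop", "mitigated"),
    ("safe", "far"): ("none", "safe"),
    ("mitigated", "near"): ("none", "mitigated"),
    ("mitigated", "far"): ("resume", "safe"),
}

def run(machine: Machine, initial: str, inputs: Sequence[str]) -> List[str]:
    state, outputs = initial, []
    for symbol in inputs:
        output, state = machine[(state, symbol)]
        outputs.append(output)
    return outputs

def mutants(machine: Machine) -> Iterator[Machine]:
    """Single-fault mutants: one changed output or one changed target state."""
    for key, (output, target) in machine.items():
        for other in OUTPUTS:
            if other != output:
                yield {**machine, key: (other, target)}
        for other in STATES:
            if other != target:
                yield {**machine, key: (output, other)}

def equivalent(a: Machine, b: Machine, initial: str) -> bool:
    """Explore reachable state pairs; any differing output refutes equivalence."""
    seen, todo = set(), [(initial, initial)]
    while todo:
        pair = todo.pop()
        if pair in seen:
            continue
        seen.add(pair)
        for symbol in INPUTS:
            (out_a, next_a), (out_b, next_b) = a[(pair[0], symbol)], b[(pair[1], symbol)]
            if out_a != out_b:
                return False
            todo.append((next_a, next_b))
    return True

suite = [t for n in range(1, 4) for t in product(INPUTS, repeat=n)]
killed, equivalents, survivors = [], [], []
for mutant in mutants(ORIGINAL):
    if any(run(mutant, "safe", t) != run(ORIGINAL, "safe", t) for t in suite):
        killed.append(mutant)
    elif equivalent(ORIGINAL, mutant, "safe"):
        equivalents.append(mutant)
    else:
        survivors.append(mutant)  # would indicate a weak suite or faulty harness
```

A non-empty `survivors` list is the outcome the check is meant to expose: a mutant that behaves differently from the original and yet passes every test.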

5 Conclusions

We outlined an approach to the complete testing of synthesised discrete-event controllers that enforce safety properties in applications such as human-robot collaboration and autonomous driving. Our aim is to bridge the gap between verified controller synthesis and certified deployment of executable code. We illustrate our approach with a human-robot collaboration example where a safety supervisor makes autonomous decisions on when and how to mitigate hazards and resume normal operation. We check the specificity of the test reference and the strength of the corresponding test suite by mutation of the generated code, modulo semantic equivalence. We contribute a preliminary synthesis-based test strategy that allows one to show total correctness of the generated code under certain implementation assumptions. The presented approach is automated in a tool chain: Yap and a stochastic model checker (e.g. PRISM [Kwiatkowska2011-PRISM4Verification]) for MDP generation and verification, Yap for test reference and code generation, and libfsmtest [libfsmtest] for test suite derivation. For testing the integrated system (robot, welding machine, safety supervisor and simulation of human interactions), the approach presented here is embedded into a more general methodology for verification and validation of robots and autonomous systems, starting at the module level considered here, and ending at the level of the integrated overall system [eder_kerstin_2021_5203111].