Sound Development of Safety Supervisors

by Mario Gleirscher et al.
Universität Bremen

Safety supervisors are controllers enforcing safety properties by keeping a system in (or returning it to) a safe state. The development of such high-integrity components can benefit from a rigorous workflow integrating formal design and verification. In this paper, we present a workflow for the sound development of safety supervisors combining the best of two worlds, verified synthesis and complete testing. Synthesis allows one to focus on problem specification and model validation. Testing compensates for the crossing of abstraction, formalism, and tool boundaries and is a key element to obtain certification credit before entry into service. We establish soundness of our workflow through a rigorous argument. Our approach is tool-supported, aims at modern autonomous systems, and is illustrated with a collaborative robotics example.





1 Introduction

Safety supervisors (supervisors for short) are discrete-event controllers that enforce probabilistic safety properties in modern autonomous systems such as human-robot collaboration and autonomous driving. Supervisor development benefits from design and test automation. Because supervisors are high-integrity components, their automation needs to be rigorously assured. This rigor suggests verified synthesis for design automation and model-based conformance test for test automation. Indeed, standards and regulations for safety-critical systems (e.g., [ISOTS15066, ISO26262, DO178C, DO330]) require testing as a key element to obtain certification credit before entry into service. To get the best of the two worlds—synthesis and test—a crossing of the boundaries between different abstractions, formalisms, and tools is inevitable. This involves bridging the gap between synthesis, the derivation of test suites from a synthesized supervisor reference, the generation of executable code, its test and deployment on a control system platform, and its integration into the wider system to be put into service.

Figure 1: Example of a safety supervisor. Nodes and edges denote states () and transitions () and edge labels signify input/output expressions ().

To resolve this challenge, we propose a rigorous workflow for the sound development, in particular, the synthesis and complete test, of safety supervisors. Before explaining our workflow, we illustrate safety supervision with the example of an operator collaborating with a robot on a welding and assembly task in a workcell with a spot welder [Gleirscher2020-SafetyControllerSynthesis, Gleirscher2021-VerifiedSynthesisSafety]. These actors perform dangerous actions (e.g., robot movements, welding steps), possibly reaching hazardous states (e.g., operator near the active spot welder; operator and robot on the workbench). To reduce accident likelihood, such states (Figure 1) need to be reacted to. These reactions are the responsibility of a supervisor (Figure 1) enforcing probabilistic safety properties of the workcell, such as "the accident is less likely than a given probability bound" or "the hazard occurs less likely than a given bound". The supervisor's behavior comprises (i) the detection of critical events (e.g., si_HSact), (ii) the performance of mitigation actions (e.g., si_stoppedfun) to react to such events and reach a safe state, and (iii) to avoid a paused task or degraded task performance, the execution of resumption actions (e.g., siHSressafmod) to resolve the event and return to a safe but productive state.

Extending our previous work [Gleirscher2021-CompleteTestingSafety], we propose the derivation of symbolic finite state machines (SFSMs) used as test references for model-based testing with complete methods. The resulting test suites allow for a proof of conformance (i.e., observational equivalence) between the generated supervisor code and the reference. The hypotheses to be fulfilled to guarantee test suite completeness can be checked by simple static analyses of the supervisor code. We provide tool support for both these steps, explain the tool qualification obligations, and present a rigorous argument, applying Hoare logic to the workflow, to obtain certification credit for the supervisor on the basis of the development, verification, and validation process and the artifacts produced in each workflow stage.

Sect. 2 summarizes the proposed workflow. Sect. 3 details each workflow stage. In Sect. 4, we argue for the soundness of our approach, make our assumptions explicit, and show how we reduce several error possibilities in our workflow. Sect. 5 summarizes related work. We add concluding remarks in Sect. 6.

2 Overview of the Workflow

Figure 2: Stages and artifacts of the proposed supervisor development workflow

Figure 2 shows our workflow. In the first stage, we construct a stochastic world model describing the behaviors of all relevant actors (e.g., humans, robots, equipment) and the supervisor. The model includes a range of controller behaviors, some of which, when implemented correctly, guarantee that the risk in the actual world is acceptable, provided that the stochastic assumptions have been made to the safe side. This range is denoted as the supervisor design space. The stage adopts our work [Gleirscher2021-VerifiedSynthesisSafety, Gleirscher2020-SafetyControllerSynthesis] based on policy synthesis for Markov decision processes (MDPs) [Kwiatkowska2011-PRISM4Verification, Kwiatkowska2007-StochasticModelChecking] and selects controller behaviors from the design space that meet requirements (see example in Figure 1) specified in probabilistic computation tree logic (PCTL; [Kwiatkowska2011-PRISM4Verification]) and verified of the world model. While maintaining this constraint, the synthesis procedure applies objectives (e.g., maximum performance, minimum cost) and selects an optimal supervisor behavior.

In the second stage, the selected behavior is transformed into a test reference, an SFSM [DBLP:conf/icst/Petrenko16] whose control states are called risk states and whose transitions are labeled with input/output (I/O) pairs (e.g., Figure 1). The input alphabet of the reference specifies the events (i.e., state changes in the world) observed by the supervisor, and the output alphabet the signals it issues. Input labels model guard conditions that, when holding true of a world state, enable or trigger their transition (e.g., siHSressafmod in Figure 1).

In the third stage, the reference is translated into a software component executed by the control system of a robotic or autonomous system. Following an embedded-systems tradition, our example uses C++ as the target language. Moreover, we assume that SFSMs have a simpler semantics than the executable code.

In the fourth stage, a complete test suite for checking the implementation against the reference is generated, using a general input equivalence class testing strategy for SFSMs [peleskasttt2014]. The term complete means that, provided certain hypotheses hold, the suite will (i) accept every implementation whose behavior is represented by an SFSM observationally equivalent to the reference and (ii) reject every non-equivalent implementation. It is shown below that these hypotheses are fulfilled, so that passing the test suite corresponds to a proof of observational equivalence.

In the fifth stage, the test suite is run against the implementation to record inputs, associated outputs, and the verdicts obtained for each test case. A generated test harness TH, emulating the target platform, acts as the test oracle by comparing the I/O traces observed by the wrapper TW during execution to the traces expected according to the reference.

3 Workflow Stages

This section details the five workflow stages outlined in Sect. 2 and Figure 2.

3.1 : Risk-informed Supervisor Synthesis

This section summarizes [Gleirscher2021-VerifiedSynthesisSafety] for the construction of and selection of the supervisor behavior  from . Below, , , , and signify the sequential composition of the behaviors and , non-deterministic choice between and , non-deterministic but finite repetition of , and the removal of from , respectively. Moreover, denotes that refines while also preserving progress, and that and are observationally equivalent.

First, we specify as a probabilistic program alternating between an environment  (e.g., robot, operator) and a supervisor design space . Given the set of all probabilistic commands of  (e.g., operator or robot moves, welding steps, Figure 1), we require and to be refinements of the command bundles and . With , the world without the supervisor can be generated by .

Let a set of finite-sorted variables be given, each with an associated sort, and a universe of values. A state is a valuation function mapping each variable to a value of its sort; map restriction restricts a (set of) state(s) to a subset of the variables. Given a probabilistic program, inductively applying it to an initial state yields an MDP and the set of states reachable by the program.1 The MDP is labeled with atomic propositions and is a labeled transition system whose transition relation encodes non-deterministic and probabilistic choice. MDPs can model uncertainties about uncontrolled or non-modeled aspects of the world. See, e.g., [Kwiatkowska2011-PRISM4Verification] for further details.

(Footnote 1: Below, we mostly abbreviate the notation when referring to the transition relation of a program executed from an initial state.)

We model risk using a set of finite-sorted variables describing the critical events in the world as risk factors [Gleirscher2021-RiskStructuresDesign]. A risk factor (Figure 2(a)) has at least three phases as well as phase transitions modeling the life-cycle of handling the factor, for example, from inactive to active (a) to mitigated (m) and back to inactive. The example in Figure 1 uses three factors. That way, the factors induce a notion of risk state, which can be associated with world states representing the possible control states of any supervisor policy, test reference, and implementation.
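The phase life-cycle of a single risk factor can be captured as a tiny transition system. The sketch below, in Python for brevity, assumes the three phases inactive (i), active (a), and mitigated (m) and the cyclic transitions described above; the phase labels are illustrative:

```python
# Risk factor life-cycle: phases {inactive (i), active (a), mitigated (m)}
# and the allowed phase transitions i->a (endangerment), a->m (mitigation),
# m->i (resumption). Labels are illustrative.

PHASES = {"i", "a", "m"}
TRANSITIONS = {("i", "a"), ("a", "m"), ("m", "i")}

def is_valid_lifecycle(trace):
    """Check that a phase sequence starts inactive and only takes allowed steps."""
    if not trace or trace[0] != "i":
        return False
    return all((p, q) in TRANSITIONS for p, q in zip(trace, trace[1:]))

assert is_valid_lifecycle(["i", "a", "m", "i"])
assert not is_valid_lifecycle(["i", "m"])  # mitigation without activation
```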

Freedom of choice is resolved by refining indeterminacy in the environment through uniformly distributed probabilistic choice and, in the supervisor design space, by deriving a policy, that is, a choice resolution for each state where the supervisor is enabled and can make a decision. The result is a discrete-time Markov chain (DTMC), a labeled transition system without indeterminacy. Here, policy derivation involves both sub-setting the design space using PCTL constraints and Pareto-optimizing multiple objectives [Kwiatkowska2011-PRISM4Verification]. Any resulting optimal DTMC is thus verified against the requirements and includes the selected behavior.
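Resolving nondeterminism by a policy can be illustrated as follows. The sketch assumes a toy MDP with made-up states, actions, and probabilities, and shows how fixing one action per state yields a DTMC:

```python
# Toy MDP: state -> action -> probability distribution over successor states.
# All states, actions, and numbers are illustrative.
mdp = {
    "s0": {"stop":   {"s1": 1.0},
           "warn":   {"s1": 0.8, "s2": 0.2}},
    "s1": {"resume": {"s0": 1.0}},
    "s2": {"resume": {"s0": 1.0}},
}

# A policy fixes exactly one action per state where a choice exists.
policy = {"s0": "stop", "s1": "resume", "s2": "resume"}

def apply_policy(mdp, policy):
    """Resolve all nondeterministic choices, turning the MDP into a DTMC."""
    return {s: dict(actions[policy[s]]) for s, actions in mdp.items()}

dtmc = apply_policy(mdp, policy)
assert dtmc["s0"] == {"s1": 1.0}  # no choice left: a Markov chain
```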

(a) Phases and transitions of factor
(b) Syntactic interface between and
Figure 3: Risk factor and supervisor interface

3.2 : Test Reference Generation

For the translation of into a test reference , we define the I/O alphabet with and for monitored and controlled variables , resulting in the syntactic interface [Broy2010-LogicalBasisComponent] of . This interface (Figure 2(b)) defines the nature of the changes in that any can observe and perform. We require but allow an overlap of and .

To obtain the reference as an operation refinement of the parallel composition of all factors, we translate the reachable fragment of the transition relation of the DTMC into a deterministic SFSM with a set of control states, a transition relation, and an initial state congruent with that of the DTMC. Given a DTMC transition, the source state is mapped to an input (the observed event) and a risk state. The control and state updates are derived from the difference in the controlled and factor variables between the source and the target state. Figure 1 shows an example. Finally, the reference is provided in a format readable by libfsmtest (MIT-licensed; source code available under [libfsmtest]) for test suite generation (Sect. 3.4).
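The essence of this translation, mapping the change in monitored variables to an input guard and the change in controlled variables to an output, can be sketched as follows. This is an illustration in Python; the variable names rngDet, safmod, and notif follow the running example, while the concrete values and sets are assumptions:

```python
# Sketch of the DTMC-to-SFSM translation idea: each DTMC transition (src, tgt)
# yields an SFSM transition whose input is the observed change in monitored
# variables and whose output is the induced change in controlled variables.

MONITORED = {"rngDet"}            # variables the supervisor observes
CONTROLLED = {"safmod", "notif"}  # variables the supervisor sets

def to_sfsm_transition(src, tgt):
    guard = {v: tgt[v] for v in MONITORED if src.get(v) != tgt[v]}
    output = {v: tgt[v] for v in CONTROLLED if src.get(v) != tgt[v]}
    return guard, output

src = {"rngDet": "far", "safmod": "normal", "notif": "none"}
tgt = {"rngDet": "near", "safmod": "stopped", "notif": "warn"}
guard, out = to_sfsm_transition(src, tgt)
assert guard == {"rngDet": "near"}
assert out == {"safmod": "stopped", "notif": "warn"}
```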

3.3 : Code Generation

Independent of the reference generation, the reachable fragment of the transition relation of the DTMC is translated to an implementation (Figure 3(a)). Similar to the translation to the SFSM (Sect. 3.2), every transition is mapped to a guarded command whose action name is derived from the transition's constituents. (Note that the elements of the input and output alphabets are used as propositions in guard conditions.) The implementation is intentionally simple (e.g., flat branching structure) and wrapped into platform-specific code (not shown) for data processing and communication. Figure 3(b) depicts a fragment of the implementation for our running example.
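The flat guarded-command structure of the generated code can be mimicked as follows. This is a sketch in Python rather than the generated C++, with illustrative states, guards, and actions:

```python
# Flat guarded-command table: one pass evaluates mutually exclusive guards
# and fires at most one transition; otherwise the supervisor idles.
# Each entry: (source state, guard predicate, action name, target, outputs).

def supervisor_step(state, inputs):
    table = [
        ("safe", lambda i: i["rngDet"] == "near", "mitigate",
         "stopped", {"safmod": "stopped"}),
        ("stopped", lambda i: i["rngDet"] == "far", "resume",
         "safe", {"safmod": "normal"}),
    ]
    for src, guard, _name, tgt, out in table:
        if state == src and guard(inputs):
            return tgt, out          # flat: the unique matching command fires
    return state, {}                 # idle self-loop otherwise

state, out = supervisor_step("safe", {"rngDet": "near"})
assert (state, out) == ("stopped", {"safmod": "stopped"})
```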

(a) Pseudo code of a generic supervisor
(b) Fragment of the supervisor implementation for Figure 1
Figure 4: Supervisor pseudo code and implementation fragment

As opposed to the reference, the representation of the implementation can vary significantly. For instance, in Figure 3(b), we consider a C++ component for a low-level real-time implementation. Alternatively, one may want to derive VHDL or Verilog HDL to synthesize an FPGA. (VHSIC or Verilog hardware description language, VHDL or Verilog HDL; field-programmable gate array, FPGA.) In [Gleirscher2021-VerifiedSynthesisSafety], we consider a C# component used in a simulation within a Robot Operating System-enabled digital twinning environment [Douthwaite2021-ModularDigitalTwinning]. The reference can thus be shared between varying implementations and platforms. The only difference on the testing side is in the I/O wrapper used in the test harness (Sect. 3.4) to deliver the inputs to and record the outputs of the implementation. The translations into the reference and the implementation, and the generation of the test wrapper, are performed with the Yap tool. (Features and examples are available in Yap 0.8+.)

3.4 : Test Suite Generation

We avoid the costly verification of the (potentially changing) code generator used in the previous stage. Instead, we generate a conformance test suite that, when passed, corresponds to a correctness proof of the generated code. The SFSM reference model (Sect. 3.2) and the supervisor implementation (Sect. 3.3) allow us to apply a model-based conformance testing approach for verifying the observational equivalence of reference and implementation. Indeed, complete test methods enable us to prove that the system under test (SUT) conforms to a given reference model under certain hypotheses [PeleskaHuangLectureNotesMBT]. A complete test suite is derived as follows.

Step 1. Since is a deterministic SFSM which outputs a finite range of control signals, we can apply the equivalence class testing theory of Huang et al. [peleskasttt2014], which has been elaborated for a more general class of Kripke structures including SFSMs like . Moreover, the guard conditions in are even mutually exclusive (see the construction of in Sect. 3.2 and Sect. 4.4). Therefore, an input equivalence class corresponds to the set of input valuations satisfying a specific guard condition . We write if the guard condition evaluates to true, after having replaced all occurrences of input variables in by value . In any SFSM state, all members of an input class produce the same output values. We use the guard formula itself as an identifier of the associated input equivalence class .
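Step 1 can be illustrated by enumerating a small finite input domain and checking that every valuation satisfies exactly one guard, making each guard an input equivalence class with an arbitrary representative. The domain and guards below are made up for illustration:

```python
from itertools import product

# Two mutually exclusive, jointly complete guards over a small finite domain.
DOMAIN = {"rngDet": ["far", "near"], "rloc": ["atTable", "atWeldSpot"]}

guards = {
    "g1": lambda v: v["rngDet"] == "near",
    "g2": lambda v: v["rngDet"] == "far",
}

def representatives(guards, domain):
    """Map each guard to one representative valuation of its input class."""
    keys = list(domain)
    classes = {}
    for values in product(*(domain[k] for k in keys)):
        v = dict(zip(keys, values))
        sat = [g for g, pred in guards.items() if pred(v)]
        assert len(sat) == 1           # mutual exclusion and completeness
        classes.setdefault(sat[0], v)  # keep the first representative found
    return classes

reps = representatives(guards, DOMAIN)
assert set(reps) == {"g1", "g2"}
```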

Step 2. With the class identifiers as finite input alphabet and the finite set of possible outputs as output alphabet, the reference is abstracted to a minimal, deterministic finite state machine (FSM) (Figure 5, left). Using this FSM as a reference model, the H-Method [DBLP:conf/forte/DorofeevaEY05] is applied to derive a complete FSM test suite. The completeness of the suite is guaranteed provided that the true behavior of the implementation can be abstracted to a deterministic FSM with at most as many states as the reference FSM. The generation of the test suite is performed by means of the open source library libfsmtest, which contains algorithms for model-based testing against FSMs [libfsmtest].

Step 3. Test suite is translated to an SFSM test suite by selecting an input valuation from each input equivalence class and replacing each test case (i.e., sequence of input class identifiers) by the sequences of valuation functions. It has been shown that is complete whenever is [peleskasttt2014].

3.5 : Conformance Testing

Figure 5: Commuting diagrams reflecting the observational equivalences (, ), pass relations (, ), and abstraction map ()

libfsmtest provides a generic test harness TH for executing test suites generated from FSMs against SUTs. For verifying , TH executes the complete FSM test suite . To this end, the harness uses a test wrapper TW for (i) refining each in a test case to a concrete input valuation of , (ii) calling to perform one control step, and (iii) abstracting -output valuations back to (Figure 5 right). Test harness, wrapper, and supervisor code are compiled and linked; this results in a test executable . The component of the test executable acts like an FSM over alphabets , . This FSM interface is used by the test harness to stimulate with inputs from and to check whether the outputs conform to the outputs expected according to reference FSM .

TW implements two bijections, for step (i) and for step (iii). Map satisfies , and fulfills . These mappings ensure that the diagrams in Figure 5 are both commutative, that is, the execution of against by TH results in the execution of against by .

Consider, for example, transition siHSressafmod in Figure 1. Function maps -input rloc=atWeldSpot&…&rngDet=far to -input valuation . The wrapper implements this simply by assignments rloc=atWeldSpot;…;rngDet=far;. If the corresponding -step produces any output valuation in , this is mapped by to -output safmod=&…&notif=.
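The wrapper's refinement and abstraction steps can be sketched as follows; alpha_I and alpha_O stand for the two maps described above, and the SUT stub, identifiers, and values are illustrative assumptions:

```python
# Sketch of the test wrapper's two maps: ALPHA_I picks a concrete input
# valuation for an input-class identifier; alpha_O abstracts an output
# valuation back to an abstract output symbol.

ALPHA_I = {  # input class identifier -> representative valuation
    "rloc=atWeldSpot&rngDet=far": {"rloc": "atWeldSpot", "rngDet": "far"},
}

def alpha_O(valuation):
    """Abstract a concrete output valuation to an output symbol."""
    return "&".join(f"{k}={valuation[k]}" for k in sorted(valuation))

def run_test_step(sut_step, class_id):
    concrete_in = ALPHA_I[class_id]       # (i) refine the abstract input
    concrete_out = sut_step(concrete_in)  # (ii) one control step of the SUT
    return alpha_O(concrete_out)          # (iii) abstract the output back

# A stub SUT that resumes safe mode for this input.
out = run_test_step(lambda i: {"notif": "ok", "safmod": "normal"},
                    "rloc=atWeldSpot&rngDet=far")
assert out == "notif=ok&safmod=normal"
```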

4 Workflow Soundness

It remains to be shown that the five workflow stages establish the chain of refinements (1).

Through model checking, our notion of refinement preserves trustworthiness (Sect. 3.1), comprising world safety and supervision progress. After having established that the selected supervisor behavior is trustworthy, trustworthiness of the reference and the implementation can be inferred from the two observational equivalences in (1). Furthermore, the implementation remains trustworthy as long as reference and implementation are re-initialized congruently. So, the main objective of proving workflow soundness is to establish that (1) holds.

Following safety-critical control standards (e.g., DO-178C/330 [DO178C, DO330], IEC 61508 [iec61508], ISO 10218 [ISO10218]), we argue for the soundness of our workflow (G) and the refinement chain (1). Our argument for G (top level in Figure 6) aims at ruling out that a faulty model, test theory, generator, or test harness masks errors inside the implementation. Arguing for G is known as verification of the verification results. By semi-formal weakest precondition (wp) reasoning (J2), we establish a Hoare triple (J1) for each workflow stage (G1 to G5) and implication relationships (J3) between the post- and preconditions (Table 1) of these triples to establish G, the soundness of the sequential composition (Figure 2).

Figure 6: Overview of the workflow assurance case in goal structuring notation [GSN2011]

Stage 1. Precondition: requirements are complete and the world model is trustworthy. Postcondition: the chosen supervisor behavior can be trusted.

Stage 2. Precondition: the supervisor behavior is trustworthy, deterministic, factor-complete, and compatible with the initial state. Postcondition: the reference is deterministic, input-complete up to idle self-loops, compatible with the initial state, and accepted by the test generation library.

Stage 3. Precondition: the reference and the syntactic interface are defined. Postcondition: the generated code is compatible with the test harness and statically analyzable.

Stage 4. Precondition: the complete testing theory is correct, the prerequisites for applying the selected test generation method are fulfilled, and the test suite validator is correct. Postcondition: the generated FSM test suite is complete for checking observational equivalence between the reference and FSMs over the same alphabet with at most the same number of states, or the validator will indicate an error.

Stage 5. Precondition: the test suite is complete, TW is correct, and TH is correct. Postcondition: the implementation passes the test suite if and only if it is observationally equivalent to the reference.

Table 1: Overview of pre- and post-conditions of the workflow stages

4.1 Assurance of : Supervisor Synthesis

For , we need to establish


: For the synthesis stage to yield a trustworthy result, we first identify a complete list of well-formedness properties and supervisor factor handling requirements () for the MDP . Completeness of relies on the completeness of , the latter on state-of-the-art hazard analysis and risk assessment. We are further provided that is a trustworthy world model () because we have achieved [Gleirscher2021-VerifiedSynthesisSafety, Gleirscher2020-SafetyControllerSynthesis] by probabilistic model checking. Note that errors in and can be iteratively identified by validating (e.g., completeness and vacuity checks) and re-checking .

: Here, our goal is to preserve trustworthiness in the supervisor behavior selected according to Sect. 3.1, namely, established by the policy synthesis facility of the model checker (). with contains only those properties that can be checked of DTMCs (). Recall that results from converting indeterminacy in into probabilistic choice, thus qualitatively preserving all behavior of in while results from fixing all non-deterministic choices in . will thus be deterministic and, because of being exposed to all behaviors of , be able to deal with all environments producible by from (or ). Given , we can now see that the weakest precondition of under is implied by , formally,


By Proposition (2), we have now established Proposition (G1).

4.2 Assurance of : Reference Generation

For , we need to establish


: We require that (i) can be trusted, which is implied by . (ii) is expected to handle combinations of critical events, which is established by generating commands for each factor in . (iii) has to be deterministic, which is guaranteed by policy synthesis in . (iv) Supervisor behavior is a function of and , formally, is such that . Note that we have now established .

: (i) The equivalence () follows from the 1-to-1 translation of DTMC to SFSM transitions in (Sect. 3.2). (ii) Factor handling completeness () and determinism () of are also maintained by this translation and . (iii) Congruence of and () follows from the definition of (Sect. 3.2), from being the initial state of and , and from the fact that cannot change . (iv) Yap produces in the libfsmtest input format (). Note that can be trusted () but is not yet input-complete as self-loop transitions (i.e., idle actions) are both ignored and not introduced by ; this completion is done in .

We can now derive the weakest precondition of under as follows:


Because conjuncts (i-iv) of imply the derived precondition parts, we have


By obtaining Proposition (7), we have established Proposition (G2).

4.3 Assurance of : Code Generation

For , we need to establish


includes , established by according to Proposition (2), and the definition of the sets (implied by ), , and (implied by ).

: We require from to obtain an implementation that can be integrated with the test harness () and is statically analyzable (), allowing simple checks in using a custom lex/yacc parser or Unix text processing tools. For , Yap generates the I/O wrapper based on a libfsmtest template and the variable sets , , and . For , Yap produces C++ code adhering to the structure in Figure 3(a), assuring that no pre-processing directives are used and the branching structure is flat and simple, see Figure 3(b). Trivially, and the weakest precondition of under is:


Because the conjuncts of imply the derived precondition, we have


Having verified Proposition (9), we have established Proposition (G3).

4.4 Assurance of : Test Suite Generation

For , we need to establish


: As specified in Table 1, the precondition to be established for task consists of several main conjuncts.

requires the applied testing theory to be correct, when applied with input equivalence classes specified by guard conditions and with the H-Method for generating the intermediate FSM test suite from (Sect. 3.4). The correctness of the input equivalence class construction has been proven by Huang et al. [peleskasttt2014]. They have also shown that any complete FSM test generation method can be applied for calculating , and the resulting SFSM test suite will always be complete as well [Huang2017]. Finally, Dorofeeva et al. [DBLP:conf/forte/DorofeevaEY05] have proven the completeness of the H-Method. These facts ensure the validity of .

requires that the prerequisites for applying this test theory are fulfilled by the reference model and the implementation . The four prerequisites to be established are defined and ensured as follows. (i) It has to be shown that the reference model can be interpreted as a deterministic reactive I/O state transition system (); this is a variant of Kripke structures that is used as the semantic basis for models in Huang et al. [peleskasttt2014]. SFSMs represent special cases of reactive I/O state transition systems. The determinism of is ensured by . (ii) It has to be justified that the true behavior of can be interpreted as a deterministic reactive I/O state transition system (). This is easy to see, since the code generator creates with a structure where a main loop evaluates which guard condition can be applied and executes the corresponding transition changing the internal control state and setting finite-valued outputs. By static code analysis, it is shown that uses exactly the same guard conditions as . This can be performed with Unix core utilities like grep, sed, and sort. Thus, is deterministic since is deterministic. (iii) Next, it has to be shown that the implementation does not possess more than control states (, where after minimization of ) (). The number of states in can be directly determined from a description file containing all such states. Again, a simple static analysis shows that the code uses the same control states, so . (iv) Finally, it has to be shown that each guard condition represents a single input equivalence class (). Indeed, this follows from the fact that all guards are mutually exclusive, which has already been established by .
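The guard-comparison check described above (performed with grep, sed, and sort over the generated code) can equivalently be sketched in Python; the code fragment, regular expression, and reference guard set are illustrative assumptions:

```python
import re

# Sketch of the static check: extract guard conditions from the generated
# code and compare the set against the reference model's guards.
CODE = """
if (rngDet == near && rloc == atWeldSpot) { ... }
if (rngDet == far) { ... }
"""

REFERENCE_GUARDS = {"rngDet == far", "rngDet == near && rloc == atWeldSpot"}

def extract_guards(code):
    """Collect the deduplicated set of guards appearing in if (...) statements."""
    return {m.group(1).strip() for m in re.finditer(r"if \((.*?)\)", code)}

# Same guard set in code and reference, hence the same input classes.
assert extract_guards(CODE) == REFERENCE_GUARDS
```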

: Instead of verifying the H-Method algorithm in libfsmtest, we check the generated test suite for consistency with the test suite specification of the H-Method. If the check fails, the algorithm can be fixed, and the suite can be created again. The check is automatically performed by a test suite validator . As explained in Sect. 4.6, this approach requires to qualify . Qualifying instead of the H-Method algorithm has the advantage that the former has a significantly simpler implementation than the latter. Also, any future optimizations of the test algorithm will not affect the tool qualification (TQ) of , because the checks performed on the generated test suites do not depend on the library, but only on the theory. Summarizing,


and the validity of , , , , , and has been ensured. The validity of is explained below. Therefore, the validity of precondition is ensured as well.

: The post-condition to be established is that the test suite generated by according to the steps described in Sect. 3.4 is complete for testing any FSM over alphabets with at most states against , provided that holds and does not indicate a generation error.

The weakest precondition ensuring the postcondition under this stage is:

Prerequisites for applying the testing theory are fulfilled. (11)
The test suite is complete if the validator indicates no error. (12)



By Proposition (13), we have established Proposition (G4).

4.5 Assurance of : Conformance Testing

For , we need to establish


: The post-condition to be established (see Table 1) states that passes the test suite if and only if it is observationally equivalent to . As explained in Sect. 3.5, the test wrapper with embedded supervisor (denoted by ) implements an FSM over the same input and output alphabet as . The test executable runs test suite against ; this is equivalent to running against , provided that the wrapper implements the alphabet mappings correctly. Therefore, we require that is complete (), which has already been established by . The additional conjuncts of are tool-related: (i) The test wrapper TW is correct; this is ensured by two sub-conditions. : TW correctly implements as defined in Sect. 3.5. : TW correctly implements as defined in Sect. 3.5. (ii) The test harness TH is correct; this is ensured by three sub-conditions. : TH skips no test cases from . : TH neither skips nor adds nor changes inputs in a test case . : TH assigns verdict PASS to execution of if and only if the I/O trace produced by for is in the language of .

Preconditions and are ensured by comprehensive TQ tests (as explained below in Sect. 4.6) qualifying the test wrapper. Precondition is ensured analogously for the checker component of the test harness TH. Preconditions and are ensured by artifact-based TQ (Sect. 4.6): the test cases actually executed and their associated test steps are documented in a test execution log that is compared to the test suite . The comparison is performed by an execution log validator . The logging component of TH and the log validator are qualified by means of comprehensive TQ tests.
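The role of the execution log validator can be sketched as follows; the log format (a list of executed test cases with verdicts) is an assumption for illustration:

```python
# Sketch of an execution-log validator: check that the logged run neither
# skipped test cases nor altered the inputs of any executed test case.

def validate_log(test_suite, log):
    """test_suite: list of test cases (each a list of inputs);
    log: list of (executed inputs, verdict) pairs, in suite order."""
    if len(log) != len(test_suite):
        return False                   # a test case was skipped or added
    return all(executed == expected
               for (executed, _verdict), expected in zip(log, test_suite))

suite = [["a", "b"], ["a", "c"]]
assert validate_log(suite, [(["a", "b"], "PASS"), (["a", "c"], "PASS")])
assert not validate_log(suite, [(["a", "b"], "PASS")])  # skipped case
```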

: The test theory explained in Sect. 4.4 implies that is equivalent to the following alternative post-condition.

: passes the test suite if and only if it is observationally equivalent to .

Of course, and are only equivalent if the wrapper correctly implements the alphabet mappings and (Sect. 3.5). We know from that is complete, so it is a good candidate for checking observational equivalence. Additionally, however, it needs to be ensured that the test harness TH executes the test suite correctly and performs the correct checks of observed -reactions against reactions expected according to . Consequently, the weakest precondition of under is

TW implements correctly (15)
TW implements correctly (16)
TH skips no test cases of (17)
TH neither skips nor adds nor changes inputs in a test case (18)
Test case passes iff (19)

Because the conjuncts of are equivalent to these preconditions, we have


By Proposition (20), we have established Proposition (G5).

4.6 Tool Qualification

As described above, the soundness of the workflow establishing the refinement chain (1) depends on tools automating critical verification steps. Following the applicable standards for safety-critical systems development, this workflow requires tool qualification. For TQ-related considerations, we apply the avionic standard RTCA DO-178C with annex DO-330 [DO178C, DO330], because this is currently the most specific and strict standard, as far as TQ is concerned. Fulfilling the TQ requirements specified there implies compatibility with the requirements for support tools according to IEC 61508 [iec61508] and ISO 10218 [ISO10218].

Standards for the use of automation tools in development and verification (e.g., RTCA DO-178C) offer three options to ensure that the tool-produced artifacts (e.g., object code, test suites, test execution results) are correct. (1) If an artifact is not verified by any other means, the tool needs to be qualified. (2) If an artifact is verified manually by a systematic review or inspection, no TQ is required. (3) If an artifact is verified by an automated checker replacing the manual review/inspection procedure, then the checker needs to be qualified [peleska2012c]. We call this artifact-based TQ, since the tool is “re-qualified” every time it produces a new artifact (i.e., a new test suite in our case).

For either test automation tools or associated artifact checkers, the so-called tool qualification level TQL-4 specified in [DO178C, Section 12.2.2] applies: this level is intended for tools that cannot introduce errors into the software to be deployed in the target system, but whose failures may prevent the detection of errors in target software. TQL-4 requires a documented development life cycle for the tool, a comprehensive requirements specification, and a verification that these requirements are fulfilled. Verification can be performed by reviews, analyses (including formal verification), and tests. Formal verification of the tool alone cannot replace SW/HW integration tests of the tool on the platform where it is deployed. For TQL-4 tools to be applied for the verification of target software of highest criticality (as discussed in this paper), the TQ tests need to cover not only all requirements but also the code with 100% MC/DC coverage [DO178C, p. 114].

We are aware that other workflow stages are subject to TQ as well. However, the qualification of model checkers and their results [Wagner2017-QualificationModelChecker] is more complicated, part of ongoing research, and thus out of scope here.

5 Related Work and Discussion

The approaches in [Orlandini2013-ControllerSynthesisSafety, Bersani2020-PuRSUEspecificationrobotic] from collaborative robotics are perhaps closest to our workflow, as they include platform deployment. While they focus on the synthesis of overall robot controllers, we focus on supervisors, adding a testing stage that assures implementation correctness. Villani et al. [Villani2019-Integratingmodelchecking] integrate quantitative model checking (with Uppaal [Behrmann2004-TutorialUppaal]) with conformance testing and fault injection. The authors advocate cross-validation of Uppaal and FSM models. Our approach differs from theirs in two ways: (i) SFSMs do not require cross-validation, since they are generated from a world model validated by model checking. (ii) We do not need fault injection for testing, since our complete test strategy corresponds to a formal code verification by model checking.

Research on complete testing methods is a very active field [Petrenko:2012:MTS:2347096.2347101]. We applied the H-Method [DBLP:conf/forte/DorofeevaEY05] because (i) it produces far fewer test cases than the classical W-Method [chow:wmethod], and (ii) it allows for an intuitive test case selection, facilitating the qualification of the test case generator (Sect. 3.4). If the objective of a test campaign is to provide complete suites with a minimal number of test cases, then the SPYH-Method [DBLP:conf/icst/SouchaB18] should be preferred over the H-Method.
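For readers unfamiliar with these methods, the following minimal Python sketch shows the structure of a W-Method suite P.(I⁰ ∪ I¹).W for a toy two-state Mealy machine, assuming the implementation has no more states than the specification. The machine, state cover, and characterisation set are invented for illustration; the H-Method refines this scheme by selecting fewer, state-pair-specific distinguishing sequences instead of appending all of W everywhere.

```python
# Toy W-Method illustration (invented specification, m = n assumed).

SPEC = {  # state -> {input: (next_state, output)}
    "s0": {"a": ("s1", 0), "b": ("s0", 0)},
    "s1": {"a": ("s0", 1), "b": ("s1", 0)},
}
INPUTS = ["a", "b"]
STATE_COVER = [[], ["a"]]   # P: input sequences reaching s0 and s1
W = [["a"]]                 # characterisation set: 'a' outputs 0 in s0, 1 in s1

def w_method_suite():
    """Concatenate P . (I^0 ∪ I^1) . W; prefixes/duplicates could be pruned."""
    suite = []
    for p in STATE_COVER:
        for middle in [[]] + [[x] for x in INPUTS]:
            for w in W:
                suite.append(p + middle + w)
    return suite

print(w_method_suite())
```

Even on this toy example the blow-up is visible: the suite size is |P| · (1 + |I|) · |W|, and for implementations that may have up to m > n states the middle part grows to all input sequences of length up to m − n + 1, which is why methods producing fewer test cases matter in practice.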

Hazard/failure-oriented testing [Gleirscher2011-HazardbasedSelection, Lesage2021-SASSISafetyAnalysis] and requirements falsification based on negative scenarios [Uchitel2002-Negativescenariosimplied, Gleirscher2014-BehavioralSafetyTechnical, Stenkova2019-GenericNegativeScenarios] are useful if a validated world model is not available or needs to be revised. In contrast, our approach is complete once the world model is validated, that is, any deviation detectable by such techniques is also uncovered by at least one test case generated by our approach. Moreover, our approach can be used to test supervisor robustness without a realistic simulator.

The soundness argument for our workflow (Sect. 4) relies on proofs of completeness of the test suites generated with the testing theories. Such proofs can be mechanized using proof assistants [DBLP:conf/pts/SachtlebenHH019]. It is also possible to generate test generation algorithms from the proof assistant and prove their correctness as a minor extension of the testing theory [DBLP:conf/pts/Sachtleben20]. This could simplify the tool qualification argument in Sect. 4.4. However, some kind of TQ argument will still be necessary, because proven algorithms do not guarantee correctness of their execution on a target platform (e.g., a PC or cloud server), where additional errors might be introduced due to inadequate address or integer register sizes.

6 Conclusions

We proposed a rigorous workflow for the sound development (i.e., the verified synthesis and code generation) of supervisory discrete-event controllers enforcing safety properties in human-robot collaboration and other autonomous system applications. The novelty of this workflow consists in (i) the generation of a test reference model whose completeness and correctness are established by a refinement relation to a validated world model, (ii) the application of complete model-based testing methods in combination with static analysis to obtain a conformance proof of the safety supervisor code, and (iii) an explanation of how the tools involved and the artifacts they produce can be qualified according to the most stringent requirements from standards for safety-critical systems development. We employ Hoare logic and the weakest-precondition calculus (at a meta-level rather than at the artifact level) to establish soundness of our workflow, and we use goal structuring notation to structure and visualize the complex verification and validation argument for obtaining certification credit. The workflow is supported by a tool chain: Yap [Gleirscher2020-YAPToolSupport] and a stochastic model checker (e.g., Prism [Kwiatkowska2011-PRISM4Verification]) for Markov decision process generation and verification, Yap for test reference and code generation, and libfsmtest [libfsmtest] for test suite derivation and execution.

To test the integrated HW/SW system (i.e., robot, welding machine, safety supervisor, and a simulation of human interactions), we will embed our approach into a more general methodology for the verification and validation of autonomous systems, starting at the module level considered here and ending at the level of the integrated overall system [DBLP:journals/corr/abs-2110-12586].