The development of automated vehicles (AVs) is one of the great technological challenges of today’s society. While significant progress has been made in recent years on the functional side, it remains unsolved how the safe operation of an AV can be assured. As illustrated in Fig. 1, the AV system represents a complex, multi-stage compute stack involving most notably sensing, perception, planning and actuation, which together constitute the primary channel. This compute chain is exposed to various sources of faults: random faults, such as noise in the input data or hardware execution errors due to cosmic radiation, and systematic errors, such as software and hardware bugs, all of which can compromise the safe operation of the AV. This holds in particular in the presence of artificial intelligence (AI)-based components, which are well established in the perception and planning domain but are, for example, highly affected by unbalanced or incomplete training data.
To achieve fault tolerance of the primary channel, a secondary safety channel is typically established (see Fig. 1) that monitors the correct behaviour of the relevant primary functions. In this work, we assume that solutions exist for the verification of the planning functions: For example, Markov chains [1] can be used to assess the safety of planned trajectories. More recently, Responsibility Sensitive Safety (RSS) [18] was proposed by Intel/Mobileye, and it is currently being adapted by the IEEE Standard P2846 [10] for AV safety. All these approaches have in common that they rely on input provided by a perception system. Therefore, assuring the integrity of the perception system at reasonable cost and high efficiency is still a major challenge, which we address in this article.
Various perception monitoring approaches have been tested: One strategy is to create modular redundancies (e.g. duplex) or diverse function replication, e.g. leveraging different sensing modalities or algorithms. If the diversity is sufficiently large, it can be argued that this provides a certain level of protection against the aforementioned faults. It remains an open question, though, whether it is feasible to develop a sufficiently diverse system that achieves comparable quality on all channels: Instead, the channel with the lowest performance often dictates the overall system behaviour. Furthermore, the inherent challenges of learning-based systems remain, and the computational cost is significantly increased. This results in a demand for accelerators and high-performance processors, for which it is typically hard to ensure correctness and integrity as demanded by the functional safety standard ISO 26262.
In previous work [5], we proposed a two-stage planning and perception monitoring architecture where the planning monitor was realized by RSS and the perception monitor by a dynamic occupancy grid. While the detection performance of this implementation was very promising, the dynamic occupancy grid relies on computationally intensive tracking algorithms. Its compute requirements are therefore a key challenge, preventing a lightweight solution that can run on functional safety (FuSa)-certified hardware.
To overcome this, we present here a simple, lightweight and low-cost, yet robust perception monitoring solution. Our perception monitor consists of two components, sensor checks and plausibility checks, see Fig. 1. The former component performs minimal processing of a high-confidence sensor source (here, e.g., a LiDAR) to accurately verify the positions and two-dimensional shapes of objects in a static occupancy grid. The latter component verifies the dynamics of objects based on a physical motion model, and is thus not subject to sensing faults, to identify for example unrealistic speeds, accelerations or discontinuous paths. Both components run relatively simple functions, allowing for a lightweight implementation. Yet, we demonstrate that this monitoring concept can reliably detect speed and position errors that are large enough to be safety-critical in realistic situations. The identified errors can be mitigated by the subsequent planning monitor using, for example, worst-case assumptions on the erroneous parameters. The focus of the current work is, however, on the detection, not the mitigation, of perception errors.
Importantly, the monitoring solution described here is generic and can be combined with any primary perception system and any planning module beyond RSS.
In summary, the article contributes the following novelties:
We present a novel perception monitoring approach that leverages the strengths of both sensor and plausibility checking,
We demonstrate that this method provides a high precision and recall against the most relevant perception errors,
We verify that the implementation is low-cost and can be executed on certified hardware.
II Related Work
General monitoring architectures and their trade-offs in terms of cost, safety, execution time etc. are for example discussed in [12, 3]. The authors of [14, 15] present the benefits of an inherently asymmetric commander-monitor concept compared to a triple modular redundancy system, and propose a run-time monitor that can assess the safety of a planned vehicle trajectory. An example of an asymmetric perception monitor based on dynamic occupancy grids is proposed in our previous work [5]. When machine-learning functions are utilized in automated driving (AD) systems, additional safety challenges emerge, as discussed in [6, 11]. Plausibility checking as a means to improve perception at different stages of the sensor fusion process is explored for example in [21, 22]. Use cases comprise the detection of implausible ghost vehicles or misbehaviour in vehicle-to-vehicle communication. In previous work, we have further harnessed plausibility checking to detect sensor faults in distributed sensor networks.
As a conceptual limitation, a monitor of reduced complexity with respect to the primary channel can detect only a subset of all possible perception errors of the latter. We thus restrict our analysis to the following errors, which we consider most relevant for the validation of the RSS perception input:
False negative detections represent objects in the true environment that the AD system fails to detect. For example, an AI component may not be trained properly to detect a certain type of object, and thus miss it. These errors could result in unsafe planning and collisions.
False positive detections are observations of the AD system that do not exist in the true environment (“ghost” targets). These are not considered to be safety-critical; nevertheless, they can negatively impact the comfort of the system, as they might lead to unnecessary braking or evasive manoeuvres.
Position errors result in an object being perceived at an incorrect location, potentially leading to unsafe manoeuvres or collisions. Depending on the object association, large position errors can be interpreted as a simultaneous false positive and false negative detection.
Velocity errors have the consequence that the motion of a perceived object is predicted in an incorrect way and might lead to false assumptions on required safety distances.
Our goal is to design a simplified monitor with sufficient detection capabilities to reliably identify the above errors, as quantified by the precision and recall metrics. A reduced precision of the monitor means more false alarms, which directly affects the availability of the system. A reduced recall, on the other hand, results in errors being missed by the monitor. Finding an optimal balance between the sensitivity and the robustness of the architecture is a design challenge, especially as the input to the AV system is exposed to noise. The verification of false positive and false negative detections requires a redundant sensor source, and is therefore handled by the occupancy grid of the sensor checks. We have studied this aspect in previous work [5]. In this article, we focus on the ability of our perception monitor to detect critical position and speed errors using combined sensor and plausibility checks, see Section IV. Those errors can be either transient or permanent.
Importantly, to properly calculate a safety envelope according to RSS definitions, only safety-critical manifestations of the mentioned error types need to be detected. Since the distinction between safe and unsafe errors generally depends on the explicit situation in a complex environment, it is not possible to define a universal threshold for, e.g., critical speed and position errors. We may, however, highlight two error magnitudes that typically lead to safety violations and serve as a benchmark for our monitor: For position errors, deviations of at least half the width of a lane cause critical lane misassociations, or mistakes about pedestrians being on or off the road. Further, speed errors on the order of the typical difference between static and mobile pedestrians are critical, and may also result in the misclassification of an object class if typical speed estimates are partially or fully utilized for that purpose.
III-B Sensor checks
In order to realize the sensor checks, we convert the LiDAR point-cloud measurements into an occupancy grid. In contrast to the approach described in [5], we use for this work a classical occupancy grid [19]. The occupancy grid provides a two-dimensional (2D) representation of the environment. To this end, it divides the surroundings of the ego vehicle into cells of a given size, so that each Cartesian position within the grid is mapped to a specific cell. Each cell contains the probability of being occupied by an obstacle. A point cloud can easily be transformed into such a grid representation by projecting all points that are measurements of obstacles to the cells corresponding to their Cartesian positions. Points that are part of the ground plane, or of parts of the environment that can be under-passed, are excluded. Each point that falls within a cell increases the occupancy probability of that cell. Each grid cell therefore provides a spatial occupancy probability for a given position.
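The projection of a point cloud into a grid can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function name, the height band used to drop ground and under-passable returns (`z_min`, `z_max`), and the saturating additive update with `hit_increment` are all assumptions made for the example.

```python
import math

def build_occupancy_grid(points, cell_size, half_extent, z_min, z_max, hit_increment=0.3):
    """Project obstacle points (x, y, z) into a square 2D grid centred on the ego vehicle.

    Assumed conventions (not from the paper): the grid spans [-half_extent, half_extent]
    in x and y, rows index y and columns index x, and every hit raises the cell's
    occupancy value by hit_increment, capped at 1.0.
    """
    n = int(math.ceil(2 * half_extent / cell_size))
    grid = [[0.0] * n for _ in range(n)]
    for x, y, z in points:
        # Exclude ground-plane returns and structures that can be under-passed.
        if not (z_min < z < z_max):
            continue
        col = int(math.floor((x + half_extent) / cell_size))
        row = int(math.floor((y + half_extent) / cell_size))
        if 0 <= row < n and 0 <= col < n:
            grid[row][col] = min(1.0, grid[row][col] + hit_increment)
    return grid
```

A probabilistic (e.g. log-odds) update could replace the additive rule without changing the structure of the sketch.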
In order to verify the object position, we determine the region that is covered by an object at the current point in time, according to the primary input. This region is determined by the object state and its covariance matrix. We enlarge this region by an additional safety margin to compensate for noise in the LiDAR measurements and other undetected uncertainties in the system, see Fig. 2. A coverage function then determines, for a given position and a given object, whether the position is covered by the object.
We then define metrics for the consistency of the grid with a given object, and for the conflict of a given grid position with the overall environment.
This allows us to evaluate false detections in the primary channel, by applying two decision thresholds and Boolean mapping functions to the consistency and conflict metrics.
With these Boolean functions, it is possible to determine whether there are false positive or false negative detections.
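One possible instantiation of such a check is sketched below. The exact coverage, consistency and conflict definitions of this work are not reproduced in the text, so everything here is an assumption for illustration: objects are axis-aligned boxes `(cx, cy, length, width)`, coverage is binary, consistency is the largest occupancy inside the enlarged box, and conflict is the occupancy of a cell that no object covers.

```python
def covers(obj, x, y, margin):
    """Binary coverage: 1.0 if (x, y) lies in the object's box enlarged by margin."""
    cx, cy, length, width = obj
    inside = abs(x - cx) <= length / 2 + margin and abs(y - cy) <= width / 2 + margin
    return 1.0 if inside else 0.0

def cell_center(row, col, cell_size, half_extent):
    """Cartesian centre of a cell (rows index y, columns index x)."""
    return (col + 0.5) * cell_size - half_extent, (row + 0.5) * cell_size - half_extent

def sensor_check(grid, objects, cell_size, half_extent, margin, t_consistency, t_conflict):
    """Return (indices of suspected false positives, cells of suspected false negatives)."""
    consistency = [0.0] * len(objects)
    conflicts = []
    for row, line in enumerate(grid):
        for col, occ in enumerate(line):
            x, y = cell_center(row, col, cell_size, half_extent)
            cov = [covers(o, x, y, margin) for o in objects]
            # Consistency: strongest LiDAR evidence found inside each object region.
            for i, c in enumerate(cov):
                consistency[i] = max(consistency[i], c * occ)
            # Conflict: occupied cell that is not explained by any object.
            conflict = occ * (1.0 - max(cov, default=0.0))
            if conflict > t_conflict:
                conflicts.append((row, col))
    false_positives = [i for i, c in enumerate(consistency) if c < t_consistency]
    return false_positives, conflicts
```

In this sketch, an object that is shifted away from its LiDAR evidence shows up twice, once as an unsupported object and once as an unexplained occupied cell, which mirrors the interpretation of a position error as a simultaneous false positive and false negative.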
III-B2 Theoretical analysis of position errors
A position error can be seen as a combination of a false positive and a false negative detection, as described above. The capability of the sensor check monitor component to detect position errors is determined by the grid cell size, which sets the spatial resolution, the expected noise of the sensor input, and a safety margin. Robustness against noise and other non-faulty deviations of the sensor and object data is needed to assure a low false alarm rate.
As visualized in Fig. 2, the grid is occupied around the part of a given object’s bounding box that is hit by the LiDAR beams, typically forming an “L” shape. A position error can be detected when the conflict of grid cells exceeds the corresponding decision threshold. It should be noted that, depending on the original location of individual measurement points within a grid cell, a given position error may or may not be sufficient to shift the measurement from one grid cell to another, and hence may or may not produce a conflict. The minimum position error that is guaranteed to be detectable can be estimated as the largest intra-cell shift diagonal to the grid orientation (Eq. 6).
Here, the measurement uncertainty of the object’s border can be approximated from the uncertainties of the position and dimensions (length and width), combined using the Euclidean norm. Those margins are obtained directly from the covariance matrix. Further, the constant factor in Eq. 6 represents a specific confidence that the true object border is within the respective error margins, and can be fine-tuned to control the sensitivity of the sensor checks. In practice, depending on object orientation and grid alignment, most position errors are detected already at displacements below this worst-case estimate, see Sec. IV.
It is important to mention that with the proposed sensor checks only position errors that result in a larger distance from the ego vehicle can be reliably detected: Position errors that cause the object to appear closer to the ego vehicle, on the other hand, are more difficult to detect as the object region might overlap with the real object edge. Nevertheless, we argue that errors of the latter type are typically not safety-critical as an erroneously perceived shorter distance will automatically lead to a more cautious behaviour of the vehicle.
III-C Plausibility checks
III-C1 Motion model
To analyze the object dynamics, we define a reduced object state at a given time, consisting of the two-dimensional position, the speed (absolute value of the velocity vector), the heading angle, the longitudinal acceleration, and the turn rate. We adopt a constant turn rate and acceleration (CTRA) evolution [17] during a time interval.
Assuming sufficiently high sensor update rates for pseudo real-time modeling, we can focus on small time intervals and simplify the CTRA model by expanding it for small time intervals (Eq. 9).
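For orientation, the following sketch contrasts the standard closed-form CTRA update with a small-interval expansion. The state layout `(x, y, v, theta, a, omega)`, the function names, and the choice of a second-order expansion are assumptions for this example; the order of the expansion used in this work is not recoverable from the text.

```python
import math

def ctra_step(state, dt):
    """Closed-form CTRA update of (x, y, v, theta, a, omega) over dt."""
    x, y, v, theta, a, omega = state
    v_new = v + a * dt
    theta_new = theta + omega * dt
    if abs(omega) < 1e-9:
        # Straight-line limit: constant acceleration along the heading.
        dist = v * dt + 0.5 * a * dt * dt
        dx, dy = dist * math.cos(theta), dist * math.sin(theta)
    else:
        dx = ((v_new * math.sin(theta_new) - v * math.sin(theta)) / omega
              + a * (math.cos(theta_new) - math.cos(theta)) / omega ** 2)
        dy = ((v * math.cos(theta) - v_new * math.cos(theta_new)) / omega
              + a * (math.sin(theta_new) - math.sin(theta)) / omega ** 2)
    return (x + dx, y + dy, v_new, theta_new, a, omega)

def ctra_step_small_dt(state, dt):
    """Second-order Taylor expansion of the CTRA position update (assumed order)."""
    x, y, v, theta, a, omega = state
    c, s = math.cos(theta), math.sin(theta)
    dx = v * c * dt + 0.5 * (a * c - v * omega * s) * dt * dt
    dy = v * s * dt + 0.5 * (a * s + v * omega * c) * dt * dt
    return (x + dx, y + dy, v + a * dt, theta + omega * dt, a, omega)
```

The two updates differ only at third order in the time interval, which is why the simplified form is adequate at typical sensor update rates.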
The plausibility check verifies whether or not the object displayed a plausible motion during the last time interval. In the following, we distinguish between variables that are measured and variables that are predicted. In a first step, we estimate the non-observed variables, turn rate and acceleration, from the speed and heading measurements.
Second, we use this estimate to predict the object position at the next time step with the motion model of Eq. (9).
To evaluate the precision of our prediction, a standard error propagation [20] is performed, assuming independence of the measured quantities and no error for the time interval.
Eventually, the motion of an object is considered implausible if at least one of the conditions in Eq. (13) and Eq. (14) is true: Eq. (13) compares the estimated dynamics against physical limits, while Eq. (14) checks the predicted against the measured position.
Here, each variable is associated with a margin of error. We introduce a parameter to control the sensitivity of the plausibility check, as well as thresholds specifying a physically realistic maximum turn rate, forward acceleration, and brake acceleration, respectively.
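The three steps (estimate, predict, compare) can be sketched as below. This is an assumed reading, not the authors' formulation: measurements are tuples `(x, y, v, theta)`, turn rate and acceleration are estimated by finite differences, the prediction uses a second-order small-interval expansion, error margins are propagated numerically under the independence assumption, and the position condition compares the prediction gap with `k` times the summed margins. All names and the exact form of the conditions are hypothetical.

```python
import math

def estimate_and_predict(q, dt):
    """q = (x0, y0, v0, th0, v1, th1): previous measurement plus current speed/heading."""
    x0, y0, v0, th0, v1, th1 = q
    a = (v1 - v0) / dt
    # Wrap the heading difference to (-pi, pi] before dividing by dt.
    omega = math.atan2(math.sin(th1 - th0), math.cos(th1 - th0)) / dt
    c, s = math.cos(th0), math.sin(th0)
    xp = x0 + v0 * c * dt + 0.5 * (a * c - v0 * omega * s) * dt * dt
    yp = y0 + v0 * s * dt + 0.5 * (a * s + v0 * omega * c) * dt * dt
    return a, omega, xp, yp

def propagate(q, sigmas, dt, h=1e-6):
    """First-order error propagation for independent inputs (numerical derivatives)."""
    var_x = var_y = 0.0
    for i, sigma in enumerate(sigmas):
        up, dn = list(q), list(q)
        up[i] += h
        dn[i] -= h
        _, _, xu, yu = estimate_and_predict(up, dt)
        _, _, xd, yd = estimate_and_predict(dn, dt)
        var_x += ((xu - xd) / (2 * h) * sigma) ** 2
        var_y += ((yu - yd) / (2 * h) * sigma) ** 2
    return math.sqrt(var_x), math.sqrt(var_y)

def is_implausible(prev, curr, sig, dt, k, omega_max, a_max, a_brake):
    """prev/curr = (x, y, v, theta); sig = margins of error (sx, sy, sv, stheta)."""
    q = (prev[0], prev[1], prev[2], prev[3], curr[2], curr[3])
    sigmas = (sig[0], sig[1], sig[2], sig[3], sig[2], sig[3])
    a, omega, xp, yp = estimate_and_predict(q, dt)
    sx, sy = propagate(q, sigmas, dt)
    # Physical limits on the estimated dynamics.
    limits = abs(omega) > omega_max or a > a_max or a < -a_brake
    # Prediction gap versus combined margins of prediction and measurement.
    position = (abs(xp - curr[0]) > k * (sx + sig[0])
                or abs(yp - curr[1]) > k * (sy + sig[1]))
    return limits or position
```

With these assumed conditions, a sudden speed jump trips the acceleration limit immediately, whereas a constant speed offset is only caught once the prediction gap outgrows the margins.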
III-C2 Theoretical analysis of speed errors
To get a better intuition of the efficacy of the plausibility check, we perform a theoretical analysis of two different types of speed errors. As a minimal setup, we take an object moving with constant true speed, observed at two subsequent time steps separated by a given time interval, and inject a speed error of a given value. We explore the impact of the time interval size and the object speed, while keeping for simplicity a constant heading (and thus a zero turn rate in this test scenario). We calculate the minimal detectable speed error, defined here as the (positive or negative) speed error with the smallest absolute value that can be detected by the plausible motion check. Note that due to the position check of Eq. (14) there is a conceptual difference between detecting positive and negative speed errors in this model. Negative speed errors lead to slightly smaller predicted error margins for the position, compared to those of a positive speed error of the same magnitude, which can make them easier to detect.
Permanent speed error:
This error is represented by adding a constant shift to the observed speed (importantly, this does not affect the next observed position, which evolves according to the true speed only). As the observed speeds are the same across all time steps, the estimated acceleration is zero and we cannot reach the thresholds in Eq. (13). The detection of speed errors is then based solely on the predicted position, which will overshoot or undershoot the next measured position. We can see two important trends governing the minimal detectable speed error in Fig. 3 (constant speed error): An increase in the object speed leads to larger error margins for the predicted positions, given a nonzero uncertainty in the heading and turn rate. The minimal detectable speed error therefore grows with the object speed, except for a regime of small speeds, where the minimal speed error is larger than two times the actual speed. An increase in the time interval increases both the prediction gap and the prediction error, where we find that typical measurement uncertainties balance this interplay in favor of the former, such that delayed updates typically help in detecting constant speed errors.
Transient speed error:
We simulate this behavior by enforcing an erroneous speed at one of the two time steps only. For such transient speed errors, the regime of small time intervals is typically dominated by the acceleration check in Eq. (13), so that the detection threshold is set by the acceleration limits. The minimal detectable error is then independent of the object speed, increasing the sensitivity for transient speed errors, see Fig. 3 (transient speed error). With increasing time intervals, we see a superposition of the two conditions in Eq. (13) and Eq. (14), as the position prediction check becomes more and more relevant. For large time intervals, the minimal detected error is determined by the position prediction only, and we observe the same trends as in Fig. 3 (constant speed error).
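The speed independence in the small-interval regime can be made concrete with a short calculation. Assuming the acceleration is estimated as the speed difference divided by the time interval (an assumption, since the estimator is not reproduced in the text), a speed jump is flagged as soon as this ratio leaves the physically allowed band. The function name and the numerical limits used in the usage note are placeholders.

```python
def detects_transient(speed_jump, dt, a_max, a_brake):
    """True if a speed jump between two frames violates the acceleration limits.

    The object speed itself does not enter, only the jump and the time interval.
    """
    a_est = speed_jump / dt
    return a_est > a_max or a_est < -a_brake

def transient_speed_error_limits(dt, a_max, a_brake):
    """Largest positive and negative jumps that still pass the acceleration check."""
    return a_max * dt, -a_brake * dt
```

For example, with a time interval of 0.1 s and placeholder limits of 5 m/s² forward and 8 m/s² braking, jumps beyond roughly +0.5 m/s or -0.8 m/s would be flagged, and these limits grow linearly with the time interval.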
IV-A Experimental setup
To evaluate our proposed safety approach, we use the CARLA simulator [8] and equip an ego vehicle with a LiDAR sensor. The environment model is represented by the ground truth object information directly from CARLA. Subsequently, we inject position and speed faults into this ground truth object list and forward the manipulated object list to the monitor. For evaluation, we compare the detected errors with the original error list to determine the efficacy of the implemented checks (Fig. 3(b)). Importantly, the detection process for the plausibility checks comprises up to two subsequent time steps, since an error at a given time frame is typically detected by an implausible history at the next time frame.
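The comparison of detected and injected errors can be sketched as follows. The event representation as `(object_id, frame)` pairs, the function name, and the `max_delay` parameter (which credits a detection raised one frame after the injection) are assumptions for illustration.

```python
def precision_recall(injected, detected, max_delay=1):
    """Compare sets of (object_id, frame) events, tolerating a detection delay."""
    def matched(obj, frame):
        # An injected error counts as found if flagged in the same or a later frame.
        return any((obj, frame + d) in detected for d in range(max_delay + 1))

    def caused(obj, frame):
        # A detection counts as correct if an error was injected shortly before.
        return any((obj, frame - d) in injected for d in range(max_delay + 1))

    found = sum(1 for e in injected if matched(*e))
    correct = sum(1 for e in detected if caused(*e))
    recall = found / len(injected) if injected else 1.0
    precision = correct / len(detected) if detected else 1.0
    return precision, recall
```

Setting `max_delay=0` would evaluate a check that must react within the same frame, as is the case for a purely position-based sensor check.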
For this paper, we have simulated two test scenarios in CARLA, see Fig. 4. The first scenario represents a residential area with multiple spawned pedestrians that randomly cross the street. The second scenario is an urban intersection featuring not only pedestrians but also cars and other vehicles. Those two setups were chosen to provide a diverse perception input to the ego vehicle, in terms of environment constellations and object types. In addition, we have tested position errors with the NuScenes dataset [7] as an example of non-simulated LiDAR information. Each scenario duration was sufficiently long for the environment to contain a sufficiently large number of relevant object states in scope, in order to guarantee statistical significance. All experiments use the parametrization of Tab. I unless stated otherwise.
Tab. I: Monitor parametrization (e.g. max. turn rate).
IV-B False alarm
An important design target of a monitor is a low false alarm rate, in the presence of noise, to maintain high system availability.
In order to evaluate the robustness of the proposed checks, we add Gaussian noise to the ground truth object positions, effectively increasing the measurement uncertainties.
For demonstration, we give the results of the evaluation with the pedestrian scenario. Fig. 5 shows that the sensor checks are very robust and produce zero false positives up to a certain level of position noise. Similarly, for the plausibility checks, the false alarm rate remains negligibly low. This leads to a generally high precision of both monitor components in the experiments described in the next sections.
IV-C Permanent position error
To analyse permanent position errors, we inject a constant position offset into the object list, increasing the distance of objects relative to the ego object. The evaluation results in Fig. 6, using the pedestrian scenario, show that the sensor checks can reliably detect such errors if the offset is sufficiently large, which predominantly depends on the grid resolution.
Explicitly, the minimum position error that can be reliably detected (in terms of recall) increases with the cell size of the grid. The plausibility checks are not able to detect such permanent errors, as the consistency of the object history is not affected.
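A fault injection of this kind can be sketched in a few lines. The assumption made here, which the text does not spell out, is that the offset is applied along the line of sight from the ego vehicle to the object, so that the object distance grows by exactly the offset; the function name is hypothetical.

```python
import math

def inject_radial_offset(obj_xy, ego_xy, offset):
    """Shift an object position away from the ego vehicle by a constant offset."""
    dx, dy = obj_xy[0] - ego_xy[0], obj_xy[1] - ego_xy[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return obj_xy  # direction undefined, leave the object unchanged
    scale = (dist + offset) / dist
    return (ego_xy[0] + dx * scale, ego_xy[1] + dy * scale)
```

Because the same shift is applied in every frame, the object history stays self-consistent, which is why a motion-based plausibility check cannot notice it.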
IV-D Random position error
We study another realistic error pattern by injecting a position shift error into an individual object based on a fixed error probability (Fig. 6(a) - 6(f)). The sensor checks perform similarly to the analysis of the previous section; for the intersection scenario, however, the recall of the sensor checks alone remains limited, which can be explained by object occlusions occurring in this scenario. Errors associated with such occluded ground-truth objects cannot be detected by the ego vehicle sensor. We thus expect an even higher recall in a real scenario where sensor input, instead of the ground truth data, is used for the primary channel.
The plausibility checks, in contrast to the permanent shifts in Sec. IV-C, are very well suited to detect transient position errors, reliably identifying transient errors above a certain magnitude, see for example Fig. 6(a). The detection capability of the plausibility checks degrades with a higher error injection probability. This is because the position errors then effectively persist longer (across multiple time frames), resembling more and more the situation of permanent faults studied above. Overall, the error detection capability of the monitor is slightly better in the pedestrian than in the intersection scenario, which is attributed to the more diverse motion patterns of the various object types and the on average higher object speeds in the latter.
IV-E Velocity errors
For RSS to work efficiently, it is essential to verify the velocities of objects. They determine, for example, the expected braking times and safe following distances. In order to evaluate the detection capabilities for velocity errors, we inject faults into the object speed but, for simplicity of the analysis, do not affect the direction of the velocity. Fig. 8 visualizes the results for both permanent and random (transient) speed errors injected at a fixed rate. As expected from the theoretical analysis of Sec. III-C2, permanent errors can be reliably detected only if they exceed a rather large threshold. For transient errors, on the other hand, the plausibility checks can efficiently detect already small speed deviations, since transient errors allow for additional plausibility checking with the help of acceleration limits, see also Fig. 3.
The sensor checks only use position information, and are thus not able to detect any velocity errors.
IV-F Position error of a real sensor
Finally, we also explore the error detection capability of our monitor with real LiDAR data from the NuScenes dataset [7]. Since no ground truth velocity data is provided here, we restrict ourselves to the analysis of permanent position faults with sensor checks only. As before, the ground truth environment model with injected position errors is used as the primary channel, while the LiDAR data enables the monitor sensor checks.
Fig. 9 shows that the results are comparable to those obtained from simulation. The recall of the sensor checks saturates short of full recall for larger position errors, which is again due to partial or full occlusions that the LiDAR scans fail to detect.
IV-G Runtime evaluation
The monitor concept proposed in this article leverages the complementary strengths of sensor and plausibility checks, while remaining functionally simple for high compute efficiency. A lightweight implementation is desirable since the checks should ideally be executed on a parallel hardware that is certifiably (e.g. ASIL) robust against soft errors, which typically limits the compute resources. We here quantify the process latency as a performance metric on selected test hardware, using the two different systems:
Intel Core i9-7900X @ 3.30GHz
Intel Atom CPU C3934 @ 2.00GHz
The Intel Atom CPU represents a potential target platform for ASIL-compliant monitor applications, while the Intel Core is a state-of-the-art CPU with higher performance.
Our tests (Fig. 10) quantify the average latency of the sensor checks on the Core system using the configuration of Table I (covering an area around the ego vehicle that should be sufficient for urban driving). On the Intel Atom, the average latency increases. The same holds for the plausibility checks, whose average latency for processing the objects in scope is lower on the Core system than on the Atom. Those results indicate that our monitor architecture is indeed lightweight, meaning that it is feasible to run the verification process at pseudo-real-time on safety-certifiable hardware. Note that the minimum latency target for an unobstructed system execution is eventually determined by the processing time of the parallel primary perception channel. The implementation of the code is not yet optimized and runs on a single core. The plausibility checks are currently implemented in Python and are single-threaded. With parallelism and other optimization techniques, the latency could therefore be reduced further if required. Using a non-uniform grid representation [4] can also improve the latency of the sensor checks.
We presented a lightweight monitor architecture for automated perception systems, which is realized by a combination of plausibility checks of an object’s motion history and sensor checks that use LiDAR information. To improve overall safety, we envision that this perception monitor is coupled with a supervisor for the AD planning system, such as RSS. Compared to existing monitor approaches, our concept is characterized by two key advantages: i) The checks are designed to be effective yet simple, in order to reduce computational load; ii) We combine diverse sensor-dependent and sensor-independent methods to address both fault tolerance against common-cause sensor failures and the safety of the intended functionality (SOTIF). We evaluated the monitor in simulation, using CARLA, and performed additional tests with the NuScenes dataset. Our experiments demonstrate a high recall and precision for the detection of both permanent and random position errors above the respective minimum detectable magnitude. A comparable detection performance is found for transient speed errors, while permanent speed errors need to be larger to be detected. Except for the permanent speed errors, those detection limits are more than sufficient to detect the targeted position and velocity deviations, which we identified as a representative estimate for safety-critical perception errors in a wide spectrum of situations.
We further verified that the proposed monitor is able to run on an ASIL-capable Intel Atom CPU with low latency, for both the sensor checks and the plausibility checks. This is a significant reduction of the computational effort compared to our previous work [5], where the creation of the required occupancy grid alone required more time and additional cores on an Intel Core i9 system. For future work, we plan to refine the monitor checks with respect to the safety-relevance of the object attributes and to conduct additional tests in real-world scenarios.
This research was partially funded by the German Ministry for Economic Affairs and Energy in the project SafeADArchitect (19A20013A) and by the Federal Ministry of Transport and Digital Infrastructure of Germany in the project Providentia++ (01MM19008). Funding was received from the European Union’s Horizon 2020 research and innovation program within the FOCETA project under grant agreement No. 956123.
- [1] (2007-07) Safety Assessment of Autonomous Cars using Verification Techniques. In 2007 American Control Conference, pp. 4154–4159.
- [2] (2019) Design of a Misbehavior Detection System for Objects Based Shared Perception V2X Applications. IEEE Intelligent Transportation Systems Conference.
- [3] (2018) Toward a holistic software systems engineering approach for dependable autonomous systems. In 2018 IEEE/ACM 1st International Workshop on Software Engineering for AI in Autonomous Systems (SEFAIAS), pp. 23–30.
- [4] (2020) Efficient dynamic occupancy grid mapping using non-uniform cell representation. In 2020 IEEE Intelligent Vehicles Symposium (IV), pp. 1629–1634.
- [5] (2020) Towards online environment model verification. In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), pp. 1–7.
- [6] (2017) Making the case for safety of machine learning in highly automated driving. In International Conference on Computer Safety, Reliability, and Security, pp. 5–16.
- [7] (2019) NuScenes: a multimodal dataset for autonomous driving. arXiv preprint arXiv:1903.11027.
- [8] (2017) CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, pp. 1–16.
- [9] (2020) A Plausibility-Based Fault Detection Method for High-Level Fusion Perception Systems. IEEE Open Journal of Intelligent Transportation Systems 1, pp. 176–186.
- [10] (2019) IEEE P2846: A Formal Model for Safety Considerations in Automated Vehicle Decision Making. Online: https://sagroups.ieee.org/2846/
- [11] (2018) Toward a framework for highly automated vehicle safety validation. Technical report, SAE Technical Paper.
- [12] (2017-04) An architecture pattern for safety critical automated driving applications: design and analysis. In 2017 Annual IEEE International Systems Conference (SysCon), pp. 1–7.
- [13] (2019) Meet Tesla’s self-driving car computer and its two AI brains.
- [14] (2020) The monitor as key architecture element for safe self-driving cars. In 2020 50th Annual IEEE-IFIP International Conference on Dependable Systems and Networks-Supplemental Volume (DSN-S), pp. 9–12.
- [15] (2020) Runtime monitoring for safe automated driving systems. Ph.D. Thesis, Mälardalen University.
- [16] (2015) Multi-sensor data fusion for checking plausibility of V2V communications by vision-based multiple-object tracking. IEEE Vehicular Networking Conference, VNC, pp. 143–150.
- [17] (2008) Comparison and evaluation of advanced motion models for vehicle tracking. Proceedings of the 11th International Conference on Information Fusion, FUSION 2008 (1), pp. 730–735.
- [18] (2017) On a formal model of safe and scalable self-driving cars. arXiv:1708.06374.
- [19] (2005) Probabilistic robotics (intelligent robotics and autonomous agents series). Intelligent robotics and autonomous agents, The MIT ….
- [20] (2016) Probability and Statistics with Reliability, Queuing and Computer Science Applications. Probability and Statistics with Reliability, Queuing and Computer Science Applications, pp. 1–857.
- [21] (2006) Plausibility Checking of Sensor Signals for Vehicle Dynamics Control Systems. Symposium on Advanced Vehicle Control.
- [22] (2018) Vehicular dynamics based plausibility checking. IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC.