Monitoring and Diagnosability of Perception Systems

05/24/2020 ∙ by Pasquale Antonante, et al. ∙ MIT

Perception is a critical component of high-integrity applications of robotics and autonomous systems, such as self-driving cars. In these applications, failure of perception systems may put human life at risk, and a broad adoption of these technologies relies on the development of methodologies to guarantee and monitor safe operation as well as detect and mitigate failures. Despite the paramount importance of perception systems, currently there is no formal approach for system-level monitoring. In this work, we propose a mathematical model for runtime monitoring and fault detection of perception systems. Towards this goal, we draw connections with the literature on self-diagnosability for multiprocessor systems, and generalize it to (i) account for modules with heterogeneous outputs, and (ii) add a temporal dimension to the problem, which is crucial to model realistic perception systems where modules interact over time. This contribution results in a graph-theoretic approach that, given a perception system, is able to detect faults at runtime and allows computing an upper bound on the number of faulty modules that can be detected. Our second contribution is to show that the proposed monitoring approach can be elegantly described with the language of topos theory, which allows formulating diagnosability over arbitrary time intervals.







1 Introduction

The automotive industry is undergoing a change that could revolutionize mobility. Self-driving cars promise a deep transformation of personal mobility and have the potential to improve safety, efficiency (e.g., commute time, fuel), and induce a paradigm shift in how entire cities are designed [Silberg12wp-selfDriving]. One key factor that drives the adoption of such technology is the capability of ensuring and monitoring safe execution. Consider Uber’s fatal self-driving crash [ntsbuber] in 2018: the report from the National Transportation Safety Board stated that “inadequate safety culture” contributed to the fatal collision between the autonomous vehicle and the pedestrian. The lack of safety guarantees, combined with the unavailability of formal monitoring tools, is the root cause of these accidents and has a profound impact on the user’s trust. The AAA’s survey [aaa] shows that 71% of Americans claim to be afraid of riding in a self-driving car. This is a clear sign that the industry needs a sound methodology, embedded in the design process, to guarantee safety and build public trust.

While safety guarantees have been investigated in the context of control and decision-making [Mitsch17ijrr-verificationObstacleAvoidance, Foughali18formalise-verificationRobots], the state of the art is still lacking a formal and broadly applicable methodology for monitoring perception systems, which constitute a key component of any autonomous vehicle. Perception systems provide functionalities such as localization and obstacle mapping, lane detection, detection and tracking of other vehicles and pedestrians, among others.

State of Practice. The automotive industry currently uses five classes of methods to claim the safety of an autonomous vehicle (AV) [Shalev-Shwartz17arxiv-safeDriving], namely: miles driven, simulation, scenario-based testing, disengagement, and proprietary methods. The miles driven approach is the most commonly used, and is based on the statistical argument that if the probability of crashes per mile is lower in autonomous vehicles than for humans, then autonomous vehicles are safer. This approach is problematic since the number of meaningful fault-free miles that the AV should drive (driving on empty streets provides less evidence of safety than driving in dense urban environments) is very large, on the order of billions of miles [Kalra16tra-selfDriving, Shalev-Shwartz17arxiv-safeDriving], which would require years to achieve and would not enable frequent software updates. The same approach can be made more scalable through simulation, but unfortunately creating a life-like simulator is an open problem, for some aspects even more challenging than self-driving itself. Scenario-based testing is based on the idea that if we can enumerate all the possible driving scenarios that could occur, then we can simply expose the AV (via simulation, closed-track testing, or on-road testing) to all of those scenarios and, as a result, be confident that the AV will only make safe driving decisions. However, enumerating all possible corner cases is a daunting (and system-dependent) task. Finally, disengagement [googledisengagements] is defined as the moment when a human safety driver has to intervene in order to prevent a hazardous situation. However, while less frequent disengagements indicate an improvement of the AV behavior, they do not by themselves give evidence of system safety.

An established methodology to ensure safety throughout the life cycle of an automotive system is to use a standard that every manufacturer has to comply with. In the automotive industry, the standard ISO 26262 (Road vehicles — functional safety) [iso26262] is a risk-based safety standard that applies to electronic systems in production vehicles, including driver assistance and vehicle dynamics control systems. A key issue is that ISO 26262 relies on the presence of human drivers (and mostly focuses on electronic systems rather than algorithmic aspects), hence it does not readily apply to fully autonomous vehicles [Koopman16sae-autonomousVehicleTesting]. The recent ISO 21448 (Road vehicles — Safety of the intended functionality) [iso21448] is intended to complement ISO 26262. It provides guidance on the design, verification, and validation measures needed to achieve safety at higher levels of automation. The standard was first publicly released in 2019 and, while stressing the key role of perception, it only provides high-level considerations to inform stakeholders. In June 2019, 11 major stakeholders in the automated driving industry, including Aptiv, Baidu, BMW, Intel, and Volkswagen, published a report [AptivSafetyAV] on safety in autonomous driving. The report focuses on highly autonomous vehicles, presenting an overview of safety by design and providing a sound discussion of the verification and validation of such systems. It proposes a verification and validation approach mainly based on testing, and suggests the use of monitors to detect off-nominal performance at the system and subsystem level (including neural networks).

Perception has also been receiving increasing attention outside the automated driving industry. For instance, autonomous aerial vehicles are expected to disrupt air mobility in the same way autonomous cars are disrupting urban mobility [UberElevate, EHangAirMobility]. A recent report from the European Union Aviation Safety Agency (EASA) [EASADesignNN] acknowledges both the importance of such technologies and the lack of standardized methods to guarantee their trustworthiness. The report considers an automatic landing system as a case study and investigates the challenges in developing trustworthy AI, with a focus on machine learning.

These reports and standards stress the important role of perception for autonomous systems and motivate us to design a rigorous framework for system-level perception monitoring.

Related Work. Formal methods [Ingrand19irc-verificationTrends] have been recently used as a tool to study safety of autonomous systems. Formal methods use mathematical models to analyze and verify parts of a program. The insight here is that these mathematical models can be used to rigorously prove properties of the modeled program. This approach has been successful for decision systems, such as obstacle avoidance [Mitsch17ijrr-verificationObstacleAvoidance] and road rule compliance [Roohi18arxiv-selfDrivingVerificationBenchmark]. This is mainly due to the fact that decisional specifications are usually model-based and have well-defined semantics [Foughali18formalise-verificationRobots]. The challenge in applying this approach to perception is related to the complexity of modeling the physical environment [Seshia16arxiv-verifiedAutonomy], and the trade-off between evidence for certification and tractability of the model [Luckcuck19csur-surveyFormalMethods].

Relevant to the approach presented in this paper is the class of runtime verification methods. Runtime verification is an online approach based on extracting information from a running system and using it to detect (and possibly react to) observed behaviors satisfying or violating certain properties [Bartocci18book-runtimeVerification]. Traditionally, the task of evaluating whether a module is working properly is assigned to a monitor, which verifies some input/output properties of the module alone. Balakrishnan et al. [Balakrishnan19date-perceptionVerification] use Timed Quality Temporal Logic (TQTL) to reason about desirable spatio-temporal properties of a perception algorithm. Kang et al. [Kang2018ModelAF] use model assertions, which similarly place a logical constraint on the output of a module to detect anomalies. We argue that this paradigm does not fully capture the complexity of perception pipelines: while monitors can still be used to infer the state of a single module, perception pipelines provide an additional opportunity to cross-check the compatibility of results across different modules. In this paper we address this limitation.

Performance guarantees for perception have been investigated for specific problems. In particular, related work on certifiable algorithms [Yang20arxiv-teaser, Yang20arxiv-shapeStar, Briales18cvpr-global2view] provides algorithms that are capable of identifying faulty behaviors during execution. These related works mostly focus on specific algorithms, while our goal is to establish monitoring for perception systems, including multiple interacting modules and algorithms.

Contribution. In this paper we develop a methodology to detect and identify faulty modules in a perception pipeline at runtime. In particular, we address two questions (adapted from [Brundage20arxiv-trustworthyAI]): (i) Can I (as a developer) verify that the perception algorithm is providing reliable interpretations of the perception data? (ii) Can I (as a regulator) trace the steps that led to an accident caused by an autonomous vehicle? Towards this goal, we draw connections with the literature on self-diagnosability for multiprocessor systems, and generalize it to (i) account for modules with heterogeneous outputs, and (ii) add a temporal dimension to the problem to account for modules interacting over time. This contribution results in a graph-theoretic approach that, given a perception system, is able to detect faults at runtime and allows computing an upper bound on the number of faulty modules that can be detected. This upper bound is related to the level of redundancy within the system and provides a quantitative measure of robustness. Our second contribution is to show that the proposed monitoring approach can be elegantly described with the language of topos theory [MacLane12book-toposTheory, Johnstone02book-toposTheory] and allows formulating diagnosability and fault detection over arbitrary time intervals.

2 An Example of Perception Pipeline

Consider the system depicted in Fig. 1: this is an example of a perception system used to localize an autonomous vehicle (AV) using multiple sensor streams. We are going to use this system as a running example throughout the paper to elucidate our mathematical model.

Figure 1: Example of localization system for Autonomous Vehicles.

In Fig. 1, the AV observes the world through noisy sensors, each of which provides information to various perception modules. These perception modules collect, organize, and interpret sensory information in order to create a model of the world. In this paper, for the sake of simplicity, we consider a world model that includes only the absolute pose (position and heading) of the AV, an element of the 3-dimensional Special Euclidean group $\mathrm{SE}(3)$.

In Fig. 1, each rectangle inside the “Perception” box represents a perception module, while each arrow represents a data stream. Let $\mathcal{V}$ denote a set of variables; in our example, $\mathcal{V}$ contains the variables carried by the arrows in Fig. 1. We can tag each arrow with the variable it carries. Moreover, each variable has its own type. Let $\tau$ be a map assigning a type to each variable; we say that $\tau(x)$ is the type of variable $x \in \mathcal{V}$. For example, for the variable $x$ carried by the IMU data stream, $\tau(x) = \mathbb{R}^6$, a real vector of 6 elements listing the accelerations and angular velocities measured by the Inertial Measurement Unit (IMU). At any instant of time, the system takes the measurements from the sensors and estimates the AV’s absolute pose, an element of $\mathrm{SE}(3)$. Let us analyze the modules in Fig. 1:

  • The Global Positioning System (GPS) provides raw latitude and longitude data to the GPS reader, which packages this information and sends it to GPS Data Processing, which in turn estimates the car’s absolute pose.

  • The Light Detection and Ranging (LIDAR) sensor uses light in the form of a pulsed laser to measure reflection distances, producing a 3D point cloud of the surrounding environment; the LIDAR reader gets the point cloud, while the LIDAR Registration module estimates the relative motion between consecutive time instants by comparing consecutive scans [Yang20arxiv-teaser]. The estimate of the relative motion of the AV is commonly referred to as “odometry” in robotics [Cadena16tro-SLAMsurvey].

  • The High Definition Map Localization module uses a given high definition map of the environment to estimate the absolute position of the vehicle by comparing the LIDAR scan with the map.

  • The Cameras provide images of the environment; the Camera reader collects raw images, while the Visual Odometry module estimates the relative motion of the vehicle by comparing two consecutive images.

  • The Inertial Measurement Unit (IMU) measures accelerations and angular velocities of the car; by integrating these values over time, the IMU integration module estimates the relative motion of the vehicle.

  • Finally, the Pose-Graph Optimization [Cadena16tro-SLAMsurvey] module estimates the vehicle’s pose by fusing the noisy absolute and relative pose measurements produced by the other modules.
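To make the notions of variables, types, and data streams concrete, the following is a minimal Python sketch of the pipeline in Fig. 1 as a typed data-flow graph. The variable names, the string type labels, and the `Module` record are hypothetical choices for illustration; reader modules are modeled as sources, and the high definition map is treated as a fixed parameter rather than a data stream.

```python
from dataclasses import dataclass

# Hypothetical variable names; each variable (arrow in Fig. 1) has a type tau(x).
TAU = {
    "gps_fix": "LatLon",          # raw latitude/longitude
    "lidar_scan": "PointCloud",   # 3D point cloud
    "image": "Image",             # raw camera image
    "imu_raw": "R^6",             # accelerations and angular velocities
    "gps_pose": "SE(3)",          # absolute pose
    "map_pose": "SE(3)",          # absolute pose
    "lidar_odom": "SE(3)",        # relative motion
    "visual_odom": "SE(3)",       # relative motion
    "imu_odom": "SE(3)",          # relative motion
    "pose": "SE(3)",              # fused estimate of the AV pose
}


@dataclass(frozen=True)
class Module:
    name: str
    inputs: tuple   # variables consumed
    output: str     # variable produced


MODULES = [
    Module("GPS reader", (), "gps_fix"),
    Module("LIDAR reader", (), "lidar_scan"),
    Module("Camera reader", (), "image"),
    Module("IMU reader", (), "imu_raw"),
    Module("GPS Data Processing", ("gps_fix",), "gps_pose"),
    Module("LIDAR Registration", ("lidar_scan",), "lidar_odom"),
    Module("HD Map Localization", ("lidar_scan",), "map_pose"),
    Module("Visual Odometry", ("image",), "visual_odom"),
    Module("IMU Integration", ("imu_raw",), "imu_odom"),
    Module("Pose-Graph Optimization",
           ("gps_pose", "map_pose", "lidar_odom", "visual_odom", "imu_odom"), "pose"),
]


def data_streams(modules):
    """Return (producer, consumer, variable, type) for every arrow between modules."""
    producer = {m.output: m.name for m in modules}
    return [(producer[x], m.name, x, TAU[x]) for m in modules for x in m.inputs]
```

Calling `data_streams(MODULES)` lists the arrows of the sketch; for instance, all five streams entering Pose-Graph Optimization carry variables of type `SE(3)`.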

Errors are common throughout this pipeline, so it is crucial for the system to self-diagnose these errors in real time, discard outliers, and produce reliable estimates of the car’s pose (which can then be used for planning and control purposes). For instance, when operating near tall buildings, the GPS can provide unreliable measurements. Similarly, when used in crowded scenes, visual odometry can provide incorrect motion estimates. In the next sections we study how faults can be diagnosed, using this perception system as an example.

3 Monitoring of Perception Systems

Our first contribution is to develop a mathematical model for monitoring and fault diagnosis in perception systems. Towards this goal, we draw connections with the literature on fault diagnosis in multiprocessor systems, which has been extensively studied since the late 1960s. Section 3.1 reviews the notion of diagnosability under the PMC (Preparata, Metze, and Chien) model [Preparata67tec-diagnosability], together with existing results to characterize diagnosable systems and to detect faults. Section 3.2 extends the notion of diagnosability to perception systems and discusses how to model temporal aspects arising in real AV applications.

3.1 Diagnosability

In the PMC [Preparata67tec-diagnosability] model, a set of independent processors is assembled such that each processor can communicate with a subset of the other processors, and all of the processors perform the same computation. In such a system, a fault occurs whenever the output of a processor differs from the outputs of fault-free processors; the problem is to identify which processors are faulty. Each processor is assigned a particular subset of the other processors for the purpose of testing. Using a comparison-based mechanism, the model aims to characterize the set of faulty processors. Clearly, it is not possible to determine the faulty subset in general, so much of the literature on multiprocessor diagnosis considers two fundamental questions [Dahbura84tc-diagnosability]: (i) Given a collection of processors and a set of tests, what is the maximum number of arbitrary processors which can be faulty such that the set of faulty processors can be uniquely identified? (ii) Given a set of test results, does there exist an efficient procedure to identify the faulty processors? The key tool to address these questions is the diagnostic graph [Preparata67tec-diagnosability].

Diagnostic Graph. At any given time, each processor is assumed to be in one of two states: faulty or fault-free. Diagnosis is based on the ability of processors to test (i.e., to provide an opinion about the faultiness of) other processors. Formally, we assume that each processor implements one or more consistency functions; these are Boolean functions that return pass ($0$) or fail ($1$), depending on whether the outputs of two processors are in agreement.

To perform the diagnosis, we follow [Preparata67tec-diagnosability] and model the problem as a directed graph $G = (V, E)$, where $V$ is the set of processors, while the edges $E$ represent the test assignments. In particular, for an edge $(u, v) \in E$, we say that node $u$ is testing node $v$. The outcome of this test is the result of the consistency functions of $u$: it is $1$ (resp. $0$) if $u$ evaluates $v$ as faulty (resp. fault-free). Fault-free processors are assumed to provide correct test results, whereas no assumption is made about tests executed by faulty processors: they may produce correct or incorrect test outcomes. We call $G$ the diagnostic graph.

The collection of all test results for a test assignment is called a syndrome. Formally, a syndrome is a function $s \colon E \to \{0, 1\}$. The syndrome is processed by an external entity which diagnoses the system, that is, makes some determination about the faultiness status of every processor in the system. The notion of $t$-diagnosability formalizes when it is possible to use the syndrome to diagnose and identify faults in a system.
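The test semantics above can be simulated in a few lines. The sketch below (the helper name `pmc_syndrome` is hypothetical) generates a syndrome for a given set of faulty processors: fault-free testers report the true status of the tested node, while faulty testers, about which the PMC model makes no assumption, are modeled here by a random outcome.

```python
import random


def pmc_syndrome(edges, faulty, rng=None):
    """Simulate a syndrome s: E -> {0, 1} under the PMC model.

    edges  -- iterable of (u, v) pairs, meaning "u tests v"
    faulty -- set of faulty processors
    A fault-free tester returns 1 (fail) iff v is faulty, 0 (pass) otherwise.
    A faulty tester is unreliable; here its outcome is drawn at random.
    """
    if rng is None:
        rng = random.Random(0)
    syndrome = {}
    for u, v in edges:
        if u in faulty:
            syndrome[(u, v)] = rng.randint(0, 1)
        else:
            syndrome[(u, v)] = int(v in faulty)
    return syndrome


# Five processors, each testing its successor on a directed cycle; node 1 is faulty.
cycle = [(i, (i + 1) % 5) for i in range(5)]
s = pmc_syndrome(cycle, faulty={1})
```

Here `s[(0, 1)]` is always 1, because node 0 is fault-free and tests the faulty node 1, while `s[(1, 2)]` may be 0 or 1 depending on the random draw.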

Definition 1 ($t$-diagnosability [Preparata67tec-diagnosability]).

A diagnostic graph $G$ with $n$ processors is $t$-diagnosable if, given any syndrome, all faulty processors can be identified, provided that the number of faults present does not exceed $t$.

What makes fault detection challenging is the fact that we can have multiple processors that provide incorrect results but are consistent with each other. This is formalized by the notion of consistent fault set.

Definition 2 (Consistent fault set [Preparata67tec-diagnosability]).

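For a syndrome $s$, denote by $s(u, v)$ the test outcome assigned to the edge $(u, v) \in E$. For a $t$-diagnosable graph $G = (V, E)$ and a syndrome $s$, a subset $F \subseteq V$ is a $t$-consistent fault set if and only if

  1. $|F| \le t$;

  2. if $s(u, v) = 1$ then either $u \in F$ or $v \in F$; and

  3. if $s(u, v) = 0$ and $u \notin F$, then $v \notin F$.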

Item 2 in Definition 2 says that if $s(u, v) = 1$ ($u$ disagrees with $v$), then at least one of $u$ and $v$ must belong to the consistent fault set. Item 3 states that a processor outside the fault set that passes another processor certifies it as fault-free; note that the premise $u \notin F$ is needed because we are assuming that tests performed by faulty nodes are unreliable (see the example in Fig. 5), therefore we can have $s(u, v) = 0$ with $v \in F$ when $u$ is faulty and returns a wrong test result.
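Definition 2 translates directly into a check on a candidate set. The following is a minimal sketch (the function name is hypothetical) that tests items 1 to 3 for a syndrome stored as a dictionary from edges to outcomes; it is applied to a five-node directed cycle in which node 1 is faulty and happens to pass its successor.

```python
def is_consistent_fault_set(edges, syndrome, F, t):
    """Check items 1-3 of Definition 2 for a candidate fault set F."""
    if len(F) > t:                                       # item 1
        return False
    for u, v in edges:
        outcome = syndrome[(u, v)]
        if outcome == 1 and u not in F and v not in F:   # item 2
            return False
        if outcome == 0 and u not in F and v in F:       # item 3
            return False
    return True


cycle = [(i, (i + 1) % 5) for i in range(5)]
# Node 0 fails the faulty node 1; faulty node 1 happens to pass node 2.
s = {(0, 1): 1, (1, 2): 0, (2, 3): 0, (3, 4): 0, (4, 0): 0}
candidates = [set()] + [{i} for i in range(5)]
consistent = [F for F in candidates if is_consistent_fault_set(cycle, s, F, t=1)]
```

Only `{1}` survives: the empty set violates item 2 on edge $(0, 1)$, and, for example, `{0}` violates item 3 because the fault-free node 4 passes node 0.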

Example. Consider the simple example in Fig. 5 (adapted from [Preparata67tec-diagnosability]). For the graph in Fig. 5, a syndrome will be represented as a list of five elements, one for each edge. Assume exactly one of the processors, say $u$ (w.l.o.g.), is faulty, let $w$ be the processor testing $u$, and let $v$ be a processor tested by $u$. Then $s(w, u) = 1$ and $s(u, v) = x$, i.e., $w$ correctly identifies $u$ as faulty, while $x$ could be either 0 or 1: being a faulty node, $u$ may or may not correctly diagnose $v$; in the figure we label that edge by $x$. In this simple case it is easy to check that, assuming there is only one faulty processor, we can correctly identify that the faulty processor is $u$. It is natural to ask: can we identify a faulty condition with two faulty processors? In this graph, we cannot: to prove it, it is sufficient to show that two different faulty conditions generate the same syndrome (Fig. 5(b) and 5(c)).

Figure 5: (a) Example of diagnostic graph with five processors and (b)-(c) two faulty conditions (faulty processors shown in red) exhibiting the same syndrome.

The problem of determining the maximum value of $t$ for which a given system is $t$-diagnosable is called the diagnosability problem. We denote the maximum value of $t$ for which a graph $G$ is $t$-diagnosable by $t(G)$.
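For small graphs, $t(G)$ can be computed directly from Definition 1 by brute force. The sketch below relies on a well-known property of the PMC model: two distinct fault sets $F_1$ and $F_2$ can never produce a common syndrome if and only if some edge $(u, v)$ has $u \notin F_1 \cup F_2$ and $v$ in the symmetric difference of $F_1$ and $F_2$ (such a $u$ is fault-free under both hypotheses, so its reliable test on $v$ differs between them). The graph is $t$-diagnosable when all pairs of fault sets of size at most $t$ are distinguishable. The function names are hypothetical and the enumeration is exponential, so this is only meant for illustration.

```python
from itertools import combinations


def distinguishable(edges, F1, F2):
    """True iff F1 and F2 cannot produce a common syndrome (PMC model).

    A tester outside F1 | F2 is fault-free under both hypotheses; if it
    tests a node in F1 ^ F2, its outcome must differ between them.
    """
    union, diff = F1 | F2, F1 ^ F2
    return any(u not in union and v in diff for u, v in edges)


def is_t_diagnosable(nodes, edges, t):
    """Definition 1: all fault sets of size <= t are pairwise distinguishable."""
    sets = [frozenset(c) for k in range(t + 1) for c in combinations(nodes, k)]
    return all(distinguishable(edges, a, b) for a, b in combinations(sets, 2))


def max_diagnosability(nodes, edges):
    """t(G): the largest t such that G is t-diagnosable (monotone in t)."""
    t = 0
    while t + 1 <= len(nodes) and is_t_diagnosable(nodes, edges, t + 1):
        t += 1
    return t


nodes = list(range(5))
cycle = [(i, (i + 1) % 5) for i in nodes]                       # each node tests its successor
complete = [(u, v) for u in nodes for v in nodes if u != v]     # everyone tests everyone
```

On a directed five-cycle this yields $t(G) = 1$, while the complete test assignment on five processors yields $t(G) = 2$, the largest value allowed by $n \ge 2t + 1$.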

3.1.1 Characterization of $t$-diagnosability

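Consider a directed graph $G = (V, E)$. For $u \in V$, we denote by $d_{in}(u)$ the number of edges directed toward $u$ (in-degree). We denote by $\delta_{in}(G) = \min_{u \in V} d_{in}(u)$ the minimum in-degree of the graph. Moreover, we denote by $\Gamma(u) = \{v \in V : (u, v) \in E\}$ the testable set of $u$ (outgoing neighbors of $u$). Finally, for $X \subseteq V$, denote $\Gamma(X) = \big(\bigcup_{u \in X} \Gamma(u)\big) \setminus X$.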

We can now state the following fundamental theorem.

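Theorem 3 (Characterization of $t$-diagnosability [Hakimi74tc-diagnosability]).

Let $G = (V, E)$ be a diagnostic graph with $n$ processors. Then $G$ is $t$-diagnosable if and only if

  1. $n \ge 2t + 1$;

  2. $\delta_{in}(G) \ge t$; and

  3. for each integer $p$ with $0 \le p < t$, and each $X \subseteq V$ with $|X| = n - 2t + p$, we have $|\Gamma(X)| > p$.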

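Using Theorem 3 we can quickly find that the maximum $t$ for the graph in Fig. 5 is $t(G) = 1$: condition 1 holds for $t = 1$ since $n = 5 \ge 3$, while $\delta_{in}(G) = 1$, so condition 2 rules out $t \ge 2$. Moreover, condition 3 holds ($t = 1$ forces $p = 0$, and each $X$ with $|X| = n - 2t = 3$ has $|\Gamma(X)| \ge 1$).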
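Conditions 1 to 3 of Theorem 3 can be checked mechanically. The following Python sketch (function names are hypothetical) implements them literally, enumerating the sets $X$ of condition 3, so it is exponential in general; it reproduces the reasoning above on a directed five-cycle, which we assume here as a stand-in for the graph of Fig. 5 (every node has in-degree 1).

```python
from itertools import combinations


def gamma(edges, X):
    """Gamma(X): nodes tested by some node of X, excluding X itself."""
    X = set(X)
    return {v for u, v in edges if u in X} - X


def hakimi_amin(nodes, edges, t):
    """Check the three conditions of Theorem 3 for a given t."""
    n = len(nodes)
    if n < 2 * t + 1:                                   # condition 1
        return False
    indeg = {v: 0 for v in nodes}
    for _, v in set(edges):
        indeg[v] += 1
    if min(indeg.values()) < t:                         # condition 2
        return False
    for p in range(t):                                  # condition 3, 0 <= p < t
        for X in combinations(nodes, n - 2 * t + p):
            if len(gamma(edges, X)) <= p:
                return False
    return True


nodes = list(range(5))
cycle = [(i, (i + 1) % 5) for i in nodes]
```

For the cycle, `hakimi_amin(nodes, cycle, 1)` holds and `hakimi_amin(nodes, cycle, 2)` fails on condition 2, matching $t(G) = 1$.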

A naive implementation of Theorem 3 leads to an algorithm with combinatorial time complexity, since condition 3 requires enumerating subsets of processors [Bhat82acm-diagnosability]. Bhat [Bhat82acm-diagnosability] proposes an improved algorithm to find the maximum $t$ of an arbitrary graph. Since this approach can still be impractical for large graphs, the same work also proposes a polynomial-time algorithm to find a suboptimal value of $t$.

3.1.2 Fault Detection

Once we have a syndrome on a $t$-diagnosable graph, we would like to actually identify the set of faulty processors, provided that the number of faults does not exceed $t$. Dahbura et al. [Dahbura84tc-diagnosability] showed that the problem of identifying the set of faulty processors is related to the problem of finding the minimum vertex cover of an undirected graph (i.e., the smallest set of vertices $C$ such that every edge has an endpoint in $C$). In general, deciding whether a graph has a vertex cover of at most a given size is NP-complete: no known deterministic algorithm is guaranteed to solve it in polynomial time, although any proposed cover can be checked in polynomial time. However, the work [Dahbura84tc-diagnosability] exploits some special properties of $t$-diagnosable graphs to propose an algorithm with time complexity $O(n^{2.5})$ for fault identification. In a later work, Sullivan [Sullivan88tc-diagnosability] proposes an algorithm with time complexity $O(t^3 + |E|)$ for fault identification and proves that this is the tightest bound if $t$ is $O(|E|^{1/3})$. These results ensure that fault identification is practical in real time and scales well to larger diagnostic graphs.
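For intuition, the identification step can be written as a naive search: enumerate all candidate sets of at most $t$ processors and keep those consistent with the syndrome (every fault-free tester must report exactly whether its target is faulty). On a $t$-diagnosable graph with at most $t$ faults, exactly one candidate survives. This exhaustive sketch is exponential and is not the algorithm of Dahbura et al. or Sullivan; the function names are hypothetical.

```python
from itertools import combinations


def consistent(edges, syndrome, F):
    """Every tester outside F must report exactly whether its target is in F."""
    return all(syndrome[(u, v)] == int(v in F) for u, v in edges if u not in F)


def identify_faults(nodes, edges, syndrome, t):
    """Return the unique fault set of size <= t consistent with the syndrome."""
    candidates = [
        frozenset(c)
        for k in range(t + 1)
        for c in combinations(nodes, k)
        if consistent(edges, syndrome, frozenset(c))
    ]
    if len(candidates) != 1:
        raise ValueError(f"no unique diagnosis: {len(candidates)} candidates")
    return candidates[0]


# Complete test assignment on 5 processors (2-diagnosable); nodes 1 and 3 are
# faulty and, being unreliable, pass every node they test.
nodes = list(range(5))
edges = [(u, v) for u in nodes for v in nodes if u != v]
faulty = {1, 3}
s = {(u, v): 0 if u in faulty else int(v in faulty) for u, v in edges}
```

Despite the misleading outcomes reported by nodes 1 and 3, `identify_faults(nodes, edges, s, 2)` returns `frozenset({1, 3})`.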

3.2 Diagnosability for Perception

In this section we generalize the approach of Section 3.1 to make it applicable to perception systems. In particular, while in Section 3.1 multiple processors perform the same computation and yield the same type of output, perception systems are characterized by the fact that each module produces different variables, potentially at different rates (Fig. 1). Therefore, in order to build on the diagnosability framework described in the previous section, we have to (i) define consistency functions for a perception system with arbitrary modules, and (ii) capture temporal aspects that are missing in the original PMC model [Preparata67tec-diagnosability].

Figure 6: Example of diagnostic graph for the localization pipeline in Fig. 1.

3.2.1 Perception Consistency Functions

Consistency functions are used to implement pairwise tests for each edge in a diagnostic graph. Due to the heterogeneity of the modules of a perception system, we can have different types of consistency functions, which we describe in the rest of this section. Using these consistency functions, we can build a diagnostic graph: for instance, in Fig. 6 we report an example of diagnostic graph for the localization example in Fig. 1.

Input and Output admissibility. Consider the case in which the camera captures a very dark image caused by a sudden change of brightness in the scene. Upon receiving camera images, the Visual Odometry module can detect the underexposed image as unintended and report a failure when testing the camera module. In the example in Fig. 1, we can capture this test as an edge from the Visual Odometry module to the Camera reader. In perception systems, this type of consistency function is very common. Following the same reasoning, we can add consistency checks between any module and its predecessor; for example, if the point cloud provided by the LIDAR reader to LIDAR Registration contains coincident points, the LIDAR module is clearly faulty. These edges are added to the diagnostic graph in Fig. 6. Note that using this logic we can also monitor output admissibility, since the output of a module is the input of another.

Input Consistency. Consider the IMU integration module and suppose it receives data from three IMUs: the module can compare the three noisy measurements and detect whether one of them disagrees with the others, for example, when one IMU provides very different acceleration measurements. Hence, the IMU integration module can test the three IMU reader modules.
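A simple way to implement this input-consistency test is a median vote: compare each IMU reading with the component-wise median of the three readings, and fail any reader that deviates by more than a tolerance. The sketch below assumes each reading is a 6-vector (accelerations and angular velocities); the tolerance, the reader names, and the function name are hypothetical, and a real implementation would also account for sensor noise models and frame alignment.

```python
from statistics import median


def imu_consistency_tests(readings, tol):
    """Tests performed by IMU integration on redundant IMU readers.

    readings -- dict reader name -> 6-vector (3 accelerations, 3 angular rates)
    tol      -- max allowed deviation from the component-wise median
    Returns dict reader name -> 0 (pass) or 1 (fail).
    """
    dims = range(len(next(iter(readings.values()))))
    med = [median(r[i] for r in readings.values()) for i in dims]
    return {
        name: int(max(abs(r[i] - med[i]) for i in dims) > tol)
        for name, r in readings.items()
    }


readings = {
    "imu_a": [0.00, 0.0, 9.81, 0.0, 0.0, 0.01],
    "imu_b": [0.01, 0.0, 9.80, 0.0, 0.0, 0.00],
    "imu_c": [5.00, 0.0, 9.80, 0.0, 0.0, 0.00],  # off-nominal acceleration
}
tests = imu_consistency_tests(readings, tol=0.5)
```

With three readers, the median is robust to a single off-nominal sensor, so only `imu_c` is failed; with two simultaneous outliers the vote would no longer be reliable.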

Output Consistency. Consider the GPS reader and the Map reader. They do not directly share information, but suppose that a GPS measurement returns a latitude and longitude that the map predicts to be occupied by water. The Map reader, using this information, can identify the faulty GPS measurement, returning fail whenever testing the GPS. In our perception example, this test corresponds to one or more edges from the Map reader to the GPS modules.
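A minimal sketch of this output-consistency test, assuming that the map exposes water bodies as polygons and that the GPS fix has already been projected to the same planar coordinates (both assumptions for illustration; the polygon, coordinates, and function names below are hypothetical), uses the standard ray-casting point-in-polygon test:

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count crossings of a horizontal ray from (x, y) to the right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def map_tests_gps(fix, water_polygons):
    """Map reader's test on the GPS: 1 (fail) if the fix lies in water, else 0."""
    x, y = fix
    return int(any(point_in_polygon(x, y, poly) for poly in water_polygons))


lake = [(0.0, 0.0), (100.0, 0.0), (100.0, 60.0), (0.0, 60.0)]  # local metres
```

A fix at `(50.0, 30.0)` falls inside the hypothetical lake and is failed, while a fix at `(150.0, 30.0)` passes.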

Input/Output Consistency. Consider the case in which Visual Odometry is not able to correctly estimate the pose of the vehicle. After fusing all the incoming pose measurements, the Pose-Graph Optimization module can identify that visual odometry produced an off-nominal value [Lajoie19ral-DCGM]. Input/Output consistency differs from Input consistency in that it needs to compute the output variable in order to test the validity of the inputs. In our system, this example corresponds to the edges from the Pose-Graph Optimization module to each of the modules that feed it pose measurements.

Example. Each module in Fig. 1 can use one or more of the consistency functions described above to test other modules in the system. Each test can detect a subset of the failures that might occur, so the combination of all available tests maximizes the set of detectable faults in the system. In Fig. 6, we report an example of diagnostic graph for the localization example in Fig. 1. By inspection and using condition 2 in Theorem 3, we obtain an upper bound on $t(G)$ given by the smallest in-degree of a module in the graph. By checking condition 3 we discover that the graph is in fact only $1$-diagnosable: for $t = 2$ there is an integer $p$ with $0 \le p < 2$ and a set $X$ of modules with $|X| = n - 4 + p$ such that $|\Gamma(X)| \le p$, therefore $t(G) < 2$. This means we can uniquely identify faulty modules only in the case where just one such fault occurs.

3.2.2 Temporal Diagnostic Graphs

In the previous section, we considered instantaneous fault diagnosis. In other words, at any iteration, given a diagnostic graph and a syndrome we can detect and identify faulty processors only considering tests occurring at that time instant. However, perception modules evolve over time and considering the temporal dimension allows adding temporal checks and modeling systems with modules operating at multiple frequencies. In the following, we extend the notion of diagnostic graphs to account for the temporal dimension of perception.

In a perception pipeline, different modules publish information at different frequencies. For example, consider the system in Fig. 1. Within each second, the IMU publishes 100 times, whereas the GPS reader only publishes once. We can thus take, for a given minimum publishing frequency $f$, the subgraph $G_f$ of $G$ induced by the nodes that publish at least at frequency $f$, i.e., keeping all the edges of $G$ that have both endpoints among those nodes. For instance, the subgraph of $G$ obtained by taking the nodes that publish at least at a given rate is depicted in Fig. 11(a). Note that we obtain a different subgraph for each publishing frequency, as in Fig. 11. Clearly, we can perform fault detection over each of these “instantaneous” diagnostic graphs. However, as mentioned above, considering the evolution of the system over time provides further opportunities for fault detection.
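Extracting the subgraph for a minimum publishing frequency amounts to taking an induced subgraph. In the sketch below, only the 100 Hz IMU and 1 Hz GPS rates come from the text; the other rates, the module list, the edge set, and the function name are hypothetical and do not reproduce Fig. 6 exactly.

```python
def rate_subgraph(rates, edges, min_rate_hz):
    """Induced subgraph on the modules publishing at least min_rate_hz."""
    nodes = {m for m, r in rates.items() if r >= min_rate_hz}
    return nodes, [(u, v) for u, v in edges if u in nodes and v in nodes]


rates = {  # Hz; IMU and GPS values from the text, the rest hypothetical
    "IMU reader": 100, "IMU Integration": 100,
    "Camera reader": 30, "Visual Odometry": 30,
    "LIDAR reader": 10, "LIDAR Registration": 10, "Pose-Graph Optimization": 10,
    "GPS reader": 1, "GPS Data Processing": 1,
}
edges = [  # (tester, tested), hypothetical
    ("IMU Integration", "IMU reader"),
    ("Visual Odometry", "Camera reader"),
    ("LIDAR Registration", "LIDAR reader"),
    ("GPS Data Processing", "GPS reader"),
    ("Pose-Graph Optimization", "IMU Integration"),
    ("Pose-Graph Optimization", "Visual Odometry"),
    ("Pose-Graph Optimization", "GPS Data Processing"),
]
# One subgraph per publishing frequency, as in Fig. 11.
subgraphs = {f: rate_subgraph(rates, edges, f) for f in sorted(set(rates.values()))}
```

Lowering the threshold only adds nodes, so each subgraph contains the ones obtained at higher frequencies, and the 1 Hz subgraph is the whole diagnostic graph.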

Figure 11: Subgraphs of the diagnostic graph in Fig. 6, one for each publishing frequency.

If we consider an arbitrary interval of time $I$ (in seconds), over the interval we have the opportunity to collect multiple diagnostic graphs with their syndromes (Fig. 12). Furthermore, the outputs of the modules in these graphs cannot be arbitrary and must exhibit some temporal consistency. For instance, because motion is continuous, measurements of the AV pose at consecutive times should be close to one another. Another example is that, since the AV has limited braking power (imposed by physical constraints), a set of consecutive IMU acceleration measurements cannot exceed some bounds. Therefore, the diagnostic graphs at different moments in time can be connected via temporal consistency functions, i.e., we can make tests (or edges) that go across time. We call the collection of the diagnostic graphs in the time interval $I$ and the corresponding temporal edges an $I$-temporal diagnostic graph (Fig. 12). The simplest form of temporal consistency involves an edge connecting two identical nodes occurring at consecutive time steps, e.g., two consecutive IMU measurements. However, it can easily be extended to different modules and neighboring (but not adjacent) time steps.
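The construction of an $I$-temporal diagnostic graph can be sketched as follows: replicate the instantaneous diagnostic graph at each sampling instant in $I$, then add temporal edges in which a module at one instant tests the same module up to `max_lag` instants later. The direction of the temporal edges, the `max_lag` parameter, the function names, and the speed bound `v_max` in the example temporal consistency function (which compares only the translational part of the pose) are assumptions for illustration.

```python
import math


def temporal_diagnostic_graph(nodes, edges, times, max_lag=1):
    """Vertices are (module, time) pairs; edges are instantaneous or temporal tests."""
    times = sorted(times)
    V = [(m, t) for t in times for m in nodes]
    E = [((u, t), (v, t)) for t in times for u, v in edges]   # per-instant tests
    for i, t_i in enumerate(times):
        for lag in range(1, max_lag + 1):
            if i + lag < len(times):
                t_j = times[i + lag]
                E.extend(((m, t_i), (m, t_j)) for m in nodes)   # temporal tests
    return V, E


def pose_continuity_test(p1, p2, dt, v_max=40.0):
    """Temporal consistency: 1 (fail) if the implied speed exceeds v_max [m/s]."""
    return int(math.dist(p1, p2) > v_max * dt)


V, E = temporal_diagnostic_graph(["a", "b"], [("a", "b")], times=[0.0, 0.1, 0.2], max_lag=2)
```

In this toy example, three instants of a two-node graph yield three instantaneous edges plus six temporal edges; the extra edges raise in-degrees, which is what allows the temporal graph to be more diagnosable than each instantaneous copy.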

Besides being more expressive, this approach has the potential to greatly increase the $t$-diagnosability of the system. Intuitively, the temporal diagnostic graph becomes larger, but also more diagnosable, when longer intervals of time are considered.

Corollary 4.

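If $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ are two graphs with the same set of vertices and $E_1 \subseteq E_2$, then $t(G_1) \le t(G_2)$.

Proof. Assume conditions 1, 2, and 3 of Theorem 3 hold for $G_1$ with $t = t(G_1)$. Since $G_1$ and $G_2$ have the same number of processors, condition 1 holds for $G_2$. Since $\delta_{in}(G_2) \ge \delta_{in}(G_1)$ and for every $X \subseteq V$ we have $\Gamma_{G_1}(X) \subseteq \Gamma_{G_2}(X)$, conditions 2 and 3 hold also for $G_2$. ∎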

We can prove that $t$-diagnosability is monotonic under temporal restriction.

Proposition 5.

Let $G = (V, E)$ be a diagnostic graph. Suppose given subsets $V_1, \dots, V_k \subseteq V$ that cover $V$, in the sense that $\bigcup_{i=1}^{k} V_i = V$, and for each $i$, let $G_i$ be the largest subgraph of $G$ with vertices $V_i$. Then if each $G_i$ is $t$-diagnosable, so is $G$.

Proof. Assume each $G_i$ is $t$-diagnosable, and suppose that $s$ is a syndrome for a $t$-consistent fault set $F$ on $G$ (so it has at most $t$ faulty nodes). Then its restriction to each $G_i$ has at most $t$ faulty nodes, namely those in $F \cap V_i$, so the faulty nodes in $G_i$ can be accurately diagnosed. Since every node is in some $V_i$, every node can be accurately diagnosed, so $G$ is $t$-diagnosable. ∎

Example. Let us consider a small interval of time, as in Fig. 12. In such an interval we obtain three different diagnostic graphs for the rate of Fig. 11(a), each with the same diagnosability as the corresponding instantaneous subgraph. Now suppose we can test consistency across one and two consecutive instants, obtaining the temporal diagnostic graph in Fig. 12. By Theorem 3, the temporal diagnostic graph in Fig. 12 has a larger maximum $t$ than each of the instantaneous graphs it contains. From this example, one can see the advantages of including the temporal dimension for fault diagnosis.

Figure 12: Example of temporal diagnostic graph over a small time interval.

4 Temporal Type Theory

This section shows that temporal diagnostic graphs can be elegantly described with the language of topos theory. In particular, in order to organize the information of multiple graphs that interact over time-intervals of varying length, as well as the diagnosability and collection of faulty nodes in these graphs, we use the mathematical language of temporal type theory as described in [Schultz19book-ttt]. In the following, Section 4.1 briefly reviews basic concepts of topos theory, while Section 4.2 tailors these concepts to temporal diagnostic graphs.

4.1 Sheaves, Behavior Types, and Toposes

In temporal type theory, one models various types of behavior using data structures called sheaves. A sheaf $B$ assigns a set $B(a,b)$ to each open interval $(a,b)$, understood as the set of possible behaviors that can occur over the time-window $(a,b)$. For any subinterval $(a',b') \subseteq (a,b)$, the sheaf also assigns a restriction map $B(a,b) \to B(a',b')$, which crops a behavior $x \in B(a,b)$ to the subwindow, returning some $x|_{(a',b')} \in B(a',b')$.

Every aspect of the vehicle motion and the perception system considered in this paper has a representation in this common language of behavior types. For example, every variable of type $X$, which takes a value in $X$ at each moment of time, has an associated behavior type

$$\underline{X}(a,b) := \{\, f \mid f \colon (a,b) \to X \,\},$$

where the restriction map is given by composition. That is, given $f \colon (a,b) \to X$ and $(a',b') \subseteq (a,b)$, define $f|_{(a',b')}$ to be the composite $(a',b') \hookrightarrow (a,b) \xrightarrow{f} X$. Below we will discuss behavior types for diagnosing faults in the perception pipeline as data arrives over intervals of time. To do so, we first construct the behavior type of all temporal diagnostic graphs. An $(a,b)$-behavior of this type consists of a graph $G = (V, E)$ together with a function $\tau \colon V \to (a,b)$, which assigns to each node the moment in that interval at which it occurs. For any $(a',b') \subseteq (a,b)$, the restriction map sends $(G, \tau)$ to $G|_{(a',b')}$, the maximal subgraph of $G$ on the nodes $\tau^{-1}(a',b')$ that occur within the subinterval $(a',b')$; in other words, it keeps all the edges of $G$ between those nodes.
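As a rough illustration of restriction maps, the Python sketch below represents a behavior of a variable as a function on an open window, and a temporal diagnostic graph as a graph whose nodes carry time stamps. This is a finite toy model under our own assumptions, not an implementation of sheaves; the class names, the helper `is_subinterval`, and the module labels are hypothetical.

```python
# Illustrative toy model: class names and labels are hypothetical, not a sheaf library.
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Set, Tuple

Interval = Tuple[float, float]  # an open interval (a, b)


def is_subinterval(sub: Interval, window: Interval) -> bool:
    return window[0] <= sub[0] and sub[1] <= window[1]


@dataclass
class VariableBehavior:
    """A behavior of a variable of type X: a function from the window to X."""
    window: Interval
    f: Callable[[float], object]

    def restrict(self, sub: Interval) -> "VariableBehavior":
        if not is_subinterval(sub, self.window):
            raise ValueError("can only restrict to a subinterval")
        # Composite (a', b') -> (a, b) -> X: same function, smaller domain.
        return VariableBehavior(sub, self.f)


@dataclass
class TemporalGraph:
    """A graph whose nodes are stamped with the moment at which they occur."""
    window: Interval
    time: Dict[Hashable, float]
    edges: Set[Tuple[Hashable, Hashable]]

    def restrict(self, sub: Interval) -> "TemporalGraph":
        if not is_subinterval(sub, self.window):
            raise ValueError("can only restrict to a subinterval")
        # Keep the nodes occurring inside the open subinterval, and all edges among them.
        kept = {v: t for v, t in self.time.items() if sub[0] < t < sub[1]}
        kept_edges = {(u, v) for u, v in self.edges if u in kept and v in kept}
        return TemporalGraph(sub, kept, kept_edges)


g = TemporalGraph(
    window=(0.0, 1.0),
    time={"lidar@0.2": 0.2, "camera@0.2": 0.2, "lidar@0.6": 0.6},
    edges={("lidar@0.2", "camera@0.2"), ("lidar@0.2", "lidar@0.6")},
)
print(sorted(g.restrict((0.0, 0.5)).edges))  # [('lidar@0.2', 'camera@0.2')]
```

Restricting in two steps gives the same graph as restricting once to the smaller window, which is the compatibility the restriction maps of a sheaf must satisfy.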

All possible behavior types, as sheaves, form a topos $\mathcal{B}$. A topos is a well-studied algebraic system that has many useful formal properties [MacLane12book-toposTheory, Johnstone02book-toposTheory]. For example, behavior types can be added (a behavior of $A + B$ is a behavior of $A$ or a behavior of $B$), multiplied (a behavior of $A \times B$ consists of a pair of behaviors, one of each type), exponentiated, etc. Moreover, every topos comes equipped with its own internal language, which includes a full-fledged constructive logic. In the internal language of $\mathcal{B}$, the booleans are replaced with a notion of temporal truth values, a behavior type $\mathsf{Prop}$ of all propositions. It comes equipped with all the constants, operations, and quantifiers ($\top$, $\bot$, $\wedge$, $\vee$, $\Rightarrow$, $\forall$, $\exists$). The internal language can be used to provide short and intuitive descriptions of complex objects: one writes the description set-theoretically, and the topos machinery then turns it into one that varies in time. For example, if one writes the standard definition of a vector space in the internal language, the result is a vector space changing in time.

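Example. We can use the internal language to define the sheaf of upper-bounded naturals, denoted $\overline{\mathbb{N}}$. In the internal language it is defined by $\overline{\mathbb{N}} := \{\, N \colon \mathbb{N} \to \mathsf{Prop} \mid \forall (n \colon \mathbb{N}).\ N(n) \Rightarrow N(n+1) \,\}$, meaning "if $n$ is an upper bound, then so is $n+1$". This translates to saying that a behavior over $(a,b)$ assigns a natural number to each open subinterval of $(a,b)$, and smaller subintervals can be assigned smaller (but never larger) natural numbers.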
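On a finite family of subintervals, the condition on behaviors of the upper-bounded naturals amounts to a monotonicity check. The Python sketch below is a toy approximation (a genuine behavior assigns a number to every open subinterval, not just finitely many); the function names and sample values are hypothetical.

```python
# Toy approximation on finitely many windows; names and values are hypothetical.
from itertools import product
from typing import Dict, Tuple

Interval = Tuple[float, float]


def contains(outer: Interval, inner: Interval) -> bool:
    """True if the open interval `inner` lies inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def is_upper_bounded_behavior(values: Dict[Interval, int]) -> bool:
    """A window contained in another may carry a smaller number, never a larger one."""
    return all(values[small] <= values[big]
               for big, small in product(values, repeat=2)
               if contains(big, small))


ok = {(0, 3): 2, (0, 1): 1, (1, 2): 1, (2, 3): 0}
bad = {(0, 3): 1, (0, 1): 2}
print(is_upper_bounded_behavior(ok), is_upper_bounded_behavior(bad))  # True False
```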

4.2 Temporal Diagnostic Graphs as Behavior Types

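Temporal diagnostic graphs such as the one in Fig. 12 have a special property: any clipping to a subinterval has a diagnosability number no larger than that of the original graph. To be more precise, let $G = (V, E)$ be a graph and, for any $n \in \mathbb{N}$, let $G_n$ denote the graph with vertices $V_n := V \times \{1, \dots, n\}$ and edges

$$E_n := \{\, ((v,i),(w,i)) \mid (v,w) \in E,\ 1 \le i \le n \,\} \cup \{\, ((v,i),(v,i+1)) \mid v \in V,\ 1 \le i < n \,\}.$$

Thus, the graph $G_n$ consists of $n$ copies of $G$, as well as edges connecting corresponding nodes across neighboring copies. For example, $G_1 \cong G$, whereas we can imagine $G_2$ as two panes, each a copy of $G$, with an edge from each node in the first pane to the corresponding node in the second pane.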
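The construction of $G_n$ described above takes only a few lines of Python. In this sketch, vertices are labeled $(v, i)$ and temporal edges point from pane $i$ to pane $i+1$, following the description in the text; the function name and the base graph are hypothetical.

```python
# Illustrative sketch: `time_expand` and the 3-cycle base graph are hypothetical.
def time_expand(nodes, edges, n):
    """Build G_n: n panes, each a copy of G, plus an edge from every node in
    pane i to its counterpart in pane i + 1."""
    vertices = [(v, i) for i in range(1, n + 1) for v in nodes]
    pane_edges = [((u, i), (v, i)) for i in range(1, n + 1) for u, v in edges]
    temporal_edges = [((v, i), (v, i + 1)) for i in range(1, n) for v in nodes]
    return vertices, pane_edges + temporal_edges


# Hypothetical base graph: a directed 3-cycle.
V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c"), ("c", "a")]
V2, E2 = time_expand(V, E, 2)
print(len(V2), len(E2))  # 6 9
```

In general, $G_n$ has $n|V|$ vertices and $n|E| + (n-1)|V|$ edges.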

Now, for any frequency $f$ and graph $G$, consider the temporal graph $G_n$, where $n$ is the number of instants spaced $1/f$ apart that fall in the interval, and each vertex $(v, i)$ is sent to the $i$-th such instant.² It consists of multiple panes, spaced $1/f$ apart, each isomorphic to $G$. We call temporal graphs constructed in this way time-regular graphs; they form a subsheaf of the behavior type of all temporal diagnostic graphs.

²Note that the graph in Fig. 12 has strictly more edges than the corresponding $G_n$, so we can still conclude the result from Corollary 4.

It follows from Proposition 5 that if one clips a time-regular graph to a smaller interval, the result has a diagnosability number no larger than that of the original graph. Thus we obtain a map of sheaves from time-regular graphs to upper-bounded naturals, sending each time-regular graph to its diagnosability number. Note that this map carries a large amount of information: a time-regular graph is not just one graph but a graph for every duration of time, and similarly its image consists of a natural number for every subinterval.
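The monotonicity of the diagnosability map can be observed numerically. The Python sketch below assumes the PMC test model, places one pane at each instant $k/f$ inside the window, and connects consecutive panes with consistency checks in both directions (a variant of $G_n$ chosen so that the temporal edges actually raise the bound); all names and the three-module base graph are hypothetical. Because the instants are global multiples of $1/f$, building the graph on a subwindow gives the same result as clipping the graph of a larger window.

```python
# Illustrative sketch: names and base graph are hypothetical; PMC model is an assumption.
import math
from itertools import combinations


def diagnosability(nodes, edges):
    """Largest t such that all distinct fault sets of size <= t are
    distinguishable under the (assumed) PMC test model."""
    nodes = list(nodes)

    def distinguishable(f1, f2):
        suspects, differ = f1 | f2, f1 ^ f2
        return any(u not in suspects and v in differ for u, v in edges)

    def ok(t):
        sets = [frozenset(c) for k in range(t + 1) for c in combinations(nodes, k)]
        return all(distinguishable(a, b) for a, b in combinations(sets, 2))

    t = 0
    while t < len(nodes) and ok(t + 1):
        t += 1
    return t


def time_regular_graph(nodes, edges, f, window):
    """One pane (a copy of the base graph) at each instant k/f inside the open
    window, plus consistency checks in both directions between consecutive panes."""
    a, b = window
    ks = [k for k in range(math.floor(a * f), math.ceil(b * f) + 1) if a < k / f < b]
    vertices = [(v, k) for k in ks for v in nodes]
    all_edges = [((u, k), (v, k)) for k in ks for u, v in edges]
    for k in ks:
        if k + 1 in ks:
            all_edges += [((v, k), (v, k + 1)) for v in nodes]
            all_edges += [((v, k + 1), (v, k)) for v in nodes]
    return vertices, all_edges


modules = ["a", "b", "c"]  # hypothetical; each module tests the other two
checks = [(u, v) for u in modules for v in modules if u != v]

# Shrinking windows at f = 1 Hz contain 3, 2, 1, and 0 panes respectively.
windows = [(0.0, 3.5), (0.0, 2.5), (0.0, 1.5), (0.0, 0.5)]
numbers = [diagnosability(*time_regular_graph(modules, checks, 1.0, w)) for w in windows]
print(numbers)  # non-increasing as the window shrinks
```

Under these assumptions the last three entries are 2, 1, and 0 (two, one, and zero panes), and the first entry is at least 2.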

Using the language of temporal type theory, we can also model fault detection over arbitrary intervals. Suppose actual data is coursing through the perception pipeline. For every interval of time, we have a diagnostic graph, and every node in it has published values for each of its variables. This induces a function that sends every module to the union of all subintervals throughout which it is fault-free. The perception system is trying to infer this function, but on short intervals it does not have enough data to determine it. Thus it returns an approximation of this function, which in general agrees with it only up to inclusion. For short intervals, when the diagnosability number is low, this inclusion will often be strict: some faulty nodes may fail to be diagnosed. However, by auditing longer intervals, the decision layer can often diagnose the faulty nodes ex post facto and obtain an equality.
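Runtime diagnosis over an interval can be sketched as a search over small fault sets that are consistent with the observed test outcomes (the syndrome). The Python sketch below assumes the PMC test model and, for concreteness, that faulty testers always report "pass"; the modules, faults, and bounds are hypothetical, with the bound t set to the diagnosability of each graph under these assumptions (1 for a single instant, 2 for two instants). On the single instant the syndrome cannot be explained by at most one fault, so a fault is detected but not localized; on the two-instant interval the faulty nodes are identified exactly.

```python
# Illustrative sketch: modules, faults, and the faulty-tester behavior are assumptions.
from itertools import combinations


def consistent(edges, syndrome, faulty):
    """PMC rule (assumed): a fault-free tester reports 1 exactly when its target
    is faulty; reports from faulty testers carry no information."""
    return all(syndrome[(u, v)] == int(v in faulty) for u, v in edges if u not in faulty)


def diagnose(nodes, edges, syndrome, t):
    """Return (surely_faulty, possibly_faulty) over all fault sets of size <= t
    that explain the syndrome, or None if no such fault set exists."""
    candidates = [frozenset(c) for k in range(t + 1) for c in combinations(nodes, k)
                  if consistent(edges, syndrome, frozenset(c))]
    if not candidates:
        return None  # something is faulty, but more than t nodes would be needed
    return frozenset.intersection(*candidates), frozenset().union(*candidates)


modules = ["a", "b", "c"]      # hypothetical modules that test each other
actual = {("b", 0), ("c", 0)}  # hypothetical faults, present at instant 0 only
nodes = [(m, i) for i in (0, 1) for m in modules]
edges = [((u, i), (v, i)) for i in (0, 1) for u in modules for v in modules if u != v]
edges += [((m, i), (m, 1 - i)) for i in (0, 1) for m in modules]
# Assumption: faulty testers always report "pass" (0).
syndrome = {(u, v): int(u not in actual and v in actual) for u, v in edges}

# Short interval: only instant 0, whose graph is 1-diagnosable.
short_nodes = [n for n in nodes if n[1] == 0]
short_edges = [(u, v) for u, v in edges if u[1] == 0 and v[1] == 0]
print(diagnose(short_nodes, short_edges, syndrome, t=1))  # None

# Longer interval: both instants, whose graph is 2-diagnosable.
surely, possibly = diagnose(nodes, edges, syndrome, t=2)
print(sorted(surely), sorted(possibly))  # [('b', 0), ('c', 0)] [('b', 0), ('c', 0)]
```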

5 Conclusion

We presented a novel methodology for fault diagnosis in perception systems. Towards this goal, we drew connections with the literature on diagnosability for multiprocessor systems and generalized it to account for heterogeneous modules interacting over time, such as those arising in perception systems. In particular, we showed that considering the temporal dimension, i.e., assessing the consistency of the system behavior over time, has the potential to enhance diagnosability, while still enabling the use of existing tools for fault detection. Finally, we showed that the proposed monitoring approach can be elegantly described with the language of topos theory, which allows the formulation of diagnosability and fault detection over arbitrary time intervals. While the approach is fairly general and applicable to a variety of perception systems, we considered a localization system as a case study.

We believe that the proposed notion of temporal diagnostic graphs can complement the literature and enhance the current practice, contributing to the goal of achieving safety and trustworthiness in autonomous vehicle applications. For instance, a system designer can use the tools proposed in this paper to assess the diagnosability before deploying the vehicle on public roads, or design the system in such a way that its fault diagnosability is maximized. While deployed, the proposed framework allows the vehicle to have a greater awareness of its operational envelope and enables real-time perception monitoring via runtime diagnosability. Finally, for regulators, system-level guarantees provide a more solid and rigorous ground for certification, and, in the unfortunate case of an accident, the proposed approach increases accountability by providing formal evidence about the root cause of the failures.

This work opens a number of avenues for future work. First, we plan to provide more examples of perception systems that can be modeled with (and can benefit from) the proposed monitoring approach. Second, we plan to build on the results presented in this paper to model an entire AV software architecture using the language of temporal type theory.