
A Safety Assurable Human-Inspired Perception Architecture

Although artificial intelligence-based perception (AIP) using deep neural networks (DNN) has achieved near human level performance, its well-known limitations are obstacles to the safety assurance needed in autonomous applications. These include vulnerability to adversarial inputs, inability to handle novel inputs and non-interpretability. While research in addressing these limitations is active, in this paper, we argue that a fundamentally different approach is needed to address them. Inspired by dual process models of human cognition, where Type 1 thinking is fast and non-conscious while Type 2 thinking is slow and based on conscious reasoning, we propose a dual process architecture for safe AIP. We review research on how humans address the simplest non-trivial perception problem, image classification, and sketch a corresponding AIP architecture for this task. We argue that this architecture can provide a systematic way of addressing the limitations of AIP using DNNs and an approach to assurance of human-level performance and beyond. We conclude by discussing what components of the architecture may already be addressed by existing work and what remains future work.






Artificial intelligence-based perception (AIP) using deep neural networks (DNN) has achieved remarkable performance. Yet as news reports can attest, AIP can fail in surprising and catastrophic ways. This highlights the fact that, currently, the level of safety assurance possible for AIP is insufficient to support the high levels of autonomy required for fully automated driving systems (ADS). In contrast, although human perception is imperfect, a status quo assumption by society is that the perception performance of a mature, unimpaired human is sufficient for the safe operation of a vehicle. Thus, achieving and assuring AIP performance against a human baseline would be necessary for a societally acceptable ADS and therefore a worthy goal.

In this paper, we take the position that this goal can be approached by studying how humans do perception and using this to construct a corresponding human-inspired AIP architecture. The idea of using humans as inspiration for AIP is not new. Many of the techniques of AI are based on human psychology or neurophysiology and this trend has accelerated in recent times (e.g., Suchan et al. (2021); Malowany and Guterman (2020); Yildirim et al. (2020)). Our focus, however, is specifically on how to use the connection to humans to support the goal of safety and its assurance.

To investigate this concretely, we consider the basic perception task of object image classification: does an image X depict a member of a given class C? We show that a human-inspired AIP architecture for this task can assurably address key limitations of current DNN-based AIP approaches while still leveraging the strengths of DNNs.

The remainder of the paper is structured according to the following contributions: 1) we review research from the cognitive sciences on human object image classification; 2) we present a safe AIP architecture aligned with this work; and 3) we provide justification for the architecture from various perspectives including feasibility and assurability. Finally, we give conclusions.

How Humans (Probably) Do Classification

In this section, we review research from the cognitive sciences relevant to how humans do object image classification.

Dual Process Models

In the cognitive sciences, a dual process model of cognition is the dominant view Epstein (1994); Kahneman (2011); Evans and Stanovich (2013). Type 1 thinking is fast, non-conscious, holistic, intuitive, and the same across different individuals. Type 2 thinking is slow, conscious, sequential conceptual reasoning that varies across individuals and is correlated with intelligence measures.

The dominant view on how the two types interact is default-interventionism Kahneman (2011); Evans and Stanovich (2013): the Type 1 process always produces some default response quickly, and the Type 2 process intervenes to produce a potentially different response only if “difficulty, novelty, and motivation combine to command the resources of working memory” Evans and Stanovich (2013). The Type 1 default response may be wrong—humans often act as “cognitive misers” by substituting a less accurate easy-to-evaluate characteristic for a harder one, leading to biases (e.g., stereotyping). An important metacognitive factor is the level of “confidence” in the default response. When people are confident, they are less likely to invoke the Type 2 process Thompson et al. (2011). Thus, low confidence is a key triggering factor for Type 2 intervention.

Time and risk play important roles. For fast (Type 1) binary perceptual decisions (less than 1,500 ms), research supports the idea that evidence accumulates over time until a threshold is reached and a decision is made. In addition, there is a speed/accuracy tradeoff. If speed is a priority then accuracy may be lower, while a focus on high accuracy slows the decision (e.g., Ratcliff and McKoon (2008)).
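The accumulate-to-threshold idea and its speed/accuracy tradeoff can be illustrated with a small simulation. The following is a minimal sketch of a generic random-walk accumulator, not the specific model of Ratcliff and McKoon (2008); the function names and all parameter values (drift, noise, thresholds) are illustrative assumptions.

```python
import random


def accumulate_decision(drift, threshold, noise=1.0, max_steps=10_000, rng=None):
    """Accumulate noisy evidence until +threshold or -threshold is crossed.

    Returns (choice, steps): choice is True for the upper bound (taken here
    to be the correct answer), False for the lower bound, and None if no
    bound is reached within max_steps.
    """
    rng = rng or random.Random()
    evidence = 0.0
    for step in range(1, max_steps + 1):
        evidence += drift + rng.gauss(0.0, noise)
        if evidence >= threshold:
            return True, step
        if evidence <= -threshold:
            return False, step
    return None, max_steps


def summarize(threshold, trials=2000, drift=0.1, seed=0):
    """Estimate accuracy and mean decision time for a given threshold."""
    rng = random.Random(seed)
    results = [accumulate_decision(drift, threshold, rng=rng) for _ in range(trials)]
    accuracy = sum(1 for choice, _ in results if choice is True) / trials
    mean_steps = sum(steps for _, steps in results) / trials
    return accuracy, mean_steps


if __name__ == "__main__":
    # A low threshold favours speed; a high threshold favours accuracy.
    for threshold in (1.0, 5.0):
        acc, steps = summarize(threshold)
        print(f"threshold={threshold}: accuracy={acc:.2f}, mean steps={steps:.1f}")
```

In this sketch, raising the threshold plays the role of prioritizing accuracy: decisions take more steps on average but land on the correct bound more often.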

A natural criterion for choosing the priority is the perceived risk associated with the decision. Safety-critical decisions that must be made quickly, e.g., an object appears suddenly in front of the vehicle, prioritize speed. In this case, accuracy may suffer, and Type 2 intervention is not an option, because it is slow. This suggests that even inaccurate Type 1 decisions should be appropriately conservative to manage risk. For example, if there is not enough time to determine whether the object that suddenly appeared is a pedestrian or a cyclist, a safe response may be to assume that it is a pedestrian, since this suggests a more conservative behaviour.

Although this risk-managing approach to Type 1 classification seems intuitive and prudent, it is difficult to support from research. The research on time pressure and human decision risk is focused on gambling contexts where it has been observed that when time pressure forces Type 1 decisions, these may be riskier decisions than if more time were available (e.g., Madan et al. (2015)). Since the risk in gambling contexts is measured in monetary terms, it is not clear how well these results transfer to fast safety-critical decision making.

A related well-researched area is the “choking-under-pressure” phenomenon exhibited by humans in high-stakes situations such as sporting events Yu (2015). One explanation proposed for this is that pressure induces people to consciously monitor their behaviour causing a switch from automatic and efficient (Type 1) behaviour to a slower controlled (Type 2) behaviour Baumeister (1984).

Object Image Classification

Specific to the object image classification task, two prominent lines of research from different perspectives are object recognition, studied in the neuropsychology of vision DiCarlo et al. (2012), and object categorization, studied predominately in cognitive psychology and cognitive linguistics Goldstone et al. (2018). Object recognition concerns the ability to assign labels to particular objects sensed by the retina, including precise identifying labels and coarser category labels. Object categorization is the more general cognitive process of grouping objects based on similar or shared features Goldstone et al. (2018). Note that the term “categorization” used in the cognitive sciences is synonymous with “classification” as used in AI contexts.

Vision processing in the brain has two major streams: the ventral stream is responsible for object recognition, whereas the dorsal stream is responsible for visually guided action. Recent research provides strong evidence that some Type 1 representation of a category is already in the ventral stream, expressed in terms of visual features, even though it is ultimately coded using more abstract (i.e., conceptual) features (Type 2) in downstream parts of the brain Bracci et al. (2017). The categorization in the ventral stream is fast, with a response time as little as 250-290 ms for some categories, confirmed by multiple studies Fabre-Thorpe (2011).

Humans are effective at recognizing objects under different confounding visual conditions, such as varying positions relative to the object, lighting, context, occlusion, etc. A key function of the ventral stream is to facilitate this ability by transforming object images into representations invariant to these conditions before further processing to categorize the object DiCarlo et al. (2012). Two theories dominate regarding the form of the invariant object representation. The structural description theory Biederman (1987) proposes a 3D parts-based representation, while in the view-based theory Poggio and Edelman (1990), objects are represented as a combination of a small set of particular 2D views that can be transformed to represent any other view. In this paper, we will refer to the transformation of an object image to an invariant representation as object normalization.

Although object categorization can be seen to be part of object recognition, the research tradition in this area is focused on theories about concepts, i.e., the mental representation of a category. As such it is applicable to both Type 1 and Type 2 processes. The classical rule-based theory of concepts extending back to Greek philosophers is that they consist of the necessary and sufficient conditions for membership in the category. This view has been much critiqued. For example, Wittgenstein observed that the requirement for a set of necessary conditions often does not hold due to the presence of exceptions and famously illustrated this by attempting to find the necessary conditions for the category “game”. It is also inconsistent with empirical evidence obtained by Rosch (1973) that categories are graded, with some members more central or typical than others, having more of the common features. This led Rosch to propose that concepts are prototype-based with membership determined by degree of similarity to the prototype. Another dominant proposal supported by empirical evidence is that concepts are exemplar-based Medin and Schaffer (1978), where exemplars are specifically remembered examples of the category and membership is determined by collective similarity to all exemplars. Each approach has its strengths and weaknesses and, more recently, the accepted view is that all of these approaches may be used in some combination Murphy (2016).

Both the prototype and exemplar approaches to object categorization depend on similarity judgement to compare the observed object image with stored representations. Research on human similarity judgement is extensive (see Goldstone and Son (2012) for a review). Four basic approaches have been proposed: geometric, using a distance measure in a continuous space; feature-based, aggregating the number of shared discrete features; alignment-based, extending the feature-based approach to include relations (e.g., part-of) between features; and transformation-based, using the effort needed to transform one image into another.
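The difference between prototype-based and exemplar-based membership can be made concrete with a short sketch. It assumes a geometric similarity that decays exponentially with Euclidean distance and takes the prototype to be the mean of the stored exemplars; both choices, the function names, and the 2D feature vectors are illustrative assumptions rather than anything prescribed by the cited theories.

```python
import math


def similarity(a, b, sensitivity=1.0):
    """Geometric similarity: decays exponentially with Euclidean distance."""
    return math.exp(-sensitivity * math.dist(a, b))


def prototype_score(x, exemplars):
    """Similarity of x to the class prototype (here: the mean of the exemplars)."""
    dims = len(exemplars[0])
    prototype = [sum(e[d] for e in exemplars) / len(exemplars) for d in range(dims)]
    return similarity(x, prototype)


def exemplar_score(x, exemplars):
    """Collective (summed) similarity of x to every stored exemplar."""
    return sum(similarity(x, e) for e in exemplars)


def classify(x, classes, score):
    """Return the class whose stored representation best matches x."""
    return max(classes, key=lambda name: score(x, classes[name]))


# Hypothetical feature vectors, invented for illustration only.
CLASSES = {
    "A": [(0.0, 0.0), (4.0, 0.0)],              # two far-apart clusters
    "B": [(2.0, 1.0), (2.0, 1.2), (2.0, 0.8)],  # one tight cluster
}

if __name__ == "__main__":
    x = (2.0, 0.0)  # sits exactly on A's mean but far from any A exemplar
    print(classify(x, CLASSES, prototype_score))
    print(classify(x, CLASSES, exemplar_score))
```

The query point coincides with the mean of class A yet resembles no remembered member of A, so the two scoring rules disagree. This is one way to see why a combination of approaches may be needed.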

Figure 1: Human inspired classification activity diagram.

An Assurable Human-Inspired Classification Architecture

Inspired by the research on object classification by humans presented above, in this section, we propose the high-level dual process architecture for classification shown in Fig. 1. Here, we assume that the fast Type 1 processes are carried out by DNNs, while slower Type 2 processes use reasoning with symbolic AI. The input to the system is an object image from an upstream process (e.g., the first stage of an object detector). In alignment with processing in the ventral stream, the first step is object normalization to eliminate the confounding effects of visual conditions. Then, a DNN-based classifier of the invariant object representation generates a classification based on visual features. We assume that these classifiers use prototype/exemplar methods to align with the human representations of concepts. Furthermore, we assume they measure confidence in their classification decision.

If the result produced by fast classifiers is inadequate (e.g., too low confidence) and if there is available time, then a reasoning process can intervene to attempt to improve the result by exploiting conceptual knowledge about object classes. The reasoning process considers alternative classification hypotheses, then generates perceptual queries of the invariant object representation that could provide evidence to affirm or refute a hypothesis. We assume that these queries have yes/no answers and can all be viewed as classification problems (we limit the queries for simplicity, but more general Type 1 queries are possible as well); thus, we ground the Type 2 reasoning process in the Type 1 perceptual process Harnad (1990). Note that a query is a recursive invocation (as indicated by the dashed arrows), since if the Type 1 process does not adequately answer a query, a Type 2 reasoning process can be invoked to intervene on it, and so on. Overall, the more time spent reasoning, the more this process can improve the quality of the classification by generating more potential hypotheses and by obtaining more evidence for hypotheses.
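The control flow described here (fast default, "good enough" check, recursive yes/no queries under a time limit) can be sketched as follows. This is a simplified illustration under stated assumptions, not the architecture's implementation: the Type 1 classifiers are hand-written stubs, the knowledge base is a hypothetical list of conjunctive conditions per class, available time is modelled as a step budget, and the confidence threshold is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Remaining reasoning steps; a stand-in for available time."""
    steps: int


def classify(class_name, rep, type1, knowledge, budget, threshold=0.8):
    """Answer the yes/no query 'does rep depict class_name?' with a confidence.

    type1:     maps class name -> function(rep) -> (answer, confidence)
    knowledge: maps class name -> list of (query_class, expected_answer)
               conceptual conditions (hypothetical, hand-written here)
    """
    answer, confidence = type1[class_name](rep)      # fast default response
    conditions = knowledge.get(class_name, [])
    if confidence >= threshold or budget.steps <= 0 or not conditions:
        return answer, confidence                    # good enough, or no time

    # Type 2 intervention: every condition is itself a classification query,
    # so the same procedure is invoked recursively.
    confidences = []
    for query_class, expected in conditions:
        if budget.steps <= 0:
            return answer, confidence                # out of time: keep default
        budget.steps -= 1
        q_answer, q_conf = classify(query_class, rep, type1, knowledge, budget, threshold)
        if q_answer != expected:
            return False, q_conf                     # hypothesis refuted
        confidences.append(q_conf)
    return True, min(confidences)                    # hypothesis affirmed


# Stub Type 1 classifiers over a dict standing in for the invariant representation.
TYPE1 = {
    "cyclist": lambda rep: (True, 0.55),             # visually cyclist-like, unsure
    "wheel": lambda rep: (rep["wheels"] > 0, 0.95),
    "rider_on_vehicle": lambda rep: (rep["rider_mounted"], 0.9),
}
KNOWLEDGE = {"cyclist": [("wheel", True), ("rider_on_vehicle", True)]}

if __name__ == "__main__":
    walking_bike = {"wheels": 2, "rider_mounted": False}
    print(classify("cyclist", walking_bike, TYPE1, KNOWLEDGE, Budget(steps=10)))
    print(classify("cyclist", walking_bike, TYPE1, KNOWLEDGE, Budget(steps=0)))
```

With a non-zero budget the sketch overturns the low-confidence default for a person walking a bicycle; with no budget it falls back to the Type 1 answer. A fuller version would consider several competing hypotheses rather than a single class.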

The Necessity of Dual Processes

We may reasonably ask whether the additional complexity of a dual process approach to classification is really necessary. After all, a DNN is a universal approximator and with sufficient training examples, it should get arbitrarily accurate. However, we argue that a pure DNN approach is intrinsically limited.

The classes of objects, such as pedestrian, cyclist and car, that are relevant for perception by an ADS have the crucial characteristic that they are not primarily determined by visual features but rather by conceptual features. For example, something is a cyclist not because of how it looks (visual features), but because it exhibits conceptual features such as having one or more wheels, carrying human rider(s), being propelled by rider effort, etc. Assessing the presence of these features definitively may require arbitrary amounts of reasoning. This suggests that visual features are insufficient to correctly characterize these classes, and thus, a DNN trained on object images alone cannot ever achieve perfect accuracy, regardless of how many training examples are provided.

However, having a similar visual appearance for certain subsets of class instances is a common occurrence. This could be due to genetics (for “natural kinds”), design or fashion. For example, cyclists on bicycles have visual similarity but look different from cyclists on recumbent cycles who are visually similar to each other. When such clustering according to visual similarity is available, visual feature-based classifiers are useful approximators for these subclasses of instances. But even here, their performance is intrinsically limited as illustrated in Fig. 2. It is always possible to find false negatives (FN): unusual cyclists that fit the conceptual description but not the visual. On the other hand, we can also always find images that look like cyclists, but on careful inspection, do not satisfy the conceptual description, yielding false positives (FP).

Despite the inaccuracies of visual-feature based classifiers, the benefit is that they may be fast in comparison to a classifier based on reasoning about conceptual features. Thus, when a safety critical decision must be made quickly, a visual-feature based classifier is preferable. This suggests that an optimal classifier strategy should follow a dual approach, leveraging visual features for speed and conceptual features for accuracy when the time is available.

To further refine this conclusion, we must address an apparent paradox. The architecture in Fig. 1 shows that conceptual reasoning must ultimately be grounded in visual features (or, more generally, in features of available sense modalities). This is because evidence to support conceptual hypotheses about objects in the world can only be obtained through visual means—there is no way to directly access knowledge about these objects. Thus, all reasoning about conceptual features must be reducible to reasoning about visual features. However, if this is the case, then it would seem that visual features alone must be enough to characterize these classes, even if they are internally encoded in terms of conceptual features.

The way out of this apparent paradox is to acknowledge that, while individual queries about the object image issued by a Type 2 classifier are ultimately answered using visual features, each such query appeals to potentially different visual features and the scope of such queries is limited only by the size of the knowledge base. In contrast, the set of visual features used by a Type 1 classifier for a specific class is much smaller, focused on that class only. For example, if a bicycle is decorated with flowers attached to the frame, these may create enough of a visual distortion to cause an FP in a Type 1 cyclist classifier. However, the Type 2 conceptual reasoning process can potentially identify the presence of flowers (using a Type 1 flower classifier) and conclude that these do not affect the satisfaction of the conceptual definition of cyclist.

In this case, it is unlikely that the Type 1 cyclist classifier could ever learn to draw this conclusion because it would need to develop sensitivity to visual features about flowers. More generally, it would need to handle the visual features for every class in the knowledge base that could ever co-occur with a cyclist, which is likely to include most of the knowledge base. The dual process approach solves this scalability problem by delegating the job of ranging over the full span of world knowledge needed in the many varied, but rarer cases, to Type 2 classification and keeping Type 1 classification focused on typical class features.

Figure 2: Visual feature based classifiers are intrinsically limited.

Addressing Safety

We assume that the safety requirements of an object classification subsystem are refined from system level (e.g., ADS) safety requirements (see Salay et al. (2022) for a schema of such a refinement). This refinement identifies specific performance requirements of the subsystem needed to address different potential hazard scenarios. Since these requirements are system specific, for our proposed high-level architecture we instead consider the general implications of the high-level requirement that the subsystem provides performance at least as good as humans. In particular, the following three requirements are relevant and follow from the review of human classification.

Requirement 1.

The classification subsystem shall support accurate classification for both typical and atypical inputs.

Humans are able to effectively address both these types of inputs, and while it is well-known that DNN-based classifiers can achieve high accuracy on typical cases, they can often fail on unusual cases. As argued in the previous section, this is because DNN classifiers use visual features and these are only sufficient for characterizing subsets of class instances that cluster on visual similarity. These clusters identify visually prototypical class instances. However these only approximate the true class described by conceptual features, leading to both FNs and FPs for atypical cases. To correct these inevitable misperceptions by Type 1 classifiers, the architecture uses a Type 2 classifier based on conceptual reasoning.

The decision on when to invoke the Type 2 classifier is a crucial part of the architecture (i.e., the “good enough” decision in Fig. 1). One signal relevant here is a measure of the uncertainty (or conversely, confidence) in the Type 1 classifier result. Assume the Type 1 classification process produces a categorical distribution across classes and that this is calibrated, i.e., the value $p(c)$ assigned to a class $c$ for an input image accurately reflects the actual probability that $c$ is the correct class of the image.

A true positive (TP) classification corresponds to a sharp distribution with one class having high probability and the others low. A distribution close to uniform probability indicates high uncertainty and a potential FN representing a visually atypical instance (Fig. 2, top). A distribution in which a few classes dominate also represents higher uncertainty indicating atypical visual ambiguity and could signal a potential FP. For example, the bottom right example in Fig. 2 could have highest probability for Cyclist (causing an FP) but with the probability of Pedestrian a close second. A limitation of this approach for detecting FPs is that it may require a large number of classes. For example, the bottom left example in Fig. 2 would only be caught if there were a class BicycleRide.
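The three distribution shapes described above can be told apart with simple statistics. The sketch below uses normalized entropy to flag near-uniform distributions and the gap between the top two classes to flag ambiguity; these particular measures, the cut-off values, and the example probabilities are assumptions made for illustration, and the input is assumed to be a calibrated distribution over at least two classes.

```python
import math


def entropy(dist):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def normalized_entropy(dist):
    """Entropy scaled to [0, 1]; 1 means uniform (maximally uncertain)."""
    return entropy(dist) / math.log(len(dist)) if len(dist) > 1 else 0.0


def top2_margin(dist):
    """Gap between the two most probable classes; small means ambiguous."""
    first, second = sorted(dist.values(), reverse=True)[:2]
    return first - second


def assess(dist, entropy_limit=0.9, margin_limit=0.3):
    """Map a distribution to a coarse signal for the 'good enough' decision.

    The limits are illustrative assumptions, not values from the paper.
    """
    if normalized_entropy(dist) >= entropy_limit:
        return "possible_fn"   # near-uniform: visually atypical instance
    if top2_margin(dist) <= margin_limit:
        return "possible_fp"   # a few classes dominate: visual ambiguity
    return "confident"         # sharp distribution


if __name__ == "__main__":
    # Hypothetical distributions, invented for illustration only.
    print(assess({"Cyclist": 0.96, "Pedestrian": 0.03, "Car": 0.01}))
    print(assess({"Cyclist": 0.34, "Pedestrian": 0.33, "Car": 0.33}))
    print(assess({"Cyclist": 0.55, "Pedestrian": 0.42, "Car": 0.03}))
```

Either non-confident signal could serve as a trigger for Type 2 intervention when time allows.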

Requirement 2.

The classification subsystem shall support classification for both fast and slow safety-critical decisions.

This requirement acknowledges that safety-critical decisions may occur over different time-frames. For example, an object appearing suddenly ahead of the ADS requires a fast response, whereas an object causing a traffic slowdown ahead allows for a slower response. When fast classification is required, the architecture assumes that this is provided by the Type 1 process alone, since Type 2 processes are too slow. For typical cases, this can provide assurably high levels of accuracy. A limitation of the architecture is that atypical cases may be misclassified by the Type 1 classifier and this can be a safety hazard in some situations. Uncertainty measurement of the Type 1 result, as discussed above, may play a mitigating role here by signaling to the driving policy when the classification may be incorrect and a conservative action should be taken to minimize risk.

In Salay et al. (2020), a systematic way to approach this with a quantifiable safety guarantee is proposed. A credible set with confidence $\alpha$ is a smallest subset of classes such that their cumulative probability is not less than $\alpha$. Because the classifier is calibrated, the true class is in the credible set with probability at least $\alpha$. Thus, if the Type 1 classifier sends the credible set as its result to the driving policy, any action the policy produces that is safe for all the classes in the set will be safe at least $100\alpha$% of the time.

For example, consider the bottom right image of a person walking their bicycle in Fig. 2. Assume the Type 1 classifier returns a categorical distribution over Cyclist, Pedestrian and Car in which Cyclist has the highest probability, but below 0.95, while Cyclist and Pedestrian together account for at least 0.95. Simply selecting the class with the maximum probability would make the result Cyclist, but this is an FP, since the true class is Pedestrian. However, we can be 95% sure that the true class is one of Cyclist, Pedestrian, which is the credible set for a confidence of 0.95. If the driving policy chooses an action that is safe for both classes in this set, it will be safe at least 95% of the time. The cost is a potentially overly conservative action. A limitation of this approach is that it requires there to exist an action that is safe for every class in the credible set, which may not always be the case.
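The credible set construction can be sketched in a few lines. The greedy procedure below (add classes in order of decreasing probability until the required confidence is reached) yields a smallest such set for a categorical distribution. The probability values, action names, and the `SAFE_FOR` table are hypothetical and chosen only to mirror the walking-cyclist example; they are not taken from Salay et al. (2020).

```python
def credible_set(dist, confidence):
    """Smallest set of classes whose cumulative probability >= confidence."""
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    chosen, total = [], 0.0
    for name, p in sorted(dist.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(name)
        total += p
        if total >= confidence - 1e-12:   # tolerance for float rounding
            break
    return set(chosen)


def safe_actions(classes, safe_for):
    """Actions that are safe for every class in the credible set."""
    action_sets = [safe_for[c] for c in classes]
    return set.intersection(*action_sets) if action_sets else set()


# Hypothetical calibrated output and action table, for illustration only.
DIST = {"Cyclist": 0.55, "Pedestrian": 0.42, "Car": 0.03}
SAFE_FOR = {
    "Cyclist": {"slow_down", "keep_distance"},
    "Pedestrian": {"slow_down", "stop"},
    "Car": {"follow"},
}

if __name__ == "__main__":
    classes = credible_set(DIST, 0.95)
    print(sorted(classes), sorted(safe_actions(classes, SAFE_FOR)))
    # A higher confidence pulls in Car, and no common safe action remains.
    print(sorted(safe_actions(credible_set(DIST, 0.99), SAFE_FOR)))
```

The second call illustrates the stated limitation: once the credible set grows, an action that is safe for every class in it may no longer exist.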

Requirement 3.

The classification subsystem shall support accurate classification in the presence of confounding visual conditions within the range tolerated by humans.

Humans are effective at ignoring conditions such as varying positions relative to the object, lighting, context, occlusion, etc. However, these kinds of variations have proven to be challenging for DNN-based classifiers and are the basis for many kinds of adversarial attacks. The architecture addresses this issue by introducing the object normalizer. The Type 1 classifiers operate on the invariant representation in which the confounding effects are mostly removed.

However, since the impact of confounding visual conditions is to introduce aleatoric uncertainty into the image, the effectiveness of object normalization and subsequent classification is ultimately limited by the amount of aleatoric uncertainty present. For example, a certain amount of lighting variation can be removed, but as the light gets lower, information loss increases until it is too great to discern the object in the image. There is, therefore, a limited range of tolerable visual conditions for both humans and machines. We require that the object normalizer + classifier combination operate at least within the human range. Methods for eliciting formal requirements representing such human-tolerable ranges have been recently proposed Hu et al. (2020, 2022).

Type 1/Type 2 Consistency

We should expect that some consistency relation holds between the Type 1 and Type 2 classifications, but what should it be? As discussed above, the Type 1 classification based on visual features is inherently limited: it may achieve high accuracy for typical cases but often produces FNs and FPs for atypical cases. Furthermore, recall that for humans, the interaction between the Type 1 and Type 2 processes is not a decision fusion of redundant perceptual processes, but rather that the Type 2 process intervenes to improve on the Type 1 result when necessary and possible. This relationship is inherited by the proposed architecture. Thus, the Type 2 classification is considered authoritative and must be no worse than the Type 1 classification. The latter condition suggests that when Type 1 is TP, then so must Type 2 be, but when Type 1 is FN or FP, Type 2 may be the same or TP.
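The "no worse than Type 1" condition can be written down as a small lookup table and used, for instance, as a check over a labelled evaluation set. This is only a sketch of the relation as stated in the text; the outcome labels as strings and the helper names are assumptions.

```python
# For each Type 1 outcome, the Type 2 outcomes permitted by the
# classification consistency condition described in the text.
ALLOWED_TYPE2 = {
    "TP": {"TP"},          # a correct Type 1 result must stay correct
    "FN": {"FN", "TP"},    # an error may persist or be corrected
    "FP": {"FP", "TP"},
}


def is_consistent(type1_outcome, type2_outcome):
    """True if the Type 2 outcome is no worse than the Type 1 outcome."""
    return type2_outcome in ALLOWED_TYPE2[type1_outcome]


def violations(pairs):
    """Return (index, type1, type2) for every inconsistent pair."""
    return [
        (i, t1, t2)
        for i, (t1, t2) in enumerate(pairs)
        if not is_consistent(t1, t2)
    ]


if __name__ == "__main__":
    observed = [("TP", "TP"), ("FN", "TP"), ("TP", "FP"), ("FP", "FP")]
    print(violations(observed))
```

Any reported violation marks an input on which Type 2 reasoning degraded a correct Type 1 result, which the architecture is meant to rule out.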

Note that we do not assume the Type 2 classification is necessarily always TP even though it is considered authoritative, since its accuracy is still limited when excessive aleatoric uncertainty is present. Furthermore, the degree of improvement over the Type 1 classification is limited by the reasoning time available, richness and correctness of the conceptual knowledge base and accuracy of the Type 1 classifiers used to answer queries.

Until now we have been discussing the kind of classification consistency that must hold between the Type 1 and Type 2 classification. Another kind of consistency is risk consistency—how is the safety of the classifications related? If we assume that a correct classification is always at least as safe (i.e., leads to a driving policy action that is not more hazardous) as a misclassification, then our classification consistency requirement implies that, when time is not a safety-critical factor, the Type 2 classification is always at least as safe as the Type 1 classification.

However, not all misclassifications are unsafe. For example, misclassifying a pedestrian as a cyclist, when it is still far ahead, may not lead to different behaviour by an ADS. Thus, the hazardousness of a given misclassification is situation-dependent. Can this fact be exploited to produce a stronger risk consistency requirement? In an assurance case, a fine-grained analysis of hazardous patterns of misperceptions relevant in different driving scenarios can provide a correspondingly fine-grained and risk-aware set of performance requirements for the Type 1 classifiers Salay et al. (2022). Such a set of requirements identifies the kinds of images that are more likely to cause hazardous actions if misclassified, so the training of Type 1 classifiers can focus more on these.

Another approach to stronger risk consistency is based on the credible set approach to representing uncertainty discussed above. If a Type 1 classifier produces the credible set for a required level of confidence $\alpha$ as output, then even though uncertainty is present, a driving policy can still perform a safe action, if one exists. In limited operational design domains, it may be possible to show that a safe action exists in every situation for any subset of classes. Thus, in such a restricted context we can satisfy the following additional risk consistency condition: the Type 1 classification will be as safe as the Type 2 classification at least $100\alpha$% of the time. Note that, even here, a Type 2 classification is still preferable when time permits because the action based on an uncertain Type 1 classification is more conservative than necessary and may hamper other ADS objectives such as progress or comfort.


Although the proposed architecture is human-inspired, this alone is not sufficient to justify it. In this section, we validate the architecture by analyzing the feasibility and assurability of the components.

Feasibility of Architecture Components

We briefly review existing work that could address the requirements of architecture components.

Object Normalization

The field of computer graphics studies how to render object and image-taking specifications (e.g., 3D mesh, light sources, textures, camera position, etc.) into an object image. The problem of inverse graphics is how to produce such a specification from an object image; thus, it performs the task of object normalization. Solving the inverse graphics problem is an active research area and various recent approaches using neural networks have been proposed (e.g., Yao et al. (2018); Deng et al. (2019); Yildirim et al. (2020)). The idea of capsule networks is a prominent approach Hinton et al. (2018) where the network learns an object class by decomposing it into object parts and their structural relationships.

Another field that is relevant here is embodied AI Chrisley (2003). Humans learn about objects by engaging with them directly in the world. In this way, they automatically learn what aspects of their experience are irrelevant to tasks such as classification (e.g., their position relative to the object or the orientation of the object). Artificial agents may be able to obtain these same benefits if they learn in a similar way rather than being trained from a predefined dataset of static images Smith and Gasser (2005). To facilitate this, various simulation tools have been developed that allow artificial agents to roam and interact in a simulated world to learn about it directly Duan et al. (2021). An example of applying this to object normalization is to learn spatial invariants of object classification by approaching objects in different ways in the simulated world Caudell et al. (2011).

Type 1 Classification

An emerging trend for DNNs is dynamic inference where the DNN can exit early if needed Teerapittayanon et al. (2016). This can implement the speed/accuracy tradeoff observed in the ventral stream.
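The early-exit idea can be sketched independently of any particular network. Below, each "stage" is a stub standing in for an intermediate exit of a DNN, ordered from cheapest to most accurate; inference stops at the first stage whose top probability clears a confidence level, or when the allowed number of stages runs out. The stub outputs, the parameter names, and the exit rule are illustrative assumptions and do not reproduce the method of Teerapittayanon et al. (2016).

```python
def early_exit_inference(x, stages, exit_confidence=0.9, max_stages=None):
    """Run stages in order and stop at the first sufficiently confident one.

    stages: list of functions x -> dict(class -> probability), ordered from
            cheapest/least accurate to most expensive/most accurate.
    Returns (predicted_class, confidence, stages_used).
    """
    limit = len(stages) if max_stages is None else min(max_stages, len(stages))
    if limit < 1:
        raise ValueError("at least one stage must be allowed to run")
    for used, stage in enumerate(stages[:limit], start=1):
        dist = stage(x)
        label = max(dist, key=dist.get)
        if dist[label] >= exit_confidence:
            break                       # confident enough: exit early
    return label, dist[label], used


# Stub exits with hypothetical outputs that sharpen as depth increases.
STAGES = [
    lambda x: {"Cyclist": 0.6, "Pedestrian": 0.4},
    lambda x: {"Cyclist": 0.3, "Pedestrian": 0.7},
    lambda x: {"Cyclist": 0.05, "Pedestrian": 0.95},
]

if __name__ == "__main__":
    print(early_exit_inference(None, STAGES))                # accuracy first
    print(early_exit_inference(None, STAGES, max_stages=1))  # speed first
```

Limiting `max_stages` corresponds to prioritizing speed: in this contrived example the single-stage answer differs from the answer obtained when all stages are allowed to run.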

Classifiers that use DNNs are typically structured as a series of convolutional layers followed by fully connected layers. The lack of interpretability of these approaches limits their applicability as a Type 1 classifier when safety assurance is required. Alternative, interpretable DNN architectures based on prototype or exemplar approaches have recently been investigated and have shown positive results (e.g., Li et al. (2018); Hase et al. (2019); Papernot and McDaniel (2018)). The paper “This looks like that: deep learning for interpretable image recognition” Chen et al. (2018) is a good example of such architectures. Here, a classifier for different bird species is developed by learning, for each class, a set of prototypical image fragments taken from training images. Inference is then done by judging the similarity of the learned prototypes to an input image and assigning the image to the class with the best fit.
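The inference step of a prototype-based classifier can be sketched as follows. This is a deliberately simplified illustration, not the architecture of the cited paper: image patches and prototypes are assumed to be plain feature vectors, similarity is assumed to be 1 / (1 + squared distance), and class scores are assumed to be the mean similarity over a class's prototypes. All names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prototype_similarity(patches, prototype):
    """Similarity of the best-matching image patch to one prototype."""
    return max(1.0 / (1.0 + squared_distance(p, prototype)) for p in patches)

def classify_by_prototypes(patches, prototypes_by_class):
    """Assign the image to the class whose prototypes fit it best.

    Returns the winning label plus the per-prototype similarities, which
    serve as inspectable evidence ("this patch looks like that prototype").
    """
    evidence = {
        label: [prototype_similarity(patches, proto) for proto in protos]
        for label, protos in prototypes_by_class.items()
    }
    scores = {label: sum(sims) / len(sims) for label, sims in evidence.items()}
    best = max(scores, key=scores.get)
    return best, evidence
```

Because the decision is a function of similarities to stored prototypes, a reviewer can inspect both the prototypes and the evidence returned for any individual decision.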

Type 2 Classification

The kind of reasoning needed here is focused on explaining what the object image is. Thus, approaches to abductive reasoning are applicable. As discussed above, the classes used by ADSs often do not possess a common set of necessary conditions; thus, traditional monotonic logics may be inappropriate. Non-monotonic logics (e.g., default logic) have been developed to express class membership rules which allow exceptions. Case-based reasoning aligns well with exemplar-based categorization. Description logics have concepts as first class entities and have been extended to support prototype-based reasoning (e.g., Baader and Ecke (2016)). Reasoning using formalized “commonsense” theories provides a way to utilize human conceptual knowledge about various domains (e.g., physics of objects) Davis (2017); Suchan et al. (2021). Another line of research relevant here concerns formal executable models of conceptual categorization such as Dual PECCS Lieto et al. (2017) that incorporates both prototype and exemplar based reasoning.
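To illustrate class membership rules that allow exceptions, the sketch below uses a much-simplified stand-in for default logic: rules are tried from most to least specific, and a rule fires only if its prerequisites are observed and none of its exceptions are. The rule contents (pedestrian, billboard) and all names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DefaultRule:
    conclusion: str
    prerequisites: frozenset
    exceptions: frozenset = frozenset()

def classify_with_defaults(observations, rules):
    """Return the conclusion of the first applicable rule, or None.

    Rules are assumed to be ordered from most to least specific. The
    behavior is non-monotonic: adding an observation can withdraw a
    conclusion that held before.
    """
    observed = set(observations)
    for rule in rules:
        applicable = rule.prerequisites <= observed
        blocked = bool(rule.exceptions & observed)
        if applicable and not blocked:
            return rule.conclusion
    return None

# Hypothetical rules: a human shape is a pedestrian by default,
# unless it is known to be printed on a billboard.
RULES = [
    DefaultRule("picture_of_pedestrian", frozenset({"human_shape", "on_billboard"})),
    DefaultRule("pedestrian", frozenset({"human_shape"}), frozenset({"on_billboard"})),
]
```

With only "human_shape" observed, the default conclusion is "pedestrian"; once "on_billboard" is also observed, that conclusion is retracted in favor of the more specific one.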

Safety Assurance

Performance Comparison

An assurance argument regarding a human baseline must rely on some performance metrics for comparing component performance to the baseline. A naive way to proceed is to use one of the many performance metrics that have been proposed to compare the performance of different classifiers (e.g., accuracy, precision, F1-score). Such “generic” metrics are problematic for several reasons. First, such comparisons should be “species-fair” and not be biased by operational differences  Firestone (2020). For example, the retina is high resolution in the fovea but loses resolution and is color blind at the periphery. Thus, it sees an image differently than a DNN that gets an image as a uniform pixel grid. This difference can result in different classification accuracy of an image even if this has nothing to do with classification knowledge.

Second, comparisons should be risk-aware: performance differences in a context that is not safety relevant are not important. One way to achieve this is to define specialized perception performance metrics for different hazardous driving scenarios Salay et al. (2022). Finally, because generic metrics average performance over many trials, an AIP may obtain the same value as a human on the metric but still make what seem to humans like unjustifiable errors (e.g., adversarial examples), undermining the assurance argument. To address this, performance measurements should be made for different difficulty categories for humans. In particular, cases that are easy for humans (e.g., variations due to confounding visual conditions) should also be easy for the AIP; adversarial examples violate this condition. Furthermore, an error consistency metric is needed here, which measures the degree to which the AIP makes the same decision as a human on individual trials Geirhos et al. (2020). A high error consistency provides evidence that the AIP is following a similar strategy to the human in its classification decision. Note, however, that we are only interested in preserving strategies where humans make correct decisions and do not want to replicate their weaknesses.
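A minimal sketch of an error consistency computation is given below, assuming the metric is formulated as Cohen's kappa over per-trial correctness: the observed rate at which human and AIP are both right or both wrong is compared with the rate expected by chance from their accuracies alone. The function name and input format are illustrative assumptions.

```python
def error_consistency(human_correct, model_correct):
    """Cohen's kappa over trial-by-trial correctness.

    Inputs are equal-length sequences of booleans, one entry per trial,
    stating whether the human (resp. the model) classified it correctly.
    Returns 1.0 for identical error patterns and 0.0 when the overlap is
    exactly what independent observers with these accuracies would show.
    """
    n = len(human_correct)
    if n == 0 or n != len(model_correct):
        raise ValueError("inputs must be non-empty and of equal length")
    acc_human = sum(human_correct) / n
    acc_model = sum(model_correct) / n
    # Observed consistency: both right or both wrong on the same trial.
    observed = sum(h == m for h, m in zip(human_correct, model_correct)) / n
    # Consistency expected by chance given only the two accuracies.
    expected = acc_human * acc_model + (1 - acc_human) * (1 - acc_model)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance consistency is 1")
    return (observed - expected) / (1 - expected)
```

Two observers with equal accuracy can still score near zero if their errors fall on different trials, which is exactly the situation that an averaged metric such as accuracy hides.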

Object Normalization

The object normalizer is where the confounding effects of visual conditions are explicitly addressed in the architecture. Thus, the assurance argument regarding robustness to adversarial cases focuses here. Furthermore, since we take human performance as a baseline, the performance of the normalizer needs only to be assured up to human tolerable bounds on these conditions (e.g., maximum level of fog after which human performance is inadequate). Methods for eliciting formal requirements representing such bounds, as well as corresponding testing criteria, have been recently proposed Hu et al. (2020, 2022).

A generic DNN-based object normalizer would be reusable for different classification tasks, allowing any assurance effort to be amortized over all its applications. Thus, although not interpretable, it could be subjected to more extensive testing scrutiny. In addition, this testing effort would be robust because it is not subject to distributional shift or dependencies on community-specific norms, since “objecthood” is such a basic concept.

Techniques for formally verifying DNNs are being developed (e.g., Liu et al. (2019)). Thus, formal verification may be a possible solution for invariances that can be expressed formally as object image transformations (e.g., affine transformations or injected Gaussian noise). Formalizable aspects of object normalization may also allow non-data-driven implementation amenable to traditional assurance practices.
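One simple family of verification techniques propagates interval bounds through the network. The sketch below assumes a small fully connected ReLU network and a bounded perturbation of at most eps per input (a stand-in for noise, since true Gaussian noise is unbounded); it is sound but incomplete, so a False result does not prove that a misclassification exists. The network format and function names are illustrative assumptions.

```python
def affine_bounds(lower, upper, weights, bias):
    """Bounds on W x + b when each x[i] lies in [lower[i], upper[i]]."""
    out_lower, out_upper = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A positive weight maps lower to lower; a negative one swaps them.
            lo += w * (l if w >= 0 else u)
            hi += w * (u if w >= 0 else l)
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper

def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

def certified_invariant(x, eps, layers, target):
    """True if every input within eps of x (per coordinate) is provably
    classified as `target`. `layers` is a list of (weights, bias) pairs,
    with ReLU applied after every layer except the last."""
    lower = [v - eps for v in x]
    upper = [v + eps for v in x]
    for index, (weights, bias) in enumerate(layers):
        lower, upper = affine_bounds(lower, upper, weights, bias)
        if index < len(layers) - 1:
            lower, upper = relu_bounds(lower, upper)
    # The target's worst-case score must beat every other best-case score.
    return all(lower[target] > upper[j] for j in range(len(lower)) if j != target)
```

The check is conservative: as eps grows, the intervals widen and the guarantee is eventually lost, even if the network would in fact remain correct.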

Type 1 Classification

A significant positive impact of object normalization is to simplify the classification problem since the classifier needs only to learn the visual features of the class instances in an idealized setting. This reduces the size and diversity needed in the dataset to assure adequate sample coverage of the input distribution. It also improves generalization by reducing the likelihood of spurious correlations with noncausal features of the input.

Prototype/exemplar-based classifier approaches using DNNs provide interpretability by allowing human inspection of the prototypes/exemplars to determine whether they are meaningful. For example, in “This looks like that” discussed above, the prototype fragments of bird images can be inspected by birding experts to determine whether they are indicative of the classes they correspond to. This expert assessment provides evidence for correctness in the safety argument. Unlike the many post hoc explainability mechanisms that have been proposed for DNNs, such as saliency maps, interpretability provides the faithful explanations needed for assurance Rudin (2019).

Another potential benefit of prototype/exemplar-based classifier approaches is the alignment with how humans represent concepts. This could provide evidence that the classifier generalizes in the same way as humans, i.e., by judging similarity to prototypes (and/or exemplars) that have been validated as conforming to community or expert opinion. However, the validity of this “evidence from alignment” argument depends also on the alignment of the similarity metric used with how humans judge similarity. If generic object similarity judgement can be learned by a DNN, then, like an object normalizer, this could be a reusable component that can be given a higher degree of testing scrutiny. However, there is mixed evidence about whether this is possible. Earlier studies show comparable performance for DNN-based similarity judgment relative to humans, but a more recent study found that DNNs cannot outperform humans when more complex categorical knowledge is needed to judge similarity Jozwik et al. (2017). This suggests that similarity judgement may itself be a perception task that requires a dual process treatment to achieve human-level performance.

Type 2 Classification

The knowledge base used by reasoning here is expressed in terms of human understandable concepts; therefore, it is interpretable and inspectable. This allows verification of alignment with community-specific consensus knowledge about object classes. Additionally, since reasoning is formal and based on a logic, evidence of internal consistency (i.e., soundness) and areas of (in)completeness of the knowledge base can be facilitated using formal methods.

The requirement of classification consistency imposes an important constraint between the knowledge at the Type 1 and Type 2 levels that must be verified as part of an assurance argument. Automatic cross-validation methods between the levels could facilitate this. For example, Type 2 reasoning could be used to label images with semantic information that is then used to train or test the Type 1 classifiers. Reasoning about the scope of conceptual knowledge used by Type 2 could form the basis for completeness claims about the Type 1 classifiers and the datasets used to train and test them.
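A cross-validation pass between the two levels could be as simple as the following sketch, which runs both classifiers over the same samples and collects the disagreements for review. The sample format (a dict with an "id" key) and the two classifier callables are hypothetical placeholders for whatever Type 1 and Type 2 components are actually used.

```python
def cross_validate_levels(samples, type1_classify, type2_classify):
    """Compare Type 1 and Type 2 decisions sample by sample.

    Returns the agreement rate and a list of (sample_id, type1_label,
    type2_label) tuples for every sample on which the levels disagree.
    """
    conflicts = []
    for sample in samples:
        fast_label = type1_classify(sample)
        slow_label = type2_classify(sample)
        if fast_label != slow_label:
            conflicts.append((sample["id"], fast_label, slow_label))
    agreement = 1.0 - len(conflicts) / len(samples) if samples else 0.0
    return agreement, conflicts
```

Each conflict points either to a gap in the Type 2 knowledge base or to a weakness in the Type 1 training data, so the list can feed directly into the evidence for a consistency claim.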


Although imperfect, human perception performance is often assumed to serve as a minimum baseline for safety that a societally acceptable AIP must meet. However, it is widely known that while current state-of-the-art AIP has achieved high levels of performance using DNNs, it still falls short of this baseline. In this paper, we review research on how humans do the basic perception task of object classification. Then we propose a dual process architecture for a safety assurable object classification AIP aligned with the findings of this research. We discuss how such an architecture is both potentially feasible and assurable.

We plan to investigate several issues as part of future work. First, while this paper explores a dual process architecture for classification, the ideas must be further developed for more general perception and decision making, potentially in a unified way. This should also go beyond a single modality like vision. When a fast and critical decision needs to be made, one may need to introduce additional sensing modalities. For example, tailpipe fumes on a cold day may appear in LiDAR like a potentially solid object, but a camera image can easily remove this ambiguity. Second, an interesting next step would be to develop a safety argument template that could be evolved to drive the development of concrete AIP architectures in a safety-first manner. Finally, a key remaining limitation is the challenge of detecting, and being robust to, out-of-distribution (OOD) samples at the Type 1 level, where processing needs to be fast. We intend to explore this further, along with validating the hypothesis that Type 2 can refute Type 1 for OOD samples in the long run with sufficient accuracy. Perhaps neuroscience can be helpful here too by providing insights into how the brain deals with uncertainty and novelty. Ultimately, the lessons we can learn from the human brain may be the key to achieving assurable and societally acceptable AIP.


  • F. Baader and A. Ecke (2016) Reasoning with prototypes in the description logic using weighted tree automata. In Language and Automata Theory and Applications, pp. 63–75. Cited by: Type 2 Classification.
  • R. F. Baumeister (1984) Choking under pressure: self-consciousness and paradoxical effects of incentives on skillful performance. Journal of personality and social psychology 46 (3), pp. 610. Cited by: Dual Process Models.
  • I. Biederman (1987) Recognition-by-components: a theory of human image understanding. Psychological review 94 (2), pp. 115. Cited by: Object Image Classification.
  • S. Bracci, J. B. Ritchie, and H. O. de Beeck (2017) On the partnership between neural representations of object categories and visual features in the ventral visual pathway. Neuropsychologia 105, pp. 153–164. Cited by: Object Image Classification.
  • T. P. Caudell, C. T. Burch, M. Zengin, N. Gauntt, and M. J. Healy (2011) Retrospective learning of spatial invariants during object classification by embodied autonomous neural agents. In The 2011 International Joint Conference on Neural Networks, pp. 2135–2142. Cited by: Object Normalization.
  • C. Chen, O. Li, C. Tao, A. J. Barnett, J. Su, and C. Rudin (2018) This looks like that: deep learning for interpretable image recognition. arXiv preprint arXiv:1806.10574. Cited by: Type 1 Classification.
  • R. Chrisley (2003) Embodied artificial intelligence. Artificial intelligence 149 (1), pp. 131–150. Cited by: Object Normalization.
  • E. Davis (2017) Logical formalizations of commonsense reasoning: a survey. Journal of Artificial Intelligence Research 59, pp. 651–723. Cited by: Type 2 Classification.
  • B. Deng, S. Kornblith, and G. Hinton (2019) Cerberus: a multi-headed derenderer. arXiv preprint arXiv:1905.11940. Cited by: Object Normalization.
  • J. J. DiCarlo, D. Zoccolan, and N. C. Rust (2012) How does the brain solve visual object recognition? Neuron 73 (3), pp. 415–434. Cited by: Object Image Classification, Object Image Classification.
  • J. Duan, S. Yu, H. L. Tan, H. Zhu, and C. Tan (2021) A survey of embodied ai: from simulators to research tasks. arXiv preprint arXiv:2103.04918. Cited by: Object Normalization.
  • S. Epstein (1994) Integration of the cognitive and the psychodynamic unconscious. American psychologist 49 (8), pp. 709. Cited by: Dual Process Models.
  • J. S. B. Evans and K. E. Stanovich (2013) Dual-process theories of higher cognition: advancing the debate. Perspectives on psychological science 8 (3), pp. 223–241. Cited by: Dual Process Models, Dual Process Models.
  • M. Fabre-Thorpe (2011) The characteristics and limits of rapid visual categorization. Frontiers in psychology 2, pp. 243. Cited by: Object Image Classification.
  • C. Firestone (2020) Performance vs. competence in human–machine comparisons. Proceedings of the National Academy of Sciences 117 (43), pp. 26562–26571. Cited by: Performance Comparison.
  • R. Geirhos, K. Meding, and F. A. Wichmann (2020) Beyond accuracy: quantifying trial-by-trial behaviour of cnns and humans by measuring error consistency. arXiv preprint arXiv:2006.16736. Cited by: Performance Comparison.
  • R. L. Goldstone, A. Kersten, and P. F. Carvalho (2018) Categorization and concepts. Stevens’ handbook of experimental psychology and cognitive neuroscience 3, pp. 275–317. Cited by: Object Image Classification.
  • R. L. Goldstone and J. Y. Son (2012) Similarity. Oxford University Press. Cited by: Object Image Classification.
  • S. Harnad (1990) The symbol grounding problem. Physica D: Nonlinear Phenomena 42 (1-3), pp. 335–346. Cited by: An Assurable Human-Inspired Classification Architecture.
  • P. Hase, C. Chen, O. Li, and C. Rudin (2019) Interpretable image recognition with hierarchical prototypes. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Vol. 7, pp. 32–40. Cited by: Type 1 Classification.
  • G. E. Hinton, S. Sabour, and N. Frosst (2018) Matrix capsules with EM routing. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. Cited by: Object Normalization.
  • B. C. Hu, L. Marsso, K. Czarnecki, R. Salay, H. Shen, and M. Chechik (2022) If a human can see it, so should your system: reliability requirements for machine vision components. In 44th International Conference on Software Engineering (ICSE 2022), Note: To Appear Cited by: Addressing Safety, Object Normalization.
  • B. C. Hu, R. Salay, K. Czarnecki, M. Rahimi, G. Selim, and M. Chechik (2020) Towards requirements specification for machine-learned perception based on human performance. In 2020 IEEE Seventh International Workshop on Artificial Intelligence for Requirements Engineering (AIRE), pp. 48–51. Cited by: Addressing Safety, Object Normalization.
  • K. M. Jozwik, N. Kriegeskorte, K. R. Storrs, and M. Mur (2017) Deep convolutional neural networks outperform feature-based but not categorical models in explaining object similarity judgments. Frontiers in psychology 8, pp. 1726. Cited by: Type 1 Classification.
  • D. Kahneman (2011) Thinking, fast and slow. Macmillan. Cited by: Dual Process Models, Dual Process Models.
  • O. Li, H. Liu, C. Chen, and C. Rudin (2018) Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. Cited by: Type 1 Classification.
  • A. Lieto, D. P. Radicioni, and V. Rho (2017) Dual peccs: a cognitive system for conceptual representation and categorization. Journal of Experimental & Theoretical Artificial Intelligence 29 (2), pp. 433–452. Cited by: Type 2 Classification.
  • C. Liu, T. Arnon, C. Lazarus, C. Strong, C. Barrett, and M. J. Kochenderfer (2019) Algorithms for verifying deep neural networks. arXiv preprint arXiv:1903.06758. Cited by: Object Normalization.
  • C. R. Madan, M. L. Spetch, and E. A. Ludvig (2015) Rapid makes risky: time pressure increases risk seeking in decisions from experience. Journal of Cognitive Psychology 27 (8), pp. 921–928. Cited by: Dual Process Models.
  • D. Malowany and H. Guterman (2020) Biologically inspired visual system architecture for object recognition in autonomous systems. Algorithms 13 (7), pp. 167. Cited by: Introduction.
  • D. L. Medin and M. M. Schaffer (1978) Context theory of classification learning. Psychological review 85 (3), pp. 207. Cited by: Object Image Classification.
  • G. L. Murphy (2016) Is there an exemplar theory of concepts? Psychonomic bulletin & review 23 (4), pp. 1035–1042. Cited by: Object Image Classification.
  • N. Papernot and P. McDaniel (2018) Deep k-nearest neighbors: towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765. Cited by: Type 1 Classification.
  • T. Poggio and S. Edelman (1990) A network that learns to recognize three-dimensional objects. Nature 343 (6255), pp. 263–266. Cited by: Object Image Classification.
  • R. Ratcliff and G. McKoon (2008) The diffusion decision model: theory and data for two-choice decision tasks. Neural computation 20 (4), pp. 873–922. Cited by: Dual Process Models.
  • E. H. Rosch (1973) Natural categories. Cognitive psychology 4 (3), pp. 328–350. Cited by: Object Image Classification.
  • C. Rudin (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1 (5), pp. 206–215. Cited by: Type 1 Classification.
  • R. Salay, K. Czarnecki, M. S. Elli, I. J. Alvarez, S. Sedwards, and J. Weast (2020) PURSS: towards perceptual uncertainty aware responsibility sensitive safety with ml. In SafeAI@ AAAI, pp. 91–95. Cited by: Addressing Safety.
  • R. Salay, K. Czarnecki, H. Kuwajima, H. Yasuoka, T. Nakae, V. Abdelzad, C. Huang, M. Kahn, and V. D. Nguyen (2022) The missing link: developing a safety case for perception components in automated driving. Note: SAE Technical Paper Cited by: Addressing Safety, Type 1/Type 2 Consistency, Performance Comparison.
  • L. Smith and M. Gasser (2005) The development of embodied cognition: six lessons from babies. Artificial life 11 (1-2), pp. 13–29. Cited by: Object Normalization.
  • J. Suchan, M. Bhatt, and S. Varadarajan (2021) Commonsense visual sensemaking for autonomous driving–on generalised neurosymbolic online abduction integrating vision and semantics. Artificial Intelligence 299, pp. 103522. Cited by: Introduction, Type 2 Classification.
  • S. Teerapittayanon, B. McDanel, and H. Kung (2016) Branchynet: fast inference via early exiting from deep neural networks. In 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 2464–2469. Cited by: Type 1 Classification.
  • V. A. Thompson, J. A. P. Turner, and G. Pennycook (2011) Intuition, reason, and metacognition. Cognitive psychology 63 (3), pp. 107–140. Cited by: Dual Process Models.
  • S. Yao, T. M. H. Hsu, J. Zhu, J. Wu, A. Torralba, W. T. Freeman, and J. B. Tenenbaum (2018) 3d-aware scene manipulation via inverse graphics. arXiv preprint arXiv:1808.09351. Cited by: Object Normalization.
  • I. Yildirim, M. Belledonne, W. Freiwald, and J. Tenenbaum (2020) Efficient inverse graphics in biological face processing. Science advances 6 (10), pp. eaax5979. Cited by: Introduction, Object Normalization.
  • R. Yu (2015) Choking under pressure: the neuropsychological mechanisms of incentive-induced performance decrements. Frontiers in behavioral neuroscience 9, pp. 19. Cited by: Dual Process Models.