Modeling Supervisor Safe Sets for Improving Collaboration in Human-Robot Teams

by David L. McPherson, et al.
berkeley college

When a human supervisor collaborates with a team of robots, their attention is divided and cognitive resources are at a premium. We aim to optimize the distribution of these resources and the flow of attention. To this end, we propose the model of an idealized supervisor to describe human behavior. Such a supervisor employs a potentially inaccurate internal model of the robots' dynamics to judge safety. We represent these safety judgements by constructing a safe set from this internal model using reachability theory. When a robot leaves this safe set, the idealized supervisor will intervene to assist, regardless of whether or not the robot remains objectively safe. False positives, where a human supervisor incorrectly judges a robot to be in danger, needlessly consume supervisor attention. In this work, we propose a method that decreases false positives by learning the supervisor's safe set and using that information to govern robot behavior. We prove that robots behaving according to our approach will reduce the occurrence of false positives for our idealized supervisor model. Furthermore, we empirically validate our approach with a user study that demonstrates a significant (p = 0.0328) reduction in false positives for our method compared to a baseline safety controller.




I Introduction and Background

As automation becomes more pervasive throughout society, humans will increasingly find themselves interacting with autonomous and semi-autonomous systems. These interactions have the potential to multiply the productivity of human workers, since it will become possible for a single human to supervise the behavior of multiple robotic agents. For example, a single human driver could manage a fleet of self-driving delivery robots, but the driver would only take full control for the “last mile,” guiding the robots to precisely deposit packages in environments where autonomous navigation may not be reliable. Human experts regularly serve as failsafe supervisors on factory assembly floors staffed with robotic arms [13]. Air traffic controllers will soon have to manage completely autonomous drones flying through their airspace alongside existing traditional mixed-autonomy planes and their auto-pilots [19].


Fig. 1: Top: if a robot’s behavior does not take into account a human supervisor’s notion of safety, the misaligned expectations can degrade team performance. Bottom: When a robot acts according to a human supervisor’s expectations, the supervisor can more easily predict the robot’s behavior.

While a human may be able to successfully exert direct control over a single robot, it becomes intractable for a human to directly control teams of robots (in fact, humans often benefit from automated assistance when controlling even a single robot, as discussed in the literature on assistive teleoperation [7, 15]). In order to manage the increased complexity of multi-robot teams, the human must be able to rely on increased autonomy from the robots, freeing the human to focus their attention only on those areas where they are most needed. Our goal is to model what grabs the supervisor’s attention in order to modify robot behavior to reduce the occurrence of distractions.

This project is inspired by work like Bajcsy et al. [3] and Jain et al. [14] that learn from supervisor interventions in a “coactive” learning framework. These works apply Learning from Demonstration techniques to the more challenging domain where the given data is just a correction from a trajectory rather than a full trajectory. The authors of [3] posed this correction challenge in a model-based framework that interprets the human’s signals as resulting from an optimization problem. This inverse optimization framework has also been used in Inverse Reinforcement Learning [1, 20], which applies Inverse Optimal Control (as conceived of by Kalman [16]) to interpreting human trajectories. Our work applies the inverse optimization framework to learn from the supervisor’s decisions to intervene.

Results in cognitive science suggest that humans observing physical scenes can be modeled as performing a noisy “mental simulation” to predict trajectories [4, 18]. We posit that human supervisors utilize this same cognitive dynamic simulation to predict robot safety and intervene accordingly. Specifically, we theorize that the intervention behavior is driven by an internal “safe set” which we can attempt to reconstruct by observing supervisor interventions.

Safe sets are conceived from the Formal Methods notion of “Viability”. A set of states is “viable” if for every state in the set there exists a dynamic trajectory that stays within the set for all time. Reachability analysis calculates the largest viable set that doesn’t include any undesirable state configurations (e.g. collisions with obstacles, power overloads, etc). Since the set is viable, it is possible to guarantee that the dynamic system will always stay within the set and therefore stay safely away from the undesirable states. For this reason, viability kernels are often referred to as “safe sets”. Reachability can be used for robust path planning in dynamically changing environments [9] or working around multiple dynamic agents [5], and recent results have leveraged the technique to bound tracking error in order to generate dynamically feasible paths using simple planning algorithms [11].

Hoffman et al. used the safety guarantees of reachability analysis to engineer a multi-drone team that could automatically avoid collisions [12]. Similarly, Gillula could guarantee safety for learning algorithms by constraining their explorations to stay within the safe set [10]. Extending this, Akametalu and Tomlin [2] were able to guarantee safety while simultaneously learning and expanding the safe set. All of these controllers supervise otherwise un-guaranteed systems and intervene to maintain safety whenever the system threatens to leave the viable safe set. In this paper, we explore how this intervention behavior is similar to human supervision, and apply this to representing human safety concerns as safe sets in the state space.

II Supervisor Safe Set Control

Based on the success of cognitive dynamical models for explaining humans’ understanding of physical systems, we posit that human operators may have some notion of reachable sets which they employ to predict collisions or avoid obstacles. We propose a noisy idealized model to describe the behavior of the human supervisor of a robotic team, and we develop a framework for estimating the human supervisor’s mental model of a dynamical system based on observing their interactions with the team. We then propose a control framework that capitalizes on this learned information to improve collaboration in such human-robot teams.

II-A Preliminaries: Reachability for Safety

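Consider a dynamical system with bounded input $u$ and bounded disturbance $d$, given by

$$\dot{x} = f(x, u, d), \quad x \in \mathbb{R}^n,\ u \in \mathcal{U},\ d \in \mathcal{D}, \tag{1}$$

where $\mathcal{U}$ and $\mathcal{D}$ are compact. We let $\mathbb{U}$ and $\mathbb{D}$ denote the sets of measurable functions $u(\cdot): [0, \infty) \to \mathcal{U}$ and $d(\cdot): [0, \infty) \to \mathcal{D}$, respectively, which represent possible time histories for the system input and disturbance. Given a choice of input and disturbance signals, there exists a unique continuous trajectory $\xi(\cdot)$ from any initial state $x$ which solves

$$\dot{\xi}(s) = f(\xi(s), u(s), d(s)), \quad \xi(0) = x, \tag{2}$$

where $f$ describes the evolution of the dynamical system [6].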

Obstacles in the environment can be modeled as a “keep-out” set of states $\mathcal{K}$ that the system must avoid. We define the safety of the system with respect to this set, such that the system is considered to be safe at state $x$ over time horizon $T$ as long as we can choose $u(\cdot)$ to guarantee that there exists no time $s \in [0, T]$ for which $\xi(s) \in \mathcal{K}$. The task of maintaining the system’s safety over this interval can be modeled as a differential game between the control input and the disturbance. Consider an optimal control signal $u^*(\cdot)$ which attempts to steer the system away from $\mathcal{K}$ and an optimal disturbance $d^*(\cdot)$ which attempts to drive the system towards $\mathcal{K}$. By choosing any Lipschitz payoff function $l(x)$ which is negative-valued for $x \in \mathcal{K}$ and positive for $x \notin \mathcal{K}$, we can encode the outcome of this game via a value function $V(x, t)$ characterized by the following Hamilton-Jacobi-Isaacs variational inequality [8]:

$$\min\left\{ \frac{\partial V}{\partial t}(x, t) + \max_{u \in \mathcal{U}} \min_{d \in \mathcal{D}} \nabla_x V(x, t) \cdot f(x, u, d),\ \ l(x) - V(x, t) \right\} = 0, \qquad V(x, T) = l(x). \tag{3}$$

The value function $V(x, t)$ that satisfies the above conditions is equal to $\min_{s \in [t, T]} l(\xi(s))$ for the trajectory $\xi(\cdot)$ with $\xi(t) = x$ driven by an optimal control $u^*(\cdot)$ and an optimal disturbance $d^*(\cdot)$. We can therefore find the set of states $\mathcal{R}$ from which we cannot guarantee the safety of the system on the interval $[0, T]$, also known as the backward-reachable set of $\mathcal{K}$ over this interval. That is, for all initial states $x \in \mathcal{R}$ and feedback control policies $u(\cdot)$, there exists some disturbance $d(\cdot)$ such that $\xi(s) \in \mathcal{K}$ for some $s \in [0, T]$.

If there exists a non-empty controlled-invariant set $\Omega$ that does not intersect $\mathcal{K}$, then we deem this set a “safe set” because there exists a feedback policy that guarantees that the system remains in $\Omega$, and thus out of $\mathcal{K}$, for all time. It follows from their properties that $\Omega$ is the complement of $\mathcal{R}$, and the relationship between $\mathcal{K}$, $\mathcal{R}$, and $\Omega$ is visualized in Fig. 2. Within a safe set $\Omega$, the value function $V(x, t)$ becomes independent of $t$ as $T \to \infty$ [8]. Because we focus on the case where the system is initialized to some safe state $x \in \Omega$ and we aim to maintain $\xi(s) \in \Omega$ for all $s \ge 0$, we simplify notation by writing $V(x)$ and $\mathcal{R}$ for the time-independent value function and backward-reachable set.

Fig. 2: Illustration of the relationship between a keep-out set $\mathcal{K}$, the derived backward-reachable set $\mathcal{R}$, and the resulting safe set $\Omega$. Note that $\mathcal{K} \subseteq \mathcal{R}$, and $\Omega$ is equal to the complement of $\mathcal{R}$. This illustration approximates the result obtained using the Dubins car dynamics given in (10).

One approach to guaranteeing the safety of the system is to apply a “minimally invasive” controller which activates on the zero level set of $V$ [10]. This approach allows complete flexibility of control as long as $V(x) > 0$, and applies the optimal control $u^*$ to avoid $\mathcal{K}$ when $x$ reaches the boundary of $\Omega$. We refer the interested reader to [10, 8] for a more thorough treatment of reachability and minimally invasive controllers.
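The switching rule of a minimally invasive controller can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function names, the one-dimensional toy state, and the hand-written `toy_value` function (a stand-in for a value function that would normally come from solving the variational inequality) are all hypothetical and not taken from the paper.

```python
from typing import Callable, List


def minimally_invasive_control(
    x: float,
    value_fn: Callable[[float], float],
    nominal_fn: Callable[[float], float],
    safe_fn: Callable[[float], float],
    level: float = 0.0,
) -> float:
    """Pass the nominal control through while value_fn(x) > level,
    otherwise override it with the safety-preserving control."""
    if value_fn(x) > level:
        return nominal_fn(x)
    return safe_fn(x)


# --- Hypothetical 1-D toy problem (assumption, for illustration only) ---
WALL = 5.0  # the keep-out set is x >= WALL


def toy_value(x: float) -> float:
    # Stand-in value function: positive while more than 1 unit from the wall.
    return (WALL - 1.0) - x


def toy_nominal(x: float) -> float:
    return 1.0  # the planner always wants to drive toward the wall


def toy_safe(x: float) -> float:
    return -1.0  # the safety control backs away from the wall


def simulate(x0: float, steps: int = 100, dt: float = 0.1) -> List[float]:
    """Euler-integrate x' = u under the minimally invasive controller."""
    x, trace = x0, [x0]
    for _ in range(steps):
        u = minimally_invasive_control(x, toy_value, toy_nominal, toy_safe)
        x += dt * u
        trace.append(x)
    return trace
```

In this toy run the state advances freely until the stand-in value function reaches zero and then hovers near that boundary, never reaching the wall. The `level` argument is exposed so that the same rule can be applied at a level set other than zero.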

II-B Noisy Idealized Supervisor Model

We define an idealized model of the supervisor of a robotic team whose responsibility it is to ensure that no robots collide with obstacles represented by the keep-out set $\mathcal{K}$. The idealized supervisor behaves as a minimally invasive controller as described in Section II-A. However, while the robotic team members’ true dynamics are given by the function $f$ as in (1), the supervisor possesses an internal model of the robots’ dynamics given by $f_H$, which is not necessarily equal to the true dynamics. Following the differential game characterized by (3), the supervisor also possesses an internal value function $V_H$ and safe set $\Omega_H$ which they use to evaluate the safety of each state in the environment. We allow for the possibility that the supervisor adds some amount of margin to their internal safe set, such that the boundary of $\Omega_H$ is the $\epsilon$ level set of $V_H$ for some margin $\epsilon$. Therefore, the idealized supervisor will always intervene when a robotic team member reaches the $\epsilon$ level set of $V_H$, rather than the zero level set of the true $V$. We further specify that the idealized supervisor is conservative: $\Omega_H \subseteq \Omega$. This condition implies that the supervisor will never let a robot teammate leave the true safe set, since every state in $\Omega_H$ also lies in $\Omega$. Additionally, we propose a noisy version of this idealized supervisor: the noisy idealized supervisor will intervene when they observe a robot reach the $\tilde{\epsilon}$ level set of $V_H$, where $\tilde{\epsilon}$ is drawn from $\mathcal{N}(\epsilon, \sigma^2)$ whenever a supervisor makes a safety judgement.

II-C Learning Safe Sets from Supervisor Interventions

We choose to model the human supervisor of a robotic team as approximating the behavior of the idealized supervisor model presented in Section II-B. That is, the human supervisor will allow the robots to perform their task however they choose, but intervene whenever they perceive that a robot is approaching an obstacle in the state space. Given this model, we can interpret the points at which the human intervenes as corresponding to the unknown $\epsilon$ level set of some value function $V_H$, which characterizes the human’s mental safe set $\Omega_H$. Our goal is to use observations of human interventions to derive an estimated value function $\hat{V}$ and level $\hat{\epsilon}$ which describe the observed behavior and induce an estimated safe set $\hat{\Omega}$. We approach this task by deriving a Maximum Likelihood Estimator (MLE) of the human’s mental safe set. If we assume that a human supervisor always intends to intervene at the $\epsilon$ level set of $V_H$, but their ability to precisely intervene at this level is subject to Gaussian noise, either from observation error or variability in reaction time, then we can consider the value $v = V_H(x)$ at an intervention point $x$ as being drawn from a normal distribution centered at $\epsilon$ (that is, $v \sim \mathcal{N}(\epsilon, \sigma^2)$).

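Given a proposed value function $\hat{V}$ and a set of intervention points $x_1, \dots, x_N$ with corresponding values $v_i = \hat{V}(x_i)$, we wish to estimate the most likely $\epsilon$ and $\sigma$ to explain these interventions. Gaussian distributions induce the following probability density function for a single observation $v_i$

$$p(v_i \mid \epsilon, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(v_i - \epsilon)^2}{2\sigma^2}\right), \tag{4}$$

which leads to the following probability density for a set of $N$ independent observations

$$p(v_1, \dots, v_N \mid \epsilon, \sigma) = \prod_{i=1}^{N} p(v_i \mid \epsilon, \sigma) = \left(2\pi\sigma^2\right)^{-N/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(v_i - \epsilon)^2\right). \tag{5}$$
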
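The likelihood of any estimated parameter values $\hat{\epsilon}$ and $\hat{\sigma}$ being correct, given the observations and the proposed value function $\hat{V}$, is expressed as $\mathcal{L}(\hat{\epsilon}, \hat{\sigma} \mid v_1, \dots, v_N, \hat{V})$. It can be shown that the values of the unknown parameters $\hat{\epsilon}$ and $\hat{\sigma}$ that maximize the likelihood function are given by

$$\hat{\epsilon} = \frac{1}{N}\sum_{i=1}^{N} v_i, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} (v_i - \hat{\epsilon})^2, \tag{6}$$

which are simply the mean and variance of the set of observations.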
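As a minimal sketch of this estimator, the snippet below computes the sample mean and the (biased, divide-by-N) sample variance of the values observed at intervention points, together with the Gaussian log-likelihood they maximize. The function names and the example numbers are illustrative assumptions, not part of the original work.

```python
import math
from typing import Sequence, Tuple


def fit_intervention_level(values: Sequence[float]) -> Tuple[float, float]:
    """Maximum likelihood estimates (eps_hat, var_hat) for values assumed
    to be drawn i.i.d. from a normal distribution N(eps, sigma^2)."""
    n = len(values)
    eps_hat = sum(values) / n
    # The MLE of the variance divides by n, not n - 1.
    var_hat = sum((v - eps_hat) ** 2 for v in values) / n
    return eps_hat, var_hat


def log_likelihood(values: Sequence[float], eps: float, var: float) -> float:
    """Gaussian log-likelihood of the observations for a given (eps, var)."""
    n = len(values)
    sq_err = sum((v - eps) ** 2 for v in values)
    return -0.5 * n * math.log(2.0 * math.pi * var) - sq_err / (2.0 * var)


# Hypothetical values of a proposed value function at three interventions.
example_values = [0.2, 0.4, 0.6]
eps_hat, var_hat = fit_intervention_level(example_values)
```

Evaluating `log_likelihood` at any other level or variance gives a smaller number than at `(eps_hat, var_hat)`, which is a quick sanity check on the closed-form estimates.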

Notice that the estimates given by (6) are computed with respect to a given value function $\hat{V}$. If we were to assume that the human supervisor has a perfect model of the system dynamics, then we could simply set $\hat{V}$ to equal the true $V$ of the system in (1), and $\hat{\epsilon}$ would be the maximum likelihood estimate for the level at which the supervisor will intervene. However, it is unlikely that a human supervisor’s notion of the dynamics will correspond exactly to this model, and we would like to maintain the flexibility of estimating value functions that are not strictly derived from (1). To this end, we define the maximum likelihood of $\hat{V}$ being the $V_H$ that produced our observations as $\mathcal{L}^*(\hat{V})$. The value of $\mathcal{L}^*(\hat{V})$ is obtained by substituting the estimates from (6) into the probability density function from (5). That is, $\mathcal{L}^*(\hat{V}) = p(v_1, \dots, v_N \mid \hat{\epsilon}, \hat{\sigma})$.

We seek the most likely value function to explain our observations, which will be the value function with the greatest maximum likelihood (the maximum over maxima)

$$\hat{V}^* = \arg\max_{\hat{V} \in \mathcal{V}} \mathcal{L}^*(\hat{V}), \tag{7}$$

where $\mathcal{V}$ is the set of all possible value functions.

In order to make this optimization tractable, we can restrict ourselves to a set of value functions $V_\beta$ corresponding to a family of dynamics functions $f_\beta$ parameterized by $\beta$, making the optimization in question

$$\beta^* = \arg\max_{\beta} \mathcal{L}^*(V_\beta). \tag{8}$$

In practice, we may not be able to find an expression for the gradient of $\mathcal{L}^*(V_\beta)$ with respect to $\beta$, since the value function is derived from the dynamics via the differential game given by (3). The lack of a gradient expression restricts the use of numerical methods to solve the problem as presented in (8). In these cases, we can compute a representative library of value functions $V_{\beta_1}, \dots, V_{\beta_M}$ corresponding to a set of representative parameter values $\beta_1, \dots, \beta_M$ (see Fig. 3 for an example library). The optimization then reduces to choosing the most likely value function from among this library

$$\hat{V}^* = \arg\max_{j \in \{1, \dots, M\}} \mathcal{L}^*(V_{\beta_j}). \tag{9}$$
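The library search can be sketched as a simple loop: evaluate each candidate value function at the intervention states, fit the level and variance in closed form, and keep the candidate whose maximized log-likelihood is largest. Everything concrete below is an assumption made for illustration: the dictionary-based library, the hand-written `toy_value_function` (real entries would be precomputed reachability solutions), the state layout, and the variance floor that guards against a degenerate perfect fit.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

State = Tuple[float, float]  # hypothetical (distance_to_obstacle, speed)


def max_log_likelihood(values: Sequence[float],
                       var_floor: float = 1e-9) -> Tuple[float, float, float]:
    """Return (log L*, eps_hat, var_hat) with the closed-form Gaussian MLE
    substituted back into the log-likelihood."""
    n = len(values)
    eps_hat = sum(values) / n
    var_hat = max(sum((v - eps_hat) ** 2 for v in values) / n, var_floor)
    # At the MLE the squared-error term collapses to n / 2.
    log_l = -0.5 * n * (math.log(2.0 * math.pi * var_hat) + 1.0)
    return log_l, eps_hat, var_hat


def select_value_function(
    library: Dict[float, Callable[[State], float]],
    interventions: Sequence[State],
) -> Tuple[float, float, float]:
    """Pick the library parameter with the greatest maximized likelihood.
    Returns (best_param, eps_hat, var_hat)."""
    best = None
    for param, value_fn in library.items():
        values = [value_fn(x) for x in interventions]
        log_l, eps_hat, var_hat = max_log_likelihood(values)
        if best is None or log_l > best[0]:
            best = (log_l, param, eps_hat, var_hat)
    return best[1], best[2], best[3]


def toy_value_function(beta: float) -> Callable[[State], float]:
    # Stand-in for a precomputed value function: less control authority
    # (smaller beta) means the margin shrinks faster with speed.
    return lambda x: x[0] - x[1] / beta


toy_library = {beta: toy_value_function(beta) for beta in (1.0, 2.0, 3.0)}
toy_interventions = [(1.02, 1.0), (1.48, 2.0), (2.0, 3.0)]
best_beta, eps_hat, var_hat = select_value_function(toy_library, toy_interventions)
```

In this toy data the interventions sit almost exactly on one level set of the `beta = 2.0` candidate, so that entry has the smallest fitted variance and is selected.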

Fig. 3: Two-dimensional slices of the zero level sets of the value functions from the library used for the experiment described in Section III. We used a family of Dubins car dynamics (see (10)) parametrized by $\beta$. Notice that as $\beta$ decreases (the modeled control authority is decreased), the level sets extend farther away from the obstacle, indicating that a robot is expected to turn earlier to guarantee safety.

In order to ensure that the learned safe set is conservative, we can extend our MLE to a Maximum A Posteriori (MAP) estimator by incorporating our prior belief that, regardless of the safe set that the supervisor uses to generate interventions, they do not want the robots to be unsafe with respect to the true dynamics. In this case, we maintain a uniform prior that assigns equal probability to all $\hat{V}$ whose zero sublevel sets are supersets of the zero sublevel set of the true $V$, and zero probability to all other $\hat{V}$. In other words, we assume that the supervisor does not overestimate the agility of the robots, and in practice we can enforce this condition by choosing the library in (9) to only contain appropriate value functions. Moreover, regardless of the choice of $\hat{V}$, we assume that the supervisor intends to intervene before reaching the zero level set of $\hat{V}$, which always includes the boundary of $\mathcal{K}$. If we choose a prior that assigns zero probability to all non-positive $\epsilon$ and uniform probability elsewhere, it can be shown that the MAP estimates are obtained by letting $\hat{\epsilon}$ equal the larger of zero and the sample mean of the observed values, and otherwise proceeding as before. Fig. 4 provides an example of this algorithm estimating a safe set from human supervisor intervention data.
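A hedged sketch of the level estimate under such a prior follows. It assumes the constrained maximizer is obtained by clamping the sample mean at zero and then computing the variance about the clamped level; this follows for a Gaussian likelihood because the likelihood, maximized over the variance, only decreases as the mean squared deviation about the chosen level grows. Since the prior excludes zero itself, the clamped value should be read as the boundary limit. The function name and sample values are illustrative assumptions.

```python
from typing import Sequence, Tuple


def map_intervention_level(values: Sequence[float]) -> Tuple[float, float]:
    """Estimate (eps_hat, var_hat) with the level restricted to be
    non-negative: clamp the sample mean at zero, then measure the
    spread of the observations about the clamped level."""
    n = len(values)
    eps_hat = max(0.0, sum(values) / n)
    var_hat = sum((v - eps_hat) ** 2 for v in values) / n
    return eps_hat, var_hat


# Hypothetical observations whose mean is negative: the level is clamped.
clamped = map_intervention_level([-0.3, 0.1, -0.1])

# Hypothetical observations whose mean is positive: identical to the MLE.
unclamped = map_intervention_level([0.2, 0.4, 0.6])
```

When the sample mean is already positive the result coincides with the unconstrained estimate, so the prior only changes the answer for supervisors whose interventions cluster at or beyond the zero level set of the candidate value function.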

Fig. 4: An example data set from the experiment described in Section III. The red circles represent the location of supervisor interventions, and the colored background represents the learned value function with contour lines shown in black. In this case, the learning algorithm chose a dynamics model parametrized by .

II-D Team Control with Learned Safe Sets

We propose that safe sets learned according to the approach in Section II-C can be used to create effective control laws for the robotic members of human-robot teams. Recall our model of the human supervisor of a robotic team: the supervisor must rely on each robot’s autonomy to complete the majority of their tasks unassisted, but the supervisor may intervene to correct a robot’s behavior when necessary (such as by avoiding an imminent collision with the keep-out set $\mathcal{K}$). We put forth that in the scenario where the human intervenes to prevent a collision, they do so because they observe that a robot has violated the boundaries of their mental safe set $\Omega_H$.

Now, consider a team of robots navigating an unknown environment, and which are able to avoid any obstacles that they detect. One approach to safely automating this team is to have each robot behave according to a minimally invasive control law: the robots are allowed to follow trajectories generated by any planning algorithm, so long as they remain within $\Omega$, the safe set computed using the baseline dynamics model (1) with associated value function $V$. Whenever these robots detect an obstacle, they add it to the keep-out set $\mathcal{K}$, thus modifying $V$ and $\Omega$. If a robot reaches the boundary of $\Omega$, it applies the optimal control to avoid $\mathcal{K}$ until it has cleared the obstacle. However, it is possible that a robot does not detect an obstacle, and a human supervisor must intervene to ensure robot safety.

As stated above, the human supervisor will intervene when a robot reaches the boundary of $\Omega_H$, not the boundary of $\Omega$. This discrepancy leads to the possibility that the supervisor will intervene when the robot reaches some state $x$, even if the robot would have avoided the obstacle without intervention. These situations arise whenever $x \notin \Omega_H$ but $x \in \Omega$. These “false positive” interventions represent unnecessary work for the human supervisor, and we seek to eliminate them in order to improve the human’s experience and the team’s overall performance.

We propose using a safe set $\hat{\Omega}$ learned from previous observations of supervisor interventions, as outlined in Section II-C, as a substitute for $\Omega$ in the robots’ minimally invasive control law. By estimating the human’s internal safe set, we take advantage of the following property:


For an idealized supervisor collaborating with a team of robots as described in Section II-D, if the robots avoid detected obstacles by applying an optimally safe control at the boundary of safe set $\Omega_H$, then if the supervisor plans to intervene because they observe $x_i \notin \Omega_H$ for robot $i$, the supervisor can infer that robot $i$ has not detected an obstacle and any supervisor intervention will not be a false positive.


The proof of this property follows constructively from the definitions of safe set, idealized supervisor, and false positive. If robot $i$ had correctly detected an obstacle and adjusted its representation of $\mathcal{K}$ accordingly, then it would have applied the optimal control to remain within the supervisor’s safe set. Therefore, if the supervisor is able to observe that robot $i$ has left $\Omega_H$, it must be the case that the robot has not detected the obstacle. False positives are defined to be supervisor interventions that occur when a robot has detected an obstacle but the supervisor still intervenes. In this case, the supervisor can correctly infer that robot $i$ has not detected an obstacle, so any intervention at this point cannot be a false positive. ∎

For an idealized supervisor, as $\hat{\Omega}$ becomes an arbitrarily good approximation of $\Omega_H$, the number of false positive interventions will approach zero. For a noisy idealized supervisor, the supervisor will intervene whenever $V_H(x) \le \tilde{\epsilon}$ where $\tilde{\epsilon} \sim \mathcal{N}(\epsilon, \sigma^2)$. This noise will continue to produce false positives, even with a perfect fit $\hat{\Omega} = \Omega_H$, if the robots apply the optimally safe control at the $\hat{\epsilon}$-level set of $\hat{V}$. Instead, the level set where the optimally safe control is applied can be raised arbitrarily high to drive the false positive rate to zero. For example, $\hat{\epsilon} + 2\hat{\sigma}$ is sufficient to avoid over 97% of intervention states used for learning, in expectation. We test the efficacy of our approach through the human-subjects experiment described in Section III.
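The 97% figure can be checked directly from the Gaussian noise model: if intervention levels are normally distributed about the fitted mean, then a controller that acts at k fitted standard deviations above the mean stays clear of a fraction of intervention levels equal to the standard normal CDF evaluated at k. The helper names below are illustrative assumptions.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Cumulative distribution function of N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_avoided_fraction(k: float) -> float:
    """Expected fraction of Gaussian-distributed intervention levels that lie
    below a control level set k standard deviations above the mean."""
    return standard_normal_cdf(k)


def control_level(eps_hat: float, sigma_hat: float, k: float = 2.0) -> float:
    """Level of the value function at which the robot applies the safe control."""
    return eps_hat + k * sigma_hat


two_sigma = expected_avoided_fraction(2.0)   # roughly 0.977
at_mean = expected_avoided_fraction(0.0)     # 0.5
```

With `k = 0` the controller acts at the fitted mean level and only half of the intervention levels are avoided in expectation, while `k = 2` gives the "over 97%" coverage quoted above.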

III Experimental Design for User Validation

Our goal in understanding and modeling the supervisor’s conception of safety is to improve team performance by decreasing cognitive overload. Although we have based our human modeling on the cognitive science literature, we do not intend to verify humans’ exact cognitive processes. Instead, we aim to apply our inspiration from cognitive science toward building better human-robot teams. To this end, our hypotheses are:

III-1 H1

Representing supervisor behavior as cognitive keep-out sets allows intervention signals to be distilled into an actionable rule which will decrease supervisory false positives and cognitive strain, thereby increasing team performance and trust.

III-2 H2

Fitting danger-avoidance behavior to a supervisor’s beliefs is preferable to generic conservative behavior.

In our experiment, we gather supervisor intervention data, fit our model to the data, and then run a human-robot teaming task that assesses performance.

Fig. 5: Safe sets tested in experiment (illustrated by their complementary reachable set): (left) Standard safe set (calculated from true dynamics and obstacle size), (middle) example Learned safe set (calculated from fitted supervisory perception of dynamics and obstacle size), (right) Conservative safe set (calculated from true dynamics and inflated obstacle size)

III-A Procedure

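Our experiment applies the idealized supervisor theory and learning algorithm to supervising simulated robots. The robots moved according to the Dubins car model:

$$\dot{x} = v\cos\theta, \qquad \dot{y} = v\sin\theta, \qquad \dot{\theta} = u, \qquad |u| \le \beta. \tag{10}$$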
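For readers unfamiliar with the Dubins car, the following is a minimal forward-Euler simulation sketch of a constant-speed vehicle whose only control is a bounded turn rate. The speed, time step, and the name `beta` for the turn-rate bound are assumptions chosen for illustration and are not the experiment's actual parameters.

```python
import math
from typing import Tuple

DubinsState = Tuple[float, float, float]  # (x, y, heading in radians)


def dubins_step(state: DubinsState, u: float, v: float = 1.0,
                beta: float = 1.0, dt: float = 0.01) -> DubinsState:
    """Advance the Dubins car by one Euler step.
    The commanded turn rate u is clipped to the interval [-beta, beta]."""
    x, y, theta = state
    u = max(-beta, min(beta, u))
    return (x + dt * v * math.cos(theta),
            y + dt * v * math.sin(theta),
            theta + dt * u)


def simulate(state: DubinsState, u: float, steps: int, **kwargs) -> DubinsState:
    """Apply a constant turn-rate command for a number of steps."""
    for _ in range(steps):
        state = dubins_step(state, u, **kwargs)
    return state


straight = simulate((0.0, 0.0, 0.0), u=0.0, steps=100)   # drive 1 s straight
hard_turn = simulate((0.0, 0.0, 0.0), u=5.0, steps=100)  # command saturates
```

Because the command saturates at `beta`, lowering `beta` forces wider turns, which is why a model with less control authority must begin avoiding an obstacle earlier.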


The experiment is divided into three phases. In Phase I, the subject is given an opportunity to familiarize themselves with the robotic system’s dynamics. The user is allowed to directly apply the full range of controls through the computer keyboard for one minute. After ensuring the user has some experience from which to build an internal dynamics model, we then assess their emergent conception of safety. In Phase II, supervisory data is extracted from the subject by showing them scenes where the robot is driving towards an obstacle, and the supervisor decides where to intervene to avoid a crash. This intervention data is then fed into our algorithm (described in Section II-C) that extracts the best fitting safe set. Our estimator used a library of candidate dynamics functions parameterized by values of $\beta$ between 0 and 3, as shown in Fig. 3. In this experiment, we enforced conservativeness by excluding subjects whose Learned sets were not supersets of the Standard safe set, rather than enforcing a prior directly on $\hat{V}$. The Learned safe set is assessed in Phase III against two fixed safe sets (see Fig. 5) pre-calculated from the true dynamic equations.

Fig. 6: Screenshot of the task from Phase III of the experiment. Robotic vehicles make trips back and forth across the screen, detecting and avoiding each obstacle with 80% probability. The human supervisor must remove an obstacle in the event that it is undetected, but must infer this information from the robots’ motion.

These safe sets were calculated using Hamilton-Jacobi reachability as described in Section II-A using the Level Set Toolbox [17] for MATLAB. During this final phase, the subject sequentially supervises homogeneous teams of robots, each team avoiding obstacles based on one of the three assessed safe sets. Ten randomly placed obstacles are strewn about the screen impeding the robots’ autonomous trips back and forth across the screen (see Fig. 6). Although a robot will detect and avoid an obstacle in 80% of its interactions with it, there is a 20% chance that the robot will not detect an obstacle as it approaches. The subject is charged with catching these random failures and removing an obstacle before the robot crashes. Crashing is disincentivized by decrementing an on-screen “score” counter. Removing an obstacle costs only half of what a crash costs the player. This system encourages saving the robot but not guessing wildly. Moreover, simply clearing out all obstacles is not a viable strategy because every obstacle removed generates a new obstacle elsewhere. This score mechanism was also used to make the participant invested in team success by awarding points every time a robot completes a trip across the screen.

III-B Independent Variables

To assess our hypotheses, we manipulate the safe set used between team supervision trials. We exposed the human subject to three teams, each driving using one of three safe sets. The Learned set is derived from Phase II supervisor intervention observations as described in Section II, using . The two baseline kernels are calculated using Hamilton-Jacobi-Isaacs reachability on the true dynamic equations. The Standard set is calculated using the true obstacle size. The Conservative set adds a buffer that doubles the effective size of the obstacle, inducing trajectories that give obstacles a wide berth.

III-C Dependent Measures

III-C1 Objective Measures

The team was tasked with making trips across the screen to reach randomized goals. The robots’ task was to travel across the screen, safely dodging obstacles along the way, while the human was tasked with supervising as a failsafe to remove an obstacle if the robots should fail to observe and avoid it.

Team performance was quantified using three objective metrics: number of trips completed, number of supervisory interventions, and the number of obstacle collisions. These metrics were presented to the subject as an aggregated, arcade-style score. To incentivize participants to only intervene when necessary, obstacle-removal interventions reduced the score, but only by half as much as an obstacle collision.

The number of interventions taken by the supervisor can also serve as a proxy measurement to quantify the amount of cognitive strain they experience while working with the robotic team. Of particular note are the number of interventions that were not actually required, as the supervisor incorrectly judged that a robot had not detected an obstacle. These false positives needlessly drain supervisor attention and indicate a lack of trust in the system. We aim to increase the human’s trust in the system, which we quantify by a decrease in these false positives.

III-C2 Subjective Measures

After each round of pairwise comparison (completing the task with two different robotic teams), we presented the subject with a questionnaire to gauge how the choice of safe set impacted their experience. These questionnaires contained statements about each team that subjects would respond to using a 7-point Likert scale (1 - Strongly Disagree, 7 - Strongly Agree). These statements were designed to measure Trust, Perceived Performance, Interpretability, Confidence, Team Fluency, and overall Preference between the teams in the comparison.

III-D Subject Allocation

The subject population consisted of 6 male, 5 female, and 1 non-binary participants between the ages of 18-29. We used a within-subjects design where each subject was asked to complete all three possible pairwise comparisons of our three treatments (the safe sets used). We used a balanced Latin Square design for the order of comparisons, with no treatment being first in a pair twice. Furthermore, we generated six randomized versions of the task so that subjects were presented with a different version of the task for each trial across the three pairwise comparisons. To avoid coupling the treatment results to a particular version of the task, each treatment was paired with each task version an equal number of times across our subject population.

IV Analysis and Discussion

IV-A H1: False Positive Reduction over Standard

Our first hypothesis is that a Learned safe set that reflects the supervisor’s intervention behavior would decrease the number of false positives compared to the Standard safe set. To test this, we performed a one-way repeated measures ANOVA on the number of supervisory false positives from Phase III of the experiment with safe set as the manipulated factor. A false positive was any supervisor intervention where the removed obstacle was actually detected by all nearby robots, which would have avoided it successfully. The robot team’s safe set had a significant effect on the number of supervisory false positives. An all-pairs post-hoc Tukey method found that the Learned safe set significantly decreased (p = 0.0328) false positives over the Standard safe set, but there was no significant difference between the Learned safe set and the Conservative safe set (which also significantly decreased false positives over the Standard safe set). These results support our main hypothesis that representing supervisor behavior as cognitive keep-out sets allows intervention signals to be distilled into an actionable rule which will decrease supervisory false positives.

Fig. 7: Average number of false positives per trial plotted against the three safe set types. There were significant differences between Standard and Learned (p = 0.0328) and between Standard and Conservative. There was no significant difference between Learned and Conservative.

The second half of that hypothesis, that decreasing supervisory false positives will increase trust and team performance, was not shown conclusively from our data. We performed a one-way, repeated measures ANOVA on the pairwise comparison surveys between the teams using the Learned and the Standard safe sets. Measures of trust showed no significant improvement.

IV-B H2: Preference over Conservative

For 9 of 11 participants, the Learned safe set had shorter avoidance arcs than the Conservative set. We hypothesized that this greater efficiency would make the tailored conservativeness of the Learned set preferable to the baseline Conservative safe set. However, a t-test showed that the survey responses for preference were statistically indistinguishable () from a neutral score: an inconclusive result for Hypothesis 2. We believe that this result stems from users judging preference more on intelligibility, the ease of avoiding false positives, than on efficiency, the shortness of paths. As discussed in Section IV-A, both the Learned and Conservative safe sets led to significant false positive reductions over the Standard set.

Fig. 8: Regressed safe sets (viewed on the slice) from supervisor intervention data overlaid on baselines. Three users’ safe sets clustered to arcing like the Standard safe set. Three others clustered to arcing like the Conservative safe set. The final five safe sets exhibit a distinct behavior that reflects supervisors’ preference for gradual, pre-emptive arcs.

This indistinguishability is compounded by the fact that some subjects appear to express a preference for intelligibility in their Phase II intervention data, so that their Learned safe sets have arcs similar to those of the Conservative safe set (see Fig. 8). Future work could investigate this efficiency-intelligibility trade-off further by using a conservative baseline that is distinguishably more conservative than user safe sets and by making efficiency more central to the team task.

IV-C Model Validity

The statistically significant decreases in false positives observed in Phase III agree with the decreases predicted by the supervisor model based on intervention data from Phase II. Our model posits that interventions occur at states noisily distributed about a safe set boundary. Therefore, it predicts that the fraction of Phase II intervention states contained within a proposed safe set (see Fig. 9) will mirror the proportion of false positive interventions observed in Phase III: if states are deemed safe by the controller, they will not be avoided, even when the noisy supervisor would judge them to be unsafe. Since the Learned safe set controller intervenes at the level set (see Section II-C), exactly half the intervention states will be contained within the Learned safe set in expectation. The model’s predictions are compared against observed false positives in Table 1.
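The "half in expectation" statement can be checked with a small simulation. This is a sketch under stated assumptions: the noise is taken to be Gaussian (any distribution symmetric about the boundary gives the same result), intervention states are summarized by a scalar value-function level, and levels above the boundary are treated as inside the safe set. The function name and parameters are hypothetical.

```python
import random

def fraction_inside(boundary_level, noise_std, n_samples, seed=0):
    """Fraction of simulated interventions that land inside the safe set.

    Each intervention occurs at a level drawn from a Gaussian centered on
    the boundary level (an assumed noise model). Levels strictly above the
    boundary count as inside the safe set (an assumed sign convention).
    """
    rng = random.Random(seed)
    inside = sum(
        1
        for _ in range(n_samples)
        if rng.gauss(boundary_level, noise_std) > boundary_level
    )
    return inside / n_samples

frac = fraction_inside(boundary_level=0.0, noise_std=1.0, n_samples=20000)
print(f"fraction of interventions inside the safe set: {frac:.3f}")
```

Because the noise is symmetric about the boundary, the fraction stays near one half regardless of `noise_std`; only where the boundary is placed relative to the interventions changes it.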

Fig. 9: Empirical distribution of intervention states observed during data collection (Phase II of the experiment). The interventions within the Conservative reachable set are colored in red, leaving 115 interventions in the corresponding safe set. Similarly, the interventions within the Standard reachable set are colored darker, leaving 397 interventions in the corresponding safe set. Intervention states not contained within a reachable set would have generated a false positive during the human-robot teaming task.
                Interventions    Predicted       Average    Observed
                in Safe Set      F.P. vs Std.    F.P.       F.P. vs Std.
Standard        397 / 440        100%            12.54      100%
Learned         220 / 440        55.4%           7.31       58.3%
Conservative    115 / 440        29%             4.68       37.3%

Table 1: Predicted and observed false positives. Left: Predicted false positives from Phase II data. Right: Observed false positives in Phase III.
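The "vs Std." columns of Table 1 are consistent with each row being divided by the Standard row. That reading is inferred from the numbers rather than stated explicitly, so the short sketch below should be taken as a check of the arithmetic only; the variable and function names are hypothetical.

```python
# Counts and averages copied from Table 1.
in_safe_set = {"Standard": 397, "Learned": 220, "Conservative": 115}
observed_fp = {"Standard": 12.54, "Learned": 7.31, "Conservative": 4.68}

def relative_to_standard(values):
    """Express each entry as a percentage of the Standard entry."""
    base = values["Standard"]
    return {name: 100.0 * v / base for name, v in values.items()}

predicted = relative_to_standard(in_safe_set)
observed = relative_to_standard(observed_fp)

for name in in_safe_set:
    print(f"{name:12s} predicted {predicted[name]:5.1f}%  observed {observed[name]:5.1f}%")
```

Under this reading, the prediction for each safe set is the number of Phase II interventions that fall inside it, relative to the number that fall inside the Standard safe set.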

V Conclusion

Automation with human supervisors relies on leveraging the human supervisor’s cognitive resources for success. Respecting these resources is essential for creating well-performing human-robot teams. It is especially important to avoid overtaxing the human as automated teams continue to scale up and a single human worker both accomplishes more and bears more cognitive load than ever. To alleviate this burden, we can decrease the number of issues that command the supervisor’s attention by reducing false positives. By modeling which system states command supervisory attention, we can program autonomous systems to avoid those states when they do not require attention. To capture this information, we combine the concept of mental simulation from cognitive science with formal safety analysis from reachability theory to propose the noisy idealized supervisor model. We employ the noisy idealized supervisor as the generative model in a learning algorithm to predict supervisor safety judgements, and we present a safety controller for robotic agents that respects the supervisor’s perception of safety. This safety controller is guaranteed to reduce false positives for idealized supervisors. Furthermore, for actual supervisors, our human-robot teaming user study demonstrated a significant reduction in false positives when using our approach compared to the standard baseline.

Our results show that it is possible to reduce false positives, and thus cognitive load, by aligning robot behavior with humans’ expectations. Our approach is applicable whenever reachability theory can tractably analyze a dynamical system that will be subject to human safety judgements. Future work will explore the impact of this framework on application domains from air traffic management to self-driving vehicles.


  • [1] Pieter Abbeel and Andrew Y Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the twenty-first international conference on Machine learning, page 1. ACM, 2004.
  • [2] Anayo K Akametalu and Claire J Tomlin. Temporal-difference learning for online reachability analysis. In Control Conference (ECC), 2015 European, pages 2508–2513. IEEE, 2015.
  • [3] Andrea Bajcsy, Dylan P Losey, Marcia K O’Malley, and Anca D Dragan. Learning robot objectives from physical human interaction. In Conference on Robot Learning, pages 217–226, 2017.
  • [4] Peter W Battaglia, Jessica B Hamrick, and Joshua B Tenenbaum. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45):18327–18332, 2013.
  • [5] Mo Chen, Jaime F Fisac, Shankar Sastry, and Claire J Tomlin. Safe sequential path planning of multi-vehicle systems via double-obstacle hamilton-jacobi-isaacs variational inequality. In Control Conference (ECC), 2015 European, pages 3304–3309. IEEE, 2015.
  • [6] Earl A Coddington and Norman Levinson. Theory of ordinary differential equations. Tata McGraw-Hill Education, 1955.
  • [7] Anca D Dragan and Siddhartha S Srinivasa. A policy-blending formalism for shared control. The International Journal of Robotics Research, 32(7):790–805, 2013.
  • [8] Jaime F. Fisac, Anayo K. Akametalu, Melanie Nicole Zeilinger, Shahab Kaynama, Jeremy H. Gillula, and Claire J. Tomlin. A general safety framework for learning-based control in uncertain robotic systems. CoRR, abs/1705.01292, 2017.
  • [9] Jaime F Fisac, Mo Chen, Claire J Tomlin, and S Shankar Sastry. Reach-avoid problems with time-varying dynamics, targets and constraints. In Proceedings of the 18th international conference on hybrid systems: computation and control, pages 11–20. ACM, 2015.
  • [10] Jeremy H. Gillula, Gabriel M. Hoffmann, Haomiao Huang, Michael P. Vitus, and Claire J. Tomlin. Applications of hybrid reachability analysis to robotic aerial vehicles. The International Journal of Robotics Research, 30(3):335–354, 2011.
  • [11] Sylvia L Herbert, Mo Chen, SooJean Han, Somil Bansal, Jaime F Fisac, and Claire J Tomlin. Fastrack: a modular framework for fast and guaranteed safe motion planning. arXiv preprint arXiv:1703.07373, 2017.
  • [12] Gabriel M Hoffmann and Claire J Tomlin. Decentralized cooperative collision avoidance for acceleration constrained vehicles. In Decision and Control, 2008. CDC 2008. 47th IEEE Conference on, pages 4357–4363. IEEE, 2008.
  • [13] Sheue-Ling Hwang, Woodrow Barfield, Tien-Chen Chang, and Gavriel Salvendy. Integration of humans and computers in the operation and control of flexible manufacturing systems. The International Journal of Production Research, 22(5):841–856, 1984.
  • [14] Ashesh Jain, Shikhar Sharma, Thorsten Joachims, and Ashutosh Saxena. Learning preferences for manipulation tasks from online coactive feedback. The International Journal of Robotics Research, 34(10):1296–1313, 2015.
  • [15] Shervin Javdani, Henny Admoni, Stefania Pellegrinelli, Siddhartha S Srinivasa, and J Andrew Bagnell. Shared autonomy via hindsight optimization for teleoperation and teaming. arXiv preprint arXiv:1706.00155, 2017.
  • [16] Rudolf Emil Kalman. When is a linear control system optimal? Journal of Basic Engineering, 86(1):51–60, 1964.
  • [17] Ian M Mitchell. A toolbox of level set methods. Dept. Comput. Sci., Univ. British Columbia, Vancouver, BC, Canada, http://www.cs.ubc.ca/~mitchell/ToolboxLS/toolboxLS.pdf, Tech. Rep. TR-2004-09, 2004.
  • [18] Kevin A Smith and Edward Vul. Sources of uncertainty in intuitive physics. Topics in cognitive science, 5(1):185–199, 2013.
  • [19] Claire J Tomlin. Towards automated conflict resolution in air traffic control. IFAC Proceedings Volumes, 32(2):6564–6569, 1999.
  • [20] Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, and Anind K Dey. Maximum entropy inverse reinforcement learning. In AAAI, volume 8, pages 1433–1438. Chicago, IL, USA, 2008.