I Introduction and Background
As automation becomes more pervasive throughout society, humans will increasingly find themselves interacting with autonomous and semi-autonomous systems. These interactions have the potential to multiply the productivity of human workers, since it will become possible for a single human to supervise the behavior of multiple robotic agents. For example, a single human driver could manage a fleet of self-driving delivery robots, taking full control only for the “last mile” to guide the robots to precisely deposit packages in environments where autonomous navigation may not be reliable. Human experts already serve as failsafe supervisors on factory assembly floors staffed with robotic arms [13]. Air traffic controllers will soon have to manage fully autonomous drones flying through their airspace alongside traditional piloted planes and their autopilots [19].
While a human may be able to successfully exert direct control over a single robot, it becomes intractable for a human to directly control teams of robots (in fact, humans often benefit from automated assistance when controlling even a single robot, as discussed in the literature on assistive teleoperation [7, 15]). In order to manage the increased complexity of multi-robot teams, the human must be able to rely on increased autonomy from the robots, freeing the human to focus their attention only on those areas where they are most needed. Our goal is to model what grabs the supervisor’s attention in order to modify robot behavior to reduce the occurrence of distractions.
This project is inspired by work such as Bajcsy et al. [3] and Jain et al. [14], which learns from supervisor interventions in a “coactive” learning framework. These works apply Learning from Demonstration techniques to the more challenging domain where the given data is a correction to a trajectory rather than a full demonstrated trajectory. The authors of [3, 14] posed this correction challenge in a model-based framework that interprets the human's signals as resulting from an optimization problem. This inverse optimization framework has also been used in Inverse Reinforcement Learning [1, 20], which applies Inverse Optimal Control (as conceived by Kalman [16]) to interpreting human trajectories. Our work applies the inverse optimization framework to learn from the supervisor's decisions to intervene.
Results in cognitive science suggest that humans observing physical scenes can be modeled as performing a noisy “mental simulation” to predict trajectories [4, 18]. We posit that human supervisors utilize this same cognitive dynamic simulation to predict robot safety and intervene accordingly. Specifically, we theorize that the intervention behavior is driven by an internal “safe set” which we can attempt to reconstruct by observing supervisor interventions.
Safe sets arise from the formal methods notion of “viability.” A set of states is “viable” if, for every state in the set, there exists a dynamic trajectory that stays within the set for all time. Reachability analysis calculates the largest viable set that does not include any undesirable state configurations (e.g., collisions with obstacles, power overloads, etc.). Since the set is viable, it is possible to guarantee that the dynamic system will always stay within the set and therefore stay safely away from the undesirable states. For this reason, viability kernels are often referred to as “safe sets.” Reachability can be used for robust path planning in dynamically changing environments [9] or around multiple dynamic agents [5], and recent results have leveraged the technique to bound tracking error in order to generate dynamically feasible paths using simple planning algorithms [11].
Hoffmann et al. used the safety guarantees of reachability analysis to engineer a multi-drone team that could automatically avoid collisions [12]. Similarly, Gillula et al. guaranteed safety for learning algorithms by constraining their exploration to stay within the safe set [10]. Extending this, Akametalu and Tomlin [2] were able to guarantee safety while simultaneously learning and expanding the safe set. All of these controllers supervise otherwise un-guaranteed systems and intervene to maintain safety whenever the system threatens to leave the viable safe set. In this paper, we explore how this intervention behavior resembles human supervision, and apply this insight to representing human safety concerns as safe sets in the state space.
II Supervisor Safe Set Control
Based on the success of cognitive dynamical models for explaining humans’ understanding of physical systems, we posit that human operators may have some notion of reachable sets which they employ to predict collisions or avoid obstacles. We propose a noisy idealized model to describe the behavior of the human supervisor of a robotic team, and we develop a framework for estimating the human supervisor’s mental model of a dynamical system based on observing their interactions with the team. We then propose a control framework that capitalizes on this learned information to improve collaboration in such human-robot teams.
II-A Preliminaries: Reachability for Safety
Consider a dynamical system with bounded input $u \in \mathcal{U}$ and bounded disturbance $d \in \mathcal{D}$, given by

$$\dot{x} = f(x, u, d), \qquad (1)$$

where $\mathcal{U}$ and $\mathcal{D}$ are compact. We let $\mathbb{U}$ and $\mathbb{D}$ denote the sets of measurable functions $u(\cdot) : [0, \infty) \to \mathcal{U}$ and $d(\cdot) : [0, \infty) \to \mathcal{D}$, respectively, which represent possible time histories for the system input and disturbance. Given a choice of input and disturbance signals, there exists a unique continuous trajectory $\xi(\cdot)$ from any initial state $x_0$ which solves

$$\dot{\xi}(t) = f(\xi(t), u(t), d(t)), \qquad \xi(0) = x_0, \qquad (2)$$

where $f$ describes the evolution of the dynamical system [6].
Obstacles in the environment can be modeled as a “keep-out” set $\mathcal{K}$ of states that the system must avoid. We define the safety of the system with respect to this set, such that the system is considered to be safe at state $x$ over time horizon $[0, T]$ as long as we can choose $u(\cdot) \in \mathbb{U}$ to guarantee that there exists no time $t \in [0, T]$ for which $\xi(t) \in \mathcal{K}$. The task of maintaining the system's safety over this interval can be modeled as a differential game between the control input and the disturbance. Consider an optimal control signal $u^*(\cdot)$ which attempts to steer the system away from $\mathcal{K}$ and an optimal disturbance $d^*(\cdot)$ which attempts to drive the system towards $\mathcal{K}$. By choosing any Lipschitz payoff function $l(x)$ which is negative-valued for $x \in \mathcal{K}$ and positive for $x \notin \mathcal{K}$, we can encode the outcome of this game via a value function $V(x, t)$ characterized by the following Hamilton-Jacobi-Isaacs variational inequality [9]:

$$0 = \min\left\{ l(x) - V(x, t),\ \frac{\partial V}{\partial t}(x, t) + \max_{u \in \mathcal{U}} \min_{d \in \mathcal{D}} \nabla_x V(x, t) \cdot f(x, u, d) \right\}, \qquad V(x, T) = l(x). \qquad (3)$$
The value function that satisfies the above conditions is equal to $\min_{t' \in [t, T]} l(\xi(t'))$ for the trajectory $\xi$ with $\xi(t) = x$ driven by an optimal control $u^*(\cdot)$ and an optimal disturbance $d^*(\cdot)$. We can therefore find the set of states from which we cannot guarantee the safety of the system on the interval $[t, T]$, namely $\mathcal{R}(t) = \{x : V(x, t) \le 0\}$, also known as the backward-reachable set of $\mathcal{K}$ over this interval. That is, for all initial states $x \in \mathcal{R}(t)$ and feedback control policies, there exists some disturbance such that $\xi(t') \in \mathcal{K}$ for some $t' \in [t, T]$.
If there exists a non-empty controlled-invariant set $\Omega$ that does not intersect $\mathcal{K}$, then we deem this set a “safe set” because there exists a feedback policy that guarantees that the system remains in $\Omega$, and thus out of $\mathcal{K}$, for all time. It follows from their properties that $\Omega$ is the complement of the infinite-horizon backward-reachable set, and the relationship between $\mathcal{K}$, the backward-reachable set, and $\Omega$ is visualized in Fig. 2. Within a safe set $\Omega$, the value function becomes independent of $t$ as $T \to \infty$. Because we focus on the case where the system is initialized to some safe state $x_0 \in \Omega$ and we aim to maintain $\xi(t) \in \Omega$ for all $t$, we simplify notation by defining $V(x) := \lim_{T \to \infty} V(x, 0)$ and $\Omega := \{x : V(x) > 0\}$.
One approach to guaranteeing the safety of the system is to apply a “minimally invasive” controller which activates on the zero level set of $V$. This approach allows complete flexibility of control as long as $V(x) > 0$, and applies the optimal control $u^*$ to avoid $\mathcal{K}$ when $x$ reaches the boundary of $\Omega$. We refer the interested reader to [10, 8] for a more thorough treatment of reachability and minimally invasive controllers.
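As a minimal illustration (not the paper's implementation), the minimally invasive control law can be sketched as a safety filter that defers to an arbitrary planner while $V(x) > 0$ and overrides it on the boundary of the safe set; the names `value`, `planner_control`, and `optimal_safe_control` are hypothetical stand-ins:

```python
def safety_filter(x, value, planner_control, optimal_safe_control):
    """Minimally invasive control sketch: pass through the planner's input
    while V(x) > 0; apply the optimal avoidance control otherwise."""
    if value(x) > 0.0:
        return planner_control(x)   # strictly inside the safe set
    return optimal_safe_control(x)  # on/inside the zero level set: intervene

# Toy 1-D example: keep-out set K = {x <= 0}, with V(x) = x as a trivial
# distance-like value function and a planner that drives toward the obstacle.
value = lambda x: x
planner = lambda x: -1.0   # nominal input pushes toward K
safe = lambda x: +1.0      # optimal avoidance input pushes away from K
u = safety_filter(0.5, value, planner, safe)  # inside safe set -> planner
v = safety_filter(0.0, value, planner, safe)  # on the boundary -> override
```

The filter never modifies the planner's command except on the zero level set, which is exactly the “complete flexibility” property described above.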
II-B Noisy Idealized Supervisor Model
We define an idealized model of the supervisor of a robotic team whose responsibility it is to ensure that no robots collide with obstacles represented by the keep-out set $\mathcal{K}$. The idealized supervisor behaves as a minimally invasive controller as described in Section II-A. However, while the robotic team members' true dynamics are given by the function $f$ as in (1), the supervisor possesses an internal model of the robots' dynamics given by $f_S$, which is not necessarily equal to the true dynamics. Following the differential game characterized by (3), the supervisor also possesses an internal value function $V_S$ and safe set $\Omega_S$ which they use to evaluate the safety of each state in the environment. We allow for the possibility that the supervisor adds some amount of margin $\epsilon \ge 0$ to their internal safe set, such that they treat states with $V_S(x) \le \epsilon$ as unsafe. Therefore, the idealized supervisor will always intervene when a robotic team member reaches the $\epsilon$ level set of $V_S$, rather than the zero level set of the true $V$. We further specify that the idealized supervisor is conservative: $\{x : V_S(x) > \epsilon\} \subseteq \Omega$. This condition implies that the supervisor will never let a robot teammate leave the true safe set, since they intervene before the robot can reach the boundary of $\Omega$. Additionally, we propose a noisy version of this idealized supervisor: the noisy idealized supervisor will intervene when they observe a robot reach the $\epsilon + \eta$ level set of $V_S$, where $\eta$ is drawn from $\mathcal{N}(0, \sigma^2)$ whenever the supervisor makes a safety judgement.
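The noisy idealized supervisor can be written as a tiny generative model; the one-dimensional value function and all parameter values below are illustrative assumptions, not quantities from the study:

```python
import random

def noisy_supervisor_intervenes(x, value_s, eps, sigma, rng=random.Random(0)):
    """Generative sketch of the noisy idealized supervisor: a fresh noise
    sample eta ~ N(0, sigma^2) is drawn at each safety judgement, and the
    supervisor intervenes when V_S(x) <= eps + eta."""
    eta = rng.gauss(0.0, sigma)
    return value_s(x) <= eps + eta

# Illustrative distance-like internal value function and margin.
value_s = lambda x: x
hits = sum(noisy_supervisor_intervenes(0.3, value_s, eps=0.3, sigma=0.1)
           for _ in range(10_000))
# At a state exactly on the eps level set, the intervention probability is 1/2.
```

This captures the key property used later: intervention states are values of $V_S$ scattered symmetrically about the level $\epsilon$.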
II-C Learning Safe Sets from Supervisor Interventions
We choose to model the human supervisor of a robotic team as approximating the behavior of the idealized supervisor model presented in Section II-B. That is, the human supervisor will allow the robots to perform their task however they choose, but intervene whenever they perceive that a robot is approaching an obstacle in the state space. Given this model, we can interpret the points at which the human intervenes as corresponding to the unknown $\epsilon$ level set of some value function $V_S$, which characterizes the human's mental safe set $\Omega_S$. Our goal is to use observations of human interventions to derive an estimated value function $\hat{V}$ and level $\hat{\epsilon}$ which describe the observed behavior and induce an estimated safe set $\hat{\Omega}_S$. We approach this task by deriving a Maximum Likelihood Estimator (MLE) of the human's mental safe set. If we assume that a human supervisor always intends to intervene at the $\epsilon$ level set of $V_S$, but their ability to precisely intervene at this level is subject to Gaussian noise, either from observation error or variability in reaction time, then we can consider the value at an intervention point $x_i$ as being drawn from a normal distribution centered at $\epsilon$ (that is, $V_S(x_i) \sim \mathcal{N}(\epsilon, \sigma^2)$).
Given a proposed value function $V$ and a set of intervention points $\{x_i\}_{i=1}^N$ with corresponding values $v_i = V(x_i)$, we wish to estimate the most likely $\epsilon$ and $\sigma$. Each observation has density

$$p(v_i \mid \epsilon, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(v_i - \epsilon)^2}{2\sigma^2} \right), \qquad (4)$$

which leads to the following probability density for a set of independent observations:

$$p(v_1, \ldots, v_N \mid \epsilon, \sigma) = \prod_{i=1}^N p(v_i \mid \epsilon, \sigma). \qquad (5)$$
The likelihood of any estimated parameter values $\hat{\epsilon}$ and $\hat{\sigma}$ being correct, given the observations and the proposed value function $V$, is expressed as $\mathcal{L}(\hat{\epsilon}, \hat{\sigma} \mid v_1, \ldots, v_N) = p(v_1, \ldots, v_N \mid \hat{\epsilon}, \hat{\sigma})$. It can be shown that the values of the unknown parameters that maximize the likelihood function are given by

$$\hat{\epsilon} = \frac{1}{N} \sum_{i=1}^N v_i, \qquad \hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^N (v_i - \hat{\epsilon})^2, \qquad (6)$$

which are simply the mean and variance of the set of observations.
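The estimates in (6) are straightforward to compute; a small sketch with invented intervention values:

```python
def mle_eps_sigma(values):
    """MLE of the intervention level eps and noise sigma, given the proposed
    value function evaluated at each intervention point (eq. (6)): the sample
    mean and the population (1/N) standard deviation of the observations."""
    n = len(values)
    eps_hat = sum(values) / n
    var_hat = sum((v - eps_hat) ** 2 for v in values) / n  # MLE divides by N
    return eps_hat, var_hat ** 0.5

# Values V(x_i) at five hypothetical intervention states.
eps_hat, sigma_hat = mle_eps_sigma([0.9, 1.1, 1.0, 1.2, 0.8])
```

Note that the MLE variance divides by $N$, not $N-1$; the distinction is immaterial here since the estimate is only used to rank candidate value functions.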
Notice that the estimates given by (6) are computed with respect to a given value function $V$. If we were to assume that the human supervisor has a perfect model of the system dynamics, then we could simply set $V$ equal to the true value function of the system in (1), and $\hat{\epsilon}$ would be the maximum likelihood estimate for the level at which the supervisor will intervene. However, it is unlikely that a human supervisor's notion of the dynamics will correspond exactly to this model, and we would like to maintain the flexibility of estimating value functions that are not strictly derived from (1). To this end, we define the maximum likelihood of $V$ being the value function that produced our observations as $\mathcal{L}^*(V)$. The value of $\mathcal{L}^*(V)$ is obtained by substituting the estimates from (6) into the probability density function from (5). That is, $\mathcal{L}^*(V) = p(v_1, \ldots, v_N \mid \hat{\epsilon}, \hat{\sigma})$.
We seek the most likely value function to explain our observations, which will be the value function with the greatest maximum likelihood (the maximum over maxima):

$$V^* = \operatorname*{arg\,max}_{V \in \mathcal{V}} \mathcal{L}^*(V), \qquad (7)$$

where $\mathcal{V}$ is the set of all possible value functions.
In order to make this optimization tractable, we can restrict ourselves to a set of value functions $V_\beta$ corresponding to a family of dynamics functions $f_\beta$ parameterized by $\beta$, making the optimization in question

$$\beta^* = \operatorname*{arg\,max}_{\beta} \mathcal{L}^*(V_\beta). \qquad (8)$$
In practice, we may not be able to find an expression for the gradient of $\mathcal{L}^*(V_\beta)$ with respect to $\beta$, since the value function is derived from the dynamics via the differential game given by (3). The lack of a gradient expression restricts the use of numerical methods to solve the problem as presented in (8). In these cases, we can compute a representative library of value functions $\{V_{\beta_1}, \ldots, V_{\beta_M}\}$ corresponding to a set of representative parameter values (see Fig. 3 for an example library). The optimization then reduces to choosing the most likely value function from among this library:

$$V^* = \operatorname*{arg\,max}_{V \in \{V_{\beta_1}, \ldots, V_{\beta_M}\}} \mathcal{L}^*(V). \qquad (9)$$
In order to ensure that the learned safe set is conservative, we can extend our MLE to a Maximum A Posteriori (MAP) estimator by incorporating our prior belief that, regardless of the safe set that the supervisor uses to generate interventions, they do not want the robots to be unsafe with respect to the true dynamics. In this case, we maintain a uniform prior that assigns equal probability to all $V_\beta$ whose zero sublevel sets are supersets of the zero sublevel set of the true $V$, and zero probability to all other $V_\beta$. In other words, we assume that the supervisor does not overestimate the agility of the robots, and in practice we can enforce this condition by choosing the library in (9) to only contain appropriate value functions. Moreover, regardless of the choice of $V_\beta$, we assume that the supervisor intends to intervene before reaching the zero level set of $V_\beta$, which always includes the boundary of $\mathcal{K}$. If we choose a prior that assigns zero probability to all non-positive $\epsilon$ and uniform probability elsewhere, it can be shown that the MAP estimates are obtained by letting $\hat{\epsilon}$ equal $\max\!\left(0, \frac{1}{N}\sum_{i=1}^N v_i\right)$ and otherwise proceeding as before. Fig. 4 provides an example of this algorithm estimating a safe set from human supervisor intervention data.
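The library search in (9), together with the MAP clamp on $\hat{\epsilon}$, can be sketched as follows; the two candidate value functions and the intervention states are invented for illustration (the paper's library is built from reachability computations, not closed-form expressions):

```python
import math

def log_likelihood(values, eps, sigma):
    """Gaussian log-likelihood of intervention values under N(eps, sigma^2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (v - eps) ** 2 / (2 * sigma ** 2) for v in values)

def best_value_function(library, states):
    """Choose the candidate value function with the greatest maximum
    likelihood, as in (9), using the MAP clamp eps_hat = max(mean, 0)."""
    best_name, best_ll = None, -math.inf
    for name, V in library.items():
        vals = [V(x) for x in states]
        eps_hat = max(sum(vals) / len(vals), 0.0)          # MAP estimate
        var = sum((v - eps_hat) ** 2 for v in vals) / len(vals)
        sigma_hat = max(math.sqrt(var), 1e-9)              # guard degenerate fit
        ll = log_likelihood(vals, eps_hat, sigma_hat)
        if ll > best_ll:
            best_name, best_ll = name, ll
    return best_name

# Hypothetical 1-D library: under the quadratic candidate, the intervention
# states below collapse to a tight cluster of values (small sigma_hat, high
# likelihood), so it is selected over the linear candidate.
library = {"linear": lambda x: x, "quadratic": lambda x: x * x}
choice = best_value_function(library, [1.0, -1.0, 1.02, -0.98])
```

The candidate whose level sets best align with the intervention points yields the smallest residual spread, and hence the greatest likelihood.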
II-D Team Control with Learned Safe Sets
We propose that safe sets learned according to the approach in Section II-C can be used to create effective control laws for the robotic members of human-robot teams. Recall our model of the human supervisor of a robotic team: the supervisor must rely on each robot's autonomy to complete the majority of its tasks unassisted, but the supervisor may intervene to correct a robot's behavior when necessary (such as to avoid an imminent collision with the keep-out set $\mathcal{K}$). We put forth that in the scenario where the human intervenes to prevent a collision, they do so because they observe that a robot has violated the boundaries of their mental safe set $\Omega_S$.
Now consider a team of robots navigating an unknown environment, which are able to avoid any obstacles that they detect. One approach to safely automating this team is to have each robot behave according to a minimally invasive control law: the robots are allowed to follow trajectories generated by any planning algorithm, so long as they remain within the safe set $\Omega$ computed using the baseline dynamics model (1) with associated value function $V$. Whenever these robots detect an obstacle, they add it to the keep-out set $\mathcal{K}$, thus modifying $V$ and $\Omega$. If a robot reaches the boundary of $\Omega$, it applies the optimal control $u^*$ to avoid $\mathcal{K}$ until it has cleared the obstacle. However, it is possible that a robot does not detect an obstacle, and a human supervisor must intervene to ensure robot safety.
As stated above, the human supervisor will intervene when a robot reaches the boundary of their mental safe set $\Omega_S$, not the boundary of $\Omega$. This discrepancy leads to the possibility that the supervisor will intervene when the robot reaches some state outside $\Omega_S$, even if the robot would have avoided the obstacle without intervention. These situations arise whenever $V(x) > 0$ but $V_S(x) \le \epsilon$. These “false positive” interventions represent unnecessary work for the human supervisor, and we seek to eliminate them in order to improve the human's experience and the team's overall performance.
We propose using a safe set $\hat{\Omega}_S$ learned from previous observations of supervisor interventions, as outlined in Section II-C, as a substitute for $\Omega$ in the robots' minimally invasive control law. By estimating the human's internal safe set, we take advantage of the following property:
For an idealized supervisor collaborating with a team of robots as described in Section II-D, if the robots avoid detected obstacles by applying an optimally safe control at the boundary of the supervisor's safe set (the $\epsilon$ level set of $V_S$), then if the supervisor plans to intervene because they observe $V_S(x) \le \epsilon$ for robot $j$, the supervisor can infer that robot $j$ has not detected an obstacle, and any supervisor intervention will not be a false positive.
The proof of this property follows constructively from the definitions of safe set, idealized supervisor, and false positive. If robot $j$ had correctly detected an obstacle and adjusted its representation of $\mathcal{K}$ accordingly, then it would have applied the optimal control to remain within the supervisor's safe set. Therefore, if the supervisor is able to observe that robot $j$ has left $\Omega_S$, it must be the case that the robot has not detected the obstacle. False positives are defined to be supervisor interventions that occur when a robot has detected an obstacle but the supervisor still intervenes. In this case, the supervisor can correctly infer that robot $j$ has not detected an obstacle, so any intervention at this point cannot be a false positive. ∎
For an idealized supervisor, as $\hat{\Omega}_S$ becomes an arbitrarily good approximation of $\Omega_S$, the number of false positive interventions will approach zero. For a noisy idealized supervisor, the supervisor will intervene whenever $V_S(x) \le \epsilon + \eta$, where $\eta \sim \mathcal{N}(0, \sigma^2)$. This noise will continue to produce false positives, even with a perfect fit $\hat{V} = V_S$, if the robots apply the optimally safe control at the $\hat{\epsilon}$ level set of $\hat{V}$. Instead, the level set where the optimally safe control is applied can be raised arbitrarily high to drive the false positive rate to zero. For example, the $\hat{\epsilon} + 2\hat{\sigma}$ level set is sufficient to avoid over 97% of intervention states used for learning, in expectation. We test the efficacy of our approach through the human-subjects experiment described in Section III.
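The “over 97%” figure follows from the standard normal CDF evaluated at two standard deviations, which can be checked numerically:

```python
import math

# Share of noisy interventions pre-empted when the robots apply the optimal
# avoidance control at the (eps_hat + 2 * sigma_hat) level set: intervention
# values are modeled as N(eps_hat, sigma_hat^2), so the share falling below
# eps_hat + 2 * sigma_hat is the normal CDF at z = 2.
phi_2 = 0.5 * (1.0 + math.erf(2.0 / math.sqrt(2.0)))
# Phi(2) is roughly 0.977, i.e. over 97% of intervention states are avoided.
```

Raising the activation level further (e.g. $3\hat{\sigma}$) pushes this fraction above 99.8%, at the cost of more conservative robot trajectories.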
III Experimental Design for User Validation
Our goal in understanding and modeling the supervisor’s conception of safety is to improve team performance by decreasing cognitive overload. Although we have based our human modeling on the cognitive science literature, we do not intend to verify humans’ exact cognitive processes. Instead, we aim to apply our inspiration from cognitive science toward building better human-robot teams. To this end, our hypotheses are:
H1: Representing supervisor behavior as cognitive keep-out sets allows intervention signals to be distilled into an actionable rule which will decrease supervisory false positives and cognitive strain, thereby increasing team performance and trust.
H2: Fitting danger-avoidance behavior to a supervisor's beliefs is preferable to generic conservative behavior.
In our experiment, we gather supervisor intervention data, fit our model to the data, and then run a human-robot teaming task that assesses performance.
Our experiment applies the idealized supervisor theory and learning algorithm to supervising simulated robots. The robots moved according to the Dubins car model:

$$\dot{x} = v \cos\theta, \qquad \dot{y} = v \sin\theta, \qquad \dot{\theta} = \omega,$$

where $(x, y)$ is the robot's planar position, $\theta$ its heading, $v$ its constant forward speed, and $\omega$ its turn-rate input.
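A minimal simulation sketch of these dynamics under explicit Euler integration (the speed, step size, and zero-turn-rate input are illustrative choices, not the experiment's values):

```python
import math

def dubins_step(state, omega, v=1.0, dt=0.01):
    """One explicit-Euler step of the Dubins car: constant forward speed v,
    turn-rate input omega, state = (x, y, theta)."""
    x, y, theta = state
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)

# Drive straight (omega = 0) from the origin for one simulated second;
# the robot advances roughly one unit along the x-axis.
state = (0.0, 0.0, 0.0)
for _ in range(100):
    state = dubins_step(state, omega=0.0)
```

Because the speed is fixed and only the turn rate is controlled, the robot's avoidance maneuvers are arcs, which is why the safe sets in Figs. 3 and 5 take their characteristic swept shapes.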
The experiment is divided into three phases. In Phase I, the subject is given an opportunity to familiarize themselves with the robotic system's dynamics. The user is allowed to directly apply the full range of controls through the computer keyboard for one minute. After ensuring the user has some experience from which to build an internal dynamics model, we then assess their emergent conception of safety. In Phase II, supervisory data is extracted from the subject by showing them scenes in which the robot drives towards an obstacle, and the supervisor decides where to intervene to avoid a crash. This intervention data is then fed into our algorithm (described in Section II-C), which extracts the best-fitting safe set. Our estimator used a library of candidate dynamics functions parameterized by values of $\beta$ between 0 and 3, as shown in Fig. 3. In this experiment, we enforced conservativeness by excluding subjects whose Learned sets were not supersets of the Standard safe set, rather than enforcing a prior directly on $V_\beta$. The Learned safe set is assessed in Phase III against two fixed safe sets (see Fig. 5) pre-calculated from the true dynamic equations.
These safe sets were calculated via Hamilton-Jacobi reachability, as described in Section II-A, using the Level Set Toolbox [17] for MATLAB. During this final phase, the subject sequentially supervises homogeneous teams of robots, each team avoiding obstacles based on one of the three assessed safe sets. Ten randomly placed obstacles are strewn about the screen, impeding the robots' autonomous trips back and forth across the screen (see Fig. 6). Although a robot will detect and avoid an obstacle in 80% of its interactions with it, there is a 20% chance that the robot will not detect an obstacle as it approaches. The subject is charged with catching these random failures and removing an obstacle before the robot crashes. Crashing is disincentivized by decrementing an on-screen “score” counter. Removing an obstacle costs only half of what a crash costs the player. This system encourages saving the robot but not guessing wildly. Moreover, simply clearing out all obstacles is not a viable strategy, because every obstacle removed generates a new obstacle elsewhere. The score mechanism was also used to make the participant invested in team success by awarding points every time a robot completes a trip across the screen.
III-B Independent Variables
To assess our hypotheses, we manipulate the safe set used between team supervision trials. We exposed the human subject to three teams, each driving using one of three safe sets. The Learned set is derived from the Phase II supervisor intervention observations as described in Section II, using the estimated intervention level $\hat{\epsilon}$. The two baseline kernels are calculated using Hamilton-Jacobi-Isaacs reachability on the true dynamic equations. The Standard set is calculated using the true obstacle size. The Conservative set adds a buffer that doubles the effective size of the obstacle, inducing trajectories that give obstacles a wide berth.
III-C Dependent Measures
III-C1 Objective Measures
The team was tasked with making trips across the screen to reach randomized goals. The robots’ task was to travel across the screen, safely dodging obstacles along the way, while the human was tasked with supervising as a failsafe to remove an obstacle if the robots should fail to observe and avoid it.
Team performance was quantified using three objective metrics: number of trips completed, number of supervisory interventions, and the number of obstacle collisions. These metrics were presented to the subject as an aggregated, arcade-style score. To incentivize participants to only intervene when necessary, obstacle-removal interventions reduced the score, but only by half as much as an obstacle collision.
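The incentive structure can be sketched as a scoring function; the exact point values are assumptions, but the 2:1 ratio between crash and removal costs matches the description above:

```python
def score(trips, crashes, removals, trip_points=2.0, crash_cost=2.0):
    """Arcade-style score sketch: completed trips add points, crashes
    subtract points, and removing an obstacle costs half as much as a crash
    (the specific point values here are illustrative, not the study's)."""
    return (trips * trip_points
            - crashes * crash_cost
            - removals * (crash_cost / 2.0))

s = score(trips=10, crashes=1, removals=4)  # 20 - 2 - 4 = 14
```

Because a removal costs half a crash, intervening is always worthwhile when the subject believes a crash is more likely than not, which is the behavior the study's scoring was designed to elicit.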
The number of interventions taken by the supervisor can also serve as a proxy measurement to quantify the amount of cognitive strain they experience while working with the robotic team. Of particular note are the number of interventions that were not actually required, as the supervisor incorrectly judged that a robot had not detected an obstacle. These false positives needlessly drain supervisor attention and indicate a lack of trust in the system. We aim to increase the human’s trust in the system, which we quantify by a decrease in these false positives.
III-C2 Subjective Measures
After each round of pairwise comparison (completing the task with two different robotic teams), we presented the subject with a questionnaire to gauge how the choice of safe set impacted their experience. These questionnaires contained statements about each team that subjects would respond to using a 7-point Likert scale (1 - Strongly Disagree, 7 - Strongly Agree). These statements were designed to measure Trust, Perceived Performance, Interpretability, Confidence, Team Fluency, and overall Preference between the teams in the comparison.
III-D Subject Allocation
The subject population consisted of 6 male, 5 female, and 1 non-binary participants between the ages of 18 and 29. We used a within-subjects design in which each subject completed all three possible pairwise comparisons of our three treatments (the safe sets used). We used a balanced Latin square design for the order of comparisons, with no treatment appearing first in a pair twice. Furthermore, we generated six randomized versions of the task so that subjects were presented with a different version of the task for each trial across the three pairwise comparisons. To avoid coupling the treatment results to a particular version of the task, each treatment was paired with each task version an equal number of times across our subject population.
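A simple stand-in for this counterbalancing scheme (a full balanced Latin square for three conditions requires six orderings; the rotation below only illustrates the within-subject constraint that no treatment appears first in a pair twice):

```python
treatments = ["Standard", "Learned", "Conservative"]

# Each subject completes all three pairwise comparisons; constructing the
# pairs cyclically makes each treatment appear first in exactly one pair.
pairs = [(treatments[i], treatments[(i + 1) % 3]) for i in range(3)]

def comparison_order(subject_idx):
    """Rotate the presentation order of the comparisons across subjects, a
    simplified sketch of the balanced Latin square used in the study."""
    k = subject_idx % 3
    return pairs[k:] + pairs[:k]

firsts = [p[0] for p in pairs]  # each treatment leads exactly one pair
```

Rotating the comparison order across subjects spreads ordering effects evenly over the three treatments.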
IV Analysis and Discussion
IV-A H1: False Positive Reduction over Standard
Our first hypothesis was that a Learned safe set that reflects the supervisor's intervention behavior would decrease the number of false positives compared to the Standard safe set. To test this, we performed a one-way repeated measures ANOVA on the number of supervisory false positives from Phase III of the experiment, with safe set as the manipulated factor. A false positive was any supervisor intervention where the removed obstacle was actually detected by all nearby robots, which would have avoided it successfully. The robot team's safe set had a significant effect on the number of supervisory false positives. An all-pairs post-hoc Tukey test found that the Learned safe set significantly decreased false positives relative to the Standard safe set, but there was no significant difference between the Learned safe set and the Conservative safe set (which also significantly decreased false positives relative to the Standard safe set). These results support our main hypothesis that representing supervisor behavior as cognitive keep-out sets allows intervention signals to be distilled into an actionable rule which decreases supervisory false positives.
The second half of that hypothesis, that decreasing supervisory false positives will increase trust and team performance, was not shown conclusively by our data. We performed a one-way repeated measures ANOVA on the pairwise comparison surveys between the teams using the Learned and the Standard safe sets. Measures of trust showed no significant improvement.
IV-B H2: Preference over Conservative
For 9 of 11 participants, the Learned safe set had shorter avoidance arcs than the Conservative set. We hypothesized that this greater efficiency would make the tailored conservativeness of the Learned set preferable to the baseline Conservative safe set. However, a t-test showed that the survey responses for preference were statistically indistinguishable from a neutral score: an inconclusive result for Hypothesis 2. We believe that this result stems from users judging preference more on intelligibility (the ease of avoiding false positives) than on efficiency (the shortness of paths). As discussed in Section IV-A, both the Learned and Conservative safe sets led to significant false positive reductions over the Standard set.
This indistinguishability is further compounded by the fact that a preference for intelligibility seems to be expressed by some subjects in their Phase II intervention data, resulting in Learned safe sets with arcs similar to those of the Conservative safe set (see Fig. 8). Future work could investigate this efficiency-intelligibility trade-off further by using a conservative baseline that is distinguishably more conservative than user safe sets and by making efficiency more central to the team task.
IV-C Model Validity
The statistically significant decreases in false positives observed in Phase III agree with the decreases predicted by the supervisor model based on intervention data from Phase II. Our model posits that interventions occur at states noisily distributed about a safe set boundary. Therefore, it predicts that the proportion of Phase II intervention states contained within a proposed safe set (see Fig. 9) will mirror the proportion of false positive interventions observed in Phase III: if states are deemed safe by the controller, they will not be avoided, even when the noisy supervisor would judge them to be unsafe. Since the Learned safe set controller intervenes at the $\hat{\epsilon}$ level set (see Section II-C), exactly half the intervention states will be contained within the Learned safe set in expectation. The model's predictions are compared against observed false positives in Table 1.
| Safe Set | Interventions in Safe Set (Predicted) | F.P. vs. Std. (Predicted) | F.P. (Observed) | F.P. vs. Std. (Observed) |
|---|---|---|---|---|
| Standard | 397 / 440 | 100% | 12.54 | 100% |
| Learned | 220 / 440 | 55.4% | 7.31 | 58.3% |
| Conservative | 115 / 440 | 29% | 4.68 | 37.3% |

Table 1: Predicted and observed false positives. Left: predicted false positives from Phase II intervention data. Right: observed false positives in Phase III.
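As a consistency check, the relative rates in Table 1 can be reproduced from the raw counts and the observed means:

```python
# Relative false-positive rates from Table 1: predicted rates are the counts
# of Phase II intervention states inside each safe set, normalized by the
# Standard set; observed rates are Phase III mean false positives, likewise
# normalized by the Standard set.
predicted = {"Standard": 397, "Learned": 220, "Conservative": 115}
observed = {"Standard": 12.54, "Learned": 7.31, "Conservative": 4.68}

pred_vs_std = {k: v / predicted["Standard"] for k, v in predicted.items()}
obs_vs_std = {k: v / observed["Standard"] for k, v in observed.items()}
# Learned: predicted 55.4% vs. observed 58.3%;
# Conservative: predicted 29% vs. observed 37.3%.
```

The close match for the Learned set (55.4% predicted vs. 58.3% observed) is the quantitative basis for the model-validity claim above.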
V Conclusion

Automation with human supervisors relies on leveraging the human supervisor's cognitive resources for success. Respecting these resources is essential for creating well-performing human-robot teams. It is especially important to avoid overtaxing the human as automated teams continue to scale up, and a single human worker both accomplishes more and bears more cognitive load than ever. To alleviate this burden, we can decrease the number of issues that command the supervisor's attention by reducing false positives. By modeling which system states command supervisory attention, we can program autonomous systems to avoid those states when they do not require attention. To capture this information, we combine the concept of mental simulation from cognitive science with formal safety analysis from reachability theory to propose the noisy idealized supervisor model. We employ the noisy idealized supervisor as the generative model in a learning algorithm to predict supervisor safety judgements, and we present a safety controller for robotic agents that respects the supervisor's perception of safety. This safety controller is guaranteed to reduce false positives for idealized supervisors. Furthermore, for actual supervisors, our human-robot teaming user study demonstrated a significant reduction in false positives when using our approach compared to the standard baseline.
Our results show that it is possible to reduce false positives, and thus cognitive load, by aligning robot behavior with humans’ expectations. Our approach is applicable whenever reachability theory can tractably analyze a dynamical system that will be subject to human safety judgements. Future work will explore the impact of this framework on application domains from air traffic management to self-driving vehicles.
[1] Pieter Abbeel and Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning, page 1. ACM, 2004.
[2] Anayo K. Akametalu and Claire J. Tomlin. Temporal-difference learning for online reachability analysis. In 2015 European Control Conference (ECC), pages 2508–2513. IEEE, 2015.
[3] Andrea Bajcsy, Dylan P. Losey, Marcia K. O'Malley, and Anca D. Dragan. Learning robot objectives from physical human interaction. In Conference on Robot Learning, pages 217–226, 2017.
[4] Peter W. Battaglia, Jessica B. Hamrick, and Joshua B. Tenenbaum. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45):18327–18332, 2013.
[5] Mo Chen, Jaime F. Fisac, Shankar Sastry, and Claire J. Tomlin. Safe sequential path planning of multi-vehicle systems via double-obstacle Hamilton-Jacobi-Isaacs variational inequality. In 2015 European Control Conference (ECC), pages 3304–3309. IEEE, 2015.
[6] Earl A. Coddington and Norman Levinson. Theory of Ordinary Differential Equations. Tata McGraw-Hill Education, 1955.
[7] Anca D. Dragan and Siddhartha S. Srinivasa. A policy-blending formalism for shared control. The International Journal of Robotics Research, 32(7):790–805, 2013.
[8] Jaime F. Fisac, Anayo K. Akametalu, Melanie N. Zeilinger, Shahab Kaynama, Jeremy H. Gillula, and Claire J. Tomlin. A general safety framework for learning-based control in uncertain robotic systems. CoRR, abs/1705.01292, 2017.
[9] Jaime F. Fisac, Mo Chen, Claire J. Tomlin, and S. Shankar Sastry. Reach-avoid problems with time-varying dynamics, targets and constraints. In Proceedings of the 18th International Conference on Hybrid Systems: Computation and Control, pages 11–20. ACM, 2015.
[10] Jeremy H. Gillula, Gabriel M. Hoffmann, Haomiao Huang, Michael P. Vitus, and Claire J. Tomlin. Applications of hybrid reachability analysis to robotic aerial vehicles. The International Journal of Robotics Research, 30(3):335–354, 2011.
[11] Sylvia L. Herbert, Mo Chen, SooJean Han, Somil Bansal, Jaime F. Fisac, and Claire J. Tomlin. FaSTrack: a modular framework for fast and guaranteed safe motion planning. arXiv preprint arXiv:1703.07373, 2017.
[12] Gabriel M. Hoffmann and Claire J. Tomlin. Decentralized cooperative collision avoidance for acceleration constrained vehicles. In 47th IEEE Conference on Decision and Control (CDC), pages 4357–4363. IEEE, 2008.
[13] Sheue-Ling Hwang, Woodrow Barfield, Tien-Chen Chang, and Gavriel Salvendy. Integration of humans and computers in the operation and control of flexible manufacturing systems. The International Journal of Production Research, 22(5):841–856, 1984.
[14] Ashesh Jain, Shikhar Sharma, Thorsten Joachims, and Ashutosh Saxena. Learning preferences for manipulation tasks from online coactive feedback. The International Journal of Robotics Research, 34(10):1296–1313, 2015.
[15] Shervin Javdani, Henny Admoni, Stefania Pellegrinelli, Siddhartha S. Srinivasa, and J. Andrew Bagnell. Shared autonomy via hindsight optimization for teleoperation and teaming. arXiv preprint arXiv:1706.00155, 2017.
[16] Rudolf E. Kalman. When is a linear control system optimal? Journal of Basic Engineering, 86(1):51–60, 1964.
[17] Ian M. Mitchell. A toolbox of level set methods. Tech. Rep. TR-2004-09, Department of Computer Science, University of British Columbia, 2004.
[18] Kevin A. Smith and Edward Vul. Sources of uncertainty in intuitive physics. Topics in Cognitive Science, 5(1):185–199, 2013.
[19] Claire J. Tomlin. Towards automated conflict resolution in air traffic control. IFAC Proceedings Volumes, 32(2):6564–6569, 1999.
[20] Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. Maximum entropy inverse reinforcement learning. In AAAI, volume 8, pages 1433–1438, 2008.