Cooperation for Scalable Supervision of Autonomy in Mixed Traffic

by Cameron Hickert, et al.

Improvements in autonomy offer the potential for positive outcomes in a number of domains, yet guaranteeing their safe deployment is difficult. This work investigates how humans can intelligently supervise agents to achieve some level of safety even when performance guarantees are elusive. The motivating research question is: In safety-critical settings, can we avoid the need to have one human supervise one machine at all times? The paper formalizes this 'scaling supervision' problem, and investigates its application to the safety-critical context of autonomous vehicles (AVs) merging into traffic. It proposes a conservative, reachability-based method to reduce the burden on the AVs' human supervisors, which allows for the establishment of high-confidence upper bounds on the supervision requirements in this setting. Order statistics and traffic simulations with deep reinforcement learning show analytically and numerically that teaming of AVs enables supervision time sublinear in AV adoption. A key takeaway is that, despite present imperfections of AVs, supervision becomes more tractable as AVs are deployed en masse. While this work focuses on AVs, the scalable supervision framework is relevant to a broader array of autonomous control challenges.



I Introduction

Given the complexity, chaos, and unpredictability of real-world environments, safety is a critical challenge for autonomous systems. This is particularly the case for mixed autonomy systems, in which humans and machines both exert control in the same environment [1]. A 2018 Allianz Global Assistance survey investigated decreasing interest in self-driving cars, one of the most-discussed mixed autonomy systems today. Over 70% of respondents cited safety as a reason for their lack of interest, up from 65% for the same question a year earlier [2]. Improving safety in mixed autonomy systems is therefore not only an end in itself but also a prerequisite for unlocking the benefits that such systems may offer.

Fig. 1: An illustrative vehicle merging scenario for analyzing scaling of safe supervision of AVs. The vehicle’s route is in blue; it enters via an on‐ramp on the right side of the rotary, completes a 270‐degree route, and exits via the off‐ramp at the rotary’s bottom. The merge point is indicated with the red marker. Vehicles that remain in-ring are shown in grey.

The question becomes: How can we achieve some level of safety and performance even when formal guarantees are elusive in a chaotic world? The research described here takes these matters as a point of departure by asking how we can mitigate risk via human supervision of non-human autonomous agents.

I-A Contributions

This work provides one perspective on the topic of improving safety in mixed autonomy settings by investigating the online human supervision of AVs in an illustrative merging task. To this end, we present a number of contributions that—to the best of our knowledge—have not been addressed in previous work.

  1. We formalize the ‘scaling supervision’ problem for online autonomous agents and propose two related ‘scalability’ metrics: expected supervisor control time and expected required number of supervisors.

  2. We propose a reachability analysis-based method for AV supervision. We provide an upper bound on the expected supervision time required for a supervisor to monitor a given AV’s merge, as well as a closed-form expression for the probability that the number of supervisors is insufficient when merging tasks are pooled together and distributed across multiple human supervisors. We empirically validate the findings using traffic scenarios derived from real-world rotaries.

  3. We show analytically, using order statistics, and numerically, using deep reinforcement learning (RL), that cooperation among AVs enables supervision time sublinear in AV adoption.

II Background

II-A Motivation

Safety in AVs

The importance of safety in any deployment of AVs is, in itself, sufficient justification for investigating the supervision of autonomous agents. If AVs replaced even a modest portion of US vehicles, thousands of lives could depend on AV safety systems [3].

Moreover, AVs are not the only autonomous system with the potential to inflict human harm. The authors anticipate that the approach and lessons learned in this research can inspire future work to improve the safety and performance of autonomous agents in settings beyond society’s roadways, such as in manufacturing and other cyber-physical systems.

AVs’ Benefits to Traffic Systems

While ensuring a sufficient level of safety is an important end in itself, doing so is foundational to the wider adoption of AVs, which can in turn bring benefits including decreased traffic congestion and traffic emissions. AVs can achieve outsized positive impact without dominating roadways. Using AV policies learned via RL, previous work shows that even when AVs account for only 5-10% of vehicles in a traffic network, they can still boost the collective average speed of all vehicles in the network by up to 57% in idealized settings [5]. Researchers found that improving traffic flows in California can reduce highway carbon dioxide emissions by up to nearly 20%; this is particularly significant given that transportation accounts for approximately a third of all US carbon dioxide emissions [6].

More broadly, policies learned via RL have demonstrated superhuman performance in domains from board games to video games to high-altitude balloon control to simulated fighter jet combat [7]-[10]. However, deep RL methods lack the convergence guarantees of more traditional RL approaches [11, 12].


The case of AVs merging onto a highway with traffic is an appropriate task for investigating how to scale supervision, for several reasons.

- Merging is a common occurrence that can pose difficult control and coordination challenges for AVs [13, 14].
- Given the speed at which freeway merges occur, failures in this task can result in deadly crashes.
- Merging presents unique challenges in mixed autonomy settings. If a system consisted entirely of AVs, a solution might exist via decentralized or centralized coordination. Such coordination is much more difficult, if not impossible, for the foreseeable future, in which human drivers still populate the roadways [17].

Other applications

The scalable supervision framework, concepts, and formalization presented here are adaptable to other autonomous machine settings. Within the mixed autonomy traffic environment, scalable supervision could support dedicated AV lanes on freeways. Such lanes have been shown in simulation to improve traffic efficiency while also worsening on-road safety [15]. The potential speed difference between AV-only and standard lanes presents a merging problem similar to the one studied here. Scalable supervision would also be useful for accident-prone ‘hotspots’ (construction sites, school zones, etc.) where concerns arise about the potential behavior of AVs [16]. Each of these is a practical instance of the scenario studied here.

More broadly, reachability-based supervision scaling that enforces safety constraints is applicable to domains as varied as warehouses, ports, airports, satellite orbits, and disaster response.

II-B Related Work

This research draws on related work in at least four areas of the literature: human-AI teaming, supervision and cognitive models, reachability analysis, and reinforcement learning.

Human-AI teaming

Relevant work includes research from the human-robot interaction subfield in multi-agent systems. That subfield explores how to keep a single operator’s cognitive load from scaling linearly with the number of robots. A significant portion of this work asks whether well-designed interfaces can reduce task burden. It also asks whether learned models of human preferences can automate human-machine control handoff (also called sliding, shared, or adjustable autonomy) [18]. Our work expands this effort by considering the shared autonomy situation in which strict safety guarantees must be maintained, and extends this line of work into the mixed autonomy traffic setting.

Research such as that described in [19] seeks to optimize the allocation of limited human assistance (via a decision support system) to multiple robots performing tasks in a given environment. The machines in these works benefit from human assistance but are not subject to the strict safety-critical requirements of AVs in traffic. As such, these works also place greater emphasis on computing and comparing the various assistance permutations. Such comparisons are unnecessary given our binary approach to safety and our strict safety requirements (i.e., no suboptimal safety configurations are permitted).

Scalable supervision and cognitive models

To the best of our knowledge, previous work concerned with decreasing the time burden of human supervision of autonomous tasks has focused on empirical studies of a single human monitoring multiple UAVs (unmanned aerial vehicles) [20, 21, 22]. Relatedly, human-robot interaction theory developed in the early 2000s considers ‘fanout,’ the number of robots that a human can effectively control [21, 22, 23, 24, 25, 26]. That line of work is philosophically aligned with ours in some respects. However, in the AV merge setting we isolate a single task to analyze, and we face safety challenges unique to the mixed autonomy setting. Additionally, the mixed autonomy AV setting considered here is arguably higher-stakes than a typical UAV setting, as it includes human drivers operating in the same vicinity; crashes would be more likely to involve human fatalities.

A closely related work is that on scaled autonomy by Swamy et al., in which the authors investigate how to assist an operator in selecting which robots in a fleet most require teleoperation [27]. Our work differs in two respects. First, it seeks to minimize the supervisor’s active control time (the teleoperator in that work is never idle). Second, the mixed autonomy setting at its heart must maintain strict safety standards.

This work extends previous scalable supervision research currently under submission to the IEEE International Conference on Robotics and Automation [28].

Reachability analysis

At its most basic, reachability analysis identifies the set of states that a dynamical system could enter (the ‘reachable states’ or ‘reachable set’), given all admissible inputs and parameters [31]. A variety of methods exist for doing so; a common trade-off among them is between the accuracy of the approximation and computational cost [32, 33]. Methods such as sampling-based approaches may require less computation, but they are not fully conservative, meaning they are not guaranteed to capture the entire reachable set [33]. For safety-critical applications, such approximations may be unacceptable.

One method for computing the exact reachable set is Hamilton-Jacobi (HJ) reachability. HJ methods can handle nonlinear system dynamics and allow for formal treatment of bounded disturbances. However, their computational complexity grows exponentially with the number of state variables [34]. Fortunately, the vehicle dynamics and the structure of the merging problem allow us to adopt a fully conservative approach while avoiding this computational complexity. The proposed reachability-based method calculates the exact reachable set for each vehicle over time. Other mechanisms for computing conservative reachable sets (such as HJ reachability) could nonetheless be used in future work that generalizes to more scenarios [34]. Other work that integrates reachability analysis and AVs relaxes the conservatism of its approach. It does so either by making assumptions about human driving behaviors [35] or by not accounting for the full range of system dynamics [36]. The need remains for an approach that maintains strict safety guarantees.

Reinforcement learning (RL)

RL agents learn a control policy by interacting with their environment and observing the states they visit and the rewards (or penalties) they accumulate along the way. The introduction of deep neural network architectures has allowed RL agents to achieve superhuman performance in a variety of domains, but at the expense of formal convergence guarantees [11, 12].

Previous work on ‘Safe RL’ has largely focused on supervising an RL agent’s learning process rather than on online (synchronous) supervision at test time. Given the amount of training required for RL agents, this supervision can require weeks of human labor [37]. Other methods use Lyapunov functions to achieve safer exploration during training, but here again the work centers on the agent’s learning rather than its performance at test time [38, 39]. Yet the safety of already-trained deep RL systems is far from assured, given issues such as catastrophic forgetting and generalization to previously unseen states or new tasks [40, 41]. In this work, we leverage RL to model a performant yet imperfect (unsafe) AV. We propose combining reachability analysis with human supervision as one means of addressing safety at execution time for a controller trained using RL.

III Implications for Autonomous Driving

AV merging is an interesting problem in itself, but it also presents a compelling opportunity for achieving significant supervision scaling. It is worth pausing to appreciate the broader context of these improvements. It is also worth appreciating the leveraging effect that can be achieved in the AV setting by adopting autonomy with supervision assistance for the driver. This context provides further motivation for investigating scalable supervision of mixed autonomy systems in the specific context of AVs. It also serves as a case study for determining whether supervision scaling can achieve practical gains in a given setting. In so doing, this analysis provides a framework that could be used to evaluate other tasks for scalable supervision.

III-A Improved safety and user experience

AVs provide a compelling case for investigating scalable supervision because short, particularly dangerous events (such as merging or navigating a construction zone) are interspersed within longer, easier events (such as cruising on a highway). The appeal does not stem only from the presence of a human who can assume control of the vehicle. Imagine a future where humans trust their AVs to handle mundane driving, such as on an interstate highway. Suppose an intervention spares a human driver from lifting a finger during a five-second merge. If that merge links 10-minute stretches on two separate highways, the intervention spares her not only five seconds of interruption but, in effect, 20 minutes.

Certainly, unforeseen events can occur at any point in the driving process, but such events are beyond the scope of this research, which focuses on the illustrative, pre-specified event of merging. Other instances that present increased risk or unpredictability but can be anticipated include construction zones, school zones, corridors particularly congested with foot traffic, and known traffic accident hotspots [16].

III-B Informed fleet operations

Beyond the improved driving experience and safety for an individual, such scaling might, further in the future, be a crucial element enabling the widespread deployment of autonomous vehicles. It would thus also be crucial for achieving the environmental and social benefits outlined previously. Consider a company that seeks to provide a fully hands-off driving experience. Even if the company’s AVs are generally quite effective at navigating roadways, the firm may look for additional safety measures. The difficulty of some tasks like merging, and consumers’ desire for additional safety, would both motivate this. If it were sufficiently economical (and assuming low-latency connections, etc.), one option might be to hire remote supervisors to monitor the AV fleet and provide real-time control when necessary. The company would then care about reductions in the number of interventions and in the time required relative to some baseline. It would also need to know how many remote supervisors it must hire to supervise the entire fleet of AVs.

One principal aim of this article is to provide a rigorous methodology for informing such estimates by refining the estimated time required to supervise AVs. As an illustration, the outcome of the proposed methodology, tailored to the locale and services of an AV operator, could be incorporated into the following estimate for fleet operations. Under some moderate assumptions, one remote supervisor might be able to supervise between 18 and 66 AVs for the task of highway merging in the United States. The logic behind this estimate is as follows. In the US, approximately 33% of total vehicle miles traveled (VMTs) occur on interstates, other freeways, or expressways, where “access and egress points are limited primarily to on- and off-ramps” [42, 43, 44]. Assume conservatively that the average speed on these roads is 50 mph and the average trip length is 15 miles. Each trip then takes 0.3 hours (18 minutes). Suppose it takes a supervisor 60 seconds to assume control and execute a merge safely, and each trip involves one merge. A supervisor is then needed for 60 s / 1080 s ≈ 5.56% of every trip. Taking the inverse, one supervisor could theoretically monitor merges for 18 highway vehicles if tasks are allocated without gaps or delays. Interstates, other freeways, and expressways account for only a third of all VMTs, and the majority of vehicles never make a highway merge. One supervisor could therefore supervise up to 54 vehicles overall (three times the 18 highway vehicles). These numbers increase to 22 highway vehicles and 66 vehicles overall if we assume an average freeway speed of 40 mph.

IV Formalizing Scalable Supervision

Fig. 2: An illustration of the reachability conditions that would activate a supervisor. The yellow zones represent the distances each vehicle can reach over the time horizon. When the merge point (marked in red) is within both vehicles’ reachable zones, an accident is possible within the time horizon, and the supervisor activates.

In this section, we formalize scalable supervision, delineate between static and dynamic supervision, and provide theoretical results.

IV-A Formalizing Scalable Supervision

Scaling supervision has two core goals: to reduce the number of necessary human interventions and to reduce the time required for those that must occur. Our approach uses reachability analysis to identify when a merging AV is in danger of colliding with another vehicle in the system, and activates human control of the AV at that time. It is also important to note that ‘safety’ in this paper is not synonymous with crash-free behavior. Since the safety backup for the AV is a human driver, the ultimate safety of the system remains subject to human fallibility.

To understand how our approach works, first imagine an AV on an on-ramp merging into a rotary in which a human vehicle (HV) is already driving. The objective is to prevent the two vehicles from colliding during the AV’s merge. The reachability question, simply put, is therefore whether both the AV and the HV could reach the merge point within some finite time horizon $\tau$. Underlying the horizon is the intuition that a supervisor need not assume control of the AV if a potential collision is far in the future. It is enough to assume control before the potential crash, with enough time to avoid it comfortably. We can vary $\tau$ to be more or less cautious in any given scenario. In at least one setting, humans required 5 to 8 seconds to safely assume control of a vehicle, and we adopt this time window in our experiments [45].

The reachability question naturally decomposes into two independent subquestions: (1) Can the AV reach the merge point, and (2) Can the HV reach the merge point? That is, for each vehicle, is the distance it would travel if it were to apply the maximum acceleration over the horizon greater than or equal to the distance between that vehicle’s current position and the merge point? If the answer to both subquestions is ‘yes,’ then human supervisor control of the AV must be activated to ensure safety. See Figure 2 for a visual representation.

We can write this more formally. Denote the nearest in-ring vehicle by the subscript ir. To guarantee avoiding a collision with that vehicle, the supervisor must assume control of a merging AV when

$$r_{\mathrm{AV}}(\tau) \ge d_{\mathrm{AV}} \quad \text{and} \quad r_{\mathrm{ir}}(\tau) \ge d_{\mathrm{ir}}, \tag{1}$$

where $r_i(\tau)$ is the maximum distance vehicle $i$ can travel in time horizon $\tau$ and is a function of that vehicle’s initial velocity $v_i$ and its maximum acceleration $a_i$, and $d_i$ is the distance between vehicle $i$ and the merge point $m$. The road networks used for experimentation are limited to one-lane roads, so considering a vehicle’s maximum forward distance is sufficient. This amounts to the most conservative reachability for the given time horizon: there is no area that the vehicle could reach over time $\tau$ that is not within its reachability zone. Note that this maintains safety guarantees even when generalizing to cases in which 2-dimensional motion is considered (such as by adding lane changes). The maximum distance reachability formulation still provides a conservative reachable zone because side-to-side motion can only reduce, rather than increase, the distance the vehicle travels along the road.
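The activation rule in Equation 1 amounts to two comparisons. The sketch below assumes the conservative kinematic reachable distance $v_i\tau + \tfrac{1}{2}a_i\tau^2$, i.e., full acceleration over the whole horizon, which the paper uses when it rewrites Equation 1 using kinematics. It also assumes SI units and single-lane forward motion. The `Vehicle` type and the function names are hypothetical, and the example numbers are illustrative rather than taken from the experiments.

```python
from typing import NamedTuple


class Vehicle(NamedTuple):
    speed: float          # current speed v_i (m/s)
    max_accel: float      # maximum acceleration a_i (m/s^2)
    dist_to_merge: float  # distance d_i to the merge point along the road (m)


def reachable_distance(veh: Vehicle, tau: float) -> float:
    """Farthest the vehicle can travel in tau seconds if it accelerates at max_accel throughout."""
    return veh.speed * tau + 0.5 * veh.max_accel * tau ** 2


def needs_supervision(merging_av: Vehicle, nearest_in_ring: Vehicle, tau: float) -> bool:
    """Activate the supervisor only if BOTH vehicles could reach the merge point within tau."""
    av_can_reach = reachable_distance(merging_av, tau) >= merging_av.dist_to_merge
    ring_can_reach = reachable_distance(nearest_in_ring, tau) >= nearest_in_ring.dist_to_merge
    return av_can_reach and ring_can_reach


av = Vehicle(speed=20.0, max_accel=3.0, dist_to_merge=100.0)
hv = Vehicle(speed=25.0, max_accel=3.0, dist_to_merge=200.0)
print(needs_supervision(av, hv, tau=5.0))  # False: the HV reaches at most 162.5 m < 200 m
print(needs_supervision(av, hv, tau=6.0))  # True: the HV can now reach 204 m >= 200 m
```

Lengthening the horizon from 5 s to 6 s, both within the 5 to 8 second takeover window mentioned above, flips the decision. This shows how $\tau$ trades caution against supervision time.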

IV-B Scenario

Because this work focuses on merging, the experiments utilize a large rotary traffic network, which approximates a highway with multiple on-ramps and off-ramps. The ring road at the core of the structure is a well-studied setting in traffic literature, and has been shown to mimic traffic congestion patterns that might occur on an infinite roadway [46]. The ring therefore emulates a highway, and the speed limit is set accordingly. Merging vehicles enter the system via the on-ramps, merge into the ring road, and then exit via off-ramps (see Figure 1).

Since this is a mixed autonomy system, we simulate both AVs and human vehicles (HVs). The HVs are modeled using the widely used Intelligent Driver Model (IDM), which has been shown to emulate the actual behavior of human drivers [47, 48]. Next, we model connected AVs—also using IDM—to investigate the supervision scaling implications of connected AVs. We will see that they allow for linear improvements in supervision scaling. Lastly, we include in-ring AVs controlled using a policy learned via RL. These represent supervision-aware, cooperative AVs—the RL agent is incentivized during learning to avoid triggering the merging vehicle’s supervisor, and thus learns to accommodate the incoming vehicle’s merge. Thus, using both theoretical and empirical results, we show how the teaming of AVs enables supervision time sublinear in AV adoption.
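For readers unfamiliar with IDM, the sketch below shows the standard form of its acceleration rule, which both the HVs and the connected AVs follow in this setup. The parameter values are common textbook defaults chosen for illustration. They are assumptions, not the values used in the paper's simulations, and `idm_acceleration` is a hypothetical helper name.

```python
import math


def idm_acceleration(v, v_lead, gap, v0=30.0, T=1.5, a=1.0, b=1.5, delta=4, s0=2.0):
    """Intelligent Driver Model acceleration (m/s^2) for a following vehicle.

    v, v_lead: follower and leader speeds (m/s); gap: bumper-to-bumper gap (m).
    Illustrative defaults: desired speed v0, time headway T, maximum
    acceleration a, comfortable deceleration b, exponent delta, jam distance s0.
    """
    dv = v - v_lead  # approach rate: positive when closing in on the leader
    # Desired gap; the dynamic term is clamped at zero so the desired gap never falls below s0.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a * b)))
    return a * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


# Free road: a stopped car accelerates at nearly the maximum rate a.
print(idm_acceleration(v=0.0, v_lead=0.0, gap=1e6))
# Fast approach to a slower leader at a short gap: strong braking (negative value).
print(idm_acceleration(v=20.0, v_lead=10.0, gap=5.0))
```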

V Theoretical Results

By applying reachability analysis to the problem of AV merging, a number of bounds can be derived for the settings in which we are interested.

This analysis begins with an upper bound on the supervision requirements when a group of remote supervisors manages merges for an arbitrarily large number of on-ramps. It also gives a closed-form expression for the probability that AVs need supervision but cannot receive it. This result is of particular interest because it characterizes the risk associated with a system that has a fixed number of supervisors. Conversely, it allows one to calculate the number of supervisors necessary to achieve a desired level of supervision safety. The upper bound is followed by an analysis of the ‘typical case,’ which describes the expected supervision scaling gains from cooperative, supervision-aware AVs.

One necessary component of the remote supervisor analysis is a characterization of $p$, that is, the probability that the merge point falls within the reachable zone of at least one vehicle within the rotary. The analysis further below investigates this by considering how an upper bound on $p$ varies across three settings. In these settings the in-ring vehicles are entirely HVs, a mix of HVs and connected AVs, and finally a mix of HVs and supervision-aware, cooperative AVs.

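It is important to first note that we can rewrite Equation 1 using kinematics. We allow each vehicle $i$ to take any non-negative velocity $v_i \ge 0$ and assume only a finite maximum acceleration $a_i$. Preserving this conservative view of reachability thus requires accounting for the worst possible case, in which the vehicle accelerates at $a_i$ for the entire horizon. The reachable set for vehicle $i$ is therefore the stretch of road from its current position to a distance $v_i\tau + \tfrac{1}{2}a_i\tau^{2}$ ahead.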

This conservative approach to reachability has the ancillary benefit that vehicle $i$’s reachable distance in the given time horizon can be written simply:

$$r_i(\tau) = v_i\tau + \tfrac{1}{2}a_i\tau^{2}. \tag{2}$$

With this in mind, we turn to the challenge of characterizing the multiple-supervisor, multiple-merge case.

V-A Multiple supervisors monitoring an AV fleet

We consider a future setting in which a team of human supervisors located remotely could monitor a fleet of AVs and assume remote control if necessary. A natural question would be: how many supervisors are necessary?

Theorem 1.

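Suppose we have $k$ remote supervisors and $N$ on-ramps on which AVs appear and trigger the on-ramp supervision condition with arrival processes $\mathrm{Poisson}(\lambda_j)$, $j = 1, \dots, N$. Suppose the service time of each remote supervisor follows $\mathrm{Exp}(\mu)$, i.e., each supervisor serves at rate $\mu$.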

The fraction of AVs that require supervision but cannot immediately receive it (and thus go unsupervised) is given by

$$P_{\mathrm{loss}} = \frac{\rho^{k}/k!}{\sum_{u=0}^{k}\rho^{u}/u!}, \tag{3}$$

where $\rho = p\sum_{j=1}^{N}\lambda_j/\mu$, $p$ is the probability that an in-ring vehicle will trigger its supervision condition, and $\mu > 0$ (so that $\rho$ is finite) in order to have a valid steady state probability.

Proof. Let $X$ be the random variable denoting the number of AVs requiring control at an arbitrary point in time.

Allow different on-ramps to have different arrival rates $\lambda_j$. The supervision tasks are allocated to a centralized group of remote supervisors. The problem therefore reduces from a tasking situation involving $N$ separate queues to a single-queue tasking case. Given Equation 27 in the appendix, the arrival process of AVs requiring supervision is $\mathrm{Poisson}\big(p\sum_{j=1}^{N}\lambda_j\big)$. Set $\lambda = p\sum_{j=1}^{N}\lambda_j$.

As a specific example (to simplify notation), assume $\lambda_j = \lambda_0$ for some constant $\lambda_0$. The AVs arriving on the $j$-th on-ramp that trigger the supervision condition then follow $\mathrm{Poisson}(p\lambda_0)$ (again, see Equation 27 in the appendix). Let $X$ be a random variable denoting the number of AVs that need to be supervised. We know $\lambda = Np\lambda_0$, where $p$ can be obtained from the in-ring vehicle’s reachability condition and will be derived further below.

By Equation 26 (see Appendix), $\mathrm{Poisson}(\lambda)$ is the arrival process for AVs that need to be supervised. The problem then reduces to an $M/M/k/k$ queue with arrival process $\mathrm{Poisson}(\lambda)$, service rate $\mu$ for each supervisor, and finite capacity $k$. The finite capacity arises because, when all supervisors are busy, any additional on-ramp AVs requiring supervision are rejected immediately. These cases cannot wait in the queue for future service because their needs are immediate. To maintain safety, once the on-ramp and in-ring conditions are triggered for a given merge, a supervisor must supervise the merge immediately. Thus, the rejected cases that the pool of supervisors cannot immediately service represent dangerous situations.
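This reduction can be checked with a small discrete-event simulation. The simulation pools the on-ramp streams and thins them by the in-ring trigger probability $p$. It then rejects any request that arrives while all $k$ supervisors are busy. This is a sketch under the theorem's assumptions: independent Poisson arrivals, exponential service, and no waiting room. The function name and inputs are hypothetical. Its estimate can be compared against the closed-form loss fraction stated in Theorem 1.

```python
import heapq
import random


def simulate_unsupervised_fraction(ramp_rates, p, mu, k, n_arrivals=200_000, seed=0):
    """Monte Carlo estimate of the fraction of supervision requests that are lost.

    ramp_rates: Poisson rates lambda_j at which on-ramp AVs trigger the on-ramp condition.
    p: probability that the in-ring condition is also met (thinning).
    mu: exponential service rate of each supervisor; k: number of supervisors.
    """
    rng = random.Random(seed)
    total_rate = sum(ramp_rates)  # superposed independent Poisson streams are Poisson
    busy_until = []               # min-heap of times at which busy supervisors become free
    t = 0.0
    requests = rejected = 0
    for _ in range(n_arrivals):
        t += rng.expovariate(total_rate)  # next on-ramp trigger from any ramp
        if rng.random() >= p:
            continue                      # in-ring condition not met: no supervisor needed
        requests += 1
        while busy_until and busy_until[0] <= t:
            heapq.heappop(busy_until)     # release supervisors who have finished
        if len(busy_until) < k:
            heapq.heappush(busy_until, t + rng.expovariate(mu))
        else:
            rejected += 1                 # nobody free: the request cannot wait, so it is lost
    return rejected / requests if requests else 0.0


print(simulate_unsupervised_fraction([1.0, 1.0, 2.0], p=0.5, mu=1.0, k=2))
```

For example, take three ramps with rates 1, 1, and 2 per minute, $p = 0.5$, $\mu = 1$ per minute, and $k = 2$. These give $\rho = 2$. The loss formula then gives $\frac{2^2/2!}{1 + 2 + 2^2/2!} = 0.4$, and the simulated fraction should land close to that value.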

For an $M/M/k/k$ queue with no waiting space (that is, customers who cannot be served immediately are turned away), both the steady state probability and the loss formula are known [49]. The steady state probability is

$$\Pr(X = x) = \frac{\rho^{x}/x!}{\sum_{u=0}^{k}\rho^{u}/u!}, \qquad x = 0, 1, \dots, k, \tag{4}$$

where $\rho = \lambda/\mu$. Here $x$ represents the number of AVs requiring supervision, $k$ is the total number of supervisors, the arrival rate is set to $\lambda = p\sum_{j=1}^{N}\lambda_j$, and $\mu$ is kept as the service rate. The loss formula, $\Pr(X = k)$, represents the fraction of AVs that require supervision but cannot immediately receive it (and thus go unsupervised), and the result follows. ∎

Based on Equation (3), the expected number of supervisors needed is the smallest $k$ whose loss fraction is at most $\varepsilon$, for some risk threshold $\varepsilon$ of interest. In short, this answers the question, “How many supervisors do I expect to need?”, assuming the questioner can define their risk threshold $\varepsilon$.
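Choosing $k$ from Equation (3) is a short search. The sketch below is an illustration rather than the authors' procedure. It evaluates the loss formula with the standard Erlang B recursion $B_0 = 1$, $B_u = \rho B_{u-1}/(u + \rho B_{u-1})$. This recursion is algebraically equivalent to Equation (3) but avoids large factorials. The helper names and the example rates are hypothetical.

```python
def offered_load(ramp_rates, p, mu):
    """rho = p * sum(lambda_j) / mu: pooled on-ramp triggers, thinned by the in-ring probability p."""
    return p * sum(ramp_rates) / mu


def erlang_b(rho, k):
    """Fraction of supervision requests lost in an M/M/k/k system (the loss formula)."""
    b = 1.0  # B_0: with zero supervisors every request is lost
    for u in range(1, k + 1):
        b = rho * b / (u + rho * b)
    return b


def min_supervisors(rho, eps, k_max=100_000):
    """Smallest k whose loss fraction is at most eps."""
    k, b = 0, 1.0
    while b > eps:
        k += 1
        if k > k_max:
            raise ValueError("eps not reached within k_max supervisors")
        b = rho * b / (k + rho * b)
    return k


rho = offered_load([1.0, 1.0, 2.0], p=0.5, mu=1.0)  # rho = 2.0
print(erlang_b(rho, 2))                 # about 0.4
print(min_supervisors(rho, eps=0.05))   # 5
```

The search reuses the recursion incrementally, so finding $k$ costs one update per candidate supervisor rather than re-evaluating the full sum each time.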

In the above, observe that—due to the functional reduction of the inflows to a single task queue serviced by the supervisors—the exact number of on-ramps does not directly affect the steady state probability or loss formula, except insofar as the individual arrival rates contribute to the global arrival rate.

Although the theorem in this section is focused on a remote supervision scenario, the empirical study (Section VI) is focused on the nearer-term scenario in which each AV has its own on-board driver supervising it. We leave empirical analysis of the remote supervisor scenario for future work.

Finally, note the importance of the in-ring reachability probability $p$ in determining whether an AV requires supervision. This follows naturally from the problem formalization, in which both the in-ring and on-ramp supervision conditions (reachability conditions) must be met for a merging AV to require supervision. The following sections provide insight into this term.

V-B In-ring reachability for all-HV rotaries

Lemma 1.

Given a single-lane ring road of circumference $L$ with $n$ vehicles distributed uniformly at random (but not necessarily independently) along the length of the ring, the probability $p$ that an arbitrary fixed point on that ring is reachable over time horizon $\tau$ by a vehicle is upper bounded as follows:

$$p \le \sum_{i=1}^{n}\frac{r_i(\tau)}{L}, \tag{5}$$

where $r_i(\tau)$ is the reachable range for vehicle $i$ over the time horizon.

Proof. Suppose $m$ is the fixed merge point. Let $d_i$ be the distance between the $i$-th in-ring HV and $m$ (the asymmetric distance following the traffic flow). Then the event $E$ that $m$ is reachable by some in-ring HV satisfies

$$E = \bigcup_{i=1}^{n} E_i = \bigcup_{i=1}^{n}\big\{d_i \le r_i(\tau)\big\}, \tag{6}$$

where $E_i$ denotes the event that HV $i$ triggers the supervision condition. The first equality holds because $E$ occurs if any of the $n$ HVs triggers the supervision condition, and the second comes from the reachability analysis for each HV.

Applying a union bound (Boole’s inequality) to the union event in Equation 6, we have

$$p = \Pr(E) \le \sum_{i=1}^{n}\Pr\big(d_i \le r_i(\tau)\big) = \sum_{i=1}^{n}\frac{r_i(\tau)}{L}, \tag{7}$$

where the last equality (for $r_i(\tau) \le L$) follows from our assumption that each vehicle’s location is distributed uniformly at random along the length of the ring, i.e., $d_i \sim \mathcal{U}(0, L)$. Note, however, that this does not require each vehicle’s location to be independent of other vehicles’ positions. ∎

The uniform distribution assumption above may be satisfied both by (a) a situation in which traffic is flowing freely around the ring, and (b) a setting with stop-and-go traffic where the congestion is equally likely to occur at any point within the ring. A case not covered by the lemma is one in which congestion routinely occurs around the merge point, but this case is beyond the current scope of work given our focus on safety in high-speed merges.
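Lemma 1 can be sanity-checked by Monte Carlo. The sketch below uses hypothetical helper names and illustrative reach values. It samples vehicle positions in two ways. In the first, positions are independent. In the second, vehicles form an evenly spaced platoon rotated by a single random offset, so each position is marginally uniform but all positions are fully dependent. With independent positions the bound is loose. With the platoon, when the reachable intervals cannot overlap, it is tight. This illustrates why the lemma does not need independence.

```python
import random


def lemma1_bound(reach, L):
    """Union bound sum_i r_i / L, capped at 1 because it bounds a probability."""
    return min(1.0, sum(r / L for r in reach))


def estimate_reach_prob(reach, L, trials=100_000, platoon=False, seed=0):
    """Estimate Pr(some vehicle i has d_i <= r_i), with d_i its distance to the merge point."""
    rng = random.Random(seed)
    n = len(reach)
    hits = 0
    for _ in range(trials):
        if platoon:
            offset = rng.uniform(0.0, L)  # one shared draw: marginally uniform, fully dependent
            d = [(offset + i * L / n) % L for i in range(n)]
        else:
            d = [rng.uniform(0.0, L) for _ in range(n)]  # independent uniform positions
        if any(di <= ri for di, ri in zip(d, reach)):
            hits += 1
    return hits / trials


reach = [50.0] * 4  # four vehicles, each able to cover 50 m within the horizon
L = 1000.0
print(lemma1_bound(reach, L))                       # union bound: 4 * 50 / 1000
print(estimate_reach_prob(reach, L))                # near 1 - 0.95**4, about 0.185
print(estimate_reach_prob(reach, L, platoon=True))  # near the bound: intervals are disjoint
```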

Lemma 2.

Given a single-lane ring road of circumference $L$ with $n$ vehicles distributed uniformly at random (but not necessarily independently) along the length of the ring, and a merging vehicle distributed uniformly at random along an on-ramp of length $L_r$, the joint probability that the on-ramp's merge point into the ring road is reachable over time horizon $\tau$ by both an in-ring vehicle and the on-ramp vehicle is upper bounded as follows:

$$P(\text{in-ring and on-ramp reachability}) \le \left(\sum_{i=1}^{n} \frac{R_i}{L}\right) \frac{R_0}{L_r},$$

where $R_i$ is the reachable range for vehicle $i$ over the time horizon, and index $0$ denotes the merging vehicle.


Without loss of generality, we assume a sufficiently long on-ramp, i.e., $L_r \ge R_0$.

The above follows from analyzing the joint probability of independent events: the first factor is the in-ring bound from Lemma 1. The rightmost term, $R_0 / L_r$, is the probability that an arbitrary point on the on-ramp is within the reachable zone of the merging vehicle over horizon $\tau$, and follows similar logic to that used for writing $P(D_i \le R_i) = R_i / L$. ∎
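A companion sketch for Lemma 2, again assuming independent uniform positions and hypothetical values (18 in-ring vehicles with 50 m reach on a 1200 m ring, a merging vehicle with reach R_0 = 50 m on an on-ramp of length L_r = 200 m). The bound is the Lemma 1 term multiplied by R_0 / L_r.

```python
import random


def lemma2_bound(ring_ranges, ring_length, ramp_range, ramp_length):
    """(sum_i R_i / L) * (R_0 / L_r); assumes ramp_length >= ramp_range."""
    return (sum(ring_ranges) / ring_length) * (ramp_range / ramp_length)


def estimate_joint(ring_ranges, ring_length, ramp_range, ramp_length,
                   trials=50_000, seed=0):
    """Monte Carlo estimate with independent ring and ramp positions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # merging vehicle's distance to X along the ramp ~ U[0, L_r]
        ramp_ok = rng.uniform(0.0, ramp_length) <= ramp_range
        ring_ok = any(rng.uniform(0.0, ring_length) <= r for r in ring_ranges)
        hits += ramp_ok and ring_ok
    return hits / trials


print(lemma2_bound([50.0] * 18, 1200.0, 50.0, 200.0))   # 0.1875
print(estimate_joint([50.0] * 18, 1200.0, 50.0, 200.0))
```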

Consequently, this also relies upon the assumption that the likelihood of finding the merging vehicle in a given position is evenly distributed across the on-ramp. This assumption is more tenuous in this case than it was in Lemma 1 because in situations of interest (such as when traffic exists on the highway) the AV will often have to yield. This would cause the AV to spend a disproportionately large amount of time just before the merge point.

V-C In-ring reachability for mixed HVs and connected AVs

Now consider how a mixed autonomy system with connected AVs may improve upon the all-HV case. These AVs may communicate their near-term trajectories, but are not assumed to alter their trajectories to avoid triggering supervision in the system. That is, they do not actively accommodate merging vehicles.

Corollary 1.

Given a single-lane ring road of circumference $L$ with $n$ unconnected vehicles and $S$ connected AVs distributed uniformly at random (but not necessarily independent of each other) along the length of the ring, the probability that an arbitrary point on that ring is reachable over time horizon $\tau$ by a vehicle is upper bounded as follows:

$$P(\text{point reachable}) \le \sum_{i=1}^{n} \frac{R_i}{L} + \sum_{j=1}^{S} \frac{\ell_j}{L},$$

where $R_i$ is the reachable zone for vehicle $i$ over the time horizon and $\ell_j$ is the length of connected AV $j$. Note that one may define $\ell_j$ to include a safety buffer.


Again, suppose $X$ is the fixed merge point. Furthermore,

  • Let $D_i$ be the distance between the in-ring HV ($i = 1, \dots, n$) and $X$ at the current time $t$.

  • Let $A_j(t+\tau)$ be the distance between the in-ring connected AV ($j = 1, \dots, S$) and $X$ at future time $t+\tau$. Due to the connectivity, we know the trajectories of the AVs and hence the locations of the in-ring AVs at future time $t+\tau$, so AV $j$ triggers supervision if its future location at time $t+\tau$ is within distance $\ell_j$ (its length) of the merge point.

  • Assume $D_i \sim U[0, L]$ and $A_j(t+\tau) \sim U[0, L]$.

Then we similarly define $E$ as the event that any of the HVs or AVs triggers supervision:

$$E = \bigcup_{i=1}^{n} \{D_i \le R_i\} \;\cup\; \bigcup_{j=1}^{S} \{A_j(t+\tau) \le \ell_j\}. \tag{10}$$

Applying a union bound (Boole's inequality) to the joint probability event in Equation 10 results in

$$P(E) \le \sum_{i=1}^{n} P(D_i \le R_i) + \sum_{j=1}^{S} P(A_j(t+\tau) \le \ell_j) = \sum_{i=1}^{n} \frac{R_i}{L} + \sum_{j=1}^{S} \frac{\ell_j}{L}.$$

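Corollary 1 implies that swapping an HV for a connected AV changes that vehicle's term from R/L to ℓ/L, so the bound falls linearly with the number of connected AVs. A minimal sketch with hypothetical values (18 vehicles on a 1200 m ring, HV reach R = 50 m, AV length plus buffer ℓ = 5 m):

```python
def corollary1_bound(hv_ranges, av_lengths, ring_length):
    """sum_i R_i / L + sum_j l_j / L for HVs and connected AVs."""
    return (sum(hv_ranges) + sum(av_lengths)) / ring_length


# Hypothetical parameters, loosely echoing the 18-vehicle ring of Section VI.
N, L, R, ELL = 18, 1200.0, 50.0, 5.0
bounds = [corollary1_bound([R] * (N - k), [ELL] * k, L) for k in range(N + 1)]
for k in (0, 6, 12, 18):
    print(k, round(bounds[k], 4))
# 0 0.75
# 6 0.525
# 12 0.3
# 18 0.075
```

Each replacement lowers the bound by the constant step (R − ℓ)/L = 0.0375, which is the linear scaling examined in Experiment 2.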
It is worth further examining the connected AVs' ability to predict and communicate their trajectories. Mixed autonomy settings present unique challenges to trajectory planning: while AVs can plan their own trajectories over a given time horizon in isolation, the human drivers on the road are unpredictable. How can AVs predict their own trajectories when sharing the road with human drivers?

First, note that faster-than-expected HVs in this setting do not pose a problem for the connected AVs’ trajectory planning; indeed, if they speed further ahead, the AVs have more space. The instances which might be problematic are those in which an HV quickly slows.

However, even here the problems subside upon further analysis. If an HV sharply brakes far ahead of the merge point, the connected AVs behind it are also far from the merge point, and thus do not pose a collision risk for the merging vehicle. If an HV sharply brakes far after the merge point, no connected AV near the merge point will be substantially affected, especially not immediately.

Thus the instances of concern are further limited to those cases in which an HV rapidly brakes near the merge point when an on-ramp vehicle is about to merge. However, if the HV of concern is just prior to the merge point, note that it will have already triggered the supervisor, so any adjustment of a connected AV’s propensity to trigger the supervision condition is redundant. (Recall that once a supervisor is triggered, it supervises the entire merge.)

The remaining case is that in which an HV has just passed the merge point when it rapidly decelerates. Yet the only way an HV manages to pass the merge point without triggering supervision is if the merge point is beyond the on-ramp vehicle’s reachable zone during the entire time that same point is within the HV’s reachable zone. Therefore, even in this final case of concern, if the HV behaves erratically, there remains the entirety of the reachability time horizon for the connected AV to respond—and likely significantly more, given that the merging vehicle’s reachable zone assumes its maximum acceleration.

Of course, as discussed previously, unforeseen events can occur, but that is beyond the scope of this research focusing on the merge event.

V-D In-ring reachability for mixed HVs and supervision-aware AVs: worst case

Now consider how a mixed autonomy system with AVs that cooperate to avoid triggering supervision may improve upon the previous cases. We first consider a worst case improvement when the AVs are cooperative rather than agnostic to the supervision task. Then, in Section V-E, we consider the improvement given a typical (average) case with the cooperative AVs.

Corollary 2.

Given a single-lane ring road of circumference $L$ with $n$ unconnected vehicles and $S$ supervision-aware AVs distributed uniformly at random along the length of the ring, the probability that an arbitrary point on that ring is reachable over time horizon $\tau$ by a vehicle is upper bounded as follows:

$$P(\text{point reachable}) \le \sum_{i=1}^{n} \frac{R_i}{L},$$

where $R_i$ is the reachable zone for vehicle $i$ over the time horizon.

That is, the previous upper bound in Corollary 1 is improved by dropping the $\sum_{j=1}^{S} \ell_j / L$ term, where $\ell_j$ is the length of AV $j$.


Again, suppose $X$ is the fixed merge point. Furthermore, let $D_i$ be the distance between the in-ring HV ($i = 1, \dots, n$) and $X$ at the current time $t$. We assume $D_i \sim U[0, L]$.

Given the fixed merge point and perfect control of its trajectory during the planning interval, so long as that interval is sufficiently long, there exists a control input (a sequence of accelerations over the planning horizon) such that the supervision-aware cooperative AV $j$ is not at the merge point at time $t+\tau$ with probability 1. So the event $\{A_j(t+\tau) \le \ell_j\}$, where $A_j(t+\tau)$ is AV $j$'s distance to $X$ at time $t+\tau$, i.e., AV $j$ triggering supervision, cannot occur. Hence the upper bound above improves on the one in Corollary 1 by the $\sum_{j=1}^{S} \ell_j / L$ term.

In the worst case, the order of HVs and AVs (with respect to the merge point) is adversarially distributed, such that (1) all the HVs are asymmetrically closer than all AVs to the merge point (when moving in the forward direction), and (2) all HVs have non-overlapping reachability zones. In this case the AVs cannot influence any HV's behavior at the merge point, so it is possible for any HV to trigger the supervision condition. Furthermore, the HVs' combined reachability zone achieves its maximum coverage over the ring.

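The event $E$ that an in-ring vehicle triggers supervision is thus due to the HVs:

$$E = \bigcup_{i=1}^{n} \{D_i \le R_i\}.$$

The union bound (Boole's inequality) gives

$$P(E) \le \sum_{i=1}^{n} P(D_i \le R_i) = \sum_{i=1}^{n} \frac{R_i}{L}.$$
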
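The worst case is exactly the configuration in which the union bound is tight. If the whole configuration is rotated uniformly at random around the ring (uniform but dependent positions, which the lemmas allow), the probability that X is reachable equals the covered length of the ring divided by its circumference. A minimal sketch, assuming each vehicle at coordinate p with reach r covers the downstream arc [p, p + r] and r ≤ L; all values are hypothetical.

```python
def covered_length(positions, ranges, ring_length):
    """Length of the set of ring points within some vehicle's reach.

    Traffic flows toward larger coordinates and wraps at ring_length.
    Assumes a non-empty configuration and every reach <= ring_length.
    """
    arcs = []
    for p, r in zip(positions, ranges):
        start = p % ring_length
        end = start + r
        if end <= ring_length:
            arcs.append((start, end))
        else:  # the arc wraps past the ring's origin
            arcs.append((start, ring_length))
            arcs.append((0.0, end - ring_length))
    arcs.sort()
    total = 0.0
    cur_start, cur_end = arcs[0]
    for start, end in arcs[1:]:
        if start > cur_end:   # disjoint: close the current merged block
            total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                 # overlapping: extend the current block
            cur_end = max(cur_end, end)
    return total + (cur_end - cur_start)


L, R, N = 1200.0, 50.0, 18
spread = [k * L / N for k in range(N)]    # non-overlapping zones (worst case)
platoon = [k * 10.0 for k in range(N)]    # heavily overlapping zones
print(round(covered_length(spread, [R] * N, L) / L, 4))   # 0.75
print(round(covered_length(platoon, [R] * N, L) / L, 4))  # 0.1833
```

With evenly spread HVs the coverage equals N·R/L, matching the bound; a tight platoon covers far less of the ring.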
Note that the planning interval is distinct from the reachability time horizon $\tau$, and in practice would likely be much shorter. Requiring it to be sufficiently long simply excludes the case in which the planning interval is too short for the AV to avoid the merge point. For example, if a merging AV appears on the on-ramp when an in-ring, supervision-aware AV is moments before the merge point and moving quickly, there may not be sufficient time for the in-ring AV to brake (and thus block any HVs behind it from interfering with the merge) before its momentum carries it past the merge point.
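Under a constant-deceleration assumption, whether an in-ring AV can still stop short of the merge point and block the HVs behind it reduces to comparing its stopping distance v²/(2b) with its distance to X. A hypothetical sketch: the maximum deceleration b = 4.5 m/s² is an illustrative value, and the sketch covers only the stop-and-block option, not clearing X ahead of the merge.

```python
MPH_TO_MS = 0.44704  # exact miles-per-hour to metres-per-second factor


def stopping_distance(speed: float, max_decel: float) -> float:
    """Distance (m) to stop from `speed` (m/s) at constant deceleration."""
    return speed ** 2 / (2.0 * max_decel)


def can_stop_before_merge(dist_to_merge, speed, max_decel, buffer=0.0):
    """True if braking now keeps the AV (plus a buffer) short of X."""
    return stopping_distance(speed, max_decel) + buffer <= dist_to_merge


v = 50 * MPH_TO_MS                             # the 50 mph limit of Section VI
print(round(stopping_distance(v, 4.5), 1))     # 55.5
print(can_stop_before_merge(30.0, v, 4.5))     # False: momentum carries it past X
print(can_stop_before_merge(100.0, v, 4.5))    # True
```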

V-E In-ring reachability for mixed HVs and supervision-aware AVs: typical case

Because the situation described in Corollary 2 is adversarial, additional improvement in supervision scalability can be obtained under a typical (less adversarial) distribution of HV and AV positions.

In a typical mixed autonomy case, the order of HVs and AVs relative to the merge point is interspersed. As the AVs are supervision-aware and fully cooperative, the AV closest to the merge point may stop to accommodate a merge, and thus any vehicle behind that AV cannot trigger supervision.

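To model this, let $A_j$ be the distance between the in-ring AV $j$ ($j = 1, \dots, S$) and the merge point $X$. Assume $A_j \sim U[0, L]$. Let $A_{(1)} \le A_{(2)} \le \dots \le A_{(S)}$ denote the order statistics. Assume that each vehicle's location is distributed uniformly at random along the length of the ring. Without loss of generality and for ease of exposition, normalize the ring circumference to $L = 1$, take a common reachable range $R_i = R$ for all HVs, and use the shorthand $D_H$ for the distance of an arbitrary HV from the merge point.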

We begin by considering two different distribution schemes for the in-ring vehicles. An additional distribution of interest is included in the appendix.

Fig. 3: Visual representations of the double integration in Equation 17. The two axes represent the random variables $D_H$ and $A_{(1)}$. The bold black border encapsulates the area captured by the double integral, and the green triangle represents the portion of the ring for which HVs that would otherwise pose a threat to a merging vehicle will be blocked. The colors represent the different vehicles' conditional distributions. (a) The uniform distribution case. The even yellow indicates an equal likelihood of HVs at all ring positions relative to the 'blocking' AV. (b) A nonuniform 'platoon' distribution. The red indicates a higher likelihood of HVs immediately behind the 'blocking' AV and the blue indicates a lower likelihood of HVs immediately preceding it. Thus in this second case the AV produces a greater supervision scaling effect in the red zone, but the distribution is somewhat adversarial in that the HVs between the AV and the merge point are more likely to be closer to the merge point.

Uniform vehicle distribution

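First consider the case in which AV locations are independent of each other (i.e., $A_j \perp A_k$ for all $j \ne k$) and all HVs' and AVs' locations are mutually independent. In this case, $A_j \sim U[0, 1]$ for all $j$, and we have from order statistics

$$P(A_{(1)} > y) = (1 - y)^{S},$$

so the probability density function for $A_{(1)}$ is $S(1 - y)^{S-1}$ on $[0, 1]$, where $A_{(1)}$ denotes the distance of the closest AV from the merge point [50]. We also assume $D_H \sim U[0, 1]$, so the probability density function for $D_H$ is $1$ on $[0, 1]$. By independence, we have the joint probability density function $f(x, y) = S(1 - y)^{S-1}$ for $0 \le x, y \le 1$ (and 0 otherwise).

It can therefore be written:

$$P(\text{HV triggers supervision}) = P(D_H \le R,\ D_H \le A_{(1)}),$$

where the constraint $D_H \le A_{(1)}$ appears in the current setting because any HV further away from the merge point than the nearest cooperative AV will not trigger the supervisor. Therefore,

$$P(E) \le \sum_{i=1}^{n} P(D_i \le R,\ D_i \le A_{(1)}) = n\, P(D_H \le R,\ D_H \le A_{(1)}).$$

One can compute $P(D_H \le R,\ D_H \le A_{(1)})$ as

$$P(D_H \le R,\ D_H \le A_{(1)}) = \int_{0}^{R} \int_{x}^{1} S(1 - y)^{S-1}\, dy\, dx = \int_{0}^{R} (1 - x)^{S}\, dx = \frac{1 - (1 - R)^{S+1}}{S + 1}. \tag{17}$$
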
A visual representation of this double integration is provided in Figure 3. In both cases one benefit of supervision-aware AVs is represented by the green triangle. It corresponds to the portion of the ring at which in-ring HVs’ reachability zones include the merge point, but which are blocked by an in-ring cooperative AV, and thus do not pose a danger to the merging vehicle.
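Equation 17 evaluates to [1 − (1 − R)^{S+1}]/(S + 1), where D_H ~ U[0, 1] is an HV's distance to the merge point and A_(1) is the minimum of S independent U[0, 1] AV distances. A minimal sketch that checks this closed form against direct simulation (ring length normalized to 1; function names are hypothetical):

```python
import random


def eq17_closed_form(R: float, S: int) -> float:
    """P(D_H <= R, D_H <= A_(1)) with ring length normalised to 1."""
    return (1.0 - (1.0 - R) ** (S + 1)) / (S + 1)


def eq17_monte_carlo(R: float, S: int, trials: int = 100_000, seed: int = 0) -> float:
    """Sample D_H ~ U[0,1] and S independent AV distances ~ U[0,1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d_h = rng.random()
        nearest_av = min(rng.random() for _ in range(S))
        # the HV triggers supervision only if it can reach X (d_h <= R)
        # and is not blocked by the nearest cooperative AV (d_h <= A_(1))
        hits += d_h <= min(R, nearest_av)
    return hits / trials


print(round(eq17_closed_form(0.1, 5), 4))   # 0.0781
print(eq17_monte_carlo(0.1, 5))
```

Setting S = 0 in the closed form recovers R, the connected-AV term, which is a quick sanity check on the formula.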

An additional benefit may come in the form of the distribution shift of in-ring vehicles that supervision-aware AVs can cause, illustrated in the image with the color gradients. Here, not only does the AV block certain HVs from threatening the merge, but it can also shift the in-ring vehicle distribution such that an HV is less likely to be in the vicinity in the first place.

The image can thus be interpreted geometrically by imagining the colors as representing a third dimension (as if rising out of the page towards the reader). Taking red to represent a greater volume of probability mass, we see that the green triangle would also have more probability mass, and thus translate into greater supervision scaling than in the uniform distribution case. At the same time, this distribution includes an adversarial element: the HVs between the AV nearest the merge point and the merge point itself are more likely to be close to the merge point, which increases the odds that they trigger the in-ring reachability condition. In Appendix Ex. 3 (Supervision-Aware AVs Typical Case), we provide another related but less adversarial distribution, which provides further improvement in the upper bound.

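Previously (in the connected AV case, without supervision-aware AVs), $P(D_H \le R) = R$, so the absolute improvement in each term inside the union bound is

$$R - \frac{1 - (1 - R)^{S+1}}{S + 1}, \tag{18}$$

and the relative improvement of each term inside the union bound is

$$1 - \frac{1 - (1 - R)^{S+1}}{(S + 1)\, R}, \tag{19}$$

where the ring circumference is normalized to $L = 1$. Because the fraction in Equation 19 is the average of $(1 - x)^{S}$ over $[0, R]$, which decreases in both $R$ and $S$, the relative improvement monotonically increases in $R$ and $S$; that is, it increases with larger reachability zones or more AVs.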

As a numerical example, when $R = 0.1$ and $S = 5$, the absolute probability (given via Equation 17) is 0.078. The previous absolute probability was $R = 0.1$, and so the absolute improvement (from Equation 18) is 0.022, and the relative improvement (from Equation 19) is 0.219, or 21.9%.
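The numerical example and the monotonicity claim can be reproduced directly from the closed forms, using Equation 17's value [1 − (1 − R)^{S+1}]/(S + 1) against the connected-AV term R. A minimal sketch (ring length normalized to 1; the grid used for the monotonicity spot-check is an arbitrary choice):

```python
def p_supervised(R, S):
    """Equation 17: per-HV supervision probability with S cooperative AVs."""
    return (1.0 - (1.0 - R) ** (S + 1)) / (S + 1)


def absolute_improvement(R, S):
    """Equation 18: connected-AV term R minus the supervision-aware term."""
    return R - p_supervised(R, S)


def relative_improvement(R, S):
    """Equation 19."""
    return 1.0 - p_supervised(R, S) / R


print(round(p_supervised(0.1, 5), 3))           # 0.078
print(round(absolute_improvement(0.1, 5), 3))   # 0.022
print(round(relative_improvement(0.1, 5), 3))   # 0.219

# Spot-check monotonicity in R and in S on a grid of values.
Rs = [0.01 * k for k in range(1, 51)]
assert all(relative_improvement(r1, s) < relative_improvement(r2, s)
           for s in range(1, 21) for r1, r2 in zip(Rs, Rs[1:]))
assert all(relative_improvement(r, s) < relative_improvement(r, s + 1)
           for r in Rs for s in range(1, 20))
```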

Interpreting this in terms of the reachability bound

$$P(E) \le \sum_{i=1}^{n} P(D_i \le R,\ D_i \le A_{(1)}),$$

each term inside the sum is roughly 21.9% smaller than it was previously. This is equivalent to dropping 21.9% of the terms from the sum, and hence yields an additional 21.9% improvement relative to the upper bound.

A non-uniform vehicle distribution

As the AV nearest the merge point may slow down or stop the HVs behind it, consider the scenario in which HVs may be more likely to be behind—and close to—the first AV. For example, imagine an AV leading a platoon of HVs.

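For simplicity and to attain a closed-form integration, we assume the probability density function of $D_H$ given $A_{(1)} = y$ is

$$f_{D_H \mid A_{(1)}}(x \mid y) = \frac{e^{-((x - y) \bmod 1)}}{1 - e^{-1}}, \qquad 0 \le x \le 1,$$

i.e., a truncated Exp(1) r.v. on [0,1], where the r.v. is the asymmetric distance of the HV to the AV (following the traffic flow).

For tractable computation, assume $A_{(1)}$ to be the truncated Exp(S) r.v. on [0,1], with density $S e^{-Sy} / (1 - e^{-S})$. The resulting joint distribution is

$$f(x, y) = \frac{e^{-((x - y) \bmod 1)}}{1 - e^{-1}} \cdot \frac{S e^{-Sy}}{1 - e^{-S}}, \qquad 0 \le x, y \le 1.$$

We can compute the probabilities of interest using the same integration as in Case 1 (the uniform case), but with a different joint probability density function. Writing $K = S / \big((1 - e^{-1})(1 - e^{-S})\big)$, closed-form probabilities (for $S \ge 2$) can be obtained as

$$P(D_H \le R,\ D_H \le A_{(1)}) = K e^{-1} \left[ \frac{1 - e^{-(S-1)R}}{S - 1} - \frac{1 - e^{-SR}}{S} + (1 - e^{-R})\, \frac{e^{-(S-1)R} - e^{-(S-1)}}{S - 1} \right],$$

$$P(D_H \le R) = P(D_H \le R,\ D_H \le A_{(1)}) + K \left[ \frac{1 - e^{-SR}}{S} - e^{-R}\, \frac{1 - e^{-(S-1)R}}{S - 1} \right].$$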


As a numerical example, with $R = 0.1$, we obtain a ~44% improvement over the previous upper bound when $S = 5$ and a ~69% improvement when $S = 16$.

See Table I and Table II for the numerical results with $R = 0.1$ and $R = 0.01$, respectively. Numerical results for another distribution are included in the appendix. In reality, the true distribution is likely to lie somewhere between the two cases outlined in this section. Note that the relative improvement column is computed by comparing the $P(D_H \le R,\ D_H \le A_{(1)})$ column with $P(D_H \le R) = R$ in the connected AV scenario. We provide $P(D_H \le R)$ in the supervision-aware setting, given the current joint p.d.f., to illustrate the shift in probability distribution.

Number of AVs ($S$)   $P(D_H \le R)$   $P(D_H \le R,\ D_H \le A_{(1)})$   Relative improvement
2 0.0914 0.0749 25.12%
3 0.0894 0.0675 32.46%
4 0.0888 0.0614 38.60%
5 0.0891 0.0564 43.64%
6 0.0902 0.0522 47.80%
7 0.0917 0.0487 51.29%
8 0.0934 0.0457 54.26%
9 0.0952 0.0432 56.85%
10 0.0970 0.0409 59.13%
11 0.0989 0.0388 61.18%
12 0.1007 0.0370 63.03%
13 0.1025 0.0353 64.72%
14 0.1042 0.0337 66.27%
15 0.1058 0.0323 67.71%
16 0.1074 0.0310 69.03%
TABLE I: Relative reduction in supervision time for supervision-aware AVs relative to the connected AV baseline for the non-uniform distribution when $R = 0.1$. The column second from the left is the probability that an HV lies within $R$ of the merge point, and is thus an indicator of the distribution shift caused by the AVs. The column second from the right is the overall absolute supervision probability that the supervision-aware AVs typically achieve, and is thus a function of both gains via the implicit distribution shift indicated in the prior column and gains via the cooperative blocking behavior itself. The rightmost column indicates the relative % improvement over the connected AV upper bound $P(D_H \le R) = R$.
Number of AVs ($S$)   $P(D_H \le R)$   $P(D_H \le R,\ D_H \le A_{(1)})$   Relative improvement
2 0.0086 0.0084 16.00%
3 0.0081 0.0078 21.86%
4 0.0077 0.0074 26.43%
5 0.0074 0.0070 29.90%
6 0.0072 0.0067 32.52%
7 0.0071 0.0065 34.53%
8 0.0070 0.0064 36.12%
9 0.0069 0.0063 37.40%
10 0.0069 0.0062 38.47%
11 0.0069 0.0061 39.38%
12 0.0069 0.0060 40.17%
13 0.0069 0.0059 40.88%
14 0.0069 0.0058 41.51%
15 0.0069 0.0058 42.10%
16 0.0069 0.0057 42.63%
TABLE II: Relative reduction in supervision time for supervision-aware AVs relative to the connected AV baseline for the non-uniform distribution when $R = 0.01$. The column second from the left is the probability that an HV lies within $R$ of the merge point, and is thus an indicator of the distribution shift caused by the AVs. The column second from the right is the overall absolute supervision probability that the supervision-aware AVs typically achieve, and is thus a function of both gains via the implicit distribution shift indicated in the prior column and gains via the cooperative blocking behavior itself. The rightmost column indicates the relative % improvement over the connected AV upper bound $P(D_H \le R) = R$.
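The rows of Tables I and II can be reproduced numerically. A minimal pure-Python sketch, assuming the ring length is normalized to 1, the HV's distance D_H to the merge point given the nearest-AV distance A_(1) = y has density e^{−((x−y) mod 1)}/(1 − e^{−1}), and A_(1) follows a truncated Exp(S) on [0, 1]. The inner integrals over D_H are done in closed form; the outer integral uses composite Simpson's rule, which is our choice here and not necessarily how the tables were produced. Function names are hypothetical.

```python
import math

C1 = 1.0 / (1.0 - math.exp(-1.0))  # normaliser of a truncated Exp(1) on [0, 1]


def nearest_av_density(y, S):
    """Truncated Exp(S) density assumed for A_(1) on [0, 1]."""
    return S * math.exp(-S * y) / (1.0 - math.exp(-S))


def p_sup_given_y(y, R):
    """P(D_H <= min(R, y) | A_(1) = y): HV ahead of the AV and within R of X.
    For x < y the HV-to-AV distance along the flow is 1 + x - y."""
    return C1 * math.exp(-(1.0 - y)) * (1.0 - math.exp(-min(R, y)))


def p_reach_given_y(y, R):
    """P(D_H <= R | A_(1) = y), i.e. ignoring the AV's blocking."""
    p = p_sup_given_y(y, R)
    if R > y:  # HVs behind the AV that are still within R of X
        p += C1 * (1.0 - math.exp(-(R - y)))
    return p


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def platoon_row(R, S):
    """(P(D_H <= R), P(D_H <= R, D_H <= A_(1)), relative improvement)."""
    def integrate(cond):
        def f(y):
            return cond(y, R) * nearest_av_density(y, S)
        return simpson(f, 0.0, R) + simpson(f, R, 1.0)  # split at the kink y = R

    p_reach, p_sup = integrate(p_reach_given_y), integrate(p_sup_given_y)
    return p_reach, p_sup, 1.0 - p_sup / R


for S in (2, 5, 16):
    print(S, [round(v, 4) for v in platoon_row(0.1, S)])
```

For R = 0.1 and S = 2 the sketch gives approximately 0.0914, 0.0749 and 25.1%, in line with the first row of Table I.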

VI Experimental Results

Fig. 4: Experiment 1 results. Under typical circumstances, the supervision time required is significantly less than 100% (naive baseline), and even drops below 20% in favorable cases (lower is better). Additionally, we see the theoretical bounds hold, and that the reachability-based approach to supervision allows for substantial scaling improvements compared to the upper limits. For reference, the orange lines illustrate the heightened requirements were supervision to depend on in-ring reachability alone.

We validate the theoretical insights above and investigate supervision scaling with three experiments. These were built using the Simulation of Urban Mobility (SUMO) open-source software package, and Flow, a deep RL framework for mixed autonomy traffic [51, 5].

VI-A Experiment 1: Human Vehicles

The first experiment populates a rotary approximately 1200 meters in circumference with 18 simulated HVs for 2500 timesteps (each corresponding to 0.1 seconds of simulation time). Once the simulation begins, 200 HVs per hour (in simulation time) merge into the rotary in accordance with the 270-degree route described earlier. This amounts to a steady traffic flow.

The reachability time horizon $\tau$, number of in-ring vehicles, and rotary circumference are all varied. This allows us to conduct a baseline investigation of the reachability-based supervision scheme. We record the upper bounds described in Lemmas 1 and 2, as well as the true proportion of time for which a supervisor would be active.

HVs used the Intelligent Driver Model (IDM) controller, calibrated to highway driving, which produces realistic acceleration profiles and plausible behavior [47, 48]. To focus on the potential for collisions, the SUMO implementation of each rotary consists of a single lane. The network’s speed limit is 50 miles per hour, to align with highway speeds.
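For reference, the IDM acceleration of Treiber et al. [47] is a·[1 − (v/v0)^δ − (s*/s)²] with desired gap s* = s0 + v·T + v·Δv/(2√(ab)). The sketch below is a minimal version of that formula. The parameter defaults are illustrative and are not the highway calibration used in the experiments, and clipping the dynamic part of s* at zero follows a common implementation variant.

```python
import math


def idm_acceleration(v, v_lead, gap, v0=22.35, T=1.0, a=1.0, b=1.5,
                     s0=2.0, delta=4):
    """Intelligent Driver Model acceleration (m/s^2).

    v, v_lead: follower and leader speeds (m/s); gap: bumper-to-bumper
    distance (m). Defaults are illustrative only (v0 ~ 50 mph).
    """
    dv = v - v_lead  # approach rate: positive when closing in on the leader
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a * b)))
    return a * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


print(idm_acceleration(0.0, 0.0, 1e6))     # close to a = 1.0 on a free road
print(idm_acceleration(20.0, 0.0, 15.0))   # strongly negative: hard braking
```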

Figure 4 shows this experiment's results. In all cases, the proportion of time that the merge point is within at least one vehicle's reachable set is noticeably lower than the theoretical upper bound, which is consistent with the lemmas. The gap is due to significant overlap in the vehicles' reachable sets, which the union bound counts more than once. These results indicate that reachability-based supervision provides an efficient alternative to full-time supervision.

VI-B Experiment 2: Connected Vehicles

Fig. 5: Results for Experiment 2. The reduction in supervision requirements is approximately linear for both the theoretical bounds and the observed behavior as the proportion of connected AVs in the system increases.

This experiment investigates the linear scaling potential of connected AVs and, by extension, Corollary 1. The setup is the same as in the previous experiment, except that the 18 in-ring vehicles are incrementally replaced by connected AVs to observe their effect on supervision scaling.

The results are shown in Figure 5. As expected from the underlying theory, the supervision upper bound scales linearly with the number of in-ring connected AVs. The empirical supervision requirements also appear to scale linearly, albeit with noise.

VI-C Experiment 3: Supervision-Aware AVs

This experiment compares the effect of limited penetration of two different types of AVs: the connected AVs from the previous experiment and supervision-aware AVs. Given the prospects for a gradual rollout of autonomous or near-autonomous vehicles (at least compared to a rapid rollout), this low-density regime is of particular interest.

The supervision-aware AV controller is modeled as a policy trained via the Trust Region Policy Optimization (TRPO) RL method [52]. The policy consisted of a multilayer perceptron with two hidden layers, each with 64 nodes. During training, the agent receives a reward at each timestep that is a function of its speed, and is penalized each timestep that the supervisor is active. To simulate the low-density setting, the system for this experiment includes 32 vehicles in total on a ring 3200 meters in circumference. The time horizon $\tau$ remains at 8 seconds. The number of AVs ranges from 1 to 5, and RL results are averaged over 7 seeds.
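The reward description above can be illustrated with a per-timestep function. This is a hypothetical sketch: the exact functional form is not given here, so the speed normalization `v_max` and the size of `penalty` are illustrative assumptions, not the trained agent's reward.

```python
def step_reward(av_speed, supervisor_active, v_max=22.35, penalty=1.0):
    """Hypothetical per-timestep reward for a supervision-aware AV.

    Speed term normalised by v_max, minus a fixed penalty whenever the
    supervisor is active. v_max and penalty are illustrative assumptions.
    """
    reward = av_speed / v_max
    if supervisor_active:
        reward -= penalty
    return reward


# Episode fragment as (speed in m/s, supervisor active?) pairs.
trace = [(20.0, False), (18.0, True), (15.0, True), (21.0, False)]
episode_return = sum(step_reward(v, active) for v, active in trace)
```

The penalty term is what gives the policy an incentive to avoid triggering supervision, while the speed term discourages the trivial solution of stopping indefinitely.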

Unlike the connected AVs, the RL-based AVs do not have a 'collapsed' reachability zone encoded. Despite this, Figure 6 shows that the RL vehicles outperform the connected AVs in terms of supervision scaling. In all cases, the policy learned via RL achieves double-digit reductions in supervision requirements compared to the connected AVs. Thus this experiment demonstrates that teaming of AVs enables supervision time sublinear in AV adoption.

Fig. 6: Results for Experiment 3. Cooperative behavior, analyzed using deep RL, indicates that in-ring AVs could substantially reduce supervision time of the on-ramp AV, even for low-adoption regimes.

VII Conclusion

This work explored one perspective on improving safety in mixed autonomy settings by investigating human supervision of AVs in a simulated merging task. We investigated the question: Can we do better than the present-day industry standard of persistent supervision (consider Tesla Autopilot)? Our findings indicate that clever allocation of human resources for supervision tasks may ease near-term adoption of imperfect autonomous agents in safety-critical environments. Future work could use simulations to empirically investigate the in-ring vehicle distribution produced by supervision-aware AVs, and could also evaluate the impact of their cooperative behavior on traffic flow. More generally, continued research in this direction could adopt alternative reachability analysis tools to investigate the potential of supervision scaling in other mixed autonomy settings.


We are grateful to Dr. Eric Horvitz for inspiring our exploration of AV teaming. We would also like to thank Zhongxia Yan for discussions and assistance with experimental design and Jiaqi Zhang for discussions and assistance with the theoretical portion of the work.


Useful Statements

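We provide useful probability statements that we adopt in our proofs. The statements can be found in a standard probability textbook [50].

A. Binomial distribution parameterized by a Poisson random variable

If $N \sim \mathrm{Poisson}(\lambda)$ and, conditional on $N$, $X \sim \mathrm{Binomial}(N, p)$, then $X \sim \mathrm{Poisson}(\lambda p)$.

B. Poisson distribution of summed independent random variables

If $X_1, \dots, X_k$ are independent with $X_i \sim \mathrm{Poisson}(\lambda_i)$, then $\sum_{i=1}^{k} X_i \sim \mathrm{Poisson}\left(\sum_{i=1}^{k} \lambda_i\right)$.
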
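Both statements (Poisson thinning: a Binomial(N, p) count with N ~ Poisson(λ) is Poisson(λp); superposition: a sum of independent Poisson variables is Poisson with the summed rate) can be sanity-checked by simulation. A minimal sketch using Knuth's multiplication sampler; the rates and sample size are arbitrary illustrative choices.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method (adequate for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def thinned_poisson(rng, lam, p):
    """Binomial(N, p) with N ~ Poisson(lam); should be Poisson(lam * p)."""
    n = sample_poisson(rng, lam)
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(0)
trials = 50_000
thinned = [thinned_poisson(rng, 4.0, 0.3) for _ in range(trials)]
summed = [sample_poisson(rng, 1.5) + sample_poisson(rng, 2.5) for _ in range(trials)]

# Empirical P(X = 0) against the Poisson prediction exp(-rate).
print(thinned.count(0) / trials, math.exp(-1.2))
print(summed.count(0) / trials, math.exp(-4.0))
```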
Ex. 3: Supervision-Aware AVs Typical Case

Fig. 7: Visual representations of the vehicle’s conditional distribution in the case described in the appendix.

In Section V-E we assume that HVs both behind and in front of the first AV follow a truncated exponential distribution. Such a distribution is somewhat unrealistic (and adversarial) for HVs in front of the first AV, because it implies that those HVs have a greater probability of being close to the merge point than far from it (close to the first AV), given that we define the asymmetric distance following the traffic flow direction. A less adversarial distribution would say that the HVs before the first AV follow a uniform distribution between the merge point and the first AV. Here we consider such a joint distribution for $A_{(1)}$ and the $D_H$s, where the conditional probability density function of $D_H$ given $A_{(1)} = y$ is

$$f_{D_H \mid A_{(1)}}(x \mid y) = \begin{cases} a(y), & 0 \le x < y, \\[4pt] \dfrac{e^{-(x - y)}}{1 - e^{-1}}, & y \le x \le 1, \end{cases}$$

where $a(y)$ is set so that the conditional probability integrates to 1 for each $y$, i.e., $a(y) = \frac{1}{y}\left(1 - \frac{1 - e^{-(1-y)}}{1 - e^{-1}}\right)$. That is, this is a truncated Exp(1) r.v. on [0,1], where the r.v. is the asymmetric distance of the HV to the AV (following the traffic flow), if the HV is behind the first AV, but conditionally uniform if the HV is before the first AV.

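Again, for tractable computation, assume $A_{(1)}$ to be the truncated Exp(S) r.v. on [0,1]. The resulting joint distribution is

$$f(x, y) = f_{D_H \mid A_{(1)}}(x \mid y)\, \frac{S e^{-Sy}}{1 - e^{-S}}, \qquad 0 \le x, y \le 1.$$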
Fixing $R = 0.1$, the corresponding probabilities for different $S$ are given in Table III, where relative improvement is calculated by comparing the current supervision-aware scenario against the connected-vehicle scenario, where $P(D_H \le R) = R$.

Number of AVs ($S$)   $P(D_H \le R)$   $P(D_H \le R,\ D_H \le A_{(1)})$   Relative improvement
2 0.0805192 0.0639507 36.05%
3 0.0813434 0.0594455 40.55%
4 0.0828135 0.0554341 44.57%
5 0.084711 0.0519237 48.08%
6 0.0868491 0.0488549 51.14%
7 0.0890942 0.0461509 53.85%
8 0.0913594 0.0437418 56.26%
9 0.0935922 0.0415718 58.43%
10 0.0957621 0.0395984 60.40%
11 0.0978524 0.0377901 62.21%
12 0.0998548 0.036123 63.88%
13 0.101766 0.0345786 65.42%
14 0.103586 0.0331427 66.86%
15 0.105317 0.0318034 68.20%
16 0.106961 0.0305513 69.45%
TABLE III: Supervision scaling for the case described in the appendix when $R = 0.1$.

Table IV contains the corresponding results when fixing $R = 0.01$.

Number of AVs ($S$)   $P(D_H \le R)$   $P(D_H \le R,\ D_H \le A_{(1)})$   Relative improvement
2 0.00714992 0.00696878 30.31%
3 0.00694662 0.00670019 33.00%
4 0.00680191 0.00648492 35.15%
5 0.00670619 0.00631588 36.84%
6 0.00664762 0.00618278 38.17%
7 0.00661575 0.00607607 39.23%
8 0.00660267 0.00598828 40.12%
9 0.00660279 0.00591401 40.86%
10 0.00661227 0.00584953 41.50%
11 0.0066285 0.00579226 42.08%
12 0.0066497 0.00574043 42.60%
13 0.0066746 0.00569277 43.07%
14 0.00670233 0.00564842 43.52%
15 0.00673222 0.00560671 43.93%
16 0.00676381 0.00556716 44.33%
TABLE IV: Supervision scaling for the case described in the appendix when $R = 0.01$.

The relative improvement is indeed even larger than that in Section V-E, as the distribution of HVs before the first AV is less adversarial.

The visual representation of the conditional p.d.f. for this case can be seen in Figure 7.
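The appendix tables can be reproduced in the same way as Tables I and II. A minimal sketch, assuming the ring length is normalized to 1, HVs behind the first AV have conditional density e^{−(x−y)}/(1 − e^{−1}), HVs between the merge point and the first AV are uniform with the normalizing level a(y), and A_(1) follows a truncated Exp(S) on [0, 1]. Simpson integration over y is our choice, and the function names are hypothetical.

```python
import math

C1 = 1.0 / (1.0 - math.exp(-1.0))  # truncated Exp(1) normaliser on [0, 1]


def nearest_av_density(y, S):
    """Truncated Exp(S) density assumed for A_(1) on [0, 1]."""
    return S * math.exp(-S * y) / (1.0 - math.exp(-S))


def uniform_level(y):
    """a(y), the uniform density of HVs between X and the first AV.
    a(y) * y = 1 - C1 * (1 - e^{-(1-y)}) = C1 * e^{-1} * (e^y - 1)."""
    if y == 0.0:
        return C1 * math.exp(-1.0)  # limit as y -> 0
    return C1 * math.exp(-1.0) * math.expm1(y) / y


def p_sup_given_y(y, R):
    """P(D_H <= min(R, y) | A_(1) = y): unblocked HVs within R of X."""
    return uniform_level(y) * min(R, y)


def p_reach_given_y(y, R):
    """P(D_H <= R | A_(1) = y): also counts HVs behind the AV within R."""
    p = p_sup_given_y(y, R)
    if R > y:
        p += C1 * (1.0 - math.exp(-(R - y)))
    return p


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def appendix_row(R, S):
    """(P(D_H <= R), P(D_H <= R, D_H <= A_(1)), relative improvement)."""
    def integrate(cond):
        def f(y):
            return cond(y, R) * nearest_av_density(y, S)
        return simpson(f, 0.0, R) + simpson(f, R, 1.0)  # split at y = R

    p_reach, p_sup = integrate(p_reach_given_y), integrate(p_sup_given_y)
    return p_reach, p_sup, 1.0 - p_sup / R


print([round(v, 4) for v in appendix_row(0.1, 2)])
```

For R = 0.1 and S = 2 this gives approximately 0.0805, 0.0640 and 36.05%, consistent with the first row of Table III.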


  • [1] Wu, C. Learning and Optimization for Mixed Autonomy Systems-A Mobility Context. (UC Berkeley,2018)
  • [2] Allianz Partners Self-driving Cars Hit a Speedbump; Interest in Autonomous Vehicle Technology Slows Down. (2018,9)
  • [3] U.S. DoT Federal Highway Administration Highway Statistics 2019: Licensed Drivers. (U.S. Bureau of Transportation Statistics,2019)
  • [4] Cox Automotive Autonomous Vehicle Awareness Rising, Acceptance Declining. (2018,8)
  • [5] Wu, C., Kreidieh, A., Parvate, K., Vinitsky, E. & Bayen, A. Flow: A Modular Learning Framework for Mixed Autonomy Traffic. IEEE Transactions On Robotics. (2021)
  • [6] Barth, M. & Boriboonsomsin, K. Real-world carbon dioxide impacts of traffic congestion. Transportation Research Record. 2058, 163-171 (2008)
  • [7] Bellemare, M., Candido, S., Castro, P., Gong, J., Machado, M., Moitra, S., Ponda, S. & Wang, Z. Autonomous navigation of stratospheric balloons using reinforcement learning. Nature. 588, 77-82 (2020)
  • [8] Hambling, D. AI outguns a human fighter pilot. (Elsevier,2020)
  • [9] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G. & Others Human-level control through deep reinforcement learning. Nature. 518, 529-533 (2015)
  • [10] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A. & Others Mastering the game of go without human knowledge. Nature. 550, 354-359 (2017)
  • [11] Bacci, E. & Parker, D. Probabilistic guarantees for safe deep reinforcement learning. International Conference On Formal Modeling And Analysis Of Timed Systems. pp. 231-248 (2020)
  • [12] Bouton, M., Karlsson, J., Nakhaei, A., Fujimura, K., Kochenderfer, M. & Tumova, J. Reinforcement learning with probabilistic guarantees for autonomous driving. ArXiv Preprint ArXiv:1904.07189. (2019)
  • [13] Jula, H., Kosmatopoulos, E. & Ioannou, P. Collision avoidance analysis for lane changing and merging. IEEE Transactions On Vehicular Technology. 49, 2295-2308 (2000)
  • [14] Zhou, M., Qu, X. & Jin, S. On the impact of cooperative autonomous vehicles in improving freeway merging: a modified intelligent driver model-based approach. IEEE Transactions On Intelligent Transportation Systems. 18, 1422-1428 (2016)
  • [15] Yu, H., Tak, S., Park, M. & Yeo, H. Impact of autonomous-vehicle-only lanes in mixed traffic conditions. Transportation Research Record. 2673, 430-439 (2019)
  • [16] Erdogan, S., Yilmaz, I., Baybura, T. & Gullu, M. Geographical information systems aided traffic accident analysis system case study: city of Afyonkarahisar. Accident Analysis & Prevention. 40, 174-181 (2008)
  • [17] Rios-Torres, J. & Malikopoulos, A. Automated and cooperative vehicle merging at highway on-ramps. IEEE Transactions On Intelligent Transportation Systems. 18, 780-789 (2016)
  • [18] Drew, D.S. Multi-agent systems for search and rescue applications. Current Robotics Reports. pp. 1-12 (2021)
  • [19] Dahiya, A., Akbarzadeh, N., Mahajan, A. & Smith, S. Scalable operator allocation for multi-robot assistance: a restless bandit approach. IEEE Transactions On Control Of Network Systems. pp. 1 (2021)
  • [20] Cummings, M., Bruni, S., Mercier, S. & Mitchell, P. Automation architecture for single operator, multiple UAV command and control. (Massachusetts Inst Of Tech Cambridge, 2007)
  • [21] Cummings, M., Nehme, C., Crandall, J. & Mitchell, P. Predicting operator capacity for supervisory control of multiple UAVs. Innovations In Intelligent Machines-1. pp. 11-37 (2007)
  • [22] Kidwell, B., Calhoun, G., Ruff, H. & Parasuraman, R. Adaptable and adaptive automation for supervisory control of multiple autonomous vehicles. Proceedings Of The Human Factors And Ergonomics Society Annual Meeting. 56, 428-432 (2012)
  • [23] Humann, J. & Pollard, K. Human factors in the scalability of multirobot operation: A review and simulation. 2019 IEEE International Conference On Systems, Man And Cybernetics (SMC). pp. 700-707 (2019)
  • [24] Conesa-Muñoz, J., Soto, M., Santos, P. & Ribeiro, A. Distributed multi-level supervision to effectively monitor the operations of a fleet of autonomous vehicles in agricultural tasks. Sensors. 15, 5402-5428 (2015)
  • [25] Olsen, D. & Goodrich, M. Metrics for evaluating human-robot interactions. Proceedings Of PERMIS. 2003 pp. 4 (2003)
  • [26] Olsen Jr, D. & Wood, S. Fan-out: Measuring human control of multiple robots. Proceedings Of The SIGCHI Conference On Human Factors In Computing Systems. pp. 231-238 (2004)
  • [27] Swamy, G., Reddy, S., Levine, S. & Dragan, A. Scaled autonomy: enabling human operators to control robot fleets. 2020 IEEE International Conference On Robotics And Automation (ICRA). pp. 5942-5948 (2020)
  • [28] Hickert, C. & Wu, C. Scalability of safe supervision of autonomous vehicles in mixed traffic. 2022 IEEE International Conference On Robotics And Automation (ICRA) (in submission). (2022)
  • [29] Rogers, R. & Monsell, S. Costs of a predictible switch between simple cognitive tasks.. Journal Of Experimental Psychology: General. 124, 207 (1995)
  • [30] Haring, K., Ragni, M. & Konieczny, L. A cognitive model of drivers attention. Nele Rußwinkel— Uwe Drewitz— Hedderik Van Rijn (eds.). pp. 275 (2012)
  • [31] Althoff, M., Frehse, G. & Girard, A. Set Propagation Techniques for Reachability Analysis. Annual Review Of Control, Robotics, And Autonomous Systems. 4 (2020)
  • [32] Kurzhanski, A. & Varaiya, P. Ellipsoidal techniques for reachability analysis. International Workshop On Hybrid Systems: Computation And Control. pp. 202-214 (2000)
  • [33] Liebenwein, L., Baykal, C., Gilitschenski, I., Karaman, S. & Rus, D. Sampling-based approximation algorithms for reachability analysis with provable guarantees. (2018)
  • [34] Bansal, S., Chen, M., Herbert, S. & Tomlin, C. Hamilton-Jacobi reachability: A brief overview and recent advances. 2017 IEEE 56th Annual Conference On Decision And Control (CDC). pp. 2242-2253 (2017)
  • [35] Bahati, G., Gibson, M. & Bayen, A. Multi-Adversarial Safety Analysis for Autonomous Vehicles. (2020)
  • [36] Leung, K., Schmerling, E., Zhang, M., Chen, M., Talbot, J., Gerdes, J. & Pavone, M. On infusing reachability-based safety assurance within planning frameworks for human–robot vehicle interactions. The International Journal Of Robotics Research. 39, 1326-1345 (2020)
  • [37] Saunders, W., Sastry, G., Stuhlmueller, A. & Evans, O. Trial without error: Towards safe reinforcement learning via human intervention. ArXiv Preprint ArXiv:1707.05173. (2017)
  • [38] Berkenkamp, F., Turchetta, M., Schoellig, A. & Krause, A. Safe model-based reinforcement learning with stability guarantees. ArXiv Preprint ArXiv:1705.08551. (2017)
  • [39] Chow, Y., Nachum, O., Duenez-Guzman, E. & Ghavamzadeh, M. A lyapunov-based approach to safe reinforcement learning. ArXiv Preprint ArXiv:1805.07708. (2018)
  • [40] Cahill, A. Catastrophic forgetting in reinforcement-learning environments. (University of Otago,2011)
  • [41] Cobbe, K., Klimov, O., Hesse, C., Kim, T. & Schulman, J. Quantifying generalization in reinforcement learning. International Conference On Machine Learning. pp. 1282-1289 (2019)
  • [42] McGuckin, N. & Fucci, A. Summary of travel trends: 2017 national household travel survey. (US Department of Transportation, Federal Highway Administration,2018)
  • [43] U.S. DoT Federal Highway Administration. Average Annual PMT, VMT, Person Trips and Trip Length by Trip Purpose. (U.S. Bureau of Transportation Statistics,2018)
  • [44] U.S. DoT Federal Highway Administration & Federal Transit Administration. 2015 Status of the Nation’s Highways, Bridges, and Transit: Conditions & Performance Report to Congress. (Government Printing Office,2016)
  • [45] Johns, M., Mok, B., Sirkin, D., Gowda, N., Smith, C., Talamonti, W. & Ju, W. Exploring shared control in automated driving. 2016 11th ACM/IEEE International Conference On Human-Robot Interaction (HRI). pp. 91-98 (2016)
  • [46] Sugiyama, Y., Fukui, M., Kikuchi, M., Hasebe, K., Nakayama, A., Nishinari, K., Tadaki, S. & Yukawa, S. Traffic jams without bottlenecks—experimental evidence for the physical mechanism of the formation of a jam. New Journal Of Physics. 10, 033001 (2008)
  • [47] Treiber, M., Hennecke, A. & Helbing, D. Congested traffic states in empirical observations and microscopic simulations. Physical Review E. 62, 1805 (2000)
  • [48] Treiber, M. & Kesting, A. Traffic flow dynamics. Traffic Flow Dynamics: Data, Models And Simulation, Springer-Verlag Berlin Heidelberg. (2013)
  • [49] Gross, D. Fundamentals of queueing theory. John Wiley & Sons. (2008)
  • [50] Grimmett, G. & Stirzaker, D. Probability and random processes. (Oxford University Press,2020)
  • [51] Lopez, P., Behrisch, M., Bieker-Walz, L., Erdmann, J., Flötteröd, Y., Hilbrich, R., Lücken, L., Rummel, J., Wagner, P. & Wießner, E. Microscopic Traffic Simulation using SUMO. The 21st IEEE International Conference On Intelligent Transportation Systems. (2018)
  • [52] Schulman, J., Levine, S., Abbeel, P., Jordan, M. & Moritz, P. Trust region policy optimization. International Conference On Machine Learning. pp. 1889-1897 (2015)