I Introduction
Given the complexity, chaos, and unpredictability of real-world environments, safety is a critical challenge for autonomous systems. This is particularly the case for mixed autonomy systems, in which humans and machines both exert control in the same environment [1]. Indeed, in a 2018 Allianz Global Assistance survey investigating decreasing interest in self-driving cars (one of the most-discussed mixed autonomy systems today), over 70% of respondents cited safety as a reason for their lack of interest, up from 65% in response to the same question a year earlier [2]. Thus, improving safety in mixed-autonomy systems is not only an end in itself, but also a prerequisite for unlocking the benefits that such systems may offer.
The question becomes: How can we achieve some level of safety and performance even when formal guarantees are elusive in a chaotic world? The research described here takes these matters as a point of departure by asking how we can mitigate risk via human supervision of nonhuman autonomous agents.
I-A Contributions
This work provides one perspective on improving safety in mixed autonomy settings by investigating the online human supervision of autonomous vehicles (AVs) in an illustrative merging task. To this end, we present a number of contributions that, to the best of our knowledge, have not been addressed in previous work.

We formalize the ‘scaling supervision’ problem for online autonomous agents and propose two related ‘scalability’ metrics: expected supervisor control time and expected required number of supervisors.

We propose a reachability-analysis-based method for AV supervision. We provide an upper bound on the expected supervision time required for a supervisor to monitor a given AV’s merge, as well as a closed-form expression for the probability that the number of supervisors is insufficient when merging tasks are pooled together and distributed across multiple human supervisors. We empirically validate the findings using traffic scenarios derived from real-world rotaries.

Interestingly, we show, both analytically and numerically, leveraging order statistics and deep reinforcement learning (RL), respectively, that cooperation of AVs enables supervision time sublinear in AV adoption.
II Background
II-A Motivation
Safety in AVs
The importance of safety in any deployment of AVs is sufficient justification for investigating supervision of autonomous agents in itself. If AVs replaced even a modest portion of US vehicles, the number of lives depending on AV safety systems could number in the thousands [3].
Moreover, AVs are not the only autonomous system with the potential to inflict human harm. The authors anticipate that the approach and lessons learned in this research can inspire future work to improve the safety and performance of autonomous agents in settings beyond society’s roadways, such as in manufacturing and other cyber-physical systems.
AVs’ Benefits to Traffic Systems
While ensuring a sufficient level of safety is an important end in itself, doing so is foundational to the wider adoption of AVs, which can in turn bring benefits including decreased traffic congestion and emissions. AVs can achieve outsized positive impact without dominating roadways. Using AV policies learned via RL, previous work shows that even when AVs account for only 5-10% of vehicles in a traffic network, they can still boost the collective average speed of all vehicles in the network by up to 57% in idealized settings [5]. Researchers found that improving traffic flows in California can reduce highway carbon dioxide emissions by up to nearly 20%; this is particularly significant given that transportation accounts for approximately a third of all US carbon dioxide emissions [6].
Merging
The case of AVs merging onto a highway with traffic is an appropriate task for investigating how we can scale supervision for a number of reasons. First, it is a common occurrence that can pose difficult control and coordination challenges for AVs [13, 14]. Second, given the speed at which freeway merges occur, this task can result in deadly crashes. Merging also presents unique challenges in the mixed autonomy setting. Were a system to consist entirely of AVs, a solution might exist via decentralized or centralized coordination, but such coordination is much more difficult, if not impossible, for the foreseeable future, in which human drivers still populate the roadways [17].
Other applications
The scalable supervision framework, concepts, and formalization presented here are adaptable to other autonomous machine settings. Within the mixed-autonomy traffic environment, one could consider scalable supervision as a means of supporting dedicated AV lanes on freeways. These lanes have been shown in simulation to improve traffic efficiency while also worsening on-road safety [15]. The potential speed difference between AV-only and standard lanes presents a merging problem similar to the case studied here. Scalable supervision would also be useful for accident-prone ‘hotspots’ (construction sites, school zones, etc.) where concerns arise about the potential behavior of AVs [16]. Each of these is a practical instance of the scenario studied here.
More broadly, reachability-based supervision scaling that enforces safety constraints is applicable to domains as varied as warehouses, ports, airports, satellite orbits, and disaster response.
II-B Related Work
This research draws from interesting and important related work in at least four areas of the literature: human-AI teaming, supervision and cognitive models, reachability analysis, and reinforcement learning.
Human-AI teaming
Relevant work includes that from the human-robot interaction subfield in multi-agent systems, which explores how to keep a single operator’s cognitive load from scaling linearly with the number of robots. A significant portion of this work investigates the possibility for well-designed interfaces to reduce task burden or for learned models of human preferences to automate human-machine control handoff (also called sliding, shared, or adjustable autonomy) [18]. Our work expands this effort by considering the shared autonomy situation in which strict safety guarantees must be maintained, and extends this line of work into the mixed autonomy traffic setting.
Research such as that described in [19] seeks to optimize the allocation of limited human assistance (via a decision support system) to multiple robots performing tasks in a given environment. The machines in these works benefit from human assistance, but do not exhibit the strict safety-critical requirements of AVs in traffic. As such, these works also place greater emphasis on computing and comparing the various assistance permutations. Such comparisons are not necessary given our binary approach to safety and strict safety requirements (i.e., no suboptimal safety configurations are permitted).
Scalable supervision and cognitive models
To the best of our knowledge, previous work concerned with decreasing the time burden of human supervision of autonomous tasks tends to be directed at empirical studies of a single human monitoring multiple UAVs (unmanned aerial vehicles) [20, 21, 22]. Relatedly, human-robot interaction theory developed in the early 2000s considers ‘fan-out,’ the number of robots that a human can effectively control [21, 22, 23, 24, 25, 26]. While that work is philosophically aligned with ours on some levels, in the AV merge setting we isolate a single task to analyze and face safety challenges unique to the mixed autonomy setting. Additionally, the mixed autonomy AV setting considered here is arguably higher-stakes than a typical UAV setting, as it includes human drivers operating in the same vicinity; crashes would more likely involve human fatalities.
A closely related work is that on scaled autonomy by Swamy et al., in which the authors investigate how to assist an operator in selecting which robots in a fleet most require teleoperation [27]. However, our work differs in that it seeks to minimize the supervisor’s active control time (the teleoperator in that work is never idle), and in that the mixed autonomy setting at the heart of our work must maintain strict safety standards.
This work extends previous scalable supervision research currently in submission to the IEEE International Conference on Robotics and Automation [28].
Reachability analysis
At its most basic, reachability analysis is about identifying the set of states that a dynamical system could enter (the ‘reachable states’ or ‘reachable set’) given all admissible inputs and parameters [31]. A variety of methods exist for doing so; a common tradeoff among these is between approximation accuracy and computational cost [32, 33]. Methods such as sampling-based approaches may require less computation, but they are not fully conservative, meaning they are not guaranteed to cover the entire reachable set [33]. For safety-critical applications, such approximations may be unacceptable.
One method for computing the exact reachable set is Hamilton-Jacobi (HJ) reachability. HJ methods can handle nonlinear system dynamics and allow for formal treatment of bounded disturbances, but suffer exponential computational complexity in the number of state variables [34]. Fortunately, the vehicle dynamics and merging problem structure allow us to adopt a fully conservative approach while avoiding this computational complexity. The proposed reachability-based method calculates the exact reachable set for each vehicle in constant time. However, we note that other mechanisms for computing conservative reachable sets (such as Hamilton-Jacobi reachability) could be used in future work generalizing to more scenarios [34]. Other work that seeks to integrate reachability analysis and AVs relaxes the conservatism of its approach by making assumptions about human driving behaviors [35], or by not accounting for the full range of system dynamics [36]. The need remains for an approach that maintains strict safety guarantees.
Reinforcement learning (RL)
RL agents learn a control policy by interacting with their environment and observing the states they visit and rewards (or penalties) they accumulate along the way. The introduction of deep neural network architectures has allowed RL agents to achieve superhuman performance in a variety of domains, but at the expense of formal convergence guarantees [11, 12].
Previous work on ‘Safe RL’ has largely focused on supervising an RL agent’s learning process, rather than online (synchronous) supervision at test time. Given the amount of training required for RL agents, this supervision can require weeks of human labor [37]. Other methods utilize Lyapunov functions to achieve safer exploration during training, but here again the work centers around the agent’s learning rather than its performance at test time [38, 39]. Yet the safety of already-trained deep RL systems is far from assured, given issues such as catastrophic forgetting and generalizing to previously unseen states or new tasks [40, 41]. In this work, we leverage RL to model a performant and yet imperfect (unsafe) AV. We propose the combination of reachability analysis and human supervision as one means of addressing safety during execution time of a controller trained using RL.
III Implications for Autonomous Driving
While investigating the case of merging AVs is an interesting problem in itself, it also presents a compelling opportunity for achieving significant scalability. It is worth pausing for a moment to appreciate the broader context of these improvements and the leveraging effect that can be achieved in the AV setting by adopting autonomy with supervision assistance for the driver. Not only does this provide further motivation for investigating scalable supervision of mixed autonomy systems in the specific context of AVs, but it also serves as a case study for determining the potential for supervision scaling to achieve practical gains in a given setting. In so doing, this analysis provides a framework that could be used to evaluate other tasks for scalable supervision.
III-A Improved safety and user experience
AVs provide a compelling case for investigating scalable supervision because short, particularly dangerous events (such as merging or navigating a construction zone) are interspersed within longer, easier events (such as cruising on a highway). This is not only due to the fact that a human is present and able to assume control of the vehicle. Imagine a future where humans can trust their AVs to handle mundane driving such as that on an interstate highway. An intervention that saves a human driver from lifting a finger during a five-second merge may thus not only spare the person interruption for those five seconds, but may in effect spare her interruption for 20 minutes if the merge links 10-minute stretches on two separate highways.
Certainly, unforeseen events can occur at any point in the driving process, but that is beyond the scope of this research, which focuses on the illustrative and pre-specified event of merging. Other instances that present increased risks or unpredictability but can be anticipated include construction zones, school zones, corridors particularly congested with foot traffic, and known traffic accident hotspots [16].
III-B Informed fleet operations
Beyond the improved driving experience and safety for an individual, in the further future such scaling might be a crucial element enabling the widespread deployment of autonomous vehicles, and thus crucial for achieving the environmental and social benefits outlined previously. Consider a company that seeks to provide a fully hands-off driving experience. Even if the company’s AVs are generally quite effective at navigating roadways, given the difficulty of some tasks like merging and consumers’ desire for additional safety, the firm may look for additional safety measures. If it were sufficiently economical (and assuming low-latency connections, etc.), one way to do so might be to hire remote supervisors to monitor the AV fleet and provide real-time control when necessary. Thus, the question of interest for this company would be not only the reduction in the number of interventions and the time required relative to some baseline, but also how many remote supervisors must be hired to supervise the entire fleet of AVs.
One principal aim of this article is to provide rigorous methodology for informing such estimates, by refining the estimated time required for supervision of the AVs. As an illustration, the outcome of the proposed methodology, tailored to the locale and services of an AV operator, could be incorporated into the following estimate for fleet operations. Based on some moderate assumptions, we might expect that one remote supervisor could supervise between 18 and 66 AVs for the task of highway merging in the United States. The logic behind this estimate is outlined here. In the US, approximately 33% of total vehicle miles traveled (VMTs) occur on interstates, other freeways, or expressways, where “access and egress points are limited primarily to on- and off-ramps” [42, 43, 44]. Making the conservative assumptions that on these roads the average speed is 50 mph and the average trip length is 15 miles, the total time for each trip is 0.3 hours (18 minutes). If the time it takes for a supervisor to assume control and execute a merge task safely is 60 seconds, and assuming one merge per trip, then a supervisor is needed for 1/18, or about 5.56%, of every trip. Taking the inverse of this, we see that, if tasks are allocated without gaps or delays, one supervisor could theoretically monitor merges for 18 highway vehicles. Since VMTs on interstates, other freeways, and expressways account for only a third of all VMTs, one supervisor could supervise up to 54 vehicles overall (since the majority never make a highway merge). These numbers increase to 22 highway vehicles and 66 vehicles overall if we assume an average freeway speed of 40 mph.
IV Formalizing Scalable Supervision
In this section, we provide a formalization of scalable supervision and a delineation between static and dynamic supervision, and provide theoretical results.
IV-A Formalizing Scalable Supervision
At its core, scaling supervision has two goals: to reduce the number of necessary human interventions and to reduce the time required for those that must occur. One way to do this (our approach) is to use reachability analysis to identify when a merging AV is in danger of colliding with another vehicle in the system, and to activate human control of the AV at that time. It is also important to note that ‘safety’ in this paper is not synonymous with crash-free behavior. Since the safety backup for the AV is a human driver, the ultimate safety of the system remains subject to human fallibility.
To understand how our approach works, first imagine an AV on an on-ramp merging into a rotary in which a human vehicle (HV) is already driving. The objective is to prevent the two vehicles from colliding during the AV’s merge, so the reachability question, simply put, is whether it is possible for both the AV and HV to get to the merge point over some finite time horizon $T$. Underlying the horizon is the intuition that a supervisor need not assume control of the AV if a potential collision is far in the future; it is enough to assume control before the potential crash but with enough time to avoid it comfortably. We can vary $T$ to be more or less cautious in any given scenario; in at least one setting humans required 5-8 seconds to safely assume control of a vehicle, and this is the time window we adopt in experiments [45].
The reachability question naturally decomposes into two independent subquestions: (1) Can the AV reach the merge point, and (2) Can the HV reach the merge point? That is, for each vehicle, is the distance it would travel if it were to apply the maximum acceleration over the horizon greater than or equal to the distance between that vehicle’s current position and the merge point? If the answer to both subquestions is ‘yes,’ then human supervisor control of the AV must be activated to ensure safety. See Figure 2 for a visual representation.
We can write this more formally: the horizon beyond which the supervisor must assume control of a merging AV to guarantee avoiding collision with the nearest in-ring vehicle (denoted by the subscript $\mathrm{HV}$) is
$$ T^{*} = \min\{\, T \ge 0 : r_{\mathrm{AV}}(T) \ge d_{\mathrm{AV}} \ \text{and}\ r_{\mathrm{HV}}(T) \ge d_{\mathrm{HV}} \,\}, \tag{1} $$
where $r_i(T)$ is the maximum distance vehicle $i$ can travel in time horizon $T$ and is a function of that vehicle’s initial velocity $v_i$ and its maximum acceleration $a_i$, and $d_i$ is the distance between vehicle $i$ and the merge point $X$. The road networks used for experimentation are limited to one-lane roads, and thus considering a vehicle’s maximum forward distance is sufficient. This amounts to the most conservative reachability for the given time horizon: there is no area that the vehicle could reach over time $T$ that is not within its reachability zone. Note that this maintains safety guarantees even when generalizing to cases in which two-dimensional motion is considered (such as by adding lane changes). The maximum-distance reachability formulation still provides a conservative reachable zone because side-to-side motion can only reduce the distance along the road that the vehicle travels, rather than increase it.
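The two-part reachability check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the maximum reachable distance is taken to be the constant-acceleration kinematic distance (current speed times the horizon, plus half the maximum acceleration times the horizon squared), no speed limit is applied, and the `Vehicle` class, its field names, and the numeric values are hypothetical rather than taken from the experiments.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    """Hypothetical container; names and units are illustrative assumptions."""
    distance_to_merge: float  # distance along the road to the merge point (m)
    speed: float              # current speed (m/s)
    max_accel: float          # maximum acceleration (m/s^2)


def max_reach(vehicle: Vehicle, horizon: float) -> float:
    """Farthest distance the vehicle can cover in `horizon` seconds when it
    accelerates at its maximum rate for the entire horizon (no speed cap)."""
    return vehicle.speed * horizon + 0.5 * vehicle.max_accel * horizon ** 2


def can_reach_merge(vehicle: Vehicle, horizon: float) -> bool:
    """Sub-question for one vehicle: is the merge point within its reach?"""
    return max_reach(vehicle, horizon) >= vehicle.distance_to_merge


def needs_supervision(av: Vehicle, hv: Vehicle, horizon: float) -> bool:
    """Human control is activated only if BOTH vehicles could reach the
    merge point within the horizon."""
    return can_reach_merge(av, horizon) and can_reach_merge(hv, horizon)


# Illustrative values only.
av = Vehicle(distance_to_merge=60.0, speed=10.0, max_accel=2.0)
near_hv = Vehicle(distance_to_merge=100.0, speed=20.0, max_accel=2.0)
far_hv = Vehicle(distance_to_merge=400.0, speed=20.0, max_accel=2.0)

print(needs_supervision(av, near_hv, horizon=6.0))  # both can reach the merge point
print(needs_supervision(av, far_hv, horizon=6.0))   # the in-ring vehicle is too far away
```

Because each check is a closed-form comparison, the cost per vehicle does not grow with the horizon, which is one reason a conservative kinematic bound is attractive here.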
IV-B Scenario
Because this work focuses on merging, the experiments utilize a large rotary traffic network, which approximates a highway with multiple on-ramps and off-ramps. The ring road at the core of the structure is a well-studied setting in traffic literature, and has been shown to mimic traffic congestion patterns that might occur on an infinite roadway [46]. The ring therefore emulates a highway, and the speed limit is set accordingly. Merging vehicles enter the system via the on-ramps, merge into the ring road, and then exit via off-ramps (see Figure 1).
Since this is a mixed autonomy system, we simulate both AVs and human vehicles (HVs). The HVs are modeled using the widely used Intelligent Driver Model (IDM), which has been shown to emulate the actual behavior of human drivers [47, 48]. Next, we model connected AVs, also using IDM, to investigate the supervision scaling implications of connected AVs. We will see that they allow for linear improvements in supervision scaling. Lastly, we include in-ring AVs controlled using a policy learned via RL. These represent supervision-aware, cooperative AVs: the RL agent is incentivized during learning to avoid triggering the merging vehicle’s supervisor, and thus learns to accommodate the incoming vehicle’s merge. Thus, using both theoretical and empirical results, we show how the teaming of AVs enables supervision time sublinear in AV adoption.
V Theoretical Results
By applying reachability analysis to the problem of AV merging, a number of bounds can be derived for the settings in which we are interested.
This analysis begins with a description of an upper bound on the supervision requirements for the scenario in which a group of remote supervisors manage merges for an arbitrarily large number of on-ramps, as well as a closed-form expression for calculating the probability that AVs need supervision but cannot receive it. This is of particular interest because it allows for the characterization of the risk associated with a system with a fixed number of supervisors. Conversely, it allows one to calculate the number of supervisors necessary to achieve a desired level of supervision safety. After the provision of the upper bound, an analysis of the ‘typical case’ is provided, wherein the expected supervision scaling gains via cooperative, supervision-aware AVs are described.
One necessary component of the remote supervisor analysis is a characterization of $p$, that is, the probability that the merge point falls within the reachable zone of at least one vehicle within the rotary. The analysis further below investigates this by considering how an upper bound on $p$ varies in the settings in which the in-ring vehicles are entirely HVs, a mix of HVs and connected AVs, and finally a mix of HVs and supervision-aware, cooperative AVs.
It is important to first note that we can rewrite Equation 1 using kinematics. We allow each vehicle $i$ to take any nonnegative velocity $v_i \ge 0$ and only assume finite vehicle acceleration $a_i < \infty$. Thus, preserving this conservative view of reachability requires accounting for the worst-possible case (maximum acceleration throughout the horizon), and the reachable set for vehicle $i$ is the interval $[0, r_i(T)]$ measured forward along the road from its current position.
This conservative approach to reachability has the ancillary benefit that vehicle $i$’s reachable distance in the given time horizon $T$ can be written simply:
$$ r_i(T) = v_i T + \tfrac{1}{2} a_i T^{2}. \tag{2} $$
With this in mind, we turn to the challenge of characterizing the multiple-supervisor, multiple-merge case.
V-A Multiple supervisors monitoring an AV fleet
We consider a future setting in which a team of human supervisors located remotely could monitor a fleet of AVs and assume remote control if necessary. A natural question would be: how many supervisors are necessary?
Theorem 1.
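Suppose we have $k$ remote supervisors and $N$ on-ramps on which AVs appear and trigger the on-ramp supervision condition with arrival processes $\mathrm{Poisson}(\lambda_j)$, $j = 1, \dots, N$. Suppose the service time of each remote supervisor is exponentially distributed with rate $\mu$.
The fraction of AVs that require supervision but cannot immediately receive it (and thus go unsupervised) is given by
$$ P_{\mathrm{loss}} = \frac{\rho^{k}/k!}{\sum_{s=0}^{k} \rho^{s}/s!}, \tag{3} $$
where $\rho = p\lambda/\mu$ with $\lambda = \sum_{j=1}^{N} \lambda_j$, $p$ is the probability that an in-ring vehicle will trigger its supervision condition, and $\rho < \infty$ in order to have a valid steady-state probability.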
Proof.
Let $Z$ be the random variable denoting the number of AVs requiring control at an arbitrary point in time.
Allow different on-ramps to have different arrival rates $\lambda_j$. Because the supervision tasks are allocated to a centralized group of remote supervisors, the problem reduces from a tasking situation involving $N$ separate queues to a single-queue tasking case. Given Equation 27 in the appendix, the arrival process of AVs triggering the on-ramp supervision condition follows $\mathrm{Poisson}\big(\sum_{j=1}^{N} \lambda_j\big)$. Set $\lambda = \sum_{j=1}^{N} \lambda_j$.
As a specific example (to simplify notation), assume $\lambda_j = \lambda_0$ for some constant $\lambda_0$; then the AVs arriving on the on-ramps trigger the on-ramp supervision condition following the distribution $\mathrm{Poisson}(N\lambda_0)$ (again, see Equation 27 in the appendix). Let $Y$ be a random variable denoting the number of AVs that need to be supervised. We know that each such arrival contributes to $Y$ with probability $p$, where $p$ can be obtained from the in-ring vehicle’s reachability condition and will be derived further below.
By Equation 26 (see Appendix), we have that $\mathrm{Poisson}(p\lambda)$ is the arrival process for AVs that need to be supervised. Then, the problem reduces to an $M/M/k/k$ queue with arrival process $\mathrm{Poisson}(p\lambda)$, service rate $\mu$ for each supervisor, and finite capacity $k$. The finite capacity is due to the fact that when all supervisors are busy, any additional on-ramp AVs requiring supervision will be rejected immediately. These cases cannot wait in the queue for future service as their needs are immediate; that is, to maintain safety, once the on-ramp and in-ring conditions are triggered for a given merge, a supervisor must immediately supervise the merge. Thus, the rejected cases that the pool of supervisors cannot immediately service represent dangerous situations.
For an $M/M/k/k$ queue with no waiting space (that is, customers who cannot be served immediately are turned away), both the steady-state probability and loss formula are known [49]. The expression for the steady-state probability is
$$ \pi_n = P(Z = n) = \frac{\rho^{n}/n!}{\sum_{s=0}^{k} \rho^{s}/s!}, \qquad n = 0, 1, \dots, k, \tag{4} $$
where $\rho$ is the ratio of the arrival rate to the service rate. Then, $n$ represents the number of AVs requiring supervision, $k$ is the total number of supervisors, the arrival rate is set to $p\lambda$, and $\mu$ is kept as the service rate, so that $\rho = p\lambda/\mu$. The loss formula here (Equation 4 evaluated at $n = k$) represents the fraction of AVs that require supervision but cannot immediately receive it (and thus go unsupervised), and the result thus follows.
∎
Based on Equation (3), the expected number of supervisors needed is the smallest number of supervisors $k$ such that the loss fraction in Equation (3) is at most $\epsilon$, for some constant of interest $\epsilon$. In short, this answers the question, “How many supervisors do I expect to need?”, assuming the questioner can define their risk threshold $\epsilon$.
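The sizing question above can be explored numerically. The following is a minimal sketch, assuming the standard Erlang loss (Erlang B) formula for a queue with Poisson arrivals, exponential service, and no waiting room, and assuming the offered load is the supervision-triggering arrival rate divided by the per-supervisor service rate. The function names, parameter names, and example rates are hypothetical.

```python
def erlang_b(offered_load: float, servers: int) -> float:
    """Erlang B blocking probability for `servers` servers and no waiting room.

    Uses the standard recursion B(0) = 1, B(k) = a*B(k-1) / (k + a*B(k-1)),
    which avoids computing large factorials directly.
    """
    if offered_load < 0 or servers < 0:
        raise ValueError("offered_load and servers must be nonnegative")
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking


def supervisors_needed(arrival_rate: float, trigger_prob: float,
                       service_rate: float, risk_threshold: float,
                       max_supervisors: int = 10_000) -> int:
    """Smallest number of supervisors whose loss fraction is <= risk_threshold.

    arrival_rate: pooled rate of AVs triggering the on-ramp condition
    trigger_prob: probability the in-ring condition is also triggered
    service_rate: merges one supervisor can handle per unit time
    """
    offered_load = trigger_prob * arrival_rate / service_rate
    for k in range(max_supervisors + 1):
        if erlang_b(offered_load, k) <= risk_threshold:
            return k
    raise ValueError("risk threshold not reachable within max_supervisors")


# Illustrative numbers only: 2 on-ramp triggers per minute, half of which also
# trigger the in-ring condition, and 1 supervised merge per minute per supervisor.
print(supervisors_needed(arrival_rate=2.0, trigger_prob=0.5,
                         service_rate=1.0, risk_threshold=0.01))
```

Since the loss fraction decreases as supervisors are added, a simple linear search suffices; for very large fleets a bisection over the number of supervisors would work equally well.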
In the above, observe that, due to the functional reduction of the inflows to a single task queue serviced by the supervisors, the exact number of on-ramps does not directly affect the steady-state probability or loss formula, except insofar as the individual arrival rates contribute to the global arrival rate.
Although the theorem in this section is focused on a remote supervision scenario, the empirical study (Section VI) is focused on the nearer-term scenario in which each AV has its own onboard driver supervising it. We leave empirical analysis of the remote supervisor scenario for future work.
Finally, note the importance of the in-ring reachability in determining whether an AV requires supervision. This follows naturally from the problem formalization, in which both the in-ring and on-ramp supervision conditions (reachability conditions) must be met in order for a merging AV to require supervision. The following sections provide insight into this term.
V-B In-ring reachability for all-HV rotaries
Lemma 1.
Given a single-lane ring road of circumference $C$ with $n$ vehicles distributed uniformly at random (but not necessarily independently) along the length of the ring, the probability $P(E)$ that an arbitrary fixed point on that ring is reachable over time horizon $T$ by a vehicle is upper bounded as follows:
$$ P(E) \le \frac{1}{C} \sum_{i=1}^{n} r_i, \tag{5} $$
where $r_i$ is the reachable range for vehicle $i$ over the time horizon.
Proof.
Suppose $X$ is the fixed merge point. Let $d_i$ be the distance between the in-ring HV $i$ and $X$ (the asymmetric distance following the traffic flow). Then the event
$$ E = \bigcup_{i=1}^{n} E_i = \bigcup_{i=1}^{n} \{ d_i \le r_i \}, \tag{6} $$
where the first equality comes from the fact that the event $E$ happens if any of the HVs triggers the supervision condition (event $E_i$ for HV $i$), and the second equality comes from the reachability analysis for each HV.
Applying a union bound (Boole’s inequality) to the joint probability event in Equation 6, we have
$$ P(E) \le \sum_{i=1}^{n} P(d_i \le r_i) = \frac{1}{C} \sum_{i=1}^{n} r_i, \tag{7} $$
where the last equality comes from our assumption that each vehicle’s location is distributed uniformly at random along the length of the ring, i.e. $d_i \sim \mathrm{Uniform}[0, C]$. However, note this does not require each vehicle’s location be independent of other vehicles’ positions.
∎
The uniform distribution assumption above may be satisfied both by (a) a situation in which traffic is flowing freely around the ring, and (b) a setting with stop-and-go traffic where the congestion is equally likely to occur at any point within the ring. A case not covered by the lemma is one in which congestion routinely occurs around the merge point, but this case is beyond the current scope of work given our focus on safety in high-speed merges.
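The union bound in Lemma 1 can be checked numerically. The sketch below is illustrative only: it reads the bound as the sum of the vehicles' reachable ranges divided by the ring circumference, and it additionally assumes independent vehicle positions for the simulation (the lemma itself does not need independence). Function names and the example ranges are hypothetical.

```python
import random


def union_bound(reaches, circumference):
    """Upper bound: total reachable range divided by the ring circumference."""
    return sum(reaches) / circumference


def simulate_trigger_prob(reaches, circumference, trials=100_000, seed=0):
    """Monte Carlo estimate of the probability that at least one vehicle can
    reach a fixed point, with each vehicle's forward distance to that point
    drawn independently and uniformly from [0, circumference]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The point is reachable by vehicle i if its distance is within r_i.
        if any(rng.uniform(0.0, circumference) <= r for r in reaches):
            hits += 1
    return hits / trials


reaches = [50.0, 80.0, 120.0]   # illustrative reachable ranges (m)
circumference = 1000.0          # illustrative ring length (m)
print(union_bound(reaches, circumference))
print(simulate_trigger_prob(reaches, circumference))
```

With independent positions the exact probability is one minus the product of the individual miss probabilities, so the simulated value sits slightly below the bound; the gap widens as reachable ranges overlap more often.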
Lemma 2.
Given a single-lane ring road of circumference $C$ with $n$ vehicles distributed uniformly at random (but not necessarily independently) along the length of the ring, and a merging vehicle distributed uniformly at random along an on-ramp of length $L$, the joint probability that the on-ramp’s merge point into the ring road is reachable over time horizon $T$ by both an in-ring vehicle (event $E$) and the on-ramp vehicle (event $E_{\mathrm{AV}}$) is upper bounded as follows:
$$ P(E \cap E_{\mathrm{AV}}) \le \Big( \frac{1}{C} \sum_{i=1}^{n} r_i \Big) \cdot \frac{r_{\mathrm{AV}}}{L}, \tag{8} $$
where $r_i$ is the reachable range for vehicle $i$ over the time horizon and $r_{\mathrm{AV}}$ is that of the merging vehicle.
Proof.
Without loss of generality, we assume a sufficiently long on-ramp, i.e. $L \ge r_{\mathrm{AV}}$.
The above follows from analyzing the joint probability of independent events. The rightmost term is the probability that an arbitrary point on the on-ramp is within the reachable zone of the merging vehicle over horizon $T$, and follows similar logic to that used for writing $P(d_i \le r_i)$. ∎
Consequently, this also relies upon the assumption that the likelihood of finding the merging vehicle in a given position is evenly distributed across the on-ramp. This assumption is more tenuous in this case than it was in Lemma 1 because in situations of interest (such as when traffic exists on the highway) the AV will often have to yield. This would cause the AV to spend a disproportionately large amount of time just before the merge point.
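As a small worked example of how the two factors in Lemma 2 combine, the sketch below multiplies an in-ring union bound by the on-ramp term. It assumes the reading of the lemma in which the in-ring factor is the total reachable range over the ring circumference and the on-ramp factor is the merging vehicle's reachable range over the ramp length; the function names and values are hypothetical.

```python
def ring_bound(ring_reaches, circumference):
    """In-ring factor: total reachable range over the ring circumference."""
    return sum(ring_reaches) / circumference


def ramp_probability(merge_reach, ramp_length):
    """On-ramp factor for a merging vehicle placed uniformly along the ramp.

    Follows the lemma's assumption of a sufficiently long ramp."""
    if merge_reach > ramp_length:
        raise ValueError("expects ramp_length >= merge_reach")
    return merge_reach / ramp_length


def joint_bound(ring_reaches, circumference, merge_reach, ramp_length):
    """Bound on the probability that both conditions trigger, treating the
    in-ring and on-ramp events as independent."""
    return ring_bound(ring_reaches, circumference) * ramp_probability(
        merge_reach, ramp_length)


# Illustrative values only (metres).
print(joint_bound([50.0, 80.0, 120.0], 1000.0, merge_reach=60.0, ramp_length=200.0))
```

If merging vehicles tend to queue just before the merge point, as discussed above, the on-ramp factor would understate the true probability, so it is best treated as a modeling assumption rather than a guarantee.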
V-C In-ring reachability for mixed HVs and connected AVs
Now consider how a mixed autonomy system with connected AVs may improve upon the all-HV case. These AVs may communicate their near-term trajectories, but are not assumed to alter their trajectories to avoid triggering supervision in the system. That is, they do not actively accommodate merging vehicles.
Corollary 1.
Given a single-lane ring road of circumference $C$ with $n$ unconnected vehicles and $m$ connected AVs distributed uniformly at random (but not necessarily independent of each other) along the length of the ring, the probability $P(E)$ that an arbitrary point on that ring is reachable over time horizon $T$ by a vehicle is upper bounded as follows:
$$ P(E) \le \frac{1}{C} \Big( \sum_{i=1}^{n} r_i + \sum_{j=1}^{m} \ell_j \Big), \tag{9} $$
where $r_i$ is the reachable zone for vehicle $i$ over the time horizon and $\ell_j$ is the length of connected AV $j$. Note that one may define $\ell_j$ to include a safety buffer.
Proof.
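Again, suppose $X$ is the fixed merge point. Furthermore,

Let $d_i$ be the distance between the in-ring HV $i$ ($i = 1, \dots, n$) and X at the current time $t$.

Let $d'_j$ be the distance between the in-ring connected AV $j$ ($j = 1, \dots, m$) and X at future time $t + T$. Due to the connectivity, we know the trajectory of the AVs and hence the location of the in-ring AVs at future time $t + T$, so AV $j$ triggers supervision if its future location at time $t + T$ has distance less than its length $\ell_j$ to the merge point.

Assume $d_i \sim \mathrm{Uniform}[0, C]$ and $d'_j \sim \mathrm{Uniform}[0, C]$.
Then we similarly have the event $E$ as the event that any of the HVs or AVs trigger supervision:
$$ E = \Big( \bigcup_{i=1}^{n} \{ d_i \le r_i \} \Big) \cup \Big( \bigcup_{j=1}^{m} \{ d'_j \le \ell_j \} \Big). \tag{10} $$
Applying a union bound (Boole’s inequality) to the joint probability event in Equation 10 results in
$$ P(E) \le \sum_{i=1}^{n} P(d_i \le r_i) + \sum_{j=1}^{m} P(d'_j \le \ell_j) = \frac{1}{C} \Big( \sum_{i=1}^{n} r_i + \sum_{j=1}^{m} \ell_j \Big). \tag{11} $$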
∎
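To make the effect of connectivity concrete, the following sketch compares an all-HV bound with the mixed bound. It assumes the reading of Corollary 1 in which each unconnected vehicle contributes its full reachable range while each connected AV contributes only its own length (optionally padded by a safety buffer); all names and numbers are hypothetical.

```python
def all_hv_bound(reaches, circumference):
    """Bound when every in-ring vehicle is unconnected: total reachable range
    divided by the ring circumference."""
    return sum(reaches) / circumference


def connected_bound(hv_reaches, av_lengths, circumference):
    """Bound when some vehicles are connected AVs: unconnected vehicles
    contribute their reachable range, connected AVs only their length
    (which may include a safety buffer)."""
    return (sum(hv_reaches) + sum(av_lengths)) / circumference


circumference = 1000.0  # illustrative ring length (m)

# Three unconnected vehicles with illustrative reachable ranges (m).
print(all_hv_bound([50.0, 80.0, 120.0], circumference))

# Same ring, but the third vehicle is a connected AV of length 5 m.
print(connected_bound([50.0, 80.0], [5.0], circumference))
```

Because a vehicle's length is typically far smaller than its reachable range over the horizon, each unconnected vehicle replaced by a connected AV removes most of that vehicle's contribution to the bound.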
It is worth further examining the connected AVs’ ability to predict and communicate their trajectory. Mixed autonomy settings present unique challenges to trajectory planning: while AVs can plan their own trajectories over a given time horizon in isolation, the human drivers on the road are unpredictable. How can AVs predict their own trajectories when sharing the road with human drivers?
First, note that faster-than-expected HVs in this setting do not pose a problem for the connected AVs’ trajectory planning; indeed, if they speed further ahead, the AVs have more space. The instances which might be problematic are those in which an HV quickly slows.
However, even here the problems subside upon further analysis. If an HV sharply brakes far ahead of the merge point, the connected AVs behind it are also far from the merge point, and thus do not pose a collision risk for the merging vehicle. If an HV sharply brakes well past the merge point, no connected AV near the merge point will be substantially affected, especially not immediately.
Thus the instances of concern are further limited to those cases in which an HV rapidly brakes near the merge point when an on-ramp vehicle is about to merge. However, if the HV of concern is just prior to the merge point, note that it will have already triggered the supervisor, so any adjustment of a connected AV’s propensity to trigger the supervision condition is redundant. (Recall that once a supervisor is triggered, it supervises the entire merge.)
The remaining case is that in which an HV has just passed the merge point when it rapidly decelerates. Yet the only way an HV manages to pass the merge point without triggering supervision is if the merge point is beyond the on-ramp vehicle’s reachable zone during the entire time that same point is within the HV’s reachable zone. Therefore, even in this final case of concern, if the HV behaves erratically, there remains the entirety of the reachability time horizon for the connected AV to respond, and likely significantly more, given that the merging vehicle’s reachable zone assumes its maximum acceleration.
Of course, as discussed previously, unforeseen events can occur, but these are beyond the scope of this research, which focuses on the merge event.
V-D In-ring reachability for mixed HVs and supervision-aware AVs: worst case
Now consider how a mixed autonomy system with AVs that cooperate to avoid triggering supervision may improve upon the previous cases. We first consider the worst-case improvement when the AVs are cooperative rather than agnostic to the supervision task. Then, in Section V-E, we consider the improvement in a typical (average) case with cooperative AVs.
Corollary 2.
Given a singlelane ring road of circumference with unconnected vehicles and supervisionaware AVs distributed uniformly at random along the length of the ring, the probability that an arbitrary point on that ring is reachable over time horizon by a vehicle is upper bounded as follows:
(12) 
where is the reachable zone for vehicle over the time horizon.
That is, the previous upper bound in Corollary 1 is improved by dropping the term.
Proof.
Again, suppose is the fixed merge point. Furthermore, let be the distance between the inring HV () and X at the current time . We assume .
Given the fixed merge point and perfect control of its trajectory during the length planning interval, so long as is sufficiently long, there exists a control input (a sequence of accelerations over the planning horizon) such that the supervision-aware cooperative AV is not at the merge point at with probability 1. So the event , i.e. cannot occur. Hence the above upper bound improves on the one in Corollary 1 by dropping the term.
In the worst case, the order of HVs and AVs (with respect to the merge point) is adversarially distributed, such that (1) all the HVs are asymmetrically closer than all AVs to the merge point (when moving in the forward direction), and (2) all HVs have non-overlapping reachability zones. In this case the AVs cannot influence any HV’s behavior at the merge point, so it is possible for any HV to trigger the supervision condition. Furthermore, the HVs’ combined reachability zone achieves its maximum coverage over the ring.
The event that an inring vehicle triggers supervision is thus due to the HVs:
(13) 
The union bound (Boole’s inequality) gives
(14) 
∎
Note that is distinct from the reachability time horizon , and in practice would likely be much smaller. It simply corresponds to the case in which the planning interval is too short for the AV to avoid the merge point. For example, if a merging AV appears on the onramp when an inring, supervisionaware AV is moments before the merge point and moving quickly, there may not exist sufficient time for the inring AV to brake—and thus block any HVs behind it from interfering with the merge—before its momentum carries it past the merge point.
V-E In-ring reachability for mixed HVs and supervision-aware AVs: typical case
Recognizing that the situation described in Corollary 2 is adversarial, additional improvement in supervision scalability can be obtained with a typical (less adversarial) distribution on and .
In a typical mixed autonomy case, the order of HVs and AVs relative to the merge point is interspersed. As the AVs are supervisionaware and fully cooperative, the AV closest to the merge point may stop to accommodate a merge, and thus any vehicle after that AV cannot trigger supervision.
To model this, let be the distance between the inring AV () and the merge point . Assume . Let the order statistics . Assume that each vehicle’s location is distributed uniformly at random along the length of the ring. Without loss of generality and for ease of exposition, consider the restriction to and the shorthand notation .
We begin by considering two different distribution schemes for the inring vehicles. An additional distribution of interest is included in the appendix.
Uniform vehicle distribution
First consider the case in which AV locations are independent of each other (i.e., for all ) and all s’ and s’ locations are independent . In this case, for all , and we have from order statistics , so the probability density function for is , where denotes the distance of the closest AV from the merge point [50]. We also assume , so the probability density function for is . By independence, we have the joint probability density function (and 0 otherwise). It can therefore be written:
(15) 
where in the current setting because any HV further away from the merge point than the nearest cooperative AV will not trigger the supervisor. Therefore,
(16) 
One can compute as
(17) 
A visual representation of this double integration is provided in Figure 3. In both cases one benefit of supervisionaware AVs is represented by the green triangle. It corresponds to the portion of the ring at which inring HVs’ reachability zones include the merge point, but which are blocked by an inring cooperative AV, and thus do not pose a danger to the merging vehicle.
An additional benefit may come in the form of the distribution shift of in-ring vehicles that the supervision-aware AVs can cause, illustrated in the image with the color gradients. Here, not only does the AV block certain HVs from threatening the merge, but it can also shift the in-ring vehicle distribution such that an HV is less likely to be in the vicinity in the first place.
The image can thus be interpreted geometrically by imagining the colors as representing a third dimension on the image (as if rising out of the page towards the reader). Taking the red to represent a greater volume of probability mass, we see that the green triangle would also have more probability mass, and thus translate into greater supervision scaling than in the uniform distribution case. At the same time, this distribution includes an adversarial element: the HVs between the AV nearest the merge point and the merge point itself are more likely to be close to the merge point, which increases the odds that they trigger the in-ring reachability condition. In Appendix Ex. 3: Supervision-Aware AVs Typical Case, we provide another related, but less adversarial distribution, which provides further improvement in the upper bound.
Previously (in the connected AV case, without supervision-aware AVs), , so the absolute improvement in each term inside the union bound is
(18) 
and the relative improvement of each term inside the union bound is
(19) 
where as we normalize . This monotonically increases in and —that is, the relative improvement increases with larger reachability zones or more AVs.
As a numerical example, when and , the absolute probability (given by Equation 17) is 0.078. The previous absolute probability was 0.1, so the absolute improvement (from Equation 18) is 0.022, and the relative improvement (from Equation 19) is 0.219, or 21.9%.
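The arithmetic in this example can be reproduced with a short script. The sketch below assumes the uniform model above with distances normalized to [0, 1]. It writes S for the normalized reachable-zone length and M for the number of AVs (symbol names chosen here for illustration), and evaluates the probability that an HV is both inside the reachable zone and closer to the merge point than the nearest AV. Under these assumptions the double integral reduces to (1 - (1 - S)^(M+1)) / (M + 1). The values S = 0.1 and M = 5 are inferred because they reproduce the quoted figures; they are not read from the text.

```python
import random

def trigger_probability(S, M):
    """P(HV distance <= S and HV closer to the merge point than the nearest of M AVs).

    Assumes the HV distance and all M AV distances are i.i.d. Uniform[0, 1].
    """
    return (1.0 - (1.0 - S) ** (M + 1)) / (M + 1)

def simulate(S, M, trials=200_000, seed=0):
    """Monte Carlo check of trigger_probability under the same assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        hv = rng.random()
        nearest_av = min(rng.random() for _ in range(M))
        if hv <= S and hv <= nearest_av:
            hits += 1
    return hits / trials

S, M = 0.1, 5              # inferred values, see the note above
p_aware = trigger_probability(S, M)
p_connected = S            # without blocking, P(HV distance <= S) = S
absolute = p_connected - p_aware
relative = absolute / p_connected
print(f"{p_aware:.3f} {absolute:.3f} {relative:.3f}")  # prints: 0.078 0.022 0.219
```

Calling `simulate(S, M)` gives a sampling-based estimate that should agree with the closed form to within Monte Carlo noise.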
Interpreting this in terms of the reachability bound
(20) 
each term inside the sum is roughly 21.9% smaller than it was previously. This is equivalent to dropping 21.9% of the terms from the set that the sum is taken over, and hence yields an additional 21.9% improvement relative to the upper bound.
A nonuniform vehicle distribution
As the AV nearest the merge point may slow down or stop the HVs behind it, consider the scenario in which HVs may be more likely to be behind—and close to—the first AV. For example, imagine an AV leading a platoon of HVs.
For simplicity and to attain a closedform integration, we assume the probability density function of given is
(21) 
i.e., Truncated Exp(1) r.v. on [0,1] where the r.v. is the asymmetric distance of HV to AV (following the traffic flow).
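For reference, a truncated exponential on [0, 1] is straightforward to evaluate and sample. The sketch below assumes the standard definition (an exponential density with a given rate, renormalized to [0, 1]) and uses inverse-CDF sampling. The function names are illustrative, and the code does not attempt to reproduce the joint density in Equation 22.

```python
import math
import random

def truncated_exp_pdf(z, rate=1.0):
    """Density of an Exp(rate) variable conditioned to lie in [0, 1]."""
    if not 0.0 <= z <= 1.0:
        return 0.0
    return rate * math.exp(-rate * z) / (1.0 - math.exp(-rate))

def truncated_exp_sample(rng, rate=1.0):
    """Draw one sample by inverting the truncated CDF."""
    u = rng.random()
    return -math.log(1.0 - u * (1.0 - math.exp(-rate))) / rate

def truncated_exp_mean(rate=1.0):
    """Analytical mean of the truncated distribution."""
    return 1.0 / rate - math.exp(-rate) / (1.0 - math.exp(-rate))

rng = random.Random(0)
samples = [truncated_exp_sample(rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
print(f"sample mean {sample_mean:.3f}, analytical mean {truncated_exp_mean():.3f}")
```

With rate 1 the mass concentrates near 0, which matches the intuition of HVs bunching close behind the leading AV.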
For tractable computation assume to be the truncated Exp(S) r.v. on [0,1]. The resulting joint distribution is
(22) 
We can compute the probabilities of interest using the same integration as in Case 1, but with a different joint probability density function. Closedform probabilities can be obtained as
(23)  
(24)  
(25)  
As a numerical example, with , we obtain a ~44% improvement from the previous upper bound when and a ~70% improvement when .
See Table I and Table II for the numerical results with and respectively. Numerical results for another distribution are included in the appendix. In reality, the true distribution is likely to lie somewhere between the two cases outlined in this section. Note the relative improvement column is computed by comparing the column with in the connected AV scenario. We provide in the supervision-aware setting given the current joint p.d.f. to illustrate the shift in probability distribution.
Relative improvement  

2  0.0914  0.0749  25.12% 
3  0.0894  0.0675  32.46% 
4  0.0888  0.0614  38.60% 
5  0.0891  0.0564  43.64% 
6  0.0902  0.0522  47.80% 
7  0.0917  0.0487  51.29% 
8  0.0934  0.0457  54.26% 
9  0.0952  0.0432  56.85% 
10  0.0970  0.0409  58.13% 
11  0.0989  0.0388  61.18% 
12  0.1007  0.0370  63.03% 
13  0.1025  0.0353  64.72% 
14  0.1042  0.0337  66.27% 
15  0.1058  0.0323  67.71% 
16  0.1074  0.0310  69.03% 
Relative improvement  

2  0.0086  0.0084  16.00% 
3  0.0081  0.0078  21.86% 
4  0.0077  0.0074  26.43% 
5  0.0074  0.0070  29.90% 
6  0.0072  0.0067  32.52% 
7  0.0071  0.0065  34.53% 
8  0.0070  0.0064  36.12% 
9  0.0069  0.0063  37.40% 
10  0.0069  0.0062  38.47% 
11  0.0069  0.0061  39.38% 
12  0.0069  0.0060  40.17% 
13  0.0069  0.0059  40.88% 
14  0.0069  0.0058  41.51% 
15  0.0069  0.0058  42.10% 
16  0.0069  0.0057  42.63% 
VI Experimental Results
We validate the theoretical insights above and investigate supervision scaling with three experiments. These were built using the Simulation of Urban Mobility (SUMO) open-source software package and Flow, a deep RL framework for mixed autonomy traffic [51, 5].
VI-A Experiment 1: Human Vehicles
The first experiment populates a rotary approximately 1200 meters in circumference with 18 simulated HVs for 2500 timesteps (each corresponding to 0.1 seconds of simulation time). Once the simulation begins, 200 HVs per hour (in simulation time) merge into the rotary in accordance with the 270-degree route described earlier. This amounts to a steady traffic flow.
The reachability time horizon , number of in-ring vehicles, and rotary circumference are all varied. This allows us to conduct a baseline investigation of the reachability-based supervision scheme. We record the upper bounds described in Lemmas 1 and 2, as well as the true proportion of time for which a supervisor would be active.
HVs used the Intelligent Driver Model (IDM) controller, calibrated to highway driving, which produces realistic acceleration profiles and plausible behavior [47, 48]. To focus on the potential for collisions, the SUMO implementation of each rotary consists of a single lane. The network’s speed limit is 50 miles per hour, to align with highway speeds.
Figure 4 shows this experiment’s results. In all cases, the proportion of time that the point is within at least one vehicle’s reachable set is noticeably lower than the theoretical upper bound, which is consistent with the lemmas. The gap is due to significant overlap in the vehicles’ reachable sets. This demonstrates that reachability-based supervision provides an efficient alternative to full-time supervision.
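The effect of overlap can be illustrated with a toy trace. The sketch below assumes a hypothetical log in which, at each timestep, every vehicle reports whether the merge point is inside its reachable set. It then compares the fraction of time a supervisor would be active with the sum of the per-vehicle fractions, which plays the role of a union bound. The data and function names are invented for illustration and are not taken from the SUMO experiments.

```python
def supervised_fraction(reach_flags):
    """Fraction of timesteps in which at least one vehicle can reach the merge point."""
    return sum(1 for step in reach_flags if any(step)) / len(reach_flags)

def summed_fraction(reach_flags):
    """Sum over vehicles of each vehicle's own fraction of time, capped at 1."""
    return min(1.0, sum(sum(step) for step in reach_flags) / len(reach_flags))

# Toy trace: 8 timesteps, 2 vehicles; both vehicles overlap at the first step.
trace = [
    [True, True],
    [True, False],
    [False, False],
    [False, False],
    [False, True],
    [False, False],
    [False, False],
    [False, False],
]
measured = supervised_fraction(trace)
bound = summed_fraction(trace)
print(measured, bound)  # prints: 0.375 0.5
```

The step at which both flags are set is counted once in the measured fraction but twice in the summed bound, which is the source of the gap.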
VI-B Experiment 2: Connected Vehicles
This experiment investigates the linear scaling potential of connected AVs, and by association the relevant corollaries. The setup is the same as in the previous experiment, except that the 18 in-ring vehicles are replaced piecemeal by connected AVs to observe their effect on supervision scaling.
The results are shown in Figure 5. As expected from the underlying theory, the supervision upper bound scales linearly with the number of in-ring connected AVs. The empirical supervision requirements also appear to scale linearly, albeit with noise.
VI-C Experiment 3: Supervision-Aware AVs
This experiment compares the effect of limited penetration of two different types of AVs: the connected AVs from the previous experiment and supervisionaware AVs. Given the prospects for a gradual rollout of autonomous or nearautonomous vehicles (at least compared to a rapid rollout), this lowdensity regime is of particular interest.
The supervision-aware AV controller is modeled as a policy trained via the Trust Region Policy Optimization (TRPO) RL method [52]. The policy consists of a multilayer perceptron with two hidden layers, each with 64 nodes. During training, the agent receives a reward at each timestep that is a function of its speed and is penalized at each timestep that the supervisor is active. To simulate the low-density setting, the system for this experiment includes 32 vehicles total on a ring 3200 meters in circumference. The time horizon remains at 8 seconds. The number of AVs ranges from 1 to 5, and RL results are averaged over 7 seeds.
Unlike the connected AVs, the RL-based AVs do not have an encoded ‘collapsed’ reachability zone. Despite this, Figure 6 shows that the RL vehicles outperform the connected AVs in terms of supervision scaling. In all cases, the policy learned via RL achieves double-digit reductions in supervision requirements compared to the connected AVs. Thus this experiment demonstrates that teaming of AVs enables supervision time sublinear in AV adoption.
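The reward structure described for the supervision-aware controller can be sketched as follows. This is a minimal illustration rather than the training code: the normalization by a maximum speed, the penalty weight, and the function and parameter names are all assumptions introduced here.

```python
def supervision_aware_reward(speed, supervisor_active, max_speed=22.35, penalty=1.0):
    """Per-timestep reward: a speed term minus a penalty while the supervisor is active.

    Assumptions (not from the paper): the speed term is speed / max_speed
    clipped to [0, 1], and the penalty is a fixed constant.
    max_speed = 22.35 m/s is roughly 50 miles per hour.
    """
    speed_term = min(max(speed / max_speed, 0.0), 1.0)
    return speed_term - (penalty if supervisor_active else 0.0)

def episode_return(speeds, supervisor_flags):
    """Undiscounted sum of per-timestep rewards over one episode."""
    return sum(
        supervision_aware_reward(v, active)
        for v, active in zip(speeds, supervisor_flags)
    )

# Hypothetical four-step episode: the supervisor is active in the middle two steps.
speeds = [10.0, 12.0, 15.0, 15.0]
flags = [False, True, True, False]
print(episode_return(speeds, flags))
```

A reward of this shape trades off progress against supervisor load, so a policy can earn more by slowing slightly if doing so keeps the supervisor inactive.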
VII Conclusion
This work explored one perspective on improving safety in mixed autonomy settings by investigating human supervision of AVs in a simulated merging task. We investigated the question: Can we do better than the present-day industry standard of persistent supervision (consider the Tesla Autopilot)? Our findings indicate that clever allocation of human resources for supervision tasks may ease near-term adoption of imperfect autonomous agents in safety-critical environments. Future work could use simulations to empirically investigate the distribution produced, and could also evaluate the impact of the supervision-aware AVs’ cooperative behavior on traffic flows. More generally, continued research in this direction could adopt alternative reachability analysis tools to investigate supervision scaling’s potential in other mixed autonomy settings.
Acknowledgments
We are grateful to Dr. Eric Horvitz for inspiring our exploration of AV teaming. We would also like to thank Zhongxia Yan for discussions and assistance with experimental design and Jiaqi Zhang for discussions and assistance with the theoretical portion of the work.
Useful Statements
We provide useful probability statements that we adopt in our proofs. The statements can be found in a standard probability textbook [50].
A Binomial distribution parameterized by a Poisson random variable
(26) 
B Poisson distribution of summed independent random variables
(27) 
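Since the two statements are only named here, the following sketch numerically checks the standard facts that the headings suggest. The first is thinning: if N is Poisson with mean λ and X given N is Binomial(N, p), then X is Poisson with mean λp. The second is superposition: the sum of independent Poisson variables with means λ1 and λ2 is Poisson with mean λ1 + λ2. That Equations 26 and 27 take exactly these forms is an assumption, and the parameter values are arbitrary.

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def thinned_pmf(k, lam, p, n_max=150):
    """P(X = k) where N ~ Poisson(lam) and X | N ~ Binomial(N, p).

    The sum over N is truncated at n_max, which is ample for small lam.
    """
    return sum(poisson_pmf(n, lam) * binomial_pmf(k, n, p) for n in range(k, n_max + 1))

def summed_pmf(k, lam1, lam2):
    """P(X1 + X2 = k) for independent Poisson variables, by discrete convolution."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))

lam, p, lam1, lam2 = 6.0, 0.3, 2.5, 4.0
thinning_gap = max(abs(thinned_pmf(k, lam, p) - poisson_pmf(k, lam * p)) for k in range(20))
sum_gap = max(abs(summed_pmf(k, lam1, lam2) - poisson_pmf(k, lam1 + lam2)) for k in range(20))
print(thinning_gap, sum_gap)
```

Both gaps should be at the level of floating-point error, confirming the two identities for these parameter values.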
Ex. 3: Supervision-Aware AVs Typical Case
In Section V-E we assume that the HVs both behind and in front of the first AV follow a truncated exponential distribution. Such a distribution is slightly absurd (and adversarial) for HVs in front of the first AV because it implicitly says that those HVs have a greater probability of being close to the merge point than far away (close to the first AV), given that we define the asymmetric distance following the traffic flow direction. A less adversarial distribution would have the HVs before the first AV follow a uniform distribution between the merge point and the first AV. Here we consider such a joint distribution for and s, where the conditional probability density function of given is
(28) 
where is set so that the conditional probability integrates to 1 for each . That is, this is a truncated Exp(1) r.v. on [0,1] where the r.v. is the asymmetric distance of HV to AV (following the traffic flow) if the HV is behind the first AV, but conditionally uniform if the HV is before the first AV.
Again, for tractable computation assume to be the truncated Exp(S) r.v. on [0,1]. The resulting joint distribution is
(29) 
Fixing , the corresponding probabilities for different are given in Table III, where the relative improvement is calculated by comparing the current supervision-aware scenario against the connected-vehicle scenario where .
Relative improvement  

2  0.0805192  0.0639507  36.05% 
3  0.0813434  0.0594455  40.55% 
4  0.0828135  0.0554341  44.57% 
5  0.084711  0.0519237  48.08% 
6  0.0868491  0.0488549  51.14% 
7  0.0890942  0.0461509  53.85% 
8  0.0913594  0.0437418  56.26% 
9  0.0935922  0.0415718  58.43% 
10  0.0957621  0.0395984  60.40% 
11  0.0978524  0.0377901  62.21% 
12  0.0998548  0.036123  63.88% 
13  0.101766  0.0345786  65.42% 
14  0.103586  0.0331427  66.86% 
15  0.105317  0.0318034  68.20% 
16  0.106961  0.0305513  69.45% 
Table IV contains the corresponding result when fixing .
Relative improvement  

2  0.00714992  0.00696878  30.31% 
3  0.00694662  0.00670019  33.00% 
4  0.00680191  0.00648492  35.15% 
5  0.00670619  0.00631588  36.84% 
6  0.00664762  0.00618278  38.17% 
7  0.00661575  0.00607607  39.23% 
8  0.00660267  0.00598828  40.12% 
9  0.00660279  0.00591401  40.86% 
10  0.00661227  0.00584953  41.50% 
11  0.0066285  0.00579226  42.08% 
12  0.0066497  0.00574043  42.60% 
13  0.0066746  0.00569277  43.07% 
14  0.00670233  0.00564842  43.52% 
15  0.00673222  0.00560671  43.93% 
16  0.00676381  0.00556716  44.33% 
We can see that the relative improvement is indeed even larger than that in Section V-E, as the distribution of s before the first is less adversarial.
The visual representation of the conditional p.d.f. for this case can be seen in Figure 7.
References
 [1] Wu, C. Learning and Optimization for Mixed Autonomy SystemsA Mobility Context. (UC Berkeley,2018)
 [2] Allianz Partners Selfdriving Cars Hit a Speedbump; Interest in Autonomous Vehicle Technology Slows Down. (2018,9), https://www.allianzworldwidepartners.com/usa/mediaroom/2018/selfdrivingcarshitspeedbumpautonomoustechslowsdown
 [3] U.S. DoT Federal Highway Administration Highway Statistics 2019: Licensed Drivers. (U.S. Bureau of Transportation Statistics,2019), https://www.fhwa.dot.gov/policyinformation/statistics/2019/dl201.cfm
 [4] Cox Automotive Autonomous Vehicle Awareness Rising, Acceptance Declining. (2018,8), https://www.coxautoinc.com/news/evolutionofmobilitystudyautonomousvehicles/
 [5] Wu, C., Kreidieh, A. R., Parvate, K., Vinitsky, E., & Bayen, A. M. Flow: A Modular Learning Framework for Mixed Autonomy Traffic. IEEE Transactions on Robotics. (2021)
 [6] Barth, M. & Boriboonsomsin, K. Realworld carbon dioxide impacts of traffic congestion. Transportation Research Record. 2058, 163171 (2008)
 [7] Bellemare, M., Candido, S., Castro, P., Gong, J., Machado, M., Moitra, S., Ponda, S. & Wang, Z. Autonomous navigation of stratospheric balloons using reinforcement learning. Nature. 588, 7782 (2020)
 [8] Hambling, D. AI outguns a human fighter pilot. (Elsevier,2020)
 [9] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G. & Others Humanlevel control through deep reinforcement learning. Nature. 518, 529533 (2015)
 [10] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A. & Others Mastering the game of go without human knowledge. Nature. 550, 354359 (2017)
 [11] Bacci, E. & Parker, D. Probabilistic guarantees for safe deep reinforcement learning. International Conference On Formal Modeling And Analysis Of Timed Systems. pp. 231248 (2020)
 [12] Bouton, M., Karlsson, J., Nakhaei, A., Fujimura, K., Kochenderfer, M. & Tumova, J. Reinforcement learning with probabilistic guarantees for autonomous driving. ArXiv Preprint ArXiv:1904.07189. (2019)
 [13] Jula, H., Kosmatopoulos, E. & Ioannou, P. Collision avoidance analysis for lane changing and merging. IEEE Transactions On Vehicular Technology. 49, 22952308 (2000)
 [14] Zhou, M., Qu, X. & Jin, S. On the impact of cooperative autonomous vehicles in improving freeway merging: a modified intelligent driver modelbased approach. IEEE Transactions On Intelligent Transportation Systems. 18, 14221428 (2016)
 [15] Yu, H., Tak S., Park, M., & Yeo, H. Impact of autonomousvehicleonly lanes in mixed traffic conditions. Transportation research record. 2673, 430439 (2019)
 [16] Erdogan, S., Yilmaz, I., Baybura, T., & Gullu, M. Geographical information systems aided traffic accident analysis system case study: city of Afyonkarahisar. Accident Analysis & Prevention, 40(1), pp. 174181 (2008)
 [17] RiosTorres, J. & Malikopoulos, A. Automated and cooperative vehicle merging at highway onramps. IEEE Transactions On Intelligent Transportation Systems. 18, 780789 (2016)
 [18] Drew, D.S. Multiagent systems for search and rescue applications. Current Robotics Reports. pp. 112 (2021)
 [19] Dahiya, A., Akbarzadeh, N., Mahajan, A., & Smith, S.L. Scalable operator allocation for multirobot assistance: a restless bandit approach. IEEE Transactions on Control of Network Systems. pp. 1 (2021)
 [20] Cummings, M., Bruni, S., Mercier, S. & Mitchell, P. Automation architecture for single operator, multiple UAV command and control. (Massachusetts Inst Of Tech Cambridge, 2007)
 [21] Cummings, M., Nehme, C., Crandall, J. & Mitchell, P. Predicting operator capacity for supervisory control of multiple UAVs. Innovations In Intelligent Machines1. pp. 1137 (2007)
 [22] Kidwell, B., Calhoun, G., Ruff, H. & Parasuraman, R. Adaptable and adaptive automation for supervisory control of multiple autonomous vehicles. Proceedings Of The Human Factors And Ergonomics Society Annual Meeting. 56, 428432 (2012)
 [23] Humann, J. & Pollard, K. Human factors in the scalability of multirobot operation: A review and simulation. 2019 IEEE International Conference On Systems, Man And Cybernetics (SMC). pp. 700707 (2019)
 [24] ConesaMuñoz, J., Soto, M., Santos, P. & Ribeiro, A. Distributed multilevel supervision to effectively monitor the operations of a fleet of autonomous vehicles in agricultural tasks. Sensors. 15, 54025428 (2015)
 [25] Olsen, D. & Goodrich, M. Metrics for evaluating humanrobot interactions. Proceedings Of PERMIS. 2003 pp. 4 (2003)
 [26] Olsen Jr, D. & Wood, S. Fanout: Measuring human control of multiple robots. Proceedings Of The SIGCHI Conference On Human Factors In Computing Systems. pp. 231238 (2004)
 [27] Swamy, G., Reddy, S., Levine, S. & Dragan, A. Scaled autonomy: enabling human operators to control robot fleets. 2020 IEEE International Conference On Robotics And Automation (ICRA). pp. 59425948 (2020)
 [28] Hickert, C. & Wu, C. Scalability of safe supervision of autonomous vehicles in mixed traffic. 2022 IEEE International Conference On Robotics And Automation (ICRA) (in submission). (2022)
 [29] Rogers, R. & Monsell, S. Costs of a predictible switch between simple cognitive tasks.. Journal Of Experimental Psychology: General. 124, 207 (1995)
 [30] Haring, K., Ragni, M. & Konieczny, L. A cognitive model of drivers attention. Nele Rußwinkel— Uwe Drewitz— Hedderik Van Rijn (eds.). pp. 275 (2012)
 [31] Althoff, M., Frehse, G. & Girard, A. Set Propagation Techniques for Reachability Analysis. Annual Review Of Control, Robotics, And Autonomous Systems. 4 (2020)
 [32] Kurzhanski, A. & Varaiya, P. Ellipsoidal techniques for reachability analysis. International Workshop On Hybrid Systems: Computation And Control. pp. 202214 (2000)
 [33] Liebenwein, L., Baykal, C., Gilitschenski, I., Karaman, S. & Rus, D. Samplingbased approximation algorithms for reachability analysis with provable guarantees. (2018)
 [34] Bansal, S., Chen, M., Herbert, S. & Tomlin, C. HamiltonJacobi reachability: A brief overview and recent advances. 2017 IEEE 56th Annual Conference On Decision And Control (CDC). pp. 22422253 (2017)
 [35] Bahati, G., Gibson, M. & Bayen, A. MultiAdversarial Safety Analysis for Autonomous Vehicles. (2020)
 [36] Leung, K., Schmerling, E., Zhang, M., Chen, M., Talbot, J., Gerdes, J. & Pavone, M. On infusing reachabilitybased safety assurance within planning frameworks for human–robot vehicle interactions. The International Journal Of Robotics Research. 39, 13261345 (2020)
 [37] Saunders, W., Sastry, G., Stuhlmueller, A. & Evans, O. Trial without error: Towards safe reinforcement learning via human intervention. ArXiv Preprint ArXiv:1707.05173. (2017)
 [38] Berkenkamp, F., Turchetta, M., Schoellig, A. & Krause, A. Safe modelbased reinforcement learning with stability guarantees. ArXiv Preprint ArXiv:1705.08551. (2017)
 [39] Chow, Y., Nachum, O., DuenezGuzman, E. & Ghavamzadeh, M. A lyapunovbased approach to safe reinforcement learning. ArXiv Preprint ArXiv:1805.07708. (2018)
 [40] Cahill, A. Catastrophic forgetting in reinforcementlearning environments. (University of Otago,2011)

 [41] Cobbe, K., Klimov, O., Hesse, C., Kim, T. & Schulman, J. Quantifying generalization in reinforcement learning. International Conference On Machine Learning. pp. 12821289 (2019)
 [42] McGuckin, N. & Fucci, A. Summary of travel trends: 2017 national household travel survey. (US Department of Transportation, Federal Highway Administration,2018)
 [43] U.S. DoT Federal Highway Administration Average Annual PMT, VMT Person Trips and Trip Length by Trip Purpose. (U.S. Bureau of Transportation Statistics,2018), https://www.bts.gov/content/averageannualpmtvmtpersontripsandtriplengthtrippurpose
 [44] U.S. DoT Federal Highway Administration & Federal Transit Administration 2015 Status of the Nation’s Highways, Bridges, and Transit Conditions & Performance Report to Congress. (Government Printing Office,2016)
 [45] Johns, M., Mok, B., Sirkin, D., Gowda, N., Smith, C., Talamonti, W. & Ju, W. Exploring shared control in automated driving. 2016 11th ACM/IEEE International Conference On HumanRobot Interaction (HRI). pp. 9198 (2016)
 [46] Sugiyama, Y., Fukui, M., Kikuchi, M., Hasebe, K., Nakayama, A., Nishinari, K., Tadaki, S. & Yukawa, S. Traffic jams without bottlenecks—experimental evidence for the physical mechanism of the formation of a jam. New Journal Of Physics. 10, 033001 (2008)
 [47] Treiber, M., Hennecke, A. & Helbing, D. Congested traffic states in empirical observations and microscopic simulations. Physical Review E. 62, 1805 (2000)
 [48] Treiber, M. & Kesting, A. Traffic flow dynamics. Traffic Flow Dynamics: Data, Models And Simulation, SpringerVerlag Berlin Heidelberg. (2013)
 [49] Gross, D. Fundamentals of queueing theory. John Wiley & Sons. (2008)
 [50] Grimmett, G. R., and Stirzaker, D. R.. Probability and random processes. Oxford university press, 2020.
 [51] Lopez, P., Behrisch, M., BiekerWalz, L., Erdmann, J., Flötteröd, Y., Hilbrich, R., Lücken, L., Rummel, J., Wagner, P. & Wießner, E. Microscopic Traffic Simulation using SUMO. The 21st IEEE International Conference On Intelligent Transportation Systems. (2018), https://elib.dlr.de/124092/
 [52] Schulman, J., Levine, S., Abbeel, P., Jordan, M. & Moritz, P. Trust region policy optimization. International Conference On Machine Learning. pp. 18891897 (2015)