SafetyNet: Safe planning for real-world self-driving vehicles using machine-learned policies

09/28/2021 ∙ by Matt Vitelli, et al. ∙ 0

In this paper we present the first safe system for full control of self-driving vehicles trained from human demonstrations and deployed in challenging, real-world, urban environments. Current industry-standard solutions use rule-based systems for planning. Although they perform reasonably well in common scenarios, the engineering complexity renders this approach incompatible with human-level performance. On the other hand, the performance of machine-learned (ML) planning solutions can be improved by simply adding more exemplar data. However, ML methods cannot offer safety guarantees and sometimes behave unpredictably. To combat this, our approach uses a simple yet effective rule-based fallback layer that performs sanity checks on an ML planner's decisions (e.g. avoiding collision, assuring physical feasibility). This allows us to leverage ML to handle complex situations while still assuring the safety, reducing ML planner-only collisions by 95 on 300 hours of expert driving demonstrations using imitation learning and deploy it along with the fallback layer in downtown San Francisco, where it takes complete control of a real vehicle and navigates a wide variety of challenging urban driving scenarios.



There are no comments yet.


page 1

page 5

page 6

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Self-Driving Vehicles (SDVs) have the promise to revolutionize several industries including people and goods transportation. However, the development of L4+ SDVs has proved to be a significant challenge. Today, the main bottleneck is the vehicle’s ability to safely handle the ‘long tail’ of driving events [17]. World-class SDVs can handle common situations, but can behave unsafely in the many, rarely-occurring scenarios that are encountered on the road.

In the self-driving stack, the planning module is most responsible for this bottleneck. It determines what the SDV should do in any given situation. A traditional rule-based

planning approach selects a trajectory that minimizes a hand-engineered loss function. In order to improve its performance, engineers must design new terms in that loss function or re-tune their respective weights, for each driving scenario. This process is expensive and scales poorly to new geographies. Unlike perception, planning has benefited little from modern machine learning techniques, which leverage large quantities of data in order to avoid the hand-engineering of rules.

Fig. 1: Top: SafetyNet is the best-of-both combination of ML planning that improves with data, and rule-based safety and legality guarantees from the fallback layer. Bottom: an example of SafetyNet deployed to control a real-world self-driving vehicle on the streets of San Francisco.

Recently, the work of [5] has demonstrated the first machine-learned policies for autonomous driving learned directly from human demonstrations. These approaches, although they scale much better than the hand-engineering method, do not provide the interpretability and safety guarantees required to safely deploy these systems in production.

In this work we propose SafetyNet: the first autonomous driving system to combine the strengths of an ML planner with the interpretable safety of a rule-based system, road-tested in busy San Francisco. The ML component is a high-capacity planning policy trained from expert demonstrations, and its performance scales with the amount of training data without the need for costly behavior engineering. To improve system safety, decisions of the ML planner pass through a lightweight fallback layer: a simple, rule-based system that tests the decisions against a small set of checks, and can minimally modify them to improve safety if required. This allows SafetyNet to transparently enforce safety and legality constraints, such as avoiding collisions, road rules violations, while simultaneously maximizing comfort metrics.

This combination outperforms ML-only systems and allows us to safely deploy an ML planning system in the busy streets of San Francisco, constituting the first demonstration of its kind. Our system exhibits a variety of maneuvers such as lane-following, keeping the distance to other vehicles, and navigating intersections without impeding the safety of the vehicle and other traffic participants.

Our contributions are three-fold:

  1. The first combination of machine learning and a lightweight hand-engineered system to control a self-driving vehicle that learns from data while offering safety and legality guarantees.

  2. The first evaluation of such a system in the challenging, real-world, urban environment of downtown San Francisco.

  3. The source code will be provided to the public to encourage the advancement of the field111The source code for the ML planner and a reference implementation of the fallback layer will be available at

Ii Related works

Fig. 2:

The neural network architecture of the presented ML planning model is inspired by VectorNet


. The vectorized information on each agent and map element is encoded by a PointNet network. This local information is combined by a Transformer into global embedding. The embedding is later translated into actions via kinematic decoder.

Trajectory optimization-based planning. Traditional trajectory optimization-based planning systems are widely used in both academia and industry[23, 8, 24, 14]. Here the motion planning task is formulated as an optimization problem, usually by hand-engineering a cost function. The optimal trajectory is then generated by minimizing this cost using optimization algorithms, such as A* search-based methods [8, 1], sampling-based methods [18, 19, 4], dynamic programming [14] or combinations thereof that decompose the problem in a hierarchical fashion [34].

However, it is very difficult to hand-craft an objective function that provides a human-like trade-off between comfort, safety, and route progress over a wide variety of driving situations[10]. In comparison, encoding certain hard constraints, e.g. obstacles avoidance, physical feasibility, requires far less engineering. Therefore, building a lightweight hand-engineered system whose only purpose is to detect and correct infeasible trajectories is much simpler and more scalable.

Additionally, these hand-engineered approaches do not improve with data, and their performance does not generalize well in highly unstructured urban scenarios. Therefore, tremendous engineering efforts are needed to fine-tune them, in particular when expanding to a new operational design domain (ODD) or new geographies.

Machine-learned planning.

Recently, ML planning has gained attention due to successes in deep learning. This approach has the advantage of avoiding hand-crafted rules and scales well with data, thus performing better and better as more data is used for training. Therefore, this approach has great potential to handle a wide variety of driving situations

[5, 7, 37, 16]. Next, we introduce the two most commonly used ML paradigms for motion planning.

(1) Imitation learning (IL).

IL is a supervised learning approach in which a model is trained to mimic expert behavior. The first application of IL to autonomous driving was the seminal ALVINN

[25] back in 1989, which mapped the sensor data to steering and performed rural road following. More recently, [7, 16] demonstrated end-to-end driving using multiple-camera input alone, but the real-world driving results are limited to simple tasks such as lane follow or urban driving with light traffic. ChauffeurNet[5] proposed to apply IL on a bird’s eye view of a scene and use synthetic perturbations to alleviate the covariate shift problem [30], but it is yet to be tested in real-world urban environments.

(2) RL & IRL.Reinforcement learning (RL) is well-suited for sequential decision processes such as self-driving as it handles the interaction between the agent and the environment. Several methods[31, 29, 20] have been proposed to apply RL to autonomous driving. In particular [31] proposes combining learned and rule-based components, similarly to us, but the reported results are only from simulation. On the other hand, inverse reinforcement learning (IRL)[38, 36] is another popular ML paradigm applied to autonomous driving, which infers the underlying reward function based on expert demonstrations as well as a model of the environment. However, all these methods are yet to be evaluated in real-world urban driving.

The ML planning approaches introduced above, although very promising, do not provide safety guarantees, which prevent them to be deployed at scale in the real world. We are inspired by this paradigm but aim to mitigate this limitation by the SafetyNet proposed in this paper.

Hybrid approaches. The combination of ML and traditional motion planning techniques falls mostly into two categories:

ML-based heuristics

, which are leveraged to improve traditional planning algorithms, e.g. in terms of speed up [28, 3, 11, 26]. Modular approaches, where expert planners are leveraged to generate the trajectory candidates, e.g., by evaluating trajectories against a ML-based cost volume [37, 9]. The latter of these works, [9] also provides safety guarantees based on imposing a very high cost on trajectories leading to a potential collision. These safety guarantees were not however verified in the real world.

Another specific area of research that has emerged in this field is the study of safety frameworks [32, 2]. While this work is relevant, our goal is not to propose a comprehensive framework for safety, but rather a simple yet effective method that allows for the deployment of a powerful neural network planner that learns and improves with data while ensuring certain safety and legality constraints.

SafetyNet leverages the strengths of the expert system to guarantee certain determinism, legality, and safety rules for specific scenarios while relying on the machine-learned motion planner for nominal trajectory generation.

Iii Hybrid ML planning system

In this section we describe SafetyNet, our system for combining a machine-learned motion planner with an effective fallback layer to deliver a trajectory planning system for SDVs. Instead of relying on hand-engineered driving rules, this system learns to drive from expert driving demonstrations while guaranteeing certain interpretable safety constraints, such as avoidance of collisions and adherence to traffic rules.

The SafetyNet system is outlined in Fig. 1. It is composed of an ML neural policy network (“ML Planner”) that takes as input the environment state around the SDV and produces an intended trajectory to be taken by SDV. This trajectory is validated to obey the required constraints and in the case it does not satisfy them the closest trajectory is taken from a set of safe trajectory candidates.

Iii-a Input and Output

Input representation. The input data is encoded in an ego-centric frame of reference where the SDV is always at a fixed location relative to a frame. As shown in Fig. 2, the input to our model consists of:

  1. SDV: the current and past poses of the SDV and its size.

  2. Agents: the current and past poses of perceived agents, their sizes, and object type (e.g. vehicle, pedestrian, cyclist) produced by the SDV’s perception system.

  3. Static map elements: road network from High Definition (HD) maps including lanes, cross-walks, stop lines, localized using the SDV’s localization system.

  4. Dynamic map elements: traffic light states, and static obstacles detected by the perception system (e.g. construction zones).

  5. Route: the intended global route that the SDV should follow.

We use a vectorized input representation based on [15] to encode the perception outputs and map elements to vector sets. Each element includes a pose relative to the SDV pose, as well as additional features, such as the element type, time of observation.

Output representation. We define a trajectory as a sequence of discrete states separated uniformly in time by . Each state is defined as:


where correspond to the pose of the rear axle of the SDV w.r.t. a fixed coordinate frame at time and correspond to the velocity, longitudinal acceleration, curvature and jerk respectively.

Iii-B ML planner

Fig. 3: Safety (top) and comfort (bottom) events as a function of dataset size for the ML planner (with and without the fallback layer). We see that the addition of the fallback layer significantly improves every metric, save passiveness. We note that the number of collisions is not decreasing quickly as the ML planner is trained on more data, but despite this the fallback layer ensures a high level of safety.
Average Displacement Error [m]
Training set size @1s @2s @3s @4s
5h 0.24 0.90 1.70 2.52
50h 0.14 0.51 0.98 1.53
150h 0.12 0.40 0.77 1.22
300h 0.11 0.37 0.73 1.18
Table I: Open-loop performance trend with increasing training set size. The open-loop prediction error improves when trained with more data.

The ML planning component of our system takes the input capturing the states around the SDV and outputs the trajectory to be executed.

Model architecture. Inspired by [15], our model is built on a hierarchical graph network-based architecture as shown in Fig. 2. It consists of a PointNet-based [27] local subgraph for processing local information from vectorized inputs and a global graph using a Transformer encoder [33] for reasoning about interactions over agents and map features.

To ensure the predicted trajectories are physically feasible, we introduce a kinematic decoder, which models the vehicle kinematic using a unicycle model [24]

. First, a 3-layer multilayer perceptron (MLP) is added after the Transformer encoder, which predicts longitudinal jerk

and curvature for each time step within the prediction horizon . Then a kinematic layer takes the predictions as well as the current ego state to roll out the next state of the ego:


where is the update function of the kinematic model, comprises a set of parameters for vehicle kinematic constraint, including the maximum allowed jerk, acceleration, curvature and steering angle, which are used to clip controls to ensure physical feasibility.

Inspired by the deep kinematic model introduced in  [12], we implement as follows:


where the state derivatives are computed as follows:


Training framework. We use imitation learning to train a driving policy that mimics expert driving behavior by minimizing the L1 loss between the poses generated by the model and the ground truth poses. Following [5], we include perturbations to extend the distribution of states seen during the training and thus reduce the impact of the covariate shift [25, 30]. Although previous work used a pre-solver to smooth the target trajectory after applying perturbations, we can skip that thanks to the fact that we are using a kinematic decoder. Instead, we can simply penalize large values of jerk and curvature to reduce jerk and improve driving comfort. The final loss is then:


where is the predicted pose at time , is the target pose, and and

are hyperparameters.

Iii-C Fallback Layer

After generating an ML trajectory, our system evaluates it along several dimensions for dynamic feasibility, legality, and collision probability, and determines a trajectory label {

Feasible, Infeasible}. We describe these next.

Dynamic feasibility. We evaluate whether the input trajectory remains within a feasible envelope characterized by the SDV’s dynamics limits. Concretely, we evaluate each trajectory state and check whether the parameters, including longitudinal jerk, longitudinal acceleration, curvature, curvature rate, lateral acceleration, and steering jerk (curvature rate velocity) are within reasonable bounds.

The bounds for those parameters were obtained from real-world vehicle testing. In practice, we typically use more conservative limits for jerk, longitudinal acceleration, and lateral acceleration to remain within comfortable limits.

Legality. For a given trajectory, we evaluate whether it is violating the traffic rules. A trajectory will be labeled as Infeasible if any of the following violations happens:

  • Running the stop sign,

  • Violation of the right of way,

  • Running a red traffic light,

  • Leaving the drivable surface.

Collision likelihood. We check each state in the given trajectory for collisions with predicted poses of other agents from an in-house prediction module. Collision detection is performed by rasterizing future agent predictions and checking for overlaps with planned ego poses. Additionally, we also check for longitudinal distance, time-to-collision, and time headway violations along the trajectory. If any of the collision likelihood checks fail, the trajectory is labeled as Infeasible.

Fallback trajectory generation. Assuming the ML trajectory is labeled as Feasible, we will directly execute it. If the trajectory is labeled as Infeasible, we select a feasible fallback trajectory as close as possible to the ML trajectory.

For this we use a trajectory generation method based on [35], generating a number of lane-aligned trajectory candidates . These candidates consist of speed keeping, distance keeping, and emergency stopping maneuvers. Our implementation can be easily adapted to specific scenarios of interest.

Each of the generated trajectories is checked for feasibility as described above and the trajectory candidate which is most similar to the ML trajectory is selected for execution:


Iv Experiments

# events per 1k miles
Planner Type Collisions Close Calls Discomfort Braking Passiveness Deviation from Route
ML Planner 91.5 74.5 312.8 213.9 0.0
ML Planner + Fallback Layer 4.6 45.0 24.2 277.0 0.0
Change Rate -95% -40% -92.3% +29.5% 0%
Table II: Comparing the performance of an ML-Planner-only approach vs SafetyNet, in closed-loop evaluation. This shows that SafetyNet significantly reduces collisions, close calls, and discomfort breaking, at the expense of more passiveness. See Fig. 5 for some qualitative examples.

In this section we evaluate our system across several dimensions: (a) the performance of the ML planner in simulation when trained on an increasing amount of data (no fallback layer), (b) the effectiveness of the fallback layer in simulation, and (c) the performance of SafetyNet in the real world when controlling a real vehicle in San Francisco. We start by describing the datasets and metrics used during the evaluation.

Iv-a Data

We created an in-house dataset to train and evaluate our system: 380 hours of urban driving collected in Palo Alto and San Francisco. It contains a wide range of driving scenarios in a densely populated urban environment. The dataset consists of 25 second scenes, which capture the perception output, the SDV trajectory, and HD maps. We partition the dataset into 300h for training and 80h for testing.

Iv-B Metrics

We validate our planner using large-scale real-world driving dataset with closed-loop simulation. During simulation, we execute the planner and motion controller modules and simulate the SDV motion. The vehicle model is calibrated to the real-world SDV. Since the simulated SDV pose can diverge significantly from the logged pose in the dataset, we allow the other road agents to be reactive in their longitudinal behavior, avoiding collisions while preserving their trajectories from the dataset.

To evaluate the closed-loop performance of the model, for each scene in the test set we simulate the SDV by unrolling the driving policy for the full duration of the scene and detect the following binary events:

  1. Collisions: the simulated SDV is 5cm from the road boundaries, static obstacles, or agents.

  2. Close-calls: the simulated SDV has no collision, but either gets within 25cm of another agent, has a time-to-collision 1.5s, or has a time headway to another agent 1s.

  3. Discomfort braking: the simulated SDV’s jerk drops below m/s.

  4. Passiveness: the simulated SDV travels slower than its behavior in the dataset by 5m/s and it is spatially behind its dataset position.

  5. Off road events: the simulated SDV deviates from the dataset route center line by 10m.

After identifying the events in each scene, we aggregate them, normalizing by the number of miles driven in the simulations. All metrics are reported as the total number of events per 1000 miles.

Additionally, to evaluate the open-loop performance, we compute the Average Displacement Error (ADE) between the position of the simulated SDV and the dataset position. This gives us insight into trajectory similarity between simulation and the dataset (see Table I).

Finally, we test how the system performs in the real world by deploying it in downtown San Francisco. We evaluate the ML planner performance via how often the fallback trajectory is used and we provide qualitative examples.

Iv-C Effect of dataset size

For this experiment, the fallback layer is disabled, and the ML planner’s trajectory is always executed. We evaluate the performance of various ML planners trained with varying dataset sizes: {5, 50, 150, 300} hours in simulation.

Looking at the dashed lines in Fig. 3 (w/o fallback) we observe that the closed-loop performance across all safety and comfort metrics increases with training set size, and that more data should continue improving performance, albeit slowly. Models trained on 50h produce unstable driving policies that have significantly more collisions, off-route events, or even cases where the SDV remains completely still (passiveness). When dataset size is increased to 300 hours, performance improves significantly and the learned driving policy is able to reduce collisions, close-call and discomfort brakes, and reliably follows the route. In terms of passiveness, performance plateaus, which we attribute to causal confusion inherent to open-loop training [13]. Similarly, the average displacement error (ADE) of the open-loop predictions keeps improving when trained with more data, as shown in Table I.

Iv-D Effect of fallback layer

Fig. 4: The distribution of causes for a fallback trajectory to be used over the ML Planner trajectory.
Fig. 5: Examples of when the fallback layer prevented unsafe behavior caused by the ML planner. [Top] avoiding collision with a bus, [middle] running a red light, and [bottom] ensuring yielding to vehicle with right-of-way. We report the unsafe trajectory in column (b), and SafetyNet safe trajectory in column (c).

Now we evaluate the effectiveness of the fallback layer in simulation by comparing the SafetyNet performance with, and without the fallback layer enabled. Here the ML planner is trained on the full 300h dataset. As shown in Tab. II, collisions are reduced by 95%, close call events by 40%, and discomfort braking by 92% when the fallback layer is enabled. This clearly demonstrates the value of the fallback layer and its importance for real world deployment. Crucially, we see that the solid lines in Fig. 3 (w/ fallback) are close to zero regardless of the maturity of the ML planner. This indicates that, with our method, we can safely deploy not only well-performing ML planners, but even immature ones (trained on 50h of data), thus facilitating faster development and evaluation cycle.

We note that there is also a 29.5% increase in passive behavior when the fallback layer is enabled, due to the fact that the SDV drives more conservatively. We see this as a necessary trade-off for reducing collisions and close calls.

In Fig. 4 we show the distribution of events that trigger the fallback trajectory to be used during simulation. The main causes for fallback behaviors are the ML trajectory leaving the drivable surface, contacting dynamic agent predictions, infeasible distance gap with other agents, infeasible steering jerk, and contact with static obstacles. By manually triaging these ML planner failure cases, we see (a) contact lane boundary issues are a result of the ML planner cutting corners too closely, and (b) collisions are caused by the network not paying enough attention to the size of the large vehicles and assuming that vehicles are of similar size (Fig. 5, top). Moreover, near traffic light intersections the ML planner occasionally generates trajectories that do not properly yield to oncoming traffic (Fig. 5, bottom). It is caused by the synthetic perturbations applied to the ego poses during training. Note that in all these examples, the fallback layer successfully prevented collisions from happening.

Iv-E Failure cases

There is still a small proportion of the ML planner failures that are not caught by SafetyNet. This is mostly because the current fallback layer implementation does not limit the proximity of the SDV to lane boundaries, and does not fully compensate for the uncertainty in the prediction of behavior of other agents. In Fig. 6 we see that the most recent feasible ML trajectory has brought the SDV close to the lane boundary and simultaneously the oncoming vehicle has a sudden change of velocity. With this combination of factors, the fallback trajectory fails to assure a comfortable lateral distance from the oncoming vehicle.

Fig. 6: A SafetyNet failure case in simulation. Two simultaneous situations (SDV driving close to dividing line, and a sudden change in velocity of the oncoming vehicle) combine to reveal an issue in the implementation of the system.
Fig. 7: Examples of successful ML planner behavior (no fallback trajectory was used) in the real world. Top: merging, middle: yielding to pedestrians, and bottom: nudging around a parked car.

Iv-F Real world testing

Finally, we extensively tested SafetyNet with ML planner (trained on 300 hours), in the real world, in densely populated downtown San Francisco, under the supervision of human safety drivers. During the 150+ mile public road testing, the model successfully performed a wide variety of challenging maneuvers including lane-following, merging, yielding to pedestrians or nudging around parked cars (Fig. 7). At the same time, the fallback layer assured the safety of the overall system, taking control for around 7.9% of the total driving time. This confirms our hypothesis that although the ML planner is able to perform complex maneuvers and drive safely for most of the time (90%), an additional layer of safety is required in certain situations. We refer the reader to the accompanying video for a more thorough analysis.

V Conclusions

We present SafetyNet, a method for combining ML planners with a rule-based system fallback layer to provide safe driving in challenging, real-world urban environments. We have demonstrated the very significant improvements in safety and comfort metrics compared to a purely machine-learning-based system, both in simulation as well as in challenging San Francisco streets. This approach makes it possible to safely use learned planners in the real world, and benefit from their ability to improve with the data and handle more complex situations than their purely rule-based counterparts. We believe that teams starting to explore ML planning methods, and even those with state-of-the-art ML planning solutions [5, 16, 37] would benefit from incorporating a SafetyNet system. In effect, SafetyNet may facilitate the development of machine-learning-based planners and their wider adoption in the AV industry.

We see many exciting opportunities for further development. The fallback layer can be refined to be less conservative and not increase passiveness. In terms of the ML planner, the presented approach is relatively simple, based on imitation learning. It can be improved by drawing from recent advancements in model-based Reinforcement Learning (RL) [22], offline RL [21], or closed-loop training in a data-driven simulation [6].

Vi Acknowledgements

We would like to thank following contributors who work on Autonomy 2.0 at Level 5: Luca Bergamini, Sergey Zagoruyko, Ana Ferreira, Oliver Scheel, Stefano Pini, Christian Perone, Lukas Platinsky, Jasper Friedrichs, Jared Wood, Yilun Chen and Alex Ozark.


  • [1] Z. Ajanovic, B. Lacevic, B. Shyrokau, M. Stolz, and M. Horn (2018) Search-based optimal motion planning for automated driving. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4523–4530. Cited by: §II.
  • [2] M. Althoff and J. M. Dolan (2014) Online verification of automated road vehicles using reachability analysis. IEEE Transactions on Robotics 30 (4), pp. 903–918. External Links: Document Cited by: §II.
  • [3] O. Arslan and P. Tsiotras (2015) Machine learning guided exploration for sampling-based motion planning algorithms. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vol. , pp. 2646–2652. External Links: Document Cited by: §II.
  • [4] T. Bandyopadhyay, K. S. Won, E. Frazzoli, D. Hsu, W. S. Lee, and D. Rus (2013) Intention-aware motion planning. In Algorithmic Foundations of Robotics X, E. Frazzoli, T. Lozano-Perez, N. Roy, and D. Rus (Eds.), Cited by: §II.
  • [5] M. Bansal, A. Krizhevsky, and A. Ogale (2019) ChauffeurNet: learning to drive by imitating the best and synthesizing the worsts. In Proceedings of Robotics: Science and Systems XV, Cited by: §I, §II, §II, §III-B, §V.
  • [6] L. Bergamini, Y. Ye, O. Scheel, L. Chen, C. Hu, L. Del Pero, B. Osinski, H. Grimmett, and P. Ondruska (2021) SimNet: learning reactive self-driving simulations from real-world observations. In International Conference on Robotics and Automation (ICRA), Cited by: §V.
  • [7] M. Bojarski, D. D. Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba (2016) End to end learning for self-driving cars. External Links: 1604.07316 Cited by: §II, §II.
  • [8] M. Buehler, K. Iagnemma, and S. Singh (2009) The darpa urban challenge: autonomous vehicles in city traffic. Springer. Cited by: §II.
  • [9] S. Casas, A. Sadat, and R. Urtasun (2021) MP3: a unified model to map, perceive, predict and plan. In

    IEEE/CVF International Conference on Computer Vision and Pattern Recognition

    pp. 14403–14412. Cited by: §II.
  • [10] Y. Chang, S. Omari, and M. S. Vitelli (2021-June 10) Systems and methods for determining vehicle trajectories directly from data indicative of human-driving behavior. Google Patents. Note: US Patent App. 16/706,307 Cited by: §II.
  • [11] H. L. Chiang, J. Hsu, M. Fiser, L. Tapia, and A. Faust (2019)

    RL-rrt: kinodynamic motion planning via learning reachability estimators from rl policies

    IEEE Robotics and Automation Letters 4 (4), pp. 4298–4305. External Links: Document Cited by: §II.
  • [12] H. Cui, T. Nguyen, F. Chou, T. Lin, J. Schneider, D. Bradley, and N. Djuric (2020) Deep kinematic models for kinematically feasible vehicle trajectory predictions. External Links: 1908.00219 Cited by: §III-B.
  • [13] P. de Haan, D. Jayaraman, and S. Levine (2019) Causal confusion in imitation learning. In Advances in Neural Information Processing Systems, Vol. 32, pp. . Cited by: §IV-C.
  • [14] H. Fan, F. Zhu, C. Liu, L. Zhang, L. Zhuang, D. Li, W. Zhu, J. Hu, H. Li, and Q. Kong (2018-07) Baidu apollo em motion planner. Cited by: §II.
  • [15] J. Gao, C. Sun, H. Zhao, Y. Shen, D. Anguelov, C. Li, and C. Schmid (2020-06) VectorNet: encoding HD maps and agent dynamics from vectorized representation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: Fig. 2, §III-A, §III-B.
  • [16] J. Hawke, R. Shen, C. Gurau, S. Sharma, D. Reda, N. Nikolov, P. Mazur, S. Micklethwaite, N. Griffiths, A. Shah, and A. Kendall (2020) Urban driving with conditional imitation learning. 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 251–257. Cited by: §II, §II, §V.
  • [17] A. Jain, L. D. Pero, H. Grimmett, and P. Ondruska (2021) Autonomy 2.0: why is self-driving always 5 years away?. External Links: 2107.08142 Cited by: §I.
  • [18] J. J. K. Jr. and S. M. Lavalle (2000) RRT-connect: an efficient approach to single-query path planning. In Int. Conf. on Robotics and Automation, Cited by: §II.
  • [19] S. Karaman and E. Frazzoli (2011) Sampling-based algorithms for optimal motion planning. The International Journal of Robotics Research 30 (7), pp. 846–894. Cited by: §II.
  • [20] A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J. Allen, V. Lam, A. Bewley, and A. Shah (2019) Learning to drive in a day. In International Conference on Robotics and Automation (ICRA), pp. 8248–8254. Cited by: §II.
  • [21] S. Levine, A. Kumar, G. Tucker, and J. Fu (2020) Offline reinforcement learning: tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643. Cited by: §V.
  • [22] T. M. Moerland, J. Broekens, and C. M. Jonker (2020) Model-based reinforcement learning: a survey. arXiv preprint arXiv:2006.16712. Cited by: §V.
  • [23] M. Montemerlo, J. Becker, S. Bhat, H. Dahlkamp, D. Dolgov, S. Ettinger, D. Hähnel, T. Hilden, G. Hoffmann, B. Huhnke, D. Johnston, S. Klumpp, D. Langer, A. Levandowski, J. Levinson, J. Marcil, D. Orenstein, J. Paefgen, I. Penny, and S. Thrun (2009) Junior: the stanford entry in the urban challenge.. pp. 91–123. Cited by: §II.
  • [24] B. Paden, M. Čáp, S. Z. Yong, D. Yershov, and E. Frazzoli (2016) A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Transactions on intelligent vehicles 1 (1), pp. 33–55. Cited by: §II, §III-B.
  • [25] D. A. Pomerleau (1989) ALVINN: an autonomous land vehicle in a neural network. In Advances in Neural Information Processing Systems, Vol. 1. Cited by: §II, §III-B.
  • [26] H. Pulver, F. Eiras, L. Carozza, M. Hawasly, S. Albrecht, and S. Ramamoorthy (2020) PILOT: efficient planning by imitation learning and optimisation for safe autonomous driving. arXiv preprint arXiv:2011.00509. Cited by: §II.
  • [27] C. R. Qi, H. Su, K. Mo, and L. J. Guibas (2016) PointNet: deep learning on point sets for 3d classification and segmentation. arXiv preprint arXiv:1612.00593. Cited by: §III-B.
  • [28] A. H. Qureshi, A. Simeonov, M. J. Bency, and M. C. Yip (2019) Motion planning networks. In 2019 International Conference on Robotics and Automation (ICRA), Vol. , pp. 2118–2124. External Links: Document Cited by: §II.
  • [29] M. Riedmiller, M. Montemerlo, and H. Dahlkamp (2007) Learning to drive a real car in 20 minutes. In 2007 Frontiers in the Convergence of Bioscience and Information Technologies, pp. 645–650. Cited by: §II.
  • [30] S. Ross, G. J. Gordon, and J. Bagnell (2011) A reduction of imitation learning and structured prediction to no-regret online learning. In

    International Conference on Artificial Intelligence and Statistics

    Cited by: §II, §III-B.
  • [31] S. Shalev-Shwartz, S. Shammah, and A. Shashua (2016) Safe, multi-agent, reinforcement learning for autonomous driving. arXiv preprint arXiv:1610.03295. Cited by: §II.
  • [32] S. Shalev-Shwartz, S. Shammah, and A. Shashua (2017) On a formal model of safe and scalable self-driving cars. CoRR abs/1708.06374. External Links: Link, 1708.06374 Cited by: §II.
  • [33] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30. Cited by: §III-B.
  • [34] J. Wei, J. M. Snider, T. Gu, J. M. Dolan, and B. Litkouhi (2014) A behavioral planning framework for autonomous driving. In 2014 IEEE Intelligent Vehicles Symposium Proceedings, pp. 458–464. Cited by: §II.
  • [35] M. Werling, S. Kammel, J. Ziegler, and L. Gröll (2012) Optimal trajectories for time-critical street scenarios using discretized terminal manifolds. The International Journal of Robotics Research 31 (3), pp. 346–359. Cited by: §III-C.
  • [36] M. Wulfmeier, P. Ondruska, and I. Posner (2015) Maximum entropy deep inverse reinforcement learning. arXiv: Learning. Cited by: §II.
  • [37] W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun (2019) End-to-end interpretable neural motion planner. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II, §II, §V.
  • [38] B. D. Ziebart, A. L. Maas, J. Bagnell, and A. Dey (2008) Maximum entropy inverse reinforcement learning. In AAAI, Cited by: §II.