Navigation by Imitation in a Pedestrian-Rich Environment

11/01/2018 ∙ by Jing Bi, et al. ∙ University of Rochester

Deep neural networks trained on demonstrations of human actions give robots the ability to drive autonomously on the road. However, navigation in a pedestrian-rich environment, such as a campus setting, is still challenging---one needs to intervene frequently and take control of the robot well before it commits a mistake. An arduous burden is, hence, placed on the learning framework design and data acquisition. In this paper, we propose a new learning-from-intervention Dataset Aggregation (DAgger) algorithm to overcome the limitations of applying imitation learning to navigation in pedestrian-rich environments. Our new learning algorithm implements an error backtrack function that is able to learn effectively from expert interventions. Combining our new learning algorithm with deep convolutional neural networks and a hierarchically-nested policy-selection mechanism, we show that our robot is able to map pixels directly to control commands and navigate successfully in the real world without explicitly modeling pedestrian behaviors or the world.


I Introduction

The autonomous ground vehicle industry has benefited greatly from recent successes in deep learning. Applications such as personal mobility services and luggage-carrying support in complex, pedestrian-rich environments are becoming feasible [1, 2]. Although walking in a crowded environment is not difficult for humans, it is a real challenge for autonomous vehicles, since the subtle social norms that pedestrians usually follow are hard to quantify [2]. Intuitively, employing imitation learning by mimicking human pedestrians is a promising approach to start with.

Imitation learning has been gaining importance as a promising learning approach for achieving high performance in various applications, e.g., humanoid robots. Traditional imitation learning is based on behavior cloning, where robots learn from human demonstration trajectories in a supervised learning fashion. The human behavior implicitly provides the learner with information about the states it might encounter and the actions it should take [3]. Despite its simplicity, many studies have shown it to be an efficient and high-performing strategy [4, 5, 6, 7].

However, learning solely from cloning the states and actions induced by human behaviors will likely lead to failures when the robot encounters states that were never visited in the human demonstrations. The Dataset Aggregation (DAgger) algorithm [8] was, therefore, introduced to alleviate this problem. DAgger describes an improved learning paradigm in which states are sampled from the learner, after a warm start with behavior cloning, and annotated with expert behavior when mistakes are encountered. These data are aggregated to iteratively train a better learner policy. DAgger is, hence, preferred by many researchers for imitation learning.

Fig. 1: We train a robot to navigate in a pedestrian-rich environment, where we consider four different scenarios (columns from left to right): no pedestrian (path following), confronting, pedestrian following, and crossing (avoid hitting).

In this paper, we consider applying imitation learning to real-world robot navigation in a pedestrian-rich environment. Figure 1 shows a snapshot of the navigation problems we encounter, where we roughly split all scenarios into four categories: no pedestrian (path following), confronting, pedestrian following, and crossing (avoid hitting). In this environment, we need to intervene frequently and take control of the robot well before it commits a mistake. In other words, we cannot allow the robot to execute actions generated by an imperfect policy, which might jeopardize the safety of pedestrians. Such requirements make it difficult to collect effective data trajectories for training with the DAgger algorithm.

To address the above limitation, we propose a new Learn-from-Intervention DAgger algorithm that implements an error backtrack function. The algorithm can learn effectively from pairs of data containing states sampled from the learner policy and the backtracked action losses. Compared with the traditional DAgger implementation, our framework allows the human expert to interact with the robot, which greatly raises the efficiency of the data aggregation process. Combining our new learning algorithm with deep Convolutional Neural Networks (CNNs), such as ResNet [9], we are able to map pixels directly to control commands without explicitly modeling pedestrian behaviors or the world. We implement this new imitation learning framework on a novel mobile robot platform, and experiments demonstrate promising performance in terms of navigation by imitation in a pedestrian-rich campus environment.

II Related Work

A common approach to navigation in pedestrian-rich environments is to treat pedestrians as dynamic obstacles and use a specific control strategy for robots to avoid collisions [10, 11, 12]. Initially, researchers had to build sophisticated models for pedestrians and manually define actions to avoid accidents based on the possible states the robot could enter. Unfortunately, it is extremely difficult to include all possible states in such an implementation due to the complexity of real-world environments. When the robot encounters a situation that does not exist in the model, it will likely fail to find a feasible action, which leads to the freezing-robot problem. Several attempts [2, 13] have been made to solve this problem, most of which focus on learning-based methods. Unlike previous hand-crafted approaches, their implementations essentially give the robot the ability to learn and emulate human behaviors. A better policy can be generated by the robot itself instead of being hard-coded by developers. However, an explicit model of pedestrians is required in their training process (which is not required in our approach). In our model, human operators can easily evaluate whether the robot's actions are acceptable and time-efficient while driving it in the real-world environment. By interactively adjusting the robot's behaviors, human operators expose their decision-making policies to it, from which the robot can derive its own policy by imitation without explicitly modeling the behavior of pedestrians.

Imitation learning is one form of learning from demonstration [14] and has been applied to a variety of tasks, including autonomous flight [15, 16, 17], learning navigational behavior [18, 19], off-road driving [20, 21], and road following [22, 23, 24, 25]. The main approaches to imitation learning can be categorized into behavior cloning (BC) and inverse reinforcement learning (IRL). Behavior cloning tackles the problem in a supervised manner by directly learning the mapping from states to the corresponding actions labeled by the expert policy. This learning algorithm relies on the presence of an expert who instructs the agent at each state, and the agent tries to mimic the corresponding action. In [26], Codevilla et al. use a CNN to directly map observations to steering angles, giving the CNN the ability to drive on roads under different conditions. In [27], imitation learning is applied to the navigation of a UAV in a forest without explicitly modeling the environment. Several studies [24, 20, 22] have been conducted on similar topics. Unlike our implementation, their approaches mainly focus on learning steering. We further expand the domain of tasks to include acceleration and braking, along with a new learning-from-intervention DAgger algorithm, which empowers our robot to adapt to a more dynamic environment. Instead of concentrating on lane following and static obstacle avoidance, we develop a framework that can handle crowded environments, including bypassing pedestrians.

III Method

In this paper, we study navigation in a pedestrian-rich environment and formulate it as an imitation learning problem. Instead of explicitly modeling pedestrian behaviors as in [13], we split all scenarios into four categories: {no pedestrian (path following), confronting, pedestrian following, crossing (avoid hitting)}. Each category has a task-specific policy that maps a state to a robot action; the latter is defined by speed and steering angle in a discrete space. A meta-controller decides which task to perform given the state. In our case, the states are observations obtained from cameras. To train a policy, we first initialize it via behavior cloning, which serves as a warm start; the policy is then iteratively refined by our proposed Learning-from-Intervention DAgger algorithm. Next, we introduce each technique in detail.

III-A Hierarchical Formalism

The meta-controller and the sub-task controllers are nested in a two-layer tree structure, where the meta-controller is the root and the four sub-task controllers serve as leaves. We learn the two levels of policies separately. We use CNNs to realize our policy mappings from states to tasks/actions. Despite being separate policies, they share the same base network structure, i.e., ResNet in our implementation; hence, the added computation is minor. At test time, given the observations at each time step, the high-level policy decides which task to select and the task policy generates an action for the robot.
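To make the structure concrete, the following is a minimal sketch of one possible realization, assuming a shared ResNet-18 trunk with one linear head per level; the trunk depth, head sizes, and weight sharing are our assumptions for illustration rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class HierarchicalPolicy(nn.Module):
    """Two-level policy sketch: a shared ResNet trunk feeds a meta head (task
    selection) and four sub-task heads (discrete speed/steering commands).
    Trunk depth, head sizes, and weight sharing are illustrative assumptions."""

    def __init__(self, num_tasks=4, num_actions=15):
        super().__init__()
        backbone = models.resnet18(weights=None)   # ResNet base; depth is illustrative
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # strip the ImageNet classifier
        self.trunk = backbone
        self.meta_head = nn.Linear(feat_dim, num_tasks)
        self.task_heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_actions) for _ in range(num_tasks)]
        )

    def forward(self, image):
        feats = self.trunk(image)                  # one forward pass reused by all heads
        task_logits = self.meta_head(feats)
        task_ids = task_logits.argmax(dim=1)       # meta-controller selects a sub-task
        action_logits = torch.stack(
            [self.task_heads[int(t)](feats[i]) for i, t in enumerate(task_ids)]
        )
        return task_logits, task_ids, action_logits
```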

III-B Hierarchical Behavior Cloning

Input: number of collection episodes $N$, horizon $T$
Output: initialized policies $\pi_{meta}$, $\{\pi_g\}$

1: Initialize data buffers $D_{meta}$ and $D_g$ for all sub-tasks $g$
2: Initialize policies $\pi_{meta}$, $\{\pi_g\}$
3: repeat
4:     Get a new environment instance with start state $s_0$
5:     for $t = 1, \dots, T$ do
6:         Get an updated state $s_t$
7:         Append $(s_t, g_t^*)$ to $D_{meta}$
8:         Append $(s_t, a_t^*)$ to $D_{g_t^*}$
9: until end of collecting
10: Train meta-controller $\pi_{meta} \leftarrow \mathrm{Train}(D_{meta})$
11: Train sub-policies $\pi_g \leftarrow \mathrm{Train}(D_g)$ for all $g$
Algorithm 1: Hierarchical Behavior Cloning

We begin by describing the standard imitation learning setup and then proceed to our DAgger algorithms. Consider a robot that interacts with the environment over discrete time steps. At each time step $t$, the robot receives an observation $s_t$ as its state representation and takes an action $a_t = \pi(s_t)$. The basic idea behind imitation learning is to learn controller policies that mimic an expert. Since the expert is successful at performing the task of interest, a robot trained to mimic the expert's actions will also perform the task well.

Let us define the expert knowledge of the meta-controller as $\pi^*_{meta}$ and that of the sub-task controllers as $\pi^*_g$. Without explicitly modeling the behavior of pedestrians, we drive our robot platform in the real world to collect the expert data. We use the hierarchical behavior cloning algorithm (see Algorithm 1) to initialize our controller policies. The algorithm collects separate state-task and state-action pairs from expert driving data to form the datasets $D_{meta}$ and $D_g$, which are used to train the meta-controller policy and the sub-task controller policies, respectively.
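For clarity, the data collection and training of Algorithm 1 can be summarized in the following sketch; `env`, `expert`, and `train_fn` are hypothetical interfaces introduced only for illustration.

```python
def hierarchical_behavior_cloning(env, expert, num_episodes, horizon, train_fn, num_tasks=4):
    """Sketch of Algorithm 1: record the expert's task choices and actions,
    then fit the meta-controller and the sub-task controllers separately.
    env, expert, and train_fn are assumed placeholder interfaces."""
    D_meta = []                                    # (state, expert task) pairs
    D_sub = {g: [] for g in range(num_tasks)}      # per-task (state, expert action) pairs

    for _ in range(num_episodes):
        state = env.reset()                        # new environment instance, start state
        for _ in range(horizon):
            task = expert.select_task(state)       # expert's high-level choice g*
            action = expert.act(state, task)       # expert's low-level command a*
            D_meta.append((state, task))
            D_sub[task].append((state, action))
            state = env.step(action)               # the expert drives; the learner only observes

    pi_meta = train_fn(D_meta)                     # supervised training of the meta-controller
    pi_sub = {g: train_fn(data) for g, data in D_sub.items()}
    return pi_meta, pi_sub
```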

III-C Hierarchical Learn-from-Mistake DAgger

III-C1 DAgger

Using only behavior cloning, the best policy we can get is learned from the distribution of states generated by the expert, not by the robot itself. Such a policy will likely fail when the robot encounters a state that was never visited by the expert. The Dataset Aggregation (DAgger) algorithm was introduced in [8] to alleviate this problem. In its simplest form, DAgger works iteratively as follows. At the outset, we collect data $D_0 = \{(s^{\pi^*}, a^*)\}$ and initialize policy $\pi_1$ by behavior cloning the expert demonstrations; here $s^{\pi^*}$ denotes that the states are sampled from the expert. Subsequently, at iteration $i$, we collect data $D_i = \{(s^{\pi_i}, a^*)\}$, where the states are sampled from the previously-trained learner with policy $\pi_i$. By merging current and previous data, $D_{0:i} = D_{0:i-1} \cup D_i$ is used to train the new policy $\pi_{i+1}$. Intuitively, at each iteration, we collect a set of states that the learner is likely to visit based on previous experience, and obtain corrective actions from the expert.
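The procedure can be summarized in a few lines of Python; `rollout`, `expert_action`, and `train_fn` are hypothetical helpers introduced only for this sketch.

```python
def dagger(env, expert_action, rollout, train_fn, num_iters):
    """Plain (non-hierarchical) DAgger sketch: iteration 0 clones the expert;
    later iterations visit states under the current learner and relabel them
    with expert actions, aggregating everything into one growing dataset."""
    expert_states = rollout(env, expert_action)                 # states visited by the expert
    D = [(s, expert_action(s)) for s in expert_states]          # D_0
    policy = train_fn(D)                                        # pi_1 via behavior cloning
    for _ in range(1, num_iters):
        learner_states = rollout(env, policy)                   # states visited under pi_i
        D += [(s, expert_action(s)) for s in learner_states]    # expert relabels; aggregate
        policy = train_fn(D)                                    # train pi_{i+1} on D_{0:i}
    return policy
```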

Input: queue length $L$, horizon $T$, and policies $\pi_{meta}$, $\{\pi_g\}$
Output: updated policies $\pi_{meta}$, $\{\pi_g\}$

1: Initialize data buffers $D_{meta}$ and $D_g$ for all sub-tasks $g$
2: repeat
3:     Get a new environment instance with start state $s_0$
4:     Initialize data queue $Q$ of fixed length $L$
5:     for $t = 1, \dots, T$ do
6:         Get an updated state $s_t$
7:         if the meta-controller agrees with the expert's task selection, $\pi_{meta}(s_t) = g_t^*$, then
8:             if no intervention then
9:                 Execute $a_t = \pi_{g_t^*}(s_t)$
10:                if $|Q| = L$ then pop the oldest entry
11:                Append $(s_t, a_t)$ to $Q$
12:            else
13:                Execute the expert action $a_t^*$
14:                Backtrack the errors of the pairs in $Q$
15:                Append the modified $Q$ to $D_{g_t^*}$
16:                Go to line 3
17:        else
18:            Append $(s_t, g_t^*)$ to $D_{meta}$
19: until end of collecting
20: Update meta-controller $\pi_{meta} \leftarrow \mathrm{Train}(D_{meta})$
21: Update sub-policies $\pi_g \leftarrow \mathrm{Train}(D_g)$ for all $g$
Algorithm 2: Hierarchical Learn-from-Intervention DAgger

III-C2 Learn-from-Intervention

DAgger works fine for learning policies to play video games; however, adapting it to the real world is difficult, since we cannot allow the robot to execute actions generated by an imperfect policy, which might jeopardize the safety of pedestrians on the road. It is also impractical to run the robot in a pedestrian-rich environment before the model achieves good performance.

In a real-world setting, we need to intervene frequently and take control of the robot well before it commits a mistake. Such requirements make it hard to collect meaningful data trajectories as in the original DAgger. To overcome these difficulties, we propose Learn-from-Intervention DAgger (see Algorithm 2), which allows the robot to learn how to react when it is about to make a mistake and also allows frequent expert interventions.

The basic idea is inspired by the original DAgger: we let the learner control the robot by itself and generate trajectories, and the expert intervenes when the robot is about to make a mistake, e.g., drive off the road or cause damage. When the expert intervenes, control switches to the expert. For example, when the robot is about to drive off the road, the expert steers it back onto the correct track. After making the correction, control switches back to the robot. In contrast to the original DAgger, we record the data from a certain period before the expert intervention in a fixed-length queue $Q$. The reasons for recording a period before the intervention are twofold. First, even the expert needs reaction time before intervening. Second, the mistake observed now was likely caused by more than just the most recent step. Therefore, the newly added data consist of the trajectory that likely led to the mistake and the expert's intervention.

In order to use the data in $Q$, a backtrack function is used to modify the state-action pairs in the buffer, which are then added to the dataset after each intervention. At intervention time step $t$, the error is measured by the difference between the learner's action $a_t$ and the expert's action $a_t^*$. We tested three different functions for backtracking the errors, including logarithmic, exponential, and linear functions. Empirically, we find that the linear function works best in our experiments. Furthermore, by learning from interventions after the robot gets a warm start, the robot generalizes more easily between correct and incorrect behavior, and therefore achieves better performance in the next iteration. Finally, we extend this framework to the hierarchical structure.
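The following sketch gives one concrete (assumed) reading of the fixed-length queue and the linear backtrack: the error observed at the intervention step is propagated back over the buffered steps with linearly decaying weights. Only the queue and the choice of a linear function come from the text; the exact weighting is our assumption.

```python
from collections import deque

QUEUE_LEN = 30                                     # buffer length L; illustrative value
queue = deque(maxlen=QUEUE_LEN)                    # oldest (state, command) pairs pop automatically

def backtrack_linear(queue, expert_command, learner_command):
    """Assumed form of the linear backtrack: pull each buffered command toward
    the expert's corrective command, with more weight on steps closer to the
    mistake. Commands are treated as scalars (e.g., steering angle) here."""
    if not queue:
        return []
    error = expert_command - learner_command       # discrepancy at intervention time t
    n = len(queue)
    relabelled = []
    for k, (state, command) in enumerate(queue):   # k = 0 is the oldest buffered step
        weight = (k + 1) / n                       # linearly decaying weight going back in time
        relabelled.append((state, command + weight * error))
    return relabelled                              # these pairs are added to the sub-task dataset
```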

Fig. 2: General diagram of our hardware system setup, indicating how the components are wired, together with frames captured by the three cameras at one spot. This three-camera setting provides a wide field of view to ensure that the sides of paths can be recorded.

IV System

IV-A Hardware Setup

We evaluate our approach using an off-the-shelf RC car (Traxxas Tactical) as our robot platform, an NVIDIA Jetson TX2 as our main console, an Intel RealSense D415 as the main central camera, and two webcams (Logitech C615) on the sides. We also employ an Arduino board and a USB servo controller to implement the robot's control system. Fig. 2 shows our hardware setup.

When performing imitation learning, the crucial part is how to collect the training data. In our case, the simplest solution is to collect every state while the expert drives the robot in the real world. However, this typically leads to an imperfect policy, since the robot is only trained on the states encountered by the expert and will not recover from mistakes when it encounters an unseen state. To alleviate this problem, we include observations of recoveries from perturbations. Inspired by the pioneering work of Pomerleau [28], we install three cameras at the front of the car: one facing forward and the other two shifted to the left and to the right. By recording from the three cameras simultaneously with appropriately adjusted control signals, we can simulate recovery from drift.
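To make this concrete, here is a minimal sketch of how an off-center frame could be relabeled with a corrective steering offset in the spirit of [28]; the offset value and the sign convention (positive = steer right) are assumptions, since the paper does not give the adjusted control signals.

```python
STEER_OFFSET_DEG = 8.0   # corrective offset; illustrative value, not from the paper

def relabel_three_camera_capture(center_frame, left_frame, right_frame, steer_deg, speed):
    """Turn one synchronized three-camera capture into three training samples.
    The side frames look as if the robot had drifted off-center, so their
    steering labels are shifted back toward the path (sign convention assumed:
    positive steering turns the robot to the right)."""
    return [
        (center_frame, steer_deg, speed),                       # nominal on-path sample
        (left_frame,  steer_deg + STEER_OFFSET_DEG, speed),     # camera shifted left: steer right
        (right_frame, steer_deg - STEER_OFFSET_DEG, speed),     # camera shifted right: steer left
    ]
```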

IV-B Framework Overview

The framework is depicted in Fig. 3. A hierarchical framework containing two different stages is employed. Stage 1 is a warm-up stage used to generate a baseline policy that serves as input to Stage 2. Given the baseline policy provided by the warm-up stage, two separate modules work interactively in Stage 2. The learner takes the policies as input and successively generates and executes actions based on the observations captured by the center camera. Once a human expert intervenes in the predicted trajectory, due to a violation of the policy based on the expert's judgment, all data queued in the buffer $Q$ are saved to the dataset. We merge all the data together to train a new policy, which is fed back to the learner in the next loop.

Fig. 3: Framework overview, employing a hierarchical structure based on two separate stages. Two modules in Stage 2 interact iteratively to update our policy. Minus signs indicate the loss calculated by the loss function.

IV-B1 Policy Representation

For policy representation, we use a 2D CNN to map states to actions without considering sequential information, for simplicity. We use transfer learning to adapt a ResNet [29] pre-trained on ImageNet to our data. We retrain the last few layers and retain the weights of the other layers, because the lower layers contain kernels that extract basic image features, e.g., lines and corners, which are general across images. Notice that we impose a basic assumption here: for similar observations, the expert's actions should obey a single-peak distribution. In this case, we use cross-entropy as our loss function, which measures the distance between the network's predicted distribution and the actual action distribution of the expert. In this way, the network can learn the expert actions more effectively than with the one-hot representation used for other problems, e.g., image classification.
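A minimal PyTorch sketch of this transfer-learning setup is given below; the ResNet depth, which layers are unfrozen, and the number of discrete actions are our assumptions, since the paper only states that the last few layers are retrained.

```python
import torch.nn as nn
from torchvision import models

def build_action_net(num_actions=15, trainable=("layer4", "fc")):
    """Adapt an ImageNet-pretrained ResNet: freeze the lower layers (generic
    edge/corner kernels) and retrain only the last block plus a new head over
    the discrete action bins. Depth and unfrozen layers are assumptions."""
    net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    for name, param in net.named_parameters():
        param.requires_grad = any(name.startswith(prefix) for prefix in trainable)
    net.fc = nn.Linear(net.fc.in_features, num_actions)  # new head; trainable by default
    return net

criterion = nn.CrossEntropyLoss()  # distance between predicted and expert action distributions
```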

IV-B2 Stage 1

We prepare our dataset by driving the RC car around the campus and recording representative environments (bituminous street, wide/narrow footpath, etc.). All movements of the car at this stage are controlled by a human expert, recorded as PWM-signal-based tuples, and saved together with the video data onto an SD card on the Jetson TX2. We limit the throttle PWM command from the transmitter to below 1750, which maps to 1.2-1.5 m/s in the real world, in order to imitate the walking speed of human pedestrians and also to handle unexpected scenarios during testing. We also linearly map the PWM signals for steering control to actual steering angles, ranging from -50 to 50 degrees. In our system, all saved commands are passed through the TX2 for convenient data manipulation and computation. Furthermore, we use horizontal flips to further augment the training dataset, flipping the command about the center steering angle and adding normally distributed noise. This provides the controller with demonstrations of recovery from drift and unexpected disturbances.
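As a sketch of the signal handling, the linear PWM-to-angle mapping and the flip augmentation could look as follows; the PWM endpoints and the noise scale are assumptions, and only the 1750 throttle cap and the ±50 degree range come from the text.

```python
import random

PWM_MIN, PWM_CENTER, PWM_MAX = 1000, 1500, 2000   # typical RC servo range; assumed values
THROTTLE_PWM_CAP = 1750                           # from the text: caps speed at ~1.2-1.5 m/s

def pwm_to_steering_deg(pwm):
    """Linearly map a steering PWM value to an angle in [-50, 50] degrees."""
    pwm = min(max(pwm, PWM_MIN), PWM_MAX)
    return (pwm - PWM_CENTER) / (PWM_MAX - PWM_CENTER) * 50.0

def flip_augment(frame, steer_deg, noise_std=2.0):
    """Horizontal flip with a mirrored steering label plus small Gaussian noise
    (noise scale assumed). `frame` is an H x W x C NumPy image array."""
    flipped = frame[:, ::-1]                       # mirror the image left-right
    new_steer = -steer_deg + random.gauss(0.0, noise_std)
    return flipped, new_steer
```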

IV-B3 Stage 2

We use all the data collected from Stage 1 to obtain a baseline policy. However, we have found that the policies learned in Stage 1 are not sufficiently robust. The robot still makes mistakes in the training regions used for collecting the training data, and it performs even worse in unseen testing regions, because using only the states encountered by the expert is not enough for good generalization. Therefore, we further use our Learn-from-Intervention DAgger algorithm to provide more crucial data for training our policies. To do so, we set up our platform to record and run inference at the same time. We implement a fixed-length queue to retain the recently seen states and predicted commands. If there is no intervention from the expert, the queue pops out unused data. Upon detecting a likely mistake, the expert corrects the robot and the queued data are used. The model is evaluated on the TX2 in real time (inference performed at 10 FPS). It receives images from the central camera. The network predicts the steering angle and speed control commands, which are converted into appropriate PWM signals. After conversion, the signals are sent directly to the speed controller and steering servo.
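The on-robot Stage-2 loop can be sketched as below; all device interfaces (camera, model, PWM conversion, intervention detection) are placeholders we assume, and only the 10 FPS rate and the fixed-length queue come from the text.

```python
import time
from collections import deque

def stage2_loop(camera, model, to_pwm, send_pwm, intervention, queue_len=30, fps=10):
    """Sketch of the real-time inference loop: predict commands at ~10 FPS,
    buffer recent (frame, command) pairs in a fixed-length queue, and hand the
    buffer over to the DAgger dataset when the expert intervenes. All interfaces
    (camera, model, send_pwm, intervention) are assumed placeholders."""
    buffer = deque(maxlen=queue_len)               # unused data pop out automatically
    period = 1.0 / fps
    while True:
        start = time.time()
        frame = camera.read()
        steer, speed = model.predict(frame)        # network output in the discrete command space
        if intervention.active():
            yield list(buffer)                     # trajectory that led to the intervention
            buffer.clear()
        else:
            send_pwm(to_pwm(steer, speed))         # convert commands to PWM and actuate
            buffer.append((frame, (steer, speed)))
        time.sleep(max(0.0, period - (time.time() - start)))
```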

V Experiments

To eliminate the influence of the training data and qualitatively evaluate how well the model generalizes to previously unseen environments with different appearances, we intentionally choose four regions excluded from our training dataset as our experimental field, marked in red in Fig. 4. For each sub-task, we use the behavior cloning model as our first iteration and generate five further policies that differ in the number of learn-from-intervention iterations.

During the testing process, for each policy in every region, the performance of the robot is characterized by the proximity of its behavior to the human expert's actions, using two types of measurements for the different sub-tasks.

Fig. 4: A map of our campus, with red areas indicating our training and testing areas. Pictures around the map show representative real-world views of these areas.

V-A Successful Attempts

One of the most intuitive ways to assess the achievements of our robot is to count the number of successful attempts. For scenarios such as confronting and crossing, we adopt this straightforward approach as our evaluation method. For each individual sub-task, we make 20 attempts in total on four different roads for each model we trained. If the robot successfully avoids confronting pedestrians or stops for crossing pedestrians, we count that as a success. Fig. 5 presents the number of successful attempts for each policy on each road.

Fig. 5: Successful attempts of six models for pedestrian following and path following on each of the four testing roads, out of 20 attempts. The X-axis indicates the road number and the Y-axis indicates the number of successful trials.

As shown in Fig. 5, the robot achieves more successful attempts after each iteration of learning from intervention, which indicates improved generalization of the policy.

V-B Time Without Intervention

Apart from the two tasks mentioned in the previous section, we also evaluate the competence of our robot on two other sub-tasks, path following and pedestrian following, whose performance cannot be precisely quantified by the number of successful attempts, since following is a continuous process rather than a single action. Therefore, we measure the time without intervention (TWI), i.e., the duration for which the robot successfully follows the target in autonomous mode. Results for experiments on four different roads are shown in Fig. 6. For each road, a gray line represents the total time a human needs to walk from one end to the other. Each box denotes the TWI over all five experiments.

Fig. 6: TWI of six models for confronting and crossing on five different roads. We show the results in terms of box plots.

The results indicate that the generalization of our model is indeed enhanced after iteratively learning from human interventions.

V-C Training Data

Fig. 7: Changes in dataset size with the number of iterations for all four scenarios. In each loop, the increment of data collected during the DAgger process falls because the robot makes fewer errors under the updated policies.

Since our framework only collects data around failures, the amount of data we can collect decreases dramatically as our model improves, in the pattern exhibited in Fig. 7. We find that after five iterations the dataset size for each sub-task converges to a saturated state and remains essentially invariant.

VI Conclusions

To conclude, we introduce the Learn-from-Intervention DAgger algorithm, an approach that enables a robot to navigate in a pedestrian-rich environment from raw images. We apply the proposed approach to a physical robotic vehicle on real-world roads with pedestrians. The experimental results indicate that imitating human behavior learned from interventions helps the robot generalize to good decisions. For future work, we plan to model this problem as a sequential decision problem and take previous actions and observations into consideration when training the policy representation, which would make it more realistic for robots to generate actions.

References

  • [1] H. Bai, S. Cai, N. Ye, D. Hsu, and W. S. Lee, “Intention-aware online pomdp planning for autonomous driving in a crowd,” in Robotics and Automation (ICRA), 2015 IEEE International Conference on.   IEEE, 2015, pp. 454–460.
  • [2] Y. F. Chen, M. Everett, M. Liu, and J. P. How, “Socially aware motion planning with deep reinforcement learning,” in Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on.   IEEE, 2017, pp. 1343–1350.
  • [3] A. Attia and S. Dayan, “Global overview of imitation learning,” arXiv preprint arXiv:1801.06503, 2018.
  • [4] S. Schaal, “Is imitation learning the route to humanoid robots?” Trends in cognitive sciences, vol. 3, no. 6, pp. 233–242, 1999.
  • [5] P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proceedings of the twenty-first international conference on Machine learning.   ACM, 2004, p. 1.
  • [6] J. Bagnell, J. Chestnutt, D. M. Bradley, and N. D. Ratliff, “Boosting structured prediction for imitation learning,” in Advances in Neural Information Processing Systems, 2007, pp. 1153–1160.
  • [7] S. Ross and D. Bagnell, “Efficient reductions for imitation learning,” in Proceedings of the thirteenth international conference on artificial intelligence and statistics, 2010, pp. 661–668.
  • [8] S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” in Proceedings of the fourteenth international conference on artificial intelligence and statistics, 2011, pp. 627–635.
  • [9] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • [10] D. Fox, W. Burgard, and S. Thrun, “The dynamic window approach to collision avoidance,” IEEE Robotics & Automation Magazine, vol. 4, no. 1, pp. 23–33, 1997.
  • [11] M. Phillips and M. Likhachev, “Sipp: Safe interval path planning for dynamic environments,” in Robotics and Automation (ICRA), 2011 IEEE International Conference on.   IEEE, 2011, pp. 5628–5635.
  • [12] J. Van Den Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal n-body collision avoidance,” in Robotics research.   Springer, 2011, pp. 3–19.
  • [13] L. Tai, J. Zhang, M. Liu, and W. Burgard, “Socially-compliant navigation through raw depth inputs with generative adversarial imitation learning,” arXiv preprint arXiv:1710.02543, 2017.
  • [14] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, “A survey of robot learning from demonstration,” Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S0921889008001772
  • [15] P. Abbeel, A. Coates, and A. Y. Ng, “Autonomous helicopter aerobatics through apprenticeship learning,” The International Journal of Robotics Research, vol. 29, no. 13, pp. 1608–1639, 2010.
  • [16] A. Giusti, J. Guzzi, D. C. Ciresan, F.-L. He, J. P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. Di Caro, et al., “A machine learning approach to visual perception of forest trails for mobile robots.” IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661–667, 2016.
  • [17] S. Ross, N. Melik-Barkhudarov, K. S. Shankar, A. Wendel, D. Dey, J. A. Bagnell, and M. Hebert, “Learning monocular reactive uav control in cluttered natural environments,” in 2013 IEEE international conference on robotics and automation.   IEEE, 2013, pp. 1765–1772.
  • [18] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maximum entropy inverse reinforcement learning.” in AAAI, vol. 8.   Chicago, IL, USA, 2008, pp. 1433–1438.
  • [19] B. D. Ziebart, A. L. Maas, A. K. Dey, and J. A. Bagnell, “Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior,” in Proceedings of the 10th international conference on Ubiquitous computing.   ACM, 2008, pp. 322–331.
  • [20] U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, “Off-road obstacle avoidance through end-to-end learning,” in Advances in neural information processing systems, 2006, pp. 739–746.
  • [21] D. Silver, J. A. Bagnell, and A. Stentz, “Learning from demonstration for autonomous navigation in complex unstructured terrain,” The International Journal of Robotics Research, vol. 29, no. 12, pp. 1565–1592, 2010.
  • [22] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al., “End to end learning for self-driving cars,” arXiv preprint arXiv:1604.07316, 2016.
  • [23] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, “Deepdriving: Learning affordance for direct perception in autonomous driving,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2722–2730.
  • [24] D. A. Pomerleau, “Alvinn: An autonomous land vehicle in a neural network,” in Advances in neural information processing systems, 1989, pp. 305–313.
  • [25] J. Zhang and K. Cho, “Query-efficient imitation learning for end-to-end autonomous driving,” arXiv preprint arXiv:1605.06450, 2016.
  • [26] F. Codevilla, M. Muller, A. Lopez, V. Koltun, and A. Dosovitskiy, “End-to-end driving via conditional imitation learning,” in Robotics and Automation (ICRA), 2018 IEEE International Conference on.   IEEE, 2018.
  • [27] S. Ross, N. Melik-Barkhudarov, K. S. Shankar, A. Wendel, D. Dey, J. A. Bagnell, and M. Hebert, “Learning monocular reactive UAV control in cluttered natural environments.” [Online]. Available: http://arxiv.org/abs/1211.1690
  • [28] D. A. Pomerleau, “Alvinn: An autonomous land vehicle in a neural network,” in Advances in neural information processing systems, 1989, pp. 305–313.
  • [29] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.