I Introduction and Problem Formulation
I-A Motivation of a neural network approach for control
Control methods that permit small sampling times, high-dimensional nonlinear nonconvex system models, and long planning horizons are highly desirable, and autonomous driving is no exception. To illustrate the complexity, a sampling time of 0.01s and a look-ahead time of 2s require a time-based prediction horizon of 200 sampling instances. This motivates encoding motion primitives offline in neural networks (NNs), before employing them online in combination with a reference waypoint selector. The main disadvantage is that encoding a large number of motion primitives in NNs typically requires high computational power.
I-B Problem formulation and contribution
The problem addressed in this paper is to develop methods for efficient encoding of motion primitives in NNs. For clarity, the focus here is solely on aspects that improve the encoding process, in particular for dynamic vehicle models. Not discussed are the recursive-feasibility problems that arise in closed-loop control for scenarios unseen during training and for model mismatches (subject of ongoing work). The following main contributions are made. First, specific virtual velocity constraints (VVCs) are proposed, and it is motivated why these must be handled differently for kinematic and dynamic vehicle models. For the latter, a specific 1-scalar network extension is suggested. Second, network scheduling is proposed, whereby vehicle velocity is used as the scheduling variable. Third, various feature vector selections are discussed, before a preferred 4D choice is motivated. Fourth, 3 feedforward structures are compared, including weighted skip connections. Fifth, details of the GPU implementation and a full 16-states-2-controls dynamic vehicle model are discussed. Sixth, the benefits and capabilities of tiny NNs with as few as 10 parameters are illustrated.
I-C Related work
Note that ES can be considered a gradient-based algorithm, since it performs stochastic gradient descent via an operation similar to a finite-difference approximation of the gradient, while generating more robust policies. In contrast, TSHC is a truly gradient-free algorithm (hill climbing), designed specifically for deterministic encoding of motion primitives in NNs. This paper differs from  according to the 6 contributions listed above.
Regarding control by motion primitives, this approach differs from methods derived from , which require online search of look-up tables, e.g., using a GPU for exhaustive search. In contrast, when encoding motion primitives in an NN, explicit search is not required. Instead, a feature vector relating the current state to a next desired state (waypoint) is fed to the network to generate control commands.
Regarding NN architectures, this paper extends recently proposed SCNs (structured control nets), which combine a linear term mapping from network input to control channels additively with a nonlinear term resulting from a multilayer perceptron (MLP). In contrast, one of the discussed network architectures adds linearly weighted skip connections between all downstream layers. Skip connections are not new, but are popular for learning very deep architectures.
Regarding vision-based end-to-end learning approaches [10, 11], the proposed approach fundamentally differs in that it is founded on model-based training. This offers the advantage that certificates about learnt control performance can be provided by statement of (i) the vehicle-model used for training, and (ii) the encoded motion primitives (training tasks) and their associated low-dimensional feature vectors. In contrast, providing equivalent certificates for vision-based end-to-end learning methods is in general much more difficult due to the high dimensionality of images.
Finally, it is noted that a closed-loop control system based on encoded motion primitives must always be seen in combination with a reference setpoint selector that determines the waypoints or features to be fed to the NN. This selector must account for obstacles and thus primarily solves nonconvex optimization problems. For exemplary approaches see [12, 13, 14, 15, 16, 17]. Explicit reference setpoint selection as well as a preceding perception module (fusing proprio- and exteroceptive sensor measurements) are not the focus of this paper.
II System Model
II-A Kinematic 3-states vehicle model
The equations of motion of a well-known simple kinematic vehicle model are: , and , with wheelbase (in simulations 2.69m). This model has 3 states (position-coordinates and heading) and 2 controls (steering angle and velocity ). Both controls are additionally constrained by absolute and rate actuation limits to emulate steering and 0-100/100-0km/h ac/deceleration performance of the dynamic vehicle model described next.
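For illustration, a minimal sketch of one simulation step of such a kinematic model is given below. It assumes the common kinematic bicycle form x' = v cos(psi), y' = v sin(psi), psi' = (v / l) tan(delta), integrated by explicit Euler; this form, the integration scheme and all identifiers (`Pose`, `kinematicStep`) are assumptions for illustration and are not taken from the paper. Only the wheelbase of 2.69m and the sampling time of 0.01s are values stated in the text.

```cpp
#include <cmath>

// Pose of the kinematic model: position (x, y) in m and heading psi in rad.
struct Pose {
    double x;
    double y;
    double psi;
};

// One explicit-Euler step of an ASSUMED kinematic bicycle model.
// delta: steering angle [rad], v: velocity [m/s], Ts: sampling time [s],
// wheelbase: distance between the axles [m] (2.69 m in the simulations).
Pose kinematicStep(const Pose& p, double delta, double v,
                   double Ts = 0.01, double wheelbase = 2.69) {
    Pose next;
    next.x = p.x + Ts * v * std::cos(p.psi);
    next.y = p.y + Ts * v * std::sin(p.psi);
    next.psi = p.psi + Ts * (v / wheelbase) * std::tan(delta);
    return next;
}
```

Calling `kinematicStep` repeatedly with the controls produced by the NN controller would roll out one training task; the actuator limits mentioned above would be applied to `delta` and `v` before each call.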
II-B Dynamic 16-states vehicle model
Equations of motion of a 16-states dynamic vehicle model are derived by extending the bicycle model (used as starting point since its reference discusses a high-dimensional dynamic vehicle model and provides all hyperparameters for reproduction) by aerodynamic friction forces, roll-, yaw- and four-wheel-dynamics:
Note that (1) is based on the Pacejka “magic formula” tyre model. In addition,  and  were used for its derivation. Because of the general importance of models for all model-based control and reinforcement learning algorithms, the entire C++ code excerpt is provided in Appendix A, including all system parameters, which are extended from  and modified (among others) such that a 0-100/100-0km/h ac/deceleration performance of 7.4/3.8s is obtained.
If-else distinctions are convenient for model formulations and imply logical constraints. In the context of optimal control, these can be translated into integer linear inequalities, yielding mixed-integer optimization problems. Note that logical constraints require no special treatment when encoding motion primitives in NNs via gradient-free learning.
Control commands are discussed. Suppose continuous control for at sampling time . Then, for the dynamic vehicle model,
before is distributed to drive- and brake-torques at the different wheels. In contrast, for the kinematic model only (2a) is used likewise, while (2b) is replaced by , with and maximum and minimum velocity. For the dynamic model, both acceleration and deceleration are controlled via (instead of, e.g., distinguishing front-wheel drive and 4 brake commands). This is done to compare kinematic and dynamic models with both having 2 controls. It implies that physical acceleration and braking actuators are never activated simultaneously.
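The absolute and rate actuation limits mentioned for both controls can be sketched as a two-stage clamp. This is not a reproduction of (2); it is a generic sketch under the assumption that the rate limit is enforced relative to the value applied at the previous sampling time and that the absolute limit is applied afterwards. The function name and all parameters are hypothetical.

```cpp
#include <algorithm>

// Limit a commanded actuator value uCmd given the value uPrev applied at the
// previous sampling time. ASSUMED order: rate limit first, then absolute limit.
double applyActuatorLimits(double uCmd, double uPrev,
                           double uMin, double uMax,
                           double maxRate, double Ts) {
    // Rate limit: the command may change by at most maxRate * Ts per step.
    const double maxStep = maxRate * Ts;
    const double rateLimited =
        std::max(uPrev - maxStep, std::min(uCmd, uPrev + maxStep));
    // Absolute limit: keep the result inside [uMin, uMax].
    return std::max(uMin, std::min(rateLimited, uMax));
}
```

With this structure an NN output that violates the actuator range is still mapped to an admissible command, which is the sense in which admissibility is "ultimately ensured" by the actuator constraints rather than by the network itself.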
II-C Discussion of vehicle model and feature vector selection
Three more comments about vehicle models are made. First, higher-fidelity vehicle models offer the potential of reducing control delays to a minimum. Therefore vehicles may be modeled to a degree such that the lowest-possible actuation commands are controlled, e.g., pulse-width modulated (PWM) signals. In contrast, simple kinematic models usually require additional cascaded low-level control for the mapping from PWM to velocity. For example, see  for spatial-based velocity control using a kinematic model.
Second, multiple system parameters which typically characterize higher-dimensional dynamic vehicle models offer means for robustifying control by encoding motion primitives for different system parameter settings.
Third, the choice of feature vector input to the NN controller may not necessarily be affected by the vehicle model. Various feature vectors are considered. In this paper, for example, a 5-, 6- and 7D selection are considered. The first is defined as , relating states at time to desired goal pose. The latter two options (6D and 7D) add either only , or both and , respectively. Normalization constants are employed, which throughout this paper are selected as in SI-units. It is distinguished between and to account for different velocity effects on the dynamics. Note that the first two options for are identical for both kinematic and dynamic vehicle model. However, the third option varies due to the different interpretation of for the two models. The dimension of may influence the number of training tasks. This is since without a priori knowledge about meaningful training tasks, the simplest method to generate training tasks is to grid over the elements of .
III Neural Network Architecture
The processing of feature vector by NNs is discussed.
III-A Fully Structured Control Nets (FSCNs)
Fully structured control nets (FSCNs) are introduced as
with parameters to learn , , and , and with . For the remainder of this paper all parameters to be learnt shall be summarized in a vector denoted by
initialized by small zero-mean Gaussian noise (with a standard deviation of 0.001). For illustration of the concept of FSCNs see also Fig.1. FSCN design choices are the number of layers and the number of units per layer . Note that and are fixed as the dimensions of feature vector and controls , respectively.
III-B Discussion of neural network architectures
Five remarks are made. First, FSCNs (3) extend recently proposed structured control nets (SCNs), which for comparison read: , and . Thus, in contrast to (3), SCNs only add one linear term from network input to output. In Sect. VI-C, both architectures plus the standard multilayer perceptron (MLP) are compared. MLPs themselves are identical to SCNs minus the additional linear term.
Second, weighted skip connections are introduced as in (3b) and (3d). Since these weights are initialized by small zero-mean Gaussian noise, FSCNs initially resemble MLPs. Alternatively, initialization as identity mappings (plus small Gaussian noise) rather than around zero was also tested (thereby emphasizing the skip aspect more strongly), but was not found to accelerate learning in the early training phase.
Third, in this paper small NNs (with few parameters) that still enable encoding of all motion primitives (training tasks) are desired. Small NNs are preferable since they (i) permit faster network evaluation, which is favorable for both faster offline training and online control execution, and (ii) reduce hardware storage requirements for parameters. For perspective, in  huge 4M+ parameter networks are mentioned for playing Atari games (requiring image processing). In contrast, we here seek to reduce the number of parameters as much as possible. The benefits of small networks become most apparent when training with limited computational resources. The effects on training times are demonstrated in the experiments of Sect. VI-B.
Fourth, in contrast to MLPs, in general in (3d) cannot be guaranteed. This is because of its affine term. In experiments capping and an additional tanh activation were tested but found to not accelerate learning (on the contrary). Note that is ultimately ensured through physical actuator absolute and rate constraints.
Fifth, as will be shown, for the encoding of motion primitives based on dynamic vehicle models that do not control velocity directly, a NN-extension was found to be very useful for the handling of spatial virtual velocity constraints (VVCs). These constraints and the corresponding network extension are presented next and consequently applied to all 3 NNs discussed: FSCNs, SCNs and MLPs.
IV Training Algorithm
This section states key aspects of the proposed algorithm for efficient encoding of motion primitives in above NNs.
IV-A Virtual Velocity Constraints and Network Extension
The notion of virtual velocity constraints (VVCs) is adopted from [1, Sect. III-B]. However, due to the dynamic vehicle model used here and to remove 1 hyperparameter, modifications on VVCs are made as follows. First, let
with being output of a NN for both cases of training on a kinematic and dynamic vehicle model, and let
Second, if or , project and , respectively. Third, and now differentiating between the cases of training on a kinematic and dynamic vehicle model, set
for the former and latter cases, respectively. Here, is such that and is a scalar parameter to be learnt. However, only when training based on a dynamic vehicle model. Ultimately, for are applied to physical actuator absolute and rate constraints accounting for at the previous sampling time.
Several comments are made. First, the VVCs of (5) are spatially independent of goal proximity. This has several benefits: (i) due to the -margin feasibility can be guaranteed even for training tasks that demand, e.g., only a small lateral displacement of the vehicle for a desired starting and end velocity of 0km/h, and (ii) no hyperparameters and additional measures are required to threshold spatial goal proximity.
Second, the -margin around is a heuristic choice. In general, it may be regarded as a hyperparameter. However, here it is considered fixed and is to be interpreted as a tolerable, small velocity variation (for over/undershoots). The velocity corridor provided by  encourages a quick and monotonic approach to the target velocity. This property (i) is in general desirable, especially for throughput maximization and when traffic encourages and permits driving at speed limits (e.g., for urban driving), and (ii) encourages at most one velocity-sign change for reaching the goal velocity and state.
Third, VVCs can be regarded as a filter for the -output from a NN. As indicated in (7), for training on the dynamic vehicle model scalar must be learnt in addition to the other NN parameters. Since velocity is not controlled directly, the encoding of motion primitives based on dynamic vehicle models is more complicated. In preliminary tests a variety of alternative filter functions were tested. The simple form of (7) was found to be suitable. It requires just one scalar parameter and resembles a P-controller with nonlinear activation function. Note that  when .
Fourth and to summarize, VVCs are motivated (i) to accelerate learning, and (ii) to avoid velocity-over/undershoots until reaching of desired goal-poses. The benefits of VVCs for both training on a kinematic and a dynamic vehicle model are illustrated in the experiments of Sect. VI-A. They are found to be essential for efficient encoding of motion primitives in NNs, especially when training on sparse rewards as motivated in [1, Sect. III-B].
IV-B Task Separation with Hill Climbing
The gradient-free TSHC algorithm with refinement step  is used for training, whereby perturbation hyperparameter is selected randomly according to a uniform distribution at every parameter iteration to reduce the number of difficult-to-select hyperparameters (which would occur when instead using a fixed or adaptive ). This is relevant for the generation of parameter solution candidates with Gaussian distributed  for all .
To summarize, all hyperparameters remaining for the training algorithm are: , , , and . The number of restarts and maximum number of iterations per restart may be selected according to the desired total training time. When training on a GPU, we may select as the product of the number of blocks and threads per block used for asynchronous training. In general, feasibility of all motion primitives can be guaranteed since training is conducted obstacle-free, which suggests selecting a large maximum number of permitted time-steps to solve a training task. On the other hand, an unnecessarily conservative choice prolongs training. Tolerances indicate when a specific training task goal-pose is reached.
The above comments underline a benefit of training by TSHC: its simplicity and the interpretability of its hyperparameters. Assuming large computational power is available, (i) all of , , and should be large, and (ii) should be small. Then, the only tunable hyperparameter remaining is . In practice, it was found that it should be selected sufficiently large to enable enough exploration in the NN parameter space. Hence, the choice .
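The perturb-and-select idea behind this training scheme can be sketched with plain hill climbing on a toy objective. The sketch below is not the TSHC algorithm of [1]: task separation, restarts and the refinement step are omitted, the score function `toyScore` is a hypothetical stand-in for the task-based score, the range (0, sigmaMax) of the uniform distribution is an assumption, and the standard-library random generators are used only for brevity. It only illustrates redrawing the perturbation scale at every iteration and keeping the best of several Gaussian-perturbed candidates.

```cpp
#include <random>
#include <vector>

// Hypothetical toy score to maximize (stand-in for the task-based score).
double toyScore(const std::vector<double>& theta) {
    double s = 0.0;
    for (double t : theta) s -= (t - 1.0) * (t - 1.0);
    return s;
}

// Plain hill climbing with a perturbation scale that is redrawn uniformly
// at every iteration (ASSUMED range (0, sigmaMax)).
std::vector<double> hillClimb(std::vector<double> theta, int iterations,
                              int candidates, double sigmaMax, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> sigmaDist(0.0, sigmaMax);
    std::normal_distribution<double> gauss(0.0, 1.0);
    double best = toyScore(theta);
    for (int it = 0; it < iterations; ++it) {
        const double sigma = sigmaDist(rng);
        std::vector<double> bestCandidate = theta;
        double bestCandidateScore = best;
        for (int c = 0; c < candidates; ++c) {
            std::vector<double> cand = theta;
            for (double& p : cand) p += sigma * gauss(rng);
            const double s = toyScore(cand);
            if (s > bestCandidateScore) {
                bestCandidateScore = s;
                bestCandidate = cand;
            }
        }
        // Accept only improvements, so the score never decreases.
        theta = bestCandidate;
        best = bestCandidateScore;
    }
    return theta;
}
```

Because candidates are evaluated independently of each other, the inner loop is the part that maps naturally onto parallel workers.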
IV-C Neural Network Scheduling based on Vehicle Velocity
The TSHC algorithm used to encode a set of motion primitives in a NN was discussed above. Now it is proposed to partition a large set of motion primitives into subsets of motion primitives scheduled on vehicle velocity. Then, NNs may be learnt separately for each of these subsets by separate applications of the TSHC algorithm. As further demonstrated in the experiments of Sect. VI-D, this offers the advantage that learning effort can be adapted to difficulty of corresponding subsets of training tasks, e.g., using different network parametrizations and hyperparameters. Consequently, the overall time to learn the entire set of motion primitives can be reduced significantly. A disadvantage is a natural increase in the total number of network parameters. However, the former advantages clearly outweigh the latter disadvantage. This is since, as will be shown in the experiments of Sect. VI, tiny NNs can be used to encode many motion primitives.
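At run time, scheduling amounts to selecting one of the separately learnt networks from the current velocity. A minimal sketch follows, assuming one network per grid velocity 0, 10, ..., 120km/h (13 networks, matching the 13 MLPs and the 10km/h spacing reported in Experiment 4) and nearest-grid-point selection. The selection rule and the function name are assumptions and are not specified in the text.

```cpp
#include <algorithm>
#include <cmath>

// Index of the velocity-scheduled network for a velocity given in km/h.
// ASSUMES one network per grid velocity 0, 10, ..., 120 km/h and selection
// of the nearest grid point; out-of-range velocities use the boundary network.
int scheduledNetworkIndex(double vKmh, double spacingKmh = 10.0,
                          int numNetworks = 13) {
    const int idx = static_cast<int>(std::lround(vKmh / spacingKmh));
    return std::max(0, std::min(idx, numNetworks - 1));
}
```

The returned index would then address the parameter vector of the corresponding network, so the storage cost grows linearly with the number of scheduled networks.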
V Implementation details
All methods are implemented in CUDA C++. Training is conducted on 1 GPU. Three more comments are made. First, a self-imposed guideline was to implement library-free code for the NN controller (such that in principle it could then run library-free on embedded hardware). For this reason, the tanh function is approximated by an implementation of Lambert's continued fraction expansion.
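A sketch of such an approximation is given below, based on Lambert's continued fraction tanh(x) = x / (1 + x^2 / (3 + x^2 / (5 + ...))), truncated after a fixed number of levels and evaluated from the innermost denominator outwards. The truncation depth, the clamping of the result and the function name are assumptions; the paper does not state which truncation was used.

```cpp
#include <cassert>

// tanh(x) via Lambert's continued fraction
//   tanh(x) = x / (1 + x^2 / (3 + x^2 / (5 + ...))),
// truncated after `depth` odd denominators (1, 3, ..., 2*depth - 1).
// The depth and the final clamp are ASSUMED design choices.
double tanhLambert(double x, int depth = 9) {
    assert(depth >= 1);
    const double x2 = x * x;
    // Start from the innermost denominator and work outwards.
    double denom = 2.0 * depth - 1.0;
    for (int k = depth - 1; k >= 1; --k) {
        denom = (2.0 * k - 1.0) + x2 / denom;
    }
    double y = x / denom;
    // A truncated fraction is only accurate on a bounded input range,
    // so the result is clamped to the range of tanh.
    if (y > 1.0) y = 1.0;
    if (y < -1.0) y = -1.0;
    return y;
}
```

The routine uses only additions, multiplications and divisions, which is what makes it suitable for library-free controller code. Accuracy degrades for large |x|, so the depth should be chosen for the expected range of pre-activations.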
Second, as outlined in Sect. IV-B, parameter candidates are generated by affine perturbations with zero-mean Gaussian noise and spherical variance. To this end, uniform random numbers are first generated according to , before Gaussian random variables are generated based on the Box-Muller method. One instance of the latter method simultaneously generates two scalar Gaussian variables. Both are used to generate consecutive entries in . This enables library-free code. Furthermore, the same methods for uniform and Gaussian random variables are used on both GPU and CPU host. Thus, only from (8) and scalar random seeds need to be passed to the GPU kernels (workers), before parameter candidates are then generated directly on the GPU. A ranking of the performance of these workers and knowledge of their seed numbers then permits reconstructing the best and corresponding on the CPU host.
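The seed-based candidate generation can be sketched as follows. The sketch assumes the "minimal standard" multiplicative generator of Park and Miller (multiplier 16807, modulus 2^31 - 1) for the uniform numbers, since the exact generator variant used in the paper is not stated; the function names and the use of `std::vector` on the host side are also assumptions. The Box-Muller transform is the standard one, producing two Gaussian samples per call that fill consecutive parameter entries.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// ASSUMED uniform generator: Park-Miller "minimal standard",
// seed <- 16807 * seed mod (2^31 - 1). The seed must lie in [1, 2^31 - 2]
// and is updated in place. Returns a value in (0, 1).
double uniformParkMiller(std::uint32_t& seed) {
    const std::uint64_t a = 16807ULL;
    const std::uint64_t m = 2147483647ULL;
    seed = static_cast<std::uint32_t>((a * seed) % m);
    return static_cast<double>(seed) / static_cast<double>(m);
}

// Box-Muller transform: two uniforms give two independent standard Gaussians.
std::pair<double, double> gaussianPair(std::uint32_t& seed) {
    const double kTwoPi = 6.283185307179586;
    const double u1 = uniformParkMiller(seed);
    const double u2 = uniformParkMiller(seed);
    const double r = std::sqrt(-2.0 * std::log(u1));
    return {r * std::cos(kTwoPi * u2), r * std::sin(kTwoPi * u2)};
}

// Candidate = theta + sigma * zeta, with zeta drawn from the given seed.
// The same seed always reproduces the same candidate.
std::vector<double> perturbParameters(const std::vector<double>& theta,
                                      double sigma, std::uint32_t seed) {
    std::vector<double> candidate(theta);
    for (std::size_t i = 0; i < candidate.size(); i += 2) {
        const std::pair<double, double> z = gaussianPair(seed);
        candidate[i] += sigma * z.first;
        if (i + 1 < candidate.size()) candidate[i + 1] += sigma * z.second;
    }
    return candidate;
}
```

Because `perturbParameters` is a pure function of the base parameters, the perturbation scale and the seed, a worker only needs to report its score; the host can regenerate the winning candidate from the seed instead of copying parameter vectors back from the GPU.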
Third, for final experiments each of GPU kernels implements one of workers such that parameter candidates are tested in parallel. For completeness, nested parallelization with solving training tasks in parallel for each parameter candidate is in general also possible. This was also implemented and tested by using CUDA's atomicAdd function and an algebraic mapping to reconstruct a specific training task from a kernel's thread index. For our scenario with 1 GPU, however, this method did not accelerate training. In contrast, the preferred method of testing parameter candidates in parallel implies that all training tasks are tested for each parameter candidate and GPU kernel, before a cumulative score is returned to the CPU host. Training tasks are generated directly on the GPU within nested for-loops instead of being precomputed, to minimize memory requirements.
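One possible form of such an algebraic mapping is a mixed-radix decoding of a flat index into grid indices, sketched below for a hypothetical three-dimensional task grid. The grid dimensions, the row-major ordering and all names are assumptions for illustration; the mapping actually used in the paper is not given.

```cpp
#include <cassert>

// Grid indices of one training task on a hypothetical n0 x n1 x n2 task grid.
struct TaskIndex {
    int i0;
    int i1;
    int i2;
};

// Decode a flat (thread) index into grid indices, ASSUMING row-major order,
// i.e. flat = (i0 * n1 + i1) * n2 + i2.
TaskIndex decodeTask(int flat, int n0, int n1, int n2) {
    assert(flat >= 0 && flat < n0 * n1 * n2);
    TaskIndex t;
    t.i2 = flat % n2;
    t.i1 = (flat / n2) % n1;
    t.i0 = flat / (n1 * n2);
    return t;
}
```

Each grid index would then be converted to a physical task value by multiplying with the grid spacing and adding the grid origin, so no task table needs to be stored in device memory.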
VI Simulation Experiments
Tolerances indicating the reaching of a desired goal pose are set as m and km/h. As will be discussed, is relevant only for Experiment 1 and is set as .
VI-A Experiment 1: Effects of , vehicle model and VVCs
Experiment 1 is characterized by (i) a comparison of the 5-, 6- and 7D feature vectors from Section II-C here abbreviated as s5, s6 and s7, (ii) , (iii) FSCNs with 1 hidden layer and 1 hidden unit, i.e., [5,1,2], [6,1,2] and [7,1,2] (with each number in brackets indicating the number of units per layer), (iv) , (v) solving all tasks at once (i.e., without network scheduling), and (vi) training tasks according to [km/h], (capped between 0 and 120km/h), with and if as well as with and if , , , and . This yields a total number of training tasks, which are selected to analyze longitudinal control (to focus on VVCs effects) and to ensure a maximum look-ahead time of less than 2.5s. Thus, for the selected (in combination with a sampling time of 0.01s) all training tasks are guaranteed to be learnable.
Results are summarized in Table I, Fig. 2 and 3, whereby denotes the number of tasks solved to -precision, [m] the negative (hill climbing convention) accumulated pathlength, the number of network parameters (with ), [s] the total learning time and the number of restarts (out of ) for which all tasks could be solved.
Several observations can be made. First, VVCs clearly improve learning progress for both kinematic and dynamic vehicle models, see Fig. 2. Second, the inclusion of in feature vector clearly helps: compare s6 and s7 (solving all tasks) vs. s5 (omitting from and not solving all tasks). Note also the robustness for the former cases w.r.t. restarts. As indicates, for s6 all 125 tasks are solved for all restarts. Third, despite varying randomly according to Section IV-B, an evolution of parameters over and corresponding learning progress can be observed, see Fig. 2 (b) and (d). Fourth, significantly shorter training times are observed for the kinematic than for the dynamic vehicle model. This is because model simulations are much more complex for the latter, which accumulates to longer . Fifth, as Fig. 3 shows, within limited desirable steering trajectories with a small maximum lateral overshoot of only 0.41m (among all tasks) from optimal are learnt. Fig. 3 further indicates that monotonic velocity profiles are learnt. Finally, note that FSCNs with only one hidden layer and unit were sufficient to encode all training tasks.
VI-B Experiment 2: Effects of network size for FSCNs
Experiment 2 is characterized by (i) feature vector
and (ii) , (iii) a comparison of FSCNs for different numbers of hidden layers and units per hidden layer, (iv) , (v) attempting to solve a total of training tasks at once (i.e., without network scheduling), which are generated by gridding and with uniform spacings of 0.25m and 10km/h, respectively, for all , and .
Results are summarized in Table II. Several comments are made. First, note how quickly rises with increasing network size. Experiment 2 was set up such that none of the solutions in Table II solved all 585 tasks, (i) to better illustrate the effects of different network sizes on FSCN-performance, and (ii) to underline the role of in combination with Experiment 3, which treats the exact same training tasks.
Second, (9) differs from the 5-, 6- and 7D feature vectors discussed in Sect. II-C and Experiment 1. Feature vector according to (9) is our preferred choice, for three reasons. (i) While experimenting with different training task generation schemes, it was found that including - and -related components in made training task setup much more complicated for control tasks requiring lateral motion. It is difficult to find a gridding that guarantees feasibility, limits learning effort (limitedly small ), generalizes enough, and therefore does not require much manual tuning. In contrast, gridding over the components of (9) is relatively straightforward. (ii) Since the NN controller is ultimately envisioned in combination with obstacle avoidance feasibility checks along forward simulated trajectories in a receding horizon fashion, focusing on -related training tasks appears suitable and sufficient to encode lateral motion agility in NNs. This discussion is subject of ongoing work, see also Experiment 4 and Fig. 4 for illustration and Sect. VII. (iii) Furthermore, focus on enables control mirroring w.r.t. steering. This permits limiting training tasks to and thus using free training capacities to increase, e.g., lateral spacing resolution.
VI-C Experiment 3: Aspects of different network architectures
Training setup is identical to Experiment 2 except that is increased from 500 to 1000. Different network architectures (FSCNs, SCNs  and MLPs) are compared for 3 different network sizes ([4,1,2], [4,2,2] and [4,4,2]). Results are summarized in Table III. The following observations are made. First, increasing enables to solve all tasks. While for Experiment 2 and even the largest FSCN could not solve all 585 tasks, for Experiment 3 and every and even the smallest FSCN can solve all tasks. In contrast, none of the SCNs and only MLP-[4,4,2] could solve all . Note that the best -result is obtained for MLP-[4,4,2].
Second, significantly larger are observed for FSCNs in comparison to SCNs and MLPs. One reason is that for, e.g., FSCN-[4,1,2] 178 out of best parameter settings solved fewer than 500 tasks. In contrast, for MLP-[4,1,2] only 68 out of solved fewer than 500 tasks (despite none solving all 585). This was a recurring observation. Thus, on average per GPU-call MLPs solved more tasks, which reduced overall .
Third, for the same number of hidden layers and units per hidden layer, MLPs always have fewer parameters than SCNs, which themselves have fewer than FSCNs.
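The parameter count of a plain fully connected MLP follows directly from the units per layer, as sketched below (weights plus one bias per unit; the function name is hypothetical). For [4,1,2] this gives 9 parameters. The 10 parameters reported for MLP-[4,1,2] would be consistent with one additional learnt VVC scalar for the dynamic vehicle model, but that reading is an assumption. The additional linear and skip terms of SCNs and FSCNs are not counted here because their exact form is not reproduced in this text.

```cpp
#include <cstddef>
#include <vector>

// Number of weights and biases of a fully connected MLP, given the number of
// units per layer, e.g. {4, 1, 2} for input, one hidden layer and output.
std::size_t mlpParameterCount(const std::vector<std::size_t>& units) {
    std::size_t count = 0;
    for (std::size_t l = 0; l + 1 < units.size(); ++l) {
        // units[l] weights plus one bias for each unit of the next layer.
        count += (units[l] + 1) * units[l + 1];
    }
    return count;
}
```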
To summarize, based on the above observations, in particular on and , for our purpose of encoding motion primitives in NNs it is found that MLPs are still preferable as function approximators over both SCNs and their extension, FSCNs. MLPs are the focus in Experiment 4.
VI-D Experiment 4: Neural network scheduling on velocity
Experiment 4 is characterized by (i) comparing feature vector from (9) and its 5D extension including , (ii) , (iii) a comparison of MLP-[4,1,2] and MLP-[5,1,2], (iv) , (v) attempting to solve a total of training tasks at once and alternatively with network scheduling, whereby (vi) training tasks are generated by gridding and with uniform spacings of 0.25m and 10km/h, respectively, for all , and (with ).
The results are summarized in Tables IV, V and VI and Fig. 4. Several observations can be made. First, the benefits of network scheduling are illustrated. These include (i) faster overall learning time (accumulated 8.1 hours for MLP-[4,1,2] in Table V vs. 22.1 hours in Table IV), and (ii) the ability to detect difficult subsets of training tasks more quickly, which can consequently also be resolved faster after modification of hyperparameters or even NN parametrization. Second, the results of Table V suggest preferring a 4D over a 5D . The reduction of accumulated for all 10 subsets of tasks solved completely by both MLP-[4,1,2] and MLP-[5,1,2] is 15% for the former vs. the latter. Third, the capabilities of tiny NNs with only 10 parameters for MLP-[4,1,2] are demonstrated. These were tested both when (i) encoding all 14625 motion primitives at once, and (ii) learning subsets of 1125 tasks scheduled on vehicle velocity. It is remarkable that an MLP-[4,1,2] with 10 parameters (with ) can encode 13685 motion primitives to desired -precision. Note that for the scheduling solution overall 13 MLPs are learnt, with each having 10 parameters. Fourth, it is stressed that the training tasks of Experiment 4 are not easy. Since these are generated uniformly, in the extreme case, the steering angle is initialized as at an initial km/h. This resulted in learnt trajectories with a maximum lateral overshoot of 62.1m before recovery of the desired . For visualization see also Fig. 4 where the lateral maximum overshoot is slightly more than 10m for an initial m/h. The left frame of Fig. 4 is displayed to underline that for limited learning time only locally optimal trajectories (according to the shortest path criterion subject to actuator and system constraints) are learnt. Trajectories are approximately balanced, but not entirely.
Not displayed for brevity, but interesting to observe, was that for and trajectories were learnt that first drive in reverse for some time before accelerating forward. This is plausible given the initial negative tire heading and the pathlength minimization objective for and . Finally, as indicated in Table V and discussed above through the lateral overshoots, learning was observed to be more difficult for the higher velocity tasks. Here, tasks were set up (i) to demonstrate that difficult motion primitives can still be encoded, and (ii) to motivate future work on automated extraction of meaningful motion primitives from real-world driving data, especially for higher vehicle velocities.
VII Conclusion
Several methods were presented for efficient encoding of motion primitives in neural networks. In particular, (i) specific virtual velocity constraints, (ii) neural network scheduling based on vehicle velocity, (iii) training task setups dismissing -related components and focusing on for a 4D feature vector selection, and (iv) the capabilities of tiny neural networks were promoted. Furthermore, (i) a comparison of a 3-states-2-controls kinematic and a 16-states-2-controls dynamic model, (ii) 3 feedforward neural network architectures including weighted skip connections, and (iii) implementation details of model-based and gradient-free training using 1 GPU were discussed. Findings were illustrated by means of 4 simulation experiments.
The main subject of future work is closed-loop evaluation. This comprises robustness analysis for scenarios unseen during training and for vehicle model mismatches, as well as analysis of the combination with a reference setpoint selector with obstacle avoidance inequality checks of closed-loop forward simulated trajectories in a receding horizon fashion. Here, the preferred 4D feature vector with focus on -selection is believed to enable the design of simple recursive waypoint selectors. Waypoints may also be concatenated to generate trajectory-trees that then permit very long planning horizons.
References
-  M. G. Plessen, “Automating vehicles by deep reinforcement learning using task separation with hill climbing,” arXiv preprint arXiv:1711.10785, 2017.
-  T. Salimans, J. Ho, X. Chen, S. Sidor, and I. Sutskever, “Evolution strategies as a scalable alternative to reinforcement learning,” arXiv preprint arXiv:1703.03864, 2017.
-  F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley, and J. Clune, “Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning,” arXiv preprint arXiv:1712.06567, 2017.
-  J. Lehman, J. Chen, J. Clune, and K. O. Stanley, “ES is more than just a traditional finite-difference approximator,” arXiv preprint arXiv:1712.06568, 2017.
-  E. Frazzoli, M. A. Dahleh, and E. Feron, “A hybrid control architecture for aggressive maneuvering of autonomous helicopters,” in IEEE Conference on Decision and Control, vol. 3, pp. 2471–2476, 1999.
-  M. McNaughton, C. Urmson, J. M. Dolan, and J.-W. Lee, “Motion planning for autonomous driving with a conformal spatiotemporal lattice,” in IEEE Conference on Robotics and Automation, pp. 4889–4895, 2011.
-  M. Srouji, J. Zhang, and R. Salakhutdinov, “Structured control nets for deep reinforcement learning,” arXiv preprint arXiv:1802.08311, 2018.
-  C. M. Bishop et al., Neural networks for pattern recognition. Oxford University Press, 1995.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
-  D. A. Pomerleau, “Alvinn: An autonomous land vehicle in a neural network,” in Advances in Neural Information Processing Systems, pp. 305–313, 1989.
-  M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, et al., “End to end learning for self-driving cars,” arXiv preprint arXiv:1604.07316, 2016.
-  K. Kant and S. W. Zucker, “Toward efficient trajectory planning: The path-velocity decomposition,” The International Journal of Robotics Research, vol. 5, no. 3, pp. 72–89, 1986.
-  T. Lozano-Pérez and L. P. Kaelbling, “A constraint-based method for solving sequential manipulation planning problems,” in IEEE Conference on Intelligent Robots and Systems, pp. 3684–3691, 2014.
-  S. Srivastava, E. Fang, L. Riano, R. Chitnis, S. Russell, and P. Abbeel, “Combined task and motion planning through an extensible planner-independent interface layer,” in IEEE Conference on Robotics and Automation, pp. 639–646, 2014.
-  X. Hu, L. Chen, B. Tang, D. Cao, and H. He, “Dynamic path planning for autonomous driving on various roads with avoidance of static and moving obstacles,” Mechanical Systems and Signal Processing, vol. 100, pp. 482–500, 2018.
-  J. Liu, P. Hou, L. Mu, Y. Yu, and C. Huang, “Elements of effective deep reinforcement learning towards tactical driving decision making,” arXiv preprint arXiv:1802.00332, 2018.
-  C.-J. Hoel, M. Wahde, and K. Wolff, “An evolutionary approach to general-purpose automated speed and lane change behavior,” in IEEE International Conference on Machine Learning and Applications, pp. 743–748, 2017.
-  E. Velenis, E. Frazzoli, and P. Tsiotras, “Steady-state cornering equilibria and stabilisation for a vehicle during extreme operating conditions,” International Journal of Vehicle Autonomous Systems, vol. 8, no. 2-4, pp. 217–241, 2010.
-  E. Bakker, L. Nyborg, and H. B. Pacejka, “Tyre modelling for use in vehicle dynamics studies,” tech. rep., SAE Technical Paper, 1987.
-  S. M. Savaresi, C. Poussot-Vassal, C. Spelta, O. Sename, and L. Dugard, Semi-active suspension control design for vehicles. Elsevier, 2010.
-  J. Svendenius, “Tire modeling and friction estimation,” PhD Thesis, 2007.
-  H. P. Williams, Model building in mathematical programming. John Wiley & Sons, 2013.
-  M. G. Plessen, “Trajectory planning of automated vehicles in tube-like road segments,” in IEEE Conference on Intelligent Transportation Systems, pp. 83–88, 2017.
-  S. K. Park and K. W. Miller, “Random number generators: good ones are hard to find,” Communications of the ACM, vol. 31, no. 10, pp. 1192–1201, 1988.
-  G. E. Box, M. E. Muller, et al., “A note on the generation of random normal deviates,” The Annals of Mathematical Statistics, vol. 29, no. 2, pp. 610–611, 1958.