I Introduction and Problem Formulation
I-A Motivation of a neural network approach for control
Control methods that permit small sampling times, high-dimensional nonlinear nonconvex system models, and long planning horizons are very desirable. This also holds for autonomous driving. To underline the complexity, for a sampling time of 0.01s and a look-ahead time of 2s, a time-based prediction horizon of 200 sampling instances is needed. This motivates the offline encoding of motion primitives in neural networks (NNs), before their employment online in combination with a reference waypoint selector. The main disadvantage is that typically high computational power is needed for the encoding of a large number of motion primitives in NNs.
I-B Problem formulation and contribution
The problem addressed in this paper is to develop methods for the efficient encoding of motion primitives in NNs. For clarity, it is stressed that the focus here is solely on aspects improving the encoding process, in particular for dynamic vehicle models. Not discussed here are the recursive-feasibility-related problems that arise in closed-loop control, for scenarios unseen during training, and for model mismatches (subject of ongoing work). The following main contributions are made. First, specific virtual velocity constraints (VVCs) are proposed. It is motivated why these must be handled differently for kinematic and dynamic vehicle models. For the latter, a specific 1-scalar network extension is suggested. Second, network scheduling is proposed, whereby vehicle velocity is used as the scheduling variable. Third, various feature vector selections are discussed, before a preferred 4D choice is motivated. Fourth, 3 feedforward structures are compared, including weighted skip connections. Fifth, details of the GPU implementation and a full 16-states-2-controls dynamic vehicle model are discussed. Sixth, the benefits and capabilities of tiny NNs with as few as 10 parameters are illustrated.
I-C Related work
Regarding the training method, this paper is based on the TSHC algorithm (task separation with hill climbing) [1], which itself was motivated by the ES algorithm (evolution strategies) [2]. Note that ES can be considered a gradient-based algorithm since it performs stochastic gradient descent via an operation similar to a finite-difference approximation of the gradient [3], however, generating more robust policies [4]. In contrast, TSHC is a truly gradient-free algorithm (hill climbing), which is designed specifically for the deterministic encoding of motion primitives in NNs. This paper differs from [1] according to the 6 contributions listed above.

Regarding control by motion primitives, this approach differs from methods derived from [5], which require an online search of lookup tables, e.g., using a GPU for exhaustive search [6]. In contrast, when encoding motion primitives in a NN, explicit search is not required. Instead, a feature vector relating the current state to a next desired state (waypoint) is fed to the network to generate control commands.
Regarding NN architectures, this paper extends recently proposed SCNs (structured control nets) [7], which combine a linear term mapping from network input to control channels additively with a nonlinear term resulting from a multilayer perceptron (MLP). In contrast, one of the discussed network architectures adds linearly weighted skip connections between all downstream layers. Skip connections are not new [8], but popular for learning very deep architectures [9].

Regarding vision-based end-to-end learning approaches [10, 11], the proposed approach fundamentally differs in that it is founded on model-based training. This offers the advantage that certificates about learnt control performance can be provided by statement of (i) the vehicle model used for training, and (ii) the encoded motion primitives (training tasks) and their associated low-dimensional feature vectors. In contrast, providing equivalent certificates for vision-based end-to-end learning methods is in general much more difficult due to the high dimensionality of images.
Finally, it is noted that a closed-loop control system based on encoded motion primitives must always be seen in combination with a reference setpoint selector that determines the waypoints or features to be fed to the NN, which must account for obstacles and thus primarily solve nonconvex optimization problems. For exemplary approaches see [12, 13, 14, 15, 16, 17]. Explicit reference setpoint selection as well as a preceding perception module (fusing proprio- and exteroceptive sensor measurements) are not the focus of this paper.
II System Model
II-A Kinematic 3-states vehicle model
The equations of motion of a well-known simple kinematic vehicle model are ẋ = v cos(ψ), ẏ = v sin(ψ) and ψ̇ = (v/l) tan(δ), with wheelbase l (in simulations 2.69m). This model has 3 states (position coordinates x and y, and heading ψ) and 2 controls (steering angle δ and velocity v). Both controls are additionally constrained by absolute and rate actuation limits to emulate the steering and 0-100/100-0 km/h ac-/deceleration performance of the dynamic vehicle model described next.
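Assuming an explicit-Euler discretization at the stated sampling time, one simulation step of this kinematic model can be sketched as follows (struct and function names are illustrative, not taken from the paper's code):

```cpp
#include <cmath>

// One explicit-Euler step of the kinematic bicycle model of Sect. II-A:
//   xdot = v*cos(psi), ydot = v*sin(psi), psidot = (v/l)*tan(delta),
// with wheelbase l = 2.69 m and sampling time Ts = 0.01 s as in the paper.
struct KinState { double x, y, psi; };

KinState kinematicStep(KinState s, double delta, double v,
                       double l = 2.69, double Ts = 0.01) {
    KinState n;
    n.x   = s.x   + Ts * v * std::cos(s.psi);
    n.y   = s.y   + Ts * v * std::sin(s.psi);
    n.psi = s.psi + Ts * (v / l) * std::tan(delta);
    return n;
}
```

Absolute and rate limits on delta and v would be applied before each such step.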
II-B Dynamic 16-states vehicle model
Equations of motion of a 16-states dynamic vehicle model are derived by extending the bicycle model [18] (used as the starting point since it discusses a high-dimensional dynamic vehicle model and provides all hyperparameters for reproduction) by aerodynamic friction forces, roll, yaw and four-wheel dynamics, yielding the sixteen coupled differential equations (1a)-(1p).
Note that (1) is based on the Pacejka "magic formula" tyre model [19]. In addition, [20] and [21] were used for its derivation. Because of the general importance of models for all model-based control and reinforcement learning algorithms, the entire C++ code excerpt is provided in Appendix A, including all system parameters, which are extended from [18] and modified (among others) such that a 0-100/100-0 km/h ac-/deceleration performance of 7.4/3.8s is obtained.
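The magic formula itself has the well-known four-coefficient form; a minimal sketch, with placeholder coefficients B, C, D, E (the paper's actual tyre parameters are listed with the Appendix-A code), is:

```cpp
#include <cmath>

// Pacejka "magic formula" tyre model [19] used in (1): force as a function
// of slip. B (stiffness), C (shape), D (peak) and E (curvature) are
// placeholders here, not the paper's calibrated values.
double pacejkaForce(double slip, double B, double C, double D, double E) {
    double Bs = B * slip;
    return D * std::sin(C * std::atan(Bs - E * (Bs - std::atan(Bs))));
}
```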
If-else distinctions are convenient for model formulations and imply logical constraints. In the context of optimal control, these can be translated into integer linear inequalities [22], yielding mixed-integer optimization problems. Note that logical constraints require no special treatment when encoding motion primitives in NNs via gradient-free learning.
Control commands are discussed next. Suppose a continuous control output of the network at sampling time t. Then, for the dynamic vehicle model, the two controls are obtained according to
(2a)  
(2b) 
before the torque command is distributed to drive and brake torques at the different wheels. In contrast, for the kinematic model only (2a) is used likewise, while (2b) is replaced by a saturation between the minimum and maximum velocity. For the dynamic model, both acceleration and deceleration are controlled via a single torque command (instead of, e.g., distinguishing a front-wheel drive command and 4 brake commands). This is done to compare kinematic and dynamic models with both having 2 controls. It implies that physical acceleration and braking actuators are never activated simultaneously.
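The combined absolute and rate actuation limits mentioned above can be sketched as follows (limit values and names are illustrative placeholders, not the paper's):

```cpp
#include <algorithm>
#include <cmath>

// Apply rate and absolute actuator constraints: the raw command u is first
// rate-limited w.r.t. the command applied at the previous sampling time,
// then saturated to its absolute bounds.
double constrainActuator(double u, double uPrev,
                         double uMin, double uMax, double duMax, double Ts) {
    double lo = uPrev - duMax * Ts;  // max decrease within one sampling time
    double hi = uPrev + duMax * Ts;  // max increase within one sampling time
    u = std::max(lo, std::min(hi, u));
    return std::max(uMin, std::min(uMax, u));  // absolute saturation
}
```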
II-C Discussion of vehicle model and feature vector selection
Three more comments about vehicle models are made. First, higher-fidelity vehicle models offer the potential of reducing control delays to a minimum. Therefore, vehicles may be modeled to a degree such that the lowest-level actuation commands are controlled, e.g., pulse-width modulated (PWM) signals. In contrast, simple kinematic models usually require additional cascaded low-level control for the mapping from PWM to velocity. For example, see [23] for spatial-based velocity control using a kinematic model.
Second, the multiple system parameters which typically characterize higher-dimensional dynamic vehicle models offer means for robustifying control by encoding motion primitives for different system parameter settings.
Third, the choice of the feature vector fed to the NN controller is not necessarily dictated by the vehicle model. Various feature vectors are considered; in this paper, for example, a 5D, 6D and 7D selection. The first relates the states at time t to the desired goal pose. The latter two options (6D and 7D) additionally append one or two velocity-related components, respectively. Normalization constants are employed, which throughout this paper are selected in SI units. Two velocity components are distinguished to account for their different effects on the dynamics. Note that the first two options are identical for both the kinematic and the dynamic vehicle model. However, the third option varies due to the different interpretation of velocity for the two models. The dimension of the feature vector may influence the number of training tasks. This is since, without a priori knowledge about meaningful training tasks, the simplest method to generate training tasks is to grid over the elements of the feature vector.
III Neural Network Architecture
The processing of the feature vector by NNs is discussed.
III-A Fully Structured Control Nets (FSCNs)
Fully structured control nets (FSCNs) are introduced as
(3a)  
(3b)  
(3c)  
(3d) 
with the layer weights, biases and skip-connection weights as the parameters to learn. For the remainder of this paper, all parameters to be learnt are summarized in a vector, initialized by small zero-mean Gaussian noise (with a standard deviation of 0.001). For an illustration of the concept of FSCNs see also Fig.
1. FSCN design choices are the number of layers and the number of units per layer. Note that the input and output dimensions are fixed as the dimensions of the feature vector and the controls, respectively.

III-B Discussion of neural network architectures
Five remarks are made. First, FSCNs (3) extend recently proposed structured control nets (SCNs) [7], which for comparison combine a nonlinear MLP term and one linear term from the network input additively at the output. Thus, in contrast to (3), SCNs only add one linear term from network input to output. In Sect. VI-C, both architectures plus the standard multilayer perceptron (MLP) are compared. MLPs themselves are identical to SCNs minus the additional linear term.
Second, weighted skip connections are introduced as in (3b) and (3d). Since these weights are initialized by small zero-mean Gaussian noise, FSCNs initially resemble MLPs. Alternatively, initialization as identity mappings (plus small Gaussian noise) rather than around zero was also tested (then emphasizing the skip aspect more), but was not found to accelerate learning in the early training phase.
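A forward pass consistent with this description of FSCNs can be sketched as follows; the exact indexing of (3) is not reproduced and biases are omitted for brevity, so this is an illustrative reconstruction:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// FSCN sketch: every layer receives linearly weighted skip connections
// from *all* upstream signals (the feature input s and all previous layer
// outputs); hidden layers apply tanh, the output layer stays affine.
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major dense matrix

Vec matVec(const Mat& A, const Vec& x) {
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

// layers[l] holds one weight matrix per upstream signal seen so far.
Vec fscnForward(const Vec& s, const std::vector<std::vector<Mat>>& layers) {
    std::vector<Vec> signals{s};  // input plus all layer outputs so far
    for (std::size_t l = 0; l < layers.size(); ++l) {
        Vec z(layers[l][0].size(), 0.0);
        for (std::size_t k = 0; k < signals.size(); ++k) {
            Vec part = matVec(layers[l][k], signals[k]);
            for (std::size_t i = 0; i < z.size(); ++i) z[i] += part[i];
        }
        if (l + 1 < layers.size())  // hidden layers: tanh activation
            for (double& zi : z) zi = std::tanh(zi);
        signals.push_back(z);
    }
    return signals.back();
}
```

With all skip-weight matrices at zero, this reduces to a plain MLP, matching the initialization remark above.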
Third, in this paper small NNs (with few parameters) that still enable the encoding of all motion primitives (training tasks) are desired. Small NNs are preferable since they (i) permit faster network evaluation, which is favorable for both faster offline training and online control execution, and (ii) reduce hardware storage requirements for parameters. For perspective, in [3] huge 4M+ parameter networks are mentioned for playing Atari games (requiring image processing). In contrast, we here seek to reduce the number of parameters as much as possible. The benefits of small networks become most apparent when training with limited computational resources. The effects on training times are demonstrated in the experiments of Sect. VI-B.
Fourth, in contrast to MLPs, boundedness of the output in (3d) cannot in general be guaranteed. This is because of its affine term. In experiments, capping and an additional tanh output activation were tested but found not to accelerate learning (on the contrary). Note that bounded commands are ultimately ensured through the physical actuator absolute and rate constraints.
Fifth, as will be shown, for the encoding of motion primitives based on dynamic vehicle models that do not control velocity directly, a NN extension was found to be very useful for the handling of spatial virtual velocity constraints (VVCs). These constraints and the corresponding network extension are presented next and subsequently applied to all 3 NNs discussed: FSCNs, SCNs and MLPs.
IV Training Algorithm
This section states key aspects of the proposed algorithm for the efficient encoding of motion primitives in the above NNs.
IV-A Virtual Velocity Constraints and Network Extension
The notion of virtual velocity constraints (VVCs) is adopted from [1, Sect. III-B]. However, due to the dynamic vehicle model used here, and to remove one hyperparameter, the VVCs are modified as follows. First, let
(4) 
with the desired velocity being the output of a NN for both the case of training on a kinematic and on a dynamic vehicle model, and let
(5) 
Second, if the desired velocity exceeds the upper or falls below the lower bound of this corridor, project it onto the corresponding bound. Third, and now differentiating between the cases of training on a kinematic and on a dynamic vehicle model, set
(6) 
and
(7) 
for the former and the latter case, respectively. Here, the nonlinear activation is such that it vanishes at zero velocity error, and a scalar parameter must be learnt, however, only when training on a dynamic vehicle model. Ultimately, the resulting commands are subjected to the physical actuator absolute and rate constraints, accounting for the commands applied at the previous sampling time.
Several comments are made. First, the VVCs of (5) are spatially independent of goal proximity. This has several benefits: (i) due to the margin, feasibility can be guaranteed even for training tasks that demand, e.g., only a small lateral displacement of the vehicle for a desired starting and end velocity of 0km/h, and (ii) no hyperparameters and additional measures are required to threshold spatial goal proximity.
Second, the margin around the target velocity is a heuristic choice. In general, it may be regarded as a hyperparameter. However, here it is considered fixed and is to be interpreted as a tolerably small velocity variation (for over-/undershoots). The velocity corridor provided by (5) encourages the target velocity to always be approached quickly and monotonously. This property is (i) in general desirable, especially for throughput maximization and when it is encouraged and permitted by traffic to drive at speed limits (e.g., for urban driving), and (ii) encourages at most one velocity-sign change for the reaching of the goal velocity and state.

Third, VVCs can be regarded as a filter for the output of a NN. As indicated in (7), when training on the dynamic vehicle model a scalar must be learnt in addition to the other NN parameters. Since velocity is not controlled directly, the encoding of motion primitives based on dynamic vehicle models is more complicated. In preliminary tests a variety of alternative filter functions were evaluated. The simple form of (7) was found to be suitable. It requires just one scalar parameter and resembles a P-controller with a nonlinear activation function, which vanishes once the desired velocity is reached.

Fourth, and to summarize, VVCs are motivated (i) to accelerate learning, and (ii) to avoid velocity over-/undershoots until desired goal poses are reached. The benefits of VVCs for training on both a kinematic and a dynamic vehicle model are illustrated in the experiments of Sect. VI-A. They are found to be essential for the efficient encoding of motion primitives in NNs, especially when training on sparse rewards as motivated in [1, Sect. III-B].
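The corridor projection and the P-controller-like filter can be sketched as follows; since the exact forms of (4)-(7) are not reproduced above, both functions are illustrative reconstructions (names and the corridor shape are assumptions):

```cpp
#include <algorithm>
#include <cmath>

// VVC sketch: the velocity suggested by the network is projected onto a
// corridor spanning from the current towards the goal velocity, widened by
// the margin eps; this encourages a monotonic approach of the target.
double projectToCorridor(double vNet, double vCur, double vGoal, double eps) {
    double lo = std::min(vCur, vGoal) - eps;
    double hi = std::max(vCur, vGoal) + eps;
    return std::max(lo, std::min(hi, vNet));
}

// For the dynamic model, one learnt scalar c maps the velocity error to a
// torque command in P-controller fashion with a tanh activation; the
// command vanishes once the projected desired velocity is reached.
double torqueFromVVC(double vDes, double v, double c) {
    return c * std::tanh(vDes - v);  // = 0 when v == vDes
}
```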
IV-B Task Separation with Hill Climbing
The gradient-free TSHC algorithm with refinement step [1] is used for training, whereby the perturbation hyperparameter is selected randomly according to a uniform distribution at every parameter iteration, to reduce the number of difficult-to-select hyperparameters (which would occur when instead using a fixed or adaptive perturbation scale). This is relevant for the generation of the parameter solution candidates
(8) 
with zero-mean Gaussian distributed perturbations for all candidates.

To summarize, the hyperparameters remaining for the training algorithm are the number of restarts, the maximum number of iterations per restart, the number of parameter candidates, the maximum perturbation scale, and the maximum number of permitted timesteps per training task. The number of restarts and the maximum number of iterations per restart may be selected according to the desired total training time. When training on a GPU, the number of candidates may be selected as the product of the number of blocks and threads per block used for asynchronous training. In general, a feasibility guarantee for all motion primitives can be given since training is conducted obstacle-free, which encourages selecting the maximum number of permitted timesteps to solve a training task large. On the other hand, an unnecessarily conservative choice prolongs training. Tolerances indicate when a specific training task goal pose is reached.
The above comments underline a benefit of training by TSHC, namely its simplicity and the interpretability of its hyperparameters. Assuming large computational power is available, (i) the numbers of restarts, iterations, candidates and permitted timesteps should all be large, and (ii) the tolerances should be small. Then, the only tunable hyperparameter remaining is the maximum perturbation scale. In practice, it was found that it should be selected sufficiently large to enable enough exploration in the NN parameter space.
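The candidate generation of (8) with one randomly drawn perturbation scale per iteration can be sketched as follows (std::mt19937 is used here only for brevity; the paper's own implementation is library-free, see Sect. V):

```cpp
#include <algorithm>
#include <random>
#include <vector>

// TSHC candidate generation sketch: at every parameter iteration one
// perturbation scale sigma is drawn uniformly from (0, sigmaMax], then n
// candidates are formed by adding zero-mean Gaussian noise of that scale
// to the current best parameter vector theta.
std::vector<std::vector<double>> perturbCandidates(
        const std::vector<double>& theta, int n, double sigmaMax,
        std::mt19937& rng) {
    std::uniform_real_distribution<double> uni(0.0, sigmaMax);
    double sigma = std::max(uni(rng), 1e-12);  // one scale per iteration
    std::normal_distribution<double> gauss(0.0, sigma);
    std::vector<std::vector<double>> cands(n, theta);
    for (auto& c : cands)
        for (double& w : c) w += gauss(rng);  // theta + sigma * epsilon
    return cands;
}
```

Hill climbing then keeps the best-scoring candidate only if it improves on the incumbent.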
IV-C Neural Network Scheduling based on Vehicle Velocity
The TSHC algorithm used to encode a set of motion primitives in a NN was discussed above. Now it is proposed to partition a large set of motion primitives into subsets of motion primitives scheduled on vehicle velocity. Then, NNs may be learnt separately for each of these subsets by separate applications of the TSHC algorithm. As demonstrated in the experiments of Sect. VI-D, this offers the advantage that learning effort can be adapted to the difficulty of the corresponding subsets of training tasks, e.g., using different network parametrizations and hyperparameters. Consequently, the overall time to learn the entire set of motion primitives can be reduced significantly. A disadvantage is a natural increase in the total number of network parameters. However, the former advantages clearly outweigh the latter disadvantage. This is since, as will be shown in the experiments of Sect. VI, tiny NNs can be used to encode many motion primitives.
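Online selection of the velocity-scheduled network can then be as simple as binning the current vehicle velocity; a sketch assuming the 10 km/h bin width used for the task gridding of Sect. VI (function name is ours):

```cpp
#include <cstddef>

// Map the current vehicle velocity to the index of the separately trained
// network responsible for its velocity bin; out-of-range velocities are
// clamped to the first or last network.
std::size_t scheduleNetwork(double vKmh, double binWidthKmh,
                            std::size_t numNets) {
    if (vKmh <= 0.0) return 0;
    std::size_t idx = static_cast<std::size_t>(vKmh / binWidthKmh);
    return idx < numNets ? idx : numNets - 1;
}
```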
V Implementation details
All methods are implemented in Cuda C++. Training is conducted on 1 GPU. Three more comments are made. First, a self-imposed guideline was to implement library-free code for the NN controller (such that, in principle, it could then run library-free on embedded hardware). Therefore, the tanh function is approximated by an implementation of Lambert's continued fraction expansion.
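Lambert's continued fraction for tanh, tanh(x) = x/(1 + x²/(3 + x²/(5 + ...))), can be truncated at a fixed depth; a sketch (the paper's truncation depth is not stated, and for large |x| more levels would be needed):

```cpp
#include <cmath>

// Library-free tanh approximation via Lambert's continued fraction,
// truncated after a fixed number of levels. Converges very quickly for
// moderate |x|; the chosen depth of 12 is an assumption, not the paper's.
double tanhCF(double x, int levels = 12) {
    double x2 = x * x;
    double f = 2.0 * levels + 1.0;       // deepest denominator
    for (int k = levels - 1; k >= 1; --k)
        f = 2.0 * k + 1.0 + x2 / f;      // ... 5 + x^2/(7 + ...)
    return x / (1.0 + x2 / f);
}
```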
Second, as outlined in Sect. IV-B, parameter candidates are generated by affine perturbations with zero-mean Gaussian noise and spherical variance. Therefore, uniform random numbers are first generated according to [24], before Gaussian random variables are generated based on the Box-Muller method [25]. One instance of the latter method simultaneously generates two scalar Gaussian variables. Both are used to generate consecutive entries of the perturbation vector. This enables library-free code. Furthermore, the same methods for uniform and Gaussian random variables are used on both the GPU and the CPU host. Thus, only the current best parameters from (8) and scalar random seeds need to be passed to the GPU kernels (workers), before parameter candidates are then generated directly on the GPU. A ranking of the performance of these workers and knowledge of their seed numbers then permits reconstructing the best candidate and its parameters on the CPU host.

Third, for the final experiments each GPU kernel implements one worker such that parameter candidates are tested in parallel. For completeness, nested parallelization, with training tasks solved in parallel for each parameter candidate, is in general also possible. This was also tested and implemented using Cuda's atomicAdd function and an algebraic mapping to reconstruct a specific training task from a kernel's thread index. For our scenario with 1 GPU this method did, however, not accelerate training. In contrast, the preferred method of testing parameter candidates in parallel implies that all training tasks are tested for each parameter candidate and GPU kernel, before a cumulative score is returned to the CPU host. Training tasks are generated directly on the GPU within nested for-loops instead of being precomputed, to minimize memory requirements.
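A sketch of this library-free random number pipeline, with the standard Park-Miller "minimal standard" constants (function names are ours):

```cpp
#include <cmath>

// Park-Miller generator [24]: multiplicative congruential generator with
// multiplier 16807 = 7^5 and modulus 2^31 - 1. Never returns 0 for a seed
// in [1, 2^31 - 2], so the log() in the Box-Muller step below is safe.
unsigned long long parkMiller(unsigned long long& seed) {
    seed = (16807ULL * seed) % 2147483647ULL;
    return seed;
}

double uniform01(unsigned long long& seed) {
    return static_cast<double>(parkMiller(seed)) / 2147483647.0;  // (0, 1)
}

// Box-Muller transform [25]: one instance turns two uniforms into two
// Gaussian deviates at once, written into consecutive entries z0, z1.
void boxMullerPair(unsigned long long& seed, double& z0, double& z1) {
    const double kPi = 3.14159265358979323846;
    double u1 = uniform01(seed), u2 = uniform01(seed);
    double r = std::sqrt(-2.0 * std::log(u1));
    z0 = r * std::cos(2.0 * kPi * u2);
    z1 = r * std::sin(2.0 * kPi * u2);
}
```

Because the sequence is fully determined by the scalar seed, the host can replay any worker's perturbation from its seed alone, as described above.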
VI Simulation Experiments
Tolerances indicating the reaching of a desired goal pose are set in m and km/h, respectively. As will be discussed, an additional tolerance is relevant only for Experiment 1.
VI-A Experiment 1: Effects of the feature vector, vehicle model and VVCs
TABLE I: Experiment 1 results. Columns: tasks solved, accumulated pathlength [m], number of network parameters, total learning time [s], and number of restarts for which all tasks could be solved.

16-states dynamic model:
  (s5, VVCs)     79     -      30  320.9   0
  (s5, w/o)      68     -      29  341.9   0
  (s6, VVCs)    125  1959.3    34  379.1  10
  (s6, w/o)      79     -      33  330.0   0
  (s7, VVCs)    125  1959.3    38  342.5   9
  (s7, w/o)     100     -      37  330.5   0

3-states kinematic model:
  (s5, VVCs)    118     -      29   51.3   0
  (s5, w/o)      77     -      29   51.9   0
  (s6, VVCs)    125  1956.3    33   52.4  10
  (s6, w/o)      88     -      33   54.8   0
Experiment 1 is characterized by a comparison of the 5D, 6D and 7D feature vectors from Section II-C, here abbreviated as s5, s6 and s7; FSCNs with 1 hidden layer and 1 hidden unit, i.e., [5,1,2], [6,1,2] and [7,1,2] (with each number in brackets indicating the number of units per layer); solving all tasks at once (i.e., without network scheduling); and training tasks generated by gridding over initial and goal velocities (capped between 0 and 120km/h). This yields a total of 125 training tasks, which are selected to analyze longitudinal control (to focus on VVC effects) and to ensure a maximum lookahead time of less than 2.5s. Thus, for the selected maximum number of permitted timesteps (in combination with a sampling time of 0.01s), all training tasks are guaranteed to be learnable.
Results are summarized in Table I and Figs. 2 and 3, reporting the number of tasks solved to tolerance, the (by hill climbing convention negative) accumulated pathlength in m, the number of network parameters, the total learning time in s, and the number of restarts for which all tasks could be solved.
Several observations can be made. First, VVCs clearly improve learning progress for both kinematic and dynamic vehicle models, see Fig. 2. Second, the inclusion of velocity information in the feature vector clearly helps: compare s6 and s7 (solving all tasks) vs. s5 (omitting it and not solving all tasks). Note also the robustness of the former cases w.r.t. restarts: for s6, all 125 tasks are solved for all restarts. Third, despite the perturbation scale varying randomly according to Section IV-B, an evolution of parameters over the iterations and a corresponding learning progress can be observed, see Fig. 2 (b) and (d). Fourth, significantly faster training times are observed for the kinematic in comparison to the dynamic vehicle model. This is since model simulations are much more complex for the latter, which accumulates to longer learning times. Fifth, as Fig. 3 shows, desirable steering trajectories are learnt, with a small maximum lateral overshoot from optimal of only 0.41m (among all tasks). Fig. 3 further indicates that monotonous velocity profiles are learnt. Finally, note that FSCNs with only one hidden layer and unit were sufficient to encode all training tasks.
VI-B Experiment 2: Effects of network size for FSCNs
Experiment 2 is characterized by (i) the 4D feature vector
(9) 
(ii) training on the dynamic vehicle model, (iii) a comparison of FSCNs for different numbers of hidden layers and units per hidden layer, and (iv) attempting to solve a total of 585 training tasks at once (i.e., without network scheduling), which are generated by gridding the lateral displacement and the velocities with uniform spacings of 0.25m and 10km/h, respectively.
TABLE II: Experiment 2 results, reported as (tasks solved)/(number of parameters)/(learning time [s]), for FSCNs with the given number of hidden layers (rows) and units per hidden layer (columns).

      1              2              4               8
1  494/26/3674    531/39/4442    550/65/4744     548/117/6278
2  445/35/3687    536/61/4476    533/125/7073    554/301/10026
3  479/45/4215    496/87/5780    538/201/8030    535/549/13159
Results are summarized in Table II. Several comments are made. First, note how quickly the number of parameters and the learning time rise with increasing network size. Experiment 2 was set up such that none of the solutions in Table II solved all 585 tasks, to (i) better illustrate the effects of different network sizes on FSCN performance, and to (ii) underline the role of the iteration budget in combination with Experiment 3, which treats the exact same training tasks.
Second, (9) differs from the 5D, 6D and 7D feature vectors discussed in Sect. II-C and Experiment 1. The feature vector according to (9) is our preferred choice. This is motivated for three reasons. (i) While experimenting with different training task generation schemes, it was found that including goal-pose-related position and heading components in the feature vector made training task setup much more complicated for control tasks requiring lateral motion. A gridding that guarantees feasibility, simultaneously enables limiting the learning effort, generalizes enough, and therefore does not require much manual tuning is difficult. In contrast, gridding over the components of (9) is relatively straightforward. (ii) Since the NN controller is ultimately envisioned in combination with obstacle avoidance feasibility checks along forward-simulated trajectories in a receding horizon fashion, focusing on lateral-displacement-related training tasks appears suitable and sufficient to encode lateral motion agility in NNs. This discussion is subject of ongoing work, see also Experiment 4 and Fig. 4 for illustration and Sect. VII. (iii) Furthermore, the focus on lateral displacement enables control mirroring w.r.t. steering. This permits limiting training tasks to one lateral sign and thus using the freed training capacities to increase, e.g., the lateral spacing resolution.
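The mirroring trick can be sketched as follows, with evalNet a placeholder for a trained network mapping features to a (steering, torque) pair (all names are illustrative):

```cpp
#include <utility>

// Control mirroring w.r.t. steering: since the vehicle dynamics are
// symmetric, training can be restricted to one lateral sign; a goal on the
// other side is handled by mirroring the lateral feature, querying the
// network, and negating the returned steering command.
template <typename Net>
std::pair<double, double> mirroredControl(Net evalNet, double yLat,
                                          double vCur, double vGoal) {
    bool mirror = yLat < 0.0;
    auto uc = evalNet(mirror ? -yLat : yLat, vCur, vGoal);
    if (mirror) uc.first = -uc.first;  // negate steering; keep torque
    return uc;
}
```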
VI-C Experiment 3: Aspects of different network architectures
The training setup is identical to Experiment 2, except that the iteration budget is increased from 500 to 1000. Different network architectures (FSCNs, SCNs [7] and MLPs) are compared for 3 different network sizes ([4,1,2], [4,2,2] and [4,4,2]). Results are summarized in Table III. The following observations are made. First, the increased budget enables solving all tasks. While for Experiment 2 even the largest FSCN could not solve all 585 tasks, for Experiment 3 even the smallest FSCN can solve all tasks. In contrast, none of the SCNs and only MLP[4,4,2] could solve all tasks. Note that the best result is obtained for MLP[4,4,2].
Second, significantly larger learning times are observed for FSCNs in comparison to SCNs and MLPs. One reason is that for, e.g., FSCN[4,1,2], 178 of the iteration-best parameter settings solved fewer than 500 tasks. In contrast, for MLP[4,1,2] only 68 solved fewer than 500 tasks (despite none solving all 585). This was a recurring observation. Thus, on average per GPU call, MLPs solved more tasks, which reduced the overall learning time.
Third, for the same number of hidden layers and units per hidden layer, MLPs always have fewer parameters than SCNs, which themselves have fewer than FSCNs.
To summarize, based on the above observations, in particular on solved tasks and learning times, for our purpose of encoding motion primitives in NNs it is found that MLPs are still preferable as function approximators over both SCNs [7] and their extension, FSCNs. MLPs are the focus in Experiment 4.
VI-D Experiment 4: Neural network scheduling on velocity
TABLE IV: Experiment 4, solving all 14625 tasks at once.
MLP      solved  pathlength [m]  parameters  time [s]  restarts
[4,1,2]  13685   1209842         10          79504     0

TABLE V (excerpt): Scheduling on velocity, subsets of 1125 tasks each.
v [km/h]  tasks  pathlength [m]  time [s]  solved  overshoot
80        1125   135610          5632      2/5      0.0%
100       1125   199759          5120      5/5     12.9%
110       1125   241227          5731      1/5      0.0%
Experiment 4 is characterized by (i) comparing the feature vector from (9) and its 5D extension, (ii) a comparison of MLP[4,1,2] and MLP[5,1,2], (iii) attempting to solve a total of 14625 training tasks at once and, alternatively, with network scheduling, whereby (iv) training tasks are generated by gridding the lateral displacement and the velocities with uniform spacings of 0.25m and 10km/h, respectively.
The results are summarized in Tables IV, V and VI and Fig. 4. Several observations can be made. First, the benefits of network scheduling are illustrated. These include (i) faster overall learning time (accumulated 8.1 hours for MLP[4,1,2] in Table V vs. 22.1 hours in Table IV), and (ii) the ability to more quickly detect difficult subsets of training tasks, which can consequently also be resolved faster after a modification of hyperparameters or even of the NN parametrization. Second, the results of Table V suggest preferring a 4D over a 5D feature vector. The reduction of the accumulated pathlength for all 10 subsets of tasks solved completely by both MLP[4,1,2] and MLP[5,1,2] is 15% for the former vs. the latter. Third, the capabilities of tiny NNs with only 10 parameters for MLP[4,1,2] are demonstrated. These were tested both when (i) encoding all 14625 motion primitives at once, and (ii) learning subsets of 1125 tasks scheduled on vehicle velocity. It is remarkable that a MLP[4,1,2] with 10 parameters can encode 13685 motion primitives to the desired precision. Note that for the scheduling solution overall 13 MLPs are learnt, each having 10 parameters. Fourth, it is stressed that the training tasks of Experiment 4 are not easy. Since these are generated uniformly, in the extreme case the steering angle is initialized at its limit at a high initial velocity. This resulted in learnt trajectories with a maximum lateral overshoot of 62.1m before recovery of the desired lateral displacement. For visualization see also Fig. 4, where the maximum lateral overshoot is slightly more than 10m for the displayed initial velocity. The left frame of Fig. 4 is displayed to underline that for limited learning time only locally optimal trajectories (according to the shortest-path criterion subject to actuator and system constraints) are learnt. Trajectories are approximately balanced, but not entirely.
Not displayed for brevity, but very interesting to observe, was that for some tasks trajectories were learnt that first drive in reverse for some time before only then accelerating forward. This fully makes sense given the initial negative tyre heading and the pathlength minimization objective. Finally, as indicated in Table V and discussed above through the lateral overshoots, more difficult learning was observed for the higher-velocity tasks. Here, tasks were set up (i) to demonstrate that difficult motion primitives can still be encoded, and (ii) to motivate future work on the automated extraction of meaningful motion primitives from real-world driving data, especially for higher vehicle velocities.
VII Conclusion
Several methods were presented for the efficient encoding of motion primitives in neural networks. In particular, (i) specific virtual velocity constraints, (ii) neural network scheduling based on vehicle velocity, (iii) training task setups dismissing pose-related components in favor of a 4D feature vector selection, and (iv) the capabilities of tiny neural networks were promoted. Furthermore, (i) a comparison of a 3-states-2-controls kinematic and a 16-states-2-controls dynamic model, (ii) a discussion of 3 feedforward neural network architectures including weighted skip connections, and (iii) implementation details of model-based and gradient-free training using 1 GPU were discussed. Findings were illustrated by means of 4 simulation experiments.
The main subject of future work is closed-loop evaluation. This comprises robustness analysis for scenarios unseen during training, vehicle model mismatches, and analysis of the combination with a reference setpoint selector with obstacle avoidance inequality checks of closed-loop forward-simulated trajectories in a receding horizon fashion. Here, the preferred 4D feature vector selection is believed to enable the design of simple recursive waypoint selectors. Waypoints may also be concatenated to generate trajectory-trees that then permit very long planning horizons.
References
 [1] M. G. Plessen, “Automating vehicles by deep reinforcement learning using task separation with hill climbing,” arXiv preprint arXiv:1711.10785, 2017.
 [2] T. Salimans, J. Ho, X. Chen, S. Sidor, and I. Sutskever, “Evolution strategies as a scalable alternative to reinforcement learning,” arXiv preprint arXiv:1703.03864, 2017.
 [3] F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley, and J. Clune, “Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning,” arXiv preprint arXiv:1712.06567, 2017.
 [4] J. Lehman, J. Chen, J. Clune, and K. O. Stanley, "ES is more than just a traditional finite-difference approximator," arXiv preprint arXiv:1712.06568, 2017.
 [5] E. Frazzoli, M. A. Dahleh, and E. Feron, “A hybrid control architecture for aggressive maneuvering of autonomous helicopters,” in IEEE Conference on Decision and Control, vol. 3, pp. 2471–2476, 1999.
 [6] M. McNaughton, C. Urmson, J. M. Dolan, and J.-W. Lee, "Motion planning for autonomous driving with a conformal spatiotemporal lattice," in IEEE Conference on Robotics and Automation, pp. 4889–4895, 2011.
 [7] M. Srouji, J. Zhang, and R. Salakhutdinov, “Structured control nets for deep reinforcement learning,” arXiv preprint arXiv:1802.08311, 2018.

 [8] C. M. Bishop, Neural networks for pattern recognition. Oxford University Press, 1995.
 [9] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
 [10] D. A. Pomerleau, "Alvinn: An autonomous land vehicle in a neural network," in Advances in Neural Information Processing Systems, pp. 305–313, 1989.
 [11] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, et al., "End to end learning for self-driving cars," arXiv preprint arXiv:1604.07316, 2016.
 [12] K. Kant and S. W. Zucker, "Toward efficient trajectory planning: The path-velocity decomposition," The International Journal of Robotics Research, vol. 5, no. 3, pp. 72–89, 1986.
 [13] T. Lozano-Pérez and L. P. Kaelbling, "A constraint-based method for solving sequential manipulation planning problems," in IEEE Conference on Intelligent Robots and Systems, pp. 3684–3691, 2014.
 [14] S. Srivastava, E. Fang, L. Riano, R. Chitnis, S. Russell, and P. Abbeel, “Combined task and motion planning through an extensible plannerindependent interface layer,” in IEEE Conference on Robotics and Automation, pp. 639–646, 2014.
 [15] X. Hu, L. Chen, B. Tang, D. Cao, and H. He, “Dynamic path planning for autonomous driving on various roads with avoidance of static and moving obstacles,” Mechanical Systems and Signal Processing, vol. 100, pp. 482–500, 2018.
 [16] J. Liu, P. Hou, L. Mu, Y. Yu, and C. Huang, “Elements of effective deep reinforcement learning towards tactical driving decision making,” arXiv preprint arXiv:1802.00332, 2018.

 [17] C.-J. Hoel, M. Wahde, and K. Wolff, "An evolutionary approach to general-purpose automated speed and lane change behavior," in IEEE International Conference on Machine Learning and Applications, pp. 743–748, 2017.
 [18] E. Velenis, E. Frazzoli, and P. Tsiotras, "Steady-state cornering equilibria and stabilisation for a vehicle during extreme operating conditions," International Journal of Vehicle Autonomous Systems, vol. 8, no. 2-4, pp. 217–241, 2010.
 [19] E. Bakker, L. Nyborg, and H. B. Pacejka, “Tyre modelling for use in vehicle dynamics studies,” tech. rep., SAE Technical Paper, 1987.
 [20] S. M. Savaresi, C. Poussot-Vassal, C. Spelta, O. Sename, and L. Dugard, Semi-active suspension control design for vehicles. Elsevier, 2010.

 [21] J. Svendenius, "Tire modeling and friction estimation," PhD thesis, 2007.
 [22] H. P. Williams, Model building in mathematical programming. John Wiley & Sons, 2013.
 [23] M. G. Plessen, "Trajectory planning of automated vehicles in tube-like road segments," in IEEE Conference on Intelligent Transportation Systems, pp. 83–88, 2017.
 [24] S. K. Park and K. W. Miller, “Random number generators: good ones are hard to find,” Communications of the ACM, vol. 31, no. 10, pp. 1192–1201, 1988.
 [25] G. E. Box, M. E. Muller, et al., “A note on the generation of random normal deviates,” The Annals of Mathematical Statistics, vol. 29, no. 2, pp. 610–611, 1958.