Coupled Longitudinal and Lateral Control of a Vehicle using Deep Learning

by Guillaume Devineau, et al.
MINES ParisTech

This paper explores the capability of deep neural networks to capture key characteristics of vehicle dynamics, and their ability to perform coupled longitudinal and lateral control of a vehicle. To this end, two different artificial neural networks are trained to compute vehicle controls corresponding to a reference trajectory, using a dataset based on high-fidelity simulations of vehicle dynamics. In this study, the control inputs are the steering angle of the front wheels and the torque applied on each wheel. The performance of both models, namely a Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN), is evaluated based on their ability to drive the vehicle on a challenging test track, alternating between long straight lines and tight curves. A comparison to conventional decoupled controllers on the same track is also provided.




1 Introduction

The recent development of deep learning has led to dramatic progress in multiple research fields, and this technique has naturally found applications in autonomous vehicles. The use of deep learning to perform perceptive tasks such as image segmentation has been widely researched in the last few years, and highly efficient neural network architectures are now available for such tasks. More recently, several teams have proposed taking deep learning a step further, by training so-called “end-to-end” algorithms to directly output vehicle controls from raw sensor data (see, in particular, the seminal work in


Although end-to-end driving is highly appealing, as it removes the need to design motion planning and control algorithms by hand, handing the safety of the car occupants to software operating as a black box seems problematic. A possible workaround is to use “forensics” techniques that can, to a certain extent, help understand the behavior of deep neural networks [2].

We choose a different approach, which consists of breaking down complexity by training simpler, single-task neural networks to solve specific problems arising in autonomous driving; we argue that the reduced complexity of individual tasks allows much easier testing and validation.

In this article, we focus on the problem of controlling a car-like vehicle in highly dynamic situations, for instance to perform evasive maneuvers in the face of an obstacle. A particular challenge in such scenarios is the strong coupling between longitudinal and lateral dynamics when nearing the vehicle’s handling limits, and properly accounting for it requires highly detailed models [3]. However, precisely modeling this coupling involves complex non-linear relations between state variables, and using the resulting model is usually too costly for real-time applications. For this reason, most references in the field of motion planning focus on simpler models, such as point-mass or kinematic bicycle (single track) models, which are constrained to avoid highly coupled dynamics [4]. Similarly, research on automotive control usually treats the longitudinal and lateral dynamics separately in order to simplify the problem [5].

Although these simplifications can yield good results in standard on-road driving situations, they may be problematic for vehicle safety when driving near its handling limits, for instance at high speed or on slippery roads. To handle such situations, some authors have proposed using Model Predictive Control (MPC) with a simplified, coupled dynamic model [6] which is limited to extremely short time horizons (a few dozen milliseconds) to allow real-time computation. Other authors have proposed to model the coupling between longitudinal and lateral motions using the concept of “friction circle” [7], which allows precisely stabilizing a vehicle in circular drifts [8]. However, the transition towards the stabilized drifting phase – which is critical in the ability, e.g., to perform evasive maneuvers – remains problematic with this framework.

In this article, we propose to use deep neural networks to implicitly model highly coupled vehicular dynamics and perform low-level control in real time. To do so, we train a deep neural network to output low-level controls (wheel torques and steering angle) corresponding to a given initial vehicle state and target trajectory. Compared to classical MPC frameworks, which require integrating dynamic equations on-line, this approach moves that computation off-line and uses only simple mathematical operations on-line, leading to much faster computations.

Several authors have already proposed a divide-and-conquer approach by using machine learning on specific sub-tasks instead of performing end-to-end computations, in particular in the case of motion planning and control. For instance, reference [9] used a Convolutional Neural Network (CNN) to generate a cost function from input images, which is then used inside an MPC framework for high-speed autonomous driving; however, this approach has the same limitations as model predictive control. Other approaches, such as [10], used reinforcement learning to output steering controls for a vehicle, but were limited to low-speed applications. Another work used a Rectified Linear Unit (ReLU) network model to identify the dynamics of a helicopter in order to predict its future accelerations, but this model has not been used for control.

Closer to our work, reference [12] trained neural networks integrating a priori knowledge of the bicycle model for decoupled longitudinal and lateral control of a vehicle; in [13], the authors used supervised learning to generate lateral controls for a truck and integrated a control barrier function to ensure the safety of the system. Reference [14] coupled a standard controller and an adaptive neural network to compensate for unknown perturbations in order to perform trajectory tracking for an autonomous underwater vehicle. To the best of our knowledge, deep neural networks have not been used in the literature for the coupled control of wheeled vehicles.

The rest of this article is organized as follows: Section 2 presents the vehicle model used to generate the training dataset and to simulate the vehicle dynamics on a test track. Section 3 introduces two artificial neural network architectures used to generate the control signals for a given target trajectory, and describes the training procedure used in this article. Section 4 compares the performance of these two networks, using simulation on a challenging test track. A comparison to conventional decoupled controllers is also provided. Finally, Section 5 concludes this study.


2 Vehicle Model

In this section, we present the 9 Degrees of Freedom (9 DoF) vehicle model, which is used both to generate the training and testing datasets and as a simulation model to evaluate the performance of the deep-learning-based controllers.

The Degrees of Freedom comprise 3 DoF for the vehicle’s motion in a plane (position and yaw), 2 DoF for the carbody’s rotation (roll and pitch) and 4 DoF for the rotational speed of each wheel. The model takes into account both the coupling of longitudinal and lateral slips and the load transfer between tires. The control inputs of the model are the torques applied at each wheel and the steering angle of the front wheels. The low-level dynamics of the engine and brakes are not considered here. The notations are given in Table 1 and illustrated in Figure 1.

Remark: the wheel subscript refers respectively to the front left, front right, rear left and rear right wheels.

Several assumptions were made for the model:

  • Only the front wheels are steerable.

  • The roll and pitch rotations happen around the center of gravity.

  • The aerodynamic force is applied at the height of the center of gravity. Therefore, it does not involve any moment on the vehicle.

  • The slope and road-bank angle of the road are not taken into account.

, Position of the vehicle in the ground frame
, , Roll, pitch and yaw angles of the carbody
, Longitudinal and lateral speed of the vehicle in its inertial frame
Total mass of the vehicle
, , Inertia of the vehicle around its roll, pitch and yaw axis
Inertia of the wheel
Total torque applied to the wheel
, Longitudinal and lateral tire forces generated by the road on the wheel expressed in the tire frame
, Longitudinal and lateral tire forces generated by the road on the wheel expressed in the vehicle frame
Normal reaction forces on wheel
, Distance between the front (resp. rear) axle and the center of gravity
Half-track of the vehicle
Height of the center of gravity
Effective radius of the wheel
Angular velocity of the wheel
Longitudinal speed of the center of rotation of wheel expressed in the tire frame
Table 1: Notations
Figure 1: Vehicle model and notations.

2.1 Vehicle dynamics

Equations (1a-1) give the expression of the vehicle dynamics:


and denote respectively the longitudinal and the lateral tire forces expressed in the vehicle frame; denote the aerodynamic drag forces with the mass density of air, the aerodynamic drag coefficient and the frontal area of the vehicle; denote the damped mass/spring forces depending on the suspension travel due to the roll and pitch angles according to Equation (1f). The parameters and are respectively the stiffness and the damping coefficients of the suspensions.


The position of the vehicle in the ground frame can then be derived using Equations (1g) and (1h).


2.2 Wheel dynamics

The dynamics of each wheel expressed in the pneumatic frame is given by Equation (2):
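As an illustration of what such a wheel equation expresses, the sketch below assumes the usual torque balance on a wheel, in which the wheel inertia times the angular acceleration equals the applied torque minus the effective radius times the longitudinal tire force. The function name, the parameter values and the explicit Euler step are hypothetical choices made for this example and are not taken from the paper.

```python
def wheel_step(omega, torque, f_xp, inertia=1.2, r_eff=0.3, dt=0.001):
    """One explicit-Euler step of the assumed wheel torque balance.

    omega  : angular velocity of the wheel (rad/s)
    torque : total torque applied to the wheel (N.m)
    f_xp   : longitudinal tire force in the tire frame (N)
    inertia, r_eff and dt are hypothetical values chosen for illustration.
    """
    omega_dot = (torque - r_eff * f_xp) / inertia
    return omega + dt * omega_dot


# With no net torque the wheel speed is unchanged.
steady = wheel_step(omega=50.0, torque=300.0, f_xp=1000.0)
# With extra drive torque the wheel accelerates.
faster = wheel_step(omega=50.0, torque=600.0, f_xp=1000.0)
```

In a full simulation the tire force would itself depend on the slip, so this step would be coupled with the tire model of the next subsection.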


2.3 Tire dynamics

The longitudinal force and the lateral force applied by the road on each tire and expressed in the pneumatic frame are functions of the longitudinal slip ratio , the side-slip angle , the normal reaction force and the road friction coefficient :


The longitudinal slip ratio of the wheel is defined as follows:
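Since the defining equation is not reproduced here, the following sketch shows one common convention for the longitudinal slip ratio, offered as an assumption rather than as the paper's exact definition: the difference between the wheel's circumferential speed and the ground speed, normalised by the larger of the two. The names `omega`, `v_xp` and `r_eff` mirror the quantities described in Table 1; `eps` is a hypothetical guard against division by zero.

```python
def slip_ratio(omega, v_xp, r_eff=0.3, eps=1e-6):
    """Longitudinal slip ratio under one common convention (an assumption).

    Traction (wheel faster than ground): normalise by the wheel speed.
    Braking  (wheel slower than ground): normalise by the ground speed.
    Only forward motion (non-negative speeds) is handled in this sketch.
    """
    wheel_speed = r_eff * omega
    if wheel_speed >= v_xp:
        return (wheel_speed - v_xp) / max(wheel_speed, eps)
    return (wheel_speed - v_xp) / max(v_xp, eps)


free_rolling = slip_ratio(omega=20.0 / 0.3, v_xp=20.0)   # close to 0
locked_wheel = slip_ratio(omega=0.0, v_xp=20.0)          # full braking slip
spinning = slip_ratio(omega=100.0, v_xp=15.0)            # traction slip
```

With this convention the ratio stays within [-1, 1], and choosing the initial wheel speed as the ground speed divided by the effective radius gives a zero slip ratio.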


The lateral slip-angle of tire is the angle between the direction given by the orientation of the wheel and the direction of the velocity of the wheel (see Figure 1):


In order to model the functions and , we used the combined slip tire model presented by Pacejka in [15] (cf. Equations (4.E1) to (4.E67)) which takes into account the interaction between the longitudinal and lateral slips on the force generation. Therefore, the friction circle due to the laws of friction (see Equation (6)) is respected. Finally, the impact of load transfer between tires is also taken into account through .
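For reference, a friction-circle constraint of the kind mentioned above is usually written as follows. The symbols are assumptions chosen to match the descriptions in Table 1 (tire-frame forces on wheel i, normal reaction force, and road friction coefficient) and may differ from the paper's own notation.

```latex
\sqrt{F_{xp_i}^{2} + F_{yp_i}^{2}} \;\le\; \mu \, F_{z_i}
```

The bound states that the combined longitudinal and lateral force a tire can generate is limited by the friction available under its current load, which is why braking or accelerating reduces the lateral force available for cornering.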


Lastly, the relationships between the tire forces expressed in the vehicle frame and and the ones expressed in the pneumatic frame and are given in Equation (7):


More details on vehicle dynamics can be found in [3] and [16].

3 Deep Learning Models

We propose two different artificial neural network architectures to learn the inverse dynamics of a vehicle, in particular the coupled longitudinal and lateral dynamics. An artificial neural network is a network of simple functions called neurons. Each neuron computes an internal state (activation) depending on the input it receives and a set of trainable parameters, and returns an output depending on the input and the activation. Most neural networks are organized into groups of units called layers and arranged in a tree-like structure, where the output of a layer is used as input for the following one. The training of the neural network consists of finding the set of parameters (weights and biases) minimizing the error (or loss) between predicted and actual values on a training dataset. In this paper, this training dataset is computed using the 9 DoF vehicle model presented in Section 2.

3.1 Dataset

The dataset generated by the 9DoF vehicle model has a total of 43241 instances: it is divided into a train set of 28539 instances and a test set of 14702 instances. The following procedure was used to generate each instance:

First, a control to apply is generated randomly, as well as an initial state of the vehicle. More precisely, the vehicle is chosen to be either in an acceleration phase or in a deceleration phase with equal probability. In the first case, the torques at the two front wheels are set equal to each other and drawn from a uniform distribution between Nm and Nm, while the torques at the two rear wheels are set equal to zero (the vehicle is assumed to be front-wheel drive). In the second case, the torques at all wheels are set equal to each other and drawn from a uniform distribution between Nm and Nm. In both cases, the steering angle is drawn from a uniform distribution between and rad. The initial state is composed of the initial position of the vehicle in the ground frame, the longitudinal and lateral velocities, the roll, pitch and yaw angles and their derivatives, and the rotational speed of each wheel. The initial longitudinal speed is drawn from a uniform distribution between and m/s; the initial lateral speed is drawn from a uniform distribution whose parameters depend on the initial longitudinal speed; the rotational speed of each wheel is chosen such that the longitudinal slip ratio is zero. All the other initial states are set to zero.

Secondly, the 9 DoF vehicle model is run for s, starting from the initial state and keeping the control constant during the whole simulation.

The resulting trajectories are downsampled to 301 timesteps, corresponding to a sampling time of ms.

Consequently, each instance of the dataset consists of an initial state of the vehicle, a control kept constant over time, and the associated trajectory obtained. The dataset generation method is summarized in Algorithm 1.

1:function generate instance
2:      Coin flipping
3:     if  then
4:          uniform; in N.m
5:          uniform; in rad
7:     else if  then
8:          uniform; in N.m
9:          uniform; in rad
11:     end if
12:      uniform; in m.s
13:      uniform; in m.s
14:     where
15:     and
17:     save
18:end function
19:function generate dataset()
20:     for  do generate instance()
21:     end for
22:end function
Algorithm 1 Dataset Generation
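The procedure of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: every numeric range, the sampling time, the form of the lateral-speed bounds and the `dummy_simulate` stand-in for the 9 DoF model are hypothetical placeholders, since the actual values are not reproduced in this text.

```python
import random

# All numeric ranges below are hypothetical placeholders, not the paper's values.
TORQUE_ACCEL = (0.0, 500.0)     # N.m, front wheels when accelerating
TORQUE_BRAKE = (-500.0, 0.0)    # N.m, all wheels when decelerating
STEER_RANGE = (-0.5, 0.5)       # rad
SPEED_RANGE = (5.0, 40.0)       # m/s


def generate_instance(rng, simulate):
    """Draw a random constant control and initial state, then simulate."""
    if rng.random() < 0.5:                      # coin flip: acceleration phase
        t_front = rng.uniform(*TORQUE_ACCEL)
        t_rear = 0.0                            # front-wheel drive
    else:                                       # deceleration phase
        t_front = t_rear = rng.uniform(*TORQUE_BRAKE)
    steering = rng.uniform(*STEER_RANGE)
    v_x = rng.uniform(*SPEED_RANGE)
    v_y = rng.uniform(-0.1 * v_x, 0.1 * v_x)    # bounds depend on v_x (assumed form)
    control = {"t_front": t_front, "t_rear": t_rear, "steering": steering}
    state = {"v_x": v_x, "v_y": v_y}            # other initial states are zero
    return {"state": state, "control": control,
            "trajectory": simulate(state, control)}


def generate_dataset(n, simulate, seed=0):
    """Generate n independent instances with a reproducible random stream."""
    rng = random.Random(seed)
    return [generate_instance(rng, simulate) for _ in range(n)]


def dummy_simulate(state, control, steps=301):
    """Stand-in for the 9 DoF model: straight-line motion, for illustration only."""
    dt = 0.01  # hypothetical sampling time
    return [(state["v_x"] * k * dt, state["v_y"] * k * dt) for k in range(steps)]


dataset = generate_dataset(10, dummy_simulate)
```

Passing the simulator as an argument keeps the sampling logic separate from the vehicle model, so the stand-in could be swapped for a high-fidelity simulator without touching the rest.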

3.2 Model 1: Multi-Layer Perceptron

A Multi-Layer Perceptron (MLP), or multi-layer feedforward neural network, is a neural network whose equations are:



denotes the input vector,

the output of layer , the number of layers of the MLP and denotes the

-th activation function.

denotes the output vector of the neural network.
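A feedforward pass of this kind can be sketched in plain Python as below: each layer applies an affine map followed by an activation function, and the output of one layer feeds the next. The input size of 8, the output size of 3, the linear output layer and the random weights are hypothetical choices made only to keep the example executable; they are not taken from the paper.

```python
import random


def relu(v):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, x) for x in v]


def identity(v):
    return v


def dense(weights, biases, v):
    """Affine map W v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]


def mlp_forward(layers, x):
    """Apply each (weights, biases, activation) triple in turn."""
    h = x
    for weights, biases, activation in layers:
        h = activation(dense(weights, biases, h))
    return h


def random_layer(n_in, n_out, activation, rng):
    """Layer with small random weights and zero biases (illustrative only)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return (w, [0.0] * n_out, activation)


rng = random.Random(0)
sizes = [8, 32, 32, 128, 32, 128]            # hypothetical input size of 8
layers = [random_layer(a, b, relu, rng) for a, b in zip(sizes, sizes[1:])]
layers.append(random_layer(sizes[-1], 3, identity, rng))  # 3 outputs: assumed
controls = mlp_forward(layers, [0.5] * 8)
```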

The MLP, presented in Figure 2, is used to predict the constant control to apply given an initial state and a desired trajectory. It is trained on the dataset presented in Subsection 3.1. It comprises five layers, respectively containing 32, 32, 128, 32 and 128 neurons. All the activation functions of the network are rectified linear units (ReLU), i.e. ReLU(x) = max(0, x). The loss function, weight initialization and regularization are discussed in Section 3.4, as they are common to the two neural networks proposed. We performed a grid search to choose the sizes of the layers by allowing each layer to have a size of either 32, 64, or 128 neurons, training the corresponding MLP for 200 epochs and evaluating its performance on the test dataset.
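To give a sense of the size of such a grid, the sketch below enumerates every combination of layer sizes, assuming five layers as the listed sizes suggest. The `evaluate` callback is hypothetical: in practice it would train the corresponding MLP and return its test loss.

```python
from itertools import product

CANDIDATE_SIZES = (32, 64, 128)
N_LAYERS = 5  # assumed from the five layer sizes listed in the text


def layer_configurations():
    """Enumerate every combination of layer sizes in the grid."""
    return list(product(CANDIDATE_SIZES, repeat=N_LAYERS))


def grid_search(evaluate):
    """Return the configuration with the lowest score given by `evaluate`."""
    return min(layer_configurations(), key=evaluate)


configs = layer_configurations()
# Toy scoring function used only to exercise the search.
smallest = grid_search(lambda sizes: sum(sizes))
```

Under that assumption the grid contains 3 to the power 5, i.e. 243, configurations, each of which requires a full training run.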

Figure 2: Multi-Layer Perceptron

3.3 Model 2: Convolutional Neural Network

Convolutional Neural Networks (CNN) are neural networks that use convolution in place of general matrix multiplication in at least one of their layers. A traditional CNN model almost always involves a sequence of convolution and pooling layers. CNNs have a proven history of being successful for processing data that has a known grid-like topology. For instance, numerous authors make use of CNNs for classification [17], or semantic segmentation [18] purposes.

We propose to use convolutions to pre-process the vehicle trajectory before feeding it to the MLP, as illustrated in Figure 3. Trajectories are time-series data, which can be thought of as a 1D grid taking samples at regular time intervals, and are thus well suited to processing with a CNN. We decided to process the X and Y coordinates separately. For each channel (either X or Y), we construct the following CNN module, which is depicted in Figure 4:


where is the output of the CNN module, the number of layers, the -th activation function and the -th pooling function.

In the CNN module, all convolutions use a kernel of size 3. The activation functions are all ReLU and the pooling functions are all average-pooling of size 2. The first two convolutions have 4 feature maps while the last convolution has only 1 feature map.

As the longitudinal and lateral dynamics are quite different, distinct sets of weights are used for the X and Y convolutions. After the X and Y 1D-trajectories have been processed by their dedicated CNN modules, their outputs are concatenated. This new output is then fed to the former MLP, whose characteristics remain the same except for the dimension of its input. The whole model shown in Figure 3 is designated as the “CNN model” in the rest of this work.
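The structure of one CNN module can be sketched in plain Python as three stages of convolution, ReLU and average pooling with 4, 4 and 1 feature maps. Several details are assumptions made for illustration: the convolutions use no padding, the weights are fixed constants rather than trained values, and the helper names are hypothetical.

```python
def conv1d(channels, kernels, biases):
    """'Valid' 1D convolution (cross-correlation, as in most DL libraries).

    channels: list of input sequences, one per input channel
    kernels : kernels[o][c] is the size-3 kernel from input c to output o
    """
    n = len(channels[0]) - len(kernels[0][0]) + 1
    out = []
    for kernel_set, b in zip(kernels, biases):
        out.append([
            b + sum(k[j] * ch[t + j]
                    for k, ch in zip(kernel_set, channels)
                    for j in range(len(k)))
            for t in range(n)
        ])
    return out


def relu(channels):
    return [[max(0.0, x) for x in ch] for ch in channels]


def avg_pool2(channels):
    """Average pooling of size 2; a trailing odd sample is dropped."""
    return [[(ch[i] + ch[i + 1]) / 2.0 for i in range(0, len(ch) - 1, 2)]
            for ch in channels]


def cnn_module(signal, params):
    """Three conv -> ReLU -> average-pool stages with 4, 4 and 1 feature maps."""
    h = [signal]
    for kernels, biases in params:
        h = avg_pool2(relu(conv1d(h, kernels, biases)))
    return h[0]


def constant_params(value=0.1):
    """Hypothetical fixed weights, only to make the sketch executable."""
    shapes = [(4, 1), (4, 4), (1, 4)]  # (output maps, input maps)
    return [([[[value] * 3 for _ in range(c_in)] for _ in range(c_out)],
             [0.0] * c_out) for c_out, c_in in shapes]


x_traj = [0.1 * k for k in range(301)]
features = cnn_module(x_traj, constant_params())
```

Under these assumptions a 301-sample trajectory is reduced to 35 features per channel. The Y channel would go through a second module with its own parameters, and the two feature vectors would be concatenated before entering the MLP.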

Figure 3: Convolutional Neural Network
Figure 4: CNN Module

3.4 Training procedure

The training procedure is the same for the two neural networks:

3.4.1 Weights Initialization & Batching

Each training batch is composed of 32 instances of the dataset. The Xavier initialization [19] (also known as Glorot uniform initialization) is used to set the initial random values of all the weights of our model.
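As a brief illustration, Glorot uniform initialization draws each weight from a uniform distribution on [-limit, limit], with limit = sqrt(6 / (fan_in + fan_out)). The sketch below pairs it with a simple mini-batch iterator; the helper names are hypothetical and no shuffling is performed.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Sample a fan_out x fan_in weight matrix from U(-limit, limit),
    with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def batches(instances, batch_size=32):
    """Yield consecutive mini-batches of at most `batch_size` instances."""
    for start in range(0, len(instances), batch_size):
        yield instances[start:start + batch_size]


weights = glorot_uniform(32, 128, random.Random(0))
batch_sizes = [len(b) for b in batches(list(range(70)))]
```

Scaling the range by the layer's fan-in and fan-out keeps the variance of activations and gradients roughly constant across layers at the start of training.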

3.4.2 Loss function, Regularization & Optimizer

The objective of the training is to reduce the mean square error (MSE) between the controls predicted by the neural network and the ones that were really applied to obtain the given trajectory. The neural network is trained in order to minimize the loss function defined by Equation (10) on the train dataset, before evaluation on the test dataset.




The scaling factors and were chosen in order to normalize the steering and the torques. The parameter was chosen in order to prioritize the lateral dynamics over the longitudinal one. Equation (11c) is an L2 regularization of our model, where is the vector containing all the weights of the network. We set .
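The structure described above (normalised steering and torque errors, a weight favouring the lateral dynamics, and an L2 penalty on the weights) can be sketched as follows. This is not the paper's exact loss: the equation is not reproduced in this text, and every default value and argument name below is a hypothetical placeholder.

```python
def control_loss(pred, target, weights,
                 steer_scale=0.5, torque_scale=500.0,
                 steer_weight=2.0, l2_coeff=1e-4):
    """Weighted squared error on normalised controls plus an L2 penalty.

    pred, target: dicts with 'steering' (rad) and 'torques' (list, N.m).
    Every default value here is a hypothetical placeholder.
    """
    steer_err = ((pred["steering"] - target["steering"]) / steer_scale) ** 2
    torque_err = sum(((p - t) / torque_scale) ** 2
                     for p, t in zip(pred["torques"], target["torques"]))
    torque_err /= len(target["torques"])
    l2 = l2_coeff * sum(w * w for w in weights)
    return steer_weight * steer_err + torque_err + l2


target = {"steering": 0.0, "torques": [100.0, 100.0]}
perfect = control_loss(target, target, weights=[])
off = control_loss({"steering": 0.5, "torques": [600.0, 600.0]}, target,
                   weights=[10.0])
```

With a steering weight larger than one, a normalised steering error costs more than a torque error of the same normalised size, which is the mechanism by which training can favour lateral accuracy.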

To train our model, we used the Adam optimization algorithm [20]. It calculates an exponential moving average of the gradient and the squared gradient. For the decay rates of the moving averages, we used the parameters , . The values of other parameters were for the learning rate, and to avoid singular values.
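For readers unfamiliar with Adam, a single update step can be sketched as below. The default hyperparameters are the commonly used ones and are an assumption here, not necessarily the values used by the authors; the toy quadratic problem is only there to exercise the function.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on lists of floats; returns new (params, m, v).

    The defaults are the commonly used ones, not necessarily the paper's.
    t is the 1-based step count used for bias correction.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g          # moving average of the gradient
        v_i = beta2 * v_i + (1.0 - beta2) * g * g      # moving average of the squared gradient
        m_hat = m_i / (1.0 - beta1 ** t)               # bias-corrected estimates
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Minimise f(x) = x^2 from x = 1.0 as a toy check.
x, m, v = [1.0], [0.0], [0.0]
for step in range(1, 2001):
    x, m, v = adam_step(x, [2.0 * x[0]], m, v, step, lr=0.01)
```

The small constant `eps` in the denominator plays the role of the term that avoids singular values when the squared-gradient average is close to zero.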

4 Results

In order to compare their ability to learn the vehicle dynamics, the two artificial neural networks are used as “controllers” (properly speaking, they are not real controllers, as they do not learn how to reject disturbances and modeling errors). The reference track, presented in Figure 5, comprises both long straight lines and narrow curves. The reference speed is set to m/s on the whole track.

Figure 5: Top view of the test track; the numbers 1 to 7 refer to the different road sections, delimited by dashed lines, in order to facilitate the matching with Figures 6 to 10.

4.1 Generating the control commands

In order to compute the control commands to be applied to the vehicle, the artificial neural network needs to know the trajectory the vehicle has to follow in the next s, as in the train dataset. One problem that arises is that it has only learned to follow trajectories starting from its actual position such as in Figure 11. However, in practice, the vehicle is almost never exactly on the reference path. Therefore, a path section starting from the actual position of the vehicle and bringing it back to the reference path is generated: for that purpose, cubic Bezier curves were chosen as illustrated in Figure 12. Thus, at each iteration, (i) a Bezier curve with length s is computed to link the actual position of the vehicle to the reference trajectory; (ii) a query comprising the previously computed Bezier curve is sent to the artificial neural network; (iii) the artificial neural network returns the torques at each wheel and the front steering angle to apply until the next control commands are obtained. The computation sequence is run every ms, even though the query takes less than ms.
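The curve construction in step (i) can be illustrated with a cubic Bezier curve sampled from its four control points. The placement of the two inner control points below (aligned with the vehicle heading at the start and with the reference at the end) and all coordinates are hypothetical; the paper does not specify them in this text.

```python
def cubic_bezier(p0, p1, p2, p3, n_points=301):
    """Sample a cubic Bezier curve at n_points evenly spaced parameters."""
    curve = []
    for k in range(n_points):
        s = k / (n_points - 1)
        b0 = (1 - s) ** 3
        b1 = 3 * s * (1 - s) ** 2
        b2 = 3 * s ** 2 * (1 - s)
        b3 = s ** 3
        curve.append(tuple(b0 * a + b1 * b + b2 * c + b3 * d
                           for a, b, c, d in zip(p0, p1, p2, p3)))
    return curve


# Vehicle 0.5 m to the left of a straight reference along the x axis.
vehicle = (0.0, 0.5)
target = (30.0, 0.0)                     # point on the reference, hypothetical
path = cubic_bezier(vehicle, (10.0, 0.5), (20.0, 0.0), target)
```

Note that evenly spaced curve parameters do not correspond exactly to evenly spaced times along the path, so a practical implementation would likely resample the curve according to the desired speed before querying the network.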

4.2 Comparison of the models

The results obtained for the MLP and the CNN models are displayed respectively in blue and in red in Figures 6 to 10. The resulting videos, obtained using the software PreScan [21], are available online. The results obtained using the CNN are clearly better than those of the MLP. First, we observe that the control commands are smoother in curves with the CNN. There are steep steering (see Figure 6) and front torque (see Figure 7) variations for the MLP around m in road sections n and around m in road sections n. In the latter case, the steering angle reaches its saturation value rad and the wheel torques change suddenly from Nm to Nm and vice-versa, which is impossible in practice. In contrast, the control signals of the CNN model always remain smooth and within a reasonable range of values. Secondly, both the longitudinal and lateral errors are smaller for the CNN than for the MLP, as shown respectively in Tables 2 and 3.

model   RMS    average   std. dev.   max
MLP     0.76   -0.29     0.70        -4.94
CNN     0.60   -0.39     0.46        -2.33
Table 2: Comparison of the longitudinal performances of the MLP and CNN controllers (in m/s).

model   RMS    average   std. dev.   max
MLP     0.61   0.003     0.61        3.26
CNN     0.43   0.014     0.43        1.7
Table 3: Comparison of the lateral performances of the MLP and CNN controllers (in m).

However, unlike classic controllers, stability cannot be ensured for these “controllers” as they are black boxes. In particular, for the CNN, we observe a lateral static error in straight lines. This static error is caused in fact by the Bezier curves which do not converge fast enough to the reference track on straight lines as only the first ms are really followed by the CNN model (see Figure 13). Moreover, Figure 6 shows that the steering angle applied during straight lines is the same for MLP and CNN.

4.3 Coupling between longitudinal and lateral dynamics

The speed limit a kinematic bicycle model can reach in a curve of radius is given by Equation (12) where is the road friction coefficient and the gravity constant [4]. This corresponds to m/s (m) in road section n2 and m/s (m) in road section n6. As the reference speed is set to m/s throughout the track, conventional decoupled longitudinal and lateral controllers (based on a kinematic bicycle model) will not perform well in road section n6.
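A bound of this kind follows from capping the lateral acceleration in a curve at a fraction of the acceleration that friction can provide. In the sketch below, V is the speed, R the curve radius, and the factor \kappa is a hypothetical symbol introduced for illustration: \kappa = 1 gives the pure friction limit, and the exact constant used in Equation (12) is not reproduced here.

```latex
\frac{V^{2}}{R} \;\le\; \kappa \, \mu \, g
\quad\Longrightarrow\quad
V_{\max} = \sqrt{\kappa \, \mu \, g \, R}
```

The square-root dependence on R explains why the tightest curve of the track imposes the lowest admissible speed.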


In contrast, both models (especially the CNN) are able to pass this road section, showing the ability of artificial neural networks to handle coupled longitudinal and lateral dynamics. More precisely, we observe in Figure 9 that the speed is reduced in section n because the artificial neural networks deliberately brake (see Figures 7 and 8), even though the speed of the vehicle is below the reference speed. This is due to the loss function used during training, given by Equation (10), which penalizes steering angle errors more than torque errors. Hence, the models prioritize the lateral dynamics over the longitudinal one.

Therefore, such “controllers” are particularly interesting for highly dynamic maneuvers such as emergency situations or aggressive driving, where the longitudinal and lateral dynamics are strongly coupled. However, they should be used sparingly as they are only black boxes, or should at least be supervised by model-based systems. Moreover, for normal driving situations, conventional decoupled longitudinal and lateral controllers should be preferred.

Figure 6: Comparison of the steering command computed by the different controllers. The numbers 1 to 7 correspond to the different road sections presented in Figure 5.
Figure 7: Comparison of the torque applied at the front wheels computed by the different controllers.
Figure 8: Comparison of the torque applied at the rear wheels computed by the different controllers.
Figure 9: Comparison of the total speed obtained with the different controllers.
Figure 10: Comparison of the absolute value of the lateral error obtained with the different controllers.
Figure 11: Example of a training dataset instance: in red, the reference trajectory, in blue the one obtained from the control predicted by the CNN model.
Figure 12: Example of a Bezier curve (in red) joining the actual position of the vehicle (the red circle) to the reference trajectory (in green). The actual trajectory followed by the vehicle is shown in blue.
Figure 13: Example of a Bezier curve on a straight line section of the reference trajectory. The lateral error is not corrected since the convergence of the Bezier curve to the reference trajectory is too slow.

4.4 Comparison with decoupled controllers

Finally, the “controllers” obtained with the MLP and CNN models are compared with commonly used decoupled controllers: the lateral controller is either a pure-pursuit (PP) [22] or a Stanley [23] controller, while in both cases the longitudinal control is ensured by a Proportional-Integral (PI) controller with gains and . The gain for the front lateral error is for the Stanley controller. The preview distance of the pure-pursuit controller is defined as a function of the total speed at the center of gravity: where s is the anticipation time. The results of the PP and the Stanley controllers are shown respectively in green and grey in Figures 6 to 10. Clearly, performance decreases when using these decoupled controllers in the challenging part of the track. In particular, the lateral error becomes very large in both cases during the sharp turn of road section n, while the CNN was able to perform reasonably well.
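For context, the three baseline control laws can be sketched in their textbook forms as below. The wheelbase, gains, anticipation time and sign conventions are hypothetical, and the preview distance is assumed to be proportional to speed; none of these values are taken from the paper.

```python
import math


def pure_pursuit_steering(alpha, speed, wheelbase=2.7, t_anticipation=1.0):
    """Pure-pursuit steering angle (rad).

    alpha: angle between the vehicle heading and the line to the look-ahead point.
    The preview distance is assumed proportional to speed.
    """
    preview = max(t_anticipation * speed, 1e-3)
    return math.atan2(2.0 * wheelbase * math.sin(alpha), preview)


def stanley_steering(heading_error, front_lateral_error, speed, gain=1.0):
    """Stanley steering angle: heading error plus a cross-track term."""
    return heading_error + math.atan2(gain * front_lateral_error, max(speed, 1e-3))


class PIController:
    """Discrete PI controller for the longitudinal speed error."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def update(self, error):
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


pi = PIController(kp=2.0, ki=1.0, dt=0.1)
first_command = pi.update(1.0)
second_command = pi.update(1.0)
```

In these laws the steering command depends only on geometric errors and the torque command only on the speed error, which is precisely the decoupling that the learned models avoid.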

5 Conclusions

This work presented some preliminary results on deep learning applied to trajectory tracking for autonomous vehicles. Two different approaches, namely an MLP and a CNN, were trained on a high-fidelity vehicle dynamics model in order to compute simultaneously the torque to apply on each wheel and the front steering angle from a given reference trajectory. The CNN model provides better results, both in terms of accuracy and smoothness of the control commands. Moreover, compared to most existing controllers, it is able to handle situations with strongly coupled longitudinal and lateral dynamics in a very short computation time. However, the controller obtained is a black box and should not be used as a standalone system.

The results demonstrate the ability of deep learning algorithms to learn the characteristics of vehicle dynamics. This opens a wide range of new possible applications of such techniques, for example for generating dynamically feasible trajectories. Future work will focus on (i) replacing the complex dynamics models by a model learned off-line in Model Predictive Control for motion planning, (ii) using Generative Adversarial Networks (GAN) to generate safe trajectories where the learned dynamics is used as a constraint, and (iii) performing real-world experiments with our approach on a real car.