The Society of Automobile Engineers(SAE) has coined six different levels of autonomy beginning at level 0, absence of any autonomy, to level 5, complete autonomy requiring no human intervention whatsoever. Currently, many luxury vehicles possess level 3 autonomy in terms of cruise control and active lane control, and a handful of vehicles possess level 4 autonomy. Level 5 autonomy in cars is still under research and development. The main barrier to attain this level of autonomy is the task of collecting data and the lack thereof. Although a deep model is extremely adept at generalizing features, it can only learn what it sees. There are only so many scenarios we as humans can drive around that model can learn from. Essentially, even if it learns to navigate through a busy street, it may not be able to correct oversteer or understeer due to several factors such as poor roads, tire wear, etc. causing a loss of traction, which may not have been accounted for in the training dataset. That is why an evolutionary approach would solve these issues. What if the car could learn to drive on its own, via trial and error, over countless generations? It would have trained, evolved to overcome such edge cases and scenarios, and would know exactly what to do once it detects wheel spin or any form of loss of traction and grip.
2 Overview of Existing Systems
2.1 Neural Networks
A neural network is an interconnected network of neurons, also called nodes. Each neuron has a set of output edges that activate based on the resultant value obtained from the weighted inputs it received from the previous layer.
In a supervised learning approach, we would have a list of attributes or features as our inputs and a list of targets as our outputs. We would then have to use back-propagation to train our neural network to correct its weight to suit our target and increase its accuracy. Back-propagation is just a way of propagating the total loss back into the neural network to know how much of the loss each node is responsible for, and subsequently updating the weights in such a way that minimizes the loss by giving the nodes with higher error rates lower weights and nodes with lower error, greater weights.
So, in a situation where it is difficult to obtain a dataset large enough to train the neural network to a certain degree of accuracy, we will face problems arriving at our optimal solution. This is especially true in the scenario of self-driving cars, where large corporations like Nvidia use 1000 hours of driving data to train their vehicle to navigate the roads. In such scenarios, we could adopt an evolutionary technique that requires no datasets and train our model in a simulated environment.
Although Deep Neural Networks utilizing convolutional layers have performed extraordinarily in several scenarios, the problem arises after the fact that it can only generalize what it is shown or taught. Since there are countless more possibilities of things that can happen on the road, which cannot be accounted for in the driving data we gather.
2.2 Reinforcement Learning
Reinforcement Learning is another critical area of research in autonomous vehicles, where an agent learns to accomplish a task by gathering experience by itself, rather than through a supervised dataset. The basic gist of the algorithm is that an agent granted a reward when it performs an action that is desirable in the current scenario and gets punished if it does something undesirable.
Although this form of carrot and stick approach seems to be how we, as individuals learn, the key drawback of this algorithm is that the agent has no prior experience whatsoever. We humans learn pretty quickly through this approach due to the generalization of a multitude of experiences that we have gathered from birth till date. This is not the case for the agent, and so it takes quite a while, depending on the complexity of the problem, for the agent to gather enough experiences in order for it to determine whether a certain action is good or bad.
3 Proposed System
3.1 Genetic Algorithm
Genetic Algorithms, also called Evolutionary algorithms, inspired by the process of natural selection and Darwinian evolution, mimic species evolution to arrive at a solution. Each generation has a set of species that contain specific genes, and the best are selected to populate the next generation. This process goes on until we arrive at our optimal solution. Genetic Algorithms are currently in use to generate solutions for optimization and search problems by utilizing techniques such as selection, mutation, and crossover.
Neuro-Evolution is a Genetic Algorithm that is used to evolve artificial neural networks. In this model, each species of a generation has a brain (the neural network) that has a set of genes (weights). In the beginning, all species of the population have random weights and hence perform random actions. It is through serendipitous discovery that a certain species gets closer to our solution. We select this species based on a fitness function and pick similarly performing species to perform crossover. After crossover, we mutate this gene and pass it on to the next generation.
So the entire genetic algorithm can be summarized to 3 key processes:
Selection: We select the best species of the generation based on the fitness function.
Crossover: We crossover the genes of the population to converge onto our solution.
Mutation: We mutate the genes, in the hope of a better solution, of the selected species following crossover.
We can see that mutation and crossover seem a little opposite to one another. Mutation randomizes the weights of a certain percentage of neurons while crossover tries to converge them. There is a trade-off here between exploration and exploitation. Exploration via mutation is exploring new gene sets out of a hope that something new can lead to promising results. While, exploitation via crossover is taking what you learned and using that information, combining the best, to inform newer decision-making processes. Typically, in genetic algorithms, the mutation rate is at 10–20%, while the crossover rate is at 80–90%.
We chose to simulate Neuro-Evolution using Unreal Engine 4, which is a game engine that utilizes Nvidia’s PhysX Physics Engine to replicate real-world like vehicle dynamics, which is essential if we plan to transfer the learning that has happened in this environment.
Compared to using simulators such as CARLA, which was also built on the same engine, we have a lot more freedom when we build the whole environment from the ground up, in terms of level design, vehicle physics, frame times (time dilations) and overall gives more power to the user.
4.1 Vehicle Dynamics
Since most vehicles these days in the low to mid-tier range are front-engine, front-wheel drives (FF), we chose this as our vehicle layout and for the differential, we went with a limited-slip differential (LSD) that prevents wheel spin, which is getting more and more common these days. The transmission of the vehicle is set to automatic. The suspension settings have also been altered so that it favours under-steer rather than it over-steer, as most manufacturers do these days, as it is easier to correct under-steer. Weight transfer and tyre traction are also essential aspects that dictate the vehicle’s physical handling, and are simulated accurately.
We also wanted to observe how this approach would fare on a more difficult layout which is harder to control, which is the front engine, rear wheel drive (FR), also the typical sports car layout, as they more prone to over-steering and sliding through corners without proper throttle control and adequate counter-steering. The suspension of this layout has also been altered so that it favours over-steering behaviour rather than under-steer.
4.2 Neural Net
Each vehicle we simulate has a brain that controls the values for the throttle pedal, the brake pedal, and the steering angles directly. This brain is our deep neural network which outputs a value from for all the above inputs of the vehicle.
The inputs to the neural network are the distances (normalized to
) we obtained by tracing a point cloud around the vehicle. We also feed in the current speed of the vehicle (normalized to its maximum speed) and the angle between the velocity of the vehicle and its forward vector, which provides the neural net information about in which direction the car is sliding towards, if or when it does.
4.3 Genetic Algorithm
The genetic algorithm in this scenario is a higher-level entity that oversees the processes responsible for selection, crossover and mutation. It controls the mutation and crossover rates and is responsible for spawning and tracking all the features of the entire vehicle species for each generation of the population.
Initially, in the first generation, all the weights of all the neural networks are initialized to random values, and it is through serendipitous discovery aided by the fitness function that we converge on to a solution through selection and crossover and also search for a better solution through mutation.
In the first generations, all the weights of all the neural networks in the vehicles are initialized randomly, so they have no clue what to do when they are spawned and hence move randomly. To remove poorly performing agents, which is a crucial part of Darwinian evolution, “survival of the fittest”, we de-spawn vehicles that crash into obstacles and guard rails or those that do not reach a certain threshold score within a predetermined period of time.
The working of the whole model follows the following pipeline:
The Genetic Algorithm agent spawns a population
The top-performing vehicles are selected via the fitness function
They are then crossover-ed by the weighted average of their weights in the neural network which is obtained by multiplying them with their relative fitness with respect to the population
Once crossover is done, we mutate a small percentage of the weights by setting them to random values.
We now spawn the next generation of vehicles
This process is repeated for several generations till we obtain satisfactory results.
For each vehicle, the distance travelled (in the direction of the course) each frame, which we call the score of the neural net and is calculated as:
|distance travelled that frame|
In order to prevent over-correcting behaviour and so that it doesn’t game the fitness score, we increment its score only when the angle between the velocity vector and the car’s forward vector is less than a threshold value which we set as .
From this, we calculate the net score each frame, which is the total distance travelled (until it de-spawns) and is calculated as:
At the end of each generation, the relative fitness of each neural network is calculated as:
|relative fitness of the current neural network|
|total distance travelled by the vehicle|
|total population of the generation|
Now for spawning the next generation of vehicles, we pick the top vehicles with the greatest fitness ( could be selected arbitrarily, we chose it to be of the population, ).
We now perform an arithmetic crossover of these species by weighted addition of their weights with respect to their fitness. For each connection in the neural network, the weight of the connection after crossover is calculated as:
|relative fitness of the current neural network|
|weight of a certain connection in the neural net|
|the top selected species of the generation|
Once we perform crossover for about 80% of the weights in the neural network, we then move on to mutation which is typically done to about 20% of the weights by using a random function on the weights.
Within the simulated environment, we have observed that the population over several generations have evolved to not crash into obstacles, and also through sheer randomness have decided to stick to one side of a lane in certain simulations. They also perform advanced traffic management techniques such as zipper merges. A zipper merge is when a car continues to stay on the lane even after a blockade is located. They merge into the free lane only once they are close to the blockade in order to prevent traffic congestions that would occur if everyone stopped using that lane entirely.
|Layout||Crossover Rate||Mutation Rate||Generation||Population|
A few things we observed over the course of running several simulations, iterating and variating different parameters, is that when the fitness function is simple, it generalizes the course pretty quickly and is able to navigate it well over very few generations. But, when we alter it so that it favours a certain style of cornering or maintaining a certain amount of speed, it takes drastically more generations for it achieve this sort of specialization. This interpretation is backed by the discrepancies seen in the number of generations it took for the Front engine, Rear wheel drive (FR) layout compared to the Front engine, Front wheel drive(FF) layout.
6 Conclusion and Future Work
Based on the results we’ve observed above, we can come to the conclusion that genetic algorithms such as NeuroEvolution can speed up the initial phase of generalizing features several fold compared to traditional techniques such as back-propagation which place the prerequisite of procuring a massive dataset for training so as to not over-fit the solution. But once the network attains the basic cognitive abilities for driving, it can be further improved upon through reinforcement learning techniques such as Deep Q-learning, since it has already gathered a plethora of experiences over several generations, which is one of the key barriers slowing down reinforcement, since now we can quickly jump to the phase where the focus is more on obtaining as many rewards as possible rather than the initial phase of gathering experience where the agent primarily tries to just not get punished for its actions.
-  Bimbraw, Keshav. (2015). Autonomous Cars: Past, Present and Future - A Review of the Developments in the Last Century, the Present Scenario and the Expected Future of Autonomous Vehicle Technology. ICINCO 2015 - 12th International Conference on Informatics in Control, Automation and Robotics, Proceedings. 1. 191-198. 10.5220/0005540501910198.
-  Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., & Clune, J. (2017). Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning. ArXiv, abs/1712.06567.
-  Mardle, Simon & Pascoe, Sean & Tamiz, Mehrdad. (2000). An investigation of genetic algorithms for the optimization of multi‐objective fisheries bioeconomic models. International Transactions in Operational Research. 7. 33 - 49. 10.1111/j.1475-3995.2000.tb00183.x.
-  Stanley, Kenneth & Clune, Jeff & Lehman, Joel & Miikkulainen, Risto. (2019). Designing neural networks through neuroevolution. Nature Machine Intelligence. 1. 10.1038/s42256-018-0006-z.
-  Long Zhao and Zemin Liu, ”A genetic algorithm for reinforcement learning,” Proceedings of International Conference on Neural Networks (ICNN’96), Washington, DC, USA, 1996, pp. 1056-1060 vol.2, doi: 10.1109/ICNN.1996.549044.
-  Iglesias Rodriguez, Roberto & Rodríguez, Miguel & Regueiro, Carlos & Correa, José & Barro, S.. (2006). Combining reinforcement learning and genetic algorithms to learn behaviours in mobile robotics.. 188-195.
-  Zong, Xiaopeng & Xu, Guoyan & Yu, Guizhen & Su, Hongjie & Hu, Chaowei. (2017). Obstacle Avoidance for Self-Driving Vehicle with Reinforcement Learning. SAE International Journal of Passenger Cars - Electronic and Electrical Systems. 11. 10.4271/2017-01-1960.
-  Sáez, Yago & Perez Liebana, Diego & Sanjuán, Oscar & Isasi, Pedro. (2008). Driving Cars by Means of Genetic Algorithms.. 1101-1110.
-  AbuZekry, Ahmed. (2019). Comparative Study of NeuroEvolution Algorithms in Reinforcement Learning for Self-Driving Cars. European Journal of Engineering Science and Technology. 10.33422/EJEST.2019.09.38.
-  Floreano, Dario & Mondada, Francesco. (1999). Automatic Creation of an Autonomous Agent: Genetic Evolution of a Neural-Network Driven Robot.
-  Chang, Simyung & Yang, John & Choi, Jaeseok & Kwak, Nojun. (2018). Genetic-Gated Networks for Deep Reinforcement.
-  Sehgal, A., La, H.M., Louis, S.J., & Nguyen, H. (2019). Deep Reinforcement Learning Using Genetic Algorithm for Parameter Optimization. 2019 Third IEEE International Conference on Robotic Computing (IRC), 596-601.
-  Ahmad, Tanwir & Ashraf, Adnan & Truscan, Dragos & Porres, Ivan. (2019). Exploratory Performance Testing Using Reinforcement Learning. 10.1109/SEAA.2019.00032.
-  Kardell, Simon and Mattias Kuosku. “Autonomous vehicle control via deep reinforcement learning.” (2017).
-  Stanley, Kenneth & Miikkulainen, Risto. (2002). Efficient Reinforcement Learning through Evolving Neural Network Topologies.