Learning to Race through Coordinate Descent Bayesian Optimisation

02/17/2018
by Rafael Oliveira, et al.

In the automation of many kinds of processes, the observable outcome can often be described as the combined effect of an entire sequence of actions, or controls, applied throughout the process's execution. In these cases, strategies that optimise control policies for individual stages of the process might not be applicable, and the whole policy might instead have to be optimised at once. At the same time, evaluating a policy's performance can be costly, so it is desirable to find a solution with as few interactions with the real system as possible. We consider the problem of optimising control policies that allow a robot to complete a given race track in minimum time. We assume that the robot has no prior information about the track or its own dynamical model, only an initial valid driving example. Localisation is used only to monitor the robot and to indicate its position along the track's centre axis. We propose a method for finding a policy that minimises the time per lap while keeping the vehicle on the track, using a Bayesian optimisation (BO) approach over a reproducing kernel Hilbert space. To search high-dimensional policy-parameter spaces more efficiently with BO, we apply an algorithm that iterates over each dimension individually, in a sequential coordinate descent-like scheme. Experiments in a simulated car racing environment compare the performance of the algorithm against other methods.
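
The idea of running BO one dimension at a time can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the surrogate model, the acquisition function, or the order in which coordinates are visited, so the code below assumes a zero-mean Gaussian process with a squared-exponential kernel over a finite parameter vector in the unit hypercube (rather than a policy in a reproducing kernel Hilbert space), a lower-confidence-bound acquisition, and a simple cyclic sweep over coordinates. All function names and default values (`rbf_kernel`, `gp_posterior`, `coordinate_descent_bo`, `kappa`, `lengthscale`) are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.3):
    """Squared-exponential kernel between the rows of a and b (assumed choice)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = np.clip(1.0 - (v**2).sum(axis=0), 0.0, None)
    return mean, np.sqrt(var)


def coordinate_descent_bo(objective, x_init, n_iters=30, kappa=2.0, grid_size=101):
    """Minimise `objective` over [0, 1]^d, searching one coordinate per iteration."""
    x_train = np.atleast_2d(np.asarray(x_init, dtype=float))
    y_train = np.array([objective(x) for x in x_train])
    dim = x_train.shape[1]
    grid = np.linspace(0.0, 1.0, grid_size)

    for t in range(n_iters):
        coord = t % dim  # cyclic sweep over coordinates (an assumption)
        best = x_train[np.argmin(y_train)]

        # Candidates differ from the incumbent only along the active coordinate,
        # so the acquisition search is one-dimensional.
        candidates = np.tile(best, (grid_size, 1))
        candidates[:, coord] = grid

        # Centre the observations so the zero-mean prior is reasonable.
        y_mean = y_train.mean()
        mean, std = gp_posterior(x_train, y_train - y_mean, candidates)

        # Lower confidence bound: favours low predicted cost and high uncertainty.
        lcb = mean + y_mean - kappa * std
        x_next = candidates[np.argmin(lcb)]

        # One evaluation of the (expensive) objective per iteration.
        x_train = np.vstack([x_train, x_next])
        y_train = np.append(y_train, objective(x_next))

    i_best = np.argmin(y_train)
    return x_train[i_best], y_train[i_best]


if __name__ == "__main__":
    # Toy stand-in for lap time; the real objective would be a run on the track.
    toy_cost = lambda x: float(np.sum((x - 0.7) ** 2))
    x_start = np.array([[0.1, 0.1, 0.1]])  # plays the role of the initial valid example
    x_best, y_best = coordinate_descent_bo(toy_cost, x_start, n_iters=12)
    print(x_best, y_best)
```

The surrogate is still fitted over the full parameter vector; only the acquisition search is restricted to a line through the current best point. That restriction is what keeps each inner optimisation cheap as the number of policy parameters grows.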

research
11/26/2020

Learning from Simulation, Racing in Reality

We present a reinforcement learning-based solution to autonomously race ...
research
05/27/2018

Contextual Policy Optimisation

Policy gradient methods have been successfully applied to a variety of r...
research
03/26/2021

Composable Learning with Sparse Kernel Representations

We present a reinforcement learning algorithm for learning sparse non-pa...
research
05/14/2017

Discrete Sequential Prediction of Continuous Actions for Deep RL

It has long been assumed that high dimensional continuous control proble...
research
05/28/2021

Improving Generalization in Mountain Car Through the Partitioned Parameterized Policy Approach via Quasi-Stochastic Gradient Descent

The reinforcement learning problem of finding a control policy that mini...
research
05/24/2016

Alternating Optimisation and Quadrature for Robust Control

Bayesian optimisation has been successfully applied to a variety of rein...
research
06/21/2019

Entropic Risk Measure in Policy Search

With the increasing pace of automation, modern robotic systems need to a...
