Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback

05/26/2023
by Tom Bewley, et al.

We propose a method to capture the handling abilities of fast jet pilots in a software model via reinforcement learning (RL) from human preference feedback. We use pairwise preferences over simulated flight trajectories to learn an interpretable rule-based model called a reward tree, which enables the automated scoring of trajectories alongside an explanatory rationale. We train an RL agent to execute high-quality handling behaviour by using the reward tree as the objective, and thereby generate data for iterative preference collection and further refinement of both tree and agent. Experiments with synthetic preferences show reward trees to be competitive with uninterpretable neural network reward models on quantitative and qualitative evaluations.
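To make the idea of a reward tree concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a tree whose internal nodes threshold a single state feature and whose leaves hold a scalar reward, scores a trajectory by summing per-step rewards, and turns two trajectory scores into a preference probability with a logistic (Bradley-Terry style) link. The feature names (`distance_to_target`, `roll_error`), the thresholds, the leaf rewards and the choice of link function are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

State = Dict[str, float]  # hypothetical per-timestep feature dictionary


@dataclass
class Leaf:
    reward: float  # reward assigned to any state that reaches this leaf


@dataclass
class Split:
    feature: str                 # name of the feature tested at this node
    threshold: float             # go to `low` if feature < threshold, else `high`
    low: Union["Split", Leaf]
    high: Union["Split", Leaf]


def step_reward(node: Union[Split, Leaf], state: State) -> Tuple[float, List[str]]:
    """Return the reward for one state plus the rules followed to reach it."""
    rules: List[str] = []
    while isinstance(node, Split):
        if state[node.feature] < node.threshold:
            rules.append(f"{node.feature} < {node.threshold}")
            node = node.low
        else:
            rules.append(f"{node.feature} >= {node.threshold}")
            node = node.high
    return node.reward, rules


def trajectory_return(tree: Union[Split, Leaf], trajectory: List[State]) -> float:
    """Score a trajectory as the sum of its per-step tree rewards."""
    return sum(step_reward(tree, state)[0] for state in trajectory)


def preference_probability(tree, traj_a: List[State], traj_b: List[State]) -> float:
    """Assumed logistic link: probability that traj_a is preferred to traj_b."""
    diff = trajectory_return(tree, traj_a) - trajectory_return(tree, traj_b)
    return 1.0 / (1.0 + math.exp(-diff))


# Hypothetical hand-written tree; in practice the tree would be learned from preferences.
TREE = Split(
    "distance_to_target", 100.0,
    low=Split("roll_error", 0.2, low=Leaf(1.0), high=Leaf(0.25)),
    high=Leaf(-0.5),
)

smooth = [{"distance_to_target": 150.0, "roll_error": 0.1},
          {"distance_to_target": 80.0, "roll_error": 0.1}]
rough = [{"distance_to_target": 150.0, "roll_error": 0.5},
         {"distance_to_target": 80.0, "roll_error": 0.5}]

if __name__ == "__main__":
    reward, rationale = step_reward(TREE, smooth[1])
    print(reward, rationale)
    print(preference_probability(TREE, smooth, rough))
```

The list of rules returned by `step_reward` is what makes the score explainable: each reward comes with the chain of threshold tests that produced it. An RL agent trained against `step_reward` as its objective would then generate new trajectories for further preference collection, as the abstract describes.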


research
12/20/2021

Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions

The potential of reinforcement learning (RL) to deliver aligned and perf...
research
05/30/2022

Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning

We generalise the problem of reward modelling (RM) for reinforcement lea...
research
01/20/2022

Safe Deep RL in 3D Environments using Human Feedback

Agents should avoid unsafe behaviour during both training and deployment...
research
10/03/2022

Reward Learning with Trees: Methods and Evaluation

Recent efforts to learn reward functions from human feedback have tended...
research
08/04/2019

Dueling Posterior Sampling for Preference-Based Reinforcement Learning

In preference-based reinforcement learning (RL), an agent interacts with...
research
06/08/2021

Exploration and preference satisfaction trade-off in reward-free learning

Biological agents have meaningful interactions with their environment de...
research
12/16/2020

Learning to Run with Potential-Based Reward Shaping and Demonstrations from Video Data

Learning to produce efficient movement behaviour for humanoid robots fro...
