Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization

02/17/2022
by Quanyi Li, et al.

Human intervention is an effective way to inject human knowledge into the training loop of reinforcement learning, enabling fast learning and ensuring training safety. Given the very limited budget of human intervention, it remains challenging to design when and how the human expert should interact with the learning agent during training. In this work, we develop a novel human-in-the-loop learning method called Human-AI Copilot Optimization (HACO). To allow the agent sufficient exploration in risky environments while ensuring training safety, the human expert can take over control and demonstrate how to avoid potentially dangerous situations or trivial behaviors. HACO then effectively utilizes the data from both trial-and-error exploration and the human's partial demonstrations to train a high-performing agent. HACO extracts proxy state-action values from the partial human demonstrations and optimizes the agent to improve these proxy values while reducing human interventions. Experiments show that HACO achieves substantially higher sample efficiency on the safe-driving benchmark. HACO can train agents to drive in unseen traffic scenarios within a small human intervention budget while achieving high safety and generalizability, outperforming both reinforcement learning and imitation learning baselines by a large margin. Code and demo videos are available at: https://decisionforce.github.io/HACO/.
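The takeover-and-learn loop described in the abstract can be sketched in miniature. This is a hedged illustration, not the paper's implementation: the tabular proxy Q-table, the scripted `takeover` function standing in for the human expert, the 1-D state, and the intervention penalty `c_takeover` are all assumptions made for the sketch.

```python
# Minimal sketch of a HACO-style human-in-the-loop training step.
# Assumptions (not from the paper): a 1-D integer state in [0, 10] with a
# hazard near state 10, a tabular proxy Q stored in a dict, and a scripted
# "human" that takes over when the agent steers toward the hazard.

def takeover(state, agent_action):
    """Scripted stand-in for the human expert: intervene when the agent's
    proposed action would move past the (hypothetical) danger boundary."""
    if state + agent_action > 8:      # heading toward the hazard
        return True, -1               # human takes over and steers away
    return False, agent_action

def haco_step(proxy_q, state, agent_action, alpha=0.5, c_takeover=1.0):
    """One environment step: apply the (possibly overridden) action and
    update proxy state-action values so that demonstrated human actions
    gain value while agent actions that trigger takeover are penalized."""
    intervened, action = takeover(state, agent_action)
    next_state = max(0, min(10, state + action))
    if intervened:
        # Penalize the agent's proposed action, pushing the policy to
        # reduce future human interventions.
        key = (state, agent_action)
        proxy_q[key] = proxy_q.get(key, 0.0) + alpha * (
            -c_takeover - proxy_q.get(key, 0.0)
        )
    # Treat the executed action as the (partial) demonstration target:
    # human actions get a positive proxy value, free exploration stays neutral.
    target = 1.0 if intervened else 0.0
    key = (state, action)
    proxy_q[key] = proxy_q.get(key, 0.0) + alpha * (target - proxy_q.get(key, 0.0))
    return next_state, intervened
```

In this toy version, a single takeover simultaneously marks the agent's dangerous proposal as low-value and the human's corrective action as high-value, which mirrors the abstract's idea of learning proxy values from partial demonstrations while minimizing intervention cost.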

Related research:

10/13/2021 · Safe Driving via Expert Guided Policy Optimization
When learning common skills like driving, beginners usually have domain ...

06/16/2021 · Safe Reinforcement Learning Using Advantage-Based Intervention
Many sequential decision problems involve finding a policy that maximize...

11/10/2021 · Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention
Safety has become one of the main challenges of applying deep reinforcem...

01/19/2022 · Improving Behavioural Cloning with Human-Driven Dynamic Dataset Augmentation
Behavioural cloning has been extensively used to train agents and is rec...

05/31/2022 · Human-AI Shared Control via Frequency-based Policy Dissection
Human-AI shared control allows human to interact and collaborate with AI...

07/17/2017 · Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
AI systems are increasingly applied to complex tasks that involve intera...

07/10/2023 · Probabilistic Counterexample Guidance for Safer Reinforcement Learning (Extended Version)
Safe exploration aims at addressing the limitations of Reinforcement Lea...
