Learning Interpretable, High-Performing Policies for Continuous Control Problems

02/04/2022
by Rohan Paleja, et al.

Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for continuous control problems. While the performance of these approaches warrants real-world adoption in domains such as autonomous driving and robotics, the resulting policies lack interpretability, which limits their deployability in safety-critical and legally regulated domains. Such domains require interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure that allows direct optimization in a sparse, decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning interpretable policy representations that match or outperform baselines by up to 33% in autonomous driving scenarios while achieving a 300x-600x reduction in the number of policy parameters against deep learning baselines.
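To make the idea of a gradient-friendly, tree-shaped policy concrete, the following is a minimal sketch of a generic depth-1 soft decision tree together with a hard, single-feature reading of the same node. It is not the ICCT procedure, which the abstract does not spell out; the names `soft_node`, `soft_tree_action`, `crisp_tree_action`, and all weights and leaf values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_node(x, w, b):
    """Probability of branching left: sigmoid(w . x - b). Differentiable in w and b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) - b)


def soft_tree_action(x, node, leaves):
    """Depth-1 soft tree: blend the two leaf actions by the branch probability."""
    p_left = soft_node(x, node["w"], node["b"])
    return p_left * leaves[0] + (1.0 - p_left) * leaves[1]


def crisp_tree_action(x, node, leaves):
    """Hard reading (an assumption, not the paper's method): keep only the
    largest-magnitude weight and apply a hard threshold on that one feature."""
    k = max(range(len(node["w"])), key=lambda i: abs(node["w"][i]))
    goes_left = (node["w"][k] * x[k] - node["b"]) > 0
    return leaves[0] if goes_left else leaves[1]


# Hypothetical parameters: two state features, two constant leaf actions.
node = {"w": [2.0, -0.1], "b": 1.0}
leaves = [-1.0, 1.0]
state = [1.5, 0.3]

soft = soft_tree_action(state, node, leaves)    # smooth blend of both leaves
crisp = crisp_tree_action(state, node, leaves)  # exactly one leaf is selected
print(soft, crisp)
```

The soft version is what a gradient-based optimizer can train, while the crisp version is what a person can read as a rule of the form "if feature k exceeds a threshold, take this action". How ICCTs bridge the two is the subject of the full text.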

research
05/25/2022

MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning

Many recent breakthroughs in multi-agent reinforcement learning (MARL) r...
research
01/18/2021

Interpretable Policy Specification and Synthesis through Natural Language and RL

Policy specification is a process by which a human can initialize a robo...
research
05/18/2022

Policy Distillation with Selective Input Gradient Regularization for Efficient Interpretability

Although deep Reinforcement Learning (RL) has proven successful in a wid...
research
08/23/2012

Optimized Look-Ahead Tree Policies: A Bridge Between Look-Ahead Tree Policies and Direct Policy Search

Direct policy search (DPS) and look-ahead tree (LT) policies are two wid...
research
09/16/2021

Interpretable Local Tree Surrogate Policies

High-dimensional policies, such as those represented by neural networks,...
research
09/11/2018

Re-purposing Compact Neuronal Circuit Policies to Govern Reinforcement Learning Tasks

We propose an effective method for creating interpretable control agents...
research
09/19/2022

Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control

Reinforcement learning (RL) and trajectory optimization (TO) present str...
