Disentangled Planning and Control in Vision Based Robotics via Reward Machines

12/28/2020
by Alberto Camacho, et al.

In this work we augment a Deep Q-Learning agent with a Reward Machine (DQRM) to speed up the learning of vision-based policies for robot tasks and to overcome some of the limitations of DQN that prevent it from converging to good-quality policies. A reward machine (RM) is a finite state machine that decomposes a task into a discrete planning graph and equips the agent with a reward function that guides it toward task completion. The reward machine can be used both for reward shaping and for informing the policy of its current abstract state. An abstract state is a high-level simplification of the current state, defined in terms of task-relevant features. These two supervisory signals, reward shaping and knowledge of the current abstract state, complement each other, and both can be used to improve policy performance, as demonstrated on several vision-based robotic pick-and-place tasks. Particularly for vision-based robotics applications, it is often easier to build a reward machine than to get a policy to learn the task without this structure.
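To make the idea of a reward machine concrete, here is a minimal Python sketch of one for a pick-and-place task. It is an illustration under stated assumptions, not the authors' implementation: the class `RewardMachine`, the helper `make_pick_and_place_rm`, the abstract state names (`reach`, `carry`, `done`), the event names, and the reward values are all hypothetical. It shows the two signals described above: a shaped reward returned on each transition, and a one-hot encoding of the current abstract state that could be fed to the policy alongside the image.

```python
class RewardMachine:
    """Finite state machine over abstract task states (illustrative sketch)."""

    def __init__(self, states, initial_state, terminal_states, transitions):
        self.states = list(states)
        self.initial_state = initial_state
        self.terminal_states = set(terminal_states)
        # Maps (abstract state, event) -> (next abstract state, reward).
        self.transitions = dict(transitions)
        self.state = initial_state

    def reset(self):
        """Return to the initial abstract state at the start of an episode."""
        self.state = self.initial_state
        return self.state

    def step(self, event):
        """Advance on a detected event; unknown events leave the state unchanged."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        done = next_state in self.terminal_states
        return next_state, reward, done

    def one_hot(self):
        """Encode the current abstract state as an extra policy input."""
        return [1.0 if s == self.state else 0.0 for s in self.states]


def make_pick_and_place_rm():
    """Hypothetical three-state machine; names and rewards are assumptions."""
    return RewardMachine(
        states=["reach", "carry", "done"],
        initial_state="reach",
        terminal_states=["done"],
        transitions={
            ("reach", "object_grasped"): ("carry", 0.5),   # shaped sub-goal reward
            ("carry", "object_dropped"): ("reach", -0.5),  # penalty for losing the object
            ("carry", "object_placed"): ("done", 1.0),     # task completion
        },
    )
```

In a training loop, an event detector (assumed here, and task-specific in practice) would map each environment step to an event name, and the agent would call `rm.step(event)` to obtain the shaped reward and `rm.one_hot()` to condition the Q-network on the current abstract state.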


