CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning

by Per-Arne Andersen, et al.

Reinforcement Learning (RL) is a general framework in which an agent seeks to maximize reward in an environment. Learning typically proceeds by trial and error using exploration methods such as epsilon-greedy. Two approaches, model-based and model-free RL, have shown concrete results in several disciplines. Model-based RL learns a model of the environment and uses it to learn the policy, while model-free approaches rely entirely on exploration and exploitation without considering the underlying environment dynamics. Model-free RL works well conceptually in simulated environments, and empirical evidence suggests that trial and error leads to near-optimal behavior given enough training. Model-based RL, on the other hand, aims to be sample efficient, and studies show that it requires far less training in the real environment to learn a good policy. A significant challenge in RL is that it relies on a well-defined reward function to work well in complex environments, and such a reward function is difficult to define. Goal-Directed RL is an alternative that learns an intrinsic reward function, with emphasis on a few explored trajectories that reveal the path to the goal state. This paper introduces a novel reinforcement learning algorithm for predicting the distance between two states in a Markov Decision Process. The learned distance function works as an intrinsic reward that fuels the agent's learning. Using the distance metric as a reward, we show that the algorithm performs comparably to model-free RL while having significantly better sample efficiency in several test environments.
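
To make the idea of a state-to-state distance acting as an intrinsic reward concrete, here is a minimal sketch. It is not the algorithm from the paper: the abstract describes a learned distance predictor, whereas this sketch substitutes exact breadth-first-search distances in a small deterministic gridworld. The grid size, wall positions, and all function names are assumptions made for illustration only.

```python
from collections import deque

# Hypothetical 4x4 deterministic gridworld (illustrative, not from the paper).
WIDTH, HEIGHT = 4, 4
WALLS = {(1, 1), (2, 1)}
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def neighbors(state):
    """Yield the states reachable from `state` in one step."""
    x, y = state
    for dx, dy in ACTIONS:
        nxt = (x + dx, y + dy)
        if 0 <= nxt[0] < WIDTH and 0 <= nxt[1] < HEIGHT and nxt not in WALLS:
            yield nxt


def distances_to(goal):
    """Breadth-first search outward from the goal.

    Moves are reversible here, so the result maps each reachable state
    to the number of steps needed to reach the goal from it.
    """
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def intrinsic_reward(dist, state, next_state):
    """Reward the agent by how much a transition reduces the distance to the goal."""
    return dist[state] - dist[next_state]


def greedy_rollout(dist, start, goal):
    """Follow the transition with the highest intrinsic reward until the goal is reached."""
    path = [start]
    while path[-1] != goal:
        state = path[-1]
        path.append(max(neighbors(state), key=lambda s: intrinsic_reward(dist, state, s)))
    return path


goal = (3, 3)
dist = distances_to(goal)
path = greedy_rollout(dist, (0, 0), goal)
```

In this toy setting the distance is known exactly, so a greedy agent reaches the goal along a shortest path. In the setting the abstract describes, the distance would instead be predicted by a learned model from explored trajectories, and the prediction error would matter.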


