On Reward Function for Survival

06/18/2016
by Naoto Yoshida, et al.

Obtaining a survival strategy (policy) is one of the fundamental problems for biological agents. In this paper, we generalize the formulation of previous research on agent survival and formulate the survival problem as the maximization of the multi-step survival probability over future time steps. We introduce a method for converting the maximization of the multi-step survival probability into a classical reinforcement learning problem. Under this conversion, the reward function (the negative temporal cost function) is expressed as the log of the temporal survival probability, and we show that the resulting reinforcement learning objective is proportional to a variational lower bound of the original problem. Finally, we empirically demonstrate that an agent learns survival behavior using the reward function introduced in this paper.
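The conversion described in the abstract admits a simple reading: because the multi-step survival probability factorizes into per-step (temporal) survival probabilities, its logarithm decomposes into a sum of per-step terms, which is exactly the additive return that standard reinforcement learning maximizes. The sketch below illustrates this idea in Python; the names (survival_reward, per_step_probs) and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hedged sketch: per-step reward as the log of the temporal survival
# probability, so that the episode return equals the log of the
# multi-step survival probability:
#   log P(survive steps 1..T) = log prod_t p_t = sum_t log p_t
# Names and probabilities below are illustrative, not from the paper.

def survival_reward(survival_prob, eps=1e-8):
    """Per-step reward: log of the temporal survival probability."""
    return float(np.log(max(survival_prob, eps)))

per_step_probs = [0.99, 0.95, 0.90, 0.97]  # assumed per-step survival probabilities
episode_return = sum(survival_reward(p) for p in per_step_probs)

# exp(episode_return) recovers the product of the per-step probabilities,
# i.e. the multi-step survival probability a survival-maximizing agent targets.
print(episode_return, np.exp(episode_return), np.prod(per_step_probs))
```

Maximizing the sum of these log-probability rewards is then equivalent, in this toy reading, to maximizing the log of the multi-step survival probability, which is consistent with the paper's claim that the reinforcement learning objective is proportional to a variational lower bound of the original survival problem.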
