Reinforcement with Fading Memories

07/29/2019
by Kuang Xu, et al.

We study the effect of imperfect memory on decision making in the context of a stochastic sequential action-reward problem. An agent chooses a sequence of actions which generate discrete rewards at different rates. She is allowed to make new choices at rate β, while past rewards disappear from her memory at rate μ. We focus on a family of decision rules where the agent makes a new choice by randomly selecting an action with a probability approximately proportional to the amount of past rewards associated with each action in her memory. We provide closed-form formulae for the agent's steady-state choice distribution in the regime where the memory span is large (μ → 0), and show that the agent's success critically depends on how quickly she updates her choices relative to the speed of memory decay. If β ≫ μ, the agent almost always chooses the best action, i.e., the one with the highest reward rate. Conversely, if β ≪ μ, the agent chooses an action with a probability roughly proportional to its reward rate.
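The dynamics described in the abstract can be explored numerically. The following is a minimal simulation sketch, not the authors' code, and it rests on several assumptions that the abstract does not state: rewards for the current action arrive as a Poisson process, each remembered reward is forgotten independently at rate μ, and "approximately proportional" is modeled by adding a hypothetical smoothing constant `alpha` to each action's remembered reward count. The function name `simulate_fading_memory` and all parameter names are illustrative.

```python
import random


def simulate_fading_memory(rates, beta, mu, alpha=1.0, horizon=1000.0, seed=0):
    """Event-driven simulation of an agent with fading memory.

    Assumptions (not taken from the paper): Poisson reward arrivals,
    independent exponential forgetting of each remembered reward, and a
    smoothed proportional choice rule with weight (count + alpha).
    Returns the fraction of time spent on each action.
    """
    rng = random.Random(seed)
    k = len(rates)
    memory = [0] * k        # remembered rewards per action
    time_on = [0.0] * k     # time spent playing each action
    action = rng.randrange(k)
    t = 0.0

    while True:
        reward_rate = rates[action]          # current action pays out
        decay_rate = mu * sum(memory)        # any remembered reward may fade
        total_rate = reward_rate + decay_rate + beta

        dt = rng.expovariate(total_rate)     # time to the next event
        if t + dt >= horizon:
            time_on[action] += horizon - t
            break
        time_on[action] += dt
        t += dt

        u = rng.random() * total_rate
        if u < reward_rate:
            # A new reward is stored under the current action.
            memory[action] += 1
        elif u < reward_rate + decay_rate:
            # A uniformly random remembered reward is forgotten.
            forgotten = rng.choices(range(k), weights=memory)[0]
            memory[forgotten] -= 1
        else:
            # New choice: probability proportional to (count + alpha).
            weights = [m + alpha for m in memory]
            action = rng.choices(range(k), weights=weights)[0]

    return [x / horizon for x in time_on]
```

Calling, for example, `simulate_fading_memory([1.0, 2.0], beta=1.0, mu=0.01)` and then repeating with `beta` much smaller than `mu` gives a rough empirical view of the two regimes contrasted in the abstract. The closed-form results concern the limit μ → 0, so finite runs of this sketch should be read only as qualitative illustrations.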


