Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function

12/14/2018
by Akihiro Tamatsukuri, et al.

As reinforcement learning algorithms are applied to increasingly complicated and realistic tasks, it becomes difficult to solve such problems within a practical time frame. Hence, we focus on a satisficing strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing (RS) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to K-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that RS is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of RS is upper bounded by a finite value, given that the aspiration level is set to an "optimal level" so that satisficing implies optimizing. We confirm these results through numerical simulations and compare the performance of RS with that of other representative algorithms for K-armed bandit problems.
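The abstract does not state the RS value function explicitly, so the following is a minimal sketch under one assumed form: each arm i is scored as n_i * (Q_i - aspiration), where n_i is the pull count and Q_i the running mean reward, and the agent acts greedily on that score. The class name RSAgent, this functional form, and the Bernoulli test bed are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class RSAgent:
    """Sketch of a risk-sensitive satisficing (RS) bandit agent.

    ASSUMPTION: the RS score n_i * (Q_i - aspiration) is an illustrative
    choice; the paper's exact value function is not given in the abstract.
    """

    def __init__(self, n_arms, aspiration):
        self.n = np.zeros(n_arms)      # pull counts per arm
        self.q = np.zeros(n_arms)      # running mean reward per arm
        self.aspiration = aspiration   # aspiration level (break-even point)

    def select_arm(self):
        # Pull each arm once so every estimate is initialized.
        untried = np.where(self.n == 0)[0]
        if untried.size > 0:
            return int(untried[0])
        # Greedy choice on the RS score: an arm whose estimate exceeds the
        # aspiration level accumulates a growing positive score and keeps
        # being exploited; if every estimate is below the aspiration level,
        # the least-tried arm is penalized least, which drives exploration.
        rs = self.n * (self.q - self.aspiration)
        return int(np.argmax(rs))

    def update(self, arm, reward):
        self.n[arm] += 1
        self.q[arm] += (reward - self.q[arm]) / self.n[arm]


# Usage: a two-armed Bernoulli bandit with the aspiration level set between
# the two arm means, so that satisficing coincides with optimizing.
rng = np.random.default_rng(0)
true_means = [0.4, 0.6]
agent = RSAgent(n_arms=2, aspiration=0.5)
for _ in range(1000):
    arm = agent.select_arm()
    reward = float(rng.random() < true_means[arm])
    agent.update(arm, reward)
print("pull counts:", agent.n)
```

With the aspiration level placed between the two arm means, the sketch concentrates pulls on the better arm, which is consistent with the abstract's claim that satisficing implies optimizing when the aspiration level is set appropriately.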
