Minimum information divergence of Q-functions for dynamic treatment resumes

11/16/2022
by Shinto Eguchi, et al.

This paper presents a new application of information geometry to reinforcement learning, focusing on dynamic treatment regimes. In the standard reinforcement learning framework, the Q-function for a single-stage problem is defined as the conditional expectation of the reward given a state and an action. We introduce an equivalence relation, called policy equivalence, on the space of all Q-functions. A class of information divergences is defined on the Q-function space at every stage. The main objective is to propose an estimator of the optimal policy function by minimizing an information divergence over a dataset of trajectories. In particular, we discuss the γ-power divergence, which is shown to have the advantageous property that it vanishes between policy-equivalent Q-functions. This property is essential in seeking the optimal policy, and it is discussed in the framework of a semiparametric model for the Q-function. Specific choices of the power index γ reveal interesting relationships among the value function and the geometric and harmonic means of the Q-function. A numerical experiment demonstrates the performance of the minimum γ-power divergence method in the context of dynamic treatment regimes.
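To make the single-stage definition concrete, the following is a minimal sketch, not the paper's estimator. It estimates a tabular Q-function as the sample mean of the reward for each state-action pair and reads off the greedy policy. The helper names (`estimate_q`, `greedy_policy`) and the toy data are hypothetical. The final check assumes that two Q-functions are policy-equivalent when they induce the same greedy policy; the abstract does not state the formal definition, so this reading is an assumption.

```python
import numpy as np

def estimate_q(states, actions, rewards, n_states, n_actions):
    """Tabular estimate of Q(s, a) = E[reward | state = s, action = a]
    as the sample mean of observed rewards in each (s, a) cell."""
    sums = np.zeros((n_states, n_actions))
    counts = np.zeros((n_states, n_actions))
    np.add.at(sums, (states, actions), rewards)
    np.add.at(counts, (states, actions), 1)
    # Cells with no observations are left at 0 (illustrative choice).
    return sums / np.maximum(counts, 1)

def greedy_policy(q):
    """Policy that picks, in each state, the action with the largest Q-value."""
    return q.argmax(axis=1)

# Hypothetical toy dataset: 2 states, 2 actions, 8 observed (s, a, r) records.
states = np.array([0, 0, 0, 0, 1, 1, 1, 1])
actions = np.array([0, 0, 1, 1, 0, 0, 1, 1])
rewards = np.array([1.0, 3.0, 4.0, 6.0, 5.0, 7.0, 2.0, 2.0])

q_hat = estimate_q(states, actions, rewards, n_states=2, n_actions=2)
policy = greedy_policy(q_hat)

# Assumed reading of policy equivalence: a positive affine transform of Q
# changes its values but not the action it recommends in any state.
q_scaled = 3.0 * q_hat + 1.0
same_policy = np.array_equal(policy, greedy_policy(q_scaled))
```

Under this assumed reading, `q_hat` and `q_scaled` differ as functions yet recommend the same action in every state, which is the kind of distinction a divergence that vanishes on policy-equivalent Q-functions would ignore.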


