Minimum information divergence of Q-functions for dynamic treatment resumes

by Shinto Eguchi, et al.
Institute of Statistical Mathematics

This paper presents a new application of information geometry to reinforcement learning, focusing on dynamic treatment regimes. In a standard framework of reinforcement learning, a Q-function is defined as the conditional expectation of a reward given a state and an action in a single-stage situation. We introduce an equivalence relation, called policy equivalence, on the space of all Q-functions. A class of information divergences is defined on the Q-function space for every stage. The main objective is to propose an estimator of the optimal policy function by a minimum information divergence method based on a dataset of trajectories. In particular, we discuss the γ-power divergence, which is shown to have the advantageous property that it vanishes between policy-equivalent Q-functions. This property is essential in seeking the optimal policy, which is discussed in the framework of a semiparametric model for the Q-function. Specific choices of the power index γ yield interesting relationships among the value function and the geometric and harmonic means of the Q-function. A numerical experiment demonstrates the performance of the minimum γ-power divergence method in the context of dynamic treatment regimes.
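
The single-stage definition above, with the Q-function as the conditional expectation of the reward given a state and an action, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `estimate_q` and `greedy_policy`, the toy states and treatments, and the rewards are hypothetical and do not come from the paper. The final check only shows that two different Q-tables can induce the same greedy policy; it is not the paper's formal definition of policy equivalence, and the sketch does not implement the γ-power divergence.

```python
from collections import defaultdict


def estimate_q(samples):
    """Tabular estimate of Q(s, a) = E[R | S = s, A = a].

    `samples` is an iterable of (state, action, reward) triples; the
    estimate is the sample mean of the rewards for each (state, action).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action, reward in samples:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def greedy_policy(q):
    """Map each state to the action with the largest estimated Q-value."""
    best = {}
    for (state, action), value in q.items():
        if state not in best or value > best[state][1]:
            best[state] = (action, value)
    return {state: action for state, (action, _) in best.items()}


# Hypothetical single-stage data: (state, treatment, observed reward).
samples = [
    ("mild", "drug_a", 1.0),
    ("mild", "drug_a", 3.0),
    ("mild", "drug_b", 1.0),
    ("severe", "drug_a", 0.5),
    ("severe", "drug_b", 2.0),
    ("severe", "drug_b", 4.0),
]

q_hat = estimate_q(samples)
policy = greedy_policy(q_hat)

# Rescaling every Q-value by a positive constant gives a different
# Q-table that selects the same action in every state.
q_scaled = {key: 10.0 * value for key, value in q_hat.items()}
same_policy = greedy_policy(q_scaled) == policy
```

The point of the last two lines is that a policy depends only on which action ranks highest in each state, so a divergence that ignores differences between Q-functions leading to the same policy, as the abstract describes for the γ-power divergence, targets the policy rather than the Q-values themselves.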



Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism

In this paper, we study offline Reinforcement Learning with Human Feedba...

Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings

Reinforcement learning is a general technique that allows an agent to le...

Explainable Deterministic MDPs

We present a method for a certain class of Markov Decision Processes (MD...

Interplanetary Transfers via Deep Representations of the Optimal Policy and/or of the Value Function

A number of applications to interplanetary trajectories have been recent...

Combing Policy Evaluation and Policy Improvement in a Unified f-Divergence Framework

The framework of deep reinforcement learning (DRL) provides a powerful a...

Dynamically optimal treatment allocation using Reinforcement Learning

Consider a situation wherein a stream of individuals arrive sequentially...

Geometry of Arimoto Algorithm

This paper aims to reveal information geometric structure of Arimoto alg...