Causal Reinforcement Learning: An Instrumental Variable Approach

03/06/2021
by Jin Li, et al.

In the standard data analysis framework, data is first collected (once and for all), and then data analysis is carried out. With the advancement of digital technology, decision makers constantly analyze past data and generate new data through the decisions they make. In this paper, we model this as a Markov decision process and show that the dynamic interaction between data generation and data analysis leads to a new type of bias, reinforcement bias, that exacerbates the endogeneity problem in standard data analysis. We propose a class of instrumental variable (IV)-based reinforcement learning (RL) algorithms to correct for the bias and establish their asymptotic properties by incorporating them into a two-timescale stochastic approximation framework. A key contribution of the paper is the development of new techniques that allow for the analysis of the algorithms in general settings where the noise is time-dependent. We use these techniques to derive sharper results on finite-time trajectory stability bounds: with a polynomial rate, the entire future trajectory of the iterates from the algorithm falls within a ball that is centered at the true parameter and is shrinking at a (different) polynomial rate. We also use the techniques to provide formulas for statistical inference, which is rarely done for RL algorithms. These formulas highlight how the strength of the IV and the degree of the noise's time dependency affect the inference.
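To make the endogeneity problem and the IV correction concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a static linear model rather than a Markov decision process, with i.i.d. rather than time-dependent noise. The function name `run_sketch`, the step-size schedules, and the data-generating process are all illustrative assumptions. A fast timescale tracks the first-stage coefficient of the regressor on the instrument, and a slow timescale updates the parameter of interest using the instrumented regressor.

```python
import numpy as np


def run_sketch(n=100_000, theta_true=2.0, seed=0):
    """Compare OLS with a two-timescale stochastic IV update (illustrative only)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)   # instrument: independent of the outcome noise
    u = rng.standard_normal(n)   # unobserved confounder
    e = rng.standard_normal(n)   # idiosyncratic noise
    x = z + u                    # endogenous regressor (true first-stage coefficient is 1)
    eps = 0.8 * u + 0.6 * e      # outcome noise, correlated with x through u
    y = theta_true * x + eps

    # OLS is biased here because Cov(x, eps) = 0.8 != 0.
    ols = float(np.dot(x, y) / np.dot(x, x))

    pi_hat = 0.0   # fast iterate: first-stage coefficient of x on z
    theta = 0.0    # slow iterate: parameter of interest
    for t, (zt, xt, yt) in enumerate(zip(z.tolist(), x.tolist(), y.tolist())):
        beta = (t + 100) ** -0.6   # fast step size (assumed schedule)
        alpha = 1.0 / (t + 100)    # slow step size (assumed schedule)
        # Fast timescale: track the regression of x on z.
        pi_hat += beta * zt * (xt - pi_hat * zt)
        # Slow timescale: the fixed point solves E[z (y - theta x)] = 0.
        theta += alpha * (pi_hat * zt) * (yt - theta * xt)
    return ols, theta


if __name__ == "__main__":
    ols, iv = run_sketch()
    print(f"OLS estimate: {ols:.3f}, IV estimate: {iv:.3f}")
```

Under these assumptions the OLS estimate should settle near `theta_true + 0.4`, since the bias is Cov(x, eps) / Var(x) = 0.8 / 2, while the IV iterate should approach `theta_true`.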


