Sequential Knockoffs for Variable Selection in Reinforcement Learning

03/24/2023
by Tao Ma, et al.

In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state that is larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the state can slow learning and obscure the learned policy. We introduce the notion of a minimal sufficient state in a Markov decision process (MDP): the smallest subvector of the original state under which the process remains an MDP and shares the same optimal policy as the original process. We propose a novel sequential knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional, complex, nonlinear dynamics. In large samples, the proposed method controls the false discovery rate and selects all sufficient variables with probability approaching one. Because the method is agnostic to the reinforcement learning algorithm being applied, it benefits downstream tasks such as policy optimization. Empirical experiments support the theoretical results and show that the proposed approach outperforms several competing methods in terms of variable selection accuracy and regret.
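To make the phrase "controls the false discovery rate" concrete, the sketch below shows the selection step of the generic knockoff filter (the "knockoff+" threshold), not the SEEK algorithm itself, whose details are not given in this abstract. It assumes a vector `w` of knockoff statistics has already been computed, where a large positive `w[j]` suggests that variable `j` matters more than its knockoff copy. The function names, the statistics, and the target level `q` are all hypothetical choices made for illustration.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the smallest t such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q,
    or infinity if no candidate threshold qualifies."""
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = np.sort(np.abs(w[w != 0]))
    for t in candidates:
        # Large negative statistics estimate the number of false selections.
        estimated_false = 1 + np.sum(w <= -t)
        n_selected = max(1, np.sum(w >= t))
        if estimated_false / n_selected <= q:
            return t
    return np.inf


def select_variables(w, q=0.2):
    """Indices of variables whose statistic meets the knockoff+ threshold."""
    w = np.asarray(w, dtype=float)
    t = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= t)


# Hypothetical statistics for 12 candidate state variables.
w_example = np.array([8, 7, 6, 5, 4, 3, 2, 1.5, -1, 0.5, 9, 10])
selected = select_variables(w_example, q=0.2)
```

With these made-up statistics, every variable except the one with a negative statistic (index 8) is selected. If all statistics are negative, no threshold qualifies and the selection is empty.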

research
03/22/2023

Reinforcement Learning with Exogenous States and Rewards

Exogenous state variables and rewards can slow reinforcement learning by...
research
12/21/2019

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

Markov Decision Process (MDP) problems can be solved using Dynamic Progr...
research
06/13/2023

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective

Off-policy Learning to Rank (LTR) aims to optimize a ranker from data co...
research
11/21/2017

Posterior Sampling for Large Scale Reinforcement Learning

Posterior sampling for reinforcement learning (PSRL) is a popular algori...
research
05/27/2019

Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

We study model-based reinforcement learning in an unknown finite communi...
research
02/16/2019

Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles

Spatial puzzles composed of rigid objects, flexible strings and holes of...
research
02/02/2021

Metrics and continuity in reinforcement learning

In most practical applications of reinforcement learning, it is untenabl...
