K-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control

06/07/2023
by Michael Giegrich, et al.

We propose a novel K-nearest neighbor resampling procedure for estimating the performance of a policy from historical data containing realized episodes of a decision process generated under a different policy. We focus on feedback policies that depend deterministically on the current state in environments with continuous state-action spaces and system-inherent stochasticity effected by chosen actions. Such settings are common in a wide range of high-stakes applications and are actively investigated in the context of stochastic control. Our procedure exploits the fact that similar state/action pairs (in a metric sense) are associated with similar rewards and state transitions. This enables our resampling procedure to tackle the counterfactual estimation problem underlying off-policy evaluation (OPE) by simulating trajectories, similarly to Monte Carlo methods. Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics. These properties make the proposed resampling algorithm particularly useful for stochastic control environments. We prove that our method is statistically consistent in estimating the performance of a policy in the OPE setting under weak assumptions and for data sets containing entire episodes rather than independent transitions. To establish consistency, we generalize Stone's Theorem, a well-known result in nonparametric statistics on local averaging, to include episodic data and the counterfactual estimation underlying OPE. Numerical experiments demonstrate the effectiveness of the algorithm in a variety of stochastic control settings, including a linear quadratic regulator, trade execution in limit order books, and online stochastic bin packing.
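
To make the resampling idea concrete, the following is a minimal sketch of one plausible reading of the abstract, not the authors' algorithm. It assumes a toy one-dimensional linear system with quadratic cost, a Euclidean metric on (state, action) pairs, and uniform sampling among the K nearest historical transitions. All function names, the toy dynamics, and the policies are hypothetical, and a brute-force search stands in for the tree-based nearest neighbor search mentioned above.

```python
import heapq
import random


def simulate_episode(policy, horizon, rng, explore=0.0):
    """Roll out a toy 1-D linear system (an assumption for illustration).

    Returns a list of transitions (state, action, reward, next_state).
    """
    s = rng.gauss(0.0, 1.0)
    episode = []
    for _ in range(horizon):
        a = policy(s) + explore * rng.gauss(0.0, 1.0)
        r = -(s * s + a * a)  # quadratic cost expressed as a negative reward
        s_next = 0.9 * s + a + 0.1 * rng.gauss(0.0, 1.0)
        episode.append((s, a, r, s_next))
        s = s_next
    return episode


def k_nearest(transitions, s, a, k):
    """Brute-force K-NN in (state, action) space, squared Euclidean distance."""
    return heapq.nsmallest(
        k, transitions, key=lambda t: (t[0] - s) ** 2 + (t[1] - a) ** 2
    )


def knn_resample_value(transitions, initial_states, target_policy,
                       horizon, k, n_paths, rng):
    """Estimate the target policy's expected return from off-policy data.

    Each simulated step queries the K historical transitions closest to
    (current state, target action), draws one uniformly at random, and
    reuses its observed reward and next state.
    """
    total = 0.0
    for _ in range(n_paths):
        s = rng.choice(initial_states)
        episode_return = 0.0
        for _ in range(horizon):
            a = target_policy(s)  # deterministic feedback policy
            _, _, r, s_next = rng.choice(k_nearest(transitions, s, a, k))
            episode_return += r
            s = s_next
        total += episode_return
    return total / n_paths


if __name__ == "__main__":
    rng = random.Random(0)
    behavior = lambda s: -0.5 * s  # policy that generated the data
    target = lambda s: -0.8 * s    # policy to be evaluated
    episodes = [simulate_episode(behavior, 5, rng, explore=0.5)
                for _ in range(200)]
    transitions = [t for ep in episodes for t in ep]
    initial_states = [ep[0][0] for ep in episodes]
    estimate = knn_resample_value(transitions, initial_states, target,
                                  horizon=5, k=5, n_paths=100, rng=rng)
    print("KNN resampling estimate of expected return:", estimate)
```

In this sketch the simulated paths are independent of one another, so the outer loop could be parallelized, and the linear scan in `k_nearest` could be swapped for a k-d tree query on larger data sets.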
