Improving Monte Carlo Evaluation with Offline Data

01/31/2023
by   Shuze Liu, et al.

Monte Carlo (MC) methods are the most widely used methods to estimate the performance of a policy. Given a policy of interest, MC methods produce an estimate by repeatedly running this policy to collect samples and averaging the outcomes. Samples collected during this process are called online samples. To obtain an accurate estimate, MC methods require a large number of online samples. When online samples are expensive, as in online recommendation and inventory management, we want to reduce the number of online samples while achieving the same estimation accuracy. To this end, we use off-policy MC methods, which evaluate the policy of interest by running a different policy called the behavior policy. We design a tailored behavior policy such that the variance of the off-policy MC estimator is provably smaller than that of the ordinary MC estimator. Importantly, this tailored behavior policy can be learned efficiently from existing offline data, i.e., previously logged data, which are much cheaper than online samples. With reduced variance, our off-policy MC method requires fewer online samples than the ordinary MC method to evaluate the performance of a policy. Moreover, our off-policy MC estimator is always unbiased.
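To make the contrast between ordinary and off-policy MC concrete, here is a minimal Python sketch on a hypothetical one-step problem with two actions. It is not the method from the paper: the names (`ordinary_mc`, `off_policy_mc`, `variance_reducing_behavior`), the toy policy, and the toy rewards are all assumptions made for illustration, and the behavior policy uses the classical importance-sampling choice mu(a) proportional to pi(a)|r(a)| with rewards assumed known and deterministic. The paper instead learns its behavior policy from offline data.

```python
import random

def ordinary_mc(pi, reward, n, rng):
    """Run the target policy pi itself and average the observed rewards."""
    actions = list(pi)
    probs = [pi[a] for a in actions]
    total = 0.0
    for _ in range(n):
        a = rng.choices(actions, weights=probs)[0]
        total += reward[a]
    return total / n

def off_policy_mc(pi, mu, reward, n, rng):
    """Run the behavior policy mu and reweight each reward by pi(a) / mu(a)."""
    actions = list(mu)
    probs = [mu[a] for a in actions]
    total = 0.0
    for _ in range(n):
        a = rng.choices(actions, weights=probs)[0]
        total += (pi[a] / mu[a]) * reward[a]
    return total / n

def variance_reducing_behavior(pi, reward):
    """Assumed illustrative choice: mu(a) proportional to pi(a) * |reward(a)|."""
    scores = {a: pi[a] * abs(reward[a]) for a in pi}
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}

# Hypothetical toy problem: a rare action carries a large reward.
pi = {"a": 0.9, "b": 0.1}
reward = {"a": 1.0, "b": 10.0}
mu = variance_reducing_behavior(pi, reward)

rng = random.Random(0)
print("ordinary MC  :", ordinary_mc(pi, reward, 100, rng))
print("off-policy MC:", off_policy_mc(pi, mu, reward, 100, rng))
```

In this toy setting every reweighted sample equals the true value (0.9 · 1 + 0.1 · 10 = 1.9), so the off-policy estimate has essentially zero variance while the ordinary estimate fluctuates with the sampled actions. The importance weight pi(a) / mu(a) is what keeps the estimator unbiased, provided mu assigns nonzero probability wherever pi does.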


research
02/09/2023

Efficient Propagation of Uncertainty via Reordering Monte Carlo Samples

Uncertainty analysis in the outcomes of model predictions is a key eleme...
research
06/19/2019

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

We consider the core reinforcement-learning problem of on-policy value f...
research
06/07/2016

Reducing the error of Monte Carlo Algorithms by Learning Control Variates

Monte Carlo (MC) sampling algorithms are an extremely widely-used techni...
research
03/27/2023

Online Non-Destructive Moisture Content Estimation of Filter Media During Drying Using Artificial Neural Networks

Moisture content (MC) estimation is important in the manufacturing proce...
research
02/01/2019

Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits

Monte Carlo (MC) permutation testing is considered the gold standard for...
research
12/31/2019

Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation

Sequence generation models are commonly refined with reinforcement learn...
research
02/19/2022

Graph Reparameterizations for Enabling 1000+ Monte Carlo Iterations in Bayesian Deep Neural Networks

Uncertainty estimation in deep models is essential in many real-world ap...
