Entropy Regularization for Population Estimation

08/24/2022
by Ben Chugg, et al.

Entropy regularization is known to improve exploration in sequential decision-making problems. We show that this same mechanism can also lead to nearly unbiased and lower-variance estimates of the mean reward in the optimize-and-estimate structured bandit setting. Mean reward estimation (i.e., population estimation) tasks have recently been shown to be essential in public policy settings, where legal constraints often require precise estimates of population metrics. We show that leveraging entropy and KL-divergence regularization can yield a better trade-off between reward and estimator variance than existing baselines, all while remaining nearly unbiased. These properties of entropy regularization illustrate an exciting potential for bridging the optimal exploration and estimation literatures.
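The mechanism described above can be illustrated with a deliberately simplified sketch. The assumptions here are ours, not the paper's: a single batch of with-replacement draws instead of sequential rounds, a known score for every unit, and a Hansen-Hurwitz (inverse-propensity) estimator standing in for whatever estimator the authors actually use. The sketch relies on one standard fact: maximizing expected score minus τ times the KL divergence to a reference distribution has a closed-form softmax-style solution. With a uniform reference this is the same as adding τ times the entropy. As a result, every unit keeps a nonzero sampling probability and the importance weights stay finite. The names `regularized_policy`, `ipw_mean`, `simulate`, and the synthetic data are hypothetical.

```python
import numpy as np


def regularized_policy(scores, tau, reference=None):
    """Maximize <p, scores> - tau * KL(p || reference) over the probability simplex.

    The closed-form maximizer is p_i proportional to reference_i * exp(scores_i / tau).
    With a uniform reference this equals maximizing <p, scores> + tau * H(p),
    i.e. plain entropy regularization (a softmax with temperature tau).
    """
    scores = np.asarray(scores, dtype=float)
    if reference is None:
        reference = np.full(scores.shape, 1.0 / scores.size)
    logits = np.log(np.asarray(reference, dtype=float)) + scores / tau
    logits -= logits.max()  # shift for numerical stability; does not change p
    weights = np.exp(logits)
    return weights / weights.sum()


def ipw_mean(rewards, probs, sampled):
    """Hansen-Hurwitz (inverse-propensity) estimate of the population mean.

    Each with-replacement draw i contributes rewards[i] / probs[i], an unbiased
    estimate of the population total whenever every probs[i] > 0.
    """
    total_estimate = np.mean(rewards[sampled] / probs[sampled])
    return total_estimate / rewards.size


def simulate(tau, n_units=1000, n_draws=100, n_reps=200, seed=0):
    """Return (true mean, mean of estimates, std of estimates) over repeated batches."""
    rng = np.random.default_rng(seed)
    rewards = rng.uniform(0.0, 1.0, size=n_units)           # hypothetical true rewards
    scores = rewards + rng.normal(0.0, 0.1, size=n_units)   # hypothetical model predictions
    probs = regularized_policy(scores, tau)
    estimates = np.array([
        ipw_mean(rewards, probs, rng.choice(n_units, size=n_draws, replace=True, p=probs))
        for _ in range(n_reps)
    ])
    return rewards.mean(), estimates.mean(), estimates.std()


if __name__ == "__main__":
    for tau in (0.05, 0.2, 1.0):
        truth, est_mean, est_std = simulate(tau)
        print(f"tau={tau}: true mean={truth:.3f}, estimate={est_mean:.3f} +/- {est_std:.3f}")
```

Smaller `tau` concentrates draws on high-scoring units (more reward). Larger `tau` flattens the sampling distribution toward the reference. Comparing the printed spreads across values of `tau` gives a rough feel for the reward/variance trade-off, though this toy makes no claim to reproduce the paper's experiments.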

research
04/25/2022

Integrating Reward Maximization and Population Estimation: Sequential Decision-Making for Internal Revenue Service Audit Selection

We introduce a new setting, optimize-and-estimate structured bandits. He...
research
01/28/2022

Do You Need the Entropy Reward (in Practice)?

Maximum entropy (MaxEnt) RL maximizes a combination of the original task...
research
10/19/2021

Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization

Entropy regularization is an efficient technique for encouraging explora...
research
11/06/2021

SOPE: Spectrum of Off-Policy Estimators

Many sequential decision making problems are high-stakes and require off...
research
03/23/2022

Your Policy Regularizer is Secretly an Adversary

Policy regularization methods such as maximum entropy regularization are...
research
10/21/2021

Can Q-learning solve Multi Armed Bantids?

When a reinforcement learning (RL) method has to decide between several ...
research
12/25/2021

Variance Reduction for Experiments with One-Sided Triggering using CUPED

In online experimentation, trigger-dilute analysis is an approach to obt...