Inference on Optimal Dynamic Policies via Softmax Approximation

03/08/2023
by   Qizhao Chen, et al.

Estimating optimal dynamic policies from offline data is a fundamental problem in dynamic decision making. In the context of causal inference, the problem is known as estimating the optimal dynamic treatment regime. Although a plethora of estimation methods exists, constructing confidence intervals for the value of the optimal regime and for the structural parameters associated with it is inherently harder, because it involves non-linear and non-differentiable functionals of unknown quantities that must themselves be estimated. Prior work resorted to subsampling approaches, which can degrade the quality of the estimate. We show that a simple softmax approximation to the optimal treatment regime, with an appropriately fast-growing temperature parameter, can achieve valid inference on the truly optimal regime. We illustrate our result for a two-period optimal dynamic regime, though our approach should extend directly to the finite-horizon case. Our work combines techniques from semi-parametric inference and g-estimation with an appropriate triangular array central limit theorem and a novel analysis of the asymptotic influence and asymptotic bias of softmax approximations.
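The softmax idea in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator: it only shows how a softmax-weighted average of per-treatment scores smooths the non-differentiable maximum, and how the gap to the true maximum shrinks as the temperature parameter grows. The function names, the parameter name `beta`, and the numeric scores are all hypothetical, and the convention that a larger `beta` sharpens the weights is an assumption chosen to match the abstract's "growing temperature".

```python
import math


def softmax_weights(scores, beta):
    """Weights exp(beta * s_a) / sum_b exp(beta * s_b), computed stably.

    `beta` is a hypothetical name for the temperature parameter; here a
    larger beta concentrates the weight on the highest score.
    """
    top = max(scores)
    # Subtracting the maximum before exponentiating avoids overflow.
    exps = [math.exp(beta * (s - top)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_value(scores, beta):
    """Softmax-weighted average of the scores, a smooth surrogate for max(scores)."""
    weights = softmax_weights(scores, beta)
    return sum(w * s for w, s in zip(weights, scores))


# Made-up effect estimates for two treatments (illustration only).
scores = [0.3, 0.5]
for beta in (1.0, 10.0, 100.0):
    gap = max(scores) - soft_value(scores, beta)
    print(f"beta={beta:6.1f}  soft value={soft_value(scores, beta):.6f}  gap={gap:.2e}")
```

In this toy setting the surrogate is differentiable in the scores for every finite `beta`, while the gap to the hard maximum is at most log(K)/beta for K treatments. That trade-off between smoothness and approximation bias is what the choice of temperature growth rate has to balance.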
