This paper introduces an interpretable contextual bandit algorithm using...
We propose a theoretical framework for approximate planning and learning...
We study the problem of off-policy critic evaluation in several variants...
The policy gradient theorem is defined based on an objective with respec...