Off-policy policy evaluation methods for sequential decision making can ...
Model-based reinforcement learning (RL) is appealing because (i) it enab...
Interactive adaptive systems powered by reinforcement learning (RL) have...
When observed decisions depend only on observed features, off-policy pol...
While maximizing expected return is the goal in most reinforcement learn...
Humans learn to play video games significantly faster than state-of-the-...