We apply neural nets with ReLU gates in online reinforcement learning. O...
We consider off-policy temporal-difference (TD) learning methods for pol...
In this paper, we propose a new lower approximation scheme for POMDPs wit...
We consider the estimation of the policy gradient in partially observabl...