We study the problem of episodic reinforcement learning in continuous st...
We consider the problem of online linear regression in the stochastic se...
In the fixed budget thresholding bandit problem, an algorithm sequential...
Policy gradient algorithms have proven to be successful in diverse decis...