We focus on the task of approximating the optimal value function in deep...
We study the convergence behavior of the celebrated temporal-difference ...
We study the action generalization ability of deep Q-learning in discret...
We employ Proximal Iteration for value-function optimization in reinforc...
Principled decision-making in continuous state–action spaces is impossib...
Fluid human-agent communication is essential for the future of human-in-...
Can simple algorithms with a good representation solve challenging reinf...
A core operation in reinforcement learning (RL) is finding an action tha...
We consider the problem of knowledge transfer when an agent is facing a ...
Model-based reinforcement learning is an appealing framework for creatin...
An agent with an inaccurate model of its environment faces a difficult c...
When environmental interaction is expensive, model-based reinforcement l...
Learning a generative model is a key component of model-based reinforcem...
Model-based reinforcement-learning methods learn transition and reward m...
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action...
End-to-end learning of recurrent neural networks (RNNs) is an attractive...
Representing a dialog policy as a recurrent neural network (RNN) is attr...
A softmax operator applied to a set of values acts somewhat like the max...