In this paper we explore few-shot imitation learning for control problem...
Trust Region Policy Optimization (TRPO) is an iterative method that simu...
Recent success in Deep Reinforcement Learning (DRL) methods has shown th...
Diffusion models have emerged as powerful generative models in the text-...
We revisit the estimation bias in policy gradients for the discounted ep...
The availability of challenging benchmarks has played a key role in the ...
Randomly masking and predicting word tokens has been a successful approa...
We present a new monotonic improvement guarantee for optimizing decentra...
Proximal Policy Optimization (PPO) methods learn a policy by iteratively...
Social distancing can reduce the infection rates in respiratory pandemic...
Sample efficiency is crucial for imitation learning methods to be applic...
We present SoftDICE, which achieves state-of-the-art performance for imi...
We present JueWu-SL, the first supervised-learning-based artificial inte...
Robot Learning from Demonstration (RLfD) is a technique for robots to de...
Imitation learning targets deriving a mapping from states to actions, a....
Equipping social and service robots with the ability to perceive human e...