Online policy optimization (OPO) views policy optimization for sequentia...
Policy gradient methods have demonstrated success in reinforcement learn...
We present a predictor-corrector framework, called PicCoLO, that can tra...
Sample efficiency is critical in solving real-world reinforcement learni...
Imitation learning (IL) consists of a set of tools that leverage expert ...
Policy evaluation or value function or Q-function approximation is a key...