Smooth and Efficient Policy Exploration for Robot Trajectory Learning

04/13/2018

Many policy search algorithms have been proposed for robot learning and have proved practical in real robot applications. However, these algorithms still contain hyperparameters, such as the exploration rate, that require manual tuning. Existing methods for designing the exploration rate, whether manual or automatic, may not be general enough or may be hard to apply on a real robot. In this paper, we propose a learning model that updates the exploration rate adaptively, blending the advantages of several previous methods. The updated exploration rate maximizes the lower bound of the expected return and produces smooth trajectories for the robot control system. We test our method on the ball-in-cup problem. The results show that it achieves the same learning outcome as previous methods in fewer iterations.
