End-to-end Active Object Tracking via Reinforcement Learning
In this paper, we propose an active object tracking approach that simultaneously addresses tracking and camera control. Crucially, these two tasks are tackled in an end-to-end manner via reinforcement learning. Specifically, a ConvNet-LSTM function approximator is adopted, which takes as input only visual observations (i.e., frame sequences) and directly outputs camera motions (e.g., move forward, turn left, etc.). The tracker, regarded as an agent, is trained with the A3C algorithm, where we harness environment augmentation techniques and a customized reward function to encourage robust object tracking. We carry out experiments in two types of virtual environments, ViZDoom and Unreal Engine. The resulting tracker automatically attends to the most likely object in the initial frame and tracks it thereafter, without requiring a manual bounding box for initialization. Moreover, our approach generalizes well to unseen object moving paths, object appearances, backgrounds, and distracting objects. The tracker can even recover tracking after it occasionally loses the target.
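To make the "customized reward function" concrete, the sketch below shows one plausible shape for such a reward: it is maximal when the target sits directly in front of the camera at a desired distance with zero relative rotation, and decreases as the target drifts away. All variable names, coefficients, and the exact functional form here are illustrative assumptions, not the paper's verified formulation.

```python
import math

def tracking_reward(x, y, omega, d=2.0, A=1.0, c=1.0, lam=0.5):
    """Hypothetical active-tracking reward (illustrative only).

    (x, y)  -- target position in the camera's local frame
               (y is the forward axis)
    omega   -- target's relative rotation w.r.t. the camera
    d       -- desired tracking distance along the forward axis
    A, c, lam -- scale and penalty coefficients (assumed values)

    The reward peaks at A when the target is at (0, d) with
    omega == 0, and is penalized by the distance from that pose
    and by the magnitude of the relative rotation.
    """
    pose_error = math.sqrt(x * x + (y - d) ** 2) / c
    rotation_penalty = lam * abs(omega)
    return A - (pose_error - 0.0) - rotation_penalty
```

Under this shape, a perfectly centered target at the desired distance yields `tracking_reward(0.0, 2.0, 0.0) == 1.0`, and any lateral offset, distance error, or rotation strictly lowers the reward, which is what pushes the agent's camera-control actions to keep the target framed.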