Primary Object Segmentation in Aerial Videos via Hierarchical Temporal Slicing and Co-Segmentation
Primary object segmentation plays an important role in understanding videos captured by unmanned aerial vehicles. In this paper, we present APD, a large-scale dataset of 500 aerial videos in which the primary objects are manually annotated on 5,014 sparsely sampled frames. To the best of our knowledge, it is the largest dataset to date for primary object segmentation in aerial videos. From this dataset, we find that most aerial videos contain large-scale scenes and small primary objects, with consistently varying scales and viewpoints. Motivated by these observations, we propose a novel hierarchical temporal slicing approach that repeatedly divides a video into two sub-videos formed by its odd and even frames, respectively. In this manner, an aerial video can be represented by a set of hierarchically organized short video clips. The primary objects shared by these clips are segmented by co-segmentation CNNs trained end-to-end, and the results are finally refined using neighborhood reversible flows. Experimental results show that our approach remarkably outperforms 24 state-of-the-art methods in segmenting primary objects in various types of aerial videos.
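The odd/even division described in the abstract can be illustrated with a minimal Python sketch. It covers only the slicing step, not the co-segmentation CNNs or the flow-based refinement. The function name `temporal_slices` and the stopping rule `max_len` are hypothetical; the abstract does not state when the recursion stops, so the threshold here is an assumption made for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def temporal_slices(frames: Sequence[T], max_len: int = 4) -> List[List[T]]:
    """Recursively split a frame sequence into odd/even sub-sequences.

    Splitting stops once a clip has at most `max_len` frames.
    `max_len` is an assumed stopping rule, not taken from the paper.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    clip = list(frames)
    if len(clip) <= max_len:
        return [clip]
    odd = clip[0::2]   # 1st, 3rd, 5th, ... frames (1-based numbering)
    even = clip[1::2]  # 2nd, 4th, 6th, ... frames
    return temporal_slices(odd, max_len) + temporal_slices(even, max_len)


if __name__ == "__main__":
    # Frame indices stand in for actual video frames.
    for clip in temporal_slices(list(range(8)), max_len=2):
        print(clip)
```

Each level of the recursion halves the temporal sampling rate, so every leaf clip is short yet still spans the whole video. This may be why such clips suit objects whose scale and viewpoint change over time, although the abstract does not spell out that reasoning.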