Weakly Supervised Gaussian Networks for Action Detection

04/16/2019
by   Basura Fernando, et al.
0

Detecting temporal extents of human actions in videos is a challenging computer vision problem that require detailed manual supervision including frame-level labels. This expensive annotation process limits deploying action detectors on a limited number of categories. We propose a novel action recognition method, called WSGN, that can learn to detect actions from "weak supervision", video-level labels. WSGN learns to exploit both video-specific and dataset-wide statistics to predict relevance of each frame to an action category. We show that a combination of the local and global channels leads to significant gains in two standard benchmarks THUMOS14 and Charades. Our method improves more than 12 weakly supervised state-of-the-art methods and only 4 state-of-the-art supervised method in THUMOS14 dataset for action detection. Similarly, our method is only 0.3 method on challenging Charades dataset for action localisation.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset