Weakly Supervised Learning of Heterogeneous Concepts in Videos

07/12/2016
by Sohil Shah, et al.

Typical textual descriptions that accompany online videos are 'weak', i.e., they mention the main concepts in the video but not their corresponding spatio-temporal locations. The concepts in the description are typically heterogeneous (e.g., objects, persons, actions). Certain location constraints on these concepts can also be inferred from the description. The goal of this paper is to present a generalization of the Indian Buffet Process (IBP) that can (a) systematically incorporate heterogeneous concepts in an integrated framework, and (b) enforce location constraints, for efficient classification and localization of the concepts in the videos. We then develop posterior inference for the proposed formulation using a mean-field variational approximation. Comparative evaluations on the Casablanca and A2D datasets show that the proposed approach significantly outperforms other state-of-the-art techniques: a 24% improvement in classification on the Casablanca dataset and a 9% improvement in localization on the A2D dataset, compared to the most competitive baseline.
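For readers unfamiliar with the Indian Buffet Process mentioned above, the following is a minimal sketch of the standard (textbook) IBP generative process only. It is not the paper's generalization: it does not model heterogeneous concepts or location constraints, and it samples from the prior rather than performing posterior inference. The function names `sample_poisson` and `sample_ibp`, the parameter `alpha`, and the reading of rows as video tracks and columns as latent concepts are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_customers, alpha, seed=0):
    """Sample a binary matrix Z from the standard Indian Buffet Process.

    Rows are customers (assumed here to stand for video tracks) and
    columns are dishes (assumed here to stand for latent concepts).
    """
    rng = random.Random(seed)
    dish_counts = []  # dish_counts[k] = customers who have taken dish k so far
    rows = []
    for i in range(1, num_customers + 1):
        # Customer i takes existing dish k with probability m_k / i.
        row = [1 if rng.random() < m / i else 0 for m in dish_counts]
        for k, taken in enumerate(row):
            dish_counts[k] += taken
        # The customer then tries Poisson(alpha / i) brand-new dishes.
        new_dishes = sample_poisson(alpha / i, rng)
        row.extend([1] * new_dishes)
        dish_counts.extend([1] * new_dishes)
        rows.append(row)
    # Pad earlier rows with zeros so the matrix is rectangular.
    width = len(dish_counts)
    return [r + [0] * (width - len(r)) for r in rows]


if __name__ == "__main__":
    for z_row in sample_ibp(num_customers=6, alpha=2.0, seed=1):
        print(z_row)
```

Each row of the resulting matrix marks which latent features a data point uses, and the number of columns is not fixed in advance. That open-ended feature count is the property that makes the IBP a natural prior when the set of concepts per video is unknown.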


research
10/17/2016

Spatio-Temporal Attention Models for Grounded Video Captioning

Automatic video captioning is challenging due to the complex interaction...
research
12/03/2017

Multimodal Visual Concept Learning with Weakly Supervised Techniques

Despite the availability of a huge amount of video data accompanied by d...
research
07/08/2018

Spatio-Temporal Instance Learning: Action Tubes from Class Supervision

The goal of this paper is spatio-temporal localization of human actions ...
research
11/21/2018

MAC: Mining Activity Concepts for Language-based Temporal Localization

We address the problem of language-based temporal localization in untrim...
research
08/07/2022

Weakly Supervised Online Action Detection for Infant General Movements

To make the earlier medical intervention of infants' cerebral palsy (CP)...
research
04/28/2022

Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos

In videos that contain actions performed unintentionally, agents do not ...
