Multimodal Visual Concept Learning with Weakly Supervised Techniques

12/03/2017
by   Giorgos Bouritsas, et al.

Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language to automatically recognize video concepts. Towards this goal, in this paper we use textual cues as a means of supervision, introducing two weakly supervised techniques that extend the Multiple Instance Learning (MIL) framework: Fuzzy Sets Multiple Instance Learning (FSMIL) and Probabilistic Labels Multiple Instance Learning (PLMIL). The former encodes the spatio-temporal imprecision of the linguistic descriptions with Fuzzy Sets, while the latter models different interpretations of each description's semantics with Probabilistic Labels; both are formulated through a convex optimization algorithm. In addition, we provide a novel technique, based on semantic similarity computations, to extract weak labels in the presence of complex semantics. We evaluate our methods on two distinct problems, namely face and action recognition, in the challenging and realistic setting of movies accompanied by their screenplays, contained in the COGNIMUSE database. We show that, on both tasks, our method considerably outperforms a state-of-the-art weakly supervised approach, as well as other baselines.
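To make the idea of encoding temporal imprecision with fuzzy sets more concrete, the following is a minimal illustrative sketch and not the formulation used in the paper. It assumes that each visual instance (for example a face track) has a single timestamp, that a screenplay description is aligned to an approximate interval `[start, end]`, and that membership decays linearly over a hypothetical tolerance `slack` outside that interval. The function names and the trapezoidal shape are assumptions made for illustration only.

```python
def trapezoid_membership(t, start, end, slack):
    """Degree in [0, 1] to which time t belongs to the fuzzy interval.

    Assumption: full membership inside [start, end], linear decay to 0
    over `slack` seconds on either side (a trapezoidal fuzzy set).
    """
    if start <= t <= end:
        return 1.0
    if slack <= 0:
        # Degenerates to a crisp interval when no tolerance is allowed.
        return 0.0
    if t < start:
        return max(0.0, 1.0 - (start - t) / slack)
    return max(0.0, 1.0 - (t - end) / slack)


def fuzzy_bag(instance_times, start, end, slack):
    """Map each instance index to its membership in the description's bag.

    Instances with membership 0 are left out, so the bag contains only
    instances that are at least partially compatible with the description.
    """
    bag = {}
    for idx, t in enumerate(instance_times):
        mu = trapezoid_membership(t, start, end, slack)
        if mu > 0.0:
            bag[idx] = mu
    return bag
```

In a crisp MIL setting an instance either belongs to a bag or does not; under this sketch, an instance slightly outside the aligned interval still contributes, but with a reduced weight.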
