Speech2Action: Cross-modal Supervision for Action Recognition

03/30/2020
by Arsha Nagrani, et al.

Is it possible to guess human action from dialogue alone? In this work we investigate the link between spoken words and actions in movies. We note that movie screenplays both describe actions and contain the characters' speech, and hence can be used to learn this correlation with no additional supervision. We train a BERT-based Speech2Action classifier on over a thousand movie screenplays to predict action labels from transcribed speech segments. We then apply this model to the speech segments of a large unlabelled movie corpus (188M speech segments from 288K movies). Using the predictions of this model, we obtain weak action labels for over 800K video clips. By training on these video clips, we demonstrate superior action recognition performance on standard action recognition benchmarks, without using a single manually labelled action example.
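The weak-labelling step described above (classify each transcribed speech segment, keep confident predictions, and turn them into labelled video clips) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_predict_proba` is a hypothetical keyword scorer standing in for the BERT-based Speech2Action classifier, and the confidence `threshold`, the `context` padding around each segment, and the example actions are illustrative choices, not settings taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SpeechSegment:
    movie_id: str
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # transcribed speech


@dataclass
class WeakLabel:
    movie_id: str
    start: float  # clip start time in seconds
    end: float    # clip end time in seconds
    action: str
    score: float  # classifier confidence for the chosen action


# Hypothetical keyword lists: a toy stand-in for a trained text classifier.
KEYWORDS: Dict[str, set] = {
    "phone": {"hello", "call"},
    "drive": {"car", "road"},
    "run": {"run", "hurry"},
}


def toy_predict_proba(text: str) -> Dict[str, float]:
    """Return one probability per action from keyword counts (add-one smoothed)."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    counts = {action: 1 + len(words & kw) for action, kw in KEYWORDS.items()}
    total = sum(counts.values())
    return {action: c / total for action, c in counts.items()}


def weak_label_clips(
    segments: List[SpeechSegment],
    predict_proba: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,  # hypothetical confidence cut-off
    context: float = 1.0,    # hypothetical padding (seconds) around the speech
) -> List[WeakLabel]:
    """Keep only confident predictions and turn each one into a labelled clip."""
    labels = []
    for seg in segments:
        probs = predict_proba(seg.text)
        # Pick the most probable action for this segment.
        action, score = max(probs.items(), key=lambda kv: kv[1])
        if score >= threshold:
            # Pad the speech interval so the clip also covers nearby visual context.
            labels.append(WeakLabel(seg.movie_id, max(0.0, seg.start - context),
                                    seg.end + context, action, score))
    return labels


segments = [
    SpeechSegment("movie_a", 10.0, 12.5, "Hello? Who is this call from?"),
    SpeechSegment("movie_a", 40.0, 42.0, "Nice weather today."),
    SpeechSegment("movie_b", 0.5, 1.0, "Hurry, run!"),
]
for label in weak_label_clips(segments, toy_predict_proba):
    print(label)
```

Here the second segment carries no cue for any action, so its best score stays below the threshold and no clip is produced for it. Raising `threshold` typically trades recall for precision: fewer clips are labelled, but the surviving labels are more likely to be correct, which matters when they serve as training targets without manual checking.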


Related research

Improving Human Action Recognition by Non-action Classification (04/21/2016)
In this paper we consider the task of recognizing human actions in reali...

Action Recognition from Single Timestamp Supervision in Untrimmed Videos (04/09/2019)
Recognising actions in videos relies on labelled supervision during trai...

Learning Visual Actions Using Multiple Verb-Only Labels (07/25/2019)
This work introduces verb-only representations for both recognition and ...

Multitask Learning to Improve Egocentric Action Recognition (09/15/2019)
In this work we employ multitask learning to capitalize on the structure...

An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition (10/10/2022)
Precisely naming the action depicted in a video can be a challenging and...

A Temporal Sequence Learning for Action Recognition and Prediction (06/17/2019)
In this work [This work was supported in part by the National Science Fou...

Curvature: A signature for Action Recognition in Video Sequences (04/30/2019)
In this paper, a novel signature of human action recognition, namely the...
