Action Modifiers: Learning from Adverbs in Instructional Videos

12/13/2019
by Hazel Doughty, et al.

We present a method to learn a representation for adverbs from instructional videos using weak supervision from the accompanying narrations. Key to our method is the fact that the visual representation of an adverb is highly dependent on the action to which it applies, although the same adverb will modify multiple actions in a similar way. For instance, while 'spread quickly' and 'mix quickly' will look dissimilar, we can learn a common representation that allows us to recognize both, among other actions. We formulate this as an embedding problem, and use scaled dot-product attention to learn from weakly-supervised video narrations. We jointly learn adverbs as invertible transformations operating on the embedding space, so as to add or remove the effect of the adverb. As there is no prior work on weakly supervised learning from adverbs, we gather paired action-adverb annotations from a subset of the HowTo100M dataset for 6 adverbs: quickly/slowly, finely/coarsely, and partially/completely. Our method outperforms all baselines for video-to-adverb retrieval with a performance of 0.719 mAP. We also demonstrate our model's ability to attend to the relevant video parts in order to determine the adverb for a given action.
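
To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes toy 2-dimensional features, an action embedding used as the attention query, and a hand-picked 2x2 matrix standing in for a learned adverb transformation; every name and value here (`action_query`, `segment_features`, `quickly`) is hypothetical. It shows scaled dot-product attention pooling video segments into one embedding, and an invertible linear map that adds and then removes an adverb's effect.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(query, keys, values):
    # Score each key against the query, scale by sqrt(d), then
    # return the attention-weighted sum of the values.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(dim)]
    return pooled, weights

def apply_2x2(matrix, vec):
    # Multiply a 2x2 matrix by a 2-vector.
    (a, b), (c, d) = matrix
    return [a * vec[0] + b * vec[1], c * vec[0] + d * vec[1]]

def invert_2x2(matrix):
    # Closed-form inverse of a 2x2 matrix.
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical toy data: one action query and three video segments.
action_query = [1.0, 0.0]
segment_features = [[0.9, 0.1], [0.1, 0.8], [1.0, 0.2]]

# The action attends to the segments most relevant to it.
video_embedding, weights = scaled_dot_product_attention(
    action_query, segment_features, segment_features)

# Hypothetical stand-in for a learned adverb transformation.
quickly = [[1.5, 0.2], [0.0, 0.8]]

# Add the adverb's effect, then remove it with the inverse map.
modified = apply_2x2(quickly, video_embedding)
restored = apply_2x2(invert_2x2(quickly), modified)
```

In this sketch the second segment, which is least aligned with the action query, receives the smallest attention weight, and `restored` matches `video_embedding` up to floating-point error. In a trained model the query, segment features, and transformation would all be learned rather than fixed by hand.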


research
08/19/2019

Weakly-supervised Action Localization with Background Modeling

We describe a latent approach that learns to detect actions in long sequ...
research
10/07/2016

Weakly supervised learning of actions from transcripts

We present an approach for weakly supervised learning of human actions f...
research
07/04/2014

Weakly Supervised Action Labeling in Videos Under Ordering Constraints

We are given a set of video clips, each one annotated with an ordered l...
research
02/04/2020

Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks

We present a method for weakly-supervised action localization based on g...
research
03/02/2022

Weakly Supervised Correspondence Learning

Correspondence learning is a fundamental problem in robotics, which aims...
research
10/22/2019

Weakly-Supervised Completion Moment Detection using Temporal Attention

Monitoring the progression of an action towards completion offers fine g...
research
07/16/2020

Learning End-to-End Action Interaction by Paired-Embedding Data Augmentation

In recognition-based action interaction, robots' responses to human acti...
