Seeing What You're Told: Sentence-Guided Activity Recognition In Video

08/19/2013
by N. Siddharth, et al.

We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, providing a medium not only for top-down and bottom-up integration but also for multi-modal integration between vision and language. We show how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations between participants (prepositions), in the form of whole sentential descriptions mediated by a grammar, guide the activity-recognition process. Further, the utility and expressiveness of our framework are demonstrated by performing three separate tasks in the domain of multi-activity videos: sentence-guided focus of attention, generation of sentential descriptions of video, and query-based video search, simply by leveraging the framework in different manners.


