Describing Common Human Visual Actions in Images

06/07/2015
by Matteo Ruggero Ronchi, et al.

Which common human actions and interactions are recognizable in monocular still images? Which involve objects and/or other people? How many actions is a person performing at a time? We address these questions by exploring the actions and interactions that are detectable in the images of the MS COCO dataset. We make two main contributions. First, a list of 140 common 'visual actions', obtained by analyzing the largest on-line verb lexicon currently available for English (VerbNet) and the human-written sentences used to describe images in MS COCO. Second, a complete set of annotations for those 'visual actions', each consisting of a subject-object pair and the associated verb, which we call COCO-a (a for 'actions'). COCO-a is larger than existing action datasets in terms of the number of actions and of instances of those actions, and is unique in being data-driven rather than experimenter-biased. It is also exhaustive, and all subjects and objects are localized. A statistical analysis of the accuracy of our annotations, and of each action, interaction, and subject-object combination, is provided.
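For concreteness, the abstract describes each COCO-a annotation as a localized subject (a person), an optional localized object, and one or more visual actions connecting them. The sketch below shows one way such a record could be represented in Python; the class and field names are illustrative assumptions, not the dataset's actual schema or file format.

    from dataclasses import dataclass, field
    from typing import List, Optional

    # Hypothetical record mirroring the abstract's description: a localized
    # subject, an optional localized object, and the visual actions (verbs)
    # that connect them. All names here are assumptions for illustration.
    @dataclass
    class VisualActionAnnotation:
        image_id: int                       # MS COCO image being annotated
        subject_bbox: List[float]           # [x, y, width, height] of the acting person
        object_bbox: Optional[List[float]]  # None for solo actions such as 'walk'
        actions: List[str] = field(default_factory=list)  # verbs drawn from the 140 visual actions

    # Example: one person both petting and looking at a dog.
    ann = VisualActionAnnotation(
        image_id=42,
        subject_bbox=[120.0, 80.0, 60.0, 150.0],
        object_bbox=[160.0, 180.0, 90.0, 70.0],
        actions=["pet", "look at"],
    )
    print(ann.actions)

Because a person can perform several actions at once, the actions field is a list rather than a single verb, matching the question the paper raises about how many actions one subject performs at a time.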


