Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

04/06/2016
by Gunnar A. Sigurdsson, et al.

Computer vision has a great potential to help our daily lives by searching for lost keys, watering flowers or reminding us to take a pill. To succeed with such tasks, computer vision methods need to be trained from real and diverse examples of our daily dynamic scenes. Since most of these scenes are not particularly exciting, they typically do not appear on YouTube, in movies, or in TV broadcasts. So how do we collect sufficiently many diverse but boring samples representing our lives? We propose a novel Hollywood in Homes approach to collect such data. Instead of shooting videos in the lab, we ensure diversity by distributing and crowdsourcing the whole process of video creation, from script writing to video recording and annotation. Following this procedure, we collect a new dataset, Charades, with hundreds of people recording videos in their own homes, acting out casual everyday activities. The dataset is composed of 9,848 annotated videos with an average length of 30 seconds, showing activities of 267 people from three continents. Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects. In total, Charades provides 27,847 video descriptions, 66,500 temporally localized intervals for 157 action classes and 41,104 labels for 46 object classes. Using this rich data, we evaluate and provide baseline results for several tasks including action recognition and automatic description generation. We believe that the realism, diversity, and casual nature of this dataset will present unique challenges and new opportunities for the computer vision community.
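The per-video annotation structure described above (free-text descriptions, temporally localized intervals over 157 action classes, and labels for 46 object classes) can be sketched as a small record type. The field names, the `parse_intervals` helper, and the "class start end;..." interval encoding below are illustrative assumptions for this sketch, not the dataset's official file format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ActionInterval:
    action_class: str   # e.g. "c092", one of the 157 action classes
    start_sec: float    # interval start within the ~30-second video
    end_sec: float      # interval end

@dataclass
class CharadesVideo:
    video_id: str
    descriptions: List[str] = field(default_factory=list)   # free-text descriptions
    actions: List[ActionInterval] = field(default_factory=list)  # temporally localized actions
    objects: List[str] = field(default_factory=list)         # interacted-object classes

def parse_intervals(raw: str) -> List[ActionInterval]:
    """Parse a semicolon-separated 'class start end' string into intervals (assumed encoding)."""
    intervals = []
    for chunk in raw.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        cls, start, end = chunk.split()
        intervals.append(ActionInterval(cls, float(start), float(end)))
    return intervals

if __name__ == "__main__":
    # Hypothetical example record; values are made up for illustration.
    video = CharadesVideo(
        video_id="XYZ12",
        descriptions=["A person walks into the kitchen and pours a glass of water."],
        actions=parse_intervals("c092 11.90 21.20;c147 0.00 12.60"),
        objects=["cup", "table"],
    )
    print(video.actions[0])
```

A record like this makes the tasks evaluated in the paper concrete: action recognition uses the interval labels, while description generation is supervised by the free-text descriptions.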


research
05/01/2020

The AVA-Kinetics Localized Human Actions Video Dataset

This paper describes the AVA-Kinetics localized human actions video data...
research
04/08/2018

Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

First-person vision is gaining interest as it offers a unique viewpoint ...
research
01/09/2018

Moments in Time Dataset: one million videos for event understanding

We present the Moments in Time Dataset, a large-scale human-annotated co...
research
11/07/2016

Crowdsourcing in Computer Vision

Computer vision systems require large amounts of manually annotated data...
research
05/26/2023

CVB: A Video Dataset of Cattle Visual Behaviors

Existing image/video datasets for cattle behavior recognition are mostly...
research
03/20/2020

Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation

Thanks to the substantial and explosively increased instructional video...
research
03/04/2020

ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize Daily Activities of the Elderly

Deep learning, based on which many modern algorithms operate, is well kn...
