Subjects and Their Objects: Localizing Interactees for a Person-Centric View of Importance

04/17/2016
by Chao-Yeh Chen, et al.

Understanding images with people often entails understanding their interactions with other objects or people. As such, given a novel image, a vision system ought to infer which other objects/people play an important role in a given person's activity. However, existing methods are limited to learning action-specific interactions (e.g., how the pose of a tennis player relates to the position of his racquet when serving the ball) for improved recognition, making them unequipped to reason about novel interactions with actions or objects unobserved in the training data. We propose to predict the "interactee" in novel images---that is, to localize the object of a person's action. Given an arbitrary image with a detected person, the goal is to produce a saliency map indicating the most likely positions and scales where that person's interactee would be found. To that end, we explore ways to learn the generic, action-independent connections between (a) representations of a person's pose, gaze, and scene cues and (b) the interactee object's position and scale. We provide results on a newly collected UT Interactee dataset spanning more than 10,000 images from SUN, PASCAL, and COCO. We show that the proposed interaction-informed saliency metric has practical utility for four tasks: contextual object detection, image retargeting, predicting object importance, and data-driven natural language scene description. All four scenarios reveal the value in linking the subject to its object in order to understand the story of an image.
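To make the task concrete, the sketch below illustrates the general idea of an action-independent mapping from person cues to an interactee saliency map: given a detected person's cues (here a toy feature of gaze direction and box aspect ratio), nearby training examples vote for where the interactee's center is likely to lie relative to the person. This is only a minimal illustration, not the paper's model; the feature vector, training pairs, grid size, and Gaussian kernel width are all invented for the example.

```python
# Minimal sketch (not the authors' implementation): an action-independent
# mapping from person cues to an interactee saliency map via nearest
# neighbors over toy training pairs. All features and numbers are assumptions.
import numpy as np

# Toy "training set": (person cue vector, interactee offset dx, dy, relative scale),
# with offsets and scale normalized by the person box height.
TRAIN = [
    (np.array([ 1.0, 0.0, 0.4]), ( 0.8,  0.1, 0.3)),   # looking right -> object to the right
    (np.array([-1.0, 0.0, 0.4]), (-0.7,  0.0, 0.4)),   # looking left  -> object to the left
    (np.array([ 0.0, 1.0, 0.9]), ( 0.0, -0.6, 0.5)),   # looking up    -> object above
]

def predict_interactee_saliency(person_feat, person_box, img_shape, k=2, sigma=0.15):
    """Return an HxW saliency map over likely interactee centers.

    person_feat : toy cue vector (gaze dx, gaze dy, box aspect ratio)
    person_box  : (x, y, w, h) of the detected person, in pixels
    img_shape   : (H, W) of the image
    """
    H, W = img_shape
    x, y, w, h = person_box
    cx, cy = x + w / 2.0, y + h / 2.0

    # k nearest training cues by Euclidean distance in feature space.
    dists = [np.linalg.norm(person_feat - f) for f, _ in TRAIN]
    neighbors = [TRAIN[i] for i in np.argsort(dists)[:k]]

    # Render each neighbor's predicted offset as a Gaussian bump, weighted by
    # feature similarity; predicted scale could be handled analogously.
    ys, xs = np.mgrid[0:H, 0:W]
    saliency = np.zeros((H, W))
    for (feat, (dx, dy, _rel_scale)), d in zip(neighbors, sorted(dists)[:k]):
        px, py = cx + dx * h, cy + dy * h          # predicted interactee center
        weight = 1.0 / (1e-6 + d)
        bump = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * (sigma * h) ** 2))
        saliency += weight * bump

    return saliency / (saliency.max() + 1e-12)

if __name__ == "__main__":
    heat = predict_interactee_saliency(
        person_feat=np.array([0.9, 0.1, 0.45]),    # roughly rightward gaze
        person_box=(100, 80, 60, 160),
        img_shape=(240, 320),
    )
    print("peak saliency at (y, x):", np.unravel_index(heat.argmax(), heat.shape))
```

The design point the sketch is meant to convey is that the prediction is conditioned only on generic person cues (pose, gaze, scale), never on an action or object category, which is what lets the approach generalize to interactions unseen during training.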


