What and When to Look?: Temporal Span Proposal Network for Video Visual Relation Detection

07/15/2021
by Sangmin Woo, et al.

Identifying relations between objects is central to scene understanding. While several works have addressed relation modeling in the image domain, progress in the video domain has been far more constrained because of the challenging dynamics of spatio-temporal interactions (e.g., Between which objects is there an interaction? When do relations start and end?). To date, two representative approaches have been proposed to tackle Video Visual Relation Detection (VidVRD): segment-based and window-based. We first point out the limitations of these two approaches and then propose the Temporal Span Proposal Network (TSPN), a novel method with two advantages in terms of efficiency and effectiveness. 1) TSPN tells what to look at: it sparsifies the relation search space by scoring the relationness (i.e., a confidence score for the existence of a relation between a pair of objects) of each object pair. 2) TSPN tells when to look: it leverages the full video context to simultaneously predict the temporal spans and categories of all relations. TSPN achieves a new state of the art by a significant margin on two VidVRD benchmarks (ImageNet-VidVRD and VidOR) while also having lower time complexity than existing methods; in particular, it is twice as efficient as a popular segment-based approach.
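The two ideas in the abstract ("what to look at" via relationness-based pruning and "when to look" via temporal span prediction) can be illustrated with a small sketch. This is not the authors' implementation: the linear relationness scorer on concatenated subject/object features, the top-k pruning rule, and the start/end span decoder are assumptions chosen for illustration, and the names `relationness`, `top_k_pairs`, and `decode_span` are hypothetical.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (subject index, object index)


def relationness(subj: Sequence[float], obj: Sequence[float],
                 weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical scorer: sigmoid of a linear layer over [subject; object].

    Concatenation keeps the score asymmetric, so (a, b) and (b, a) can differ.
    """
    features = list(subj) + list(obj)
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def top_k_pairs(obj_feats: Sequence[Sequence[float]], weights: Sequence[float],
                k: int, bias: float = 0.0) -> List[Tuple[Pair, float]]:
    """Score all N*(N-1) ordered pairs and keep only the k highest-scoring ones."""
    scored = [((i, j), relationness(obj_feats[i], obj_feats[j], weights, bias))
              for i, j in permutations(range(len(obj_feats)), 2)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def decode_span(start_logits: Sequence[float],
                end_logits: Sequence[float]) -> Tuple[int, int]:
    """Return (start, end) with start <= end maximizing start + end logits.

    Runs in O(T) by tracking the best start index seen up to each end frame.
    """
    best, best_score = (0, 0), -math.inf
    best_start = 0  # argmax of start_logits[0..e]
    for e in range(len(end_logits)):
        if start_logits[e] > start_logits[best_start]:
            best_start = e
        score = start_logits[best_start] + end_logits[e]
        if score > best_score:
            best, best_score = (best_start, e), score
    return best


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # toy per-object features
    weights = [1.0, 0.0, 0.0, 2.0]                 # toy scorer weights
    print(top_k_pairs(feats, weights, k=2))
    print(decode_span([0.1, 2.0, 0.3, 0.0], [3.0, 0.0, 0.5, 1.5]))
```

With N detected objects there are N(N-1) ordered candidate pairs; in this sketch only the k most relation-like pairs are passed on to span prediction, which is where the efficiency gain would come from. The span decoder enforces start <= end: in the toy call above, picking the start and end argmaxes independently would yield the invalid span (1, 0), while the joint decoder returns (1, 3).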

Related research

Relation Distillation Networks for Video Object Detection (08/26/2019)
It has been well recognized that modeling object-to-object relations wou...

Long Short-Term Relation Networks for Video Action Detection (03/31/2020)
It has been well recognized that modeling human-object or object-object ...

Social Fabric: Tubelet Compositions for Video Relation Detection (08/18/2021)
This paper strives to classify and detect the relationship between objec...

Hand-Object Interaction Reasoning (01/13/2022)
This paper proposes an interaction reasoning network for modelling spati...

MRSN: Multi-Relation Support Network for Video Action Detection (04/24/2023)
Action detection is a challenging video understanding task, requiring mo...

Knowledge Guided Bidirectional Attention Network for Human-Object Interaction Detection (07/16/2022)
Human Object Interaction (HOI) detection is a challenging task that requ...

Temporal Context Network for Activity Localization in Videos (08/08/2017)
We present a Temporal Context Network (TCN) for precise temporal localiz...