Social Fabric: Tubelet Compositions for Video Relation Detection

08/18/2021
by Shuo Chen, et al.

This paper strives to classify and detect the relationship between object tubelets appearing within a video as a <subject-predicate-object> triplet. Where existing works treat object proposals or tubelets as single entities and model their relations a posteriori, we propose to classify and detect predicates for pairs of object tubelets a priori. We also propose Social Fabric: an encoding that represents a pair of object tubelets as a composition of interaction primitives. These primitives are learned over all relations, resulting in a compact representation able to localize and classify relations from the pool of co-occurring object tubelets across all timespans in a video. The encoding enables our two-stage network. In the first stage, we train Social Fabric to suggest proposals that are likely to be interacting. In the second stage, we use Social Fabric to simultaneously fine-tune and predict predicate labels for the tubelets. Experiments demonstrate the benefit of early video relation modeling, of our encoding, and of the two-stage architecture, leading to a new state of the art on two benchmarks. We also show how the encoding enables query-by-primitive-example to search for spatio-temporal video relations. Code: https://github.com/shanshuo/Social-Fabric.
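The abstract does not say how a tubelet pair is turned into "a composition of interaction primitives". Purely as an illustration of the idea, the sketch below assumes a simple soft-assignment (bag-of-primitives) scheme: each timestep's pair feature is softly assigned to K primitive vectors, and the assignments are averaged over time into one compact vector. Query-by-primitive-example is then approximated as ranking stored encodings by cosine similarity to a query. This is not the authors' implementation: the functions `encode_pair` and `query_by_primitive_example` and the toy primitives are hypothetical, and in the paper the primitives are learned rather than fixed by hand.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def encode_pair(pair_features: Sequence[Sequence[float]],
                primitives: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical encoding of one subject-object tubelet pair.

    Each timestep's pair feature is softly assigned to every primitive
    (softmax over dot products); the assignments are averaged over time,
    so the result is a K-dim composition whose weights sum to 1.
    """
    weights = [0.0] * len(primitives)
    for feat in pair_features:
        assign = softmax([dot(feat, p) for p in primitives])
        for i, a in enumerate(assign):
            weights[i] += a
    return [w / len(pair_features) for w in weights]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def query_by_primitive_example(query: Sequence[float],
                               encodings: Sequence[Sequence[float]]) -> List[int]:
    # Return indices of stored encodings, most similar to the query first.
    return sorted(range(len(encodings)),
                  key=lambda i: cosine(query, encodings[i]),
                  reverse=True)


# Toy usage: two hand-picked 2-D primitives (hypothetical, not learned).
primitives = [[1.0, 0.0], [0.0, 1.0]]
enc_a = encode_pair([[5.0, 0.0], [5.0, 0.0]], primitives)  # leans on primitive 0
enc_b = encode_pair([[0.0, 5.0], [0.0, 5.0]], primitives)  # leans on primitive 1
ranking = query_by_primitive_example([1.0, 0.0], [enc_a, enc_b])
```

Averaging soft assignments keeps the encoding the same size regardless of how long the tubelet pair lasts, which loosely mirrors the "compact representation" the abstract describes. A query that emphasizes one primitive retrieves pairs whose composition is dominated by it.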

research
12/08/2021

Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs

Today's VidSGG models are all proposal-based methods, i.e., they first g...
research
07/15/2021

What and When to Look?: Temporal Span Proposal Network for Video Visual Relation Detection

Identifying relations between objects is central to understanding the sc...
research
08/26/2019

Relation Distillation Networks for Video Object Detection

It has been well recognized that modeling object-to-object relations wou...
research
08/19/2021

Video Relation Detection via Tracklet based Visual Transformer

Video Visual Relation Detection (VidVRD), has received significant atten...
research
02/12/2019

You Only Look & Listen Once: Towards Fast and Accurate Visual Grounding

Visual Grounding (VG) aims to locate the most relevant region in an imag...
research
06/10/2020

H3DNet: 3D Object Detection Using Hybrid Geometric Primitives

We introduce H3DNet, which takes a colorless 3D point cloud as input and...
research
10/25/2021

Diagnosing Errors in Video Relation Detectors

Video relation detection forms a new and challenging problem in computer...