Human-centric Spatio-Temporal Video Grounding via the Combination of Mutual Matching Network and TubeDETR

07/09/2022
by Fan Yu, et al.

In this technical report, we present our solution for the Human-centric Spatio-Temporal Video Grounding (HC-STVG) track of the 4th Person in Context (PIC) workshop and challenge. Our solution builds on TubeDETR and the Mutual Matching Network (MMN). Specifically, TubeDETR uses a video-text encoder and a space-time decoder to predict the start time, the end time, and the tube of the target person. MMN detects persons in images, links them into tubes, extracts features of the person tubes and the text description, and predicts the similarities between them to choose the most likely person tube as the grounding result. Finally, our solution refines the results by combining the spatial localization of MMN with the temporal localization of TubeDETR. In the HC-STVG track of the 4th PIC challenge, our solution achieved third place.
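The combination step can be illustrated with a minimal sketch. The abstract does not specify the exact fusion rule, so this assumes the simplest reading: pick the candidate person tube with the highest text similarity (the MMN side), then keep only the boxes that fall inside the predicted temporal segment (the TubeDETR side). The function names `select_tube` and `combine_predictions`, the dictionary-based tube representation, and the inclusive frame indices are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]
# Hypothetical tube format: frame index -> box of the person in that frame.
Tube = Dict[int, Box]


def select_tube(similarities: List[float]) -> int:
    """Return the index of the candidate tube with the highest text similarity."""
    if not similarities:
        raise ValueError("at least one candidate tube is required")
    return max(range(len(similarities)), key=lambda i: similarities[i])


def combine_predictions(
    tubes: List[Tube],
    similarities: List[float],
    segment: Tuple[int, int],
) -> Tube:
    """Crop the best-matching tube to an inclusive (start, end) frame segment.

    Assumption: spatial boxes come from the matching branch and the temporal
    boundaries come from a separate temporal prediction.
    """
    if len(tubes) != len(similarities):
        raise ValueError("one similarity score is required per tube")
    start, end = segment
    best = tubes[select_tube(similarities)]
    # Keep only the frames inside the predicted temporal segment, in order.
    return {t: best[t] for t in sorted(best) if start <= t <= end}


if __name__ == "__main__":
    candidate_tubes = [
        {0: (10.0, 10.0, 50.0, 90.0), 1: (12.0, 10.0, 52.0, 90.0)},
        {1: (100.0, 20.0, 140.0, 95.0), 2: (102.0, 20.0, 142.0, 95.0),
         3: (104.0, 20.0, 144.0, 95.0), 4: (106.0, 20.0, 146.0, 95.0)},
    ]
    scores = [0.2, 0.9]
    print(combine_predictions(candidate_tubes, scores, segment=(2, 3)))
```

Keeping the two branches separate until this last step means either one could be swapped out independently, which is one plausible motivation for combining the models at the result level.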


research
06/14/2021

2rd Place Solutions in the HC-STVG track of Person in Context Challenge 2021

In this technical report, we present our solution to localize a spatio-t...
research
11/10/2020

Human-centric Spatio-Temporal Video Grounding With Visual Transformers

In this work, we introduce a novel task - Human-centric Spatio-Temporal V...
research
03/30/2022

TubeDETR: Spatio-Temporal Video Grounding with Transformers

We consider the problem of localizing a spatio-temporal tube in a video ...
research
07/06/2022

STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic Cross-Modal Understanding

In this technical report, we introduce our solution to human-centric spa...
research
03/16/2023

PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers

Existing methods of multi-person video 3D human Pose and Shape Estimatio...
research
08/15/2023

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

We introduce an object-aware decoder for improving the performance of sp...
research
04/08/2019

Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions

This paper presents a new task, the grounding of spatio-temporal identif...
