When hearing music, it is natural for people to dance to its rhythm.
Aut...
Open-vocabulary object detection aims to provide object detectors traine...
We present a simple yet effective end-to-end Video-language Pre-training...
Music is essential when editing videos, but selecting music manually is
...
Conventional knowledge distillation (KD) methods for object detection ma...
Multi-object Tracking (MOT) generally can be split into two sub-tasks, i...
The task of Human-Object Interaction (HOI) detection could be divided in...
Two-stage methods have dominated Human-Object Interaction (HOI) detectio...
Vision and language understanding techniques have achieved remarkable
pr...
In this work, we introduce a novel task - Humancentric Spatio-Temporal V...
Keypoint-based detectors have achieved pretty-well performance. However,...
We propose a single-stage Human-Object Interaction (HOI) detection metho...
Referring expression comprehension aims to localize the object instance
...