Current prevailing Video Object Segmentation (VOS) methods usually perfo...
Object modeling has become a core part of recent tracking frameworks. Cu...
Heatmap-based methods have become the mainstream method for pose estimat...
While transformer models have been demonstrated to be effective for natu...
Transformer-based trackers have achieved strong accuracy on the standard...
Video Frame Interpolation (VFI) aims to synthesize non-existent intermed...
Streaming video clips with large-scale video tokens impede vision
transf...
Multi-object tracking in sports scenes plays a critical role in gatherin...
The relation modeling between actors and scene context advances video ac...
Extending the success of 2D Large Kernel to 3D perception is challenging...
Traditional video action detectors typically adopt the two-stage pipelin...
Effectively extracting inter-frame motion and appearance information is
...
Current RGB-D scene recognition approaches often train two standalone
ba...
Image super-resolution (SR) serves as a fundamental tool for the process...
In this technical report, we represent our solution for the Human-centri...
Point-cloud-based 3D classification task involves aggregating features f...
Runtime and memory consumption are two important aspects for efficient i...
Tracking often uses a multi-stage pipeline of feature extraction, target...
Generic Boundary Detection (GBD) aims at locating general boundaries tha...
Deep convolutional neural networks have been demonstrated to be effectiv...
Normalization like Batch Normalization (BN) is a milestone technique to
...
The existing few-shot video classification methods often employ a
meta-l...
The classification and regression head are both indispensable components...
Temporal grounding aims to localize a video moment which is semantically...
Deep learning has achieved remarkable progress for visual recognition on...
This paper deals with a challenging task of video scene graph generation...
Three-dimensional face dense alignment and reconstruction in the wild is...
Along with the rapid development of real-world applications, higher
requ...
Spatio-temporal action detection is an important and challenging problem...
Frame sampling is a fundamental problem in video action recognition due ...
Accurate tracking is still a challenging task due to appearance variatio...
Temporal action proposal generation is an important and challenging task...
Temporal modeling still remains challenging for action recognition in vi...
Recent advances in single image super-resolution (SISR) explored the pow...
This paper reviews the AIM 2020 challenge on efficient single image
supe...
Video action detection approaches usually conduct actor-centric action
r...
Discriminative training has turned out to be effective for robust tracki...
The existing action tubelet detectors mainly depend on heuristic anchor ...
Recent research on human pose estimation has achieved significant
improv...
Spatial downsampling layers are favored in convolutional neural networks...
Due to the high cost of manual annotation, learning directly from the we...
Cross-modal transfer is helpful to enhance modality-specific discriminat...
Modeling relation between actors is important for recognizing group acti...