Many studies focus on improving pretraining or developing new backbones ...
3D dense captioning requires a model to translate its understanding of a...
Recently, with the advancement of the Internet of Things (IoT), WiFi CSI...
Text-video retrieval contains various challenges, including biases comin...
Point-cloud-based 3D deep models have wide applications in many applicatio...
In recent years, research on few-shot learning (FSL) has been fast-growi...
Due to the emergence of powerful computing resources and large-scale ann...
3D dense captioning aims to generate multiple captions localized with th...
In this report, we present our approach for EPIC-KITCHENS-100 Multi-Inst...
With the emergence of social media, voluminous video clips are uploaded ...
Seas of videos are uploaded daily with the popularity of social channels...
As Deep Neural Networks (DNNs) are usually overparameterized and have mi...
Optical flow estimation aims to find the 2D motion field by identifying ...
Point cloud instance segmentation has achieved huge progress with the em...
There has been an emerging paradigm shift from the era of "internet AI" ...
6D object pose estimation is widely applied in robotic tasks such as gra...
Graph-based clustering has shown promising performance in many tasks. A ...
A large amount of annotated training images is critical for training acc...
Recent advances in generative adversarial networks (GANs) have shown gre...
Answering questions according to multi-modal context is a challenging pr...
In this paper, we study how to make clustering benefit from different...
In this paper, we present YoTube, a novel network fusion framework for se...
The YouTube-8M video classification challenge requires teams to classify...
Object proposal has become a popular paradigm to replace exhaustive slid...
Image segmentation refers to the process of dividing an image into nonover...