
-
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Vision-and-language pre-training has achieved impressive success in lear...
-
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
This work concerns video-language pre-training and representation learni...
-
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
The canonical approach to video-and-language learning (e.g., video quest...
-
Temporally Guided Articulated Hand Pose Tracking in Surgical Videos
Articulated hand pose tracking is an underexplored problem that carries ...
-
Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
Transformer has become ubiquitous in the deep learning field. One of the...
-
Unified Vision-Language Pre-Training for Image Captioning and VQA
This paper presents a unified Vision-Language Pre-training (VLP) model. ...
-
Grounded Video Description
Video description is one of the most challenging problems in vision and ...
-
Dynamic Graph Modules for Modeling Higher-Order Interactions in Activity Recognition
Video action recognition, as a critical problem towards video understand...
-
Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction
We study weakly-supervised video object grounding: given a video segment...
-
End-to-End Dense Video Captioning with Masked Transformer
Dense video captioning aims to generate text descriptions for all events...
-
Towards Automatic Learning of Procedures from Web Instructional Videos
The potential for agents, whether embodied or software, to learn by obse...
-
Watch What You Just Said: Image Captioning with Text-Conditional Attention
Attention mechanisms have attracted considerable interest in image capti...
-
Multi-agent Reinforcement Learning with Sparse Interactions by Negotiation and Knowledge Transfer
Reinforcement learning has significant applications for multi-agent syst...