Recently, deep cross-modal hashing has gained increasing attention. Howe...
The Zero-Shot Sketch-based Image Retrieval (ZS-SBIR) is a challenging ta...
Recently, several Vision Transformer (ViT) based methods have been propo...
Online tracking of multiple objects in videos requires strong capacity o...
We propose a hierarchical graph neural network (GNN) model that learns h...
Human communication is multimodal in nature; it is through multiple
moda...
With the rapid development of social websites, recent years have witness...
3D Multi-object tracking (MOT) is crucial to autonomous systems. Recent ...
Question answering biases in video QA datasets can mislead multimodal mo...
Object detection and data association are critical components in multi-o...
3D Multi-object tracking (MOT) is crucial to autonomous systems. Recent ...
We address the problem of detecting attention targets in video. Specific...
This paper addresses the challenging problem of estimating the general v...