Given a group of images, co-salient object detection (CoSOD) aims to
hig...
Continual learning aims to learn on non-stationary data streams without
...
This paper aims to establish a generic multi-modal foundation model that...
Image completion with large-scale free-form missing regions is one of th...
Spatio-temporal video grounding (STVG) focuses on retrieving the
spatio-...
The task of temporal grounding aims to locate a video moment in an untrimm...
Vision-and-language navigation (VLN) is a trending topic which aims to
n...
This paper presents Poisoning MorphNet, the first backdoor attack method...
Convolutional Neural Networks (CNNs) are known to rely more on local tex...
Weakly-supervised temporal action localization is a problem of learning ...
High-resolution representations are essential for position-sensitive vis...
Temporal action localization is a recently emerging task, aiming to loca...
Fusing multi-modality information is known to be able to effectively bri...
High-resolution representation learning plays an essential role in many
...
Steganography represents the art of unobtrusively concealing a secret
m...
In recent years, autonomous driving algorithms using low-cost vehicle-mo...
Similarity-based image hashing represents a crucial technique for visual d...
This paper proposes a generic formulation that significantly expedites t...
Stochastic gradient descent (SGD) is a classical method to build l...
Vision problems ranging from image clustering to motion segmentation to
...
This paper describes and provides an initial solution to a novel video
e...
Hyperplane hashing aims at rapidly searching nearest points to a hyperpl...