Generative AI has made significant strides in computer vision, particula...
The emergence of large-scale pretrained language models has posed
unprec...
This study explores the concept of equivariance in vision-language found...
We are interested in learning robust models from insufficient data, with...
The task of video virtual try-on aims to fit the target clothes to a per...
Extracting class activation maps (CAM) is arguably the most standard ste...
A good visual representation is an inference map from observations (imag...
Attention module does not always help deep models learn causal features ...
We unveil a long-standing problem in the prevailing co-saliency detectio...
We present a novel counterfactual framework for both Zero-Shot Learning ...
In this paper, we propose to investigate the problem of out-of-domain
vi...
We present a novel unsupervised feature representation learning method,
...
A major challenge in matching images and text is that they have intrinsi...
In this paper, scalable Whole Slide Imaging (sWSI), a novel high-through...