Current research on cross-modal retrieval is mostly English-oriented, as...
Zero-shot video recognition (ZSVR) is a task that aims to recognize vide...
Recent studies have shown that higher accuracy on ImageNet usually leads...
Few-shot image generation aims to generate data of an unseen category ba...
Few-shot image generation is a challenging task since it aims to generat...
In neuroimaging analysis, functional magnetic resonance imaging (fMRI) c...
Referring video object segmentation aims to segment the object referred ...
Weakly supervised Referring Expression Grounding (REG) aims to ground a
...
In recent years, creative content generations like style transfer and ne...
Conditional image generation is an active research topic including text2...
Few-shot image generation is a challenging task even using the
state-of-...
Neural networks often make predictions relying on the spurious correlati...
Event analysis in untrimmed videos has attracted increasing attention du...
Future activity anticipation is a challenging problem in egocentric visi...
Dense video captioning (DVC) aims to generate multi-sentence description...
Current state-of-the-art approaches for image captioning typically adopt...
Language bias is a critical issue in Visual Question Answering (VQA), wh...
Due to the domain discrepancy in visual domain adaptation, the performan...
Semi-supervised domain adaptation (SSDA) aims to solve tasks in target d...
Multimedia content is of predominance in the modern Web era. Investigati...
We study the query-based attack against image retrieval to evaluate its
...
Adversarial attack is a technique for deceiving Machine Learning (ML) mo...
In visual domain adaptation (DA), separating the domain-specific
charact...
Semantic editing on segmentation map has been proposed as an intermediat...
To get more accurate saliency maps, recent methods mainly focus on
aggre...
With the rapid development of facial manipulation techniques, face forge...
Vehicle Re-Identification is to find images of the same vehicle from var...
Active learning is to design label-efficient algorithms by sampling the ...
In unsupervised domain adaptation, rich domain-specific characteristics ...
The learning of the deep networks largely relies on the data with
human-...
Most of existing salient object detection models have achieved great pro...
Weakly supervised referring expression grounding (REG) aims at localizin...
Weakly supervised referring expression grounding aims at localizing the
...
Multimodal learning aims to discover the relationship between multiple
m...
We address the unsupervised open domain recognition (UODR) problem, wher...
In video captioning task, the best practice has been achieved by
attenti...