Image-Text Matching (ITM) task, a fundamental vision-language (VL) task,...
Recently, large-scale vision-language pre-training models and visual sem...
Recovering 3D human mesh in the wild is greatly challenging as in-the-wi...
This paper proposes a novel diffusion-based model, CompoDiff, for solvin...
We need billion-scale images to achieve more generalizable and
ground-br...
Many existing group fairness-aware training methods aim to achieve the g...
Vision Transformer (ViT) extracts the final representation from either c...
In this paper, we aim to design a quantitative similarity function betwe...
We propose the first unified theoretical analysis of mixed sample data
a...
Image-Test matching (ITM) is a common task for evaluating the quality of...
Domain generalization (DG) aims to learn a generalized model to an unsee...
Recent studies have demonstrated that gradient matching-based dataset
sy...
Automatic few-shot font generation aims to solve a well-defined, real-wo...
Recently, fairness-aware learning have become increasingly crucial, but ...
Deep neural networks (DNNs) often rely on easy-to-learn discriminatory
f...
Recent powerful vision classifiers are biased towards textures, while sh...
Effective control and prediction of dynamical systems often require
appr...
A few-shot font generation (FFG) method has to satisfy two objectives: t...
Vision Transformer (ViT) extends the application range of transformers f...
Cross-modal retrieval methods build a common representation space for sa...
ImageNet has been arguably the most popular image classification benchma...
Automatic few-shot font generation is in high demand because manual desi...
Weakly-supervised object localization (WSOL) has gained popularity over ...
Normalization techniques, such as batch normalization (BN), have led to
...
Generating a new font library is a very labor-intensive and time-consumi...
Despite apparent human-level performances of deep neural networks (DNN),...
Weakly-supervised object localization (WSOL) has gained popularity over ...
Recently, we proposed a self-attention based music tagging model. Differ...
We propose a generic confidence-based approximation that can be plugged ...
Many machine learning algorithms are trained and evaluated by splitting ...
Self-attention is an attention mechanism that learns a representation by...
Regional dropout strategies have been proposed to enhance the performanc...
Recent style transfer models have provided promising artistic results.
H...
We present a hybrid framework that leverages the trade-off between tempo...