This paper proposes a novel diffusion-based model, CompoDiff, for solvin...
We need billion-scale images to achieve more generalizable and
ground-br...
Generated synthetic data in medical research can substitute privacy and
...
Vision Transformer (ViT) extracts the final representation from either c...
Recent studies have proposed unified user modeling frameworks that lever...
Recently, dense contrastive learning has shown superior performance on d...
Image-Test matching (ITM) is a common task for evaluating the quality of...
Periodic signals play an important role in daily lives. Although convent...
Vision-and-Language Pretraining (VLP) has improved performance on variou...
Mutual learning is an ensemble training strategy to improve generalizati...
Without relevant human priors, neural networks may learn uninterpretable...
Learning compact discrete representations of data is itself a key task i...