Fine-grained visual classification (FGVC) involves categorizing fine
sub...
Modern approaches have proved the huge potential of addressing semantic
...
Vision Foundation Models (VFMs) such as the Segment Anything Model (SAM)...
Jointly processing information from multiple sensors is crucial to achie...
In the last few years, Neural Painting (NP) techniques became capable of...
We propose a novel ECGAN for the challenging semantic image synthesis ta...
Occlusion perturbation presents a significant challenge in person
re-ide...
Wearable sensor-based Human Action Recognition (HAR) has made significan...
The existing contrastive learning methods widely adopt one-hot instance
...
Image restoration is a low-level visual task, and most CNN methods are
d...
Acoustic word embeddings are typically created by training a pooling fun...
Underwater object detection (UOD) is crucial for marine economic develop...
Generating facial reactions in a human-human dyadic interaction is compl...
Self-supervised speech representations are known to encode both speaker ...
We propose a method for unsupervised opinion summarization that encodes
...
Transformer-based models achieve favorable performance in artistic style...
Generating a high-quality High Dynamic Range (HDR) image from dynamic sc...
Deep point cloud registration methods face challenges to partial overlap...
We present a novel graph Transformer generative adversarial network (GTG...
This study investigates the effectiveness of Explainable Artificial
Inte...
The rapid advances in Vision Transformer (ViT) refresh the state-of-the-...
There is a recent trend of applying multi-agent reinforcement learning (...
Synthesizing high-fidelity complex images from text is challenging. Base...
The generation of natural human motion interactions is a hot topic in
co...
Visual object tracking is a key component to many egocentric vision prob...
Adversarial attacks on thermal infrared imaging expose the risk of relat...
Recently, due to the increasing requirements of medical imaging applicat...
Despite the success of Transformers in self-supervised learning with
app...
Self-supervised models have had great success in learning speech
represe...
Video processing and analysis have become an urgent task since a huge am...
We present a novel bipartite graph reasoning Generative Adversarial Netw...
The conventional lottery ticket hypothesis (LTH) claims that there exist...
While discrete latent variable models have had great success in
self-sup...
Given the strong results of self-supervised models on various tasks, the...
Recently, diffusion models (DMs) have been increasingly used in audio
pr...
Compressing self-supervised models has become increasingly necessary, as...
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel
fine-...
Although Deep Neural Networks (DNNs) have achieved impressive results in...
We propose a simple yet powerful Landmark guided Generative Adversarial
...
Text-based person search is a challenging task that aims to search pedes...
3D-aware GANs based on generative neural radiance fields (GNeRF) have
ac...
Sign Language Production (SLP) aims to translate spoken languages into s...
The essence of video semantic segmentation (VSS) is how to leverage temp...
Brain vessel image segmentation can be used as a promising biomarker for...
Degraded images commonly exist in the general sources of character image...
Constructing high-quality character image datasets is challenging becaus...
The long-tail effect is a common issue that limits the performance of de...
Self-supervised skeleton-based action recognition with contrastive learn...
We address the challenging task of human reaction generation which aims ...
Generative models have emerged as an essential building block for many i...