Text-based Person Retrieval aims to retrieve the target person images gi...
Supervised visual captioning models typically require a large scale of i...
The exponential growth of data, alongside advancements in model structur...
Machine learning models often learn to make predictions that rely on sen...
Token interaction operation is one of the core modules in MLP-based mode...
Unpaired Medical Image Enhancement (UMIE) aims to transform a low-qualit...
Mixup style data augmentation algorithms have been widely adopted in var...
Deep representation learning is a subfield of machine learning that focu...
Domain gap between synthetic and real data in visual regression (6D pose...
Visual Grounding (VG) refers to locating a region described by expressio...
Vision foundation models exhibit impressive power, benefiting from the e...
Vision-and-language navigation (VLN) is the task to enable an embodied a...
This paper aims to solve the video object segmentation (VOS) task in a s...
Natural Language Generation (NLG) accepts input data in the form of imag...
With the swift advancement of deep learning, state-of-the-art algorithms...
Unsupervised domain adaptation addresses the problem of classifying data...
With the urgent demand for generalized deep models, many pre-trained big...
As a de facto solution, the vanilla Vision Transformers (ViTs) are encou...
There is a growing interest in developing unlearnable examples (UEs) aga...
Over the past few years, developing a broad, universal, and general-purp...
This paper focuses on the prevalent performance imbalance in the stages ...
Although significant progress has been made in few-shot learning, most o...
In this paper, we present an integral pre-training framework based on ma...
Combining the Color and Event cameras (also called Dynamic Vision Sensor...
The main streams of human activity recognition (HAR) algorithms are deve...
We consider two biologically plausible structures, the Spiking Neural Ne...
In this paper, we propose a new agency-guided semi-supervised counting a...
Deep Metric Learning (DML) serves to learn an embedding function to proj...
Our goal in this research is to study a more realistic environment in wh...
Unpaired Image Captioning (UIC) has been developed to learn image descri...
In this paper, we propose a Global-Supervised Contrastive loss and a vie...
Semantic patterns of fine-grained objects are determined by subtle appea...
Conventional deep models predict a test sample with a single forward pro...
The exponentially large discrete search space in mixed-precision quantiz...
Object detection is an algorithm that recognizes and locates the objects...
The goal of unpaired image captioning (UIC) is to describe images withou...
This paper focuses on the challenging crowd counting task. As large-scal...
Class Activation Mapping (CAM) has been widely adopted to generate salie...
We propose an end-to-end image compression and analysis model with Trans...
In this paper, we study the problem of networked multi-agent reinforceme...
Domain adaptive object detection (DAOD) aims to improve the generalizati...
Traditional crowd counting approaches usually use Gaussian assumption to...
Within Convolutional Neural Network (CNN), the convolution operations ar...
Transformer is showing its superiority over convolutional architectures ...
We propose a novel joint lossy image and residual compression framework ...
One of the key steps in Neural Architecture Search (NAS) is to estimate ...
Few-shot learning (FSL) aims at recognizing novel classes given only few...
Few-shot learning, which aims at extracting new concepts rapidly from ex...
Most of the existing recognition algorithms are proposed for closed set...
In recent years, the performance of action recognition has been signific...