Magnetic resonance imaging (MRI) have played a crucial role in brain dis...
Large language models (LLMs) have achieved significant success in intera...
Open-vocabulary semantic segmentation is a challenging task that require...
While Large Language Models (LLMs) have demonstrated commendable perform...
Exploring spatial-temporal dependencies from observed motions is one of ...
Clinical classification of chest radiography is particularly challenging...
Multi-person motion prediction is a challenging problem due to the depen...
Anomaly detection has gained considerable attention due to its broad ran...
In this study, we aim to initiate the development of Radiology Foundatio...
Class incremental learning (CIL) aims to incrementally update a trained ...
Electrocardiogram (ECG) is a widely used diagnostic tool for detecting h...
The goal of the audio-visual segmentation (AVS) task is to segment the
s...
Current mainstream vision-language (VL) tracking framework consists of t...
In semantic segmentation, adapting a visual system to novel object categ...
Video frame interpolation (VFI) is a challenging task that aims to gener...
In this paper, we consider the problem of composed image retrieval (CIR)...
Image compression aims to reduce the information redundancy in images. M...
This work considers the category distribution heterogeneity in federated...
In this paper, we focus on the problem of Medical Visual Question Answer...
Large Language Models (LLMs) have showcased remarkable capabilities in
n...
Camera-only 3D detection provides an economical solution with a simple
c...
In this paper, we consider the problem of temporal action localization u...
To model the indeterminacy of human behaviors, stochastic trajectory
pre...
Learning to predict agent motions with relationship reasoning is importa...
Interactive segmentation has recently been explored to effectively and
e...
Vision-centric joint perception and prediction (PnP) has become an emerg...
Learning from a large corpus of data, pre-trained models have achieved
i...
Foundation models trained on large-scale dataset gain a recent surge in ...
Blind face restoration usually synthesizes degraded low-quality data wit...
Despite of the success of multi-modal foundation models pre-trained on
l...
In this paper, we consider the problem of disease diagnosis. Unlike the
...
Multimodal emotion recognition is a challenging research area that aims ...
Real-world data usually couples the label ambiguity and heavy imbalance,...
The goal of this paper is to augment a pre-trained text-to-image diffusi...
Metastasis on lymph nodes (LNs), the most common way of spread for prima...
In this paper, we consider the problem of enhancing self-supervised
visu...
Weakly-supervised temporal action localization (WTAL) learns to detect a...
The statistical heterogeneity of the non-independent and identically
dis...
Collaborative 3D object detection exploits information exchange among
mu...
Multi-agent learning has gained increasing attention to tackle distribut...
When trained at a sufficient scale, self-supervised learning has exhibit...
3D point cloud semantic segmentation is one of the fundamental tasks for...
Existing models on super-resolution often specialized for one scale,
fun...
Low-light video enhancement (LLVE) is an important yet challenging task ...
In multi-modal multi-agent trajectory forecasting, two major challenges ...
The research and applications of multimodal emotion recognition have bec...
This paper considers the problem of fast MRI reconstruction. We propose ...
Self-supervised learning has achieved a great success in the representat...
Recently, anomaly detection and localization in multimedia data have rec...
Household environments are important testbeds for embodied AI research. ...