Continual learning is a challenging problem in which models need to be
t...
The field of Sign Language Production (SLP) lacked a large-scale, pre-tr...
The Diffusion Model (DM) has emerged as the SOTA approach for image
synt...
In this paper, we introduce audio-visual class-incremental learning, a
c...
We propose DAVIS, a Diffusion model-based Audio-Visual Separation framew...
Audio-visual learning seeks to enhance the computer's multi-modal percep...
Limited by imaging systems, the reconstruction of Magnetic Resonance Ima...
We live in a world filled with never-ending streams of multimodal
inform...
Due to the limitations of capture devices and scenarios, egocentric vide...
Text-to-audio (TTA) generation is a recent popular problem that aims to
...
The Segment Anything Model (SAM) has recently shown its powerful effectivene...
Sound source localization is a typical and challenging task that predict...
Humans naturally perceive surrounding scenes by unifying sound and sight...
The diffusion model (DM) has achieved SOTA performance by modeling the image...
Human perception of the complex world relies on a comprehensive analysis...
Blind image super-resolution (Blind-SR) aims to recover a high-resolutio...
Lighter and faster image restoration (IR) models are crucial for the
dep...
Sight and hearing are two senses that play a vital role in human
communi...
Most CNN-based super-resolution (SR) methods assume that the
degr...
In this paper, we focus on the Audio-Visual Question Answering (AVQA) ta...
Magnetic resonance imaging (MRI) can present multi-contrast images of th...
Downsampling is one of the most basic image processing operations. Impro...
The target of space-time video super-resolution (STVSR) is to increase t...
Reference-based super-resolution (RefSR) has made significant progress i...
Non-Local Attention (NLA) brings significant improvement for Single Imag...
Leveraging temporal synchronization and association within sight and sou...
In this paper, we address the space-time video super-resolution, which a...
There are rich synchronized audio and visual events in our daily life. I...
In this paper, we propose to make a systematic study on machines multise...
In this paper, we introduce a new problem, named audio-visual video pars...
In this paper, we explore the space-time video super-resolution task, wh...
Deep convolutional neural networks are known to specialize in distilling...
In this paper, we develop a concise but efficient network architecture c...
Deep learning methods have witnessed great progress in image restora...
Convolutional neural networks have recently achieved great success for ima...
Video super-resolution (VSR) aims to restore a photo-realistic
high-reso...
Automatically generating a natural language sentence to describe the con...
Single image super-resolution (SISR) is a notoriously challenging ill-po...
In this paper, we introduce a novel problem of audio-visual event
locali...
A very deep convolutional neural network (CNN) has recently achieved gre...