The goal of the audio-visual segmentation (AVS) task is to segment the s...
The objective of Audio-Visual Segmentation (AVS) is to localise the soun...
Learning from a large corpus of data, pre-trained models have achieved i...
Temporal sentence grounding aims to detect the event timestamps describe...
Weakly-supervised temporal action localization (WTAL) learns to detect a...
We present a simple yet effective self-supervised framework for audio-vi...
Human pose transfer has typically been modeled as a 2D image-to-image tr...