We present PAT, a transformer-based network that learns complex temporal...
3D audio-visual production aims to deliver immersive and interactive exp...
Active speaker detection (ASD) is a multi-modal task that aims to identi...
As audio-visual systems increasingly bring immersive and interactive cap...
Immersive audio-visual perception relies on the spatial integration of b...
Typical methods for binaural source separation consider only the direct ...
Single-channel signal separation and deconvolution aims to separate and ...
Environmental audio tagging aims to predict only the presence or absence...
Acoustic event detection for content analysis in most cases relies on lo...