For fine-grained generation and recognition tasks such as
minimally-supe...
Recently, there has been a growing interest in text-to-speech (TTS) meth...
The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel vi...
In recent years, the joint training of speech enhancement front-end and
...
Time-domain speech enhancement (SE) has recently been intensively
invest...
Recently, many deep learning based beamformers have been proposed for
mu...
This paper summarizes the outcomes from the ISCSLP 2022 Intelligent Cock...
The bi-encoder structure has been intensively investigated in code-switc...
Speaker extraction seeks to extract the target speech in a multi-talker
...
Recent neural network based Direction of Arrival (DoA) estimation algori...
Dual-encoder structure successfully utilizes two language-specific encod...
Sound source localization aims to seek the direction of arrival (DOA) of...
Talking head generation aims to synthesize a lip-synchronized talking head...
Speaker extraction aims to extract the target speaker's voice from a
mul...
The task of talking head generation is to synthesize a lip-synchronized
...
The end-to-end speech synthesis model can directly take an utterance as
...
Expressive neural text-to-speech (TTS) systems incorporate a style encod...
Speaker extraction uses a pre-recorded reference speech as the reference...
Speaker extraction aims to extract the target speech signal from a
multi...
Spikes are the currency in central nervous systems for information
trans...
Most existing AU detection works that consider AU relationships are relyin...
The capability for environmental sound recognition (ESR) can determine t...
Recently, increasing attention has been directed to the study of the spe...
In this paper, we study several microphone channel selection and weighti...