Estimating confidence scores for recognition results is a classic task i...
Listening to long video/audio recordings from video conferencing and onl...
ICASSP2023 General Meeting Understanding and Generation Challenge (MUG) ...
Conventional ASR systems use frame-level phoneme posteriors to conduct fo...
In this paper, we propose a novel multi-modal multi-task encoder-decoder...
Recently, hybrid systems of clustering and neural diarization models hav...
Transformers have recently dominated the ASR field. Although able to yie...
Overlapping speech diarization has been traditionally treated as a multi...
Expressive text-to-speech (TTS) has become a hot research topic recently...
The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Ch...
Recent developments in speech signal processing, such as speech recogniti...
We propose BeamTransformer, an efficient architecture to leverage beamfo...
In this paper, we describe a speaker diarization system that enables loca...
Recently, streaming end-to-end automatic speech recognition (E2E-ASR) ha...
Connectionist Temporal Classification (CTC)-based end-to-end speech reco...
Speaker adaptation methods aim to create fair-quality synthesized speech v...
In this paper, we present an improved feedforward sequential memory netw...
The Bidirectional LSTM (BLSTM) RNN-based speech synthesis system is amon...