End-to-end speech translation (ST) for conversation recordings involves ...
Artificial General Intelligence (AGI) requires comprehensive understandi...
The convergence of text, visual, and audio data is a key step towards hu...
There is a surge in interest in self-supervised learning approaches for ...
Several trade-offs need to be balanced when employing monaural speech se...
Self-supervised learning (SSL) methods such as WavLM have shown promisin...
Personalized speech enhancement (PSE) models achieve promising results c...
Personalized speech enhancement (PSE), a process of estimating a clean t...
We present the first neural network model to achieve real-time and strea...
Multi-talker automatic speech recognition (ASR) has been studied to gene...
This paper presents a novel streaming automatic speech recognition (ASR)...
This paper describes a speaker diarization model based on target speaker...
Human intelligence is multimodal; we integrate visual, linguistic, and a...
Transformer has been successfully applied to speech separation recently ...
Existing multi-channel continuous speech separation (CSS) models are hea...
This paper investigates how to improve the runtime speed of personalized...
This paper presents a streaming speaker-attributed automatic speech reco...
The Deep Noise Suppression (DNS) challenge is designed to foster innovat...
This paper proposes a token-level serialized output training (t-SOT), a ...
This paper proposes PickNet, a neural network model for real-time channe...
While permutation invariant training (PIT) based continuous speech separ...
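The PIT criterion mentioned here is compact enough to sketch: the separation loss is evaluated under every assignment of estimated sources to reference sources, and the smallest value is used for training. The snippet below is a minimal, framework-free illustration under assumed shapes and an assumed MSE criterion; it is not the exact loss used in any of the listed papers.

    import itertools
    import numpy as np

    def pit_mse_loss(estimates, references):
        # Illustrative permutation-invariant loss (assumption: MSE criterion).
        # estimates, references: arrays of shape (num_sources, num_samples).
        # Returns the lowest error over all assignments of estimated sources
        # to reference sources, together with the best permutation.
        num_sources = estimates.shape[0]
        best_loss, best_perm = float("inf"), None
        for perm in itertools.permutations(range(num_sources)):
            loss = np.mean([np.mean((estimates[i] - references[p]) ** 2)
                            for i, p in enumerate(perm)])
            if loss < best_loss:
                best_loss, best_perm = loss, perm
        return best_loss, best_perm

The factorial permutation search is one reason PIT becomes costly as the number of sources grows, which partly motivates the serialized-output alternatives that appear later in this list.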
Multi-talker conversational speech processing has drawn much interest f...
Self-supervised learning (SSL) achieves great success in speech recognit...
With the recent surge in video conferencing tool usage, providing high-...
Personalized speech enhancement (PSE) models utilize additional cues, su...
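One common way such cues are used is to condition a mask-estimation network on a target-speaker embedding (for example, a d-vector) concatenated to every input frame. The PyTorch sketch below is an assumed, illustrative architecture; the class name, layer sizes, and sigmoid masking are our own choices rather than details taken from any of the papers above.

    import torch
    import torch.nn as nn

    class PersonalizedEnhancer(nn.Module):
        # Illustrative PSE sketch (assumption): a mask estimator conditioned on a
        # target-speaker embedding that is concatenated to every noisy frame.
        def __init__(self, feat_dim=257, emb_dim=128, hidden=256):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim + emb_dim, hidden,
                                num_layers=2, batch_first=True)
            self.mask = nn.Linear(hidden, feat_dim)

        def forward(self, noisy_mag, spk_emb):
            # noisy_mag: (batch, frames, feat_dim); spk_emb: (batch, emb_dim)
            emb = spk_emb.unsqueeze(1).expand(-1, noisy_mag.size(1), -1)
            h, _ = self.lstm(torch.cat([noisy_mag, emb], dim=-1))
            return torch.sigmoid(self.mask(h)) * noisy_mag  # masked magnitudes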
Continuous speech separation (CSS) aims to separate overlapping voices f...
Continuous speech separation using a microphone array was shown to be pr...
This paper presents Transcribe-to-Diarize, a new approach for neural spe...
Speaker-attributed automatic speech recognition (SA-ASR) is a task to re...
Speech separation has been successfully applied as a frontend processing...
This paper presents our recent effort on end-to-end speaker-attributed a...
Transcribing meetings containing overlapped speech with only a single di...
Speech separation has been shown effective for multi-talker speech recog...
An end-to-end (E2E) speaker-attributed automatic speech recognition (SA-...
Joint optimization of multi-channel front-end and automatic speech recog...
Recently, an end-to-end speaker-attributed automatic speech recognition ...
Multi-speaker speech recognition of unsegmented recordings has diverse a...
With its strong modeling capacity that comes from a multi-head and multi...
This paper describes the Microsoft speaker diarization system for monaur...
Multi-speaker speech recognition has been one of the key challenges in co...
Recently, an end-to-end (E2E) speaker-attributed automatic speech recogn...
We propose an end-to-end speaker-attributed automatic speech recognition...
This paper proposes a neural network based speech separation method usin...
This paper proposes serialized output training (SOT), a novel framework ...
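At a high level, SOT turns a multi-talker recognition target into a single token sequence: the reference transcriptions are concatenated in order of utterance start time, separated by a special speaker-change symbol. The sketch below illustrates only that serialization step; the symbol string and helper name are illustrative assumptions, and the paper should be consulted for the exact formulation.

    def serialize_transcripts(utterances, sc_token="<sc>"):
        # Illustrative SOT-style serialization (assumption: first-in-first-out order).
        # utterances: list of (start_time, transcript) pairs, one per speaker turn.
        ordered = sorted(utterances, key=lambda u: u[0])
        return f" {sc_token} ".join(text for _, text in ordered)

    # Two overlapping utterances become one training target:
    refs = [(0.0, "hello how are you"), (1.2, "i am fine thank you")]
    print(serialize_transcripts(refs))
    # -> "hello how are you <sc> i am fine thank you"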
This paper describes a dataset and protocols for evaluating continuous s...
This paper describes a system that generates speaker-annotated transcrip...
An important problem in ad-hoc microphone speech separation is how to gu...
Recent studies in deep learning-based speech separation have proven the ...
Speech recognition and other natural language tasks have long benefited ...
We describe a system that generates speaker-annotated transcripts of mee...