Online speech recognition, where the model only accesses context to the ...
The last year has seen astonishing progress in text-prompted image gener...
We introduce the Universal Speech Model (USM), a single large model that...
We propose JEIT, a joint end-to-end (E2E) model and internal language mo...
Text-only adaptation of a transducer model remains challenging for end-t...
This paper presents a streaming speaker-attributed automatic speech reco...
This paper proposes token-level serialized output training (t-SOT), a ...
While permutation invariant training (PIT) based continuous speech separ...
Multi-talker conversational speech processing has drawn much interest f...
This paper presents Transcribe-to-Diarize, a new approach for neural spe...
Text-only adaptation of an end-to-end (E2E) model remains a challenging ...
In recent years, end-to-end (E2E) based automatic speech recognition (AS...
Speaker-attributed automatic speech recognition (SA-ASR) is a task to re...
Integrating external language models (LMs) into end-to-end (E2E) models ...
This paper presents our recent effort on end-to-end speaker-attributed a...
Transcribing meetings containing overlapped speech with only a single di...
Speech separation has been shown to be effective for multi-talker speech recog...
The efficacy of external language model (LM) integration with existing e...
An end-to-end (E2E) speaker-attributed automatic speech recognition (SA-...
Joint optimization of multi-channel front-end and automatic speech recog...
Recently, an end-to-end speaker-attributed automatic speech recognition ...
External language model (LM) integration remains a challenging task...
Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end...
Recently, an end-to-end (E2E) speaker-attributed automatic speech recogn...
Because of its streaming nature, recurrent neural network transducer (RN...
We propose an end-to-end speaker-attributed automatic speech recognition...
Active authentication refers to a new mode of identity verification in w...
We propose a novel neural label embedding (NLE) scheme for the domain ad...
This paper proposes serialized output training (SOT), a novel framework ...
While the community keeps promoting end-to-end models over conventional ...
This paper describes a dataset and protocols for evaluating continuous s...
Teacher-student (T/S) has been shown to be effective for domain adaptation of...
Predicting words and subword units (WSUs) as the output has been shown to be ...
We propose three regularization-based speaker adaptation approaches to a...
We propose a novel adversarial speaker adaptation (ASA) scheme, in which...
The use of deep networks to extract embeddings for speaker recognition h...
Adversarial domain-invariant training (ADIT) proves to be effective in s...
Teacher-student (T/S) learning has been shown to be effective for a ...
Feature mapping using deep neural networks is an effective approach for ...
Feature-mapping with deep neural networks is commonly used for single-ch...
We propose a novel adversarial multi-task learning scheme, aiming at act...
Teacher-student (T/S) learning has been shown to be effective in unsupervi...
Far-field speech recognition in noisy and reverberant conditions remains...
Unsupervised domain adaptation of speech signal aims at adapting a well-...