Contextual Phonetic Pretraining for End-to-end Utterance-level Language and Speaker Recognition

06/30/2019
by Shaoshi Ling, et al.

Pretrained contextual word representations in NLP have greatly improved performance on various downstream tasks. For speech, we propose contextual frame representations that capture phonetic information at the acoustic frame level and can be used for utterance-level language, speaker, and speech recognition. These representations come from the frame-wise intermediate representations of an end-to-end, self-attentive ASR model (SAN-CTC) on spoken utterances. We first train the model on the Fisher English corpus with context-independent phoneme labels, then use its representations at inference time as features for task-specific models on the NIST LRE07 closed-set language recognition task and a Fisher speaker recognition task, giving significant improvements over the state-of-the-art on both (e.g., a language EER of 4.68% on 3-second utterances and a 23% relative improvement in speaker recognition EER). Results remain competitive when using a novel dilated convolutional model for language recognition, or when ASR pretraining is done with character labels only.
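To make the two-stage recipe concrete, here is a minimal PyTorch sketch of pretraining a self-attentive encoder with a CTC phoneme objective and then reusing its frozen frame-wise states as utterance-level features. All module names, layer sizes, the mean-pooling step, and the 14-way language head are illustrative assumptions, not the authors' exact SAN-CTC configuration.

```python
# Sketch (assumed setup, not the paper's exact architecture): pretrain a
# self-attentive encoder with CTC on phoneme labels, then extract frame-wise
# representations and pool them into utterance embeddings for a classifier.
import torch
import torch.nn as nn

class SelfAttentiveEncoder(nn.Module):
    """Self-attentive acoustic encoder with a CTC head over phoneme labels."""
    def __init__(self, n_mels=80, d_model=256, n_heads=4, n_layers=6, n_phones=42):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=1024, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.ctc_head = nn.Linear(d_model, n_phones + 1)  # +1 for the CTC blank

    def forward(self, feats):
        # feats: (batch, frames, n_mels) log-mel filterbank features
        h = self.encoder(self.proj(feats))        # frame-wise contextual states
        return h, self.ctc_head(h).log_softmax(-1)

# --- Stage 1: pretrain with CTC on phoneme targets (e.g., Fisher English) ---
enc = SelfAttentiveEncoder()
ctc_loss = nn.CTCLoss(blank=42)
feats = torch.randn(8, 300, 80)                   # dummy batch of utterances
targets = torch.randint(0, 42, (8, 50))           # dummy phoneme label sequences
h, logp = enc(feats)
loss = ctc_loss(logp.transpose(0, 1), targets,    # CTC expects (T, N, C)
                torch.full((8,), 300, dtype=torch.long),
                torch.full((8,), 50, dtype=torch.long))
loss.backward()                                   # one illustrative training step

# --- Stage 2: freeze the encoder; its frame states become task features ---
with torch.no_grad():
    h, _ = enc(feats)                             # (batch, frames, d_model)
utt_emb = h.mean(dim=1)                           # simple mean pooling over frames
lang_clf = nn.Linear(256, 14)                     # e.g., 14 LRE07 closed-set languages
logits = lang_clf(utt_emb)                        # only this classifier is trained
```

In this sketch only the downstream classifier sees task labels; the encoder is trained once on phonetic supervision and reused, which is the transfer pattern the abstract describes.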


