Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition

07/09/2019
by Yonatan Belinkov, et al.

End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions. In contrast to modular ASR systems, which contain separately trained components for acoustic modeling, pronunciation lexicon, and language modeling, the end-to-end paradigm is conceptually simpler and has the potential benefit of training the entire system on the end task. However, such neural network models are more opaque: it is not clear how to interpret the role of different parts of the network or what information the network learns during training. In this paper, we analyze the learned internal representations in an end-to-end ASR model. We evaluate the representation quality in terms of several classification tasks, comparing phonemes and graphemes, as well as different articulatory features. We study two languages (English and Arabic) and three datasets, finding remarkable consistency in how different properties are represented in different layers of the deep neural network.
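The idea of evaluating representation quality through classification tasks can be illustrated with a small probing sketch: freeze the features taken from each layer, train a simple classifier to predict a label (for instance a phoneme or grapheme class) from them, and compare held-out accuracy across layers. The abstract does not specify the classifier, the features, or the label sets, so everything below is an assumption made for illustration: the softmax-regression probe, the synthetic Gaussian "activations", the layer names, and the helper functions `train_probe`, `probe_accuracy`, and `make_synthetic_layer` are all hypothetical stand-ins, not the authors' setup.

```python
import numpy as np


def train_probe(features, labels, num_classes, epochs=300, lr=0.05):
    """Fit a softmax-regression probe on frozen features; return (W, b)."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, features, labels):
    """Fraction of frames whose predicted class matches the label."""
    preds = (features @ W + b).argmax(axis=1)
    return float((preds == labels).mean())


def make_synthetic_layer(rng, labels, num_classes, dim, noise):
    """Hypothetical stand-in for one layer's frame-level activations:
    one random center per class plus Gaussian noise of the given scale."""
    centers = rng.normal(size=(num_classes, dim))
    return centers[labels] + noise * rng.normal(size=(len(labels), dim))


rng = np.random.default_rng(0)
num_classes, dim = 5, 16          # assumed sizes, chosen only for the demo
labels = rng.integers(0, num_classes, size=700)
train, test = slice(0, 500), slice(500, 700)

# Two made-up "layers": one where the classes are well separated,
# one where the class information is buried in noise.
layer_accuracy = {}
for name, noise in [("layer_low_noise", 0.5), ("layer_high_noise", 3.0)]:
    feats = make_synthetic_layer(rng, labels, num_classes, dim, noise)
    W, b = train_probe(feats[train], labels[train], num_classes)
    layer_accuracy[name] = probe_accuracy(W, b, feats[test], labels[test])

for name, acc in layer_accuracy.items():
    print(f"{name}: held-out probe accuracy = {acc:.3f}")
```

In a real analysis the synthetic arrays would be replaced by activations extracted from a trained ASR model, aligned with frame-level labels. A higher held-out probe accuracy for a layer is then read as evidence that the property is more easily recoverable from that layer's representation.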


research
09/13/2017

Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

Neural models have become ubiquitous in automatic speech recognition sys...
research
01/21/2021

Arabic Speech Recognition by End-to-End, Modular Systems and Human

Recent advances in automatic speech recognition (ASR) have achieved accu...
research
01/13/2017

End-to-End ASR-free Keyword Search from Speech

End-to-end (E2E) systems have achieved competitive results compared to c...
research
09/21/2020

End-to-End Bengali Speech Recognition

Bengali is a prominent language of the Indian subcontinent. However, whi...
research
05/25/2020

InfantNet: A Deep Neural Network for Analyzing Infant Vocalizations

Acoustic analyses of infant vocalizations are valuable for research on s...
research
11/02/2021

Recent Advances in End-to-End Automatic Speech Recognition

Recently, the speech community is seeing a significant trend of moving f...
research
11/04/2019

What does a network layer hear? Analyzing hidden representations of end-to-end ASR through speech synthesis

End-to-end speech recognition systems have achieved competitive results ...
