What does a network layer hear? Analyzing hidden representations of end-to-end ASR through speech synthesis

11/04/2019
by   Chung-Yi Li, et al.

End-to-end speech recognition systems have achieved competitive results compared to traditional systems. However, the complex transformations between layers, applied to highly variable acoustic signals, are hard to analyze. In this paper, we present our ASR probing model, which synthesizes speech from the hidden representations of end-to-end ASR to examine the information retained after each layer's computation. Listening to the synthesized speech, we observe gradual removal of speaker variability and noise as the layers go deeper, which aligns with previous studies on how deep networks function in speech recognition. This paper is the first study to analyze an end-to-end speech recognition model by demonstrating what each layer hears. Speaker verification and speech enhancement measurements on the synthesized speech are also conducted to further confirm our observation.
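The idea of probing a layer by reconstructing the input from its hidden representation can be illustrated with a toy sketch. This is not the authors' probing model, which synthesizes audible speech from a real ASR network; the example below swaps in a linear ridge-regression probe on synthetic data. The names `fit_linear_probe` and `probe_error`, the random stand-in "spectrogram", and the three toy "layers" that keep progressively fewer input dimensions are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_probe(hidden, target, ridge=1e-3):
    """Ridge regression from hidden features (plus a bias column) to target."""
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ target)


def probe_error(hidden, target, weights):
    """Mean squared reconstruction error of the fitted probe."""
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    return float(np.mean((X @ weights - target) ** 2))


rng = np.random.default_rng(0)
frames, n_feats = 500, 20

# Hypothetical stand-in for input acoustic features (frames x feature bins).
spectrogram = rng.normal(size=(frames, n_feats))

# Toy "layers": deeper layers keep fewer input dimensions, mixed by a
# random orthogonal matrix. This only simulates information being discarded.
errors = {}
for depth, keep in enumerate([20, 12, 4], start=1):
    mixing, _ = np.linalg.qr(rng.normal(size=(keep, keep)))
    hidden = spectrogram[:, :keep] @ mixing
    weights = fit_linear_probe(hidden, spectrogram)
    errors[depth] = probe_error(hidden, spectrogram, weights)

for depth, err in errors.items():
    print(f"layer {depth}: reconstruction MSE = {err:.4f}")
```

In this toy setup the reconstruction error grows with depth because each simulated layer discards more of the input. A probe that produces audio, as described in the abstract, additionally lets a listener hear which attributes (such as speaker identity or noise) are lost, which a scalar error alone does not reveal.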

research
08/22/2023

Convoifilter: A case study of doing cocktail party speech recognition

This paper presents an end-to-end model designed to improve automatic sp...
research
07/09/2019

Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition

End-to-end neural network systems for automatic speech recognition (ASR)...
research
11/05/2018

End-to-End Monaural Multi-speaker ASR System without Pretraining

Recently, end-to-end models have become a popular approach as an alterna...
research
01/27/2022

Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition

Dysarthria is a motor speech disorder often characterized by reduced spe...
research
05/19/2022

Insights on Neural Representations for End-to-End Speech Recognition

End-to-end automatic speech recognition (ASR) models aim to learn a gene...
research
12/17/2014

Deep Speech: Scaling up end-to-end speech recognition

We present a state-of-the-art speech recognition system developed using ...
research
07/01/2021

What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis

End-to-end DNN architectures have pushed the state-of-the-art in speech ...
