Analyzing deep CNN-based utterance embeddings for acoustic model adaptation

11/12/2018
by Joanna Rownicka, et al.

We explore why deep convolutional neural networks (CNNs) with small two-dimensional kernels, primarily used for modeling spatial relations in images, are also effective in speech recognition. We analyze the representations learned by deep CNNs and compare them with deep neural network (DNN) representations and i-vectors in the context of acoustic model adaptation. To explore whether interpretable information can be decoded from the learned representations, we evaluate their ability to discriminate between speakers, acoustic conditions, noise types, and genders using the Aurora-4 dataset. We extract both whole-model embeddings, which capture the information learned across the whole network, and layer-specific embeddings, which show how information flows through the network. We also use the learned representations as additional input to a time-delay neural network (TDNN) on the Aurora-4 and MGB-3 English datasets. We find that deep CNN embeddings outperform DNN embeddings for acoustic model adaptation, and that auxiliary features based on deep CNN embeddings yield word error rates similar to those obtained with i-vectors.
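The idea of feeding an utterance-level embedding to an acoustic model as an auxiliary input can be illustrated with a minimal sketch. The abstract does not say how the embeddings are computed, so the mean pooling over time used here is an assumption, and the function names `utterance_embedding` and `append_embedding` are hypothetical. The sketch only shows the general pattern: collapse frame-level hidden activations into one fixed vector per utterance, then append that same vector to every acoustic frame, much as i-vectors are commonly appended.

```python
from statistics import fmean
from typing import List

Frame = List[float]


def utterance_embedding(activations: List[Frame]) -> Frame:
    """Average frame-level hidden activations over time into one vector.

    Mean pooling is an assumption for illustration, not the paper's
    confirmed extraction method.
    """
    if not activations:
        raise ValueError("need at least one frame of activations")
    dim = len(activations[0])
    return [fmean(frame[d] for frame in activations) for d in range(dim)]


def append_embedding(features: List[Frame], embedding: Frame) -> List[Frame]:
    """Concatenate the same utterance-level vector to every acoustic frame."""
    return [frame + embedding for frame in features]


# Toy utterance: 3 frames of 2-dim hidden activations (made-up values)
hidden = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
# 3 frames of 4-dim acoustic features (made-up values)
feats = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]

emb = utterance_embedding(hidden)
augmented = append_embedding(feats, emb)
```

Each augmented frame now carries the original acoustic features followed by the utterance-level vector, so a downstream network such as a TDNN would see the same summary of the utterance at every time step.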

research
11/02/2020

DNN-Based Semantic Model for Rescoring N-best Speech Recognition List

The word error rate (WER) of an automatic speech recognition (ASR) syste...
research
08/27/2018

Augmenting Bottleneck Features of Deep Neural Network Employing Motor State for Speech Recognition at Humanoid Robots

As for the humanoid robots, the internal noise, which is generated by mo...
research
05/30/2022

Personalized Acoustic Echo Cancellation for Full-duplex Communications

Deep neural networks (DNNs) have shown promising results for acoustic ec...
research
06/10/2018

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search

We propose to learn acoustic word embeddings with temporal context for q...
research
07/19/2017

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition

Layer normalization is a recently introduced technique for normalizing t...
research
06/15/2021

Canonical Face Embeddings

We present evidence that many common convolutional neural networks (CNNs...
research
07/02/2019

Sub-band Convolutional Neural Networks for Small-footprint Spoken Term Classification

This paper proposes a Sub-band Convolutional Neural Network for spoken t...
