Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning

02/01/2020
by Sanna Wager, et al.

In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system. For the student, both the multi-channel feature extraction layers and the higher classification layers were jointly trained using the logits from the teacher model. In our experiments, compared to a baseline model trained on about 600 hours of transcribed data, a relative word error rate (WER) reduction of about 27.3% is achieved when using an additional 1800 hours of untranscribed data. We also investigated the benefit of pre-training the multi-channel front end to output the beamformed log-mel filter bank energies (LFBE) using an L2 loss. We find that pre-training improves the WER by 10.7% compared to a multi-channel model directly initialized with a beamformer and mel-filter bank coefficients for the front end. Finally, combining pre-training and teacher-student training produces a WER reduction of 31% compared to our baseline.
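
The teacher-student objective described above can be sketched as follows. This is a minimal NumPy sketch, assuming the student is trained with per-frame cross-entropy against the softmax of the teacher logits, which is a common distillation choice; the `temperature` parameter and all function names are hypothetical, and the loss used in the paper may differ. Because the targets come from the teacher rather than from transcripts, the same loss applies to untranscribed audio.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean per-frame cross-entropy between teacher soft targets and the student.

    Both logit arrays have shape (frames, classes). `temperature` is a
    hypothetical softening parameter; 1.0 gives the plain softmax.
    No transcripts are needed: the targets come entirely from the teacher.
    """
    teacher_probs = np.exp(log_softmax(teacher_logits / temperature))
    student_log_probs = log_softmax(student_logits / temperature)
    per_frame = -(teacher_probs * student_log_probs).sum(axis=-1)
    return float(per_frame.mean())


def distillation_grad(student_logits, teacher_logits, temperature=1.0):
    """Gradient of distillation_loss with respect to the student logits."""
    frames = student_logits.shape[0]
    teacher_probs = np.exp(log_softmax(teacher_logits / temperature))
    student_probs = np.exp(log_softmax(student_logits / temperature))
    return (student_probs - teacher_probs) / (frames * temperature)


# Usage: random logits standing in for 5 frames over 7 output classes.
rng = np.random.default_rng(0)
student = rng.normal(size=(5, 7))
teacher = rng.normal(size=(5, 7))
print(distillation_loss(student, teacher))
```

With `temperature=1`, the gradient with respect to the student logits is the student posterior minus the teacher posterior, divided by the number of frames. When the feature extraction and classification layers are trained jointly, this is the error signal that backpropagation carries through both.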

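The L2 pre-training of the front end can be sketched in the same way. This minimal NumPy sketch assumes a single learnable frequency-domain filter-and-sum layer, followed by a learnable mel projection and a log. The layer structure, the function names, and the equal-weight (zero-delay) delay-and-sum initializer are illustrative assumptions, not the paper's architecture. Only the forward pass and the loss are shown; in practice the gradients would come from an autodiff framework.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_bins, sample_rate):
    """Triangular mel filters, shape (n_bins, n_mels); used as an initializer."""
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_bins, n_mels))
    for m in range(n_mels):
        lo, center, hi = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (bin_freqs - lo) / (center - lo)
        falling = (hi - bin_freqs) / (hi - center)
        fb[:, m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def delay_and_sum_init(channels, bins):
    # Hypothetical beamformer initializer: equal weights, zero steering delays.
    return np.full((channels, bins), 1.0 / channels, dtype=complex)


def front_end(spectra, weights, mel_matrix, eps=1e-6):
    """Filter-and-sum across channels, then power, mel projection and log.

    spectra:    complex STFT, shape (channels, frames, bins)
    weights:    learnable complex per-channel, per-bin filter, shape (channels, bins)
    mel_matrix: learnable nonnegative projection, shape (bins, n_mels)
    Returns log mel energies, shape (frames, n_mels).
    """
    beam = np.einsum("cf,ctf->tf", np.conj(weights), spectra)
    power = np.abs(beam) ** 2
    return np.log(power @ mel_matrix + eps)


def l2_pretraining_loss(predicted_lfbe, target_lfbe):
    """Mean squared error against LFBE computed from the beamformed signal."""
    return float(np.mean((predicted_lfbe - target_lfbe) ** 2))
```

If the weights and mel matrix match the beamformer and filter bank that produced the target LFBE, the loss is zero. From any other starting point, minimizing it pulls the front-end output toward the beamformed LFBE, which is the role the abstract assigns to pre-training.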