Reducing Bias in Production Speech Models

05/11/2017
by Eric Battenberg, et al.

Replacing hand-engineered pipelines with end-to-end deep learning systems has enabled strong results in applications like speech and object recognition. However, the causality and latency constraints of production systems put end-to-end speech models back into the underfitting regime and expose biases in the model that we show cannot be overcome by "scaling up", i.e., training bigger models on more data. In this work we systematically identify and address sources of bias, reducing error rates by up to 20% for deployment. We achieve this by utilizing improved neural architectures for streaming inference, solving optimization issues, and employing strategies that increase audio and label modelling versatility.
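The causality constraint the abstract refers to means that a streaming model may only condition each output frame on current and past audio, not future frames. As a hedged illustration (not the paper's actual architecture), a minimal causal 1-D convolution can be sketched by left-padding the input so no output depends on future samples:

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal (streaming-compatible) 1-D convolution: output at time t
    depends only on inputs at times <= t. Illustrative sketch only."""
    k = len(kernel)
    # Left-pad with zeros so the receptive field never looks ahead.
    padded = np.concatenate([np.zeros(k - 1), x])
    # Reverse the kernel so this is a convolution rather than a correlation.
    return np.array([padded[t:t + k] @ kernel[::-1] for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([0.5, 0.5])   # simple 2-tap averaging filter
y = causal_conv1d(x, kernel)    # each y[t] averages x[t-1] and x[t]
```

A non-causal (centered) convolution would instead pad on both sides, trading latency for access to future context, which is exactly what production latency budgets disallow.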

02/02/2021

WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit

In this paper, we present a new open source, production first and produc...
08/29/2019

Two-Pass End-to-End Speech Recognition

The requirements for many applications of state-of-the-art speech recogn...
06/13/2023

Hidden Biases of End-to-End Driving Models

End-to-end driving systems have recently made rapid progress, in particu...
12/08/2015

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

We show that an end-to-end deep learning approach can be used to recogni...
12/17/2014

Deep Speech: Scaling up end-to-end speech recognition

We present a state-of-the-art speech recognition system developed using ...
01/22/2021

Streaming Models for Joint Speech Recognition and Translation

Using end-to-end models for speech translation (ST) has increasingly bee...
10/26/2022

End-to-End Speech to Intent Prediction to improve E-commerce Customer Support Voicebot in Hindi and English

Automation of on-call customer support relies heavily on accurate and ef...
