Reducing Bias in Production Speech Models

by Eric Battenberg, et al.

Replacing hand-engineered pipelines with end-to-end deep learning systems has enabled strong results in applications like speech and object recognition. However, the causality and latency constraints of production systems push end-to-end speech models back into the underfitting regime and expose biases in the model that we show cannot be overcome by "scaling up", i.e., training bigger models on more data. In this work we systematically identify and address sources of bias, reducing deployment error rates by up to 20% relative. We achieve this by utilizing improved neural architectures for streaming inference, solving optimization issues, and employing strategies that increase audio and label modeling versatility.
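The causality constraint mentioned above can be illustrated with a minimal sketch (not from the paper): a streaming model may only use current and past audio frames, which for convolutional layers means left-padding so no output frame depends on future input. The function below is a hypothetical NumPy illustration of such a causal 1-D convolution, not the architecture the authors deploy.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1-D convolution: output[t] depends only on x[0..t].

    Left-padding with zeros (instead of symmetric padding) is the basic
    trick that makes a convolutional layer usable for streaming
    inference, since no future frames are required.
    """
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    # output[t] = sum_i kernel[i] * x[t - i]
    return np.array([padded[t:t + k] @ kernel[::-1] for t in range(len(x))])

# With kernel [1, 1], each output is the sum of the current and previous frame.
y = causal_conv1d(np.array([1.0, 2.0, 3.0]), np.array([1.0, 1.0]))
# y == [1.0, 3.0, 5.0]
```

Note that the first output frame only sees zeros to its left, so a streaming layer incurs no lookahead latency, at the cost of the reduced context that contributes to the underfitting the abstract describes.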


