Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition

10/10/2021
by Guoli Ye, et al.

Hybrid and end-to-end (E2E) systems each have their own advantages and show different error patterns in their speech recognition results. By jointly modeling audio and text, the E2E model performs better in matched scenarios and scales well with large amounts of paired audio-text training data. The modularized hybrid model is easier to customize and better at exploiting massive amounts of unpaired text data. This paper proposes a two-pass hybrid and E2E cascading (HEC) framework that combines the two models to take advantage of both, with the hybrid model in the first pass and the E2E model in the second pass. We show that the proposed system achieves 8-10% relative word error rate reduction with respect to each individual system. More importantly, compared with the pure E2E system, we show that the proposed system has the potential to keep the advantages of the hybrid system, e.g., customization and segmentation capabilities. We also show that the second-pass E2E model in HEC is robust to changes in the first-pass hybrid model.
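
The two-pass ordering described in the abstract (hybrid first, E2E second) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the second pass consumes the first-pass output, so the sketch assumes the first pass returns segment boundaries plus a text hypothesis, and the second pass receives the segment audio together with that hypothesis. The names `Segment`, `cascade`, `toy_hybrid`, and `toy_e2e` are hypothetical, and the toy models are hard-coded stand-ins rather than real recognizers.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Segment:
    """One first-pass segment (hypothetical structure, not from the paper)."""
    start: int        # index of the first audio frame (inclusive)
    end: int          # index past the last audio frame (exclusive)
    hypothesis: str   # first-pass text for this segment


# Assumed interfaces: the first pass segments and transcribes the audio,
# the second pass re-decodes one segment given its first-pass hypothesis.
FirstPass = Callable[[Sequence[float]], List[Segment]]
SecondPass = Callable[[Sequence[float], str], str]


def cascade(audio: Sequence[float],
            first_pass: FirstPass,
            second_pass: SecondPass) -> str:
    """Run the first pass, then the second pass on each resulting segment."""
    segments = first_pass(audio)
    outputs = [
        second_pass(audio[seg.start:seg.end], seg.hypothesis)
        for seg in segments
    ]
    # Join the non-empty per-segment results into one transcript.
    return " ".join(text for text in outputs if text)


def toy_hybrid(audio: Sequence[float]) -> List[Segment]:
    """Stand-in first pass: splits the audio in half, returns fixed text."""
    mid = len(audio) // 2
    return [
        Segment(0, mid, "hello word"),
        Segment(mid, len(audio), "good by"),
    ]


def toy_e2e(audio_segment: Sequence[float], hint: str) -> str:
    """Stand-in second pass: rewrites a few words of the first-pass text."""
    corrections = {"word": "world", "by": "bye"}
    return " ".join(corrections.get(w, w) for w in hint.split())


result = cascade([0.0] * 8, toy_hybrid, toy_e2e)
print(result)
```

Because the second pass only depends on the first pass through the `Segment` interface in this sketch, the first-pass model can be swapped without touching the second-pass code.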

research
10/31/2022

Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition

Recently, there has been an increasing interest in two-pass streaming en...
research
03/17/2020

Deliberation Model Based Two-Pass End-to-End Speech Recognition

End-to-end (E2E) models have made rapid progress in automatic speech rec...
research
01/27/2021

Transformer Based Deliberation for Two-Pass Speech Recognition

Interactive speech recognition systems must generate words quickly while...
research
08/30/2020

Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition

Recent advances of end-to-end models have outperformed conventional mode...
research
02/26/2019

Directional Embedding Based Semi-supervised Framework For Bird Vocalization Segmentation

This paper proposes a data-efficient, semi-supervised, two-pass framewor...
research
02/15/2021

Personalization Strategies for End-to-End Speech Recognition Systems

The recognition of personalized content, such as contact names, remains ...
research
08/02/2016

Efficient Segmental Cascades for Speech Recognition

Discriminative segmental models offer a way to incorporate flexible feat...