Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation

12/27/2022
by   Tomer Wullach, et al.
0

Automatic Speech Recognition (ASR) systems frequently use a search-based decoding strategy aiming to find the best attainable transcript by considering multiple candidates. One prominent speech recognition decoding heuristic is beam search, which seeks the transcript with the greatest likelihood computed using the predicted distribution. While showing substantial performance gains in various tasks, beam search loses some of its effectiveness when the predicted probabilities are highly confident, i.e., the predicted distribution is massed for a single or very few classes. We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions that may hamper beam search from truly considering a diverse set of candidates. We perform a layer analysis to reveal and visualize how predictions evolve, and propose a decoding procedure that improves the performance of fine-tuned ASR models. Our proposed approach does not require further training beyond the original fine-tuning, nor additional model parameters. In fact, we find that our proposed method requires significantly less inference computation than current approaches. We propose aggregating the top M layers, potentially leveraging useful information encoded in intermediate layers, and relaxing model confidence. We demonstrate the effectiveness of our approach by conducting an empirical study on varying amounts of labeled resources and different model sizes, showing consistent improvements in particular when applied to low-resource scenarios.

READ FULL TEXT

page 1

page 3

page 6

research
03/21/2022

Enhancing Speech Recognition Decoding via Layer Aggregation

Recently proposed speech recognition systems are designed to predict usi...
research
04/06/2021

Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions

This paper proposes a method to relax the conditional independence assum...
research
10/06/2021

Spell my name: keyword boosted speech recognition

Recognition of uncommon words such as names and technical terminology is...
research
11/01/2022

Avoid Overthinking in Self-Supervised Models for Speech Recognition

Self-supervised learning (SSL) models reshaped our approach to speech, l...
research
03/12/2023

Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study

Self-supervised learning (SSL) has allowed substantial progress in Autom...
research
05/08/2019

A Hardware-Oriented and Memory-Efficient Method for CTC Decoding

The Connectionist Temporal Classification (CTC) has achieved great succe...
research
09/21/2023

CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning

Transformer-based speech recognition (ASR) model with deep layers exhibi...

Please sign up or login with your details

Forgot password? Click here to reset