Who Needs Decoders? Efficient Estimation of Sequence-level Attributes

05/09/2023
by   Yassir Fathullah, et al.

State-of-the-art sequence-to-sequence models often require autoregressive decoding, which can be computationally expensive. However, for some downstream tasks such as out-of-distribution (OOD) detection and resource allocation, the actual decoding output is not needed, only a scalar attribute of the sequence. In these scenarios, where, for example, knowing the quality of a system's output in order to predict poor performance matters more than knowing the output itself, is it possible to bypass the autoregressive decoding? We propose Non-Autoregressive Proxy (NAP) models that can efficiently predict general scalar-valued sequence-level attributes. Importantly, NAPs predict these attributes directly from the encodings, avoiding the expensive autoregressive decoding stage. We consider two sequence-to-sequence tasks: Machine Translation (MT) and Automatic Speech Recognition (ASR). In OOD detection for MT, NAPs outperform a deep ensemble while being significantly faster. NAPs are also shown to be able to predict performance metrics such as BERTScore (MT) or word error rate (ASR). For downstream tasks such as data filtering and resource optimization, NAPs generate performance predictions that outperform predictive uncertainty while being highly efficient at inference.
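
As a rough illustration of the idea of predicting a scalar directly from encodings, the sketch below mean-pools a sequence of encoder states and passes the result through a small linear head with a sigmoid. The abstract does not specify the NAP architecture, so the pooling choice, the names `mean_pool` and `proxy_predict`, the weights, and the toy encodings are all assumptions made for illustration, not the authors' method.

```python
import math
import random


def mean_pool(encodings):
    """Average T encoder vectors (each of size D) into a single D-vector."""
    dim = len(encodings[0])
    return [sum(vec[d] for vec in encodings) / len(encodings) for d in range(dim)]


def proxy_predict(encodings, weights, bias):
    """Hypothetical proxy head: pooled encoding -> one scalar in (0, 1).

    No decoder is called; the scalar depends only on the encoder output.
    """
    pooled = mean_pool(encodings)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-in for encoder output: 5 time steps, 4-dimensional states.
random.seed(0)
encodings = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]

# Illustrative (untrained) head parameters.
weights = [0.5, -0.25, 0.1, 0.0]
score = proxy_predict(encodings, weights, bias=0.0)
```

In practice such a head would be trained to regress the attribute of interest (for example a quality score computed offline from decoded outputs), so that at inference time a single encoder pass yields the estimate.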


research
05/27/2020

Insertion-Based Modeling for End-to-End Automatic Speech Recognition

End-to-end (E2E) models have gained attention in the research field of a...
research
12/09/2020

On Knowledge Distillation for Direct Speech Translation

Direct speech translation (ST) has shown to be a complex task requiring ...
research
10/04/2019

Modeling Confidence in Sequence-to-Sequence Models

Recently, significant improvements have been achieved in various natural...
research
01/10/2023

Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition

While speech recognition Word Error Rate (WER) has reached human parity ...
research
10/11/2022

Streaming Punctuation for Long-form Dictation with Transformers

While speech recognition Word Error Rate (WER) has reached human parity ...
research
10/19/2020

Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines

In this work, we focus on improving ASR output segmentation in the conte...
research
05/30/2020

Dynamic Masking for Improved Stability in Spoken Language Translation

For spoken language translation (SLT) in live scenarios such as conferen...
