Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection

05/02/2022
by Angelica Chen, et al.

In modern interactive speech-based systems, speech is consumed and transcribed incrementally prior to having disfluencies removed. This post-processing step is crucial for producing clean transcripts and high performance on downstream tasks (e.g. machine translation). However, most current state-of-the-art NLP models such as the Transformer operate non-incrementally, potentially causing unacceptable delays. We propose a streaming BERT-based sequence tagging model that, combined with a novel training objective, is capable of detecting disfluencies in real-time while balancing accuracy and latency. This is accomplished by training the model to decide whether to immediately output a prediction for the current input or to wait for further context. Essentially, the model learns to dynamically size its lookahead window. Our results demonstrate that our model produces comparably accurate predictions and does so sooner than our baselines, with lower flicker. Furthermore, the model attains state-of-the-art latency and stability scores when compared with recent work on incremental disfluency detection.
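
The emit-or-wait decision described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' model. `toy_tagger` is a hypothetical rule-based stand-in for the streaming BERT tagger. The label set (`F` fluent, `D` disfluent, `WAIT`), the rule that anything still waiting at end of stream defaults to fluent, and the latency and flicker definitions used here (extra tokens read before a token's first label; number of times an already-emitted label changes) are all simplifying assumptions chosen for illustration. The toy tagger labels fillers such as "uh" immediately but waits one token on any other word, because that word may turn out to be the first half of a repetition. This is a crude form of the dynamically sized lookahead window.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical label set for this sketch.
FLUENT, DISFLUENT, WAIT = "F", "D", "WAIT"
FILLERS = {"uh", "um"}  # assumption: fillers are decidable with zero lookahead


def toy_tagger(prefix: List[str]) -> List[str]:
    """Hypothetical stand-in for a streaming tagger: one label per prefix token."""
    labels = []
    for i, tok in enumerate(prefix):
        if tok in FILLERS:
            labels.append(DISFLUENT)   # no further context needed
        elif i == len(prefix) - 1:
            labels.append(WAIT)        # might start a repetition: wait for more input
        elif prefix[i + 1] == tok:
            labels.append(DISFLUENT)   # reparandum of an exact repetition, e.g. "I I"
        else:
            labels.append(FLUENT)
    return labels


def stream(
    tokens: List[str], tagger: Callable[[List[str]], List[str]]
) -> Tuple[List[str], List[int], int]:
    """Feed tokens one at a time; return final labels, per-token latency, flicker."""
    emitted: Dict[int, str] = {}      # latest non-WAIT label per position
    first_step: Dict[int, int] = {}   # index of the token read when first labeled
    flicker = 0
    for t in range(1, len(tokens) + 1):
        for i, label in enumerate(tagger(tokens[:t])):
            if label == WAIT:
                continue
            if i in emitted and emitted[i] != label:
                flicker += 1          # a previously shown label was revised
            emitted[i] = label
            first_step.setdefault(i, t - 1)
    # End of stream: force a decision for positions still waiting (assumed FLUENT).
    for i in range(len(tokens)):
        if i not in emitted:
            emitted[i] = FLUENT
            first_step[i] = len(tokens) - 1
    final = [emitted[i] for i in range(len(tokens))]
    latency = [first_step[i] - i for i in range(len(tokens))]
    return final, latency, flicker


if __name__ == "__main__":
    print(stream(["I", "I", "uh", "want", "tea"], toy_tagger))
```

For the input "I I uh want tea", the filler is labeled with zero lookahead, while ordinary words are held for one token. A learned model would instead decide per token how long to wait, trading latency against accuracy and flicker.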

research
10/30/2020

Streaming Simultaneous Speech Translation with Augmented Memory Transformer

Transformer-based models have achieved state-of-the-art performance on s...
research
07/27/2023

Turning Whisper into Real-Time Transcription System

Whisper is one of the recent state-of-the-art multilingual speech recogn...
research
06/22/2022

Answer Fast: Accelerating BERT on the Tensor Streaming Processor

Transformers have become a predominant machine learning workload, they a...
research
11/16/2022

Streaming Joint Speech Recognition and Disfluency Detection

Disfluency detection has mainly been solved in a pipeline approach, as p...
research
11/25/2022

Efficient Incremental Text-to-Speech on GPUs

Incremental text-to-speech, also known as streaming TTS, has been increa...
research
09/15/2021

Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU

Incremental processing allows interactive systems to respond based on pa...
research
01/03/2020

Good Feature Matching: Towards Accurate, Robust VO/VSLAM with Low Latency

Analysis of state-of-the-art VO/VSLAM system exposes a gap in balancing ...