Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead

10/26/2022
by Piyush Behre, et al.

Segmentation for continuous Automatic Speech Recognition (ASR) has traditionally used silence timeouts or voice activity detectors (VADs), which are both limited to acoustic features. This segmentation is often overly aggressive, given that people naturally pause to think as they speak. Consequently, segmentation happens mid-sentence, hindering both punctuation and downstream tasks like machine translation, for which high-quality segmentation is critical. Model-based segmentation methods that leverage acoustic features are powerful, but without an understanding of the language itself, these approaches are limited. We present a hybrid approach that leverages both acoustic and language information to improve segmentation. Furthermore, we show that including one word as a look-ahead boosts segmentation quality. On average, our models improve the segmentation-F0.5 score by 9.8% over baseline. We show that this approach works for multiple languages. For the downstream task of machine translation, it improves the translation BLEU score by an average of 1.05 points.
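
For readers unfamiliar with the F0.5 metric mentioned above, here is a minimal sketch of how an F-beta score could be computed over segment boundaries. The abstract does not define how segmentation-F0.5 is measured, so treating boundaries as word indices and matching them exactly is an assumption made for illustration; the names `f_beta` and `boundary_scores` and the toy boundary sets are hypothetical.

```python
from __future__ import annotations


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def boundary_scores(predicted: set[int], reference: set[int]) -> tuple[float, float]:
    """Precision and recall of predicted boundaries.

    Assumption: a boundary is the index of the word it follows, and a
    prediction counts as correct only on an exact index match.
    """
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy data (hypothetical): an over-eager segmenter finds every true
# boundary but also inserts two extra breaks mid-sentence.
reference = {4, 9, 15}
predicted = {2, 4, 9, 12, 15}

p, r = boundary_scores(predicted, reference)
f_half = f_beta(p, r, beta=0.5)
f_one = f_beta(p, r, beta=1.0)
```

In this toy case the F0.5 value comes out lower than F1, because spurious boundaries reduce precision and beta = 0.5 gives precision more weight. That fits the abstract's concern with overly aggressive, mid-sentence segmentation.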

research
11/18/2015

Enhancements in statistical spoken language translation by de-normalization of ASR results

Spoken language translation (SLT) has become very important in an increa...
research
04/16/2021

Segmenting Subtitles for Correcting ASR Segmentation Errors

Typical ASR systems segment the input audio into utterances using purely...
research
01/10/2023

Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition

While speech recognition Word Error Rate (WER) has reached human parity ...
research
10/19/2020

Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines

In this work, we focus on improving ASR output segmentation in the conte...
research
03/08/2020

Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach

We demonstrate how we can practically incorporate multi-step future info...
research
03/06/2020

Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning

Data-driven segmentation of words into subword units has been used in va...
research
02/24/2022

Speech segmentation using multilevel hybrid filters

A novel approach for speech segmentation is proposed, based on Multileve...
