Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines

10/19/2020
by David Wan, et al.

In this work, we focus on improving ASR output segmentation in the context of low-resource language speech-to-text translation. ASR output segmentation is crucial because ASR systems segment the input audio using purely acoustic information and are not guaranteed to output sentence-like segments. Since most MT systems expect sentences as input, feeding in longer unsegmented passages can lead to sub-optimal performance. We explore the feasibility of using datasets of subtitles from TV shows and movies to train better ASR segmentation models. We further incorporate part-of-speech (POS) tag and dependency label information (derived from the unsegmented ASR outputs) into our segmentation model, and show that this noisy syntactic information can improve model accuracy. We evaluate our models intrinsically on segmentation quality and extrinsically on downstream MT performance, as well as on downstream tasks including cross-lingual information retrieval (CLIR) and human relevance assessments. Our model shows improved downstream performance for Lithuanian and Bulgarian.
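To make the idea of training a segmenter on subtitles more concrete, here is a minimal sketch that assumes segmentation is framed as token-level boundary tagging: sentence-segmented subtitle lines are concatenated into one lowercased, punctuation-free stream (to roughly imitate raw ASR output), and each token is labeled 1 if it ends a sentence, else 0. The function names, the lowercasing and punctuation-stripping step, and the feature-dictionary format for POS and dependency tags are illustrative assumptions, not the authors' actual pipeline or model; the tags are simply passed in, standing in for the output of a tagger run on the unsegmented ASR text.

```python
from typing import Dict, List, Sequence, Tuple


def _is_word(token: str) -> bool:
    # Treat a token as a word if it contains at least one letter or digit;
    # pure punctuation tokens ("?", ".") are dropped, since ASR output lacks them.
    return any(ch.isalnum() for ch in token)


def make_training_example(
    sentences: Sequence[Sequence[str]],
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: turn sentence-split subtitle lines into one
    ASR-like token stream plus boundary labels (1 = sentence ends here)."""
    tokens: List[str] = []
    labels: List[int] = []
    for sent in sentences:
        words = [t.lower() for t in sent if _is_word(t)]
        for i, word in enumerate(words):
            tokens.append(word)
            labels.append(1 if i == len(words) - 1 else 0)
    return tokens, labels


def add_syntax_features(
    tokens: Sequence[str], pos_tags: Sequence[str], dep_labels: Sequence[str]
) -> List[Dict[str, str]]:
    """Hypothetical helper: attach (noisy) POS and dependency labels to each
    token so a tagger can use them as extra input features."""
    if not (len(tokens) == len(pos_tags) == len(dep_labels)):
        raise ValueError("tokens, pos_tags and dep_labels must have equal length")
    return [
        {"token": tok, "pos": pos, "dep": dep}
        for tok, pos, dep in zip(tokens, pos_tags, dep_labels)
    ]


def apply_segmentation(tokens: Sequence[str], labels: Sequence[int]) -> List[List[str]]:
    """Split a token stream into segments after every token labeled 1;
    any trailing tokens without a final boundary form the last segment."""
    segments: List[List[str]] = []
    current: List[str] = []
    for tok, lab in zip(tokens, labels):
        current.append(tok)
        if lab == 1:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    subtitle_sentences = [["Where", "are", "you", "going", "?"], ["Home", "."]]
    tokens, labels = make_training_example(subtitle_sentences)
    print(tokens)  # ['where', 'are', 'you', 'going', 'home']
    print(labels)  # [0, 0, 0, 1, 1]
    print(apply_segmentation(tokens, labels))  # [['where', 'are', 'you', 'going'], ['home']]
```

In a real system, a sequence-labeling model would be trained on many such (tokens, labels) pairs and would then predict the labels for unsegmented ASR output; `apply_segmentation` shows how predicted labels would turn that output into sentence-like units for MT.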


research
04/16/2021

Segmenting Subtitles for Correcting ASR Segmentation Errors

Typical ASR systems segment the input audio into utterances using purely...
research
06/09/2020

Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation

Transfer learning from high-resource languages is known to be an efficie...
research
05/27/2020

Phone Features Improve Speech Translation

End-to-end models for speech translation (ST) more tightly couple speech...
research
10/26/2022

Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead

Segmentation for continuous Automatic Speech Recognition (ASR) has tradi...
research
07/05/2022

ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks

We aim at improving spoken language modeling (LM) using very large amoun...
research
09/03/2017

Disentangling ASR and MT Errors in Speech Translation

The main aim of this paper is to investigate automatic quality assessmen...
research
05/09/2023

Who Needs Decoders? Efficient Estimation of Sequence-level Attributes

State-of-the-art sequence-to-sequence models often require autoregressiv...
