AlloST: Low-resource Speech Translation without Source Transcription

05/01/2021
by Yao-Fei Cheng, et al.

The end-to-end architecture has made promising progress in speech translation (ST). However, the ST task remains challenging under low-resource conditions, and most ST models perform poorly when no word-level information is available from the source speech utterance. In this study, we survey methods for improving ST performance without source transcription and propose a learning framework that exploits a language-independent universal phone recognizer. The framework is built on an attention-based sequence-to-sequence model, in which the encoder produces phonetic embeddings and phone-aware acoustic representations, and the decoder controls the fusion of the two embedding streams to generate the target token sequence. In addition to investigating different fusion strategies, we explore a specific use of byte pair encoding (BPE), which compresses a phone sequence into syllable-like segments that carry semantic information. Experiments on the Fisher Spanish-English and Taigi-Mandarin drama corpora show that our method outperforms a Conformer-based baseline and approaches the performance of the best existing method that uses source transcription.
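To make the BPE step concrete, the following is a minimal, self-contained sketch of greedy pair merging over phone tokens, i.e., compressing a recognizer's phone sequence into syllable-like segments. It is an illustration under stated assumptions, not the paper's actual implementation: the toy phone strings and the merge budget are hypothetical, and the authors' tooling and settings are not specified here.

```python
from collections import Counter

def byte_pair_encode(phone_seqs, num_merges=50):
    """Greedily merge the most frequent adjacent phone pair so that
    single phones coalesce into syllable-like multi-phone units."""
    seqs = [list(seq) for seq in phone_seqs]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for seq in seqs:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        best, count = pair_counts.most_common(1)[0]
        if count < 2:  # no pair repeats; further merges are pointless
            break
        merges.append(best)
        merged_token = best[0] + best[1]
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                    out.append(merged_token)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return seqs, merges

# Hypothetical phone strings, as a universal phone recognizer might emit
# for short Spanish utterances (toy data for illustration only).
phone_seqs = [
    "k o m o e s t a s".split(),
    "k o m o t e l y a m a s".split(),
    "e s t a m u y b i e n".split(),
]
segmented, merges = byte_pair_encode(phone_seqs, num_merges=8)
for seq in segmented:
    print(seq)  # phones have coalesced into multi-phone, syllable-like units
```

In the framework described above, such segments would serve as the phonetic side of the two embedding streams that the decoder fuses with the acoustic representations.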
