Textual Supervision for Visually Grounded Spoken Language Understanding

10/06/2020
by Bertrand Higy, et al.

Visually grounded models of spoken language understanding extract semantic information directly from speech, without relying on transcriptions. This is useful for low-resource languages, where transcriptions can be expensive or impossible to obtain. Recent work has shown that these models can be improved if transcriptions are available at training time. However, it is not clear how an end-to-end approach compares to a traditional pipeline-based approach when one has access to transcriptions. Comparing different strategies, we find that the pipeline approach works better when enough text is available. With low-resource languages in mind, we also show that translations can be used effectively in place of transcriptions, but more data is needed to obtain similar results.

research
03/30/2023

Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples

The objective of this work is to explore the learning of visually ground...
research
06/11/2021

Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings

We investigate the efficiency of two very different spoken term detectio...
research
09/03/2021

Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding

Lack of training data presents a grand challenge to scaling out spoken l...
research
11/24/2022

Bidirectional Representations for Low Resource Spoken Language Understanding

Most spoken language understanding systems use a pipeline approach compo...
research
10/25/2020

Two-stage Textual Knowledge Distillation to Speech Encoder for Spoken Language Understanding

End-to-end approaches open a new way for more accurate and efficient spo...
research
05/08/2018

Capsule Networks for Low Resource Spoken Language Understanding

Designing a spoken language understanding system for command-and-control...
research
11/12/2020

Enabling Interactive Transcription in an Indigenous Community

We propose a novel transcription workflow which combines spoken term det...
