Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge

10/27/2022
by Ewan Dunbar, et al.

Recent progress in self-supervised or unsupervised machine learning has opened the possibility of building a full speech processing system from raw audio without using any textual representations or expert labels such as phonemes, dictionaries or parse trees. The contribution of the Zero Resource Speech Challenge series since 2015 has been to break down this long-term objective into four well-defined tasks – Acoustic Unit Discovery, Spoken Term Discovery, Discrete Resynthesis, and Spoken Language Modeling – and introduce associated metrics and benchmarks enabling model comparison and cumulative progress. We present an overview of the six editions of this challenge series since 2015, discuss the lessons learned, and outline the areas which need more work or give puzzling results.

research
11/23/2020

The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

We introduce a new unsupervised task, spoken language modeling: the lear...
research
04/29/2021

The Interspeech Zero Resource Speech Challenge 2021: Spoken language modelling

We present the Zero Resource Speech Challenge 2021, which asks participa...
research
07/17/2021

Learning De-identified Representations of Prosody from Raw Audio

We propose a method for learning de-identified prosody representations f...
research
07/26/2020

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

Unsupervised spoken term discovery consists of two tasks: finding the ac...
research
04/11/2022

Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning

We introduce a simple neural encoder architecture that can be trained us...
research
01/02/2023

Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

This work profoundly analyzes discrete self-supervised speech representa...
research
03/30/2022

Generative Spoken Dialogue Language Modeling

We introduce dGSLM, the first "textless" model able to generate audio sa...
