Generative Spoken Dialogue Language Modeling

03/30/2022
by   Tu Anh Nguyen, et al.

We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It combines recent work on unsupervised spoken unit discovery with a dual-tower transformer architecture with cross-attention, trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. The model generates speech, laughter and other paralinguistic signals in both channels simultaneously and reproduces naturalistic turn-taking. Generation samples can be found at: https://speechbot.github.io/dgslm.
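The dual-tower idea above can be illustrated with a toy NumPy sketch: each channel's tower applies self-attention over its own stream, then cross-attention over the other channel's stream, so each speaker's model conditions on the other speaker in real time. This is a simplified illustration, not the paper's implementation: single attention head, no layer norm or feed-forward sublayers, no causal masking, and the weight matrices `W` (shared between towers here) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over 2D (time, dim) arrays.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def dual_tower_layer(x_a, x_b, W):
    # One dual-tower layer: each channel attends to itself, then to the
    # other channel (cross-attention). Residual connections keep shapes.
    def tower(x_self, x_other):
        h = x_self + attention(x_self @ W["q"], x_self @ W["k"], x_self @ W["v"])
        h = h + attention(h @ W["q"], x_other @ W["k"], x_other @ W["v"])
        return h
    return tower(x_a, x_b), tower(x_b, x_a)

rng = np.random.default_rng(0)
d = 16  # toy embedding dimension
W = {k: rng.standard_normal((d, d)) / np.sqrt(d) for k in ("q", "k", "v")}
units_a = rng.standard_normal((10, d))  # stand-in for discrete-unit embeddings, channel A
units_b = rng.standard_normal((10, d))  # channel B
out_a, out_b = dual_tower_layer(units_a, units_b, W)
print(out_a.shape, out_b.shape)  # each output keeps its channel's (time, dim) shape
```

In the sketch, cross-attention is what lets one tower's next-unit prediction depend on what the other channel is doing, which is the mechanism the paper credits for modeling overlap and turn-taking across the two audio channels.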



Related research

09/07/2021 | Text-Free Prosody-Aware Generative Spoken Language Modeling
Speech pre-training has primarily demonstrated efficacy on classificatio...

03/11/2022 | Are discrete units necessary for Spoken Language Modeling?
Recent work in spoken language modeling shows the possibility of learnin...

02/01/2021 | Generative Spoken Language Modeling from Raw Audio
Generative spoken language modeling involves learning jointly the acoust...

10/27/2022 | Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
Recent progress in self-supervised or unsupervised machine learning has ...

11/23/2020 | The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling
We introduce a new unsupervised task, spoken language modeling: the lear...

06/13/2020 | GIPFA: Generating IPA Pronunciation from Audio
Transcribing spoken audio samples into International Phonetic Alphabet (...

05/19/2023 | MultiTurnCleanup: A Benchmark for Multi-Turn Spoken Conversational Transcript Cleanup
Current disfluency detection models focus on individual utterances each ...
