Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model

09/20/2023
by Xinyu Zhou, et al.

This paper explores the potential of constructing an AI spoken dialogue system that "thinks how to respond" and "thinks how to speak" simultaneously, which aligns more closely with the human speech production process than the current cascade pipeline of independent chatbot and Text-to-Speech (TTS) modules. We hypothesize that Large Language Models (LLMs) with billions of parameters possess significant speech understanding capabilities and can jointly model dialogue responses and linguistic features. We conduct two sets of experiments: 1) prosodic structure prediction, a typical front-end task in TTS, which demonstrates the speech understanding ability of LLMs, and 2) integration of dialogue response and a wide array of linguistic features using a unified encoding format. Our results indicate that the LLM-based approach is a promising direction for building unified spoken dialogue systems.

research
06/16/2022

Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History

We propose an end-to-end empathetic dialogue speech synthesis (DSS) mode...
research
05/18/2020

Neural Generation of Dialogue Response Timings

The timings of spoken response offsets in human dialogue have been shown...
research
06/22/2023

AudioPaLM: A Large Language Model That Can Speak and Listen

We introduce AudioPaLM, a large language model for speech understanding ...
research
08/07/2022

When can I Speak? Predicting initiation points for spoken dialogue agents

Current spoken dialogue systems initiate their turns after a long period...
research
08/15/2023

DiagGPT: An LLM-based Chatbot with Automatic Topic Management for Task-Oriented Dialogue

Large Language Models (LLMs), such as ChatGPT, are becoming increasingly...
research
09/13/2018

Studying Mutual Phonetic Influence with a Web-Based Spoken Dialogue System

This paper presents a study on mutual speech variation influences in a h...
research
07/25/2019

What's in an accent? The impact of accented synthetic speech on lexical choice in human-machine dialogue

The assumptions we make about a dialogue partner's knowledge and communi...
