SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts

06/03/2023
by   Haibin Wu, et al.

Large language models (LLMs) have gained considerable attention for Artificial Intelligence Generated Content (AIGC), particularly with the emergence of ChatGPT. However, directly adapting continuous speech to LLMs that process discrete tokens remains an unsolved challenge, hindering the application of LLMs to speech generation. Advanced speech LMs are on the horizon, since speech signals encapsulate a wealth of information, including speaker identity and emotion, beyond textual data alone. Prompt tuning has demonstrated notable gains in parameter efficiency and competitive performance on some speech classification tasks. However, the extent to which prompts can effectively elicit generation tasks from speech LMs remains an open question. In this paper, we present pioneering research that explores the application of prompt tuning to stimulate speech LMs for various generation tasks, within a unified framework called SpeechGen, with around 10M trainable parameters. The proposed unified framework holds great promise for efficiency and effectiveness, particularly with the imminent arrival of advanced speech LMs, which will significantly enhance the capabilities of the framework. The code and demos of SpeechGen will be available on the project website: <https://ga642381.github.io/SpeechPrompt/speechgen>
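The core idea behind prompt tuning, as described above, is to keep the speech LM frozen and train only a small set of prompt vectors that are prepended to the model's input. The sketch below is purely illustrative (not the authors' code): the array names, dimensions, and the use of NumPy are assumptions for demonstration, and the toy prompt has far fewer parameters than the roughly 10M used by SpeechGen.

```python
import numpy as np

# Illustrative sketch of prompt tuning for a speech LM (assumed setup, not
# the SpeechGen implementation): trainable prompt vectors are prepended to
# the frozen LM's input embeddings, and only the prompts would be updated
# during training.

rng = np.random.default_rng(0)

PROMPT_LEN, D_MODEL, SEQ_LEN = 8, 16, 20

# Frozen embeddings for a sequence of discretized speech units
# (stand-in for the output of a speech tokenizer).
frozen_unit_embeddings = rng.normal(size=(SEQ_LEN, D_MODEL))

# The only trainable parameters: PROMPT_LEN * D_MODEL values,
# initialized small so training starts near the frozen LM's behavior.
prompt_embeddings = rng.normal(size=(PROMPT_LEN, D_MODEL)) * 0.01

def build_lm_input(prompts: np.ndarray, unit_embeddings: np.ndarray) -> np.ndarray:
    """Prepend trainable prompt vectors to frozen token embeddings."""
    return np.concatenate([prompts, unit_embeddings], axis=0)

lm_input = build_lm_input(prompt_embeddings, frozen_unit_embeddings)
print(lm_input.shape)  # (28, 16): prompt length + unit-sequence length
```

In a real setup the concatenated sequence would be fed through the frozen speech LM, and gradients would flow only into `prompt_embeddings`, which is what keeps the trainable parameter count small relative to the LM itself.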


research
03/01/2023

SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks

Prompt tuning is a technology that tunes a small set of parameters to st...
research
08/31/2023

SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

Current speech large language models build upon discrete speech represen...
research
03/31/2022

An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks

Speech representations learned from Self-supervised learning (SSL) model...
research
11/16/2022

Parameter-Efficient Tuning on Layer Normalization for Pre-trained Language Models

Conventional fine-tuning encounters increasing difficulties given the si...
research
07/12/2023

Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers

Despite recent advancements in speech emotion recognition (SER) models, ...
research
04/09/2022

Contrastive Demonstration Tuning for Pre-trained Language Models

Pretrained language models can be effectively stimulated by textual prom...
research
10/29/2019

On Investigation of Unsupervised Speech Factorization Based on Normalization Flow

Speech signals are complex composites of various information, including ...
