GRASS: Unified Generation Model for Speech-to-Semantic Tasks

09/06/2023
by Aobo Xia, et al.

This paper explores instruction fine-tuning for speech-to-semantic tasks by introducing a unified end-to-end (E2E) framework that takes audio data together with a task-related prompt and generates the target text. We pre-train the model on large and diverse data, where instruction-speech pairs are constructed via a text-to-speech (TTS) system. Extensive experiments demonstrate that, after fine-tuning, our proposed model achieves state-of-the-art (SOTA) results on many benchmarks covering speech named entity recognition, speech sentiment analysis, speech question answering, and more. Furthermore, the proposed model achieves competitive performance in zero-shot and few-shot scenarios. To facilitate future work on instruction fine-tuning for speech-to-semantic tasks, we release our instruction dataset and code.

research
06/01/2023

Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering

The pre-training-fine-tuning paradigm based on layout-aware multimodal p...
research
09/06/2023

HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus

ChatGPT has gained significant interest due to its impressive performanc...
research
07/05/2023

LOAF-M2L: Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation

Despite previous efforts in melody-to-lyric generation research, there i...
research
05/22/2023

Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance

ChatGPT and GPT-4 have attracted substantial interest from both academic...
research
08/02/2023

Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

In this work, we evaluate 10 open-source instructed LLMs on four represe...
research
10/24/2022

Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models

Zero-shot cross-lingual transfer learning has been shown to be highly ch...
research
04/24/2023

Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

The immense scale of the recent large language models (LLM) allows many ...
