T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5

11/01/2022
by   Chan-Jan Hsu, et al.

In spoken language understanding (SLU), a natural solution is to concatenate pretrained speech models (e.g., HuBERT) and pretrained language models (PLMs, e.g., T5). Most previous works use PLMs with subword-based tokenization. However, the granularity of input units affects the alignment between speech model outputs and language model inputs, and PLMs with character-based tokenization are underexplored. In this work, we conduct extensive studies on how PLMs with different tokenization strategies affect spoken language understanding tasks, including spoken question answering (SQA) and speech translation (ST). We further extend the idea to create T5lephone (pronounced "telephone"), a variant of T5 that is pretrained on phonemicized text. We initialize T5lephone from existing PLMs so that it can be pretrained with relatively lightweight computational resources. We achieve state-of-the-art results on NMSQA, and the T5lephone model outperforms T5 with other types of units on end-to-end SQA and ST.
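To make the point about unit granularity concrete, the following toy sketch compares how many units the same sentence yields at the word, character, and phoneme level, and how many speech frames each unit would have to cover. It is not the authors' code: the lexicon `TOY_LEXICON`, the helper names, and the frame count are all hypothetical, and word-level splitting is used only as a rough stand-in for coarser subword units.

```python
# Toy illustration only. The lexicon, helper names, and frame count are
# assumptions made for this sketch; none of them come from the paper.

# Hypothetical hand-written pronunciation lexicon (ARPAbet-like symbols).
TOY_LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "and": ["AE", "N", "D"],
    "text": ["T", "EH", "K", "S", "T"],
}


def to_words(sentence):
    """Coarse units: whitespace-separated words (stand-in for subwords)."""
    return sentence.lower().split()


def to_characters(sentence):
    """Fine units: every character, including spaces."""
    return list(sentence.lower())


def to_phonemes(sentence, lexicon=TOY_LEXICON):
    """Phoneme units looked up in the toy lexicon.

    Raises KeyError for any word missing from the lexicon.
    """
    phonemes = []
    for word in to_words(sentence):
        phonemes.extend(lexicon[word])
    return phonemes


def frames_per_unit(num_frames, units):
    """Average number of speech frames that one text unit must align to."""
    return num_frames / len(units)


sentence = "speech and text"
num_frames = 60  # assumed length of the speech-model output for this sentence

for name, units in [
    ("word", to_words(sentence)),
    ("character", to_characters(sentence)),
    ("phoneme", to_phonemes(sentence)),
]:
    print(f"{name:9s} units={len(units):2d} "
          f"frames/unit={frames_per_unit(num_frames, units):.1f}")
```

In this toy setting, the finer the text unit, the fewer frames each unit has to account for, which is one informal way to read the alignment argument in the abstract.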

research
10/25/2019

SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering

While end-to-end models for spoken language understanding tasks have bee...
research
06/29/2022

Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models

In Spoken Language Understanding (SLU) the task is to extract important ...
research
07/19/2022

Benchmarking Transformers-based models on French Spoken Language Understanding tasks

In the last five years, the rise of the self-attentional Transformer-bas...
research
09/04/2023

A Comparative Analysis of Pretrained Language Models for Text-to-Speech

State-of-the-art text-to-speech (TTS) systems have utilized pretrained l...
research
05/22/2023

Textually Pretrained Speech Language Models

Speech language models (SpeechLMs) process and generate acoustic data on...
research
05/29/2023

Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target

Spoken Language Understanding (SLU) is a task that aims to extract seman...
research
07/01/2022

Toward Low-Cost End-to-End Spoken Language Understanding

Recent advances in spoken language understanding benefited from Self-Sup...
