WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment

04/22/2022
by Lin Yao et al.

Historically, lower-level tasks such as automatic speech recognition (ASR) and speaker identification have been the main focus in the speech field. Recently, interest has been growing in higher-level spoken language understanding (SLU) tasks, such as sentiment analysis (SA). However, improving performance on SLU tasks remains a major challenge. There are two main approaches to SLU tasks: (1) the two-stage method, which uses a speech model to transcribe speech into text and then a language model to produce the results of downstream tasks; and (2) the one-stage method, which simply fine-tunes a pre-trained speech model to fit the downstream tasks. The first method loses emotional cues such as intonation and introduces recognition errors during the ASR process, while the second lacks the necessary language knowledge. In this paper, we propose Wave BERT (WaBERT), a novel end-to-end model that combines a speech model and a language model for SLU tasks. WaBERT is built on pre-trained speech and language models, so training from scratch is not needed. We also freeze most parameters of WaBERT during training. By introducing WaBERT, audio-specific information and language knowledge are integrated in a short-time, low-resource training process, improving results on the dev dataset of the SLUE SA task by 1.15 recall score and 0.82 F1 score. In addition, we modify the Continuous Integrate-and-Fire (CIF) mechanism to achieve monotonic alignment between the speech and text modalities.
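
The abstract describes a pipeline in which frame-rate features from a frozen pre-trained speech encoder are aggregated to a token-like rate (via a CIF-style mechanism) and fed into a frozen BERT, with only small bridging and task layers trained. The sketch below illustrates that idea under stated assumptions: the checkpoints ("facebook/wav2vec2-base-960h", "bert-base-uncased"), the simplified CIF, the mean-pooled classifier head, and the bridge layer are illustrative choices, not the authors' exact implementation.

```python
# Minimal sketch of a WaBERT-style pipeline (not the paper's exact model).
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model, BertModel


class SimpleCIF(nn.Module):
    """Simplified Continuous Integrate-and-Fire: accumulate per-frame weights
    and emit one integrated vector each time the sum crosses the threshold,
    turning frame-rate speech features into a shorter, token-like sequence."""

    def __init__(self, dim, threshold=1.0):
        super().__init__()
        self.weight_proj = nn.Linear(dim, 1)
        self.threshold = threshold

    def forward(self, frames):                                  # frames: (T, dim), unbatched for clarity
        alphas = torch.sigmoid(self.weight_proj(frames)).squeeze(-1)   # (T,)
        outputs, acc_w, acc_v = [], 0.0, torch.zeros(frames.size(-1))
        for t in range(frames.size(0)):
            acc_w = acc_w + alphas[t]
            acc_v = acc_v + alphas[t] * frames[t]
            if acc_w >= self.threshold:                         # "fire": emit an integrated embedding
                outputs.append(acc_v / acc_w)
                acc_w, acc_v = 0.0, torch.zeros_like(acc_v)
        if not outputs:                                         # guard for very short inputs
            outputs.append(acc_v)
        return torch.stack(outputs)                             # (num_emitted, dim)


class WaBERTSketch(nn.Module):
    def __init__(self, num_labels=3):
        super().__init__()
        self.speech_encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Freeze most parameters; only the CIF, bridge, and classifier train.
        for p in self.speech_encoder.parameters():
            p.requires_grad = False
        for p in self.bert.parameters():
            p.requires_grad = False
        hid = self.speech_encoder.config.hidden_size            # 768 for the base models
        self.cif = SimpleCIF(hid)
        self.bridge = nn.Linear(hid, self.bert.config.hidden_size)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, waveform):                                # waveform: (1, num_samples) at 16 kHz
        frames = self.speech_encoder(waveform).last_hidden_state[0]    # (T, 768)
        token_like = self.bridge(self.cif(frames)).unsqueeze(0)        # (1, N, 768)
        pooled = self.bert(inputs_embeds=token_like).last_hidden_state.mean(dim=1)
        return self.classifier(pooled)                          # sentiment logits


if __name__ == "__main__":
    model = WaBERTSketch()
    logits = model(torch.randn(1, 16000))                       # one second of random audio
    print(logits.shape)                                         # torch.Size([1, 3])
```

Feeding the aggregated speech embeddings to BERT through `inputs_embeds` is what lets the frozen language model contribute its linguistic knowledge without an intermediate ASR transcript; in the sketch, only the CIF weights, the bridge, and the classifier receive gradients, mirroring the low-resource training setup the abstract describes.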
