Understanding Semantics from Speech Through Pre-training

09/24/2019
by Pengwei Wang, et al.

End-to-end Spoken Language Understanding (SLU) infers semantic meaning directly from audio features, without an intermediate text representation. Although the acoustic model component of an end-to-end SLU system can be pre-trained with Automatic Speech Recognition (ASR) targets, the SLU component can only learn semantic features from limited task-specific training data. In this paper, for the first time we propose large-scale unsupervised pre-training for the SLU component of an end-to-end SLU system, so that the SLU component may preserve semantic features learned from massive unlabeled audio data. Because the output of the acoustic model component, i.e. phoneme posterior sequences, has very different characteristics from text sequences, we propose a novel pre-training model called BERT-PLM, which stands for Bidirectional Encoder Representations from Transformers through Permutation Language Modeling. BERT-PLM trains the SLU component on unlabeled data through a regression objective equivalent to the partial permutation language modeling objective, while leveraging full bidirectional context information with BERT networks. The experimental results show that our approach outperforms state-of-the-art end-to-end systems with over 12.5
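For a concrete picture of what pre-training on phoneme posterior sequences might look like, below is a minimal, hypothetical sketch (not the authors' implementation). It trains a bidirectional Transformer encoder on continuous posterior frames with a simple masked-frame regression loss standing in for the paper's partial permutation language modeling objective; all module names, hyperparameters, and the PyTorch framing are illustrative assumptions.

```python
# Illustrative sketch only (not the authors' code). A bidirectional Transformer
# encoder is pre-trained on phoneme posterior sequences by masking random frames
# and regressing their original posteriors. BERT-PLM uses a partial permutation
# language modeling objective instead; this simplified masked-regression loss
# only conveys the general idea. Positional encodings are omitted for brevity.
import torch
import torch.nn as nn

class PhonemePosteriorEncoder(nn.Module):
    def __init__(self, n_phonemes=70, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.in_proj = nn.Linear(n_phonemes, d_model)    # posterior frame -> hidden
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out_proj = nn.Linear(d_model, n_phonemes)   # hidden -> reconstructed posterior

    def forward(self, posteriors):
        # posteriors: (batch, time, n_phonemes), e.g. acoustic-model outputs
        return self.out_proj(self.encoder(self.in_proj(posteriors)))

def masked_regression_loss(model, posteriors, mask_ratio=0.15):
    """Zero out a random subset of frames and regress their original posteriors."""
    mask = torch.rand(posteriors.shape[:2]) < mask_ratio   # (batch, time) boolean mask
    corrupted = posteriors.clone()
    corrupted[mask] = 0.0
    predicted = model(corrupted)
    return nn.functional.mse_loss(predicted[mask], posteriors[mask])

if __name__ == "__main__":
    model = PhonemePosteriorEncoder()
    # Stand-in for unlabeled audio: random softmax-normalized posterior frames.
    fake_posteriors = torch.softmax(torch.randn(8, 100, 70), dim=-1)
    loss = masked_regression_loss(model, fake_posteriors)
    loss.backward()
    print(f"pre-training loss: {loss.item():.4f}")
```

The main point the sketch conveys is that the inputs are continuous posterior vectors rather than discrete tokens, so the reconstruction target is a regression over posterior frames rather than a softmax over a text vocabulary.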


research · 10/23/2020
ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding
Language model pre-training has shown promising results in various downs...

research · 01/17/2021
Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition
End-to-end models have achieved impressive results on the task of automa...

research · 10/29/2022
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model
This paper presents BERT-CTC, a novel formulation of end-to-end speech r...

research · 05/25/2020
An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering
In a spoken multiple-choice question answering (SMCQA) task, given a pas...

research · 12/18/2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
The massive growth of self-supervised learning (SSL) has been witnessed ...

research · 08/13/2020
Large-scale Transfer Learning for Low-resource Spoken Language Understanding
End-to-end Spoken Language Understanding (SLU) models are made increasin...

research · 02/13/2022
ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification
Encrypted traffic classification requires discriminative and robust traf...
