End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features

11/16/2020
by Edmilson Morais, et al.

Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in natural language processing (NLP); however, their merits for spoken language understanding (SLU) still need further investigation. In this paper, we introduce a modular, transformer-based End-to-End (E2E) SLU architecture that supports self-supervised pre-trained acoustic features, pre-trained model initialization and multi-task training. We perform several SLU experiments on the ATIS dataset, predicting intent and entity labels/values. These experiments investigate how pre-trained model initialization and multi-task training interact with either traditional filterbank features or self-supervised pre-trained acoustic features. The results show not only that self-supervised pre-trained acoustic features outperform filterbank features in almost all experiments, but also that, when combined with multi-task training, they almost eliminate the need for pre-trained model initialization.
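
To make the idea of multi-task training more concrete, here is a minimal, hypothetical sketch of a combined training objective. It assumes two task heads: one utterance-level intent classifier and one per-token entity tagger. Their cross-entropy losses are summed with a weight. The function names (`cross_entropy`, `multitask_loss`) and the `entity_weight` parameter are illustrative assumptions. The abstract does not state which tasks the authors combine or how their losses are weighted, so this is not the paper's method.

```python
import math


def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    # Negative log-probability assigned to the target class index.
    return -math.log(softmax(logits)[target])


def multitask_loss(intent_logits, intent_target,
                   entity_logits, entity_targets, entity_weight=1.0):
    """Hypothetical multi-task objective: intent loss + weighted entity loss.

    intent_logits:  scores over intent classes for the whole utterance.
    entity_logits:  one list of scores over entity labels per token.
    entity_weight:  assumed scalar weight for the entity-tagging task.
    """
    if len(entity_logits) != len(entity_targets) or not entity_targets:
        raise ValueError("need one entity target per token (at least one token)")
    intent_loss = cross_entropy(intent_logits, intent_target)
    # Average the per-token entity losses so the utterance length does not dominate.
    token_losses = [cross_entropy(l, t) for l, t in zip(entity_logits, entity_targets)]
    entity_loss = sum(token_losses) / len(token_losses)
    return intent_loss + entity_weight * entity_loss


if __name__ == "__main__":
    loss = multitask_loss(
        intent_logits=[2.0, 0.5, -1.0],
        intent_target=0,
        entity_logits=[[1.0, 0.0], [0.2, 0.8]],
        entity_targets=[0, 1],
        entity_weight=0.5,
    )
    print(f"combined loss: {loss:.4f}")
```

A fixed weighted sum is only one common way to combine task losses. In a real system the logits would come from the transformer's output heads, and the loss would be computed with an autodiff framework rather than in plain Python.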
