End2End Acoustic to Semantic Transduction

02/01/2021
by Valentin Pelloin, et al.

In this paper, we propose a novel end-to-end sequence-to-sequence spoken language understanding model using an attention mechanism that reliably selects contextual acoustic features in order to hypothesize semantic contents. We first design and test an architecture capable of extracting all pronounced words and concepts from acoustic spans. With a shallow-fusion language model, this system reaches a 13.6 concept error rate (CER) and an 18.5 concept value error rate (CVER) on the French MEDIA corpus, an absolute reduction of 2.8 points compared to the state of the art. We then propose an original model that hypothesizes concepts together with their values; this transduction reaches a 15.4 CER and a 21.6 CVER without requiring any new type of context.
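The shallow fusion mentioned above combines the end-to-end model's output distribution with an external language model at decode time, typically as score(y) = log P_am(y|x) + λ · log P_lm(y). A minimal sketch of that scoring step follows; the token sets, probabilities, and the weight λ are made-up illustrations, not values from the paper.

```python
import math

def shallow_fusion_score(am_log_probs, lm_log_probs, lm_weight=0.3):
    """Combine acoustic-model and language-model log-probabilities.

    Shallow fusion rescores each candidate token during decoding:
        score(y) = log P_am(y | x) + lm_weight * log P_lm(y)
    Tokens unknown to the LM receive -inf from it. The lm_weight (lambda)
    would normally be tuned on development data.
    """
    return {tok: am_log_probs[tok] + lm_weight * lm_log_probs.get(tok, -math.inf)
            for tok in am_log_probs}

# Toy example: three acoustically confusable French tokens.
am = {"hotel": math.log(0.5), "hôtel": math.log(0.3), "autel": math.log(0.2)}
lm = {"hotel": math.log(0.6), "hôtel": math.log(0.3), "autel": math.log(0.1)}

fused = shallow_fusion_score(am, lm, lm_weight=0.5)
best = max(fused, key=fused.get)  # the LM reinforces the acoustically favored token
```

In a full beam-search decoder this combination is applied at every step to each hypothesis extension, which is how the external LM context influences the concept hypotheses without retraining the end-to-end model.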


