FANS: Fusing ASR and NLU for on-device SLU

10/31/2021
by Martin Radfar, et al.

Spoken language understanding (SLU) systems translate voice commands into semantics, encoded as an intent and pairs of slot tags and values. Most current SLU systems deploy a cascade of two neural models: the first maps the input audio to a transcript (ASR), and the second predicts the intent and slots from the transcript (NLU). In this paper, we introduce FANS, a new end-to-end SLU model that fuses an ASR audio encoder to a multi-task NLU decoder to infer the intent, slot tags, and slot values directly from the input audio, obviating the need for transcription. FANS consists of a shared audio encoder and three decoders, two of which are seq-to-seq decoders that predict non-null slot tags and slot values in parallel and in an auto-regressive manner. The encoder and decoder architectures of FANS are flexible, which allows us to leverage different combinations of LSTM, self-attention, and attenders. Our experiments show that, compared to the state-of-the-art end-to-end SLU models, FANS reduces ICER and IRER errors relatively by 30 dataset and by 0.86
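To make the output format and the two metrics named in the abstract concrete, here is a minimal sketch of how an interpretation (an intent plus slot tag/value pairs) might be represented and scored. The abstract does not define ICER or IRER, so the definitions below are assumptions based on common usage: ICER is taken as the share of utterances with a wrong intent, and IRER as the share of utterances where the intent or any slot differs from the reference. The `Semantics` class, the function names, and the example utterances are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Semantics:
    """One utterance's interpretation: an intent plus slot tag -> slot value pairs."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def _check(refs: List[Semantics], hyps: List[Semantics]) -> None:
    # Both lists must be non-empty and aligned utterance by utterance.
    if not refs or len(refs) != len(hyps):
        raise ValueError("refs and hyps must be non-empty and of equal length")


def icer(refs: List[Semantics], hyps: List[Semantics]) -> float:
    """Assumed definition: fraction of utterances whose intent is wrong."""
    _check(refs, hyps)
    wrong = sum(r.intent != h.intent for r, h in zip(refs, hyps))
    return wrong / len(refs)


def irer(refs: List[Semantics], hyps: List[Semantics]) -> float:
    """Assumed definition: fraction of utterances where the intent or any slot is wrong."""
    _check(refs, hyps)
    # Dataclass equality compares both the intent and the full slot dictionary.
    wrong = sum(r != h for r, h in zip(refs, hyps))
    return wrong / len(refs)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction of `new` with respect to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline - new) / baseline


# Hypothetical reference and predicted interpretations for four utterances.
refs = [
    Semantics("PlayMusic", {"artist": "queen"}),
    Semantics("SetAlarm", {"time": "seven am"}),
    Semantics("GetWeather", {"city": "boston"}),
    Semantics("StopMusic"),
]
hyps = [
    Semantics("PlayMusic", {"artist": "queen"}),    # fully correct
    Semantics("SetAlarm", {"time": "eleven am"}),   # intent right, slot value wrong
    Semantics("GetTraffic", {"city": "boston"}),    # intent wrong
    Semantics("StopMusic"),                         # fully correct, no slots
]

print(icer(refs, hyps))  # 0.25
print(irer(refs, hyps))  # 0.5
```

Under these assumed definitions IRER is never lower than ICER, since a wrong intent also counts as a wrong interpretation. The `relative_reduction` helper only illustrates what a "relative" reduction means: an error rate falling from 0.5 to 0.25 is a 50% relative reduction but a 25-point absolute one.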

research
04/01/2022

Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding

End-to-end Spoken Language Understanding (E2E SLU) has attracted increas...
research
07/13/2023

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling

We study speech intent classification and slot filling (SICSF) by propos...
research
03/19/2023

CTRAN: CNN-Transformer-based Network for Natural Language Understanding

Intent-detection and slot-filling are the two main tasks in natural lang...
research
08/12/2020

End-to-End Neural Transformer Based Spoken Language Understanding

Spoken language understanding (SLU) refers to the process of inferring t...
research
06/08/2021

Sequential End-to-End Intent and Slot Label Classification and Localization

Human-computer interaction (HCI) is significantly impacted by delayed re...
research
10/21/2022

Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR

Accurate prediction of the user intent to interact with a voice assistan...
research
11/03/2020

Sound Natural: Content Rephrasing in Dialog Systems

We introduce a new task of rephrasing for a more natural virtual assista...
