M2H-GAN: A GAN-based Mapping from Machine to Human Transcripts for Speech Understanding

04/13/2019
by Titouan Parcollet, et al.

Deep learning is at the core of recent spoken language understanding (SLU) tasks. More precisely, deep neural networks (DNNs) have drastically increased the performance of SLU systems, and numerous architectures have been proposed. In the real-life context of theme identification of telephone conversations, it is common to hold both a human, manual transcript (TRS) and an automatically transcribed (ASR) version of each conversation. Nonetheless, due to production constraints, only the ASR transcripts are used to build automatic classifiers; TRS transcripts are used only to measure the performance of ASR systems. Moreover, the classification accuracy recently obtained by DNN-based systems is close to that reached by humans, and it becomes difficult to improve further by considering only the ASR transcripts. This paper proposes to distill the TRS knowledge available during the training phase into the ASR representation, by using a new generative adversarial network called M2H-GAN to generate a TRS-like version of an ASR document, in order to improve theme identification performance.
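
As a rough illustration of the adversarial idea described above, the following minimal sketch computes the standard GAN losses for one paired (ASR, TRS) example. It is not the M2H-GAN architecture, which the abstract does not specify: the linear `generator`, the logistic `discriminator`, the `gan_losses` helper, the feature dimension, and the toy vectors are all hypothetical stand-ins chosen for illustration.

```python
import math
import random

def generator(asr_vec, weights, bias):
    """Hypothetical linear generator: maps an ASR feature vector to a TRS-like one."""
    return [sum(w * x for w, x in zip(row, asr_vec)) + b
            for row, b in zip(weights, bias)]

def discriminator(vec, d_weights, d_bias):
    """Hypothetical logistic discriminator: probability that `vec` is a real TRS vector."""
    score = sum(w * x for w, x in zip(d_weights, vec)) + d_bias
    return 1.0 / (1.0 + math.exp(-score))

def gan_losses(asr_vec, trs_vec, g_params, d_params, eps=1e-12):
    """Standard GAN losses for one paired (ASR, TRS) example (assumed setup)."""
    fake_trs = generator(asr_vec, *g_params)
    p_real = discriminator(trs_vec, *d_params)   # should approach 1 for real TRS
    p_fake = discriminator(fake_trs, *d_params)  # should approach 0 for generated TRS
    d_loss = -(math.log(p_real + eps) + math.log(1.0 - p_fake + eps))
    g_loss = -math.log(p_fake + eps)  # non-saturating generator loss
    return d_loss, g_loss

# Toy data: random vectors standing in for document representations (assumption).
random.seed(0)
dim = 4
asr = [random.random() for _ in range(dim)]
trs = [random.random() for _ in range(dim)]

# Identity generator and an untrained (all-zero) discriminator.
identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
g_params = (identity, [0.0] * dim)
d_params = ([0.0] * dim, 0.0)

d_loss, g_loss = gan_losses(asr, trs, g_params, d_params)
print(d_loss, g_loss)
```

In such a setup, only the ASR vector would be needed at test time: the trained generator produces the TRS-like representation that a downstream theme classifier would consume.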

