Topic Classification on Spoken Documents Using Deep Acoustic and Linguistic Features

06/16/2021
by Tan Liu, et al.

Topic classification systems for spoken documents usually consist of two modules: an automatic speech recognition (ASR) module that converts speech into text, and a text topic classification (TTC) module that predicts the topic class from the decoded text. In this paper, instead of relying on ASR transcripts, we fuse deep acoustic and linguistic features for topic classification on spoken documents. Specifically, a conventional CTC-based acoustic model (AM) with phonemes as output units is first trained, and the outputs of the layer before the linear phoneme classifier in the trained AM are used as the deep acoustic features of the spoken documents. These deep acoustic features are then fed to a phoneme-to-word (P2W) module to obtain deep linguistic features. Finally, a local multi-head attention module is proposed to fuse the two types of deep features for topic classification. Experiments on a subset of the Switchboard corpus show that the proposed framework outperforms conventional ASR+TTC systems and achieves a 3.13 improvement in accuracy (ACC).
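
The fusion step can be illustrated with a rough sketch. The abstract does not define what "local" means or how the module is parameterized, so the code below is only one possible reading, not the authors' implementation: each acoustic frame attends to linguistic features inside a fixed window around the same time index, the heads are formed by slicing the feature dimension, and no learned projections are used. The function names, the `window` parameter, the assumption that both feature sequences have the same length, and the concatenate-then-mean-pool step are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_multihead_attention(queries, keys, values, num_heads, window):
    """Windowed scaled dot-product attention (illustrative assumption).

    queries, keys, values: lists of T vectors, each of dimension D.
    Position t attends only to positions in [t - window, t + window].
    Heads are plain slices of the feature dimension (no learned weights).
    """
    assert len(queries) == len(keys) == len(values), "equal lengths assumed"
    dim = len(queries[0])
    assert dim % num_heads == 0, "dimension must split evenly across heads"
    head_dim = dim // num_heads
    scale = math.sqrt(head_dim)

    output = []
    for t, q in enumerate(queries):
        lo = max(0, t - window)
        hi = min(len(keys), t + window + 1)
        positions = range(lo, hi)
        attended = []
        for h in range(num_heads):
            sl = slice(h * head_dim, (h + 1) * head_dim)
            q_h = q[sl]
            scores = [
                sum(a * b for a, b in zip(q_h, keys[j][sl])) / scale
                for j in positions
            ]
            weights = softmax(scores)
            for d in range(head_dim):
                attended.append(
                    sum(w * values[j][sl][d] for w, j in zip(weights, positions))
                )
        output.append(attended)
    return output


def fuse_and_pool(acoustic, linguistic, num_heads=2, window=1):
    """Fuse the two feature streams and mean-pool over time.

    Acoustic frames act as queries over the linguistic features; each frame
    is then concatenated with its attended linguistic context (hypothetical
    fusion choice) and averaged into one document-level vector.
    """
    context = local_multihead_attention(
        acoustic, linguistic, linguistic, num_heads, window
    )
    fused = [a + c for a, c in zip(acoustic, context)]
    steps = len(fused)
    return [sum(frame[d] for frame in fused) / steps for d in range(len(fused[0]))]


if __name__ == "__main__":
    # Toy features: 3 time steps, 4 dimensions per stream.
    acoustic = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
    linguistic = [[0.5, 0.5, 1.0, 0.0], [0.0, 2.0, 0.0, 1.0], [1.0, 0.0, 3.0, 0.0]]
    print(fuse_and_pool(acoustic, linguistic))
```

In a full system the pooled vector would feed a topic classifier, and the query, key, and value streams would pass through learned projections. Both are omitted here to keep the sketch dependency-free.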

research
10/29/2021

Fusing ASR Outputs in Joint Training for Speech Emotion Recognition

Alongside acoustic information, linguistic features based on speech tran...
research
04/20/2021

On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

Text encodings from automatic speech recognition (ASR) transcripts and a...
research
03/22/2017

Topic Identification for Speech without ASR

Modern topic identification (topic ID) systems for speech use automatic ...
research
06/16/2022

Nonwords Pronunciation Classification in Language Development Tests for Preschool Children

This work aims to automatically evaluate whether the language developmen...
research
12/02/2021

A Mixture of Expert Based Deep Neural Network for Improved ASR

This paper presents a novel deep learning architecture for acoustic mode...
research
04/08/2022

Transducer-based language embedding for spoken language identification

The acoustic and linguistic features are important cues for the spoken l...
research
08/13/2020

LSTM Acoustic Models Learn to Align and Pronounce with Graphemes

Automated speech recognition coverage of the world's languages continues...
