Topic Identification For Spontaneous Speech: Enriching Audio Features With Embedded Linguistic Information

07/21/2023
by Dejan Porjazovski, et al.

Traditional approaches to topic identification from audio rely on an automatic speech recognition (ASR) system to produce transcripts, which are then used as input to a text-based model. These approaches work well in high-resource scenarios, where there are sufficient data to train both components of the pipeline. However, in low-resource situations, the ASR system, even if available, produces low-quality transcripts, which in turn lead to a poor text-based classifier. Moreover, spontaneous speech containing hesitations can further degrade the performance of the ASR model. In this paper, we investigate alternatives to the standard text-only solutions by comparing audio-only techniques with hybrid techniques that jointly utilise text and audio features. The models, evaluated on spontaneous Finnish speech, demonstrate that purely audio-based solutions are a viable option when ASR components are not available, while the hybrid multi-modal solutions achieve the best results.
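As a rough illustration of what "jointly utilising text and audio features" can mean, the sketch below concatenates an utterance-level audio vector with a text vector and passes the result through a linear softmax classifier. This is only one simple fusion scheme and is not claimed to be the architecture used in the paper; the function names, vector sizes, weights, and input values are all hypothetical toy choices.

```python
import math

def fuse(audio_vec, text_vec):
    """Concatenate utterance-level audio and text feature vectors."""
    return list(audio_vec) + list(text_vec)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, biases):
    """Linear layer followed by softmax; returns (topic index, probabilities)."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    return probs.index(max(probs)), probs

# Toy inputs (assumed values, not from the paper): a 3-dim audio vector
# and a 2-dim text vector, classified into one of two topics.
audio_vec = [0.2, -0.1, 0.4]
text_vec = [1.0, 0.0]
features = fuse(audio_vec, text_vec)

weights = [
    [0.5, 0.0, 0.5, 1.0, 0.0],   # topic 0
    [-0.5, 1.0, 0.0, 0.0, 1.0],  # topic 1
]
biases = [0.0, 0.0]

topic, probs = classify(features, weights, biases)
print(topic, probs)
```

In an audio-only setting of the kind the abstract mentions, the same classifier could be applied to `audio_vec` alone, with the weight rows shortened to match.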


