Topic Model Robustness to Automatic Speech Recognition Errors in Podcast Transcripts

09/25/2021
by Raluca Alexandra Fetic, et al.

For a multilingual podcast streaming service, it is critical to deliver relevant content to all users regardless of language. Podcast content relevance is conventionally determined from various metadata sources. However, as the quality of speech recognition improves in many languages, it becomes possible to use automatic transcriptions to provide better content recommendations. In this work, we explore the robustness of a Latent Dirichlet Allocation topic model when applied to transcripts created by an automatic speech recognition engine. Specifically, we explore how increasing transcription noise influences the topics obtained from transcriptions in Danish, a low-resource language. First, we establish a baseline of cosine similarity scores between topic embeddings from automatic transcriptions and topic embeddings from the podcast descriptions written by the podcast creators. We then observe how the cosine similarities decrease as transcription noise increases, and conclude that even when automatic speech recognition transcripts are erroneous, it is still possible to obtain high-quality topic embeddings from the transcriptions.
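The comparison described in the abstract can be illustrated with a minimal sketch. It assumes that each text (a transcript or a creator-written description) has already been mapped to a topic vector by some LDA implementation; the example vectors, the function names `cosine_similarity` and `corrupt_words`, and the random word-substitution noise are all hypothetical illustrations, not the authors' pipeline or noise model.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def corrupt_words(words, error_rate, vocabulary, rng):
    """Replace each word by a random vocabulary word with probability error_rate.

    A crude stand-in for ASR substitution errors (an assumption of this
    sketch, not the noise model used in the paper).
    """
    return [
        rng.choice(vocabulary) if rng.random() < error_rate else word
        for word in words
    ]


if __name__ == "__main__":
    # Hypothetical topic distributions for one podcast episode.
    transcript_topics = [0.70, 0.20, 0.10]
    description_topics = [0.60, 0.30, 0.10]
    print(cosine_similarity(transcript_topics, description_topics))

    # Inject noise into a toy Danish transcript at a 30% substitution rate.
    rng = random.Random(0)
    words = "vi taler om fodbold og politik i dag".split()
    print(corrupt_words(words, 0.3, ["hund", "kat", "bil"], rng))
```

In an experiment of the kind the abstract describes, the corrupted word list would be passed back through the topic model, and the resulting vector compared against the description's vector at each noise level.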


research
10/11/2022

Automatic Speech Recognition of Low-Resource Languages Based on Chukchi

The following paper presents a project focused on the research and creat...
research
07/11/2022

Speaker Anonymization with Phonetic Intermediate Representations

In this work, we propose a speaker anonymization pipeline that leverages...
research
10/12/2022

Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

Code-switching automatic speech recognition becomes one of the most chal...
research
01/09/2020

Open Challenge for Correcting Errors of Speech Recognition Systems

The paper announces the new long-term challenge for improving the perfor...
research
02/27/2023

MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

Multi-lingual speech recognition aims to distinguish linguistic expressi...
research
09/23/2020

Cosine Similarity of Multimodal Content Vectors for TV Programmes

Multimodal information originates from a variety of sources: audiovisual...
research
11/28/2017

Exploiting Nontrivial Connectivity for Automatic Speech Recognition

Nontrivial connectivity has allowed the training of very deep networks b...
