Filler Word Detection and Classification: A Dataset and Benchmark

03/28/2022
by Ge Zhu, et al.

Filler words such as 'uh' or 'um' are sounds or words people use to signal they are pausing to think. Finding and removing filler words from recordings is a common and tedious task in media editing. Automatically detecting and classifying filler words could greatly aid in this task, but few studies have been published on this problem. A key reason is the absence of a dataset with annotated filler words for training and evaluation. In this work, we present a novel speech dataset, PodcastFillers, with 35K annotated filler words and 50K annotations of other sounds that commonly occur in podcasts, such as breaths, laughter, and word repetitions. We propose a pipeline that leverages voice activity detection (VAD) and automatic speech recognition (ASR) to detect filler candidates, and a classifier to distinguish between filler word types. We evaluate our proposed pipeline on PodcastFillers, compare it to several baselines, and present a detailed ablation study. In particular, we evaluate the importance of using ASR and how it compares to a transcription-free approach resembling keyword spotting. We show that our pipeline obtains state-of-the-art results, and that leveraging ASR strongly outperforms a keyword spotting approach. We make PodcastFillers publicly available, and hope our work serves as a benchmark for future research.
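The candidate-detection step can be illustrated with a minimal sketch. The abstract does not spell out how VAD and ASR outputs are combined, so this assumes one plausible reading: regions that VAD marks as speech but that no ASR word covers are treated as filler candidates. The function name `filler_candidates`, the interval format (start and end times in seconds), and the `min_dur` threshold are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def filler_candidates(
    speech: List[Interval],
    words: List[Interval],
    min_dur: float = 0.1,  # hypothetical threshold, not from the paper
) -> List[Interval]:
    """Return the parts of VAD speech regions not covered by any ASR word.

    Assumption: speech that the recognizer did not transcribe is a
    candidate filler; a separate classifier would then label each one.
    """
    sorted_words = sorted(words)
    gaps: List[Interval] = []
    for start, end in sorted(speech):
        cursor = start  # left edge of the still-uncovered speech
        for w_start, w_end in sorted_words:
            if w_end <= cursor:
                continue  # word lies entirely before the cursor
            if w_start >= end:
                break  # word starts after this speech region
            if w_start > cursor:
                gaps.append((cursor, w_start))  # untranscribed speech
            cursor = max(cursor, w_end)
        if cursor < end:
            gaps.append((cursor, end))  # trailing untranscribed speech
    # Drop gaps too short to plausibly hold a filler word.
    return [(s, e) for s, e in gaps if e - s >= min_dur]


if __name__ == "__main__":
    vad_regions = [(0.0, 3.0)]
    asr_words = [(0.5, 1.0), (1.5, 3.0)]
    print(filler_candidates(vad_regions, asr_words))
```

In a full system, each returned interval would be cut from the audio and passed to a classifier that separates filler types from other sounds such as breaths or laughter; that stage is omitted here because the abstract gives no details about it.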


research
01/26/2022

The Norwegian Parliamentary Speech Corpus

The Norwegian Parliamentary Speech Corpus (NPSC) is a speech dataset wit...
research
08/15/2023

End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations

Conventional keyword search systems operate on automatic speech recognit...
research
07/23/2018

Zero-shot keyword spotting for visual speech recognition in-the-wild

Visual keyword spotting (KWS) is the problem of estimating whether a tex...
research
09/18/2023

HypR: A comprehensive study for ASR hypothesis revising with a reference corpus

With the development of deep learning, automatic speech recognition (ASR...
research
09/08/2022

Goodness of Pronunciation Pipelines for OOV Problem

In the following report we propose pipelines for Goodness of Pronunciati...
research
03/06/2020

NYTWIT: A Dataset of Novel Words in the New York Times

We present the New York Times Word Innovation Types dataset, or NYTWIT, ...
research
05/17/2020

Wake Word Detection with Alignment-Free Lattice-Free MMI

Always-on spoken language interfaces, e.g. personal digital assistants, ...
