A Recorded Debating Dataset

09/19/2017
by Shachar Mirkin, et al.

This paper describes an audio and textual dataset of debating speeches, a first-of-its-kind resource for the growing research field of computational argumentation and debating technologies. We detail the recording of speeches by professional debaters, the transcription of the speeches with an Automatic Speech Recognition (ASR) system, their subsequent automatic processing to produce a text that is more "NLP-friendly", and, in parallel, the manual transcription of the speeches to produce gold-standard "reference" transcripts. We release speeches on various controversial topics, each in 5 formats corresponding to the different stages in the production of the data. The intention is to allow this resource to be used for multiple research purposes, be it the addition of in-domain training data for a debate-specific ASR system, or the application of argumentation mining to either noisy or clean debate transcripts. We intend to make further releases of this data in the future.
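Because each speech comes with both ASR output and a manually produced reference transcript, one natural use is to measure how noisy the automatic transcript is. The sketch below computes a word error rate (WER) with a plain word-level edit distance. It is only an illustration under stated assumptions: the function name `word_error_rate`, the lowercase-and-split-on-whitespace normalization, and the two example sentences are all hypothetical and are not taken from the paper or the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up strings for illustration only; not taken from the dataset.
reference = "we should ban the sale of violent video games to minors"
asr_output = "we should ban the sail of violent video games two minors"
print(f"WER: {word_error_rate(reference, asr_output):.3f}")
```

In practice the result depends heavily on how punctuation, casing, and numbers are normalized before comparison, so any such score should be reported together with the normalization used.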


