The Spot the Difference corpus: a multi-modal corpus of spontaneous task oriented spoken interactions

05/14/2018
by José Lopes, et al.

This paper describes the Spot the Difference Corpus, which contains 54 interactions between pairs of subjects working together to find the differences between two very similar scenes. We describe the setup, the participants' metadata, and the details of the collection. We are releasing this corpus of task-oriented spontaneous dialogues; the release includes rich transcriptions, annotations, audio, and video. We believe the dataset is a valuable resource for studying several dimensions of human communication, ranging from turn-taking to referring expressions. In our preliminary analyses we looked at task success (the number of differences found out of the total number of differences) and how it evolves over time. We also looked at scene complexity, measured by the entropy of the RGB components, and how it might relate to speech overlaps, interruptions, and the expression of uncertainty. We found a tendency for more complex scenes to elicit more competitive interruptions.
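As an illustration of the two measures named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that scene complexity is the Shannon entropy of each colour channel's intensity histogram, averaged over the R, G and B channels, and that task success is a simple ratio; the abstract does not specify how the channels are combined. The function names `channel_entropy`, `rgb_entropy` and `task_success` are hypothetical.

```python
import math
from collections import Counter


def channel_entropy(values):
    """Shannon entropy, in bits, of one colour channel's intensity histogram."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rgb_entropy(pixels):
    """Mean of the R, G and B channel entropies (averaging is an assumption).

    `pixels` is an iterable of (r, g, b) tuples with integer intensities.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("pixels must not be empty")
    channels = zip(*pixels)  # regroup into three per-channel sequences
    return sum(channel_entropy(list(ch)) for ch in channels) / 3


def task_success(found, total):
    """Fraction of the scene's differences that the pair found."""
    if total <= 0:
        raise ValueError("total must be positive")
    return found / total


# A flat single-colour scene versus one with two equally frequent colours.
flat_scene = [(10, 20, 30)] * 4
two_colour_scene = [(0, 0, 0), (255, 255, 255)] * 2
flat_complexity = rgb_entropy(flat_scene)
two_colour_complexity = rgb_entropy(two_colour_scene)
```

Under these assumptions the flat scene has zero entropy and the two-colour scene has one bit per channel, so a higher value corresponds to a visually more varied scene.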

research
11/08/2019

Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates

Current research into spoken language translation (SLT) is often hampere...
research
03/24/2022

Lahjoita puhetta – a large-scale corpus of spoken Finnish with some benchmarks

The Donate Speech campaign has so far succeeded in gathering approximate...
research
12/16/2022

Speech Aware Dialog System Technology Challenge (DSTC11)

Most research on task oriented dialog modeling is based on written text ...
research
05/19/2023

MD3: The Multi-Dialect Dataset of Dialogues

We introduce a new dataset of conversational speech representing English...
research
05/24/2022

Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts

We introduce the Merkel Podcast Corpus, an audio-visual-text corpus in G...
research
08/23/2023

Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

We proposed Audio Difference Captioning (ADC) as a new extension task of...
research
06/10/2020

Trust-UBA: A Corpus for the Study of the Manifestation of Trust in Speech

This paper describes a novel protocol for collecting speech data from su...
