Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research

12/04/2022
by   Davide Berghi, et al.

3D audio-visual production aims to deliver immersive and interactive experiences to the consumer. Yet, faithfully reproducing real-world 3D scenes remains a challenging task. This is partly due to the lack of available datasets enabling audio-visual research in this direction. In most existing multi-view datasets, the accompanying audio is neglected. Similarly, datasets for spatial audio research primarily offer unimodal content, and when visual data is included, its quality falls far short of standard production needs. We present "Tragic Talkers", an audio-visual dataset consisting of excerpts from the drama "Romeo and Juliet" captured with microphone arrays and multiple co-located cameras for light-field video. Tragic Talkers provides ideal content for object-based media (OBM) production. It is designed to cover various conventional talking scenarios, such as monologues, two-person conversations, and interactions with considerable movement and occlusion, yielding 30 sequences captured from a total of 22 different points of view and two 16-element microphone arrays. Additionally, we provide voice activity labels, 2D face bounding boxes for each camera view, 2D pose detection keypoints, 3D tracking data of the actors' mouths, and dialogue transcriptions. We believe the community will benefit from this dataset as it can assist multidisciplinary research. Possible uses of the dataset are discussed.


research
05/03/2021

Naturalistic audio-visual volumetric sequences dataset of sounding actions for six degree-of-freedom interaction

As audio-visual systems increasingly bring immersive and interactive cap...
research
03/10/2020

PANDA: A Gigapixel-level Human-centric Video Dataset

We present PANDA, the first gigaPixel-level humAN-centric viDeo dAtaset,...
research
08/07/2020

A Study on Visual Perception of Light Field Content

The effective design of visual computing systems depends heavily on the ...
research
09/13/2023

PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network

It is common in everyday spoken communication that we look at the turnin...
research
03/13/2023

The Audio-Visual BatVision Dataset for Research on Sight and Sound

Vision research showed remarkable success in understanding our world, pr...
research
09/23/2020

Learning Visual Voice Activity Detection with an Automatically Annotated Dataset

Visual voice activity detection (V-VAD) uses visual features to predict ...
research
07/11/2022

Documenting Data Production Processes: A Participatory Approach for Data Work

The opacity of machine learning data is a significant threat to ethical ...
