The Casual Conversations v2 Dataset

03/08/2023
by Bilal Porgali, et al.
This paper introduces a new, large, consent-driven dataset for evaluating the algorithmic bias and robustness of computer vision and audio speech models with respect to 11 attributes that are either self-provided or labeled by trained annotators. The dataset includes 26,467 videos of 5,567 unique paid participants, an average of almost 5 videos per person, recorded in Brazil, India, Indonesia, Mexico, Vietnam, the Philippines, and the USA, representing diverse demographic characteristics. The participants consented to the use of their data in assessing the fairness of AI models and provided self-reported age, gender, language/dialect, disability status, physical adornments, physical attributes, and geo-location information. Trained annotators labeled apparent skin tone, using the Fitzpatrick Skin Type and Monk Skin Tone scales, and voice timbre. Annotators also labeled the recording setups and provided per-second activity annotations.

research
04/06/2021

Towards measuring fairness in AI: the Casual Conversations dataset

This paper introduces a novel dataset to help researchers evaluate their...
research
11/18/2021

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

It is well known that many machine learning systems demonstrate bias tow...
research
10/19/2019

The Deepfake Detection Challenge (DFDC) Preview Dataset

In this paper, we introduce a preview of the Deepfakes Detection Challen...
research
05/22/2023

Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection

This paper presents the Coswara dataset, a dataset containing diverse se...
research
05/25/2022

Empathic Conversations: A Multi-level Dataset of Contextualized Conversations

Empathy is a cognitive and emotional reaction to an observed situation o...
research
12/28/2020

Context-Aware Personality Inference in Dyadic Scenarios: Introducing the UDIVA Dataset

This paper introduces UDIVA, a new non-acted dataset of face-to-face dya...
research
04/08/2018

Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

First-person vision is gaining interest as it offers a unique viewpoint ...
