COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection

More than two years after its outbreak, the COVID-19 pandemic continues to plague medical systems around the world, straining scarce resources and claiming human lives. From the very beginning, various AI-based COVID-19 detection and monitoring tools have been pursued in an attempt to stem the tide of infections through timely diagnosis. In particular, computer audition has been suggested as a non-invasive, cost-efficient, and eco-friendly alternative for detecting COVID-19 infections through vocal sounds. However, like all AI methods, computer audition depends heavily on the quantity and quality of available data, and large-scale COVID-19 sound datasets are difficult to acquire, among other reasons because of the sensitive nature of such data. To that end, we introduce the COVYT dataset, a novel COVID-19 dataset collected from public sources and containing more than 8 hours of speech from 65 speakers. Compared to other existing COVID-19 sound datasets, the unique feature of the COVYT dataset is that it comprises both COVID-19 positive and negative samples from all 65 speakers. We analyse the acoustic manifestation of COVID-19 on the basis of these 'in-the-wild' data, which are perfectly balanced with respect to speaker characteristics, using interpretable audio descriptors, and we investigate several classification scenarios that shed light on proper partitioning strategies for fair speech-based COVID-19 detection.
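To make the partitioning question concrete, the following is a minimal sketch of one commonly used strategy, a speaker-disjoint split, in which every recording of a given speaker lands in either the training or the test partition but never both. The abstract does not state which partitioning schemes are actually compared, so this is an illustration under that assumption rather than the authors' method; the record layout `(speaker_id, label, clip)`, the function name `speaker_disjoint_split`, and the toy file names are all hypothetical.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(samples, test_fraction=0.2, seed=0):
    """Split (speaker_id, label, clip) records so that no speaker
    appears in both the training and the test partition."""
    # Group all records by speaker (hypothetical record layout).
    by_speaker = defaultdict(list)
    for record in samples:
        by_speaker[record[0]].append(record)

    # Shuffle speakers, not individual clips, with a fixed seed.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    # Hold out at least one speaker for testing.
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [r for s in speakers if s not in test_speakers for r in by_speaker[s]]
    test = [r for s in speakers if s in test_speakers for r in by_speaker[s]]
    return train, test


# Toy data: five made-up speakers, each with one positive and one negative clip.
samples = [
    (f"spk{i}", label, f"spk{i}_{label}.wav")
    for i in range(5)
    for label in ("positive", "negative")
]
train, test = speaker_disjoint_split(samples, test_fraction=0.2)
```

Because each speaker contributes both positive and negative samples, a split made at the clip level could let a classifier succeed by recognising the speaker rather than the infection; splitting at the speaker level removes that shortcut.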
