Sound Dr – A database of Respiratory Sound and Baseline System for COVID-19 Detection

by   Hoang Van Truong, et al.

As the COVID-19 pandemic significantly affects every aspect of human life, it is urgent to provide data for further research. We therefore introduce a dataset named the Sound Dr Database, which provides not only high-quality coughing and breathing sounds but also metadata on relevant respiratory illnesses and diseases. Building proof-of-concept systems on such data is effective for detecting abnormalities in the respiratory sounds of patients, and these solutions can serve as practical tools to assist physicians in diagnosing respiratory disorders.




I Introduction

Many studies have shown that diseases can be detected effectively through respiratory sounds [1, 2, 3, 4]. The respiratory sounds of people with fever, asthma, tuberculosis, pneumonia, COVID-19, and other conditions exhibit abnormalities compared to those of people without disease.

The demand for remote diagnostics for examination and treatment has increased rapidly and is now a necessity. Especially while COVID-19 is widespread, reliable methods of COVID-19 detection are essential. A breathing or coughing test, which takes only 1 to 3 seconds, is faster and more reliable than conventional temperature measurement. The system and data can be updated periodically, improving accuracy and reliability over time. Since the outbreak of the epidemic, the need for quick and economical health-check methods has increased. Locations that control entry and exit, such as airports or seaports, may apply a breathing or coughing test, as may locations with employees and customers such as companies, factories, and supermarkets.

Initially, there were two major research efforts on respiratory sound, at New York [5] and Cambridge [6] Universities; the respiratory sounds from these efforts are not fully public. In addition to these two databases, a few datasets have been released to the community, such as Coswara [7] and Coughvid [8].

Project Coswara by the Indian Institute of Science (IISc) Bangalore is an attempt to build a diagnostic tool for COVID-19 detection using audio recordings such as the breathing, cough, and speech sounds of an individual. The sound samples are collected via worldwide crowdsourcing through a website, and the curated dataset is released as open access. IISc has also organized two challenges based on the Coswara dataset. The DiCOVA Challenge has three aims: release a curated dataset of sound samples (breathing, cough, and speech) drawn from individuals with and without COVID-19 at the time of recording; invite researchers from around the globe to search for acoustic biomarkers in this dataset; and evaluate the findings of each group on a blind test set, presenting a competitive leaderboard with global participation. The First DiCOVA Challenge, launched on Feb 04, 2021, was an open call for researchers to analyze a dataset of sound recordings collected from COVID-19 infected and non-COVID-19 individuals in a two-class classification [9]; it focused on COVID-19 detection using only cough sounds. We reported an AUC of 87.04% and finished on top of the leaderboard [10]. The Second DiCOVA Challenge, launched on Aug 12, 2021, was an open call to analyze a dataset of audio recordings consisting of breathing, cough, and speech signals. The challenge featured two tracks, one focusing on cough sounds and the other on a collection of breath, sustained vowel phonation, and number-counting speech recordings [11]. Focusing on cough sounds, we joined Track 2 and achieved the top-2 ranking with an AUC score of 81.21 on the blind test set, improving on the challenge baseline by 6.32 and remaining competitive with state-of-the-art systems [12]. Based on our experience working with these datasets, we wanted to keep growing, so we built a system to collect respiratory sound data in the most efficient way.
We call it Sound Dr. Most participants are from Vietnam, a country in which COVID-19 spread very quickly. In addition, we built a baseline system on the Sound Dr database, which gives an overview of the initial accuracy achievable on the database as well as the effectiveness of the data collection.

The organization of the paper is as follows: Section I contains the related work and literature survey; Section II describes the aspects of the provided dataset; the pre-processing steps and our baseline system are explained in Section III; the results are discussed in Section III-A; finally, Section IV concludes the paper with future scope and improvements.

II Sound Dr Overview

Respiratory sounds may indicate the health status of a person. From these sounds, a person's respiratory health can be predicted as normal or abnormal by machine learning algorithms, in particular their health status with respect to COVID-19. A respiratory sound is a sound produced by the lungs during inhalation or exhalation; it can be heard using a stethoscope or, in some cases, simply while a person is breathing. Listening to respiratory sounds is a critical component of the diagnostic procedure for a variety of disorders: a doctor listens to the patient's breathing with a stethoscope, and abnormal respiratory sounds are indicative of a lung or airway issue.

Research during the COVID-19 pandemic requires a large amount of good data on human respiration. To meet this demand, the Sound Dr Database provides not only quality coughing and breathing sounds but also metadata for studying illnesses and diseases related to the respiratory system.

II-A Data Collection

To make it convenient for users, we created a web application that can be used on popular personal devices (laptops, mobiles, etc.) to collect data. Metadata is collected when the user fills in a form answering questions about symptoms from the last 14 days, medical conditions, COVID-19 status, smoking habit, and personal information (age, gender). After filling in the online form, the user makes three different types of recording: mouth breathing, nose breathing, and coughing. For each type, the user records 3 times. Each recording must be at least 5 seconds long, with a 2-second pause between recordings. These 3 recordings are then merged and saved as a single audio file for the subject. Almost every audio file has a duration from 10 to 30 seconds, as shown in Figure 1.
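The merge step described above can be sketched in a few lines; the function name, the NumPy representation of the takes, and the fixed silent gap are illustrative assumptions, not the actual implementation:

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, the rate used later by the baseline system
GAP_SECONDS = 2       # pause between the three takes

def merge_recordings(takes, sample_rate=SAMPLE_RATE, gap_seconds=GAP_SECONDS):
    """Concatenate several takes of one recording type into a single
    waveform, separated by silent gaps (a sketch of the merge step)."""
    gap = np.zeros(int(gap_seconds * sample_rate), dtype=np.float32)
    pieces = []
    for i, take in enumerate(takes):
        if i > 0:
            pieces.append(gap)  # insert the pause between consecutive takes
        pieces.append(np.asarray(take, dtype=np.float32))
    return np.concatenate(pieces)

# Three 5-second dummy takes -> one merged clip of 3*5 + 2*2 = 19 seconds
takes = [np.random.randn(5 * SAMPLE_RATE).astype(np.float32) for _ in range(3)]
merged = merge_recordings(takes)
```

With three minimum-length (5 s) takes, the merged clip is 19 seconds, consistent with the 10-30 second range reported for the collected files.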

Fig. 1: Duration distribution of audio files in Sound Dr

The dataset is provided by FPT Software Company Limited [13] and consists of data on 1,310 subjects collected during the peak season of the COVID-19 pandemic in Vietnam; almost all subjects are from Vietnam. Each subject includes sounds of three different types (mouth breathing, nose breathing, and coughing) plus metadata covering symptoms from the last 14 days, medical history, current health status, etc. Every sound sample is more than 10 seconds long. The dataset has 3 raw labels: never, over14, and last14. We group them into two main labels: COVID-19 Positive (over14 and last14) with 346 sound samples and COVID-19 Negative (never) with 964 sound samples.
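The grouping of the three raw status labels into the two classes used for classification can be sketched as a simple mapping (the exact string values are assumptions based on the label names given above):

```python
# Collapse the three raw COVID-19 status labels into two classes.
RAW_TO_BINARY = {
    "never": "negative",   # never tested positive -> COVID-19 Negative
    "last14": "positive",  # positive test in the last 14 days
    "over14": "positive",  # positive test more than 14 days ago
}

def to_binary_label(raw_label):
    """Map a raw status label to the binary class used for training."""
    return RAW_TO_BINARY[raw_label]

labels = ["never", "last14", "over14", "never"]
binary = [to_binary_label(l) for l in labels]
```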

II-B Metadata

Respiratory sound data are gathered from males and females of all ages. Figure 2 shows the age distribution of the Sound Dr dataset, in which most subjects are young adults. In addition, the ratio of men to women in the dataset is approximately 1.5, as shown in Figure 3.

The metadata collected includes the participant's age, gender, and location (country, state/province). The details are described in Table I.

Fig. 2: Data Distribution for Age Groups

Fig. 3: Data Distribution for Gender
Categories          Fields                     Details
Demographics        sex_choice                 Gender: Male, Female
                    age_choice                 Age
                    current_city               The current city of residence
Symptoms            symptoms_status_choice     Symptoms in the last 14 days: Fever, Chills, Sore throat, Dry cough, Wet cough, Stuffy nose, Snivel, Difficulty breathing or feeling short of breath, Tightness in your chest, Headache, Dizziness, confusion or vertigo, Muscle aches, Loss of taste and smell, None
Medical conditions  medical_condition_choice   The medical conditions of the subject: Asthma, Cystic fibrosis, COPD/Emphysema, Pulmonary fibrosis, Other lung disease, Angina, Previous stroke or Transient ischaemic attack, Previous heart attack, Valvular heart disease, Other heart disease, Diabetes, Cancer, Previous organ transplant, HIV or impaired immune system, Other long-term condition, None
Insomnia symptoms   insomnia_status_choice     How often the subject suffers from insomnia: Never, Once in the last 2 weeks, Once a week, 2 to 3 days a week, 4 days a week or more
Smoking habit       smoke_status_choice        How often the subject smokes: Never smoked, Ex-smoker, Current smoker (less than once a day), Current smoker (1-5 cigarettes per day), Current smoker (11-20 cigarettes per day), Current smoker (21+ cigarettes per day)
COVID-19 status     cov19_status_choice        How long since a positive COVID-19 test: Never, In the last 14 days, More than 14 days ago
                    f_condition_choice         COVID-19 status of the subject
TABLE I: Metadata fields of the Sound Dr dataset.

Each subject has COVID-19 status metadata; based on this value, 964 subjects are labeled COVID-19 negative, whereas 346 subjects are labeled positive. 432 subjects in the dataset reported symptoms, approximately half the number of the 878 subjects without symptoms.

Fig. 4: The Number of Subjects with and without Symptoms

Fig. 5: The Number of Subjects with COVID-19 Status

III Baseline System

The audio files are first converted to mono and resampled at a 16 kHz sampling rate using the Librosa library [14]. The loaded waveform is then used as input to a pre-trained model.

We use the TRILL model [15], pre-trained on AudioSet [16], to extract TRILL-based embedding features [12], which achieved a strong result in the Second 2021 DiCOVA Challenge Track-2 [11] and are competitive with state-of-the-art systems.

The TRILL pre-trained model outputs a sequence of frame-level embeddings; its last linear layer has size 512. We then calculate the mean and standard deviation statistics of the representation across the time axis, ending up with a 1024-dimensional vector. Finally, the classification layer uses this 1024-dimensional vector to classify the cough audio.
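The mean-and-std pooling step can be sketched with NumPy; the frame count below is arbitrary, and only the 512-dimensional embedding size matters:

```python
import numpy as np

def pool_embedding(frames):
    """Pool a (num_frames, 512) sequence of frame-level embeddings into a
    single 1024-dimensional vector by concatenating the per-dimension
    mean and standard deviation over the time axis."""
    frames = np.asarray(frames)
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

# e.g. a clip embedded as 98 frames of 512 dimensions
emb = np.random.randn(98, 512)
vec = pool_embedding(emb)  # fixed-length vector regardless of clip length
```

This pooling makes the representation length-independent, so clips of different durations all map to the same 1024-dimensional input for the classifier.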

We use stratified k-fold cross-validation, splitting the dataset into 5 folds stratified by label, and train with this 5-fold cross-validation scheme.
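A minimal sketch of the stratified split with Scikit-Learn, using random stand-ins for the pooled embeddings and an imbalanced label vector:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Dummy stand-ins for the 1024-dim pooled embeddings and binary labels
X = np.random.randn(100, 1024)
y = np.array([0] * 74 + [1] * 26)  # imbalanced, roughly like Sound Dr

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_stats = []
for train_idx, val_idx in skf.split(X, y):
    # stratification keeps each fold close to the overall ~26% positive rate
    fold_stats.append((len(val_idx), float(y[val_idx].mean())))
```

Stratification matters here because the COVID-19 positive class is the minority; a plain random split could leave a fold with too few positives to evaluate reliably.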

The baseline system uses TRILL-based embedding features as input to back-end classification models. We conducted experiments with Support Vector Machine, Random Forest, Multilayer Perceptron, ExtraTreesClassifier, LightGBM, and XGBClassifier. To optimize the hyper-parameters of these machine learning methods, we used the Optuna framework [17] with a grid search algorithm. All models are implemented using the Scikit-Learn toolkit [18] and the XGBoost library [19]. However, we only achieved a good score with XGBClassifier. To deal with the unbalanced dataset, we set the class-weighting parameter of XGBClassifier to the ratio of the two classes; the ratio of COVID-19 Negative to COVID-19 Positive samples is about 1.7. The parameters of XGBClassifier are described in Table II.

Models         Setting Parameters
XGBClassifier  max_depth = 6, learning_rate = 0.07, scale_pos_weight = 1.7, n_estimators = 200, subsample = 1, colsample_bytree = 1, eta = 1, objective = 'binary:logistic', eval_metric = 'auc'
TABLE II: The setting parameters for COVID-19 Detection.
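The Table II settings can be expressed as a keyword dictionary; passing it as `xgboost.XGBClassifier(**XGB_PARAMS)` should reproduce this configuration, assuming the XGBoost package is installed (note that `eta` is XGBoost's alias for `learning_rate`, and the table lists both):

```python
# Table II parameters for the COVID-19 Detection baseline, as a dict.
XGB_PARAMS = {
    "max_depth": 6,
    "learning_rate": 0.07,
    "scale_pos_weight": 1.7,   # ratio of negative to positive samples
    "n_estimators": 200,
    "subsample": 1,
    "colsample_bytree": 1,
    "eta": 1,                  # alias of learning_rate in XGBoost
    "objective": "binary:logistic",
    "eval_metric": "auc",
}
```

`scale_pos_weight` is the key choice for this dataset: it up-weights the 346 positive samples so the gradient updates are not dominated by the 964 negatives.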

III-A Results

Models         Setting Parameters
XGBClassifier  max_depth = 7, learning_rate = 0.3, scale_pos_weight = 2, n_estimators = 200, subsample = 1, colsample_bytree = 1, nthread = -1, eval_metric = 'logloss'
TABLE III: The setting parameters for Anomaly Detection.

We experimented with the task of COVID-19 detection and achieved 88.30 AUC, 74.14 F1, and 86.26 accuracy. In addition, we experimented with abnormality detection in respiratory sound by adjusting the labels: COVID-19 Positive and Symptomatic statuses are combined into a single Abnormal label. Using XGBClassifier with the hyper-parameters shown in Table III, we achieved 82.68 AUC, 70.71 F1, and 79.01 accuracy. The performance comparison is described in Table IV. These results show that our database warrants further exploration for anomaly detection in respiratory sound, and they give us hope that respiratory sound can address many health tasks. Models built on the Sound Dr database could support doctors in diagnosing diseases faster and more accurately.

Task AUC F1 Acc
Symptom Detection 83.22 69.30 80.99
COVID-19 Detection 88.30 74.14 86.26
Abnormal (Symptom + COVID-19) 82.68 70.71 79.01
TABLE IV: The experimented results on Sound Dr database.
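The three reported metrics can be reproduced with Scikit-Learn's standard scoring functions; the toy labels and probabilities below are illustrative stand-ins, not values from the dataset:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, accuracy_score

# Toy predictions standing in for one validation fold
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.2, 0.8, 0.6, 0.3, 0.7, 0.55])
y_pred = (y_prob >= 0.5).astype(int)  # threshold probabilities at 0.5

# AUC is computed from the raw probabilities (ranking quality),
# while F1 and accuracy are computed from the thresholded predictions.
auc = roc_auc_score(y_true, y_prob) * 100
f1 = f1_score(y_true, y_pred) * 100
acc = accuracy_score(y_true, y_pred) * 100
```

Reporting AUC alongside F1 and accuracy is useful on an imbalanced dataset like this one, since AUC is insensitive to the choice of decision threshold.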

IV Conclusion and Future Work

Respiratory sound data, which can be used to detect patient symptoms, is still limited. Thus, Sound Dr is essential for researchers to build health applications.

Sound Dr is collected with a method designed to reduce noise. We also built a system to evaluate this collection method and created the first baseline for other researchers to compare against. Based on the experimental results, Sound Dr is collected in an effective way. With the Sound Dr database, researchers can build artificial intelligence models that help doctors diagnose diseases faster and more accurately.


This work was supported by FPT Software AI Committee, FPT Software Company Limited [13], Hanoi, Vietnam. FPT Software is a global technology and IT services provider headquartered in Vietnam. As the pioneer in digital transformation, the company delivers world-class services in Smart factories, Digital platforms, RPA, AI, IoT, Cloud, AR/VR, BPO, and more.


  • [1] M. Pahar, M. Klopper, B. Reeve, R. Warren, G. Theron, and T. Niesler, “Automatic cough classification for tuberculosis screening in a real-world environment,” Physiological Measurement, vol. 42, no. 10, p. 105014, oct 2021. [Online]. Available:
  • [2] I. Song, “Diagnosis of pneumonia from sounds collected using low cost cell phones,” in 2015 International Joint Conference on Neural Networks (IJCNN), 2015, pp. 1–8.
  • [3] C. Infante, D. Chamberlain, R. Fletcher, Y. Thorat, and R. Kodgule, “Use of cough sounds for diagnosis and screening of pulmonary disease,” in 2017 IEEE Global Humanitarian Technology Conference (GHTC), 2017, pp. 1–10.
  • [4] P. Sakkatos, A. Barney, A. Bruton, H. M. Haitchi, R. J. Kurukulaaratchy, and D. Thackray, “Quantified breathing patterns can be used as a physiological marker to monitor asthma,” European Respiratory Journal, vol. 54, no. suppl 63, 2019. [Online]. Available:
  • [5] “NYU Breathing Sounds for COVID-19,”, 2020, [Online; accessed 09-Jan-2021].
  • [6] C. Brown, J. Chauhan, A. Grammenos, J. Han, A. Hasthanasombat, D. Spathis, T. Xia, P. Cicuta, and C. Mascolo, Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data.   New York, NY, USA: Association for Computing Machinery, 2020, p. 3474–3484. [Online]. Available:
  • [7] N. Sharma, P. Krishnan, R. Kumar, S. Ramoji, S. R. Chetupalli, P. Ghosh, and S. Ganapathy, “Coswara — a database of breathing, cough, and voice sounds for covid-19 diagnosis,” Proc. Interspeech, pp. 4811–4815, 2020.
  • [8] L. Orlandic, T. Teijeiro, and D. Atienza, “The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms,” Springer Science and Business Media LLC, vol. 8, no. 1, 2021. [Online]. Available:
  • [9] A. Muguli, L. Pinto, N. R., N. Sharma, P. Krishnan, P. K. Ghosh, R. Kumar, S. Bhat, S. R. Chetupalli, S. Ganapathy, S. Ramoji, and V. Nanda, “Dicova challenge: Dataset, task, and baseline system for covid-19 diagnosis using acoustics,” 2021.
  • [10] S. K. Mahanta, D. Kaushik, S. Jain, H. V. Truong, and K. Guha, “Covid-19 diagnosis from cough acoustics using convnets and data augmentation,” 2021.
  • [11] N. K. Sharma, S. R. Chetupalli, D. Bhattacharya, D. Dutta, P. Mote, and S. Ganapathy, “The second dicova challenge: Dataset, task, and baseline system for covid-19 diagnosis using acoustics,” arXiv:2110.01177, 2021.
  • [12] H. V. Truong and L. Pham, “A cough-based deep learning framework for detecting covid-19,” 2021.
  • [13] “FPT Software Company Limited,” 1999, [Online; accessed 10-01-2022]. [Online]. Available:
  • [14] B. McFee, C. Raffel, D. Liang, D. P. Ellis, M. McVicar, E. Battenberg, and O. Nieto, “librosa: Audio and music signal analysis in python,” in Proceedings of the 14th python in science conference, vol. 8. Citeseer, 2015, pp. 18–25.
  • [15] J. Shor, A. Jansen, R. Maor, O. Lang, O. Tuval, F. de Chaumont Quitry, M. Tagliasacchi, I. Shavitt, D. Emanuel, and Y. Haviv, “Towards learning a universal non-semantic representation of speech,” ArXiv e-prints, 2020. [Online]. Available:
  • [16] J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, and M. Ritter, “Audio set: An ontology and human-labeled dataset for audio events,” in Proc. ICASSP, 2017.
  • [17] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
  • [18] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
  • [19] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16.   New York, NY, USA: Association for Computing Machinery, 2016, p. 785–794. [Online]. Available: