Many studies have shown that diseases can be detected effectively through respiratory sounds [1, 2, 3, 4]. The respiratory sounds of people with fever, asthma, tuberculosis, pneumonia, COVID-19, and similar conditions differ from those of healthy people.
The demand for remote examination and treatment has increased rapidly and has become a necessity; in particular, while COVID-19 is widespread, reliable methods of COVID-19 detection are essential. A breathing or coughing test, which takes only 1 to 3 seconds, is faster and more reliable than conventional temperature screening, and the system and its data can be updated periodically to improve accuracy and reliability. Since the outbreak of the epidemic, the need for quick and economical health-check methods has grown. Entry and exit points such as airports and seaports could apply a breathing or coughing test, as could locations with employees and customers such as companies, factories, and supermarkets.
Initially, the two major research efforts on respiratory sound were those of New York and Cambridge Universities; their respiratory sound collections are not fully public. Beyond these two databases, a few datasets have been released to the community, such as Coswara and Coughvid.
Project Coswara by the Indian Institute of Science (IISc) Bangalore is an attempt to build a diagnostic tool for COVID-19 detection using audio recordings such as breathing, cough, and speech sounds of an individual. The sound samples are collected worldwide via crowdsourcing using a website, and the curated dataset is released as open access. IISc has also organized two challenges based on the Coswara dataset. The DiCOVA Challenge has three aims: release a curated dataset of sound samples (breathing, cough, and speech) drawn from individuals with and without COVID-19 at the time of recording; invite researchers from around the globe to search for acoustic biomarkers in this dataset; and evaluate the findings of each group on a blind test set, presenting a competitive leaderboard with global participation. The First DiCOVA Challenge was launched on Feb 04, 2021, as an open call for researchers to analyze a dataset of sound recordings collected from COVID-19 infected and non-COVID-19 individuals for two-class classification; it focused on COVID-19 detection using only cough sounds. We reported an AUC of 87.04% and finished on top of the leaderboard. The Second DiCOVA Challenge was launched on Aug 12, 2021, as an open call to analyze a dataset of audio recordings consisting of breathing, cough, and speech signals. The challenge features two tracks, one focusing on cough sounds and the other using a collection of breath, sustained vowel phonation, and number-counting speech recordings. Focusing on cough sounds, we joined Track 2 and achieved the top-2 ranking with an AUC score of 81.21 on the blind test set, improving on the challenge baseline by 6.32 and remaining competitive with state-of-the-art systems. Based on our experience with these datasets, we wanted to keep growing, so we built a system to collect respiratory sound data as efficiently as possible. We call it Sound Dr.
The subjects are mostly from Vietnam, a country where COVID-19 spread very quickly. In addition, we built a baseline system on the Sound Dr database, which gives an overview of the initial accuracy achievable on the database as well as the efficiency of the data collection.
The organization of the paper is as follows: Section I contains the related work and literature survey; Section II describes the provided dataset; the pre-processing steps and our baseline system are explained in Section III; the results and discussion are presented in Section III-A; finally, Section IV concludes the paper with future scope and improvements.
II Sound Dr Overview
Respiratory sounds can indicate a person's health status. From these sounds, machine learning algorithms can predict whether a person's respiratory health is normal or abnormal, in particular with respect to COVID-19. A respiratory sound is produced by the lungs during inhalation or exhalation; it can be heard with a stethoscope or simply while a person is breathing. Listening to respiratory sounds is a critical component of the diagnostic procedure for a variety of disorders: a doctor listens to the patient's breathing with a stethoscope, and abnormal respiratory sounds are indicative of a lung or airway issue.
Research during the COVID-19 pandemic requires large amounts of high-quality human respiratory data. To meet this demand, the Sound Dr database provides not only quality coughing and breathing sounds but also metadata for studying illnesses related to the respiratory system.
II-A Data Collection
To make data collection convenient, we created a web application that runs on common personal devices (laptops, mobile phones, etc.). Metadata is collected when the user fills in a form answering questions about symptoms in the last 14 days, medical conditions, COVID-19 status, smoking habits, and personal information (age, gender). After completing the online form, the user records three different sound types: mouth breathing, nose breathing, and coughing. For each type, the user records three takes; each take must be at least 5 seconds long, with a 2-second pause between takes. The three takes are then merged and saved as one audio file for the subject. Almost every audio file has a duration between 10 and 30 seconds, as shown in Figure 1.
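The merging step described above can be sketched as follows; the sampling rate, the silent-gap encoding, and the function name are illustrative assumptions, not the production implementation:

```python
import numpy as np

SR = 16_000                               # assumed sampling rate for illustration
MIN_LEN = 5 * SR                          # each take must be at least 5 seconds
GAP = np.zeros(2 * SR, dtype=np.float32)  # 2-second pause between takes

def merge_takes(takes):
    """Concatenate recording takes, inserting a 2-second silent gap between them."""
    if any(len(t) < MIN_LEN for t in takes):
        raise ValueError("each recording must be at least 5 seconds long")
    pieces = []
    for i, t in enumerate(takes):
        pieces.append(np.asarray(t, dtype=np.float32))
        if i < len(takes) - 1:
            pieces.append(GAP)
    return np.concatenate(pieces)

# three synthetic 5-second takes -> one 19-second clip (15 s audio + 4 s gaps)
takes = [np.random.randn(5 * SR).astype(np.float32) for _ in range(3)]
merged = merge_takes(takes)
```

With minimum-length takes, the merged clip is 19 seconds, consistent with the 10–30 second range reported above.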
The dataset, provided by FPT Software Company Limited, consists of data from 1,310 subjects collected during the peak of the COVID-19 pandemic in Vietnam; almost all subjects are from Vietnam. Each subject contributes coughing and breathing sounds of three different types (mouth breathing, nose breathing, and coughing) together with metadata covering symptoms from the last 14 days, medical history, current health status, etc. Every sound sample is longer than 10 seconds. The dataset has three raw labels: never, over14, and last14, which we group into two main labels: COVID-19 Positive (over14 and last14) with 346 sound samples and COVID-19 Negative (never) with 964 sound samples.
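The grouping of the three raw labels into the two main labels can be expressed as a simple mapping (the binary label names are illustrative):

```python
# Map the three raw labels to the binary COVID-19 labels used in the paper.
RAW_TO_BINARY = {
    "never": "negative",   # no positive test -> COVID-19 Negative
    "last14": "positive",  # positive test within the last 14 days
    "over14": "positive",  # positive test more than 14 days ago
}

def to_binary(raw_label: str) -> str:
    """Return the binary COVID-19 label for a raw Sound Dr label."""
    return RAW_TO_BINARY[raw_label]

labels = ["never", "over14", "last14", "never"]
binary = [to_binary(l) for l in labels]  # -> negative, positive, positive, negative
```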
Respiratory sound data were gathered from males and females of all ages. Figure 2 shows the age distribution of the Sound Dr dataset, in which most subjects are young adults. In addition, according to Figure 3, the male-to-female ratio in the dataset is approximately 1.5.
The metadata collected includes each participant's age, gender, and location (country, state/province). The details are described in Table I.
|Category||Field||Description|
|Demographics||sex_choice||Gender: Male, Female|
|||current_city||The city the subject currently lives in|
|Symptoms||symptoms_status_choice||Symptoms in the last 14 days: Fever, Chills, Sore throat, Dry cough, Wet cough, Stuffy nose, Snivel, Difficulty breathing or feeling short of breath, Tightness in your chest, Headache, Dizziness, confusion or vertigo, Muscle aches, Loss of taste and smell, None|
|Medical conditions||medical_condition_choice||The medical conditions of the subject: Asthma, Cystic fibrosis, COPD/Emphysema, Pulmonary fibrosis, Other lung disease, Angina, Previous stroke or transient ischaemic attack, Previous heart attack, Valvular heart disease, Other heart disease, Diabetes, Cancer, Previous organ transplant, HIV or impaired immune system, Other long-term condition, None|
|Insomnia symptoms||insomnia_status_choice||How often the subject suffers from insomnia: Never, Once in the last 2 weeks, Once a week, 2 to 3 days a week, 4 days a week or more|
|Smoking habit||smoke_status_choice||How often the subject smokes: Never smoked, Ex-smoker, Current smoker (less than once a day), Current smoker (1-5 cigarettes per day), Current smoker (11-20 cigarettes per day), Current smoker (21+ cigarettes per day)|
|COVID-19 status||cov19_status_choice||How long ago the subject last tested positive for COVID-19: Never, In the last 14 days, More than 14 days ago|
|||f_condition_choice||COVID-19 status of the subject|
Every subject has COVID-19 status metadata; based on this value, 964 subjects are labeled COVID-19 negative, whereas 346 subjects are labeled positive. The number of subjects who reported symptoms is 432, approximately half of the 878 subjects without symptoms.
III Baseline System
The audio files are first converted to mono and resampled at a 16 kHz sampling rate using the Librosa library. Once loaded, each audio file is a waveform, which we use as input to a pre-trained model.
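In practice a single call such as `librosa.load(path, sr=16000, mono=True)` performs both steps; a dependency-free sketch of what that conversion does (the naive linear resampler here is only an approximation of Librosa's filtered resampling):

```python
import numpy as np

TARGET_SR = 16_000

def to_mono(x):
    """Average the channels of a (channels, samples) array; pass mono through."""
    x = np.asarray(x, dtype=np.float32)
    return x.mean(axis=0) if x.ndim == 2 else x

def resample_linear(x, orig_sr, target_sr=TARGET_SR):
    """Naive linear-interpolation resampler (Librosa uses a higher-quality filter)."""
    n_out = int(round(len(x) * target_sr / orig_sr))
    t_out = np.linspace(0.0, len(x) - 1, num=n_out)
    return np.interp(t_out, np.arange(len(x)), x).astype(np.float32)

# one second of stereo audio at 44.1 kHz -> one second of mono audio at 16 kHz
stereo = np.random.randn(2, 44_100).astype(np.float32)
wave = resample_linear(to_mono(stereo), orig_sr=44_100)
```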
We use the TRILL model, pre-trained on AudioSet, to produce TRILL-based embedding features, which gave strong results in the Second 2021 DiCOVA Challenge Track-2 and are competitive with state-of-the-art systems.
The TRILL model outputs a sequence of frame-level embeddings; its final linear layers have size 512. We then compute the mean and standard deviation statistics of the representation across the time axis, ending up with a 1024-dimensional vector. Finally, the classification layer uses this 1024-dimensional vector to classify the cough audio.
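The mean/std pooling step can be sketched as follows (the frame count is arbitrary; only the 512-dimensional frame embeddings and the resulting 1024-dimensional clip vector come from the description above):

```python
import numpy as np

def pool_embedding(frames):
    """Pool a (T, 512) sequence of frame embeddings into one 1024-d clip vector."""
    frames = np.asarray(frames, dtype=np.float32)
    # mean and std across the time axis, concatenated: 512 + 512 = 1024 dims
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

# e.g. 40 TRILL frames of dimension 512 -> a single 1024-d clip-level vector
frames = np.random.randn(40, 512).astype(np.float32)
vec = pool_embedding(frames)
```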
We split the dataset into 5 folds using stratified k-fold, so that each fold preserves the label distribution, and train with 5-fold cross-validation.
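A minimal sketch of this split with scikit-learn's `StratifiedKFold` (the toy labels below roughly mirror the dataset's negative/positive imbalance; the real split uses the actual features and labels):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# toy labels with roughly the paper's class imbalance (many more negatives)
y = np.array([0] * 96 + [1] * 34)
X = np.zeros((len(y), 1))  # features are irrelevant for the split itself

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = list(skf.split(X, y))

# every validation fold keeps (almost) the same positive/negative ratio
ratios = [y[val].mean() for _, val in folds]
```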
The baseline system uses TRILL-based embedding features as input to back-end classification models. We conducted experiments with Support Vector Machine, Random Forest, Multilayer Perceptron, ExtraTreesClassifier, LightGBM, and XGBClassifier. To optimize the hyper-parameters of these machine learning methods, we used the Optuna framework with its grid search algorithm. All models are implemented with the Scikit-Learn toolkit and the XGBoost library. However, we only achieved a good score with XGBClassifier. To deal with the unbalanced dataset, we set the class-weighting parameter of XGBClassifier to the ratio between the two classes; for example, the ratio of COVID-19 Negative to COVID-19 Positive is 1.7. The parameters of XGBClassifier are described in Table II.
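The Table II settings can be collected into a parameter dictionary; `scale_pos_weight` carries the class ratio of 1.7 reported above (note that in XGBoost `eta` is an alias of `learning_rate`; the table lists both, so this dictionary reproduces it as-is):

```python
# XGBClassifier hyper-parameters for COVID-19 detection (Table II).
xgb_params = {
    "max_depth": 6,
    "learning_rate": 0.07,
    "scale_pos_weight": 1.7,  # negative/positive class ratio
    "n_estimators": 200,
    "subsample": 1,
    "colsample_bytree": 1,
    "eta": 1,                 # alias of learning_rate in XGBoost
    "objective": "binary:logistic",
    "eval_metric": "auc",
}
# With xgboost installed: model = xgboost.XGBClassifier(**xgb_params)
```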
|XGBClassifier||max_depth = 6, learning_rate = 0.07, scale_pos_weight = 1.7, n_estimators = 200, subsample = 1, colsample_bytree = 1, eta = 1, objective = 'binary:logistic', eval_metric = 'auc'|
|XGBClassifier||max_depth = 7, learning_rate = 0.3, scale_pos_weight = 2, n_estimators = 200, subsample = 1, colsample_bytree = 1, nthread = -1, eval_metric = 'logloss'|
The parameter settings for anomaly detection.
On the task of COVID-19 detection we achieved a score of 88.30 AUC, 74.14 F1, and 86.26 Accuracy. In addition, we experimented with abnormality detection in respiratory sound by relabeling the data, combining the COVID-19 Positive and Symptomatic statuses into an Abnormal label. Using XGBClassifier with the hyper-parameters shown in Table III, we achieved 82.68 AUC, 70.71 F1, and 79.01 Accuracy. The performance comparison is described in Table IV. This shows that our database deserves further exploration for anomaly detection in respiratory sound, and gives us hope that respiratory sound can help resolve many health-related tasks. Models built on the Sound Dr database could support doctors in diagnosing diseases faster and more accurately.
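The three reported metrics can be computed with scikit-learn; this sketch uses tiny made-up labels and scores purely to show the calls (AUC is computed from continuous scores, while F1 and Accuracy need thresholded predictions):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# toy ground truth and model scores, purely for illustration
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.6, 0.65])
y_pred = (y_score >= 0.5).astype(int)   # threshold the scores at 0.5

auc = roc_auc_score(y_true, y_score)    # threshold-free ranking quality
f1 = f1_score(y_true, y_pred)           # balance of precision and recall
acc = accuracy_score(y_true, y_pred)    # fraction of correct predictions
```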
|Task||AUC||F1||Accuracy|
|COVID-19 Detection||88.30||74.14||86.26|
|Abnormal (Symptom + COVID-19)||82.68||70.71||79.01|
IV Conclusion and Future Work
Respiratory sound data that can be used to detect patient symptoms are still limited. Thus, Sound Dr is a valuable resource for researchers building health applications.
Sound Dr is collected with a method that reduces noise. We also built a system to evaluate this collection method and created the first baseline for other researchers to compare against. Based on the experimental results, Sound Dr is collected in an effective way. With the Sound Dr database, researchers can build Artificial Intelligence models that help doctors diagnose diseases faster and more accurately.
This work was supported by the FPT Software AI Committee, FPT Software Company Limited, Hanoi, Vietnam. FPT Software is a global technology and IT services provider headquartered in Vietnam. As a pioneer in digital transformation, the company delivers world-class services in smart factories, digital platforms, RPA, AI, IoT, Cloud, AR/VR, BPO, and more.
-  M. Pahar, M. Klopper, B. Reeve, R. Warren, G. Theron, and T. Niesler, “Automatic cough classification for tuberculosis screening in a real-world environment,” Physiological Measurement, vol. 42, no. 10, p. 105014, oct 2021. [Online]. Available: https://doi.org/10.1088/1361-6579/ac2fb8
-  I. Song, “Diagnosis of pneumonia from sounds collected using low cost cell phones,” in 2015 International Joint Conference on Neural Networks (IJCNN), 2015, pp. 1–8.
-  C. Infante, D. Chamberlain, R. Fletcher, Y. Thorat, and R. Kodgule, “Use of cough sounds for diagnosis and screening of pulmonary disease,” in 2017 IEEE Global Humanitarian Technology Conference (GHTC), 2017, pp. 1–10.
-  P. Sakkatos, A. Barney, A. Bruton, H. M. Haitchi, R. J. Kurukulaaratchy, and D. Thackray, “Quantified breathing patterns can be used as a physiological marker to monitor asthma,” European Respiratory Journal, vol. 54, no. suppl 63, 2019. [Online]. Available: https://erj.ersjournals.com/content/54/suppl_63/PA5038
-  “NYU Breathing Sounds for COVID-19,” https://breatheforscience.com/, 2020, [Online; accessed 09-Jan-2021].
-  C. Brown, J. Chauhan, A. Grammenos, J. Han, A. Hasthanasombat, D. Spathis, T. Xia, P. Cicuta, and C. Mascolo, Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data. New York, NY, USA: Association for Computing Machinery, 2020, p. 3474–3484. [Online]. Available: https://doi.org/10.1145/3394486.3412865
-  N. Sharma, P. Krishnan, R. Kumar, S. Ramoji, S. R. Chetupalli, P. Ghosh, and S. Ganapathy, “Coswara — a database of breathing, cough, and voice sounds for covid-19 diagnosis,” Proc. Interspeech, pp. 4811–4815, 2020.
-  L. Orlandic, T. Teijeiro, and D. Atienza, “The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms,” Springer Science and Business Media LLC, vol. 8, no. 1, 2021. [Online]. Available: https://doi.org/10.1038/s41597-021-00937-4
-  A. Muguli, L. Pinto, N. R., N. Sharma, P. Krishnan, P. K. Ghosh, R. Kumar, S. Bhat, S. R. Chetupalli, S. Ganapathy, S. Ramoji, and V. Nanda, “Dicova challenge: Dataset, task, and baseline system for covid-19 diagnosis using acoustics,” 2021.
-  S. K. Mahanta, D. Kaushik, S. Jain, H. V. Truong, and K. Guha, “Covid-19 diagnosis from cough acoustics using convnets and data augmentation,” 2021.
-  N. K. Sharma, S. R. Chetupalli, D. Bhattacharya, D. Dutta, P. Mote, and S. Ganapathy, “The second dicova challenge: Dataset, task, and baseline system for covid-19 diagnosis using acoustics,” arXiv:2110.01177, 2021.
-  H. V. Truong and L. Pham, “A cough-based deep learning framework for detecting covid-19,” 2021.
-  “FPT Software Company Limited,” 1999, [Online; accessed 10-01-2022]. [Online]. Available: https://www.fpt-software.com
-  B. McFee, C. Raffel, D. Liang, D. P. Ellis, M. McVicar, E. Battenberg, and O. Nieto, “librosa: Audio and music signal analysis in python,” in Proceedings of the 14th python in science conference, vol. 8. Citeseer, 2015, pp. 18–25.
-  J. Shor, A. Jansen, R. Maor, O. Lang, O. Tuval, F. de Chaumont Quitry, M. Tagliasacchi, I. Shavitt, D. Emanuel, and Y. Haviv, “Towards learning a universal non-semantic representation of speech,” ArXiv e-prints, 2020. [Online]. Available: https://arxiv.org/abs/2002.12764
-  J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, and M. Ritter, “Audio set: An ontology and human-labeled dataset for audio events,” in Proc. ICASSP, 2017.
-  T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
-  F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
-  T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: Association for Computing Machinery, 2016, p. 785–794. [Online]. Available: https://doi.org/10.1145/2939672.2939785