To date, SARS-CoV-2 has infected several million people and killed hundreds of thousands around the globe. Due to its long incubation time, fast, accurate and reliable techniques for early disease diagnosis are key to successfully fighting the spread [li2020early]. The standard genetic test, reverse transcription polymerase chain reaction (RT-PCR), is characterized by high reliability (at least in most countries) but a relatively long processing time (more than an hour). Alternatively, fast serology tests are in early stages of development; they are based on antibodies that the immune system only produces in an advanced stage of the disease. It is of great concern, however, that several publications have reported false negatives produced by both molecular genetic and immunological tests [Imm1, RNA1, RNA2, kanne2020essentials]. Overall, global containment efforts suffer from bottlenecks in diagnosis due to partially unreliable tests and a lack of availability of the necessary testing equipment; a situation exacerbated by often asymptomatic, yet infected patients who are not properly managed due to the lack of precision in the global testing process [Imm1].
In this context, biomedical imaging techniques have great potential to complement the conventional diagnostic techniques for COVID-19, such as molecular biology (RT-PCR) and immune (IgM/IgG) assays. Specifically, imaging can be a viable tool in the detection process by providing a fast assessment of patients in order to guide the selection of subsequent molecular and immunological tests. Indeed, two studies reported that CT scans can detect COVID-19 with higher sensitivity (98% and 88%, respectively) than RT-PCR (71% and 59%) in cohorts of 51 [fang2020sensitivity] and 1014 patients [CThigh]. Note that the sensitivity of RT-PCR varies heavily across countries, ranging from 65% [kanne2020essentials] to 96% [mossa2020radiology]. While CT scans are the gold standard for pneumonia detection [bourcier2014performance], X-ray scans are still the first-line examination despite their low specificity and sensitivity [niederman2001guidelines, weinstock2020chest], whereas ultrasound (US) has only received growing attention in the last few years [gehmacher1995ultrasound] and achieved promising results [gazon2011agreement]. In this contribution, we emphatically make the case for a more prominent role of ultrasound, based on clear evidence for its diagnostic value, which is provided in detail in a comparison of CT, X-ray and US in Section 2.
Lung point-of-care ultrasound (POCUS).
Note first that lung ultrasound is already an established method for monitoring pneumonia and related lung diseases [gehmacher1995ultrasound, pagano2015lung, chavez2014lung]. It has been suggested as the preferred diagnostic technique for lung infections, especially in resource-limited settings such as emergency situations or low-income countries [amatya2018diagnostic], and it has started to replace X-ray as the first-line examination [bourcier2014performance, gazon2011agreement, lichtenstein2004comparative, bourcier2016lung]. Although literature on the applicability of ultrasound for COVID-19 is still scarce, a growing body of evidence for disease-specific patterns in US has led to advocacy for an amplified role of US in the research community [buonsenso2020covid, soldati2020there, smith2020point, sofia2020thtoracic].
The strengths of POCUS are numerous and include its simplicity of execution, its cost-effectiveness, its ease of repeatability, its non-invasiveness, its execution without relocation and its ease of disinfection at the bedside. The devices are small and portable and can be wrapped in single-use plastics to reduce the risk of contamination and to facilitate sterilization procedures. Moreover, they are less expensive than conventional ultrasound machines, facilitating their distribution to hospitals and primary care centers [soldati2020proposal]. The diagnostic routine can be accelerated by connecting the device to a cloud service and uploading the recordings automatically. As a tool that is ubiquitously available even in sparsely equipped medical facilities, and that can serve not only for diagnosis but also for monitoring disease evolution on a daily basis, POCUS is the ideal biomedical imaging tool in the current crisis.
The growing expectations of POCUS are perhaps best exemplified by the NIH’s recently launched large-scale initiative on "Point Of Care UltraSonography (POCUS) for risk stratification of COVID-19 patients" (accessible at: https://clinicaltrials.gov/ct2/show/NCT04338100). However, this initiative was launched without an automatic tool that can assist in COVID-19 detection and patient stratification. Therefore, there is an evident need for an open-source framework that pools COVID-19 ultrasound scans from worldwide sources and exploits the power of deep learning to develop a system that can complement the work of physicians in a timely manner.
In the last months, a myriad of preprints attempting to use machine learning for biomedical image analysis of COVID-19 have appeared, but to the best of our knowledge they focus exclusively on X-ray or CT (for reviews see [shi2020review, ulhaq2020computer, kalkreuth2020covid, pham2020artificial]), and neither a publication nor a preprint has used ultrasound data for automatic COVID-19 detection. Here, we aim to close this gap with a first approach to training a deep learning model to detect COVID-19 on POCUS images. It is crucial to note that medical doctors must be trained thoroughly to reliably differentiate COVID-19 from pneumonia, and that the relevant patterns are hard to discern for the human eye [ng2020imaging]. Therefore, automatic detection is highly relevant, as it has been shown to reduce the time doctors need to make a diagnosis [shan2020lung].
In this work, we propose the first framework for automatic detection of COVID-19 on US images. Our study is in line with others demonstrating that deep learning can be a promising tool to detect COVID-19 from CT [li2020artificial] or X-ray [wang2020covid]. Our contributions can be summarized in the following three points:
We publish the first dataset of lung POCUS recordings of COVID-19, pneumonia and healthy patients. The collected data is heterogeneous, but was pre-processed manually to remove artifacts and checked by a medical doctor for its quality.
We trained a convolutional neural network (that we dub POCOVID-Net) on the available data and evaluated it in 5-fold cross validation. We report a classification accuracy of 89% and a sensitivity to detect COVID-19 of 96%. The model demonstrates the diagnostic value of the collected data and the applicability of deep learning for US images.
We offer a free web service that first promotes clinical data collection by giving users the possibility to upload data, and secondly provides an interface to our trained model.
2 Related work
To outline the background of the application of biomedical imaging techniques for COVID-19 diagnosis, we first compare the information content of the three main imaging methods, and then report on previous attempts to automate the detection.
2.1 Biomedical imaging for COVID-19
Three biomedical imaging sources are considered of interest for screening, diagnostics and management of COVID-19:
X-ray.
Although chest X-rays have traditionally been heavily used for diagnosing lung conditions, they are not well suited to detect COVID-19 at early stages [chen2020epidemiological]; for example, [weinstock2020chest] found that 89% of 493 COVID-19 patients had normal or only mildly abnormal X-ray scans. X-ray is, however, a reliable tool to evidence bilateral multifocal consolidation, partially fused into massive consolidation with small pleural effusions and "white lung" [chinese2020radiological]. Still, multiple studies demonstrated the superiority of ultrasound imaging in detecting pneumonia and related lung conditions [bourcier2014performance, bourcier2016lung, reali2014can, claes2017performance].
Computed tomography (CT).
CT presents a more viable technique for early COVID-19 detection and has been the most promoted screening tool so far [bao2020covid, lee2020covid]. Reviews report high detection rates among symptomatic individuals [bao2020covid, salehi2020coronavirus], as CT can unveil air space consolidation, traction bronchiectasis, paving appearance and bronchovascular thickening [wang2020clinical, pan2020time]. Also, (multifocal) ground glass opacities (GGO) were observed especially frequently, often bilateral, with consolidations and a prominent peripheral subpleural distribution [kanne2020chest]. GGO are zones of increased attenuation that usually appear in several interstitial and alveolar processes with conservation of the bronchial and vascular margins [franquet2011imaging]. However, CT involves evident practical downsides, such as exposing the patient to excessive radiation, high cost, the need for sophisticated equipment (only 30,000 machines exist globally [castillo2012industry]), the need for extensive sterilization [fiala2020brief, mossa2020radiology] and patient mobilisation.
Ultrasound (US).
Ultrasound can evidence pleural and interstitial thickening, subpleural consolidation and other physiological phenomena linked to changes in lung structure when the infection is in early stages [buonsenso2020covid]. Studies report abnormalities in bilateral B-lines (hydroaeric comet-tail artifacts arising from the pleural line), as well as identifiable lesions in the bilateral lower lobes, as the main characteristics enabling COVID-19 detection [huang2020preliminary, peng2020findings]. Figure 1 shows an example of B-lines visible in the US image of a COVID-19 patient. In a review, [fiala2020brief] observed great agreement between ultrasound and CT when monitoring COVID-19 patients (especially between B-lines in ultrasound and GGOs in CT [poggiali2020can]) and concluded that ultrasound has high potential for evaluating patients at early disease stages, especially to guide subsequent testing in triage situations. Others reported concordance between US and chest recordings in 7 of 8 monitored adolescents with COVID-19, while the remaining patient had a normal radiography despite irregular US findings [sofia2020thtoracic]. For a review on the timeline of US findings in relation to CT, see [fiala2020ultrasound].
2.2 Automatic detection of COVID-19
While to the best of our knowledge no work has been published so far on detection from ultrasound images, various works exist on automatic inference from CT and X-ray scans.
Data collection initiatives.
Perhaps the most significant initiative is from Cohen et al. [cohen2020covid], who started building an open-access database of X-ray (and now also CT) images that, to date, contains 150 COVID-19 images. Using deep convolutional neural networks, many have claimed strong performances on the X-ray data of Cohen et al., ranging from 91% up to 98% accuracy [luz2020towards, hall2020finding, narin2020automatic, alqudahcovid, zhang2020covid, abbas2020classification, bukhari2020diagnostic]. Besides that, the COVID-Net open-source initiative presented the COVIDx dataset of chest radiography (X-ray) data [wang2020covid], assembled from [cohen2020covid] and other sources, resulting in 183 COVID-19 images across a total of 13k samples. While the authors report 92% accuracy [wang2020covid], others reported higher numbers with more refined models [farooq2020covid, afshar2020covid]. Regarding CT imaging, [zhao2020covid] published a database of 275 COVID-19 CT scans and reported an accuracy of 85%. In light of these seemingly convincing performances, it needs to be emphasized that these databases still contain a rather limited number of samples of suboptimal quality. Since the amount of proprietary samples is rising quickly at the same time, we encourage all responsible hospitals and decision makers to contribute their data to open-access initiatives.
Table 1: Overview of the main data sources.

| Data source | Data selected | Website description |
| --- | --- | --- |
| Grepmed | 10 COVID-19, 9 pneumonia and 2 healthy videos | |
| Butterfly | 19 COVID-19 and 2 healthy videos | |
| ThePocusAtlas | 8 COVID-19, 3 pneumonia and 3 healthy videos | The PocusAtlas is a Collaborative Ultrasound Education Platform |
Reports on proprietary data.
Further preprints gathered their CT data independently and reported accuracies between 80% and 90% on 400-2700 slices [wang2020deep, xu2020deep, shi2020large], or even up to 95% on 2000 or more slices [song2020deep, wu2020jcs, gozes2020rapid]. Apart from simple classification models, segmentation techniques were also employed to identify infected areas and infer the disease progression state, accelerating doctors' inspection time [shan2020lung, chen2020deep]. Similarly, [li2020artificial] collected a dataset of 4,356 chest CT scans from 3,322 patients and trained a deep convolutional neural network (COVNet) that could differentiate COVID-19 from community-acquired pneumonia and regular scans with a ROC-AUC of 0.96 (sensitivity 90%, specificity 96%). However, to date, the data underlying all these efforts remain unavailable to the public.
3 A lung US dataset for COVID-19 detection
We assembled and pre-processed a dataset of a total of 64 lung POCUS video recordings, divided into 39 videos of COVID-19, 14 videos of (typical bacterial) pneumonia and 11 videos of healthy patients. Note that despite the novelty of the disease, COVID-19 images account for 60% of our dataset. So far, we have restricted ourselves to convex ultrasound probes. Linear and convex probes are the most standard ones in medical services, and we focus on the latter because more data was available. The linear probe operates at a higher frequency and gives better-resolved but more superficial images, with less tissue penetration. The convex probe is more suitable for deep organs (abdomen, fetal ultrasound, etc.) or for obese patients. Usually, linear probes are preferred for lung ultrasound, but in practice most medical facilities are equipped with a convex probe that can be used for everything, which explains why more convex-probe images and videos were found online. However, available linear-probe recordings were collected as well, such that the model can be trained on both types of images once data availability increases.
Table 1 gives an overview of our sources, comprising community platforms, open medical repositories, health-tech companies and other scientific literature. Main sources of data were grepmed.com, thepocusatlas.com, butterflynetwork.com and radiopaedia.org, while individual samples were retrieved from everydayultrasound.com and nephropocus.com, amongst others. We provide more details on the dataset in an extensive list in Appendix A: First, the exact source URL is given for each single video / image file. Second, technical details are listed, comprising the image resolution, the frame rate and the number of frames for each video after processing. Importantly, all samples of our database were reviewed by a medical doctor, who added notes on the patterns visible in each video (e.g. B-lines or consolidations). These notes confirm that disease-specific patterns are visible in all collected videos of COVID-19 and pneumonia. However, in order to train a machine learning model, a larger dataset of images rather than videos was required.
Since the 64 videos were taken from various sources, format and illumination differ significantly. In order to generate a diverse and still sufficiently large dataset, images were selected from the videos at a frame rate of 3 Hz, with a maximum of 30 frames per video. This resulted in an average of 17 frames per video and a total of 1103 images (654 COVID-19, 277 bacterial pneumonia, 172 healthy). To homogenize the dataset, we cropped the images with a quadratic window, excluding measure bars and text visible on the sides or top of the videos. Five videos were manually processed with Adobe After Effects in order to remove measure scales and other artifacts overlaying the US recording. Examples of the cropped images are shown in Figure 1. We are, however, aware that the heterogeneity of the data is still problematic, and we are constantly searching for more data to prevent the model from overfitting on the specific properties of the available recordings.
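The frame selection step can be sketched as a small helper that computes which frame indices to keep, given a video's native frame rate. This is a minimal sketch under stated assumptions: `sample_frame_indices` is a hypothetical function name, and actual extraction would then read exactly those frames with a video library.

```python
def sample_frame_indices(n_frames, video_fps, target_rate=3.0, max_frames=30):
    """Indices of frames sampled at roughly `target_rate` Hz, capped at `max_frames`."""
    # Keep every k-th frame so that the effective sampling rate is ~3 Hz
    step = max(int(round(video_fps / target_rate)), 1)
    return list(range(0, n_frames, step))[:max_frames]
```

For example, a 10-second clip recorded at 30 fps (300 frames) yields every 10th frame, i.e. exactly 30 images.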
4 Classification with POCOVID-Net
We propose a convolutional neural network that we name POCOVID-Net
to tackle the present computer vision task. First, we use the convolutional part of VGG-16 [Simonyan15], an established deep convolutional neural network that has been demonstrated to be successful on various image types. It is followed by one hidden layer of 64 neurons with ReLU activation, batch normalization [ioffe2015batch] and dropout of 0.5 [srivastava2014dropout], and finally by the output layer with softmax activation. The model was pre-trained on ImageNet to extract image features such as shapes and textures. All images are resized and fed through the convolutional layers of the model. During training, only the weights of the last three layers were fine-tuned, while all others were frozen to their values from pre-training, splitting the parameters into trainable and non-trainable sets. The model is trained with the cross-entropy loss function on the softmax outputs, and optimized with Adam [kingma2014adam].
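The architecture described above can be sketched in Keras as follows. This is a minimal, non-authoritative sketch: the input resolution (224 x 224, VGG-16's conventional default), the placement of batch normalization, and the Adam learning rate are assumptions, since the exact values are not stated, and `build_pocovid_net` is a hypothetical helper name.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_pocovid_net(input_shape=(224, 224, 3), n_classes=3, weights="imagenet"):
    # Convolutional part of VGG-16, pre-trained on ImageNet, without its classifier head
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # freeze the pre-trained feature extractor; only the head is trained

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(64, activation="relu"),   # hidden layer of 64 neurons
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),  # COVID-19 / pneumonia / healthy
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),  # learning rate left at default (value not given)
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Note that the sketch freezes the entire VGG-16 base and trains only the classification head, a slight simplification of the fine-tuning of the last three layers described in the text.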
Furthermore, we use data augmentation techniques to diversify the dataset. Specifically, the Keras ImageDataGenerator is used, which applies a series of random transformations to each image when generating a batch (in-place augmentation). Here, we allow transformations of the following types: rotations of up to 10 degrees, horizontal and vertical flips, and shifts of up to 10% of the image height or width, respectively. As such transformations can naturally occur with diverse ultrasound devices and recording parameters, augmentation adds valuable and realistic diversity that helps to prevent overfitting.
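The augmentation settings described above translate directly into an `ImageDataGenerator` configuration; a sketch with parameter values taken from the text, not the exact training script.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# In-place augmentation: random transforms are applied as batches are generated
datagen = ImageDataGenerator(
    rotation_range=10,       # rotations of up to 10 degrees
    horizontal_flip=True,
    vertical_flip=True,
    width_shift_range=0.1,   # shifts of up to 10% of the image width
    height_shift_range=0.1,  # ... and up to 10% of the height
)
```

Batches are then drawn with `datagen.flow(images, labels, batch_size=...)`, so each epoch sees slightly different variants of the same frames.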
All reported results were obtained with 5-fold cross validation. It was ensured that the frames of a single video are present within a single fold only, such that train and test data are completely disjoint recordings. (As a consequence, the sizes of the folds vary, for example with 110 COVID-19 images in the smallest and 183 in the largest split.)
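The video-disjoint splitting can be sketched as a small group-wise fold assignment, where each video (the group) lands in exactly one fold. This is a simplified illustration with a hypothetical helper name (`video_level_folds`); the actual splits additionally kept class counts similar across folds.

```python
import random
from collections import defaultdict

def video_level_folds(frame_video_ids, n_folds=5, seed=0):
    """Assign frame indices to folds so that all frames of one video share a fold."""
    videos = sorted(set(frame_video_ids))
    rng = random.Random(seed)
    rng.shuffle(videos)
    # Round-robin assignment of whole videos to folds
    video_to_fold = {v: i % n_folds for i, v in enumerate(videos)}
    folds = defaultdict(list)
    for idx, vid in enumerate(frame_video_ids):
        folds[video_to_fold[vid]].append(idx)
    return [folds[i] for i in range(n_folds)]
```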
As mentioned above, the model was trained to classify frames as COVID-19, pneumonia or healthy. When splitting the data, it was ensured that the number of samples per class is similar in all folds. The performance of the proposed model on the frame-wise classification task is visualized in Figure 2, depicting the ROC curve for each class.
Clearly, the model learns to classify the images, with all ROC-AUC scores at or above 0.94. Pneumonia appears to be the most distinctive class, whereas performance on COVID-19 and healthy images is lower and varies across folds. Nevertheless, the ROC-AUC score for COVID-19 detection is 0.94. Figure 2 also depicts the point of maximal accuracy for each class. It can be observed that the rate of false positives at this point is larger for COVID-19 than for pneumonia and healthy patients. In a clinical setting where false positives are less problematic than false negatives, this property is desirable.
Furthermore, Figure 3 provides more details on the predictions in the form of three confusion matrices: one with absolute values and two normalized along each axis.
Most importantly, 628 out of 654 COVID-19 images were classified correctly, corresponding to a sensitivity (recall) of 96%. From the confusion matrices it becomes clear that pneumonia is distinguished best, with a sensitivity of 93% and a precision of 95%. Note that only three pneumonia frames were classified as healthy, showing, at the very least, the model's ability to recognize strong irregularities in lung images. Nevertheless, it is clear that further work is necessary to reduce the false positives for COVID-19, as 75 images of healthy lungs were classified as COVID-19. We believe that a main reason for this is the low number of lung POCUS recordings of healthy subjects compared to the number of COVID-19 images. Thus, model performance might improve significantly with more data being collected, for example through collaborations with ultrasound companies or hospitals.
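The per-class scores reported here follow directly from the confusion matrix; a minimal sketch (the matrix in the usage example below is a hypothetical toy, not our actual results):

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j]: number of samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    sensitivity = tp / cm.sum(axis=1)  # recall: correct / all samples of the true class
    precision = tp / cm.sum(axis=0)    # correct / all samples assigned to that class
    return sensitivity, precision
```

For a toy matrix `[[8, 1, 1], [0, 9, 1], [2, 0, 8]]` this yields a sensitivity of 0.8, 0.9 and 0.8 for the three classes.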
To demonstrate the effectiveness of the proposed model, we compare its results to the performance of COVID-Net, a model recently proposed for the classification of X-ray images [wang2020covid]. Trained on our data, COVID-Net achieves an accuracy of 81% (averaged across all folds), whereas our model reaches 89%. Factoring in the balanced accuracy over the three classes (82% vs 63%), POCOVID-Net clearly outperforms COVID-Net. Apart from COVID-Net, we also tested an architecture following [li2020artificial] on our data, which employs ResNet [he2016deep] instead of VGG-16. In our experiments, using ResNet or NASNet [zoph2018learning] (a more recent pretrained model) resulted in significantly worse results.
Bal. Acc.: 0.82 (POCOVID-Net) vs. 0.63 (COVID-Net)
Furthermore, Table 2 breaks down the comparison between COVID-Net and our model in more detail. An explanation for the low balanced accuracy of COVID-Net is its apparent inability to deal with unbalanced data, leading to hardly any healthy classifications. We were not able to obtain better results when training the model on our data with the implementation provided by the authors. Although COVID-Net is able to differentiate between COVID-19 and pneumonia to some extent (specificity of 0.98 for pneumonia), our model is superior in all scores except the sensitivity for COVID-19, where COVID-Net's advantage comes at the cost of a much higher false positive rate. In the future, it might be beneficial to combine several different models with ensemble methods.
Finally, in an eventual clinical deployment of POCOVID-Net, one would classify entire videos rather than single frames. We therefore aggregated the frame-wise scores to determine the video class, employing two different methods: first, a majority vote over the predicted frame classes; secondly, averaging the class probabilities predicted by the network and selecting the class with the highest average probability. Both methods achieved the same accuracy of 92% of videos correctly classified, and a balanced video accuracy of 84%.
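Both aggregation schemes can be sketched in a few lines (`video_prediction` is a hypothetical helper name; the probabilities in the usage below are illustrative):

```python
import numpy as np

def video_prediction(frame_probs, method="average"):
    """Aggregate per-frame class probabilities (n_frames x n_classes) to one label."""
    frame_probs = np.asarray(frame_probs)
    if method == "average":
        # Mean probability per class over all frames, then pick the best class
        return int(np.argmax(frame_probs.mean(axis=0)))
    # Majority vote over the frame-wise argmax predictions
    votes = np.argmax(frame_probs, axis=1)
    return int(np.bincount(votes).argmax())
```

For example, with frames scored `[0.6, 0.3, 0.1]`, `[0.2, 0.5, 0.3]` and `[0.55, 0.25, 0.2]`, both methods assign the video to class 0.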
In summary, POCOVID-Net is able to learn frame-wise classification into COVID-19, pneumonia and healthy, where the sensitivity for COVID-19 is already very high at 96%. It was demonstrated that the proposed architecture outperforms COVID-Net, and aggregating the frame-wise predictions into a video classification yields an overall performance of 92% accuracy. Last, it was argued that further work is required to reduce the number of healthy lungs falsely predicted as COVID-19.
5 Web service (POCOVIDScreen)
The dataset and proposed model constitute a very preliminary first step toward the detection of COVID-19 from ultrasound data. Envisioning an interface that simplifies data sharing for medical practitioners, and thereby fosters collaborations with data scientists to serve the global need for rapid development of tools against the COVID-19 crisis, we have built a web platform accessible at: https://pocovidscreen.org.
The platform (see preview in Figure 4) is open-access and was designed for two purposes: First, users can contribute to the open-access dataset by uploading their US recordings (i.e. images or videos from COVID-19, pneumonia or healthy controls). The collected lung ultrasound data will be continuously reviewed and approved by medical doctors, carefully processed and then integrated into our database on GitHub. This strategy is chosen in order to simplify the data sharing process as much as possible.
Secondly, users can access our trained model to perform a rapid screening of their own (unlabeled) data. Best performance is to be expected if the image is cropped to a quadratic section of the relevant part, similarly to our data. Subsequently, the prediction is performed by evaluating all five models trained during cross validation. The output scores are averaged and the predicted class, in conjunction with a probability, is displayed to the user.
In short, we hope that this tool can serve as a starting point for the development of better prediction models. Our web service facilitates the process of data collection, transforming the task into a community effort. Additionally, it gives users the opportunity to use our model to infer the class of their own data, so far of course with a preliminary model whose outputs should not be considered of any clinical significance.
In the current global pandemic, it is more important than ever that the research community pools its expertise to provide solutions in the near future. Our contribution is the exploration of automating COVID-19 detection from lung ultrasound imaging in order to provide a quick assessment of the likelihood that a person is infected with COVID-19.
Our first step toward this goal is to release a collection of POCUS images and videos that were gathered and pre-processed from the referenced sources. The videos are reliably labeled and can be split to generate a dataset of more than a thousand images. We would like to invite researchers to contribute to our database, e.g. by pointing us to new publications or by making lung ultrasound recordings available. We will constantly update this dataset at: https://github.com/jannisborn/covid19_pocus_ultrasound.
Second, the proposed machine learning model, POCOVID-Net, is intended as a proof of concept showing the predictive power of the data, as well as the capability of a computer vision model to recognize patterns in US data as they appear in COVID-19 and pneumonia. Our model, based on a pre-trained convolutional network, achieved a frame-wise detection accuracy of 89% and a video accuracy of 92%. With a high sensitivity of 96% for the COVID-19 class (specificity 79%), we provide evidence that automatic detection of COVID-19 is a promising future endeavour. Despite these encouraging results, readers are reminded that the results presented herein are very preliminary and that we do not claim any diagnostic performance. We also do not consider detection on US data a replacement for other test methods, nor an alternative to extensive training of doctors in the usage of ultrasound for the detection task.
Last, with the presented web service we provide an interface that makes our model publicly available to researchers and hospitals around the world. Most importantly, this interface represents a user-friendly way to contribute lung ultrasound data to our open-access initiative.
We believe that POCUS can provide a quick, easy and low-cost method to assess the possibility of a SARS-CoV-2 infection. We hope to have opened a branch for automatic detection of COVID-19 from ultrasound data and envision our method as a first step toward an auxiliary tool that can assist medical doctors. In case of a positive first-line examination, further testing is required for corroboration (e.g. CT scan or RT-PCR). Lung ultrasound may not only take a key role in disease diagnosis, but can be utilized to monitor disease evolution through regular checks performed non-invasively and without the need of relocation.
Regarding the patient journey, we envision POCOVID-Net as a preliminary test for random screening and a step toward a complementary test to PCR for COVID-19 suspicions. If a patient is suspected of COVID-19, POCUS could be used to assess the presence of lung lesions that might not yet have clinical repercussions, and thus help detect people at risk of lung complications, allowing clinicians to focus in a more dedicated manner on high-risk patients. For random screening, if POCOVID-Net identifies a patient with COVID-19 lung symptoms, further tests such as RT-PCR, a CT scan and a medical check-up should be conducted. As POCUS devices are easily transportable, this test could take place in a primary care center, where a physician or technical specialist can perform several screenings and avoid contamination in hospitals. It should be noted that with one device, it is possible to perform 4 to 5 lung screenings per hour, taking into account the time needed for installation and cleaning.
From the machine learning perspective, several improvements to POCOVID-Net are possible and should be considered once more data becomes available. First, an evident improvement of the framework would be to perform inference directly on the videos (e.g. with temporal CNNs) instead of the current frame-based image analysis. While we did not attempt this here (due to the lack of sufficient data), we report a promising indication, namely that video classification based on a frame-wise majority vote reduced the error rate by more than 25% (from 89% to 92% accuracy). Secondly, the benefit of pre-training could be improved by pre-training the model on (non-lung) ultrasound samples instead of ImageNet, a database of real-life objects. Such pre-training may help in detecting ultrasound-specific patterns such as B-lines. In addition, to exploit the higher availability of CT or X-ray scans, transfer learning strategies could be adopted as in [zahangir2020covid_mtnet, apostolopoulos2020covid]. Furthermore, generative models could help to complement the scarce data on COVID-19, as recently proposed in [loey2020within].
We aim to extend the functionality of the website in the future, to further encourage the community effort of researchers, doctors and companies to build a dataset of POCUS images that can leverage the predictive power of automatic detection systems, and thereby also the value of ultrasound imaging for COVID-19 in general. If the approach turns out to be successful, we plan to build an app as suggested in [li2020covid] that can enable medical doctors to draw inference from their ultrasound images with unprecedented ease, convenience and speed.
We would like to thank Jasmin Frieden for help in data processing and Moritz Gruber and J. Leo van Hemmen for feedback on the manuscript.
Appendix A Details on the dataset
We would like to acknowledge the following contributions from US videos from Radiopaedia (access date: 17.04.2020):
The following table presents details of the utilized videos, including source, meta information (length, size, frame rate) and comments from medical experts.
See pages 1- of pocus_data_table.pdf