A dataset for Computer-Aided Detection of Pulmonary Embolism in CTA images

by Mojtaba Masoudi et al.
Ferdowsi University of Mashhad

Today, researchers in the field of pulmonary embolism (PE) analysis need a publicly available dataset to assess and compare their methods. Various systems have been designed for PE detection, but none of them has been evaluated on a public dataset; every paper has used its own private dataset. To fill this gap, we collected 5160 slices of computed tomography angiography (CTA) images acquired from 20 patients and, after the images were labeled by experts in the field, we provide a reliable dataset that is now publicly available. PE detection can be difficult in some situations, for example when the embolism occurs in peripheral branches or when the patient has pulmonary diseases (such as parenchymal disease). Therefore, the performance of computer-aided detection (CAD) systems depends heavily on the dataset used. In the given dataset, 66% of the PEs are located in peripheral branches, and some cases also include pulmonary diseases.

1 Introduction

Pulmonary embolism (PE) is a sudden blockage of a lung artery, usually by a blood clot. The clot typically originates in the veins of the pelvis and is carried by the blood flow through the heart to the lungs, where it closes off the pulmonary artery and reduces respiratory capacity [1]. If a clot blocks more than 50% of the artery, the patient may die immediately, and smaller clots can cause excessive bleeding inside the lungs. PE is a common disorder with high morbidity and mortality, so early and accurate detection is essential. Contrast-enhanced X-ray computed tomography images, known as computed tomography angiography (CTA) images, have been widely used in the diagnosis of PE [2]. CTA is low-risk and displays lesions in blood vessels well. The contrast dye mixes with the blood and makes the vessels appear as bright regions; it does not penetrate the embolus, so a PE appears as a dark region inside a bright vessel in CTA images [3]. For radiologists, identifying the dark spots that correspond to emboli is difficult, unreliable, and time-consuming; for example, different radiologists may mark different regions.

In recent years, computer-aided detection (CAD) systems have been developed to support radiologists and improve their performance in reading pulmonary CTA images [4][5][6][7]. Several factors complicate this task. Acquisition timing is important, because the scan must be taken while the dye is in the arteries: in an acceptable CTA, arteries have high contrast relative to emboli, while veins have low contrast. Other structures, such as lymphatic tissue and regions of parenchymal disease, look similar to emboli, and the partial volume effect produces similar regions at vessel boundaries. These look-alikes increase the false-positive (FP) rate of most proposed methods, so the dataset used for evaluation has a significant influence on the reported performance of these methods.

2 Dataset

Much research on PE detection has been done, but the performance of the proposed methods depends on the dataset. For example, the presence or absence of lung diseases, and whether PEs lie in major or peripheral branches, can affect the performance of a system [5]. So far, no public PE dataset has been available, so each paper has used its own private dataset. To verify and compare these methods, a common public dataset is needed. We have therefore collected and released a public dataset, which can be downloaded in DICOM format [8].

The gold-standard annotation was produced by two radiologists: F. Shafiee, a board-certified radiologist with over 5 years of experience reading CTA, and M. Pezeshki, head of the radiology unit of Emam-Reza Hospital in Mashhad, Iran, with more than 18 years of clinical experience. F. Shafiee first analyzed the images and marked the regions of interest (ROIs), and M. Pezeshki then re-examined these ROIs. Starting from these markings, the PEs were segmented with a semi-automated method consisting of a thresholding step based on Hounsfield units (HU), followed by a morphological operation and connected component analysis. Each segmentation was then inspected manually to remove spurious pixels. The gold standard is provided as segmentation masks with value 0 for the background and 1 for PE.

As mentioned before, the position of a PE affects the performance of CAD systems, so for this dataset we selected cases with PEs in different branches. Table I shows how many PEs lie in the major and peripheral arteries of each case. Most PEs (66%) are located in peripheral arteries, which is desirable for a benchmark, because peripheral PEs are among the harder cases to detect. Some cases also include pulmonary diseases.
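The semi-automated segmentation is described only at a high level: an HU threshold, a morphological operation, and connected component analysis. Below is a minimal Python sketch of such a pipeline under several assumptions that are not taken from the paper: each slice is processed independently in 2D, the morphological operation is a 3x3 binary opening, components are 8-connected, and the HU window (`lo_hu`, `hi_hu`) and minimum component size (`min_size`) are hypothetical parameters whose actual values the paper does not report. All function names are illustrative. Stored DICOM pixel values are converted to HU with the standard linear rescale, HU = stored value * RescaleSlope + RescaleIntercept.

```python
from collections import deque

import numpy as np


def to_hounsfield(stored: np.ndarray, slope: float, intercept: float) -> np.ndarray:
    """Convert stored DICOM pixel values to Hounsfield units (linear rescale)."""
    return stored.astype(np.float64) * slope + intercept


def _shifted_3x3(mask: np.ndarray):
    """Yield the 9 shifted copies of mask that cover each pixel's 3x3 neighborhood.

    Pixels outside the image are treated as False (zero padding).
    """
    padded = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            yield padded[dy:dy + h, dx:dx + w]


def binary_opening_3x3(mask: np.ndarray) -> np.ndarray:
    """Erosion followed by dilation with a 3x3 square structuring element."""
    eroded = np.logical_and.reduce(list(_shifted_3x3(mask)))
    return np.logical_or.reduce(list(_shifted_3x3(eroded)))


def remove_small_components(mask: np.ndarray, min_size: int) -> np.ndarray:
    """Keep only 8-connected components with at least min_size pixels."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    keep = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            # Breadth-first search over the 8-neighborhood collects one component.
            seen[y, x] = True
            queue = deque([(y, x)])
            component = []
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                        if mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            if len(component) >= min_size:
                for cy, cx in component:
                    keep[cy, cx] = True
    return keep


def segment_pe_slice(hu, roi, lo_hu, hi_hu, min_size):
    """Return a 0/1 uint8 PE mask for one slice (all parameters hypothetical)."""
    # 1) HU threshold, restricted to the radiologist-marked ROI.
    candidates = np.asarray(roi, dtype=bool) & (hu >= lo_hu) & (hu <= hi_hu)
    # 2) Opening drops pixels not covered by any full 3x3 square of candidates
    #    (isolated pixels, thin lines).
    opened = binary_opening_3x3(candidates)
    # 3) Connected component analysis drops components smaller than min_size.
    return remove_small_components(opened, min_size).astype(np.uint8)
```

When reading the released DICOM files with pydicom, for example, the inputs to `to_hounsfield` would be `ds.pixel_array`, `float(ds.RescaleSlope)`, and `float(ds.RescaleIntercept)`. The output follows the gold-standard convention (0 for background, 1 for PE), but the final manual inspection that removes spurious pixels is not modeled here.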

Case  #Slices  Major-artery PE  Peripheral-artery PE  All PE
1     201      46               34                    80
2     185      0                21                    21
3     210      39               55                    94
4     197      17               29                    46
5     217      71               92                    163
6     232      0                15                    15
7     197      95               147                   242
8     273      23               36                    59
9     237      0                53                    53
10    204      0                8                     8
11    217      0                18                    18
12    178      0                77                    77
13    189      30               23                    53
14    217      0                98                    98
15    250      15               31                    46
16    235      18               59                    77
17    475      194              60                    254
18    452      54               102                   156
19    370      51               300                   351
20    424      19               48                    67
Total 5160     672 (34%)        1306 (66%)            1978
TABLE I: Summary of the pulmonary embolism distribution (number of slices and PEs per case).
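As a worked check of the summary row, the short Python script below recomputes the totals and the major/peripheral split from the per-case counts transcribed from Table I. It also verifies that each case's "All PE" value equals the sum of its major-artery and peripheral-artery counts.

```python
# Per-case values transcribed from Table I, in case order 1-20.
SLICES = [201, 185, 210, 197, 217, 232, 197, 273, 237, 204,
          217, 178, 189, 217, 250, 235, 475, 452, 370, 424]
MAJOR = [46, 0, 39, 17, 71, 0, 95, 23, 0, 0,
         0, 0, 30, 0, 15, 18, 194, 54, 51, 19]
PERIPHERAL = [34, 21, 55, 29, 92, 15, 147, 36, 53, 8,
              18, 77, 23, 98, 31, 59, 60, 102, 300, 48]
ALL_PE = [80, 21, 94, 46, 163, 15, 242, 59, 53, 8,
          18, 77, 53, 98, 46, 77, 254, 156, 351, 67]


def summarize(slices, major, peripheral, all_pe):
    """Recompute the summary row of Table I from the per-case counts."""
    # Each case's "All PE" must be its major plus peripheral count.
    for case, (m, p, a) in enumerate(zip(major, peripheral, all_pe), start=1):
        if m + p != a:
            raise ValueError(f"case {case}: {m} + {p} != {a}")
    total_major = sum(major)
    total_peripheral = sum(peripheral)
    total_pe = sum(all_pe)
    return {
        "slices": sum(slices),
        "major": total_major,
        "peripheral": total_peripheral,
        "all_pe": total_pe,
        "major_pct": round(100 * total_major / total_pe),
        "peripheral_pct": round(100 * total_peripheral / total_pe),
        "peripheral_only_cases": sum(1 for m in major if m == 0),
    }


print(summarize(SLICES, MAJOR, PERIPHERAL, ALL_PE))
# {'slices': 5160, 'major': 672, 'peripheral': 1306, 'all_pe': 1978,
#  'major_pct': 34, 'peripheral_pct': 66, 'peripheral_only_cases': 7}
```

The percentages are computed over PEs, not slices: 672 of the 1978 PEs (about 34%) lie in major arteries and 1306 (about 66%) in peripheral arteries, and 7 of the 20 cases contain peripheral PEs only.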


  • [1] A. Pforte, “Epidemiology, diagnosis and therapy of pulmonary embolism,” Eur. J. Med. Res., vol. 9, pp. 171–179, Apr 2004.
  • [2] M. Remy-Jardin and J. Remy, “Spiral CT angiography of the pulmonary circulation,” Radiology (RSNA), vol. 212, pp. 615–636, Sep 1999.
  • [3] I. Hartmann and M. Prokop, “Spiral CT in the diagnosis of acute pulmonary embolism,” Medica Mundi, vol. 46, no. 3, pp. 2–11, 2002.
  • [4] Y. Masutani, H. MacMahon, and K. Doi, “Computerized detection of pulmonary embolism in spiral CT angiography based on volumetric image analysis,” IEEE Transactions on Medical Imaging, vol. 21, no. 12, pp. 1517–1523, Dec 2002.
  • [5] H. Bouma, J. J. Sonnemans, A. Vilanova, and F. A. Gerritsen, “Automatic detection of pulmonary embolism in CTA images,” IEEE Transactions on Medical Imaging, vol. 28, no. 8, pp. 1223–1230, Aug 2009.
  • [6] S. C. Park, B. E. Chapman, and B. Zheng, “A multistage approach to improve performance of computer-aided detection of pulmonary embolisms depicted on CT images: Preliminary investigation,” IEEE Transactions on Biomedical Engineering, vol. 58, no. 6, pp. 1519–1527, June 2011.
  • [7] H. Özkan, O. Osman, S. Şahin, and A. F. Boz, “A novel method for pulmonary embolism detection in CTA images,” Computer Methods and Programs in Biomedicine, vol. 113, no. 3, pp. 757–766, Dec 2014.
  • [8] “Machine vision laboratory.” [Online]. Available: http://mvlab.um.ac.ir/index.php/en/downloadable-files/2-uncategorised/51-pulmonary-embolism