The American College of Radiology recommends acquiring chest radiographs following intubation of patients in the Intensive Care Unit (ICU), to ensure proper positioning of the inserted tubes [1]. This is justified by studies such as [2], which show that following intubation, physical examination identified tube malposition in only 2% to 5% of patients, whereas the radiograph revealed suboptimal positioning in 10% to 25%. The ideal endotracheal (ET) tube position is in the mid trachea when the patient's head is in the neutral position. Malposition of the ET tube can cause serious complications if not detected, especially when the tube is too low and selective bronchial intubation occurs. Such complications include segmental or complete collapse of the contralateral lung, pneumothorax and atelectasis.
Using the acquired radiographs, Computer-Aided Detection (CAD) systems can assist physicians by automatically detecting ET tubes. Previous studies used classical approaches to determine seed points, followed by line tracking algorithms [3, 4]. A more recent study [5] used a convolutional neural network (CNN) classification system to identify the presence or absence of the ET tube, with a reported area under the curve (AUC) of 0.99, and a second classification network to distinguish low from normal positioning of the ET tube, with an AUC of 0.81. The above studies used private datasets of portable chest X-ray images with a relatively small number of cases: 64 [3] and 87 [4] for the classical approaches, and 300 for the CNN-based solution [5].
Collecting and labeling chest radiographs for the presence of ET tubes requires collaboration with hospitals and data extraction methods, and annotating ET tubes for detection and segmentation requires expert physicians. In this paper, we present an innovative solution for the detection and segmentation of ET tubes in chest radiographs in the scenario of limited expert-labeled data. We use a public dataset of chest radiographs [6], which allows us to collect the large set of normal and ET tube examples required for training a deep learning network. We then synthesize ET tubes on top of the X-ray images to generate ground-truth data for ET tube segmentation. Finally, we present a combined CNN for ET tube detection and segmentation in chest radiographs, which shows promising results.
In this study, we apply a technique that inserts synthetic ET tubes as an overlay on original X-ray images taken from a publicly available dataset of chest radiographs [6] (hereafter, the NIH dataset). This dataset contains over 100,000 frontal-view images, many of them from ICU patients. While annotations are provided for 14 lung diseases, no annotations exist for the presence of ET tubes (or other tubes). A few sample images from the NIH dataset are shown in Figure 1: the cases have high variability and many have poor image quality. We only used cases in Anterior-Posterior (AP) positioning to simulate intubated patients.
In the first step of our solution, we generate new images with ground-truth ET tube segmentation masks. The resulting image set is then used in a follow-up step to train a combined CNN for detection and segmentation of ET tubes in chest radiographs.
2.1 Generating Synthetic Data
Generating the synthetic ET tubes over real X-ray images includes the following main steps as shown in Fig. 2: a) Selection of cases from the NIH dataset that do not contain ET tubes but may include other tubes (such as nasogastric (NG) tube, drainage tubes, catheters); b) Segmentation of the clavicles in order to localize the synthetic ET tube in the trachea area; c) Blending of generated synthetic ET tubes onto real X-ray images.
2.1.1 Clavicles Segmentation:
ET tubes are inserted into the trachea to allow artificial ventilation of the lungs. X-ray images are mostly aligned so that the trachea is located between the clavicles. Therefore, correct segmentation of the clavicles assists in placing the synthetic ET tube in the trachea area. In [7], a methodology for organ segmentation in chest radiographs was presented and shown to outperform alternative schemes when tested on a common benchmark of 247 chest radiographs from the JSRT dataset, with ground-truth segmentation masks from the SCR dataset [8]. The proposed architecture is a modified U-Net whose encoder is initialized with pre-trained VGG16 weights. In the current work, we use a similar scheme: for training, each input image is normalized by its mean and standard deviation. We train the single-class segmentation model using a Dice loss and threshold the output score maps to generate binary segmentation masks of the clavicles. This model achieves a Dice coefficient of 93.1% and a mean average contour distance of 0.871 mm.
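As an illustration of the thresholding and Dice evaluation described above, the following is a minimal sketch on tiny nested-list masks. The 0.5 threshold, the smoothing term `eps` and the function names are assumptions made for this example and are not taken from the original implementation.

```python
def threshold_mask(scores, thr=0.5):
    """Binarize a score map. The 0.5 threshold is an assumed value."""
    return [[1 if s >= thr else 0 for s in row] for row in scores]


def dice_coefficient(pred, target, eps=1e-7):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    # eps keeps the ratio defined when both masks are empty
    return (2.0 * intersection + eps) / (total + eps)


# Toy 2x2 score map and ground-truth mask
scores = [[0.9, 0.2], [0.7, 0.1]]
truth = [[1, 0], [0, 0]]
mask = threshold_mask(scores)
dice = dice_coefficient(mask, truth)
```

A Dice loss for training is commonly taken as one minus this coefficient, computed on the soft score map rather than the thresholded mask.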
2.1.2 Realistic ET Tubes Generation:
We next present our methodology for generating synthetic ET tubes for adult X-ray images. Our solution was inspired by the work of Yi et al. [9], who generated synthetic catheters on pediatric X-ray images. Figure 3 depicts the ET tube generation steps. First, we created a 2D simulation of the ET tube as a hollow tubular object with a rectangular marker made of a radiopaque material. The tube and the marker are made from different materials and therefore have different attenuation coefficients (c1 and c2). The outer and inner widths of the tube, d1 and d2, were chosen to fit an adult ET tube with strip thickness t. All of these parameters were selected based on the true physical properties of ET tubes or on [9].
To simulate the ET tube under different rotations, we projected the 2D profile using a Radon transform and sampled the projection at 0, 30, 60 and 90 degrees. For each synthetic ET tube, we selected one of the four profiles and sampled the values of 15 pixels for drawing the tube.
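The projection step can be sketched as follows. This is a simplified parallel-beam projection written in plain Python with nearest-neighbor sampling, not a full Radon transform implementation. The grid size, the tube dimensions `d1`, `d2`, `t` and the attenuation values `c1`, `c2` are illustrative assumptions, since the values used in the paper are not available here.

```python
import math


def tube_cross_section(size=41, d1=30, d2=22, t=3, c1=1.0, c2=3.0):
    """2D cross-section: a ring with attenuation c1 and a marker strip with c2.
    All numeric defaults are assumed, illustrative values in pixels."""
    c = size // 2
    r_out, r_in = d1 / 2.0, d2 / 2.0
    grid = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            r = math.hypot(x - c, y - c)
            if r_in <= r <= r_out:
                # Marker: wall pixels in the upper half, within a strip of width t
                is_marker = (y < c) and abs(x - c) <= t / 2.0
                grid[y][x] = c2 if is_marker else c1
    return grid


def project(grid, angle_deg):
    """Sum attenuation along parallel rays at the given angle."""
    size = len(grid)
    c = size // 2
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    profile = []
    for s in range(-c, c + 1):        # detector offset
        total = 0.0
        for u in range(-c, c + 1):    # position along the ray
            x = int(round(c + s * cos_t - u * sin_t))
            y = int(round(c + s * sin_t + u * cos_t))
            if 0 <= x < size and 0 <= y < size:
                total += grid[y][x]
        profile.append(total)
    return profile


def resample(profile, n=15):
    """Pick n evenly spaced samples from a projection profile."""
    step = (len(profile) - 1) / (n - 1)
    return [profile[int(round(i * step))] for i in range(n)]


grid = tube_cross_section()
profiles = {a: project(grid, a) for a in (0, 30, 60, 90)}
tube_profile = resample(profiles[0])
```

Because the marker strip breaks the rotational symmetry of the ring, the four projections differ, which is what gives the synthetic tubes their varied appearance.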
The trace of the ET tube was simulated over the trachea area using the clavicles segmentation. We extracted the middle point x between the clavicles and the lowest point y. Then, we randomly selected 4 points, each with a bounded pixel offset in x around the middle point and with y-axis samples ranging from 0 to y plus a pixel offset. The random points are connected into a curve using B-spline interpolation. Finally, we draw the sampled tube profile along this curve.
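A minimal sketch of the trace generation is given below, assuming a uniform cubic B-spline with repeated end points so that the curve starts and ends at the first and last control points. The offset limits `max_dx` and `max_dy`, the even spacing of the y samples and the function names are hypothetical choices for this example.

```python
import random


def sample_control_points(x_mid, y_low, max_dx=10, max_dy=20, n=4, rng=random):
    """Random control points around the clavicle midline.
    max_dx and max_dy are assumed pixel offsets, not values from the paper."""
    y_end = y_low + rng.randint(0, max_dy)
    ys = [y_end * i / (n - 1) for i in range(n)]   # from 0 down to y + offset
    return [(x_mid + rng.randint(-max_dx, max_dx), y) for y in ys]


def bspline_curve(points, samples_per_segment=20):
    """Evaluate a uniform cubic B-spline through the padded control points."""
    pts = [points[0]] * 2 + list(points) + [points[-1]] * 2
    curve = []
    for i in range(len(pts) - 3):
        p0, p1, p2, p3 = pts[i:i + 4]
        for k in range(samples_per_segment):
            t = k / samples_per_segment
            # Uniform cubic B-spline basis functions
            b0 = (1 - t) ** 3 / 6.0
            b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6.0
            b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6.0
            b3 = t ** 3 / 6.0
            curve.append(tuple(b0 * a + b1 * b + b2 * c + b3 * d
                               for a, b, c, d in zip(p0, p1, p2, p3)))
    curve.append(tuple(float(v) for v in points[-1]))
    return curve


control = sample_control_points(128, 90, rng=random.Random(0))
curve = bspline_curve(control)
```

The tube profile would then be drawn perpendicular to this curve at each sampled position.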
The last step in creating a realistic X-ray with an ET tube is to merge the synthetic tube with the real X-ray image. We selected AP X-ray images from the NIH dataset that do not contain ET tubes and blended the random synthetic ET tubes into them. We used simple blending with randomly selected weights.
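The blending step might look like the sketch below. The additive form, the clipping to 1.0 and the weight range `w_range` are assumptions, since the exact blending formula and range are not recoverable from the text. The mask is derived by thresholding the tube layer before blending, as described for the ground-truth masks in Section 3.1.

```python
import random


def blend_tube(image, tube, w_range=(0.1, 0.3), rng=random):
    """Overlay a synthetic tube layer onto an X-ray image.
    image, tube: nested lists with values in [0, 1]; tube is 0 off the tube.
    w_range is a hypothetical weight range; additive blending is assumed."""
    w = rng.uniform(*w_range)
    blended = [[min(1.0, px + w * tp) for px, tp in zip(img_row, tube_row)]
               for img_row, tube_row in zip(image, tube)]
    # Ground-truth mask: binary threshold of the tube layer before blending
    mask = [[1 if tp > 0 else 0 for tp in tube_row] for tube_row in tube]
    return blended, mask


image = [[0.5, 0.5], [0.5, 0.95]]
tube = [[0.0, 1.0], [0.0, 1.0]]
blended, mask = blend_tube(image, tube, rng=random.Random(0))
```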
2.2 Detection and Segmentation CNN
We propose a combined CNN architecture for ET tube detection and segmentation in chest radiographs, ETT-Net, as depicted in Figure 4. The architecture is built from a VGG16-style encoder followed by two paths. The first is a decoder that completes the U-Net shape and addresses the ET tube segmentation task. The second summarizes the features extracted at the end of the encoder using a global pooling layer followed by two dense layers and a sigmoid, and addresses the ET tube classification task. We used pre-trained VGG16 weights to initialize the encoder. The two paths of the network are trained simultaneously for the classification and segmentation tasks using a weighted combination of a Binary Cross-Entropy (BCE) loss and a Dice (D) loss. The BCE term compares the classification output label with the ground-truth label, and the Dice term compares the segmentation output mask S with the ground-truth mask. The weight that balances the two loss components was chosen empirically as 0.1.
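One plausible form of the combined loss is sketched below for a single example. The assumption here, flagged in the code, is that the weight of 0.1 scales the classification term while the Dice term is left unscaled; the text does not settle which term is weighted, so this should be read as an illustration rather than the exact formula.

```python
import math


def bce(y_pred, y_true, eps=1e-7):
    """Binary cross-entropy for one predicted probability and one 0/1 label."""
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def dice_loss(s_pred, s_true, eps=1e-7):
    """One minus the soft Dice overlap of two flattened masks."""
    inter = sum(p * t for p, t in zip(s_pred, s_true))
    total = sum(s_pred) + sum(s_true)
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def combined_loss(y_pred, y_true, s_pred, s_true, lam=0.1):
    # Assumption: lam weights the classification (BCE) term
    return lam * bce(y_pred, y_true) + dice_loss(s_pred, s_true)


perfect = combined_loss(1.0, 1, [1.0, 0.0, 1.0], [1, 0, 1])
wrong = combined_loss(0.1, 1, [0.0, 1.0, 0.0], [1, 0, 1])
```

With the `eps` smoothing, a negative example with an all-zero mask and an all-zero prediction yields a Dice loss of zero, so blank masks do not penalize correct negatives.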
The network takes as input an X-ray image duplicated across 3 channels (to fit the pre-trained encoder), the corresponding ET tube segmentation mask, and a binary label for the presence or absence of an ET tube. The input images are preprocessed with contrast limited adaptive histogram equalization (CLAHE) and normalized by their mean and standard deviation. The segmentation mask is either a blank "all zero" image when no ET tube is present or a binary segmentation mask of the ET tube. For training, we augmented the data using horizontal flipping and small rotations.
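The normalization, channel duplication and horizontal flip can be sketched as follows on a toy image. CLAHE and rotation are omitted because they require an image processing library; the function names and the channel-first layout are assumptions made for this example.

```python
import math


def normalize(image):
    """Subtract the image mean and divide by its standard deviation."""
    flat = [v for row in image for v in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat)) or 1.0
    return [[(v - mean) / std for v in row] for row in image]


def to_three_channels(image):
    """Duplicate a single-channel image into 3 identical channels."""
    return [[row[:] for row in image] for _ in range(3)]


def hflip(image, mask):
    """Flip the image and its mask together so they stay aligned."""
    return [row[::-1] for row in image], [row[::-1] for row in mask]


image = [[0.0, 2.0], [4.0, 6.0]]
mask = [[0, 1], [0, 0]]
norm = normalize(image)
flipped_img, flipped_mask = hflip(norm, mask)
net_input = to_three_channels(flipped_img)
```

Any geometric augmentation has to be applied identically to the image and its mask, which is why `hflip` takes both.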
3 Experiments and Results
3.1 Two Phase Training
To train the proposed CNN using the synthesized data while still benefiting from the hundreds of X-ray images containing ET tubes in the public dataset, we used a two-phase training methodology. First, we trained the CNN using the generated data, as explained in Section 2.1. Then, we ran inference with this network on all the AP cases from the NIH dataset and extracted real cases, which we used to fine-tune the network and improve its classification and segmentation performance on real chest radiographs. In both training phases, we trained the network for 50 epochs using the Adam optimizer with default parameters.
The data for the first training phase includes 1669 X-ray images: 869 synthetic examples with an ET tube and 800 without. The segmentation masks of the positive cases were obtained using a simple binary threshold of the synthetic tube before the blending operation. For the second phase, we used all AP cases in the NIH dataset and set conditions on the classification and segmentation outputs of the model: images with a classification probability higher than 0.8 and a non-zero segmentation map were selected as positive examples, and images with a classification probability lower than 0.01 and an all-zero segmentation map were selected as negative examples. These conditions resulted in 3972 positive X-ray images with an ET tube and 36557 X-ray images without one. Overall, after balancing the data, we trained the second phase using 7944 real chest radiographs.
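The selection rule for the second phase can be expressed as a short filter. The thresholds 0.8 and 0.01 come from the text; the tuple layout of the predictions and the use of random undersampling for balancing are assumptions (the reported 7944 images equal twice the 3972 positives, which is consistent with undersampling the negatives).

```python
import random


def select_examples(predictions, pos_thr=0.8, neg_thr=0.01):
    """predictions: iterable of (image_id, probability, mask_pixel_count)."""
    positives, negatives = [], []
    for image_id, prob, mask_pixels in predictions:
        if prob > pos_thr and mask_pixels > 0:
            positives.append(image_id)
        elif prob < neg_thr and mask_pixels == 0:
            negatives.append(image_id)
        # Everything else is ambiguous and left out of fine-tuning
    return positives, negatives


def balance(positives, negatives, rng=random):
    """Assumed balancing: random undersampling of the larger class."""
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)


preds = [("a", 0.95, 120), ("b", 0.005, 0), ("c", 0.5, 40),
         ("d", 0.9, 0), ("e", 0.001, 0)]
pos, neg = select_examples(preds)
bal_pos, bal_neg = balance(pos, neg, rng=random.Random(0))
```

Case "d" shows why both conditions matter: a confident classification with an empty segmentation map is treated as inconsistent and discarded.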
3.2 Test Set
The test set includes 479 real chest radiographs from the NIH dataset that were collected manually, once, during development and are entirely independent of all training data. All cases are in the AP view position: 232 cases with an ET tube and 247 without. After collecting the cases, we verified that the label for each case is consistent with the presence of an ET tube. It is important to note that, because we did not use manual annotations for the segmentation of the tube, the ground-truth segmentation maps are not pixel-wise accurate; still, they represent an expected range for the ET tube position in the images. Classification performance, in contrast, can be measured quantitatively, so we evaluated our model using the classification AUC. The segmentation output was examined qualitatively.
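For reference, the classification metrics used in this evaluation can be computed as in the sketch below. The AUC is computed here through its rank interpretation (the probability that a positive case scores above a negative one), and the 0.5 operating threshold for sensitivity and specificity is an assumption, since the operating point is not stated in the text.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, thr=0.5):
    """Sensitivity and specificity at an assumed decision threshold."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= thr)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < thr)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < thr)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= thr)
    return tp / (tp + fn), tn / (tn + fp)


scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
labels = [1, 1, 1, 0, 0, 0]
test_auc = auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels)
```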
| Method | AUC | Sensitivity, Specificity | Testing Size [pos, neg] |
|---|---|---|---|
| Ramakrishna et al. [3] | - | 92.9%, 97.2% | 64 [28, 36] |
| Chen et al. [4] | 0.95 | - | 87 [44, 43] |
| Lakhani et al. [5] | 0.99 | - | 60 [30, 30] |
| DenseNet | 0.97 | 89.2%, 93.0% | 479 [232, 247] |
| ETT-Net - Phase1 | 0.96 | 89.2%, 93.0% | 479 [232, 247] |
| ETT-Net - Phase2 | 0.99 | 95.5%, 96.5% | 479 [232, 247] |
Training the combined model for classification and segmentation of ET tubes on synthetic X-ray images, we reached an AUC of 0.962 in classification accuracy. Using fine-tuning on real X-ray images, the accuracy improved to an AUC of 0.987 with both sensitivity and specificity over 95% (Figure 6). Figure 6 shows real chest radiographs from the test set that were classified correctly for presence of ET tube and their output segmentation maps.
We conducted an additional experiment using a different CNN architecture for the classification task only: identification of the tubes in real case scenarios. We trained a DenseNet [10] architecture on the same dataset used in the second phase of the combined model (real cases with and without an ET tube, n=7944) for 50 epochs with the Adam optimizer. Training only for classification on this large set of real training data, we reached a high accuracy, with an AUC of 0.975. Figure 7 shows a heatmap visualization of the last convolutional layer of the network. This visualization clearly indicates localization on the ET tube area.
Table 1 compares state-of-the-art methods for classifying the presence or absence of an ET tube with our methods: the ETT-Net model after the first phase of training, the final model after second-phase fine-tuning with real examples, and the classification-only method using DenseNet. The table shows the number of images used for testing each method, separated into cases with an ET tube (pos) and without (neg). Our best model, ETT-Net - Phase2, reached high performance on a test set of 479 images, several times larger than the 60 to 87 images used by the state-of-the-art methods. In addition, our model was trained and tested using a free public dataset (of real ICU patients) without the need for manual annotations, in contrast with the other methods, where the cases were hand-picked and annotated.
In this work, we proposed an approach for training a combined deep learning network for the detection and segmentation of the ET tube in adult chest radiographs without collecting and annotating data. We used a public dataset of X-ray images and synthesized realistic ET tubes blended into those images. We used the synthetic data in the first phase of training our model. We then used the trained model to collect real X-ray cases, on which we ran a second phase of training. Both phases train ETT-Net, a combined CNN architecture for ET tube detection and segmentation in chest radiographs. The combined model achieved very high accuracy in identifying the presence of an ET tube in real ICU patients (0.99 AUC) on a test set several times larger than those of previous studies, and it also outputs high-quality segmentation maps that can assist in detecting misplacement of the tubes. We also showed accurate results (0.97 AUC) using a CNN for classification only, where the synthetic cases are used only to retrieve real cases from the public dataset. Future work can include exploring a similar method for other tube types and combining them in a multi-class detection and segmentation method. The ideas presented in this paper for synthesizing data over public dataset images can be used in other medical imaging domains (for example, generating tumors over healthy patients in X-ray or CT studies).
- [1] Godoy, M.C., Leitman, B.S., de Groot, P.M., Vlahos, I., Naidich, D.P.: Chest radiography in the ICU: part 1, evaluation of airway, enteric, and pleural tubes. American Journal of Roentgenology 198(3), 563–571 (2012)
- [2] Trotman-Dickenson, B.: Radiology in the intensive care unit (part I). Journal of Intensive Care Medicine 18(4), 198–210 (2003)
- [3] Ramakrishna, B., Brown, M., Goldin, J., Cagnon, C., Enzmann, D.: An improved automatic computer aided tube detection and labeling system on chest radiographs. In: Medical Imaging 2012: Computer-Aided Diagnosis. vol. 8315, p. 83150R. International Society for Optics and Photonics (2012)
- [4] Chen, S., Zhang, M., Yao, L., Xu, W.: Endotracheal tubes positioning detection in adult portable chest radiography for intensive care unit. International Journal of Computer Assisted Radiology and Surgery 11(11), 2049–2057 (2016)
- [5] Lakhani, P.: Deep convolutional neural networks for endotracheal tube position and x-ray image classification: Challenges and opportunities. Journal of Digital Imaging 30(4), 460–468 (2017). 10.1007/s10278-017-9980-7
- [6] Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: The IEEE Conference on Computer Vision and Pattern Recognition (2017)
- [7] Frid-Adar, M., Ben-Cohen, A., Amer, R., Greenspan, H.: Improving the segmentation of anatomical structures in chest radiographs using U-Net with an ImageNet pre-trained encoder. In: Image Analysis for Moving Organ, Breast, and Thoracic Images, MICCAI. pp. 159–168. Springer International Publishing, Cham (2018)
- [8] Ginneken, B.V., Stegmann, M.B., Loog, M.: Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database. Medical Image Analysis 10(1), 19–40 (2006)
- [9] Yi, X., Adams, S., Babyn, P., Elnajmi, A.: Automatic catheter detection in pediatric x-ray images using a scale-recurrent network and synthetic data. CoRR (2018). http://arxiv.org/abs/1806.00921
- [10] Huang, G., Liu, Z., Weinberger, K.Q.: Densely connected convolutional networks. CoRR (2016). http://arxiv.org/abs/1608.06993