
Development of Automatic Endotracheal Tube and Carina Detection on Portable Supine Chest Radiographs using Artificial Intelligence

Portable supine chest radiographs have inherently poor image quality, with low contrast and high noise. Detecting endotracheal intubation requires locating both the endotracheal tube (ETT) tip and the carina; the goal is to measure the distance between the ETT tip and the carina on the chest radiograph. To address this problem, we propose a feature extraction method built on Mask R-CNN. The Mask R-CNN predicts the tube and the tracheal bifurcation in an image, and the feature extraction method then locates the feature point of the ETT tip and that of the carina, from which the ETT–carina distance is obtained. In our experiments, the proposed method exceeds 96% in both recall and precision. Moreover, the object errors are less than 4.7751 ± 5.3420 mm, and the ETT–carina distance errors are less than 5.5432 ± 6.3100 mm. External validation shows that the proposed method is highly robust. According to the Pearson correlation coefficient, there is a strong correlation between the ETT–carina distances measured by board-certified intensivists and those produced by our method.


1 Introduction

Mechanical ventilation via endotracheal intubation is a life-saving procedure for patients with acute respiratory failure [16]. However, improper depth of endotracheal tube (ETT) placement is not rare during endotracheal intubation and may cause catastrophic effects if not recognized promptly after intubation [2, 1, 15]. An ETT placed too shallow may increase the risk of accidental extubation or air leak. Conversely, a deeply positioned ETT may lead to collapse of the nonventilated lung and hyperinflation of the intubated lung, with consequent tension pneumothorax. Therefore, it is essential to measure the distance between the ETT tip and the carina on a chest radiograph (CXR) to confirm that the ETT is placed at the proper depth and to avoid potentially life-threatening complications in intensive care units (ICUs) [14, 10].

For ICU patients with endotracheal intubation, the CXR is commonly taken by a portable, or bedside, X-ray machine with the patient in a supine position. Compared to a standard standing CXR, a portable supine CXR usually has technical disadvantages, such as lower contrast and higher noise [19]. Several anatomical structures, such as the heart, great vessels, and spine, may overlap with the ETT and carina. In addition, medical devices and lines may obscure the positions of the ETT tip and carina on portable CXRs. Identifying these positions can be a laborious task for critical care providers during routine rounds; ETT malposition may therefore be overlooked and not promptly corrected, and this delay may endanger the lives of critically ill patients. A computer-aided detection system that accurately identifies the positions of the ETT tip and carina on portable supine CXRs may improve ICU care quality and ensure patient safety [6, 21]. Therefore, the purpose of this study is to develop an AI system that identifies the positions of the ETT tip and carina on portable supine CXRs, measures the distance between them, and recognizes improper ETT positioning in the ICU.

Frid-Adar et al. [8] generated synthetic ETTs on adult chest X-ray images and then used a U-Net to segment the ETT. Lakhani et al. [12] classified chest radiographs into 12 categories, including bronchial insertion, distance from the carina at 1.0-cm intervals (0.0–0.9 cm, 1.0–1.9 cm, etc.), and greater than 10 cm. They used Inception V3 to classify the radiographs, which implicitly predicts the ETT–carina distance interval.

Endotracheal intubation detection can be regarded as a multi-label mask detection problem. Existing methods fall into two classes: segmentation-based and detection-based. Segmentation-based methods predict a category label for every pixel of the image; the best-known example is U-Net [18]. Detection-based methods rely on state-of-the-art detectors, such as Faster R-CNN [17], R-FCN [7], and Cascade R-CNN [3], to obtain a bounding box for each instance and then predict a mask within each box. Building on Faster R-CNN, He et al. [9] proposed Mask R-CNN, which adds an instance-level semantic segmentation branch. Chen et al. [5] proposed MaskLab, which uses position-sensitive scores to obtain better results. In these models, however, the score of an instance mask comes from the box-level classification confidence; to score the masks directly, Huang et al. [11] proposed Mask Scoring R-CNN on top of Mask R-CNN.

This paper proposes a feature extraction method built on Mask R-CNN. The Mask R-CNN predicts the tube and the tracheal bifurcation in an image, and the feature extraction method then locates the feature point of the tube tip and that of the carina. Compared to [12], this paper predicts not only the ETT–carina distance but also the mask of the distal ETT end, the mask of the tracheal bifurcation, and the feature points of the tube tip and the carina. Moreover, the solution of [12] was trained with category labels, whereas ours was trained with pixel-level segmentation labels, which yield more accurate object localization.

The rest of this paper is organized as follows. Section 2 proposes a method for endotracheal intubation detection. Section 3 shows the experiment results. Section 4 draws conclusions.

Figure 1: The ground truth.
Figure 2: Mask R-CNN.

2 Method

In the ground truth, four points are used to label the end part of the ETT, and another nine points are used to label the tracheal bifurcation (see Fig. 1). Moreover, two small fixed-size boxes are used to label the feature point of the ETT tip (the midpoint of the two most distal ETT points, denoted g_tip) and the feature point of the carina (the central bifurcation point, denoted g_carina), respectively. The goal is to find the mask of the distal ETT end, the mask of the tracheal bifurcation, and the two feature-point boxes on chest radiographs, so as to locate the ETT tip and the carina.
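For illustration, a minimal sketch of how the ground-truth feature points could be derived from these label points is shown below; the point ordering assumed here (distal ETT pair last, central bifurcation point in the middle of the nine) is an assumption, not the paper's exact annotation spec.

```python
import numpy as np

def ground_truth_feature_points(ett_pts: np.ndarray, bif_pts: np.ndarray):
    # ett_pts: (4, 2) array of the ETT end points in (x, y) image coordinates,
    # assumed ordered so the last two are the most distal pair.
    # bif_pts: (9, 2) array of tracheal-bifurcation points, with the central
    # bifurcation point assumed at index 4.
    g_tip = ett_pts[-2:].mean(axis=0)   # midpoint of the two distal ETT points
    g_carina = bif_pts[4]               # assumed central bifurcation point
    return g_tip, g_carina
```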

This work uses Mask R-CNN [9] to predict the masks of the ETT and the tracheal bifurcation. Figure 2 shows the Mask R-CNN architecture. The backbone is ResNeXt-50 (32x4d) [20] with a Feature Pyramid Network (FPN) [13]. The head architecture (right panel in Fig. 2) extends the Faster R-CNN head. The numbers denote the spatial resolution and channels; arrows denote conv or deconv layers, as can be inferred from the change in spatial dimension, and a single number denotes a fully connected (fc) layer. Following [9], all convs are 3x3 except the output conv, which is 1x1, and the deconvs are 2x2 with stride 2.
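Since the implementation builds on MMDetection [4] (see Section 3.2), the described architecture could be expressed as a minimal MMDetection 2.x-style model config along the following lines; everything not stated in the text (class count, init checkpoint, frozen stages) is an assumption for illustration.

```python
# Sketch of a Mask R-CNN config with a ResNeXt-50 (32x4d) backbone and FPN,
# as described in the text. Values marked as assumed are not from the paper.
model = dict(
    type='MaskRCNN',
    backbone=dict(
        type='ResNeXt',
        depth=50,
        groups=32,
        base_width=4,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,  # assumed
        init_cfg=dict(type='Pretrained',
                      checkpoint='open-mmlab://resnext50_32x4d')),  # assumed
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    roi_head=dict(
        bbox_head=dict(num_classes=4),    # assumed: ETT mask, bifurcation mask,
        mask_head=dict(num_classes=4)))   # and the two feature-point box classes
```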

Figure 3: The process of finding the feature points of the ETT tip and the carina.

2.1 The Inference Process

In the inference step, we keep only the object with the maximal score for each class. A feature extraction method is then used to find the feature points. Figure 3 shows the process of finding the feature points of the ETT tip and the carina. We use the feature-point boxes and the masks predicted by the Mask R-CNN to obtain the exact feature points of the ETT tip and the carina. Two candidate results are obtained: one from the bounding boxes of the feature points and one from the masks of the ETT and the carina. The exact feature points are then obtained by fusing the two results.
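A minimal sketch of the per-class filtering step, assuming detections are given as dicts with label, score, box, and mask fields:

```python
def keep_best_per_class(detections):
    # Among all detections of each class, keep only the highest-scoring one.
    best = {}
    for det in detections:
        cls = det['label']
        if cls not in best or det['score'] > best[cls]['score']:
            best[cls] = det
    return best  # mapping: class label -> single best detection
```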

Figure 4: The flow chart for finding the exact feature point of the ETT tip.
Figure 5: The flow chart for finding the exact feature point of the carina.

The masking process is as follows. We use skeletonization to find the skeleton of each mask. The feature point of the ETT is the lowest point of the mask skeleton (see the top of Figure 3). For the feature point of the carina, this work uses a sliding window of size 15x15 to find the central point of the carina mask skeleton. Canny edge detection is then applied to find the edge of the mask, and a 100x150 patch of the mask edge is cropped around the central point. Finally, a 7x7 sliding window is used to find the feature point of the carina within the patch (see the middle of Figure 3).
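A sketch of the mask-based steps under these definitions, using scikit-image; the exact scoring rules inside the 15x15 and 7x7 sliding windows are not specified in the text, so only the skeleton and edge steps are shown concretely:

```python
import numpy as np
from skimage.feature import canny
from skimage.morphology import skeletonize

def ett_tip_from_mask(tube_mask: np.ndarray):
    # Skeletonize the predicted tube mask and take the lowest skeleton pixel
    # (largest row index) as the ETT tip feature point.
    skel = skeletonize(tube_mask.astype(bool))
    ys, xs = np.nonzero(skel)
    i = np.argmax(ys)  # largest y = lowest point in image coordinates
    return int(xs[i]), int(ys[i])

def carina_edge_map(carina_mask: np.ndarray):
    # Edge of the carina mask, as input to the windowed search around the
    # skeleton's central point; default Canny parameters are an assumption.
    return canny(carina_mask.astype(float))
```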

Figures 4 and 5 show the flow charts for finding the exact feature points of the ETT tip and the carina. For the ETT tip, we accept all results from the feature-point bounding boxes. Because these boxes are very small, many objects (about 10%) are not detected; hence, this work uses the mask result to supplement the bounding-box result. For the carina, the bounding-box result has a low mean error but a high standard deviation, whereas the mask result has a high mean error but a low standard deviation. We therefore use the mask result to eliminate the worst cases of the bounding-box result: if the distance between the two feature points (from the bounding box and from the mask) is greater than 100 pixels, the mask result replaces the bounding-box result. The mask result also supplements the bounding-box result when the box is missing.
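The fusion rules can be summarized in a short sketch; point values of None stand for missing detections, and the 100-pixel threshold comes from the text:

```python
import math

def fuse_tip(box_pt, mask_pt):
    # ETT tip: trust the bounding-box result, fall back to the mask result.
    return box_pt if box_pt is not None else mask_pt

def fuse_carina(box_pt, mask_pt, threshold=100.0):
    # Carina: the mask result supplements a missing box result and replaces
    # box results that disagree with it by more than the threshold.
    if box_pt is None:
        return mask_pt
    if mask_pt is None:
        return box_pt
    return mask_pt if math.dist(box_pt, mask_pt) > threshold else box_pt
```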

3 Experiments

3.1 Dataset and Evaluation Metrics

This study was approved by the institutional review board (IRB) of National Cheng Kung University Hospital (IRB number: A-ER-108-305). Chest radiographs of intubated ICU patients were identified by a search of the institutional Picture Archiving and Communication System (PACS), and the underlying DICOM files were exported after de-identification. The dataset comprised a total of 1,842 portable chest radiographs. The ground-truth annotations were labeled by two board-certified intensivists. In addition, 150 images from outside hospitals were used for external validation.

Four metrics, namely object error, distance error, recall, and precision, were applied to evaluate the performance of the proposed method. The Dice coefficient (DC) was used to measure the performance of mask detection (the two polygons representing the end part of the ETT and the bifurcation of the tracheobronchial tree). The DC measures the similarity between two sets and has been broadly used for validating image segmentation algorithms. The Euclidean distance was used to measure the performance of feature-point (ETT tip and carina) detection.
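For reference, the two measures in their standard form; the conversion from pixel error to millimeters via the image's pixel spacing is an assumption about the pipeline:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    # Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|).
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def point_error_mm(pred_pt, gt_pt, pixel_spacing_mm: float) -> float:
    # Euclidean distance between predicted and ground-truth feature points,
    # scaled from pixels to millimeters (assumed isotropic pixel spacing).
    return float(np.linalg.norm(np.asarray(pred_pt) - np.asarray(gt_pt))) * pixel_spacing_mm
```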

Let g_tip be the midpoint of the two distal ETT points in the ground-truth annotation. The distance between g_tip and the predicted feature point of the ETT tip is the object error of the ETT tip. Likewise, the distance between the annotated carina point g_carina and the predicted feature point of the carina is the object error of the carina. An object is considered successfully detected if the Dice coefficient is no less than 0.6 or the object error is no more than 100 pixels; such a detection is counted as a true positive. A false positive is a detection whose mask incorrectly indicates an object, and a false negative occurs when the model fails to indicate an object at all. Recall and precision are defined as

recall = TP / (TP + FN)   and   precision = TP / (TP + FP),

where TP is the number of true positives, FN the number of false negatives, and FP the number of false positives.
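A sketch of this counting scheme, assuming at most one prediction per ground-truth object; how an unmatched prediction is booked (here as a false positive) is an assumption:

```python
def evaluate(preds_per_object):
    # One entry per ground-truth object: None when nothing was detected,
    # else a dict with 'dice' and 'error_px' fields (assumed format).
    tp = fp = fn = 0
    for pred in preds_per_object:
        if pred is None:
            fn += 1                                        # missed object
        elif pred['dice'] >= 0.6 or pred['error_px'] <= 100:
            tp += 1                                        # successful detection
        else:
            fp += 1                                        # spurious detection
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision
```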

Let d_gt be the ETT–carina distance between the point g_tip and the point g_carina in the ground-truth annotation, and let d_pred be the ETT–carina distance between the predicted feature point of the ETT tip and the predicted feature point of the carina. The ETT–carina distance error is |d_gt − d_pred|.

3.2 Implementation Details

MMDetection [4] and the PyTorch library were used to build the network. The layers of the network were initialized using pretrained weights from open-mmlab. The loss functions of the bounding-box head were the cross-entropy and smooth L1 losses, and the loss function of the mask head was the average binary cross-entropy. Images were resized to a fixed input resolution, and each mini-batch contained 5 images. The model was trained with an SGD optimizer with a momentum of 0.9 and a weight decay of 0.0001; the learning rate was 0.001. Training was performed for 120 epochs, decreasing the learning rate by a factor of 0.1 after epochs 80 and 110.
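These hyperparameters map onto a minimal MMDetection 2.x-style schedule sketch; fields not named in the text (warmup, number of GPUs) are omitted or assumed:

```python
# Schedule sketch matching the stated hyperparameters.
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001)
lr_config = dict(policy='step', step=[80, 110], gamma=0.1)  # x0.1 at epochs 80, 110
runner = dict(type='EpochBasedRunner', max_epochs=120)
data = dict(samples_per_gpu=5)  # mini-batch of 5 images (single GPU assumed)
```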

               Tube Tip               Carina
               Recall     Precision   Recall     Precision
Fold 1         95.42%     95.53%      95.74%     96.28%
Fold 2         95.84%     96.26%      96.86%     97.52%
Fold 3         96.26%     96.47%      96.71%     97.26%
Fold 4         96.96%     96.96%      97.10%     97.54%
Average        96.12%     96.31%      96.60%     97.15%
External val.  93.96%     93.96%      95.47%     95.80%
Table 1: The object detection performance in recall and precision.

3.3 Results

Table 1 shows the object detection performance in terms of recall and precision. The recall of the ETT tip was 96.12% and its precision was 96.31%; the recall of the carina was 96.60% and its precision was 97.15%. In the external validation, the performance degraded by less than 3%.

               Tube Tip (mm)          Carina (mm)
               Mean       Std.        Mean       Std.
Fold 1         4.1219     4.6022      4.9794     5.3327
Fold 2         4.2362     4.7496      4.5163     5.5910
Fold 3         4.2715     4.5649      4.8013     5.4726
Fold 4         3.8564     3.6895      4.8032     4.9717
Average        4.1215     4.4016      4.7751     5.3420
External val.  4.2856     5.9425      4.5668     4.5132
Table 2: The object detection performance in object error.

Table 2 shows the object detection performance in terms of object error. For the ETT tip location, the mean object error was 4.1215 mm with a standard deviation of 4.4016 mm; for the carina location, the mean was 4.7751 mm with a standard deviation of 5.3420 mm. In the external validation, the performance of the ETT tip location decreased only slightly. Table 3 shows the performance in terms of the ETT–carina distance error: the mean was 5.5432 mm with a standard deviation of 6.3100 mm, and the external-validation performance again decreased only slightly. For comparison, in [12] the mean ETT–carina distance error was 6.9 mm with a standard deviation of 7.0 mm.

               Mean (mm)   Std. (mm)
Fold 1         5.7413      6.3497
Fold 2         5.6370      7.3877
Fold 3         5.5754      6.3576
Fold 4         5.2190      5.1451
Average        5.5432      6.3100
External val.  5.6680      6.6514
Table 3: The performance in ETT–carina distance error.
               <5 mm     <10 mm    <15 mm    <20 mm
Fold 1         59.31%    84.42%    91.13%    95.24%
Fold 2         60.30%    83.30%    92.62%    95.44%
Fold 3         59.83%    83.41%    92.58%    94.76%
Fold 4         62.04%    85.68%    94.79%    96.10%
Average        60.37%    84.20%    92.78%    95.39%
External val.  64.00%    82.00%    90.67%    94.67%
Table 4: The distribution of images by ETT–carina distance error.

Table 4 shows the distribution of images by ETT–carina distance error: 92.78% of the images have an error of less than 15 mm. Table 5 shows the distribution of images by object error for the ETT tip: 96.36% of the images have an error of less than 15 mm. Table 6 shows the same distribution for the carina: 95.55% of the images have an error of less than 15 mm. Moreover, the external validation shows no significant differences in performance.

               <5 mm     <10 mm    <15 mm    <20 mm
Fold 1         77.06%    92.86%    96.54%    97.62%
Fold 2         73.75%    91.76%    95.66%    97.61%
Fold 3         72.71%    92.36%    95.85%    98.03%
Fold 4         76.79%    92.84%    97.40%    99.57%
Average        75.08%    92.46%    96.36%    98.21%
External val.  79.33%    90.00%    95.33%    96.67%
Table 5: The distribution of images by object error (endotracheal tube tip).
               <5 mm     <10 mm    <15 mm    <20 mm
Fold 1         67.53%    90.04%    95.02%    96.54%
Fold 2         70.93%    93.06%    96.10%    97.83%
Fold 3         69.65%    91.27%    94.76%    96.72%
Fold 4         67.25%    91.76%    96.31%    97.40%
Average        68.84%    91.53%    95.55%    97.12%
External val.  73.33%    89.33%    95.33%    96.54%
Table 6: The distribution of images by object error (carina).
                 Ground truth
Prediction       Suitable    Unsuitable
Suitable         1305        107
Unsuitable       85          318
Not detected     12          15
Table 7: The confusion matrix of the diagnosis.

Table 7 shows the confusion matrix of the diagnosis. An ETT position is called suitable if the ETT–carina distance is between 20 and 70 mm (see the sketch below). In 1,623 (88.11%) images, the proposed method agrees with the assessment of the board-certified intensivists. Table 8 shows that the correlation between the ground truth and the result of the proposed method is significant in terms of ETT–carina distance, and Table 9 shows that the same holds in the external validation.
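The suitability rule itself reduces to a one-line check (thresholds from the text):

```python
def ett_position_suitable(distance_mm: float) -> bool:
    # Suitable when the ETT-carina distance lies between 20 and 70 mm.
    return 20.0 <= distance_mm <= 70.0
```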

                             Fold 1            Fold 2            Fold 3            Fold 4
Pearson r                    0.8895            0.8742            0.8800            0.9164
95% confidence interval      0.8686 to 0.9072  0.8505 to 0.8943  0.8573 to 0.8993  0.9004 to 0.9300
R squared                    0.7912            0.7642            0.7744            0.8398
P (two-tailed)               <0.0001           <0.0001           <0.0001           <0.0001
P value summary              ****              ****              ****              ****
Significant? (alpha = 0.05)  Yes               Yes               Yes               Yes
Number of XY pairs           456               451               451               457
Overall number               462               461               458               460
Table 8: The correlation between the ground truth and the result of the proposed method in terms of ETT–carina distance.
                             External hospital data
Pearson r                    0.8860
95% confidence interval      0.8457 to 0.9163
R squared                    0.7851
P (two-tailed)               <0.0001
P value summary              ****
Significant? (alpha = 0.05)  Yes
Number of XY pairs           149
Overall number               150
Table 9: The correlation between the ground truth from outside hospitals and the result of the proposed method in terms of ETT–carina distance.

Table 10 shows visualizations of the results. Although the first two images of the good cases are blurred, the features of the ETT and the carina are clear; the last good-case image is an easy case. The medium cases were selected based on the mean error: although the predicted feature points have a slight error, the predicted shapes of the objects are correct. In the first two images of the worst cases, the features are blurred and unclear, so the proposed method cannot obtain an exact result. In the last worst-case image, the predicted ETT is incorrect because the edge of the end part is unclear.

Good
Medium
Worst
Table 10: The predicted images: the yellow line is the ground truth, the yellow asterisk is the ground-truth feature point, and the red asterisk is the predicted feature point.

4 Discussion and Conclusion

For most critically ill patients, endotracheal intubation with mechanical ventilation is crucial to sustaining life. Malposition of the ETT can cause serious harm to patients if not detected in a timely manner. Clinicians need to verify proper ETT positioning by measuring the distance between the ETT tip and the carina on a portable supine CXR; if malposition is detected, they adjust the ETT position according to the measured distance. However, it is not easy to clearly identify the positions of the ETT and the carina on portable supine CXRs, due to the low image contrast and abundant noise in the region of interest, especially for clinicians with less experience. An AI system that reliably identifies ETT and carina positions on portable supine CXRs may lead to rapid and accurate identification of improper ETT position and reduce risks to critically ill patients.

This paper proposed a feature extraction method with Mask R-CNN to predict the feature points of the tube and the tracheal bifurcation in an image. In our experiments, the results exceeded 96% in terms of recall and precision. The object errors were less than 4.7751 ± 5.3420 mm, and the ETT–carina distance errors were less than 5.5432 ± 6.3100 mm. Moreover, the proposed system labels the feature points of the ETT tip and the carina, which can assist intensivists in monitoring the condition of ICU patients. According to the Pearson correlation coefficient, there is a strong correlation between the ETT–carina distances measured by board-certified intensivists and those produced by our method. The external validation shows that the proposed method is highly robust.

Although the proposed AI system can handle the lower contrast and higher noise of portable CXRs, it cannot operate normally on CXRs of extremely poor quality caused by improper operation of the X-ray machine. Future work will therefore be to propose a clinical guideline and to obtain feedback from medical staff.

Acknowledgment

The work was supported by the Ministry of Science and Technology, Taiwan, R.O.C. under Grant no. MOST 110-2634-F-006-012 and by the Higher Education Sprout Project, Ministry of Education, to the Headquarters of University Advancement at National Cheng Kung University (NCKU).

References

  • [1] C. A. Brown, A. E. Bair, D. J. Pallin, and R. M. Walls, “Techniques, success, and adverse events of emergency department adult intubations,” Annals of Emergency Medicine, vol. 65, no. 4, pp. 363–370.e1, 2015.
  • [2] W. Brunel, D. L. Coleman, D. E. Schwartz, E. Peper, and N. H. Cohen, “Assessment of routine chest roentgenograms and the physical examination to confirm endotracheal tube position,” Chest, vol. 96, no. 5, pp. 1043–1045, 1989.
  • [3] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into high quality object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [4] K. Chen, J. Wang, J. Pang, Y. Cao, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, J. Xu, Z. Zhang, D. Cheng, C. Zhu, T. Cheng, Q. Zhao, B. Li, X. Lu, R. Zhu, Y. Wu, J. Dai, J. Wang, J. Shi, W. Ouyang, C. C. Loy, and D. Lin, “MMDetection: Open mmlab detection toolbox and benchmark,” arXiv preprint arXiv:1906.07155, 2019.
  • [5] L.-C. Chen, A. Hermans, G. Papandreou, F. Schroff, P. Wang, and H. Adam, “Masklab: Instance segmentation by refining object detection with semantic and direction features,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [6] S. Chen, M. Zhang, L. Yao, and W. Xu, “Endotracheal tubes positioning detection in adult portable chest radiography for intensive care unit,” International journal of computer assisted radiology and surgery, vol. 11, no. 11, pp. 2049–2057, 2016.
  • [7] J. Dai, Y. Li, K. He, and J. Sun, “R-fcn: Object detection via region-based fully convolutional networks,” in Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, Eds., vol. 29.   Curran Associates, Inc., 2016, pp. 379–387.
  • [8] M. Frid-Adar, R. Amer, and H. Greenspan, “Endotracheal tube detection and segmentation in chest radiographs using synthetic data,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, P.-T. Yap, and A. Khan, Eds.   Cham: Springer International Publishing, 2019, pp. 784–792.
  • [9] K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask r-cnn,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  • [10] H. Hossein-Nejad, P. Payandemehr, S. A. Bashiri, and H. H.-N. Nedai, “Chest radiography after endotracheal tube placement: is it necessary or not?” The American Journal of Emergency Medicine, vol. 31, no. 8, pp. 1181–1182, 2013.
  • [11] Z. Huang, L. Huang, Y. Gong, C. Huang, and X. Wang, “Mask scoring r-cnn,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • [12] P. Lakhani, A. Flanders, and R. Gorniak, “Endotracheal tube position assessment on chest radiographs using deep learning,” Radiology: Artificial Intelligence, p. e200026, 2020.
  • [13] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [14] R. Lotano, D. Gerber, C. Aseron, R. Santarelli, and M. Pratter, “Utility of postintubation chest radiographs in the intensive care unit,” Critical Care, vol. 4, no. 1, pp. 1–4, 2000.
  • [15] Y. Ono, T. Kakamu, H. Kikuchi, Y. Mori, Y. Watanabe, and K. Shinohara, “Expert-performed endotracheal intubation-related complications in trauma patients: incidence, possible risk factors, and outcomes in the prehospital setting and emergency department,” Emergency medicine international, vol. 2018, 2018.
  • [16] T. Pham, L. J. Brochard, and A. S. Slutsky, “Mechanical ventilation: state of the art,” in Mayo Clinic Proceedings, vol. 92, no. 9.   Elsevier, 2017, pp. 1382–1400.
  • [17] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds., vol. 28.   Curran Associates, Inc., 2015, pp. 91–99.
  • [18] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds.   Cham: Springer International Publishing, 2015, pp. 234–241.
  • [19] M. Wiener, S. Garay, B. Leitman, D. Wiener, and C. Ravin, “Imaging of the intensive care unit patient,” Clinics in Chest Medicine, vol. 12, no. 1, pp. 169–198, March 1991.
  • [20] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [21] X. Yi, S. J. Adams, R. D. Henderson, and P. Babyn, “Computer-aided assessment of catheters and tubes on radiographs: How good is artificial intelligence for assessment?” Radiology: Artificial Intelligence, vol. 2, no. 1, p. e190082, 2020.