Sparing the esophagus, a major organ at risk (OAR), is important when treating lung cancer patients because of its proximity to the tumor. Due to poor sparing, currently 50% of patients develop acute esophagitis, which severely impacts their quality of life, and there is no reliable way to assess the response in real time. The esophagus is a mobile, tubular soft-tissue organ, and the esophagus contour on the planning CT (pCT) may not accurately represent the esophagus in subsequent weeks because of anatomical changes and setup uncertainties arising from, e.g., physiological variations and respiratory motion over the course of radiotherapy (RT). Discrepancies between the planned and on-treatment esophagus structures are challenging to detect using Cone Beam Computed Tomography (CBCT) due to low soft-tissue contrast [17, 1], which is a current limitation of image-guided RT. CBCTs are frequently used in the clinic for patient setup and treatment response evaluation. However, due to artifacts and noise in CBCTs, soft-tissue organs such as the esophagus are difficult to delineate on weekly CBCTs (wCBCTs). CBCT images can be used to segment tumors and OARs, e.g. in an adaptive radiotherapy framework where the treatment is adapted midway to the weekly changes of the critical organs so that physicians can re-evaluate their plan. Moreover, being able to properly adopt daily/weekly CBCTs in the clinic could overcome the limitations of image-guided RT.
, Markov-chain models etc. Recently, deep learning models have been used for (semi-)automatic segmentation of the esophagus owing to their higher generalizability and instance validations [2, 9, 15]. Some of these used 2D models to overcome the limited size of the training dataset [2, 4]. However, this resulted in discontinuities in the generated feature maps, which in turn led to poor delineation of the esophagus boundary. Moreover, 2D models require additional data-specific post-processing that makes them less generalizable. This problem is more prominent when segmenting a long, tubular soft-tissue organ such as the esophagus in noisy CBCT images, where 3D spatial information is critical to avoid discontinuities and to better handle noise/artifacts, and where a specifically tailored network architecture is needed to extract sharper edges.
Some groups have segmented the esophagus on high-quality planning Computed Tomography (pCT) images, which are used to delineate tumors and OARs for treatment planning in RT [5, 4]. However, no work has been done on segmenting the esophagus on low-quality CBCT images. Delineating the esophagus on pCT and wCBCTs is challenging for physicians due to low soft-tissue contrast and the presence of artifacts and noise; even the best previously reported Dice overlap on high-quality pCTs was 0.72. Other works have tried to remove noise/artifacts from CBCTs to make them more similar to pCT [20, 24]; however, none of these has been translated to the clinic due to issues with anatomical hallucinations, computational complexity (stability/convergence issues) and the lack of sufficient corresponding CT and CBCT data to train on.
In this work, we introduce an in silico (computer-simulated) data augmentation approach that induces different variations of scatter artifacts, extracted from the wCBCTs, in the pCTs of each patient. We then reconstruct the artifact-induced pCTs using CBCT reconstruction parameters and techniques. We refer to these artifact-induced pCTs as pseudo-CBCTs (ps-CBCTs); they have an artifact distribution similar to that of clinical CBCT images and include all the physics-based aspects of diagnostic imaging, i.e. scatter, noise, beam hardening and motion. Subsequently, the high-quality pCT manual contours and the generated ps-CBCTs are fed to a modified 3D-UNet tailored for the esophagus, trained with a multi-objective Dice and binary cross-entropy loss function, to obtain the segmentation. By adding the noise/scatter artifacts from the low-dose CBCTs to the corresponding pCTs and reconstructing these with a CBCT reconstruction algorithm and parameters, we generate ps-CBCTs that exploit the higher soft-tissue contrast of pCTs to preserve esophagus textures and boundaries. The ps-CBCTs incorporate all the physics-based aspects, as pure physics-based Monte Carlo approaches do, but come without the computational overhead and without the requirement for corresponding projection data (which is normally discarded after acquisition due to memory constraints). In essence, this in silico data augmentation presents a novel way to train a deep learning model for semantic, physics-based segmentation of the esophagus in CBCTs. The model trained on the augmented ps-CBCT data is robust and generalizable enough to achieve state-of-the-art performance on both real weekly CBCTs and pCTs. This shows that the in silico data augmentation pipeline can help models perform well across modalities without retraining or domain adaptation.
The contributions are as follows:
- We introduce a new in silico 3D data augmentation technique that converts planning CT images to pseudo-CBCTs with different scatter variations. It has the added advantage of using the ground-truth contours from the high-quality CT images rather than relying on manual contouring of the noisy, artifact-ridden CBCT images.
- We propose a modified 3D-UNet architecture designed specifically for soft-tissue segmentation.
- The proposed model, trained on ps-CBCT images, segmented the esophagus on the testing ps-CBCT data with Dice=0.74.
- We validated our results on real CBCTs (week 1 to week 6) and on relatively higher-quality breath-hold CBCTs (all manually segmented by an experienced radiation oncologist), achieving 0.72 and 0.70 Dice overlap, respectively. The same model, trained solely on ps-CBCTs, was also validated on the high-quality pCTs, resulting in 0.77 Dice overlap.
2 Materials and Method
Fig. 1 shows our entire workflow for generating pseudo-CBCTs, which are in turn used to train the 3D-UNet. The model was validated on low-quality weekly CBCTs, relatively high-quality breath-hold CBCTs and high-quality pCTs.
The study included 60 patients with locally advanced non-small cell lung cancer treated with intensity-modulated RT and concurrent chemotherapy. All patients had a contrast-enhanced, high-quality planning CT (pCT) and 5 or 6 weekly CBCTs. A total of 60 pCT and 351 weekly CBCT images (411 images in total) were acquired. The pCT and wCBCT images resolutions were and , respectively. wCBCTs had a smaller Superior-Inferior Field-Of-View (FOV) () compared to pCTs (), because only the thoracic region needs to be imaged in lung cancer patients. The esophagus on both pCT and wCBCTs was delineated by an experienced radiation oncologist according to anatomic and contouring atlases of organs at risk, using cavities to identify the esophageal wall and the surrounding organs. The physician-generated esophagus contours served as the ground truth to train our model. In addition, another 2 lung cancer patients with breath-hold CBCTs, which contained fewer scatter artifacts and less noise, were included for external validation. These two patients underwent different treatment protocols and showed more irregular thoracic structures compared to our main cohort.
2.2 Generating Pseudo-CBCT images
. Since registration of cancer images is challenging due to the lack of correspondence between the two images and the loss/gain of large mass/volume throughout therapy, the integrated B-spline regularization fits the deformation vector field to a B-spline object to capture large differences. This gives free-form elasticity to the converging/diverging vectors that represent morphological shrinkage/expansion. Note that the purpose of registering the planning CT and the week1 CBCT was only to facilitate artifact extraction and mapping, and hence the registration does not need to be highly accurate. Since only a few days separate the acquisition of the planning CT and the week1 CBCT, the artifact mapping is least susceptible to registration uncertainties.
Then, different variations of scatter artifacts/noise were extracted from the week1 CBCTs using power-law adaptive histogram equalization (PL-AHE)  that contained the highest to the smoothest frequency components. Equations below define PL-AHE:
is the convolution function calculating the gray-level difference (u-v), and Equation 2 defines the power-law transformation function over the convolution window and its neighboring pixels.
The advantages of PL-AHE compared to the traditional histogram equalization method are: (i) it transforms each pixel with a power-law mapping function derived from a neighbourhood region controlled by the parameters α and β, instead of applying a strict pixel-size window histogram equalization; (ii) the parameters α and β control the degree of frequency of the contrast extracted from an image. Using α=0 or β=1, the algorithm acts closer to a high-pass filter and hence enhances the dynamic range of the intensity distribution, whereas with α=1 or β=0 the algorithm acts more like an unsharp-masking low-pass filter; (iii) by varying α and β we can extract details in an image without significantly changing the desired structures; (iv) selecting the combination of parameters makes it easy to extract different frequency components from an image, with a fast and flexible implementation. We used a fixed window radius ( pixels) to convolve over the image. We experimentally selected 7 different combinations of α and β to cover a large range of frequency components. The combinations were i) α=0.5, β=1; ii) α=1, β=0.5; iii) α=0.5, β=0.5; iv) α=1, β=0; v) α=0.5, β=0; vi) α=0, β=1; and vii) α=0, β=0.5.
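To make the role of the two parameters concrete, the following is a minimal sketch and not the implementation used in this work. It assumes the mapping follows the generalized adaptive histogram equalization of Stark (2000), i.e. f_c(u, v) = q(u-v; α) - β q(u-v; 1) + β u with the power-law cumulation q(d; α) = 0.5 sign(d) |2d|^α, intensities in [-0.5, 0.5], and a window radius of 1. The function names, the window radius, and the reading of each listed pair as (α, β) are assumptions made for illustration.

```python
import math


def cumulation(d, alpha):
    """Assumed power-law cumulation q(d) = 0.5 * sign(d) * |2d|**alpha."""
    if d == 0:
        return 0.0  # sign(0) = 0, also avoids 0**0 when alpha == 0
    return 0.5 * math.copysign(1.0, d) * abs(2.0 * d) ** alpha


def pl_ahe(image, alpha, beta, radius=1):
    """Apply the assumed mapping to a 2D list with values in [-0.5, 0.5].

    Each output pixel is the average of f_c(u, v) over all neighbours v
    inside a square window of the given radius (clipped at the borders).
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            u = image[r][c]
            total, count = 0.0, 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    d = u - image[rr][cc]
                    total += cumulation(d, alpha) - beta * cumulation(d, 1.0) + beta * u
                    count += 1
            out[r][c] = total / count
    return out


# The seven parameter pairs listed in the text, read here as (alpha, beta).
COMBINATIONS = [(0.5, 1), (1, 0.5), (0.5, 0.5), (1, 0), (0.5, 0), (0, 1), (0, 0.5)]
```

Under this assumed form, α=1 with β=1 returns the input unchanged, α=1 with β=0 subtracts the local mean, and α=0 with β=0 reduces to rank-based local histogram equalization, which is one way to see how intermediate settings select different frequency content.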
The extracted CBCT artifacts were added to their corresponding pCT and the intensities were scaled to [0,1]. Since CBCTs had a smaller FOV than pCTs, the pCTs were cropped to the FOV of their corresponding w1 CBCT. Then, 2D x-ray projections were generated from the artifact-induced pCTs using 3D texture-memory linear interpolation of the integrated sinograms. The projections were reconstructed using the iterative Ordered-Subset Simultaneous Algebraic Reconstruction Technique (OS-SART) to generate pseudo-CBCTs (ps-CBCTs). The reconstruction parameters were taken directly from clinical practice for CBCT reconstruction: Detector size was , Detector pixel size was , Object size and resolution were the same as the imported cropped pCT, Distance Detector Source was set to 1500mm, and Distance Detector Object was 1000mm with Center offset=[-160mm, 0]. 500 projections were generated over a 360-degree rotation.
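The artifact addition and the angular sampling described above can be sketched as follows. This is an illustrative sketch only: min-max normalization is assumed for the scaling to [0,1], the projections are assumed to be evenly spaced, and the function names are hypothetical. The forward projection and OS-SART reconstruction themselves are not reproduced here.

```python
import math


def add_and_rescale(pct, artifact):
    """Add an artifact map to a pCT (flattened lists) and min-max scale to [0, 1]."""
    summed = [p + a for p, a in zip(pct, artifact)]
    lo, hi = min(summed), max(summed)
    if hi == lo:
        return [0.0 for _ in summed]  # constant image: nothing to scale
    return [(s - lo) / (hi - lo) for s in summed]


def projection_angles(n_projections=500, arc_degrees=360.0):
    """Evenly spaced gantry angles in radians, excluding the duplicate end point."""
    step = math.radians(arc_degrees) / n_projections
    return [i * step for i in range(n_projections)]
```

With 500 projections over 360 degrees, the angular step under this even-spacing assumption is 0.72 degrees.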
ps-CBCT evaluation methods: ps-CBCTs were evaluated against their ground-truth CBCTs, first qualitatively by comparing their histogram distributions and then quantitatively using four similarity metrics, i.e. structural similarity index metric (SSIM), root mean square error (RMSE), Cross Correlation (CC) and Universal Quality Index (UQI). The above pipeline provides a quick way of mapping the artifacts, which are random, and the deep learning models appear insensitive to the accuracy with which the artifacts are mapped to the images. In addition, two CBCT experts who were shown the pseudo-CBCT images could not distinguish them from real CBCT images; the accompanying quantitative measures comparing real and pseudo-CBCT images are given in the supplementary material (Figure 7).
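As an illustration of three of these metrics, the sketch below computes RMSE, CC and UQI on flattened intensity lists. It assumes that CC denotes the normalized (Pearson) cross-correlation and that UQI is evaluated globally over the whole image rather than in sliding windows; SSIM is omitted because it is window-based. The function names are hypothetical and this is not the evaluation code used in this work.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _moments(x, y):
    """Means, variances and covariance of two equally long intensity lists."""
    mx, my = _mean(x), _mean(y)
    vx = _mean([(a - mx) ** 2 for a in x])
    vy = _mean([(b - my) ** 2 for b in y])
    cov = _mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return mx, my, vx, vy, cov


def rmse(x, y):
    """Root mean square error between two images."""
    return math.sqrt(_mean([(a - b) ** 2 for a, b in zip(x, y)]))


def cross_correlation(x, y):
    """Normalized cross-correlation (Pearson coefficient)."""
    _, _, vx, vy, cov = _moments(x, y)
    return cov / math.sqrt(vx * vy)


def uqi(x, y):
    """Global Universal Quality Index: 4*cov*mx*my / ((vx+vy)*(mx^2+my^2))."""
    mx, my, vx, vy, cov = _moments(x, y)
    return 4.0 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))
```

For two identical images these return RMSE=0, CC=1 and UQI=1; a constant intensity offset leaves CC at 1 but lowers UQI, which is why the metrics are reported together.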
2.3 3D-UNet model
We trained a 3D-UNet deep learning model (Fig. 2) on the ps-CBCTs. A multi-objective loss function combining the Dice coefficient and binary cross-entropy was used, and convolutional neural networks (CNNs) extracted features from the ps-CBCTs. The CNN features were then concatenated with the deconvolved features as feedback (Fig. 2). In a conventional UNet, the convolutions that extract feature maps operate at the same image resolution within each stage, and the feature maps from the last layer of each encoder stage are sent to the decoder and concatenated with the deconvolved feature maps. In this work, we kept the feature size/resolution unchanged up to the second layer of each stage but down-sampled the convolved feature maps at the last layer. We then sent the higher-resolution feature map from the second layer to the decoder for concatenation. The reason is that, unlike structures such as tumors or the heart, which are roughly spherical, the esophagus is a long, curved, tube-shaped structure, so covering all of its connected parts when passing the feedback to the decoder is important. Therefore, while we down-sample the final layer of each encoder CNN to extract more details and pass the result to the next encoder CNN, we pass the second layer (not the last) to the decoder to make sure the entire tube-shaped structure is included. The feature maps illustrated in Fig. 2 depict this concept.
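A multi-objective Dice and binary cross-entropy loss of the kind described above can be sketched as follows. The equal weighting of the two terms, the smoothing constant and the clipping value are assumptions made for illustration, since the text does not specify them, and the function name is hypothetical.

```python
import math


def dice_bce_loss(probs, targets, smooth=1.0, eps=1e-7):
    """Sum of soft Dice loss and binary cross-entropy over flattened voxels.

    probs   -- predicted foreground probabilities in [0, 1]
    targets -- binary ground-truth labels (0 or 1)
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)

    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    bce /= len(probs)

    # Assumed equal weighting of the two objectives.
    return (1.0 - dice) + bce
```

The Dice term rewards overlap of the whole structure, which helps with the strong class imbalance of a thin organ, while the cross-entropy term provides a per-voxel gradient.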
In addition to the commonly used weight regularization and dropout to avoid overfitting, using the ps-CBCT variations increased the training set size, further lowering the risk of overfitting. Moreover, we applied geometric data augmentation including sharpening (to emphasize high-frequency components), sigmoid contrast (to emphasize the soft-tissue component), scale (1.2)/shear (8 degrees) and scale (0.8)/rotate (5 degrees).
The ps-CBCTs were split into 70/30 training (n=42x7) and testing cases (n=18x7) and fed to a 3D-UNet for esophagus segmentation using pCT esophagus contours as ground-truth. The model was externally validated on the weekly CBCTs (week 1 to week 6) and pCTs using Dice Similarity Coefficient (DSC) and Sensitivity of overlap between the physician-contoured and UNet-segmented esophagus.
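The split and the two overlap metrics can be sketched as below. The sketch assumes the split is made at the patient level, so that all seven ps-CBCT variants of a patient fall on the same side (as the counts 42x7 and 18x7 suggest); the random seed and function names are hypothetical.

```python
import random


def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patients and split them into training and testing groups."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


def expand_variants(patient_ids, n_variants=7):
    """One (patient, variant) case per ps-CBCT variant of each patient."""
    return [(pid, k) for pid in patient_ids for k in range(n_variants)]


def dice_coefficient(pred, truth):
    """DSC = 2|A and B| / (|A| + |B|) for flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * tp / total


def sensitivity(pred, truth):
    """Fraction of ground-truth voxels recovered: TP / (TP + FN)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    positives = sum(1 for t in truth if t)
    return 1.0 if positives == 0 else tp / positives
```

Splitting by patient rather than by image keeps variants of the same anatomy out of both sets at once, so the testing DSC is not inflated by near-duplicate cases.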
For comparison, in addition to the ps-CBCT model, we also trained the same architecture on the high-quality pCT images and on the week1 CBCT images (with geometric data augmentation only) to assess how realistic the in silico data augmentation is. Besides evaluating each of these two models on its own testing cases, we validated the pCT model on week1 CBCT images, and the week1 CBCT model on pCT images as well as on the weekly CBCTs.
3 Results and Discussion
The best reconstructed ps-CBCTs (α=0.5, β=1) had average SSIM=0.89, RMSE=0.05, CC=0.97 and UQI=0.95 in the cohort, and the worst (α=0, β=0.5) had SSIM=0.44, RMSE=0.14, CC=0.81 and UQI=0.74 (Fig. 3) (see supplementary material for the histogram comparisons).
3.1 Quantitative evaluation of ps-CBCT images
Different variations of ps-CBCTs from the highest (top-right, ps-CBCT 1, SSIM=0.91) to the lowest (bottom-right, ps-CBCT 5, SSIM=0.76) similarity with the week1 CBCT are shown in Fig. 3 for a case along with their pCT and ground-truth week1 CBCTs. These ps-CBCTs mimic realistic scatter artifacts which could actually be seen in the weekly CBCT acquisitions in clinic.
3.2 Esophagus segmentation results
Fig. 4 shows segmented (red) vs. the ground-truth esophagus contours (green) of two typical testing cases, on pCT, week1 and week2 CBCTs and their ps-CBCT with the highest similarity to the ground-truth week1 CBCT. The model could segment esophagus on both high quality pCTs as well as low-quality week1 & week2 CBCTs accurately.
Validation results for all three models, i.e. the ps-CBCT, week1 CBCT and pCT models, are presented in Table 1, and Fig. 5 shows segmentation results for each model. Interestingly, the pCT model could segment pCT images with a high DSC of 0.76 but failed to segment w1 CBCTs (DSC 0.63), and the w1 CBCT model had moderate to low accuracy when segmenting weekly CBCTs as well as pCTs. Using the ps-CBCT model, however, the average DSCs for segmenting ps-CBCTs, weekly CBCTs and pCTs were 0.74±0.03, 0.72±0.05 and 0.77±0.04, respectively. Average sensitivities were 0.79±0.07 (ps-CBCT), 0.83±0.07 (w1 CBCTs) and 0.78±0.08 (pCTs). The segmentation results of the ps-CBCT model on the breath-hold CBCT cases were DSC=0.63 and DSC=0.74 for the two patients, respectively (see supplementary material, Figure 6). The ps-CBCT model, trained on realistic in silico data, not only segmented the esophagus on pCTs with high accuracy but also segmented the esophagus on later-week CBCTs with promising accuracy. This suggests that the model has the potential to segment any OAR on both weekly CBCTs and pCTs.
With respect to the limitations of our pseudo-CBCT generation, we maintained the same FOV as the original clinical CBCT image during reconstruction. This could reduce the image quality at the superior/inferior borders due to data truncation, but it did not appear to affect the network training for esophagus segmentation.
4 Conclusion and Future Work
A 3D-UNet model trained on more realistic artifact-induced pCTs could segment the esophagus on both weekly CBCTs and pCTs with high accuracy for longitudinal imaging studies. The model has the potential to segment any OAR on CBCT/pCT and can therefore be used as a cross-modality segmentation tool to provide image guidance. This shows that our in silico data augmentation spans the realistic noise/artifact spectrum across patient CBCT/pCT data and can generalize well across modalities. In the future, we plan to extend our method to other soft-tissue organs and to include CBCTs with motion artifacts in our training set by mimicking the motion from 4DCT images of these patients.
-  Botros, M., Gore, E., Johnstone, C., Knechtges, P., Paulson, E.: Mr simulation for esophageal cancer: Imaging protocol and gross tumor volume comparison between mri, ct, and pet/ct. International Journal of Radiation Oncology* Biology* Physics 93(3), S191 (2015)
-  Chen, S., Yang, H., Fu, J., Mei, W., Ren, S., Liu, Y., Zhu, Z., Liu, L., Li, H., Chen, H.: U-net plus: Deep semantic segmentation for esophagus and esophageal cancer in computed tomography images. IEEE Access 7, 82867–82877 (2019)
-  Cohen, J.P., Luck, M., Honari, S.: Distribution matching losses can hallucinate features in medical image translation. In: International conference on medical image computing and computer-assisted intervention. pp. 529–536. Springer (2018)
-  Dong, X., Lei, Y., Wang, T., Thomas, M., Tang, L., Curran, W.J., Liu, T., Yang, X.: Automatic multiorgan segmentation in thorax ct images using u-net-gan. Medical physics 46(5), 2157–2168 (2019)
-  Feng, X., Qing, K., Tustison, N.J., Meyer, C.H., Chen, Q.: Deep convolutional neural network for segmentation of thoracic organs-at-risk using cropped 3d images. Medical physics 46(5), 2169–2180 (2019)
-  Feulner, J., Zhou, S.K., Hammon, M., Seifert, S., Huber, M., Comaniciu, D., Hornegger, J., Cavallaro, A.: A probabilistic model for automatic segmentation of the esophagus in 3-d ct scans. IEEE transactions on medical imaging 30(6), 1252–1264 (2011)
-  Grosgeorge, D., Petitjean, C., Dubray, B., Ruan, S.: Esophagus segmentation from 3d ct data using skeleton prior-based graph cut. Computational and mathematical methods in medicine 2013 (2013)
-  Radiation Therapy Oncology Group, et al.: Atlases for organs at risk (oars) in thoracic radiation therapy. 2011 (2015)
-  Hao, Z., Liu, J., Liu, J.: Esophagus tumor segmentation using fully convolutional neural network and graph cut. In: Chinese Intelligent Systems Conference. pp. 413–420. Springer (2017)
-  Kim, J., Nuyts, J., Kyme, A., Kuncic, Z., Fulton, R.: A rigid motion correction method for helical computed tomography (ct). Physics in Medicine & Biology 60(5), 2047 (2015)
-  Riyahi, S., Choi, W., Liu, C.J., Nadeem, S., Tan, S., Zhong, H., Chen, W., Wu, A.J., Mechalakos, J.G., Deasy, J.O., et al.: Quantification of local metabolic tumor volume changes by registering blended pet-ct images for prediction of pathologic tumor response. In: Data Driven Treatment Response Assessment and Preterm, Perinatal, and Paediatric Image Analysis, pp. 31–41. Springer (2018)
-  Sorin, V., Barash, Y., Konen, E., Klang, E.: Creating artificial images for radiology applications using generative adversarial networks (gans)–a systematic review. Academic Radiology (2020)
-  Stark, J.A.: Adaptive image contrast enhancement using generalizations of histogram equalization. IEEE Transactions on image processing 9(5), 889–896 (2000)
-  Thor, M., Deasy, J., Iyer, A., Bendau, E., Fontanella, A., Apte, A., Yorke, E., Rimner, A., Jackson, A.: Toward personalized dose-prescription in locally advanced non-small cell lung cancer: Validation of published normal tissue complication probability models. Radiotherapy and Oncology 138, 45–51 (2019)
-  Trullo, R., Petitjean, C., Nie, D., Shen, D., Ruan, S.: Fully automated esophagus segmentation with a hierarchical deep learning approach. In: 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA). pp. 503–506. IEEE (2017)
-  Tustison, N.J., et al.: Explicit b-spline regularization in diffeomorphic image registration. Frontiers in neuroinformatics 7, 39 (2013)
-  Van Rossum, P., Van Lier, A., Lips, I., Meijer, G., Reerink, O., van Vulpen, M., Lam, M., van Hillegersberg, R., Ruurda, J.: Imaging of oesophageal cancer with fdg-pet/ct and mri. Clinical radiology 70(1), 81–95 (2015)
-  Velec, M., Moseley, J.L., Eccles, C.L., Craig, T., Sharpe, M.B., Dawson, L.A., Brock, K.K.: Effect of breathing motion on radiotherapy dose accumulation in the abdomen using deformable registration. International Journal of Radiation Oncology* Biology* Physics 80(1), 265–272 (2011)
-  Wang, G., Jiang, M.: Ordered-subset simultaneous algebraic reconstruction techniques (os-sart). Journal of X-ray Science and Technology 12(3), 169–177 (2004)
-  Xie, S., Yang, C., Zhang, Z., Li, H.: Scatter artifacts removal using learning-based method for cbct in igrt system. IEEE Access 6, 78031–78037 (2018)
-  Yang, J., Haas, B., Fang, R., Beadle, B.M., Garden, A.S., Liao, Z., Zhang, L., Balter, P., et al.: Atlas ranking and selection for automatic segmentation of the esophagus from ct scans. Physics in Medicine & Biology 62(23), 9140 (2017)
-  Yang, J., Veeraraghavan, H., Armato III, S.G., Farahani, K., Kirby, J.S., Kalpathy-Kramer, J., van Elmpt, W., Dekker, A., Han, X., Feng, X., et al.: Autosegmentation for thoracic radiation treatment planning: A grand challenge at aapm 2017. Medical physics 45(10), 4568–4581 (2018)
-  Zhang, H., Ouyang, L., Huang, J., Ma, J., Chen, W., Wang, J.: Few-view cone-beam ct reconstruction with deformed prior image. Medical physics 41(12), 121905 (2014)
-  Zhi, S., Duan, J., Cai, J., Mou, X.: Artifacts reduction method for phase-resolved cone-beam ct (cbct) images via a prior-guided cnn. In: Medical Imaging 2019: Physics of Medical Imaging. vol. 10948, p. 1094828. International Society for Optics and Photonics (2019)