1 Introduction
Chemoradiation (combining chemotherapy and radiotherapy) is one of the major regimens for treating cancer. The most significant bottleneck in the treatment workflow is the delineation of treatment regions in the radiotherapy (RT) plan on patients' tomographic images. This process can take experienced radiation oncologists more than an hour. Since the time at which treatment begins has been shown to affect patient outcomes, we argue that resolving this bottleneck is critical for patients. We focus on the RT treatment of esophageal cancer as our task. With a death rate nearly matching its incidence, fewer than 20% of esophageal cancer patients are diagnosed at surgically resectable stage I disease, which makes chemoradiation the primary treatment.
Fortunately, because of the adequate visual cues in tomography data, the delineation problem can be formulated as one for convolutional neural networks to solve. Instead of conventional 2D and 3D methods, we propose a novel convolutional gated graph network to tackle this task more efficiently. Specifically, given a sequence of PET-CT images, we first use an encoder to extract deep representations of the images. Then we design a graph-based mechanism for our propagator to bridge communication between these image slices and propagate information among them. (Throughout this paper, we use the terms "image" and "slice" interchangeably.) Finally, a decoder aggregates the high-resolution feature maps from the encoder and the slice representations produced by the propagator to predict the final segmentation map. The whole process is trained in an end-to-end manner.
We summarize our contributions as below:

We propose to use a graph as the representation of the relationships between slices in 3D medical images, characterized by an adjacency matrix learned by our model.

We further design a propagator for slices to exchange information in feature space based on the graph representation. Experimental results show that our method significantly outperforms baseline methods.

Our work supports an interactive setting: by editing one of the slices, the predictions of neighboring slices are improved according to the user input. Such a setting is better suited to the clinical scenario and more efficient for doctors when refining predictions.

We collect a new Esophageal Cancer Radiotherapy Target Treatment Contouring dataset. To the best of our knowledge, it is the esophageal cancer radiotherapy contouring dataset with the largest number of patients, annotated by one or more qualified radiation oncologists.
2 Related Work
Neural Network on Graphs.
There have been several approaches that apply neural networks to graph-structured data. One is to perform graph convolution in the spectral domain, while another applies a neural network to each node in the graph. Among them, Acuna et al. acuna2018efficient used a Gated Graph Neural Network (GGNN) li2015gated to refine each vertex in a predicted polygon segmentation mask. Yan et al. yan2018spatial used spatial-temporal graph convolutional networks to process skeleton graphs for action recognition. We adapt the idea of li2015gated to our medical image data by using an adjacency matrix to construct the graph, through which information is propagated between image slices.
3D Biomedical Image Segmentation.
In clinical practice, especially in oncology, physicians heavily rely on 3D medical images to make diagnoses or treatment plans. Various CNN-based image segmentation methods have been developed to provide direct or indirect assistance in medical procedures and related biological research. Havaei et al. HAVAEI201718 proposed a model whose input is 2D image patches from 3D brain MRI volumes to segment brain tumor regions. 3DUNet and VNet used 3D convolutional kernels for the volumetric segmentation of Xenopus kidneys and prostates, respectively. Chen2016CombiningFC and tseng2017joint leveraged the combination of CNNs and recurrent units to exploit the intra- and inter-slice contexts of image volumes of 3D neuron structures and the human brain. In our work, we design an information propagator to exchange information between slices in the 3D medical image volume, which is demonstrated to predict RT planning contours of good quality.
Interactive Segmentation. To generate an accurate treatment plan that reflects radiologists' knowledge and experience, cooperation with experts is one of the most crucial issues in our application. In fact, user-assisted segmentation for 3D medical images has been studied for years. Existing methods include a modified GrowCut algorithm zhu2014effective, an interactive network amrehn2017ui, and a weighted loss function that can incorporate optional user scribble input wang2018interactive. While these previous methods require the user to edit each predicted image, our model aims to learn essential information in a high-dimensional feature space and then propagate those features to refine the predictions of neighboring slices.

3 Approach
Notations
Given n stacked PET-CT images with height h and width w, we compile them to form a sequence of slices S = {s_1, ..., s_n}. We then feed the sequence of stacked images to our model to produce ŷ, the treatment-region prediction of the sequence's middle image. Since our model is an encoder-decoder architecture, described later in Sec. 3.1, we are able to obtain a deep representation h_i for slice i in the sequence and H = {h_1, ..., h_n} for the whole sequence. In the following sections, the convolution operation will be denoted as ∗ and may be followed by a bias term b and an activation function σ. We also define a function φ(·) to reshape its high-dimensional inputs to a matrix, and its inverse mapping φ⁻¹(·).

3.1 Model Architecture
Overview.
For each sequence, our model first uses a U-Net-like contraction part to obtain feature maps of the images at various resolutions (see Figure 2 (a)). Then our Gated Graph Propagator (GGP, elaborated in the next paragraph) propagates spatial information throughout all slices in the sequence (see Figure 2 (b)). Subsequently, the decoder combines the deep representations of the slices with high-resolution features from the skip connections of our encoder to perform upsampling (see Figure 2 (c)). Finally, a pixel-wise softmax outputs the final segmentation mask as our prediction. We use the Dice loss to train our network.
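To make the training objective concrete, here is a minimal NumPy sketch of the soft Dice loss (a common formulation; the paper does not spell out its exact variant, and the `eps` smoothing term is our assumption):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩G| / (|P| + |G|).

    pred:   predicted foreground probabilities, shape (H, W)
    target: binary ground-truth mask, shape (H, W)
    """
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

# A perfect prediction gives a loss near 0; a fully wrong one, near 1.
mask = np.array([[0, 1], [1, 0]], dtype=float)
print(round(dice_loss(mask, mask), 4))  # → 0.0
```

The `eps` term keeps the loss defined when both masks are empty, which happens if a sampled patch contains no tumor.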
Gated Graph Propagator.
We propose to use GGNN as the backbone of our GGP. Given the slice representations H = {h_1, ..., h_n} produced by our encoder, we set up a learnable adjacency matrix A ∈ R^(|V| × |V| × |E|), where V is the set of indices for each slice in the sequence and E is the set of indices for different types of edges. A can be used to establish and weight the relationships between slices. For each pair of slices i and j, there will be a type-e convolutional kernel W_e to extract essential information from h_j, weighted by A[i, j, e]:

m_{i,j}^e = σ(W_e ∗ h_j + b_e),   (1)

a_i = Σ_j Σ_e A[i, j, e] · m_{i,j}^e,   (2)

where ∗ denotes convolution, b_e is a bias term, and σ is an activation function.
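As a rough illustration of this propagation step (not the authors' implementation), the NumPy sketch below weights per-slice messages by a learnable adjacency matrix. A dense matrix multiply stands in for the convolutional kernel, a single edge type is assumed, and all sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8                      # slices per sequence, feature dimension (assumed)

H = rng.normal(size=(n, d))      # per-slice representations h_1..h_n
A = rng.uniform(size=(n, n))     # learnable adjacency matrix (one edge type)
W = rng.normal(size=(d, d))      # stand-in for the convolutional kernel W_e
b = np.zeros(d)                  # bias term b_e

# Message from each slice j: transform h_j, then weight by A[i, j] and sum over j.
M = np.tanh(H @ W + b)           # per-slice messages (1x1-convolution analogue)
Agg = A @ M                      # aggregated message a_i for every slice i

assert Agg.shape == (n, d)
```

In the full model the messages would come from 2D convolutions over feature maps and A would carry one weight per edge type; the matrix form above only shows how the adjacency weights mix information across slices.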
Same as li2015gated, we use a Gated Recurrent Unit-like mechanism to update h_i.

Interactive Setting. Furthermore, we collect the representations of a sequence of slices, with the representation of one slice k replaced by the user-input representation h_k^u, to form the propagator for the interactive setting. We adopt the features from the last convolutional layer as the representations (see Figure 2 (d)). However, since h_k^u cannot be obtained directly, we approximate it by iteratively updating the original features from the last convolutional layer until the prediction is close enough to the user input at inference time:
h_k^u ← h_k^u − η · ∂L(softmax(f(h_k^u)), y_u) / ∂h_k^u,   (7)

where η denotes the learning rate, y_u is the user input, f is the last convolutional layer, softmax is the softmax function, and L is the NLL loss. Once h_k^u is obtained, we then train with a hand-crafted adjacency matrix, where the value of each entry is assigned according to its distance to slice k. The other slices in the sequence can then update their own representations with the information from h_k^u.
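The iterative feature update amounts to gradient descent on a frozen prediction head. This toy NumPy example (our construction, with a linear layer standing in for the last convolutional layer and a single "pixel" with two classes) drives the features until the prediction matches the user input:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d, c = 8, 2                      # feature dimension, number of classes (assumed)
W = rng.normal(size=(d, c))      # frozen last layer (linear stand-in for the conv)
h = rng.normal(size=d)           # original features for the edited slice
y = np.array([0.0, 1.0])         # user input: one-hot label for a single pixel
lr = 0.5

# Gradient of NLL(softmax(h @ W), y) with respect to h is W @ (p - y).
for _ in range(200):
    p = softmax(h @ W)
    h = h - lr * W @ (p - y)

assert softmax(h @ W)[1] > 0.99  # the prediction now agrees with the user input
```

In the real system h would be a full feature map and the gradient would come from backpropagation through the actual convolutional head; the loop above only demonstrates the fixed-weight, feature-only optimization of Eq. (7).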
4 Experiments
Dataset.
Our Esophageal Cancer Radiotherapy Target Treatment Contouring dataset, which contains PET-CT/RT-CT images of 81 patients treated between 2015 and 2017, was collected in cooperation with the Chang Gung Memorial Hospital, Linkou branch. The scans have slice thickness ranging from 2.5 to 5 mm. PET and CT are treated as two input modalities; the segmentation targets for RT are the Gross Tumor Volume (GTV), Clinical Target Volume (CTV), and Planning Target Volume (PTV).
Experimental Settings and Results.
To evaluate our method, we chose the most commonly used CNN-based segmentation frameworks, 2D U-Net UNet and 3D U-Net 3DUNet, as our baselines. Since we focus on predicting accurate radiotherapy target contours, we trained and tested on valid slices that contain tumors. That is, given the volume where oncologists want to deliver radiation, we are able to segment the GTV, CTV, and PTV regions precisely. For quantitative comparison, following the metrics in HAVAEI201718, we use DSC, Sensitivity, and Specificity, and run five-fold cross-validation. Table 1 and Figure 2 show that our non-interactive method outperforms all the baselines both qualitatively and quantitatively. In the interactive setting, the results are evaluated on the sequences selected by the minimum median DSC of the GTV for each patient, excluding the slice whose features we reconstructed. As shown in Table 2, the results are further improved after the interaction.
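For reference, a small NumPy sketch of the three evaluation metrics on binary masks (standard definitions; it assumes non-empty prediction and ground-truth masks so the denominators are nonzero):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """DSC, sensitivity, and specificity for binary masks.

    Standard confusion-matrix definitions; assumes neither mask is empty.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)       # true positives
    fp = np.sum(pred & ~gt)      # false positives
    fn = np.sum(~pred & gt)      # false negatives
    tn = np.sum(~pred & ~gt)     # true negatives
    dsc = 2 * tp / (2 * tp + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return dsc, sensitivity, specificity

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
print(segmentation_metrics(pred, gt))  # DSC = 2/3, sensitivity = 1.0, specificity = 2/3
```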
5 Conclusions
In this paper, we propose an end-to-end network for radiotherapy target contouring. We combine an encoder-decoder with a convolutional Gated Graph Propagator, featuring a learnable adjacency matrix for exchanging information among neighboring slices. In the interactive setting, we apply an iterative method for user-input-based feature reconstruction of a given slice, further enhancing the segmentation results of its neighbors. The experimental results on our Esophageal Cancer Radiotherapy Target Treatment Contouring dataset show our system's capability of predicting RT planning contours interactively and efficiently, and thus its suitability for clinical scenarios.
References

(1) D. Acuna, H. Ling, A. Kar, and S. Fidler. Efficient interactive annotation of segmentation datasets with Polygon-RNN++. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.

(2) M. Amrehn, S. Gaube, M. Unberath, F. Schebesch, T. Horz, M. Strumia, S. Steidl, M. Kowarschik, and A. Maier. UI-Net: Interactive artificial neural networks for iterative image segmentation based on a user model. arXiv:1709.03450, 2017.

(3) J. Chen, L. Yang, Y. Zhang, M. S. Alber, and D. Z. Chen. Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation. In Advances in Neural Information Processing Systems, 2016.

(4) Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 424–432. Springer, 2016.

(5) M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, and H. Larochelle. Brain tumor segmentation with deep neural networks. Medical Image Analysis, 2017.

(6) Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel. Gated graph sequence neural networks. In Proceedings of the International Conference on Learning Representations, 2015.

(7) F. Milletari, N. Navab, and S. Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In International Conference on 3D Vision, pages 565–571, 2016.

(8) O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.

(9) K.-L. Tseng, Y.-L. Lin, W. Hsu, and C.-Y. Huang. Joint sequence learning and cross-modality convolution for 3D biomedical segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.

(10) G. Wang, W. Li, M. A. Zuluaga, R. Pratt, P. A. Patel, M. Aertsen, T. Doel, A. L. David, J. Deprest, S. Ourselin, et al. Interactive medical image segmentation using deep learning with image-specific fine-tuning. IEEE Transactions on Medical Imaging, 2018.

(11) S. Yan, Y. Xiong, and D. Lin. Spatial temporal graph convolutional networks for skeleton-based action recognition. arXiv:1801.07455, 2018.

(12) L. Zhu, I. Kolesov, Y. Q. Gao, R. Kikinis, and A. Tannenbaum. An effective interactive medical image segmentation method using fast GrowCut. In MICCAI Interactive Medical Image Computing Workshop, 2014.