The implementation and availability of high-throughput computing have made it possible to extract innumerable features from medical imaging datasets. These extracted features can reveal disease-related characteristics that relate to prognosis. The process of converting visual imaging data into mineable quantitative features is referred to as radiomics. Radiomics is an emerging field of translational research in medical imaging, where the modalities include digital radiography, magnetic resonance imaging (MRI), computed tomography (CT) and combined positron emission tomography-computed tomography (PET-CT). The range of medical imaging modalities is wide and, in essence, these modalities provide information about structure, physiology, pathology, biochemistry and pathophysiology. PET-CT, for example, combines the sensitivity of PET in detecting regions of abnormal function with the specificity of CT in depicting the underlying anatomy where the abnormal function occurs. Multi-modality PET-CT is therefore regarded as the imaging modality of choice for the diagnosis, staging and treatment-response monitoring of many cancers. Conventional radiomics studies mainly focus on encoding regions of interest (e.g., tumors) with hand-crafted features such as intensity, texture and shape. These features are used to build conventional predictive models such as multivariable statistical analysis, support vector machine (SVM) and random forest. Unfortunately, these methods rely on prior knowledge for hand-crafting the image features and on tuning a large number of parameters to build the predictive models.
Radiomics methods based on convolutional neural networks (CNNs) are regarded as the state of the art because they can learn high-level semantic image information in an end-to-end fashion. CNN-based radiomics methods were mainly designed for 2D single-modality images such as CT [11, 4] and MRI. The few methods that attempted to fuse multi-modality images focused on fusing image features that were extracted separately from the individual modalities [13, 14, 16]. In addition, these methods required human expertise to design dataset-specific architectures, e.g., the number of convolutional layers and the layer at which to fuse the multi-modality image features. Architecture design and optimization require a large amount of domain knowledge, for example in validating the architecture performance and tuning the hyper-parameters. Neural architecture search (NAS) has recently been proposed to simplify architecture design by automatically searching for an optimal network architecture for a given dataset. NAS thus reduces manual input and the reliance on prior knowledge. Investigators have applied NAS to single-modality medical imaging tasks, but the main focus has been on segmentation [1, 5].
We propose a multi-modality NAS method (MM-NAS) to search for a multi-modality CNN architecture for use in PET-CT radiomics. Our contributions, when compared to existing methods, include: (i) the ability to build an optimal, fully automated radiomics CNN architecture; and (ii) enabling an optimal fusion of PET-CT images for radiomics. Our method finds various fusion modules, e.g., fusion via different network operations (convolution, pooling, etc.) at different stages of the network. These searched fusion modules provide more options for integrating the complementary PET and CT data. We outline how our approach can predict the development of distant metastases (DM) in patients with soft-tissue sarcomas (STSs). STSs include slow-growing, well-differentiated tumors, aggressive tumors that grow rapidly and spread to other organs (i.e., DM), and intermediate tumors that behave between the two extremes [2, 21]. The early identification of patients who may develop metastatic disease may contribute to improved care and better patient outcomes.
We used a public PET-CT STS dataset from The Cancer Imaging Archive (TCIA) repository [19, 3]. The dataset has 51 multi-modality PET-CT scans from 51 patients with pathology-proven STSs. DM were confirmed via biopsy or diagnosed by an expert clinician. Three patients without clear metastasis information were excluded. Thus, our dataset consists of 48 studies from 48 patients, half of whom developed DM.
2.2 Neural Architecture Search Setting
We followed existing NAS methods [23, 15, 17] and focused on searching for different computational cells (normal and reduction) to improve computational efficiency. The computational cells are the basic units that can be stacked multiple times to form a CNN. The NAS workflow is as follows: (i) based on the given training data, search for the optimal cell structures that can form a CNN; and (ii) train the searched CNN on the training data and then evaluate it on the testing data. In our MM-NAS (as shown in Fig. 1), every cell is regarded as a directed acyclic graph consisting of two inputs, one output and several ordered nodes. Our MM-NAS has normal and reduction cells. The input and output feature maps of a normal cell have the same dimensions. The reduction cell doubles the number of channels and halves the size of the input feature map. A stem block consists of a 3D convolutional layer and a batch normalization layer and is used for input image transitions. In our method, the outputs of the PET and CT stem blocks are separately fed into the first normal cell to facilitate the fusion process. The output feature maps of the first normal cell then flow into the first reduction cell together with the sum of the PET and CT images, which is also processed by one stem block. Each of the remaining reduction cells uses the output feature maps of the previous two layers as input. For DM prediction, the output feature maps of the last reduction cell are fed into two convolutional layers and one fully connected layer for classification.
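The effect of stacking the two cell types on the feature-map dimensions can be traced with a short sketch. It is illustrative only: the stem width of 16 channels, the use of three reduction cells, the 112 × 112 × 144 input volume and the reading of "reduces the input feature map by half" as halving every spatial axis are assumptions, and the function names are hypothetical rather than taken from the MM-NAS implementation.

```python
from typing import List, Tuple

# (channels, axis_1, axis_2, axis_3) of a 3D feature map
Shape = Tuple[int, int, int, int]


def normal_cell(shape: Shape) -> Shape:
    """A normal cell keeps the channel count and spatial size unchanged."""
    return shape


def reduction_cell(shape: Shape) -> Shape:
    """A reduction cell doubles the channels and halves each spatial axis.

    Ceiling division for odd sizes is an assumption of this sketch.
    """
    channels, a, b, c = shape
    return (2 * channels, (a + 1) // 2, (b + 1) // 2, (c + 1) // 2)


def trace_shapes(stem_channels: int,
                 volume: Tuple[int, int, int],
                 num_reduction_cells: int) -> List[Shape]:
    """Return the feature-map shape after the first normal cell and
    after each subsequent reduction cell."""
    shape = normal_cell((stem_channels, *volume))
    shapes = [shape]
    for _ in range(num_reduction_cells):
        shape = reduction_cell(shape)
        shapes.append(shape)
    return shapes


# Hypothetical configuration: 16 stem channels, three reduction cells.
shapes = trace_shapes(stem_channels=16, volume=(112, 112, 144),
                      num_reduction_cells=3)
for stage, shape in enumerate(shapes):
    print(stage, shape)
```

Under these assumptions each reduction cell cuts the number of voxels per channel by a factor of eight while doubling the channels, which is why stacking reduction cells keeps 3D volumes tractable on a single GPU.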
2.3 Optimization Strategy
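Each intermediate node $x^{(j)}$ inside a cell is a feature map. We represent the searched operation on edge $(i,j)$ by $o^{(i,j)}$ and the vector of parameters over all optional operations by $\alpha^{(i,j)}$, where $\mathcal{O}$ denotes the set of optional operations and $\alpha_o^{(i,j)}$ denotes the parameter of operation $o$ on edge $(i,j)$. Each intermediate node is then computed as the sum over all of its predecessors:

$$x^{(j)} = \sum_{i<j} o^{(i,j)}\left(x^{(i)}\right)$$

The possible operations are mixed through a softmax function, which makes the search space continuous:

$$\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp\left(\alpha_o^{(i,j)}\right)}{\sum_{o' \in \mathcal{O}} \exp\left(\alpha_{o'}^{(i,j)}\right)}\, o(x)$$

Here the softmax weights $\exp(\alpha_o^{(i,j)}) / \sum_{o' \in \mathcal{O}} \exp(\alpha_{o'}^{(i,j)})$ denote a probability distribution over the operation set.

Let $\mathcal{L}_{train}$ and $\mathcal{L}_{val}$ denote the training and the validation loss. Both losses are determined not only by the architecture $\alpha$, which gathers the parameters $\alpha^{(i,j)}$ of the computational cells, but also by the weights $w$ of the network. The aim of the search is to find an architecture $\alpha$ that minimizes the validation loss $\mathcal{L}_{val}(w^{*}(\alpha), \alpha)$, where the weights $w^{*}(\alpha)$ associated with the architecture are obtained by minimizing the training loss:

$$\min_{\alpha} \; \mathcal{L}_{val}\left(w^{*}(\alpha), \alpha\right) \quad \text{s.t.} \quad w^{*}(\alpha) = \arg\min_{w} \; \mathcal{L}_{train}(w, \alpha)$$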
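The softmax mixing of candidate operations on an edge, and the summation over predecessor nodes, can be illustrated with a minimal sketch. It uses scalar stand-ins for the 3D operations so that it runs without any deep learning library. The three toy operations and all names (`CANDIDATE_OPS`, `mixed_op`, `node_output`) are hypothetical and serve only to show the mechanics of the continuous relaxation.

```python
import math
from typing import Callable, Dict, List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax; the outputs are positive and sum to one."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Toy scalar stand-ins for the real 3D operations (assumption of this sketch).
CANDIDATE_OPS: Dict[str, Callable[[float], float]] = {
    "skip": lambda x: x,
    "zero": lambda x: 0.0,
    "double": lambda x: 2.0 * x,
}


def mixed_op(x: float, alpha: Dict[str, float]) -> float:
    """Softmax-weighted sum of every candidate operation applied to x."""
    names = list(CANDIDATE_OPS)
    weights = softmax([alpha[name] for name in names])
    return sum(w * CANDIDATE_OPS[name](x) for w, name in zip(weights, names))


def node_output(predecessors: List[float],
                alphas: List[Dict[str, float]]) -> float:
    """An intermediate node sums the mixed operations over its predecessors."""
    return sum(mixed_op(x, a) for x, a in zip(predecessors, alphas))


# With equal architecture parameters every operation gets weight 1/3,
# so the mixed output for x = 3 is (3 + 0 + 6) / 3 = 3.
uniform = {"skip": 0.0, "zero": 0.0, "double": 0.0}
uniform_output = mixed_op(3.0, uniform)

# A strongly preferred operation dominates the mixture.
prefer_double = {"skip": 0.0, "zero": 0.0, "double": 10.0}
dominated_output = mixed_op(3.0, prefer_double)
```

Because the mixture is differentiable in the architecture parameters, they can be updated by gradient descent on the validation loss while the operation weights are updated on the training loss.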
2.4 Implementation Details
We implemented our MM-NAS in PyTorch. The input image size was fixed to 112 × 112 × 144. The operation set $\mathcal{O}$ for each cell includes 3D standard convolutions, 3D separable convolutions, 3D dilated convolutions, 3D max pooling, 3D average pooling, skip connections and zero operations. All operations have a stride of one (if applicable) and the kernel size of the pooling operations is 3. The kernel size of the convolutional operations can be either 3 or 5. Cross-entropy loss was used for training optimization during the architecture search step. The parameters of each cell were optimized by Adam with a learning rate of 0.0005, while the weights of the whole network were optimized by SGD with a learning rate of 0.0001; the batch size was set to 1. It took about 3 minutes to process one epoch with 40 PET-CT volumetric training images, and the best architecture was obtained at epoch 70 out of a total of 200 epochs. In the second step, the searched architecture was trained with cross-entropy loss and Adam, with a learning rate of 0.001 and a batch size of 1. It took 2 minutes to train one epoch, and the best model was obtained at approximately epoch 80 out of 200 epochs. All the experiments were conducted on an NVIDIA GeForce RTX 2080 Ti GPU with 11 GB of memory.
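The search space and search budget described above can be written down as a small configuration sketch. The operation names and the `SEARCH_CONFIG` layout are hypothetical, and the sketch assumes that each of the three convolution types is offered at both kernel sizes, which the text suggests but does not state explicitly.

```python
from itertools import product
from typing import List

CONV_TYPES = ("conv3d", "sep_conv3d", "dil_conv3d")
CONV_KERNEL_SIZES = (3, 5)
POOL_TYPES = ("max_pool3d", "avg_pool3d")
POOL_KERNEL_SIZE = 3


def build_operation_set() -> List[str]:
    """Enumerate the candidate operations available on every cell edge."""
    ops = [f"{kind}_k{k}" for kind, k in product(CONV_TYPES, CONV_KERNEL_SIZES)]
    ops += [f"{kind}_k{POOL_KERNEL_SIZE}" for kind in POOL_TYPES]
    ops += ["skip_connect", "zero"]
    return ops


# Search-step settings as reported in the text.
SEARCH_CONFIG = {
    "arch_optimizer": ("Adam", 5e-4),    # cell (architecture) parameters
    "weight_optimizer": ("SGD", 1e-4),   # network weights
    "batch_size": 1,
    "epochs": 200,
    "minutes_per_epoch": 3,              # reported as "about 3 minutes"
}

operation_set = build_operation_set()
search_hours = SEARCH_CONFIG["epochs"] * SEARCH_CONFIG["minutes_per_epoch"] / 60
print(len(operation_set), "candidate operations;",
      f"about {search_hours:.0f} h for a full 200-epoch search run")
```

Under these assumptions each edge chooses among ten candidate operations, and a full 200-epoch search at roughly 3 minutes per epoch takes about 10 hours on the reported hardware.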
2.5 Experimental Setup
We conducted the following experiments: (a) a comparison with the state-of-the-art radiomics methods; (b) a comparison of multi-modality CNNs with single-modality CNNs; and (c) a comparison of 2D CNNs with 3D CNNs for radiomics. In experiment (a), we compared our MM-NAS with the following methods: (i) HC+RF, which follows the conventional radiomics method [20] and uses hand-crafted (HC) features (e.g., intensity, solidity, skewness and grey-level co-occurrence matrix features) extracted from the tumor region with a random forest (RF) as the classifier for prediction; (ii) DLHN [4], a deep learning based method for head and neck cancer outcome prediction (e.g., DM, loco-regional failure and overall survival); and (iii) 3DMCL [16], a deep learning based 3D multi-modality collaborative learning method for DM prediction from PET-CT images. We used 6-fold cross-validation for the MM-NAS and the comparison methods. In each fold, we used 40 PET-CT images for training and the remaining 8 images for testing. Six well-established evaluation metrics were used for comparison: accuracy (acc.), sensitivity (sen.), specificity (spe.), precision (pre.), F1 score (F1) and area under the receiver-operating characteristic curve (AUC).
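The 6-fold protocol on 48 studies can be sketched as follows. This is a minimal illustration that assumes a simple shuffled split. The text does not state whether the folds were stratified by DM status or how the studies were assigned to folds, so the seed and the helper name `k_fold_splits` are hypothetical.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_samples: int, n_folds: int,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs for k-fold cross-validation."""
    if n_samples % n_folds != 0:
        raise ValueError("this sketch assumes equally sized folds")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    fold_size = n_samples // n_folds
    splits = []
    for fold in range(n_folds):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        splits.append((train, test))
    return splits


splits = k_fold_splits(n_samples=48, n_folds=6)
for fold, (train, test) in enumerate(splits):
    print(f"fold {fold}: {len(train)} training studies, {len(test)} test studies")
```

Every study appears in exactly one test fold, so pooling the test predictions over the six folds yields one prediction per study.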
The receiver-operating characteristic (ROC) curves are shown in Fig. 2. They show that our 2D MM-NAS performed better than the 2D CNN-based methods, and that our 3D MM-NAS outperformed the other 3D CNN-based comparison methods and achieved the best overall performance.
Tables 1 and 2 show that 3D MM-NAS achieved the best outcomes in all measures, with an AUC of 0.896, accuracy of 0.896, sensitivity of 0.917, specificity of 0.875, precision of 0.880 and F1 score of 0.898.
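Table 1. Performance of the proposed 2D and 3D MM-NAS.

| Method | Acc. | Sen. | Spe. | Pre. | F1 | AUC |
|---|---|---|---|---|---|---|
| 2D MM-NAS (Ours) | 0.750 | 0.833 | 0.667 | 0.714 | 0.769 | 0.711 |
| 3D MM-NAS (Ours) | 0.896 | 0.917 | 0.875 | 0.880 | 0.898 | 0.896 |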

Table 2. Performance of single-modality and multi-modality CNNs in 2D and 3D.

| Method | Acc. | Sen. | Spe. | Pre. | F1 | AUC |
|---|---|---|---|---|---|---|
| 2D CT CNN | 0.583 | 0.708 | 0.458 | 0.567 | 0.630 | 0.503 |
| 2D PET CNN | 0.729 | 0.542 | 0.917 | 0.867 | 0.667 | 0.656 |
| 2D PET-CT CNN | 0.729 | 0.792 | 0.667 | 0.703 | 0.745 | 0.698 |
| 2D MM-NAS (Ours) | 0.750 | 0.833 | 0.667 | 0.714 | 0.769 | 0.711 |
| 3D CT CNN | 0.667 | 0.667 | 0.667 | 0.667 | 0.667 | 0.684 |
| 3D PET CNN | 0.771 | 0.750 | 0.792 | 0.783 | 0.766 | 0.734 |
| 3D PET-CT CNN | 0.792 | 0.792 | 0.792 | 0.792 | 0.792 | 0.773 |
| 3D MM-NAS (Ours) | 0.896 | 0.917 | 0.875 | 0.880 | 0.898 | 0.896 |
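As a worked example of how the threshold-based metrics relate to one another, the sketch below computes them from a confusion matrix. The counts used (22 true positives, 2 false negatives, 21 true negatives, 3 false positives) are not reported in the paper. They are an inference that happens to reproduce the 3D MM-NAS values, assuming the test predictions are pooled over all folds and that 24 of the 48 studies are DM-positive. The AUC cannot be recovered from these counts and is omitted.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute the threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on DM-positive studies
    specificity = tn / (tn + fp)            # recall on DM-negative studies
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Inferred (not reported) counts consistent with the 3D MM-NAS results.
metrics = classification_metrics(tp=22, fn=2, tn=21, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With only 48 studies, one additional misclassification changes the accuracy by about 0.021, which is useful context when comparing methods whose scores differ by a few hundredths.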
Our main findings are that our MM-NAS: (i) performs better than the commonly used radiomics methods; (ii) derives optimal multi-modality radiomic features from PET-CT images; and (iii) removes the reliance on prior knowledge when building the optimal CNN architecture.
We attribute the improved performance of our MM-NAS to the search for the optimal computational cells, within the NAS, which allowed multi-modality image features to be fused at different stages of the network. Existing approaches often fuse the separately extracted feature maps after several convolutional or pooling layers (see Fig. 3). Our derived cell structure offers more freedom to integrate multi-modality images via various operations and connections, thus producing the optimal radiomic features to predict distant disease. The state-of-the-art method 3DMCL outperformed HC+RF and DLHN due to its collaborative learning of both pre-defined radiomic features and deep features, whereas our MM-NAS obtained better performance on all the evaluation metrics without feature handcrafting. Thus, the elimination of prior knowledge could contribute to better generalizability for applications in other radiomics studies.
The differences between the PET-CT CNN and the CNNs with PET or CT alone show the advantage of incorporating multi-modality information. Among the single-modality CNNs, PET-based methods outperformed CT-based methods. We ascribe this to the functional features from PET, which can characterize the tumor better than anatomical features from CT; the latter rely on changes in size, which are often a later development. Such features from PET could potentially uncover functional information that relates to the biological behavior of tumors.
The relatively poor performance of 2D CNNs when compared to 3D CNNs is expected. We attribute this to the fact that volumetric image features derived from 3D CNNs better capture spatial information, e.g., volumetric tumor shape and size. Spatial information is strongly correlated with DM prediction.
We have outlined a multi-modality neural architecture search method (MM-NAS) for PET-CT to predict the development of distant disease (metastases) in patients with STSs. Our method automatically searches for a multi-modality CNN-based radiomics architecture, which can then be used to fuse and derive the optimal PET-CT image features. Our results show that the PET-CT image features derived by our method were more predictive of distant metastases than those of the comparison methods.
- [1] (2019) Resource optimized neural architecture search for 3D medical image segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, P. Yap, and A. Khan (Eds.), Cham, pp. 228–236.
- [2] (1999) Multifactorial analysis of the survival of patients with distant metastasis arising from primary extremity sarcoma. Cancer: Interdisciplinary International Journal of the American Cancer Society 85 (2), pp. 389–395.
- [3] (2013) The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. Journal of Digital Imaging 26 (6), pp. 1045–1057.
- [4] (2019) Deep learning in head & neck cancer outcome prediction. Scientific Reports 9 (1), pp. 1–10.
- [5] (2019) Neural architecture search for adversarial medical image segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, P. Yap, and A. Khan (Eds.), Cham, pp. 828–836.
- [6] (2018) Neural architecture search: a survey. arXiv preprint arXiv:1808.05377.
- [7] (2019) Radiomics: data are also images. Journal of Nuclear Medicine 60 (Supplement 2), pp. 38S–44S.
- [8] (2017) Characterization of PET/CT images using texture analysis: the past, the present… any future?. European Journal of Nuclear Medicine and Molecular Imaging 44 (1), pp. 151–165.
- [9] (2018) Deep learning for lung cancer prognostication: a retrospective multi-cohort radiomics study. PLoS Medicine 15 (11), pp. e1002711.
- [10] (2010) Machine learning study of several classifiers trained with texture analysis features to differentiate benign from malignant soft-tissue tumors in T1-MRI images. Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine 31 (3), pp. 680–689.
- [11] (2017) Discovery radiomics for pathologically-proven computed tomography lung cancer prediction. In International Conference Image Analysis and Recognition, pp. 54–62.
- [12] (2012) Radiomics: extracting more information from medical images using advanced feature analysis. European Journal of Cancer 48 (4), pp. 441–446.
- [13] (2017) A deep learning-based radiomics model for prediction of survival in glioblastoma multiforme. Scientific Reports 7 (1), pp. 1–8.
- [14] (2017) Deep learning based radiomics (DLR) and its usage in noninvasive IDH1 prediction for low grade glioma. Scientific Reports 7 (1), pp. 1–11.
- [15] (2018) DARTS: differentiable architecture search. arXiv preprint arXiv:1806.09055.
- [16] (2019) Deep multi-modality collaborative learning for distant metastases predication in PET-CT soft-tissue sarcoma studies. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 3658–3688.
- [17] (2018) Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268.
- [18] (2018) Radiomics: the facts and the challenges of image analysis. European Radiology Experimental 2 (1), pp. 1–8.
- [19] (2015) A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities. Physics in Medicine & Biology 60 (14), pp. 5471.
- [20] (2017) Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer. Scientific Reports 7 (1), pp. 1–14.
- [21] (2010) Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 467 (7319), pp. 1114–1117.
- [22] (2019) A deep learning radiomics model for preoperative grading in meningioma. European Journal of Radiology 116, pp. 128–134.
- [23] (2018) Learning transferable architectures for scalable image recognition. In , pp. 8697–8710.