1 Introduction
Deep-learning-based segmentation algorithms play a key role in medical applications [1, 2, 3]. However, designing highly accurate and efficient deep segmentation networks is not trivial: manual exploration of high-performance deep networks requires extensive effort under the close supervision of human experts (from several months to several years) and a huge amount of time and resources due to the training time of the networks. Considering that the choice of architecture and hyperparameters affects the segmentation results, it is extremely important to select optimal hyperparameters. In this study, we address this pressing problem by developing a proof-of-concept optimization algorithm for network architecture design, specifically for medical image segmentation problems.
Our proposed method is generic and can be applied to any medical image segmentation task. As a proof-of-concept study, we demonstrate its efficacy by automatically segmenting heart structures from cardiac magnetic resonance imaging (MRI) scans. Our motivation comes from the fact that cardiac MRI plays a significant role in the quantification of cardiovascular diseases (CVDs): radiologists need to measure the volume of the heart and its substructures in association with cardiac function, which requires a precise segmentation algorithm available in the radiology room.
In recent years, CNN-based deep learning algorithms have become the natural choice for medical image segmentation tasks. However, the state-of-the-art CNN-based segmentation methods have very similar, fixed network architectures, and they have all been designed on a trial-and-error basis. SegNet [2], CardiacNet [3], and U-Net [1] are some of the notable approaches in the literature. To design such networks, experts face a large number of design decisions, and the manual search process is largely guided by intuition. To address this issue, there has recently been considerable interest in designing network architectures automatically. Reinforcement learning (RL) [4] and evolutionary algorithms [5] have been proposed to search for optimal network hyperparameters. Such methods are computationally expensive, require a large number of processors (as many as 800 GPUs in Google's network search algorithm [4]), and may not be feasible for widespread, general use. Instead, in this paper we propose a conceptually simple and very efficient network optimization search algorithm based on a policy gradient (PG) algorithm. PG is one of the most successful algorithms in robotics [6] for learning system design parameters. Another example is by Zoph and Le [4], where the authors used an LSTM (long short-term memory) network to learn the hyperparameters of a CNN, and PG was used to learn the parameters of the LSTM. Learning the parameters of an LSTM requires a considerable amount of resources, as discussed in [4]. Unlike that indirect parameter estimation, in this paper we propose a PG algorithm that directly learns the network hyperparameters. Our proposed approach is inspired by [6] and has been adapted to deep network architecture design for performing image segmentation tasks with high accuracy. In this study, to make the whole system economical to implement for a wide range of applications, the search space is significantly restricted. The overview of the proposed method is illustrated in Fig. 1. The hyperparameters of the network are considered as policies to be learned during PG training. To the best of our knowledge, this is the first study to find the optimal hyperparameters of a given network directly with policy gradient. Moreover, our proposed baseline architecture of a densely connected encoder-decoder CNN and the use of the Swish activation function as an alternative to ReLU are novel and superior to existing systems. Lastly, our study is the first medical image segmentation work with a fully automated algorithm that discovers the optimal network architecture.
2 Methods
2.1 Policy Gradient
Policy gradient is a class of reinforcement learning (RL) algorithms that relies on the optimization of parametrized policies with respect to an expected return (reward) [7]. Unlike other RL methods (such as Q-learning), PG learns the policy function directly in order to maximize the received reward. In our setting, we consider each hyperparameter of the network as a policy dimension to be learned during network training. Assume that we have a policy $\pi \in \mathbb{R}^N$ indicating the hyperparameters of the network, where $N$ is the number of hyperparameters (dimensions). Our objective is to learn these hyperparameters (i.e., policies) by maximizing the received reward. In a segmentation task, this reward can be any measure of the goodness of the segmentations, such as the Dice index or the Hausdorff distance. After randomly initializing the hyperparameters, we generate new policies by randomly perturbing the current policy in each dimension. Note that each dimension represents an exploration space for a hyperparameter such as filter width or filter height. Let $R^1, \dots, R^t$ be random perturbations generated near $\pi$, represented as $R^i = \pi + \Delta^i$ for $i = 1, \dots, t$. For each random perturbation $R^i$, the component $\Delta^i_d$ is chosen randomly from $\{-\epsilon_d, 0, +\epsilon_d\}$ for every dimension $d$, where $\epsilon_d$ is a small, dimension-specific step size (we define the exploration space and $\epsilon_d$ for each dimension in Section 2.3).
The network is trained with these generated policies, and a reward (segmentation performance) is obtained for each policy. Finally, the maximal reward (i.e., highest Dice coefficient) determines the optimal network architecture hyperparameters. To estimate the partial derivative of the policy function in each dimension $d$, the perturbations are grouped into three non-overlapping categories of negative, zero, and positive perturbation: $S^{-\epsilon}_d$, $S^{0}_d$, and $S^{+\epsilon}_d$, such that $S^{-\epsilon}_d \cup S^{0}_d \cup S^{+\epsilon}_d = \{R^1, \dots, R^t\}$. The perturbations are generated so that each category has approximately $t/3$ members. Then the average reward of each category, $A^{-\epsilon}_d$, $A^{0}_d$, and $A^{+\epsilon}_d$, is calculated as the mean of all rewards in that category for each dimension $d$. Based on these average rewards, the initial policy is updated in each dimension:

$$\pi_d \leftarrow \pi_d + \eta\, a_d, \qquad a_d = \begin{cases} 0 & \text{if } A^{0}_d > A^{+\epsilon}_d \text{ and } A^{0}_d > A^{-\epsilon}_d,\\ A^{+\epsilon}_d - A^{-\epsilon}_d & \text{otherwise,} \end{cases} \qquad (1)$$

where $\eta$ is a step size and the adjustment vector $a$ is normalized before the update.
The pseudocode for the policy gradient search is given in Algorithm 1.
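The search loop above can be sketched in a few lines of Python. This is a simplified, self-contained rendition, not the paper's implementation: the function names and the step size `eta` are our own assumptions; only the use of $t$ perturbations grouped per dimension follows the text.

```python
import random

def pg_search(pi, epsilons, evaluate, eta=1.0, t=42, iterations=15):
    """Policy gradient hyperparameter search in the style of Kohl and Stone [6].

    pi       : list of N current hyperparameter values (the policy)
    epsilons : list of N perturbation step sizes, one per dimension
    evaluate : maps a candidate policy to a scalar reward (e.g. validation Dice)
    """
    N = len(pi)
    for _ in range(iterations):
        # Generate t random perturbations of the current policy.
        perturbed = []
        for _ in range(t):
            delta = [random.choice((-eps, 0.0, +eps)) for eps in epsilons]
            perturbed.append(([p + d for p, d in zip(pi, delta)], delta))
        rewards = [evaluate(policy) for policy, _ in perturbed]

        # Estimate the partial derivative in each dimension (Eq. 1).
        adjustment = []
        for d in range(N):
            groups = {-1: [], 0: [], 1: []}   # negative / zero / positive perturbation
            for (policy, delta), r in zip(perturbed, rewards):
                sign = (delta[d] > 0) - (delta[d] < 0)
                groups[sign].append(r)
            avg = {s: sum(rs) / len(rs) if rs else 0.0 for s, rs in groups.items()}
            if avg[0] > avg[1] and avg[0] > avg[-1]:
                adjustment.append(0.0)        # zero perturbation was best: stay put
            else:
                adjustment.append(avg[1] - avg[-1])

        # Normalize the adjustment and take a step of size eta.
        norm = sum(a * a for a in adjustment) ** 0.5
        if norm > 0:
            pi = [p + eta * a / norm for p, a in zip(pi, adjustment)]
    return pi
```

In practice, `evaluate` would train the candidate network for a fixed number of epochs and return the validation Dice score, as described in Section 2.3.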
2.2 Proposed BaseArchitecture for Image Segmentation
As shown in [2, 3], the encoder-decoder architecture is a well-suited deep learning design for segmentation tasks. More recently, the densely connected CNN [8] showed that connecting different layers leads to more accurate results in recognition problems. Based on this evidence, a densely connected encoder-decoder CNN is proposed herein as a new architecture, and we use it as our baseline architecture to optimize. The proposed baseline architecture is illustrated in Fig. 2. Dense blocks consist of four layers; each layer includes a convolution operation followed by batch normalization (BN) and the Swish activation function [9] (unlike the commonly used ReLU). In addition, a concatenation operation combines the feature maps (along the channel axis) of the last three layers. In other words, if the input to layer $\ell$ is $x_\ell$, then the output of layer $\ell$ can be represented as:

$$x_{\ell+1} = \mathrm{Swish}(\mathrm{BN}(\mathrm{conv}(x_\ell))), \qquad (2)$$

where $\mathrm{Swish}(x) = x \cdot \sigma(\beta x)$. As discussed in [9], Swish was shown to be more powerful than ReLU, since the parameter $\beta$ can be learned during training to control the interpolation between a linear function ($\beta = 0$) and the ReLU function ($\beta \to \infty$). Since we perform concatenation before each layer (except the first one), the output of each layer within a block can be calculated from the outputs of all preceding layers as:

$$x_{\ell+1} = \mathrm{Swish}(\mathrm{BN}(\mathrm{conv}([x_1, x_2, \dots, x_\ell]))), \qquad (3)$$

where $[\,\cdot\,]$ is the concatenation operation. For initialization, $x_1$ is the input to the block and the set of earlier outputs is empty, and there are four layers inside each block.
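To make the role of the learnable parameter $\beta$ in the Swish activation concrete, here is a minimal NumPy sketch (the function name and values are illustrative, not taken from the paper's code):

```python
import numpy as np

def swish(x, beta):
    """Swish activation: f(x) = x * sigmoid(beta * x).

    beta = 0 gives the linear function x / 2; as beta grows large,
    the function approaches ReLU.
    """
    return x / (1.0 + np.exp(-beta * x))
```

For example, `swish(np.array([2.0]), 0.0)` returns `1.0` (linear regime), while `swish(np.array([-2.0]), 50.0)` is essentially `0` (ReLU regime). During training, `beta` would be a trainable scalar updated by backpropagation alongside the convolution weights.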
The encoder part of the CNN consists of three dense blocks and two transition layers. The encoder transition layers can be average pooling or max pooling, and each halves the spatial size of the feature maps. The decoder part has the same architecture as the encoder, except that the transition layers perform bilinear interpolation (i.e., unpooling). Each decoder transition doubles the size of the feature maps, and at the end of the decoder we obtain feature maps of the same size as the input images. Finally, the output of the decoder is passed through a convolution and a softmax to produce the probability map.
The Adam optimizer with a learning rate of 0.0001 is selected for training, and cross entropy is used as the loss function. The other hyperparameters of the network, such as the number of filters and the filter height and width for each layer, are discussed in the next section.
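For reference, the pixel-wise cross entropy between a softmax probability map and an integer label map can be sketched as follows (a NumPy illustration of the loss, not the actual training code; the function name is ours):

```python
import numpy as np

def cross_entropy(prob_map, labels, eps=1e-12):
    """Pixel-wise cross entropy.

    prob_map : (C, H, W) softmax probabilities over C classes
    labels   : (H, W) integer ground-truth label map
    """
    H, W = labels.shape
    # Pick, at every pixel, the predicted probability of the true class.
    picked = prob_map[labels, np.arange(H)[:, None], np.arange(W)[None, :]]
    return -np.mean(np.log(picked + eps))
```

A perfect prediction (probability 1 assigned to every true label) yields a loss of essentially zero, and the loss grows as probability mass moves away from the correct classes.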
2.3 Learnable Hyperparameters
The following hyperparameters are learned automatically with our proposed architecture search algorithm: the number of filters, the filter height, and the filter width for each layer. Additionally, the type of pooling layer is a learnable hyperparameter in our setting. In total, there are N = 76 parameters to be learned: 3 parameters (number of filters, filter height, and filter width) for each of the 25 layers (the last layer has a fixed number of filters), plus 2 additional hyperparameters (average or max pooling) for the downsampling layers. More specifically:

- Number of filters: the number of filters (NF) for each layer is chosen from a predefined discrete set of candidate values.

- Filter height: the filter height (FH) for each layer is chosen from a predefined discrete set of candidate values.

- Filter width: the filter width (FW) for each layer is chosen from a predefined discrete set of candidate values.

- Pooling functions: each pooling layer is chosen from {0, 1}, where '0' represents max pooling and '1' represents average pooling.
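The search space above can be encoded as a flat policy vector. The sketch below is illustrative only: the candidate value sets are placeholders of our own, not the ranges used in the paper, while the 76-dimension layout (3 parameters per layer, a fixed filter count for the last layer, and 2 pooling choices) follows the text.

```python
import random

# Placeholder candidate sets; the paper's exact ranges are not reproduced here.
SEARCH_SPACE = {
    "num_filters":   [16, 32, 64, 128],  # NF per layer (assumed values)
    "filter_height": [1, 3, 5, 7],       # FH per layer (assumed values)
    "filter_width":  [1, 3, 5, 7],       # FW per layer (assumed values)
    "pooling":       [0, 1],             # 0 = max pooling, 1 = average pooling
}

def random_policy(num_layers=25, num_pool_layers=2):
    """Sample one point in the 76-dimensional hyperparameter space."""
    policy = []
    for layer in range(num_layers):
        if layer < num_layers - 1:       # the last layer's filter count is fixed
            policy.append(random.choice(SEARCH_SPACE["num_filters"]))
        policy.append(random.choice(SEARCH_SPACE["filter_height"]))
        policy.append(random.choice(SEARCH_SPACE["filter_width"]))
    policy += [random.choice(SEARCH_SPACE["pooling"])
               for _ in range(num_pool_layers)]
    return policy   # 24*3 + 2 + 2 = 76 entries
```

Each entry of this vector is one policy dimension perturbed by the search of Section 2.1.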
The number of generated perturbations is set to 42 (determined experimentally), and to decrease the computational cost each network is trained for only 50 epochs, which is adequate to obtain a stable reward for the network. The average Dice index over the last 5 epochs on the held-out validation set is used as the reward for the reinforcement learning.
3 Experiments and Results
Dataset: We used the Automatic Cardiac Diagnosis Challenge (ACDC, MICCAI Workshop 2017) dataset to evaluate the proposed system. This dataset is composed of 150 cine MR images: 30 normal cases, 30 patients with myocardial infarction, 30 patients with dilated cardiomyopathy, 30 patients with hypertrophic cardiomyopathy, and 30 patients with an abnormal right ventricle (RV). While 100 cine MR images were used for training, the remaining 50 images were used for testing. We applied data augmentation, as described in Table 1, prior to training. The MR images were obtained using two MRI scanners of different magnetic field strengths (1.5 T and 3.0 T). Cine MR images were acquired under breath hold (with gating) using an SSFP sequence in the short-axis orientation. A series of short-axis slices covers the LV from base to apex, with a slice thickness of 5 mm (or sometimes 8 mm) and sometimes an inter-slice gap of 5 mm. The spatial resolution ranges from 1.37 to 1.68, and 28 to 40 volumes cover the cardiac cycle completely or partially.
Implementation details: We calculated the Dice index (DI) and Hausdorff distance (HD) to evaluate segmentation accuracy (blind evaluation through the challenge web page on the test data). The quantitative results for LV (left ventricle), RV, and Myo (myocardium), as well as the mean accuracy (Ave.), are shown in Table 2. Twenty images were randomly selected out of the 100 training images as a validation set. After finding the optimized hyperparameters, the network with the learned hyperparameters was fully trained with the augmented data. The augmentation was done with in-plane rotation and scaling (Table 1), increasing the number of images by a factor of five.
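The Dice index used both as the evaluation metric and as the search reward is straightforward to compute. A minimal NumPy version for binary masks (our own helper, shown for clarity):

```python
import numpy as np

def dice_index(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 2.0 * inter / denom if denom else 1.0  # both empty: perfect match
```

A value of 1.0 indicates perfect overlap and 0.0 indicates none; for multi-class segmentation the score is computed per structure (LV, RV, Myo) and averaged.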
Post-processing: To allow a fair comparison with other segmentation methods, which often use post-processing to improve their segmentation results, we also applied post-processing to refine the overall segmentation results of all compared methods. We present our results with and without post-processing in Table 2. Briefly, a 3D fully connected Conditional Random Field (CRF) was used to refine the segmentation results, taking only a few additional milliseconds. The output probability map of the CNN was used as the unary potential, and a Gaussian function was used as the pairwise potential. Finally, a connected component analysis was applied to remove isolated points.
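The connected-component cleanup can be illustrated with a small standard-library sketch that keeps only the largest 4-connected foreground region (a simplification of our own; the paper does not specify its exact component-filtering rule):

```python
from collections import deque

def largest_component(mask):
    """Keep only the largest 4-connected component of a 2D binary mask."""
    H, W = len(mask), len(mask[0])
    seen = [[False] * W for _ in range(H)]
    best = set()
    for i in range(H):
        for j in range(W):
            if mask[i][j] and not seen[i][j]:
                # Breadth-first flood fill of one component.
                comp, queue = set(), deque([(i, j)])
                seen[i][j] = True
                while queue:
                    y, x = queue.popleft()
                    comp.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    return [[(i, j) in best for j in range(W)] for i in range(H)]
```

In production one would typically use a library routine for labeling components; the sketch only conveys the idea of discarding small isolated regions after the CRF step.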
Comparison to other methods: The performance of the proposed segmentation algorithm in comparison with state-of-the-art methods is summarized in Table 2. The DenseCNN (with ReLU and with Swish) is a densely connected encoder-decoder CNN designed by experts; its use in segmentation tasks has recently appeared in a few applications, but it has never been used for cardiac segmentation before. Filter sizes were all fixed in DenseCNN, and the growth rates were 32, 64, 128, 128, 64, and 32 for each block from the beginning to the end of the network, respectively. Average pooling was chosen for the pooling layers. These values were all found through trial and error and empirical experience, guided by expert opinion, as is dominant in this field. The 2D U-Net, one of the state-of-the-art methods, is the original implementation of the U-Net architecture proposed by Ronneberger et al. [1] and was also used for comparison. Although we apply our algorithm in a 2D setting for efficiency, one can apply it to 3D architectures once memory and other hardware constraints are resolved. The details of the architecture learned with the proposed method are shown in Fig. 3.
We obtained the final architecture design in 10 days of continuous training on a workstation with 15 GPUs (Titan X). Unlike common expert-driven CNN architecture design, which requires months or even years of trial-and-error and experience-guided search, the proposed search algorithm found optimal (or near-optimal) segmentation results, compared to state-of-the-art segmentation architectures, within days.
Table 2: Quantitative results (DI: Dice index; HD: Hausdorff distance) for the compared methods.

Metric    Structure  2D U-Net  DenseCNN  DenseCNN  Proposed  Proposed
                               (ReLU)    (Swish)             + CRF
DI        LV         0.904     0.913     0.922     0.921     0.928
          RV         0.868     0.826     0.834     0.857     0.868
          MYO        0.847     0.832     0.845     0.838     0.849
          Ave.       0.873     0.857     0.867     0.872     0.882
HD (mm)   LV         9.670     9.15      8.937     8.99      8.90
          RV         14.37     16.35     16.31     14.27     14.13
          MYO        12.13     11.32     11.28     10.70     10.66
          Ave.       12.06     12.27     13.02     11.32     11.23
4 Discussions and Conclusion
We proposed a new deep network architecture to automatically segment cardiac cine MR images. Our architecture design was fully automatic and based on policy gradient reinforcement learning. After the baseline network was structured as a densely connected encoder-decoder network, the policy gradient algorithm automatically searched the hyperparameters of this network, achieving state-of-the-art results. Note that our hypothesis was to show that it is possible to design CNNs automatically for medical image segmentation with similar or better accuracy, and much better efficiency, than expert-designed networks, which require extensive trial-and-error experiments and may take years to design. Our study opens a new venue for designing a segmentation engine within a short period of time. Our study has some limitations due to its proof-of-concept nature. One interesting way to extend the proposed model would be to learn the hyperparameters of each layer conditionally (unlike our assumption of layer independence). With the availability of more hardware resources, one may explore many more hyperparameters, such as adding more layers than the basic model, defining skip connections, and exploring different activation functions beyond ReLU and other defaults. One may also avoid increasing the search space and still obtain a good architecture design automatically by choosing a more powerful base architecture, such as SegCaps (i.e., segmentation capsules) [10].
References
 [1] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in MICCAI. Springer, 2015.
 [2] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” arXiv preprint arXiv:1511.00561, 2015.
 [3] Aliasghar Mortazi, Rashed Karim, Kawal Rhode, Jeremy Burt, and Ulas Bagci, “CardiacNET: Segmentation of left atrium and proximal pulmonary veins from MRI using multi-view CNN,” in MICCAI. Springer, 2017, pp. 377–385.
 [4] Barret Zoph and Quoc V. Le, “Neural architecture search with reinforcement learning,” arXiv preprint arXiv:1611.01578, 2016.
 [5] Kenneth O. Stanley, David B. D’Ambrosio, and Jason Gauci, “A hypercube-based encoding for evolving large-scale neural networks,” Artificial Life, vol. 15, 2009.
 [6] Nate Kohl and Peter Stone, “Policy gradient reinforcement learning for fast quadrupedal locomotion,” in IEEE International Conference on Robotics and Automation (ICRA), 2004, vol. 3.
 [7] Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Advances in Neural Information Processing Systems, 2000, pp. 1057–1063.
 [8] Gao Huang, Zhuang Liu, Kilian Q. Weinberger, and Laurens van der Maaten, “Densely connected convolutional networks,” arXiv preprint arXiv:1608.06993, 2016.
 [9] Prajit Ramachandran, Barret Zoph, and Quoc V. Le, “Searching for activation functions,” arXiv preprint arXiv:1710.05941, 2017.
 [10] Rodney LaLonde and Ulas Bagci, “Capsules for object segmentation,” arXiv preprint arXiv:1804.04241, 2018.