1 Introduction
The medical image segmentation task is to automatically extract regions of interest, such as organs and lesions, from medical images. Segmentation plays a critical role in medical image analysis, since properties of the segmented region (such as length, width, and angles) can be used for further diagnosis and treatment of related diseases. Locating the region of interest precisely remains a major challenge.
In recent years, convolutional neural networks (CNNs) have shown groundbreaking performance on medical image segmentation tasks, and different approaches have been proposed [11, 18, 17, 12]. The widely adopted choice of loss function in CNN methods is a pixel-wise loss such as cross entropy. Pixel-wise loss functions classify each pixel of an image individually by comparing it with the corresponding ground truth, weighting every pixel equally. As a result, this type of loss function is not necessarily robust in detecting the details and small structures that are critical in medical image analysis. For instance, in the retinal vessel segmentation task, long, thin and weak vessels can easily be disconnected or even missed under the supervision of a pixel-wise loss function.
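This limitation can be made concrete with a minimal NumPy sketch (the function and data below are our own illustration, not the segmentation pipeline used in this paper): a pixel-wise binary cross entropy is unchanged if the pixels of the prediction and the ground truth are shuffled in the same way, so the loss is blind to connectivity and shape.

```python
import numpy as np

def pixelwise_bce(pred, gt, eps=1e-7):
    """Binary cross entropy averaged over pixels.

    Each pixel is scored independently against its ground-truth label,
    so the value carries no information about connectivity or shape.
    """
    p = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(gt * np.log(p) + (1.0 - gt) * np.log(1.0 - p))

# Shuffling prediction and ground truth identically does not change the loss:
rng = np.random.default_rng(0)
gt = (rng.random(100) > 0.5).astype(float)
pred = rng.random(100)
perm = rng.permutation(100)
assert np.isclose(pixelwise_bce(pred, gt), pixelwise_bce(pred[perm], gt[perm]))
```

The permutation invariance is exactly why a disconnected thin vessel costs almost nothing under such a loss.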
Before the flourishing of CNNs, variational image segmentation approaches [1, 16, 23] based on intensity or textural information were widely used for the segmentation task. Among them, the active contour methods define an evolving curve driven by a force derived from some energy functional, and this curve moves so as to minimize that energy. Different energy functionals have been proposed for active contour methods, such as the balloon force model [8], region-based methods [13, 2], the gradient vector flow (GVF) method [22] and the elastic interaction model [21, 20]. Compared to the pixel-wise loss functions in CNN models, the curve evolution based on energy functionals in the active contour methods takes into account topological or physical properties of the target objects rather than treating each pixel individually. However, these methods can only process one image at a time and cannot handle massive numbers of images collectively as CNN methods do. Recently, methods have been proposed [3, 10, 9] to train a CNN by minimizing curve-evolution energy functionals. In [3] and [10], the image segmentation task was represented as a minimization problem of the Mumford-Shah functional, while in [9], an objective function based on a boundary integral was defined. Despite these efforts, developing robust medical image segmentation methods, especially for long, thin structures in medical images, is still challenging.
In this paper, we propose an elastic interaction-based loss function in a deep learning framework, designed specifically for the long, thin structures that are commonly seen in medical images. Under the supervision of this loss function, the predicted region is attracted strongly to the object boundary through a global elastic interaction. Moreover, the self-interaction of the boundary of the predicted region tends to smooth its own shape and keeps the predicted boundary connected. Due to these properties, the new loss function shows great advantages in segmenting thin, long structures, such as vessels, in medical images. In addition, one of the difficulties for most energy functional-based loss functions is unstable performance in the early steps of training, because the random initialization of the CNN may cause instabilities. Due to its long-range nature, the minimization of the elastic interaction energy functional is not sensitive to the initialization. As a result, the proposed CNN method with this elastic interaction-based loss function demonstrates stable performance in the early stage of training, as shown by the experimental results in Section 3.
The proposed elastic interaction-based loss function can be implemented efficiently using the Fast Fourier Transform (FFT), making it easy to incorporate into a CNN. We examine the new loss function on three retinal image datasets, DRIVE, STARE and CHASE_DB1, and the results indicate that our method achieves a considerable improvement over commonly used pixel-wise loss functions (cross entropy and Dice loss) and other recent loss functions.
2 Methodology
2.1 Review of the Deep Neural Network Framework
In this section, we briefly review the framework of a deep neural network (DNN) under supervised learning. In supervised learning, the objective function (i.e. the loss function) that we want to minimize is a function of the output of the DNN and the ground truth. The output of the DNN is generated by forward propagation of the input, i.e. the input is processed by each hidden layer and activation layer of the network in the forward direction. The parameters within each hidden layer are then updated by gradient descent in back propagation. The entire process of the DNN is as follows:
$$ y_i = \sigma\big(w_n \odot \sigma(w_{n-1} \odot \cdots \sigma(w_1 \odot x_i))\big), \qquad W^{(k+1)} = W^{(k)} - \eta\,\frac{\partial L(y, g)}{\partial W^{(k)}}, \tag{1} $$

where $L$ is the loss function to be minimized; $x_i$, $y_i$ and $g_i$ ($i = 1, \dots, N$) are the input and output of the network and the ground truth, respectively; $N$ denotes the number of data and $k$ is the number of iterations; $W = \{w_1, \dots, w_n\}$ are the parameters to be trained; $\sigma$ is a nonlinear scalar function; $\odot$ means element-wise operation; and $\eta$ is the step size in gradient descent. In this work, we propose a new objective function $L$ for this DNN process.
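As a minimal illustration of this process, the NumPy toy below performs one iteration of the forward pass, loss evaluation, and gradient-descent update for a single sigmoid layer standing in for the hidden layers of a DNN (the function name, squared-error loss, and dimensions are our own choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W, x, g, eta=0.1):
    """One iteration: forward propagation, loss, gradient-descent update."""
    y = sigmoid(W @ x)                    # forward pass through one layer
    loss = np.mean((y - g) ** 2)          # a stand-in pixel-wise loss L(y, g)
    # Back propagation for the single layer, then W^(k+1) = W^(k) - eta * dL/dW
    grad_W = np.outer((2.0 / y.size) * (y - g) * y * (1.0 - y), x)
    return W - eta * grad_W, loss
```

Iterating `train_step` drives the loss down; a real CNN repeats the same forward/backward pattern over many layers and mini-batches.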
2.2 Elastic InteractionBased Loss Function
Our elastic interaction-based loss function is inspired by the elastic interaction between dislocations, which are line defects in crystals [5]. Dislocations are connected lines, and the elastic interaction between them is long-range. Inspired by this interaction energy, in the deep learning segmentation task we consider the boundary of the ground truth and that of the prediction as two curves carrying their own energies; during the training process, the evolution of the boundary of the predicted region is governed by their elastic interaction energy.
Consider a parameterized curve $\gamma(s)$ in the plane, which represents the boundary of a region. The curve has an orientation that could be either clockwise or counterclockwise. The elastic energy of the curve is

$$ E = \frac{1}{8\pi} \int_{\gamma} \int_{\gamma'} \frac{d\mathbf{l} \cdot d\mathbf{l}'}{r}, \tag{2} $$

where $\gamma'$ is the curve with another parameter $s'$; $d\mathbf{l} = \boldsymbol{\tau}\, dl$ with $\boldsymbol{\tau}$ being the unit tangent vector and $dl$ being the line element of the curve $\gamma$, and similarly for $d\mathbf{l}'$ of $\gamma'$; $\mathbf{r}$ is the vector between the points $\mathbf{x}$ on $\gamma$ and $\mathbf{x}'$ on $\gamma'$, and the distance between these two points is $r = |\mathbf{r}|$. This is the elastic energy stored in three-dimensional space of a dislocation line in the plane. (Note that this is a simplified version of the dislocation interaction energy [5]; the material prefactor of the energy has also been omitted in this formulation.)
The notation $\gamma$ in Eq. (2) can also be understood as a collection of curves. In particular, for a system of two curves $\gamma$ and $\tilde\gamma$, the total elastic energy is

$$ E_{\text{total}} = \frac{1}{8\pi} \int_{\gamma}\int_{\gamma'} \frac{d\mathbf{l} \cdot d\mathbf{l}'}{r} + \frac{1}{8\pi} \int_{\tilde\gamma}\int_{\tilde\gamma'} \frac{d\tilde{\mathbf{l}} \cdot d\tilde{\mathbf{l}}'}{r} + \frac{1}{4\pi} \int_{\gamma}\int_{\tilde\gamma} \frac{d\mathbf{l} \cdot d\tilde{\mathbf{l}}}{r}. \tag{3} $$
In this expression, the first two terms are the self-energies of the two curves, respectively, and the last term is the interaction energy between them. Note that this total energy of the two curves $\gamma$ and $\tilde\gamma$ depends on their orientations.
For a system of two coincident curves with opposite orientations, i.e. $\tilde\gamma = -\gamma$, the total energy in Eq. (3) vanishes. This corresponds to a perfect segmentation result if $\gamma$ is the boundary of the object (the ground truth) and $\tilde\gamma$ is the evolving boundary of the prediction [21].
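Explicitly, since reversing the orientation of a curve flips the sign of its tangent vector, setting $\tilde\gamma = -\gamma$ in the self/interaction decomposition gives

\[
E_{\text{total}}(\gamma,-\gamma) = E_{\text{self}}(\gamma) + E_{\text{self}}(-\gamma) + E_{\text{int}}(\gamma,-\gamma)
= E_{\text{self}} + E_{\text{self}} - 2\,E_{\text{self}} = 0,
\]

because the self-energy is quadratic in the tangent direction (hence unchanged by the reversal), while the interaction term is linear in each tangent and therefore changes sign, with magnitude twice the self-energy of $\gamma$.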
Using an artificial time $t$, we describe the evolution of a curve by minimizing the total energy in the steepest descent direction, i.e. $\mathbf{v} = \mathbf{f}$, where $\mathbf{v}$ is the velocity of the curve and $\mathbf{f}$ is the force on it. (This is essentially the dynamics of dislocations with mobility of unit value.) The obtained velocity of the curve $\tilde\gamma$ is

$$ \mathbf{v}(\mathbf{x}) = \frac{1}{4\pi}\left( \int_{\gamma\cup\tilde\gamma} \frac{\mathbf{r} \cdot \mathbf{n}(\mathbf{x}')}{r^{3}}\, dl' \right) \mathbf{n}(\mathbf{x}), \tag{4} $$

where $\mathbf{r} = \mathbf{x} - \mathbf{x}'$ is the vector between the point $\mathbf{x}$ on $\tilde\gamma$ and a point $\mathbf{x}'$ on $\gamma\cup\tilde\gamma$, and $r$ is the distance between the two points. $\boldsymbol{\tau}(\mathbf{x})$ and $\mathbf{n}(\mathbf{x})$ are the unit tangent vector and unit normal vector of $\tilde\gamma$ at the point $\mathbf{x}$, respectively, and $\boldsymbol{\tau}(\mathbf{x}')$ and $\mathbf{n}(\mathbf{x}')$ are the unit tangent vector and unit normal vector of $\gamma\cup\tilde\gamma$ at the point $\mathbf{x}'$. In the derivation of this equation, we have used $d\mathbf{l} = \boldsymbol{\tau}\, dl$, $d\mathbf{l}' = \boldsymbol{\tau}'\, dl'$, and $\mathbf{n} = \boldsymbol{\tau} \times \hat{\mathbf{z}}$, where $\hat{\mathbf{z}}$ is the unit vector in the $z$ direction. Note that in Eq. (4), $\gamma$ could be a collection of curves.
Therefore, in the image space, the proposed loss function in the CNN is derived from (2) by using $\nabla G$ for the ground truth, whose boundary is $\gamma$, and $\nabla H(\phi)$ for the moving curve, where $\nabla G$ and $\nabla H(\phi)$ act as regularized Dirac delta functions concentrated on the curves $\gamma$ and $\tilde\gamma$, and the minus sign below encodes their opposite orientations:

$$ L = \frac{1}{8\pi} \int_{\Omega}\int_{\Omega} \frac{\nabla\big(G - \alpha H(\phi)\big)(\mathbf{x}) \cdot \nabla\big(G - \alpha H(\phi)\big)(\mathbf{x}')}{|\mathbf{x} - \mathbf{x}'|}\, d\mathbf{x}\, d\mathbf{x}', \tag{5} $$
where $\phi$ is the level set representation of the moving curve $\tilde\gamma$ [14], and $\alpha$ is a hyperparameter. Here $H$ is a smoothing Heaviside function which controls the width of the contour through a regularization parameter $\varepsilon$:

$$ H(\phi) = \frac{1}{2}\left( 1 + \frac{2}{\pi} \arctan\frac{\phi}{\varepsilon} \right). \tag{6} $$

The parameter $\alpha$ and the Heaviside function control the strength of the elastic energy generated by the moving curve compared with that generated by the object boundary. In the CNN context, for convenience we apply a HardTanh activation function instead of the $H$ above. The level set representation $\phi$ is computed from the softmax output of the target class. Similarly, the velocity of the curve $\tilde\gamma$ (the boundary of the predicted region) derived from (4) is
$$ v = -\frac{\partial L}{\partial \phi} = \frac{\alpha}{4\pi}\, H'(\phi(\mathbf{x})) \int_{\Omega} \nabla\big(G - \alpha H(\phi)\big)(\mathbf{x}') \cdot \frac{\mathbf{x} - \mathbf{x}'}{|\mathbf{x} - \mathbf{x}'|^{3}}\, d\mathbf{x}'. \tag{7} $$

The velocity of the curve $\tilde\gamma$ is (up to sign) the gradient of the loss with respect to the level set representation $\phi$ of the predicted curve, and this gradient is used in back propagation for training.
2.3 Efficient Computation for Loss function and the Backward Gradient
In order to compute the loss function (5) and the gradient (7) efficiently in a CNN, we reformulate them using the Fast Fourier Transform, which reduces the computational complexity from $O(N^2)$ to $O(N \log N)$, where $N$ is the number of pixels.
Assume the Fourier transform of $G - \alpha H(\phi)$ in equation (5) is $\hat u(m, n)$, where $(m, n)$ are the frequencies in Fourier space. It can be calculated that the Fourier transform of the kernel $1/r$ is $2\pi/\sqrt{m^2+n^2}$. Therefore, by Parseval's identity, the loss function can be expressed in Fourier space as

$$ L = \frac{1}{4} \sum_{m,n} \sqrt{m^{2}+n^{2}}\; \big|\hat G(m,n) - \alpha\widehat{H(\phi)}(m,n)\big|^{2}. \tag{8} $$

From Eq. (7), the Fourier transform of the gradient of $L$ with respect to the output $H(\phi)$ is

$$ \mathcal{F}\!\left[\frac{\partial L}{\partial H(\phi)}\right] = \frac{\alpha}{2} \sqrt{m^{2}+n^{2}}\, \big( \alpha\widehat{H(\phi)}(m,n) - \hat G(m,n) \big). \tag{9} $$

According to this equation, we obtain the gradient for back propagation by the inverse Fourier transform:

$$ \frac{\partial L}{\partial H(\phi)} = \mathcal{F}^{-1}\!\left[ \frac{\alpha}{2} \sqrt{m^{2}+n^{2}}\, \big( \alpha\widehat{H(\phi)} - \hat G \big) \right]. \tag{10} $$
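The whole computation of this section fits in a few lines. The sketch below is our own NumPy illustration (the paper's implementation is in PyTorch): it evaluates a loss of the form of Eq. (8) and the corresponding gradient of Eq. (10) with a single pair of FFTs. The arctan Heaviside and the overall constant factors follow one common FFT convention and only rescale the effective learning rate.

```python
import numpy as np

def heaviside_eps(phi, eps=1.0):
    """A common smoothed Heaviside (cf. Eq. (6)); eps sets the contour width."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def elastic_loss_and_grad(G, P, alpha=1.0):
    """Elastic interaction loss and its gradient w.r.t. P = H(phi), via FFT.

    G : ground-truth binary map; P : smoothed Heaviside of the predicted
    level set. Both loss and gradient cost O(N log N) for N pixels.
    """
    ky = np.fft.fftfreq(G.shape[0])[:, None]
    kx = np.fft.fftfreq(G.shape[1])[None, :]
    k = np.sqrt(kx ** 2 + ky ** 2)        # frequency magnitude sqrt(m^2 + n^2)
    U = np.fft.fft2(G - alpha * P)        # Fourier transform of G - alpha*H(phi)
    loss = 0.25 * np.sum(k * np.abs(U) ** 2) / U.size
    grad = -0.5 * alpha * np.real(np.fft.ifft2(k * U))  # dL/dP in real space
    return loss, grad
```

When the predicted region coincides with the ground truth (alpha = 1, P = G), the loss vanishes, while a mismatched prediction receives a gradient signal everywhere in the image, reflecting the long-range nature of the interaction.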
2.4 Discussion on connectivity and fast convergence due to the strong long-range attractive interaction
The elastic interaction between two curves $\gamma$ and $\tilde\gamma$ is strongly attractive when the two curves have opposite orientations, as can be seen from the interaction energy given by the last term in Eq. (3). When the moving curve $\tilde\gamma$ (the predicted boundary) is set to have the opposite orientation to that of the object boundary $\gamma$ (the ground truth), the moving curve is attracted to the object boundary so as to minimize the total energy in Eq. (3), see Fig. 1(a); when the two curves coincide, the energy minimum is reached and the object is identified perfectly by the moving curve.
This interaction between the two curves is long-range because the energy density is inversely proportional to the distance between two respective points on them, which decays very slowly as the distance approaches infinity; see the last term in Eq. (3).
It can be shown that near a curve $\gamma$, the interaction force experienced by a point on another curve, given in Eq. (4), has the asymptotic property [5, 21]

$$ f \sim \frac{1}{d}, \tag{11} $$

where $d$ is the distance from the point on the curve $\tilde\gamma$ to the curve $\gamma$. Therefore, the elastic interaction provides a strong attractive force that can recombine a disconnected moving curve; see the schematic illustration of vessel segmentation in Fig. 1(b). (Here $\tilde\gamma_1$ and $\tilde\gamma_2$ are the two branches of the moving curve.)
3 Experiment
3.1 Dataset
DRIVE: The Digital Retinal Images for Vessel Extraction (DRIVE) dataset [19] is provided by a diabetic retinopathy screening program in the Netherlands. This dataset includes 40 images (565 × 584 pixels) divided into a training set and a test set, each containing 20 images and the corresponding ground truth. In our experiment, we split 5 images from the training set as a validation set. We train the DNN for vessel segmentation on the 15 remaining training images and evaluate our algorithm on the 20 test images.
STARE: The STARE dataset [6, 7] consists of 20 color retinal images of size 700 × 605 pixels. 15 images were used for training and the remaining 5 for validation. Hand-labeled ground truth is provided for all images in this dataset.
CHASE_DB1: This dataset [4] includes 30 color images of size 999 × 960 pixels. We split off 5 images for validation and use 25 images for training. Ground truth is provided for each image.
3.2 Implementation Details
Data preprocessing: Since retinal images usually have low contrast, we selected the green channel of the images for training, because the green channel has higher contrast than the red and blue channels. All values in the images are then normalized into [0, 1].
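This preprocessing amounts to two lines; the sketch below is our own NumPy version, where the H × W × (R, G, B) channel layout and min-max normalization are assumptions for illustration:

```python
import numpy as np

def preprocess(rgb):
    """Select the higher-contrast green channel and rescale to [0, 1]."""
    green = rgb[..., 1].astype(np.float32)   # assumes H x W x (R, G, B) layout
    return (green - green.min()) / (green.max() - green.min() + 1e-8)
```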
Architecture and training: In our experiment, we employed UNet [17]
as the endtoend convolutional neural network architecture on our experiment. Based on this architecture, we trained our model 50 epochs with a batch size equal to 6 on training dataset. Adam optimizer was used to optimize this model with learning rate of 0.001. For our elastic interaction loss function in Eq. (
5), we set hyperparameter , andin the Hardtanh (smoothing Heaviside) function. We implemented our model based on Pytorch
[15] and ran all the experiments by the machine equipped with an TESLA P100 with 16GBs of memory.To evaluate the performance of segmentation, sensitivity, specificity, f1 score and AUC (the area under the receiver operating characteristics curve) were calculated.
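For reference, the threshold-based metrics follow directly from the pixel confusion counts, as in the NumPy sketch below (our own illustration; AUC additionally needs the raw probabilities and is typically delegated to a library such as scikit-learn):

```python
import numpy as np

def seg_metrics(pred, gt):
    """Sensitivity, specificity and F1 score from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)        # vessel pixels correctly detected
    tn = np.sum(~pred & ~gt)      # background pixels correctly rejected
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return sensitivity, specificity, f1
```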
3.3 Results
Table 1. Quantitative results of the different loss functions on the three datasets.

Dataset     Method       Sen     Spec    F1 score  AUC
DRIVE       CE           0.9667  0.7721  0.8025    0.8694
            DICE         0.9615  0.7287  0.7658    0.8451
            AC [3]       0.9647  0.7682  0.7921    0.8664
            SL [9]       0.9660  0.7940  0.8043    0.8800
            Proposed EL  0.9664  0.8067  0.8093    0.8866
STARE       CE           0.9564  0.7296  0.7214    0.8430
            DICE         0.9403  0.6588  0.6212    0.7996
            AC [3]       0.9419  0.6461  0.6144    0.7940
            SL [9]       0.9540  0.7434  0.7146    0.8487
            Proposed EL  0.9576  0.7449  0.7304    0.8513
CHASE_DB1   CE           0.9526  0.8408  0.8245    0.8967
            DICE         0.9276  0.7243  0.7186    0.8259
            AC [3]       0.9506  0.8215  0.8145    0.8861
            SL [9]       0.9545  0.8207  0.8258    0.8876
            Proposed EL  0.9526  0.8428  0.8248    0.8977
Quantitative evaluation: We evaluated our model with the elastic interaction-based loss function (EL) on the three datasets, comparing against models trained with the cross entropy loss (CE), the Dice coefficient loss (DICE), the active contour loss (AC) [3] and the surface loss (SL) [9]. The quantitative results on the three datasets are shown in Table 1. The experimental results show that our proposed loss function outperforms the other loss functions, achieving the highest Specificity, F1 score and AUC on DRIVE, the highest values of all four metrics on STARE, and the highest Specificity and AUC on CHASE_DB1.
We examine the stability and convergence of the deep neural network during training with our proposed elastic interaction-based loss (EL) compared with the traditional loss functions (CE and DICE) and the other energy functional-based loss functions (AC and SL). In Fig. 2, we can observe that the performance of the network under the guidance of EL in the early training steps is more stable than that under the other two energy functional-based loss functions, and EL also attains higher F1 scores. Similarly, in Fig. 3, compared to the traditional loss functions, EL shows more stable training, faster convergence and better final results.
Qualitative evaluation: Figure 4 shows the qualitative results of the different loss functions. From Fig. 4, we can see that the results under the proposed loss outperform those of the other loss functions, because our method segments much more vessel detail than any of the other losses. Specifically, in the first row of Fig. 4, the result under the supervision of cross entropy fails to extract the thin, long vessel within the red box and the prediction is disconnected, so the segmentation loses many details. In the prediction from our proposed loss (EL), in contrast, the full vessels are segmented successfully and the predicted vessels are connected. We therefore conclude that the elastic interaction loss function performs better than the other three loss functions in extracting details of images (especially thin, long structures). The same holds for the second row.
Computational time: As mentioned in Section 2.3, the computational complexity of our proposed elastic loss (EL) is $O(N \log N)$. In practice, the running time of EL during the training and test processes on the experimental device described in Section 3.2 is comparable with those of the traditional loss functions: for DRIVE and STARE, both the training time and the time to segment one test image under EL are very close to those of CE and DICE, and the loss functions showed similar behavior on CHASE_DB1.
4 Conclusions and Discussion
In this paper, we propose a new elastic interaction-based loss function that strongly encourages the segmented boundary to stay connected, and therefore has great advantages in segmenting fine details in medical image datasets. Experimental results show that the proposed loss function indeed enhances the ability of a CNN to segment medical images with complex structures, achieving better performance compared to commonly used loss functions and other energy functional-based loss functions. We remark that the proposed loss function is developed to detect the details of long, thin structures such as vessels in medical images; when applied to general images without such structures, the proposed method may not necessarily have obvious advantages.
Acknowledgement
This work was supported by The Hong Kong University of Science and Technology IEG19SC04.
References
 [1] Chan, T.F., Shen, J., Vese, L.: Variational PDE models in image processing. Notices of the AMS 50(1), 14–26 (2003)
 [2] Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Transactions on image processing 10(2), 266–277 (2001)

 [3] Chen, X., Williams, B.M., Vallabhaneni, S.R., Czanner, G., Williams, R., Zheng, Y.: Learning active contour models for medical image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 11632–11640 (2019)
 [4] Fraz, M.M., Remagnino, P., Hoppe, A., Uyyanonvara, B., Rudnicka, A.R., Owen, C.G., Barman, S.A.: An ensemble classification-based approach applied to retinal blood vessel segmentation. IEEE Transactions on Biomedical Engineering 59(9), 2538–2548 (2012)
 [5] Hirth, J.P.: Theory of Dislocations. 2nd reprint edn. Krieger Pub. Co., Malabar, Fla. (1992)
 [6] Hoover, A., Kouznetsova, V., Goldbaum, M.: Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Transactions on Medical imaging 19(3), 203–210 (2000)
 [7] Hoover, A., Goldbaum, M.: Locating the optic nerve in a retinal image using the fuzzy convergence of the blood vessels. IEEE transactions on medical imaging 22(8), 951–958 (2003)
 [8] Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. International journal of computer vision 1(4), 321–331 (1988)
 [9] Kervadec, H., Bouchtiba, J., Desrosiers, C., Granger, E., Dolz, J., Ayed, I.B.: Boundary loss for highly unbalanced segmentation. arXiv preprint arXiv:1812.07032 (2018)
 [10] Kim, B., Ye, J.C.: Mumford–Shah loss functional for image segmentation with deep learning. IEEE Transactions on Image Processing 29, 1856–1866 (2019)
 [11] Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Van Der Laak, J.A., Van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image analysis. Medical image analysis 42, 60–88 (2017)
 [12] Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). pp. 565–571. IEEE (2016)
 [13] Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Communications on pure and applied mathematics 42(5), 577–685 (1989)
 [14] Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79(1), 12–49 (1988)
 [15] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017)
 [16] Pham, D.L., Xu, C., Prince, J.L.: Current methods in medical image segmentation. Annual review of biomedical engineering 2(1), 315–337 (2000)
 [17] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 234–241. Springer (2015)
 [18] Shen, D., Wu, G., Suk, H.I.: Deep learning in medical image analysis. Annual review of biomedical engineering 19, 221–248 (2017)
 [19] Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., Ginneken, B.V.: Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging 23(4), 501–509 (2004)
 [20] Xiang, Y., Chung, A.C., Ye, J.: A new active contour method based on elastic interaction. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). vol. 1, pp. 452–457. IEEE (2005)
 [21] Xiang, Y., Chung, A.C., Ye, J.: An active contour model for image segmentation based on elastic interaction. Journal of computational physics 219(1), 455–476 (2006)
 [22] Xu, C., Prince, J.L.: Snakes, shapes, and gradient vector flow. IEEE Transactions on image processing 7(3), 359–369 (1998)
 [23] Zhao, F., Xie, X.: An overview of interactive medical image segmentation. Annals of the BMVA 2013(7), 1–22 (2013)