1 Introduction
Deep convolutional neural networks (CNNs) have recently achieved impressive results in lung nodule classification based on computed tomography (CT) [1, 2, 3, 4, 5, 6, 7, 8]. However, these methods usually perform a binary classification (malignant vs. benign) and omit the unsure nodules, those between benign and malignant, which wastes a large amount of medical data, especially for data-hungry deep learning methods. Therefore, how to leverage unsure data to learn a robust model is crucial for lung nodule classification.
To this end, ordinal regression has been widely explored to utilize those unsure nodules. The unsure data model (UDM) [9] was proposed to learn with unsure lung nodules; it regards this classification as an ordinal regression problem and optimizes the negative logarithm of cumulative probabilities. However, the UDM has some additional parameters that need to be carefully tuned. The neural stick-breaking (NSB) method calculates the probabilities through the $C-1$ predicted classification bounds, where $C$ is the number of classes [10]. The unimodal method makes each fully-connected output follow a unimodal distribution such as Poisson or Binomial [11]. However, these methods do not guarantee a strict ordinal relationship. Recently, the convolutional ordinal regression forests (CORFs) aim at solving this problem through the combination of CNNs and random forests [12], which have been shown effective for lung nodule classification in a meta learning framework [13]. In summary, the existing methods do not explicitly leverage the ordinal relationship residing in the data itself.
In this paper, we assume that the ordinal relationship resides in not only the label but also the data itself. Put differently, the ordinal relationship of features from different classes also dominates the generalization ability of the model. Therefore, we propose a meta ordinal weighting network (MOWNet) to learn the ordinal regression between each training sample and a set of samples of all classes simultaneously, where this set is termed the meta ordinal set (MOS). Here, the MOS carries the feature and semantic information of the dataset, and acts as the meta knowledge for the training sample. As shown in Fig. 1, each training sample relates to an MOS, and the meta samples in the MOS are mapped to the corresponding weights representing the meta knowledge of all classes. Furthermore, we propose a novel meta cross-entropy (MCE) loss for the training of the MOWNet; each term is weighted by the learned meta weight as shown in Fig. 1. Different from the normal CE loss, the MCE loss indicates that the training sample is guided by the meta weights learned from the MOS. Hence, the MOWNet is able to boost the classification performance and generalization ability with the supervision from meta ordinal knowledge. Moreover, our MOWNet is able to reflect the difficulties of classification of all classes in the dataset, which will be analyzed in Sec. 3.4.
The main contributions of this work are summarized as follows. First, we propose a meta ordinal weighting network (MOWNet) for lung nodule classification with unsure nodules. The MOWNet contains a backbone network for classification and a mapping branch for meta ordinal knowledge learning. Second, we propose a meta cross-entropy (MCE) loss for the training of the MOWNet in a meta-learning scheme, which is based on a meta ordinal set (MOS) that contains a few training samples from all classes and provides the meta knowledge for the target training sample. Last, the experimental results demonstrate significant performance improvements over the state-of-the-art ordinal regression methods. In addition, the changes in the learned meta weights reflect the difficulty of classifying each class.
2 Methodology
This section introduces the proposed meta ordinal set (MOS), the meta cross-entropy (MCE) loss, and the meta training algorithm, respectively.
2.1 Meta Ordinal Set (MOS)
We assume that the ordinal relationship resides in not only the label but also the data itself. Therefore, we align each target training sample with an MOS that contains samples from each class. The MOS for the $i$-th training sample is formally defined as follows:
(1) $\mathcal{S}_i = \{x_k^c\}_{c=1,\ldots,C;\ k=1,\ldots,K}$,
where $C$ is the number of classes and $K$ is the number of samples per class. Note that the samples in $\mathcal{S}_i$ are not ordered, but the samples of one class should go to the corresponding multilayer perceptron (MLP) in Fig. 1. The MLPs are then able to learn the specific knowledge of each class. For a target training sample $x_i$, each $x_k^c$ is randomly sampled from the training set, and $x_k^c \neq x_i$.
2.2 Meta Cross-Entropy Loss
To enable the MOWNet to absorb the meta knowledge provided by the MOS, we propose an MCE loss that aligns the meta knowledge of each class with the corresponding entropy term:
(2) $\mathcal{L}_{\mathrm{MCE}} = -\sum_{c=1}^{C} \omega_c \, y_c \log \hat{y}_c$,
where $\hat{y}_c$ and $\omega_c$ are the prediction and the learned meta weight of the $c$-th class, respectively, and $y_c$ is the corresponding entry of the one-hot label. Note that the MCE loss involves no ordinal regression tricks such as cumulative probabilities; it only holds the correlation between the meta data and the predictions. Compared with the conventional CE loss, the MCE loss enables the training samples to be supervised by the corresponding meta data; hence, the learning of the MOWNet takes into account the meta ordinal knowledge residing in the data itself.
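The weighting idea can be illustrated with a small sketch. It assumes the MCE loss is the usual cross-entropy with each class term scaled by a per-class meta weight; the function names `mce_loss` and `ce_loss` are hypothetical, and the weights are passed in as plain numbers rather than produced by the MLPs.

```python
import math


def mce_loss(probs, label, weights):
    """Class-weighted cross-entropy for one sample (assumed form of the MCE loss).

    probs   -- predicted class probabilities, one per class
    label   -- index of the ground-truth class
    weights -- per-class meta weights (here given, not learned)
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    loss = 0.0
    for c, (p, w) in enumerate(zip(probs, weights)):
        y = 1.0 if c == label else 0.0          # one-hot label entry
        loss -= w * y * math.log(max(p, 1e-12))  # clamp to avoid log(0)
    return loss


def ce_loss(probs, label):
    """Conventional cross-entropy: the special case with all weights equal to 1."""
    return mce_loss(probs, label, [1.0] * len(probs))
```

With a one-hot label only the ground-truth term survives, so the weight of that class directly scales how strongly the sample contributes to the update.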
2.3 Training Algorithm
The MOWNet is trained in a meta-learning scheme, which requires second derivatives [14, 15, 16, 17, 18] and two sets of parameters, $\theta$ (the backbone network) and $\phi$ (the MLPs), to be updated alternately. We highlight that the MOWNet offers a novel way of utilizing the ordinal relationship encapsulated within the data itself; however, the model is still the same as the one trained with CE loss. Moreover, our MOWNet does not modify the classification head and can be adapted to various backbones.
Although the whole MOWNet contains the two sets of parameters, $\theta$ and $\phi$, the trained model discards the MLPs ($\phi$) at the inference stage. In other words, the MLPs are only involved in the meta training phase to produce the class-specific knowledge. The optimal $\theta$ is then learned by minimizing the following objective function:
(3)  
where $L^{\mathrm{CE}}_{i,k}$ denotes the conventional CE loss of the $k$-th meta data with respect to the $i$-th training sample, and $\mathcal{V}_c$ is the $c$-th MLP with $L^{\mathrm{CE}}_{i,k}$ being the input (Fig. 1).
Following [14], $\phi$ can be updated through the following objective function:
(4) 
where $M$ is the size of the MOS. We then update these two sets of parameters alternately using meta-learning [14, 15]. First, we calculate the derivative with respect to $\theta$ through Eq. (3):
(5) 
Next, $\phi$ can be updated as follows:
(6) 
Last, $\theta$ is updated based on the updated $\phi$:
(7) 
In the above equations, the superscript $t$ denotes the $t$-th iteration during training, and $\alpha$ and $\beta$ are the learning rates for $\theta$ and $\phi$, respectively. The training algorithm is detailed in Algorithm 1. Note that after updating $\phi$ (line 8 in Algorithm 1), the MOWNet obtains the meta knowledge by taking the normal CE loss of the training sample as the input to all MLPs. Here, we can regard the updated MLPs as the prior knowledge for each ordinal class.
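The three-step alternation (a virtual backbone step, an update of the weighting parameters through that step, then the real backbone update) can be sketched on a toy problem. This is only an illustration of the alternating scheme in the style of [14], not the paper's Algorithm 1: the backbone is assumed to be a 1-D logistic model, `sigmoid(phi[y])` stands in for the output of the class-specific MLP, and central finite differences replace the analytic second derivatives. All names (`meta_step`, `train_loss`, `meta_loss`, `num_grad`) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ce(theta, x, y):
    """Binary cross-entropy of a toy 1-D logistic backbone theta = [w, b]."""
    p = sigmoid(theta[0] * x + theta[1])
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -math.log(p if y == 1 else 1.0 - p)


def train_loss(theta, phi, batch):
    """Weighted CE; sigmoid(phi[y]) is a stand-in for the class-specific MLP."""
    return sum(sigmoid(phi[y]) * ce(theta, x, y) for x, y in batch) / len(batch)


def meta_loss(theta, meta_set):
    """Plain CE averaged over the meta samples."""
    return sum(ce(theta, x, y) for x, y in meta_set) / len(meta_set)


def num_grad(f, v, eps=1e-5):
    """Central finite-difference gradient of f at the list v."""
    grad = []
    for j in range(len(v)):
        up = v[:j] + [v[j] + eps] + v[j + 1:]
        down = v[:j] + [v[j] - eps] + v[j + 1:]
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def sgd(v, grad, lr):
    return [vi - lr * gi for vi, gi in zip(v, grad)]


def meta_step(theta, phi, batch, meta_set, alpha=0.1, beta=0.1):
    # Step 1: virtual backbone update, kept as a function of phi.
    def virtual_theta(p):
        g = num_grad(lambda th: train_loss(th, p, batch), theta)
        return sgd(theta, g, alpha)

    # Step 2: update phi by descending the meta loss at the virtual backbone.
    g_phi = num_grad(lambda p: meta_loss(virtual_theta(p), meta_set), phi)
    new_phi = sgd(phi, g_phi, beta)

    # Step 3: real backbone update using the refreshed weights.
    g_theta = num_grad(lambda th: train_loss(th, new_phi, batch), theta)
    return sgd(theta, g_theta, alpha), new_phi


# Toy data: (feature, class) pairs; the meta set plays the role of the MOS.
batch = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
meta_set = [(-1.5, 0), (1.5, 1)]
theta, phi = [0.0, 0.0], [0.0, 0.0]
start = meta_loss(theta, meta_set)
for _ in range(20):
    theta, phi = meta_step(theta, phi, batch, meta_set)
end = meta_loss(theta, meta_set)
```

Only `theta` would be kept after training in this sketch, which mirrors the statement that the MLPs are discarded at the inference stage.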
To further analyze Eq. (6), we can obtain:
(8)  
(9) 
Eq. (9) shows that the derivative of the entropy of the training sample (second term) is driven to approach the derivative of the MOS (first term). This implies that the learning of the backbone network is guided by the meta knowledge. Please refer to http://hmshan.io/papers/mownetsupp.pdf for the detailed derivation.
3 Experiments
In this section, we report the classification performance of our MOWNet on the LIDC-IDRI dataset [19].
3.1 Dataset
LIDC-IDRI is a publicly available dataset for low-dose CT-based lung nodule analysis, which includes 1,010 patients. Each nodule was rated on a scale of 1 to 5 by four thoracic radiologists, with a higher rating indicating a higher probability of malignancy. In this paper, the ROI of each nodule was cropped at its annotated center, as a square whose side is twice the equivalent diameter. The averaged score of a nodule was used as the ground truth for model training. All volumes were resampled to have 1 mm spacing (original spacing ranged from mm to mm) in each dimension, and the cropped ROIs are of the size . The averaged scores range from 1 to 5, and in our experiments, we regard a nodule with a score between 2.5 and 3.5 as an unsure nodule; benign and malignant nodules are those with scores lower than 2.5 and higher than 3.5, respectively [9].
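The labeling rule above can be written as a short helper. This is a minimal sketch: the function names are hypothetical, and it assumes the boundary scores 2.5 and 3.5 themselves fall into the unsure class, since benign and malignant are described as strictly lower and strictly higher.

```python
def average_score(ratings):
    """Average the radiologists' malignancy ratings (each on a 1-5 scale)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)


def nodule_class(avg_score):
    """Map an averaged score to 'benign', 'unsure', or 'malignant'.

    Assumption: scores of exactly 2.5 or 3.5 are treated as unsure.
    """
    if not 1.0 <= avg_score <= 5.0:
        raise ValueError("averaged score must lie in [1, 5]")
    if avg_score < 2.5:
        return "benign"
    if avg_score > 3.5:
        return "malignant"
    return "unsure"
```

For instance, ratings of 3, 3, 4, and 2 average to 3.0, which this rule places in the unsure class.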
3.2 Implementation Details
We used VGG-16 as the backbone network [20] and made the following changes: 1) the input channel is 32, following [4]; 2) we only keep the first seven convolutional layers due to the small size of the input, each followed by batch normalization (BN) and ReLU; and 3) the final classifier is a two-layer perceptron with 4096 neurons in the hidden layer. We use 80% of the data for training and the remaining data for testing.
The hyperparameters for all experiments are set as follows: the learning rate is 0.0001 and is decayed by a factor of 0.1 every 80 epochs; the mini-batch size is 16; the weight decay for the Adam optimizer [21] is 0.0001. The symbols P, R, and F1 in our results stand for precision, recall, and F1 score, respectively [9].
Table 1. Classification results on the LIDC-IDRI dataset.
Method | Accuracy | Benign P | Benign R | Benign F1 | Malignant P | Malignant R | Malignant F1 | Unsure P | Unsure R | Unsure F1
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
CE Loss | 0.517 | 0.538 | 0.668 | 0.596 | 0.562 | 0.495 | 0.526 | 0.456 | 0.360 | 0.402
Poisson [11] | 0.542 | 0.548 | 0.794 | 0.648 | 0.568 | 0.624 | 0.594 | 0.489 | 0.220 | 0.303
NSB [10] | 0.553 | 0.565 | 0.641 | 0.601 | 0.566 | 0.594 | 0.580 | 0.527 | 0.435 | 0.476
UDM [9] | 0.548 | 0.541 | 0.767 | 0.635 | 0.712 | 0.515 | 0.598 | 0.474 | 0.320 | 0.382
CORF [12] | 0.559 | 0.590 | 0.627 | 0.608 | 0.704 | 0.495 | 0.581 | 0.476 | 0.515 | 0.495
MOWNet () | 0.629 | 0.752 | 0.489 | 0.592 | 0.558 | 0.851 | 0.675 | 0.600 | 0.675 | 0.635
MOWNet () | 0.672 | 0.764 | 0.596 | 0.670 | 0.600 | 0.802 | 0.686 | 0.642 | 0.690 | 0.665
MOWNet () | 0.687 | 0.768 | 0.623 | 0.688 | 0.668 | 0.705 | 0.686 | 0.606 | 0.792 | 0.687
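As a quick consistency check on the reported metrics, the F1 score can be recomputed from precision and recall as their harmonic mean. The sketch below uses a few (P, R, F1) triples copied from the table; because P and R are themselves rounded to three decimals, the recomputed value is only expected to agree up to rounding. The helper name `f1_score` and the dictionary keys are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F1) triples taken from the results table.
reported = {
    "CE Loss / benign": (0.538, 0.668, 0.596),
    "UDM / malignant": (0.712, 0.515, 0.598),
    "CORF / unsure": (0.476, 0.515, 0.495),
    "MOWNet (last row) / unsure": (0.606, 0.792, 0.687),
}

# Absolute gap between the recomputed and the reported F1 for each entry.
gaps = {name: abs(f1_score(p, r) - f1) for name, (p, r, f1) in reported.items()}
```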
3.3 Classification Performance
In our experiments, we mainly focus on the precision of the benign class and the recall of the malignant and unsure classes [9]. We compared our MOWNet with the state-of-the-art ordinal regression methods and the normal CE loss. Table 1 shows that the MOWNet achieves the best accuracy by a large margin (0.687 vs. 0.559 for CORF, the best competing method). Specifically, the MOWNet improves the recall of the unsure class by 0.28 over the previous best result (0.792 vs. 0.515). This is significant for clinical diagnosis, since a higher recall of the unsure class can encourage more follow-ups and reduce the probability that nodules are misdiagnosed as malignant or benign. In addition, the precision of the benign class and the recall of the malignant class improve substantially.
3.4 Analysis on Learned Weights
To further understand the weighting scheme of the MOWNet, we plot the variations of the learned meta weights in Fig. 3. At the beginning of training, the weight for the unsure class increases while the weight for the malignant class decreases, indicating that the MOWNet focuses on separating the unsure class from the other two classes and that the malignant class is easy to classify. Then, at epoch 10, the trends of these two weights reverse. The curve of the benign class fluctuates slightly throughout the whole training process. At epoch 45, the weights for all three classes begin to converge. This indicates that the model pays different amounts of attention (weights) to different classes, and this attention affects the update of the backbone network. At the end of training, the model has similar sensitivities for each class.
Fig. 2 is consistent with these trends: the malignant samples are easier to classify than the other two classes at the beginning. At epoch 10, the unsure samples are severely mixed with the other samples, so the unsure class has the highest weight. At the same time, the malignant class performs worse than at the beginning. As training continues, the weight for the malignant class begins to increase. At epoch 45, the malignant samples are clustered again and the unsure samples are more concentrated than in the previous epochs. At epoch 97, the model achieves the best accuracy, and the samples are clearly distributed in order, which demonstrates the effectiveness of the meta ordinal set.
3.5 Effects on the Size of MOS
The definition of the MOS in Eq. (1) shows that the parameter $K$ determines the number of samples of each class. Here, we explore the effect of varying $K$. Table 1 reports three settings of $K$: the best setting reaches an accuracy of 0.687, and both larger settings outperform the smallest one, which indicates that the more samples in the MOS, the better the generalizability of the model.
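To make the role of the MOS size concrete, the sketch below draws a fixed number of samples per class for one target training sample. It is an illustration under stated assumptions, not the authors' sampling code: the dataset is assumed to be a list of (sample, label) pairs, the target sample is assumed to be excluded from its own MOS, sampling is assumed to be without replacement, and `sample_mos` is a hypothetical name.

```python
import random


def sample_mos(dataset, target_index, k, rng=random):
    """Draw k samples per class for the target sample's meta ordinal set.

    dataset      -- list of (sample, label) pairs
    target_index -- position of the target training sample (excluded here)
    k            -- number of samples drawn from each class
    rng          -- random module or random.Random instance
    Returns a dict mapping each label to a list of k samples.
    """
    by_class = {}
    for idx, (sample, label) in enumerate(dataset):
        if idx != target_index:  # assumption: the target is not its own meta sample
            by_class.setdefault(label, []).append(sample)
    mos = {}
    for label, pool in by_class.items():
        if len(pool) < k:
            raise ValueError(f"class {label} has fewer than {k} candidates")
        mos[label] = rng.sample(pool, k)  # sampling without replacement
    return mos


# Toy dataset: 12 named nodules spread over three ordinal classes (0, 1, 2).
dataset = [(f"n{i}", i % 3) for i in range(12)]
mos = sample_mos(dataset, target_index=0, k=2, rng=random.Random(0))
```

Keeping the result keyed by class matches the requirement that the samples of one class go to the corresponding MLP, while the order inside each class does not matter.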
4 Conclusions
In this paper, we proposed an MOWNet and the corresponding MOS to explore the ordinal relationship residing in the data itself for lung nodule classification in a meta-learning scheme. The experimental results empirically demonstrate a significant improvement compared to existing methods. The visualization results further confirm the effectiveness of the weighting scheme and the learned ordinal relationship.
References
[1] A. A. A. Setio, F. Ciompi, G. Litjens et al., “Pulmonary nodule detection in CT images: False positive reduction using multi-view convolutional networks,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1160–1169, 2016.
[2] S. Hussein, R. Gillies, K. Cao, Q. Song, and U. Bagci, “TumorNet: Lung nodule characterization using multi-view convolutional neural network with Gaussian process,” in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI), 2017, pp. 1007–1010.
[3] H. Shan, G. Wang, M. K. Kalra, R. de Souza, and J. Zhang, “Enhancing transferability of features from pre-trained deep neural networks for lung nodule classification,” in The Proceedings of the 2017 International Conference on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine (Fully3D), 2017, pp. 65–68.
[4] Y. Lei, Y. Tian, H. Shan, J. Zhang, G. Wang, and M. K. Kalra, “Shape and margin-aware lung nodule classification in low-dose CT images via soft activation mapping,” Medical Image Analysis, vol. 60, p. 101628, 2020.
[5] Q. Zhang, J. Zhou, and B. Zhang, “A non-invasive method to detect diabetes mellitus and lung cancer using the stacked sparse autoencoder,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 1409–1413.
[6] Y. Li, D. Gu, Z. Wen, F. Jiang, and S. Liu, “Classify and explain: An interpretable convolutional neural network for lung cancer diagnosis,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 1065–1069.
[7] F. Li, H. Huang, Y. Wu, C. Cai, Y. Huang, and X. Ding, “Lung nodule detection with a 3D convnet via IoU self-normalization and maxout unit,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 1214–1218.
[8] R. Xu, Z. Cong, X. Ye, Y. Hirano, and S. Kido, “Pulmonary textures classification using a deep neural network with appearance and geometry cues,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 1025–1029.
[9] B. Wu, X. Sun, L. Hu, and Y. Wang, “Learning with unsure data for medical image diagnosis,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019, pp. 10590–10599.
[10] X. Liu, Y. Zou, Y. Song, C. Yang, J. You, and B. K Vijaya Kumar, “Ordinal regression with neuron stick-breaking for medical diagnosis,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 335–344.
[11] C. Beckham and C. Pal, “Unimodal probability distributions for deep ordinal classification,” arXiv preprint arXiv:1705.05278, 2017.
[12] H. Zhu, Y. Zhang, H. Shan, L. Che, X. Xu, J. Zhang, J. Shi, and F.-Y. Wang, “Convolutional ordinal regression forest for image ordinal estimation,” IEEE Transactions on Neural Networks and Learning Systems, 2021.
[13] Y. Lei, H. Zhu, J. Zhang, and H. Shan, “Meta ordinal regression forest for learning with unsure lung nodules,” in 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2020, pp. 442–445.
[14] J. Shu, Q. Xie, L. Yi, Q. Zhao, S. Zhou, Z. Xu, and D. Meng, “Meta-Weight-Net: Learning an explicit mapping for sample weighting,” in Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 1919–1930.
[15] S. Liu, A. Davison, and E. Johns, “Self-supervised generalisation with meta auxiliary learning,” in Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 1679–1689.
[16] R. Vuorio, S.-H. Sun, H. Hu, and J. J. Lim, “Multimodal model-agnostic meta-learning via task-aware modulation,” in Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 1–12.
[17] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” arXiv preprint arXiv:1703.03400, 2017.
[18] M. A. Jamal and G.-J. Qi, “Task agnostic meta-learning for few-shot learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 11719–11727.
[19] S. G. Armato III, G. McLennan, L. Bidaut et al., “The lung image database consortium (LIDC) and image database resource initiative (IDRI): A completed reference database of lung nodules on CT scans,” Medical Physics, vol. 38, no. 2, pp. 915–931, 2011.
[20] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (ICLR), 2015, pp. 1–14.
[21] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.