1 Introduction
The chest X-ray (CXR) is the most commonly performed X-ray examination, capturing details of the lungs, heart, bones and blood vessels. CXRs play a critical role in diagnosing and monitoring conditions such as pneumonia, heart problems and lung cancer. However, the CXR remains one of the most complex imaging studies to interpret [10]. The effectiveness and accuracy of the interpretation rely heavily on the radiologist's expertise, and substantial clinical error still occurs in the outcome [4]. Furthermore, the requirement for human expertise increases the financial cost and the time needed for evaluation. Therefore, there is a clear need for fast automated evaluation of CXRs.
CXR classification has been widely addressed by the community, yet it remains an open problem. Early developments were based on handcrafted classification, e.g. [16]. However, this family of algorithmic approaches requires particular modelling hypotheses to be met (e.g. on texture, geometry or intensity), which may not be feasible to fulfil in practice. Due to the remarkable results produced by deep learning in the field of computer vision, there has been a rush to apply deep learning architectures to the classification of CXRs [19, 17, 1], which have shown promising results. The majority of these methods utilise deep convolutional neural networks with architectures such as ResNet [12], owing to the success of these architectures in computer vision classification tasks. Several training strategies have been considered, including pretrained networks, fine-tuned networks and networks trained from scratch on X-ray data, e.g. [19, 17, 1]. However, a major drawback of these techniques is their high dependence on a large corpus of labelled data. Particularly in the medical domain, this might be a strong assumption for a solution, as annotated data contain strong human bias. Although there has been a huge effort in the community to mitigate this drawback by providing datasets such as ChestX-ray14, such a dataset has annotations but is far from being a definite expression of ground truth [14]. Therefore, by using supervised learning techniques one allows the labelling error and uncertainty to adversely affect the classification output of the machine learning framework. To tackle both the effect of human bias and the limited amount of labelled data, we propose using the power of semi-supervised learning and graph representations.
Our Contributions. We propose a novel semi-supervised graph-based framework called GraphX. Our contributions are: 1) a new multi-class classification functional with carefully chosen class priors, based on the normalised and non-smooth graph Laplacian; 2) a demonstration that our novel framework learns to accurately classify CXRs, with a performance comparable to state-of-the-art deep learning techniques, whilst using a far smaller amount of labelled data; 3) the first use of graph representations for X-ray classification.
2 GraphX Framework for X-Ray Data Classification
Our approach is motivated by a central problem in medical imaging: the lack of reliable, quality annotated data. Although transfer learning [1] or Generative Adversarial Networks [15] somewhat mitigate this problem, they fail to account for the mismatch between expert annotation and ground-truth annotation created by human bias and uncertainty. With this motivation in mind, we propose, for the first time, a semi-supervised framework for this task, called GraphX (see Fig. 1 for an illustration).
Data Representation with Graphs. Although there are different methods for representing data, including the conventional grid form, in this work we motivate the use of graph data representations as follows. Firstly, graphs are a natural representation for groups of images, where each node represents an individual image. Secondly, given that graph-based methods seek smooth solutions over the created embedding, they are able to correct initially mislabelled samples. Lastly, graphs have strong mathematical properties, such as sparseness, which allow for fast computation.
We represent a given dataset as an undirected weighted graph $G = (V, E)$ comprising a set of nodes $V$ connected by a set of edges $E$, with weights $w_{ij}$ that correspond to some similarity measure between the features of nodes $i$ and $j$ if $(i,j) \in E$, and functions $u \colon V \to \mathbb{R}$ defined on the nodes. Our setting is based on the normalised graph $p$-Laplacian, whose associated energy reads:

$$\Delta_p(u) = \frac{1}{2} \sum_{i,j} w_{ij} \left| \frac{u_i}{d_i^{1/p}} - \frac{u_j}{d_j^{1/p}} \right|^p, \qquad (1)$$

where $d_i = \sum_j w_{ij}$ is the degree of node $i$. The eigenfunctions of the graph Laplacian operator give interesting insight into the substructures of the graph. Eigenfunctions of the normalised graph Laplacian for $p = 2$ have been successfully used in different applications such as [2, 7, 8].
Learning to Classify under Extreme Minimal Supervision.
However, unlike those works, our framework has a different aim: to obtain classification estimates solely on the unlabelled samples. That is, to perform a node classification task on $G$ with $L$ available classes, given an extremely small amount of labelled nodes. More precisely, given a small amount of labelled data with provided labels $y_i \in \{1, \dots, L\}$ and a large amount of unlabelled data, we seek to infer a function $f$ such that $f$ gets a good estimate for the labels of the unlabelled nodes. Although several works have explored this learning style, either from a pure machine learning perspective, e.g. [20], or a medical imaging perspective, e.g. [18], these methods only consider the smooth $p = 2$ graph Laplacian. However, recent developments in machine learning have shown that the use of the unnormalised (i.e. without the rescaling by the node degrees in (1)) and non-smooth $p = 1$ Laplacian, related to total variation, can achieve better performance [5].
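To make the graph setting concrete, the following sketch builds a small similarity graph and evaluates the degree-normalised energy $\Delta_1$ of (1) for $p=1$. The k-nearest-neighbour construction with Gaussian weights and the `scale` parameter are illustrative assumptions, not the paper's exact feature pipeline.

```python
import numpy as np

def knn_gaussian_graph(X, k=3, scale=1.0):
    """Build a symmetric similarity graph: each node keeps Gaussian weights
    to its k nearest neighbours, then the graph is symmetrised."""
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k + 1]                # skip the node itself
        W[i, nbrs] = np.exp(-D2[i, nbrs] / scale)
    return np.maximum(W, W.T)                            # undirected graph

def delta_1(W, u):
    """Degree-normalised non-smooth energy:
    Delta_1(u) = 1/2 * sum_ij w_ij |u_i/d_i - u_j/d_j|."""
    d = W.sum(axis=1)
    v = u / d
    return 0.5 * (W * np.abs(v[:, None] - v[None, :])).sum()
```

Note that any function that is constant after degree rescaling (e.g. $u = d$) has zero energy, which is why the shifting step described below is needed to avoid trivial solutions.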
To mitigate these drawbacks in the literature, we propose a novel semi-supervised framework, GraphX, based on the normalised and non-smooth $p = 1$ Laplacian in (1). The functional can then be rewritten as $\Delta_1(u) = \|\nabla_w (D^{-1} u)\|_1$, where $W$ is the weight matrix, $\nabla_w$ the associated weighted gradient operator and $D$ the diagonal matrix containing the degrees $d_i$. To this end, we generalise the unsupervised binary normalised graph method of [9] to a semi-supervised multi-class graph approach, as follows.
For each class $l = 1, \dots, L$, we consider a variable $u^l$ that has values for all nodes of the graph. The variables are then coupled through the constraint that, for all unlabelled nodes $i$:

$$\sum_{l=1}^{L} u_i^l = 0.$$

This simple coupling indeed leads to faster projection algorithms than simplex constraints [3, 11] or non-convex orthogonality constraints between the $u^l$'s [8]. We assume that a set of annotated nodes $I_l$ is available for each class $l$: $y_i = l$ for all $i \in I_l$. Taking a small parameter $\epsilon > 0$, we therefore constrain that:

$$u_i^l \geq \epsilon \ \text{ for } i \in I_l, \qquad u_i^l \leq -\epsilon \ \text{ for } i \in I_{l'}, \ l' \neq l. \qquad (2)$$
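A hedged reading of these constraints in code, with zero-sum coupling on unlabelled nodes and epsilon-margins on annotated ones, can be sketched as follows; the exact pointwise projection used by GraphX may differ in detail.

```python
import numpy as np

def project_constraints(U, labels, eps=0.1):
    """Enforce the class-variable constraints on U (n nodes x L classes).

    labels[i] is the class of an annotated node, or -1 if node i is
    unlabelled. Unlabelled rows are projected onto the coupling
    sum_l u_i^l = 0 by subtracting the row mean; annotated rows are
    clipped to an eps-margin for their class against the others
    (illustrative assumption).
    """
    U = U.copy()
    n, L = U.shape
    for i in range(n):
        if labels[i] < 0:
            U[i] -= U[i].mean()                  # coupling: zero-sum over classes
        else:
            for l in range(L):
                if l == labels[i]:
                    U[i, l] = max(U[i, l], eps)  # true class pushed above +eps
                else:
                    U[i, l] = min(U[i, l], -eps) # other classes kept below -eps
    return U
```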
This information is then used in an iterative PDE process with a time parameter $t$, in which we seek to minimise the sum of normalised ratios $\sum_{l} \Delta_1(u^l)/\|u^l\|$. Denoting by $u^{l,t}$ the current estimates and by $\delta t$ a time step, formally, we seek to minimise:

$$\{u^{l,t+\delta t}\} = \operatorname*{arg\,min}_{\{u^l\}} \ \sum_{l=1}^{L} \left( \frac{\|u^l - u^{l,t}\|^2}{2\,\delta t} + \Delta_1(u^l) - \frac{\Delta_1(u^{l,t})}{\|u^{l,t}\|} \langle u^l, q^{l,t} \rangle \right), \quad q^{l,t} \in \partial \|u^{l,t}\|, \qquad (3)$$
under the set of previously described coupling and data constraints (2). Following [13, 9], a final shifting and a normalisation are necessary at the end of each iteration to prevent the iterates from converging to trivial solutions.
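The shift-and-normalisation step can be sketched as below; subtracting the mean is one common choice of shift for removing the trivial constant component (the precise shift of [13, 9] may differ).

```python
import numpy as np

def shift_and_normalise(u):
    """End-of-iteration correction: remove the constant component and rescale
    to unit norm, so the iterates cannot collapse to a trivial solution."""
    u = u - u.mean()                 # shift: kill the constant direction
    nrm = np.linalg.norm(u)
    return u / nrm if nrm > 0 else u
```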
When a unique $u^l$ is considered, the scheme iteratively decreases the ratio, so that the solution of (3) necessarily satisfies:

$$\frac{\Delta_1(u^{l,t+\delta t})}{\|u^{l,t+\delta t}\|} \leq \frac{\Delta_1(u^{l,t})}{\|u^{l,t}\|}. \qquad (4)$$
As noticed in [9], the scheme makes $u^{l,t}$ converge to a bivalued function that naturally segments the graph. As the variables are coupled, the final labelling of a node $i$ is chosen from the variable with the highest value: $y_i = \operatorname{arg\,max}_{l} u_i^l$.
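The final decision rule above is a per-node arg-max over the coupled class variables:

```python
import numpy as np

def label_nodes(U):
    """Assign each node the class whose variable u^l attains the highest
    value at that node (ties broken by the lowest class index)."""
    return np.argmax(U, axis=1)
```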
Optimisation Scheme. For each time step $t$, the problem (3) is solved with the accelerated primal-dual algorithm of [6]. Writing the problem for each class as $\min_{u \in \mathcal{C}} \Delta_1(u) + \frac{1}{2\delta t}\|u - f^{l,t}\|^2$, where $f^{l,t}$ gathers the data terms of (3), $\mathcal{C}$ denotes the set of constraints combining the coupling constraint and (2), and $K = \nabla_w D^{-1}$, and initialising $u^{l,0} = \bar{u}^{l,0} = u^{l,t}$ and $z^{l,0} = 0$, the iterative sequence indexed by $k$ reads:

$$z^{l,k+1} = \operatorname{proj}_{\|\cdot\|_\infty \leq 1}\!\left(z^{l,k} + \sigma K \bar{u}^{l,k}\right), \quad u^{l,k+1} = \operatorname{proj}_{\mathcal{C}}\!\left(\frac{u^{l,k} - \tau K^\top z^{l,k+1} + (\tau/\delta t) f^{l,t}}{1 + \tau/\delta t}\right), \quad \bar{u}^{l,k+1} = 2u^{l,k+1} - u^{l,k},$$

where the projection onto the set $\mathcal{C}$ reads pointwise, for each node $i$ and class $l$:

$$u_i^l \leftarrow \begin{cases} \max(u_i^l, \epsilon) & \text{if } i \in I_l, \\ \min(u_i^l, -\epsilon) & \text{if } i \in I_{l'},\ l' \neq l, \\ u_i^l - \frac{1}{L}\sum_{l'=1}^{L} u_i^{l'} & \text{otherwise.} \end{cases} \qquad (5)$$

For positive parameters $\sigma$ and $\tau$ satisfying $\sigma \tau \|K\|^2 \leq 1$, this process makes $u^{l,k}$ converge to $u^{l,t+\delta t}$, the solution of (3).
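To illustrate the primal-dual machinery, the sketch below runs the basic (unaccelerated) Chambolle-Pock iterations on a simplified surrogate, graph total-variation denoising $\min_u \|Ku\|_1 + \frac{1}{2}\|u - f\|^2$ with $K$ a weighted incidence matrix. The constraint projection and the acceleration of [6] are omitted, so this is an assumption-laden toy, not the paper's exact solver.

```python
import numpy as np

def incidence(W):
    """Weighted edge-incidence operator K: one row w_ij * (e_i - e_j) per
    edge, so that ||K u||_1 is the anisotropic graph total variation of u."""
    n = W.shape[0]
    rows = []
    for i in range(n):
        for j in range(i + 1, n):
            if W[i, j] > 0:
                r = np.zeros(n)
                r[i], r[j] = W[i, j], -W[i, j]
                rows.append(r)
    return np.array(rows)

def primal_dual_tv(K, f, n_iter=2000):
    """Chambolle-Pock iterations for min_u ||K u||_1 + 0.5 * ||u - f||^2."""
    op_norm = np.linalg.norm(K, 2)       # operator norm ||K||
    sigma = tau = 0.9 / op_norm          # ensures sigma * tau * ||K||^2 <= 1
    u = f.copy()
    u_bar = u.copy()
    z = np.zeros(K.shape[0])
    for _ in range(n_iter):
        # dual ascent, then projection onto the l-infinity unit ball
        z = np.clip(z + sigma * (K @ u_bar), -1.0, 1.0)
        # primal step: proximal operator of the quadratic data term
        u_new = (u - tau * (K.T @ z) + tau * f) / (1.0 + tau)
        u_bar = 2.0 * u_new - u          # over-relaxation
        u = u_new
    return u
```

On a 3-node chain with $f = (0, 5, 0)$, the spike is shrunk toward its neighbours and the objective drops below its value $10$ at $u = f$.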
3 Experimental Results
This section describes in detail the set of experiments that we conducted to validate our GraphX approach.
Data Description. We evaluate our approach using the ChestX-ray14 [17] dataset, which is composed of frontal-view chest X-rays of size 1024 x 1024. The dataset is composed of 14 classes (pathologies). All measurements were taken from this dataset.
Evaluation Methodology. We validate our approach as follows. Firstly, we visualise the graph construction and classification task of our graph-based semi-supervised framework. Secondly, as the main part of the evaluation, we compare our GraphX to state-of-the-art methods for X-ray classification, namely two deep learning techniques: WANG17 [17] and YAO18 [19]. To evaluate the classifier output quality of the compared approaches, we performed an ROC analysis using the area under the curve (AUC) per pathology, along with their average. Finally, besides the official split, we perform a comparison with random partitions on ChestX-ray8 using WANG17 [17] as baseline.
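To make the evaluation metric precise, the per-pathology AUC and its average can be computed via the rank statistic below (plain NumPy, shown only for illustration; standard library routines would do equally well).

```python
import numpy as np

def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a random positive is scored above a random negative,
    counting ties as one half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def per_pathology_auc(Y, S):
    """AUC for each of the L pathologies (multi-label targets Y and
    predicted scores S, both of shape n x L), plus their average."""
    aucs = [auc(Y[:, l], S[:, l]) for l in range(Y.shape[1])]
    return aucs, float(np.mean(aucs))
```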
Results and Discussion. Firstly, we give some insight into our approach with the visualisations shown in Fig. 2. The left side of the figure shows two graphs: the first illustrates the initial state of the graph created after computing the feature distances between the given X-ray data, while the second shows the graph after computing (3). The colours on the graph indicate the class to which each image belongs. The right side shows a few sample label outputs of our approach that were correctly classified.
To evaluate the performance of our approach, we compared it against the state-of-the-art deep learning approaches WANG17 [17] and YAO18 [19]. To the best of our knowledge, there is no semi-supervised learning method for X-ray classification that we can compare against; therefore, we set WANG17 and YAO18 as our baselines. Table 1 shows the AUC results of the compared approaches, where overall our approach outperformed the other methods across most pathologies. Even though YAO18 performs better in some classes, a clear advantage of our approach over these two baselines is that, while they rely on a huge percentage of labelled data, 70%, we were able to report a better average AUC result with only 20% of the data.
Moreover, due to the semi-supervised nature of the GraphX framework, the classification output is very stable with respect to changes in the partition of the dataset. In the plots next to Table 1, we tested the AUC of both the GraphX framework and WANG17 [17] using three different random data partitions, including the partition suggested by Wang. The Wang method is very sensitive to changes in the partition, due to the fact that supervised methods are heavily reliant on the training set being representative. However, there is minimal change in the performance of GraphX over the three different partitions, as the underlying graph representation is invariant to the partition.
For a more detailed analysis of this dependency on the partitioning, and to further support the advantage of our GraphX, in Table 2 we compare the AUC produced by GraphX against WANG17 using a random split over ChestX-ray8. We find that GraphX produces a more accurate classification while using a far smaller fraction of the data labels than the WANG17 method. Furthermore, as we feed GraphX more of the data labels, the classification accuracy increases and becomes competitive with the deep learning framework of YAO18 [19], whilst using a far smaller amount of data labels.
4 Conclusion
In this work, we tackled the problem of X-ray classification and introduced a novel semi-supervised framework based on a graph-based optimisation model, which is the first method to exploit graph-based semi-supervised learning for X-ray data classification. We also introduced a new multi-class classification functional with carefully selected class priors that allows for a smooth solution. We demonstrated that our method produces highly competitive results on the ChestX-ray14 dataset whilst drastically reducing the need for annotated data.
Acknowledgments
AIAI is supported by the CMIH, University of Cambridge. NP is supported by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant No 777826. CBS acknowledges the Leverhulme Trust (Breaking the non-convexity barrier), the Philip Leverhulme Prize, the EPSRC grants EP/M00483X/1 and EP/N014588/1, the European Union Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grants 777826 NoMADS and 691070 CHiPS, the CCIMI and the Alan Turing Institute.
References
 [1] Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., Greenspan, H.: Chest pathology detection using deep learning with non-medical training. In: International Symposium on Biomedical Imaging (ISBI). pp. 294–297 (2015)
 [2] Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation pp. 1373–1396 (2003)
 [3] Bresson, X., Laurent, T., Uminsky, D., Von Brecht, J.: Multiclass total variation clustering. In: Advances in Neural Information Processing Systems (2013)
 [4] Bruno, M.A., Walker, E.A., Abujudeh, H.H.: Understanding and confronting our mistakes: the epidemiology of error in radiology and strategies for error reduction. Radiographics 35(6), 1668–1676 (2015)
 [5] Bühler, T., Hein, M.: Spectral clustering based on the graph p-Laplacian. In: International Conference on Machine Learning (ICML) (2009)
 [6] Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision (2011)
 [7] Chen, H., Li, K., Zhu, D., et al.: Inferring group-wise consistent multimodal brain networks via multi-view spectral clustering. IEEE Transactions on Medical Imaging (TMI) pp. 1576–1586 (2013)
 [8] Dodero, L., Gozzi, A., Liska, A., Murino, V., Sona, D.: Group-wise functional community detection through joint Laplacian diagonalization. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2014)
 [9] Feld, T.M., Aujol, J.F., Gilboa, G., Papadakis, N.: Rayleigh quotient minimization for absolutely one-homogeneous functionals. Inverse Problems (2019)
 [10] Folio, L.R.: Chest imaging: an algorithmic approach to learning. Springer (2012)
 [11] Gao, Y., Adeli-M., E., Kim, M., Giannakopoulos, P., Haller, S., Shen, D.: Medical image retrieval using multi-graph learning for MCI diagnostic assistance. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). pp. 86–93 (2015)
 [12] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
 [13] Hein, M., Setzer, S., Jost, L., Rangapuram, S.S.: The total variation on hypergraphs: learning on hypergraphs revisited. In: Advances in Neural Information Processing Systems (2013)
 [14] Kohli, M.D., Summers, R.M., Geis, J.R.: Medical image data and datasets in the era of machine learning: whitepaper from the 2016 CMIMI meeting dataset session. Journal of Digital Imaging pp. 392–399 (2017)
 [15] Moradi, E., Pepe, A., Initiative, A.D.N., et al.: Machine learning framework for early MRI-based Alzheimer's conversion prediction in MCI subjects. NeuroImage (2015)
 [16] Toriwaki, J.I., Suenaga, Y., Negoro, T., Fukumura, T.: Pattern recognition of chest X-ray images. Computer Graphics and Image Processing pp. 252–271 (1973)
 [17] Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2097–2106 (2017)
 [18] Wang, Z., Zhu, X., et al.: Progressive graph-based transductive learning for multi-modal classification of brain disorder disease. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2016)
 [19] Yao, L., Prosky, J., Poblenz, E., Covington, B., Lyman, K.: Weakly supervised medical diagnosis and localization from multiple resolutions. arXiv preprint arXiv:1803.07703 (2018)
 [20] Zhu, X., Ghahramani, Z., Lafferty, J.D.: Semi-supervised learning using Gaussian fields and harmonic functions. In: International Conference on Machine Learning (ICML). pp. 912–919 (2003)