Unifying Neural Learning and Symbolic Reasoning for Spinal Medical Report Generation

04/28/2020 ∙ by Zhongyi Han, et al. ∙ Shandong University

Automated medical report generation in spine radiology, i.e., directly creating radiologist-level diagnosis reports from spinal medical images to support clinical decision making, is a novel yet fundamental problem in the domain of artificial intelligence in healthcare. It is incredibly challenging, however, because it is an extremely complicated task that involves visual perception and high-level reasoning processes. In this paper, we propose the neural-symbolic learning (NSL) framework, which performs human-like learning by unifying deep neural learning and symbolic logical reasoning for spinal medical report generation. Generally speaking, the NSL framework first employs deep neural learning to imitate human visual perception for detecting abnormalities of target spinal structures. Concretely, we design an adversarial graph network that interpolates a symbolic graph reasoning module into a generative adversarial network by embedding prior domain knowledge, achieving semantic segmentation of spinal structures with high complexity and variability. NSL then conducts human-like symbolic logical reasoning that realizes unsupervised causal effect analysis of the detected abnormality entities through meta-interpretive learning. NSL finally fills these discoveries of target diseases into a unified template, achieving comprehensive medical report generation. When employed on a real-world clinical dataset, a series of empirical studies demonstrate its capacity for spinal medical report generation and show that our algorithm remarkably exceeds existing methods in the detection of spinal structures. These results indicate its potential as a clinical tool for computer-aided diagnosis.







1 Introduction

This paper is devoted to the task of directly and automatically generating radiologist-level reports from spinal images in the field of spine radiology. Automated spinal medical report generation is a novel yet fundamental task in the domain of artificial intelligence (AI) in healthcare. Multiple spinal diseases not only deteriorate quality of life but also have high morbidity rates worldwide. For instance, Neural Foraminal Stenosis (NFS) affects about 80% of the elderly population (Rajaee et al., 2012). In daily radiological practice, radiologists still shoulder laborious workloads to prepare tedious medical diagnosis reports by analyzing spinal medical images manually. Time-consuming medical report generation delays patients' stays in the hospital and increases the costs of hospital treatment (Vorbeck et al., 2000). In contrast, automatic report generation systems would offer the potential for faster and more efficient delivery of radiological reports and would thus accelerate the diagnostic process (Rosenthal et al., 1997). Automatic report generation is therefore pivotal to expedite the initiation of specific therapies and contribute to relevant time savings, relieving spinal radiologists of laborious workloads to a certain extent.

To date, Computer-Aided Detection (CADe) and Computer-Aided Diagnosis (CADx) techniques in the medical image analysis community have made significant achievements and can even be on par with human experts (Esteva et al., 2017). However, most of them cannot achieve radiological report generation, including the most closely related spinal image analysis approaches (He et al., 2017b, c; Yao et al., 2016; Han et al., 2018c, 2018). Thus, the topic of automated spinal medical report generation from medical images remains under-explored. Besides, magnetic resonance imaging (MRI) is one of the most useful exams in the clinical diagnosis of spinal diseases, as it better demonstrates spinal anatomy (Kim et al., 2015). Therefore, this paper is devoted to radiological report generation from spinal MRI images to support clinical decision making.

Figure 1: A spine image with the target structures of analysis, which include the intervertebral disc (D), vertebra (V), and neural foramen (NF).

However, automated spinal report generation is incredibly challenging because it is an extremely complicated task. Like manual spinal report generation, the automated way mainly involves two subproblems: 1) analyzing spinal MRI images to detect all the spinal structures and 2) discovering the causal effects between detected spinal diseases to write final diagnostic reports. On the one hand, the subproblem of analyzing spinal MRI images faces two main difficulties: structural complexity and ambiguous correlations. Spinal structures exhibit high complexity and variability, as illustrated in Fig. 1. More specifically, each lumbar spine MRI image has on average 17 target structures, composed of six neural foramina, six intervertebral discs, and five lumbar vertebrae. Each type of spinal structure has various scales across normal and abnormal structures (Han et al., 2018). Spinal structures also exhibit ambiguous spatial correlations that impede predicting consistent detection results. On the other hand, the subproblem of causal effect analysis mainly faces the difficulty of inexact supervision due to the lack of annotated data. This weak supervision pushes us to conduct unsupervised causal effect analysis, which in turn makes it difficult to discover the pathogenic factors of target spinal diseases precisely (Han et al., 2018b).

Figure 2: An illustration of the proposed neural-symbolic learning framework.

To solve these problems, we formalize the task of spinal medical report generation as a human-like learning process that involves semantic visual perception and high-level symbolic reasoning. More precisely, we propose the Neural-Symbolic Learning (NSL) framework that combines deep neural learning and symbolic logical reasoning in a mutually beneficial way, as shown in Fig. 2. NSL learns to detect complex spinal structures through an adversarial graph network, a deep neural learner that imitates human visual perception. Based on these discoveries of neural learning, NSL reasons out the causal effects and further generates unified spinal medical reports through symbolic logical reasoning. The proposed NSL framework resolves the facing challenges point-to-point. To handle the structural complexity and ambiguous correlations, we design the adversarial graph network, which interpolates a symbolic graph reasoning module into a generative adversarial network to accurately segment complex spinal structures with high variety and variability. The symbolic graph reasoning module embeds a prior knowledge graph into the network to perform reasoning over a group of symbolic nodes, whose outputs explicitly represent different properties of each spinal structure. To treat the inexact supervision, we use symbolic logical reasoning approaches, including meta-interpretive learning and first-order logic programming, bringing in background knowledge to remedy the lack of supervision information.

Combining neural learning and symbolic reasoning for medical report generation is proper and novel. As we have shown, it is proper because this combination imitates the process of manual spinal report generation in the clinic. Theoretically, it is novel because this combination endows the NSL framework with the advantages of neural learning in noisy data processing and of logical reasoning in knowledge representation. In the history of AI research, neural learning and logical reasoning have largely been developed separately (Zhou, 2019). Neural learning is adept at low-level perceptual tasks but is unable to support secondary reasoning. At the same time, logical reasoning does well in high-level symbolic reasoning, but it struggles to handle uncertain knowledge in noisy data. In other words, modern neural learning adopts a probability and connection mechanism for representation over noisy implicit data, whereas classical symbolic AI adopts expressive first-order logic for reasoning over explicitly represented knowledge (Russell, 2015). For example, neural learning techniques can recognize target spinal structures, while logical reasoning algorithms can reason out causal effects by integrating human knowledge. Unifying neural learning and logical reasoning integrates the low-level perceptual ability and high-level reasoning ability towards robust spinal medical report generation. Accordingly, we formalize the problem of report generation in a manner similar to the human decision-making process, bridging perceptual and reasoning strengths in a mutually beneficial way.

In this work, we advance our preliminary attempt (Han et al., 2018a) in the following aspects: 1) we propose a new framework that integrates neural learning and logical reasoning in a mutually beneficial way; 2) we carry out more extensive experiments on performance analysis, validating the significant advantages of the proposed NSL over existing state-of-the-art methods; 3) we make a more comprehensive review of medical report generation, providing a technical review of statistical machine learning and logical reasoning.

The contributions of this paper include:

  • We propose a novel framework achieving automated spinal medical report generation. The framework provides, for the first time, a reliable solution by integrating deep neural learning and logical reasoning in the medical image analysis community.

  • We propose a new adversarial graph network that embeds a prior knowledge graph into generative adversarial networks. The proposed network dynamically models the high-level semantic correlations between spinal structures to enhance segmentation accuracy. It can also be extended to various medical image segmentation tasks.

  • We propose a symbolic logical reasoning model that leverages meta-interpretive learning to induce the causal effects between spinal diseases for discovering valuable pathogenic factors, which are beneficial for the pathogenesis-based diagnosis of spine diseases.

We organize the rest of this paper as follows. In Section 2, we review the related works in terms of medical image analysis and involved methodology. We introduce the NSL framework in Section 3. We then present the details of validated datasets, experiment settings, and exhaustive results in Section 4. Finally, we conclude this work in Section 5.

2 Related Work

In this section, we first review the related works in the medical image analysis community and briefly introduce the related works on methodology. The related works of medical image analysis mainly involve spinal image analysis and medical report generation. The reviewed methodology mainly includes neural learning algorithms and logical reasoning advances.

2.1 The Related Works of Medical Image Analysis

To the best of our knowledge, neither CADe nor CADx techniques have achieved spinal report generation. Existing works in spine radiology include, but are not limited to, abnormality localization, semantic segmentation, and disease classification of spinal structures. Generally speaking, existing detection works on spinal structures include automated localization (Alomari et al., 2011; Corso et al., 2008; Štern et al., 2009; Zhan et al., 2012; Cai et al., 2015), automated segmentation (He et al., 2017b, c; Yao et al., 2016; Xu et al., 2020), and simultaneous localization and segmentation of one or two types of spinal structures (Ghosha et al., 2011; Huang et al., 2009; Klinder et al., 2008; Peng et al., 2006; Shi et al., 2007; Kelm et al., 2013). Although the aforementioned methods achieve accurate detection of spinal structures, they cannot accomplish the radiological classification of spinal structures. Subsequently, a few radiological classification works were proposed, such as lumbar neural foramen grading (He et al., 2016), lumbar disc degeneration grading (He et al., 2017a; Raja'S et al., 2011; Jamaludin et al., 2017), and spondylolisthesis grading (Cai et al., 2017). Since the before-mentioned works only achieved a simple analysis of a few types of spinal structures, Han et al. (2018) recently achieved semantic segmentation of various types of spinal structures, paving a solid way for medical report generation.

The problem of automated diagnostic report generation for other organs in the medical image analysis community has recently received renewed attention with several pioneering works. Zhang et al. (2017) achieved report generation for pathology bladder cancer images using a large scale of training samples and natural language processing (NLP) based image captioning approaches. Wang et al. (2018) achieved report generation for thorax diseases using a large number of chest X-ray images. Li et al. (2018) realized report generation on a large chest X-ray dataset by retrieving template sentences or generating simple sentences using reinforcement learning. Sun et al. (2019) used a common NLP technique to create medical image descriptions of breast diseases from a mammography dataset. Two public patents (Kaufman et al., 2005; Yang et al., 2011) focus on lung report generation and human hand report generation, respectively, but neither presents a detailed workflow or framework. Digital speech recognition has also been studied to assist radiologists in report generation for faster delivery of radiological reports (Vorbeck et al., 2000). In this study, the proposed framework instead uses a small number of spinal MRI images and achieves segmentation, classification, labeling, and captioning of three types of spinal structures to generate unified medical reports.

2.2 The Related Works on Methodology

2.2.1 Neural Learning

Briefly speaking, the advanced neural learning algorithms of NSL include dilated convolution, adversarial training, and graph reasoning. Dilated convolution was originally proposed by Holschneider et al. (1990) to compute the wavelet transform. Atrous convolution was then extended to semantic segmentation (Chen et al., 2016; Yu and Koltun, 2015; Chen et al., 2017), object recognition (Sermanet et al., 2013), and image scanning (Giusti et al., 2013). Adversarial training derives from the innovative generative adversarial networks proposed by Goodfellow et al. (2014). Pioneering works have shown the effectiveness of adversarial training on semantic segmentation (Luc et al., 2016), unsupervised video summarization (Mahasseni et al., 2017), prostate cancer detection (Kohl et al., 2017), brain MRI image segmentation (Moeskops et al., 2017), and anomaly detection (Schlegl et al., 2017).

The objective of graph reasoning is to capture the relations between objects. A node of the graph typically represents a specific object, and an edge represents the relation between nodes. To endow local convolutional networks with the capability of global graph reasoning, Liang et al. (2018) introduced a new graph layer, the symbolic graph reasoning layer, to embed external human knowledge for enhancing local feature representation. The symbolic graph reasoning layer can improve common neural networks' performance on segmentation and classification. Graph Neural Networks (GNNs) are the representative technology of graph reasoning. Many previous works have studied GNNs and achieved great progress (Wu et al., 2019; Zhou et al., 2018). GNNs can be applied in various applications, such as chemistry and biology (Duvenaud et al., 2015), knowledge graphs (Schlichtkrull et al., 2018), recommender systems (Ying et al., 2018), and computer vision (Chen et al., 2019). In this paper, we propose a new graph reasoning module for capturing the relations between spinal structures and improving the high-level semantic representation.

2.2.2 Symbolic Logical Reasoning

At the dawn of AI, logical reasoning was one of the most studied areas of research and has been considered a fundamental solution for AI (Dai et al., 2018). Representative works of symbolic logical reasoning include expert systems (Liao, 2005), decision trees (Safavian and Landgrebe, 1991), and inductive logic programming (ILP) (Lavrac and Dzeroski, 1994). The drawback of symbolic logical reasoning lies in handling uncertainty and noisy data, which limits its direct application to complex real-world tasks such as visual understanding, speech recognition, and natural language processing. With the development of statistical learning, many complex real-world tasks can be resolved. These achievements gradually set off a wave of statistical machine learning, and many mainstream algorithms have been proposed, such as support vector machines (Cortes and Vapnik, 1995), Bayesian networks (Friedman et al., 1997), and neural networks (LeCun et al., 2015). However, statistical machine learning still faces several drawbacks (Russell, 2015). Firstly, it has weak generalization ability, i.e., it cannot understand the intrinsic subconcepts and the high-order semantic features of concept classes. As noted in the AI community, convolutional neural networks can be fooled into recognizing a dog image as a panda by a certain hardly perceptible perturbation (Szegedy et al., 2013). Secondly, many statistical machine learning algorithms require a large number of annotated samples, according to PAC learning (Valiant, 1984). Finally, many statistical machine learning algorithms are black boxes without comprehensibility (Murdoch et al., 2019).

Since logical reasoning and machine learning have largely been developed separately in the history of AI research, a fundamental idea to overcome the before-mentioned limitations is to unify them in a mutually beneficial way. However, developing a unified framework has been deemed the holy grail challenge for the AI community (Zhou, 2019). The primary difficulty lies in the fact that modern machine learning cannot provide the first-order representations that are necessary inputs for classical symbolic AI (Russell, 2015). In recent years, a few works have made effective attempts to overcome this difficulty. Probabilistic Logic Programming (PLP) (De Raedt and Kimmig, 2015) and Statistical Relational Learning (SRL) (Koller et al., 2007) aim to integrate probabilistic inference and logical reasoning; however, they usually require semantic-level inputs. The neural logic machine (Dong et al., 2019) and PrediNet (Shanahan et al., 2019) aim to replace traditional logic programming with pure neural networks but still have the drawbacks of statistical machine learning. Lately, abductive learning achieved a breakthrough: it can simultaneously recognize numbers and resolve unknown mathematical operations from images of simple hand-written equations (Zhou, 2019; Dai et al., 2018). This paper proposes a new neural-symbolic learning framework that combines deep neural learning and logical reasoning in a mutually beneficial way, and the results demonstrate that it can generate robust medical reports in spine radiology.

3 Methodology

In this section, we give the problem setting of spinal report generation in Section 3.1 and then present the details of the neural-symbolic learning framework in Section 3.2.

3.1 Learning Set-up

In the real-world scenario, we can observe only weak information about spinal medical reports; that is, there exist object-level annotations (i.e., semantic segmentation annotations) rather than causal-effect annotations in the learning period. Formally, a sample S = {(x_i, y_i)}_{i=1}^m of inexact-supervised labeled training examples is independently and identically drawn according to an underlying distribution D defined on X × Y, where X is a set of MRI images and Y is a set of semantic segmentation ground-truth maps, one of which can be observed for each instance x_i. Each pixel in a segmentation map y_i takes one of C classes comprised of C − 1 types of normal/abnormal spinal structures and the background; that is, each pixel is assigned a class c ∈ {1, …, C}, where c denotes the c-th class. Given one spine MRI image x, the objective is to generate a medical report r. Note that the learner has no access to ground-truth reports.

We adopt this setting because the weakly-supervised learning way is supposedly the only resolution for spinal report generation. One may wonder about the alternative resolution that directly trains end-to-end medical image captioning models using the ground truth of medical reports. However, this resolution is impractical so far. On the one hand, we argue that conventional natural image captioning technologies like (Kulkarni et al., 2011) do not meet clinical demands because they cannot achieve accurate prediction of keywords, such as disease types, locations, and causal effect analyses. These keywords in spinal medical reports are exactly the significant clinical concerns, which are undoubtedly unlearnable for end-to-end image captioning models. Since the clinical concerns inside a few keywords determine the correctness of a radiological report, it is also improper to evaluate the performance of end-to-end models by comparing computer-made reports with radiologist-made reports using NLP evaluation metrics.

On the other hand, the amount of ground-truth medical reports does not meet the requirements of end-to-end image captioning models. In daily practice, radiologists write radiological reports in various styles, which leads to a lack of useful annotated medical report data comparable to image captioning datasets such as Visual Genome (Krishna et al., 2017). It is thus impractical to generate medical reports end-to-end using image captioning techniques with a small dataset.

Figure 3: The most practical workflow for spinal medical report generation.

As we have shown, the problem setting implies two critical subproblems: supervised semantic segmentation and unsupervised causal effect analysis. More specifically, as shown in Fig. 3, it is proper to decompose the task into multiple procedures: first, detect the learnable concerns by object segmentation and radiological classification (i.e., semantic segmentation) using pixel-level annotations, and then discover the latent unlearnable concern, the causal effect, without any annotations. After these two procedures, we finally fill these discovered concerns into a standard template to generate unified radiological reports.
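This decomposition can be sketched as a small pipeline. All names here (segment_structures, infer_causal_effects, REPORT_TEMPLATE) are illustrative stand-ins, not the paper's actual API, and both stages are stubbed with toy outputs:

```python
# Sketch of the decomposed workflow: segmentation -> causal reasoning -> template fill.
# Both stages are stubbed; in the paper, stage 1 is the adversarial graph network
# and stage 2 is meta-interpretive logical reasoning.

REPORT_TEMPLATE = "Findings: {findings}. Impression: {impression}."

def segment_structures(mri_image):
    """Stage 1 (supervised): detect and radiologically classify spinal
    structures. Stubbed with a fixed toy result."""
    return [("L4-L5 disc", "degenerated"), ("L4-L5 neural foramen", "stenotic")]

def infer_causal_effects(findings):
    """Stage 2 (unsupervised): reason over detected abnormalities.
    Stubbed with one hand-written rule."""
    labels = dict(findings)
    if labels.get("L4-L5 disc") == "degenerated" and \
       labels.get("L4-L5 neural foramen") == "stenotic":
        return "neural foraminal stenosis likely caused by disc degeneration"
    return "no causal relation identified"

def generate_report(mri_image):
    findings = segment_structures(mri_image)
    impression = infer_causal_effects(findings)
    text = "; ".join(f"{s} is {c}" for s, c in findings)
    return REPORT_TEMPLATE.format(findings=text, impression=impression)

print(generate_report(None))
```

The point of the sketch is the interface between the stages: the reasoner consumes only symbolic findings, never pixels, which is what makes the unsupervised causal analysis possible.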

3.2 Neural Symbolic Learning

Figure 4: Neural symbolic learning framework.

This section presents the Neural-Symbolic Learning (NSL) framework, which combines neural learning and logical reasoning to discover the learnable and unlearnable concerns simultaneously. A simple characterization of NSL is illustrated in Fig. 4. NSL comprises two newly-designed models. Firstly, an adversarial graph network (see Sec. 3.2.1) performs the semantic segmentation of multiple spinal structures. Secondly, a logical reasoning model (see Sec. 3.2.2) performs causal effect analysis and report generation.

3.2.1 Adversarial Graph Network

Fig. 5 presents the whole structure of the adversarial graph network. The adversarial graph network mainly consists of a generative adversarial network and a symbolic graph reasoning module, which are introduced below, respectively.

Generative Adversarial Network

Unlike traditional image-generation-oriented generative adversarial networks, our network is designed specifically for semantic segmentation of complex spinal structures. It includes a generative network and a discriminative network with mutual promotion. More specifically, the objective of the generative network is to predict pixel-level semantic segmentation maps, while that of the discriminative network is to supervise and promote the generative network. In the training period, the generative network aims to generate vivid maps that trick the discriminative network. In contrast, the discriminative network aims to discriminate input maps as either fake maps generated by the generative network or true maps from the ground truth. When an apparent confrontation occurs, the discriminative network actively assists the generative network in seeking out mismatches in a wide range of higher-order statistics.

Figure 5: An illustration of the structure of the newly-designed adversarial graph network.

Generative network. We construct the generative network according to the characteristics of spinal structures. Generally speaking, we set the number and kernel sizes of layers according to the receptive fields, to ensure that the receptive field of every layer coincides with the target spinal structures in MRI images. The layers of the generative network are organized as an autoencoder. As shown in Fig. 5, the encoder comprises two standard convolutional layers and four dilated convolutional layers. The decoder comprises two deconvolution layers and one standard convolutional layer. Formally, we denote by y the output feature map and by w the kernel with bias b. For each position i in the input feature map x, a dilated convolution with rate r computes its output as y[i] = σ(Σ_k x[i + r·k] w[k] + b), where σ denotes the activation function. A dilated convolution with rate r is equivalent to convolving the input feature map with an up-sampled kernel, produced by inserting zeros between two consecutive values of each kernel along each spatial dimension. Incremental dilation rates are used on the four dilated convolutional layers, respectively, according to the kernels' receptive fields. With the help of dilated convolution, the generative network can produce semantic task-aware representations using only a few parameters. In summary, the generative network enables the NSL framework to address the challenges arising from the high variability and complexity of spinal structures in MRI images.
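As a concrete illustration of the formula above, a minimal one-dimensional dilated convolution can be written directly in NumPy. This is a sketch for intuition, not the network's actual implementation:

```python
import numpy as np

def dilated_conv1d(x, w, b=0.0, rate=1):
    """Computes y[i] = sum_k x[i + rate*k] * w[k] + b over valid positions.
    rate=1 reduces to an ordinary (valid) convolution."""
    k = len(w)
    span = rate * (k - 1) + 1          # effective receptive field of the kernel
    out_len = len(x) - span + 1
    y = np.empty(out_len)
    for i in range(out_len):
        y[i] = sum(x[i + rate * j] * w[j] for j in range(k)) + b
    return y

x = np.arange(8, dtype=float)          # [0, 1, ..., 7]
w = np.array([1.0, 1.0, 1.0])          # 3-tap averaging-style kernel
print(dilated_conv1d(x, w, rate=1))    # receptive field 3 -> [3. 6. 9. 12. 15. 18.]
print(dilated_conv1d(x, w, rate=2))    # receptive field 5, same kernel -> [6. 9. 12. 15.]
```

Note how increasing the rate widens the receptive field without adding kernel parameters, which is exactly why the generative network uses incremental dilation rates.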

Discriminative network. As shown in Fig. 5, the discriminative network is a simple classification network comprised of three convolutional layers with large kernels, three batch normalization layers, three average pooling layers, and two fully connected layers with dropout. The inputs of the discriminative network are either the ground-truth maps or the generated segmentation maps from the generative network. The output is a single scalar that represents the probability that the input is a ground-truth map. Note that the discriminative network enables the generative adversarial network to correct prediction errors and break through small-dataset limitations. The discriminative network can avoid over-fitting as well as achieve continued gains in global-level contiguity, which gives the generative adversarial network reliable generalization.

Symbolic Graph Reasoning

To leverage the structural correlations of the lumbar spine, we design the symbolic graph reasoning module. The function of symbolic graph reasoning is to improve segmentation consistency by embedding useful prior knowledge into neural networks. Symbolic graph reasoning performs reasoning over a group of symbolic nodes whose outputs explicitly represent different properties of each semantic class in a prior knowledge graph. As illustrated in Fig. 5, we interpolate this module into the center between the encoder and decoder of the generative network. The symbolic graph reasoning module first constructs a symbolic graph that represents the prior semantic knowledge of spinal structures. It then receives the latent code from the encoder output of the generative network. It finally performs reasoning over the latent code within the symbolic graph. As such, the symbolic graph reasoning module mainly has two processes, a symbolic graph construction process and a symbolic graph embedding process, which are introduced below.

Symbolic graph construction. The symbolic graph is formulated as G = (N, E), where N represents the graph node set and E represents the graph edge set. In this task, the symbolic nodes of the symbolic graph represent the normal and abnormal spinal structures, and the edges represent the spatial relationships between them. Symbolic graph construction builds a group of symbolic nodes and edges that explicitly represent different properties of the prior knowledge. More specifically, assuming the target normal/abnormal spinal structures have M entities (classes), N is formalized as a sparse non-symmetric matrix of shape M × D, where D denotes the dimension of the entity values. Graph edges shoulder the responsibility of concept belongings between entities. The symbolic graph adopts soft edges that carry occurrence probabilities. In other words, each node represents one class of normal/abnormal spinal structure, and each edge between two nodes represents the relationship between the two classes.

As for the value of the i-th node n_i, we use one common feature descriptor to extract the feature of the i-th class. More specifically, we average the semantic features of the i-th class' image patches from the training dataset. For edges, we calculate the occurrence probabilities between two nodes as the values of the connecting edges, generating the overall edge set E formalized as a matrix of size M × M. After creating the symbolic graph, we embed it into the neural networks. Incorporating such high-level prior knowledge can facilitate networks to prune spurious explanations after knowing the relationship of each entity pair, resulting in good semantic coherency (Liang et al., 2018).

Symbolic graph embedding. As we have shown, the objective of symbolic graph embedding is to embed the constructed symbolic graph into the autoencoder to enhance local features with prior domain knowledge. Generally speaking, we first use an attention-based mechanism to summarize the local features encoded in the feature map of the encoder into global semantic information. This process is called local semantic attention, and it shoulders the representations of the symbolic nodes. Based on the relationship evidence of symbolic nodes, we then integrate the global semantic information with the graph representation. This process is called global graph reasoning; it leverages semantic constraints from prior knowledge of the spine image to evolve the global observations. We finally use the evolved global representations to boost the capability of each local feature representation through a global-local mapping process.
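The graph-construction step can be illustrated with toy data: node values as mean class features, and soft edges as empirical co-occurrence probabilities. The feature descriptor and class statistics below are synthetic placeholders, not the paper's actual extraction pipeline:

```python
import numpy as np

def build_nodes(patch_features_per_class):
    """Node matrix N (M x D): each row is the mean feature vector of one
    class's training patches (a stand-in for the paper's feature descriptor)."""
    return np.stack([f.mean(axis=0) for f in patch_features_per_class])

def build_edges(class_presence):
    """Edge matrix E (M x M) of soft co-occurrence weights:
    E[a, b] = empirical P(class b appears | class a appears)."""
    M = class_presence.shape[1]
    E = np.zeros((M, M))
    for a in range(M):
        present = class_presence[:, a] == 1
        if present.any():
            E[a] = class_presence[present].mean(axis=0)
    return E

rng = np.random.default_rng(0)
feats = [rng.normal(size=(5, 4)) for _ in range(3)]    # 3 classes, D = 4, 5 patches each
presence = np.array([[1, 1, 0],                         # per-image class presence
                     [1, 1, 1],
                     [1, 0, 1]])
N = build_nodes(feats)       # shape (3, 4)
E = build_edges(presence)    # e.g. E[1, 2] = 0.5: class 2 co-occurs with class 1 half the time
print(N.shape, E)
```

Note that E is non-symmetric by construction (conditional, not joint, probabilities), matching the sparse non-symmetric formalization in the text.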

Local semantic attention process. Formally, the local semantic attention process first receives the hidden feature tensor X ∈ R^{H×W×D} from the encoder outputs, where H, W, and D represent the height, width, and depth of the feature maps from the final encoder layer, respectively. We use two convolutional layers to convert X into X_a and X_f, respectively. Next, the tensor X_a is reshaped to R^{HW×M}, and a softmax is applied to formalize the attention mechanism over the importance of the distinct symbolic nodes. The tensor X_f is reshaped to R^{HW×D} and multiplied into the final output x^s with the same size as the graph entities (M × D). The unifying process can be presented by a function x^s = f(X):

x^s = A^T (X W_f),

where W_f is the trainable transformation matrix for converting each local feature into the same dimension as the entity representations. The attention matrix A is computed as:

A = softmax(X W_a).

Here W_a is a trainable weight matrix for calculating voting weights, and A ∈ R^{HW×M}. The attention weight A_{ij} represents the importance of assigning the local feature x_i to the node n_j (Liang et al., 2018).
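A minimal NumPy sketch of this attention-style summarization, with `W_a` and `W_f` standing in for the two convolutional branches. Random weights and a softmax over spatial locations are one plausible reading of the mechanism, assumed here for illustration:

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Summarize HW local features into M symbolic-node representations.
H, W, D, M = 4, 4, 8, 3
rng = np.random.default_rng(1)
X = rng.normal(size=(H * W, D))       # flattened local features (HW x D)
W_a = rng.normal(size=(D, M))         # attention branch (voting weights)
W_f = rng.normal(size=(D, D))         # feature-transform branch

A = softmax(X @ W_a, axis=0)          # (HW x M): each node's weights over locations sum to 1
x_s = A.T @ (X @ W_f)                 # (M x D): one attention-pooled summary per symbolic node
print(x_s.shape)                      # (3, 8)
```

Each column of A is a soft assignment of spatial locations to one symbolic node, so x_s is a weighted pooling of local features per node.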

Global graph reasoning process. The global graph reasoning process performs graph propagation over representations of all symbolic nodes via the matrix multiplication form, resulting in the evolved features :


where . which integrates the prior information by concatenating node representation . is a trainable weight matrix. The node adjacency weight is hard weight (i.e. {0,1}) if adjacent and occur simultaneously in the spinal MRI image. To avoid the feature scale shift problem from large magnitude, is normalized into in which all rows sum to one, such that


where the normalized adjacency is D^-1 (A + I); here A + I adds self-connections, I is the identity matrix, and D is the diagonal degree matrix of A + I.
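A minimal sketch of the row normalization described above, using the standard add-self-connections-then-row-normalize form:

```python
import numpy as np

def normalize_adjacency(a):
    """Row-normalize the hard adjacency matrix after adding self-connections,
    so every row sums to one and feature scales stay stable during propagation."""
    a_tilde = a + np.eye(a.shape[0])            # add self-loops: A + I
    deg = a_tilde.sum(axis=1, keepdims=True)    # diagonal degree entries
    return a_tilde / deg                        # D^{-1}(A + I)
```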

Global-local mapping process. Similar to the local semantic attention process, this final process maps the integrated tensors back to the input of the decoder:


where a learnable transformation matrix is again applied. In contrast with the local semantic attention weights, this attention weight matrix performs the reverse mapping by evaluating the compatibility of each symbolic node with each local feature:


In summary, combining the high-level constructed symbolic graph with the symbolic graph embedding process leads to hybrid reasoning behaviors and merges prior knowledge into the middle of the autoencoder. The symbolic graph reasoning module is capable of representing the inherent features of target structures, measuring the connection weights between normal and abnormal structures, and injecting the spine graph into the generative network. The graph reasoning thus enables the generative network to dynamically model the latent yet crucial correlations between normal and abnormal structures.

Learning Strategy

The learning strategy of the adversarial graph network has two stages: 1) construct the symbolic graph, and 2) optimize the network. Since the first stage has been presented in the corresponding section above, we only introduce the second stage here.

We denote the generative network by G and the discriminative network by D, each with its own learnable variables. As such, G(x) represents the predicted segmentation map, and D(.) denotes the probability that its input is a ground-truth segmentation map. The objective of the adversarial graph network is to generate optimal segmentation maps in which the value of each pixel represents a radiological classification result. Inspired by Goodfellow et al. (2014), we minimize a hybrid loss function, defined as the sum of a generative loss and an adversarial loss weighted by a balance coefficient:


The balance coefficient controls the equilibrium of adversarial training and is set to one without loss of generality. The generative loss function is a weighted multi-class cross-entropy loss; the class weights balance the predictions of the generative network according to the pixel counts of the target classes. The discriminative loss function is a binary cross-entropy loss with stable convergence.
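The hybrid objective described above (a class-weighted multi-class cross-entropy plus a weighted adversarial binary cross-entropy) can be sketched as follows; all variable names are hypothetical and the adversarial term is written from the generator's perspective.

```python
import numpy as np

def hybrid_loss(probs, labels, d_real, class_weights, lam=1.0):
    """Sketch of the hybrid objective (names hypothetical):
    a class-weighted multi-class cross-entropy for the generator plus
    lam times a binary cross-entropy adversarial term."""
    eps = 1e-12
    # weighted pixel-wise cross-entropy over one-hot labels
    ce = -(class_weights * labels * np.log(probs + eps)).sum(axis=-1).mean()
    # the discriminator should score generated maps as real (label 1)
    adv = -np.log(d_real + eps).mean()
    return ce + lam * adv
```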

Figure 6: An illustration of the workflow of symbolic logical reasoning.

3.2.2 Symbolic Logical Reasoning

Based on the results of the adversarial graph network, symbolic logical reasoning conducts human-like reasoning that achieves unsupervised causal effect analysis of the detected entities of spinal diseases. Symbolic logical reasoning utilizes meta-interpretive learning and first-order logic programming, bringing in background knowledge to remedy the lack of supervision information. Causal effect for spinal report generation refers to the pathological relations between detected spinal diseases, which are valuable keywords in diagnostic reports in spine radiology. As such, causal effect analysis is indispensable in automated spinal report generation. It is noteworthy that causal effect analysis can significantly 1) promote clinical pathogenesis-based diagnosis, and 2) help early diagnosis when a pathogenic factor occurs in isolation. Fig. 6 presents the workflow of symbolic logical reasoning. Generally speaking, we split the task of unsupervised causal effect analysis into two steps: 1) induce the hypothesis of the causal effect between target spinal diseases using meta-interpretive learning; and 2) conduct unsupervised labeling of segmented spinal structures using first-order logic programming. Based on the induced hypothesis, we finally obtain the causal effect between labeled spinal structures. We present the two steps in turn.

Meta-Interpretive Learning for Hypothesis Induction of Causal Effect between Target Spinal Diseases

The objective of hypothesis induction is to summarize the pathological relations between target spinal diseases. We use meta-interpretive learning, a novel inductive logic programming framework, to induce the causal effect hypothesis of pathological relations, formalized as first-order logic clauses. Meta-interpretive learning (MIL) was proposed by Muggleton et al. (2015); it supports predicate invention and efficient learning of logical hypotheses because it can execute higher-order logic programming. Predicate invention of unknown concepts can expand closed-world machine learning into open-world machine learning, improving generalization and robustness. The inputs include a knowledgebase B and a set of logical facts E. The knowledgebase B consists of key background knowledge, such as common sense about spinal structures. The logical facts E can be viewed as training examples comprised of positive and negative examples, i.e., E = E+ ∪ E-. The training examples are collected according to the relationship facts between spinal structures in the training dataset. Hypothesis induction learns a hypothesis H that defines the pathological relations by B, M, H ⊨ E, where M is a set of meta-rules. Meta-rules are second-order logic clauses that view the predicates and functions of first-order logic clauses as variables, which can be grounded by abductive reasoning from B and E. The symbol ⊨ is entailment, representing that E is entailed only if B, M, and H are jointly satisfied. To learn the logical hypothesis, we use inverse entailment to convert the induction problem into a deduction problem: B, M, ¬E ⊨ ¬H, where ¬E is the negation of E, so that the raw hypothesis can be obtained by negating the inverse entailment result. A logical hypothesis of a concept class is comprised of a set of logic clauses and can be partitioned into logical atoms,


where each atom represents a specific target spinal structure. Atoms are first-order logic formulas without connectives (∧, ∨), such as p(t1, ..., tn), where p is a predicate and t1, ..., tn are terms. Terms are constants, variables, or structured terms of the form f(t1, ..., tn), where f is a functor. A literal is an atom or its negation. A clause that does not contain any variable is grounded, and grounded atoms are ground facts.

The workflow of MIL is to continually prove a set of logical facts against the background knowledge by fetching higher-order meta-rules. The proving process is a predicate substitution process, and a predicate is invented if the substituted predicate does not exist in the knowledgebase. The background knowledge used in this work is the clinical knowledge about the pathological relations between target spinal diseases: lumbar neural foraminal stenosis (NFS), intervertebral disc deformation (IDD), and lumbar vertebral deformation (LVD). Part of the background knowledge is shown as follows.

  %Logical Predicate.
  mayCause/2. dis/1.
  %Background Knowledge.
  dis(IDD). dis(LVD).
  dis(NFS). dis(others).

The logical predicate mayCause/2 represents that one disease may cause another; dis/1 denotes that its argument is one of the diseases; others represents the remaining pathogenic factors. The examples are extracted from the training dataset.

Finally, the logical hypothesis of the causal effect between target spinal diseases is induced as follows:

  cause(A, B, C):- dis(A), dis(B), dis(C),
           mayCause(A, C); mayCause(B, C).

Here the symbol semicolon (;) represents disjunction (∨).
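The induced clause can be read procedurally; a small Python sketch of its semantics follows, where the set-based encodings of `dis` and `mayCause` are assumptions introduced for illustration.

```python
def cause(a, b, c, dis, may_cause):
    """Sketch of the induced clause: diseases a and b jointly explain c
    if all three are known diseases and either a or b may cause c.
    `dis` is a set of disease names; `may_cause` is a set of (cause, effect)
    pairs (both hypothetical encodings of the Prolog facts)."""
    return (a in dis and b in dis and c in dis
            and ((a, c) in may_cause or (b, c) in may_cause))
```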

First-Order Logic Programming for Unsupervised Labeling of Segmented Spinal Structures.

After the hypothesis induction process, we find the index of the segmented spinal structures, i.e., we label the order of the segmented spinal structures. Since the segmentation ground truth does not contain order information, the labeling must be unsupervised. It is worth noting that unsupervised labeling is the basis for the subsequent causal analysis and report generation. We leverage first-order logic programming to achieve this process.

As shown in Fig. 6, the inputs of the unsupervised labeling process are the segmentation maps generated by the adversarial graph network, and the outputs are several dictionaries comprised of the orders and normalities of spinal structures. The keys of each dictionary are the orders of one type of structure, while the values are the normality conditions at the sites of that structure type in a lumbar spine. The first step is to discover patterns for the location assignment of spinal structures. According to domain knowledge, locations and surrounding correlations are the inherent patterns of lumbar spinal structures; i.e., in a lumbar spine, all intervertebral discs are separated by vertebrae, much like the alternating black and white keys of a piano. This observation can be described by the following first-order logic program:

  %Logical Predicate.
  same/2. adj/2. sep/3.
  %Background Knowledge.
  sep(X, Y, Z) :- same(X, Y),
          adj(X, Z), adj(Z, Y).
Here, same/2, adj/2, and sep/3 are first-order predicates representing that X is the same type as Y, that X is adjacent to Z, and that Z separates X and Y, respectively. The symbol :- represents logical implication (←), and the comma (,) represents conjunction (∧). X, Y, and Z represent distinct variables. The final clause is a separation hypothesis describing that X and Y are separated by Z if and only if X is of the same type as Y and both are adjacent to Z.

Because the segmented structures in the segmentation maps always contain a few spurious spots, the second step is a post-processing procedure that eliminates these spots and estimates the correct labels of isolated structures. The clinical concerns in medical reports are the conditions of the lumbar vertebrae, discs, and neural foramina from L1 to L5. Taking lumbar discs as an example, we first compute the minimal height of the vertebrae in the training set and then use a quarter of that height as the margin between pixels of intervertebral discs. The order can be determined according to the above logical clauses. We then compare the pixel counts of the normal and abnormal labels and choose the one with the most pixels as the final label. We finally collect the labeling results into a standard dictionary for the next process. After obtaining the order of the segmented spinal structures, we input them into the causal effect hypothesis to analyze the pathological relations between target spinal diseases.
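The ordering-and-majority-vote post-processing described above can be sketched as follows; the region representation (top-of-region y-coordinate) and the per-region pixel-count inputs are hypothetical simplifications of the paper's pipeline.

```python
def label_structures(regions, normal_pixels, abnormal_pixels):
    """Sketch of the post-processing step (interfaces hypothetical):
    order the segmented regions top-to-bottom and assign each region
    the class with the larger pixel count, yielding the order->normality
    dictionary used for report generation."""
    # sort region ids by their vertical position (smaller y = higher up)
    order = sorted(regions, key=lambda r: regions[r])
    return {
        i + 1: "normal" if normal_pixels[r] >= abnormal_pixels[r] else "abnormal"
        for i, r in enumerate(order)
    }
```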

Spinal report generation. In the end, we summarize the discoveries from the segmentation and causal effect analysis, then fill these discoveries into a unified template. We use If-Then logical operations to create the unified template. For instance, if the neural foramen, disc, and vertebra are all abnormal at L3-L4, the captioning process outputs "At L3-L4, the intervertebral disc has obvious degenerative changes. The above vertebra also has deformation changes. They lead to the neural foraminal stenosis to a certain extent." If the neural foramen is normal but the disc or vertebra is abnormal, one can predict that the neural foramen has a large possibility of becoming stenotic.
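The If-Then template logic can be sketched directly; the sentence wording follows the examples in the text, while the function signature is a hypothetical illustration.

```python
def fill_template(level, foramen_abnormal, disc_abnormal, vertebra_abnormal):
    """Sketch of the If-Then report template: pick a sentence for one
    lumbar level based on the normality flags of its structures."""
    if foramen_abnormal and disc_abnormal and vertebra_abnormal:
        return (f"At {level}, the intervertebral disc has obvious degenerative "
                "changes. The above vertebra also has deformation changes. "
                "They lead to the neural foraminal stenosis to a certain extent.")
    if not foramen_abnormal and (disc_abnormal or vertebra_abnormal):
        return (f"At {level}, the neural foramen is currently normal but has "
                "a large possibility to become stenotic.")
    return f"At {level}, no obvious abnormality is observed."
```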

4 Experiments

4.1 Data and Configuration

The NSL framework is evaluated on a real-world clinical dataset collected from multiple centers and various vendor models. It includes 253 clinical patients (147 females and 106 males) with an average age of 53±38 years. Among the sequential T1/T2-weighted MRI scans of each patient, one middle lumbar MRI image was selected to best present the neural foramina, discs, and vertebrae simultaneously in the sagittal direction. In each MRI image, we can observe three types of spinal structures: neural foramina, intervertebral discs, and lumbar vertebrae. These three types of spinal structures are associated with three types of spinal diseases: lumbar neural foraminal stenosis (NFS), intervertebral disc deformation (IDD), and lumbar vertebral deformation (LVD). The ground truth was annotated by extraction from clinical reports and double-checked by board-certified radiologists.

The framework directly handles clinical MRI images without any pre/post-processing or data augmentation. The feature descriptor for graph construction is the Histogram of Oriented Gradients (HOG). The generative network uses the RMSProp algorithm to optimize its weights, while the discriminative network uses the Adam optimization algorithm. The weights of both networks are initialized with Xavier initialization. Considering that the task of the generative network is harder than that of the discriminative network, the two optimizers use different initial learning rates. For the RMSProp optimizer, decay is 0.9 and momentum is 0.9. For the Adam optimizer, beta1 is 0.9 and beta2 is 0.999. The adversarial graph network is implemented in Python with the TensorFlow library (Abadi et al., 2016). The logical reasoning model is implemented in Prolog. We use a mini-batch size of 4 and train for 300 epochs on one Nvidia Titan X GPU with cuDNN v5.1 and an Intel Xeon(R) E5-2620@2.5GHz CPU. We split the whole dataset into a training set (80%) and a testing set (20%). Standard five-fold cross-validation on the training set is employed for model selection.

4.2 Experimental Design

The evaluation metrics include pixel-level accuracy, Dice coefficient, specificity, and sensitivity. The semantic segmentation of a spinal structure is counted as correct if the structure is pixel-wise segmented and classified correctly.
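The four evaluation metrics can be computed per class from the confusion counts; a minimal NumPy sketch (interface assumed) is:

```python
import numpy as np

def segmentation_metrics(pred, gt, cls):
    """Sketch of the per-class evaluation metrics: pixel accuracy over the
    whole map, plus Dice, specificity, and sensitivity for class `cls`."""
    p, g = (pred == cls), (gt == cls)
    tp = np.logical_and(p, g).sum()    # class pixels predicted as class
    tn = np.logical_and(~p, ~g).sum()  # non-class pixels predicted non-class
    fp = np.logical_and(p, ~g).sum()
    fn = np.logical_and(~p, g).sum()
    return {
        "pixel_accuracy": (pred == gt).mean(),
        "dice": 2 * tp / max(2 * tp + fp + fn, 1),
        "specificity": tn / max(tn + fp, 1),
        "sensitivity": tp / max(tp + fn, 1),
    }
```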

We compare the semantic segmentation ability of our neural symbolic learning framework (NSL) with several state-of-the-art semantic segmentation networks as follows.

  • Fully Convolutional Network (FCN) (Shelhamer et al., 2017). FCN is a pixels-to-pixels semantic segmentation network. It transforms fully connected layers into convolutional layers with multi-resolution layers. The FCN-VGG16 is used for comparison, and the deconvolution layers of FCN-VGG16 are initialized by bi-linear up-sampling.

  • SegNet (Badrinarayanan et al., 2015). SegNet is an encoder-decoder semantic segmentation network in which the decoder up-samples its lower-resolution input feature maps. The backbone network of SegNet used here is VGG16 with 13 convolutional layers.

  • DeepLabV3+ (Chen et al., 2018). It extends DeepLabV3 (Chen et al., 2017) by adding a decoder module to refine the segmentation results.

  • U-Net (Ronneberger et al., 2015). U-Net is a very popular semantic segmentation network that is primarily designed for medical image segmentation. The core of U-Net is the skip connections appended between encoder and decoder layers.

  • Spine-GAN (Han et al., 2018c)

    . It is the state-of-the-art network for the semantic segmentation of multiple spinal structures. Spine-GAN is an adversarial network that uses a local long short-term memory module (Local-LSTM) to model the spatial relationships of spinal structures.

  • Generative Network without the symbolic graph reasoning module (GN-SGR). GN-SGR is an ablated version of the adversarial graph network with the autoencoder only.

  • Adversarial Graph Network without the symbolic graph reasoning module (AGN-SGR). It is an ablated version combining the generative and discriminative networks.

  • Adversarial Graph Network without the discriminative network (AGN-DN). It is an ablated version by combining the generative network and symbolic graph reasoning as well as removing the discriminative network.

Figure 7: An illustration of the generated radiological reports by combining neural learning and symbolic reasoning in a mutually beneficial way.

4.3 Results

4.3.1 Medical Report Generation

The representative radiological reports generated by the proposed NSL framework are illustrated in Fig. 7. Empirical results show that NSL can directly generate radiologist-level diagnosis reports from weakly-supervised information. These results justify the significance of unifying deep neural learning and symbolic logical reasoning, and verify that NSL integrates the advantages of neural learning for noisy data processing with those of logical reasoning for knowledge representation.

As shown in the first report in Fig. 7, the learnable and unlearnable concerns of spinal structures are predicted accurately and reliably, thanks to the powerful segmentation ability of the adversarial graph network. The labeling information in the generated reports is also robust, which demonstrates the correctness of the first-order logic programming based on clinical background knowledge. This again justifies the value of embedding domain knowledge into the learning process.

As shown by the purple text in Fig. 7, NSL achieves reliable causal effect analysis thanks to symbolic logical reasoning. NSL can also produce pathological correlations between the spinal diseases NFS, LVD, and IDD, which demonstrates the feasibility and effectiveness of meta-interpretive learning. In the first report shown in Fig. 7, NSL automatically reports that the pathogenic factors of the NFS at L4-L5 are the surrounding L5 vertebra (LVD) and the L4-L5 intervertebral disc (IDD). Also, in the second report in Fig. 7, NSL correctly discovers that the abnormal L5 disc is the pathogenic factor of the L4-L5 NFS.

The generated unified reports justify that the weakly-supervised approach is robust and endow our framework with potential as a clinical tool to relieve radiologists from laborious workloads to a certain extent. Although it is difficult to judge the correctness of computer-made medical reports against radiologist-made reports using NLP metrics, it is possible to evaluate the accuracy of the keywords covering critical concerns in the generated spinal medical reports by computing metrics of semantic segmentation performance and labeling accuracy.

Method Pixel accuracy Dice coefficient Specificity Sensitivity
FCN 0.917±0.004 0.754±0.033 0.754±0.035 0.712±0.032
SegNet 0.945±0.002 0.760±0.032 0.795±0.043 0.719±0.024
DeepLab 0.953±0.001 0.812±0.021 0.799±0.035 0.827±0.017
U-Net 0.920±0.004 0.797±0.013 0.816±0.027 0.770±0.026
Spine-GAN 0.962±0.003 0.871±0.004 0.891±0.017 0.860±0.025
GN-SGR 0.958±0.002 0.841±0.013 0.862±0.018 0.823±0.024
AGN-SGR 0.960±0.004 0.863±0.006 0.873±0.015 0.855±0.027
AGN-DN 0.961±0.003 0.853±0.006 0.869±0.023 0.853±0.022
NSL 0.965±0.004 0.879±0.003 0.903±0.012 0.872±0.023
Table 1: NSL has superior effectiveness on the semantic segmentation, which is demonstrated by the comparison with state-of-the-art methods as well as its ablation studies.
Method Dice coefficient
Normal vertebrae LVD Normal disc IDD Normal foramen NFS
FCN 0.870±0.017 0.701±0.046 0.730±0.055 0.725±0.070 0.785±0.039 0.711±0.018
SegNet 0.889±0.009 0.760±0.023 0.695±0.019 0.776±0.014 0.756±0.037 0.684±0.021
DeepLabv3+ 0.895±0.012 0.765±0.036 0.746±0.035 0.824±0.060 0.833±0.042 0.808±0.013
U-Net 0.878±0.007 0.726±0.036 0.772±0.025 0.803±0.020 0.821±0.017 0.782±0.010
Spine-GAN 0.930±0.011 0.810±0.016 0.840±0.026 0.873±0.011 0.900±0.011 0.870±0.018
GN-SGR 0.917±0.009 0.799±0.026 0.809±0.028 0.839±0.011 0.863±0.029 0.815±0.019
AGN-SGR 0.929±0.010 0.807±0.015 0.835±0.021 0.857±0.015 0.887±0.009 0.854±0.014
AGN-DN 0.928±0.010 0.808±0.013 0.836±0.011 0.858±0.014 0.889±0.027 0.858±0.017
NSL 0.934±0.013 0.821±0.015 0.845±0.021 0.874±0.012 0.913±0.015 0.874±0.015
Table 2: Our method obtains satisfying performance on Dice coefficient.
Method Specificity Sensitivity
FCN 0.875±0.025 0.638±0.072 0.745±0.041 0.737±0.085 0.726±0.039 0.672±0.029
SegNet 0.906±0.002 0.731±0.012 0.746±0.015 0.738±0.017 0.755±0.032 0.662±0.022
DeepLabv3+ 0.894±0.010 0.717±0.027 0.786±0.035 0.761±0.020 0.852±0.025 0.865±0.012
U-Net 0.889±0.042 0.746±0.031 0.814±0.057 0.729±0.079 0.811±0.049 0.769±0.060
Spine-GAN 0.921±0.020 0.844±0.063 0.907±0.047 0.831±0.084 0.871±0.029 0.876±0.029
GN-SGR 0.907±0.027 0.810±0.039 0.867±0.032 0.817±0.070 0.844±0.027 0.821±0.027
AGN-SGR 0.918±0.019 0.804±0.028 0.893±0.020 0.830±0.080 0.869±0.029 0.856±0.039
AGN-DN 0.919±0.021 0.818±0.043 0.895±0.022 0.835±0.091 0.872±0.024 0.857±0.041
NSL 0.932±0.025 0.847±0.053 0.915±0.042 0.842±0.085 0.875±0.026 0.879±0.028
Table 3: NSL shows superior radiological classification effectiveness on specificity and sensitivity of three spinal diseases.
Figure 8: An illustration of semantic segmentation results. NSL has achieved reliable performance in the semantic segmentation of neural foramina, intervertebral discs, and vertebrae, which demonstrates that NSL is an efficient framework for clinical application in spine radiology. The left, middle, and right columns represent MRI images, ground-truth maps, and generated maps, respectively. Color bar: 0: background; 1: normal vertebrae; 2: LVD; 3: normal disc; 4: IDD; 5: normal foramen; 6: NFS (best viewed in color).

4.3.2 Semantic Segmentation Performance

Figure 9: An illustration of bad cases of semantic segmentation.
Figure 10: An illustration of generated feature maps from the layer after symbolic graph reasoning. We can see that the learned representations are high-level semantics representing specific spinal structures.

As illustrated in Table 1, we achieve higher performance than the compared state-of-the-art methods in the semantic segmentation of the three types of spinal structures. NSL significantly outperforms the FCN network by 4.8% in pixel accuracy and 12.5% in average Dice coefficient, and outperforms the U-Net network by 4.5% in pixel accuracy and 8.2% in average Dice coefficient. Table 2 and Table 3 further demonstrate the effectiveness and advantages of NSL. NSL simultaneously achieves accurate segmentation and precise radiological classification of neural foramina, intervertebral discs, and vertebrae, as shown in Fig. 8. Even though the structural complexity and ambiguous correlations between various spinal structures create unusual difficulties, NSL obtains robust performance, which demonstrates its strengths in addressing spatial relationships and high structural variability. Fig. 11 presents detailed charts that show the visible improvement achieved by our algorithm. Representative bad cases are shown in Fig. 9; they appear to be caused by atypical MRI images that contain more spinal structures than general MRI images, and they seldom impact report generation performance. The reason for choosing semantic segmentation over object detection is that segmentation presents more spatial details.

Figure 11: An analysis of the compared methods. Our framework, NSL, obtains the best results when compared with existing methods and in ablation studies.

Regarding the ablation study, the experimental results shown in Fig. 11 (e, f) and Tables 1, 2, and 3 (rows 6 to 9) demonstrate the indispensability and effectiveness of the three modules of the adversarial graph network. Firstly, the base generative network without the symbolic graph reasoning module achieves on average 95.8%±0.2 pixel accuracy and 84.1%±1.3 Dice coefficient, which is already higher than the other segmentation networks. This result demonstrates that the generative network can obtain deep semantic representations and preserve fine-grained differences between normal and abnormal structures. Secondly, the generative network with the symbolic graph reasoning module achieves 96.1%±0.3 pixel accuracy and 86.9%±2.3 Dice coefficient, exceeding the base module by 0.3% and 2.5%, respectively. This result demonstrates the capability of symbolic graph reasoning in dynamically modeling the latent yet crucial spatial correlations between neighboring structures. Thirdly, the adversarial graph network without the symbolic graph reasoning module achieves on average 96.0%±0.4 pixel accuracy and 86.3%±0.6 Dice coefficient, exceeding the base ACAE module by 0.2% and 2.2%, respectively. This result demonstrates that adversarial training can effectively supervise the generative network to correct semantic segmentation errors. Besides, the representative feature maps in Fig. 10 intuitively demonstrate the ability of the symbolic graph reasoning module. Finally, the combination of the three modules (NSL) achieves better performance than any of its ablated versions. Regarding radiological classification, NSL also achieves higher specificity and sensitivity than its ablated versions. Therefore, the combination of these sub-modules makes NSL an efficient and reliable solution for the semantic segmentation of multiple spinal structures.

Figure 12: An illustration of unsupervised labeling results. The labels indicate the order index and the normality of each structure (vertebra or neural foramen). Normal structures are shown in green, while abnormal structures are shown in yellow.

4.3.3 Unsupervised Labeling Accuracy

As illustrated in Fig. 12, the symbolic logical reasoning model achieves stable unsupervised labeling with a highly accurate labeling rate. Under the condition of accurate semantic segmentation, the labeling accuracy reaches 100%, which proves the robustness of the first-order logic programming. This again demonstrates the importance of logical reasoning for knowledge representation. The combination of logical reasoning and neural learning can handle both noisy data and knowledge representation, moving toward machine learning with generalization, robustness, and interpretability.

5 Conclusion

In this paper, we proposed the Neural-Symbolic Learning (NSL) framework for the automated generation of medical diagnosis reports in spine radiology. NSL combines neural learning and symbolic reasoning in a mutually beneficial way; as such, it is a human-like learning framework with visual perception ability and high-level logical reasoning strength. This combination boosts the generalization and interpretability of neural learning and naturally yields a robust solution. Extensive results have demonstrated its effectiveness and its potential as a clinical tool to relieve spinal radiologists from laborious workloads. The framework is scalable and sustainable, so it can be easily extended to other diseases that require radiological report generation.


This work was funded by the National Natural Science Foundation of China (Grant Nos. 61872225, 61876098, 61573219), Natural Science Foundation of Shandong Province (Grant No. ZR2015FM010), Project of Science and Technology Plan of Shandong Higher Education Institutions Program (Grant No. J15LN20), and Project of Shandong Province Medical and Health Technology Development Program (Grant No. 2016WS0577).


  • M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. (2016) Tensorflow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467. Cited by: §4.1.
  • R. S. Alomari, J. J. Corso, and V. Chaudhary (2011) Labeling of lumbar discs using both pixel- and object-level features with a two-level probabilistic model. IEEE Transactions on Medical Imaging 30 (1), pp. 1–10. External Links: Document, ISSN 0278-0062 Cited by: §2.1.
  • V. Badrinarayanan, A. Kendall, and R. Cipolla (2015) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561. Cited by: item 2.
  • Y. Cai, S. Osman, M. Sharma, M. Landis, and S. Li (2015) Multi-modality vertebra recognition in arbitrary views using 3d deformable hierarchical model. IEEE Transactions on Medical Imaging 34 (8), pp. 1676–1693. External Links: Document, ISSN 0278-0062 Cited by: §2.1.
  • Y. Cai, S. Leungb, J. Warringtonb, S. Pandeyb, O. Shmuilovichb, and S. Lib (2017) Direct spondylolisthesis identification and measurement in mr/ct using detectors trained by articulated parameterized spine model. In Proc. of SPIE Vol, Vol. 10133, pp. 1013319–1. Cited by: §2.1.
  • L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille (2016) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv preprint arXiv:1606.00915. Cited by: §2.2.1.
  • L. Chen, G. Papandreou, F. Schroff, and H. Adam (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587. Cited by: §2.2.1, item 3.
  • L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv:1802.02611. Cited by: item 3.
  • Z. Chen, X. Wei, P. Wang, and Y. Guo (2019) Multi-label image recognition with graph convolutional networks. In

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    pp. 5177–5186. Cited by: §2.2.1.
  • J. J. Corso, A. Raja’S, and V. Chaudhary (2008) Lumbar disc localization and labeling with a probabilistic model on both pixel and object features. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 202–210. Cited by: §2.1.
  • C. Cortes and V. Vapnik (1995) Support-vector networks. Machine learning 20 (3), pp. 273–297. Cited by: §2.2.2.
  • W. Dai, Q. Xu, Y. Yu, and Z. Zhou (2018) Tunneling neural perception and logic reasoning through abductive learning. arXiv preprint arXiv:1802.01173. Cited by: §2.2.2, §2.2.2.
  • L. De Raedt and A. Kimmig (2015) Probabilistic (logic) programming concepts. Machine Learning 100 (1), pp. 5–47. Cited by: §2.2.2.
  • H. Dong, J. Mao, T. Lin, C. Wang, L. Li, and D. Zhou (2019) Neural logic machines. arXiv preprint arXiv:1904.11694. Cited by: §2.2.2.
  • D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams (2015) Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pp. 2224–2232. Cited by: §2.2.1.
  • A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 (7639), pp. 115. Cited by: §1.
  • N. Friedman, D. Geiger, and M. Goldszmidt (1997) Bayesian network classifiers. Machine learning 29 (2-3), pp. 131–163. Cited by: §2.2.2.
  • S. Ghosha, A. Raja’S, V. Chaudharya, and G. Dhillonb (2011) Automatic lumbar vertebra segmentation from clinical ct for wedge compression fracture diagnosis. work 9, pp. 11. Cited by: §2.1.
  • A. Giusti, D. C. Ciresan, J. Masci, L. M. Gambardella, and J. Schmidhuber (2013)

    Fast image scanning with deep max-pooling convolutional neural networks

    In Image Processing (ICIP), 2013 20th IEEE International Conference on, pp. 4034–4038. Cited by: §2.2.1.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §2.2.1, §3.2.1.
  • Z. Han, B. Wei, S. Leung, J. Chung, and S. Li (2018a) Towards automatic report generation in spine radiology using weakly supervised framework. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-López, and G. Fichtinger (Eds.), Cham, pp. 185–193. External Links: ISBN 978-3-030-00937-3 Cited by: §1.
  • Z. Han, B. Wei, S. Leung, I. B. Nachum, D. Laidley, and S. Li (2018) Automated pathogenesis-based diagnosis of lumbar neural foraminal stenosis via deep multiscale multitask learning. Neuroinformatics 16 (3), pp. 325–337. External Links: ISSN 1559-0089, Document Cited by: §1, §1, §2.1.
  • Z. Han, B. Wei, A. Mercado, S. Leung, and S. Li (2018b) Spine-gan: semantic segmentation of multiple spinal structures. Medical Image Analysis 50, pp. 23 – 35. External Links: ISSN 1361-8415, Document, Link Cited by: §1.
  • Z. Han, B. Wei, A. Mercado, S. Leung, and S. Li (2018c) Spine-gan: semantic segmentation of multiple spinal structures. Medical image analysis 50, pp. 23–35. Cited by: §1, item 5.
  • X. He, M. Landisa, S. Leunga, J. Warringtona, O. Shmuilovicha, and S. Lia (2017a) Automated grading of lumbar disc degeneration via supervised distance metric learning. In Proc. of SPIE Vol, Vol. 10134, pp. 1013443–1. Cited by: §2.1.
  • X. He, A. Lum, M. Sharma, G. Brahm, A. Mercado, and S. Li (2017b) Automated segmentation and area estimation of neural foramina with boundary regression model. Pattern Recognition 63, pp. 625–641. Cited by: §1, §2.1.
  • X. He, Y. Yin, M. Sharma, G. Brahm, A. Mercado, and S. Li (2016) Automated diagnosis of neural foraminal stenosis using synchronized superpixels representation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 335–343. Cited by: §2.1.
  • X. He, H. Zhang, M. Landis, M. Sharma, J. Warrington, and S. Li (2017c) Unsupervised boundary delineation of spinal neural foramina using a multi-feature and adaptive spectral segmentation. Medical image analysis 36, pp. 22–40. Cited by: §1, §2.1.
  • M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian (1990) A real-time algorithm for signal analysis with the help of the wavelet transform. In Wavelets, pp. 286–297. Cited by: §2.2.1.
  • S. Huang, Y. Chu, S. Lai, and C. L. Novak (2009) Learning-based vertebra detection and iterative normalized-cut segmentation for spinal MRI. IEEE Transactions on Medical Imaging 28 (10), pp. 1595–1605. Cited by: §2.1.
  • A. Jamaludin, T. Kadir, and A. Zisserman (2017) SpineNet: automated classification and evidence visualization in spinal MRIs. Medical Image Analysis 41, pp. 63–73. Note: Special Issue on MICCAI 2016 External Links: ISSN 1361-8415, Document, Link Cited by: §2.1.
  • L. Kaufman, M. Mineyev, S. Powers, and D. Goldhaber (2005) Methods for generating a lung report. Google Patents. Note: US Patent 6,901,277 Cited by: §2.1.
  • B. M. Kelm, M. Wels, S. K. Zhou, S. Seifert, M. Suehling, Y. Zheng, and D. Comaniciu (2013) Spine detection in CT and MR using iterated marginal space learning. Medical Image Analysis 17 (8), pp. 1283–1292. Cited by: §2.1.
  • S. Kim, J. W. Lee, J. W. Chai, H. J. Yoo, Y. Kang, J. Seo, J. M. Ahn, and H. S. Kang (2015) A new mri grading system for cervical foraminal stenosis based on axial t2-weighted images. Korean journal of radiology 16 (6), pp. 1294–1302. Cited by: §1.
  • T. Klinder, R. Wolz, C. Lorenz, A. Franz, and J. Ostermann (2008) Spine segmentation using articulated shape models. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2008, pp. 227–234. Cited by: §2.1.
  • S. Kohl, D. Bonekamp, H. Schlemmer, K. Yaqubi, M. Hohenfellner, B. Hadaschik, J. Radtke, and K. Maier-Hein (2017) Adversarial networks for the detection of aggressive prostate cancer. arXiv preprint arXiv:1702.08014. Cited by: §2.2.1.
  • D. Koller, N. Friedman, S. Džeroski, C. Sutton, A. McCallum, A. Pfeffer, P. Abbeel, M. Wong, D. Heckerman, C. Meek, et al. (2007) Introduction to statistical relational learning. MIT press. Cited by: §2.2.2.
  • R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L. Li, D. A. Shamma, et al. (2017) Visual genome: connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision 123 (1), pp. 32–73. Cited by: §3.1.
  • G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg (2011) Baby talk: understanding and generating image descriptions. In Proceedings of the 24th CVPR. Cited by: §3.1.
  • N. Lavrac and S. Dzeroski (1994) Inductive logic programming.. In WLP, pp. 146–160. Cited by: §2.2.2.
  • Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444. Cited by: §2.2.2.
  • Y. Li, X. Liang, Z. Hu, and E. P. Xing (2018) Hybrid retrieval-generation reinforced agent for medical image report generation. In Advances in Neural Information Processing Systems, pp. 1530–1540. Cited by: §2.1.
  • X. Liang, Z. Hu, H. Zhang, L. Lin, and E. P. Xing (2018) Symbolic graph reasoning meets convolutions. In Advances in Neural Information Processing Systems, pp. 1853–1863. Cited by: §2.2.1, §3.2.1, §3.2.1.
  • S. Liao (2005) Expert system methodologies and applications—a decade review from 1995 to 2004. Expert systems with applications 28 (1), pp. 93–103. Cited by: §2.2.2.
  • P. Luc, C. Couprie, S. Chintala, and J. Verbeek (2016) Semantic segmentation using adversarial networks. arXiv preprint arXiv:1611.08408. Cited by: §2.2.1.
  • B. Mahasseni, M. Lam, and S. Todorovic (2017) Unsupervised video summarization with adversarial lstm networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.2.1.
  • P. Moeskops, M. Veta, M. W. Lafarge, K. A. Eppenhof, and J. P. Pluim (2017) Adversarial training and dilated convolutions for brain MRI segmentation. arXiv preprint arXiv:1707.03195. Cited by: §2.2.1.
  • S. H. Muggleton, D. Lin, and A. Tamaddoni-Nezhad (2015) Meta-interpretive learning of higher-order dyadic datalog: predicate invention revisited. Machine Learning 100 (1), pp. 49–73. Cited by: §3.2.2.
  • W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu (2019) Interpretable machine learning: definitions, methods, and applications. arXiv preprint arXiv:1901.04592. Cited by: §2.2.2.
  • Z. Peng, J. Zhong, W. Wee, and J. Lee (2006) Automated vertebra detection and segmentation from the whole spine MR images. In 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE-EMBS 2005), pp. 2527–2530. Cited by: §2.1.
  • R. S. Alomari, J. J. Corso, V. Chaudhary, and G. Dhillon (2011) Toward a clinical lumbar CAD: herniation diagnosis. International Journal of Computer Assisted Radiology and Surgery 6 (1), pp. 119–126. Cited by: §2.1.
  • S. S. Rajaee, H. W. Bae, L. E. Kanim, and R. B. Delamarter (2012) Spinal fusion in the united states: analysis of trends from 1998 to 2008. Spine 37 (1), pp. 67–76. Cited by: §1.
  • O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Cited by: item 4.
  • D. F. Rosenthal, J. M. Bos, R. A. Sokolowski, J. B. Mayo, K. A. Quigley, R. A. Powell, and M. Teel (1997) A voice-enabled, structured medical reporting system. Journal of the American Medical Informatics Association 4 (6), pp. 436–441. Cited by: §1.
  • S. Russell (2015) Unifying logic and probability. Communications of the ACM 58 (7), pp. 88–97. Cited by: §1, §2.2.2, §2.2.2.
  • S. R. Safavian and D. Landgrebe (1991) A survey of decision tree classifier methodology. IEEE transactions on systems, man, and cybernetics 21 (3), pp. 660–674. Cited by: §2.2.2.
  • T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging, pp. 146–157. Cited by: §2.2.1.
  • M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling (2018) Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pp. 593–607. Cited by: §2.2.1.
  • P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun (2013) Overfeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229. Cited by: §2.2.1.
  • M. Shanahan, K. Nikiforou, A. Creswell, C. Kaplanis, D. Barrett, and M. Garnelo (2019) An explicitly relational neural network architecture. arXiv preprint arXiv:1905.10307. Cited by: §2.2.2.
  • E. Shelhamer, J. Long, and T. Darrell (2017) Fully convolutional networks for semantic segmentation. IEEE transactions on pattern analysis and machine intelligence 39 (4), pp. 640–651. Cited by: item 1.
  • R. Shi, D. Sun, Z. Qiu, and K. L. Weiss (2007) An efficient method for segmentation of MRI spine images. In IEEE/ICME International Conference on Complex Medical Engineering (CME 2007), pp. 713–717. Cited by: §2.1.
  • D. Štern, B. Likar, F. Pernuš, and T. Vrtovec (2009) Automated detection of spinal centrelines, vertebral bodies and intervertebral discs in CT and MR images of lumbar spine. Physics in Medicine and Biology 55 (1), pp. 247. Cited by: §2.1.
  • L. Sun, W. Wang, J. Li, and J. Lin (2019) Study on medical image report generation based on improved encoding-decoding method. In Intelligent Computing Theories and Application, D. Huang, V. Bevilacqua, and P. Premaratne (Eds.), Cham, pp. 686–696. External Links: ISBN 978-3-030-26763-6 Cited by: §2.1.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §2.2.2.
  • L. G. Valiant (1984) A theory of the learnable. In Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, pp. 436–445. Cited by: §2.2.2.
  • F. Vorbeck, A. Ba-Ssalamah, J. Kettenbach, and P. Huebsch (2000) Report generation using digital speech recognition in radiology. European Radiology 10 (12), pp. 1976–1982. External Links: ISSN 1432-1084, Document, Link Cited by: §1, §2.1.
  • X. Wang, Y. Peng, L. Lu, Z. Lu, and R. M. Summers (2018) TieNet: text-image embedding network for common thorax disease classification and reporting in chest x-rays. arXiv preprint arXiv:1801.04334. Cited by: §2.1.
  • Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu (2019) A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596. Cited by: §2.2.1.
  • C. Xu, L. Xu, P. Ohorodnyk, M. Roth, B. Chen, and S. Li (2020) Contrast agent-free synthesis and segmentation of ischemic heart disease images using progressive sequential causal gans. Medical Image Analysis, pp. 101668. Cited by: §2.1.
  • G. Yang, K. Young, S. Huang, J. Shim, and W. L. Nowinski (2011) Method for creating a report from radiological images using electronic report templates. Google Patents. Note: US Patent 20130251233A1 Cited by: §2.1.
  • J. Yao, J. E. Burns, D. Forsberg, A. Seitel, A. Rasoulian, P. Abolmaesumi, K. Hammernik, M. Urschler, B. Ibragimov, R. Korez, et al. (2016) A multi-center milestone study of clinical vertebral ct segmentation. Computerized Medical Imaging and Graphics 49, pp. 16–28. Cited by: §1, §2.1.
  • R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec (2018) Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 974–983. Cited by: §2.2.1.
  • F. Yu and V. Koltun (2015) Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122. Cited by: §2.2.1.
  • Y. Zhan, D. Maneesh, M. Harder, and X. S. Zhou (2012) Robust mr spine detection using hierarchical learning and local articulated model. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 141–148. Cited by: §2.1.
  • Z. Zhang, Y. Xie, F. Xing, M. McGough, and L. Yang (2017) MDNet: a semantically and visually interpretable medical image diagnosis network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3549–3557. External Links: Document, ISSN 1063-6919 Cited by: §2.1.
  • J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, and M. Sun (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434. Cited by: §2.2.1.
  • Z. Zhou (2019) Abductive learning: towards bridging machine learning and logical reasoning. Science China Information Sciences 62 (7), pp. 76101. Cited by: §1, §2.2.2.