Decision Support for Intoxication Prediction Using Graph Convolutional Networks

05/02/2020 ∙ by Hendrik Burwinkel, et al. ∙ Technische Universität München

Every day, poison control centers (PCC) are called for immediate classification and treatment recommendations if an acute intoxication is suspected. Due to the time-sensitive nature of these cases, doctors are required to propose a correct diagnosis and intervention within a minimal time frame. Usually the toxin is known and recommendations can be made accordingly. However, in challenging cases only symptoms are mentioned and doctors have to rely on their clinical experience. Medical experts and our analyses of a regional dataset of intoxication records provide evidence that this is challenging, since occurring symptoms may not always match the textbook description due to regional distinctions, inter-rater variance, and institutional workflow. Computer-aided diagnosis (CADx) can provide decision support, but approaches so far do not consider additional information of the reported cases like age or gender, despite their potential value towards a correct diagnosis. In this work, we propose a new machine learning based CADx method which fuses symptoms and meta information of the patients using graph convolutional networks. We further propose a novel symptom matching method that allows the effective incorporation of prior knowledge into the learning process and evidently stabilizes the poison prediction. We validate our method against 10 medical doctors with different experience diagnosing intoxication cases for 10 different toxins from the PCC in Munich and show our method's superiority in performance for poison prediction.




1 Introduction

Intoxication is undoubtedly one of the most significant factors of global suffering and death. In 2016, the abuse of alcohol alone resulted in 2.8 million deaths globally and was accountable for 99.2 million DALYs (disability-adjusted life-years), or 4.2% of all DALYs. Other drugs accounted for a further 31.8 million DALYs and 451,800 deaths worldwide [7]. In case of an intoxication, fast diagnosis and treatment are essential in order to prevent permanent organ damage or even death [10]. Since not all medical practitioners are experts in the field of toxicology, specialized poison control centers (PCC) like the center in Munich were established. These institutions can be called by anyone, doctor or layman, to help in the classification and treatment of patients. Most of the time, the substance responsible for the intoxication is known. However, when this is not the case, the medical doctor (MD) working at the PCC has to reach a diagnosis solely based on the reported symptoms, without ever seeing the patient face to face, and give treatment recommendations accordingly. Especially for inexperienced MDs, this is a challenging task for several reasons. First, the symptom description may not match the symptoms described in the literature that is used to diagnose the patients. This is exacerbated by inter-individual, regional, and inter-institutional differences in how symptoms are described to the doctor. Second, not all patients react to an intoxication with the same symptoms, and they may have further confounding symptoms that are caused not by the intoxication but by other medical conditions. Third, meta information like age, gender, weight, or geographic location is not assessed in a structured way.
Current computer-aided diagnosis (CADx) systems in toxicology do not solve these problems. Most are rule-based expert systems [5, 1, 12] which are very sensitive towards input variations. Furthermore, they do not consider meta information or population context, despite their potential value in diagnosis. We propose a model that can solve both mentioned problems. By employing Graph Convolutional Networks (GCN) [6, 9], we incorporate the meta information and population context into the diagnosis process in a natural way using graph structures. Here, each patient corresponds to a node, and patients are connected according to the similarity of their meta-information [13]. Connecting patients in this way leads to neighborhoods of similar patients. GCNs perform local filtering of graph-structured data analogous to Convolutional Neural Networks (CNN) on regular grids. This relatively novel concept [2] already led to advancements in medicine, ranging from human action prediction [16] to drug discovery [14]. It has also been used successfully in personalized disease prediction [13, 3, 8]. Notably, attention mechanisms [4, 11] improve filtering by weighting similarity scores between nodes based on node features, which help to compensate for locally inaccurate graph structure. We base the proposed model on Graph Attention Networks (GAT) [15], one of the leading representatives of this GCN-class.
Contributions. Our approach for toxin prediction leverages structured incorporation of patient meta information to significantly boost performance. We further address the issue of mismatching symptom descriptions by augmenting the GCN with a parallel network layer which learns a conceptual mapping of patient symptoms to textbook symptoms described in literature. This network branch is designed to explicitly incorporate domain prior knowledge from medical literature, and produces an alternative prediction. This stabilizes the output of the model and ensures a reasonable prediction. In a set of experiments on real PCC data, we show that our model outperforms several standard approaches. Ultimately, we compare our model to patient diagnoses made by 10 MDs on a separate real-life test set. The favorable performance of our model demonstrates its high potential for decision support in toxicology.

2 Methodology

Figure 1: Schematic architecture of ToxNet. The symptom vectors are processed in the graph-based GAT layers and the literature matching network in parallel.

General framework. The proposed network performs the classification of the intoxication of patients with 1D symptom vectors $P$ using non-symptom meta information $Q$ and literature symptom vectors $H$ in an inductive graph approach. Therefore, it optimizes the objective function $f(P, G(P, Q, E), H): P \rightarrow Y$, where $G(V, E)$ is a graph with vertices $V$ containing symptoms $P$ and meta data $Q$. Binary edges $E$ denote connections between the vertices and $Y$ is a set of poison classes. The symptom vectors contain a binary entry for every considered symptom, 1 if the symptom is present, 0 if not. Therefore, every patient has an individual symptom vector $p_i$ with the occurring symptoms, and every poison has a vector $h_c$ of literature symptoms that should occur for this poison, leading to the symptom sets $P \in \mathbb{R}^{N \times F_P}$ and $H \in \mathbb{R}^{C \times F_H}$, where $N$ is the number of patients, $C$ is the number of poison classes, and $F_P$ and $F_H$ are the dimensions of the patient and literature symptom vector, respectively. Within $Q \in \mathbb{R}^{N \times F_Q}$, every vector $q_i$ contains the patient's meta information. For every vertex in the graph we concatenate the patient symptom vector $p_i$ with the meta data $q_i$ of $Q$ and receive $X$ with vectors $x_i$ of dimension $F = F_P + F_Q$. Additionally, the edges $E$ are created based on the similarity of the meta information between two patients. The network processes the patient symptoms within three GAT layers and a learned explicit literature matching layer in parallel. The resulting representations are fused to predict the corresponding intoxication.
Symptom vectors. As described above, every symptom vector corresponds to a binary encoding of all symptoms present. The dimensions $F_P$ and $F_H$ of the vectors refer to the total number of individual symptoms $S_P$ and $S_H$ that are listed within all patient cases and poison descriptions, respectively. Since real patient cases also show some symptoms that are not part of the literature, $S_H \subset S_P$ and $F_H < F_P$. The first $F_H$ entries of every patient symptom vector $p_i$ correspond to $S_H$.
Neighborhood generation. The edges $E$ represent the neighborhood of every concatenated vector or vertex $x_i$ and define which vertices should be aggregated to update the current representation of $x_i$ within a GAT layer. The neighborhood $\mathcal{N}_i$ of $x_i$ is defined as the set of all $x_j$ with $e_{ij} \in E$. An edge $e_{ij}$ is established when the meta information of $x_i$ and $x_j$ is consistent.
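The neighborhood rule above can be illustrated with a minimal sketch. It assumes that "consistent" means exact agreement on every selected meta field, which is one possible reading; the function names, field names, and toy records are hypothetical and not taken from the paper.

```python
from itertools import combinations


def build_edges(meta, keys):
    """Return undirected edges (i, j), i < j, between patients whose meta
    information agrees on every key in `keys` (assumed consistency rule)."""
    edges = set()
    for i, j in combinations(range(len(meta)), 2):
        if all(meta[i][k] == meta[j][k] for k in keys):
            edges.add((i, j))
    return edges


def neighborhoods(n, edges):
    """Map every vertex index to the set of its neighbors."""
    nbrs = {i: set() for i in range(n)}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs


# Toy records; field names and values are illustrative only.
patients = [
    {"age_group": "adult", "gender": "f"},
    {"age_group": "adult", "gender": "f"},
    {"age_group": "child", "gender": "m"},
    {"age_group": "adult", "gender": "m"},
]
edges = build_edges(patients, keys=("age_group", "gender"))
nbrs = neighborhoods(len(patients), edges)
```

The pairwise loop is quadratic in the number of patients; for a dataset of several thousand cases, grouping patients by their tuple of meta values would produce the same edges more cheaply.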
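GAT layer. To update the representation of the vectors $x_i$ of $X$, the GAT layer applies a shared learnable linear transformation $W \in \mathbb{R}^{F' \times F}$ to all $x_i$, resulting in a new representation $W x_i$ with dimension $F'$. For every neighbor $x_j \in \mathcal{N}_i$, an attention coefficient $c_{ij}$ is calculated using the shared attention mechanism $a$. The coefficient represents the importance of $x_j$ for the update of $x_i$ and is calculated as $c_{ij} = a^T [W x_i \,\|\, W x_j]$, where $\|$ represents the concatenation of $W x_i$ and $W x_j$, and $a^T$ denotes a single feed-forward layer. To normalize every attention coefficient and allow easy comparability between coefficients, after applying the leakyReLU activation, for every $x_i$ the softmax function is applied to all coefficients corresponding to $\mathcal{N}_i$:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^T [W x_i \,\|\, W x_j]\right)\right)}{\sum_{x_m \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(a^T [W x_i \,\|\, W x_m]\right)\right)}$$

To update $x_i$, every feature representation $W x_j$ is weighted with the corresponding $\alpha_{ij}$ and summed up to receive the new representation $x_i'$. The GAT network repeats this step multiple times with individually learned $W^k$, so-called heads, to statistically stabilize the prediction and receive individual attention coefficients $\alpha_{ij}^k$. The different representations are concatenated (represented as $\|$) to yield the final new representation:

$$x_i' = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{x_j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} x_j\Big)$$

Here, $K$ is the number of used heads, $\sigma$ is the nonlinear activation of the layer, and $\alpha_{ij}^k$ is the attention coefficient of head $k$ for the vertices $x_i$ and $x_j$ [15].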
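As a rough illustration of the attention update following [15], the NumPy sketch below computes one head and then concatenates several heads. It is not the authors' implementation: the LeakyReLU slope of 0.2, the use of self-loops so that every neighborhood is non-empty, the ELU output activation, and all dimensions are assumptions made for the example.

```python
import numpy as np


def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)


def elu(z):
    return np.where(z > 0, z, np.exp(np.minimum(z, 0)) - 1)


def gat_head(X, nbrs, W, a):
    """One attention head.
    X: (N, F) vertex features; nbrs: dict vertex -> list of neighbor indices;
    W: (F_out, F) shared linear map; a: (2 * F_out,) attention vector.
    Returns the aggregated features (N, F_out) before the nonlinearity and
    the normalized attention weights per vertex."""
    H = X @ W.T                                # W x_i for every vertex
    out = np.zeros_like(H)
    alphas = {}
    for i, js in nbrs.items():
        js = list(js)
        # Raw coefficients LeakyReLU(a^T [W x_i || W x_j]) for all neighbors j.
        c = np.array([leaky_relu(a @ np.concatenate([H[i], H[j]])) for j in js])
        e = np.exp(c - c.max())                # numerically stable softmax
        alpha = e / e.sum()
        alphas[i] = dict(zip(js, alpha))
        out[i] = alpha @ H[js]                 # weighted sum over the neighborhood
    return out, alphas


def gat_layer(X, nbrs, heads):
    """Concatenate the activated outputs of all heads; heads is a list of (W, a)."""
    return np.concatenate([elu(gat_head(X, nbrs, W, a)[0]) for W, a in heads], axis=1)


rng = np.random.default_rng(0)
N, F, F_out, K = 4, 6, 3, 5                    # toy sizes; K = 5 heads
X = rng.integers(0, 2, size=(N, F)).astype(float)
nbrs = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2], 3: [3]}   # self-loops included (assumption)
heads = [(rng.normal(size=(F_out, F)), rng.normal(size=2 * F_out)) for _ in range(K)]
X_new = gat_layer(X, nbrs, heads)              # shape (N, K * F_out)
```

A vertex whose only neighbor is itself receives attention weight 1 and is therefore updated from its own transformed features alone, which is why isolated patients gain nothing from the graph.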
Literature symptom matching. For every toxin class $c$ of all $C$ toxins, the literature provides a list of commonly occurring symptoms. These are encoded in the binary symptom vector $h_c$ for every poison. We design a specific symptom matching layer $W_M \in \mathbb{R}^{F_H \times F_P}$ which learns a mapping of the patient symptom vectors $P$ to the literature symptoms. This concept results in an interpretable transfer function which gives deeper insight into symptom correlations and explicitly incorporates the domain prior knowledge from literature. Due to the described setup of the symptom vectors, the first $F_H$ entries of every patient vector $p_i$ correspond to the literature symptoms. Since these should be preserved after the matching procedure, we initialize the first $F_H \times F_H$ learnable parameters of $W_M$ with the identity matrix and freeze the diagonal during training. In this way, every symptom of the literature symptom set $S_H$ is mapped to itself. The remaining symptoms only occurring for the patient cases are transformed into a representation of dimension $F_H$ in agreement with the symptoms of the literature. As a second transformation, we create a literature layer $W_L \in \mathbb{R}^{C \times F_H}$ whose $c$-th row is initialized with $h_c$ for all classes and that is kept constant during training. The resulting transformation $W_L W_M p_i$ therefore maps the patient symptoms onto the poison classes with the explicit usage of literature information.
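The initialization described above can be sketched in NumPy as follows. This is an illustrative reading rather than the authors' code: the small random initialization of the patient-only columns, the boolean mask used to freeze the diagonal, the plain gradient step, and all names and sizes are assumptions.

```python
import numpy as np


def init_matching(F_P, F_H, rng):
    """Matching matrix W_M of shape (F_H, F_P): identity on the literature
    block, small random values for patient-only symptoms (assumed init).
    `trainable` marks updatable entries; the diagonal of the first block is frozen."""
    W_M = np.zeros((F_H, F_P))
    W_M[:, :F_H] = np.eye(F_H)
    W_M[:, F_H:] = 0.01 * rng.normal(size=(F_H, F_P - F_H))
    trainable = np.ones_like(W_M, dtype=bool)
    trainable[np.arange(F_H), np.arange(F_H)] = False
    return W_M, trainable


def masked_step(W_M, grad, trainable, lr=0.1):
    """Gradient step that leaves frozen entries untouched."""
    return W_M - lr * grad * trainable


def literature_scores(p, W_M, H_lit):
    """Class scores W_L W_M p, with the constant literature layer W_L = H_lit."""
    return H_lit @ (W_M @ p)


rng = np.random.default_rng(0)
W_M, trainable = init_matching(F_P=5, F_H=3, rng=rng)
H_lit = np.array([[1.0, 0.0, 1.0],     # toy literature vector of class 0
                  [0.0, 1.0, 0.0]])    # toy literature vector of class 1
p = np.array([1.0, 0.0, 1.0, 0.0, 0.0])  # patient shows literature symptoms 0 and 2
scores = literature_scores(p, W_M, H_lit)
W_next = masked_step(W_M, np.ones_like(W_M), trainable)
```

In this toy case the patient reports only literature symptoms, so the score of each class is simply the number of its literature symptoms that the patient shows; the learned columns only come into play for symptoms that the literature does not list.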
Representation fusion. The output of the last GAT layer is processed by a fully connected (FC) layer to yield the GAT representation $z_{GAT}$. The GAT and literature representations $z_{GAT}$ and $z_{Lit}$ are concatenated, activated and transferred through a last learnable linear transformation and a softmax function onto the class output $y$.
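A minimal sketch of this fusion step is given below. The vector sizes, the bias term, and the choice of ELU as the activation (taken from the hyperparameter list of the experimental setup) are assumptions for illustration; the paper excerpt does not fix these details.

```python
import numpy as np


def elu(z):
    return np.where(z > 0, z, np.exp(np.minimum(z, 0)) - 1)


def softmax(z):
    e = np.exp(z - z.max())            # subtract the maximum for stability
    return e / e.sum()


def fuse(z_gat, z_lit, W_out, b_out):
    """Concatenate both branch outputs, activate, apply the final linear
    transformation, and normalize to class probabilities."""
    z = elu(np.concatenate([z_gat, z_lit]))
    return softmax(W_out @ z + b_out)


rng = np.random.default_rng(0)
C = 10                                  # ten toxin classes
z_gat = rng.normal(size=C)              # hypothetical GAT branch output
z_lit = rng.normal(size=C)              # hypothetical literature branch output
W_out = rng.normal(size=(C, 2 * C))
b_out = np.zeros(C)
y = fuse(z_gat, z_lit, W_out, b_out)
```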

3 Experiments and Discussion

3.1 Experimental setup

Dataset. The dataset consists of 8995 patients and was extracted from the PCC database from the years 2001-2019. All cases were mono-intoxications, meaning only one toxin was present and the toxin was known. We chose the following toxins: ACE inhibitors (n=119), acetaminophen (n=1376), antidepressants (selective serotonin re-uptake-inhibitors, n=1073), benzodiazepines (n=577), beta blockers (n=288), calcium channel antagonists (n=75), cocaine (n=256), ethanol (n=2890), NSAIDs (excluding acetaminophen, n=1462), and opiates (n=879). The ten toxin groups were chosen either because they are among the most frequently occurring intoxications and lead to a different treatment and intervention, or because they have clinically distinct features, lead to severe intoxications, have a specific treatment, and should not be missed. Accordingly, the classes are unbalanced in their occurrence, since e.g. intoxication due to alcohol is a very common phenomenon. Together with the patient symptoms, additional meta information for every case is given. From the full set of available information, we use the parameters age group (child, adult, elder), gender, aetiology, point of entry, and weekday and year of intoxication to set up the graph structure, since these resulted in the best performance.
Graph setup. Our graph is based on the described meta information for every patient. An edge between patients $i$ and $j$ is established when the meta data is consistent for the medically relevant selection of parameters, i.e., the above-mentioned meta parameters. This results in a sparse graph that at the same time has more meaningful edges (the graph increases the likelihood that patients with the same poisoning become connected).
Network setup. Hyperparameters: optimizer: Adam, learning rate: 0.001, weight decay: 5e-4, loss function: cross entropy, dropout: 0.0, activation: ELU, heads: 5.
Model evaluation. First, we evaluate our network against different benchmark approaches. Then we compare the different network components within an ablation study. By disabling different parts of the network, each individual contribution is evaluated. Here, 'GAT' refers to a setup where the GAT pipeline of ToxNet is used alone, 'LitMatch' to a setup where the parallel literature-matching branch of ToxNet is used alone. Additionally, we test a sequential setting, where the literature matching is performed prior, and the learned features are transferred to the GAT (ToxNet(S)). All experiments use a 10-fold cross-validation. After proving the superiority of our method, we compare our network against the performance of 10 MDs, who are classifying the same unseen subset of the full test data as our method. This subset is divided into 25 individual cases for every MD, and 25 additional cases identical for all MDs, i.e., $10 \cdot 25 + 25 = 275$ cases. In this setup, we are able to perform a statistical performance analysis on a larger set of cases, but also evaluate the inter-variability of the medical experts to distinguish between easy and difficult cases.

3.2 Experimental results

Method            F1 Sc. micro     F1 Sc. macro     p-val micro   p-val macro
Naive Matching    0.201 ± 0.012    0.127 ± 0.007
Decision Tree     0.246 ± 0.016    0.227 ± 0.016
LitMatch          0.474 ± 0.005    0.342 ± 0.023
MLP with meta     0.544 ± 0.015    0.429 ± 0.019
GAT               0.629 ± 0.010    0.458 ± 0.021
ToxNet(S)         0.637 ± 0.013    0.478 ± 0.023
ToxNet            0.661 ± 0.010    0.529 ± 0.036    /             /
Table 1: Performance comparison of different methods for poison prediction. Methods are described in detail in Sec. 3 (p-value: 0.01 , 0.005 ).
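For readers less familiar with the two scores reported in Table 1, the following sketch shows the standard definitions of micro- and macro-averaged F1 for single-label predictions. It is a generic illustration, not the authors' evaluation code, and the toy labels are made up to show why the macro score drops when a rare class is missed.

```python
def f1_scores(y_true, y_pred, classes):
    """Return (micro F1, macro F1) for single-label multi-class predictions."""
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1      # predicted class gains a false positive
            fn[t] += 1      # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts of all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


# Toy example: a frequent class and a rare class that is never predicted.
y_true = ["ethanol"] * 8 + ["cocaine"] * 2
y_pred = ["ethanol"] * 10
micro, macro = f1_scores(y_true, y_pred, classes=["ethanol", "cocaine"])
```

Because the macro average weights every class equally, a model that ignores rare toxins is penalized far more strongly in the macro score than in the micro score, which is relevant given the unbalanced class sizes in the dataset.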

Performance comparison against other methods. In Tab. 1, we compare the F1 micro and macro scores of different benchmark approaches against our method ToxNet. The Naive Matching provides a lower baseline by simply voting for the poison which has the most overlap between literature and patient symptoms. The decision tree was trained on the literature symptoms and then applied to the patient symptoms. Both models perform poorly, which leads to the conclusion that the available literature alone is not a good guide for poison classification. With the LitMatch neural network branch from our approach, we maintain the possibility to incorporate literature knowledge explicitly, but obtain significantly better results. In the next step, a Multi-Layer Perceptron (MLP) with 3 hidden layers (with the same numbers of hidden units as the GAT) was trained on the patient data to perform the prediction. In order to allow for a fair comparison, the patient's meta information was concatenated to their symptom vector, so that the MLP and the GAT use the same information. Comparing the MLP to a standard GAT network shows that using the meta information inside our graph method significantly boosts the classification performance, which demonstrates the value that the graph structure adds to the evaluation. Adding the literature information by applying our proposed method ToxNet increases this performance even further. This enhancement is reached although the literature data alone was shown not to be very informative for the task at hand. We therefore assume that there is a synergy effect, and that an improvement of the literature might lead to an even stronger boost. To identify the individual contributions, both pipelines within ToxNet (GAT and LitMatch) are also evaluated separately as described above. Within our experiment, we found that the parallel setting of ToxNet is slightly superior to a sequential setting (ToxNet(S)). The results described above are also illustrated in the boxplot in Fig. 2 (left).
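The Naive Matching baseline mentioned above can be sketched in a few lines. The symptom lists below are hypothetical and heavily simplified, and the tie-breaking rule (first poison in iteration order wins) is an assumption, since the text does not specify one.

```python
def naive_matching(patient_symptoms, literature):
    """Vote for the poison whose literature symptom set shares the most
    symptoms with the patient's reported symptoms (ties: first entry wins)."""
    return max(literature, key=lambda poison: len(literature[poison] & patient_symptoms))


# Hypothetical symptom lists for illustration only.
literature = {
    "ethanol": {"drowsiness", "vomiting", "ataxia"},
    "opiates": {"drowsiness", "miosis", "respiratory depression"},
}
prediction = naive_matching({"miosis", "drowsiness"}, literature)
```

Such a rule cannot make use of symptoms that the literature does not list, which is one plausible reason for its weak scores in Table 1.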

Figure 2: Left: Comparison of ToxNet and different benchmark methods over 10-fold cross validation. Right: Comparison of ToxNet and benchmark methods with MDs’ performance over 10 different sets evaluated by one MD each.

Performance comparison against medical experts. In order to evaluate the performance of our method against medical experts, we conducted a survey with 10 medical doctors (MDs) from the toxicology department of the Klinikum rechts der Isar in Munich, where each MD had to classify 50 intoxication cases that were split up as described above. Fig. 2 (right) shows a box plot of the performance of the 10 MDs compared with different benchmark methods as well as our method ToxNet on the ten individual sets of 25 cases each, so 250 in total. All three graph-based approaches clearly outperform the MDs due to the optimized usage of meta information. For this small subset of the full test set, the performance boost of ToxNet compared to GAT is not as pronounced as for the full test set. However, the overall performance is more stable (smaller margins). In Fig. 3, we performed a detailed inter-variability study on the 25 cases evaluated by all doctors. With one exception, our method also correctly classified every intoxication case that the majority of MDs classified correctly. Furthermore, for eight cases in which only half of the MDs or fewer correctly predicted the intoxication, our method still succeeded. These results demonstrate that our proposed ToxNet architecture can predict simple cases reliably and at an expert-level performance, while additionally providing a high prediction stability on cases that are challenging to a majority of doctors. Our method produced 15 correct poison predictions overall, compared with 12 for each of the two best MDs. Six cases were wrongly classified by all doctors and our method. These are data samples with insufficient documentation quality (e.g. only a single generic symptom), which indicates intrinsic challenges of medical data in the wild.

Figure 3: Clinician inter-variability and comparison with ToxNet. Poison classes are ordered alphabetically, each group separated with a white spacing.

4 Conclusion

In this work, we proposed ToxNet, a new architecture for improved intoxication prediction. The network effectively incorporates patient symptoms, meta-information like age group or residence, and domain prior knowledge from literature. We showed that the usage of meta-data within the graph structure of a graph convolutional network inside ToxNet leads to a significantly higher classification performance than all other methods investigated. In our benchmark study, we explicitly showed that a simple concatenation of the meta-data to the patient symptom vector is not sufficient – the improvement can be attributed to the patient graph. Additionally, we introduced a symptom matching method that allows the explicit usage of literature knowledge and included it into a parallel learning approach which further improved the overall network performance. Although we found that the literature information by itself was not informative enough for a satisfactory classification, we showed that a parallel integration with our graph network still led to synergy effects and an improved classification. We evaluated our network against 10 MDs with different experience levels and found a more stable prediction on both simple and highly challenging intoxication cases, given the high inter-rater variability among experts. We thus demonstrated the potential of ToxNet as a clinical decision support in this highly critical domain of medical intervention. On a wider scale, we believe that our architecture and validation provide a valuable case study: medical expertise can be regionally flavored and affect symptoms in a way that is not necessarily covered by expert literature. A proper modeling of these effects, fused with recent advances in graph-based population models, can lead to significant improvements in the field of computer-aided diagnosis and support clinical practice.

The study was supported by the Carl Zeiss Meditec AG in Oberkochen, Germany, and the German Federal Ministry of Education and Research (BMBF) in connection with the foundation of the German Center for Vertigo and Balance Disorders (DSGZ) (grant number 01 EO 0901).


  • [1] Batista-Navarro, R.T.B., Bandojo, D.A., Gatapia, M.A.K., Santos, R.N.C., Marcelo, A.B., Panganiban, L.C.R., Naval, P.C.: ESP: An expert system for poisoning diagnosis and management. Informatics for Health and Social Care 35(2), 53–63 (2010)
  • [2] Bronstein, M.M., Bruna, J., LeCun, Y., Szlam, A., Vandergheynst, P.: Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34(4), 18–42 (2016)
  • [3] Burwinkel, H., Kazi, A., Vivar, G., Albarqouni, S., Zahnd, G., Navab, N., Ahmadi, S.A.: Adaptive image-feature learning for disease classification using inductive graph networks. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 640–648. Springer (2019)
  • [4] Cheng, J., Dong, L., Lapata, M.: Long short-term memory-networks for machine reading. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. pp. 551–561. Association for Computational Linguistics, Stroudsburg, PA, USA (2016)
  • [5] Darmoni, S.J., Massari, P., Droy, J.M., Mahe, N., Blanc, T., Moirot, E., Leroy, J.: SETH: an expert system for the management on acute drug poisoning in adults. Computer Methods and Programs in Biomedicine 43(3-4), 171–176 (1994)
  • [6] Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. Advances in Neural Information Processing Systems pp. 3844–3852 (2016)
  • [7] Degenhardt, L., Charlson, F., Ferrari, A., Santomauro, D., Erskine, H., Mantilla-Herrara, A., Whiteford, H., Leung, J., Naghavi, M., Griswold, M., Rehm, J., Hall, W., Sartorius, B., Scott, J., Vollset, S.E., Knudsen, A.K., Haro, J.M., Patton, G., Kopec, J., Carvalho Malta, D., Topor-Madry, R., McGrath, J., Haagsma, J., Allebeck, P., Phillips, M., Salomon, J., Hay, S., Foreman, K., Lim, S., Mokdad, A., Smith, M., Gakidou, E., Murray, C., Vos, T.: The global burden of disease attributable to alcohol and drug use in 195 countries and territories, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. The Lancet Psychiatry 5(12), 987–1012 (2018)
  • [8] Kazi, A., Shekarforoush, S., Arvind Krishna, S., Burwinkel, H., Vivar, G., Wiestler, B., Kortüm, K., Ahmadi, S.A., Albarqouni, S., Navab, N.: Graph convolution based attention model for personalized disease prediction. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 122–130. Springer (2019)
  • [9] Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings. International Conference on Learning Representations, ICLR (2016)
  • [10] Kulling, P., Persson, H.: Role of the intensive care unit in the management of the poisoned patient. Medical toxicology 1(5), 375–86 (1986)
  • [11] Lin, Z., Feng, M., dos Santos, C.N., Yu, M., Xiang, B., Zhou, B., Bengio, Y.: A structured self-attentive sentence embedding. In: 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings. International Conference on Learning Representations, ICLR (2017)
  • [12] Long, J.B., Zhang, Y., Brusic, V., Chitkushev, L., Zhang, G.: Antidote Application. In: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics. pp. 442–448. ACM Press (2017)
  • [13] Parisot, S., Ktena, S.I., Ferrante, E., Lee, M., Moreno, R.G., Glocker, B., Rueckert, D.: Spectral graph convolutions for population-based disease prediction. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 177–185. Springer Verlag (2017)
  • [14] Stokes, J.M., Yang, K., Swanson, K., Jin, W., Cubillos-Ruiz, A., Donghia, N.M., MacNair, C.R., French, S., Carfrae, L.A., Bloom-Ackerman, Z., Tran, V.M., Chiappino-Pepe, A., Badran, A.H., Andrews, I.W., Chory, E.J., Church, G.M., Brown, E.D., Jaakkola, T.S., Barzilay, R., Collins, J.J.: A deep learning approach to antibiotic discovery. Cell (2020)
  • [15] Veličković, P., Casanova, A., Liò, P., Cucurull, G., Romero, A., Bengio, Y.: Graph attention networks. In: 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings. International Conference on Learning Representations, ICLR (2018)
  • [16] Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: 32nd AAAI Conference on Artificial Intelligence. pp. 7444–7452. AAAI press (2018)