In recent years, advanced natural language processing (NLP) techniques have been applied to electronic health record (EHR) documents to extract useful information. Access to large-scale EHR data is crucial for such deep learning methods, yet data scarcity persists for most tasks in the healthcare domain.
Assertion detection involves classifying clinical text obtained from the EHR and other hospital information systems (e.g., the Radiology Information System, RIS) to determine whether a medical concept (entity) is present, absent, conditional, hypothetical, a possibility, or AWSE (associated with someone else). These classes follow Chen (2019). A few examples of each class from our dataset are shown in Table 1.
Past work on the i2b2 dataset mostly focused on the present and absent classes, with comparatively little work on the more ambiguous classes. The majority of existing methods either classify the given text only, or use the predicted class to detect its scope in a two-stage process. This works well for datasets like i2b2 (Uzuner et al., 2011), in which there is only one label per example. However, a single label per sentence is not common in clinical reports, especially when patients have frequent physician visits or long periods of hospitalization. To address this problem, our work makes the following contributions:
- We explored assertion detection in multi-label sentences from radiology (cardiac computerized tomography (CT)) reports.
- We cast the assertion detection task as a scope localization problem, thereby solving classification and scope detection in a single-stage, end-to-end fashion.
2 Related Work
Rule-based models like NegEx (Chapman et al., 2011), NegBio (Peng et al., 2017) and Gkotsis et al. (2016) were initially used for assertion and negation detection. These approaches typically implement rules and regular expressions to detect cues for classification. NegBio (Peng et al., 2017) uses a universal dependency graph to detect the scope of the identified class. A constituency parse tree is used by Gkotsis et al. (2016) to prune words outside the scope of the detected class. NegEx (Chapman et al., 2011) later demonstrated good performance when adapted to other languages such as German, Spanish and French (Cotik et al., 2015; Stricker et al., 2015; Costumero et al., 2014; Afzal et al., 2014). A few approaches developed syntactic techniques by augmenting rule-based systems with dependency parse trees (Mehrabi et al., 2015; Sohn et al., 2012; Cotik et al., 2016). Mackinlay et al. (2012) constructed hand-engineered features using the English Resource Grammar to identify negation and hypothetical classes for a BioNLP 2009 task.
Table 1: Examples of each assertion class from our dataset.

| Class | Example |
|---|---|
| Present | Metoprolol 50 mg po was administered prior to the scan to decrease heart rate |
| Absent | No chest pain, no coronary artery aneurysm, no aneurysm or wall thickening |
| Conditional | Myocardial perfusion imaging, if not already performed, might improve specificity in this regard if clinically warranted |
| Hypothetical | Coronary plaque burden and focal lesion characterization (if present) were assessed by visual estimate |
| Possibility | This was incompletely imaged but suggests a diaphragmatic arteriovenous malformation |
| AWSE | High risk is >= 10 packs/year or positive family history of lung cancer in first degree relative |
The annotated entities and assertion labels in the 2010 i2b2/VA challenge (Uzuner et al., 2011) can be regarded as a benchmark for the assertion detection task on clinical text. Kernel methods using SVMs (de Bruijn et al., 2011) and Bag-of-Words (Shivade et al., 2015) were proposed for the shared task. Cheng et al. (2017) used a CRF for classification of cues and scope detection. Though these methods perform better than rule-based methods, they fail to generalize well to examples unseen during training.
More recently, with deep learning achieving state-of-the-art performance in various NLP tasks, an LSTM encoder-decoder architecture (Sutskever et al., 2014; Hochreiter and Schmidhuber, 1997) can be trained for assertion detection with reasonable success. Attention-based models using LSTMs (Fancellu et al., 2016) and GRUs (Rumeng et al., 2017) have also been explored. Limited amounts of labeled (and unlabeled) clinical text make training deep neural networks a challenging task. Bhatia et al. (2018) explored a multi-task learning setting by adding a Named Entity Recognition (NER) classification branch to the assertion detection output. All of these methods either identify only the class or use it as a cue to prune the scope of the class from the text. As mentioned above, our work proposes an end-to-end single-stage approach to assertion and negation scope detection. A schematic of our approach is shown in Fig. 1.
3 Proposed Model
We formulated the assertion and negation problem as follows. Let S = (w_1, ..., w_n) be a sentence in a clinical report consisting of n words. We need to identify the assertion classes and their corresponding scopes in the report, defined by a set of triples (c, i, j), where class c scopes the words between w_i and w_j. We cast this problem as finding bounding boxes over the text that scope a particular class: if K is the maximum scope of a class present in the input, we can place prior boxes of lengths 1, ..., K at each word and predict the probability of a particular box containing a class.
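As a concrete sketch, the prior-box enumeration described above can be written as follows. The function name and the half-open (start, end) span convention are our own illustrative choices, not the paper's:

```python
def prior_boxes(num_tokens, max_scope):
    """Enumerate candidate spans over a sentence: at each token position,
    one box per length 1..max_scope, clipped at the sentence boundary.
    Spans are (start, end) with `end` exclusive."""
    boxes = []
    for start in range(num_tokens):
        for length in range(1, max_scope + 1):
            end = start + length
            if end <= num_tokens:
                boxes.append((start, end))
    return boxes

# A 5-token sentence with boxes up to length 3 yields 5 + 4 + 3 = 12 candidates.
boxes = prior_boxes(5, 3)
```

Each candidate box is then scored by the network, so the number of predictions grows linearly with sentence length for a fixed maximum scope.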
3.1 Intersection Over Union
Let B_1 = {w_i, ..., w_j} and B_2 = {w_k, ..., w_l} be two bounding boxes over text scopes, i.e., each box is a set of consecutive words. We then define the IoU (Intersection over Union) of these two bounding boxes as

IoU(B_1, B_2) = |B_1 ∩ B_2| / |B_1 ∪ B_2|,

where |A| is the cardinality of a set A. A few examples of IoUs are shown in Fig. 2.
3.2 Network Design
First, we embed each word w_i of the input sequence in a distributional word vector space as x_i = E_{w_i}, where x_i is a column vector of an embedding matrix E. This is the input to our CNN. Each layer in the CNN is a 1D-convolutional layer followed by a non-linearity. Stacking many layers on top of one another increases the receptive field of the network. To cover the largest prior box of length K, we need the receptive field of the last layer to be at least K.
Our architecture is shown in Fig. 3. We first use 6 layers of convolutions followed by 6 further convolutional layers throughout the network, with a ReLU non-linearity after each convolutional layer. The output of the last convolutional layer is then passed through two branches of fully-connected layers to produce box confidence scores and class confidence probabilities, respectively, for B prior boxes and C classes. It is important to note that the receptive field of the last layer is 24.
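The receptive-field figure can be sanity-checked with the standard recurrence for stacked stride-1 1D convolutions, where each layer widens the receptive field by (k - 1) * dilation. The kernel sizes in the example below are illustrative assumptions, since this excerpt does not list the exact layer configuration:

```python
def receptive_field(kernel_sizes, dilations=None):
    """Receptive field of a stack of stride-1 1D convolutions.

    A layer with kernel size k and dilation d widens the receptive
    field by (k - 1) * d tokens."""
    dilations = dilations or [1] * len(kernel_sizes)
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# e.g. 12 stacked layers with kernel size 3 give a receptive field of 25 tokens,
# enough to cover a maximum class scope of about 20 tokens.
```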
3.3 Objective Function
Box Confidence Loss: We expect the box confidence branch to predict the IoU of each prior box with the nearest ground truth box. The simplest way to do this is by minimizing the Mean Square Error (MSE) between predicted and ground-truth IoU.
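The target construction and MSE described above can be sketched as follows; the span representation and function names are our own assumptions:

```python
def span_iou(a, b):
    """IoU of two token spans (start, end), end exclusive."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def box_confidence_targets(priors, gt_boxes):
    """Regression target for each prior box: its IoU with the nearest
    (highest-overlap) ground-truth box, or 0 if there is none."""
    return [max((span_iou(p, g) for g in gt_boxes), default=0.0)
            for p in priors]

def mse_loss(pred, target):
    """Mean square error between predicted and target confidences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

A prior box exactly matching a ground-truth scope gets target 1.0, a disjoint box gets 0.0, and partial overlaps fall in between.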
Non-max Suppression: Once we have the box confidence scores of the prior boxes, we sort them in decreasing order of confidence and discard those below a confidence threshold. Among the remaining overlapping boxes, we keep the prior box with the highest confidence score. The detailed algorithm is shown in Algorithm 1.
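A minimal greedy sketch of this suppression step, assuming token-span boxes and default thresholds of our own choosing (the paper's exact threshold values are not given in this excerpt):

```python
def non_max_suppression(boxes, scores, conf_thresh=0.5, iou_thresh=0.5):
    """Greedy NMS over token spans: keep the highest-scoring surviving box,
    drop boxes overlapping it above iou_thresh, and repeat.
    Returns indices of kept boxes."""
    def iou(a, b):
        inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union else 0.0

    candidates = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thresh),
        key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return kept
```

For example, with boxes (0, 4) and (1, 4) scoring 0.9 and 0.8, the second box overlaps the first at IoU 0.75 and is suppressed.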
Class Confidence Loss: The class confidence branch is expected to predict P(c | b), the probability of class c given that prior box b contains an assertion scope. We first apply a softmax to the class confidence scores and use the cross-entropy loss to maximize the probability of the ground-truth class. Given the class imbalance in our dataset, we implemented a weighted loss per class, where an indicator variable denotes the presence of class c in prior box b, and the weight of class c is inversely proportional to the fraction of examples in a batch that belong to class c.
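For a single prior box with a ground-truth class, the weighted cross-entropy term can be sketched as below; the function names are ours, and the class weights are passed in precomputed rather than derived from batch statistics:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(logits, target, class_weights):
    """Weighted CE for one prior box: -w_c * log p(c) for true class c."""
    probs = softmax(logits)
    return -class_weights[target] * math.log(probs[target])
```

Up-weighting rare classes this way keeps the gradient from being dominated by the abundant present and absent examples.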
We optimize the cumulative loss using the Adam optimizer.
4 Datasets and Experiments
We evaluated our model on datasets from two hospital sites (Dataset-I and Dataset-II); both have reports with multi-label sentences. First we elaborate on the data collection and annotation process, then present some statistics on the datasets, and finally highlight the performance of our model. Dataset-I and Dataset-II comprise 151 and 460 cardiac CT reports, respectively. All reports were anonymized at the hospital site before we accessed the data. The datasets were annotated by 8 engineers with an average of 217 hours of training in labeling healthcare data.
The annotations were done using the BRAT tool (Stenetorp et al., 2012). Rules for annotation were generated after consulting the radiologist supervising the annotators. Other radiologists were consulted to annotate any mentions that were previously unseen or ambiguous, and for the final review. Statistics of the data, such as the number of classes per report, the number of tokens in a report, and the lengths of class scopes, are shown in Tables 5-5.
[Table 5: per-class performance of the Baseline and Scope Localization models]
4.1 Baseline Model
Our baseline builds on recent sequence-to-sequence assertion detection models (Bhatia et al., 2018; Chen, 2019; Rumeng et al., 2017). Chen (2019) used a bidirectional attentive encoder on the sentence input to obtain a context vector which is subsequently passed to the softmax and output classification layers. Bhatia et al. (2018) extended this network by adding a shared decoder to predict both the assertion class and the named entity tag in a multi-task learning framework. However, the input to these seq2seq models is a sentence and the output prediction is a single class, so the models may not be easily extended to a multi-label dataset without compromising performance. To validate our assumption, we extend the LSTM-based bidirectional encoder and attentive decoder model to our multi-label data by changing the input format: instead of predicting one class for the entire input sequence, we predict a class for each token so that the scope of a class can also be localized. Two sample sentences (with per-token class labels) are shown in Table 6.
| Report | Token-level labels |
|---|---|
| Report-1 | Metoprolol/P 50/P mg/P po/N was/N administered/N prior/N to/N the/N scan/N to/N decrease/C heart/C rate/C |
| Report-2 | Myocardial/H perfusion/H imaging/H ,/N if/N not/N already/N performed/N ,/N might/H improve/H specificity/H in/N this/N regard/N if/N clinically/N warranted/N ./N |
4.2 Training and Hyperparameters
Since the datasets have unbalanced classes, we used stratified sampling (Sechidis et al., 2011; Szymański and Kajdanowicz, 2017) to represent the classes in the same ratio across the train, validation and test sets. To further mitigate the effect of unbalanced classes in each batch of training data, we weighted the cross-entropy loss with the inverse of the number of examples for each class. The pre-trained BioWordVec embeddings (Zhang et al., 2019) are used in the embedding layer with frozen weights. We used the Adam optimizer with the default learning rate of 0.001 for 400 epochs. Shuffling after each epoch results in a different distribution of classes per batch; this leads to unstable training and therefore requires more epochs for convergence. We set the number of prior boxes to 24, slightly more than the maximum length of a class scope in the training set. Fig. 4 shows the performance of the model on the validation set for different values of the IoU threshold. Experiments with more layers and larger kernel sizes did not improve performance, because the receptive field only has to be large enough to span the longest scope in the input, i.e., 20.
Table 5 shows the per-class performance of the baseline and our CNN-based scope localization models on Datasets I and II. For a fair comparison with the baseline, the box predictions from our model are converted to a sequence of labels, one per token. On first impression, the performance seems to be driven by the quantity of training data, with the best performance on the present class and the worst on the AWSE class. Further analysis suggests that the scope lengths found in the training set are also a crucial factor. Fig. 6 shows a histogram of scope lengths in the training set for each class, and Fig. 5 shows test-set performance for different scope lengths. As shown, model performance for the present class declines at scope lengths 7, 10, and 20, reflecting the sparsity of this class at these scopes in the training set. In contrast, the model performs well on the hypothetical class at scope length 7, reflecting the better representation of this class at that scope relative to other scopes.
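The box-to-token conversion used for this comparison can be sketched as follows; the function name and the "N" default label (for tokens outside any scope) are our assumptions, mirroring the per-token format of Table 6:

```python
def boxes_to_token_labels(num_tokens, boxes, default="N"):
    """Convert predicted (start, end, label) spans into one label per token,
    so span predictions can be scored against a token-level baseline.
    Later boxes overwrite earlier ones on overlapping tokens."""
    labels = [default] * num_tokens
    for start, end, label in boxes:
        for t in range(start, min(end, num_tokens)):
            labels[t] = label
    return labels
```

With the sequences aligned this way, standard token-level precision/recall can be computed identically for both models.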
5 Conclusion

In this work, we explored a novel approach to scope localization and classification with a single end-to-end CNN model. We demonstrated good performance, and thereby make a case for using the multi-label clinical text that is often found in the real world. For future work, we would like to explore inception-style layers, i.e., different sets of kernel sizes in each layer, so that the output layer has varying receptive fields (scope lengths in our problem). This should improve the generalization of the model to scope lengths unseen in the training data.
- ContextD: an algorithm to identify contextual properties of medical terms in a Dutch clinical corpus. In BMC Bioinformatics.
- End-to-end joint entity extraction and negation detection for clinical text. CoRR abs/1812.05270.
- A simple algorithm for identifying negated findings and diseases in discharge summaries. Journal of the American Medical Informatics Association 18(5), pp. 552–556.
- Attention-based deep learning system for negation and assertion detection in clinical notes. International Journal of Artificial Intelligence and Applications 10, pp. 1–9.
- Automatic negation and speculation detection in veterinary clinical text. In Proceedings of the Australasian Language Technology Association Workshop 2017, Brisbane, Australia, pp. 70–78.
- An approach to detect negation on medical documents in Spanish. In Brain Informatics and Health, pp. 366–375.
- An approach for automatic classification of radiology reports in Spanish. In Proceedings of the 15th MEDINFO, pp. 634–638.
- Syntactic methods for negation detection in radiology reports in Spanish. In Proceedings of the 15th Workshop on Biomedical Natural Language Processing, Berlin, Germany, pp. 156–165.
- Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. In JAMIA.
- Neural networks for negation scope detection. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 495–504.
- Don't let notes be misunderstood: a negation detection method for assessing risk of suicide in mental health records. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, San Diego, CA, USA, pp. 95–105.
- Long short-term memory. Neural Computation 9(8), pp. 1735–1780.
- Detecting modification of biomedical events using a deep parsing approach. BMC Medical Informatics and Decision Making 12(Suppl 1), p. S4.
- DEEPEN: a negation detection system for clinical text incorporating dependency relation into NegEx. Journal of Biomedical Informatics 54, pp. 213–219.
- NegBio: a high-performance tool for negation and uncertainty detection in radiology reports. CoRR abs/1712.05898.
- You only look once: unified, real-time object detection. CoRR abs/1506.02640.
- A hybrid neural network model for joint prediction of presence and period assertions of medical events in clinical notes. AMIA Annual Symposium Proceedings 2017, pp. 1149–1158.
- On the stratification of multi-label data. In Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD '11), Part III, Berlin, Heidelberg, pp. 145–158.
- Extending NegEx with kernel methods for negation detection in clinical text. In Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015), Denver, Colorado, pp. 41–46.
- Dependency parser-based negation detection in clinical narratives. In AMIA Joint Summits on Translational Science Proceedings.
- BRAT: a web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL '12), Stroudsburg, PA, USA, pp. 102–107.
- Negated findings detection in radiology reports in Spanish: an adaptation of NegEx to Spanish. In IJCAI Workshop on Replicability and Reproducibility in Natural Language Processing, Buenos Aires, Argentina.
- Sequence to sequence learning with neural networks. CoRR abs/1409.3215.
- A scikit-based Python environment for performing multi-label classification. ArXiv e-prints.
- 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association 18(5), pp. 552–556.
- BioWordVec, improving biomedical word embeddings with subword information and MeSH. In Scientific Data.