BioNerFlair: biomedical named entity recognition using flair embedding and sequence tagger

11/03/2020 ∙ by Harsh Patel, et al.

Motivation: The proliferation of biomedical research articles has made information retrieval more important than ever. Scientists and researchers have difficulty finding articles that contain information relevant to them. Proper extraction of biomedical entities such as diseases, drugs/chemicals, species, and genes/proteins can considerably improve the filtering of articles, resulting in better extraction of relevant information. Performance on BioNer benchmarks has progressively improved thanks to transformer-based models such as BERT, XLNet, OpenAI GPT, and GPT-2. These models give excellent results; however, they are computationally expensive, and better scores on domain-specific tasks can be achieved with other contextual string-based models and an LSTM-CRF based sequence tagger. Results: We introduce BioNerFlair, a method to train models for biomedical named entity recognition using Flair plus GloVe embeddings and a Bidirectional LSTM-CRF based sequence tagger. With almost the same generic architecture widely used for named entity recognition, BioNerFlair outperforms previous state-of-the-art models. I performed experiments on 8 benchmark datasets for biomedical named entity recognition. Compared to current state-of-the-art models, BioNerFlair achieves the best F1-score of 90.17 (previous best 84.72) on the BioCreative II gene mention (BC2GM) corpus, the best F1-score of 94.03 (previous best 92.36) on the BioCreative IV chemical and drug (BC4CHEMD) corpus, the best F1-score of 88.73 (previous best 78.58) on the JNLPBA corpus, the best F1-score of 91.1 (previous best 89.71) on the NCBI disease corpus, and the best F1-score of 85.48 (previous best 74.98) on the Species-800 corpus, while near-best results were observed on the BC5CDR-chem, BC5CDR-disease, and LINNAEUS corpora.




1 Introduction

The number of research papers in the biomedical domain has increased sharply since the COVID-19 pandemic began. Scientists around the world are conducting experiments and clinical trials to learn more about the effects of the pandemic on global health and the economy. As a result, journals around the world are flooded with biomedical literature, and it is getting difficult to find articles that are relevant, robust, and credible. According to different reports, over 100,000 papers have already been published on COVID-19 alone, and PubMed comprises over 30 million citations for biomedical literature. As reports on discoveries and insights are added to the already overwhelming body of literature, advanced computational tools for text mining and information extraction are needed more than ever.

Recent progress of deep learning techniques in natural language processing (NLP) has led to significant advancements on a wide range of tasks and applications, and biomedical text mining has likewise improved. The performance of biomedical named entity recognition (BioNer), which automatically extracts entities such as diseases, genes/proteins, chemicals, and species, has improved substantially Hong and Lee (2020); Lee et al. (2020). BioNer can be used to build biomedical knowledge graphs, on which other NLP tasks such as relation extraction and question answering (QA) depend. Thus, improved BioNer performance can lead to better performance on these more complex NLP tasks. Named entities in biomedical literature have several characteristics that make their extraction from text particularly challenging Zhou et al. (2004), including descriptive naming conventions (e.g. ‘normal thymic epithelial cells’), abbreviations (e.g. ‘IL2’ for ‘Interleukin 2’), non-standardized naming conventions (e.g. ‘Nacetylcysteine’, ‘N-acetyl-cysteine’, ‘NAcetylCysteine’, etc.), and conjunction and disjunction (e.g. ‘91 and 84 kDa proteins’ comprises two entities, ‘91 kDa proteins’ and ‘84 kDa proteins’). Traditionally, NER models for biomedical literature performed well using feature engineering, i.e. carefully selecting features from the text. These features can be linguistic, orthographic, morphological, or contextual Campos et al. (2012). Selecting the right features to represent the target entities requires expert knowledge and extensive trial-and-error experimentation, is often time-consuming, and yields highly specialized models that only work in the domains they were designed for.
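For illustration, the orthographic part of such a feature set can be sketched in a few lines of Python. The specific features below (capitalization, digits, hyphens, and a collapsed word shape) are a hypothetical selection in the spirit of the feature types listed above, not the feature set of any cited system.

```python
from typing import Dict, Union


def word_shape(token: str) -> str:
    """Coarse word shape: X = uppercase, x = lowercase, d = digit; other characters kept.

    Runs of the same shape symbol are collapsed, e.g. "Interleukin" -> "Xx".
    """
    shape = []
    for ch in token:
        if ch.isupper():
            sym = "X"
        elif ch.islower():
            sym = "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch
        if not shape or shape[-1] != sym:
            shape.append(sym)
    return "".join(shape)


def orthographic_features(token: str) -> Dict[str, Union[bool, str]]:
    """Hypothetical hand-crafted orthographic features for one token."""
    return {
        "init_cap": token[:1].isupper(),
        "all_caps": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "has_hyphen": "-" in token,
        "shape": word_shape(token),
    }


for tok in ["IL2", "Interleukin", "Nacetylcysteine", "N-acetyl-cysteine", "NAcetylCysteine"]:
    print(tok, orthographic_features(tok))
```

The three spellings of the same chemical receive three different shapes ("Xx", "X-x-x", "XxXx"), which illustrates why hand-crafted features tend to generalize poorly to non-standardized biomedical names.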

Models based on convolutional neural networks were proposed to tackle sequence tagging problems Collobert et al. (2011). This kind of neural network architecture and learning algorithm reduced the need for domain-specific feature engineering. However, these networks could not make use of information from earlier in the sequence that could improve performance on named entity recognition. Recurrent neural networks (RNNs) can carry earlier information forward in their hidden state, but when trained with backpropagation they suffer from vanishing and exploding gradients and do not handle long-term dependencies well. The gradients carry the information used for parameter updates, and the text sequences in NER are generally long. For longer sequences, the gradients reaching early time steps become vanishingly small, so the corresponding weights are barely updated Bengio et al. (1994). These problems are addressed by a special RNN architecture, Long Short-Term Memory (LSTM), which is capable of handling long-term dependencies Hochreiter and Schmidhuber (1997).
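A short numerical illustration of the vanishing gradient problem, assuming a toy scalar RNN $h_t = \tanh(w h_{t-1} + x)$ with an arbitrary recurrent weight $w = 0.9$ and constant input $x = 0.1$ (both hypothetical values): by the chain rule, $\partial h_T / \partial h_0$ is a product of $T$ factors $w(1 - h_t^2)$, each smaller than 1 here, so it shrinks exponentially with sequence length.

```python
import math


def gradient_through_time(w: float, steps: int, x: float = 0.1) -> float:
    """Return dh_T/dh_0 for the toy scalar RNN h_t = tanh(w * h_{t-1} + x), with h_0 = 0."""
    h = 0.0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        # chain rule: dh_t/dh_{t-1} = w * (1 - tanh(.)^2) = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad


for steps in (5, 20, 50):
    print(f"T={steps:>2}: dh_T/dh_0 = {gradient_through_time(0.9, steps):.3e}")
```

If the per-step factors were larger than 1 in magnitude, the same product would grow exponentially instead, which is the exploding gradient case.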

The BiLSTM-CRF neural architecture produces state-of-the-art performance on NER tasks. It comprises two components: a BiLSTM that predicts labels by capturing information from the text in both directions, and a CRF that computes the transition compatibility between all possible pairs of labels on neighboring tokens. This architecture is now considered standard for sequence labeling problems Lample et al. (2016). It generally uses vector representations of words (word embeddings) as input to the LSTMs; Word2Vec Mikolov et al. (2013) and GloVe Pennington et al. (2014) are popular context-independent word representations. Character-level features of the text are often incorporated into the word embedding layer to improve the performance of NER models Kim et al. (2015).

Combining BiLSTM-CRFs with suitable word embeddings significantly improved the performance of NER models, and researchers started experimenting with this architecture for biomedical named entity recognition. Some models used character-level embeddings along with word embeddings pre-trained on a large entity-independent corpus (PubMed abstracts). These models outperformed earlier state-of-the-art models for BioNER Habibi et al. (2017); Luo et al. (2018); Verwimp et al. (2017). However, all of these word embeddings were context-independent, so they cannot address the polysemous, context-dependent nature of words. Contextualized string embeddings such as flair embeddings Akbik et al. (2018) and ELMo Peters et al. (2018) address this problem, and when used with BiLSTM-CRFs these context-dependent embeddings outperformed all previous named entity recognition models. Transformer-based Vaswani et al. (2017) language representation models such as BERT Devlin et al. (2018) subsequently achieved state-of-the-art performance in NER. However, applying these methods to biomedical literature is limited by the different word distributions of general and biomedical corpora: because recent language representation models are mostly trained on general-domain text, they often perform poorly on biomedical corpora. The most recent state-of-the-art solutions have shown that using a language representation model pre-trained on biomedical corpora (such as PubMed abstracts and PMC full-text articles) gives the best results for biomedical named entity recognition Hong and Lee (2020); Lee et al. (2020).
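The difference between context-independent and contextual embeddings can be shown with a toy example. The sketch below uses made-up 2-dimensional vectors and a deliberately simple stand-in for a contextual model (averaging the static vectors in a small window). This is not how flair embeddings or ELMo are computed; it only illustrates that a static lookup assigns the polysemous word "cold" the same vector in both sentences, while a context-dependent representation does not.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float]

# Hypothetical 2-d static vectors, for illustration only.
STATIC: Dict[str, Vec] = {
    "caught": (0.9, 0.1), "a": (0.0, 0.0), "cold": (0.2, 0.7),
    "the": (0.0, 0.0), "water": (0.1, 0.9), "was": (0.0, 0.1),
}


def static_embed(tokens: Sequence[str]) -> List[Vec]:
    """Context-independent lookup: a word always gets the same vector."""
    return [STATIC[t] for t in tokens]


def toy_contextual_embed(tokens: Sequence[str], window: int = 1) -> List[Vec]:
    """Toy stand-in for a contextual model: average the static vectors in a +/- window."""
    vectors = static_embed(tokens)
    out: List[Vec] = []
    for i in range(len(tokens)):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        span = vectors[lo:hi]
        out.append((sum(v[0] for v in span) / len(span), sum(v[1] for v in span) / len(span)))
    return out


s1 = ["caught", "a", "cold"]          # "cold" as an illness
s2 = ["the", "water", "was", "cold"]  # "cold" as a temperature
print(static_embed(s1)[2] == static_embed(s2)[3])                  # True: same vector
print(toy_contextual_embed(s1)[2] == toy_contextual_embed(s2)[3])  # False: depends on context
```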

This paper presents BioNerFlair, a novel architecture for biomedical named entity recognition. BioNerFlair uses contextualized string embeddings (Flair, pre-trained on the biomedical domain) together with GloVe embeddings at the token embedding layer, followed by a sequence tagger based on BiLSTM-CRFs that extracts named entities from biomedical literature. I evaluate the performance of BioNerFlair on 8 benchmark datasets. BioNerFlair outperforms earlier state-of-the-art models on 5 datasets and performs close to the previous best models on the remaining 3.

2 Materials and methods

The following sections describe the corpora used for evaluation, give a technical description of the model architecture, and detail the evaluation metrics.

2.1 Datasets

The statistics of the biomedical named entity recognition datasets are listed in Table 1. BioNerFlair is evaluated on eight standard BioNer corpora covering diseases, genes/proteins, drugs/chemicals, and species: the NCBI Disease Doğan et al. (2014) and BC5CDR Li et al. (2016) corpora for disease, the BC5CDR Li et al. (2016) and BC4CHEMD Krallinger et al. (2015) corpora for drug/chemical, the BC2GM Smith et al. (2008) and JNLPBA Kim et al. (2004) corpora for gene/protein, and the LINNAEUS Gerner et al. (2010) and Species-800 Pafilis et al. (2013) corpora for species. These datasets are widely used by biomedical NLP researchers for testing BioNer models. All the datasets are tagged with the IOB tagging scheme. For a fair comparison with other state-of-the-art techniques, the same training, validation, and test splits as in earlier works Lee et al. (2020); Wang et al. (2019) are adopted.

Dataset        Entity type    Number of annotations
NCBI Disease   Disease        6881
BC5CDR         Disease        12694
BC5CDR         Drug/Chem.     15411
BC4CHEMD       Drug/Chem.     79824
BC2GM          Gene/Protein   20703
JNLPBA         Gene/Protein   35460
LINNAEUS       Species        4077
Species-800    Species        3708

Note: The numbers of annotations are taken from Habibi et al. (2017), Zhu et al. (2018), and Lee et al. (2020).

Table 1: Statistics of the biomedical named entity recognition datasets
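Because all corpora use the IOB scheme, entities are recovered from token-level tags by grouping a B- tag with the I- tags that follow it. A minimal decoder is sketched below, assuming tags of the form B-TYPE, I-TYPE, and O (the exact label strings in the pre-processed files may differ); the example sentence and its tags are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Span = Tuple[str, int, int]  # (entity type, start token index, end token index exclusive)


def iob_to_spans(tags: Sequence[str]) -> List[Span]:
    """Decode B-TYPE / I-TYPE / O tags into entity spans.

    An I- tag that does not continue an entity of the same type starts a new entity.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                spans.append((etype, start, i))
            if tag == "O":
                start, etype = None, None
            else:
                start, etype = i, tag[2:]
    return spans


tokens = ["Mutations", "in", "BRCA1", "cause", "breast", "cancer"]
tags = ["O", "O", "B-Gene", "O", "B-Disease", "I-Disease"]
for etype, s, e in iob_to_spans(tags):
    print(etype, " ".join(tokens[s:e]))  # "Gene BRCA1", then "Disease breast cancer"
```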

2.2 Model architecture

BioNerFlair comprises three layers: a token embedding layer that gives a contextualized vector representation of the input sequence, followed by a BiLSTM layer and a CRF layer that together form a vanilla BiLSTM-CRF sequence labeler, as depicted in Figure 2. This setup gives state-of-the-art results on several BioNer tasks.

2.2.1 Token embedding layer

The token embedding layer takes as input a sequence of tokens $(w_1, w_2, \ldots, w_n)$ and outputs a fixed-dimensional vector representation $e_i$ of each token $w_i$. The output is the concatenation (Equation 1) of pre-computed GloVe embeddings Pennington et al. (2014) and contextualized flair embeddings Akbik et al. (2018) pre-trained on roughly 3 million full texts and about 25 million abstracts from PubMed:

$$e_i = \left[\, e_i^{\mathrm{GloVe}} \,;\, e_i^{\mathrm{Flair}} \,\right] \tag{1}$$

An analysis by Akbik et al. (2018) shows that combining flair embeddings with classic word embeddings improves the performance of NER models; accordingly, BioNerFlair combines GloVe embeddings with flair embeddings.
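Equation 1 is a simple per-token concatenation. The sketch below reproduces it with toy vectors; the dimensions, the lowercased lookup table, and the zero vector for out-of-vocabulary words are assumptions for illustration. In practice this concatenation is what Flair's stacked embeddings perform, with real pre-trained GloVe and PubMed flair models in place of the toy values.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def stack_token_embeddings(
    tokens: Sequence[str],
    glove: Dict[str, Vector],
    flair_vectors: Sequence[Vector],
    glove_dim: int,
) -> List[Vector]:
    """Build e_i = [e_i^GloVe ; e_i^Flair] for every token of a sentence."""
    if len(flair_vectors) != len(tokens):
        raise ValueError("expected exactly one contextual (flair) vector per token")
    stacked = []
    for token, flair_vec in zip(tokens, flair_vectors):
        # Static part: lowercased lookup; OOV tokens get a zero vector (assumption).
        glove_vec = glove.get(token.lower(), [0.0] * glove_dim)
        # Contextual part is appended after the static part.
        stacked.append(list(glove_vec) + list(flair_vec))
    return stacked


# Toy numbers only: 2-d "GloVe" vectors and 3-d "flair" vectors.
toy_glove = {"binds": [0.3, 0.4], "receptor": [0.5, 0.6]}
toy_flair = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
for row in stack_token_embeddings(["IL2", "binds", "receptor"], toy_glove, toy_flair, glove_dim=2):
    print(row)
```

The resulting dimensionality is simply the sum of the two embedding sizes, which is one reason larger stacks (for example, adding transformer embeddings) quickly increase GPU memory use.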


Flair embeddings are contextualized, character-level word embeddings designed to combine the best attributes of different kinds of embeddings. Recent studies Lee et al. (2020); Dang et al. (2018) have shown that pre-training models on biomedical corpora significantly improves the performance of BioNer models, so this study uses a flair embedding model pre-trained on biomedical data, which appears to capture latent syntactic and semantic similarities. Flair embeddings derive each word's vector from language-model hidden states that depend not only on the characters of the word but also on the characters of its surrounding context, as illustrated in Figure 1. Because the flair embedding model is pre-trained on biomedical corpora and extracts context at the character level, it copes well with the rare words, misspellings, and varying naming conventions that occur frequently in biomedical literature.

Figure 1: Extraction of flair embeddings in sentential context. The words are passed to the language model as a sequence of characters, and the outputs of the hidden states are concatenated to form the final embedding.
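Following the description in Akbik et al. (2018), a word's flair embedding concatenates the forward language model's hidden state after the word's last character with the backward language model's hidden state once it has read back to the word's first character. The sketch below shows only this indexing logic; `toy_char_states` is a hypothetical stand-in for the trained character language models, not Flair's implementation or API.

```python
import math
from typing import List, Sequence, Tuple


def toy_char_states(text: str, reverse: bool = False, alpha: float = 0.5) -> List[float]:
    """Hypothetical stand-in for a character language model: one scalar state per character.

    states[i] is the state right after the model has consumed character i,
    reading left-to-right, or right-to-left when reverse=True.
    """
    states = [0.0] * len(text)
    h = 0.0
    order = range(len(text) - 1, -1, -1) if reverse else range(len(text))
    for i in order:
        h = math.tanh(alpha * h + (ord(text[i]) % 32) / 32.0)
        states[i] = h
    return states


def flair_style_embeddings(tokens: Sequence[str]) -> List[Tuple[float, float]]:
    """For each (non-empty) word: forward state at its last character and
    backward state at its first character, concatenated."""
    text = " ".join(tokens)
    fwd = toy_char_states(text)
    bwd = toy_char_states(text, reverse=True)
    out = []
    start = 0
    for token in tokens:
        end = start + len(token)                # exclusive end offset of the word
        out.append((fwd[end - 1], bwd[start]))  # left context + word, word + right context
        start = end + 1                         # skip the joining space
    return out


e1 = flair_style_embeddings(["a", "cold", "day"])[1]
e2 = flair_style_embeddings(["the", "cold", "water"])[1]
print(e1, e2, e1 != e2)  # the same word gets different vectors in different contexts
```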

2.2.2 Bidirectional Long Short-Term Memory (BiLSTM)

A Long Short-Term Memory network (LSTM) is a special kind of RNN introduced by Hochreiter and Schmidhuber (1997), explicitly designed to handle long-term dependencies. LSTMs largely avoid the vanishing gradient problem, so, unlike plain RNNs, they can retain information over long periods of time. LSTMs are equipped with memory cells along with an adaptive gating mechanism that regulates the information added to or removed from the memory cells. A typical LSTM has three gates: a sigmoid layer that decides what information to remove from the cell (forget gate), a sigmoid layer combined with a tanh layer that decides what new information to add (input gate), and another sigmoid layer that decides what to output (output gate). The LSTM memory cell is implemented using the following equations:

$$
\begin{aligned}
i_t &= \sigma\left(W_i [h_{t-1}; x_t] + b_i\right) \\
f_t &= \sigma\left(W_f [h_{t-1}; x_t] + b_f\right) \\
o_t &= \sigma\left(W_o [h_{t-1}; x_t] + b_o\right) \\
\tilde{C}_t &= \tanh\left(W_C [h_{t-1}; x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

In the above equations, $\sigma$ denotes the logistic sigmoid function, $\odot$ denotes element-wise multiplication, and $i$, $f$, $o$, and $C$ are the input gate, forget gate, output gate, and cell vectors; $x_t$ is the input at time step $t$, $h_t$ is the hidden state, $\tilde{C}_t$ is the candidate cell state, and $W$ and $b$ are learned weights and biases. In BioNerFlair, the final word embeddings are passed into a BiLSTM network, as it captures both past (left-context) and future (right-context) features for each time step Lample et al. (2016); Graves et al. (2013); Huang et al. (2015). The bidirectional LSTM network is trained using backpropagation through time Boden (2002).
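The LSTM memory cell and the bidirectional wrapper described above can be sketched in plain Python: one LSTM runs left-to-right, another right-to-left, and their states are concatenated per token. This is a minimal, untrained illustration with random weights; the sizes, seeds, and helper names (`lstm_step`, `bilstm`, `random_params`) are hypothetical, and BioNerFlair itself uses Flair's PyTorch-based sequence tagger rather than this code.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]
Params = Dict[str, list]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Mat, b: Vec, v: Vec) -> Vec:
    """Compute W v + b for a row-major matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(p: Params, x: Vec, h_prev: Vec, c_prev: Vec) -> Tuple[Vec, Vec]:
    hx = h_prev + x  # concatenation [h_{t-1}; x_t]
    i = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], hx)]          # input gate
    f = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], hx)]          # forget gate
    o = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], hx)]          # output gate
    c_tilde = [math.tanh(z) for z in affine(p["W_C"], p["b_C"], hx)]  # candidate cell
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def run_lstm(p: Params, xs: Sequence[Vec], hidden: int) -> List[Vec]:
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in xs:
        h, c = lstm_step(p, list(x), h, c)
        outputs.append(h)
    return outputs


def bilstm(p_fwd: Params, p_bwd: Params, xs: Sequence[Vec], hidden: int) -> List[Vec]:
    """Concatenate forward states with backward states re-aligned to input order."""
    forward = run_lstm(p_fwd, xs, hidden)
    backward = run_lstm(p_bwd, list(xs)[::-1], hidden)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def random_params(input_dim: int, hidden: int, seed: int) -> Params:
    """Small random weights (illustrative initialisation, not a trained model)."""
    rng = random.Random(seed)
    p: Params = {}
    for gate in ("i", "f", "o", "C"):
        p["W_" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + input_dim)] for _ in range(hidden)]
        p["b_" + gate] = [0.0] * hidden
    return p


sentence = [[0.1, 0.2, 0.3], [0.0, -0.4, 0.5], [0.7, 0.1, -0.2], [0.2, 0.2, 0.2]]  # 4 tokens, 3-d inputs
states = bilstm(random_params(3, 2, seed=0), random_params(3, 2, seed=1), sentence, hidden=2)
print(len(states), len(states[0]))  # 4 tokens, 2 + 2 features each
```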

2.2.3 Conditional Random Fields

Conditional Random Fields (CRFs) Lafferty et al. (2001) are a probabilistic, discriminative sequence modeling framework that retains the advantages of maximum entropy Markov models (MEMMs) Ratnaparkhi (1996); McCallum et al. (2000) while also solving the label bias problem.

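Given a training dataset $\{(x^{(k)}, y^{(k)})\}_{k=1}^{N}$ of data sequences $x^{(k)}$ to be labeled and their corresponding label sequences $y^{(k)}$, CRFs maximize the log-likelihood of the conditional probability of the label sequences given their data sequences, that is:

$$\mathcal{L} = \sum_{k=1}^{N} \log p\left(y^{(k)} \mid x^{(k)}\right)$$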
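A linear-chain CRF of this kind can be sketched as follows, assuming the common BiLSTM-CRF formulation (as in Lample et al. (2016)) in which a sequence's score is the sum of per-token emission scores (produced by the BiLSTM) and label-to-label transition scores, and $p(y \mid x) = \exp(\mathrm{score}(x, y)) / Z(x)$. The start scores, toy numbers, and function names are illustrative assumptions. The forward algorithm computes the $\log Z(x)$ term needed for the log-likelihood, and Viterbi decoding returns the most probable tag sequence; both are checked against brute-force enumeration.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def log_sum_exp(values: Sequence[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emissions: Matrix, transitions: Matrix, start: Sequence[float], labels: Sequence[int]) -> float:
    """score(x, y) = start[y_1] + sum_t emissions[t][y_t] + sum_t transitions[y_{t-1}][y_t]."""
    score = start[labels[0]] + emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score


def log_partition(emissions: Matrix, transitions: Matrix, start: Sequence[float]) -> float:
    """log Z(x) via the forward algorithm: O(T * K^2) instead of enumerating K^T paths."""
    k = len(start)
    alpha = [start[y] + emissions[0][y] for y in range(k)]
    for t in range(1, len(emissions)):
        alpha = [
            log_sum_exp([alpha[yp] + transitions[yp][y] for yp in range(k)]) + emissions[t][y]
            for y in range(k)
        ]
    return log_sum_exp(alpha)


def log_likelihood(emissions: Matrix, transitions: Matrix, start: Sequence[float], labels: Sequence[int]) -> float:
    """log p(y | x) = score(x, y) - log Z(x); training maximises the sum of these terms."""
    return sequence_score(emissions, transitions, start, labels) - log_partition(emissions, transitions, start)


def viterbi(emissions: Matrix, transitions: Matrix, start: Sequence[float]) -> Tuple[List[int], float]:
    """Most probable label sequence and its score."""
    k = len(start)
    delta = [start[y] + emissions[0][y] for y in range(k)]
    backpointers = []
    for t in range(1, len(emissions)):
        ptrs, new_delta = [], []
        for y in range(k):
            best_prev = max(range(k), key=lambda yp: delta[yp] + transitions[yp][y])
            ptrs.append(best_prev)
            new_delta.append(delta[best_prev] + transitions[best_prev][y] + emissions[t][y])
        backpointers.append(ptrs)
        delta = new_delta
    last = max(range(k), key=lambda y: delta[y])
    path = [last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    return path[::-1], delta[last]


# Toy scores for 3 tokens and 3 labels (e.g. O, B, I); in BioNerFlair the emissions come from the BiLSTM.
E = [[1.0, 0.2, -0.5], [0.3, 1.1, 0.0], [-0.2, 0.4, 0.9]]
A = [[0.5, -0.3, 0.05], [0.0, 0.6, -0.4], [-0.2, 0.1, 0.3]]
S = [0.2, 0.0, -0.1]

# Brute-force check over all 3^3 label sequences.
all_paths = [list(p) for p in itertools.product(range(3), repeat=3)]
brute_log_z = log_sum_exp([sequence_score(E, A, S, p) for p in all_paths])
brute_best = max(all_paths, key=lambda p: sequence_score(E, A, S, p))
print(abs(log_partition(E, A, S) - brute_log_z) < 1e-9, viterbi(E, A, S)[0] == brute_best)
```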

Figure 2: Architecture of BioNerFlair. The flair embedding and GloVe embedding vector representations of each word are computed and concatenated at the word embedding layer. The result is processed by the BiLSTM layer and then by the CRF layer. The output is the most probable tag sequence, as estimated by the CRF.

2.3 Evaluation metrics

The performance of BioNerFlair is evaluated by training a separate model for each dataset. I used the pre-processed versions of the BioNer datasets provided by Lee et al. (2020), with the same data splits for training and testing. Models are evaluated on the test corpora using precision (P), recall (R), and F1 score. A predicted entity is considered correct if and only if both its entity type and its boundaries exactly match an annotation in the test data. Precision, recall, and F1 are computed from the numbers of true positives (TP), false positives (FP), and false negatives (FN):

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot P \cdot R}{P + R}$$

All calculations are done using the Flair NLP library.
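The exact-match rule can be expressed directly on sets of entities. The sketch below is an independent illustration, not the Flair library's evaluation code; it assumes each entity is represented as a (sentence id, type, start, end) tuple, so a prediction with a wrong boundary counts as both a false positive and a false negative.

```python
from typing import Hashable, Iterable, Tuple


def entity_prf(gold: Iterable[Hashable], predicted: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1.

    A prediction is a true positive only if the whole tuple
    (e.g. sentence id, type, start, end) matches a gold entity.
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    fp = len(pred_set - gold_set)
    fn = len(gold_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {(0, "Gene", 2, 3), (0, "Disease", 4, 6)}
pred = {(0, "Gene", 2, 4), (0, "Disease", 4, 6)}  # gene boundary is off by one token
print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```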


3 Results and discussion

3.1 Experimental setups

All models are trained using the Flair NLP library, a simple framework for state-of-the-art NLP built directly on PyTorch. I trained the models on a 12 GB GPU provided free of charge by Google Colab. The maximum sequence length was set to 512 to get the best training speed without running out of GPU memory, and the mini-batch size for all experiments was set to 32.

Training starts with an initial learning rate of 0.1, a patience of 3, and an annealing factor of 0.5. A relatively high initial learning rate of 0.1 works well with the Stochastic Gradient Descent (SGD) optimizer; the learning rate is then multiplied by the annealing factor whenever the score stops improving for the patience period, so it is gradually reduced as the model converges. The dropout for Flair embeddings is set to 0.5. These hyper-parameters are the same for all models. Because the training sets are relatively small and the GPU is fast, most models trained in less than an hour. However, for the BC4CHEMD dataset the model could not fit into GPU memory, which increased the training time to around 5 hours.
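The learning-rate annealing described above can be sketched as a plateau-based scheduler. This is a simplified illustration, not Flair's trainer code: it assumes the rate is multiplied by the annealing factor once more than `patience` consecutive epochs fail to improve the best validation score (the same patience convention as PyTorch's `ReduceLROnPlateau`), and the stopping threshold `min_lr` is a hypothetical value.

```python
from typing import List, Sequence


def anneal_on_plateau(
    dev_scores: Sequence[float],
    initial_lr: float = 0.1,
    patience: int = 3,
    anneal_factor: float = 0.5,
    min_lr: float = 1e-4,  # hypothetical stopping threshold
) -> List[float]:
    """Return the learning rate used in each epoch, given per-epoch validation scores.

    The rate is multiplied by anneal_factor once more than `patience` consecutive
    epochs fail to improve the best score; training stops when it drops below min_lr.
    """
    lr = initial_lr
    best = float("-inf")
    bad_epochs = 0
    schedule: List[float] = []
    for score in dev_scores:
        schedule.append(lr)  # learning rate used for this epoch
        if score > best:
            best, bad_epochs = score, 0
        else:
            bad_epochs += 1
            if bad_epochs > patience:
                lr *= anneal_factor
                bad_epochs = 0
        if lr < min_lr:  # stop once the rate has annealed away
            break
    return schedule


print(anneal_on_plateau([0.50, 0.60, 0.60, 0.60, 0.60, 0.60, 0.70]))
# [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.05]
```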

The Flair NLP library also comes with HunFlair Weber et al. (2020), an NER tagger for biomedical text. HunFlair comes with models for genes/proteins, chemicals, diseases, species, and cell lines. Because HunFlair models are trained on multiple datasets at the same time, they outperform tools such as SciSpacy Neumann et al. (2019) on unseen text, but they do not give state-of-the-art results on gold-standard datasets. For BioNerFlair, I trained a separate model from scratch for each dataset, yielding the results reported in Table 2. I also tried to fine-tune HunFlair models on the target corpora, but the models did not fit within 12 GB of GPU memory.

Type          Dataset        Metric  SOTA   DTranNER  BERT   BioBERT v1.1  BioNerFlair
Disease       NCBI disease   P       88.30  88.21     84.12  88.22         91.21
Disease       NCBI disease   R       89.00  89.04     87.19  91.25         91.01
Disease       NCBI disease   F       88.60  88.62     85.63  89.71         91.11
Disease       BC5CDR         P       89.61  86.75     81.97  86.47         87.88
Disease       BC5CDR         R       83.09  87.70     82.48  87.84         85.73
Disease       BC5CDR         F       86.23  87.22     82.41  87.15         86.77
Drug/chem.    BC5CDR         P       94.26  94.28     90.94  93.68         91.22
Drug/chem.    BC5CDR         R       92.38  94.04     91.38  93.26         92.51
Drug/chem.    BC5CDR         F       93.31  94.16     91.16  93.47         91.85
Drug/chem.    BC4CHEMD       P       92.29  91.94     91.19  92.80         95.42
Drug/chem.    BC4CHEMD       R       90.01  92.04     88.92  91.92         92.72
Drug/chem.    BC4CHEMD       F       91.14  91.99     90.04  92.36         94.03
Gene/protein  BC2GM          P       81.81  84.21     81.17  84.32         89.67
Gene/protein  BC2GM          R       81.57  84.84     82.42  85.12         90.69
Gene/protein  BC2GM          F       81.69  84.56     81.79  84.72         90.17
Gene/protein  JNLPBA         P       74.43  -         69.57  72.24         86.29
Gene/protein  JNLPBA         R       83.22  -         81.20  83.56         91.51
Gene/protein  JNLPBA         F       78.58  -         74.94  77.49         88.73
Species       LINNAEUS       P       92.80  -         91.17  90.77         97.36
Species       LINNAEUS       R       94.29  -         84.30  85.83         84.75
Species       LINNAEUS       F       93.54  -         87.60  88.24         90.06
Species       Species-800    P       74.37  -         69.35  72.80         86.83
Species       Species-800    R       75.96  -         74.05  75.36         84.25
Species       Species-800    F       74.98  -         71.63  74.06         85.48

Note: Macro precision (P), recall (R), and F1 (F) scores on each dataset are reported. The scores of state-of-the-art (SOTA) models on the different datasets are taken from: Xu et al. (2019) on NCBI Disease, Sachan et al. (2018) on BC2GM, Lou et al. (2017) on BC5CDR-disease, Luo et al. (2018) on BC4CHEMD, Yoon et al. (2019) on BC5CDR-chemical and JNLPBA, and Giorgi and Bader (2018) on LINNAEUS and Species-800. Scores of the BioBERT Lee et al. (2020) and DTranNER Hong and Lee (2020) models are also reported.

Table 2: Test results for biomedical named entity recognition.

3.2 Experimental results

Results of BioNerFlair on the different datasets are shown in Table 2, where its performance is compared with other recent state-of-the-art methods. BioNerFlair outperforms state-of-the-art methods on five of the eight datasets and performs close to the best on the remaining three. The biggest improvement is in the gene/protein category: BioNerFlair achieves the best F1 score of 90.17 (previous best 84.72) on the BC2GM corpus and 88.73 (previous best 78.58) on the JNLPBA corpus. For the species category, BioNerFlair achieves the best F1 score of 85.48 (previous best 74.98) on the Species-800 corpus and the second-best score on the LINNAEUS corpus. The same pattern holds for the disease and drug/chemical categories, where BioNerFlair achieves state-of-the-art results on one dataset and near-best scores on the other. Even though BioNerFlair does not achieve the best results on the BC5CDR corpus for disease and chemical, its results are still competitive with other recent methods, and significant improvements can be seen on the other datasets.

3.3 Use of different word embeddings

In BioNerFlair, I use GloVe embeddings and flair embeddings at the token embedding layer. The Flair NLP library provides stacked embeddings, which allow different embeddings to be combined. Flair supports classic word embeddings, character embeddings, contextualized word embeddings, and pre-trained transformer embeddings, so different combinations of embeddings can be tried for sequence labeling tasks. The initial plan for this experiment was to use the concatenation of XLNet Yang et al. (2019), GloVe, and the pooled variant of flair embeddings Akbik et al. (2019). However, this combination requires a lot of GPU memory, so I used the GloVe plus flair combination described above instead. With more computational resources, the performance of BioNer models could possibly be improved further.

4 Conclusion

In conclusion, this article presents BioNerFlair, a method to train models for biomedical named entity recognition using Flair plus GloVe embeddings and a sequence tagger. This paper shows that using contextualized word embeddings pre-trained on biomedical corpora significantly improves the results of BioNer models. I evaluated the performance of BioNerFlair on eight datasets; it achieves state-of-the-art results on five of them. In future work, I plan to experiment with different contextualized and transformer-based word embeddings to further improve the performance of biomedical named entity recognition models.


I would like to thank the Department of Computer Science and Engineering, Medi-Caps University for the support. I also thank the anonymous reviewers for their comments and suggestions.


This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Availability and implementation

Source code and data are available at

Conflict of interest statement

Declarations of interest: none


  • A. Akbik, T. Bergmann, and R. Vollgraf (2019) Pooled contextualized embeddings for named entity recognition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 724–728. Cited by: §3.3.
  • A. Akbik, D. Blythe, and R. Vollgraf (2018) Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics, pp. 1638–1649. Cited by: §1, §2.2.1.
  • Y. Bengio, P. Simard, and P. Frasconi (1994) Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks 5 (2), pp. 157–166. Cited by: §1.
  • M. Boden (2002) A guide to recurrent neural networks and backpropagation. The Dallas project. Cited by: §2.2.2.
  • D. Campos, S. Matos, and J. L. Oliveira (2012) Biomedical named entity recognition: a survey of machine-learning tools. Theory and Applications for Advanced Text Mining, pp. 175–195. Cited by: §1.
  • R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa (2011) Natural language processing (almost) from scratch. Journal of machine learning research 12, pp. 2493–2537. Cited by: §1.
  • T. H. Dang, H. Le, T. M. Nguyen, and S. T. Vu (2018) D3NER: biomedical named entity recognition using crf-bilstm improved with fine-tuned embeddings of various linguistic information. Bioinformatics 34 (20), pp. 3539–3546. Cited by: §2.2.1.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §1.
  • R. I. Doğan, R. Leaman, and Z. Lu (2014) NCBI disease corpus: a resource for disease name recognition and concept normalization. Journal of biomedical informatics 47, pp. 1–10. Cited by: §2.1.
  • M. Gerner, G. Nenadic, and C. M. Bergman (2010) LINNAEUS: a species name identification system for biomedical literature. BMC bioinformatics 11 (1), pp. 85. Cited by: §2.1.
  • J. M. Giorgi and G. D. Bader (2018) Transfer learning for biomedical named entity recognition with neural networks. Bioinformatics 34 (23), pp. 4087–4094. Cited by: Table 2.
  • A. Graves, A. Mohamed, and G. Hinton (2013) Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing, pp. 6645–6649. Cited by: §2.2.2.
  • M. Habibi, L. Weber, M. Neves, D. L. Wiegandt, and U. Leser (2017) Deep learning with word embeddings improves biomedical named entity recognition. Bioinformatics 33 (14), pp. i37–i48. Cited by: §1, Table 1.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §1, §2.2.2.
  • S. Hong and J. Lee (2020) DTranNER: biomedical named entity recognition with deep learning-based label-label transition model. BMC bioinformatics 21 (1), pp. 53. Cited by: §1, §1, Table 2.
  • Z. Huang, W. Xu, and K. Yu (2015) Bidirectional lstm-crf models for sequence tagging. arXiv preprint arXiv:1508.01991. Cited by: §2.2.2.
  • J. Kim, T. Ohta, Y. Tsuruoka, Y. Tateisi, and N. Collier (2004) Introduction to the bio-entity recognition task at jnlpba. In Proceedings of the international joint workshop on natural language processing in biomedicine and its applications, pp. 70–75. Cited by: §2.1.
  • Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush (2015) Character-aware neural language models. arXiv preprint arXiv:1508.06615. Cited by: §1.
  • M. Krallinger, O. Rabal, F. Leitner, M. Vazquez, D. Salgado, Z. Lu, R. Leaman, Y. Lu, D. Ji, D. M. Lowe, et al. (2015) The chemdner corpus of chemicals and drugs and its annotation principles. Journal of cheminformatics 7 (1), pp. 1–17. Cited by: §2.1.
  • J. Lafferty, A. McCallum, and F. C. Pereira (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. Cited by: §2.2.3.
  • G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer (2016) Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360. Cited by: §1, §2.2.2.
  • J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36 (4), pp. 1234–1240. Cited by: §1, §1, §2.1, §2.2.1, §2.3, Table 1, Table 2.
  • J. Li, Y. Sun, R. J. Johnson, D. Sciaky, C. Wei, R. Leaman, A. P. Davis, C. J. Mattingly, T. C. Wiegers, and Z. Lu (2016) BioCreative v cdr task corpus: a resource for chemical disease relation extraction. Database 2016. Cited by: §2.1.
  • Y. Lou, Y. Zhang, T. Qian, F. Li, S. Xiong, and D. Ji (2017) A transition-based joint model for disease named entity recognition and normalization. Bioinformatics 33 (15), pp. 2363–2371. Cited by: Table 2.
  • L. Luo, Z. Yang, P. Yang, Y. Zhang, L. Wang, H. Lin, and J. Wang (2018) An attention-based bilstm-crf approach to document-level chemical named entity recognition. Bioinformatics 34 (8), pp. 1381–1388. Cited by: §1, Table 2.
  • A. McCallum, D. Freitag, and F. C. Pereira (2000) Maximum entropy markov models for information extraction and segmentation. In ICML, Vol. 17, pp. 591–598. Cited by: §2.2.3.
  • T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119. Cited by: §1.
  • M. Neumann, D. King, I. Beltagy, and W. Ammar (2019) Scispacy: fast and robust models for biomedical natural language processing. arXiv preprint arXiv:1902.07669. Cited by: §3.1.
  • E. Pafilis, S. P. Frankild, L. Fanini, S. Faulwetter, C. Pavloudi, A. Vasileiadou, C. Arvanitidis, and L. J. Jensen (2013) The species and organisms resources for fast and accurate identification of taxonomic names in text. PloS one 8 (6), pp. e65390. Cited by: §2.1.
  • J. Pennington, R. Socher, and C. D. Manning (2014) Glove: global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543. Cited by: §1, §2.2.1.
  • M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365. Cited by: §1.
  • A. Ratnaparkhi (1996) A maximum entropy model for part-of-speech tagging. In Conference on empirical methods in natural language processing, Cited by: §2.2.3.
  • D. S. Sachan, P. Xie, M. Sachan, and E. P. Xing (2018) Effective use of bidirectional language modeling for transfer learning in biomedical named entity recognition. In Machine Learning for Healthcare Conference, pp. 383–402. Cited by: Table 2.
  • L. Smith, L. K. Tanabe, R. J. nee Ando, C. Kuo, I. Chung, C. Hsu, Y. Lin, R. Klinger, C. M. Friedrich, K. Ganchev, et al. (2008) Overview of biocreative ii gene mention recognition. Genome biology 9 (S2), pp. S2. Cited by: §2.1.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008. Cited by: §1.
  • L. Verwimp, J. Pelemans, P. Wambacq, et al. (2017) Character-word lstm language models. arXiv preprint arXiv:1704.02813. Cited by: §1.
  • X. Wang, Y. Zhang, X. Ren, Y. Zhang, M. Zitnik, J. Shang, C. Langlotz, and J. Han (2019) Cross-type biomedical named entity recognition with deep multi-task learning. Bioinformatics 35 (10), pp. 1745–1752. Cited by: §2.1.
  • L. Weber, M. Sänger, J. Münchmeyer, M. Habibi, and U. Leser (2020) HunFlair: an easy-to-use tool for state-of-the-art biomedical named entity recognition. arXiv preprint arXiv:2008.07347. Cited by: §3.1.
  • K. Xu, Z. Yang, P. Kang, Q. Wang, and W. Liu (2019) Document-level attention-based bilstm-crf incorporating disease dictionary for disease named entity recognition. Computers in biology and medicine 108, pp. 122–132. Cited by: Table 2.
  • Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le (2019) Xlnet: generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems, pp. 5753–5763. Cited by: §3.3.
  • W. Yoon, C. H. So, J. Lee, and J. Kang (2019) Collabonet: collaboration of deep neural networks for biomedical named entity recognition. BMC bioinformatics 20 (10), pp. 249. Cited by: Table 2.
  • G. Zhou, J. Zhang, J. Su, D. Shen, and C. Tan (2004) Recognizing names in biomedical texts: a machine learning approach. Bioinformatics 20 (7), pp. 1178–1190. Cited by: §1.
  • H. Zhu, I. C. Paschalidis, and A. Tahmasebi (2018) Clinical concept extraction with contextual word embedding. arXiv preprint arXiv:1810.10566. Cited by: Table 1.