
An Instance Transfer based Approach Using Enhanced Recurrent Neural Network for Domain Named Entity Recognition

Recently, neural networks have shown promising results for named entity recognition (NER), but they require a large amount of labeled data for model training. When NER meets a new domain (the target domain), there is often little or no labeled data, which makes domain NER much more difficult. Since NER has been researched for a long time, some similar domain (the source domain) often already has well-labelled data. Therefore, in this paper, we focus on domain NER by studying how to utilize the labelled data from such a similar source domain for the new target domain. We design a kernel function based instance transfer strategy that selects similar labelled sentences from the source domain. Moreover, we propose an enhanced recurrent neural network (ERNN) by adding an additional layer that combines the source domain labelled data into the traditional RNN structure. Comprehensive experiments are conducted on two datasets. The comparison among HMM, CRF and RNN shows that RNN performs better than the others. When there is no labelled data in the target domain, compared to directly using the source domain labelled data without selecting transferred instances, our enhanced RNN approach improves the F1 measure from 0.8052 to 0.9328.



1 Introduction

In recent years, Web data and knowledge management have attracted interest from both industry and academia. There are various promising applications, such as intelligent recommendation, machine question answering, knowledge graphs and so on. Named entity recognition (NER) is a fundamental and very important step in automatic information extraction, and a high-quality recognition method can directly improve the follow-up processing results of Web data management products. NER research has shown great success in various domains and has become a hot topic Sun et al. (2016); Eiselt and Figueroa (2013), covering social media Vavliakis et al. (2013); Yao and Sun (2016), language texts Karaa and Slimani (2017), biomedicine Song et al. (2016); Amith et al. (2017), and so on. As we know, texts of different domains may vary in features, writing styles and structures. Domain NER therefore faces the challenge that annotating data for new domains is labor intensive.

For domain NER, a new domain (the target domain) has little or no labelled data. However, if some similar domain (the source domain) with enough labelled data already exists, it is natural to borrow labelled data from it. Domain adaptation aims to transfer source domain knowledge to the target domain Liu et al. (2016), which is an effective way to avoid labelling large amounts of data on new corpora or domains. In other words, if a NER model is trained well for some fixed source domain, it is interesting to study how to deploy it across one or more different target domains.

In this paper, we focus on domain NER for the politics texts used in Chinese high schools, to support an automatic question and answer (Q&A) system intended to take the national college entrance examination (NCEE) in the future. However, there is no public corpus of Chinese high school politics texts for the NER task, and no labeled data. We notice that the People's Daily corpus is freely available for download and is similar to politics texts. Therefore, we propose an instance transfer based approach for domain NER with an enhanced recurrent neural network (RNN). First, we design an instance transfer strategy to extract similar sentences from the People's Daily corpus (the source domain); here, instances mean labelled sentences, and the politics texts are our target domain. Then, an RNN model can be trained on the transferred instances. Moreover, we improve the traditional RNN model by enhancing its activation function and structure. Finally, an instance transfer enhanced RNN (ERNN) model is proposed to do NER for the politics text target domain, trained on the instances transferred from the similar source domain (the People's Daily corpus). Compared with the traditional RNN model, experimental results show that our ERNN with the instance transfer strategy improves the F1 measure from 80.52% to 93.28%.

In addition, we consider another situation where a small amount of labelled data is available, to further investigate the performance of our proposed approach. Since target domain data has to be obtained, we collect politics texts from high school books and relevant websites and manually label a very small subset, so that the labeled data is much scarcer than the unlabeled data. Experimental results show that labeling data for the target domain is quite useful, reaching an F1 measure of 92.13%. However, our instance transfer based enhanced RNN can further improve the F1 value to 93.81%. Finally, we adopt a co-training approach using our proposed ERNN model and CRF to take advantage of the large amount of unannotated target domain data, which achieves an F1 value of 94.02%.

The rest of the paper is organized as follows. In Section 2, we introduce related work on NER (especially domain NER), recurrent neural networks and transfer learning. Section 3 elaborates our approach, including the instance transfer strategy and our proposed ERNN. Experiments and results are given in Section 4, and we conclude and discuss future work in Section 5.

2 Related Work

As a vital step in information extraction and a shared task of CoNLL-2003 Sang and Meulder (2003), NER has been widely studied, and much of this work concerns domain NER. Zhang et al. Zhang et al. (2016) employ conditional random fields (CRF) and structured support vector machines (SSVMs) for chemical entity mention recognition in patents and chemical passage detection. Chen et al. Chen et al. (2015) propose novel active learning algorithms that significantly reduce annotation cost for the clinical NER task. Considering the case where labeled data is not available, Brooke et al. Brooke et al. (2016) present a NER system for tagging fiction: to cope with the lack of annotated data, they bootstrap a model from term clusters and leverage multiple instances of the same name in a text. However, this requires a pass through the corpus to build a feature vector covering all the contexts in a document. Moreover, a large number of domain NER studies are in the biomedical field Murugesan et al. (2017); Crichton et al. (2017), and the typical methods are CRF Seker and Eryigit (2017); Jochim and Deleris (2017) and other supervised learning approaches Jain (2015).

Neural networks such as RNN and LSTM have shown promising results in domain NER Crichton et al. (2017). Tomori et al. Tomori et al. (2016) use deep neural networks that refer to the real world (i.e., game states) to improve Japanese chess NER. Zeng et al. Zeng et al. (2017) combine bidirectional long short-term memory (LSTM) and CRF to automatically explore word- and character-level features. Meanwhile, transfer learning and semi-supervised learning methods such as co-training are effective when training data is much scarcer than unannotated data.

When considering how to deal with the lack of labelled data, Pan and Yang Pan and Yang (2010) have categorized and reviewed the research progress on transfer learning for classification, regression and clustering problems. Similar to the work of Qu et al. Qu et al. (2016) in domain adaptation, in this paper we study an instance transfer strategy, which is popular today but rarely used in NER Arnold et al. (2008); Chen et al. (2014), to make use of out-of-domain (source domain) data.

Co-training is also an effective way to use unlabelled data in supervised learning tasks. Li et al. Li et al. (2013) use bilingual co-training and a maximum entropy model to carry out English and Chinese NER. Munkhdalai et al. Munkhdalai et al. (2012) modify the original co-training to cover knowledge from unlabeled data to recognize bio named entities in text. Besides, clustering would be another research direction for this problem Wang et al. (2015, 2018); Wang and Wu (2018). Our interest is domain NER for Chinese high school politics texts. In this paper we also put our proposed ERNN into a co-training approach to see how much improvement can be obtained from large amounts of unlabeled data.

The contributions and differences of our work from others are listed as follows:

  1. We design an instance transfer strategy by selecting similar sentences from a source domain.

  2. We add an additional layer to the traditional RNN structure by coupling knowledge from the transferred similar instances.

  3. We conduct experiments on two datasets. Whether or not there is labelled data in the target domain, our proposed approach improves the quality of NER.

3 Our Approach

3.1 Problem Description

Manually labeling data is time consuming for most machine learning methods. However, some other domains come with well-labeled data. As Pan and Yang have noted Pan and Yang (2010), in such cases, knowledge transfer, if done successfully, can greatly improve learning performance without the expensive effort of labeling data. Our interest is Chinese high school politics text NER to support an automatic Question & Answer application. There is no labelled data at hand for this new domain, but we find that some public NER data, like the People's Daily corpus, is similar to it. The similar domain with well-labeled data is called the source domain, while the new domain without enough labelled data is called the target domain. In this paper, we propose an instance transfer enhanced RNN model for domain NER that leverages labelled data from a source domain. In Section 3.2, we introduce our instance transfer strategy, which selects similar labeled data (sentences) from a source domain. In Section 3.3, we discuss how to improve the traditional RNN model by adding the selected data into the network structure of the RNN.

3.2 Our Instance Transfer Strategy

We design an instance transfer strategy where the target domain is Chinese high school politics texts and People's Daily is the source domain. Since texts from different domains may vary in features, writing styles and structures, it is not suitable to use all labelled data from the source domain. We first consider two functions to compute sentence similarity between the source domain and the target domain: the Gaussian radial basis function (RBF) kernel and the polynomial kernel, described as follows:

  • Gaussian RBF kernel:

    K(x, \mu) = \exp\left( -\frac{\|x - \mu\|^2}{2\sigma^2} \right)  (1)

  • polynomial kernel function:

    K(x, \mu) = (x^\top \mu + 1)^d  (2)

Here x stands for a sentence vector in the source domain, while \mu represents the mean of the target domain corpus matrix. We set \sigma = 1 and d = 2. We use these two functions to evaluate the similarity between source domain sentences and the target domain, and sort the source domain sentences by their similarity scores.

Then two transfer strategies are given to select top similar sentences:

  1. Directly regard the top n sentences as target domain training data.

  2. Repeat the nth most similar sentence k times in the source corpus, where the value of k depends on the rank n, so that the original source corpus is enlarged with similar sentences weighted more heavily.
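The selection step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the sentence vectorization itself (e.g., averaged word embeddings) is assumed to be done elsewhere, and the function names are ours; \sigma = 1 and d = 2 follow the paper's settings.

```python
import math

def rbf_similarity(x, mu, sigma=1.0):
    # Gaussian RBF kernel (Eq. 1) between a source-domain sentence
    # vector x and the target-domain mean vector mu; sigma = 1.
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def poly_similarity(x, mu, d=2):
    # Polynomial kernel (Eq. 2) of degree d = 2.
    dot = sum(xi * mi for xi, mi in zip(x, mu))
    return (dot + 1.0) ** d

def top_n_transfer(source_sentences, source_vectors, mu, n,
                   kernel=rbf_similarity):
    # Strategy 1: rank source-domain sentences by similarity to the
    # target-domain mean and keep the top n as extra training data.
    scored = sorted(zip(source_sentences, source_vectors),
                    key=lambda sv: kernel(sv[1], mu), reverse=True)
    return [sent for sent, _ in scored[:n]]
```

Strategy 2 then reuses the same ranking but repeats high-ranked sentences instead of simply truncating the list.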

3.3 Our ERNN Model

Through our instance transfer strategy, the target domain borrows labelled data from the source domain, and the top similar sentences are assigned more weight than others. The next step of our approach is therefore to enhance the RNN model to utilize those labelled data. RNN is designed to solve problems involving time and sequence Mikolov et al. (2010): not only is the output at each moment influenced by that of the previous moment, but the nodes in the hidden layer are also connected to each other. In this paper, we propose an enhanced RNN that modifies both the activation function and the structure for the domain NER task.

3.3.1 Activation Function

As described in Chung et al. (2016), a deep neural network is an ensemble of vectors and matrices, which store the bias and weight values, together with nonlinear activation functions. A suitable activation function means much for a deep neural network: changing it can speed up model training Xing et al. (2016), enhance stability Liew et al. (2016), etc. However, for a given domain, the best nonlinear function remains unknown, even though many rectifier-type nonlinear functions have been proposed as activation functions Chung et al. (2016). The performance of the same activation function may differ widely across tasks. For the NER task, compared to other frequently used activation functions such as ReLU, PReLU and tanh, the sigmoid function performed best in our preliminary experiments. Its definition is given in Eq. 3.

    \sigma(x) = \frac{1}{1 + e^{-x}}  (3)

However, model training involves differentiation, and the derivative of the sigmoid function tends to zero as the argument x approaches infinity on both the left and the right (which is called the saturation phenomenon). This can make the model harder to train. Even so, sigmoid is the most similar to the reflex mechanism at the biological neuron level, and its output always lies between 0 and 1, so it can represent the prediction probability of a label. In our experiments, its linear approximation, expressed by Eq. 4, shows good performance; a is set to 0.2 and b to 0.5. A more intuitive depiction is shown in Figure 1.

    f(x) = \min(1, \max(0, ax + b))  (4)

Figure 1: Sigmoid and its linear approximation function

Focusing on our NER task, a new activation function is proposed by combining the two functions above, as shown in Eq. 5. Here the parameters \alpha and \beta are coefficients determined by experimental performance.

    g(x) = \alpha \sigma(x) + \beta f(x)  (5)

The new activation function has advantages over both of its components, sigmoid and the linear approximation: it ameliorates the former's saturation phenomenon on the one hand and smooths the latter on the other. In our experiments, the results using the proposed activation function are better than those using sigmoid or its linear approximation alone.
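A minimal sketch of the three activations, assuming the linear approximation is clipped to [0, 1] (as its role of approximating sigmoid in Figure 1 suggests); the default alpha/beta values below are placeholders, not the authors' tuned coefficients:

```python
import math

def sigmoid(x):
    # Standard sigmoid; saturates (derivative -> 0) for large |x|.
    return 1.0 / (1.0 + math.exp(-x))

def linear_approx(x, a=0.2, b=0.5):
    # Piecewise-linear approximation of sigmoid, clipped to [0, 1]
    # so its output can still be read as a probability.
    return max(0.0, min(1.0, a * x + b))

def combined_activation(x, alpha=0.5, beta=0.5):
    # Weighted combination of the two activations: less saturation
    # than pure sigmoid, smoother than the pure linear approximation.
    return alpha * sigmoid(x) + beta * linear_approx(x)
```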

3.3.2 Model Structure

In order to take full advantage of the source domain data, we also improve the RNN's structure and propose an instance transfer enhanced RNN (ERNN), which feeds the source domain sentences into the RNN. Our ERNN is modified from the Elman-type RNN described in Mesnil et al. (2013).

The modified structure is depicted in Figure 2 and Figure 3. In Figure 2, the left side shows the original structure and the right side shows our modified recurrent neural network, unfolded in time over its forward computation. Nodes h represent the hidden layer, while c indicates the confluent layer; the source domain data s is combined with the output of the hidden layer h. Figure 3 gives an overview of the structure, showing the source domain data input in addition to Figure 2. As shown in Figure 2, W, U, V, Q and W_o are the weights and parameters. The output of the confluent layer is computed by Eq. 6, and in the output layer we use softmax to get the prediction results y, as shown in Eq. 7.

    c_t = f(V h_t + Q s + b_c)  (6)

    y_t = \mathrm{softmax}(W_o c_t + b_o)  (7)

Here f is the activation function, s stands for the source domain data as marked in Figure 3, and b_c and b_o are bias vectors. In this way we achieve the reuse of the valuable source domain corpus.

Figure 2: The RNN structure before and after modification
Figure 3: The overview of our ERNN
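One forward step of such a network can be sketched in plain Python. The exact wiring of the confluent layer is our reading of the paper's description (all names here, including `ernn_step` and the weight matrices, are illustrative, and sigmoid stands in for the enhanced activation):

```python
import math

def act(x):
    # Stand-in activation; the paper uses the enhanced function of
    # Section 3.3.1, sigmoid is used here for brevity.
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def vadd(*vs):
    return [sum(t) for t in zip(*vs)]

def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def ernn_step(x_t, h_prev, s, W, U, V, Q, Wo, bh, bc, bo):
    # One forward step of the sketched ERNN:
    #   hidden layer:    h_t = f(W x_t + U h_{t-1} + bh)
    #   confluent layer: c_t = f(V h_t + Q s + bc)  <- injects source data s
    #   output layer:    y_t = softmax(Wo c_t + bo)
    h_t = [act(v) for v in vadd(matvec(W, x_t), matvec(U, h_prev), bh)]
    c_t = [act(v) for v in vadd(matvec(V, h_t), matvec(Q, s), bc)]
    y_t = softmax(vadd(matvec(Wo, c_t), bo))
    return h_t, y_t
```

The only structural difference from a plain Elman step is the extra confluent layer, which mixes the source-domain representation into the path between the hidden and output layers.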

4 Experiments

We conduct two sets of experiments on two entirely different datasets, for different purposes. One compares three popular NER models, i.e., HMM, CRF and RNN, and aims to explore recognition quality with different training data sizes. The other evaluates domain NER using our ERNN with instance transfer.

4.1 Data Description

Two datasets are used in our experiments. For the comparison among the three popular models, we mainly focus on how classification performance changes with the amount of training data. We conduct this experiment on a standard corpus named ATIS Mesnil et al. (2013). It contains 127 classes and uses the in/out/begin (IOB) representation. Example sentences from ATIS are shown in Table 1.

The other dataset is collected from Chinese high school politics textbooks and websites, and serves as the target domain in this paper. The source domain corpus is People's Daily 1998, which is openly available online. Since the target domain lacks label information, we define 11 classes based on People's Daily 1998, including person names, regular time words, single time words (such as those meaning ”night”), proprietary organization and company names (for example, the Chinese terms for ”Xinhua News Agency” and ”the United Front Department”), special region names, nouns, etc.

Sentence: flight from memphis to tacoma
Label:    O O B-fromloc.city_name O B-toloc.city_name

Sentence: limousine service at logan airport
Label:    B-transport_type O O B-toloc.airport_name I-toloc.airport_name

Table 1: Sentences in ATIS

4.2 Data Pre-processing

Specifically, for the NER comparison experiment, ATIS contains 3983 training sentences and 893 testing sentences. To find out how the models work when labeled data is scarce, we reorganized the training set. Based on the original ATIS train/test proportion (3983/893), we randomly select 20%, 40%, 60%, 80% and 100% of the total training data (3983 sentences) as training sets to train two traditional statistical models (Hidden Markov Model and CRF) and one RNN model. For the domain NER experiment, we first pick the most common words from the target domain data as our dictionary: by counting how many times each word appears in the corpus, we select the top 13,450 words as the dictionary, and all other words in a sentence are marked as 'unknown'. After this, we clean the data using the rule that a sentence meeting either of the following two conditions is regarded as noisy and discarded:

  • The length is less than 3

  • Too many 'unknown' tags (more than 50%)
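The dictionary construction and cleaning rule can be sketched as follows. Function names are ours, and sentences are assumed to be already tokenized into word lists:

```python
from collections import Counter

def build_dictionary(sentences, vocab_size=13450):
    # Keep the most frequent words; everything else will map to 'unknown'.
    counts = Counter(w for sent in sentences for w in sent)
    return {w for w, _ in counts.most_common(vocab_size)}

def mark_unknown(sentence, dictionary):
    return [w if w in dictionary else 'unknown' for w in sentence]

def is_noisy(sentence):
    # Cleaning rule: drop sentences shorter than 3 tokens or with
    # more than 50% 'unknown' tags.
    if len(sentence) < 3:
        return True
    unk = sum(1 for w in sentence if w == 'unknown')
    return unk / len(sentence) > 0.5

def clean_corpus(sentences, dictionary):
    marked = [mark_unknown(s, dictionary) for s in sentences]
    return [s for s in marked if not is_noisy(s)]
```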

After preprocessing, the number of People's Daily sentences is reduced from 19484 to 4818, while the number of politics text sentences is reduced from 82700 to 15584. Since the latter has no annotated data, we manually mark 3043 sentences. Table 2 shows the statistics of the labeled and unlabeled data of both datasets.

Corpus (Domain)   Labeled Data               Train Set   Test Set   Unlabeled
Source            4818                       3855        963        0
Target            3043 (manually labeled)    2035        1008       13549

Table 2: The statistics of the source and target datasets

For People's Daily, we use 80%/20% of the total 4818 sentences as our train/test sets. For the target domain data, which severely lacks tagged data, we use the 2035 manually labeled sentences for training and tag another 1008 sentences from the large unannotated set as the test set.

4.3 Evaluation and Setup

In this paper, all experimental results are evaluated by the popular measures precision (P), recall (R) and F1-score. In addition, for the supervised learning experiments, we used K-fold cross-validation, with K depending on the size of the training set: for example, K is 5 when using 20% of the original training set, and 3 when using 40% or 60%.
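The scoring itself can be sketched as entity-level matching, which is the usual convention for NER F1 (the paper does not state whether scoring is token- or entity-level, so this is an assumption):

```python
def prf1(gold_spans, pred_spans):
    # Entity-level precision/recall/F1: a prediction counts as correct
    # only if its (start, end, type) span exactly matches a gold span.
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```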

We also ran preliminary experiments to set the ERNN parameters. For example, the word embeddings we use are trained on target domain texts via word2vec, a Google open source project, which brings about a 14.1% performance improvement. Besides, we set the ERNN context window size to 1, which performs better than sizes of 5, 7 and 9 (probably because there are too many non-entity labels in a sentence). For instance transfer, since the RBF similarity function achieves better results, we use it to enlarge the source domain corpus.

4.4 NER Experimental Results of Three Popular NER Models

In this part, we compare the NER performance of a currently popular deep learning model, RNN, with two traditional models, HMM and CRF, mainly exploring NER performance under different training data sizes. We conduct several experiments using 20%, 40%, 60%, 80% and 100% of ATIS's total training data to train the three models. The results are shown in Figure 4, Figure 5 and Figure 6.

Figure 4: Results of the HMM model

From Figure 4, Figure 5 and Figure 6 we can see how the three models behave when trained on different data sizes. Overall, the P, R and F1 of all three models rise gradually as more training data is used. However, performance does not grow linearly with data size: as more and more data is used, the degree of improvement becomes smaller. In fact, when the training data size increases from 20% to 40%, the three models generally achieve their largest gain (about 4% in terms of F1 measure); after that, increasing the training data size brings little benefit.

Figure 5: Results of the CRF model

This indicates that when meeting a new domain for NER, labelled data will indeed improve performance, but manually labelling data is labor intensive, and putting more labor into labelling does not bring a proportionate improvement given the time and effort required. Therefore, in this paper we propose an instance transfer strategy that transfers labelled data from another well-labelled domain, which is an effective way to alleviate the lack of labelled data in a new domain.

Figure 6: Results of RNN model

4.5 Domain NER Experimental Results

4.5.1 Instance Transfer

We have designed the two transfer strategies introduced in Section 3.2. One directly uses the RBF top n (here n=800) similar sentences from the source domain as additional target domain training data. The other enlarges the source corpus by Eq. 8, i.e., the nth most similar sentence is repeated k times. Our preliminary results show that the latter is better; therefore, due to page limits, we only discuss the latter in this paper. Finally, 41500 sentences are obtained as labelled data. We keep the 80%/20% train/test proportion when using the enlarged training data to train our ERNN model. The parameters are given in Eq. 8.

(8)
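The enlargement step can be sketched as follows. The paper's exact repetition schedule (Eq. 8) is not reproduced here, so the linear decay in `enlarge_corpus` is purely a stand-in to show the mechanism of rank-weighted repetition:

```python
def enlarge_corpus(ranked_sentences, n=800, k_max=10):
    # Strategy 2 sketch: repeat the i-th most similar sentence k times,
    # with k decreasing as the rank worsens, so the most similar
    # sentences carry the most weight in the enlarged corpus.
    # NOTE: this linear schedule is hypothetical, not the paper's Eq. 8.
    enlarged = []
    for i, sent in enumerate(ranked_sentences):
        if i < n:
            k = max(1, round(k_max * (n - i) / n))
        else:
            k = 1
        enlarged.extend([sent] * k)
    return enlarged
```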

Since RNN showed better performance than HMM and CRF in the above experiments, the RNN-related baselines and our approaches are listed as follows (here 'IT' means 'Instance Transfer'). As discussed in Section 2, most current research on deep neural networks for NER tries different variants of deep learning models under the assumption that sufficient training data is available. In contrast, we consider two situations: one where there is no labelled data in the target domain (RNN_D_IT and ERNN_IT), and one where there is a little labelled data in the target domain (RNN_L, RNN_L_D_IT and ERNN_L_IT). Our problem setting is thus quite different from most existing work.

  1. RNN_D_IT: train a traditional RNN model (left part of Figure 2) directly on the source domain data. This training process is intuitive, and any NER model can be applied in this way.

  2. ERNN_IT: train our ERNN model (Figure 3) on instances selected from the source domain by our instance transfer strategy (Eq. 8).

  3. RNN_L: train a traditional RNN model (left part of Figure 2) on a small amount of labelled data from the target domain. This is the traditional training process for supervised learning; more results are discussed in Section 4.4.

  4. RNN_L_D_IT: train a traditional RNN model (left part of Figure 2) on a small amount of labelled target domain data plus all instances in the source domain. The problem setting is the same as that of Qu et al. Qu et al. (2016), but we use RNN instead of CRF.

  5. ERNN_L_IT: train our ERNN model (Figure 3) on a small amount of labelled target domain data plus the instances transferred from the source domain by our instance transfer strategy (Eq. 8).

For the situation where there is no labelled data in the target domain, the experimental results are reported in Table 3. The results of RNN_D_IT tell us that directly using all the labelled data from the source domain is not very satisfactory. With the help of our instance transfer strategy, similar sentences are selected and repeatedly used as training data, and the F1 measure of our ERNN_IT reaches 93.28%, a relative improvement of about 15.84%.

Experiments P(%) R(%) F1(%)
RNN_D_IT 76.11 85.48 80.52
ERNN_IT 92.67 93.90 93.28
Table 3: P, R and F1 scores without labelled data

In addition, we consider the situation where a small amount of labelled data is available for domain NER. Experiments are therefore conducted by adding our manually labelled data to RNN_D_IT and ERNN_IT; the results are shown in Table 4. The traditional RNN model achieves an F1 value of 92.13%, which further indicates that manually labelled target domain data can largely improve the quality of domain NER. Under this situation, our ERNN_L_IT still obtains an F1 value of 93.81%, benefiting from our instance transfer strategy and the additional layer added to the RNN. The F1 value of RNN_L_D_IT is 93.06%, which shows that the source domain can help target domain NER and that transfer strategies are worth studying. The results in Table 3 and Table 4 tell us that transfer learning is an effective way to leverage the labelled data of a source domain corpus.

Experiments P(%) R(%) F1(%)
RNN_L 91.84 92.42 92.13
RNN_L_D_IT 94.57 91.60 93.06
ERNN_L_IT 93.16 94.47 93.81
Table 4: P, R and F1 scores with a few labelled data

4.5.2 Co-training

The instance transfer experiments aim to utilize the source domain data to raise recognition performance. Along with instance transfer, we also consider taking advantage of unannotated data, since we have much of it at hand. Specifically, we conduct experiments by co-training our ERNN with a statistical probability model, CRF. Co-training is a semi-supervised strategy first proposed by Blum and Mitchell Blum and Mitchell (1998) in 1998, with conditional independence of the data views declared as a required criterion. However, Abney Abney (2002) shows that this independence assumption can be relaxed, meaning co-training remains effective under a weaker independence assumption. In this paper, we explore the effect of co-training with our ERNN and adopt it to leverage the large amount of unannotated in-domain data.

The initial training data consists of 2035 sentences, while the unannotated data consists of 12541 sentences. In each iteration we select the top 800 highest-confidence sentences, and after 10 iterations all 12541 sentences are labeled. Table 5 shows the results of each model in each iteration.

Iter.   CRF P    CRF R    CRF F1   ERNN P   ERNN R   ERNN F1
1       94.77    84.57    89.38    93.16    94.47    93.81
2       95.47    84.95    89.90    92.50    94.96    93.71
3       95.57    86.25    90.67    93.84    93.39    93.61
4       96.18    86.49    91.08    93.92    93.88    93.90
5       96.02    86.86    91.21    93.04    94.81    93.92
6       96.70    86.57    91.36    95.04    93.01    94.02
7       96.61    87.12    91.62    94.23    93.75    93.99
8       96.89    87.31    91.85    94.51    93.17    93.83
9       96.65    87.77    92.00    94.31    93.70    94.00
10      96.92    88.09    92.30    95.16    91.68    93.39

Table 5: P, R and F1 scores (%) of CRF and ERNN in each iteration of co-training

In the co-training experiments, the large unannotated data also improves NER quality, but not as much as in the instance transfer experiments. Due to CRF's lower starting quality, the ERNN F1 score declines after peaking at 94.02%.
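The iterative procedure described above (retrain both models, move the most confident sentences into the labeled pool, repeat) can be sketched with generic model objects. The `fit`/`confidence`/`predict` interface is hypothetical, standing in for the paper's ERNN and CRF:

```python
def co_train(model_a, model_b, labeled, unlabeled, top_k=800, rounds=10):
    # Semi-supervised co-training sketch: each round, retrain both
    # models on the current labeled pool, then move the top_k sentences
    # the two models jointly label most confidently into that pool.
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model_a.fit(labeled)
        model_b.fit(labeled)
        pool.sort(key=lambda s: model_a.confidence(s) + model_b.confidence(s),
                  reverse=True)
        newly, pool = pool[:top_k], pool[top_k:]
        labeled.extend((s, model_a.predict(s)) for s in newly)
    return labeled
```

A real implementation would exchange each model's predictions with the other rather than always taking `model_a`'s labels; this sketch only shows the pool-growing loop.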

5 Conclusions and Future Work

Domain NER is important and useful in various applications. In this paper, we study instance transfer and RNN to improve the quality of domain NER. We leverage source domain data by proposing an instance transfer enhanced RNN, called ERNN. In addition, we adopt a co-training strategy to leverage the large amount of unannotated in-domain data and further improve recognition performance. In the future, we plan to further explore the automatic Q&A system and relation extraction between entities.

References

  • Abney (2002) Steven P. Abney. 2002. Bootstrapping. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, July 6-12, 2002, Philadelphia, PA, USA., pages 360–367.
  • Amith et al. (2017) Muhammad Amith, Yaoyun Zhang, Hua Xu, and Cui Tao. 2017. Knowledge-based approach for named entity recognition in biomedical literature: A use case in biomedical software identification. In Advances in Artificial Intelligence: From Theory to Practice - 30th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2017, Arras, France, June 27-30, 2017, Proceedings, Part II, pages 386–395.
  • Arnold et al. (2008) Andrew Arnold, Ramesh Nallapati, and William W. Cohen. 2008. Exploiting feature hierarchy for transfer learning in named entity recognition. In ACL 2008, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, June 15-20, 2008, Columbus, Ohio, USA, pages 245–253.
  • Blum and Mitchell (1998) Avrim Blum and Tom M. Mitchell. 1998. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT 1998, Madison, Wisconsin, USA, July 24-26, 1998, pages 92–100.
  • Brooke et al. (2016) Julian Brooke, Adam Hammond, and Timothy Baldwin. 2016. Bootstrapped text-level named entity recognition for literature. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 2: Short Papers.
  • Chen et al. (2015) Yukun Chen, Thomas A. Lasko, Qiaozhu Mei, Joshua C. Denny, and Hua Xu. 2015. A study of active learning methods for named entity recognition in clinical text. Journal of Biomedical Informatics, 58:11–18.
  • Chen et al. (2014) Yukun Chen, Yaoyun Zhang, Qiaozhu Mei, Joshua C. Denny, and Hua Xu. 2014. A preliminary study of coupling transfer learning with active learning for clinical named entity recognition between two institutions. In AMIA 2014, American Medical Informatics Association Annual Symposium, Washington, DC, USA, November 15-19, 2014.
  • Chung et al. (2016) Hoon Chung, Sung Joo Lee, and Jeon Gue Park. 2016. Deep neural network using trainable activation functions. In 2016 International Joint Conference on Neural Networks, IJCNN 2016, Vancouver, BC, Canada, July 24-29, 2016, pages 348–352.
  • Crichton et al. (2017) Gamal K. O. Crichton, Sampo Pyysalo, Billy Chiu, and Anna Korhonen. 2017. A neural network multi-task learning approach to biomedical named entity recognition. BMC Bioinformatics, 18(1):368:1–368:14.
  • Eiselt and Figueroa (2013) Andreas Eiselt and Alejandro Figueroa. 2013. A two-step named entity recognizer for open-domain search queries. In Sixth International Joint Conference on Natural Language Processing, IJCNLP 2013, Nagoya, Japan, October 14-18, 2013, pages 829–833.
  • Jain (2015) Devanshu Jain. 2015. Supervised named entity recognition for clinical data. In Working Notes of CLEF 2015 - Conference and Labs of the Evaluation forum, Toulouse, France, September 8-11, 2015.
  • Jochim and Deleris (2017) Charles Jochim and Léa Amandine Deleris. 2017. Named entity recognition in the medical domain with constrained CRF models. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, April 3-7, 2017, Volume 1: Long Papers, pages 839–849.
  • Karaa and Slimani (2017) Wahiba Karaa and Thabet Slimani. 2017. A new approach for Arabic named entity recognition. Int. Arab J. Inf. Technol., 14(3):332–338.
  • Li et al. (2013) Yegang Li, Heyan Huang, Xingjian Zhao, and Shumin Shi. 2013. Named entity recognition based on bilingual co-training. In Chinese Lexical Semantics - 14th Workshop, CLSW 2013, Zhengzhou, China, May 10-12, 2013. Revised Selected Papers, pages 480–489.
  • Liew et al. (2016) Shan Sung Liew, Mohamed Khalil Hani, and Rabia Bakhteri. 2016. Bounded activation functions for enhanced training stability of deep neural networks on visual pattern recognition problems. Neurocomputing, 216:718–734.
  • Liu et al. (2016) Gaowen Liu, Yan Yan, Ramanathan Subramanian, Jingkuan Song, Guoyu Lu, and Nicu Sebe. 2016. Active domain adaptation with noisy labels for multimedia analysis. World Wide Web, 19(2):199–215.
  • Mesnil et al. (2013) Grégoire Mesnil, Xiaodong He, Li Deng, and Yoshua Bengio. 2013. Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, pages 3771–3775.
  • Mikolov et al. (2010) Tomas Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pages 1045–1048.
  • Munkhdalai et al. (2012) Tsendsuren Munkhdalai, Meijing Li, Taewook Kim, Oyun-Erdene Namsrai, Seon-Phil Jeong, Jungpil Shin, and Keun Ho Ryu. 2012. Bio named entity recognition based on co-training algorithm. In 26th International Conference on Advanced Information Networking and Applications Workshops, WAINA 2012, Fukuoka, Japan, March 26-29, 2012, pages 857–862.
  • Murugesan et al. (2017) Gurusamy Murugesan, Sabenabanu Abdulkadhar, Balu Bhasuran, and Jeyakumar Natarajan. 2017. BCC-NER: bidirectional, contextual clues named entity tagger for gene/protein mention recognition. EURASIP J. Bioinformatics and Systems Biology, 2017:7.
  • Pan and Yang (2010) Sinno Jialin Pan and Qiang Yang. 2010. A survey on transfer learning. IEEE Trans. Knowl. Data Eng., 22(10):1345–1359.
  • Qu et al. (2016) Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, and Timothy Baldwin. 2016. Named entity recognition for novel types by transfer learning. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 899–905.
  • Sang and Meulder (2003) Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning, CoNLL 2003, Held in cooperation with HLT-NAACL 2003, Edmonton, Canada, May 31 - June 1, 2003, pages 142–147.
  • Seker and Eryigit (2017) Gökhan Akin Seker and Gülsen Eryigit. 2017. Extending a CRF-based named entity recognition model for Turkish well formed text and user generated content. Semantic Web, 8(5):625–642.
  • Song et al. (2016) Dingxin Song, Lishuang Li, Liuke Jin, and Degen Huang. 2016. Biomedical named entity recognition based on recurrent neural networks with different extended methods. IJDMB, 16(1):17–31.
  • Sun et al. (2016) Huiyu Sun, Ralph Grishman, and Yingchao Wang. 2016. Domain adaptation with active learning for named entity recognition. In Cloud Computing and Security - Second International Conference, ICCCS, 2016, Nanjing, China, July 29-31, 2016, Revised Selected Papers, Part II, pages 611–622.
  • Tomori et al. (2016) Suzushi Tomori, Takashi Ninomiya, and Shinsuke Mori. 2016. Domain specific named entity recognition referring to the real world by deep neural networks. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 2: Short Papers.
  • Vavliakis et al. (2013) Konstantinos N. Vavliakis, Andreas L. Symeonidis, and Pericles A. Mitkas. 2013. Event identification in web social media through named entity recognition and topic modeling. Data Knowl. Eng., 88:1–24.
  • Wang et al. (2015) Y. Wang, X. Lin, L. Wu, W. Zhang, Q. Zhang, and X. Huang. 2015. Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Transactions on Image Processing, 24(11):3939–3949.
  • Wang and Wu (2018) Y. Wang and L. Wu. 2018. Beyond low-rank representations: Orthogonal clustering basis reconstruction with optimized graph structure for multi-view spectral clustering. Neural Networks, 103:57–70.
  • Wang et al. (2018) Y. Wang, L. Wu, X. Lin, and J. Gao. 2018. Multiview spectral clustering via structured low-rank matrix factorization. IEEE Transactions on Neural Networks and Learning Systems, 29(10):4833–4843.
  • Xing et al. (2016) Anhao Xing, Qingwei Zhao, and Yonghong Yan. 2016. Speeding up deep neural networks in speech recognition with piecewise quantized sigmoidal activation function. IEICE Transactions, 99-D(10):2558–2561.
  • Yao and Sun (2016) Yangjie Yao and Aixin Sun. 2016. Mobile phone name extraction from internet forums: a semi-supervised approach. World Wide Web, 19(5):783–805.
  • Zeng et al. (2017) Donghuo Zeng, Chengjie Sun, Lei Lin, and Bingquan Liu. 2017. LSTM-CRF for drug-named entity recognition. Entropy, 19(6):283.
  • Zhang et al. (2016) Yaoyun Zhang, Jun Xu, Hui Chen, Jingqi Wang, Yonghui Wu, Manu Prakasam, and Hua Xu. 2016. Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning. Database, 2016.