A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Evaluation Campaign

by Pham Quang Nhat Minh, et al.
Alt, Inc.

In this report, we describe the named-entity recognition system we submitted to the VLSP 2018 evaluation campaign. We formalize the task as a sequence labeling problem using the BIO encoding scheme. We apply a feature-based model that combines word features, word-shape features, Brown-cluster-based features, and word-embedding-based features. We compare several methods for handling nested entities in the dataset and show that training a single sequence labeling model on tags combined from all entity levels (the joint-tag model) improves the accuracy of nested named-entity recognition.




1 Introduction

| Word | Level-1 Tag | Level-2 Tag | Joint Tag |
|---|---|---|---|
| ông | O | O | O+O |
| Ngô_Văn_Quý | B-PER | O | B-PER+O |
| - | O | O | O+O |
| Phó | O | O | O+O |
| Chủ_tịch | O | O | O+O |

Table 1: Generating joint tags by combining the entity tags at all levels of a token

Named-entity recognition (NER) is an important task in information extraction. The task is to identify spans of text that are entities and classify them into predefined categories. Several conferences and shared tasks have evaluated NER systems for English and other languages, such as MUC-6 Sundheim (1995), CoNLL 2002 Sang (2002), and CoNLL 2003 Sang and Meulder (2003).

For Vietnamese, the VLSP 2016 NER evaluation Huyen and Luong (2016) was the first campaign that aimed to systematically compare NER systems. As in the CoNLL 2003 shared task, VLSP 2016 considered four named-entity types: person (PER), organization (ORG), location (LOC), and miscellaneous entities (MISC). In VLSP 2016, the organizers provided training and test data with gold word segmentation, PoS tags, and chunking tags. While that setting reduces the data-processing effort for participating teams and lets them focus solely on NER algorithms, it is not a realistic one. In the VLSP 2018 NER evaluation, only raw texts with XML tags were provided. Therefore, participants need to choose appropriate Vietnamese NLP tools for preprocessing steps such as word segmentation, PoS tagging, and chunking.

In this report, we describe our NER system for the VLSP 2018 NER evaluation campaign. We applied a feature-based model that combines word features, word-shape features, Brown-cluster-based features, and word-embedding-based features, and adopted Conditional Random Fields (CRF) Lafferty et al. (2001) for training and testing.

In the VLSP 2018 NER task, as in VLSP 2016, the NER dataset contains nested entities: an entity may contain other entities inside it. We categorize the entities in the VLSP 2018 NER dataset into three levels.

  • Level-1 entities are entities that do not contain other entities inside them. For example: <ENAMEX TYPE="LOC">Hà Nội</ENAMEX>.

  • Level-2 entities are entities that contain only level-1 entities inside them. For example: <ENAMEX TYPE="ORG">UBND thành phố <ENAMEX TYPE="LOC">Hà Nội</ENAMEX></ENAMEX>.

  • Level-3 entities are entities that contain at least one level-2 entity and may contain some level-1 entities. For example: <ENAMEX TYPE="ORG">Khoa Toán, <ENAMEX TYPE="ORG">ĐHQG <ENAMEX TYPE="LOC">Hà Nội</ENAMEX></ENAMEX></ENAMEX>.
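The level definitions above can be illustrated with a small sketch that reads nested ENAMEX markup and assigns each entity a level equal to one plus the deepest level among the entities it contains. The function name `entity_levels` and the regular expression are hypothetical, and the sketch assumes well-formed markup written with angle brackets and straight quotes; it is not the preprocessing code used in the system.

```python
import re

# Assumes well-formed markup with straight quotes: <ENAMEX TYPE="ORG">...</ENAMEX>
TAG_RE = re.compile(r'<ENAMEX TYPE="([A-Z]+)">|</ENAMEX>')


def entity_levels(marked_up):
    """Return (type, surface text, level) per entity, in order of closing tags."""
    stack = []   # open entities: [type, start offset in plain text, deepest child level]
    plain = ""   # text seen so far with the markup removed
    found = []
    pos = 0
    for m in TAG_RE.finditer(marked_up):
        plain += marked_up[pos:m.start()]
        pos = m.end()
        if m.group(1) is not None:       # opening tag
            stack.append([m.group(1), len(plain), 0])
        else:                            # closing tag
            etype, start, child_level = stack.pop()
            level = child_level + 1      # an entity with no children is level 1
            found.append((etype, plain[start:], level))
            if stack:                    # record this child's depth on its parent
                stack[-1][2] = max(stack[-1][2], level)
    return found


example = ('<ENAMEX TYPE="ORG">Khoa Toán, <ENAMEX TYPE="ORG">ĐHQG '
           '<ENAMEX TYPE="LOC">Hà Nội</ENAMEX></ENAMEX></ENAMEX>')
for etype, surface, level in entity_levels(example):
    print(level, etype, surface)
```

Counting the returned levels per entity type over a corpus would give per-level statistics of the kind reported in the data section.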

In our data statistics, we see that the number of level-3 entities is very small compared with the number of level-1 and level-2 entities, so we ignore them when building the model and consider only level-1 and level-2 entities.

To deal with nested named entities, we investigated two methods. The first method trains a separate model for each level of entities. The second method trains a single model on training data in which the tags are generated by combining the entity tags of all levels. Table 1 shows an example of how we combine the entity tags at all levels of a token to create joint tags.
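As a minimal sketch of the joint-tag construction shown in Table 1, the level-1 and level-2 BIO tags of each token can be joined with a `+` separator. The function name `make_joint_tags` is hypothetical, and the second example uses an illustrative word segmentation that is an assumption rather than data from the corpus.

```python
def make_joint_tags(level1_tags, level2_tags):
    """Combine per-token level-1 and level-2 BIO tags into joint tags."""
    if len(level1_tags) != len(level2_tags):
        raise ValueError("both tag sequences must cover the same tokens")
    return [f"{t1}+{t2}" for t1, t2 in zip(level1_tags, level2_tags)]


# Tokens from Table 1: ông, Ngô_Văn_Quý, -, Phó, Chủ_tịch
level1 = ["O", "B-PER", "O", "O", "O"]
level2 = ["O", "O", "O", "O", "O"]
print(make_joint_tags(level1, level2))

# Illustrative nested case (assumed segmentation): UBND, thành_phố, Hà_Nội
print(make_joint_tags(["O", "O", "B-LOC"], ["B-ORG", "I-ORG", "I-ORG"]))
```

A single sequence labeler trained on such tags predicts both levels at once, at the cost of a larger label set.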

We show that combining the tags of entities at all levels to train a single sequence labeling model (the joint-tag model) improves the accuracy of nested named-entity recognition.

The rest of the paper is organized as follows. In section 2, we describe our NER system. In section 3, we present our evaluation results. Finally, section 4 gives conclusions about the work.

2 System description

We formalize the NER task as a sequence labeling problem using the B-I-O tagging scheme, and we apply a popular sequence labeling model, Conditional Random Fields, to the problem. In this section, we first present how we preprocess the data and then present the features used in our model.
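To make the B-I-O formulation concrete, the following sketch converts entity spans over tokens into BIO tags and decodes a tag sequence back into spans. The function names and the span convention (start index, exclusive end index, type) are assumptions made for illustration; the handling of stray I- tags is also a choice of this sketch.

```python
def spans_to_bio(n_tokens, spans):
    """Encode (start, end, type) spans, end exclusive, as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def bio_to_spans(tags):
    """Decode BIO tags into (start, end, type) spans, end exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel closes a final span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # An I- tag with no matching open entity is treated as "O" in this sketch.
    return spans


tags = spans_to_bio(5, [(1, 3, "PER"), (4, 5, "LOC")])
print(tags)
print(bio_to_spans(tags))
```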

2.1 Preprocessing

In our NER system, we perform sentence and word segmentation on the data. For sentence segmentation, we use a simple regular expression that detects sentence boundaries matching the pattern: a period followed by a space and an upper-case character. For our submissions, we also tried skipping sentence segmentation altogether.
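The boundary rule described above (a period, then a space, then an upper-case character) can be sketched as follows. The exact regular expression used in the system is not given, so this version is an assumption: it locates period-plus-space with `re` and checks the next character with `str.isupper()`, which also covers Vietnamese capitals such as Đ or Ô.

```python
import re


def split_sentences(text):
    """Split after a period that is followed by a space and an upper-case letter."""
    sentences, start = [], 0
    for m in re.finditer(r"\. ", text):
        nxt = m.end()
        if nxt < len(text) and text[nxt].isupper():
            sentences.append(text[start:m.start() + 1])   # keep the period
            start = nxt
    tail = text[start:]
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Ông Quý đến Hà Nội. Ông gặp bạn. rồi về."))
```

A rule this simple will split after abbreviations that end in a period and are followed by a capitalized word, which may be one reason to also try runs without sentence segmentation.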

For word segmentation, we adopted RDRsegmenter Nguyen et al. (2018), a state-of-the-art Vietnamese word segmentation tool. Both the training and development data are then converted into files in the CoNLL 2003 format with two columns: words and their BIO tags. Because of errors made by the word segmentation tool, an entity boundary may conflict with a word boundary. In such cases, we tag the affected words as “O” (outside entity).
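One way to realize the boundary-conflict rule is sketched below, assuming character offsets are available for both the segmented words and the annotated entities. The function name `tag_tokens` and the offset representation are hypothetical; an entity whose start or end does not coincide with a word boundary is skipped, so its words keep the tag "O".

```python
def tag_tokens(token_spans, entity_spans):
    """Assign BIO tags to words given character offsets.

    token_spans: list of (start, end) offsets, end exclusive.
    entity_spans: list of (start, end, type) offsets, end exclusive.
    """
    tags = ["O"] * len(token_spans)
    starts = {s: i for i, (s, _) in enumerate(token_spans)}
    ends = {e: i for i, (_, e) in enumerate(token_spans)}
    for e_start, e_end, etype in entity_spans:
        if e_start not in starts or e_end not in ends:
            continue   # boundary conflict: leave the affected words as "O"
        first, last = starts[e_start], ends[e_end]
        tags[first] = f"B-{etype}"
        for i in range(first + 1, last + 1):
            tags[i] = f"I-{etype}"
    return tags


# Text "ông Ngô Văn Quý nói" with a PER entity at characters 4..15.
good_segmentation = [(0, 3), (4, 15), (16, 19)]   # ông | Ngô_Văn_Quý | nói
bad_segmentation = [(0, 7), (8, 15), (16, 19)]    # ông_Ngô | Văn_Quý | nói
print(tag_tokens(good_segmentation, [(4, 15, "PER")]))
print(tag_tokens(bad_segmentation, [(4, 15, "PER")]))
```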

2.2 Features

The features in the proposed NER model fall into three groups: word features, word-shape features, and features based on word representations (word clusters and word embeddings). We extract unigram and bigram features from the context surrounding the current token, using a window of size 5. More specifically, for a given feature of the current word, the unigram and bigram features are as follows.

  • unigrams: [-2], [-1], [0], [1], [2]

  • bigrams: [-2][-1], [-1][0], [0][1], [1][2]
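The templates above can be sketched as a function that, for one position in a sentence, emits the five unigram and four bigram features over any per-token attribute (word identity, lower-cased form, word shape, and so on). The feature-string format, the padding symbol `<PAD>`, and the name `context_features` are assumptions for illustration.

```python
PAD = "<PAD>"   # hypothetical symbol for positions outside the sentence
UNIGRAM_OFFSETS = [-2, -1, 0, 1, 2]
BIGRAM_OFFSETS = [(-2, -1), (-1, 0), (0, 1), (1, 2)]


def context_features(values, i, name):
    """Unigram and bigram features of `values` around position i (window of 5)."""
    def at(offset):
        j = i + offset
        return values[j] if 0 <= j < len(values) else PAD

    feats = [f"{name}[{k}]={at(k)}" for k in UNIGRAM_OFFSETS]
    feats += [f"{name}[{a}][{b}]={at(a)}|{at(b)}" for a, b in BIGRAM_OFFSETS]
    return feats


words = ["ông", "Ngô_Văn_Quý", "-", "Phó", "Chủ_tịch"]
for feat in context_features(words, 1, "w"):
    print(feat)
```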

2.2.1 Word Features

We extract word-identity unigrams and bigrams within the window of size 5. We use both word surfaces and their lower-case forms. Besides words, we also extract prefixes and suffixes of the surface forms of words within the context of the current word. In our model, we use prefixes and suffixes of lengths from 1 to 4 characters.
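A minimal sketch of the per-word attributes described above follows: the surface form, its lower-case form, and prefixes and suffixes of 1 to 4 characters. Skipping affixes longer than the word is an assumption of this sketch, as are the feature names.

```python
def affix_features(word, max_len=4):
    """Surface, lower-case form, and prefixes/suffixes of length 1..max_len."""
    feats = {"word": word, "lower": word.lower()}
    for n in range(1, max_len + 1):
        if len(word) >= n:   # assumption: no affix longer than the word itself
            feats[f"prefix{n}"] = word[:n]
            feats[f"suffix{n}"] = word[-n:]
    return feats


print(affix_features("UBND"))
print(affix_features("Hà_Nội"))
```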

2.2.2 Word Shapes

In addition to word identities, we use word shapes to improve prediction ability, especially for unknown or rare words, and to reduce the data sparseness problem. We use the same word shapes as presented in Minh (2018).

2.2.3 Brown cluster-based features

The Brown clustering algorithm is a hierarchical clustering algorithm for assigning words to clusters Brown et al. (1992). Each cluster contains words which are semantically similar. Output clusters are represented as bit-strings. Brown-cluster-based features in our NER model include whole bit-string representations of words and their prefixes of lengths 2, 4, 6, 8, 10, 12, 16, 20. Note that we only extract unigrams for Brown-cluster-based features.
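The bit-string features can be sketched as below, assuming the clustering output has been loaded into a dictionary from word to bit-string. The dictionary, the feature names, and the choice to skip prefix lengths longer than the bit-string are assumptions of this sketch; the toy bit-string is made up for illustration.

```python
PREFIX_LENGTHS = (2, 4, 6, 8, 10, 12, 16, 20)


def brown_features(word, clusters):
    """Whole bit-string plus its prefixes; empty if the word has no cluster."""
    bits = clusters.get(word)
    if bits is None:
        return {}
    feats = {"brown": bits}
    for n in PREFIX_LENGTHS:
        if len(bits) >= n:   # assumption: skip prefixes longer than the bit-string
            feats[f"brown_p{n}"] = bits[:n]
    return feats


toy_clusters = {"Hà_Nội": "1101001"}   # made-up bit-string for illustration
print(brown_features("Hà_Nội", toy_clusters))
print(brown_features("unseen_word", toy_clusters))
```

Shorter prefixes correspond to coarser clusters in the hierarchy, so the model can back off from fine to coarse word classes.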

In our experiments, we used the Brown clustering implementation of Liang (2005) and applied the tool to raw text data collected from a Vietnamese news portal. We performed word clustering on the same preprocessed text data that were used to generate word embeddings in Le-Hong et al. (2017). The number of word clusters used in our experiments is 5120.

2.2.4 Word embeddings

Word-embedding features have been used for a CRF-based Vietnamese NER model in Le-Hong et al. (2017). The basic idea is to add unigram features corresponding to the dimensions of word representation vectors.

In this paper, we apply the same word-embedding features as in Le-Hong et al. (2017). We generated pre-trained word vectors by applying GloVe Pennington et al. (2014) to the same text data used to run Brown clustering. The dimension of the word vectors is 25.
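The idea of one unigram feature per embedding dimension can be sketched as follows, assuming pre-trained vectors are available as a dictionary from word to a list of floats and that the CRF toolkit accepts real-valued features. The names are hypothetical, and using the raw vector component as the feature value is an assumption; the text does not say whether values are scaled or discretized.

```python
def embedding_features(word, vectors):
    """One real-valued feature per dimension of the word's vector."""
    vec = vectors.get(word)
    if vec is None:
        return {}   # assumption: out-of-vocabulary words get no embedding features
    return {f"emb_{d}": float(v) for d, v in enumerate(vec)}


# Toy 3-dimensional vectors; the system described here uses 25 dimensions.
toy_vectors = {"Hà_Nội": [0.12, -0.40, 0.05]}
print(embedding_features("Hà_Nội", toy_vectors))
```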

3 Evaluation

3.1 Data sets

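| Type | Train Level-1 | Train Level-2 | Train Level-3 | Dev Level-1 | Dev Level-2 | Dev Level-3 | Test Level-1 | Test Level-2 | Test Level-3 |
|---|---|---|---|---|---|---|---|---|---|
| LOC | 8831 | 7 | 0 | 3043 | 2 | 0 | 2525 | 2 | 0 |
| ORG | 3471 | 1655 | 63 | 1203 | 690 | 14 | 1616 | 557 | 22 |
| PER | 6427 | 0 | 0 | 2168 | 0 | 0 | 3518 | 1 | 0 |
| MISC | 805 | 1 | 0 | 179 | 1 | 0 | 296 | 0 | 0 |
| Total | 19534 | 1663 | 63 | 6593 | 694 | 14 | 7955 | 561 | 22 |

Table 2: Number of entities of each type at each level in the train, dev, and test sets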

Table 2 shows the data statistics for the training set, development set, and official test set. The number of organization entities (ORG) at level 3 is very small, so we consider only level-1 and level-2 entities in training and evaluation. Almost all level-2 entities are of type ORG.

3.2 Evaluation Measures

We use precision, recall, and F1 score as evaluation measures. Because word segmentation may cause boundary conflicts between entities and words, we convert the words in the data into syllables before computing precision, recall, and F1.
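The word-to-syllable conversion can be sketched as below, assuming that the syllables of a segmented word are joined by underscores (as in tokens such as Ngô_Văn_Quý in Table 1). The function name is hypothetical. A word tagged B-X yields B-X on its first syllable and I-X on the rest, while I-X and O are copied to every syllable.

```python
def words_to_syllables(words, tags):
    """Expand word-level BIO tags to syllable level."""
    out = []
    for word, tag in zip(words, tags):
        # A token made only of underscores is kept as a single unit.
        syllables = [s for s in word.split("_") if s] or [word]
        for k, syllable in enumerate(syllables):
            if tag.startswith("B-") and k > 0:
                out.append((syllable, "I-" + tag[2:]))
            else:
                out.append((syllable, tag))
    return out


print(words_to_syllables(["ông", "Ngô_Văn_Quý", "-"], ["O", "B-PER", "O"]))
```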

We consider four entity types in evaluation (LOCATION, MISCELLANEOUS, ORGANIZATION, and PERSON) and use the evaluation script of CoNLL-2003.
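For reference, entity-level precision, recall, and F1 can be computed from sets of gold and predicted entities, where an entity counts as correct only if its boundaries and type both match. The sketch below illustrates these standard definitions; it is not the official evaluation script, and the function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(gold, predicted):
    """Exact-match scores over entities given as (start, end, type) tuples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


gold = [(0, 2, "PER"), (3, 5, "LOC"), (6, 7, "ORG")]
pred = [(0, 2, "PER"), (3, 5, "ORG")]   # second entity has the wrong type
print(precision_recall_f1(gold, pred))
```

As a sanity check on the formula, a precision of 91.04 and a recall of 84.41 give an F1 of about 87.60.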

3.3 NER models

For evaluation on the development set, we train the following three NER models on the training data of the VLSP 2018 NER task.

  • Level-1 model is trained by using level-1 entity tags.

  • Level-2 model is trained by using level-2 entity tags.

  • Joint model is trained using joint tags which combine level-1 and level-2 tags of each word.

3.4 Results

| Model | Precision | Recall | F1 |
|---|---|---|---|
| Level-1 Model | 91.04 | 84.41 | 87.6 |
| Joint Model | 90.42 | 84.72 | 87.47 |

Table 3: Evaluation results on dev set of recognizing level-1 entities

| Method | Precision | Recall | F1 |
|---|---|---|---|
| Level-2 | 85.81 | 72.44 | 78.56 |
| Joint Model | 84.36 | 77.06 | 80.54 |

Table 4: Evaluation results on dev set of recognizing level-2 entities

Table 3 and Table 4 show the evaluation results on the development set for recognizing level-1 and level-2 entities, respectively. The level-1 model obtained a slightly better F1 score than the joint model in recognizing level-1 entities, while the joint model outperformed the level-2 model in recognizing level-2 entities. We also see that the level-2 model achieved higher precision than the joint model but much lower recall. A plausible explanation for this phenomenon is that the information in level-1 tags helps to recognize more level-2 entities.

3.5 Result Submissions

We trained models on the data set obtained by combining provided training and development data and used the trained models for recognizing entities on the test set.

To produce the submitted results, we use the following methods.

  • We use the level-1 and level-2 models to recognize level-1 and level-2 entities, respectively. We refer to this method as the Separated method.

  • We use the joint model to predict a joint tag for each word of a sentence, then split the joint tags into level-1 and level-2 tags. We refer to this method as the Joint method.

  • We use the joint model to recognize level-2 entities and the level-1 model to recognize level-1 entities. We refer to this method as the Hybrid method.
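The tag-splitting step of the Joint method, and the way the Hybrid method mixes the two models' outputs, can be sketched as follows. The `+` separator follows Table 1; the function names and the toy predictions are hypothetical and stand in for real model output.

```python
def split_joint_tags(joint_tags):
    """Split joint tags such as 'B-LOC+I-ORG' into level-1 and level-2 tags."""
    level1, level2 = [], []
    for tag in joint_tags:
        t1, t2 = tag.split("+", 1)
        level1.append(t1)
        level2.append(t2)
    return level1, level2


def hybrid_tags(level1_model_tags, joint_model_tags):
    """Hybrid method: level-1 tags from the level-1 model, level-2 from the joint model."""
    _, level2 = split_joint_tags(joint_model_tags)
    return list(level1_model_tags), level2


joint_prediction = ["O+B-ORG", "O+I-ORG", "B-LOC+I-ORG"]   # toy joint-model output
level1_prediction = ["O", "O", "B-LOC"]                     # toy level-1 model output
print(split_joint_tags(joint_prediction))
print(hybrid_tags(level1_prediction, joint_prediction))
```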

During recognition, there are some cases in which a predicted level-1 entity contains predicted level-2 entities. In such cases, we omit the predicted level-2 entities that lie inside predicted level-1 entities. The reason is that the accuracy of level-1 entity recognition on the dev set is much higher than that of level-2 entity recognition.
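This conflict-resolution rule can be sketched as a filter over predicted spans. The span convention (start, exclusive end, type) and the function name are assumptions, and treating a level-2 span that exactly coincides with a level-1 span as "inside" is a choice made for this sketch.

```python
def drop_level2_inside_level1(level1_spans, level2_spans):
    """Keep only predicted level-2 spans not contained in any predicted level-1 span."""
    def inside(inner, outer):
        return outer[0] <= inner[0] and inner[1] <= outer[1]

    return [s2 for s2 in level2_spans
            if not any(inside(s2, s1) for s1 in level1_spans)]


level1 = [(0, 4, "ORG")]
level2 = [(1, 3, "ORG"), (3, 8, "ORG")]
print(drop_level2_inside_level1(level1, level2))
```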

We submitted six runs to the VLSP 2018 NER evaluation campaign, as shown in Table 5. We tried two preprocessing approaches, with and without sentence segmentation, because we wanted to know how sequence length influences the accuracy of our model.

| Run | Method | Sentence Segmentation |
|---|---|---|
| Run-1 | Hybrid | YES |
| Run-2 | Hybrid | NO |
| Run-3 | Joint | YES |
| Run-4 | Joint | NO |
| Run-5 | Separated | YES |
| Run-6 | Separated | NO |

Table 5: Six submitted runs

| Run | Precision | Recall | F1 |
|---|---|---|---|
| Run-1 | 76.08 | 70.68 | 73.28 |
| Run-2 | 76.75 | 70.37 | 73.42 |
| Run-3 | 76.32 | 70.25 | 73.16 |
| Run-4 | 76.16 | 70.98 | 73.48 |
| Run-5 | 75.70 | 70.28 | 72.89 |
| Run-6 | 76.26 | 69.90 | 72.94 |

Table 6: Official evaluation results on test set, which consider entities at all levels

| Category | Precision | Recall | F1 |
|---|---|---|---|
| PER | 79.30 | 79.68 | 79.49 |
| LOC | 79.21 | 79.69 | 79.45 |
| ORG | 66.83 | 60.17 | 63.33 |
| MISC | 51.40 | 25.00 | 33.64 |
| All | 76.16 | 70.98 | 73.48 |

Table 7: Evaluation results of Run-4 on test set for each entity category

Table 6 shows the official evaluation results for our six submitted runs. As the table indicates, Run-4, which uses the Joint method, obtained the highest F1 score among the six runs. The Joint and Hybrid methods obtained better F1 scores than the Separated method. We also see that the difference between runs with and without sentence segmentation is very small.

Table 7 shows the precision, recall, and F1 scores of Run-4 for each entity category.

| Run | Precision | Recall | F1 |
|---|---|---|---|
| Run-1 | 73.82 | 79.43 | 76.52 |
| Run-2 | 73.45 | 80.04 | 76.60 |
| Run-3 | 73.21 | 79.56 | 76.26 |
| Run-4 | 73.95 | 79.33 | 76.55 |
| Run-5 | 73.80 | 79.46 | 76.53 |
| Run-6 | 73.46 | 80.08 | 76.63 |

Table 8: Evaluation results on test set for level-1 entities

| Run | Precision | Recall | F1 |
|---|---|---|---|
| Run-1 | 43.24 | 82.94 | 56.84 |
| Run-2 | 43.06 | 82.59 | 56.61 |
| Run-3 | 45.20 | 81.41 | 58.12 |
| Run-4 | 44.48 | 82.51 | 57.80 |
| Run-5 | 39.32 | 83.08 | 53.38 |
| Run-6 | 36.83 | 84.15 | 51.24 |

Table 9: Evaluation results on test set for level-2 entities

Table 8 and Table 9 show the evaluation results on the test set of the six submitted runs for level-1 and level-2 entities, respectively.

Run-6 (using the level-1 and level-2 models separately, without sentence segmentation) obtained the best F1 score for recognizing level-1 entities among the submitted runs (76.63%), and Run-3 (Joint model, with sentence segmentation) obtained the best F1 score for recognizing level-2 entities (58.12%).

The joint model obtained better F1 scores for recognizing both levels of entities than the models trained solely on level-1 or level-2 entity tags. That result is consistent with the result on the development set.

4 Conclusions

We have presented a feature-based model for Vietnamese named-entity recognition and its evaluation results at the VLSP 2018 NER evaluation campaign. We compared several methods for recognizing nested entities. Experimental results showed that combining the tags of entities at all levels to train a single sequence labeling model improved the accuracy of nested named-entity recognition. As future work, we plan to investigate deep learning methods such as BiLSTM-CNN-CRF Ma and Hovy (2016) for nested named-entity recognition.


References

  • Brown et al. (1992) Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Class-based n-gram models of natural language. Comput. Linguist., 18(4):467–479.
  • Huyen and Luong (2016) Nguyen Thi Minh Huyen and Vu Xuan Luong. 2016. VLSP 2016 shared task: Named entity recognition. In Proceedings of Vietnamese Speech and Language Processing (VLSP).
  • Lafferty et al. (2001) John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, pages 282–289.
  • Le-Hong et al. (2017) Phuong Le-Hong, Quang Nhat Minh Pham, Thai-Hoang Pham, Tuan-Anh Tran, and Dang-Minh Nguyen. 2017. An empirical study of discriminative sequence labeling models for Vietnamese text processing. In Proceedings of the 9th International Conference on Knowledge and Systems Engineering (KSE 2017).
  • Liang (2005) Percy Liang. 2005. Semi-supervised learning for natural language. Ph.D. thesis, Massachusetts Institute of Technology.
  • Ma and Hovy (2016) Xuezhe Ma and Eduard Hovy. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1064–1074.
  • Minh (2018) Pham Quang Nhat Minh. 2018. A feature-rich Vietnamese named-entity recognition model. arXiv preprint arXiv:1803.04375.
  • Nguyen et al. (2018) Dat Quoc Nguyen, Dai Quoc Nguyen, Thanh Vu, Mark Dras, and Mark Johnson. 2018. A Fast and Accurate Vietnamese Word Segmenter. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018).
  • Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
  • Sang (2002) Erik F. Tjong Kim Sang. 2002. Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. CoRR, cs.CL/0209010.
  • Sang and Meulder (2003) Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In CoNLL.
  • Sundheim (1995) Beth Sundheim. 1995. Overview of results of the MUC-6 evaluation. In MUC.