Incorporating Uncertain Segmentation Information into Chinese NER for Social Media Text

04/14/2020
by Shengbin Jia, et al.

Chinese word segmentation is necessary to provide word-level information for Chinese named entity recognition (NER) systems. However, segmentation error propagation is a challenge for Chinese NER when processing colloquial data such as social media text. In this paper, we propose a model (UIcwsNN) that specializes in identifying entities in Chinese social media text, in particular by leveraging the ambiguous information of word segmentation. This uncertain information covers all the potential segmentation states of a sentence, which gives the model a channel for inferring deep word-level characteristics. We propose a three-step pipeline (candidate position embedding -> position selective attention -> adaptive word convolution) to encode uncertain word segmentation information and obtain an appropriate word-level representation. Experimental results on the social media corpus show that our model effectively alleviates the cascading of segmentation errors and achieves a significant performance improvement of more than 2
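The idea of keeping "all the potential segmentation states of a sentence" can be illustrated with a minimal sketch. It is not the authors' implementation: the B/M/E/S (begin/middle/end/single) tag scheme, the function name `candidate_positions`, the `max_len` cutoff, and the toy lexicon are all assumptions made for illustration. For each character, the sketch collects every position the character could occupy in any lexicon word covering it, instead of committing to one segmentation.

```python
from typing import List, Set


def candidate_positions(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Set[str]]:
    """Collect, per character, every position tag it could take in any candidate word.

    Tags (an assumed scheme): B = begins a word, M = inside a word,
    E = ends a word, S = single-character word.
    """
    tags: List[Set[str]] = [set() for _ in sentence]
    for start in range(len(sentence)):
        # Every character may stand alone as a one-character word.
        tags[start].add("S")
        longest_end = min(start + max_len, len(sentence))
        for end in range(start + 2, longest_end + 1):
            if sentence[start:end] in lexicon:
                tags[start].add("B")
                tags[end - 1].add("E")
                for i in range(start + 1, end - 1):
                    tags[i].add("M")
    return tags


# Hypothetical toy lexicon for a classic ambiguous sentence.
toy_lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
for char, tag_set in zip("南京市长江大桥", candidate_positions("南京市长江大桥", toy_lexicon)):
    print(char, sorted(tag_set))
```

A character such as 市 can end 南京市 or begin 市长, so it keeps both tags. A downstream model could embed each tag set and learn which reading fits the context, which is roughly the role the abstract assigns to the attention and convolution steps.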


research
03/02/2016

Improving Named Entity Recognition for Chinese Social Media with Word Segmentation Representation Learning

Named entity recognition, and other information extraction tasks, freque...
research
10/24/2015

Combine CRF and MMSEG to Boost Chinese Word Segmentation in Social Media

In this paper, we propose a joint algorithm for the word segmentation on...
research
02/18/2022

TURNER: The Uncertainty-based Retrieval Framework for Chinese NER

Chinese NER is a difficult undertaking due to the ambiguity of Chinese c...
research
10/29/2020

Named Entity Recognition for Social Media Texts with Semantic Augmentation

Existing approaches for named entity recognition suffer from data sparsi...
research
10/10/2022

Social Media Personal Event Notifier Using NLP and Machine Learning

Social media apps have become very promising and omnipresent in daily li...
research
11/14/2016

F-Score Driven Max Margin Neural Network for Named Entity Recognition in Chinese Social Media

We focus on named entity recognition (NER) for Chinese social media. Wit...
research
01/01/2023

Is word segmentation necessary for Vietnamese sentiment classification?

To the best of our knowledge, this paper made the first attempt to answe...
