Combine CRF and MMSEG to Boost Chinese Word Segmentation in Social Media

10/24/2015
by Yao Yushi, et al.

In this paper, we propose a joint algorithm for word segmentation on Chinese social media. Previous work mainly focuses on word segmentation for plain Chinese text. To develop a Chinese social media processing tool, we need to take the main features of social media into account: its grammatical structure is not rigorous, and its tendency to use colloquial and Internet terms prevents existing Chinese-processing tools from performing well on it. In our approach, we combine the CRF and MMSEG algorithms and extend the features of the traditional CRF algorithm to train the model for word segmentation. We use an Internet lexicon to improve the performance of our model on Chinese social media. Our experimental results on Sina Weibo show that our approach outperforms the state-of-the-art model.
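The abstract does not say how the MMSEG output is fed into the CRF, so the following is only a minimal sketch of one plausible reading: run a greedy longest-match pass over the text (the "simple" matching rule of MMSEG) with a lexicon that includes Internet terms, then expose the resulting B/M/E/S position tags as an extra per-character feature that a CRF could consume. The function names, the toy lexicon, the `max_len` parameter, and the feature keys are all hypothetical and are not taken from the paper.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy longest-match segmentation (the 'simple' rule of MMSEG).

    Assumption: the full MMSEG algorithm also applies chunk-based
    disambiguation rules, which this sketch leaves out.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def to_bmes(words):
    """Map each word to per-character tags: B(egin), M(iddle), E(nd), S(ingle)."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def char_features(text, lexicon):
    """Build one feature dict per character (hypothetical feature layout)."""
    tags = to_bmes(forward_max_match(text, lexicon))
    return [{"char": ch, "lexicon_tag": tag} for ch, tag in zip(text, tags)]


# Toy lexicon: two ordinary words plus the Internet term "给力" (assumed entry).
lexicon = {"今天", "天气", "给力"}
segmented = forward_max_match("今天天气给力", lexicon)
features = char_features("今天天气给力", lexicon)
```

With the Internet term in the lexicon, the matcher keeps "给力" as one word; without it, the same characters fall back to two single-character tokens, which illustrates why a lexicon of Internet terms could matter for social media text.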


