Raw-to-End Name Entity Recognition in Social Media

08/14/2019
by Liyuan Liu, et al.

Typical named entity recognition (NER) models take word sequences as input and neglect errors from pre-processing (e.g., tokenization). However, these errors can greatly affect model performance, especially on noisy texts such as tweets. Here, we introduce Neural-Char-CRF, a raw-to-end framework that is more robust to pre-processing errors. It takes raw character sequences as inputs and makes end-to-end predictions. Word embedding and contextualized representation models are further tailored to capture textual signals for each character instead of each word. Our model neither requires converting character sequences to word sequences nor assumes that the tokenizer can correctly detect all word boundaries. Moreover, we observe that our model's performance remains unchanged after replacing tokenization with string matching, which demonstrates its potential to be tokenization-free. Extensive experimental results on two public datasets demonstrate the superiority of our proposed method over the state of the art. The implementations and datasets are available at: https://github.com/LiyuanLucasLiu/Raw-to-End.
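To make the character-level idea concrete, here is a minimal sketch, assuming a toy vocabulary and a greedy longest-match rule as the "string matching" step. It shows how word-level signals can be attached to every character and how entity labels can live on characters rather than words. The function names (`greedy_string_match`, `char_word_ids`, `char_bio_tags`), the longest-match rule, and the character-level BIO scheme are hypothetical illustrations, not the procedure used in the paper or its repository.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # [start, end) character offsets into the raw text


def greedy_string_match(text: str, vocab: Set[str], max_len: int = 20) -> List[Span]:
    """Hypothetical tokenizer substitute: at each position take the longest
    vocabulary entry that starts there; unmatched characters become
    single-character spans, and whitespace is skipped."""
    spans: List[Span] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        end = i + 1  # fallback: a one-character span
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                end = j
                break
        spans.append((i, end))
        i = end
    return spans


def char_word_ids(text: str, spans: List[Span], word_to_id: Dict[str, int], unk: int = 0) -> List[int]:
    """Give every character the id of the matched word covering it, so an
    embedding lookup yields one word-level vector per character.
    Uncovered characters (e.g. whitespace) and unknown words get `unk`."""
    ids = [unk] * len(text)
    for s, e in spans:
        wid = word_to_id.get(text[s:e], unk)
        for k in range(s, e):
            ids[k] = wid
    return ids


def char_bio_tags(text: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Character-level BIO tags from (start, end, label) entity spans."""
    tags = ["O"] * len(text)
    for s, e, label in entities:
        tags[s] = "B-" + label
        for k in range(s + 1, e):
            tags[k] = "I-" + label
    return tags


if __name__ == "__main__":
    text = "#GoLakers"  # a hashtag that whitespace tokenization keeps as one token
    spans = greedy_string_match(text, {"Go", "Lakers"})
    print(spans)
    print(char_word_ids(text, spans, {"Go": 1, "Lakers": 2}))
    print(char_bio_tags(text, [(3, 9, "ORG")]))
```

Because each character carries its own word id and tag, a character-level sequence labeler (for instance, a CRF over character positions) can place entity boundaries inside a hashtag such as `#GoLakers`, even where a word-level tokenizer would have produced a single token.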
