Segmenting Natural Language Sentences via Lexical Unit Analysis

by Yangming Li, et al.

In this work, we present Lexical Unit Analysis (LUA), a framework for general sequence segmentation tasks. Given a natural language sentence, LUA scores all valid segmentation candidates and uses dynamic programming (DP) to extract the highest-scoring one. LUA has several appealing properties: it inherently guarantees that the predicted segmentation is valid, and it supports globally optimal training and inference. In addition, the practical time complexity of LUA can be reduced to linear time. We have conducted extensive experiments on 5 tasks (syntactic chunking, named entity recognition (NER), slot filling, Chinese word segmentation, and Chinese part-of-speech (POS) tagging) across 15 datasets. Our models achieve state-of-the-art performance on 13 of them. The results also show that the F1 score for identifying long segments is notably improved.
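To make the DP step concrete, the following is a minimal sketch of extracting the highest-scoring segmentation from per-span scores. It is an illustration under stated assumptions, not the authors' implementation: the names `best_segmentation`, `toy_score`, and `max_len` are hypothetical, the total score is assumed to be the sum of independent span scores, segment labels are omitted, and the toy scorer stands in for a learned model. Capping the segment length with `max_len` is one assumed way to keep the inner loop bounded; the abstract does not say how the linear-time behavior is obtained.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def best_segmentation(
    n: int,
    score: Callable[[int, int], float],
    max_len: Optional[int] = None,
) -> Tuple[float, List[Span]]:
    """Return the best total score and the spans that achieve it.

    best[j] holds the highest score of any segmentation of tokens [0, j).
    Every candidate is a chain of adjacent spans from 0 to n, so the
    result covers the sentence with no gaps or overlaps by construction.
    """
    if max_len is None:
        max_len = n  # no cap: O(n^2) span evaluations
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            cand = best[i] + score(i, j)
            if cand > best[j]:
                best[j] = cand
                back[j] = i
    # Follow back-pointers from the end of the sentence to recover spans.
    segments: List[Span] = []
    j = n
    while j > 0:
        i = back[j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return best[n], segments


# Toy scorer (an assumption for illustration): reward one known phrase,
# give single tokens a small score, penalize other multi-token spans.
tokens = ["New", "York", "is", "big"]


def toy_score(i: int, j: int) -> float:
    if tuple(tokens[i:j]) == ("New", "York"):
        return 2.0
    return 0.5 if j - i == 1 else -1.0


total, segments = best_segmentation(len(tokens), toy_score)
print(total, [tokens[i:j] for i, j in segments])
```

With a fixed `max_len`, the inner loop examines at most `max_len` start positions per end position, so the number of span evaluations grows linearly with sentence length.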




