Subword Encoding in Lattice LSTM for Chinese Word Segmentation

10/30/2018
by   Jie Yang, et al.
0

We investigate a lattice LSTM network for Chinese word segmentation (CWS) to utilize words or subwords. It integrates the character sequence features with all subsequences information matched from a lexicon. The matched subsequences serve as information shortcut tunnels which link their start and end characters directly. Gated units are used to control the contribution of multiple input links. Through formula derivation and comparison, we show that the lattice LSTM is an extension of the standard LSTM with the ability to take multiple inputs. Previous lattice LSTM model takes word embeddings as the lexicon input, we prove that subword encoding can give the comparable performance and has the benefit of not relying on any external segmentor. The contribution of lattice LSTM comes from both lexicon and pretrained embeddings information, we find that the lexicon information contributes more than the pretrained embeddings information through controlled experiments. Our experiments show that the lattice structure with subword encoding gives competitive or better results with previous state-of-the-art methods on four segmentation benchmarks. Detailed analyses are conducted to compare the performance of word encoding and subword encoding in lattice LSTM. We also investigate the performance of lattice LSTM structure under different circumstances and when this model works or fails.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/05/2018

Chinese NER Using Lattice LSTM

We investigate a lattice-structured LSTM model for Chinese NER, which en...
research
08/15/2018

Multiple Character Embeddings for Chinese Word Segmentation

Chinese word segmentation (CWS) is often regarded as a character-based s...
research
11/25/2019

Chinese Spelling Error Detection Using a Fusion Lattice LSTM

Spelling error detection serves as a crucial preprocessing in many natur...
research
04/24/2020

FLAT: Chinese NER Using Flat-Lattice Transformer

Recently, the character-word lattice structure has been proved to be eff...
research
08/16/2019

Simplify the Usage of Lexicon in Chinese NER

Recently, many works have tried to utilizing word lexicon to augment the...
research
06/14/2016

Neural Word Segmentation Learning for Chinese

Most previous approaches to Chinese word segmentation formalize this pro...

Please sign up or login with your details

Forgot password? Click here to reset