Neural Word Segmentation with Rich Pretraining

04/28/2017
by Jie Yang et al.

Neural word segmentation research has benefited from large-scale raw texts by leveraging them to pretrain character and word embeddings. Statistical segmentation research, on the other hand, has exploited richer sources of external information, such as punctuation, automatic segmentation, and POS data. We investigate the effectiveness of a range of external training sources for neural word segmentation by building a modular segmentation model and pretraining its most important submodule on rich external sources. Results show that such pretraining significantly improves the model, leading to accuracies competitive with the best methods on six benchmarks.
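
To make the pretraining idea concrete, here is a minimal PyTorch sketch of one plausible setup: a shared character-context encoder (the "most important submodule") pretrained with one classification head per external signal named in the abstract (punctuation, automatic segmentation, POS). The module names, head inventory, tag-set sizes, and hyperparameters below are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes; the paper's real hyperparameters are not given here.
VOCAB, EMB, HID = 5000, 64, 128

class CharContextEncoder(nn.Module):
    """Shared submodule: encodes every character in sentential context."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB, padding_idx=0)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)

    def forward(self, char_ids):             # (batch, seq_len)
        h, _ = self.rnn(self.emb(char_ids))  # (batch, seq_len, 2 * HID)
        return h

encoder = CharContextEncoder()

# One linear head per external pretraining signal (multi-task pretraining).
# Label inventories are assumptions for illustration.
heads = nn.ModuleDict({
    "punct":   nn.Linear(2 * HID, 2),   # is a punctuation boundary adjacent?
    "autoseg": nn.Linear(2 * HID, 4),   # B/M/E/S tags from an automatic segmenter
    "pos":     nn.Linear(2 * HID, 32),  # silver POS tags
})
opt = torch.optim.Adam(list(encoder.parameters()) + list(heads.parameters()))

def pretrain_step(task, char_ids, labels):
    """One multi-task update; `labels` has shape (batch, seq_len)."""
    logits = heads[task](encoder(char_ids))
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random data standing in for real external corpora.
x = torch.randint(1, VOCAB, (8, 20))
y = torch.randint(0, 2, (8, 20))
print(pretrain_step("punct", x, y))
```

After pretraining on the external sources, the encoder's weights would be reused to initialize the corresponding submodule of the supervised segmenter; only the shared encoder transfers, while the task heads are discarded.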

Related research

08/04/2016 · Word Segmentation on Micro-blog Texts with External Lexicon and Heterogeneous Data
This paper describes our system designed for the NLPCC 2016 shared task ...

11/13/2017 · Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation
Character-based sequence labeling framework is flexible and efficient fo...

04/24/2017 · A Trie-Structured Bayesian Model for Unsupervised Morphological Segmentation
In this paper, we introduce a trie-structured Bayesian model for unsuper...

06/16/2023 · Multi-task 3D building understanding with multi-modal pretraining
This paper explores various learning strategies for 3D building type cla...

09/04/2018 · Segmentation-free compositional n-gram embedding
Applying conventional word embedding models to unsegmented languages, wh...

03/18/2015 · Text Segmentation based on Semantic Word Embeddings
We explore the use of semantic word embeddings in text segmentation algo...

10/03/2019 · Character Feature Engineering for Japanese Word Segmentation
On word segmentation problems, machine learning architecture engineering...