CASICT Tibetan Word Segmentation System for MLWS2017

10/17/2017
by Jiawei Hu, et al.

We participated in the Tibetan word segmentation task of MLWS 2017. Our system is trained in an unrestricted way, by introducing a baseline system and our own 76w (760,000) segmented Tibetan sentences. In the system, the character sequence is first processed by the baseline system into a word sequence. A subword unit model (the BPE algorithm) then splits rare words into subwords with their corresponding features, and a neural network classifier tags each subword with one of the labels "B", "M", "E", or "S". In the decoding step, a simple rule is used to recover the final word sequence. The candidate system for submission is selected by evaluating the F-score on a dev set pre-extracted from the 76w sentences. Experiments show that this method can fix segmentation errors of the baseline system and results in a significant performance gain.
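The labelling and decoding steps can be illustrated with a minimal sketch. The abstract does not spell out the "simple rule" used in decoding, so the rule below is an assumption: a word starts at "B" or "S" and ends at "E" or "S", and an unfinished word is flushed when a new one starts. The function names `bmes_encode` and `bmes_decode` are hypothetical, and the Latin placeholder strings stand in for Tibetan subword units.

```python
from typing import List


def bmes_encode(words: List[List[str]]) -> List[str]:
    """Turn words (each a list of subword units) into B/M/E/S labels."""
    labels: List[str] = []
    for units in words:
        if len(units) == 1:
            labels.append("S")  # single-unit word
        elif len(units) > 1:
            # first unit is B, last unit is E, everything between is M
            labels.extend(["B"] + ["M"] * (len(units) - 2) + ["E"])
    return labels


def bmes_decode(units: List[str], labels: List[str]) -> List[str]:
    """Merge subword units back into words (assumed rule, not the paper's)."""
    if len(units) != len(labels):
        raise ValueError("units and labels must have the same length")
    words: List[str] = []
    current = ""
    for unit, label in zip(units, labels):
        if label in ("B", "S") and current:
            # A new word starts, so flush any unfinished word first.
            words.append(current)
            current = ""
        current += unit
        if label in ("E", "S"):
            # The word is complete.
            words.append(current)
            current = ""
    if current:
        # Trailing units without a closing E or S still form a word.
        words.append(current)
    return words


if __name__ == "__main__":
    segmented = [["ab", "c"], ["d"], ["e", "f", "g"]]
    flat_units = [u for word in segmented for u in word]
    tags = bmes_encode(segmented)
    print(tags)
    print(bmes_decode(flat_units, tags))
```

Flushing on an unexpected "B" or "S" keeps the decoder total: any label sequence the classifier emits, well-formed or not, yields some segmentation rather than an error.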


