A Fast and Accurate Vietnamese Word Segmenter

09/19/2017
by Dat Quoc Nguyen, et al.

We propose a novel approach to Vietnamese word segmentation. Our approach is based on the Single Classification Ripple Down Rules methodology (Compton and Jansen, 1990), in which rules are stored in an exception structure and new rules are added only to correct segmentation errors made by existing rules. Experimental results on the benchmark Vietnamese treebank show that our approach outperforms the previous state-of-the-art approaches JVnSegmenter, vnTokenizer, DongDu, and UETsegmenter in terms of both accuracy and speed. Our code is open-source and available at: https://github.com/datquocnguyen/RDRsegmenter.
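To make the exception structure concrete, here is a minimal Python sketch of a Single Classification Ripple Down Rules tree, written under assumptions that are not taken from the abstract: segmentation is cast as tagging each syllable B (begins a word) or I (inside a word), a rule looks only at the previous, current and next syllable, and the names SCRDRTree, correct, contexts and segment, as well as the toy rules, are hypothetical. It is an illustration of the methodology, not the RDRsegmenter implementation. A node's exception child is tried only when that node fires, its if-not child only when it does not, and the conclusion comes from the last node that fired. A new rule is attached only when the tree gets a context wrong, and it hangs off the rule that produced the error.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical context: the previous, current and next syllable.
Context = Dict[str, str]
Condition = Callable[[Context], bool]


@dataclass
class Node:
    condition: Condition
    conclusion: str                        # "B" or "I" in this sketch
    except_child: Optional["Node"] = None  # tried only if this node fires
    if_not_child: Optional["Node"] = None  # tried only if this node does not fire


class SCRDRTree:
    def __init__(self, default: str) -> None:
        # The root rule always fires and supplies the default conclusion.
        self.root = Node(lambda ctx: True, default)

    def _last_fired(self, ctx: Context) -> Node:
        """Walk the tree and return the last node whose condition held."""
        fired = self.root
        node = self.root.except_child
        while node is not None:
            if node.condition(ctx):
                fired = node               # remember it, then look at its exceptions
                node = node.except_child
            else:
                node = node.if_not_child   # try the next alternative exception
        return fired

    def classify(self, ctx: Context) -> str:
        return self._last_fired(ctx).conclusion

    def correct(self, ctx: Context, gold: str, condition: Condition) -> bool:
        """Add a rule only if the current tree misclassifies ctx."""
        fired = self._last_fired(ctx)
        if fired.conclusion == gold:
            return False                   # no error, so no new rule
        if not condition(ctx):
            raise ValueError("the new rule must fire on the misclassified context")
        new = Node(condition, gold)
        if fired.except_child is None:
            fired.except_child = new       # first exception of the fired rule
        else:
            last = fired.except_child      # none of these fired on ctx,
            while last.if_not_child is not None:
                last = last.if_not_child   # so append after the last one
            last.if_not_child = new
        return True


def contexts(syllables: List[str]) -> List[Context]:
    padded = ["<s>"] + syllables + ["</s>"]
    return [
        {"prev": padded[i - 1], "cur": padded[i], "next": padded[i + 1]}
        for i in range(1, len(padded) - 1)
    ]


def segment(tree: SCRDRTree, syllables: List[str]) -> List[str]:
    words: List[str] = []
    for syllable, ctx in zip(syllables, contexts(syllables)):
        if tree.classify(ctx) == "I" and words:
            words[-1] += "_" + syllable    # join with the previous syllable
        else:
            words.append(syllable)         # start a new word
    return words


if __name__ == "__main__":
    tree = SCRDRTree(default="B")          # default: every syllable starts a word
    sent = "Tôi là sinh viên".split()      # "I am a student"
    # The default rule tags "viên" as B; add an exception to correct it.
    tree.correct(contexts(sent)[3], "I",
                 lambda c: c["prev"] == "sinh" and c["cur"] == "viên")
    print(segment(tree, sent))             # ['Tôi', 'là', 'sinh_viên']
```

Because every correction is attached below the rule that fired last (or after that rule's existing exceptions, none of which fired on the erroneous context), earlier rules are never edited, which is the incremental, error-driven property the abstract describes.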
