Vietnamese Word Segmentation with SVM: Ambiguity Reduction and Suffix Capture

06/14/2020
by Duc-Vu Nguyen, et al.

In this paper, we approach Vietnamese word segmentation as a binary classification task using a Support Vector Machine (SVM) classifier. We inherit features from prior work, such as n-grams of syllables, n-grams of syllable types, and checks on whether the concatenation of adjacent syllables appears in the dictionary. We propose two novel feature extraction methods: one to reduce overlap ambiguity and the other to improve the prediction of unknown words containing suffixes. Unlike UETsegmenter and RDRsegmenter, two state-of-the-art Vietnamese word segmentation methods, we do not employ the longest matching algorithm as an initial processing step, nor do we use any post-processing technique. According to experimental results on benchmark Vietnamese datasets, our proposed method obtained a better F1-score than the prior state-of-the-art methods UETsegmenter and RDRsegmenter.
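To make the binary classification framing concrete, the sketch below labels each gap between two adjacent syllables as either "join" (same word) or "split" (word boundary) and extracts features of the kinds listed in the abstract. It is a minimal illustration, not the authors' implementation: the function names, the window size, the syllable type categories, and the toy dictionary are all assumptions, and the two novel feature extraction methods proposed in the paper are not reproduced here.

```python
# Illustrative sketch only. Feature names, window size, syllable type
# categories, and the toy dictionary are assumptions, not the paper's templates.

PAD = "<PAD>"


def syllable_type(syl):
    """Return a coarse syllable type (hypothetical categories)."""
    if syl == PAD:
        return "PAD"
    if syl.isdigit():
        return "NUM"
    if syl.isupper():
        return "UPPER"
    if syl[:1].isupper():
        return "CAP"
    if syl.islower():
        return "LOWER"
    return "OTHER"


def gap_features(syllables, i, dictionary, window=2):
    """Features for the gap between syllables[i] and syllables[i + 1]."""
    padded = [PAD] * window + list(syllables) + [PAD] * window
    j = i + window  # position of syllables[i] inside the padded list
    # `window` syllables on each side of the gap
    context = padded[j - window + 1 : j + window + 1]

    feats = set()
    for k, syl in enumerate(context):
        feats.add(f"uni[{k}]={syl.lower()}")          # syllable unigram
        feats.add(f"type[{k}]={syllable_type(syl)}")  # syllable type unigram
    for k in range(len(context) - 1):
        feats.add(f"bi[{k}]={context[k].lower()}_{context[k + 1].lower()}")
    # Dictionary check on the two syllables around the gap
    pair = f"{syllables[i]} {syllables[i + 1]}".lower()
    feats.add(f"in_dict={pair in dictionary}")
    return feats


def gap_labels(words):
    """Gold labels from a segmented sentence (syllables joined by '_').

    1 = the gap joins two syllables of one word, 0 = word boundary.
    """
    labels = []
    for word in words:
        parts = word.split("_")
        labels.extend([1] * (len(parts) - 1))
        labels.append(0)
    return labels[:-1]  # no gap after the last syllable


# Toy example with an overlap ambiguity: "học sinh học sinh học"
words = ["học_sinh", "học", "sinh_học"]
syllables = [s for w in words for s in w.split("_")]
toy_dictionary = {"học sinh", "sinh học"}

labels = gap_labels(words)
features = [gap_features(syllables, i, toy_dictionary)
            for i in range(len(syllables) - 1)]
```

Each feature set could then be vectorized (for example, one-hot encoded) and passed with its label to any SVM implementation. In this toy sentence both "học sinh" and "sinh học" are in the dictionary, so the dictionary feature alone cannot decide the middle gaps, which is the kind of overlap ambiguity the abstract refers to.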


