Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging

06/28/2018
by   Andrew Matteson, et al.

Because Korean is a highly agglutinative, character-rich language, previous work on Korean morphological analysis typically either uses sub-character features known as graphemes or relies on comprehensive prior linguistic knowledge (i.e., a dictionary of known morphological transformation forms, or actions). These models were built on the assumption that character-level, dictionary-less morphological analysis is intractable because of the number of actions required. In this study, we present a multi-stage action-based model that performs morphological transformation and part-of-speech tagging on arbitrary units of input, and we apply it to character-level Korean morphological analysis. Among models that do not employ prior linguistic knowledge, our proposed data-driven Bi-LSTM model achieves state-of-the-art word-level and sentence-level tagging accuracy on the Sejong Korean corpus.
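To make the sub-character (grapheme) features mentioned in the abstract concrete, the following is a minimal sketch of how one precomposed Hangul syllable splits into its component jamo using standard Unicode arithmetic. It illustrates the kind of input that earlier grapheme-based work relies on; it is not the model proposed in this paper, which operates on whole characters. The function name `decompose` and the constant names are hypothetical, chosen only for this illustration.

```python
# Sketch: split a precomposed Hangul syllable into its jamo (graphemes).
# This follows the Unicode Hangul syllable layout, not the paper's method.
# All names here (decompose, HANGUL_BASE, ...) are illustrative assumptions.

HANGUL_BASE = 0xAC00   # first precomposed syllable
NUM_INITIALS = 19      # leading consonants
NUM_MEDIALS = 21       # vowels
NUM_FINALS = 28        # trailing consonants, including "no final"

INITIALS = [chr(0x1100 + i) for i in range(NUM_INITIALS)]
MEDIALS = [chr(0x1161 + i) for i in range(NUM_MEDIALS)]
FINALS = [""] + [chr(0x11A8 + i) for i in range(NUM_FINALS - 1)]


def decompose(char):
    """Return the jamo of one Hangul syllable as a tuple.

    Characters outside the precomposed syllable block are returned unchanged.
    """
    index = ord(char) - HANGUL_BASE
    if not 0 <= index < NUM_INITIALS * NUM_MEDIALS * NUM_FINALS:
        return (char,)
    initial, rest = divmod(index, NUM_MEDIALS * NUM_FINALS)
    medial, final = divmod(rest, NUM_FINALS)
    parts = (INITIALS[initial], MEDIALS[medial], FINALS[final])
    return tuple(p for p in parts if p)  # drop the empty "no final" slot


if __name__ == "__main__":
    for syllable in "한국어":
        print(syllable, [f"U+{ord(j):04X}" for j in decompose(syllable)])
```

A character-level model treats each syllable as a single symbol, so its vocabulary spans up to 11,172 precomposed syllables, whereas a grapheme-level model sees only the few dozen jamo produced by a decomposition like this one.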

