Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging

06/28/2018
by Andrew Matteson, et al.

Because Korean is a highly agglutinative, character-rich language, previous work on Korean morphological analysis typically relies on sub-character features known as graphemes or on comprehensive prior linguistic knowledge (i.e., a dictionary of known morphological transformation forms, or actions). These models were built on the assumption that character-level, dictionary-less morphological analysis is intractable because of the number of actions required. In this study, we present a multi-stage action-based model that performs morphological transformation and part-of-speech tagging on arbitrary units of input, and we apply it to character-level Korean morphological analysis. Among models that do not employ prior linguistic knowledge, our proposed data-driven Bi-LSTM model achieves state-of-the-art word-level and sentence-level tagging accuracy on the Sejong Korean corpus.
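
To make the contrast between character-level and sub-character (grapheme) input concrete, the following is a minimal sketch, not taken from the paper, of how a precomposed Hangul syllable can be split into its constituent jamo using the arithmetic defined in the Unicode standard. The function name `decompose` is hypothetical and chosen only for illustration; the paper's own preprocessing may differ.

```python
# Illustrative only: split one precomposed Hangul syllable into conjoining jamo.
# Constants follow the Unicode standard's Hangul syllable decomposition.
S_BASE = 0xAC00   # first precomposed syllable (가)
L_BASE = 0x1100   # first leading consonant
V_BASE = 0x1161   # first vowel
T_BASE = 0x11A7   # one before the first trailing consonant
V_COUNT = 21
T_COUNT = 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables in total


def decompose(syllable: str) -> list[str]:
    """Return the jamo of a single Hangul syllable (hypothetical helper).

    Characters outside the precomposed Hangul block are returned unchanged.
    """
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [syllable]
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    jamo = [lead, vowel]
    t_index = s_index % T_COUNT
    if t_index:  # index 0 means the syllable has no trailing consonant
        jamo.append(chr(T_BASE + t_index))
    return jamo


if __name__ == "__main__":
    # A character-level model sees one unit per syllable;
    # a grapheme-level model sees the two or three jamo inside it.
    for ch in "한국어":
        print(ch, decompose(ch))
```

A character-level model of the kind described in the abstract would consume each syllable as a single unit, whereas grapheme-based approaches operate on the decomposed sequence.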


