A Sub-Character Architecture for Korean Language Processing

07/20/2017
by Karl Stratos, et al.

We introduce a novel sub-character architecture that exploits a unique compositional structure of the Korean language. Our method decomposes each character into a small set of primitive phonetic units called jamo letters from which character- and word-level representations are induced. The jamo letters divulge syntactic and semantic information that is difficult to access with conventional character-level units. They greatly alleviate the data sparsity problem, reducing the observation space to 1.6% of the original while increasing accuracy in our experiments. We apply our architecture to dependency parsing and achieve dramatic improvement over strong lexical baselines.
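As a minimal sketch of the decomposition step, the snippet below splits precomposed Hangul syllables into their jamo letters using the standard Unicode arithmetic for the Hangul Syllables block (U+AC00 to U+D7A3). The function names `decompose_syllable` and `decompose_word` are hypothetical, and this is not necessarily the preprocessing used by the authors; it only illustrates how one character maps to two or three jamo.

```python
# Constants from the Unicode Hangul syllable composition scheme.
S_BASE = 0xAC00   # first precomposed syllable
L_BASE = 0x1100   # first leading consonant (choseong)
V_BASE = 0x1161   # first vowel (jungseong)
T_BASE = 0x11A7   # one before the first trailing consonant (jongseong)
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T_COUNT includes "no trailing consonant"
N_COUNT = V_COUNT * T_COUNT              # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT              # 11172 precomposed syllables in total


def decompose_syllable(ch):
    """Return the jamo letters of one character (hypothetical helper).

    Characters outside the Hangul Syllables block are returned unchanged
    as a single-element list.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [ch]
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    jamo = [lead, vowel]
    t_index = s_index % T_COUNT
    if t_index:  # index 0 means the syllable has no trailing consonant
        jamo.append(chr(T_BASE + t_index))
    return jamo


def decompose_word(word):
    """Decompose every character of a word, keeping character boundaries."""
    return [decompose_syllable(ch) for ch in word]


if __name__ == "__main__":
    # U+D55C U+AE00 is the word "hangeul"; each syllable has three jamo.
    for jamo in decompose_word("\ud55c\uae00"):
        print([f"U+{ord(j):04X}" for j in jamo])
```

Keeping the per-character grouping, rather than flattening the word into one jamo sequence, leaves room for inducing character-level representations from jamo before composing them into word-level ones, as the abstract describes.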

