A Syllable-based Technique for Word Embeddings of Korean Words

08/05/2017
by Sanghyuk Choi, et al.

Word embeddings have become a fundamental component of many NLP tasks such as named entity recognition and machine translation. However, popular models that learn such embeddings are unaware of the morphology of words, so they are not directly applicable to highly agglutinative languages such as Korean. We propose a syllable-based learning model for Korean that uses a convolutional neural network, in which a word representation is composed of trained syllable vectors. Compared to the original Skip-gram embeddings, our model produces morphologically meaningful representations of Korean words. The results also show that it is quite robust to the out-of-vocabulary (OOV) problem.
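The composition step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding dimension, filter count, kernel width, and the random (untrained) syllable vectors are all assumptions. It relies only on the fact that each Hangul syllable block is a single Unicode character, so a word decomposes into syllables by simple iteration, and composes a word vector from syllable vectors with a 1-D convolution followed by max-over-time pooling. Because any syllable can be assigned a vector, unseen (OOV) words still receive representations.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, NUM_FILTERS, KERNEL = 8, 4, 2  # illustrative sizes, not the paper's

def syllables(word):
    """Each Hangul syllable block is one Unicode character,
    so iterating over the string yields the syllables."""
    return list(word)

# Syllable vectors assigned lazily at random; in the paper they would be
# trained, e.g. under a Skip-gram-style objective.
syllable_vecs = {}

def syllable_vector(s):
    if s not in syllable_vecs:
        syllable_vecs[s] = rng.standard_normal(EMB_DIM)
    return syllable_vecs[s]

filters = rng.standard_normal((NUM_FILTERS, KERNEL, EMB_DIM))

def word_vector(word):
    """Compose a word vector from syllable vectors via a 1-D convolution
    over the syllable sequence plus max-over-time pooling."""
    mat = np.stack([syllable_vector(s) for s in syllables(word)])  # (T, D)
    if mat.shape[0] < KERNEL:  # zero-pad words shorter than the kernel
        mat = np.vstack([mat, np.zeros((KERNEL - mat.shape[0], EMB_DIM))])
    T = mat.shape[0]
    conv = np.array([
        [np.tanh(np.sum(mat[t:t + KERNEL] * f)) for t in range(T - KERNEL + 1)]
        for f in filters
    ])  # (NUM_FILTERS, T - KERNEL + 1)
    return conv.max(axis=1)  # max-over-time pooling -> (NUM_FILTERS,)

print(syllables("한국어"))          # ['한', '국', '어']
print(word_vector("한국어").shape)  # (4,)
```

Note that `word_vector` never consults a word-level vocabulary, which is the property that makes syllable-level composition robust to OOV words.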


Related research

- hauWE: Hausa Words Embedding for Natural Language Processing (11/25/2019)
- Context encoders as a simple but powerful extension of word2vec (06/08/2017)
- Morphological Skip-Gram: Using morphological knowledge to improve word representation (07/20/2020)
- Deep learning models for representing out-of-vocabulary words (07/14/2020)
- Multi hash embeddings in spaCy (12/19/2022)
- Predicting and interpreting embeddings for out of vocabulary words in downstream tasks (03/02/2019)
- Multimodal Skip-gram Using Convolutional Pseudowords (11/12/2015)
