Don't Just Scratch the Surface: Enhancing Word Representations for Korean with Hanja

08/25/2019
by Kang Min Yoo, et al.

We propose a simple approach to training better Korean word representations using an additional linguistic annotation known as Hanja. Exploiting Hanja's association with the Chinese language, we devise a method to transfer representations from Chinese by initializing Hanja embeddings with Chinese character embeddings. We evaluate the intrinsic quality of representations built with our approach through word analogy and similarity tests. In addition, we demonstrate their effectiveness on several downstream tasks, including a novel Korean news headline generation task.
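The core transfer step described in the abstract, initializing Hanja embeddings with pretrained Chinese character embeddings, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the toy vectors, and the random fallback for characters missing from the Chinese embeddings are all assumptions for the example.

```python
import numpy as np

def init_hanja_embeddings(hanja_vocab, chinese_vecs, dim, seed=0):
    """Build a Hanja embedding matrix, copying vectors for characters
    found in the pretrained Chinese embeddings and drawing small random
    vectors for the rest (a common fallback; assumed here)."""
    rng = np.random.default_rng(seed)
    emb = np.empty((len(hanja_vocab), dim), dtype=np.float32)
    for i, ch in enumerate(hanja_vocab):
        if ch in chinese_vecs:
            emb[i] = chinese_vecs[ch]            # transfer from Chinese
        else:
            emb[i] = rng.normal(0.0, 0.1, dim)   # random init fallback
    return emb

# Toy stand-in for real pretrained Chinese character vectors.
chinese = {
    "學": np.ones(4, dtype=np.float32),
    "校": np.full(4, 2.0, dtype=np.float32),
}
vocab = ["學", "校", "訓"]  # "訓" has no pretrained vector here
emb = init_hanja_embeddings(vocab, chinese, dim=4)
```

The resulting matrix would then serve as the initial weights of the Hanja embedding layer and be fine-tuned on Korean text along with the rest of the model.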


Related research

06/21/2016
Correlation-based Intrinsic Evaluation of Word Vector Representations
We introduce QVEC-CCA, an intrinsic evaluation metric for word vector re...

09/06/2018
Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation
Following the recent success of word embeddings, it has been argued that...

05/21/2018
Aff2Vec: Affect-Enriched Distributional Word Representations
Human communication includes information, opinions, and reactions. React...

03/05/2022
Just Rank: Rethinking Evaluation with Word and Sentence Similarities
Word and sentence embeddings are useful feature representations in natur...

02/05/2017
All-but-the-Top: Simple and Effective Postprocessing for Word Representations
Real-valued word representations have transformed NLP applications, popu...

10/05/2020
On the Effects of Knowledge-Augmented Data in Word Embeddings
This paper investigates techniques for knowledge injection into word emb...

05/12/2018
Analogical Reasoning on Chinese Morphological and Semantic Relations
Analogical reasoning is effective in capturing linguistic regularities. ...
