Glyph-aware Embedding of Chinese Characters

08/31/2017
by   Falcon Z. Dai, et al.

Given the advantages and recent success of English character-level and subword-unit models in several NLP tasks, we consider the equivalent modeling problem for Chinese. Chinese script is logographic, and many Chinese logograms are composed of common substructures that provide semantic, phonetic, and syntactic hints. In this work, we propose to explicitly incorporate the visual appearance of a character's glyph in its representation, resulting in a novel glyph-aware embedding of Chinese characters. Inspired by the success of convolutional neural networks in computer vision, we use them to capture the spatio-structural patterns of Chinese glyphs as rendered in raw pixels. In the context of two basic Chinese NLP tasks, language modeling and word segmentation, the model learns to represent each character's task-relevant semantic and syntactic information in the character-level embedding.
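
The core idea of turning a glyph's pixels into an embedding vector can be illustrated with a minimal, dependency-free sketch. It is not the authors' model: the 7x7 bitmap is a hand-drawn stand-in for a rendered glyph, the filters are random rather than learned, and the names `GLYPH_SHI`, `conv2d_valid`, `glyph_embedding`, and `random_kernels` are hypothetical. A real system would rasterize each character from a font and train the filters jointly with the downstream task.

```python
import random

# Hypothetical 7x7 binary bitmap standing in for a rendered glyph of "十".
# A real pipeline would rasterize each character from a font file instead.
GLYPH_SHI = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
]


def conv2d_valid(image, kernel):
    """Cross-correlate image with kernel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def glyph_embedding(image, kernels):
    """Return one feature per kernel: ReLU, then global max pooling."""
    embedding = []
    for kernel in kernels:
        feature_map = conv2d_valid(image, kernel)
        embedding.append(max(max(0.0, v) for row in feature_map for v in row))
    return embedding


def random_kernels(num_kernels, size=3, seed=0):
    """Untrained filters (an assumption); a real model would learn these."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
        for _ in range(num_kernels)
    ]


kernels = random_kernels(num_kernels=4)
embedding = glyph_embedding(GLYPH_SHI, kernels)
print(len(embedding))
```

Each filter responds to a local stroke pattern, and global max pooling keeps the strongest response anywhere in the glyph, so the embedding dimension equals the number of filters. Characters that share a substructure would tend to activate the same filters, which is the intuition behind a glyph-aware representation.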

research
01/29/2019

Glyce: Glyph-vectors for Chinese Character Representations

It is intuitive that NLP tasks for logographic languages like Chinese sh...
research
04/07/2020

Towards Evaluating the Robustness of Chinese BERT Classifiers

Recent advances in large-scale language representation models such as BE...
research
09/18/2019

Subword ELMo

Embedding from Language Models (ELMo) has shown to be effective for impr...
research
04/17/2017

Learning Character-level Compositionality with Visual Features

Previous work has modeled the compositionality of words by creating char...
research
06/17/2018

Incorporating Chinese Characters of Words for Lexical Sememe Prediction

Sememes are minimum semantic units of concepts in human languages, such ...
research
08/04/2023

Chinese Financial Text Emotion Mining: GCGTS – A Character Relationship-based Approach for Simultaneous Aspect-Opinion Pair Extraction

Aspect-Opinion Pair Extraction (AOPE) from Chinese financial texts is a ...
research
12/23/2017

Dual Long Short-Term Memory Networks for Sub-Character Representation Learning

Characters have commonly been regarded as the minimal processing unit in...
