Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin using Recursive Neural Networks

12/20/2019
by Minh Nguyen, et al.

Logographs (Chinese characters) have recursive structures (i.e., hierarchies of sub-units) that carry phonological and semantic information, and developmental psychology literature suggests that native speakers leverage these structures when learning to read. Exploiting these structures could lead to better embeddings that benefit many downstream tasks. We propose building hierarchical logograph (character) embeddings from logographs' recursive structures using treeLSTM, a recursive neural network. Using a recursive neural network imposes a prior on the mapping from logographs to embeddings, since the network must read in a logograph's sub-units in the order specified by its recursive structure. Based on human behavior in language learning and reading, we hypothesize that modeling logographs' structures with a recursive neural network should be beneficial. To test this hypothesis, we consider two tasks: (1) predicting logographs' Cantonese pronunciation from their logographic structures and (2) language modeling. Empirical results show that the proposed hierarchical embeddings outperform baseline approaches. Diagnostic analysis suggests that hierarchical embeddings constructed using treeLSTM are less sensitive to distractors and thus more robust, especially on complex logographs.
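To make the idea of composing a character embedding bottom-up from its sub-units concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say which treeLSTM variant is used, so the sketch assumes the child-sum formulation, uses random untrained weights, and picks a hypothetical embedding size of 4. The class name `ChildSumTreeLSTM`, the `(symbol, [children])` tree encoding, and the use of Unicode ideographic description characters (such as ⿱ and ⿰) as internal node labels are all assumptions made for illustration.

```python
import math
import random

DIM = 4  # hypothetical embedding size, chosen only for illustration


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ChildSumTreeLSTM:
    """Untrained child-sum tree LSTM cell (an assumed variant)."""

    def __init__(self, vocab, dim=DIM, seed=0):
        rng = random.Random(seed)

        def rand_vec():
            return [rng.uniform(-0.5, 0.5) for _ in range(dim)]

        def rand_mat():
            return [rand_vec() for _ in range(dim)]

        self.dim = dim
        # One input vector per sub-unit or layout symbol.
        self.embed = {symbol: rand_vec() for symbol in vocab}
        # One (W, U, b) triple per gate: input, forget, output, update.
        self.params = {g: (rand_mat(), rand_mat(), rand_vec()) for g in "ifou"}

    def _gate(self, name, x, h, squash):
        W, U, b = self.params[name]
        pre = zip(matvec(W, x), matvec(U, h), b)
        return [squash(wx + uh + bias) for wx, uh, bias in pre]

    def encode(self, tree):
        """Return (hidden, cell) for a tree given as (symbol, [children])."""
        symbol, children = tree
        x = self.embed[symbol]
        # Children are encoded first, so information flows leaves -> root.
        states = [self.encode(child) for child in children]
        h_sum = [sum(h[j] for h, _ in states) for j in range(self.dim)]
        i = self._gate("i", x, h_sum, sigmoid)
        o = self._gate("o", x, h_sum, sigmoid)
        u = self._gate("u", x, h_sum, math.tanh)
        c = [i_j * u_j for i_j, u_j in zip(i, u)]
        for h_k, c_k in states:
            # Each child gets its own forget gate over its cell state.
            f_k = self._gate("f", x, h_k, sigmoid)
            c = [c_j + f_j * ck_j for c_j, f_j, ck_j in zip(c, f_k, c_k)]
        h = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c)]
        return h, c


def leaf(symbol):
    return (symbol, [])


# 想 decomposes as ⿱(相, 心), and 相 as ⿰(木, 目).
xiang = ("⿱", [("⿰", [leaf("木"), leaf("目")]), leaf("心")])
model = ChildSumTreeLSTM(["⿰", "⿱", "木", "目", "心"])
embedding, _ = model.encode(xiang)
print(len(embedding))
```

The hidden state at the root serves as the character embedding. Note that the child-sum variant assumed here sums the children's hidden states, so it does not distinguish the order of siblings; in this sketch the layout is carried only by the symbol on the internal node. A variant with separate weights per child position would be needed to model sibling order explicitly.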

