Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese

08/10/2017
by Yuanzhi Ke, et al.

The character vocabulary can be very large in non-alphabetic languages such as Chinese and Japanese, which makes neural network models for processing these languages very large. We explored a sentiment classification model that takes as input the embeddings of the radicals of Chinese characters, i.e., hanzi in Chinese and kanji in Japanese. Our model is composed of a CNN word feature encoder and a bidirectional RNN document feature encoder. The results achieved are on par with the character embedding-based models, and close to the state-of-the-art word embedding-based models, with 90 80 embedding-based models respectively. The results suggest that the radical embedding-based approach is cost-effective for machine learning on Chinese and Japanese.
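The pipeline described above can be sketched as a minimal, untrained forward pass. Characters are decomposed into radicals, a CNN turns each word's radical embeddings into a word vector, and a bidirectional RNN reads the word vectors to produce a document vector for a sentiment score. Everything concrete here is an assumption for illustration: the toy decomposition table `RADICALS`, the hypothetical class `RadicalSentimentModel`, all dimensions, the single filter width with ReLU and max-over-time pooling, and the plain tanh recurrent cell (swapped in for simplicity; the abstract does not state the paper's cell type or hyperparameters). Weights are random, so the output only demonstrates the data flow.

```python
import math
import random

# Hypothetical toy decomposition table (an assumption, not the paper's data):
# each character maps to its radical components.
RADICALS = {
    "好": ["女", "子"],
    "明": ["日", "月"],
    "休": ["亻", "木"],
    "林": ["木", "木"],
    "想": ["相", "心"],
}


def word_to_radicals(word):
    """Flatten a word (a string of characters) into its radical sequence."""
    seq = []
    for ch in word:
        # Characters missing from the table fall back to themselves.
        seq.extend(RADICALS.get(ch, [ch]))
    return seq


class RadicalSentimentModel:
    """Hypothetical sketch: CNN word encoder + bidirectional tanh RNN."""

    def __init__(self, radical_vocab, emb_dim=8, n_filters=6, width=2,
                 hidden=5, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.index = {r: i for i, r in enumerate(sorted(radical_vocab))}
        self.emb = mat(len(self.index) + 1, emb_dim)  # last row: unknown radical
        self.width = width
        self.conv = [mat(width, emb_dim) for _ in range(n_filters)]  # one kernel per filter
        self.W_f, self.U_f = mat(hidden, n_filters), mat(hidden, hidden)  # forward RNN
        self.W_b, self.U_b = mat(hidden, n_filters), mat(hidden, hidden)  # backward RNN
        self.out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden)]
        self.hidden = hidden

    def encode_word(self, word):
        """Convolve over the word's radical embeddings, then max-pool over time."""
        unk = len(self.index)
        vecs = [self.emb[self.index.get(r, unk)] for r in word_to_radicals(word)]
        # Zero-pad short words so at least one convolution window exists.
        while len(vecs) < self.width:
            vecs.append([0.0] * len(self.emb[0]))
        features = []
        for kernel in self.conv:
            best = -math.inf
            for start in range(len(vecs) - self.width + 1):
                window = vecs[start:start + self.width]
                s = sum(k * x for row_k, row_x in zip(kernel, window)
                        for k, x in zip(row_k, row_x))
                best = max(best, max(0.0, s))  # ReLU, then keep the maximum
            features.append(best)
        return features

    @staticmethod
    def _step(W, U, x, h):
        """One tanh RNN step: h' = tanh(W x + U h)."""
        return [math.tanh(sum(w * xi for w, xi in zip(W[i], x)) +
                          sum(u * hj for u, hj in zip(U[i], h)))
                for i in range(len(W))]

    def predict(self, words):
        """Return a probability of positive sentiment for a list of words."""
        feats = [self.encode_word(w) for w in words]
        h_f = [0.0] * self.hidden
        for x in feats:  # left-to-right pass
            h_f = self._step(self.W_f, self.U_f, x, h_f)
        h_b = [0.0] * self.hidden
        for x in reversed(feats):  # right-to-left pass
            h_b = self._step(self.W_b, self.U_b, x, h_b)
        doc = h_f + h_b  # concatenate final forward and backward states
        z = sum(w * d for w, d in zip(self.out, doc))
        return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    vocab = {r for rads in RADICALS.values() for r in rads}
    model = RadicalSentimentModel(vocab)
    print(model.predict(["好", "明", "休"]))
```

Because the embedding table is indexed by radicals rather than by characters or words, its size grows with the radical inventory instead of the character vocabulary, which is where the cost saving argued for above would come from.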


