Chinese Character Decomposition for Neural MT with Multi-Word Expressions

04/09/2021
by Lifeng Han, et al.

Chinese character decomposition has been used as a feature to enhance Machine Translation (MT) models, combining radicals into character- and word-level models. Recent work has investigated ideograph- or stroke-level embeddings. However, questions remain about which decomposition levels of Chinese character representations, radicals or strokes, are best suited for MT. To investigate in detail the impact of Chinese decomposition embeddings at the radical, stroke, and intermediate levels, and how well these decompositions represent the meaning of the original character sequences, we carry out an analysis with both automated and human evaluation of MT. Furthermore, we investigate whether the combination of decomposed Multiword Expressions (MWEs) can enhance model learning. MWE integration into MT has seen more than a decade of exploration. However, decomposed MWEs have not previously been explored.

