Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English

11/06/2020
by Gongbo Tang et al.

Recent work has shown that deeper character-based neural machine translation (NMT) models can outperform subword-based models. However, it is still unclear what makes deeper character-based models successful. In this paper, we investigate pure character-based models on Finnish-to-English translation, exploring their ability to learn word senses and morphological inflections as well as the behavior of the attention mechanism. We demonstrate that word-level information is distributed over the entire character sequence rather than concentrated in a single character, and that characters at different positions play different roles in learning linguistic knowledge. In addition, character-based models need more layers to encode word senses, which explains why only deeper models outperform subword-based models. The attention distribution pattern shows that separators attract a large share of attention, and we therefore explore a sparse word-level attention that forces character hidden states to capture full word-level information. Experimental results show that word-level attention with a single head leads to a drop of 1.2 BLEU points.
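The sparse word-level attention mentioned above can be illustrated with a minimal sketch: attention scores over character hidden states are masked so that only separator positions (word boundaries) receive probability mass, which pressures each separator state to summarize its whole word. This is a hypothetical single-head illustration in numpy, not the authors' exact implementation; `word_level_attention` and its arguments are names chosen for this sketch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def word_level_attention(query, char_states, sep_mask):
    """Single-head attention restricted to separator positions.

    query:       (d,)  decoder query vector
    char_states: (T, d) encoder character hidden states
    sep_mask:    (T,)  boolean, True at word-separator characters

    Non-separator positions are masked out with a large negative
    score, so the context vector is a mixture of separator states only.
    """
    d = char_states.shape[-1]
    scores = char_states @ query / np.sqrt(d)        # (T,)
    scores = np.where(sep_mask, scores, -1e9)        # sparse: separators only
    weights = softmax(scores)                        # ~0 off separators
    context = weights @ char_states                  # (d,)
    return context, weights
```

With this masking, attention weights at non-separator characters are effectively zero while the distribution still sums to one, mirroring the paper's observation that separators attract the attention.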


