Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English

by Gongbo Tang, et al.

Recent work has shown that deeper character-based neural machine translation (NMT) models can outperform subword-based models. However, it is still unclear what makes deeper character-based models successful. In this paper, we investigate pure character-based models for Finnish-to-English translation, examining their ability to learn word senses and morphological inflections as well as their attention mechanism. We demonstrate that word-level information is distributed over the entire character sequence rather than concentrated in a single character, and that characters at different positions play different roles in learning linguistic knowledge. In addition, character-based models need more layers to encode word senses, which explains why only deeper models outperform subword-based models. The attention distribution pattern shows that separators attract a lot of attention, and we explore a sparse word-level attention to force character hidden states to capture the full word-level information. Experimental results show that word-level attention with a single head results in a drop of 1.2 BLEU points.
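
The abstract does not say how the sparse word-level attention is built, so the following is only a minimal sketch of one possible reading: attention is restricted to a single position per word, so that each attended position has to summarize its whole word. Choosing the separator after each word (plus the final character) as that position is an assumption, and the names `word_boundary_mask` and `masked_softmax` are hypothetical rather than taken from the paper.

```python
import math


def word_boundary_mask(chars, sep=" "):
    """Return one True per word: at each separator, and at the final
    character when the sequence does not end with a separator.
    (Assumed design; the paper may pick positions differently.)"""
    mask = [c == sep for c in chars]
    if chars and chars[-1] != sep:
        mask[-1] = True
    return mask


def masked_softmax(scores, mask):
    """Softmax restricted to positions where mask is True.
    Masked-out positions receive exactly zero attention weight."""
    allowed = [s for s, m in zip(scores, mask) if m]
    if not allowed:
        raise ValueError("mask selects no positions")
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy Finnish source sentence, split into characters.
source = list("talo on suuri")
mask = word_boundary_mask(source)

# Placeholder attention scores; a real model would compute these
# from decoder queries and encoder hidden states.
scores = [0.1 * i for i in range(len(source))]
weights = masked_softmax(scores, mask)
```

In this toy example the three words yield three attendable positions, so the attention distribution has at most three non-zero entries regardless of how many characters the sentence contains.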


