Applying the Transformer to Character-level Transduction

05/20/2020
by Shijie Wu, et al.

The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks. The model offers other benefits as well: it trains faster and has fewer parameters. Yet for character-level transduction tasks, e.g., morphological inflection generation and historical text normalization, few have shown success in outperforming recurrent models with the transformer. In an empirical study, we uncover that, in contrast to recurrent sequence-to-sequence models, the batch size plays a crucial role in the performance of the transformer on character-level tasks, and we show that with a large enough batch size, the transformer does indeed outperform recurrent models. We also introduce a simple technique to handle feature-guided character-level transduction that further improves performance. With these insights, we achieve state-of-the-art performance on morphological inflection and historical text normalization. We also show that the transformer outperforms a strong baseline on two other character-level transduction tasks: grapheme-to-phoneme conversion and transliteration. Code is available at https://github.com/shijie-wu/neural-transducer.
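
For readers unfamiliar with feature-guided character-level transduction, the minimal Python sketch below illustrates a common input convention for tasks such as morphological inflection: morphosyntactic feature tags are encoded as special tokens alongside the lemma's characters, forming a single source sequence for the sequence-to-sequence model. This is an illustrative convention only, not the authors' specific technique; the helper name and the <TAG> token format are assumptions, not taken from the neural-transducer codebase.

# Minimal sketch: building a source sequence for feature-guided
# character-level transduction (e.g., morphological inflection).
# The helper name and the <TAG> token format are illustrative assumptions.

def build_source_sequence(lemma, features):
    """Return one token per morphosyntactic feature, then one per character."""
    feature_tokens = ["<" + tag + ">" for tag in features]  # e.g. <V>, <PST>
    char_tokens = list(lemma)                                # e.g. ['w', 'a', 'l', 'k']
    return feature_tokens + char_tokens

if __name__ == "__main__":
    # Inflect "walk" into its past tense: the source encodes lemma + features,
    # and the model is trained to emit the characters of "walked".
    print(build_source_sequence("walk", ["V", "PST"]))
    # ['<V>', '<PST>', 'w', 'a', 'l', 'k']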

