Patching Leaks in the Charformer for Efficient Character-Level Generation

05/27/2022
by Lukas Edman et al.

Character-based representations have important advantages over subword-based ones for morphologically rich languages. They are more robust to noisy input and do not need a separate tokenization step. However, they also have a crucial disadvantage: they notably increase the length of text sequences. The GBST method from Charformer groups (i.e., downsamples) characters to address this, but it allows information to leak when applied to a Transformer decoder. We solve this information leak, thereby enabling character grouping in the decoder. We show that Charformer downsampling has no apparent benefit in NMT over previous downsampling methods in terms of translation quality; however, it can be trained roughly 30% faster. Promising performance on English–Turkish translation indicates the potential of character-level models for morphologically rich languages.
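The leak mentioned in the abstract can be illustrated with a toy example. The sketch below is not GBST and not the authors' fix: it assumes plain mean pooling over fixed, non-overlapping blocks, and the function names (`block_mean_pool`, `upsample`, `leaks_future`) are hypothetical. It only shows why grouping characters is a problem in an autoregressive decoder: a position that shares a block with later characters ends up with a representation that depends on them.

```python
def block_mean_pool(values, block_size):
    """Downsample by averaging non-overlapping blocks of `block_size` items."""
    pooled = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        pooled.append(sum(block) / len(block))
    return pooled


def upsample(pooled, block_size, length):
    """Give every character position the pooled value of its block."""
    return [pooled[i // block_size] for i in range(length)]


def leaks_future(values, block_size, position):
    """Return True if the representation at `position` changes when any
    later character is altered, i.e. if future information leaks in."""
    n = len(values)
    base = upsample(block_mean_pool(values, block_size), block_size, n)[position]
    for later in range(position + 1, n):
        changed = list(values)
        changed[later] += 1.0  # perturb one future character
        probe = upsample(block_mean_pool(changed, block_size), block_size, n)[position]
        if probe != base:
            return True
    return False


# Toy stand-ins for character embeddings (one scalar per character).
chars = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

print(leaks_future(chars, block_size=3, position=0))  # True: shares a block with positions 1 and 2
print(leaks_future(chars, block_size=3, position=2))  # False: last position in its block
```

In this toy setting, only the last position of each block is free of future information, which is why naive grouping cannot be used as-is when the decoder must predict one character at a time.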

