A Call for Prudent Choice of Subword Merge Operations

05/24/2019
by Shuoyang Ding, et al.

Most neural machine translation systems are built upon subword units extracted by methods such as Byte-Pair Encoding (BPE) or wordpiece. However, the number of merge operations is generally chosen by following existing recipes. In this paper, we conduct a systematic exploration of different numbers of BPE merge operations to understand how this choice interacts with the model architecture, the strategy used to build vocabularies, and the language pair. Our exploration could provide guidance for selecting proper BPE configurations in the future. Most prominently, we show that for LSTM-based architectures it is necessary to experiment with a wide range of BPE merge operations, as there is no typical optimal BPE configuration, whereas for Transformer architectures a smaller BPE size tends to be a typically optimal choice. We urge the community to make prudent choices with subword merge operations, as our experiments indicate that a sub-optimal BPE configuration alone could easily reduce system performance by 3-4 BLEU points.
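To make concrete what "number of merge operations" controls, here is a minimal sketch of the standard greedy BPE learning loop on a toy word-frequency table. The function names `learn_bpe` and `segment`, the `</w>` end-of-word marker, and the toy corpus are illustrative assumptions; this is not the paper's experimental code or any particular toolkit's API.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges BPE merge operations (hypothetical helper)."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Greedily pick the most frequent pair (ties go to the first pair seen).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the chosen pair.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = new_vocab.get(tuple(out), 0) + freq
        vocab = new_vocab
    return merges, vocab


def segment(word, merges):
    """Split a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    for n in (0, 2, 10):
        merges, _ = learn_bpe(corpus, n)
        print(n, segment("lowest", merges))
```

With zero merges the word stays split into characters. As the merge count grows, segments get longer and the subword vocabulary grows, which is the granularity knob whose setting the abstract argues should be tuned rather than copied from existing recipes.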


research
05/04/2023

What changes when you randomly choose BPE merge operations? Not much

We introduce three simple randomized variants of byte pair encoding (BPE...
research
04/17/2020

Enriching the Transformer with Linguistic and Semantic Factors for Low-Resource Machine Translation

Introducing factors, that is to say, word features such as linguistic in...
research
11/28/2020

Using Multiple Subwords to Improve English-Esperanto Automated Literary Translation Quality

Building Machine Translation (MT) systems for low-resource languages rem...
research
08/03/2019

The TALP-UPC System for the WMT Similar Language Task: Statistical vs Neural Machine Translation

Although the problem of similar language translation has been an area of...
research
06/29/2023

A Formal Perspective on Byte-Pair Encoding

Byte-Pair Encoding (BPE) is a popular algorithm used for tokenizing data...
research
10/19/2018

Optimizing Segmentation Granularity for Neural Machine Translation

In neural machine translation (NMT), it has become standard to transl...
