In this paper, we investigate the driving factors behind concatenation, ...
With the success of large-scale pre-training and multilingual modeling i...
We rerank with scores from pretrained masked language models like BERT t...
We evaluate three simple, normalization-centric changes to improve Trans...
Neural sequence-to-sequence models, particularly the Transformer, are th...
We explore two solutions to the problem of mistranslating rare words in ...
We present a simple method to improve neural translation of a low-resour...