Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation

09/25/2018
by   Xiang Kong, et al.
Mixture of Softmaxes (MoS) has been shown to be effective at addressing the expressiveness limitation of Softmax-based models. Despite this known advantage, MoS is held back in practice by its heavy memory and computation cost, which stems from the need to compute multiple Softmaxes. In this work, we set out to unleash the power of MoS in practical applications by investigating improved word coding schemes, which can effectively reduce the vocabulary size and hence relieve the memory and computation burden. We show that both BPE and our proposed Hybrid-LightRNN lead to improved encoding mechanisms that can halve the time and memory consumption of MoS without performance loss. With MoS, we achieve an improvement of 1.5 BLEU on the IWSLT 2014 German-to-English corpus and an improvement of 0.76 CIDEr on image captioning. Moreover, on the larger WMT 2014 machine translation dataset, our MoS-boosted Transformer yields 29.5 BLEU on English-to-German and 42.1 BLEU on English-to-French, outperforming the single-Softmax Transformer by 0.8 and 0.4 BLEU respectively and achieving a state-of-the-art result on the WMT 2014 English-to-German task.
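
To make the cost argument concrete, the sketch below shows a minimal Mixture of Softmaxes output layer in PyTorch. It is an illustrative reconstruction, not the authors' implementation; the class name, hyperparameters (n_mix, d_model, vocab_size), and use of a shared output projection are assumptions. It highlights why MoS scales with vocabulary size: each mixture component computes a full softmax over the vocabulary, so shrinking the vocabulary (e.g. via BPE or Hybrid-LightRNN) directly reduces memory and compute.

    # Minimal sketch of a Mixture of Softmaxes (MoS) output layer.
    # Illustrative only: names and sizes here are assumptions, not the paper's code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixtureOfSoftmaxes(nn.Module):
        def __init__(self, d_model, vocab_size, n_mix=4):
            super().__init__()
            self.n_mix = n_mix
            self.d_model = d_model
            # Mixture-weight (prior) projection: one logit per component.
            self.prior = nn.Linear(d_model, n_mix)
            # Latent projection: one context vector per mixture component.
            self.latent = nn.Linear(d_model, n_mix * d_model)
            # Output projection over the (possibly BPE-reduced) vocabulary.
            self.out = nn.Linear(d_model, vocab_size)

        def forward(self, h):
            # h: (batch, d_model) decoder hidden state.
            pi = F.softmax(self.prior(h), dim=-1)               # (batch, n_mix)
            latent = torch.tanh(self.latent(h))                 # (batch, n_mix * d_model)
            latent = latent.view(-1, self.n_mix, self.d_model)  # (batch, n_mix, d_model)
            # One full softmax over the vocabulary per component: this is the
            # term whose cost a smaller vocabulary alleviates.
            probs = F.softmax(self.out(latent), dim=-1)         # (batch, n_mix, vocab)
            # Mixture of the per-component distributions.
            return torch.einsum("bk,bkv->bv", pi, probs)        # (batch, vocab)

    # Example usage with placeholder sizes:
    # mos = MixtureOfSoftmaxes(d_model=512, vocab_size=32000, n_mix=4)
    # word_probs = mos(torch.randn(8, 512))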
