Exploring Hyper-Parameter Optimization for Neural Machine Translation on GPU Architectures

05/05/2018
by Robert Lim, et al.

Neural machine translation (NMT) has advanced past statistical approaches thanks to deep neural networks, enabled by the availability and programmability of commodity heterogeneous computing architectures such as FPGAs and GPUs, and by the massive training corpora generated by news outlets, government agencies, and social media. Training a neural network entails tuning hyper-parameters to yield the best performance. Unfortunately, the hyper-parameters for machine translation include discrete categories as well as continuous options, making the search space combinatorially explosive. This research explores hyper-parameter optimization when training deep neural networks for machine translation. Specifically, our work investigates training a language model with Marian NMT. Results compare NMT under various hyper-parameter settings across several modern GPU architecture generations, in single-node and multi-node configurations, revealing which hyper-parameters matter most for performance, measured as words processed per second, convergence rate, and translation accuracy, and offering guidance on how to build high-performing NMT systems.
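To make the combinatorial explosion concrete, the sketch below enumerates a small mixed discrete/continuous search space and contrasts exhaustive grid search with random sampling. The hyper-parameter names and value ranges are illustrative assumptions for this example, not Marian NMT's actual options or the paper's search harness.

```python
import random

# Hypothetical hyper-parameter space for NMT training: discrete
# categories (optimizer, RNN cell type) mixed with numeric options
# (learning rate, mini-batch size, model dimension). Values are
# illustrative, not Marian NMT defaults.
SPACE = {
    "optimizer": ["adam", "sgd", "adagrad"],
    "cell_type": ["lstm", "gru"],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [32, 64, 128],
    "model_dim": [256, 512, 1024],
}

def grid_size(space):
    """Number of configurations an exhaustive grid search would train."""
    n = 1
    for values in space.values():
        n *= len(values)
    return n

def random_search(space, trials, seed=0):
    """Sample `trials` configurations uniformly at random instead of
    enumerating the full grid; each sample is a dict of choices."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()}
            for _ in range(trials)]

if __name__ == "__main__":
    # 3 * 2 * 4 * 3 * 3 = 216 configurations for even this tiny space;
    # each one costs a full NMT training run, hence the need to sample.
    print(grid_size(SPACE))
    for cfg in random_search(SPACE, trials=3):
        print(cfg)
```

Even five hyper-parameters with a handful of values each yield hundreds of full training runs under grid search, which is why sampling-based strategies are the practical choice when each configuration costs hours of GPU time.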

Related research:

- On the Sparsity of Neural Machine Translation Models (10/06/2020): Modern neural machine translation (NMT) models employ a large number of ...
- Massive Exploration of Neural Machine Translation Architectures (03/11/2017): Neural Machine Translation (NMT) has shown remarkable progress over the ...
- Neural Machine Translation on Scarce-Resource Condition: A case-study on Persian-English (01/07/2017): Neural Machine Translation (NMT) is a new approach for Machine Translati...
- Optimizing transformer-based machine translation model for single GPU training: a hyperparameter ablation study (08/11/2023): In machine translation tasks, the relationship between model complexity ...
- Towards Understanding Neural Machine Translation with Word Importance (09/01/2019): Although neural machine translation (NMT) has advanced the state-of-the-...
- Can Neural Networks Learn Symbolic Rewriting? (11/07/2019): This work investigates if the current neural architectures are adequate ...
- Distributed Training and Optimization Of Neural Networks (12/03/2020): Deep learning models are yielding increasingly better performances thank...
