09/30/2020
Learning Hard Retrieval Cross Attention for Transformer
The Transformer translation model that is based on the multi-head attention...
07/13/2020
Transformer with Depth-Wise LSTM
Increasing the depth of neural models allows them to model complicated...
06/25/2020
Learning Source Phrase Representations for Neural Machine Translation
The Transformer translation model (Vaswani et al., 2017) based on a mult...
05/05/2020
Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
The choice of hyper-parameters affects the performance of neural models...
03/21/2020
Analyzing Word Translation of Transformer Layers
The Transformer translation model is popular for its effective paralleli...
11/08/2019
Why Deep Transformers are Difficult to Converge? From Computation Order to Lipschitz Restricted Parameter Initialization
The Transformer translation model employs residual connections and layer...
08/09/2019
UdS Submission for the WMT 19 Automatic Post-Editing Task
In this paper, we describe our submission to the English-German APE shar...