Input Combination Strategies for Multi-Source Transformer Decoder

11/12/2018
by Jindřich Libovický, et al.

In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways. This topic has been thoroughly studied for recurrent architectures. In this paper, we extend the previous work to the encoder-decoder attention in the Transformer architecture. We propose four different input combination strategies for the encoder-decoder attention: serial, parallel, flat, and hierarchical. We evaluate our methods on tasks of multimodal translation and translation with multiple source languages. The experiments show that the models are able to use multiple sources and improve over single-source baselines.
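The abstract only names the four strategies, so the following is a minimal PyTorch sketch of how encoder-decoder attention over two sources could be wired for each of them. It is an illustration rather than the authors' released code: residual connections, layer normalization, dropout, and attention masking are omitted, and all class, parameter, and variable names are assumptions.

```python
# Sketch of the four multi-source combination strategies for the
# Transformer's encoder-decoder attention (two encoders assumed).
import torch
import torch.nn as nn


class MultiSourceCrossAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, strategy: str = "serial"):
        super().__init__()
        self.strategy = strategy
        # One cross-attention block per source (used by serial, parallel, hierarchical).
        self.attn_a = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Second-level attention over the per-source contexts (hierarchical only).
        self.attn_hier = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, queries, enc_a, enc_b):
        # queries: decoder states, shape (batch, tgt_len, d_model)
        # enc_a, enc_b: encoder outputs of the two sources
        if self.strategy == "serial":
            # Attend to source A, then use the result as the query for source B.
            ctx, _ = self.attn_a(queries, enc_a, enc_a)
            ctx, _ = self.attn_b(ctx, enc_b, enc_b)
            return ctx
        if self.strategy == "parallel":
            # Attend to each source with the same query and sum the contexts.
            ctx_a, _ = self.attn_a(queries, enc_a, enc_a)
            ctx_b, _ = self.attn_b(queries, enc_b, enc_b)
            return ctx_a + ctx_b
        if self.strategy == "flat":
            # Concatenate all encoder states and attend over them in one pass.
            enc_all = torch.cat([enc_a, enc_b], dim=1)
            ctx, _ = self.attn_a(queries, enc_all, enc_all)
            return ctx
        if self.strategy == "hierarchical":
            # Per-source attention first, then a second attention over the
            # two resulting context vectors for every decoder position.
            ctx_a, _ = self.attn_a(queries, enc_a, enc_a)
            ctx_b, _ = self.attn_b(queries, enc_b, enc_b)
            stacked = torch.stack([ctx_a, ctx_b], dim=2)   # (B, T, 2, D)
            b, t, s, d = stacked.shape
            memory = stacked.reshape(b * t, s, d)          # length-2 memory per query
            q = queries.reshape(b * t, 1, d)
            ctx, _ = self.attn_hier(q, memory, memory)
            return ctx.reshape(b, t, d)
        raise ValueError(f"unknown strategy: {self.strategy}")


# Example usage with made-up shapes (e.g. multimodal translation:
# a text encoder and an image-region encoder feeding one decoder layer).
layer = MultiSourceCrossAttention(strategy="parallel")
dec_states = torch.randn(4, 7, 512)    # decoder states
enc_text = torch.randn(4, 20, 512)     # source-language encoder states
enc_image = torch.randn(4, 49, 512)    # image-region features
out = layer(dec_states, enc_text, enc_image)   # (4, 7, 512)
```

In this reading, serial stacks one cross-attention sublayer per source, parallel and hierarchical differ only in how the per-source contexts are merged (summation versus a learned second-level attention), and flat treats all encoder states as a single joint memory.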

