Double Path Networks for Sequence to Sequence Learning

06/13/2018
by   Kaitao Song, et al.
Nanjing University
Microsoft
Peking University

Encoder-decoder based sequence-to-sequence learning (S2S) has made remarkable progress in recent years. Different network architectures have been used in the encoder/decoder; among them, Convolutional Neural Networks (CNN) and Self-Attention Networks (SAN) are the most prominent. The two architectures achieve similar performance but encode and decode context in very different ways: CNNs use convolutional layers to focus on the local connectivity of the sequence, while SANs use self-attention layers to capture global semantics. In this work we propose Double Path Networks for Sequence to Sequence learning (DPN-S2S), which leverage the advantages of both models through double-path information fusion. During encoding, a double-path architecture maintains the information coming from the two paths, with convolutional layers and self-attention layers kept separate. To effectively use the encoded context, we develop a cross-attention module with gating that automatically selects the information needed during decoding. By deeply integrating the two paths with cross attention, both types of information are combined and well exploited. Experiments show that our proposed method significantly improves sequence-to-sequence performance over state-of-the-art systems.
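The abstract does not give the exact formulation of the gated cross-attention module, but the idea it describes can be sketched as follows: a decoder query attends to the convolutional-path and self-attention-path encodings separately, and a learned sigmoid gate mixes the two resulting contexts. All names (`attend`, `gated_cross_attention`, `w_gate`, `b_gate`) are illustrative assumptions, not the paper's API; this is a minimal NumPy sketch, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys):
    # scaled dot-product attention of one query vector over one path's
    # encoder states; keys has shape (T, d), query has shape (d,)
    d = keys.shape[-1]
    scores = keys @ query / np.sqrt(d)   # (T,)
    weights = softmax(scores)            # attention distribution over positions
    return weights @ keys                # context vector, shape (d,)

def gated_cross_attention(query, conv_path, san_path, w_gate, b_gate):
    # attend to each path separately, then mix the two contexts with a
    # sigmoid gate computed from their concatenation (hypothetical form)
    c_conv = attend(query, conv_path)
    c_san = attend(query, san_path)
    gate = 1.0 / (1.0 + np.exp(-(w_gate @ np.concatenate([c_conv, c_san]) + b_gate)))
    return gate * c_conv + (1.0 - gate) * c_san
```

With a strongly positive gate bias the output collapses to the convolutional context, and with a strongly negative bias to the self-attention context; in between, the gate lets the decoder pick up whichever mixture of local and global information it needs at each step.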
