R-Transformer: Recurrent Neural Network Enhanced Transformer

07/12/2019
by Zhiwei Wang, et al.

Recurrent Neural Networks have long been the dominant choice for sequence modeling. However, they suffer from two severe issues: they struggle to capture very long-term dependencies, and their sequential computation cannot be parallelized. Therefore, many non-recurrent sequence models built on convolution and attention operations have been proposed recently. Notably, models with multi-head attention, such as the Transformer, have proven extremely effective at capturing long-term dependencies in a variety of sequence modeling tasks. Despite their success, however, these models lack the components needed to model local structures in sequences and rely heavily on position embeddings, which have limited effect and require considerable design effort. In this paper, we propose the R-Transformer, which enjoys the advantages of both RNNs and the multi-head attention mechanism while avoiding their respective drawbacks. The proposed model can effectively capture both local structures and global long-term dependencies in sequences without using any position embeddings. We evaluate the R-Transformer through extensive experiments on data from a wide range of domains, and the empirical results show that it outperforms state-of-the-art methods by a large margin on most tasks. We have made the code publicly available at <https://github.com/DSE-MSU/R-transformer>.
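As a rough illustration of the idea in the abstract, the sketch below implements one R-Transformer-style layer in PyTorch: a small RNN summarizes each short causal window of the sequence (supplying local structure in place of position embeddings), and standard multi-head self-attention then models global long-term dependencies. This is a minimal sketch, not the authors' implementation; the names `LocalRNN`, `RTransformerBlock`, `window`, and `d_model` are hypothetical and may not match the official repository.

```python
import torch
import torch.nn as nn

class LocalRNN(nn.Module):
    """Run a small GRU over the causal window ending at each position and
    keep only the final hidden state per position (hypothetical sketch)."""
    def __init__(self, d_model, window):
        super().__init__()
        self.window = window
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        b, n, d = x.shape
        # Left-pad with zeros so every position sees a full causal window.
        pad = x.new_zeros(b, self.window - 1, d)
        x_padded = torch.cat([pad, x], dim=1)      # (b, n + window - 1, d)
        # Extract the length-`window` slice ending at each position.
        windows = x_padded.unfold(1, self.window, 1)        # (b, n, d, window)
        windows = windows.permute(0, 1, 3, 2).reshape(b * n, self.window, d)
        _, h = self.rnn(windows)                   # h: (1, b*n, d)
        return h.squeeze(0).view(b, n, d)

class RTransformerBlock(nn.Module):
    """One layer: LocalRNN -> multi-head self-attention -> feed-forward,
    each sublayer with a residual connection and LayerNorm."""
    def __init__(self, d_model=64, n_heads=4, window=7):
        super().__init__()
        self.local_rnn = LocalRNN(d_model, window)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Local structure from the RNN stands in for position embeddings.
        x = self.norm1(x + self.local_rnn(x))
        # Global dependencies via self-attention (an autoregressive task
        # would additionally pass a causal attn_mask here).
        a, _ = self.attn(x, x, x)
        x = self.norm2(x + a)
        return self.norm3(x + self.ff(x))

x = torch.randn(2, 16, 64)                         # (batch, seq_len, d_model)
print(RTransformerBlock()(x).shape)                # torch.Size([2, 16, 64])
```

Because each position's representation is built from its own local window, the model is order-aware without any position embeddings, and the short fixed-length RNN runs over all windows in parallel rather than across the whole sequence.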


Related research:

07/12/2022 · Multi-Behavior Hypergraph-Enhanced Transformer for Sequential Recommendation
Learning dynamic user preference has become an increasingly important co...

02/21/2020 · Transformer Hawkes Process
Modern data acquisition routinely produces massive amounts of event seque...

03/26/2021 · A Practical Survey on Faster and Lighter Transformers
Recurrent neural networks are effective models to process sequences. How...

07/05/2019 · A Bi-directional Transformer for Musical Chord Recognition
Chord recognition is an important task since chords are highly abstract ...

08/22/2022 · ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition
Prototypical part network (ProtoPNet) has drawn wide attention and boost...

06/08/2023 · Sequence-to-Sequence Model with Transformer-based Attention Mechanism and Temporal Pooling for Non-Intrusive Load Monitoring
This paper presents a novel Sequence-to-Sequence (Seq2Seq) model based o...

03/30/2022 · A Fast Transformer-based General-Purpose Lossless Compressor
Deep-learning-based compressors have received interest recently due to mu...
