RRWKV: Capturing Long-range Dependencies in RWKV

06/08/2023
by Leilei Wang, et al.

Owing to their powerful dot-product attention, Transformers have been the dominant architecture in a wide range of natural language processing (NLP) tasks. Recently, the Receptance Weighted Key Value (RWKV) architecture has adopted a non-transformer design to eliminate the main drawback of dot-product attention: memory and computational complexity that scale quadratically with sequence length. Although RWKV exploits a linear tensor-product attention mechanism and achieves parallelized computation by deploying a time-sequential mode, it fails to capture long-range dependencies because of its limited ability to look back at previous information, compared with the full information obtained through direct pairwise interactions in the standard transformer. This paper therefore devises the Retrospected Receptance Weighted Key Value (RRWKV) architecture, which incorporates a retrospecting ability into RWKV to absorb information effectively while maintaining memory and computational efficiency.
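To make the complexity contrast concrete, below is a minimal sketch of the kind of linear, time-sequential key-value recurrence that RWKV-style time mixing performs. It is written against the publicly documented RWKV formulation rather than the RRWKV paper itself, and the function name wkv_recurrent and the parameters w, u, k, v are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch of an RWKV-style time-mixing (WKV) recurrence.
# Illustrative only: names and shapes are assumptions based on the
# public RWKV formulation, not the RRWKV paper.
import torch

def wkv_recurrent(w: torch.Tensor, u: torch.Tensor,
                  k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Accumulate an exponentially decayed key-value summary over time.

    w: (C,) per-channel decay rate (larger -> faster forgetting)
    u: (C,) bonus weight applied to the current token's key
    k, v: (T, C) keys and values for a sequence of length T
    Returns a (T, C) output; cost is O(T) in time with a fixed-size state,
    unlike the O(T^2) cost of full dot-product attention.
    """
    T, C = k.shape
    num = torch.zeros(C)   # running numerator: decayed sum of exp(k_i) * v_i
    den = torch.zeros(C)   # running denominator: decayed sum of exp(k_i)
    out = torch.empty(T, C)
    decay = torch.exp(-w)  # per-step, per-channel exponential decay factor
    for t in range(T):
        cur = torch.exp(u + k[t])                   # extra weight on the current token
        out[t] = (num + cur * v[t]) / (den + cur)   # normalized mix of past and present
        num = decay * num + torch.exp(k[t]) * v[t]  # fold current token into the state
        den = decay * den + torch.exp(k[t])
    return out

# Example: 8 tokens, 4 channels. This naive version is not numerically
# stabilized (real implementations subtract a running max before exp).
T, C = 8, 4
out = wkv_recurrent(w=torch.ones(C), u=torch.zeros(C),
                    k=torch.randn(T, C), v=torch.randn(T, C))
print(out.shape)  # torch.Size([8, 4])
```

Note how the only information carried forward is the pair of running sums (num, den): this fixed-size, decayed state is precisely the limited view of previous tokens that the retrospecting mechanism in RRWKV is intended to compensate for.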


Related research

02/16/2022  The NLP Task Effectiveness of Long-Range Transformers
Transformer models cannot easily scale to long sequences due to their O(...

12/15/2022  Efficient Long Sequence Modeling via State Space Augmented Transformer
Transformer models have achieved superior performance in various natural...

02/25/2019  Star-Transformer
Although the fully-connected attention-based model Transformer has achie...

09/12/2017  Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional Random Fields
Despite successful applications across a broad range of NLP tasks, condi...

10/08/2022  Hierarchical Graph Transformer with Adaptive Node Sampling
The Transformer architecture has achieved remarkable success in a number...

10/06/2021  Ripple Attention for Visual Perception with Sub-quadratic Complexity
Transformer architectures are now central to modeling in natural languag...

03/26/2021  A Practical Survey on Faster and Lighter Transformers
Recurrent neural networks are effective models to process sequences. How...
