Hierarchical Attention Encoder Decoder

06/01/2023
by   Asier Mujika, et al.
0

Recent advances in large language models have shown that autoregressive modeling can generate complex and novel sequences that have many real-world applications. However, these models must generate outputs autoregressively, which becomes time-consuming when dealing with long sequences. Hierarchical autoregressive approaches that compress data have been proposed as a solution, but these methods still generate outputs at the original data frequency, resulting in slow and memory-intensive models. In this paper, we propose a model based on the Hierarchical Recurrent Encoder Decoder (HRED) architecture. This model independently encodes input sub-sequences without global context, processes these sequences using a lower-frequency model, and decodes outputs at the original data frequency. By interpreting the encoder as an implicitly defined embedding matrix and using sampled softmax estimation, we develop a training algorithm that can train the entire model without a high-frequency decoder, which is the most memory and compute-intensive part of hierarchical approaches. In a final, brief phase, we train the decoder to generate data at the original granularity. Our algorithm significantly reduces memory requirements for training autoregressive models and it also improves the total training wall-clock time.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/09/2023

An Exploration of Encoder-Decoder Approaches to Multi-Label Classification for Legal and Biomedical Text

Standard methods for multi-label text classification largely rely on enc...
research
06/03/2019

Masked Non-Autoregressive Image Captioning

Existing captioning models often adopt the encoder-decoder architecture,...
research
06/10/2016

Length bias in Encoder Decoder Models and a Case for Global Conditioning

Encoder-decoder networks are popular for modeling sequences probabilisti...
research
10/07/2022

A Unified Encoder-Decoder Framework with Entity Memory

Entities, as important carriers of real-world knowledge, play a key role...
research
01/08/2021

A Reinforcement Learning Based Encoder-Decoder Framework for Learning Stock Trading Rules

A wide variety of deep reinforcement learning (DRL) models have recently...
research
07/26/2018

Variational Memory Encoder-Decoder

Introducing variability while maintaining coherence is a core task in le...

Please sign up or login with your details

Forgot password? Click here to reset