Highway Transformer: Self-Gating Enhanced Self-Attentive Networks

04/17/2020
by Yekun Chai, et al.

Self-attention mechanisms have achieved striking state-of-the-art (SOTA) progress on a variety of sequence learning tasks, building on multi-headed dot-product attention that attends to global context at every position. Through a pseudo information highway, we introduce a gated component, the self-dependency unit (SDU), which incorporates LSTM-styled gating to replenish the internal semantic importance within the multi-dimensional latent space of individual representations. These subsidiary content-based SDU gates let modulated latent embeddings flow through skip connections, yielding a clear margin of improvement in convergence speed under gradient-descent optimization. We further examine the role of the gating mechanism in aiding context-based Transformer modules, hypothesizing that SDU gates, especially in shallow layers, push the model more quickly toward suboptimal points during optimization.
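The abstract describes SDU as an LSTM-styled gate whose output is fed back through a skip connection, but it does not spell out the exact equations. The snippet below is a minimal sketch of one plausible reading, assuming a sigmoid gate modulating a tanh-transformed projection of the same hidden states; the class and attribute names (SDUGate, gate_proj, content_proj) are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn as nn


class SDUGate(nn.Module):
    """Sketch of a self-dependency unit (SDU) style gate.

    Hypothetical formulation: a sigmoid gate modulates a tanh-transformed
    copy of the input, and the result is added back onto the residual
    stream (the "pseudo information highway").
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_model)     # produces the sigmoid gate
        self.content_proj = nn.Linear(d_model, d_model)  # produces the tanh candidate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gate = torch.sigmoid(self.gate_proj(x))
        content = torch.tanh(self.content_proj(x))
        # Skip connection: original representation plus gated content.
        return x + gate * content


if __name__ == "__main__":
    layer = SDUGate(d_model=512)
    hidden = torch.randn(2, 16, 512)   # (batch, seq_len, d_model)
    print(layer(hidden).shape)         # torch.Size([2, 16, 512])
```

In this reading, such a gate would sit alongside the self-attention sublayer of each Transformer block, letting each token re-weight its own feature dimensions independently of context before the attention output is combined.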

