Why self-attention is Natural for Sequence-to-Sequence Problems? A Perspective from Symmetries

10/13/2022
by   Chao Ma, et al.

In this paper, we show that structures similar to self-attention are natural for learning many sequence-to-sequence problems from the perspective of symmetry. Inspired by language processing applications, we study the orthogonal equivariance of seq2seq functions with knowledge, which are functions taking two inputs – an input sequence and a “knowledge” – and outputting another sequence. The knowledge consists of a set of vectors in the same embedding space as the input sequence, containing the information about the language used to process the input sequence. We show that orthogonal equivariance in the embedding space is natural for seq2seq functions with knowledge, and under such equivariance the function must take a form close to self-attention. This shows that network structures similar to self-attention are the right structures to represent the target function of many seq2seq problems. The representation can be further refined if a “finite information principle” is considered, or if permutation equivariance holds among the elements of the input sequence.
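To make the connection concrete, the sketch below (not taken from the paper) builds an attention-style map from nothing but inner products between the input sequence X and the knowledge set K. Since inner products are invariant under any orthogonal transform Q of the embedding space, such a map is automatically orthogonally equivariant: f(XQ, KQ) = f(X, K)Q. The function name `attention_like` and the particular softmax weighting are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def attention_like(X, K):
    # X: (n, d) input sequence, K: (m, d) knowledge vectors.
    scores = X @ K.T                                 # pairwise inner products, shape (n, m)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over the knowledge set
    return weights @ K                               # output sequence, shape (n, d)

# Check orthogonal equivariance with a random orthogonal matrix Q.
rng = np.random.default_rng(0)
n, m, d = 4, 6, 8
X, K = rng.normal(size=(n, d)), rng.normal(size=(m, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))         # random orthogonal matrix
lhs = attention_like(X @ Q, K @ Q)
rhs = attention_like(X, K) @ Q
print(np.allclose(lhs, rhs))  # True: rotating the embedding space commutes with the map
```

The equivariance follows because the scores X Q (K Q)ᵀ = X K ᵀ are unchanged by Q, so only the output values are rotated, mirroring the abstract's claim that inner-product (self-attention-like) structures are the natural orthogonally equivariant form.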


research
12/20/2019

Are Transformers universal approximators of sequence-to-sequence functions?

Despite the widespread adoption of Transformer models for NLP tasks, the...
research
12/04/2019

Integrating Whole Context to Sequence-to-sequence Speech Recognition

Because an attention based sequence-to-sequence speech (Seq2Seq) recogni...
research
04/28/2018

Data-Driven Methods for Solving Algebra Word Problems

We explore contemporary, data-driven techniques for solving math word pr...
research
05/16/2020

Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

Transformer-based acoustic modeling has achieved great success for both...
research
10/19/2021

Inductive Biases and Variable Creation in Self-Attention Mechanisms

Self-attention, an architectural motif designed to model long-range inte...
research
11/28/2019

Self-attention with Functional Time Representation Learning

Sequential modelling with self-attention has achieved cutting edge perfo...
research
01/28/2022

O-ViT: Orthogonal Vision Transformer

Inspired by the tremendous success of the self-attention mechanism in na...
