Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input

05/30/2023
by Shokichi Takakura, et al.

Despite the great success of Transformer networks in applications such as natural language processing and computer vision, their theoretical aspects are not well understood. In this paper, we study the approximation and estimation ability of Transformers as sequence-to-sequence functions with infinite dimensional inputs. Although both inputs and outputs are infinite dimensional, we show that when the target function has anisotropic smoothness, Transformers can avoid the curse of dimensionality thanks to their feature extraction ability and parameter sharing. In addition, we show that even if the smoothness varies with the input, Transformers can estimate the importance of features for each input and extract important features dynamically. We then show that Transformers achieve a convergence rate similar to that of the fixed-smoothness case. Our theoretical results support the practical success of Transformers on high dimensional data.
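As an informal illustration of the "dynamic feature extraction" mentioned in the abstract, the sketch below (not taken from the paper; all names, sizes, and weights are hypothetical) shows how a single self-attention head produces input-dependent weights over the sequence positions, which can be read as per-input feature importances.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(X, W_q, W_k, W_v):
    """One attention head: the softmax weights are input-dependent
    importance scores over the tokens, i.e. a form of dynamic feature
    extraction. This is a generic sketch, not the paper's construction."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_tokens, n_tokens)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy usage with hypothetical sizes: 5 tokens of dimension 8, head width 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, weights = single_head_attention(X, W_q, W_k, W_v)
print(weights.round(2))  # row i: how much each token contributes to output i
```

Because the same weight matrices are applied to every token, the layer's parameter count does not grow with the sequence length, which is the parameter-sharing property the abstract refers to.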

Related research

- Sumformer: Universal Approximation for Efficient Transformers (07/05/2023)
  Natural language processing (NLP) made an impressive jump with the intro...
- Are Transformers universal approximators of sequence-to-sequence functions? (12/20/2019)
  Despite the widespread adoption of Transformer models for NLP tasks, the...
- Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines (06/02/2021)
  Despite their ubiquity in core AI fields like natural language processin...
- Your Transformer May Not be as Powerful as You Expect (05/26/2022)
  Relative Positional Encoding (RPE), which encodes the relative distance ...
- Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space (09/23/2020)
  Deep learning has achieved notable success in various fields, including ...
- Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers (01/01/2021)
  The advent of the Transformer can arguably be described as a driving for...
- Universality and Limitations of Prompt Tuning (05/30/2023)
  Despite the demonstrated empirical efficacy of prompt tuning to adapt a ...
