On the Computational Power of Decoder-Only Transformer Language Models

05/26/2023
by Jesse Roberts, et al.

This article presents a theoretical evaluation of the computational universality of decoder-only transformer models. We extend the theoretical literature on transformer models and show that decoder-only transformer architectures (even with only a single layer and a single attention head) are Turing complete under reasonable assumptions. From this analysis, we further show that sparsity/compressibility of the word embedding is a necessary condition for Turing completeness to hold.
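To make concrete what "decoder-only transformer with a single layer and a single attention head" refers to, the following NumPy sketch implements a minimal causal (decoder-only) block. It is only an informal illustration of the architecture under discussion, not the paper's Turing-completeness construction; the weight shapes, the ReLU feed-forward, and the omission of layer normalisation are simplifying assumptions.

```python
# Minimal sketch (illustrative assumptions throughout): one decoder-only layer
# with a single causal self-attention head and a position-wise feed-forward.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head attention in which position t may only attend to positions <= t."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # (T, T) attention logits
    mask = np.triu(np.ones_like(scores, dtype=bool), 1)  # True strictly above the diagonal
    scores = np.where(mask, -np.inf, scores)             # forbid attending to future tokens
    return softmax(scores, axis=-1) @ V

def decoder_block(X, p):
    """One decoder-only layer: causal attention then feed-forward, each with a residual
    connection (layer norm omitted for brevity)."""
    A = causal_self_attention(X, p["Wq"], p["Wk"], p["Wv"])
    H = X + A @ p["Wo"]                                  # residual around attention
    F = np.maximum(0.0, H @ p["W1"]) @ p["W2"]           # ReLU feed-forward
    return H + F                                         # residual around feed-forward

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 5, 8                                          # toy sequence length and model width
    X = rng.normal(size=(T, d))                          # toy token embeddings
    params = {k: rng.normal(scale=0.1, size=(d, d))
              for k in ("Wq", "Wk", "Wv", "Wo", "W1", "W2")}
    print(decoder_block(X, params).shape)                # (5, 8)
```

Autoregressive generation with such a block feeds the model's own outputs back as inputs, which is the setting in which the paper's universality claim is stated.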

Related research

On the Turing Completeness of Modern Neural Network Architectures (01/10/2019)
Alternatives to recurrent neural networks, in particular, architectures ...

A Family of Pretrained Transformer Language Models for Russian (09/19/2023)
Nowadays, Transformer language models (LMs) represent a fundamental comp...

Syntax Evolution: Problems and Recursion (08/12/2015)
Why did only we humans evolve Turing completeness? Turing completeness i...

Visualizing Attention in Transformer-Based Language Models (04/04/2019)
We present an open-source tool for visualizing multi-head self-attention...

Parameter Sharing Decoder Pair for Auto Composing (10/31/2019)
Auto Composing is an active and appealing research area in the past few ...

On the Computational Power of Transformers and Its Implications in Sequence Modeling (06/16/2020)
Transformers are being used extensively across several sequence modeling...

PGT: Pseudo Relevance Feedback Using a Graph-Based Transformer (01/20/2021)
Most research on pseudo relevance feedback (PRF) has been done in vector...
