On the Computational Power of Decoder-Only Transformer Language Models

05/26/2023
by Jesse Roberts, et al.

This article presents a theoretical evaluation of the computational universality of decoder-only transformer models. We extend the theoretical literature on transformer models and show that decoder-only transformer architectures (even with only a single layer and a single attention head) are Turing complete under reasonable assumptions. From this analysis, we further show that sparsity/compressibility of the word embedding is a necessary condition for Turing completeness to hold.
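To make concrete what "decoder-only transformer with a single layer and a single attention head" refers to, the following NumPy sketch implements a minimal causal (decoder-only) block. It is only an informal illustration of the architecture under discussion, not the paper's Turing-completeness construction; the weight shapes, the ReLU feed-forward, and the omission of layer normalisation are simplifying assumptions.

```python
# Minimal sketch (illustrative assumptions throughout): one decoder-only layer
# with a single causal self-attention head and a position-wise feed-forward.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head attention in which position t may only attend to positions <= t."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # (T, T) attention logits
    mask = np.triu(np.ones_like(scores, dtype=bool), 1)  # True strictly above the diagonal
    scores = np.where(mask, -np.inf, scores)             # forbid attending to future tokens
    return softmax(scores, axis=-1) @ V

def decoder_block(X, p):
    """One decoder-only layer: causal attention then feed-forward, each with a residual
    connection (layer norm omitted for brevity)."""
    A = causal_self_attention(X, p["Wq"], p["Wk"], p["Wv"])
    H = X + A @ p["Wo"]                                  # residual around attention
    F = np.maximum(0.0, H @ p["W1"]) @ p["W2"]           # ReLU feed-forward
    return H + F                                         # residual around feed-forward

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 5, 8                                          # toy sequence length and model width
    X = rng.normal(size=(T, d))                          # toy token embeddings
    params = {k: rng.normal(scale=0.1, size=(d, d))
              for k in ("Wq", "Wk", "Wv", "Wo", "W1", "W2")}
    print(decoder_block(X, params).shape)                # (5, 8)
```

Autoregressive generation with such a block feeds the model's own outputs back as inputs, which is the setting in which the paper's universality claim is stated.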

Related research

On the Turing Completeness of Modern Neural Network Architectures (01/10/2019)
Alternatives to recurrent neural networks, in particular, architectures ...

A Family of Pretrained Transformer Language Models for Russian (09/19/2023)
Nowadays, Transformer language models (LMs) represent a fundamental comp...

Syntax Evolution: Problems and Recursion (08/12/2015)
Why did only we humans evolve Turing completeness? Turing completeness i...

Visualizing Attention in Transformer-Based Language Models (04/04/2019)
We present an open-source tool for visualizing multi-head self-attention...

Parameter Sharing Decoder Pair for Auto Composing (10/31/2019)
Auto Composing is an active and appealing research area in the past few ...

On the Computational Power of Transformers and Its Implications in Sequence Modeling (06/16/2020)
Transformers are being used extensively across several sequence modeling...

PGT: Pseudo Relevance Feedback Using a Graph-Based Transformer (01/20/2021)
Most research on pseudo relevance feedback (PRF) has been done in vector...
