Transformers Learn Shortcuts to Automata

10/19/2022
by Bingbin Liu, et al.

Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine. However, Transformer models, while lacking recurrence, are able to perform such reasoning using far fewer layers than the number of reasoning steps. This raises the question: what solutions are these shallow and non-recurrent models finding? We investigate this question in the setting of learning automata, discrete dynamical systems naturally suited to recurrent modeling and expressing algorithmic tasks. Our theoretical results completely characterize shortcut solutions, whereby a shallow Transformer with only o(T) layers can exactly replicate the computation of an automaton on an input sequence of length T. By representing automata using the algebraic structure of their underlying transformation semigroups, we obtain O(log T)-depth simulators for all automata and O(1)-depth simulators for all automata whose associated groups are solvable. Empirically, we perform synthetic experiments by training Transformers to simulate a wide variety of automata, and show that shortcut solutions can be learned via standard training. We further investigate the brittleness of these solutions and propose potential mitigations.
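The intuition behind the O(log T)-depth result can be illustrated outside the Transformer itself: the automaton's state after T steps is the result of applying a product of T transition maps in its transformation semigroup, and because composition of these maps is associative, they can be combined in a balanced tree of depth roughly log2(T) instead of sequentially. The sketch below is an illustrative assumption of that idea (the function names and the toy parity automaton are not from the paper); it compares the recurrent T-step evaluation with the logarithmic-round pairwise composition.

```python
# Minimal sketch: an automaton's run is a product of transition maps, and
# associativity lets us combine them in O(log T) rounds of pairwise composition.
# The helper names and the toy automaton below are illustrative assumptions.

def run_recurrently(delta, q0, inputs):
    """Baseline: T sequential steps, one per input symbol."""
    q = q0
    for a in inputs:
        q = delta[a][q]  # delta[a] maps each state to its successor under symbol a
    return q

def compose(f, g):
    """Compose two transition maps (apply f first, then g)."""
    return tuple(g[f[q]] for q in range(len(f)))

def run_logdepth(delta, q0, inputs):
    """Shortcut: combine transition maps pairwise, ~log2(T) rounds."""
    maps = [delta[a] for a in inputs]
    while len(maps) > 1:  # each round halves the list of maps
        paired = [compose(maps[i], maps[i + 1]) for i in range(0, len(maps) - 1, 2)]
        if len(maps) % 2:
            paired.append(maps[-1])
        maps = paired
    return maps[0][q0]

if __name__ == "__main__":
    # Toy 2-state parity automaton: '1' flips the state, '0' keeps it.
    delta = {"0": (0, 1), "1": (1, 0)}
    word = "1101001110101"
    assert run_recurrently(delta, 0, word) == run_logdepth(delta, 0, word)
    print(run_logdepth(delta, 0, word))
```

Each round of pairwise composition is parallel work over the whole sequence, which is roughly the kind of computation a single non-recurrent layer can perform; this is the informal reason a depth far smaller than T can suffice, and the paper's algebraic analysis (via transformation semigroups and solvable groups) makes the construction precise and, for some automata, pushes the depth down to O(1).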


Related research

Depth-Adaptive Transformer (10/22/2019)
State of the art sequence-to-sequence models perform a fixed number of c...

Turing Computation with Recurrent Artificial Neural Networks (11/04/2015)
We improve the results by Siegelmann & Sontag (1995) by providing a nove...

A general architecture of oritatami systems for simulating arbitrary finite automata (04/23/2019)
In this paper, we propose an architecture of oritatami systems with whic...

Fair Must Testing for I/O Automata (12/21/2022)
The concept of must testing is naturally parametrised with a chosen comp...

Continuous Spatiotemporal Transformers (01/31/2023)
Modeling spatiotemporal dynamical systems is a fundamental challenge in ...

Bridging Graph Position Encodings for Transformers with Weighted Graph-Walking Automata (12/13/2022)
A current goal in the graph neural network literature is to enable trans...

A modular architecture for transparent computation in Recurrent Neural Networks (09/07/2016)
Computation is classically studied in terms of automata, formal language...
