Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation

07/06/2022
by Samuel Cognolato, et al.

Mathematical reasoning is one of the most impressive achievements of human intellect but remains a formidable challenge for artificial intelligence systems. In this work we explore whether modern deep learning architectures can learn to solve a symbolic addition task by discovering effective arithmetic procedures. Although the problem might seem trivial at first glance, generalizing arithmetic knowledge to operations involving a larger number of terms, possibly composed of longer sequences of digits, has proven extremely challenging for neural networks. Here we show that universal transformers equipped with local attention and adaptive halting mechanisms can learn to exploit an external, grid-like memory to carry out multi-digit addition. The proposed model achieves remarkable accuracy even when tested on problems requiring extrapolation outside the training distribution; most notably, it does so by discovering human-like calculation strategies such as place value alignment.

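To make the described architecture concrete, the sketch below (hypothetical, not the authors' implementation) illustrates the two ingredients mentioned in the abstract: a grid-like encoding of a multi-digit addition problem that aligns digits by place value, and a Universal-Transformer-style block whose single shared layer is applied recurrently with a simplified ACT-like halting rule. The helper `encode_on_grid`, the module `RecurrentHaltingEncoder`, all hyperparameters, and the use of a standard full-attention `nn.TransformerEncoderLayer` (in place of the paper's local attention) are illustrative assumptions.

```python
# Minimal sketch, assuming a grid encoding and an ACT-like halting rule;
# not the authors' code, and local attention is omitted for brevity.

import torch
import torch.nn as nn


def encode_on_grid(operands, width):
    """Right-align each operand on a fixed-width grid so that digits sharing
    the same place value occupy the same column (token 10 = blank/padding)."""
    grid = torch.full((len(operands), width), 10, dtype=torch.long)
    for r, n in enumerate(operands):
        digits = [int(d) for d in str(n)]
        grid[r, width - len(digits):] = torch.tensor(digits)
    return grid  # shape: (num_operands, width)


class RecurrentHaltingEncoder(nn.Module):
    """Applies one shared transformer layer repeatedly (Universal-Transformer
    style) and stops once the accumulated halting probability of every
    example exceeds a threshold (a simplified ACT-style rule)."""

    def __init__(self, d_model=64, nhead=4, max_steps=8, threshold=0.99):
        super().__init__()
        self.embed = nn.Embedding(11, d_model)  # digits 0-9 plus blank token
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.halt = nn.Linear(d_model, 1)
        self.max_steps = max_steps
        self.threshold = threshold

    def forward(self, grid):
        # grid: (batch, rows, cols) of digit tokens; flatten cells into a sequence
        x = self.embed(grid.flatten(1))                   # (batch, cells, d_model)
        halted = torch.zeros(x.size(0))
        for _ in range(self.max_steps):
            x = self.layer(x)                             # shared weights each step
            p = torch.sigmoid(self.halt(x.mean(dim=1))).squeeze(-1)
            halted = halted + (1 - halted) * p            # accumulate halting mass
            if bool((halted > self.threshold).all()):
                break
        return x


grid = encode_on_grid([345, 78, 9], width=5)              # place-value-aligned rows
out = RecurrentHaltingEncoder()(grid.unsqueeze(0))        # add a batch dimension
```

Right-aligning the operands on the grid mirrors the place value alignment strategy described in the abstract: digits that contribute to the same output position end up in the same column, so a column-local attention pattern suffices to compute each digit of the sum.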
Related research

01/17/2023 · Learning to solve arithmetic problems with a virtual abacus
Acquiring mathematical skills is considered a key challenge for modern A...

02/25/2021 · Investigating the Limitations of Transformers with Simple Arithmetic Tasks
The ability to perform arithmetic tasks is a remarkable trait of human i...

01/13/2021 · Neural Sequence-to-grid Module for Learning Symbolic Rules
Logical reasoning tasks over symbols, such as learning arithmetic operat...

09/23/2018 · Neural Arithmetic Expression Calculator
This paper presents a pure neural solver for arithmetic expression calcu...

02/03/2023 · Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
We propose a new class of linear Transformers called FourierLearner-Tran...

07/07/2023 · Teaching Arithmetic to Small Transformers
Large language models like GPT-4 exhibit emergent capabilities across ge...
