Investigating the Limitations of Transformers with Simple Arithmetic Tasks

02/25/2021
by Rodrigo Nogueira, et al.

The ability to perform arithmetic tasks is a remarkable trait of human intelligence and might form a critical component of more complex reasoning tasks. In this work, we investigate whether the surface form of a number has any influence on how sequence-to-sequence language models learn simple arithmetic tasks such as addition and subtraction across a wide range of values. We find that how a number is represented in its surface form has a strong influence on the model's accuracy. In particular, the model fails to learn addition of five-digit numbers when using subwords (e.g., "32"), and it struggles to learn with character-level representations (e.g., "3 2"). By introducing position tokens (e.g., "3 10e1 2"), the model learns to accurately add and subtract numbers up to 60 digits. We conclude that modern pretrained language models can easily learn arithmetic from very few examples, as long as we use the proper surface representation. This result bolsters evidence that subword tokenizers and positional encodings are components in current transformer designs that might need improvement. Moreover, we show that regardless of the number of parameters and training examples, models cannot learn addition rules that are independent of the length of the numbers seen during training. Code to reproduce our experiments is available at https://github.com/castorini/transformers-arithmetic.
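The position-token representation can be illustrated with a short sketch. The Python snippet below is our reading of the example "3 10e1 2" quoted in the abstract; the exact tokenization used in the castorini/transformers-arithmetic repository may differ (for instance, it may also emit a trailing "10e0" marker for the ones digit).

def to_position_tokens(n: int) -> str:
    """Render a non-negative integer with explicit powers-of-ten position tokens."""
    digits = str(n)
    parts = []
    for i, d in enumerate(digits):
        parts.append(d)
        power = len(digits) - 1 - i
        # The abstract's example shows no marker after the ones digit.
        if power > 0:
            parts.append(f"10e{power}")
    return " ".join(parts)

print(to_position_tokens(32))   # -> "3 10e1 2"
print(to_position_tokens(512))  # -> "5 10e2 1 10e1 2"

The idea, per the abstract, is that making each digit's place value explicit in the surface form lets the model add and subtract much longer numbers than it can with plain subword or character tokenization.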


Related research

06/27/2023
Length Generalization in Arithmetic Transformers
We examine how transformers cope with two challenges: learning basic int...

04/21/2023
Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition
In recent years, Large Language Models such as GPT-3 showed remarkable c...

07/06/2022
Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation
Mathematical reasoning is one of the most impressive achievements of hum...

08/24/2022
Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models
The ability to extrapolate, i.e., to make predictions on sequences that ...

04/15/2020
Neural Status Registers
Neural networks excel at approximating functions and finding patterns in...

09/06/2023
GPT Can Solve Mathematical Problems Without a Calculator
Previous studies have typically assumed that large language models are u...

06/07/2015
Visual Learning of Arithmetic Operations
A simple Neural Network model is presented for end-to-end visual learnin...
