Grokking modular arithmetic

01/06/2023
by Andrey Gromov, et al.

We present a simple neural network that can learn modular arithmetic tasks and exhibits a sudden jump in generalization known as “grokking”. Concretely, we present (i) fully-connected two-layer networks that exhibit grokking on various modular arithmetic tasks under vanilla gradient descent with the MSE loss function and no regularization; (ii) evidence that grokking modular arithmetic corresponds to learning specific feature maps whose structure is determined by the task; (iii) analytic expressions for the weights (and thus for the feature maps) that solve a large class of modular arithmetic tasks; and (iv) evidence that these feature maps are found both by vanilla gradient descent and by AdamW, thereby establishing complete interpretability of the representations learnt by the network.
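The abstract mentions analytic weights for a two-layer network but does not state them. The sketch below illustrates the general idea for one task, modular addition, under our own assumptions: the input is the concatenation of one-hot vectors for n and m, the hidden layer uses a quadratic activation, and both layers carry cosine ("Fourier") weights at frequencies 2πσ/p. The phases are placed on a uniform grid (our choice, in place of, e.g., random phases) so that all interference terms cancel exactly. The functions `build_weights`, `forward`, and `predict` and the parameter `num_phases` are hypothetical and are not taken from the paper.

```python
import math

def build_weights(p, num_phases=4):
    """Hypothetical closed-form weights for predicting (n + m) mod p.

    Each hidden neuron gets a frequency sigma in 1..(p-1)//2 and two phases
    (phi1, phi2) from a uniform grid of `num_phases` points. With p odd and
    num_phases >= 3, every phase-dependent term cancels when summed over the
    grid, leaving only the term that depends on n + m - q.
    Returns W1 (N rows of length 2p) and W2 (p rows of length N).
    """
    if p % 2 == 0 or num_phases < 3:
        raise ValueError("need odd p and num_phases >= 3")
    grid = [2 * math.pi * j / num_phases for j in range(num_phases)]
    W1, out_cols = [], []
    for sigma in range(1, (p - 1) // 2 + 1):
        w = 2 * math.pi * sigma / p
        for phi1 in grid:
            for phi2 in grid:
                # First p inputs read the one-hot of n, last p read the one-hot of m.
                W1.append([math.cos(w * n + phi1) for n in range(p)]
                          + [math.cos(w * m + phi2) for m in range(p)])
                # This neuron's output weights carry the summed phase phi1 + phi2.
                out_cols.append([math.cos(w * q + phi1 + phi2) for q in range(p)])
    W2 = [list(row) for row in zip(*out_cols)]  # transpose to shape (p, N)
    return W1, W2

def forward(W1, W2, n, m, p):
    """Two-layer network: one-hot input, quadratic activation, linear readout."""
    x = [0.0] * (2 * p)
    x[n] = 1.0
    x[p + m] = 1.0
    hidden = [sum(w * xi for w, xi in zip(row, x)) ** 2 for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def predict(W1, W2, n, m, p):
    """Index of the largest output, read as the predicted residue."""
    out = forward(W1, W2, n, m, p)
    return max(range(p), key=lambda q: out[q])

if __name__ == "__main__":
    p = 11
    W1, W2 = build_weights(p)
    correct = sum(predict(W1, W2, n, m, p) == (n + m) % p
                  for n in range(p) for m in range(p))
    print(correct / p**2)  # 1.0
```

Summing over the phase grid leaves y_q = (M²/2) Σ_σ cos(2πσ(n+m−q)/p) with M = `num_phases`. For p = 11 and M = 4, the output at q = (n+m) mod p is therefore 40, and every other output is −4 up to rounding. No training is involved here; the sketch only shows that weights of this cosine form can solve the task exactly.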


