Explaining grokking through circuit efficiency

09/05/2023
by Vikrant Varma, et al.

One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking occurs when the task admits a generalising solution and a memorising solution, where the generalising solution is slower to learn but more efficient, producing larger logits with the same parameter norm. We hypothesise that memorising circuits become more inefficient with larger training datasets while generalising circuits do not, suggesting there is a critical dataset size at which memorisation and generalisation are equally efficient. We make and confirm four novel predictions about grokking, providing significant evidence in favour of our explanation. Most strikingly, we demonstrate two novel and surprising behaviours: ungrokking, in which a network regresses from perfect to low test accuracy, and semi-grokking, in which a network shows delayed generalisation to partial rather than perfect test accuracy.
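
The critical-dataset-size argument can be illustrated with a toy calculation. The sketch below is not the authors' model: the linear growth of the memorising circuit's parameter norm, the constant norm of the generalising circuit, and every function name and constant are hypothetical choices made only to show how a crossover point arises when one cost grows with dataset size and the other does not.

```python
# Toy illustration only. The functional forms and constants are assumptions,
# not taken from the paper.

def memorising_norm(dataset_size, cost_per_example=0.01):
    """Hypothetical parameter norm a memorising circuit needs to reach a fixed
    logit size. Assumed to grow linearly with the number of training examples."""
    return cost_per_example * dataset_size


def generalising_norm(dataset_size, fixed_cost=4.5):
    """Hypothetical parameter norm a generalising circuit needs to reach the
    same logit size. Assumed to be independent of dataset size."""
    return fixed_cost


def critical_dataset_size(sizes):
    """Return the first dataset size in `sizes` at which memorising is no
    cheaper than generalising, or None if that never happens."""
    for d in sizes:
        if memorising_norm(d) >= generalising_norm(d):
            return d
    return None


sizes = range(100, 2001, 100)
d_crit = critical_dataset_size(sizes)
print("toy critical dataset size:", d_crit)
```

Under these assumed forms, the generalising circuit is the more efficient one above the crossover and the memorising circuit is the more efficient one below it, which is the regime in which the abstract describes ungrokking.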
