A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks

03/21/2023
by William Merrill, et al.

Grokking is a phenomenon where a model trained on an algorithmic task first overfits but then, after a large amount of additional training, undergoes a phase transition and generalizes perfectly. We empirically study the internal structure of networks undergoing grokking on the sparse parity task, and find that the grokking phase transition corresponds to the emergence of a sparse subnetwork that dominates model predictions. At the optimization level, we find that this subnetwork arises when a small subset of neurons undergoes rapid norm growth, whereas the other neurons in the network slowly decay in norm. We therefore suggest that the grokking phase transition can be understood as emerging from competition between two largely distinct subnetworks: a dense one that dominates before the transition and generalizes poorly, and a sparse one that dominates afterwards.
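
To make the two ingredients of the abstract concrete, the following is a minimal sketch and not the authors' code. It assumes a ±1 encoding of bits, so that parity becomes a product, and a one-hidden-layer network whose per-neuron norm combines incoming weights, bias, and outgoing weight. The abstract specifies neither choice, and the names `sparse_parity_batch` and `neuron_norms` are hypothetical.

```python
import math
import random


def sparse_parity_batch(num_examples, n_bits, support, seed=0):
    """Sample inputs uniformly from {-1, +1}^n_bits.

    Assumption: the label is the product of the bits indexed by `support`,
    which is parity under a +/-1 encoding. Only those few bits matter,
    hence "sparse" parity.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(num_examples):
        x = [rng.choice((-1, 1)) for _ in range(n_bits)]
        y = 1
        for i in support:
            y *= x[i]
        xs.append(x)
        ys.append(y)
    return xs, ys


def neuron_norms(w_in, b, w_out):
    """L2 norm of each hidden neuron's parameters.

    Assumption: a one-hidden-layer network where neuron j owns the row
    w_in[j], the bias b[j], and the outgoing weight w_out[j].
    """
    norms = []
    for j in range(len(w_in)):
        squared = sum(w * w for w in w_in[j]) + b[j] ** 2 + w_out[j] ** 2
        norms.append(math.sqrt(squared))
    return norms
```

Logging `neuron_norms` at regular checkpoints during training is one way to look for the pattern the abstract describes: a small set of neurons whose norm grows quickly while the norms of the remaining neurons shrink slowly.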

research
07/13/2023

The Algorithmic Phase Transition of Random Graph Alignment Problem

We study the graph alignment problem over two independent Erdős-Rényi gr...
research
09/04/2018

Random Language Model: a path to principled complexity

Many complex generative systems use languages to create structured objec...
research
08/10/2020

A Phase Transition in Minesweeper

We study the average-case complexity of the classic Minesweeper game in ...
research
06/26/2018

Phase transition in the knapsack problem

We examine the phase transition phenomenon for the Knapsack problem from...
research
01/04/2020

Phase Transitions in the Edge/Concurrent Vertex Model

Although it is well-known that some exponential family random graph mode...
research
07/17/2019

Learnability for the Information Bottleneck

The Information Bottleneck (IB) method (tishby2000information) provides ...
research
11/19/2022

Phase transition and higher order analysis of L_q regularization under dependence

We study the problem of estimating a k-sparse signal _0∈ R^p from a set ...
