Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability

05/04/2023
by Ziming Liu, et al.

We introduce Brain-Inspired Modular Training (BIMT), a method for making neural networks more modular and interpretable. Inspired by biological brains, BIMT embeds neurons in a geometric space and augments the loss function with a cost proportional to the length of each neuron connection. We demonstrate that BIMT discovers useful modular neural networks for many simple tasks, revealing compositional structures in symbolic formulas, interpretable decision boundaries and features for classification, and mathematical structure in algorithmic datasets. The ability to directly see modules with the naked eye can complement current mechanistic interpretability strategies such as probes, interventions, or staring at all weights.
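The length-weighted connection cost described above can be sketched in plain Python. This is our own illustrative reading of the idea, not the authors' implementation: each neuron is assigned a 2-D position, and every weight is penalized by its magnitude times the Euclidean distance between the two neurons it connects. The function name `wiring_cost` and the parameter `lam` are hypothetical.

```python
import math

def wiring_cost(weights, pos_in, pos_out, lam=1.0):
    """Length-weighted L1 penalty (a sketch of BIMT's idea).

    weights  -- 2-D list, weights[j][i] connects input i to output j
    pos_in   -- (x, y) position of each input neuron in the embedding space
    pos_out  -- (x, y) position of each output neuron
    lam      -- overall regularization strength (assumed hyperparameter)
    """
    cost = 0.0
    for j, row in enumerate(weights):          # output neuron j
        for i, w in enumerate(row):            # input neuron i
            dist = math.dist(pos_in[i], pos_out[j])  # connection length
            cost += abs(w) * dist              # long connections cost more
    return lam * cost

# Toy example: two inputs at x=0, one output at x=1.
w = [[0.5, -2.0]]
pin = [(0.0, 0.0), (0.0, 1.0)]
pout = [(1.0, 0.0)]
print(wiring_cost(w, pin, pout))  # 0.5*1 + 2.0*sqrt(2)
```

Adding this term to the task loss pushes strong weights onto short connections, so functionally related neurons drift together in the embedding space and modules become visible by eye.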

Related research

- Neural Networks are Surprisingly Modular (03/10/2020): The learned weights of a neural network are often considered devoid of s...
- NEWRON: A New Generalization of the Artificial Neuron to Enhance the Interpretability of Neural Networks (10/05/2021): In this work, we formulate NEWRON: a generalization of the McCulloch-Pit...
- Modular Grammatical Evolution for the Generation of Artificial Neural Networks (08/04/2022): This paper presents a novel method, called Modular Grammatical Evolution...
- Independent Modular Networks (06/02/2023): Monolithic neural networks that make use of a single set of weights to l...
- Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks (10/05/2020): Neural networks (NNs) whose subnetworks implement reusable functions are...
- Grokking modular arithmetic (01/06/2023): We present a simple neural network that can learn modular arithmetic tas...
