Polysemanticity and Capacity in Neural Networks

10/04/2022
by Adam Scherlis et al.

Individual neurons in neural networks often represent a mixture of unrelated features. This phenomenon, called polysemanticity, can make interpreting neural networks more difficult, so we aim to understand its causes. We propose doing so through the lens of feature capacity, which is the fractional dimension each feature consumes in the embedding space. We show that in a toy model the optimal capacity allocation tends to monosemantically represent the most important features, polysemantically represent less important features (in proportion to their impact on the loss), and entirely ignore the least important features. Polysemanticity is more prevalent when the inputs have higher kurtosis or sparsity, and it is more prevalent in some architectures than in others. Given an optimal allocation of capacity, we go on to study the geometry of the embedding space. We find a block-semi-orthogonal structure, with differing block sizes in different models, highlighting the impact of model architecture on the interpretability of its neurons.
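To make the notion of capacity concrete, here is a minimal sketch (a hypothetical illustration, not the authors' released code) that computes per-feature capacities from an embedding matrix, assuming the paper's definition C_i = (w_i · w_i)^2 / Σ_j (w_i · w_j)^2, where w_i is the embedding vector of feature i. Under this definition a feature with its own orthogonal direction gets capacity 1, while features that share directions split capacity between them; the matrix W below is random and purely illustrative.

```python
import numpy as np

def feature_capacities(W: np.ndarray) -> np.ndarray:
    """Per-feature capacity, assuming the paper's definition:
    C_i = (w_i . w_i)^2 / sum_j (w_i . w_j)^2,
    where w_i (row i of W) is the embedding vector of feature i.
    Each C_i lies in (0, 1]; a fully monosemantic feature has C_i = 1.
    """
    G = W @ W.T                      # Gram matrix of feature embeddings
    return np.diag(G) ** 2 / (G ** 2).sum(axis=1)

# Hypothetical example: 5 features embedded in 3 dimensions.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))

C = feature_capacities(W)
print(C)        # fractional dimension consumed by each feature
print(C.sum())  # total capacity; bounded by the embedding dimension (3)
```

In the toy-model picture the abstract describes, the most important features would claim capacities near 1 (monosemantic neurons), less important features would share capacity (polysemantic neurons), and the least important features would receive none.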

Related Research

11/22/2022 · Interpreting Neural Networks through the Polytope Lens
Mechanistic interpretability aims to explain what a neural network has l...

09/15/2023 · Sparse Autoencoders Find Highly Interpretable Features in Language Models
One of the roadblocks to a better understanding of neural networks' inte...

02/12/2019 · Capacity allocation analysis of neural networks: A tool for principled architecture design
Designing neural network architectures is a task that lies somewhere bet...

05/02/2023 · Finding Neurons in a Haystack: Case Studies with Sparse Probing
Despite rapid adoption and deployment of large language models (LLMs), t...

06/16/2022 · Towards Better Understanding with Uniformity and Explicit Regularization of Embeddings in Embedding-based Neural Topic Models
Embedding-based neural topic models could explicitly represent words and...

03/11/2019 · Scaling up deep neural networks: a capacity allocation perspective
Following the recent work on capacity allocation, we formulate the conje...

04/14/2021 · An Interpretability Illusion for BERT
We describe an "interpretability illusion" that arises when analyzing th...
