A technical note on bilinear layers for interpretability

05/05/2023
by Lee Sharkey, et al.

The ability of neural networks to represent more features than neurons makes interpreting them challenging. This phenomenon, known as superposition, has spurred efforts to find architectures that are more interpretable than standard multilayer perceptrons (MLPs) with elementwise activation functions. In this note, I examine bilinear layers, a type of MLP layer that is mathematically much easier to analyze than standard MLPs while also performing better. Although they are nonlinear functions of their input, I demonstrate that bilinear layers can be expressed using only linear operations and third-order tensors. This expression can be integrated into a mathematical framework for transformer circuits, which was previously limited to attention-only transformers. These results suggest that bilinear layers are easier to analyze mathematically than current architectures and may therefore lend themselves to deeper safety insights by allowing us to talk more formally about circuits in neural networks. Additionally, bilinear layers may offer an alternative path for mechanistic interpretability: understanding the mechanisms of feature construction instead of enumerating a (potentially exponentially) large number of features in large models.
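
To make the claim about third-order tensors concrete, here is a minimal sketch in plain Python. It assumes the common bias-free form of a bilinear layer, g(x) = (W1 x) ⊙ (W2 x), which the abstract itself does not spell out. The function names and the index convention B[a][i][j] = W1[a][i] * W2[a][j] are illustrative choices, not notation taken from the note.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def bilinear_layer(W1, W2, x):
    """Assumed bilinear layer: elementwise product of two linear maps of x."""
    return [a * b for a, b in zip(matvec(W1, x), matvec(W2, x))]

def build_tensor(W1, W2):
    """Third-order tensor B with B[a][i][j] = W1[a][i] * W2[a][j]."""
    return [[[w1 * w2 for w2 in row2] for w1 in row1]
            for row1, row2 in zip(W1, W2)]

def apply_tensor(B, x):
    """Contract B with x twice: out[a] = sum over i, j of B[a][i][j] * x[i] * x[j]."""
    n = len(x)
    return [sum(B_a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            for B_a in B]

# Small random example (sizes are arbitrary, chosen only for illustration).
rng = random.Random(0)
d_in, d_out = 3, 4
W1 = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
W2 = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
x = [rng.uniform(-1, 1) for _ in range(d_in)]

B = build_tensor(W1, W2)
direct = bilinear_layer(W1, W2, x)
via_tensor = apply_tensor(B, x)

# Largest absolute gap between the two computations (floating-point noise only).
max_diff = max(abs(a - b) for a, b in zip(direct, via_tensor))
```

Under these assumptions the two computations agree up to floating-point error, so the layer's nonlinearity is carried entirely by contracting a fixed tensor with the input twice. The full text may use a different index order or include terms this sketch omits.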

research
12/03/2021

Projections onto hyperbolas or bilinear constraint sets in Hilbert spaces

Sets of bilinear constraints are important in various machine learning m...
research
12/02/2021

A framework for fitting quadratic-bilinear systems with applications to models of electrical circuits

In this contribution, we propose a data-driven procedure to fit quadrati...
research
03/19/2020

On Bilinear Time Domain Identification

The Loewner framework (LF) in combination with Volterra series (VS) offe...
research
10/20/2020

Advantages of Bilinear Koopman Realizations for the Modeling and Control of Systems with Unknown Dynamics

Nonlinear dynamical systems can be made easier to control by lifting the...
research
10/06/2022

Transformers Can Be Expressed In First-Order Logic with Majority

Characterizing the implicit structure of the computation within neural n...
research
10/23/2019

A Unifying Framework of Bilinear LSTMs

This paper presents a novel unifying framework of bilinear LSTMs that ca...
research
12/20/2017

An Order Preserving Bilinear Model for Person Detection in Multi-Modal Data

We propose a new order preserving bilinear framework that exploits low-r...