One of the roadblocks to a better understanding of neural networks' inte...
The ability of neural networks to represent more features than neurons m...
The increasing capabilities of artificial intelligence (AI) systems make...
Mechanistic interpretability aims to explain what a neural network has l...
We study objective robustness failures, a type of out-of-distribution ro...