Revisiting Model Stitching to Compare Neural Representations

06/14/2021
by Yamini Bansal, et al.

We revisit and extend model stitching (Lenc & Vedaldi, 2015) as a methodology for studying the internal representations of neural networks. Given two trained and frozen models A and B, we consider a "stitched model" formed by connecting the bottom layers of A to the top layers of B, with a simple trainable layer between them. We argue that model stitching is a powerful and perhaps under-appreciated tool, which reveals aspects of representations that measures such as centered kernel alignment (CKA) cannot. Through extensive experiments, we use model stitching to obtain quantitative verification of intuitive statements such as "good networks learn similar representations", by demonstrating that good networks of the same architecture, but trained in very different ways (e.g., supervised vs. self-supervised learning), can be stitched to each other without a drop in performance. We also give evidence for the intuition that "more is better" by showing that representations learnt with (1) more data, (2) bigger width, or (3) more training time can be "plugged in" to weaker models to improve performance. Finally, our experiments reveal a new structural property of SGD which we call "stitching connectivity", akin to mode connectivity: typical minima reached by SGD can all be stitched to each other with minimal change in accuracy.
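The construction can be illustrated with a minimal numerical sketch. Here the frozen "bottom" of A and "top" of B are stand-in linear-plus-tanh maps (the matrices W_A, W_B_hidden, W_B_top and all dimensions are illustrative, not from the paper), and the stitching layer is fit by least-squares regression onto B's own hidden representation; note the paper instead trains the stitching layer on the downstream task loss, so this is only a simplified stand-in for that step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative frozen weights (not from the paper).
W_A = rng.normal(size=(8, 4))         # A's bottom: input (4) -> A's hidden (8)
W_B_hidden = rng.normal(size=(6, 4))  # B's bottom: input (4) -> B's hidden (6)
W_B_top = rng.normal(size=(3, 6))     # B's top: B's hidden (6) -> output (3)

def a_bottom(x):
    """Frozen bottom layers of model A."""
    return np.tanh(W_A @ x)

def b_bottom(x):
    """Frozen bottom layers of model B (used here only to build targets)."""
    return np.tanh(W_B_hidden @ x)

def b_top(h):
    """Frozen top layers of model B."""
    return W_B_top @ h

# Fit the stitching layer S (B-hidden x A-hidden) so that
# S @ a_bottom(x) approximates b_bottom(x) over a batch of inputs.
X = rng.normal(size=(4, 256))        # batch of 256 inputs
H_A = np.tanh(W_A @ X)               # A's representations, shape (8, 256)
H_B = np.tanh(W_B_hidden @ X)        # B's representations, shape (6, 256)
S, *_ = np.linalg.lstsq(H_A.T, H_B.T, rcond=None)
S = S.T                              # shape (6, 8)

def stitched(x):
    """A's bottom -> trainable stitch -> B's top."""
    return b_top(S @ a_bottom(x))
```

Only S is "trained"; A and B stay frozen, so the stitched model's accuracy measures how well A's representation can substitute for B's under a simple (here linear) map.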


Related research

03/20/2023 · Model Stitching: Looking For Functional Similarity Between Representations
Model stitching (Lenc & Vedaldi 2015) is a compelling methodology to c...

04/18/2018 · A Mean Field View of the Landscape of Two-Layers Neural Networks
Multi-layer neural networks are among the most powerful models in machin...

07/24/2023 · On Privileged and Convergent Bases in Neural Network Representations
In this study, we investigate whether the representations learned by neu...

10/25/2022 · Exploring Mode Connectivity for Pre-trained Language Models
Recent years have witnessed the prevalent application of pre-trained lan...

10/28/2022 · Reliability of CKA as a Similarity Measure in Deep Learning
Comparing learned neural representations in neural networks is a challen...

06/18/2018 · Using Mode Connectivity for Loss Landscape Analysis
Mode connectivity is a recently introduced framework that empirically ...
