Shapeshifter Networks: Cross-layer Parameter Sharing for Scalable and Effective Deep Learning

06/18/2020
by Bryan A. Plummer, et al.

We present Shapeshifter Networks (SSNs), a flexible neural network framework that improves performance and reduces memory requirements over standard neural networks across a diverse set of scenarios. Our approach is based on the observation that many neural networks are severely overparameterized, which wastes computational resources and makes them susceptible to overfitting. SSNs address this by learning where and how to share parameters between layers in a neural network while avoiding degenerate solutions that result in underfitting. Specifically, we automatically construct parameter groups that identify where parameter sharing is most beneficial. Then, we map each group's weights to construct layers with learned combinations of candidates from a shared parameter pool. SSNs can share parameters across layers even when they have different sizes, perform different operations, and/or operate on features from different modalities. We evaluate our approach on a diverse set of tasks, including image classification, bidirectional image-sentence retrieval, and phrase grounding, creating high-performing models even when using as little as 1% of the parameters. We also apply SSNs to knowledge distillation, where we obtain state-of-the-art results when combined with traditional distillation methods.
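To make the pool-and-mapping idea concrete, below is a minimal PyTorch sketch: a shared pool of parameter templates, and layers of different sizes that each build their weight matrix as a learned combination of those templates. This is an illustration under simplifying assumptions (a single parameter group, a simple slice-to-size mapping), not the paper's implementation; the names `ParameterPool` and `SharedLinear`, the mixing scheme, and the initialization choices are all hypothetical.

```python
# Sketch of cross-layer parameter sharing in the spirit of SSNs.
# Assumption: one parameter group whose layers all draw from one pool.
import torch
import torch.nn as nn


class ParameterPool(nn.Module):
    """A bank of candidate parameter templates shared across layers."""

    def __init__(self, num_templates: int, template_size: int):
        super().__init__()
        self.templates = nn.Parameter(torch.randn(num_templates, template_size) * 0.02)

    def materialize(self, coeffs: torch.Tensor, numel: int) -> torch.Tensor:
        # Mix the templates with a layer's learned coefficients, then take
        # as many entries as that layer needs; layers of different sizes
        # can therefore share the same pool.
        combined = coeffs @ self.templates  # shape: (template_size,)
        return combined[:numel]


class SharedLinear(nn.Module):
    """A linear layer whose weight is built from the shared pool."""

    def __init__(self, pool: ParameterPool, in_features: int, out_features: int):
        super().__init__()
        self.pool = pool
        self.in_features = in_features
        self.out_features = out_features
        num_templates = pool.templates.shape[0]
        # Learned per-layer mixing coefficients: the "mapping" for this layer.
        self.coeffs = nn.Parameter(torch.randn(num_templates) / num_templates)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        numel = self.out_features * self.in_features
        weight = self.pool.materialize(self.coeffs, numel)
        weight = weight.view(self.out_features, self.in_features)
        return x @ weight.t() + self.bias


# Two differently sized layers sharing one pool (one "parameter group").
pool = ParameterPool(num_templates=4, template_size=256 * 256)
layer_a = SharedLinear(pool, in_features=128, out_features=256)
layer_b = SharedLinear(pool, in_features=256, out_features=64)
y = layer_b(torch.relu(layer_a(torch.randn(8, 128))))
print(y.shape)  # torch.Size([8, 64])
```

In this sketch the memory saving comes from storing only the pool plus a handful of coefficients per layer, rather than a full weight matrix for every layer; the paper additionally learns *which* layers should share a pool, which the single fixed group above does not capture.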

