Discrete Key-Value Bottleneck

07/22/2022
by Frederik Träuble, et al.

Deep neural networks perform well on prediction and classification tasks in the canonical setting where data streams are i.i.d., labeled data is abundant, and class labels are balanced. Challenges emerge under distribution shifts, including non-stationary or imbalanced data streams. One powerful approach to this challenge is self-supervised pretraining of large encoders on large volumes of unlabeled data, followed by task-specific tuning. Given a new task, however, updating the weights of these encoders is challenging: a large number of weights must be fine-tuned, and as a result the model forgets information about previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable (key, value) codes. The model follows an encode, process-via-discrete-bottleneck, decode paradigm: the input is fed to the pretrained encoder, the output of the encoder is used to select the nearest keys, and the corresponding values are fed to the decoder to solve the current task. The model can only fetch and reuse a limited number of these (key, value) pairs during inference, enabling localized and context-dependent model updates. We theoretically investigate the ability of the proposed model to minimize the effect of distribution shifts and show that such a discrete bottleneck with (key, value) pairs reduces the complexity of the hypothesis class. We empirically verify the proposed method's benefits under challenging distribution-shift scenarios across various benchmark datasets and show that the proposed model reduces the common vulnerability to non-i.i.d. and non-stationary training distributions compared to various baselines.
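The retrieval step described in the abstract can be sketched in a few lines of PyTorch. The following is a minimal, illustrative sketch rather than the authors' released implementation: the class name, the number of codebooks and (key, value) pairs, the frozen randomly initialized keys, and the stand-in linear encoder and decoder are all assumptions made for the example.

import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    """Sketch of a discrete key-value bottleneck (illustrative, not the paper's code)."""

    def __init__(self, dim_in, num_codebooks=8, num_pairs=256, dim_value=16):
        super().__init__()
        assert dim_in % num_codebooks == 0
        self.num_codebooks = num_codebooks
        self.dim_head = dim_in // num_codebooks
        # Keys: one codebook per head, kept fixed here (no gradient), mimicking
        # the idea that keys are set before task-specific training.
        self.keys = nn.Parameter(
            torch.randn(num_codebooks, num_pairs, self.dim_head),
            requires_grad=False,
        )
        # Values: learnable codes consumed by the decoder; only the values
        # selected for a given input receive gradient updates.
        self.values = nn.Parameter(torch.randn(num_codebooks, num_pairs, dim_value))

    def forward(self, z):
        # z: (batch, dim_in) output of a frozen pretrained encoder.
        b = z.shape[0]
        z = z.view(b, self.num_codebooks, self.dim_head)            # (B, C, Dh)
        # Euclidean distance from each head to every key in its codebook.
        dists = torch.cdist(z.transpose(0, 1), self.keys)           # (C, B, K)
        idx = dists.argmin(dim=-1)                                   # (C, B)
        # Fetch the value codes paired with the selected keys.
        idx_exp = idx.unsqueeze(-1).expand(-1, -1, self.values.shape[-1])
        fetched = torch.gather(self.values, 1, idx_exp)              # (C, B, Dv)
        return fetched.transpose(0, 1).reshape(b, -1)                # (B, C*Dv)

# Usage sketch: frozen encoder -> discrete bottleneck -> task decoder.
encoder = nn.Linear(32, 64)                      # stand-in for a pretrained encoder
for p in encoder.parameters():
    p.requires_grad = False
bottleneck = DiscreteKeyValueBottleneck(dim_in=64)
decoder = nn.Linear(8 * 16, 10)                  # num_codebooks * dim_value -> classes

x = torch.randn(4, 32)
logits = decoder(bottleneck(encoder(x)))
print(logits.shape)                              # torch.Size([4, 10])

Because only the values whose keys were selected appear in the forward pass, a gradient step touches only those value codes, which is what makes the updates localized and context-dependent.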

Related research

07/06/2023 · Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays
This work presents the first applications of self-supervised learning ap...

09/03/2022 · Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions
The paradigm of machine intelligence moves from purely supervised learni...

07/13/2022 · Task Agnostic Representation Consolidation: a Self-supervised based Continual Learning Approach
Continual learning (CL) over non-stationary data streams remains one of ...

03/15/2023 · Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification
While Multiple Instance Learning (MIL) has shown promising results in di...

09/02/2020 · Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams
As learning from non-stationary streams of data has been proven a challe...

12/09/2021 · Extending the WILDS Benchmark for Unsupervised Adaptation
Machine learning systems deployed in the wild are often trained on a sou...

06/15/2023 · ReLoop2: Building Self-Adaptive Recommendation Models via Responsive Error Compensation Loop
Industrial recommender systems face the challenge of operating in non-st...