Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

06/04/2021
by Jannik Kossen, et al.

We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.
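The core idea of attending *between* datapoints rather than within a single input can be illustrated with a minimal sketch. The snippet below is not the paper's full Non-Parametric Transformer architecture; it is a hypothetical single-head self-attention layer in NumPy where the attended-over axis is the datapoint axis, so each row of the dataset updates its representation by looking at every other row:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_between_datapoints(X, Wq, Wk, Wv):
    """Single-head self-attention over the *datapoint* axis.

    X has shape (n, d): n datapoints, d features. The (n, n) score
    matrix relates datapoints to each other, so each datapoint's new
    representation is a weighted mix of the whole dataset.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # (n, d_k) each
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # (n, n) datapoint-to-datapoint
    A = softmax(scores, axis=-1)                  # attention weights over the dataset
    return A @ V, A                               # updated representations + weights

rng = np.random.default_rng(0)
n, d, d_k = 5, 4, 8                               # toy sizes: 5 datapoints, 4 features
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d_k)) for _ in range(3))

out, A = attention_between_datapoints(X, Wq, Wk, Wv)
print(out.shape)                                  # (5, 8): one vector per datapoint
print(np.allclose(A.sum(axis=1), 1.0))            # True: each row attends over all datapoints
```

Because the weight matrices `Wq`, `Wk`, `Wv` are learned parameters while the attention weights `A` are computed from the data at prediction time, this is the sense in which the approach realizes a non-parametric model with parametric attention.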

