Statistical Foundations of Prior-Data Fitted Networks

05/18/2023
by Thomas Nagler, et al.

Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning. Instead of training a network on an observed training set, a fixed model is pre-trained offline on small, simulated training sets drawn from a variety of tasks. The pre-trained model is then used to infer class probabilities in-context on fresh training sets of arbitrary size and distribution. Empirically, PFNs achieve state-of-the-art performance on tasks of similar size to those used in pre-training. Surprisingly, their accuracy improves further when they are passed larger data sets at inference time. This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior. While PFNs are motivated by Bayesian ideas, a purely frequentist interpretation of PFNs as pre-tuned but untrained predictors explains their behavior: a predictor's variance vanishes if its sensitivity to individual training samples does, and its bias vanishes only if the predictor is appropriately localized around the test feature. The transformer architecture used in current PFN implementations ensures only the former. These findings should prove useful for designing architectures with favorable empirical behavior.
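One standard way to make the variance claim precise (a generic argument, not necessarily the exact proof strategy of the paper): the pointwise risk of a pre-tuned predictor $\hat f_n$ at a test feature $x$, trained on an i.i.d. sample of size $n$, decomposes as

\[
\mathbb{E}\big[(\hat f_n(x) - f(x))^2\big]
  = \operatorname{Var}\!\big(\hat f_n(x)\big)
  + \big(\mathbb{E}\,\hat f_n(x) - f(x)\big)^2 .
\]

If replacing any single training sample changes the prediction by at most $c_n$, the Efron-Stein inequality bounds the first term by

\[
\operatorname{Var}\!\big(\hat f_n(x)\big) \le \tfrac{n}{2}\, c_n^2 ,
\]

so the variance vanishes whenever $n c_n^2 \to 0$. No analogous sensitivity condition touches the bias term; driving it to zero requires the predictor to concentrate its weight on training points near $x$, which is the localization property the abstract refers to.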
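For a concrete picture of the paradigm described in the abstract, below is a minimal sketch of PFN pre-training and in-context inference in PyTorch. Everything in it is illustrative and assumed, not taken from the paper: the prior (random linear classifiers), the stand-in TinyPFN model (a crude permutation-invariant pooling network rather than the transformer real PFNs use), and all function names.

import torch

class TinyPFN(torch.nn.Module):
    # Stand-in for the PFN transformer: embeds each (x, y) context pair,
    # mean-pools them into a permutation-invariant summary, and scores a
    # query feature against that summary.
    def __init__(self, d=2, h=64, n_classes=2):
        super().__init__()
        self.embed = torch.nn.Linear(d + 1, h)
        self.head = torch.nn.Sequential(
            torch.nn.Linear(h + d, h),
            torch.nn.ReLU(),
            torch.nn.Linear(h, n_classes),
        )

    def forward(self, x_ctx, y_ctx, x_qry):
        pairs = torch.cat([x_ctx, y_ctx.float().unsqueeze(-1)], dim=-1)
        summary = self.embed(pairs).mean(dim=0, keepdim=True)
        summary = summary.expand(x_qry.shape[0], -1)
        return self.head(torch.cat([summary, x_qry], dim=-1))

def pretrain_pfn(model, optimizer, n_steps=2000, max_n=50, d=2):
    # Offline pre-training: the model never sees real data. Each step draws
    # a fresh simulated task (here, a random linear classifier) and a small
    # training set from it, then improves the in-context prediction by
    # gradient descent on the model parameters.
    for _ in range(n_steps):
        n = int(torch.randint(5, max_n, (1,)))
        w = torch.randn(d)                  # latent task parameter ~ prior
        x = torch.randn(n + 1, d)           # n context points + 1 query
        y = (x @ w > 0).long()              # labels generated by the task
        logits = model(x[:n], y[:n], x[n:])
        loss = torch.nn.functional.cross_entropy(logits, y[n:])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def predict_in_context(model, x_train, y_train, x_test):
    # Inference: no gradient updates. The frozen ("pre-tuned but untrained")
    # model maps a fresh training set directly to class probabilities.
    with torch.no_grad():
        return model(x_train, y_train, x_test).softmax(dim=-1)

Usage on a fresh task:

model = TinyPFN()
pretrain_pfn(model, torch.optim.Adam(model.parameters(), lr=1e-3))
probs = predict_in_context(model,
                           torch.randn(200, 2),
                           torch.randint(0, 2, (200,)),
                           torch.randn(5, 2))

Note that inference here uses 200 context points even though pre-training capped tasks at 50, mirroring the size-generalization behavior the abstract highlights.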
