The DEformer: An Order-Agnostic Distribution Estimating Transformer

06/13/2021
by Michael A. Alcorn, et al.

Order-agnostic autoregressive distribution estimation (OADE), i.e., autoregressive distribution estimation where the features can occur in an arbitrary order, is a challenging problem in generative machine learning. Prior work on OADE has encoded feature identity (e.g., pixel location) by assigning each feature to a distinct, fixed position in an input vector. As a result, architectures built for these inputs must strategically mask either the input or the model weights to learn the various conditional distributions necessary for inferring the full joint distribution of the dataset in an order-agnostic way. In this paper, we propose an alternative approach for encoding feature identities, where each feature's identity is included alongside its value in the input. This feature identity encoding strategy allows neural architectures designed for sequential data to be applied to the OADE task without modification. As a proof of concept, we show that a Transformer trained on this input (which we refer to as "the DEformer", i.e., the distribution-estimating Transformer) can effectively model binarized MNIST, approaching the average negative log-likelihood of fixed-order autoregressive distribution estimation algorithms while remaining entirely order-agnostic.
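The identity-alongside-value encoding described in the abstract can be illustrated with a small sketch. This is a hypothetical serialization, not the paper's exact interface: the `serialize` helper and its token format are assumptions. The key idea it demonstrates is that each feature contributes an (identity, value) pair and the feature order is randomized, so any standard sequence model can condition on whichever features have already appeared:

```python
import random

def serialize(pixels, seed=0):
    """Serialize binarized pixels into an order-agnostic token sequence.

    Hypothetical sketch of the DEformer input idea: for each feature, emit
    an identity token ("id", index) announcing which feature comes next,
    followed by a value token ("val", pixels[index]) with its observed
    value. Randomizing the feature order lets a sequence model learn the
    conditionals needed for order-agnostic density estimation.
    """
    order = list(range(len(pixels)))
    random.Random(seed).shuffle(order)  # a random feature ordering
    tokens = []
    for i in order:
        tokens.append(("id", i))            # which feature is next
        tokens.append(("val", pixels[i]))   # its value
    return tokens

# Example: three binary pixels with values [1, 0, 1]
tokens = serialize([1, 0, 1], seed=42)
```

During training, a model over such sequences would be asked to predict each value token from everything preceding it, including the just-announced identity; this is how conditioning on feature identity replaces the fixed-position encodings and masking schemes of prior OADE work.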

Related research:
- An Improved Training Procedure for Neural Autoregressive Data Completion (11/23/2017)
- Neural Autoregressive Distribution Estimation (05/07/2016)
- SeqAug: Sequential Feature Resampling as a modality agnostic augmentation method (05/03/2023)
- Algorithms to estimate Shapley value feature attributions (07/15/2022)
- Time2Vec: Learning a Vector Representation of Time (07/11/2019)
- MADE: Masked Autoencoder for Distribution Estimation (02/12/2015)
- Learning to Encode Position for Transformer with Continuous Dynamical Model (03/13/2020)
