Enhancing Representation Learning on High-Dimensional, Small-Size Tabular Data: A Divide and Conquer Method with Ensembled VAEs

06/27/2023
by Navindu Leelarathna, et al.

Variational Autoencoders (VAEs) and their many variants have shown an impressive ability to perform dimensionality reduction, often achieving state-of-the-art performance. Many current methods, however, struggle to learn good representations in High-Dimensional, Low-Sample-Size (HDLSS) tasks, an inherently challenging setting. We address this challenge with a novel divide-and-conquer approach: an ensemble of lightweight VAEs learns posteriors over subsets of the feature space, and these posteriors are aggregated into a joint posterior. Specifically, we present an alternative factorisation of the joint posterior that induces a form of implicit data augmentation, yielding greater sample efficiency. Through a series of experiments on eight real-world datasets, we show that our method learns better latent representations in HDLSS settings, which leads to higher accuracy in a downstream classification task. Furthermore, we verify that our approach has a positive effect on disentanglement and achieves a lower estimated Total Correlation on the learnt representations. Finally, we show that our approach is robust to partial features at inference, exhibiting little performance degradation even when most features are missing.
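
To make the divide-and-conquer idea concrete, here is a minimal sketch of how per-subset posteriors might be combined. The abstract does not state the aggregation rule, so the product of diagonal Gaussian experts used here is an assumption, not the authors' confirmed method. The names `split_features` and `aggregate_posteriors`, the contiguous feature split, and the standard normal prior expert are all hypothetical choices made for illustration.

```python
import numpy as np


def split_features(n_features, n_experts):
    """Partition feature indices into contiguous subsets, one per expert.

    Assumption: contiguous, near-equal subsets. The abstract does not say
    how the feature space is divided.
    """
    return np.array_split(np.arange(n_features), n_experts)


def aggregate_posteriors(mus, logvars, available=None, include_prior=True):
    """Combine K diagonal Gaussian posteriors by multiplying their densities.

    Assumption: a product-of-experts rule. For Gaussians, precisions add and
    the joint mean is the precision-weighted average of the expert means.

    mus, logvars: array-likes of shape (K, latent_dim), one row per expert.
    available:    optional boolean mask of shape (K,); experts whose feature
                  subset is missing at inference are simply left out.
    """
    mus = np.asarray(mus, dtype=float)
    precisions = np.exp(-np.asarray(logvars, dtype=float))

    if available is not None:
        keep = np.asarray(available, dtype=bool)
        mus, precisions = mus[keep], precisions[keep]

    if include_prior:
        # Standard normal prior acts as one extra expert (mean 0, precision 1),
        # which also keeps the result defined when every expert is dropped.
        latent_dim = mus.shape[1]
        mus = np.vstack([np.zeros((1, latent_dim)), mus])
        precisions = np.vstack([np.ones((1, latent_dim)), precisions])

    joint_precision = precisions.sum(axis=0)
    joint_mu = (precisions * mus).sum(axis=0) / joint_precision
    joint_var = 1.0 / joint_precision
    return joint_mu, joint_var


if __name__ == "__main__":
    subsets = split_features(n_features=10, n_experts=3)
    print([s.tolist() for s in subsets])

    # Two experts with unit variance and different means.
    mu, var = aggregate_posteriors([[1.0, 1.0], [3.0, 3.0]],
                                   [[0.0, 0.0], [0.0, 0.0]],
                                   include_prior=False)
    print(mu, var)
```

Under this assumed rule, dropping an expert only removes one term from the precision sum, which is one plausible reading of why such an ensemble could tolerate missing feature subsets at inference.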
