HyperTab: Hypernetwork Approach for Deep Learning on Small Tabular Datasets

04/07/2023
by Witold Wydmański, et al.

Deep learning has achieved impressive performance in many domains, such as computer vision and natural language processing, but its advantage over classical shallow methods on tabular datasets remains questionable. It is especially challenging to surpass the performance of tree-based ensembles, such as XGBoost or Random Forests, on small datasets (fewer than 1k samples). To tackle this challenge, we introduce HyperTab, a hypernetwork-based approach to solving small-sample problems on tabular datasets. By combining the advantages of Random Forests and neural networks, HyperTab generates an ensemble of neural networks, where each target model is specialized to process a specific lower-dimensional view of the data. Since each view plays the role of data augmentation, we virtually increase the number of training samples while keeping the number of trainable parameters unchanged, which prevents model overfitting. We evaluated HyperTab on more than 40 tabular datasets with varying numbers of samples and domains of origin, and compared its performance with shallow and deep learning models representing the current state of the art. We show that HyperTab consistently outranks other methods on small data (with a statistically significant difference) and performs comparably to them on larger datasets. A Python package with the code is available for download at https://pypi.org/project/hypertab/
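
The idea of one hypernetwork generating an ensemble of target networks, each tied to a lower-dimensional view of the data, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and does not use the `hypertab` package API. It is an untrained forward pass, and it rests on assumptions made only for illustration: each view is a random binary feature mask, the mask itself is the hypernetwork input, and the target network is a single linear layer. All names and sizes (`view_size`, `n_views`, `hidden`, `W1`, `W2`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_classes = 8, 3
view_size = 4   # features kept in each lower-dimensional view (assumed)
n_views = 5     # ensemble size (assumed)
hidden = 16     # hypernetwork hidden width (assumed)

# Target network (assumed): one linear layer, view_size -> n_classes.
n_target_params = view_size * n_classes + n_classes

# Hypernetwork weights: the only trainable parameters in this sketch.
# Their count does not depend on n_views.
W1 = rng.normal(scale=0.1, size=(n_features, hidden))
W2 = rng.normal(scale=0.1, size=(hidden, n_target_params))

def sample_mask():
    """Binary mask selecting view_size of the n_features columns."""
    mask = np.zeros(n_features)
    mask[rng.choice(n_features, size=view_size, replace=False)] = 1.0
    return mask

def target_logits(X, mask):
    """Generate target-network weights from the mask and apply them to the view."""
    params = np.tanh(mask @ W1) @ W2
    W = params[: view_size * n_classes].reshape(view_size, n_classes)
    b = params[view_size * n_classes:]
    X_view = X[:, mask.astype(bool)]  # lower-dimensional view of the data
    return X_view @ W + b

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ensemble_predict_proba(X, masks):
    """Average class probabilities over the target networks of all views."""
    return np.mean([softmax(target_logits(X, m)) for m in masks], axis=0)

masks = [sample_mask() for _ in range(n_views)]
X = rng.normal(size=(10, n_features))
proba = ensemble_predict_proba(X, masks)
```

In this sketch, adding more views enlarges the ensemble without adding trainable weights, since every target network is produced by the same `W1` and `W2`. Training the hypernetwork (for example, by backpropagating a classification loss through the generated weights) is omitted here.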

research
06/01/2022

Hopular: Modern Hopfield Networks for Tabular Data

While Deep Learning excels in structured data as encountered in vision a...
research
08/31/2021

When are Deep Networks really better than Random Forests at small sample sizes?

Random forests (RF) and deep networks (DN) are two of the most popular m...
research
08/06/2021

Ensemble Augmentation for Deep Neural Networks Using 1-D Time Series Vibration Data

Time-series data are one of the fundamental types of raw data representa...
research
09/16/2023

Improve Deep Forest with Learnable Layerwise Augmentation Policy Schedule

As a modern ensemble technique, Deep Forest (DF) employs a cascading str...
research
10/18/2019

Automatic Data Augmentation by Learning the Deterministic Policy

Aiming to produce sufficient and diverse training samples, data augmenta...
research
08/08/2020

Unravelling Small Sample Size Problems in the Deep Learning World

The growth and success of deep learning approaches can be attributed to ...
research
10/29/2020

Analyzing the tree-layer structure of Deep Forests

Random forests on the one hand, and neural networks on the other hand, h...
