When are Deep Networks really better than Random Forests at small sample sizes?

08/31/2021
by Haoyin Xu, et al.

Random forests (RF) and deep networks (DN) are two of the most popular machine learning methods in the current scientific literature and yield differing levels of performance on different data modalities. We wish to further explore and establish the conditions and domains in which each approach excels, particularly in the context of sample size and feature dimension. To address these issues, we tested the performance of these approaches across tabular, image, and audio settings using varying model parameters and architectures. Our focus is on datasets with at most 10,000 samples, which represent a large fraction of scientific and biomedical datasets. In general, we found RF to excel at tabular and structured data (image and audio) with small sample sizes, whereas DN performed better on structured data with larger sample sizes. Although we plan to continue updating this technical report in the coming months, we believe the current preliminary results may be of interest to others.
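The comparison described in the abstract, RF versus DN as the training set grows, can be illustrated with a small learning-curve experiment. The following is a minimal sketch and not the authors' setup: it assumes scikit-learn, uses synthetic tabular data from `make_classification`, and stands in a small `MLPClassifier` for a deep network. The function name `compare_by_sample_size`, the sample sizes, and all hyperparameters are hypothetical choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def compare_by_sample_size(sample_sizes=(100, 1000, 10000), n_features=20, seed=0):
    """Return {n_train: {"rf": accuracy, "dn": accuracy}} on synthetic data.

    Assumption: synthetic tabular data and an MLP stand in for the real
    datasets and deep-network architectures used in the report.
    """
    n_test = 2000
    # Generate enough points for the largest training set plus a fixed test set.
    X, y = make_classification(
        n_samples=max(sample_sizes) + n_test,
        n_features=n_features,
        n_informative=10,
        random_state=seed,
    )
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=n_test, random_state=seed, stratify=y
    )

    results = {}
    for n in sample_sizes:
        # Nested training subsets: each larger set contains the smaller ones.
        X_train, y_train = X_pool[:n], y_pool[:n]
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        dn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300, random_state=seed)
        results[n] = {
            "rf": rf.fit(X_train, y_train).score(X_test, y_test),
            "dn": dn.fit(X_train, y_train).score(X_test, y_test),
        }
    return results


if __name__ == "__main__":
    for n, scores in compare_by_sample_size().items():
        print(f"n={n:>6}  RF={scores['rf']:.3f}  DN={scores['dn']:.3f}")
```

Both models are scored on the same held-out test set, so differences across rows reflect training-set size only. Results on synthetic data say nothing about the report's findings; swapping in a real tabular, image, or audio dataset would be needed to probe the claims above.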

research
04/07/2023

HyperTab: Hypernetwork Approach for Deep Learning on Small Tabular Datasets

Deep learning has achieved impressive performance in many domains, such ...
research
06/01/2022

Hopular: Modern Hopfield Networks for Tabular Data

While Deep Learning excels in structured data as encountered in vision a...
research
07/27/2021

Spatial prediction of apartment rent using regression-based and machine learning-based approaches with a large dataset

Employing a large dataset (at most, the order of n = 10^6), this study a...
research
04/23/2019

Regression-Enhanced Random Forests

Random forest (RF) methodology is one of the most popular machine learni...
research
12/13/2019

Systematic Overestimation of Machine Learning Performance in Neuroimaging Studies of Depression

We currently observe a disconcerting phenomenon in machine learning stud...
research
05/27/2023

Learning Capacity: A Measure of the Effective Dimensionality of a Model

We exploit a formal correspondence between thermodynamics and inference,...
research
09/25/2019

Manifold Forests: Closing the Gap on Neural Networks

Decision forests (DF), in particular random forests and gradient boostin...
