Strict Very Fast Decision Tree: a memory conservative algorithm for data stream mining

Dealing with memory and time constraints is a current challenge when learning from massive data streams. Many algorithms have been proposed to handle these difficulties, among them the Very Fast Decision Tree (VFDT) algorithm. Although the VFDT has been widely used in data stream mining, in recent years several authors have suggested modifications to increase its performance, setting aside memory concerns by proposing memory-costly solutions. Moreover, most data stream mining solutions have centred on ensembles, which combine the memory costs of their weak learners, usually VFDTs. To reduce memory cost while maintaining predictive performance, this study proposes the Strict VFDT (SVFDT), a novel algorithm based on the VFDT. The SVFDT algorithm minimises unnecessary tree growth, substantially reducing memory usage while keeping predictive performance competitive. Furthermore, since it creates much shallower trees than the VFDT, the SVFDT can achieve a shorter processing time. Experiments compared the SVFDT with the VFDT on 11 benchmark data stream datasets, assessing the trade-off between accuracy, memory, and processing time. Statistical analysis showed that the proposed algorithm obtained similar predictive performance while significantly reducing processing time and memory use. Thus, the SVFDT is a suitable option for data stream mining under memory and time limitations, and it is recommended as a weak learner in ensemble-based solutions.
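For context on what "tree growth" means here, the following is a minimal sketch of the split test used by a standard VFDT (Hoeffding tree), which decides when a leaf becomes an internal node. It is not the SVFDT's rule: the abstract does not describe the additional constraints the SVFDT applies, so none are shown. The function names, the parameter names, and the tie-breaking threshold value are assumptions made for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split heuristic, delta is the
    allowed error probability, n is the number of instances seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Standard VFDT test (illustrative; tie_threshold value is an assumption).

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon is so small that the two are effectively tied.
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # Clear winner after 200 instances: the gap exceeds epsilon.
    print(should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=200))
    # Close candidates after 200 instances: keep collecting data.
    print(should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=200))
```

Every split that passes this test adds nodes and per-leaf statistics, which is why limiting splits, as the SVFDT aims to do, translates directly into lower memory use.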

research
07/16/2019

Online Local Boosting: improving performance in online decision trees

As more data are produced each day, and faster, data stream mining is gr...
research
08/03/2018

Hoeffding Trees with nmin adaptation

Machine learning software accounts for a significant amount of energy co...
research
05/06/2022

Green Accelerated Hoeffding Tree

State-of-the-art machine learning solutions mainly focus on creating hig...
research
12/27/2016

Distributed Real-Time Sentiment Analysis for Big Data Social Streams

Big data trend has enforced the data-centric systems to have continuous ...
research
12/11/2020

Hard-ODT: Hardware-Friendly Online Decision Tree Learning Algorithm and System

Decision trees are machine learning models commonly used in various appl...
research
04/03/2020

Unpack Local Model Interpretation for GBDT

A gradient boosting decision tree (GBDT), which aggregates a collection ...
research
04/23/2015

Use of Ensembles of Fourier Spectra in Capturing Recurrent Concepts in Data Streams

In this research, we apply ensembles of Fourier encoded spectra to captu...