Revisiting column-generation-based matheuristic for learning classification trees

08/22/2023
by   Krunal Kishor Patel, et al.
0

Decision trees are highly interpretable models for solving classification problems in machine learning (ML). The standard ML algorithms for training decision trees are fast but generate suboptimal trees in terms of accuracy. Other discrete optimization models in the literature address the optimality problem but only work well on relatively small datasets. <cit.> proposed a column-generation-based heuristic approach for learning decision trees. This approach improves scalability and can work with large datasets. In this paper, we describe improvements to this column generation approach. First, we modify the subproblem model to significantly reduce the number of subproblems in multiclass classification instances. Next, we show that the data-dependent constraints in the master problem are implied, and use them as cutting planes. Furthermore, we describe a separation model to generate data points for which the linear programming relaxation solution violates their corresponding constraints. We conclude by presenting computational results that show that these modifications result in better scalability.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/15/2018

Constructing classification trees using column generation

This paper explores the use of Column Generation (CG) techniques in cons...
research
06/16/2020

Toward Theory of Applied Learning. What is Machine Learning?

Various existing approaches to formalize machine learning (ML) problem a...
research
03/26/2017

Structured Learning of Tree Potentials in CRF for Image Segmentation

We propose a new approach to image segmentation, which exploits the adva...
research
04/15/2017

Machine Learning and the Future of Realism

The preceding three decades have seen the emergence, rise, and prolifera...
research
08/09/2022

Explainable prediction of Qcodes for NOTAMs using column generation

A NOtice To AirMen (NOTAM) contains important flight route related infor...
research
11/09/2020

Binary Matrix Factorisation via Column Generation

Identifying discrete patterns in binary data is an important dimensional...
research
02/03/2021

A Scalable Two Stage Approach to Computing Optimal Decision Sets

Machine learning (ML) is ubiquitous in modern life. Since it is being de...

Please sign up or login with your details

Forgot password? Click here to reset