Minimization of Gini impurity via connections with the k-means problem

09/28/2018
by Eduardo Sany Laber, et al.

The Gini impurity is one of the measures used to select attributes when constructing decision trees and random forests. In this note we discuss connections between the problem of computing the partition with minimum weighted Gini impurity and the k-means clustering problem. Based on these connections, we show that computing the partition with minimum weighted Gini impurity is an NP-complete problem, and we discuss how to obtain new algorithms with provable approximation guarantees for the Gini minimization problem.
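As a rough illustration of why such a connection is plausible, the sketch below computes the weighted Gini impurity of a partition and compares it with a k-means-style cost. It assumes the common definition in which a group of n labelled examples contributes n * (1 - sum_i p_i^2), with p_i the fraction of examples in class i, and it uses the elementary fact that this equals the sum of squared distances from the one-hot encoded labels to their centroid. The function names (`weighted_gini`, `partition_weighted_gini`, `one_hot_sse`) are hypothetical, and this is not necessarily the construction used by the authors.

```python
from collections import Counter


def weighted_gini(labels):
    """Weighted Gini impurity of one group: n * (1 - sum_i p_i^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return n * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def partition_weighted_gini(groups):
    """Sum of the weighted Gini impurities over all groups of a partition."""
    return sum(weighted_gini(g) for g in groups)


def one_hot_sse(labels, classes):
    """Sum of squared distances from one-hot labels to their centroid
    (the cost that k-means assigns to a single cluster)."""
    n = len(labels)
    if n == 0:
        return 0.0
    index = {c: i for i, c in enumerate(classes)}
    # Centroid of the one-hot vectors is the vector of class frequencies.
    centroid = [0.0] * len(classes)
    for y in labels:
        centroid[index[y]] += 1.0 / n
    sse = 0.0
    for y in labels:
        for i, m in enumerate(centroid):
            x = 1.0 if i == index[y] else 0.0
            sse += (x - m) ** 2
    return sse


if __name__ == "__main__":
    # Toy partition of seven labelled examples into two groups (made-up data).
    groups = [["a", "a", "b"], ["b", "b", "b", "c"]]
    classes = ["a", "b", "c"]
    print(partition_weighted_gini(groups))
    print(sum(one_hot_sse(g, classes) for g in groups))
```

On this toy partition the two printed values agree up to floating-point error, which is the sense in which minimizing weighted Gini resembles minimizing a k-means objective.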


