Trading Complexity for Sparsity in Random Forest Explanations

08/11/2021
by Gilles Audemard, et al.

Random forests have long been considered powerful model ensembles in machine learning. By training multiple decision trees, whose diversity is fostered through data and feature subsampling, the resulting random forest can lead to more stable and reliable predictions than a single decision tree. This, however, comes at the cost of decreased interpretability: while decision trees are often easily interpretable, the predictions made by random forests are much more difficult to understand, as they involve a majority vote over hundreds of decision trees. In this paper, we examine different types of reasons that explain "why" an input instance is classified as positive or negative by a Boolean random forest. Notably, as an alternative to sufficient reasons, which take the form of prime implicants of the random forest, we introduce majoritary reasons, which are prime implicants of a strict majority of decision trees. For these different abductive explanations, the tractability of the generation problem (finding one reason) and of the minimization problem (finding one shortest reason) is investigated. Experiments conducted on various datasets reveal the existence of a trade-off between runtime complexity and sparsity. Sufficient reasons, for which the identification problem is DP-complete, are slightly larger than majoritary reasons, which can be generated using a simple linear-time greedy algorithm, and significantly larger than minimal majoritary reasons, which can be approached using an anytime Partial MaxSAT algorithm.
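The greedy procedure mentioned in the abstract can be sketched as follows: start from the term describing the instance, tentatively drop each literal, and keep it out only if the remaining term is still an implicant of a strict majority of the trees. The sketch below is illustrative only and is not the authors' code; the tree encoding, function names, and dict-based instance representation are assumptions made for the example, with each decision tree given as nested tuples (feature, zero-branch, one-branch) over Boolean features and leaves in {0, 1}.

```python
def implies_tree(term, tree):
    """True iff the partial assignment `term` forces `tree` to output 1,
    i.e. every leaf reachable under `term` is a 1-leaf."""
    if tree in (0, 1):
        return tree == 1
    feat, zero_branch, one_branch = tree
    if feat in term:
        return implies_tree(term, one_branch if term[feat] == 1 else zero_branch)
    # Feature not constrained by the term: both branches must be forced to 1.
    return implies_tree(term, zero_branch) and implies_tree(term, one_branch)

def is_majoritary(term, forest):
    """True iff `term` is an implicant of a strict majority of the trees."""
    votes = sum(implies_tree(term, t) for t in forest)
    return 2 * votes > len(forest)

def greedy_majoritary_reason(instance, forest):
    """Greedy, linear number of implicant checks: drop literals of the
    instance one by one while the term remains a majoritary implicant.
    Assumes the forest classifies `instance` positively, so the full term
    is already an implicant of a strict majority of trees."""
    term = dict(instance)
    for feat in list(instance):
        value = term.pop(feat)      # tentatively drop the literal
        if not is_majoritary(term, forest):
            term[feat] = value      # restore it if the strict majority is lost
    return term

# Toy usage: three stumps/trees over Boolean features x0, x1, x2.
forest = [
    (0, 0, 1),            # votes 1 iff x0 = 1
    (1, 0, 1),            # votes 1 iff x1 = 1
    (0, 0, (1, 0, 1)),    # votes 1 iff x0 = 1 and x1 = 1
]
x = {0: 1, 1: 1, 2: 0}
print(greedy_majoritary_reason(x, forest))   # -> {0: 1, 1: 1}
```

On this toy forest, the greedy pass keeps x0 = 1 and x1 = 1 but drops x2 = 0, since the reduced term still implies all three trees; dropping literals can only shrink the set of implied trees, which is why a single pass already yields a prime (majoritary) reason.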


