Iterative Rule Extension for Logic Analysis of Data: an MILP-based heuristic to derive interpretable binary classification from large datasets

10/25/2021
by Marleen Balvert, et al.

Data-driven decision making is rapidly gaining popularity. It is fueled by the ever-increasing amounts of available data and encouraged by the development of models that can identify input-output relationships beyond linear ones. At the same time, the need for interpretable prediction and classification methods is increasing, because interpretability improves both our trust in these models and the amount of information we can extract from data. An important aspect of this interpretability is gaining insight into the sensitivity-specificity trade-off constituted by multiple plausible input-output relationships, which is often shown in a receiver operating characteristic (ROC) curve. Together, these developments call for a method that can extract complex yet interpretable input-output relationships from large data, i.e. data containing large numbers of samples and sample features. Boolean phrases in disjunctive normal form (DNF) are highly suitable for explaining non-linear input-output relationships in a comprehensible way. Mixed integer linear programming (MILP) can be used to extract these Boolean phrases from binary data, though its computational complexity prohibits the analysis of large datasets. This work presents IRELAND, an algorithm that can extract Boolean phrases in DNF from data with up to 10,000 samples and sample characteristics. The results show that for large datasets IRELAND outperforms the current state of the art and can find solutions for datasets where current models run out of memory or need excessive runtimes. Additionally, by construction IRELAND allows for efficient computation of the sensitivity-specificity trade-off curve, providing further understanding of the underlying input-output relationship.
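
The abstract does not spell out IRELAND's MILP formulation. The sketch below illustrates only two background ideas it relies on: how a DNF phrase classifies binary samples, and how a sequence of progressively extended phrases yields sensitivity-specificity points. It is not the paper's algorithm. The literal encoding, the function names (`clause_holds`, `dnf_predict`, `sensitivity_specificity`, `tradeoff_curve`), the toy data, and the hand-picked clauses are all hypothetical; no MILP is solved here.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding (not from the paper): a literal (j, v) means
# "feature j equals v"; a clause is an AND of literals; a DNF phrase is an OR of clauses.
Literal = Tuple[int, int]
Clause = List[Literal]


def clause_holds(clause: Clause, sample: Sequence[int]) -> bool:
    """True if every literal of the clause matches the binary sample."""
    return all(sample[j] == v for j, v in clause)


def dnf_predict(dnf: List[Clause], sample: Sequence[int]) -> int:
    """Return 1 if at least one clause holds; an empty DNF always returns 0."""
    return int(any(clause_holds(c, sample) for c in dnf))


def sensitivity_specificity(
    dnf: List[Clause], X: Sequence[Sequence[int]], y: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for sample, label in zip(X, y):
        pred = dnf_predict(dnf, sample)
        if label == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def tradeoff_curve(
    clauses: List[Clause], X: Sequence[Sequence[int]], y: Sequence[int]
) -> List[Tuple[float, float]]:
    """(sensitivity, specificity) of the DNF built from the first k clauses, k = 0..len(clauses).

    Adding a clause to a disjunction can only switch predictions from 0 to 1,
    so sensitivity never decreases and specificity never increases along the list.
    """
    return [sensitivity_specificity(clauses[:k], X, y) for k in range(len(clauses) + 1)]


# Toy binary data (3 features) and hand-picked clauses, purely illustrative.
X = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 1], [1, 0, 0], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]
clauses = [
    [(0, 1), (1, 1)],  # x0 AND x1
    [(0, 1), (2, 1)],  # x0 AND x2
    [(2, 1)],          # x2
]

curve = tradeoff_curve(clauses, X, y)
for k, (sens, spec) in enumerate(curve):
    print(f"{k} clause(s): sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data the script prints sensitivities 0.00, 0.33, 0.67, 1.00 with specificities 1.00, 1.00, 1.00, 0.67. Each prefix gives one point on an ROC-style curve (sensitivity against 1 - specificity). Learning the clauses themselves from data, which is what the MILP-based approach addresses, is outside the scope of this sketch.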
