Computing the Collection of Good Models for Rule Lists

by Kota Mata, et al.

Since the seminal 2001 paper by Breiman, which pointed out the potential harm of predictive multiplicity from the viewpoint of explainable AI, global analysis of the collection of all good models, also known as a "Rashomon set," has attracted much attention in recent years. Because finding such a set of good models is a hard computational problem, only a few algorithms for the problem exist so far, most of which are either approximate or incomplete. To overcome this difficulty, we study efficient enumeration of all good models for a subclass of interpretable models called rule lists. Building on CORELS, a state-of-the-art optimal rule list learner proposed by Angelino et al. in 2017, we present an efficient enumeration algorithm, CorelsEnum, that exactly computes the set of all good models using space polynomial in the input size, given a dataset and an error tolerance from an optimal model. In experiments with the COMPAS dataset on recidivism prediction, CorelsEnum successfully enumerated all of the several tens of thousands of good rule lists of length at most ℓ = 3 in around 1,000 seconds, while a state-of-the-art top-K rule list learner based on Lawler's method combined with CORELS, proposed by Hara and Ishihata in 2018, found only 40 models before the timeout of 6,000 seconds. For global analysis, we conducted experiments to characterize the Rashomon set and observed a large diversity of models in terms of predictive multiplicity and fairness.
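To make the notion of "all good models within an error tolerance of an optimal model" concrete, the following is a minimal brute-force sketch in Python. It is not CorelsEnum and uses none of the CORELS bounds or the polynomial-space enumeration described above. It assumes binary features, single-feature antecedents, binary labels, and plain misclassification count as the objective (no length regularization). All names (`predict`, `enumerate_rule_lists`, `rashomon_set`) and the toy dataset are hypothetical and chosen only for illustration.

```python
from itertools import permutations, product


def predict(rules, default, x):
    """Return the label of the first rule whose feature is set in x, else the default."""
    for feature, label in rules:
        if x[feature]:
            return label
    return default


def mistakes(rules, default, X, y):
    """Count misclassified examples for one rule list."""
    return sum(predict(rules, default, x) != t for x, t in zip(X, y))


def enumerate_rule_lists(n_features, max_len):
    """Yield every rule list with at most max_len rules (assumed model class)."""
    for k in range(max_len + 1):
        for feats in permutations(range(n_features), k):
            for labels in product((0, 1), repeat=k):
                for default in (0, 1):
                    yield tuple(zip(feats, labels)), default


def rashomon_set(X, y, max_len, tolerance):
    """Return (best mistake count, all models within `tolerance` error of the best).

    `tolerance` is a fraction of the dataset size; the comparison is done on
    integer mistake counts to avoid floating-point surprises.
    """
    scored = [
        (mistakes(rules, default, X, y), rules, default)
        for rules, default in enumerate_rule_lists(len(X[0]), max_len)
    ]
    best = min(w for w, _, _ in scored)
    budget = tolerance * len(y) + 1e-9
    good = [m for m in scored if m[0] - best <= budget]
    return best, good


# Toy data (hypothetical): feature 0 determines the label exactly.
X = [(1, 0), (1, 1), (0, 1), (0, 0)]
y = [1, 1, 0, 0]

best_wrong, exact_good = rashomon_set(X, y, max_len=2, tolerance=0.0)
_, loose_good = rashomon_set(X, y, max_len=2, tolerance=0.25)
_, all_models = rashomon_set(X, y, max_len=2, tolerance=1.0)
```

This exhaustive scan grows exponentially with the number of features and the maximum length, which is exactly the cost that a branch-and-bound enumerator is meant to avoid; the sketch only pins down what set is being computed.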


