Features Compression based on Counterfactual Analysis
Counterfactual Explanations are becoming a de facto standard in post-hoc interpretable machine learning. For a given classifier and an instance classified in an undesired class, a counterfactual explanation is a small perturbation of that instance that changes the classification outcome. This work leverages Counterfactual Explanations to detect the important decision boundaries of a pre-trained black-box model. This information is used to build a supervised discretization of the dataset's features with tunable granularity. A small, interpretable Decision Tree is then trained on the discretized dataset, and the resulting tree is stable and robust. Numerical results on real-world datasets show the effectiveness of the approach.
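To make the discretization idea concrete, the following is a minimal sketch and not the authors' procedure. It assumes that counterfactuals have already been computed for a set of instances, that the value a counterfactual assigns to a changed feature can serve as a candidate threshold, and that granularity is tuned by keeping only the most frequently hit thresholds per feature. The function names and the `keep_fraction` parameter are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def candidate_thresholds(instances, counterfactuals, tol=1e-9):
    """Count, per feature index, the values that counterfactuals moved that feature to.

    Assumption: a feature changed by a counterfactual signals a decision
    boundary of the black-box model near the new value.
    """
    counts = {}
    for x, x_cf in zip(instances, counterfactuals):
        for j, (a, b) in enumerate(zip(x, x_cf)):
            if abs(a - b) > tol:
                counts.setdefault(j, Counter())[round(b, 6)] += 1
    return counts


def select_thresholds(counts, keep_fraction):
    """Keep the most frequent thresholds per feature (hypothetical granularity knob)."""
    selected = {}
    for j, counter in counts.items():
        k = max(1, round(keep_fraction * len(counter)))
        selected[j] = sorted(t for t, _ in counter.most_common(k))
    return selected


def discretize(x, thresholds):
    """Map each feature value to the index of the interval it falls in.

    Features with no selected threshold collapse to a single bin (index 0).
    """
    return [bisect_right(thresholds.get(j, []), v) for j, v in enumerate(x)]


# Toy data: two features, three instances with precomputed counterfactuals.
instances = [[1.0, 5.0], [2.0, 5.0], [0.5, 7.0]]
counterfactuals = [[3.0, 5.0], [3.0, 5.0], [0.5, 4.0]]

counts = candidate_thresholds(instances, counterfactuals)
thresholds = select_thresholds(counts, keep_fraction=1.0)
binned = [discretize(x, thresholds) for x in instances]
```

The binned rows could then be passed to any decision tree learner. Lowering `keep_fraction` yields fewer thresholds per feature and therefore a coarser discretization.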