Sentence-Based Model Agnostic NLP Interpretability

by Yves Rychener et al.

Today, interpretability of black-box Natural Language Processing (NLP) models based on surrogates, such as LIME or SHAP, uses word-based sampling to build the explanations. In this paper we explore the use of sentences to tackle NLP interpretability. While this choice may seem straightforward, we show that, when using complex classifiers like BERT, the word-based approach raises issues not only of computational complexity but also of out-of-distribution sampling, eventually leading to unfounded explanations. By using sentences, the altered text remains in-distribution, and the dimensionality of the problem is reduced, yielding better fidelity to the black box at comparable computational complexity.
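To make the idea concrete, here is a minimal sketch of what sentence-level surrogate sampling could look like, in the spirit of LIME. This is an illustrative assumption, not the paper's implementation: `explain_by_sentences`, the regex sentence splitter, and the toy `black_box` interface are all hypothetical names introduced here. Perturbed texts drop whole sentences rather than individual words, so each sample stays grammatical and in-distribution, and the surrogate has only as many features as there are sentences.

```python
import re
import numpy as np

def explain_by_sentences(text, black_box, n_samples=200, seed=0):
    """LIME-style linear surrogate over sentence presence/absence.

    `black_box` maps a string to a score (e.g. class probability).
    Sentence splitting uses a naive regex; a real pipeline would use
    a proper sentence tokenizer (e.g. spaCy or NLTK).
    """
    rng = np.random.default_rng(seed)
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    d = len(sentences)
    # Sample binary masks: 1 keeps a sentence, 0 removes it.
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1  # always include the unperturbed text
    preds = np.array([
        black_box(' '.join(s for s, m in zip(sentences, mask) if m))
        for mask in masks
    ])
    # Fit a least-squares linear surrogate; its coefficients serve as
    # per-sentence importance scores.
    X = np.hstack([masks, np.ones((n_samples, 1))])  # add intercept column
    coefs, *_ = np.linalg.lstsq(X.astype(float), preds, rcond=None)
    return list(zip(sentences, coefs[:d]))
```

With a two-sentence input, the surrogate has only two features, whereas word-based sampling over the same text would already need a dozen; this is the dimensionality reduction the abstract refers to.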





