
Logic Constraints to Feature Importances

In recent years, Artificial Intelligence (AI) algorithms have been shown to outperform traditional statistical methods in terms of predictive power, especially when a large amount of data is available. Nevertheless, the "black box" nature of AI models often limits their reliable application in high-stakes fields such as diagnostic techniques and autonomous driving. Recent works have shown that an adequate level of interpretability can strengthen the more general concept of model trustworthiness. The basic idea of this paper is to exploit human prior knowledge of the features' importance for a specific task in order to coherently guide the model fitting phase. This sort of "weighted" AI is obtained by extending the empirical loss with a regularization term that encourages the feature importances to satisfy predetermined constraints. The procedure relies on local methods for computing feature importance, e.g. LRP, LIME, etc., which link the model weights to be optimized to the user-defined constraints on feature importance. In the fairness area, promising experimental results have been obtained for the Adult dataset. Many other possible applications of this model-agnostic theoretical framework are described.



1 Introduction

The trustworthiness of an Artificial Intelligence (AI) model, i.e. the stability of its performance under many possible future scenarios, is a fundamental requirement for real-world applications. The topic has become particularly relevant in recent decades, since technological development and data availability have led to the adoption of increasingly complex models, widening the gap between performance on train/test data and the reliability of the models.

Model trustworthiness is usually linked to other factors, including the interpretability of the algorithm, the stationarity of the data, possible bias in the data, etc. ([lipton2018mythos], [ribeiro2016should]). Especially in the field of interpretability, much work has been done to explain and interpret AI models in a human-comprehensible manner. The main reason behind these efforts is that human experience and capacity for abstraction make it possible to monitor the model's decision process in a sound way, mitigating the risks of data-driven models.

However, an effective interaction between the model and the human is still lacking, and most current machine learning approaches tend to rely too heavily on training/testing data. On the other hand, sources of knowledge such as domain knowledge, expert opinions, and understanding from related problems could be very important for a better definition of the model.

Here we present a novel framework that tries to bridge the gap between data-driven optimization and high-level human domain knowledge. The approach incorporates the human understanding of the relevance/importance of the input features. The basic idea is to extend the empirical loss with a regularization term depending on the constraints defined by the a priori knowledge of the importance of the features. We provide experimental results on the fairness topic.

2 Bibliographic review

There are few existing feature weighting approaches aimed at improving the performance of machine learning models. In [al2011empirical] the author exploits weak domain knowledge in the form of feature importance to help the learning of Importance-Aided Neural Networks (IANN). The feature importance is based on the absolute weight of the first hidden layer neurons of the network. IANN is successfully applied in [diersen2011classification]. In [zhang2010ontology] an ontology-based clustering algorithm is introduced along with a feature weighting mechanism able to reflect the different features' importance. [peng2018novel] uses both correlation and mutual information to weight the features for the SVM, KNN, and Naive Bayes algorithms. Instead, in order to accelerate the learning process, in [iqbal2011using] the algorithm is required to match the correlation between the features and the predictive function with the empirical correlation.

However, none of the previous works defines a general framework that includes knowledge of the importance of the input features within explainable machine learning.

3 Mathematical setting of feature importance

In this section, we review the existing approaches aimed at assigning an importance score for each feature of a given input example in relation to the task of the model, i.e. the so-called local explainable methodologies. It is worth mentioning that the importance of a feature is one of the most used strategies to gain local explainability from an opaque machine learning model.

Let us consider a predictor function $f$ going from the $d$-dim feature space $X \subseteq \mathbb{R}^d$ to the 1-dim target space $Y \subseteq \mathbb{R}$:

$$f: X \rightarrow Y.$$

Such a predictor function is the output of a learner:

$$\mathcal{L}: (X \times Y)^N \rightarrow \mathcal{F},$$

able to process a supervised dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in X$ and $y_i \in Y$, with $N$ instances; here $\mathcal{F}$ denotes the space of predictor functions. To fix ideas, we can think of $\mathcal{L}$ as a Deep Neural Network providing the predictor function $f$.

Definition 1 (Local feature importance).

The local feature importance is a function $I$ mapping a predictor $f$ and a single instance $x$ to a $d$-dim vector of real values in the range $[0, 1]$:

$$I: \mathcal{F} \times X \rightarrow [0, 1]^d.$$

The local feature importance is a measure of how much the model relies on each feature for the prediction made by $f$ on the particular pattern $x$. Basically, the quantity $I_j(f, x)$ tells us how much the $j$-th feature contributes with respect to the others for a specific prediction. In the limit cases of $I_j(f, x) = 0$ or $I_j(f, x) = 1$, the feature can be considered respectively useless or the most important one for the prediction made by the predictor $f$ on the pattern $x$.


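Given a linear predictor $f(x) = \sum_{j=1}^{d} w_j x_j$, the function with components, for instance,

$$I_j(f, x) = \frac{|w_j x_j|}{\sum_{k=1}^{d} |w_k x_k|}$$

is a local feature importance function.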
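As a small illustration, one simple way to obtain importances in $[0, 1]$ for a linear predictor is to normalize the absolute contributions $|w_j x_j|$ so that they sum to one. The sketch below assumes this normalization; the function name `linear_importance` and the convention of returning zeros when no feature contributes are illustrative choices, not taken from the text.

```python
def linear_importance(weights, x):
    """Local importance of each feature for a linear predictor f(x) = sum_j w_j * x_j.

    Assumption: importance is the absolute contribution |w_j * x_j|,
    normalized so that the components lie in [0, 1] and sum to one.
    """
    contributions = [abs(w * value) for w, value in zip(weights, x)]
    total = sum(contributions)
    if total == 0.0:
        # No feature contributes: return all zeros by convention (assumption).
        return [0.0] * len(contributions)
    return [c / total for c in contributions]


# The third feature has zero weight, so it is "useless" for this prediction.
importances = linear_importance([2.0, -1.0, 0.0], [1.0, 2.0, 5.0])
print(importances)  # [0.5, 0.5, 0.0]
```

Note that the result is local: the same weights give different importances on a different instance.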

Definition 2 (Local feature importance methods).

Local feature importance methods are methods that, given a predictor $f$ together with its learner and the dataset, compute a local feature importance function $I$.

The existing methodologies for the computation of feature importance are reviewed in [guidotti2018survey] and [arrieta2020explainable]. Permutation feature importance methods quantify the feature importance through the variation of a loss metric when the values of a selected feature are perturbed on a set of instances in the training or validation set. The approach was first introduced in [breiman2001random] for random forests and in [recknagel1997artificial] for neural networks. Other methods, such as class model visualization [simonyan2013deep], compute the partial derivative of the score function with respect to the input, and [montavon2018methods] introduces an expert distribution for the input in activation maximization. In [shrikumar2017learning] the authors introduce DeepLIFT, which computes discrete gradients with respect to a baseline point by backpropagating the scoring difference through each unit. Instead, integrated gradients [sundararajan2016gradients] accumulates the gradients with respect to the inputs along the path from a given baseline to the instance.
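The path accumulation used by integrated gradients can be sketched in a few lines. This is a minimal illustration under stated assumptions: the gradient is approximated by central finite differences instead of backpropagation, the path integral is approximated by a midpoint Riemann sum, and the toy function `score` is hypothetical.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=50):
    """Attribution_i = (x_i - b_i) * average gradient along the straight path b -> x."""
    d = len(x)
    avg_grad = [0.0] * d
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint of the s-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        for i in range(d):
            avg_grad[i] += grad[i] / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def score(z):
    # Hypothetical toy score function: quadratic in z[0], linear in z[1].
    return z[0] ** 2 + 3.0 * z[1]


attributions = integrated_gradients(score, [2.0, 1.0], [0.0, 0.0])
# Completeness check: attributions sum to score(x) - score(baseline).
print(attributions, sum(attributions))
```

For this toy function the attributions are approximately 4 and 3, and their sum matches `score([2, 1]) - score([0, 0]) = 7` up to numerical error.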

A set of well-known methods called Additive Feature Attribution Methods (AFAM), defined in [lundberg2017unified], rely on the redistribution of the predicted value over the input features. They are designed to mimic the behaviour of a predictive function $f$ with a surrogate Boolean linear function $g$. This surrogate function takes values in a space of the transformed vector of the input features, $z' \in \{0, 1\}^M$:

$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j.$$

Keeping the notation introduced above, by defining the importance $I_j(f, x)$ from the attribution $\phi_j$, suitably normalized to $[0, 1]$, we have a local feature importance method.

Among the additive feature attribution methods, the popular LIME (Local Interpretable Model-agnostic Explanations) [ribeiro2016should] builds the linear approximated model with a sampling procedure in the neighborhood of the specific point. By considering proper weights for the linear coefficients of LIME, the authors in [lundberg2017unified] demonstrated that SHAP (SHapley Additive exPlanations) is the unique solution among additive feature attribution methods that satisfies a set of desirable properties (local accuracy, missingness and consistency). This last method builds on the Shapley method introduced in [strumbelj2010efficient] and [vstrumbelj2014explaining] for solving the problem of redistributing a reward (prediction) to a set of players (features) in a coalitional game theory framework. Finally, Layer-wise Relevance Propagation (LRP) [bach2015pixel] backpropagates the prediction along the network, by fixing a redistribution rule based on the weights among the neurons.
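To make the game-theoretic redistribution concrete, the following sketch computes exact Shapley values by enumerating all coalitions of features. It is only feasible for a handful of features, and it assumes a simplified value function in which features outside the coalition are replaced by a fixed baseline (SHAP itself averages over a background distribution). The function `shapley_values` and the toy model are illustrative names.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of the features of x for the prediction f(x).

    Assumption: a feature absent from a coalition takes its baseline value.
    """
    d = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(d)]
        return f(point)

    phi = [0.0] * d
    for j in range(d):
        others = [i for i in range(d) if i != j]
        for size in range(d):
            # Shapley weight of a coalition of this size not containing j.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                members = set(subset)
                phi[j] += weight * (value(members | {j}) - value(members))
    return phi


def model(z):
    # Hypothetical toy model: additive term plus an interaction term.
    return 2.0 * z[0] + z[1] * z[2]


phi = shapley_values(model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)  # approximately [2.0, 3.0, 3.0]
```

The values add up to `model(x) - model(baseline) = 8`, which is the local accuracy property mentioned above; the interaction term is split equally between the two features that produce it.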

The set of feature importance methods, along with the data type and the model to which each method applies, is reported in Table 1. For a tabular dataset, the feature importances are usually represented as a ranking reported in a histogram. For images or texts, the subset of the input which is mostly responsible for the predictions gives rise to saliency masks; for example, they can be parts of an image or sentences of a text.

| Method | Data Type | Model | Reference | AFAM |
|---|---|---|---|---|
| SHAP | ANY | AGN | [lundberg2017unified] | ✓ |
| LIME | ANY | AGN | [ribeiro2016should] | ✓ |
| Shapley value | TAB | AGN | [strumbelj2010efficient] [vstrumbelj2014explaining] | ✓ |
| Permutation feature importance | ANY | NN/TE | [breiman2001random] [recknagel1997artificial] | - |
| Class model visualization | IMG | NN | [simonyan2013deep] | - |
| Activation maximization | IMG | NN | [montavon2018methods] | - |
| LRP | ANY | NN | [bach2015pixel] | ✓ |
| Taylor Decomposition | ANY | NN | [bach2015pixel] | - |
| DeepLift | ANY | NN | [shrikumar2017learning] | ✓ |
| Integrated Gradients | ANY | NN | [sundararajan2016gradients] | - |
| GAM | ANY | AGN | [lou2012intelligible] [lou2013accurate] | - |

Table 1: Review of the best-known feature importance methods in Explainability. TAB: Tabular dataset, IMG: Images, AGN: Model agnostic methodology, NN: Neural Network, TE: Tree Ensemble, AFAM: Additive Feature Attribution Methods.

4 Constraints to feature importance

The overall goal of the present work is to define a framework where the local importance of the model's features can be constrained to specific intervals. We introduce a novel regularization loss term $L_{logic}$, related to the violation of the constraints on the feature importance:

$$L(w) = L_{emp}(w) + L_{logic}\big(I(w)\big) \qquad (1)$$

where $L_{emp}$ is the usual empirical risk loss and $I(w)$ the $d$-dim vector of importances. It is worth observing that in Eq. (1) we made explicit the dependence of the importance on the structure of the black box model via the weights $w$ of the model.

Let us suppose a First Order Logic (FOL) formula with variable $x$ containing an a priori statement with inequalities on the features' importances. For example, we could require that, for every $x$, both the feature $i$ and the feature $j$ should not be important for the prediction function to properly work:

$$\forall x, \quad \big(I_i(x) \le c_i\big) \wedge \big(I_j(x) \le c_j\big) \qquad (2)$$

with thresholds $c_i, c_j \in [0, 1]$.

In order to treat the logic formula with real value functions, each inequality of the FOL formula can be transformed into a new variable through the following transformation:


Although Eq. (3) is a quite natural choice for an increasing function from to , other choices are possible. In Figure 1 we represent the variable of Eq. (3) for the case of a generic feature.

Figure 1: Example of loss as for the inequality on the importance of the constrained feature for .

So, thanks to Eq. (3), the aforementioned FOL formula Eq. (2) can be written as:

Then, we exploit the framework of t-norm fuzzy logic, which generalizes Boolean logic to variables assuming values in $[0, 1]$. We can convert the formula depending on the losses by exploiting a t-norm as follows:

that is an average over the t-norm of the truth degree when grounded over its domain. Then, a loss term can be defined by exploiting the logic constraints, e.g.

where $\lambda$ is the strength of the regularization.

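Finally, the partial derivative of the logic part of the loss $L_{logic}$ with respect to the $k$-th weight $w_k$, through the $j$-th importance loss function $\ell_j$, is

$$\frac{\partial L_{logic}}{\partial w_k} = \frac{\partial L_{logic}}{\partial \ell_j}\,\frac{\partial \ell_j}{\partial I_j}\,\frac{\partial I_j}{\partial w_k}$$

and the derivative remaining to be evaluated is $\partial I_j / \partial w_k$. To summarize, the scheme is the following: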

  1. write the FOL formula depending on the feature importance, in turn, depending on the model weights through the chosen feature importance method;

  2. convert the inequality terms into loss terms;

  3. convert the FOL formula with the t-norm into an overall loss term;

  4. the loss term is optimized in an iterative process by computing the importance at each step of the algorithm.
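The scheme above can be sketched end to end on a toy linear model. Everything specific here is an assumption made for illustration: the importance is the normalized absolute contribution of each feature, each inequality $I_j \le c_j$ is turned into a hinge-type loss $\max(0, I_j - c_j)$, the truth degree is one minus that loss, and the conjunction uses the product t-norm. The source may use different choices for each of these.

```python
def importance(weights, x):
    """Step 1 (assumed method): normalized absolute contributions of a linear model."""
    contributions = [abs(w * v) for w, v in zip(weights, x)]
    total = sum(contributions)
    return [c / total if total > 0 else 0.0 for c in contributions]


def constraint_loss(importance_value, threshold):
    """Step 2 (assumed form): loss in [0, 1] for the inequality I_j <= c_j."""
    return max(0.0, importance_value - threshold)


def logic_loss(weights, instances, thresholds, strength):
    """Step 3: product t-norm of the truth degrees, averaged over the instances."""
    truth_sum = 0.0
    for x in instances:
        imp = importance(weights, x)
        truth = 1.0
        for j, c in thresholds.items():
            truth *= 1.0 - constraint_loss(imp[j], c)
        truth_sum += truth
    return strength * (1.0 - truth_sum / len(instances))


def total_loss(weights, instances, targets, thresholds, strength):
    """Empirical loss (mean squared error) plus the logic regularization term."""
    predictions = [sum(w * v for w, v in zip(weights, x)) for x in instances]
    empirical = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    return empirical + logic_loss(weights, instances, thresholds, strength)


instances = [[1.0, 1.0], [2.0, 2.0]]
targets = [2.0, 4.0]
thresholds = {1: 0.1}  # constrain the importance of feature 1 to be <= 0.1

violating = logic_loss([1.0, 1.0], instances, thresholds, strength=2.0)
satisfying = logic_loss([2.0, 0.0], instances, thresholds, strength=2.0)
print(violating, satisfying)
```

Both weight vectors fit the toy data exactly, but only `[2.0, 0.0]` satisfies the constraint, so it has the lower total loss. Step 4 would minimize `total_loss` iteratively, recomputing the importances at every step; in practice this is done with automatic differentiation rather than by hand.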

5 Fairness through feature importance constraints

Fairness is a natural field where the constraints to feature importance can be applied. In the following, we summarize the principal fairness measures and we discuss how they can be translated by using our proposed scheme based on Constraints to Feature Importance, denoted hereafter as CTFI.

The Demographic Parity (DP) fairness metric is satisfied when, given the random variable $\hat{Y}$ representing the binary predictor and a protected Boolean feature $A$, we have:

$$P(\hat{Y} = 1 \mid A = 0) = P(\hat{Y} = 1 \mid A = 1).$$

DP is a very strong requirement: groups based on a sensitive feature, e.g. black and white, should have the same rate of positive prediction, even if differences between the groups are present.

DP can be translated into a constraint, where the importance of the protected feature $p$ needs to be lower than a given threshold $c_p$:

$$\forall x, \quad I_p(x) \le c_p \qquad (4)$$

The well-known issue of unfairness due to correlated features (see for instance [calders2013unbiased]) can potentially be solved by also setting a constraint for the features that are correlated with the protected one. Naturally, the regularization strength $\lambda_k$ of the $k$-th correlated feature should be lower than the strength $\lambda_p$ used for the protected feature, for instance given by the product of $\lambda_p$ and the Pearson correlation $\rho_{kp}$ between the $k$-th and $p$-th feature, i.e., $\lambda_k = \lambda_p \, \rho_{kp}$:


The advantage of this formulation is that the constraints are smooth between and , and can be used both with binary and continuous features.
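The correlation-weighted strengths can be computed directly from the data. The sketch below is a minimal illustration: the helper names are hypothetical, the toy columns are made up, and the absolute value of the correlation is used so that the strength stays non-negative, which is an assumption added here and not stated in the text.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long lists (both must be non-constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def correlated_strengths(columns, protected, strength):
    """Regularization strength of each non-protected feature:
    strength of the protected feature times |Pearson correlation| with it."""
    reference = columns[protected]
    return {
        name: strength * abs(pearson(values, reference))
        for name, values in columns.items()
        if name != protected
    }


# Made-up toy columns: age is perfectly correlated with gender, job is uncorrelated.
columns = {
    "gender": [0, 1, 0, 1],
    "age": [20, 40, 20, 40],
    "job": [1, 1, 2, 2],
}
strengths = correlated_strengths(columns, protected="gender", strength=10.0)
print(strengths)
```

A feature that is a perfect proxy of the protected one receives the full strength, while an uncorrelated feature is left unconstrained.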

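A measure of discrepancy from DP, which will be useful for the experimental part, is the Disparate Impact (DI), i.e. the ratio between the positive prediction rates of the two groups, taking $A = 0$ as the unprivileged group:

$$DI = \frac{P(\hat{Y} = 1 \mid A = 0)}{P(\hat{Y} = 1 \mid A = 1)} \qquad (6)$$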
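As a small worked example, the positive prediction rates of the two groups, their difference and their ratio can be computed as follows. The sketch assumes the common convention in which the unprivileged group is in the numerator of the ratio (so that 1 means parity); function names and the toy data are illustrative.

```python
def positive_rate(predictions, protected, group):
    """Empirical P(prediction = 1 | protected attribute = group)."""
    selected = [p for p, a in zip(predictions, protected) if a == group]
    return sum(selected) / len(selected)


def demographic_parity_difference(predictions, protected, unprivileged=0, privileged=1):
    """Difference of positive rates; 0 means demographic parity holds."""
    return (positive_rate(predictions, protected, unprivileged)
            - positive_rate(predictions, protected, privileged))


def disparate_impact(predictions, protected, unprivileged=0, privileged=1):
    """Ratio of positive rates (assumed convention: unprivileged / privileged)."""
    return (positive_rate(predictions, protected, unprivileged)
            / positive_rate(predictions, protected, privileged))


y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

print(demographic_parity_difference(y_pred, group))  # -0.5
print(disparate_impact(y_pred, group))
```

Here the unprivileged group receives a positive prediction 25% of the time against 75% for the privileged group, giving a ratio of one third.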


A possible relaxation of DP is to allow the protected attribute to be used to discriminate among groups that are actually different in the ground truth label $Y$, i.e., between $Y = 0$ and $Y = 1$, but not within each one. This is called Equalized Odds (EOD) and is described in [hardt2016equality]. We say that a predictor $\hat{Y}$ satisfies EOD if $\hat{Y}$ and $A$ are independent, conditional on $Y$:

$$P(\hat{Y} = 1 \mid A = 0, Y = y) = P(\hat{Y} = 1 \mid A = 1, Y = y), \quad y \in \{0, 1\}.$$

EOD, too, can be easily written in the framework of CTFI as:

with .

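A quite natural measure of discrepancy from EOD is the average equality of odds difference (EO), i.e. the average of the gaps in false positive rate (FPR) and true positive rate (TPR) between the unprivileged group ($A = 0$) and the privileged group ($A = 1$):

$$EO = \frac{1}{2}\Big[\big(FPR_{A=0} - FPR_{A=1}\big) + \big(TPR_{A=0} - TPR_{A=1}\big)\Big] \qquad (7)$$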
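The average odds difference can be sketched as below. The sketch assumes the definition used by common fairness toolkits, namely the mean of the false positive rate gap and the true positive rate gap between the unprivileged and the privileged group; the function names and the toy data are illustrative.

```python
def conditional_rate(predictions, labels, protected, group, label):
    """Empirical P(prediction = 1 | protected attribute = group, true label = label)."""
    selected = [
        p for p, y, a in zip(predictions, labels, protected)
        if a == group and y == label
    ]
    return sum(selected) / len(selected)


def average_odds_difference(predictions, labels, protected, unprivileged=0, privileged=1):
    """Mean of the FPR gap and the TPR gap (unprivileged minus privileged)."""
    fpr_gap = (conditional_rate(predictions, labels, protected, unprivileged, 0)
               - conditional_rate(predictions, labels, protected, privileged, 0))
    tpr_gap = (conditional_rate(predictions, labels, protected, unprivileged, 1)
               - conditional_rate(predictions, labels, protected, privileged, 1))
    return 0.5 * (fpr_gap + tpr_gap)


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

print(average_odds_difference(y_pred, y_true, group))  # -0.5
```

A value of 0 means equalized odds holds on the sample; negative values indicate that the unprivileged group receives fewer positive predictions at equal ground truth.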


Finally, another measure of fairness discrepancy defined in [kusner2017counterfactual] is counterfactual fairness difference (CF):


where the idea is to evaluate the differences in the predicted probabilities when the protected feature of the patterns is changed from one value to the other.
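The flip-based evaluation can be sketched as follows: each pattern is scored twice, once with the protected feature set to 0 and once set to 1, and the absolute differences of the predicted probabilities are averaged. The sign and normalization convention of the measure used in the experiments is not assumed here, and the function name and toy models are illustrative.

```python
def counterfactual_difference(predict_proba, instances, protected_index):
    """Mean absolute change of the predicted probability when the
    Boolean protected feature is flipped between 0 and 1."""
    total = 0.0
    for x in instances:
        x_zero, x_one = list(x), list(x)
        x_zero[protected_index] = 0
        x_one[protected_index] = 1
        total += abs(predict_proba(x_one) - predict_proba(x_zero))
    return total / len(instances)


def biased_model(x):
    # Hypothetical model whose output depends on the protected feature x[0].
    return 0.5 + 0.2 * x[0]


def blind_model(x):
    # Hypothetical model that ignores the protected feature.
    return 0.3 + 0.1 * x[1]


patterns = [[0, 1.0], [1, 2.0], [1, 0.5]]
print(counterfactual_difference(biased_model, patterns, protected_index=0))
print(counterfactual_difference(blind_model, patterns, protected_index=0))  # 0.0
```

A model that does not use the protected feature at all yields a difference of exactly zero, which corresponds to the situation where the protected feature no longer affects the predictions.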

6 Toy example: constraint of the form

As a toy example useful to test the effectiveness of the proposed scheme, we used the German credit risk dataset ( instances), available in [Dua:2019], containing information about bank account holders and a binary target variable denoting the credit risk. The considered features are reported in Table 2.

| Feature | Description | Range |
|---|---|---|
| Age | Age of the customer | numerical [19, 74] |
| Job | Job qualification | ordinal [0, 3] |
| Amount | Credit amount (€) of the loan | numerical |
| Duration | Duration (year) of the loan | numerical [4, 72] |
| Gender | Male (1) vs Female (0) | Boolean |

Table 2: Features for the German credit risk dataset.

We exploited a neural network with one hidden layer and neurons. The learning rate of SGD is with epochs. The activation function is ReLU and the loss is given by the binary cross-entropy.

After the training phase, the Layer-wise Relevance Propagation method has been applied to the instances of the testing set (50% of the overall samples) for computing the feature importances. The black line in Figure 2 reports the average feature importance computed with LRP. We observe that the most relevant feature is the duration of the loan, followed by the amount and the gender.

Let us introduce a constraint to the importance of gender feature that we want to be less-equal than zero (see Eq. (4)), with a regularization . As expected, we observe (green line in Figure 2) that the gender feature has become useless for the model predictions. Basically, the model found another solution, by giving more importance to other features, e.g. the job.

Figure 2: Feature importance (LRP) for the original model (black line), for the model with the constraints on the gender feature (green line) and that constraining also the correlated features (red line).

Furthermore, we computed the correlation matrix between the different features and we found out that, for instance, the age feature is correlated with the gender (with a Pearson correlation coefficient of ). So, in the third experiment we constrained also the other features, by using different regularization strength given by:

for the -th feature (see Eq. (5)). We note from the results reported in Figure 2 with a red line, that the correlated age feature decreases its importance, coherently with the expectation.

7 Fairness through constraints to feature importance

In this section we report the results of the experimental part related to fairness. We tested the CTFI scheme proposed in the previous section to the Adult income data set, also considered by [kamishima2011fairness]. It contains instances with attributes (see Table 3 for the description) and a binary classification task for people earning more or less than per year. The protected attribute we will examine is the race, categorized as white and non-white. In order to better evaluate the fairness metrics with a uniform test set, the dataset has been balanced and the chosen split of training/test is 50%. The model is a Neural Network with one hidden layer and neurons. The learning rate is , the number of epochs is and the batch size is fixed to in order to compute the local feature importance for each analyzed pattern. The activation function is the ReLU function and the loss is given by the binary cross-entropy.

| Feature | Range |
|---|---|
| Age | numerical [19, 74] |
| Race | Boolean: white vs non-white |
| Sex | Boolean: female vs male |
| Education | ordinal: [1, 5] |
| Native-country | Boolean: US vs other |
| Marital-status | Boolean: single vs couple |
| Relationship | ordinal: [1, 5] |
| Employment type | ordinal: [1, 5] |
| fnlwgt | continuous |
| Capital loss | Boolean: yes vs no |
| Capital gain | Boolean: yes vs no |
| hours-per-week | continuous |

Table 3: Features for the Adult income dataset.

We used the constraint defined in Eq. (4) with for the race feature; whereas as a fairness metric we consider both the disparate impact (DI) defined in Eq. (6), the average equality of odds difference (EO) in Eq. (7) and counterfactual fairness (CF) reported in Eq. (8). For the accuracy, we calculate the Area under the ROC curve (ROC-AUC).

Firstly, we evaluated the different fairness metrics in the testing set, with an increasing level of the regularization strength ( values from to ). In Figure 3 we report the accuracy metric (ROC-AUC in the lower plot) and the three measures of fairness: disparate impact (DI), average equality of odds difference (EO), and counterfactual fairness (CF) as a function of the regularization strength.

Figure 3: Accuracy (ROC-AUC); and fairness measured as disparate impact (DI), average equality of odds difference (EO) and counterfactual fairness (CF) (for each measure, the higher the values the higher the fairness levels), by constraining only the race feature (left Panel) and by constraining both race and the correlated ones (right Panel), as a function of the regularization strength.

In Figure 3 we observe that, while the level of the ROC-AUC score practically remains the same, both DI, EO, and CF grow as the regularization strength augments, denoting an increased level of all the fairness measures. In particular, the CF measure reaches the value of , meaning that the protected feature no longer affects the predictions. The other two measures, DI and EO, do not reach the maximum level ( and respectively) because of the issue of correlated features. However, when also the correlated features are constrained through Eq. (5), we note that the increase of fairness is more pronounced (right panel of Figure 3).

Then, as a further analysis, we compared the fairness/accuracy levels obtained with the CTFI methodology (with regularization chosen to be and constraining also the other features, by using regularization strengths given by (see Eq. (5))) to the following benchmark methodologies:

  1. the unawareness method [grgic2016case], which avoids using the race feature during the training phase;

  2. a pre-processing method based on the undersampling of the samples with protected attribute;

  3. the pre-processing method called reweighing [kamiran2012data] that assigns weights to the samples in the training dataset to reduce bias.

The AIF-360 library was used to apply the benchmark methodologies and the fairness metrics. All the models are coded in PyTorch and available at the GitHub repository

In Figure 4 we report the results of the accuracy measure given by the ROC-AUC (x-axis) and fairness metric EO (y-axis) for the different methodologies (unawareness, undersampling, reweighting) and the CTFI. The values are reported in Table 4.

Figure 4: Results of accuracy (ROC-AUC) and fairness (EO) for the different methodologies: original, unawareness, undersampling, reweighting and CTFI.

| Method | ROC-AUC | EO |
|---|---|---|
| original | 0.809 | -0.104 |
| unawareness | 0.781 | -0.080 |
| undersampling | 0.797 | -0.036 |
| reweighting | 0.792 | -0.062 |
| CTFI | 0.810 | -0.063 |

Table 4: Results of the trade-off between accuracy (ROC-AUC) and fairness (EO) for the different methodologies: original, unawareness, undersampling, reweighting and CTFI.

We note that, with respect to the original model, the unawareness, the undersampling, and the reweighting methodologies grant a higher level of fairness at the expense of accuracy. On the other hand, the CTFI methodology improves fairness relative to the original model (EO of -0.063 against -0.104) while keeping a similar level of accuracy (ROC-AUC of 0.810 against 0.809).

8 Conclusion and future work

In this work we have presented a novel model-agnostic framework able to inject a priori knowledge of the relevance of the input features into a machine learning model. This "weighted" approach can contribute to bridging the gap between fully data-driven models and human-guided ones. The advantage of the proposed method is its flexibility: the logic constraints are fully customizable and depend neither on the nature of the input features (numerical, categorical, etc.), nor on the architecture of the model, nor on the algorithm chosen for the computation of the feature importance, e.g. SHAP, LIME, LRP, etc.

A further application of the proposed framework is to enforce an apriori selective attention of the model on particular features, e.g. . This can be useful for example when the user wants to focus on some relevant words in the text, or a region of an image (see Figure 5).

Figure 5: The middle part of an image may be subject to an apriori focus.

Furthermore, there could be many cases where the users want to inject prior knowledge in the form of feature importance into the model. For example, from experience, one could know that one feature should be less important than another for the business of the company, e.g. the age or the gender in a particular financial context. Also in the medical field, a priori knowledge of the input features' importance can improve model performance where the sample sizes are limited. Another possibility is when we know a priori which feature is less reliable, e.g. less stationary with respect to the others. (In linear regression a similar problem is called attenuation bias, where errors in the input features drive the weights toward zero.)

It is worth noting that the constraints can be set for just a portion of the dataset.

As future work we are interested in providing a software solution for the integration of the proposed framework within popular machine learning software. Another direction is to apply the logic constraints to other contexts, in terms of both datasets (e.g. images, text, etc.) and models (random forest, SVM, etc.). Finally, the usage of other measures based on information entropy can be explored in order to take into account the problem of correlation between features.