1 Introduction
Trustworthiness of an Artificial Intelligence (AI) model, i.e. the stability of its performance under many possible future scenarios, is a fundamental requirement for real-world applications. The topic has become particularly relevant in recent decades, since technological development and data availability have led to the adoption of increasingly complex models, widening the gap between performance on train/test data and the reliability of the models.
Model trustworthiness is usually linked to other factors, including the interpretability of the algorithm, the stationarity of the data, possible bias in the data, etc. ([lipton2018mythos], [ribeiro2016should]). Especially in the field of interpretability, much work has been done to explain and interpret the models developed by AI in a human-comprehensible manner. The main reason behind these efforts is that human experience and its capacity for abstraction allow the decision process of the model to be monitored in a sound way, in an attempt to mitigate the risks of data-driven models.
However, an effective interaction between the model and the human is still lacking, and most current machine learning approaches tend to rely too heavily on training/testing data. On the other hand, sources of knowledge such as domain knowledge, expert opinions, and understanding from related problems could be very important for a better definition of the model.
Here we present a novel framework that tries to bridge the gap between data-driven optimization and high-level human domain knowledge. The approach allows the inclusion of the human understanding of the relevance/importance of the input features. The basic idea is to extend the empirical loss with a regularization term depending on the constraints defined by the a priori knowledge of the importance of the features. We provide experimental results on the fairness topic.
2 Bibliographic review
There are few existing feature weighting approaches aimed at improving the performance of machine learning models. In [al2011empirical] the author exploits weak domain knowledge in the form of feature importance to help the learning of Importance-Aided Neural Networks (IANN). The feature importance is based on the absolute weight of the first hidden layer neurons of the network. IANN is successfully applied in [diersen2011classification]. In [zhang2010ontology] an ontology-based clustering algorithm is introduced along with a feature weighting mechanism able to reflect the different importance of the features. [peng2018novel] uses both correlation and mutual information to weight the features for the SVM, KNN, and Naive Bayes algorithms. Instead, in order to accelerate the learning process, in [iqbal2011using] the algorithm is required to match the correlation between the features and the predictive function with the empirical correlation.
However, none of the previous works defines a general framework that includes the knowledge of the importance of the input features within explainable machine learning.
3 Mathematical setting of feature importance
In this section, we review the existing approaches aimed at assigning an importance score to each feature of a given input example in relation to the task of the model, i.e. the so-called local explainability methodologies. It is worth mentioning that feature importance is one of the most widely used strategies to gain local explainability from an opaque machine learning model.
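Let us consider a predictor function $f$ going from the $d$-dim feature space $X \subseteq \mathbb{R}^d$ to the 1-dim target space $Y \subseteq \mathbb{R}$:
$$f: X \rightarrow Y.$$
Such a predictor function is the output of a learner $\mathcal{L}$:
$$\mathcal{L}: D \mapsto f,$$
able to process a supervised dataset $D = \{(x_n, y_n)\}_{n=1}^{N}$, where $x_n \in X$ and $y_n \in Y$, with $N$ instances. To fix ideas, we can think of $\mathcal{L}$ as a Deep Neural Network providing the predictor function $f$.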
Definition 1 (Local feature importance).
The local feature importance is a function $I$ mapping a predictor $f$ and a single instance $x$ to a $d$-dim vector of real values in the range $[0, 1]$:
$$I: (f, x) \mapsto I(f, x) \in [0, 1]^d.$$
The local feature importance is a measure of how much the model relies on each feature for the prediction made by $f$ on the particular pattern $x$. Basically, the quantity $I_i(f, x)$ tells us how much the $i$-th feature contributes with respect to the others for a specific prediction. In the limit cases of $I_i(f, x) = 0$ or $I_i(f, x) = 1$, the feature can be considered respectively useless or the most important one for the prediction made by the predictor $f$ on the pattern $x$.
Example.
Given a linear predictor , the function
is a local feature importance function.
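As a minimal sketch of Definition 1, the snippet below computes one possible local importance for a linear predictor. It assumes that the importance of a feature is its absolute contribution |w_i x_i| normalized by the sum of all absolute contributions; this normalization is an illustrative assumption and not necessarily the function used in the example above. The name `linear_local_importance` is hypothetical.

```python
def linear_local_importance(weights, x, eps=1e-12):
    """Assumed importance: |w_i * x_i| / sum_j |w_j * x_j| (illustrative choice)."""
    contributions = [abs(w * xi) for w, xi in zip(weights, x)]
    total = sum(contributions)
    if total < eps:
        # No feature contributes to the prediction: return a zero vector.
        return [0.0] * len(contributions)
    return [c / total for c in contributions]


# Hypothetical weights and instance with three features.
importance = linear_local_importance([2.0, -1.0, 0.0], [1.0, 2.0, 5.0])
```

With this choice every component lies in [0, 1] and the components sum to one, so a value of 0 marks a feature that is useless for the prediction on that pattern.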
Definition 2 (Local feature importance methods).
Local feature importance methods are methods that, given a predictor together with its learner and the dataset, compute a local feature importance function.
The existing methodologies for the computation of feature importance are reviewed in [guidotti2018survey] and [arrieta2020explainable]. Permutation feature importance methods quantify the feature importance through the variation of a loss metric obtained by perturbing the values of a selected feature on a set of instances in the training or validation set. The approach was first introduced in [breiman2001random] for random forests and in [recknagel1997artificial] for neural networks. Other methods, such as class model visualization [simonyan2013deep], compute the partial derivative of the score function with respect to the input, and [montavon2018methods] introduces an expert distribution for the input, giving activation maximization. In [shrikumar2017learning] the authors introduce DeepLift, which computes discrete gradients with respect to a baseline point by backpropagating the scoring difference through each unit. Instead, integrated gradients [sundararajan2016gradients] accumulates the gradients with respect to the inputs along the path from a given baseline to the instance.
A set of well-known methods called Additive Feature Attribution Methods (AFAM), defined in [lundberg2017unified], rely on the redistribution of the predicted value over the input features. They are designed to mimic the behaviour of a predictive function $f$ with a surrogate Boolean linear function $g$. This surrogate function takes values in a space of the transformed vector of the input features, $z' \in \{0, 1\}^{M}$, with $M$ the number of simplified input features:
$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i.$$
Keeping the notation introduced above, by defining
we have a local feature importance method.
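To make the gradient-based family reviewed above more concrete, the following sketch approximates integrated gradients for a toy differentiable function. It is only an illustration under stated assumptions: gradients are estimated by central differences rather than backpropagation, the path integral is approximated with a midpoint Riemann sum, and `toy_model`, the instance and the all-zero baseline are hypothetical.

```python
def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def integrated_gradients(f, x, baseline, steps=50):
    """Midpoint Riemann approximation of the gradients accumulated along the
    straight path from the baseline to the instance x."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        avg_grad = [a + g / steps for a, g in zip(avg_grad, grad)]
    # Scale the averaged gradient by the displacement from the baseline.
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]


def toy_model(x):
    # Hypothetical scoring function with an interaction and a quadratic term.
    return x[0] * x[1] + x[2] ** 2


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_model, x, baseline)
```

The attributions sum (up to numerical error) to the difference between the score at the instance and the score at the baseline, which is the completeness property of integrated gradients.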
Among the additive feature attribution methods, the popular LIME (Local Interpretable Model-agnostic Explanations) [ribeiro2016should] builds the linear approximated model with a sampling procedure in the neighborhood of the specific point. By applying proper weights to the linear coefficients of LIME, the authors of [lundberg2017unified] demonstrated that SHAP (SHapley Additive exPlanations) is the unique solution among additive feature attribution methods granting a set of desirable properties (local accuracy, missingness and consistency). This last method builds on the Shapley method introduced in [strumbelj2010efficient] and [vstrumbelj2014explaining] for solving the problem of redistributing a reward (prediction) to a set of players (features) in the coalitional game theory framework. Finally, Layer-wise Relevance Propagation (LRP) [bach2015pixel] backpropagates the prediction along the network, by fixing a redistribution rule based on the weights among the neurons.
The set of feature importance methods, along with the data type and the model to which each method refers, is reported in Table 1. For a tabular dataset, the feature importances are usually represented as a ranking reported in a histogram. For images or texts, the subset of the input that is mostly responsible for the predictions gives rise to saliency masks; for example, they can be parts of an image or sentences of a text.
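Table 1: Feature importance methods, with the data type and the model to which each method refers. A "v" in the AFAM column marks the additive feature attribution methods. Data type: ANY = any, TAB = tabular, IMG = images. Model: AGN = model-agnostic, NN = neural network, TE = tree ensemble.

Method                         | Data Type | Model | Reference                                          | AFAM
-------------------------------|-----------|-------|----------------------------------------------------|-----
SHAP                           | ANY       | AGN   | [lundberg2017unified]                              | v
LIME                           | ANY       | AGN   | [ribeiro2016should]                                | v
Shapley value                  | TAB       | AGN   | [strumbelj2010efficient] [vstrumbelj2014explaining] | v
Permutation feature importance | ANY       | NN/TE | [breiman2001random] [recknagel1997artificial]      |
Class model visualization      | IMG       | NN    | [simonyan2013deep]                                 |
Activation maximization        | IMG       | NN    | [montavon2018methods]                              |
LRP                            | ANY       | NN    | [bach2015pixel]                                    | v
Taylor Decomposition           | ANY       | NN    | [bach2015pixel]                                    |
DeepLift                       | ANY       | NN    | [shrikumar2017learning]                            | v
Integrated Gradients           | ANY       | NN    | [sundararajan2016gradients]                        |
GAM                            | ANY       | AGN   | [lou2012intelligible] [lou2013accurate]            |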
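Among the methods listed in Table 1, the Shapley value is simple enough to be computed exactly for a toy predictor. The sketch below enumerates all coalitions of features; as a simplifying assumption, a feature that is absent from a coalition is replaced by a baseline value, which is only one of several possible conventions. The names `shapley_values` and `toy_predictor` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values; absent features are set to their baseline value."""
    d = len(x)

    def coalition_value(subset):
        point = [x[i] if i in subset else baseline[i] for i in range(d)]
        return value_fn(point)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight of a coalition of this size that excludes i.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                with_i = coalition_value(set(subset) | {i})
                without_i = coalition_value(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


def toy_predictor(x):
    # Hypothetical predictor: one additive term and one interaction term.
    return 2.0 * x[0] + x[1] * x[2]


phi = shapley_values(toy_predictor, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

The additive term is attributed entirely to the first feature, the interaction is split equally between the other two, and the attributions sum to the difference between the prediction and the baseline prediction (local accuracy). The enumeration is exponential in the number of features, so this form is practical only for very small inputs.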
4 Constraints to feature importance
The overall goal of the present work is to define a framework where the local importance of the model's features can be constrained to specific intervals. We introduce a novel regularization loss term $L_I$, related to the non-fulfillment of the constraints on the feature importances:
$$L(w) = L_{emp}(w) + L_I(I(w)) \qquad (1)$$
where $L_{emp}$ is the usual empirical risk loss and $I(w)$ the $d$-dim vector of importances. It is worth observing that in Eq. (1) we made explicit the dependence of the importance on the structure of the black box model via the weights $w$ of the model.
Let us suppose a First Order Logic (FOL) formula with variable $x$ containing an a priori statement with inequalities on the importances of the features. For example, we could require that, for every $x$, both the feature $i$ and the feature $j$ should not be important for the prediction function to work properly:
(2) 
with and .
In order to treat the logic formula with real value functions, each inequality of the FOL formula can be transformed into a new variable through the following transformation:
(3) 
Although Eq. (3) is a quite natural choice for an increasing function from to , other choices are possible. In Figure 1 we represent the variable of Eq. (3) for the case of a generic feature.
Then, we exploit the framework of t-norm fuzzy logic, which generalizes Boolean logic to variables assuming values in $[0, 1]$. We can convert the formula depending on the losses by exploiting a t-norm as follows:
that is an average over the t-norm of the truth degree when grounded over its domain. Then, a loss term can be defined by exploiting the logic constraints, e.g.
where $\lambda$ is the strength of the regularization.
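For readers unfamiliar with t-norms, the sketch below lists three standard choices (product, Gödel and Łukasiewicz) and shows one way a penalty could be derived from them. Two points are assumptions made only for illustration: each atom is taken to carry a truth degree in [0, 1] with 1 meaning "satisfied", and the penalty is taken to be the regularization strength times one minus the average truth degree over the groundings. The specific form used in the text may differ.

```python
def product_tnorm(a, b):
    return a * b


def godel_tnorm(a, b):
    return min(a, b)


def lukasiewicz_tnorm(a, b):
    return max(0.0, a + b - 1.0)


def constraint_loss(truth_pairs, tnorm, strength):
    """Assumed penalty: strength * (1 - average truth degree of the conjunction)."""
    avg_truth = sum(tnorm(a, b) for a, b in truth_pairs) / len(truth_pairs)
    return strength * (1.0 - avg_truth)


# Hypothetical truth degrees of two atoms on two groundings.
pairs = [(1.0, 1.0), (0.5, 0.8)]
loss_product = constraint_loss(pairs, product_tnorm, strength=2.0)
loss_godel = constraint_loss(pairs, godel_tnorm, strength=2.0)
```

All three t-norms coincide with the Boolean conjunction when the inputs are exactly 0 or 1; they differ only in how partially satisfied atoms are combined, which affects how the gradient is distributed among the constraints.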
Finally, the partial derivative of the logic part of the loss with respect to the th weight of the
th importance loss function is
and the derivative to be evaluated is: . To summarize, the scheme is the following:
1. write the FOL formula depending on the feature importance, which in turn depends on the model weights through the chosen feature importance method;
2. convert the inequality terms into loss terms;
3. convert the FOL formula with the t-norm into an overall loss term;
4. optimize the loss term in an iterative process, computing the importance at each step of the algorithm.
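The scheme above can be sketched end to end on a toy problem. Everything specific in this example is an assumption chosen for brevity: a bias-free logistic model, an importance defined as the normalized absolute contribution |w_i x_i|, a hinge-style violation max(0, I - threshold) as the loss term of the inequality, a single constraint (so that the t-norm reduces to the atom itself), and central-difference gradients instead of backpropagation. The data are hypothetical.

```python
import math

# Toy data with two equally predictive features; feature 1 is the one whose
# importance is constrained (hypothetical example).
X = [(1.0, 1.0), (1.0, 0.5), (0.5, 1.0), (-1.0, -1.0), (-1.0, -0.5), (-0.5, -1.0)]
Y = [1, 1, 1, 0, 0, 0]


def importance(w, x, eps=1e-12):
    """Assumed importance method: |w_i x_i| normalized over all features."""
    contrib = [abs(wi * xi) for wi, xi in zip(w, x)]
    total = sum(contrib) + eps
    return [c / total for c in contrib]


def empirical_loss(w):
    """Binary cross-entropy of a bias-free logistic model."""
    loss = 0.0
    for x, y in zip(X, Y):
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = min(max(1.0 / (1.0 + math.exp(-z)), 1e-12), 1.0 - 1e-12)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / len(X)


def constraint_penalty(w, feature=1, threshold=0.0):
    """Assumed loss term for I_feature(x) <= threshold: mean hinge violation."""
    violations = [max(0.0, importance(w, x)[feature] - threshold) for x in X]
    return sum(violations) / len(violations)


def total_loss(w, strength):
    return empirical_loss(w) + strength * constraint_penalty(w)


def train(strength, steps=300, lr=0.1, h=1e-5):
    """Gradient descent; the importance is recomputed at every step."""
    w = [0.5, 0.5]
    for _ in range(steps):
        grad = []
        for i in range(len(w)):
            up, down = list(w), list(w)
            up[i] += h
            down[i] -= h
            grad.append((total_loss(up, strength) - total_loss(down, strength)) / (2.0 * h))
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def mean_importance(w, feature=1):
    return sum(importance(w, x)[feature] for x in X) / len(X)


imp_unconstrained = mean_importance(train(strength=0.0))
imp_constrained = mean_importance(train(strength=2.0))
```

In this toy setting the unconstrained model splits the importance evenly between the two features, whereas the constrained model shifts it toward the unconstrained feature. In practice the gradients would be obtained by automatic differentiation through the chosen importance method.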
5 Fairness through feature importance constraints
Fairness is a natural field where the constraints to feature importance can be applied. In the following, we summarize the main fairness measures and discuss how they can be translated using our proposed scheme based on Constraints to Feature Importance, denoted hereafter as CTFI.
The Demographic Parity (DP) fairness metric is satisfied when, given the random variable $\hat{Y}$ representing the binary predictor and a protected Boolean feature $A$, we have:
$$P(\hat{Y} = 1 \mid A = 0) = P(\hat{Y} = 1 \mid A = 1).$$
DP is a very strong requirement: groups based on a sensitive feature, e.g. black and white, should have the same rate of positive prediction, even if differences are present.
DP can be translated into a constraint, where the importance $I_i(x)$ of the protected feature $i$ on the pattern $x$ needs to be lower than a given threshold $c$:
$$\forall x: \; I_i(x) \le c \qquad (4)$$
The well-known issue of unfairness due to correlated features (see for instance [calders2013unbiased]) can potentially be solved by also setting a constraint for the features that are correlated with the protected one. Naturally, the regularization strength $\lambda_j$ of the $j$-th correlated feature should be lower than the strength $\lambda$ used for the protected feature $i$, for instance given by the product of $\lambda$ and the Pearson correlation $\rho_{ij}$ between the $i$-th and $j$-th features, i.e., $\lambda_j = \lambda \rho_{ij}$:
(5) 
The advantage of this formulation is that the constraints are smooth between and , and can be used both with binary and continuous features.
A measure of discrepancy from DP, which will be useful for the experimental part, is the Disparate impact (DI):
(6) 
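As a rough illustration of how DI can be measured from predictions, the sketch below assumes the common ratio form, i.e. the positive-prediction rate of the unprivileged group divided by that of the privileged group. This convention, the choice of which group is labelled unprivileged, and the toy data are assumptions; Eq. (6) may differ in detail.

```python
def positive_rate(predictions, protected, group):
    """Fraction of positive predictions within one group of the protected feature."""
    selected = [p for p, a in zip(predictions, protected) if a == group]
    return sum(selected) / len(selected)


def disparate_impact(predictions, protected, unprivileged=0, privileged=1):
    """Assumed DI: ratio of positive rates, unprivileged over privileged."""
    return (positive_rate(predictions, protected, unprivileged)
            / positive_rate(predictions, protected, privileged))


# Hypothetical binary predictions and protected attribute.
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]
group = [0, 0, 0, 0, 1, 1, 1, 1]
di = disparate_impact(y_pred, group)
```

Under this convention a value of 1 corresponds to Demographic Parity, and values below 1 indicate that the unprivileged group receives positive predictions less often.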
A possible relaxation of DP is where we grant that the protected attribute can be used to discriminate among groups that are actually different in the ground truth label $Y$, i.e., between $Y = 0$ and $Y = 1$, but not within each one. This is called Equalized odds (EOD) and is described in [hardt2016equality]. We say that a predictor $\hat{Y}$ satisfies EOD if $\hat{Y}$ and the protected feature $A$ are independent, conditional on $Y$:
$$P(\hat{Y} = 1 \mid A = 0, Y = y) = P(\hat{Y} = 1 \mid A = 1, Y = y), \quad y \in \{0, 1\}.$$
EOD, too, can be easily written in the framework of CTFI as:
with .
A quite natural measure of discrepancy from EOD is the average equality of odds difference (EO):
(7) 
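A small sketch of how such a discrepancy can be computed is given below. It assumes the usual definition of the average odds difference, namely the mean of the true-positive-rate gap and the false-positive-rate gap between the unprivileged and the privileged group; the sign convention and the toy data are assumptions, and Eq. (7) may differ in detail.

```python
def conditional_rate(y_true, y_pred, protected, group, label):
    """P(prediction = 1 | protected = group, true label = label)."""
    preds = [p for t, p, a in zip(y_true, y_pred, protected)
             if a == group and t == label]
    return sum(preds) / len(preds)


def average_odds_difference(y_true, y_pred, protected, unprivileged=0, privileged=1):
    """Mean of the TPR gap and the FPR gap (unprivileged minus privileged)."""
    tpr_gap = (conditional_rate(y_true, y_pred, protected, unprivileged, 1)
               - conditional_rate(y_true, y_pred, protected, privileged, 1))
    fpr_gap = (conditional_rate(y_true, y_pred, protected, unprivileged, 0)
               - conditional_rate(y_true, y_pred, protected, privileged, 0))
    return 0.5 * (tpr_gap + fpr_gap)


# Hypothetical labels, predictions and protected attribute.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
eo = average_odds_difference(y_true, y_pred, group)
```

A value of 0 means that both error rates are equal across the two groups, which is the condition required by EOD.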
Finally, another measure of fairness discrepancy defined in [kusner2017counterfactual] is counterfactual fairness difference (CF):
(8) 
where the idea is to evaluate the differences in the predicted probabilities when the protected feature of the patterns is changed from one value to the other.
6 Toy example: constraint of the form
As a toy example useful to test the effectiveness of the proposed scheme, we used the German credit risk dataset ( instances), available in [Dua:2019], containing information about bank account holders and a binary target variable denoting the credit risk. The considered features are reported in Table 2.
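Table 2: Features of the German credit risk dataset considered in the toy example.

Feature  | Description                  | Range
---------|------------------------------|-------------------
Age      | Age of the customer          | numerical [19, 74]
Job      | Job qualification            | ordinal [0, 3]
Amount   | Credit amount (€) of the loan | numerical
Duration | Duration (months) of the loan | numerical [4, 72]
Gender   | Male (1) vs Female (0)       | Boolean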
We exploited a neural network with one hidden layer and neurons. The learning rate of SGD is with epochs. The activation function is ReLU and the loss is given by the binary crossentropy.
After the training phase, the Layer-wise Relevance Propagation method was applied to the instances of the testing set (50% of the overall samples) to compute the feature importances. The black line in Figure 2 reports the average feature importance computed with LRP. We observe that the most relevant feature is the duration of the loan, followed by the amount and the gender.
Let us introduce a constraint on the importance of the gender feature, which we want to be less than or equal to zero (see Eq. (4)), with a regularization . As expected, we observe (green line in Figure 2) that the gender feature has become useless for the model predictions. Basically, the model found another solution by giving more importance to other features, e.g. the job.
Furthermore, we computed the correlation matrix between the different features and found that, for instance, the age feature is correlated with gender (with a Pearson correlation coefficient of ). So, in the third experiment we also constrained the other features, using different regularization strengths given by the product of the regularization strength of the gender constraint and the Pearson correlation between gender and the $j$-th feature (see Eq. (5)). We note from the results reported in Figure 2 with a red line that the correlated age feature decreases in importance, consistent with expectations.
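The correlation-scaled strengths used in the third experiment can be sketched as follows. The snippet assumes that each non-protected feature receives the base strength multiplied by the absolute value of its Pearson correlation with the protected feature; taking the absolute value (so that strengths are never negative) is an assumption, and the column values and the base strength are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        # A constant column carries no correlation information.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlated_strengths(columns, protected, strength):
    """Strength per feature: strength * |Pearson(feature, protected feature)|."""
    return {
        name: strength * abs(pearson(values, columns[protected]))
        for name, values in columns.items()
        if name != protected
    }


# Hypothetical columns of a tiny tabular dataset.
columns = {
    "gender": [0, 0, 1, 1],
    "age": [20, 30, 40, 50],
    "job": [1, 2, 2, 1],
}
strengths = correlated_strengths(columns, protected="gender", strength=10.0)
```

A feature that is uncorrelated with the protected one receives a strength of zero and is therefore left unconstrained, while a strongly correlated feature is constrained almost as strongly as the protected feature itself.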
7 Fairness through constraints to feature importance
In this section we report the results of the experimental part related to fairness. We tested the CTFI scheme proposed in the previous section on the Adult income dataset, also considered by [kamishima2011fairness]. It contains instances with attributes (see Table 3 for the description) and a binary classification task for people earning more or less than per year. The protected attribute we examine is the race, categorized as white and non-white. In order to better evaluate the fairness metrics with a uniform test set, the dataset has been balanced and the chosen training/test split is 50%. The model is a Neural Network with one hidden layer and neurons. The learning rate is , the number of epochs is and the batch size is fixed to in order to compute the local feature importance for each analyzed pattern. The activation function is the ReLU function and the loss is given by the binary cross-entropy.
Table 3: Features of the Adult income dataset.

Feature         | Range
----------------|------------------------------
Age             | numerical [19, 74]
Race            | Boolean: white vs non-white
Sex             | Boolean: female vs male
Education       | ordinal: [1, 5]
Native-country  | Boolean: US vs other
Marital-status  | Boolean: single vs couple
Relationship    | ordinal: [1, 5]
Employment type | ordinal: [1, 5]
fnlwgt          | continuous
Capital loss    | Boolean: yes vs no
Capital gain    | Boolean: yes vs no
hours-per-week  | continuous
We used the constraint defined in Eq. (4) with for the race feature, whereas as fairness metrics we consider the disparate impact (DI) defined in Eq. (6), the average equality of odds difference (EO) in Eq. (7), and the counterfactual fairness difference (CF) reported in Eq. (8). For the accuracy, we calculate the area under the ROC curve (ROC-AUC).
First, we evaluated the different fairness metrics on the testing set, with an increasing level of the regularization strength ( values from to ). In Figure 3 we report the accuracy metric (ROC-AUC, in the lower plot) and the three measures of fairness, disparate impact (DI), average equality of odds difference (EO), and counterfactual fairness (CF), as a function of the regularization strength.
In Figure 3 we observe that, while the ROC-AUC score remains practically the same, DI, EO, and CF all grow as the regularization strength increases, denoting an increased level of all the fairness measures. In particular, the CF measure reaches the value of , meaning that the protected feature no longer affects the predictions. The other two measures, DI and EO, do not reach the maximum level ( and respectively) because of the issue of correlated features. However, when the correlated features are also constrained through Eq. (5), we note that the increase in fairness is more pronounced (right panel of Figure 3).
Then, as a further analysis, we compared the fairness/accuracy levels obtained with the CTFI methodology (with the regularization chosen to be and also constraining the other features, using regularization strengths given by (see Eq. (5))) to the following benchmark methodologies:
1. the unawareness method [grgic2016case], which avoids using the race feature during the training phase;
2. a preprocessing method based on the undersampling of the samples with the protected attribute;
3. the preprocessing method called reweighing [kamiran2012data], which assigns weights to the samples in the training dataset to reduce bias.
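To give an idea of how the reweighing benchmark operates, the sketch below computes sample weights following the rule described in [kamiran2012data]: each combination of protected group and label is weighted by the ratio between its expected frequency under independence and its observed frequency. This is a stand-alone illustration with hypothetical data and a hypothetical function name, not the AIF360 implementation used in the experiments.

```python
from collections import Counter


def reweighing_weights(protected, labels):
    """Weight of each (group, label) cell: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(protected)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(protected, labels))
    return {
        (a, y): (group_counts[a] * label_counts[y]) / (n * joint_counts[(a, y)])
        for (a, y) in joint_counts
    }


# Hypothetical protected attribute and binary labels.
protected = [0, 0, 0, 0, 1, 1, 1, 1]
labels = [1, 0, 0, 0, 1, 1, 1, 0]
weights = reweighing_weights(protected, labels)

# Weighted rate of positive labels in each group after reweighing.
weighted_rate = {}
for g in (0, 1):
    pos = sum(weights[(a, y)] for a, y in zip(protected, labels) if a == g and y == 1)
    tot = sum(weights[(a, y)] for a, y in zip(protected, labels) if a == g)
    weighted_rate[g] = pos / tot
```

Under-represented combinations (here, positive labels in group 0) receive a weight above one, so that the weighted training set shows the same rate of positive labels in both groups.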
The AIF360 library was used to apply the benchmark methodologies and the fairness metrics. All the models are coded in PyTorch and available at the GitHub repository https://github.com/nicolapicchiotti/ctfi.
In Figure 4 we report the accuracy measure given by the ROC-AUC (x-axis) and the fairness metric EO (y-axis) for the different methodologies (unawareness, undersampling, reweighing) and for CTFI. The values are reported in Table 4.
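Table 4: Accuracy (ROC-AUC) and fairness (EO) for the different methodologies.

Method        | ROC-AUC | EO
--------------|---------|------
original      | 0.809   | 0.104
unawareness   | 0.781   | 0.080
undersampling | 0.797   | 0.036
reweighing    | 0.792   | 0.062
CTFI          | 0.810   | 0.063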
We note that, with respect to the original model, the unawareness, undersampling, and reweighing methodologies improve fairness at the expense of accuracy. On the other hand, the CTFI methodology improves the fairness metric with respect to the original model (EO of 0.063 versus 0.104) while keeping a similar level of accuracy (ROC-AUC of 0.810 versus 0.809).
8 Conclusion and future work
In this work we have presented a novel model-agnostic framework able to inject a priori knowledge of the relevance of the input features into a machine learning model. This "weighted" approach can help bridge the gap between fully data-driven models and human-guided ones. The advantage of the proposed method is its flexibility: the logic constraints are fully customizable and depend neither on the nature of the input features (numerical, categorical, etc.), nor on the architecture of the model, nor on the algorithm chosen for the computation of the feature importance, e.g. SHAP, LIME, LRP, etc.
A further application of the proposed framework is to enforce an a priori selective attention of the model on particular features, e.g. . This can be useful, for example, when the user wants to focus on some relevant words in a text, or on a region of an image (see Figure 5).
Furthermore, there could be many cases where users want to inject prior knowledge in the form of feature importance into the model. For example, from experience, one could know that one feature should be less important than another for the business of the company, e.g. the age or the gender in a particular financial context. Also in the medical field, a priori knowledge of the importance of the input features can improve model performance where the sample sizes are limited. Another possibility is when we know a priori which feature is less reliable, e.g. less stationary, than the others. (In linear regression a similar problem is called attenuation bias, where errors in the input features cause the weights to shrink toward zero.) It is worth noting that the constraints can be set for just a portion of the dataset.
As future work, we are interested in providing a software solution for the integration of the proposed framework within popular machine learning software. Another direction is to apply the logic constraints to other contexts, in terms of both datasets (e.g. images, text, etc.) and models (random forest, SVM, etc.). Finally, the usage of other measures based on information entropy can be explored in order to take into account the problem of correlation between features.