DNN2LR: Interpretation-inspired Feature Crossing for Real-world Tabular Data
For sake of reliability, it is necessary for models in real-world applications, such as financial applications, to be both powerful and globally interpretable. Simple linear classifiers, e.g., Logistic Regression (LR), are globally interpretable, but not powerful enough to model complex nonlinear interactions among features in tabular data. Meanwhile, Deep Neural Networks (DNNs) have shown great effectiveness for modeling tabular data. However, DNN can only implicitly model feature interactions in the hidden layers, and is not globally interpretable. Accordingly, it will be promising if we can propose a new automatic feature crossing method to find the feature interactions in DNN, and use them as cross features in LR. In this way, we can take advantage of the strong expressive ability of DNN and the good interpretability of LR. Recently, local piece-wise interpretability of DNN has been widely studied. The piece-wise interpretations of a specific feature are usually inconsistent in different samples, which is caused by feature interactions in the hidden layers. Inspired by this, we give a definition of the interpretation inconsistency in DNN, and accordingly propose a novel method called DNN2LR. DNN2LR can generate a compact and accurate candidate set of cross feature fields, and thus promote the efficiency of searching for useful cross feature fields. The whole process of learning feature crossing in DNN2LR can be done via simply training a DNN model and a LR model. Extensive experiments have been conducted on five public datasets, as well as two real-world datasets. The final model, a LR model empowered with cross features, generated by DNN2LR can achieve better performances compared with complex DNN models.
READ FULL TEXT