Language Model Classifier Aligns Better with Physician Word Sensitivity than XGBoost on Readmission Prediction

11/13/2022
by Grace Yang, et al.

Traditional evaluation metrics for classification in natural language processing, such as accuracy and area under the curve, fail to differentiate between models that reach similar scores through different predictive behaviors. We introduce the sensitivity score, a metric that examines a model's behavior at the vocabulary level to reveal disparities in decision-making logic. We compute the sensitivity score on a set of representative words in the test set for two classifiers trained for hospital readmission classification with similar performance statistics. Our experiments compare the decision-making logic of clinicians and classifiers using rank correlations of sensitivity scores. The results indicate that the language model's sensitivity scores align better with those of the professionals than do the scores of the XGBoost classifier on TF-IDF embeddings, which suggests that XGBoost uses some spurious features. Overall, the metric offers a novel perspective on assessing models' robustness by quantifying their discrepancy with professional opinions. Our code is available on GitHub (https://github.com/nyuolab/Model_Sensitivity).
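The comparison step described in the abstract, ranking words by sensitivity score and correlating those rankings between clinicians and a classifier, can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that). It assumes Spearman's rank correlation as the rank-correlation measure, and the function names, the word list, and every score below are hypothetical placeholders rather than values from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical words and placeholder scores, for illustration only.
words = ["sepsis", "stable", "dialysis", "discharged", "the"]
clinician_scores = [0.9, 0.2, 0.8, 0.3, 0.0]
model_a_scores = [0.7, 0.1, 0.6, 0.4, 0.05]
model_b_scores = [0.2, 0.1, 0.3, 0.1, 0.6]

for name, scores in [("model A", model_a_scores), ("model B", model_b_scores)]:
    rho = spearman(clinician_scores, scores)
    print(f"{name}: Spearman correlation with clinicians = {rho:.3f}")
```

Under this reading, a higher correlation means the model ranks words more like the clinicians do, which is the sense of "aligns better" used in the abstract.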


