InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions

10/14/2022

Debiasing methods for NLP models traditionally focus on isolating information related to a sensitive attribute (such as gender or race). We instead argue that a favorable debiasing method should use sensitive information 'fairly,' with explanations, rather than blindly eliminating it. This fair balance is often subjective and can be challenging to achieve algorithmically. We show that an interactive setup in which users can provide feedback achieves a better and fairer balance between task performance and bias mitigation, supported by faithful explanations.
