Towards learning to explain with concept bottleneck models: mitigating information leakage

11/07/2022
by Joshua Lockhart, et al.
Concept bottleneck models perform classification in two stages. First, they predict which of a list of human-provided concepts are true of a data point. Then a downstream model uses these predicted concept labels to predict the target label, so the predicted concepts act as a rationale for the target prediction. Model trust issues emerge in this paradigm when soft concept labels are used: it has previously been observed that extra information about the data distribution leaks into the concept predictions. In this work we show how Monte-Carlo Dropout can be used to attain soft concept predictions that do not contain leaked information.
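
To make the idea of Monte-Carlo Dropout concept predictions concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how the soft predictions are formed, so the sketch assumes that each stochastic forward pass is thresholded to a hard 0/1 concept label and that the soft label is the fraction of passes in which the concept was predicted true. The network shape, the weights, the 0.5 threshold, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_with_dropout(x, w_hidden, w_concept, p_drop, rng):
    """One stochastic forward pass of a toy concept predictor.

    Dropout stays active at prediction time, which is what makes
    repeated passes differ from one another.
    """
    hidden = []
    for row in w_hidden:
        act = max(0.0, sum(w * xi for w, xi in zip(row, x)))  # ReLU unit
        keep = rng.random() >= p_drop
        # Inverted dropout: rescale kept units so the expected value is unchanged.
        hidden.append(act / (1.0 - p_drop) if keep else 0.0)
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_concept]


def mc_dropout_concepts(x, w_hidden, w_concept, n_samples=200, p_drop=0.5, seed=0):
    """Soft concept labels as the frequency of hard predictions over MC passes.

    Assumption (not stated in the abstract): each pass is thresholded at 0.5,
    and only the resulting 0/1 votes are averaged.
    """
    rng = random.Random(seed)
    counts = [0] * len(w_concept)
    for _ in range(n_samples):
        probs = forward_with_dropout(x, w_hidden, w_concept, p_drop, rng)
        for k, p in enumerate(probs):
            counts[k] += 1 if p > 0.5 else 0
    return [c / n_samples for c in counts]


# Hypothetical toy input and weights: 3 features, 4 hidden units, 2 concepts.
X_EXAMPLE = [1.0, -0.5, 2.0]
W_HIDDEN = [
    [0.4, -0.2, 0.1],
    [-0.3, 0.8, 0.5],
    [0.6, 0.1, -0.4],
    [0.2, 0.3, 0.3],
]
W_CONCEPT = [
    [1.0, -1.5, 0.5, 0.8],
    [-0.7, 0.9, -1.2, 0.4],
]

soft_concepts = mc_dropout_concepts(X_EXAMPLE, W_HIDDEN, W_CONCEPT)
print(soft_concepts)
```

Under this assumption, a downstream label predictor would receive `soft_concepts` instead of the raw sigmoid outputs. The value for each concept then summarizes only how often that concept was predicted true across the sampled passes.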
