
Distributed NLI: Learning to Predict Human Opinion Distributions for Language Reasoning

by Xiang Zhou et al.

We introduce distributed NLI, a new NLU task whose goal is to predict the distribution of human judgements for natural language inference. We show that models can capture human judgement distributions by applying additional distribution estimation methods, namely Monte Carlo (MC) Dropout, Deep Ensemble, Re-Calibration, and Distribution Distillation. All four methods substantially outperform the softmax baseline. MC Dropout achieves decent performance without any distribution annotations, while Re-Calibration yields further substantial improvements when extra distribution annotations are provided, suggesting the value of multiple annotations per example for modeling the distribution of human judgements. Moreover, MC Dropout and Re-Calibration achieve decent transfer performance on out-of-domain data. Despite these improvements, the best results are still far below the estimated human upper bound, indicating that predicting the distribution of human judgements remains an open, challenging problem with large room for improvement. We showcase common errors for MC Dropout and Re-Calibration. Finally, we give guidelines on the usage of these methods at different levels of data availability and encourage future work on modeling the human opinion distribution for language reasoning.
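To illustrate the simplest of the four methods, the sketch below shows how MC Dropout can produce a distribution over NLI labels: dropout is left active at inference time, and the softmax outputs of many stochastic forward passes are averaged. The toy network, its sizes, and the dropout rate are illustrative assumptions, not the paper's actual model or hyperparameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 3-way classifier over NLI labels
# (entailment / neutral / contradiction); sizes are illustrative.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(32, 3),
)

def mc_dropout_predict(model, x, num_passes=50):
    """Average softmax outputs over stochastic forward passes.

    Keeping the model in train() mode leaves dropout active at
    inference time, so each pass samples a different subnetwork;
    the mean over passes serves as the predicted distribution of
    human judgements.
    """
    model.train()  # dropout stays on during inference
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(num_passes)]
        )
    return probs.mean(dim=0)

x = torch.randn(1, 16)          # stand-in for an encoded premise/hypothesis pair
dist = mc_dropout_predict(model, x)
print(dist)                     # a probability distribution over the 3 labels
```

Note that this requires no distribution annotations at training time, which is why the paper highlights it as the option for the low-resource setting; Re-Calibration, by contrast, fits to the extra human-judgement distributions when they are available.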



