Evaluating Predictive Uncertainty under Distributional Shift on Dialogue Dataset

09/01/2021
by Nyoungwoo Lee, et al.

In open-domain dialogue, predictive uncertainty is mainly evaluated in a domain-shift setting, to test how models cope with out-of-distribution inputs. However, real-world conversations can contain a wider range of distributionally shifted inputs than out-of-distribution ones alone. To evaluate this, we first propose two methods, Unknown Word (UW) and Insufficient Context (IC), that enable gradual distributional shifts by corrupting the dialogue dataset. We then investigate the effect of these shifts on accuracy and calibration. Our experiments show that the performance of existing uncertainty estimation methods consistently degrades as the shift intensifies. The results suggest that the proposed methods could be useful for evaluating the calibration of dialogue systems under distributional shift.
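
The abstract does not spell out how the two corruptions or the calibration measurement are implemented, so the following is only a minimal sketch under stated assumptions. It assumes that UW replaces a fraction of tokens with an unknown-word token, that IC drops earlier dialogue turns, and that calibration is measured with expected calibration error (ECE), a common choice that the abstract does not name. The token `<unk>`, the function names, and the parameters `rate`, `keep_last`, and `n_bins` are all hypothetical.

```python
import random

UNK = "<unk>"  # hypothetical unknown-word token; the paper may use another


def unknown_word(utterance, rate, rng):
    """Assumed UW corruption: replace each token with UNK with probability `rate`."""
    tokens = utterance.split()
    return " ".join(UNK if rng.random() < rate else t for t in tokens)


def insufficient_context(context, keep_last):
    """Assumed IC corruption: keep only the last `keep_last` turns of the context."""
    return context[-keep_last:] if keep_last > 0 else []


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin-size-weighted mean of |accuracy - confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    rng = random.Random(0)
    context = ["hi there", "hello , how are you ?", "fine , thanks"]
    # Raising `rate` or lowering `keep_last` gives a gradually stronger shift.
    print(unknown_word(context[-1], 0.5, rng))
    print(insufficient_context(context, 1))
    print(expected_calibration_error([0.9, 0.9], [1, 0]))
```

Sweeping `rate` from 0 to 1, or `keep_last` from the full context length down to 0, and recomputing accuracy and ECE at each level is one way to trace how a model behaves as the shift intensifies.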


