Rainproof: An Umbrella To Shield Text Generators From Out-Of-Distribution Data

12/18/2022
by   Maxime Darrin, et al.

As more conversational and translation systems are deployed in production, it is essential to develop and implement effective control mechanisms that guarantee their proper functioning and security. An essential component of safe system behavior is out-of-distribution (OOD) detection, which aims to detect whether an input sample is statistically far from the training distribution. Although OOD detection is widely studied for classification tasks, it has received much less attention in text generation. This paper addresses the problem of OOD detection for machine translation and dialog generation from an operational perspective. Our contributions include: (i) RAINPROOF, a Relative informAItioN Projection OOD detection framework; and (ii) a more operational evaluation setting for OOD detection. Surprisingly, we find that OOD detection is not necessarily aligned with task-specific measures. The OOD detector may filter out samples that are well processed by the model and keep samples that are not, leading to weaker performance. Our results show that RAINPROOF breaks this curse and achieves good results in OOD detection while increasing performance.
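
To make the idea of score-based OOD filtering for a text generator concrete, here is a minimal sketch. It is not the RAINPROOF method itself, since the abstract does not give the scoring rule. It assumes, purely for illustration, that the model's next-token probability distributions are available at each generation step, and that a sample is scored by how close those distributions are to uniform, measured with the KL divergence. The function names, the score, and the threshold are all hypothetical.

```python
import math
from typing import Sequence


def kl_to_uniform(probs: Sequence[float]) -> float:
    """KL(p || u), where u is the uniform distribution over len(probs) tokens.

    Returns 0 when p is uniform (maximal uncertainty) and log(len(probs))
    when p puts all its mass on a single token.
    """
    n = len(probs)
    return sum(p * math.log(p * n) for p in probs if p > 0.0)


def ood_score(step_distributions: Sequence[Sequence[float]]) -> float:
    """Hypothetical anomaly score: negative mean KL to uniform over all steps.

    Assumption: flatter next-token distributions suggest the input is
    unfamiliar to the model, so a higher score means "more OOD-like".
    """
    mean_kl = sum(kl_to_uniform(p) for p in step_distributions) / len(step_distributions)
    return -mean_kl


def is_ood(step_distributions: Sequence[Sequence[float]], threshold: float) -> bool:
    """Flag a sample when its score exceeds a threshold chosen on held-out data."""
    return ood_score(step_distributions) > threshold


# Toy usage with a 4-token vocabulary (made-up probabilities).
confident = [[0.97, 0.01, 0.01, 0.01], [0.01, 0.97, 0.01, 0.01]]
uncertain = [[0.25, 0.25, 0.25, 0.25], [0.30, 0.30, 0.20, 0.20]]
print(is_ood(confident, threshold=-0.5), is_ood(uncertain, threshold=-0.5))
```

In the operational setting the abstract describes, the threshold would be judged not only by detection accuracy but also by the task metric (for example, translation quality) on the samples the filter lets through, because the two are not necessarily aligned.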


research
04/28/2022

NMTScore: A Multilingual Analysis of Translation-based Text Similarity Measures

Being able to rank the similarity of short text segments is an interesti...
research
08/09/2023

Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance

Dialect classification is used in a variety of applications, such as mac...
research
05/25/2022

R2D2: Robust Data-to-Text with Replacement Detection

Unfaithful text generation is a common problem for text generation syste...
research
09/26/2022

Informative Text Generation from Knowledge Triples

As the development of the encoder-decoder architecture, researchers are ...
research
09/07/2022

SynSciPass: detecting appropriate uses of scientific text generation

Approaches to machine generated text detection tend to focus on binary c...
research
04/11/2022

Toward More Effective Human Evaluation for Machine Translation

Improvements in text generation technologies such as machine translation...
