An Empirical Study on Model-agnostic Debiasing Strategies for Robust Natural Language Inference

by Tianyu Liu, et al.

Prior work on natural language inference (NLI) debiasing mainly targets one or a few known biases and does not necessarily make models more robust. In this paper, we focus on model-agnostic debiasing strategies and explore how (and whether it is possible) to make NLI models robust to multiple distinct adversarial attacks while maintaining or even strengthening their generalization power. We first benchmark prevailing neural NLI models, including pretrained ones, on various adversarial datasets. We then try to combat distinct known biases by modifying a mixture-of-experts (MoE) ensemble method, and show that it is nontrivial to mitigate multiple NLI biases at the same time, and that model-level ensembling outperforms the MoE ensemble method. We also perform data augmentation, including text swap, word substitution, and paraphrasing, and demonstrate its effectiveness in combating various (though not all) adversarial attacks simultaneously. Finally, we investigate several methods of merging heterogeneous training data (1.35M instances) and performing model ensembling, which are straightforward but effective ways to strengthen NLI models.
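To make two of the augmentation strategies concrete, the sketch below shows what text swap and word substitution might look like for NLI premise/hypothesis pairs. The label handling for swapped pairs and the caller-supplied synonym dictionary are illustrative assumptions, not the paper's exact procedure.

```python
def text_swap(premise, hypothesis, label):
    """Swap premise and hypothesis to form a new training pair.

    An entailment pair is not guaranteed to entail in the reverse
    direction, so swapped entailment pairs are conservatively relabeled
    as "neutral" here (an illustrative choice, not the paper's rule).
    """
    new_label = "neutral" if label == "entailment" else label
    return hypothesis, premise, new_label


def word_substitution(sentence, synonyms):
    """Replace words using a caller-supplied synonym dictionary.

    The dictionary stands in for whatever lexical resource (e.g. a
    thesaurus) an actual implementation would use.
    """
    return " ".join(synonyms.get(w, w) for w in sentence.split())


# Example usage on a toy NLI pair:
p, h, l = text_swap("A man is sleeping.", "A person is resting.", "entailment")
augmented = word_substitution("a big dog runs", {"big": "large"})
```

Paraphrase-based augmentation would follow the same pattern but typically relies on an external paraphrase model rather than a word-level dictionary.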
