Generating Label Cohesive and Well-Formed Adversarial Claims

09/17/2020
by Pepa Atanasova, et al.

Adversarial attacks reveal important vulnerabilities and flaws of trained models. One potent type of attack is the universal adversarial trigger: an individual n-gram that, when appended to instances of the class under attack, can trick a model into predicting a target class. However, for inference tasks such as fact checking, these triggers often inadvertently invert the meaning of the instances they are inserted into. In addition, such attacks produce semantically nonsensical inputs, as they simply concatenate triggers to existing samples. Here, we investigate how to generate adversarial attacks against fact-checking systems that preserve the ground-truth meaning and are semantically valid. We extend the HotFlip attack algorithm used for universal trigger generation by jointly minimising the target-class loss of a fact-checking model and the entailment-class loss of an auxiliary natural language inference model. We then train a conditional language model to generate semantically valid statements that include the found universal triggers. We find that the generated attacks maintain the directionality and semantic validity of the claim better than previous work.
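The joint-loss trigger search can be summarised in a few lines. Below is a minimal sketch (not the authors' released code) of a single HotFlip-style update step under some simplifying assumptions: `fc_model` and `nli_model` are illustrative stand-ins that accept token embeddings directly, `embedding_matrix` is a shared vocabulary embedding table, and `nli_weight` balances the two loss terms.

```python
# Hedged sketch: one HotFlip-style step that jointly minimises the
# fact-checking target-class loss and an auxiliary NLI entailment loss,
# as described in the abstract. Model interfaces are illustrative.

import torch
import torch.nn.functional as F

def hotflip_trigger_step(fc_model, nli_model, embedding_matrix,
                         trigger_embeds, claim_embeds, evidence_embeds,
                         target_class, entail_class, nli_weight=1.0):
    """Rank, per trigger position, candidate replacement token ids by a
    first-order (HotFlip) estimate of the joint loss decrease.
    target_class / entail_class: LongTensors of shape [batch]."""
    trigger_embeds = trigger_embeds.detach().requires_grad_(True)

    # Fact-checking loss: push the prediction toward the target class.
    triggered = torch.cat([trigger_embeds, claim_embeds], dim=1)
    fc_loss = F.cross_entropy(fc_model(triggered), target_class)

    # NLI loss: keep the triggered claim entailed by the evidence, so the
    # trigger does not flip the claim's ground-truth meaning.
    nli_logits = nli_model(premise=evidence_embeds, hypothesis=triggered)
    nli_loss = F.cross_entropy(nli_logits, entail_class)

    joint_loss = fc_loss + nli_weight * nli_loss
    grad, = torch.autograd.grad(joint_loss, trigger_embeds)

    # HotFlip: the first-order change in loss from swapping trigger token i
    # for vocabulary token v is (e_v - e_i) . grad_i; most negative is best.
    scores = torch.einsum('bid,vd->biv', grad, embedding_matrix) \
           - (grad * trigger_embeds).sum(-1, keepdim=True)
    return scores.argsort(dim=-1)  # ascending: best candidates first
```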

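For the second stage, here is a minimal fine-tuning sketch, assuming GPT-2 as the conditional language model backbone and a simple trigger-as-prefix conditioning scheme; both the separator token and the conditioning format are illustrative assumptions, not necessarily the paper's exact setup.

```python
# Hedged sketch of the second stage: fine-tune a conditional LM so that
# generated claims contain a found trigger and read fluently, rather than
# being a raw concatenation of trigger and claim.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

SEP = " => "  # hypothetical separator between condition and claim

def training_step(trigger: str, claim: str) -> float:
    """One causal-LM step on a (trigger, claim) pair: the model learns to
    continue a trigger prefix with a well-formed claim containing it."""
    enc = tokenizer(trigger + SEP + claim, return_tensors="pt")
    out = model(**enc, labels=enc["input_ids"])  # standard LM loss
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

@torch.no_grad()
def generate_claim(trigger: str, max_new_tokens: int = 40) -> str:
    """Sample a fluent adversarial claim conditioned on the trigger."""
    enc = tokenizer(trigger + SEP, return_tensors="pt")
    ids = model.generate(**enc, do_sample=True, top_p=0.9,
                         max_new_tokens=max_new_tokens,
                         pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```

Conditioning on the trigger as a prefix is one simple way to realise "a conditional language model that generates statements including the found triggers"; the key point from the abstract is that generation, rather than concatenation, is what keeps the final claims semantically well formed.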

Related research

05/26/2020 · Generating Semantically Valid Adversarial Questions for TableQA
Adversarial attack on question answering systems over tabular data (Tabl...

09/25/2021 · MINIMAL: Mining Models for Data Free Universal Adversarial Triggers
It is well known that natural language models are vulnerable to adversar...

03/13/2019 · Adversarial attacks against Fact Extraction and VERification
This paper describes a baseline for the second iteration of the Fact Ext...

05/01/2020 · Universal Adversarial Attacks with Natural Triggers for Text Classification
Recent work has demonstrated the vulnerability of modern text classifier...

06/13/2021 · Target Model Agnostic Adversarial Attacks with Query Budgets on Language Understanding Models
Despite significant improvements in natural language understanding model...

11/20/2020 · Detecting Universal Trigger's Adversarial Attack with Honeypot
The Universal Trigger (UniTrigger) is a recently-proposed powerful adver...

10/01/2019 · TMLab: Generative Enhanced Model (GEM) for adversarial attacks
We present our Generative Enhanced Model (GEM) that we used to create sa...
