Towards A Conceptually Simple Defensive Approach for Few-shot Classifiers Against Adversarial Support Samples

10/24/2021
by Yi Xiang Marcus Tan, et al.

Few-shot classifiers have shown promising results in use cases where user-provided labels are scarce. These models can learn to predict novel classes after training only on a non-overlapping set of classes, which is largely attributable to differences in their mechanisms compared to conventional deep networks. However, this also opens new opportunities for attackers to mount integrity attacks against such models that are not present in other machine learning setups. In this work, we aim to close this gap by studying a conceptually simple approach to defending few-shot classifiers against adversarial attacks. More specifically, we propose a simple, attack-agnostic detection method that uses the concepts of self-similarity and filtering to flag adversarial support sets, which corrupt a victim classifier's understanding of a given class. Our extended evaluation on the miniImagenet (MI) and CUB datasets shows good attack detection performance across three different few-shot classifiers and across different attack strengths, outperforming baselines. These results establish our approach as a strong detection method for support set poisoning attacks. We also show that our approach constitutes a generalizable concept, as it can be paired with other filtering functions. Finally, we provide an analysis of our results when varying two components of our detection approach.
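The abstract describes the detection idea only at a high level, so the snippet below is a minimal sketch of the general self-similarity-plus-filtering concept rather than the authors' exact pipeline. The names `encoder`, `filter_fn`, and `threshold` are hypothetical placeholders, standing in for the victim model's feature extractor, some filtering function (e.g. a denoising or feature-preserving filter), and a threshold calibrated on clean support sets.

```python
import torch
import torch.nn.functional as F

def self_similarity(embeddings: torch.Tensor) -> float:
    """Mean pairwise cosine similarity among the support embeddings of one class."""
    z = F.normalize(embeddings, dim=-1)              # (n, d) unit-norm features
    sim = z @ z.t()                                  # (n, n) cosine similarity matrix
    n = sim.size(0)
    off_diag = sim[~torch.eye(n, dtype=torch.bool)]  # drop self-comparisons
    return off_diag.mean().item()

def flag_adversarial_support(encoder, support_images, filter_fn, threshold=0.5):
    """Flag a per-class support set as suspicious when the self-similarity of its
    filtered embeddings falls below a calibrated threshold (illustrative only)."""
    filtered = filter_fn(support_images)             # hypothetical filtering step
    with torch.no_grad():
        emb = encoder(filtered)                      # (n_support, d) features
    score = self_similarity(emb)
    return score < threshold, score
```

The sketch reflects the intuition stated in the abstract: clean support samples of one class should embed close together, so a poisoned support set that corrupts the classifier's notion of that class tends to show reduced self-similarity after filtering, and swapping `filter_fn` for another filtering function illustrates why the concept generalizes.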
