Benchmarking Robustness to Adversarial Image Obfuscations

01/30/2023
by Florian Stimberg, et al.

Automated content filtering and moderation is an important tool that allows online platforms to build thriving user communities that facilitate cooperation and prevent abuse. Unfortunately, resourceful actors try to bypass automated filters in a bid to post content that violates platform policies and codes of conduct. To reach this goal, these malicious actors may obfuscate policy-violating images (e.g., overlay harmful images with carefully selected benign images or visual patterns) to prevent machine learning models from reaching the correct decision. In this paper, we invite researchers to tackle this specific issue and present a new image benchmark. This benchmark, based on ImageNet, simulates the type of obfuscations created by malicious actors. It goes beyond ImageNet-C and ImageNet-C̅ by proposing general, drastic, adversarial modifications that preserve the original content intent. It aims to tackle a more common adversarial threat than the one considered by ℓ_p-norm-bounded adversaries. We evaluate 33 pretrained models on the benchmark and train models with different augmentations, architectures, and training methods on subsets of the obfuscations to measure generalization. We hope this benchmark will encourage researchers to test their models and methods and to develop new approaches that are more robust to these obfuscations.
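To make the overlay-style obfuscation described above concrete, here is a minimal sketch of blending a carefully selected benign cover image over a source image, so the content stays recognizable to humans while the pixel statistics shift drastically. This is an illustration, not the paper's actual obfuscation pipeline; the file names and blend ratio are hypothetical.

```python
# Illustrative overlay obfuscation: alpha-blend a benign cover image over a
# source image. Not the paper's exact pipeline; paths and alpha are examples.
from PIL import Image

def overlay_obfuscate(source_path: str, cover_path: str, alpha: float = 0.5) -> Image.Image:
    """Blend a cover image over the source image with the given opacity."""
    source = Image.open(source_path).convert("RGB")
    cover = Image.open(cover_path).convert("RGB").resize(source.size)
    # PIL computes (1 - alpha) * source + alpha * cover, pixel-wise.
    return Image.blend(source, cover, alpha)

obfuscated = overlay_obfuscate("source_image.jpg", "benign_pattern.jpg", alpha=0.5)
obfuscated.save("obfuscated.jpg")
```

At alpha near 0.5 the cover pattern dominates low-level features that classifiers rely on, which is the intuition behind treating such overlays as a realistic, unrestricted adversarial threat.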
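The evaluation the abstract describes (measuring pretrained classifiers on obfuscated images) could look roughly like the following sketch, assuming the obfuscated benchmark is stored in a hypothetical "obfuscated_imagenet/" directory in torchvision's ImageFolder layout. The model choice and preprocessing are standard ImageNet defaults, not the paper's specific setup.

```python
# Sketch: top-1 accuracy of a pretrained classifier on an obfuscated
# ImageNet-style benchmark. Directory layout and model are assumptions.
import torch
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
dataset = datasets.ImageFolder("obfuscated_imagenet/", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, num_workers=4)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.numel()

print(f"Top-1 accuracy under obfuscation: {correct / total:.3f}")
```

Comparing this number against clean ImageNet accuracy gives the robustness gap that the benchmark is designed to expose.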


