Hack The Box: Fooling Deep Learning Abstraction-Based Monitors

07/10/2021
by Sara Hajj Ibrahim, et al.

Deep learning is a type of machine learning that adopts a deep hierarchy of concepts: deep learning classifiers link the most basic version of concepts at the input layer to the most abstract version of concepts at the output layer, also known as a class or label. However, once trained over a finite set of classes, some deep learning models cannot express that a given input belongs to none of those classes and simply cannot be classified. Correctly invalidating predictions on such unrelated inputs is a challenging problem that has been tackled in many ways in the literature. Novelty detection gives deep learning the ability to output "do not know" for novel/unseen classes, yet little attention has been paid to the security of novelty detection itself. In this paper, we consider the case study of abstraction-based novelty detection and show that it is not robust against adversarial samples. Moreover, we show the feasibility of crafting adversarial samples that fool the deep learning classifier and bypass the novelty-detection monitor at the same time. In other words, these monitoring boxes are hackable: novelty detection itself ends up as an attack surface.
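To make the attack surface concrete, here is a minimal sketch of an abstraction-based monitor of the kind the abstract describes: per-class axis-aligned bounding boxes over a network's hidden-layer activations, accepting a prediction only when the activation falls inside the predicted class's box. The class name `BoxMonitor` and the toy Gaussian activations are illustrative assumptions, not the paper's implementation; an adversary who pushes an unrelated input's activation inside a box bypasses the monitor.

```python
import numpy as np

class BoxMonitor:
    """Hypothetical abstraction-based monitor: one axis-aligned
    bounding box per class over observed activation vectors."""

    def __init__(self):
        self.boxes = {}  # label -> (min_vec, max_vec)

    def fit(self, features, labels):
        # Record the per-dimension min/max of activations per class.
        for label in np.unique(labels):
            feats = features[labels == label]
            self.boxes[label] = (feats.min(axis=0), feats.max(axis=0))

    def accepts(self, feature, predicted_label):
        # Accept the prediction only if the activation lies inside
        # the abstraction box of the predicted class.
        lo, hi = self.boxes[predicted_label]
        return bool(np.all(feature >= lo) and np.all(feature <= hi))

# Toy activations for two known classes (assumed data, for illustration).
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (50, 4)),
                   rng.normal(1.0, 0.1, (50, 4))])
labels = np.array([0] * 50 + [1] * 50)

monitor = BoxMonitor()
monitor.fit(feats, labels)

# A far-away activation is flagged as novel ("do not know")...
print(monitor.accepts(np.full(4, 5.0), 1))  # False
# ...but an activation crafted to sit inside class 1's box is
# accepted, even if the underlying input belongs to no known class.
print(monitor.accepts(np.full(4, 1.0), 1))  # True
```

The bypass in the last line is exactly the failure mode the paper studies: the monitor's decision region is itself a fixed, differentiable-adjacent target that adversarial optimization can aim for, alongside the classifier's decision boundary.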

Related research

- 02/28/2018, "Novelty Detection with GAN": The ability of a classifier to recognize unknown inputs is important for...
- 02/23/2023, "Does Deep Learning Learn to Abstract? A Systematic Probing Framework": Abstraction is a desirable capability for deep learning models, which me...
- 05/11/2019, "Segregation Network for Multi-Class Novelty Detection": The problem of multiple class novelty detection is gaining increasing im...
- 10/24/2022, "Novelty Detection in Time Series via Weak Innovations Representation: A Deep Learning Approach": We consider novelty detection in time series with unknown and nonparamet...
- 08/20/2021, "An Adaptable Deep Learning-Based Intrusion Detection System to Zero-Day Attacks": The intrusion detection system (IDS) is an essential element of security...
- 11/17/2020, "Measuring the Novelty of Natural Language Text Using the Conjunctive Clauses of a Tsetlin Machine Text Classifier": Most supervised text classification approaches assume a closed world, co...
- 11/07/2022, "Interpreting deep learning output for out-of-distribution detection": Commonly used AI networks are very self-confident in their predictions, ...
