Hardness of Samples Is All You Need: Protecting Deep Learning Models Using Hardness of Samples

06/21/2021
by Amir Mahdi Sadeghzadeh, et al.

Several recent studies have shown that Deep Neural Network (DNN)-based classifiers are vulnerable to model extraction attacks, in which an adversary exploits the target classifier to create a surrogate classifier that imitates it with respect to some criteria. In this paper, we investigate the hardness degree of samples and demonstrate that the hardness degree histogram of model extraction attack samples is distinguishable from that of normal samples, i.e., samples drawn from the target classifier's training data distribution. Since DNN-based classifiers are trained over several epochs, the training process can be viewed as producing a sequence of subclassifiers, one at the end of each epoch. We use this sequence of subclassifiers to calculate the hardness degree of samples, and we investigate the relation between a sample's hardness degree and the trust that can be placed in the classifier's output. We propose the Hardness-Oriented Detection Approach (HODA) to detect the sample sequences of model extraction attacks. The results demonstrate that HODA detects such sequences with a high success rate after observing only 100 attack samples. We also investigate the hardness degree of adversarial examples and show that their hardness degree histogram is distinct from that of normal samples.
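The abstract outlines the core mechanism: per-epoch subclassifiers yield a hardness degree for each sample, and the histogram of hardness degrees over a user's query sequence separates attack traffic from normal traffic. Below is a minimal sketch of that pipeline; the specific hardness measure (counting consecutive-epoch prediction flips), the histogram comparison (Pearson correlation), and the threshold are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def hardness_degree(per_epoch_preds):
    """Illustrative hardness measure for one sample.

    per_epoch_preds: 1-D array of labels predicted by the saved
    subclassifiers (one per training epoch, in order). Here hardness
    is taken as the number of consecutive subclassifier pairs that
    disagree: easy samples are learned early and stay stable, hard
    samples keep flipping. This is an assumption for illustration.
    """
    preds = np.asarray(per_epoch_preds)
    return int(np.sum(preds[1:] != preds[:-1]))

def hardness_histogram(pred_matrix, num_bins):
    """Normalized histogram of hardness degrees for a sample sequence.

    pred_matrix: (num_subclassifiers, num_samples) array of predicted labels.
    """
    degrees = [hardness_degree(pred_matrix[:, i])
               for i in range(pred_matrix.shape[1])]
    hist, _ = np.histogram(degrees, bins=num_bins, range=(0, num_bins))
    return hist / max(hist.sum(), 1)

def looks_like_extraction(user_hist, normal_hist, threshold=0.5):
    """Flag a user's query sequence if its hardness histogram deviates
    from the normal-sample histogram. The Pearson-correlation test and
    the threshold value are assumptions; any histogram distance could
    be substituted.
    """
    corr = np.corrcoef(user_hist, normal_hist)[0, 1]
    return corr < threshold
```

In use, the defender would precompute `normal_hist` from held-out normal samples, accumulate each user's queries into `pred_matrix`, and raise an alert once `looks_like_extraction` fires for that user's sequence.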

