Trojan Model Detection Using Activation Optimization

06/08/2023
by Mohamed E. Hussein, et al.

Because training data is often unavailable or very large, and because training machine learning models carries high computational and human labor costs, it is common practice to rely on open-source pre-trained models whenever possible. However, this practice is worrisome from a security perspective. Pre-trained models can be infected with Trojan attacks, in which the attacker embeds a trigger in the model so that the attacker can control the model's behavior whenever the trigger is present in the input. In this paper, we present our preliminary work on a novel method for Trojan model detection. Our method creates a signature for a model based on activation optimization. A classifier is then trained to detect a Trojan model given its signature. Our method achieves state-of-the-art performance on two public datasets.
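The abstract does not specify how the signature is built or which classifier is used, so the following is only a toy sketch of the general two-stage idea, not the authors' implementation. It assumes a linear softmax model given as a weight matrix `W`, runs gradient ascent on the input to maximize each class's log-probability (one simple form of activation optimization), and uses the probabilities reached as the signature. The names `optimize_activation`, `signature`, `fit_centroids`, and `predict`, the nearest-centroid classifier, and the premise that a Trojan model has one unusually easy-to-activate class are all illustrative assumptions.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    # Toy "model": one linear layer, W has one row of weights per class.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def optimize_activation(W, target, steps=200, lr=0.1):
    """Gradient ascent on the input to maximize log-softmax of `target`."""
    dim = len(W[0])
    x = [0.5] * dim  # neutral starting input inside the box [0, 1]^dim
    for _ in range(steps):
        p = softmax(logits(W, x))
        for j in range(dim):
            # d/dx_j log p_target = W[target][j] - sum_k p_k * W[k][j]
            g = W[target][j] - sum(p[k] * W[k][j] for k in range(len(W)))
            x[j] = min(1.0, max(0.0, x[j] + lr * g))  # keep the input valid
    return x

def signature(W):
    """Assumed signature: per class, the probability reached by optimization."""
    sig = []
    for c in range(len(W)):
        x = optimize_activation(W, c)
        sig.append(softmax(logits(W, x))[c])
    return sig

def fit_centroids(signatures, labels):
    """Stand-in classifier (nearest centroid) trained on labeled signatures."""
    centroids = {}
    for lab in set(labels):
        rows = [s for s, l in zip(signatures, labels) if l == lab]
        centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, sig):
    def dist(lab):
        return sum((a - b) ** 2 for a, b in zip(sig, centroids[lab]))
    return min(centroids, key=dist)

# Hypothetical toy models: the second has one class with exaggerated weights.
clean_model = [[1.0, 0.0], [0.0, 1.0]]
suspect_model = [[1.0, 0.0], [0.0, 5.0]]

clean_sig = signature(clean_model)
suspect_sig = signature(suspect_model)
centroids = fit_centroids([clean_sig, suspect_sig], ["clean", "trojan"])
verdict = predict(centroids, signature([[1.0, 0.0], [0.0, 4.0]]))
```

In practice the signature would be computed on a deep network with an autodiff framework, and the classifier would be trained on signatures from many models with known clean or Trojan labels; the two toy models here only illustrate the data flow.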

