Deep Q learning for fooling neural networks

11/13/2018
by Mandar Kulkarni, et al.

Deep learning models are vulnerable to external attacks. In this paper, we propose a Reinforcement Learning (RL) based approach to generate adversarial examples for pre-trained (target) models. We assume a semi black-box setting in which the only access the adversary has to the target model is the class probabilities returned for input queries. We train a Deep Q Network (DQN) agent which, with experience, learns to attack only a small portion of image pixels to generate non-targeted adversarial images. Initially, the agent explores the environment by sequentially modifying random sets of image pixels and observing the effect on the class probabilities. At the end of an episode, it receives a positive (negative) reward if it succeeds (fails) in altering the label of the image. Experimental results on the MNIST, CIFAR-10 and ImageNet datasets demonstrate that our RL framework is able to learn an effective attack policy.
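The abstract describes the environment and reward structure in enough detail to sketch a single episode. Below is a minimal illustrative sketch, assuming a hypothetical target_model that maps an image to class probabilities and a hypothetical agent exposing a select_action method; it is not the authors' implementation and omits the DQN training loop.

```python
import numpy as np

# Minimal sketch (not the authors' code) of the episode structure described
# in the abstract: the agent repeatedly perturbs small sets of pixels, queries
# the target model only for class probabilities (semi black-box), and receives
# a terminal reward of +1 if the predicted label flips and -1 otherwise.
# `target_model` (image -> class probabilities) and `agent` (returning a valid
# patch location from select_action) are hypothetical placeholders.

def run_episode(image, target_model, agent, max_steps=20, patch=4, eps=0.3):
    orig_label = int(np.argmax(target_model(image)))  # only probabilities are queried
    adv = image.copy()
    for _ in range(max_steps):
        probs = target_model(adv)
        y, x = agent.select_action(adv, probs)        # choose which pixel patch to attack
        region = adv[y:y + patch, x:x + patch]
        noise = np.random.uniform(-eps, eps, region.shape)
        adv[y:y + patch, x:x + patch] = np.clip(region + noise, 0.0, 1.0)
        if int(np.argmax(target_model(adv))) != orig_label:
            return adv, +1.0                          # success: non-targeted label flip
    return adv, -1.0                                  # failure within the step budget
```

In the paper's setting, this terminal reward would drive standard DQN updates so that, over many episodes, the agent learns which pixel regions to perturb instead of choosing them at random.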

Related research

12/19/2019  A New Ensemble Method for Concessively Targeted Multi-model Attack
It is well known that deep learning models are vulnerable to adversarial...

05/03/2018  Siamese networks for generating adversarial examples
Machine learning models are vulnerable to adversarial examples. An adver...

09/13/2018  Query-Efficient Black-Box Attack by Active Learning
Deep neural network (DNN) as a popular machine learning model is found t...

11/10/2019  Minimalistic Attacks: How Little it Takes to Fool a Deep Reinforcement Learning Policy
Recent studies have revealed that neural network-based policies can be e...

01/11/2020  Sparse Black-box Video Attack with Reinforcement Learning
Adversarial attacks on video recognition models have been explored recen...

05/31/2018  Sequential Attacks on Agents for Long-Term Adversarial Goals
Reinforcement learning (RL) has advanced greatly in the past few years w...

08/02/2019  AdvGAN++ : Harnessing latent layers for adversary generation
Adversarial examples are fabricated examples, indistinguishable from the...
