Discovering the Hidden Vocabulary of DALLE-2

06/01/2022
by Giannis Daras, et al.

We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts. For example, it seems that "Apoploe vesrreaitais" means birds and "Contarra ccetnxniams luryca tanniounons" (sometimes) means bugs or pests. We find that these prompts are often consistent in isolation and sometimes also in combination. We present our black-box method for discovering words that seem random but correspond to visual concepts. This creates important security and interpretability challenges.
