Understanding invariance via feedforward inversion of discriminatively trained classifiers

by   Piotr Teterwak, et al.

A discriminatively trained neural net classifier achieves optimal performance if all information about its input other than class membership has been discarded prior to the output layer. Surprisingly, past research has discovered that some extraneous visual detail remains in the output logits. This finding is based on inversion techniques that map deep embeddings back to images. Although the logit inversions seldom produce coherent, natural images or recognizable object classes, they do recover some visual detail. We explore this phenomenon further using a novel synthesis of methods, yielding a feedforward inversion model that produces remarkably high fidelity reconstructions, qualitatively superior to those of past efforts. When applied to an adversarially robust classifier model, the reconstructions contain sufficient local detail and global structure that they might be confused with the original image in a quick glance, and the object category can clearly be gleaned from the reconstruction. Our approach is based on BigGAN (Brock, 2019), with conditioning on logits instead of one-hot class labels. We use our reconstruction model as a tool for exploring the nature of representations, including: the influence of model architecture and training objectives (specifically robust losses), the forms of invariance that networks achieve, representational differences between correctly and incorrectly classified images, and the effects of manipulating logits and images. We believe that our method can inspire future investigations into the nature of information flow in a neural net and can provide diagnostics for improving discriminative models.


page 3

page 10

page 11

page 15

page 16

page 17

page 18

page 19


Single Image Backdoor Inversion via Robust Smoothed Classifiers

Backdoor inversion, the process of finding a backdoor trigger inserted i...

EDICT: Exact Diffusion Inversion via Coupled Transformations

Finding an initial noise vector that produces an input image when fed in...

IMAGINE: Image Synthesis by Image-Guided Model Inversion

We introduce an inversion based method, denoted as IMAge-Guided model IN...

Adversarial Manipulation of Deep Representations

We show that the representation of an image in a deep neural network (DN...

From Continuity to Editability: Inverting GANs with Consecutive Images

Existing GAN inversion methods are stuck in a paradox that the inverted ...

Brain-like representational straightening of natural movies in robust feedforward neural networks

Representational straightening refers to a decrease in curvature of visu...

Explaining Neural Networks by Decoding Layer Activations

To derive explanations for deep learning models, ie. classifiers, we pro...

Please sign up or login with your details

Forgot password? Click here to reset