Guide Me: Interacting with Deep Networks

03/30/2018
by Christian Rupprecht, et al.

Interaction and collaboration between humans and intelligent machines have become increasingly important as machine learning methods move into real-world applications that involve end users. While much prior work lies at the intersection of natural language and vision, such as image captioning or image generation from text descriptions, less focus has been placed on the use of language to guide or improve the performance of a learned visual processing algorithm. In this paper, we explore methods to flexibly guide a trained convolutional neural network through user input to improve its performance during inference. We do so by inserting a layer that acts as a spatio-semantic guide into the network. This guide is trained to modify the network's activations, either directly via an energy minimization scheme or indirectly through a recurrent model that translates human language queries to interaction weights. Learning the verbal interaction is fully automatic and does not require manual text annotations. We evaluate the method on two datasets, showing that guiding a pre-trained network can improve performance, and provide extensive insights into the interaction between the guide and the CNN.
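The "direct" mode mentioned in the abstract, where a guide layer's parameters are fit by energy minimization while the trained network stays frozen, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the "network" is a hypothetical frozen linear classifier over the features of a single pixel, the guide is a hypothetical per-channel scaling exp(gamma_c) (the spatial part of the guide is omitted), and the energy is assumed to be cross-entropy against a user hint plus an L2 penalty on gamma. The names `guided_logits`, `energy`, `fit_guide` and `predict` are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_logits(W, f, gamma):
    """Hypothetical frozen linear head applied to guided features.

    W[k][c]  : frozen head weights (class k, channel c)
    f[c]     : frozen features of one pixel
    gamma[c] : guide parameters (assumption); channel c is scaled by exp(gamma[c])
    """
    scaled = [math.exp(gc) * fc for gc, fc in zip(gamma, f)]
    return [sum(w * s for w, s in zip(Wk, scaled)) for Wk in W]


def energy(W, f, gamma, target, lam):
    """Assumed energy: cross-entropy w.r.t. the user's hint + L2 penalty on gamma."""
    q = softmax(guided_logits(W, f, gamma))
    return -math.log(q[target]) + lam * sum(gc * gc for gc in gamma)


def fit_guide(W, f, target, lam=0.01, lr=0.1, steps=200):
    """Gradient descent on gamma only; W and f are never modified."""
    gamma = [0.0] * len(f)
    for _ in range(steps):
        q = softmax(guided_logits(W, f, gamma))
        # dE/dlogit_k = q_k - 1[k == target]  (softmax cross-entropy)
        dlogit = [qk - (1.0 if k == target else 0.0) for k, qk in enumerate(q)]
        # dlogit_k/dgamma_c = W[k][c] * f[c] * exp(gamma[c]); plus the L2 term
        grad = [
            sum(dlogit[k] * W[k][c] for k in range(len(W))) * f[c] * math.exp(gamma[c])
            + 2.0 * lam * gamma[c]
            for c in range(len(f))
        ]
        gamma = [gc - lr * dc for gc, dc in zip(gamma, grad)]
    return gamma


def predict(W, f, gamma):
    """Index of the highest-scoring class for the guided features."""
    logits = guided_logits(W, f, gamma)
    return max(range(len(logits)), key=lambda k: logits[k])


# Toy usage: class 0 reads channel 0, class 1 reads channel 1.
W = [[1.0, 0.0], [0.0, 1.0]]
f = [2.0, 1.0]  # unguided, this pixel is classified as class 0
gamma = fit_guide(W, f, target=1)  # user hint: "this pixel is class 1"
print(predict(W, f, [0.0, 0.0]), predict(W, f, gamma))  # expected: 0 1
```

Only `gamma` is updated, which mirrors the idea of steering a trained model at inference time without retraining its weights. The exponential parametrization is a choice made for this sketch so that the channel scaling always stays positive.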



research
03/20/2017

I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation

Translating information between text and image is a fundamental problem ...
research
03/03/2022

A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism

Image captioning is a fast-growing research field of computer vision and...
research
05/26/2022

Prompt-based Learning for Unpaired Image Captioning

Unpaired Image Captioning (UIC) has been developed to learn image descri...
research
11/28/2018

Towards Task Understanding in Visual Settings

We consider the problem of understanding real world tasks depicted in vi...
research
11/20/2018

How You See Me

Convolution Neural Networks is one of the most powerful tools in the pre...
research
09/08/2022

Text-Free Learning of a Natural Language Interface for Pretrained Face Generators

We propose Fast text2StyleGAN, a natural language interface that adapts ...
research
01/30/2023

PromptMix: Text-to-image diffusion models enhance the performance of lightweight networks

Many deep learning tasks require annotations that are too time consuming...
