Visual Language Modeling on CNN Image Representations

11/09/2015
by Hiroharu Kato, et al.

Measuring the naturalness of images is important for generating realistic images and for detecting unnatural regions in them. Additionally, a method to measure naturalness can be complementary to Convolutional Neural Network (CNN) based features, which are known to be insensitive to the naturalness of images. However, most probabilistic image models cannot adequately model the complex, abstract sense of naturalness that humans perceive, because they are built directly on raw image pixels. In this work, we assume that naturalness can be measured by the predictability of high-level features during eye movement. Based on this assumption, we propose a novel method for evaluating naturalness by building a variant of Recurrent Neural Network Language Models on pre-trained CNN representations. We apply our method to two tasks and show that 1) using our method as a regularizer enables us to generate more understandable images from image features than existing approaches, and 2) unnaturalness maps produced by our method achieve state-of-the-art eye fixation prediction performance on two well-studied datasets.
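The idea of scoring naturalness by predictability can be illustrated with a toy sketch. This is not the authors' model: a simple running-average recurrence is swapped in for the trained RNN language model, short hand-written vectors stand in for CNN features sampled along a scan path, and the function name `unnaturalness_scores` and the `decay` parameter are hypothetical. Under those assumptions, the score at each step is the squared error between the predicted and the observed feature vector.

```python
from typing import List, Sequence


def unnaturalness_scores(
    features: Sequence[Sequence[float]], decay: float = 0.5
) -> List[float]:
    """Return the per-step prediction error along a sequence of feature vectors.

    Assumption: a running average of past features is used as the predictor,
    standing in for a trained recurrent model.
    """
    if not features:
        return []
    # Recurrent state, initialised from the first observed feature vector.
    state = list(features[0])
    scores = []
    for x in features[1:]:
        # The current state is the prediction for the next feature vector;
        # a large squared error means the observation was hard to predict.
        scores.append(sum((xi - si) ** 2 for xi, si in zip(x, state)))
        # Fold the new observation into the recurrent state.
        state = [decay * si + (1.0 - decay) * xi for xi, si in zip(x, state)]
    return scores


# A sequence that never changes versus one with an abrupt jump at step 2.
smooth = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
abrupt = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(unnaturalness_scores(smooth))
print(unnaturalness_scores(abrupt))
```

In this sketch the abrupt sequence receives a high score at the step where the feature jumps, while the constant sequence scores zero throughout. A learned predictor would replace the running average in any realistic setting.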

research
05/07/2015

Language Models for Image Captioning: The Quirks and What Works

Two recent approaches have achieved state-of-the-art results in image ca...
research
03/03/2022

A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism

Image captioning is a fast-growing research field of computer vision and...
research
02/05/2016

Generate Image Descriptions based on Deep RNN and Memory Cells for Images Features

Generating natural language descriptions for images is a challenging tas...
research
07/05/2020

Deep Convolutional Neural Network for Identifying Seam-Carving Forgery

Seam carving is a representative content-aware image retargeting approac...
research
08/30/2017

Efficient Convolutional Network Learning using Parametric Log based Dual-Tree Wavelet ScatterNet

We propose a DTCWT ScatterNet Convolutional Neural Network (DTSCNN) form...
research
05/19/2023

LLM Itself Can Read and Generate CXR Images

Building on the recent remarkable development of large language models (...
research
12/30/2022

An Experience-based Direct Generation approach to Automatic Image Cropping

Automatic Image Cropping is a challenging task with many practical downs...
