Learning to Associate Words and Images Using a Large-scale Graph

05/22/2017
by   Heqing Ya, et al.
0

We develop an approach for unsupervised learning of associations between co-occurring perceptual events using a large graph. We applied this approach to successfully solve the image captcha of China's railroad system. The approach is based on the principle of suspicious coincidence. In this particular problem, a user is presented with a deformed picture of a Chinese phrase and eight low-resolution images. They must quickly select the relevant images in order to purchase their train tickets. This problem presents several challenges: (1) the teaching labels for both the Chinese phrases and the images were not available for supervised learning, (2) no pre-trained deep convolutional neural networks are available for recognizing these Chinese phrases or the presented images, and (3) each captcha must be solved within a few seconds. We collected 2.6 million captchas, with 2.6 million deformed Chinese phrases and over 21 million images. From these data, we constructed an association graph, composed of over 6 million vertices, and linked these vertices based on co-occurrence information and feature similarity between pairs of images. We then trained a deep convolutional neural network to learn a projection of the Chinese phrases onto a 230-dimensional latent space. Using label propagation, we computed the likelihood of each of the eight images conditioned on the latent space projection of the deformed phrase for each captcha. The resulting system solved captchas with 77 average. Our work, in answering this practical challenge, illustrates the power of this class of unsupervised association learning techniques, which may be related to the brain's general strategy for associating language stimuli with visual objects on the principle of suspicious coincidence.

READ FULL TEXT

page 1

page 4

research
09/12/2017

Human Associations Help to Detect Conventionalized Multiword Expressions

In this paper we show that if we want to obtain human evidence about con...
research
09/30/2019

Unsupervised Projection Networks for Generative Adversarial Networks

We propose the use of unsupervised learning to train projection networks...
research
10/21/2022

Describing Sets of Images with Textual-PCA

We seek to semantically describe a set of images, capturing both the att...
research
03/20/2023

Controllable Ancient Chinese Lyrics Generation Based on Phrase Prototype Retrieving

Generating lyrics and poems is one of the essential downstream tasks in ...
research
08/20/2016

phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning

A picture is worth a thousand words. Not until recently, however, we not...
research
05/02/2018

Images & Recipes: Retrieval in the cooking context

Recent advances in the machine learning community allowed different use ...
research
12/29/2016

Learning Visual N-Grams from Web Data

Real-world image recognition systems need to recognize tens of thousands...

Please sign up or login with your details

Forgot password? Click here to reset