Sampled Image Tagging and Retrieval Methods on User Generated Content

11/21/2016
by Karl Ni, et al.

Traditional image tagging and retrieval algorithms have limited value because they are trained on heavily curated datasets. These limitations are most evident when arbitrary search words are used that do not intersect with training set labels. Weak labels from user generated content (UGC) found in the wild (e.g., Google Photos, Flickr) have an almost unlimited number of unique words in the metadata tags. Prior work on word embeddings successfully leveraged unstructured text with large vocabularies, and our proposed method seeks to apply similar cost functions to open source imagery. Specifically, we train a deep learning image tagging and retrieval system on large-scale UGC using sampling methods and joint optimization of word embeddings. Training on the Yahoo! Flickr Creative Commons (YFCC100M) dataset builds robustness to common unstructured data issues that include, but are not limited to, irrelevant tags, misspellings, multiple languages, polysemy, and tag imbalance. As a result, the final proposed algorithm will not only yield results comparable to the state of the art in conventional image tagging, but will also enable a new capability to train algorithms on large-scale unstructured text in the YFCC100M dataset and outperform cited work in zero-shot capability.
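The abstract does not spell out the cost function, so the following is only a minimal sketch of one plausible reading: a word2vec-style negative sampling loss in which an image vector is pulled toward the embeddings of its observed tags and pushed away from a few sampled non-observed tags. The function names, the toy two-dimensional vectors, and the uniform sampling of negatives are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(image_vec, tag_vecs, positive_tags, num_negatives, rng):
    """Hypothetical per-image loss: observed tags are positives,
    uniformly sampled non-observed tags are negatives."""
    candidates = [t for t in tag_vecs if t not in positive_tags]
    loss = 0.0
    for tag in positive_tags:
        # Reward a high score between the image and an observed tag.
        loss -= math.log(sigmoid(dot(image_vec, tag_vecs[tag])))
        # Penalize high scores for sampled tags the user did not apply.
        for neg in rng.sample(candidates, num_negatives):
            loss -= math.log(sigmoid(-dot(image_vec, tag_vecs[neg])))
    return loss


# Toy tag embeddings (assumed values, for illustration only).
tag_vecs = {
    "beach": [1.0, 0.0],
    "sunset": [0.8, 0.2],
    "car": [-1.0, 0.0],
    "dog": [0.0, -1.0],
}
rng = random.Random(0)
aligned = negative_sampling_loss([2.0, 0.5], tag_vecs, ["beach", "sunset"], 2, rng)
misaligned = negative_sampling_loss([-2.0, -0.5], tag_vecs, ["beach", "sunset"], 2, rng)
print(aligned < misaligned)
```

In a real system the image vector would come from a deep network and both it and the tag embeddings would be updated by gradient descent. Sampling only a handful of negatives per positive is what keeps the cost manageable when the tag vocabulary is very large.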


research
05/31/2016

Fast Zero-Shot Image Tagging

The well-known word analogy experiments show that the recent word vector...
research
10/30/2020

Multimodal Metric Learning for Tag-based Music Retrieval

Tag-based music retrieval is crucial to browse large-scale music librari...
research
07/30/2015

Tag-Weighted Topic Model For Large-scale Semi-Structured Documents

To date, there have been massive Semi-Structured Documents (SSDs) during...
research
04/18/2016

Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging

Automatic image annotation has been an important research topic in facil...
research
09/04/2019

Large-scale Tag-based Font Retrieval with Generative Feature Learning

Font selection is one of the most important steps in a design workflow. ...
research
11/27/2017

Separating Self-Expression and Visual Content in Hashtag Supervision

The variety, abundance, and structured nature of hashtags make them an i...
research
06/15/2022

Born for Auto-Tagging: Faster and better with new objective functions

Keyword extraction is a task of text mining. It is applied to increase s...
