E^2BoWs: An End-to-End Bag-of-Words Model via Deep Convolutional Neural Network

09/18/2017
by   Xiaobin Liu, et al.

The traditional Bag-of-visual Words (BoWs) model is commonly built in several steps, including local feature extraction, codebook generation, and feature quantization. These steps are relatively independent of each other and are hard to optimize jointly. Moreover, the dependency on hand-crafted local features makes the BoWs model ineffective at conveying high-level semantics. These issues largely hinder the performance of the BoWs model in large-scale image applications. To address them, we propose an End-to-End BoWs (E^2BoWs) model based on a Deep Convolutional Neural Network (DCNN). Our model takes an image as input, identifies and separates the semantic objects in it, and outputs visual words with high semantic discriminative power. Specifically, our model first generates Semantic Feature Maps (SFMs) corresponding to different object categories through convolutional layers, then introduces Bag-of-Words Layers (BoWL) to generate visual words for each individual feature map. We also introduce a novel learning algorithm to reinforce the sparsity of the generated E^2BoWs model, which further ensures time and memory efficiency. We evaluate the proposed E^2BoWs model on several image search datasets, including CIFAR-10, CIFAR-100, MIRFLICKR-25K, and NUS-WIDE. Experimental results show that our method achieves promising accuracy and efficiency compared with recent deep learning based retrieval works.
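The data flow described in the abstract (one feature map per object category, a Bag-of-Words layer per map, a sparse word vector as output) can be illustrated with a minimal pure-Python sketch. The abstract does not specify the internals of the BoW layer, so the linear projection followed by a thresholded activation used here is an assumption, and the names `bow_layer`, `encode`, and `sparse_similarity` are hypothetical. The weights are fixed by hand, and the paper's sparsity-reinforcing learning algorithm is not reproduced.

```python
def bow_layer(feature_map, weights, threshold=0.0):
    """Map one semantic feature map to a few visual-word activations.

    Assumption: each word is a linear projection of the flattened map;
    activations at or below `threshold` are zeroed to keep the output sparse.
    """
    flat = [v for row in feature_map for v in row]
    words = []
    for w_row in weights:
        act = sum(w * x for w, x in zip(w_row, flat))
        words.append(act if act > threshold else 0.0)
    return words


def encode(feature_maps, weights_per_map, threshold=0.0):
    """Concatenate per-map visual words into a sparse {word_id: weight} dict."""
    sparse = {}
    offset = 0
    for fmap, weights in zip(feature_maps, weights_per_map):
        for i, act in enumerate(bow_layer(fmap, weights, threshold)):
            if act > 0.0:
                sparse[offset + i] = act  # store only non-zero words
        offset += len(weights)
    return sparse


def sparse_similarity(a, b):
    """Dot product over the visual words two images share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(v * b[k] for k, v in a.items() if k in b)


# Toy input: two 2x2 "semantic feature maps"; the second category is absent.
feature_maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
# Two hypothetical visual words per map, each a weight vector of length 4.
weights_per_map = [
    [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]],
    [[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]],
]

image_words = encode(feature_maps, weights_per_map)
score = sparse_similarity(image_words, image_words)
```

Storing only the non-zero words is what makes sparsity pay off at search time: the similarity loop touches only the words an image actually activates, which is the kind of time and memory saving the abstract attributes to a sparse representation.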


