Crawler for Image Acquisition from World Wide Web

11/11/2019
by R Rajkumar, et al.

Due to advances in computer communication and storage technologies, a large amount of image data is available on the World Wide Web (WWW). To locate a particular set of images, the available search engines may be used with the help of keywords, but they do not filter out unwanted data. To retrieve relevant images for appropriate keyword(s), an image crawler is designed and implemented. Keyword(s) are submitted as a query and, with the help of a sender engine, images are downloaded along with metadata such as URL, filename, file size, and file access date and time. The URLs of newly downloaded images are then compared with those of images already present in the repository to check for uniqueness, and only images with unique URLs are stored in the repository. The images in the repository will be used to build a novel Content Based Image Retrieval (CBIR) system in the future, and the repository may also be used for other purposes. The image crawler is useful for building image datasets that any CBIR system can use for training and testing.
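The URL-based uniqueness check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ImageRecord`, `ImageRepository`, and `store_unique` are hypothetical, the repository is assumed to be an in-memory dictionary keyed by URL, and URLs are compared as exact strings with no normalization.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ImageRecord:
    """Metadata kept per downloaded image (fields taken from the abstract)."""
    url: str
    filename: str
    size_bytes: int
    accessed_at: datetime


class ImageRepository:
    """Hypothetical in-memory repository keyed by image URL."""

    def __init__(self):
        self._records = {}

    def add_if_new(self, record):
        """Store the record only if its URL is not already present."""
        if record.url in self._records:
            return False
        self._records[record.url] = record
        return True

    def __len__(self):
        return len(self._records)


def store_unique(repo, downloaded):
    """Add newly downloaded records to the repository; return those stored."""
    stored = []
    for record in downloaded:
        if repo.add_if_new(record):
            stored.append(record)
    return stored


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    repo = ImageRepository()
    batch = [
        ImageRecord("http://example.com/cat.jpg", "cat.jpg", 20480, now),
        ImageRecord("http://example.com/dog.jpg", "dog.jpg", 30720, now),
        # Same URL as the first record, so it is treated as a duplicate.
        ImageRecord("http://example.com/cat.jpg", "cat_copy.jpg", 20480, now),
    ]
    for record in store_unique(repo, batch):
        print(record.url)
```

Comparing exact URL strings is the simplest reading of the abstract. The same image served from two different URLs would still be stored twice under this assumption.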

