A Web Page Classifier Library Based on Random Image Content Analysis Using Deep Learning

12/18/2019
by Leonardo Espinosa Leal, et al.

In this paper, we present a methodology and the corresponding Python library for the classification of webpages. Our method retrieves a fixed number of images from a given webpage and, based on them, classifies the webpage into a set of established classes with a given probability. The library trains a random forest model built upon the features extracted from the images by a pre-trained deep network. The implementation is tested by recognizing weapon-class webpages in a curated list of 3859 websites. The results show that the best way to classify a webpage into the studied classes is to assign the class when the maximum probability, taken across all retrieved images, of any single image belonging to that class (here, weapons) is above the threshold. Further research explores whether the developed methodology can also be applied to image classification in healthcare.
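
The page-level decision rule described in the abstract can be sketched in a few lines of Python. This is a minimal sketch, not the library's API: the function names `classify_webpage` and `mean_rule` are hypothetical, the per-image probabilities are assumed to come from the random forest applied to deep features of each retrieved image, and the default threshold of 0.5 is an illustrative assumption rather than a value reported in the paper.

```python
from typing import Sequence


def classify_webpage(image_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Max rule: flag the page if ANY retrieved image exceeds the threshold.

    image_probs: per-image probabilities for the target class (e.g. "weapon"),
    assumed to come from a classifier such as a random forest applied to
    deep features of each retrieved image.
    threshold: hypothetical default of 0.5, chosen only for illustration.
    """
    if not image_probs:
        # No images retrieved: there is no evidence for the target class.
        return False
    # A single confident image is enough to assign the class to the page.
    return max(image_probs) > threshold


def mean_rule(image_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical alternative for comparison: average over all images."""
    if not image_probs:
        return False
    return sum(image_probs) / len(image_probs) > threshold


if __name__ == "__main__":
    probs = [0.05, 0.12, 0.81, 0.30]  # made-up scores for four images
    print(classify_webpage(probs))  # True: 0.81 > 0.5
    print(mean_rule(probs))         # False: the mean is about 0.32
```

With these made-up scores, the max rule flags the page because one image scores 0.81, while averaging dilutes that evidence to about 0.32. The mean rule is included only as a hypothetical point of comparison; the abstract does not state which alternative aggregation rules were evaluated.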

Related research

A General Framework for Multi-focal Image Classification and Authentication: Application to Microscope Pollen Images (03/19/2015)
In this article, we propose a general framework for multi-focal image cl...

Towards Benchmark Datasets for Machine Learning Based Website Phishing Detection: An experimental study (10/24/2020)
In this paper, we present a general scheme for building reproducible and...

Deep Learning Methods for Event Verification and Image Repurposing Detection (02/11/2019)
The authenticity of images posted on social media is an issue of growing...

Web page classification with Google Image Search results (05/30/2020)
In this paper, we introduce a novel method that combines multiple neural...

Learning Subclass Representations for Visually-varied Image Classification (01/12/2016)
In this paper, we present a subclass-representation approach that predic...

ADAPT : Awesome Domain Adaptation Python Toolbox (07/07/2021)
ADAPT is an open-source python library providing the implementation of s...