Efficient Hierarchical Clustering for Classification and Anomaly Detection

08/25/2020
by Ishita Doshi, et al.

We address the problem of large-scale real-time classification of content posted on social networks, along with the need to rapidly identify novel spam types. Obtaining manual labels for user-generated content through editorial labeling and taxonomy development lags behind the rate at which new content types need to be classified. We propose a class of hierarchical clustering algorithms that can be used both for efficient and scalable real-time multiclass classification and for detecting new anomalies in user-generated content. Our methods have low query time, linear space usage, and come with theoretical guarantees with respect to a specific hierarchical clustering cost function (Dasgupta, 2016). We compare our solutions against a range of classification techniques and demonstrate excellent empirical performance.
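
For readers unfamiliar with the cost function mentioned above, the following is a minimal sketch of how Dasgupta's cost can be evaluated for a given hierarchy. It is not the authors' algorithm. It scores a fixed tree by summing, over every pair of points, their similarity multiplied by the number of leaves under their lowest common ancestor, so that lower is better. The nested-tuple tree encoding, the helper names `leaves` and `dasgupta_cost`, and the toy similarity values are all assumptions made for illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels under a tree given as nested tuples (hypothetical encoding)."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def dasgupta_cost(tree, weight):
    """Sum of weight(i, j) * |leaves under the lowest common ancestor of i and j|."""
    if not isinstance(tree, tuple):
        return 0.0  # a single leaf separates no pairs
    child_leaves = [leaves(c) for c in tree]
    n_here = sum(len(ls) for ls in child_leaves)
    cost = 0.0
    # Pairs whose members fall in different children are first separated here,
    # so this node is their lowest common ancestor.
    for a, b in combinations(child_leaves, 2):
        for i in a:
            for j in b:
                cost += weight(i, j) * n_here
    return cost + sum(dasgupta_cost(c, weight) for c in tree)


# Toy similarities (assumed values): 0 and 1 are similar, 2 and 3 are similar.
W = {frozenset({0, 1}): 1.0, frozenset({2, 3}): 1.0}


def weight(i, j):
    return W.get(frozenset({i, j}), 0.1)


good_tree = ((0, 1), (2, 3))  # keeps similar points together until low in the tree
bad_tree = ((0, 2), (1, 3))   # splits similar points at the root

print(dasgupta_cost(good_tree, weight) < dasgupta_cost(bad_tree, weight))
```

In this toy setting the tree that merges similar points near the leaves receives the lower cost, which is the behavior the cost function is designed to reward.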


