A Robust Framework for Classifying Evolving Document Streams in an Expert-Machine-Crowd Setting

10/06/2016
by Muhammad Imran, et al.

An emerging challenge in the online classification of social media data streams is to keep the categories used for classification up-to-date. In this paper, we propose an innovative framework based on an Expert-Machine-Crowd (EMC) triad to help categorize items by continuously identifying novel concepts in heterogeneous data streams often riddled with outliers. We unify constrained clustering and outlier detection by formulating a novel optimization problem: COD-Means. We design an algorithm to solve the COD-Means problem and show that COD-Means will not only help detect novel categories but also seamlessly discover human annotation errors and improve the overall quality of the categorization process. Experiments on diverse real data sets demonstrate that our approach is both effective and efficient.

research
06/22/2021

A Clustering-based Framework for Classifying Data Streams

The non-stationary nature of data streams strongly challenges traditiona...
research
02/27/2019

When a Tweet is Actually Sexist. A more Comprehensive Classification of Different Online Harassment Categories and The Challenges in NLP

Sexism is very common in social media and makes the boundaries of freedo...
research
01/04/2019

Online Social Media Recommendation over Streams

As one of the most popular services over online communities, the social ...
research
12/13/2022

AWT – Clustering Meteorological Time Series Using an Aggregated Wavelet Tree

Both clustering and outlier detection play an important role for meteoro...
research
06/29/2023

Computationally Assisted Quality Control for Public Health Data Streams

Irregularities in public health data streams (like COVID-19 Cases) hampe...
research
07/16/2019

Modeling Human Annotation Errors to Design Bias-Aware Systems for Social Stream Processing

High-quality human annotations are necessary to create effective machine...
research
01/21/2021

Active Hybrid Classification

Hybrid crowd-machine classifiers can achieve superior performance by com...
