Improved and Robust Controversy Detection in General Web Pages Using Semantic Approaches under Large Scale Conditions

12/02/2018
by Jasper Linmans, et al.

Detecting controversy in general web pages is a daunting task, but it is increasingly essential for moderating discussions efficiently and filtering problematic content effectively. Unfortunately, controversies occur across many topics and domains and change greatly over time. This paper investigates neural classifiers as a more robust methodology for controversy detection in general web pages. Current models have often cast controversy detection on general web pages as a Wikipedia linking task or an exact lexical matching task. The diverse and changing nature of controversies suggests that semantic approaches are better able to detect controversy. We train neural networks that can capture semantic information from texts using weak-signal data. By leveraging the semantic properties of word embeddings, we robustly improve on existing controversy detection methods. To evaluate model stability over time and on unseen topics, we assess model performance under varying training conditions to test cross-temporal, cross-topic, and cross-domain performance, as well as annotator congruence. In doing so, we demonstrate that weak-signal-based neural approaches are closer to human estimates of controversy and are more robust to the inherent variability of controversies.


