Content-Based Features to Rank Influential Hidden Services of the Tor Darknet

10/05/2019
by   Mhd Wesam Al-Nabki, et al.

The uneven importance of criminal activities across the onion domains of the Tor Darknet, together with their varying appeal to end users, makes it difficult to measure their influence. To this end, this paper presents a novel content-based ranking framework to detect the most influential onion domains. Our approach comprises two units. A modeling unit represents an onion domain using forty features extracted from five different resources: user-visible text, HTML markup, Named Entities, network topology, and visual content. A ranking unit then uses the Learning-to-Rank (LtR) approach to automatically learn a ranking function that integrates those features. Using a case study based on drug-related onion domains, we obtained the following results. (1) Among the explored LtR schemes, the listwise approach outperforms the benchmarked methods, with an NDCG of 0.95 for the top-10 ranked domains. (2) We show quantitatively that our framework surpasses link-based ranking techniques. (3) With the selected features, we observed that the textual content, composed of text, NER, and HTML features, is the most balanced approach in terms of efficiency and score obtained. The proposed framework might support Law Enforcement Agencies in detecting the most influential domains related to possible suspicious activities.
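The NDCG figure quoted for the top-10 ranked domains can be illustrated with a minimal Python sketch of NDCG@k. The function names, the linear gain, and the log2 discount are assumptions chosen for illustration; the abstract does not state which NDCG variant the authors used (some formulations use an exponential gain instead).

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance grades.

    Assumption: linear gain (rel) with a log2(position + 1) discount,
    where position is 1-based.
    """
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position + 1 == rank + 2
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: DCG of the produced ranking divided by the DCG of the ideal ordering.

    ranked_relevances holds the ground-truth relevance grade of each item,
    listed in the order the ranker returned them.
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items at all, so the score is defined as 0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical relevance grades for five domains, in ranker output order.
score = ndcg_at_k([3, 2, 3, 0, 1], k=10)
```

A ranking that already lists items in decreasing order of relevance scores 1, and any misordering within the top k lowers the score, which is why a value of 0.95 at k = 10 indicates a ranking close to the ideal ordering.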


