The Language of Legal and Illegal Activity on the Darknet

by Leshem Choshen, et al.

The non-indexed parts of the Internet (the Darknet) have become a haven for both legal and illegal anonymous activity. Given the magnitude of these networks, monitoring their activity at scale necessarily relies on automated tools, notably NLP tools. However, little is known about the characteristics of texts communicated through the Darknet, or about how well off-the-shelf NLP tools perform on this domain. This paper addresses this gap with an in-depth investigation of the characteristics of legal and illegal text on the Darknet, comparing it to a clear net website with similar content as a control condition. Taking drug-related websites as a test case, we find that texts for selling legal and illegal drugs have several linguistic characteristics that distinguish them from one another, as well as from the control condition, among them the distribution of POS tags and the coverage of their named entities in Wikipedia.


Unsupervised Simplification of Legal Texts

The processing of legal texts has been developing as an emerging field i...

Towards De-identification of Legal Texts

In many countries, personal information that can be published or shared ...

Towards Grammatical Tagging for the Legal Language of Cybersecurity

Legal language can be understood as the language typically used by those...

The NAI Suite – Drafting and Reasoning over Legal Texts

A prototype for automated reasoning over legal texts, called NAI, is pre...

Performance in the Courtroom: Automated Processing and Visualization of Appeal Court Decisions in France

Artificial Intelligence techniques are already popular and important in ...
