Automated Discovery of Internet Censorship by Web Crawling

04/09/2018
by Alexander Darer, et al.

Censorship of the Internet is widespread around the world. As access to the web becomes increasingly ubiquitous, filtering of this resource becomes more pervasive. Transparency about the specific content that citizens are denied access to is rare. To counter this, various individuals and organisations have proposed numerous techniques for maintaining URL filter lists, with the aim of providing empirical data on censorship for the benefit of the public and the wider censorship research community. We present a new approach for discovering filtered domains in different countries. The method is fully automated and requires no human interaction. The system uses web crawling techniques to traverse between filtered sites and implements a robust method for determining whether a domain is filtered. We demonstrate the effectiveness of the approach by running experiments to search for filtered content in four different censorship regimes. Our results show that we perform better than the current state of the art and have built domain filter lists an order of magnitude larger than the most widely available public lists as of January 2018. Further, we build a dataset mapping the interlinking of blocked content between domains and show how tightly networked censored web resources are.
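
The crawling idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one possible reading, in which the crawler starts from seed domains, tests each domain with a filtering check, and follows outgoing links only from domains found to be filtered. The names `discover_filtered`, `get_links`, `is_filtered`, and `max_checks` are hypothetical, and both callables are supplied by the caller because the abstract does not describe how links are fetched or how filtering is detected.

```python
from collections import deque
from typing import Callable, Iterable, Set, Tuple


def discover_filtered(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    is_filtered: Callable[[str], bool],
    max_checks: int = 1000,
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Breadth-first search that expands links only from filtered domains.

    get_links(domain) is assumed to return the domains linked from `domain`
    (for example, fetched from an uncensored vantage point).
    is_filtered(domain) is assumed to report whether `domain` is blocked
    in the country under test. Both are hypothetical placeholders.
    """
    seen: Set[str] = set()
    queue: deque = deque()
    for seed in seeds:  # de-duplicate seeds while keeping their order
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    filtered: Set[str] = set()
    checks = 0
    while queue and checks < max_checks:
        domain = queue.popleft()
        checks += 1
        if not is_filtered(domain):
            continue  # unfiltered domains are not expanded further
        filtered.add(domain)
        for target in get_links(domain):
            if target not in seen:
                seen.add(target)
                queue.append(target)

    # Record links that connect one filtered domain to another.
    edges = {
        (source, target)
        for source in filtered
        for target in get_links(source)
        if target in filtered and target != source
    }
    return filtered, edges
```

Calling `discover_filtered(["a.example"], get_links, is_filtered)` returns the set of filtered domains reachable through other filtered domains, together with the links between them. The `max_checks` bound is an assumption added here to keep the crawl finite.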

research
06/04/2018

CensorSeeker: Generating a Large, Culture-Specific Blocklist for China

Internet censorship measurements rely on lists of websites to be tested,...
research
09/04/2023

This Is a Local Domain: On Amassing Country-Code Top-Level Domains from Public Data

Domain lists are a key ingredient for representative censuses of the Web...
research
06/04/2018

Automatically Generating a Large, Culture-Specific Blocklist for China

Internet censorship measurements rely on lists of websites to be tested,...
research
05/22/2018

AdGraph: A Machine Learning Approach to Automatic and Effective Adblocking

Filter lists are widely deployed by adblockers to block ads and other fo...
research
05/29/2018

In the IP of the Beholder: Strategies for Active IPv6 Topology Discovery

Existing methods for active topology discovery within the IPv6 Internet ...
research
08/26/2017

Navigation Objects Extraction for Better Content Structure Understanding

Existing works for extracting navigation objects from webpages focus on ...
research
12/12/2019

Investigating the effectiveness of web adblockers

We investigate adblocking filters and the extent to which websites and a...
