CensorSeeker: Generating a Large, Culture-Specific Blocklist for China

06/04/2018
by Austin Hounsel, et al.

Internet censorship measurements rely on lists of websites to be tested, or "block lists," which are curated by third parties. Unfortunately, many of these lists are not public, and those that are public tend to focus on a small group of topics, leaving other types of sites and services untested. To increase and diversify the set of sites on existing block lists, we develop CensorSeeker, which uses search engines and natural language techniques to discover a much wider range of websites that are censored in China. Using this tool, we create a list of 821 websites outside the Alexa Top 1000 that cover Chinese politics, minority human rights organizations, and oppressed religions. Importantly, none of the sites we discover are present on the current largest block list. The list that we develop not only vastly expands the set of sites that current Internet measurement tools can test, but also deepens our understanding of the nature of content that is censored in China. We have released both this new block list and the code for generating it.
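The abstract does not say which natural language techniques are used, so the following is only an illustrative sketch of one plausible approach: score terms on pages already known to be blocked with TF-IDF and reuse the top-scoring terms as search-engine queries. The function names (`tokenize`, `top_terms`, `candidate_queries`) and the whitespace tokenizer are assumptions made for this example, not part of CensorSeeker; real Chinese text would need a proper word segmenter, and the search and censorship-testing steps are omitted entirely.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    # Assumption: whitespace tokenization for illustration only.
    # Chinese text would require a word segmenter instead.
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [t for t in tokens if t]


def top_terms(documents, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results


def candidate_queries(seed_pages, k=3):
    """Collect de-duplicated top terms across seed pages, in first-seen order."""
    queries = []
    seen = set()
    for terms in top_terms(seed_pages, k):
        for term in terms:
            if term not in seen:
                seen.add(term)
                queries.append(term)
    return queries


if __name__ == "__main__":
    # Hypothetical seed texts standing in for pages from an existing block list.
    seeds = [
        "tibet protest news tibet",
        "weather news today",
        "sports news today",
    ]
    print(candidate_queries(seeds, k=2))
```

In a full pipeline, each query would be sent to a search engine and the returned domains tested for blocking; terms that appear on every seed page (such as "news" here) score zero and drop out, which is the point of weighting by inverse document frequency.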


