Automatically Generating a Large, Culture-Specific Blocklist for China

06/04/2018
by Austin Hounsel et al.

Internet censorship measurements rely on lists of websites to be tested, known as "block lists," which are curated by third parties. Unfortunately, many of these lists are not public, and those that are public tend to focus on a small group of topics, leaving other types of sites and services untested. To enlarge and diversify the set of sites on existing block lists, we use natural language processing and search engines to automatically discover a much wider range of websites that are censored in China. Using these techniques, we create a list of 1,125 websites outside the Alexa Top 1,000 that cover Chinese politics, minority human rights organizations, oppressed religions, and more. Importantly, none of the sites we discover are present on the current largest block list. The list that we develop not only vastly expands the set of sites that current Internet measurement tools can test, but also deepens our understanding of the nature of content that is censored in China. We have released both this new block list and the code for generating it.


