"I can't keep it up anymore." The Voat.co dataset

Voat was a news aggregator website that shut down on December 25, 2020. The site had a troubled history and was known for hosting various banned subreddits. This paper presents a dataset with over 2.3M submissions and 16.2M comments posted by 113K users in 7.1K subverses (Voat's equivalent of subreddits). Our dataset covers the whole lifetime of Voat, from its development period starting on November 8, 2013, through its official launch in April 2014, up until the day it shut down (December 25, 2020). This work presents the largest and most complete publicly available Voat dataset, to the best of our knowledge. We also present a preliminary analysis to cover posting activity and daily user and subverse registration on the platform so that researchers interested in our dataset can know what to expect. Our data may prove helpful to false news dissemination studies as we analyze the links users share on the platform, finding that many communities rely on alternative news outlets, like Breitbart and GatewayPundit, for their daily discussions. Last, we perform network analysis on user interactions, finding that many users prefer not to interact with subverses outside their narrative interests, which could be helpful to researchers focusing on polarization and echo chambers. Also, since Voat was one of the platforms many Reddit users migrated to after a ban, we are confident that our dataset will motivate and assist researchers studying deplatforming. In addition, many hateful and conspiratorial communities seem to be very popular on Voat, which makes our work valuable for researchers focusing on toxicity, conspiracy theories, cross-platform studies of social networks, and natural language processing.

1 Introduction

Social networks are a primary tool in today’s society. They offer countless opportunities for people around the world to connect in various ways, find jobs, entertain themselves, catch up on world happenings, etc. At the same time, social networks sometimes offer a “safe-house” for people that want, among other things, to connect to like-minded individuals towards sharing hate and toxicity [almerekhi2020investigating, vice2017toxicreddit], discussing controversial matters [time2021racismtwitter], and spreading misinformation and disinformation [mit2021fbmisinformation].

Mainstream social networks suffer from users and communities that organize these conversations on their platforms. A common “solution” the administrators resort to is to ban these users - deplatforming. The social network that is known to have taken this action many times so far is Reddit, which banned more than subreddits [verge2020redditnans] from its platform - the first one being in 2014 [daily2014firstban]. Research on deplatforming shows that users whose communities were banned regrouped on other platforms and forums and became even more toxic than they used to be [horta2021platform]. Other than forums, users move to social networks that allow controversial discussions. One of the platforms that many banned Reddit communities decided to migrate to was Voat.

Voat was a Reddit-esque social network founded in April 2014 and shut down in December 2020 [verge2020whatisvoat]. Similar to Reddit, discussions on Voat are divided into various channels - subverses - the equivalent of a subreddit. Users can subscribe to as many subverses as they wish but cannot moderate more than ten, which prevents users from gaining undue influence on the platform. Registration of new users on Voat requires only a unique username and a password. Newcomers can upvote, downvote, and comment on existing submissions but cannot create new submissions under subverses until they achieve a certain amount of upvotes on all of their comments.

Since its foundation, Voat gradually gained popularity over the years, especially after every Reddit cleansing [verge2015migration, times2016migration, daily2017migration, fox2018migration]. Overall, Voat is known for hosting banned extreme communities and users, providing a safe space for like-minded individuals to share their ideas “freely.” Voat has attracted the interest of researchers before as it hosted communities like /v/fatpeoplehate, /v/CoonTown, and /v/Nigger [chandrasekharan2017bag], /v/TheRedPill [saleem2017web], /v/GreatAwakening [papasavva2021qoincidence], etc.

Data Release. In this work, we present, to the best of our knowledge, the largest and most complete dataset of Voat. Along with this paper, we release a dataset [zenodo] that consists of over 2.3M submissions and 16.2M comments posted by 113K users in 7.1K subverses over the lifetime of Voat (November 2013 - December 2020). Specifically, our dataset is fourfold:

  • Title, body, and metadata of submissions;

  • content and metadata of comments;

  • user profile data; and

  • subverse profile data.

Relevance. Our dataset provides several opportunities to the research community. First, Voat was evidently the place many banned users and communities moved to after being banned on other platforms [papasavva2020raiders, chandrasekharan2017bag]. To this end, our dataset can assist researchers that focus on deplatforming and user migration. Also, our dataset may help researchers deepen our understanding of how and when these communities choose their new “home” after being removed from their previous one. Second, our dataset covers numerous offline events like the 2016 and 2020 US Presidential Elections and debates, Brexit, Epstein’s arrest, and various terrorist attacks and unrest around the world, and can therefore prove helpful in further analysis of these events. Third, since Voat was a supporter of freedom of expression and online free speech for extreme and hateful communities, it contains a variety of slang language and toxic content that can be useful towards understanding hateful communities.

Paper organization. The rest of the paper is organized as follows. First, we briefly explain what Voat is and how it works in Section 2 before going through its history in Section 3. Then, we describe the process of parsing the data published by the Internet Archive Wayback Machine (IAWM), along with the complementary collection of additional user and subverse data, in Section 4. We then describe the structure of our dataset in Section 5 and provide a statistical analysis of the dataset (Section 6), followed by a review of related work (Section 7). The paper concludes with Section 8.

2 What was Voat?

Voat was a Reddit-esque news aggregator launched in April 2014. It was originally named “WhoaVerse” and renamed as “Voat” in December 2014. The mascot of Voat resembles an angry goat, which was designed and freely offered to the website by a user of the site.

Subverses. Discussions on Voat occur in specific groups of interests under the name of “subverses.” Users could register new subverses on-demand before June 2020, when the administrators disabled this functionality. When a user registers a subverse, they become its owner. The owner of a subverse has complete authority over the subverse: they can deactivate the subverse and appoint other co-owners and moderators. The moderators can delete submissions and comments posted by users and even ban users from posting on the subverse. The owners and moderators can also allow users to post anonymously in their subverse, which replaces the poster username with a random multi-digit number. To prevent users from gaining extreme influence on the platform, Voat limits the number of subverses one can own or moderate [papasavva2021qoincidence].

Users. Voat proclaimed itself a free-speech platform that offered its users anonymity. When newcomers register a new account, Voat does not require any personal details to verify the account, like an email address or phone number. A user only needs to choose a username and a password to register, but if they forget their password, there is no way to recover the account.

After registering a new account, users can subscribe to subverses of interest, comment, upvote, and downvote the comments and submissions but cannot post new submissions. To post a new submission, they first need to acquire ten Comment Contribution Points (CCP). When newcomers post comments on existing submissions, they can try and collect a net score of ten upvotes on all of their comments (one downvote cancels one upvote). The privilege of posting submissions is not guaranteed as users may lose it if their CCP falls below ten. Although this functionality may discourage users from being toxic to each other, it might also prevent users from expressing their honest opinions as others may disagree and downvote them. Voat users often refer to themselves as “goats” due to the platform’s mascot.
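The posting rule above can be sketched as follows. This is a minimal illustration of the rule as described in the text, not Voat's actual implementation; the function names and the representation of votes as (upvotes, downvotes) pairs are assumptions made for the example.

```python
# Threshold stated in the text: ten Comment Contribution Points (CCP).
CCP_THRESHOLD = 10


def comment_contribution_points(comment_votes):
    """Net score over all of a user's comments.

    comment_votes is an iterable of (upvotes, downvotes) pairs, one per
    comment (a hypothetical representation). One downvote cancels one upvote.
    """
    return sum(up - down for up, down in comment_votes)


def can_post_submissions(comment_votes):
    """A user may post submissions only while their CCP is at least ten."""
    return comment_contribution_points(comment_votes) >= CCP_THRESHOLD
```

Because the check is re-evaluated on the current net score, a user who is downvoted below ten loses the privilege again, which matches the behavior described above.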

Submissions and voting system. Voat was a news aggregator platform, and hence users create a new submission by posting a title and a description accompanied by a link to a news source, although it is also possible to post without a link. If the poster provides a link, the submission’s title becomes a hyperlink to the source website. The domain of the source website appears next to the submission’s title, along with the poster’s username and the date and time the submission was posted.

Similar to Reddit, Voat offers a hierarchical, tree-like commenting system: other users can comment on the submission and the comments of other users. As mentioned before, users can upvote or downvote the submission or other users’ comments. In contrast with Reddit, Voat displays the total number of upvotes and downvotes a submission or a comment received. Also, the downvote functionality on Voat is not the same as Reddit’s. Downvoted submissions and comments on Voat alert the moderators of spammy or illegal content so they can take action. This functionality reinforces echo chambers: users usually downvote content that does not align with their beliefs, which usually results in the author of the downvoted post either losing their submission posting privilege or even being banned from the subverse.

Content visibility. Voat attempted to provide its users with some ephemerality without deleting its content, but hiding it instead. Voat subverses filter submissions under three tabs, namely, hot, new, and top. Each subverse has 500 active submissions in 20 pages (0 to 19). Hot submissions are the ones that are currently active and discussed, new submissions are the ones that were posted most recently, and top submissions are the most popular submissions of the subverse (many comments). Many subverses disabled the functionality of these tabs, and the submissions shown across all three tabs are the same, just in a different order.

When a user creates a new submission on a subverse, it would typically appear first on the new tab on page 0. At the same time, the last submission of page 19 is archived but not deleted. If one knows the direct link to that submission, they can still reach it but cannot comment or vote on it.

Voat API. Voat supported a JSON API service for some time, but its maintenance stopped in October 2020. To collect the submissions of a subverse, one had to request the API for a specific page number (0 to 19) of a subverse’s tab. The response of the API would be the 25 submissions of that page without their comments. To collect the comments, one needs to request them using the submission ID number, to which the API responds with 25 comments at a time.

Thus, to collect all the submissions from a subverse, one needs to request all 20 pages for the three tabs separately from the API. As explained by [papasavva2021qoincidence], the API does not list the archived submissions and does not respond to requests where the page is above 19.

However, if one knows the submission ID and the subverse it was posted in, they can request the API for that specific submission. Since submission IDs on Voat are incremental, one could theoretically collect all of Voat’s submissions by requesting the API for each submission ID incrementally for more than 7.5K subverses; that is 7.5K requests, in the worst case, to collect a single submission. To the best of our knowledge, no study or work managed to collect the full Voat dataset.
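The request counts implied by this description can be worked out with a short sketch. The constants come from the text (three tabs, pages 0 to 19, 25 items per response); the function names are hypothetical, and no real API endpoint is assumed or called.

```python
import math

TABS = ("hot", "new", "top")  # the three tabs described in the text
PAGES_PER_TAB = 20            # pages 0 to 19
ITEMS_PER_RESPONSE = 25       # submissions per page, or comments per response


def listing_requests_per_subverse():
    """Requests needed to walk every page of every tab of one subverse."""
    return len(TABS) * PAGES_PER_TAB


def visible_submissions_per_tab():
    """Upper bound on submissions the API lists per tab (the rest are archived)."""
    return PAGES_PER_TAB * ITEMS_PER_RESPONSE


def comment_requests(n_comments):
    """Requests needed to page through all comments of one submission."""
    return math.ceil(n_comments / ITEMS_PER_RESPONSE)


def worst_case_id_probes(n_submission_ids, n_subverses):
    """Brute-force enumeration: each ID may have to be tried against every subverse."""
    return n_submission_ids * n_subverses
```

The last function makes the cost of ID enumeration explicit: the number of requests grows with the product of the ID range and the number of subverses, which is why this route is impractical for the whole platform.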

SearchVoat. A website named searchvoat.co used to collect the Voat submissions and comments (https://searchvoat.co/search.php). This site is not associated with Voat, but one can browse and find Voat submissions there. The site does not support an API and does not allow web scraping. After Voat shut down, the website transformed into a news aggregator, similar to Voat (https://searchvoat.co/forum/).

3 Voat’s Troubled History

In this section, we present Voat’s history as we believe it highlights the significance of our dataset. WhoaVerse was the original name of the website, which was founded in April 2014. The website was a hobby project of Atif Colo (Voat username @Atko). Justin Chastain (Voat username @PuttItOut) later joined Colo. The owners advertised the website as Reddit’s alternative with a focus on freedom of expression and speech, which satisfies its users’ needs and wants. In December 2014, WhoaVerse changed its name to Voat and adopted an angry goat as its mascot.

In June 2015, after Reddit banned various hateful subreddits [verge2015migration], including /r/nigger and /r/fatpeoplehate, many Reddit users started registering accounts on Voat. The sudden influx of users overloaded the site, causing a lot of temporary downtime [voat2015influx].

On June 19, 2015, Voat’s web hosting service, Host Europe (https://www.hosteurope.de/en/), canceled Voat’s contract, claiming that the site was publicizing abusive, insulting, youth-endangering content, along with illegal right-wing extremist content [host2015ban]. Some days later, PayPal froze Voat’s payment processing services [paypal2015ban]. In response, Voat shut down four subverses, two of which hosted sexualized images of minors, and the founders attributed the shutdown to political correctness [jailbait]. The site moved to a different hosting provider and started accepting cryptocurrency donations.

In July 2015, Reddit banned a popular administrator, which caused another influx of Reddit members registering with Voat, leading to more downtime. In an interview, Colo said that they “provide an alternative platform where users would not be censored and still say whatever they want” [interview2015colo]. Voat was the target of DDoS attacks many times and experienced numerous periods of downtime during its six years of operation. The most significant attack was in July 2015 [ddos2015]. Voat, Inc. became a registered corporation in the U.S. in August 2015. Although Voat was based in Switzerland, the U.S. seemed like the best option since “U.S. law with regards to free speech, by far beats every other candidate country we’ve researched,” as explained by Colo in a post.

In November 2016, more users relocated to Voat after Reddit banned the /r/pizzagate conspiracy theory subreddit [times2016migration]. In January 2017, Colo resigned as CEO of Voat due to time availability restrictions and was replaced by Chastain. Chastain ran a fundraising campaign in May 2017 after announcing that Voat might have to shut down due to financial issues; Voat managed to stay online.

In November 2017, Reddit banned its incel community (/r/incel), and many of its followers reportedly moved to Voat [daily2017migration]. About a year later, on September 12, 2018, Reddit banned numerous subreddits dedicated to the QAnon conspiracy theory, which again caused many QAnon adherents to migrate to Voat [papasavva2021qoincidence].

In April 2019, Voat’s CEO Chastain asked Voat users to stop threatening people as he had been contacted by a “US agency” about the threats posted on the website (https://searchvoat.co/v/Voat/3178819). Voat users were not pleased to hear that Voat was working with agencies to remove Voat content and “limiting” the site’s free speech and freedom of expression. Specifically, the first comment on the submission was an anti-Semitic slur, calling for the extermination of Jews [agency2019].

Last, on December 22, 2020, Voat announced again, now for the last time, that it would shut down due to lack of funding (https://searchvoat.co/v/announcements/4169936). Chastain explained that he had been funding the site himself since March 2020 but had run out of money. On December 25, 2020, Voat shut down and its last submission was posted by Chastain, explaining “@Atko made the first post to Voat, so I am making the last” (https://searchvoat.co/v/Voat/4174956).

In Table 1, we list some aforementioned Reddit bans that probably affected Voat’s activity. Some of these bans previously captured researchers’ interest. We use these bans in our analysis in Section 6 to show whether Voat’s activity was indeed affected.

| No | Date | Ban |
|---|---|---|
| 1 | Jun 9, 2014 | /r/beatingwomen [fappeningbans] |
| 2 | Sep 6, 2014 | /r/TheFappening [fappeningbans] |
| 3 | May 7, 2015 | /r/nigger [chandrasekharan2017bag] |
| 4 | Jun 6, 2015 | /r/fatpeoplehate [chandrasekharan2017bag] |
| 5 | Nov 23, 2016 | /r/pizzagate [times2016migration] |
| 6 | Nov 7, 2017 | /r/incel [daily2017migration] |
| 7 | Mar 15, 2018 | /r/CBTS_Stream [fox2018migration] |
| 8 | Sep 12, 2018 | /r/GreatAwakening [papasavva2020raiders] |

Table 1: Reddit bans that reportedly affected Voat’s activity

| | Count | # Users | # Subverses |
|---|---|---|---|
| Submissions | 2,334,817 | 80,063 | 7,616 |
| Comments | 15,731,754 | 153,827 | 7,515 |
| Subverses | 7,094 | | |
| Users | 108,451 | | |

Table 2: Number of submissions, comments, user profiles, and subverse profiles in the IAWM dataset.

| | Submissions | Comments | Users | Subverses |
|---|---|---|---|---|
| Total | 2,380,262 | 16,263,309 | 113,431 | 7,095 |

Table 3: Final number of submissions, comments, user profiles, and subverse profiles in the dataset.

4 Data Parsing and Data Collection

This section details the methodology and tools employed for our data collection infrastructure.

Submissions and Comments. Following Voat’s shutdown on December 25, 2020, the Internet Archive Wayback Machine (IAWM) released all of Voat’s snapshot captures in Web ARChive (WARC) format [archive]. These WARC captures include all the snapshots the IAWM captured over the lifetime of Voat. A WARC format file consists of single or multiple WARC records (snapshots), and it supports, among other things, the access and scraping of archived data. The files also hold revised and duplicated snapshots [warc].

To parse these snapshots into structured data, we download them and set up a Python parser to collect the submissions and comments. In our case, every WARC file is a collection of various Voat snapshots the IAWM captured. To facilitate the smooth parsing of the WARC files, we use the warcio Python library (https://pypi.org/project/warcio/). This library offers a convenient and reliable way to read a WARC file by streaming every entry included in the file and automatically detecting the payload. The payload contains the capture itself, i.e., the HTML DOM tree code of the platform. Each WARC file includes the snapshot of the entire platform for a specific time and date, that is, millions of submission pages for thousands of submissions.
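A minimal sketch of this streaming step is shown below, assuming warcio's `ArchiveIterator` interface. The helper names `html_payloads` and `stream_warc`, and the choice to keep only `response` records, are illustrative assumptions rather than the parser used for this paper.

```python
def html_payloads(records):
    """Yield (url, capture_time, payload_bytes) for every HTTP response record.

    `records` is any iterable of warcio-style records, i.e. objects exposing
    rec_type, rec_headers.get_header(...), and content_stream().
    """
    for record in records:
        if record.rec_type != "response":
            continue  # skip request, metadata, and revisit records
        url = record.rec_headers.get_header("WARC-Target-URI")
        captured_at = record.rec_headers.get_header("WARC-Date")
        # content_stream() returns the decoded HTTP body, here the page HTML
        yield url, captured_at, record.content_stream().read()


def stream_warc(path):
    """Stream one WARC file from disk without loading it fully into memory."""
    # Third-party dependency, imported lazily: pip install warcio
    from warcio.archiveiterator import ArchiveIterator

    with open(path, "rb") as stream:
        yield from html_payloads(ArchiveIterator(stream))
```

Each payload must be read inside the loop, since warcio streams records and a record's content is no longer available once the iterator advances.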

Our parser captures the HTML DOM tree code of all the pages included in all the WARC files serially. Then, it passes the HTML DOM tree code to a function that uses the beautifulsoup Python library (https://pypi.org/project/beautifulsoup4/) to read and store in a JSON format the data and metadata of the submissions and comments, i.e., submission title and content, number of upvotes and downvotes, comments, etc. We ensure that our parser only stores the latest submission version, as WARC files have duplicate data.
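The deduplication step can be sketched as follows. The field name `captured_at` is a hypothetical addition for the example (it is not part of the released dataset), and it is assumed to hold the UTC capture time as an ISO 8601 string such as the WARC-Date header, so that string comparison matches chronological order.

```python
def keep_latest(parsed_records, key="submission_id"):
    """Keep only the most recently captured version of each parsed item.

    parsed_records: iterable of dicts, each with the `key` field and a
    hypothetical 'captured_at' field (ISO 8601 UTC capture time).
    """
    latest = {}
    for record in parsed_records:
        item_id = record[key]
        current = latest.get(item_id)
        # Same-format ISO 8601 UTC strings sort chronologically.
        if current is None or record["captured_at"] > current["captured_at"]:
            latest[item_id] = record
    return list(latest.values())
```

The same function would apply to comments by passing `key="comment_id"`.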

User and subverse profiles. To complement our dataset, we also collected user and subverse profiles. A user profile consists of user-related data like username and registration date, whereas a subverse profile consists of subverse creation date, description, and subscriber count. To collect this data, we built a crawler using the IAWM API (via waybackpy, https://pypi.org/project/waybackpy/), along with beautifulsoup and the requests-html library (https://pypi.org/project/requests-html/).

Every user and subverse profile URL is unique, but they all start the same way: voat.co/u for the former and voat.co/v for the latter. First, we request from the IAWM API all the snapshots whose URLs start with the user or subverse URL prefix. We then collect the responses and parse them into JSON format, storing the latest snapshot the IAWM has in its database for every unique username and subverse profile URL.
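The prefix lookup and the "latest snapshot per URL" selection can be sketched as below. This example queries the public Wayback CDX endpoint format directly instead of going through waybackpy, which is a substitution made for illustration; the function names are hypothetical, and the code only builds the query URL and processes rows without performing any network request.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_prefix_query(prefix):
    """Build a CDX query URL listing all captures whose URL starts with `prefix`."""
    params = {"url": prefix, "matchType": "prefix", "output": "json"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def latest_snapshot_per_url(rows):
    """Map each original URL to its most recent capture timestamp.

    `rows` is the decoded JSON response: a header row followed by data rows.
    CDX timestamps are 14-digit strings (YYYYMMDDhhmmss), so string
    comparison matches chronological order.
    """
    if not rows:
        return {}
    header, *data = rows
    ts_col = header.index("timestamp")
    url_col = header.index("original")
    latest = {}
    for row in data:
        url, timestamp = row[url_col], row[ts_col]
        if url not in latest or timestamp > latest[url]:
            latest[url] = timestamp
    return latest
```

In practice the same profile can be archived under several URL variants (for example, http and https), so a real crawler would likely normalize URLs before grouping them.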

The above process resulted in the dataset summarized in Table 2. We collect a dataset that consists of more than 2.3M submissions posted by about 80K users in 7.6K subverses and over 15.7M comments posted by almost 154K users. Note that the IAWM does not have the profiles of about 500 subverses, and hence we only manage to collect the profiles of 7,094 subverses (a loss of roughly 7% relative to the 7,616 subverses that have submissions). In addition, we collect 108,451 unique user profiles.

Data collected via Voat API. In an attempt to complete our dataset, we incorporate into it the data that was collected for the [papasavva2021qoincidence] study. For that study we collected submissions and comments posted from users in subverses. For our data collection infrastructure, we used Voat’s API between May 2020 and October 2020, when Voat stopped the maintenance of its API. We find submissions and comments that were missing from the IAWM archive and incorporate them in the released dataset.

Some subverses on Voat offered anonymity to their users by replacing their username with a random eight-digit number (not a unique number for every user). The total number of users that commented or posted a submission (Table 2) does not include anonymous or deleted users. Hence, we assume that Voat’s known user base is users at least, based on the data we collect from the IAWM. It is impossible to know the exact Voat user base since Voat never shared the complete list of user profiles, even when it supported a data API service; to collect a user’s profile, one needs to know the username. This means that we cannot acquire user profile data of “stalkers.” Alas, assuming the total known number of usernames is , we estimate that about of the total users’ profile data () is either missing, or deleted users. However, [papasavva2021qoincidence] show that of the users being active in QAnon discussions consisted of deleted profiles. Considering that many usernames were deleted every day on Voat, we estimate that this dataset offers the best representation of Voat’s user base to date. Incorporating [papasavva2021qoincidence] user data with ours, we find additional user profiles and subverse. The final dataset presented and released with this work is detailed in Table 3.

FAIR Principles. The data released and presented in this paper aligns with the FAIR guiding principles for scientific data (https://www.go-fair.org/fair-principles/), as described below:

  • Findable: We assign a unique constant digital object identifier (DOI) to our dataset [zenodo]: 10.5281/zenodo.5841668.

  • Accessible: Our dataset is openly accessible.

  • Interoperable: The dataset is stored using the standard JSON format that is widely used for storing data and can be used in various programming languages. We also provide a detailed description of our dataset’s format in Section 5.

  • Reusable: We provide all the available metadata along with our dataset and we extensively document them in this paper, in Section 5.

Ethical Considerations. The data collected, presented, and released with this paper are available on the Wayback Machine and also used to be accessible (without the need of a registered account) on Voat before it went down. The collection and release of this dataset do not violate Voat’s or Wayback Machine’s Terms of Service. Although some subverses on Voat allowed users to post anonymously, the overwhelming majority did not offer this functionality. Hence, we detect and collect user profile data of users. The only identification of these user profiles is the unique pseudonym, which is not personally identifiable information. Cross-referencing the activity generated on Voat with other services could potentially be used to de-anonymize users. We note that we followed standard ethical guidelines [rivers2014ethical] and made no attempt to de-anonymize users.

5 Data Description

We now present the structure of our dataset, available at [zenodo].

Our dataset consists of four parts, namely, submission, comment, user profile, and subverse profile data. We release our data in various newline-delimited JSON files (.ndjson, http://ndjson.org/). Each line in a .ndjson file consists of a JSON object that holds various keys and values. Specifically, we release 7,616 .ndjson files, one for every subverse, that hold the submission data. Similarly, we release 7,515 .ndjson files that hold comment data. We inspect our dataset for the 101 subverses without a comment file and find that these subverses had no comment activity, only a small number of submissions. Also, a single .ndjson file is released for user profile data, and another one for subverse profile data. In total, we release 15,133 .ndjson files. Table 4 lists the keys, value data type, and description of our dataset files.

We choose to release the submission and comment data separately for every subverse as we believe it helps researchers that want to focus on specific communities. We also use JSON to release our dataset as it is a well-suited format for storing and sharing data: it has extensive documentation and is supported by all popular programming languages (https://www.loc.gov/preservation/digital/formats/fdd/fdd000381.shtml).

| Key | Value data type | Description |
|---|---|---|
| subverse_name_submissions.ndjson (7,616 files) | | |
| title | string | Title of the submission |
| body | string | The text posted along with the submission |
| user | string | Username of the submission creator |
| time | string | Time the submission was posted |
| date | string | Date the submission was posted |
| upvotes | integer | Number of upvotes |
| downvotes | integer | Number of downvotes |
| domain | string | Domain the submission links to |
| link | string | Unique submission URL |
| submission_id | integer | Unique submission id |
| subverse | string | Full name of the subverse |
| subverse_name_comments.ndjson (7,515 files) | | |
| body | string | Comment content |
| user | string | Username of the comment creator |
| time | string | Time the comment was posted |
| date | string | Date the comment was posted |
| upvotes | integer | Number of upvotes |
| downvotes | integer | Number of downvotes |
| comment_id | integer | Unique comment id |
| depth | integer | Tree depth level of the comment |
| subverse | string | Full name of the subverse |
| root_submission | integer | Submission id the comment belongs to |
| user_profiles.ndjson (1 file) | | |
| user | string | Username |
| reg_date | string | Registration date |
| moderates | list of strings | Subverses the user moderated |
| owns | list of strings | Subverses the user created |
| subverse_profiles.ndjson (1 file) | | |
| subverse | string | The full name of the subverse |
| subscriber_count | integer | Number of subscribers |
| about | string | Description of the subverse |
| date_created | string | Date the subverse was created |

Table 4: Description of the dataset files keys and data value types.
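As a minimal sketch of how the released files might be consumed, the following reads one .ndjson file line by line and counts the most frequently linked domains using the `domain` key from Table 4. The helper names and the example file name are hypothetical; only the Python standard library is used.

```python
import json
from collections import Counter


def read_ndjson(path):
    """Yield one JSON object per non-empty line of a newline-delimited JSON file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def top_domains(path, n=10):
    """Most frequently linked domains in one subverse's submission file."""
    counts = Counter(
        record["domain"] for record in read_ndjson(path) if record.get("domain")
    )
    return counts.most_common(n)
```

For example, `top_domains("news_submissions.ndjson")` would list the ten most shared domains for a subverse file following the naming pattern in Table 4, assuming such a file exists locally. Reading lazily keeps memory use low for the larger comment files.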

6 Data Analysis

In this section, we provide some statistical analysis and visualization of our dataset.

Figure 1: Number of all submissions and comments per day on Voat. Note log scale on y-axis.
(a) Submissions
(b) Comments
Figure 2: Seven day average number of a) submissions and b) comments per day on the top 10 most subscribed subverses on Voat. Note log scale on y-axis.

Posting Activity. First, we show the overall posting activity on Voat. Figure 1 shows the number of submissions and comments per day on the platform. The vertical red dotted lines represent the events listed in Table 1. Although the platform was officially launched in April 2014, the first-ever submission was posted by @Atko on November 8, 2013, on the /v/voatdev subverse. This subverse was discussing the development of Voat, and at the time, only seven users were posting on the platform.

The total number of submissions in 2013 is only . These submissions primarily include discussions of @Atko and @PuttItOut in the /v/voatdev subverse. When the platform was launched in 2014, the total number of submissions peaks to , then in 2015, in 2016, in 2017, in 2018, in 2019, and for the last year, 2020, submissions. Overall, there was no significant increase in activity on the platform after 2016. The most active day on the site is July 10, 2015, with submissions. Manual inspection of our dataset indicates that discussions on that day focused on Donald Trump, vaccine legislation, Reddit’s CEO Ellen Pao resigning, and other world happenings.

The date with the most submissions on the site is very close to the day Reddit banned many communities like /r/fatpeoplehate and /r/nigger [verge2015migration]. Shortly after Reddit banned these communities, Voat experienced heavy traffic and downtime [voatdown2015]. Regarding comment activity, only comments were posted in 2013, in 2014, in 2015, in 2016, in 2017, in 2018, in 2019, and in 2020. Again, the date with the most comments on the platform is July 10, 2015, with comments.

In addition, we show the overall activity on Voat in the top ten most subscribed subverses, namely, /v/AskVoat, /v/GreatAwakening, /v/QRV, /v/fatpeoplehate, /v/funny, /v/news, /v/politics, /v/theawakening, /v/videos, and /v/whatever, in Figure 2. We present this analysis to show how active the most popular subverses on Voat were since we believe that researchers interested in our dataset might consider these findings useful. The vertical red dotted lines on the figure indicate the bans listed in Table 1. When the crowd of Reddit refugees joined Voat (bans no. 1, 3, and 4 from Table 1), many general discussion subverses like /v/AskVoat, /v/news, /v/politics, /v/videos, /v/funny, and /v/whatever became more active, indicating that this new influx of users bolstered the overall activity on the platform.
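A seven-day average of daily counts, as plotted in Figure 2, could be computed along these lines. This sketch assumes a trailing window (the current day plus the six preceding days) and treats days without posts as zero; the paper does not state which window alignment was used, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(post_dates):
    """Count posts per calendar day; post_dates is an iterable of datetime.date."""
    return Counter(post_dates)


def trailing_average(counts, start, end, window=7):
    """Trailing mean of daily counts for every day from start to end (inclusive).

    Days missing from `counts` contribute zero, so quiet periods pull the
    average down instead of being skipped.
    """
    averages = {}
    day = start
    while day <= end:
        total = sum(counts.get(day - timedelta(days=k), 0) for k in range(window))
        averages[day] = total / window
        day += timedelta(days=1)
    return averages
```

The `date` strings in the released files would first need to be parsed into `datetime.date` objects; their exact format should be checked against the data before choosing a parser.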

Interestingly, not all banned subreddits appeared on Voat as subverses shortly after a Reddit ban frenzy. The subverse /v/GreatAwakening was created on January 1, 2018, nine months before Reddit banned QAnon subreddits (ban no. 8). This subverse was the 10th most popular subverse when Voat shut down. QAnon discussion on the platform boomed when /v/theawakening and /v/QRV first appeared on Voat on September 12 and September 22, 2018, respectively, with approximately 200 submissions per day on /v/QRV alone. These three subverses turned out to be among the top 5 most active subverses on the platform, with /v/QRV becoming the most active subverse across Voat in both daily submissions and comments within only ten days of the Reddit ban [awakening2018ban].

The figures discussed in this subsection support the reports that Voat was among the main hubs for migrating Reddit communities. In addition, Figure 2 shows that other than general discussion subverses, the most subscribed subverses focused on hate speech (/v/fatpeoplehate) and conspiracy theories (/v/QRV, /v/theawakening, /v/GreatAwakening).

Submission Engagement. We now discuss the engagement of the users on the platform. In Figure 3 we plot the Cumulative Distribution Functions (CDF) of the number of comments, upvotes, downvotes, and net votes (upvotes minus downvotes) per submission.
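As an illustration of how such distributions can be computed, the following is a minimal sketch of an empirical CDF over per-submission counts. The function name `ecdf` and the sample numbers are hypothetical and are not taken from the dataset; the only detail drawn from the text is that net votes are upvotes minus downvotes.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


# Hypothetical per-submission counts (illustrative only, not real data).
upvotes = [0, 1, 1, 3, 10]
downvotes = [0, 0, 1, 1, 2]

# Net votes are upvotes minus downvotes, as defined in the text.
net_votes = [u - d for u, d in zip(upvotes, downvotes)]

upvote_cdf = ecdf(upvotes)
net_cdf = ecdf(net_votes)
```

Evaluating the returned function on a grid of thresholds yields the points of a CDF curve like the ones in Figure 3.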

Submissions on Voat get a median number of comments, upvotes, downvotes, and a net score of . Comments receive a median and upvotes and downvotes respectively. The most upvoted submission reached over upvotes, posted by Atko in /v/announcements in July 2015, explaining that Voat is experiencing heavy traffic because of Reddit bans and they are working on fixing the issues. The most downvoted submission ( downvotes) was posted in /v/politics with the title “Dear Media: Please Stop Normalizing The Alt-Right.” The most liked comment noted that “someone isn’t happy that Voat is succeeding” and reached upvotes on a submission posted by Atko that was discussing the DDoS attacks Voat was experiencing in July 2015. Last, the most disliked comment received downvotes from a user that was asking @PuttItOut to reconsider the voting system of the site since they lost their submission posting privileges because of people downvoting them when posting their honest opinion. The user asks the CEOs:

[…]ask yourself: Are you fine with a website that caters to some of the most dangerous people currently walking the planet? Take a look at how depraved Trump supporters are, and ask yourself if free speech is worth the cost:[…]

Figure 3: CDF of the number of comments, upvotes, downvotes, and net votes per submission.
Figure 4: Number of users and subverses registered per day. Note log on y-axis.

User registration and Subverse creation. In Figure 4 we plot the number of daily user and subverse registrations on Voat. The vertical dotted lines mark the bans listed in Table 1.

The first Reddit ban that seems to have influenced Voat's user base is the one of /r/beatingwomen, on June 9, 2014 [beatingwomen] (ban no. 1). Eleven days after the ban, on June 20, 145 new subverses were created on Voat, the most ever created on the platform in a single day. On June 22, there were 112 new subverses.

Moving on to 2015, we find that July 7 is the date with the most users ever registered on Voat in a single day, registrations, followed by July 5 with registrations. These dates are close to the date Reddit banned various hate subreddits like /r/nigger and /r/fatpeoplehate (ban no. 3 and 4). Also, during the summer of 2015, Reddit changed their free speech and content policy [redditpolicy] and the founder noted that “Reddit was not created to be a bastion of free speech.” July 12 and 13 were two of the five days with the highest number of new subverses created, 125 and 112, respectively. The fifth top date with the most user registrations on Voat is September 13, 2018, with users, probably due to Reddit banning QAnon focused subreddits (ban no. 8).

This analysis provides a glance at Voat’s user base and subverse changes over the years. It is apparent that Reddit influenced Voat activity and that the platform was among the preferred Reddit alternatives for banned users.
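Daily registration counts like those plotted in Figure 4 can be obtained by bucketing registration timestamps by calendar day. The sketch below assumes timestamps are available as ISO 8601 strings; that format, the helper name `daily_counts`, and the sample values are assumptions made for illustration and do not describe the released files.

```python
from collections import Counter
from datetime import datetime


def daily_counts(iso_timestamps):
    """Count events per calendar day from ISO 8601 timestamp strings."""
    days = (datetime.fromisoformat(ts).date() for ts in iso_timestamps)
    return Counter(days)


# Hypothetical registration timestamps (illustrative only).
registrations = [
    "2015-07-05T09:00:00",
    "2015-07-07T12:30:00",
    "2015-07-07T23:59:59",
]

counts = daily_counts(registrations)

# The day with the most registrations and its count.
busiest_day, busiest_count = counts.most_common(1)[0]
```

The same helper applies to subverse creation dates, so both curves in a figure of this kind can come from one routine.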

Links. Since Voat is a news aggregator platform, we also analyze the domains the users posted on the site to show what kind of content the userbase of Voat consumed.

For each submission that redirects users to other domains, we retrieve the name of the subverse the submission is posted in and the external link it redirects to. We count how many times a domain is shared in a community, keeping only the subverse and domain pairs that are the most recurrent in the dataset. The results of this analysis are displayed in Figure 5, an alluvial diagram, where the line thickness represents the number of times the domain was shared on the subverse it points to.
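The counting step described above can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the input shape (pairs of subverse name and URL), the helper names `domain_of` and `top_pairs`, the choice to strip a leading `www.`, and the sample links are all assumptions.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Extract the host from a URL, dropping a leading 'www.' (an assumption)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def top_pairs(submissions, k):
    """Return the k most frequent (subverse, domain) pairs with their counts."""
    counts = Counter()
    for subverse, url in submissions:
        domain = domain_of(url)
        if domain:  # skip submissions without an external link
            counts[(subverse, domain)] += 1
    return counts.most_common(k)


# Hypothetical (subverse, url) pairs (illustrative only).
submissions = [
    ("news", "https://www.youtube.com/watch?v=a"),
    ("news", "https://youtube.com/watch?v=b"),
    ("QRV", "https://twitter.com/x/status/1"),
    ("news", "https://archive.is/abc"),
    ("whatever", ""),  # a text-only submission
]

most_recurrent = top_pairs(submissions, 1)
```

Each retained pair corresponds to one band in an alluvial diagram, with the count giving the band thickness.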

Most of the links that redirect users to Reddit were posted in /v/MeanwhileOnReddit. The subverse focusing on body-shaming, /v/fatpeoplehate, redirected users to Instagram, YouTube, and image sharing services; websites where users can upload images and share the link to that image on other platforms. The /v/news subverse linked YouTube, Voat, online press outlets, and archiving services links. It is known that users in alt-right social networks avoid sharing the direct link to a website and prefer an archive link instead to avoid monetizing the website [zannettou2018understanding]. The majority of the alternative news links (Breitbart, GatewayPundit, and Zero Hedge) are posted on /v/news and /v/WorldToday. Most of the Twitter links on the website were posted in /v/QRV and /v/GreatAwakening. Most of the tweets include Donald Trump’s tweets and other political discussions on Twitter.

Overall, Voat users shared links to other social networks like 4chan, Twitter, and Instagram. News on the website was shared via legitimate online press outlets and other alternative news outlets, along with archiving services links. Most of the images on the platform were shared on /v/funny, /v/fatpeoplehate, and /v/whatever.

Figure 5: Voat’s shared domains ecosystem.

Content Creators. We now take a deep look into Voat’s user ecosystem. We attempt to show how users form clusters based on the subverses they most often engaged with (posted a submission or a comment) to show whether the userbase of Voat is homogeneous or not. Further analysis on Voat’s user base may shed light on what content users prefer to see on Voat and whether all of Voat’s subverses focused on hateful and politically incorrect content.

As shown in [papasavva2021qoincidence, amin2021hatemail], some users are responsible for a large amount of the content shared in some communities, leading to imbalances that influence the content users consume on the platform. By analyzing each user's interactions on Voat, we hope to observe whether these various communities blended after a mass migration from Reddit, or whether Voat was nothing more than an aggregate of small, selective echo chambers. This way, researchers interested in our paper can know what to expect.

In Figure 6 we plot a graph network where nodes represent users, and the edges symbolize their interactions. For example, users are linked together if they participated in the same conversation, i.e., they both commented on the same submission, or one of them is the submitter while the other one commented. The weight of the edge is given by the number of interactions shared by the same two users, and the color represents the subverse where the user participated the most.
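A minimal sketch of this graph construction is given below. It assumes one plausible reading of the edge weight, namely the number of submissions in which both users took part as submitter or commenter; the paper does not spell out the exact counting rule. The function names and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def interaction_edges(participation):
    """Build weighted user-user edges from (submission_id, user) records.

    Two users are linked when they took part in the same submission,
    either as submitter or as commenter. The weight counts the number
    of submissions they share (an assumed interpretation).
    """
    users_by_submission = defaultdict(set)
    for submission_id, user in participation:
        users_by_submission[submission_id].add(user)

    weights = Counter()
    for users in users_by_submission.values():
        # Sorting makes (a, b) a canonical key for the undirected edge.
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return weights


def dominant_subverse(activity):
    """Map each user to the subverse where they posted most often."""
    per_user = defaultdict(Counter)
    for user, subverse in activity:
        per_user[user][subverse] += 1
    return {user: c.most_common(1)[0][0] for user, c in per_user.items()}


# Hypothetical records (illustrative only).
participation = [
    ("s1", "alice"), ("s1", "bob"), ("s1", "carol"),
    ("s2", "alice"), ("s2", "bob"),
]
activity = [
    ("alice", "news"), ("alice", "news"), ("alice", "QRV"),
    ("bob", "QRV"),
]

edges = interaction_edges(participation)
colors = dominant_subverse(activity)
```

The edge dictionary and the per-user subverse label are the two inputs a graph layout tool would need to draw a network like the one described.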

Figure 6: User and subverse interaction ecosystem on Voat.

The network is composed of a giant cluster, where most of the subverses are mixed together. This cluster includes /v/politics, /v/news, and /v/whatever, which makes sense since these are general discussion subverses, and it is expected that many users meet there for general discussion. However, some of the subverses are strongly isolated in the network. For example, the /v/NeoFAG (yellow) community shows that most users tend to only engage within that subverse. Similarly, /v/GreatAwakening (red) and /v/theawakening (dark blue) seem to be clustered together and somewhat interacting with /v/pizzagate (dark green). Some users that engage with these three subverses also engage in the general discussion subverses, which is aligned with the findings of [papasavva2021qoincidence]. Last, /v/fatpeoplehate (brown) users also seem to form their own cluster while infiltrating the general discussion subverses.

To measure the homophily of these communities, we used the homophily index, which is a metric that indicates how strongly members of a network favor in-group interactions over out-group ones. Given a specific node with E external edges, i.e., edges with nodes from the out-group, and I internal edges, i.e., edges with nodes from the in-group, the homophily index is given by the equation $\mathrm{EI} = \frac{E - I}{E + I}$.

An index of $1$ indicates that the node only interacts with members of the out-group, whereas $-1$ applies to nodes that only interact within their in-group. Table 5 lists the average homophily index of the members of the subverses highlighted in the legend of Figure 6.

Subverse EI-Homophily Index
politics 0.50
news 0.40
whatever 0.23
theawakening 0.01
GreatAwakening -0.25
pizzagate -0.49
fatpeoplehate -0.61
NeoFAG -0.74
Table 5: Average homophily index between subverses and members.
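To make the index concrete, the sketch below computes a per-node score assuming the standard E-I formulation, (E - I) / (E + I), which matches the sign convention of Table 5 (positive values lean towards the out-group, negative values towards the in-group). Edges are counted unweighted here, which is an assumption; the function names and the toy graph are hypothetical.

```python
from collections import Counter


def ei_index(external, internal):
    """E-I index of a node: (E - I) / (E + I), ranging from -1 to 1."""
    total = external + internal
    if total == 0:
        raise ValueError("node has no edges")
    return (external - internal) / total


def node_ei_scores(edges, group_of):
    """Compute the E-I index for every node that appears in an edge.

    edges: iterable of (u, v) undirected edges.
    group_of: dict mapping each node to its group (e.g., a subverse).
    """
    external = Counter()
    internal = Counter()
    for u, v in edges:
        same_group = group_of[u] == group_of[v]
        for node in (u, v):
            if same_group:
                internal[node] += 1
            else:
                external[node] += 1
    nodes = set(external) | set(internal)
    return {n: ei_index(external[n], internal[n]) for n in nodes}


# Toy graph (illustrative only): a and b share a group, c does not.
group_of = {"a": "X", "b": "X", "c": "Y"}
edges = [("a", "b"), ("a", "c")]

scores = node_ei_scores(edges, group_of)
```

Averaging the scores of all nodes assigned to one group gives a per-group value comparable to the entries of a table such as Table 5.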

Users who are very active on popular subverses such as /v/politics and /v/news have a high average homophily index, meaning they mostly interact with users from the out-group. The opposite can be said for subverses like /v/theawakening, /v/GreatAwakening, /v/pizzagate, and especially, /v/NeoFAG and /v/fatpeoplehate. These communities do not converse a lot outside of their social group. The index is almost zero for /v/theawakening, meaning users from this community interact as much with the out-group as with the in-group. By looking at the community, this can be explained by the fact that users from the communities gravitating around the QAnon narrative, i.e., /v/theawakening, and /v/GreatAwakening, are more connected than other communities. As a result, the external edges can be nothing more than crossovers between these two subverses. The userbases of /v/NeoFAG and /v/fatpeoplehate seem to be the ones that only prefer to interact with members of their community.

We present this analysis to motivate researchers studying user interactions and echo chambers. Further research using our dataset may shed light on whether Voat was a bastion of echo chambers or not, along with what narratives users within these communities exchanged.

7 Related Work

In this section, we present existing work focusing on Voat, and other dataset papers similar to ours. Voat attracted the interest of researchers over the past years, especially after Reddit started banning communities in 2015. Although some papers mention that their dataset is available upon request, these datasets only include data from a couple of subverses that cover a short period of time. To the best of our knowledge, our Voat dataset is 1) the only one to be openly and publicly available online, and 2) the most complete and largest one, covering the whole history of Voat, along with data of the users that ever posted a submission or a comment on the platform.

Voat research. Newell et al. [newell2016user] collect data from various platforms, including Voat and Reddit and perform, among others, computational analysis to identify the primary motivations that drive users to move to other platforms. Chandrasekharan et al. [chandrasekharan2017bag] collect data from 4chan, Reddit, MetaFilter, and Voat and build a model to detect abusive content online. The Voat subverses used in this work include /v/CoonTown, /v/Nigger, and /v/fatpeoplehate, all focused on hate towards individuals of specific body or race characteristics, created on Voat shortly after the 2015 Reddit bans [verge2015migration]. Similarly, Saleem et al. [saleem2017web] collect data from Reddit, Voat, and three online forums to train a classifier that detects hateful speech. Their Voat dataset includes data from /v/CoonTown, /v/fatpeoplehate, and /v/TheRedPill. A study on deepfakes finds that pornographic deepfakes are mainly created for circulation within the community [popova2019reading]. The study uses data from Voat’s /v/DeepFake and the site mrdeepfakes.com, which both were created after Reddit banned the subreddit /r/DeepFakes in 2018 [verge2018deep].

Khalid and Srinivasan [khalid2020style] compare the features of 872K comments from /v/politics, /v/television, and /v/travel to Reddit and 4chan comments, building a classifier that predicts the origin of a comment based on its style and content. Papasavva et al. [papasavva2021qoincidence] collect posts from /v/GreatAwakening, /v/news, /v/politics, /v/funny, and /v/AskVoat to provide an empirical exploratory analysis of the QAnon community on Voat. They find, among other things, that /v/GreatAwakening is not as toxic as the general discussion subverses. Last, Aliapoulios et al. [aliapoulios2021gospel] compare Voat’s /v/GreatAwakening and /v/news posts to 4chan, 8kun, Reddit, and Q drops (posts by “Q,” the mastermind behind the QAnon conspiracy theory) in a large-scale study on QAnon. They find that Voat posts are as threatening as Q drops and that content creators on Reddit and Voat constitute only a small portion of the total community.

Other datasets. One of the largest Reddit datasets is the one of Baumgartner et al. [baumgartner2020pushshift], which presents an archiving platform that collects Reddit data and has made them available to researchers since 2015. The same platform also published over channels and messages from users from Telegram [baumgartner2020telegram]. Fair and Wesslen [fair2019shouting] release a dataset of posts, comments, and user profiles collected from Gab. Aliapoulios et al. [aliapoulios2021early] published a dataset consisting of posts and user profiles from Parler, a Twitter alternative. Last, Papasavva et al. [papasavva2020raiders] present a dataset with over 3.3M threads and 134.5M posts from the Politically Incorrect board (/pol/) of the imageboard forum 4chan.

8 Conclusion

In this work, we present and release a Voat dataset comprising more than 2.3M submissions and 16.2M comments posted from 113K users in over 7.1K Voat subverses. We combine data collected from the Voat API and IAWM released archives to complete the dataset to the best of our ability. Voat shut down on December 25, 2020, and its data are now otherwise inaccessible. In this work we also perform a preliminary analysis of the released dataset so researchers interested in it can know what to expect.

Overall, we hope this work further motivates and assists researchers focusing on deplatforming and how users organize mass migration to other platforms. In addition, our dataset could also help answer numerous questions about how ‘free-speech’ sites operated, e.g., do moderators ban users that express opinions other than the ones aligned with the narratives of a subverse? How do other users vote on such content, and how toxic are they towards it? Do sites like these incentivize users to form echo chambers? What kind of content do users in these communities consume? Also, our dataset could assist multi-platform studies in understanding the similarities and differences between communities. Moreover, since Voat was a bastion of free speech, we are confident that access to our dataset could assist researchers in training algorithms for natural language processing and for detecting hate speech, fake news dissemination, conspiracy theories, etc. Finally, other than quantitative work, we hope that the data can also be used in qualitative work studying specific events, social theories, and communities.

Acknowledgments. This work was partially funded by the UK EPSRC grant EP/S022503/1 that supports the UCL Centre for Doctoral Training in Cybersecurity.

References