Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification

04/04/2020 ∙ by Caleb Ziems, et al. ∙ USC Information Sciences Institute

Cyberbullying is a pervasive problem in online communities. To identify cyberbullying cases in large-scale social networks, content moderators depend on machine learning classifiers for automatic cyberbullying detection. However, existing models remain unfit for real-world applications, largely due to a shortage of publicly available training data and a lack of standard criteria for assigning ground truth labels. In this study, we address the need for reliable data using an original annotation framework. Inspired by social sciences research into bullying behavior, we characterize the nuanced problem of cyberbullying using five explicit factors to represent its social and linguistic aspects. We model this behavior using social network and language-based features, which improve classifier performance. These results demonstrate the importance of representing and modeling cyberbullying as a social phenomenon.





Cyberbullying poses a serious threat to the safety of online communities. The Centers for Disease Control and Prevention (CDC) identify cyberbullying as a “growing public health problem in need of additional research and prevention efforts” [7]. Cyberbullying has been linked to negative mental health outcomes, including depression, anxiety, and other forms of self-harm, suicidal ideation, suicide attempts, and difficulties with social and emotional processing [17, 23, 28]. Where traditional bullying was once limited to a specific time and place, cyberbullying can occur at any hour and from any location on earth [3]. Once the first message has been sent, the attack can escalate rapidly as harmful content is spread across shared media, compounding these negative effects [34, 14].

Internet users depend on content moderators to flag abusive text and ban cyberbullies from participating in online communities. However, due to the overwhelming volume of social media data produced every day, manual human moderation is often infeasible. For this reason, social media platforms are beginning to rely instead on machine learning classifiers for automatic cyberbullying detection [33].

The research community has developed increasingly competitive classifiers to detect harmful or aggressive content in text. Despite significant progress in recent years, however, existing models remain unfit for real-world applications. This is due, in part, to shortcomings in the training and testing data [12, 27, 26]. Most annotation schemes have ignored the importance of social context, and researchers have neglected to provide annotators with objective criteria for distinguishing cyberbullying from other crude messages.

| Work | Data Source | Size | Balance | Agreement |
|---|---|---|---|---|
| al2016cybercrime | Twitter | 10,007 | 6.0% | – |
| chatzakou2017mean | Twitter | 9,484 | – | 0.54 |
| hosseinmardi2015analyzing | Instagram | 1,954 | 29.0% | 0.50 |
| huang2014cyber | Twitter | 4,865 | 1.9% | – |
| reynolds2011using | Formspring | 3,915 | 14.2% | – |
| rosa2019automatic | Formspring | 13,160 | 19.4% | – |
| sugandhi2016automatic | Mixed | 3,279 | 12.0% | – |
| van2018automatic | AskFM | 113,698 | 4.7% | 0.59 |
Table 1: Datasets built from different related definitions of cyberbullying. For each dataset, we report the size, positive class balance, inter-annotator agreement, and whether the study incorporated social context in the annotation process.

To address the urgent need for reliable data, we provide an original annotation framework and an annotated Twitter dataset. The key advantages of our labeling approach are:


  • Contextually-informed ground truth. We provide annotators with the social context surrounding each message, including the contents of the reply thread and the account information of each user involved.

  • Clear labeling criteria. We ask annotators to provide labels for five clear cyberbullying criteria. These criteria can be combined and adapted for revised definitions of cyberbullying.

Using our new dataset, we experiment with existing NLP features and compare results with a newly-proposed set of features. We designed these features to encode the dynamic relationship between a potential bully and victim, using comparative measures from their relative linguistic and social network profiles. Additionally, our features have low computational complexity, so they can scale to web-scale datasets, unlike expensive network centrality and clustering measurements.

Results from our experiments suggest that, although existing NLP models can reliably detect aggressive language in text, these lexically-trained classifiers fall short of the more subtle goal of cyberbullying detection. With n-grams and dictionary-based features, classifiers prove unable to detect harmful intent, visibility among peers, power imbalance, or the repetitive nature of aggression with sufficiently high precision and recall. However, our proposed feature set improves F1 scores on all four of these social measures. Real-world detection systems can benefit from our proposed approach, incorporating the social aspects of cyberbullying into existing models and training these models on socially-informed ground truth labels.


Existing approaches to cyberbullying detection generally follow a common workflow. Data is collected from social networks or other online sources, and ground truth is established through manual human annotation. Machine learning algorithms are trained on the labeled data using the message text or hand-selected features. Then results are typically reported using precision, recall, and F1 scores. Comparison across studies is difficult, however, because the definition of cyberbullying has not been standardized. Therefore, an important first step for the field is to establish an objective definition of cyberbullying.

Defining Cyberbullying

Some researchers view cyberbullying as an extension of more “traditional” bullying behaviors [10, 21, 24]. In one widely-cited book, the psychologist Dan Olweus defines schoolyard bullying in terms of three criteria: repetition, harmful intent, and an imbalance of power [20]. He then identifies bullies by their intention to “inflict injury or discomfort” upon a weaker victim through repeated acts of aggression.

Social scientists have extensively studied this form of bullying as it occurs among adolescents in school [15, 16]. However, experts disagree whether cyberbullying should be studied as a form of traditional bullying or as a fundamentally different phenomenon [15, 21]. Some argue that, although cyberbullying might involve repeated acts of aggression, this condition might not hold in all cases, since a single message can be forwarded and publicly viewed without repeated actions from the author [30, 34]. Similarly, the role of power imbalance is uncertain in online scenarios. Power imbalances of physical strength or numbers may be less relevant, whereas bully anonymity and the permanence of online messages may be sufficient to render the victim defenseless [31].

The machine learning community has not reached a consensus definition of cyberbullying either, instead echoing the uncertainty of the social scientists. Moreover, some authors have neglected to publish any objective cyberbullying criteria or even a working definition for their annotators, and among those who do, the formulation varies. This disagreement has slowed progress in the field, since classifiers and datasets cannot be easily compared. Upon review, however, we found that all available definitions contained a subset of the following criteria: aggression (aggr), repetition (rep), harmful intent (harm), visibility among peers (peer), and power imbalance (power). The datasets built from these definitions are outlined in Table 1.

Existing Sources of Cyberbullying Data

According to van2018automatic, data collection is the most restrictive “bottleneck” in cyberbullying research. Because there are very few publicly available datasets, some researchers have turned to crowdsourcing using Amazon Mechanical Turk or similar platforms.

In most studies to date, annotators labeled individual messages instead of message threads, ignoring social context altogether [1, 13, 18, 25, 29, 32]. Only three of the papers that we reviewed incorporated social context in the annotation process. chatzakou2017mean considered batches of time-sorted tweets called sessions, which were grouped by user accounts, but they did not include message threads or any other form of context. van2018automatic presented “original conversation[s] when possible,” but did not explain when this information was available. hosseinmardi2016prediction was the only study to label full message reply threads as they appeared in the original online source.

Modeling Cyberbullying Behavior

A large body of work has been published on cyberbullying detection and prediction, primarily through the use of natural language processing techniques. Most common approaches have relied on lexical features such as n-grams [12, 33, 35], TF-IDF vectors [9, 19, 32], word embeddings [38], or phonetic representations of messages [37], as well as dictionary-based counts of curse words, hateful or derogatory terms, pronouns, emoticons, and punctuation [1, 6, 25, 29]. Some studies have also used message sentiment [29, 32, 33] or the age, gender, personality, and psychological state of the message author according to text from their timelines [1, 6]. These methods have been reported with appreciable success, as shown in Table 2.

Some researchers argue, however, that lexical features alone may not adequately represent the nuances of cyberbullying. hosseinmardi2015analyzing found that among Instagram media sessions containing profane or vulgar content, only 30% were acts of cyberbullying. They also found that while cyberbullying posts contained a moderate proportion of negative terms, the most negative posts were not considered cases of cyberbullying by the annotators. Instead, these negative posts referred to politics, sports, and other domestic matters between friends [11].

The problem of cyberbullying cuts deeper than merely the exchange of aggressive language. The meaning and intent of an aggressive post is revealed through conversation and interaction between peers. Therefore, to properly distinguish cyberbullying from other uses of aggressive or profane language, future studies should incorporate key indicators from the social context of each message. Specifically, researchers can measure the author’s status or social advantage, the author’s harmful intent, the presence of repeated aggression in the thread, and the visibility of the thread among peers [11, 26, 27].

Since cyberbullying is an inherently social phenomenon, some studies have naturally considered social network measures for classification tasks. Several features have been derived from the network representations of the message interactions. The degree and eigenvector centralities of nodes, k-core scores, and clustering of communities, as well as the tie strength and betweenness centralities of mention edges, have all been shown to improve text-based models [13, 29]. Additionally, bullies and victims can be more accurately identified by their relative network positions. For example, the Jaccard coefficient between neighborhood sets in bully and victim networks has been found to be statistically significant [5]. The ratio of all messages sent and received by each user was also significant.

These findings show promising directions for future work. Social network features may provide the information necessary to reliably classify cyberbullying. However, it may be prohibitively expensive to build out social networks for each user due to time constraints and the limitations of API calls [36]. For this reason, alternative measurements of online social relationships should be considered.

In the present study, we leverage prior work by incorporating linguistic signals into our classifiers. We extend prior work by developing a dataset that better reflects the definitions of cyberbullying presented by social scientists, and by proposing and evaluating a feature set that represents information pertaining to the social processes that underlie cyberbullying behavior.

| Work | Model | Precision | Recall | F1 | Class |
|---|---|---|---|---|---|
| zhang2016cyberbullying | CNN | 99.1% | 97.0% | 98.0% | total |
| al2016cybercrime | Random Forest | 94.1% | 93.9% | 93.6% | total |
| nahar2014semi | SVM | 87.0% | 97.0% | 92.0% | CB |
| sugandhi2016automatic | SVM | 91.0% | 91.0% | 91.0% | total |
| soni2018time | Naïve Bayes | 80.2% | 80.2% | 80.2% | total |
| zhao2016automatic | SVM | 76.8% | 79.4% | 78.0% | total |
| xu2012learning | SVM | 76.0% | 79.0% | 77.0% | total |
| hosseinmardi2016prediction | Logistic Regression | 78.0% | 72.0% | 75.0% | CB |
| yao2019cyberbullying | CONcISE | 69.5% | 79.4% | 74.1% | CB |
| van2018automatic | SVM | 73.3% | 57.2% | 64.3% | total |
| singh2016cyberbullying | Proposed | 82.0% | 53.0% | 64.0% | CB |
| rosa2019automatic | SVM | 46.0% | – | 45.0% | CB |
| dadvar2013improving | SVM | 31.0% | 15.0% | 20.0% | CB |
| huang2014cyber | Dagging | 76.3% | – | – | CB |
Table 2: State of the Art in Cyberbullying Detection. Here, results are reported on either the Cyberbullying (CB) class exclusively or on the entire (total) dataset.

Curating a Comprehensive Cyberbullying Dataset

Here, we provide an original annotation framework and a new dataset for cyberbullying research, built to unify existing methods of ground truth annotation. In this dataset, we decompose the complex issue of cyberbullying into five key criteria, which were drawn from the social science and machine learning communities. These criteria can be combined and adapted for revised definitions of cyberbullying.

Data Collection

We collected a sample of 1.3 million unlabeled tweets from the Twitter Filter API. Since cyberbullying is a social phenomenon, we chose to filter for tweets containing at least one “@” mention. To restrict our investigation to original English content, we removed all non-English posts and retweets (RTs), narrowing the size of our sample to 280,301 tweets.
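The collection filter described above can be sketched as follows. This is an illustrative stdlib-only version, not the authors' released code; the field names (`lang`, `text`) mirror Twitter API tweet objects but are assumptions here.

```python
def keep_tweet(tweet):
    """Keep original English tweets that mention at least one user.

    `tweet` is a dict with hypothetical fields mirroring the Twitter
    Filter API: 'lang' (language code) and 'text' (message body).
    """
    if tweet.get("lang") != "en":        # drop non-English posts
        return False
    text = tweet.get("text", "")
    if text.startswith("RT @"):          # drop retweets
        return False
    return "@" in text                   # require at least one "@" mention

sample = [
    {"lang": "en", "text": "@friend nice game last night"},
    {"lang": "en", "text": "RT @news: headline"},
    {"lang": "fr", "text": "@ami bonjour"},
    {"lang": "en", "text": "no mentions here"},
]
kept = [t for t in sample if keep_tweet(t)]
```

In practice, language and retweet status would come from the API's own metadata rather than string heuristics.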

Since aggressive language is a key component of cyberbullying [11], we ran the pre-trained classifier of davidson2017automated over our dataset to identify hate speech and aggressive language and increase the prevalence of cyberbullying examples. (Without this step, our positive class balance would be prohibitively small; see Appendix 1 for details.) This gave us a filtered set of 9,803 aggressive tweets.

We scraped both the user and timeline data for each author in the aggressive set, as well as any users who were mentioned in one of the aggressive tweets. In total, we collected data from 21,329 accounts. For each account, we saved the full user object, including profile name, description, location, verified status, and creation date. We also saved a complete list of the user’s friends and followers, and a 6-month timeline of all their posts and mentions from January through June, 2019. For author accounts, we extended our crawl to include up to four years of timeline content. Lastly, we collected metadata for all tweets belonging to the corresponding message thread for each aggressive message.

Annotation Task

We presented each tweet in the dataset to three separate annotators as a Human Intelligence Task (HIT) on Amazon’s Mechanical Turk (MTurk) platform. By the time of recruitment, 6,897 of the 9,803 aggressive tweets were accessible from the Twitter web page. The remainder of the tweets had been removed, or the Twitter account had been locked or suspended.

We asked our annotators to consider the full message thread for each tweet as displayed on Twitter’s web interface. We also gave them a list of up to 15 recent mentions by the author of the tweet, directed towards any of the other accounts mentioned in the original thread. Then we asked annotators to interpret each tweet in light of this social context, and had them provide us with labels for five key cyberbullying criteria. We defined these criteria in terms of the author account (“who posted the given tweet?”) and the target (“who was the tweet about?” – not necessarily the first mention). We also stated that “if the target is not on Twitter or their handle cannot be identified” the annotator should “please write OTHER.” With this framework established, we gave the definitions for our five cyberbullying criteria as follows.

  1. Aggressive language: (aggr) Regardless of the author’s intent, the language of the tweet could be seen as aggressive. The user addresses a group or individual, and the message contains at least one phrase that could be described as confrontational, derogatory, insulting, threatening, hostile, violent, hateful, or sexually abusive.

  2. Repetition: (rep) The target user has received at least two aggressive messages in total (either from the author or from another user in the visible thread).

  3. Harmful intent: (harm) The tweet was designed to tear down or disadvantage the target user by causing them distress or by harming their public image. The target does not respond agreeably as to a joke or an otherwise lighthearted comment.

  4. Visibility among peers: (peer) At least one other user besides the target has liked, retweeted, or responded to at least one of the author’s messages.

  5. Power imbalance: (power) Power is derived from authority and perceived social advantage. Celebrities and public figures are more powerful than common users. Minorities and disadvantaged groups have less power. Bullies can also derive power from peer support.

Each of these criteria was represented as a binary label, except for power imbalance, which was ternary. We asked “Is there strong evidence that the author is more powerful than the target? Is the target more powerful? Or if there is not any good evidence, just mark equal.” We recognized that an imbalance of power might arise in a number of different circumstances. Therefore, we did not restrict our definition to just one form of power, such as follower count or popularity.

For instructional purposes, we provided five sample threads to demonstrate both positive and negative examples for each of the five criteria. Two of these threads are shown here. The thread in Figure 1(a) displays bullying behavior that is targeted against the green user, with all five cyberbullying criteria displayed. The thread includes repeated use of aggressive language such as “she really fucking tried” and “she knows she lost.” The bully’s harmful intent is evident in the victim’s defensive responses. And lastly, the thread is visible among four peers as three gang up against one, creating a power imbalance.

The final tweet in Figure 1(b) shows the importance of context in the annotation process. If we read only this individual message, we might decide that the post is cyberbullying, but given the social context here, we can confidently assert that this post is not cyberbullying. Although it contains the aggressive phrase “FUCK YOU TOO BITCH”, the author does not intend harm. The message is part of a joking exchange between two friends or equals, and no other peers have joined in the conversation or interacted with the thread.

After asking workers to review these examples, we gave them a short 7-question quiz to test their knowledge. Workers were given only one quiz attempt, and they were expected to score at least 6 out of 7 questions correctly before they could proceed to the paid HIT. Workers were then paid for each thread that they annotated.

We successfully recruited 170 workers to label all 6,897 available threads in our dataset. They labeled an average of 121.7 threads and a median of 7 threads each. They spent an average time of 3 minutes 50 seconds, and a median time of 61 seconds per thread. For each thread, we collected annotations from three different workers, and from this data we computed our reliability metrics using Fleiss’s Kappa for inter-annotator agreement as shown in Table 3.
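For reference, Fleiss's kappa can be computed directly from a per-thread table of category counts. The function below is a minimal stdlib sketch of the standard formula, not the authors' code; variable names are illustrative.

```python
def fleiss_kappa(ratings):
    """Fleiss's kappa for a list of items, where each item is a list of
    per-category rating counts summing to the number of raters."""
    n = sum(ratings[0])                  # raters per item (3 in this study)
    N = len(ratings)                     # number of items (threads)
    k = len(ratings[0])                  # number of categories
    # proportion of all assignments falling in each category
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    # per-item observed agreement
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N                   # mean observed agreement
    P_e = sum(pj * pj for pj in p)       # chance agreement
    return (P_bar - P_e) / (1 - P_e)
```

With three annotators per thread, each row holds the counts of (positive, negative) votes for one criterion on one thread.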

We determined ground truth for our data using a 2-out-of-3 majority vote, as in hosseinmardi2015analyzing. If the message thread was missing or a target user could not be identified, we removed the entry from the dataset, since later we would need to draw our features from both the thread and the target profile. After filtering in this way, we were left with 5,537 labeled tweets.
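The voting step can be sketched as follows; the dict structure is a hypothetical representation of one thread's three annotations, not the authors' data format.

```python
def ground_truth(annotations):
    """Resolve one thread's labels by 2-of-3 majority vote.

    `annotations` is a list of three {criterion: 0/1} dicts, one per
    annotator (criterion names are illustrative).
    """
    labels = {}
    for criterion in annotations[0]:
        votes = sum(a[criterion] for a in annotations)
        labels[criterion] = int(votes >= 2)   # positive iff >= 2 of 3 agree
    return labels

resolved = ground_truth([
    {"aggr": 1, "harm": 0},
    {"aggr": 1, "harm": 1},
    {"aggr": 0, "harm": 0},
])
```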

| Criterion | Positive Balance | Inter-annotator Agreement | Correlation with Cyberbullying |
|---|---|---|---|
| aggression | 74.8% | 0.23 | 0.22 |
| repetition | 6.6% | 0.18 | 0.27 |
| harmful intent | 16.1% | 0.42 | 0.68 |
| visibility among peers | 30.1% | 0.51 | 0.07 |
| target power | 34.3% | 0.37 | 0.11 |
| author power | 3.1% | 0.10 | -0.02 |
| equal power | 59.7% | 0.22 | -0.09 |
| cyberbullying | 0.7% | 0.18 | – |
Table 3: Analysis of Labeled Twitter Data
Figure 1: Cyberbullying or not. Panels: (a) Cyberbullying; (b) Not Cyberbullying; (c) Downward overlap; (d) Upward overlap; (e) Inward overlap; (f) Outward overlap; (g) Bidirectional overlap. The leftmost thread demonstrates all five cyberbullying criteria. Although the thread in the middle contains repeated use of aggressive language, there is no harmful intent, visibility among peers, or power imbalance. (Right) Graphical representation of the neighborhood overlap measures of the author and target.

Cyberbullying Transcends Cyberaggression

As discussed earlier, some experts have argued that cyberbullying is different from online aggression [11, 26, 27]. We asked our annotators to weigh in on this issue through a subjective question for each thread: “Based on your own intuition, is this tweet an example of cyberbullying?” We did not use the cyberbullying label as ground truth for training models; we used it to better understand worker perceptions of cyberbullying. We found that our workers judged cyberbullying to depend on a weighted combination of the five criteria presented in this paper, with the strongest correlate being harmful intent, as shown in Table 3.

Furthermore, the annotators decided our dataset contained 74.8% aggressive messages as shown in the Positive Balance column of Table 3. We found that a large majority of these aggressive tweets were not labeled as “cyberbullying.” Rather, only 10.5% were labeled by majority vote as cyberbullying, and only 21.5% were considered harmful. From this data, we propose that cyberbullying and cyberaggression are not equivalent classes. Instead, cyberbullying transcends cyberaggression.

Feature Engineering

We have established that cyberbullying is a complex social phenomenon, different from the simpler notion of cyberaggression. Standard Bag of Words (BoW) features based on single sentences, such as n-grams and word embeddings, may thus lead machine learning algorithms to incorrectly classify friendly or joking behavior as cyberbullying [11, 26, 27]. To more reliably capture the nuances of repetition, harmful intent, visibility among peers, and power imbalance, we designed a new set of features from the social and linguistic traces of Twitter users. These measures allow our classifiers to encode the dynamic relationship between the message author and target, using network and timeline similarities, expectations from language models, and other signals taken from the message thread.

For each feature and each cyberbullying criterion, we compare the cumulative distributions of the positive and negative classes using the two-sample Kolmogorov-Smirnov test. We report the Kolmogorov-Smirnov statistic (a normalized distance between the CDFs of the positive and negative classes) as well as the p-value, judged against our chosen level for statistical significance.
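In practice, `scipy.stats.ks_2samp` performs this test; the statistic itself reduces to the maximum vertical gap between the two empirical CDFs, as in the stdlib-only sketch below (p-value computation omitted).

```python
import bisect

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum vertical
    distance between the empirical CDFs of the two samples."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        fx = bisect.bisect_right(xs, v) / len(xs)   # ECDF of sample 1 at v
        fy = bisect.bisect_right(ys, v) / len(ys)   # ECDF of sample 2 at v
        d = max(d, abs(fx - fy))
    return d
```

Disjoint samples give the maximum distance of 1.0, and identical samples give 0.0.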

Text-based Features

To construct realistic and competitive baseline models, we consider a set of standard text-based features that have been used widely throughout the literature. Specifically, we use the NLTK library [2] to construct unigrams, bigrams, and trigrams for each labeled message. This parallels the work of hosseinmardi2016prediction, van2018automatic, and xu2012learning. Following zhang2016cyberbullying, we incorporate counts from the Linguistic Inquiry and Word Count (LIWC) dictionary to measure the linguistic and psychological processes that are represented in the text [22]. We also use a modified version of the Flesch-Kincaid Grade Level and Flesch Reading Ease scores as computed in davidson2017automated. Lastly, we encode the sentiment scores for each message using the Valence Aware Dictionary and sEntiment Reasoner (VADER) of hutto2014vader.
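The n-gram portion of this baseline can be sketched with the standard library alone; the paper's pipeline uses NLTK tokenizers, LIWC, and VADER, so whitespace tokenization here is a simplifying assumption.

```python
from collections import Counter

def ngram_counts(text, n_values=(1, 2, 3)):
    """Unigram, bigram, and trigram counts for one message.

    Whitespace tokenization stands in for NLTK's tokenizers; a real
    baseline would add LIWC counts, readability scores, and VADER
    sentiment alongside these features.
    """
    tokens = text.lower().split()
    feats = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

feats = ngram_counts("you lost and she knows it")
```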

Social Network Features

Network features have been shown to improve text-based models [14, 29], and they can help classifiers distinguish between bullies and victims [5]. These features may also capture some of the more social aspects of cyberbullying, such as power imbalance and visibility among peers. However, many centrality measures and clustering algorithms require detailed network representations. These features may not be scalable for real-world applications. We propose a set of low-complexity measurements that can be used to encode important higher-order relations at scale. Specifically, we measure the relative positions of the author and target accounts in the directed following network by computing modified versions of Jaccard’s similarity index as we now explain.

Neighborhood Overlap

Let N+(u) be the set of all accounts followed by user u, and let N-(u) be the set of all accounts that follow user u. Then N(u) = N+(u) ∪ N-(u) is the neighborhood set of u. We consider five related measurements of neighborhood overlap for a given author a and target t, listed here.

Downward overlap measures the number of two-hop paths from the author to the target along following relationships; upward overlap measures two-hop paths in the opposite direction. Inward overlap measures the similarity between the two users’ follower sets, and outward overlap measures the similarity between their sets of friends. Bidirectional overlap then is a more generalized measure of social network similarity. We provide a graphical depiction for each of these features on the right side of Figure 1.
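A sketch of these five measures is below. The paper describes them as modified Jaccard indices; normalizing each intersection by the corresponding set union is our assumption about the exact form, and the function names are ours.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def overlap_features(friends_a, followers_a, friends_t, followers_t):
    """Five neighborhood-overlap measures for author a and target t.

    friends_* = accounts the user follows (N+); followers_* = accounts
    that follow the user (N-).
    """
    return {
        # two-hop paths a -> x -> t: a follows x, and x follows t
        "downward": jaccard(friends_a, followers_t),
        # two-hop paths t -> x -> a
        "upward": jaccard(friends_t, followers_a),
        "inward": jaccard(followers_a, followers_t),   # shared followers
        "outward": jaccard(friends_a, friends_t),      # shared friends
        "bidirectional": jaccard(friends_a | followers_a,
                                 friends_t | followers_t),
    }

feats = overlap_features({1, 2}, {3}, {4}, {2, 3})
```

Each measure needs only the two users' friend and follower lists, which is what keeps the features cheap relative to centrality or clustering computations.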

High downward overlap likely indicates that the target is socially relevant to the author, just as high upward overlap indicates that the author is relevant to the target. Therefore, when the author is more powerful, downward overlap is expected to be lower and upward overlap is expected to be higher. This trend is slight but visible in the cumulative distribution functions of Figure 2(a): downward overlap is indeed lower when the author is more powerful than when the users are equals. However, there is not a significant difference for upward overlap. We also observe that, when the target is more powerful, downward and upward overlap are both significantly lower. It is reasonable to assume that messages can be sent to celebrities and other powerful figures without the need for common social connections.

Next, we consider inward and outward overlap. When the inward overlap is high, the author and target could have more common visibility. Similarly, if the outward overlap is high, then the author and target both follow similar accounts, so they might have similar interests or belong to the same social circles. Both inward and outward overlaps are expected to be higher when a post is visible among peers. This is true of both distributions in Figure 2. The difference in outward overlap is significant, while the difference for inward overlap falls just short of significance.

Figure 2: Cumulative distribution functions for neighborhood overlap: (a) downward, (b) upward, (c) inward, and (d) outward overlap. These measures are shown to be predictive of power imbalance and visibility among peers.

User-based features

We also use basic user account metrics drawn from the author and target profiles. Specifically, we record the friend and follower counts of each user, their verified status, and the number of tweets posted within six-month snapshots of their timelines, as in al2016cybercrime, chatzakou2017mean, and hosseinmardi2016prediction.

Timeline Features

Here, we consider linguistic features, drawn from both the author and target timelines. These are intended to capture the social relationship between each user, their common interests, and the surprise of a given message relative to the author’s timeline history.

Message Behavior

To more clearly represent the social relationship between the author and target users, we consider the messages sent between them as follows:

  • Downward mention count: How many messages has the author sent to the target?

  • Upward mention count: How many messages has the target sent to the author?

  • Mention overlap: Let M(a) be the set of all accounts mentioned by author a, and let M(t) be the set of all accounts mentioned by target t. We compute the Jaccard ratio |M(a) ∩ M(t)| / |M(a) ∪ M(t)|.

  • Multiset mention overlap: Let M̂(a) be the multiset of all accounts mentioned by author a (with repeats for each mention), and let M̂(t) be the multiset of all accounts mentioned by target t. We measure |M̂(a) ∩ M̂(t)| / |M̂(a) ∪ M̂(t)|, where the union takes the multiplicity of each element to be the sum of its multiplicities in M̂(a) and M̂(t).
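The two overlap variants can be sketched with `collections.Counter`, whose `&` and `+` operators implement min-multiplicity intersection and summed-multiplicity union respectively; the summed union matches the multiset definition above, while the set version is plain Jaccard.

```python
from collections import Counter

def mention_overlap(mentions_a, mentions_t):
    """Set-based Jaccard over mentioned accounts."""
    A, T = set(mentions_a), set(mentions_t)
    return len(A & T) / len(A | T) if A | T else 0.0

def multiset_mention_overlap(mentions_a, mentions_t):
    """Multiset variant: intersection takes the minimum multiplicity,
    and the union takes the sum of multiplicities."""
    A, T = Counter(mentions_a), Counter(mentions_t)
    inter = sum((A & T).values())   # min multiplicity per account
    union = sum((A + T).values())   # summed multiplicity per account
    return inter / union if union else 0.0
```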

The direct mention counts measure the history of repeated communication between the author and the target. For harmful messages, the downward mention count is higher and the upward mention count is lower than for harmless messages, as shown in Figure 3. This means malicious authors tend to address the target repeatedly while the target responds with relatively few messages.

Mention overlap is a measure of social similarity that is based on shared conversations between the author and the target. Multiset mention overlap measures the frequency of communication within this shared space. These features may help predict visibility among peers, or repeated aggression due to pile-on bullying situations. We see in Figure 3 that repeated aggression is linked to slightly greater mention overlap, but the trend is significant only for multiset mention overlap.

Figure 3: Cumulative distribution functions for message behavior: (a) downward mentions, (b) upward mentions, (c) mention overlap, and (d) multiset mention overlap. These measures are shown to be indicative of harmful intent and repetition.

Timeline Similarity

Timeline similarity is used to indicate common interests and shared topics of conversation between the author and target timelines. High similarity scores might reflect users’ familiarity with one another, or suggest that they occupy similar social positions. This can be used to distinguish cyberbullying from harmless banter between friends and associates. To compute this metric, we represent the author and target timelines as TF-IDF vectors a and t. We then take the cosine similarity between the vectors as cos(θ) = (a · t) / (||a|| ||t||).
A cosine similarity of 1 means that users’ timelines had identical counts across all weighted terms; a cosine similarity of 0 means that their timelines did not contain any words in common. We expect higher similarity scores between friends and associates.
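The cosine computation can be sketched over sparse term-weight dictionaries; raw term counts stand in here for the TF-IDF weights the paper uses, and libraries such as scikit-learn provide the full TF-IDF pipeline.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors,
    represented as {term: weight} dicts (0.0 if either is empty)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Identical direction yields 1.0 and disjoint vocabularies yield 0.0, matching the interpretation in the text.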

In Figure 4 (a), we see that the timelines were significantly less similar when the target was in a position of greater power. This is not surprising, since power can be derived from such differences between social groups. We do not observe the same dissimilarity when the author was more powerful; what we do observe is likely caused by noise from extreme class imbalance and low inter-annotator agreement on labels for author power.

Turning to Figure 4 (b), we see that aggressive messages were less likely to harbor harmful intent if they were sent between users with similar timelines. Aggressive banter between friends is generally harmless, so again, this confirms our intuitions.

(a) Timeline Similarity
(b) Timeline Similarity
Figure 4: Cumulative Distribution Functions for timeline similarity on relevant features. These measures are shown to be predictive of power imbalance and harmful intent.
(a) New Words Ratio
(b) Cross Entropy
Figure 5: Cumulative Distribution Functions for language models on relevant features. These measures are shown to be predictive of harmful intent.

Language Models

Harmful intent is difficult to measure in isolated messages because social context determines pragmatic meaning. We attempt to approximate the author’s harmful intent by measuring the linguistic “surprise” of a given message relative to the author’s timeline history. We do this in two ways: through a simple ratio of new words, and through the use of language models.

To estimate historical language behavior, we count unigram and bigram frequencies from a 4-year snapshot of the author’s timeline. Then, after removing all URLs, punctuation, stop words, mentions, and hashtags from the original post, we take the cardinality of the set of unigrams in the post having zero occurrences in the timeline. Lastly, we divide this count by the length of the processed message to arrive at our new words ratio.

We can also build a language model from the bigram frequencies, using Kneser-Ney smoothing as implemented in NLTK [2]. From the language model, we compute the surprise of the original message according to its cross-entropy, given by

$H(m) = -\frac{1}{N} \sum_{i=1}^{N} \log_2 P(b_i)$

where the message $m$ is composed of bigrams $b_1, \dots, b_N$, and $P(b_i)$ is the probability of the $i$th bigram under the language model.
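Both surprise measures can be sketched as follows. This sketch assumes pre-tokenized, pre-filtered text, and substitutes simple add-one smoothing for the Kneser-Ney smoothing used with NLTK:

```python
import math
from collections import Counter

def new_words_ratio(post_tokens, timeline_tokens):
    """Fraction of the processed post made up of unigrams
    that never occur in the author's timeline snapshot."""
    if not post_tokens:
        return 0.0
    seen = set(timeline_tokens)
    new = {t for t in post_tokens if t not in seen}
    return len(new) / len(post_tokens)

def bigram_cross_entropy(post_tokens, timeline_tokens, alpha=1.0):
    """Per-bigram cross-entropy of the post under a bigram model of
    the timeline, with add-alpha smoothing standing in for Kneser-Ney."""
    bigrams = Counter(zip(timeline_tokens, timeline_tokens[1:]))
    unigrams = Counter(timeline_tokens)
    vocab = len(set(timeline_tokens) | set(post_tokens))
    post_bigrams = list(zip(post_tokens, post_tokens[1:]))
    if not post_bigrams:
        return 0.0
    total = 0.0
    for w1, w2 in post_bigrams:
        p = (bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * vocab)
        total -= math.log2(p)
    return total / len(post_bigrams)
```

A post whose bigrams all appear often in the timeline yields a low cross-entropy; unseen bigrams drive the score up.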

We see in Figure 5 that harmfully intended messages have a greater density of new words. This is intuitive, since attacks may be staged around new topics of conversation. However, the cross-entropy of these harmful messages is slightly lower than for harmless messages. This may be due to harmless jokes, since joking messages might depart more from the standard syntax of the author’s timeline.

Thread Features

Finally, we turn to the messages of the thread itself to compute measures of visibility and repeated aggression.


To determine the public visibility of the author’s post, we collect basic measurements from the interactions of other users in the thread. They are as follows.

  • Message count: Count the messages posted in the thread.

  • Reply message count: Count the replies posted in the thread after the author’s first comment.

  • Reply user count: Count the users who posted a reply in the thread after the author’s first comment.

  • Maximum author favorites: The largest number of favorites the author received on a message in the thread.

  • Maximum author retweets: The largest number of retweets the author received on a message in the thread.
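A minimal sketch of these visibility measures, assuming each thread message is a dict with hypothetical `user`, `favorites`, and `retweets` keys and the thread list is in chronological order:

```python
def visibility_features(thread, author):
    """Thread-level visibility measures for a given author.

    `thread` is a chronological list of message dicts with
    illustrative keys 'user', 'favorites', and 'retweets'."""
    author_msgs = [m for m in thread if m["user"] == author]
    # index of the author's first comment; replies are everything after it
    first = next((i for i, m in enumerate(thread) if m["user"] == author),
                 len(thread))
    replies = thread[first + 1:]
    return {
        "message_count": len(thread),
        "reply_message_count": len(replies),
        "reply_user_count": len({m["user"] for m in replies}),
        "max_author_favorites": max((m["favorites"] for m in author_msgs),
                                    default=0),
        "max_author_retweets": max((m["retweets"] for m in author_msgs),
                                   default=0),
    }
```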

Feature BoW Text User Proposed Combined
LIWC, VADER, Flesch-Kincaid
Friend/following counts, tweet count, verified
Neighborhood overlap measures
Mention counts and overlaps
Timeline similarity
New words ratio, cross-entropy
Thread visibility features
Thread aggression features
Table 4: Feature Combinations
Criterion BoW Text User Proposed Combined
aggression 82.5% 82.3% 77.1% 78.7% 82.6%
repetition 7.8% 13.4% 7.7% 15.3% 31.7%
harmful intent 29.6% 49.4% 35.8% 34.5% 55.3%
visibility among peers 30.8% 34.3% 34.0% 42.2% 46.8%
author power 1.9% 3.6% 7.6% 9.8% 17.0%
target power 43.5% 51.5% 77.6% 75.2% 77.0%
Table 5: Precision


To detect repeated aggression, we again employ the hate speech and offensive language classifier of Davidson et al. [8]. Each message is given a binary label according to the classifier-assigned class: aggressive (classified as hate speech or offensive language) or non-aggressive (classified as neither hate speech nor offensive language). From these labels, we derive the following features.

  • Aggressive message count: Count the messages in the thread classified as aggressive.

  • Aggressive author message count: Count the author’s messages that were classified as aggressive

  • Aggressive user count: Of the users who posted a reply in the thread after the author first commented, count how many had a message classified as aggressive.
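These repetition features can be sketched as below, where `is_aggressive` is a stand-in for the binary label from the hate/offensive-language classifier (any message-level predicate works):

```python
def aggression_features(thread, author, is_aggressive):
    """Repetition measures from classifier-labeled thread messages.

    `thread` is a chronological list of message dicts with a 'user'
    key; `is_aggressive` maps a message to its binary aggression label."""
    # index of the author's first comment; repliers come after it
    first = next((i for i, m in enumerate(thread) if m["user"] == author),
                 len(thread))
    after = thread[first + 1:]
    return {
        "aggressive_message_count": sum(1 for m in thread if is_aggressive(m)),
        "aggressive_author_message_count": sum(
            1 for m in thread if m["user"] == author and is_aggressive(m)),
        "aggressive_user_count": len(
            {m["user"] for m in after if is_aggressive(m)}),
    }
```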

Experimental Evaluation

Using our proposed features from the previous section and ground truth labels from our annotation task, we trained a separate Logistic Regression classifier for each of the five cyberbullying criteria, and we report precision, recall, and F1 measures over each binary label independently. We averaged results using five-fold cross-validation, with 80% of the data allocated for training and 20% of the data allocated for testing at each iteration. To account for the class imbalance in the training data, we used the synthetic minority over-sampling technique (SMOTE) [4]. We did not over-sample testing sets, however, to ensure that our tests better match the class distributions obtained by pre-filtering for aggressive directed Twitter messages.
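The key procedural point above is that only the training split is rebalanced. The sketch below illustrates this with a minimal stand-in that duplicates minority examples at random; SMOTE instead interpolates synthetic points between minority neighbors:

```python
import random

def oversample_train(X, y, seed=0):
    """Balance a binary-labeled training split by randomly duplicating
    minority examples (a stand-in for SMOTE; apply to train data only,
    never to the held-out test fold)."""
    rng = random.Random(seed)
    pos = [i for i, lbl in enumerate(y) if lbl == 1]
    neg = [i for i, lbl in enumerate(y) if lbl == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(X), list(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]
```

In a cross-validation loop, this function would be called on each training fold after the split, leaving each test fold at its natural class distribution.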

We compare our results across the five different feature combinations given in Table 4. Note that because the User set does not include thread features, it can be used for cyberbullying prediction and early intervention. The Proposed set can be used for detection, since it is a collection of all newly proposed features, including thread features. The Combined set adds these to the baseline text features.

Criterion BoW Text User Proposed Combined
aggression 77.0% 84.8% 47.8% 51.6% 85.6%
repetition 17.6% 7.3% 49.5% 64.3% 26.2%
harmful intent 40.2% 44.4% 63.4% 67.7% 52.7%
visibility among peers 34.8% 20.4% 47.1% 54.2% 33.7%
author power 6.5% 1.6% 74.1% 80.0% 11.9%
target power 49.4% 43.3% 73.3% 80.8% 71.1%
Table 6: Recall
Criterion BoW Text User Proposed Combined
aggression 79.7% 83.5% 59.0% 62.3% 84.1%
repetition 10.8% 9.4% 13.3% 24.7% 28.7%
harmful intent 34.1% 46.7% 38.7% 45.7% 53.8%
visibility among peers 32.7% 25.5% 39.5% 47.4% 45.5%
author power 2.9% 2.2% 13.7% 17.5% 14.0%
target power 46.2% 47.0% 75.3% 77.9% 73.9%
Table 7: F1 Scores

The performance of the different classifiers is summarized in Tables 5, 6, and 7. Here, we see that Bag of Words and text-based methods performed well on the aggressive language classification task, with an F1 score of 83.5%. This was expected, and the score aligns well with other published results in Table 2.

Cyberbullying detection is more complex than simply identifying aggressive text, however. We find that these same baseline methods fail to reliably detect repetition, harmful intent, visibility among peers, and power imbalance, as shown by the low recall scores in Table 6. We conclude that our investigation of socially informed features was justified.

Our proposed feature set beats the recall scores of lexically trained baselines on all but the aggression criterion. We also improve precision scores for repetition, visibility among peers, and power imbalance. When we combine all features, our F1 scores beat the baselines on every criterion. This demonstrates the effectiveness of our approach, which uses linguistic similarity and community measurements to encode the social characteristics of cyberbullying.

Similar results were obtained by replacing our logistic regression model with a random forest, support vector machine (SVM), AdaBoost, or Multilayer Perceptron (MLP). We report all precision, recall, and F1 scores in Appendix 2, Tables 9-17. We chose to highlight logistic regression because it can be more easily interpreted, which lets us identify the relative importance of our proposed features. The feature weights are also given in Appendix 2, Tables 18-22. There we observe a trend: the aggressive language and repetition criteria are dominated by lexical features; harmful intent is split between lexical and historical communication features; and the visibility among peers and target power criteria are dominated by our proposed social features.

Although we achieve moderately competitive scores in most categories, our classifiers still over-classify cyberbullying cases: precision scores are generally much lower than recall scores across all models. To reduce false positives and better distinguish joking or friendly banter from cyberbullying, it may be necessary to mine additional social features. Overall, all scores should rise above 0.8 before our classifiers can be considered ready for real-world applications [26].

Limitations
Our study focuses on the Twitter ecosystem and a small part of its network. The initial sampling of tweets was based on a machine learning classifier of aggressive English language. This classifier has an F1 score of 0.90 [8]. Even with this filter, only 0.7% of tweets were deemed by a majority of MTurk workers as cyberbullying (Table 3). This extreme class imbalance can disadvantage a wide range of machine learning models. Moreover, the MTurk workers exhibited only moderate inter-annotator agreement (Table 3). We also acknowledge that notions of harmful intent and power imbalance can be subjective, since they may depend on the particular conventions or social structure of a given community. For these reasons, we recognize that cyberbullying still has not been unambiguously defined, and the underlying constructs of its criteria are difficult to identify. In this study, we did not train workers to recognize subtle cues for interpersonal popularity, nor the role of anonymity in creating a power imbalance.

Furthermore, because we lack the authority to define cyberbullying, we cannot assert a two-way implication between cyberbullying and the five criteria outlined here. It may be possible for cyberbullying to exist with only one criterion present, such as harmful intent. Our five criteria also might not span all of the dimensions of cyberbullying. However, they are representative of the literature in both the social science and machine learning communities, and they can be used in weighted combinations to accommodate new definitions.

The main contribution of our paper is not that we solved the problem of cyberbullying detection. Instead, we have exposed the challenge of defining and measuring cyberbullying activity, which has been historically overlooked in the research community.

Future Directions

Cyberbullying detection is an increasingly important yet challenging problem to tackle. A lack of detailed and appropriate real-world datasets stymies progress towards more reliable detection methods. Because cyberbullying is a systemic issue across social media platforms, we urge the development of data-sharing methodologies that give researchers adequate access to rich data for improving early detection, while also addressing the sensitive privacy issues that accompany such instances.

Conclusion
In this study, we produced an original dataset for cyberbullying detection research and an approach that leverages this dataset to more accurately detect cyberbullying. Our labeling scheme was designed to accommodate the cyberbullying definitions that have been proposed throughout the literature. In order to more accurately represent the nature of cyberbullying, we decomposed this complex issue into five representative characteristics. Our classes distinguish cyberbullying from other related behaviors, such as isolated aggression or crude joking. To help annotators infer these distinctions, we provided them with the full context of each message’s reply thread, along with a list of the author’s most recent mentions. In this way, we secured a new set of labels for more reliable cyberbullying representations.

From these ground truth labels, we designed a new set of features to quantify each of the five cyberbullying criteria. Unlike previous text-based or user-based features, our features measure the relationship between a message author and target. We show that these features improve the performance of standard text-based models. These results demonstrate the relevance of social-network and language-based measurements to account for the nuanced social characteristics of cyberbullying.

Despite improvements over baseline methods, our classifiers have not attained the high levels of precision and recall that should be expected of real-world detection systems. For this reason, we argue that the challenging task of cyberbullying detection remains an open research problem.

Acknowledgments
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR0011890019, and by the National Science Foundation (NSF) under Grant No. 1659886 and Grant No. 1553579.

References
  • [1] M. A. Al-garadi, K. D. Varathan, and S. D. Ravana (2016) Cybercrime detection in online communications: the experimental case of cyberbullying detection in the twitter network. Computers in Human Behavior 63, pp. 433–443. Cited by: Existing Sources of Cyberbullying Data, Modeling Cyberbullying Behavior.
  • [2] S. Bird, E. Klein, and E. Loper (2009) Natural language processing with python: analyzing text with the natural language toolkit. O’Reilly Media, Inc. Cited by: Text-based Features, Language Models.
  • [3] D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali (2017) Mean birds: detecting aggression and bullying on twitter. In Proceedings of the 2017 ACM on web science conference, pp. 13–22. Cited by: Introduction.
  • [4] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer (2002) SMOTE: synthetic minority over-sampling technique. JAIR 16, pp. 321–357. Cited by: Experimental Evaluation.
  • [5] C. Chelmis, D. Zois, and M. Yao (2017) Mining patterns of cyberbullying on twitter. In ICDMW, pp. 126–133. Cited by: Modeling Cyberbullying Behavior, Social Network Features.
  • [6] M. Dadvar, D. Trieschnigg, R. Ordelman, and F. de Jong (2013) Improving cyberbullying detection with user context. In European Conference on Information Retrieval, pp. 693–696. Cited by: Modeling Cyberbullying Behavior.
  • [7] C. David-Ferdon and M. F. Hertz (2009) Electronic media and youth violence; a CDC issue brief for researchers. Cited by: Introduction.
  • [8] T. Davidson, D. Warmsley, M. Macy, and I. Weber (2017) Automated hate speech detection and the problem of offensive language. In Eleventh International AAAI Conference on Web and Social Media, Cited by: Limitations.
  • [9] K. Dinakar, R. Reichart, and H. Lieberman (2011) Modeling the detection of textual cyberbullying. In fifth international AAAI conference on weblogs and social media, Cited by: Modeling Cyberbullying Behavior.
  • [10] S. Hinduja and J. W. Patchin (2008) Cyberbullying: an exploratory analysis of factors related to offending and victimization. Deviant behavior 29 (2), pp. 129–156. Cited by: Defining Cyberbullying.
  • [11] H. Hosseinmardi, S. A. Mattson, R. I. Rafiq, R. Han, Q. Lv, and S. Mishra (2015) Analyzing labeled cyberbullying incidents on the instagram social network. In International conference on social informatics, pp. 49–66. Cited by: Modeling Cyberbullying Behavior, Modeling Cyberbullying Behavior, Data Collection, Cyberbullying Transcends Cyberaggression, Feature Engineering.
  • [12] H. Hosseinmardi, R. I. Rafiq, R. Han, Q. Lv, and S. Mishra (2016) Prediction of cyberbullying incidents in a media-based social network. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 186–192. Cited by: Introduction, Modeling Cyberbullying Behavior.
  • [13] Q. Huang, V. K. Singh, and P. K. Atrey (2014) Cyber bullying detection using social and textual analysis. In Proceedings of the 3rd International Workshop on Socially-Aware Multimedia, pp. 3–6. Cited by: Existing Sources of Cyberbullying Data, Modeling Cyberbullying Behavior.
  • [14] Y. Huang and C. Chou (2010) An analysis of multiple factors of cyberbullying among junior high school students in taiwan. Computers in Human Behavior 26 (6), pp. 1581–1590. Cited by: Introduction, Social Network Features.
  • [15] R. M. Kowalski and S. P. Limber (2013) Psychological, physical, and academic correlates of cyberbullying and traditional bullying. Journal of Adolescent Health 53 (1), pp. S13–S20. Cited by: Defining Cyberbullying.
  • [16] Q. Li (2006) Cyberbullying in schools: a research of gender differences. School psychology international 27 (2), pp. 157–170. Cited by: Defining Cyberbullying.
  • [17] K. Miller (2016) Cyberbullying and its consequences: how cyberbullying is contorting the minds of victims and bullies alike, and the law’s limited available redress. S. Cal. Interdisc. LJ 26, pp. 379. Cited by: Introduction.
  • [18] V. Nahar, S. Al-Maskari, X. Li, and C. Pang (2014) Semi-supervised learning for cyberbullying detection in social networks. In Australasian Database Conference, pp. 160–171. Cited by: Existing Sources of Cyberbullying Data.
  • [19] V. Nahar, X. Li, C. Pang, and Y. Zhang (2013) Cyberbullying detection based on text-stream classification. In The 11th Australasian Data Mining Conference (AusDM 2013), Cited by: Modeling Cyberbullying Behavior.
  • [20] D. Olweus (1994) Bullying at school. In Aggressive behavior, pp. 97–130. Cited by: Defining Cyberbullying.
  • [21] D. Olweus (2012) Cyberbullying: an overrated phenomenon?. European Journal of Developmental Psychology 9 (5), pp. 520–538. Cited by: Defining Cyberbullying, Defining Cyberbullying.
  • [22] J. W. Pennebaker, R. J. Booth, and M. E. Francis (2007) LIWC2007: linguistic inquiry and word count. Austin, Texas: liwc. net. Cited by: Text-based Features.
  • [23] M. Price, J. Dalgleish, et al. (2010) Cyberbullying: experiences, impacts and coping strategies as described by australian young people. Youth Studies Australia 29 (2), pp. 51. Cited by: Introduction.
  • [24] J. Raskauskas and A. D. Stoltz (2007) Involvement in traditional and electronic bullying among adolescents.. Developmental psychology 43 (3), pp. 564. Cited by: Defining Cyberbullying.
  • [25] K. Reynolds, A. Kontostathis, and L. Edwards (2011) Using machine learning to detect cyberbullying. In 2011 10th International Conference on Machine learning and applications and workshops, Vol. 2, pp. 241–244. Cited by: Existing Sources of Cyberbullying Data, Modeling Cyberbullying Behavior.
  • [26] H. Rosa, N. Pereira, R. Ribeiro, P. Ferreira, J. Carvalho, S. Oliveira, L. Coheur, P. Paulino, A. V. Simão, and I. Trancoso (2019) Automatic cyberbullying detection: a systematic review. Computers in Human Behavior 93, pp. 333–345. Cited by: Introduction, Modeling Cyberbullying Behavior, Cyberbullying Transcends Cyberaggression, Feature Engineering, Experimental Evaluation.
  • [27] S. Salawu, Y. He, and J. Lumsden (2017) Approaches to automated detection of cyberbullying: a survey. IEEE Transactions on Affective Computing. Cited by: Introduction, Modeling Cyberbullying Behavior, Cyberbullying Transcends Cyberaggression, Feature Engineering.
  • [28] H. Sampasa-Kanyinga, P. Roumeliotis, and H. Xu (2014) Associations between cyberbullying and school bullying victimization and suicidal ideation, plans and attempts among canadian schoolchildren. PloS one 9 (7), pp. e102145. Cited by: Introduction.
  • [29] V. K. Singh, Q. Huang, and P. K. Atrey (2016) Cyberbullying detection using probabilistic socio-textual information fusion. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 884–887. Cited by: Existing Sources of Cyberbullying Data, Modeling Cyberbullying Behavior, Modeling Cyberbullying Behavior, Social Network Features.
  • [30] R. Slonje, P. K. Smith, and A. Frisén (2013) The nature of cyberbullying, and strategies for prevention. Computers in human behavior 29 (1), pp. 26–32. Cited by: Defining Cyberbullying.
  • [31] R. Slonje and P. K. Smith (2008) Cyberbullying: another main type of bullying?. Scandinavian journal of psychology 49 (2), pp. 147–154. Cited by: Defining Cyberbullying.
  • [32] R. Sugandhi, A. Pande, A. Agrawal, and H. Bhagat (2016) Automatic monitoring and prevention of cyberbullying. International Journal of Computer Applications 8, pp. 17–19. Cited by: Existing Sources of Cyberbullying Data, Modeling Cyberbullying Behavior.
  • [33] C. Van Hee, G. Jacobs, C. Emmery, B. Desmet, E. Lefever, B. Verhoeven, G. De Pauw, W. Daelemans, and V. Hoste (2018) Automatic detection of cyberbullying in social media text. PloS one 13 (10), pp. e0203794. Cited by: Introduction, Modeling Cyberbullying Behavior.
  • [34] T. E. Waasdorp and C. P. Bradshaw (2015) The overlap between cyberbullying and traditional bullying. Journal of Adolescent Health 56 (5), pp. 483–488. Cited by: Introduction, Defining Cyberbullying.
  • [35] J. Xu, K. Jun, X. Zhu, and A. Bellmore (2012) Learning from bullying traces in social media. In Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies, pp. 656–666. Cited by: Modeling Cyberbullying Behavior.
  • [36] M. Yao, C. Chelmis, D. Zois, et al. (2019) Cyberbullying ends here: towards robust detection of cyberbullying in social media. In The World Wide Web Conference, pp. 3427–3433. Cited by: Modeling Cyberbullying Behavior.
  • [37] X. Zhang, J. Tong, N. Vishwamitra, E. Whittaker, J. P. Mazer, R. Kowalski, H. Hu, F. Luo, J. Macbeth, and E. Dillon (2016) Cyberbullying detection with a pronunciation based convolutional neural network. In 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 740–745. Cited by: Modeling Cyberbullying Behavior.
  • [38] R. Zhao, A. Zhou, and K. Mao (2016) Automatic detection of cyberbullying on social networks based on bullying features. In Proceedings of the 17th international conference on distributed computing and networking, pp. 43. Cited by: Modeling Cyberbullying Behavior.

Appendix 1: Analysis of the Real-World Class Distribution for Cyberbullying Criteria

To understand the real-world class distribution for the cyberbullying criteria, we randomly selected 222 directed English tweets from an unbiased sample drawn from the Twitter Decahose stream across the entire month of October 2016. Using the same methodology given in the paper, we had these tweets labeled three times each on Amazon Mechanical Turk. Again, ground truth was determined using 2 out of 3 majority vote. Upon analysis, we found that the positive class balance was prohibitively small, especially for repetition, harmful intent, visibility among peers, and author power, which were all under 5%.

Criterion Positive Balance Inter-annotator Agreement Cyberbullying
aggression 6.3% 0.23 0.68
repetition 0.9% 0.04 0.46
harmful intent 1.4% 0.31 0.75
visibility among peers 0.17% 0.51 0.11
target power 22.5% 0.23 0.11
author power 3.6% 0.04 0.06
equal power 64.7% 0.15 -0.14
cyberbullying 2.7% 0.25 -
Table 8: Analysis of Unfiltered Decahose Data

Appendix 2: Model Evaluation

For the sake of comparison, we provide precision, recall, and F1 scores for five different machine learning models: k-nearest neighbors (KNN), random forest, support vector machine (SVM), AdaBoost, and Multilayer Perceptron (MLP). Then we provide feature weights for our logistic regression model trained on each of the five cyberbullying criteria.

Criterion BoW Text User Proposed Combined
aggression 77.6% 80.1% 78.3% 78.7% 79.7%
repetition 6.5% 6.8% 7.7% 16.1% 10.8%
harmful intent 18.4% 28.1% 33.2% 33.4% 43.1%
visibility among peers 28.7% 32.7% 34.8% 42.8% 35.1%
target power 39.3% 43.3% 77.9% 74.5% 69.6%
Table 9: Random Forest Precision
Criterion BoW Text User Proposed Combined
aggression 82.6% 81.6% 77.0% 77.5% 81.6%
repetition 7.8% 9.0% 7.3% 16.6% 25.8%
harmful intent 29.1% 46.4% 34.3% 39.9% 60.0%
visibility among peers 30.5% 32.9% 35.9% 45.8% 46.1%
target power 42.5% 46.5% 78.0% 78.2% 77.9%
Table 10: AdaBoost Precision
Criterion BoW Text User Proposed Combined
aggression 82.8% 78.8% 76.7% 77.4% 78.3%
repetition 7.7% 8.7% 8.6% 16.9% 19.6%
harmful intent 27.4% 42.8% 37.3% 38.4% 46.8%
visibility among peers 30.1% 34.0% 34.3% 41.6% 38.5%
target power 39.6% 45.2% 74.3% 72.0% 68.6%
Table 11: MLP Precision
Criterion BoW Text User Proposed Combined
aggression 56.4% 78.5% 43.7% 45.3% 76.2%
repetition 36.2% 24.9% 46.3% 64.7% 29.9%
harmful intent 42.4% 35.1% 78.4% 78.2% 53.5%
visibility among peers 48.1% 30.6% 50.5% 49.9% 32.5%
target power 60.1% 38.0% 79.0% 81.9% 76.7%
Table 12: Random Forest Recall
Criterion BoW Text User Proposed Combined
aggression 75.0% 86.4% 65.9% 77.4% 86.3%
repetition 23.8% 4.1% 26.8% 31.2% 17.8%
harmful intent 44.4% 37.8% 57.0% 52.8% 50.8%
visibility among peers 41.0% 15.4% 42.8% 43.1% 32.0%
target power 56.0% 39.4% 81.8% 81.0% 75.6%
Table 13: AdaBoost Recall
Criterion BoW Text User Proposed Combined
aggression 64.1% 86.5% 65.5% 68.0% 85.6%
repetition 26.8% 6.8% 22.5% 27.1% 12.6%
harmful intent 51.0% 33.3% 57.0% 57.0% 37.2%
visibility among peers 51.6% 23.5% 45.6% 50.2% 26.5%
target power 61.6% 37.5% 76.5% 76.2% 65.6%
Table 14: MLP Recall
Criterion BoW Text User Proposed Combined
aggression 65.2% 79.3% 56.0% 57.5% 77.9%
repetition 11.0% 10.6% 13.2% 25.8% 15.8%
harmful intent 25.6% 31.1% 46.6% 46.8% 47.7%
visibility among peers 35.7% 30.8% 41.2% 46.1% 33.6%
target power 47.4% 39.9% 78.4% 78.0% 72.8%
Table 15: Random Forest F1
Criterion BoW Text User Proposed Combined
aggression 78.6% 83.9% 71.0% 77.5% 83.9%
repetition 11.7% 5.6% 11.5% 21.6% 20.9%
harmful intent 35.1% 41.6% 42.8% 45.4% 55.0%
visibility among peers 34.9% 21.0% 39.1% 44.3% 37.8%
target power 48.3% 42.7% 79.8% 79.6% 76.7%
Table 16: AdaBoost F1
Criterion BoW Text User Proposed Combined
aggression 72.2% 82.5% 70.7% 72.4% 81.8%
repetition 12.0% 7.6% 12.4% 20.7% 15.2%
harmful intent 35.7% 37.3% 45.0% 45.8% 41.3%
visibility among peers 38.0% 27.7% 39.2% 45.5% 31.4%
target power 48.2% 41.0% 75.4% 74.0% 67.0%
Table 17: MLP F1
Rank Feature Weight
1 affect (LIWC) -1.34
2 sexual (LIWC) 1.07
3 negemo (LIWC) 0.90
4 maximum author retweets 0.86
5 relativ (LIWC) -0.75
6 bio (LIWC) -0.69
7 posemo (LIWC) 0.66
8 num chars -0.64
9 space (LIWC) 0.52
10 upward overlap 0.51

Table 19: Top Absolute Weights for Repetition Features
Rank Feature Weight
1 negemo (LIWC) 1.40
2 author verified status -1.32
3 affect (LIWC) -1.24
4 cogmech (LIWC) -0.96
5 relativ (LIWC) -0.89
6 posemo (LIWC) 0.80
7 social (LIWC) 0.77
8 aggressive user count 0.63
9 upward overlap 0.62
10 number of unique terms 0.61

Table 20: Top Absolute Weights for Harmful Intent
Rank Feature Weight
1 number of words -1.70
2 number of unique terms 1.41
3 bio (LIWC) -1.05
4 funct (LIWC) 0.95
5 author follower count -0.90
6 present (LIWC) 0.83
7 you (LIWC) 0.83
8 message count 0.79
9 upward mention count -0.71
10 verb (LIWC) -0.67

Table 21: Top Absolute Weights for Visibility Among Peers
Rank Feature Weight
1 author follower count 6.29
2 maximum author retweets -1.63
3 maximum author favorites 1.46
4 aggressive user count -1.36
5 number of words -1.16
6 reply user count 1.03
7 number of unique terms 1.02
8 reply message count -0.91
9 message count 0.77
10 affect (LIWC) -0.67

Table 22: Top Absolute Weights for Target Power
Rank Feature Weight
1 target follower count 2.28
2 author follower count -1.67
3 bidirectional overlap -1.22
4 target verified status 1.20
5 upward overlap -1.11
6 downward overlap 1.04
7 relativ (LIWC) 0.76
8 reply user count -0.69
9 space (LIWC) -0.68
10 message count -0.63
Table 18: Top Absolute Weights for Aggressive Language