
Automatic Identification and Classification of Bragging in Social Media

Bragging is a speech act employed with the goal of constructing a favorable self-image through positive statements about oneself. It is widespread in daily communication and especially popular in social media, where users aim to build a positive image of their persona directly or indirectly. In this paper, we present the first large scale study of bragging in computational linguistics, building on previous research in linguistics and pragmatics. To facilitate this, we introduce a new publicly available data set of tweets annotated for bragging and their types. We empirically evaluate different transformer-based models injected with linguistic information in (a) binary bragging classification, i.e., if tweets contain bragging statements or not; and (b) multi-class bragging type prediction including not bragging. Our results show that our models can predict bragging with macro F1 up to 72.42 and 35.95 in the binary and multi-class classification tasks respectively. Finally, we present an extensive linguistic and error analysis of bragging prediction to guide future research on this topic.


1 Introduction

The desire to be viewed positively is a key driver of human behavior (baumeister1982self; leary1990impression; sedikides1993assessment; tetlock2002social) and creating a positive image often leads to personal rewards (gilmore1989effects; hogan1982socioanalytic; schlenker1980impression). Self-presentation strategies are means for individuals to build and establish this positive social image to meet their goals (goffman1978presentation; jones1982toward; jones1990interpersonal; bak2014self). Bragging (or self-praise) is one of the most common strategies and involves disclosing a positively valued attribute about the speaker or their in-group (dayter2014self; dayter2018self).

Social media platforms tend to promote self-presentation tendencies (chen2016anonymity) and allow users to craft an idealized self-image (chou2012they; michikyan2015can; halpern2017online). Self-presentation online is predominantly positive (chou2012they; lee2014puts; matley2018not). Furthermore, self-promotion is acceptable and even desired in certain online contexts (dayter2018self). This is also amplified by social media platforms through the presence of likes or positive reactions to users’ posts (reinecke2014authenticity), which are often used to quantify impact on the platform (lampos-etal-2014-predicting). Bragging in particular was found to be more frequent on social media than in face-to-face interactions (ren2020self).

Type | Definition | Example tweet
Achievement | Concrete outcome obtained as a result of the tweet author’s actions. These may include accomplished goals, awards and/or positive change in a situation or status (individually or as part of a group). | Finally got the offer! Whoop!!
Action | Past, current or upcoming action of the user that does not have a concrete outcome. | Guess what! I met Matt Damon today!
Feeling | Feeling that is expressed by the user for a particular situation. | Im so excited that I am back on my consistent schedule. I am so excited for a routine so I can achieve my goals!!
Trait | A personal trait, skill or ability of the user. | To be honest, I have a better memory than my siblings
Possession | A tangible object belonging to the user. | Look at our Christmas tree! I kinda just wanna keep it up all year!
Affiliation | Being part of a group (e.g. family, fanclub, university, team, company etc.) and/or a certain location including living in a city, neighborhood or country. | My daughter got first place in the final exam, so proud of her!
Not Bragging | The tweet is not about bragging or (a) there is not enough information to determine that the tweet is about bragging; (b) the bragging statements belong to someone other than the author of the tweet; (c) the relationship between the author and people or things mentioned in the tweet are unknown. | Glad to hear that! Well done Jim!
Table 1: Bragging taxonomy together with type definitions and examples of tweets.

However, bragging is considered a high-risk act (brown1987politeness; holtgraves1990language; van2017praising) and can have the opposite effect to the one intended, such as dislike or decreased perceived competence (jones1982toward; sezer2018humblebragging; matley2018not). It is thus paramount to understand the types of bragging and the strategies that mitigate the face threat introduced by bragging, as well as how effective the self-presentation attempt is (herbert1990sex). Table 1 shows examples of non-bragging and bragging statements, with the latter grouped into six types under a taxonomy that we propose in this paper based on previous linguistic research (dayter2018self; matley2018not).

Despite its pervasiveness and importance in online communication, bragging has yet to be studied at scale in computational (socio) linguistics. The ability to identify bragging automatically is important for: (a) linguists to better understand the context and types of bragging through empirical studies (dayter2014self; ren2020self); (b) social scientists to analyze the relationship between bragging and personality traits, online behavior and communication strategies (miller1992should; van2017praising; sezer2018humblebragging); (c) online users to enhance their self-presentation strategies (miller1992should; dayter2018self); (d) enhancing NLP applications such as intent identification (wen2017jointly) and conversation modeling.

In this paper, we aim to bridge the gap between previous work in pragmatics and the computational study of speech acts. Our contributions are:

  • A new publicly available data set containing a total of 6,696 English tweets annotated with bragging and their types;

  • Experiments with transformer-based models combined with linguistic features for bragging identification (binary classification) and bragging type classification (seven classes);

  • A qualitative linguistic analysis of markers of bragging in tweets and the model behavior in predicting bragging.

2 Related Work

Bragging as a Speech Act

Bragging as a speech act is considered a face-threatening act to positive face (i.e. the desire to be liked) under politeness theory (brown1987politeness). It is directly oriented to the speaker and may threaten their likeability if the bragging is perceived negatively, while it may also affect the hearer’s face by implying that their feelings are not valued by the speaker (matley2018not). Bragging online plays an important role in self-presentation and its pervasiveness challenges classic politeness theories, such as the modesty maxim (leech2016principles) and the self-denigration maxim (gu1990politeness). Thus, research in social psychology and linguistics has mostly focused on identifying the pragmatic strategies for bragging that mitigate face threat and their impact on likeability and perceived competence, which the speakers aim to increase with this self-presentation strategy.

Bragging Strategies

Modest and sincere self-presentation styles are more likely to be perceived positively (sedikides2007). Bragging framed as mere information-sharing, but with positive connotation to the speaker, can make the speaker be perceived as more likeable (miller1992should). Bragging can also be perceived negatively and cause greater aggression when it involves boasting, elements of competitiveness, use of superlatives and explicit comparisons to others (miller1992should; hoorens2012hubris; scopelliti2015you; matley2018not). In addition, competence-related statements are more likely to be negatively perceived than those based on warmth (e.g. the ability to form connections with others) (van2017praising). Common mitigation strategies include the speaker’s attempts to deny compliments, shifting focus to persons closely related to them, reframing bragging as praise from a third party, admitting the bragging act through disclaimers (e.g. using #brag), or expressing it as a complaint (wittels2011humblebrag; sezer2018humblebragging), question, narration or sharing (dayter2018self; matley2018not; ren2020self). The success of self-presentation strategies is also impacted by the social context (tice1995modesty) and speaker identity (paramita2021benefits).

Analysis of Bragging

Bragging has been studied in the context of a small ballet community (dayter2014self), a pick-up artist forum (rudiger2020manbragging) and a small set of WhatsApp conversations (dayter2018self). On social media, matley2018not studied the functional use of hashtags (e.g. #brag, #humblebrag) in Instagram posts, tobback2019telling examined bragging strategies on LinkedIn, ren2020self investigated bragging and its pragmatic functions in Chinese social media, and matley2020isn studied the impact of mitigating bragging through irony, showing that bragging was negatively perceived. However, all these studies rely on manual analyses of small data sets (e.g. 300 posts).

Speech Acts in NLP

Speech acts have been studied in NLP with examples including politeness (danescu-niculescu-mizil-etal-2013-computational), complaints (preotiuc-pietro-etal-2019-automatically; jin-aletras-2020-complaint; jin-aletras-2021-modeling), humor (yang-etal-2021-choral), parody (maronikolakis-etal-2020-analyzing), irony (bamman2015contextualized), deception (chen-etal-2020-acoustic) and self-disclosure (bak-etal-2012-self; levontin2017negative; ravichander-black-2018-empirical). Self-disclosure is closer to bragging as it is related to revealing personal information about oneself. It is usually employed to improve or maintain relationships (bak-etal-2012-self) as measured through conversation frequency (bak2014sdtm). On the other hand, bragging is about aspects that are positively valued by the audience with the goal of improving the speaker’s self-image. bak2014self aim to predict different levels of self-disclosure statements, from general to sensitive, while wang-etal-2021-bragging examine gender differences in self-promotion by Congress members on Twitter. In some cases, bragging also involves possessions (chinnappa-blanco-2018-mining).

3 Bragging Data

3.1 Bragging Definition & Types

Definition

Bragging is a speech act which explicitly or implicitly attributes credit to the speaker for some good (e.g. possession, skill) that is positively valued by the speaker and their audience (dayter2014self). A bragging statement should clearly express what the author is bragging about.

Types

We generalize and extend the bragging types based on the definitions by dayter2018self and matley2018not. The former summarizes them as accomplishments and some aspects of self; while the latter includes everyday achievements (e.g. cooking) and personal qualities. We divide the ‘some aspects of self’ category into two categories, namely ‘Possession’ and ‘Trait’ respectively. We also add an ‘Affiliation’ category for bragging involving a group to which the speaker belongs. In total, we consider six bragging types and a non-bragging category. Table 1 shows the definitions of each type.

Classification Tasks

Given the taxonomy above, we define two classification tasks: (i) binary bragging prediction (i.e. if a tweet contains a bragging statement or not); and (ii) seven-way multiclass classification for predicting if a tweet contains one of the six bragging types or no bragging at all.
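The relationship between the two tasks can be made concrete with a short sketch. The names below (`SEVEN_CLASSES`, `to_binary`) are illustrative assumptions and do not come from the released data set; the sketch only shows that the binary label can be derived from the seven-way label.

```python
# Illustrative label scheme; the identifiers are assumptions, not the
# column names of the released data set.
BRAGGING_TYPES = [
    "Achievement", "Action", "Feeling", "Trait", "Possession", "Affiliation",
]
NOT_BRAGGING = "Not Bragging"
SEVEN_CLASSES = BRAGGING_TYPES + [NOT_BRAGGING]


def to_binary(label: str) -> int:
    """Collapse a seven-way label into the binary task: 1 = bragging, 0 = not."""
    if label not in SEVEN_CLASSES:
        raise ValueError(f"unknown label: {label!r}")
    return 0 if label == NOT_BRAGGING else 1


if __name__ == "__main__":
    for label in SEVEN_CLASSES:
        print(label, "->", to_binary(label))
```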

3.2 Data Collection

To the best of our knowledge, there is no other data set available for our study. We use Twitter for data collection as tweets are openly available for research and widely used in other related tasks, e.g. predicting sentiment (rosenthal2017semeval), affect (SemEval2018Task1), sarcasm (bamman2015contextualized), stance (mohammad2016semeval).

Random Sampling

We select tweets for annotation by randomly sampling from the 1% Twitter feed one day per month from January 2019 to December 2020 (approximately 10k tweets per day) to ensure diversity, using the Premium Twitter Search API for academic research (https://tinyurl.com/2p8wnure).

Keyword-based Sampling

To give a model access to more positive examples of bragging statements for training, we use a keyword-based sampling method that increases the hit rate of bragging, following previous work on labeling infrequent linguistic phenomena, e.g. irony (SemEval2018Task1) or hate speech (waseem-hovy-2016-hateful).

We build queries based on indicators of positive self-disclosure (e.g. I, just) (dayter2018self) and stylistic indicators, e.g. positive emotion words and present tense verbs (bazarova2013managing). As the frequency of these keywords is high, we construct multi-word queries consisting of a personal pronoun and an indicator. In addition, we use a short list of curated bragging-related hashtags. The queries are: {[I, proud], [I, glad], [I, happy], [I, best], [I, amazed], [I, amazing], [I, excellent], [I, just], [I’m, proud], [I’m, glad], [I’m, happy], [I’m, best], [I’m, amazed], [I’m, amazing], [I’m, excellent], [me, proud], [my, best], #brag, #bragging, #humblebrag, #humble, #braggingrights}. After annotating 1,000 tweets, we compute the percentage of bragging tweets for each keyword and stop sampling from keywords for which fewer than 5% of tweets are bragging (i.e. [I, amazed], [I’m, amazing], [I’m, best], [my, best], [I, excellent], #humble).
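The keyword pruning step can be sketched as follows. This is a minimal illustration under stated assumptions: the input format (pairs of query and bragging flag) and the function names are hypothetical, and the toy annotations are made up; only the 5% threshold comes from the text.

```python
from collections import defaultdict


def keyword_hit_rates(annotated):
    """Return the fraction of bragging tweets per query.

    `annotated` is an iterable of (query, is_bragging) pairs, a hypothetical
    format for the first batch of annotated tweets.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for query, is_bragging in annotated:
        totals[query] += 1
        hits[query] += int(is_bragging)
    return {query: hits[query] / totals[query] for query in totals}


def queries_to_keep(annotated, min_rate=0.05):
    """Keep queries whose bragging rate is at least `min_rate` (5% in the text)."""
    rates = keyword_hit_rates(annotated)
    return sorted(q for q, rate in rates.items() if rate >= min_rate)


if __name__ == "__main__":
    # Made-up toy annotations, for illustration only.
    toy = (
        [("#brag", True)] * 3 + [("#brag", False)] * 7
        + [("#humble", True)] * 1 + [("#humble", False)] * 30
    )
    print(keyword_hit_rates(toy))
    print(queries_to_keep(toy))
```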

We initially collected around 6k and 368k tweets using hashtags and multi-word queries respectively. We obtain over 9k tweets by keeping all tweets collected using hashtags and sampling 1% of those collected using multi-word queries to balance the two types.

Data Filtering

After collecting tweets, we exclude those with duplicate or no meaningful textual content (e.g. only @-mentions or images). We focus only on English posts and filter out non-English ones using the language code provided by Twitter. We also exclude retweets and quoted tweets, as these do not typically express the thoughts of the user who retweeted them. Moreover, we exclude 131 tweets containing a URL in the text because these were related to advertisements, based on initial results from our annotation calibration rounds. This resulted in a total of 6,696 tweets, which is similar in size to data sets recently released for social NLP (oprea-magdy-2020-isarcasm; chung-etal-2019-conan; beck-etal-2021-investigating; mendelsohn-etal-2021-modeling).

Label | Training set (keyword sampling) | Dev/Test set (random sampling) | All
Binary
Bragging | 544 (16.09%) | 237 (7.15%) | 781 (11.66%)
Not Bragging | 2838 (83.91%) | 3077 (92.85%) | 5915 (88.34%)
Multi-class
Achievement | 166 (4.91%) | 71 (2.14%) | 237 (3.54%)
Action | 127 (3.76%) | 58 (1.72%) | 185 (2.76%)
Feeling | 39 (1.15%) | 27 (0.82%) | 66 (0.99%)
Trait | 91 (2.69%) | 48 (1.45%) | 139 (2.08%)
Possession | 58 (1.72%) | 28 (0.84%) | 86 (1.28%)
Affiliation | 63 (1.86%) | 5 (0.15%) | 68 (1.01%)
Not Bragging | 2838 (83.91%) | 3077 (92.85%) | 5915 (88.34%)
Total | 3382 | 3314 | 6696
Table 2: Bragging data set statistics.

3.3 Annotation and Quality Control Process

We manually annotate tweets to provide a solid benchmark and foster future research. All authors of the paper have significant experience in linguistic annotation. We run three calibration rounds of 100 tweets each, in which all annotators annotated all tweets and discussed disagreements, until a Krippendorff’s Alpha above 0.80 in the seven-class task was reached.

To monitor quality, a subset of 1,564 tweets was annotated by two annotators, or more in case of disagreements. If a tweet fits into multiple bragging types, we assign the more prominent one (for example, we annotate “New car✓New crib✓New barbershop✓20 years young” as ‘Possession’ because bragging is mostly about possessions: crib, car, barbershop). The annotation is based only on the actual text of the tweet without considering additional modalities (e.g. images), context or replies. This is similar to the information available to predictive models during training. We selected the final label by majority vote; in cases of three different votes, a final label was assigned after consensus. We also experimented with training models using the subset annotated by a single annotator compared to multiple annotators and found no significant differences (see Appendix A). The full task guidelines, examples and interface are presented in Appendix B.

The inter-annotator agreement between two annotations of all tweets is: (a) percentage agreement: 89.03; (b) Krippendorff’s Alpha (krippendorff2011computing) (7-class): 0.840; (c) Krippendorff’s Alpha (binary): 0.786. Agreement values are between the upper part of the substantial agreement band and the perfect agreement band (artstein2008inter). The final data set consists of 6,696 tweets, each labeled with one of the seven classes. Before annotation, the keyword-based and randomly sampled tweets were shuffled to avoid inducing frequency bias. Data set statistics are shown in Table 2, including statistics across the two sampling strategies. The model performance curve obtained by varying the training set size indicates that annotating more data is not likely to lead to substantial improvements in bragging prediction (see Figure 3 in Appendix).
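For reference, both agreement measures can be computed as in the sketch below. It assumes the simplest setting (nominal labels, exactly two annotations per tweet, no missing values) and uses the standard coincidence-matrix form of Krippendorff’s Alpha; the function names and toy labels are illustrative, and this is not the authors’ evaluation script.

```python
from collections import Counter


def percentage_agreement(a, b):
    """Percentage of items on which the two annotations agree."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def krippendorff_alpha_nominal(a, b):
    """Krippendorff's Alpha for nominal labels with two annotations per item.

    alpha = 1 - D_o / D_e, where D_o is the observed and D_e the expected
    disagreement, both computed from the coincidence matrix.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = 2 * len(a)                          # total number of pairable values
    value_counts = Counter(a) + Counter(b)  # n_c: how often each label is used
    # Each disagreeing item adds two off-diagonal coincidences: (c, k) and (k, c).
    observed = 2 * sum(x != y for x, y in zip(a, b))
    expected = sum(
        value_counts[c] * value_counts[k]
        for c in value_counts
        for k in value_counts
        if c != k
    )
    if expected == 0:  # only one label ever used: no disagreement is possible
        return 1.0
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Made-up toy annotations (1 = bragging, 0 = not bragging).
    ann_1 = [0, 0, 1, 1, 0]
    ann_2 = [0, 0, 1, 0, 0]
    print(percentage_agreement(ann_1, ann_2))
    print(krippendorff_alpha_nominal(ann_1, ann_2))
```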

Class | Self-disclosure (%) | Non-self-disclosure (%)
Bragging | 31.63 | 68.37
Non-bragging | 24.04 | 75.96
Achievement | 31.65 | 68.35
Action | 27.57 | 72.43
Feeling | 31.82 | 68.18
Trait | 36.69 | 63.31
Possession | 29.07 | 70.93
Affiliation | 35.29 | 64.71
Non-bragging | 24.04 | 75.96
Total | 24.93 | 75.07
Table 3: Percentages of self-disclosure class across bragging classes.

3.4 Self-disclosure in Bragging

We conduct an analysis of the relationship between self-disclosure and bragging as they are closely related. We use the self-disclosure lexicon by bak2014self to assign each tweet in our data set a label (i.e. self-disclosure or non-self-disclosure). The percentages of self-disclosure across each bragging type are shown in Table 3. We also used self-disclosure models as a predictor for bragging in early experimentation, but the results are omitted due to the low performance.

3.5 Data Splits

We use the keyword sampled data for training and the random data for development and testing (in the ratio of 2:8) because the latter is representative of the real distribution of tweets (see Table 2).
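A 2:8 development/test split of the randomly sampled tweets could look like the sketch below. The text does not state how the split was drawn, so the seeded random shuffle, the rounding and the function name are assumptions made for illustration.

```python
import random


def split_dev_test(items, dev_fraction=0.2, seed=0):
    """Shuffle with a fixed seed and split into (dev, test) at `dev_fraction`.

    Assumption: a plain random split; the paper does not say whether the
    actual split was random or stratified by class.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_fraction)
    return shuffled[:n_dev], shuffled[n_dev:]


if __name__ == "__main__":
    # 3,314 is the size of the randomly sampled set in Table 2.
    dev, test = split_dev_test(range(3314))
    print(len(dev), len(test))
```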

4 Predictive Models

We evaluate vanilla transformer-based models (vaswani2017attention) and further leverage external linguistic information to improve them.

BERT, RoBERTa and BERTweet

We experiment with Bidirectional Encoder Representations from Transformers (BERT; devlin-etal-2019-bert), RoBERTa (liu2019roberta) and BERTweet (nguyen-etal-2020-bertweet). RoBERTa is a more robust variant of BERT that obtains better results on a wide range of tasks. BERTweet is pre-trained on English tweets using RoBERTa as basis and achieves better performance on Twitter tasks (nguyen-etal-2020-bertweet). We fine-tune BERT, RoBERTa and BERTweet for binary and multiclass bragging prediction by adding a classification layer that takes the [CLS] token as input.

BERTweet with Linguistic Features

We inject linguistic knowledge that could be related to bragging into the BERTweet model with a method similar to the one proposed by jin-aletras-2021-modeling, which was found to be effective on complaint severity classification, a related pragmatics task (early experimentation with simply concatenating or applying attention resulted in lower performance). The method is adapted from rahman2020integrating, which integrates multimodal information (e.g. audio, visual) in transformers using a fusion mechanism called Multimodal Adaptation Gate (MAG). MAG integrates multimodal information into text representations in transformer layers using an attention gating mechanism that controls the influence of each modality. We first expand the vectors of linguistic information to a size comparable to the embeddings fed to the pre-trained transformer. We then use MAG to combine contextual and linguistic representations after the embedding layer of the transformer, similar to rahman2020integrating. The output is sent to a pre-trained BERTweet encoder for fine-tuning, followed by an output layer.
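The gating idea can be illustrated with a small dependency-free sketch for a single extra feature vector. It follows our reading of the gate in rahman2020integrating (a ReLU gate over the concatenated vectors, a gated displacement, and a norm-based scaling factor); the parameter names, the toy weights and the omission of the layer normalization and dropout that normally follow are all assumptions, so this should not be taken as the exact implementation used here.

```python
import math


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def mag_shift(z, f, w_gate, b_gate, w_feat, b_feat, beta=1.0, eps=1e-6):
    """Shift one token embedding `z` using a projected linguistic vector `f`.

    z, f   : vectors of size d (f is already projected to size d)
    w_gate : d x 2d matrix applied to the concatenation [z; f]
    w_feat : d x d matrix applied to f
    beta   : hyperparameter capping how far z may be displaced
    """
    # Gate: how much of the linguistic signal to let through per dimension.
    gate = [max(0.0, g + b) for g, b in zip(matvec(w_gate, z + f), b_gate)]
    # Displacement vector built from the gated linguistic features.
    h = [g * p + b for g, p, b in zip(gate, matvec(w_feat, f), b_feat)]
    # Scale the displacement so it cannot dominate the text embedding.
    alpha = min(l2_norm(z) / (l2_norm(h) + eps) * beta, 1.0)
    return [zi + alpha * hi for zi, hi in zip(z, h)]


if __name__ == "__main__":
    d = 2
    zeros_gate = [[0.0] * (2 * d) for _ in range(d)]  # toy weights, gate == bias
    identity = [[1.0, 0.0], [0.0, 1.0]]
    shifted = mag_shift([3.0, 4.0], [1.0, 0.0], zeros_gate, [1.0, 1.0],
                        identity, [0.0, 0.0])
    print(shifted)
```

In a real model the same tweet-level feature vector would be applied to every token embedding, and the weights would be learned during fine-tuning.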

We experiment with these linguistic features:

  • NRC: The NRC word-emotion lexicon contains a list of English words mapped to ten categories related to emotions and sentiment (mohammad2013crowdsourcing). We represent each tweet as a 10-dimensional vector where each element is the proportion of tokens belonging to each category.

  • LIWC: Linguistic Inquiry and Word Count (pennebaker2001linguistic) is a dictionary-based approach to count words in linguistic, psychological and topical categories. We use LIWC 2015 to represent each tweet as a 93-dimensional vector.

  • Clusters: We use Word2Vec clusters proposed by preoctiuc2015studying to represent each tweet as a 200-dimensional vector over thematic subjects.
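The lexicon-based representations above are proportion vectors. A minimal sketch for the NRC case follows; the ten category names are those of the NRC lexicon, while the two lexicon entries and the function name are made up for illustration and are not taken from the actual resource.

```python
NRC_CATEGORIES = [
    "anger", "anticipation", "disgust", "fear", "joy",
    "sadness", "surprise", "trust", "negative", "positive",
]


def lexicon_proportions(tokens, lexicon, categories=NRC_CATEGORIES):
    """Return, per category, the proportion of tokens that belong to it.

    `lexicon` maps a word to the categories it is listed under.
    """
    counts = {category: 0 for category in categories}
    for token in tokens:
        for category in lexicon.get(token, ()):
            if category in counts:
                counts[category] += 1
    total = len(tokens)
    return [counts[c] / total if total else 0.0 for c in categories]


if __name__ == "__main__":
    # Hypothetical entries standing in for the real lexicon.
    toy_lexicon = {"proud": ["joy", "positive"], "happy": ["joy", "positive"]}
    print(lexicon_proportions(["so", "proud", "and", "happy"], toy_lexicon))
```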

5 Experimental Setup

Text Processing

We pre-process text by lowercasing, replacing all username mentions with the placeholder token @USER and emojis with words using demojize (https://pypi.org/project/emoji/). We also remove hashtags that are used as keywords (e.g. #brag) in data collection. Finally, we tokenize the text using TweetTokenizer (https://www.nltk.org/api/nltk.tokenize.html).
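The pipeline can be approximated with the standard library alone, as in the sketch below. Two steps are deliberately swapped out: emoji conversion is omitted, and a simple regular expression stands in for TweetTokenizer, so the token boundaries will not match the real tokenizer in every case. The function name and the regular expressions are assumptions.

```python
import re

MENTION_RE = re.compile(r"@\w+")
# Rough stand-in for a tweet tokenizer: placeholders, hashtags, words
# (with an optional apostrophe part) and single punctuation marks.
TOKEN_RE = re.compile(r"@USER|#\w+|\w+(?:'\w+)?|[^\w\s]")


def preprocess(text, sampling_hashtags=()):
    """Lowercase, mask mentions, drop sampling hashtags, then tokenize."""
    text = text.lower()
    text = MENTION_RE.sub("@USER", text)
    for tag in sampling_hashtags:
        # The trailing \b keeps "#bragging" intact when removing "#brag".
        text = re.sub(re.escape(tag.lower()) + r"\b", " ", text)
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(preprocess("@Jim Finally got the offer! #brag", ["#brag"]))
```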

Model | Binary: Precision | Binary: Recall | Binary: Macro-F1 | 7-class: Precision | 7-class: Recall | 7-class: Macro-F1
Majority Class | 46.42 | 50.00 | 48.15 | 13.26 | 14.29 | 13.76
LR-BOW | 54.53 | 63.16 | 52.68 | 18.52 | 20.02 | 18.59
BiGRU-Att | 55.93 ± 1.53 | 51.41 ± 0.47 | 51.29 ± 1.40 | 18.32 ± 0.10 | 26.16 ± 3.41 | 19.19 ± 0.31
BERT | 64.24 ± 1.40 | 65.91 ± 3.32 | 64.58 ± 0.80 | 24.16 ± 1.15 | 39.66 ± 4.84 | 26.85 ± 0.81
RoBERTa | 66.53 ± 0.29 | 68.43 ± 2.05 | 67.34 ± 1.02 | 28.99 ± 0.61 | 45.90 ± 3.59 | 32.82 ± 0.65
BERTweet | 70.43 ± 0.16 | 72.62 ± 0.89 | 71.44 ± 0.43 | 30.82 ± 0.75 | 47.25 ± 2.68 | 34.86 ± 0.79
BERTweet-NRC | 72.89 ± 1.26 | 70.95 ± 0.96 | 71.80 ± 0.49 | 30.95 ± 0.54 | 47.98 ± 1.12 | 34.36 ± 0.19
BERTweet-LIWC | 72.65 ± 0.20 | 72.21 ± 0.43 | 72.42 ± 0.31 | 32.06 ± 2.42 | 46.68 ± 7.45 | 34.83 ± 0.79
BERTweet-Clusters | 71.26 ± 2.27 | 72.53 ± 1.91 | 71.60 ± 0.21 | 32.51 ± 1.36 | 46.97 ± 2.36 | 35.95 ± 0.54
Table 4: Macro precision, recall and F1-Score (± std. dev. for 3 runs) for bragging prediction: Bragging Classification (binary) and Bragging and Type Classification (7 classes). Significance of improvements over BERTweet is assessed with a t-test (p < 0.05).

Baselines

Majority Class:

As a first baseline, we label all tweets with the label of the majority class.

LR-BOW:

We train a Logistic Regression with bag-of-words using L2 regularization.

BiGRU-Att:

We also train a bidirectional Gated Recurrent Unit (GRU) network (cho-etal-2014-learning) with self-attention (tian2018attention). Tokens are first mapped to GloVe embeddings (pennington2014glove) and then passed to a bidirectional GRU. Subsequently, its output is passed to a self-attention layer and an output layer for classification.
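The majority class baseline is a useful sanity check for the macro-averaged metrics. The sketch below computes macro precision, recall and F1 (the unweighted mean of per-class scores) for a classifier that always predicts ‘Not Bragging’. As an approximation, it uses the combined dev/test counts from Table 2 (237 bragging, 3,077 not bragging) rather than the test portion alone; the function name is illustrative.

```python
def macro_prf(gold, pred, labels):
    """Macro precision, recall and F1 in percent (unweighted mean over labels)."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return tuple(100 * sum(scores) / k for scores in (precisions, recalls, f1s))


if __name__ == "__main__":
    gold = [0] * 3077 + [1] * 237   # 0 = not bragging, 1 = bragging
    pred = [0] * len(gold)          # always predict the majority class
    p, r, f = macro_prf(gold, pred, [0, 1])
    print(round(p, 2), round(r, 2), round(f, 2))  # 46.42 50.0 48.15
```

Under this approximation the output agrees with the Majority Class row of Table 4, which illustrates why macro-averaging is informative on such imbalanced data: the never-predicted class contributes a score of zero.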

Hyperparameters

For BiGRU-Att, we use 200-dimensional GloVe embeddings (pennington2014glove) pre-trained on Twitter data. The hidden size is 128, selected from {64, 128, 256, 512}, with dropout 0.2, selected from {0.2, 0.5}. We use the Adam optimizer (kingma2015method) with learning rate 1e-2, selected from {1e-3, 1e-2, 1e-1}. For BERT, RoBERTa and BERTweet, we use the base cased models (12 layers and 109M parameters, 12 layers and 125M parameters, and 12 layers and 135M parameters respectively) and fine-tune them with learning rate 3e-6, selected from {1e-4, 1e-5, 5e-6, 3e-6, 1e-6}. For BERTweet with linguistic features, we project these to vectors of size 200, 400 and 768, selected from {10, 93, 200, 400, 600, 768}. For MAG, we use the default parameters from rahman2020integrating. For multi-class classification, we apply class weighting due to the imbalanced data and set the number of training epochs to 40, selected from {15, 20, 25, 30, 35, 40, 45, 50, 55, 60}. The maximum sequence length is set to 50, covering 95% of tweets in the training set. We use a batch size of 32.
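The exact class weighting scheme is not specified in the text. One common choice, shown here purely as an assumption, is inverse-frequency ("balanced") weighting, where class c receives the weight N / (K · n_c) for N training examples, K classes and n_c examples of class c. The counts are the training set counts from Table 2.

```python
# Training set counts per class, taken from Table 2.
TRAIN_COUNTS = {
    "Achievement": 166, "Action": 127, "Feeling": 39, "Trait": 91,
    "Possession": 58, "Affiliation": 63, "Not Bragging": 2838,
}


def balanced_class_weights(counts):
    """Inverse-frequency weights: N / (K * n_c).

    Assumption: this 'balanced' heuristic is one possible scheme, not
    necessarily the one used in the experiments.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    for label, weight in balanced_class_weights(TRAIN_COUNTS).items():
        print(f"{label:13s} {weight:.3f}")
```

With these weights every class contributes the same total weight to the loss, so the rare ‘Feeling’ class is weighted far more heavily per example than ‘Not Bragging’.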

Training and Evaluation

We train each model three times using different random seeds and report the mean Precision, Recall and F1 (macro). We apply early stopping during training based on the dev loss. The experiments with linguistic features are performed with the best pre-trained transformer in each of the two classification tasks.

6 Results

Binary Bragging Classification

Table 4 (left) shows the predictive performance of all models on predicting bragging (i.e. binary classification). Overall, BERTweet models with linguistic information achieve the best performance. The best vanilla transformer, BERTweet, performs substantially above the majority class baseline (+23.29 F1) and Logistic Regression (+18.76 F1). BERTweet (71.44 F1) performs better than BERT (64.58 F1) and RoBERTa (67.34 F1), which illustrates the advantage of pre-training on English tweets for this task.

Performance is further improved (+0.98 F1) by using LIWC features alongside BERTweet, which indicates that injecting extra linguistic information benefits bragging identification. We speculate that this is because a bragging statement usually contains particular terms (e.g. personal pronouns, positive terms) or involves at least one certain aspect or theme (e.g. reward or property), which can be captured by linguistic features (e.g. features I and ACHIEVE in LIWC). Combining lexicons leads to worse results than using a single one, so we refrain from reporting these results for clarity.

Multi-class Bragging Classification

Table 4 (right) shows the predictive performance of all models on multiclass bragging type prediction including not bragging. We again find that pre-trained transformers substantially outperform the baselines: BERTweet improves over the majority class baseline by +21.10 F1 and over Logistic Regression by +16.27 F1. In line with the binary results, BERTweet (34.86 F1) performs best out of all vanilla transformers. BERTweet-Clusters outperforms all models (35.95 F1), which indicates that topical information helps to identify different types of bragging. Each bragging type might be particularly specialized to certain topics (e.g. weight loss in the ‘Achievement’ category).

Each cell gives a feature followed by its Pearson correlation r. The first two columns contrast bragging and non-bragging tweets; the remaining six columns are bragging types.
Bragging | Non-Bragging | Achievement | Action | Feeling | Trait | Possession | Affiliation
Unigrams and LIWC
AUTHENTIC 0.149 | CLOUT 0.109 | FOCUSPAST 0.200 | get 0.146 | happy 0.228 | APOSTRO 0.197 | own 0.211 | FAMILY 0.276
my 0.127 | YOU 0.089 | NUMBER 0.157 | trip 0.128 | POSEMO 0.218 | COGPROC 0.181 | buy 0.175 | CLOUT 0.271
I 0.122 | DISCREP 0.078 | ANALYTIC 0.153 | RELATIV 0.119 | [missing] 0.191 | FOCUSPRESENT 0.179 | bought 0.149 | proud 0.263
TONE 0.104 | NEGEMO 0.077 | finished 0.150 | ready 0.114 | blessed 0.190 | cute 0.159 | car 0.146 | rights 0.215
FOCUSPAST 0.102 | SOCIAL 0.076 | 3 0.133 | him 0.114 | AFFECT 0.184 | PRONOUN 0.157 | bedroom 0.144 | SOCIAL 0.209
WC 0.100 | FOCUSPRESENT 0.070 | WORK 0.132 | happen 0.105 | feels 0.176 | take 0.143 | extra 0.144 | amazing 0.205
RELATIV 0.090 | INFORMAL 0.056 | managed 0.130 | FOCUSFUTURE 0.105 | love 0.169 | COMPARE 0.143 | xr 0.142 | [missing] 0.197
TIME 0.081 | COGPROC 0.056 | over 0.129 | fun 0.102 | sunrise 0.166 | ANGER 0.138 | macbook 0.055 | law 0.185
during 0.078 | ANGER 0.056 | under 0.119 | gave 0.097 | weighted 0.162 | I 0.137 | new 0.139 | team 0.182
ACHIEVE 0.075 | just 0.054 | beat 0.112 | hours 0.096 | july 0.159 | if 0.137 | afford 0.139 | OTHERP 0.181
PREP 0.073 | your 0.052 | race 0.104 | before 0.095 | time 0.159 | SWEAR 0.134 | PERIOD 0.106 | words 0.164
managed 0.072 | IPRON 0.051 | office 0.103 | sitting 0.095 | truly 0.156 | am 0.133 | HOME 0.105 | teams 0.164
REWARD 0.069 | ? 0.043 | possible 0.103 | VERB 0.094 | BIO 0.147 | PPRON 0.132 | DASH 0.084 | #baseball 0.164
row 0.068 | not 0.038 | 5 0.101 | PREP 0.089 | CERTAIN 0.143 | me 0.130 | I 0.077 | fan 0.163
got 0.067 | why 0.037 | SIXLTR 0.100 | INGEST 0.085 | TONE 0.140 | look 0.122 | DISCREP 0.071 | MALE 0.160
POS (Unigrams and Bigrams)
PRP_VBD 0.104 | NNP 0.081 | CD_NNS 0.198 | DT_NNP 0.139 | RB_JJ 0.183 | VBP 0.252 | $_CD 0.161 | FW_, 0.164
VBD 0.093 | VB 0.061 | VBD 0.171 | VBP_TO 0.124 | VBP_IN 0.174 | PRP 0.193 | $ 0.130 | VB_VBD 0.161
CD_NNS 0.077 | RB_VB 0.056 | CD 0.164 | IN_: 0.117 | VB_RBR 0.161 | PRP_VBP 0.191 | NN_PDT 0.130 | CC_UH 0.159
PRP$ 0.074 | NNP_NNP 0.049 | NNS 0.145 | VBP_WP 0.116 | JJR_WRB 0.161 | VBP_JJ 0.162 | NNS_UH 0.122 | VBZ_DT 0.151
VBD_DT 0.062 | VBP_PRP 0.048 | VBD_DT 0.141 | NNP_UH 0.116 | RB_VBZ 0.146 | UH_DT 0.150 | SYM_: 0.114 | DT_RBS 0.146
NN_IN 0.061 | VBZ 0.039 | PRP_VBD 0.132 | NFP_NNP 0.116 | CC_JJ 0.143 | VBP_DT 0.150 | VBZ_JJ 0.110 | UH_NNP 0.145
IN_CD 0.060 | MD 0.035 | NN_IN 0.132 | NNP 0.116 | VBD_: 0.131 | RB_VB 0.149 | VB_PRP$ 0.109 | ._SYM 0.138
IN_PRP$ 0.060 | NNP_VBZ 0.033 | IN_CD 0.130 | NNP_NNS 0.114 | ._VBG 0.123 | MD 0.149 | PRP$_JJ 0.109 | NFP_CC 0.137
PRP$_NN 0.058 | RB_RB 0.031 | VBN 0.129 | TO_VB 0.109 | UH_WP 0.118 | MD_VB 0.134 | ._VBD 0.109 | PRP_PRP$ 0.136
VBD_PRP$ 0.057 | MD_PRP 0.031 | VB_JJR 0.109 | TO 0.107 | POS_RB 0.118 | CC_WP 0.131 | NN_PRP$ 0.106 | NN_NN 0.135
Table 5: Feature correlations including unigrams (lowercase), LIWC (uppercase), part-of-speech (POS) unigrams and bigrams with bragging and non-bragging tweets (left) and bragging tweets grouped in six types (right), sorted by Pearson correlation (r). All correlations are significant (two-tailed t-test).

7 Analysis

Linguistic Feature Analysis

We analyze the linguistic features, i.e. unigrams, LIWC and part-of-speech (POS) tags, associated with bragging and its types in all tweets of our data set. For this purpose, we first tag all tweets using the Twitter POS Tagger (derczynski2013twitter). Each tweet is represented as a bag-of-words distribution over POS unigrams and bigrams to reveal distinctive syntactic patterns of bragging and its types. For each unigram, LIWC and POS feature, we compute correlations between its distribution across posts and the label of the post. Then, we use the method introduced by schwartz2013personality to rank the features using univariate Pearson correlation, with feature counts normalized to sum to one for each tweet.
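The ranking procedure can be sketched for unigram features as follows. The sketch normalizes each feature count by tweet length, correlates the resulting values with a binary label and sorts by r; it omits the significance test, and the function names and toy tweets are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(docs, labels, top_k=15):
    """Rank features by correlation between relative frequency and label.

    docs   : list of token lists (one per tweet)
    labels : list of 0/1 labels (1 = the class of interest)
    """
    vocab = sorted({token for doc in docs for token in doc})
    scores = {}
    for feature in vocab:
        # Relative frequency, so each tweet's feature values sum to one.
        freqs = [doc.count(feature) / len(doc) if doc else 0.0 for doc in docs]
        scores[feature] = pearson(freqs, labels)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    toy_docs = [
        ["i", "finished", "my", "thesis"],
        ["why", "not", "you"],
        ["i", "got", "my", "offer"],
        ["why", "you"],
    ]
    toy_labels = [1, 0, 1, 0]
    for feature, r in rank_features(toy_docs, toy_labels, top_k=5):
        print(f"{feature:10s} {r:.3f}")
```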

Table 5 (left) presents the top 15 features from unigrams (lowercase) and LIWC (uppercase) and the top 10 features from POS unigrams and bigrams correlated with bragging and non-bragging tweets. We notice that the top words in the bragging category can be classified into (a) personal pronouns (e.g. my, I) that usually indicate the author of the bragging statement; (b) words related to time (e.g. FOCUSPAST, TIME, during); and (c) words related to a specific bragging target (e.g. RELATIV, ACHIEVE, REWARD, managed). These findings are in line with the indicators of positive self-disclosure by dayter2018self and bazarova2013managing. Furthermore, personal pronouns followed by a verb in past tense (PRP_VBD) are common in bragging (e.g. “I forgot what it’s like to be good at school. Today I finished a thing we were doing so fast that everyone around me started asking ME for help instead of the prof :’)”).

Table 5 (right) presents the top 15 features from unigrams (lowercase) and LIWC (uppercase) correlated with bragging tweets grouped in six types. We observe that Achievement statements usually involve verbs that are in past tense or indicate a result (e.g. FOCUSPAST, finished, beat). A POS pattern common in Achievement statements is a cardinal number followed by a plural noun (CD_NNS), consistent with its unigram and LIWC features (NUMBER, 3, 5) (e.g. “I made a total of 5 dollars from online surveys wooo”). It is worth noting that one of the prevalent LIWC features for Action is FOCUSFUTURE. This is because the user may brag about a planned action (e.g. “@USER You know what? I’m going to make some PizzaRolls Brag”). Most of the top words in Feeling express emotion or sensitivity (e.g. happy, blessed), which is consistent with the top POS feature, RB_JJ (e.g. absolutely chuffed, so happy). In the Trait category, top features are mostly pronouns (e.g. I, PRP, PRP_VBP) and verbs (e.g. VBP, VBP_JJ). Words that appear frequently in the Possession category are actions related to purchase (e.g. own, buy) and nouns related to a tangible object (e.g. car, bedroom). In addition, users usually show off the value of their possessions using statements that involve currency signs ($) or currency signs followed by a number ($_CD) (e.g. “I just signed a new three-year contract and I’ll be getting 235 anytime minutes per month. Plus, the company is going to throw in a phone for just $ 49 per month. I’ll bet you can’t beat that deal!”). Finally, top words in the Affiliation category involve positive feeling towards belonging to a group (e.g. proud, amazing) and nouns related to it (e.g. FAMILY, team).

Class Mean Median
Achievement 3.06 3.00
Action 0.91 0
Feeling 0.50 0
Trait 2.38 2.00
Possession 2.00 0.50
Affiliation 5.50 2.00
Table 6: Mean and median Twitter favorites across bragging classes on a sample set of the data.

Bragging and Post Popularity

We also analyze the association between bragging posts and the number of favorites/retweets they receive from other users. Similar to the previous linguistic feature analysis, we use univariate Pearson correlation to compute the correlations between the log-scaled number of favorites/retweets of each tweet and its label (i.e. bragging or non-bragging), controlling for the numbers of followers and friends of the user who posted the tweet. Our results show that the number of favorites is positively correlated with bragging (see Appendix Figure 5) while there is no correlation between bragging and the number of retweets.
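One way to compute a correlation while controlling for other variables is a partial correlation. The sketch below uses the standard recursive formula and is only an illustration under stated assumptions: the log scaling is taken to be log(1 + count), the helper names are hypothetical, and the toy values are invented rather than drawn from the data set.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant input: correlation is undefined, report 0
    return cov / math.sqrt(vx * vy)

def partial_corr(x, y, controls):
    """Partial correlation of x and y given a list of control variables,
    computed with the recursive first-order formula."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy values invented for illustration
favorites = [0, 3, 1, 10, 2, 7]
labels = [0, 1, 0, 1, 0, 1]          # 1 = bragging, 0 = non-bragging
followers = [120, 300, 150, 480, 200, 260]
friends = [510, 900, 600, 700, 950, 640]

log_favorites = [math.log1p(f) for f in favorites]  # assumed log scaling
r = partial_corr(log_favorites, labels, [followers, friends])
```

A positive value of `r` would indicate that bragging tweets tend to receive more favorites than non-bragging ones from users with comparable follower and friend counts.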

We further explore the popularity of different bragging types. We analyze a random set of 443 tweets containing 56 bragging statements, where the numbers of followers and friends of the users are within a similar range: from 100 to 500 followers and from 500 to 1000 friends ( = 0.19, p < .01). We compute the mean and median Twitter favorites across the six bragging classes (see Table 6). We observe that bragging statements about Affiliation, such as family members or sports teams, are more likely to receive a considerable number of favorites, with a mean of 5.5. For example, 14 users favorited the tweet This maybe is a little, but I’m SO proud of my research group. We represent so many different personality types, cultures, ways of thinking, etc, and every single member of my lab (all 21 of them). We speculate this is because praising the group that one belongs to instead of oneself as a bragging strategy enables users to be perceived as more likeable. Furthermore, bragging about Achievement is generally marked as favorite by other users, with a median of 3, where bigger achievements in the content such as job offers may receive more favorites (e.g. the tweet Scored 80 % on my thesis. Rather proud of that given the circumstances: new baby; pandemic; late topic change due to lockdown; minimal uni support because of furloughs; and an international move. was marked as favorite 15 times).
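The per-class statistics reported in Table 6 can be reproduced in spirit with a short sketch. It assumes inclusive bounds for the follower and friend ranges and a simple list-of-dicts layout; the field names and the toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

def favorites_by_class(rows, followers=(100, 500), friends=(500, 1000)):
    """Mean and median favorites per bragging class, restricted to users
    whose follower and friend counts fall inside the given ranges."""
    groups = defaultdict(list)
    for row in rows:
        in_followers = followers[0] <= row["followers"] <= followers[1]
        in_friends = friends[0] <= row["friends"] <= friends[1]
        if in_followers and in_friends:
            groups[row["label"]].append(row["favorites"])
    return {label: (mean(v), median(v)) for label, v in groups.items()}

# Toy rows invented for illustration
rows = [
    {"label": "Achievement", "favorites": 3, "followers": 200, "friends": 600},
    {"label": "Achievement", "favorites": 4, "followers": 350, "friends": 800},
    {"label": "Achievement", "favorites": 100, "followers": 5000, "friends": 600},  # filtered out
    {"label": "Affiliation", "favorites": 14, "followers": 150, "friends": 700},
    {"label": "Affiliation", "favorites": 2, "followers": 400, "friends": 900},
    {"label": "Affiliation", "favorites": 0, "followers": 100, "friends": 500},
]
stats = favorites_by_class(rows)
```

Reporting the median next to the mean matters here because a single highly favorited tweet can dominate the mean of a small class.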

Class Confusion Analysis

Figure 1: Confusion matrix of annotator agreement on seven bragging categories.
Figure 2: Confusion matrix of the best performing model on multi-class bragging classification, i.e. BERTweet-Clusters.

Figure 1 presents the confusion matrix of human agreement on seven classes normalized over the actual values (rows). We observe that Non-bragging (97%), Achievement (81%) and Action (78%) have high agreement, consistent with the class frequency. Affiliation (77%), Possession (76%) and Trait (72%) have comparable percentages as these are easily associated with a bragging target or group. The Feeling category has the lowest percentage, mostly because of confusion with the Action category. By definition, neither type is associated with a concrete outcome, and the Feeling class covers feelings that are themselves linked to an action. This makes it harder to decide whether the author is bragging about the action or about the feeling associated with it. The next most frequent confusion is between Possession and Achievement, which usually arises when a tangible possession is involved and the annotators disagree on whether the author was bragging about the actual possession or the action that led to the author obtaining that possession (e.g. @USER I just got some stealth 300 easily the best headset I’ve ever had going from astro to turtle beach was a night and day difference).
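Normalizing a confusion matrix over the actual values means that every row is divided by its own total, so each row sums to one. A minimal sketch follows; the function name, the reduced class list and the toy labels are illustrative assumptions.

```python
from collections import Counter

def row_normalized_confusion(actual, predicted, classes):
    """Confusion matrix whose rows (actual classes) each sum to 1."""
    counts = Counter(zip(actual, predicted))
    matrix = []
    for a in classes:
        row_total = sum(counts[(a, p)] for p in classes)
        # An empty row stays all zeros instead of dividing by zero
        matrix.append([counts[(a, p)] / row_total if row_total else 0.0
                       for p in classes])
    return matrix

# Toy labels invented for illustration
classes = ["Non-bragging", "Action", "Feeling"]
actual = ["Non-bragging"] * 4 + ["Action"] * 2 + ["Feeling"] * 2
predicted = ["Non-bragging", "Non-bragging", "Non-bragging", "Action",
             "Action", "Non-bragging",
             "Feeling", "Action"]
cm = row_normalized_confusion(actual, predicted, classes)
```

With this normalization the diagonal entry of a row is the share of that class that was labeled consistently, which is how the per-class percentages above should be read.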

Figure 2 presents the confusion matrix between bragging type predictions from the best performing model, BERTweet-Clusters, on the multi-class classification task. First, we observe that the model is more likely to misclassify other classes as the dominant class, Non-bragging. Secondly, the most unambiguous classes are Non-bragging (87%) and Achievement (52%), which is in line with human agreement. Also, the model is good at identifying Trait (50%) and Possession (46%) due to the particular bragging targets (e.g. personalities, skills or tangible objects). Furthermore, we notice that the percentages of Action (31%) and Feeling (37%) are low. We speculate this is because they share more similarities with other classes (e.g. involving actions). This might also explain the high percentage of misclassified data points between Action and Achievement, and between Feeling and Action. Lastly, the model often confuses Affiliation with Feeling, likely because the terms that express positive feelings (e.g. ‘proud’) also appear frequently in Affiliation (see Table 5).

Error Analysis

Finally, we perform an error analysis to examine the behavior and limitations of our best performing models (i.e. BERTweet-LIWC for binary classification and BERTweet-Clusters for multi-class classification) and identify pathways to improve the task modeling.

We start with the binary bragging classification. We observe that non-bragging tweets containing positive sentiment are easily misclassified as bragging. Even if such tweets involve something valued positively by the author, their purpose is usually to express a recommendation, compliment or appreciation to others:

T1: @USER paid for my new bottle of vodka & I Love Her with all my heart

Another frequent error happens when non-bragging tweets contain popular bragging targets such as achievement-oriented (e.g. weight loss, marathon) or possession-oriented (e.g. car, electronics):

T2: 4 spaces left on my budget weight loss program. £ 5 a week!???

Bragging often goes beyond word use and requires a deep understanding of the context to determine the label. For example, common terms such as first, finally, just often appear in both non-bragging (T3) and bragging (T4) tweets:

T3: just cleaned my cats’ toilets
T4: It happened again! I just completed 30 minutes of meditation with @USER. Just sitting and resting in presence.

Models also fail to detect bragging when it is indirect or contains no typical trigger terms, in which case they must rely on pre-training to contextualize the statement:

T5: 9 hr drives feel like nothing now lol

Some bragging statements use additional mitigation strategies, e.g. re-framing the bragging statement as irony, as a complaint or invoking praise from a third party:

T6: I find it strange how I was always the weird one in school and irl but online people think im cool for some reason

Finally, we highlight some representative examples of model confusion between bragging types. One source of confusion is whether the user’s actions lead to a concrete result. In this example the model predicted Action, but the actual label is Achievement:

T7: not to appropriate the gang escapes culture but me n my parents just did an escape room n actually got out?

Another example is an Action misclassified as Possession. This usually happens when a common phrase indicative of a certain type of bragging (e.g. a new dish) is invoked as part of an action:

T8: I had a new dish "egusi" it’s so damn good! Love Nigerian food!

Other errors occur when multiple types of bragging are present (e.g. feeling and action) but the label expresses the more salient type, such as the feeling highlighted in this example:

T9: Literally had the best time with the girls last night, don’t think I’ve drank that much in my life?

8 Conclusion

We presented the first computational approach to analyzing and modeling bragging as a speech act along with its types in social media. We introduced a publicly available annotated data set in English collected from Twitter. We experimented using transformer models combined with linguistic information on binary bragging and multi-class bragging type prediction. Finally, we presented an extensive analysis of features related to bragging statements and an error analysis of the model predictive behavior. In future work, we plan to study the extent to which bragging is used across various locations (sanchez-villegas-etal-2020-point; sanchez-villegas-aletras-2021-point) and languages and how it is employed by users across contexts.

Acknowledgements

We would like to thank Ari Silburt, Danae Sánchez Villegas, Yida Mu, and all the anonymous reviewers for their valuable feedback.

Ethics Statement

Our work has received approval from the Ethics Committee of the Department of Computer Science at the University of Sheffield (No 037572) and complies with Twitter’s data policy for research (https://developer.twitter.com/en/developer-terms/agreement-and-policy).

Appendix A Impact of Multiple Annotations

Table 7 shows the performance of binary bragging classification of the best performing model (BERTweet-LIWC) on two different subsets of the test data: one annotated by a single annotator (2,130 tweets) and the other annotated by two or more annotators until consensus was reached (522 tweets). The same model yields similar macro F1 on the two subsets, although precision and recall differ between them. This suggests that there is no substantial quantitative difference between the data annotated by two or more annotators and the data annotated by a single annotator.

Data set Precision Recall Macro-F1
Single Annotation 73.81 71.78 72.74
Multiple Annotations 68.24 83.31 73.23
Entire set 72.92 72.81 72.86
Table 7: Precision, Recall and macro F1-Score obtained by the same best performing model (BERTweet-LIWC) for binary classification on two different subsets of the test data, annotated either by a single annotator or by multiple annotators, and on the entire test set.
Figure 3: Learning curve for performance across each bragging type.
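The scores in Table 7 are macro-averaged. The sketch below shows one common way to compute them, assuming that macro F1 is the unweighted mean of the per-class F1 scores rather than the harmonic mean of macro precision and macro recall; the function name and the toy labels are hypothetical.

```python
def macro_prf(actual, predicted):
    """Macro-averaged precision, recall and F1 over all observed classes."""
    classes = sorted(set(actual) | set(predicted))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(actual, predicted))
        tp = sum(a == c and p == c for a, p in pairs)
        fp = sum(a != c and p == c for a, p in pairs)
        fn = sum(a == c and p != c for a, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy labels invented for illustration: 1 = bragging, 0 = non-bragging
actual = [1, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 1]
macro_p, macro_r, macro_f1 = macro_prf(actual, predicted)
```

Because every class contributes equally to the average, the minority bragging class weighs as much as the dominant non-bragging class, which is why macro averaging suits an imbalanced data set.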

Appendix B Guidelines and Annotation Interface

Figure 4: Screenshot of annotation interface on our platform.

Thank you for your participation in our study. During our experiment, we will ask you to read and evaluate a tweet which may include a bragging or a praisal statement.

Instructions

You need to identify whether or not a tweet includes a bragging statement.

Bragging

Bragging is a speech act which explicitly or implicitly attributes credit to the speaker for some ‘good’ (possession, accomplishment, skill, etc.) which is positively valued by the speaker and the potential audience. As such, bragging includes announcements of accomplishments, explicit positive evaluations of some aspect of self and other types defined below. A bragging statement should clearly express what the author is bragging about (i.e. the target of bragging).

If the tweet is about bragging, decide which of the following categories the tweet belongs to:

Achievement

The act of bragging is about a concrete outcome obtained as a result of the tweet author’s actions. These results may include achievements, awards, products, and/or positive change in a situation or status (individually or as part of a group).

Examples:

  • Finally got that offer! Whoop!!

  • Our team won the championship

Action

The act of bragging is about a past, current or upcoming action of the user that does not have a concrete outcome.

Examples:

  • Hanging at Buffalo Wild Wings with @user for the #ILLvsASU game. #BraggingRights

  • Guess what! I met Matt Damon today!

Feeling

The act of bragging is about a feeling that is expressed by the user for a particular situation.

Example:

  • Im so excited that I am back on my consistent schedule. I am so excited for a routine so I can achieve my goals!!

Trait

The act of bragging is about a personal trait, skill or ability of the user.

Examples:

  • To be honest, I have a better memory than my siblings

  • I look great after losing weight

Possession

The act of bragging is about a tangible object belonging to the user.

Example:

  • Look at our Christmas tree! I kinda just wanna keep it up all year!

Affiliation

The act of bragging is about being part of a group (e.g. family, team, org etc.) and/or a certain location, including living in a city, neighborhood or country, being enrolled in a university, supporting a team, working in a company etc.

Example:

  • My daughter got first place in the final exam, so proud of her!

Not bragging

If the tweet is not about bragging, then select "No. This is not a bragging statement."

Examples:

  • One of the best books I’ve ever read

  • hahahahahaha

  • You gotta admit, that’s some mighty awesome aim!

  • Vote in the poll below for your book of choice!

  • I think this is great

  • dear everyone announcing they are at "Friendsgiving", we get it, you have friends

  • In case you didn’t know, Adam Silver is in charge

  • I feel terrible

  • I don’t know why you are celebrating

  • This is exactly what is going on!

  • I love you

Select "No. This is not a bragging statement", also in cases when:

  • there is not enough information to determine that the tweet is about bragging

  • the bragging statements belong to someone other than the author of the tweet

  • the relationship between the author and the people/things mentioned in the tweet is unknown:

    • This kid is smart

    • That was an amazing stream

    • Kudos to mike Dunleavy! It’s hard to get a franchise record ANYTHING in Chicago

  • the post is about the act of bragging:

    • We want to hear you brag!

    • Trump isn’t Bragging anymore as his tradewar hits the stockmarket hard

    • Dudes are getting too cocky these days. Them lil labels and that dar don’t impress everyone. brag differently

Not available

Finally, if the tweet is not available or displayed, or is in a language other than English, please select the "Not available" option.

Other considerations

Please verify the content of hashtags as these may give clues towards the category of the tweet. The judgment should be made only based on the given content of the tweet - please do not search the tweet on Twitter or online in order to identify additional context.

Figure 5: Pearson correlation between the number of Twitter favorites and bragging, controlling for the number of followers and friends. All correlations are significant at p < .01, two-tailed t-test.