Understanding Psycholinguistic Behavior of predominant drunk texters in Social Media

05/28/2018 ∙ by Suman Kalyan Maity, et al. ∙ Microsoft ∙ Northwestern University

In the last decade, social media has evolved into one of the leading platforms to create, share, or exchange information; it is commonly used as a way for individuals to maintain social connections. In this online digital world, people post texts or pictures to express their views socially and create user-user engagement through discussions and conversations. Thus, social media has established itself as a carrier of signals relating to human behavior. One can easily build a network of user characteristics by scraping someone's social media profiles. In this paper, we investigate the potential of social media in characterizing and understanding predominant drunk texters from the perspective of their social, psychological and linguistic behavior as evident from the content generated by them. Our research aims to analyze the behavior of drunk texters on social media and to contrast it with that of non-drunk texters. We use Twitter to obtain the sets of drunk texters and non-drunk texters and show that we can classify users into these two sets using various psycholinguistic features with an overall average accuracy of 96.78%, with very high precision and recall. Note that such an automatic classification can have far-reaching impact: (i) on health research related to addiction prevention and control, and (ii) in eliminating abusive and vulgar content from Twitter, borne by the tweets of drunk texters.




I Introduction

Alcohol consumption has serious implications for an individual’s health. In 2012, 5.9% of all global deaths (7.6% for men and 4.0% for women) were attributed to alcohol consumption, and the number is increasing over time. In the US alone, nearly 88,000 people (approximately 62,000 men and 26,000 women) die from alcohol-related causes yearly, making it the fourth leading preventable cause of death in that country (http://1.usa.gov/1hcR6dX). In addition to causing traumatic death and injury, alcohol consumption also leads to chronic liver disease, cancers, acute alcohol poisoning, and fetal alcohol syndrome. Alcoholism and other health related issues like smoking are known to be influenced by one’s social environment [1]. With the increase in usage of online social media as a preferred medium of communication, social media has become a diagnostic tool for studying human behavior. According to the Pew Research Center, as of January 2014, 74% of online adults use social networking sites; the number is more than 80% for individuals under the age of 50. Reports published by the Centers for Disease Control and Prevention (CDC) (http://1.usa.gov/23PMj4F) also show a prevalence of heavy drinkers/smokers in the said age group. This suggests that social media is a viable platform to study alcoholic users, and the interactions (exchange of messages, posts etc.) on social media have opened up a research corridor for observing and understanding individuals’ psychological states and their social environment. It is very important to identify how these characteristics vary dynamically for different human behaviors. It will also be quite informative to examine how different characteristics vary demographically (sex, age, region etc.) and across time frames such as days of the week (weekdays vs weekends), parts of the month (start of the month vs end of the month) or hours of the day (morning vs work hours vs evening vs late night).
Demographic patterns for psychogenic people, predominant drunkers and other such groups can differ from those of other users. For example, by tracking social media we can identify predominant drunk people’s suicidal tendencies or an upcoming change in behavior, so that situations can be handled accordingly. Thus we can use social media as an important medical diagnostic system and develop a predominant drunker identification model.

In this paper, we investigate how social media language usage and interactions can be used to characterize and understand drunk texters. Subsequently, we leverage the behavioral, social, psychological and linguistic aspects of Twitter users to propose a classification framework that automatically identifies drunk texters. The automatic identification of drunk texters is important because these users can then be targeted by communities whose mission is to address alcohol abuse and help alcoholics quit their addiction. Also, as these users tend to post abusive content on social media under the influence of alcohol, our automatic identification framework can be used to enrich the process of filtering abusive content from the media.

II Related Work

There have been several works on health and social media. Joshi et al. [2] propose a computational framework for distinguishing drunk tweets from non-drunk tweets. Tamersoy et al. [3] study abstinence from smoking and drinking. They use linguistic features of the content shared by the users as well as the network structure of their social interactions to distinguish between short-term and long-term abstinence. Murnane and Counts [4] examine the cessation process of smoking. Many past research works focus on the relationship of alcohol abuse with human aggression [5], crime [6] and suicide [7].

Strapparava and Mihalcea [8] perform a computational analysis of the language of drug users when talking about their drug experiences. Cameron et al. [9] develop a web platform (PREDOSE), focusing on epidemiological study of prescription and related drug abuse practices using social media (e.g., online forums). Paul and Dredze [10, 11] have developed a multidimensional latent text model to capture orthogonal factors that correspond to drug type, delivery method (smoking, injection, etc.), and aspect (chemistry, culture, effects, health, usage). Coyle et al. [12] classify and characterize different kinds of drug use experiences, using a random-forest classifier over 1000 reports of 10 drugs from the drug information website Erowid.org (manually identified subsets of words differentiated by drugs).

On the other hand, there exist several works that try to establish the role of online social media in an alcoholic’s life: how it influences the alcohol use of adult users [13], and how people use social networks to display their drunk behaviors [14]. West et al. [15] examine the extent to which individuals tweet about problem drinking, and whether such tweets correspond with time periods when problem drinking is likely to occur.

Some studies focus on extracting various sociological aspects from online social media. Coppersmith et al. [16] analyze a broad range of mental health conditions in Twitter texts by identifying self-reported statements of diagnosis. Schwartz et al. [17] predict latent personal attributes including user demographics, online personality, emotions and sentiments from texts published on Twitter. Volkova et al. [18] explore emotion, sentiment and other personality types.

III Dataset Preparation

Our first step was to identify Twitter users who are drunk texters. To achieve this goal, we used the manually labelled tweet dataset mentioned in [19]. We then separately crawled the timelines of the posters of these tweets and filtered their tweets based on keywords. (Initial seed keywords such as ‘drunk’, ‘tipsy’, ‘intoxicated’ and ‘buzzed’ were collected from [15]; we later expanded the set with similar keywords from WordNet such as ‘booze’ and ‘juiced’, giving a final wordset of 61 keywords.) The filtered tweets were then manually labelled as drunk texts or not by 3 of the authors. We considered only those tweets that were tagged as drunk texts unanimously by all of them. After this manual labeling, we kept those users who had posted at least 5 drunk tweets. In total, we had 278 drunk texters. We then prepared the dataset corresponding to the non-drunk texters, defined as users who never posted any ‘drunk’ related tweet, i.e., none of their tweets contains any word from the 61-keyword wordset. We used the Twitter 1% random sample from January 2014 to obtain a set of users who did not have any tweets containing any of the keywords related to alcohol consumption. We chose 278 such non-drunk texters from this set in order to keep both sets comparable. The following example tweets depict that the user is a drunk texter.

  • I know its Saturday but I’m trying to get roofied drunk

  • Gotta say, my spelling’s been pretty on-point considering how drunk I’ve been tonight

  • Alcohol and weed are like the mom and dad I always wanted
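The user-selection rule described above (unanimous agreement among the annotators, then at least 5 drunk tweets per user) can be sketched as follows. This is only an illustration under assumed data structures: the function name, the `(user_id, labels)` tuple layout and the toy data are hypothetical and not taken from the paper.

```python
from collections import Counter

MIN_DRUNK_TWEETS = 5  # threshold stated in the text


def select_drunk_texters(labelled_tweets, min_tweets=MIN_DRUNK_TWEETS):
    """Return users with at least `min_tweets` unanimously labelled drunk tweets.

    `labelled_tweets` is an iterable of (user_id, labels) pairs, where
    `labels` holds one boolean per annotator (hypothetical layout).
    """
    counts = Counter()
    for user_id, labels in labelled_tweets:
        # Keep a tweet only if every annotator tagged it as a drunk text.
        if labels and all(labels):
            counts[user_id] += 1
    return {user for user, n in counts.items() if n >= min_tweets}


# Toy data: alice has 5 unanimous tweets, bob's labels are never unanimous,
# carol has only 4 unanimous tweets.
sample = (
    [("alice", (True, True, True))] * 5
    + [("bob", (True, True, False))] * 6
    + [("carol", (True, True, True))] * 4
)
selected = select_drunk_texters(sample)
```

Only `alice` passes both filters in this toy example.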

IV Behavioral, Psychological and linguistic aspects of the drunk texters

In this section, we focus on a comparative study of drunk texters and non-drunk texters based on their behavioral, psychological and linguistic aspects. Our empirical study is based on the content extracted from the tweets of the drunk and non-drunk texters. Each analysis was carried out separately for tweets posted on weekdays and on weekends, to differentiate between the lifestyles of the users during the week and at the weekend.

IV-A Health and food

Since health is one of the crucial aspects of well-being, people often share information related to health and food over social media. We empirically examine whether drunk texters and non-drunk texters post contrasting content related to health and food. Consumption of alcohol has adverse impacts on health, which can be long-term (impact on health over a period of time) or short-term (a hangover from last night or throwing up) (http://1.usa.gov/1d7aWk2), so drunk texters might share their experiences on Twitter. To capture the behavior of drunk texters and non-drunk texters with regard to health and food content sharing, we compiled a list of the most frequently used health and food related keywords on social media (http://bit.ly/200kea3); we then computed the fraction of health and food related keywords for both sets of users. Figures 1 and 2 show that drunk texters, in general, use more health and food related keywords in their tweets than non-drunk texters.

Fig. 1: Health
Fig. 2: Food
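The "fraction of keywords" measure used here and in the following subsections can be sketched as below. The text does not define the denominator, so this sketch assumes it is the total number of word tokens in a user's tweets; the tokenizer, the function name and the three-word keyword list are hypothetical stand-ins for the actual keyword lists.

```python
import re

TOKEN_RE = re.compile(r"[a-z']+")  # simple lowercase word tokenizer (assumption)


def keyword_fraction(tweets, keywords):
    """Fraction of word tokens across `tweets` that appear in `keywords`."""
    keywords = {k.lower() for k in keywords}
    total = 0
    hits = 0
    for text in tweets:
        for token in TOKEN_RE.findall(text.lower()):
            total += 1
            hits += token in keywords  # True counts as 1
    return hits / total if total else 0.0


# Hypothetical keyword list and tweets, for illustration only.
health_words = {"hangover", "sick", "headache"}
tweets = ["Worst hangover ever", "so sick today"]
frac = keyword_fraction(tweets, health_words)  # 2 keyword hits out of 6 tokens
```

Computing this value separately for weekday and weekend tweets of each user group would give the kind of comparison plotted in the figures.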

IV-B Stress

People tend to drink in response to stress; accordingly, exposure to tension-producing situations leads to increased drinking [20], so there is a high chance that drunk texters will communicate their stress while posting tweets. In general, stress levels are rising severely: a survey by the American Psychological Association portrays a picture of high stress and ineffective coping mechanisms that appear to be ingrained in our culture (http://bit.ly/1cz4n99). People might share the stressful situations they have been in, so non-drunk texters also have a decent chance of posting tweets expressing stress and anxiety.

The major sources of stress are listed as follows [21]:

  • Low Self-esteem

  • Inter-personal conflicts

  • Smoking

  • Financial difficulties

  • Family problems

To empirically study the stress related behavior of the drunk and the non-drunk texters, we gather a list of stress related keywords corresponding to each of the sources of stress mentioned above. Further, we compute the fraction of stress related keywords for both the drunk and non-drunk texters. Figure 3 shows the contrasting behavior between them and illustrates that, in general, non-drunk texters seem to experience more stress arising out of financial problems and low self-esteem, whereas drunk texters experience more stress due to inter-personal conflicts, smoking and family problems.

Fig. 3: Different sources of stress (y-axis values are scaled up by 10 times in case of financial and low self-esteem stress for better visualization.)

IV-C Swearing and abusing

Alcohol consumption is closely related to violent behavior [22, 23]. Swearing, being a verbal form of aggression, can serve as an indicator of aggressive behavior. We speculate that drunk texters in general are more likely to use swear words in their tweets because of relatively more violent behavior. To investigate whether this trend is also observed on Twitter, we compiled a list of swear related keywords used most frequently on social media and then computed the fraction of such keywords for both the drunk and the non-drunk texters. Figure 4 supports our speculation that drunk texters use a larger proportion of swear words in their tweets compared to non-drunk texters.

Fig. 4: Swear words

IV-D Money

Spending money and drinking alcohol are positively correlated [24]. Drunk texters might post about their spending on drinks, which might be a considerable share of their income. For the analysis, we compiled a list of money related keywords used most frequently on social media and then computed the fraction of money related keywords for both the drunk and the non-drunk texters. Figure 5 shows that drunk texters are more likely to use money related words in their tweets during weekdays than during weekends.

Fig. 5: Money

IV-E Sentiment analysis

The sentiments a user expresses greatly depend on the state of the user. We believe that a user’s tweets largely depend on the state in which the user is tweeting. People tend to speak differently when they are in a drunken state compared to a normal state, and the same should apply when the user is tweeting. We used a sentiment lexicon for the sentiment analysis. Figure 6 shows the behavior of the drunk and the non-drunk texters and illustrates that, in general, drunk texters have higher sentiment scores in their tweets than non-drunk texters.

Fig. 6: Sentiment scores
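A lexicon-based sentiment score of the kind referred to above might be computed roughly as follows. The text does not say which lexicon was used or how the scores were aggregated, so everything here is an assumption: the four-word lexicon is a toy placeholder, and the score is taken to be the mean, over a user's tweets, of the summed word polarities.

```python
def sentiment_score(tweets, lexicon):
    """Mean per-tweet polarity, where a tweet's polarity is the sum of the
    lexicon scores of its whitespace-separated words (assumed aggregation)."""
    if not tweets:
        return 0.0
    per_tweet = [
        sum(lexicon.get(token, 0) for token in text.lower().split())
        for text in tweets
    ]
    return sum(per_tweet) / len(per_tweet)


# Toy polarity lexicon (hypothetical; not the lexicon used in the paper).
toy_lexicon = {"love": 1, "great": 1, "hate": -1, "awful": -1}
score = sentiment_score(["love this great night", "awful morning"], toy_lexicon)
```

In this toy case the two tweets have polarities 2 and -1, so the user-level score is their mean.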

IV-F Psychological and linguistic states

Theories on drinking and aggression postulate that alcohol contributes indirectly to increased aggression by causing cognitive, emotional, and psychological changes that may reduce self-awareness or result in inaccurate assessment of risks [25]. The function and emotion words people use provide important psychological cues to their thought processes, emotional states, intentions, and motivations. To capture users’ social and psychological states, we used the Linguistic Inquiry and Word Count (LIWC) framework [26]. Some of the interesting observations are presented in Table I. It is evident from the table that drunk texters express more anxiety, anger and sadness, and also show more sexual aggression by using more sexual words in their tweets than the non-drunk texters. The drunk texters also tweet more about leisure activities and are less religious.

LIWC category      Drunk (weekday)  Non-drunk (weekday)  Drunk (weekend)  Non-drunk (weekend)
Social processes   8.69             6.88                 8.86             6.78
Family             0.4              0.27                 0.48             0.29
Friends            0.28             0.17                 0.31             0.17
Anxiety            0.33             0.22                 0.30             0.22
Anger              1.55             0.79                 1.62             0.78
Sadness            0.50             0.34                 0.52             0.33
Body               1.22             0.68                 1.24             0.68
Sexual             1.10             0.61                 1.19             0.57
Ingestion          0.79             0.36                 0.83             0.35
Leisure            1.83             1.42                 2.14             1.56
Religious          0.37             0.41                 0.36             0.42
TABLE I: Psycholinguistic analysis for drunk and non-drunk texters. The columns give the avg. LIWC scores for drunk texters on weekdays, non-drunk texters on weekdays, drunk texters on weekends and non-drunk texters on weekends, respectively.

V Classification Framework

From the discussion in the previous section, it is evident that there exist differences between drunk and non-drunk texters in various behavioral, psychological and linguistic aspects. We use these discriminative aspects as features in our classification framework to classify a user as a drunk texter or not. We use 10-fold cross-validation with various classifiers, namely Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Bagging, Decision Tree (DT-J48), Naive Bayes and Ada Boost, to check the robustness of our method. All the classifiers perform very well. Table II shows the evaluation results for weekday and weekend data with various classification techniques in terms of accuracy, precision, recall, F1-Score and ROC Area. The SVM classifier performs best, with 96.78% (weekday) and 96.14% (weekend) accuracy, avg. precision of 0.968 (weekday) and 0.963 (weekend), and recall of 0.968 (weekday) and 0.961 (weekend). It also gives a better area under the ROC curve. We also compared the drunk texters set with a random sample of users and achieved similarly high accuracy, precision and recall, which indicates that the features we use are robust and strong discriminators of drunk texting.

             Weekday                                  Weekend
Classifiers  Acc. (%)  P      R      F1     ROC       Acc. (%)  P      R      F1     ROC
SVM          96.78     0.968  0.968  0.968  0.991     96.14     0.963  0.961  0.962  0.994
LR           96.62     0.967  0.966  0.966  0.986     95.17     0.952  0.952  0.952  0.991
RF           95.81     0.958  0.958  0.958  0.987     94.85     0.949  0.948  0.948  0.989
Bagging      94.04     0.941  0.94   0.94   0.984     95.01     0.95   0.95   0.95   0.981
DT(J48)      94.36     0.944  0.945  0.945  0.948     93.88     0.939  0.939  0.939  0.918
NB           91.46     0.92   0.915  0.917  0.971     90.18     0.91   0.90   0.905  0.967
Ada Boost    94.68     0.947  0.947  0.947  0.988     95.49     0.955  0.955  0.955  0.988

TABLE II: Evaluation results for various classifiers - Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Bagging, Decision Tree (DT), Naive Bayes (NB), Ada Boost in terms of Accuracy (Acc.), Precision (P), Recall (R), F1-Score (F1) and Area under Receiver operating characteristic (ROC) curve for weekday and weekend data.
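To make the reported metrics concrete, the sketch below computes accuracy and averaged precision, recall and F1 from true and predicted labels. It assumes the "avg." values are support-weighted averages over the two classes; the text does not state the averaging scheme, so this is an assumption, and the function name and toy labels are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (assumed averaging)."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    report = {
        "accuracy": sum(t == p for t, p in pairs) / n,
        "precision": 0.0,
        "recall": 0.0,
        "f1": 0.0,
    }
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = (tp + fn) / n  # share of examples whose true class is `label`
        report["precision"] += weight * precision
        report["recall"] += weight * recall
        report["f1"] += weight * f1
    return report


# Toy predictions for 4 drunk and 4 non-drunk users (one error in each class).
y_true = ["drunk"] * 4 + ["non"] * 4
y_pred = ["drunk", "drunk", "drunk", "non", "non", "non", "non", "drunk"]
report = evaluate(y_true, y_pred)
```

In a 10-fold cross-validation setting, such a report would be computed on the held-out predictions pooled or averaged across the folds.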

In order to determine the discriminative power of each feature, we compute the chi-square ($\chi^2$) value and the information gain. Table III shows the rank order of the features based on the $\chi^2$ value. The ranks of the features are very similar when ranked by information gain (Kullback-Leibler divergence). The most prominent discriminative features are various linguistic as well as psychological features obtained from LIWC.

$\chi^2$ Value Rank Feature
494.8247 1 Dictionary words
468.8026 2 Function words
407.5107 3 Relativity
395.2954 4 Adverbs
391.3315 5 Time
381.0396 6 Ingestion
373.1287 7 Space
363.9906 8 Inclusive words
353.0703 9 Cognitive processes
352.5059 10 Auxiliary verbs
349.9656 11 Preposition
342.1795 12 Common verbs
328.7726 13 Smoking related words
322.1683 14 Biological
318.6225 15 Conjunctions
316.1245 16 Present tense
314.3415 17 Pronouns
312.5311 18 Past tense
311.6849 19 person singular
304.3776 20 Home related words
294.9148 21 Quantifiers
292.5086 22 Impersonal pronouns
292.2972 23 Motion related words
289.969 24 Food related words
281.1014 25 Certainty
TABLE III: Top 25 predictive features and their discriminative power
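As an illustration of chi-square feature scoring, the sketch below computes the Pearson statistic between a binarized feature and the class label. The text does not say how the continuous features were discretized, so splitting each feature at its median is purely an assumption, as are the function names and toy values.

```python
from statistics import median


def binarize_at_median(values):
    """Flag values strictly above the median (assumed discretization)."""
    cut = median(values)
    return [v > cut for v in values]


def chi_square_2x2(feature_flags, class_flags):
    """Pearson chi-square statistic for two binary sequences of equal length."""
    n = len(feature_flags)
    observed = [[0, 0], [0, 0]]  # observed[feature][class]
    for f, c in zip(feature_flags, class_flags):
        observed[int(f)][int(c)] += 1
    row_totals = [sum(observed[0]), sum(observed[1])]
    col_totals = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected:
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Toy feature values: first four users are drunk texters (class 1).
is_drunk = [1, 1, 1, 1, 0, 0, 0, 0]
separating = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.4]
uninformative = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]
chi_sep = chi_square_2x2(binarize_at_median(separating), is_drunk)
chi_unf = chi_square_2x2(binarize_at_median(uninformative), is_drunk)
```

A feature that splits the classes perfectly reaches the maximum value (the sample size, 8 here), while a feature unrelated to the class scores 0; ranking features by this statistic gives an ordering of the kind shown in Table III.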

VI Discussions

VI-A Bot Detection

We have identified bots with more than 99% drunk related tweets, for example ‘GhumPaitase’, ‘WhoDoYouKnwHere’ and ‘UrDrunkTweets’. Our system was also able to detect bots, as shown in Fig. 7.

Fig. 7: Drunk texting Bots

VI-B Temporal Tweeting behavior and community detection

We further try to understand the temporal tweeting pattern of the users (to capture the temporal tweeting characteristics more effectively, we increase the drunk texters’ dataset to 800 users). For this task, we identify some additional keywords based on their co-occurrences with drunk words (the 61-keyword wordset), assign each tweet a ‘drunk’ score based on these words, and then analyze the peaks in the profile, as shown in Fig. 8. We observe that (i) the average peak height of the tweets of drunk texters follows a normal distribution, and (ii) most of the drunk texters have an inter-peak distance of less than 100 tweets.
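The peak analysis can be sketched as follows. The text does not specify how the ‘drunk’ score is computed or how a peak is defined, so this sketch assumes the score is the count of drunk-related words in a tweet and a peak is a strict local maximum in the user's chronological score sequence; all names and toy values are hypothetical.

```python
def drunk_scores(tweets, drunk_words):
    """Per-tweet score: number of drunk-related words (assumed scoring)."""
    words = {w.lower() for w in drunk_words}
    return [sum(tok in words for tok in text.lower().split()) for text in tweets]


def find_peaks(scores, min_height=1):
    """Indices of strict local maxima with height >= min_height (assumed definition)."""
    return [
        i
        for i in range(1, len(scores) - 1)
        if scores[i] >= min_height and scores[i - 1] < scores[i] > scores[i + 1]
    ]


def peak_features(scores, peaks):
    """Summary statistics of peak heights and inter-peak distances (in tweets)."""
    heights = [scores[i] for i in peaks]
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return {
        "n_peaks": len(peaks),
        "avg_height": sum(heights) / len(heights) if heights else 0.0,
        "max_height": max(heights, default=0),
        "mean_interval": sum(gaps) / len(gaps) if gaps else 0.0,
    }


# Toy chronological score profile for one user.
profile = [0, 2, 0, 0, 3, 1, 0, 1, 0]
peaks = find_peaks(profile)
features = peak_features(profile, peaks)
example_scores = drunk_scores(["so drunk and tipsy", "good morning"], {"drunk", "tipsy"})
```

Here the inter-peak distance is measured in number of tweets between consecutive peaks, matching the unit used in observation (ii).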

Existence of communities: We also study the existence of communities among the drunk texters. We identify 2 different types of communities:
1. Interest Based Communities:

First, we investigate whether there exist interest-driven communities. For this task, for each user, we construct a vector of the features (a) no. of peaks, (b) average peak height, (c) std. error (peak height), (d) max peak height, (e) mean peak interval and (f) std. error (peak interval). Users are the nodes in the graph, and an edge between two users is formed if the cosine similarity of the feature vectors of the user pair crosses a certain threshold (0.2). We then apply the Louvain algorithm to detect communities. Three communities are formed, of sizes 276, 193 and 312.

2. Bond Based Communities:
We also observe that these users have common friends and followers, and the distribution shows a power-law behavior. Hence, we try to observe whether social communities are formed among these drunk texters. We construct two kinds of communities, based on common friends and on common followers. For common friends-based communities, we obtain a total of 179 communities, and for common followers-based communities, 283 communities are formed, which suggests that a large number of small-sized communities exist.
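The graph construction for the interest-based communities can be sketched as below: users become nodes and an edge is added when the cosine similarity of two users' feature vectors exceeds the 0.2 threshold given in the text. The function names and the two-dimensional toy vectors are hypothetical, and the community detection step itself is not shown; the resulting edge list would be handed to a Louvain implementation from a graph library.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similarity_edges(user_vectors, threshold=0.2):
    """Return (user_a, user_b, similarity) for every pair above `threshold`."""
    edges = []
    for (a, vec_a), (b, vec_b) in combinations(sorted(user_vectors.items()), 2):
        sim = cosine_similarity(vec_a, vec_b)
        if sim > threshold:
            edges.append((a, b, sim))
    return edges


# Toy feature vectors (the paper uses six peak-based features per user).
vectors = {"u1": [1.0, 0.0], "u2": [1.0, 0.1], "u3": [0.0, 1.0]}
edges = build_similarity_edges(vectors)
edge_pairs = [(a, b) for a, b, _ in edges]
```

Only `u1` and `u2` are connected in this toy graph, since the other pairs fall below the threshold. Keeping the similarity as an edge weight is a design choice of this sketch; the text does not say whether the graph was weighted.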

Fig. 8: Peak Analysis

VII Conclusion

In this paper, we investigate various psycholinguistic aspects of drunk texters. We then use these characteristic properties as features for a classification model that classifies whether a user is a drunk texter or not. To the best of our knowledge, this is the first study that uses the psycholinguistic aspects of social media interactions to identify drunk texters. Our proposed classification framework achieves an accuracy of 96.78% (weekday) and 96.14% (weekend) with very high precision and recall. This high accuracy suggests that it can be used as an alternative to keyword-based identification of drunk texters, which requires a lot of manual intervention to obtain accurate results. We observed that linguistic features (LIWC) are the most discriminative features. One immediate direction for future research is to identify the steps by which social media influences a non-drunk person to become a predominant drunker and, by detecting changes in characteristics along various demographic dimensions, to determine how social awareness can be increased to reduce such social influences. Another direction is to explore different feature behaviors, such as how opinion dynamics [27] change, or how predominant drunkers differ from non-drunkers in correlation with other addictions. A further idea is to detect various subsets of drunkers (occasional, situational or regular) and the respective changes in personal life and associated health hazards.


  • [1] S. Galea, A. Nandi, and D. Vlahov, “The social epidemiology of substance use,” Epidemiologic reviews, vol. 26, no. 1, pp. 36–52, 2004.
  • [2] A. Joshi, A. M. B. AR, P. Bhattacharyya, and M. J. Carman, “A computational approach to automatic prediction of drunk-texting,” Volume 2: Short Papers, p. 604, 2015.
  • [3] A. Tamersoy, M. De Choudhury, and D. H. Chau, “Characterizing smoking and drinking abstinence from social media,” in Proc. of Hypertext’ 15.   ACM, 2015, pp. 139–148.
  • [4] E. L. Murnane and S. Counts, “Unraveling abstinence and relapse: smoking cessation reflected in social media,” in Proc. of SIGCHI’ 14.   ACM, 2014, pp. 1345–1354.
  • [5] B. J. Bushman and H. M. Cooper, “Effects of alcohol on human aggression: An integrative research review,” Psychological bulletin, vol. 107, no. 3, p. 341, 1990.
  • [6] C. Carpenter, “Heavy alcohol use and crime: Evidence from underage drunk-driving laws,” J. of Law and Economics, vol. 50, no. 3, pp. 539–557, 2007.
  • [7] J. Merrill, G. Milker, J. Owens, and A. Vale, “Alcohol and attempted suicide,” British journal of addiction, vol. 87, no. 1, pp. 83–89, 1992.
  • [8] C. Strapparava and R. Mihalcea, “A computational analysis of the language of drug addiction,” in EACL-Short Papers, 2017.
  • [9] D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, and R. Falck, “Predose: a semantic web platform for drug abuse epidemiology using social media,” Journal of biomedical informatics, vol. 46, no. 6, pp. 985–997, 2013.
  • [10] M. J. Paul and M. Dredze, “Experimenting with drugs (and topic models): Multi-dimensional exploration of recreational drug discussions,” in AAAI Fall Symposium: Information Retrieval and Knowledge Discovery in Biomedical Text, 2012.
  • [11] ——, “Drug extraction from the web: Summarizing drug experiences with multi-dimensional topic models,” in NAACL-HLT, 2013.
  • [12] J. R. Coyle, D. E. Presti, and M. J. Baggott, “Quantitative analysis of narrative reports of psychedelic drugs,” arXiv preprint arXiv:1206.0312, 2012.
  • [13] S. H. Cook, J. A. Bauermeister, D. Gordon-Messer, and M. A. Zimmerman, “Online network influences on emerging adults’ alcohol and drug use,” J. of youth and adolescence, vol. 42, no. 11, pp. 1674–1686, 2013.
  • [14] K. Beullens and A. Schepers, “Display of alcohol use on facebook: A content analysis,” Cyberpsychology, Behavior, and Social Networking, vol. 16, no. 7, pp. 497–503, 2013.
  • [15] J. H. West, P. C. Hall, C. L. Hanson, K. Prier, C. Giraud-Carrier, E. S. Neeley, and M. D. Barnes, “Temporal variability of problem drinking on twitter,” 2012.
  • [16] G. Coppersmith, M. Dredze, C. Harman, and K. Hollingshead, “From adhd to sad: Analyzing the language of mental health on twitter through self-reported diagnoses,” in Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2015, pp. 1–10.
  • [17] H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, L. Dziurzynski, S. M. Ramones, M. Agrawal, A. Shah, M. Kosinski, D. Stillwell, M. E. Seligman et al., “Personality, gender, and age in the language of social media: The open-vocabulary approach,” PloS one, 2013.
  • [18] S. Volkova, Y. Bachrach, M. Armstrong, and V. Sharma, “Inferring latent user properties from texts published in social media.” in AAAI, 2015, pp. 4296–4297.
  • [19] N. Hossain, T. Hu, R. Feizi, A. M. White, J. Luo, and H. Kautz, “Inferring fine-grained details on user activities and home location from social media: Detecting drinking-while-tweeting patterns in communities,” arXiv preprint arXiv:1603.03181, 2016.
  • [20] M. L. Cooper, M. Russell, J. B. Skinner, M. R. Frone, and P. Mudar, “Stress and alcohol use: Moderating effects of gender, coping, and alcohol expectancies,” J. of Abnormal Psychology, vol. 101, no. 1, pp. 139–152, 1992.
  • [21] S. A. R. Al-Dubai, R. A. Al-Naggar, M. A. Alshagga, and K. G. Rampal, “Stress and coping strategies of students in a medical faculty in malaysia,” The Malaysian Journal of Medical Sciences, 2011.
  • [22] I. S. Obot, “The measurement of drinking patterns and alcohol problems in nigeria,” J. of Substance Abuse, vol. 12, pp. 169–181, 2000.
  • [23] L. A. Greenfeld, “Alcohol and crime: An analysis of national data on the prevalence of alcohol involvement in crime,” 1998.
  • [24] B. Zhang, C. Cartmill, and R. Ferrence, “The role of spending money and drinking alcohol in adolescent smoking,” Addiction, vol. 103, no. 2, pp. 310–319, 2008.
  • [25] C. A. Anderson and B. J. Bushman, “Human aggression,” Annual Review of Psychology, vol. 53, no. 1, pp. 27–51, 2002.
  • [26] Y. R. Tausczik and J. W. Pennebaker, “The psychological meaning of words: Liwc and computerized text analysis methods,” Journal of Language and Social Psychology, vol. 29, no. 1, pp. 24–54, 2010.
  • [27] A. Mullick, S. Maheshwari, P. Goyal, N. Ganguly et al., “A generic opinion-fact classifier with application in understanding opinionatedness in various news section,” in WWW Companion, 2017, pp. 827–828.