Alcohol consumption has serious implications for an individual's health. In 2012, 5.9% of all global deaths (7.6% for men and 4.0% for women) were attributed to alcohol consumption, and the number is increasing over time. In the US alone, nearly 88,000 people (approximately 62,000 men and 26,000 women) die from alcohol-related causes each year, making alcohol the fourth leading preventable cause of death in the country (http://1.usa.gov/1hcR6dX). In addition to causing traumatic death and injury, alcohol consumption also leads to chronic liver disease, cancers, acute alcohol poisoning, and fetal alcohol syndrome. Alcoholism and other health-related behaviors such as smoking are known to be influenced by one's social environment. With the growing adoption of online social media as a preferred medium of communication, these platforms have become a diagnostic lens on human behavior. According to the Pew Research Center, as of January 2014, 74% of online adults use social networking sites; the figure exceeds 80% for individuals under the age of 50. Reports published by the Centers for Disease Control and Prevention (CDC) (http://1.usa.gov/23PMj4F) also show a prevalence of heavy drinkers and smokers in this age group. This suggests that social media is a viable platform for studying alcohol users, and the interactions on these platforms (exchanges of messages, posts, etc.) have opened up a research avenue for observing and understanding individuals' psychological states and their social environment. It is important to identify how these characteristics vary dynamically across different human behaviors. It would also be informative to examine how they vary demographically (sex, age, region, etc.) and over different time frames, such as days of the week (weekdays vs. weekends), months (start vs. end of the month) or hours of the day (morning vs. work hours vs. evening vs. late night).
Demographic patterns may differ between the general population and psychogenic individuals, predominant drinkers, and other groups. For example, by tracking social media we may be able to identify suicidal tendencies or impending changes in behavior among predominant drinkers, so that such situations can be handled in time. Social media can thus serve as an important medical diagnostic tool and support a model for identifying predominant drinkers.
In this paper, we investigate how social media language usage and interactions can be used to characterize and understand drunk texters. We then leverage the behavioral, social, psychological and linguistic aspects of Twitter users to propose a classification framework that automatically identifies drunk texters. Automatic identification of drunk texters is important because these users can then be reached by communities dedicated to treating alcohol abuse and helping alcoholics overcome addiction. Moreover, since these users tend to post abusive content on social media while under the influence of alcohol, our framework can also support the filtering of abusive content.
II Related Work
There have been several works on health and social media. Joshi et al. propose a computational framework for distinguishing drunk tweets from non-drunk tweets. Tamersoy et al. study abstinence from smoking and drinking; they use linguistic features of the content shared by users, as well as the network structure of their social interactions, to distinguish short-term from long-term abstinence. Murnane and Counts examine the process of smoking cessation. Many past works study the relationship between alcohol abuse and human aggression, crime, and suicide.
Strapparava and Mihalcea perform a computational analysis of the language of drug users when they talk about their drug experiences. Cameron et al. develop a web platform (PREDOSE) for the epidemiological study of prescription and related drug abuse practices using social media (e.g., online forums). Paul and Dredze [10, 11] develop a multidimensional latent text model to capture orthogonal factors corresponding to drug type, delivery method (smoking, injection, etc.), and aspect (chemistry, culture, effects, health, usage). Coyle et al. classify and characterize different kinds of drug-use experiences using a random-forest classifier over 1000 reports on 10 drugs from the drug information website Erowid.org (manually identified subsets of words differentiated by drug).
On the other hand, several works try to establish the role of online social media in alcohol users' lives: how it influences alcohol use among adult users, and how people use social networks to display their drunken behavior. West et al. examine the extent to which individuals tweet about problem drinking, and whether such tweets coincide with time periods when problem drinking is likely to occur.
Other studies focus on extracting various sociological aspects from online social media. Coppersmith et al. analyze a broad range of mental health conditions in Twitter texts by identifying self-reported statements of diagnosis. Schwartz et al. predict latent personal attributes, including user demographics, online personality, emotions and sentiments, from texts published on social media. Volkova et al. explore emotion, sentiment and personality traits.
III Dataset Preparation
Our first step was to identify Twitter users who are drunk texters. To this end, we started from a manually labelled tweet dataset from prior work and separately crawled the timelines of the users who posted those tweets. We then filtered these users' tweets by keyword (Footnote: Initial seed keywords such as 'drunk', 'tipsy', 'intoxicated' and 'buzzed' were collected from prior work; we later expanded the set with similar WordNet terms such as 'booze' and 'juiced', giving a final set of 61 words.) and had 3 of the authors manually label the matching tweets as drunk texts or not. We retained only tweets that all three annotators unanimously labelled as drunk texts. After this manual labelling, we kept users who had posted at least 5 drunk tweets, which gave a total of 278 drunk texters. We then prepared the dataset of non-drunk texters (Footnote: Normal users are defined as users who never posted a drunk-related tweet, i.e. none of their tweets contains any word from the 61-word set above.). We used the Twitter 1% random sample from January 2014 to obtain a set of users who did not post any tweet containing the alcohol-related keywords, and chose 278 such non-drunk texters from this set so that the two sets are comparable. The following example tweets are from users who are drunk texters:
I know its Saturday but I’m trying to get roofied drunk
Gotta say, my spelling’s been pretty on-point considering how drunk I’ve been tonight
Alcohol and weed are like the mom and dad I always wanted
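The user-selection steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `DRUNK_WORDS` is a hypothetical stand-in holding only the example words named in the footnote (the full 61-word set is not reproduced), tokenization is a simple lowercase regex, and the `(user, text, labels)` input format is assumed.

```python
import re
from collections import Counter

# Hypothetical stand-in for the 61-word lexicon: only the example words
# named in the text are listed here.
DRUNK_WORDS = {"drunk", "tipsy", "intoxicated", "buzzed", "booze", "juiced"}
TOKEN_RE = re.compile(r"[a-z']+")


def mentions_drunk_word(text, lexicon=DRUNK_WORDS):
    """Return True if any lowercase token of the tweet is in the lexicon."""
    return any(tok in lexicon for tok in TOKEN_RE.findall(text.lower()))


def select_drunk_texters(labelled_tweets, min_tweets=5):
    """Return users with at least `min_tweets` keyword-matching tweets that
    every annotator labelled as drunk texts.

    `labelled_tweets` is an iterable of (user, text, labels) tuples, where
    `labels` holds one boolean per annotator (True = drunk text).
    """
    counts = Counter()
    for user, text, labels in labelled_tweets:
        # Unanimity rule: every annotator must have said "drunk text".
        if mentions_drunk_word(text) and labels and all(labels):
            counts[user] += 1
    return {user for user, n in counts.items() if n >= min_tweets}


def is_non_drunk_texter(tweets, lexicon=DRUNK_WORDS):
    """A non-drunk texter never uses any word from the lexicon."""
    return not any(mentions_drunk_word(t, lexicon) for t in tweets)
```

Requiring `all(labels)` encodes the unanimity rule; a majority vote would be a one-line change if a less strict criterion were wanted.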
IV Behavioral, Psychological and linguistic aspects of the drunk texters
In this section, we compare drunk texters and non-drunk texters based on their behavioral, psychological and linguistic aspects. Our empirical study is based on content extracted from the tweets of both groups. Each analysis was performed separately for tweets posted on weekdays and on weekends, to capture differences in users' lifestyles between the two.
IV-A Health and food
Since health is a crucial aspect of well-being, people often share information related to health and food on social media. We examine empirically whether drunk texters and non-drunk texters differ in the health- and food-related content they share. Alcohol consumption has adverse effects on health, which may be long-term (effects accumulated over a period of time) or short-term (e.g., a hangover from the previous night or vomiting) (http://1.usa.gov/1d7aWk2), so drunk texters may share such experiences on Twitter. To characterize the health- and food-related sharing of the two groups, we compiled a list of the health- and food-related keywords most frequently used on social media (http://bit.ly/200kea3) and computed the fraction of such keywords for both sets of users. Figures 1 and 2 show that drunk texters, in general, use more health- and food-related keywords in their tweets than non-drunk texters.
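The keyword-fraction measure used here (and again for the stress, swearing and money analyses) can be sketched as below. Several details are assumptions rather than the paper's stated method: the fraction is taken over all tokens a user posts, each group's value is the mean of its users' fractions, users with no tweets on a given day type are skipped for that day type, and `HEALTH_FOOD_WORDS` is a hypothetical placeholder for the list linked above.

```python
import re
from datetime import datetime

TOKEN_RE = re.compile(r"[a-z']+")

# Hypothetical placeholder; the paper's keyword list is hosted at the URL above.
HEALTH_FOOD_WORDS = {"hangover", "headache", "sick", "pizza", "burger", "doctor"}


def keyword_fraction(texts, lexicon):
    """Fraction of all tokens in `texts` that belong to `lexicon`."""
    tokens = [tok for text in texts for tok in TOKEN_RE.findall(text.lower())]
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def split_by_day_type(timed_tweets):
    """Split (datetime, text) pairs into weekday and weekend text lists."""
    weekday, weekend = [], []
    for ts, text in timed_tweets:
        # datetime.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
        (weekend if ts.weekday() >= 5 else weekday).append(text)
    return weekday, weekend


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def group_fraction(users, lexicon):
    """Mean per-user keyword fraction as a (weekday, weekend) pair.

    `users` maps a user id to that user's list of (datetime, text) pairs.
    """
    weekday_scores, weekend_scores = [], []
    for timed_tweets in users.values():
        weekday, weekend = split_by_day_type(timed_tweets)
        if weekday:
            weekday_scores.append(keyword_fraction(weekday, lexicon))
        if weekend:
            weekend_scores.append(keyword_fraction(weekend, lexicon))
    return _mean(weekday_scores), _mean(weekend_scores)


if __name__ == "__main__":
    demo = {"u1": [(datetime(2014, 1, 4, 23), "hangover and pizza"),
                   (datetime(2014, 1, 6, 9), "back to work")]}
    print(group_fraction(demo, HEALTH_FOOD_WORDS))
```

Swapping in a stress, swear or money word list gives the corresponding per-group comparison; a per-tweet rather than per-token fraction would be an equally plausible reading of the text.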
People tend to drink in response to stress, and exposure to tension-producing situations leads to increased drinking, so drunk texters are quite likely to communicate stress in their tweets. Stress levels are also rising in general: a survey by the American Psychological Association portrays a picture of high stress and ineffective coping mechanisms that appear to be ingrained in our culture (http://bit.ly/1cz4n99). Since people may share stressful situations they have been in, non-drunk texters also have a fair chance of posting tweets expressing stress and anxiety.
The major sources of stress include financial problems, low self-esteem, inter-personal conflicts, smoking and family problems.
To study the stress-related behavior of drunk and non-drunk texters empirically, we gathered a list of stress-related keywords for each of the sources of stress above and computed the fraction of such keywords for both groups. Figure 3 shows the contrasting behavior of the two groups: in general, non-drunk texters seem to experience more stress arising from financial problems and low self-esteem, whereas drunk texters experience more stress due to inter-personal conflicts, smoking and family problems.
IV-C Swearing and abusing
Swearing, as a verbal form of aggression, can serve as an indicator of aggressive behavior. We hypothesize that drunk texters are in general more likely to use swear words in their tweets because of their relatively more violent behavior. To investigate whether this trend holds on Twitter, we compiled a list of the swear words most frequently used on social media and computed the fraction of such words for both drunk and non-drunk texters. Figure 4 supports our hypothesis: drunk texters use a larger proportion of swear words in their tweets than non-drunk texters.
Spending money and drinking alcohol are positively correlated. Drunk texters may post about their spending on drinks, which might account for a considerable share of their income. For this analysis, we compiled a list of the money-related keywords most frequently used on social media and computed the fraction of such keywords for both drunk and non-drunk texters. Figure 5 shows that drunk texters are more likely to use money-related words in their tweets on weekdays than on weekends.
IV-E Sentiment analysis
A user's sentiment depends greatly on his or her state, so we expect a user's tweets to depend largely on the state in which they are written. People speak differently when drunk than when sober, and the same should hold when they tweet. We used the sentiment lexicon available at https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon for the sentiment analysis. Figure 6 shows the behavior of drunk and non-drunk texters and illustrates that drunk texters in general have a higher sentiment score in their tweets than non-drunk texters.
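A minimal sketch of lexicon-based sentiment scoring follows. The scoring rule (positive minus negative word counts per tweet, averaged over a user's tweets) is an assumption, since the text does not say how scores were aggregated; the loader likewise assumes the lexicon comes as plain word lists, one word per line, with header lines starting with ';'.

```python
import re

# Hyphens are kept inside tokens because some lexicon entries are hyphenated.
TOKEN_RE = re.compile(r"[a-z'-]+")


def load_lexicon(lines):
    """Parse a one-word-per-line list, skipping blanks and ';' comment lines."""
    return {ln.strip().lower() for ln in lines
            if ln.strip() and not ln.lstrip().startswith(";")}


def tweet_sentiment(text, positive, negative):
    """Number of positive words minus number of negative words in a tweet."""
    tokens = TOKEN_RE.findall(text.lower())
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)


def user_sentiment(tweets, positive, negative):
    """Mean tweet-level sentiment score over a user's tweets."""
    if not tweets:
        return 0.0
    return sum(tweet_sentiment(t, positive, negative) for t in tweets) / len(tweets)
```

In practice the two word lists would be read from the downloaded files, for example by passing an open file object to `load_lexicon`; normalizing by tweet length is a reasonable alternative if long tweets should not dominate.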
IV-F Psychological and linguistic states
Theories on drinking and aggression postulate that alcohol contributes indirectly to increased aggression by causing cognitive, emotional, and psychological changes that may reduce self-awareness or result in inaccurate assessment of risks. The function and emotion words people use provide important psychological cues to their thought processes, emotional states, intentions, and motivations. To capture users' social and psychological states, we used the Linguistic Inquiry and Word Count (LIWC) framework. Some interesting observations are presented in Table I. The table shows that drunk texters express more anxiety, anger and sadness than non-drunk texters, and use more sexual words in their tweets, which may indicate greater sexual aggression. Drunk texters also tweet more about leisure activities and less about religion.
V Classification Framework
The discussion in the previous section shows that drunk and non-drunk texters differ in various behavioral, psychological and linguistic aspects. We use these discriminative aspects as features in our classification framework to decide whether a user is a drunk texter. To check the robustness of our method, we evaluate several classifiers with 10-fold cross-validation: Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Bagging, Decision Tree (DT-J48), Naive Bayes and AdaBoost. All the classifiers perform very well. Table II shows the evaluation results for the weekday and weekend data for each classifier in terms of accuracy, precision, recall, F1-score and ROC area. The SVM classifier performs best, achieving 96.78% (weekday) and 96.14% (weekend) accuracy, with average precision of 0.968 (weekday) and 0.963 (weekend) and recall of 0.968 (weekday) and 0.961 (weekend); it also gives a larger area under the ROC curve. We also compared the set of drunk texters with a random sample of users and achieved similarly high accuracy, precision and recall, which suggests that our features are robust and strong discriminators of drunk texting.
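The 10-fold cross-validation protocol can be illustrated with the dependency-free sketch below. To stay self-contained it swaps in a simple nearest-centroid classifier in place of the SVM and the other classifiers named above, and it reports precision, recall and F1 for the positive class (1 = drunk texter) from pooled out-of-fold predictions; the averaging scheme behind the paper's reported numbers may differ.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(X, y):
    """Stand-in classifier (not the paper's SVM): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: _sq_dist(x, centroids[label]))


def cross_validate(X, y, k=10, seed=0):
    """Pool out-of-fold predictions and compute standard metrics."""
    preds = [None] * len(y)
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            preds[i] = predict(model, X[i])
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, y))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, y))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, y))
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Where scikit-learn is available, `sklearn.model_selection.cross_validate` with `cv=10` and an `sklearn.svm.SVC` estimator would be the more conventional route to the same protocol.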
To determine the discriminative power of each feature, we compute its chi-square ($\chi^2$) value and its information gain. Table III shows the rank order of all features based on the $\chi^2$ value. The feature ranks are very similar when ranked by information gain (Kullback-Leibler divergence). The most discriminative features are various linguistic as well as psychological features obtained from LIWC.
| $\chi^2$ value | Rank | Feature |
|---|---|---|
| 328.7726 | 13 | Smoking related words |
| 304.3776 | 20 | Home related words |
| 292.2972 | 23 | Motion related words |
| 289.969 | 24 | Food related words |
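Both ranking criteria can be computed from a feature's contingency table with the class label, as in the sketch below. It assumes the features have already been discretized (LIWC scores and keyword fractions are continuous, and the binning used in the paper is not stated). It computes the textbook Pearson $\chi^2$ statistic and the information gain $IG = H(Y) - H(Y \mid X)$, which equals the mutual information between feature and class, i.e. the Kullback-Leibler divergence between $p(x, y)$ and $p(x)\,p(y)$.

```python
import math
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-class contingency table."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    fx, fy = Counter(feature), Counter(labels)
    stat = 0.0
    for x in fx:
        for y in fy:
            expected = fx[x] * fy[y] / n
            stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """H(Y) - H(Y | X) for a discrete feature X and class labels Y."""
    n = len(labels)
    conditional = 0.0
    for x, count in Counter(feature).items():
        subset = [y for f, y in zip(feature, labels) if f == x]
        conditional += count / n * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Feature names sorted by descending chi-square value."""
    scores = {name: chi_square(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Replacing `chi_square` with `information_gain` inside `rank_features` gives the second ranking, which the text reports to be very similar.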
VI-A Bot Detection
We identified bots whose tweets are more than 99% drunk-related, for example 'GhumPaitase', 'WhoDoYouKnwHere' and 'UrDrunkTweets'. Our system was also able to detect such bots, as shown in Fig. 7.
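The 99% rule amounts to a simple threshold on each account's share of drunk-related tweets. In this sketch, `is_drunk_tweet` is a hypothetical caller-supplied predicate (for example, a keyword match against the 61-word set); how the paper decides that a single tweet is drunk-related is not restated here.

```python
def flag_bots(user_tweets, is_drunk_tweet, threshold=0.99):
    """Return accounts whose share of drunk-related tweets exceeds `threshold`.

    `user_tweets` maps an account name to its list of tweet texts.
    """
    bots = set()
    for user, tweets in user_tweets.items():
        if not tweets:
            continue  # no evidence either way for empty timelines
        share = sum(1 for t in tweets if is_drunk_tweet(t)) / len(tweets)
        if share > threshold:
            bots.add(user)
    return bots
```

Applying the threshold to the share rather than the raw count keeps prolific but ordinary heavy drinkers from being flagged.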
VI-B Temporal Tweeting behavior and community detection
We further study the users' temporal tweeting patterns (Footnote: To capture temporal tweeting characteristics more effectively, we expanded the drunk-texter dataset to 800 users.). For this task, we identify additional keywords based on their co-occurrence with the 61 drunk words, assign each tweet a 'drunk' score based on these words, and then analyze the peaks in each user's score profile, as shown in Fig. 8. We observe that (i) the average peak height of drunk texters' tweets follows a normal distribution, and (ii) most drunk texters have an inter-peak distance of less than 100 tweets.
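The drunk-score profile and its peak statistics can be sketched as follows. The details are assumptions: a tweet's score is the count of lexicon words it contains, a peak is a local maximum of the chronologically ordered profile (a flat-topped peak counts once, at its first tweet), intervals are measured in tweets, and "std. error" is read as the standard error of the mean. The returned vector mirrors the six per-user features (a) to (f) used for the interest-based communities.

```python
import math
import re

TOKEN_RE = re.compile(r"[a-z']+")


def drunk_score(text, lexicon):
    """Assumed scoring rule: number of tokens found in the drunk lexicon."""
    return sum(tok in lexicon for tok in TOKEN_RE.findall(text.lower()))


def find_peaks(profile):
    """Indices of local maxima in a non-negative score profile.

    Runs of equal values are treated as one point; a run is a peak if it is
    higher than both neighbouring values (the ends are compared with 0).
    """
    peaks, i, n = [], 0, len(profile)
    while i < n:
        j = i
        while j + 1 < n and profile[j + 1] == profile[i]:
            j += 1
        left = profile[i - 1] if i > 0 else 0
        right = profile[j + 1] if j + 1 < n else 0
        if profile[i] > left and profile[i] > right:
            peaks.append(i)
        i = j + 1
    return peaks


def _mean_and_se(values):
    """Mean and standard error of the mean (0.0 when undefined)."""
    if not values:
        return 0.0, 0.0
    mean = sum(values) / len(values)
    if len(values) < 2:
        return mean, 0.0
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, math.sqrt(var / len(values))


def peak_features(profile):
    """[n_peaks, mean height, SE height, max height, mean interval, SE interval]."""
    peaks = find_peaks(profile)
    heights = [profile[i] for i in peaks]
    intervals = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_h, se_h = _mean_and_se(heights)
    mean_i, se_i = _mean_and_se(intervals)
    return [len(peaks), mean_h, se_h, max(heights, default=0), mean_i, se_i]
```

A user's profile would be `[drunk_score(t, lexicon) for t in tweets_in_time_order]`; `scipy.signal.find_peaks` is a library alternative if SciPy is available.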
Existence of communities. We also study whether communities exist among the drunk texters. We identify two different types of communities:
1. Interest Based Communities:
First, we investigate whether interest-driven communities exist. For each user, we construct a vector of the following features: (a) number of peaks, (b) average peak height, (c) standard error of peak height, (d) maximum peak height, (e) mean peak interval, and (f) standard error of peak interval. Users are the nodes of the graph, and an edge is formed between two users if the cosine similarity of their feature vectors exceeds a threshold (0.2). We then apply the Louvain algorithm to detect communities. Three communities are formed, of sizes 276, 193 and 312.
2. Bond Based Communities:
We also observe that these users have common friends and followers, and that the distribution follows a power law. Hence, we examine whether social communities form among these drunk texters. We construct two kinds of communities, based on common friends and on common followers. We obtain a total of 179 communities based on common friends and 283 based on common followers, which suggests that a large number of small communities exist.
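The graph construction for the interest-based communities can be sketched as below: users are nodes, and an edge joins two users whose feature vectors have cosine similarity above 0.2. Whether the features were rescaled before comparison is not stated, so `zscore_columns` is an optional, assumed preprocessing step; the Louvain step itself is left to a library.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def zscore_columns(vectors):
    """Optional (assumed) per-feature standardization to zero mean, unit variance."""
    stats = []
    for col in zip(*vectors):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        stats.append((mean, sd))
    return [[(x - m) / sd if sd else 0.0 for x, (m, sd) in zip(vec, stats)]
            for vec in vectors]


def similarity_edges(features, threshold=0.2):
    """Return (user_a, user_b, similarity) for every pair above `threshold`.

    `features` maps a user id to that user's feature vector.
    """
    edges = []
    for (a, fa), (b, fb) in combinations(features.items(), 2):
        sim = cosine(fa, fb)
        if sim > threshold:
            edges.append((a, b, sim))
    return edges
```

The resulting edge list can be loaded into a NetworkX graph with `Graph.add_weighted_edges_from` (after `add_nodes_from`, so isolated users are kept) and partitioned with `networkx.community.louvain_communities`, which recent NetworkX releases provide. Note that with non-negative raw features most pairs clear a 0.2 threshold, which is one reason standardization may matter.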
In this paper, we investigate various psycholinguistic aspects of drunk texters. We then use these characteristic properties as features for a classification model that decides whether a user is a drunk texter. To the best of our knowledge, this is the first study that uses the psycholinguistic aspects of social media interactions to identify drunk texters. Our proposed classification framework achieves an accuracy of 96.78% (weekday) and 96.14% (weekend) with very high precision and recall. This high accuracy suggests that it can serve as an alternative to keyword-based classification of drunk texters, which requires substantial manual intervention to obtain accurate results. We observed that the linguistic (LIWC) features are more discriminative than the others. An immediate direction for future research is to identify the stages through which social media influence a non-drinker to become a predominant drinker, and how detecting changes in characteristics along various demographic dimensions could help increase social awareness and reduce such social influences. Another direction is to explore how different feature behaviors, such as opinion dynamics or correlations with other addictions, differ between predominant drinkers and non-drinkers. A further idea is to detect subsets of drinkers (occasional, situational or regular) and the corresponding changes in personal life and associated health hazards.
-  S. Galea, A. Nandi, and D. Vlahov, “The social epidemiology of substance use,” Epidemiologic reviews, vol. 26, no. 1, pp. 36–52, 2004.
-  A. Joshi, A. M. B. AR, P. Bhattacharyya, and M. J. Carman, “A computational approach to automatic prediction of drunk-texting,” Volume 2: Short Papers, p. 604, 2015.
-  A. Tamersoy, M. De Choudhury, and D. H. Chau, “Characterizing smoking and drinking abstinence from social media,” in Proc. of Hypertext’ 15. ACM, 2015, pp. 139–148.
-  E. L. Murnane and S. Counts, “Unraveling abstinence and relapse: smoking cessation reflected in social media,” in Proc. of SIGCHI’ 14. ACM, 2014, pp. 1345–1354.
-  B. J. Bushman and H. M. Cooper, “Effects of alcohol on human aggression: An integrative research review,” Psychological bulletin, vol. 107, no. 3, p. 341, 1990.
-  C. Carpenter, “Heavy alcohol use and crime: Evidence from underage drunk-driving laws,” J. of Law and Economics, vol. 50, no. 3, pp. 539–557, 2007.
-  J. Merrill, G. Milker, J. Owens, and A. Vale, “Alcohol and attempted suicide,” British journal of addiction, vol. 87, no. 1, pp. 83–89, 1992.
-  C. Strapparava and R. Mihalcea, “A computational analysis of the language of drug addiction,” in EACL-Short Papers, 2017.
-  D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, and R. Falck, “Predose: a semantic web platform for drug abuse epidemiology using social media,” Journal of biomedical informatics, vol. 46, no. 6, pp. 985–997, 2013.
-  M. J. Paul and M. Dredze, “Experimenting with drugs (and topic models): Multi-dimensional exploration of recreational drug discussions,” in AAAI Fall Symposium: Information Retrieval and Knowledge Discovery in Biomedical Text, 2012.
-  ——, “Drug extraction from the web: Summarizing drug experiences with multi-dimensional topic models,” in NAACL-HLT, 2013.
-  J. R. Coyle, D. E. Presti, and M. J. Baggott, “Quantitative analysis of narrative reports of psychedelic drugs,” arXiv preprint arXiv:1206.0312, 2012.
-  S. H. Cook, J. A. Bauermeister, D. Gordon-Messer, and M. A. Zimmerman, “Online network influences on emerging adults’ alcohol and drug use,” J. of youth and adolescence, vol. 42, no. 11, pp. 1674–1686, 2013.
-  K. Beullens and A. Schepers, “Display of alcohol use on facebook: A content analysis,” Cyberpsychology, Behavior, and Social Networking, vol. 16, no. 7, pp. 497–503, 2013.
-  J. H. West, P. C. Hall, C. L. Hanson, K. Prier, C. Giraud-Carrier, E. S. Neeley, and M. D. Barnes, “Temporal variability of problem drinking on twitter,” 2012.
-  G. Coppersmith, M. Dredze, C. Harman, and K. Hollingshead, “From adhd to sad: Analyzing the language of mental health on twitter through self-reported diagnoses,” in Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2015, pp. 1–10.
-  H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, L. Dziurzynski, S. M. Ramones, M. Agrawal, A. Shah, M. Kosinski, D. Stillwell, M. E. Seligman et al., “Personality, gender, and age in the language of social media: The open-vocabulary approach,” PloS one, 2013.
-  S. Volkova, Y. Bachrach, M. Armstrong, and V. Sharma, “Inferring latent user properties from texts published in social media,” in AAAI, 2015, pp. 4296–4297.
-  N. Hossain, T. Hu, R. Feizi, A. M. White, J. Luo, and H. Kautz, “Inferring fine-grained details on user activities and home location from social media: Detecting drinking-while-tweeting patterns in communities,” arXiv preprint arXiv:1603.03181, 2016.
-  M. L. Cooper, M. Russell, J. B. Skinner, M. R. Frone, and P. Mudar, “Stress and alcohol use: Moderating effects of gender, coping, and alcohol expectancies,” J. of Abnormal Psychology, vol. 101, no. 1, pp. 139–152, 1992.
-  S. A. R. Al-Dubai, R. A. Al-Naggar, M. A. Alshagga, and K. G. Rampal, “Stress and coping strategies of students in a medical faculty in malaysia,” The Malaysian Journal of Medical Sciences, 2011.
-  I. S. Obot, “The measurement of drinking patterns and alcohol problems in nigeria,” J. of Substance Abuse, vol. 12, pp. 169–181, 2000.
-  L. A. Greenfeld, “Alcohol and crime: An analysis of national data on the prevalence of alcohol involvement in crime,” 1998.
-  B. Zhang, C. Cartmill, and R. Ferrence, “The role of spending money and drinking alcohol in adolescent smoking,” Addiction, vol. 103, no. 2, pp. 310–319, 2008.
-  C. A. Anderson and B. J. Bushman, “Human aggression,” Annual Review of Psychology, vol. 53, no. 1, pp. 27–51, 2002.
-  Y. R. Tausczik and J. W. Pennebaker, “The psychological meaning of words: Liwc and computerized text analysis methods,” Journal of Language and Social Psychology, vol. 29, no. 1, pp. 24–54, 2010.
-  A. Mullick, S. Maheshwari, P. Goyal, N. Ganguly et al., “A generic opinion-fact classifier with application in understanding opinionatedness in various news section,” in WWW Companion, 2017, pp. 827–828.