Characterizing Diabetes, Diet, Exercise, and Obesity Comments on Twitter

Social media provide a platform for users to express their opinions and share information. Understanding public health opinions on social media, such as Twitter, offers a unique approach to characterizing common health issues such as diabetes, diet, exercise, and obesity (DDEO); however, collecting and analyzing a large-scale conversational public health data set is a challenging research task. The goal of this research is to analyze the characteristics of the general public's opinions in regard to DDEO as expressed on Twitter. A multi-component semantic and linguistic framework was developed to collect Twitter data, discover topics of interest about DDEO, and analyze the topics. From the extracted 4.5 million tweets, 8% of tweets discussed diabetes, 23.7% diet, 16.6% exercise, and 51.7% obesity. The strongest correlation among the topics was determined between exercise and obesity. Other notable correlations were diabetes and obesity, and diet and obesity. DDEO terms were also identified as subtopics of each of the DDEO topics. The frequent subtopics discussed along with Diabetes, excluding the DDEO terms themselves, were blood pressure, heart attack, yoga, and Alzheimer. The non-DDEO subtopics for Diet included vegetarian, pregnancy, celebrities, weight loss, religious, and mental health, while subtopics for Exercise included computer games, brain, fitness, and daily plan. Non-DDEO subtopics for Obesity included Alzheimer, cancer, and children. With 2.67 billion social media users in 2016, publicly available data such as Twitter posts can be utilized to support clinical providers, public health experts, and social scientists in better understanding common public opinions in regard to diabetes, diet, exercise, and obesity.



1 Introduction

The global prevalence of obesity doubled between 1980 and 2014, with more than 1.9 billion adults considered overweight and over 600 million adults considered obese in 2014 (World Health Organization Fact Sheet, 2016). Since the 1970s, obesity has risen 37 percent, affecting 25 percent of U.S. adults (Flegal et al., 2012). Similar upward trends of obesity have been found in youth populations, with a 60% increase in preschool-aged children between 1990 and 2010 (Harvard HSPH, 2017). Overweight and obesity are the fifth leading risk for global deaths according to the European Association for the Study of Obesity (World Health Organization Fact Sheet, 2016). Excess energy intake and inadequate energy expenditure both contribute to weight gain and diabetes (Hill et al., 2012; Wing et al., 2001).

Obesity can be reduced through modifiable lifestyle behaviors such as diet and exercise (Wing et al., 2001). There are several comorbidities associated with being overweight or obese, such as diabetes (Kopelman, 2000). The prevalence of diabetes in adults has risen globally from 4.7% in 1980 to 8.5% in 2014. Current projections estimate that by 2050, 29 million Americans will be diagnosed with type 2 diabetes, which is a 165% increase from the 11 million diagnosed in 2002 (Boyle et al., 2001). Studies show that there are strong relations among diabetes, diet, exercise, and obesity (DDEO) (Hartz et al., 1983; Wing et al., 2001; Barnard et al., 2009; Association et al., 2004); however, the general public’s perception of DDEO remains limited to survey-based studies (Tompson et al., 2012).

The growth of social media has provided a research opportunity to track public behaviors, information, and opinions about common health issues. It is estimated that the number of social media users will increase from 2.34 billion in 2016 to 2.95 billion in 2020 (Statista, 2017). Twitter has 316 million users worldwide (Olanoff, 2015), providing a unique opportunity to understand users’ opinions with respect to the most common health issues (Mejova et al., 2015). Publicly available Twitter posts have facilitated data collection and advanced research at the intersection of public health and data science, thus informing the research community of major opinions and topics of interest among the general population (Nasukawa and Yi, 2003; Wiebe et al., 2003; Zabin and Jefferies, 2008) that cannot otherwise be collected through traditional means of research (e.g., surveys, interviews, focus groups) (Eichstaedt et al., 2015; Wartell, 2015). Furthermore, analyzing Twitter data can help health organizations such as state health departments and large healthcare systems to track the health opinions of their populations and provide effective health advice when needed (Mejova et al., 2015).

Among computational methods to analyze tweets, computational linguistics is a well-established approach to gain insight into a population, track health issues, and discover new knowledge (Paul and Dredze, 2011, 2012; Harris et al., 2014; Zhao et al., 2011). Twitter data has been used for a wide range of health and non-health related applications, such as stock market (Bollen et al., 2011) and election analysis (Tumasjan et al., 2010). Some examples of Twitter data analysis for health-related topics include: flu (Ritterman et al., 2009; Szomszor et al., 2010; Lampos et al., 2010; Lampos and Cristianini, 2012, 2010; Culotta, 2010), mental health (Coppersmith et al., 2015), Ebola (Lazard et al., 2015; Odlum and Yoon, 2015), Zika (Fu et al., 2016), medication use (Scanfeld et al., 2010; Hanson et al., 2013; Buntain and Golbeck, 2015), diabetes (Harris et al., 2013), and weight loss and obesity (Dahl et al., 2016; Ghosh and Guha, 2013; Vickey et al., 2013; Turner-McGrievy and Beets, 2015; Harris et al., 2014).

The previous Twitter studies have dealt with extracting common topics of one health issue discussed by the users to better understand common themes; however, this study utilizes an innovative approach to computationally analyze unstructured health related text data exchanged via Twitter to characterize health opinions regarding four common health issues, including diabetes, diet, exercise and obesity (DDEO) on a population level. This study identifies the characteristics of the most common health opinions with respect to DDEO and discloses public perception of the relationship among diabetes, diet, exercise and obesity. These common public opinions/topics and perceptions can be used by providers and public health agencies to better understand the common opinions of their population denominators in regard to DDEO, and reflect upon those opinions accordingly.

2 Methods

Our approach uses semantic and linguistics analyses for disclosing health characteristics of opinions in tweets containing DDEO words. The present study included three phases: data collection, topic discovery, and topic-content analysis.

2.1 Data Collection

This phase collected tweets using Twitter’s Application Programming Interfaces (API) (Twitter, 2017). Within the Twitter API, diabetes, diet, exercise, and obesity were selected as the related words (Wing et al., 2001) and the related health areas (Paul and Dredze, 2011). Twitter’s APIs provide both historic and real-time data collections. The latter method randomly collects 1% of publicly available tweets. This paper used the real-time method to randomly collect 10% of publicly available English tweets using several pre-defined DDEO-related queries (Table 1) within a specific time frame. We used the queries to collect approximately 4.5 million related tweets between 06/01/2016 and 06/30/2016. The data will be available on the first author’s website. Figure 1 shows a sample of the tweets collected in this research.

Figure 1: A Sample of Tweets
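As a rough illustration of how the DDEO queries in Table 1 partition tweets, the sketch below assigns a tweet to every health issue whose query terms it contains. It assumes simple case-insensitive whole-token matching, which may differ from how Twitter's API actually matches query terms; the names `QUERIES` and `match_issues` are hypothetical and do not come from the study.

```python
import re

# Query terms copied from Table 1. Whole-token, case-insensitive matching
# is an assumption made for this sketch, not Twitter's documented behavior.
QUERIES = {
    "diabetes": {"diabetes", "#diabetes"},
    "diet": {"diet", "#diet", "dieting"},
    "exercise": {"exercise", "#exercise", "exercising"},
    "obesity": {"obesity", "#obesity", "fat"},
}


def match_issues(tweet: str) -> set:
    """Return the set of DDEO issues whose query terms appear in the tweet."""
    # Keep a leading '#' so hashtags stay distinct tokens.
    tokens = set(re.findall(r"#?\w+", tweet.lower()))
    return {issue for issue, terms in QUERIES.items() if tokens & terms}
```

Under this assumption a tweet can match more than one issue, and a word such as "fatloss" does not match the term "fat".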

2.2 Topic Discovery

To discover topics from the collected tweets, we used a topic modeling approach that fuzzily clusters semantically related words, for example assigning “diabetes”, “cancer”, and “influenza” to a topic that has an overall “disease” theme (Karami et al., 2017; Karami, 2015). Topic modeling has a wide range of applications in health and medical domains such as predicting protein-protein relationships based on the literature knowledge (Asou and Eguchi, 2008), discovering relevant clinical concepts and structures in patients’ health records (Arnold et al., 2010), and identifying patterns of clinical events in a cohort of brain cancer patients (Arnold and Speier, 2012).

Among topic models, Latent Dirichlet Allocation (LDA) (Blei et al., 2003) is the most popular effective model (Lu et al., 2011; Paul and Dredze, 2011) as studies have shown that LDA is an effective computational linguistics model for discovering topics in a corpus (Mcauliffe and Blei, 2008; Hong and Davison, 2010). LDA assumes that a corpus contains topics such that each word in each document can be assigned to the topics with different degrees of membership (Karami et al., 2015a, b; Karami and Gangopadhyay, 2014).

Twitter users can post their opinions or share information about a subject to the public. Identifying the main topics of users’ tweets provides an interesting point of reference, but conceptualizing larger subtopics of millions of tweets can reveal valuable insight to users’ opinions. The topic discovery component of the study approach uses LDA to find main topics, themes, and opinions in the collected tweets.

We used the Mallet implementation of LDA (Blei et al., 2003; McCallum, 2002) with its default settings to explore opinions in the tweets. Before identifying the opinions, two pre-processing steps were implemented: (1) using a standard list to remove stop words that do not have semantic value for analysis (such as “the”); and (2) finding the optimum number of topics. To determine a proper number of topics, log-likelihood estimation with 80% of tweets for training and 20% of tweets for testing was used; the number of topics with the highest log-likelihood was taken as the optimum (Wallach et al., 2009). The highest log-likelihood was obtained with 425 topics.
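The selection of the number of topics by held-out log-likelihood can be sketched as follows. This is a minimal illustration, not the Mallet evaluation procedure: it assumes the topic-word distributions (`phi`) and the per-document topic weights (`thetas`) are already available for the held-out tweets, and all function names and the scores in the usage note are hypothetical.

```python
import math
import random


def split_80_20(docs, seed=0):
    """Shuffle documents and split them into 80% training and 20% test."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def heldout_log_likelihood(docs, thetas, phi, floor=1e-12):
    """Sum over held-out tokens of log(sum_k theta[k] * phi[k][word]).

    docs:   list of token lists
    thetas: per-document topic weights (assumed given in this sketch)
    phi:    list of dicts mapping word -> probability, one per topic
    """
    total = 0.0
    for tokens, theta in zip(docs, thetas):
        for word in tokens:
            p = sum(t * topic.get(word, 0.0) for t, topic in zip(theta, phi))
            total += math.log(max(p, floor))  # floor avoids log(0)
    return total


def best_num_topics(scores):
    """Return the candidate number of topics with the highest score."""
    return max(scores, key=scores.get)
```

For example, with made-up scores `{100: -9.1, 425: -8.2, 600: -8.6}`, `best_num_topics` returns 425, the candidate with the highest held-out log-likelihood.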

2.3 Topic Content Analysis

The topic content analysis component used an objective interpretation approach with a lexicon-based approach to analyze the content of topics. The lexicon-based approach uses dictionaries to disclose the semantic orientation of words in a topic. Linguistic Inquiry and Word Count (LIWC) is a linguistics analysis tool that reveals thoughts, feelings, personality, and motivations in a corpus (Karami and Zhou, 2015, 2014a, 2014b). LIWC has acceptable rates of sensitivity, specificity, and English proficiency measures (Golder and Macy, 2011). LIWC has a health-related dictionary that can help to find whether a topic contains words associated with health. In this analysis, we used LIWC to find health-related topics.
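A lexicon-based filter of this kind can be sketched as below. The word set is a small stand-in chosen for illustration, since the LIWC health dictionary itself is not reproduced here, and the threshold `min_hits` is an assumption: the study does not state how many health words a topic needs in order to count as health-related.

```python
# Hypothetical stand-in for a health dictionary; not the LIWC word list.
HEALTH_WORDS = {"diabetes", "obesity", "cancer", "heart", "stroke", "medicine"}


def is_health_related(top_words, lexicon=HEALTH_WORDS, min_hits=1):
    """Flag a topic as health-related if enough of its top words are in the lexicon."""
    # Strip a leading '#' so hashtags are compared as plain words.
    hits = sum(1 for w in top_words if w.lower().lstrip("#") in lexicon)
    return hits >= min_hits
```

Applied to the top words of each discovered topic, a filter like this keeps only the topics that contain health vocabulary.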

3 Results

Obesity and Diabetes showed the highest and the lowest number of tweets (51.7% and 8.0%). Diet and Exercise formed 23.7% and 16.6% of the tweets (Table 1).

| Health Issue | Queries | Number of Tweets | Percentage |
|---|---|---|---|
| Diabetes | diabetes OR #diabetes | 353,655 | 8.0% |
| Diet | diet OR #diet OR dieting | 1,045,374 | 23.7% |
| Exercise | exercise OR #exercise OR exercising | 734,118 | 16.6% |
| Obesity | obesity OR #obesity OR fat | 2,283,517 | 51.7% |

Table 1: DDEO Queries
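The percentages in Table 1 follow directly from the tweet counts. The short sketch below recomputes them; the variable and function names are illustrative only.

```python
# Tweet counts per health issue, copied from Table 1.
TABLE1_COUNTS = {
    "Diabetes": 353_655,
    "Diet": 1_045_374,
    "Exercise": 734_118,
    "Obesity": 2_283_517,
}


def shares(counts):
    """Return each category's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print(sum(TABLE1_COUNTS.values()))  # total number of tweets across the four queries
print(shares(TABLE1_COUNTS))
```

The four counts sum to 4,416,664 tweets, and the recomputed shares match the Percentage column.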

Out of all 4.5 million DDEO-related tweets returned by Twitter’s API, the LDA found 425 topics. We used LIWC to filter the detected 425 topics and found 222 health-related topics. Additionally, we labeled topics based on the availability of DDEO words. For example, if a topic had “diet”, we labeled it as a diet-related topic. As expected and driven by the initial Twitter API query, common topics were Diabetes, Diet, Exercise, and Obesity (DDEO). Table 2 shows that the highest and the lowest number of topics were related to exercise and diabetes (80 and 21 out of 222). Diet and Obesity had similar rates (63 and 58 out of 222).
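The labeling rule described above (a topic containing “diet” is a diet-related topic) can be sketched as a simple word-presence check. How the study chose a main topic when several DDEO words appear in the same topic is not stated, so this hypothetical helper just returns every DDEO term it finds.

```python
DDEO_TERMS = ("diabetes", "diet", "exercise", "obesity")


def ddeo_labels(top_words):
    """Return the DDEO terms that occur among a topic's top words."""
    # Normalize case and drop a leading '#' so hashtags also count.
    words = {w.lower().lstrip("#") for w in top_words}
    return [term for term in DDEO_TERMS if term in words]
```

For instance, the example topic words “helps, diabetes, children, exercise, diet” from Table 3 would receive the labels diabetes, diet, and exercise under this rule.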

Each of the DDEO topics included several common subtopics including both DDEO and non-DDEO terms discovered by the LDA algorithm (Table 2). Common subtopics for “Diabetes”, in order of frequency, included type 2 diabetes, obesity, diet, exercise, blood pressure, heart attack, yoga, and Alzheimer. Common subtopics for “Diet” included obesity, exercise, weight loss [medicine], celebrities, vegetarian, diabetes, religious diet, pregnancy, and mental health. Frequent subtopics for “Exercise” included fitness, obesity, daily plan, diet, brain, diabetes, and computer games. And finally, the most common subtopics for “Obesity” included diet, exercise, children, diabetes, Alzheimer, and cancer (Table 2). Table 3 provides illustrative examples for each of the topics and subtopics.

| Topics | Frequency | Subtopics | Distributions (%) |
|---|---|---|---|
| Diabetes | 21 | Diabetes Type 2 | 42.87% |
| | | Obesity | 14.29% |
| | | Diet | 9.52% |
| | | Exercise | 9.52% |
| | | Blood Pressure | 9.52% |
| | | Heart Attack | 4.76% |
| | | Yoga | 4.76% |
| | | Alzheimer | 4.76% |
| Diet | 63 | Obesity | 39.69% |
| | | Exercise | 15.87% |
| | | Weight Loss | 12.71% |
| | | Celebrities | 9.52% |
| | | Vegetarian | 9.52% |
| | | Diabetes | 3.17% |
| | | Religious Diet | 3.17% |
| | | Weight Loss Medicine | 3.17% |
| | | Pregnancy | 1.59% |
| | | Mental Health | 1.59% |
| Exercise | 80 | Fitness | 32.5% |
| | | Obesity | 22.5% |
| | | Daily Plan | 21.25% |
| | | Diet | 11.25% |
| | | Brain | 8.75% |
| | | Diabetes | 2.50% |
| | | Computer Games | 1.25% |
| Obesity | 58 | Diet | 43.11% |
| | | Exercise | 31.04% |
| | | Children | 17.24% |
| | | Diabetes | 5.17% |
| | | Alzheimer | 1.72% |
| | | Cancer | 1.72% |

Table 2: DDEO Topics and Subtopics - Diabetes, Diet, Exercise, and Obesity are shown with italic and underline styles in subtopics
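As a quick consistency check on Table 2, the distribution percentages can be converted back into approximate numbers of topics. The sketch assumes that each percentage is the subtopic's share of its topic's frequency, which is an interpretation of the table rather than something the text states explicitly; the names are illustrative.

```python
# (frequency, subtopic percentages in table order), copied from Table 2.
TABLE2 = {
    "Diabetes": (21, [42.87, 14.29, 9.52, 9.52, 9.52, 4.76, 4.76, 4.76]),
    "Diet": (63, [39.69, 15.87, 12.71, 9.52, 9.52, 3.17, 3.17, 3.17, 1.59, 1.59]),
    "Exercise": (80, [32.5, 22.5, 21.25, 11.25, 8.75, 2.50, 1.25]),
    "Obesity": (58, [43.11, 31.04, 17.24, 5.17, 1.72, 1.72]),
}


def implied_counts(frequency, percentages):
    """Convert percentage shares into whole numbers of topics."""
    return [round(frequency * p / 100) for p in percentages]


for topic, (freq, pcts) in TABLE2.items():
    counts = implied_counts(freq, pcts)
    print(topic, counts, sum(counts) == freq)
```

Under that assumption the implied counts add up to each topic's frequency (21, 63, 80, and 58), and the four frequencies add up to the 222 health-related topics.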

Further exploration of the subtopics revealed additional patterns of interest (Tables 2 and 3). We found 21 diabetes-related topics with 8 subtopics. While type 2 diabetes was the most frequent of the subtopics, heart attack, yoga, and Alzheimer were the least frequent subtopics for diabetes. Diet had a wide variety of emerging themes ranging from celebrity diet (e.g., Beyonce) to religious diet (e.g., Ramadan). Diet was detected in 63 topics with 10 subtopics; obesity was the most discussed diet-related subtopic, and pregnancy and mental health were the least discussed. Exploring the themes for Exercise subtopics revealed subjects such as computer games (e.g., Pokemon-Go) and brain exercises (e.g., memory improvement). Exercise had 7 subtopics with fitness as the most discussed subtopic and computer games as the least discussed subtopic. Finally, Obesity themes showed topics such as Alzheimer (e.g., research studies) and cancer (e.g., breast cancer). Obesity had the lowest diversity of subtopics: six, with diet as the most discussed subtopic, and Alzheimer and cancer as the least discussed subtopics.

| Blood Pressure | Heart Attack | Diabetes Type II | Yoga | Alzheimer | Obesity | Diet and Exercise | Obesity |
|---|---|---|---|---|---|---|---|
| risk | heart | change | diabetes | medicine | diabetes | helps | health |
| blood | diabetes | diabetes | #yogafightsdiabetes | diseases | surgery | diabetes | diet |
| high | cardiovascular | #lifestyle | yoga | common | treatment | children | obesity |
| diabetes | attack | type | control | drugs | obesity | exercise | immune |
| pressure | stroke | ii | life | Alzheimer | cure | diet | syndrome |

| Vegetarian | Pregnancy Diet | Celebrities Diet | Weight Loss Diet | Weight Loss Medicine | Religious Diet | Mental Health | Exercise & Diabetes |
|---|---|---|---|---|---|---|---|
| diet | pregnancy | diet | weightlose | diet | burning | health | helps |
| eat | motherhood | beyonce | effective | #weightloss | #weightloss | nutrition | diabetes |
| fruits | diet | tips | morning | slimming | fasting | benefits | children |
| vegetables | baby | fatloss | dieting | pills | Ramadan | healing | exercise |
| fresh | motherhood | #angelinajolie | banana | #fatburners | diets | #mentalhealth | diet |

| Diet | Daily Plan | Computer Games | Brain | Fitness | Diet & Diabetes | Obesity | Exercise |
|---|---|---|---|---|---|---|---|
| diet | food | exercise | exercise | fitness | helps | workout | bellyfat |
| exercise | exercise | finding | brain | #gymlife | diabetes | burning | losing |
| protein | calorie | pokemon | improve | bodybuilding | children | exercise | exercise |
| beauty | goal | #pokemongo | memory | gym | exercise | fatburn | ways |
| muscle | completed | hour | performance | workout | diet | obesity | effective |

| Diet | Alzheimer | Cancer | Children | Diabetes |
|---|---|---|---|---|
| health | study | cancer | obesity | diabetes |
| diet | link | breast | kids | surgery |
| obesity | Alzheimer | study | childhood | treatment |
| immune | obesity | risk | rates | obesity |
| syndrome | research | obesity | problem | cure |

Table 3: Topics Examples

Diabetes subtopics show the relation between diabetes and exercise, diet, and obesity. Subtopics of diabetes revealed that users post about the relationship between diabetes and other diseases such as heart attack (Tables 2 and 3). The subtopic Alzheimer is also shown in the obesity subtopics. This overlap between categories prompts the discussion of research and linkages among obesity, diabetes, and Alzheimer’s disease. Type 2 diabetes was another subtopic expressed by users and scientifically documented in the literature.

Figure 2: DDEO Correlation P-Value

The main DDEO topics showed some level of interrelationship by appearing as subtopics of other DDEO topics. The words with italic and underline styles in Table 2 demonstrate the relation among the four DDEO areas. Our results show users’ interest in posting their opinions, sharing information, and conversing about exercise & diabetes, exercise & diet, diabetes & diet, diabetes & obesity, and diet & obesity (Figure 2). The strongest correlation among the topics was determined to be between exercise and obesity. Other notable correlations were between diabetes and obesity, and between diet and obesity (the corresponding p-values are shown in Figure 2).

4 Discussion

Diabetes, diet, exercise, and obesity are common subjects of public health related opinions. Analyzing individual-level opinions by automated algorithmic techniques can be a useful approach to better characterize health opinions of a population. Traditional public health polls and surveys are limited by a small sample size; however, Twitter provides a platform to capture an array of opinions and shared information as expressed in the words of the tweeter. Studies show that Twitter data can be used to discover trending topics, and that there is a strong correlation between Twitter health conversations and Centers for Disease Control and Prevention (CDC) statistics (Prier et al., 2011).

This research provides a computational content analysis approach to conduct a deep analysis using a large data set of tweets. Our framework decodes public health opinions in DDEO-related tweets and can be applied to other public health issues. Among health-related subtopics, there is a wide range of topics from diseases to personal experiences such as participating in religious activities or vegetarian diets.

Diabetes subtopics showed the relationship between diabetes and exercise, diet, and obesity (Tables 2 and 3). Subtopics of diabetes revealed that users posted about the relation between diabetes and other diseases such as heart attack. The subtopic Alzheimer is also shown in the obesity subtopics. This overlap between categories prompts the discussion of research and linkages among obesity, diabetes, and Alzheimer’s disease. Type 2 diabetes was another subtopic that was also expressed by users and scientifically documented in the literature. The inclusion of Yoga in posts about diabetes is interesting. While yoga would certainly be labeled as a form of fitness, when considering the post, it was insightful to see discussion on the mental health benefits that yoga offers to those living with diabetes (Ross and Thomas, 2010).

Diet had the highest number of subtopics. For example, religious diet activities such as fasting during the month of Ramadan for Muslims incorporated two subtopics categorized under the diet topic (Tables 2 and 3). This information has implications for the type of diets that are being practiced in the religious community, and may also help inform religious scholars who focus on health and psychological conditions during fasting. Other religions such as Judaism, Christianity, and Taoism have periods of fasting that were not captured in our data collection, which may have been due to a lack of posts or the timeframe in which we collected data. The diet plans of celebrities were also considered influential in explaining and informing diet opinions of Twitter users (Boyington et al., 2008).

Exercise themes show the Twitter users’ association of exercise with “brain” benefits such as increased memory and cognitive performance (Tables 2 and 3) (Cotman and Berchtold, 2002). The topics also confirm that exercising is associated with controlling diabetes and assisting with meal planning (Laaksonen et al., 2005; Association et al., 2004), and obesity (Ross et al., 2000). Additionally, we found that Twitter users mentioned exercise topics about the use of computer games that assist with exercising. The recent mobile gaming phenomenon Pokemon-Go (Pokeman-Go Game, 2017) was highly associated with the exercise topic. Pokemon-Go allows users to operate in a virtual environment while simultaneously functioning in the real world. Capturing Pokemon, battling characters, and finding physical locations for meeting other users require physical activity to reach predefined locations. These themes reflect the potential of augmented reality to increase patients’ physical activity levels (Schwarzer, 2008).

Obesity had the lowest number of subtopics in our study. Three of the subtopics were related to other diseases such as diabetes (Tables 2 and 3). The scholarly literature has well documented the possible linkages between obesity and chronic diseases such as diabetes (Flegal et al., 2012) as supported by the study results. The topic of children is another prominent subtopic associated with obesity. There has been an increasing number of opinions in regard to child obesity and national health campaigns that have been developed to encourage physical activity among children (PLAY 60 Challenge, 2017). Alzheimer was also identified as a topic under obesity. Although considered a perplexing finding, recent studies have been conducted to identify possible correlation between obesity and Alzheimer’s disease (Verdile et al., 2015; Luchsinger et al., 2012; Kivipelto et al., 2005). Indeed, Twitter users have expressed opinions about the study of Alzheimer’s disease and the linkage between these two topics.

This paper addresses a need for clinical providers, public health experts, and social scientists to utilize a large conversational dataset to collect and utilize population level opinions and information needs. Although our framework is applied to Twitter, the applications from this study can be used in patient communication devices monitored by physicians or weight management interventions with social media accounts, and support large scale population-wide initiatives to promote healthy behaviors and preventative measures for diabetes, diet, exercise, and obesity.

This research has some limitations. First, our DDEO analysis does not take the geographical location of the Twitter users into consideration and thus does not reveal whether certain geographical differences exist. Second, we used a limited number of queries to select the initial pool of tweets, thus perhaps missing tweets that were relevant to DDEO but used unusual terms. Third, our analysis only included tweets generated in one month; however, as our previous work has demonstrated (Turner-McGrievy and Beets, 2015), public opinion can change during a year. Additionally, we did not track individuals across time to detect changes in common themes discussed. Our future research plans include introducing a dynamic framework to collect and analyze DDEO-related tweets during extended time periods (multiple months) and incorporating spatial analysis of DDEO-related tweets.

5 Conclusion

This study represents the first step in developing routine processes to collect, analyze, and interpret DDEO-related posts to social media around health-related topics and presents a transdisciplinary approach to analyzing public discussions around health topics. With 2.34 billion social media users in 2016, the ability to collect and synthesize social media data will continue to grow. Developing methods to make this process more streamlined and robust will allow for more rapid identification of public health trends in real time.

Note: Amir Karami will handle correspondence at all stages of refereeing and publication.

6 Conflict of interest

The authors state that they have no conflict of interest.

7 Acknowledgement

This research was partially supported by the first author’s startup research funding provided by the University of South Carolina, School of Library and Information Science. We thank Jill Chappell-Fail and Jeff Salter at the University of South Carolina College of Information and Communications for assistance with technical support.



  • Arnold and Speier (2012) Arnold, C., Speier, W., 2012. A topic model of clinical reports. In: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, pp. 1031–1032.
  • Arnold et al. (2010) Arnold, C. W., El-Saden, S. M., Bui, A. A., Taira, R., 2010. Clinical case-based retrieval using latent topic analysis. In: AMIA Annual Symposium Proceedings. Vol. 2010. American Medical Informatics Association, p. 26.
  • Asou and Eguchi (2008) Asou, T., Eguchi, K., 2008. Predicting protein-protein relationships from literature using collapsed variational latent dirichlet allocation. In: Proceedings of the 2nd international workshop on Data and text mining in bioinformatics. ACM, pp. 77–80.
  • Association et al. (2004) Association, A. D., et al., 2004. Physical activity/exercise and diabetes. Diabetes care 27 (suppl 1), s58–s62.
  • Barnard et al. (2009) Barnard, N. D., Cohen, J., Jenkins, D. J., Turner-McGrievy, G., Gloede, L., Green, A., Ferdowsian, H., 2009. A low-fat vegan diet and a conventional diabetes diet in the treatment of type 2 diabetes: a randomized, controlled, 74-wk clinical trial. The American journal of clinical nutrition, ajcn–26736H.
  • Blei et al. (2003) Blei, D. M., Ng, A. Y., Jordan, M. I., 2003. Latent dirichlet allocation. Journal of machine Learning research 3, 993–1022.
  • Bollen et al. (2011) Bollen, J., Mao, H., Zeng, X., 2011. Twitter mood predicts the stock market. Journal of Computational Science 2 (1), 1–8.
  • Boyington et al. (2008) Boyington, J. E., Carter-Edwards, L., Piehl, M., Hutson, J., Langdon, D., McManus, S., 2008. Cultural attitudes toward weight, diet, and physical activity among overweight african american girls. Preventing chronic disease 5 (2).
  • Boyle et al. (2001) Boyle, J. P., Honeycutt, A. A., Narayan, K. V., Hoerger, T. J., Geiss, L. S., Chen, H., Thompson, T. J., 2001. Projection of diabetes burden through 2050. Diabetes care 24 (11), 1936–1940.
  • Buntain and Golbeck (2015) Buntain, C., Golbeck, J., 2015. This is your twitter on drugs: Any questions? In: Proceedings of the 24th International Conference on World Wide Web. ACM, pp. 777–782.
  • Coppersmith et al. (2015) Coppersmith, G., Dredze, M., Harman, C., Hollingshead, K., 2015. From adhd to sad: Analyzing the language of mental health on twitter through self-reported diagnoses. NAACL HLT 2015, 1.
  • Cotman and Berchtold (2002) Cotman, C. W., Berchtold, N. C., 2002. Exercise: a behavioral intervention to enhance brain health and plasticity. Trends in neurosciences 25 (6), 295–301.
  • Culotta (2010) Culotta, A., 2010. Towards detecting influenza epidemics by analyzing twitter messages. In: Proceedings of the first workshop on social media analytics. ACM, pp. 115–122.
  • Dahl et al. (2016) Dahl, A. A., Hales, S. B., Turner-McGrievy, G. M., 2016. Integrating social media into weight loss interventions. Current Opinion in Psychology 9, 11–15.
  • Eichstaedt et al. (2015) Eichstaedt, J. C., Schwartz, H. A., Kern, M. L., Park, G., Labarthe, D. R., Merchant, R. M., Jha, S., Agrawal, M., Dziurzynski, L. A., Sap, M., et al., 2015. Psychological language on twitter predicts county-level heart disease mortality. Psychological science 26 (2), 159–169.
  • Flegal et al. (2012) Flegal, K. M., Carroll, M. D., Kit, B. K., Ogden, C. L., 2012. Prevalence of obesity and trends in the distribution of body mass index among us adults, 1999-2010. Jama 307 (5), 491–497.
  • Fu et al. (2016) Fu, K.-W., Liang, H., Saroha, N., Tse, Z. T. H., Ip, P., Fung, I. C.-H., 2016. How people react to zika virus outbreaks on twitter? a computational content analysis. American Journal of Infection Control.
  • Ghosh and Guha (2013) Ghosh, D., Guha, R., 2013. What are we tweeting about obesity? mapping tweets with topic modeling and geographic information system. Cartography and geographic information science 40 (2), 90–102.
  • Golder and Macy (2011) Golder, S. A., Macy, M. W., 2011. Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science 333 (6051), 1878–1881.
  • Hanson et al. (2013) Hanson, C. L., Cannon, B., Burton, S., Giraud-Carrier, C., 2013. An exploration of social circles and prescription drug abuse through twitter. Journal of medical Internet research 15 (9), e189.
  • Harris et al. (2014) Harris, J. K., Moreland-Russell, S., Tabak, R. G., Ruhr, L. R., Maier, R. C., 2014. Communication about childhood obesity on twitter. American journal of public health 104 (7), e62–e69.
  • Harris et al. (2013) Harris, J. K., Mueller, N. L., Snider, D., Haire-Joshu, D., 2013. Peer reviewed: Local health department use of twitter to disseminate diabetes information, united states. Preventing chronic disease 10.
  • Hartz et al. (1983) Hartz, A. J., Rupley, D. C., Kalkhoff, R. D., Rimm, A. A., 1983. Relationship of obesity to diabetes: influence of obesity level and body fat distribution. Preventive medicine 12 (2), 351–357.
  • Harvard HSPH (2017) Harvard HSPH, 2017. Obesity Trends.
  • Hill et al. (2012) Hill, J. O., Wyatt, H. R., Peters, J. C., 2012. Energy balance and obesity. Circulation 126 (1), 126–132.
  • Hong and Davison (2010) Hong, L., Davison, B. D., 2010. Empirical study of topic modeling in twitter. In: Proceedings of the first workshop on social media analytics. ACM, pp. 80–88.
  • Karami (2015) Karami, A., 2015. Fuzzy topic modeling for medical corpora. Ph.D. thesis, University of Maryland, Baltimore County.
  • Karami and Gangopadhyay (2014) Karami, A., Gangopadhyay, A., 2014. Fftm: A fuzzy feature transformation method for medical documents. In: Proceedings of the Conference of the Association for Computational Linguistics (ACL). Vol. 128.
  • Karami et al. (2015a) Karami, A., Gangopadhyay, A., Zhou, B., Kharrazi, H., 2015a. Flatm: A fuzzy logic approach topic model for medical documents. In: Proceedings of the Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS). IEEE.
  • Karami et al. (2015b) Karami, A., Gangopadhyay, A., Zhou, B., Kharrazi, H., 2015b. A fuzzy approach model for uncovering hidden latent semantic structure in medical text collections. In: Proceedings of the iConference.
  • Karami et al. (2017) Karami, A., Gangopadhyay, A., Zhou, B., Kharrazi, H., 2017. Fuzzy approach topic discovery in health and medical corpora. International Journal of Fuzzy Systems, 1–12.
  • Karami and Zhou (2015) Karami, A., Zhou, B., 2015. Online review spam detection by new linguistic features. iConference 2015 Proceedings.
  • Karami and Zhou (2014a) Karami, A., Zhou, L., 2014a. Exploiting latent content based features for the detection of static sms spams. the 77th annual meeting of the association for information science and technology (ASIST).
  • Karami and Zhou (2014b) Karami, A., Zhou, L., 2014b. Improving static sms spam detection by using new content-based features. the 20th americas conference on information systems (AMCIS).
  • Kivipelto et al. (2005) Kivipelto, M., Ngandu, T., Fratiglioni, L., Viitanen, M., Kåreholt, I., Winblad, B., Helkala, E.-L., Tuomilehto, J., Soininen, H., Nissinen, A., 2005. Obesity and vascular risk factors at midlife and the risk of dementia and alzheimer disease. Archives of neurology 62 (10), 1556–1560.
  • Kopelman (2000) Kopelman, P. G., 2000. Obesity as a medical problem. Nature 404 (6778), 635–643.
  • Laaksonen et al. (2005) Laaksonen, D. E., Lindström, J., Lakka, T. A., Eriksson, J. G., Niskanen, L., Wikström, K., Aunola, S., Keinänen-Kiukaanniemi, S., Laakso, M., Valle, T. T., et al., 2005. Physical activity in the prevention of type 2 diabetes. Diabetes 54 (1), 158–165.
  • Lampos and Cristianini (2010) Lampos, V., Cristianini, N., 2010. Tracking the flu pandemic by monitoring the social web. In: 2010 2nd International Workshop on Cognitive Information Processing. IEEE, pp. 411–416.
  • Lampos and Cristianini (2012) Lampos, V., Cristianini, N., 2012. Nowcasting events from the social web with statistical learning. ACM Transactions on Intelligent Systems and Technology (TIST) 3 (4), 72.
  • Lampos et al. (2010) Lampos, V., De Bie, T., Cristianini, N., 2010. Flu detector-tracking epidemics on twitter. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, pp. 599–602.
  • Lazard et al. (2015) Lazard, A. J., Scheinfeld, E., Bernhardt, J. M., Wilcox, G. B., Suran, M., 2015. Detecting themes of public concern: A text mining analysis of the centers for disease control and prevention’s ebola live twitter chat. American journal of infection control 43 (10), 1109–1111.
  • Lu et al. (2011) Lu, Y., Mei, Q., Zhai, C., 2011. Investigating task performance of probabilistic topic models: an empirical study of plsa and lda. Information Retrieval 14 (2), 178–203.
  • Luchsinger et al. (2012) Luchsinger, J. A., Cheng, D., Tang, M. X., Schupf, N., Mayeux, R., 2012. Central obesity in the elderly is related to late onset alzheimer's disease. Alzheimer disease and associated disorders 26 (2), 101.
  • Mcauliffe and Blei (2008) Mcauliffe, J. D., Blei, D. M., 2008. Supervised topic models. In: Proceedings of the Advances in neural information processing systems. pp. 121–128.
  • McCallum (2002) McCallum, A. K., 2002. Mallet: A machine learning for language toolkit.
  • Mejova et al. (2015) Mejova, Y., Weber, I., Macy, M. W., 2015. Twitter: a digital socioscope. Cambridge University Press.
  • Nasukawa and Yi (2003) Nasukawa, T., Yi, J., 2003. Sentiment analysis: Capturing favorability using natural language processing. In: Proceedings of the 2nd international conference on Knowledge capture. ACM, pp. 70–77.
  • Odlum and Yoon (2015) Odlum, M., Yoon, S., 2015. What can we learn about the ebola outbreak from tweets? American journal of infection control 43 (6), 563–571.
  • Olanoff (2015) Olanoff, D., 2015. Twitter Monthly Active Users Crawl To 316M, Dorsey: We Are Not Satisfied.
  • Paul and Dredze (2011) Paul, M. J., Dredze, M., 2011. You are what you tweet: Analyzing twitter for public health. ICWSM 20, 265–272.
  • Paul and Dredze (2012) Paul, M. J., Dredze, M., 2012. A model for mining public health topics from twitter. Health 11, 16–6.
  • PLAY 60 Challenge (2017) PLAY 60 Challenge, 2017. NFL.
  • Pokeman-Go Game (2017) Pokeman-Go Game, 2017. Niantic, Inc.
  • Prier et al. (2011) Prier, K. W., Smith, M. S., Giraud-Carrier, C., Hanson, C. L., 2011. Identifying health-related topics on twitter. In: International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction. Springer, pp. 18–25.
  • Ritterman et al. (2009) Ritterman, J., Osborne, M., Klein, E., 2009. Using prediction markets and twitter to predict a swine flu pandemic. In: 1st international workshop on mining social media. Vol. 9. pp. 9–17.
  • Ross and Thomas (2010) Ross, A., Thomas, S., 2010. The health benefits of yoga and exercise: a review of comparison studies. The journal of alternative and complementary medicine 16 (1), 3–12.
  • Ross et al. (2000) Ross, R., Dagnone, D., Jones, P. J., Smith, H., Paddags, A., Hudson, R., Janssen, I., 2000. Reduction in obesity and related comorbid conditions after diet-induced weight loss or exercise-induced weight loss in men: a randomized, controlled trial. Annals of internal medicine 133 (2), 92–103.
  • Scanfeld et al. (2010) Scanfeld, D., Scanfeld, V., Larson, E. L., 2010. Dissemination of health information through social networks: Twitter and antibiotics. American journal of infection control 38 (3), 182–188.
  • Schwarzer (2008) Schwarzer, R., 2008. Modeling health behavior change: How to predict and modify the adoption and maintenance of health behaviors. Applied Psychology 57 (1), 1–29.
  • Statista (2017) Statista, 2017. Number of social media users worldwide from 2010 to 2020.
  • Szomszor et al. (2010) Szomszor, M., Kostkova, P., De Quincey, E., 2010. # swineflu: Twitter predicts swine flu outbreak in 2009. In: International Conference on Electronic Healthcare. Springer, pp. 18–26.
  • Tompson et al. (2012) Tompson, T., Benz, J., Agiesta, J., Brewer, K., Bye, L., Reimer, R., Junius, D., 2012. Obesity in the united states: public perceptions. The food industry 53 (26), 21.
  • Tumasjan et al. (2010) Tumasjan, A., Sprenger, T. O., Sandner, P. G., Welpe, I. M., 2010. Predicting elections with twitter: What 140 characters reveal about political sentiment. In: ICWSM 10.
  • Turner-McGrievy and Beets (2015) Turner-McGrievy, G. M., Beets, M. W., 2015. Tweet for health: using an online social network to examine temporal trends in weight loss-related posts. Translational behavioral medicine 5 (2), 160–166.
  • Twitter (2017) Twitter, 2017. Twitter Developer Documentation.
  • Verdile et al. (2015) Verdile, G., Keane, K. N., Cruzat, V. F., Medic, S., Sabale, M., Rowles, J., Wijesekara, N., Martins, R. N., Fraser, P. E., Newsholme, P., 2015. Inflammation and oxidative stress: the molecular connectivity between insulin resistance, obesity, and alzheimer's disease. Mediators of inflammation 2015.
  • Vickey et al. (2013) Vickey, T. A., Ginis, K. M., Dabrowski, M., 2013. Twitter classification model: the abc of two million fitness tweets. Translational behavioral medicine 3 (3), 304–311.
  • Wallach et al. (2009) Wallach, H. M., Murray, I., Salakhutdinov, R., Mimno, D., 2009. Evaluation methods for topic models. In: Proceedings of the 26th Annual International Conference on Machine Learning. ACM, pp. 1105–1112.
  • Wartell (2015) Wartell, D., 2015. The geography of obesity: Predicting obesity rates in california based on access to health care.
  • Wiebe et al. (2003) Wiebe, J., Breck, E., Buckley, C., Cardie, C., Davis, P., Fraser, B., Litman, D. J., Pierce, D. R., Riloff, E., Wilson, T., et al., 2003. Recognizing and organizing opinions expressed in the world press. In: New Directions in Question Answering. pp. 12–19.
  • Wing et al. (2001) Wing, R. R., Goldstein, M. G., Acton, K. J., Birch, L. L., Jakicic, J. M., Sallis, J. F., Smith-West, D., Jeffery, R. W., Surwit, R. S., 2001. Behavioral science research in diabetes lifestyle changes related to obesity, eating behavior, and physical activity. Diabetes care 24 (1), 117–123.
  • World Health Organization Fact Sheet (2016) World Health Organization Fact Sheet, 2016. Obesity and Overweight.
  • Zabin and Jefferies (2008) Zabin, J., Jefferies, A., 2008. Social media monitoring and analysis: Generating consumer insights from online conversation. Aberdeen Group Benchmark Report 37 (9).
  • Zhao et al. (2011) Zhao, W. X., Jiang, J., Weng, J., He, J., Lim, E.-P., Yan, H., Li, X., 2011. Comparing twitter and traditional media using topic models. In: European Conference on Information Retrieval. Springer, pp. 338–349.