COVID-19 and Mental Health/Substance Use Disorders on Reddit: A Longitudinal Study

by Amanuel Alambo, et al.

The COVID-19 pandemic has adversely and disproportionately impacted people suffering from mental health issues and substance use problems. This has been exacerbated by social isolation during the pandemic and by the social stigma associated with mental health and substance use disorders, which makes people reluctant to share their struggles and seek help. Because of the anonymity and privacy it provides, social media has emerged as a convenient medium for people to share their day-to-day struggles. Reddit is a well-recognized social media platform that provides focused and structured forums called subreddits, which users subscribe to in order to discuss their experiences with others. Temporal assessment of the topical correlation between social media postings about mental health/substance use and postings about Coronavirus is crucial to better understand public sentiment on the pandemic and its evolving impact, especially on vulnerable populations. In this study, we conduct a longitudinal topical analysis of postings between subreddits r/depression, r/Anxiety, r/SuicideWatch, and r/Coronavirus, and postings between subreddits r/opiates, r/OpiatesRecovery, r/addiction, and r/Coronavirus, from January 2020 to October 2020. Our results show a high topical correlation between postings in r/depression and r/Coronavirus in September 2020. Further, the topical correlation between postings on substance use disorders and Coronavirus fluctuates, showing the highest correlation in August 2020. By monitoring these trends on platforms such as Reddit, epidemiologists and mental health professionals can gain insights into the challenges faced by communities and use them for targeted interventions.




1 Introduction

The number of people suffering from mental health or substance use disorders has significantly increased during the COVID-19 pandemic. In June 2020, 40% of adults in the United States were identified as suffering from disorders related to depression or drug abuse. In addition to the uncertainty about the future during the pandemic, policies such as social isolation that are enacted to contain the spread of COVID-19 have brought additional physical and emotional stress on the public. During these unpredictable and hard times, those who misuse or abuse alcohol and/or other drugs can be especially vulnerable.

Due to the stigma surrounding mental health and substance use, people generally do not share their struggles with others, and this is further aggravated by the lack of physical interactions during the pandemic. With most activities moving online, and given the privacy and anonymity they offer, social media platforms have become common venues for people to share their struggles with depression, anxiety, suicidal thoughts, and substance use disorders. Reddit is one of the widely used social media platforms that offers convenient access for users to engage in discussions with others on sensitive topics such as mental health or substance use. The forum-like structure of subreddits enables users to discuss a specific topic with others and seek advice without disclosing their identities.

We conduct an initial longitudinal study of the extent of topical overlap between user-generated content on mental health and substance use disorders and content on COVID-19 during the period from January 2020 until October 2020. For mental health, our study is focused on subreddits r/depression, r/Anxiety, and r/SuicideWatch. Similarly, for substance use, we use subreddits r/Opiates, r/OpiatesRecovery, and r/addiction. We use subreddit r/Coronavirus for extracting user postings on Coronavirus. To constrain our search for relevance, we collect postings in mental health/substance use subreddits that contain at least one of the keywords in a Coronavirus dictionary. Similarly, to collect postings related to mental health/substance use in r/Coronavirus, we use the DSM-5 lexicon [7], PHQ-9 lexicons [12], and Drug Abuse Ontology (DAO) [3]. We implement a topic modeling algorithm [2] for generating topics. Furthermore, we explore two variations of the Bidirectional Encoder Representations from Transformers (BERT) [5] model for representing the topics and computing topical correlation among different pairs of subreddits on Mental Health/Substance Use and r/Coronavirus. The topical correlations are computed for each of the months from January 2020 to October 2020.

The rest of the paper is organized as follows. Section 2 discusses the related work, followed by Section 3, which presents the method we followed, including data collection, linguistic analysis, and model building. Section 4 presents our results, which show higher correlation between topics discussed in a mental health or substance use subreddit and topics discussed in r/Coronavirus after June 2020 than during the first five months of 2020. Finally, Section 5 concludes the paper and outlines future work.

2 Related Work

In the last few months, there has been a high number of cases and deaths related to COVID-19, which led governments to respond rapidly to the crisis [10]. Topic modeling of social media postings related to COVID-19 has been used to produce valuable information during the pandemic. While Yin et al. [13] studied trending topics, Medford et al. [8] studied the change in topics on Twitter during the pandemic. Stokes et al. [11] applied topic modeling to Reddit content and found it to be effective in identifying patterns of public dialogue to guide targeted interventions. Furthermore, there has been a growing amount of work on the relationship between mental health or substance use and COVID-19. Ettman et al. [6] studied the prevalence of depressive symptoms in US adults before and during the COVID-19 pandemic, while Asmundson et al. [1] compared the susceptibility to stressors that might lead to mental disorder between people with existing anxiety disorders and the general population. Czeisler et al. [4] assessed mental health, substance use, and suicidal ideation using panel survey data collected from US adults in June 2020. They observed that people with pre-existing mental disorders are more likely to be adversely affected by the different stressors during the COVID-19 pandemic. We propose an approach to study the relationship between topics discussed in mental health/substance use subreddits and topics discussed in the r/Coronavirus subreddit.

3 Methods

3.1 Data Collection

In this study, we crawl Reddit for user postings in subreddits r/depression, r/Anxiety, r/SuicideWatch, r/Opiates, r/OpiatesRecovery, r/addiction, and r/Coronavirus. To keep our queries focused so that relevant postings are returned from each category of subreddits, we use mental health/substance use lexicons while crawling for postings in subreddit r/Coronavirus; similarly, we use the WebMD Coronavirus glossary of terms to query for postings in the mental health/substance use subreddits. Table-1 shows the size of the data collected for each subreddit over three consecutive periods of three to four months each.
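The keyword-constrained collection described above can be sketched as a simple lexicon filter. This is a minimal illustration, not the crawler used in the study: the function names `build_matcher` and `filter_posts`, the sample postings, and the three-term lexicon are hypothetical stand-ins for the Coronavirus glossary and the DSM-5, PHQ-9, and DAO lexicons, and whole-word, case-insensitive matching is an assumption.

```python
import re
from typing import Iterable, List


def build_matcher(lexicon: Iterable[str]) -> "re.Pattern":
    """Compile one case-insensitive pattern matching any lexicon term as a whole word or phrase."""
    terms = sorted({t.strip().lower() for t in lexicon if t.strip()}, key=len, reverse=True)
    if not terms:
        raise ValueError("lexicon must contain at least one non-empty term")
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def filter_posts(posts: Iterable[str], lexicon: Iterable[str]) -> List[str]:
    """Keep only the postings that mention at least one lexicon term."""
    matcher = build_matcher(lexicon)
    return [post for post in posts if matcher.search(post)]


# Hypothetical example data (not taken from the study's dataset).
sample_posts = [
    "Lockdown made my anxiety worse",
    "Relapsed last week",
    "Tested positive for COVID-19 yesterday",
]
sample_lexicon = ["quarantine", "lockdown", "covid-19"]

relevant = filter_posts(sample_posts, sample_lexicon)
print(relevant)
```

The same filter would be applied in both directions: a Coronavirus term list against the mental health/substance use subreddits, and the mental health/substance use lexicons against r/Coronavirus.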

3.2 Analysis

We build a corpus of user postings from January 2020 to October 2020 for each of the subreddits. For better interpretability during topic modeling, we generate bigrams and trigrams from the collection of postings for each month using gensim’s implementation of the skip-gram model [9]. We then train an LDA topic model with the objective of maximizing the coherence scores over the collections of bigrams and trigrams. As we are interested in computing the topical correlation between topics in a mental health/substance use subreddit and topics in r/Coronavirus, we use deep representation learning to represent a topic from its constituent keywords. We employ transformer-based bidirectional language modeling with two models: 1) a language model that is pre-trained on a huge generic corpus; and 2) a language model that we tune on a domain-specific corpus. Thus, we experiment with two approaches:

  1. We use a vanilla BERT [5] model to represent each of the keywords in a topic. A topic is then represented as a concatenation of the representations of its keywords after which we perform dimensionality reduction to 300 units using t-SNE.

  2. We fine-tune a BERT model on a sequence classification task on our dataset, where user postings from a mental health/substance use subreddit or r/Coronavirus are labeled positive and postings from a control subreddit are labeled negative. For subreddits r/depression, r/Anxiety, and r/SuicideWatch, we fine-tune one BERT model, which we call MH-BERT, and for subreddits r/opiates, r/OpiatesRecovery, and r/addiction, we fine-tune a different BERT model, which we designate SU-BERT. We do the same for subreddit r/Coronavirus. Finally, the fine-tuned BERT model is used for topic representation.
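The bigram and trigram generation step described at the start of this section relies on phrase detection. As a rough sketch of the idea, the code below scores adjacent word pairs with the heuristic described in [9], score(a, b) = (count(a b) - delta) / (count(a) * count(b)), and joins pairs whose score exceeds a threshold. It is a plain-Python illustration rather than gensim's implementation; the function names, the toy sentences, and the values of `delta` and `threshold` are assumptions chosen for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def score_bigrams(sentences: List[List[str]], delta: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Score each adjacent token pair: (count(a b) - delta) / (count(a) * count(b))."""
    unigrams = Counter(tok for sent in sentences for tok in sent)
    bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    return {
        (a, b): (n - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n in bigrams.items()
    }


def merge_phrases(sentences: List[List[str]], threshold: float, delta: float = 1.0) -> List[List[str]]:
    """Join adjacent tokens whose score exceeds the threshold into a single 'a_b' token."""
    scores = score_bigrams(sentences, delta)
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and scores.get((sent[i], sent[i + 1]), 0.0) > threshold:
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2  # skip the second half of the merged pair
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


# Hypothetical tokenized postings (not taken from the study's dataset).
toy_sentences = [
    ["wear", "mask", "outside"],
    ["wear", "mask", "daily"],
    ["people", "wear", "mask"],
    ["mask", "people", "outside"],
]

with_bigrams = merge_phrases(toy_sentences, threshold=0.1)
print(with_bigrams)
```

Running a second pass of `merge_phrases` over its own output is one way to obtain trigrams, since an already merged bigram can then be joined with a neighboring token.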

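| Subreddit | Search Keywords | Postings (Period 1) | Postings (Period 2) | Postings (Period 3) |
| --- | --- | --- | --- | --- |
| Coronavirus | Opiates (DAO + DSM-5) | 5934 | 4848 | 2146 |
| Coronavirus | OpiatesRecovery (DSM-5) | 2167 | 1502 | 400 |
| Coronavirus | addiction (DSM-5) | 154 | 204 | 30 |
| Coronavirus | Anxiety (DSM-5) | 6690 | 750 | 214 |
| Coronavirus | Depression (PHQ-9) | 432 | 366 | 130 |
| Coronavirus | Suicide Lexicon | 596 | 588 | 234 |
| opiates | Coronavirus Glossary of terms | 534 | 823 | 639 |
| OpiatesRecovery | Coronavirus Glossary of terms | 226 | 540 | 514 |
| addiction | Coronavirus Glossary of terms | 192 | 794 | 772 |
| anxiety | Coronavirus Glossary of terms | 2582 | 2862 | 5478 |
| depression | Coronavirus Glossary of terms | 4128 | 8524 | 17250 |
| suicide | Coronavirus Glossary of terms | 944 | 1426 | 814 |

Table 1: Dataset size in terms of number of postings used for this study. The three count columns correspond to the three consecutive collection periods.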

Once topics are represented using a vanilla BERT or MH-BERT/SU-BERT embedding, we compute inter-topic similarities among topics in an MH/SU subreddit with subreddit r/Coronavirus for each of the months from January 2020 to October 2020.
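The paper does not state which similarity measure is used, so the sketch below assumes cosine similarity between topic vectors and takes the mean over all topic pairs as the topical correlation for one month. The function names and the toy two-dimensional vectors are hypothetical; in the study each topic vector would be a BERT-based representation reduced to 300 dimensions.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def topical_correlation(topics_a: List[Sequence[float]], topics_b: List[Sequence[float]]) -> float:
    """Mean pairwise cosine similarity between two sets of topic vectors (assumed aggregation)."""
    if not topics_a or not topics_b:
        raise ValueError("both topic sets must be non-empty")
    sims = [cosine_similarity(a, b) for a in topics_a for b in topics_b]
    return sum(sims) / len(sims)


# Hypothetical topic vectors for one month (toy values, not study data).
mh_topics = [[1.0, 0.0], [0.0, 1.0]]
covid_topics = [[1.0, 0.0]]

print(topical_correlation(mh_topics, covid_topics))
```

Repeating this computation for each month and each subreddit pair would give one time series per pair, which is the kind of curve plotted in the temporal figures.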

4 Results and Discussion

We report our findings using vanilla BERT and a fine-tuned BERT model for topic representation. Figure-1 and Figure-2 show the topical correlation results using vanilla BERT and a fine-tuned BERT model, respectively. We can see from the figures that there is a significant topical correlation between postings in a mental health subreddit and postings in r/Coronavirus during the period from May 2020 to September 2020, with each of the subreddits corresponding to a mental health disorder peaking in a different month. For substance use, we see higher topical correlation during the period after June 2020. While the results using a fine-tuned BERT model show similar trends to vanilla BERT, they give higher values for the topical correlation scores. We present a pair of groups of topics that have low topical correlation and another pair with high topical correlation. To illustrate low topical correlation, we show the topics generated for r/OpiatesRecovery and r/Coronavirus during APR - JUN (Table-2). For high topical correlation, we show topics in r/SuicideWatch and r/Coronavirus for the period JUN - AUG (Table-3).

Figure 1: Temporal Topical Correlation using vanilla BERT-based topical representations.
Figure 2: Temporal Topical Correlation using fine-tuned BERT-based representations.

From Figure-1 and Figure-2, we see that r/Coronavirus vs. r/depression has the highest topical correlation in September, followed by May. We also see that the fine-tuned BERT model gives larger absolute topical correlation scores than vanilla BERT, although the topics and keywords are the same under both representation techniques; that is, the same keywords in a topic receive different representations from the vanilla and fine-tuned BERT models. These different keyword representations, and hence different topic representations, yield the different topical correlation scores seen in Figure-1 and Figure-2.

| Subreddit | Topic-1 Keywords | Topic-2 Keywords |
| --- | --- | --- |
| r/OpiatesRecovery | ('practice_social', 0.34389842) | ('life_relation', 0.03288892) |
| | ('sense_purpose', 0.0046864394) | ('nonperishable_normal', 0.03288892) |
| | ('shift_hope', 0.0046864394) | ('great_worried', 0.03288892) |
| | ('quarantine_guess', 0.0046864394) | ('head_post', 0.03288892) |
| | ('real_mess', 0.0046864394) | ('hide_work', 0.03288892) |
| | ('relate_effect', 0.0046864394) | ('high_relapse', 0.03288892) |
| | ('return_work', 0.0046864394) | ('kid_spend', 0.03288892) |
| | ('rule_weekly', 0.0046864394) | ('want_express', 0.03288892) |
| | ('pray_nation', 0.0046864394) | ('covid_hear', 0.03288892) |
| | ('severe_morning', 0.0046864394) | ('live_case', 0.03288892) |
| r/Coronavirus | ('hong_kong', 0.13105245) | ('https_reddit', 0.119571894) |
| | ('confirmed_case', 0.060333144) | ('recovered_patient', 0.061810568) |
| | ('discharge_hospital', 0.025191015) | ('mortality_rate', 0.041617688) |
| | ('fully_recovere', 0.020352725) | ('total_confirm_total', 0.029896544) |
| | ('interest_rate', 0.017662792) | ('reddit_https', 0.029568143) |
| | ('confuse_percentage', 0.016409962) | ('test_positive', 0.02851962) |
| | ('compare_decrease', 0.016409962) | ('total_confirm', 0.026607841) |
| | ('day_difference', 0.016409962) | ('disease_compare_transmission', 0.024460778) |
| | ('people_observation', 0.014938728) | ('tested_positive', 0.019345615) |
| | ('yesterday_update', 0.0142750405) | ('people_list_condition', 0.017226003) |

Table 2: Top two topics for subreddit pair with low topical correlation.

| Subreddit | Topic-1 Keywords | Topic-2 Keywords |
| --- | --- | --- |
| r/SuicideWatch | ('lose_mind', 0.07695319) | ('lose_job', 0.07087262) |
| | ('commit_suicide', 0.06495) | ('suicidal_thought', 0.055275694) |
| | ('hate_life', 0.04601657) | ('leave_house', 0.052148584) |
| | ('stream_digital_art', 0.04184869) | ('alot_people', 0.0401444) |
| | ('suicide_attempt', 0.0353566) | ('fear_anxiety', 0.029107107) |
| | ('social_distance', 0.033066332) | ('push_edge', 0.0288938) |
| | ('tired_tired', 0.03140721) | ('spend_night', 0.027937064) |
| | ('depression_anxiety', 0.029040402) | ('anxiety_depression', 0.027174871) |
| | ('online_classe', 0.022438377) | ('couple_day', 0.026346965) |
| | ('hurt_badly', 0.022400128) | ('suicide_method', 0.026167406) |
| r/Coronavirus | ('wear_mask', 0.069305405) | ('priority_medical_treatment', 0.033756804) |
| | ('infection_rate', 0.03957882) | ('coronavirus_crisis_worry', 0.0295495) |
| | ('coronavirus_fear', 0.028482975) | ('depress_lonely', 0.028835943) |
| | ('health_official', 0.027547654) | ('essential_business', 0.027301027) |
| | ('middle_age', 0.02721413) | ('fear_anxiety', 0.02715925) |
| | ('coronavirus_death', 0.024511503) | ('death_coronavirus', 0.026890624) |
| | ('suicide_thought', 0.023732582) | ('adjustment_reaction', 0.026794448) |
| | ('immune_system', 0.021382293) | ('die_coronavirus_fear', 0.026734803) |
| | ('case_fatality_rate', 0.020946493) | ('declare_state_emergency', 0.026288562) |
| | ('panic_buying', 0.02040879) | ('jump_gun', 0.025342517) |

Table 3: Top two topics for subreddit pair with high topical correlation.

We generally see higher topical correlation scores with the fine-tuned BERT-based representation because a fine-tuned BERT model has a smaller semantic space than a vanilla BERT model, so keywords across different topics lie at a smaller semantic distance from one another. According to our analysis, high topical overlap implies a close connection and mutual impact between postings in one subreddit and postings in another.

5 Conclusion and Future Work

In this study, we conducted a longitudinal analysis of the topical correlation between social media postings in mental health or substance use subreddits and a Coronavirus subreddit. Our analysis reveals that the period including and following Summer 2020 shows higher correlation between topics discussed by users in mental health or substance use groups and those discussed in r/Coronavirus. Our analysis can give insight into how the sentiment of social media users in one group can influence or be influenced by users in another group. This makes it possible to capture and understand the impact of topics discussed in r/Coronavirus on other subreddits over time. In the future, we plan to investigate user-level and posting-level features to further study how the collective sentiment of users in one subreddit relates to that in another subreddit. Our study can provide insight into how discussions of mental health/substance use and of the Coronavirus pandemic relate to one another over time, in support of epidemiological intervention.


  • [1] Asmundson, G.J., Paluszek, M.M., Landry, C.A., Rachor, G.S., McKay, D., Taylor, S.: Do pre-existing anxiety-related and mood disorders differentially impact covid-19 stress responses and coping? Journal of anxiety disorders 74, 102271 (2020)
  • [2] Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. Journal of machine Learning research 3(Jan), 993–1022 (2003)
  • [3] Cameron, D., Smith, G.A., Daniulaityte, R., Sheth, A.P., Dave, D., Chen, L., Anand, G., Carlson, R., Watkins, K.Z., Falck, R.: Predose: a semantic web platform for drug abuse epidemiology using social media. Journal of biomedical informatics 46(6), 985–997 (2013)
  • [4] Czeisler, M.É., Lane, R.I., Petrosky, E., Wiley, J.F., Christensen, A., Njai, R., Weaver, M.D., Robbins, R., Facer-Childs, E.R., Barger, L.K., et al.: Mental health, substance use, and suicidal ideation during the covid-19 pandemic—united states, june 24–30, 2020. Morbidity and Mortality Weekly Report 69(32),  1049 (2020)
  • [5] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
  • [6] Ettman, C.K., Abdalla, S.M., Cohen, G.H., Sampson, L., Vivier, P.M., Galea, S.: Prevalence of depression symptoms in us adults before and during the covid-19 pandemic. JAMA network open 3(9), e2019686–e2019686 (2020)
  • [7] Gaur, M., Kursuncu, U., Alambo, A., Sheth, A., Daniulaityte, R., Thirunarayan, K., Pathak, J.: “Let me tell you about your mental health!” contextualized classification of reddit posts to dsm-5 for web-based intervention. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management. pp. 753–762 (2018)
  • [8] Medford, R.J., Saleh, S.N., Sumarsono, A., Perl, T.M., Lehmann, C.U.: An “infodemic”: Leveraging High-Volume twitter data to understand public sentiment for the COVID-19 outbreak
  • [9] Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems. pp. 3111–3119 (2013)
  • [10] Sands, P., Mundaca-Shah, C., Dzau, V.J.: The neglected dimension of global security—a framework for countering infectious-disease crises. New England Journal of Medicine 374(13), 1281–1287 (2016)
  • [11] Stokes, D.C., Andy, A., Guntuku, S.C., Ungar, L.H., Merchant, R.M.: Public priorities and concerns regarding covid-19 in an online discussion forum: Longitudinal topic modeling. Journal of general internal medicine p. 1
  • [12] Yazdavar, A.H., Al-Olimat, H.S., Ebrahimi, M., Bajaj, G., Banerjee, T., Thirunarayan, K., Pathak, J., Sheth, A.: Semi-supervised approach to monitoring clinical depressive symptoms in social media. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. pp. 1191–1198 (2017)
  • [13] Yin, H., Yang, S., Li, J.: Detecting topic and sentiment dynamics due to COVID-19 pandemic using social media (Jul 2020)