A Psychologically Informed Part-of-Speech Analysis of Depression in Social Media

by Ana-Maria Bucur, et al.
Universitatea din Bucuresti

In this work, we provide an extensive part-of-speech analysis of the discourse of social media users with depression. Research in psychology has shown that depressed users tend to be self-focused, more preoccupied with themselves, and more prone to ruminating about their lives and emotions. Our work aims to make use of large-scale datasets and computational methods for a quantitative exploration of discourse. We use the publicly available depression dataset from the Early Risk Prediction on the Internet Workshop (eRisk) 2018 and extract part-of-speech features and several indices based on them. Our results reveal statistically significant differences between depressed and non-depressed individuals, confirming findings from the existing psychology literature. Our work provides insights into the way depressed individuals express themselves on social media platforms, allowing for better-informed computational models to help monitor and prevent mental illnesses.




1 Introduction

Mental health disorders are a common problem worldwide. Mental health issues are on the rise: there has been a 13% increase in the past decade according to the World Health Organization (WHO, https://www.who.int), with depression at the forefront. Many mental illnesses remain undiagnosed due to social stigma, leading people to live 1 in 5 years with disability in their lifetime.

With the rise of social media websites, interdisciplinary researchers in natural language processing, psychology and network analysis have turned their attention to automatically detecting and monitoring mental health manifestations through users' individual activity on social platforms (e.g. Facebook, Twitter, Reddit). The research is primarily focused on analyzing users' texts from posts and comments and determining, through computational linguistics models, the risk for various mental conditions (self-harm, depression, addictions, etc.).

Research is fuelled by curated datasets Yates et al. (2017); Losada and Crestani (2016); Amir et al. (2019) with texts primarily from Reddit and Twitter. At the forefront of incentivising interdisciplinary research on monitoring mental health on social media are workshops such as the Early Risk Prediction on the Internet (eRisk) Workshop (https://erisk.irlab.org/) and the Workshop on Computational Linguistics and Clinical Psychology (CLPsych) (https://clpsych.org/).

CLPsych and eRisk are two significant initiatives focusing on the interdisciplinary research area between computational linguistics and psychology. The eRisk Workshop, from the Conference and Labs of the Evaluation Forum (CLEF), focuses on the technologies that can be used for early risk detection of different pathologies or safety threats Losada et al. In its five years of existence, the workshop has addressed multiple mental health problems: pathological gambling, depression, self-harm and anorexia.

The CLPsych Workshop has been co-located with several international conferences on natural language processing; the last edition was co-located with the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). Throughout its seven editions, the workshop has hosted shared tasks on depression and post-traumatic stress disorder (PTSD) Coppersmith et al. (2015), labeling crisis posts from the peer-support forum ReachOut (https://au.reachout.com/) Milne et al. (2016), predicting current and future psychological health from childhood essays Lynn et al. (2018), assessing the degree of suicide risk (no risk, low, moderate, or severe risk) Zirikly et al. (2019), and suicide risk prediction from real data donated through OurDataHelps (https://ourdatahelps.org/) Goharian et al. (2021).

In the present study, we perform a part-of-speech analysis and contribute to the understanding of the differences in social media discourse between depressed and non-depressed individuals. We focus on the differences in part-of-speech use and ground them in the existing literature from psychology researchers. We aim to answer the following two research questions:


RQ1: Are there significant differences in part-of-speech use between individuals with a self-reported depression diagnosis and controls?

RQ2: Can these part-of-speech features be used alone to differentiate individuals with depression from controls?

2 Related Work

Detecting the manifestations of mental health disorders from social media is an interdisciplinary research problem for researchers from psychology, natural language processing and network analysis. The two main approaches used to detect cues of depression from online users are: extracting linguistic features for a quantitative analysis or using automated models for classification.

Compared with non-depressed individuals, texts from depressed individuals show greater use of negative words and of the personal pronoun "I" Rude et al. (2004), more words with negative polarity and frequent dichotomous expressions (e.g. always, never) Fekete (2002), and cues of rumination (reflected in greater use of past tense verbs) Smirnova et al. (2018).

Most linguistic features are extracted using the Linguistic Inquiry and Word Count (LIWC) lexicon Pennebaker et al. (2001). LIWC provides a list of dictionary words for more than seventy categories: part-of-speech (e.g. personal pronouns, first-person personal pronouns, nouns, present/future/past verbs, adjectives), psychological processes (e.g. social, affective, cognitive processes), personal concerns (e.g. work, money, religion, death), etc. It has been used to detect cues of depression Loveys et al. (2018); Nalabandian and Ireland (2019); Eichstaedt et al. (2018) and neuroticism Resnik et al. (2013), to explore the language of suicidal poets Stirman and Pennebaker (2001), etc.

Other approaches to mental illness detection from text rely on character and word n-grams Coppersmith et al. (2015); Pedersen (2015) or topic modelling techniques Preoţiuc-Pietro et al. (2015); Bucur and Dinu (2020).

Recent studies analyzing the online discourse of social media users with depression have focused on other particularities of language, such as offensive language. Birnbaum et al. (2020) found that depressed individuals use more swear words in their Facebook messages compared to controls. Bucur et al. (2021) apply offensive language identification techniques and show that users with a depression diagnosis use more offensive language in their Reddit posts: individuals manifesting signs of depression in their posts use more profane language and fewer insults targeted towards other individuals or groups.

Computational models used to detect the cues of depression from social media texts rely on traditional machine learning classifiers (e.g. SVM, Naïve Bayes) De Choudhury et al. (2013); Aldarwish and Ahmad (2017); Tadesse et al. (2019), CNNs Orabi et al. (2018); Yates et al. (2017), RNNs Orabi et al. (2018) or transformer models Martínez-Castaño et al. (2020); Uban and Rosso (2020); Bucur et al. (2021).

In the multimodal framework (involving text, voice and visual cues), the use of syntactic features (e.g. pronouns, adverbs usage) seems to improve the performance in depression detection, further emphasizing the relationship between linguistic features and depression Morales et al. (2018).

Recently, researchers began exploring the interpretability and explainability of the decisions made by automatic classifiers for mental illnesses detection from social media to further understand the manifestations of different mental disorders in written language Uban et al. (2021a, b).

3 Data

In our experiments, we use the dataset from the eRisk workshop containing posts written in English from the social media platform Reddit.

The eRisk 2018 dataset Losada and Crestani (2016) was created for the early detection of depression task. It contains two classes of users, depression and control. Users from the depression class were annotated by their mention of diagnosis in their posts (e.g. "I was diagnosed with depression"), but expressions such as "I have depression" or "I am depressed" were not taken into account. The authors removed the mentions of diagnosis from the dataset. Users from the control group are random individuals who do not have any mention of diagnosis in their posts, including those active in the depression subreddit (https://www.reddit.com/r/depression/). The training dataset provided by the organizers contains 135 depressed users and 752 controls, while the test dataset contains 79 depressed users and 741 controls. We use both train and test splits in our exploration, consisting of a total of approximately 90,000 submissions from users with a depression diagnosis and over 985,000 in the control group.

4 Methods

Part-of-Speech Analysis

For each post in our dataset, we use the spaCy (https://spacy.io/) part-of-speech tagger to extract the corresponding tags and also morphological features (e.g. person and number for pronouns) for each word. We extract the universal POS tags (https://universaldependencies.org/docs/u/pos/) and the ones from the Penn Treebank tagset Taylor et al. (2003). We use the latter to explore the differences in the verb tenses. We assign the tenses according to their tags: VBD and VBN correspond to the past tense, VBG, VBZ and VBP correspond to the present tense, and an MD tag before a VB corresponds to the future tense Caragea et al. (2018). From the morphological features provided by the spaCy tagger, we extract the person and the number for all the pronouns. After this analysis, we use the following features in our exploration:

Universal Part-of-Speech:


Verb tenses: Past, present, future

Person of pronouns: First, second and third-person

Pronoun number: Singular and plural, only for the first-person

For each of these features, we compute their frequency for each post in the dataset. For the universal POS, the frequency is computed as the number of occurrences of a specific tag normalized by the total number of tags in a post. For verb tenses, the frequency of each tense is calculated as a percentage of the total number of verb occurrences. For the three kinds of personal pronouns, each frequency is computed as a percentage of the number of all personal pronouns. The frequency of singular and plural first-person pronouns is computed as a percentage of all first-person pronouns.
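The normalisation described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the helper names `verb_tenses` and `relative_frequencies` and the hand-tagged sample post are hypothetical, the tags are written by hand instead of being produced by spaCy, "MD before a VB" is assumed to mean the immediately preceding token, and the verb denominator is assumed to be every token whose Penn Treebank tag starts with VB.

```python
from collections import Counter

# Tense assignment from Penn Treebank tags, following the mapping in the text.
PAST_TAGS = {"VBD", "VBN"}
PRESENT_TAGS = {"VBG", "VBZ", "VBP"}


def verb_tenses(ptb_tags):
    """Return one tense label per verb that can be assigned a tense."""
    tenses = []
    for i, tag in enumerate(ptb_tags):
        if tag in PAST_TAGS:
            tenses.append("past")
        elif tag in PRESENT_TAGS:
            tenses.append("present")
        # Assumption: the modal must directly precede the base-form verb.
        elif tag == "VB" and i > 0 and ptb_tags[i - 1] == "MD":
            tenses.append("future")
    return tenses


def relative_frequencies(labels, total):
    """Count each label and divide by the chosen denominator."""
    if total == 0:
        return {}
    counts = Counter(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical post as (token, universal POS, Penn Treebank tag) triples.
post = [
    ("I", "PRON", "PRP"), ("will", "AUX", "MD"), ("go", "VERB", "VB"),
    ("home", "ADV", "RB"), ("because", "SCONJ", "IN"), ("I", "PRON", "PRP"),
    ("felt", "VERB", "VBD"), ("tired", "ADJ", "JJ"),
]
upos = [u for _, u, _ in post]
ptb = [t for _, _, t in post]

# Universal POS: occurrences of a tag over all tags in the post.
upos_freq = relative_frequencies(upos, len(upos))

# Verb tenses: tense counts over all verb occurrences (assumed: VB* tags).
n_verbs = sum(tag.startswith("VB") for tag in ptb)
tense_freq = relative_frequencies(verb_tenses(ptb), n_verbs)
```

The same `relative_frequencies` helper would apply to pronoun person (over all personal pronouns) and first-person number (over all first-person pronouns), with only the denominator changing.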

To further explore the part-of-speech usage in the social media dataset, we also use some special measurements Havigerová et al. (2019):

Pronominalisation Index (PI): reflects the usage of pronouns instead of another part-of-speech (e.g. nouns). It is computed as the number of pronouns divided by the number of nouns Litvinova et al. (2016).

Formality Metric Mairesse et al. (2007):
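Both indices can be sketched over a post's universal POS tags. The definition of the formality metric is not reproduced in this text, so the sketch below assumes the F-score of Heylighen and Dewaele (2002), F = (noun + adjective + preposition + article - pronoun - verb - adverb - interjection frequencies + 100) / 2, with frequencies expressed as percentages of all tags. The function names and the mapping of these categories onto universal tags (ADP for prepositions, DET for articles, NOUN and PROPN for nouns) are assumptions made for illustration.

```python
from collections import Counter

# Assumed mapping of the F-score categories onto universal POS tags.
NOUN_TAGS = ("NOUN", "PROPN")
FORMAL_TAGS = NOUN_TAGS + ("ADJ", "ADP", "DET")
DEICTIC_TAGS = ("PRON", "VERB", "ADV", "INTJ")


def pronominalisation_index(upos_tags):
    """Number of pronouns divided by number of nouns (None if no nouns)."""
    counts = Counter(upos_tags)
    nouns = sum(counts[tag] for tag in NOUN_TAGS)
    if nouns == 0:
        return None
    return counts["PRON"] / nouns


def formality_score(upos_tags):
    """Assumed Heylighen-Dewaele F-score, on a 0-100 scale."""
    if not upos_tags:
        return None
    counts = Counter(upos_tags)
    formal = sum(counts[tag] for tag in FORMAL_TAGS)
    deictic = sum(counts[tag] for tag in DEICTIC_TAGS)
    # Frequencies are percentages of all tags in the post.
    return (100 * (formal - deictic) / len(upos_tags) + 100) / 2


# Hypothetical tag sequence for a sentence like "She read the book on the old table".
tags = ["PRON", "VERB", "DET", "NOUN", "ADP", "DET", "ADJ", "NOUN"]
pi = pronominalisation_index(tags)
formality = formality_score(tags)
```

Under this assumed definition, a higher score indicates a more noun-heavy, context-independent style, while pronoun- and verb-heavy posts score lower.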

Moreover, we test the discriminatory power of both the POS tag frequencies in users' texts and the computed indices. For this, we train a Random Forest classifier on the aforementioned features, using the train set of the eRisk 2018 dataset. To interpret the trained model and to estimate the importance of each feature, we employ SHapley Additive exPlanations (SHAP) Lundberg and Lee (2017) to measure each feature's contribution to the classifier decision. SHAP offers a game-theoretic approach to quantify feature importance, aligned with human intuition.
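To make the game-theoretic idea concrete, the following sketch computes exact Shapley values for a toy two-feature function by averaging each feature's marginal contribution over all feature orderings. It does not use the SHAP library and is not the procedure applied to the Random Forest; `shapley_values`, `toy_model`, and the feature names `a` and `b` are hypothetical, and "absent" features are assumed to take a fixed baseline value.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)  # start with all features "absent"
        previous_output = model(current)
        for name in order:
            current[name] = instance[name]  # reveal one feature
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(features):
    """Toy scoring function with an interaction term (illustrative only)."""
    return 2 * features["a"] - 3 * features["b"] + features["a"] * features["b"]


instance = {"a": 2, "b": 1}
baseline = {"a": 0, "b": 0}
phi = shapley_values(toy_model, instance, baseline)
```

The values sum to the difference between the model output for the instance and for the baseline, which is the additivity property that SHAP relies on. Exact enumeration grows factorially with the number of features, which is why practical tools use approximations or model-specific algorithms.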


We opted for a simple Random Forest model, trained with 50 estimators and a max depth of 15, to avoid overfitting, with balanced class weights, since the dataset is heavily imbalanced. On the test set for the eRisk 2018 dataset, we obtain a weighted F-score of 78.37% (with balanced class weights) and a macro F-score of 51.93%. While the classification task is difficult, we are interested in exploring the feature importances of the model, which shed light on the model behaviour and provide us with insights regarding which POS tag is most discriminatory.
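The gap between the weighted and the macro F-score follows from the class imbalance, and a small worked example may help. The sketch below uses hypothetical label counts, not the eRisk results, and the helper name `f1_per_class` is an assumption. The weighted score averages per-class F1 by class support, so a large, well-classified control class dominates it, while the macro score gives both classes equal weight.

```python
def f1_per_class(y_true, y_pred):
    """Return per-class F1 scores and per-class support."""
    scores, support = {}, {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
        support[label] = tp + fn
    return scores, support


# Hypothetical imbalanced predictions: 900 controls, 100 depressed.
y_true = ["control"] * 900 + ["depressed"] * 100
y_pred = (["control"] * 800 + ["depressed"] * 100   # controls
          + ["depressed"] * 20 + ["control"] * 80)  # depressed

scores, support = f1_per_class(y_true, y_pred)
n = len(y_true)
macro_f1 = sum(scores.values()) / len(scores)
weighted_f1 = sum(scores[c] * support[c] / n for c in scores)
```

In this toy setting the weighted score stays high even though the minority class is detected poorly, which mirrors the pattern reported above.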

We further present our findings and provide interpretations and discussions based on recent findings in psychology.

5 Results and Discussion

Addressing our RQ1, we compute the Welch t-test for all our features and find statistically significant differences (p-value < 0.001) in part-of-speech usage between depressed and non-depressed individuals. In this section, we present these differences and their interpretation based on psychology research.
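For reference, the Welch t-test compares two group means without assuming equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library; the function name `welch_t` and the sample values are hypothetical, and obtaining a p-value would additionally require the t distribution (for example from a statistics package).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors, using the unbiased sample variances.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical per-post pronoun frequencies for the two groups.
depressed = [0.12, 0.15, 0.11, 0.14, 0.13]
control = [0.08, 0.09, 0.10, 0.07, 0.09, 0.08]
t_stat, dof = welch_t(depressed, control)
```

A positive statistic here means the first group has the higher mean; the degrees of freedom always fall between the smaller group size minus one and the pooled value n_a + n_b - 2.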

Figure 1: Frequency of content part-of-speech

Content Part-of-speech

In Figure 1, we present the usage of content words for the two classes from the eRisk 2018 dataset. Individuals diagnosed with depression tend to use fewer common and proper nouns in comparison with control users. They also use more verbs and adverbs in their posts. The discourse is focused around actions, but with fewer entities (e.g. nouns), showing a defective linguistic structure with less interest in the environment (e.g. people, objects) De Choudhury et al. (2016).

To further understand these differences, we take a closer look at the frequencies of nouns and verbs in the social media discourse. We compute the keyness score Kilgarriff (2009); Gabrielatos (2018) for verbs and nouns separately. In the keyness analysis, we compare the frequencies of nouns and verbs in the posts written by individuals with a depression diagnosis (target corpus) with those in the posts from control users (reference corpus). In Figure 2, we present the top 20 verbs and nouns from each corpus, ordered by their log-likelihood ratio (G) Dunning (1993).
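As an illustration of the keyness ordering, the sketch below computes a log-likelihood score for a single word from its frequency in the target and reference corpora. It uses the two-term form common in corpus linguistics, which compares observed frequencies with those expected if the word were equally likely in both corpora; whether this exact variant was used in the study is an assumption, and the function name and counts are hypothetical.

```python
from math import log


def log_likelihood(freq_target, freq_reference, size_target, size_reference):
    """Two-term log-likelihood keyness score for one word."""
    total_freq = freq_target + freq_reference
    total_size = size_target + size_reference
    # Expected frequencies if the word were equally likely in both corpora.
    expected_target = size_target * total_freq / total_size
    expected_reference = size_reference * total_freq / total_size
    score = 0.0
    for observed, expected in (
        (freq_target, expected_target),
        (freq_reference, expected_reference),
    ):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * log(observed / expected)
    return 2 * score


# Hypothetical counts: a word seen 120 times in 1,000 target tokens
# and 30 times in 1,000 reference tokens.
g_key = log_likelihood(120, 30, 1000, 1000)
g_same = log_likelihood(50, 50, 1000, 1000)
```

The score is non-negative and does not indicate direction by itself, so the corpus in which a word is over-represented is determined by comparing its relative frequencies.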

Figure 2: Most frequent nouns and verbs in each class

Rumination is a cognitive process focusing on past and present negative content and resulting in emotional distress Sansone and Sansone (2012). It is present in several mental health disorders (e.g. depression, anxiety, obsessive-compulsive disorder, post-traumatic stress disorder). In depression, rumination is defined as behaviours and thoughts that focus one's attention on one's depressive symptoms and on the implications of these symptoms Nolen-Hoeksema (1991). Rumination, as a response to depression, focuses the person's attention on their emotional state and inhibits the actions necessary to distract them from their mood. Figure 3 compares the usage of the three verb tenses (present, past and future) between the two groups. We expected signs of rumination to be present in our analysis of verb tenses, with texts from depressed users being shifted into the past Smirnova et al. (2018), but this result is not found in this sample of individuals.

Figure 3: Frequency of verb tenses

Regarding the usage of future tense, depressed people have a lower frequency of verbs portraying future actions. This result may be a consequence of anhedonia: people suffering from depression report lower anticipatory pleasure and thus talk less about their future plans online. Anhedonia, defined as markedly diminished interest or pleasure in all, or almost all, activities most of the day, nearly every day Association and others (2013), is a common symptom of depression.

The higher frequency of cognitive verbs (e.g. feel, think, know) in the texts written by depressed individuals indicates the cognitive impairments and judgement issues specific to depression. People with depression have cognitive deficits and biases in the processing of emotional information and they are unable to adaptively regulate their emotions De Choudhury and De (2014). Individuals with or without depression may not differ in their initial response to an adverse event. Still, they differ in their ability to recover once they have experienced the negative emotion. Depressed individuals are not able to repair their mood. Instead, they remain in a negative state of mind, which is related to increased negative thoughts, selective attention to negative stimuli and greater accessibility of negative recollections Joormann (2010). In comparison, the individuals from the control group use more action verbs (e.g. vote, lead, show, begin, create). In addition, depressed individuals are more passive; they have a lower level of general activity, consistent with symptoms of depressive disorder Hopko et al. (2003).

Being an online social media platform similar to forums, Reddit is organized in subreddits with specific topics. It also has several communities dedicated to mental health problems. Compared to other social media platforms that require real-name authentication (e.g. Facebook), Reddit affords users anonymity or pseudo-anonymity. Complete anonymity is almost impossible, since users provide bits of information with every interaction on the platform (e.g. comments, posts). Reddit allows users to create throwaway accounts to engage temporarily without revealing their identity Kilgo et al. (2018). These types of accounts are used to discuss sensitive information or stigmatizing problems. De Choudhury and De (2014) study the mental health discourse on Reddit and show that its communities allow a high degree of information exchange related to mental health. Users use Reddit to self-disclose the challenges faced in their daily lives or in personal relationships. They also seek emotional support or specific information about the diagnosis and treatment of mental illnesses. Their study demonstrates that Reddit fills the gap between social media platforms like Twitter and Facebook and online health forums regarding mental health discourse.

Examining the frequencies of nouns in the eRisk 2018 dataset, we show in Figure 2 that the users with self-reported depression diagnosis use their Reddit account to disclose and discuss their mental health problems (e.g. depression, anxiety, therapist) or their personal relationships (e.g. friend, boyfriend, relationship, mom, dad). The process of seeking online support is also shown in the frequency of verbs: feel, talk, diagnose, help.

Even if the dataset contains control users active in the depression subreddit, the majority of control users seem to post on other themed subreddits (e.g. politics). Bucur and Dinu (2020) perform a topic modelling analysis and show that texts from control users are found in topics related to their hobbies, as opposed to depressed people, who are more focused on their feelings and life events. Our results are in line with this study: the users from the control group use more politics-related words (e.g. trump, government, president, news, vote).

Figure 4: Frequency of functional part-of-speech

Functional part-of-speech

In Figure 4, we present the frequencies of functional part-of-speech for depressed and control users. Depressed individuals generally use fewer functional words in their texts in comparison with control users, with the exception of pronouns. Neurons involved in content word processing are equally distributed over both hemispheres, while function words are processed mainly in the left hemisphere Pulvermüller et al. (1995). Lower preposition usage may be an outcome of deficient activation of the left hemisphere regions responsible for producing more abstract lexical units Litvinova et al. (2016). Function words are highly social: the capacity to use them requires social skills Pennebaker (2017).

Figure 5: Frequency of personal pronouns

Figure 5 shows the differences in personal pronouns used by the two groups. The high frequency of first-person singular pronouns indicates a higher self-preoccupation in depressed individuals, as opposed to first-person plural pronouns, which show collective attention. Second and third-person pronouns indicate social interactivity and contain references to other people or things in the environment De Choudhury et al. (2016).

Depressed users use more first-person singular pronouns. The frequencies of first-person plural pronouns are inversely proportional to the first-person singular pronoun frequencies. This result is in line with the self-focused attention tendency (SFA) in depressed individuals. SFA is a cognitive bias linked to depression; the high frequency of first-person singular pronouns in speech or written text is considered a linguistic marker of SFA Brockmeyer et al. (2015). Individuals with depression have deficits in other-focused social cognition: they are impaired in Theory of Mind reasoning and empathy. Theory of Mind enables people to make inferences about the behaviour of others and their own Premack and Woodruff (1978). Erle et al. (2019) show that individuals exhibiting high levels of depressive symptoms were impaired on tasks involving overcoming their egocentrism.

The usage of fewer first-person plural pronouns in texts from users diagnosed with depression may be a sign of a lesser sense of belonging. The information-processing biases of depressed individuals make it hard for them to perceive cues of acceptance and belonging in social interactions, and lead them to view ambiguous social interactions as negative Steger and Kashdan (2009).

Figure 6: Indices based on part-of-speech tags
Figure 7: Feature importances of the RandomForest model for depression classification using POS frequencies and POS indices. As shown in our analysis, the usage of pronouns (1st Person Singular/Plural) and proper nouns is the most discriminatory.

Pronominalisation and Formality Indices

In Figure 6 we present the results for the two metrics computed on part-of-speech tags. Depressed individuals have a higher pronominalisation index, using more pronouns instead of nouns. This pattern is also found in the language of people with self-destructive behaviour, insufficient activation of the cerebellum being associated with suicidal behaviour Litvinova et al. (2016). Depressed individuals also use less formal language. They have a more contextualized discourse on social media, providing information about the context in order to avoid ambiguity Heylighen and Dewaele (2002).

Addressing our RQ2, we show in Figure 7 the Shapley values for a random sample of 1500 posts from the eRisk 2018 test set. A higher Shapley value corresponds to a higher importance in the final model decision based on the feature value. We used all computed POS features and indices in our model, but show only the top 20 for clarity. It is evident from the summary plot that the absence of proper nouns is the most discriminatory factor in the decision to classify a person as depressed. Moreover, the use of pronouns (also evident in the Pronominalisation Index) is highly correlated with positive model output. The high usage of first-person singular pronouns and low usage of first-person plural pronouns confirms both our findings in the exploratory analysis and the psychology literature.

6 Conclusion

In this work, we provide an extensive analysis of part-of-speech usage in social media texts from depressed and non-depressed users. Our findings are confirmed by studies in psychology and show that individuals diagnosed with depression use more pronouns (especially first-person singular pronouns) and verbs, and fewer common and proper nouns. Their social media discourse revolves around their life experiences and sentiments, as opposed to control users who are not interested in discussing their problems online.

Moreover, we also provided insights into the discriminatory power of POS frequencies by employing SHAP, a game-theoretic approach for model interpretation. Through this, we showed that depressive users can be characterized most easily, primarily through their usage of pronouns and proper nouns.


  • M. M. Aldarwish and H. F. Ahmad (2017) Predicting depression levels using social media posts. In 2017 IEEE 13th international Symposium on Autonomous decentralized system (ISADS), pp. 277–280. Cited by: §2.
  • S. Amir, M. Dredze, and J. W. Ayers (2019) Mental health surveillance over social media with digital cohorts. In Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, Minneapolis, Minnesota, pp. 114–120. External Links: Link, Document Cited by: §1.
  • A. P. Association et al. (2013) Diagnostic and statistical manual of mental disorders (dsm-5®). American Psychiatric Pub. Cited by: §5.
  • M. L. Birnbaum, R. Norel, A. Van Meter, A. F. Ali, E. Arenare, E. Eyigoz, C. Agurto, N. Germano, J. M. Kane, and G. A. Cecchi (2020) Identifying signals associated with psychiatric illness utilizing language and images posted to facebook. npj Schizophrenia 6 (1), pp. 1–10. Cited by: §2.
  • T. Brockmeyer, J. Zimmermann, D. Kulessa, M. Hautzinger, H. Bents, H. Friederich, W. Herzog, and M. Backenstrass (2015) Me, myself, and i: self-referent word use as an indicator of self-focused attention in relation to depression and anxiety. Frontiers in psychology 6, pp. 1564. Cited by: §5.
  • A. Bucur, A. Cosma, and L. P. Dinu (2021) Early risk detection of pathological gambling, self-harm and depression using bert. CLEF (Working Notes). Cited by: §2.
  • A. Bucur and L. P. Dinu (2020) Detecting early onset of depression from social media text using learned confidence scores. Proceedings of the Seventh Italian Conference on Computational Linguistics. Cited by: §2, §5.
  • A. Bucur, M. Zampieri, and L. P. Dinu (2021) An exploratory analysis of the relation between offensive language and mental health. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, pp. 3600–3606. External Links: Link, Document Cited by: §2.
  • C. Caragea, L. P. Dinu, and B. Dumitru (2018) Exploring optimism and pessimism in twitter using deep learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 652–658. Cited by: §4.
  • G. Coppersmith, M. Dredze, C. Harman, K. Hollingshead, and M. Mitchell (2015) CLPsych 2015 shared task: depression and ptsd on twitter. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pp. 31–39. Cited by: §1.
  • G. Coppersmith, M. Dredze, C. Harman, and K. Hollingshead (2015) From ADHD to SAD: analyzing the language of mental health on Twitter through self-reported diagnoses. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Denver, Colorado, pp. 1–10. External Links: Link, Document Cited by: §2.
  • M. De Choudhury, S. Counts, and E. Horvitz (2013) Social media as a measurement tool of depression in populations. In Proceedings of the 5th annual ACM web science conference, pp. 47–56. Cited by: §2.
  • M. De Choudhury and S. De (2014) Mental health discourse on reddit: self-disclosure, social support, and anonymity. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 8. Cited by: §5, §5.
  • M. De Choudhury, E. Kiciman, M. Dredze, G. Coppersmith, and M. Kumar (2016) Discovering shifts to suicidal ideation from mental health content in social media. In Proceedings of the 2016 CHI conference on human factors in computing systems, pp. 2098–2110. Cited by: §5, §5.
  • T. E. Dunning (1993) Accurate methods for the statistics of surprise and coincidence. Computational linguistics 19 (1), pp. 61–74. Cited by: §5.
  • J. C. Eichstaedt, R. J. Smith, R. M. Merchant, L. H. Ungar, P. Crutchley, D. Preoţiuc-Pietro, D. A. Asch, and H. A. Schwartz (2018) Facebook language predicts depression in medical records. Proceedings of the National Academy of Sciences 115 (44), pp. 11203–11208. External Links: Document, ISSN 0027-8424, Link, https://www.pnas.org/content/115/44/11203.full.pdf Cited by: §2.
  • T. M. Erle, N. Barth, and S. Topolinski (2019) Egocentrism in sub-clinical depression. Cognition and Emotion 33 (6), pp. 1239–1248. Cited by: §5.
  • S. Fekete (2002) The internet - a new source of data on suicide, depression and anxiety: a preliminary study. Archives of Suicide Research 6 (4), pp. 351–361. External Links: Document, Link, https://doi.org/10.1080/13811110214533 Cited by: §2.
  • C. Gabrielatos (2018) Keyness analysis. Corpus approaches to discourse: A critical review, pp. 225–258. Cited by: §5.
  • N. Goharian, P. Resnik, A. Yates, M. Ireland, K. Niederhoffer, and R. Resnik (Eds.) (2021) Proceedings of the seventh workshop on computational linguistics and clinical psychology: improving access. Association for Computational Linguistics, Online. External Links: Link Cited by: §1.
  • J. M. Havigerová, J. Haviger, D. Kučera, and P. Hoffmannová (2019) Text-based detection of the risk of depression. Frontiers in psychology 10, pp. 513. Cited by: §4.
  • F. Heylighen and J. Dewaele (2002) Variation in the contextuality of language: an empirical measure. Foundations of Science 7, pp. 293–340. External Links: Document Cited by: §5.
  • D. R. Hopko, M. E. Armento, M. S. Cantu, L. L. Chambers, and C. Lejuez (2003) The use of daily diaries to assess the relations among mood state, overt behavior, and reward value of activities. Behaviour research and therapy 41 (10), pp. 1137–1148. Cited by: §5.
  • J. Joormann (2010) Cognitive inhibition and emotion regulation in depression. Current Directions in Psychological Science 19 (3), pp. 161–166. Cited by: §5.
  • A. Kilgarriff (2009) Simple maths for keywords. In Proceedings of the CL, Cited by: §5.
  • D. K. Kilgo, Y. M. M. Ng, M. J. Riedl, and I. Lacasa-Mas (2018) Reddit’s veil of anonymity: predictors of engagement and participation in media environments with hostile reputations. Social Media+ Society 4 (4), pp. 2056305118810216. Cited by: §5.
  • T. Litvinova, O. Zagorovskaya, O. Litvinova, and P. Seredin (2016) Profiling a set of personality traits of a text’s author: a corpus-based approach. In International Conference on Speech and Computer, pp. 555–562. Cited by: §4, §5, §5.
  • D. Losada and F. Crestani (2016) A test collection for research on depression and language use. In Proc. of Experimental IR Meets Multilinguality, Multimodality, and Interaction, 7th International Conference of the CLEF Association, CLEF 2016, Evora, Portugal, pp. 28–39. Cited by: §1, §3.
  • D. E. Losada, F. Crestani, and J. Parapar Overview of erisk at clef 2020: early risk prediction on the internet (extended overview). Cited by: §1.
  • K. Loveys, J. Torrez, A. Fine, G. Moriarty, and G. Coppersmith (2018) Cross-cultural differences in language markers of depression online. In Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, New Orleans, LA, pp. 78–87. External Links: Link, Document Cited by: §2.
  • S. M. Lundberg and S. Lee (2017) A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30, pp. . External Links: Link Cited by: §4.
  • V. Lynn, A. Goodman, K. Niederhoffer, K. Loveys, P. Resnik, and H. A. Schwartz (2018) CLPsych 2018 shared task: predicting current and future psychological health from childhood essays. In Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, pp. 37–46. Cited by: §1.
  • F. Mairesse, M. A. Walker, M. R. Mehl, and R. K. Moore (2007) Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of artificial intelligence research 30, pp. 457–500. Cited by: §4.
  • R. Martínez-Castaño, A. Htait, L. Azzopardi, and Y. Moshfeghi (2020) Early risk detection of self-harm and depression severity using BERT-based transformers: iLab at CLEF eRisk 2020. Early Risk Prediction on the Internet. Cited by: §2.
  • D. N. Milne, G. Pink, B. Hachey, and R. A. Calvo (2016) CLPsych 2016 shared task: triaging content in online peer-support forums. In Proceedings of the third workshop on computational linguistics and clinical psychology, pp. 118–127. Cited by: §1.
  • M. Morales, S. Scherer, and R. Levitan (2018) A linguistically-informed fusion approach for multimodal depression detection. In Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, New Orleans, LA, pp. 13–24. Cited by: §2.
  • T. Nalabandian and M. Ireland (2019) Depressed individuals use negative self-focused language when recalling recent interactions with close romantic partners but not family or friends. pp. 62–73. Cited by: §2.
  • S. Nolen-Hoeksema (1991) Responses to depression and their effects on the duration of depressive episodes. Journal of abnormal psychology 100 (4), pp. 569. Cited by: §5.
  • A. H. Orabi, P. Buddhitha, M. H. Orabi, and D. Inkpen (2018) Deep learning for depression detection of Twitter users. In Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, pp. 88–97. Cited by: §2.
  • T. Pedersen (2015) Screening Twitter users for depression and PTSD with lexical decision lists. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Denver, Colorado, pp. 46–53. Cited by: §2.
  • J. W. Pennebaker, M. E. Francis, and R. J. Booth (2001) Linguistic inquiry and word count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates 71 (2001), pp. 2001. Cited by: §2.
  • J. Pennebaker (2017) Mind mapping: using everyday language to explore social & psychological processes. Procedia Computer Science 118, pp. 100–107. Cited by: §5.
  • D. Premack and G. Woodruff (1978) Does the chimpanzee have a theory of mind? Behavioral and brain sciences 1 (4), pp. 515–526. Cited by: §5.
  • D. Preoţiuc-Pietro, J. Eichstaedt, G. Park, M. Sap, L. Smith, V. Tobolsky, H. A. Schwartz, and L. Ungar (2015) The role of personality, age, and gender in tweeting about mental illness. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Denver, Colorado, pp. 21–30. Cited by: §2.
  • F. Pulvermüller, W. Lutzenberger, and N. Birbaumer (1995) Electrocortical distinction of vocabulary types. Electroencephalography and clinical Neurophysiology 94 (5), pp. 357–370. Cited by: §5.
  • P. Resnik, A. Garron, and R. Resnik (2013) Using topic modeling to improve prediction of neuroticism and depression in college students. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, pp. 1348–1353. Cited by: §2.
  • S. Rude, E. Gortner, and J. Pennebaker (2004) Language use of depressed and depression-vulnerable college students. Cognition & Emotion 18 (8), pp. 1121–1133. Cited by: §2.
  • R. A. Sansone and L. A. Sansone (2012) Rumination: relationships with physical health. Innovations in clinical neuroscience 9 (2), pp. 29. Cited by: §5.
  • D. Smirnova, P. Cumming, E. Sloeva, N. Kuvshinova, D. Romanov, and G. Nosachev (2018) Language patterns discriminate mild depression from normal sadness and euthymic state. Frontiers in psychiatry 9, pp. 105. Cited by: §2, §5.
  • M. F. Steger and T. B. Kashdan (2009) Depression and everyday social activity, belonging, and well-being. Journal of counseling psychology 56 (2), pp. 289. Cited by: §5.
  • S. W. Stirman and J. W. Pennebaker (2001) Word use in the poetry of suicidal and nonsuicidal poets. Psychosomatic Medicine 63 (4), pp. 517–522. Cited by: §2.
  • M. M. Tadesse, H. Lin, B. Xu, and L. Yang (2019) Detection of depression-related posts in Reddit social media forum. IEEE Access 7, pp. 44883–44893. Cited by: §2.
  • A. Taylor, M. Marcus, and B. Santorini (2003) The Penn Treebank: an overview. Treebanks, pp. 5–22. Cited by: §4.
  • A. S. Uban, B. Chulvi, and P. Rosso (2021a) On the explainability of automatic predictions of mental disorders from social media data. In International Conference on Applications of Natural Language to Information Systems, pp. 301–314. Cited by: §2.
  • A. Uban, B. Chulvi, and P. Rosso (2021b) An emotion and cognitive based analysis of mental health disorders from social media data. Future Generation Computer Systems. Cited by: §2.
  • A. Uban and P. Rosso (2020) Deep learning architectures and strategies for early detection of self-harm and depression level prediction. In CEUR Workshop Proceedings, Vol. 2696, pp. 1–12. Cited by: §2.
  • A. Yates, A. Cohan, and N. Goharian (2017) Depression and self-harm risk assessment in online forums. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2968–2978. Cited by: §1, §2.
  • A. Zirikly, P. Resnik, O. Uzuner, and K. Hollingshead (2019) CLPsych 2019 shared task: predicting the degree of suicide risk in Reddit posts. In Proceedings of the sixth workshop on computational linguistics and clinical psychology, pp. 24–33. Cited by: §1.