Lightme: Analysing Language in Internet Support Groups for Mental Health

07/02/2020 ∙ by Gabriela Ferraro, et al. ∙ CSIRO ∙ Australian National University

Background: Assisting moderators to triage harmful posts in Internet Support Groups is relevant to ensuring their safe use. Automated text classification methods that analyse the language expressed in posts of online forums are a promising solution. Methods: Natural Language Processing and Machine Learning technologies were used to build a triage post classifier using a dataset from the Reachout mental health forum for young people. Results: When compared with the state-of-the-art, a solution mainly based on features from lexical resources achieved the best classification performance for crisis posts (0.52 F1-score), which is the most severe class. Six salient linguistic characteristics were found when analysing the crisis posts: 1) posts expressing hopelessness, 2) short posts expressing concise negative emotional responses, 3) long posts expressing variations of emotions, 4) posts expressing dissatisfaction with available health services, 5) posts utilising storytelling, and 6) posts expressing users seeking advice from peers during a crisis. Conclusion: It is possible to build a competitive triage classifier using features derived only from the textual content of the post. Further research is needed to translate our quantitative and qualitative findings into features, as this may improve overall performance.




1 Introduction

Internet Support Groups (ISGs) have been important and popular technologies for individuals with mental ill-health to receive support from peers with similar lived experiences Islam et al. [2018] and to anonymously share their stories with others to support their recovery Mikal et al. [2017]. They are also referred to as online peer-support forums or networks. ISGs have supported groups of people with specific chronic health conditions, such as diabetes or mental health conditions Islam et al. [2018]; Naslund et al. [2018]. Current evidence suggests ISGs may have a positive impact on individuals with mental ill-health; however, they may also exacerbate a person’s distress levels Kaplan et al. [2011]. The safe use of ISGs will therefore require more attention, especially the design of mechanisms that can help mitigate possible adverse effects and harm to ISG users Griffiths [2017].

Assessment and monitoring in ISGs are challenging and costly because they rely on the manual detection of posts in an online forum by trained moderators. This raises particular concerns about the scalability of ISGs as a potential digital health intervention. To overcome these limitations, Natural Language Processing (NLP) and Machine Learning (ML) technologies can be used to build systems that assist trained moderators in detecting and responding to hazardous posts that may cause further distress or self-harm to ISG users. Prior research has shown text classification methods to be promising solutions for reducing the workload of trained moderators Huh et al. [2013].

Moderators play an important role in managing communication between users in an ISG. They offer a range of informal support and advice to users, including providing personal experiences of recovery, motivating users to participate in the discussion, and enhancing the adoption of digital mental health services Kornfield et al. [2018]. However, moderators may lack the necessary skills and expertise to guide appropriate decision making on issues relating to clinical safety Hartzler and Pratt [2011]. Triaging ISG posts to assist moderators in reviewing new content uploaded daily is an automated text classification task designed to efficiently detect individuals’ thoughts, feelings, emotions, and possible behaviors represented in messages Conway and O’Connor [2016]; Tausczik and Pennebaker [2010].

Previous research has often focused on evaluating the performance of different ML classification models using mental health ISG data, such as Logistic Regression Cohan et al. [2016]; Pink et al. [2016]; Zirikly et al. [2016], Stochastic Gradient Descent (SGD) Kim et al. [2016], and Linear Discriminant Analysis (LDA) Shickel et al. [2016]. The study by Islam et al. [2018] used different ML techniques to detect depression from Facebook data. They evaluated the performance of several classification models, including Support Vector Machines (SVM), Decision Tree (DT), ensemble methods, and K-Nearest Neighbor (KNN). The results demonstrated the relative performance for specific classifiers. However, the authors did not evaluate the performance of different language features from lexical resources and deep learning models.

Lexicon-based resources are central to modelling the linguistic characteristics of ISGs. Over the years, comparable systems have used different lexicon-based features to classify hazardous posts in an ISG for mental health Hollingshead et al. [2017]; Milne et al. [2016], including posts from Twitter data O’Dea et al. [2017, 2015]; Coppersmith et al. [2016]; Jamil et al. [2017]. While modelling linguistic characteristics is important for accuracy, other features such as interactions of ISG users, forum structure, meta-data and other external features may improve the prediction performance B. et al. [2015]; Smithson et al. [2011]. However, some authors have stated that relying on features extracted from external sources (e.g., forum structure and meta-data) may introduce biases, thereby decreasing the predictive capabilities of the classifier on previously unseen messages published on online forums Altszyler et al. [2018].

Our study focuses on developing an automated classifier for triaging posts using only features from the textual content of the post derived from lexicon-based resources. We want to investigate the language of ISGs without using the forum structure or post threads. By excluding forum structure and meta-data features from the model, the study primarily focuses on optimizing the linguistic aspects of detecting forum posts to avoid biases on unseen messages. Furthermore, given the extent of previous research on the combination of different ML classification models, we want to experiment with a broad combination of features using only a relatively small number of linear and nonlinear ML techniques, including two deep learning models.

1.1 Research rationale

This study used state-of-the-art methods to develop hand-crafted features derived from the Reachout online support forum Hollingshead et al. [2017]. The model aims to achieve the best classification performance for crisis posts and competitive results for the other classification labels, described in Section 2. We also conducted a qualitative analysis of the posts that require the immediate attention of moderators. The study has two aims:

  • Shed light on the linguistic characteristics of the urgent posts.

  • Examine the feasibility of lexical resources in an ML classification system for triaging posts using the Reachout dataset.

2 Materials and Methods

2.1 Dataset

This study used a collection of posts from the Australian Reachout mental health online forum released by the Computational Linguistics and Clinical Psychology Shared Task (CLPsych) Hollingshead et al. [2017]. Participants range from 18 to 25 years old. All of the posts are written in English. Each post in the dataset is labelled with a semaphore pattern to indicate the urgency of the post, and the required attention of the moderator, as shown in Table 1. Label distribution across the training and testing dataset of the Reachout online forum is given in Table 2.

| Label | Description | Example |
|---|---|---|
| Green | No input from a moderator, and it can be safely left for the wider community of peers to respond. | I’m proud that I was able to call and keep up a phone conversation with my mum. |
| Amber | A moderator should address the post at some point, but they do not need to do so immediately. | There are so many stuff I’m thinking about, but my medications are slowing my thoughts down and making it more manageable. |
| Red | A moderator should respond to the post as soon as possible. | I feel helpless and things seem pointless. I hate feeling so down. |
| Crisis | The author, or someone they know, is in imminent risk of being harmed, or harming themselves or others. Posts should be prioritized above all others. | Im having some strong thoughts about ending my life, nothing helps. |

Table 1: Severity label descriptions and examples in the Reachout dataset.

| Label | Train | Train % | Test | Test % |
|---|---|---|---|---|
| Crisis | 40 | 3.36 | 42 | 10.5 |
| Red | 137 | 11.53 | 48 | 12 |
| Amber | 296 | 24.91 | 94 | 23.5 |
| Green | 715 | 60.18 | 216 | 54 |
| Total | 1188 | - | 400 | - |

Table 2: Label distribution across the training and testing sets of the Reachout dataset 2017.

Precision, recall and F-measure were used to examine the performance of the classifier. Precision is defined as the proportion of posts assigned to a particular label by the ML model that truly belong to that label. Recall is defined as the proportion of posts truly belonging to a label that the model successfully assigns to it. The F-measure is the harmonic mean of precision and recall. The macro F-score is preferred since it gives more weight to infrequent yet more critical labels, such as red and crisis. Similar to Altszyler et al. [2018], the F-score for crisis versus non-crisis was also reported. This metric measures the classifier’s capability to detect the most severe cases. Details of the official evaluation metrics are described below:

  • Macro-averaged F-score: The macro-averaged f-score is calculated among crisis, red and amber, and after excluding the green class.

  • F-score for flagged vs. non-flagged: This metric separates the posts that moderators need to action (i.e. crisis, red, amber) compared to posts that can be safely ignored (i.e. green). This is the most important metric in CLPsych since it measures the classifier’s capability to identify the post that requires moderator attention.

  • F-score for urgent vs. non-urgent: This metric is the average F1-score among urgent (crisis + red) and non-urgent (amber + green) labels.
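The three metrics above can be illustrated with a minimal sketch in plain Python. The function names and the toy label lists are hypothetical, and the sketch assumes that the flagged and urgent scores are the F1 of the grouped positive class (flagged or urgent); if the official scorer averages over both sides of the split, as the urgent description suggests, the grouping logic stays the same and only the final averaging changes.

```python
from typing import Dict, List, Set


def f1_for_label_set(gold: List[str], pred: List[str], positive: Set[str]) -> float:
    """F1 where any label in `positive` counts as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g in positive and p in positive)
    fp = sum(1 for g, p in zip(gold, pred) if g not in positive and p in positive)
    fn = sum(1 for g, p in zip(gold, pred) if g in positive and p not in positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def triage_scores(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Macro F1 over crisis/red/amber (green excluded) plus grouped F1 scores."""
    per_label = [f1_for_label_set(gold, pred, {lab}) for lab in ("crisis", "red", "amber")]
    return {
        "macro_f1": sum(per_label) / len(per_label),
        "flagged": f1_for_label_set(gold, pred, {"crisis", "red", "amber"}),
        "urgent": f1_for_label_set(gold, pred, {"crisis", "red"}),
        "crisis": f1_for_label_set(gold, pred, {"crisis"}),
    }


# Toy gold and predicted labels, invented for illustration only.
gold = ["crisis", "red", "amber", "green", "green", "crisis"]
pred = ["crisis", "amber", "amber", "green", "amber", "red"]
scores = triage_scores(gold, pred)
```

Note how a crisis post predicted as red still counts as a correct urgent and flagged decision, while it lowers the crisis and macro scores.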

A search of key computing and health databases (IEEE, ACM, PubMed and PsycINFO) was conducted to identify the key components of previous text classifiers for ISGs. Table 3 shows the features and methods used by the best performing classifiers on the Reachout dataset; further details are given in Milne et al. [2016] and Hollingshead et al. [2017].

| Lexicon Features | Used by |
|---|---|
| LIWC lexicon W. et al. [2015] | Cohan et al. [2016]; Malmasi et al. [2016] |
| MPQA lexicon Kroenke et al. [2001] | Cohan et al. [2016]; Altszyler et al. [2018] |
| PERMA lexicon Schwartza et al. [2016] | Altszyler et al. [2018] |
| Emolex lexicon Mohammad and Turney [2013] | Altszyler et al. [2018] |
| DepecheMood lexicon Staiano and Guerini [2014] | Cohan et al. [2016]; Altszyler et al. [2018] |
| Other Features | |
| Lexical diversity | Altszyler et al. [2018] |
| Topic modeling | Cohan et al. [2016] |
| TF-IDF weighted | Kim et al. [2016]; Brew [2016] |
| Character embeddings | Malmasi et al. [2016] |
| Word embeddings | Kim et al. [2016]; Brew [2016]; Malmasi et al. [2016]; Altszyler et al. [2018] |
| Sentence embeddings Le and Mikolov [2014] | |
| POS-tags | Malmasi et al. [2016] |
| Pronouns | Altszyler et al. [2018] |
| Sentiment analysis | Shickel et al. [2016]; Zirikly et al. [2016] |
| Post author | Altszyler et al. [2018] |
| Post history | Malmasi et al. [2016]; Altszyler et al. [2018] |
| Post reply chain | Pink et al. [2016] |
| Time of the post | Altszyler et al. [2018] |
| Time between posts | Altszyler et al. [2018] |
| Week day of the post | Altszyler et al. [2018] |
| References to advisors | Altszyler et al. [2018] |
| References to self-harm | Altszyler et al. [2018] |
| References to telephone helplines | Altszyler et al. [2018] |
| Methods | |
| LDA: unsupervised topic modeling | Shickel et al. [2016] |
| SGD: supervised classification | Kim et al. [2016] |
| Support Vector Machine (SVM): supervised classification | Malmasi et al. [2016]; Brew [2016]; Zirikly et al. [2016]; Altszyler et al. [2018] |
| Logistic regression: supervised classification | Cohan et al. [2016]; Pink et al. [2016]; Zirikly et al. [2016] |

Table 3: Examples of features and methods used for triage classification with the Reachout dataset. "Used by" lists the previous studies in which a feature or method was used.

2.2 Predicting Alerts Approach

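The Reachout dataset consists of $N$ training instances, where the $i$-th instance is a feature vector $x_i$ and a label $y_i$. The classification task is to predict the label $y_i$ given the feature vector $x_i$ for each forum post such that:

$$f(x_i) = y_i, \qquad y_i \in \{\text{green}, \text{amber}, \text{red}, \text{crisis}\}$$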


We trained a Support Vector Machine (SVM) multi-class classifier with linear kernels V. N. Vapnik [1963]. SVM is a supervised ML method used widely in text classification, and it was also the method used by the state-of-the-art triage classifier on the Reachout dataset. Hyper-parameters (parameters whose values are set before the learning process, while the values of other parameters are derived via learning) were selected with a grid search scheme (an exhaustive search through a subset of the hyper-parameter space of a learning algorithm) with 5-fold cross-validation over the training set. The C hyper-parameter (the regularization value, which controls the degree of importance given to misclassification; the larger the value, the fewer wrongly classified examples are allowed) is 1 with regularization, the loss function (a function that measures how well a prediction model is able to predict the expected outcome) is hinge, and the maximum number of iterations is 2000.
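To make the roles of the hinge loss and the C hyper-parameter concrete, the following is a minimal sketch in plain Python for a binary linear model with labels +1 and -1. It is not the study's implementation: the function names are hypothetical, the squared L2 regularizer is an assumption (the regularization type is not recoverable from the text), and a multi-class classifier would combine several such binary problems.

```python
from typing import Sequence


def hinge_loss(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """Hinge loss max(0, 1 - y * (w.x + b)) for one example with label y in {+1, -1}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)


def svm_objective(
    w: Sequence[float],
    b: float,
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    c: float = 1.0,
) -> float:
    """Regularized objective: 0.5 * ||w||^2 + C * sum of hinge losses.

    The squared L2 penalty is an assumption made for this sketch.
    """
    regularizer = 0.5 * sum(wi * wi for wi in w)
    return regularizer + c * sum(hinge_loss(w, b, x, y) for x, y in zip(xs, ys))


# Toy weights and examples, invented for illustration.
w, b = [1.0, -1.0], 0.0
xs = [[2.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
ys = [1, 1, -1]
objective = svm_objective(w, b, xs, ys, c=1.0)
```

A larger C makes each margin violation more expensive relative to the regularizer, which is why larger values tolerate fewer misclassified training examples.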

In order to compare Lightme (which used SVM) against other ML classifiers, we also trained K-Nearest Neighbour (KNN), and Naïve Bayes. Since deep learning models are the state-of-the-art in many natural language processing applications, we trained two neural network classifiers: Multi Layer Perceptron (MLP) and Recurrent Neural Networks (RNN) with Long Short Term Memory (LSTM).

The feature set is shown in Table 4. All the features were derived from the posts themselves. No features derived from the forum structure or interactions between posts were used. We included additional language features such as MPQA, offensive language, and mental health lexicons.

During feature extraction, negation was modelled as in Cimino et al. [2014]. Thus, when a term $t_i$ from a post is found in a lexicon, its negation is checked by inspecting the preceding term $t_{i-1}$. As in Cimino et al. [2014], we used a list of negation terms:

no, nobody, nothing, none, never, neither, nor, nowhere, hardly, scarcely, barely, don’t, isn’t, wasn’t, doesn’t, ain’t, can’t, won’t, wouldn’t, shouldn’t, couldn’t, hasn’t, haven’t, didn’t

If a negation term was found, the polarity of the term was shifted when the lexicon differentiates between positive and negative terms (e.g., the PERMA lexicon); otherwise, the term was skipped and not included as a feature.
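The negation rule can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes the negation cue is the token directly before the lexicon term, uses a toy polarity lexicon invented for the example, and the helper names (`tokenize`, `polarity_counts`, `plain_count`) are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Negation terms listed in the text (apostrophes normalized to ASCII).
NEGATION_TERMS = {
    "no", "nobody", "nothing", "none", "never", "neither", "nor", "nowhere",
    "hardly", "scarcely", "barely", "don't", "isn't", "wasn't", "doesn't",
    "ain't", "can't", "won't", "wouldn't", "shouldn't", "couldn't", "hasn't",
    "haven't", "didn't",
}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep alphabetic tokens, including apostrophes."""
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))


def polarity_counts(text: str, lexicon: Dict[str, str]) -> Counter:
    """Count lexicon hits per polarity, flipping polarity after a negation term."""
    tokens = tokenize(text)
    counts: Counter = Counter()
    for i, token in enumerate(tokens):
        polarity = lexicon.get(token)
        if polarity is None:
            continue
        if i > 0 and tokens[i - 1] in NEGATION_TERMS:
            polarity = "negative" if polarity == "positive" else "positive"
        counts[polarity] += 1
    return counts


def plain_count(text: str, vocabulary: Set[str]) -> int:
    """Count hits in a lexicon without polarity, skipping negated terms."""
    tokens = tokenize(text)
    return sum(
        1
        for i, token in enumerate(tokens)
        if token in vocabulary and not (i > 0 and tokens[i - 1] in NEGATION_TERMS)
    )


# Toy lexicon, invented for illustration.
toy_lexicon = {"happy": "positive", "sad": "negative"}
counts = polarity_counts("I am never happy and hardly sad, just happy", toy_lexicon)
```

In the toy sentence, "never happy" is counted as negative and "hardly sad" as positive, while the final unnegated "happy" keeps its positive polarity.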

| Lexicon Features | Feature Description |
|---|---|
| MPQA lexicon* | The number of words with MPQA polarity in each post |
| DepecheMood lexicon* | The number of words overlapping between each category in DepecheMood and a post |
| Emolex lexicon* | The number of words overlapping between each category in the NRC-Emotion-Lexicon-v0.92 lexicon and a post |
| Mental Disorder lexicon | The number of words overlapping between the Mental Disorder lexicon and a post |
| PHQ_9 lexicon | The number of words overlapping between the PHQ_9 and a post |
| PERMA lexicon (1)* | The number of bi-grams and tri-grams overlapping between PERMA and a post |
| PERMA lexicon (2) | The number of bi-grams and tri-grams overlapping between PERMA negative categories and a post |
| PERMA lexicon (3) | The weighted sum of the bi-grams and tri-grams overlapping between PERMA and a post |
| Offensive word lexicon | The number of words overlapping between the offensive word list and a post |
| Other Features | |
| TF-IDF weighted N-grams | TF-IDF representation of each post with top max features chosen by Scikit-learn based on term frequency |
| Pronouns | The number of pronouns used in each post, including I, me, you, he, him, she, her, it, we, us, they, them |
| Mean word length | The average length of words in a post |
| Sentence embeddings | Sentence representation computed by averaging pre-trained FastText word embeddings fine-tuned with the Reachout dataset |
| Last sentence embeddings | Sentence representation of the last sentence in each post computed by averaging FastText word embeddings trained with the Reachout dataset |
| Sentiment analysis feature | The sentiment of each post classified by a sentiment classifier trained by us with GloVe [Pennington et al., 2014] word embedding features and emoticon embeddings |
| User rank | The forum title of the poster for each post |
| Number of web links | Total number of web links in a post |
| Number of references to help line services | mental health, australia, general practitioner, doctor, psychologist, counsellor, gp (general practitioner), emergency, 000, lifeline, 131114, 13 11 14, kids help line, 1800 55 1800, 1800551800, salvation army care line, 1300 36 36 22, 1300363622, e-couch, moodgym, bluepages, black dog institute, reachout, beyondblue |
| Number of references to self-harm expressions | suicide, kill myself, kill my self, cut myself, cut my self, hurt myself, hurt my self, harm myself, harm my self, I want to die, don’t want to live, end my life, kill, hurt, cut, want to die, I don’t want to live |
| Number of references to advisors | supervisor, supervisors, mentor, manager, tutor, case-manager, managers, manager, psych, psychiatrist, gp (general practitioner), gps, counsellor, counselor |

Table 4: Feature set used for triage classification with the Reachout dataset. ’*’ indicates a lexicon that has been tested in previous studies (see Table 3).
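A few of the simpler surface features in Table 4 can be sketched as below. This is only an illustration under stated assumptions: the function name `surface_features` is hypothetical, web links are detected with a simple `http(s)://` pattern, and the self-harm list is a subset of the Table 4 expressions chosen so that no expression is contained in another (the full list would need a rule for overlapping matches such as "kill" and "kill myself").

```python
import re
from typing import Dict

PRONOUNS = {"i", "me", "you", "he", "him", "she", "her", "it", "we", "us", "they", "them"}

# Subset of the self-harm expressions in Table 4 without nested expressions.
SELF_HARM_EXPRESSIONS = [
    "suicide", "kill myself", "hurt myself", "harm myself", "end my life", "want to die",
]

URL_PATTERN = re.compile(r"https?://\S+")


def surface_features(post: str) -> Dict[str, float]:
    """Compute pronoun count, mean word length, link count and self-harm mentions."""
    text = post.lower().replace("\u2019", "'")
    links = URL_PATTERN.findall(text)
    # Remove links before tokenizing so URL fragments do not count as words.
    words = re.findall(r"[a-z']+", URL_PATTERN.sub(" ", text))
    self_harm = sum(
        len(re.findall(r"\b" + re.escape(expression) + r"\b", text))
        for expression in SELF_HARM_EXPRESSIONS
    )
    return {
        "pronouns": sum(1 for word in words if word in PRONOUNS),
        "mean_word_length": sum(len(word) for word in words) / len(words) if words else 0.0,
        "web_links": len(links),
        "self_harm": self_harm,
    }


# Invented example post.
features = surface_features("I want to die. They told me to call, see https://example.org")
```

Each value would become one dimension of the feature vector for a post, alongside the lexicon counts and embedding features.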

3 Results

3.1 Triage Classification Experimental Results

The results of the triage classification experiment using the different features, including lexical resources and treating negation, are presented in Table 5. Best results are highlighted in boldface. Exclusive use of lexicon features resulted in lower performance for all classes (flagged, urgent and crisis) and for the overall performance (macro F1-score). Treating negation when using only lexicons did not boost the classification performance. However, adding Term Frequency-Inverse Document Frequency (TF-IDF) features improved the classification performance for all classes. (TF-IDF is the number of times a word appears in a document, down-weighted by the number of documents in the collection that contain the word.) The best results were achieved with the feature set “TF-IDF + lexicons with negation”, with a macro F1-score of 0.44. The most complex set of features (all features in Table 4) showed competitive results with the state-of-the-art triage classification system by Altszyler et al. [2018], and the baseline classification system by Milne et al. [2016].

| Feature set | Macro F1-score | Flagged | Urgent | Crisis |
|---|---|---|---|---|
| Only lexicons | 0.24 | 0.38 | 0.38 | 0.20 |
| Lexicons with negation | 0.19 | 0.43 | 0.37 | 0.04 |
| TF-IDF + lexicons | 0.38 | 0.71 | 0.53 | 0.44 |
| TF-IDF + lexicons with negation | **0.44** | 0.74 | **0.63** | **0.52** |
| Lightme (features from Table 4) | 0.43 | **0.77** | 0.59 | 0.51 |

Table 5: Triage classification with different feature sets.
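Since TF-IDF weighting drives much of the improvement reported here, a minimal sketch of the weighting may help. It uses the textbook form, term count multiplied by log(N / document frequency), on a toy tokenized corpus; this is an assumption for illustration, and library implementations (including the one used in the study) typically apply smoothing and normalization that this sketch omits. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dictionary per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Toy corpus of three tokenized posts, invented for illustration.
toy_docs = [["feel", "tired", "tired"], ["feel", "hopeful"], ["feel", "tired"]]
weights = tf_idf(toy_docs)
```

A term that occurs in every post, such as "feel" in the toy corpus, receives zero weight, while a term confined to a single post receives the largest weight per occurrence.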

We also experimented with a linear and a nonlinear classification method. Naïve Bayes was trained with the Lightme feature set since it is an easy and fast linear classification method suitable for classifying large chunks of data. Similarly, KNN was trained with the same feature set due to its practicality and ease. Hyper-parameters such as the number of neighbours were selected with a grid search scheme using a range from 1 to 25. Table 6 shows the results of Naïve Bayes and KNN, which underperformed SVM and other state-of-the-art systems. Compared to Naïve Bayes and KNN, SVM is known to perform better on rich feature sets such as the one presented in this study.
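The grid search over the number of neighbours can be sketched in plain Python as follows. Everything here is illustrative: the helper names are hypothetical, the toy one-dimensional data are invented, folds are assigned by index rather than at random, and cross-validated accuracy is assumed as the selection criterion because the text does not state which score was optimized.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_predict(train_x: Sequence[Point], train_y: Sequence[str], x: Point, k: int) -> str:
    """Majority label among the k training points closest to x (squared Euclidean)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def cv_accuracy(xs: Sequence[Point], ys: Sequence[str], k: int, folds: int = 5) -> float:
    """Accuracy of k-NN under simple index-based cross-validation."""
    correct = 0
    for fold in range(folds):
        train_idx = [i for i in range(len(xs)) if i % folds != fold]
        test_idx = [i for i in range(len(xs)) if i % folds == fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct += sum(1 for i in test_idx if knn_predict(train_x, train_y, xs[i], k) == ys[i])
    return correct / len(xs)


def grid_search_k(xs: Sequence[Point], ys: Sequence[str], candidates=range(1, 26)) -> int:
    """Return the candidate k with the best cross-validated accuracy (smallest on ties)."""
    return max(candidates, key=lambda k: cv_accuracy(xs, ys, k))


# Toy data: two well-separated clusters, invented for illustration.
toy_x: List[Point] = [(i * 0.1,) for i in range(10)] + [(10 + i * 0.1,) for i in range(10)]
toy_y = ["green"] * 10 + ["crisis"] * 10
best_k = grid_search_k(toy_x, toy_y)
```

With heavily imbalanced classes such as those in Table 2, plain accuracy would favour the majority class, so a macro F-score would be a more natural selection criterion in practice.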

MLP was trained using the same set of features as Lightme with hidden layer sizes that varied between 100 and 300 nodes, depending on the development set. The RNN+LSTM model was trained with pre-trained word embeddings and without features since one of the advantages of this type of model is its ability to learn feature representations automatically. Important hyper-parameters such as the number of epochs, size of the hidden layer and batch size were tuned using a portion of the training set as the development set. As shown in Table 6, the deep learning models underperformed the other models. This is not surprising as deep learning models are data-hungry and the Reachout dataset is small; in particular, some of the classes (e.g., red and crisis) have only a few instances.

| System | Macro F1-score | Flagged | Urgent | Crisis |
|---|---|---|---|---|
| Baseline | 0.3 | 0.61 | 0.44 | - |
| Naïve Bayes | 0.28 | 0.67 | 0.42 | 0.39 |
| KNN | 0.14 | 0.39 | 0.08 | 0.0 |
| MLP | 0.38 | 0.71 | 0.58 | 0.39 |
| RNN+LSTM | 0.28 | 0.44 | 0.008 | 0.0 |
| Altszyler et al. [2018] | 0.44 | 0.90 | 0.68 | 0.48 |
| TF-IDF + lexicons with negation | 0.44 | 0.74 | 0.63 | 0.52 |
| Lightme (features from Table 4) | 0.43 | 0.77 | 0.59 | 0.51 |

Table 6: Comparison of results on the test set in terms of F-score.

3.2 Qualitative formative analysis of crisis posts

We randomly selected 40 crisis posts from the training dataset for analysis. We then used open coding to understand their linguistic characteristics. Through the qualitative analysis of the selected crisis posts, we identified six linguistic characteristics. For each characteristic, we extracted selected phrases of crisis posts (including the post ID) that matched the given linguistic profile, and recommended features to model.

3.2.1 Expressing hopelessness in crisis

Many of the posts used language or words that described a person’s feeling of immediate hopelessness. Extreme hopelessness or helplessness may be associated with an increased risk of suicide Cash et al. [2013]. Learned helplessness comes from a repeated belief that uncomfortable situations are inescapable; an example statement is “I tried doing this for my anxiety, but I ended up faced with these challenges” Liu et al. [2015]. Hopelessness is a combination of helplessness and experiences of depression resulting from a person’s response to a negative event Cash et al. [2013]; Liu et al. [2015]. An example statement is “I am fed up with my friend anger! I can’t bother trying anymore because I am frustrated”.

Extracted phrases of crisis posts describing hopelessness of a forum user:

  • “I can feel pretty hopeless at times too. I start questioning if I can ever get better. It’s hard enough to live.” (Post ID: 136600)

  • “I’m feeling so tired, and I want to give up on life. I need to keep holding on. There’s still hope for me. I just need to make sure I reach out when I feel like things are getting way too intense.” (Post ID: 136601)

  • “No but I am pretty friggin sick of my entire life at this point and my existence…” (Post ID: 135818)

  • “I’m still finding it hard not to do anything stupid. I’ve screwed up. Now I don’t know where this is headed.” (Post ID: 138188)

Recommended features to model: Categorical features can be modelled with the following keywords: feel tired, fed up, better dead, give up life, the end is near, sick of life, sick of existence, holding on, hopeless times, hope, trying help or talking, and hard to try or do. Other features can include checking spelling mistakes.

3.2.2 Short crisis posts and emotional response

Short posts contained concise descriptions of a person’s negative emotions. In contrast to longer posts, shorter posts contained less variation in the expression of positive and negative emotions. As noted by O’Dea et al. [2017], lexicons may be limited in detecting certain expressions such as irony, sarcasm, and metaphors. Therefore, any text under 50 words should be interpreted with caution. A further limitation in interpreting short posts is the use of negation Gkotsis et al. [2016].

Examples of short crisis posts describing a concise negative emotion:

  • “I’m suffocating. I don’t if I can do this anymore.” (Post ID: 138064)

  • “@redhead I don’t know how long I can even keep myself together before I’m screwed.” (Post ID: 138067)

  • “@chessca_h no. I don’t want to be safe anymore. I’m ai over it right now.” (Post ID: 137786)

Recommended features to model: Features can define short posts as messages under 50 words, or posts that contain no more than two sentences. Additionally, features can include detecting only one negative emotion for short posts, treating all negations for short posts, and checking for spelling mistakes.
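The two proposed definitions of a short post can be sketched as simple length features. The function name `length_features` is hypothetical, and splitting sentences on ".", "!" and "?" is a simplifying assumption that a real system would replace with a proper sentence tokenizer.

```python
import re
from typing import Dict, Union


def length_features(
    post: str, word_limit: int = 50, sentence_limit: int = 2
) -> Dict[str, Union[int, bool]]:
    """Word and sentence counts plus the two candidate 'short post' flags."""
    words = post.split()
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", post) if s.strip()]
    return {
        "n_words": len(words),
        "n_sentences": len(sentences),
        "short_by_words": len(words) < word_limit,
        "short_by_sentences": len(sentences) <= sentence_limit,
    }


# One of the short crisis posts quoted above (apostrophes normalized to ASCII).
features = length_features("I'm suffocating. I don't if I can do this anymore.")
```

Keeping the two flags separate lets the classifier learn which definition is more informative, rather than fixing one threshold in advance.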

3.2.3 Long crisis posts and emotional coping

Long length posts were found to start with a user expressing some negative emotions, followed by positive emotions related to their abilities to cope. This may be a positive sign as it may indicate a person attempting to reconcile negative emotions A. et al. [2014]. However, they may express negative emotions after showing signs of positivity.

Extracted phrases of long crisis posts that alternate between negative and positive emotions:

  • “(Negative) Feeling extremeley tired each morning. It’s getting to the point I’m contemplating ringing in sick to aviod getting up. (Positive) Despite the tiredness, I’ve bee getting up and going to work, because I know I need to face the world. (Negative) Keep having thoughts to end it all…” (Post ID: 135898)

  • “(Negative) Didn’t sleep til way after 2 last night. It was super hard to sleep, then wake this morning. I just wanted to ignore the world today. (Positive) I eventually fell asleep. I eventually got up and went to work. I faced the world and smiled a little. (Negative) Really struggled through my shift today… ” (Post ID: 137919)

Recommended features to model: Features can define long posts as messages that contain 50 or more words with varying levels of positive and negative emotions. Other features can include detecting negative emotions at the beginning of a sentence followed by subsequent positive emotions. Checking for spelling mistakes can also be a feature. A feature can detect positive emotions through keywords relating to coping, such as getting there, I faced the world, or getting up and working.
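One way to capture the pattern of negative emotion followed by positive emotion is to label each sentence and then inspect the label sequence. The sketch below is only an illustration: the cue word sets are toy examples invented here (a real feature would draw on the lexicons in Table 4), and the function names are hypothetical.

```python
import re
from typing import List

# Toy cue words, invented for illustration; not taken from any lexicon in the study.
NEGATIVE_CUES = {"tired", "sick", "struggled", "hopeless"}
POSITIVE_CUES = {"faced", "smiled", "coping"}


def sentence_polarities(post: str) -> List[str]:
    """Label each sentence as negative, positive or neutral by counting cue words."""
    labels = []
    for sentence in re.split(r"[.!?]+", post):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if not tokens:
            continue
        negative = sum(1 for token in tokens if token in NEGATIVE_CUES)
        positive = sum(1 for token in tokens if token in POSITIVE_CUES)
        if negative > positive:
            labels.append("negative")
        elif positive > negative:
            labels.append("positive")
        else:
            labels.append("neutral")
    return labels


def negative_then_positive(labels: List[str]) -> bool:
    """True if a positive sentence appears after the first negative sentence."""
    if "negative" not in labels:
        return False
    first_negative = labels.index("negative")
    return "positive" in labels[first_negative + 1:]


# Sentences adapted from the quoted posts above.
labels = sentence_polarities(
    "Feeling tired each morning. I faced the world and smiled a little. "
    "Really struggled through my shift today."
)
has_pattern = negative_then_positive(labels)
```

The label sequence itself could also be used as a feature, since a return to negative emotion after a positive sentence was observed in the quoted crisis posts.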

3.2.4 Health service dissatisfaction

It was found that people who seek support services (e.g., health service, counsellors, or treatment) in the forum would sometimes feel hopelessness, avoidance, or frustration. This pattern may signal a person using the forum to vent their dissatisfaction or frustrations with local mental health support services, or failures with their treatment of care.

Extracted phrases of crisis posts that express patterns of health service dissatisfaction:

  • “(Service) My gp was running late today, which heightened my anxiety. At first she didn’t realise it was my 3month follow up apt. She also had no idea that the psych was meant to write a letter, so the psych either ran out of time or forgot. Meh. I just wanted to run away and hide. It was SO VERY hard not to close off and run. I didn’t even hear her call my name the first time. Blergh…” (Post ID: 137919)

  • “im having bad thought about ending my life, nothing helps not even (Service) my counceller” (Post ID: 136895)

  • “I know looking back at therapy experiences that didn’t work out will only discourage me. I’m highly impatient and annoyed. I’m trying to find the right (Service) professionals for me, its a very frustrating process. I can feel pretty hopeless at times too. I start questioning if I can ever get better. It’s hard enough to live.” (Post ID: 136600)

Recommended features to model: Categorical features can be used to model any mention of support services. Features can also consider detecting the negative expression of support services in crisis posts, factoring the length of the posts, and checking for spelling mistakes.

3.2.5 Utilising story telling to express crisis

According to Smithson et al. [2011], there are two types of behaviours when ISG users seek help. The first behaviour involves a person wanting to communicate their story or ’trouble telling’, and the other behaviour involves a person wanting advice. Appropriate timing for offering advice is crucial. If the advice is suggested too soon, it is likely to be rejected. We found that some people would join the forum to seek advice about an issue and then open up to talk about their problems.

Extracted phrases of story telling in crisis posts:

  • “(Event) So today I went to the doctors and they told me that the chemotherapy that I am on is not working, my body isnt reacting to it the way it should be, which means that I now need to start this new treatment that is going to knock me around a lot more then the ast chemotheraphy…” (Post ID: 136116)

  • “(Event) I moved out of home into a defacto relationship about a year ago now, and despite having troubles with my mum, who I used to live with (single parent), I have the feeling that she is very lonely and she often gets teary about that. (Event) She mentioned today that she may as well just kill herself because she feels like she’s not really worth it anymore.” (Post ID: 137384)

Recommended features to model: Features can detect a sequence of personal events. Personal events may contain temporal features, such as today, yesterday, or tomorrow. Additionally, features can detect negative and positive emotional responses relating to different events identified in the post. Other features can include checking spelling mistakes.

3.2.6 Seeking advice of peers during crisis

Crisis posts were found to contain more advice-seeking information than information providing support to other peers. Gaining the support of peers online is a common behaviour among people with severe mental ill-health A. et al. [2014]. B. et al. [2015] differentiate posts that provide support to peers from posts that attempt to seek advice from other peers in an ISG. Supportive posts were characterized by the user providing emotional support and informational support, such as offering website links to seek help. In contrast, posts that seek advice are characterized by users seeking informational support, emotional support, and companionship from other peers.

Extracted phrases of advice seeking in crisis posts:

  • “…Suffering from anxiety and deppression myself, this kind of relationship is setting me back quite significantly. (Advice Seeking) Has anyone else ever had a depressed parent that they are worried about when they move out of home? I have been going to her place often and not sure what else I can do to really help her…” (Post ID: 137384)

  • “She’s very depressed and always wnats to die. I’m pretty scared and I try to help but deep down I’m pretty useless for helping.(Advice Seeking) Any good tips? Because she likes to talk to me because i’m nice to her and doesn’t judge her.” (Post ID: 135748)

Recommended features to model: Features can detect questions relating to an emotional response. Other features can also detect advice-seeking information as a text embedding feature that identifies information relating to emotional support.

4 Discussion

4.1 Principal Findings

This study demonstrates a solution that utilises a variety of lexicon-based resources and supervised ML techniques to assist trained moderators to efficiently moderate ISGs. In contrast to similar research, this study extracted lexicon-based features from the textual content of posts, which may avoid possible biases during classification. The classification experiment found that one of our classifiers (Lightme) achieved the best results for crisis posts (0.52 F1-score) and competitive results in the other classes (i.e., non-green, flagged, and urgent posts). These results may indicate that it is possible to build a strong classifier that processes only textual features extracted from individual messages. However, the experimental results also demonstrated that using only lexicons was not enough to classify posts into all relevant classes. Exclusive use of vocabulary in the Reachout dataset was built into the solution, which may have introduced some noise that affected the classification performance for flagged and urgent posts. Furthermore, this study demonstrated the limitations of utilizing lexicons, especially that they only capture information at the ’word’ level. This may prevent them from capturing contextual meaning at the ’sentence’ level.

Furthermore, the findings suggest that using mental health lexicons can have an impact on the classification of posts requiring immediate response by trained moderators. This is unsurprising given the distinct domain-specific properties of lexicons, especially their association with certain mental and behavioural health theoretical constructs Kornfield et al. [2018]. Lastly, six linguistic characteristics were identified in the qualitative analysis of crisis posts. Interestingly, we found that a person in crisis may use words or language associated with hopelessness, publish short posts containing concise negative emotional responses, publish long posts containing variations of emotions, express dissatisfaction with locally available health services, use storytelling to express crisis, and seek the advice of peers during a crisis.

4.2 Comparison to Previous Research

Our best classifier showed results comparable to those of the state-of-the-art systems for triage classification with the Reachout dataset and of the baseline classifiers. The baseline system by Milne et al. [2016] used uni-grams and bi-grams as features, and a default scikit-learn logistic regression classifier Pedregosa et al. [2011]. The best system from the CLPsych 2017 Shared Task, by Altszyler et al. [2018], also utilized an SVM classifier. That classifier used a richer set of features, including features from the forum structure and interactions between posts, and outperformed our system for flagged and urgent posts. Interestingly, our approaches showed better results in identifying crisis posts. This is an important category for this problem, because moderators need to respond to these posts immediately. Adding features derived from the forum structure may help to improve the classification performance. However, the trade-off is that posts from new users, for whom little forum history is available, may not be classified properly.
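Because the comparison above is made class by class, it may help to recall how a per-class F1-score is computed from gold and predicted labels. The sketch below is a plain-Python illustration; the function name `f1_for_class` and the toy label sequences are assumptions and do not reproduce the shared task's official scorer or any result reported here.

```python
def f1_for_class(gold, predicted, label):
    # True positives, false positives and false negatives for one class.
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration.
gold = ["crisis", "crisis", "red", "green", "green"]
pred = ["crisis", "red", "crisis", "green", "green"]
crisis_f1 = f1_for_class(gold, pred, "crisis")
```

In this toy example one of the two crisis posts is found and one non-crisis post is wrongly labelled as crisis, so precision and recall for the crisis class are both 0.5, as is the F1-score.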

As in other systems, using TF-IDF with lexicons improved the classification performance of our triage text classifier. Kim et al. [2016] obtained the best results in the CLPsych 2016 Shared Task when using TF-IDF weighted n-grams and post embeddings from Sent2vec in an SGD classifier, with a set of twelve fine-grained labels instead of the four coarse-grained labels. The system by Brew [2016] also weighted n-grams with TF-IDF, producing similar results. Similarly, the use of TF-IDF showed comparable results in triage text classifiers for Twitter O’Dea et al. [2015] and for an online support forum for substance abuse Kornfield et al. [2018].
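For readers less familiar with TF-IDF weighted n-grams, a minimal sketch follows. It uses the basic weighting tf × log(N / df) over uni-grams and bi-grams; this is one common variant and is an assumption here, since the cited systems may use smoothed or normalised forms. The function names `ngrams` and `tfidf` and the example posts are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(text, max_n=2):
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = []
    for n in range(1, max_n + 1):
        # Join each window of n consecutive tokens into one n-gram string.
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(posts, max_n=2):
    docs = [Counter(ngrams(p, max_n)) for p in posts]
    # Document frequency: number of posts in which each n-gram occurs.
    df = Counter(g for d in docs for g in d)
    n_docs = len(docs)
    return [
        {g: tf * math.log(n_docs / df[g]) for g, tf in d.items()}
        for d in docs
    ]


weights = tfidf(["I feel hopeless", "I feel fine"])
```

In this two-post example, n-grams shared by both posts (such as "i feel") receive a weight of zero, while n-grams that occur in only one post (such as "hopeless") receive a positive weight, which is the behaviour that makes the weighting useful for separating classes.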

The qualitative findings appear to support prior research that found similar patterns of online interactions in people with mental ill-health using social media. As highlighted in previous research, online peer-to-peer interactions can improve health and psychosocial outcomes by facilitating a range of positive behaviours that can empower people, such as seeking information and emotional support A. et al. [2014]. However, these online networks can also become harmful when social media content begins to promote self-harm, suicide, or pro-eating disorder behaviours Gerrard [2018]; Dyson et al. [2016]. In particular, social media posts that promote “problematic” content may be difficult to identify by moderating hashtags alone in online communities Gerrard [2018].

4.3 Implications for Future Research

Most of the qualitative findings may be translated into features that could improve classification performance. As noted, the qualitative findings on the crisis posts could be used to distinguish salient linguistic characteristics of the language used in messages that are urgent for moderators. For example, specific features for detecting hopelessness may improve detection of crisis messages. Future work may also include differentiating posts that provide support to peers from posts that seek advice from them, and identifying participants’ roles, such as leaders, influential users, and opinion users B. et al. [2015]. Furthermore, various types of help-seeking behaviours can be identified, such as users wanting to share their personal stories of struggle Smithson et al. [2011]. The analysis of satisfaction with available services can play a role in developing enhanced mixed reality care approaches combining eHealth and on-site services van Genderen and Vlake [2018].

4.4 Limitations

This study has several limitations. First, our classifier was evaluated on only one dataset. More data is needed to generalize the model and to avoid overfitting. Second, the training set was relatively small, which may have had implications for our approach and subsequent results. Third, an error analysis was not conducted. An error analysis could have examined why certain posts were misclassified or classified correctly.

4.5 Conclusion

The current study examined a triage classifier using features derived only from the textual content of the post. Various lexicons were used to analyse the value of lexical resources for a text classifier that triages posts. Lexical resources alone were not enough to build a well-performing classifier; however, a solution that combines lexicons with other features derived from the content of the posts performed well in identifying crisis posts. Qualitative investigation of the crisis posts found six salient linguistic characteristics. The qualitative findings are still formative, and more work is needed to translate them into features that can improve the overall performance.

5 Disclosure Statement

No competing financial interests exist.


  • N. A., S. W. Grande, K. A. Aschbrenner, and G. Elwyn (2014) Naturally occurring peer support through social media: the experiences of individuals with severe mental illness using youtube. PLoS One 9 (10) (English). Cited by: §3.2.3, §3.2.6, §4.2.
  • E. Altszyler, A. J. Berenstein, D. N. Milne, R. A. Calvo, and D. F. Slezak (2018) Using contextual information for automatic triage of posts in a peer-support forum. See Proceedings of the fifth workshop on computational linguistics and clinical psychology: from keyboard to clinic, clpsych@naacl-htl, new orleans, la, usa, june 2018, Loveys et al., pp. 57–68. External Links: Link Cited by: §1, §2.1, Table 3, §3.1, Table 6, §4.2.
  • C. B., A. K., C. JA., and G. KM (2015) From help-seekers to influential users: a systematic review of participation styles in online health communities. Journal of Medical Internet Research. Cited by: §1, §3.2.6, §4.3.
  • C. Brew (2016) Classifying reachout posts with a radial basis function svm. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 138–142. External Links: Document, Link Cited by: Table 3, §4.2.
  • S. J. Cash, M. Thelwall, S. N. Peck, J. Z. Ferrell, and J. A. Bridge (2013) Adolescent suicide statements on myspace. Cyberpsychology, Behavior, and Social Networking 16 (3), pp. 166–174. Note: PMID: 23374167 External Links: Document, Link Cited by: §3.2.1.
  • A. Cimino, S. Cresci, F. Dell’Orletta, and M. Tesconi (2014) Linguistically-motivated and lexicon features for sentiment analysis of italian tweets. 4th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2014), pp. 81–86. Cited by: §2.2.
  • A. Cohan, S. Young, and N. Goharian (2016) Triaging mental health forum posts. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, San Diego, CA, USA, pp. 143–147. External Links: Link Cited by: §1, Table 3.
  • M. Conway and D. O’Connor (2016) Social media, big data, and mental health: current advances and ethical implications. Current Opinion in Psychology 9, pp. 77–82. Note: Social media and applications to health behavior External Links: ISSN 2352-250X, Document, Link Cited by: §1.
  • G. Coppersmith, K. Ngo, R. Leary, and A. Wood (2016) Exploratory analysis of social media prior to a suicide attempt. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, San Diego, CA, USA, pp. 106–117. External Links: Link Cited by: §1.
  • M. P. Dyson, L. Hartling, J. Shulhan, A. Chisholm, A. Milne, P. Sundar, S. D. Scott, and A. S. Newton (2016) A systematic review of social media use to discuss and view deliberate self-harm acts. PLOS ONE 11 (5), pp. 1–15. External Links: Link, Document Cited by: §4.2.
  • Y. Gerrard (2018) Beyond the hashtag: circumventing content moderation on social media. New Media & Society 20 (12), pp. 4492–4511. External Links: Document, Link Cited by: §4.2.
  • G. Gkotsis, S. Velupillai, A. Oellrich, H. Dean, M. Liakata, and R. Dutta (2016) Don’t let notes be misunderstood: a negation detection method for assessing risk of suicide in mental health records. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, San Diego, CA, USA, pp. 95–105. External Links: Link Cited by: §3.2.2.
  • K. M. Griffiths (2017) Mental health internet support groups: just a lot of talk or a valuable intervention?. World Psychiatry 16 (3), pp. 247–248. External Links: Document, Link Cited by: §1.
  • A. Hartzler and W. Pratt (2011) Managing the personal side of health: how patient expertise differs from the expertise of clinicians. J Med Internet Res 13 (3), pp. e62. External Links: Document Cited by: §1.
  • K. Hollingshead, M. E. Ireland, and K. Loveys (Eds.) (2017) Proceedings of the fourth workshop on computational linguistics and clinical psychology — from linguistic signal to clinical reality. Association for Computational Linguistics, Vancouver, BC. External Links: Link Cited by: §1.1, §1, §2.1, §2.1.
  • J. Huh, M. Yetisgen-Yildiz, and W. Pratt (2013) Text classification for assisting moderators in online health communities. Journal of Biomedical Informatics 46 (6), pp. 998–1005. Note: Special Section: Social Media Environments External Links: ISSN 1532-0464, Document, Link Cited by: §1.
  • Md. R. Islam, M. A. Kabir, A. Ahmed, A. R. M. Kamal, H. Wang, and A. Ulhaq (2018) Depression detection from social network data using machine learning techniques. Health Information Science and Systems 6 (1), pp. 8. External Links: ISSN 2047-2501, Document, Link Cited by: §1, §1.
  • Z. Jamil, D. Inkpen, P. Buddhitha, and K. White (2017) Monitoring tweets for depression to detect at-risk users. In Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology — From Linguistic Signal to Clinical Reality, Vancouver, BC, pp. 32–40. External Links: Link Cited by: §1.
  • K. Kaplan, M. Salzer, P. Solomon, E. Brusilovskiy, and P. Cousounis (2011) Internet peer support for individuals with psychiatric disabilities: a randomized controlled trial. 72, pp. 54–62. Cited by: §1.
  • S. M. Kim, Y. Wang, S. Wan, and C. Paris (2016) Data61-csiro systems at the clpsych 2016 shared task. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, San Diego, CA, USA, pp. 128–132. External Links: Link Cited by: §1, Table 3, §4.2.
  • R. Kornfield, P. K. Sarma, D. V. Shah, F. McTavish, G. Landucci, K. Pe-Romashko, and D. H. Gustafson (2018) Detecting recovery problems just in time: application of automated linguistic analysis and supervised machine learning to an online substance abuse forum. J Med Internet Res 20 (6), pp. e10136. External Links: ISSN 1438-8871, Document, Link Cited by: §1, §4.1, §4.2.
  • K. Kroenke, R. L. Spitzer, and J. B. W. Williams (2001) The phq-9. Journal of General Internal Medicine 16 (9), pp. 606–613. External Links: ISSN 1525-1497, Document, Link Cited by: Table 3.
  • Q. Le and T. Mikolov (2014) Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML’14, pp. II–1188–II–1196. External Links: Link Cited by: Table 3.
  • R. T. Liu, E. M. Kleiman, B. A. Nestor, and S. M. Cheek (2015) The hopelessness theory of depression: a quarter-century in review. Clinical Psychology: Science and Practice 22 (4), pp. 345–365. External Links: Document Cited by: §3.2.1.
  • K. Loveys, K. Niederhoffer, E. Prud’hommeaux, R. Resnik, and P. Resnik (Eds.) (2018) Proceedings of the fifth workshop on computational linguistics and clinical psychology: from keyboard to clinic, clpsych@naacl-htl, new orleans, la, usa, june 2018. Association for Computational Linguistics. External Links: Link, ISBN 978-1-948087-12-4 Cited by: E. Altszyler, A. J. Berenstein, D. N. Milne, R. A. Calvo, and D. F. Slezak (2018).
  • S. Malmasi, M. Zampieri, and M. Dras (2016) Predicting post severity in mental health forums. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 133–137. Cited by: Table 3.
  • J. Mikal, S. Hurst, and M. Conway (2017) Investigating patient attitudes towards the use of social media data to augment depression diagnosis and treatment: a qualitative study. In Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology — From Linguistic Signal to Clinical Reality, Vancouver, BC, pp. 41–47. External Links: Link Cited by: §1.
  • D. N. Milne, G. Pink, B. Hachey, and R. A. Calvo (2016) CLPsych 2016 shared task: triaging content in online peer-support forums. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, San Diego, CA, USA, pp. 118–127. External Links: Link Cited by: §1, §2.1, §3.1, §4.2.
  • S. Mohammad and P. D. Turney (2013) Crowdsourcing a word-emotion association lexicon. Computational Intelligence 29, pp. 436–465. Cited by: Table 3.
  • J. A. Naslund, K. A. Aschbrenner, L. A. Marsch, G. J. McHugo, and S. J. Bartels (2018) Facebook for supporting a lifestyle intervention for people with major depressive disorder, bipolar disorder, and schizophrenia: an exploratory study. Psychiatric Quarterly 89 (1), pp. 81–94. External Links: ISSN 1573-6709, Document, Link Cited by: §1.
  • B. O’Dea, M. E. Larsen, P. J. Batterham, A. L. Calear, and H. Christensen (2017) A linguistic analysis of suicide-related twitter posts. Crisis 38 (5), pp. 319–329. Note: PMID: 28228065 External Links: Document, Link Cited by: §1, §3.2.2.
  • B. O’Dea, S. Wan, P. J. Batterham, A. L. Calear, C. Paris, and H. Christensen (2015) Detecting suicidality on twitter. Internet Interventions 2 (2), pp. 183–188. External Links: ISSN 2214-7829, Document, Link Cited by: §1, §4.2.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830. Cited by: §4.2.
  • J. Pennington, R. Socher, and C. D. Manning (2014) Glove: global vectors for word representation.. In EMNLP, Vol. 14, pp. 1532–1543. Cited by: Table 4.
  • G. Pink, W. Radford, and B. Hachey (2016) Classification of mental health forum posts. In Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, CLPsych@NAACL-HLT 2016, June 16, 2016, San Diego, California, USA, pp. 180–182. External Links: Link Cited by: §1, Table 3.
  • H. A. Schwartz, M. Sap, M. L. Kern, J. C. Eichstaedt, A. Kapelner, M. Agrawal, E. Blanco, L. Dziurzynski, G. Park, D. Stillwell, M. Kosinski, M. E.P. Seligman, and L. H. Ungar (2016) Predicting individual well-being through the language of social media. pp. 516–527. Cited by: Table 3.
  • B. Shickel, M. Heesacker, S. Benton, A. Ebadi, P. Nickerson, and P. Rashidi (2016) Self-reflective sentiment analysis. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 23–32. External Links: Document, Link Cited by: §1, Table 3.
  • J. Smithson, S. Sharkey, E. Hewis, R. Jones, T. Emmens, T. Ford, and C. Owens (2011) Problem presentation and responses on an online forum for young people who self-harm. Discourse Studies 13 (4), pp. 487–501. External Links: Document, Link Cited by: §1, §3.2.5, §4.3.
  • J. Staiano and M. Guerini (2014) Depeche mood: a lexicon for emotion analysis from crowd annotated news. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, Maryland, pp. 427–433. External Links: Link Cited by: Table 3.
  • Y. R. Tausczik and J. W. Pennebaker (2010) The psychological meaning of words: liwc and computerized text analysis methods. Journal of Language and Social Psychology 29 (1), pp. 24–54. External Links: Document, Link Cited by: §1.
  • A. Ya. L. V. N. Vapnik (1963) Recognition of patterns with help of generalized portraits. Vol. 24, pp. 774–780. Cited by: §2.2.
  • M.E. van Genderen and J.H. Vlake (2018) Virtual healthcare; use of virtual, augmented and mixed reality. Nederlands tijdschrift voor geneeskunde 162, pp. D3229. Cited by: §4.3.
  • P. J. W., B. R. L., J. K., and K. Blackburn (2015) The development and psychometric properties of liwc2015. Cited by: Table 3.
  • A. Zirikly, V. Kumar, and P. Resnik (2016) The gw/umd clpsych 2016 shared task system. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 166–170. External Links: Document, Link Cited by: §1, Table 3.