Examining the Role of Mood Patterns in Predicting Self-Reported Depressive Symptoms

06/14/2020 ∙ by Lucia Lushi Chen, et al. ∙ 0

Depression is the leading cause of disability worldwide. Initial efforts to detect depression signals from social media posts have shown promising results. Given the high internal validity, results from such analyses are potentially beneficial to clinical judgment. The existing models for automatic detection of depressive symptoms learn proxy diagnostic signals from social media data, such as help-seeking behavior for mental health or medication names. However, in reality, individuals with depression typically experience depressed mood, loss of pleasure in nearly all activities, feelings of worthlessness or guilt, and diminished ability to think. Therefore, many of the proxy signals used in these models lack theoretical underpinnings for depressive symptoms. It is also reported that social media posts from many patients in the clinical setting do not contain these signals. To address this research gap, we propose to monitor a type of signal that is well established as a class of symptoms in affective disorders: mood. Mood is an experience of feeling that can last for hours, days, or even weeks. In this work, we attempt to enrich current technology for detecting symptoms of potential depression by constructing a 'mood profile' for social media users.




1. Introduction

Depression is the leading cause of disability worldwide. Initial efforts to detect depression signals from social media posts have shown promising results (De Choudhury et al., 2013; Coppersmith et al., 2014; Park et al., 2012; Tsugawa et al., 2015; Nguyen et al., 2014; Nadeem, 2016; Almeida et al., 2017). Given the high internal validity (Reece et al., 2017; De Choudhury et al., 2013), results from such analyses are potentially beneficial to clinical judgement. The existing models for automatic detection of depressive symptoms learn proxy diagnostic signals from social media data, such as help-seeking behaviour for mental health or medication names (De Choudhury et al., 2013; Coppersmith et al., 2014). However, in reality, individuals with depression typically experience depressed mood, loss of pleasure in nearly all activities, feelings of worthlessness or guilt, and diminished ability to think (Association and others, 2013). Therefore, many of the proxy signals used in these models lack theoretical underpinnings for depressive symptoms. It is also reported that the social media posts from many patients in the clinical setting do not contain these signals (Ernala et al., 2019). To address this research gap, we propose to monitor a type of signal that is well established as a class of symptom in affective disorders: mood. Mood is an experience of feeling that can last for hours, days or even weeks (Association and others, 2013). In this work, we attempt to enrich current technology for detecting symptoms of potential depression by constructing a ’mood profile’ for social media users.

The variance in the quality and intensity of mood and emotional reactions is referred to as “affective style” (Davidson, 1998), which underlies one’s risk of developing psychological disorders (Rottenberg and Gross, 2003; Akiskal, 1996). Assessing affective style in everyday life is difficult in an experimental context because it requires a costly extended period of data collection. In contrast, social media data contains longitudinal information that reflects one’s emotional reactions to stimuli. Therefore, it can provide researchers with an alternative lens to examine the affective style of an individual, provided that approval is obtained from social media users and data privacy is well protected.

Existing models for detecting symptoms of potential depression often include mood as a feature variable in the modeling process. However, there are a few methodological gaps in these models. First, most of them do not distinguish between mood and emotions. Emotion is a brief reaction to a specific stimulus, whereas mood has a longer temporal duration (Morris, 2012). Researchers using social media data to study mood or emotions often treat a single post as reflecting mood (Bollen et al., 2011; Thelwall et al., 2011; Celli et al., 2016). However, a single social media post is likely to reflect a participant’s emotions at the time rather than ongoing mood (Batson et al., 1992; Rottenberg, 2005). In the current work, we adopted the definition of mood from the Diagnostic and Statistical Manual of Mental Disorders (Association and others, 2000): mood is the pervasive and sustained “emotional climate”, and emotions are fluctuating changes in emotional “weather”. We sought to determine whether a temporal mood representation derived from social media text is associated with subsequent self-reported depressive symptoms and, if so, which approaches best represent mood as a time dependent variable for future work.

Furthermore, most models in this line of research ignore the fact that affect is inherently time dependent. Only a few models have adopted temporal affective patterns (Reece et al., 2017; De Choudhury et al., 2013). Most of these models also formulate the associations between affect and depressive symptoms based on averaged affect (Schwartz et al., 2013; Chen et al., 2020), while the transition from one affective state to another has largely been ignored (Rottenberg, 2005; Frijda, 1993; Bylsma et al., 2011; Sheppes et al., 2015). In this work, we explored and tested multiple approaches to represent the temporal affective patterns and the transitions between affective states.

Nevertheless, social media users often post sporadically. The sparsity of social media data poses a major challenge in the modeling process. Most existing studies imputed missing values with the mean or simply removed users with a lower word count (De Choudhury et al., 2013; Wang et al., 2013). Removing outliers is beneficial to the modeling process. However, it may result in removing those with severe symptoms from the sample, because disinterest in social contact and social withdrawal (e.g. posting sparsely) is a core symptom of major depressive disorder (MDD) (Association and others, 2000). Therefore, it is necessary to use modeling techniques that can include the outliers.

Towards addressing the methodological gaps described above, we designed multiple mood representations with the following characteristics: (i) Temporal features (ii) Transitions from one mood state to another (iii) Posting behavior. Here we see all the mood representations as a Mood Profile for social media users. We formulate the following questions to explore the roles of mood in predicting depressive symptoms:

  1. Are mood representations derived from social media text associated with the severity of self-reported depressive symptoms?

  2. Which representation in the mood profile is most predictive of the severity of self-reported depressive symptoms?

Our main contributions in this study are:

  1. Constructing a mood profile for social media users based on their status updates. The mood profile encompasses representations that encode the variance of mood intensity, alternations of mood states and the behavior of not posting.

  2. Examining the associations between the social media mood profile and users’ depressive symptoms level.

  3. Examining which representation in the mood profile is more predictive of depressive symptom level.

In our work, we analysed a set of 93,378 posts from 781 Facebook users who had consented to the use of their posts and answers to related questionnaires for research purposes. For each user, a mood profile is constructed based on their social media text. We found that people with a low symptom level tend to have fewer fluctuations in their mood pattern. We also modelled the mood representation with a Hidden Markov Model and found that the hidden states estimated from the mood representation are highly related to depressive symptoms. Nevertheless, combining several representations in the mood profile is more predictive of depressive symptom level (f-score: 0.62) than using one representation only. Our results suggest that the mood profile derived from social media text can potentially serve as a reference for an individual’s depressive symptom level. The data-driven, evidential nature of our approach provides us with better insight into the relationship between mood derived from social media data and depression.

2. Background

2.1. Depression and Mood

Moods are slow-moving states of feeling, influenced by others, objects or situations (Rottenberg and Gross, 2003; Watson, 2000). The pattern of mood reflects one’s vulnerability to developing affective disorders (Rottenberg, 2005; Rottenberg and Gross, 2003). Depressed mood is a symptom of mood disorders, such as major depressive disorder (characterized by a persistent feeling of sadness) and dysthymia (persistent mild depression) (Association and others, 2013).

It is also well established in the psychology literature that mood fluctuation and irritability are associated with many somatic and sensory dysfunctions. Frequent alternation between moods (typically every few days) and irregular cycles of mood underlie the behavioural features of a wide variety of conditions (Akiskal, 1996). In this study, we expect to find associations between mood derived from social media text and depressive symptoms similar to those reported in the psychology literature. Some associations have been found in existing studies. For example, participants with depressive symptoms use more negative affective words (e.g. sad, cry, hate) in their social media text than those without (De Choudhury et al., 2013; Park et al., 2012).

2.2. Detecting Depressive Symptoms with Sentiment

Studies that examine emotions derived from social media data often adopt sentiment analysis, a computational process that categorizes affect or opinions expressed in a piece of text. The extracted affect is called sentiment (Pang et al., 2008). Most existing works use averaged sentiment over a long period of time (e.g. one year) as a feature to predict depressive symptoms (Coppersmith et al., 2014; Tsugawa et al., 2015; Benton et al., 2017; Park et al., 2012; Wang et al., 2013).

In addition, the change of sentiment over time is an important aspect for inferring affective disorders. However, only a few studies have included sentiment as a time dependent feature in the model (De Choudhury et al., 2013). For example, De Choudhury et al. (2013) used the momentum of the feature vector in the screening detection.

Eichstaedt et al. (2018) include temporal posting patterns, but not the temporal affect pattern. Chen et al. (2018) used temporal measures of fine grained emotions to predict users’ depressive states. Recently, Reece et al. (2017) adopted a Hidden Markov Model (HMM) to analyse the change of language in social media posts and users’ depressive symptoms. They found that the shift of words in status updates indicates depression and PTSD symptoms. The above mentioned studies adopted a sliding window technique to define dynamic sentiment (De Choudhury et al., 2013; Chen et al., 2018; Reece et al., 2017). However, none of them systematically explored the size of the time window and the slide increment, and most studies only use a continuous sentiment value. In this work, we aggregated the sentiment in a sliding window based on its dominant valence (e.g. positive, negative) or average value. We also included the changes of affective states as a feature variable.

2.3. Posting Behavior and Depressive Symptoms

Social media users are known to communicate selectively due to self-presentation biases (Kim and Lee, 2011; Vogel et al., 2014). They are less likely to reveal events that reflect negatively on themselves (Mehdizadeh, 2010) due to stigma and fear of potential repercussions. Therefore, self-presentation biases lead to fundamental differences between real-life mood and social media mood.

In addition, social media behavior can be counterintuitive. For example, people who are more depressed would be expected to post less than people with fewer symptoms. However, several studies found that individuals with a history of depression (determined from past medical history) tended to post more often than people without depression (Smith et al., 2017). There are several potential reasons for this. A person might not be severely depressed, they might be more comfortable with talking about their feelings, they might see their social media as a place where they can escape stigma, or they might have a social media support network for their mental health. In this study, we treat the behaviour of not posting as a variable in itself and observe whether posting frequency has any predictive capacity with regard to depressive symptoms.

3. Data

For this study, the myPersonality data set (Bachrach et al., 2012; Youyou et al., 2015) was used. It contains Facebook posts of 180,000 participants collected from 2010 to 2012, enriched with a variety of additional validated scales (Bachrach et al., 2012). The collection of myPersonality data complied with Facebook’s terms of service, and informed consent for research use was obtained from all participants. Permission for the use of this database was obtained in 2018, and ethical approval for this piece of secondary data analysis was obtained from the Ethics Committee of the School of Informatics, University of Edinburgh. Other publications using this dataset include Freudenstein et al. (2019) and Sun et al. (2019).

3.1. Screening for Depressive Symptoms

From the participants in the myPersonality data, we focused on 1047 participants who completed the Center for Epidemiologic Studies Depression Scale (CES-D). The CES-D is a 20 item scale that measures the presence of depressive symptoms in the general population (Radloff, 1977a). It is one of the screening tests most widely used by health service providers. The symptoms measured in the CES-D include mood, anhedonia, feelings of worry and restlessness, changes in sleeping pattern, physical symptoms (such as loss of appetite) and irrational thoughts. The scale has been found to have high internal consistency, test-retest reliability (Radloff, 1977a; Orme et al., 1986; Roberts, 1980), and validity (Orme et al., 1986).

Radloff (1977a) proposed three groups of depression severity: low (0-15), mild to moderate (16-22), and high (23-60). For using mood profile to predict self-reported depressive symptoms, we followed the practice from previous social media studies (De Choudhury et al., 2013; Park et al., 2012; Reece et al., 2017; Tsugawa et al., 2015) and adopted 22 as a cutoff point to divide participants into high symptoms and low symptom groups. This allows us to compare our model’s performances with previous studies. For examining the mood fluctuation, we were additionally interested in a more nuanced picture in different symptom levels. Therefore, we further distinguish moderate and high symptom by following the original study from Radloff (1977a). Participants were divided into three groups using two cutoff points: 16 and 22.
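The two groupings described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and assigning a score of exactly 22 to the low symptom group in the binary split is an assumption based on the three-group ranges (the high group starts at 23).

```python
def cesd_three_groups(score: int) -> str:
    """Three-level grouping: low (0-15), moderate (16-22), high (23-60)."""
    if not 0 <= score <= 60:
        raise ValueError("CES-D total scores range from 0 to 60")
    if score <= 15:
        return "low"
    if score <= 22:
        return "moderate"
    return "high"


def cesd_two_groups(score: int, cutoff: int = 22) -> str:
    """Binary grouping; ASSUMPTION: scores above the cutoff count as high."""
    if not 0 <= score <= 60:
        raise ValueError("CES-D total scores range from 0 to 60")
    return "high" if score > cutoff else "low"
```

The three-level function is the one relevant to the mood fluctuation analysis, and the binary one to the classification experiments.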

3.2. Summary Statistics

Among the 1047 participants who completed the CES-D scale, we removed 110 participants who were less than 18 years old. The CES-D survey was open from 2010 to December 2012, but myPersonality only collected participants’ status updates from January 2009 to December 2011. Since 2012 status updates were not available, we further removed participants who completed the scale in 2012, and retained only those who had posted at least once in the year before completing it. This yielded a final set of 781 participants who had posted 93,378 posts over the year before they took the test.

The average number of posts per user over one year was 120. This distribution was skewed by a small number of frequent posters, as evidenced by a median value of 73 posts per user. Figure 1 shows participants’ count of posts up to one year before they completed the CES-D scale. The mean age of the participants is 26 (SD = 11.7); 333 (43%) participants are male and 448 (57%) are female. Table 1 shows further details of the participants, including the ethnicity, gender and marital status.

Note: Figure demonstrates the distribution of post count over one year before participants completed the CES-D survey scale. Size of the bin is 10.

Figure 1. Distribution of post count from participants

| Ethnicity | No. | % | Marital Status | No. | % |
|---|---|---|---|---|---|
| Black | 38 | 4.3 | Single | 574 | 73.8 |
| Asian Chinese | 26 | 3.3 | Divorced | 28 | 3.5 |
| Middle Eastern | 13 | 1.7 | Married | 27 | 3.4 |
| Native American | 13 | 1.6 | Married with Children | 38 | 4 |
| Other Asians | 84 | 10.8 | Partner | 78 | 10 |
| Not Specified | 96 | 12.2 | Not specified | 36 | 4.5 |
| White-American | 309 | 39.2 | | | |
| White-British | 71 | 8.9 | | | |
| White-Other | 131 | 17.1 | | | |

Table 1. Demographic Information of the 781 Participants

Overall, our sample has a relatively high CES-D score (mean = 26.3, SD = 8.9), and the ratio of the high symptom class to the low symptom class is 1.6:1 (cutoff 22), see Figure 2. Radloff (1977b) found that only 21% of the general population scored at or above an arbitrary cutoff score of 16. However, we note that the current dataset is not an exceptional case. For example, Reece et al. (2017) used a dataset that contained 105 depressed participants and 99 non-depressed participants, and other studies have a ratio of high symptom to low symptom class of 2:3 (De Choudhury et al., 2013; Tsugawa et al., 2015; Nadeem, 2016) or 3:5 (Orabi et al., 2018). All of these studies recruited a sample biased towards potentially high symptom individuals compared with empirical studies that selected participants in a random trial. We speculate that this reflects a bias from individuals self-selecting for this type of research.

Note: Figure demonstrates the density distribution of the CES-D score; the red line indicates the cutoff point 22.

Figure 2. Distribution of CES-D score

4. Constructing Mood Profile

A mood profile is constructed for each participant. Each mood profile encompasses sets of features which represent mood, the change of mood and the transition of mood states. Since mood is time dependent, we use a sliding window technique to construct the temporal features. A window starts from day 0 (the day when users completed the CES-D scale) and moves backwards for up to one year. Choosing the size of a time window presents a challenging question: how granular should a time window be? De Choudhury et al. (2013) look at a user’s tweets in a single day. Reece et al. (2017) use both day and week as the time window because most of the participants did not generate enough daily content. In this paper, the size of the time window is measured in days, see Table 2 for the notations. The size of the slide increment determines how much information two adjacent windows share. The slide increment is also measured in days.

Another challenge is to decide how far back social media posts indicate symptom level. Earlier studies use data up to one year before participants completed the self-reported symptom measurement (De Choudhury et al., 2013), and Reece et al. (2017) found that symptoms can be predicted up to nine months before the official disclosure of the illness. In the current work, each representation in the mood profile was constructed with posts written up to one year before the participant completed the CES-D survey.

Sentiment Scores

We used the sentiment scores retrieved from SentiStrength (Thelwall et al., 2010). SentiStrength extracts sentiment from the text based on a function that describes how well the words and phrases of the text match a predefined set of sentiment-related words.

Temporal Mood Representations

Since many social media users do not post every day, we encoded the behavior of not posting as “Silence” and we defined four mood states: positive, negative, neutral and silence. We adopted two approaches to define mood within a time window: the most frequent mood state over a time window and the average sentiment over a time window, see Table 2. If two mood states had the same highest frequency in the same time window, we defined the mood as mixed. Since the neutral mood state is relatively infrequent in comparison with the other mood states, we gave neutral more weight: if another mood state has the same frequency as neutral, we defined the mood as neutral. For the average sentiment, silence days are treated as missing values and imputed by the mean. We also constructed features that represent the change of mood (De Choudhury et al., 2013), see mood momentum in Table 2.
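The categorical mood representation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and mapping a day's mean sentiment to a state by its sign (above zero positive, below zero negative, zero neutral) is an assumption, since the text does not say how sentiment scores are thresholded.

```python
from collections import Counter
from statistics import mean


def day_state(scores):
    """Map one day's post sentiment scores to a mood state.

    ASSUMPTION: the sign of the mean score decides the state.
    """
    if not scores:
        return "silence"  # no posts on that day
    avg = mean(scores)
    if avg > 0:
        return "positive"
    if avg < 0:
        return "negative"
    return "neutral"


def window_mood(states):
    """Most frequent state in a window, with the tie rules from the text."""
    counts = Counter(states)
    top = max(counts.values())
    tied = [state for state, count in counts.items() if count == top]
    if len(tied) == 1:
        return tied[0]
    # Ties involving neutral resolve to neutral; any other tie is "mixed".
    return "neutral" if "neutral" in tied else "mixed"


def sliding_windows(daily_states, size, step):
    """Split a day-by-day list into windows of `size` days, moving by `step`.

    Index 0 is the day the scale was completed; later indices go back in time.
    """
    return [daily_states[start:start + size]
            for start in range(0, len(daily_states) - size + 1, step)]
```

For example, `[window_mood(w) for w in sliding_windows(states, 14, 3)]` would give one categorical mood value per 14-day window with a 3-day slide increment.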

| Variable | Notation | Description |
|---|---|---|
| Window Size | | Length of the time window, in days |
| Slide Increment | | Number of days by which the sliding window moves at each step |
| Sentiment | | Sentiment score of a single post |
| Day Sentiment | | Arithmetic mean of sentiment in one day |
| Mood | | Arithmetic mean of day sentiment over a time window |
| Mood | | Most frequent sentiment over a time window, categorical |
| Mood Momentum | | Difference in mood between two time windows |
| Mood States Transition | | The probability of a user transferring from one mood state to another |
| Mood States Transition | | Difference in mood states transition between two time windows |

Table 2. Notations for Mood Profile

Temporal Mood Transition Representations

We also encoded the probability of a user transferring from one mood state to another as a representation in the mood profile. We have in total 16 transitions (e.g. positive to negative, negative to silence) from the four classes (positive, negative, neutral and silence). Note that if we set the slide increment to one day, we would have 365 × 16 mood transition features. To avoid this high dimensionality, which might lead to a sparse representation, we set both the window size and the slide increment to 30 days, so that we have 12 × 16 feature columns for the Mood Transition Representations.
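A small sketch of how 16 transition features per 30-day window could be computed from a day-by-day sequence of mood states is given below. The function names are hypothetical, and normalising each transition count by the number of transitions leaving the same state (a conditional probability) is an assumption; the text does not specify the normalisation.

```python
from itertools import product

STATES = ("positive", "negative", "neutral", "silence")


def transition_features(daily_states):
    """Return the 16 transition probabilities for one window of daily states."""
    counts = {pair: 0 for pair in product(STATES, repeat=2)}
    for prev, nxt in zip(daily_states, daily_states[1:]):
        counts[(prev, nxt)] += 1
    features = {}
    for prev in STATES:
        # ASSUMPTION: probabilities are conditional on the previous state.
        row_total = sum(counts[(prev, nxt)] for nxt in STATES)
        for nxt in STATES:
            features[(prev, nxt)] = (
                counts[(prev, nxt)] / row_total if row_total else 0.0
            )
    return features


def profile_features(daily_states, size=30, step=30):
    """Concatenate the 16 features of every window into one flat list."""
    windows = [daily_states[i:i + size]
               for i in range(0, len(daily_states) - size + 1, step)]
    return [p for w in windows for p in transition_features(w).values()]
```

With 365 daily states and the 30-day settings, this yields 12 windows and therefore 12 × 16 = 192 feature columns.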

5. Association Between Mood Profile and Depressive Symptoms

We first observed whether the pattern of the mood profile is related to symptom level. Then we tested the mood profile’s predictive power on symptom level.

5.1. Mood Fluctuations

We modelled mood fluctuations using Gaussian Process (GP) regression. GP regression is a Bayesian approach that assumes a Gaussian process prior over functions (Quiñonero-Candela and Rasmussen, 2005). In this analysis, we treat the temporal mood representations as noisy representations of participants’ mood, and we use GP regression to estimate participants’ latent mood from their mood representation. For participants with few data points, the GP regression models the mean of the sample because of the imputation approach we adopted. Thus, for this experiment, we excluded participants who posted fewer than 10 posts over the year before they completed the depressive symptom scale. This yielded 690 participants for the current analysis. We used mood representations with different window sizes and slide increments as input to the GP regression model. The GP regression fitted best on the mood vector with a window size of 14 days and a slide increment of 3 days, see Figure 3. Each dot on the graph represents mood (averaged sentiment) in a 14-day time window, and the x axis shows the count of time windows. Since the dataset includes one year of posts (365 days), there are 122 time windows for each participant.

We constructed one model for each participant. Here we are not interested in making predictions with the GP regression model; instead, our focus is on a parameter of the function, the lengthscale. The lengthscale describes how smooth a function is. A small lengthscale means the function value changes quickly, while a large lengthscale means that its value changes slowly (Chalupka et al., 2013). By fitting a GP regression model on each user, we obtain a lengthscale of each user’s latent mood, and we compare the lengthscale among participants with different symptom levels (low, moderate, high).
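The role of the lengthscale can be made concrete with the squared exponential kernel, a common default for GP regression. The text does not state which kernel was used, so the kernel choice and all symbols below ($\ell$ for the lengthscale, $\sigma_f^2$ for the signal variance, $\sigma_n^2$ for the noise variance) are assumptions made for illustration only.

```latex
y_t = f(t) + \varepsilon_t, \qquad
f \sim \mathcal{GP}\bigl(0,\, k(t, t')\bigr), \qquad
\varepsilon_t \sim \mathcal{N}(0, \sigma_n^2)

k(t, t') = \sigma_f^2 \exp\!\left(-\frac{(t - t')^2}{2\ell^2}\right)
```

Under this kernel, the correlation between the latent mood at two time windows that are exactly $\ell$ windows apart is $\exp(-1/2) \approx 0.61$. A larger $\ell$ therefore means that mood values stay correlated over longer stretches of time (a smoother latent mood), while a smaller $\ell$ means that the correlation decays quickly (a more fluctuating latent mood).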

Note: examples from two participants; each data point represents the mood of every 14 days estimated by the GP regression model. N = 690.

Figure 3. Example of GP Regression

We used a nonparametric test (Mann-Whitney U test) to compare the lengthscale differences between groups. The lengthscale of the high symptom group (2.77) is identical to that of the moderate symptom group (2.77) (U = 35424, p = 0.01). However, the low symptom group (2.98) has a significantly larger lengthscale than the high symptom group (U = 17231, p = 0.01). The moderate symptom group was also significantly different from the low symptom group (U = 7244, p = 0.02). Our results suggest that people with a high or moderate depressive symptom level have more mood fluctuations than people with a low symptom level.
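For readers unfamiliar with the test, the U statistic itself is simple to compute from pooled ranks. The sketch below is a plain-Python illustration with a hypothetical function name; it returns only the statistic for the first sample and no p-value. In practice a statistics library (for example `scipy.stats.mannwhitneyu`) would normally be used.

```python
def mann_whitney_u(x, y):
    """Return the U statistic of sample x versus sample y.

    Tied values receive the average (mid) rank of their tie group.
    """
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted((value, group)
                    for group, sample in enumerate((x, y))
                    for value in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        in_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += mid_rank * in_x
        i = j
    n_x = len(x)
    return rank_sum_x - n_x * (n_x + 1) / 2
```

Here `x` and `y` would be the per-participant lengthscales of two symptom groups; a U close to `len(x) * len(y)` means that values in `x` tend to be larger than those in `y`.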

5.2. Classifying Symptom Levels using Daily Mood Representation

Another approach to examining whether the mood profile is associated with depressive symptoms is to see whether a particular mood state is influenced by depressive symptom level. We assume the mood states are serially dependent and we used a Hidden Markov Model (HMM) (Beal et al., 2002) to model two unobservable states based on a daily mood state representation. This representation comprises four mood states (positive, negative, neutral and silence). Since the behavior of not posting (silence) is included in the modeling process, we did not remove any less active users in this analysis (N = 781).

Hidden Markov Model

We used a multinomial (discrete) emission Hidden Markov Model (HMM) to model users’ observed mood for one year (Johansson and Olofsson, 2007). The major parameters used for the model are:

  1. Observed mood (time series): the daily mood representation (window size = 1 day, slide increment = 1 day).

  2. Transition matrix, which gives the probability of a transition from one state to another.

  3. Transition state.

  4. Observation emission matrix, which gives the probability of observing a given mood when in a given state.

An HMM is specified by these transition and emission probabilities. The idea behind this approach is to use the observed mood to estimate this parameter set: the transition matrix shows us the probability of transferring from one hidden state to another, and the emission matrix tells us the probability of emitting a certain mood when a user is in a specific symptom state.

We used the hmmlearn Python library (Gao et al., 2017) to fit the emission and transition matrices (using expectation-maximization) and the hidden state sequence (using the Viterbi path algorithm), see Section A for the initialized probabilities. We trained the model on the entire set of data and observed whether the emission probabilities align with our existing knowledge of affect and depressive symptoms. Here we did not aim to find the optimal model for forecasting a new observation sequence, hence we did not test the trained model on a test set. Instead, we were interested to know whether the hidden states decoded from the HMM model were associated with depressive symptom levels.
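To illustrate the decoding step, the sketch below implements the Viterbi algorithm in plain Python for a two-state HMM with four observable moods. It is not the hmmlearn code used in the study. The emission probabilities are taken from Table 3 (percentages converted to fractions); the start and transition probabilities are hypothetical values chosen only so that the example runs.

```python
import math

STATES = ("low", "high")

# Emission probabilities from Table 3, converted from percentages.
EMIT = {
    "low": {"positive": 0.0851, "negative": 0.0520,
            "neutral": 0.0465, "silence": 0.816},
    "high": {"positive": 0.0315, "negative": 0.128,
             "neutral": 0.0700, "silence": 0.769},
}
# HYPOTHETICAL start and hidden-state transition probabilities.
START = {"low": 0.5, "high": 0.5}
TRANS = {"low": {"low": 0.9, "high": 0.1},
         "high": {"low": 0.1, "high": 0.9}}


def viterbi(observations):
    """Return the most likely hidden state for each observed daily mood."""
    if not observations:
        return []
    # Log-probability of the best path ending in each state on the first day.
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
             for s in STATES}
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev_score, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES,
                       key=lambda p: prev_score[p] + math.log(TRANS[p][s]))
            score[s] = (prev_score[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s][obs]))
            pointers[s] = best
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Working in log space avoids numerical underflow on a year-long sequence of daily observations.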

The HMM model decodes a binary hidden state for each day. We speculate that one of the hidden states represents the user experiencing more depressive symptoms (high symptom state), and the other represents fewer symptoms (low symptom state). Although the CES-D scale measures an overall symptom level in one week, it is entirely possible for an individual to have more symptoms on some days (e.g. sleep disturbance, loss of appetite) but fewer on others. To test our speculation about the hidden states, we classify participants’ self-reported symptom level according to the count of high symptom states. Here we use the cutoff score 22 to divide participants into two groups so that the results can be compared with existing models. However, this raises a challenging question: over what period should the high symptom states be counted? The CES-D scale measures an overall symptom level over the past week (e.g. less than 1, 1-2, 3-4, 5 or more), and the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) defines depressive symptom as “The individual must be experiencing five or more symptoms during the same 2-week period”. Therefore, we defined our classification criterion as whether participants have at least a given number of days (ndays) in the high symptom state within a given number of days (window) before they completed the CES-D scale.
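The classification rule can be written as a single function. This is a sketch with a hypothetical function name; the parameter names follow the labels used in the note to Figure 4 (`ndays` and `window`), and the assumption is that the decoded sequence is ordered backwards in time, with index 0 being the day the scale was completed.

```python
def is_high_symptom(hidden_states, ndays, window):
    """Apply the 'at least ndays high states in the last window days' rule.

    ASSUMPTION: hidden_states[0] is the day the CES-D scale was completed,
    and later entries go further back in time.
    """
    recent = hidden_states[:window]
    high_days = sum(1 for state in recent if state == "high")
    return high_days >= ndays
```

For example, `is_high_symptom(states, ndays=1, window=14)` assigns a participant to the high symptom class if any of the 14 most recent days was decoded as a high symptom state.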

5.2.1. Evaluation of Hidden States

Emission Probabilities

We observed whether the hidden states’ emission probabilities align with our existing knowledge of depressive symptoms and affect. Table 3 shows the two hidden states and their emission probabilities for each observation. Given an observed day, both hidden states were most likely to emit a silence day because social media users posted sparsely. However, the high symptom hidden state has a lower probability of emitting silence days than the low symptom hidden state. The high symptom state also has a higher probability of emitting negative or neutral mood, whereas the low symptom state has a higher probability of emitting positive mood. Therefore, results from the HMM model align with our existing knowledge of depressive symptoms and affect.

| Hidden state | Positive | Negative | Neutral | Silence |
|---|---|---|---|---|
| Low Symptom | 8.51 | 5.20 | 4.65 | 81.6 |
| High Symptom | 3.15 | 12.8 | 7.00 | 76.9 |

Note: Low Symptom: hidden state that represents fewer symptoms on a particular day; High Symptom: hidden state that represents more symptoms on a particular day; N: training sample size. Values are percentages.

Table 3. Emission Probabilities

Transition Probabilities of Observations

We are also interested to know whether people are more likely to transfer from certain mood states to others. We constructed a transition probability matrix for the observations (daily mood representation). Table 4 again shows that social media users in general are more likely to become silent after they have posted any social media content, although the high symptom group is less so. High symptom individuals have higher probabilities of changing between any mood states other than silence. This result aligns with the finding from the GP regression that low symptom individuals show fewer fluctuations in their mood representation.

In general, people were more likely to have a positive mood if they had a positive mood in the previous time window. The probabilities of , were similar among the two groups, but high symptom participants are slightly more likely to transfer from negative to negative. When low symptom participants have a neutral mood, they have similar chances of having a neutral or negative mood in the next time window, whereas high symptom participants are more likely to have a negative mood in the next time window. Our result shows that while people in general are more vocal when they have a negative mood, high symptom participants are more likely to be vocal about negative content for a more extended period.

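| From \ To | High symptom: + | High symptom: - | High symptom: 0 | High symptom: S | Low symptom: + | Low symptom: - | Low symptom: 0 | Low symptom: S |
|---|---|---|---|---|---|---|---|---|
| + | 21.1 | 15.7 | 13.4 | 49.6 | 19.5 | 13.3 | 12.3 | 54.8 |
| - | 22.3 | 16.2 | 14.1 | 47.3 | 20.5 | 13.3 | 12.9 | 53.3 |
| 0 | 19.3 | 14.5 | 12.8 | 53.3 | 17.6 | 11.6 | 11.8 | 58.9 |
| S | 5.82 | 37.5 | 4.21 | 86.2 | 5.92 | 37.1 | 4.33 | 85.9 |

Note: +: positive, -: negative, 0: neutral, S: silent

Table 4. Transition Probabilities of Observations

Using Hidden States to Classify Symptom Level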

Figure 4 shows the precision and recall of the high symptom class obtained by counting the hidden states from the HMM model. The baseline model is formulated using a stratified dummy classifier that predicts based on the most frequent training labels. Precision increases as the ndays criterion increases. Table 5 shows some of the best classification results. Assigning participants with six high symptom states within 14 days to the high symptom class results in very low recall (10.8%) but high precision (71.2%). Assigning participants with one high symptom state within 14 days results in a more balanced recall (58.1%) and precision (60.3%) for the high symptom class. The result from this classifier does not surpass the baseline in f1 score, but when a higher ndays is used as the criterion, the precision is much higher than the baseline. Our result supports the claim that daily mood representations inferred from social media text are highly associated with depressive symptoms. When a social media user shows specific mood patterns, it is highly likely that the person has developed a high level of depressive symptoms. However, using only this approach to identify high symptom individuals would result in many false negative cases.

Note: window: size of the time window, in days before participants completed the CES-D scale. ndays: count of high symptom states within the time window

Figure 4. Precision and Recall of High Symptom Class (HMM) with Various Assignment Criteria

| Criteria | P | R | f1 |
|---|---|---|---|
| baseline | 61.2 | 100.0 | 76.0 |
| ndays = 1, window = 7 | 61.5 | 48.5 | 45.2 |
| ndays = 1, window = 14 | 60.3 | 58.1 | 59.2 |
| ndays = 6, window = 14 | 71.2 | 10.8 | 18.9 |

Note: high: high symptom class, low: low symptom class, R: recall of high symptom class, P: precision of high symptom class, f1: average macro-f1 score of both classes, criteria: criteria for classifying high symptom class

Table 5. Predicting depressive symptoms with hidden states

6. Representation Predictability of Depressive symptoms

The previous analysis suggests that the mood profile is highly associated with depressive symptoms. Now we examine which representation in the mood profile is most predictive of depressive symptoms. We combine the representations with sets of proxy signals in a classification task.

6.1. Feature Extraction

We extracted multiple features from the posts of each user to train multiple models for high symptom prediction. The extracted features included: 1) n-gram word representations; 2) topic modelling with Latent Dirichlet Allocation (LDA); and 3) all the entries from Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al., 2001). N-grams were ordered by term frequency across the corpus, and we grid searched the number of most frequent n-grams and the number of LDA topics (see Appendix A). We found that the 1500 most frequent n-grams and 30 LDA topics gave the best results. These features are commonly used in detecting signs of potential depression (De Choudhury et al., 2013; Park et al., 2012; Coppersmith et al., 2014; Reece et al., 2017). We compare the precision and recall of models with different representations from the mood profile.

Our dataset has an exceptionally high proportion of high symptom individuals, as discussed earlier. Given that only 303 of the 781 participants are low symptom, we randomly selected 303 participants from the high symptom sample to obtain a dataset with a balanced class proportion (1:1), N = 606, so that our results are more comparable with the existing literature. We split the data into a training set (80%, N = 486) and a test set (20%, N = 120) in a stratified fashion. Stratified five-fold cross-validation was used to optimize the parameters during model training. A grid search of parameters was carried out for several candidate classification algorithms (e.g. decision trees, support vector machine, logistic regression) (Suykens and Vandewalle, 1999); see Appendix A for the grid search parameters.
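
The balancing and splitting step can be sketched in plain Python. The sketch assumes that the majority class is undersampled uniformly at random and that the test share is rounded down within each class, which reproduces the 486/120 split reported above. The group size of 478 high symptom participants is derived from 781 - 303; the function name, identifiers and seed are hypothetical.

```python
import random


def balance_and_split(high_ids, low_ids, test_fraction=0.2, seed=0):
    """Undersample the larger class to a 1:1 ratio, then split each class
    separately so that train and test keep the same class proportion."""
    rng = random.Random(seed)
    n = min(len(high_ids), len(low_ids))
    classes = {
        "high": rng.sample(list(high_ids), n),  # random subset, shuffled
        "low": rng.sample(list(low_ids), n),
    }
    train, test = [], []
    for label, ids in classes.items():
        n_test = int(len(ids) * test_fraction)  # rounded down per class
        test += [(pid, label) for pid in ids[:n_test]]
        train += [(pid, label) for pid in ids[n_test:]]
    return train, test


# Hypothetical participant identifiers with the group sizes from the text.
high = [f"h{i}" for i in range(478)]
low = [f"l{i}" for i in range(303)]
train, test = balance_and_split(high, low)
print(len(train), len(test))
```

The five folds used for parameter tuning would then be drawn from the training portion only, so that the test set stays untouched until the final evaluation.
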

6.2. Model Evaluation

A baseline model is formulated using a stratified dummy classifier that generates predictions according to the training set’s class distribution. Of several candidate algorithms, logistic regression demonstrated the best performance. Hundreds of classification models were trained and evaluated for this task. The models with different representations from the mood profile are evaluated by precision and recall. We grid searched the time window size and the slide increment that maximise these metrics. Figure 5 shows the precision and recall of the high symptom class from models with various configurations and feature sets. Models with configuration 4 (a time window of 30 days and a slide increment of 3 days) yield the best scores. Table 6 shows the precision, recall and f1 score of the high symptom class from configuration 4. The model that combines the mood, mood momentum and mood transition representations yields the highest scores (0.59 precision, 0.65 recall, and an F-score of 0.62), and the model with averaged mood over a time window gives the second-highest scores.
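
To make the window and increment parameters concrete, the following sketch averages daily sentiment inside a sliding window. It is an illustration under assumptions rather than the study's feature code: it assumes one sentiment score per day, with `None` marking a day without posts, and it returns `None` for a window that contains no posts. The function name and toy data are hypothetical.

```python
def windowed_mood(daily_scores, window=30, step=3):
    """Average the daily sentiment inside each sliding window.

    daily_scores: one entry per day; None marks a day without posts.
    Returns one value per window, or None when the whole window is silent.
    """
    features = []
    for start in range(0, len(daily_scores) - window + 1, step):
        posted = [s for s in daily_scores[start:start + window] if s is not None]
        features.append(sum(posted) / len(posted) if posted else None)
    return features


# Hypothetical ten-day timeline, using a short window to keep the example small.
toy_days = [1, None, -1, None, None, None, None, None, 2, 2]
print(windowed_mood(toy_days, window=4, step=2))

# With configuration 4 (30-day window, 3-day increment), a 60-day timeline
# yields 11 windows.
print(len(windowed_mood([0] * 60, window=30, step=3)))
```

A longer window with a small increment produces a smoother, overlapping series, which is one way to read the "less granular" representation favoured by the results.
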

Note: config 1: window = 7, increment = 3; config 2: window = 14, increment = 3; config 3: window = 14, increment = 7; config 4: window = 30, increment = 3; config 5: window = 30, increment = 7. B: basic features (n-gram, topic modeling, LIWC), M: B + mood, MC: B + mood momentum, MT: B + mood transition, MTM: B + mood transition momentum, All: all features excluding MTM

Figure 5. Precision and Recall of logistic regression

| Features | P | R | F1 |
|---|---|---|---|
| RB | 47.6 | 50.0 | 48.8 |
| B | 51.4 | 58.3 | 54.7 |
| B + | 55.5 | 58.3 | 56.9 |
| B + | 53.0 | 58.3 | 55.6 |
| B + | 51.6 | 53.3 | 52.4 |
| B + | 52.3 | 55.0 | 53.6 |
| B + + + B + | 59.0 | 65.0 | 61.9 |

Note: R, P, F1 are the recall, precision, and f1 score of the high symptom class respectively. B: basic features (tfidf bag-of-words, topic modeling, sentiment, LIWC). RB: random baseline. Model parameters: penalty: l2, inverse of regularization strength: 0.1

Table 6. Prediction results for depressive symptoms (window = 14, increment = 3)
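
As a quick consistency check, the F1 column can be recomputed from the precision and recall columns, since F1 is the harmonic mean of the two. The sketch below uses only the values reported in the table; the helper name is hypothetical, and small deviations on other rows are possible because the reported precision and recall are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported for the random baseline and the best model.
reported = {"RB": (47.6, 50.0), "best": (59.0, 65.0)}
for name, (p, r) in reported.items():
    print(name, round(f1_score(p, r), 1))
```
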

7. Discussion

7.1. The Role of Mood in Predicting Depressive symptoms

Mood is a time-dependent variable, so using time series approaches to model mood inferred from social media text provides better insight into mood and depressive symptoms. Participants in this study demonstrated significantly fewer mood fluctuations if they reported a low symptom score. This finding aligns with the well-established connection between emotionality and depression in the psychology literature. We also found that the hidden states from the HMM are highly relevant to self-reported depressive symptoms (see Table 5). Our model suggests that an individual having one high symptom state in 14 days is highly likely to have a high symptom level. It is worth noting that the criterion used here differs from the criterion in the CES-D scale, where individuals need to have experienced symptoms on 1-2 days in the past 7 days to score on a criterion. However, we cannot assume that people will talk about their symptoms every time they experience them. This result suggests that individuals who show a specific mood pattern in social media text are highly likely to have high depressive symptoms; however, most individuals with high symptoms do not display this mood pattern.

Existing studies that use a sliding window technique to create dynamic sentiment features have not yet explored which representations and configurations tend to yield better results in classifying symptoms. We explored various configurations of the sliding window and found that mood computed over a 30-day time window that moves every 3 days is most predictive of depressive symptom level. This result suggests that a less granular mood representation is more beneficial in identifying symptoms. Moreover, combining several representations in the mood profile enhances model performance: the F1 score rises from 54.7 with the basic features alone to 61.9 with the combined representations (Table 6). Our best model (f-score: 0.62) encompasses the mood profile and a set of basic features commonly used in existing works. Other studies using multiple sets of proxy signals to predict depressive symptoms achieved a precision score ranging from 0.48 (Coppersmith et al., 2014) to 0.87 (Reece et al., 2017; Guntuku et al., 2017). Schwartz et al. (2014), using the same data set, achieved a correlation of 0.386 with continuous scores. The mood profile can potentially enhance current screening technology when combined with more advanced engineered features.

The transition probabilities of mood showed that participants, in general, were more vocal on social media when they were in a negative mood. We speculate that some depressed individuals react to negative mood by posting, and some by staying silent. Those who are more vocal on social media when in a negative mood might be using social media to reach out to others or using posts as a way to reflect. The associations between negative mood and being vocal, and between high symptom scores and a specific mood pattern, suggest that posters could be stratified into several groups: those that withdraw, those that reach out, and those that do not disclose potential signs of depression on social media.

Our results show that a temporal mood profile derived from social media text is highly associated with users’ subsequent self-reported depressive symptom level. In order to examine the potential of mood momentum and mood transition further, advanced time series analysis techniques need to be applied. Most importantly, mood profiles can potentially provide more information to clinicians than a classification system with binary output.

7.2. Technological and Ethical Implications

Similar to existing studies, the present finding on the derived mood pattern has implications for symptom level but does not provide an accurate interpretation of participants’ mental health condition. An accurate interpretation of one’s mental health condition requires a holistic view, and any diagnosis requires a strong understanding of an individual’s case history. The daily life information contained in social media data is just the tip of the iceberg of one’s life experience.

Our approaches provide a useful source of information for assessing participants’ derived mood pattern over time. However, as with all social media related research, ethical and privacy issues need to be considered, given the potential for misusing social media data (Fiske and Hauser, 2014; Lumb, 2016; Cadwalladr and Graham-Harrison, 2018). Using social media analysis techniques in practice requires that the user whose data is being analysed is comfortable with their social media timeline being used in this way, and that they consent to it. The scope of their consent also needs to be clear, i.e., whether it is for research or whether it is also for potential clinical use.

7.3. Limitations

Our sample contains participants who allowed researchers access to their Facebook posts and completed a symptom screening scale. Therefore, this sample may be strongly biased towards those who were comfortable disclosing and reaching out on social media. It is still unclear what the biases are in a sample with these tendencies compared with a random patient sample. Of particular interest is the relatively high depressive symptom score of most participants in this sample; this bias is prevalent in studies in this line of research (Guntuku et al., 2017). We speculate that people who have depression are more curious about taking part in mental health related studies.

In this work, the symptom screening test was conducted only once. There were also no tests controlling for the presence of other disorders, such as bipolar disorder, which greatly affect behaviour and mood variability. Those at the high end of the scale could have other types of affective disorders while showing depressive symptoms at the time they completed the self-reported measurement. Therefore, the measurement of self-reported symptoms is not an accurate reflection of whether the person has depression.

In addition, the sentiment scores employed in this study were retrieved with SentiStrength, which is a word counting approach to identifying positive and negative affect. Although numerous studies have validated the word counting approach, the ideal method for retrieving less noisy sentiment is to construct the sentiment classification model with the examined dataset. Future studies can train their own model for sentiment annotation to retrieve more accurate sentiment.

8. Conclusion

Mood is an important signal for the development of a depressive episode. This report outlines how the sliding window technique can be used to construct temporal representations of mood based on sentiment expressed in social media text. The behavior of not posting was also encoded in some of the representations. However, mood inferred from social media text is different from mood in real life. In order to examine whether the mood profile inferred from social media text is associated with depressive symptoms, we use the mood profile to classify depressive symptom level with time-series modelling and a logistic regression algorithm. Our results suggest that the mood profile inferred from social media data is highly predictive of depressive symptoms, especially when the behavior of not posting is included. We also discover a pattern whereby people are more vocal on social media when they are unhappy. Despite many social media users being subject to positive self-presentation biases, social media provides a place for people to channel their emotions. Future studies can focus on studying this behavior in an actual patient group and a random control sample. The techniques proposed here offer a novel contribution to technology for detecting potential signs of depression, as they focus not on providing a binary classification result but on providing a longitudinal reference for the development of depressive symptoms.

We thank Michael Kosinski and David Stillwell for permission to use myPersonality, the three anonymous reviewers for their useful comments, and Chris Lucas for invaluable guidance on Gaussian Processes. Magdy and Wolters acknowledge partial funding by The Alan Turing Institute (EPSRC, EP/N510129/1).

Appendix A Experiment Details

The following supplementary material details what is required to reproduce our results as closely as possible.

A.1. Model Training

Grid searches of the following pairings of parameter spaces and Scikit-Learn implementations of algorithms were carried out:

  • Feature Extraction

    • number of n-gram: 1000, 1500, 3000, 4000, 5000, 6000

    • number of topics: 10, 20, 30

  • HMM:

    • Initial transition probability: [0.5, 0.5], [0.5, 0.5]

    • Initial emission probability: [0.2, 0.3, 0.2, 0.3], [0.2, 0.2, 0.3, 0.3]

    • Number of iterations: 10

  • Support Vector Machine

    • Inverse of regularization strength: 0.5, 0.7, 1.0, 1.5, 2.0, 2.5

    • Kernel: linear, poly, rbf, sigmoid

    • Kernel coefficient: 0.01, 0.001, 0.0005

  • Extra Trees

    • Number of Estimators: 100, 300, 500, 1000

    • Maximum Tree Depth: 20, 50, 100, 200

    • Maximum number of features: sqrt, log2

  • Logistic Regression:

    • Penalty: l1, l2

    • Inverse of regularization strength: 0.1, 0.3, 0.5, 0.7, 0.9, 1.0, 1.5, 2.0
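
For readers who want to reproduce the classifier searches, the grids above can be written as plain dictionaries. This is a sketch under assumptions: the keys follow the Scikit-Learn parameter names that the descriptions most plausibly refer to (`C` for the inverse of regularization strength, `gamma` for the kernel coefficient), and the `expand` helper is hypothetical and simply enumerates every combination.

```python
from itertools import product

# Assumed mapping of the listed search spaces onto Scikit-Learn parameter names.
PARAM_GRIDS = {
    "svm": {
        "C": [0.5, 0.7, 1.0, 1.5, 2.0, 2.5],
        "kernel": ["linear", "poly", "rbf", "sigmoid"],
        "gamma": [0.01, 0.001, 0.0005],
    },
    "extra_trees": {
        "n_estimators": [100, 300, 500, 1000],
        "max_depth": [20, 50, 100, 200],
        "max_features": ["sqrt", "log2"],
    },
    "logistic_regression": {
        "penalty": ["l1", "l2"],
        "C": [0.1, 0.3, 0.5, 0.7, 0.9, 1.0, 1.5, 2.0],
    },
}


def expand(grid):
    """List every parameter combination in a grid as a dictionary."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


for name, grid in PARAM_GRIDS.items():
    print(name, len(expand(grid)), "combinations")
```

Dictionaries in this shape can be passed as the `param_grid` argument of Scikit-Learn's `GridSearchCV`. Note that the `l1` penalty for logistic regression requires a solver that supports it, such as `liblinear` or `saga`.
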


References

  • H. S. Akiskal (1996) The temperamental foundations of affective disorders. Interpersonal factors in the origin and course of affective disorders, pp. 3–30.
  • H. Almeida, A. Briand, and M. Meurs (2017) Detecting early risk of depression from social media user-generated content. In CLEF (Working Notes).
  • A. P. Association et al. (2000) Diagnostic and statistical manual-text revision (dsm-iv-tr). Washington, DC: American Psychiatric Association, pp. 28–819.
  • A. P. Association et al. (2013) Diagnostic and statistical manual of mental disorders (dsm-5®). American Psychiatric Pub.
  • Y. Bachrach, M. Kosinski, T. Graepel, P. Kohli, and D. Stillwell (2012) Personality and patterns of facebook usage. In Proceedings of the 4th annual ACM web science conference, pp. 24–32.
  • C. D. Batson, L. L. Shaw, and K. C. Oleson (1992) Differentiating affect, mood, and emotion: toward functionally based conceptual distinctions.
  • M. J. Beal, Z. Ghahramani, and C. E. Rasmussen (2002) The infinite hidden markov model. In Advances in neural information processing systems, pp. 577–584.
  • A. Benton, M. Mitchell, and D. Hovy (2017) Multi-task learning for mental health using social media text. arXiv preprint arXiv:1712.03538.
  • J. Bollen, H. Mao, and A. Pepe (2011) Modeling public mood and emotion: twitter sentiment and socio-economic phenomena. In Fifth International AAAI Conference on Weblogs and Social Media.
  • L. M. Bylsma, A. Taylor-Clift, and J. Rottenberg (2011) Emotional reactivity to daily events in major and minor depression. Journal of abnormal psychology 120 (1), pp. 155.
  • C. Cadwalladr and E. Graham-Harrison (2018) Revealed: 50 million facebook profiles harvested for cambridge analytica in major data breach. The guardian 17, pp. 22.
  • F. Celli, A. Ghosh, F. Alam, and G. Riccardi (2016) In the mood for sharing contents: emotions, personality and interaction styles in the diffusion of news. Information Processing & Management 52 (1), pp. 93–98.
  • K. Chalupka, C. K. Williams, and I. Murray (2013) A framework for evaluating approximation methods for gaussian process regression. Journal of Machine Learning Research 14 (Feb), pp. 333–350.
  • L. Chen, C. H. K. Cheng, and T. Gong (2020) Inspecting vulnerability to depression from social media affect. Frontiers in Psychiatry 11, pp. 54.
  • X. Chen, M. D. Sykora, T. W. Jackson, and S. Elayan (2018) What about mood swings: identifying depression on twitter with temporal measures of emotions. In Companion Proceedings of the The Web Conference 2018, pp. 1653–1660.
  • G. Coppersmith, M. Dredze, and C. Harman (2014) Quantifying mental health signals in twitter. In Proceedings of the workshop on computational linguistics and clinical psychology: From linguistic signal to clinical reality, pp. 51–60.
  • R. J. Davidson (1998) Affective style and affective disorders: perspectives from affective neuroscience. Cognition & Emotion 12 (3), pp. 307–330.
  • M. De Choudhury, M. Gamon, S. Counts, and E. Horvitz (2013) Predicting depression via social media. In Seventh international AAAI conference on weblogs and social media.
  • J. C. Eichstaedt, R. J. Smith, R. M. Merchant, L. H. Ungar, P. Crutchley, D. Preoţiuc-Pietro, D. A. Asch, and H. A. Schwartz (2018) Facebook language predicts depression in medical records. Proceedings of the National Academy of Sciences 115 (44), pp. 11203–11208.
  • S. K. Ernala, M. L. Birnbaum, K. A. Candan, A. F. Rizvi, W. A. Sterling, J. M. Kane, and M. De Choudhury (2019) Methodological gaps in predicting mental health states from social media: triangulating diagnostic signals. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 134.
  • S. T. Fiske and R. M. Hauser (2014) Protecting human research participants in the age of big data. National Acad Sciences.
  • J. Freudenstein, C. Strauch, P. Mussel, and M. Ziegler (2019) Four personality types may be neither robust nor exhaustive. Nature human behaviour 3 (10), pp. 1045–1046.
  • N. H. Frijda (1993) Moods, emotion episodes, and emotions. Handbook of emotions 12 (2), pp. 155.
  • Z. Gao, M. Small, and J. Kurths (2017) Complex network analysis of time series. EPL (Europhysics Letters) 116 (5), pp. 50001.
  • S. C. Guntuku, D. B. Yaden, M. L. Kern, L. H. Ungar, and J. C. Eichstaedt (2017) Detecting depression and mental illness on social media: an integrative review. Current Opinion in Behavioral Sciences 18, pp. 43–49.
  • M. Johansson and T. Olofsson (2007) Bayesian model selection for markov, hidden markov, and multinomial models. IEEE signal processing letters 14 (2), pp. 129–132.
  • J. Kim and J. R. Lee (2011) The facebook paths to happiness: effects of the number of facebook friends and self-presentation on subjective well-being. CyberPsychology, behavior, and social networking 14 (6), pp. 359–364.
  • D. Lumb (2016) Scientists release personal data for 70,000 okcupid profiles. Available at engt. co/2b4NnQ0. Accessed August 7, pp. 2016.
  • S. Mehdizadeh (2010) Self-presentation 2.0: narcissism and self-esteem on facebook. Cyberpsychology, behavior, and social networking 13 (4), pp. 357–364.
  • W. N. Morris (2012) Mood: the frame of mind. Springer Science & Business Media.
  • M. Nadeem (2016) Identifying depression on twitter. arXiv preprint arXiv:1607.07384.
  • T. Nguyen, D. Phung, B. Dao, S. Venkatesh, and M. Berk (2014) Affective and content analysis of online depression communities. IEEE Transactions on Affective Computing 5 (3), pp. 217–226.
  • A. H. Orabi, P. Buddhitha, M. H. Orabi, and D. Inkpen (2018) Deep learning for depression detection of twitter users. In Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, pp. 88–97.
  • J. G. Orme, J. Reis, and E. J. Herz (1986) Factorial and discriminant validity of the center for epidemiological studies depression (ces‐d) scale. Journal of clinical psychology 42 (1), pp. 28–33.
  • B. Pang, L. Lee, et al. (2008) Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval 2 (1–2), pp. 1–135.
  • M. Park, C. Cha, and M. Cha (2012) Depressive moods of users portrayed in twitter. In Proceedings of the ACM SIGKDD Workshop on healthcare informatics (HI-KDD), Vol. 2012, pp. 1–8.
  • J. W. Pennebaker, M. E. Francis, and R. J. Booth (2001) Linguistic inquiry and word count: liwc 2001. Mahway: Lawrence Erlbaum Associates 71 (2001), pp. 2001.
  • J. Quiñonero-Candela and C. E. Rasmussen (2005) A unifying view of sparse approximate gaussian process regression. Journal of Machine Learning Research 6 (Dec), pp. 1939–1959.
  • L. S. Radloff (1977a) The ces-d scale: a self-report depression scale for research in the general population. Applied psychological measurement 1 (3), pp. 385–401.
  • L. S. Radloff (1977b) The ces-d scale: a self-report depression scale for research in the general population. Applied psychological measurement 1 (3), pp. 385–401.
  • A. G. Reece, A. J. Reagan, K. L. Lix, P. S. Dodds, C. M. Danforth, and E. J. Langer (2017) Forecasting the onset and course of mental illness with twitter data. Scientific reports 7 (1), pp. 13006.
  • R. E. Roberts (1980) Reliability of the ces-d scale in different ethnic contexts. Psychiatry research 2 (2), pp. 125–134.
  • J. Rottenberg and J. J. Gross (2003) When emotion goes wrong: realizing the promise of affective science. Clinical Psychology: Science and Practice 10 (2), pp. 227–232.
  • J. Rottenberg (2005) Mood and emotion in major depression. Current Directions in Psychological Science 14 (3), pp. 167–170.
  • H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, L. Dziurzynski, S. M. Ramones, M. Agrawal, A. Shah, M. Kosinski, D. Stillwell, M. E. Seligman, et al. (2013) Personality, gender, and age in the language of social media: the open-vocabulary approach. PloS one 8 (9), pp. 234.
  • H. A. Schwartz, J. Eichstaedt, M. Kern, G. Park, M. Sap, D. Stillwell, M. Kosinski, and L. Ungar (2014) Towards assessing changes in degree of depression through facebook. In Proceedings of the workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, pp. 118–125.
  • G. Sheppes, G. Suri, and J. J. Gross (2015) Emotion regulation and psychopathology. Annual review of clinical psychology 11, pp. 379–405.
  • R. J. Smith, P. Crutchley, H. A. Schwartz, L. Ungar, F. Shofer, K. A. Padrez, and R. M. Merchant (2017) Variations in facebook posting patterns across validated patient health conditions: a prospective cohort study. Journal of medical Internet research 19 (1), pp. e7.
  • X. Sun, B. Liu, Q. Meng, J. Cao, J. Luo, and H. Yin (2019) Group-level personality detection based on text generated networks. World Wide Web, pp. 1–20.
  • J. A. Suykens and J. Vandewalle (1999) Least squares support vector machine classifiers. Neural processing letters 9 (3), pp. 293–300.
  • M. Thelwall, K. Buckley, G. Paltoglou, D. Cai, and A. Kappas (2010) Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology 61 (12), pp. 2544–2558.
  • M. Thelwall, K. Buckley, and G. Paltoglou (2011) Sentiment in twitter events. Journal of the American Society for Information Science and Technology 62 (2), pp. 406–418.
  • S. Tsugawa, Y. Kikuchi, F. Kishino, K. Nakajima, Y. Itoh, and H. Ohsaki (2015) Recognizing depression from twitter activity. In Proceedings of the 33rd annual ACM conference on human factors in computing systems, pp. 3187–3196.
  • E. A. Vogel, J. P. Rose, L. R. Roberts, and K. Eckles (2014) Social comparison, social media, and self-esteem. Psychology of Popular Media Culture 3 (4), pp. 206.
  • X. Wang, C. Zhang, Y. Ji, L. Sun, L. Wu, and Z. Bao (2013) A depression detection model based on sentiment analysis in micro-blog social network. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 201–213.
  • D. Watson (2000) Mood and temperament. Guilford Press.
  • W. Youyou, M. Kosinski, and D. Stillwell (2015) Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences 112 (4), pp. 1036–1040.