The science of happiness is an area of positive psychology that studies the factors that sustain people’s happiness over time [Seligman2011, Fredrickson2009, Lyubomirsky2008]. One of the interesting findings of the field [Diener et al.1999] is that while 50% of our happiness is genetically determined, and only 10% of it is determined by our life circumstances (e.g., finances, job, material belongings), 40% of our happiness is determined by behaviors that are under our control. Examples of such behaviors include investing in long-term personal relationships, bonding with loved ones, doing meaningful work, and caring for one’s body and mind. Consequently, positive psychologists have focused on devising methods to steer people towards those behaviors. Fostering happiness has also received attention at the national policy level – in a recent interview [Murthy2016] the U.S. Surgeon General claimed that fostering happiness is an important priority as one of the main ways to prevent disease and live a longer, healthier life.
Naturally, there has been recent interest in developing technologies that help users incorporate the findings of the science of happiness into their daily lives. Current applications that pursue this goal generally fall into one of two categories: (1) applications that suggest relevant content to users based on their answers to a predefined set of questions [Killingsworth2017, Happify2017, Happier2017] or (2) applications in which users can log their emotions in a journaling-style environment, where the content is available mostly for their own reflection [Bliss2017, Mojo2017, DayOne2017].
Our work has been to develop a journal-like application where users express their happy moments using their own language, thereby allowing for more nuance in their description of what makes them happy. The ultimate goal of our app is that it should understand from the text which activities make the user happy and who else participated in those happy moments. The app can then provide a useful visualization of the user’s happy moments, offer meaningful follow-up questions, and over time learn to suggest other activities that may benefit the user.
As we started working on this application we quickly realized that understanding the different aspects of happy moments is a challenging NLP problem that has received very little attention to date. In order to advance the state of the art for this problem, we set out to crowd-source HappyDB, a corpus of 100,000 moments that we released publicly at http://rebrand.ly/happydb.
This paper describes the HappyDB corpus, and outlines a few NLP problems that can be studied with it. We describe the application of a few state-of-the-art analysis techniques to the corpus resulting in several observations. We also discuss some additional annotations that we provide along with HappyDB that would be useful to anyone who wants to explore the corpus further. The upshot of all these analyses, however, is that there is a need for deeper NLP techniques in the analysis of happy moments (and of emotions expressed in text in general), and thus HappyDB provides an exciting opportunity for follow-up research.
In addition to the applications motivating our work, there are other areas in which a deeper understanding of happy moments can be useful. Of particular note is the analysis (by advertisers or third parties) of the sources of happiness relating to products and services from comments on social media. Viewed in that perspective, analyzing happy moments can also be seen as a refined analysis of sentiments (e.g., [Liu2012, Pang and Lee2008]).
HappyDB is a collection of sentences in which crowd-workers answered the question: what made you happy in the past 24 hours (or alternatively, the past 3 months). Naturally, the descriptions of happy moments exhibit a high degree of linguistic variation. Note that HappyDB is not a longitudinal dataset that follows individuals over a period of time. Some examples of happy moments are:
My son gave me a big hug in the morning when I woke him up.
I finally managed to make 40 pushups.
I had dinner with my husband.
Morning started with the chirping of birds and the pleasant sun rays.
The event at work was fun. I loved spending time with my good friends and laughing.
I went to the park with the kids. The weather was perfect!
Fully understanding a happy moment is obviously a problem that goes beyond natural language processing into the fields of psychology and philosophy. Here we take an NLP perspective on the problem and set a goal of understanding which activities happened in the happy moment and who participated in those activities. Evaluating which of these activities is the true cause of happiness adds another level of complexity. For example, even for the very simple happy moment “I had dinner with my husband”, the extracted activities could be “having dinner”, “being with the husband”, or something that is not explicitly in the text such as “having a date night without the children”.
The following are several NLP-related problems that could be studied using HappyDB.
What are the activities described in a given happy moment? What other components besides activities are important in the happy moment? Which of these aspects are most central to the happy moment?
Can we discover common paraphrasings to describe activities that appear in happy moments?
Can we discover whether the cause of happiness in a particular happy moment is related to the expectation the person had? For example, a happy moment can be written as “I got to spend time with my son” versus “I spent time with my son”. In the first case it seems that the person was happy partly because they did not expect to be able to spend time with their son.
Can we reliably remove extraneous text in a happy moment? For example, can we transform “I am happy to hear that my friend is pregnant” into “My friend is pregnant”? Note that removing extraneous information can be very helpful in understanding which activity or event is the cause of happiness.
Can we create a useful ontology of activities that cause happiness and map happy moments onto that ontology? Such an ontology can be an important tool for recommending additional activities to the user.
Solutions to the questions raised above will require advances in NLP. In particular, we need techniques that go beyond analysis of the happy moments at the keyword level and perform deeper analyses such as semantic role labeling (into possibly a set of frames that leverage Framenet, Verbnet, and/or Propbank). Further analysis also needs to accommodate ungrammatical sentences such as “Early morning in the beach, having breakfast with the family.”
In this paper, we lay the groundwork for a deeper exploration of HappyDB. We begin by describing how HappyDB was collected and cleaned. We present some basic statistics about HappyDB demonstrating that it is a broad corpus. We compare the topics and the emotional content of our corpus with other corpora using standard state-of-the-art annotations which we are releasing with HappyDB. We also illustrate another interesting aspect of HappyDB: moments describing experiences from the last 24 hours are significantly different from those describing experiences from the last 3 months. Finally, we address the most basic research problem at the heart of HappyDB: classifying happy moments into categories. We show that even this problem is extremely challenging as it is closely related to the problem of mining expressed emotions in short sentences. We describe a set of crowd-sourced category annotations that will facilitate future research in the problem.
2 HappyDB: 100,000 happy moments :)
We collected 100,000 happy moments with Mechanical Turk (MTurk) over 3 months. The workers were asked to answer either: what made you happy in the last 24 hours? or, what made you happy in the last 3 months? HappyDB is split evenly between these two reflection periods. The majority of our workers are of age 20 to 40 years and from the USA. There are about the same number of male and female workers and the majority of our workers are single. More information about the demographics of the workers as well as our crowd-sourcing setup can be found in appendices A and B respectively. Along with the original 100,000 happy moments (which we refer to as the original HappyDB), we also released a cleaned version of HappyDB (which we refer to as the cleaned HappyDB), where some spelling mistakes are corrected (as described below) and some vacuous moments are removed. Each moment is also annotated with the reflection period (24 hours or 3 months) and with the demographic information of the worker providing it.
|Collection period||3/28/2017 – 6/16/2017|
|# happy moments||100,922|
|# distinct workers||10,843|
|# distinct words||38,188|
|Avg. # happy moments / worker||9.31|
|Avg. # words / happy moment||19.66|
Cleaning HappyDB: Naturally, the collected happy moments can contain a variety of errors. In our cleaning process we dealt with two types of errors: (1) empty or single-word sentences and (2) sentences with spelling errors. We removed any sentences with fewer than two words. To find the spelling errors, we compared all the words to a dictionary built from Norvig’s text corpus [Norvig2007] as well as a complete list of English Wikipedia titles (https://dumps.wikimedia.org/enwiki/), which includes the names of many cities, locations, and other known entities. We also performed a few edits on the dictionary to remove foreign-language phrases as well as certain words such as ‘Alot’ and ‘Iam’, which are actual city names but are more likely to be spelling errors. We found that only 2.7% of happy moments contain words not present in our dictionary. While this number seems small enough to justify removing such happy moments, we observed that certain words are more likely to be misspelled than others, so removing these happy moments could create a bias. For example, the word “son” is mentioned more often than “daughter” in the original corpus because “daughter” is more likely to be misspelled than “son”. After fixing the typos using our technique (which we describe next), both words ended up having almost the same frequency. This example indicates the need for a spell corrector.
To fix the spelling issues, we experimented with various open-source spell correctors, but we did not find them suitable for our task: they either did not provide confidence scores for the corrections, or suggested corrections that would have a higher likelihood in other corpora but not in ours. For example, in the context of happy moments, the misspelling “achive” is more likely to be a typo for “achieve” than for “active”. Thus, we decided to develop a spell corrector that is tailored to the domain of HappyDB and only corrects typos about which we are highly confident. The details of our spell corrector are presented in appendix C.
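As an illustration only, the two detection steps of the cleaning process (dropping moments with fewer than two words and flagging moments that contain out-of-dictionary words) can be sketched as follows. The tokenizer, the toy dictionary, and the function names are assumptions made for this sketch; the actual dictionary is built from Norvig’s corpus and Wikipedia titles, and the spell corrector itself is described in appendix C.

```python
import re

# Toy dictionary standing in for the real one (an assumption for this sketch).
TOY_DICTIONARY = {"i", "had", "dinner", "with", "my", "daughter", "son", "happy"}


def tokenize(moment):
    """Lowercase a moment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", moment.lower())


def clean_moments(moments, dictionary):
    """Drop moments with fewer than two words and flag unknown words in the rest."""
    kept, flagged = [], []
    for moment in moments:
        words = tokenize(moment)
        if len(words) < 2:
            continue  # empty or single-word moment
        kept.append(moment)
        unknown = [w for w in words if w not in dictionary]
        if unknown:
            flagged.append((moment, unknown))  # candidates for spell correction
    return kept, flagged


moments = ["Happy", "I had dinner with my son", "I had dinner with my daugther", ""]
kept, flagged = clean_moments(moments, TOY_DICTIONARY)
```

In this sketch flagged moments are kept rather than removed, in line with the observation above that removing them could bias word frequencies.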
Some basic statistics: Table 1 shows some basic statistics of the original HappyDB. Figure 1 shows the word cloud for the cleaned HappyDB. The figure is mostly provided for anecdotal value and as a means to highlight the most frequent words in the corpus. As one proxy for the complexity of the sentences in HappyDB, we calculated the number of verbs in each sentence; the results are summarized in Figure 2. The data shows that 53% of the sentences have 3 verbs or more and 36% of the sentences have 4 verbs or more, suggesting that workers often expressed fairly complex thoughts in their moments.
Diversity of contents in HappyDB: An important question concerning the utility of HappyDB is whether it covers happy moments from a variety of topics. To get a feel for the level of diversity, we identified 9 rather diverse topics we saw occurring often in the corpus. The topics are “people”, “family” (a subset of “people”), “pets”, “work”, “food”, “exercise”, “shopping”, “school”, and “entertainment”. For each topic, we curated a list of keywords and regular expressions whose usage is almost exclusive to the topic. For example, the category “people” contains words describing family members, as well as other words that refer to people, like “hairdresser” or “neighbor”, but it does not include “he”, “she” or “they”, as these words are sometimes used in reference to pets or inanimate objects. Additionally, if these pronouns refer to a person they should also have an antecedent which our dictionary should recognize. “People” also does not contain the word “I”, since we are trying to capture interactions between people.
Table 2 shows the percentage of sentences in HappyDB found for each topic as well as the size of the list associated with each topic. Note that a happy moment may be related to multiple topics. For instance, “running with my son” is related to both “family” and “exercise”. All of the keywords lists are disjoint except for “people”: this is a superset of “family”, and also contains some words from other topics, for example “co-worker” which is also in “work”. We can observe that 80% of HappyDB pertains to these 9 topics. The remaining sentences that did not fit into any of these topics contain all sorts of topics, such as rare surprises (“finding a $100 dollar bill inside my pants pockets”) or situations that turned out to be better than expected (“There was almost no traffic today”). None of these other categories covered a large enough portion of our corpus to justify adding them to our dictionaries. However, this suggests that there is a long tail of topics in the corpus.
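The keyword-based topic assignment described above can be sketched roughly as follows. The patterns below are hypothetical and heavily abbreviated; the curated lists used for HappyDB are much larger and are not reproduced here.

```python
import re

# Hypothetical, abbreviated keyword patterns (assumptions for this sketch).
TOPIC_PATTERNS = {
    "family": r"\b(son|daughter|husband|wife|mother|father)s?\b",
    "exercise": r"\b(run|running|ran|gym|yoga|workout)\b",
    "food": r"\b(dinner|lunch|breakfast|pizza|ate)\b",
    "pets": r"\b(dog|cat|puppy|kitten)s?\b",
}
COMPILED = {t: re.compile(p, re.IGNORECASE) for t, p in TOPIC_PATTERNS.items()}


def topics_of(moment):
    """Return the sorted list of topics whose pattern matches the moment."""
    return sorted(t for t, rx in COMPILED.items() if rx.search(moment))


def coverage(moments):
    """Fraction of moments matched by at least one topic."""
    if not moments:
        return 0.0
    return sum(1 for m in moments if topics_of(m)) / len(moments)
```

With these toy patterns, topics_of("running with my son") yields both “exercise” and “family”, while a moment such as “There was almost no traffic today” matches no topic and falls into the long tail.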
|Topics||% of Sentences in Topic||Size of Keywords List|
Another perspective on the contents of HappyDB can be obtained by annotating the corpus with the popular semantic classes known as supersenses. Supersense tags are defined in WordNet [Fellbaum1998] as lexicographic classes and are categorized into 15 verb classes (e.g., stative, cognition, communication, social, motion, etc.) and 26 noun classes (e.g., person, artifact, cognition, food, etc.).
We trained a supersense tagger with the SemEval-2016 dataset [Schneider and Smith2015] using CRF [Okazaki2007]. The supersense-annotated HappyDB is also provided as part of HappyDB. Table 3 shows the proportion of sentences for the top seven supersense labels in HappyDB. It also displays the proportion of supersense labels for sentences in other textual corpora from the Manually Annotated Sub-Corpus dataset (MASC v3.0.0, http://www.anc.org/data/masc/downloads/data-download/). As shown, the proportions of several of the top five labels for HappyDB are significantly higher than in the other corpora, which implies that these labels are potential features for identifying happy moments. Examples of some supersense classes and their frequencies in HappyDB are shown in Table 4.
2.1 Emotions in happy moments
To analyze the cognitive and emotional state of happy moments, we applied the sentiment lexicon Linguistic Inquiry and Word Count (LIWC), “a transparent text analysis program that counts words in psychologically meaningful categories” [Pennebaker et al.2015b], to a sample of 500 happy moments; only a sample was chosen because of existing restrictions on the number of requests to the LIWC commercial API. Table 5 shows some of the LIWC categories in which the scores for the 500 happy moments vary notably from those of other corpora (expressive writing, blog posts, and novels [Pennebaker et al.2015a]). These categories are defined as follows: analytic refers to a measurement of the author’s logical thinking, as opposed to narrative and informal thinking; authentic approximates how honest and disclosing the writing is; and tone measures how positive or negative the text is. As expected, HappyDB has a higher score for tone than any of the other corpora analyzed. More interestingly, the analytic score for HappyDB is quite high and very close to that of Novels, yet the authentic score (also quite high) is closer to that for Expressive Writing. Our analysis of LIWC scores suggests that our corpus is very disclosing and honest, which makes it a well-suited corpus for studying emotions expressed in text.
An alternative approach to analyzing the emotions expressed in text is to use the Valence-Arousal-Dominance model (VAD) of emotion [Bradley and Lang1994, Warriner et al.2013], which provides a score for each lemmatized word on a scale of pleasure-displeasure (valence), excitement-calm (arousal), and control-inhibition (dominance). To evaluate our data across these dimensions, we used the Warriner et al. database of 13,915 manually rated English lemmas, as averaged over at least 18 ratings for all three VAD features [Warriner et al.2013]. This is currently the largest available lexicon of VAD scores, and the VAD ratings covered 45.84% of the lemmatized words in HappyDB. The words which were not covered were mostly pronouns, articles, conjunctions, numbers, and proper nouns. Some examples of the highest and lowest scoring words across each dimension are listed in Table 6. We calculated a VAD score for HappyDB by taking the mean over the VAD scores of words in the corpus. Interestingly, we observed that HappyDB’s VAD score is similar to the travel section of the Guardian corpus [Brett and Pinna2013] (V = 6.2, A = 4.0, D = 5.7) and rather different from other sections such as crime or banking. This shows that the VAD scores (which we release as part of HappyDB) can help us quantify how emotional the content of the corpus is.
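A minimal sketch of the lexicon-based scoring described above is given below for a single dimension (valence). The four ratings are copied from Table 6; everything else is an assumption for illustration: the function name, the example lemmas, and the choice to average over word occurrences while ignoring lemmas missing from the lexicon.

```python
# Valence ratings copied from Table 6; a real run would load the full
# Warriner et al. lexicon for all three dimensions.
TOY_VALENCE = {"murder": 1.48, "leukemia": 1.47, "excited": 8.11, "happiness": 8.48}


def mean_score(lemmas, lexicon):
    """Return (mean score over covered lemmas, fraction of lemmas covered)."""
    scores = [lexicon[w] for w in lemmas if w in lexicon]
    if not scores:
        return None, 0.0
    return sum(scores) / len(scores), len(scores) / len(lemmas)


# Hypothetical lemmatized moment; uncovered lemmas only affect the coverage.
lemmas = ["i", "be", "excited", "about", "happiness"]
mean_valence, covered = mean_score(lemmas, TOY_VALENCE)
```

The same function can be applied with an arousal or dominance lexicon; the coverage value plays the role of the 45.84% figure reported above.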
|Category||Low Scoring Words||High Scoring Words|
|Valence||murder (1.48)||excited (8.11)|
|leukemia (1.47)||happiness (8.48)|
|Arousal||librarian (1.75)||rampage (7.57)|
|calm (1.67)||lover (7.45)|
|Dominance||Alzheimer’s (2.00)||completion (7.73)|
|earthquake (2.14)||smile (7.72)|
The conclusion from the analyses provided so far in this section is that HappyDB is a diverse corpus with content that is emotionally rich and covers various topics (e.g., “work”, “leisure”, and “exercise”). Furthermore, while we used several techniques to extract general statistics about the content, diversity, and emotional content of HappyDB, there is clearly a need for deeper analysis of happy moments.
2.2 Comparing Reflection Periods
The analyses presented thus far, though rather rudimentary, already enable us to discover an important property of HappyDB, namely that there are important differences between the happy moments that reflect on the last 24 hours versus those that reflect 3 months back. In addition to being an important property of the corpus, these differences raise additional interesting research questions. We demonstrate these differences in two ways.
Pointwise Mutual Information Scores: For each reflection period we calculated pointwise mutual information (PMI) scores [Manning et al.2008] for words in the cleaned happy moments, and compared the top nouns in each batch. Table 7 shows the top 10 nouns with the highest PMI scores in the 24 hours batch w.r.t. the other batch and vice-versa. The results suggest that moments reported in the 24 hour period tend to be activities that occur daily (e.g., foods, bedtime) and moments reported in the 3 months period tend to reflect infrequent occurrences like holidays or life events.
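One way to compute such scores is sketched below. It assumes that PMI is estimated from token counts, with PMI(w, b) = log2(p(w, b) / (p(w) p(b))) for a word w and a batch b; the exact estimator, any frequency threshold, and the restriction to nouns used for Table 7 are not specified here, so the function and the toy data are illustrative only.

```python
import math
from collections import Counter


def pmi_scores(batch_a, batch_b):
    """PMI of each word with batch A, estimated from token counts in both batches."""
    count_a = Counter(w for moment in batch_a for w in moment)
    count_b = Counter(w for moment in batch_b for w in moment)
    total_a, total_b = sum(count_a.values()), sum(count_b.values())
    total = total_a + total_b
    p_batch = total_a / total  # probability that a token comes from batch A
    scores = {}
    for word, n_a in count_a.items():
        p_word = (n_a + count_b[word]) / total
        p_joint = n_a / total
        scores[word] = math.log2(p_joint / (p_word * p_batch))
    return scores


# Hypothetical tokenized moments for the two reflection periods.
day = [["ate", "dinner"], ["dinner", "family"]]
months = [["wedding", "family"], ["vacation", "family"]]
scores = pmi_scores(day, months)
```

Words that occur only in one batch receive the maximum score regardless of how rare they are, so in practice a minimum frequency cutoff is a common safeguard.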
Topic Mentions by Reflection Period: We analyzed the incidence of different topics separately for each reflection period. In Table 8 we observe different distributions of topics for each reflection period, mainly in the categories “food”, “school”, “people”, “family”, and “entertainment”. For instance, the categories “food” and “entertainment” have higher coverage in the 24 hours reflection period (19.2% and 9.6%) than in the 3 months reflection period (13.1% and 7.8%). Naturally, people are more likely to talk about a meal or a movie because these are frequent daily events that are more likely to be remembered if they occurred recently. When people are asked to reflect on the past 3 months, they tend to remember events that are more prominent, such as school, big achievements, and time spent with friends and family.
|Topics||% of sentences in topic (24 Hours Reflection)||% of sentences in topic (3 Months Reflection)|
3 Categorizing Happy Moments
So far we have gained an understanding of HappyDB through some analysis of annotations that are frequently considered in the literature. In this section, we take a first step towards a deeper analysis of happy moments by trying to classify them into categories. Categorization is important for several reasons. First, it forms the basis for visualizing one’s happy moments. Second, the techniques for analyzing happy moments may depend partially on the category they belong to. Finally, the category of a happy moment could trigger a conversation between an app and a user, and the course of the conversation clearly depends on the category being discussed. For instance, the app’s response to a happy moment about completing an exercise may be to congratulate the user, but the same response would be inappropriate if the user mentions that she is enjoying beautiful scenery.
|Achievement||With extra effort to achieve a better than expected result||Finish work. Complete marathon.|
|Affection||Meaningful interaction with family, loved ones and pets||Hug. Cuddle. Kiss.|
|Bonding||Meaningful interaction with friends and colleagues||Have meals w coworker. Meet with friends.|
|Enjoy the moment||Being aware or reflecting on present environment||Have a good time. Mesmerize.|
|Exercise||With intent to exercise or workout||Run. Bike. Do yoga. Lift weights.|
|Leisure||An activity done regularly in one’s free time for pleasure||Play games. Watch movie. Bake cookies.|
|Nature||In the open air, in nature||Garden. Beach. Sunset. Weather|
There is no consensus on a single set of categories for happy moments in positive psychology because they are often discussed under different names, with small variations and at different levels of granularity. We chose a set of categories inspired by research in positive psychology that also reflects the contents of HappyDB. These categories and a brief description of them are listed in Table 9. Note that affection refers to activity with family members and loved ones, while bonding refers to activities with other people in one’s life.
We developed a multi-class classifier using logistic regression with a bag-of-words representation of happy moments as features. To obtain training data, we crowdsourced category labels for a batch of happy moments. Every happy moment was shown to 5 workers, and we only considered labels that at least 3 workers agreed on. Table 10 shows the performance of our classifier using a 5-fold cross-validation setup. Clearly the classifier has room for further improvement on categories such as “Leisure” and “Enjoy the moment”, which shows that word distributions are not sufficient for this task. Building a classifier with sufficiently high precision and recall on all categories can be challenging in general, as it usually involves inferring some information or context that is not explicitly mentioned in the text.
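The label-aggregation rule described above (5 workers per moment, keep a label only if at least 3 workers agree) can be sketched as follows. The function name and the choice to return None for moments without sufficient agreement are assumptions of this sketch.

```python
from collections import Counter


def aggregate_label(worker_labels, min_agreement=3):
    """Return the label chosen by at least `min_agreement` workers, or None."""
    if not worker_labels:
        return None
    label, votes = Counter(worker_labels).most_common(1)[0]
    return label if votes >= min_agreement else None
```

With 5 workers and a threshold of 3, at most one label can reach the threshold, so the result is unambiguous; moments that return None would simply be left out of the training data.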
We publish our crowd-sourced labels as part of HappyDB to provide a ground-truth for researchers interested in topic mining and clustering of short utterances. We also released our predicted results on the entire corpus as a baseline.
|Category||Precision||Recall||F1||% of moments (24 Hrs)||% of moments (3 Months)|
|Enjoy the moment||59.2||49.9||54.0||13.3||8.9|
Table 10 also shows the percentage of moments classified into each category for both reflection periods, which further highlights the differences between reflection periods described in Section 2. Notice that HappyDB has roughly the same number of moments for each reflection period. Thus, the percentage of moments classified in each category in HappyDB can be computed by taking the average of the last two columns in Table 10. The higher frequency of moments in “Exercise”, “Nature”, and “Leisure” under the 24 hours reflection period confirms our theory that daily tasks are sources of short-term happiness. Longer-term happiness is more likely to come from loved ones or achievements.
4 Related work
To the best of our knowledge, HappyDB is the first crowdsourced corpus of happy moments that can be used for understanding the language people use to describe happy events. There has been recent interest in creating datasets in the area of mental health. Althoff et al. [Althoff et al.2016] conducted a large-scale analysis of counseling conversation logs collected from short message services (SMS) to study mental illness. They studied how various linguistic aspects of conversations are correlated with conversation outcomes. Mihalcea and Liu [Mihalcea and Liu2006] performed text analysis on blog posts from LiveJournal (where posts can be assigned happy/sad tags by their authors). Lin et al. [Lin et al.2016] measured stress from short texts by identifying the stressors and the stress levels. They classified tweets into 12 stress categories defined by the stress scale in [Holmes and Rahe1967].
Last year alone, there were multiple research efforts that obtained datasets via crowdsourcing and applied natural language techniques to understand different corpora. For example, SQuAD [Rajpurkar et al.2016] created a large-scale dataset for question-answering. The crowdsourced workers were asked to create questions based on a paragraph obtained from Wikipedia. They employed MTurk workers with strong experience and a high approval rating to ensure the quality of the dataset. We did not select the workers based on their qualification for HappyDB as our task is cognitively easier than SQuAD’s and we want to avoid bias in our corpus. HappyDB is similar to SQuAD in terms of the scale of the crowdsourced dataset. However, unlike SQuAD, which was designed specifically for studying the question answering problem, the problems that HappyDB can be used for are more open-ended.
5 Conclusion
We have published HappyDB, a broad corpus of happy moments expressed in diverse linguistic styles. We have also derived a cleaned version of HappyDB, added annotations, and presented our analysis of HappyDB based on these annotations. We made our dataset and most of our annotations publicly available to encourage further research in the science of happiness and well-being in general. We believe that HappyDB can spur research on understanding happy moments and, more generally, the expression of emotions in text. The results of this research can translate to applications that can improve people’s lives.
6 Bibliographical References
- [Althoff et al.2016] Althoff, T., Clark, K., and Leskovec, J. (2016). Large-scale Analysis of Counseling Conversations - An Application of Natural Language Processing to Mental Health. TACL.
- [Bliss2017] Bliss. (2017). Gratitude journal - bliss. http://bliss31.com/.
- [Bradley and Lang1994] Bradley, M. M. and Lang, P. J. (1994). Measuring emotion: the self-assessment manikin and the semantic differential. Journal of behavior therapy and experimental psychiatry, 25(1):49–59.
- [Brett and Pinna2013] Brett, D. and Pinna, A. (2013). The Distribution of Affective Words in a Corpus of Newspaper Articles. Procedia: social and behavioral sciences, 95:621–629.
- [DayOne2017] DayOne. (2017). Day one - a simple and elegant journal for iphone. http://dayoneapp.com/.
- [Diener et al.1999] Diener, Suh, Lucas, and Smith. (1999). Subjective well-being: Three decades of progress. Psychological Bulletin, pages 276–302.
- [Fellbaum1998] Fellbaum, C. (1998). WordNet. Wiley Online Library.
- [Fredrickson2009] Fredrickson, B. (2009). Positivity: Top-Notch Research Reveals the Upward Spiral That Will Change Your Life. Harmony.
- [Happier2017] Happier. (2017). Happier: Gratitude journal, meditation, and celebrating the good around you. https://www.happier.com/.
- [Happify2017] Happify. (2017). Happify: Science-based happiness games & activities. www.happify.com/.
- [Holmes and Rahe1967] Holmes, T. H. and Rahe, R. H. (1967). The social readjustment rating scale. Journal of psychosomatic research, 11.
- [Killingsworth2017] Killingsworth, M. (2017). Track your happiness. https://www.trackyourhappiness.org.
- [Lin et al.2016] Lin, H., Jia, J., Nie, L., Shen, G., and Chua, T.-S. (2016). What Does Social Media Say about Your Stress?. IJCAI ’16.
- [Liu2012] Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1):1–167.
- [Lyubomirsky2008] Lyubomirsky, S. (2008). The How of Happiness: A New Approach to Getting the Life You Want. Penguin Books; Reprint edition (December 30, 2008).
- [Manning et al.2008] Manning, C. D., Raghavan, P., Schütze, H., et al. (2008). Introduction to information retrieval. Cambridge university press Cambridge.
- [Mihalcea and Liu2006] Mihalcea, R. and Liu, H. (2006). A corpus-based approach to finding happiness. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, pages 139–144. AAAI.
- [Mojo2017] Mojo. (2017). Mojo: My gratitude journal. http://welovemojo.com/.
- [Murthy2016] Murthy, V. (2016). http://www.huffingtonpost.com/entry/surgeon-general-happiness-vivek-murthy_us_564f857ee4b0d4093a57c8b0.
- [Norvig2007] Norvig, P. (2007). How to write a spelling corrector. http://norvig.com/spell-correct.html.
- [Okazaki2007] Okazaki, N. (2007). Crfsuite: a fast implementation of conditional random fields (crfs). http://www.chokkan.org/software/crfsuite/.
- [Pang and Lee2008] Pang, B. and Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135.
- [Pennebaker et al.2015a] Pennebaker, J. W., Boyd, R. L., Jordan, K., and Blackburn, K. (2015a). The development and psychometric properties of LIWC2015. Technical report.
- [Pennebaker et al.2015b] Pennebaker, J., Booth, R., Boyd, R., and Francis, M. (2015b). Linguistic inquiry and word count: LIWC2015. Austin, TX: Pennebaker Conglomerates.
- [Rajpurkar et al.2016] Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. In EMNLP, pages 2383–2392.
- [Schneider and Smith2015] Schneider, N. and Smith, N. A. (2015). A Corpus and Model Integrating Multiword Expressions and Supersenses. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1537–1547.
- [Seligman et al.2005] Seligman, M. E., Steen, T. A., Park, N., and Peterson, C. (2005). Positive psychology progress: empirical validation of interventions. American psychologist, 60(5):410.
- [Seligman2011] Seligman, M. E. P. (2011). Flourish: A New Understanding of Happiness, Well-being - and how to Achieve Them. Nicholas Brealey.
- [Tversky and Kahneman1981] Tversky, A. and Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211(4481):453–458.
- [Warriner et al.2013] Warriner, A. B., Kuperman, V., and Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 english lemmas. Behavior research methods, 45(4):1191–1207.
Appendix A Demographic information of HappyDB crowdsourcing workers
The following tables represent the demographic distributions of the crowdsourcing workers who contributed to our HappyDB dataset.
Appendix B Data collection by crowdsourcing
We investigated several parameters before collecting our corpus. First, we investigated whether differences in our instructions to the workers will influence the happy moments collected. Second, we experimented with different windows of reflection (i.e., how far in the past did the happy moment occur). We did this to understand how the time period influences the content of happy moments. By analyzing the outcomes on batches of 300 moments that were collected by systematically varying these parameters, we embarked on a large-scale collection of happy moments. We also experimented with two platforms: Mechanical Turk and Crowd Flower. We did not notice a significant difference in the quality or content of the moments so we used Mechanical Turk.
B.1 Instructions for workers
Following [Seligman et al.2005], who developed a questionnaire that asked for 3 good things that went well each day, we asked our MTurk workers for 3 happy moments that happened to them in the past 24 hours. To minimize the bias introduced through our instructions, we carefully analyzed the effect that our examples of happy moments had on the way crowdsourcing workers reported their moments. In our first batch of crowdsourcing, we gave concise instructions and positive examples of what we believed were legitimate happy moments (see Figure 3).
Upon collecting our first batch of 300 happy moments, we tabulated the 20 most frequent words (nouns, verbs, and adjectives) that occur in the happy moments. Surprisingly, the words used in the positive examples often appear among these 20 words (see the first two columns of the top of Table 16). For example, the bold words “morning”, “enjoy”, “woke”, “son”, and “great” (excluding “went”), which are words we used in our positive examples, rank highly among the top 20 words. Consequently, we experimented with an instruction set without the positive examples and collected another 300 happy moments. The top of Table 16 also shows the 20 most frequent words for this batch. When we compare the moments collected under the instructions with and without positive examples, it is evident that the bold words no longer rank highly.
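The tabulation described above can be illustrated with a minimal sketch. It is not the authors' code: the stopword list, the function names `top_words` and `overlap_with_examples`, and the sample moments are all hypothetical, and the sketch filters by stopwords only, whereas the analysis above kept nouns, verbs, and adjectives (which would require a part-of-speech tagger).

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller
# list and a part-of-speech tagger to keep only nouns, verbs, and adjectives.
STOPWORDS = {"i", "my", "a", "an", "the", "to", "and", "was", "with",
             "of", "in", "for", "me", "this", "at", "up", "on"}

def top_words(moments, k=20):
    """Return the k most frequent non-stopword tokens across all moments."""
    counts = Counter()
    for moment in moments:
        tokens = re.findall(r"[a-z']+", moment.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)

def overlap_with_examples(ranked, example_words):
    """List the top-ranked words that also appear in the instruction examples."""
    return [word for word, _ in ranked if word in example_words]

# Made-up moments, for illustration only (not taken from HappyDB).
moments = [
    "I woke up early this morning and went for a run.",
    "My son made me breakfast this morning.",
    "I went to dinner with my son.",
]
ranked = top_words(moments, k=5)
shared = overlap_with_examples(ranked, {"morning", "enjoy", "woke", "son", "great"})
```

Running the same two functions on each batch and comparing the overlap lists is one simple way to see whether the example words dominate the ranking.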
The bottom of Table 16 shows a quantitative representation of the framing effect [Tversky and Kahneman1981] of the instruction set. Here, we consider the set of all words (no duplicates and no stopwords) obtained from the 5 positive examples in the instruction set. We counted the number of times words in this set occur in the happy moments obtained under each of the two instruction sets. In addition, we counted the number of times words outside this set occur in the respective batches of happy moments.
[Table 16 (bottom): number of words inside and outside the positive-example word set, for each batch.]
A $\chi^2$ analysis [Manning et al.2008] shows that the presence of the 5 positive examples in the instructions does affect the word usage of workers. Specifically, our null hypothesis is that the word usage of the happy moments is independent of whether the instruction set contained positive examples or not. The $\chi^2$ test rejected the null hypothesis. Hence, we conclude that MTurk workers were influenced by the positive examples in our instructions when reporting their happy moments, and we decided against including positive example sentences in the instructions for collecting happy moments. It is interesting to note that the bottom of Table 16 also shows that the vocabulary of happy moments is noticeably smaller under one instruction set than under the other: 3,328 total words used in one batch versus 3,578 in the other.
From this analysis, we concluded that we should avoid using positive examples in our instructions. We also experimented with instructions that do not include negative examples. Apart from some reduction in the number of low-quality happy moments when negative examples were present, we did not detect significant differences between happy moments collected from instructions with or without negative examples. Hence, we included negative examples in our instructions for the workers.
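The independence test used in this analysis can be sketched as a Pearson $\chi^2$ test on a 2x2 contingency table (instruction set by word category). The sketch below is only an illustration: the function name `chi_square_2x2`, the 0.05 threshold, and the counts are assumptions, and the counts are invented rather than taken from Table 16.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    table[i][j] is the observed count for instruction set i (row) and
    word category j (column: inside / outside the example word set).
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of independence.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts, NOT the values from Table 16.
observed = [
    [400, 2600],  # with positive examples: words inside / outside the example set
    [250, 2750],  # without positive examples
]
stat, p = chi_square_2x2(observed)
reject_null = p < 0.05  # assumed significance threshold
```

A small p-value here means that the share of example words differs between the two batches by more than chance would explain. SciPy's `scipy.stats.chi2_contingency` performs the same test, although for 2x2 tables it applies a continuity correction by default, so its statistic would differ slightly from this uncorrected one.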
Appendix C A Spell-Corrector for HappyDB
Here, we discuss the details of the spell-correction algorithm that we created for HappyDB. Our main goal is to fix as many typos as possible while introducing as few errors as possible. To this end, we focus on a small set of corrections: typos that are within an edit distance of 1 of a valid word (i.e., one deletion, insertion, transposition, or replacement of a letter or a space). Because transpositions are counted as single edits, this is the Damerau-Levenshtein distance rather than the plain Levenshtein distance.
The spell-correcting algorithm starts by finding the set of words within edit distance 1 of a typo and computes a confidence score for each candidate word, defined in terms of the word's frequency. If a candidate consists of two words (which occurs with replacement or insertion of a space character), then its frequency is taken to be the lower frequency of the two words. We calculate these frequencies using a corpus that combines Norvig’s corpus and the portion of HappyDB that has no spelling errors. We observed that the resulting corrector was biased toward splitting words into two; for example, “outtdoors” was replaced with “out doors” instead of “outdoors”. This is because shorter words occur more frequently. For the same reason, words were often replaced by an incorrect but shorter alternative, for example “helpd” being replaced with “help” (instead of “helped”). To address this problem, we refined the confidence score by adding two parameters: one to discount the confidence of replacements with an inserted space, and one to increase the confidence of longer words. The updated confidence score combines the logarithmic frequency of the candidate with these two terms, where the space discount is applied through a simple indicator function that returns 1 if the candidate consists of two words and 0 otherwise.
Note that we use the logarithmic frequency in our definition. The hypothesis is that shorter words occur exponentially more often than long words on average, in which case computing confidence as a function of the logarithmic frequency would yield better results. We also observed this effect in practice. The last step is to tune the two parameters. To do so, we took a random sample of 100 spelling errors from our data and corrected them manually. We then performed a grid search over possible values for the two parameters and selected conservative values for both. With these values, in a random sample of happy moments, all 74 detected typos were either corrected appropriately or left as is.
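A minimal sketch of such a corrector follows, in the style of Norvig's well-known spelling corrector. Several pieces are assumptions rather than the authors' definitions: the additive form of the score (log frequency, minus `alpha` for a two-word candidate, plus `beta` times the candidate length), the parameter names `alpha` and `beta`, the toy frequency table `TOY_FREQ`, and the grid values. The sketch only shows how a space discount and a length bonus can counter the bias toward short words.

```python
import math

LETTERS = "abcdefghijklmnopqrstuvwxyz "  # the space lets a candidate be two words

def edits1(word):
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def confidence(candidate, freq, alpha, beta):
    """ASSUMED additive score: log frequency - alpha * [two words] + beta * length.

    Returns None when any part of the candidate is not a known word.
    """
    parts = candidate.split(" ")
    if len(parts) > 2 or any(p not in freq for p in parts):
        return None
    lowest = min(freq[p] for p in parts)  # lower frequency of the two words
    has_space = 1 if len(parts) == 2 else 0  # indicator for a two-word candidate
    length = len(candidate) - has_space  # letters only, ignoring the space
    return math.log(lowest) - alpha * has_space + beta * length

def correct(typo, freq, alpha=0.0, beta=0.0):
    """Return the highest-confidence candidate, or the typo itself if none exists."""
    scored = []
    for cand in edits1(typo):
        score = confidence(cand, freq, alpha, beta)
        if score is not None:
            scored.append((score, cand))
    return max(scored)[1] if scored else typo

def grid_search(labeled, freq, alphas, betas):
    """Pick the (alpha, beta) pair that corrects the most labeled typos."""
    def n_correct(a, b):
        return sum(correct(t, freq, a, b) == gold for t, gold in labeled)
    grid = [(a, b) for a in alphas for b in betas]
    return max(grid, key=lambda ab: n_correct(*ab))

# Hypothetical frequencies and labels, for illustration only.
TOY_FREQ = {"out": 5000, "doors": 800, "outdoors": 300, "help": 4000, "helped": 900}
labeled = [("outtdoors", "outdoors"), ("helpd", "helped")]
best = grid_search(labeled, TOY_FREQ, alphas=[0, 1, 2], betas=[0, 0.5, 1.0])
```

With both parameters at zero, this toy setup reproduces the two failure modes described above (“out doors” and “help”), while the pair chosen by the grid search fixes both labeled typos.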
We applied our spell-corrector to the entire corpus, which automatically replaced words. The remaining typos were corrected by workers on CrowdFlower, as well as internal workers in our lab. Each word was evaluated by two judges, and if they agreed, the result was automatically applied. There were fewer than 500 words where the judges did not agree. In these cases, we defaulted to the worker with the higher confidence rating (our internal lab workers). In the remaining cases where the confidence ratings were equal, we left the word alone if either judge was unsure, and otherwise chose a suggestion at random (in the majority of these cases, the answers varied in usage of space, punctuation, or capitalization, and not in content). In total, happy moments were modified with this method.
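The adjudication rule for the two judges can be sketched as a small function. The data layout is hypothetical: each judge is a dict with a `suggestion` (or `None` when the judge is unsure) and a numeric `confidence` rating. The handling of an unsure higher-rated judge is also an assumption, since the text above does not cover that case.

```python
import random

def adjudicate(word, a, b, rng=random):
    """Resolve two judges' answers for one flagged word.

    Each judge is a dict with hypothetical keys:
      'suggestion': the proposed correction, or None when the judge is unsure
      'confidence': numeric rating of the worker
    """
    if a["suggestion"] == b["suggestion"]:
        # Agreement: apply the shared answer (keep the word if both are unsure).
        return a["suggestion"] if a["suggestion"] is not None else word
    if a["confidence"] != b["confidence"]:
        # Disagreement: defer to the judge with the higher confidence rating.
        trusted = a if a["confidence"] > b["confidence"] else b
        # ASSUMPTION: if the trusted judge is unsure, the word is left alone.
        return trusted["suggestion"] if trusted["suggestion"] is not None else word
    if a["suggestion"] is None or b["suggestion"] is None:
        # Equal ratings and one judge unsure: leave the word alone.
        return word
    # Equal ratings and two different suggestions: choose one at random.
    return rng.choice([a["suggestion"], b["suggestion"]])

# Made-up example: the lab worker (rating 2) overrides the crowd worker (rating 1).
result = adjudicate(
    "alot",
    {"suggestion": "a lot", "confidence": 2},
    {"suggestion": "allot", "confidence": 1},
)
```

Passing a seeded `random.Random` instance as `rng` makes the random tie-break reproducible.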