Leveraging Medical Sentiment to Understand Patients' Health on Social Media

by Shweta Yadav, et al.

The unprecedented growth of Internet users in recent years has resulted in an abundance of unstructured information in the form of social media text. A large percentage of this population is actively engaged in health social networks to share health-related information. In this paper, we address an important and timely topic by analyzing users' sentiments and emotions with respect to their medical conditions. Towards this, we examine users on popular medical forums (Patient.info, dailystrength.org), where they post on important topics such as asthma, allergy, depression, and anxiety. First, we provide a benchmark setup for the task by crawling the data, and further define the sentiment-specific fine-grained medical conditions (Recovered, Exist, Deteriorate, and Other). We propose an effective architecture that uses a Convolutional Neural Network (CNN) as a data-driven feature extractor and a Support Vector Machine (SVM) as a classifier. We further develop a sentiment feature that is sensitive to the medical context. Here, we show that the use of the medical sentiment feature along with features extracted by the CNN improves the model performance. In addition to our dataset, we also evaluate our approach on the benchmark "CLEF eHealth 2014" corpora and show that our model outperforms the state-of-the-art techniques.




1 Introduction

The phenomenal rise in blogging is coupled with the increasing popularity of medical forums for sharing medical problems or experiences and for seeking health-related information or the opinions of other users (i.e., patients or health professionals). According to a recent study conducted by the Pew Internet & American Life Project (http://www.pewinternet.org/), almost 80% of Internet users in the US have explored health-related topics online. Moreover, 63% of people look for information about specific medical problems, and nearly 47% of users search for medical treatments or procedures over the Internet.
These self-narrated journals contain a diverse variety of information, including user-specific concerns, triggers, reactions, or merely status updates on users' emotional states. With the ever-increasing size of these blogs, a majority of the posts are left unused or unanswered. Considering this fact, it would be helpful to have a sentiment analyzer that could study the sentiment a user associates with a post about his/her health status. Moreover, extracting sentiment and/or opinions from medical text can be crucial for assessing a patient's health and for helping health professionals with an automated decision-support system. In this paper, we attempt to capture medical sentiment (MS) from unstructured text by analyzing the subjective expressions describing a patient's medical conditions, in order to prioritize the individual posts that require immediate attention.

The existing research in sentiment analysis is primarily focused on detecting users' external sentiment [Shickel et al.2016] towards entities such as products, organizations, or events. In contrast, when working with self-narrative medical journals (blogs), the focus is on gauging users' internal sentiments towards their own emotions, feelings, and thoughts. The differences in objectives, sentiment types, and distribution of polarity introduce unique challenges for applying traditional sentiment analysis to this new domain.
One key aspect that differentiates our problem from traditional sentiment analysis is the way we define the polarity classes. Traditionally, sentiment is classified as positive, negative, or neutral. In contrast, the notion of sentiment in the medical context is more granular and must be studied in terms of the various aspects [Denecke and Deng2015] that can directly impact a user's health condition, as studied by [Yadav et al.2018b], such as:

  • Changes in the medical condition (for example, sentiment can be observed as a change in a patient's medical condition, which can improve or worsen over time.)

  • Severity of the medical condition as it impacts the patient's life (for example, a severe headache impacts the patient's life more than a mild headache.)

  • Outcome of a treatment (for example, a treatment may have a positive or negative impact on the patient.)

According to our analysis, nearly 95% of the user posts on medical forums carry negative sentiment towards the writers' medical conditions. Negative sentiment alone is not informative enough for health professionals to make any clinical decision. Considering this fact, we further divide the negative sentiment into the classes 'Exist' and 'Deteriorate', which explicitly convey whether a user is experiencing a disorder or whether an existing condition has worsened over time. We also analyze the positive sentiment of a user and label it 'Recovered', conveying recovery from a disorder.
Although several techniques exist to capture sentiment in general domains, the sentiments expressed in medical narratives have not yet been analyzed and exploited in adequate measure. Research in medical sentiment analysis is primarily focused on biomedical literature and Electronic Medical Record (EMR) documents. Recently, preliminary studies [Shickel et al.2016, Yang et al.2016] have been conducted to understand sentiment in the medical setting. Several shared tasks [Losada et al.2017, Hollingshead et al.2017] have also been organized to study patients' health-related opinions on social media. However, a majority of these studies focus on mental health disorders, and their classification schemes are framed to understand the depressive behavior of a user via PHQ-9 levels [Kroenke et al.2001] (a method for monitoring the severity of depression).

Problem Statement: For a given medical blog post B consisting of sentences s_1, s_2, …, s_n, the task is to predict the medical sentiment label l from the discrete set of medical conditions L = {Recovered, Exist, Deteriorate, Other}.
We started with the Convolutional Neural Network (CNN) [LeCun et al.1995], given its recent success in several Natural Language Processing (NLP) tasks [Kim2014, Collobert et al.2011, Kalchbrenner et al.2014, Mikolov et al.2013, Ekbal et al.2016]. Many applications use CNNs for their automatic feature extraction capability. However, as we explain in Section 4, our analysis of the data provides an interesting insight: while CNN-extracted features (making use of word embeddings) capture semantic information, the incorporation of external features can further assist in precisely capturing the subjectivity (sentiment, emotion) associated with medical concepts. Correspondingly, we devise Medical Sentiment-CNN (MS-CNN), which advances the conventional CNN by embedding various medical sentiment features. Specifically, we use a Support Vector Machine (SVM) [Cortes and Vapnik1995] as a strong classifier instead of the softmax (logistic regression) classifier at the output layer of the CNN. This is informed by the observation that when the data is non-linearly separable, an SVM with a non-linear kernel outperforms logistic regression (LR) [Pochet and Suykens2006].
Contributions: (i) a description of the medical sentiment analysis task based on mining medical blogs using crowd intelligence; (ii) an annotated medical sentiment corpus released to the research community; and (iii) a method for online, rapid assessment of medical sentiment that fuses CNN-generated features with sentiment-sensitive medical features learned by an SVM to significantly improve accuracy.

Our evaluations show that, compared to the baseline systems (SVM and CNN), our proposed approach yields 15.27% and 4.17% improvements in F-score on the curated blog dataset. Beyond presenting a solution to the practically useful challenge of identifying medical sentiment in a clinical context, we also significantly improve upon the highly successful classification techniques (CNN, SVM). We further evaluate our proposed approach on the "2014 ShARe/CLEF task-2a (attribute normalization)" dataset in order to show the generic nature of our algorithm. On this task, our system achieves significantly high precision (P), recall (R), and accuracy (A) for both the 'Severity Class' (P: 0.9828, R: 0.9832, A: 0.9832) and 'Course Class' (P: 0.9833, R: 0.9837, A: 0.9837) attributes.

Medical Blog | Label
"Hi been on Sertaline now for abut 4 weeks. My mood has definitely improved and I am alot calmer." | Recovered
"This morning I had an attack of it that was very intense. I felt an incredible surge of unsteadiness." | Exist
"Had anxiety for few months on citalopram, propanolol. Nothing seems to help been in bed for two days can't sleep." | Deteriorate
"How do you calculate FEV1% from a FEV1 result with age sex and height know." | Other
Table 1: Exemplar description of the benchmark annotation scheme

2 Annotation Scheme

In this section, we first define the benchmark setup by studying the sentiments expressed in medical blog posts. Based on the medical sentiment, we classify the medical blog posts into the following four categories:
I. Recovered: indicates that the user has recovered from a health problem and is expressing positive medical sentiment.
II. Exist: indicates that the user is experiencing a health problem and expressing negative medical sentiment, but has not mentioned any medication.
III. Deteriorate: indicates that the user's medical condition has deteriorated over time and he/she is expressing negative medical sentiment towards the medication.
IV. Other: indicates that the user is discussing general topics and not expressing any medical sentiment; there is no mention of any medical symptom or medication.
We provide an exemplary description of the annotation scheme in Table-1.

3 Proposed Model: Network for Identifying Medical Sentiment

In this section, we propose a CNN-based method that exploits users' medical sentiments from health forums in an augmentation layer. As presented in Figure-1, the proposed model has six components, where the first four layers are similar to the conventional CNN components proposed by [Kim2014]. The system takes a complete blog post as input and outputs probabilities corresponding to the four classes. We use max-pooling over the whole blog post to obtain global features through all the filters. This pooled feature is fed into the augmentation layer, instead of a fully connected neural network, which concatenates the pooled feature produced by the CNN with the sentiment-sensitive medical features. In the output layer, we use an SVM instead of the softmax classifier to automatically classify the post into the four classes. We describe the layers of our proposed model in detail below:

Figure 1: Proposed MS-CNN architecture for predicting the medical condition from the medical blog.

I. Input layer: Each blog post is provided as input to the model.
II. Word embedding layer: This layer encodes every word into a real-valued vector. Given a blog text consisting of n words w_1, w_2, …, w_n, each word w_i is transformed into a real-valued vector x_i ∈ R^d. Each word is looked up in a word embedding matrix W ∈ R^{|V| × d}, where |V| is the size of a fixed-length vocabulary and d is the word embedding size. The blog-post representation matrix can then be written as:

X = x_1 ⊕ x_2 ⊕ … ⊕ x_n

where ⊕ represents the concatenation operation. We perform zero padding to fix the length in case the number of words in a blog text is less than n.
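The embedding lookup and zero padding described above can be sketched in NumPy. The toy vocabulary and random 4-dimensional vectors below are purely illustrative (the experiments in this paper use 300-dimensional embeddings):

```python
import numpy as np

def embed_post(words, embedding, word2id, max_len, dim=4):
    """Look up each word's vector in the embedding matrix and
    zero-pad the post representation up to max_len words."""
    X = np.zeros((max_len, dim))
    for i, w in enumerate(words[:max_len]):
        if w in word2id:
            X[i] = embedding[word2id[w]]
    return X

# Toy vocabulary and a random embedding matrix W of shape (|V|, d).
rng = np.random.default_rng(0)
vocab = ["i", "feel", "much", "better", "today"]
word2id = {w: i for i, w in enumerate(vocab)}
W = rng.normal(size=(len(vocab), 4))

X = embed_post(["i", "feel", "better"], W, word2id, max_len=6)
print(X.shape)  # (6, 4); rows 3-5 are zero padding
```

Out-of-vocabulary words simply stay as zero rows here; a production system would map them to a learned unknown-word vector instead.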
III. Convolution layer: The word embedding matrix is fed as input to the convolution layer, where a filter F ∈ R^{h × d} is convolved over each context window of h words in the blog post:

f_i = g(F ⋅ x_{i:i+h-1} + b)

where g is a non-linear function (in our experiments, the rectified linear unit) and b is a bias term. The feature map is generated by applying a given filter F to every possible window of words in the blog post:

f = [g(F ⋅ x_{1:h} + b), g(F ⋅ x_{2:h+1} + b), …, g(F ⋅ x_{n-h+1:n} + b)] = [f_1, f_2, …, f_{n-h+1}]

To increase the coverage of the n-gram model, multiple filters with different window sizes can be applied.
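The convolution over word windows can be illustrated with a minimal NumPy sketch; the random post matrix and single filter are illustrative stand-ins for learned parameters:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def convolve(X, F, b, h):
    """Slide a filter F (h x d) over every window of h consecutive word
    vectors, producing the feature map f = [f_1, ..., f_{n-h+1}]."""
    n = X.shape[0]
    return np.array([relu(np.sum(F * X[i:i + h]) + b)
                     for i in range(n - h + 1)])

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 4))   # a post of n = 10 words, d = 4
F = rng.normal(size=(3, 4))    # one filter of window size h = 3
f = convolve(X, F, b=0.1, h=3)
print(f.shape)  # (8,) i.e. n - h + 1 feature values
```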

IV. Pooling layer: The function of the pooling layer is to gradually reduce the spatial size of the representation by identifying the most abstract feature generated by the convolution layer. It involves non-linear downsampling to extract the most relevant set of features. In our work, we apply a max-pooling operation over the feature map and take the maximum value as the feature for a particular filter:

f̂ = max{f_1, f_2, …, f_{n-h+1}}
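Max-over-time pooling keeps one scalar per filter regardless of post length, which is how variable-length posts become fixed-size vectors. A small sketch with three hypothetical filters of different window sizes:

```python
import numpy as np

# Feature maps from three hypothetical filters with window sizes 3, 4, 5
# applied to a 10-word post; each map has length n - h + 1.
rng = np.random.default_rng(2)
feature_maps = [rng.normal(size=(10 - h + 1,)) for h in (3, 4, 5)]

# Max-over-time pooling: one pooled feature per filter.
pooled = np.array([fm.max() for fm in feature_maps])
print(pooled.shape)  # (3,)
```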
V. Feature layer: In this layer, we generate sentiment features that are specific to the medical context for each blog post, as described below:
(1). Sentiment word feature (SWF): Sentiment clue words provide important evidence for deciding the emotions of users. Moreover, we observe that adding negation to a sentiment word can change its polarity. For example, there is positive emotion in "I'm stable", but after including negation, as in "I'm not stable", the polarity changes. Briefly, there are two types of sentiment events by which we can capture the sentiments of users: occurrences of sentiment words (SW) and occurrences of sentiment words with negation (NSW). This feature calculates the positive (p), negative (n), and objective (o) score for each word by capturing the sentiment event [Dang and Shirai2009]. The publicly available SentiWordNet (SWN) (http://sentiwordnet.isti.cnr.it/) is used to obtain the positive, negative, and objective score of each word, weighted by tf and idf, the term and inverse document frequencies, respectively. The sentiment word features of a blog with T words are then obtained by aggregating the per-word positive, negative, and objective scores over the whole post.
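A minimal sketch of the sentiment-word feature with negation handling is given below. The mini-lexicon stands in for SentiWordNet and its scores are invented for illustration; the tf-idf weighting of the full feature is omitted here:

```python
# Hypothetical mini-lexicon standing in for SentiWordNet: each entry is
# (positive, negative, objective). Real scores come from the SWN resource.
LEXICON = {
    "stable":   (0.6, 0.1, 0.3),
    "pain":     (0.0, 0.8, 0.2),
    "improved": (0.7, 0.0, 0.3),
}
NEGATIONS = {"not", "no", "never"}

def swf(tokens):
    """Sum positive/negative/objective scores over a post, swapping the
    positive and negative score of a word preceded by a negation
    (e.g. 'not stable')."""
    pos = neg = obj = 0.0
    negated = False
    for t in tokens:
        if t in NEGATIONS:
            negated = True
            continue
        if t in LEXICON:
            p, n, o = LEXICON[t]
            if negated:
                p, n = n, p
            pos, neg, obj = pos + p, neg + n, obj + o
        negated = False
    return pos, neg, obj

print(swf("i am stable".split()))      # positive score dominates
print(swf("i am not stable".split()))  # polarity flipped by negation
```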

(2). Medical sentiment context feature (MCF): After analyzing the data, we observe that a substantial fraction of the posts express sentiment in the context of a certain stative verb such as 'feel', 'suffer', or 'experience'. We design this contextual feature by considering a context window of [-k, +k] words around the most effective stative verb. Thereafter, the negative and positive densities of a post are calculated as the frequency of polarity clue words relative to the number of words in the context window (i.e., 2k+1 in this case). If a post contains more than one instance of a term such as 'feel', we calculate the score for each instance individually and take the maximum. The aggregate positive and negative scores of a blog post are obtained by normalizing over T, the number of sentiment words in the post.
Finally, the feature f̂ obtained from the pooling layer and these sentiment-specific features are combined into an augmented feature vector:

z = f̂ ⊕ s

where f̂ is the length-m feature representation obtained from the pooling layer, s is the sentiment feature vector, and ⊕ is the concatenation operator.
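The context-window density computation around stative verbs can be sketched as follows; the clue-word sets are illustrative placeholders, not the lexicon actually used in our experiments:

```python
STATIVE = {"feel", "suffer", "experience"}
NEG_CLUES = {"anxious", "pain", "tired"}
POS_CLUES = {"calm", "better", "relaxed"}

def mcf(tokens, k=3):
    """For every stative verb, count polarity clue words inside a
    [-k, +k] window, divide by the window width (2k+1), and keep the
    maximum density over all occurrences."""
    best_pos = best_neg = 0.0
    width = 2 * k + 1
    for j, t in enumerate(tokens):
        if t in STATIVE:
            window = tokens[max(0, j - k): j + k + 1]
            best_pos = max(best_pos,
                           sum(w in POS_CLUES for w in window) / width)
            best_neg = max(best_neg,
                           sum(w in NEG_CLUES for w in window) / width)
    return best_pos, best_neg

print(mcf("i feel anxious and tired today".split()))
```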
VI. Output layer: The blog-level feature vector z is finally passed to an SVM to perform classification. In this layer, we use an SVM instead of the traditional softmax classifier to predict the label l from the discrete set of classes L = {Recovered, Exist, Deteriorate, Other} for the corresponding blog post. For multi-class classification problems, classifiers such as softmax regression (logistic regression) and SVM often provide comparable results. However, when the data is not linearly separable, an SVM with a non-linear kernel outperforms logistic regression [Pochet and Suykens2006]. Furthermore, LR is prone to over-fitting as it focuses on maximizing the likelihood, while an SVM can generate separating hyperplanes by mapping the data into high-dimensional spaces. This is the underlying motivation behind replacing LR with an SVM in the final layer.
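The augmentation and the Gaussian kernel behind the non-linear SVM can be sketched in NumPy. The feature values below are illustrative; in the actual model the pooled part comes from the CNN and the sentiment part from the feature layer:

```python
import numpy as np

def rbf_kernel(z1, z2, gamma=0.01):
    """Gaussian (RBF) kernel used by a non-linear SVM:
    K(z1, z2) = exp(-gamma * ||z1 - z2||^2)."""
    return np.exp(-gamma * np.sum((z1 - z2) ** 2))

pooled = np.array([0.9, 0.2, 0.5])       # CNN-pooled features (length m)
senti = np.array([0.6, 0.1, 0.3, 0.0])   # SWF/MCF sentiment features
z = np.concatenate([pooled, senti])      # augmented feature vector
print(z.shape)           # (7,)
print(rbf_kernel(z, z))  # 1.0 -- identical points have kernel value 1
```

Implicitly mapping z into the high-dimensional space induced by this kernel is what lets the SVM separate classes that are not linearly separable in the original feature space.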

4 Datasets, Experimental Results, and Analysis

In this section, we present the dataset that we create and report the experimental results along with proper analysis.

4.1 Datasets and Experimental Setup

We design a web crawler to collect posts of different users from health-related forums. We generate a corpus of blog posts collected from the health forums https://patient.info/ and https://www.dailystrength.org/, open social media platforms that provide evidence-based information on a variety of medical and health topics. In our study, we focus on four popular groups, namely Depression, Allergy, Asthma, and Anxiety. Since our main aim is to classify each post on the basis of its content, we cleaned the dataset by omitting user names and their corresponding hyperlinks. Unlike the datasets utilized in [Yadav et al.2018a, Yadav et al.2018b], this dataset includes the 'Other' class, because our main aim is to prioritize the individual posts that require immediate attention by understanding their sentiments.

A team of three expert annotators (lexicographers with knowledge of basic medical concepts) independently annotated the user posts with the four classes. We use Cohen's kappa [Cohen1960] to measure the inter-annotator agreement and observe a high agreement ratio for exact matching of the class on each blog post. The dataset statistics are shown in Table-2. We perform 5-fold cross-validation on the labeled dataset to learn the MS-CNN model discussed in Section-3.
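Cohen's kappa corrects raw agreement for agreement expected by chance. A small self-contained implementation, shown on a toy pair of annotations over our four labels:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences:
    (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the chance agreement implied by each annotator's label
    distribution."""
    assert len(a) == len(b)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

# Toy example with the four medical-sentiment labels.
ann1 = ["Exist", "Exist", "Recovered", "Other", "Deteriorate", "Exist"]
ann2 = ["Exist", "Exist", "Recovered", "Exist", "Deteriorate", "Exist"]
print(round(cohens_kappa(ann1, ann2), 3))
```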

Dataset Statistic                     Recovered  Exist  Deteriorate  Other
Number of blog posts                  136        1711   1778         432
Average number of sentences per post  5          5      7            3
Average number of words per post      152        137    186          19
Table 2: Medical blog dataset statistics

CLEF eHealth Dataset: To evaluate the effectiveness of our system, we also train and test on the "ShARe/CLEF eHealth 2014 Task 2" [Mowery et al.2014] dataset, which is sampled from MIMIC-II (Multiparameter Intelligent Monitoring in Intensive Care), a database containing clinical reports of Intensive Care Unit (ICU) patients. The dataset contains clinical reports including Discharge Summary, Radiology Report, ECHO Report, and ECG Report for training, and Discharge Summary for testing.
We replace each disorder mention with the keyword 'DISORDER' to make our system insensitive to specific mentions and enable it to focus on the context of a disorder. Table-3 reports the corpus statistics for the attributes 'Severity Class' and 'Course Class' and their corresponding classes.
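The disorder masking step can be sketched with a simple substitution over the annotated mention strings; the span list here is an illustrative input, since the real spans come from the CLEF gold annotations:

```python
import re

def mask_disorders(sentence, disorder_mentions):
    """Replace each annotated disorder mention with the keyword DISORDER,
    so the classifier attends to the surrounding context rather than the
    specific mention. Longer mentions are replaced first to avoid
    partial overlaps."""
    for mention in sorted(disorder_mentions, key=len, reverse=True):
        sentence = re.sub(re.escape(mention), "DISORDER", sentence)
    return sentence

s = "The patient was also noted to have a significant perineal injury."
print(mask_disorders(s, ["perineal injury"]))
# The patient was also noted to have a significant DISORDER.
```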

Dataset   Severity Class (#sentences)        Course Class (#sentences)
          slight  moderate  severe  unmarked changed  increased  decreased  improved  resolved  worsened  unmarked
Training  910     339       135     9972     8        114        101        161       60        10737     54
Test      234     189       76      7448     4        23         83         98        53        7660      42

Table 3: CLEF eHealth 2014 dataset statistics

Hyperparameter settings in SVM: The SVM's regularization parameter and kernel were obtained by optimizing on the development set. The parameter value was set to 0.01 and a Gaussian kernel was used in our experiments; a grid search was performed to deduce the optimal value over a range of candidates. We used the LibSVM implementation (https://www.csie.ntu.edu.tw/~cjlin/libsvm/) of SVM.
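The grid search over SVM hyperparameters can be sketched generically. The `dev_score` function below is a dummy stand-in; in practice it would train LibSVM with a Gaussian kernel on the training folds and return the development-set F-score:

```python
from itertools import product

def grid_search(score_fn, c_values, gamma_values):
    """Exhaustive grid search: evaluate every (C, gamma) pair on the
    development set and keep the best-scoring combination."""
    return max(product(c_values, gamma_values),
               key=lambda cg: score_fn(*cg))

# Dummy scoring function for illustration only: it peaks at C=1.0,
# gamma=0.01 instead of actually training an SVM.
def dev_score(C, gamma):
    return -abs(C - 1.0) - abs(gamma - 0.01)

C_best, gamma_best = grid_search(dev_score,
                                 [0.1, 1.0, 10.0],
                                 [0.001, 0.01, 0.1])
print(C_best, gamma_best)  # 1.0 0.01
```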
Hyperparameter settings in CNN: The hyper-parameter values were determined through preliminary experiments by evaluating the model's performance under 5-fold cross-validation while varying the convolution feature sizes. Following most deep learning models, we use 300-dimension word embeddings and multiple filter windows of different sizes. We used Adam [Kingma and Ba2014] as our optimization method; training was performed using stochastic gradient descent over mini-batches with the Adadelta [Zeiler2012] update rule. As a regularizer, we used dropout [Hinton et al.2012]. After training, we chose the best-performing model to be evaluated on the test set. The MS-CNN model introduced in this paper is implemented in Theano.
4.2 Result and Analysis

To enable an effective comparison with our proposed approach, we design two strong baselines as follows:
(1) Baseline 1: The first baseline model is constructed by training an SVM with Bag-of-Words (BoW) features and the sentiment-sensitive features presented in the feature layer of MS-CNN in Figure 1.
(2) Baseline 2: In this model, we use the standard CNN model learned using only word embedding features.

System                        Precision  Recall  F-Score
Baseline-1 (BoW+SVM)          0.7218     0.7347  0.7281
Baseline-2 (CNN)              0.8204     0.7917  0.8057
CNN (Pooling)+SVM             0.8254     0.8119  0.8185
MS-CNN (Pooling+SWF+MCF+SVM)  0.8305     0.8499  0.8393

Table 4: Performance comparison of our proposed approach (MS-CNN) with baselines on the medical blog dataset.

Table-4 reports the performance of our proposed approach (MS-CNN) against the baselines, where we observe improvements of 15.27% and 4.17% in F-score over Baseline-1 and Baseline-2, respectively. Statistical significance tests show that the improvements over both baselines are significant.
We perform experiments to select the best filter length using all the features, running 5-fold cross-validation while varying the convolution feature sizes, as reported in Table-5. We observe that an increase in the feature size generally tends to enhance performance, and we obtain the best F-score with the largest feature size. Too small a window size (filter length) often fails to capture sufficient context, whereas too long a window tends to include irrelevant contextual information. Based on these observations, we bound the filter lengths between a minimum and a maximum value. Further, we analyze the influence of multiple filters in our evaluation. The results indicate that a combination of three filter lengths is the better choice for improving performance: one such combination provides the highest recall value, and we observe similar behavior across the other feature sizes. Any further increase in filter length reduces performance; for example, adding a fourth filter length does not lead to a performance improvement.

Convolution Feature Size Window Size Precision Recall F-Score
Table 5: Performance of MS-CNN (Pooling+SWF+MCF+SVM) with different convolution feature and window sizes on the medical blog dataset

To investigate the contribution of each feature, i.e., the sentiment word feature (SWF) and the medical sentiment context feature (MCF), we incorporate one feature at a time into our proposed model and observe its effect. Table-6 provides the details of incorporating these features into the CNN model. We observe that incorporating the MCF feature into the CNN-SVM architecture improves the F-score, and adding the SWF feature yields a further increase. Adding both MCF and SWF improves the recall, precision, and F-score values further still. In the second phase of our experiment, we add the features to the basic CNN model learned with a logistic regression classifier. The evaluation shows that the SWF feature has more influence in improving the overall performance.
We further perform experiments with different classifiers, namely Ridge (RC) [Hoerl and Kennard1970], Logistic Regression (LR), Nearest Neighbors (NN) [Cover and Hart1967], and Multi-layer Perceptron (MLP) [Gardner and Dorling1998], at the output layer of the CNN architecture. We observe that the SVM outperforms the other classifiers, as presented in Table-7. An overall increment of 2.9% F-score is observed when the SVM replaces the softmax in the output layer of the CNN.
In the CLEF dataset, the attribute modifiers are explicit and appear mostly within a small context window. For example, in the sentences "The patient was also noted to have a significant perineal injury" and "The patient's diarrhea continued", significant is a severity modifier (severe) and continued is a course class modifier (increased) for the disorder mentions perineal injury and diarrhea, respectively.
As the dataset is fairly clean and structured, and its sentences are short compared to the blog posts, we achieve significantly high accuracies of 0.9832 and 0.9837 for identifying the Severity Class and Course Class of a disorder mention, which outperforms the state-of-the-art systems, as presented in Table 8.

Systems                     Precision  Recall  F-Score
Pooled Feature+SWF+LR       0.8219     0.7984  0.8099
Pooled Feature+MCF+LR       0.8192     0.7981  0.8085
Pooled Feature+SWF+MCF+LR   0.8074     0.8290  0.8153
Pooled Feature+SWF+SVM      0.8371     0.8289  0.8329
Pooled Feature+MCF+SVM      0.8311     0.8235  0.8272
Pooled Feature+SWF+MCF+SVM  0.8305     0.8499  0.8393
Table 6: Impact of each feature on performance with different classifiers on the medical blog dataset.
Systems                                      Precision  Recall  F-Score
MS-CNN(Pooling+SWF+MCF)+Ridge Classifier     0.7615     0.7835  0.7701
MS-CNN(Pooling+SWF+MCF)+Logistic Regression  0.8074     0.8290  0.8153
MS-CNN(Pooling+SWF+MCF)+Nearest Neighbors    0.7796     0.7860  0.7712
MS-CNN(Pooling+SWF+MCF)+MLP                  0.7901     0.7983  0.7893
Table 7: Performance comparison of our proposed approach (MS-CNN) with other classifiers on the medical blog dataset.

System                          Severity Class                          Course Class
                                Precision  Recall  F-score  Accuracy    Precision  Recall  F-score  Accuracy
MS-CNN                          0.9828     0.9832  0.9829   0.9832      0.9833     0.9837  0.9833   0.9837
TeamHITACHI [Johri et al.2014]  -          -       -        0.982       -          -       -        0.971
RelAgent [Ramanan and Nathan]   -          -       -        0.975       -          -       -        0.970

Table 8: Comparison with the state-of-the-art systems on “2014 ShARe/CLEF task-2a (attribute normalization)” for ‘Severity Class’, and ‘Course Class’

4.3 Error Analysis

In this section, we analyze the different sources of errors that lead to misclassification. We closely study the false positives and false negatives and categorize the errors into the following classes:
(1) Short blog text: We observe that our system is unable to identify the appropriate class for short texts despite the presence of explicit sentiment-bearing words, e.g., "Had bad pains since this morning more like a shooting/stabbing pain!". In this text, despite the explicit mention of the term 'pain', the system classifies it into the 'Other' class. This might be because of the zero padding performed during pre-processing in MS-CNN. Another possible reason is that the majority of instances in the 'Other' class correlate with short texts (an average of three sentences). A sizeable share of the total errors is due to shorter blog texts.
(2) Implicit sentiments: Our system fails to identify the class where sentiments are expressed implicitly. In our analysis, we observe that the words used in medical blogs to express sentiment differ greatly from those in other social media. In non-medical social media, polarity is manifested in corresponding words (mainly adjectives), while in the medical domain, sentiment is often presented implicitly and needs to be inferred, for instance, from the medical concepts used in the documents. Implicit descriptions of medical conditions include, for example, severe pain, tight chest, and rapid weight gain. In the text "Why does anxiety feels like you have to make yourself breathe instead of letting your body breathe on its own.!!", it is implicit that the user is suffering from shortness of breath.
(3) Presence of abbreviated and short words: The system also misclassifies posts in the presence of abbreviated word forms or short words, for example, 'Cit' instead of 'Citalopram', 'CBT' instead of 'Cognitive behavioral therapy', and 'ECG' instead of 'electrocardiogram'. The system likewise misclassifies when words are misspelled, for example, xanac instead of the drug Xanax.
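A simple mitigation for the abbreviation errors above is dictionary-based normalization before classification. The mapping below is an illustrative, hypothetical table; a real system would use a curated medical abbreviation resource:

```python
# Hypothetical normalization table built from the error-analysis
# examples; entries are illustrative, not an exhaustive resource.
ABBREVIATIONS = {
    "cit": "citalopram",
    "cbt": "cognitive behavioral therapy",
    "ecg": "electrocardiogram",
    "xanac": "xanax",  # common misspelling of the drug name
}

def normalize(tokens):
    """Expand known abbreviations and misspellings, leaving other
    tokens untouched."""
    return [ABBREVIATIONS.get(t.lower(), t) for t in tokens]

print(normalize("Had an ECG after starting Cit".split()))
```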

5 Conclusion

In this paper, we analyze different aspects of medical sentiment to identify the fine-grained conditions of patients from medical blog posts. We validate our study using highly representative medical blogs. We create a benchmark setup by crawling relevant data, defining a classification tagset, and manually annotating the data. We propose a robust sentiment-sensitive deep learning model that leverages medical sentiment features for classifying medical blog posts. Our experiments show that augmenting CNN-derived features with sentiment features is an effective way to combine the two sources of information and improve classification performance. In the future, we would like to explore deep learning techniques to capture implicit sentiments.


  • [Cohen1960] Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and psychological measurement, 20(1):37–46.
  • [Collobert et al.2011] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537.
  • [Cortes and Vapnik1995] Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine learning, 20(3):273–297.
  • [Cover and Hart1967] Thomas Cover and Peter Hart. 1967. Nearest neighbor pattern classification. IEEE transactions on information theory, 13(1):21–27.
  • [Dang and Shirai2009] Trung-Thanh Dang and Kiyoaki Shirai. 2009. Machine learning approaches for mood classification of songs toward music search engine. In Knowledge and Systems Engineering, 2009. KSE’09. International Conference on, pages 144–149. IEEE.
  • [Denecke and Deng2015] Kerstin Denecke and Yihan Deng. 2015. Sentiment analysis in medical settings: New opportunities and challenges. Artificial intelligence in medicine, 64(1):17–27.
  • [Ekbal et al.2016] Asif Ekbal, Sriparna Saha, Pushpak Bhattacharyya, et al. 2016. A deep learning architecture for protein-protein interaction article identification. In Pattern Recognition (ICPR), 2016 23rd International Conference on, pages 3128–3133. IEEE.
  • [Gardner and Dorling1998] Matt W Gardner and SR Dorling. 1998. Artificial neural networks (the multilayer perceptron): a review of applications in the atmospheric sciences. Atmospheric Environment, 32(14):2627–2636.
  • [Hinton et al.2012] Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
  • [Hoerl and Kennard1970] Arthur E Hoerl and Robert W Kennard. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67.
  • [Hollingshead et al.2017] Kristy Hollingshead, Molly E. Ireland, and Kate Loveys, editors. 2017. Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology — From Linguistic Signal to Clinical Reality. Association for Computational Linguistics, Vancouver, BC, August.
  • [Johri et al.2014] Nishikant Johri, Yoshiki Niwa, and Veera Raghavendra Chikka. 2014. Optimizing apache ctakes for disease/disorder template filling: Team hitachi in 2014 share/clef ehealth evaluation lab.
  • [Kalchbrenner et al.2014] Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A convolutional neural network for modelling sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 655–665, Baltimore, Maryland, June. Association for Computational Linguistics.
  • [Kim2014] Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 1746–1751.
  • [Kingma and Ba2014] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.
  • [Kroenke et al.2001] Kurt Kroenke, Robert L Spitzer, and Janet BW Williams. 2001. The phq-9. Journal of general internal medicine, 16(9):606–613.
  • [LeCun et al.1995] Yann LeCun, Yoshua Bengio, et al. 1995. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10):1995.
  • [Losada et al.2017] David E Losada, Fabio Crestani, and Javier Parapar. 2017. Clef 2017 erisk overview: Early risk prediction on the internet: Experimental foundations.
  • [Mikolov et al.2013] Tomas Mikolov, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.
  • [Mowery et al.2014] Danielle L Mowery, Sumithra Velupillai, Brett R South, Lee Christensen, David Martinez, Liadh Kelly, Lorraine Goeuriot, Noemie Elhadad, Sameer Pradhan, Guergana Savova, et al. 2014. Task 2: Share/clef ehealth evaluation lab 2014. In Proceedings of CLEF 2014.
  • [Pochet and Suykens2006] NLMM Pochet and JAK Suykens. 2006. Support vector machines versus logistic regression: improving prospective performance in clinical decision-making. Ultrasound in Obstetrics & Gynecology, 27(6):607–608.
  • [Ramanan and Nathan] SV Ramanan and P Senthil Nathan. Cocoa: Extending a rule-based system to tag disease attributes in clinical records.

  • [Shickel et al.2016] Benjamin Shickel, Martin Heesacker, Sherry Benton, Ashkan Ebadi, Paul Nickerson, and Parisa Rashidi. 2016. Self-reflective sentiment analysis. In Proceedings of the Third Workshop on Computational Lingusitics and Clinical Psychology, pages 23–32.
  • [Yadav et al.2018a] Shweta Yadav, Asif Ekbal, Sriparna Saha, and Pushpak Bhattacharyya. 2018a. Medical sentiment analysis using social media: Towards building a patient assisted system. In LREC.
  • [Yadav et al.2018b] Shweta Yadav, Asif Ekbal, Sriparna Saha, Pushpak Bhattacharyya, and Amit Sheth. 2018b. Multi-task learning framework for mining crowd intelligence towards clinical treatment. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), volume 2, pages 271–277.
  • [Yang et al.2016] Fu-Chen Yang, Anthony JT Lee, and Sz-Chen Kuo. 2016. Mining health social media with sentiment analysis. Journal of medical systems, 40(11):236.
  • [Zeiler2012] Matthew D. Zeiler. 2012. Adadelta: An adaptive learning rate method. CoRR, abs/1212.5701.