Aspect based sentiment analysis platform
This paper presents a supervised Aspect Based Sentiment Analysis (ABSA) system. Our aim is to develop a modular platform that makes it easy to conduct experiments by replacing modules or adding new features. We obtain the best result in the Opinion Target Extraction (OTE) task (slot 2) using an off-the-shelf sequence labeler. The target polarity classification (slot 3) is addressed by means of a multiclass SVM algorithm which includes lexicon-based features such as the polarity values obtained from domain and open polarity lexicons. The system obtains accuracies of 0.70 and 0.73 for the restaurant and laptop domains respectively, and performs third best in the out-of-domain hotel track, achieving an accuracy of 0.80.
Nowadays Sentiment Analysis is proving very useful for tasks such as decision making and market analysis. The ever-increasing interest is also reflected in the number of related shared tasks organized: TASS [Villena-Román et al.2012, Villena-Román et al.2014], SemEval [Nakov et al.2013, Pontiki et al.2014, Rosenthal et al.2014], or the SemSA Challenge at ESWC2014 (http://challenges.2014.eswc-conferences.org/index.php/SemSA). Research has also been evolving towards specific opinion elements such as entities or properties of a certain opinion target, a line of work known as ABSA. The SemEval 2015 ABSA shared task aims at covering the most common problems in an ABSA task: detecting the specific topics an opinion refers to (slot1); extracting the opinion targets (slot2); combining the topic and target identification (slot1&2); and, finally, computing the polarity of the identified words/targets (slot3). Participants were allowed to send one constrained (no external resources allowed) and one unconstrained run for each subtask. We participated in the slot2 and slot3 subtasks.
Our main goal is to develop an ABSA system to be used in the future for further experimentation. Thus, rather than focusing on tuning the different modules, we aim to build a platform that facilitates future experimentation. The EliXa system consists of three independent supervised modules based on the IXA pipes tools [Agerri et al.2014] and Weka [Hall et al.2009]. The next section describes the external resources used in the unconstrained systems. Sections 3 and 4 describe the systems developed for each subtask and briefly discuss the obtained results.
Several polarity lexicons and various corpora were used for the unconstrained versions of our systems. To facilitate reproducibility of results, every resource listed here is publicly available.
For the restaurant domain we used the Yelp Dataset Challenge dataset (http://www.yelp.com/dataset_challenge). Following [Kiritchenko et al.2014], we manually filtered out categories not corresponding to food-related businesses (173 out of 720 were finally selected). A total of 997,721 reviews (117.1M tokens) comprise what we henceforth call the Yelp food corpus.
For the laptop domain we leveraged a corpus composed of Amazon reviews of electronic devices [Jo and Oh2011]. Although only 17.53% of the reviews belong to laptop products, early experiments showed the advantage of using the full corpus for both the slot 2 and slot 3 subtasks. The Amazon electronics corpus consists of 24,259 reviews (4.4M tokens). Finally, the English Wikipedia was also used to induce word clusters using word2vec [Mikolov et al.2013].
We generated two types of polarity lexicons to represent polarity in the slot3 subtasks: general purpose and domain specific polarity lexicons.
A general purpose polarity lexicon was built by combining four well-known polarity lexicons: SentiWordNet (SWN) [Baccianella et al.2010], General Inquirer [Stone et al.1966], Opinion Finder [Wilson et al.2005] and Liu’s sentiment lexicon [Hu and Liu2004]. When a lemma occurs in several lexicons, its polarity is resolved according to a fixed priority order among them, which was set based on the results of [San Vicente et al.2014]. All polarity weights were normalized to a [-1, 1] interval. Polarity categories were mapped to numeric weights: a graded scale in one case (including -0.6, -0.2, 0.2, 0.6 and 0.8) and -0.7/0.7 for negative/positive entries in two others. In addition, a restricted lexicon including only the strongest polarity words was derived from the general lexicon by applying a threshold of 0.6.
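The merging and thresholding steps described above can be sketched as follows. This is a minimal illustration, not the EliXa implementation: the function names, the toy entries, and the choice to apply the 0.6 threshold to the absolute weight (inclusive) are assumptions made here for the example.

```python
from typing import Dict, List


def merge_lexicons(lexicons_by_priority: List[Dict[str, float]]) -> Dict[str, float]:
    """Merge lexicons; on conflicts the earliest lexicon in the list wins."""
    merged: Dict[str, float] = {}
    for lexicon in lexicons_by_priority:
        for lemma, weight in lexicon.items():
            # setdefault keeps the weight coming from the higher-priority lexicon.
            merged.setdefault(lemma, weight)
    return merged


def restrict(lexicon: Dict[str, float], threshold: float = 0.6) -> Dict[str, float]:
    """Keep only entries whose absolute weight reaches the threshold (assumed inclusive)."""
    return {lemma: w for lemma, w in lexicon.items() if abs(w) >= threshold}


# Toy lexicons with hypothetical entries, not taken from the real resources.
high_priority = {"good": 0.7, "awful": -0.7}
low_priority = {"good": 0.2, "tasty": 0.6, "slow": -0.2}

general = merge_lexicons([high_priority, low_priority])
strong = restrict(general)
```

Here "good" keeps the weight 0.7 from the higher-priority lexicon, and "slow" is dropped from the restricted lexicon because its weight is below the threshold.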
Domain-specific polarity lexicons were automatically extracted from the Yelp food and Amazon electronics review corpora. Reviews are rated on a 1 to 5 scale, 1 being the most negative and 5 the most positive. Using the Log-likelihood ratio (LLR) [Dunning1993], we ranked the words that occur more often in negative and in positive reviews respectively. We considered reviews with a rating of 1 or 2 as negative and those with a rating of 4 or 5 as positive. LLR scores were normalized and included in the two domain lexicons as polarity weights.
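A rough sketch of how such a domain lexicon could be derived is given below. The `llr` function is the standard G² statistic over a 2x2 contingency table; everything else is an assumption for illustration: the function names, the signed score (positive when a word is relatively more frequent in positive reviews), and the max-absolute-value normalization are choices made here, not details confirmed by the text.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio (G^2) for a 2x2 table of observed counts."""
    total = k11 + k12 + k21 + k22
    observed = [k11, k12, k21, k22]
    rows = [k11 + k12, k11 + k12, k21 + k22, k21 + k22]
    cols = [k11 + k21, k12 + k22, k11 + k21, k12 + k22]
    g2 = 0.0
    for o, r, c in zip(observed, rows, cols):
        if o > 0:  # empty cells contribute nothing
            g2 += o * math.log(o * total / (r * c))
    return 2.0 * g2


def domain_lexicon(reviews: Iterable[Tuple[int, List[str]]]) -> Dict[str, float]:
    """Build word -> signed weight from (rating, tokens) pairs."""
    pos, neg = Counter(), Counter()
    for rating, tokens in reviews:
        if rating >= 4:        # ratings 4 and 5 count as positive
            pos.update(tokens)
        elif rating <= 2:      # ratings 1 and 2 count as negative
            neg.update(tokens)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    scores: Dict[str, float] = {}
    for word in set(pos) | set(neg):
        a, b = pos[word], neg[word]
        # Assumed sign convention: positive if relatively more frequent in positive reviews.
        sign = 1.0 if a * n_neg >= b * n_pos else -1.0
        scores[word] = sign * llr(a, b, n_pos - a, n_neg - b)
    top = max((abs(s) for s in scores.values()), default=0.0)
    # Assumed normalization: divide by the largest absolute score.
    return {w: (s / top if top else 0.0) for w, s in scores.items()}


toy_reviews = [
    (5, ["great", "food"]), (4, ["great", "service"]),
    (1, ["awful", "food"]), (2, ["awful", "service"]),
    (3, ["ok"]),  # neutral ratings are ignored
]
toy_lexicon = domain_lexicon(toy_reviews)
```

In the toy data, "great" and "awful" end up at opposite extremes, while "food" and "service", which are equally frequent in both classes, receive a weight of zero.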
The Opinion Target Extraction task (OTE) is addressed as a sequence labeling problem. We use the ixa-pipe-nerc Named Entity Recognition system (https://github.com/ixa-ehu/ixa-pipe-nerc) [Agerri et al.2014] off-the-shelf to train our OTE models; the system learns supervised models via the Perceptron algorithm as described by [Collins2002]. ixa-pipe-nerc uses the Apache OpenNLP project implementation of the Perceptron algorithm (http://opennlp.apache.org/), customized with its own features. Specifically, ixa-pipe-nerc implements basic non-linguistic local features and, on top of those, a combination of word class representation features partially inspired by [Turian et al.2010]. The word representation features use large amounts of unlabeled data. The result is a quite simple but competitive system which obtains the best constrained and unconstrained results, ranking first and third overall.
The local features implemented are: current token and token shape (digits, lowercase, punctuation, etc.) in a 2-token window, previous prediction, beginning of sentence, 4 characters in prefix and suffix, and bigrams and trigrams (token and shape). On top of these we induce three types of word representations:
Clark [Clark2003] clusters, using the standard configuration to induce 200 clusters on the Yelp reviews dataset and 100 clusters on the food portion of the Yelp reviews dataset.
The implementation of the clustering features looks for the cluster class of the incoming token in one or more of the clustering lexicons induced following the three methods listed above. If it is found, we add the class as a feature. The Brown clusters only apply to the token-related features, which are duplicated. We chose the best combination of features using 5-fold cross validation, obtaining 73.03 F1 score with local features (i.e., constrained mode) and 77.12 when adding the word clustering features (i.e., unconstrained mode). These two configurations were used to process the test set in this task. Table 2 lists the official results for the first 4 systems in the task.
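To make the feature description more concrete, the following sketch builds a subset of the local features and the cluster lookup for one token. It is an illustration only and does not reproduce the ixa-pipe-nerc feature generators: the feature names, the shape categories, the symmetric window of two tokens on each side, and the cluster identifiers are assumptions, and the previous-prediction and n-gram features are omitted.

```python
from typing import Dict, List


def token_shape(token: str) -> str:
    """Coarse shape class of a token (assumed set of categories)."""
    if token.isdigit():
        return "digits"
    if token.islower():
        return "lower"
    if token.isupper():
        return "upper"
    if token.istitle():
        return "title"
    if all(not ch.isalnum() for ch in token):
        return "punct"
    return "mixed"


def local_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Token, shape, sentence-start and affix features for position i."""
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"tok[{offset}]"] = tokens[j].lower()
            feats[f"shape[{offset}]"] = token_shape(tokens[j])
    if i == 0:
        feats["bos"] = "true"
    for n in range(1, 5):  # prefixes and suffixes of up to 4 characters
        feats[f"prefix{n}"] = tokens[i][:n]
        feats[f"suffix{n}"] = tokens[i][-n:]
    return feats


def cluster_features(token: str, cluster_lexicons: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Add one feature per clustering lexicon in which the token is found."""
    feats: Dict[str, str] = {}
    for name, lexicon in cluster_lexicons.items():
        cluster = lexicon.get(token.lower())
        if cluster is not None:
            feats[f"cluster:{name}"] = cluster
    return feats


sentence = ["Tasty", "Dog", "!"]
# Hypothetical cluster lexicons; real ones would be induced from unlabeled text.
lexicons = {"clark": {"dog": "17"}, "word2vec": {"pizza": "3"}}
features = {**local_features(sentence, 1), **cluster_features(sentence[1], lexicons)}
```

For the token "Dog" only the "clark" lexicon contains an entry, so a single cluster feature is added on top of the local ones.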
|System (type)||Precision||Recall||F1 score|
The results show that leveraging unlabeled text is helpful in the OTE task, obtaining an increase of 7 points in recall. It is also worth mentioning that our constrained system (using non-linguistic local features) performs very close to the second best overall system by the NLANGP team (unconstrained). Finally, we would like to point out the overall low results in this task (for example, compared to the 2014 edition), due to the very small and difficult training set (e.g., containing many short samples such as “Tasty Dog!”), which made it extremely hard to learn good models for this task. The OTE models will be made freely available on the ixa-pipe-nerc website in time for SemEval 2015.
The EliXa system implements a single multiclass SVM classifier. We use the SMO implementation provided by the Weka library [Hall et al.2009]. All the classifiers built over the training data were evaluated via 10-fold cross validation, and the complexity parameter was optimized. Many configurations were tested in these experiments, but in the following we describe only the final setting.
The very first features we introduced in our classifier were token ngrams. Initial experiments showed that lemma ngrams (lgrams) performed better than raw form ngrams. One feature per lgram is added to the vector representation, and lemma frequency is stored. With respect to the ngram size used, we tested up to 4-gram features; improvement was achieved in the laptop domain, but only when not combined with other features.
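The lgram representation can be sketched as a simple frequency count over lemma sequences. This is a minimal illustration under assumptions: the function name and the space-joined feature keys are made up here, and the lemmas are supplied by hand rather than produced by the IXA pipes tools.

```python
from collections import Counter
from typing import List


def lgram_counts(lemmas: List[str], max_n: int = 4) -> Counter:
    """Count every lemma n-gram of size 1..max_n in a sentence."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(lemmas) - n + 1):
            counts[" ".join(lemmas[i:i + n])] += 1
    return counts


# Hand-written lemmas for "The food was good, the food ..." (hypothetical input).
example_lemmas = ["the", "food", "be", "good", "the", "food"]
example_counts = lgram_counts(example_lemmas, max_n=2)
```

Each key of the resulting counter would become one dimension of the feature vector, with the stored frequency as its value.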
PoS tag and lemma information, obtained using the IXA pipes tools [Agerri et al.2014], was also included as features. One feature per PoS tag was added, again storing the number of occurrences of the tag in the sentence. These features slightly improve over the baseline only in the restaurant domain.
Given that a sentence may contain multiple opinions, we define a window span around a given opinion target (5 words before and 5 words after). When the target of an opinion is null the whole sentence is taken as span. Only the restaurant and hotel domains contained gold target annotations so we did not use this feature in the laptop domain.
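The window span can be sketched as a slice over the sentence tokens. The function below is an illustrative assumption rather than the system's code: the target is given as token offsets (with `None` standing for a null target), and the target tokens themselves are kept inside the span.

```python
from typing import List, Optional


def target_span(tokens: List[str], start: Optional[int], end: Optional[int],
                window: int = 5) -> List[str]:
    """Return the tokens within `window` words of the target tokens[start:end]."""
    if start is None or end is None:
        # Null target: the whole sentence is taken as the span.
        return list(tokens)
    lo = max(0, start - window)
    hi = min(len(tokens), end + window)
    return tokens[lo:hi]


words = [f"w{i}" for i in range(14)]
middle = target_span(words, 7, 8)      # target is the single token "w7"
near_start = target_span(words, 1, 2)  # window is clipped at the sentence start
whole = target_span(words, None, None)
```

Features such as lgrams or lexicon scores would then be computed over the returned span instead of the full sentence.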
Positive and negative scores were extracted as features from both the general purpose and the domain specific lexicons. Each score is calculated as the sum of every positive/negative score in the corresponding lexicon divided by the number of words in the sentence. Features obtained from the general lexicons provide a slight improvement: one of the two is better for the restaurant domain, while the other is better for laptops. The domain specific lexicons also help, as shown by tables 3 and 4.
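The score computation described above amounts to two length-normalized sums, sketched below. The function name, the toy lexicon, and the decision to report the negative score as a magnitude are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def polarity_scores(lemmas: List[str], lexicon: Dict[str, float]) -> Tuple[float, float]:
    """Return (positive, negative) scores normalized by sentence length."""
    if not lemmas:
        return 0.0, 0.0
    weights = [lexicon.get(lemma, 0.0) for lemma in lemmas]
    positive = sum(w for w in weights if w > 0) / len(lemmas)
    # The negative score is reported as a magnitude (assumed convention).
    negative = sum(-w for w in weights if w < 0) / len(lemmas)
    return positive, negative


# Hypothetical lexicon entries and a hand-lemmatized six-word sentence.
toy_polarity = {"good": 0.6, "slow": -0.3}
pos_score, neg_score = polarity_scores(
    ["the", "food", "be", "good", "but", "slow"], toy_polarity
)
```

With six words in the sentence, the positive score is 0.6 / 6 and the negative score is 0.3 / 6; one such pair would be produced per lexicon.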
Word2vec clustering features combine best with the rest, as shown by table 3. These features were useful only for the restaurant domain, perhaps due to the small size of the laptop domain data.
Every feature, when used in isolation, only marginally improves the baseline. Some of them, such as the E&A features (using the gold information from the slot1 subtask) for the laptop domain, only help when combined with others. Best performance is achieved when several features are combined. As shown by tables 4 and 5, improvement over the baseline ranges between 2.8% and 1.9% in the laptop and restaurant domains respectively.
Table 5 shows the results achieved by our sentiment polarity classifier. Although we obtain results over the baseline for both the restaurant and laptop domains, performance is modest in both.
In contrast, for the out-of-domain track, which was evaluated on hotel reviews, our system obtains the third highest score. Because of the similarity of the domains, we straightforwardly applied our restaurant domain models. The good results of the constrained system could mean that the feature combination used is robust across domains. With respect to the unconstrained system, we suspect that its good performance is due to the word cluster information being well suited to the hotel domain, because the review corpus used for the restaurant domain contains 10.55% hotel reviews.
|System||Restaurant||Laptop||Hotel|
|Sentiue||78.70 (1)||79.35 (1)||71.68 (4)|
|lsislif||75.50 (3)||77.87 (3)||85.84 (1)|
|EliXa (u)||70.06 (10)||72.92 (7)||79.65 (3)|
|EliXa (c)||67.34 (14)||71.55 (9)||74.93 (5)|
We have presented a modular and supervised ABSA platform developed to facilitate future experimentation in the field. We submitted runs corresponding to the slot2 and slot3 subtasks, obtaining competitive results. In particular, we obtained the best results in slot2 (OTE), and for slot3 we obtained the third best result in the out-of-domain track, an encouraging outcome for a supervised system. Finally, a system for topic detection (slot1) is currently under development.
This work has been supported by the following projects: ADi project (Etortek grant No. IE-14-382), NewsReader (FP7-ICT 2011-8-316404), SKaTer (TIN2012-38584-C06-02) and Tacardi (TIN2012-38523-C02-01).
Brown et al. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–479.
Collins 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, pages 1–8.
Turian et al. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384–394, Uppsala, Sweden, July.