The Internet is a very rich source of user-generated information. As knowledge management technologies have evolved, many organizations have turned their eyes to such information as a way to obtain global feedback on their activities. Some studies, e.g. O’Connor et al. (2010), have pointed out that such systems could perform as well as traditional polling systems, but at a much lower cost.
Talaia ‘Watchtower’ is a platform for the automatic analysis of the impact, in social media and digital press, of topics or domains specified by the user. The process starts when the user configures the system to find information related to a domain or topic. Talaia provides real time information on the topic and helps the user interpret the data by means of various graphic visualizations.
Such technology has various application areas, such as:
Monitoring events: Follow public events in real time, harvesting people’s opinions and media news.
Analyzing the voice of citizens or electors: Track the opinions citizens convey with respect to public services, or trends during electoral campaigns.
Marketing and brand management: Measure the impact of marketing campaigns in a digital environment.
Business Intelligence: Fast and efficient visualization of the information extracted from social media offers companies the possibility to analyze the opinions about their products or services.
Security: Detection of social conflicts, crimes, and cyberbullying.
Talaia consists of three main modules: (i) a crawler collecting the data; (ii) a data analysis module for processing the data; and (iii) a Graphical User Interface (GUI) providing interpretation of the data analyzed. Figure 1 describes the architecture of the platform. Its main features are the following:
Monitoring and automatic analysis: Definition of the domain/topic to monitor by means of term taxonomies. Continuous monitoring of various mention sources, including social media and digital press.
Multilingual extraction of mentions and opinions relevant to the topics monitored, by means of Natural Language Processing (NLP) techniques.
Result exploration: Intuitive GUI to visualize and analyze the results. Advanced statistics and filters, such as per language results, impact of the topics or author statistics.
Control of the monitoring process through the user interface: update search terms or review and correct gathered mentions.
This paper focuses on processes that monitor user satisfaction with respect to a topic, which is why we pay special attention to the Sentiment Analysis (SA) module. Nevertheless, Talaia is capable of performing further data analysis tasks involving user profiling, in order to get the most out of the data. Specifically, geolocation, user community identification and gender detection have been implemented. Section 4.4 gives some details in this regard.
The rest of the paper is organized as follows. Section 2 discusses previous work in the field, focusing on social media on the one hand and on SA on the other. Both academic and industrial points of view are taken into account. Sections 3 to 5 describe in detail the modules composing Talaia. Section 6 presents two success cases where the platform has been used for monitoring different events. Section 7 provides evaluation and results on the SA task for both scenarios. The last section draws some conclusions and future directions.
2.1 Social Media Analysis
Social media are becoming the primary environment for producing, spreading and consuming information. Enormous quantities of user generated content are produced constantly. Even traditional media spread their news and get a large amount of traffic through social media. Monitoring events or topics in such an environment is, however, a challenging task. That is where data mining and Natural Language Processing (NLP) become essential. We have to be able to collect large scale data, but also to find the relevant information. Tracking a topic over an extended time period means that the information flow grows and fades over time. A topic may also evolve over time in terms of the vocabulary used, and thus “topic detection and tracking” (TDT) techniques become relevant to maintain successful monitoring.
Several systems have been proposed in the literature to explore events. Trend Miner Preoţiuc-Pietro and Cohn (2013) extracts multilingual terms from social media, then groups and visualizes them in temporal series. Social Sensor Aiello et al. (2013) and Twitcident Abel et al. (2012) may be the most similar systems to ours. The first one focuses on tracking topics or events predefined by the user. The second performs user defined searches related to crisis management. LRA (https://www.lracrisistracker.com) aims at discovering and tracking crisis situations based on crowdsourced information. ReDites Osborne et al. (2014) detects and tracks topics in a fully automated way.
Detecting terms that represent a domain or topic semantically has traditionally been addressed by statistical models such as Latent Dirichlet Allocation (LDA) Blei et al. (2003). Classical LDA models are applied over static document collections. In order to extract terms from dynamic collections, the most common approach is to follow a two step strategy Shamma et al. (2011) consisting of detecting emerging terms and grouping them in clusters defining a domain.
Nguyen et al. (2016) predict emerging terms by means of word co-occurrence distributional models, comparing the terms in a specific time window against the whole collection. Abilhoa and De Castro (2014) use a graph-based representation of the document collection. Aiello et al. (2013) propose df-idf_t, a variation of tf-idf that includes the temporal factor. Kim et al. (2016) combine neural networks and sequence labeling in order to extract relevant terms from conversations. Miao et al. (2017) propose to reduce the cost of predicting emerging topics by finding a small group of representative users and predicting the emerging topics from their social media activity.
There is also the problem of the scope of the event or topic to be tracked. An event may be tracked at a global level (e.g. Football World Cup), but most events are local or regional at most. Two issues arise at this point: how to restrict the data gathered to a specific region, and how to cope with multilingual data. Some authors tackle the problem by automatically geolocating tweets, while others focus on user locations. See Zubiaga et al. (2017) for a summary of previous approaches. Our approach is to geolocate users rather than tweets, in order to construct a census of inhabitants in a region.
2.2 Sentiment Analysis
Much work has been done in the sentiment analysis field, from polarity lexicon induction to sentiment labeling and opinion extraction. Extensive surveys are already available Pang and Lee (2008), Liu (2012). Thus, in this paper we will focus on relevant works for SA with respect to social media and, especially, Twitter data.
In recent years microblogging sites such as Twitter have attracted the attention of many researchers with diverse objectives: stock market prediction Bollen et al. (2010), polling estimation O’Connor et al. (2010) or crisis situation analysis Nagy and Stamberger (2012). The growing number of Sentiment Analysis (SA) related shared tasks (e.g., SemEval Aspect based SA and Twitter SA shared tasks) and the commercial platforms for reputation management (see section 2.3) are proof of the interest in both the academic and market worlds.
The special characteristics of the language of Twitter require special treatment when analyzing the messages. A special syntax (RT, @user, #tag,…), emoticons, ungrammatical sentences, vocabulary variations and other phenomena lead to a drop in the performance of traditional NLP tools Foster et al. (2011), Liu et al. (2011). In order to solve this problem, a number of normalizations have been proposed as a preprocessing step prior to any analysis. Brody and Diakopoulos (2011) deal with the word lengthening phenomenon, which is especially important for sentiment analysis because it usually expresses emphasis in the message. Other normalizations include matching Out Of Vocabulary (OOV) forms and acronyms to their standard vocabulary forms (e.g., ’imo = in my opinion’) Han and Baldwin (2011), Liu et al. (2012), Alegria et al. (2014) or hashtag decomposition (e.g., #GameOfThrones = ’Game Of Thrones’) Brun and Roux (2014), Belainine et al. (2016).
Once texts are normalized, sentiment analysis can be performed. Several rule-based systems for polarity classification have been proposed Hu and Liu (2004), Thelwall (2017), Taboada et al. (2011). Nevertheless, we will focus on Machine Learning (ML) based approaches, which are the most widespread. Support Vector Machines (SVM) and Logistic Regression algorithms have been very popular for polarity classification, as various international shared tasks Román et al. (2015), Pontiki et al. (2014), Rosenthal et al. (2014) show. Typical features in those systems include sentiment word/lemma ngram features, POS tags Barbosa and Feng (2010), sentiment lexicons Kouloumpis et al. (2011), emoticons O’Connor et al. (2010), discourse information Somasundaran et al. (2009) or, more recently, word embeddings Mikolov et al. (2013).
In recent years however (from 2015 on), the academic world has shifted to Deep Learning (DL) approaches, as Nakov et al. (2016) and Johnson and Zhang (2016) show, with Convolutional Neural Networks (CNN) among the preferred choices. Severyn and Moschitti (2015) use a single layer CNN, first to construct word embeddings and then to train the classifier. Deriu et al. (2017) propose a two phase training: first they train a neural network with large amounts of weakly supervised data collected from Twitter. The network is initialized with word embeddings learned by means of word2vec Mikolov et al. (2013) from very large corpora collected from Twitter. Weights learned in that step are transferred to a second neural network trained over the actual annotated data, to learn the final classifier. A CNN with two convolutional layers is used for both training phases. A very similar approach is followed by Cliche (2017), obtaining top results in SemEval Rosenthal et al. (2017). Howard and Ruder (2018) follow a similar three step approach with a more complex network topology, obtaining state of the art results for various tasks, including sentiment analysis.
A common problem of supervised approaches, especially DL ones, is the need for large amounts of labeled data for training. The common practice in the literature is to gather weakly supervised datasets following the emoticon heuristic Go et al. (2009): collect tweets containing the “:)” emoticon and regard them as positive, and likewise regard tweets containing the “:(” emoticon as negative. This is feasible for major languages, but it is a very difficult (if at all possible) and time-consuming task for non-major languages such as Basque.
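The emoticon heuristic can be illustrated with a minimal Python sketch. The function names are hypothetical, and two choices are assumptions of this sketch rather than part of the heuristic as cited: tweets containing both emoticons are discarded, and the emoticon is removed from the text so that it cannot serve as a trivial feature.

```python
from typing import Iterable, List, Optional, Tuple

POSITIVE = ":)"
NEGATIVE = ":("


def weak_label(tweet: str) -> Optional[str]:
    """Return a weak polarity label, or None if the tweet is unusable."""
    has_pos = POSITIVE in tweet
    has_neg = NEGATIVE in tweet
    # Assumption: tweets with no emoticon or with both are discarded.
    if has_pos == has_neg:
        return None
    return "positive" if has_pos else "negative"


def build_weak_dataset(tweets: Iterable[str]) -> List[Tuple[str, str]]:
    """Collect (text, label) pairs from raw tweets."""
    dataset = []
    for tweet in tweets:
        label = weak_label(tweet)
        if label is not None:
            # Assumption: strip the emoticon that produced the label.
            text = tweet.replace(POSITIVE, "").replace(NEGATIVE, "").strip()
            dataset.append((text, label))
    return dataset
```

The yield of such a procedure depends directly on how many tweets with emoticons exist in the target language, which is why it scales poorly to languages with little Twitter traffic.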
2.3 Commercial solutions
Various commercial solutions can be found in the market. We focus our analysis on systems that provide an integral solution to the monitoring process, leaving out tools that only address specific phases of the surveillance process, and solutions that offer bare NLP processing chains which require further development to achieve a working social media monitor. Table 5 offers a detailed comparison of the tools analysed. We focus our analysis on the sources from which information is gathered, their tracking capabilities, the processing of multilingual information, and the data visualization.
Iconoce is a system oriented to reputation management, offering various features such as measuring the impact of campaigns or reputation monitoring. Although it can also monitor social media (Twitter and Facebook), its strength lies in the analysis of digital press. Multilingual information can be gathered, but no treatment is applied (lemmatization or crosslingual searches). It has 3 separate search engines for authors, mentions and comments. A customizable dashboard offers various visualizations and data aggregations (e.g., salient terms and topics, influencers, sentiment or trends). Periodic reports and alerts on trend changes are provided. As a distinctive feature, it offers a personalized press archive based on the customer configuration. In a similar way, INNGUMA (https://www.innguma.com) is a tool providing business intelligence services. Its main effort is put into the crawling step. Rather than offering the user results over analysed data, the tool is designed for a group of customers to analyse the data collaboratively. Customers are provided with a search engine (more or less powerful depending on the pricing plan) and an interface where they can store and share their findings.
Lexalytics (https://www.lexalytics.com) and Meaning Cloud (https://www.meaningcloud.com/) are text analytics enterprises. Their strength is the data analysis part rather than the monitoring of many sources. Both systems are built upon robust NLP chains. Document classification, entity extraction and aspect based sentiment analysis are performed, among others. Sentiment Analysis is approached with rule-based systems based on lexicons and deep linguistic analysis, offering the possibility of custom domain adaptations. Both Lexalytics and Meaning Cloud lack a result visualization interface, limiting their outputs to Excel plugins and leaving the full analysis of the data in the user’s hands.
Websays (https://websays.com/) monitors a wide range of sources including news, Blogs/RSS, Forums, Facebook, Twitter, Google+, LinkedIn, Instagram, Foursquare, Pinterest, Youtube, Vimeo and Reviews (Tripadvisor, Booking,…). The user is able to configure the crawling using keywords. Negative words are also allowed in order to effectively restrict the search to the desired domain. The system is able to process data in several languages, but it is reported to be most effective with European languages (Spanish, English, French, Italian, and Catalan). Sentiment analysis is performed by combining ML algorithms and human validation, so the statistical models may learn from corrected data. The user may navigate through results using a dashboard that offers multiple filtering options. Graphs, salient terms, trending topics, influencers, sentiment and trends are provided, as well as periodic alerts and reports. The interface offers the possibility to manually edit and correct the results.
Following the same concept as Websays, Keyhole (https://keyhole.co/) is a monitoring and analytics tool that provides trends, insights, and analysis (including sentiment) of hashtags, keywords, or accounts on Twitter and Instagram. It reports support for data processing in a number of languages, but no details are given on the technology. Users can also track web mentions, but two separate monitoring processes must be set up.
Lynguo (http://lynguo.iic.uam.es/) is also in the same group as Websays and Keyhole. It provides support in 24 languages, implementing a rule-based sentiment analysis system. Monitoring is configured by specifying keywords and users, allowing for negative ones as well. Lynguo is also able to geolocate comments.
Ubermetrics (https://www.ubermetrics-technologies.com/) is one of the few platforms that monitor multimedia sources, including Youtube and Vimeo, but also TV and radio sources. It reports processing data in 40 languages. Its visualization dashboard offers customizable graphs based on multiple search criteria. Ubermetrics focuses to a certain extent on analysing the virality (impact) of the mentions and on author profiling.
Snaptrends (http://snaptrends.com/) focuses on social media (Twitter, Facebook, Instagram, Google+, and Pinterest). Multilingual data is handled by means of machine translation (MT) from 80 languages into English. It uses a proprietary NLP chain for processing English data, including sentiment analysis and relevant term extraction. The main feature for filtering large volumes of information is a geolocation-based search engine, combined with keyword based searches and other filters such as data sources. As for result visualization, it has various data aggregations, such as influencer rankings or sentiment evolution across time by geographical area. Snaptrends makes a special effort in visualizing specific data, generating mention mosaics and timelines in real time.
3 Data Collection
The first step of a monitoring system such as Talaia is the collection of information. Multi Source Monitor (MSM, http://github.com/Elhuyar/MSM) is currently able to monitor Twitter, syndication feeds and also multimedia sources such as television or radio programs. Support for other social media such as Youtube, Google+, etc. is under development.
MSM is a keyword based crawler, which works on a set of keywords defined by the user. Rather than a list of unconnected terms, Talaia is designed to work over a hierarchy, which allows a better organization of the data in the analysis step. This way, the keywords are defined as belonging to a specific category in the taxonomy. One handicap of a keyword-based crawling strategy is that it is often difficult to define unambiguous terms that do not capture noisy messages. In order to minimize this problem, MSM implements a number of measures to optimize the crawling process:
Regular expressions are used to define keywords. This makes it possible to differentiate between common words and proper names, or full words and affixes (e.g., podemos ‘we can’ vs. Podemos political party). These phenomena are especially frequent in social media, where language rules are often ignored.
Language-specific keywords are defined. A word that is a very good keyword in one language can be a source of noise in another, e.g. in our context mendia unambiguously refers to the Basque politician Idoia Mendia in Spanish, while in Basque, where it means ‘mountain’, it is clearly ambiguous.
Anchor terms may be defined. Anchor terms usually define the general topic to monitor (e.g. election campaign). If the user specifies that a keyword requires an anchor, a message containing that keyword is accepted only if it also contains at least one anchor term. Anchor terms may be keywords or not.
Long paragraphs are split before looking for keywords in the case of messages coming from news sites. First, the crawler checks whether any keyword appears in a candidate article. If so, it looks for keywords on a sentence basis, and the matching sentences are considered as the message unit.
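The first three measures above can be sketched as follows. This is a minimal Python illustration, not MSM's actual implementation: the `Keyword` class, the function name, the category labels and the example patterns are all hypothetical, and sentence splitting for news articles is left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Keyword:
    pattern: "re.Pattern"        # regular expression defining the keyword
    category: str                # node of the keyword taxonomy (hypothetical)
    lang: Optional[str] = None   # None means the keyword is valid in any language
    needs_anchor: bool = False   # accept only if an anchor term is also present


def matching_categories(text: str, lang: str,
                        keywords: List[Keyword],
                        anchors: List["re.Pattern"]) -> List[str]:
    """Return the taxonomy categories whose keywords accept the message."""
    has_anchor = any(a.search(text) for a in anchors)
    found = []
    for kw in keywords:
        if kw.lang is not None and kw.lang != lang:
            continue  # language-specific keyword, other language
        if kw.needs_anchor and not has_anchor:
            continue  # ambiguous keyword without topical context
        if kw.pattern.search(text) and kw.category not in found:
            found.append(kw.category)
    return found


# Hypothetical configuration: case-sensitive proper name that needs an anchor,
# and a keyword that is only trusted in Spanish messages.
KEYWORDS = [
    Keyword(re.compile(r"\bPodemos\b"), "party/Podemos", needs_anchor=True),
    Keyword(re.compile(r"\bMendia\b", re.IGNORECASE), "candidate/Mendia", lang="es"),
]
ANCHORS = [re.compile(r"\b(elecciones|hauteskundeak)\b", re.IGNORECASE)]
```

With this configuration, a Spanish message such as "Podemos sube en las elecciones" is accepted for the party category, while "podemos ir a las elecciones" is rejected because the pattern is case-sensitive.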
3.0.1 Language identification (LID)
LID is indispensable in order to apply the corresponding NLP analysis. LID is integrated into the crawling process as part of MSM. There are two main reasons for that. First, it allows us to implement the language-specific keyword feature. Second, having the language previously identified gives us flexibility for applying the subsequent NLP tools. At the moment language identification is implemented using the Optimaize library (https://github.com/optimaize/language-detector), combined with source specific optimizations (social media vs. feeds).
4 Data Analysis
The data analysis is mainly performed by EliXa San Vicente et al. (2015) which integrates the following processes, described in the next sections.
EliXa (https://github.com/Elhuyar/Elixa) is a supervised Sentiment Analysis system. It was developed as a modular platform which makes it easy to conduct experiments by replacing modules or adding new features. It was first tested in the ABSA 2015 shared task at the SemEval workshop Pontiki et al. (2015). EliXa currently offers resources and models for 4 languages: Basque, Spanish, English and French. Its implementation is easily adaptable to new languages, with the minimum requirement of a polarity lexicon and/or a training dataset.
The special characteristics of tweets require a specific treatment. A special syntax (RT, @user, #tag,…), emoticons, ungrammatical sentences, vocabulary variations and other phenomena lead to a drop in the performance of traditional NLP tools Foster et al. (2011), Liu et al. (2011). To address these issues, EliXa integrates a microtext normalization module which is applied to social media messages, based on Saralegi and San Vicente (2013). The normalizer is based on heuristic rules, such as standardizing URLs, normalizing character repetitions or dividing long words (e.g. #AVeryLongDay → a very long day). Out Of Vocabulary (OOV) term normalization is also addressed, by means of language specific frequency lists based on Twitter corpora.
Furthermore, EliXa’s normalization also includes SA specific functionalities: emoticons are normalized into a 7 category scale. Special expressions such as interjections and onomatopoeia are also marked. Those normalized terms must be included in the polarity lexicons in order to have a greater impact on the sentiment analysis classification. Table 1 presents the resources provided for normalization according to their use.
|Word form dictionaries||Text normalization (e.g. 4ever → forever)||122,085||556,501||67,811||453,037|
|OOV dictionaries||Text normalization (e.g. 4ever → forever)||63||7,823||223||279|
|Emoticon lexicons||Polarity tagging||60 (regexes matching emoji groups)|
|Stopword lemma lists||Polarity tagging, feature extraction|
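The heuristic normalization rules described above (URL standardization, character repetition normalization and hashtag division) can be sketched in Python as follows. The regular expressions, the `URL` placeholder token and the choice of collapsing repetitions to two characters are assumptions of this sketch, not EliXa's actual rules; dictionary-based OOV normalization and emoticon mapping are omitted.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Any character repeated three or more times in a row.
REPEAT_RE = re.compile(r"(.)\1{2,}")
# Pieces of a camel-cased hashtag: capitalized words, acronyms, lowercase runs, digits.
CAMEL_RE = re.compile(r"[A-Z][a-z]+|[A-Z]+(?![a-z])|[a-z]+|\d+")


def split_hashtag(match: "re.Match") -> str:
    """Turn the body of a hashtag into space-separated lowercase words."""
    parts = CAMEL_RE.findall(match.group(1))
    return " ".join(part.lower() for part in parts)


def normalize(text: str) -> str:
    """Apply the three heuristic rules in a fixed order."""
    text = URL_RE.sub("URL", text)                 # standardize URLs
    text = re.sub(r"#(\w+)", split_hashtag, text)  # divide hashtags
    text = REPEAT_RE.sub(r"\1\1", text)            # collapse repetitions to two
    return text
```

For instance, `normalize("#AVeryLongDay")` yields `"a very long day"`, matching the example given in the text. Collapsing to two characters rather than one keeps legitimate double letters intact, leaving the final mapping to a standard form to the dictionary lookup step.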
4.2 NLP processing
EliXa currently performs tokenization, lemmatization and POS tagging prior to sentiment analysis classification. No entity recognition is applied; entities are matched only if they are defined as keywords. Although EliXa is able to work with corpora preprocessed with other taggers, its default NLP processing is made by means of IXA pipes Agerri et al. (2014) which is integrated as a library.
4.3 Sentiment Analysis
EliXa’s core feature is its polarity classifier, which implements a multiclass Support Vector Machine (SVM) algorithm Hall et al. (2009) combining the information extracted from polarity lexicons with linguistic features obtained from the previous step. Main features include polarity values from general and domain specific polarity lexicons, lemma and POS tag ngrams and positivity and negativity counts based on polarity lexicons. Features representing other linguistic phenomena such as treatment of negation, locutions or punctuation marks are also included. Finally, there are some social media specific features, such as the proportion of capitalized symbols (which often is used to increase the intensity of the message) or emoticon information.
EliXa currently provides ready to use polarity classification models, although one of its strengths is that new models can be trained if training data is available for a new domain.
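A few of the features listed above can be illustrated with a short Python sketch. It is not EliXa's feature extractor: the function name and feature keys are hypothetical, lowercased tokens stand in for lemmas, and only lexicon counts, the capitalization ratio and unigram counts are computed.

```python
from typing import Dict, List


def polarity_features(tokens: List[str], lexicon: Dict[str, int]) -> Dict[str, float]:
    """Build a sparse feature dictionary for one message.

    `lexicon` maps a word to a positive or negative integer polarity value.
    """
    lowered = [t.lower() for t in tokens]
    # Positivity and negativity counts based on the polarity lexicon.
    pos = sum(1 for t in lowered if lexicon.get(t, 0) > 0)
    neg = sum(1 for t in lowered if lexicon.get(t, 0) < 0)
    # Proportion of capitalized letters, a social media intensity cue.
    letters = [c for t in tokens for c in t if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    features = {
        "pos_count": float(pos),
        "neg_count": float(neg),
        "upper_ratio": upper_ratio,
    }
    # Unigram counts (stand-in for the lemma ngram features).
    for t in lowered:
        key = "uni=" + t
        features[key] = features.get(key, 0.0) + 1.0
    return features
```

A dictionary of this kind can be vectorized and passed to any linear classifier; keeping lexicon counts separate from the ngram features lets a domain specific lexicon be swapped in without changing the rest of the pipeline.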
4.4 User profiling
Talaia is also capable of providing deeper analysis of the data, by means of user profiling. Specifically, geolocation, gender detection and user community identification are implemented.
Opinions gathered are geolocated. This allows Talaia to analyse the differences in opinions with respect to a topic that may arise between regions or countries. Geolocation is done by exploiting social media information from both messages and authors. If a message is geolocated, its information is used directly. Otherwise, user profile information is used. The task is challenging, because users do not always provide such information, or they define fictitious locations (e.g., ’Middle earth’, ’In a galaxy far, far away…’). Roughly, the system is able to correctly geolocate 73% of the social media messages extracted.
Gender detection is another important factor in many social science studies. A supervised gender classifier is implemented to infer user gender, based on features extracted from academic papers Kokkos and Tzouramanis (2014), Rangel et al. (2017). User gender detection is based on classifying messages; no user profile information is used.
User communities are identified by means of network analysis algorithms. User communities allow us to infer influencers and to establish the actual context of certain stances. For example, we are able to find out whether a large number of messages in favour of or against an event actually come from a specific group of users or reflect a global perception.
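The fallback order described in this paragraph (message-level location first, user profile second) can be sketched as below. The field names `place` and `user_location`, the gazetteer and the substring matching are hypothetical simplifications for illustration only.

```python
from typing import Dict, Optional

# Hypothetical gazetteer mapping lowercase place names to regions.
GAZETTEER = {
    "donostia": "Gipuzkoa",
    "bilbao": "Bizkaia",
    "gasteiz": "Araba",
}


def geolocate(message: Dict[str, str], gazetteer: Dict[str, str]) -> Optional[str]:
    """Return a region for the message, or None if it cannot be located."""
    # 1. If the message itself is geolocated, use that information directly.
    place = message.get("place")
    if place:
        return place
    # 2. Otherwise fall back to the free-text location of the user profile.
    profile = (message.get("user_location") or "").lower()
    for name, region in gazetteer.items():
        if name in profile:
            return region
    # Missing or fictitious locations ('Middle earth') remain unresolved.
    return None
```

Messages that fall through both steps are the ones that remain without a location.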
5 Data Visualization
The GUI has been developed using the Django Web Application framework (https://www.djangoproject.com/). This interface provides data analysis visualizations and manages the communication with both the crawler and EliXa.
The interface also has management capabilities which allow the automatic sentiment labeling to be manually reviewed. The keyword hierarchy and new website sources can also be set up through the interface (see Figure 2 for some examples). These functionalities ease the process of creating training datasets and adapting Talaia to new domains.
6 Success Cases
In this section we present two real use cases where Talaia has been applied, and use them for evaluation purposes. The first one focuses on tracking cultural events. The second one analyses the voice of citizens or electors and is focused on the political domain.
6.1 Cultural domain
Talaia was first applied in the Behagunea project (http://behagune.elhuyar.eus). The objective of the project was to track the social media impact of the cultural events and projects (more than 500) carried out during 2016 in the framework of the Donostia European Capital of Culture (DSS2016) year. The project included monitoring opinions in press and social media in four languages: Basque, French and Spanish as coexisting languages in the different Basque speaking territories, and English as an international language.
Domain adapted polarity models were created. Since events related to DSS2016 were already scheduled during 2015, a preliminary crawl was carried out in order to build datasets. Those datasets were manually annotated for polarity on a three category scale (positive, negative, neutral). Section 7.1 gives more details about the various language and domain specific datasets. Polarity classification models for the cultural domain were trained on those datasets and are distributed as part of EliXa. Section 7.2 gives details related to those classifiers.
6.2 Political domain
Talaia was used to track citizen opinions during the Basque electoral campaign in September 2016. Crawling was carried out during the election campaign period, starting on September 8th and finishing on September 23rd (23:59). It offers useful insights for political analysis such as sympathy rankings, the evolution of opinions over time, the most relevant messages, etc.
The crawler was configured to find mentions talking about the main political parties present in the campaign and their respective candidates (only main candidates were monitored, i.e., those running for Lehendakari ’head of the government’).
Regarding social media, Twitter was monitored. Since we are monitoring an event with a regional scope, two main restrictions were applied. First, only mentions written in Basque and Spanish were crawled, because those are the two official languages in the region. Second, mentions were constrained to users from the specific geographical area of the Basque Country. The task was then to discard noisy messages that do not belong to citizens involved in the election but are likely to be talking about it. In this case, for example, the tweet crawling process was likely to capture many mentions from other regions of Spain.
In order to solve this problem we created a census of citizens of the Basque Country. To that end we trained a supervised classifier based on Twitter geographical information and on follower and friend graphs.
As for the news sources, a list of 30 sources was manually compiled, including TV, printed media and radio stations, all of them working in a regional scope.
For the evaluation of Talaia, we evaluate the performance of EliXa’s polarity classifier for the two aforementioned domains. In all cases the L2-loss SVM implementation of the LIBLINEAR Fan et al. (2008) toolkit was used as the classification algorithm within the Weka Hall et al. (2009) data mining software. Experiments with polynomial kernels (degrees 2-5) were also conducted, but we found no improvement, and training times were much higher. All classifiers presented in the following sections were evaluated by means of 10-fold cross validation. The complexity parameter was optimized.
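The 10-fold cross validation protocol can be sketched in plain Python as follows. This is an illustration of the evaluation scheme only, not the Weka/LIBLINEAR setup used in the experiments: the function names are hypothetical, folds are not stratified by class, and a majority-class baseline stands in for the SVM.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k (train, test) index pairs after shuffling."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        splits.append((train, folds[i]))
    return splits


def cross_val_accuracy(examples: Sequence, labels: Sequence[str],
                       train_fn: Callable, k: int = 10) -> float:
    """Average accuracy over k folds; train_fn returns a predict function."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(examples), k):
        model = train_fn([examples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(model(examples[i]) == labels[i] for i in test_idx)
    return correct / len(examples)


def majority_baseline(train_x: Sequence, train_y: Sequence[str]) -> Callable:
    """Stand-in classifier: always predict the most frequent training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda example: majority
```

Every example is used for testing exactly once, so the reported accuracy is computed over the whole dataset. With skewed class distributions a majority-class baseline of this kind is a useful reference point for reading accuracy figures.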
Table 2 presents the statistics and class distribution of the datasets gathered and annotated in order to build the polarity classifiers for each language in the cultural domain. All annotations were done manually. Polarity was annotated at mention level. Because of the level of specificity reached when defining the keyword taxonomy, we rarely find a mention referring to more than one entity or event. Statistics show that corpora in all languages have a similar distribution, with a high number of neutral mentions and a skew towards positive opinions.
Table 3 shows the characteristics of the political domain datasets. In this case, each tweet was annotated with respect to a number of entities appearing in the tweet. Annotators were asked to annotate the polarity of a tweet from the perspective of each of the entities detected in it; that is, a tweet may contain more than one polarity annotation. The example that follows this paragraph shows a real case where a tweet was annotated twice. In fact the numbers in Table 3 give 1.3 and 1.24 average annotations per tweet for Basque and Spanish, respectively. In contrast to the cultural domain, the political domain datasets show very different distributions. While the Basque dataset seems to follow the same pattern seen in the cultural domain, the Spanish dataset has a very high number of negative opinions.
@pnvgasteiz (negative) erabat ados, lotsagarria. Aukera ona aurrera begiratu ta @ehbildu (positive)-ren euskara arloko proposamena martxan jartzeko #herriakordioa
(English translation: @pnvgasteiz totally agrees, shameful. Good chance to look forward and apply the proposal of @ehbildu in the field of Basque #herriakordioa)
Annotating tweets in the political domain proved to be a rather challenging task. Sarcasm is often present, interpellations to a person are frequent even if he/she is not the target of the opinion, an opinion may be present only in an implicit manner, or a third party's negative opinion towards an entity may be reported while the author defends the entity against it. The full guidelines provided to the annotators can be consulted in Annex II (Polarity annotation guidelines).
Table 4 shows the performance of the various multilingual classifiers trained. The models include the following features: word form 1-grams filtered by minimum frequency and document frequency (df) thresholds, POS tag 1-gram features, polarity words according to a polarity lexicon, and microtext normalization features (URL standardization, OOV normalization, character repetition, capitalization, emoticon normalization).
Reported results are in general higher for the cultural domain, even though its datasets are smaller. If we compare the Basque and Spanish classifiers, both achieve accuracies above 70% for the cultural domain, while their performance drops by around 4% in the political domain. This was to be expected. On the one hand, no special effort has been made to model entity level polarity, and thus our classifiers have difficulties dealing with tweets containing several annotations. On the other, the political data may be more challenging in terms of the linguistic phenomena used in the genre.
8 Conclusion and Future Work
We have presented Talaia, a real time monitor of social media and digital press. Talaia is able to extract information related to a specific topic and analyze it by means of natural language processing technologies. Two success cases and the resources generated from those cases have been described. In that sense, we have shown the ability to adapt our system to different domains and languages.
All the software behind the platform including the crawler, data processing chain and interface is publicly available under the GNU GPLv3 license.
Talaia is still under development. The short term objectives include work on optimizing the information extraction process. Specifically, extracting keywords from the data downloaded up to a certain point would allow us to automatically adapt the system to new terms, without losing information because the keyword hierarchy is outdated or the topic is poorly defined.
Another important point is the adaptation of our sentiment analysis model to new domains. In that sense, experiments are being carried out to minimize the domain adaptation effort, in terms of both data collection and annotation.
Multilinguality is one of the main challenges of such a system. Currently the system is able to process data in 4 languages, and we are working to extend it to new languages.
Last but not least, data analysis may include further processing other than sentiment analysis. Geolocation based analysis, user community detection and other useful tasks for user profiling (e.g. gender detection) are the focus of our ongoing work.
This work has been supported by the following projects: Elkarola project (Elkartek grant No. IE-14-382), and Tuner project (MINECO/FEDER grant No. TIN2015-65308-C5-1-R).
Annex I - Comparison of commercial Social Media Monitors
|Platform||Data Sources||Crawling||Data Processing||Search||Navigation|
|Iconoce||Digital press, blogs, videos, social media (Facebook, Twitter, Linked-in?)||Personalized, subject to agreement||no||Personalized archive. 3 separate search engines (mentions, comments, authors); no lemmatization, not crosslingual.||Graphs (aggregations?), salient terms, salient topics, influencers, alerts, reports|
|Intelsuite||RSS multimedia, Deep Web, Twitter, Facebook, LinkedIn, possibility to include external documentation manually||?||MT. No mention of text processing. No SA||Semantic search (techniques not specified). Index cards and documents. Information is tagged manually.||Reports, content creation, social media management. Multilingual GUI.|
|Meaning Cloud||Digital news, blogs, Twitter, satisfaction surveys (customer provided), phone survey transcriptions|| ||5 languages (Es, En, Fr, Pt, It). Language identification, clustering for topic detection. Normalization (?), lemmatization, POS tagging, parsing, NERC, GATE API. SA: Rule-based. Sentiment lexicons + rules. Irony and subjectivity detection. Entity polarity detected using manually compiled dictionaries.||no||No dashboard, visualizations or data aggregations. Excel plugin or API access|
|Snap-trends||Social Media (Twitter, Facebook, Instagram, Google+,…)|| ||MT from 80 languages. Proprietary linguistic processing. Topic (trends) detection. Proprietary sentiment analysis.||Geolocation based search engine, multiple criteria: social network, search terms, geolocation. Previous search feature.||Aggregations, interactive visualization, temporal trends.|
|Websays||News, Blogs/RSS, Forums, Facebook, Twitter, Google+, LinkedIn, Instagram, Foursquare, Pinterest, Youtube, Vimeo, Reviews (Tripadvisor, Booking,…)||Keyword based, accepts also negative keywords.||Multilingual data processing, no specific data about the coverage. SA: AI (ML) + human validation||Multiple search criteria, filter-based.||Graphs, salient terms, trending topics, influencers, sentiment, trends. Alerts and reports.|
|Lynguo||Facebook, Twitter, Instagram, YouTube, online media, blogs and forums.||Keyword based, accepts also negative keywords and accounts.||24 languages. SA: Rule-based. Lexicons + rules. Polarity and emotions. Aspect based SA|| ||Customizable dashboard. Several default aggregations and possibility to generate custom visualizations. Alerts and periodical reports|
|Keyhole||Twitter, Instagram, web sources.||Social media and web sources are configured and monitored separately. Keywords, users.||13 languages.|| ||Influencers, timeline, trends, sentiment, aggregations.|
|Uber-metrics||Blogs, forums, academic/scientific journals, digital press, Instagram, Tumblr, Google+, Facebook, Twitter, YouTube, Vimeo, Flickr, and Foursquare. Comments from YouTube, Facebook, and major online news sources can also be captured. TV/Radio||Customizable "search agents". Keyword based||40 languages. Proprietary data processing.||Detailed search based on multiple criteria included in visualization dashboard||Dashboard, alerts, reports.|
Annex II - Polarity annotation guidelines
Below we present the guidelines provided to the annotators for marking entity-level polarity, including ambiguous cases and the solutions proposed for each of them:
Neutral: There is no clear opinion or sentiment from the holder with respect to the target party or candidate. Mentions referring to objective facts fall into this category as well, even if the fact may be considered positive or negative (e.g. "El PNV consigue grupo en el senado"; English translation: PNV gets its own group in the senate).
Positive: The mention includes a positive assessment from the holder with respect to the target (e.g. "Urkullu ha sido un buen lehendakari."; English translation: Urkullu has been a good president).
Negative: The mention includes a negative assessment from the holder with respect to the target (e.g. "Urkullu ha sido un lehendakari mediocre."; English translation: Urkullu has been a mediocre president).
Subjectivity is not explicit. E.g., "Cataluña desobedece constantemente la Ley, PNV pide acercamiento de presos, Ribo da los pasos hacia el nacionalismo y Rajoy en SanXenso" (English translation: Catalunya constantly disobeys the law, PNV asks for the rapprochement of prisoners, Ribo makes steps towards nationalism and Rajoy is in SanXenso). The main target in the example is "Rajoy", but the author expresses a negative opinion towards PNV. Annotators were asked to interpret the implicit subjectivity according to the holder.
The holder expresses the opinion of a third party. E.g., "Podemos cree que Urkullu tiene miedo y por eso adelantará las elecciones - EcoDiario.es <URL>" (English translation: Podemos thinks Urkullu is scared and that's why he will call the election early - EcoDiario.es <URL>). The mention expresses a negative opinion from Podemos towards Urkullu. Annotators were asked to annotate it as negative towards Urkullu if they could certify that the holder agreed with the opinion from Podemos, or neutral otherwise.
There are two (or more) references to a single target, expressing different polarities. E.g., "PNV tendrá grupo propio en el Senado tras la cesión de cuatro asientos por parte del PP y mantiene su 'no a Rajoy'" (English translation: PNV will have its own group in the senate thanks to PP handing over four seats, and they still maintain the "No to Rajoy"). The following criteria were applied: N+P=NEU, N+NEU=N, P+NEU=P.
The polarity of the message and the polarity towards the target are different. E.g., "El tercer precandidato de #Podemos llama a desalojar al PNV <URL>" (English translation: The third pre-candidate of #Podemos calls for ousting PNV <URL>). In those cases, the polarity towards the target should be annotated. In the example, the message polarity would be neutral, but the polarity towards "PNV" would be negative. Thus the message would be marked as negative.
Irony/sarcasm. E.g., "¿Las cambiamos por Calle Arnaldo Otegi o Paseo de Juana Chaos? Al fin y al cabo, son hombres de paz… <URL>" (English translation: What if we change the name of the street to Arnaldo Otegi St. or Paseo de Juana Chaos? After all, they are men of peace… <URL>). Annotators were asked to interpret irony. The previous example would thus be negative towards the target Arnaldo Otegi.
The holder is condemning a negative stance against the target. E.g., "eldiarionorte cada vez se os ve más el plumero. Panfleto anti bildu. Cuando la salud de los zubietarras empeore, vais y se lo contáis." (English translation: eldiarionorte, it is increasingly clear what you are up to. Anti-Bildu pamphlet. When the people in Zubieta lose their health you can go and tell them.) Annotators were asked to interpret the intention of the holder. If they notice a clear intention of defending the target, then it should be regarded as positive.
The target captured is not the main focus of the opinion. E.g., "CristinaSegui_ Subió impuestos,no hace nada contra los nacionalistas y les da dinero,no ilegaliza a Bildu y stá implicado en lo de Bárcenas" (English translation: CristinaSegui_ He raised taxes, he does nothing against nationalists and gives them money, he does not ban Bildu and he is involved in the Bárcenas affair). Annotators were asked to mark the polarity towards the target, regardless of the main focus of the opinion.
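The criteria given above for several references to a single target (N+P=NEU, N+NEU=N, P+NEU=P) can be sketched as a small function. This is only an illustration of the rule as stated in the guidelines; the function name `combine_polarities`, the string labels, and the handling of cases the guidelines do not mention (repeated identical labels, an empty list) are assumptions.

```python
VALID_LABELS = {"P", "N", "NEU"}


def combine_polarities(labels):
    """Combine the polarities of several references to the same target.

    Implements N+P=NEU, N+NEU=N, P+NEU=P. Assumptions beyond the
    guidelines: repeated labels keep their polarity (P+P=P, N+N=N)
    and an empty list yields NEU.
    """
    unknown = set(labels) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown polarity labels: {sorted(unknown)}")
    # Neutral references do not change the outcome, so drop them first.
    opinions = set(labels) - {"NEU"}
    if opinions == {"P"}:
        return "P"
    if opinions == {"N"}:
        return "N"
    # Either no opinion at all, or positive and negative cancel out.
    return "NEU"


print(combine_polarities(["N", "P"]))    # NEU
print(combine_polarities(["N", "NEU"]))  # N
print(combine_polarities(["P", "NEU"]))  # P
```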