Argumentation mining aims to detect the argumentative discourse structure in text and to recognize the components of an argument and the relations between them. It is an emerging and exciting field at the confluence of natural language processing (NLP), logic-based reasoning, and argumentation theory; see [Moens2014, Lippi and Torroni2015] for a comprehensive and recent overview.
While computational approaches to argumentation have a long-standing tradition within the field of artificial intelligence, in particular in research on logic-based reasoning and multi-agent systems, it is only in recent years that argumentation mining has begun to attract the attention of the NLP community. From an NLP perspective, argumentation mining is a daunting task, one that involves many levels of semantic processing (ranging from lexical semantics to discourse-level processing) and essentially calls for text understanding and inference mechanisms that significantly surpass the state of the art.
This is of course not to say that significant advances in the processing of natural language arguments cannot already be made – as a matter of fact, the argumentation mining community has made significant progress in recent years. Moreover, the community now has a much better understanding of the set of tasks involved in argumentation mining, as well as of their complexity.
In this short position paper, I focus on argumentation mining in the context of social media, more specifically on opinionated user comments. My aim is twofold: first, to present possible motivations for argumentation mining in social media; second, to outline some of the tasks and challenges involved.
2 Argumentation in Social Media
Initial work on argumentation mining focused on well-structured, edited text, such as legal text [Walton2005] or scientific publications [Jiménez-Aleixandre and Erduran2007]. Obviously, such genres are interesting as they exhibit all the characteristics argumentation theory is concerned with. At the same time, edited text is amenable to NLP, as it can be processed with existing tools, most of which are not robust to noisy input.
Recently, however, the focus has also shifted to argumentation mining from social media texts, such as online debates [Cabrio and Villata2012, Habernal et al.2014, Boltužić and Šnajder2014], discussions on regulations [Park and Cardie2014], and product reviews [Ghosh et al.2014].
Online debates are particularly well-suited for argumentation mining, because of the controlled setting offered by online debate platforms, and because most users use these platforms with the intention of engaging in argumentative discussions. The same cannot be said of less controlled communication environments, such as comment boards on news portals, product review sites, or microblogs, where the communicative intention is not to engage in an argumentative discussion, but rather to express a blunt opinion on the subject matter, or even simply to satisfy the need for self-presentation.
In what follows, I use the term opinionated comment to refer to such user-generated content, which is not necessarily generated within a debate. A prototypical example is a comment on a news article, such as the following opinionated comment related to the Trump rally event (Yahoo News, http://tinyurl.com/zkez7ze):
The President we have now divided our country and put his ego first instead of the people. Trump hasn’t divided the country that’s why he has so many people behind him. We want someone who is not afraid of the politics in Washington and change our policies with dealing with other countries.
Clearly, the author of this comment does express some arguments to back up his or her opinion. However, the opinion is triggered by an event, and there is no predefined debate topic. Moreover, there will likely be no follow-up discussion in which the author would need to justify or elaborate on his or her arguments. Thus, it seems that such opinionated comments mostly emerge ad hoc and are monological in nature.
It is legitimate to ask whether there is any merit in analyzing this kind of opinionated text, apart from the fact that it is abundant in social media. I argue that – to the extent to which we are interested in analyzing the opinions of other people (users of products, voters, etc.) – we should also be interested in analyzing the reasons underpinning those opinions, for otherwise we cannot fully apprehend them. If you are, say, running a political campaign, you would want to know what people think of you and why. You would probably also want to do this analysis across all the events that are even marginally related to your campaign, and you would want to do it on a large scale to get the “totality of the experience.”
3 Challenges

Besides the challenges mentioned in the introduction, there are a number of additional challenges involved in argumentation mining from user-generated text:
Noisy text. Baldwin et al. [Baldwin et al.2013] demonstrate that social media sources are noisier than edited texts, although they can to some extent be cleaned up using NLP techniques;
Vague claims. It is probably safe to say that the majority of online users do not see a need to present a well-formed argument for their position. As a consequence, the claims made by users will often be unclear, ambiguous, vague, or simply poorly worded. This is the case even in more discussion-oriented environments, such as online debate platforms.
Vague argument structure. Again, because users rarely feel the need to argue for their position, most user-generated opinionated text will not constitute a properly structured argument. This is especially true of short texts, such as microblogging posts. Even when there are some traces of an argumentative structure, it will likely be incomplete and lack important premises.
4 Argumentation Mining Tasks

A number of argumentation mining tasks have been proposed in the literature. The two main ones are:
Component identification – the task of detecting the premises and conclusion of an argument, as found in a text or discourse;
Relation prediction – identifying the relations between components.
In a recent study on user-generated social media texts, Habernal and Gurevych [Habernal and Gurevych2016] showed that a (slightly modified) Toulmin model of argumentation may be suitable for short documents, such as article comments or forum posts. They annotated the claim, premise, backing, rebuttal, and refutation components, achieving moderate inter-annotator agreement. They used sequence labeling to tackle the component identification task, reaching a token-level F1-score of 0.25.
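To make the evaluation measure concrete, token-level F1 over BIO-style component labels can be computed as in the following minimal sketch. The scoring convention (any non-"O" token counts as positive, label mismatches count as both false positives and false negatives) and the example sequences are invented for illustration, not taken from the cited study:

```python
# Token-level F1 for argument component identification over BIO tags.
# A token counts as a true positive only if its non-"O" gold tag is
# predicted exactly; mismatched non-"O" predictions count as false
# positives, and missed or mislabeled gold tags as false negatives.
def token_f1(gold, pred):
    tp = sum(1 for g, p in zip(gold, pred) if g != "O" and g == p)
    fp = sum(1 for g, p in zip(gold, pred) if p != "O" and g != p)
    fn = sum(1 for g, p in zip(gold, pred) if g != "O" and g != p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = ["B-Claim", "I-Claim", "O", "B-Premise", "I-Premise"]
pred = ["B-Claim", "I-Claim", "O", "O",         "B-Premise"]
print(round(token_f1(gold, pred), 2))  # → 0.57
```

A token-level score like this is lenient about component boundaries, which partly explains why even low absolute scores can reflect useful partial detection.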
Component identification and relation prediction are without a doubt relevant argumentation mining tasks. However, coming back to the political campaign example, it is not immediately obvious how these tasks can aid in analyzing the reasons underpinning the opinions, especially when dealing with large volumes of data. To analyze arguments on a large scale, it seems that we at least need to:
Identify the main arguments – identify the main (central, most prominent, most often used) arguments that users put forward when discussing a certain topic. An argument here means a claim and a (possibly convergent) set of premises supporting it;
Classify opinionated posts – given an opinionated post, identify its main arguments.
Consider again the Trump rally example from above. The post may be classified as belonging to the main argument “Donald Trump would make a good president”. The main claim is “Donald Trump will change the foreign policy for the better”, while the supporting premises may be “Existing foreign policy is bad”, “Trump is not afraid to take on the Establishment”, etc.
Given a large enough amount of user-generated opinionated data, there seem to be at least two ways in which the main arguments could be identified. First, the arguments could be extracted manually. This is essentially what we have done in [Boltužić and Šnajder2014], where we used the main claims distilled from an online debating platform. Similarly, Hasan and Ng [Hasan and Ng2014] asked annotators to group the user comments and identify the main claims. The second option is to resort to unsupervised machine learning and try to induce the main arguments (or at least the main claims) automatically, in a bottom-up fashion. A middle-ground solution, proposed by Sobhani et al. [Sobhani et al.2015], is to use unsupervised machine learning to induce the argument clusters, and then map those clusters manually to main arguments.
From a machine learning perspective, the two above-mentioned tasks may be framed as follows:
Argument clustering – grouping of similar arguments, so that the main arguments/claims can be identified;
Argument classification – given an opinionated comment, classify it into one or many classes, each corresponding to one main argument (obtained either manually or using argument clustering).
In [Boltužić and Šnajder2015], we tackled the former task and investigated the suitability of semantic textual similarity (STS) [Agirre et al.2012] for clustering the main claims. Our conclusion was that fully automatic argument clustering is hardly feasible; however, we hypothesized that it might prove valuable in a computer-aided or semi-supervised argumentation mining setup.
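As a rough illustration of what such similarity-based clustering might look like, the sketch below greedily groups claims whose pairwise similarity exceeds a threshold. Plain Jaccard word overlap stands in for a proper STS model, and the claims and the threshold value are invented for illustration:

```python
# Greedy single-pass clustering of claims by textual similarity.
# Jaccard word overlap is a crude stand-in for an STS model.
def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def cluster_claims(claims, threshold=0.3):
    clusters = []  # each cluster is a list of claims; clusters[i][0] is its representative
    for claim in claims:
        best = max(clusters, key=lambda c: jaccard(claim, c[0]), default=None)
        if best is not None and jaccard(claim, best[0]) >= threshold:
            best.append(claim)  # join the most similar existing cluster
        else:
            clusters.append([claim])  # start a new cluster
    return clusters

claims = [
    "marijuana should be taxed and regulated",
    "marijuana should be regulated and taxed by the state",
    "legalization will increase use among teens",
]
print(cluster_claims(claims))  # first two claims group together, third stands alone
```

In a computer-aided setup, a human annotator would then label each resulting cluster with a main claim, rather than reading every comment.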
We tackled the task of argument classification in [Boltužić and Šnajder2014], under the name “argument recognition”, while Hasan and Ng [Hasan and Ng2014] tackled the same task in the context of stance detection, under the name “reason classification”. The main difference is that Hasan and Ng frame the problem as a (joint-learning) supervised text classification task with lexical features, which makes their model topic-specific. In other words, their model learns to classify the user comments into classes that correspond to main claims, without explicitly comparing the user comments against the main claims. In contrast, in [Boltužić and Šnajder2014] we modeled the similarity between the user comments and the main claims using STS and textual entailment predictions and fed these to a supervised model. At least in principle, this should make the model topic-independent. The model outperformed the baseline, although not by a large margin.
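The similarity-based variant of argument classification can be sketched as a nearest-claim assignment. In the sketch below, cosine similarity over raw word counts stands in for the STS and entailment features used in the actual work, and the main claims and comment are invented examples:

```python
from collections import Counter
from math import sqrt

# Nearest-claim classification: assign an opinionated comment to the
# most similar main claim. Cosine over bag-of-words counts is a crude
# stand-in for STS / textual entailment features.
def cos(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(comment, main_claims):
    return max(main_claims, key=lambda claim: cos(comment, claim))

main_claims = [
    "Legalized marijuana can be controlled and regulated by the government",
    "Marijuana is harmful to health",
]
comment = "if it were legal the government could regulate and tax it"
print(classify(comment, main_claims))
```

Because the comment is compared against the claims rather than memorized lexically, the same procedure can in principle be applied to a new topic simply by swapping the list of main claims.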
5 Argument Similarity
What the two tasks above have in common (with the exception of argument classification using lexical features, which is topic-specific and hence arguably the least practical approach) is the requirement to compute the similarity between two arguments. Argument similarity was introduced in [Boltužić and Šnajder2015], as well as in [Swanson et al.2015, Misra et al.2015] under the name argument facet similarity. Intuitively, a pair of arguments should receive the highest score if they mean the same thing, and the lowest score if they are completely different and concern a different topic. Ideally, argument similarity would account for both the similarity of argument components and the similarity of argument structures (how the components relate to each other).
Misra et al. [Misra et al.2015] consider the similarity between main claims expressed in user-generated arguments. They develop a regression model using a number of comparison features, including STS. Their model, trained on human-annotated pairs of claims, reaches a correlation score of 0.54, outperforming a sensible baseline. In contrast, in [Boltužić and Šnajder2015] we modeled argument similarity on the dataset from [Hasan and Ng2014] in an unsupervised fashion, using word embedding representations.
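An unsupervised, embedding-based argument similarity of this general kind can be sketched as follows: each argument is represented as the average of its word vectors, and two arguments are compared by cosine similarity. The tiny three-dimensional vectors below are invented toy values standing in for real word embeddings:

```python
from math import sqrt

# Toy 3-d "embeddings" (invented values, not real trained vectors).
EMB = {
    "legalize": [0.9, 0.1, 0.0], "legal": [0.8, 0.2, 0.1],
    "tax": [0.1, 0.9, 0.1], "taxed": [0.1, 0.8, 0.2],
    "regulate": [0.2, 0.7, 0.3],
}

def embed(text):
    """Average the embeddings of in-vocabulary words; zero vector if none."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(3)]

def similarity(arg1, arg2):
    """Cosine similarity between averaged-embedding representations."""
    a, b = embed(arg1), embed(arg2)
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (na * nb) if na and nb else 0.0

print(round(similarity("legalize and tax it", "legal sales are taxed"), 2))
```

Averaging discards word order and argument structure entirely, which already hints at why such representations offer only a limited handle on argument similarity.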
6 Nano-level Argument Processing
The work cited above seems to indicate that existing approaches to measuring STS provide only limited means of measuring argument similarity. Consider the following examples on the marijuana legalization topic from the [Hasan and Ng2014] dataset:
Comment 1: Legalizing marijuana could potentially lower the number of users.
Comment 2: Now it is not taxed, and those who sell it are usually criminals of some sort (though many are harmless).
Main claim: Legalized marijuana can be controlled and regulated by the government.
In this case, both comments have been classified by human annotators as essentially expressing the main claim (one of the main claims identified by analyzing the complete dataset). However, at first glance it is not obvious how these two comments are similar to each other or to the main claim. It is also very unlikely that they would be predicted as similar by an STS system, given the large semantic gap between them. However, assuming that the main claim is indeed the best fit, most people would probably be able to come up with sets of implicit premises holding between each of the two comments and the main claim. For instance, the following premises link Comment 2 to the main claim:
If a thing is not taxed, criminals can sell it.
Criminals should be stopped from selling things.
Things that are taxed are controlled and regulated by the government.
Current approaches to argument similarity are not generative in nature and cannot generate a chain of implicit premises. The task seems related to what Lippi and Torroni [Lippi and Torroni2015] refer to as the completion task: the task of inferring implicit argument components. Alternatively, if we take micro-level argumentation to focus on the components of a single argument, then the functionality of inferring the similarity between two arguments can perhaps be dubbed nano-level argumentation. While this task has apparently not yet been addressed in the literature, it seems to be a necessary ingredient of an argumentation mining system capable of analyzing user-generated arguments on a large scale.
7 Conclusion

Social media argumentation mining allows us to understand the reasons underpinning user opinions. However, mining opinionated comments poses a number of challenges related to the informal and unstructured nature of user-generated text. Analyzing such text on a large scale calls for the ability to compute the similarity of arguments, either to identify the main claims or to classify the arguments by their main claims. A principled solution will probably have to operate at the nano-level of argumentation, i.e., infer (rather than merely measure) the similarity between two claims.
- [Agirre et al.2012] Eneko Agirre, Mona Diab, Daniel Cer, and Aitor Gonzalez-Agirre. 2012. SemEval-2012 task 6: A pilot on semantic textual similarity. In Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pages 385–393.
- [Baldwin et al.2013] Timothy Baldwin, Paul Cook, Marco Lui, Andrew MacKinlay, and Li Wang. 2013. How noisy social media text, how diffrnt social media sources? In IJCNLP, pages 356–364.
- [Boltužić and Šnajder2014] Filip Boltužić and Jan Šnajder. 2014. Back up your stance: Recognizing arguments in online discussions. In Proceedings of the First Workshop on Argumentation Mining, pages 49–58.
- [Boltužić and Šnajder2015] Filip Boltužić and Jan Šnajder. 2015. Identifying prominent arguments in online debates using semantic textual similarity. In Proceedings of the 2nd Workshop on Argumentation Mining, pages 110–115.
- [Cabrio and Villata2012] Elena Cabrio and Serena Villata. 2012. Combining textual entailment and argumentation theory for supporting online debates interactions. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2, pages 208–212.
- [Ghosh et al.2014] Debanjan Ghosh, Smaranda Muresan, Nina Wacholder, Mark Aakhus, and Matthew Mitsui. 2014. Analyzing argumentative discourse units in online interactions. In Proceedings of the First Workshop on Argumentation Mining, pages 39–48.
- [Habernal and Gurevych2016] Ivan Habernal and Iryna Gurevych. 2016. Argumentation mining in user-generated web discourse. arXiv preprint arXiv:1601.02403.
- [Habernal et al.2014] Ivan Habernal, Judith Eckle-Kohler, and Iryna Gurevych. 2014. Argumentation mining on the web from information seeking perspective. In Frontiers and Connections between Argumentation Theory and Natural Language Processing.
- [Hasan and Ng2014] Kazi Saidul Hasan and Vincent Ng. 2014. Why are you taking this stance? Identifying and classifying reasons in ideological debates. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 751–762.
- [Jiménez-Aleixandre and Erduran2007] María Pilar Jiménez-Aleixandre and Sibel Erduran. 2007. Argumentation in science education: An overview. In Argumentation in Science Education, pages 3–27. Springer.
- [Lippi and Torroni2015] Marco Lippi and Paolo Torroni. 2015. Argumentation mining: State of the art and emerging trends. ACM Transactions on Internet Technology. In press.
- [Misra et al.2015] Amita Misra, Pranav Anand, Jean E. Fox Tree, and Marilyn A. Walker. 2015. Using summarization to discover argument facets in online idealogical dialog. In NAACL HLT, pages 430–440.
- [Moens2014] Marie-Francine Moens. 2014. Argumentation mining: Where are we now, where do we want to be and how do we get there? In Post-proceedings of the forum for information retrieval evaluation (FIRE 2013).
- [Park and Cardie2014] Joonsuk Park and Claire Cardie. 2014. Identifying appropriate support for propositions in online user comments. ACL 2014, pages 29–38.
- [Sobhani et al.2015] Parinaz Sobhani, Diana Inkpen, and Stan Matwin. 2015. From argumentation mining to stance classification. NAACL HLT 2015, page 67.
- [Swanson et al.2015] Reid Swanson, Brian Ecker, and Marilyn Walker. 2015. Argument mining: Extracting arguments from online dialogue. In 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, page 217.
- [Walton2005] Douglas Walton. 2005. Argumentation methods for artificial intelligence in law. Springer.