The Role of Pragmatic and Discourse Context in Determining Argument Impact

04/06/2020 ∙ by Esin Durmus, et al. ∙ Amazon ∙ Cornell University

Research in the social sciences and psychology has shown that the persuasiveness of an argument depends not only on the language employed, but also on attributes of the source/communicator, the audience, and the appropriateness and strength of the argument's claims given the pragmatic and discourse context of the argument. Among these characteristics of persuasive arguments, prior work in NLP does not explicitly investigate the effect of the pragmatic and discourse context when determining argument quality. This paper presents a new dataset to initiate the study of this aspect of argumentation: it consists of a diverse collection of arguments covering 741 controversial topics and comprising over 47,000 claims. We further propose predictive models that incorporate the pragmatic and discourse context of argumentative claims and show that they outperform models that rely only on claim-specific linguistic features for predicting the perceived impact of individual claims within a particular line of argument.


1 Introduction

Previous work in the social sciences and psychology has shown that the impact and persuasive power of an argument depends not only on the language employed, but also on the credibility and character of the communicator (i.e. ethos) Miller et al. (1976); Chaiken (1979, 1980); the traits and prior beliefs of the audience G. Lord et al. (1979); Davies (1998); Correll et al. (2004); Hullett (2005); and the pragmatic context in which the argument is presented (i.e. kairos) Haugtvedt and Wegener (1994); Joyce and Harwood (2014).

Research in Natural Language Processing (NLP) has only partially corroborated these findings. One very influential line of work, for example, develops computational methods to automatically determine the linguistic characteristics of persuasive arguments Habernal and Gurevych (2016); Tan et al. (2016); Zhang et al. (2016), but it does so without controlling for the audience, the communicator, or the pragmatic context.

Very recent work, on the other hand, shows that attributes of both the audience and the communicator constitute important cues for determining argument strength Lukin et al. (2017); Durmus and Cardie (2018). They further show that audience and communicator attributes can influence the relative importance of linguistic features for predicting the persuasiveness of an argument. These results confirm previous findings in the social sciences that show a person’s perception of an argument can be influenced by his background and personality traits.

To the best of our knowledge, however, no NLP studies explicitly investigate the role of kairos — a component of pragmatic context that refers to the context-dependent “timeliness” and “appropriateness” of an argument and its claims within an argumentative discourse — in argument quality prediction. Among the many social science studies of attitude change, the order in which argumentative claims are shared with the audience has been studied extensively: Haugtvedt and Wegener (1994), for example, summarize studies showing that the argument-related claims a person is exposed to beforehand can affect their perception of an alternative argument in complex ways. Joyce and Harwood (2014) similarly find that changes in an argument’s context can have a big impact on the audience’s perception of the argument.

Some recent studies in NLP have investigated the effect of interactions on the overall persuasive power of posts in social media Tan et al. (2016); Hidey and McKeown (2018). However, in social media not all posts have to express arguments or stay on topic Rakshit et al. (2017), and qualitative evaluation of the posts can be influenced by many other factors such as interactions between the individuals Durmus and Cardie (2019). Therefore, it is difficult to measure the effect of argumentative pragmatic context alone in argument quality prediction without the effect of these confounding factors using the datasets and models currently available in this line of research.

In this paper, we study the role of kairos on argument quality prediction by examining the individual claims of an argument for their timeliness and appropriateness in the context of a particular line of argument. We define kairos as the sequence of argumentative text (e.g. claims) along a particular line of argumentative reasoning.

To start, we present a dataset extracted from kialo.com of over 47,000 claims that are part of a diverse collection of arguments on 741 controversial topics. The structure of the website dictates that each argument must present a supporting or opposing claim for its parent claim, and stay within the topic of the main thesis. Rather than being posts on a social media platform, these are community-curated claims. Furthermore, for each presented claim, the audience votes on its impact within the given line of reasoning. Critically then, the dataset includes the argument context for each claim, allowing us to investigate the characteristics associated with impactful arguments.

With the dataset in hand, we propose the task of studying the characteristics of impactful claims by (1) taking the argument context into account, (2) studying the extent to which this context is important, and (3) determining the representation of context that is more effective. To the best of our knowledge, ours is the first dataset that includes claims with both impact votes and the corresponding context of the argument.

Figure 1: Example partial argument tree with claims and corresponding impact votes for the thesis “Physical torture of prisoners is an acceptable interrogation tool.”.

2 Related Work

Recent studies in computational argumentation have mainly focused on identifying the structure of arguments, such as argument structure parsing Peldszus and Stede (2015); Park and Cardie (2014) and argument component classification Habernal and Gurevych (2017); Mochales and Moens (2011). More recently, there has been increased research interest in developing computational methods that can automatically evaluate qualitative characteristics of arguments, such as their impact and persuasive power Habernal and Gurevych (2016); Tan et al. (2016); Kelman (1961); Burgoon et al. (1975); Chaiken (1987); Tykocinskl et al. (1994); Dillard and Pfau (2002); Cialdini (2007); Durik et al. (2008); Marquart and Naderer (2016). Consistent with findings in the social sciences and psychology, some of the work in NLP has shown that the impact and persuasive power of arguments depend not only on the linguistic characteristics of the language employed, but also on characteristics of the source (ethos) Durmus and Cardie (2019) and of the audience Lukin et al. (2017); Durmus and Cardie (2018). These studies suggest that the perception of an argument can be influenced by the credibility of the source and the background of the audience.

It has also been shown in social science studies that kairos, which refers to the “timeliness” and “appropriateness” of arguments and claims, is important to consider in studies of argument impact and persuasiveness Haugtvedt and Wegener (1994); Joyce and Harwood (2014). One recent NLP study Hidey and McKeown (2018) has investigated the role of argument sequencing in argument persuasion on Change My View (https://www.reddit.com/r/changemyview/), a social media platform where users post their views and challenge other users to present arguments in an attempt to change them. However, as stated by Rakshit et al. (2017), many posts on social media platforms either do not express an argument or diverge from the main topic of conversation. It is therefore difficult to measure the effect of pragmatic context on argument impact and persuasion without confounding factors introduced by noisy social media data. In contrast, we provide a dataset of claims along with their structured argument path, which consists only of claims and corresponds to a particular line of reasoning for the given controversial topic. This structure enables us to study the characteristics of impactful claims while accounting for the effect of the pragmatic context.

Consistent with previous findings in the social sciences, we find that incorporating pragmatic and discourse context is important in computational studies of persuasion: predictive models that include a representation of the context outperform models that use only claim-specific linguistic features when predicting the impact of a claim. A system that can predict the impact of a claim given an argumentative discourse could, for example, be employed by argument retrieval and generation models that aim to pick or generate the most appropriate claim given the discourse.

3 Dataset

Claims and impact votes. We collected 47,219 claims from kialo.com for 741 controversial topics, along with their corresponding impact votes. (The data is collected from this website in accordance with its terms and conditions. There is prior work by Durmus et al. (2019) that created a dataset of argument trees from kialo.com; that dataset, however, does not include any impact labels.) Impact votes are provided by the users of the platform to evaluate how impactful a particular claim is. Users can pick one of five possible impact labels for a particular claim: no impact, low impact, medium impact, high impact, and very high impact. While evaluating the impact of a claim, users have access to the full argument context and can therefore assess how impactful a claim is in the given context of an argument. An interesting observation is that, in this dataset, the same claim can have different impact labels depending on the context in which it is presented.

Figure 1 shows a partial argument tree for the argument thesis “Physical torture of prisoners is an acceptable interrogation tool.”. Each node in the argument tree corresponds to a claim, and these argument trees are constructed and edited collaboratively by the users of the platform.

Except for the thesis, every claim in the argument tree either opposes or supports its parent claim. Each path from the root to a leaf node corresponds to an argument path, which represents a particular line of reasoning on the given controversial topic.

Moreover, each claim has impact votes assigned by the users of the platform. The impact votes evaluate how impactful a claim is within its context, which consists of its predecessor claims starting from the thesis of the tree. For example, claim O1 “It is morally wrong to harm a defenseless person” is an opposing claim for the thesis, and it is an impactful claim since most of its impact votes fall in the very high impact category. In contrast, claim S3 “It is illegitimate for state actors to harm someone without due process” is a supporting claim for its parent O1, and it is a less impactful claim since most of its impact votes fall in the no impact and low impact categories.
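To make this structure concrete, the following minimal sketch shows one way the claims, their stances, impact votes, and argument paths could be represented; the class and field names are illustrative assumptions rather than the released data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

IMPACT_LABELS = ["no impact", "low impact", "medium impact",
                 "high impact", "very high impact"]

@dataclass
class Claim:
    claim_id: str
    text: str
    stance: Optional[str] = None        # "support" or "oppose"; None for the thesis
    parent_id: Optional[str] = None     # None for the thesis (root of the tree)
    impact_votes: Dict[str, int] = field(default_factory=dict)

def argument_path(claim_id: str, claims: Dict[str, Claim]) -> List[Claim]:
    """Claims from the thesis (root) down to the given claim; the claims that
    precede the target claim on this path form its context."""
    path, node = [], claims[claim_id]
    while node is not None:
        path.append(node)
        node = claims[node.parent_id] if node.parent_id is not None else None
    return list(reversed(path))
```

Under this representation, the context of a claim is simply every claim on its argument path except the claim itself.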

# impact votes # claims
4,495
5,405
5,338
2,093
934
992
255
Table 1: Number of claims for the given ranges of impact-vote counts. Of the 19,512 claims covered by the table, the majority have 5 or more votes.
Agreement score   # claims (3-class)   # claims (5-class)
10,848 7,304
7,386 4,329
4,412 2,195
2,068 840
Table 2: Number of claims, with at least 5 votes, above the given thresholds of agreement percentage for the 3-class and 5-class cases. When we combine no impact with low impact and high impact with very high impact (the 3-class case), there are more claims with a high agreement score.
Impact label   # votes (all claims)
No impact   32,681
Low impact   37,457
Medium impact   60,136
High impact   52,764
Very high impact   58,846
Total # votes   241,884
Table 3: Number of votes for each impact label. There are 241,884 votes in total, and the majority of them belong to the medium impact category.

Distribution of impact votes. Table 1 shows the distribution of claims over ranges of impact-vote counts. The table covers 19,512 claims in total, and the majority of them have 5 or more votes. We limit our study to the claims with at least 5 votes to have a more reliable assignment of the accumulated impact label for each claim.

Context length   # claims
1,524
1,977
1,181
1,436
1,115
153
Table 4: Number of claims for the given ranges of context length, for claims with at least 5 votes and an agreement score greater than 60%.

Impact label statistics. Table 3 shows the distribution of the number of votes for each of the impact categories. The claims have 241,884 votes in total, and the majority of the impact votes belong to the medium impact category. We observe that users assign more high impact and very high impact votes than low impact and no impact votes, respectively. When we restrict the claims to those with at least 5 impact votes, we have 213,277 votes in total (26,998 no impact, 33,789 low impact, 55,616 medium impact, 47,494 high impact, and 49,380 very high impact).

Agreement for the impact votes. To determine the agreement in assigning the impact label for a particular claim, we compute, for each claim, the percentage of votes that match the majority impact vote for that claim. Let $v_i$ denote the number of votes a claim received for the impact label $c_i$, where $C = [\text{no impact}, \text{low impact}, \text{medium impact}, \text{high impact}, \text{very high impact}]$ and $c_i$ is the label at index $i$. The agreement score for the claim is then

$$a = \frac{\max_i v_i}{\sum_{i=1}^{|C|} v_i} \qquad (1)$$

For example, for claim S1 in Figure 1, the agreement score is the fraction of its votes that fall in the majority class (no impact) out of all the impact votes for that claim. We compute the agreement score for two cases: (1) treating each impact label separately (5-class case) and (2) combining high impact and very high impact into one class, impactful, and no impact and low impact into one class, not impactful (3-class case).

Table 2 shows the number of claims above the given agreement-score thresholds when we include claims with at least 5 votes. We see that when we combine no impact with low impact and high impact with very high impact, there are more claims with a high agreement score. This may imply that distinguishing between the no impact-low impact and the high impact-very high impact classes is difficult. To reduce sparsity, we use the 3-class representation of the impact labels in our experiments. Moreover, to have a more reliable assignment of impact labels, we consider only the claims with more than 60% agreement.
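The agreement computation, the 3-class mapping, and the filtering described above can be summarized in a short sketch; the function names are illustrative, with the 5-vote and 60%-agreement cutoffs taken from the preprocessing used in our experiments.

```python
from typing import Dict

THREE_CLASS = {
    "no impact": "not impactful", "low impact": "not impactful",
    "medium impact": "medium impact",
    "high impact": "impactful", "very high impact": "impactful",
}

def agreement_score(votes: Dict[str, int]) -> float:
    """Fraction of votes that match the majority impact label for a claim (Eq. 1)."""
    total = sum(votes.values())
    return max(votes.values()) / total if total else 0.0

def collapse_to_three_classes(votes: Dict[str, int]) -> Dict[str, int]:
    """Combine no/low impact into 'not impactful' and high/very high into 'impactful'."""
    collapsed: Dict[str, int] = {}
    for label, count in votes.items():
        collapsed[THREE_CLASS[label]] = collapsed.get(THREE_CLASS[label], 0) + count
    return collapsed

def keep_claim(votes: Dict[str, int], min_votes: int = 5, min_agreement: float = 0.6) -> bool:
    """Keep claims with at least 5 votes and more than 60% agreement."""
    return sum(votes.values()) >= min_votes and agreement_score(votes) > min_agreement
```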

Context. In an argument tree, the claims from the thesis node (root) to each leaf node, form an argument path. This argument path represents a particular line of reasoning for the given thesis. Similarly, for each claim, all the claims along the path from the thesis to the claim, represent the context for the claim. For example, in Figure 1, the context for O1 consists of only the thesis, whereas the context for S3 consists of both the thesis and O1 since S3 is provided to support the claim O1 which is an opposing claim for the thesis.

The claims are not constructed independently of their context, since they are written with the line of reasoning so far in mind. In most cases, each claim elaborates on the point made by its parent and presents cases to support or oppose the parent claim’s points. Similarly, when users evaluate the impact of a claim, they consider whether the claim is timely and appropriate given its context. There are cases in the dataset where the same claim has different impact labels when presented within a different context. Therefore, we claim that it is not sufficient to study only the linguistic characteristics of a claim to determine its impact; it is also necessary to consider its context.

Context length for a particular claim C is defined as the number of claims on the argument path from the thesis up to claim C. For example, in Figure 1, the context lengths for O1 and S3 are 1 and 2, respectively. Table 4 shows the number of claims in each range of context length, for claims with at least 5 votes and more than 60% agreement. We observe that more than half of these claims fall into the larger context-length ranges.

4 Methodology

4.1 Hypothesis and Task Description

Similar to prior work, our aim is to understand the characteristics of impactful claims in argumentation. However, we hypothesize that the qualitative characteristics of arguments are not independent of the context in which they are presented. To understand the relationship between argument context and the impact of a claim, we incorporate the context along with the claim itself in our predictive models.

Prediction task. Given a claim, we want to predict the impact label that is assigned to it by the users: not impactful, medium impact, or impactful.

Preprocessing. We restrict our study to claims with at least 5 votes and more than 60% agreement, to have a reliable impact-label assignment. There are 7,386 claims in the dataset satisfying these constraints (1,633 not impactful, 1,445 medium impact, and 4,308 impactful). The class impactful is the majority class, since around 58% of the claims belong to this category.

For our experiments, we split our data into train (70%), validation (15%), and test (15%) sets.
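A minimal sketch of the 70/15/15 split; whether the split was stratified by label and which random seed was used are assumptions.

```python
from sklearn.model_selection import train_test_split

def split_claims(claims, labels, seed=42):
    """70/15/15 train/validation/test split (stratification by label is an assumption)."""
    train_x, rest_x, train_y, rest_y = train_test_split(
        claims, labels, test_size=0.30, random_state=seed, stratify=labels)
    val_x, test_x, val_y, test_y = train_test_split(
        rest_x, rest_y, test_size=0.50, random_state=seed, stratify=rest_y)
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```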

4.2 Baseline Models

4.2.1 Majority

The majority baseline assigns the most common label of the training examples (impactful) to every test example.

4.2.2 SVM with RBF kernel

Similar to Habernal and Gurevych (2016), we experiment with SVM with RBF kernel, with features that represent (1) the simple characteristics of the argument tree and (2) the linguistic characteristics of the claim.

The features that represent the simple characteristics of the claim’s argument tree include the distance and similarity of the claim to the thesis, the similarity of a claim with its parent, and the impact votes of the claim’s parent claim. We encode the similarity of a claim to its parent and the thesis claim with the cosine similarity of their tf-idf vectors. The distance and similarity metrics aim to model whether claims which are more similar (i.e. potentially more topically relevant) to their parent claim or the thesis claim, are more impactful.

We encode the quality of the parent claim as the number of votes for each impact class, and incorporate it as a feature to understand whether a claim is more likely to be impactful given an impactful parent claim.
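The tree-based features can be sketched as follows; the helper names and feature ordering are illustrative, with similarities computed as cosine similarity of tf-idf vectors as described above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

IMPACT_LABELS = ["no impact", "low impact", "medium impact",
                 "high impact", "very high impact"]

def tree_features(claim_text, parent_text, thesis_text, depth, parent_votes, vectorizer):
    """Similarity to the parent and thesis, distance (depth) from the thesis,
    and the parent's per-class impact-vote counts, as one feature vector."""
    vecs = vectorizer.transform([claim_text, parent_text, thesis_text])
    sim_parent = cosine_similarity(vecs[0], vecs[1])[0, 0]
    sim_thesis = cosine_similarity(vecs[0], vecs[2])[0, 0]
    parent_counts = [parent_votes.get(label, 0) for label in IMPACT_LABELS]
    return np.array([sim_parent, sim_thesis, depth] + parent_counts, dtype=float)

# vectorizer = TfidfVectorizer().fit(training_claim_texts)  # fit on training claims only
```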

Linguistic features. To represent each claim, we extract the linguistic features proposed by Habernal and Gurevych (2016), such as tf-idf scores for unigrams and bigrams, the ratio of quotation marks, exclamation marks, modal verbs and stop words, type-token ratio, hedging Hyland (1998), named entity types, POS n-grams, sentiment Hutto and Gilbert (2014) and subjectivity scores Wilson et al. (2005), spell-checking, readability features such as Coleman-Liau Coleman and Liau (1975) and Flesch Flesch (1948), argument lexicon features Somasundaran et al. (2007), and surface features such as word lengths, sentence lengths, word types, and the number of complex words. We pick the parameters of the SVM model according to performance on the validation split, and report results on the test split.
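A sketch of the SVM-with-RBF-kernel classifier over these features; the hyperparameter grid is an assumption, tuned on the validation split as the section notes.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Pipeline for the SVM-with-RBF-kernel baseline; with_mean=False keeps sparse
# tf-idf features usable, and the grid below is an illustrative assumption.
svm = make_pipeline(StandardScaler(with_mean=False), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 1e-2, 1e-3]}
search = GridSearchCV(svm, param_grid, scoring="f1_macro", cv=3)
# search.fit(X_train, y_train)    # features built from the tree + linguistic features above
# print(search.best_params_, search.score(X_test, y_test))
```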

4.2.3 FastText

Joulin et al. (2017) introduced a simple yet effective baseline for text classification, which they show to be competitive with deep learning classifiers in terms of accuracy. Their method represents a sequence of text as a bag of n-grams, and each n-gram is passed through a look-up table to get its dense vector representation. The overall sequence representation is simply an average over the dense representations of the bag of n-grams, and is fed into a linear classifier to predict the label. We use the code released by Joulin et al. (2017) to train a classifier for argument impact prediction based on the claim text. We used a maximum n-gram length of 2, a learning rate of 0.8, 15 epochs, and a vector dimension of 300, and we initialized with the pre-trained 300-dimensional wiki-news vectors made available on the fastText website.
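A sketch of this baseline using the fastText Python API with the hyperparameters listed above; the training-file name and the "__label__<class> <claim text>" formatting are assumptions about preprocessing.

```python
import fasttext

# fastText baseline: each line of the training file is "__label__<class> <claim text>".
model = fasttext.train_supervised(
    input="train_claims.txt",
    lr=0.8, epoch=15, wordNgrams=2, dim=300,
    pretrainedVectors="wiki-news-300d-1M.vec",
)
labels, probs = model.predict("It is morally wrong to harm a defenseless person.")
```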

4.2.4 BiLSTM with Attention

Another effective baseline for text classification Zhou et al. (2016); Yang et al. (2016) consists of encoding the text sequence using a bidirectional Long Short-Term Memory (LSTM) network Hochreiter and Schmidhuber (1997) to get the token representations in context, and then attending over the tokens Luong et al. (2015) to get the sequence representation. For the attention query vector, we use a learned context vector, similar to Yang et al. (2016). We picked our hyperparameters based on performance on the validation set and report results for the best set of hyperparameters: 100-dimensional word embeddings, a 100-dimensional context vector, and a 1-layer BiLSTM with 64 units, trained for 40 epochs with early stopping based on validation performance. We initialized our word embeddings with GloVe vectors Pennington et al. (2014) pre-trained on Wikipedia + Gigaword, and used the Adam optimizer Kingma and Ba (2015) with its default settings.
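A PyTorch sketch of this baseline with the dimensions listed above; vocabulary handling, padding, and the training loop are omitted, and variable names are illustrative.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """BiLSTM encoder with attention over tokens, scored against a learned context vector."""
    def __init__(self, vocab_size, embed_dim=100, hidden=64, ctx_dim=100, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, ctx_dim)       # project token states before scoring
        self.query = nn.Parameter(torch.randn(ctx_dim))  # learned context vector (attention query)
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):                         # token_ids: (batch, seq_len)
        states, _ = self.lstm(self.embed(token_ids))      # (batch, seq_len, 2*hidden)
        scores = torch.tanh(self.proj(states)) @ self.query   # (batch, seq_len)
        weights = torch.softmax(scores, dim=1).unsqueeze(-1)
        pooled = (weights * states).sum(dim=1)             # attention-weighted sum over tokens
        return self.out(pooled)
```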

Table 5: Macro precision, recall, and F1 for the baselines (majority; SVM with RBF kernel using distance from the thesis, parent quality, and linguistic features; BiLSTM with attention; FastText) and for the BERT models with and without context (claim only, claim + parent, and claim + increasingly long preceding context). The best-performing model is BERT with the flat representation of the previous claims on the argument path along with the claim representation itself. We run the models multiple times and report the mean and standard deviation.

4.3 Fine-tuned BERT model

Devlin et al. (2018) fine-tuned a pre-trained deep bidirectional transformer language model (which they call BERT) by adding a simple classification layer on top, and achieved state-of-the-art results across a variety of NLP tasks. We employ their pre-trained language models for our task and compare them to our baseline models. For all the architectures described below, we fine-tune for 10 epochs with a learning rate of 2e-5, and employ an early stopping procedure based on the model's performance on the validation set.

4.3.1 Claim with no context

In this setting, we attempt to classify the impact of the claim, based on the text of the claim only. We follow the fine-tuning procedure for sequence classification detailed in Devlin et al. (2018), and input the claim text as a sequence of tokens preceded by the special [CLS] token and followed by the special [SEP] token. We add a classification layer on top of the BERT encoder, to which we pass the representation of the [CLS] token, and fine-tune this for argument impact prediction.
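A minimal sketch of this setup using the Hugging Face transformers implementation of BERT; the toy batch and single training step are illustrative, not the authors' code.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Claim-only fine-tuning sketch for the 3 impact classes.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)   # learning rate from the paper

claims = ["It is morally wrong to harm a defenseless person."]
labels = torch.tensor([2])        # 0 = not impactful, 1 = medium impact, 2 = impactful
batch = tokenizer(claims, padding=True, truncation=True, return_tensors="pt")

loss = model(**batch, labels=labels).loss   # [CLS] representation feeds the classification layer
loss.backward()
optimizer.step()
optimizer.zero_grad()
```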

4.3.2 Claim with parent representation

In this setting, we use the parent claim’s text, in addition to the target claim text, in order to classify the impact of the target claim. We treat this as a sequence pair classification task, and combine both the target claim and parent claim as a single sequence of tokens, separated by the special separator [SEP]. We then follow the same procedure above, for fine-tuning.
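A sketch of the corresponding sequence-pair encoding; whether the target claim or the parent appears first in the pair is an assumption.

```python
from transformers import BertTokenizer

# The tokenizer produces [CLS] claim [SEP] parent [SEP], assigning one segment id to the
# first text and the other to the second; fine-tuning proceeds as in the claim-only setup.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
pair = tokenizer(
    "It is illegitimate for state actors to harm someone without due process.",  # target claim
    "It is morally wrong to harm a defenseless person.",                          # parent claim
    return_tensors="pt",
)
```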

4.3.3 Incorporating larger context

In this setting, we consider incorporating a larger context from the discourse, in order to assess the impact of a claim. In particular, we consider up to four previous claims in the discourse (for a total context length of 5). We attempt to incorporate larger context into the BERT model in three different ways.

Flat representation of the path. The first, simple approach is to represent the entire path (claim + context) as a single sequence, where each of the claims is separated by the [SEP] token. BERT was trained on sequence pairs, and therefore the pre-trained encoders only have two segment embeddings Devlin et al. (2018). To fit multiple sequences into this framework, we indicate all tokens of the target claim as belonging to segment A and the tokens of all the claims in the discourse context as belonging to segment B. This way of representing the input requires no additional changes to the architecture or retraining, so we can fine-tune in the same manner as above. We refer to this representation of the context as a flat representation, and identify each such model by the number of preceding claims (the context length) it incorporates.
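A sketch of this flat encoding; exactly how the preceding claims are ordered and joined is an assumption beyond what is stated above.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def flat_path_encoding(claim: str, context_claims: list, max_context: int = 4):
    """Flat context representation: the target claim is segment A; up to four
    preceding claims, joined by [SEP], are segment B."""
    context = context_claims[-max_context:]                  # closest preceding claims
    context_text = f" {tokenizer.sep_token} ".join(context)
    return tokenizer(claim, context_text, truncation=True, return_tensors="pt")
```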

Table 6: F1 scores of each BERT model (claim only, claim + parent, and claim + flat contexts of increasing length) for claims with various context-length values.

Attention over context. Recent work on incorporating argument sequence into predicting persuasiveness Hidey and McKeown (2018) has shown that hierarchical representations are effective in representing context. Similarly, we consider hierarchical representations for representing the discourse. We first encode each claim using the pre-trained BERT model as the claim encoder, and use the representation of the [CLS] token as the claim representation. We then employ dot-product attention Luong et al. (2015) to get a weighted representation of the context. We use a learned context vector as the query for computing the attention scores, similar to Yang et al. (2016). The attention score is computed as shown below:

$$\alpha_i = \frac{\exp(c_i \cdot q)}{\sum_{c_j \in D} \exp(c_j \cdot q)} \qquad (2)$$

where $c_i$ is the representation of claim $i$ computed with the BERT encoder as described above, $q$ is the learned context vector used for computing the attention scores, and $D$ is the set of claims in the discourse context. After computing the attention scores, the final context representation $c$ is computed as follows:

$$c = \sum_{c_i \in D} \alpha_i \, c_i \qquad (3)$$

We then concatenate the context representation with the target claim representation and pass it to the classification layer to predict the impact label. We denote this model as the attention-over-context model.
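A PyTorch sketch of Equations (2) and (3); the hidden dimensionality and module name are illustrative.

```python
import torch
import torch.nn as nn

class ContextAttention(nn.Module):
    """Dot-product attention over BERT claim vectors with a learned query (Eqs. 2-3)."""
    def __init__(self, dim=768):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))     # learned context vector q

    def forward(self, context_reps: torch.Tensor) -> torch.Tensor:
        # context_reps: (num_context_claims, dim) [CLS] vectors from the BERT claim encoder
        scores = context_reps @ self.query              # unnormalized attention scores
        weights = torch.softmax(scores, dim=0)          # Eq. (2)
        return (weights.unsqueeze(-1) * context_reps).sum(dim=0)   # Eq. (3)

# Classifier input: torch.cat([claim_rep, ContextAttention()(context_reps)], dim=-1)
```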

GRU to encode context. Similar to the approach above, we consider a hierarchical representation of the context. We compute the claim representations as detailed above, and then feed the discourse claims' representations, in sequence, into a bidirectional Gated Recurrent Unit (GRU) Cho et al. (2014) to compute the context representation. We concatenate this with the target claim representation and use it to predict the claim impact. We denote this model as the GRU-context model.
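A PyTorch sketch of this variant; the GRU hidden size is an assumption, and batching is simplified to a single example.

```python
import torch
import torch.nn as nn

class GRUContextEncoder(nn.Module):
    """Bidirectional GRU over the sequence of BERT claim vectors; its final
    states summarize the discourse context."""
    def __init__(self, dim=768, hidden=128):
        super().__init__()
        self.gru = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, context_reps: torch.Tensor) -> torch.Tensor:
        # context_reps: (1, num_context_claims, dim), ordered from the thesis to the parent claim
        _, h_n = self.gru(context_reps)                  # h_n: (2, 1, hidden)
        return torch.cat([h_n[0], h_n[1]], dim=-1)       # (1, 2*hidden) context representation
```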

5 Results and Analysis

Table 5 shows the macro precision, recall, and F1 scores for the baselines as well as the BERT models with and without context representations. For the models whose scores vary with the random seed, we run them multiple times and report the mean and standard deviation.

We see that parent quality is a simple yet effective feature: the SVM model with this feature achieves a significantly higher F1 score than the SVM with distance from the thesis or with linguistic features (we perform a two-sided t-test for significance analysis). Claims with higher-impact parents are more likely to have higher impact. Similarity with the parent and the thesis is not significantly better than the majority baseline. Although the BiLSTM with attention and the FastText baselines perform better than the SVM with distance from the thesis and with linguistic features, their performance is similar to that of the parent quality baseline.
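The significance analysis can be sketched as follows; the score arrays are placeholders standing in for macro-F1 scores from repeated runs of two models.

```python
from scipy import stats

# Two-sided t-test over per-run macro-F1 scores of two models (placeholder values).
f1_model_a = [0.55, 0.56, 0.54]
f1_model_b = [0.51, 0.50, 0.52]
t_stat, p_value = stats.ttest_ind(f1_model_a, f1_model_b)
print(f"p = {p_value:.4f}")
```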

We find that the BERT model with the claim-only representation performs significantly better than the baseline models. Incorporating only the parent representation along with the claim representation does not give a significant improvement over representing the claim alone. However, incorporating the flat representation of the larger context along with the claim representation consistently achieves significantly better performance than the claim representation alone. Similarly, the attention representation over the context with the learned query vector achieves significantly better performance than the claim representation alone.

We find that the flat representation of the context achieves the highest F1 score. It may be more difficult for the models with a larger number of parameters to outperform the flat representation since the dataset is small. We also observe that modeling the claims on the argument path preceding the target claim achieves the best F1 score.

To understand for what kinds of claims the best-performing contextual model is more effective, we evaluate the BERT model with the flat context representation separately on claims with different context lengths. Table 6 shows the F1 scores of the BERT model without context and with flat context representations of different lengths. Adding the flat context representation along with the claim achieves a significantly better F1 score than modeling the claim only, and the models with larger context perform significantly better than BERT with the claim only even for claims that have limited context. This may suggest that when we train the models with larger context, they learn to represent the claims and their context better.

6 Conclusion

In this paper, we present a dataset of claims with their corresponding impact votes, and investigate the role of argumentative discourse context in argument impact classification. We experiment with various models for representing the claims and their context and find that incorporating the context information yields significant improvements in predicting argument impact. We find that the flat representation of the context gives the best improvement in performance, and our analysis indicates that the contextual models perform better even for claims with limited context.

7 Acknowledgements

This work was supported in part by NSF grants IIS-1815455 and SES-1741441. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF or the U.S. Government.

References

  • M. Burgoon, S. B. Jones, and D. Stewart (1975) TOWARD a message-centered theory of persuasion: three empirical investigations of language intensity1. Human Communication Research 1 (3), pp. 240–256. Cited by: §2.
  • S. Chaiken (1979) Communicator physical attractiveness and persuasion. Cited by: §1.
  • S. Chaiken (1980) Heuristic versus systematic information processing and the use of source versus message cues in persuasion. Journal of Personality and Social Psychology 39, pp. 752–766. External Links: Document Cited by: §1.
  • S. Chaiken (1987) The heuristic model of persuasion. In Social influence: the ontario symposium, Vol. 5, pp. 3–39. Cited by: §2.
  • K. Cho, B. van Merriënboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using rnn encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1724–1734. External Links: Link Cited by: §4.3.3.
  • R. B. Cialdini (2007) Influence: the psychology of persuasion.. Revised edition. edition, Harper Paperbacks. External Links: ISBN 006124189X Cited by: §2.
  • M. Coleman and T. L. Liau (1975) A computer readability formula designed for machine scoring.. Journal of Applied Psychology 60 (2), pp. 283 – 284. External Links: ISSN 0021-9010, Link Cited by: §4.2.2.
  • J. Correll, S. J. Spencer, and M. P. Zanna (2004) An affirmed self and an open mind: self-affirmation and sensitivity to argument strength. Cited by: §1.
  • M. Davies (1998) Dogmatism and belief formation: output interference in the processing of supporting and contradictory cognitions. Journal of Personality and Social Psychology 75, pp. 456–466. External Links: Document Cited by: §1.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §4.3.1, §4.3.3.
  • J. P. Dillard and M. Pfau (2002) The persuasion handbook: developments in theory and practice. Sage Publications. Cited by: §2.
  • A. M. Durik, M. A. Britt, R. Reynolds, and J. Storey (2008) The effects of hedges in persuasive arguments: a nuanced analysis of language. Journal of Language and Social Psychology 27 (3), pp. 217–234. Cited by: §2.
  • E. Durmus and C. Cardie (2018) Exploring the role of prior beliefs for argument persuasion. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, pp. 1035–1045. External Links: Link, Document Cited by: §1, §2.
  • E. Durmus and C. Cardie (2019) Modeling the factors of user success in online debate. In The World Wide Web Conference, WWW ’19, New York, NY, USA, pp. 2701–2707. External Links: ISBN 978-1-4503-6674-8, Link, Document Cited by: §1, §2.
  • R. Flesch (1948) A new readability yardstick.. Journal of Applied Psychology 32 (3), pp. 221 – 233. External Links: ISSN 0021-9010, Link Cited by: §4.2.2.
  • C. G. Lord, L. Ross, and M. Lepper (1979) Biased assimilation and attitude polarization: the effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology 37, pp. 2098–2109. External Links: Document Cited by: §1.
  • I. Habernal and I. Gurevych (2016) What makes a convincing argument? empirical analysis and detecting attributes of convincingness in web argumentation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 1214–1223. External Links: Link, Document Cited by: §1, §2, §4.2.2, §4.2.2.
  • I. Habernal and I. Gurevych (2017) Argumentation mining in user-generated web discourse. American Journal of Computational Linguistics 43 (1), pp. 125–179. External Links: Link, Document Cited by: §2.
  • C. P. Haugtvedt and D. T. Wegener (1994) Message Order Effects in Persuasion: An Attitude Strength Perspective. Journal of Consumer Research 21 (1), pp. 205–218. External Links: ISSN 0093-5301, Document, Link, http://oup.prod.sis.lan/jcr/article-pdf/21/1/205/5243966/21-1-205.pdf Cited by: §1, §2.
  • C. Hidey and K. McKeown (2018) Persuasive influence detection: the role of argument sequencing. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. Cited by: §1, §2, §4.3.3.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Comput. 9 (8), pp. 1735–1780. External Links: ISSN 0899-7667, Link, Document Cited by: §4.2.4.
  • C. R. Hullett (2005) The impact of mood on persuasion: a meta-analysis. Communication Research 32 (4), pp. 423–442. External Links: Document, Link, https://doi.org/10.1177/0093650205277317 Cited by: §1.
  • C. Hutto and E. Gilbert (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media (ICWSM). Cited by: §4.2.2.
  • K. Hyland (1998) Hedging in scientific research articles. John Benjamins. External Links: Link Cited by: §4.2.2.
  • N. Joyce and J. Harwood (2014) Context and identification in persuasive mass communication. Journal of Media Psychology: Theories, Methods, and Applications 26, pp. 50. External Links: Document Cited by: §1, §2.
  • H. C. Kelman (1961) Processes of opinion change. Public opinion quarterly 25 (1), pp. 57–78. Cited by: §2.
  • D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, External Links: Link Cited by: §4.2.4.
  • S. Lukin, P. Anand, M. Walker, and S. Whittaker (2017) Argument strength is in the eye of the beholder: audience effects in persuasion. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Valencia, Spain, pp. 742–753. External Links: Link Cited by: §1, §2.
  • T. Luong, H. Pham, and C. D. Manning (2015) Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 1412–1421. External Links: Link, Document Cited by: §4.2.4, §4.3.3.
  • F. Marquart and B. Naderer (2016) Communication and persuasion: central and peripheral routes to attitude change. In Schlüsselwerke der Medienwirkungsforschung, M. Potthoff (Ed.), pp. 231–242. External Links: ISBN 978-3-658-09923-7, Document, Link Cited by: §2.
  • N. Miller, G. Maruyama, R. J. Beaber, and K. Valone (1976) Speed of speech and persuasion. Journal of Personality and Social Psychology 34 (4), pp. 615–624 (English (US)). External Links: Document, ISSN 0022-3514 Cited by: §1.
  • R. Mochales and M. Moens (2011) Argumentation mining. Artificial Intelligence and Law 19 (1), pp. 1–22. Cited by: §2.
  • J. Park and C. Cardie (2014) Identifying appropriate support for propositions in online user comments. In Proceedings of the First Workshop on Argumentation Mining, Baltimore, Maryland, pp. 29–38. External Links: Link Cited by: §2.
  • A. Peldszus and M. Stede (2015) Joint prediction in MST-style discourse parsing for argumentation mining. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 938–948. External Links: Link, Document Cited by: §2.
  • J. Pennington, R. Socher, and C. Manning (2014) Glove: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1532–1543. External Links: Link, Document Cited by: §4.2.4.
  • G. Rakshit, K. K. Bowden, L. Reed, A. Misra, and M. A. Walker (2017) Debbie, the debate bot of the future. CoRR abs/1709.03167. External Links: Link, 1709.03167 Cited by: §1, §2.
  • S. Somasundaran, J. Ruppenhofer, and J. Wiebe (2007) Detecting arguing and sentiment in meetings. In Proceedings of the SIGdial Workshop on Discourse and Dialogue, Vol. 6. Cited by: §4.2.2.
  • C. Tan, V. Niculae, C. Danescu-Niculescu-Mizil, and L. Lee (2016) Winning arguments: interaction dynamics and persuasion strategies in good-faith online discussions. In Proceedings of the 25th International Conference on World Wide Web, pp. 613–624. Cited by: §1, §1, §2.
  • O. Tykocinskl, E. T. Higgins, and S. Chaiken (1994) Message framing, self-discrepancies, and yielding to persuasive messages: the motivational significance of psychological situations. Personality and Social Psychology Bulletin 20 (1), pp. 107–115. Cited by: §2.
  • T. Wilson, J. Wiebe, and P. Hoffmann (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 347–354. Cited by: §4.2.2.
  • Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy (2016) Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, pp. 1480–1489. External Links: Link, Document Cited by: §4.2.4.
  • J. Zhang, R. Kumar, S. Ravi, and C. Danescu-Niculescu-Mizil (2016) Conversational flow in Oxford-style debates. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, pp. 136–141. External Links: Link, Document Cited by: §1.
  • P. Zhou, W. Shi, J. Tian, Z. Qi, B. Li, H. Hao, and B. Xu (2016) Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany, pp. 207–212. External Links: Link, Document Cited by: §4.2.4.