Leveraging Declarative Knowledge in Text and First-Order Logic for Fine-Grained Propaganda Detection

We study the detection of propagandistic text fragments in news articles. Instead of merely learning from input-output datapoints in training data, we introduce an approach that injects declarative knowledge of fine-grained propaganda techniques. We leverage declarative knowledge expressed in both natural language and first-order logic. The former refers to the literal definition of each propaganda technique, which is utilized to obtain class representations for regularizing the model parameters. The latter refers to the logical consistency between coarse- and fine-grained predictions, which is used to regularize the training process with propositional Boolean expressions. We conduct experiments on the Propaganda Techniques Corpus, a large manually annotated dataset for fine-grained propaganda detection. Experiments show that our method achieves superior performance, demonstrating that injecting declarative knowledge expressed in both natural language and first-order logic can help the model make more accurate predictions.

1 Introduction

Figure 1: An example of propagandistic text and the definitions of the corresponding propaganda techniques (bold denotes propagandistic fragments).

Propaganda is communication deliberately designed to influence the opinions of readers. Unlike fake news, which is entirely made up and contains no verifiable facts, propaganda conveys information in an emotionally charged or biased manner, albeit possibly built upon an element of truth. This characteristic has made propaganda more effective and harder to notice with the rise of social media platforms. There are many propaganda techniques; Figure 1 shows examples of propagandistic text together with the definitions of the corresponding techniques.

We study the problem of fine-grained propaganda detection in this work, which is made possible by the recent release of the Propaganda Techniques Corpus (Da San Martino et al., 2019). Different from earlier works (Rashkin et al., 2017; Wang, 2017) that mainly study propaganda detection at a coarse-grained level, namely predicting whether a document is propagandistic or not, this problem requires identifying the tokens of particular propaganda techniques in news articles. Da San Martino et al. (2019) propose strong baselines trained in a multi-task learning manner on binary detection of propaganda at the sentence level and fine-grained detection over 18 techniques at the token level. Such data-driven methods have the merits of convenient end-to-end learning and strong generalization; however, they cannot guarantee consistency between sentence-level and token-level predictions. In addition, it is appealing to integrate human knowledge into data-driven approaches.

In this paper, we introduce an approach named LatexPRO that leverages logical and textual knowledge for propaganda detection. Following Da San Martino et al. (2019), we develop a BERT-based multi-task learning approach as the base model, which makes predictions for 18 propaganda techniques at both the sentence level and the token level. On top of that, we inject two types of knowledge as additional objectives to regularize the learning process. Specifically, we use logical knowledge by expressing the consistency between sentence-level and token-level predictions as propositional Boolean expressions. Moreover, we use the textual definitions of the propaganda techniques by first representing each of them as a contextual vector and then minimizing its distance to the corresponding model parameters in semantic space.

We conduct extensive experiments on the Propaganda Techniques Corpus (PTC) (Da San Martino et al., 2019), a large manually annotated dataset for fine-grained propaganda detection. Experiments show that our knowledge-augmented method significantly improves a strong multi-task learning approach. In particular, the results show that our model greatly improves precision, demonstrating that injecting declarative knowledge expressed in both natural language and first-order logic helps the model make more accurate predictions. More importantly, further analysis indicates that augmenting the learning process with declarative knowledge reduces the percentage of inconsistent model predictions.

The contributions of this paper are summarized as follows:

  • We introduce an approach to leverage declarative knowledge expressed in both natural language and first-order logic for fine-grained propaganda detection.

  • We utilize both types of knowledge as regularizers in the learning process, which enables the model to make more consistent sentence-level and token-level predictions.

  • Extensive experiments on the PTC dataset (Da San Martino et al., 2019) demonstrate that our method achieves superior performance with high precision and F1.

Propaganda Technique                 Instances
                               Train    Dev    Test
Loaded Language 1,811 127 177
Name Calling,Labeling 931 68 86
Repetition 456 35 80
Doubt 423 23 44
Exaggeration,Minimisation 398 37 44
Flag-Waving 206 13 21
Appeal to fear-prejudice 187 32 20
Causal Oversimplification 170 24 7
Slogans 120 3 13
Black-and-White Fallacy 97 4 8
Appeal to Authority 91 2 23
Thought-terminating Cliches 70 4 5
Whataboutism 55 1 1
Reductio ad hitlerum 44 5 5
Red Herring 24 0 9
Straw Men 11 0 2
Obfus.,Int. Vagueness,Confusion 10 0 1
Bandwagon 10 2 1
Total 5,114 380 547
Table 1: The statistics of all 18 propaganda techniques.

2 Task

Task Definition.

Following previous work (Da San Martino et al., 2019), we conduct experiments on tasks of two different granularities: sentence-level classification (SLC) and fragment-level classification (FLC). Formally, in both tasks the input is a plain-text document containing a sequence of characters, annotated with a set of propagandistic fragments, where each propagandistic text fragment is represented as a sequence of contiguous characters. For SLC, the target is to predict whether a sentence is propagandistic, which can be regarded as binary classification. For FLC, the target is to predict the set of propagandistic fragments and to assign each fragment to one of the propaganda techniques.

Figure 2: Overview of our proposed model. A BERT-based multi-task learning approach is adopted to make predictions for 18 propaganda techniques at both sentence-level and token-level. We introduce two types of knowledge as additional objectives: (1) textual knowledge from literal definitions of propaganda techniques, and (2) logical knowledge about the consistency between sentence-level and token-level predictions.

Dataset.

This paper utilizes the Propaganda Techniques Corpus (PTC) (Da San Martino et al., 2019) for experiments. PTC is a manually annotated dataset for fine-grained propaganda detection, containing 293/57/101 articles and 14,857/2,108/4,265 corresponding sentences for training, validation, and testing, respectively. Each article is annotated with the start and end of each propaganda text span as well as the type of propaganda technique. As the annotations of the official testing set are not publicly available, we divide the validation set into a validation set of 22 articles and a test set of 35 articles. The statistics of all 18 propaganda techniques and their frequencies (instances per technique) are shown in Table 1.

Evaluation.

For the SLC task, we evaluate the models using precision, recall, and F1 scores. For the FLC task, we adopt the evaluation script provided by Da San Martino et al. (2019) to calculate precision, recall, and F1, which gives partial credit to imperfect matches at the character level. The FLC task is evaluated with two kinds of measures: (1) Full task is the overall task of detecting propagandistic fragments and identifying their techniques, while (2) Spans is a special case of the Full task that only considers the spans of fragments, ignoring their propaganda techniques.
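To make the partial-credit scoring concrete, the following is a simplified Python sketch of the character-level credit computation. It is illustrative only: the official evaluation script of Da San Martino et al. (2019) additionally normalizes one-to-many overlaps between predicted and gold fragments, and all function names here are ours.

```python
def overlap_credit(pred, gold, norm):
    """Character-level partial credit: overlap length divided by norm h."""
    inter = min(pred[1], gold[1]) - max(pred[0], gold[0])
    return max(0, inter) / norm

def flc_full_task_scores(pred, gold):
    """pred/gold: lists of (start, end, technique) fragments for one article.

    Precision normalizes overlap by the predicted span length, recall by the
    gold span length; labels must match for the Full task (drop the label
    check to score Spans).
    """
    p_credit = sum(overlap_credit((ps, pe), (gs, ge), pe - ps)
                   for ps, pe, pl in pred
                   for gs, ge, gl in gold if pl == gl)
    r_credit = sum(overlap_credit((ps, pe), (gs, ge), ge - gs)
                   for ps, pe, pl in pred
                   for gs, ge, gl in gold if pl == gl)
    p = p_credit / len(pred) if pred else 0.0
    r = r_credit / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```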

3 Method

In this section, we present our approach LatexPRO, which injects declarative knowledge of fine-grained propaganda techniques into neural networks. A high-level illustration is shown in Figure 2. We first present our base model (§3.1), a multi-task learning neural architecture that slightly extends the model of Da San Martino et al. (2019). Afterwards, we introduce ways to regularize the learning process with logical knowledge about the consistency between sentence-level and token-level predictions (§3.2) and with textual knowledge from the literal definitions of propaganda techniques (§3.3). Finally, we describe the training and inference procedures (§3.4).

3.1 Base Model

To better exploit sentence-level information and further help token-level prediction, we develop a fine-grained multi-task method as our base model, which makes predictions for 18 propaganda techniques at both the sentence level and the token level. Inspired by the success of pre-trained language models on various natural language processing downstream tasks, we adopt BERT (Devlin et al., 2019) as the backbone model. To fine-tune the model, the input sequence for each sentence is formatted as "[CLS] sentence [SEP]". Specifically, we add 19 binary classifiers and one 19-way classifier on top of BERT, where all classifiers are implemented as linear layers. At the sentence level, we perform multiple binary classifications, which further supports leveraging declarative knowledge. The last-layer representation of the special [CLS] token, regarded as a summary of the semantic content of the input, is used to perform these binary classifications: one binary classification of propaganda vs. non-propaganda and 18 binary classifications, one per propaganda technique. We adopt a sigmoid activation for each binary classifier. At the token level, the last-layer representation of each token is fed into a linear layer to predict the propaganda technique over 19 categories (i.e., the 18 propaganda techniques plus one category for "none of them"). We adopt a Softmax activation for the 19-way classifier. We apply two losses in this multi-task learning process: the sentence-level loss $\mathcal{L}_{sent}$ and the token-level loss $\mathcal{L}_{token}$. $\mathcal{L}_{sent}$ is the binary cross-entropy loss of the multiple binary classifications. $\mathcal{L}_{token}$ is the focal loss (Lin et al., 2017) of the 19-way classification for each token, which helps address the class imbalance problem.
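For concreteness, the following is a minimal PyTorch sketch of the base architecture described above. Class and variable names are our own illustrative choices rather than the authors' released implementation, and padding/ignore-index handling is omitted.

```python
import torch
import torch.nn as nn
from transformers import BertModel

NUM_TECHNIQUES = 18  # fine-grained propaganda techniques

class PropagandaMultiTaskModel(nn.Module):
    """BERT backbone with 19 sentence-level binary heads and a 19-way token head."""

    def __init__(self, bert_name="bert-base-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # 1 propaganda-vs-non-propaganda head + 18 per-technique binary heads.
        self.sentence_heads = nn.Linear(hidden, 1 + NUM_TECHNIQUES)
        # 19-way token classifier: 18 techniques + "none of them".
        self.token_head = nn.Linear(hidden, NUM_TECHNIQUES + 1)

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        cls = states[:, 0]                                    # [CLS] summary vector
        sent_probs = torch.sigmoid(self.sentence_heads(cls))  # (B, 19)
        token_logits = self.token_head(states)                # (B, T, 19)
        return sent_probs, token_logits

def focal_loss(token_logits, targets, gamma=2.0):
    """Focal loss (Lin et al., 2017) for the 19-way token classification."""
    log_pt = torch.log_softmax(token_logits, dim=-1).gather(
        -1, targets.unsqueeze(-1)).squeeze(-1)       # log-prob of the true class
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()    # down-weights easy tokens
```

The sentence-level loss $\mathcal{L}_{sent}$ is then standard binary cross-entropy over `sent_probs`, e.g., `nn.BCELoss()`.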

3.2 Inject Logical Knowledge

There are implicit logical constraints between predictions. However, neural networks are less interpretable and need large amounts of training data to learn such implicit logic. We therefore further improve performance by using logical knowledge. To this end, we propose to employ propositional Boolean expressions to explicitly regularize the model with a logic-driven objective, which improves logical consistency between sentence-level and token-level predictions and makes our method more interpretable. In this work, for instance, if a propaganda class $c$ is predicted by the multiple binary classifiers (indicating that the sentence contains this propaganda technique), then token-level predictions belonging to class $c$ should also exist. We thus consider the propositional rule $A \Rightarrow B$, whose probability is formulated as:

$$p_{A \Rightarrow B} = 1 - p_A \left(1 - p_B\right) \qquad (1)$$

where $A$ and $B$ are two Boolean variables with probabilities $p_A$ and $p_B$. Specifically, substituting the sentence-level prediction $s_c(x)$ for $A$ and the token-level prediction $t_c(x)$ for $B$, the logic rule can be written as:

$$p_{s_c(x) \Rightarrow t_c(x)} = 1 - p_{s_c}(x)\left(1 - p_{t_c}(x)\right) \qquad (2)$$

where $x$ denotes the input, $s_c$ is the binary classifier for propaganda class $c$, and $p_{t_c}(x)$ is the probability that the fine-grained predictions for $x$ contain category $c$; $p_{t_c}(x)$ is obtained by max-pooling over the probabilities of all token-level predictions for class $c$. Note that the probabilities of unpredicted classes are set to 0 to prevent any violation, i.e., ensuring that each class has a probability corresponding to it. Our objective is to maximize $p_{s_c(x) \Rightarrow t_c(x)}$, i.e., to minimize the logical loss $\mathcal{L}_{logic} = -\log p_{s_c(x) \Rightarrow t_c(x)}$, to improve logical consistency between coarse- and fine-grained predictions.
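The logic-driven objective of Eq. (2) can be implemented in a few lines. The sketch below follows the relaxation stated above; the tensor shapes are assumptions about how the predictions are batched, not the authors' code.

```python
import torch

def logic_consistency_loss(sent_probs, token_probs, eps=1e-8):
    """Soft relaxation of the rule s_c(x) => t_c(x), per Eqs. (1)-(2).

    sent_probs:  (B, 18) sigmoid outputs of the per-technique binary heads.
    token_probs: (B, T, 18) token softmax probabilities, restricted to the
                 18 technique columns of the 19-way classifier.
    """
    pooled = token_probs.max(dim=1).values            # (B, 18): p_{t_c}(x)
    implication = 1.0 - sent_probs * (1.0 - pooled)   # p(s_c => t_c)
    return -(implication + eps).log().mean()          # minimize -log p(A => B)
```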

3.3 Inject Textual Knowledge

Declarative knowledge in natural language, i.e., the literal definitions of the propaganda techniques in this work, can be regarded as textual knowledge containing useful semantic information. To exploit this kind of knowledge, we adopt an additional encoder to encode the literal definition of each propaganda technique. Specifically, for each definition, the input sequence is formatted as "[CLS] definition [SEP]" and fed into BERT. We adopt the last-layer representation of the special [CLS] token as the definition representation $\mathbf{d}_j$, where $j$ indexes the $j$-th propaganda technique class. We then calculate the Euclidean distance between each predicted propaganda category representation $\mathbf{c}_j$ and its definition representation $\mathbf{d}_j$. Our objective is to minimize the textual definition loss $\mathcal{L}_{text}$, which regularizes the model to refine the propaganda category representations:

$$\mathcal{L}_{text} = \sum_{j=1}^{18} \left\lVert \mathbf{c}_j - \mathbf{d}_j \right\rVert_2^2 \qquad (3)$$
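A sketch of how the definitions can be encoded and compared with the category representations follows. Taking the rows of the technique classifier weights as the category representations $\mathbf{c}_j$ is our reading of the text above, stated here as an assumption.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
encoder = BertModel.from_pretrained("bert-base-cased")

def encode_definitions(definitions):
    """Encode each technique definition as its [CLS] vector (the d_j of Eq. 3)."""
    batch = tokenizer(definitions, padding=True, truncation=True,
                      return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]   # (18, H)

def definition_loss(class_reprs, def_reprs):
    """Eq. (3): summed squared Euclidean distance between category
    representations c_j (e.g., the 18 technique classifier weight rows)
    and definition representations d_j, both of shape (18, H)."""
    return ((class_reprs - def_reprs) ** 2).sum()
```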

3.4 Training and Inference

Training.

To train the whole model jointly, we minimize a weighted sum of losses consisting of the token-level loss $\mathcal{L}_{token}$, the fine-grained sentence-level loss $\mathcal{L}_{sent}$, the textual definition loss $\mathcal{L}_{text}$, and the logical loss $\mathcal{L}_{logic}$:

$$\mathcal{L} = \mathcal{L}_{token} + \alpha \mathcal{L}_{sent} + \beta \mathcal{L}_{text} + \gamma \mathcal{L}_{logic} \qquad (4)$$

where the hyper-parameters $\alpha$, $\beta$, and $\gamma$ control the tradeoff among the losses. During training, our goal is to minimize $\mathcal{L}$ using stochastic gradient descent.
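Putting the pieces together, the joint objective of Eq. (4) amounts to one weighted sum per batch; the default weights below are placeholders, not the paper's tuned values.

```python
def joint_loss(token_loss, sent_loss, text_loss, logic_loss,
               alpha=1.0, beta=1.0, gamma=1.0):
    """Eq. (4): L = L_token + alpha*L_sent + beta*L_text + gamma*L_logic."""
    return token_loss + alpha * sent_loss + beta * text_loss + gamma * logic_loss
```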

Model                                  Spans                  Full Task              Cons.
                                       P      R      F1      P      R      F1
BERT (Da San Martino et al., 2019) 50.39 46.09 48.15 27.92 27.27 27.60 -
MGN (Da San Martino et al., 2019) 51.16 47.27 49.14 30.10 29.37 29.73 -
LatexPRO 58.95 42.37 49.30 40.98 26.99 32.54 16.05
LatexPRO (T) 61.20 42.67 50.28 41.91 28.06 33.61 19.29
LatexPRO (L) 61.61 43.41 50.93 42.44 28.25 33.92 21.86
LatexPRO (T+L) 61.22 45.18 51.99 42.64 29.17 34.65 23.62
Table 2: Overall performance on fragment-level experiments (FLC task) in terms of precision (P), recall (R), and F1 scores on our test set. Cons. denotes the metric of consistency between sentence-level and token-level predictions (Eq. 5). Full task is the overall task of detecting propagandistic fragments and identifying their techniques, while Spans is a special case of the Full task that only considers the spans of fragments, ignoring their propaganda techniques. Note that (T+L), (T), and (L) denote injecting both textual and logical knowledge, only textual knowledge, and only logical knowledge, respectively.

Inference.

For the SLC task, our method predicts "propaganda" only if the probability of the positive class from the propagandistic binary classifier is above 0.7. This threshold is chosen according to the ratio of propaganda to non-propaganda samples in the training set. For the FLC task, to better use the coarse-grained (sentence-level) information to guide the fine-grained (token-level) predictions, we explicitly constrain the 19-way predictions at inference time. Prediction probabilities of the 18 fine-grained binary classifiers above 0.9 are set to 1, and those below to 0. The Softmax probabilities of the 19-way predictions (except for the "none of them" class) for each token are then multiplied by the corresponding 18 propaganda technique probabilities. Consequently, our model only makes predictions for the propaganda techniques that it is strongly confident the sentence contains.
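The two inference rules can be written compactly as follows; the thresholds come from the text above, while the tensor names and shapes are illustrative assumptions.

```python
import torch

def slc_predict(propaganda_prob, threshold=0.7):
    """SLC: predict "propaganda" only when the positive-class probability
    exceeds the threshold."""
    return propaganda_prob > threshold

def flc_predict(sent_probs, token_probs, threshold=0.9):
    """FLC: gate the 19-way token probabilities with confident
    sentence-level technique predictions.

    sent_probs:  (B, 18) per-technique sigmoid probabilities.
    token_probs: (B, T, 19) token softmax; column 18 is "none of them".
    """
    gate = (sent_probs > threshold).float()        # binarize to 0/1
    gated = token_probs.clone()
    gated[..., :18] *= gate.unsqueeze(1)           # "none" column kept intact
    return gated.argmax(dim=-1)                    # (B, T) predicted labels
```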

4 Experiments

4.1 Experimental Settings

In this paper, we conduct experiments on the Propaganda Techniques Corpus (PTC) (Da San Martino et al., 2019), a large manually annotated dataset for fine-grained propaganda detection, as detailed in Section 2. (Note that the annotations of the official PTC test set are not publicly available; we therefore split the original dev set into dev and test sets as described in Section 2, and we use the code released by Da San Martino et al. (2019) to run the baselines.) We adopt the F1 score as the final metric of model performance and select the best model on the dev set.

We adopt BERT-base-cased (Devlin et al., 2019) as the pre-trained model and implement our model with HuggingFace Transformers (Wolf et al., 2019). We use AdamW as the optimizer. The hyper-parameters $\alpha$, $\beta$, and $\gamma$ in the loss optimization are chosen by performance on the dev set. We set the max sequence length to 256, the batch size to 16, the learning rate to 3e-5, and the warmup steps to 500. We train our model for 20 epochs and adopt an early stopping strategy on the average validation F1 score of Spans and Full Task with a patience of 5. For all experiments, we set the random seed to 42 for reproducibility.

4.2 Models for Comparison

We compare our proposed methods with several baselines for fine-grained propaganda detection. Moreover, three variants of our method are provided to reveal the impact of each component: LatexPRO (T+L), LatexPRO (T), and LatexPRO (L) denote our model injecting both textual and logical knowledge, only textual knowledge, and only logical knowledge, respectively. Each model is described below.

BERT (Da San Martino et al., 2019) adds a linear layer on the top of BERT, and is fine-tuned on SLC and FLC tasks, respectively.

MGN (Da San Martino et al., 2019) is a multi-task learning model, which regards the SLC task as the main task and drives the FLC task on the basis of the SLC task.

LatexPRO is our baseline model without leveraging declarative knowledge.

LatexPRO (T) augments LatexPRO with declarative textual knowledge in natural language, i.e., the literal definitions of the propaganda techniques.

LatexPRO (L) injects logical knowledge by employing propositional Boolean expressions to explicitly regularize the model.

LatexPRO (T+L) is our full model in this paper.

Propaganda Technique              MGN                 LatexPRO            LatexPRO (T+L)
                                  P      R      F1    P      R      F1    P      R      F1
Appeal to Authority 0 0 0 0 0 0 0 0 0
Appeal to fear-prejudice 8.41 18.26 11.52 15.69 14.90 15.28 13.53 14.90 14.18
Bandwagon 0 0 0 0 0 0 0 0 0
Black-and-White Fallacy 31.97 43.12 36.72 66.67 7.23 13.05 81.63 15.04 25.41
Causal Oversimplification 12.43 12.09 12.66 12.43 30.00 17.59 16.53 28.57 20.94
Doubt 27.12 12.38 17.00 18.06 9.09 12.09 40.82 9.26 15.10
Exaggeration,Minimisation 33.95 11.94 17.67 42.85 5.86 10.31 31.57 8.56 13.47
Flag-Waving 45.61 37.71 41.29 44.18 36.13 39.75 35.16 41.30 37.98
Loaded Language 37.20 46.45 41.31 51.69 39.19 44.58 50.28 44.39 47.15
Name Calling,Labeling 36.15 25.86 30.15 38.87 29.14 33.31 43.09 31.12 36.14
Obfus.,Int. Vagueness,Confusion 0 0 0 100.00 98.61 99.30 50.00 98.61 66.35
Red Herring 0 0 0 0 0 0 0 0 0
Reductio ad hitlerum 45.40 49.02 47.14 99.85 59.88 74.87 100.00 45.74 62.77
Repetition 35.05 24.09 26.93 46.06 28.75 35.40 48.24 26.86 34.51
Slogans 30.10 31.25 30.66 44.30 38.46 41.17 41.53 43.43 42.46
Straw Men 0 0 0 0 0 0 0 0 0
Thought-terminating Cliches 21.05 23.85 22.36 90.83 14.80 25.45 89.49 19.60 32.16
Whataboutism 0 0 0 9.09 66.50 15.99 18.75 14.50 16.35
Table 3: Detailed performance on the Full task of fragment-level experiments (FLC task) on our test set. Precision (P), recall (R), and F1 scores per technique are provided.

4.3 Experiment Results and Analysis

Fragment-Level Propaganda Detection.

The results for the FLC task are shown in Table 2. Our base model LatexPRO achieves better results than the baseline models, which confirms the effectiveness of our fine-grained multi-task learning structure. It is worth noting that our full model LatexPRO (T+L) significantly outperforms MGN by 10.06% precision and 2.85% F1 on the Spans task and by 12.54% precision and 4.92% F1 on the Full task, which is considerable progress on this dataset. This demonstrates that leveraging declarative knowledge in text and first-order logic helps to predict propaganda types more accurately. Moreover, our ablated models LatexPRO (T) and LatexPRO (L) both improve over LatexPRO, with LatexPRO (L) gaining more than LatexPRO (T). This indicates that injecting each kind of knowledge is useful and that the effects of the two kinds of knowledge are complementary. It should also be noted that, compared with the baseline models, our models achieve superior performance thanks to high precision, while recall drops slightly. This is mainly because our models tend to make predictions only for propaganda types with high confidence.

To further understand model performance on the FLC task, we analyze each propaganda technique in detail. Table 3 shows detailed performance on the Full task. Our models achieve precision and F1 improvements on almost all classes over the baseline model, and can also predict some low-frequency propaganda techniques, e.g., Whataboutism and Obfus.,Int. Vagueness,Confusion. This further demonstrates that our method can alleviate the class imbalance problem and make more accurate predictions.

Model                                    P        R        F1
Random 30.48 51.04 38.16
All-Propaganda 30.54 100.00 46.80
BERT (Da San Martino et al., 2019) 58.26 57.81 58.03
MGN (Da San Martino et al., 2019) 57.41 62.50 59.85
LatexPRO 56.18 69.79 62.25
LatexPRO (T) 58.33 67.50 62.58
LatexPRO (L) 56.53 73.17 63.79
LatexPRO (T+L) 59.04 71.66 64.74
Table 4: Results on sentence-level experiments (SLC task) in terms of precision (P), recall (R), and F1 scores on our test set. Random is a baseline that predicts randomly, and All-Propaganda is a baseline that always predicts the propaganda class.

Sentence-Level Propaganda Detection.

Table 4 shows the performance of different models on the SLC task. The results indicate that our models achieve superior performance over the baseline models. Compared to MGN, LatexPRO (T+L) increases precision by 1.63%, recall by 9.16%, and F1 score by 4.89%. This demonstrates the effectiveness of our model and shows that it can find more positive samples, which further benefits the token-level predictions of the FLC task.

Figure 3: Qualitative comparison of two different models on a news article. The baseline MGN predicts spans of fragments with wrong propaganda techniques, while our method makes more accurate predictions. Five propaganda techniques appear: 1. Thought-terminating Cliches, 2. Loaded Language, 3. Causal Oversimplification, 4. Flag-Waving, and 5. Repetition. (Best viewed in color.)

4.4 Effectiveness of Improving Consistency

We further define a metric to measure the consistency between the sentence-level predictions $S_i$, the set of propaganda technique classes predicted for the $i$-th example, and the token-level predictions $T_i$, the set of propaganda techniques predicted for its input tokens:

$$\mathcal{C} = \frac{1}{Z} \sum_{i=1}^{N} \mathbb{1}\left[S_i = T_i\right] \qquad (5)$$

where $Z$ denotes a normalizing factor and $\mathbb{1}[\cdot]$ represents the indicator function:

$$\mathbb{1}\left[S_i = T_i\right] = \begin{cases} 1 & \text{if } S_i = T_i \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$
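Under this reading of Eqs. (5)-(6), with $Z$ taken as the number of evaluated examples (an assumption on our part), the metric reduces to a few lines:

```python
def consistency(sentence_sets, token_sets):
    """Fraction of examples whose sentence-level and token-level predicted
    technique sets agree (Eqs. 5-6)."""
    agree = sum(1 for s, t in zip(sentence_sets, token_sets) if s == t)
    return agree / max(len(sentence_sets), 1)
```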

Table 2 presents the consistency scores $\mathcal{C}$; a higher score indicates better consistency. The results show that our methods with declarative knowledge substantially outperform the base model LatexPRO. Compared to the base model, our declarative knowledge augmented methods enrich the source information by introducing textual knowledge from the propaganda definitions and logical knowledge from the implicit logical rules between predictions, which enables the model to make more consistent predictions.

Figure 4: Visualization of the confusion matrix of our LatexPRO (T+L), where O represents the "none of them" class.

4.5 Case Study

Figure 3 gives a qualitative comparison between MGN and our LatexPRO (T+L), where different colors represent different propaganda techniques. The results show that although MGN predicts the spans of fragments correctly, it often fails to identify their techniques. In contrast, our method shows promising results on both the spans and the specific propaganda techniques, which further confirms that it makes more accurate predictions.

4.6 Error Analysis

Although our model achieves the best performance, some types of propaganda techniques are still not identified, e.g., Appeal to Authority and Red Herring, as shown in Table 3. To explore why our model LatexPRO (T+L) fails on those propaganda techniques, we compute a confusion matrix for the Full task of the FLC task and visualize it as a heatmap in Figure 4. We find that most of the off-diagonal mass falls in class O, which represents "none of them"; that is, most errors are cases wrongly classified as O. We attribute this to the imbalance between the propaganda and non-propaganda categories in the dataset. Similarly, Straw Men, Red Herring, and Whataboutism are classes of relatively low frequency. How to deal with this class imbalance still needs further exploration.

5 Related Work

Our work relates to fake news detection and the injection of first-order logic into neural networks. We will describe related studies in these two directions.

Fake news detection has drawn growing attention as the spread of misinformation on social media becomes easier and more influential. Various types of fake news detection problems have been introduced, such as 4-way classification of news documents (Rashkin et al., 2017) and 6-way classification of short statements (Wang, 2017). There are also sentence-level fact checking problems with various genres of evidence, including natural language sentences from Wikipedia (Thorne et al., 2018), semi-structured tables (Chen et al., 2019), and images (Zlatkova et al., 2019; Nakamura et al., 2019). Our work studies propaganda detection, a fine-grained problem that requires token-level prediction over 18 propaganda techniques. The release of a large manually annotated dataset (Da San Martino et al., 2019) makes the development of large neural models possible, and also motivates our work, which improves a standard multi-task learning approach by injecting declarative knowledge expressed in both natural language and first-order logic.

Neural networks have the merits of convenient end-to-end training and good generalization; however, they typically need a lot of training data and are not interpretable. On the other hand, logic-based expert systems are interpretable and require little or no training data, so it is appealing to leverage the advantages of both worlds. In the NLP community, approaches that inject logic into neural networks can generally be divided into two groups. Methods in the first group regularize neural networks with logic-driven loss functions (Xu et al., 2017; Fischer et al., 2018; Li et al., 2019). For example, Rocktäschel et al. (2015) target the problem of knowledge base completion: after extracting and annotating propositional logical rules about relations in a knowledge graph, they ground these rules to facts from the knowledge graph and add a differentiable training loss. Kruszewski et al. (2015) map text to Boolean representations and derive loss functions based on Boolean-level implication for entailment detection. Demeester et al. (2016) propose lifted regularization for knowledge base completion, making the logical loss functions independent of the number of grounded instances and extending them to unseen constants; the basic idea is that hypernyms have ordering relations, which correspond to component-wise comparison in semantic vector space. Hu et al. (2016) introduce a teacher-student model, where the teacher is a rule-regularized neural network whose predictions are used to teach the student. Wang and Poon (2018) generalize virtual evidence (Pearl, 2014) to arbitrary potential functions over inputs and outputs, and use deep probabilistic logic to integrate indirect supervision into neural networks. More recently, Asai and Hajishirzi (2020) regularize question answering systems with symmetric and transitive consistency: the former creates a symmetric question by replacing words with their antonyms in comparison questions, while the latter addresses causal reasoning questions by creating new examples when a positive causal relationship between two cause-effect questions holds.

The second group incorporates logic-specific modules into the inference process (Yang et al., 2017; Dong et al., 2019). For example, Rocktäschel and Riedel (2017) target the problem of knowledge base completion and use neural unification modules to recursively construct models similar to the backward chaining algorithm of Prolog. Evans and Grefenstette (2018) develop a differentiable model of forward-chaining inference, where weights represent a probability distribution over clauses. Li and Srikumar (2019) inject logic-driven neurons into existing neural networks, measuring the degree to which the head of a rule holds with probabilistic soft logic (Kimmig et al., 2012). Our approach belongs to the first group, and to the best of our knowledge, it is the first to augment a neural network with logical knowledge for propaganda detection.

6 Conclusion

In this paper, we propose a fine-grained multi-task learning approach that leverages declarative knowledge to detect propaganda techniques in news articles. The declarative knowledge is expressed in both natural language and first-order logic, which are used as regularizers to obtain better propaganda representations and to improve logical consistency between coarse- and fine-grained predictions, respectively. Extensive experiments on the PTC dataset demonstrate that our knowledge-augmented method achieves superior performance with more consistent sentence-level and token-level predictions.

References

  • W. Chen, H. Wang, J. Chen, Y. Zhang, H. Wang, S. Li, X. Zhou, and W. Y. Wang (2019) TabFact: a large-scale dataset for table-based fact verification. arXiv preprint arXiv:1909.02164. Cited by: §5.
  • G. Da San Martino, S. Yu, A. Barrón-Cedeño, R. Petrov, and P. Nakov (2019) Fine-grained analysis of propaganda in news articles. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 5640–5650. Cited by: 3rd item, §1, §1, §1, §2, §2, §2, Table 2, §3, §4.1, §4.2, §4.2, Table 4, §5, footnote 1.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL, pp. 4171–4186. Cited by: §3.1, §4.1.
  • H. Dong, J. Mao, T. Lin, C. Wang, L. Li, and D. Zhou (2019) Neural logic machines. arXiv preprint arXiv:1904.11694. Cited by: §5.
  • M. Fischer, M. Balunovic, D. Drachsler-Cohen, T. Gehr, C. Zhang, and M. Vechev (2018) DL2: training and querying neural networks with logic. Cited by: §5.
  • A. Kimmig, S. Bach, M. Broecheler, B. Huang, and L. Getoor (2012) A short introduction to probabilistic soft logic. In Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications, pp. 1–4. Cited by: §5.
  • T. Li, V. Gupta, M. Mehta, and V. Srikumar (2019) A logic-driven framework for consistency of neural models. arXiv preprint arXiv:1909.00126. Cited by: §5.
  • T. Li and V. Srikumar (2019) Augmenting neural networks with first-order logic. arXiv preprint arXiv:1906.06298. Cited by: §5.
  • T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár (2017) Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988. Cited by: §3.1.
  • K. Nakamura, S. Levy, and W. Y. Wang (2019) R/fakeddit: a new multimodal benchmark dataset for fine-grained fake news detection. arXiv preprint arXiv:1911.03854. Cited by: §5.
  • J. Pearl (2014) Probabilistic reasoning in intelligent systems: networks of plausible inference. Elsevier. Cited by: §5.
  • H. Rashkin, E. Choi, J. Y. Jang, S. Volkova, and Y. Choi (2017) Truth of varying shades: analyzing language in fake news and political fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2931–2937. Cited by: §1, §5.
  • J. Thorne, A. Vlachos, C. Christodoulopoulos, and A. Mittal (2018) FEVER: a large-scale dataset for fact extraction and verification. arXiv preprint arXiv:1803.05355. Cited by: §5.
  • W. Y. Wang (2017) “Liar, liar pants on fire”: a new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648. Cited by: §1, §5.
  • T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, and J. Brew (2019) HuggingFace’s transformers: state-of-the-art natural language processing. ArXiv abs/1910.03771. Cited by: §4.1.
  • J. Xu, Z. Zhang, T. Friedman, Y. Liang, and G. Van den Broeck (2017) A semantic loss function for deep learning with symbolic knowledge. arXiv preprint arXiv:1711.11157. Cited by: §5.
  • F. Yang, Z. Yang, and W. W. Cohen (2017) Differentiable learning of logical rules for knowledge base reasoning. In Advances in Neural Information Processing Systems, pp. 2319–2328. Cited by: §5.
  • D. Zlatkova, P. Nakov, and I. Koychev (2019) Fact-checking meets fauxtography: verifying claims about images. arXiv preprint arXiv:1908.11722. Cited by: §5.