Multilingual Evidence Retrieval and Fact Verification to Combat Global Disinformation: The Power of Polyglotism

12/16/2020 ∙ by Denisa A. O. Roberts, et al.

This article investigates multilingual evidence retrieval and fact verification as a step toward combating global disinformation; to the best of our knowledge, it is the first effort of this kind. The goal is to build multilingual systems that retrieve evidence in evidence-rich languages to verify claims in evidence-poor languages, which are more commonly targeted by disinformation. To this end, our EnmBERT fact verification system shows evidence of transfer learning ability, and a 400-example mixed English-Romanian dataset is made available for cross-lingual transfer learning evaluation.




1 Introduction

The recent COVID-19 pandemic broke down geographical boundaries and led to an infodemic of fake news and conspiracy theories [40]. Evidence-based claim verification (English only) has been studied as a weapon against fake news and disinformation [34]. However, conspiracy theories and disinformation can propagate from one language to another. Polyglotism is not uncommon. According to a 2017 Pew Research study, of European students learn English in school. Furthermore, recent machine translation advances are increasingly bringing down language barriers [20, 15]. Disinformation can be defined as intentionally misleading information [25]. A multilingual approach to evidence retrieval for claim verification aims at combating global disinformation during globally significant events. The "good cop" of the Internet [8], Wikipedia has become a source of ground truth, as seen in the recent literature on evidence-based claim verification. There are more than 6 million English Wikipedia articles, but resources are scarcer in other language editions, such as Romanian (400K), which motivates retrieving multilingual evidence.

As a case study, in Fig. 1 we evaluate a claim about Ion Mihai Pacepa, a former agent of the Romanian secret police during communism and the author of books on disinformation [24, 23]. Related conspiracy theories can be found on internet platforms. For example, a Romanian online publication claimed that he was deceased [1].

Figure 1: Demo. Claim: "Ion Mihai Pacepa, the former Securitate general, is alive".

Twitter posts in multiple languages, with strong language for and against, also exist, for example in English and Portuguese or in English and Polish. Strong-language claim examples are "We were tricked by Pacepa" (against) vs. "Red Horizons is one of the best political books of the 20th century" (for). Strong language has been associated with propaganda and fake news [41]. In the following sections we review the relevant literature, present our methodology and the experimental results, and conclude with final notes.

2 Related Work

The literature review touches on three topics: online disinformation, multilingual NLP and evidence-based claim verification.

Online Disinformation. Previous disinformation studies focused on election-related activity on social media platforms like Twitter, botnet-generated hyperpartisan news, and the 2016 US presidential election [5, 3, 4, 13]. To combat online disinformation via claim verification, one must retrieve reliable evidence at scale, since fake news tends to be more viral and to spread faster [30], [27], [41], [35].

Multilingual Natural Language Processing Advances. Recent multilingual applications leverage pre-training of massive language models that can be fine-tuned for multiple tasks. For example, the cased multilingual BERT (mBERT) [11] is pre-trained on a corpus of the top 104 Wikipedia languages. It has 12 layers, 768 hidden units, 12 heads and 110M parameters. Cross-lingual transfer learning has been evaluated for tasks such as natural language inference [9], [2], document classification [28], question answering [7], and fake Indic-language tweet detection [16].

English-Only Evidence Retrieval and Claim Verification. Fact-based claim verification is framed as a textual entailment task that retrieves its evidence. An annotated dataset was shared [33] and a task [34] was set up to retrieve evidence from Wikipedia documents and predict claim verification status. Recently published SotA results rely on pre-trained BERT flavors or XLNet [36]. DREAM [38], GEAR [39] and KGAT [21] achieved SotA with graphs. Dense Passage Retrieval [17] is used in RAG [19] in an end-to-end approach for claim verification.

3 Methodology

The system depicted in Fig. 2 is a pipeline with a multilingual evidence retrieval component and a multilingual claim verification component. Given an input claim in one language, the system retrieves evidence from the Wikipedia edition in another language (possibly the same one) and supports, refutes or abstains (not enough info). We employ English and Romanian as sample languages.

Figure 2: Overview of the multilingual evidence retrieval and claim verification system.

Multilingual Document Retrieval. To retrieve the top Wikipedia documents for each language, we employ an ad-hoc entity linking system similar to [14], based on named entity recognition as in [10]. Entities are parsed from the (English) claim using the AllenNLP [12] constituency parser. We search for the entities and retrieve 7 English and 1 Romanian Wikipedia pages using the MediaWiki API, relying on the internationally recognized nature of the claim entities (144.9K out of 145.5K training claims have Romanian Wikipedia search results).

Multilingual Sentence Selection. All sentences from each retrieved document are supplied as input to the sentence selection model. For Romanian sentences we removed diacritics [29]. We prepend the page title to evidence sentences to compensate for missed co-reference pronouns [31, 37]. We frame multilingual sentence selection as a two-way classification task [14, 26]. Our architecture includes an mBERT encoder and an MLP classification layer with softmax output. During training, all the parameters are fine-tuned and the MLP weights are trained from scratch. One example input is a pair of one evidence sentence and the claim [39, 37]. The encoded first token ([CLS]) is supplied to the MLP classification layer. The model estimates the probability that the sentence is evidence for the claim.


We include only the verifiable claims in training. The annotated evidence forms the positive examples, and we randomly sample 32 negative example sentences from the retrieved documents. We have two flavors of the fine-tuned model: EnmBERT selects only English negative sentences, while EnRomBERT selects English (5) and Romanian (27) negative sentences. Claims are in English. We optimize the cross-entropy loss.
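The construction of sentence selection training examples described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the tuple layout `(lang, title, sentence)` and the single-space join between page title and sentence are all hypothetical; only the diacritics removal, the title prepending and the 32/0 and 5/27 negative mixes come from the text.

```python
import random
import unicodedata


def strip_diacritics(text):
    """Drop combining marks after NFD decomposition (e.g. 'ș' -> 's')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def format_evidence(lang, title, sentence):
    """Prepend the page title; strip diacritics for Romanian text only."""
    text = f"{title} {sentence}"  # assumption: joined by a single space
    return strip_diacritics(text) if lang == "ro" else text


def sample_negatives(candidates, gold, n_en, n_ro, rng):
    """Sample non-gold sentences per language from the retrieved documents.

    candidates and gold hold (lang, title, sentence) tuples (assumed layout).
    """
    gold = set(gold)
    picked = []
    for lang, n in (("en", n_en), ("ro", n_ro)):
        pool = [c for c in candidates if c[0] == lang and c not in gold]
        picked += rng.sample(pool, min(n, len(pool)))
    return picked


def build_examples(claim, gold, candidates, n_en, n_ro, seed=0):
    """Return (claim, evidence_text, label) rows: 1 = evidence, 0 = negative."""
    rng = random.Random(seed)
    negatives = sample_negatives(candidates, gold, n_en, n_ro, rng)
    rows = [(claim, format_evidence(*g), 1) for g in gold]
    rows += [(claim, format_evidence(*n), 0) for n in negatives]
    return rows


# Negative-sampling mixes stated in the text.
ENMBERT_MIX = {"n_en": 32, "n_ro": 0}
ENROMBERT_MIX = {"n_en": 5, "n_ro": 27}
```

Calling `build_examples(claim, gold, candidates, **ENROMBERT_MIX)` yields the annotated evidence as positives plus up to 5 English and 27 Romanian negatives; fewer are returned when the retrieved documents do not contain enough sentences.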


Multilingual Claim Verification. The claim verification step takes as input the 5 sentence-claim pairs ranked highest by the sentence selection model (pointwise ranking [6]). The architecture includes an EnmBERT or EnRomBERT encoder and an MLP. We fine-tune the natural language inference model in a three-way classification task. A prediction is made for each of the 5 pairs and we aggregate them based on logic rules [22]. In training, for both models we use the Adam optimizer [18], a batch size of 32, a learning rate of , cross-entropy loss, and 1 and 2 epochs of training respectively.
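The inference-time flow of the claim verification step (keep the 5 best-scoring sentences, classify each pair, aggregate) can be sketched as below. The text only says that aggregation follows logic rules [22]; the specific priority order used here (SUPPORTS, then REFUTES, then not enough info) is an assumption for illustration, as are the function names and label strings.

```python
SUPPORTS, REFUTES, NEI = "SUPPORTS", "REFUTES", "NOT ENOUGH INFO"


def top_k_pairs(claim, scored_sentences, k=5):
    """Pointwise ranking: sort sentences by their individual scores, keep k.

    scored_sentences is a list of (sentence, evidence_probability) tuples.
    """
    ranked = sorted(scored_sentences, key=lambda item: item[1], reverse=True)
    return [(sentence, claim) for sentence, _ in ranked[:k]]


def aggregate(pair_labels):
    """Assumed rule: SUPPORTS wins if present, else REFUTES, else NEI."""
    if SUPPORTS in pair_labels:
        return SUPPORTS
    if REFUTES in pair_labels:
        return REFUTES
    return NEI
```

A rule of this kind makes the claim-level decision depend on the single most decisive sentence, which suits claims whose evidence is one sentence; a different ordering or a voting scheme would be a legitimate alternative.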

Conceptual End-to-End Multilingual Retrieve-Verify System. The pipeline has limitations: the ad-hoc entity linking document retrieval step is limited for non-English languages, multilingual annotation is expensive, and including retrieved Romanian sentences only as negative sentences in the supervised sentence selection step leads to biases. We propose a novel end-to-end multilingual evidence retrieval and claim verification approach, similar to the English-only RAG [19], that automatically retrieves relevant evidence passages in one language from a multilingual corpus for a claim in another language. In Fig. 2, the 2-step multilingual evidence retrieval is replaced with a multilingual version of dense passage retrieval (DPR) [17] with an mBERT backbone. The DPR-retrieved documents form a latent probability distribution. The claim verification model conditions on the claim and the latent retrieved documents to generate the label. As in RAG [19], the probabilistic model marginalizes over the retrieved documents.


The multilingual retrieve-verify system is jointly trained, and the only supervision is at the claim verification level. We leave this promising avenue for future experimental evaluation.
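For orientation, the RAG formulation [19] on which this proposal builds treats the retrieved passage as a latent variable and sums over the top-k retrieved passages. Written for a claim c, a retrieved passage z and a label y, it takes roughly the form below. The symbols are chosen here for illustration and are not taken from the source, so this is a sketch of the general RAG-style model rather than the exact equation of this work.

```latex
p(y \mid c) \;\approx\; \sum_{z \,\in\, \operatorname{top-}k\left(p_{\eta}(\cdot \mid c)\right)} p_{\eta}(z \mid c)\; p_{\theta}(y \mid c, z)
```

Here p_η would be the multilingual DPR retriever and p_θ the claim verification model. Training both jointly with label-only supervision then amounts to minimizing −log p(y | c) over the annotated claims.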

4 Experimental Results

There are no equivalent multilingual claim verification baselines, so we calibrate the model results by calculating the official FEVER score [33]. To evaluate the zero-shot transfer learning ability of the trained models, we translate 10 supported and 10 refuted claims with 5 evidence sentences each and combine them in a mix-and-match development set of 400 examples.

Calibration Results on Development and Test Sets. In Table 1 and Fig. 3 we compare EnmBERT and EnRomBERT label accuracy and evidence recall on a fair dev set, the test set and the golden-forcing dev set. The golden-forcing dev set adds all golden evidence to the sentence selection input, effectively forcing perfect document retrieval recall [21]. Note that any of the available English-only systems with a BERT backbone, such as KGAT [21] and GEAR [39], can be employed with an mBERT backbone to lift the multilingual system performance. We reach within of similar BERT-based English-only systems such as [31], though our training differs, so the difference is not directly attributable to the multilingual nature. We also reach within evidence recall as compared to English-only KGAT [21] and better than [31].

Dataset   Model                Prec@5  Rec@5  FEVER  LA
Fair-Dev  EnmBERT-EnmBERT      25.54   88.60  64.62  67.63
Fair-Dev  EnRomBERT-EnRomBERT  25.20   88.03  61.16  65.20
Test      EnmBERT-EnmBERT      25.27   87.38  62.30  65.26
Test      EnRomBERT-EnRomBERT  24.91   86.80  58.78  63.18
Table 1: Calibration of model evaluation using the official FEVER scores [33] (values in %; LA = label accuracy).
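To make the FEVER and LA columns concrete, the sketch below computes both under the standard FEVER definition: a claim counts toward the FEVER score only if its label is correct and, for SUPPORTS and REFUTES claims, the (at most 5) predicted evidence sentences contain at least one complete gold evidence set. This is not the official scorer; the data layout and names are assumptions made for illustration.

```python
NEI = "NOT ENOUGH INFO"


def fever_and_label_accuracy(predictions, gold, max_evidence=5):
    """Return (FEVER score, label accuracy) as fractions in [0, 1].

    predictions: list of (label, [(page, sentence_id), ...])
    gold:        list of (label, [set of (page, sentence_id), ...])
    """
    strict_hits = label_hits = 0
    for (pred_label, pred_ev), (gold_label, gold_sets) in zip(predictions, gold):
        if pred_label != gold_label:
            continue
        label_hits += 1
        if gold_label == NEI:
            # NEI claims carry no evidence, so a correct label is enough.
            strict_hits += 1
            continue
        retrieved = set(pred_ev[:max_evidence])
        # One fully covered gold evidence set is sufficient.
        if any(ev_set <= retrieved for ev_set in gold_sets):
            strict_hits += 1
    total = len(gold)
    return strict_hits / total, label_hits / total
```

Because the evidence condition can only remove claims that already have the right label, the FEVER score is never higher than label accuracy, which matches the relation between the two columns in Table 1.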

To better understand strengths and weaknesses and the impact of including Romanian evidence, we do a per-class performance analysis and also calculate FEVER-2 (the score for only "SUPPORTS" and "REFUTES" claims). The SotA on FEVER-2 is likely given in RAG [19] at without golden evidence (fair dev set). Our EnRomBERT model reaches within . The inclusion of the Romanian sentences improves the FEVER-2 score (see Fig. 3), coming within of the [32] English-only FEVER-2 SotA of on golden-forcing.

Figure 3: Error analysis per class.

On the SUPPORTS and REFUTES classes, EnRomBERT outperforms EnmBERT on both the fair and golden-forcing datasets. In EnRomBERT, the additional noise from including the second language likely improves generalization on the English-language claims. Both models struggle on the NEI (not enough info) class, which is not surprising since no NEI claims were included in the training set.

Model      Mixed  En-En  En-Ro  Ro-En  Ro-Ro
EnmBERT    95.00  95.00  50.00  65.00  85.00
EnRomBERT  95.00  95.00  25.00   0.00  50.00
Table 2: Claim Verification Label Accuracy (%) for Translated Parallel Claim-Evidence Sentences.

Transfer Learning Performance. Table 2 shows the zero-shot transfer learning ability of EnmBERT and EnRomBERT. We evaluate the two models' performance on the 400 mixed examples (Mixed column) and on four scenarios: En-En, En-Ro (English evidence and Romanian claims), Ro-En and Ro-Ro. We directly evaluate the claim verification step. It is interesting to see the differences in cross-lingual transfer learning ability for the Ro-En, En-Ro and Ro-Ro scenarios. EnmBERT's label accuracy on Ro-Ro is 85% as compared to 95% for En-En, and better than EnRomBERT's 50%. The pattern is similar for Ro-En and En-Ro. It is not surprising that EnmBERT outperforms EnRomBERT: EnRomBERT learned that Romanian evidence sentences are NEI (they were included only as negative examples in sentence selection training), which led to a bias against Romanian evidence.
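One reading consistent with the numbers given is that 20 translated claims with 5 evidence sentences each form 100 parallel pairs, and pairing each evidence language with each claim language gives 100 × 4 = 400 examples. The sketch below builds such a set and reports label accuracy per scenario; the dictionary layout, the function names and this interpretation of how the 400 examples arise are assumptions, not details confirmed by the text.

```python
from collections import defaultdict
from itertools import product

LANGS = ("en", "ro")


def mix_and_match(parallel_pairs):
    """Expand each parallel pair into all four evidence-claim language scenarios.

    Each pair is a dict: {"evidence": {"en": ..., "ro": ...},
                          "claim": {"en": ..., "ro": ...}, "label": ...}.
    """
    examples = []
    for pair in parallel_pairs:
        for ev_lang, claim_lang in product(LANGS, repeat=2):
            examples.append({
                # Scenario names follow the text: evidence language first.
                "scenario": f"{ev_lang.capitalize()}-{claim_lang.capitalize()}",
                "evidence": pair["evidence"][ev_lang],
                "claim": pair["claim"][claim_lang],
                "label": pair["label"],
            })
    return examples


def accuracy_by_scenario(examples, predict):
    """Label accuracy (%) per scenario and over the full mixed set."""
    hits, totals = defaultdict(int), defaultdict(int)
    for ex in examples:
        correct = predict(ex["evidence"], ex["claim"]) == ex["label"]
        for key in (ex["scenario"], "Mixed"):
            totals[key] += 1
            hits[key] += correct
    return {key: 100.0 * hits[key] / totals[key] for key in totals}
```

Here `predict` stands for any claim verification model wrapped as a function of an evidence sentence and a claim, so the same harness can compare EnmBERT-style and EnRomBERT-style models on identical inputs.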

Disinformation Case Study. We now evaluate the EnRomBERT system results for the case study in Fig. 1. We retrieve supporting evidence in English, Romanian and Portuguese. The page titles and summaries are retrieved directly using the MediaWiki API. The system will be exposed as a demo service, with limitations on the number of requests and latency. Based on the top predicted evidence (in 3 languages), the system predicts that the claim is supported.
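As an illustration of the per-language MediaWiki lookups, the sketch below only builds the search request URLs for each Wikipedia edition and makes no network call. The query parameters are standard MediaWiki search parameters, but whether the system uses exactly this request is not stated in the text; the page limits of 7 (English) and 1 (Romanian) follow the methodology section, while the Portuguese limit and all function names are assumptions.

```python
from urllib.parse import urlencode


def search_url(lang, query, limit):
    """Build a MediaWiki full-text search request for one Wikipedia edition."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


# 7 English and 1 Romanian pages follow the text; the Portuguese limit is assumed.
PAGES_PER_LANGUAGE = {"en": 7, "ro": 1, "pt": 1}


def build_requests(entity):
    """Map each language edition to the search URL for a claim entity."""
    return {
        lang: search_url(lang, entity, limit)
        for lang, limit in PAGES_PER_LANGUAGE.items()
    }
```

For the case study, `build_requests("Ion Mihai Pacepa")` returns one URL per edition, each of which can be fetched with any HTTP client and parsed as JSON.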

5 Final Notes

We present a first approach to multilingual evidence retrieval and claim verification to combat global disinformation. We evaluate two systems, EnmBERT and EnRomBERT, and their cross-lingual transfer learning ability for claim verification. We make available a mixed English-Romanian dataset of translated claims and evidence for future multilingual research evaluation.