Introduction
The Internet provides an efficient way for individuals and organizations to quickly spread information to massive audiences. However, some of that information is true while some is false, and malicious people spread false news, which may have significant influence on public opinion, stock prices, and even presidential elections [4]. Research shows that false news reaches more people than the truth [18]. In this paper, we study fact checking, a fundamental task in natural language processing whose goal is to automatically assess the truthfulness of a textual claim given textual evidence.

Specifically, we work on FEVER [14], short for Fact Extraction and VERification, one of the most influential benchmark datasets for fact checking. In FEVER, evidence comes from Wikipedia. A running example, used throughout the paper, is given in Figure 1. Given a claim with supporting evidence as the input, our goal is to predict whether the evidence supports or refutes the claim, or whether there is not enough information to make the decision. Existing approaches are dominated by natural language inference models [1] because the task essentially requires matching between the claim and the evidence. In most cases, a claim is a single sentence while the evidence contains multiple sentences. Therefore, mining adequate and concise information from the pieces of evidence is useful for matching them to the claim.
Existing studies typically concatenate evidence sentences into a single string, a strategy used by the top-ranked system in the official FEVER challenge [15], or add a feature fusion layer on top of evidence features to further aggregate information from evidence sentences [21]. However, both methods ignore the important relational structure of evidence sentences at the semantic level, including the participants, locations, and temporality of events. Take the example in Figure 1. Making the correct prediction for this claim requires a model to understand that the “Rodney King riots” occurred in “Los Angeles County” from the first evidence sentence, and that “Los Angeles County” is “the most populous county in the USA” from the second. Simply concatenating evidence sentences into a single string places relevant pieces of information from different evidence sentences far apart, while feature fusion aggregates all the information implicitly, which makes it hard to reason over structural information.
To address the aforementioned issues, we present a graph-based reasoning approach for fact checking. We represent evidence sentences as a graph, whose nodes are extracted by semantic role labeling (SRL) [12]. Nodes belonging to the same predicate-argument structure are fully connected. We further use a string-similarity-based measure to connect nodes of certain types (e.g., arguments, locations, and temporal expressions) that are extracted from different sentences. After obtaining the constructed graph, we present a graph-based model with XLNet [19] as the backbone, consisting of a graph-based contextual word representation learning module and a graph-based reasoning module that leverage the graph information. In the first module, we use graph information to redefine the distance between words and produce a contextual embedding for each word in both the claim and the evidence sentences. In the graph-based reasoning module, we take the contextual word representations as input and match the claim and evidence sentences over the two graphs.
Experiments show that both graph-based modules improve performance. At the time of paper submission, our system (DREAM on the official leaderboard: https://competitions.codalab.org/competitions/18814#results) achieves state-of-the-art claim verification accuracy and FEVER score. This paper makes the following contributions:
-
We propose a graph-based reasoning approach, namely Dynamic REAsoning Machine (DREAM), for fact checking. We use SRL to construct graphs and propose two novel graph-based modules for graph-based representation learning and graph-based reasoning.
-
Results verify that both proposed modules bring improvements, and our final system achieves state-of-the-art performance.
Task Definition
FEVER (Fact Extraction and VERification) is a shared task proposed by Thorne et al. [14], in which systems are required to assess the veracity of given claims using integrated information from multiple pieces of evidence. Evidence for this task needs to be retrieved from all the documents in Wikipedia. Specifically, given a claim, the system is asked to search for potential sentence-level evidence and label the claim as “SUPPORTED”, “REFUTED”, or “NOT ENOUGH INFO (NEI)”, indicating that the claim is supported or refuted by the given evidence or is not verifiable. As the example in Figure 1 shows, verifying a claim requires the ability to aggregate pieces of information from multiple pieces of evidence and reason over them. FEVER has two official evaluation metrics. The first is accuracy for the three-way classification (SUPPORTED/REFUTED/NEI), which is the main focus of this work because it directly shows the verification performance of our graph-based reasoning approach. For comparison with existing studies, we also report results in terms of the second metric, the FEVER score, which additionally measures whether the retrieved evidence is correct for the “SUPPORTED” and “REFUTED” categories.
Pipeline
In this section, we present an overview of our pipeline. At the high level, our pipeline consists of three main components: a document retrieval model, a sentence-level evidence selection model, and a claim verification model.
Figure 2 gives an overview of our pipeline, called the Dynamic REAsoning Machine (DREAM). Given a claim as input, the document retrieval model retrieves the top related documents from a dump of Wikipedia. With the retrieved documents, the sentence-level evidence selection model aims to select the top relevant sentences as the predicted evidence. Finally, the claim verification model performs reasoning over the claim and the predicted evidence, and states the veracity of the claim. Our reasoning framework is proposed within the claim verification model.

In this section, we briefly introduce our strategies for the first two models. The main contribution of this work is the graph-based reasoning approach we propose in the claim verification model, which we detail in the next section.
Document Retrieval Model
The document retrieval model takes a claim and a dump of Wikipedia as input, and returns the most relevant documents. We mainly follow UNC-NLP [9], the top-performing system in the competition hosted for the FEVER shared task [15].
The document retrieval model first adopts a keyword matching mechanism [9] to filter candidate documents from large-scale Wikipedia. Since a large proportion (10%) of document titles contain disambiguation information (e.g., “Vedam (film)”), which is hard to identify with literal matching, we further apply the NSMN model [9] to perform semantic matching between claims and candidate documents with disambiguation titles. For a document with a disambiguation title, the normalized matching score between the claim and the document is calculated as:
$$p = f_{\mathrm{NSMN}}\big(c,\ [t;\ s]\big) \tag{1}$$

where $[t; s]$ represents the concatenation of the title and the first sentence of the document, and $p$ indicates the output normalized probability. Documents without disambiguation titles are assigned the highest matching score, while documents with disambiguation titles are assigned the calculated matching score. These documents are then ranked and added to a resulting list, from which our system finally selects the top documents as the retrieved documents.

Sentence-Level Evidence Selection Model
The evidence selection model selects the top potential evidence sentences by ranking all the candidate sentences from the documents returned by the document retrieval model.
The evidence selector is required to conduct semantic matching between a claim and each evidence candidate. We employ pre-trained models such as RoBERTa [7] and XLNet [19] as the sentence encoder; in our experiments, we use RoBERTa because it performs better. The input of our sentence encoder is $\{c;\ \langle\mathrm{sep}\rangle;\ e;\ \langle\mathrm{eos}\rangle\}$, where $c$ and $e$ indicate the tokenized word-pieces of the original claim and the evidence candidate, and $\langle\mathrm{sep}\rangle$ and $\langle\mathrm{eos}\rangle$ are symbols indicating the end of a sentence and the end of the whole input, respectively. The final representation $h \in \mathbb{R}^{d}$ is obtained by extracting the hidden vector of the [CLS] token, where $d$ denotes the dimension of the hidden vector:

$$h = \mathrm{Encoder}\big(\{c;\ \langle\mathrm{sep}\rangle;\ e;\ \langle\mathrm{eos}\rangle\}\big)_{[\mathrm{CLS}]} \tag{2}$$
Then we employ an MLP layer and a softmax layer to compute a score $p$ for each evidence candidate:

$$p = \mathrm{softmax}(W h + b) \tag{3}$$

where $W$ is a weight matrix and $b$ denotes a bias vector. Afterwards, we rank all the evidence sentences by the score $p$ and select the top potential evidence sentences.

Claim Verification Model
In this section, we introduce our claim verification model, which is the main contribution of this work. The task requires the ability to aggregate information from multiple pieces of evidence and reason over it to reach a conclusion. Such information across multiple evidence sentences has intrinsic structure, including both intra-sentence structure, such as the arguments, location, and temporality of an event, and inter-sentence structure, such as the same mention of an argument in two sentences. Instead of simply concatenating evidence sentences into a single string, we propose to reason over a semantic-level graph for claim classification.
Our approach contains three modules, including (1) graph construction module, which constructs two semantic graphs for evidence and claim separately, (2) graph-based contextual word representation learning module, which takes the constructed graph as the input to learn a graph-enhanced contextual representation for each word in the input, and (3) graph reasoning module, which takes the outputs from the previous two modules to conduct graph-level representation learning and reasoning, and makes the prediction. Details of each module will be described below.
Graph Construction

We first introduce the common notation about graph networks that will be used throughout the paper. Then we will introduce the details of graph construction.
The graph network framework is a relational learning framework built on a graph structure. A graph is denoted as $G = (V, E)$, where $V$ denotes the set of nodes and $E$ represents the edges connecting them. $|V|$ and $|E|$ denote the number of nodes and edges, respectively. $\mathcal{N}(v)$ denotes the set of neighboring nodes that have an edge connecting to node $v$. The common input of our claim verification model is the tokenized word-pieces of length $n$:
$$x = \{x_1, x_2, \ldots, x_n\} = [\,c\,;\ e\,] \tag{4}$$

where $c$ is the claim and $e$ is the concatenation of the top evidence sentences.
We use the same method to construct graphs for the evidence sentences and the claim. Below we take the evidence as the example to describe the graph construction procedure. Given the evidence (or the claim), our graph construction module operates in the following steps.
-
Tuples (sets of argument nodes) are extracted via an SRL toolkit, which identifies arguments and their roles in a sentence. The sub-graph formed by each tuple is fully connected with inner-tuple edges.
-
We add an inter-tuple edge between each pair of nodes from different tuples if they potentially mention the same entity. We first employ an NER (Named Entity Recognition) toolkit [10] to extract the entities mentioned in the content of the nodes. Assuming entity $e_i$ and entity $e_j$ come from different tuples, we add an inter-tuple edge if one of the following rules is satisfied: (1) $e_i$ is equal to $e_j$; (2) $e_i$ contains $e_j$; (3) the number of overlapping words between $e_i$ and $e_j$ is larger than half of the minimum number of words in $e_i$ and $e_j$.
Figure 3 shows an example of the constructed graph.
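To make the procedure concrete, the following sketch shows one possible implementation of the two construction steps; the toolkit outputs are assumed to be already available as plain Python data structures (`srl_tuples`, `entities_of_node`), and the helper names are illustrative rather than part of our released code.

```python
import itertools
import networkx as nx

def build_evidence_graph(srl_tuples, entities_of_node):
    """Construct a semantic-level graph from SRL and NER output.

    srl_tuples: list of tuples, each tuple being a list of node strings
        (predicate plus arguments) extracted by an SRL toolkit.
    entities_of_node: dict mapping a node string to the set of entity
        strings an NER toolkit found inside it (possibly empty).
    """
    graph = nx.Graph()

    # Step 1: nodes of the same tuple are fully connected (inner-tuple edges).
    for tuple_id, nodes in enumerate(srl_tuples):
        for node in nodes:
            graph.add_node((tuple_id, node))
        for a, b in itertools.combinations(nodes, 2):
            graph.add_edge((tuple_id, a), (tuple_id, b), kind="inner")

    # Matching rules for two entity mentions, mirroring rules (1)-(3) above.
    def match(e1, e2):
        if e1 == e2 or e2 in e1 or e1 in e2:
            return True
        w1, w2 = set(e1.split()), set(e2.split())
        return len(w1 & w2) > min(len(w1), len(w2)) / 2

    # Step 2: inter-tuple edges between nodes that may mention the same entity.
    all_nodes = [(tid, n) for tid, nodes in enumerate(srl_tuples) for n in nodes]
    for (t1, n1), (t2, n2) in itertools.combinations(all_nodes, 2):
        if t1 == t2:
            continue
        ents1 = entities_of_node.get(n1, set())
        ents2 = entities_of_node.get(n2, set())
        if any(match(e1, e2) for e1 in ents1 for e2 in ents2):
            graph.add_edge((t1, n1), (t2, n2), kind="inter")

    return graph
```

In this sketch each node is keyed by its tuple index so that identical argument strings from different tuples remain distinct nodes.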
Contextual Word Representation with Graph-Based Distance
Traditional reasoning approaches usually concatenate the pieces of evidence sequentially and feed them into a pre-trained model (e.g., XLNet) to learn contextual word representations. Since pre-trained models use the absolute distance between two words in the input sequence, some closely linked nodes in the constructed graph end up far away from each other. To better model the structural information in the extracted graph, we present an algorithm that re-calculates the distance between each pair of words in the text by incorporating the distance between the corresponding nodes in the constructed graph.
However, the full distance matrix would take huge memory and computation time, considering that each word in the extracted graph has a distinct distance vector and each element in the vector is mapped to an embedding vector. Assuming the length of the input is 512 and the dimension of a distance embedding is 1,024, the distance tensor takes almost 268 million entries for one sample, making it impractical to implement the full distance matrix. To address this problem, we present a trade-off approach that uses a topological sort algorithm to order the words in the extracted graph. First, we use topological sort on the nodes in the constructed graph to shorten the distance between two closely linked nodes. Second, we feed the sorted sequence into XLNet to obtain the relative positions of words. Furthermore, topological sort ensures that the nodes preceding a node are either its parent nodes or its sibling nodes, which helps the model learn the dependencies in the extracted graph.
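For reference, the memory estimate follows directly from the assumed input length of 512 word-pieces and the 1,024-dimensional distance embedding:

$$512 \times 512 \times 1024 = 268{,}435{,}456 \approx 2.7 \times 10^{8}\ \text{embedding entries per sample.}$$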
The details of the topological sort algorithm are shown in Algorithm 1. The algorithm begins from nodes without incoming relations; for each such node, we recursively visit its child nodes in a depth-first manner.
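Below is a minimal Python sketch of the traversal described above: a pre-order depth-first walk starting from nodes without incoming edges, so that each visited node is preceded by its parents or siblings. The node and edge representations are assumptions, and for nodes with several parents the ordering is only an approximation of a strict topological order.

```python
def graph_topology_sort(nodes, children):
    """Order nodes by a depth-first walk from nodes with no incoming edges.

    nodes: iterable of node identifiers.
    children: dict mapping a node to the list of nodes it points to.
    """
    has_parent = {c for kids in children.values() for c in kids}
    roots = [n for n in nodes if n not in has_parent]

    order, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        order.append(node)                 # emit the node before its children
        for child in children.get(node, []):
            visit(child)                   # recurse depth-first

    for root in roots:
        visit(root)
    for node in nodes:                     # cover any nodes left unreached
        if node not in visited:
            visit(node)
    return order
```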
In this way, we obtain the graph-guided distances between words, which are used as the input to the XLNet model. Then, XLNet maps the input of length $n$ into a sequence of hidden vectors as follows:

$$\{h_1, h_2, \ldots, h_n\} = \mathrm{XLNet}\big(\{x_1, x_2, \ldots, x_n\}\big) \tag{5}$$

Graph-Based Reasoning Network
Taking the graphs and graph-based distance matrices as input, we first initialize node representations based on the contextual word representations. Afterwards, we update the graph by propagating information from neighboring nodes. Finally, after obtaining graph-level representations for the claim-based and evidence-based graphs, we align the two graphs and make the final prediction.
Node Representation
The reasoning module, built on top of XLNet, takes the hidden vectors learned by XLNet to initialize the node representations.
Each node in the graph is a word span in the input text, and the initial representation of each node is the average of the hidden vectors at the corresponding positions. Afterwards, the representations are updated by the graph learning module.
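As a small illustration, the span-averaging initialization could be implemented as follows (the tensor shapes and the span format are assumptions):

```python
import torch

def init_node_representations(hidden_states, node_spans):
    """Average the backbone's hidden vectors over each node's word-piece span.

    hidden_states: tensor of shape (seq_len, d) produced by XLNet.
    node_spans: list of (start, end) indices, end exclusive, one per node.
    """
    node_vectors = [hidden_states[start:end].mean(dim=0)
                    for start, end in node_spans]
    return torch.stack(node_vectors)       # shape (num_nodes, d)
```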
Graph Representation Learning
In this part, we present the graph learning module, which is designed to update the representations of nodes by aggregating information from their neighbors. To capture multi-hop relational information, we employ a multi-layer graph convolutional network (GCN) [6] to update the node representations. Our intuition for using GCNs is to exploit their ability to automatically aggregate information through edges.
Here we describe the GCN. Formally, we denote by $G$ the graph constructed by the previous graph construction method and let $H \in \mathbb{R}^{N \times d}$ be a matrix containing the representations of all nodes, where $N$ and $d$ denote the number of nodes and the dimension of node representations, respectively. Each row $H_i \in \mathbb{R}^{d}$ is the representation of node $i$. We introduce the adjacency matrix $A$ of graph $G$ and its degree matrix $D$, where we add self-loops to the matrix $A$ and $D_{ii} = \sum_{j} A_{ij}$.
Specifically, a one-layer GCN aggregates information through one-hop edges, which we describe as follows:

$$H^{(1)} = \sigma\big(\widetilde{A}\, H\, W^{(0)}\big) \tag{6}$$

where $H^{(1)}_i \in \mathbb{R}^{d'}$ is the new $d'$-dimensional representation of node $i$, $\widetilde{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ is the normalized symmetric adjacency matrix, $W^{(0)}$ is a weight matrix, and $\sigma$ is an activation function. To exploit information from multi-hop neighboring nodes, we stack multiple GCN layers:

$$H^{(k+1)} = \sigma\big(\widetilde{A}\, H^{(k)}\, W^{(k)}\big) \tag{7}$$

where $k$ denotes the layer number and $H^{(0)}_i$ is the initial representation of node $i$ obtained from the contextual representation. We simplify $H^{(K)}$ as $H$ for later use, where $H$ indicates the representations of all nodes updated by the $K$-layer GCN. The graph learning mechanism is performed separately for the claim-based and evidence-based graphs. Therefore, we denote by $H^{c}$ and $H^{e}$ the representations of all nodes in the claim-based and evidence-based graphs, respectively. Afterwards, we utilize the graph matching module to align the graph-level node representations learned for the two graphs and make the final prediction.
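A compact PyTorch sketch of the stacked GCN update in Equations 6 and 7 is given below; the layer dimensions, activation choice, and normalization details are illustrative assumptions rather than the exact configuration of our system.

```python
import torch
import torch.nn as nn

class GCN(nn.Module):
    """K-layer GCN: H^(k+1) = sigma(A_hat @ H^(k) @ W^(k))."""

    def __init__(self, dim, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(dim, dim, bias=False) for _ in range(num_layers))

    @staticmethod
    def normalize(adj):
        # Add self-loops, then symmetric normalization D^-1/2 (A + I) D^-1/2.
        adj = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

    def forward(self, node_repr, adj):
        a_hat = self.normalize(adj)
        h = node_repr                       # H^(0): contextual node vectors
        for layer in self.layers:
            h = torch.relu(a_hat @ layer(h))
        return h                            # node representations after K layers
```

The same module would be applied twice, once to the claim-based graph and once to the evidence-based graph.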
Graph Matching
We need to explore the related information between the two graphs and perform semantic alignment for the final prediction.
Formally, let $H^{e} \in \mathbb{R}^{N_e \times d}$ and $H^{c} \in \mathbb{R}^{N_c \times d}$ denote the matrices containing the representations of all nodes in the evidence-based and claim-based graphs, respectively, where $N_e$ and $N_c$ denote the number of nodes in the corresponding graphs.
We first employ a graph attention mechanism [17] to generate a claim-specific evidence representation for each node in the claim-based graph. Specifically, we take each claim node representation $H^{c}_i$ as a query, and take all evidence node representations $H^{e}_j$ as keys. We then perform graph attention on the nodes, an attention mechanism $a$ to compute the attention coefficient as follows:
$$e_{ij} = a\big(W H^{c}_i,\ W H^{e}_j\big) \tag{8}$$

which means the importance of evidence node $j$ to claim node $i$. $W \in \mathbb{R}^{F \times d}$ is the weight matrix and $F$ is the dimension of the attention feature. We use the dot-product function as $a$ here. We then normalize $e_{ij}$ using the softmax function:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k}\exp(e_{ik})} \tag{9}$$

Afterwards, we calculate the claim-centric evidence representation $X = [x_1, \ldots, x_{N_c}]$ using the weighted sum over $H^{e}$:

$$x_i = \sum_{j} \alpha_{ij}\, H^{e}_j \tag{10}$$
We then perform node-to-node alignment and calculate the aligned vectors $A = [a_1, \ldots, a_{N_c}]$ from the claim node representations $H^{c}$ and the claim-centric evidence representations $X$:

$$a_i = f_{\mathrm{align}}\big(H^{c}_i,\ x_i\big) \tag{11}$$

where $f_{\mathrm{align}}$ denotes the alignment function. Inspired by Shen et al. (2018), we design our alignment function as:
$$f_{\mathrm{align}}(x, y) = W_a\,\big[\,x;\ y;\ x \odot y\,\big] \tag{12}$$

where $W_a$ is a weight matrix and $\odot$ is the element-wise Hadamard product. The final output $g$ is obtained by mean pooling over $A$. We then feed the concatenated vector of $g$ and the final hidden vector from XLNet through an MLP layer for the final prediction.
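The matching step can be sketched as a single-head attention plus alignment module, as below; the shared projection, the concatenation pattern inside the alignment function, and the classifier shape are simplifying assumptions rather than the exact model configuration.

```python
import torch
import torch.nn as nn

class GraphMatcher(nn.Module):
    """Align claim-graph nodes to evidence-graph nodes and classify the claim."""

    def __init__(self, dim, attn_dim, num_classes=3):
        super().__init__()
        self.proj = nn.Linear(dim, attn_dim, bias=False)   # W in Eq. 8
        self.align = nn.Linear(3 * dim, dim)                # W_a in Eq. 12
        self.classifier = nn.Linear(2 * dim, num_classes)   # final MLP layer

    def forward(self, h_claim, h_evidence, h_xlnet):
        # h_claim: (Nc, d), h_evidence: (Ne, d), h_xlnet: (d,).
        scores = self.proj(h_claim) @ self.proj(h_evidence).t()       # Eq. 8
        alpha = torch.softmax(scores, dim=-1)                         # Eq. 9
        x = alpha @ h_evidence                 # claim-centric evidence, Eq. 10
        a = self.align(torch.cat([h_claim, x, h_claim * x], dim=-1))  # Eq. 11-12
        g = a.mean(dim=0)                      # mean pooling over claim nodes
        return self.classifier(torch.cat([g, h_xlnet], dim=-1))
```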
Experiments
We conduct experiments on FEVER [14], a benchmark dataset for fact extraction and verification. Each instance in the FEVER dataset consists of a claim, groups of ground-truth evidence from Wikipedia, and a label (i.e., SUPPORTED, REFUTED, NOT ENOUGH INFO) indicating its veracity. Furthermore, FEVER comes with a dump of Wikipedia, which contains 5,416,537 preprocessed documents. The statistics of FEVER are shown in Table 1.
Split | SUPPORTED | REFUTED | NEI |
---|---|---|---|
Training | 80,035 | 29,775 | 35,659 |
Dev | 6,666 | 6,666 | 6,666 |
Test | 6,666 | 6,666 | 6,666 |
The two official evaluation metrics of FEVER are label accuracy and FEVER score. Label accuracy is the primary evaluation metric we use in our experiments because it directly reflects the performance of the claim verification model. We also report the FEVER score, which measures whether both the predicted label and the retrieved evidence are correct. The FEVER score is calculated as in Equation 13, where $y$ is the ground-truth label, $\hat{y}$ is the predicted label, $E$ is the set of ground-truth evidence, and $\hat{E}$ is the set of predicted evidence.
$$\mathrm{FEVER\ score} = \mathbb{1}\big[\hat{y} = y\big] \cdot \mathbb{1}\big[E \subseteq \hat{E}\big] \tag{13}$$
No evidence is required if the predicted label is NEI.
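For clarity, a small Python sketch of how this metric can be computed for a single instance (assuming evidence is represented as sets of sentence identifiers) is:

```python
def fever_score(pred_label, gold_label, pred_evidence, gold_evidence_groups):
    """Return 1 if the label is correct and the evidence requirement is met.

    gold_evidence_groups: list of sets, each being one complete group of
        ground-truth evidence sentences (recovering any one group suffices).
    """
    if pred_label != gold_label:
        return 0
    if gold_label == "NOT ENOUGH INFO":        # no evidence required for NEI
        return 1
    retrieved = set(pred_evidence)
    return int(any(group <= retrieved for group in gold_evidence_groups))
```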
Baselines
We select three top-performing systems from the FEVER shared task as baselines.
The UNC-NLP [9] employed a semantic matching neural network for both evidence selection and claim verification. They also employed additional features (e.g., WordNet features) and symbolic rules (e.g., keywords matching).
The UCL Machine Reading Group [20] verifies the veracity of each claim-evidence pair and aggregates the predicted labels.
The Athene UKP TU Darmstadt team [5] encodes each claim-evidence pair, followed by a pooling function.
Model Comparison
Table 2 reports the overall performance of our model on the blind test set, with scores shown on the public leaderboard (the public leaderboard for perpetual evaluation of FEVER: https://competitions.codalab.org/competitions/18814#results; DREAM is our user name on the leaderboard). As shown in Table 2, our model significantly outperforms previous systems with 76.85% label accuracy and a 70.60% FEVER score. At the time of paper submission, our system achieved state-of-the-art performance compared with the other methods on the leaderboard.
Method | Label Acc. (%) | FEVER Score (%)
---|---|---
Athene | 65.46 | 61.58 |
UCL Machine Reading Group | 67.62 | 62.52 |
UNC-NLP | 68.21 | 64.21 |
GEAR-single | 71.60 | 67.10 |
DREAM (our approach) | 76.85 | 70.60 |
Ablation Study
Table 3 presents the label accuracy on the development set after eliminating the different components (the graph-based relational distance and the graph-based reasoning network) separately from our model. We also report the performance of our XLNet-based baseline, which does not use any graph information and is equivalent to removing both components simultaneously.
Model | Label Accuracy
---|---
DREAM | 79.16 |
-w/o Relative position | 78.35 |
-w/o Graph Reasoning | 77.12 |
XLNet baseline | 75.40 |
As shown in Table 3, compared to the XLNet baseline, incorporating both graph-based modules brings a 3.76% improvement in label accuracy. Removing the graph-based distance drops label accuracy by 0.81%; the graph-based distance mechanism shortens the distance between two closely linked nodes and helps the model learn their dependency. Removing the graph-based reasoning module drops accuracy by 2.04%, because the graph reasoning module captures structural information and performs deep reasoning over it.
Document Retrieval Results
We evaluate the performance of our document retrieval module using the recall metric, defined as the proportion of ground-truth documents that are successfully retrieved.
Table 4 reports the results of an efficient system (first row), which is built purely on keywords from the claim and the titles of all documents with Elastic Search (https://www.elastic.co/), and of its combination with a neural network based model. The recall of the symbolic system is good, yet it can be improved by the neural model; this is a trade-off between efficiency and performance in real applications.
Method | Train Recall | Dev. Recall
---|---|---
Keywords+Elastic Search | 80.46 | 83.33 |
Keywords+Elastic Search+NNSM | 89.16 | 89.85 |
Evidence Selection Results
In this part, we present the performance of the sentence-level evidence selection module developed with different backbones. We take the concatenation of the claim and each evidence candidate as input, and use the last hidden vector to calculate the score for evidence ranking. Results in Table 5 indicate that RoBERTa performs slightly better than XLNet.
Model | Dev. Acc. | Dev. Rec. | Dev. F1 | Test Acc. | Test Rec. | Test F1
---|---|---|---|---|---|---
XLNet | 26.60 | 87.33 | 40.79 | 25.55 | 85.34 | 39.33 |
RoBERTa | 26.67 | 87.64 | 40.90 | 25.63 | 85.57 | 39.45 |
Error Analysis
We randomly select 200 incorrectly predicted instances and summarize the primary types of errors.
The first type of error is caused by failing to match the semantic meaning of phrases that describe the same event. For example, the claim states “Winter’s Tale is a book.” while the evidence states “Winter ’s Tale is a 1983 novel by Mark Helprin.”. The model fails to realize that a “novel” is a kind of “book” and states that the claim is refuted. Solving this type of error requires external knowledge (e.g., ConceptNet [13]) that can indicate logical relationships between different concepts.
The second type of error is caused by misleading information in the retrieved evidence. For example, the claim states “The Gifted is a movie”, and the ground-truth evidence states “The Gifted is an upcoming American television series”. However, the retrieved evidence also contains “The Gifted is a 2014 Filipino dark comedy-drama movie.”, which misleads the model into making the wrong judgement.
Related Work
In general, fact checking involves assessing the truthfulness of a claim. In the literature, a claim can be a text or a subject-predicate-object triple [8]; in this work, we only consider textual claims. Existing datasets differ in their data sources and the type of supporting evidence for verifying the claim. An early work by Vlachos and Riedel (2014) constructs 221 labeled claims in the political domain from POLITIFACT.COM and CHANNEL4.COM, given meta-data of the speaker as the evidence. POLITIFACT is further investigated by follow-up works, including Ferreira and Vlachos (2016), who build Emergent with 300 labeled rumors and about 2.6K news articles; Wang (2017), who builds LIAR with 12.8K annotated short statements and six fine-grained labels; and Rashkin et al. (2017), who collect claims without meta-data while providing 74K news articles. We study FEVER [14], which requires aggregating information from multiple pieces of evidence from Wikipedia to make the conclusion. FEVER contains 185,445 annotated instances, which to the best of our knowledge is the largest benchmark dataset in this area. We plan to study fact checking with adversarial attacks [16, 11] in the future.
The majority of participating teams in the FEVER challenge [15] use the same pipeline consisting of three components, namely document selection, evidence sentence selection, and claim verification. In the document selection phase, participants typically extract named entities from a claim as the query and use the Wikipedia search API. In the evidence selection phase, participants measure the similarity between the claim and an evidence sentence candidate by training a classification model such as Enhanced LSTM [2] in a supervised setting or by using a string similarity function such as TF-IDF without trainable parameters. In this work, our focus is the claim classification phase. The top three ranked systems aggregate pieces of evidence by concatenating evidence sentences into a single string [9], by classifying each evidence-claim pair separately and merging the results [20], or by encoding each evidence-claim pair followed by a pooling operation [5]. A recent work by Zhou et al. [21] is the first to use BERT to calculate claim-specific evidence sentence representations, and then develops a graph network to aggregate the information on top of BERT, regarding each piece of evidence as a node in the graph. Our work differs from Zhou et al. [21] in that (1) the construction of our graph requires understanding the syntax of each sentence, which can be viewed as a more fine-grained graph, and (2) both the contextual representation learning module and the reasoning module contain model innovations that take the graph information into consideration. Instead of training each component separately, Yin and Roth (2018) show that joint learning can improve both claim verification and evidence identification.

Conclusion
In this work, we present a graph-based approach for fact checking. When assessing the veracity of a claim given multiple evidence sentences, our approach does not conduct text-based matching at the word or sentence level. Instead, it is built upon an automatically constructed graph derived from semantic role labeling. To better exploit the graph information, we propose two graph-based modules, one for calculating contextual word embeddings using graph-based distance in XLNet, and another for learning representations of graph components and reasoning over the graph. Experiments show that both graph-based modules bring improvements, and our final system was the state of the art on the public leaderboard at the time of paper submission. In the future, we plan to leverage external background knowledge about the claim and evidence to improve the model's reasoning ability.
References
- [1] (2014) NaturalLI: natural logic inference for common sense reasoning. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 534–545.
- [2] (2016) Enhanced LSTM for natural language inference. arXiv preprint arXiv:1609.06038.
- [3] (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- [4] (2017) Partisanship, propaganda, and disinformation: online media and the 2016 US presidential election.
- [5] (2018) UKP-Athene: multi-sentence textual entailment for claim verification. arXiv preprint arXiv:1809.01479.
- [6] (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
- [7] (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
- [8] (2014) Language-aware truth assessment of fact candidates. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1009–1019.
- [9] (2019) Combining fact extraction and verification with neural semantic matching networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 6859–6866.
- [10] (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
- [11] (2019) Towards debiasing fact verification models. arXiv preprint arXiv:1908.05267.
- [12] (2019) Simple BERT models for relation extraction and semantic role labeling. arXiv preprint arXiv:1904.05255.
- [13] (2017) ConceptNet 5.5: an open multilingual graph of general knowledge. In Thirty-First AAAI Conference on Artificial Intelligence.
- [14] (2018) FEVER: a large-scale dataset for fact extraction and verification. arXiv preprint arXiv:1803.05355.
- [15] (2018) The Fact Extraction and VERification (FEVER) shared task. arXiv preprint arXiv:1811.10971.
- [16] (2019) Adversarial attacks against fact extraction and verification. arXiv preprint arXiv:1903.05543.
- [17] (2017) Graph attention networks. arXiv preprint arXiv:1710.10903.
- [18] (2018) The spread of true and false news online. Science 359 (6380), pp. 1146–1151.
- [19] (2019) XLNet: generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.
- [20] (2018) UCL Machine Reading Group: four factor framework for fact finding (HexaF). In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), pp. 97–102.
- [21] (2019) GEAR: graph-based evidence aggregating and reasoning for fact verification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 892–901.