Relation extraction aims to detect relations among entities in text and plays a significant role in a variety of natural language processing applications. Early research efforts focused on predicting relations between entities within a single sentence (Zeng et al., 2014; Xu et al., 2015a, b). However, in real-world scenarios, valuable relational information between entities, such as biomedical findings, is expressed by multiple mentions across sentence boundaries (Peng et al., 2017). Therefore, the scope of extraction in the biomedical domain has recently been expanded to the cross-sentence level (Quirk and Poon, 2017; Gupta et al., 2018; Song et al., 2019).
A more challenging, yet practical, extension is document-level relation extraction, where a system needs to comprehend multiple sentences to infer the relations among entities by synthesizing relevant information from the entire document (Jia et al., 2019; Yao et al., 2019). Figure 1 shows an example adapted from the recently proposed document-level dataset DocRED (Yao et al., 2019). In order to infer the inter-sentence relation (i.e., country of citizenship) between Yulia Tymoshenko and Ukrainian, we first have to identify the fact that Lutsenko works with Yulia Tymoshenko. Next, we identify that Lutsenko manages internal affairs, which is a Ukrainian authority. After incrementally connecting the evidence in the document and performing this step-by-step reasoning, we are able to infer that Yulia Tymoshenko is also Ukrainian.
Prior efforts show that interactions between mentions of entities facilitate the reasoning process in document-level relation extraction. Thus, Verga et al. (2018) and Jia et al. (2019) leverage Multi-Instance Learning (Riedel et al., 2010; Surdeanu et al., 2012). On the other hand, structural information has been used to perform better reasoning, since it models non-local dependencies that are obscure from the surface form alone. Peng et al. (2017) construct a dependency graph to capture interactions among $n$-ary entities for cross-sentence extraction. Sahu et al. (2019) extend this approach by using co-reference links to connect the dependency trees of sentences into a document-level graph. Instead, Christopoulou et al. (2019) construct a heterogeneous graph based on a set of heuristics, and then apply an edge-oriented model (Christopoulou et al., 2018) to perform inference.
Unlike previous methods, where a document-level structure is constructed by co-references and rules, our proposed model treats the graph structure as a latent variable and induces it in an end-to-end fashion. Our model is built based on structured attention (Kim et al., 2017; Liu and Lapata, 2018). Using a variant of the Matrix-Tree Theorem (Tutte, 1984; Koo et al., 2007), our model is able to generate task-specific dependency structures for capturing non-local interactions between entities. We further develop an iterative refinement strategy, which enables our model to dynamically build the latent structure based on the last iteration, allowing the model to incrementally capture the complex interactions for better multi-hop reasoning (Welbl et al., 2018).
Experiments show that our model significantly outperforms existing approaches on DocRED, a large-scale document-level relation extraction dataset with a large number of entities and relations, and also yields new state-of-the-art results on two popular document-level relation extraction datasets in the biomedical domain. The code and pretrained model are available at https://github.com/nanguoshun/LSR. Our model is implemented in PyTorch (Paszke et al., 2017).
Our contributions are summarized as follows:
- We construct a document-level graph for inference in an end-to-end fashion without relying on co-references or rules, which may not always yield optimal structures. With the iterative refinement strategy, our model is able to dynamically construct a latent structure for improved information aggregation in the entire document.
- We perform quantitative and qualitative analyses to compare with the state-of-the-art models in various settings. We demonstrate that our model is capable of discovering more accurate inter-sentence relations by utilizing a multi-hop reasoning module.
In this section, we present our proposed Latent Structure Refinement (LSR) model for the document-level relation extraction task. Our LSR model consists of three components: node constructor, dynamic reasoner, and classifier. The node constructor first encodes each sentence of an input document and outputs contextual representations. Representations that correspond to mentions and tokens on the shortest dependency path in a sentence are extracted as nodes. The dynamic reasoner is then applied to induce a document-level structure based on the extracted nodes. Representations of nodes are updated based on information propagation on the latent structure, which is iteratively refined. Final representations of nodes are used to calculate classification scores by the classifier.
2.1 Node Constructor
The node constructor encodes sentences in a document into contextual representations and constructs representations of mention nodes, entity nodes and meta dependency path (MDP) nodes, as shown in Figure 2. Here an MDP indicates the set of shortest dependency paths for all mentions in a sentence, and tokens in the MDP are extracted as MDP nodes.
2.1.1 Context Encoding
Given a document $d$, each sentence $s_j$ in it is fed to the context encoder, which outputs the contextualized representations of each word in $s_j$. The context encoder can be a bidirectional LSTM (BiLSTM) (Schuster and Paliwal, 1997) or BERT (Devlin et al., 2019). Here we use the BiLSTM as an example:

$$\overrightarrow{\mathbf{h}}_i^j = \mathrm{LSTM}_l\big(\overrightarrow{\mathbf{h}}_{i-1}^j,\, \boldsymbol{\gamma}_i^j\big), \qquad \overleftarrow{\mathbf{h}}_i^j = \mathrm{LSTM}_r\big(\overleftarrow{\mathbf{h}}_{i+1}^j,\, \boldsymbol{\gamma}_i^j\big)$$

where $\overrightarrow{\mathbf{h}}_i^j$, $\overleftarrow{\mathbf{h}}_{i+1}^j$ and $\overrightarrow{\mathbf{h}}_{i-1}^j$ represent the hidden representations of the $i$-th, $(i{+}1)$-th and $(i{-}1)$-th tokens of sentence $s_j$ in the two directions, and $\boldsymbol{\gamma}_i^j$ denotes the word embedding of the $i$-th token. The contextual representation of each token in the sentence is formed by concatenating the hidden states of the two directions, $\mathbf{h}_i^j = [\overrightarrow{\mathbf{h}}_i^j; \overleftarrow{\mathbf{h}}_i^j] \in \mathbb{R}^{d}$, where $d$ is the dimension.
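To make the encoding step concrete, the following is a minimal numpy sketch of a BiLSTM encoder. The function names and the stacked gate layout are our own simplification for illustration; the actual model uses a standard PyTorch BiLSTM.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h, c, W, U, b):
    """One LSTM step; gates are stacked as [input, forget, output, candidate]."""
    d = h.shape[0]
    z = W @ x + U @ h + b
    i, f, o = sigmoid(z[:d]), sigmoid(z[d:2 * d]), sigmoid(z[2 * d:3 * d])
    g = np.tanh(z[3 * d:])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def bilstm_encode(embeddings, W, U, b):
    """Run an LSTM over the sentence in both directions and concatenate
    the two hidden states of every token."""
    d = U.shape[1]
    def run(seq):
        h, c, states = np.zeros(d), np.zeros(d), []
        for x in seq:
            h, c = lstm_cell(x, h, c, W, U, b)
            states.append(h)
        return states
    forward = run(embeddings)
    backward = run(embeddings[::-1])[::-1]
    return [np.concatenate([f, b_]) for f, b_ in zip(forward, backward)]
```

For a sentence of T tokens with embedding size $d_x$ and hidden size $d$, this returns T vectors of size $2d$, one contextual representation per token.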
2.1.2 Node Extraction
We construct three types of nodes for a document-level graph: mention nodes, entity nodes and meta dependency paths (MDP) nodes as shown in Figure 2. Mention nodes correspond to different mentions of entities in each sentence. The representation of an entity node is computed as the average of its mentions. To build a document-level graph, existing approaches use all nodes in the dependency tree of a sentence (Sahu et al., 2019) or one sentence-level node by averaging all token representations of the sentence (Christopoulou et al., 2019). Alternatively, we use tokens on the shortest dependency path between mentions in the sentence. The shortest dependency path has been widely used in the sentence-level relation extraction as it is able to effectively make use of relevant information while ignoring irrelevant information (Bunescu and Mooney, 2005; Xu et al., 2015a, b). Unlike sentence-level extraction, where each sentence only has two entities, each sentence here may involve multiple mentions.
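As an illustration of how MDP nodes could be extracted, here is a small pure-Python sketch that finds the shortest dependency path between two mention tokens, given each token's head index (as produced by any dependency parser). The function name and input format are our own assumptions, not the paper's implementation.

```python
from collections import deque

def shortest_dependency_path(heads, start, end):
    """BFS over the undirected dependency tree given per-token head indices
    (the root token's head is -1); returns the token indices on the path."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for tok, head in enumerate(heads):
        if head >= 0:
            adj[tok].append(head)
            adj[head].append(tok)
    prev = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == end:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]
```

The MDP of a sentence would then be the union of such paths over all pairs of mention tokens in that sentence.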
2.2 Dynamic Reasoner
The dynamic reasoner has two modules, structure induction and multi-hop reasoning, as shown in Figure 3. The structure induction module is used to learn a latent structure of the document-level graph. The multi-hop reasoning module is used to perform inference on the induced latent structure, where the representation of each node is updated based on an information aggregation scheme. We stack $N$ blocks in order to iteratively refine the latent document-level graph for better reasoning.
2.2.1 Structure Induction
Unlike existing models that use co-reference links (Sahu et al., 2019) or heuristics (Christopoulou et al., 2019) to construct a document-level graph for reasoning, our model treats the graph as a latent variable and induces it in an end-to-end fashion. The structure induction module is built based on the structured attention (Kim et al., 2017; Liu and Lapata, 2018). Inspired by Liu and Lapata (2018), we use a variant of Kirchhoff’s Matrix-Tree Theorem (Tutte, 1984; Koo et al., 2007) to induce the latent dependency structure.
Let $\mathbf{u}_i$ denote the contextual representation of the $i$-th node, where $\mathbf{u}_i \in \mathbb{R}^{d}$. We first calculate the pair-wise unnormalized attention score $s_{ij}$ between the $i$-th and the $j$-th node with the node representations $\mathbf{u}_i$ and $\mathbf{u}_j$. The score $s_{ij}$ is calculated by two feed-forward neural networks and a bilinear transformation:

$$s_{ij} = \big(\tanh(\mathbf{W}_p \mathbf{u}_i)\big)^{\top} \mathbf{W}_b \tanh(\mathbf{W}_c \mathbf{u}_j)$$

where $\mathbf{W}_p, \mathbf{W}_c \in \mathbb{R}^{d \times d}$ are the weights for the two feed-forward neural networks, $d$ is the dimension of the node representations, $\tanh$ is applied as the activation function, and $\mathbf{W}_b \in \mathbb{R}^{d \times d}$ are the weights for the bilinear transformation. Next we compute the root score $s_i^r$, which represents the unnormalized probability of the $i$-th node being selected as the root node of the structure:

$$s_i^r = \mathbf{W}_r \mathbf{u}_i$$

where $\mathbf{W}_r \in \mathbb{R}^{1 \times d}$ is the weight for the linear transformation. Following Koo et al. (2007), we calculate the marginal probability of each dependency edge of the document-level graph. For a graph $G$ with $n$ nodes, we first assign non-negative weights $\mathbf{P} \in \mathbb{R}^{n \times n}$ to the edges of the graph:

$$P_{ij} = \begin{cases} 0 & \text{if } i = j \\ \exp(s_{ij}) & \text{otherwise} \end{cases}$$

We then define the Laplacian matrix $\mathbf{L} \in \mathbb{R}^{n \times n}$ of $G$, and its variant $\bar{\mathbf{L}}$ whose first row is replaced by the root weights (Koo et al., 2007):

$$L_{ij} = \begin{cases} \sum_{i'=1}^{n} P_{i'j} & \text{if } i = j \\ -P_{ij} & \text{otherwise} \end{cases} \qquad \bar{L}_{ij} = \begin{cases} \exp(s_j^r) & \text{if } i = 1 \\ L_{ij} & \text{if } i > 1 \end{cases}$$

The marginal probability $A_{ij}$ of the dependency edge from the $i$-th to the $j$-th node is then:

$$A_{ij} = \big(1 - \delta_{1,j}\big) P_{ij} \big[\bar{\mathbf{L}}^{-1}\big]_{jj} - \big(1 - \delta_{1,i}\big) P_{ij} \big[\bar{\mathbf{L}}^{-1}\big]_{ji}$$

where $\delta$ is the Kronecker delta. Here, $\mathbf{A}$ can be interpreted as a weighted adjacency matrix of the document-level entity graph. Finally, we can feed $\mathbf{A}$ into the multi-hop reasoning module to update the representations of nodes in the latent structure.
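The marginal computation described above can be sketched in numpy as follows. This is our own illustrative implementation of the single-root Matrix-Tree construction of Koo et al. (2007), not the authors' code; `tree_marginals` takes the pairwise scores and root scores and returns the edge and root marginals.

```python
import numpy as np

def tree_marginals(scores, root_scores):
    """Marginal probabilities of dependency edges under the distribution over
    non-projective trees defined by exp(scores), via Kirchhoff's Matrix-Tree
    Theorem (single-root construction of Koo et al., 2007)."""
    n = scores.shape[0]
    A = np.exp(scores)                # non-negative edge weights
    np.fill_diagonal(A, 0.0)          # no self-loops
    r = np.exp(root_scores)           # unnormalized root weights
    L = -A.copy()                     # Laplacian: L_jj = sum_i A_ij, L_ij = -A_ij
    np.fill_diagonal(L, A.sum(axis=0))
    L_hat = L.copy()
    L_hat[0, :] = r                   # replace the first row with root weights
    inv = np.linalg.inv(L_hat)
    P = np.zeros((n, n))              # P[i, j]: marginal of the edge i -> j
    for j in range(n):
        for i in range(n):
            if i == j:
                continue
            term1 = inv[j, j] if j != 0 else 0.0
            term2 = inv[j, i] if i != 0 else 0.0
            P[i, j] = A[i, j] * (term1 - term2)
    root_P = r * inv[:, 0]            # marginal of node j attaching to the root
    return P, root_P
```

A handy sanity check: since every node has exactly one parent in a dependency tree, the marginals of a node's incoming edges plus its root marginal sum to one.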
2.2.2 Multi-hop Reasoning
Graph neural networks have been widely used in different tasks to perform multi-hop reasoning (Song et al., 2018a; Yang et al., 2019; Tu et al., 2019; Lin et al., 2019), as they are able to effectively collect relevant evidence based on an information aggregation scheme. Specifically, our model is based on graph convolutional networks (GCNs) (Kipf and Welling, 2017) to perform reasoning.
Formally, given a graph $G$ with $n$ nodes, which can be represented with an $n \times n$ adjacency matrix $\mathbf{A}$ induced by the previous structure induction module, the convolution computation for the node $u_i$ at the $l$-th layer, which takes the representation $\mathbf{u}_i^{l-1}$ from the previous layer as input and outputs the updated representation $\mathbf{u}_i^{l}$, can be defined as:

$$\mathbf{u}_i^{l} = \sigma\Big(\sum_{j=1}^{n} A_{ij} \mathbf{W}^{l} \mathbf{u}_j^{l-1} + \mathbf{b}^{l}\Big)$$

where $\mathbf{W}^{l}$ and $\mathbf{b}^{l}$ are the weight matrix and bias vector for the $l$-th layer, respectively, $\sigma$ is the ReLU (Nair and Hinton, 2010) activation function, and $\mathbf{u}_i^{0}$ is the initial contextual representation of the $i$-th node constructed by the node constructor.
Following Guo et al. (2019b), we use dense connections to the GCNs in order to capture more structural information on a large document-level graph. With the help of dense connections, we are able to train a deeper model, allowing richer local and non-local information to be captured for learning a better graph representation. The computations on each graph convolution layer is similar to Equation (9).
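A compact numpy sketch of this propagation step, including the dense connections, is given below. This is our own simplified rendering: the paper follows Guo et al. (2019b), whose densely connected GCNs include sub-layer and dimension-reduction details that we omit here.

```python
import numpy as np

def gcn_layer(A, H, W, b):
    """u_i = ReLU(sum_j A_ij * W u_j + b), batched over all nodes."""
    return np.maximum(0.0, A @ (H @ W.T) + b)

def dense_gcn(A, H, weights, biases):
    """Densely connected GCN: each layer consumes the concatenation of the
    initial node features and all previous layer outputs."""
    outputs = [H]
    for W, b in zip(weights, biases):
        layer_input = np.concatenate(outputs, axis=1)
        outputs.append(gcn_layer(A, layer_input, W, b))
    return np.concatenate(outputs, axis=1)
```

Because each layer sees the concatenation of all earlier outputs, gradients have short paths to every layer, which is what makes training deeper GCN stacks feasible.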
2.2.3 Iterative Refinement
Though structured attention (Kim et al., 2017; Liu and Lapata, 2018) is able to automatically induce a latent structure, recent research efforts show that the induced structure is relatively shallow and may not be able to model the complex dependencies for document-level input (Liu et al., 2019b; Ferracane et al., 2019). Unlike previous work (Liu and Lapata, 2018) that only induces the latent structure once, we repeatedly refine the document-level graph based on the updated representations, allowing the model to infer a more informative structure that goes beyond simple parent-child relations.
As shown in Figure 3, we stack $N$ blocks of the dynamic reasoner in order to induce the document-level structure $N$ times. Intuitively, the reasoner induces a shallow structure at early iterations, since the information propagates mostly between neighboring nodes. As the structure gets more refined by interactions with richer non-local information, the induction module is able to generate a more informative structure.
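The refinement loop can be sketched as follows. For brevity, a row-softmax over bilinear scores stands in for the Matrix-Tree marginals used in the paper, and a single plain GCN layer stands in for the densely connected reasoner; all names here are our own.

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def refine(H, Wq, Wk, Wg, bg, num_blocks=2):
    """Stacked dynamic-reasoner blocks: the structure is re-induced from the
    *updated* node representations at every iteration."""
    for _ in range(num_blocks):
        scores = (H @ Wq) @ (H @ Wk).T            # pairwise attention scores
        A = softmax_rows(scores)                   # induced (simplified) structure
        H = np.maximum(0.0, A @ (H @ Wg.T) + bg)  # one GCN propagation step
    return H
```

The key point illustrated is that induction and propagation alternate: each block's graph is computed from representations that already reflect the previous block's aggregation.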
After $N$ iterations of refinement, we obtain representations of all the nodes. Following Yao et al. (2019), for each entity pair $(\mathbf{e}_i, \mathbf{e}_j)$, we use a bilinear function to compute the probability for each relation type $r$ as:

$$P(r \mid \mathbf{e}_i, \mathbf{e}_j) = \sigma\big(\mathbf{e}_i^{\top} \mathbf{W}_e \mathbf{e}_j + \mathbf{b}_e\big)_r$$

where $\mathbf{W}_e \in \mathbb{R}^{d \times k \times d}$ and $\mathbf{b}_e \in \mathbb{R}^{k}$ are trainable weights and bias, with $k$ being the number of relation categories, $\sigma$ is the sigmoid function, and the subscript $r$ on the right side of the equation refers to the relation type.
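Such a bilinear scorer can be sketched in a few lines of numpy. The function name and the tensor layout (W as a k x d x d stack, one bilinear form per relation type) are our own assumptions for illustration.

```python
import numpy as np

def relation_probs(e_i, e_j, W, b):
    """Per-relation probabilities sigmoid(e_i^T W_r e_j + b_r); an independent
    sigmoid per type allows multiple relations to hold for the same pair."""
    logits = np.einsum('i,rij,j->r', e_i, W, e_j) + b
    return 1.0 / (1.0 + np.exp(-logits))
```

Using a sigmoid per relation rather than one softmax over all types reflects the multi-label nature of the task: an entity pair may express several relations at once, or none.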
We evaluate our model on DocRED (Yao et al., 2019), the largest human-annotated dataset for document-level relation extraction, and on two popular document-level relation extraction datasets in the biomedical domain: Chemical-Disease Reactions (CDR) (Li et al., 2016a) and Gene-Disease Associations (GDA) (Wu et al., 2019). DocRED and CDR provide standard training, development and test splits; GDA provides training and test sets, and we follow Christopoulou et al. (2019) in splitting its training set 80/20 for training and development.
With a large share of its relational facts requiring reading and reasoning over multiple sentences, DocRED significantly differs from previous sentence-level datasets (Doddington et al., 2004; Hendrickx et al., 2009; Zhang et al., 2018). Unlike existing document-level datasets (Li et al., 2016a; Quirk and Poon, 2017; Peng et al., 2017; Verga et al., 2018; Jia et al., 2019), which come from the biomedical domain and consider only drug-gene-disease relations, DocRED covers a broad range of relation categories.
| Hyper-parameter | Value |
| --- | --- |
| Induction block number | 2 |
We use spaCy (https://spacy.io/) to obtain the meta dependency paths of sentences in a document. Following Yao et al. (2019) and Wang et al. (2019), we use GloVe (Pennington et al., 2014) embeddings with a BiLSTM, and uncased BERT-Base (Devlin et al., 2019), as the context encoders. All hyper-parameters are tuned on the development set. We list some of the important hyper-parameters in Table 1.
Following Yao et al. (2019), we use F1 and Ign F1 as the evaluation metrics. Ign F1 denotes F1 scores excluding relational facts shared by the training and dev/test sets. F1 scores for intra- and inter-sentence entity pairs are also reported. Evaluation on the test set is done through CodaLab (https://competitions.codalab.org/competitions/20717).
3.3 Main Results
| Model | Dev Ign F1 | Dev F1 | Intra-F1 | Inter-F1 | Test Ign F1 | Test F1 |
| --- | --- | --- | --- | --- | --- | --- |
| CNN (Yao et al., 2019) | 41.58 | 43.45 | 51.87 | 37.58 | 40.33 | 42.26 |
| LSTM (Yao et al., 2019) | 48.44 | 50.68 | 56.57 | 41.47 | 47.71 | 50.07 |
| BiLSTM (Yao et al., 2019) | 48.87 | 50.94 | 57.05 | 43.49 | 48.78 | 51.06 |
| ContexAware (Yao et al., 2019) | 48.94 | 51.09 | 56.74 | 42.26 | 48.40 | 50.70 |
| GCNN ♣ (Sahu et al., 2019) | 46.22 | 51.52 | 57.78 | 44.11 | 49.59 | 51.62 |
| EoG ♣ (Christopoulou et al., 2019) | 45.94 | 52.15 | 58.90 | 44.60 | 49.48 | 51.82 |
| GAT ♣ (Veličković et al., 2018) | 45.17 | 51.44 | 58.14 | 43.94 | 47.36 | 49.51 |
| AGGCN ♣ (Guo et al., 2019a) | 46.29 | 52.47 | 58.76 | 45.45 | 48.89 | 51.45 |
| BERT (Wang et al., 2019) | - | 54.16 | 61.61 | 47.15 | - | 53.20 |
| Two-Phase BERT (Wang et al., 2019) | - | 54.42 | 61.80 | 47.28 | - | 53.92 |
We compare our proposed LSR with the following three types of competitive models on the DocRED dataset, and show the main results in Table 2.
Graph-based Models. These models construct task-specific graphs for inference. GCNN (Sahu et al., 2019) constructs a document-level graph via co-reference links, and then applies relational GCNs for reasoning. EoG (Christopoulou et al., 2019) is the state-of-the-art document-level relation extraction model in the biomedical domain. EoG first uses heuristics to construct the graph, then leverages an edge-oriented model to perform inference. GCNN and EoG are based on static structures. GAT (Veličković et al., 2018) is able to learn a weighted graph structure based on a local attention mechanism. AGGCN (Guo et al., 2019a) is the state-of-the-art sentence-level relation extraction model, which constructs a latent structure via self-attention. These two models are able to dynamically construct task-specific structures.
BERT-based Models. These models fine-tune BERT (Devlin et al., 2019) for DocRED. Specifically, Two-Phase BERT (Wang et al., 2019) is the best reported model. It is a pipeline model which predicts whether a relation exists between an entity pair in the first phase and predicts the type of the relation in the second phase.
As shown in Table 2, LSR with GloVe achieves 54.18 F1 on the test set, which is the new state-of-the-art result for models with GloVe. In particular, our model consistently outperforms sequence-based models by a significant margin. For example, LSR improves upon the best sequence-based model, BiLSTM, in terms of F1. This suggests that models which directly encode the entire document are unable to capture the inter-sentence relations present in documents.
Under the same setting, our model consistently outperforms graph-based models based on static graphs or attention mechanisms. Compared with EoG, our LSR model achieves 3.0 and 2.4 points higher F1 on the development and test sets, respectively. We make similar observations for the GCNN model, which shows that a static document-level graph may not be able to capture the complex interactions in a document. The dynamic latent structure induced by LSR captures richer non-local dependencies. Moreover, LSR also outperforms GAT and AGGCN. This empirically shows that, compared with models that use local attention and self-attention (Veličković et al., 2018; Guo et al., 2019a), LSR can induce more informative document-level structures for better reasoning. Our LSR model also shows its superiority in terms of Ign F1.
In addition, LSR with GloVe obtains better results than the two BERT-based models. This empirically shows that our model is able to capture long-range dependencies even without a powerful context encoder. Following Wang et al. (2019), we also leverage BERT as the context encoder. As shown in Table 2, our LSR model with BERT achieves a 59.05 F1 score on DocRED, which is a new state-of-the-art result. As of the ACL deadline on the 9th of December 2019, we held the first position on the CodaLab scoreboard under the alias diskorak.
3.4 Intra- and inter-sentence performance
In this subsection, we analyze intra- and inter-sentence performance on the development set. An entity pair requires inter-sentence reasoning if the two entities from the same document have no mentions in the same sentence. In DocRED's development set, a considerable portion of entity pairs require information aggregation over multiple sentences.
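The dichotomy used in this analysis can be stated precisely in a couple of lines (our own helper; inputs are the sentence indices of each entity's mentions):

```python
def requires_inter_sentence(sent_ids_a, sent_ids_b):
    """True when no single sentence contains a mention of both entities,
    i.e. the pair can only be resolved by cross-sentence reasoning."""
    return not (set(sent_ids_a) & set(sent_ids_b))
```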
Under the same setting, our LSR model outperforms all other models in both the intra- and inter-sentence settings. The F1 differences between LSR and other models in the inter-sentence setting tend to be larger than those in the intra-sentence setting. These results demonstrate that the majority of LSR's advantage comes from inter-sentence relational facts, suggesting that the latent structure induced by our model is indeed capable of synthesizing information across multiple sentences of a document.
Furthermore, LSR with GloVe also proves better in the inter-sentence setting than the two BERT-based models (Wang et al., 2019), indicating the latent structure's superiority over the BERT encoder in resolving long-range dependencies across the whole document.
3.5 Results on the Biomedical Datasets
| Model | F1 | Intra-F1 | Inter-F1 |
| --- | --- | --- | --- |
| Gu et al. (2017) | 61.3 | 57.2 | 11.7 |
| Nguyen and Verspoor (2018) | 62.3 | - | - |
| Verga et al. (2018) | 62.1 | - | - |
| Sahu et al. (2019) | 58.6 | - | - |
| Christopoulou et al. (2019) | 63.6 | 68.2 | 50.9 |
| LSR w/o MDP Nodes | 64.8 | 68.9 | 53.1 |
| Peng et al. (2016) | 63.1 | - | - |
| Li et al. (2016b) | 67.3 | 58.9 | - |
| Panyam et al. (2018) | 60.3 | 65.1 | 45.7 |
| Zheng et al. (2018) | 61.5 | - | - |
Table 3 shows comparisons with state-of-the-art models on the CDR dataset. Gu et al. (2017), Nguyen and Verspoor (2018) and Verga et al. (2018) leverage sequence-based models, using convolutional neural networks and self-attention networks as the encoders. Sahu et al. (2019) and Christopoulou et al. (2019) use graph-based models. As shown in Table 3, our full LSR performs worse than the state-of-the-art models. It is challenging for an off-the-shelf parser to produce high-quality dependency trees in the biomedical domain, and we observe that the MDP nodes extracted by the spaCy parser from the CDR dataset contain much less informative context than the nodes from DocRED. We therefore introduce a simplified LSR model, denoted "LSR w/o MDP Nodes", which removes the MDP nodes and builds a fully-connected graph using all tokens of a document. "LSR w/o MDP Nodes" consistently outperforms the sequence-based and graph-based models, indicating the effectiveness of the latent structure. Moreover, the simplified LSR outperforms most of the models that use external resources, except for Li et al. (2016b), which leverages co-training with additional unlabeled training data. We believe such a setting would also benefit our LSR model.
| Model | F1 | Intra-F1 | Inter-F1 |
| --- | --- | --- | --- |
| NoInf (Christopoulou et al., 2019) | 74.6 | 79.1 | 49.3 |
| Full (Christopoulou et al., 2019) | 80.8 | 84.1 | 54.7 |
| EoG (Christopoulou et al., 2019) | 81.5 | 85.2 | 50.0 |
| LSR w/o MDP Nodes | 82.2 | 85.4 | 51.1 |
Table 4 shows the results on the distantly supervised GDA dataset. Here "Full" indicates the EoG model with a fully connected graph as input, while "NoInf" is a variant of the EoG model without the inference component (Christopoulou et al., 2018). The simplified LSR model achieves a new state-of-the-art result on GDA. The "Full" model (Christopoulou et al., 2019) yields a higher score in the inter-sentence setting while scoring relatively low in the intra-sentence setting, likely because it neglects the differences between relations expressed within a sentence and across sentences.
3.6 Model Analysis
In this subsection, we use the development set of DocRED to demonstrate the effectiveness of the latent structure and refinements.
3.6.1 Does Latent Structure Matter?
We investigate the extent to which the latent structures induced and iteratively refined by the proposed dynamic reasoner help to improve the overall performance. We experiment with the three different structures defined below. For fair comparison, we use the same GCN model to perform multi-hop reasoning over all of these structures.
First, we use the rule-based structure in EoG (Christopoulou et al., 2019). We also adapt the rules from De Cao et al. (2019) for multi-hop question answering, i.e., each mention node is connected to its entity node and to the same mention nodes across sentences, while mention nodes and MDP nodes residing in the same sentence are fully connected. We term this model QAGCN.
We explore multiple settings of these models with block numbers ranging from 1 to 4, where a block is composed of a graph construction component and a densely connected GCN component. As shown in Figure 4, LSR outperforms QAGCN, EoG and AGGCN in terms of overall F1. This empirically confirms our hypothesis that the latent structure induced by LSR is able to capture a more informative context for the entire document.
3.6.2 Does Refinement Matter?
As shown in Figure 4, our LSR yields the best performance at the second refinement, outperforming the first induction by 0.72% in terms of overall F1. This indicates that the proposed LSR is able to induce more accurate structures through iterative refinement. However, too many iterations may lead to an F1 drop due to over-fitting.
| Model | F1 | Intra-F1 | Inter-F1 |
| --- | --- | --- | --- |
| - 1 Refinement | 54.42 | 60.46 | 47.67 |
| - 2 Structure Induction | 51.91 | 58.08 | 45.04 |
| - 1 Multi-hop Reasoning | 54.49 | 59.75 | 47.49 |
| - 2 Multi-hop Reasoning | 54.24 | 60.58 | 47.15 |
| - MDP nodes | 54.20 | 60.54 | 47.12 |
3.7 Ablation Study
Table 5 shows F1 scores of the full LSR model and of variants with different components turned off one at a time. We observe that most of the components contribute to the main model, as the performance deteriorates whenever a component is removed. The most significant difference is in the structure induction module: removing it leads to a 3.26 point drop in F1. This result indicates that the latent structure plays a key role in the overall performance.
3.8 Case Study
In Figure 5, we present a case study to analyze why the latent structure induced by our proposed LSR performs better than the structures learned by AGGCN. We use the entity World War II to illustrate the reasoning process; our goal here is to predict the relation of the entity pair (Japan, World War II). As shown in Figure 5, in the first refinement of LSR, World War II interacts with several local mentions with higher attention scores, e.g., 0.43 for the mention Lake Force, which will be used as a bridge between the mentions Japan and World War II. In the second refinement, the attention scores of several non-local mentions, such as Japan and Imperial Japanese Army, increase significantly from 0.09 to 0.41 and from 0.17 to 0.37, respectively, indicating that information is propagated globally at this step. With such intra- and inter-sentence structures, the relation of the entity pair (Japan, World War II) can be predicted as "participant of", denoted by P1344. Compared with LSR, the attention scores learned by AGGCN are much more uniform, indicating that the model may not be able to construct an informative structure for inference, e.g., the highest score is 0.27 in the second head, and most of the scores are near 0.11.
We also depict the relations predicted by ContextAware, AGGCN and LSR in the graph on the right side of Figure 5. Interested readers can refer to Yao et al. (2019) for the definitions of relations such as P607, P17, etc. The LSR model proves capable of filling in the missing relation for (Japan, World War II), which requires reasoning across sentences. However, LSR also attends to the mention New Ireland with a high score, thus failing to predict that the entity pair (New Ireland, World War II) actually has no relation (NIL type).
4 Related Work
Document-level relation extraction.
Early efforts focus on predicting relations between entities within a single sentence by modeling interactions in the input sequence (Zeng et al., 2014; Wang et al., 2016; Zhou et al., 2016; Zhang et al., 2017; Guo et al., 2020) or the corresponding dependency tree (Xu et al., 2015a, b; Liu et al., 2015; Miwa and Bansal, 2016; Zhang et al., 2018). These approaches do not consider interactions across mentions and ignore relations expressed across sentence boundaries. Recent work begins to explore cross-sentence extraction (Quirk and Poon, 2017; Peng et al., 2017; Gupta et al., 2018; Song et al., 2018c, 2019). Instead of using discourse-structure understanding techniques (Liu et al., 2019a; Lei et al., 2017, 2018), these approaches leverage the dependency graph to capture inter-sentence interactions, but their scope is still limited to a few sentences. More recently, the extraction scope has been expanded to the entire document (Verga et al., 2018; Jia et al., 2019; Sahu et al., 2019; Christopoulou et al., 2019) in the biomedical domain, considering only a few relations among chemicals. Unlike previous work, we focus on document-level relation extraction datasets (Yao et al., 2019; Li et al., 2016a; Wu et al., 2019) from different domains with large numbers of relations and entities, which require understanding a document and performing multi-hop reasoning.
Structure-based relational reasoning.
Structural information has been widely used for relational reasoning in various NLP applications, including question answering (Dhingra et al., 2018; De Cao et al., 2019; Song et al., 2018a) and relation extraction (Sahu et al., 2019; Christopoulou et al., 2019). Song et al. (2018a) and De Cao et al. (2019) leverage co-reference information and a set of rules to construct document-level entity graphs. GCNs (Kipf and Welling, 2017) or GRNs (Song et al., 2018b) are applied to perform reasoning for multi-hop question answering (Welbl et al., 2018). Sahu et al. (2019) also utilize co-reference links to construct the dependency graph and use labelled-edge GCNs (Marcheggiani and Titov, 2017) for document-level relation extraction. Instead of using GNNs, Christopoulou et al. (2019) use an edge-oriented model (Christopoulou et al., 2018) for logical inference on a heterogeneous graph constructed by heuristics. Unlike previous approaches that use syntactic trees, co-references or heuristics, our LSR model treats the document-level structure as a latent variable and induces it in an iteratively refined fashion, allowing the model to dynamically construct the graph for better relational reasoning.
We introduce a novel Latent Structure Refinement (LSR) model for better reasoning in the document-level relation extraction task. Unlike previous approaches that rely on syntactic trees, co-references or heuristics, LSR dynamically learns a document-level structure and makes predictions in an end-to-end fashion. There are multiple avenues for future work. One possible direction is to extend the scope of structure induction to the construction of nodes, without relying on an external parser.
We would like to thank the anonymous reviewers for their thoughtful and constructive comments. This research is supported by Ministry of Education, Singapore, under its Academic Research Fund (AcRF) Tier 2 Programme (MOE AcRF Tier 2 Award No: MOE2017-T2-1-156). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the Ministry of Education, Singapore.
- A shortest path dependency kernel for relation extraction. In Proc. of EMNLP, Cited by: §2.1.2.
- Bidirectional recurrent convolutional neural network for relation classification. In Proc. of ACL, Cited by: 1st item.
- A walk-based model on entity graphs for relation extraction. In Proc. of ACL, Cited by: §1, §3.5, §4.
- Connecting the dots: document-level neural relation extraction with edge-oriented graphs. In Proc. of EMNLP, Cited by: §1, §2.1.2, §2.2.1, 2nd item, §3.1, §3.5, §3.5, §3.6.1, Table 2, Table 3, Table 4, §4, §4.
- Question answering by reasoning across documents with graph convolutional networks. In Proc. of NAACL-HLT, Cited by: §3.6.1, §4.
- BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. of NAACL-HLT, Cited by: §2.1.1, 3rd item, §3.2.
- Neural models for reasoning over multiple mentions using coreference. In Proc. of NAACL-HLT, Cited by: §4.
- The automatic content extraction (ace) program - tasks, data, and evaluation. In Proc. of LREC, Cited by: §3.1.
- Evaluating discourse in structured text representations. In Proc. of ACL, Cited by: §2.2.3.
- Chemical-induced disease relation extraction via convolutional neural network. Database: The Journal of Biological Databases and Curation 2017. Cited by: §3.5, Table 3.
- Learning latent forests for medical relation extraction. In Proc. of IJCAI, Cited by: §4.
- Attention guided graph convolutional networks for relation extraction. In Proc. of ACL, Cited by: 2nd item, §3.3, §3.6.1, Table 2.
- Densely connected graph convolutional networks for graph-to-sequence learning. Transactions of the Association for Computational Linguistics 7, pp. 297–312. Cited by: §2.2.2.
- Neural relation extraction within and across sentence boundaries. In Proc. of AAAI, Cited by: §1, §4.
- SemEval-2010 task 8: multi-way classification of semantic relations between pairs of nominals. In Proc. of NAACL-HLT, Cited by: §3.1.
- Document-level n-ary relation extraction with multiscale representation learning. In Proc. of NAACL-HLT, Cited by: §1, §1, §3.1, §4.
- Structured attention networks. In Proc. of ICLR, Cited by: §1, §2.2.1, §2.2.3.
- Semi-supervised classification with graph convolutional networks. In Proc. of ICLR, Cited by: §2.2.2, §4.
- Structured prediction models via the matrix-tree theorem. In Proc. of EMNLP-CoNLL, Cited by: §1, §2.2.1, §2.2.1, §2.2.1, §2.2.1.
- SWIM: a simple word interaction model for implicit discourse relation recognition. In Proc of. IJCAI, Cited by: §4.
- Linguistic properties matter for implicit discourse relation recognition: combining semantic interaction, topic continuity and attribution. In Proc of. AAAI, Cited by: §4.
- BioCreative v cdr task corpus: a resource for chemical disease relation extraction. Database : the journal of biological databases and curation 2016. Cited by: §3.1, §3.1, §4.
- CIDExtractor: a chemical-induced disease relation extraction system for biomedical literature. In Proc. of BIBM, Cited by: §3.5, Table 3.
- KagNet: knowledge-aware graph networks for commonsense reasoning. In Proc. of EMNLP, Cited by: §2.2.2.
- Discourse representation parsing for sentences and documents. In Proc. of ACL, Cited by: §4.
- Learning structured text representations. Transactions of the Association for Computational Linguistics 6, pp. 63–75. Cited by: §1, §2.2.1, §2.2.3.
- Single document summarization as tree induction. In Proc. of NAACL-HLT, Cited by: §2.2.3.
- A dependency-based neural network for relation classification. In Proc. of ACL, Cited by: §4.
- Encoding sentences with graph convolutional networks for semantic role labeling. In Proc. of EMNLP, Cited by: §4.
- End-to-end relation extraction using LSTMs on sequences and tree structures. In Proc. of ACL, Cited by: §4.
- Rectified linear units improve restricted Boltzmann machines. In Proc. of ICML, Cited by: §2.2.2.
- Convolutional neural networks for chemical-disease relation extraction are improved with character-based word embeddings. In Proc. of BioNLP, Cited by: §3.5, Table 3.
- Exploiting graph kernels for high performance biomedical relation extraction. Journal of Biomedical Semantics 9 (1), pp. 7. Cited by: Table 3.
- Automatic differentiation in PyTorch. Cited by: footnote 1.
- Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics 5, pp. 101–115. Cited by: §1, §1, §3.1, §4.
- Improving chemical disease relation extraction with rich features and weakly labeled data. Journal of Cheminformatics 8. Cited by: Table 3.
- GloVe: global vectors for word representation. In Proc. of EMNLP, Cited by: §3.2.
- Distant supervision for relation extraction beyond the sentence boundary. In Proc. of EACL, Cited by: §1, §3.1, §4.
- Modeling relations and their mentions without labeled text. In Proc. of ECML/PKDD, Cited by: §1.
- Inter-sentence relation extraction with document-level graph convolutional neural network. In Proc. of ACL, Cited by: §1, §2.1.2, §2.2.1, 2nd item, §3.5, Table 2, Table 3, §4, §4.
- Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45, pp. 2673–2681. Cited by: §2.1.1.
- Exploring graph-structured passage representation for multi-hop reading comprehension with graph neural networks. arXiv preprint arXiv:1809.02040. Cited by: §2.2.2, §4.
- Leveraging dependency forest for neural medical relation extraction. In Proc. of EMNLP-IJCNLP, Cited by: §1, §4.
- A graph-to-sequence model for AMR-to-text generation. In Proc. of ACL, Cited by: §4.
- N-ary relation extraction using graph state LSTM. In Proc. of EMNLP, Cited by: §4.
- Context-aware representations for knowledge base relation extraction. In Proc. of EMNLP, Cited by: 1st item.
- Multi-instance multi-label learning for relation extraction. In Proc. of EMNLP-CoNLL, Cited by: §1.
- Multi-hop reading comprehension across multiple documents by reasoning over heterogeneous graphs. In Proc. of ACL, Cited by: §2.2.2.
- Graph theory. Clarendon Press. Cited by: §1, §2.2.1.
- Attention is all you need. In Proc. of NeurIPS, Cited by: §3.6.1.
- Graph attention networks. In Proc. of ICLR, Cited by: 2nd item, §3.3, Table 2.
- Simultaneously self-attending to all mentions for full-abstract biological relation extraction. In Proc. of NAACL-HLT, Cited by: §1, §3.1, §3.5, Table 3, §4.
- Fine-tune BERT for DocRED with two-step process. arXiv preprint arXiv:1909.11898. Cited by: 3rd item, §3.2, §3.3, §3.4, Table 2.
- Relation classification via multi-level attention CNNs. In Proc. of ACL, Cited by: §4.
- Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics 6, pp. 287–302. Cited by: §1, §4.
- RENET: a deep learning approach for extracting gene-disease associations from literature. In Proc. of RECOMB, Cited by: §3.1, §4.
- Semantic relation classification via convolutional neural networks with simple negative sampling. In Proc. of EMNLP, Cited by: §1, §2.1.2, §4.
- Classifying relations via long short-term memory networks along shortest dependency paths. In Proc. of EMNLP, Cited by: §1, §2.1.2, §4.
- Aligning cross-lingual entities with multi-aspect information. In Proc. of EMNLP, Cited by: §2.2.2.
- DocRED: a large-scale document-level relation extraction dataset. In Proc. of ACL, Cited by: §1, §2.3, §3.1, §3.2, §3.2, §3.8, Table 2, §4.
- Relation classification via convolutional deep neural network. In Proc. of COLING, Cited by: §1, 1st item, §4.
- Graph convolution over pruned dependency trees improves relation extraction. In Proc. of EMNLP, Cited by: §3.1, §4.
- Position-aware attention and supervised data improve slot filling. In Proc. of EMNLP, Cited by: §4.
- An effective neural model extracting document level chemical-induced disease relations from biomedical literature. Journal of biomedical informatics 83, pp. 1–9. Cited by: Table 3.
- Attention-based bidirectional long short-term memory networks for relation classification. In Proc. of ACL, Cited by: §4.