CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases

10/27/2016 ∙ by Xiang Ren, et al. ∙ University of Illinois at Urbana-Champaign ∙ Rensselaer Polytechnic Institute

Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, systems of entity relation extraction have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain, and are vulnerable to errors cascading down the pipeline. In this paper, we investigate joint extraction of typed entities and relations with labeled data heuristically obtained from knowledge bases (i.e., distant supervision). As our algorithm for type labeling via distant supervision is context-agnostic, noisy training data poses unique challenges for the task. We propose a novel domain-independent framework, called CoType, that runs a data-driven text segmentation algorithm to extract entity mentions, and jointly embeds entity mentions, relation mentions, text features and type labels into two low-dimensional spaces (for entity and relation mentions, respectively), where, in each space, objects whose types are close will also have similar representations. CoType then uses these learned embeddings to estimate the types of test (unlinkable) mentions. We formulate a joint optimization problem to learn embeddings from text corpora and knowledge bases, adopting a novel partial-label loss function for noisy labeled data and introducing an object "translation" function to capture the cross-constraints of entities and relations on each other. Experiments on three public datasets demonstrate the effectiveness of CoType across different domains (e.g., news, biomedical), with an average of 25% improvement in F1 score compared to the next best method.




1 Introduction

The extraction of entities and their relations is critical to understanding massive text corpora. Identifying the token spans in text that constitute entity mentions and assigning types (e.g., person, company) to these spans as well as to the relations between entity mentions (e.g., employed_by) are key to structuring content from text corpora for further analytics. For example, when an extraction system finds a "produce" relation between "company" and "product" entities in news articles, it supports answering questions like "what products does company X produce?". Once extracted, such structured information is used in many ways, e.g., as primitives in information extraction, knowledge base population [10, 52], and question-answering systems [48, 3]. Traditional systems for relation extraction [2, 9, 17] partition the process into several subtasks and solve them incrementally (i.e., detecting entities from text, labeling their types and then extracting their relations). Such systems treat the subtasks independently and so may propagate errors across subtasks in the process. Recent studies [24, 32, 44] focus on joint extraction methods to capture the inherent linguistic dependencies between relations and entity arguments (e.g., the types of entity arguments help determine their relation type, and vice versa) to resolve error propagation.

Figure 1: Current systems find relation mentions between (Barack Obama, United States) in sentences S1-S3 and assign the same relation types (entity types) to all relation mentions (entity mentions), while only some of the types are correct in context (highlighted in blue font).

A major challenge in joint extraction of typed entities and relations is to design domain-independent systems that will apply to text corpora from different domains in the absence of human-annotated, domain data. The process of manually labeling a training set with a large number of entity and relation types is too expensive and error-prone. The rapid emergence of large, domain-specific text corpora (e.g., news, scientific publications, social media content) calls for methods that can jointly extract entities and relations of target types with minimal or no human supervision.

Towards this goal, there are broadly two kinds of efforts: weak supervision and distant supervision. Weak supervision [6, 36, 13] relies on a small set of manually-specified seed instances (or patterns) that are applied in bootstrapped learning to identify more instances of each type. This assumes seeds are unambiguous and sufficiently frequent in the corpus, which requires careful seed selection by humans [2]. Distant supervision [31, 43, 21, 49] generates training data automatically by aligning text with a knowledge base (KB) (see Fig. 1). The typical workflow is: (1) detect entity mentions in text; (2) map detected entity mentions to entities in the KB; (3) assign, to the candidate type set of each entity mention, all KB types of its KB-mapped entity; (4) assign, to the candidate type set of each entity mention pair, all KB relation types between their KB-mapped entities. The automatically labeled training corpus is then used to infer types of the remaining candidate entity mentions and relation mentions (i.e., unlinkable candidate mentions).
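The four-step distant-supervision workflow above can be sketched as follows. This is a minimal illustration, not the paper's implementation; `link_to_kb`, `kb_types`, and `kb_relations` are hypothetical lookups standing in for the entity linker and the KB.

```python
def distant_label(sentences, link_to_kb, kb_types, kb_relations):
    """Context-agnostic distant supervision: every KB type of a mapped
    entity, and every KB relation between a mapped pair, becomes a
    candidate label (hence the label noise discussed in Table 1)."""
    entity_labels, relation_labels = {}, {}
    for sent_id, mentions in sentences:                # step 1: mentions given
        mapped = {m: link_to_kb(m) for m in mentions}  # step 2: map to KB
        mapped = {m: e for m, e in mapped.items() if e is not None}
        for m, e in mapped.items():                    # step 3: all KB types
            entity_labels[(sent_id, m)] = set(kb_types[e])
        for m1, e1 in mapped.items():                  # step 4: all KB relations
            for m2, e2 in mapped.items():
                if m1 != m2 and (e1, e2) in kb_relations:
                    relation_labels[(sent_id, m1, m2)] = set(kb_relations[(e1, e2)])
    return entity_labels, relation_labels
```

Because the lookups are context-agnostic, a mention of "Obama" receives every KB type of Barack Obama, which is exactly the noise that CoType's partial-label loss is designed to absorb.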

Dataset NYT [43] Wiki-KBP [12] BioInfer [39]
# of entity types 47 126 2,200
noisy entity mentions (%) 20.32 28.31 59.80
# of relation types 24 19 94
noisy relation mentions (%) 15.54 8.54 41.12
Table 1: A study of type label noise. (1): %entity mentions with multiple sibling entity types (e.g., actor, singer) in the given entity type hierarchy; (2): %relation mentions with multiple relation types, for the three experiment datasets.

In this paper, we study the problem of joint extraction of typed entities and relations with distant supervision. Given a domain-specific corpus and a set of target entity and relation types from a KB, we aim to detect relation mentions (together with their entity arguments) from text, and categorize each in context by target types or Not-Target-Type (None), with distant supervision. Current distant supervision methods focus on solving the subtasks separately (e.g., extracting typed entities or relations), and encounter the following limitations when handling the joint extraction task.

Domain Restriction: They rely on pre-trained named entity recognizers (or noun phrase chunkers) to detect entity mentions. These tools are usually designed for a few general types (e.g., person, location, organization) and require additional human labor to work on specific domains (e.g., scientific publications).

Error Propagation: In current extraction pipelines, incorrect entity types generated in the entity recognition and typing step serve as features in the relation extraction step (i.e., errors are propagated from upstream components to downstream ones). Cross-task dependencies are ignored in most existing methods.

Label Noise: In distant supervision, the context-agnostic mapping from relation (entity) mentions to KB relations (entities) may bring false positive type labels (i.e., label noise) into the automatically labeled training corpora and result in inaccurate models.

In Fig. 1, for example, all KB relations between the entities Barack Obama and United States (e.g., born_in, president_of) are assigned to the relation mention in the sentence (while only born_in is correct within the context). Similarly, all KB types for Barack Obama (e.g., politician, artist) are assigned to the mention "Obama" (while only person is true). Label noise becomes an impediment to learning effective type classifiers. The larger the target type set, the more severe the degree of label noise (see Table 1).


We approach the joint extraction task as follows: (1) Design a domain-agnostic text segmentation algorithm to detect candidate entity mentions with distant supervision and minimal linguistic assumptions (i.e., assuming a part-of-speech (POS) tagged corpus is given [22]). (2) Model the mutual constraints between the types of relation mentions and the types of their entity arguments, to enable feedback between the two subtasks. (3) Model the true type labels in a candidate type set as latent variables and require only the "best" type (progressively estimated as we learn the model) to be relevant to the mention—this is a less limiting requirement compared with existing multi-label classifiers that assume "every" candidate type is relevant to the mention.

To integrate these elements of our approach, a novel framework, CoType, is proposed. It first runs POS-constrained text segmentation using positive examples from the KB to mine quality entity mentions, and forms candidate relation mentions (Sec. 3.1). Then CoType performs entity linking to map candidate relation (entity) mentions to KB relations (entities) and obtain the KB types. We formulate a global objective to jointly model (1) corpus-level co-occurrences between linkable relation (entity) mentions and text features extracted from their local contexts; (2) associations between mentions and their KB-mapped type labels; and (3) interactions between relation mentions and their entity arguments. In particular, we design a novel partial-label loss to model the noisy mention-label associations in a robust way, and adopt a translation-based objective to capture the entity-relation interactions. Minimizing the objective yields two low-dimensional spaces (for entity and relation mentions, respectively), where, in each space, objects whose types are semantically close also have similar representations (see Sec. 3.2). With the learned embeddings, we can efficiently estimate the types for the remaining unlinkable relation mentions and their entity arguments (see Sec. 3.3).

The major contributions of this paper are as follows:

  1. A novel distant-supervision framework, CoType, is proposed to extract typed entities and relations in domain-specific corpora with minimal linguistic assumptions. (Fig. 2)

  2. A domain-agnostic text segmentation algorithm is developed to detect entity mentions using distant supervision. (Sec. 3.1)

  3. A joint embedding objective is formulated that models mention-type associations, mention-feature co-occurrences, and entity-relation cross-constraints in a noise-robust way. (Sec. 3.2)

  4. Experiments with three public datasets demonstrate that CoType significantly improves the performance of state-of-the-art entity typing and relation extraction systems, demonstrating robust domain independence. (Sec. 4)

Figure 2: Framework Overview of CoType.

2 Background and Problem

The input to our proposed CoType framework is a POS-tagged text corpus, a knowledge base (e.g., Freebase [4]), a target entity type hierarchy and a target relation type set. The target entity (relation) type set covers a subset of the entity (relation) types in the KB that users are interested in.

Entity and Relation Mention. An entity mention is a token span in text which represents an entity. A relation instance denotes some type of relation between multiple entities. In this work, we focus on binary relations. We define a relation mention for a relation instance as an (ordered) pair of entity mentions of its two entity arguments appearing in the same sentence.

Knowledge Bases and Target Types. A KB with a set of entities contains human-curated facts on both relation instances and entity-type facts. The target entity type hierarchy is a tree whose nodes represent entity types of interest. An entity mention may have multiple types, which together constitute one type-path (not required to end at a leaf) in the given type hierarchy. In existing studies, several entity type hierarchies have been manually constructed using Freebase [23, 15] or WordNet [55]. The target relation type set is a set of relation types of interest.

Automatically Labeled Training Data. Distant supervision maps the entity mentions extracted from the corpus to KB entities with an entity disambiguation system [29, 20] and heuristically assigns type labels to the mapped mentions. In practice, only a small fraction of the extracted entity mentions can be mapped to entities in the KB (i.e., linkable entity mentions). As reported in [41, 25], this ratio is usually lower than 50% in domain-specific corpora.

Between any two linkable entity mentions in a sentence, a relation mention is formed if there exists one or more KB relations between their KB-mapped entities. The KB relations between the two mapped entities are then associated with the relation mention to form its candidate relation type set. In a similar way, the KB types of the two mapped entities are associated with the respective entity mentions to form their candidate entity type sets. Formally, the automatically labeled training corpus for the joint extraction task consists of the set of KB-mapped relation mentions, together with the candidate type sets of these relation mentions and of their entity arguments.

Problem Description. By pairing up entity mentions within each sentence, we generate a set of candidate relation mentions. This set consists of (1) linkable relation mentions, (2) unlinkable (true) relation mentions, and (3) false relation mentions (i.e., no target relation is expressed between the pair).

The relation mentions in (2) and (3) are unlabeled. Our main task is to determine the relation type label (from the target relation type set) for each unlabeled relation mention, and the entity type labels (either a single type-path in the hierarchy or None) for each of its entity mention arguments, using the automatically labeled corpus. Formally, we define the joint extraction of typed entities and relations task as follows.

Definition 1 (Problem Definition)

Given a POS-tagged corpus, a KB, a target entity type hierarchy and a target relation type set, the joint extraction task aims to (1) detect entity mentions from the corpus; (2) generate training data with the KB; and (3) estimate a relation type (from the target relation type set, plus None) for each test relation mention and a single type-path (or None) for each of its entity mention arguments, using the mention and its context.

Non-goals. This work relies on an entity linking system [29] to provide the disambiguation function, but we do not address its limitations here (e.g., label noise introduced by wrongly mapped KB entities). We also assume human-curated target type hierarchies are given (generating the type hierarchy is out of the scope of this study).

3 The CoType Framework

This section lays out the proposed framework. The joint extraction task poses two unique challenges. First, type association in distant supervision between linkable entity (relation) mentions and their KB-mapped entities (relations) is context-agnostic—the candidate type sets contain "false" types. Supervised learning [17, 16] may generate models biased to the incorrect type labels [42]. Second, there exist dependencies between relation mentions and their entity arguments (e.g., type correlation). Current systems formulate the task as a cascading supervised learning problem and may suffer from error propagation.

Our solution casts the type prediction task as weakly-supervised learning (to model the relatedness between mentions and their candidate types in context) and uses relational learning to capture interactions between relation mentions and their entity mention arguments jointly, based on the redundant text signals in a large corpus.

Specifically, CoType leverages partial-label learning [37] to faithfully model mention-type association using text features extracted from mentions’ local contexts. It uses the translation embedding-based objective [5] to model the mutual type dependencies between relation mentions and their entity (mention) arguments.

Framework Overview. We propose an embedding-based framework with distant supervision (see also Fig. 2) as follows:

  1. Run the POS-constrained text segmentation algorithm on the POS-tagged corpus using positive examples obtained from the KB, to detect candidate entity mentions (Sec. 3.1).

  2. Generate candidate relation mentions from the detected entity mentions, and extract text features for each relation mention and its entity mention arguments (Sec. 3.1). Apply distant supervision to generate labeled training data (Sec. 2).

  3. Jointly embed relation and entity mentions, text features, and type labels into two low-dimensional spaces (for entities and relations, respectively) where, in each space, close objects tend to share the same types (Sec. 3.2).

  4. Estimate a type label for each test relation mention and a type-path for each test entity mention from the learned embeddings, by searching the target relation type set or the target entity type hierarchy (Sec. 3.3).
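The hierarchy search in the last step can be sketched as below. This is a hedged reconstruction of the Sec. 3.3 inference, not the paper's exact procedure: the top-down walk and the cosine-similarity relevance measure over hypothetical type embeddings are illustrative assumptions.

```python
import numpy as np

def infer_type_path(mention_vec, type_embs, children, root="ROOT"):
    """Top-down search of the target type hierarchy: at each level pick
    the child type most similar to the mention embedding, and stop when
    no child improves over the current node (the predicted type-path
    need not end at a leaf)."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    path, node = [], root
    while children.get(node):
        best = max(children[node], key=lambda t: cos(mention_vec, type_embs[t]))
        if path and cos(mention_vec, type_embs[best]) < cos(mention_vec, type_embs[path[-1]]):
            break  # descending would reduce relevance: stop mid-path
        path.append(best)
        node = best
    return path
```

A flat search over the target relation type set (for relation mentions) is the special case of a one-level hierarchy.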

3.1 Candidate Generation

Entity Mention Detection. Traditional entity recognition systems [14, 34] rely on a set of linguistic features (e.g., dependency parse structures of a sentence) to train sequence labeling models (for a few common entity types). However, sequence labeling models trained on an automatically labeled corpus may not be effective, as distant supervision only annotates a small number of entity mentions (thus generating many "false negative" token tags). To address domain restriction, we develop a distantly-supervised text segmentation algorithm for domain-agnostic entity detection. By using quality examples from the KB as guidance, it partitions sentences into segments of entity mentions and words, by incorporating (1) corpus-level concordance statistics; (2) sentence-level lexical signals; and (3) grammatical constraints (i.e., POS tag patterns).

We extend the methodology used in [27, 11] to model the segment quality (i.e., "how likely a candidate segment is an entity mention") as a combination of phrase quality and POS pattern quality, and use positive examples from the KB to estimate the segment quality. The workflow is as follows: (1) mine frequent contiguous patterns for both word sequences and POS tag sequences up to a fixed length from the POS-tagged corpus; (2) extract features including corpus-level concordance and sentence-level lexical signals to train two random forest classifiers [27] for estimating the quality of candidate phrases and candidate POS patterns; (3) find the best segmentation of the corpus using the estimated segment quality scores (see Eq. (1)); and (4) compute rectified features using the segmented corpus and repeat steps (2)-(4) until the result converges.


Specifically, we find the best segmentation for each document by maximizing the "joint segmentation quality", defined as the sum of the log segment-quality scores of its segments, where each score denotes the probability that a segment (with its starting and ending indices in the document) is a good entity mention, as defined in Eq. (1). The first term in Eq. (1) is a segment length prior, the second term measures how likely the segment is generated given its length (to be estimated), and the third term denotes the segment quality. In this work, we define the segment quality function as the equally weighted combination of the phrase quality score and the POS pattern quality score for the candidate segment, as estimated in step (2). The joint probability can be efficiently maximized using Viterbi training with time complexity linear in the corpus size [27]. The segmentation result provides a set of candidate entity mentions.
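The segmentation step can be sketched as a small dynamic program. This is a minimal illustration of Viterbi-style segmentation, assuming `quality` is a stand-in for the combined phrase/POS-pattern quality score estimated by the classifiers (the length prior and generation terms of Eq. (1) are folded into it here for brevity).

```python
import math

def best_segmentation(tokens, quality, max_len=5):
    """Pick the split of `tokens` that maximizes the summed log
    segment-quality scores; `quality(segment)` returns a value in (0, 1]."""
    n = len(tokens)
    score = [float("-inf")] * (n + 1)  # best log-score covering tokens[:i]
    back = [0] * (n + 1)               # split point achieving that score
    score[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            s = score[j] + math.log(quality(tokens[j:i]))
            if s > score[i]:
                score[i], back[i] = s, j
    # recover the segments by walking the back-pointers
    segments, i = [], n
    while i > 0:
        segments.append(tokens[back[i]:i])
        i = back[i]
    return segments[::-1]
```

The double loop is linear in corpus size for a fixed maximum segment length, matching the complexity claim above.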

Table 2 compares our entity detection module with a sequence labeling model [26] (linear-chain CRF) trained on the labeled corpus in terms of F1 score. Fig. 3 shows high/low quality POS patterns learned using entity names found in the KB as examples.

Relation Mention Generation. We follow the procedure introduced in Sec. 2 to generate the set of candidate relation mentions from the detected candidate entity mentions: for each pair of entity mentions found in a sentence, we form two candidate relation mentions, one for each ordering of the pair. Distant supervision is then applied to generate the set of KB-mapped relation mentions. Similar to [31, 21], we sample 30% of the unlinkable relation mentions between two KB-mapped entity mentions in a sentence as examples for modeling the None relation label, and sample 30% of the unlinkable entity mentions to model the None entity label. These negative examples, together with the type labels for the KB-mapped mentions, form the automatically labeled data for the task.

Figure 3: Example POS tag patterns learned using KB examples.
Dataset NYT Wiki-KBP BioInfer
FIGER segmenter [26] 0.751 0.814 0.652
Our Approach 0.837 0.833 0.785
Table 2: Comparison of F1 scores on entity mention detection.

Text Feature Extraction. To capture the shallow syntax and distributional semantics of a relation (or entity) mention, we extract various lexical features from both the mention itself (e.g., head token) and its context (e.g., bigrams), in the POS-tagged corpus. Table 3 lists the set of text features for relation mentions, which is similar to those used in [31, 7] (excluding the dependency parse-based features and entity type features). We use the same set of features for entity mentions as those used in [42, 26].

Feature Description Example
Entity mention (EM) head Syntactic head token of each entity mention "HEAD_EM1_Obama"
Entity mention token Tokens in each entity mention "TKN_EM1_Barack"
Tokens between two EMs Each token between two EMs "was", "elected", "President", "of", "the"
Part-of-speech (POS) tag POS tags of tokens between two EMs "VBD", "VBN", "NNP", "IN", "DT"
Collocations Bigrams in left/right 3-word window of each EM "Honolulu native", "native Barack", …
Entity mention order Whether EM 1 is before EM 2 "EM1_BEFORE_EM2"
Entity mention distance Number of tokens between the two EMs "EM_DISTANCE_5"
Entity mention context Unigrams before and after each EM "native", "was", "the", "in"
Special pattern Occurrence of pattern "em1_in_em2" "PATTERN_NULL"
Brown cluster (learned on the corpus) Brown cluster ID for each token "8_1101111", "12_111011111111"
Table 3: Text features for relation mentions used in this work [17, 43] (excluding dependency parse-based features and entity type features). (“Barack Obama”, “United States”) is used as an example relation mention from the sentence “Honolulu native Barack Obama was elected President of the United States on March 20 in 2008.".
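A few of the Table 3 features can be extracted with simple span arithmetic. This sketch is illustrative only: the feature-name strings follow the patterns shown in the table, and the head token is approximated as the last token of the span (a hypothetical simplification; the paper uses the syntactic head).

```python
def relation_mention_features(tokens, em1, em2):
    """Extract a subset of Table 3 features for a relation mention.
    `em1`/`em2` are (start, end) token spans within `tokens`."""
    (s1, e1), (s2, e2) = em1, em2
    feats = []
    feats.append("HEAD_EM1_" + tokens[e1 - 1])             # EM head (approx.)
    feats += ["TKN_EM1_" + t for t in tokens[s1:e1]]        # EM tokens
    feats += tokens[e1:s2] if s1 < s2 else tokens[e2:s1]    # tokens between EMs
    feats.append("EM1_BEFORE_EM2" if s1 < s2 else "EM2_BEFORE_EM1")
    gap = (s2 - e1) if s1 < s2 else (s1 - e2)               # token distance
    feats.append("EM_DISTANCE_%d" % gap)
    return feats
```

Running it on the example sentence reproduces the feature strings in the table, e.g., "HEAD_EM1_Obama" and "EM_DISTANCE_5".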

3.2 Joint Entity and Relation Embedding

This section formulates a joint optimization problem for embedding different kinds of interactions between linkable relation mentions, linkable entity mentions, entity and relation type labels, and text features into a low-dimensional relation vector space and a low-dimensional entity vector space. In each space, objects whose types are close to each other should have similar representations (e.g., see the 3rd col. in Fig. 2).

As the extracted objects and the interactions between them form a heterogeneous graph (see the 2nd col. in Fig. 2), a simple solution is to embed the whole graph into a single low-dimensional space [19, 41]. However, such a solution encounters several problems: (1) false types in candidate type sets (i.e., false mention-type links in the graph) negatively impact the model's ability to determine a mention's true types; and (2) a single embedding space cannot capture the differences between entity and relation types (i.e., a strong link between a relation mention and its entity mention argument does not imply that they have similar types).

In our solution, we propose a novel global objective, which extends a margin-based rank loss [37] to model noisy mention-type associations and leverages the second-order proximity idea [50] to model corpus-level mention-feature co-occurrences. In particular, to capture the entity-relation interactions, we adopt a translation-based embedding loss [5] to bridge the vector spaces of entity mentions and relation mentions.

Modeling Types of Relation Mentions. We consider both mention-feature co-occurrences and mention-type associations in the modeling of relation types for relation mentions in set .

Intuitively, two relation mentions sharing many text features (i.e., with similar distribution over the set of text features ) likely have similar relation types; and text features co-occurring with many relation mentions in the corpus tend to represent close type semantics. We propose the following hypothesis to guide our modeling of corpus-level mention-feature co-occurrences.

Hypothesis 1 (Mention-Feature Co-occurrence)

Two relation mentions tend to share similar types (close to each other in the embedding space) if they share many text features in the corpus, and vice versa.

For example, in column 2 of Fig. 2, the relation mentions ("Barack Obama", "US") and ("Barack Obama", "United States") share multiple features, including the context word "president" and the first entity mention argument "Barack Obama", and thus are likely of the same relation type (i.e., president_of).

Formally, let vectors represent each relation mention and each text feature in the relation embedding space. Similar to the distributional hypothesis [30] in text corpora, we apply second-order proximity [50] to model the idea that objects with similar distributions over neighbors are similar to each other, via the objective in Eq. (2): it sums, over observed mention-feature pairs, the log-probability of the feature being generated by the mention, weighted by their co-occurrence frequency in the corpus. This enforces the conditional probability specified by the embeddings to be close to the empirical distribution.

To perform efficient optimization while avoiding summation over all features, we adopt the negative sampling strategy [30], sampling multiple false features for each observed pair according to a noise distribution proportional to the number of relation mentions co-occurring with each feature [30]. The log-probability term in Eq. (2) is then replaced with the term in Eq. (3), where the first term (a log-sigmoid of the mention-feature dot product) models the observed co-occurrence and the second term models the negative feature samples.
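A minimal sketch of the Eq. (3)-style term for one observed mention-feature pair, in the standard skip-gram-with-negative-sampling form; the vectors and the sampled false features are illustrative inputs, not the paper's training loop.

```python
import numpy as np

def neg_sampling_loss(z, f_pos, f_negs):
    """Second-order proximity with negative sampling: maximize
    log sigma(z . f+) + sum_k log sigma(-z . f_k-) for one observed
    (relation mention, feature) pair; returned negated for minimization."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    loss = np.log(sigmoid(z @ f_pos))          # observed co-occurrence
    for f in f_negs:
        loss += np.log(sigmoid(-(z @ f)))      # sampled false features
    return -loss
```

A mention vector aligned with its observed feature (and anti-aligned with the negatives) yields a near-zero loss, which is what drives co-occurring objects together in the space.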

In the labeled corpus, each relation mention is heuristically associated with a set of candidate types. Existing embedding methods rely on either the local consistency assumption [19] (i.e., objects strongly connected tend to be similar) or the distributional assumption [30] (i.e., objects sharing similar neighbors tend to be similar) to model object associations. However, some associations between a mention and its candidate types are "false" associations, and adopting the above assumptions may incorrectly yield mentions of different types having similar vector representations. For example, in Fig. 1, the mentions ("Obama", "USA") and ("Obama", "US") have several candidate types in common (thus high distributional similarity), but their true types are different (i.e., born_in vs. travel_to).

We specify the likelihood of "whether the association between a relation mention and a candidate relation type is true" as the relevance between these two kinds of objects (measured by the similarity between their current estimated embedding vectors). To impose this idea, we model the associations between each linkable relation mention and its noisy candidate relation type set based on the following hypothesis.

Hypothesis 2 (Partial-Label Association)

A relation mention’s embedding vector should be more similar (closer in the low-dimensional space) to its “most relevant" candidate type, than to any other non-candidate type.

Specifically, we use a vector to represent each relation type in the embedding space, and define the similarity between a relation mention and a relation type as the dot product of their embedding vectors. We extend the margin-based loss in [37] and define a partial-label loss for each relation mention, as given in Eq. (4).

The intuition behind Eq. (4) is that, for a relation mention, the maximum similarity score associated with its candidate type set should be greater than the maximum similarity score associated with any non-candidate type. Minimizing this loss forces the mention to be embedded closer to its "most relevant" type in the candidate set than to any non-candidate type. This contrasts sharply with multi-label learning [26], where the mention is embedded closer to every candidate type than to any non-candidate type.

To faithfully model the types of relation mentions, we integrate the modeling of mention-feature co-occurrences and mention-type associations into the objective in Eq. (5), where a tuning parameter on the regularization terms controls the scale of the embedding vectors.

By doing so, text features, as complements to a mention's candidate types, also participate in modeling the relation mention embeddings and help identify a mention's most relevant type—mention-type relevance is progressively estimated during model learning. For example, in the left column of Fig. 4, the context word "president" helps infer that the relation type president_of is more relevant (i.e., higher similarity between the embedding vectors) to the relation mention ("Mr. Obama", "USA") than the type born_in is.

Figure 4: Illustrations of the partial-label associations, Hypothesis 2 (the left col.), and the entity-relation interactions, Hypothesis 3 (the right col.).

Modeling Types of Entity Mentions. In a way similar to the modeling of types for relation mentions, we follow Hypotheses 1 and 2 to model types of entity mentions. In Fig. 2 (col. 2), for example, two mentions of "Barack Obama" share multiple text features in the corpus, including the head token "Obama" and the context word "president", and thus tend to share the same entity types like politician and person (i.e., Hypothesis 1). Meanwhile, the mentions "Barack Obama" and "Obama" have the same candidate entity types but share very few text features, which implies that their true type labels are likely different. Relevance between entity mentions and their true type labels should be progressively estimated based on the text features extracted from their local contexts (i.e., Hypothesis 2).

Formally, let vectors represent entity mentions, text features (for entity mentions), and entity types in a low-dimensional entity embedding space, respectively. We model the corpus-level co-occurrences between entity mentions and text features by second-order proximity as follows.


where the conditional probability term is defined analogously to Eq. (2), via the dot product of the entity mention and feature vectors. By integrating this term with the partial-label loss over the unique linkable entity mentions, we obtain the objective in Eq. (7) for modeling the types of entity mentions.

Minimizing this objective yields an entity embedding space in which objects (e.g., entity mentions, text features) close to each other have similar types.

Modeling Entity-Relation Interactions. In reality, there exist different kinds of interactions between a relation mention and its entity mention arguments. One major kind is the correlation between the relation and entity types of these objects—the entity types of the two entity mentions provide good hints for determining the relation type of the relation mention, and vice versa. For example, in Fig. 4 (right column), knowing that the entity mention "US" is of type location (instead of organization) helps determine that the relation mention ("Obama", "US") is more likely of relation type travel_to than of relation types like president_of or citizen_of.

Intuitively, the entity types of the entity mention arguments pose constraints on the search space for the relation types of the relation mention (e.g., it is unlikely to find an author_of relation between an organization entity and a location entity). The proposed Hypotheses 1 and 2 model the types of relation mentions and entity mentions by learning a relation embedding space and an entity embedding space, respectively. The correlations between entity and relation types (and their embedding spaces) motivate us to model entity-relation interactions based on the following hypothesis.

Hypothesis 3 (Entity-Relation Interaction)

For a relation mention $z$ with entity mention arguments $m_1$ and $m_2$, the embedding vector of $m_2$ should be a nearest neighbor of the embedding vector of $m_1$ plus the embedding vector of relation mention $z$.

Given the embedding vectors of any two members of $\{\mathbf{z}, \mathbf{m}_1, \mathbf{m}_2\}$, say $\mathbf{m}_1$ and $\mathbf{z}$, Hypothesis 3 forces the third to satisfy $\mathbf{m}_1 + \mathbf{z} \approx \mathbf{m}_2$. This helps regularize the learning of vector $\mathbf{m}_1$ (which represents the type semantics of entity mention $m_1$) in addition to the information encoded by the objective in Eq. (7). Such a "translating operation" between embedding vectors in a low-dimensional space has been proven effective for embedding entities and relations in a structured knowledge base [5]. We extend this idea to model the type correlations (and mutual constraints) between embedding vectors of entity mentions and embedding vectors of relation mentions, which live in two different low-dimensional spaces.

Specifically, we define the error function for the triple $(z, m_1, m_2)$ of a relation mention and its two entity mention arguments using the $\ell_2$ norm: $\tau(z) = \|\mathbf{m}_1 + \mathbf{z} - \mathbf{m}_2\|_2^2$. A small value of $\tau(z)$ indicates that the embedding vectors of $(z, m_1, m_2)$ do capture the type constraints. To enforce small errors between linkable relation mentions (in set $\mathcal{Z}_L$) and their entity mention arguments, we use a margin-based loss [5] to formulate an objective function as follows.

$$\mathcal{O}_{ZM} = \sum_{z \in \mathcal{Z}_L} \sum_{v=1}^{V} \max\big\{0,\; 1 + \tau(z) - \tau(z^{(v)})\big\},$$
where $z^{(1)}, \dots, z^{(V)}$ are negative samples for $z$, i.e., each $z^{(v)}$ is randomly sampled from the negative sample set obtained by corrupting the triple, replacing one of the entity mention arguments $m_1$ or $m_2$ with another entity mention [5]. The intuition behind Eq. (8) is simple (see also the right col. in Fig. 4): the embedding vectors of a relation mention and its entity mentions are modeled such that the translating error between them is smaller than the translating error of any negative sample.
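The translating error and the margin-based loss of Eq. (8) can be sketched as follows; the function names and the corrupted-triple representation are ours, for illustration only:

```python
import numpy as np

def tau(m1, z, m2):
    """Translating error ||m1 + z - m2||_2^2 for a triple (z, m1, m2)."""
    diff = m1 + z - m2
    return float(diff @ diff)

def entity_relation_loss(m1, z, m2, negatives):
    """Margin-based loss: the true triple should have a smaller translating
    error (by a margin of 1) than every corrupted triple (m1', z, m2')."""
    pos = tau(m1, z, m2)
    return sum(max(0.0, 1.0 + pos - tau(n1, z, n2)) for n1, n2 in negatives)
```

A zero loss means every negative triple is already separated from the true triple by the margin.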

A Joint Optimization Problem. Our goal is to embed all the available information on relation and entity mentions, relation and entity type labels, and text features into a $d$-dimensional entity space and a $d$-dimensional relation space, following the three proposed hypotheses. An intuitive solution is to collectively minimize the three objectives $\mathcal{O}_M$, $\mathcal{O}_Z$ and $\mathcal{O}_{ZM}$, as the embedding vectors of entity and relation mentions are shared across them. To achieve this, we formulate a joint optimization problem as follows.

$$\min_{\{\mathbf{m}\},\, \{\mathbf{z}\},\, \{\mathbf{f}\},\, \{\mathbf{y}\},\, \{\mathbf{r}\}} \; \mathcal{O} = \mathcal{O}_M + \mathcal{O}_Z + \mathcal{O}_{ZM}.$$
Optimizing the global objective in Eq. (9) enables the learning of entity and relation embeddings to be mutually influenced, such that errors in each component can be constrained and corrected by the others. The joint embedding learning also helps the algorithm find the true types for each mention, beyond what text features alone provide.

In Eq. (9), one can also minimize a weighted combination of the three objectives to model the importance of the different signals, where the weights could be manually set or automatically learned from data. We leave this as future work.

Input: labeled training corpus $\mathcal{D}$, text features $\mathcal{F}$, regularization parameter $\lambda$, learning rate $\alpha$, number of negative samples $V$, embedding dimension $d$.
Output: relation/entity mention embeddings $\{\mathbf{z}\}$/$\{\mathbf{m}\}$, feature embeddings $\{\mathbf{f}\}$, relation/entity type embeddings $\{\mathbf{r}\}$/$\{\mathbf{y}\}$.
1 Initialize: all embedding vectors as random vectors
2 while the objective in Eq. (9) has not converged do
3        for each of the two mention-typing objectives (entity side, relation side) do
4               Sample a mention-feature co-occurrence; draw $V$ negative samples; update the mention and feature vectors based on the second-order proximity term
5               Sample a mention and fetch its candidate types; draw $V$ negative samples; update the mention and type vectors based on the partial-label loss
6        end for
7       Sample a relation mention with its two entity mention arguments; draw $V$ negative samples; update $\{\mathbf{z}, \mathbf{m}_1, \mathbf{m}_2\}$ based on the translating objective
8 end while
Algorithm 1 Model Learning of CoType
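A minimal sketch of one edge-sampling update of the kind Algorithm 1 interleaves, using LINE-style negative sampling [50] for a mention-feature edge. The matrices, learning rate, and function name are illustrative stand-ins, not the released implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_update(M, F, i, j, neg, alpha=0.05):
    """One SGD step for an observed mention-feature edge (i, j) with sampled
    negative feature ids `neg`: raise the score of the observed pair and
    lower the scores of the negative pairs (gradient of a log-sigmoid loss)."""
    grad_m = np.zeros_like(M[i])
    for jj, label in [(j, 1.0)] + [(n, 0.0) for n in neg]:
        g = sigmoid(M[i] @ F[jj]) - label
        grad_m += g * F[jj]
        F[jj] -= alpha * g * M[i]   # update the feature vector
    M[i] -= alpha * grad_m          # update the mention vector
```

In each iteration, Algorithm 1 applies such updates alternately across the three objectives until Eq. (9) stops improving.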

3.3 Model Learning and Type Inference

The joint optimization problem in Eq. (9) can be solved in multiple ways. One solution is to first learn entity mention embeddings by minimizing $\mathcal{O}_M$, and then apply the learned embeddings to optimize $\mathcal{O}_Z + \mathcal{O}_{ZM}$. However, such a solution does not fully exploit the entity-relation interactions, which provide mutual feedback between the learning of entity mention embeddings and the learning of relation mention embeddings (see CoType-TwoStep in Sec. 4).

We design a stochastic sub-gradient descent algorithm [46] based on an edge-sampling strategy [50] to efficiently solve Eq. (9). In each iteration, we alternately sample from each of the three objectives a batch of edges (e.g., mention-feature co-occurrences) and their negative samples, and update each embedding vector based on the derivatives. Algorithm 1 summarizes the model learning process of CoType. The proof procedure in [46] can be adapted to prove convergence of the proposed algorithm (to a local minimum).

Type Inference. With the learned embeddings of features and types in the relation space (i.e., $\{\mathbf{f}\}$, $\{\mathbf{r}\}$) and the entity space (i.e., $\{\mathbf{f}'\}$, $\{\mathbf{y}\}$), we can perform nearest neighbor search in the target relation type set $\mathcal{R}$, or a top-down search on the target entity type hierarchy $\mathcal{Y}$, to estimate the relation type (or the entity type-path) for each (unlinkable) test relation mention $z$ (test entity mention $m$). Specifically, on the entity type hierarchy, we start from the root of the tree and recursively find the best type among the children types by measuring the cosine similarity between the entity type embedding and the vector representation of $m$ in the learned entity embedding space. By extracting text features from $m$'s local context (denoted by the set $\mathcal{F}(m)$), we represent $m$ in the learned entity embedding space by the vector $\sum_{f \in \mathcal{F}(m)} \mathbf{f}$. Similarly, for a test relation mention $z$, we represent it in the learned relation embedding space by $\sum_{f \in \mathcal{F}(z)} \mathbf{f}$, where $\mathcal{F}(z)$ is the set of text features extracted from $z$'s local context. The search process stops when we reach a leaf type on the type hierarchy, or when the similarity score falls below a pre-defined threshold $\eta > 0$. If the search returns an empty type-path (or type set), we output the predicted type label None for the mention.
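The top-down inference step can be sketched as below; the hierarchy representation, threshold name, and summation-based mention vector follow the description above, with illustrative data structures of our own choosing:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def infer_type_path(m_vec, children, type_emb, eta=0.1, root="ROOT"):
    """Walk the entity type hierarchy from the root, descending to the child
    type whose embedding is most cosine-similar to the mention vector; stop
    at a leaf or when similarity drops below eta (empty path => predict None)."""
    path, node = [], root
    while children.get(node):
        best = max(children[node], key=lambda t: cosine(m_vec, type_emb[t]))
        if cosine(m_vec, type_emb[best]) < eta:
            break
        path.append(best)
        node = best
    return path

def mention_vector(feature_ids, F):
    """Represent a test mention by summing its context-feature embeddings."""
    return F[feature_ids].sum(axis=0)
```

For relation mentions the analogous step is a flat nearest-neighbor search over the relation type embeddings rather than a hierarchy walk.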

Computational Complexity Analysis. Let $N$ be the total number of objects in CoType (entity and relation mentions, text features and type labels). With the alias table method [50], setting up the alias tables takes $O(N)$ time for all the objects, after which sampling a negative example takes constant time. In each iteration of Algorithm 1, an update with $V$ negative samples (i.e., optimizing second-order proximity or the translating objective) takes $O(dV)$ time, and an update of the partial-label loss likewise takes time linear in the embedding dimension $d$. Similar to [50], we find that the number of iterations for Algorithm 1 to converge is usually proportional to the number of object interactions extracted from $\mathcal{D}$ (e.g., unique mention-feature pairs and mention-type associations), denoted $E$. Therefore, the overall time complexity of CoType is $O(dVE)$ (as $E \gg N$), which is linear in the total number of object interactions in the corpus.

4 Experiments

Data sets NYT Wiki-KBP BioInfer
Relation/entity types 24 / 47 19 / 126 94 / 2,200
Documents (in $\mathcal{D}$) 294,977 780,549 101,530
Sentences (in $\mathcal{D}$) 1.18M 1.51M 521k
Training RMs (in $\mathcal{D}$) 353k 148k 28k
Training EMs (in $\mathcal{D}$) 701k 247k 53k
Text features (from $\mathcal{D}$) 2.6M 1.3M 575k
Test sentences 395 448 708
Ground-truth RMs 3,880 2,948 3,859
Ground-truth EMs 1,361 1,285 2,389
Table 4: Statistics of the datasets in our experiments.

4.1 Data Preparation and Experiment Setting

Our experiments use three public datasets from different domains (codes and datasets used in this paper are available for download). (1) NYT [43]: the training corpus consists of 1.18M sentences sampled from 294k New York Times news articles published between 1987 and 2007; 395 sentences were manually annotated by the authors of [21] to form the test data. (2) Wiki-KBP [26]: it uses 1.5M sentences sampled from 780k Wikipedia articles [26] as the training corpus, and 14k manually annotated sentences from the 2013 KBP slot filling assessment results [12] as test data. (3) BioInfer [39]: it consists of 1,530 manually annotated biomedical paper abstracts as test data and 100k sampled PubMed paper abstracts as the training corpus. Statistics of the datasets are shown in Table 4.

Automatically Labeled Training Corpora. The NYT training corpus was heuristically labeled using distant supervision, following the procedure in [43]. For the Wiki-KBP and BioInfer training corpora, we utilized DBpedia Spotlight [29], a state-of-the-art entity disambiguation tool, to map the detected entity mentions to Freebase entities. We then followed the procedure introduced in Secs. 2 and 3.1 to obtain candidate entity and relation types and to construct the training data $\mathcal{D}$. For target types, we discard the relation/entity types from the test data which cannot be mapped to Freebase, while keeping the Freebase entity/relation types not found in the test data in the training data (see Table 4 for the type statistics).
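The context-agnostic labeling step for entity mentions can be sketched as follows; `linker` and `kb_types` are hypothetical stand-ins for the DBpedia Spotlight linker and the Freebase type lookup, not actual APIs:

```python
def label_entity_mentions(mentions, linker, kb_types):
    """For each detected mention (sentence id, surface span), map it to a KB
    entity and attach ALL of that entity's KB types as noisy candidate labels;
    unlinkable mentions are left unlabeled and used only at test time."""
    labeled = []
    for sent_id, span in mentions:
        entity = linker.get(span)          # e.g. surface string -> KB entity id
        if entity is None:
            continue                       # unlinkable mention
        labeled.append((sent_id, span, frozenset(kb_types.get(entity, ()))))
    return labeled
```

Candidate relation types are obtained analogously, by looking up the KB relations that hold between the two linked arguments of a sentence-level mention pair; because this procedure ignores the sentence context, the attached labels are noisy, which motivates the partial-label loss.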

Feature Generation. Table 3 lists the set of text features of relation mentions used in our experiments. We followed [26] to generate text features for entity mentions. Dependency parse-based features were excluded, as only a POS-tagged corpus is given as input. We used a 6-word window to extract context features for each mention (3 words on each side). We applied the Stanford CoreNLP tool [28] to obtain POS tags. Brown clusters were derived for each corpus using a public implementation. The same kinds of features were used in all the compared methods in our experiments.

Evaluation Sets. For all three datasets, we used the provided training/test partitions of the corpora. In each dataset, relation mentions in the test sentences are manually annotated with their relation types, and the entity mention arguments are labeled with entity type-paths (see Table 4 for the statistics of the test data). We further created a validation set by randomly sampling 10% of the mentions from each test set, and used the remainder as the evaluation set.

Compared Methods. We compared CoType with its variants, which model only parts of the proposed hypotheses. Several state-of-the-art relation extraction methods (e.g., supervised, embedding, neural network) were also implemented (or tested using their published code): (1) DS+Perceptron [26]: adopts multi-label learning on the automatically labeled training data $\mathcal{D}$; (2) DS+Kernel [33]: applies a bag-of-feature kernel [33] to train an SVM classifier using $\mathcal{D}$; (3) DS+Logistic [31]: trains a multi-class logistic classifier (we use the liblinear package) on $\mathcal{D}$; (4) DeepWalk [38]: embeds mention-feature co-occurrences and mention-type associations as a homogeneous network (with binary edges); (5) LINE [50]: uses the second-order proximity model with edge sampling on a feature-type bipartite graph (where the edge weight is the number of relation mentions having feature $f$ and type $r$); (6) MultiR [21]: a state-of-the-art distant supervision method which models noisy labels in $\mathcal{D}$ by multi-instance multi-label learning; (7) FCM [16]: adopts a neural language model to perform compositional embedding; (8) DS-Joint [24]: jointly extracts entity and relation mentions using a structured perceptron on human-annotated sentences; we used $\mathcal{D}$ to train the model.

For CoType, besides the full proposed model, we compare two variants: (1) CoType-RM: this variant only optimizes objective $\mathcal{O}_Z$ to learn feature and type embeddings for relation mentions; and (2) CoType-TwoStep: it first optimizes $\mathcal{O}_M$, then uses the learned entity mention embeddings to initialize the minimization of $\mathcal{O}_Z + \mathcal{O}_{ZM}$; it represents a "pipeline" extraction paradigm.

To test the performance on entity recognition and typing, we also compare with several entity recognition systems, including a supervised method HYENA [55], distant supervision methods (FIGER [26], Google [15], WSABIE [54]), and a noise-robust approach PLE [42].

NYT Wiki-KBP BioInfer
Method S-F1 Ma-F1 Mi-F1 S-F1 Ma-F1 Mi-F1 S-F1 Ma-F1 Mi-F1
FIGER [26] 0.40 0.51 0.46 0.29 0.56 0.54 0.69 0.71 0.71
Google [15] 0.38 0.57 0.52 0.30 0.50 0.38 0.69 0.72 0.65
HYENA [55] 0.44 0.49 0.50 0.26 0.43 0.39 0.52 0.54 0.56
DeepWalk[38] 0.49 0.54 0.53 0.21 0.42 0.39 0.58 0.59 0.61
WSABIE[54] 0.53 0.57 0.58 0.35 0.55 0.50 0.64 0.66 0.65
PLE [42] 0.56 0.60 0.61 0.37 0.57 0.53 0.70 0.71 0.72
CoType 0.60 0.65 0.66 0.39 0.61 0.57 0.74 0.76 0.75
Table 5: Performance comparison of entity recognition and typing (using strict, micro and macro metrics [26]) on the three datasets.

Parameter Settings. In our testing of CoType and its variants, we set the regularization parameter, learning rate and related hyperparameters based on analysis on the validation sets. For the convergence criterion, we stopped the loop in Algorithm 1 when the relative change of the objective in Eq. (9) fell below a pre-defined threshold. For fair comparison, the dimensionality of the embeddings and the number of negative samples were set to the same values for all embedding methods, as in [50]. For the other tuning parameters of the compared methods, we tuned them on the validation sets and picked the values leading to the best performance.

Method  NYT Wiki-KBP BioInfer
DS+Perceptron [26] 0.641 0.543 0.470
DS+Kernel [33] 0.632 0.535 0.419
DeepWalk [38] 0.580 0.613 0.408
LINE [50] 0.765 0.617 0.557
DS+Logistic [31] 0.771 0.646 0.543
MultiR [21] 0.693 0.633 0.501
FCM [16] 0.688 0.617 0.467
CoType-RM 0.812 0.634 0.587
CoType-TwoStep 0.829 0.645 0.591
CoType 0.851 0.669 0.617
Table 6: Performance comparison on relation classification accuracy over ground-truth relation mentions on the three datasets.

Evaluation Metrics. For entity recognition and typing, we use strict, micro, and macro F1 scores, as in [26], evaluating both the detected entity mention boundaries and the predicted entity types. We consider two settings for evaluating relation extraction. For relation classification, ground-truth relation mentions are given and the None label is excluded; we focus on type classification accuracy. For relation extraction, we adopt the standard precision (P), recall (R) and F1 score [33, 2]. Note that all our evaluations are sentence-level (i.e., context-dependent), as discussed in [21].

4.2 Experiments and Performance Study

1. Performance on Entity Recognition and Typing. Among the compared methods, only FIGER [26] can detect entity mentions. We apply our detection results as input for the other methods. Table 5 summarizes the comparison results on the three datasets. Overall, CoType outperforms the others on all metrics on all three datasets (e.g., it obtains an 8% improvement in Micro-F1 over the next best method on the NYT dataset). Such performance gains mainly come from (1) a more robust way of modeling noisy candidate types (compared to the supervised method and distant supervision methods, which ignore the label noise issue); and (2) the joint embedding of entity and relation mentions in a mutually enhancing way (vs. the noise-robust method PLE [42]). This demonstrates the effectiveness of enforcing Hypothesis 3 in the CoType framework.

NYT [43, 21] Wiki-KBP [12, 26] BioInfer [39]
Method Prec Rec F1 Time Prec Rec F1 Time Prec Rec F1 Time
DS+Perceptron [26] 0.068 0.641 0.123 15min 0.233 0.457 0.308 7.7min 0.357 0.279 0.313 3.3min
DS+Kernel [33] 0.095 0.490 0.158 56hr 0.108 0.239 0.149 9.8hr 0.333 0.011 0.021 4.2hr
DS+Logistic [31] 0.258 0.393 0.311 25min 0.296 0.387 0.335 14min 0.572 0.255 0.353 7.4min
DeepWalk [38] 0.176 0.224 0.197 1.1hr 0.101 0.296 0.150 27min 0.370 0.058 0.101 8.4min
LINE [50] 0.335 0.329 0.332 2.3min 0.360 0.257 0.299 1.5min 0.360 0.275 0.312 35sec
MultiR [21] 0.338 0.327 0.333 5.8min 0.325 0.278 0.301 4.1min 0.459 0.221 0.298 2.4min
FCM [16] 0.553 0.154 0.240 1.3hr 0.151 0.500 0.301 25min 0.535 0.168 0.255 9.7min
DS-Joint [24] 0.574 0.256 0.354 22hr 0.444 0.043 0.078 54hr 0.102 0.001 0.002 3.4hr
CoType-RM 0.467 0.380 0.419 2.6min 0.342 0.339 0.340 1.5min 0.482 0.406 0.440 42sec
CoType-TwoStep 0.368 0.446 0.404 9.6min 0.347 0.351 0.349 6.1min 0.502 0.405 0.448 3.1min
CoType 0.423 0.511 0.463 4.1min 0.348 0.406 0.369 2.5min 0.536 0.424 0.474 78sec
Table 7: Performance comparison on end-to-end relation extraction (at the highest F1 point) on the three datasets.

2. Performance on Relation Classification. To test the effectiveness of the learned embeddings in representing the type semantics of relation mentions, we compare with other methods on classifying the ground-truth relation mentions in the evaluation set by target types. Table 6 summarizes the classification accuracy. CoType achieves superior accuracy compared to all other methods and variants (e.g., over 10% improvement on both the NYT and BioInfer datasets over the next best method). All compared methods (except for MultiR) simply treat $\mathcal{D}$ as "perfectly labeled" when training models. The improvement of CoType-RM validates the importance of carefully modeling label noise (i.e., Hypothesis 2). The superior performance of CoType-RM over MultiR demonstrates the effectiveness of the partial-label loss over multi-instance learning. Finally, the fact that CoType outperforms both CoType-RM and CoType-TwoStep validates that the proposed translation-based embedding objective is effective in capturing entity-relation cross-constraints.

3. Performance on Relation Extraction. To test the domain independence of the CoType framework, we conduct evaluations on end-to-end relation extraction. As only MultiR and DS-Joint can detect entity and relation mentions within their own frameworks, we apply our detection results to the other compared methods. Table 7 shows the evaluation results as well as the runtime of the different methods. In particular, results at each method's highest F1 score point are reported, after tuning the threshold by which each method decides whether a test mention is None or some target type. Overall, CoType outperforms all other methods in F1 score on all three datasets. We observe that DS-Joint and MultiR suffer from low recall, since their entity detection modules do not work well on $\mathcal{D}$ (where many tokens have false negative tags). This demonstrates the effectiveness of the proposed domain-agnostic text segmentation algorithm (see Sec. 3.1). We found that the incremental paradigm of learning embeddings (i.e., CoType-TwoStep) brings only marginal improvement. In contrast, CoType adopts a "joint modeling" paradigm following Hypothesis 3 and achieves significant improvement. In Fig. 5, precision-recall curves on the NYT and BioInfer datasets further show that CoType still achieves decent precision while preserving good recall.

(a) NYT
(b) BioInfer
Figure 5: Precision-recall curves of relation extraction on the NYT and BioInfer datasets. A similar trend is observed on the Wiki-KBP dataset.

4. Scalability. In addition to the runtimes shown in Table 7, Fig. 6(a) tests the scalability of CoType against the other methods, by running on BioInfer corpora sampled at different ratios. CoType demonstrates a linear runtime trend (which validates the time complexity analysis in Sec. 3.3), and is the only method capable of processing the full-size dataset without significant time cost.

(a) Scalability of CoType
(b) Effect of training set size
Figure 6: (a) Scalability study on CoType and the compared methods; and (b) Performance changes of relation extraction with respect to sampling ratio of relation mentions on the Bioinfer dataset.

4.3 Case Study

1. Example output on news articles. Table 8 shows the output of CoType, MultiR and Logistic on two news sentences from the Wiki-KBP dataset. CoType extracts more relation mentions (e.g., children) and predicts entity/relation types with better accuracy. Moreover, CoType can jointly extract typed entity and relation mentions, while the other methods cannot (or must do so incrementally).

2. Testing the effect of training corpus size. Fig. 6(b) shows the performance trend on the BioInfer dataset when varying the sampling ratio (the subset of relation mentions randomly sampled from the training set). The F1 scores of all three methods improve as the sampling ratio increases. CoType performs best in all cases, which demonstrates its robust performance across corpora of various sizes.

3. Studying the effect of entity type errors in relation classification. To investigate the "error propagation" issue of the incremental pipeline, we test the change in relation classification performance by (1) training models without entity types as features; (2) using entity types predicted by FIGER [26] as features; and (3) using ground-truth ("perfect") entity types as features. Fig. 7 summarizes the accuracy of CoType, its variants and the compared methods. We observe only marginal improvement when using FIGER-predicted types, but significant improvement when using ground-truth entity types: this validates the error propagation issue. Moreover, we find that CoType achieves an accuracy close to that of the next best method using gold entity types (i.e., DS+Logistic + gold entity types). This demonstrates the effectiveness of our proposed joint entity and relation embedding.

Figure 7: Study of entity type error propagation on the BioInfer dataset.

5 Related Work

Entity and Relation Extraction. There have been extensive studies on extracting typed entities and relations from text (i.e., context-dependent extraction). Most existing work follows an incremental paradigm: first perform entity recognition and typing [34, 40] to extract typed entity mentions, and then solve relation extraction [2, 17] to identify relation mentions of target types. Work along both lines can be categorized by the degree of supervision. While supervised entity recognition systems [14, 34] focus on a few common entity types, weakly-supervised methods [18, 36] and distantly-supervised methods [41, 54, 26] use large text corpora and a small set of seeds (or a knowledge base) to induce patterns or to train models, and thus can be applied to different domains without additional human annotation effort. For relation extraction, similarly, weak supervision [6, 13] and distant supervision [35, 53, 49, 21, 43, 31] approaches have been proposed to address the domain restriction issue of traditional supervised systems [2, 33, 17]. However, such a "pipeline" paradigm ignores the dependencies between the subtasks and may suffer from error propagation between them.

Recent studies try to integrate entity extraction with relation extraction by performing global sequence labeling for both entities and relations [24, 32, 1], incorporating type constraints between relations and their arguments [44], or modeling factor graphs [47]. However, these methods require human-annotated corpora (clean and general) for model training and rely on existing entity detectors to provide entity mentions. By contrast, the CoType framework runs a domain-agnostic segmentation algorithm to mine entity mentions and adopts a label-noise-robust objective to train models using distant supervision. In particular, [1] integrates entity classification with relation extraction using distant supervision, but it ignores the label noise issue in the automatically labeled training corpora.

CoType combines the best of both worlds: it leverages noisy distant supervision in a robust way to address domain restriction (vs. existing joint extraction methods [24, 32]), and it models entity-relation interactions jointly with other signals to resolve error propagation (vs. current distant supervision methods [49, 31]).

Learning Embeddings and Noisy Labels. Our proposed framework incorporates embedding techniques used for modeling words and phrases in large text corpora [30, 54, 45], and nodes and links in graphs/networks [50, 38]. These methods assume links are all correct (in the unsupervised setting) or labels are all true (in the supervised setting). CoType seeks to model the true links and labels in the embedding process (e.g., see our comparisons with LINE [50], DeepWalk [38] and FCM [16] in Sec. 4.2). Different from embedding structured KB entities and relations [5, 51], our task focuses on embedding entity and relation mentions in unstructured contexts.

In the context of modeling noisy labels, our work is related to partial-label learning [42, 8, 37] and multi-label multi-instance learning [49], which deal with the problem where each training instance is associated with a set of noisy candidate labels (of which only one is correct). Unlike these formulations, our joint extraction problem deals with both classification with noisy labels and modeling of entity-relation interactions. In Sec. 4.2, we compare our full-fledged model with its variants CoType-RM and CoType-TwoStep to validate the hypothesis on entity-relation interactions.

Text Blake Edwards, a prolific filmmaker who kept alive the tradition of slapstick comedy, died Wednesday of pneumonia at a hospital in Santa Monica. Anderson is survived by his wife Carol, sons Lee and Albert, daughter Shirley Englebrecht and nine grandchildren.
MultiR [21] person:country_of_birth, : {N/A}, : {N/A} None, : {N/A}, : {N/A}
Logistic [31] per:country_of_birth, : {person}, : {country} None, : {person},: {person, politician}
CoType : person:place_of_death, : {person,artist,director}, : {location, city} : person:children, : {person}, : {person}
Table 8: Example output of CoType and the compared methods on two news sentences from the Wiki-KBP dataset.

6 Conclusion

This paper studies domain-independent, joint extraction of typed entities and relations from text with distant supervision. The proposed CoType framework runs a domain-agnostic segmentation algorithm to mine entity mentions, and formulates the joint entity and relation mention typing problem as a global embedding problem. We design a noise-robust objective to faithfully model noisy type labels and to capture the mutual dependencies between entities and relations. Experimental results demonstrate the effectiveness and robustness of CoType on text corpora from different domains. Interesting future work includes incorporating the pseudo feedback idea [53] to reduce false negative type labels in the training data, modeling type correlation in the given type hierarchy [42], and performing type inference for test entity mentions and relation mentions jointly.

7 Acknowledgments

Research was sponsored in part by the U.S. Army Research Lab under Cooperative Agreement No. W911NF-09-2-0053 (NSCTA), National Science Foundation grants IIS-1017362, IIS-1320617 and IIS-1354329, HDTRA1-10-1-0120, and grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agencies.


  • [1] I. Augenstein, A. Vlachos, and D. Maynard. Extracting relations between non-standard entities using distant supervision and imitation learning. In EMNLP, 2015.
  • [2] N. Bach and S. Badaskar. A review of relation extraction. Literature review for Language and Statistics II.
  • [3] J. Bian, Y. Liu, E. Agichtein, and H. Zha. Finding the right facts in the crowd: factoid question answering over social media. In WWW, 2008.
  • [4] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008.
  • [5] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. Translating embeddings for modeling multi-relational data. In NIPS, 2013.
  • [6] R. C. Bunescu and R. Mooney. Learning to extract relations from the web using minimal supervision. In ACL, 2007.
  • [7] Y. S. Chan and D. Roth. Exploiting background knowledge for relation extraction. In COLING, 2010.
  • [8] T. Cour, B. Sapp, and B. Taskar. Learning from partial labels. JMLR, 12:1501–1536, 2011.
  • [9] A. Culotta and J. Sorensen. Dependency tree kernels for relation extraction. In ACL, 2004.
  • [10] X. L. Dong, T. Strohmann, S. Sun, and W. Zhang. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In KDD, 2014.
  • [11] A. El-Kishky, Y. Song, C. Wang, C. R. Voss, and J. Han. Scalable topical phrase mining from text corpora. VLDB, 8(3):305–316, 2014.
  • [12] J. Ellis, J. Getman, J. Mott, X. Li, K. Griffitt, S. M. Strassel, and J. Wright. Linguistic resources for 2013 knowledge base population evaluations. Text Analysis Conference (TAC), 2014.
  • [13] O. Etzioni, M. Cafarella, D. Downey, S. Kok, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates. Web-scale information extraction in knowitall:(preliminary results). In WWW, 2004.
  • [14] J. R. Finkel, T. Grenager, and C. Manning. Incorporating non-local information into information extraction systems by gibbs sampling. In ACL, 2005.
  • [15] D. Gillick, N. Lazic, K. Ganchev, J. Kirchner, and D. Huynh. Context-dependent fine-grained entity type tagging. arXiv preprint arXiv:1412.1820, 2014.
  • [16] M. R. Gormley, M. Yu, and M. Dredze. Improved relation extraction with feature-rich compositional embedding models. EMNLP, 2015.
  • [17] Z. GuoDong, S. Jian, Z. Jie, and Z. Min. Exploring various knowledge in relation extraction. In ACL, 2005.
  • [18] S. Gupta and C. D. Manning. Improved pattern learning for bootstrapped entity extraction. In CONLL, 2014.
  • [19] X. He and P. Niyogi. Locality preserving projections. In NIPS, 2004.
  • [20] J. Hoffart, M. A. Yosef, I. Bordino, H. Fürstenau, M. Pinkal, M. Spaniol, B. Taneva, S. Thater, and G. Weikum. Robust disambiguation of named entities in text. In EMNLP, 2011.
  • [21] R. Hoffmann, C. Zhang, X. Ling, L. Zettlemoyer, and D. S. Weld. Knowledge-based weak supervision for information extraction of overlapping relations. In ACL, 2011.
  • [22] D. Hovy, B. Plank, H. M. Alonso, and A. Søgaard. Mining for unambiguous instances to adapt part-of-speech taggers to new domains. In NAACL, 2015.
  • [23] C. Lee, Y.-G. Hwang, and M.-G. Jang. Fine-grained named entity recognition and relation extraction for question answering. In SIGIR, 2007.
  • [24] Q. Li and H. Ji. Incremental joint extraction of entity mentions and relations. In ACL, 2014.
  • [25] T. Lin, O. Etzioni, et al. No noun phrase left behind: detecting and typing unlinkable entities. In EMNLP, 2012.
  • [26] X. Ling and D. S. Weld. Fine-grained entity recognition. In AAAI, 2012.
  • [27] J. Liu, J. Shang, C. Wang, X. Ren, and J. Han. Mining quality phrases from massive text corpora. In SIGMOD, 2015.
  • [28] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In ACL, 2014.
  • [29] P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. Dbpedia spotlight: shedding light on the web of documents. In I-Semantics, 2011.
  • [30] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
  • [31] M. Mintz, S. Bills, R. Snow, and D. Jurafsky. Distant supervision for relation extraction without labeled data. In ACL, 2009.
  • [32] M. Miwa and Y. Sasaki. Modeling joint entity and relation extraction with table representation. In EMNLP, 2014.
  • [33] R. J. Mooney and R. C. Bunescu. Subsequence kernels for relation extraction. In NIPS, 2005.
  • [34] D. Nadeau and S. Sekine. A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1):3–26, 2007.
  • [35] A. Nagesh, G. Haffari, and G. Ramakrishnan. Noisy or-based model for relation extraction using distant supervision. In EMNLP, 2014.
  • [36] N. Nakashole, T. Tylenda, and G. Weikum. Fine-grained semantic typing of emerging entities. In ACL, 2013.
  • [37] N. Nguyen and R. Caruana. Classification with partial labels. In KDD, 2008.
  • [38] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In KDD, 2014.
  • [39] S. Pyysalo, F. Ginter, J. Heimonen, J. Björne, J. Boberg, J. Järvinen, and T. Salakoski. Bioinfer: a corpus for information extraction in the biomedical domain. BMC bioinformatics, 8(1):1, 2007.
  • [40] L. Ratinov and D. Roth. Design challenges and misconceptions in named entity recognition. In ACL, 2009.
  • [41] X. Ren, A. El-Kishky, C. Wang, F. Tao, C. R. Voss, and J. Han. Clustype: effective entity recognition and typing by relation phrase-based clustering. In KDD, 2015.
  • [42] X. Ren, W. He, M. Qu, C. R. Voss, H. Ji, and J. Han. Label noise reduction in entity typing by heterogeneous partial-label embedding. In KDD, 2016.
  • [43] S. Riedel, L. Yao, and A. McCallum. Modeling relations and their mentions without labeled text. In ECML, 2010.
  • [44] D. Roth and W.-t. Yih. Global inference for entity and relation identification via a linear programming formulation. Introduction to Statistical Relational Learning, pages 553-580, 2007.
  • [45] B. Salehi, P. Cook, and T. Baldwin. A word embedding approach to predicting the compositionality of multiword expressions. In NAACL-HLT, 2015.
  • [46] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter. Pegasos: Primal estimated sub-gradient solver for svm. Mathematical programming, 127(1):3–30, 2011.
  • [47] S. Singh, S. Riedel, B. Martin, J. Zheng, and A. McCallum. Joint inference of entities, relations, and coreference. In Workshop on Automated Knowledge Base Construction, 2013.
  • [48] H. Sun, H. Ma, W.-t. Yih, C.-T. Tsai, J. Liu, and M.-W. Chang. Open domain question answering via semantic enrichment. In WWW, 2015.
  • [49] M. Surdeanu, J. Tibshirani, R. Nallapati, and C. D. Manning. Multi-instance multi-label learning for relation extraction. In EMNLP, 2012.
  • [50] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. Line: Large-scale information network embedding. In WWW, 2015.
  • [51] K. Toutanova, D. Chen, P. Pantel, P. Choudhury, and M. Gamon. Representing text for joint embedding of text and knowledge bases. In EMNLP, 2015.
  • [52] R. West, E. Gabrilovich, K. Murphy, S. Sun, R. Gupta, and D. Lin. Knowledge base completion via search-based question answering. In WWW, 2014.
  • [53] W. Xu, R. H. Le Zhao, and R. Grishman. Filling knowledge base gaps for distant supervision of relation extraction. In ACL, 2013.
  • [54] D. Yogatama, D. Gillick, and N. Lazic. Embedding methods for fine grained entity type classification. In ACL, 2015.
  • [55] M. A. Yosef, S. Bauer, J. Hoffart, M. Spaniol, and G. Weikum. Hyena: Hierarchical type classification for entity names. In COLING, 2012.