1 Introduction
Incorporating human knowledge is one of the research directions of artificial intelligence (AI). Knowledge representation and reasoning, inspired by human problem solving, aims to represent knowledge so that intelligent systems can solve complex tasks. Recently, knowledge graphs, as a form of structured human knowledge, have drawn great research attention from both academia and industry. A knowledge graph is a structured representation of facts, consisting of entities, relationships and semantic descriptions. Entities can be real-world objects or abstract concepts, relationships represent the relations between entities, and semantic descriptions of entities and their relationships contain types and properties with a well-defined meaning. Property graphs or attributed graphs, in which nodes and relations have properties or attributes, are also widely used.
The term knowledge graph is largely synonymous with knowledge base, with only a minor difference. A knowledge graph can be viewed as a graph when considering its graph structure. When it involves formal semantics, it can be taken as a knowledge base for interpretation and inference over facts. Examples of a knowledge base and a knowledge graph are illustrated in Fig. 1. Knowledge can be expressed as a factual triple in the form of (head, relation, tail) or (subject, predicate, object) under the resource description framework (RDF), for example, (Albert Einstein, WinnerOf, Nobel Prize). It can also be represented as a directed graph with nodes as entities and edges as relations. For simplicity, and following the convention of the research community, this paper uses the terms knowledge graph and knowledge base interchangeably.
Recent advances in knowledge-graph-based research focus on knowledge representation learning (KRL) or knowledge graph embedding (KGE), which maps entities and relations into low-dimensional vectors while capturing their semantic meanings. Specific knowledge acquisition tasks include knowledge graph completion (KGC), triple classification, entity recognition, and relation extraction. Knowledge-aware models benefit from the integration of heterogeneous information, rich ontologies and semantics for knowledge representation, and multilingual knowledge. Thus, many real-world applications, such as recommendation systems and question answering, have flourished with the ability of commonsense understanding and reasoning. Some real-world products, for example, Microsoft's Satori and Google's Knowledge Graph, have shown a strong capacity to provide more efficient services.
To provide a comprehensive survey of the current literature, this paper focuses on knowledge representation, which enriches graphs with more context, intelligence and semantics for knowledge acquisition and knowledge-aware applications. Our main contributions are summarized as follows.

Comprehensive review. We conduct a comprehensive review of the origin of knowledge graphs and modern techniques for relational learning on knowledge graphs. Major neural architectures for knowledge graph representation learning and reasoning are introduced and compared. Moreover, we provide a complete overview of many applications in different domains.

Full-view categorization and new taxonomies. A full-view categorization of research on knowledge graphs, together with fine-grained new taxonomies, is presented. Specifically, at a high level we review knowledge graphs in three aspects: KRL, knowledge acquisition, and knowledge-aware applications. For KRL approaches, we further propose fine-grained taxonomies with four views, namely representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, KGC is reviewed under embedding-based ranking, relational path reasoning, logical rule reasoning and meta relational learning; entity-relation acquisition tasks are divided into entity recognition, typing, disambiguation, and alignment; and relation extraction is discussed according to the neural paradigms.

Wide coverage on emerging advances.
Knowledge graphs have experienced rapid development. This survey provides wide coverage of emerging topics including transformer-based knowledge encoding, graph neural network (GNN)-based knowledge propagation, reinforcement learning-based path reasoning, and meta relational learning.

Summary and outlook on future directions. This survey provides a summary of each category and highlights promising future research directions.
The remainder of this survey is organized as follows: first, an overview of knowledge graphs, including their history, notations, definitions and categorization, is given in Section 2; then, we discuss KRL in Section 3 from four scopes; next, our review turns to the tasks of knowledge acquisition and temporal knowledge graphs in Section 4 and Section 5; downstream applications are introduced in Section 6; finally, we discuss future research directions and conclude the survey. Other information, including KRL model training and a collection of knowledge graph datasets and open-source implementations, can be found in the appendices.
2 Overview
2.1 A Brief History of Knowledge Bases
Knowledge representation has a long history of development in the fields of logic and AI. The idea of graphical knowledge representation dates back to 1956 with the concept of the semantic net proposed by Richens [127], while symbolic logic knowledge goes back to the General Problem Solver [109] in 1959. The term knowledge base was first used in connection with knowledge-based systems for reasoning and problem solving. MYCIN [138] is one of the most famous rule-based expert systems for medical diagnosis, with a knowledge base of about 600 rules. Later, the community of human knowledge representation saw the development of frame-based languages, rule-based representations, and hybrid representations. Approximately at the end of this period, the Cyc project (http://cyc.com) began, aiming at assembling human knowledge. The Resource Description Framework (RDF, released as a W3C recommendation in 1999, http://w3.org/TR/1999/REC-rdf-syntax-19990222) and the Web Ontology Language (OWL, http://w3.org/TR/owl-guide) were released in turn, and became important standards of the Semantic Web (http://w3.org/standards/semanticweb). Then, many open knowledge bases or ontologies were published, such as WordNet, DBpedia, YAGO, and Freebase. Stokman and Vries [140] had already proposed a modern idea of structuring knowledge in a graph in 1988. However, it was in 2012 that the concept of knowledge graph gained great popularity with its launch by Google's search engine (http://blog.google/products/search/introducing-knowledge-graph-things-not), where the knowledge fusion framework Knowledge Vault [33] was proposed to build large-scale knowledge graphs. A brief road map of knowledge base history is illustrated in Appendix A.
2.2 Definitions and Notations
Various efforts have been made to define knowledge graphs by describing general semantic representations or essential characteristics. However, there is no widely accepted formal definition. Paulheim [117] defined four criteria for knowledge graphs. Ehrlinger and Wöß [35] analyzed several existing definitions and proposed Definition 1, which emphasizes the reasoning engine of knowledge graphs. Wang et al. [158] proposed a definition as a multi-relational graph in Definition 2. Following previous literature, we define a knowledge graph as $\mathcal{G} = \{\mathcal{E}, \mathcal{R}, \mathcal{F}\}$, where $\mathcal{E}$, $\mathcal{R}$ and $\mathcal{F}$ are sets of entities, relations and facts, respectively. A fact is denoted as a triple $(h, r, t) \in \mathcal{F}$.
Definition 1 (Ehrlinger and Wöß[35]).
A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.
Definition 2 (Wang et al.[158]).
A knowledge graph is a multi-relational graph composed of entities and relations, which are regarded as nodes and different types of edges, respectively.
Specific notations and their descriptions are listed in Table I. Details of several mathematical operations are explained in Appendix B.
Notation  Description

$\mathcal{G}$  A knowledge graph
$\mathcal{F}$  A set of facts
$(h, r, t)$  A triple of head, relation and tail
$(\mathbf{h}, \mathbf{r}, \mathbf{t})$  Embedding of head, relation and tail
$\mathcal{R}$, $\mathcal{E}$  Relation set and entity set
$v \in V$  Vertex in vertex set
$\xi \in E$  Edge in edge set
$e_s / e_q / e_t$  Source/query/current entity
$r_q$  Query relation
$\mathcal{D}$  Text corpus
$f_r(h, t)$  Scoring function
$\sigma(\cdot)$, $\tanh(\cdot)$  Non-linear activation function
$\mathbf{M}$  Mapping matrix
$\mathcal{M}$  Tensor
$\mathcal{L}$  Loss function
$\mathbb{R}^d$  $d$-dimensional real-valued space
$\mathbb{C}^d$  $d$-dimensional complex space
$\mathbb{H}^d$  $d$-dimensional hypercomplex space
$\mathbb{T}^d$  $d$-dimensional torus space
$\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$  Gaussian distribution
$\langle \mathbf{h}, \mathbf{r}, \bar{\mathbf{t}} \rangle$  Hermitian dot product
$\otimes$  Hamilton product
$\circ$, $\odot$  Hadamard (element-wise) product
$\star$  Circular correlation
$\oplus$, $[\,\cdot\,;\,\cdot\,]$  Vector/matrix concatenation
$\boldsymbol{\omega}$  Convolutional filters
$*$  Convolution operator
2.3 Categorization of Research on Knowledge Graph
This survey provides a comprehensive literature review on the research of knowledge graphs, namely KRL, knowledge acquisition, and a wide range of downstream knowledge-aware applications, where many recent advanced deep learning techniques are integrated. The overall categorization of the research is illustrated in Fig. 2.
Knowledge Representation Learning is a critical research issue of knowledge graphs which paves the way for many knowledge acquisition tasks and downstream applications. We categorize KRL into four aspects of representation space, scoring function, encoding models and auxiliary information, providing a clear workflow for developing a KRL model. Specific ingredients include:

representation space in which the relations and entities are represented;

scoring function for measuring the plausibility of factual triples;

encoding models for representing and learning relational interactions;

auxiliary information to be incorporated into the embedding methods.
Representation spaces include point-wise space, manifold, complex vector space, Gaussian distribution, and discrete space. Scoring metrics are generally divided into distance-based and similarity-based scoring functions. Current research focuses on encoding models including linear/bilinear models, factorization, and neural networks. Auxiliary information considers textual, visual and type information.
Knowledge Acquisition
tasks are divided into three categories, i.e., KGC, relation extraction and entity discovery. The first expands existing knowledge graphs, while the other two discover new knowledge (relations and entities) from text. KGC falls into the following categories: embedding-based ranking, relation path reasoning, rule-based reasoning and meta relational learning. Entity discovery includes recognition, disambiguation, typing and alignment. Relation extraction models utilize attention mechanisms, graph convolutional networks (GCNs), adversarial training, reinforcement learning, deep residual learning, and transfer learning.
Temporal Knowledge Graphs incorporate temporal information for representation learning. This survey categorizes four research fields including temporal embedding, entity dynamics, temporal relational dependency, and temporal logical reasoning.
Knowledge-aware Applications include natural language understanding (NLU), question answering, recommendation systems, and miscellaneous real-world tasks, where knowledge is injected to improve representation learning.
2.4 Related Surveys
Previous survey papers on knowledge graphs mainly focus on statistical relational learning [112], knowledge graph refinement [117], Chinese knowledge graph construction [166], KGE [158] or KRL [87]. The latter two surveys are more closely related to our work. Lin et al. [87] presented KRL in a linear manner, with a concentration on quantitative analysis. Wang et al. [158] categorized KRL according to scoring functions, and specifically focused on the type of information utilized in KRL, providing a general view of current research only from the perspective of scoring metrics. Our survey goes deeper into the workflow of KRL and provides a full-scale view from four folds including representation space, scoring function, encoding models, and auxiliary information. Besides, our paper provides a comprehensive review of knowledge acquisition and knowledge-aware applications, and discusses several emerging topics such as knowledge-graph-based reasoning and few-shot learning.
3 Knowledge Representation Learning
KRL is also known as KGE, multi-relational learning, and statistical relational learning in the literature. This section reviews recent advances in distributed representation learning with rich semantic information of entities and relations from four scopes: representation space (representing entities and relations, Section 3.1), scoring function (measuring the plausibility of facts, Section 3.2), encoding models (modeling the semantic interaction of facts, Section 3.3), and auxiliary information (utilizing external information, Section 3.4). We further provide a summary in Section 3.5. The training strategies for KRL models are reviewed in Appendix D.
3.1 Representation Space
The key issue of representation learning is to learn low-dimensional distributed embeddings of entities and relations. Current literature mainly uses real-valued point-wise space (Fig. 2(a)) including vector, matrix and tensor space, while other kinds of space such as complex vector space (Fig. 2(b)), Gaussian space (Fig. 2(c)), and manifold (Fig. 2(d)) are utilized as well.
3.1.1 PointWise Space
Point-wise Euclidean space is widely applied for representing entities and relations, projecting relation embeddings in vector or matrix space, or capturing relational interactions. TransE [10] represents entities and relations in a $d$-dimensional vector space, i.e., $\mathbf{h}, \mathbf{t}, \mathbf{r} \in \mathbb{R}^d$, and makes embeddings follow the translational principle $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$. To tackle the insufficiency of a single space for both entities and relations, TransR [88] further introduces separated spaces for entities and relations. The authors projected entities ($\mathbf{h}, \mathbf{t} \in \mathbb{R}^d$) into the relation ($\mathbf{r} \in \mathbb{R}^k$) space by a projection matrix $\mathbf{M}_r \in \mathbb{R}^{k \times d}$. NTN [139] models entities across multiple dimensions by a bilinear tensor neural layer. The relational interaction between head and tail is captured as a tensor denoted as $\widehat{\mathcal{M}} \in \mathbb{R}^{d \times d \times k}$.
Many other translational models such as TransH [164] also use a similar representation space, while semantic matching models use a plain vector space (e.g., HolE [113]) or a relational projection matrix (e.g., ANALOGY [91]). Principles of these translational and semantic matching models are introduced in Sections 3.2.1 and 3.2.2, respectively.
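As a minimal sketch (illustrative dimensions and random embeddings, not a trained model), the translational principles of TransE and TransR above can be written in a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 6  # entity dim and relation-space dim (illustrative sizes)

# TransE: h, r, t share one space R^d; a valid triple satisfies h + r ≈ t.
h, r = rng.normal(size=d), rng.normal(size=d)
t = h + r + 0.01 * rng.normal(size=d)      # a near-valid triple
score_transe = -np.linalg.norm(h + r - t)  # higher (closer to 0) = more plausible

# TransR: entities live in R^d, relations in a separate R^k; a projection
# matrix M_r maps entities into the relation space before translating.
M_r = rng.normal(size=(k, d))
r_k = rng.normal(size=k)                   # relation embedding in R^k
score_transr = -np.linalg.norm(M_r @ h + r_k - M_r @ t)
```

Note how the near-valid triple scores close to zero under TransE, while any corrupted tail moves the score away from zero.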
3.1.2 Complex Vector Space
Instead of using a real-valued space, entities and relations can be represented in a complex space with $\mathbf{h}, \mathbf{t}, \mathbf{r} \in \mathbb{C}^d$. Taking the head entity as an example, $\mathbf{h}$ has a real part $\mathrm{Re}(\mathbf{h})$ and an imaginary part $\mathrm{Im}(\mathbf{h})$, i.e., $\mathbf{h} = \mathrm{Re}(\mathbf{h}) + i\,\mathrm{Im}(\mathbf{h})$. ComplEx [151] firstly introduces the complex vector space shown in Fig. 2(b), which can capture both symmetric and antisymmetric relations. The Hermitian dot product is used to compose the relation, the head, and the conjugate of the tail. Inspired by Euler's identity $e^{i\theta} = \cos\theta + i\sin\theta$, RotatE [146] proposes a rotational model taking the relation as a rotation from head entity to tail entity in complex space, i.e., $\mathbf{t} = \mathbf{h} \circ \mathbf{r}$, where $\circ$ denotes the element-wise Hadamard product. QuatE [198] extends the complex-valued space into a hypercomplex space using quaternions with three imaginary components, where the quaternion inner product, i.e., the Hamilton product $\otimes$, is used as the compositional operator for head entity and relation.
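The RotatE-style rotation can be sketched with NumPy's native complex arrays (a toy illustration with random angles, not the published implementation); the key property is that each relation entry has modulus 1, so the conjugate exactly inverts the rotation:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

# Entities are complex vectors; a relation is an element-wise rotation,
# i.e., a complex vector whose entries all have modulus 1.
h = rng.normal(size=d) + 1j * rng.normal(size=d)
theta = rng.uniform(0.0, 2.0 * np.pi, size=d)
r = np.exp(1j * theta)              # e^{i*theta}: pure rotations

t = h * r                           # rotate h onto t: a valid triple
score = -np.linalg.norm(h * r - t)  # distance-based score, 0 is best

# Rotations compose and invert cleanly: conj(r) undoes the rotation.
h_back = t * np.conj(r)
```

This inverse relation coming for free (via conjugation) is one reason rotation-based models handle relation patterns such as inversion naturally.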
3.1.3 Gaussian Distribution
Inspired by Gaussian word embedding, the density-based embedding model KG2E [64] introduces Gaussian distributions to deal with the (un)certainties of entities and relations. The authors embedded entities and relations into multi-dimensional Gaussian distributions $\mathcal{H} \sim \mathcal{N}(\boldsymbol{\mu}_h, \boldsymbol{\Sigma}_h)$ and $\mathcal{T} \sim \mathcal{N}(\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t)$. The mean vector $\boldsymbol{\mu}$ indicates the position of an entity or relation, and the covariance matrix $\boldsymbol{\Sigma}$ models its (un)certainty. Following the translational principle, the probability distribution of the entity transformation $\mathcal{H} - \mathcal{T}$ is denoted as $\mathcal{P}_e \sim \mathcal{N}(\boldsymbol{\mu}_h - \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_h + \boldsymbol{\Sigma}_t)$. Similarly, TransG [174] represents entities with Gaussian distributions, while it draws a mixture of Gaussian distributions for relation embedding, where the $i$-th component translation vector of relation $r$ is denoted as $\boldsymbol{\mu}_{r,i}$.
3.1.4 Manifold and Group
This section reviews knowledge representation in manifold space, the Lie group, and the dihedral group. A manifold is a topological space, which can be defined as a set of points with neighborhoods by set theory, while a group is an algebraic structure defined in abstract algebra. Previous point-wise modeling is an ill-posed algebraic system where the number of scoring equations is far more than the number of entities and relations, and embeddings are restricted to an over-strict geometric form even in some methods with subspace projection. To tackle these issues, ManifoldE [173] extends point-wise embedding into manifold-based embedding. The authors introduced two settings of manifold-based embedding, i.e., Sphere and Hyperplane. An example of a sphere is shown in Fig. 2(d). For the sphere setting, a Reproducing Kernel Hilbert Space is used to represent the manifold function, i.e.,
$\mathcal{M}(h, r, t) = \|\varphi(h) + \varphi(r) - \varphi(t)\|^2$  (1)
where $\varphi$ maps the original space to the Hilbert space, and $\mathcal{K}(x, y) = \varphi(x)^\top \varphi(y)$ is the kernel function. Another "hyperplane" setting is introduced to enhance the model with intersected embeddings, i.e.,
$\mathcal{M}(h, r, t) = (\mathbf{h} + \mathbf{r}_{head})^\top (\mathbf{t} + \mathbf{r}_{tail})$  (2)
TorusE [34] solves the regularization problem of TransE via embedding in an $n$-dimensional torus space, a compact Lie group. With the projection from vector space into torus space defined as $\pi: \mathbb{R}^n \to T^n, x \mapsto [x]$, entities and relations are denoted as $[\mathbf{h}], [\mathbf{r}], [\mathbf{t}] \in T^n$. Similar to TransE, it also learns embeddings following the relational translation in torus space, i.e., $[\mathbf{h}] + [\mathbf{r}] \approx [\mathbf{t}]$. Recently, DihEdral [182] proposed using the dihedral symmetry group, i.e., the symmetry group of a two-dimensional polygon, to represent relations.
3.2 Scoring Function
The scoring function is used to measure the plausibility of facts, and is also referred to as the energy function in the energy-based learning framework. Energy-based learning aims to learn an energy function $E_\theta(x)$, parameterized by $\theta$ and taking $x$ as input, and to make sure positive samples have higher scores than negative samples. In this paper, the term scoring function is adopted for unification. There are two typical types of scoring functions, i.e., distance-based (Fig. 3(a)) and similarity-based (Fig. 3(b)) functions, to measure the plausibility of a fact. The distance-based scoring function measures the plausibility of facts by calculating the distance between entities, where additive translation with relations, i.e., $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$, is widely used. Semantic-similarity-based scoring measures the plausibility of facts by semantic matching, which usually adopts a multiplicative formulation, i.e., $\mathbf{h}^\top \mathbf{M}_r \approx \mathbf{t}^\top$, to transform the head entity near the tail in the representation space.
3.2.1 Distancebased Scoring Function
An intuitive distance-based approach is to calculate the Euclidean distance between the relational projections of entities. Structural Embedding (SE) [11] uses two relation-specific projection matrices and the $L_1$ distance to learn structural embeddings as
$f_r(h, t) = -\|\mathbf{M}_{r,1}\mathbf{h} - \mathbf{M}_{r,2}\mathbf{t}\|_{L_1}$  (3)
A more intensively used principle is the translation-based scoring function, which aims to learn embeddings by representing relations as translations from head to tail entities. Bordes et al. [10] proposed TransE by assuming that the added embedding $\mathbf{h} + \mathbf{r}$ should be close to the embedding $\mathbf{t}$, with the scoring function defined under $L_1$ or $L_2$ constraints as
$f_r(h, t) = -\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_{L_1/L_2}$  (4)
Since then, many variants and extensions of TransE have been proposed. For example, TransH [164] projects entities onto a relation-specific hyperplane with normal vector $\mathbf{w}_r$ as
$f_r(h, t) = -\|(\mathbf{h} - \mathbf{w}_r^\top \mathbf{h}\, \mathbf{w}_r) + \mathbf{r} - (\mathbf{t} - \mathbf{w}_r^\top \mathbf{t}\, \mathbf{w}_r)\|_2^2$  (5)
TransR [88] introduces separate projection spaces for entities and relations as
$f_r(h, t) = -\|\mathbf{M}_r\mathbf{h} + \mathbf{r} - \mathbf{M}_r\mathbf{t}\|_2^2$  (6)
and TransD [68] constructs dynamic mapping matrices $\mathbf{M}_r^1 = \mathbf{r}_p\mathbf{h}_p^\top + \mathbf{I}$ and $\mathbf{M}_r^2 = \mathbf{r}_p\mathbf{t}_p^\top + \mathbf{I}$ by the projection vectors $\mathbf{h}_p$, $\mathbf{t}_p$ and $\mathbf{r}_p$, with the scoring function as
$f_r(h, t) = -\|\mathbf{M}_r^1\mathbf{h} + \mathbf{r} - \mathbf{M}_r^2\mathbf{t}\|_2^2$  (7)
By replacing the Euclidean distance, TransA [171] uses the Mahalanobis distance with a relation-specific weight matrix $\mathbf{M}_r$ to enable more adaptive metric learning, with the scoring function defined as
$f_r(h, t) = -\left(|\mathbf{h} + \mathbf{r} - \mathbf{t}|\right)^\top \mathbf{M}_r \left(|\mathbf{h} + \mathbf{r} - \mathbf{t}|\right)$  (8)
While previous methods use additive scoring functions, TransF [38] relaxes the strict translation by using the dot product, i.e., $f_r(h, t) = (\mathbf{h} + \mathbf{r})^\top \mathbf{t}$. To balance the constraints on head and tail, a flexible translation scoring function is further defined as
$f_r(h, t) = (\mathbf{h} + \mathbf{r})^\top \mathbf{t} + (\mathbf{t} - \mathbf{r})^\top \mathbf{h}$  (9)
Recently, ITransF [175] enables hidden concept discovery and statistical strength transfer by learning associations between relations and concepts via sparse attention vectors. TransAt [120] integrates a relation attention mechanism with translational embedding, and TransMS [187] transmits multi-directional semantics with non-linear functions and linear bias vectors, with the scoring function as
$f_r(h, t) = \|-\tanh(\mathbf{t} \circ \mathbf{r}) \circ \mathbf{h} + \mathbf{r} - \tanh(\mathbf{h} \circ \mathbf{r}) \circ \mathbf{t} + \alpha \cdot (\mathbf{h} \circ \mathbf{t})\|_{\ell_{1/2}}$  (10)
KG2E [64] in Gaussian space and ManifoldE [173] with manifold also use translational distance-based scoring functions. KG2E uses two scoring methods. The first is the asymmetric KL-divergence between the entity-transformation distribution $\mathcal{P}_e \sim \mathcal{N}(\boldsymbol{\mu}_h - \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_h + \boldsymbol{\Sigma}_t)$ and the relation distribution $\mathcal{P}_r \sim \mathcal{N}(\boldsymbol{\mu}_r, \boldsymbol{\Sigma}_r)$, i.e.,
$f_r(h, t) = -\mathcal{D}_{KL}(\mathcal{P}_e \,\|\, \mathcal{P}_r)$  (11)
and the second is the symmetric expected likelihood
$f_r(h, t) = \int_{x \in \mathbb{R}^d} \mathcal{P}_e(x)\, \mathcal{P}_r(x)\, dx$  (12)
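For diagonal covariances the KL-divergence in Eq. 11 has a simple closed form. The sketch below (illustrative means and variances, not KG2E's training code) scores a triple whose relation distribution matches the entity transformation exactly, so the KL-based score is 0, the best possible:

```python
import numpy as np

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ) in closed form."""
    return 0.5 * np.sum(np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

d = 4
mu_h, var_h = np.array([0.5, 0.1, -0.2, 0.3]), np.full(d, 0.05)
mu_t, var_t = np.array([0.7, 0.4, -0.1, 0.1]), np.full(d, 0.05)
mu_r, var_r = mu_h - mu_t, np.full(d, 0.10)  # relation matching h - t exactly

# Entity-transformation distribution: H - T ~ N(mu_h - mu_t, var_h + var_t)
mu_e, var_e = mu_h - mu_t, var_h + var_t
score = -kl_diag_gaussians(mu_e, var_e, mu_r, var_r)  # higher = more plausible
```

Shifting the relation mean away from the transformation mean makes the KL strictly positive and the score drop, which is exactly the ranking behavior the model trains for.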
The scoring function of ManifoldE is defined as
$f_r(h, t) = -\|\mathcal{M}(h, r, t) - D_r^2\|^2$  (13)
where $\mathcal{M}$ is the manifold function, and $D_r$ is a relation-specific manifold parameter.
3.2.2 Semantic Matching
Another direction is to calculate semantic similarity. SME [8] proposes to semantically match separate combinations of the entity-relation pairs $(h, r)$ and $(r, t)$. Its scoring function is defined with two versions of matching blocks, a linear and a bilinear block, i.e.,
$f_r(h, t) = g_{left}(\mathbf{h}, \mathbf{r})^\top g_{right}(\mathbf{t}, \mathbf{r})$  (14)
The linear matching block is defined as $g_{left}(\mathbf{h}, \mathbf{r}) = \mathbf{M}_1\mathbf{h} + \mathbf{M}_2\mathbf{r} + \mathbf{b}$, and the bilinear form is $g_{left}(\mathbf{h}, \mathbf{r}) = (\mathbf{M}_1\mathbf{h}) \circ (\mathbf{M}_2\mathbf{r}) + \mathbf{b}$. By restricting the relation matrix to be diagonal for multi-relational representation learning, DistMult [185] proposes a simplified bilinear formulation defined as
$f_r(h, t) = \mathbf{h}^\top \mathrm{diag}(\mathbf{r})\, \mathbf{t}$  (15)
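The diagonal restriction makes DistMult a simple element-wise product; a three-line NumPy sketch (random toy embeddings) also exposes its well-known limitation that swapping head and tail yields the same score, so only symmetric relations are modeled faithfully:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
h, t = rng.normal(size=d), rng.normal(size=d)
r = rng.normal(size=d)          # the diagonal of the relation matrix

# f_r(h, t) = h^T diag(r) t = sum_i h_i * r_i * t_i
score_ht = np.sum(h * r * t)
score_th = np.sum(t * r * h)    # identical: DistMult is inherently symmetric
```

This symmetry is the motivation for ComplEx, which recovers antisymmetry by moving to the complex space described in Section 3.1.2.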
To capture rich interactions in relational data and compute efficiently, HolE [113] introduces the circular correlation of embeddings, denoted $\mathbf{h} \star \mathbf{t}$, which can be interpreted as a compressed tensor product, to learn compositional representations. By semantically matching the circular correlation with the relation embedding, the scoring function of HolE is defined as
$f_r(h, t) = \mathbf{r}^\top (\mathbf{h} \star \mathbf{t})$  (16)
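Circular correlation is what makes HolE efficient: it compresses the $d \times d$ outer product into a $d$-dimensional vector and can be computed in $O(d \log d)$ via the FFT. A self-contained sketch (random toy embeddings) that also cross-checks the FFT shortcut against the naive definition:

```python
import numpy as np

def circular_correlation(a, b):
    """(a ⋆ b)_k = sum_i a_i * b_{(i + k) mod d}, computed via FFT."""
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real

rng = np.random.default_rng(3)
d = 8
h, t, r = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)

score = r @ circular_correlation(h, t)   # HolE: f_r(h, t) = r^T (h ⋆ t)

# Sanity check against the naive O(d^2) definition
naive = np.array([sum(h[i] * t[(i + k) % d] for i in range(d)) for k in range(d)])
```

Unlike circular convolution, circular correlation is not commutative, which lets HolE distinguish the head and tail roles.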
By defining a perturbed holographic compositional operator as $p(\mathbf{a}, \mathbf{b}; \mathbf{c}) = (\mathbf{c} \circ \mathbf{a}) \star \mathbf{b}$, where $\mathbf{c}$ is a fixed vector, the expanded holographic embedding model HolEx [184] interpolates between HolE and the full tensor product method. Given fixed vectors $\mathbf{c}_0, \ldots, \mathbf{c}_{l-1}$, the rank-$l$ semantic matching metric of HolEx is defined as
$f_r(h, t) = \sum_{j=0}^{l-1} p(\mathbf{h}, \mathbf{r}; \mathbf{c}_j)^\top \mathbf{t}$  (17)
It can be viewed as a linear concatenation of perturbed HolE. Focusing on multi-relational inference, ANALOGY [91] models the analogical structure of relational data. Its scoring function is defined as
$f_r(h, t) = \mathbf{h}^\top \mathbf{M}_r \mathbf{t}$  (18)
with the relation matrix constrained to be a normal matrix in linear mapping, i.e., $\mathbf{M}_r \mathbf{M}_r^\top = \mathbf{M}_r^\top \mathbf{M}_r$, for analogical inference. Crossover interactions are introduced by CrossE [200] with an interaction matrix $\mathbf{C}$ to simulate the bi-directional interaction between entity and relation. The relation-specific interaction is obtained by looking up the interaction matrix as $\mathbf{c}_r = \mathbf{x}_r^\top \mathbf{C}$. By combining the interactive representations and matching with the tail embedding, the scoring function is defined as
$f_r(h, t) = \sigma\left(\tanh(\mathbf{c}_r \circ \mathbf{h} + \mathbf{c}_r \circ \mathbf{h} \circ \mathbf{r} + \mathbf{b})\, \mathbf{t}^\top\right)$  (19)
The semantic matching principle can be encoded by neural networks further discussed in Sec. 3.3.
The aforementioned two methods in Sec. 3.1.4 with group representation also follow the semantic matching principle. The scoring function of TorusE [34] is defined as
$f_r(h, t) = \min_{(x, y) \in ([\mathbf{h}] + [\mathbf{r}]) \times [\mathbf{t}]} \|x - y\|_i$  (20)
By modeling relations as group elements, the scoring function of DihEdral [182] is defined as the summation over components:
$f_r(h, t) = \sum_{l=1}^{L} \mathbf{h}^{(l)\top} \mathbf{R}^{(l)} \mathbf{t}^{(l)}$  (21)
where the relation matrix $\mathbf{R}^{(l)}$ is defined in block-diagonal form in the dihedral group $\mathbb{D}_K$, and the entity components $\mathbf{h}^{(l)}$ and $\mathbf{t}^{(l)}$ are embedded in the real-valued space $\mathbb{R}^2$.
3.3 Encoding Models
This section introduces models that encode the interactions of entities and relations through specific model architectures, including linear/bilinear models, factorization models, and neural networks. Linear models formulate relations as a linear/bilinear mapping by projecting head entities into a representation space close to tail entities. Factorization aims to decompose relational data into lowrank matrices for representation learning. Neural networks encode relational data with nonlinear neural activation and more complex network structures. Several neural models are illustrated in Fig. 5.
3.3.1 Linear/Bilinear Models
Linear/bilinear models encode interactions of entities and relations by applying a linear operation as
$g_r(h, t) = \mathbf{M}_r^\top [\mathbf{h}; \mathbf{t}]$  (22)
or bilinear transformation operations as in Eq. 18. Canonical methods with linear/bilinear encoding include SE [11], SME [8], DistMult [185], ComplEx [151], and ANALOGY [91]. For TransE [10] with $L_2$ regularization, the scoring function can be expanded into a form with only linear transformations of one-dimensional vectors, i.e.,
$\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_2^2 = \|\mathbf{h}\|_2^2 + \|\mathbf{r}\|_2^2 + \|\mathbf{t}\|_2^2 + 2(\mathbf{h}^\top\mathbf{r} - \mathbf{h}^\top\mathbf{t} - \mathbf{r}^\top\mathbf{t})$  (23)
Wang et al. [162] studied various bilinear models and evaluated their expressiveness and connections by introducing the concepts of universality and consistency. The authors further showed through experiments that ensembles of multiple linear models can improve prediction performance. Recently, to solve the independence issue among entity vectors in canonical Polyadic (CP) decomposition, SimplE [76] introduces the inverse of relations and calculates the average CP score of $(h, r, t)$ and $(t, r^{-1}, h)$ as
$f_r(h, t) = \frac{1}{2}\left((\mathbf{h} \circ \mathbf{r})^\top \mathbf{t} + (\mathbf{t} \circ \mathbf{r}')^\top \mathbf{h}\right)$  (24)
where $\mathbf{r}'$ is the embedding of the inverse relation. More bilinear models are proposed from a factorization perspective, discussed in the next section.
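A minimal sketch of the SimplE average (random toy embeddings, not a trained model); averaging the forward and inverse CP scores ties the two roles of each entity together, and scoring the inverse triple with the roles of $\mathbf{r}$ and $\mathbf{r}'$ swapped gives the same value by construction:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
h, t = rng.normal(size=d), rng.normal(size=d)
r, r_inv = rng.normal(size=d), rng.normal(size=d)  # relation and its inverse

# Average of the two CP scores (Eq. 24)
score = 0.5 * (np.sum(h * r * t) + np.sum(t * r_inv * h))

# The inverse triple (t, r^{-1}, h), with r as its inverse, scores identically.
score_inverse_triple = 0.5 * (np.sum(t * r_inv * h) + np.sum(h * r * t))
```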
3.3.2 Factorization Models
Factorization methods formulate KRL models as three-way tensor decompositions. A general principle of tensor factorization is to approximate the knowledge graph tensor $\mathcal{X}$, with the composition function following the semantic matching pattern. Nickel et al. [114] proposed the three-way rank factorization model RESCAL over each relational slice of the knowledge graph tensor. For the $k$-th of $m$ relations, the $k$-th slice of $\mathcal{X}$ is factorized as
$\mathcal{X}_k \approx \mathbf{A} \mathbf{R}_k \mathbf{A}^\top$  (25)
where $\mathbf{A}$ stacks the entity embeddings and $\mathbf{R}_k$ models the pairwise interactions under relation $k$.
The authors further extended it to efficiently handle attributes of entities [115]. Jenatton et al. [67] then proposed the bilinear structured latent factor model (LFM), which extends RESCAL by decomposing the relation matrix into a combination of shared latent factors. By introducing three-way Tucker tensor decomposition, TuckER [4] learns embeddings by outputting a core tensor and embedding vectors of entities and relations. Its scoring function is defined as
$f_r(h, t) = \mathcal{W} \times_1 \mathbf{h} \times_2 \mathbf{r} \times_3 \mathbf{t}$  (26)
where $\mathcal{W}$ is the core tensor of the Tucker decomposition and $\times_n$ denotes the tensor product along the $n$-th mode.
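The three mode products in Eq. 26 collapse into a single contraction, which `numpy.einsum` expresses directly (random toy tensors, illustrative dimensions only); the loop-based check makes the contraction explicit:

```python
import numpy as np

rng = np.random.default_rng(5)
de, dr = 6, 4                       # entity and relation embedding dims
W = rng.normal(size=(de, dr, de))   # shared core tensor of the decomposition
h, t = rng.normal(size=de), rng.normal(size=de)
r = rng.normal(size=dr)

# f_r(h, t) = W x_1 h x_2 r x_3 t: contract each mode with one embedding
score = np.einsum('i,j,k,ijk->', h, r, t, W)
```

Because the core tensor $\mathcal{W}$ is shared across all relations, knowledge is transferred between relations rather than learned per-slice as in RESCAL.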
3.3.3 Neural Networks
Neural networks for encoding semantic matching have yielded remarkable predictive performance in recent studies. Encoding models with linear/bilinear blocks can also be modeled using neural networks, for example, SME [8]. Representative neural models include the multi-layer perceptron (MLP) [33], the neural tensor network (NTN) [139], and the neural association model (NAM) [92]. Generally, they feed entities and/or relations into deep neural networks and compute a semantic matching score. MLP [33] (Fig. 4(a)) encodes entities and relations together in a fully-connected layer, and uses a second layer with sigmoid activation to score a triple as
$f_r(h, t) = \sigma(\mathbf{w}^\top \sigma(\mathbf{W}[\mathbf{h}; \mathbf{r}; \mathbf{t}]))$  (27)
where $\mathbf{W}$ is the weight matrix and $[\mathbf{h}; \mathbf{r}; \mathbf{t}]$ is a concatenation of the three vectors. NTN [139] takes entity embeddings as input, associates them with a relational tensor $\widehat{\mathcal{M}}$, and outputs a predictive score as
$f_r(h, t) = \mathbf{r}^\top \tanh\left(\mathbf{h}^\top \widehat{\mathcal{M}} \mathbf{t} + \mathbf{M}_{r,1}\mathbf{h} + \mathbf{M}_{r,2}\mathbf{t} + \mathbf{b}_r\right)$  (28)
where $\mathbf{b}_r$ is the bias for relation $r$, and $\mathbf{M}_{r,1}$ and $\mathbf{M}_{r,2}$ are relation-specific weight matrices. It can be regarded as a combination of MLPs and bilinear models. NAM [92] associates the hidden encoding with the embedding of the tail entity, and proposes the relational-modulated neural network (RMNN).
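The MLP scoring of Eq. 27 is a two-layer network over the concatenated triple; a minimal sketch with random, untrained weights (all sizes illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(6)
d, hidden = 8, 16
h, r, t = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)

W1 = rng.normal(size=(hidden, 3 * d)) * 0.1  # first fully connected layer
w2 = rng.normal(size=hidden) * 0.1           # second (scoring) layer

x = np.concatenate([h, r, t])                # triple as one input vector
score = sigmoid(w2 @ np.tanh(W1 @ x))        # plausibility in (0, 1)
```

Training would fit W1 and w2 so that observed triples score near 1 and corrupted triples near 0; only the forward pass is shown here.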
3.3.4 Convolutional Neural Networks
CNNs are utilized for learning deep expressive features. ConvE [30] uses 2D convolution over embeddings and multiple layers of non-linear features to model the interactions between entities and relations, reshaping the head entity and relation into 2D matrices $\mathbf{M}_h, \mathbf{M}_r \in \mathbb{R}^{d_w \times d_h}$ for $d = d_w \times d_h$. Its scoring function is defined as
$f_r(h, t) = \sigma\left(\mathrm{vec}\left(\sigma([\mathbf{M}_h; \mathbf{M}_r] * \boldsymbol{\omega})\right) \mathbf{W}\right) \mathbf{t}$  (29)
where $\boldsymbol{\omega}$ denotes the convolutional filters and $\mathrm{vec}(\cdot)$ is the vectorization operation reshaping a tensor into a vector. ConvE can express semantic information by non-linear feature learning through multiple layers. ConvKB [110] adopts CNNs for encoding the concatenation of entities and relations without reshaping (Fig. 4(b)). Its scoring function is defined as
$f_r(h, t) = \mathrm{concat}\left(\sigma([\mathbf{h}, \mathbf{r}, \mathbf{t}] * \boldsymbol{\omega})\right) \cdot \mathbf{w}$  (30)
The concatenation of the set of feature maps generated by convolution increases the learning ability of latent features. Compared with ConvE, which captures local relationships, ConvKB keeps the transitional characteristic and shows better experimental performance. HypER [3] utilizes a hypernetwork to generate relation-specific 1D convolutional filters for multi-task knowledge sharing, and meanwhile simplifies 2D ConvE. It can also be interpreted as a tensor factorization model when taking the hypernetwork and weight matrices as tensors.
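The ConvKB operation can be sketched without a deep learning framework: stacking $\mathbf{h}$, $\mathbf{r}$, $\mathbf{t}$ as the columns of a $d \times 3$ matrix, each $1 \times 3$ filter reduces a row to one value, so a filter bank is a single matrix product (random toy embeddings and filters, ReLU assumed as the non-linearity):

```python
import numpy as np

rng = np.random.default_rng(7)
d, n_filters = 8, 3
h, r, t = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)

A = np.stack([h, r, t], axis=1)            # d x 3 input matrix, no reshaping
filters = rng.normal(size=(n_filters, 3))  # each 1x3 filter spans one row

# Each filter yields a length-d feature map; concatenate and score with w.
feature_maps = np.maximum(0.0, A @ filters.T)   # ReLU, shape (d, n_filters)
w = rng.normal(size=d * n_filters)
score = w @ feature_maps.T.ravel()
```

Because every filter sees the same dimension of $\mathbf{h}$, $\mathbf{r}$ and $\mathbf{t}$ together, the model can learn translational patterns along each embedding dimension.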
3.3.5 Recurrent Neural Networks
The aforementioned MLP- and CNN-based models learn triple-level representations. To capture long-term relational dependencies in knowledge graphs, recurrent networks can be utilized. Gardner et al. [46] and Neelakantan et al. [108] propose RNN-based models over relation paths to learn vector representations without and with entity information, respectively. RSN [50] (Fig. 4(d)) designs a recurrent skip mechanism to enhance semantic representation learning by distinguishing relations and entities. A relational path $(x_1, x_2, \ldots, x_T)$, with entities and relations in alternating order, is generated by random walk and is further used to calculate the recurrent hidden state $\mathbf{h}_t$. The skipping operation is conducted as
$\mathbf{h}_t' = \mathbf{h}_t$ if $x_t \in \mathcal{E}$, and $\mathbf{h}_t' = \mathbf{S}_1\mathbf{h}_t + \mathbf{S}_2\mathbf{x}_{t-1}$ if $x_t \in \mathcal{R}$  (31)
where $\mathbf{S}_1$ and $\mathbf{S}_2$ are weight matrices.
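The skip mechanism can be sketched with a plain (untrained) recurrent cell: at relation positions the output mixes the current state with the preceding entity input, while entity positions output the state unchanged. All weights and path vectors here are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(8)
d = 8
S1, S2 = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
Wh, Wx = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1

# A relational path alternates entities and relations: e1, r1, e2, r2, e3
path = [rng.normal(size=d) for _ in range(5)]
is_relation = [i % 2 == 1 for i in range(5)]  # odd positions are relations

hidden = np.zeros(d)
outputs = []
for i, x in enumerate(path):
    hidden = np.tanh(Wh @ hidden + Wx @ x)    # plain recurrent update
    if is_relation[i]:
        # Skip connection: mix current state with the preceding entity input
        out = S1 @ hidden + S2 @ path[i - 1]
    else:
        out = hidden
    outputs.append(out)
```

The skip lets the subject entity reach the relation step directly instead of only through the recurrent state, which is what "distinguishing relations and entities" refers to.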
3.3.6 Transformers
Transformer-based models have boosted contextualized text representation learning. To utilize contextual information within knowledge graphs, CoKE [157] employs transformers to encode edges and path sequences. Similarly, KG-BERT [188] borrows the idea from language model pre-training and takes Bidirectional Encoder Representations from Transformers (BERT) as an encoder for entities and relations.
3.3.7 Graph Neural Networks
GNNs are introduced for learning connectivity structure under an encoder-decoder framework. R-GCN [130] proposes relation-specific transformations to model the directed nature of knowledge graphs. Its forward propagation is defined as
$\mathbf{h}_i^{(l+1)} = \sigma\Big(\sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} \mathbf{W}_r^{(l)} \mathbf{h}_j^{(l)} + \mathbf{W}_0^{(l)} \mathbf{h}_i^{(l)}\Big)$  (32)
where $\mathbf{h}_i^{(l)}$ is the hidden state of the $i$-th entity in the $l$-th layer, $\mathcal{N}_i^r$ is the set of neighbors of the $i$-th entity under relation $r \in \mathcal{R}$, $\mathbf{W}_r^{(l)}$ and $\mathbf{W}_0^{(l)}$ are learnable parameter matrices, and $c_{i,r}$ is a normalization constant such as $c_{i,r} = |\mathcal{N}_i^r|$. Here, the GCN [77] acts as a graph encoder. To enable specific tasks, a task-specific decoder still needs to be developed and integrated into the R-GCN framework. R-GCN treats the neighborhood of each entity equally. SACN [132] introduces a weighted GCN (Fig. 4(c)), defining the strength of two adjacent nodes with the same relation type, to capture structural information in knowledge graphs by utilizing node structure, node attributes, and relation types. The decoder module, called ConvTransE, adopts the ConvE model as the semantic matching metric and preserves the translational property. By aligning the convolutional outputs of entity and relation embeddings, its scoring function is defined as
$f_r(h, t) = g\left(\mathrm{vec}\left(\mathbf{M}(\mathbf{h}, \mathbf{r})\right) \mathbf{W}\right) \mathbf{t}$  (33)
Nathani et al. [107] introduced graph attention networks with multi-head attention as an encoder to capture multi-hop neighborhood features by taking the concatenation of entity and relation embeddings as input.
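One R-GCN layer (Eq. 32) can be sketched as explicit message passing over a toy graph; all weights and entity states are random placeholders, and the normalization $c_{i,r} = |\mathcal{N}_i^r|$ is assumed:

```python
import numpy as np

rng = np.random.default_rng(9)
n, d, n_rel = 4, 6, 2
H = rng.normal(size=(n, d))                  # layer-l entity hidden states
W0 = rng.normal(size=(d, d)) * 0.1           # self-loop weight matrix
Wr = rng.normal(size=(n_rel, d, d)) * 0.1    # one weight matrix per relation

# Toy graph as (head, relation, tail) triples; messages flow head -> tail
triples = [(0, 0, 1), (1, 0, 2), (2, 1, 3), (0, 1, 3)]

H_next = np.zeros_like(H)
for i in range(n):
    msg = W0 @ H[i]                          # self-connection term
    for rel in range(n_rel):
        nbrs = [h for (h, r, t) in triples if t == i and r == rel]
        for j in nbrs:
            msg += (Wr[rel] @ H[j]) / len(nbrs)   # 1/c_{i,r} normalization
    H_next[i] = np.tanh(msg)                 # sigma
```

In practice the per-relation sums are batched as sparse matrix products, and a decoder such as DistMult or ConvTransE scores triples on top of the propagated states.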
3.4 Embedding with Auxiliary Information
To facilitate more effective knowledge representation, multi-modal embedding incorporates external information such as text descriptions, type constraints, relational paths, and visual information together with the knowledge graph itself.
3.4.1 Textual Description
Entities in knowledge graphs are accompanied by textual descriptions, denoted as $\mathcal{D}$, which provide supplementary semantic information. The challenge of KRL with textual descriptions is to embed both structured knowledge and unstructured textual information in the same space. Wang et al. [163] proposed two alignment models for aligning entity space and word space by introducing entity names and Wikipedia anchors. DKRL [176] extends TransE [10] to learn representations directly from entity descriptions by a convolutional encoder. SSP [172] models the strong correlations between triples and textual descriptions by projecting them into a semantic subspace. A joint loss function is widely applied when incorporating KGE with textual descriptions. Wang et al. [163] used a three-component loss consisting of a knowledge model, a text model, and an alignment model. SSP [172] uses a two-component objective function of an embedding-specific loss and a topic-specific loss within the textual description, traded off by a balancing parameter.
3.4.2 Type Information
Entities are represented with hierarchical classes or types, and consequently, relations with semantic types. SSE [51] incorporates semantic categories of entities to embed entities belonging to the same category smoothly in semantic space. TKRL [178] proposes a type encoder model for the projection matrices of entities to capture the type hierarchy. Noticing that some relations indicate attributes of entities, KR-EAR [89] categorizes relation types into attributes and relations, and models the correlations between entities and attributes. Zhang et al. [204] extended existing embedding methods with the hierarchical relation structure of relation clusters, relations, and sub-relations.
3.4.3 Visual Information
Visual information (e.g., entity images) can be utilized to enrich KRL. Image-embodied IKRL [177], containing cross-modal structure-based and image-based representations, encodes images into the entity space and follows the translation principle. The cross-modal representations ensure that structure-based and image-based representations live in the same representation space.
3.5 Summary
Knowledge representation learning is important in the research community of knowledge graphs. This section reviews four aspects of KRL, with several recent methods summarized in Table II and more in Appendix C. Overall, developing a novel KRL model is to answer the following four questions: 1) which representation space to choose; 2) how to measure the plausibility of triples in a specific space; 3) which encoding model to use for modeling relational interactions; 4) whether to utilize auxiliary information.
The most popularly used representation space is Euclidean point-based space, which embeds entities in a vector space and models interactions via vectors, matrices or tensors. Other representation spaces, including complex vector space, Gaussian distribution, and manifold space and group, are also studied. Manifold space has an advantage over pointwise Euclidean space by relaxing the pointwise embedding. Gaussian embeddings are able to express the uncertainties of entities and relations, as well as multiple relation semantics. Embedding in complex vector space can model different relational connectivity patterns effectively, especially the symmetry/antisymmetry pattern. The representation space plays an important role in encoding the semantic information of entities and capturing the relational properties. When developing a representation learning model, the representation space should be selected and designed carefully to match the nature of the encoding method and to balance expressiveness and computational complexity. Scoring functions with a distance-based metric utilize the translation principle, while semantic matching scoring functions employ compositional operators. Encoding models, especially neural networks, play a critical role in modeling the interactions of entities and relations. Bilinear models have also drawn much attention, and some tensor factorization methods can be regarded as members of this family. Other methods incorporate auxiliary information such as textual descriptions, relation/entity types, and entity images.
4 Knowledge Acquisition
Knowledge acquisition aims to construct knowledge graphs from unstructured text, complete an existing knowledge graph, and discover and recognize entities and relations. Well-constructed and large-scale knowledge graphs can be useful for many downstream applications and empower knowledge-aware models with the ability of commonsense reasoning, thereby paving the way for AI. The main tasks of knowledge acquisition include relation extraction, KGC, and other entity-oriented acquisition tasks such as entity recognition and entity alignment. Most methods formulate KGC and relation extraction separately. These two tasks, however, can also be integrated into a unified framework. Han et al. [57] proposed a joint learning framework with mutual attention for data fusion between knowledge graphs and text, which solves KGC and relation extraction from text jointly. There are also other tasks related to knowledge acquisition, such as triple classification and relation classification. In this section, three categories of knowledge acquisition techniques, namely KGC, entity discovery and relation extraction, are reviewed thoroughly.
4.1 Knowledge Graph Completion
Because of the inherent incompleteness of knowledge graphs, KGC was developed to add new triples to a knowledge graph. Typical subtasks include link prediction, entity prediction and relation prediction. A task-oriented definition is given as Def. 3.
Definition 3.
Given an incomplete knowledge graph G = (E, R, F), KGC aims to infer missing triples T = {(h, r, t)}.
Preliminary research on KGC focused on learning low-dimensional embeddings for triple prediction. In this survey, we term those methods embedding-based methods. Most of them, however, fail to capture multi-step relationships. Thus, recent work turns to exploring multi-step relation paths and incorporating logical rules, termed relation path inference and rule-based reasoning, respectively. Triple classification, an associated task of KGC that evaluates the correctness of a factual triple, is additionally reviewed in this section.
4.1.1 Embedding-based Models
Taking entity prediction as an example, embedding-based ranking methods, as shown in Fig. 5(a), first learn embedding vectors based on existing triples, and then replace the tail entity (or head entity) with each candidate entity to calculate the scores of all candidates and rank the top ones. The aforementioned KRL methods (e.g., TransE [10], TransH [164], TransR [88], HolE [113], and R-GCN [130]) and joint learning methods like DKRL [176] with textual information can be used for KGC.
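This ranking procedure can be sketched in a few lines. The embeddings below are random toy vectors rather than trained ones; a real system would first learn them from existing triples, and the TransE-style score is just one choice of scoring function.

```python
import numpy as np

# Toy illustration: rank candidate tail entities for a query (h, r, ?) with a
# TransE-style score f(h, r, t) = -||h + r - t||. Embeddings are random here.
rng = np.random.default_rng(0)
num_entities, dim = 5, 8
E = rng.normal(size=(num_entities, dim))   # entity embedding matrix
r = rng.normal(size=dim)                   # one relation embedding

def rank_tails(head_id: int) -> list:
    """Return all entity ids sorted from most to least plausible tail."""
    scores = -np.linalg.norm(E[head_id] + r - E, axis=1)
    return list(np.argsort(-scores))

ranking = rank_tails(0)
```

Evaluation metrics such as Hits@k and mean rank are then computed from the position of the true tail entity in this ranking.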
Unlike representing inputs and candidates in a unified embedding space, ProjE [136] proposes a combined embedding by space projection of the known parts of input triples, i.e., (h, r, ?) or (?, r, t), and the candidate entities with a candidate-entity matrix whose rows are the candidates. The embedding projection function, consisting of a neural combination layer and an output projection layer, maps the combined input entity-relation pair onto scores over the candidate entities. Previous embedding methods do not differentiate entity and relation prediction, and ProjE does not support relation prediction. Based on these observations, SENN [48] distinguishes the three KGC subtasks explicitly by introducing a unified neural shared embedding with an adaptively weighted general loss function to learn different latent features. Existing methods rely heavily on existing connections in knowledge graphs and fail to capture the evolution of factual knowledge or entities with few connections. ConMask [137] proposes relationship-dependent content masking over the entity description to select relevant snippets for given relations, and CNN-based target fusion to complete the knowledge graph with unseen entities. It can only make predictions when query relations and entities are explicitly expressed in the text description. Previous methods are discriminative models that rely on pre-prepared entity pairs or text corpora. Focusing on the medical domain, REMEDY [194] proposes a generative model, called conditional relationship variational autoencoder, for entity pair discovery from latent space.
4.1.2 Relation Path Reasoning
Embedding learning of entities and relations has gained remarkable performance on some benchmarks, but it fails to model complex relation paths. Relation path reasoning instead leverages path information over the graph structure. Random walk inference has been widely investigated; for example, the Path-Ranking Algorithm (PRA) [80] chooses a relational path under a combination of path constraints and conducts maximum-likelihood classification. To improve path search, Gardner et al. [46] introduced vector space similarity heuristics in random walk by incorporating textual content, which also relieves the feature sparsity issue in PRA. Neural multi-hop relational path modeling is also studied. Neelakantan et al. [108] developed an RNN model to compose the implications of relational paths by applying compositionality recursively (Fig. 5(b)). Chain-of-Reasoning [28], a neural attention mechanism enabling multiple reasoning paths, represents logical composition across all relations, entities and text. Recently, DIVA [20] proposed a unified variational inference framework that takes multi-hop reasoning as two sub-steps of path-finding (a prior distribution for underlying path inference) and path-reasoning (a likelihood for link classification).
4.1.3 RL-based Path Finding
Deep reinforcement learning (RL) is introduced for multi-hop reasoning by formulating path-finding between entity pairs as sequential decision making, specifically a Markov decision process (MDP). The policy-based RL agent learns to pick a relation step to extend the reasoning path via interaction with the knowledge graph environment, where the policy gradient is utilized for training.
DeepPath [180] is the first to apply RL to relational path learning and develops a novel reward function to improve accuracy, path diversity, and path efficiency. It encodes states in continuous space via a translational embedding method and takes the relation space as its action space. Similarly, MINERVA [27] treats path walking to the correct answer entity as a sequential optimization problem by maximizing the expected reward. It excludes the target answer entity and provides more capable inference. Instead of a binary reward function, Multi-Hop [86] proposes a soft reward mechanism. To enable more effective path exploration, action dropout is also adopted to mask some outgoing edges during training. M-Walk [135] applies an RNN controller to capture the historical trajectory and uses Monte Carlo Tree Search (MCTS) for effective path generation. By leveraging a text corpus with the sentence bag of the current entity, CPL [40] proposes collaborative policy learning for path finding and fact extraction from text.
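The MDP view underlying these methods can be made concrete with a toy rollout. Everything here is hypothetical: a tiny hand-written graph stands in for the knowledge graph environment, and a uniform random policy stands in for the learned policy network; only the structure (state = current entity, action = outgoing edge, binary terminal reward) follows the formulation above.

```python
import random

# Hypothetical sketch of path finding as an MDP: the state is the current
# entity, an action follows one outgoing (relation, entity) edge, and a
# binary terminal reward marks whether the target entity was reached.
graph = {  # toy KG: entity -> list of (relation, next_entity)
    "A": [("r1", "B"), ("r2", "C")],
    "B": [("r3", "D")],
    "C": [("r3", "D")],
    "D": [],
}

def rollout(start: str, target: str, max_steps: int = 3, seed: int = 0) -> int:
    """Random-policy rollout; returns the binary terminal reward."""
    random.seed(seed)
    current = start
    for _ in range(max_steps):
        if current == target:
            return 1
        actions = graph[current]
        if not actions:
            break
        _, current = random.choice(actions)
    return 1 if current == target else 0

reward = rollout("A", "D")
```

A policy-gradient method such as REINFORCE would use many such rollouts to increase the probability of actions that led to reward 1.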
With the source, query and current entities denoted as e_s, e_q and e_t, and the query relation denoted as r_q, the MDP environment and policy networks of these methods are summarized in Table III, where MINERVA, M-Walk and CPL use a binary reward. For the policy networks, DeepPath uses a fully-connected network, the extractor of CPL employs a CNN, while the rest use recurrent networks.
| Method | State | Action | Reward | Policy Network |
|---|---|---|---|---|
| DeepPath [180] | — | — | global accuracy; path efficiency; path diversity | fully-connected network (FCN) |
| MINERVA [27] | — | — | binary | recurrent network |
| Multi-Hop [86] | — | — | soft reward | recurrent network |
| M-Walk [135] | — | — | binary | GRU-RNN + FCN |
| CPL [40] Reasoner | — | — | binary | recurrent network |
| CPL [40] Extractor | — | — | stepwise, delayed from reasoner | PCNN-ATT |
4.1.4 Rule-based Reasoning
To better make use of the symbolic nature of knowledge, another research direction of KGC is logical rule learning. A rule is defined by the head and body in the form of head ← body. The head is an atom, i.e., a fact with variable subjects and/or objects, while the body can be a set of atoms. For example, given the relations sonOf, hasChild and gender, and the entities X and Y, a rule can be written in the reverse form of logic programming as:

(X, sonOf, Y) ← (Y, hasChild, X) ∧ (X, gender, Male).   (34)
Logical rules can be extracted by rule mining tools like AMIE [41]. The recent RLvLR [116] proposes a scalable rule mining approach with efficient rule searching and pruning, and uses the extracted rules for link prediction.
More research attention focuses on injecting logical rules into embeddings to improve reasoning, with joint learning or iterative training applied to incorporate first-order logic rules. For example, KALE [52] proposes a unified joint model with t-norm fuzzy logical connectives defined for compatible triple and logical rule embedding. Specifically, three compositions (logical conjunction, disjunction and negation) are defined to compose the truth value of a complex formula. Fig. 6(a) illustrates a simple first-order Horn clause inference. RUGE [53] proposes an iterative model, where soft rules are utilized for soft label prediction from unlabeled triples, and labeled triples are used for embedding rectification. IterE [199] proposes an iterative training strategy with three components: embedding learning, axiom induction and axiom injection.
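The t-norm idea can be illustrated with a toy composition in the spirit of KALE. The truth values below are made up, and the product conjunction plus the implication formula are one common fuzzy-logic choice, not necessarily the exact connectives of any cited model.

```python
# Hedged sketch: fuzzy truth composition of a grounded Horn rule. The atom
# truth values (0.9, 0.8, 0.95) are hypothetical; in a joint model they would
# come from the scoring function of the embedding model.
def t_and(a: float, b: float) -> float:
    """Logical conjunction via the product t-norm."""
    return a * b

def t_implies(body: float, head: float) -> float:
    """One common fuzzy semantics for (body => head)."""
    return body * head - body + 1.0

# Ground rule: (Y, hasChild, X) AND (X, gender, Male) => (X, sonOf, Y)
body_truth = t_and(0.9, 0.8)               # truth values of the two body atoms
rule_truth = t_implies(body_truth, 0.95)   # overall truth of the grounded rule
```

A joint model then maximizes the truth values of observed triples and grounded rules together, so the rules regularize the learned embeddings.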
The combination of neural and symbolic models has also attracted increasing attention for rule-based reasoning in an end-to-end manner. Neural Theorem Provers (NTP) [129] learn logical rules for multi-hop reasoning, utilizing a radial basis function kernel for differentiable computation in vector space. NeuralLP [186] enables gradient-based optimization to be applied to inductive logic programming, where a neural controller system is proposed by integrating an attention mechanism and auxiliary memory. pLogicNet [124] proposes probabilistic logic neural networks (Fig. 6(b)) to leverage first-order logic and learn effective embeddings by combining the advantages of Markov logic networks and KRL methods, while handling the uncertainty of logic rules. ExpressGNN [202] generalizes pLogicNet by tuning graph networks and embeddings, and achieves more efficient logical reasoning.
4.1.5 Meta Relational Learning
The long-tail phenomenon exists in the relations of knowledge graphs. Meanwhile, the real-world scenario of knowledge is dynamic, where unseen triples are usually acquired. This new setting, called meta relational learning or few-shot relational learning, requires models to predict new relational facts with only a few samples.
Targeting these two observations, GMatching [181] develops a metric-based few-shot learning method with entity embeddings and local graph structures. It encodes one-hop neighbors to capture the structural information with R-GCN, and then takes the structural entity embedding for multi-step matching guided by long short-term memory (LSTM) networks to calculate the similarity scores. Meta-KGR [97], an optimization-based meta learning approach, adopts model-agnostic meta learning for fast adaptation and reinforcement learning for entity searching and path reasoning. Inspired by model-based and optimization-based meta learning, MetaR [18] transfers relation-specific meta information from a support set to a query set, and achieves fast adaptation via the loss gradient of high-order relational representation.
4.1.6 Triple Classification
Triple classification determines whether facts in testing data are correct, and is typically regarded as a binary classification problem. The decision rule is based on the scoring function with a specific threshold. The aforementioned embedding methods can be applied for triple classification, including translational distance-based methods like TransH [164] and TransR [88] and semantic matching-based methods such as NTN [139], HolE [113] and ANALOGY [91].
Vanilla vector-based embedding methods fail to deal with 1-to-N relations. Recently, Dong et al. [32] extended the embedding space into region-based n-dimensional balls where the tail region is contained in the head region for a 1-to-N relation, using fine-grained type chains, i.e., tree-structured conceptual clusterings. This relaxation of embeddings to balls turns triple classification into a geometric containment problem and improves the performance for entities with long type chains. However, it relies on the type chains of entities and suffers from a scalability problem.
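The threshold-based decision rule described above can be sketched directly. The vectors and the threshold here are toy values; in practice the threshold is tuned per relation on validation data, and any of the scoring functions above could replace the TransE-style one.

```python
import numpy as np

# Minimal sketch of triple classification: a triple is predicted true if its
# score clears a (relation-specific) threshold tuned on validation data.
def transe_score(h, r, t):
    return -np.linalg.norm(h + r - t)

def classify(h, r, t, threshold):
    return bool(transe_score(h, r, t) >= threshold)

h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
t = np.array([1.0, 1.0])
label = classify(h, r, t, threshold=-0.5)   # h + r == t, so the score is 0
```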
4.2 Entity Discovery
This section distinguishes entity-based knowledge acquisition into several fine-grained tasks, i.e., entity recognition, entity disambiguation, entity typing, and entity alignment. We term them entity discovery as they all explore entity-related knowledge under different settings.
4.2.1 Entity Recognition
Entity recognition, or named entity recognition (NER) when it focuses on specifically named entities, is a task that tags entities in text. Hand-crafted features such as capitalization patterns and language-specific resources like gazetteers are applied in much of the literature. Recent work applies sequence-to-sequence neural architectures, for example, LSTM-CNN [23] for learning character-level and word-level features and encoding partial lexicon matches. Lample et al. [79] proposed stacked neural architectures by stacking LSTM layers and CRF layers, i.e., LSTM-CRF (Fig. 7(a)) and Stack-LSTM. Recently, MGNER [169] proposed an integrated framework with entity position detection at various granularities and attention-based entity classification for both nested and non-overlapping named entities.
4.2.2 Entity Typing
Entity typing includes coarse- and fine-grained types, where the latter uses a tree-structured type category and is typically regarded as multi-class, multi-label classification. To reduce label noise, PLE [126] focuses on correct type identification and proposes a partial-label embedding model with a heterogeneous graph for the representation of entity mentions, text features, and entity types and their relationships. To tackle the growing type sets and noisy labels, Ma et al. [98] proposed prototype-driven label embedding with hierarchical information for zero-shot fine-grained named entity typing.
4.2.3 Entity Disambiguation
Entity disambiguation, or entity linking, is a unified task which links entity mentions to the corresponding entities in a knowledge graph. For example, in the sentence “Einstein won the Nobel Prize in Physics in 1921”, the entity mention “Einstein” should be linked to the entity Albert Einstein. Trendy end-to-end learning approaches have made efforts through representation learning of entities and mentions, for example, DSRM [65] for modeling entity semantic relatedness and EDKate [37] for the joint embedding of entity and text. Ganea and Hofmann [42] proposed an attentive neural model over local context windows for entity embedding learning and differentiable message passing for inferring ambiguous entities. By regarding relations between entities as latent variables, Le and Titov [81] developed an end-to-end neural architecture with relation-wise and mention-wise normalization.
4.2.4 Entity Alignment
The aforementioned tasks involve entity discovery from text or a single knowledge graph, while entity alignment (EA) aims to fuse knowledge among heterogeneous knowledge graphs. Given E1 and E2 as the entity sets of two different knowledge graphs, EA is to find an alignment set A = {(e1, e2) ∈ E1 × E2 | e1 ≡ e2}, where entity e1 and entity e2 hold an equivalence relation ≡. In practice, a small set of alignment seeds (i.e., synonymous entities appearing in different knowledge graphs) is given to start the alignment process, as shown in the left box of Fig. 7(b).
Embedding-based alignment calculates the similarity between the embeddings of a pair of entities. IPTransE [207] maps entities into a unified representation space under a joint embedding framework (Fig. 7(b)) through aligned translation, linear transformation, and parameter sharing. To solve error accumulation in iterative alignment, BootEA [145] proposes a bootstrapping approach in an incremental training manner, together with an editing technique for checking newly-labeled alignments.
Additional information about entities is also incorporated for refinement; for example, JAPE [144] captures the correlation between cross-lingual attributes, KDCoE [19] embeds multilingual entity descriptions via co-training, MultiKE [197] learns multiple views of entity names, relations and attributes, and alignment with character attribute embedding is explored in [152].
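Once both graphs are embedded in a unified space, the alignment step itself reduces to nearest-neighbor matching. The sketch below uses hypothetical entity names and hand-picked toy vectors; real systems learn these embeddings jointly with alignment seeds.

```python
import numpy as np

# Toy sketch of embedding-based alignment: each entity in KG1 is matched to
# its most similar entity in KG2 by cosine similarity in the unified space.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

kg1 = {"Berlin@en": np.array([1.0, 0.1])}
kg2 = {"Berlin@de": np.array([0.9, 0.1]),
       "Paris@fr": np.array([0.1, 1.0])}

def align(name, kg_other):
    """Return the entity in kg_other closest to kg1[name]."""
    e = kg1[name]
    return max(kg_other, key=lambda k: cosine(e, kg_other[k]))

match = align("Berlin@en", kg2)
```

Iterative (bootstrapping) methods repeat this matching, add confident pairs to the seed set, and retrain, which is where the error-accumulation issue noted above arises.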
4.3 Relation Extraction
Relation extraction is a key task for building large-scale knowledge graphs automatically by extracting unknown relational facts from plain text and adding them into knowledge graphs. Due to the lack of labeled relational data, distant supervision [25], also referred to as weak supervision or self-supervision, uses heuristic matching to create training data by assuming that sentences containing the same entity mentions may express the same relation under the supervision of a relational database. Mintz et al. [103] adopted distant supervision for relation classification with textual features including lexical and syntactic features, named entity tags, and conjunctive features. Traditional methods rely highly on feature engineering [103], with a recent approach exploring the inner correlation between features [123]. Deep neural networks are changing the representation learning of knowledge graphs and texts. This section reviews recent advances in neural relation extraction (NRE) methods, with an overview illustrated in Fig. 9.
4.3.1 Neural Relation Extraction
Trendy neural networks are widely applied to NRE. CNNs with position features of relative distances to entities [191] were first explored for relation classification, and then extended to relation extraction by multi-window CNN [111] with multiple sized convolutional filters. Multi-instance learning takes a bag of sentences as input to predict the relation of an entity pair. PCNN [190] applies piecewise max pooling over the segments of the convolutional representation divided by the entity positions. Compared with vanilla CNN [191], PCNN can more efficiently capture the structural information within an entity pair. MIML-CNN [74] further extends it to multi-label learning with cross-sentence max pooling for feature selection. Side information such as class ties [189] and relation paths [192] is also utilized. RNNs are introduced as well; for example, SDP-LSTM [183] adopts a multichannel LSTM utilizing the shortest dependency path between the entity pair, and Miwa et al. [104] stacked sequential and tree-structure LSTMs based on the dependency tree. BRCNN [13] combines an RNN for capturing sequential dependency with a CNN for representing local semantics, using a two-channel bidirectional LSTM and CNN.
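PCNN's piecewise max pooling can be shown with a toy one-dimensional feature map. The numbers and entity positions below are illustrative; a real model pools each convolutional filter's output over the three segments separated by the two entity mentions.

```python
import numpy as np

# Sketch of piecewise max pooling (PCNN): the feature map of a sentence is
# split into three segments by the two entity positions, and each segment is
# max-pooled separately, preserving coarse positional structure.
def piecewise_max_pool(feature_map, e1_pos, e2_pos):
    segments = (feature_map[:e1_pos + 1],
                feature_map[e1_pos + 1:e2_pos + 1],
                feature_map[e2_pos + 1:])
    return [float(seg.max()) for seg in segments if seg.size]

fm = np.array([0.2, 0.9, 0.1, 0.4, 0.7, 0.3])   # one filter's outputs
pooled = piecewise_max_pool(fm, e1_pos=1, e2_pos=4)
```

Plain max pooling would collapse this map to a single value (0.9), discarding where along the sentence the strong activation occurred.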
4.3.2 Attention Mechanism
Many variants of attention mechanisms are combined with CNNs, for example, word-level attention to capture the semantic information of words [134] and selective attention over multiple instances to alleviate the impact of noisy instances [90]. Other side information is also introduced for enriching semantic representation. APCNN [70] introduces entity descriptions via PCNN and sentence-level attention, while HATT [58] proposes hierarchical selective attention to capture the relation hierarchy by concatenating the attentive representations of each hierarchical layer. Rather than CNN-based sentence encoders, Att-BLSTM [206] proposes word-level attention with a BiLSTM.
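Selective attention over a bag of sentences can be sketched as follows. The sentence representations and relation query vector are toy values, and the dot-product compatibility is a simplification of the bilinear forms used in practice.

```python
import numpy as np

# Sketch of selective attention over instances (in the spirit of Lin et al.):
# each sentence representation is weighted by its compatibility with the
# query relation, and the bag representation is the weighted sum.
def selective_attention(sentence_reps, relation_query):
    logits = sentence_reps @ relation_query
    weights = np.exp(logits - logits.max())   # stable softmax
    weights /= weights.sum()
    return weights @ sentence_reps, weights

S = np.array([[1.0, 0.0],    # sentence that matches the relation well
              [0.0, 1.0],    # likely noisy instance
              [0.9, 0.1]])
r = np.array([1.0, 0.0])
bag_rep, attn = selective_attention(S, r)
```

The noisy second sentence receives the smallest weight, so it contributes least to the bag representation used for relation classification.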
4.3.3 Graph Convolutional Networks
GCNs are utilized for encoding the dependency tree over sentences or learning KGEs to leverage relational knowledge for sentence encoding. C-GCN [201] is a contextualized GCN model applied over the pruned dependency tree of a sentence after path-centric pruning. AGGCN [54] also applies a GCN over the dependency tree, but utilizes multi-head attention for edge selection in a soft weighting manner. Unlike the previous two GCN-based models, Zhang et al. [196] applied a GCN for relation embedding in the knowledge graph for sentence-based relation extraction, and further proposed a coarse-to-fine knowledge-aware attention mechanism for the selection of informative instances.
4.3.4 Adversarial Training
Adversarial training (AT) is applied to add adversarial noise to word embeddings for CNN- and RNN-based relation extraction under the MIML learning setting [168]. DSGAN [121] denoises distantly supervised relation extraction by learning a generator of sentence-level true positive samples and a discriminator that minimizes the probability of the generator's samples being true positives.
4.3.5 Reinforcement Learning
RL has been integrated into neural relation extraction recently by training an instance selector with a policy network. Qin et al. [122] proposed to train a policy-based RL agent as a sentential relation classifier to redistribute false positive instances into negative samples and mitigate the effect of noisy data. The authors took F1 score as the evaluation metric and used F1-based performance change as the reward for policy networks. Similarly, Zeng et al. [193] and Feng et al. [39] proposed different reward strategies. The advantage of RL-based NRE is that the relation extractor is model-agnostic; thus, it can be easily adapted to any neural architecture for effective relation extraction. Recently, HRL [147] proposed a hierarchical policy learning framework of high-level relation detection and low-level entity extraction.
4.3.6 Other Advances
Other advances in deep learning are also applied to neural relation extraction. Noticing that current NRE methods do not use very deep networks, Huang and Wang [66] applied deep residual learning to noisy relation extraction and found that 9-layer CNNs improved performance. Liu et al. [93] proposed to initialize the neural model by transfer learning from entity classification. The cooperative CORD [83] ensembles text corpus and knowledge graph with external logical rules by bidirectional knowledge distillation and adaptive imitation. TK-MF [71] enriches sentence representation learning by matching sentences and topic words. The existence of low-frequency relations in knowledge graphs requires few-shot relation classification with unseen classes or only a few instances. Gao et al. [43] proposed hybrid attention-based prototypical networks to compute prototypical relation embeddings and compare their distance to the query embedding.
4.4 Summary
This section reviewed knowledge graph completion for incomplete knowledge graphs and knowledge acquisition from plain text.
Knowledge graph completion completes missing links between existing entities or infers entities given entity and relation queries. Embedding-based KGC methods generally rely on triple representation learning to capture semantics and perform candidate ranking for completion. Embedding-based reasoning remains at the level of individual relations and is poor at complex reasoning because it ignores the symbolic nature of knowledge graphs and lacks interpretability. Hybrid methods combining symbolic reasoning and embedding incorporate rule-based reasoning, overcome the sparsity of knowledge graphs to improve the quality of embedding, facilitate efficient rule injection, and induce interpretable rules. Given the graphical nature of knowledge graphs, path search and neural path representation learning are studied, but they suffer from connectivity deficiency when traversing large-scale graphs. The emerging direction of meta relational learning aims at fast adaptation to unseen relations in low-resource settings.
Entity discovery acquires entity-oriented knowledge from text and fuses knowledge between knowledge graphs. There are several categories according to specific settings. Entity recognition is explored in a sequence-to-sequence manner, entity typing discusses noisy type labels and zero-shot typing, and entity disambiguation and alignment learn unified embeddings, with iterative alignment models proposed to tackle the issue of a limited number of alignment seeds. However, they may face error accumulation problems if newly-aligned entities suffer from poor performance. Language-specific knowledge has increased in recent years, which consequently motivates research on cross-lingual knowledge alignment.
Relation extraction suffers from noisy patterns under the assumption of distant supervision, especially in text corpora of different domains. Thus, it is important for weakly supervised relation extraction to mitigate the impact of noisy labeling, for example, multi-instance learning taking bags of sentences as inputs, attention mechanisms [90] for soft selection over instances to reduce noisy patterns, and RL-based methods formulating instance selection as a hard decision. Another principle is to learn representations that are as rich as possible. As deep neural networks can solve the error propagation of traditional feature extraction methods, this field is dominated by DNN-based models, as summarized in Table IV.

| Category | Method | Mechanism | Auxiliary Information |
|---|---|---|---|
| CNNs | O-CNN [191] | CNN + max pooling | position embedding |
| | Multi-CNN [111] | Multi-window convolution + max pooling | position embedding |
| | PCNN [190] | CNN + piecewise max pooling | position embedding |
| | MIML-CNN [74] | CNN + piecewise and cross-sentence max pooling | position embedding |
| | Ye et al. [189] | CNN/PCNN + pairwise ranking | position embedding, class ties |
| | Zeng et al. [192] | CNN + max pooling | position embedding, relation path |
| RNNs | SDP-LSTM [183] | Multichannel LSTM + dropout | dependency tree, POS, GR, hypernyms |
| | LSTM-RNN [104] | BiLSTM + BiTreeLSTM | POS, dependency tree |
| | BRCNN [13] | Two-channel LSTM + CNN + max pooling | dependency tree, POS, NER |
| Attention | Attention-CNN [134] | CNN + word-level attention + max pooling | POS, position embedding |
| | Lin et al. [90] | CNN/PCNN + selective attention + max pooling | position embedding |
| | Att-BLSTM [206] | BiLSTM + word-level attention | position indicator |
| | APCNN [70] | PCNN + sentence-level attention | entity descriptions |
| | HATT [58] | CNN/PCNN + hierarchical attention | position embedding, relation hierarchy |
| GCNs | C-GCN [201] | LSTM + GCN + path-centric pruning | dependency tree |
| | KATT [196] | Pre-training + GCN + CNN + attention | position embedding, relation hierarchy |
| | AGGCN [54] | GCN + multi-head attention + dense layers | dependency tree |
| Adversarial | Wu et al. [168] | AT + PCNN/RNN + selective attention | indicator encoding |
| | DSGAN [121] | GAN + PCNN/CNN + attention | position embedding |
| RL | Qin et al. [122] | Policy gradient + CNN + performance-change reward | position embedding |
| | Zeng et al. [193] | Policy gradient + CNN + (+1/−1) bag-result reward | position embedding |
| | Feng et al. [39] | Policy gradient + CNN + predictive-probability reward | position embedding |
| | HRL [147] | Hierarchical policy learning + BiLSTM + MLP | relation indicator |
| Others | ResCNN-x [66] | Residual convolution block + max pooling | position embedding |
| | Liu et al. [93] | Transfer learning + subtree parse + attention | position embedding |
| | CORD [83] | BiGRU + hierarchical attention + cooperative module | position embedding, logic rules |
| | TK-MF [71] | Topic modeling + multi-head self-attention | position embedding, topic words |
| | HATT-Proto [43] | Prototypical networks + CNN + hybrid attention | position embedding |
5 Temporal Knowledge Graph
Current knowledge graph research mostly focuses on static knowledge graphs, where facts do not change with time, while the temporal dynamics of a knowledge graph are less explored. However, temporal information is of great importance because structured knowledge only holds within a specific period, and the evolution of facts follows a time sequence. Recent research has begun to take temporal information into KRL and KGC, termed temporal knowledge graphs in contrast to the previous static knowledge graphs. Research efforts have been made to learn temporal and relational embeddings simultaneously.
5.1 Temporal Information Embedding
Temporal information is considered in temporally aware embedding by extending a triple into a temporal quadruple (h, r, t, τ), where the time stamp τ provides additional information about when the fact held. Leblay and Chekol [82] investigated temporal scope prediction over time-annotated triples and simply extended existing embedding methods, for example, TransE with the vector-based TTransE defined as

f(h, r, t, τ) = ‖h + r + τ − t‖.   (35)
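A small numeric check of this time-extended translation score (toy vectors chosen so that the translation is exact):

```python
import numpy as np

# Sketch of the TTransE idea: time is embedded as one more translation vector,
# so the quadruple (h, r, t, tau) scores like TransE with an extra term.
def ttranse_score(h, r, t, tau):
    return -np.linalg.norm(h + r + tau - t)

h = np.array([0.0, 1.0])
r = np.array([1.0, 0.0])
tau = np.array([0.0, 0.5])
t = np.array([1.0, 1.5])
score = ttranse_score(h, r, t, tau)   # h + r + tau == t, so the score is 0
```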
A temporally scoped quadruple extends a triple by adding a time scope [τs, τe], where τs and τe stand for the beginning and ending of the valid period of a triple; a static subgraph can then be derived from the dynamic knowledge graph given a specific time stamp τ. HyTE [29] regards each time stamp as a hyperplane with normal vector w_τ and projects entity and relation representations onto it as P_τ(h) = h − (w_τ⊤h)w_τ, P_τ(t) = t − (w_τ⊤t)w_τ, and P_τ(r) = r − (w_τ⊤r)w_τ. The temporally projected scoring function is calculated as

f_τ(h, r, t) = ‖P_τ(h) + P_τ(r) − P_τ(t)‖   (36)

within the projected translation P_τ(h) + P_τ(r) ≈ P_τ(t). García-Durán et al. [45] concatenated the predicate token sequence and the temporal token sequence, and used an LSTM to encode the concatenated time-aware predicate sequence. The last hidden state of the LSTM is taken as the temporal-aware relational embedding r_temp, and the scoring functions of the extended TransE and DistMult are computed with r_temp in place of the static relation embedding. By defining the context of an entity e as the aggregate set of facts containing e, Liu et al. [94] proposed context selection to capture useful contexts and measured temporal consistency with the selected context.
5.2 Entity Dynamics
Real-world events change entities' states and, consequently, affect the corresponding relations. To improve temporal scope inference, the contextual temporal profile model [165] formulates the temporal scoping problem as state change detection and utilizes context to learn state and state change vectors. Know-Evolve [150], a deep evolutionary knowledge network, investigates the knowledge evolution phenomenon of entities and their evolving relations. A multivariate temporal point process is used to model the occurrence of facts, and a novel recurrent network is developed to learn the representation of non-linear temporal evolution. To capture the interaction between nodes, RE-NET [75] models event sequences via an RNN-based event encoder and a neighborhood aggregator. Specifically, an RNN is used to capture temporal entity interactions, and concurrent interactions are aggregated by the neighborhood aggregator.
5.3 Temporal Relational Dependency
There exist temporal dependencies in relational chains following the timeline, for example, wasBornIn → graduatedFrom → workedAt → diedIn. Jiang et al. [72, 73] proposed time-aware embedding, a joint learning framework with temporal regularization, to incorporate temporal order and consistency information. The authors defined a temporal scoring function as

$f(\langle r_i, r_j \rangle) = \|\mathbf{r}_i \mathbf{M} - \mathbf{r}_j\|$ (37)
where $\mathbf{M}$ is an asymmetric matrix that encodes the temporal order of relations, for a temporal ordering relation pair $\langle r_i, r_j \rangle$ in which $r_i$ occurs before $r_j$. Three temporal consistency constraints of disjointness, ordering, and spans are further applied via an integer linear programming formulation.
5.4 Temporal Logical Reasoning
Logical rules are also studied for temporal reasoning. Chekol et al. [17] explored Markov logic networks and probabilistic soft logic for reasoning over uncertain temporal knowledge graphs. RLvLR-Stream [116] considers temporal closed-path rules and learns the structure of rules from the knowledge graph stream for reasoning.
6 Knowledge-Aware Applications
Rich structured knowledge can be useful for AI applications, but how to integrate such symbolic knowledge into the computational framework of real-world applications remains a challenge. This section introduces several recent DNN-based knowledge-driven approaches with applications to NLU, recommendation, and question answering. More miscellaneous applications such as digital health and search engines are introduced in Appendix E.
6.1 Natural Language Understanding
Knowledge-aware NLU enhances language representation with structured knowledge injected into a unified semantic space. Recent knowledge-driven advances utilize explicit factual knowledge and implicit language representation, with many NLU tasks explored. Chen et al. [22] proposed double-graph random walks over two knowledge graphs, i.e., a slot-based semantic knowledge graph and a word-based lexical knowledge graph, to consider inter-slot relations in spoken language understanding. Wang et al. [156] augmented short-text representation learning with knowledge-based conceptualization via a weighted word-concept embedding. Peng et al. [118] integrated an external knowledge base to build a heterogeneous information graph for event categorization in short social texts.
Language modeling, as a fundamental NLP task, predicts the next word given the preceding words in a sequence. Traditional language modeling does not exploit the factual knowledge of entities frequently observed in text corpora. How to integrate knowledge into language representation has drawn increasing attention. The knowledge graph language model (KGLM) [96] learns to render knowledge by selecting and copying entities. ERNIE-Tsinghua [205] fuses informative entities via aggregated pre-training and random masking. BERT-MK [62] encodes graph-contextualized knowledge and focuses on the medical corpus. ERNIE-Baidu [142] introduces named entity masking and phrase masking to integrate knowledge into the language model, and is further improved by ERNIE 2.0 [143] via continual multi-task learning. Rethinking large-scale training of language models and querying over knowledge graphs, Petroni et al. [119] analyzed language models and knowledge bases, and found that certain factual knowledge can be acquired by pre-training language models.
6.2 Question Answering
Knowledge-graph-based question answering (KGQA) answers natural language questions with facts from knowledge graphs. Neural-network-based approaches represent questions and answers in a distributed semantic space, and some also conduct symbolic knowledge injection for commonsense reasoning.
6.2.1 Single-fact QA
Taking the knowledge graph as an external intellectual source, simple factoid QA or single-fact QA answers a simple question involving a single knowledge graph fact. Bordes et al. [9] adapted memory networks for simple question answering, taking the knowledge base as external memory. Dai et al. [26] proposed a conditional focused neural network equipped with focused pruning to reduce the search space. To generate natural answers in a user-friendly way, COREQA [63] introduces copying and retrieving mechanisms to generate smooth and natural responses in a seq2seq manner, where an answer is predicted from the corpus vocabulary, copied from the given question, or retrieved from the knowledge graph. BAMnet [21] models the two-way interaction between questions and the knowledge graph with a bidirectional attention mechanism.
Although deep learning techniques are intensively applied in KGQA, they inevitably increase model complexity. Through an evaluation of simple KGQA with and without neural networks, Mohammed et al. [105] found that sophisticated deep models such as the LSTM and the gated recurrent unit (GRU) with heuristics achieve the state of the art, and non-neural models also reach reasonably good performance.
6.2.2 Multi-hop Reasoning
These neural-network-based methods gain improvements from the combination of neural encoder-decoder models, but dealing with complex multi-hop relations requires a more dedicated design capable of multi-hop commonsense reasoning. Structured knowledge provides informative commonsense observations and acts as a relational inductive bias, which boosts recent studies on commonsense knowledge fusion between symbolic and semantic space for multi-hop reasoning. Bauer et al. [6] proposed multi-hop bidirectional attention and a pointer-generator decoder for effective multi-hop reasoning and coherent answer generation, where external commonsense knowledge is utilized by relational path selection from ConceptNet and injection with selectively-gated attention. The Variational Reasoning Network (VRN) [203] conducts multi-hop logic reasoning with reasoning-graph embedding, while handling the uncertainty in topic entity recognition. KagNet [85] performs concept recognition to build a schema graph from ConceptNet and learns path-based relational representations via a GCN, an LSTM, and hierarchical path-based attention. CogQA [31] combines implicit extraction and explicit reasoning, and proposes a cognitive graph model based on BERT and GNN for multi-hop QA.
6.3 Recommender Systems
Recommender systems have been widely explored via collaborative filtering, which makes use of users' historical information. However, collaborative filtering often fails to solve the sparsity issue and the cold start problem. Integrating knowledge graphs as external information equips recommender systems with the ability of commonsense reasoning.
By injecting knowledge-graph-based side information such as entities, relations, and attributes, many efforts work on embedding-based regularization to improve recommendation. The collaborative CKE [195] jointly trains KGEs, items' textual information, and visual content via a translational KGE model and stacked auto-encoders. Noticing that time-sensitive and topic-sensitive news articles consist of condensed entities and common knowledge, DKN [154] incorporates the knowledge graph via a knowledge-aware CNN model with multi-channel word-entity-aligned textual inputs. However, DKN cannot be trained in an end-to-end manner, as the entity embeddings need to be learned in advance. To enable end-to-end training, MKR [155] associates multi-task knowledge graph representation and recommendation by sharing latent features and modeling high-order item-entity interactions. Other works consider the relational path and structure of knowledge graphs: KPRN [160] regards the interaction between users and items as an entity-relation path in the knowledge graph and conducts preference inference over the path with an LSTM to capture the sequential dependency. PGPR [170] performs reinforcement policy-guided path reasoning over the knowledge-graph-based user-item interaction. KGAT [159] applies a graph attention network over the collaborative knowledge graph of entity-relation and user-item graphs to encode high-order connectivities via embedding propagation and attention-based aggregation.
7 Future Directions
Many efforts have been made to tackle the challenges of knowledge representation and its related applications, but there still remain several formidable open problems and promising future directions.
7.1 Complex Reasoning
Numerical computing for knowledge representation and reasoning requires a continuous vector space to capture the semantics of entities and relations. While embedding-based methods have limitations on complex logical reasoning, the two directions of relational paths and symbolic logic are worth further exploration. Methods such as recurrent relational path encoding, GNN-based message passing over knowledge graphs, and reinforcement-learning-based path finding and reasoning are promising for handling complex reasoning. For the combination of logic rules and embeddings, recent works [124, 202] combine Markov logic networks with KGE, aiming to leverage logic rules while handling their uncertainty. Enabling probabilistic inference that captures uncertainty and domain knowledge with efficient embedding will be a noteworthy research direction.
7.2 Unified Framework
Several knowledge graph representation learning models have been verified to be equivalent; for example, Hayashi and Shimbo [61] proved that HolE and ComplEx are mathematically equivalent for link prediction under a certain constraint. ANALOGY [91] provides a unified view of several representative models including DistMult, ComplEx, and HolE. Wang et al. [162] explored connections among several bilinear models. Chandrahas et al. [133] explored the geometric understanding of additive and multiplicative KRL models. Most works formulate the knowledge acquisition tasks of KGC and relation extraction separately with different models. Han et al. [57] put them under the same roof and proposed a joint learning framework with mutual attention for information sharing between the knowledge graph and text. A unified understanding of knowledge representation and reasoning is less explored. An investigation towards unification, in a way similar to the unified framework of graph networks [5], would be worthwhile to bridge this research gap.
7.3 Interpretability
Interpretability of knowledge representation and injection is a key issue for knowledge acquisition and real-world applications. Preliminary efforts have been made toward interpretability. ITransF [175] uses sparse vectors for knowledge transfer and interprets them with attention visualization. CrossE [200] explores the explanation scheme of knowledge graphs by using embedding-based path searching to generate explanations for link prediction. Recent neural models have limitations on transparency and interpretability, although they have achieved impressive performance. Some methods combine black-box neural models and symbolic reasoning by incorporating logical rules to increase interpretability. Interpretability can convince people to trust predictions; thus, further work should go into interpretability and improving the reliability of predicted knowledge.
7.4 Scalability
Scalability is crucial for large-scale knowledge graphs. There is a trade-off between computational efficiency and model expressiveness, and only a limited number of works have been applied to graphs with more than one million entities. Several embedding methods use simplifications to reduce the computation cost, for example, simplifying the tensor product with the circular correlation operation [113]. However, these methods still struggle to scale to millions of entities and relations.
Probabilistic logic inference, for example using Markov logic networks, is computationally intensive, making it hard to scale to large knowledge graphs. Rules in a recent neural-logical model [124] are generated by simple brute-force search, making them insufficient on large-scale knowledge graphs. ExpressGNN [202] attempts to use NeuralLP [186] for efficient rule induction. But there is still a long way to go in dealing with cumbersome deep architectures and ever-growing knowledge graphs.
7.5 Knowledge Aggregation
The aggregation of global knowledge is at the core of knowledge-aware applications. For example, recommendation systems use knowledge graphs to model user-item interactions, and text classification jointly encodes text and knowledge graphs into a semantic space. Most current knowledge aggregation methods design neural architectures such as attention mechanisms and GNNs. The natural language processing community has benefited from large-scale pre-training via transformers and variants like BERT, while a recent finding [119] reveals that pre-training language models on unstructured text can actually acquire certain factual knowledge. Large-scale pre-training can be a straightforward way to inject knowledge. However, rethinking knowledge aggregation in an efficient and interpretable manner is also of significance.
7.6 Automatic Construction and Dynamics
Current knowledge graphs rely heavily on manual construction, which is labor-intensive and expensive. The widespread application of knowledge graphs across cognitive intelligence fields requires automatic knowledge graph construction from large-scale unstructured content. Recent research mainly works on semi-automatic construction under the supervision of existing knowledge graphs. Facing multimodality, heterogeneity, and large-scale application, automatic construction remains a great challenge.
Mainstream research focuses on static knowledge graphs, with several works predicting temporal scope validity and learning temporal information and entity dynamics. Many facts only hold within a specific time period. Considering this temporal nature, dynamic knowledge graphs can address the limitations of traditional knowledge representation and reasoning.
8 Conclusion
Knowledge graphs, as the ensemble of human knowledge, have attracted increasing research attention, with the recent emergence of knowledge representation learning, knowledge acquisition methods, and a wide variety of knowledge-aware applications. This paper conducts a comprehensive survey of the following four scopes: 1) knowledge graph embedding, with a full-scale systematic review covering embedding space, scoring metrics, encoding models, embedding with external information, and training strategies; 2) knowledge acquisition of entity discovery, relation extraction, and graph completion from the three perspectives of embedding learning, relational path inference, and logical rule reasoning; 3) temporal knowledge graph representation learning and completion; 4) real-world knowledge-aware applications in natural language understanding, recommendation systems, question answering, and other miscellaneous applications. In addition, some useful resources of datasets and open-source libraries, as well as future research directions, are introduced and discussed. Knowledge graphs host a large research community with a wide range of methodologies and applications. We conduct this survey to summarize current representative research efforts and trends, and expect it to facilitate future research.
Appendix A A Brief History of Knowledge Bases
Knowledge bases have experienced the development timeline illustrated in Fig. 10.
Appendix B Mathematical Operations
Hermitian dot product (Eq. 38) and Hamilton product (Eq. 39) are used in complex vector space (Sec. 3.1.2). Given $\mathbf{u}$ and $\mathbf{v}$ represented in complex space $\mathbb{C}^d$, the Hermitian dot product is calculated as the sesquilinear form

$\langle \mathbf{u}, \mathbf{v} \rangle = \bar{\mathbf{u}}^\top \mathbf{v}$ (38)
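In NumPy, np.vdot conjugates its first argument, which matches the sesquilinear form above; a small sanity sketch with toy vectors:

```python
import numpy as np

def hermitian_dot(u, v):
    """Hermitian (sesquilinear) product <u, v> = conj(u)^T v."""
    return np.vdot(u, v)  # np.vdot conjugates its first argument

u = np.array([1 + 2j, 3 - 1j])
v = np.array([2 - 1j, 1j])
```

Two characteristic properties follow directly: the product of a vector with itself is real (its squared norm), and swapping the arguments conjugates the result.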
where $\bar{\mathbf{u}}$ is the conjugate of $\mathbf{u}$. The quaternion extends complex numbers into four-dimensional hypercomplex space. With two quaternions defined as $Q_1 = a_1 + b_1\mathbf{i} + c_1\mathbf{j} + d_1\mathbf{k}$ and $Q_2 = a_2 + b_2\mathbf{i} + c_2\mathbf{j} + d_2\mathbf{k}$, the Hamilton product is defined as

$Q_1 \otimes Q_2 = (a_1 a_2 - b_1 b_2 - c_1 c_2 - d_1 d_2) + (a_1 b_2 + b_1 a_2 + c_1 d_2 - d_1 c_2)\,\mathbf{i} + (a_1 c_2 - b_1 d_2 + c_1 a_2 + d_1 b_2)\,\mathbf{j} + (a_1 d_2 + b_1 c_2 - c_1 b_2 + d_1 a_2)\,\mathbf{k}$ (39)
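Eq. 39 written out as a small function over $(a, b, c, d)$ tuples, to make its non-commutativity concrete:

```python
def hamilton_product(q1, q2):
    """Hamilton product of quaternions a + b*i + c*j + d*k, given as (a, b, c, d)."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

# basis quaternions i and j
i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
```

The basis identities $\mathbf{i} \otimes \mathbf{j} = \mathbf{k}$ but $\mathbf{j} \otimes \mathbf{i} = -\mathbf{k}$ exhibit the non-commutativity that QuatE exploits for modeling relation patterns.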
The Hadamard product (Eq. 40) and circular correlation (Eq. 41) are utilized in semantic matching based methods (Sec. 3.2.2). The Hadamard product, denoted as $\circ$ or $\odot$ and also known as the element-wise or Schur product, is calculated as

$[\mathbf{a} \circ \mathbf{b}]_i = a_i b_i$ (40)
Circular correlation $\star$ is an efficient computation calculated as

$[\mathbf{a} \star \mathbf{b}]_k = \sum_{i=0}^{d-1} a_i b_{(k+i) \bmod d}$ (41)
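The $O(d^2)$ definition above agrees element-wise with the standard $O(d \log d)$ FFT identity $\mathcal{F}^{-1}(\overline{\mathcal{F}(\mathbf{a})} \odot \mathcal{F}(\mathbf{b}))$ exploited by HolE; a small NumPy check (this sketches the standard identity, not code from [113]):

```python
import numpy as np

def circ_corr_naive(a, b):
    """Direct definition: [a * b]_k = sum_i a_i * b_{(k+i) mod d}."""
    d = len(a)
    return np.array([sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)])

def circ_corr_fft(a, b):
    """Equivalent fast computation via FFT, reducing O(d^2) to O(d log d)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
```

Note that, unlike circular convolution, circular correlation is not commutative, which is what lets HolE distinguish head and tail arguments.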
Appendix C A Summary of KRL Models
We conduct a comprehensive summary of KRL models in Table V. The representation space has an impact on the expressiveness of KRL methods to some extent. Expanding on pointwise Euclidean space [10, 139, 113], manifold space [173], complex space [151, 146, 198], and Gaussian distributions [64, 174] have been introduced. ManifoldE [173] relaxes the real-valued pointwise space into manifold space with more expressive representation from the geometric perspective; when the manifold-specific parameter is set to zero, the manifold collapses into a point. With the introduction of the rotational Hadamard product, RotatE [146] can also capture inversion and composition patterns as well as symmetry and antisymmetry. QuatE [198] uses the Hamilton product to capture latent interdependencies within the four-dimensional space of entities and relations, and gains more expressive rotational capability than RotatE. Group theory remains less explored for capturing rich relational information. The very recent DihEdral [182] first introduces the finite non-Abelian group to preserve the relational properties of symmetry/skew-symmetry, inversion, and composition effectively via the rotation and reflection properties of the dihedral group. Ebisu and Ichise [34] summarized that the embedding space should satisfy three conditions, i.e., differentiability, calculation possibility, and definability of a scoring function.
Distance-based and semantic matching scoring functions constitute the foundation stones of plausibility measures in KRL. Translational distance-based methods, especially the groundbreaking TransE [10], borrowed the idea of distributed word representation learning and inspired many subsequent approaches, such as TransH [164] and TransR [88], which model complex relations (1-to-N, N-to-1, and N-to-N), and the recent TransMS [187], which models multi-directional semantics. On the semantic matching side, many methods utilize mathematical operations or compositional operators, including linear matching in SME [8], bilinear mapping in DistMult [185], tensor product in NTN [139], circular correlation in HolE [113] and ANALOGY [91], Hadamard product in CrossE [200], and quaternion inner product in QuatE [198].
Recent encoding models for knowledge representation have developed rapidly, and generally fall into two families: bilinear models and neural networks. Linear and bilinear models use product-based functions over entities and relations, while factorization models regard knowledge graphs as three-way tensors. With their multiplicative operations, RESCAL [114], ComplEx [151], and SimplE [76] also belong to the bilinear models. DistMult [185] can only model symmetric relations, while its extension ComplEx [151] manages to preserve antisymmetric relations but involves redundant computations [76]. ComplEx [151], SimplE [76], and TuckER [4] can guarantee full expressiveness under specific embedding dimensionality bounds. Neural-network-based encoding models start from distributed representations of entities and relations, and some utilize complex neural structures such as tensor networks [139], graph convolutional networks [130, 132, 107], recurrent networks [50], and transformers [157, 188] to learn richer representations. These deep models have achieved very competitive results, but they are not transparent and lack interpretability. As deep learning techniques continue to prosper and gain extensive superiority in many tasks, the recent trend is still likely to focus on more powerful neural architectures or large-scale pre-training, while interpretable deep models remain a challenge.