1 Introduction
Taxonomy, a systematic categorization scheme, is an effective way to organize and classify knowledge harlin1998taxonomy; stewart2008building. Taxonomies have been used to support many downstream applications such as content management in e-commerce wang2021enquire; zhang2014taxonomy, web search yin2010building; Liu2019AUC, digital libraries yu2020steam, and NLP tasks yang2020co; hua2016understand; yang2017efficiently. The curation of taxonomies mostly relies on human experts, which is time-consuming and expensive, so curated taxonomies suffer from limited coverage of knowledge jurgens2016semeval. To alleviate this issue and handle constantly emerging new concepts, automating taxonomy construction has attracted attention from the research community wang2017short. One type of automated taxonomy curation is taxonomy expansion, which enriches an existing taxonomy to incorporate new and broader concepts. Specifically, a taxonomy is expanded by attaching new concept nodes to proper positions of a seed taxonomy graph, which is usually represented as a hierarchical tree vedula2018enriching.
To systematically enrich a taxonomy graph, concept embeddings are first learned by structurally characterizing the concepts in the existing taxonomy, and are then matched against the embeddings of query concepts for the expansion. Prior works learn the concept embeddings with local structural features, such as edge semantic representations manzoor2020expanding and graph neural networks (GNNs) Shen2020TaxoExpanST. However, as a concept can lead to multiple sub-concepts, the size of a taxonomy grows exponentially with its depth. The Euclidean embedding space, on which existing works commonly build, fails to account for this property. In contrast, a hyperbolic space nickel2017poincare; sarkar2011low, a negatively curved space in which the circumference of a circle grows exponentially with its radius as illustrated in Figure 1, can better capture this characteristic of taxonomies.
In this paper, we present HyperExpan, a taxonomy expansion framework based on hyperbolic representation learning that: (1) better preserves the taxonomical structure in a more expressive hyperbolic space, (2) effectively characterizes concepts by exploiting sparse neighborhood information beyond standard parent-child relations alyetal2019every; le2019inferring, and (3) improves inference precision and generalizability by leveraging pretrained distributional features.
Specifically, HyperExpan incorporates two types of features to exploit the structure of a taxonomy: a relative positional embedding of a node, which depends on its relation to the anchor node, and an absolute positional embedding defined by its depth within the taxonomy. HyperExpan first constructs an ego subgraph around each candidate attachment concept, i.e. the anchor concept, and then uses a hyperbolic graph neural network (HGNN) to obtain the anchor concept embedding. A parent-child matching score for the attachment is then produced by comparing the anchor and query concept embeddings in the same hyperbolic space.
We evaluate HyperExpan on WordNet and Microsoft Academic Graph datasets. Experiments show that the learned hyperbolic concept embeddings achieve better expansion performance than their Euclidean counterparts and outperform state-of-the-art models. We also perform ablation studies to demonstrate the effectiveness of each component and design choice of HyperExpan. Our contributions are summarized as follows: (1) We present an effective and generalizable taxonomy expansion framework via hyperbolic representation learning. (2) We introduce methods to incorporate pretrained distributional features and taxonomy-specific information in the hyperbolic GNN design. (3) We show that our framework achieves state-of-the-art performance on expanding four large real-world taxonomies.
2 Preliminaries
We introduce preliminaries about hyperbolic geometry and then define the task.
Table 1: Summary of operations in the Poincaré ball model and the Lorentz model: distance, exponential map, logarithmic map, addition, and matrix multiplication.
2.1 Hyperbolic Geometry
Hyperbolic space is a nonlinear space with constant negative curvature, as opposed to Euclidean space, which has zero curvature. The curvature of a space measures how a geometric object deviates from a flat plane. (Throughout this section we assume a unit hyperbolic space, i.e. curvature $-1$.) In this work, we mainly employ the following two models of hyperbolic geometry beltrami1868teoria; cannon1997hyperbolic: the Poincaré ball model and the Lorentz model, with some intermediate projective operations defined by the Klein model (see § 3.1).
There are several essential vector operations required for learning embeddings in a hyperbolic space: (1) computing the distance between two points, (2) projecting from a hyperbolic space to a Euclidean space and vice versa, (3) adding and multiplying matrices, (4) concatenating two vectors, and (5) transforming among hyperbolic models. These algebraic operations are summarized in Table 1.
For each point $\mathbf{x}$ in the hyperbolic space, we denote the associated tangent space centered at $\mathbf{x}$ as $\mathcal{T}_{\mathbf{x}}$, which is a Euclidean vector space. We use the exponential map $\exp_{\mathbf{x}}$ and the logarithmic map $\log_{\mathbf{x}}$ to project between the hyperbolic space and the local tangent space: the logarithmic map takes points in the hyperbolic space to the tangent space, and the exponential map takes them back. Setting the origin $\mathbf{o}$ (north pole) of the hyperbolic space as the center, we obtain a common tangent space across different manifolds, as long as they have the same dimension and are modeled by the same hyperbolic model, using the $\exp_{\mathbf{o}}$ and $\log_{\mathbf{o}}$ projections. Hence, we can use $\exp_{\mathbf{o}}$ and $\log_{\mathbf{o}}$ to perform the projection within a neural network that mixes hyperbolic and Euclidean layers.
The addition and matrix multiplication operations in the Poincaré model are based on Möbius transformations ungar2001hyperbolic; Ganea2018HyperbolicNN; gulcehre2018hyperbolic, as defined in Table 1. In the Lorentz model, we use the tangent space to perform matrix multiplication and parallel transport to perform addition Chami2019HyperbolicGC.
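The operations listed above can be sketched for the Poincaré ball in a few lines. The following is a minimal illustration, assuming unit curvature ($-1$) and the standard closed forms for the ball model; the function names (`mobius_add`, `exp0`, `log0`, `poincare_dist`) are hypothetical and not taken from the authors' code.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def mobius_add(x, y):
    """Möbius addition of two points on the unit Poincaré ball."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * xy + x2 * y2
    return [((1 + 2 * xy + y2) * a + (1 - x2) * b) / den for a, b in zip(x, y)]

def exp0(v):
    """Exponential map at the origin: tangent vector -> point in the ball."""
    n = norm(v)
    return [0.0] * len(v) if n == 0 else [math.tanh(n) * a / n for a in v]

def log0(x):
    """Logarithmic map at the origin: point in the ball -> tangent vector."""
    n = norm(x)
    return [0.0] * len(x) if n == 0 else [math.atanh(n) * a / n for a in x]

def poincare_dist(x, y):
    """Geodesic distance between two points of the unit Poincaré ball."""
    diff2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1 + 2 * diff2 / ((1 - dot(x, x)) * (1 - dot(y, y))))

v = [0.3, -0.4]      # a Euclidean (tangent) vector
x = exp0(v)          # lies strictly inside the unit ball
v_back = log0(x)     # recovers v up to floating-point error
```

Because `exp0` and `log0` are inverses of each other, a Euclidean layer can operate on `log0(x)` and hand its output back to a hyperbolic layer through `exp0`.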
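For concatenating two hyperbolic vectors $\mathbf{x}_1$ and $\mathbf{x}_2$, we use a generalized version of the concatenation operation Ganea2018HyperbolicNN; lopezstrube2020fully that prevents the resulting vector from leaving the manifold:
$$\mathbf{z} = \mathbf{M}_1 \otimes \mathbf{x}_1 \oplus \mathbf{M}_2 \otimes \mathbf{x}_2 \oplus \mathbf{b},$$
where $\mathbf{M}_1$, $\mathbf{M}_2$ and $\mathbf{b}$ are parameters, and $\otimes$ and $\oplus$ denote hyperbolic matrix multiplication and addition.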
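The Poincaré ball model $\mathbb{B}$, the Klein model $\mathbb{K}$ and the hyperboloid/Lorentz model $\mathbb{L}$ are used in our work, and we perform different computations on different models. These models are isometrically isomorphic. Given a node, the bijections between its representation $\mathbf{x}^{\mathbb{L}} = (x_0, x_1, \dots, x_n)$ on the Lorentz model and its corresponding mapped representation $\mathbf{x}^{\mathbb{B}}$ on the Poincaré ball are cannon1997hyperbolic; iversen1992hyperbolic:
$$p_{\mathbb{L} \to \mathbb{B}}(x_0, x_1, \dots, x_n) = \frac{(x_1, \dots, x_n)}{x_0 + 1}, \qquad p_{\mathbb{B} \to \mathbb{L}}(\mathbf{x}^{\mathbb{B}}) = \frac{\left(1 + \|\mathbf{x}^{\mathbb{B}}\|^2,\ 2\mathbf{x}^{\mathbb{B}}\right)}{1 - \|\mathbf{x}^{\mathbb{B}}\|^2}.$$
The bijections between $\mathbf{x}^{\mathbb{B}}$ and its mapped representation $\mathbf{x}^{\mathbb{K}}$ on the Klein model are:
$$p_{\mathbb{B} \to \mathbb{K}}(\mathbf{x}^{\mathbb{B}}) = \frac{2\mathbf{x}^{\mathbb{B}}}{1 + \|\mathbf{x}^{\mathbb{B}}\|^2}, \qquad p_{\mathbb{K} \to \mathbb{B}}(\mathbf{x}^{\mathbb{K}}) = \frac{\mathbf{x}^{\mathbb{K}}}{1 + \sqrt{1 - \|\mathbf{x}^{\mathbb{K}}\|^2}}.$$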
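The model conversions can be checked numerically. The sketch below assumes unit curvature and the standard textbook maps between the Poincaré ball, the Lorentz (hyperboloid) model and the Klein model; the function names are illustrative only.

```python
import math

def poincare_to_lorentz(x):
    """Poincaré ball -> hyperboloid; the first coordinate is the time-like one."""
    x2 = sum(a * a for a in x)
    return [(1 + x2) / (1 - x2)] + [2 * a / (1 - x2) for a in x]

def lorentz_to_poincare(x):
    """Hyperboloid -> Poincaré ball."""
    return [a / (x[0] + 1) for a in x[1:]]

def poincare_to_klein(x):
    """Poincaré ball -> Klein model."""
    x2 = sum(a * a for a in x)
    return [2 * a / (1 + x2) for a in x]

def klein_to_poincare(x):
    """Klein model -> Poincaré ball."""
    x2 = sum(a * a for a in x)
    return [a / (1 + math.sqrt(1 - x2)) for a in x]

point = [0.3, 0.4]                     # a point in the Poincaré ball
on_lorentz = poincare_to_lorentz(point)
on_klein = poincare_to_klein(point)
```

A point produced by `poincare_to_lorentz` satisfies the hyperboloid constraint $-x_0^2 + x_1^2 + \dots + x_n^2 = -1$, which is a convenient sanity check when mixing models in one pipeline.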
2.2 Taxonomy Expansion
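In this work, a taxonomy is mathematically defined as a directed acyclic concept graph $\mathcal{T} = (\mathcal{N}, \mathcal{E})$, where each node $n \in \mathcal{N}$ represents a concept, and each directed edge $\langle n_p, n_c \rangle \in \mathcal{E}$ denotes a parent-child relation in which $n_p$ and $n_c$ are a pair of hierarchically related concepts (e.g. "change integrity" → "explode"). Given an existing taxonomy $\mathcal{T}^0 = (\mathcal{N}^0, \mathcal{E}^0)$, the goal of taxonomy expansion is to attach a set of new concepts $\mathcal{C}$ to $\mathcal{T}^0$, expanding it to $\mathcal{T} = (\mathcal{N}^0 \cup \mathcal{C}, \mathcal{E}^0 \cup \mathcal{R})$, where $\mathcal{R}$ is a set of new edges whose child nodes must belong to $\mathcal{C}$.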
An illustration of taxonomy expansion is shown in Figure 1, where the query nodes (new concepts) are attached to the proper positions depending on the surrounding anchor nodes (existing concepts). Following the settings of prior works Shen2020TaxoExpanST; zhang2021tmn, we attach different query concepts independently of each other to simplify the problem. Each concept has profile information, e.g. concept definitions, concept names, and related articles (see § 4.1 for more details).
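The task definition above can be made concrete with a small data structure. This is only an illustrative sketch: the `Taxonomy` class and its method names are hypothetical, and the new concept "burst" is an invented example rather than one from the paper.

```python
from collections import defaultdict

class Taxonomy:
    """A directed acyclic concept graph stored as parent/child adjacency sets."""

    def __init__(self, edges):
        self.nodes = set()
        self.children = defaultdict(set)
        self.parents = defaultdict(set)
        for parent, child in edges:
            self._add_edge(parent, child)

    def _add_edge(self, parent, child):
        self.nodes.update((parent, child))
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def attach(self, query, anchor):
        """Expand the taxonomy with a new edge whose child is the new concept."""
        if anchor not in self.nodes:
            raise KeyError(f"unknown anchor concept: {anchor}")
        if query in self.nodes:
            raise ValueError(f"concept already in taxonomy: {query}")
        self._add_edge(anchor, query)

seed = Taxonomy([("change integrity", "explode")])
seed.attach("burst", "change integrity")   # hypothetical query concept
```

Each call to `attach` handles one query concept on its own, mirroring the simplifying assumption that query concepts are attached independently.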
3 HyperExpan
We propose HyperExpan, a taxonomy expansion framework based on hyperbolic geometry and GNNs. As shown in Figure 2, HyperExpan consists of the following main steps: (1) generating initial concept features from the profile information (§ 3.1); (2) encoding query and anchor concept features with hyperbolic (graph) neural networks (§ 3.2); and (3) computing query-anchor embedding matching scores for attaching query concepts to proper anchor positions (§ 3.3). We describe each step in detail, and how to train the matching model (§ 3.4), in the following sections.
3.1 Initial Concept Features
We mainly leverage two types of profile information to obtain the initial features of a concept (whether a query or a concept in the existing taxonomy): its name and its definition sentences. We first embed each of the two by average pooling over the word embeddings of its words, and then take the mean of the two embedded profiles to produce a fixed-dimension initial concept embedding. Our framework does not require the initial word embeddings to be defined in a specific geometry: they can be Euclidean, such as fastText bojanowski2016enriching, or hyperbolic, such as Poincaré GloVe tifrea2018poincare, which embeds words in a Cartesian product of hyperbolic spaces. Note that since Poincaré GloVe is defined in hyperbolic space, the mean operation above can no longer be the usual Euclidean average, which may produce results that lie off the manifold. Instead, we use the Einstein midpoint method gulcehre2018hyperbolic to perform the average pooling. Denoting the token embeddings in Klein coordinates as $\mathbf{x}_1, \dots, \mathbf{x}_n$, with $n$ the number of tokens in a sentence, the midpoint is computed as:
$$\mathbf{m} = \frac{\sum_{i=1}^{n} \gamma_i \mathbf{x}_i}{\sum_{i=1}^{n} \gamma_i}, \qquad \gamma_i = \frac{1}{\sqrt{1 - \|\mathbf{x}_i\|^2}},$$
where $\gamma_i$ denotes the Lorentz factors. The Einstein midpoint has its most concise form in Klein coordinates gulcehre2018hyperbolic; therefore we project Poincaré embeddings to the Klein model to calculate the midpoint, and then project the result back to the Poincaré model. Finally, we project the initial concept embeddings to the hyperbolic space used by the network described below and use them as the network input.
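The pooling procedure described above (Poincaré to Klein, Einstein midpoint, back to Poincaré) can be sketched as follows. This assumes unit curvature and equal token weights; the function names are hypothetical.

```python
import math

def poincare_to_klein(x):
    x2 = sum(a * a for a in x)
    return [2 * a / (1 + x2) for a in x]

def klein_to_poincare(x):
    x2 = sum(a * a for a in x)
    return [a / (1 + math.sqrt(1 - x2)) for a in x]

def einstein_midpoint(points):
    """Average a list of Poincaré-ball vectors via the Einstein midpoint."""
    klein = [poincare_to_klein(p) for p in points]
    # Lorentz factor of each point, computed in Klein coordinates.
    gammas = [1 / math.sqrt(1 - sum(a * a for a in k)) for k in klein]
    total = sum(gammas)
    dim = len(points[0])
    mid = [sum(g * k[d] for g, k in zip(gammas, klein)) / total for d in range(dim)]
    return klein_to_poincare(mid)

tokens = [[0.5, 0.0], [0.0, 0.0]]      # two hypothetical token embeddings
pooled = einstein_midpoint(tokens)
```

Unlike the plain Euclidean mean, the pooled vector is guaranteed to stay inside the unit ball, and for two points it coincides with the midpoint of the geodesic joining them.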
3.2 Anchor Concept Representation
We learn a parameterized model that encodes anchor nodes: it takes their initial concept features as inputs and outputs their hyperbolic embedding vectors. We use an HGNN to model the concepts in a hyperbolic space and to exploit the structured representation of a taxonomy. The basics of Euclidean graph convolutional networks are given in Appendix A.
An HGNN performs the neighbor aggregation operation in a hyperbolic space $\mathbb{H}$, which can be a Lorentz model $\mathbb{L}$ or a Poincaré model $\mathbb{B}$, following the corresponding numerical operations defined in § 2.1. Note that the standard neighbor aggregation operation in a (Euclidean) GNN may lead to distortion when embedding graphs with scale-free or hierarchical structure deza2009geometry; bachmann2020constant.
The first layer of an HGNN maps the initial node features (which can lie in a Euclidean or any hyperbolic space) to the hyperbolic space $\mathbb{H}$, followed by a series of cascaded HGNN layers. At each layer, the HGNN performs four operations in the following order: (1) transforming node features to messages in a predefined hyperbolic space, (2) mapping messages to the tangent space of each node, (3) performing neighborhood aggregation in the tangent space, and (4) projecting the updated tangential node embeddings back to the hyperbolic space $\mathbb{H}$. Our HGNN design is based on the hyperbolic graph convolutional network Chami2019HyperbolicGC.
Ego Graph. To encode an anchor concept with an HGNN, we first construct an ego graph centered at the anchor concept, in which every node lies within a bounded edge distance of the anchor.
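An ego graph bounded by edge distance can be collected with a breadth-first search. The sketch below treats parent-child edges as undirected when counting hops, which is an assumption; the function name and the toy adjacency list are hypothetical.

```python
from collections import deque

def ego_graph(neighbors, center, max_hops):
    """Return {node: hop distance} for all nodes within max_hops of center."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for nb in neighbors.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

# Undirected adjacency of a toy taxonomy: a - b, b - c, b - d, c - e.
adjacency = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "e"], "d": ["b"], "e": ["c"]}
one_hop = ego_graph(adjacency, "b", max_hops=1)
two_hop = ego_graph(adjacency, "b", max_hops=2)
```

Restricting the search to the anchor's parents, children, ancestors or descendants only would require filtering by edge direction, which this sketch does not do.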
Positional Features. To further exploit the structure of a taxonomy, we incorporate relative and absolute positional embeddings as inputs to an HGNN layer. With respect to a given center node, its neighbors can take one of the following three relative positions: parent, child, and self. Each relative position is mapped to a learned relative positional embedding.
Motivated by you2019position; wangetal2019self, we make the HGNN model position-aware by also incorporating an absolute positional embedding. We define the absolute position of a node as its depth (i.e. its level with respect to the root) within the entire taxonomy. Since the expansion task does not break the structure of the existing taxonomy, this position encoding is fixed for a given node, and each depth is mapped to a depth-dependent positional embedding. The overall input to each HGNN layer is therefore the concatenation of the original node embedding and the two taxonomy-specific features. (Superscripts $E$ and $H$ indicate that a node feature lies in Euclidean space and hyperbolic space, respectively.)
Note that the positional embeddings are initialized and then projected to hyperbolic space following Table 1, while the concatenation is performed as described in § 2.1. At the first layer, the node embedding is the initial concept feature obtained following § 3.1.
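The two positional indices that feed the embedding lookups can be computed directly from the taxonomy. In this sketch the depth of a node with several parents is taken as its shortest distance from the root, which is an assumption; all names are hypothetical.

```python
from collections import deque

def node_depths(children, root):
    """Absolute position: depth of every node reachable from the root (BFS)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth

def relative_position(children, center, node):
    """Relative position of node with respect to center, or None if unrelated."""
    if node == center:
        return "self"
    if center in children.get(node, ()):
        return "parent"
    if node in children.get(center, ()):
        return "child"
    return None

toy_children = {"root": ["a", "b"], "a": ["c"]}   # hypothetical taxonomy
depths = node_depths(toy_children, "root")
```

Each index would then select a row of a learnable embedding table, and the selected rows are concatenated with the node embedding as described above.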
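Feature Transformation. At layer $\ell$, we transform the embedding vector $\mathbf{x}_i^{\ell-1,H}$ of node $i$ produced by the previous layer into a message $\mathbf{h}_i^{\ell,H}$ by applying a hyperbolic linear transformation:
$$\mathbf{h}_i^{\ell,H} = \left(\mathbf{W}^{\ell} \otimes^{c} \mathbf{x}_i^{\ell-1,H}\right) \oplus^{c} \mathbf{b}^{\ell},$$
where $\mathbf{W}^{\ell}$ and $\mathbf{b}^{\ell}$ are the learnable weight and bias of layer $\ell$, and $\otimes^{c}$ and $\oplus^{c}$ denote multiplication and addition in hyperbolic space with curvature $c$.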
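A hyperbolic linear transformation of this kind can be sketched on the Poincaré ball with Möbius matrix-vector multiplication followed by Möbius addition of a bias. The sketch assumes unit curvature and the standard Möbius closed forms; the weights and bias below are made-up values, not trained parameters.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def mobius_add(x, y):
    """Möbius addition on the unit Poincaré ball."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * xy + x2 * y2
    return [((1 + 2 * xy + y2) * a + (1 - x2) * b) / den for a, b in zip(x, y)]

def mobius_matvec(M, x):
    """Möbius matrix-vector multiplication on the unit Poincaré ball."""
    Mx = [dot(row, x) for row in M]
    xn, Mxn = norm(x), norm(Mx)
    if xn == 0 or Mxn == 0:
        return [0.0] * len(Mx)
    scale = math.tanh(Mxn / xn * math.atanh(xn)) / Mxn
    return [scale * a for a in Mx]

def hyperbolic_linear(W, b, x):
    """Message = (W (x) x) (+) b, with both operations in the ball."""
    return mobius_add(mobius_matvec(W, x), b)

W = [[1.0, 0.5], [-0.5, 2.0]]   # hypothetical weight matrix
b = [0.1, -0.2]                 # hypothetical bias, itself a point in the ball
x = [0.3, 0.4]
message = hyperbolic_linear(W, b, x)
```

With an identity weight and a zero bias the layer returns its input unchanged, which is a quick way to check an implementation.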
Neighborhood Aggregation. Neighborhood aggregation combines the features of neighboring nodes to update the center node. To enable importance-weighted aggregation while reusing simple Euclidean operations to derive the attention scores, we first apply a logarithmic map to project the messages to a tangent space. For a center node and each of its neighbors, we compute an attention weight by applying a Euclidean MLP to the concatenated tangential representations of the two messages, and normalize the weights with a softmax over all neighbors of the center node. The center node embedding is then obtained as a weighted sum of the neighboring tangential embeddings. Finally, we apply an exponential map to project the aggregated tangential center node embedding back to the hyperbolic model.
Note that, for a better local hierarchical approximation, an independent local tangent space is created for each center node during neighborhood aggregation, instead of using the tangent space at the hyperbolic origin (i.e. the exponential and logarithmic maps are taken at the center node rather than at the origin). The curvature $c$ of a hyperbolic model can be either a fixed number or a learnable parameter; our experiments show that a learned $c$ tends to yield better performance. Together, the feature transformation and the neighborhood aggregation define the update rule for the embedding of each node. We then concatenate the updated node embedding with the taxonomy-specific features and use the result as the input to the next layer. Finally, we obtain the ego graph representation from the final node embeddings via a weighted readout over the anchor node and its 1-hop neighbors: each node in the 1-hop ego graph is weighted according to its relative position (parent, child, or self) with respect to the center node, with one weight per node type, and the weighted combination gives the concept representation of the anchor node.
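The aggregation in a local tangent space can be sketched on the Poincaré ball as follows. This is a simplified illustration under unit curvature: the attention scores are passed in directly rather than produced by an MLP, and all function names are hypothetical.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def mobius_add(x, y):
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * xy + x2 * y2
    return [((1 + 2 * xy + y2) * a + (1 - x2) * b) / den for a, b in zip(x, y)]

def log_map(x, y):
    """Logarithmic map at x: point y -> tangent vector at x."""
    u = mobius_add([-a for a in x], y)
    n = norm(u)
    if n == 0:
        return [0.0] * len(x)
    lam = 2 / (1 - dot(x, x))          # conformal factor at x
    return [(2 / lam) * math.atanh(n) * a / n for a in u]

def exp_map(x, v):
    """Exponential map at x: tangent vector v at x -> point in the ball."""
    n = norm(v)
    if n == 0:
        return list(x)
    lam = 2 / (1 - dot(x, x))
    return mobius_add(x, [math.tanh(lam * n / 2) * a / n for a in v])

def softmax(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    return [a / sum(e) for a in e]

def aggregate(center, neighbors, scores):
    """Attention-weighted sum in the tangent space at the center node."""
    weights = softmax(scores)
    tangent = [0.0] * len(center)
    for w, nb in zip(weights, neighbors):
        for d, comp in enumerate(log_map(center, nb)):
            tangent[d] += w * comp
    return exp_map(center, tangent)

center = [0.1, 0.2]
updated = aggregate(center, [[-0.3, 0.4], [0.5, 0.0]], scores=[0.3, -0.1])
```

Taking the maps at `center` rather than at the origin is what makes the tangent space local to each node.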
3.3 Matching Module
Given the initial concept features of a query concept, we obtain the query concept representation by projecting the features to the hyperbolic space $\mathbb{H}$, using the exponential map (if the features are Euclidean) or a hyperbolic model transformation (if the features lie in a hyperbolic model other than $\mathbb{H}$) as defined in § 2.1. Note that the hyperbolic spaces used to obtain the anchor and query concept representations need to be consistent.
After obtaining the hyperbolic embeddings of each query concept and each anchor concept, we concatenate the two with hyperbolic operations and feed the concatenated vector $\mathbf{x}$ to a hyperbolic neural network (HNN). We construct a hyperbolic multi-layer perceptron (MLP) based on the operations defined in Ganea2018HyperbolicNN, where a one-layer HNN is defined as:
$$\mathrm{HNN}(\mathbf{x}) = \sigma\left(\mathbf{W} \otimes^{c} \mathbf{x} \oplus^{c} \mathbf{b}\right),$$
where $\mathbf{W}$ and $\mathbf{b}$ are learnable parameters. Since $\mathbf{b}$ lies in a hyperbolic space, its update during training needs to be calibrated so that it remains on the manifold. $\sigma$ is the element-wise nonlinearity, and $\otimes^{c}$ and $\oplus^{c}$ denote multiplication and addition in hyperbolic space, respectively, under curvature $c$. Note that an HNN is equivalent to a Euclidean MLP if $c$ is set to 0, i.e. the embedding space is not curved.
3.4 Learning and Inference
We train the HyperExpan framework with a metric learning paradigm by utilizing the existing taxonomies as the training resources.
Training Data Construction. The data pairs used to train the matching module are generated in a self-supervised manner following Shen2020TaxoExpanST. We consider only exact parent-child node pairs in the seed taxonomy as positive samples, i.e. pairs $\langle n_p, n_c \rangle$ connected by a direct edge. For each query node $n_c$, we randomly sample $N$ other nodes (excluding its immediate children) from the seed taxonomy to form negative training instances. Anchoring at node $n_c$, the positive and negative samples form a single group of training instances. We repeat this operation for every edge of the seed taxonomy to construct our training data.
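The self-supervised construction can be sketched as below. The sketch excludes the query itself, its children and all of its true parents from the negative pool; excluding the other true parents is an assumption added here so that no correct anchor is sampled as a negative. Function and field names are hypothetical.

```python
import random

def build_training_groups(edges, num_negatives, seed=0):
    """One group per edge: a query, its positive anchor, and sampled negatives."""
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    parents, children = {}, {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        children.setdefault(parent, set()).add(child)
    groups = []
    for parent, child in edges:
        # Never sample the query, its own children, or any of its true parents.
        excluded = {child} | parents[child] | children.get(child, set())
        pool = [n for n in nodes if n not in excluded]
        negatives = rng.sample(pool, min(num_negatives, len(pool)))
        groups.append({"query": child, "positive": parent, "negatives": negatives})
    return groups

edges = [("root", "a"), ("root", "b"), ("a", "c")]   # hypothetical seed taxonomy
groups = build_training_groups(edges, num_negatives=2)
```

On a real taxonomy the pool is far larger than the number of negatives, so every group would contain exactly `num_negatives` negative anchors.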
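Learning Objective. We adopt the InfoNCE loss oord2018representation as the main training objective:
$$\mathcal{L} = -\frac{1}{|\mathbb{X}|} \sum_{\mathbf{X}_i \in \mathbb{X}} \log \frac{f(a_1, q_i)}{\sum_{j=1}^{N+1} f(a_j, q_i)},$$
where $\mathbb{X}$ is the training data, each group $\mathbf{X}_i$ pairs a query $q_i$ with $N+1$ candidate anchors $a_j$, $j \in \{1, \dots, N+1\}$, $\langle a_1, q_i \rangle$ is the positive sample, and $f$ is the matching score of § 3.3. The InfoNCE loss is essentially a cross-entropy loss that identifies the positive pairs (items in the numerator) among all possible candidates (items in the denominator).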
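As a numerical illustration of the objective, the sketch below computes an InfoNCE-style loss from raw scores. It assumes each group is a list of log matching scores with the positive pair at index 0, which is a convention chosen here for illustration.

```python
import math

def info_nce(groups):
    """Mean negative log-probability of the positive pair (index 0) per group."""
    total = 0.0
    for scores in groups:
        m = max(scores)  # subtract the maximum for numerical stability
        log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_denominator - scores[0]
    return total / len(groups)

uninformed = info_nce([[0.0, 0.0, 0.0, 0.0]])   # all candidates tie
confident = info_nce([[5.0, 0.0, 0.0, 0.0]])    # positive scored highest
```

When all candidates receive the same score the loss equals the log of the group size, and it shrinks toward zero as the positive pair is scored above the negatives.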
Inference. At inference time, for each new query concept (unseen in the seed taxonomy), we compute the matching scores between the query concept and every candidate anchor node in the existing taxonomy. We then rank these anchor nodes by matching score, producing a ranked list from which to decide where to attach the new concept.
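The ranking step itself is simple once the scores are available. In this sketch the scores are supplied as a dictionary; in the framework they would come from the matching module, and the anchor names and values below are invented.

```python
def rank_anchors(scores, top_k=None):
    """Sort candidate anchors by matching score, best first."""
    ranked = [a for a, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]
    return ranked if top_k is None else ranked[:top_k]

# Hypothetical matching scores of one query against three anchors.
scores = {"change": 0.1, "change integrity": 0.9, "move": 0.5}
ranking = rank_anchors(scores)
best_two = rank_anchors(scores, top_k=2)
```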
4 Experiments
We evaluate HyperExpan and its variants on four large-scale real-world taxonomies used by Shen2020TaxoExpanST and zhang2021tmn.
4.1 Experimental Setup
Datasets. Following Shen2020TaxoExpanST; zhang2021tmn, we take WordNet 3.0 and the 1,000 domain-specific concepts defined in the SemEval-2016 Task 14 benchmark dataset jurgens2016semeval, considering only hypernym-hyponym relations. WordNet is split into a verb version, WordNet-Verb, and a noun version, WordNet-Noun. We also use subgraphs of the Field-of-Study Taxonomy in the Microsoft Academic Graph sinha2015overview that contain the descendants of the “psychology” and “computer science” nodes, and refer to them as MAG-PSY and MAG-CS.
Table 2: Dataset statistics.
Dataset | # Nodes | # Edges | Depth
WordNet-Verb | 13,936 | 13,408 | 13
WordNet-Noun | 83,073 | 76,812 | 20
MAG-PSY | 23,187 | 30,041 | 6
MAG-CS | 24,754 | 42,329 | 6
More detailed statistics of each dataset are in Table 2. For each dataset, 1,000 leaf nodes are randomly sampled as query nodes for the validation set, and another 1,000 leaf nodes form the test set. Since the validation and test nodes are all leaf nodes, removing them leaves a valid taxonomy without introducing non-existent edges. For WordNet-Verb and WordNet-Noun, we generate the initial concept features following § 3.1. We assume each concept has only one name, which we derive from the WordNet synset name. For MAG-PSY/CS, we use the 250-dimensional in-domain concept name word embeddings provided by Shen2020TaxoExpanST, trained with a skip-gram model on a corpus of paper abstracts. Since we do not have access to the source profile information for these datasets, we cannot obtain initial concept features as designed in § 3.1. As a result, we cannot run the two RoBERTa-base baseline models introduced in the following section on the MAG-PSY/CS datasets.
Evaluation Metrics. Following prior studies zhang2021tmn; Shen2020TaxoExpanST; manzoor2020expanding, we report several widely used ranking metrics: Mean Rank (MR), Mean Reciprocal Rank (MRR), Recall@K and Precision@K. (Following previous works, we report MRR numbers scaled by 10 to amplify the performance differences.)
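The sketch below shows one plausible way to compute these metrics from the ranks of the true parents. The exact definitions are assumptions: Recall@K is taken over all true parents, Precision@K divides the hits by K times the number of queries, and the "scaled by 10" MRR is interpreted as a reciprocal of the rank divided by the scale and rounded up.

```python
import math

def ranking_metrics(true_ranks, ks=(1, 5, 10), mrr_scale=1):
    """true_ranks: per query, the 1-based ranks of its true parents."""
    all_ranks = [r for ranks in true_ranks for r in ranks]
    n_queries = len(true_ranks)
    metrics = {
        "MR": sum(all_ranks) / len(all_ranks),
        "MRR": sum(1 / math.ceil(r / mrr_scale) for r in all_ranks) / len(all_ranks),
    }
    for k in ks:
        hits = sum(r <= k for r in all_ranks)
        metrics[f"Recall@{k}"] = hits / len(all_ranks)
        metrics[f"Precision@{k}"] = hits / (k * n_queries)
    return metrics

# Three hypothetical queries whose single true parent is ranked 1st, 4th and 20th.
results = ranking_metrics([[1], [4], [20]])
scaled = ranking_metrics([[1], [4], [20]], mrr_scale=10)
```

Under these definitions Recall@1 and Precision@1 coincide whenever every query has exactly one true parent, and Precision@1 exceeds Recall@1 when queries have several parents.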
Baseline Models. We compare HyperExpan with the following strong baseline models:
• RoBERTa-base Zero-shot: we use RoBERTa-base as a feature extractor to obtain the initial embeddings as described in § 3.1, without fine-tuning.
• RoBERTa-base FT: the same design, but the language model's parameters are updated.
• Hyperbolic MLP: we concatenate the initial features of the query and anchor concepts and feed them into a two-layer hyperbolic MLP.
• GCN kipf2016semi: HyperExpan's design, but with a Euclidean GCN to update the node embeddings in the ego graph of the anchor concept, fastText to obtain the initial concept features, and a Euclidean MLP as the matching module.
• GAT velivckovic2017graph: the same as the GCN baseline, but with a GAT to update the node embeddings.
We also compare HyperExpan with the following state-of-the-art taxonomy expansion frameworks:
• TaxoExpan Shen2020TaxoExpanST uses GCN and GAT to obtain node embeddings of the ego networks of anchor nodes and averages all node embeddings to obtain the anchor concept representation. Its ego network includes only the direct children and parent of the anchor concept. It also injects relative positional embeddings into the GNN.
• ARBORIST manzoor2020expanding combines global and local taxonomic information to explicitly model heterogeneous and unobserved edge semantics.
• TMN zhang2021tmn uses auxiliary scorers to capture various fine-grained signals, including query-to-hypernym and query-to-hyponym semantics, and introduces a channel-wise gating mechanism to retain task-specific information.
4.2 Experimental Results
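Table 3: Overall experimental results. R@K and P@K denote Recall@K and Precision@K in %.

WordNet-Verb (Candidates #: 11,936)
Model | MR | MRR | R@1 | R@5 | R@10 | P@1 | P@5 | P@10
ARBORIST | 608.7 | 0.280 | 10.8 | 24.0 | 27.7 | 6.7 | 4.8 | 3.2
TaxoExpan | 502.8 | 0.439 | 12.4 | 28.2 | 35.2 | 12.4 | 5.6 | 3.5
TMN | 465.0 | 0.479 | 14.9 | 31.6 | 37.9 | 13.2 | 6.4 | 4.0
RoBERTa-base 0-shot | 2132.8 | 0.172 | 4.3 | 10.1 | 12.6 | 4.3 | 2.0 | 1.3
RoBERTa-base FT | 1535.7 | 0.155 | 2.4 | 6.4 | 9.9 | 2.4 | 1.3 | 1.0
Hyperbolic MLP | 617.4 | 0.419 | 10.5 | 25.6 | 33.7 | 10.5 | 5.1 | 3.4
GCN | 456.9 | 0.445 | 10.9 | 27.2 | 34.5 | 10.9 | 5.4 | 3.5
GAT | 471.7 | 0.449 | 11.6 | 28.7 | 35.6 | 11.6 | 5.7 | 3.6
HyperExpan | 400.8 | 0.517 | 15.0 | 32.8 | 42.7 | 15.0 | 6.6 | 4.3

WordNet-Noun (Candidates #: 81,073)
Model | MR | MRR | R@1 | R@5 | R@10 | P@1 | P@5 | P@10
ARBORIST | 1095.1 | 0.435 | 16.5 | 28.4 | 34.1 | 16.8 | 5.8 | 3.5
TaxoExpan | 649.6 | 0.562 | 19.7 | 38.2 | 47.4 | 20.1 | 7.8 | 4.8
TMN | 501.0 | 0.595 | 20.7 | 40.5 | 50.1 | 21.1 | 8.3 | 5.1
RoBERTa-base 0-shot | 25235.4 | 0.158 | 13.7 | 15.7 | 15.7 | 14.0 | 3.2 | 1.6
RoBERTa-base FT | 27748.2 | 0.148 | 5.9 | 13.7 | 13.7 | 6.0 | 2.8 | 1.4
Hyperbolic MLP | 869.6 | 0.502 | 18.1 | 33.6 | 41.7 | 18.5 | 6.9 | 4.3
GCN | 684.1 | 0.563 | 20.9 | 39.8 | 47.3 | 21.3 | 8.1 | 4.8
GAT | 640.7 | 0.585 | 22.3 | 40.9 | 49.7 | 22.7 | 8.3 | 5.1
HyperExpan | 573.6 | 0.607 | 23.9 | 42.1 | 52.5 | 24.4 | 8.6 | 5.4

MAG-PSY (Candidates #: 21,187)
Model | MR | MRR | R@1 | R@5 | R@10 | P@1 | P@5 | P@10
ARBORIST | 119.9 | 0.722 | 21.0 | 48.4 | 62.9 | 25.8 | 12.5 | 7.7
TaxoExpan | 68.5 | 0.775 | 26.1 | 56.9 | 69.5 | 33.8 | 14.7 | 9.0
TMN | 73.0 | 0.781 | 25.8 | 58.7 | 70.5 | 33.4 | 15.2 | 9.1
Hyperbolic MLP | 74.1 | 0.739 | 21.8 | 51.4 | 64.9 | 28.2 | 13.3 | 8.4
GCN | 51.4 | 0.742 | 23.8 | 52.5 | 64.3 | 30.8 | 13.6 | 7.4
GAT | 48.6 | 0.751 | 23.6 | 52.4 | 65.8 | 30.5 | 13.5 | 8.5
HyperExpan | 38.4 | 0.827 | 28.8 | 63.0 | 75.3 | 37.2 | 16.3 | 9.7

MAG-CS (Candidates #: 22,754)
Model | MR | MRR | R@1 | R@5 | R@10 | P@1 | P@5 | P@10
ARBORIST | 284.7 | 0.602 | 15.1 | 38.9 | 49.4 | 24.6 | 12.6 | 8.0
TaxoExpan | 189.8 | 0.661 | 15.9 | 42.9 | 55.4 | 25.8 | 13.9 | 9.0
TMN | 160.5 | 0.667 | 16.0 | 43.1 | 56.3 | 26.0 | 14.0 | 9.1
Hyperbolic MLP | 101.4 | 0.650 | 13.7 | 38.0 | 53.4 | 22.3 | 12.4 | 8.7
GCN | 90.3 | 0.653 | 14.5 | 39.6 | 53.3 | 23.6 | 12.9 | 8.7
GAT | 92.2 | 0.676 | 15.9 | 41.9 | 56.0 | 25.9 | 13.6 | 9.1
HyperExpan | 74.4 | 0.689 | 16.1 | 44.6 | 58.0 | 26.1 | 14.5 | 9.4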
The overall experimental results are shown in Table 3. Implementation details and hyperparameter settings are given in Appendix B.
Among ARBORIST, TaxoExpan and TMN, TMN consistently achieves the strongest results. Note that TMN is trained on the taxonomy completion task and only performs inference on the taxonomy expansion task; its anchor node representations are learned jointly with different potential children of the query concept, which provides fine-grained signals. TaxoExpan performs better than ARBORIST, showing the importance of neighborhood information. Experiments using RoBERTa-base on the two WordNet datasets indicate that the RoBERTa language model falls drastically behind on this context-independent task. Since the profile sentences are very short and the task relies more on commonsense than on context understanding, language models cannot benefit from contextualized representations, which corroborates the observation by liu2020towards. Hyperbolic MLP is worse than the GNN models since it does not use neighborhood profile information, but it outperforms ARBORIST by a large margin on all datasets. The comparison between GCN and GAT indicates that attentive aggregation is more helpful with the sparse neighborhood representation. Comparing HyperExpan with GCN and GAT, we observe that the expressiveness of the hyperbolic space leads to a large performance increase (9.5% and 6.9% recall@10 increase on MAG-PSY and WordNet-Verb, and an increase in MRR scaled by 10 ranging from 0.013 to 0.076). Overall, HyperExpan consistently outperforms all models across the four datasets, except for MR on WordNet-Noun.

Table 4: Ablation study results. "i/o" stands for "instead of"; "–" marks values that are not available.
# | Model | MR | MRR | R@1 | R@5 | R@10 | P@1 | P@5 | P@10
1 | w/o trainable curvature | 416.7 | 0.490 | 14.4 | 31.7 | 40.8 | 14.4 | 6.3 | 4.1
2 | Poincaré i/o Lorentz model | – | 0.494 | – | – | 39.8 | 13.0 | – | –
3 | fastText i/o Poincaré GloVe | 479.1 | 0.494 | 15.2 | 32.5 | 41.0 | 15.2 | 6.5 | 4.1
4 | anchor + parent + children | – | 0.506 | – | – | 42.2 | 15.0 | – | –
5 | #4 + anchor’s ancestors | 446.7 | 0.505 | 15.5 | 33.6 | 42.5 | 15.5 | 6.7 | 4.3
6 | #5 + anchor’s descendants | – | 0.517 | – | – | 42.7 | 15.0 | – | –
7 | #6 + anchor’s siblings | 422.2 | 0.502 | 14.5 | 32.1 | 41.7 | 14.5 | 6.4 | 4.2
8 | w/o Relative Pos Emb | – | 0.497 | – | – | 40.8 | 13.0 | – | –
9 | w/o Absolute Pos Emb | 468.7 | 0.503 | 14.3 | 33.4 | 41.2 | 14.3 | 6.7 | 4.1
10 | w/o both Positional Emb | – | 0.482 | – | – | 38.8 | 12.5 | – | –
11 | HyperExpan | 400.8 | 0.517 | 15.0 | 32.8 | 42.7 | 15.0 | 6.6 | 4.3
To help understand the contribution of the different techniques, we present a series of ablation study results in Table 4. Specifically, we make the following observations:
Lines 1–3 examine the geometry and the input features. The trainable curvature learns a fine-grained, suitable manifold setting and leads to an improvement of almost 2% in recall@10 (line 1). Replacing the default Lorentz model with the Poincaré model notably hinders performance (line 2), which can be explained by the better numerical stability of the Lorentz model, since the distance function of the Poincaré model contains a fraction Chami2019HyperbolicGC; Peng2021HyperbolicDN. We replace the Poincaré GloVe initial word embeddings with fastText in line 3, and the result shows that Poincaré GloVe provides better features for our task.
We explore different choices of the aggregation neighborhood in lines 4–7. We observe that 2-hop neighborhood aggregation leads to an improvement over 1-hop in terms of recall@10 and precision@1 (line 5). Adding the descendants of the anchor node further helps characterize the nodes (line 6). However, we observe a noticeable drop when we further add sibling nodes to the aggregation neighborhood (line 7). A potential reason is that the sibling nodes are very diverse and thus not closely related to the anchor node.
In lines 8 to 10, we investigate the effect of the positional embeddings. Removing the relative position embeddings causes a larger performance drop (line 8) than removing the absolute position embedding (line 9). We hypothesize that the absolute position (depth information) is provided implicitly in the ego graph by the edges among concepts. Line 10 shows that both embeddings are essential, together yielding a gain of almost 4% in recall@10.
5 Related Works
Our work is connected to two lines of research.
Taxonomy Expansion
The taxonomy expansion task fits the real-world application scenario of automatically attaching new concepts or terms to a human-curated seed taxonomy vedula2018enriching. Traditional methods leverage predefined patterns to extract hypernym-hyponym pairs for taxonomy expansion nakashole2012patty; jiang2017metapad; agichtein2000snowball. Some works use external data to expand a taxonomy in a specific domain. For example, toral2008named use Wikipedia named entities to expand WordNet, and wang2014hierarchical use query logs to expand a search engine category taxonomy. Other works expand a generic taxonomy without using external resources. For example, shwartz2016improving encode taxonomy traversal paths to capture the dependency between concepts, Shen2020TaxoExpanST use a GNN model for this task, and ARBORIST manzoor2020expanding produces concept representations using signals from both edge semantics and surface forms of concepts. STEAM yu2020steam formulates the taxonomy expansion task as a mini-path-based prediction task and introduces a co-training process for semi-supervised learning. Recently, zhang2021tmn propose the taxonomy completion task, in which a new concept can be inserted between existing concepts in a taxonomy. zhang2021tmn also introduce the TMN model, whose auxiliary scorers capture different fine-grained signals. Compared with these methods, which use Euclidean space, HyperExpan uses hyperbolic representation learning to provide a feature space with low distortion, especially for lower-level concepts in taxonomies.
Hyperbolic Representation Learning
nickel2017poincare present an efficient algorithm to learn embeddings in a supervised manner based on Riemannian optimization and show that it performs well on link prediction even with a smaller dimension. Ganea2018HyperbolicNN present common neural network operations in hyperbolic space, and Liu2019HyperbolicGN extend GNN operations to Riemannian manifolds with differentiable exponential and logarithmic maps. Most related to our work, Chami2019HyperbolicGC derive the operations of the graph convolutional network (GCN) in the Lorentz model of hyperbolic space. Hyperbolic representation learning has been broadly applied to lexical representations dhingra2018embedding; tifrea2018poincare; zhuetal2020hypertext, organizational chart induction chen2019embedding, hierarchical classification lopezstrube2020fully; chen2020hyperbolic, knowledge association sun2020knowledge, knowledge graph completion wang2021mixed; balazevic2019multi and event prediction suris2021hyperfuture. A more comprehensive summary is given in a recent survey by Peng2021HyperbolicDN.
Some studies, which are connected to this work, leverage hyperbolic representation learning to perform taxonomy extraction from text. Such studies use Poincaré embeddings trained on hypernymy pairs extracted by lexical-syntactic patterns hearst1992automatic to infer missing nodes le2019inferring and refine pre-existing taxonomies alyetal2019every. The patterns suffer from missing and incorrect extractions, and are dedicated to capturing hypernymy relations between nouns. Hence, only terms that are recognizable by the designed patterns can be attached to the taxonomy. These works rely solely on the graph structure of the taxonomy to obtain hyperbolic embeddings of known concepts, and cannot handle emerging, unseen concepts using their profile information. This is one of the problems addressed in this work.
6 Conclusion and Future Work
We present HyperExpan, a taxonomy expansion model that better preserves the taxonomical structure in an expressive hyperbolic space. We use an HGNN to incorporate neighborhood information and positional features of concepts, as well as profile features that are essential to jump-start zero-shot concept representations. Experimental results on WordNet and Microsoft Academic Graph taxonomies show that HyperExpan performs better than its Euclidean counterparts and consistently outperforms state-of-the-art taxonomy expansion models. In the future, we plan to extend HyperExpan to inducing dynamic taxonomies zhu2021learning and to taxonomy alignment sun2020knowledge.
Acknowledgments
Many thanks to Liunian Harold Li, Fan Yin, I-Hung Hsu, Rujun Han and Shuowei Jin for contribution, discussion and internal reviews, to lab members at the PLUS lab and UCLA-NLP for suggestions, and to the anonymous reviewers for their feedback. This material is based on research supported by DARPA under agreement number FA87501920500. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.
Ethical Considerations
This work does not present any direct societal consequence. The proposed method aims at improving representation learning to support automated expansion of taxonomies. We believe this study has intellectual merit in advancing automated knowledge acquisition for constructing knowledge representations with complex or sparse structures. It could also potentially lead to broad impacts, as the obtained taxonomical knowledge representations can support various knowledge-driven tasks. It is important to note that the precision of the top taxonomy expansion predictions is still not high even with the state-of-the-art method, so human validation is needed before a taxonomy generated by the automated method is used in real-world applications. All experiments are conducted on open datasets.
References
Appendix A Graph Convolutional Networks
Graph convolutional network (GCN) kipf2016semi is a widely used variant of graph neural network. GCN defines one hop of graph message passing as a combination of feature transformation and neighborhood aggregation at a single layer $\ell$. The input feature transformation is defined as:
$$\mathbf{h}_i^{\ell,E} = \mathbf{W}^{\ell}\mathbf{x}_i^{\ell-1,E} + \mathbf{b}^{\ell},$$
where $\mathbf{W}^{\ell}$ and $\mathbf{b}^{\ell}$ are learnable weight and bias parameters for layer $\ell$. The neighborhood aggregation is then defined as:
$$\mathbf{x}_i^{\ell,E} = \sigma\Big(\mathbf{h}_i^{\ell,E} + \sum_{j \in \mathcal{N}(i)} w_{ij}\,\mathbf{h}_j^{\ell,E}\Big),$$
where $\mathcal{N}(i)$ is the set of neighboring nodes of node $i$, $w_{ij}$ denotes the scores for a weighted aggregation, i.e. how important node $j$ is for node $i$, and $\sigma$ is a nonlinear activation function. By cascading multiple GCN layers, messages can be propagated over several hops of neighborhoods. The node embeddings in the graph are updated during the training process. Note that the superscript $E$ in the above equations indicates that the features lie in Euclidean space, i.e. the aggregation is performed in a Euclidean space.
Appendix B Implementation Details
All the models in this work are trained on a single Nvidia A100 GPU (https://www.nvidia.com/en-us/data-center/a100/) running Ubuntu 20.04.2. The hyperparameters of each model are manually tuned for each dataset, and the checkpoints used for evaluation are those that perform best on the development set.
Our entire codebase is implemented in PyTorch (https://pytorch.org/). The implementations of the transformer-based models are extended from the huggingface (https://github.com/huggingface/transformers) code base wolfetal2020transformers. The implementations of the compared models, i.e. TMN, TaxoExpan and ARBORIST, are obtained and adapted from the code repositories released by the original authors.
B.1 Hyperparameters
We list the hyperparameters used throughout this work and the search bounds for manual hyperparameter tuning in Table 5.
We set the number of burn-in epochs to 20, during which we use a learning rate of 1e-5; after the burn-in epochs, the learning rate is 1e-3 with a ReduceLROnPlateau scheduler with a patience of 10 epochs. For each positive sample, we generate 31 negative samples. The dimension of the anchor concept representation (the output dimension of the HGNN) is set to 100. We use two GNN layers by default. We use the stochastic Riemannian Adam optimizer geoopt2020kochurov; nickel2017poincare. For the absolute and relative positional embeddings, we use 50 dimensions by default. We use the MRR on the validation set as the metric to monitor for early stopping.
Table 5: Search bounds for manual hyperparameter tuning.
Type | Batch Size | Initial LR
Bound (lower–upper) | 8–128 | –
Number of Trials | 2–4 | 2–3