1 Introduction
Real-world entities and their interactions are often represented as graphs. Given two or more graphs created in different domains, graph matching aims to find a correspondence across the graphs. This task is important for many applications, e.g., matching the protein networks of different species (Sharan & Ideker, 2006; Singh et al., 2008), linking accounts in different social networks (Zhang & Philip, 2015), and feature matching in computer vision (Cordella et al., 2004; Zanfir & Sminchisescu, 2018). However, because it is NP-hard, graph matching is challenging and often solved heuristically. Further complicating matters, the observed graphs may be noisy (e.g., containing unreliable edges), which leads to unsatisfying matching results with traditional methods.

A problem related to graph matching is the learning of node embeddings, which aims to learn a latent vector for each graph node; the collection of embeddings approximates the topology of the graph, with similar/related nodes nearby in embedding space. Learning suitable node embeddings is beneficial for graph matching, as one may seek to align two or more graphs according to the metric structure associated with their node embeddings. Although graph matching and node embedding are highly related tasks, in practice they are often treated and solved independently. Existing node embedding methods (Perozzi et al., 2014; Tang et al., 2015; Grover & Leskovec, 2016) are designed for a single graph; applying such methods separately to multiple graphs shares no information across the graphs and, hence, is less helpful for graph matching. Most graph matching methods rely purely on topological information (i.e., the adjacency matrices of the graphs) and ignore the potential functionality of node embeddings (Kuchaiev et al., 2010; Neyshabur et al., 2013; Nassar et al., 2018). Although some methods consider first deriving embeddings for each graph and then learning a transformation between the embeddings, their results are often unsatisfying because their embeddings are predefined and the transformations are limited to orthogonal projections (Grave et al., 2018) or rigid/non-rigid deformations (Myronenko & Song, 2010).
This paper considers the joint goal of graph matching and learning node embeddings, seeking to achieve improvements in both tasks. As illustrated in Fig. 1, to address this goal we propose a novel Gromov-Wasserstein learning framework. The dissimilarity between two graphs is measured by the Gromov-Wasserstein discrepancy (GW discrepancy), which compares the distance matrices of different graphs in a relational manner and learns an optimal transport between the nodes of the graphs. The learned optimal transport indicates the correspondence between the graphs. The embeddings of the nodes of different graphs are learned jointly: the distance between the embeddings within the same graph should approach the distance matrix derived from data, and the distance between the embeddings across graphs should reflect the correspondence indicated by the learned optimal transport. As a result, the objectives of graph matching and node embedding are unified as minimizing the Gromov-Wasserstein discrepancy (Peyré et al., 2016) between two graphs, with structural regularizers. This framework leads to an optimization problem that is solved via an iterative process. In each iteration, the embeddings are used to estimate distance matrices when learning the optimal transport, and the learned optimal transport regularizes the learning of embeddings in the next iteration.
There are two important benefits to tackling graph matching and node embedding jointly. First, the observed graphs often contain spurious edges or miss useful edges, leading to noisy adjacency matrices and unreliable matching results. Treating the distance between learned node embeddings as information complementary to the observed edges, we can approximate the topology of a graph more robustly and, accordingly, match noisy graphs. Second, as shown in Figure 1, our method regularizes the GW discrepancy and learns the embeddings of different graphs on the same manifold,¹ instead of learning an explicit transformation between the embeddings with predefined constraints. Therefore, the proposed method is more flexible and has a lower risk of model misspecification (e.g., imposing incorrect constraints on the transformation); the distance between the embeddings of different graphs can be calculated directly without any additional transformation. We test our method on real-world matching problems and analyze its performance, including its convergence, consistency and scalability. Experiments show that our method obtains encouraging matching results, with comparisons made to alternative approaches.

¹The GW discrepancy is applicable to embeddings on different manifolds, even those with different dimensions. However, imposing the proposed regularizers keeps the embeddings on the same manifold, reducing the difficulty of matching.
2 Gromov-Wasserstein Learning Framework
Assume we have two sets of entities (nodes), denoted as the source set V_s and the target set V_t. Without loss of generality, we assume |V_s| ≤ |V_t|. For each set, we observe a set of interactions between its entities, E_s = {(v_i, v_j, w_ij) | v_i, v_j ∈ V_s}, where w_ij counts the appearances of the interaction (v_i, v_j); E_t is defined in the same way. Accordingly, the data of these entities can be represented as two graphs, denoted G_s(V_s, E_s) and G_t(V_t, E_t), and we focus on the following two tasks: i) find a correspondence between the graphs; ii) obtain node embeddings X_s and X_t of the two graphs. As discussed above, these two tasks are unified in a framework based on the Gromov-Wasserstein discrepancy.
2.1 Gromov-Wasserstein discrepancy
The Gromov-Wasserstein discrepancy was proposed in (Peyré et al., 2016) as a natural extension of the Gromov-Wasserstein distance (Mémoli, 2011). Specifically, the Gromov-Wasserstein distance is defined as follows:

Definition 2.1.
Let (X, d_X, μ_X) and (Y, d_Y, μ_Y) be two metric measure spaces, where (X, d_X) is a compact metric space and μ_X is a Borel probability measure on X (with (Y, d_Y, μ_Y) defined in the same way). The Gromov-Wasserstein distance d_GW(μ_X, μ_Y) is

d_GW(μ_X, μ_Y) := inf_{π ∈ Π(μ_X, μ_Y)} ∫∫ L(d_X(x, x'), d_Y(y, y')) dπ(x, y) dπ(x', y'),

where L is the loss function and Π(μ_X, μ_Y) is the set of all probability measures on X × Y with μ_X and μ_Y as marginals.

This defines an optimal-transport-like distance (Villani, 2008) by comparing the metric spaces directly: it calculates distances between pairs of samples within each domain and measures how these distances compare to those in the other domain. In other words, it does not require one to compare samples across different spaces directly, and the spaces can have different dimensions. When d_X and d_Y are replaced with dissimilarity measurements rather than strict distance metrics, and the loss function is defined more flexibly, e.g., as mean-square-error (MSE) or KL-divergence, we relax the Gromov-Wasserstein distance to the proposed Gromov-Wasserstein discrepancy. These relaxations make the proposed Gromov-Wasserstein learning framework suitable for a wide range of machine learning tasks, including graph matching.
In graph matching, a metric measure space corresponds to a pair (C, μ) associated with a graph G(V, E), where C = [c_ij] ∈ R^{|V|×|V|} is a distance/dissimilarity matrix derived from the interaction set E, i.e., each c_ij is a function of w_ij. The empirical distribution of nodes, denoted μ = [μ_i], counts the appearance of each node in E. Given two graphs G_s(V_s, E_s) and G_t(V_t, E_t), the GW discrepancy between (C_s, μ_s) and (C_t, μ_t) is defined as

d_gw(μ_s, μ_t) := min_{T ∈ Π(μ_s, μ_t)} Σ_{i,j,i',j'} L(c^s_{ij}, c^t_{i'j'}) T_{ii'} T_{jj'} = min_{T ∈ Π(μ_s, μ_t)} ⟨L(C_s, C_t, T), T⟩.   (1)

Here, Π(μ_s, μ_t) = {T ∈ R^{|V_s|×|V_t|}_{≥0} | T 1_{|V_t|} = μ_s, Tᵀ 1_{|V_s|} = μ_t}. L(·, ·) is an element-wise loss function, with typical choices the square loss L(a, b) = (a − b)² and the KL-divergence L(a, b) = a log(a/b) − a + b. Accordingly, L(C_s, C_t, T) = [Σ_{j,j'} L(c^s_{ij}, c^t_{i'j'}) T_{jj'}] ∈ R^{|V_s|×|V_t|}, and ⟨·, ·⟩ represents the inner product of matrices; the minimizer T* is the optimal transport between the nodes of the two graphs, and its element T*_{ii'} represents the probability that v_i ∈ V_s matches v_{i'} ∈ V_t. By choosing the largest T*_{ii'} for each v_i, we find the correspondence that minimizes the GW discrepancy between the two graphs.
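As a toy illustration (not the paper's implementation), the objective in (1) with the square loss, and the matching rule based on the largest transport values, can be written directly in NumPy; the quadruple loop is only feasible for very small graphs:

```python
import numpy as np

def gw_objective(Cs, Ct, T):
    """Naive evaluation of <L(Cs, Ct, T), T> in (1) with the square
    loss L(a, b) = (a - b)^2.  Cs: (ns, ns), Ct: (nt, nt), T: (ns, nt)."""
    ns, nt = T.shape
    total = 0.0
    for i in range(ns):
        for j in range(ns):
            for k in range(nt):
                for l in range(nt):
                    total += (Cs[i, j] - Ct[k, l]) ** 2 * T[i, k] * T[j, l]
    return total

def match_from_transport(T):
    """Match each source node to the target node with the largest
    transport probability, as described after (1)."""
    return np.argmax(T, axis=1)
```

For two identical graphs coupled by a scaled identity transport, the objective is zero, and the matching recovers the identity correspondence.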
However, such a graph matching strategy raises several issues. First, for each graph, the observed interaction set can be noisy, which leads to an unreliable distance matrix. Minimizing the GW discrepancy based on such distance matrices has a negative influence on the matching results. Second, the Gromov-Wasserstein discrepancy compares graphs relationally based on their edges (i.e., the distances between pairs of nodes within each graph), while most existing graph matching methods consider the information of nodes and edges jointly (Neyshabur et al., 2013; Vijayan et al., 2015; Sun et al., 2015). Therefore, to build a successful graph matching method, we further consider the learning of node embeddings and derive the proposed Gromov-Wasserstein learning framework.
2.2 Proposed model
We propose to not only learn the optimal transport indicating the correspondence between graphs, but to simultaneously learn the node embeddings of each graph, which leads to a regularized Gromov-Wasserstein discrepancy. The corresponding optimization problem is
min_{X_s, X_t} min_{T ∈ Π(μ_s, μ_t)} ⟨L(C_s(X_s), C_t(X_t), T), T⟩ + α ⟨K(X_s, X_t), T⟩ + β R(X_s, X_t).   (2)
The first term in (2) corresponds to the GW discrepancy defined in (1), which measures the relational dissimilarity between the two graphs. The difference here is that the proposed distance matrices consider both the information of observed data and that of embeddings:
C_s(X_s) = (1 − α) C_s + α K(X_s, X_s), and C_t(X_t) = (1 − α) C_t + α K(X_t, X_t).   (3)

Here K(X, X') = [k(x_i, x'_j)] is a distance matrix, with element k(x_i, x'_j) a function measuring the distance between node embeddings; α ∈ [0, 1] is a hyperparameter controlling the contribution of the embedding-based distance to the proposed distance.
The second term in (2) represents the Wasserstein discrepancy between the nodes of the two graphs. Similar to the first term, the distance matrix K(X_s, X_t) is also derived from the node embeddings, and its contribution is controlled by the same hyperparameter α. This term measures the absolute dissimilarity between the two graphs, which connects the target optimal transport with the node embeddings. By adding this term, the optimal transport minimizes both the Gromov-Wasserstein discrepancy based directly on observed data and the Wasserstein discrepancy based on the embeddings (which are indirectly also a function of the data). Furthermore, the embeddings of different graphs can be learned jointly under the guidance of the optimal transport: the distance between the embeddings of different graphs should be consistent with the relationship indicated by the optimal transport.
Because the target optimal transport is often sparse, purely considering its guidance leads to overfitting or trivial solutions when learning the embeddings. To mitigate this problem, the third term in (2) represents a regularization of the embeddings, based on the prior information provided by C_s and C_t. In this work, we require the embedding-based distance matrices to be close to the observed ones:
R(X_s, X_t) = L(K(X_s, X_s), C_s) + L(K(X_t, X_t), C_t) [+ L(K(X_s, X_t), C_st)],   (4)

where the loss function L is applied element-wise and summed, with the same definition as that used in (1). Note that if we observe partial correspondences between the graphs, i.e., some pairs (v_i, v_j) ∈ V_s × V_t known to match, we can calculate a distance matrix for the nodes of different graphs, denoted C_st, and require the distance between the embeddings across graphs to match C_st, as shown in the optional term of (4). This term is available only when C_st is given.
The proposed method unifies (optimal transportbased) graph matching and node embedding in the same framework, and makes them beneficial to each other. For the original GW discrepancy term, introducing the embeddingbased distance matrices can suppress the noise in the datadriven distance matrices, improving robustness. Additionally, based on node embeddings, we can calculate the Wasserstein discrepancy between graphs, which further regularizes the target optimal transport directly. When learning node embeddings, the Wasserstein discrepancy term works as the regularizer of node embeddings — the values of the learned optimal transport indicate which pairs of nodes should be close to each other.
3 Learning Algorithm
3.1 Learning optimal transport
Although (2) is a complicated non-convex optimization problem, we can solve it effectively by alternately learning the optimal transport and the embeddings. In particular, the proposed method applies nested iterative optimization. In the m-th outer iteration, given the current embeddings X_s^(m) and X_t^(m), we solve the following subproblem:
T^(m+1) = argmin_{T ∈ Π(μ_s, μ_t)} ⟨L(C_s(X_s^(m)), C_t(X_t^(m)), T), T⟩ + α ⟨K(X_s^(m), X_t^(m)), T⟩.   (5)
This subproblem is still non-convex because of the quadratic term ⟨L(C_s(X_s^(m)), C_t(X_t^(m)), T), T⟩. We solve it iteratively with the help of a proximal point method. Inspired by the method in (Xie et al., 2018), in the n-th inner iteration we update the target optimal transport via
T^(n+1) = argmin_{T ∈ Π(μ_s, μ_t)} ⟨L(C_s(X_s^(m)), C_t(X_t^(m)), T^(n)) + α K(X_s^(m), X_t^(m)), T⟩ + γ KL(T ‖ T^(n)).   (6)
Here, a proximal term based on the Kullback-Leibler (KL) divergence, KL(T ‖ T^(n)) = Σ_{i,j} T_{ij} log(T_{ij} / T^(n)_{ij}) − T_{ij} + T^(n)_{ij}, is added as a regularizer. We use projected gradient descent to solve (6), in which both the gradient and the projection are based on the KL metric. When the learning rate is set as 1/γ, the projected gradient descent is equivalent to solving the following optimal transport problem with an entropy regularizer (Benamou et al., 2015; Peyré et al., 2016):
min_{T ∈ Π(μ_s, μ_t)} ⟨C^(m,n), T⟩ + γ ⟨T, log T⟩,   (7)

where C^(m,n) = L(C_s(X_s^(m)), C_t(X_t^(m)), T^(n)) + α K(X_s^(m), X_t^(m)) − γ log T^(n), and ⟨T, log T⟩ = Σ_{i,j} T_{ij} log T_{ij}. This problem can be solved via the Sinkhorn-Knopp algorithm (Sinkhorn & Knopp, 1967; Cuturi, 2013) with linear convergence.
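As a hedged sketch (function and variable names are ours, not the paper's), the entropic subproblem (7) can be solved with a few lines of Sinkhorn-Knopp scaling:

```python
import numpy as np

def sinkhorn(C, mu_s, mu_t, gamma=0.1, n_iter=200):
    """Solve min_{T in Pi(mu_s, mu_t)} <C, T> + gamma <T, log T>
    by Sinkhorn-Knopp scaling (Sinkhorn & Knopp, 1967; Cuturi, 2013)."""
    K = np.exp(-C / gamma)              # Gibbs kernel
    a = np.ones_like(mu_s)
    for _ in range(n_iter):
        b = mu_t / (K.T @ a)            # scale columns toward mu_t
        a = mu_s / (K @ b)              # scale rows toward mu_s
    return a[:, None] * K * b[None, :]  # transport plan diag(a) K diag(b)
```

The returned plan satisfies the marginal constraints of Π(μ_s, μ_t) up to numerical tolerance.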
In summary, we decompose (5) into a series of updating steps. Each updating step (6) can be solved via projected gradient descent, i.e., as a regularized optimal transport problem (7). Essentially, the proposed method can be viewed as a special case of successive upper-bound minimization (SUM) (Razaviyayn et al., 2013), whose global convergence is guaranteed:
Proposition 3.1.
Every limit point of the sequence generated by our proximal point method, i.e., {T^(n)}, is a stationary point of problem (5).
Note that besides our proximal point method, another way to solve (5) is to replace the KL-divergence in (6) with an entropy regularizer and minimize an entropic GW discrepancy via iterative Sinkhorn projection (Peyré et al., 2016). However, its performance (e.g., its convergence and numerical stability) is more sensitive to the choice of the hyperparameter γ. The details of our proximal point method, the proof of Proposition 3.1, and its comparison with the Sinkhorn method (Peyré et al., 2016) are shown in the Supplementary Material.
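To make the inner loop concrete, the following self-contained sketch runs the proximal point iteration (6)-(7) for the GW term alone (square loss, embedding terms dropped, i.e., α = 0); the variable names are ours, and T is clipped before the logarithm for numerical safety:

```python
import numpy as np

def sinkhorn(C, mu_s, mu_t, gamma, n_iter=200):
    # Sinkhorn-Knopp scaling for the entropic OT subproblem (7)
    K = np.exp(-C / gamma)
    a = np.ones_like(mu_s)
    for _ in range(n_iter):
        b = mu_t / (K.T @ a)
        a = mu_s / (K @ b)
    return a[:, None] * K * b[None, :]

def gw_proximal_point(Cs, Ct, mu_s, mu_t, gamma=0.1, n_outer=10):
    """Minimize <L(Cs, Ct, T), T> over Pi(mu_s, mu_t) with the square
    loss, updating T by one entropic OT solve per inner iteration."""
    T = np.outer(mu_s, mu_t)                       # independent coupling
    for _ in range(n_outer):
        # fast loss matrix for the square loss (Peyre et al., 2016):
        # L(Cs, Ct, T) = (Cs**2) mu_s 1' + 1 ((Ct**2) mu_t)' - 2 Cs T Ct'
        L = ((Cs ** 2) @ mu_s)[:, None] + ((Ct ** 2) @ mu_t)[None, :] \
            - 2.0 * Cs @ T @ Ct.T
        # KL-proximal step: entropic OT with cost L - gamma * log(T)
        T = sinkhorn(L - gamma * np.log(np.clip(T, 1e-16, None)),
                     mu_s, mu_t, gamma)
    return T
```

Each outer pass linearizes the quadratic GW term at the current T and re-solves a regularized transport problem, mirroring the update (6).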
The parameter α controls the influence of the node embeddings on the GW discrepancy and the Wasserstein discrepancy. When training the proposed model from scratch, the embeddings X_s and X_t are initialized randomly and thus are unreliable in the beginning. Therefore, we initialize α with a small value and increase it with the number of outer iterations. We apply a simple linear strategy to adjust α: with the maximum number of outer iterations set as M, in the m-th iteration we set α_m = m/M.
3.2 Updating embeddings
Given the optimal transport T^(m+1), we update the embeddings by solving the following optimization problem:
min_{X_s, X_t} ⟨L(C_s(X_s), C_t(X_t), T^(m+1)), T^(m+1)⟩ + α ⟨K(X_s, X_t), T^(m+1)⟩ + β R(X_s, X_t).   (8)
This problem can be solved effectively by (stochastic) gradient descent. In summary, the proposed learning algorithm is shown in Algorithm 1.

3.3 Implementation details and analysis
Distance matrix. The distance matrix plays an important role in our Gromov-Wasserstein learning framework. For a graph, the data-driven distance matrix should reflect its structure. Based on the fact that the counts of interactions in many real-world graphs are characterized by Zipf's law (Powers, 1998), we treat the counts as the weights of edges and define each element of the data-driven distance matrix as

c_ij = 1 / (w_ij + 1).   (9)
This definition assigns a short distance to pairs of nodes with many interactions. Additionally, we hope that the embedding-based distance matrix can fit the data-driven distance matrix easily. In the following experiments, we test two kinds of embedding-based distance: 1) a cosine-based distance, whose scaling is chosen such that its maximum approaches that of the data-driven distance; and 2) a radial basis function (RBF)-based distance with a fixed bandwidth. The following experiments show that these two distances work well in various matching tasks.

Complexity and scalability. When learning the optimal transport, one of the most time-consuming steps is computing the loss matrix L(C_s, C_t, T), which involves a tensor-matrix multiplication. Fortunately, as shown in (Peyré et al., 2016), when the loss function can be written as L(a, b) = f_1(a) + f_2(b) − h_1(a) h_2(b) for suitable functions f_1, f_2, h_1, h_2, which is satisfied by our MSE/KL loss, the loss matrix can be calculated as L(C_s, C_t, T) = f_1(C_s) μ_s 1ᵀ + 1 μ_tᵀ f_2(C_t)ᵀ − h_1(C_s) T h_2(C_t)ᵀ. Because T tends to become sparse quickly during the learning process, the last term can exploit this sparsity, reducing the computational cost accordingly. For d-dimensional node embeddings, computing an embedding-based distance matrix costs O(|V|² d) operations. Additionally, we can apply the inexact proximal point method (Xie et al., 2018; Chen et al., 2018a), running a one-step Sinkhorn-Knopp projection in each inner iteration. When learning node embeddings, we apply stochastic gradient descent to solve (8); in our experiments, the objective of (8) converges quickly after a few epochs, and the cost of computing the embedding-based distance sub-matrix of each node batch may be ignored compared to that of learning the optimal transport. Both the learning of the optimal transport and that of the node embeddings can be done in parallel on GPUs.

According to the above analysis, the proposed method has lower complexity than many existing graph matching methods. For example, GRAAL and its variants (Malod-Dognin & Pržulj, 2015) are much slower than the proposed method. Additionally, the complexity of our method is independent of the number of edges. Compared to other well-known alternatives, e.g., NETAL, our method has at least comparable complexity for dense graphs.
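The exact scaling constants of the two embedding-based distances are elided in the text above; the following sketch uses common forms of a cosine-based and an RBF-based distance (our own parameterization, e.g., the bandwidth sigma is an assumption):

```python
import numpy as np

def cosine_distance_matrix(X, Y):
    """Cosine-based distance between embedding sets X: (n, d), Y: (m, d):
    k(x, y) = 1 - <x, y> / (||x|| ||y||), so aligned directions give 0."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return 1.0 - Xn @ Yn.T

def rbf_distance_matrix(X, Y, sigma=1.0):
    """RBF-based distance k(x, y) = 1 - exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, 1)[:, None] + np.sum(Y ** 2, 1)[None, :] - 2.0 * X @ Y.T
    return 1.0 - np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))
```

Both produce symmetric matrices with zero diagonals when X = Y, so either can serve as K(X_s, X_s), K(X_t, X_t), or K(X_s, X_t) in (2)-(4).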
4 Related Work
Gromov-Wasserstein learning. The Gromov-Wasserstein discrepancy extends optimal transport (Villani, 2008) to the case in which the target domains are not registered well. It can also be viewed as a relaxation of the Gromov-Hausdorff distance (Mémoli, 2008; Bronstein et al., 2010) when a pairwise distance between entities is defined. The GW discrepancy is suitable for solving matching problems like shape and object matching (Mémoli, 2009, 2011). Besides graphics and computer vision, its potential for other applications has recently been investigated, e.g., matching vocabulary sets between different languages (Alvarez-Melis & Jaakkola, 2018) and matching weighted directed networks (Chowdhury & Mémoli, 2018). The work in (Peyré et al., 2016) considers the Gromov-Wasserstein barycenter and proposes a fast Sinkhorn-projection-based algorithm (Cuturi, 2013) to compute the GW discrepancy. Similar to our method, the work in (Vayer et al., 2018) proposes a fused Gromov-Wasserstein distance, combining the GW discrepancy with the Wasserstein discrepancy. However, it does not consider the learning of embeddings, and it requires the distance between entities in different domains to be known, which is inapplicable to matching problems. In (Bunne et al., 2018), an adversarial learning method is proposed to learn a pair of generative models for incomparable spaces, with the GW discrepancy as the objective function. This method imposes an orthogonality assumption on the transformation between a sample and its embedding; it is designed for fuzzy matching between distributions, rather than the graph matching task that requires point-to-point correspondence.
Graph matching. Graph matching has been studied extensively, with a wide range of applications. Focusing on protein-protein interaction (PPI) networks, many methods have been proposed, including methods based on local neighborhood information, like GRAAL (Kuchaiev et al., 2010) and its variants MI-GRAAL (Kuchaiev & Pržulj, 2011) and L-GRAAL (Malod-Dognin & Pržulj, 2015), as well as methods based on global structural information, like IsoRank (Singh et al., 2008), MAGNA++ (Vijayan et al., 2015), NETAL (Neyshabur et al., 2013), HubAlign (Hashemifar & Xu, 2014) and WAVE (Sun et al., 2015). Among these methods, MAGNA++ and WAVE consider both edge and node information. Besides bioinformatics, network alignment techniques are also applied to computer vision (Jun et al., 2017; Yu et al., 2018), document analysis (Bayati et al., 2009) and social network analysis (Zhang & Philip, 2015). For small graphs, e.g., graphs of feature points in computer vision, graph matching is often solved as a quadratic assignment problem (Yan et al., 2015). For large graphs, e.g., social networks and PPI networks, existing methods either depend on heuristic search strategies or leverage domain knowledge for specific cases. None of these methods considers graph matching and node embedding jointly from the viewpoint of the Gromov-Wasserstein discrepancy.
Node embedding. Node embedding techniques have been widely used to represent and analyze graph/network structures. Representative methods include LINE (Tang et al., 2015), DeepWalk (Perozzi et al., 2014), and node2vec (Grover & Leskovec, 2016). Most of these methods first generate sequential observations of nodes through a random-walk procedure, and then learn the embeddings by maximizing the coherency between each observation and its context (Mikolov et al., 2013). The distance between the learned embeddings can reflect the topological structure of the graph. More recently, many new embedding methods have been proposed, e.g., the anonymous walk embedding (Ivanov & Burnaev, 2018) and the mixed membership word embedding (Foulds, 2018), which help to improve the representations of complicated graphs and their nodes. However, none of these methods considers jointly learning embeddings for multiple graphs.
5 Experiments
We apply the Gromov-Wasserstein learning (GWL) method to both synthetic and real-world matching tasks, and compare it with state-of-the-art methods. In our experiments, we set the hyperparameters (the numbers of outer and inner iterations, γ, and β) to fixed values across tasks, and use the MSE loss for L(·, ·). When solving (8), we use Adam (Kingma & Ba, 2014) with a fixed learning rate and a small number of epochs, with mini-batches of nodes. The proposed methods based on the cosine and RBF distances are denoted GWL-C and GWL-R, respectively. Additionally, to highlight the benefit of joint graph matching and node-embedding learning, we consider a baseline that purely minimizes the GW discrepancy based on the data-driven distance matrices (denoted GWD).
5.1 Synthetic data
We verify the feasibility of our GWL method by first considering a synthetic dataset. We simulate the source graph as follows: for each node in V_s, we select a subset of nodes randomly from V_s as its neighbors, and for each selected edge we generate a number of interactions between the two nodes. E_s is then the union of all simulated interactions. The target graph is constructed by first adding noisy nodes to the source graph, i.e., V_t = V_s ∪ V_noise, and then generating noisy edges between the nodes in V_t via the simulation method mentioned above, i.e., E_t = E_s ∪ E_noise.
Under each configuration, we simulate the source and target graphs in multiple trials. For each trial, we apply our method (and its baseline GWD) to match the graphs and calculate the node correctness as our measurement: given the learned correspondence set S and the ground-truth set of correspondences S_real, the percent node correctness is NC = |S ∩ S_real| / |S_real| × 100%. To analyze the rationality of the learned node embeddings, we construct S in two ways: for each v_i ∈ V_s, we find its matched node either via the largest element of the optimal transport (as shown in line 13 of Algorithm 1) or via the nearest target node in the embedding space. Additionally, the corresponding GW discrepancy is calculated as well. Assuming that the results in different trials are Gaussian distributed, we calculate a confidence interval for each measurement.

Figure 2 visualizes the performance of our GWL-C method and its baseline GWD (the performance of GWL-R is almost the same as that of GWL-C, so we show only GWL-C), which demonstrates the feasibility of our method. In particular, when the target graph is identical to the source one, the proposed Gromov-Wasserstein learning framework achieves almost perfect node correctness, and the GW discrepancy approaches zero. As the amount of noise in the target graph increases, the GW discrepancy increases accordingly, which means that the GW discrepancy indeed reflects the dissimilarity between the graphs. Although GWD is comparable to our GWL-C at low noise levels, it becomes much worse at high noise levels. This phenomenon supports our claim that learning node embeddings improves the robustness of graph matching. Moreover, we find that the node correctness based on the optimal transport (blue curves) and that based on the embeddings (orange curves) are almost the same. This demonstrates that the embeddings of the different graphs lie on the same manifold, and their distances indicate the correspondences between the graphs.
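The node-correctness measurement described above can be sketched as follows (the names are ours; the ground truth is a dict from source index to target index):

```python
import numpy as np

def node_correctness(T, ground_truth):
    """Percent of source nodes whose argmax-transport match equals the
    ground-truth correspondence: |S intersect S_real| / |S_real| * 100."""
    pred = np.argmax(T, axis=1)          # matching rule from Algorithm 1
    correct = sum(pred[i] == j for i, j in ground_truth.items())
    return 100.0 * correct / len(ground_truth)
```

The same function applies to the embedding-based matching by passing a negated cross-graph distance matrix in place of T.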
In the above experiments with synthetic data, we demonstrate the feasibility of our method and isolate the advantages of jointly performing graph matching and node-embedding learning. In the experiments below, we make comparisons with many state-of-the-art methods on real data.
5.2 MC3: Matching communication networks
MC3 is a dataset used in Mini-Challenge 3 of the VAST Challenge 2018 (http://vacommunity.org/VAST+Challenge+2018+MC3), which records the communication behavior among a company's employees on different networks. The communications are categorized into two types: phone calls and emails between employees. According to the type of communication, we obtain two networks, denoted CallNet and EmailNet. Because each employee has two independent accounts in these two networks, we aim to link the accounts belonging to the same employee. We test our method on a subset of the MC3 dataset containing the selected employees and their communications through phone calls and emails. In this subset, each selected employee has at least one employee in a network (either CallNet or EmailNet) with frequent communications with him/her, which ensures that each node has at least one reliable edge. Additionally, for each network, we can control the density of its edges by thresholding the count of interactions. When we keep only the edges corresponding to frequent communications, we obtain two sparse graphs; when we keep all the communications and the corresponding edges, we obtain two dense graphs. Generally, experience indicates that matching dense graphs is much more difficult than matching sparse ones.
We compare our methods (GWL-R and GWL-C) with well-known graph matching methods: the graduated assignment algorithm (GAA) (Gold & Rangarajan, 1996), the low-rank spectral alignment (LRSA) (Nassar et al., 2018), TAME (Mohammadi et al., 2017), GRAAL (http://www0.cs.ucl.ac.uk/staff/natasa/GRAAL), MI-GRAAL (http://www0.cs.ucl.ac.uk/staff/natasa/MIGRAAL), MAGNA++ (https://www3.nd.edu/~cone/MAGNA++), and HubAlign and NETAL (both available at http://ttic.uchicago.edu/~hashemifar). These alternatives achieve state-of-the-art performance on matching large-scale graphs, e.g., protein networks. Table 1 lists the matching results obtained by the different methods; for GWD, GWL-R and GWL-C, we show the node correctness calculated based on the learned optimal transport. The performance of the alternative methods on sparse and dense graphs is inconsistent. For example, GRAAL works almost as well as our GWL-R and GWL-C on sparse graphs, but its matching results become much worse on dense graphs. The baseline GWD is inferior to most graph-matching methods on node correctness, because it purely minimizes the GW discrepancy based on the information of pairwise interactions (i.e., edges). Additionally, GWD merely relies on the data-driven distance matrices, which are sensitive to the noise in the graphs. However, when we take node embeddings into account, the proposed GWL-R and GWL-C outperform GWD and the other considered approaches consistently, on both sparse and dense graphs.
Method  Call-Email (Sparse)  Call-Email (Dense) 

NC (%)  NC (%)  
GAA  34.22  0.53 
LRSA  38.20  2.93 
TAME  37.39  2.67 
GRAAL  39.67  0.48 
MIGRAAL  35.53  0.64 
MAGNA++  7.88  0.09 
HubAlign  36.21  3.86 
NETAL  36.87  1.77 
GWD  23.16±0.46  1.77±0.22 
GWL-R  39.64±0.57  3.80±0.23 
GWL-C  40.45±0.53  4.23±0.27 
To demonstrate the convergence and stability of our method, we run GWD, GWL-R and GWL-C in multiple trials with different initializations. For each method, the node correctness is calculated based on both the optimal transport and the embedding-based distance matrix. The confidence interval of the node correctness is estimated as well, as shown in Table 1. We find that the proposed method has good stability and outperforms the other methods with high confidence. Figure 3(a) visualizes the GW discrepancy and the node correctness with respect to the number of outer iterations; the confidence intervals are shown as well. In Figure 3(a), we find that the GW discrepancy decreases while the two kinds of node correctness increase accordingly and become consistent as the iterations proceed, which means that the embeddings we learn and their distances indeed reflect the correspondence between the two graphs. Figure 3(b) visualizes the learned embeddings with the help of t-SNE (Maaten & Hinton, 2008). We find that the learned node embeddings of the different graphs lie on the same manifold, and the overlapping embeddings indicate matched pairs.
5.3 MIMIC-III: Procedure recommendation
Besides typical graph matching, our method has potential for other applications, like recommendation systems. Such systems recommend items to users according to the distance/similarity between their embeddings. Traditional methods (Rendle et al., 2009; Chen et al., 2018b) learn the embeddings of users and items purely from their interactions. Recent work (Monti et al., 2017; Ying et al., 2018) shows that considering the user network and/or the item network helps improve recommendation results. Such a strategy is also applicable to our Gromov-Wasserstein learning framework: given the network of users, the network of items, and the observed interactions between them (i.e., partial correspondences between the graphs), we learn the embeddings of users and items and the optimal transport between them by minimizing the GW discrepancy between the networks. Because the learned embeddings lie on the same manifold, we can calculate the distance between a user and an item directly via the cosine-based or RBF-based distance. Accordingly, we recommend to each user the items with the shortest distances. For our method, the only difference between the recommendation task and the previous graph matching task is that some interactions are observed, so we take the optional regularizer in (4) into account; the C_st in (4) is calculated via (3).
We test the feasibility of our method on the MIMIC-III dataset (Johnson et al., 2016), which contains patient admissions in a hospital. Each admission is represented as a sequence of ICD (International Classification of Diseases) codes of the diseases and the procedures. The diseases (procedures) appearing in the same admission construct the interactions of the disease (procedure) graph. We aim to recommend suitable procedures for patients according to their disease characteristics. To achieve this, we learn embeddings of the ICD codes for the diseases and the procedures with the help of various methods, and measure the distance between the embeddings. We compare the proposed Gromov-Wasserstein learning method with the following baselines: i) treating the admission sequences as sentences and learning the embeddings of ICD codes via traditional word embedding methods like Word2Vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014); ii) the distilled Wasserstein learning (DWL) method of (Xu et al., 2018), which trains the embeddings from scratch or fine-tunes Word2Vec's embeddings based on a Wasserstein topic model; and iii) the GWD method, which minimizes the GW discrepancy purely based on the data-driven distance matrices, and then learns the embeddings regularized by the learned optimal transport. The GWD method is equivalent to applying our GWL method with a single outer iteration. For the GWD method, we also consider the cosine- and RBF-based distances when learning embeddings (i.e., GWD-C and GWD-R).
For fairness of comparison, we use the subset of the MIMIC-III dataset provided by (Xu et al., 2018), which contains patient admissions involving a set of diseases and procedures. For all the methods, we split the admissions into training, validation, and testing sets. In the testing phase, for the $m$th admission we recommend a list of procedures of length $k$, denoted $\hat{\mathcal{L}}_m$, based on its diseases, and evaluate the recommendation against the ground-truth list of procedures, denoted $\mathcal{L}_m$. Given these lists, we calculate the top-$k$ precision, recall, and F1-score as follows: $P = \frac{|\mathcal{L}_m \cap \hat{\mathcal{L}}_m|}{|\hat{\mathcal{L}}_m|}$, $R = \frac{|\mathcal{L}_m \cap \hat{\mathcal{L}}_m|}{|\mathcal{L}_m|}$, and $F1 = \frac{2PR}{P+R}$. Table 2 shows the results of various methods with $k=1$ and $k=5$. We find that our GWL method outperforms the alternatives, especially on the top-1 measurements.
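The top-k precision is the fraction of recommended procedures that appear in the ground-truth list, the recall is the fraction of ground-truth procedures that are recovered, and the F1-score is their harmonic mean. These per-admission metrics can be sketched as follows (the helper name and list representation are ours):

```python
def topk_metrics(recommended, ground_truth):
    """Top-k precision, recall, and F1 for one admission.

    recommended:  list of k recommended procedure codes
    ground_truth: list of ground-truth procedure codes for the admission
    """
    hits = len(set(recommended) & set(ground_truth))
    p = hits / len(recommended) if recommended else 0.0
    r = hits / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```

The reported numbers would then be these quantities averaged over all test admissions.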
We analyze the learned optimal transport between diseases and procedures from a clinical viewpoint. We highlight some matched disease-procedure pairs, e.g., "(d41401) Coronary atherosclerosis of native coronary artery — (p3961) Extracorporeal circulation auxiliary to open heart surgery", "(dV053) Need for prophylactic vaccination and inoculation against viral hepatitis — (p9955) Prophylactic administration of vaccine against other diseases", and "(dV3001) Single liveborn, born in hospital, delivered by cesarean section — (p640) Circumcision". (The descriptions of the ICD codes are listed in the Supplementary Material.) We asked two clinical researchers to check the pairs with the largest values in the learned optimal transport; they confirmed that for most of the pairs, either the procedures are clearly related to the treatments of the diseases, or the procedures clearly lead to the diseases as side effects or complications (other relationships may be less clear, but are implied by the data). The learned optimal transport, all pairs of ICD codes, and their evaluation results are shown in the Supplementary Material.
Table 2: Top-k procedure recommendation results.

Method          | Top-1 P (%) | Top-1 R (%) | Top-1 F1 (%) | Top-5 P (%) | Top-5 R (%) | Top-5 F1 (%)
Word2Vec        | 39.95 | 13.27 | 18.25 | 28.89 | 46.98 | 32.59
GloVe           | 32.66 | 13.01 | 17.22 | 27.93 | 44.79 | 31.47
DWL (Scratch)   | 37.89 | 12.42 | 17.16 | 27.39 | 43.81 | 30.81
DWL (Fine-tune) | 40.00 | 13.76 | 18.71 | 30.59 | 48.56 | 34.28
GWD-R           | 46.29 | 17.01 | 22.32 | 31.82 | 43.81 | 33.77
GWD-C           | 43.16 | 15.79 | 20.77 | 31.42 | 42.99 | 33.25
GWL-R           | 46.20 | 16.93 | 22.22 | 32.03 | 44.75 | 34.18
GWL-C           | 47.46 | 17.25 | 22.71 | 32.09 | 45.64 | 34.31
6 Conclusions and Future Work
We have proposed a Gromov-Wasserstein learning method that unifies graph matching and the learning of node embeddings into a single framework. We show that such joint learning benefits each objective, obtaining superior performance in various matching tasks. In the future, we plan to extend our method to multi-graph matching tasks, which may be related to the Gromov-Wasserstein barycenter (Peyré et al., 2016) and its learning method. Additionally, to improve the scalability of our method, we will explore new Gromov-Wasserstein learning algorithms.
References
 Altschuler et al. (2017) Altschuler, J., Weed, J., and Rigollet, P. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. arXiv preprint arXiv:1705.09634, 2017.
 Alvarez-Melis & Jaakkola (2018) Alvarez-Melis, D. and Jaakkola, T. Gromov-Wasserstein alignment of word embedding spaces. In EMNLP, pp. 1881–1890, 2018.
 Bayati et al. (2009) Bayati, M., Gerritsen, M., Gleich, D. F., Saberi, A., and Wang, Y. Algorithms for large, sparse network alignment problems. In ICDM, pp. 705–710, 2009.
 Benamou et al. (2015) Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., and Peyré, G. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015.
 Bronstein et al. (2010) Bronstein, A. M., Bronstein, M. M., Kimmel, R., Mahmoudi, M., and Sapiro, G. A Gromov-Hausdorff framework with diffusion geometry for topologically-robust non-rigid shape matching. International Journal of Computer Vision, 89(2-3):266–286, 2010.
 Bunne et al. (2018) Bunne, C., AlvarezMelis, D., Krause, A., and Jegelka, S. Learning generative models across incomparable spaces. NeurIPS Workshop on Relational Representation Learning, 2018.

 Chen et al. (2018a) Chen, L., Dai, S., Tao, C., Zhang, H., Gan, Z., Shen, D., Zhang, Y., Wang, G., Zhang, R., and Carin, L. Adversarial text generation via feature-mover's distance. In NIPS, pp. 4671–4682, 2018a.
 Chen et al. (2018b) Chen, X., Xu, H., Zhang, Y., Tang, J., Cao, Y., Qin, Z., and Zha, H. Sequential recommendation with user memory networks. In WSDM, pp. 108–116, 2018b.
 Chowdhury & Mémoli (2018) Chowdhury, S. and Mémoli, F. The Gromov-Wasserstein distance between networks and stable network invariants. arXiv preprint arXiv:1808.04337, 2018.
 Cordella et al. (2004) Cordella, L. P., Foggia, P., Sansone, C., and Vento, M. A (sub) graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(10):1367–1372, 2004.
 Cuturi (2013) Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In NIPS, pp. 2292–2300, 2013.
 Foulds (2018) Foulds, J. Mixed membership word embeddings for computational social science. In AISTATS, pp. 86–95, 2018.
 Gold & Rangarajan (1996) Gold, S. and Rangarajan, A. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):377–388, 1996.
 Grave et al. (2018) Grave, E., Joulin, A., and Berthet, Q. Unsupervised alignment of embeddings with Wasserstein Procrustes. arXiv preprint arXiv:1805.11222, 2018.
 Grover & Leskovec (2016) Grover, A. and Leskovec, J. node2vec: Scalable feature learning for networks. In KDD, pp. 855–864, 2016.
 Hashemifar & Xu (2014) Hashemifar, S. and Xu, J. Hubalign: An accurate and efficient method for global alignment of protein–protein interaction networks. Bioinformatics, 30(17):i438–i444, 2014.
 Ivanov & Burnaev (2018) Ivanov, S. and Burnaev, E. Anonymous walk embeddings. In ICML, 2018.
 Johnson et al. (2016) Johnson, A. E., Pollard, T. J., Shen, L., Liwei, H. L., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., and Mark, R. G. MIMICIII, a freely accessible critical care database. Scientific data, 3:160035, 2016.
 Jun et al. (2017) Jun, S.H., Wong, S. W., Zidek, J., and BouchardCôté, A. Sequential graph matching with sequential monte carlo. In AISTATS, pp. 1075–1084, 2017.
 Kingma & Ba (2014) Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Kuchaiev & Pržulj (2011) Kuchaiev, O. and Pržulj, N. Integrative network alignment reveals large regions of global network similarity in yeast and human. Bioinformatics, 27(10):1390–1396, 2011.
 Kuchaiev et al. (2010) Kuchaiev, O., Milenković, T., Memišević, V., Hayes, W., and Pržulj, N. Topological network alignment uncovers biological function and phylogeny. Journal of the Royal Society Interface, pp. rsif20100063, 2010.
 Maaten & Hinton (2008) Maaten, L. v. d. and Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
 Malod-Dognin & Pržulj (2015) Malod-Dognin, N. and Pržulj, N. L-GRAAL: Lagrangian graphlet-based network aligner. Bioinformatics, 31(13):2182–2189, 2015.
 Mémoli (2008) Mémoli, F. Gromov-Hausdorff distances in Euclidean spaces. In CVPR Workshops, pp. 1–8, 2008.
 Mémoli (2009) Mémoli, F. Spectral Gromov-Wasserstein distances for shape matching. In ICCV Workshops, pp. 256–263, 2009.
 Mémoli (2011) Mémoli, F. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4):417–487, 2011.
 Mikolov et al. (2013) Mikolov, T., Chen, K., Corrado, G., and Dean, J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
 Mohammadi et al. (2017) Mohammadi, S., Gleich, D. F., Kolda, T. G., and Grama, A. Triangular alignment TAME: A tensorbased approach for higherorder network alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 14(6):1446–1458, 2017.

 Monti et al. (2017) Monti, F., Bronstein, M., and Bresson, X. Geometric matrix completion with recurrent multi-graph neural networks. In NIPS, pp. 3697–3707, 2017.
 Myronenko & Song (2010) Myronenko, A. and Song, X. Point set registration: Coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12):2262–2275, 2010.
 Nassar et al. (2018) Nassar, H., Veldt, N., Mohammadi, S., Grama, A., and Gleich, D. F. Low rank spectral network alignment. In WWW, pp. 619–628, 2018.
 Neyshabur et al. (2013) Neyshabur, B., Khadem, A., Hashemifar, S., and Arab, S. S. NETAL: A new graphbased method for global alignment of protein–protein interaction networks. Bioinformatics, 29(13):1654–1662, 2013.
 Pennington et al. (2014) Pennington, J., Socher, R., and Manning, C. GloVe: Global vectors for word representation. In EMNLP, pp. 1532–1543, 2014.
 Perozzi et al. (2014) Perozzi, B., AlRfou, R., and Skiena, S. Deepwalk: Online learning of social representations. In KDD, pp. 701–710, 2014.
 Peyré et al. (2016) Peyré, G., Cuturi, M., and Solomon, J. Gromov-Wasserstein averaging of kernel and distance matrices. In ICML, pp. 2664–2672, 2016.
 Powers (1998) Powers, D. M. Applications and explanations of Zipf's law. In Proceedings of the joint conferences on new methods in language processing and computational natural language learning, pp. 151–160, 1998.
 Razaviyayn et al. (2013) Razaviyayn, M., Hong, M., and Luo, Z.-Q. A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization, 23(2):1126–1153, 2013.
 Rendle et al. (2009) Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. In UAI, pp. 452–461, 2009.
 Sharan & Ideker (2006) Sharan, R. and Ideker, T. Modeling cellular machinery through biological network comparison. Nature biotechnology, 24(4):427, 2006.
 Singh et al. (2008) Singh, R., Xu, J., and Berger, B. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences, 2008.
 Sinkhorn & Knopp (1967) Sinkhorn, R. and Knopp, P. Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 21(2):343–348, 1967.
 Sun et al. (2015) Sun, Y., Crawford, J., Tang, J., and Milenković, T. Simultaneous optimization of both node and edge conservation in network alignment via WAVE. In International Workshop on Algorithms in Bioinformatics, pp. 16–39, 2015.
 Tang et al. (2015) Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei, Q. Line: Largescale information network embedding. In WWW, pp. 1067–1077, 2015.
 Vayer et al. (2018) Vayer, T., Chapel, L., Flamary, R., Tavenard, R., and Courty, N. Fused Gromov-Wasserstein distance for structured objects: theoretical foundations and mathematical properties. arXiv preprint arXiv:1811.02834, 2018.
 Vijayan et al. (2015) Vijayan, V., Saraph, V., and Milenković, T. MAGNA++: Maximizing accuracy in global network alignment via both node and edge conservation. Bioinformatics, 31(14):2409–2411, 2015.
 Villani (2008) Villani, C. Optimal transport: Old and new, volume 338. Springer Science & Business Media, 2008.
 Xie et al. (2018) Xie, Y., Wang, X., Wang, R., and Zha, H. A fast proximal point method for Wasserstein distance. arXiv preprint arXiv:1802.04307, 2018.
 Xu et al. (2018) Xu, H., Wang, W., Liu, W., and Carin, L. Distilled Wasserstein learning for word embedding and topic modeling. In NIPS, pp. 1723–1732, 2018.
 Yan et al. (2015) Yan, J., Xu, H., Zha, H., Yang, X., Liu, H., and Chu, S. A matrix decomposition perspective to multiple graph matching. In ICCV, pp. 199–207, 2015.
 Ying et al. (2018) Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., and Leskovec, J. Graph convolutional neural networks for web-scale recommender systems. arXiv preprint arXiv:1806.01973, 2018.
 Yu et al. (2018) Yu, T., Yan, J., Wang, Y., Liu, W., et al. Generalizing graph matching beyond quadratic assignment model. In NIPS, pp. 861–871, 2018.
 Zanfir & Sminchisescu (2018) Zanfir, A. and Sminchisescu, C. Deep learning of graph matching. In CVPR, pp. 2684–2693, 2018.
 Zhang & Philip (2015) Zhang, J. and Philip, S. Y. Multiple anonymized social networks alignment. In ICDM, pp. 599–608, 2015.
7 Supplementary Material
7.1 The scheme of the proposed proximal point method
In the $m$th outer iteration, we learn the optimal transport iteratively. In particular, in the $n$th inner iteration, we update the target optimal transport by solving (6) with the Sinkhorn-Knopp algorithm (Sinkhorn & Knopp, 1967; Cuturi, 2013). Algorithm 2 gives the details of our proximal point method in the $m$th outer iteration, where $\mathrm{diag}(\cdot)$ converts a vector to a diagonal matrix, and $\odot$ and $\oslash$ represent element-wise multiplication and division, respectively.
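A minimal sketch of one such proximal-point update is given below. We assume the squared loss and use the gradient form of the GW objective from (Peyré et al., 2016); variable names are ours, and details of Algorithm 2 such as stopping criteria are omitted:

```python
import numpy as np

def gw_cost(Cs, Ct, T):
    """Cost matrix L(Cs, Ct) ⊗ T for the squared loss (Peyré et al., 2016)."""
    p = T.sum(axis=1)   # current row marginal
    q = T.sum(axis=0)   # current column marginal
    const = (Cs ** 2) @ p[:, None] + (q[None, :] @ (Ct ** 2).T)
    return const - 2.0 * Cs @ T @ Ct.T

def proximal_step(Cs, Ct, T, mu_s, mu_t, gamma=0.1, n_sinkhorn=10):
    """One proximal-point update: solve the subproblem (6) via Sinkhorn-Knopp.

    The proximal KL term anchors the kernel at the previous transport T,
    i.e., G = exp(-C / gamma) ⊙ T (⊙ = element-wise multiplication).
    """
    C = gw_cost(Cs, Ct, T)
    G = np.exp(-C / gamma) * T          # proximal kernel
    b = np.ones_like(mu_t)
    for _ in range(n_sinkhorn):         # Sinkhorn iterations (⊘ = element-wise division)
        a = mu_s / (G @ b)
        b = mu_t / (G.T @ a)
    return np.diag(a) @ G @ np.diag(b)  # diag(a) G diag(b)
```

Iterating `proximal_step` until the transport stabilizes corresponds to the inner loop of one outer iteration.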
7.2 The convergence of each updating step
The proposed proximal point method decomposes a nonconvex optimization problem into a series of convex updating steps. Each updating step corresponds to the solution of a regularized optimal transport problem, which is solved via Sinkhorn projections. The work in (Altschuler et al., 2017) proves that solving the regularized optimal transport problem via Sinkhorn projections converges linearly. The work in (Xie et al., 2018) further proves that the linear convergence holds even when applying only a single Sinkhorn projection in each updating step. Therefore, the updating steps of the proposed method converge linearly.
7.3 Global convergence: The proof of Proposition 3.1
Proposition 3.1 Every limit point generated by our proximal point method, i.e., $\lim_{n\rightarrow\infty} T^{(n)}$, is a stationary point of the problem (5).
Proof.
When learning the target optimal transport, the original optimization problem (5) has a nonconvex and differentiable objective function
(10)  $f(T) = \sum_{i,i',j,j'} L(c^s_{ii'},\, c^t_{jj'})\, T_{ij}\, T_{i'j'}$,
and a closed convex set $\Pi(\mu_s, \mu_t)$ as the constraint on $T$. As a special case of successive upper-bound minimization (SUM), our proximal point method solves (5) by optimizing a sequence of approximate objective functions: starting from a feasible point $T^{(0)}$, the algorithm generates a sequence $\{T^{(n)}\}$ according to the update rule
(11)  $T^{(n+1)} = \arg\min_{T \in \Pi(\mu_s, \mu_t)} g(T;\, T^{(n)})$,
where
(12)  $g(T;\, T^{(n)}) = f(T) + \gamma\, \mathrm{KL}(T\,\|\,T^{(n)})$
is an approximation of $f(T)$ at the $n$th iteration, and $T^{(n)}$ is the point generated in the previous iteration.
Obviously, we have
(C1)  $g(T;\, T) = f(T)$ for all $T \in \Pi(\mu_s, \mu_t)$;
(C4)  $g(T;\, T')$ is continuous in $(T, T')$.
Additionally, because $\mathrm{KL}(T\,\|\,T') \ge 0$ and the equality holds only when $T = T'$, we have
(C2)  $g(T;\, T') \ge f(T)$ for all $T, T' \in \Pi(\mu_s, \mu_t)$,
i.e., $g$ is a global upper bound of $f$.
According to Proposition 1 in (Razaviyayn et al., 2013), when the conditions C1 and C2 are satisfied, for the differentiable function $f$ and its global upper bound $g$ we further have
(C3)  $g'(T;\, T';\, d)\,\big|_{T = T'} = f'(T';\, d)$ for all feasible directions $d$,
where $f'(T';\, d)$ is the directional derivative of $f$ along the direction $d$, and $g'(T;\, T';\, d)$ is the directional derivative only with respect to the first argument $T$.
According to Theorem 1 in (Razaviyayn et al., 2013), when the approximate objective function in each iteration satisfies C1-C4, every limit point generated by the proposed method, i.e., $\lim_{n\rightarrow\infty} T^{(n)}$, is a stationary point of the original problem (5). ∎
7.4 Connections and comparisons with existing method
Note that when replacing the KL-divergence in (6) with an entropy regularizer $H(T)$, we derive an entropic GW discrepancy, which can also be solved by the Sinkhorn-Knopp algorithm. Accordingly, the kernel matrix in Algorithm 2 (line 6) no longer involves the previous transport $T^{(n)}$. In such a situation, the proposed algorithm becomes the Sinkhorn projection method in (Peyré et al., 2016).
For both of these methods, the number of Sinkhorn iterations and the weight of the (proximal or entropic) regularizer are two significant hyperparameters. Figure 4 shows the empirical convergence of the two methods under different hyperparameters with respect to the number of inner iterations in Algorithm 2. We find that the Sinkhorn method can obtain a smaller GW discrepancy than our proximal point method when the weight of the regularizer is very small, but in that situation both methods suffer from a high risk of numerical instability. When the weight is enlarged, the stability of our method improves markedly, and it is still able to obtain a small GW discrepancy with a good convergence rate. The Sinkhorn method, on the contrary, converges slowly for larger weights. In other words, our method is more robust to the choice of the regularization weight: we can choose it in a wide range and easily achieve a trade-off between convergence and stability. Additionally, although increasing the number of Sinkhorn iterations improves the stability of our method slightly, e.g., suppressing the numerical fluctuations after the GW discrepancy converges, the improvement is not as significant as the added computational cost. Therefore, in practice we use only a small number of Sinkhorn iterations per updating step.
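The only algorithmic difference between the two regularizers is the kernel used inside the Sinkhorn updates; a hypothetical sketch of this contrast (function name is ours):

```python
import numpy as np

def sinkhorn_kernel(C, T_prev, gamma, proximal=True):
    """Kernel used in the Sinkhorn updates.

    proximal=True : KL(T || T_prev) regularizer -> kernel anchored at T_prev
    proximal=False: entropy regularizer H(T)    -> kernel ignores T_prev
                    (this recovers the method of Peyré et al., 2016)
    """
    K = np.exp(-C / gamma)
    return K * T_prev if proximal else K
```

Anchoring the kernel at the previous transport is what makes the proximal variant tolerate much larger regularization weights without degrading the solution.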
Table 3: Representative disease-procedure pairs in the learned optimal transport.

Value  Disease — Procedure  CR1  CR2

1.00  d4019: Unspecified essential hypertension p9604: Insertion of endotracheal tube  
0.22  d4019: Unspecified essential hypertension p966: Enteral infusion of concentrated nutritional substances  
0.17  d4280: Congestive heart failure, unspecified p966: Enteral infusion of concentrated nutritional substances  ✓  ✓ 
0.64  d4280: Congestive heart failure, unspecified p9671: Continuous invasive mechanical ventilation for less than 96 consecutive hours  
0.36  d42731: Atrial fibrillation p3961: Extracorporeal circulation auxiliary to open heart surgery  ✓  ✓ 
0.18  d42731: Atrial fibrillation p8856: Coronary arteriography using two catheters  
0.16  d42731: Atrial fibrillation p8872: Diagnostic ultrasound of heart  ✓  ✓ 
0.34  d41401: Coronary atherosclerosis of native coronary artery p3961: Extracorporeal circulation auxiliary to open heart surgery  ✓  ✓ 
0.29  d41401: Coronary atherosclerosis of native coronary artery p8856: Coronary arteriography using two catheters  ✓  ✓ 
0.42  d5849: Acute kidney failure, unspecified p9672: Continuous invasive mechanical ventilation for 96 consecutive hours or more  ✓  
0.44  d25000: Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled p3615: Single internal mammarycoronary artery bypass  ✓^{a}  
0.20  d51881: Acute respiratory failure p3893: Venous catheterization, not elsewhere classified  ✓  ✓ 
0.22  d51881: Acute respiratory failure p9904: Transfusion of packed cells  ✓  ✓ 
0.29  d5990: Urinary tract infection, site not specified p3893: Venous catheterization, not elsewhere classified  ✓  ✓ 
0.22  d53081: Esophageal reflux p9390: Noninvasive mechanical ventilation  ✓  
0.23  d2720: Pure hypercholesterolemia p3891: Arterial catheterization  
0.48  dV053: Need for prophylactic vaccination and inoculation against viral hepatitis p9955: Prophylactic administration of vaccine against other diseases  ✓  ✓ 
0.53  dV290: Observation for suspected infectious condition p9955: Prophylactic administration of vaccine against other diseases  ✓  
0.30  d2859: Anemia, unspecified p9915: Parenteral infusion of concentrated nutritional substances  ✓^{b}  
0.24  d486: Pneumonia, organism unspecified p9671: Continuous invasive mechanical ventilation for less than 96 consecutive hours  ✓✓  ✓✓ 
0.18  d2851: Acute posthemorrhagic anemia p9904: Transfusion of packed cells  ✓  ✓ 
0.18  d2762: Acidosis p966: Enteral infusion of concentrated nutritional substances  
0.28  d496: Chronic airway obstruction, not elsewhere classified p3722: Left heart cardiac catheterization  
0.16  d99592: Severe sepsis p3893: Venous catheterization, not elsewhere classified  ✓  ✓ 
0.26  d0389: Unspecified septicemia p966: Enteral infusion of concentrated nutritional substances  
0.26  d5070: Pneumonitis due to inhalation of food or vomitus p3893: Venous catheterization, not elsewhere classified  ✓  ✓ 
0.33  dV3000: Single liveborn, born in hospital, delivered without mention of cesarean section p331: Incision of lung  
0.17  d5859: Chronic kidney disease, unspecified p9904: Transfusion of packed cells  ✓  ✓ 
0.16  d412: Old myocardial infarction p8853: Angiocardiography of left heart structures  ✓  ✓ 
0.18  d2875: Thrombocytopenia, unspecified p3893: Venous catheterization, not elsewhere classified  ✓  ✓ 
0.25  d41071: Subendocardial infarction, initial episode of care p3723: Combined right and left heart cardiac catheterization  ✓✓  ✓ 
0.21  d4240: Mitral valve disorders p9904: Transfusion of packed cells  
0.31  dV3001: Single liveborn, born in hospital, delivered by cesarean section p640: Circumcision  ✓  ✓ 
0.23  d40391: Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease p3893: Venous catheterization, not elsewhere classified  ✓  ✓ 
0.17  d78552: Septic shock p9904: Transfusion of packed cells  ✓  
0.17  d9971: Cardiac complications, not elsewhere classified p8856: Coronary arteriography using two catheters  ✓✓  ✓ 
0.27  d7742: Neonatal jaundice associated with preterm delivery p9983: Other phototherapy  ✓  ✓ 
0.25  dV502: Routine or ritual circumcision p9955: Prophylactic administration of vaccine against other diseases  ✓^{c} 

a. The relationship here is that people with diabetes usually also have heart disease, and heart disease can require a coronary artery bypass.

b. The relationship here is that someone with a chronic disease can develop anemia of chronic disease, and may also require parenteral nutrition for some specific condition.

c. This procedure is not inherently related to the disease, but the two appear together frequently in the same medical record because both happen to newborn babies.

The procedure is related to the treatment of the disease.

The procedure can lead to the disease as a side effect or complication.
7.5 MIMIC-III: More details of experiments
The enlarged optimal transport between diseases and procedures learned by our method is shown in Figure 5. The pairs with the largest values in the optimal transport matrix are listed in Table 3. Additionally, we asked two clinical researchers to evaluate these pairs: for each pair, each researcher independently checked whether the procedure is potentially related to the disease. The columns "CR1" and "CR2" in Table 3 give their evaluation results. For each pair, one type of mark means that the procedure is potentially related to the treatment of the disease, while the other means that the procedure can lead to the disease as a side effect or complication. We find that 1) the evaluation results from the two clinical researchers are highly consistent; 2) most of the pairs are reasonable: they correspond either to "diseases and their treatments" or to "procedures and their complications". These phenomena demonstrate that the learned optimal transport is clinically meaningful to some extent, reflecting real relationships between diseases and procedures. Table 4 lists the ICD codes of the diseases and procedures and their detailed descriptions.
Table 4: The ICD codes of diseases and procedures and their descriptions.

ICD code  Disease/Procedure 

d4019  Unspecified essential hypertension 
d41401  Coronary atherosclerosis of native coronary artery 
d4241  Aortic valve disorders 
dV4582  Percutaneous transluminal coronary angioplasty status 
d2724  Other and unspecified hyperlipidemia 
d486  Pneumonia, organism unspecified 
d99592  Severe sepsis 
d51881  Acute respiratory failure 
d5990  Urinary tract infection, site not specified 
d5849  Acute kidney failure, unspecified 
d78552  Septic shock 
d25000  Diabetes mellitus without mention of complication, type II or unspecified type 
d2449  Unspecified acquired hypothyroidism 
d41071  Subendocardial infarction, initial episode of care 
d4280  Congestive heart failure, unspecified 
d4168  Other chronic pulmonary heart diseases 
d412  Pneumococcus infection in conditions classified elsewhere and of unspecified site 
d2761  Hyposmolality and/or hyponatremia 
d2720  Pure hypercholesterolemia 
d2762  Acidosis 
d389  Unspecified septicemia 
d4589  Hypotension, unspecified 
d42731  Atrial fibrillation 
d2859  Anemia, unspecified 
d311  Cutaneous diseases due to other mycobacteria 
dV3001  Single liveborn, born in hospital, delivered by cesarean section 
dV053  Need for prophylactic vaccination and inoculation against viral hepatitis 
d4240  Mitral valve disorders 
dV3000  Single liveborn, born in hospital, delivered without mention of cesarean section 
d7742  Neonatal jaundice associated with preterm delivery 
d42789  Other specified cardiac dysrhythmias 
d5070  Pneumonitis due to inhalation of food or vomitus 
dV502  Routine or ritual circumcision 
d2760  Hyperosmolality and/or hypernatremia 
dV1582  Personal history of tobacco use 
d40390  Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage I through stage IV, or unspecified 
dV4581  Aortocoronary bypass status 
dV290  Observation for suspected infectious condition 
d5845  Acute kidney failure with lesion of tubular necrosis 
d2875  Thrombocytopenia, unspecified 
d2767  Hyperpotassemia 
d32723  Obstructive sleep apnea (adult)(pediatric) 
dV5861  Longterm (current) use of anticoagulants 
d2851  Acute posthemorrhagic anemia 
d53081  Esophageal reflux 
d496  Chronic airway obstruction, not elsewhere classified 
d40391  Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease 
d9971  Gross hematuria 
d5119  Unspecified pleural effusion 
d2749  Gout, unspecified 
d5859  Chronic kidney disease, unspecified 
d49390  Asthma, unspecified type, unspecified 
d45829  Other iatrogenic hypotension 
d3051  Tobacco use disorder 
dV5867  Longterm (current) use of insulin 
d5180  Pulmonary collapse 
p9604  Insertion of endotracheal tube 
p9671  Continuous invasive mechanical ventilation for less than 96 consecutive hours 
p3615  Single internal mammarycoronary artery bypass 
p3961  Extracorporeal circulation auxiliary to open heart surgery 
p8872  Diagnostic ultrasound of heart 
p9904  Transfusion of packed cells 
p9907  Transfusion of other serum 
p9672  Continuous invasive mechanical ventilation for 96 consecutive hours or more 
p331  Spinal tap 
p3893  Venous catheterization, not elsewhere classified 
p966  Enteral infusion of concentrated nutritional substances 
p3995  Hemodialysis 
p9915  Parenteral infusion of concentrated nutritional substances 
p8856  Coronary arteriography using two catheters 
p9955  Prophylactic administration of vaccine against other diseases 
p3891  Arterial catheterization 
p9390  Noninvasive mechanical ventilation 
p9983  Other phototherapy 
p640  Circumcision 
p3722  Left heart cardiac catheterization 
p8853  Angiocardiography of left heart structures 
p3723  Combined right and left heart cardiac catheterization 
p5491  Percutaneous abdominal drainage 
p3324  Closed (endoscopic) biopsy of bronchus 
p4513  Other endoscopy of small intestine 
There are no comments yet.