1 Introduction
It is important for financial institutions to know their clients well in order to mitigate credit risks [17], deal with fraud [16] and recommend relevant services [4]. One of the defining properties of a particular bank client is his or her social and financial interactions with other people. This motivates viewing bank clients as a network of interconnected agents [18, 4, 21]. Thus, graph-based approaches can help to leverage this kind of data and solve the above-mentioned problems more efficiently.
Importantly, information about clients and especially about their neighborhood is never complete: the market is competitive and we cannot expect all people to use the same bank. Thus, some of the financial interactions are effectively hidden from the bank. This leads to the necessity of uncovering hidden connections between clients with a limited amount of information, which can be done using link prediction approaches [20].
On the other hand, financial networks have two notable features. The first one is the size: the number of clients can be of the order of millions and the number of transactions is estimated in billions. The second important feature is the dynamic structure of the considered networks: the neighborhood of each client is ever-evolving. The classical link prediction algorithms are only capable of working with graphs of a much smaller size, while the temporal component is usually not considered [20]. Recently, several studies addressed large-scale graphs [23] as well as temporal networks [14]. However, only a few works consider financial networks, see, for example, [4] and [18]. We base our research on the well-developed paradigms of graph mining with neural networks, including graph convolutional networks [11, 9], graph attention networks [19] and the SEAL framework for link prediction [25]. The considered approaches consistently show state-of-the-art results in many applications but, to the best of our knowledge, were not yet used for financial networks. Our key contributions can be formulated as follows:

We build a scalable approach to link prediction in temporal graphs with a focus on extensive usage of Recurrent Neural Networks (RNNs), both as feature generators for graph nodes and as a trainable attention mechanism for graph edges.

We validate the proposed approaches on the link prediction and credit scoring problems for a real-world financial network with millions of nodes and billions of edges. Our experiments show that our improved models perform significantly better than the standard ones and efficiently exploit the rich transactional data available for the edges and nodes while scaling to large graphs.
2 Problem and Data
From the perspective of network science and data analysis, the considered problem of linking bank clients is a link prediction problem in graphs with two notable peculiarities. The first one is that the considered graph of clients and transactions between them is very large, having on the order of millions of nodes and billions of edges. The second peculiarity is that both nodes and edges have rather complex attributes represented by time series of bank transactions of different types. We want to note that this kind of problem is not limited to banking, as graphs with a similar structure appear in social networks, telecom companies, and other scenarios where we consider some objects as nodes and a certain type of communication between them. Thus, the algorithms developed in our work might be applicable beyond banking to any link prediction problem with time-series attributes.
In what follows, we first discuss the dataset studied in our work and then explain some peculiarities of the problem statement.
2.1 Dataset
The considered dataset is obtained from one of the large European banks. The data consists of user transactions and money transfers between users over five years. All the data is depersonalized, with each transaction being described by a timestamp, amount and currency. Thus, we observe a graph G = (V, E) with a set of vertices V and a set of edges E. Here, an edge (u, v) ∈ E means that there was at least one transfer between the pair of clients u and v over the observed time period. Each node u is represented by a time series of transactions for client u, while each edge (u, v) is represented by a time series of transfers between clients u and v. Finally, we obtain a huge 86-million-node graph with about 4 billion edges.
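As a minimal illustration, raw transfer records can be aggregated into such per-node and per-edge time series as follows; the record layout `(sender, receiver, timestamp, amount)` is hypothetical and only stands in for the bank's actual (non-public) schema:

```python
from collections import defaultdict

def build_graph(transfers):
    """Aggregate raw transfer records into node and edge time series.

    `transfers` is an iterable of (sender, receiver, timestamp, amount)
    tuples; the layout is illustrative. Each node maps to the transfers
    touching that client, each undirected edge to the transfers between
    a pair of clients.
    """
    edges = defaultdict(list)          # (u, v) -> list of transfers on the edge
    nodes = defaultdict(list)          # u -> list of transfers touching client u
    for u, v, ts, amount in transfers:
        key = (min(u, v), max(u, v))   # treat transfers as undirected edges
        edges[key].append((ts, amount))
        nodes[u].append((ts, amount))
        nodes[v].append((ts, amount))
    return nodes, edges
```

On the real data this aggregation is what produces the 86-million-node graph described above; here it is only a sketch of the data model.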
Such a graph size makes the analysis difficult to approach, since the majority of graph processing methods aimed at node classification, graph classification or link prediction problems are suitable for graphs of a much smaller size [8]. The time complexity of such methods usually grows at least polynomially with the number of nodes, limiting the possible graph sizes to several thousands of nodes and up to one hundred thousand edges.
As a result, when we work with a particular node or with a particular edge, we are forced to consider certain subgraphs around the target node or the target pair of nodes (for example, see [24]). In this work, we follow this approach and consider the subgraph around the target nodes, extracting their 1-hop or 2-hop neighbors.
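A minimal sketch of this extraction step, assuming the subgraph is gathered by breadth-first search over an adjacency dictionary (production code on a billion-edge graph would additionally cap high-degree neighborhoods):

```python
from collections import deque

def enclosing_subgraph(adj, x, y, num_hops=1):
    """Collect the nodes within `num_hops` of either target node.

    `adj` maps a node to the set of its neighbours; `x` and `y` are the
    target pair. A plain BFS from both targets at once; the returned set
    is the vertex set of the enclosing subgraph.
    """
    seen = {x, y}
    frontier = deque([(x, 0), (y, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == num_hops:
            continue                       # do not expand past the hop limit
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen
```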
2.2 Problem Statement and Validation
Our goal is to determine how stable the relationship between nodes is. We start by describing out-of-time validation (see a similar approach in [12]). More specifically, we consider a time interval [t_0, t_1) and use all the information available for it (i.e., all the transactions and transfers) as the information encoded in a graph. Given the information available for the period [t_0, t_1), we aim to predict the structure of the graph for the time interval [t_1, t_2) with t_2 > t_1. In what follows, we say that there is an edge between two nodes in a graph for a certain time period if there was at least one transaction between these nodes during the considered period. Thus, we end up with a link prediction problem where a pair of nodes is described by the graph structure and attributes during the period [t_0, t_1), and the target label corresponds to the existence of a transaction between the pair of nodes during the period [t_1, t_2). In all the experiments below we take t_1 − t_0 equal to one year and t_2 − t_1 equal to 3 months.
We note that link prediction models are usually validated in a different way, e.g., by edge sampling [12]. In this approach, the whole edge set E is considered as positive samples, while negative samples are constructed by taking c · |E| node pairs (c is a hyperparameter and |·| denotes the set size) which do not intersect with E. Then, the subgraph is passed to the link prediction algorithm, hiding the link if it exists. In order to build training, validation and test parts, one divides the positive and negative edge sets into three corresponding non-intersecting sets. However, we think that for time-evolving graphs in general and banking data in particular the out-of-time validation is more sensible. Thus, in this work, we focus on the out-of-time validation, while still providing a part of the experiments for both settings.
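The out-of-time labelling with negative sampling can be sketched as follows; the function and parameter names are ours, not the paper's, and `neg_ratio` stands in for the hyperparameter scaling the negative-set size:

```python
import random

def out_of_time_samples(edges_base, edges_target, nodes, neg_ratio=1, seed=0):
    """Build labelled node pairs for out-of-time validation.

    `edges_base` / `edges_target` are sets of normalized (u, v) pairs
    observed in the base and target periods. Positives are pairs connected
    in the target period; negatives are sampled pairs connected in neither
    period.
    """
    rng = random.Random(seed)
    positives = [(u, v, 1) for (u, v) in edges_target]
    negatives = []
    while len(negatives) < neg_ratio * len(positives):
        u, v = rng.sample(nodes, 2)
        pair = (min(u, v), max(u, v))      # normalize the undirected pair
        if pair not in edges_target and pair not in edges_base:
            negatives.append((pair[0], pair[1], 0))
    return positives + negatives
```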
3 Neural Network Model for Link Prediction with Transactional Data
In this section, we describe the proposed neural network for solving the link prediction task powered by rich transactional data. The most challenging part is working with the transactional data itself, which is essentially a multidimensional time series.
As the base graph neural network, we take the SEAL framework [25]. Its input parameters are the adjacency matrix A of a graph and a node feature matrix X, with each row containing a feature vector for the corresponding node. SEAL then considers the neighborhood subgraph for the target pair of nodes and performs several graph convolutions followed by a sort-pooling operation and fully connected layers, see Figure 1. However, the considered network of bank transactions does not have an explicit adjacency matrix or node feature vectors, as both clients and interactions between them are represented by time series. In the following, we adapt the SEAL framework to work with time series data by processing them with RNNs. Moreover, we make a number of specific improvements to the structure of the SEAL model, making it more efficient.
3.1 Recurrent Neural Networks Power the Graph Neural Network
3.1.1 RNN as Feature Generator
A powerful way of working with time series data is to build a Recurrent Neural Network (RNN, [6]). The main question is which objective function the RNN should target. We suggest pretraining the RNN model on the credit scoring problem similarly to [2], see also additional details in Section 5.5. The model takes a time series of user transactions and aims to predict a credit default. For that purpose, we take a fairly simple recurrent neural network, which consists of a GRU cell [5] followed by a series of fully connected layers. Importantly, such an RNN model learns in its intermediate layers a meaningful vector representation of the client's transactions. In the following, we call these vectors embedded transactions and use them as node feature vectors in all the considered graph neural network models.
3.1.2 RNN as Attention Mechanism
The question of processing the time series corresponding to graph edges is even more challenging than that for nodes. The simplest way is to ignore the time series altogether and consider a binary adjacency matrix with edges present for pairs of nodes with at least one transfer between them. However, in this case we lose a significant amount of important information, as the properties of transfers between clients are directly linked with our link prediction objective.
In order to make full use of the data, we first note that one can consider an RNN model predicting the link between two nodes using solely the time series of transfers between them, see Figure 2. However, such an RNN model does not allow us to detect new possible connections, since there is no data about the interaction between users in this case. To overcome this drawback, a model based on the transactional graph can be used.
We first note that standard graph convolutional architectures (like GCN [11] or SEAL [25]) perform the convolution operation by simple averaging over the neighborhood:

h'_i = σ( (1/|N_i|) Σ_{j ∈ N_i} W h_j ),  i = 1, …, n,

where h_j are the node embedding vectors before the convolution operation, h'_i are their counterparts after it, W are learnable weights, N_i is the set of immediate neighbors of node i and, finally, σ is an activation function. The averaging operation implies that all the neighbors have an equal influence on the considered node, which is apparently very unnatural in the majority of applications.
Graph Attention Networks [19] mitigate this problem by introducing weights α_ij and considering the weighted sum:

h'_i = σ( Σ_{j ∈ N_i} α_ij W h_j ).

However, in [19] the coefficients α_ij are computed solely based on the node features. Instead, in order to use the full information about the graph, we propose to use the probabilities of links between nodes output by the RNN model as weights in the adjacency matrix, which is then passed to the graph neural network.
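The two aggregation rules above can be illustrated in a few lines of NumPy; this is a sketch of the formulas, not the trained model (uniform averaging when `weights` is omitted, weighted averaging otherwise):

```python
import numpy as np

def graph_conv(H, A, W, weights=None):
    """One convolution step: uniform (GCN-style) or weighted aggregation.

    H: (n, d) node embeddings, A: (n, n) binary adjacency, W: (d, d')
    learnable weights. If a (n, n) `weights` matrix is given (e.g. edge
    coefficients such as RNN-predicted link probabilities), neighbours are
    averaged with those coefficients instead of uniformly.
    """
    coef = A if weights is None else A * weights
    deg = coef.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                 # isolated nodes keep a zero output
    agg = (coef @ H) / deg              # (1/|N_i|) * sum over neighbours
    return np.maximum(agg @ W, 0.0)     # ReLU as the activation sigma
```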
The resulting model is called SEAL-RNN, see the architecture in Figure 3. After extracting an enclosing subgraph around the target link, all time series corresponding to the edges are processed by the RNN, and the output probabilities are used to form a weighted adjacency matrix which, together with the generated node features, is passed into the SEAL model.
3.2 Graph Neural Network (2SEAL)
3.2.1 Pooling
We propose another pooling operation instead of sort-pooling in the SEAL model. The sort-pooling layer keeps the k (a hyperparameter) most valuable node embeddings in the sense of sorting (descending order) while filtering out the other embeddings. In contrast, we suggest taking the embeddings of the two nodes between which we aim to predict the link. The idea is natural, since we want to predict the link between exactly these two nodes, while their embeddings still contain information about the neighboring nodes. Most importantly, it reduces the number of learned parameters in the neural network, and we need neither a sorting operation nor a 1D convolution after pooling (the purpose of the 1D convolution in the SEAL framework is to reduce the size of the obtained output, which is k · d, where d is the sum of the node feature dimension and the dimensions of the graph convolution outputs). We name the proposed model 2SEAL, see the schematic representation in Figure 4.
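The proposed pooling is simple enough to state in code; a minimal sketch, where `target_idx` marks the positions of the two target nodes inside the extracted subgraph:

```python
import numpy as np

def two_node_pooling(H, target_idx):
    """Replace SEAL's sort-pooling with the two target-node embeddings.

    H is the (n, d) matrix of node embeddings after the graph convolutions.
    The concatenated pair feeds the fully connected classifier, so the
    output size is fixed at 2*d regardless of the subgraph size.
    """
    i, j = target_idx
    return np.concatenate([H[i], H[j]])
```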
3.2.2 Modified Structural Labels
Working in terms of out-of-time validation, we decided to change the structural labels proposed in the SEAL framework. In SEAL, each node receives a structural label generated by the Double-Radius Node Labelling procedure, which meets the following conditions:

two target nodes x and y have label ‘1’;

nodes with different distances to both x and y have different labels.
The aim of the labels is to encode some of the topological information about the graph structure. These structural labels are concatenated with the initial node features (if they exist) and passed to the neural network as node features. The labelling function f(i) (i is a node index) is the following:

f(i) = 1 + min(d_x, d_y) + (d/2) [(d/2) + (d%2) − 1],

where d_x = d(i, x), d_y = d(i, y), d = d_x + d_y, (d/2) and (d%2) are the integer quotient and remainder of the division of d by 2, respectively, and d(i, j) is the distance between nodes i and j. The authors of the original paper suggest taking into account all subgraph nodes except y when computing the distance d_x, and similarly except x for d_y.
We suggest not hiding the nodes x and y when finding the distances d_x and d_y. This better suits out-of-time validation by keeping in the data patterns for all combinations of link existence in the observed graph and link existence in the future.
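The labelling function can be written down directly from the formula above; in the modified variant only the distance computation changes (the targets are not removed from the subgraph), not the formula itself:

```python
def structural_label(dx, dy):
    """Double-Radius Node Labelling from the SEAL framework.

    `dx`, `dy` are the distances from node i to the two target nodes x and
    y; the targets themselves receive label 1. In the modified variant used
    in this work the distances are computed WITHOUT hiding x or y.
    """
    if dx == 0 or dy == 0:                  # the target nodes x and y
        return 1
    d = dx + dy
    return 1 + min(dx, dy) + (d // 2) * ((d // 2) + (d % 2) - 1)
```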
4 Related Work
The idea of considering bank clients as a large network of interconnected agents was raised in the past several years [18, 4, 21]. The number of bank clients counts in the millions, so we solve the link prediction problem for graphs with millions of nodes, which requires the usage of scalable methods. There are few ways to handle graphs of such size mentioned in the literature, mostly simple heuristics that compute some statistics over the immediate neighborhoods of the target nodes, for example, Common Neighbors [13], Adamic-Adar [1] and others [20]. However, these models are not trainable and do not use the information about node features, which limits their performance in real-world applications.
The main challenge in the construction of machine learning models for link prediction is to handle variation in the graph size. One approach is presented in WLNM [24]: use Weisfeiler-Lehman structural labels [22] to prioritize nodes and to keep only the important ones from the immediate neighborhood of the evaluated nodes. After that, one can use regular densely connected neural networks.
Graph convolutional networks [11] showed good performance on graph datasets. The original GCN is supposed to use the whole graph, which is prohibitive for a graph on the scale of millions of nodes. In [25], the SEAL framework was proposed: extract enclosing subgraphs around the target link and include a pooling layer in the neural network architecture which keeps a fixed number of nodes for every subgraph. This allows using the model on arbitrarily sized graphs.
The novel graph attention model GAT [19] allows specifying different weights for different nodes in the neighborhood. That approach can be used to leverage sequence information on the edges by adding attention coefficients to the graph convolutions.
5 Experiments
5.1 Dataset Preprocessing
First, we divide the whole time interval and the set of user IDs into three non-intersecting parts: the first three years, the fourth year, and the fifth year, corresponding to the training, validation, and test time and user segments, see Figure 5. Taking a point in one of the time intervals, we define the base and the target segments. The base segment corresponds to the time before the point, while the target segment corresponds to the time after it. For edge sampling validation, we observe the graph state restricted to the base segment, and the target is whether there was at least one transfer between users during this time. For the out-of-time validation setting, the target is whether there is at least one transfer between users during the target segment. We consider the ROC AUC measure as the quality metric for the link prediction task.
5.2 Baselines
Due to the need for scalability, we consider only simple similarity-based approaches, such as Common Neighbors, Adamic-Adar Index, Resource Allocation, Jaccard Index and Preferential Attachment, as baselines for our task (see [20] for a description of the methods). We also take the SEAL model [25] as a baseline (with embedded transactions concatenated with structural labels as node features). The results can be found in Table 1. As we can see, the simple heuristic methods are beaten by the neural network solution. Also, there is a gap in the ROC AUC score between the validation settings. It can be explained by the fact that prediction into the future is a more difficult problem than finding hidden links in the current graph state.
5.3 Implementation Details
We use PyTorch [15] and PyTorch Geometric [7] to implement the models. Each model was trained with the Adam optimizer [10] using a learning rate scheduler and hyperparameter optimization [3] for the number of layers, the size of the layers and the initial learning rate. We used a server with a single GPU (NVIDIA Tesla P100), 32 Intel i7 CPU cores and 512 GB of RAM in all the experiments.
5.4 Link Prediction Results
Table 1: Link prediction results for the baseline methods, ROC AUC.

| Method | Edge sampling | Out of time |
|---|---|---|
| Common Neighbors | 0.398 | 0.629 |
| Adamic-Adar | 0.391 | 0.646 |
| Resource Allocation | 0.35 | 0.639 |
| Jaccard Index | 0.284 | 0.62 |
| Preferential Attachment | 0.746 | 0.497 |
| SEAL | 0.85 | 0.77 |
Table 2: Comparison of pooling operations, ROC AUC.

| Method | Edge sampling | Out of time |
|---|---|---|
| SEAL | 0.85 | 0.77 |
| WL-SEAL | 0.87 | 0.75 |
| 2SEAL | 0.89 | 0.78 |
The first improvement of the initial SEAL model is the new pooling operation. The SEAL and 2SEAL models are described in the previous sections (see Section 3). We additionally consider the WL-SEAL pooling operation, which is based on the idea of the Weisfeiler-Lehman graph isomorphism test. Quite similarly to the idea described in [24], we propose to color the nodes of the enclosing subgraphs by the Palette-WL algorithm (Algorithm 3 in [24]), thereby obtaining a node ordering. After that, we take only the k (a hyperparameter) most significant nodes of the subgraph as the input of the neural network. Thus, all subgraphs have the same size, so there is no need for a pooling operation after the convolution layers. We expect such pooling to be more intuitively meaningful, but the drawback of this model is the computational expensiveness of the coloring algorithm. The results can be found in Table 2. We observe that both WL-SEAL and 2SEAL are superior to SEAL. However, 2SEAL shows the best results while being the less computationally expensive model, which motivates us to focus further studies on this model.
Another set of experiments is devoted to the exploration of the features. In the previous set of experiments on neural networks, we used a concatenation of embedded transactions (the output of an intermediate layer of the RNN which solves a credit scoring task) and structural labels as node features. We provide experiments in different settings of node features: embedded transactions, embedded transactions concatenated with structural labels, structural labels, and modified structural labels (structural labels and modified structural labels are described in Section 3.2.2). Surprisingly, the usage of embedded transactions plays a negative role in the link prediction task. We explain it by the fact that similar purchases do not play a significant role in finding new connections in the network, while the network structure and people's connections are far more important. Also, the modified structural labels (without hiding the link) gave us better performance.
The final set of experiments is based on working with the data corresponding to edges (see Table 3), where we consider different RNN-based models, see the details in Section 3.1. We see that in every setting (except embedded transactions + structural labels for the 2SEAL model), we have a large increase in the ROC AUC score (almost 0.1 in some cases) for the proposed models. We conclude that the 2SEAL model with RNN attention is the best link prediction model for the considered banking dataset.
A summary of the results can be found in Table 4. We observe a significant improvement in the ROC AUC score for the proposed 2SEAL-RNN model compared to the best heuristic approach and SEAL.
Table 3: Link prediction results for different node features (ET: embedded transactions, SL: structural labels), ROC AUC.

| Method | ET | ET+SL | SL | Modified SL |
|---|---|---|---|---|
| SEAL | 0.62 | 0.747 | 0.74 | 0.76 |
| SEAL-RNN | 0.61 | 0.787 | 0.78 | 0.794 |
| 2SEAL | 0.7 | 0.739 | 0.77 | 0.787 |
| 2SEAL-RNN | 0.727 | 0.804 | 0.83 | 0.858 |
Table 4: Summary of the link prediction results.

| Method | Result, ROC AUC |
|---|---|
| Best heuristic | 0.646 |
| SEAL | 0.74 |
| 2SEAL | 0.79 |
| 2SEAL-RNN | 0.858 |
5.5 Credit Scoring Results
In this section, we want to show the applicability of the developed link prediction models to other problems relevant to banking. One of the most important problems in a bank is to control the risks related to working with clients, especially in the process of issuing a loan. This problem is called credit scoring [17], and usually an ensemble of predictive models is used, which are in particular based on user transactional data. For example, an RNN model run on the time series of transactions has been shown to be very efficient in credit scoring [2].
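A sketch of such a pretraining architecture in PyTorch, following the GRU-plus-dense design from Section 3.1.1; the class name, layer sizes and feature dimensions are illustrative, not the paper's exact configuration. The intermediate vector `z` plays the role of the embedded transactions used as node features:

```python
import torch
import torch.nn as nn

class TransactionEncoder(nn.Module):
    """GRU over a client's transaction sequence, followed by dense layers.

    The default-probability head is only used during pretraining on the
    credit scoring task; the intermediate embedding `z` is what gets reused
    as a node feature in the graph models.
    """
    def __init__(self, n_features=3, hidden=32, emb=16):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.embed = nn.Sequential(nn.Linear(hidden, emb), nn.ReLU())
        self.head = nn.Linear(emb, 1)      # credit-default logit

    def forward(self, x):
        _, h = self.gru(x)                 # h: (1, batch, hidden)
        z = self.embed(h.squeeze(0))       # embedded transactions
        return self.head(z), z

enc = TransactionEncoder()
logit, z = enc(torch.randn(4, 10, 3))      # 4 clients, 10 transactions each
```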
The usage of the information available in the network of clients may further improve the prediction quality. We consider a credit scoring dataset of approximately one hundred thousand clients, which is a part of our initial dataset. Our experiments show that a standard Graph Convolutional Network (GCN) [11] trained on these data improves over the baseline RNN model by 0.8% in terms of the Gini index, see Table 5. However, the GCN model is known to treat all the neighboring nodes equally without any prioritization (see the discussion in Section 3.1), which is apparently not correct for bank clients, some of whom have much more influence on a particular client than others. This issue was addressed in the literature by introducing a graph attention mechanism based on the available node features [19].
In our work, we propose to use the developed link prediction model (2SEAL-RNN) as an attention mechanism by reweighing the neighboring nodes with coefficients proportional to the connection probabilities output by the link prediction model. Unlike standard Graph Attention Networks [19], our attention mechanism considers not only the node features but also the topology of the graph, while still allowing the final credit scoring model to be trained in an end-to-end fashion. In Table 5, we compare the performance of GCNs using binary adjacency matrices and adjacency matrices weighed by the link prediction model. We note that we use the embeddings obtained by the RNN as node features in both models. The results show that the link prediction model used as an attention mechanism in the GCN almost doubles the effect of considering the graph structure in the credit scoring problem. We believe that further study of link-prediction-based attention in graph neural networks may lead to even better credit scoring models.
Table 5: Credit scoring results, improvement over the baseline RNN model.

| Method | Result, Gini index |
|---|---|
| Standard GCN | +0.8% |
| GCN with LP-based attention | +1.4% |
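The reweighing described in this section can be sketched as a single NumPy layer; this is a simplified, row-normalized stand-in for the trained end-to-end model, where `link_prob` would come from the 2SEAL-RNN model:

```python
import numpy as np

def lp_attention_layer(H, A, link_prob, W):
    """GCN layer whose adjacency is reweighed by link-prediction scores.

    `link_prob[i, j]` is the predicted probability of a stable connection
    between clients i and j; multiplying it by the binary adjacency A keeps
    only observed edges but lets influential neighbours contribute more.
    """
    weighted = A * link_prob
    norm = weighted.sum(axis=1, keepdims=True)
    norm[norm == 0] = 1.0                      # guard isolated nodes
    return np.maximum((weighted / norm) @ H @ W, 0.0)
```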
6 Conclusion
In this work, we developed a graph convolutional neural network which can efficiently solve the link prediction problem in large-scale temporal graphs appearing in banking data. Our study shows that to fully benefit from the rich transactional data, one needs to efficiently represent such data and carefully design the structure of the neural network. Importantly, we show the effectiveness of Recurrent Neural Networks as building blocks of a temporal graph neural network, including a non-standard approach to the construction of an attention mechanism based on RNNs. We also modify the existing GNN pooling procedures to simplify and robustify them. The developed models significantly improve over the baselines and provide high-quality predictions of the existence of stable links between clients, which provides the bank with a powerful instrument for the analysis of the clients' network. In particular, we show that the usage of the obtained link prediction model as an attention module in the graph convolutional neural network improves the quality of credit scoring.
References
[1] (2001) Friends and neighbors on the web. Social Networks 25, pp. 211–230.
[2] (2019) E.T.-RNN: applying deep learning to credit loan applications. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2183–2190.
[3] (2013) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the 30th International Conference on Machine Learning, ICML'13, pp. I-115–I-123.
[4] (2019) DeepTrax: embedding graphs of financial transactions. CoRR abs/1907.07225.
[5] (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734.
[6] (1994) Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks 5 (2), pp. 240–254.
[7] (2019) Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.
[8] (2017) Representation learning on graphs: methods and applications. IEEE Data Engineering Bulletin.
[9] (2017) Inductive representation learning on large graphs. In NIPS.
[10] (2014) Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR.
[11] (2017) Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR.
[12] (2007) The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology.
[13] (2001) Clustering and preferential attachment in growing networks. Physical Review E 64 (2), pp. 025102.
[14] (2020) EvolveGCN: evolving graph convolutional networks for dynamic graphs. In AAAI.
[15] (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035.
[16] (2010) A comprehensive survey of data mining-based fraud detection research. arXiv preprint arXiv:1009.6119.
[17] (2012) Credit risk scorecards: developing and implementing intelligent credit scoring. Vol. 3, John Wiley & Sons.
[18] (2019) Solve fraud detection problem by using graph based learning methods. arXiv preprint arXiv:1908.11708.
[19] (2018) Graph attention networks. In 6th International Conference on Learning Representations, ICLR.
[20] (2015) Link prediction in social networks: the state-of-the-art. Science China Information Sciences 58 (1), pp. 1–38.
[21] (2018) Scalable graph learning for anti-money laundering: a first look. arXiv preprint arXiv:1812.00076.
[22] (1968) Reduction of a graph to a canonical form and an algebra arising during this reduction.
[23] (2018) Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 974–983.
[24] (2017) Weisfeiler-Lehman neural machine for link prediction. In KDD, pp. 575–583.
[25] (2018) Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems, pp. 5165–5175.