Missing Data as Part of the Social Behavior in Real-World Financial Complex Systems

04/01/2018 ∙ by Guy Kelman, et al.

Many real-world networks are known to exhibit features that contradict the predictions of theories on network formation and communication patterns. A common prerequisite in network analysis is that information on nodes and links be complete, because network topologies are extremely sensitive to missing information of this kind. Therefore, many real-world networks that fail to meet this criterion under random sampling may be discarded. In this paper we offer a framework for interpreting the missing observations in network data under the hypothesis that these observations are not missing at random. We demonstrate the methodology with a case study of a financial trade network, where the agents' awareness of the data collection procedure carried out by a self-interested observer may result in strategic revealing or withholding of information. This non-random missingness has been overlooked, even though it may be an important feature of the processes by which the network is generated. The analysis demonstrates that strategic information withholding may be a valid general phenomenon in complex systems. The evidence is sufficient to support the existence of an influential observer and to offer a compelling dynamic mechanism for the creation of the network.




When individuals are aware of being monitored, their behavior is likely to improve. This intuitive insight was first studied through a sequence of experiments in the field of organizational behavior, conducted at the Hawthorne Works in the 1920s [15]. The trials showed that the workers’ mere awareness of being monitored resulted in increased productivity, regardless of the direction of change in the working conditions. This effect is known as the ‘Hawthorne Effect’, or the ‘Observer Effect’.

In this paper we argue that the presence of a self-interested data collector in financial complex networks may instigate a change in the behavior of participants, such that the network visible to the collector differs from the actual one. This effect has far-reaching implications for our understanding of the processes generating such networks: the way in which the players connect or dissociate, the kind of information that is hidden, and the overall effect of this on the stability and functioning of these systems.

Imagine for a moment that a monitoring entity, such as a federal or regulatory agency, a bank or even a concerned parent, is observing a network of social or business ties (a crime network, the financial transactions of a stock broker, or the social activity of a child in their favorite social network). Assume further that the subjects are aware of this monitoring procedure. It is only natural to expect that some information will be strategically withheld from the monitoring entity. The strategic disclosure of information may be a consequence of the activity in the neighborhood where the agent resides, and of the sensitivity of the agent's payoff to the observer’s presence. Unfortunately, detecting the influential presence of an observer is extremely challenging because the true underlying network is not given to us, and thus it is particularly hard to distinguish the true dynamics from the visible ones. Nevertheless, we believe that this strategic withholding of information from the observer leaves traces on the visible topology as well as on other intrinsic, measurable attributes of the participants.

Although many complex systems in the real world may be influenced by the data collection procedure, only a small subset fulfils the prerequisites that make it possible to test the hypothesis that the visible system was formed in the presence of an influential observer. The prerequisites are: (i) The network appears to be dominated by sampling error, with an abnormally large number of disconnected components, a large diameter, or a small average number of neighbors. (ii) There exists an underlying theory that specifies the motivations of different agents to hide or disclose information, and the interest of the observer in monitoring the ties they form. (iii) The consequent sensitivity of different agents to the observer’s presence is quantifiable and can be retrieved from the data. (iv) The visible network permits measuring the intrinsic attributes of agents, the communication patterns between them, and the incentives to disclose or withhold information. To this list we could add a trivial requirement that is met by most network models: (v) It is possible to obtain topological aggregates of the visible network and to compare them with the topological measures of a network under full information.

We propose a procedure to measure a system under monitoring. The procedure begins by analyzing the topological attributes of the visible network vis-à-vis the expected structure under full information. If major discrepancies appear, we proceed as follows: we identify the observer and the agents, classify agents into classes (species) that exhibit different sensitivities to the presence of the observer, and prescribe an information disclosure strategy to each class. The proposed classification is derived from a known underlying theoretical foundation (e.g. profit maximization, risk aversion, thinking at the margin) that may support such a differential information disclosure strategy. We then check whether the visible network indeed exhibits such species-dependent information exposure, as predicted by the theory, and compare the revealed effect with the one expected under full information. We complete the procedure with a mediation analysis to rule out alternative dynamic mechanisms of the system that are observationally equivalent.

To make our point we searched for a complex system that has interacting agents, and in which the data collector can potentially affect the agents' payoffs. We further required that it be possible to estimate the sensitivities of agents to the observer’s presence. Financial networks naturally meet these criteria. These networks usually consist of profit-maximizing agents (firms) that are grouped into well-defined classes according to attributes such as industry, credit rating, firm size, risk aversion, liquidity, and so on. Moreover, data on these networks are normally recovered by self-interested third-party agencies that interact with subsets of the agents and have a direct and measurable influence on the agents’ payoffs. These self-interested parties may be banks or regulatory agencies.

We were able to locate a large-scale business network in which firms are buyers and sellers and the transactions between them are the ties. These data are collected by a bank for the operational needs of maintaining credit lines with its borrowers. The bank thus plays the role of the observer and has a clear incentive to monitor the financial activity of its customers. This network complies with the prerequisites prescribed above: it is a real-world network, the presence of the bank may motivate the agents to strategically hide or disclose information, and the intrinsic attributes of the agents are quantifiable, allowing us to control for a rich set of covariates, e.g. from firms’ financial statements. Lastly, the topology of the network may provide additional insight into the processes generating such networks. These generating processes could be unraveled if we were able to use the network to discover production chains, or rather production trees, by means of flow dynamics, or to locate central nodes by means of agent-based traversal.

Many studies of financial networks [44], [40], [5] assume that the network data are complete, in the sense that the visible nodes and links are all there is. This working assumption is shared across disciplines that use networked data, and sampling procedures are usually randomized in order to correct for known biases [38], [23], [22]. Yet what if links are missing intentionally, that is, not at random? And what if the process that causes links to be missing is itself a process of the network? With these questions in mind, it may be worthwhile to revisit the body of work on trade credit [34], [36] and on stock exchange trading and modeling, bearing in mind that another kind of process that causes missing data may exist, and that this process is part of the network’s generating mechanism. We believe that our approach offers a change of perspective in the analysis methodology, one that may constitute a proper statistical examination of this possible non-random cause of missingness, non-response [43], [24] and non-observability [38].

Measuring real-world complex networks has become a growing trend in the past decade [21], [25], [20], mainly for two reasons: (1) the development of methods for deducing the processes underlying the generation of a network from its structure [37], [11], [31], and (2) the growing availability of detailed digital data on the interactions between participants. However, the overall impact of many such processes deduced from network structures has been called into question because of the suspicion that links were removed during sampling, before the network model was built. The accidental removal of even a single link may, in some cases, deform the realized network’s topology so that the researcher interprets the structure inaccurately and forms a false understanding of the real processes. Furthermore, it is sometimes useful to predict the missing links in a network in order to obtain a workable structure [28], [1]. Link prediction, however, requires knowing the class of network under investigation. A false inference about the generating process may lead to a wrong guess of the network class and may pull the prediction toward a network model that is not true to reality. It is therefore not surprising that a myriad of such ‘damaged’ networks are, and were, discarded.
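The sensitivity to a single missing link is easy to demonstrate on toy graphs. The following pure-Python sketch is our own illustration, not part of the study; the helper names and the two example graphs are hypothetical. It counts connected components and computes the diameter (the longest shortest path within any component) of an undirected graph before and after one link is deleted.

```python
from collections import deque


def build_adjacency(edges, nodes):
    """Undirected adjacency sets; isolated nodes are kept as well."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def summarize(edges, nodes):
    """Return (number of connected components, diameter over all components)."""
    adj = build_adjacency(edges, nodes)
    seen, components, diameter = set(), 0, 0
    for n in adj:
        dist = bfs_distances(adj, n)
        if n not in seen:  # n starts a component we have not visited yet
            components += 1
            seen.update(dist)
        diameter = max(diameter, max(dist.values()))
    return components, diameter


# A 6-node ring, and two triangles joined by a single bridge link (2, 3).
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
bridged = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]

print(summarize(ring, range(6)), summarize(ring[:-1], range(6)))        # (1, 3) (1, 5)
print(summarize(bridged, range(6)), summarize(bridged[:-1], range(6)))  # (1, 3) (2, 1)
```

Deleting one link from the ring stretches the diameter from 3 to 5, and deleting the bridge splits the second graph into two components, so a single missing observation can change how the whole structure is read.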

In a later study [19], the authors address the problem of the reliability of links. They use a stochastic block model framework to create a reliability measure for individual links, given the observation and a family of acceptable realizations thereof. The link reliabilities are used to identify missing and spurious interactions in the observed network. The authors then test the performance of their approach by randomly adding and removing links from five high-quality, error-free networks, obtaining high accuracy in feature recovery. While their approach is designed to predict the potential appearance of spurious links, our methodology aims to provide non-invasive tools that help determine whether nodes or links are missing at random, or whether strategic, systematic missingness is more likely. For this reason, we stress that this study does not aspire to offer a method for link prediction or to draw inferences on the topology of the true network.

Our paper is related to several other distinct branches of existing literature. One is on online self-disclosure, impression management, and information manipulation in social networks. This literature demonstrates the prevalence of information manipulation regarding participants’ physical attributes on online dating sites [45], and the role of the number of friends, and of their appearance and behavior, in evaluations of individuals in social networks [47], [46], [49]. Further, it relates to discussions of the strategic, signalling role of displaying one’s social connections on social network profiles [9], [10]. The main empirical finding in these papers is that the popularity and attractiveness of individuals in social networks are strongly affected by their friends’ appearance. While our study finds a similar effect in a real-world financial network, it has a different focus and different implications. Popularity, which determines social status and payoff, is replaced by creditworthiness, which determines financial costs and hence economic profits. Rather than focusing on the appearance of one’s friends and its effect on social status, our work is centered on developing a method to uncover strategic information withholding, given the varying sensitivities of different agents’ payoffs to the appearance of their social or business ties. Our main claim is that manipulated information affects the part of the network that is visible, rather than the status of individuals, and this is what we seek to explore.

Further, the paper is related to the literature on hidden populations. This literature highlights the usefulness of networks in resolving the problem of sampling subgroups that are deliberately missing from the records [39], [14], [24]. Several studies are concerned with the problem of missing data in networks in longitudinal studies, where the evolution of the network structure is part of the cause of missingness [23]. Sampling and subsequent analyses of hidden populations, such as cocaine users, sexually transmitted disease networks, criminal networks or other networks where observations are missing, may be carried out by link tracing [35], [41] or link prediction [8], [33]. Alas, in many cases link tracing is not possible, since a second round of queries to the actors may not be attainable, and link prediction should be avoided for the reasons mentioned above. Some argue that more, and more detailed, data are needed to accurately derive the nature of the social effect [29]. However, if data are missing not at random, gathering more data will not improve the network description.
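The last point can be checked with a short simulation. In this sketch (our illustration; the uniform 1 to 10 degree population and the two observation rules are assumptions), we estimate the mean degree from the observed nodes only. Under missingness completely at random (MCAR) the error shrinks as the sample grows; under missingness not at random (MNAR), where high-degree nodes hide more often, it does not.

```python
import random


def keep_mcar(value):
    """Missing completely at random: every node is observed with probability 0.5."""
    return 0.5


def keep_mnar(value):
    """Missing not at random: a node of degree v is observed with probability 1/v."""
    return 1.0 / value


def estimation_bias(n, keep_prob, seed=0):
    """Observed mean degree minus true mean degree for a population of n nodes."""
    rng = random.Random(seed)
    degrees = [rng.randint(1, 10) for _ in range(n)]  # hypothetical degree population
    observed = [d for d in degrees if rng.random() < keep_prob(d)]
    return sum(observed) / len(observed) - sum(degrees) / n


for n in (1_000, 10_000, 100_000):
    print(n, round(estimation_bias(n, keep_mcar), 3), round(estimation_bias(n, keep_mnar), 3))
```

Under the MNAR rule the observed mean converges to $10 / H_{10} \approx 3.41$ (with $H_{10}$ the 10th harmonic number) instead of the true 5.5, so the bias stays near -2.09 however large n becomes.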

In a recent paper directly related to ours [42], the authors make a first systematic attempt to account for non-random missingness of links and its effect on estimates of key network statistics, by removing nodes through a weighting process that factors in both node centrality and chance. Using selected networks and network measures, and controlling for the portion of missing links, the authors run a set of Monte Carlo simulations that remove nodes from these networks. Finally, they compare the simulated networks with the original ones in order to gauge the level of bias that missing links impose. Their main finding is that bias is worse when central nodes are missing. While their approach is intended to gauge the level of bias, given the percentage of links missing and the species (e.g. central vs. non-central), our contribution is to determine how likely it is that the given visible network was formed through a process of strategic withholding of links from an influential observer, without any prior knowledge of the node species and the percentage of missing links.
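The weighting idea can be illustrated with a toy Monte Carlo experiment. This is a simplified sketch in the spirit of that design, not the procedure of [42]: the hub-and-leaf graph, the removal weights (degree + 1), the 20% removal fraction and the choice of mean degree as the statistic are all our assumptions.

```python
import random


def hub_graph(hubs=5, leaves_per_hub=30):
    """Hypothetical toy graph: fully linked hubs, each with its own private leaves."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    hub_ids = [("hub", h) for h in range(hubs)]
    for i, hub in enumerate(hub_ids):
        for other in hub_ids[i + 1:]:
            link(hub, other)
        for j in range(leaves_per_hub):
            link(hub, ("leaf", i, j))
    return adj


def mean_degree(adj, alive):
    """Mean degree of the subgraph induced by the surviving nodes."""
    return sum(len(adj[u] & alive) for u in alive) / len(alive)


def surviving_nodes(adj, fraction, weighted, rng):
    """Remove a fraction of nodes, uniformly or with probability proportional to degree + 1."""
    alive = list(adj)
    for _ in range(int(fraction * len(alive))):
        weights = [len(adj[u]) + 1 if weighted else 1 for u in alive]
        alive.remove(rng.choices(alive, weights=weights)[0])
    return set(alive)


def monte_carlo_bias(adj, fraction, weighted, runs=200, seed=1):
    """Average (sampled mean degree - full mean degree) over repeated removals."""
    rng = random.Random(seed)
    full = mean_degree(adj, set(adj))
    estimates = [mean_degree(adj, surviving_nodes(adj, fraction, weighted, rng))
                 for _ in range(runs)]
    return sum(estimates) / runs - full


graph = hub_graph()
print("uniform removal:", round(monte_carlo_bias(graph, 0.2, weighted=False), 2))
print("degree-weighted:", round(monte_carlo_bias(graph, 0.2, weighted=True), 2))
```

On this graph, removing nodes with probability proportional to degree typically takes out most hubs and drives the estimated mean degree far below its true value of about 2.06, whereas uniform removal biases it much less, which mirrors the finding that bias is worse when central nodes are missing.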

The rest of the paper is organized as follows: the Materials and Methods section gives basic definitions and formal concepts, and describes the procedure. In the section "A case study in the real-world" we apply the methodology to a large-scale real-world inter-firm trade network. We conclude and discuss several implications and limitations of our method in the last two sections. In the appendices we give an overview of financial trade networks and lay out descriptive statistics that accompany the case study.

Materials and Methods

In this section we lay out the formalism of two mutually dependent network models: the monitored network $G = (V, E)$ of nodes $V$ and links $E$, and the true network $G' = (V', E')$ that has nodes $V'$ and links $E'$. The nodes perform interactions, and so they will be referred to as agents. Further, a single, unique observer exists in the system, who can see the monitored network $G$ but not the true network $G'$. This observer interacts with agents and may reward them based on the agents to which they are linked. We assume that the visible network is a subset of the true one. Formally we write $G \subseteq G'$, and consequently $V \subseteq V'$ and $E \subseteq E'$.

We mark the symbols designating the true network with a prime ($'$).

Basic Definitions

We consider the network of $|V'|$ agents, of which only $|V|$ are visible to the observer. A directed, weighted link exists between a pair of agents if they interact. An ordered pair $(c, s)$ designates a directed link between an interacting pair of nodes $c$ and $s$, who play different roles in this interaction. Each directed link has an associated weight $w_{cs}$, and similarly $w_{c's'}$ for the pair $(c', s')$. To motivate this, we could imagine a network of (c)ustomers who pay amounts $w_{cs}$ upon purchasing goods from (s)ellers.

Let $\Theta$ be the set of all possible agent species, with a typical element $\theta \in \Theta$. The agent’s species is relevant in our set-up, as the observer determines an agent’s payoff based on its species.

Following standard Bayesian analysis, there exists a prior probability distribution $p(\theta)$ over the set $\Theta$ which is common knowledge, such that $p(\theta)$ is known for every $\theta \in \Theta$. The categorized species of an agent could be its credit rating, as discussed in the section "A case study in the real-world". We denote the species of agent $s$ by $\theta_s$. It is noteworthy that $\theta_s$ may not be accessible to the observer, and this fact will come in handy later on. In a similar way we can define the bandwidth function $w(s)$, which operates on the set of agents neighboring $s$ to extract the weights on the links pointing to agent $s$: $w(s) = \{ w_{cs} : (c, s) \in E \}$. We will write $w_s$ instead of $w(s)$ where the meaning is obvious.

A pair $(c', s')$ in the true network may not be visible to the observer. This can happen only if the pair lies in the complement of the visible network, i.e. in

$$\bar{G} = G' \setminus G. \qquad (1)$$

Pairs written as $(c, s)$ are strictly members of the visible network $G$, and pairs written as $(c', s')$ are strictly invisible to the observer.

We define the set of actions available to the observer as $A$, with a typical action $a \in A$. Similarly, we denote by $M$ the set of actions available to the agent, and call this set the messages. A message $m_s \in M$ sent out by agent $s$ gives away part of the information about the agents that are directly and visibly linked to $s$,

$$N(s) = \{ c : (c, s) \in E \}.$$

In contrast, the true neighborhood of $s$, $N'(s)$, may encapsulate $N(s)$ and can be defined as

$$N'(s) = \{ c : (c, s) \in E' \} \supseteq N(s).$$
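These set relations map directly onto Python sets. In the minimal sketch below the edge lists and agent names are invented for illustration: it represents the visible and the true link sets as sets of ordered (customer, seller) pairs, computes the links that are invisible to the observer, and compares the visible neighborhood of a seller with its true neighborhood.

```python
# Hypothetical toy data: a pair (c, s) means customer c pays seller s.
true_edges = {("c1", "s1"), ("c2", "s1"), ("c3", "s1"), ("c3", "s2")}  # true links
visible_edges = {("c1", "s1"), ("c3", "s2")}                          # visible links
assert visible_edges <= true_edges  # the visible network is a subset of the true one


def neighborhood(edges, s):
    """Customers c such that the link (c, s) belongs to the given edge set."""
    return {c for c, seller in edges if seller == s}


hidden_edges = true_edges - visible_edges      # links in the complement of G within G'
n_visible = neighborhood(visible_edges, "s1")  # visible neighborhood of s1
n_true = neighborhood(true_edges, "s1")        # true neighborhood of s1
print(sorted(hidden_edges), n_visible <= n_true)  # [('c2', 's1'), ('c3', 's1')] True
```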

Examples of these definitions can be seen in figure 1: , , , and . and its neighborhood are all in . The messages passed to observer , and , can be any set of parameters on and .


















Figure 1: A sketch of the possible situations in a monitored network. Following (1), the boxed area is $G$, where information on the agents and the links is visible. Dashed links and hollow nodes outside the boxed area are in $G' \setminus G$. The blue agents are either in contact with the observer or exposed to it. The double arrows carry messages between the agents and the observer. Walk-through examples are given in the text.

We now focus on the right member of the ordered pair, $s$, whom we define as the agent that may interact with the observer. This member of the pair may choose to play strategies in which part or all of the information is concealed from the observer. For this reason, the species of agent $s$ has two components: a historical component $\theta_s^{h}$, based on past interactions with the observer, and a social interaction component $\theta_s^{n}$:

$$\theta_s = (\theta_s^{h}, \theta_s^{n}).$$

The reason for this separation is that $\theta_s^{h}$ is accessible to the observer at all times, but the neighborhood-related component $\theta_s^{n}$, which potentially affects the payoffs, may not be. To illustrate the use of $\theta^{h}$ and $\theta^{n}$, consider the credit rating of a firm, which reflects the borrower's historical payment behavior. However, the overall risk of default might also depend on the firm's current business activities with its first-degree financial neighborhood. Thus $\theta_s^{n}$ may be the mean financial cost among all of its trade partners. Hereafter we define $\eta_s \equiv \theta_s^{n}$ for brevity and call it the neighborhood indicator of agent $s$.

Asymmetry of information thus arises because the neighborhood indicator is selectively disclosed and must be estimated by the observer from a probability distribution. Hence, in order to take proper action, the observer estimates the species of $s$ based on the prior probability $p(\theta)$ and on the additional information delivered by the message $m_s$.
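This estimation step is an application of Bayes' rule, $p(\theta \mid m) \propto p(m \mid \theta)\, p(\theta)$. In the sketch below the species labels, the message types and the likelihoods $p(m \mid \theta)$ are invented for illustration; the document does not specify them.

```python
def posterior(prior, likelihood, message):
    """Bayes' rule: p(theta | m) is proportional to p(m | theta) * p(theta)."""
    unnormalized = {theta: prior[theta] * likelihood[theta][message] for theta in prior}
    total = sum(unnormalized.values())
    return {theta: w / total for theta, w in unnormalized.items()}


# Hypothetical numbers: two species and two message types.
prior = {"safe": 0.7, "risky": 0.3}
likelihood = {
    "safe": {"full": 0.8, "partial": 0.2},
    "risky": {"full": 0.3, "partial": 0.7},
}
print(posterior(prior, likelihood, "partial"))
```

With these assumed numbers, receiving only a partial message raises the observer's belief that the agent is risky from 0.3 to 0.6.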

A possible message may contain the identities of all the neighbors of $s$:

$$m_s = N'(s). \qquad (2)$$

However, the agent may play a strategy of disclosing only partial information about its neighbors, i.e. sending a message $m_s \subset N'(s)$. We call a message that is a subset of $N'(s)$ the information set of node $s$.

In several set-ups more information is carried than only the set of neighboring nodes; we will encounter such a set-up in the example section. Thus (2) may be redefined as

$$m_s = \{ (c, x_c) : c \in \tilde{N}(s) \}$$

for a subset $\tilde{N}(s) \subseteq N'(s)$, where $x_c$ may be any set of arguments intrinsic to $c$. The information exposure of each node could then be a summary indicating the difference between $m_s$ and its counterpart under full information.
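The text leaves the exact summary open. Two possible choices, both our assumptions rather than the paper's definition, are the share of the seller's true invoiced volume that its message reveals and the share of its true neighbors that it reveals; a value of 1 corresponds to full information.

```python
def exposure_by_weight(true_weights, disclosed):
    """Share of the seller's true inbound volume that appears in its message."""
    total = sum(true_weights.values())
    shown = sum(w for c, w in true_weights.items() if c in disclosed)
    return shown / total if total else 0.0


def exposure_by_count(true_weights, disclosed):
    """Share of the seller's true neighbors that appear in its message."""
    if not true_weights:
        return 0.0
    return len(set(disclosed) & set(true_weights)) / len(true_weights)


# Hypothetical seller with three true customers, of whom only c1 is disclosed.
true_weights = {"c1": 100.0, "c2": 50.0, "c3": 50.0}
message = {"c1"}
print(exposure_by_weight(true_weights, message), exposure_by_count(true_weights, message))
# 0.5 0.3333333333333333
```

The two summaries can disagree: a seller that reveals only its largest customer looks more exposed by volume than by count.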

The payoff that every player would like to maximize for his own benefit is a scalar, denoted by $u$. Payoffs are earned after each single binary interaction between the observer and an agent. There are three determinants of the payoffs: the agent's species, its message and the observer’s action. This can be written as

$$u = u(\theta_s, m_s, a),$$

where the payoff of agent $s$ is $u_s$, and the observer’s payoff from acting upon $s$ is $u_o(\hat{H}, m_s, a)$. Here, $\hat{H}$ is the set of all neighborhood indicators estimated from the set of messages $\{ m_s \}$.

Let us now recall the previously drawn distinction between monitored and unmonitored networks. Designating the social ties involves, in general, two dimensions: the species of the customers and of the supplier $s$, and the weights $w_s$ on the links pointing to $s$. With perfect information, the observer may take the action $a^{*}$ corresponding exactly to these parameters and receive the optimal payoff $u_o^{*}$ in a transaction with agent $s$. If information is reduced, the observer must complement the missing details with what he has. The optimal payoff then becomes $\hat{u}_o(m_s, \mathbb{E}[\theta_s], \hat{a}^{*})$, a function of the reduced message $m_s$, the expected species of the agent $\mathbb{E}[\theta_s]$, and the now-optimal action $\hat{a}^{*}$. Hereafter, in certain contexts, we may omit the factors that $u$ implicitly relies on.

For two individuals $s_1$ and $s_2$ of different species, $\theta_{s_1} \neq \theta_{s_2}$, we say that $s_1$ is more sensitive than $s_2$ to the impact of its relationship with its neighborhood whenever its expected payoff is more responsive to the neighborhood indicator $\eta$. Namely,

$$\left| \frac{\partial\, \mathbb{E}[u_{s_1}]}{\partial \eta} \right| > \left| \frac{\partial\, \mathbb{E}[u_{s_2}]}{\partial \eta} \right|.$$

From this follows the information disclosure strategy of each species towards the observer, in the sense that $s_1$, being of species $\theta_{s_1}$, is more selective in exposing information. This condition motivates our approach of measuring the network under the hypothesis of an influential observer.
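The sensitivity condition compares slopes of expected payoffs, which can be checked numerically once payoff curves are given. In this minimal sketch the two expected-payoff functions are hypothetical, and the derivative is approximated by a central finite difference.

```python
def sensitivity(expected_payoff, eta, h=1e-6):
    """Central finite-difference estimate of |dE[u]/d eta| at eta."""
    return abs(expected_payoff(eta + h) - expected_payoff(eta - h)) / (2 * h)


def more_sensitive(payoff_1, payoff_2, eta):
    """True if species 1's expected payoff responds more strongly to eta than species 2's."""
    return sensitivity(payoff_1, eta) > sensitivity(payoff_2, eta)


# Hypothetical expected-payoff curves for two species.
def payoff_risky(eta):
    return 5.0 - 3.0 * eta ** 2


def payoff_safe(eta):
    return 8.0 - 0.5 * eta


print(more_sensitive(payoff_risky, payoff_safe, eta=0.5))   # True
print(more_sensitive(payoff_risky, payoff_safe, eta=0.05))  # False
```

Note that the comparison is local: with these curves the risky species is more sensitive at $\eta = 0.5$ (slope 3 vs. 0.5) but not at $\eta = 0.05$ (slope 0.3 vs. 0.5), so the condition has to be evaluated at the relevant values of the neighborhood indicator.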

The next section applies these diagnostics to a case study of a real-world business network. There, we classify the agents into species in the form of credit-rating classes and identify the self-interested observer in this network, namely the bank. We then show the sensitivity that firms of different rating classes exhibit to the impact of the bank’s rating mechanism and, given this differential sensitivity, define the information exposure that gives rise to two main disclosure strategies: as expected, agents that are more sensitive to their neighborhood’s trading pattern are more selective in giving away information to the bank. We explain these findings by means of standard economic theory, and conclude with a mediation analysis to rule out alternatives to the hypothesis that a monitoring entity exists.

For general background on financial networks, and for descriptive statistics on our network, which was collected by an Italian bank, the reader is referred to appendices A and C.

A case study in the real-world

In this section we apply the analysis to a test case from the real financial world. We follow the procedure prescribed in the introduction, namely:

  • Present evidence on network attributes in contrast to what would be expected under full information.

  • Describe a theory from which a species-dependent information exposure is likely to be present.

  • Present the revealed pattern in our test case network compared to the one expected under random sampling design.

  • Perform a mediation analysis to rule out the possibility that this observation is explained by other intervening variables.

Let us now consider the financial system mentioned previously, where the agents are firms that buy from and sell to each other using invoices (later termed trade credit), and the observer is a bank. The theoretical foundation is the profit maximization principle [48]; the species are the risk classes of agents; the payoffs are the profit functions of the firms; the messages that firms send result from a strategic choice of how many sales invoices to expose and which of them to bring to the bank as collateral for a loan; and the strategies that the bank plays are the interest rates charged on these loans. Both the agent and the observer play their strategies conditional on the species of the agent.

We will soon find that many observations on this network put within reach a possible rejection of the hypothesis that the network is unmonitored. A brief, non-exhaustive list of evidence includes the elimination of most nodes and links by a minimal cleaning procedure, the complete fragmentation of the network into paths shorter than the production chain [26], and the inability to detect the propagation of distress in the 2008 financial crisis [18].

For further details on financial networks in general and trade credit networks we refer the reader to appendix B.

Network characteristics

In order to convince the reader that the network has a problem, we contrast several stylized facts. On the one hand, (a) the network is directed and is an agglomerate of trees (so that money is expected to travel long distances), (b) the network is negatively assorted (so that agents of similar connectivity do not trade with each other), and (c) the structure of the system is similar to a social network. On the other hand, (d) money travels short distances (long paths are not utilized), (e) buyers are also sellers (the network is not bipartite, and loops should occur), and (f) the agents have a strong tendency to affiliate with similar agents (contrasting with the negative degree assortativity).

Taken together, items (a) through (f) raise the possibility that an influential observer exists. This fulfils the first prerequisite and warrants further testing according to our proposed methodology.

Network                       email          LJ               WikiTlk         Amazon          Our
Nodes                         265214         4847571          2394385         403394          345403
Edges                         420045         68993773         5021410         3387388         2874830
Nodes in largest WCC          224831 (0.85)  4843953 (0.99)   2388953 (0.99)  403364 (1.00)   341023 (0.98)
Edges in largest WCC          395270 (0.94)  68983820 (1.00)  5018445 (0.99)  3387224 (1.00)  1091873 (0.38)
Nodes in largest SCC          34203 (0.13)   3828682 (0.79)   111881 (0.05)   395234 (0.98)   101186 (0.29)
Edges in largest SCC          151930 (0.36)  65825429 (0.95)  1477893 (0.29)  3301092 (0.97)  193198 (0.56)
Clustering coefficient        0.0671         0.2742           0.0526          0.4177          0.003
Triangles                     267313         285730264        9203519         3986507         195318
Fraction of closed triangles  0.001373       0.04266          0.001112        0.06206         2.04e-05
Diameter                      14             16               9               21              20
90% effective diameter        2.5            6.5              4               7.6             12
Table 1: Comparison of topological measures in directed networks that may possess a social component. The parentheses give proportions of the full data. Our network is in the rightmost column; the other data sets are from [27]. The clustering coefficient and the fraction of closed triangles indicate a directed, tree-like structure, which is expected from a production chain. However, together with the large diameter, this contrasts with the reciprocity of roles, namely that half of the nodes are both buyers and sellers. WCC = weakly connected component, SCC = strongly connected component.

We begin our exploration by looking at some bare topological measures of our system and of comparable ones. Table 1 lists networks that are known to be directed and to possess a social component. Our network, after a cleaning procedure based on financial considerations, is presented in the rightmost column. We elaborate on the cleaning procedure in appendix C.

Browsing the common measures of the directed network, we find that the connected components are small. The largest component has 101,186 nodes (out of a total of 345,403 nodes) and a diameter of 20. Of the remaining 244,217 nodes, 239,780 are situated in small clusters consisting of a single link each. In many of the subgraphs the central nodes deal in small contract sizes, e.g. phone companies or couriers, each of which is financially irrelevant to the system because of the extremely small total transaction volume between it and each of its peers. Filtering out these irrelevant firms causes the network to break apart completely.
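The fragmentation effect can be reproduced on a toy example. The sketch below is hypothetical: the phone-company hub and the firm names are invented, and the dropped set is given explicitly rather than derived from a transaction-volume threshold, which would be the financially motivated criterion. It measures the largest weakly connected component (link directions ignored) with a union-find structure before and after removing a low-value hub.

```python
def largest_weak_component(nodes, edges):
    """Size of the largest weakly connected component, via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


def drop_nodes(nodes, edges, dropped):
    """Remove the given nodes and every link touching them."""
    kept = [n for n in nodes if n not in dropped]
    return kept, [(u, v) for u, v in edges if u not in dropped and v not in dropped]


# Hypothetical toy data: 20 firms all pay one phone company; real trade links form 10 pairs.
firms = [f"f{i}" for i in range(20)]
nodes = ["phone"] + firms
edges = [(f, "phone") for f in firms] + [(firms[2 * i], firms[2 * i + 1]) for i in range(10)]

print(largest_weak_component(nodes, edges))                          # 21
print(largest_weak_component(*drop_nodes(nodes, edges, {"phone"})))  # 2
```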

Next we address the question of whether the network is similar to a social network, based on the interactions of seller-buyer pairs and their respective credit ratings. The distribution of the in-degree of sellers often reveals information about the way a network is formed. Here the in-degree distribution follows a power law, indicating an association mechanism similar to popularity. Although the seller is required to recruit the buyers, the literature on diffusion shows an inevitable coupling of mass media (external factors) and word of mouth (local factors); a good historical review of market models can be found in [17]. In order to produce a power-law degree distribution, the seller’s own reputation must act to reverse the damping effect of diffusion driven by, e.g., advertisement. This confirms our point that the network is social.
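Whether an in-degree distribution is consistent with a power law is usually checked by fitting the exponent. This is a minimal sketch, assuming the continuous maximum-likelihood estimator $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$ with a known $x_{\min}$ (the document does not say how its fit was done); it is checked on synthetic Pareto data rather than on the trade-credit degrees.

```python
import math
import random


def powerlaw_alpha_mle(samples, xmin):
    """Continuous MLE of alpha for a density p(x) ~ x**(-alpha) on x >= xmin."""
    tail = [x for x in samples if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


# Synthetic check: random.paretovariate(a) has density ~ x**(-(a + 1)) for x >= 1.
rng = random.Random(42)
samples = [rng.paretovariate(1.5) for _ in range(50_000)]
print(round(powerlaw_alpha_mle(samples, xmin=1.0), 2))  # close to 2.5
```

For integer-valued degrees, a discrete correction and a data-driven choice of $x_{\min}$ are usually needed; this sketch omits both.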

As for peer interaction in a financial context, the credit ratings of sellers and their buyers are measures external to the network’s topology. Using these, we present further evidence supporting our claim of non-random missingness. Generally, large firms deal with large contracts and small firms with small ones. Although there are exceptions, e.g. the phone and courier companies, figure 2 indicates that having many buyers implies both that the median contract size is small and that the neighbors’ degree is likely to be small. The conclusion is that the clustering around a seller is negatively assorted. In other words, highly connected firms tend to be positioned away from one another in the network, which renders the network more vulnerable to systemic shocks (removing the highly connected sellers has an impact across the whole network because their neighborhoods are not densely interconnected). The merit here is that the network is less likely to percolate, in the sense that distress does not spontaneously amplify itself [32].
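Both the average neighbor degree behind figure 2(b) and the degree correlation behind the disassortativity claim can be computed directly from a link list. This sketch treats links as undirected, which simplifies the directed seller-buyer setting; the star example is hypothetical, and on it the degree correlation takes its extreme value of -1.

```python
import math
from collections import defaultdict


def degrees_and_neighbors(edges):
    """Undirected degree and neighbor sets for each node."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return {n: len(s) for n, s in nbrs.items()}, nbrs


def average_neighbor_degree(edges):
    """Mean degree of each node's neighbors."""
    deg, nbrs = degrees_and_neighbors(edges)
    return {n: sum(deg[m] for m in nbrs[n]) / len(nbrs[n]) for n in nbrs}


def degree_assortativity(edges):
    """Pearson correlation of degrees at the two ends of each link (both orientations)."""
    deg, _ = degrees_and_neighbors(edges)
    xs = [deg[u] for u, v in edges] + [deg[v] for u, v in edges]
    ys = [deg[v] for u, v in edges] + [deg[u] for u, v in edges]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)  # undefined (zero division) for regular graphs


# Hypothetical seller "s" with five single-link buyers: a star graph.
star = [("b1", "s"), ("b2", "s"), ("b3", "s"), ("b4", "s"), ("b5", "s")]
print(average_neighbor_degree(star)["s"], degree_assortativity(star))  # 1.0 -1.0
```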

Figure 2: (a) Median contract size (mass) and (b) average buyer (neighbor) degree (number), plotted against the seller’s degree for the sellers in the full trade-credit network (n=273,726). The network is disassortative both in connectivity and in mass. The slope in 2(b) can be approximated by .

Further, we created a cross tabulation of RATING scores for all sellers in the data set (cf. appendix C) and for those of their buyers that have RATING information; it is summarized visually in figure 3. The RATING of the seller is in the columns and the RATING of the buyer is in the rows. Table elements are therefore counts of all possible pairs of RATING scores in the data. Essentially, this describes the RATING at the two ends of each link between trading firms.

Figure 3: Affinity between sellers and buyers. A compressed tile mosaic of the cross tabulation RATING seller × RATING buyer of the 2,802,976 pairs. The plotting scheme is described in [16].

From this table we computed a $\chi^2$ test of independence of the categories. The test produced a statistic with 56 degrees of freedom and a p-value that is numerically zero. We can therefore reject complete independence between the RATING of a seller and the average RATING of his buyers, which suggests a tendency of sellers to affiliate with buyers having a similar RATING.

The figure shows the tile mosaic of the paired RATING classes, with color coding that reveals the tendency of sellers to associate with buyers of similar RATING; the area of each tile in the mosaic is proportional to the number of pairs in which the seller has RATING=X and the buyer has RATING=Y. A blue tile marks a significantly higher than expected pair count, and a red tile a significantly lower than expected pair count.
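The test of independence and a residual-based shading can be sketched in a few lines. In this sketch the residuals are Pearson residuals $(O - E)/\sqrt{E}$, the color threshold of 2 is an assumption rather than the exact scheme of [16], and the 3×3 table is invented. The p-value requires the $\chi^2$ survival function, e.g. `scipy.stats.chi2.sf(stat, dof)` if SciPy is available.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom and Pearson residuals."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    statistic, residuals = 0.0, []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total  # count under independence
            statistic += (observed - expected) ** 2 / expected
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    dof = (len(row_sums) - 1) * (len(col_sums) - 1)
    return statistic, dof, residuals


def tile_color(residual, threshold=2.0):
    """Hypothetical shading rule: blue above the threshold, red below its negative."""
    if residual > threshold:
        return "blue"
    if residual < -threshold:
        return "red"
    return "gray"


# Hypothetical 3x3 RATING cross tabulation (rows: buyer class, columns: seller class).
counts = [[50, 10, 5], [10, 40, 10], [5, 10, 45]]
stat, dof, res = chi_square_independence(counts)
print(round(stat, 1), dof, [[tile_color(r) for r in row] for row in res])
```

In this invented table the diagonal (same-class pairs) is shaded blue and every off-diagonal tile red, which is the pattern of homophily described for figure 3.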

Combining the stylized facts with the additional financial parameters still leaves a rather incomplete picture of the full network structure. Furthermore, the financial insight (such as contract sizes or production-chain lengths) adds to the uncertainty in classifying the static and dynamic properties of this network. The network measures thus encourage us to take the next step of providing a theory.

Theoretical foundations for strategic information exposure by different classes of agents

The principle of thinking at the margin, a profit maximization approach

Is there any systematic difference in the amount of information that different firms expose to the bank? Presenting invoices as collateral against a loan with a higher face value than is really needed may have a non-trivial effect on the borrower’s loan terms. On the one hand, with the larger collateral presented, the face value of the loan will be greater. On the other hand, by exposing an invoice representing a sale made to a non-credit-worthy buyer, the borrower runs the risk of tarnishing their own reputation [49]. To understand how these two conflicting effects play out, we resort to the fundamental economic principle of “thinking at the margin”, or marginalism. Its application here is straightforward: once enough collateral has been discounted to finance the production stage, the additional credit that yet another invoice would extract from the bank is no longer needed to cover production costs.

Figure 4: A test rejecting complete independence between a buyer firm’s credit rating and their total invoiced obligations (df=72). The mosaic suggests a tendency of the buyers to partition into two groups: credit-worthy buyers that pay large amounts in trade credit, and credit-constrained buyers that, in addition, occupy the low end of the market in purchasing power. We highlight two points: (1) middle-rated firms make purchases to a greater extent than the other classes (both high and low), and (2) risky firms make purchases on credit an order of magnitude smaller than the other classes.

While marginalism motivates withholding certain invoices from the bank, it remains to explain why this scenario applies most to middle-rated firms. To complete the picture, we introduce the well-known concept of homophily, the tendency of agents to associate with others bearing similar characteristics. This phenomenon has been found in many social systems, and in many markets in particular (e.g. the labor market, where profitable firms match with high-productivity workers, or the marriage market [4]). As seen in figure 3, this is also the case in our trade network: sellers associate with buyers of the same rating. Figure 4, in turn, provides evidence that low-credit-rated buyers tend to make small purchases. Combined with the marginalism argument above, this has a straightforward practical implication: credit-constrained sellers are unlikely to sell to large and reputable buyers, so should they require financing, they may collateralize the bulk of their invoices, large and small.

Sellers with solid credit histories follow a similar practice. They enjoy good loan terms, so, aiming to obtain as much ‘cheap’ credit as possible, these high-rated sellers have no reason to withhold invoices, since most of their invoices do not incur the implicit penalty of worse loan terms. In between are the middle-rated sellers, who sell to both high- and low-rated buyers (large and small). These sellers would be expected to prioritize their receivables when presenting them for discounting.
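The marginalist argument can be made concrete with a toy selection rule. The sketch below is purely illustrative, and all names in it (`Invoice`, `buyer_risk`, `select_collateral`) are hypothetical. It assumes a seller knows its financing need and an ordinal risk score for each buyer, and that it presents invoices from the safest buyers first until the need is covered. The withheld invoices are then exactly the marginal, riskier ones that the text argues a middle-rated seller would keep away from the bank.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    buyer: str
    face_value: float
    buyer_risk: int  # hypothetical ordinal score: lower means more credit-worthy


def select_collateral(invoices, financing_need):
    """Greedy 'thinking at the margin' rule: present invoices from the safest
    buyers first (larger face values first within a risk level) and stop as
    soon as the financing need is covered. Returns (presented, withheld)."""
    ranked = sorted(invoices, key=lambda inv: (inv.buyer_risk, -inv.face_value))
    covered = 0.0
    n_presented = 0
    for inv in ranked:
        if covered >= financing_need:
            break  # further invoices would add credit that is no longer needed
        covered += inv.face_value
        n_presented += 1
    return ranked[:n_presented], ranked[n_presented:]


def exposure(presented, net_sales):
    """Share of net sales shown to the bank as collateral."""
    return sum(inv.face_value for inv in presented) / net_sales
```

A seller whose financing need exceeds its total receivables presents everything (exposure close to its full sales), while a seller with a modest need stops early and withholds the riskiest invoices, lowering its exposure.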

The sensitivity of an agent to its neighborhood

Consider a seller embedded in its trade neighborhood of buyers. Recall prerequisite (iii) from the introduction section, which requires that the network lend itself to measuring the sensitivities of nodes of different species with respect to the neighborhood indicator.

We now consider the financial costs (FC) of the participants (roughly, the interest paid on loans). This factor is intrinsic to all nodes, including the direct neighbors of any agent, so it is possible to measure the correspondence between an agent’s financial costs and the aggregated financial costs of its neighbors. Figure 5 plots this estimate. The axes give the financial costs of a seller and the geometric mean of the financial costs of its customers, both on a logarithmic scale. In this instance, little or no correlation exists between the financial costs across the trade relationships.

Figure 5: Financial costs (interest paid) over sales (FC/S) of a seller compared against the log average FC/S of its buyers. The main effect between the FC/S of a seller and that of its trade neighborhood is negligible.

We can further plot these financial costs against the neighborhood mean financial costs separately for each credit rating class. The estimates for the different credit rating classes, shown in figure 6, indicate that a percent change in the financial costs of the supplier corresponds to a change of up to 7.9% in the average financial costs across its direct neighbors, depending on the credit rating of the seller. The greatest effect appears among the ‘B’, or medium-rated, firms, while the ‘A’-rated firms show little or no effect. Compared with the overall effect in figure 5, this figure shows that the ratios of sellers in the ‘B’ and ‘C’ credit rating classes are more sensitive to their neighboring nodes.

At this stage it is important to appreciate that, as measured through the internal property of financial costs, sensitivities to the immediate neighborhood do differ between classes of agents.

Figure 6: With reference to (4), the log FC/S of the seller is plotted against the log average FC/S of the buyer neighborhood. The agents are grouped into the three investment grades. High-credit-rated sellers (panel A) show little or no correlation with the financial costs of their buyers, while for medium (B) and low (C) credit-rated sellers, percentage increases in financial costs correspond to constant percentage changes in the mean financial costs of their neighborhood. These elasticities are estimated as 7.9% and 4.6%, respectively.
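The per-class estimate can be sketched as a log-log regression. In the sketch below (an illustration under stated assumptions, not the authors’ estimator), each seller contributes one point whose x-coordinate is the log of its own FC/S and whose y-coordinate is the log of the geometric mean of its buyers’ FC/S, i.e. the mean of their logs. Taking the seller’s ratio as the regressor is an assumption that follows the phrasing of the text. The ordinary-least-squares slope within a rating class is then read as the elasticity. The input format is hypothetical.

```python
import math
from collections import defaultdict


def log_geometric_mean(ratios):
    """Log of the geometric mean of positive ratios = mean of their logs."""
    return sum(math.log(r) for r in ratios) / len(ratios)


def ols_slope(xs, ys):
    """Ordinary-least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def elasticity_by_class(sellers):
    """sellers: iterable of (rating_class, seller_fc_s, [buyer_fc_s, ...]).
    Returns {rating_class: slope of log neighbourhood FC/S on log seller FC/S}."""
    points = defaultdict(lambda: ([], []))
    for cls, own, buyers in sellers:
        xs, ys = points[cls]
        xs.append(math.log(own))
        ys.append(log_geometric_mean(buyers))
    return {cls: ols_slope(xs, ys) for cls, (xs, ys) in points.items()}
```

A class whose neighborhoods do not respond to the seller’s costs (as for the ‘A’ sellers) yields a slope near zero, while a proportional co-movement yields a constant positive slope.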

The information exposure

One of the quantities determining the proportion of the network visible to the bank is the ratio between the total amount of the invoices that each seller registered at the bank during the year and its annual net sales, as reported in its financial statement. This provides the necessary topological aggregate of the network under full information, as required by our method. To describe this quantity we use the following parameters.

We call the two agents in each ordered pair the (c)ustomer (i.e. the buyer) and the (s)upplier. Both are members of the population of agents.

We call $x_{sc}$ the amount of the invoices that seller $s$ sent to customer $c$ and also presented to the bank as collateral for a loan. This is the annual aggregate of the invoices presented by $s$ on account of the contracts it wrote with its customer $c$. The face value of an invoice serves as collateral for a short-term loan in the ‘credit line’. The annual total of all the invoices that $s$ presented to the bank is

$$X_s = \sum_{c} x_{sc}. \qquad (5)$$

This amount should be a good proxy for the total of the short-term loans that $s$ received to finance production.

The information exposure of seller $s$, denoted $i_s$, is the proportion of the sales of $s$ that were presented as collateral:

$$i_s = \frac{X_s}{\mathrm{NetSales}_s}. \qquad (6)$$

A quick interpretation of the information exposure parameter is the following. Net sales is an item listed in the firm’s profit and loss statement and is an annual aggregate (a flow variable). It condenses all possible profit-making activity of the firm and implicitly includes all the possible information from the firm’s trade neighborhood in the full network, so the ratio ‘i’ gives a rough description of the reduction in information. The numerator is the aggregate face value of all the invoices presented for discount in the short term. Firms running on low profit margins are therefore expected to display ‘i’ relatively close to 1, because the amount of operational credit should be proportional (and close) to the sales amount. A low ratio of collateral to net sales means that, in the short term, the firm either leverages less to finance production or obtains credit elsewhere. For this latter claim we assume that our sample is not biased, so that credit obtained from other banks occurs uniformly across all the firms in our sample.

The value of ‘i’ is greater than or equal to zero and can exceed unity. There are three possible situations:

  • ‘i’ could be greater than one. A naive view of why $i_s > 1$ is a misalignment between the time frames in the data: the closing of the audit and the expiration of all trade-credit contracts signed in the same year. (We assume that our data do not contain traces of illegal activity; otherwise we would have to remove from our dataset the firms that have $i_s > 1$.)

  • If seller $s$ has $i_s = 0$, the numerator in (6) vanishes. The interpretation is that $s$ is not a direct client of the bank; rather, its customer $c$ is. The customer entered the system by executing an outgoing payment transaction.

  • When $0 < i_s < 1$, there is a gap between the total collateral and the net sales. This could hint that the production of $s$ requires loans smaller than its sales, an indication of healthy use of the credit line.
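Equations (5) and (6) translate directly into an aggregation over invoice records. The sketch below assumes a hypothetical record format, (seller, customer, face value) for each invoice presented at the bank during the year, plus a mapping from seller to annual net sales; $x_{sc}$ is the amount presented by seller $s$ on account of customer $c$. It also labels each seller with the three situations listed above. Sellers without positive net sales are skipped, and treating $i_s = 1$ as part of the third case is an arbitrary choice made here.

```python
from collections import defaultdict


def information_exposure(presented, net_sales):
    """presented: iterable of (seller, customer, face_value) records of invoices
    registered at the bank during the year; net_sales: {seller: net sales}.
    Returns {seller: i_s}, with X_s = sum over customers of x_sc as in (5)
    and i_s = X_s / net sales as in (6)."""
    totals = defaultdict(float)  # X_s, accumulated over customers c
    for seller, _customer, face_value in presented:
        totals[seller] += face_value
    return {s: totals[s] / ns for s, ns in net_sales.items() if ns > 0}


def exposure_case(i):
    """Label i_s with the three situations above (i_s == 1 counts as 'between')."""
    if i > 1:
        return "above_one"  # e.g. misaligned time frames
    if i == 0:
        return "zero"  # seller is not a direct client of the bank
    return "between"  # collateral below net sales
```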

This variable should not be confused with the two information sets defined in the introduction. It is, however, possible for these to represent exactly the same identities, e.g. in networks where only exposure summaries are available. In the example set-up, the typical message is written as in (3), where the relevant quantity is the credit rating of the customer $c$ of seller $s$. Based on the last two sections of the introduction, the RATING score is expected to have a non-trivial relationship with information exposure. We therefore define quantities that relate the two, such as the average information exposure for each RATING score, estimated over the seller firms that have rating score $r$. The set of sellers with rating $r$ is

$$S_r = \{\, s : R_s = r \,\}, \qquad (7)$$

where $R_s$ is the RATING score of firm $s$. The average information exposure over all firms having the same RATING is

$$\bar{i}_r = \frac{1}{|S_r|} \sum_{s \in S_r} i_s, \qquad (8)$$

where $i_s$ is evaluated on the information set of seller firm $s$. In further analysis we filter out firms having $i_s = 0$.
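The grouping in (7) and the average in (8) amount to a group-by over sellers. In this minimal sketch the input dictionaries are hypothetical, and dropping sellers with zero exposure is an assumed reading of the filtering rule.

```python
from collections import defaultdict


def mean_exposure_by_rating(exposure, rating):
    """exposure: {seller: i_s}; rating: {seller: RATING score}.
    Groups sellers into S_r = {s : RATING(s) = r} and averages i_s within
    each group, dropping sellers with i_s = 0 (assumed filtering rule)."""
    groups = defaultdict(list)
    for seller, i in exposure.items():
        if i > 0 and seller in rating:
            groups[rating[seller]].append(i)
    return {r: sum(v) / len(v) for r, v in sorted(groups.items())}
```

With higher averages at both ends of the RATING scale and a lower one in the middle, the returned mapping traces the U-shape discussed for figure 9.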

The revealed effect

Let us consider what the information exposure parameter would look like under random sampling assumptions. For this, a work by Watanabe et al. [50] that estimates the relationship between in-degree and total accounts receivable is useful. They used a full business network in which only the identities of firms are known (no fund transfers were recorded), and estimated that, under full information, this relationship holds independently of the rating. Grouping (6) by size classes therefore gives $i \approx 1$. Figure 7 contrasts our finding with this estimate.

Figure 7: Information exposure of sellers against their credit-worthiness in two situations: the revealed effect and the theoretical effect expected under random sampling. Since RATING is ordinal, we use Financial Costs over Total Bank Loans (FC/TBL) as a convenient continuous proxy, which increases our confidence in the ordering of RATING (for grouping by RATING classes, see fig. 9). Near the top of the Y-axis we plot the theoretical information exposure under random sampling, with a symmetric envelope of error bars whose widths equal the maximal widths in the main plot at the bottom. There, $i$ is approximately unity and independent of credit-worthiness, as [50] shows.

Notably, the figure uses a continuous quantity as a proxy for the credit-worthiness otherwise described in this document by the RATING variable. The underlying intuition is straightforward: sellers with good credit ratings usually enjoy low interest rates and relatively sizable credit lines, and consequently have low values of FC/TBL. At the other extreme, low-credit firms have limited access to bank credit, hence their FC/TBL is expected to be higher ($\chi^2$ test of independence between RATING and FC/TBL, df=49).

As figure 7 shows, the dependence of information exposure on credit-worthiness is significantly distinguishable from what would be expected under random sampling. This rules out random missingness and sampling errors as the main effect.
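One generic way to ask whether a dependence of exposure on rating could arise by chance is a permutation test. This is a swapped-in technique shown for illustration, not the comparison behind figure 7, which contrasts the observed curve with the benchmark of [50]. The sketch shuffles the RATING labels, which emulates data in which exposure carries no information about rating, and measures how often the shuffled between-group sum of squares reaches the observed one. Function names and the default number of shuffles are arbitrary choices.

```python
import random


def between_group_ss(values, labels):
    """Between-group sum of squares of `values` grouped by `labels`:
    large when the group means differ from one another."""
    grand = sum(values) / len(values)
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    return sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
               for vs in groups.values())


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Permutation test of 'exposure independent of RATING': the p-value is
    the (add-one) share of label shuffles whose between-group SS is at least
    the observed one."""
    rng = random.Random(seed)
    observed = between_group_ss(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_group_ss(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A pronounced U-shape across rating groups produces a small p-value, whereas exposure values unrelated to rating give a p-value near one.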

This is summary evidence in support of the existence of an observer and of intentional exposure by the agents. It remains to be seen whether other confounding factors could account for this observational equivalence.

Mediation analysis

With reference to figure 3, note that a test of independence is categorical and does not take into account any ordering of the columns or rows. However, the table used as input maintains the original ordering of the RATING classes. Thus the blue pattern along the diagonal does indicate a higher-than-expected incidence of similar ratings at the two ends of a trade link, and any other ordering of the columns or rows would produce a less compelling pattern.

One reservation could be raised about the result above: it is remarkable that the RATING of the seller is so similar to the average RATING score of their buyers. Looking at the sectoral affiliation of buyers and sellers (figure 8), sellers and buyers appear to trade mainly within the same industries, especially within the manufacturing sectors (categories 1 .. 3 of the NACE industrial classification [13]).

The mosaic is visually symmetric, i.e. it can be transposed while maintaining an almost identical pattern of red and blue tiles. One exception is major category 5, wholesale, which breaks this symmetry. Firms in industries 1 and 3 sell to firms in 5. According to our data set, firms in 5 do not sell to those in 1 or 3, but rather to those in 4 (energy). In the real world, wholesale trade does connect manufacturers, but in our case it only provides a transient path out of the manufacturing industries and splits the supply chain. The accounting procedures of wholesale firms are at the root of this problem; a good explanation can be found in [26].

Figure 8: Input-output test of independence between buyer and seller. The 1-digit NACE industrial classifications are roughly: 1–3 manufacturing, 4 energy, 5 wholesale and retail, 6 transport, and 7 real estate.

Since sectoral affiliation is known to be one of the macro-determinants of RATING scores, we addressed this conflation by creating a list of firms that trade outside their industry, and then performed the test of independence of the RATING score again. Approximately 15% of the buyer-seller links connect firms from the same industrial classification. Using only the inter-industrial trades, we obtain a pattern similar to that in figure 3, with a high degree of confidence. Again we can reject the hypothesis of complete independence of the RATING score between buyers and their sellers, and suggest the same trending behavior as before.
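The industry control can be sketched as a simple filter on the link list. The sketch assumes a hypothetical input in which each firm has a numeric classification code stored as a string whose first digit is the 1-digit class of figure 8; links involving a firm without a known code are dropped. The inter-industry links returned here would then be cross-tabulated by RATING and re-tested exactly as for figure 3.

```python
def split_by_industry(links, nace):
    """links: (seller, buyer) pairs; nace: {firm: numeric NACE code as a string}.
    Returns (intra, inter): links whose two ends share / do not share the
    1-digit class (first character of the code)."""
    intra, inter = [], []
    for seller, buyer in links:
        if seller not in nace or buyer not in nace:
            continue  # unknown classification: cannot decide, so drop
        same = nace[seller][0] == nace[buyer][0]
        (intra if same else inter).append((seller, buyer))
    return intra, inter
```

The share of same-industry links is then `len(intra) / (len(intra) + len(inter))`.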

Figure 9 shows the relation between RATING and the information exposure ‘i’, as colored tiles, together with its average as a curve. From this figure we learn that information exposure does depend on RATING: the average information exposure is at its minimum in the middle of the RATING scale, where firms in the ‘speculative’ financing group have the lowest average information exposure. We postulate that these middle-rated firms optimize the amount of information they expose.

To quantify the relation, we partitioned the data set into equal-count ‘i’-groups: the 129,584 suppliers were ordered by ‘i’ and a new group was started every 12,958 or 12,959 records. The smallest value of ‘i’ in each group gives the tick labels on the Y-axis. The area of each tile is proportional to the count of sellers that have RATING=X and whose ‘i’ falls in group $j$, $j$ being the tick marker index. The color code marks either a significantly higher (blue) or significantly lower (red) than expected frequency of occurrence.

The relationship between RATING and ‘i’ is, again, U-shaped.
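The equal-count partition can be sketched as follows. Ten groups are consistent with the stated sizes, since 129,584 = 10 × 12,958 + 4: four groups hold 12,959 sellers and six hold 12,958. Which groups receive the extra records is an arbitrary choice here (the first ones), and the function names are hypothetical. The resulting group indices can be cross-tabulated against RATING to obtain the counts behind the tiles.

```python
def equal_count_groups(values, n_groups=10):
    """Order items by value and cut them into n_groups consecutive groups
    whose sizes differ by at most one (the first `extra` groups get one more).
    Returns (group index per item, smallest value in each group).
    Assumes len(values) >= n_groups."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    base, extra = divmod(len(values), n_groups)
    group = [0] * len(values)
    lower_edges = []
    start = 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        lower_edges.append(values[order[start]])  # tick label for group g
        for k in order[start:start + size]:
            group[k] = g
        start += size
    return group, lower_edges


def crosstab(ratings, groups):
    """Count sellers per (RATING, exposure group) cell."""
    table = {}
    for r, g in zip(ratings, groups):
        table[(r, g)] = table.get((r, g), 0) + 1
    return table
```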

At the two extreme RATING scores, 1 and 9, the information exposure is greatest. This agrees with the expectations that ‘investment’ grade firms have dispersed their risk and are therefore indifferent to collateral quality, and that firms at risk of default are (or believe they are) forced by the bank to surrender all possible information.

Figure 9: RATING group vs. the average information exposure parameter as defined in (8). The grouping procedure is described in the text. To illustrate the U-shape, an overlaid line plots the per-RATING averages.

It might be that middle-rated borrowers need less credit or obtain it elsewhere. However, judging from the distribution of RATING classes, most of the firms in the industrial population occupy the middle-rated classes and possibly sell to both investment-grade and credit-constrained firms. In this setting, middle-rated firms collect sales (and collateral) from the complete range of buyers. Should the “bad” buyers happen to be the providers of the collateral with the smallest total face values, prioritizing the invoices is straightforward, based on the probability of repayment and the marginal contribution to credit financing.

To show this, figure 4 presents a test of independence between a buyer’s rating class and their total purchases on trade credit (the amount of invoices they received from their sellers). The reason for not normalizing by net sales is that we would like to keep the presented invoices free of contamination by implicit social factors, and leave in only the direct and visible ones.

We generate a cross tabulation of buyer rating class vs. equal-count groups of buyer firms ordered by the log of their total invoice face values. The area (or rather the height) of the tiles in each column is scaled in proportion to the relative population sizes within the same credit-rating class.

Note that, according to the financial statements, the ratio of purchases to sales is approximately one half on average (across all financial statements: mean=0.49, median=0.45, n=513,082, sd=0.42). In figure 4, the bulk of the firms that make large purchases is situated in the middle credit-rated classes. This traces a proportionally larger production scale of the middle-rated firms, which must therefore need more credit, not less.

Not coincidentally, as the figure shows, if a seller were to rank buyers by invoice face value, the incrementally small additions to total sales would come from the lower-rated partners.

If middle-rated sellers sold predominantly to middle-rated buyers (fig. 3) and also needed less credit (fig. 9), then middle-rated buyers should also purchase less on invoice, contrary to figure 4, where they hold the highest purchase volumes.

We could further suggest an alternative scenario that produces the same U-shape (figure 9), whereby middle-credit-rated suppliers use multiple credit sources and therefore present smaller volumes of collateral to each bank. This may occur whenever presenting more collateral would have little effect on the loan terms with any single bank. Under such a ‘decreasing returns’ scenario the U-shape may flatten; however, reaching this plateau warrants opening a new credit line elsewhere. Notably, this possibility still falls within our methodology: whether driven by impression management or by decreasing returns, information is withheld because of strategic considerations on the part of the borrower, giving rise to differential information exposure across species. We consider a risk-planning strategy by the bank an inferior explanation to active withholding of information by the borrower, because competing banks may adopt the same strategy by the nature of their competition; the fact that borrowers still bring their business to the monitoring bank, whose own view this study exposes, means that invoice prioritization still occurs. Controlling for externalities such as competition and geographical proximity to the financial sector is the subject of another paper currently in preparation.


A common working assumption in many studies is that the network is observed at random, so that the network available to us is a representative sample of the true network. A major challenge to this view is the possibility that the network is monitored. Not only is the true network invisible to us, so is the process that causes the data to be missing, so a direct statistical test is not feasible. We propose to circumvent this problem through a cascade of hypotheses and mediation analysis. Assuming that the network is not monitored corresponds to assuming that the observer’s payoff is not sensitive to the quality of the social ties, so agents are not expected to withhold information on quality. Therefore, the information exposed by agents in an unmonitored network should be independent of the species of the agent; any correlation should be spurious, owing to the intervention of mediators. Controlling for these mediators, one can in principle test whether the network available to us possesses this species-independence property.

In the example we were able to: (a) show that data are missing not at random, and suggest that the data collector affects the process by which firms appear to associate, especially for a certain subgroup of agents, namely the medium-credit-rated firms; (b) identify, within this subgroup, a clear social dimension related to an accounting behaviour fondly termed in the literature ‘window-dressing’ [2], and thus (c) show that the immediate neighborhoods of these agents are more likely to be missing links and parts of the node structure; (d) suggest that this network tends to be vulnerable to the targeted removal of specific species of nodes. These species may be correlated with industries, similar to the scenario of a systemic shock arising when the bank issues a regulatory action on an industry as a whole.

Since the bank collects the data and thus assumes a particular point of view, we suggest that (e) distress responses arising from internal dynamics are rarely visible to the bank. The synchronized response following an intervention of the regulatory system is, however, visible because of its causal nature.

Implications, applications and limitations

We have developed a method for exploring network data in situations where hidden information affects the outcome. In essence, to measure a network under the hypothesis that data are missing not at random, we choose a setting that has (a) a system of interacting agents classifiable into species, (b) an observer and visible interactions between it and the players, (c) a theory that describes the payoff and explains why different species would play different strategies, and finally (d) a measurable quantity of information that can be compared against additional input.

The proposed methodology can be used to derive the likelihood that an agent embedded in a network hides information conditional on some characteristic of the agent. In our empirical example this likelihood is associated with the credit worthiness of an industrial firm: medium rated firms have a higher probability to strategically hide information from the observer. The observer, in our specific case a bank, can use the proposed methodology to identify a subset of the population of corporate customers that are highly likely to hide information. This subset can be the target of a specific action by the bank aimed at probing the supply chain connections of the firm that are currently hidden. For example, a possible policy would be to offer a lower interest rate on loans if the firms present an invoice as collateral. In a simple version this reduction can be flat regardless of the rating of the buyer that is listed on the invoice. In a more aggressive version the bank can offer a higher reduction on interest for presenting invoices issued to lower rated buyers. The goal is to create an incentive scheme that encourages the suppliers to disclose invoices that they otherwise would be inclined to hide. Once the bank has acquired a more precise picture of the firms’ supply chains, it can adjust the rating of the firms accordingly. The proposed algorithm is essential in limiting the subset of firms that are the target of these costly actions.

We believe that agents employ an internal ranking mechanism of social-tie quality that shapes their incentive to expose information about some peers but not others. In trade networks, lower-ranked nodes may threaten the stability of the structure more than better-ranked ones; therefore, the more links are missing, the more susceptible the structure is to dynamic breakage. This intuition encourages replicating the method in other fields where networks have a social component with a ranking mechanism, or where information exposure can be quantified.

Last, this study points out the possibility that a single observer exists and that this entity not only monitors the agents but affects their payoffs. In other networks, several observers may exist and so agents may vary their strategies accordingly; thus further concealing their links or even creating links that do not exist.


We acknowledge the partial support by the Institute for New Economic Thinking (INET), grant IN01100017.


  • [1] Adamic, L. A. and Adar, E., Friends and neighbors on the web, Social Networks 25 (2003) 211–230.
  • [2] Allen, L. and Saunders, A., Bank window dressing: Theory and evidence, Journal of Banking & Finance 16 (1992) 585–623.
  • [3] Altman, E. I., Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, The Journal of Finance 23 (1968) 589–609.
  • [4] Becker, G. S., Altruism in the family and selfishness in the market place, Economica 48 (1981) 1–15.
  • [5] Boissay, F., Credit chains and the propagation of financial distress, Technical Report 573, European Central Bank (2006), http://ideas.repec.org/p/ecb/ecbwps/20060573.html.
  • [6] Bottazzi, G., Grazzi, M., Secchi, A., and Tamagni, F., Financial and economic determinants of firm default, Journal of Evolutionary Economics 21 (2011) 373–406.
  • [7] Bottazzi, G. and Secchi, A., Explaining the distribution of firm growth rates, The RAND Journal of Economics 37 (2006) 235–256.
  • [8] Clauset, A., Moore, C., and Newman, M. E., Hierarchical structure and the prediction of missing links in networks, Nature 453 (2008) 98–101.
  • [9] Donath, J., Signals in social supernets, Journal of Computer‐Mediated Communication 13 (2007) 231–251.
  • [10] Donath, J. and Boyd, D., Public displays of connection, BT Technology Journal 22 (2004) 71–82.
  • [11] Dorogovtsev, S., Mendes, J., and Samukhin, A., Structure of growing networks with preferential linking, Physical Review Letters 85 (2000) 4633–4636.
  • [12] Duchin, R., Ozbas, O., and Sensoy, B. A., Costly external finance, corporate investment, and the subprime mortgage credit crisis, Journal of Financial Economics 97 (2010) 418–435.
  • [13] Eurostat (European Commission), Statistical classification of economic activities in the European Community, Rev. 1.1 (2002).
  • [14] Frank, O. and Snijders, T., Estimating the size of hidden populations using snowball sampling, Journal of Official Statistics (Stockholm) 10 (1994) 53–67.
  • [15] Franke, R. H. and Kaul, J. D., The Hawthorne experiments: First statistical interpretation, American Sociological Review 43 (1978) 623–643.
  • [16] Friendly, M., Mosaic displays for multi-way contingency tables, Journal of the American Statistical Association 89 (1994) 190–200.
  • [17] Goldenberg, J., Libai, B., Solomon, S., Jan, N., and Stauffer, D., Marketing percolation, Physica A: Statistical Mechanics and its Applications 284 (2000) 335–347.
  • [18] Golo, N., Kelman, G., Brée, D. S., Lamieri, M., and Solomon, S., Many-to-one contagion of economic growth or economic crisis across the trade credit inter-firm network, arXiv:q-fin.GN/1506.01734 (2015).
  • [19] Guimerà, R. and Sales-Pardo, M., Missing and spurious interactions and the reconstruction of complex networks, Proceedings of the National Academy of Sciences 106 (2009) 22073–22078.
  • [20] Holloway, T., Bozicevic, M., and Borner, K., Analyzing and visualizing the semantic coverage of Wikipedia and its authors, Complexity 12 (2007) 30–40.
  • [21] Huberman, B. A. and Adamic, L. A., Internet: Growth dynamics of the world-wide web, Nature 401 (1999) 131.
  • [22] Huisman, M., Imputation of missing network data: Some simple procedures, Journal of Social Structure 10 (2009) 1–29.
  • [23] Huisman, M. and Snijders, T. A., Statistical analysis of longitudinal network data with changing composition, Sociological Methods & Research 32 (2003) 253–287.
  • [24] Huisman, M. and Steglich, C., Treatment of non-response in longitudinal network studies, Social Networks 30 (2008) 297–308.
  • [25] Jin, E. M., Girvan, M., and Newman, M. E. J., Structure of growing social networks, Phys. Rev. E 64 (2001) 046132.
  • [26] Kelman, G., Bree, D., Manes, E., Lamieri, M., Golo, N., and Solomon, S., Dissortative from the outside, assortative from the inside: Social structure and behavior in the industrial trade network, in Proceedings of the 48th Annual Hawaii International Conference on System Sciences (IEEE Computer Society Press, 2015), p. 10.
  • [27] Leskovec, J. and Krevl, A., SNAP Datasets: Stanford large network dataset collection, http://snap.stanford.edu/data (2014).
  • [28] Liben-Nowell, D. and Kleinberg, J., The link-prediction problem for social networks, Journal of the American Society for Information Science and Technology 58 (2007) 1019–1031.
  • [29] Manski, C. F., Identification of endogenous social effects: The reflection problem, Review of Economic Studies 60 (1993) 531–542.
  • [30] Marotta, G., Trade credit in Italy: Evidence from individual firm data, Technical report (2000).
  • [31] Newman, M. E. J., Clustering and preferential attachment in growing networks, Phys. Rev. E 64 (2001) 025102.
  • [32] Newman, M. E. J., Assortative mixing in networks, Physical Review Letters 89 (2002) 208701.
  • [33] Ohnishi, T., Takayasu, H., and Takayasu, M., Hubs and authorities on Japanese inter-firm network: Characterization of nodes in very large directed networks, Progress of Theoretical Physics Supplement 179 (2009) 157–166.
  • [34] Omiccioli, M., Trade credit as collateral., Technical Report 553, Temi di discussione della Banca d’Italia (2005).
  • [35] Pattison, P. E., Robins, G. L., Snijders, T. A., and Wang, P., Conditional estimation of exponential random graph models from snowball sampling designs, Journal of Mathematical Psychology 57 (2013) 284–296.
  • [36] Petersen, M. A. and Rajan, R. G., Trade credit: Theories and evidence, The Review of Financial Studies 10 (1997) 661–691.
  • [37] Redner, S., How popular is your paper? An empirical study of the citation distribution, European Physical Journal B 4 (1998) 131–134.
  • [38] Rosenbaum, P. R. and Rubin, D. B., The central role of the propensity score in observational studies for causal effects, Biometrika 70 (1983) 41–55.
  • [39] Rubin, D. B., Inference and missing data, Biometrika 63 (1976) 581–592.
  • [40] Sieczka, P., Sornette, D., and Holyst, J. A., The Lehman Brothers effect and bankruptcy cascades, The European Physical Journal B-Condensed Matter and Complex Systems 82 (2011) 257–269.
  • [41] Şimşek, Ö. and Jensen, D., Navigating networks by using homophily and degree, Proceedings of the National Academy of Sciences 105 (2008) 12758–12762.
  • [42] Smith, J. A., Moody, J., and Morgan, J. H., Network sampling coverage ii: The effect of non-random missing data on network measurement, Social Networks 48 (2017) 78–99.
  • [43] Stork, D. and Richards, W. D., Nonrespondents in communication network studies problems and possibilities, Group & Organization Management 17 (1992) 193–209.
  • [44] Tamura, K., Miura, W., Takayasu, M., Takayasu, H., Kitajima, S., and Goto, H., Estimation of flux between interacting nodes on huge inter-firm networks, Vol. 16 (World Scientific, 2012), ISBN 2010-1945, pp. 93–104.
  • [45] Toma, C. L., Hancock, J. T., and Ellison, N. B., Separating fact from fiction: An examination of deceptive self-presentation in online dating profiles, Personality and Social Psychology Bulletin 34 (2008) 1023–1036.
  • [46] Tong, S. T., Van Der Heide, B., Langwell, L., and Walther, J. B., Too much of a good thing? The relationship between number of friends and interpersonal impressions on Facebook, Journal of Computer‐Mediated Communication 13 (2008) 531–549.
  • [47] Utz, S., Show me your friends and I will tell you what type of person you are: How one’s profile, number of friends, and type of friends influence impression formation on social network sites, Journal of Computer‐Mediated Communication 15 (2010) 314–335.
  • [48] Varian, H. R., Microeconomic Analysis, 3rd edn. (W. W. Norton & Company, Inc., 500 Fifth Avenue, New York, N.Y. 10110, 1992).
  • [49] Walther, J. B., Van Der Heide, B., Kim, S., Westerman, D., and Tong, S. T., The role of friends’ appearance and behavior on evaluations of individuals on Facebook: Are we known by the company we keep?, Human Communication Research 34 (2008) 28–49.
  • [50] Watanabe, H., Takayasu, H., and Takayasu, M., Biased diffusion on Japanese inter-firm trading network: Estimation of sales from network structure, New J. Phys. 14 (2012) 043034.

Appendix A Financial trade networks

Financial networks are known to be negatively assorted, i.e. neighbouring nodes in the network are dissimilar, in particular with regard to the degrees of their in- and out-links. Practitioners and economists regard this property as desirable because it makes the financial network robust to percolation (the propagation of distress or growth). Because contagion is expected to be rare, it can catch us by surprise when financial shocks do propagate from the local level to the national or international level. In the events preceding the 2008 financial crisis, small systemic shocks affected large proportions of the industrial and trade networks. The usual response of firms to market downturns was then amplified and swept across the network through the monetary (communication) channels. One reason for the lack of control over this episode was that part of the communication channels was unknown to the banking system: high-risk mortgages were traded in the market, but the credit-unworthy clients behind them remained anonymous.

A.1 General characteristics of trade networks

Assortative mixing in networks describes the correlation of ‘popularity’ between neighbouring nodes. The popularity of a node is measured by the number of its incoming links. A network is positively assorted if the number of incoming links to a node is positively correlated with the number of incoming links of its neighbors. In positively assorted networks messages can spread within a small number of steps, since there are many redundant links along which a message can travel. Negatively assorted networks contain highly connected nodes that are sparsely scattered throughout the network; in this type of topology the fast spread of messages is therefore less likely [32]. However, if the attributes of nodes are known, structural and behavioral information can be combined for efficient routing in such a network [41].
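The in-degree assortativity described above can be estimated as the Pearson correlation between the in-degrees found at the two ends of every directed link. The sketch below follows the in-in definition given in the text (Newman's coefficient [32] for directed networks is more often computed from the out-degree of the source and the in-degree of the target); the function name and the toy edge list are hypothetical.

```python
from collections import Counter
from math import sqrt


def in_in_assortativity(edges):
    """Pearson correlation between the in-degrees at both ends of each link.

    `edges` is a list of (source, target) pairs. A negative value means that
    popular nodes (many incoming links) tend to be linked to unpopular ones.
    """
    indeg = Counter(target for _, target in edges)  # missing nodes count as 0
    xs = [indeg[u] for u, _ in edges]  # popularity of each link's source
    ys = [indeg[v] for _, v in edges]  # popularity of each link's target
    n = len(edges)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("coefficient undefined: one end has constant in-degree")
    return cov / (sx * sy)


# Hypothetical toy network: two hubs (A, B) fed mostly by low-popularity nodes.
toy_edges = [("a1", "A"), ("a2", "A"), ("a3", "A"), ("A", "a1"),
             ("b1", "B"), ("b2", "B"), ("B", "b1")]
print(round(in_in_assortativity(toy_edges), 3))  # -0.802, i.e. disassortative
```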

A.2 Characteristics of the network under study

The asymmetric (directed) links in this network represent the exchange of goods or services for financial payments between agents; in this respect the network resembles the better-known communication networks. Each link connects a seller firm with one of its customers, who bought a product or service from it. Our data contain a snapshot of many firms that provided goods and services in exchange for financial payments in the year 2007, a year in which financial crises were occurring worldwide [12].

These data were collected by a single large Italian bank, fulfilling its function as an intermediary in a delayed-payment procedure. The bank recorded the names of the two parties and the amount of money that one firm, the buyer, owes the other, the seller.
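A minimal way to turn such records into the directed trade network is sketched below, assuming each record holds the buyer, the seller and the amount owed (the class, field and function names are hypothetical, not the bank's schema). Links point from buyer to seller, so sellers are the nodes with incoming links, and repeated invoices between the same pair are summed into one weighted link.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InvoiceRecord:
    buyer: str     # firm that owes the money
    seller: str    # firm that issued the invoice
    amount: float  # amount owed by the buyer to the seller


def build_trade_network(records):
    """Aggregate invoices into weighted directed links buyer -> seller."""
    links = defaultdict(float)
    for record in records:
        links[(record.buyer, record.seller)] += record.amount
    return dict(links)


records = [
    InvoiceRecord("firm_B", "firm_S", 1200.0),
    InvoiceRecord("firm_B", "firm_S", 800.0),
    InvoiceRecord("firm_C", "firm_S", 500.0),
]
network = build_trade_network(records)
print(network)  # {('firm_B', 'firm_S'): 2000.0, ('firm_C', 'firm_S'): 500.0}
```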

In the data, the network is a record of financial interactions between peers. An interaction is recorded when a discount process occurs. In simple terms, discounting an invoice means that its owner sells it to a financial institution for less than its face value. The purchaser of the invoice becomes the new creditor and takes on the risk that the debtor becomes insolvent; this risk is priced into the discount rate.
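As a worked illustration, assume the common simple bank-discount convention, in which the price paid for an invoice with face value F, annual discount rate d and t days to maturity is F(1 − d·t/360). The rate d is where the purchaser prices the debtor's default risk. The figures, the function name and the 360-day convention below are assumptions for illustration, not terms taken from the data.

```python
def discount_proceeds(face_value, annual_rate, days, day_count=360):
    """Price paid today for an invoice under simple bank discount."""
    return face_value * (1 - annual_rate * days / day_count)


# Hypothetical invoice: 10,000 face value, 90 days to maturity.
safe = discount_proceeds(10_000, 0.06, 90)   # low-risk debtor
risky = discount_proceeds(10_000, 0.10, 90)  # riskier debtor, higher rate
print(round(safe, 2), round(risky, 2))  # 9850.0 9750.0
```

A riskier debtor translates into a higher rate and therefore a lower price for the same invoice.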

Today, banks offer their customers a cheaper alternative to selling their trade bills. A customer of the bank can ‘collateralize its accounts receivable’: instead of buying the invoice, the bank extends a secured loan using the face value of the invoice as collateral. It is well known that, when discounted invoices are used, the borrowing firm is the seller, not the buyer. Extensive discussions of the reasons for this can be found in [34] and [30]. Here we note one obvious reason: the seller needs to secure funds only for producing the goods or services, whereas the buyer needs to cover the total amount of the invoice, i.e. the cost of production plus the seller’s profit. Smaller loans impose less risk on the lender, which in return offers a more affordable discount.
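The argument can be written out with simple notation (the symbols are introduced here only for illustration): let the invoice face value be F = C + π, where C is the cost of producing the goods or services and π > 0 is the seller's profit. If, for the sake of comparison, both parties borrowed at the same rate r, then

```latex
F = C + \pi, \qquad
L_{\text{seller}} = C \;<\; F = L_{\text{buyer}}, \qquad
r\,L_{\text{seller}} = rC \;<\; rF = r\,L_{\text{buyer}}
\quad (\pi > 0,\; r > 0),
```

so the seller's loan is both smaller and cheaper, and the lender's exposure is lower.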

The data are not publicly available; they were directly accessible only to one researcher, who worked for the bank. We wrote programs to extract summary data from the two data sources (the trade records and the balance-sheet data) and executed them on a computer inside the bank. The summary data, which we possess, were then further analysed to obtain the results reported here.

Appendix B Credit rating and financial costs

To facilitate an efficient discounting mechanism, the banks created a credit-rating procedure. When customers of the bank require loans, they must qualify as credit-worthy, i.e. be borrowers that are financially capable of repaying. The credit rating is a score provided to all banks by an external entity, CeBi (Centrale dei Bilanci, a financial analysis service for the Italian banking system), which uses a computerized system that automatically assigns a RATING score to each firm.

The score is computed independently from the financial statements of each firm, and a higher value indicates a lower credit rating. Common practice is to group this index further into classes: high, which characterizes firms of investment grade; medium, for firms somewhat susceptible to default; and low. Firms in the ‘low’ class have a high probability of defaulting on payments and are regarded by practitioners as having little or no access to bank credit; a firm with a score of 9 will rarely qualify for borrowing. However, since such firms appear in our data as borrowers from the bank (Figure 10), we assume that they did receive loans; a low credit rating could be caused by the industry in which the firm operates. The calculation of the RATING score is proprietary, but the score is correlated with Altman’s Z-score [3]. For a comprehensive explanation of the RATING score the reader is referred to [6]. It is also important to note that the RATING score of a firm is visible to all the banks with which that firm does business. This is part of a transparent national credit system established in Italy; similar common credit registries exist in other countries.

Figure 10: Histograms of the seller subgroup defined in Appendix C per RATING score: (a) frequency, firm counts (N); (b) size, total net sales (S). RATING = 1, 2, 7 and 8 exhibit the largest deviation of sales per number of firms.

Naturally, there are more buyers than sellers. However, the distribution of RATING scores is identical once normalized by the total number in each group. Figure 10 displays the RATING histograms of sellers.
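The normalization mentioned above divides each RATING count by the size of its group, so that buyers and sellers can be compared despite the difference in group sizes. A sketch with hypothetical counts and hypothetical helper names; the same pattern gives the sales-per-firm ratio behind panel (b) of Figure 10.

```python
def normalize(counts):
    """Turn {rating: count} into {rating: share of the group}."""
    total = sum(counts.values())
    return {rating: n / total for rating, n in counts.items()}


def sales_per_firm(firms, sales):
    """Average net sales per firm for each RATING score with at least one firm."""
    return {rating: sales[rating] / firms[rating] for rating in firms if firms[rating]}


# Hypothetical counts: buyers outnumber sellers, but the shares coincide.
buyers = {1: 40, 5: 100, 9: 60}
sellers = {1: 10, 5: 25, 9: 15}
print(normalize(buyers) == normalize(sellers))  # True

sales = {1: 5000.0, 5: 2500.0, 9: 300.0}  # hypothetical total net sales
print(sales_per_firm(sellers, sales))  # {1: 500.0, 5: 100.0, 9: 20.0}
```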

Credit rating affects the terms of loans. When a firm believes that its bank is imposing unreasonable conditions, it may resort to other means of financing. In general, banks are the sole providers of loan financing, so a firm that declines a loan because of its terms (the interest rate) must consider other financing channels. The most intuitive of these are dropping the contracts or using trade credit: the firm asks to delay its debt to its sellers while collecting immediate payments from its buyers [36]. In either case, a social component is added to the pool of financing channels, and this can be traced on the network of trading firms.
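To make the comparison between channels concrete, a firm can express each one as an annual cost. A sketch, assuming trade credit is offered on early-payment discount terms such as ‘2/10 net 30’ (a 2% discount if paid within 10 days, otherwise the full amount is due in 30 days), a convention discussed in the trade-credit literature; the rates, terms and function names are hypothetical.

```python
def implied_trade_credit_rate(discount, discount_days, net_days, year=365):
    """Annualized (compounded) cost of skipping an early-payment discount."""
    periods = year / (net_days - discount_days)
    return (1 / (1 - discount)) ** periods - 1


def cheapest_channel(costs):
    """Return the financing channel with the lowest annual cost."""
    return min(costs, key=costs.get)


trade_credit = implied_trade_credit_rate(0.02, 10, 30)  # about 44.6% a year
costs = {"bank_loan": 0.12, "trade_credit": trade_credit}  # hypothetical bank rate
print(round(trade_credit, 3), cheapest_channel(costs))  # 0.446 bank_loan
```

Under these illustrative numbers the bank loan is cheaper; only when the bank's terms become much worse, or credit is refused, does delaying payments to sellers become the preferable channel.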

Appendix C Fundamental statistics

The privately operated and public firms in the Italian industries are all borrowers from the bank. The data sets contain firm-level information about transactions between firms (registered, collateralized invoices) and financial statement data. In the full sample there are 1,578,812 firms connected by 7,290,072 trade links, but only 273,726 of them are sellers (have incoming trade-credit links). The network nodes are matched with the available balance-sheet data of 703,858 firms in 2007. The balance-sheet data come from the Italian Centrale dei Bilanci (CeBi) and are similar in structure to MICRO.1, which was presented in [7] and elsewhere. The resulting network has 345,403 nodes connected by 2,874,830 links. In this network, 140,580 nodes are sellers, and 129,584 of them are sellers that also appear in the next period. We refer to this subgroup as the seller subgroup; its coverage by industry appears in Table 2 below. Notably, manufacturing and wholesale and retail trade are the largest industries. Being a seller in this subgroup means having positive balance-sheet sales, at least one incoming payment link, and linked buyers that also have balance-sheet information. Importantly, 122,728 of the sellers in the subgroup (94%) also have outgoing links, so they are buyers as well. The remaining 215,819 firms in the network appear only as buyers. Thus the network of buyers and sellers cannot be classified as bipartite, since removing the links from exclusive buyers still leaves about one third of the network intact. Further, the observation design is random with respect to the bank: there are two bank consortia in Italy, and any borrower may do business with one or the other. The sellers in the subgroup are guaranteed to be customers (borrowers) of this bank, so a similar number of customers should belong to the other consortium. The remainder are simply buyers that do not finance production via trade credit. We should therefore expect an overlap of one third of the firms in the records of both banks.
Indeed, almost all nodes are buyers, but of the 345,403 nodes in total, 62% are not sellers that borrow from the bank that collected the data.
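The role counts used above can be reproduced from a link list. The simplified sketch below ignores the balance-sheet and next-period filters that define the seller subgroup, and its function names and toy links are hypothetical. It takes links directed from buyer to seller, labels sellers by their incoming links and buyers by their outgoing links, and keeps the seller-to-seller links that survive once exclusive buyers are removed. The last lines recompute the shares quoted in the text from the reported counts.

```python
def classify_firms(links):
    """Split firms into roles from (buyer, seller) links."""
    buyers = {b for b, _ in links}
    sellers = {s for _, s in links}
    return {
        "sellers": sellers,
        "sellers_also_buying": sellers & buyers,
        "exclusive_buyers": buyers - sellers,
    }


def seller_core(links):
    """Links that remain after dropping every exclusive buyer."""
    sellers = {s for _, s in links}
    return [(b, s) for b, s in links if b in sellers]


links = [("X", "S1"), ("Y", "S1"), ("S1", "S2"), ("S2", "S1"), ("Z", "S2")]
roles = classify_firms(links)
print(sorted(roles["exclusive_buyers"]), seller_core(links))
# ['X', 'Y', 'Z'] [('S1', 'S2'), ('S2', 'S1')]

# Shares implied by the counts reported for the 2007 network.
nodes, seller_subgroup, also_buyers, only_buyers = 345_403, 129_584, 122_728, 215_819
print(round(also_buyers / seller_subgroup, 3))  # 0.947, sellers that also buy
print(round(only_buyers / nodes, 3))            # 0.625, exclusive buyers
print(round(seller_subgroup / nodes, 3))        # 0.375, sellers among all nodes
```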

NACE      N    NACE      N    NACE      N
  10      3      29   9199      62      7
  11    426      30    298      63   1360
  12     84      31   2730      64    135
  13      9      32    726      65      1
  14    757      33   1721      67     51
  15   3073      34    771      70     87
  17   3779      35    414      71    382
  18   2160      36   4744      72   3731
  19   2076      37    304      73     49
  20   2102      40     42      74   4636
  21   1433      41     12      80     92
  22   3121      45   7997      85    220
  23    160      50   4818      90    556
  24   2438      51  27808      91     10
  25   3959      52   3843      92    273
  26   3444      55    194      93    214
  27   1367      60   2550
  28  12628      61     21
Table 2: Sample coverage: the number of seller firms (N) in the trade network of the year 2007, by industrial sector. The table excludes 6,569 firms with unidentified sectors. The industrial codes follow NACE Rev. 1.1 [13]: 10–14 are mining and quarrying, 15–37 manufacturing, 45 construction, 50–52 wholesale and retail trade, and 71–74 business services.

NACE, the European industrial classification scheme, is a hierarchical numbering system. In NACE Rev. 1.1 the first two digits identify the division (the major industry), and the third and fourth digits give finer sub-classifications (groups and classes). In our analyses we use the two-digit classification.
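Because the codes are hierarchical, two-digit divisions can be rolled up into broader sectors by code range. The sketch below sums the counts of Table 2 into a few NACE Rev. 1.1 sections; the section ranges are those of NACE Rev. 1.1, while the dictionary layout and the helper name are illustrative choices of ours.

```python
# Two-digit NACE Rev. 1.1 codes and seller counts from Table 2.
TABLE_2 = {
    10: 3, 11: 426, 12: 84, 13: 9, 14: 757, 15: 3073, 17: 3779, 18: 2160,
    19: 2076, 20: 2102, 21: 1433, 22: 3121, 23: 160, 24: 2438, 25: 3959,
    26: 3444, 27: 1367, 28: 12628, 29: 9199, 30: 298, 31: 2730, 32: 726,
    33: 1721, 34: 771, 35: 414, 36: 4744, 37: 304, 40: 42, 41: 12, 45: 7997,
    50: 4818, 51: 27808, 52: 3843, 55: 194, 60: 2550, 61: 21, 62: 7, 63: 1360,
    64: 135, 65: 1, 67: 51, 70: 87, 71: 382, 72: 3731, 73: 49, 74: 4636,
    80: 92, 85: 220, 90: 556, 91: 10, 92: 273, 93: 214,
}

# Selected NACE Rev. 1.1 sections as inclusive ranges of two-digit divisions.
SECTIONS = {
    "C mining and quarrying": (10, 14),
    "D manufacturing": (15, 37),
    "F construction": (45, 45),
    "G wholesale and retail trade": (50, 52),
    "K real estate, renting and business activities": (70, 74),
}


def section_totals(counts, sections):
    """Sum firm counts over each section's range of two-digit codes."""
    return {
        name: sum(n for code, n in counts.items() if lo <= code <= hi)
        for name, (lo, hi) in sections.items()
    }


totals = section_totals(TABLE_2, SECTIONS)
for name, n in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{n:>6}  {name}")
print(sum(TABLE_2.values()) + 6569)  # 129584, matching the seller subgroup size
```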