Characterization of the Firm-Firm Public Procurement Co-Bidding Network from the State of Ceará (Brazil) Municipalities

04/17/2021 ∙ by Marcos Lyra, et al. ∙ unl

Fraud in public funding can have deleterious consequences for the economic, social, and political well-being of societies. Fraudulent activity associated with public procurement contracts accounts for losses of billions of euros every year. It is therefore of utmost relevance to explore analytical frameworks that can help public authorities identify agents that are more likely to engage in irregular activities. Here, we use standard network science methods to study the co-bidding relationships between firms that participate in public tenders issued by the 184 municipalities of the State of Ceará (Brazil) between 2015 and 2019. We identify 22 groups/communities of firms with similar patterns of procurement activity, defined by their geographic and activity scopes. Profiling these communities allows us to highlight groups that are more susceptible to market manipulation and irregular activities. Our work reinforces the potential of network analysis as a policy tool to unfold the complex nature of relationships between market agents in a scenario of scarce data.





Despite the weight of public procurement in governmental budgets [1], it remains one of the activities most vulnerable to corruption [2, 3]. In this context, corruption can take many forms [4] and occur at any point in the procurement cycle, from the pre-tendering to the tendering and post-award phases, which makes it difficult to detect and measure [5, 6, 7]. In Brazil alone, corruption in procurement contracts can represent an additional to of the expected price, corresponding to losses of around 200 billion Reais annually [8]. Likewise, in Europe, losses are estimated at around 5B Euros annually [9]. Naturally, these losses undermine the ability of governments and public authorities to push forward essential investments in health, education, infrastructure, security, housing, and social services [10, 11]. Unsurprisingly, there is a considerable effort to develop analytical solutions to understand and mitigate the effects of corruption in the public procurement process [12]. Recently, the increasing availability of open data on public administration activities [13] has renewed the scientific community's efforts to uncover the hidden connections between participating agents and how their relationships may be linked to fraudulent activities [14, 15].

One of the most challenging aspects of identifying corruption in public procurement contracts is the lack of labeled data: it is often impossible to know which instances correspond to corruption [16]. Past works have therefore approached this problem from an unsupervised learning perspective, extracting from the data information about the relationships between the involved parties in order to flag groups of agents whose patterns are associated with a high risk of corruption. A fundamental principle of public procurement is transparency in bidding [17, 4, 18, 19], since open competition leads to greater efficiency for the public sector. Conversely, firms that have developed the relationships needed to gain leverage over a tender process pose a high risk of corrupting the procurement process [20].

Figure 1: Panel a), graphical representation of the process employed to infer the Firm-Firm co-bidding network. Panel b), comparison between the frequency of bidders per tender in the original data set (gray) and in the working data set (red) after the filters have been applied. Panel c), comparison between the frequency of bids per firm in the original data set (gray) and in the working data set (red) after the filters have been applied. In panels b) and c), the dashed lines represent OLS regression fits, and the extent of each line indicates the range of data used for fitting.

An open question remains: can the communities identified from firms' co-bidding patterns help us highlight groups of firms that are more susceptible to collusion and market manipulation? The use of network analysis for the study of corruption is not new [21, 22, 23]. In the context of public procurement, past studies can be divided into two main groups: 1) works that explore bipartite relationships between public bodies and firms [24, 25]; and 2) studies that explore firm-firm co-bidding relationships in public tenders [26, 27, 28, 29]. Both approaches have their merits, and each is better suited to identifying different mechanisms underlying the manipulation of the procurement process. For instance, bipartite relationships are suited to identifying fraud that stems from bribes and influence ties, while firm-firm relationships are better suited to identifying cartels and collusion. Despite this, the use of network analysis to study the relationships between firms in procurement bids is a relatively new venture [27], and more evidence is required to obtain a clear picture of how universal the existing patterns and mechanisms are across cultural and socio-economic contexts.

Here, we use methods from network science and complexity science to map and characterize the co-bidding network [16, 30, 31, 27] of firms that participated in public tenders issued by the municipalities of the State of Ceará (Brazil). Specifically, we characterize the relationships between competing firms and identify the major communities of firms that often compete for tenders with a similar scope. Moreover, we argue that some of these communities have characteristics that place them at a higher risk of market manipulation and of the irregular activities often associated with corruption.


We use data from the Tribunal de Contas do Estado do Ceará (Brazil) covering public tenders issued by the 184 municipalities of the State of Ceará between 2015 and 2019. Each observation records the bid of a firm to a tender and whether the bid was one of the winning bids. It also includes information about the municipality that issued the tender. Hence, the data is naturally represented as a bipartite network [32] that connects firms to tenders (see Figure 1a). The data set contains observations that account for the bids of firms to tenders.

Information about the firms and tenders is anonymized, and bidding values are not available. Moreover, the data set does not contain information about which contracts/firms have been investigated in the past for irregularities.

Network Inference

Since we are interested in studying the relationships between firms, we focus on the Firm-Firm projection. We estimate this projection from the co-bidding patterns of firms [30] using the Jaccard similarity coefficient [33, 34, 35, 16]. Figure 1a illustrates the structure of the data and depicts the steps taken to infer the Firm-Firm network from the original Tender-Firm bipartite structure.

To infer the Firm-Firm co-bidding network, we start by discarding all firms that did not bid at least once in each year of the analysis. By doing so, we extract the core of active firms while removing firms with sporadic activity. Figures 1b and 1c compare the original data set (gray) with the filtered data set (red), showing that filtering tends to remove excess participants from tenders while it does not affect the distribution of the number of bids placed by each firm. We refer to the firms present in the firm-firm co-bidding network as Established firms. The final working data set includes firms, which account for bids to tenders.
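The filtering rule can be sketched in a few lines of Python. This is a minimal sketch, assuming each bid is stored as a (firm, tender, year) tuple; the function names and the toy records are hypothetical and not taken from the authors' pipeline.

```python
from collections import defaultdict


def established_firms(bids, years):
    """Return the firms that placed at least one bid in every year in `years`.

    `bids` is a list of (firm_id, tender_id, year) tuples. This record layout
    is a hypothetical choice for illustration, not the data set's schema.
    """
    years_active = defaultdict(set)
    for firm, _tender, year in bids:
        years_active[firm].add(year)
    required = set(years)
    # A firm is kept only if its set of active years covers every required year.
    return {firm for firm, active in years_active.items() if required <= active}


def filter_bids(bids, years):
    """Keep only the bids placed by established firms."""
    keep = established_firms(bids, years)
    return [bid for bid in bids if bid[0] in keep]


bids = [
    ("A", "t1", 2015), ("A", "t5", 2016), ("A", "t9", 2017),
    ("B", "t1", 2015), ("B", "t9", 2017),  # B places no bid in 2016
    ("C", "t6", 2015), ("C", "t5", 2016), ("C", "t9", 2017),
]
print(sorted(established_firms(bids, range(2015, 2018))))  # ['A', 'C']
```

The subset test `required <= active` makes the rule explicit: a single inactive year is enough to drop a firm, which is what removes sporadic participants.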

Figure 2: Panel a) shows the degree distribution, $P(k)$. The dashed black line represents the best exponential distribution fitted to the tail of the empirical distribution. Panel b) shows the average clustering coefficient per degree, $C(k)$. The dashed black line shows the best linear fit. Results have been estimated from the entire graph.

Next, we compute the centered Jaccard coefficient [35] between each pair of firms $i$ and $j$, defined as

$$ J_{ij} = \frac{\sum_{t} a_{it}\, a_{jt}}{\sum_{t} \left( a_{it} + a_{jt} - a_{it}\, a_{jt} \right)} - \frac{p_i\, p_j}{p_i + p_j - p_i\, p_j} \tag{1} $$

where $a_{it}$ is one if firm $i$ made a bid to tender $t$ and zero otherwise, and $p_i = \frac{1}{T} \sum_{t} a_{it}$ is the fraction of the $T$ tenders in which firm $i$ participated. The second term in Equation 1 gives the expected value of the Jaccard coefficient when the bids of the two firms follow independent Bernoulli processes with probabilities $p_i$ and $p_j$ [35]. Hence, the centered Jaccard coefficient allows us to distinguish between positive and negative associations between firms.
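A minimal Python sketch of the centered Jaccard coefficient, assuming each firm's bids are encoded as a 0/1 list over a common ordering of tenders (a hypothetical encoding). It subtracts from the observed Jaccard similarity the expectation $p_i p_j / (p_i + p_j - p_i p_j)$ under independent Bernoulli bidding, as described in [35]; the function name is illustrative only.

```python
def centered_jaccard(a_i, a_j):
    """Centered Jaccard coefficient between two firms' binary bid vectors.

    a_i[t] (a_j[t]) is 1 if firm i (j) bid to tender t and 0 otherwise; both
    vectors must list the same tenders in the same order.
    """
    n_tenders = len(a_i)
    both = sum(x & y for x, y in zip(a_i, a_j))    # tenders with bids from both firms
    either = sum(x | y for x, y in zip(a_i, a_j))  # tenders with a bid from at least one
    if either == 0:
        raise ValueError("neither firm placed a bid")
    observed = both / either
    # Participation rates, i.e. the Bernoulli bid probabilities of each firm.
    p_i = sum(a_i) / n_tenders
    p_j = sum(a_j) / n_tenders
    expected = (p_i * p_j) / (p_i + p_j - p_i * p_j)
    return observed - expected


# Two firms that always bid together: positive association.
print(centered_jaccard([1, 1, 0, 0], [1, 1, 0, 0]))  # ≈ 0.667
# Two firms that never meet: negative association.
print(centered_jaccard([1, 0, 1, 0], [0, 1, 0, 1]))  # ≈ -0.333
```

The second call shows why centering matters: the raw Jaccard similarity is simply zero, while the centered value is negative, flagging firms that avoid each other more than chance would predict.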

Finally, we estimate the statistical significance of each centered Jaccard coefficient, i.e., we test whether it is significantly greater than zero. To that end, we bootstrap a null distribution of centered Jaccard coefficients for each pair of firms by generating an ensemble of randomizations of the initial bipartite network. The randomization keeps constant the number of bids placed by each firm in each year, as well as the number of firms bidding to each tender. Then, using statistical inference methods [36], we estimate the p-value associated with each observed coefficient as the upper-tail probability of obtaining a value equal to or greater than it, computed from the cumulative frequency of the null distribution. We discard links whose p-value lies above the chosen significance threshold.
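The text does not specify the randomization algorithm, so the sketch below assumes a standard checkerboard (swap) procedure of the kind discussed in the null-model literature [36]: two bids (i, t) and (j, u) placed in tenders of the same year are rewired to (i, u) and (j, t), which preserves the constraints stated above. The number of swaps, the toy data, and all function names are hypothetical. The p-value follows the upper-tail definition given above; a common variant adds one to both numerator and denominator to avoid p-values of exactly zero.

```python
import random
from collections import Counter


def swap_randomize(bids, tender_year, n_swaps, rng):
    """Randomize a firm-tender bipartite network with checkerboard swaps.

    bids: set of (firm, tender) pairs; tender_year: dict tender -> year.
    A swap turns bids (i, t), (j, u) into (i, u), (j, t). It is accepted only
    when t and u belong to the same year and no duplicate bid is created, so the
    bids per firm per year and the bidders per tender are preserved.
    """
    edges = sorted(bids)  # fixed order so a seeded rng gives reproducible output
    present = set(edges)
    for _ in range(n_swaps):
        a, b = rng.sample(range(len(edges)), 2)
        (i, t), (j, u) = edges[a], edges[b]
        if i == j or t == u or tender_year[t] != tender_year[u]:
            continue  # swap would be void or would break the yearly constraint
        if (i, u) in present or (j, t) in present:
            continue  # swap would create a duplicate bid
        present -= {(i, t), (j, u)}
        present |= {(i, u), (j, t)}
        edges[a], edges[b] = (i, u), (j, t)
    return present


def constraints(bids, tender_year):
    """The quantities the null model keeps fixed."""
    per_firm_year = Counter((f, tender_year[t]) for f, t in bids)
    per_tender = Counter(t for _, t in bids)
    return per_firm_year, per_tender


def upper_tail_p_value(observed, null_values):
    """Fraction of null samples greater than or equal to the observed value."""
    return sum(1 for v in null_values if v >= observed) / len(null_values)


tender_year = {"t1": 2015, "t2": 2015, "t3": 2016, "t4": 2016}
bids = {("A", "t1"), ("A", "t2"), ("B", "t2"), ("B", "t3"), ("C", "t1"), ("C", "t4")}
rng = random.Random(42)
ensemble = [swap_randomize(bids, tender_year, 200, rng) for _ in range(100)]
print(all(constraints(r, tender_year) == constraints(bids, tender_year) for r in ensemble))  # True
print(upper_tail_p_value(0.3, [0.1, 0.35, -0.2, 0.3]))  # 0.5
```

In a full pipeline, each randomized network would be turned into bid vectors, the centered Jaccard coefficient recomputed for every pair, and the resulting samples passed to `upper_tail_p_value`.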

Figure 3: Graphical representation of the Firm-Firm network, which links firms with similar bidding patterns. To build the network, we consider only the most active firms and the edges with a significant Jaccard similarity index. Shown is the giant component together with some relevant disconnected components. Nodes in the giant component are colored according to the eight major communities (out of 22) identified using the Louvain algorithm [37]. The presented partition of the network achieves a modularity of .

The resulting firm-firm co-bidding network contains nodes and edges. Links are treated as undirected and unweighted and connect firms that have a similar bidding pattern. The network exhibits an average degree of , a clustering coefficient of [38], and connected components. Figure 2a shows that the degree distribution decays exponentially with the degree, meaning that the underlying co-bidding mechanics can be approximated by a random attachment process [39]. However, the average clustering coefficient is inversely related to the degree (Figure 2b), suggesting some level of hierarchy in the structure of the network. Notably, the largest connected component contains nodes, edges, and a clustering coefficient of .
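To make the two structural diagnostics concrete, here is a minimal pure-Python sketch computing the average degree, $2E/N$, and the average clustering coefficient per degree, $C(k)$; a $C(k)$ that decreases with $k$ is the hierarchy signature mentioned above. The toy edge list is hypothetical, and nodes with fewer than two neighbours are assigned a clustering of zero (the same convention networkx uses).

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected, unweighted adjacency sets from a list of (u, v) links."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neigh = adj[node]
    k = len(neigh)
    if k < 2:
        return 0.0  # convention: nodes with fewer than two neighbours score 0
    links = sum(1 for u, v in combinations(neigh, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def clustering_by_degree(adj):
    """Average local clustering coefficient C(k) for each degree k."""
    by_degree = defaultdict(list)
    for node, neigh in adj.items():
        by_degree[len(neigh)].append(local_clustering(adj, node))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("a", "e")]
adj = build_adjacency(edges)
print(2 * len(edges) / len(adj))   # average degree: 2.0
print(clustering_by_degree(adj))   # {1: 0.0, 2: 1.0, 4: 0.16666666666666666}
```

In the toy graph the hub `a` (degree 4) has a much lower clustering than its degree-2 neighbours, the same qualitative pattern reported in Figure 2b.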

Results and Discussion

Figure 3 presents the giant component of the firm-firm co-bidding network. In particular, we highlight the eight largest communities (nodes are colored accordingly). Using the Louvain algorithm [37], we identified 22 communities with a modularity of . We refer to these communities as Community 1, Community 2, …, Community 22, indexed in descending order of their number of firms. The high modularity of the network is, however, unsurprising: the network represents firms in different markets, characterized by different types of contracts (e.g., works, services) in different regions. As indicated in Figure 3, the largest communities divide the network geographically into firms that operate mostly in the northern (Red, Blue, and Green) and southern (Purple and Yellow) regions of the State of Ceará, and by activity into firms bidding on contracts that deliver food services (Blue and Purple) or construction works (Red and Yellow). Interestingly, the remaining highlighted communities identify firms that compete at the state-wide level (Pink and light Blue), together with one particular community (Violet) operating in a small region (Jaguaribe) and on a single contract type (Food services). Hence, the network highlights competing markets and provides an interpretable characterization of them.
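The Louvain algorithm [37] greedily merges nodes into communities so as to maximize the modularity, Q. The sketch below does not implement Louvain itself; it only evaluates Q for a given partition, which is the quantity reported above. For the search step one could use an existing implementation (for instance, networkx provides `louvain_communities`), but the paper does not state which implementation was used. The toy graph is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) links, each listed once; community_of: node -> label.
    Q = sum over communities c of [L_c / m - (d_c / 2m)**2], where L_c counts
    links inside c and d_c is the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single link, split into their natural communities.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
partition = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
print(round(modularity(edges, partition), 4))  # 0.3571
```

Putting every node in one community gives Q = 0, so values well above zero, as in the partition reported here, indicate far more within-community links than expected at random.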

As discussed above, communities in Firm-Firm network structures can be used to identify market segments. In limiting cases, in which firms form "echo chambers" or highly dense communities, this structure also allows us to flag groups of firms that present a high risk of collusion and procurement manipulation, in other words, of corruption. Next, we therefore explore network science as an approach to flag communities of firms that the relevant authorities may want to investigate further.

Figure 4: Characterization of the ten largest communities by the diversity of their bids by region and contract type. Panel a) shows the distribution of bids within each community by geographical region (left) and contract/tender type (right). For each community, we compute the Simpson's diversity index for regions and for contract types. Panel b) compares communities by the diversity of their contracts in terms of regional span and type. Note that in panel a) we only show results for the ten largest communities, which are representative of the results. Communities not identified by a color code in Figure 3 are shown in gray in panel b); in particular, one of these communities corresponds to the gray clique easily identifiable in the bottom left of the network in Figure 3.

Activities Diversity

We start by looking at the regional diversity of the places where firms perform their activities (i.e., bid on tenders), and at the diversity of the types of contracts they bid to. While a firm with low diversity in both regional reach and contract type may simply be narrow in both scope and domain, the existence of clear groups (i.e., communities of firms) that share this profile can point to a more troublesome scenario. In particular, it can indicate the conditions for firms to coordinate and cooperate to control a specific market in a specific regional context, which should be investigated with further care. To that end, we start by estimating the Simpson's diversity index (in some fields also known as the Herfindahl index) for each community, which can be computed as

$$ D_c = \sum_{k} p_{c,k}^{2} $$

where $p_{c,k}$ is the fraction of bids placed by the firms in community $c$ in category $k$, the categories being either procurement contract types {Consumables Health, Services, Construction, Events, Food, Fuel,…} or regions {Metropolitana, Norte, Sul, Noroeste,…}. The quantity is normalized per community, so that $\sum_{k} p_{c,k} = 1$. We estimate $D_c$ independently for each community, both according to the region that issued the tender ($D^{R}_{c}$) and according to the tender contract type ($D^{T}_{c}$), e.g., services, food, tenancy, construction. We chose the Simpson's index over other alternatives (e.g., entropy) because of its straightforward interpretation: the probability that two bids from a community, drawn at random with replacement, fall in the same category (e.g., region or contract type).
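The Simpson's index computation can be sketched as follows, assuming each bid is recorded as a (firm, region, contract_type) tuple and that a mapping from firms to communities is available; both the record layout and the toy bids (with made-up communities X and Y) are hypothetical.

```python
from collections import Counter, defaultdict


def simpson_index(labels):
    """Simpson's index D = sum_k p_k**2 of a list of category labels (one per bid).

    D is the probability that two bids drawn at random, with replacement, fall
    in the same category: 1 for a single category, 1/K for K equally used ones.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


def community_diversity(bids, community_of):
    """Map each community to (regional D, contract-type D).

    bids: list of (firm, region, contract_type) tuples (a hypothetical layout).
    """
    regions, types = defaultdict(list), defaultdict(list)
    for firm, region, contract_type in bids:
        c = community_of[firm]
        regions[c].append(region)
        types[c].append(contract_type)
    return {c: (simpson_index(regions[c]), simpson_index(types[c])) for c in regions}


bids = [
    ("f1", "Jaguaribe", "Food"), ("f2", "Jaguaribe", "Food"),
    ("f3", "Norte", "Food"), ("f3", "Sul", "Services"),
    ("f4", "Norte", "Construction"), ("f4", "Sul", "Fuel"),
]
community_of = {"f1": "X", "f2": "X", "f3": "Y", "f4": "Y"}
print(community_diversity(bids, community_of))  # {'X': (1.0, 1.0), 'Y': (0.5, 0.25)}
```

Community X, concentrated in one region and one contract type, reaches the maximum D = 1 on both axes and would sit in the top-right corner of a plot like Figure 4b.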

Figure 4a illustrates the empirical distributions of procurement activity for the ten largest communities, both by region and by contract type. Blue colors denote a low relative frequency of bids, while red identifies a high frequency. These indicators allow us to infer the degree of specialization and agglomeration of a community. In particular, we find that the activities of Community 8 are agglomerated in a single region (Jaguaribe) and that its firms specialize in one type of contract (Food). The same conclusion can be drawn from its high values of the Simpson's index, which indicate that Community 8 has a low diversity of activity. Figure 4b compares all communities in terms of the two diversity indicators defined above. We find a cluster of communities in the bottom left quadrant (low levels of agglomeration and specialization), which we associate with healthy markets composed of firms that, on average, have a diversified portfolio of activities and regional distribution. In contrast, in the top right quadrant, we find communities that rely on procurement contracts of a single type, agglomerated in a small number of regions.

The combination of these two diversity indicators at the community level provides a powerful feature to identify groups of firms that can dominate a niche market or, in the worst case, develop undesirable leverage, as a group, in negotiating procurement contracts, thereby lowering the efficiency that public procurement aims to achieve in the tendering process. However, it is important to stress that these metrics only indicate potential problems; the true nature of the activities of the firms in each community should be carefully investigated by the corresponding local authorities.

Bidding Coordination

Figure 5: Characterization by bidding activity. Panel a) shows the average number of single bids per community (i.e., the average number of times a firm in a community participated in a tender as the single bidder). We compare the value of each community with the average of the entire population of firms (horizontal red line). Panel b) shows the average number of bidders in the tenders in which firms within a community typically participate, normalized by the number of firms in that community. The horizontal red line marks the value that equals the size of the community (i.e., a normalized value of one).

To further investigate the susceptibility of firms to market manipulation, we next look at the propensity of each community to participate in "single bidder" contracts, another pattern often associated with corruption and loss of efficiency. To assess each community's susceptibility to this practice, we compute, per community, the average number of times a firm is the single bidder of a tender. Figure 5a shows the results for all communities in the largest component of the Firm-Firm network. Traditionally, a high level of single bidding would be the main indicator of market manipulation. However, at the community level, and taking as baseline the number of single bids by a typical firm, both very low and very high levels are indicative of unusual activity: low levels indicate a risk of coordination (e.g., firms participating coherently in the same contracts), while high levels can signal the prevalence of less competitive markets. Among the ten largest communities, only Community 8 exhibits low levels of single bidding, a pattern also observed in Communities 14 and 21. In contrast, Community 12 strongly deviates from the average, with a number of single bids per firm roughly four times larger than that of the population.
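A minimal sketch of the single-bid indicator, assuming bids are stored as distinct (firm, tender) pairs and that community labels are known (both hypothetical). It returns, for each community, the average number of single bids per firm divided by the population average, so a ratio of 1 corresponds to the baseline of a typical firm.

```python
from collections import Counter, defaultdict


def single_bid_ratio(bids, community_of):
    """Average single-bid count per firm in each community, relative to all firms.

    bids: list of distinct (firm, tender) pairs. A "single bid" is a bid to a
    tender that received no other bid. A ratio above 1 means more single bids
    than a typical firm; below 1, fewer.
    """
    bidders = Counter(t for _, t in bids)
    singles = Counter(f for f, t in bids if bidders[t] == 1)
    firms = {f for f, _ in bids}
    population_avg = sum(singles[f] for f in firms) / len(firms)
    if population_avg == 0:
        raise ValueError("no single-bidder tenders in the data")
    members = defaultdict(list)
    for f in firms:
        members[community_of[f]].append(f)
    return {c: sum(singles[f] for f in fs) / len(fs) / population_avg
            for c, fs in members.items()}


bids = [("A", "t1"), ("A", "t2"), ("B", "t2"), ("C", "t3"), ("C", "t4"), ("D", "t5")]
community_of = {"A": "X", "B": "X", "C": "Y", "D": "Y"}
print(sorted(single_bid_ratio(bids, community_of).items()))  # [('X', 0.5), ('Y', 1.5)]
```

Following the argument above, both tails are of interest: ratios well below 1 may hint at coordinated joint participation, and ratios well above 1 at weakly contested markets.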

In addition, it is important to look at the average number of bidders per tender to assess the potential existence of coherent behavior, that is, coordination between the firms in a community. To that end, for each community, we estimate the average number of bids per tender, which we normalize by the size of the community (i.e., the number of firms in it). Interestingly, Figure 5b shows that firms in Community 8 tend to participate in tenders whose number of participating firms matches almost exactly the size of the community, while in other cases, such as Communities 14 and 19, the numbers are several times larger. Note that this analysis is biased by community size, so we would expect the normalized value to increase smoothly with the community index (i.e., as communities get smaller), with the largest community achieving the smallest value and, in the limiting case of a community with a single firm, the maximum. However, Communities 8, 14, and 19 clearly deviate from this trend.
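The normalized number of bidders can be sketched in the same way. The text leaves the averaging unit implicit, so this sketch assumes the average is taken over all bids placed by the community's firms; the data layout and toy values are hypothetical. In the example, community X reproduces the pattern described for Community 8: its tenders contain exactly as many bidders as it has firms, giving a value of 1.

```python
from collections import Counter, defaultdict


def normalized_tender_size(bids, community_of):
    """Average number of bidders in the tenders a community's firms bid to,
    divided by the number of firms in that community.

    bids: list of distinct (firm, tender) pairs. The average is taken over all
    bids placed by the community's firms (an assumed reading of the text).
    A value close to 1 means tenders hold about as many bidders as the
    community has firms.
    """
    bidders = Counter(t for _, t in bids)
    size = Counter(community_of[f] for f in {f for f, _ in bids})
    total, count = defaultdict(int), defaultdict(int)
    for f, t in bids:
        c = community_of[f]
        total[c] += bidders[t]
        count[c] += 1
    return {c: total[c] / count[c] / size[c] for c in count}


bids = [("A", "t1"), ("B", "t1"), ("C", "t1"), ("A", "t2"), ("B", "t2"), ("C", "t2"),
        ("D", "t3"), ("E", "t3"), ("F", "t3")]
community_of = {"A": "X", "B": "X", "C": "X", "D": "Y", "E": "Z", "F": "Z"}
print(normalized_tender_size(bids, community_of))  # {'X': 1.0, 'Y': 3.0, 'Z': 1.5}
```

The single-firm community Y gets the largest value, which illustrates the size bias noted above: smaller communities mechanically score higher, so only departures from that trend are informative.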


In this manuscript, we explore the potential of mining a large data set of public tenders, which records the activity of firms competing for procurement contracts issued by the municipalities of the State of Ceará (Brazil). By matching firms with similar bidding patterns, we have inferred a firm-firm network comprising a total of nodes and edges.

We show that we are able to identify communities of firms with similar bidding patterns. The network exhibits a highly modular structure, partitioned into 22 communities. These communities group firms with a similar scope of procurement activity, both in the nature of the contracts they bid for and in the regional reach of their activities. Moreover, we use two diversity indicators, regional diversity and contract-type diversity, as signs of the potential of certain communities to develop leverage over the procurement process, in other words, to affect the expected efficiency of the market. Finally, we look at the sizes of the tenders, first through the abundance of single bidders in each community and then through the average number of bidders per tender. Overall, we have identified one particular community (Community 8) that combines several undesirable properties. Community 8 comprises a group of firms that offer food services in the region of Jaguaribe. These firms have an unusually low number of single bids, the average number of participating firms per tender matches the number of firms in the community, and they exhibit a high specialization and agglomeration in their activities.

Finally, it is important to highlight some shortcomings of our analysis and future research directions. The lack of pre-labeled data on past cases of corruption largely limited our ability to establish any causal link between irregular procurement behavior and the network structure, its motifs, or the location of firms in the network. Hence, our results are merely exploratory and show the potential of combining network science methods with descriptive statistics to highlight relevant groups of firms according to their activity patterns in a data-scarce environment. If a larger temporal window becomes available, future work should study the evolution of the network, capturing both the evolution and segregation of communities of interest and their trajectories in terms of the diversity of their activities.


MSL was financially supported by Tribunal de Contas do Estado do Ceará through a PhD scholarship. BD, FLP, and FB acknowledge the financial support provided by FCT Portugal under the project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC). The authors are thankful to Tribunal de Contas do Estado do Ceará for sharing the data for this study. The authors are thankful to Cristian Candia for the useful discussions and insights. The findings, interpretations, and conclusions expressed by the authors in this work do not necessarily reflect the views of the Tribunal de Contas of Ceará.


  • [1] OECD.Stat, Government at a Glance 2017 edition: public procurement, accessed: 2021-03-14.
  • Murray [2014] J. G. Murray, Procurement fraud vulnerability: a case study, EDPACS the EDP audit, control and security newsletter 49, 7 (2014).
  • OECD [2015] OECD, Preventing corruption in public procurement, (2015).
  • Angulo Garzaro [2018] N. Angulo Garzaro, Eu competition law and public procurement: Competition-driven limits imposed to public bodies when they source goods, works and services, Works and Services (April 9, 2018)  (2018).
  • Mufutau [2016] O. V. Mufutau and G. O. Mojisola, Detection and prevention of contract and procurement fraud: catalyst to organization profitability, Journal of Business and Management 18, 09 (2016).
  • Rustiarini et al. [2019] N. Rustiarini, T. Sutrisno, N. Nurkholis, and W. Andayani, Why people commit public procurement fraud? the fraud diamond view, Journal of Public Procurement 19, 345 (2019).
  • Whiteman [2019] R. Whiteman, Fraud and corruption tracker, The Chartered Institute of Public Finance and Accountancy – CIPFA  (2019).
  • Zeferino [2020] L. F. Zeferino, A corrupção na construção de edifícios públicos no brasil: análise de instrumentos inibidores e facilitadores na etapa de projeto arquitetônico,  (2020).
  • Hafner et al. [2016] M. Hafner, J. Taylor, E. Disley, S. Thebes, M. Barberi, M. Stepanek, and M. Levi, The Cost of Non-Europe in the Area of Organised Crime and Corruption: Annex II-Corruption (RAND, 2016).
  • Søreide [2002] T. Søreide, Corruption in public procurement. Causes, consequences and cures (Chr. Michelsen Institute, 2002).
  • Beittel et al. [2019] J. S. Beittel, P. Meyer, et al., Combating Corruption in Latin America: Congressional Considerations (Congressional Research Service, 2019).
  • Fazekas et al. [2018] M. Fazekas, L. Cingolani, and B. Tóth, Innovations in objectively measuring corruption in public procurement, in Helmut K. Anheier, Matthias Haber, Mark A. Kayser (Eds.). Governance Indicators. Approaches, Progress, Promise. (Oxford, UK: Oxford University Press, 2018) Chap. 7, pp. 154–185.
  • Curado et al. [2020] A. Curado, B. Damásio, S. Encarnação, C. Candia, and F. L. Pinheiro, Scaling behavior of public procurement activity, arXiv preprint arXiv:2007.15276  (2020).
  • Herrera [2019] M. D. Herrera, Using social network analysis in open contracting data to detect corruption and collusion risks, Eindhoven University of Technology - Department of Mathematics and Computer Science  (2019).
  • Kertész and Wachs [2021] J. Kertész and J. Wachs, Complexity science approach to economic crime, Nature Reviews Physics 3, 70 (2021).
  • Wachs and Kertész [2019] J. Wachs and J. Kertész, A network approach to cartel detection in public auction markets, Scientific reports 9 (2019).
  • Adjei-Bamfo et al. [2019] P. Adjei-Bamfo, T. Maloreh-Nyamekye, and A. Ahenkan, The role of e-government in sustainable public procurement in developing countries: A systematic literature review, Resources, Conservation and Recycling 142, 189 (2019).
  • Nowrousian [2019] B. Nowrousian, Combatting public procurement criminality or simple rules for complex cases, Journal of Financial Crime  (2019).
  • Spagnolo [2012] G. Spagnolo, Reputation, competition, and entry in procurement, International Journal of Industrial Organization 30, 291 (2012).
  • Hanák and Serrat [2018] T. Hanák and C. Serrat, Analysis of construction auctions data in slovak public procurement, Advances in Civil Engineering 2018 (2018).
  • Lauchs et al. [2011] M. Lauchs, R. Keast, and N. Yousefpour, Corrupt police networks: uncovering hidden relationship patterns, functions and roles, Policing & society 21, 110 (2011).
  • Chang [2018] Z. Chang, Understanding the corruption networks revealed in the current chinese anti-corruption campaign: A social network approach, Journal of Contemporary China 27, 735 (2018).
  • Grassi et al. [2019] R. Grassi, F. Calderoni, M. Bianchi, and A. Torriero, Betweenness to assess leaders in criminal networks: New evidence using the dual projection approach, Social Networks 56, 23 (2019).
  • Fazekas and Tóth [2016] M. Fazekas and I. J. Tóth, From corruption to state capture: A new analytical framework with empirical applications from hungary, Political Research Quarterly 69, 320 (2016).
  • Wachs et al. [2019] J. Wachs, T. Yasseri, B. Lengyel, and J. Kertész, Social capital predicts corruption risk in towns, Royal Society open science 6, 182103 (2019).
  • Toth et al. [2014] B. Toth, M. Fazekas, A. Czibik, and I. Toth, Toolkit for detecting collusive bidding in public procurement with examples from hungary, Corruption Research Center Budapest  (2014).
  • Reeves-Latour and Morselli [2017] M. Reeves-Latour and C. Morselli, Bid-rigging networks and state-corporate crime in the construction industry, Social Networks 51, 158 (2017).
  • Morselli and Ouellet [2018] C. Morselli and M. Ouellet, Network similarity and collusion, Social Networks 55, 21 (2018).
  • Wachs et al. [2020] J. Wachs, M. Fazekas, and J. Kertész, Corruption risk in contracting markets: A network science perspective, International Journal of Data Science and Analytics, 1 (2020).

  • Piccolo et al. [2018] S. Piccolo, S. Lehmann, and A. Maier, Design process robustness: a bipartite network analysis reveals the central importance of people, Design Science 4 (2018).
  • Ramalho et al. [2020] H. M. B. Ramalho, A. T. C. Almeida, and A. A. Fraga, Detection of suspected collusion cases in public biddings: an application of the a priori machine learning algorithm for the state of Paraíba, in Teoria e Prática em Administração - TPA, Vol. 10 (2020) pp. 5–22, 2nd ed.
  • Fierăscu [2017] S. I. Fierăscu, The networked phenomenon of state capture: Network dynamics, unintended consequences, and business-political relations in hungary, Central European University  (2017).
  • Veech [2013] J. A. Veech, A probabilistic model for analysing species co-occurrence, Global Ecology and Biogeography 22, 252 (2013).
  • Mainali et al. [2017] K. P. Mainali, S. Bewick, P. Thielen, T. Mehoke, F. P. Breitwieser, S. Paudel, A. Adhikari, J. Wolfe, E. V. Slud, D. Karig, et al., Statistical analysis of co-occurrence patterns in microbial presence-absence datasets, PLoS One 12, e0187132 (2017).
  • Chung et al. [2019] N. C. Chung, B. Miasojedow, M. Startek, and A. Gambin, Jaccard/tanimoto similarity test and estimation methods for biological presence-absence data, BMC bioinformatics 20, 1 (2019).
  • Gotelli [2000] N. J. Gotelli, Null model analysis of species co-occurrence patterns, Ecology 81, 2606 (2000).
  • Blondel et al. [2008] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, Fast unfolding of communities in large networks, Journal of statistical mechanics: theory and experiment 2008, P10008 (2008).
  • Newman et al. [2020] M. E. Newman, G. T. Cantwell, and J.-G. Young, Improved mutual information measure for clustering, classification, and community detection, Physical Review E 101, 042304 (2020).
  • Albert and Barabási [2002] R. Albert and A.-L. Barabási, Statistical mechanics of complex networks, Reviews of modern physics 74, 47 (2002).