Identifying collusion groups using spectral clustering

09/22/2015 ∙ by Suneel Sarswat, et al.

In an illiquid stock, traders can collude and place orders at a predetermined price and quantity on a fixed schedule. This is usually done to manipulate the price of the stock or to create artificial liquidity in the stock, which may mislead genuine investors. Here, the problem is to identify such groups of colluding traders. We model the problem instance as a graph, where each trader corresponds to a vertex of the graph and each trade corresponds to an edge of the graph. Further, we assign weights on edges depending on total volume, total number of trades, maximum change in the price and commonality between two vertices. Spectral clustering algorithms are used on the constructed graph to identify colluding group(s). We have compared our results with simulated data to show the effectiveness of spectral clustering in detecting colluding groups. Moreover, we have also used parameters of real data to test the effectiveness of our algorithm.




1 Introduction

1.1 Trading in a stock exchange

A stock exchange provides a platform for people to trade stocks of companies that are listed on the exchange. Suppose a potential buyer $b$ intends to buy a stock $X$. So $b$ offers, or bids, a price $p_b$ for one unit of $X$. A potential seller $s$, who intends to sell $X$, offers, or asks, a price $p_s$ for one unit of $X$. Such offers by potential buyers or sellers are called orders. If the bidding price $p_b$ is greater than or equal to the asking price $p_s$, then a trade takes place. This means that $s$ transfers a certain number of units of $X$ (say, $q$) to $b$, and $b$ pays the total money for the $q$ units to $s$ as per the matched price. The quantity $q$ is called the volume of the trade.

The stock exchange does not reveal the identity of a buyer or a seller. Normally, there are many buyers and sellers for a stock. For any stock $X$, their numbers vary throughout the day. In most markets an incoming buy or sell order is either matched with an existing order or placed in a priority queue, where priorities are based on the price. For illiquid stocks, it is possible that a buyer and a seller plan together and place orders at a predetermined price and quantity so that the bidding and asking prices match exactly. Such trades are called synchronized trades.
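The matching rule described above can be illustrated with a small order book. This is only a sketch under stated assumptions: the class name `OrderBook`, the order fields, and the convention that a trade executes at the price of the order already resting in the queue are illustrative choices, not details given in the text.

```python
import heapq


class OrderBook:
    """Toy price-priority order book for a single stock (illustrative only)."""

    def __init__(self):
        self.bids = []    # max-heap via negated price: (-price, seq, trader, qty)
        self.asks = []    # min-heap: (price, seq, trader, qty)
        self.seq = 0      # arrival counter, breaks ties between equal prices
        self.trades = []  # executed trades: (buyer, seller, price, qty)

    def submit(self, trader, side, price, qty):
        """Match an incoming order against resting orders, queue any remainder."""
        self.seq += 1
        if side == "buy":
            # A trade happens while the bid is >= the best (lowest) ask.
            while qty > 0 and self.asks and self.asks[0][0] <= price:
                ask_price, seq, seller, ask_qty = heapq.heappop(self.asks)
                traded = min(qty, ask_qty)
                # Assumption: execute at the resting order's price.
                self.trades.append((trader, seller, ask_price, traded))
                qty -= traded
                if ask_qty > traded:
                    heapq.heappush(self.asks, (ask_price, seq, seller, ask_qty - traded))
            if qty > 0:
                heapq.heappush(self.bids, (-price, self.seq, trader, qty))
        else:
            # A trade happens while the best (highest) bid is >= the ask.
            while qty > 0 and self.bids and -self.bids[0][0] >= price:
                neg_price, seq, buyer, bid_qty = heapq.heappop(self.bids)
                traded = min(qty, bid_qty)
                self.trades.append((buyer, trader, -neg_price, traded))
                qty -= traded
                if bid_qty > traded:
                    heapq.heappush(self.bids, (neg_price, seq, buyer, bid_qty - traded))
            if qty > 0:
                heapq.heappush(self.asks, (price, self.seq, trader, qty))
```

In this sketch a synchronized trade is simply a sell order and a buy order submitted with the same price and quantity: in an otherwise empty book they match each other completely and leave nothing in either queue.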

In 2011, the Securities and Exchange Board of India (SEBI), the stock market regulator of India, initiated regulatory actions against certain individuals [1]. As per the report [1], these individuals were suspected of creating substantial volumes, which appeared to be artificial in nature, by executing synchronized and structured trades. This group of individuals was also found to be increasing or maintaining prices and providing misleading signals to the market by artificially injecting volumes in certain stocks and also contributing to the price movement. Further, SEBI observed that such trades appeared to be taking place in an unbridled manner. These traders also trade with other non-colluding persons.

In this paper, we present an algorithm for this problem of identifying such groups of individuals in an efficient manner. Throughout the paper, such groups are referred to as collusion/colluding groups.

1.2 Problem formulation

Trading in a stock exchange can be represented as a simple undirected weighted graph $G = (V, E)$, where each vertex of $G$ represents a trader in a stock exchange, and there is an edge between two vertices $u$ and $v$ of $G$ if (i) there is a trade between $u$ and $v$ or (ii) there is a trader who has traded with both $u$ and $v$. Every edge $(u, v)$ of $G$ is assigned a weight $w_{uv}$. Parameters such as price movement, number of trades, total volume and commonality between every pair are used to compute the weights. Since a collusion group is expected to be closely connected through trades, the corresponding subsets of vertices of $G$ are called clusters in $G$. The problem of identifying collusion groups in a stock exchange reduces to the problem of identifying clusters in $G$.
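As a minimal sketch of this construction, the snippet below builds the vertex and edge sets from a list of trades. The function name `build_edges` and the input layout (one pair of trader codes per trade) are assumptions made for illustration only.

```python
from itertools import combinations


def build_edges(trades):
    """Return (vertices, edges) of the trader graph.

    trades: iterable of (trader_a, trader_b) pairs (hypothetical layout).
    An edge {u, v} is created if u and v traded directly, or if some
    third trader has traded with both of them.
    """
    direct = {}  # trader -> set of direct counterparties
    for a, b in trades:
        if a == b:
            continue  # the graph is simple, so self-trades add no edge
        direct.setdefault(a, set()).add(b)
        direct.setdefault(b, set()).add(a)

    edges = set()
    for u, partners in direct.items():
        # Rule (i): u traded directly with each of its partners.
        for v in partners:
            edges.add(frozenset((u, v)))
        # Rule (ii): any two partners of u have u as a common counterparty.
        for v, w in combinations(sorted(partners), 2):
            edges.add(frozenset((v, w)))
    return set(direct), edges
```

Edges are stored as `frozenset` pairs so that `{u, v}` and `{v, u}` are the same undirected edge.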

1.3 Our approach

For detecting collusion groups, graph clustering methods have been used earlier by Palshikar and Apte [7] and Islam et al. [5]. Their algorithms use total volume to compute the weights between two traders, and these algorithms have been tested on small simulated data. We apply the spectral clustering technique to this problem for the first time. Spectral clustering has been successfully used to find communities in graphs by White and Smyth [11]. Moreover, for defining closeness between two vertices of the graph, we use a function to assign weights on edges, where the function is defined in terms of volumes, number of transactions between two individuals, price movements, and commonality between traders. Furthermore, for choosing the number of colluding groups we use the objective metric proposed by Newman and Girvan [6]. Our algorithm is easy to implement, and it has been tested on actual data of SEBI, where it performed well in practice. Note that our graph is very large compared to the graphs used in the earlier works.

In the next section, we present the spectral clustering technique used in this paper for locating collusion groups. In Section 4, we experiment on the data and present the results.

2 Spectral Clustering

Spectral clustering is one of the well known modern clustering techniques, used for separating large data sets into groups based on closeness. Let $W$ represent the weighted adjacency matrix of the weighted graph $G = (V, E)$ as defined earlier. Let $A$ and $B$ be two disjoint subsets of $V$. Let $W(A)$ and $W(B)$ denote the sum of weights of edges of the graphs induced by $A$ and $B$ respectively. Let $W(A, B)$ denote the sum of weights of edges between $A$ and $B$. It is easy to see that if $A$ and $B$ are two different collusion groups of $G$, then $W(A, B)$ should have a very low value, whereas both $W(A)$ and $W(B)$ should have high values. Intuitively, $W(A)$ measures closeness amongst the vertices of $A$. So, it is natural to look for subsets $A$ and $B$ such that $W(A)$ and $W(B)$ are maximized and $W(A, B)$ is minimized. Formally, we wish to locate two such subsets $A$ and $B$ such that $W(A, B)/W(A)$ and $W(A, B)/W(B)$ together have low values.

Suppose we add $W(A, B)/W(A)$ and $W(A, B)/W(B)$ instead of considering them separately. We then get an equation called MinMaxCut, which was introduced by Ding et al. [2].

$$\mathrm{MinMaxCut}(A, B) = \frac{W(A, B)}{W(A)} + \frac{W(A, B)}{W(B)} \tag{1}$$

The choice of $A$ and $B$ for which $\mathrm{MinMaxCut}(A, B)$ achieves its minimum can be considered as two clusters. This is one method of identifying clusters, which we use in this paper. There are other methods for computing clusters such as RatioCut [4], NormalizedCut [8] etc. For our problem of identifying collusion groups, traders in the same cluster have more transactions with each other than traders in different clusters. Eq.(1) captures these properties better than the other methods. Eq.(1) can be generalized to Eq.(2) for $k$ clusters $A_1, \dots, A_k$ as follows [9], where $\overline{A_i} = V \setminus A_i$.

$$\mathrm{MinMaxCut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{W(A_i, \overline{A_i})}{W(A_i)} \tag{2}$$
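To make the objective concrete, the following sketch evaluates the MinMaxCut value of a given partition: for every cluster it divides the weight of the edges leaving the cluster by the weight of the edges inside it, and sums these ratios. The function name `min_max_cut` and the edge-dictionary layout are assumptions for illustration. Each edge is counted once here; counting every edge twice would only rescale the value.

```python
def min_max_cut(weights, clusters):
    """MinMaxCut value of a partition.

    weights: dict mapping frozenset({u, v}) -> positive edge weight.
    clusters: list of disjoint sets of vertices.
    """
    total = 0.0
    for cluster in clusters:
        inside = 0.0   # weight of edges with both ends in the cluster
        leaving = 0.0  # weight of edges with exactly one end in the cluster
        for edge, w in weights.items():
            ends_inside = len(edge & cluster)
            if ends_inside == 2:
                inside += w
            elif ends_inside == 1:
                leaving += w
        if inside == 0:
            # A cluster without internal weight makes the ratio unbounded.
            return float("inf")
        total += leaving / inside
    return total
```

For two tightly connected pairs joined by one light edge, the partition that separates the pairs gets a small value, while any partition that isolates a single vertex gets an infinite one.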


The choice of $A_1, \dots, A_k$ which minimizes Eq.(2) gives $k$ different clusters. However, the problem of finding such subsets of vertices is NP-hard [10]. Ding et al. [2] showed that the problem in Eq.(2) can be formulated as a trace minimization problem with relaxation of the constraints as follows, where $n = |V|$.

$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}(H^{T} L H) \quad \text{subject to} \quad H^{T} D H = I \tag{3}$$

Here $D$ is a diagonal matrix such that $d_{ii}$ is the sum of the weights of the edges of vertex $i$, and $L = D - W$. The solution of the above problem can be obtained as a solution of the generalized eigenvalue problem $L h = \lambda D h$. In this case, the solution $H$ is found by taking the first $k$ eigenvectors of $D^{-1} L$, i.e., the eigenvectors associated with the $k$ smallest eigenvalues of $D^{-1} L$, as the columns of $H$. To convert this real-valued solution into a discrete one, the $k$-means algorithm can be used on the rows of $H$ to obtain discrete clusters [9].
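The relaxed problem can be solved numerically with a symmetric eigensolver. The sketch below is one way to do this, assuming a dense symmetric weight matrix in which every vertex has positive degree. It uses the standard identity that a generalized eigenvector of the pair $(L, D)$ equals $D^{-1/2}$ times an eigenvector of the symmetric matrix $D^{-1/2} L D^{-1/2}$. The function name `relaxed_solution` is hypothetical.

```python
import numpy as np


def relaxed_solution(W, k):
    """Return (eigenvalues, H) with L h = lambda D h and H^T D H = I.

    W: symmetric (n, n) weight matrix with positive row sums (assumption).
    k: number of eigenvectors to keep.
    """
    d = W.sum(axis=1)                 # weighted degree of every vertex
    L = np.diag(d) - W                # unnormalized Laplacian L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalization D^{-1/2} L D^{-1/2}, so that eigh applies.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    vals, U = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    H = d_inv_sqrt[:, None] * U[:, :k]  # map back to generalized eigenvectors
    return vals[:k], H
```

Because the columns of `U` are orthonormal, the returned `H` satisfies the constraint $H^{T} D H = I$ up to rounding error.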

2.1 Number of clusters

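To find the number of clusters $k$ in $G$ we use the modularity function $Q$ proposed by Newman and Girvan [6]. For a partition of $V$ into clusters $A_1, \dots, A_k$, it is defined as follows.

$$Q = \sum_{i=1}^{k} \left( e_{ii} - a_i^{2} \right) \tag{4}$$

Here $e_{ii}$ is the fraction of the total edge weight that lies inside $A_i$, and $a_i$ is the fraction of edge ends, counted by weight, that are attached to vertices of $A_i$. The optimal number of clusters can be obtained by finding the value of $k$ for which $Q$ is maximized.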
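A small sketch of the modularity computation is given below. It assumes the usual weighted form of the Newman and Girvan measure: for each cluster, the fraction of edge weight inside the cluster minus the square of the fraction of edge ends (by weight) attached to the cluster. The function name `modularity` and the edge-dictionary layout are illustrative assumptions.

```python
def modularity(weights, clusters):
    """Weighted modularity Q of a partition.

    weights: dict mapping frozenset({u, v}) -> positive edge weight.
    clusters: list of disjoint sets of vertices covering the graph.
    """
    total = sum(weights.values())  # total edge weight, each edge counted once
    q = 0.0
    for cluster in clusters:
        inside = 0.0  # weight of edges with both ends in the cluster
        degree = 0.0  # weighted number of edge ends inside the cluster
        for edge, w in weights.items():
            ends_inside = len(edge & cluster)
            if ends_inside == 2:
                inside += w
            degree += ends_inside * w
        q += inside / total - (degree / (2.0 * total)) ** 2
    return q
```

Putting all vertices into one cluster always gives $Q = 0$, so a positive value indicates more weight inside the clusters than would be expected from the degrees alone.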

3 Our Algorithm

3.1 Computing edge weights

Let us first understand the common features of colluding groups. Assume that two traders $u$ and $v$ belong to a collusion group. It has been observed that $u$ and $v$ generally trade several times between themselves on the same stock $X$ within a reasonable period of time $T$. Furthermore, during these trades, they tend to trade a large quantity of $X$. Sometimes, they even trade at a price much higher or lower than that of the last trades, if the purpose is to manipulate the price of the stock. In addition, $u$ and $v$ may even trade through a set of intermediate traders.

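Let $n_{uv}$ be the total number of trades of $X$ between the traders corresponding to $u$ and $v$ during $T$. Observe that $n_{uv}$ can be zero if the corresponding traders have not traded during $T$. Let $n_{max}$ (or $n_{min}$) denote the maximum (respectively, minimum) value of $n_{uv}$ over all pairs $(u, v)$. So, $n_{min} \le n_{uv} \le n_{max}$. Since $n_{uv}$ is expected to be close to $n_{max}$ for two traders in a collusion group, the value of the ratio $(n_{uv} - n_{min})/(n_{max} - n_{min})$ can be used to assess their closeness. Analogously, $(q_{uv} - q_{min})/(q_{max} - q_{min})$ and $(p_{uv} - p_{min})/(p_{max} - p_{min})$ are computed to assess their closeness using volumes and prices respectively. Note that there can be multiple prices for the multiple trades between the two traders. So, $p_{uv}$ is the volume weighted average price between $u$ and $v$. Let $N(u)$ and $N(v)$ be the sets of neighbors of $u$ and $v$ respectively. To incorporate the intermediate traders in $w_{uv}$, we use the fact that the number of common neighbors $|N(u) \cap N(v)|$ is expected to be very close to the total number of neighbors $|N(u) \cup N(v)|$. Note that $N(u)$ contains $u$ and $N(v)$ contains $v$. Hence, we have the following formula for computing $w_{uv}$.

$$w_{uv} = \frac{n_{uv} - n_{min}}{n_{max} - n_{min}} + \frac{q_{uv} - q_{min}}{q_{max} - q_{min}} + \frac{p_{uv} - p_{min}}{p_{max} - p_{min}} + \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|} \tag{5}$$

Observe that the last term in the above equation is one if $u$ and $v$ share all their neighbors. Even if the traders corresponding to $u$ and $v$ are not trading directly but trade through the same set of traders, the edge between them receives a non-zero weight.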
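The weight computation can be sketched as follows. This is a hedged illustration, not the authors' exact formula: it assumes that the weight is the sum of three min-max normalized terms (number of trades, volume, volume weighted average price) and a commonality term equal to the number of shared neighbors divided by the total number of neighbors, with each trader included in its own neighbor set. The names `edge_weights`, `pair_stats` and `neighbors` are hypothetical, and pairs without direct trades are assumed to be passed in with zero statistics.

```python
def minmax(value, lo, hi):
    """Min-max normalization to [0, 1]; returns 0 when all values are equal."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def edge_weights(pair_stats, neighbors):
    """Compute a weight for every pair of traders.

    pair_stats: dict frozenset({u, v}) -> (num_trades, volume, avg_price).
    neighbors: dict trader -> set of direct counterparties.
    """
    # Closed neighborhoods: every trader belongs to its own neighbor set.
    closed = {u: nbrs | {u} for u, nbrs in neighbors.items()}
    # Minimum and maximum of each statistic over all pairs.
    columns = list(zip(*pair_stats.values()))
    bounds = [(min(col), max(col)) for col in columns]

    weights = {}
    for pair, stats in pair_stats.items():
        u, v = tuple(pair)
        terms = [minmax(x, lo, hi) for x, (lo, hi) in zip(stats, bounds)]
        common = len(closed[u] & closed[v]) / len(closed[u] | closed[v])
        weights[pair] = sum(terms) + common
    return weights
```

With this form, two traders who never trade directly but share a counterparty still receive a positive weight through the commonality term alone.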

3.2 Computing Clusters

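Now, we present the main steps of the algorithm for locating clusters in a graph $G$.

Input: $G$ and $K$.
For $k = 2, \dots, K$:
    Construct $W$ from $G$.
    Compute $D$ and $L_{sym} = D^{-1/2}(D - W)D^{-1/2}$.
    Compute the first $k$ eigenvectors of $L_{sym}$ and construct the matrix $U$ by placing the eigenvectors as columns of $U$.
    Construct the matrix $T$ from $U$ by assigning $t_{ij} = u_{ij} / (\sum_{l} u_{il}^{2})^{1/2}$.
    For $i = 1, \dots, n$, let $y_i$ be the vector corresponding to the $i$-th row of $T$.
    Cluster the points $y_1, \dots, y_n$ with the $k$-means algorithm into clusters $A_1, \dots, A_k$.
    Compute $Q$ for this partition.
Pick the partition which maximizes $Q$.
Output: Clusters $A_1, \dots, A_k$ with $k$ the value that maximizes $Q$.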

Algorithm 1 Spectral Graph Clustering
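The steps of Algorithm 1 can be sketched with NumPy as follows. This is a minimal illustration and not the authors' implementation: it assumes a dense symmetric weight matrix with positive degrees, uses the symmetric normalized Laplacian, and replaces a library k-means with a small deterministic Lloyd iteration seeded by farthest points. The function names `kmeans`, `modularity_of_labels` and `spectral_clusters` are hypothetical.

```python
import numpy as np


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already chosen center.
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dist))])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        updated = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


def modularity_of_labels(W, labels):
    """Weighted modularity of the partition encoded by an integer label array."""
    total = W.sum()  # twice the total edge weight for a symmetric matrix
    q = 0.0
    for c in np.unique(labels):
        mask = labels == c
        q += W[np.ix_(mask, mask)].sum() / total - (W[mask].sum() / total) ** 2
    return q


def spectral_clusters(W, k_max):
    """Try k = 2..k_max and return (labels, Q) of the partition with largest Q."""
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)  # assumes every vertex has positive degree
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)     # columns sorted by ascending eigenvalue
    best_q, best_labels = -np.inf, None
    for k in range(2, k_max + 1):
        U = vecs[:, :k]                                   # first k eigenvectors
        T = U / np.linalg.norm(U, axis=1, keepdims=True)  # rows scaled to unit norm
        labels = kmeans(T, k)
        q = modularity_of_labels(W, labels)
        if q > best_q:
            best_q, best_labels = q, labels
    return best_labels, best_q
```

The eigen-decomposition is computed once outside the loop because it does not depend on $k$; only the number of retained eigenvectors changes from one iteration to the next.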

4 Experiment and Results

4.1 Market data

Market data consist of all trades of every stock for the entire period in the two main exchanges in India, namely, the National Stock Exchange and the Bombay Stock Exchange. The total number of trades over a period of one year is more than a billion for all the stocks. Each trade record contains (i) the codes of the two traders, (ii) the date and time of the trade, (iii) the stock name, (iv) the traded price of the stock, and (v) the traded volume of the stock.

Consider a situation where two traders $u$ and $v$ have the same address, the same telephone number, off-market transactions or any other common parameter. Off-market trades are those trades where stocks are transferred from one account to another account directly without going through the exchange. These trades indicate that the two individuals know each other and are trading knowingly. This information can be used by the regulators to verify the validity of a colluding group.

Using Algorithm 1, we analyzed trade data for a period of 17 months. We observed that some stocks had colluding groups. In many such stocks, there is usually only one colluding group per stock. After Algorithm 1 identified a cluster $C$ for a company, the regulators of the stock markets verified whether $C$ was indeed a colluding group, using the parameters of the traders in $C$ mentioned above. The verification showed that $C$ included most members of the colluding groups. Since the details of the results are classified, here we use publicly available data and simulated data to demonstrate the performance of our algorithm. For the simulated data we assumed that the weights follow a uniform distribution, and the parameters of the actual data are used for the simulated data.

4.2 Simulated data

We construct a random graph which is used as an input to Algorithm 1. Let $G(n, p)$ be a random graph on $n$ vertices such that there is an edge between any two vertices of the graph with probability $p$ [3]. Initialize $G$ by $G(n, p)$. Choose any two disjoint subsets $S_1$ and $S_2$ of $V$ of size $n_1$ and $n_2$ respectively. Add edges in $S_1$ and similarly in $S_2$ such that the graph induced by $S_1$ is $G(n_1, p_1)$ and the graph induced by $S_2$ is $G(n_2, p_2)$ with $p_1, p_2 > p$. The weight matrix $W$ of $G$ is constructed such that each weight $w_{uv}$ follows a uniform distribution, with one range used if $u$ and $v$ lie in the same subset $S_1$ or $S_2$ and another range otherwise. Then $W$ is used as an input to Algorithm 1. The experiments are repeated many times for various values of the parameters. The clusters $A_1$, $A_2$ and $A_3$ identified by the algorithm are compared with $S_1$ and $S_2$, and the results are shown below.
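A generator for such test graphs can be sketched as below. The sketch samples every pair once, using a higher edge probability inside the planted groups, which yields the same kind of graph as adding edges to a background random graph. The function name `planted_graph` and the two weight ranges (0.5 to 1.0 inside a group, 0.0 to 0.5 elsewhere) are illustrative assumptions and are not the parameters used in the experiments.

```python
import random


def planted_graph(n, p, groups, p_in, seed=0):
    """Random weighted graph on vertices 0..n-1 with planted dense groups.

    groups: list of disjoint lists of vertices (the planted groups).
    p: background edge probability; p_in: edge probability inside a group.
    Returns a dict mapping (u, v) with u < v to the edge weight.
    """
    rng = random.Random(seed)
    member = {v: g for g, vs in enumerate(groups) for v in vs}
    weights = {}
    for u in range(n):
        for v in range(u + 1, n):
            same_group = u in member and member.get(v) == member[u]
            if rng.random() < (p_in if same_group else p):
                # Assumed ranges: heavier weights inside planted groups.
                lo, hi = (0.5, 1.0) if same_group else (0.0, 0.5)
                weights[(u, v)] = rng.uniform(lo, hi)
    return weights
```

Fixing the seed makes a run reproducible, which helps when comparing the clusters found by the algorithm with the planted groups over repeated experiments.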

Figure 1 is a pictorial image of the adjacency matrix of $G$ for one choice of these parameters. The ordering given by the second eigenvector of the graph Laplacian is used to reorder the adjacency matrix, and the reordered matrix is presented in Figure 2. Figure 3 is the plot of the eigenvalues of the graph Laplacian.

Figure 1: Adjacency matrix of $G$.

Figure 2: The adjacency matrix obtained after running the algorithm and reordering the rows and columns according to the second eigenvector of the graph Laplacian. In this matrix two colluding groups are clearly visible.

Figure 3: The eigenvalues of the graph Laplacian of $G$. The three isolated dots on the left indicate three clusters in the graph [9].

4.3 Publicly available data

We have used the stock trade data of the Bombay Stock Exchange for the year 2011. These data contain information such as traders' order ids, volume, price and time of the trade. The order id is a multi-digit number, and for our experiments we assume that its leading digits correspond to the id of an individual. In Figure 4 we show the $Q$ values for various values of $k$ for a stock. We have also compared the results obtained by this algorithm when only one of the parameters is used, i.e., number of transactions, volume, price or commonality.

Figure 4: In this figure the value $Q$ is computed using our algorithm. The other curves correspond to the value in Eq.(4) when only one parameter is used to compute the weights in the graph: number of transactions (only the first term of the weight formula in Section 3.1), and similarly volume, price and commonality (the second, third and fourth terms respectively). The number of clusters in this example is the value of $k$ at which $Q$ is maximized.

4.4 Financial Implications of price manipulation

Price manipulation in the stock market may impact other financial institutions. For example, many banks provide loans against stocks, and the amount of the loan depends on the current price of the stock. If the price of a stock is manipulated to make it higher, then the loan amount sanctioned by a bank can be increased. If the borrower defaults, the loan becomes a non-performing asset for the bank, since it is difficult for the bank to sell the stock and recover the full amount of the loan.

There is another reason which may motivate traders to manipulate the price of a stock. In many economies, short-term capital losses can be set off against long-term or short-term capital gains to compute the taxable income. To understand this, let us consider two investors $I_1$ and $I_2$. Assume that $I_1$ has some long-term taxable income (say, $x$) from some business and $I_2$ has incurred a short-term loss of $y$. Suppose $I_1$ buys some stock $X$ from $I_2$ at a very highly manipulated price. After the manipulation is stopped, the price of $X$ reduces and $X$ is sold back to $I_2$ by $I_1$ at a lower price. This brings a short-term loss, say $z$, in the account of $I_1$ and a short-term gain of $z$ in the account of $I_2$. This way $I_1$ can save his taxes through the loss of $z$ on $X$, and $I_2$ still does not have to pay any tax as long as $z$ does not exceed $y$. So the tax, which otherwise could have gone to the government, gets converted into black money.

5 Conclusion

Detecting colluding groups is a challenge for the regulators of the securities markets. So, building an automated surveillance system which detects suspect groups of traders involved in collusion is an important problem. In this work we have presented an algorithm which detects such groups. The simulated data is constructed here in such a way that it resembles the actual data, and Algorithm 1 also performs well on the simulated data. Hence, our algorithm is practical for identifying collusion groups.

Spectral clustering can be used for other finance problems as well. For example, it can be used to find clusters of stocks which are similar, which in turn can be used to diversify a portfolio. It can also be used to classify mutual funds into various categories. One technique to formulate the problem is to use the Jaccard similarity coefficient as the weight between two mutual funds, where a mutual fund is considered as a set of stocks.
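The Jaccard similarity coefficient mentioned above is the size of the intersection of two sets divided by the size of their union. A short sketch follows; the function name `jaccard` and the example holdings in the usage note are hypothetical.

```python
def jaccard(fund_a, fund_b):
    """Jaccard similarity between two funds given as collections of stock names."""
    a, b = set(fund_a), set(fund_b)
    union = a | b
    # Two empty funds are treated as having similarity 0 (a convention chosen here).
    return len(a & b) / len(union) if union else 0.0
```

For instance, two funds holding `{"X", "Y", "Z"}` and `{"Y", "Z", "W"}` share two of four distinct stocks, so the weight of the edge between them would be 0.5.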


The authors thank the National Institute of Securities Markets for providing trading data for the analysis. The authors gratefully acknowledge the helpful comments and suggestions of Lata Chari, Mohit Garg, Daya Gaur and Bodhayan Roy.


  • [1]
  • [2] Ding, C. H., He, X., Zha, H., Gu, M., and Simon, H. D. A min-max cut algorithm for graph partitioning and data clustering. In Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM 2001) (2001), IEEE, pp. 107–114.
  • [3] Erdős, P., and Rényi, A. On random graphs I. Publ. Math. Debrecen 6 (1959), 290–297.
  • [4] Hagen, L., and Kahng, A. B. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 11, 9 (1992), 1074–1085.
  • [5] Islam, M. N., Haque, S. R., Alam, K. M., and Tarikuzzaman, M. An approach to improve collusion set detection using MCL algorithm. In 12th International Conference on Computers and Information Technology (ICCIT'09) (2009), IEEE, pp. 237–242.
  • [6] Newman, M. E., and Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69, 2 (2004), 026113.
  • [7] Palshikar, G. K., and Apte, M. M. Collusion set detection using graph clustering. Data Mining and Knowledge Discovery 16, 2 (2008), 135–164.
  • [8] Shi, J., and Malik, J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 8 (2000), 888–905.
  • [9] Von Luxburg, U. A tutorial on spectral clustering. Statistics and Computing 17, 4 (2007), 395–416.
  • [10] Wagner, D., and Wagner, F. Between min cut and graph bisection. Springer, 1993.
  • [11] White, S., and Smyth, P. A spectral clustering approach to finding communities in graphs. In SDM (2005), vol. 5, SIAM, pp. 76–84.