Correlations of Multi-input Monero Transactions

01/12/2020 ∙ by Nathan Borggren, et al. ∙ 0

A variety of correlations are detected in the Monero blockchain. The joint distribution of the time-since-last-transaction between elements of pairs of RingCTs is enhanced in comparison with the product of the marginal distributions. Similarly, there is an enhancement in the joint distribution of the hour timestamps between the same pairs. Lastly, we find another enhancement when the correlation is measured between the hour timestamps of the transaction itself and the elements of the RingCTs. We calculate some adjustments to the probabilities of which input in a RingCT is real, providing an additional heuristic for denoising the Monero blockchain.




1 Introduction

Privacy issues arising from the pseudo-anonymity of Bitcoin-like blockchains have been understood since Bitcoin's origin [9]. Techniques were developed to exploit these vulnerabilities by interacting directly with entities and using the information gained to tag these entities [7]. Indeed, at this stage of Bitcoin's development, large-scale datasets are freely available, e.g. [6], and have been used for machine learning, trading, and economics studies [3]. Topological methods that take advantage of the graph nature of blockchains have also been combined with machine learning to characterize exchanges [10, 2] and identify malware [1]. Any attempt to capture all such deanonymization efforts will be partial and unsatisfactory, but for a deeper review consider [4].

As a result of these privacy limitations, new coins have emerged and gained in popularity. Monero, ZCash, and Grin have added to the Bitcoin protocol or have developed entirely new cryptographic techniques to address these issues. Analysis of the 2018 activity of the cryptocurrency exchange ShapeShift showed a retreat from Bitcoin towards privacy-centric coins [2]. The ShapeShift terms of service themselves acknowledge the irony of using this exchange to move into these coins for privacy purposes and advise against it. Monero developers have also warned users of the dangers of overconfidence in the privacy offered, and they continuously seek improvements to their methods.

Nonetheless, it is apparent that the ShapeShift exchange, and Monero in particular, have been used in attempts to launder the proceeds of illicit activities. Regardless of whether the activity is illicit or not, through ShapeShift or not, heuristics and techniques have been developed to analyze Monero, exposing users to a loss of privacy and providing law enforcement with analytic tools for deeper investigation. The present work introduces and develops an additional heuristic that can be used to aid in tracing real transactions through the Monero blockchain.

2 Correlations

When a Monero transaction aggregates coins from multiple inputs, the RingCT feature of Monero mixes the real transaction with, currently, 10 past transactions randomly chosen from the blockchain. Monero repeatedly draws times from a known gamma distribution whose parameters are fixed in the source code; for each draw, a transaction near that time is found in the blockchain and chosen for mixing (see lines 132-133 at [5]).
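The decoy-selection step described above can be sketched as follows. This is a minimal illustration, not the Monero implementation: the default shape and scale values, and the assumption that the gamma draw is the logarithm of the output age in seconds, are recalled from the Monero wallet source and should be checked against [5]. The function name `sample_decoy_ages` is hypothetical.

```python
import numpy as np

def sample_decoy_ages(n, rng, shape=19.28, scale=1 / 1.61):
    """Draw n decoy ages in seconds.

    ASSUMPTION: shape/scale defaults and the log-seconds convention are
    not taken from the text above; verify them against the Monero source.
    """
    # Gamma draw is interpreted as log(age in seconds).
    log_age = rng.gamma(shape, scale, size=n)
    # Exponentiate to get a positive age; a real wallet would then look
    # up an output near that age in the blockchain.
    return np.exp(log_age)
```

Because the draw is exponentiated, the resulting ages are strongly skewed towards short time scales, which is the property the later sections rely on.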

The distribution of inputs can then be expressed as a mixture of a real component and a fake (decoy) component. However, we hypothesize that any correlation that occurs can only come from pairs of real transaction inputs, as draws from the fake distribution do not depend on the real input or on previous draws from the fake distribution. Thus correlations may be present that reveal which input is the real one.
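The hypothesis can be made explicit with a short worked derivation. The notation here is an assumption introduced for illustration ($P_r$ for the real component, $P_f$ for the fake component, $P_{rr}$ for the joint distribution of two real inputs of the same transaction), and it assumes a ring of 11 members in which the real input is equally likely to sit at any position and the decoys are drawn independently.

```latex
\begin{align}
P(t) &= \tfrac{1}{11} P_r(t) + \tfrac{10}{11} P_f(t), \\
P(t_1, t_2) &= \tfrac{1}{121} P_{rr}(t_1, t_2)
  + \tfrac{10}{121} \left[ P_r(t_1) P_f(t_2) + P_f(t_1) P_r(t_2) \right]
  + \tfrac{100}{121} P_f(t_1) P_f(t_2), \\
P(t_1, t_2) - P(t_1) P(t_2) &= \tfrac{1}{121}
  \left[ P_{rr}(t_1, t_2) - P_r(t_1) P_r(t_2) \right].
\end{align}
```

Under these assumptions every term involving a decoy factorizes and cancels, so any deviation of the joint distribution from the product of the marginals comes from the real-real pair, diluted by a factor of 121.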

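We then seek to measure empirically the correlation between the times of pairs of inputs, that is, the joint distribution relative to the product of the marginal distributions.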
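In symbols, one form consistent with the abstract and with the example in Section 3 (where a value of 1.1 means "10 percent more likely") is the ratio below. The symbol names $C$, $P(t_1, t_2)$ and $P(t)$ are assumptions, since the original notation was not preserved.

```latex
\begin{equation}
C(t_1, t_2) = \frac{P(t_1, t_2)}{P(t_1)\, P(t_2)}
\end{equation}
```

With this definition, $C = 1$ (or $\log C = 0$) corresponds to independent inputs, and $C > 1$ marks an enhancement.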


For example, consider a transaction with two RingCT inputs. There are then 121 pairs of inputs for which we fill our 2d histogram with the measured times giving us the empirical joint distribution. We can also collect the times of the 22 involved inputs to get the marginal distributions. For transactions with more than two inputs, this procedure can be repeated for each pair of inputs.
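The counting procedure above can be sketched in a few lines. This is an illustration under assumptions: the function name `pair_correlation`, the input layout (a list of ring pairs, each ring a sequence of 11 times), and the ratio form of the correlation are not taken from the original.

```python
import numpy as np

def pair_correlation(ring_pairs, edges):
    """Joint histogram, pooled marginal, and ratio correlation.

    ring_pairs: iterable of (ring_a, ring_b), each a sequence of times.
    edges: 1-d array of bin edges shared by both axes.
    """
    nb = len(edges) - 1
    joint = np.zeros((nb, nb))
    marginal = np.zeros(nb)
    for ring_a, ring_b in ring_pairs:
        a = np.asarray(ring_a, dtype=float)
        b = np.asarray(ring_b, dtype=float)
        # All len(a) * len(b) pairs, e.g. 11 * 11 = 121 for two rings.
        ta, tb = np.meshgrid(a, b, indexing="ij")
        h, _, _ = np.histogram2d(ta.ravel(), tb.ravel(), bins=[edges, edges])
        joint += h
        # The 22 individual times feed the marginal distribution.
        marginal += np.histogram(np.concatenate([a, b]), bins=edges)[0]
    joint /= joint.sum()
    marginal /= marginal.sum()
    expected = np.outer(marginal, marginal)
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.where(expected > 0, joint / expected, np.nan)
    return joint, marginal, corr
```

For a transaction with more than two inputs, each pair of rings would be passed as a separate entry of `ring_pairs`.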

Correlations are measured this way over 150000 blocks, 1650000 to 1800000, and shown in fig. 1.

Figure 1: Empirical joint and correlation functions: (a) log of the joint distribution; (b) log of the correlation. A bin is approximately 16 days wide.

2(a) Mixed-event correlations

We would like to ensure that any correlation arising is a result of the signer of the transaction having authority over the pair of inputs. However, this is not guaranteed. To factor out the background correlations that can emerge from other sources (such as low sampling), we build an empirical distribution in which the two times are now taken from the inputs of different transactions. The real-input mixture component is itself a mixture over all the individual users. We would like the correlation to come from a single individual user, not the whole collective. By constructing this background correlation we can reveal the correlation information that occurs between two random participants of the network.
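One way to build such mixed-event pairs is sketched below. The pairing scheme (shuffle the transactions, then pair each with its neighbor in the shuffled order) and the name `mixed_event_pairs` are assumptions for illustration; the text above does not specify how the different transactions are matched.

```python
import random

def mixed_event_pairs(transactions, seed=0):
    """Pair one ring from each transaction with a ring from another one.

    transactions: list of transactions, each a list of rings.
    Returns a list of (ring_a, ring_b) drawn from different transactions.
    """
    rng = random.Random(seed)
    order = list(range(len(transactions)))
    rng.shuffle(order)
    pairs = []
    for k in range(len(order)):
        ia = order[k]
        ib = order[(k + 1) % len(order)]
        if ia == ib:
            # Only happens when there is a single transaction.
            continue
        pairs.append((rng.choice(transactions[ia]),
                      rng.choice(transactions[ib])))
    return pairs
```

The resulting pairs can be histogrammed exactly like same-transaction pairs, so the foreground and background share binning and normalization.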

Figure 2: Background joint and correlation functions: (a) log of the background joint distribution; (b) log of the background correlation. Again, a bin is approximately 16 days wide. While correlations arising from transaction volume are maintained, the same-wallet correlations have been removed by correlating across transactions.

On close inspection of figs. 1 and 2, one can see that there is indeed structure in the foreground that is not present in the background. It is visible, for example, along the diagonal from the lower left to the top right, where in the foreground the high correlations remain high over larger time scales.

The expression for the correlation that enhances these differences is shown in fig 5(a). The fact that this expression is non-zero implies that different individuals have different transactional habits, providing an opportunity to look for signal associated with particular individuals' habits. Such an analysis has been carried out for Bitcoin [8], but has not been done for Monero.

Figure 3: Log of the correlation expression that enhances the differences between foreground and background. A bin is approximately 16 days wide.

2(b) CDF-binned Histograms

The bins used for the first stage of this analysis were rather wide; one bin is 16 days. However, interesting structure is already emerging. Bin '40' is about October of 2017, the beginning of one of Bitcoin's remarkable bull runs. The high correlations for that time period are likely an artifact of the anomalous number of transactions that come from that period, owing to the volume of transactions occurring during that bull run. We are assuming that, generally, the price of Monero tracks the price of Bitcoin, which we use as a proxy for the whole cryptocurrency market.

We remind the reader that these are log plots, so these correlations are very large.

The gamma distribution used for mixins, as well as the apparent real-use transaction behavior in Monero, is very heavily weighted towards short time scales. This means we are histogramming over many potentially revealing time scales. One bin in our histogram corresponds to 16 days, which is around 11000 Monero blocks, and a typical transaction will have many mixins and real contributions within this window.

Thus, despite this remarkable correlation structure, at this stage the majority of pairs fill the same bin, the lower-left one at the origin, so more effort is needed to use this fact to reveal which members of the rings are real and which are fake.

Transaction volume is also far from constant. Times of high volume require a large number of mixins as well, so it is possible that there are high correlations between times of large transaction volume. Our background correlation could in principle account for that, but we ran the background over the same time range as the foreground.

Monero test networks, in which we control the agents and their transaction behavior, are being designed; these will allow us to account for these sources or avoid them altogether.

We have repeated this procedure using bins whose time edges are computed from the empirical CDF, so that each bin is equally likely to have a transaction in it. Those results are shown in fig. 4.
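Equal-probability bin edges of this kind can be obtained from the empirical quantiles of the observed times. The sketch below assumes NumPy and a hypothetical helper name `cdf_bin_edges`; the number of bins used in the original analysis is not stated, so it is left as a parameter.

```python
import numpy as np

def cdf_bin_edges(times, n_bins):
    """Bin edges such that each bin holds roughly the same number of times."""
    times = np.asarray(times, dtype=float)
    # Evenly spaced CDF levels 0, 1/n, ..., 1 mapped back to time values.
    levels = np.linspace(0.0, 1.0, n_bins + 1)
    return np.quantile(times, levels)
```

With such edges, the bins are narrow where transactions are dense (short time scales) and wide where they are sparse, so the heavily populated region near the origin is resolved into many bins instead of one.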

Figure 4: CDF-binned joint and correlation functions: (a) log of the CDF-binned joint distribution; (b) log of the CDF-binned correlation.
Figure 5: A 1-d figure showing the correlations along the diagonal, in which timestamps are contemporaneous. Error bars are computed by repeating the correlation calculation every 10000 blocks.

2(c) Time-of-Day Correlations between rings

Next we investigate other candidate sources of correlations. For example, if a user transacts at the same times each day then a positive correlation would arise between the hour time stamps of a ring and another ring from the presence of an abundance of transactions with particular hour time stamps. Similarly, the correlation would arise as well for the same reason between the hour time stamps of a ring and the hour time stamp of the transaction itself. Indeed this is what we measure and observe in fig. 6.
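The transaction-hour versus ring-member-hour measurement can be sketched as follows. The assumptions here are that timestamps are UNIX seconds interpreted in UTC, that the correlation is the ratio of the joint distribution to the product of its marginals, and that the function names are hypothetical.

```python
import numpy as np

def hour_of_day(unix_seconds):
    """Hour stamp (0-23, UTC) of each UNIX timestamp."""
    return (np.asarray(unix_seconds, dtype=np.int64) // 3600) % 24

def hour_correlation(tx_times, ring_times):
    """24 x 24 correlation between tx hour (rows) and ring-member hour (cols).

    tx_times: one timestamp per transaction.
    ring_times: one sequence of ring-member timestamps per transaction.
    """
    # Repeat each tx hour once per ring member it is paired with.
    tx_h = np.repeat(hour_of_day(tx_times), [len(r) for r in ring_times])
    in_h = hour_of_day(np.concatenate(ring_times))
    joint = np.zeros((24, 24))
    np.add.at(joint, (tx_h, in_h), 1.0)
    joint /= joint.sum()
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(expected > 0, joint / expected, np.nan)
```

The ring-to-ring variant follows the same pattern, with the hour stamps of a second ring in place of the transaction hour.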

Figure 6: Correlations arising from patterns-of-life emerge by histogramming the hour stamps. Panels (a) and (b) each show the log of an hour-stamp correlation.

3 Translations into probabilities

To understand the effect the correlations have on the probability that a given ring index is the real transaction, let us first consider the time-of-day correlations between the hour of the transaction and the hour of the ring members.

Consider the value of the correlation in the bin corresponding to the hour of input i and the tx-hour. For example, say this value is 1.1, so that i-hour is 10 percent more likely to have occurred given tx-hour. This probability increase must be offset by a decrease in the probability of every other member of the ring. All things being equal, we distribute the remaining probability over the other possibilities. We construct the matrix equation to update the probabilities given the correlation.
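The update for a single ring member can be sketched as below. This is one reading of the description, not the original matrix equation, which was not preserved: the probability of member i is multiplied by its correlation value, and the remaining probability is shared among the other members in proportion to their priors (evenly, for a uniform prior). The function name and argument names are hypothetical.

```python
import numpy as np

def update_ring_probabilities(prior, i, c):
    """Boost member i by correlation c and rescale the rest to sum to 1."""
    prior = np.asarray(prior, dtype=float)
    prior = prior / prior.sum()          # make sure the prior is normalized
    boosted = min(c * prior[i], 1.0)     # cap so it stays a probability
    others = 1.0 - prior[i]              # prior mass held by other members
    posterior = prior * (1.0 - boosted) / others
    posterior[i] = boosted
    return posterior
```

For a ring of 11 with a uniform prior of 1/11 and a correlation of 1.1 for member i, this gives member i a probability of 0.1 and each of the other ten members 0.09.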


4 Conclusions

We have shown that correlations in the Monero blockchain can be detected and can reveal information associated with the real transaction within a ring. In future releases of Monero these correlations could be removed by adding a copula step to correlate the spoofed transactions as well, but the residue of this effect from past transactions would continue to propagate.


  • [1] C. G. Akcora, Y. Li, Y. R. Gel, and M. Kantarcioglu (2019) BitcoinHeist: topological data analysis for ransomware detection on the bitcoin blockchain. CoRR abs/1906.07852.
  • [2] N. Borggren, G. Koplik, P. Bendich, and J. Harer (2017) Deanonymizing shapeshift: linking transactions across multiple blockchains.
  • [3] N. Borggren (2017) Deep learning of entity behavior in the bitcoin economy.
  • [4] M. Conti, E. Sandeep Kumar, C. Lal, and S. Ruj (2018, fourth quarter) A survey on security and privacy issues of bitcoin. IEEE Communications Surveys Tutorials 20 (4), pp. 3416–3452. ISSN 2373-745X.
  • [5] fluffypony et al. (2015) Monero project. GitHub.
  • [6] A. Janda (2013-2017) Wallet explorer.
  • [7] S. Meiklejohn, M. Pomarole, G. Jordan, K. Levchenko, D. McCoy, G. M. Voelker, and S. Savage (2013) A fistful of Bitcoins: Characterizing payments among men with no names. Proceedings of the Internet Measurement Conference - IMC '13 (6), pp. 127–140. ISBN 9781450319539, ISSN 15577317.
  • [8] J. V. Monaco (2015) Identifying Bitcoin users by transaction behavior. 9457, pp. 945704. ISBN 978-1-62841-573-5, ISSN 0277-786X.
  • [9] S. Nakamoto (2008) Bitcoin: A Peer-to-Peer Electronic Cash System.
  • [10] S. Ranshous, C. A. Joslyn, S. Kreyling, K. Nowak, N. F. Samatova, C. L. West, and S. Winters (2017) Exchange pattern mining in the bitcoin transaction directed hypergraph. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10323 LNCS, pp. 248–263. ISBN 9783319702773, ISSN 16113349.