Cryptocurrencies have seen significant growth in recent years due to the rapid development of blockchain technologies and the digital economic system. By the end of July 2021, the global cryptocurrency market capitalization had reached over $1.5 trillion (26). Thousands of cryptocurrencies and decentralized applications (DApps) are emerging in the ecosystem.
The prosperity of the cryptocurrency ecosystem drives the need for digital asset trading platforms. Thus, hundreds of cryptocurrency exchanges have emerged to facilitate the trading of digital assets. Cryptocurrency exchanges can be categorized into two types: centralized exchanges (CEXs) and decentralized exchanges (DEXs). A CEX, as the traditional trading mechanism, requires a central entity as the intermediary to complete cryptocurrency trades between its users. Therefore, the trustworthiness of the middlemen plays a vital role in this trading mechanism, as all user activities and digital assets are under the control of the central operators (most CEXs have adopted Know Your Customer (KYC) verification to prevent money laundering and other financial crimes). Security and privacy issues of CEXs are reported from time to time (39; 40; 51). To facilitate free trading and eliminate these potential security and privacy issues, the burgeoning decentralized finance (DeFi) ecosystem aims to employ decentralized and non-custodial financial applications, including cryptocurrency exchanges. DEXs were introduced to allow users to trade their cryptocurrencies without transferring custody to middlemen, thereby mitigating the security issues of CEXs and providing better privacy by eliminating KYC verification.
Uniswap is one of the most prominent cryptocurrency DEXs built atop the Ethereum blockchain (61). Unlike most exchanges, which match buyers and sellers to determine prices and execute trades, Uniswap adopts the automated market maker (AMM) model (3). In this model, smart contracts create liquidity pools of cryptocurrencies that are automatically traded based on pre-set algorithms. Because Uniswap is a DEX, anyone can use the pools to swap between cryptocurrencies for a small fee. In addition, users can become liquidity providers by depositing cryptocurrencies into the liquidity pools and earn these swap fees as incentives. At the time of this study, Uniswap had amassed a total market liquidity of over $1.6 billion, with over $200 million in trading volume per day (62).
Where there is money, there are those who follow it. The growing popularity of Uniswap continues to attract scammers. Uniswap does not maintain any rules or criteria for cryptocurrency listing, meaning that anybody can list a token on the exchange. Thus, scammers take the opportunity to list scam cryptocurrencies to trick unsuspecting users. It is reported that some scam cryptocurrencies impersonate token sales for popular cryptocurrency projects (35; 63; 28). For example, on August 19, 2020, the upcoming DeFi lending protocol Teller Finance tweeted that a fake Teller token and a Uniswap pool had been created, and that many users had been cheated.
Despite this, to the best of our knowledge, no previous studies have systematically characterized or measured scam tokens on Uniswap. It remains unknown to what extent scam tokens exist on the Uniswap exchange, and how much impact they have had on the overall ecosystem.
This Work. In this paper, we take the first step to detect and characterize scam tokens on Uniswap. We first collect all the transactions related to the Uniswap exchange, and investigate the landscape of Uniswap from different perspectives (see Section 3). Then, we propose a hybrid approach for accurately flagging scam tokens and scam liquidity pools on Uniswap (see Section 4). We manually label a scam token benchmark dataset and identify features that distinguish scam tokens from normal ones. Our detection approach is powered by a guilt-by-association based expansion method and a machine-learning based detection and verification technique. We have identified over 10K scam tokens and pools in total (which is a lower bound), meaning that roughly 50% of the tokens listed on Uniswap are scam tokens. Finally, we demystify these scam tokens from various perspectives, including the scam behaviors, the scammers, and the impacts (see Section 5). Beyond the scam tokens and their creators, we further identify over 40K collusion addresses controlled by the scammers, which are used to facilitate the success of the scams. We show that the scammers have profited at least $16 million from over 40K potential victims on Uniswap.
We make the following main research contributions in this paper:
We are the first to propose a reliable approach for identifying scam tokens and their associated liquidity pools on Uniswap. We first contribute by far the largest scam token benchmark dataset, and then we propose a guilt-by-association based expansion method and a machine-learning based classifier to identify the most reliable scam tokens.
We identify over 10K scam tokens and scam liquidity pools, revealing the shocking fact that Uniswap is flooded with scams. We believe the scams are prevalent on other DEXs and decentralized finance platforms, due to the inherent loose regulation of the DeFi ecosystem.
We systematically characterize the behaviors, the working mechanism, and the financial impacts of Uniswap scams. We observe that scammers usually employ multiple addresses to carry out a scam, and thus we design a method for detecting the collusion addresses to gain a deep understanding of the scams. We have identified scam addresses in total, including collusion addresses. The scammers have gained at least $16 million.
To the best of our knowledge, this is the first in-depth study of Uniswap scams at scale, longitudinally and across various dimensions. Our results motivate the need for more research efforts to illuminate the widely unexplored scams in the decentralized finance ecosystem. We will release the labelled scam token dataset and all the experiment results to the research community.
2.1. Blockchain and Ethereum
Blockchain, invented in 2008 (12), is a shared, immutable, and distributed ledger that facilitates the process of recording transactions and tracking assets in a P2P network. It is resistant to data modification due to its cryptographic design. By this design, each transaction in a block is verified by the confirmation of most participants in the system. The Bitcoin network is the first blockchain-based decentralized system, which demonstrated the feasibility of constructing a decentralized value-transfer system that can be shared across the world and is virtually free to use.
Ethereum is an open-source decentralized blockchain platform featuring smart contract functionality. Ether (ETH) is the official cryptocurrency on Ethereum, which is mined by Ethereum miners as a reward for computations. ETH is the second largest cryptocurrency by market cap (26).
2.2. Ethereum Accounts and Transactions
In Ethereum, an account is the basic unit to identify an entity. An account is identified by a fixed-length hash-like address. Accounts can be user-controlled or deployed as smart contracts. Accounts that are controlled by users, i.e., by anyone holding the private keys, are called externally owned accounts (EOAs). Accounts controlled by code are called contract owned accounts (COAs). Both kinds of accounts can send, receive, and hold ETH and tokens, and can interact with deployed smart contracts. The key difference is that only an EOA can initiate transactions, while a COA can only send transactions in response to receiving transactions.
2.2.2. Transactions on Ethereum
A transaction refers to an action initiated by an EOA, and it is the way users interact with the Ethereum network. A transaction is used to modify or update the state stored in the Ethereum network; it requires a fee and must be mined to become valid. A transaction can include binary data (called the “payload”) and Ether. If a transaction is sent from an EOA to another EOA, the transaction is called an “external transaction”, which will be included in the blockchain and can be obtained by parsing the blocks. The other type of transaction, initiated by executing a smart contract, is called an “internal transaction”. Internal transactions are usually triggered by external transactions and are not stored in the blockchain directly. When smart contracts are involved in a transaction, multiple events that log the running status of the contracts can be emitted so that developers and DApps can track the behavior of these contracts.
2.3. Smart Contract and ERC-20 Token
2.3.1. Smart Contract
A smart contract is a computer program or a transaction protocol that executes automatically, with the terms of the agreement written in contract code. The contract code controls the execution, and the corresponding transactions can be tracked but cannot be reversed. Ethereum implements a Turing-complete language on its blockchain, and it is now the largest blockchain platform that supports smart contracts, with millions of deployed smart contracts.
2.3.2. ERC-20 Token
In contrast to digital coins like Bitcoin and Ether, which are native to their own blockchains, “tokens” require existing blockchain platforms. On the Ethereum platform, there are over 400K tokens at the time of this study, and most of them are smart contracts following the ERC-20 standard (ERC stands for Ethereum Request for Comments; ERCs are technical documents used by smart contract developers), which specifies a list of rules and interfaces that tokens should follow. These rules cover, for example, the total supply of the tokens, how the tokens are transferred, and how the transactions are approved. Note that Ethereum does not enforce any restrictions on the names and symbols of tokens, which may open the door for scammers to abuse ERC-20 tokens. We will show that, due to the lax regulation of Uniswap and Ethereum, scam ERC-20 tokens are prevalent in the ecosystem (see Section 4 and Section 5). To remove ambiguity, in this paper, we describe a token in the form of “name (symbol)” with a footnote giving the token address.
2.4. DEX, AMM, and Uniswap
2.4.1. Decentralized Exchange
Due to the open-source and decentralized nature of cryptocurrencies, it is demanded that the exchange of cryptocurrencies should have no central authorities involved, and thus decentralized exchanges (DEXs) are born. A blockchain-based DEX does not store user funds and personal data on centralized servers, but instead matches buyers and sellers of digital assets through smart contracts. DEXs are an important part of the burgeoning DeFi ecosystem.
There are multiple kinds of DEXs. The first generation is the order-based P2P exchange, which uses order books. These order books compile a record of all open buy and sell orders for a particular asset. For example, dYdX (30), IDEX (42), and EtherDelta (31) fall into this category. The second generation is the liquidity pool based exchange, which completes trades through automated market makers (AMMs). Representative examples are Uniswap (61), Bancor (6), and Balancer (5).
2.4.2. Automated Market Makers.
Automated market makers (AMMs) allow digital assets to be traded automatically and without permission by using liquidity pools instead of a traditional market of buyers and sellers. On AMM markets, users trade against a pool of tokens, i.e., a liquidity pool. Users can supply tokens to the liquidity pools, and the price of the tokens in a pool is determined by a mathematical formula. Liquidity providers normally earn a fee for providing tokens to the pool, and the fee is paid by the traders who interact with the pool.
Uniswap is a leading DEX built atop Ethereum designed to facilitate automated exchange transactions between ETH and ERC-20 tokens, providing liquidity automatically on Ethereum. Uniswap is the largest decentralized exchange and the fourth-largest cryptocurrency exchange overall by daily trading volume by the time of this study.
Uniswap V1, the first version of the protocol, was created in November 2018 by Hayden Adams. It supports ETH to ERC-20 liquidity pools and enables swaps between ERC-20 tokens via ETH. In May 2020, Uniswap launched its V2 version with many new features and optimizations. For example, it uses WETH (Wrapped ETH, an ERC-20 token that represents ETH 1:1) instead of native ETH in its core contracts and enables direct ERC-20 to ERC-20 swaps, thus halving the fees for such transactions. It also supports non-standard ERC-20 tokens such as USDT and BNB, which opened up the potential market. Furthermore, Uniswap V2 introduces “flash swaps”, which, like flash loans, allow users to borrow tokens from a Uniswap pool, perform some activities with external services, and pay back these tokens. These changes set the stage for exponential growth in AMM adoption. In May 2021, Uniswap V3 was launched, which provides new features like concentrated liquidity and multiple fee tiers, making the protocol more flexible and efficient. The three versions of Uniswap operate independently. As Uniswap V3 was launched only recently, Uniswap V2 remains the most popular one, with a large amount of tokens locked in it. Our study in this paper focuses on Uniswap V2, while the observations and implications are applicable to other versions of Uniswap and other DEXs.
2.5. Interacting with Uniswap
Users can interact with Uniswap through three kinds of operations, i.e., creating a liquidity pool, adding/removing liquidity, and swapping tokens. The general process is shown in Figure 1.
2.5.1. Creating the liquidity pools.
In Uniswap, users trade against liquidity pools. A liquidity pool is a trading venue for a pair of ERC-20 tokens. One can create a liquidity pool that does not exist by interacting with Uniswap V2 contracts.
2.5.2. Adding/removing liquidity.
After the pool is created, users can add liquidity by depositing the pair of two tokens in the pool. The users who add liquidity to the pool are called liquidity providers (LPs for short) and they will receive liquidity provider tokens (LP tokens for short). A “mint” event will be emitted when liquidity is added. Whenever a trade occurs, a 0.3% fee is charged to the transaction sender. This fee is distributed pro-rata to all LPs in the pool upon completion of the trade, which stimulates people to provide liquidity.
In order for the pool to begin facilitating trades, someone must seed it with an initial deposit of each token after the pool’s creation. Thus, the pool creator usually adds the first liquidity when creating the pool. If the first LP supplies $x$ A tokens and $y$ B tokens, he will receive $\sqrt{x \cdot y}$ LP tokens and the total supply of the pool token is $l = \sqrt{x \cdot y}$. If there are already $x$ A tokens and $y$ B tokens in the pool, a new LP can supply $\Delta x$ A tokens and $\Delta y$ B tokens based on the current ratio (i.e., $\Delta x / x = \Delta y / y$), and he will receive $\Delta l = l \cdot \Delta x / x$ LP tokens, so the total supply of LP tokens changes to $l + \Delta l$.
LPs can also remove liquidity from the pool by burning their LP tokens. After removing the liquidity, they receive the pair of tokens based on the LP tokens they burn and the current token supply in the pool. A “burn” event will be emitted when LP tokens are burned. For example, if there are $x$ A tokens and $y$ B tokens in the pool and the total supply of LP tokens is $l$, when an LP burns $\Delta l$ LP tokens, he will receive $\Delta x$ A tokens and $\Delta y$ B tokens, where $\Delta x / x = \Delta y / y = \Delta l / l$, and the total supply of LP tokens will be $l - \Delta l$.
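The liquidity accounting described above can be sketched in a few lines. This is a simplified illustration, assuming the Uniswap V2 rules that the first deposit mints the geometric mean of the two deposited amounts and that later deposits mint in proportion to the existing LP supply. It uses floating-point numbers and ignores the small minimum-liquidity amount that the deployed contract locks on the first mint. The function names `mint_lp` and `burn_lp` are hypothetical.

```python
import math


def mint_lp(reserve_a, reserve_b, lp_supply, add_a, add_b):
    """Return the LP tokens minted for a deposit of (add_a, add_b).

    Simplified model (assumption): no minimum-liquidity lock, float arithmetic.
    """
    if lp_supply == 0:
        # First deposit: geometric mean of the two deposited amounts.
        return math.sqrt(add_a * add_b)
    # Later deposits: proportional to the smaller relative contribution.
    return min(add_a / reserve_a, add_b / reserve_b) * lp_supply


def burn_lp(reserve_a, reserve_b, lp_supply, lp_burned):
    """Return (amount_a, amount_b) paid out for burning lp_burned LP tokens."""
    share = lp_burned / lp_supply
    return reserve_a * share, reserve_b * share
```

For instance, seeding an empty pool with 100 A and 400 B mints 200 LP tokens under this model, and a later deposit of 10 A and 40 B mints 10% of the existing supply.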
2.5.3. Swapping tokens.
When a user wants to trade a pair of tokens in a pool, the user first sends tokens to the pool. The pool then calculates the exchange rate and sends back the target tokens. The exchange rate is determined by the “constant product” formula $x \cdot y = k$, where $k$ is a constant and $x$ and $y$ are the reserve balances of the two tokens in the pool. In a swap transaction, the supply of LP tokens does not change and a “swap” event will be emitted. Due to this formula, one token’s price in the pool rises when people swap the other token for it. For example, suppose the pool has $x$ A tokens and $y$ B tokens and the user sends $\Delta x$ A tokens for B tokens. The swap follows Eqn. (1):

$$(x + 0.997\,\Delta x)\,(y - \Delta y) = x \cdot y \tag{1}$$

where $0.997$ reflects the 0.3% fee and $\Delta y$ is the quantity of B tokens the user gets.
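The swap rule can be illustrated with a short sketch. It assumes, as in Uniswap V2, that the 0.3% fee is deducted from the input amount before the constant-product invariant is applied; the function name `swap_output` and its parameters are hypothetical, and floating-point numbers stand in for the integer arithmetic of the deployed contract.

```python
def swap_output(reserve_in, reserve_out, amount_in, fee=0.003):
    """Return the amount of the output token received for amount_in.

    Solves (reserve_in + (1 - fee) * amount_in) * (reserve_out - out)
           = reserve_in * reserve_out
    for `out`.
    """
    effective_in = amount_in * (1 - fee)  # input that counts toward the invariant
    return reserve_out * effective_in / (reserve_in + effective_in)


# The price of the output token rises as it is bought: a second swap of the
# same size returns fewer tokens than the first one.
first = swap_output(1000.0, 1000.0, 100.0)
second = swap_output(1100.0, 1000.0 - first, 100.0)
```

Note that the full input amount stays in the pool after the swap, which is why the input reserve in the second call is 1100 rather than 1099.7; the retained fee is what accrues to the liquidity providers.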
3. General Overview of the Uniswap Exchange
3.1. Dataset Collection
3.1.1. Collection Method
We utilize The Graph (16), a sandbox for querying data and endpoints for blockchain developers, to collect the transaction events (i.e., mint, swap, and burn events) related to Uniswap. It provides a snapshot of the current state of Uniswap and also tracks historical data. Note that an Ethereum transaction can emit multiple events, including Uniswap-related events and other events, but The Graph only records data related to Uniswap events. Moreover, some important information is not recorded in the log. For example, when a user interacts with the Uniswap router contract to trade some tokens for ETH (not WETH), the router contract will transfer the tokens on behalf of the user, convert the resulting WETH to ETH, and transfer the ETH to the user. Thus, the log will record the router contract, instead of the user, as the receiver of the swap event. As we need to analyze the detailed token transfer flow of Uniswap-related transactions to characterize scam token activities in Section 5, we further fetch the complete transaction information related to events on Uniswap, such as the amount of ETH transferred, input data, internal transactions, and all event logs.
[Table 1: Overview of the dataset. Columns: Data Type, # of Entities, Event Type, # of Events.]
3.1.2. Dataset Overview
We have synchronized all the tokens and events from May 5th 21:00 UTC to December 6th 18:00 UTC, 2020. Table 1 shows an overview of our dataset. Since the first transaction which created USD Coin (USDC)-Wrapped Ether (WETH) liquidity pool happened on May 5th, 2020, there are over 20 million transaction events on Uniswap V2 by the time of this study. There are kinds of tokens and liquidity pools in total.
3.2. The Rising of Uniswap
Figure 2 (a) shows the daily token listing and liquidity pool creation on Uniswap. Three months after the launch of Uniswap V2, over 100 tokens and liquidity pools were listed on Uniswap daily. Uniswap witnessed a spike of tokens and pairs (pools) in October 2020, when tokens were added and pools were created in this month. Figure 2 (b) shows the number of transaction events relevant to Uniswap. Since its launch, Uniswap has quickly attracted great attention. It still sees roughly 200K transaction events daily at the time of this study. For example, on October 6th, there were transaction events on Uniswap with pairs traded. It is not surprising to see that swap transactions, as a major function of Uniswap, account for 94% of the total events. Figure 2 (c) shows the daily volume and total liquidity. After August 7th, the daily volume exceeded $100 million. The liquidity of Uniswap reached $3.4 billion on November 13th and then dropped due to the end of the Uniswap (UNI) liquidity program (64) (in this program, which ended on November 17th, users could earn UNI by adding liquidity to the 4 major liquidity pools). Nevertheless, at the time of this study, there were still over $1.7 billion worth of tokens locked in Uniswap V2.
3.3. The Liquidity Pools and Tokens
3.3.1. Liquidity Pools.
We observe that over 90% of the liquidity pools have a value of less than 1 USD locked in Uniswap, which means that these pairs have low levels of liquidity on Uniswap or have low values. For example, although the pool LiquidityBomberB (LBB)-LiquidityBomberA (LBA) (LP token address: 0xa0f198fc128b83c5f71cc61d105adf6c7d6fd88f) holds a large amount of tokens, both of them have no value at all. The pair with the largest USD liquidity is Wrapped Bitcoin (WBTC)-Wrapped Ether (WETH) (LP token address: 0xbb2b8038a1640196fbe3e38816f3e67cba72d940), which locks over 195 million USD on Uniswap. Consequently, Uniswap only records the trade volume of pools with a certain level of liquidity, and the volume of over 95% of the liquidity pools is not recorded by Uniswap due to the lack of liquidity. Figure 5 shows the distribution of transactions and trading volume for the liquidity pools that have recorded trade volumes on Uniswap. They have a total trading volume of over $41 billion. The distribution follows a typical power law, i.e., the top 1% of liquidity pools account for over 65% of the transaction events on Uniswap. Figure 5 shows the top-10 popular liquidity pools on Uniswap ranked by transaction events. It can be seen that stablecoins (e.g., Tether (USDT), USD Coin (USDC), Dai (DAI)) and the Uniswap governance token (i.e., the Uniswap (UNI) token) are often highly popular.
Figure 5 shows the distribution of the number of involved addresses for each liquidity pool, which reflects the attention from investors. Over 70% of the liquidity pools involve fewer than 20 addresses, and over 70% of the liquidity pools have only 1 liquidity provider. In contrast, the 5 most popular pairs all have more than 10K liquidity providers, and the Wrapped Ether (WETH)-Tether (USDT) liquidity pool has more than 80K addresses involved.
3.3.2. Tokens on Uniswap.
From the perspective of ERC-20 tokens, the top 1% of the tokens account for over 80% of the transactions and are involved in over 96% of the trading volume on Uniswap, which follows a power law as well. When considering the tokens with the largest number of liquidity pools, the stablecoins also take the lead. Over 90% (19,790) of the tokens have only one pair. In total, WETH is paired with 20,924 tokens (83.2% of all liquidity pools), followed by USDT (1,049), USDC (462), DAI (406), and UNI (253).
3.4. Pool Creators and Investors
3.4.1. Pool Creators
All the liquidity pools analyzed were created by addresses. Among them, addresses had created at least 2 liquidity pools and 120 addresses created more than 10 pools. For example, the address 0x3bcfa9357ab84baec04313650d0eebb3fd51070d created 91 liquidity pools and most of them are pairs of WETH and DeFi-related tokens such as “Keep3r”, “Wootrade Network”, “Aegis.finance”, etc. Due to this massive pool creation for various DeFi tokens, the address is suspected to belong to a scammer and these pools are likely to be used in scams. We will further analyze these scams in Section 5.
In total, addresses have participated in the transactions on Uniswap collected in this study (i.e., the addresses that had mint, swap, or burn transactions on Uniswap). Over 70% of the addresses only participate in swap transactions, and 5% of the addresses only focus on mint and burn transactions. Most of the investors are inexperienced, as roughly 80% of them have fewer than 15 Uniswap transactions. Nevertheless, we observe that many investors have thousands of transactions. For example, the address 0x80c5e6908368cb9db503ba968d7ec5a565bfb389 has the most mint transactions, and it has added liquidity times on 209 pairs. We further analyze the liquidity pools these participants interacted with. Roughly 45% of the addresses have transactions with only one pool, and over 90% of the addresses have transactions with fewer than 15 pools. However, 27 addresses have interacted with over 1K pools. We manually inspect them and find that they are likely to be trading bot contracts engaged in arbitrage activities, given their repeated trading behaviors.
Uniswap has attracted a large amount of tokens and created a prosperous trading environment. Nevertheless, there are many liquidity pools that were not created for long-term uses, since they have a low level of liquidity and only a few users joined in the trading activities of these pools. Most of the participants are new to Uniswap. Explosion of DeFi projects may attract these inexperienced investors, which can also be exploited by attackers. Details will be studied in the later sections.
4. Identifying Scam Tokens on Uniswap
4.1. A Motivating Example
As a decentralized exchange, Uniswap does not have any rules for token listing, i.e., anyone can list a token and create a liquidity pool freely. Thus, scammers can take the opportunity to list scam tokens to cheat unsuspecting users. In this study, we find that there are many tokens with the same or similar names, which are highly suspicious. Growing evidence shows that scam tokens have appeared on Uniswap. For example, on Nov 23rd 2020, shortly after Andre Cronje, the famous Yearn Finance (YFI) creator, announced his new DeFi project Deriswap, attackers created a fake token “Deriswap (DWAP)” (token address: 0x05c1ad0323b3f7f25cff48067fa60fa75dc7ba4f) and a corresponding liquidity pool on Uniswap (34). The whole process of this scam is shown in Figure 6. The scammer (address: 0xac830c76fc37ef3dd4c28c9b7ee548d1a46112eb) initially adds liquidity with 70 ETH and 500K DWAP tokens, and later removes liquidity of 217K scam tokens and 161.7 ETH. The attacker profits roughly 90 ETH (roughly $54K), considering the fee and swap cost. (Since the prices of tokens fluctuate every day, we calculate the profit of these scam tokens according to the prices of the top 50 tokens on Uniswap on December 6th 2020; this applies to the rest of the paper.) Surprisingly, the whole process from the token creation to the withdrawal took under 20 minutes. This is a common type of scam on Uniswap called a “Rug Pull”, which will be detailed in Section 5. This makes us wonder how many scam tokens/pools are listed and to what extent they affect investors. Thus, in the following, we develop a reliable approach to identify scam tokens/pools on Uniswap, and further characterize them.
4.2. Approach Overview
4.2.1. Key Idea
Our preliminary exploration suggests that scammers usually list tokens and pools that look very similar to existing cryptocurrency projects, exploiting the lax regulation of both Uniswap and Ethereum. The targeted projects are usually official tokens (e.g., USDT) that have already been released on Uniswap, or famous DeFi projects that are looking to conduct a token sale. Further, the scam pools are usually short-lived, as the scammers remove liquidity soon after victims fall into the trap. This suggests that scam tokens and liquidity pools have quite unique features compared with normal tokens/pools, and that these features can be used to distinguish scam tokens from the normal ones.
4.2.2. Overview of our approach.
The overall workflow of our scam detection framework is shown in Figure 7, which is made up of three major components. (1) Ground truth labelling component is used to collect official (normal) tokens and the most reliable scam tokens (i.e., the fake tokens whose names or symbols are identical with the official ones). The labelled ground truth dataset is used as the seeds for further expansion. (2) Guilt-by-association based expansion component is used to enlarge our labelled scam token dataset based on two reliable heuristics. (3) Machine learning based scam detection and verification component is used to identify more scam tokens based on the features learnt from our labelled dataset. To eliminate potential false positives, we use strict verification strategy to only label the most reliable scam tokens. Note that, guilt-by-association based expansion would be further applied to the identified new scam tokens, and produce our final results. Our approach considers both the naming characteristics and the transaction behaviors of scam tokens. Further, our approach is powered by the strict verification strategy, and thus it can produce the most reliable results.
4.3. Ground truth labelling
4.3.1. Official tokens.
We first collect a list of popular tokens from the CoinMarketCap (26) ranking list and the Etherscan (32) ranking list, and then use the following method to manually verify them. The popular official tokens usually have been listed on large CEXs (i.e., they have been verified by the operators of CEXs), with exchange rates for US dollars. Since some official tokens may migrate from old addresses to new addresses (e.g., due to security issues), we further flag those old token addresses as official ones too. In this way, we collect official tokens in total.
4.3.2. Scam tokens.
We label the scam token seeds in two ways. First, as Ethereum does not enforce any restrictions on the names and the symbols of the newly created tokens, some fake tokens use identical identifier names to imitate the official tokens to trick victims by means of airdrop scam and arbitrage scam (Gao et al., 2020). By comparing the token names and the symbols of all the ERC-20 tokens in Uniswap with the labelled official tokens, we have flagged fake tokens. Further, as Etherscan usually marks phishing or scam tokens, we implement a crawler to collect the tags of tokens, which collects more scam tokens.
In total, our ground truth labelling phase has collected tokens in total, with official tokens and (=) scam tokens.
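The name and symbol comparison used to flag fake tokens can be sketched as follows. This is an illustrative sketch rather than the exact procedure: the record layout (`address`, `name`, `symbol`), the function name `flag_fake_tokens`, the sample addresses, and the case-insensitive, whitespace-trimmed comparison are all assumptions.

```python
def _norm(text):
    """Normalize an identifier for comparison (assumption: ignore case and padding)."""
    return text.strip().lower()


def flag_fake_tokens(tokens, official_tokens):
    """Return addresses of tokens that reuse an official name or symbol.

    Each token is a dict with the keys 'address', 'name', and 'symbol'.
    """
    official_addresses = {t["address"].lower() for t in official_tokens}
    official_names = {_norm(t["name"]) for t in official_tokens}
    official_symbols = {_norm(t["symbol"]) for t in official_tokens}

    flagged = []
    for token in tokens:
        if token["address"].lower() in official_addresses:
            continue  # the official token itself is never flagged
        if _norm(token["name"]) in official_names or _norm(token["symbol"]) in official_symbols:
            flagged.append(token["address"])
    return flagged


# Hypothetical sample data for illustration only.
official = [{"address": "0xAAA", "name": "Tether USD", "symbol": "USDT"}]
listed = [
    {"address": "0xaaa", "name": "Tether USD", "symbol": "USDT"},
    {"address": "0xbbb", "name": "Tether USD", "symbol": "USDT"},
    {"address": "0xccc", "name": "Other Token", "symbol": "OTH"},
    {"address": "0xddd", "name": "My Token", "symbol": "usdt "},
]
fakes = flag_fake_tokens(listed, official)
```

Matching on identifiers alone only yields candidates with identical names or symbols; tokens that imitate a project with a slightly different name need the later expansion and classification steps.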
4.4. Guilt-by-association based expansion
Empirically, scammers usually create more than one scam token to expand the scale of their scam campaigns. Therefore, we mark all Ethereum accounts that have created a scam token or a scam liquidity pool (i.e., a pool that trades scam tokens) in our labelled dataset as scam creators. Other tokens/pools created by these scam creators are highly suspicious as well. We call this strategy “Guilt-by-association”, which has been used in previous work to identify malicious domains and malware (Sebastian and Caballero, 2020; Khalil et al., 2018).
4.4.1. Expansion based on scam token creators.
After excluding 7 addresses that are tagged by Etherscan as Contract Deployer, scam creators are marked. We then identify all the tokens created by these scam creators and obtain new candidate scam tokens. We then manually verify a portion of tokens (e.g., that have more than transactions) to verify the reliability. Specifically, we search the tokens on Google to check whether the official websites exist, and whether there are scam accusations on BitcoinTalk and other forums, etc. Among candidate scam tokens, 29 of them have more than transactions. We have manually verified these 29 tokens and find no false positives, which suggests the reliability of our heuristics.
4.4.2. Expansion based on scam pool’s creator and first minter
The Ethereum account who creates the scam liquidity pool (i.e., the liquidity pool reserves a pair consisting of the scam token and other token) and firstly adds liquidity to the scam pool is marked as a scam creator as well. In this way, we flag scam creators. And (87.4%) of them are overlapped with the scam token creators labeled in the previous step. We further find the tokens created by these scam addresses and expand new scam tokens through this method. Similarly, we have manually verified these 24 tokens and find no false positives.
In total, based on the labelled scam token seeds, we further expand () scam tokens. These tokens, along with the labelled tokens in Section 4.3, will be used to train a machine learning classifier for identifying more scam tokens. Note that, we will further use reliable heuristics to verify the scam tokens flagged by the machine learning classifier, and the expansion method will be further adopted to new confirmed scam tokens to enlarge our dataset (see Section 4.5).
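One round of the guilt-by-association expansion can be sketched as below. The sketch assumes a mapping from each token address to the address that created it; the names `expand_scam_tokens`, `creator_of`, and `excluded_creators` are hypothetical, and the exclusion set stands in for shared deployer addresses such as those tagged as Contract Deployer by Etherscan. Candidates returned by the function would still need manual verification.

```python
def expand_scam_tokens(seed_scam_tokens, creator_of, excluded_creators=frozenset()):
    """Return (scam_creators, candidate_tokens) for one expansion round.

    seed_scam_tokens: set of token addresses already labelled as scams.
    creator_of: dict mapping token address -> creator address.
    excluded_creators: creators that must not be treated as scammers.
    """
    seeds = set(seed_scam_tokens)
    # Every creator of a known scam token is marked as a scam creator.
    scam_creators = {creator_of[t] for t in seeds if t in creator_of}
    scam_creators -= set(excluded_creators)
    # Every other token created by a scam creator becomes a candidate.
    candidates = {t for t, c in creator_of.items() if c in scam_creators} - seeds
    return scam_creators, candidates


# Hypothetical sample data for illustration only.
creator_of = {"t1": "a", "t2": "a", "t3": "b", "t4": "deployer", "t5": "deployer"}
creators, candidates = expand_scam_tokens({"t1", "t4"}, creator_of, {"deployer"})
```

The same function can be reapplied whenever new scam tokens are confirmed, which mirrors how the expansion is reused after the machine learning step.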
4.5. Machine learning based detection and verification
The aforementioned two phases can only flag the most obvious scam tokens. However, there are many other scam tokens that impersonate token sales for popular DeFi projects (e.g., Teller Finance) and famous brand names (e.g., Facebook). It is non-trivial for us to get a list of targeted tokens/DeFi/brands for comparison. Thus, in the following, we seek to identify scam tokens based on their transaction behaviors on Uniswap. We first train a machine learning classifier, and apply it to all the unlabelled tokens (see Section 4.5.1). For the flagged suspicious tokens, we examine them and summarize several highly reliable heuristics for verification (see Section 4.5.2). Finally, for the newly verified scam tokens, we further adopt the expansion technique (see Section 4.5.3).
4.5.1. Machine learning classifier.
Based on our preliminary observations, we use a comprehensive set of features to train a scam token classifier (see Table 3 in Appendix).
1) Time-series features. Scam tokens and pools are usually short-lived. Once the scam has attracted some victims, scammers tend to remove all liquidity of the pool to get all the reserved tokens in the pool and gain a profit. The scam token and the corresponding liquidity pools would be discarded, as it is easy for the scammer to launch a new scam token. Thus, for each token, we analyze its active period (i.e., from its first transaction to the latest one) and use it as one feature. Note that, to eliminate the bias introduced by our dataset collection process (e.g., a new normal token listed on Uniswap would lead to a short active period in our dataset), we further consider the active interval between the last transaction of the token and the time of our dataset collection, as a feature.
Further, we observe that the distribution of transaction events (i.e., mint, swap, burn) in scam tokens is quite different from that of normal ones. For example, for a scam token, the mint events are more concentrated at the beginning of its life-cycle (i.e., scammers provide the liquidity to attract victims), while the burn events are more concentrated at the end of its life-cycle (i.e., scammers remove the liquidity to gain a profit). In contrast, for official tokens, mint and burn events are distributed across time. Thus, for a given token, we propose to analyze the relative position of different types of events (i.e., mint, swap, burn) in terms of occurrence time, which can reflect the activity of a token to some extent, and define the relative time position of each type of event as:
$$P_{event} = \frac{1}{n}\sum_{i=1}^{n}\frac{t_i - t_{first}}{t_{last} - t_{first}}$$
where $n$ is the number of events of the specific type (i.e., mint, swap or burn), $t_i$ denotes the timestamp of the $i$-th event, and $t_{first}$ and $t_{last}$ represent the timestamps of this token’s first and last transactions, respectively. If a token’s mint events are concentrated at the initial stage, then $P_{mint}$ will be close to 0. Thus, we further extract five features related to the time position of each type of event, including mint events, swap events, swap-from events (i.e., swap target token for the other token), swap-to events (i.e., swap other token for target token) and burn events. In total, we extract seven time-series features, which are shown in Table 3 of the Appendix.
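As a minimal sketch of the relative time position feature, assume it is the mean of each event's timestamp after normalizing to the window between the token's first and last transactions (so 0 means "at the very start" and 1 means "at the very end"). The function name and the toy timestamps are hypothetical.

```python
def relative_time_position(event_times, t_first, t_last):
    """Mean normalized position of events within [t_first, t_last].

    Returns None when the feature is undefined (no events of this type,
    or a token whose first and last transactions share a timestamp).
    """
    if not event_times or t_last == t_first:
        return None
    span = t_last - t_first
    return sum((t - t_first) / span for t in event_times) / len(event_times)

# Toy token life-cycle: mints early, burns late (timestamps in seconds).
t_first, t_last = 0, 100
p_mint = relative_time_position([0, 5, 10], t_first, t_last)
p_burn = relative_time_position([95, 100], t_first, t_last)
```

In this toy life-cycle the mint feature lands near 0 and the burn feature near 1, which is the pattern the text associates with scam tokens.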
2) Transaction features. The number of transactions can reflect the popularity and the volume of a token. Here, for a given token, we consider its transactions on both Uniswap and the overall Ethereum network, since a trustworthy official token will have transactions beyond Uniswap. The extracted transaction features include the total number of transactions on Uniswap and Ethereum respectively, the number and the proportion of the transaction events (i.e., mint, burn and swap) and the number of involved addresses, with 24 kinds of features in total.
3) Investor features. We observe that a large portion of the investors of scam tokens are inexperienced, i.e., associated with few transactions. We therefore believe that greedy newcomers are the major targets of the scammers. Accordingly, we extract 4 kinds of investor features for each token, including the average number of trading pools its investors interacted with (mint/burn or swap), and their average number of (mint/burn or swap) transactions.
4) Uniswap-specific features. For each token, we further extract features from its state on Uniswap, including the number of liquidity pools it is involved in, its trade volume, the total liquidity of the token, etc. The details of these features are shown in Table 3 of the Appendix.
Based on the extracted features, we next train a machine learning classifier. We have tried different kinds of models to train the classifier, including Logistic Regression (Dreiseitl and Ohno-Machado, 2002), SVM (Chang and Lin, 2011), Random Forest (Breiman, 2001), and XGBoost (Chen and Guestrin, 2016). Our experiment results suggest that the random forest model achieves the best result. Using 10-fold cross validation, the Precision, Recall, and F1 score of our classifier are 96.45%, 96.79%, and 96.62%, respectively, which suggests the high accuracy of our approach. Thus, we further apply the trained classifier to the unlabelled tokens on Uniswap. Our classifier flags tokens as potential scam tokens based on their transaction behaviors.
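As a quick consistency check on the reported scores, recall that the F1 score is the harmonic mean of precision and recall. Plugging in the reported precision and recall reproduces the reported F1 value:

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R}
    = \frac{2 \times 0.9645 \times 0.9679}{0.9645 + 0.9679}
    \approx 0.9662
```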
Our machine learning classifier flags the suspicious tokens with potential scam behaviors. Although our classifier achieves excellent results, it cannot achieve 100% accuracy. As our goal is to characterize and measure the landscape of scam tokens on Uniswap, a dataset with high accuracy is a must. Thus, we next use a strict verification strategy to label the most reliable scam tokens, and perform the characterization study based on them. We randomly select 200 flagged suspicious tokens, seeking clues that can be used to confirm they are scams. By analyzing their token names/symbols and searching on Google, we devise two highly reliable rules.
First, we find that many of the flagged tokens share identical token names with each other. Although they did not counterfeit the popular official tokens we labelled in Section 4.3, we observe that they seek to promote the scams by exploiting eye-catching DeFi projects and related hot topics. For example, we find that there are scam tokens that share the identical name and pretend to be the famous DeFi project yearn.finance (73) (YFI). As another example, there are 12 tokens named bore.finance and none of them is the official token, since the real one (14) is a BSC (Binance Smart Chain) (11) token rather than an ERC-20 token on Ethereum. Thus, we group the suspicious tokens based on their token names and symbols. For the suspicious tokens with identical names, we further search these names to verify whether they are counterfeit. In this way, of the suspicious tokens are flagged as scam tokens by us with high confidence.
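The grouping step of this first rule can be sketched as follows. This is an illustration only: the input tuples, the function name, and the case-insensitive normalization are assumptions, and the groups still need the manual search described above before any token is labelled a scam.

```python
from collections import defaultdict

def group_by_identity(tokens):
    """Group token addresses that share the same (name, symbol) pair.

    tokens: iterable of (address, name, symbol) tuples (hypothetical input).
    Names and symbols are compared case-insensitively, which is an assumption.
    Only groups with more than one token are returned.
    """
    groups = defaultdict(list)
    for address, name, symbol in tokens:
        key = (name.strip().lower(), symbol.strip().upper())
        groups[key].append(address)
    return {key: addrs for key, addrs in groups.items() if len(addrs) > 1}

# Toy input with made-up addresses.
tokens = [
    ("0x01", "yearn.finance", "YFI"),
    ("0x02", "Yearn.Finance", "yfi"),
    ("0x03", "bore.finance", "BORE"),
]
duplicates = group_by_identity(tokens)
```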
Second, we observe that many scam tokens impersonate tokens released by popular companies, authorities, organizations or celebrities, while these entities have actually released no official tokens. For example, we find a number of scam tokens with names related to Google, Amazon, TikTok, Trump, Elon Musk, etc. Thus, two authors manually went over the token names and cross-checked the results. We consider a token to be a scam token if both authors label it as a scam. In this way, we identify such cases.
In total, by applying the heuristics to the suspicious tokens flagged by the machine learning classifier, we identify () scam tokens. Note that the remaining (=) suspicious tokens flagged by our classifier are likely to be scams too, but we lack strong evidence. For example, the Phoenix.Finance (PF) token (token address: 0x03c2f1b1ba5c5a6cf6d0af816a721a5827171704) is suspicious since it only has 12 token transfer transactions and its creator added and removed liquidity within one day. However, such tokens did not fall under the two strict heuristics we proposed, and we did not incorporate them into our scam dataset.
Following the “guilt-by-association” expansion, we expand our scam token list by analyzing the creators of newly identified scam tokens and liquidity pools, and obtain more scam tokens. Thus, in the machine learning phase, we identify scam tokens in total.
Based on the extensive analysis, we flag scam tokens with very high confidence, i.e., 4,052 scam tokens flagged in the ground truth labelling phase, 2,461 expanded based on guilt-by-association, and 4,451 further expanded by using our machine-learning based detection and verification technique. These scam tokens are associated with liquidity pools. We want to reemphasize that the number of identified scam tokens is indeed a lower bound, as we enforce a strict verification method to get the most reliable results.
5. Characterizing the Scams
We next characterize the flagged scam tokens and liquidity pools by investigating their scam behaviors, the scammers, and the financial impact.
5.1. General Overview of Scam Tokens and Scam Liquidity Pools
We have identified a total of scam tokens with liquidity pools, accounting for 50.34% of all the tokens (44.66% of all pools) on Uniswap. Their total trade volume reaches over $365 million. Ethereum addresses have interacted with these scam tokens with transaction events in total. Since this is only a lower bound of scam tokens, it shows that Uniswap is flooded with scam tokens.
5.1.2. The trend of scam tokens
As shown in Figure 8 (a), the creation of scam tokens and scam liquidity pools roughly follows the overall trend of Uniswap (see Section 3.2), and the peak appeared in October 2020, when over 3.8K scam liquidity pools were created on Uniswap. In our dataset, the first scam token, Bizcoin (BIZ) (LP token address: 0xde65eed30da8107ce49e8f1952391e16756c2998), which is flagged by the heuristics in Section 4.5.2 (there are 2 tokens with the name “Bizcoin” and 3 tokens with the symbol “BIZ”), appeared on May 19th, 2020, and four other scam tokens were listed on Uniswap that day. According to the post on 4chan (1), it was promoted as a community token and appeared to be profitable. The token received over 1K transactions and earned the creator about $3600. Figure 8 (b) shows the trend of scam tokens’ transaction events on Uniswap. On average, there are over transaction events related to scam tokens daily, and the peak reaches almost 20K transaction events. The volume and the liquidity of scam tokens are shown in Figure 8 (c) (the daily volume and liquidity data of scam tokens come from the Uniswap API). On average, scam tokens have a daily trade volume of $1.8 million. Different from the general trend, the trade volume of scam tokens often exceeds their liquidity, indicating that their prices skyrocket within a short time. The scammers often use this trick to attract more people to invest in the scam tokens (see Section 5.2).
Besides, we also compare the creation times of scam tokens on Ethereum and their corresponding pools on Uniswap, to investigate whether these tokens are organized in scam campaigns. In our dataset, (98.6%) tokens were created after the Uniswap V2 launch and (53%) scam tokens were created in September and October, which was the most active period of Uniswap. Moreover, for over 92% of the scam tokens, their creation on Ethereum and the creation of their related liquidity pool on Uniswap happened within one day. This suggests that most of the scam tokens were created specifically for carrying out scam campaigns on Uniswap.
5.1.3. The pools of scam tokens.
Roughly 98% () of the scam tokens have only 1 scam liquidity pool. Nevertheless, some attackers try to create a number of pools to reach as many victims as possible. For example, the token LEV (address: 0x3868bd6e8b392eb8dbc8cdcd0c538dc66529adbe) has been paired with 11 kinds of tokens, among which 10 are leading official tokens and the remaining one is a scam token minted by the same liquidity provider as LEV. Besides WETH, other official tokens like USDT, USDC and DAI are also favoured by attackers. We further investigate how long it takes for the scammers to remove the liquidity they inserted (i.e., usually after victims rushed to the scam pools) by calculating the interval between the scammers’ first mint and burn events. Surprisingly, over 86% of the scam liquidity pools have an interval within 1 day, and the liquidity of 37% of the pools was removed within 1 hour. This suggests that attackers prefer to act quickly to secure their scammed money before the victims realize they have been scammed.
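The mint-to-burn interval can be computed per pool along the following lines. This is a sketch under assumptions: events are represented as hypothetical `(timestamp, kind, sender)` tuples, and the interval is taken from the scammer's first mint to their first subsequent burn.

```python
def first_mint_to_burn_seconds(events, scammer):
    """Seconds between the scammer's first mint and first later burn in one pool.

    events: iterable of (timestamp, kind, sender) with kind in
    {"mint", "swap", "burn"} (hypothetical representation).
    Returns None if the scammer never minted or never burned afterwards.
    """
    mints = [t for t, kind, sender in events if kind == "mint" and sender == scammer]
    if not mints:
        return None
    first_mint = min(mints)
    burns = [
        t for t, kind, sender in events
        if kind == "burn" and sender == scammer and t >= first_mint
    ]
    if not burns:
        return None
    return min(burns) - first_mint

# Toy pool history (timestamps in seconds, made-up addresses).
events = [(1000, "mint", "0xc"), (1500, "swap", "0xv"), (4000, "burn", "0xc")]
interval = first_mint_to_burn_seconds(events, "0xc")
removed_within_one_hour = interval is not None and interval <= 3600
```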
5.2. Understanding the Scam Behaviors
We next investigate the behaviors of these scam tokens and liquidity pools, i.e., how they cheat unsuspecting users and get a profit. We first randomly select 100 scam liquidity pools, and manually examine their transactions on Uniswap and their corresponding token smart contracts to investigate their scam behaviors. Then we design methods to check all the liquidity pools and scam tokens in our dataset. In general, all the scam liquidity pools are created for the “rug pull” scams, while some of the scam tokens use many tricks to secure or enlarge the scam tokens’ profits.
5.2.1. The “Rug Pull” Scams.
A rug pull is a common kind of scam where developers abandon a project and take their investors’ money (56). From the perspective of liquidity pools on Uniswap, the ultimate purpose of a rug pull scam is to fool victims into investing in the scam tokens the scammers created and then drain the money from the pools. The motivating example in Figure 6 shows a case of the rug pull scam. The scammer usually creates a scam token and then provides liquidity for the token by pairing it with a leading cryptocurrency on Uniswap. They then promote the scam token through social networks with attractive advertisements, usually through Telegram. When enough victims rush into the liquidity pool and exchange valuable WETH or stable coins for the worthless tokens, the scammer withdraws everything from the liquidity pool, and the victims are left with nothing but the worthless scam tokens. This can explain why many liquidity pools on Uniswap have low levels of liquidity, as we observed in Section 3.3. By analyzing the mint and burn transaction events, we find that all the scam liquidity pools are carrying out “Rug Pull” scams.
Further, the “Rug Pull” scams are often combined with other tricks. Most of the attackers were found to swap scam tokens using tricks like pump-and-dump scams (54), making a scam coin skyrocket in price within hours. Due to the mechanism of Uniswap, purchasing tokens raises the price and the volume of these tokens. This trick creates the illusion that the scam token is popular and profitable, which can attract inexperienced investors. Since attackers can sell the scam tokens they hold or remove the liquidity of the scam token pools after the token prices rise, most of the money they invested in the scam pool will eventually go back to them, and many attackers are therefore willing to perform this trick. In our dataset, 93% of the liquidity pools have had swap transactions initiated by token/pool creators or their collusion addresses, and some even swapped a large amount of money. For example, the token creator 0x2faea647a49a43187ff19cdd5698489ea9a6acb1 swapped 150 WETH for the token RadixDLT.com (RADIX) (token address: 0x4b7266fa8ffda838c64c4b93a7092afe4bd68ed4), which increased the price by about 80%. Besides using the creators’ addresses to add liquidity and swap for tokens, many scam campaigns use multiple collusion addresses to add/remove liquidity or swap tokens. These addresses can be operated like normal investors, which makes them hard to detect.
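To illustrate why a purchase pushes the price up, the sketch below applies the constant-product swap rule used by Uniswap V2 (with its 0.3% fee on the input amount) to a pool with hypothetical reserves. The reserve sizes and the 10 WETH purchase are made-up numbers and do not describe any pool in our dataset.

```python
def get_amount_out(amount_in, reserve_in, reserve_out):
    """Output amount of a constant-product swap with a 0.3% input fee.

    Floating-point version of the Uniswap V2 pricing rule; the on-chain
    contracts use integer arithmetic instead.
    """
    amount_in_with_fee = amount_in * 997
    return amount_in_with_fee * reserve_out / (reserve_in * 1000 + amount_in_with_fee)

def price_after_buy(weth_in, reserve_weth, reserve_token):
    """Spot price (WETH per token) after buying tokens with weth_in WETH."""
    token_out = get_amount_out(weth_in, reserve_weth, reserve_token)
    return (reserve_weth + weth_in) / (reserve_token - token_out)

# Hypothetical pool: 100 WETH paired with 1,000,000 scam tokens.
reserve_weth, reserve_token = 100.0, 1_000_000.0
price_before = reserve_weth / reserve_token
price_after = price_after_buy(10.0, reserve_weth, reserve_token)
increase = price_after / price_before - 1  # relative price change
```

In this toy pool, a purchase worth 10% of the WETH reserve raises the spot price by roughly 21%, so a scammer with modest capital can manufacture a steep price chart in a thin pool.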
Second-round Scams. Besides, 439 liquidity pools are found to perform second-round scams, i.e., after making a profit by removing liquidity from scam token pools, scammers add liquidity again to the same pools and start a new round of the scam. For example, the pool creator 0x1bf3bd8e8afe80d786caa69f98385e0aa7e312ff created a liquidity pool for the Xfinances (XFIS) token (token address: 0xac51e84ccf9ff013f54cc53bbed80250e558d1aa) and added liquidity to it on October 9th, 2020. The creator earned over 72 ETH (roughly $43K) through the rug pull scam in about 30 minutes, and added liquidity again only 4 minutes later. The creator then removed the liquidity again after 6 hours, and made another profit of 26 ETH (roughly $16K). This suggests that many victims never check the transaction history of a pool before they rush into it.
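A simple way to spot second-round scams is to count completed add-then-remove liquidity cycles by known scam addresses in a pool. The sketch below assumes hypothetical `(timestamp, kind, sender)` event tuples and treats any burn that follows a mint as closing a round; partial liquidity removals would need extra handling.

```python
def count_scam_rounds(events, scam_addresses):
    """Count mint-then-burn cycles performed by scam addresses in one pool."""
    rounds = 0
    liquidity_open = False
    for _, kind, sender in sorted(events, key=lambda e: e[0]):
        if sender not in scam_addresses:
            continue
        if kind == "mint":
            liquidity_open = True
        elif kind == "burn" and liquidity_open:
            rounds += 1
            liquidity_open = False
    return rounds

# Toy timeline loosely shaped like the example above (seconds, made-up address).
events = [
    (0, "mint", "0xcreator"),
    (900, "swap", "0xvictim"),
    (1800, "burn", "0xcreator"),   # first rug pull after about 30 minutes
    (2040, "mint", "0xcreator"),   # liquidity re-added 4 minutes later
    (23640, "burn", "0xcreator"),  # second rug pull after 6 hours
]
rounds = count_scam_rounds(events, {"0xcreator"})
is_second_round_scam = rounds >= 2
```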
5.2.2. Scam Token Smart Contracts with Traps.
From the perspective of scam tokens, some of them have inserted well-designed traps in the code to further cheat the investors. Through exploring abnormal transactions involving multiple recipients and investigating open source scam contracts, we have observed two types of such tricks and designed a method to identify them.
Backdoors in scam token contracts. In general, a scam token campaign faces two obstacles to successfully carrying out the scams. First, some experienced investors may not trust token/pool creators who hold most of the tokens they issued. Second, once victims find they were cheated after trading on Uniswap, they usually seek to mitigate their loss, i.e., by swapping back their valuable tokens as soon as possible before the liquidity has been removed by the scammers. Thus, we observe that scammers perform some advanced actions to either gain the trust of investors or prevent the victims from getting their money back.
For the first issue, token creators could design malicious freely mint token contracts that enable specific scam addresses to increase their token balance at will, while claiming that they do not hold many scam tokens to reduce the victims’ suspicions. By doing so, these specific addresses could swap the tokens in the liquidity pool when the token price is at a high point (i.e., after some victims have rushed to the pool). These scam addresses could perform such operations stealthily, since, depending on the contract design, the operation may not even emit events for users to track. Figure 9 (a) shows an example of this kind of scam token contract, where the Leopard lending ecology (LLE) token (token address: 0x48fa649638318aa0e85dc0fec425c015304d175a; its contract is named “SoloToken”) leaves a backdoor for the scam address 0x7eed24c6e36ad2c4fef31ec010fc384809050926. The address can call the “mint” function to add any amount of LLE tokens to itself. The function was not implemented to emit any event, and thus users and DApps that rely on events to track the transfers of this token cannot find the mint activity easily. In fact, the address added tokens to itself (which are even more than the token’s total supply) and swapped these tokens to get roughly Ethers (roughly $28K) from the LLE liquidity pool.
To address the second issue, some token creators design sale-restrict token contracts, which prevent everyone except the token creators themselves from selling tokens. Figure 9 (b) shows such an example. As written in the contract modifier, the VIPswap (VIP) token (token address: 0x3b0407c648dd2f3eaa23fc69f952d98b2f24257e) allows all users to buy VIP tokens while restricting all users except the contract owner from selling them. The token creator 0x7af0f3e99a30b682d61c07be19c5874fb80e3832 created 58 such tokens and these tokens gained a profit of over Ethers (roughly $46K).
To identify the backdoors, we first analyze the token contracts and identify the contracts that use Solidity modifiers in their key functions (like selling tokens or minting tokens), as a Solidity modifier is mainly used for automatically checking a condition prior to executing a function. Then, we manually check the code to see whether these modifiers are used to restrict the functions to be executed only by attackers. In this way, we find 297 freely mint token contracts with 131 token creators and 373 sale-restrict token contracts with 109 token creators.
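The first, automated step of this procedure can be approximated with a lightweight source scan. The sketch below is an assumption-laden illustration rather than our actual tooling: it uses regular expressions instead of a real Solidity parser, the list of key function names is hypothetical, and the toy contract is invented for the example. Flagged functions still require the manual review described above.

```python
import re

MODIFIER_DECL = re.compile(r"\bmodifier\s+(\w+)")
# Captures the function name and the header text between ")" and "{" or ";".
FUNCTION_HEAD = re.compile(r"\bfunction\s+(\w+)\s*\([^)]*\)([^{;]*)")

def guarded_key_functions(source, key_functions=("mint", "transfer", "transferFrom")):
    """Map each key function to the contract-declared modifiers it uses."""
    declared = set(MODIFIER_DECL.findall(source))
    hits = {}
    for name, header_tail in FUNCTION_HEAD.findall(source):
        used = declared & set(re.findall(r"\w+", header_tail))
        if name in key_functions and used:
            hits[name] = sorted(used)
    return hits

# Invented toy contract, not taken from any token in the dataset.
toy_source = """
contract ToyToken {
    address owner;
    modifier onlyOwner() { require(msg.sender == owner); _; }
    function mint(address to, uint256 amount) public onlyOwner { }
    function transfer(address to, uint256 amount) public returns (bool) { }
}
"""
flagged = guarded_key_functions(toy_source)
```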
Advance-fee tokens. Besides gaining a profit from “Rug Pull” scams, some attackers have designed tokens that charge a fee when users perform mint, swap, or burn operations. For an advance-fee token, the fee rules are often written in its code to transfer part of the tokens to a specific address every time a swap operation happens. This specific address is claimed to be a bonus or reward pool that rewards the users involved in the transactions of this token.
To identify them, we track all the transactions related to scam tokens, and identify abnormal transactions where scammers received tokens or ETH even though they were not direct participants of these transactions. We have identified 63 advance-fee tokens. A typical example is shown in Figure 9 (c) (transaction hash: 0xb63891bebda1d330e06603d680accfb1c5ace82c1e473e5cfa620b511d544ae9). The piratetoken.finance (PIRATE) token (token address: 0x94152edd72eab86c016c3c5fb40376a88f10de5b) charges investors a 5% fee, which is sent to the address 0x2118f52b76602fde203f4ad1ea48690223af7568 and is declared by the scammer to be a daily bonus for a random user address. However, the address never sends out the bonus; this is just an excuse to stimulate token transfers and covertly raise the token’s price.
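The abnormal-transaction check can be sketched as a filter over the token transfers inside a single transaction. The representation is hypothetical: transfers are `(source, destination, amount)` tuples, and the "direct participants" are assumed to be the transaction sender and the liquidity pool.

```python
def indirect_scam_receipts(transfers, tx_sender, pool, scam_addresses):
    """Return transfers whose recipient is a scam address outside the trade itself."""
    direct_participants = {tx_sender, pool}
    return [
        (src, dst, amount)
        for src, dst, amount in transfers
        if dst in scam_addresses and dst not in direct_participants
    ]

# Toy swap: the buyer sends 100 tokens, 5% is silently routed to a fee address.
transfers = [("0xbuyer", "0xpool", 95.0), ("0xbuyer", "0xfee", 5.0)]
suspicious = indirect_scam_receipts(
    transfers, tx_sender="0xbuyer", pool="0xpool",
    scam_addresses={"0xfee", "0xcreator"},
)
```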
5.3. Understanding the Scammers
As aforementioned, different kinds of scam addresses controlled by the scammer collude to carry out a scam. In general, the following five kinds of scam Ethereum addresses are involved: 1) scam token addresses, the tokens used to carry out scams; 2) scam liquidity pool addresses, the liquidity pools consisting of pairs of scam tokens and other tokens; 3) the creators of scam tokens, addresses that create the scam tokens on Ethereum; 4) scam pool creators (first mintors), addresses that create the scam liquidity pools on Uniswap; and 5) collusion addresses, addresses which cooperate with scam token/pool creators to carry out scam campaigns. Since scam tokens and scam pools have already been investigated in previous sections, we will analyze in detail the scam token creator addresses, scam pool creator addresses and collusion addresses.
5.3.1. The creators of scam tokens.
The scam tokens are created by scammers, and 89% of them were first minted by their creators. Roughly 76% of the scam token creators created only one scam token, while 1% of the scam creators have created more than 10 scam tokens. The scam creator that released the most tokens is 0x3bcfa9357ab84baec04313650d0eebb3fd51070d, with 87 scam tokens in total, including several counterfeit tokens like Keep3r (KPR) (token address: 0x66f04254ca406cedf222687afe873a35da573f2c) and YKeep3r.network (YKP3R) (token address: 0x7a1c0213c9e05ed1b20d691ceda7387a62725143), which target Keep3rV1 (KP3R) (44), a famous DeFi project.
5.3.2. The creators of scam liquidity pools.
As to the liquidity pools, 11,269 scam pools are created by creators (first mintors). When providing liquidity, pool creators usually provide a substantial amount of liquidity to their pool at the start to cultivate investor confidence, as the liquidity they provide will eventually go back to them. Over 60% of the pool creators have provided liquidity with valuable tokens at a cost of more than $10K. The most lavish pool creator has provided liquidity with ETH (roughly $962K) in the Cybercore.Finance (CYBER)-WETH pool (token address: 0x5f21f580261a773aab2bff6cbc9814f6e7a67d78).
5.3.3. Collusion Scam Addresses.
To attract victims and avoid the scams being easily detected, collusion scam addresses are usually utilized to collaborate with the scam token/pool creators to carry out scams. Some collusion addresses participate in providing and removing liquidity of the scam pools, while others are used to swap tokens (e.g., in the pump-and-dump trick mentioned above). The collusion addresses could swap valuable tokens for scam tokens to raise the price of scam tokens, or, conversely, swap scam tokens back for valuable tokens to drain the pool and make a profit. In the case of Super Core Reserve Token (SCRT) (token address: 0x002ef27dee7a7d74ba59671385c51aa3d561d228), the token creator and pool creator 0xc7f82560c727c2045e7c19f8bc29c5cb8d258f7c first transferred 750 SCRT to its collusion address 0x39e407b5cf03311251c79c60dbc72b842007ba12. This collusion address then waited for the price to rise, sold the tokens it had and transferred the ETH it got to the creator. The behaviors of collusion addresses may look similar to those of victims on Uniswap, and it is hard to distinguish them based solely on Uniswap events; nevertheless, we should not regard the collusion addresses as victims. Thus, we further design a method to accurately detect collusion addresses.
Detecting collusion addresses. One major characteristic of the collusion addresses is that they should have strong connections with other scam addresses operated by the same scammer. They first need to operate on the same scam Uniswap pool as other scam addresses. Besides, there should be money flows between the collusion addresses and other known scam addresses, as the collusion addresses may either receive money from scammers to interact with the pools (i.e., add liquidity or swap scam tokens) or transfer the money they earned (by removing liquidity or selling scam tokens) to the scammers for aggregation. Thus, to effectively differentiate collusion addresses from victims, we have categorized the collusion addresses into the following four categories based on their transaction behaviors (i.e., mint, burn, and swap) on Uniswap and summarized their features, which are shown in Figure 10.
Add liquidity to the scam pool. For the addresses that inserted liquidity to the scam pool, if they have ever received Ether or stable coins (i.e., according to the corresponding token pairs of the pool) from the known scam addresses of the pool (e.g., scam token/pool creators) before their adding liquidity transactions, we will flag them as collusion addresses.
Remove liquidity from the scam pool. For the addresses that removed liquidity from the scam pool, if they have transferred the Ether or stable tokens to known scam addresses of the pool after their removing liquidity transactions, we will flag them as collusion addresses.
Swap valuable coins for the scam tokens. For the addresses that swapped Ether or stable coins for scam tokens (in order to raise the token price and attract victims), if they have ever received Ether or stable coins from known scam addresses of the pool before their swapping transactions, we will flag them as collusion addresses.
Swap scam tokens for valuable coins. For the addresses that swapped scam tokens for Ether or stable coins to gain a profit, if they have transferred the valuable tokens to known scam addresses of the pool after their swapping transactions, we will flag them as collusion addresses.
We believe these heuristics are comprehensive (i.e., covering all kinds of possible behaviors of collusion addresses on Uniswap) and reliable (i.e., a victim would by no means behave like this). Thus, starting from the known scam addresses (i.e., scam token/pool creators) of a given scam pool, we iteratively discover the collusion addresses. In the end, we get collusion addresses connected with scam token/pool creators. Among them, addresses had been used in swap operations, while addresses had been involved in liquidity related operations.
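The four heuristics and the iterative discovery can be sketched together as follows. This is a simplified illustration under assumptions: pool events are hypothetical `(timestamp, kind, address)` tuples with kinds `mint`, `burn`, `buy` (valuable coins for scam tokens) and `sell` (scam tokens for valuable coins), and `value_transfers` holds `(timestamp, source, destination)` transfers of Ether or stable coins.

```python
def find_collusion_addresses(pool_events, value_transfers, known_scam):
    """Iteratively flag collusion addresses for one scam pool."""
    receive_before = {"mint", "buy"}   # funded by scammers before acting
    send_after = {"burn", "sell"}      # proceeds returned to scammers afterwards
    scam = set(known_scam)
    collusion = set()
    changed = True
    while changed:  # repeat until no new address is discovered
        changed = False
        for ts, kind, addr in pool_events:
            if addr in scam:
                continue
            if kind in receive_before:
                linked = any(dst == addr and src in scam and t < ts
                             for t, src, dst in value_transfers)
            elif kind in send_after:
                linked = any(src == addr and dst in scam and t > ts
                             for t, src, dst in value_transfers)
            else:
                linked = False
            if linked:
                scam.add(addr)
                collusion.add(addr)
                changed = True
    return collusion

# Toy pool with made-up addresses: 0xA is funded by the creator, 0xB pays 0xA.
pool_events = [(8, "sell", "0xB"), (5, "buy", "0xA"), (6, "buy", "0xV")]
value_transfers = [(1, "0xC", "0xA"), (10, "0xB", "0xA")]
collusion = find_collusion_addresses(pool_events, value_transfers, {"0xC"})
```

Here `0xB` is only reachable through `0xA`, so it is found on a later pass, while `0xV` has no money flow to or from scam addresses and is left as a potential victim.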
Overall, scam addresses related to the scams are identified, including scam tokens, scam pools, scam token creators, scam pool creators, and collusion addresses. Note that one address can serve more than one role across different scam liquidity pools.
Table 2. The top 10 most profitable scam liquidity pools.
| Address | Scam token | Paired token | Profit ($) | Potential victims |
|---|---|---|---|---|
| 0xfc2903fa0ee403b0e49cc7fb0919f04c4a49ee28 | certik.foundation (CTK) | Wrapped Ether (WETH) | 1,502,119 | 366 |
| 0x0383eeb899e7fc0f4f696ebfcb5672ad7e0d271c | woo.network (WOO) | Wrapped Ether (WETH) | 1,188,844 | 321 |
| 0xaa2e4317b13e3b4edfd45642516b31e211c3e71f | medicalveda.com (MVEDA) | Wrapped Ether (WETH) | 871,450 | 142 |
| 0xa356939e22878af64560ba7e4253650f8cd9915d | flamingo.finance (FLM) | Wrapped Ether (WETH) | 843,913 | 155 |
| 0xb7864c708ad58af75c756c26b1ba155bfa0e2307 | yfi.group (YFIG) | Wrapped Ether (WETH) | 706,504 | 1,699 |
| 0xf2486c8f03afb444783427d620bf75510766e88d | akash.network (AKT) | Wrapped Ether (WETH) | 628,423 | 120 |
| 0x57a5dd974adac8738d6796502c899d13e8903141 | Alpha Finance Lab (ALPHA) | Wrapped Ether (WETH) | 597,094 | 155 |
| 0x9e3fcc46ef41eb5c20f404c4c35848deb34044fc | Deriswap (DWAP) | Wrapped Ether (WETH) | 498,349 | 124 |
| 0xaacd36c877408824ee59540b0c093804d7e9a7d9 | Meridian Network (MRDN) | Wrapped Ether (WETH) | 489,992 | 923 |
| 0x700fa01ac5b01d6d92384062906f463292e682c9 | Injective Protocol (INJ) | Wrapped Ether (WETH) | 477,553 | 135 |
5.4. Measuring the Financial Impact
We next perform an impact analysis on these scams based on all the transactions related to Uniswap that we collected in Section 3.1.1. In each transaction, all the balance changes of the participants on Uniswap are calculated and tracked. In total, these scam tokens have made a profit of over $16 million, including over 28K ETH and other leading official tokens, from potential victim addresses (i.e., all the scam liquidity pool investors excluding the scam addresses).
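The balance-tracking idea can be sketched as a net-flow computation per address. The representation is an assumption: each entry of `flows` is a hypothetical `(address, usd_delta)` pair, where a positive delta means the address received value from the pool and a negative delta means it paid value in.

```python
from collections import defaultdict

def net_value_by_address(flows):
    """Sum the signed value changes (in USD) of every address."""
    net = defaultdict(float)
    for address, usd_delta in flows:
        net[address] += usd_delta
    return dict(net)

def scam_profit(flows, scam_addresses):
    """Total net gain of the scam addresses in one pool."""
    net = net_value_by_address(flows)
    return sum(value for address, value in net.items() if address in scam_addresses)

# Toy pool: the creator adds $10K of liquidity, two victims buy in,
# and the creator then removes all $17K.
flows = [
    ("0xcreator", -10_000.0),
    ("0xvictim1", -3_000.0),
    ("0xvictim2", -4_000.0),
    ("0xcreator", 17_000.0),
]
profit = scam_profit(flows, {"0xcreator"})
victim_loss = -sum(
    value for address, value in net_value_by_address(flows).items()
    if address != "0xcreator"
)
```

Excluding scam addresses (creators and collusion addresses) from the investor side is what keeps the scammers' own liquidity from being counted as victim losses.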
On average, each scam liquidity pool has gained a profit of . Table 2 shows the top 10 most profitable pools. The most profitable scam liquidity pool, reserving the certik.foundation (CTK) - Wrapped Ether (WETH) pair, made a profit of over $1.5 million. It impersonated the official token of Certik (19), a blockchain security company whose official token contract was built on the BSC chain, and fooled 366 potential victims. Since most of the scam tokens only have one liquidity pool (see Section 5.1.3), the overall impact of scam tokens is similar to that of the liquidity pools, and the top-10 profitable tokens are also shown in Table 2. It is notable that all these 10 tokens are imitations of existing blockchain projects. Six of the top-10 scam tokens are camouflaging official ERC-20 tokens, while some scam campaigns also created counterfeit cryptocurrencies targeting official tokens released on other blockchain platforms, such as flamingo.finance (FLM) on the NEO blockchain.
Our observations suggest the urgency of identifying and avoiding scam tokens on DEXs. The root cause of this kind of scam is that DEXs like Uniswap do not maintain any rules for token listing, and Ethereum does not regulate the naming schemes of tokens. Thus, there is a strong need to design policies to regulate token releases on Ethereum and liquidity pool listings on DEXs. However, this may contradict the goal of a DEX, i.e., a fully decentralized marketplace. We believe a token reputation system is needed to decrease the impact of scam tokens. Techniques like the ones proposed in this paper could be used to flag suspicious tokens, and they can be further embedded in the DEX front-end to warn users when they try to engage with suspicious tokens. Further, awareness should also be raised among investors. Rather than searching for tokens or pairs on Uniswap (as the scam token names are confusingly similar to the official ones), investors should rely on trusted sources like CoinMarketCap or the official sites of DeFi projects to make sure they are trading the official tokens. Also, before diving into a liquidity pool, investors should carefully check the transaction history of the pool, and pay special attention to coins whose prices skyrocket within a short time. Lastly, the operation team of a DeFi project should be aware of scam token abuse (which would hurt their reputation), and regularly post public announcements to remind their investors.
Our study carries certain limitations. First, our scam token detection framework relies on some heuristics and manual efforts for verification. While these heuristics proved effective, we acknowledge that they are so strict that the compiled scam token list may be incomplete. Indeed, our machine learning classifier flags many more suspicious tokens, but to the best of our knowledge, it is non-trivial to verify them and we could identify no better alternatives. Therefore, the characterization study in this paper provides lower-bound results for the scam tokens on Uniswap. Nevertheless, we have curated by far the largest scam token dataset, which will be shared with the research community. Second, although we have tried our best to understand the workflow of these scams and reveal the scam campaigns behind them, there might be more complex operation networks behind the scams (e.g., we did not track how scammers launder the scammed money). Thus, it is quite possible that there are many scam addresses we did not observe. Third, the scams revealed in this paper are quite possibly prevalent in other DEXs as well, while we only study Uniswap V2 in this paper. Scam tokens thrive on DEXs because these types of exchanges allow users to list tokens for free and without audit. In future work, we will explore scams on other DEXs.
7. Related Work
Research on Cryptocurrency Exchanges. Some researchers focus on the security issues of centralized cryptocurrency exchanges (Kim and Lee, 2018; McCorry et al., 2018; Chohan, 2018; Feder et al., 2017; Moore et al., 2018; Ji et al., 2020). For example, Kim et al. (Kim and Lee, 2018) analyzed vulnerabilities of cryptocurrency exchanges and individual user wallets, and Ji et al. (Ji et al., 2020) demystified the fake deposit vulnerability related to exchanges and tokens. Others have worked to evaluate or improve the effectiveness and reliability of decentralized cryptocurrency exchanges (Lo and Medda, 2020; Capponi and Jia, 2021; Wang, 2020; Baum et al., 2021; Annessi and Fast, 2021). Lo et al. (Lo and Medda, 2020) verified the effectiveness of decentralized exchanges, and Annessi et al. (Annessi and Fast, 2021) explored ways to improve security for DEX users through multiparty computation.
DeFi Security. The security of DeFi is also a hot research topic. Many researchers have studied price manipulation in DeFi (Sobol, 2020; Boonpeam et al., 2021; Wang et al., 2021b; Tatabitovska, 2021; Wu et al., 2021; Qin et al., 2020). For example, Boonpeam et al. (Boonpeam et al., 2021) investigated arbitrage strategies and factors for profit maximization on decentralized exchanges, while Wang et al. (Wang et al., 2021b) focused on cyclic arbitrage on Uniswap and evaluated its impact. Others have studied security vulnerabilities and attacks on DeFi and oracles (Hsu and Lin, 2021; Werner et al., 2021; Gudgeon et al., 2020; Oosthoek, 2021; Caldarelli and Ellul, 2021; Wang et al., 2021a). For example, Hsu et al. (Hsu and Lin, 2021) explored how design weaknesses in DeFi protocols could lead to a DeFi attack, and Wang et al. (Wang et al., 2021a) proposed a real-time attack detection system for DeFi projects on the Ethereum blockchain based on symbolic reasoning over smart contracts and transaction monitoring.
Blockchain Scams. Many kinds of blockchain scams have been studied, including Ponzi schemes (Chen et al., 2018; Bartoletti et al., 2020, 2018; Vasek and Moore, 2018; Chen et al., 2019; Toyoda et al., 2019; Bian et al., 2021), fraudulent Initial Coin Offerings (ICOs) (Liebau and Schueffel, 2019; Zetzsche et al., 2017), phishing scams (Wu et al., 2020; Phillips and Wilder, 2020; Chen et al., 2020), Bitcoin generator scams (Badawi et al., 2020), fake cryptocurrency exchanges (Xia et al., 2020), and counterfeit tokens (Gao et al., 2020). Some of these studies also used machine learning methods to detect scams. For example, Badawi et al. (Badawi et al., 2020) used search engines to collect web pages and trained a classifier to detect Bitcoin generator scams. Wu et al. (Wu et al., 2020) proposed a network embedding algorithm to identify phishing addresses. However, as an emerging kind of scam, scam tokens on DEXs have not yet been systematically studied, and existing techniques cannot be applied directly to identify them.
This paper presents the first in-depth analysis of scam tokens on Uniswap. We have proposed an effective and accurate method for detecting scam tokens, and identified over 10K scam tokens and scam liquidity pools on Uniswap V2. We have systematically analyzed the scam behaviors, their working mechanisms, and their financial impact. We reveal that scams are prevalent on Uniswap, and we speculate that similar scams may have crept into other DEXs and DeFi projects, because the root cause lies in the loose or absent regulation of cryptocurrency on decentralized platforms. We urge the cryptocurrency community to maintain a token reputation system, using techniques like the ones proposed in this paper, to eliminate the impact of scam tokens.
-  (2020) /BIZ/coin - general. Note: https://i.warosu.org/biz/thread/19213296 Cited by: §5.1.2.
-  (2021) Improving security for users of decentralized exchanges through multiparty computation. arXiv preprint arXiv:2106.10972. Cited by: §7.
-  (2021) Automated market maker (amm). Note: https://coinmarketcap.com/alexandria/glossary/automated-market-maker-amm Cited by: §1.
-  (2020) An automatic detection and analysis of the bitcoin generator scam. In 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). Cited by: §7.
-  (2020) Balancer amm defi protocol. Note: https://balancer.fi Cited by: §2.4.1.
-  (2020) Bancor network - trade & earn. Note: https://bancor.network Cited by: §2.4.1.
-  (2020) Dissecting ponzi schemes on ethereum: identification, analysis, and impact. Future Generation Computer Systems 102, pp. 259–277. Cited by: §7.
-  (2018) Data mining for detecting bitcoin ponzi schemes. In 2018 Crypto Valley Conference on Blockchain Technology (CVCBT), pp. 75–84. Cited by: §7.
-  (2021) P2DEX: privacy-preserving decentralized cryptocurrency exchange. In International Conference on Applied Cryptography and Network Security, pp. 163–194. Cited by: §7.
-  (2021) Image-based scam detection method using an attention capsule network. IEEE Access. Cited by: §7.
-  (2021) Binance smart chain - binance.org. Note: https://www.binance.org/en/smartChain Cited by: §4.5.2.
-  (2020) Blockchain - wikipedia. Note: https://en.wikipedia.org/wiki/Blockchain Cited by: §2.1.
-  (2021) The arbitrage system on decentralized exchanges. In 2021 18th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON). Cited by: §7.
-  (2021) BORE token. Note: https://bnbvault.finance Cited by: §4.5.2.
-  (2001) Random forests. Machine learning 45 (1), pp. 5–32. Cited by: §4.5.1.
-  (2020) Browse and explore subgraphs - the graph. Note: https://thegraph.com/explorer/ Cited by: §3.1.1.
-  (2021) The blockchain oracle problem in decentralized finance-a multivocal approach. Cited by: §7.
-  (2021) The adoption of blockchain-based decentralized exchanges. Cited by: §7.
-  (2021) CertiK blockchain security leaderboard. Note: https://www.certik.org/ Cited by: §5.4.
-  (2011) LIBSVM: a library for support vector machines. ACM transactions on intelligent systems and technology (TIST) 2 (3), pp. 1–27. Cited by: §4.5.1.
-  (2016) Xgboost: a scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp. 785–794. Cited by: §4.5.1.
-  (2020) Phishing scam detection on ethereum: towards financial security for blockchain ecosystem.. In IJCAI, pp. 4506–4512. Cited by: §7.
-  (2018) Detecting ponzi schemes on ethereum: towards healthier blockchain technology. In Proceedings of the 2018 World Wide Web Conference, pp. 1409–1418. Cited by: §7.
-  (2019) Exploiting blockchain data to detect smart ponzi schemes on ethereum. IEEE Access 7, pp. 37575–37586. Cited by: §7.
-  (2018) The problems of cryptocurrency thefts and exchange shutdowns. Available at SSRN 3131702. Cited by: §7.
-  (2021) CoinMarketCap: cryptocurrency prices, charts and market capitalizations. Note: https://coinmarketcap.com/ Cited by: §1, §2.1, §4.3.1.
-  (2020) Decentralized applications (dapps) | ethereum.org. Note: https://ethereum.org/en/dapps/ Cited by: §2.1.
-  (2020) Decentralized finance (defi) – uniswap is crawling with fake tokens! – cryptocurrencies. Note: https://personal-financial.com/2020/09/04/decentralized-finance-defi-uniswap-is-crawling-with-fake-tokens-cryptocurrencies/ Cited by: §1.
-  (2002) Logistic regression and artificial neural network classification models: a methodology review. Journal of biomedical informatics 35 (5-6), pp. 352–359. Cited by: §4.5.1.
-  (2020) DYdX. Note: https://dydx.exchange Cited by: §2.4.1.
-  (2020) EtherDelta. Note: https://etherdelta.com Cited by: §2.4.1.
-  (2020) Ethereum (eth) blockchain explorer. Note: https://etherscan.io/ Cited by: §4.3.1.
-  (2020) Ethereum definition - investopedia. Note: https://www.investopedia.com/terms/e/ethereum.asp Cited by: §2.1.
-  (2020) Fake ethereum tokens net $53,000 in just 30 minutes. Note: https://decrypt.co/49208/fake-ethereum-tokens-net-53000-in-just-30-minutes Cited by: §4.1.
-  (2020) Fake tokens continue to plague uniswap. Note: https://cointelegraph.com/news/fake-tokens-continue-to-plague-uniswap Cited by: §1.
-  (2017) The impact of ddos and other security shocks on bitcoin currency exchanges: evidence from mt. gox. Journal of Cybersecurity 3 (2), pp. 137–144. Cited by: §7.
-  (2020) Tracking counterfeit cryptocurrency end-to-end. Proceedings of the ACM on Measurement and Analysis of Computing Systems 4 (3), pp. 1–28. Cited by: §4.3.2, §7.
-  (2020) The decentralized financial crisis. In 2020 Crypto Valley Conference on Blockchain Technology (CVCBT). Cited by: §7.
-  (2021) Hack brief: hackers stole $40 million from binance cryptocurrency exchange. Note: https://www.wired.com/story/hack-binance-cryptocurrency-exchange/ Cited by: §1.
-  (2021) Hacked! malicious group leaks data of 161,400 crypto traders on buyucoin. Note: https://www.financemagnates.com/cryptocurrency/news/hacked-malicious-group-leaks-data-of-161400-crypto-traders-on-buyucoin/ Cited by: §1.
-  (2021) Analysis and solution of exploiting vulnerabilities of smart contracts in decentralized financial applications. Communications of the CCISA. Cited by: §7.
-  (2020) IDEX high-performance decentralized exchange. Note: https://idex.io Cited by: §2.4.1.
-  (2020) Deposafe: demystifying the fake deposit vulnerability in ethereum smart contracts. In 2020 25th International Conference on Engineering of Complex Computer Systems (ICECCS), pp. 125–134. Cited by: §7.
-  (2020) Keep3r. Note: https://keep3r.network/ Cited by: §5.3.1.
-  (2018) A domain is only as good as its buddies: detecting stealthy malicious domains via graph inference. In Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy, pp. 330–341. Cited by: §4.4.
-  (2018) Risk management to cryptocurrency exchange and investors guidelines to prevent potential threats. In 2018 International Conference on Platform Technology and Service (PlatCon), pp. 1–6. Cited by: §7.
-  (2019) Crypto-currencies and icos: are they scams? an empirical study. An Empirical Study (January 23, 2019). Cited by: §7.
-  (2020) Uniswap and the emergence of the decentralized exchange. Available at SSRN 3715398. Cited by: §7.
-  (2018) Why preventing a cryptocurrency exchange heist isn’t good enough. In Cambridge International Workshop on Security Protocols, pp. 225–233. Cited by: §7.
-  (2018) Revisiting the risks of bitcoin currency exchange closure. ACM Transactions on Internet Technology (TOIT) 18 (4), pp. 1–18. Cited by: §7.
-  (2021) North korean hackers accused of ‘biggest cryptocurrency theft of 2020’—their heists are now worth $1.75 billion. Note: https://www.forbes.com/sites/thomasbrewster/2021/02/09/north-korean-hackers-accused-of-biggest-cryptocurrency-theft-of-2020-their-heists-are-now-worth-175-billion/?sh=67dd69885b0b Cited by: §1.
-  (2021) Flash crash for cash: cyber threats in decentralized finance. arXiv preprint arXiv:2106.10740. Cited by: §7.
-  (2020) Tracing cryptocurrency scams: clustering replicated advance-fee and phishing websites. arXiv preprint arXiv:2005.14440. Cited by: §7.
-  (2020) Pump and dump. Note: https://www.investopedia.com/terms/p/pumpanddump.asp Cited by: §5.2.1.
-  (2020) Attacking the defi ecosystem with flash loans for fun and profit. arXiv preprint arXiv:2003.03810. Cited by: §7.
-  (2020) Rug pull | coinmarketcap. Note: https://coinmarketcap.com/alexandria/glossary/rug-pull Cited by: §5.2.1.
-  (2020) Towards attribution in mobile markets: identifying developer account polymorphism. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pp. 771–785. Cited by: §4.4.
-  (2020) Frontrunning on automated decentralized exchange in proof of stake environment.. IACR Cryptol. ePrint Arch.. Cited by: §7.
-  (2021) Mitigation of transaction manipulation attacks in uniswap. Cited by: §7.
-  (2019) A novel methodology for hyip operators’ bitcoin addresses identification. IEEE Access 7, pp. 74835–74848. Cited by: §7.
-  (2020) Uniswap | home. Note: https://uniswap.org Cited by: §1, §2.4.1.
-  (2021) Uniswap analytics. Note: https://v2.info.uniswap.org/home Cited by: §1.
-  (2021) Uniswap is not always rainbows and unicorns — here’s how to recognize a uniswap scam. Note: https://blog.blockbank.ai/uniswap-is-not-always-rainbows-and-unicorns-heres-how-to-recognize-a-uniswap-scam-cb85f84a741e Cited by: §1.
-  (2020) Uniswap users rush back to sushiswap after uni rewards end. Note: https://cryptobriefing.com/uniswap-users-rush-back-sushiswap-after-uni-rewards-end/ Cited by: §3.2.
-  (2018) Analyzing the bitcoin ponzi scheme ecosystem. In International Conference on Financial Cryptography and Data Security, pp. 101–112. Cited by: §7.
-  (2021) BLOCKEYE: hunting for defi attacks on blockchain. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), pp. 17–20. Cited by: §7.
-  (2021) Cyclic arbitrage in decentralized exchange markets. Available at SSRN 3834535. Cited by: §7.
-  (2020) Automated market makers for decentralized finance (defi). Cited by: §7.
-  (2021) SoK: decentralized finance (defi). Cited by: §7.
-  (2020) Who are the phishers? phishing scam detection on ethereum via network embedding. IEEE Transactions on Systems, Man, and Cybernetics: Systems. Cited by: §7.
-  (2021) DeFiRanger: detecting price manipulation attacks on defi applications. arXiv preprint arXiv:2104.15068. Cited by: §7.
-  (2020) Characterizing cryptocurrency exchange scams. Computers & Security 98, pp. 101993. Cited by: §7.
-  (2020) Yearn. Note: https://yearn.finance/ Cited by: §4.5.2.
-  (2017) The ico gold rush: it’s a scam, it’s a bubble, it’s a super challenge for regulators. University of Luxembourg Law Working Paper (11), pp. 17–83. Cited by: §7.
Appendix 1. Appendix for machine-learning based detection and verification in Section 4.5.1
Appendix 1.1. Features used in our machine learning classifier.
Table 3 shows the 40 kinds of features we extract to train a scam token classifier, including 7 kinds of time-series features, 24 kinds of transaction features, 4 kinds of investor features and 5 kinds of Uniswap specific features.
| Category | Feature description |
|---|---|
| Time-series | The time interval from the first transaction to the last transaction on Uniswap |
| | The time interval between the last transaction and the study time on Uniswap |
| | The time point of mint events in the whole token lifecycle on Uniswap |
| | The time point of swap events in the whole token lifecycle on Uniswap |
| | The time point of swap-from events (swap from target token for other token) in the whole token lifecycle on Uniswap |
| | The time point of swap-to events (swap from other token for target token) in the whole token lifecycle on Uniswap |
| | The time point of burn events in the whole token lifecycle on Uniswap |
| Transaction | Total transaction numbers on Uniswap |
| | Total transaction numbers on Ethereum |
| | Total mint event numbers on Uniswap |
| | Total swap event numbers on Uniswap |
| | Total swap-to event numbers on Uniswap |
| | Total swap-from event numbers on Uniswap |
| | , set to -1 if the is 0 |
| | Total burn event numbers on Uniswap |
| | Total number of addresses that have participated in mint events on Uniswap |
| | Total number of addresses that have participated in swap events on Uniswap |
| | Total number of addresses that have participated in swap-to events on Uniswap |
| | Total number of addresses that have participated in swap-from events on Uniswap |
| | Total number of addresses that have participated in burn events on Uniswap |
| | Total number of addresses that have participated in events on Uniswap |
| Investor | The average liquidity pools the participants that have minted or burnt on Uniswap |
| | The average liquidity pools the participants that have swapped on Uniswap |
| | The average mint or burn event counts of participants on Uniswap |
| | The average swap event counts of participants on Uniswap |
| Uniswap Specific | The number of liquidity pools |
| | Amount of tokens traded all time across pairs |
| | Amount of tokens in USD traded all time across pairs (only for tokens with a certain level of liquidity) |
| | Amount of tokens in USD traded all time across pairs (all tokens) |
| | Total amount of token provided as liquidity across all pairs |
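
To make a few of the tabulated features concrete, the sketch below computes a small subset of the time-series and transaction features from a list of Uniswap events for one token. It is only an illustration under stated assumptions, not the feature extraction pipeline used in this paper: the event record layout (`type`, `timestamp`, `address`), the function name `extract_features`, and the output keys are hypothetical, and each event is treated as one transaction.

```python
from collections import Counter


def extract_features(events, study_time):
    """Compute a few illustrative features for one token.

    `events` is a non-empty list of dicts with hypothetical keys:
    'type' ('mint', 'swap', or 'burn'), 'timestamp' (seconds), 'address'.
    `study_time` is the timestamp at which the study is performed.
    """
    timestamps = [e["timestamp"] for e in events]
    counts = Counter(e["type"] for e in events)

    # Collect the distinct addresses seen for each event type.
    addresses_by_type = {}
    for e in events:
        addresses_by_type.setdefault(e["type"], set()).add(e["address"])

    return {
        # Time-series features
        "lifetime": max(timestamps) - min(timestamps),
        "idle_time": study_time - max(timestamps),
        # Transaction features (assumption: one event equals one transaction)
        "n_transactions": len(events),
        "n_mint": counts["mint"],
        "n_swap": counts["swap"],
        "n_burn": counts["burn"],
        "n_mint_addresses": len(addresses_by_type.get("mint", set())),
        "n_swap_addresses": len(addresses_by_type.get("swap", set())),
        "n_burn_addresses": len(addresses_by_type.get("burn", set())),
        "n_addresses": len({e["address"] for e in events}),
    }


if __name__ == "__main__":
    sample = [
        {"type": "mint", "timestamp": 100, "address": "0xA"},
        {"type": "swap", "timestamp": 150, "address": "0xB"},
        {"type": "swap", "timestamp": 160, "address": "0xC"},
        {"type": "burn", "timestamp": 200, "address": "0xA"},
    ]
    print(extract_features(sample, study_time=1000))
```

A feature dictionary of this kind, computed per token, could then be turned into a row of the training matrix for a classifier.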