Identifying Ethereum Address Owners Using Testnet Faucet Mechanisms
How I extracted 278,090 records from the Kovan faucet, mapping Ethereum addresses to real identities — exposing over $110M in holdings.
Faucets
About a year ago I used the Ethereum Kovan testnet to test an application I was developing at the time. It required getting some testnet ETH (KETH) using the Kovan faucet to create transactions and test the application.

The Kovan faucet works as follows:
- Join the Kovan testnet/faucet room on gitter using your GitHub account login.
- Post the Ethereum address at which you would like to receive the testnet funds.
- The funds are sent to your address on the Kovan network.

Extracting the Data
At the time, this got me thinking about my privacy: anyone with some basic coding skills and some spare time could use this public gitter chat to map my real identity to my Ethereum address through my GitHub account.
Time passed, and I recently became interested in blockchain data analytics, which reminded me of the Kovan faucet idea. I wondered how hard it would be to build a mapping from the entire Kovan faucet chat.
It turns out it isn't complicated and requires only some basic coding skills. I will briefly describe how this can be done. A first version of my code is publicly available in my GitHub ‘identifyEth’ repository. To be clear, creating the mapping does not require any hacking or scraping; it is all public information that is easily accessible using the gitter API. Here's how it can be done:
- Sign in to gitter developer and register your application.
- Clone the gitter demo application, and follow the steps found in the README.md.
- Open http://localhost:7000 in your web browser, sign in, allow access, and save the Bearer token that appears on the screen.
- Identify the “kovan-testnet/faucet” roomId using curl from your terminal:
curl -X POST -i \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer YOUR_TOKEN" \
"https://api.gitter.im/v1/rooms" \
-d '{"uri":"kovan-testnet/faucet"}'
Extract the room id from the response.
- Fetch all the chat messages using a loop and the following endpoint: GET /v1/rooms/:roomId/chatMessages?limit=50. Note that a single call only retrieves the last 50 messages. Use pagination as described in the gitter API documentation to retrieve the earlier messages.
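The pagination loop from the last step could be sketched roughly as follows. This is a minimal illustration and not the code from the ‘identifyEth’ repository. It assumes that the endpoint accepts a `beforeId` query parameter for paging backwards and that each page comes back oldest-first, which is my reading of the gitter API documentation. The helper names `gitter_page_fetcher` and `fetch_all_messages` are made up for this example.

```python
import json
import urllib.parse
import urllib.request

GITTER_API = "https://api.gitter.im/v1"  # base URL taken from the curl call above


def gitter_page_fetcher(room_id, token, limit=50):
    """Return a function that fetches one page of messages older than before_id."""
    def fetch_page(before_id=None):
        params = {"limit": limit}
        if before_id is not None:
            params["beforeId"] = before_id  # assumed pagination parameter
        query = urllib.parse.urlencode(params)
        url = f"{GITTER_API}/rooms/{room_id}/chatMessages?{query}"
        request = urllib.request.Request(url, headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        })
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch_page


def fetch_all_messages(fetch_page, limit=50):
    """Walk backwards through the room history and return messages oldest first."""
    pages = []
    before_id = None
    while True:
        page = fetch_page(before_id)
        if not page:
            break                      # nothing older is left
        pages.append(page)
        before_id = page[0]["id"]      # assumes the first item is the oldest in the page
        if len(page) < limit:
            break                      # a short page means the start of the room was reached
    # Pages were collected newest-first, so reverse them to get chronological order.
    return [message for page in reversed(pages) for message in page]
```

With a real token and room id this would be called as `fetch_all_messages(gitter_page_fetcher(room_id, token))`. A full run over the whole room would probably also need to pause between requests to respect the API's rate limits.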
My initial code is available in my ‘identifyEth’ repository; however, I also developed an extended version that extracts more data and integrates with several APIs to provide additional information.
Sample Data
*identifying information was removed


Dataset Applications
In some cases, the user was privacy-aware and used a one-time address together with a GitHub account created specifically for the request using a one-time email address. In other cases, where the person shares personal information such as a name, email address, or LinkedIn account, looking at the GitHub/Twitter account could expose the REAL identity of the address/wallet owner. Furthermore, if the same address was used on mainnet, this could expose the address owner's crypto financial activity, along with other addresses they own, by using some advanced methods. We can probably assume that Microsoft (GitHub) and Twitter can access metadata related to their users, the IPs from which the account was created, and other interesting user-related data. This information could be used by malicious actors such as scammers or even government-backed hackers, as well as by companies, governments and law enforcement agencies.
Breakdown of the Data
A total of 278,090 records were extracted, covering 126,823 unique addresses, of which 39,287 have a positive balance on mainnet.
| Asset | Amount Held |
|---|---|
| ETH | 19,318 |
| USDT | 10,560,961 |
| USDC | 34,938,658 |
| DAI | 2,881,050 |
| WBTC | 88.70 |
These figures exclude all other tokens, NFTs, and other chains such as BSC and Polygon, meaning the addresses hold at least $110,328,886, based on an ETH price of $3,008 and a BTC price of $43,286 at the time of writing.
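As a quick sanity check, the total can be recomputed from the table. The sketch below assumes the three stablecoins are valued at exactly $1 and WBTC at the quoted BTC price. Both are my assumptions and are not stated in the breakdown.

```python
# Amounts from the table above; ETH and BTC prices as quoted in the text.
holdings = {
    "ETH": 19_318,
    "USDT": 10_560_961,
    "USDC": 34_938_658,
    "DAI": 2_881_050,
    "WBTC": 88.70,
}
prices_usd = {
    "ETH": 3_008,
    "USDT": 1,       # assumption: stablecoins valued at $1
    "USDC": 1,
    "DAI": 1,
    "WBTC": 43_286,  # assumption: WBTC valued at the BTC price
}

total_usd = sum(holdings[asset] * prices_usd[asset] for asset in holdings)
print(f"${total_usd:,.0f}")
```

The result comes out at roughly $110.3M, within a few hundred dollars of the figure quoted above, which is consistent with the table amounts having been rounded.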
The holdings of some of the largest accounts:
- 0x4b85...b553: 1,025.65 ETH ($3,622,220) + $2,476,512 in tokens
- 0x331B...462f: 630.14 ETH ($2,223,656)
- 0xb1dB...3578: 612.12 ETH ($2,161,416) + $10,950,304 in tokens
A quick check shows that at least one of these address owners is identifiable using the dataset.
Example
Let's dive into one of the records from the data set:
2021-09-07T01:04:39.836Z,
0x6773ec31aa7719b30a02a3ab151a2b578ef17842,
XXXos, AAAAAA BBBBB,
https://avatars2.githubusercontent.com/CCCCCCCCCCC,
github, 0.008869738068446203, 0, 0, 0, 0
I made sure to remove any identifying data, but this is an example of how this data could be used.
The record gives us the date on which the faucet request was submitted, the GitHub username, the real full name (first and last), and the user's avatar.
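To illustrate where these fields come from, here is a rough sketch of turning one chat message into such records. The message keys (`sent`, `text`, `fromUser`, `username`, `displayName`, `avatarUrlSmall`) are what I would expect a gitter message object to contain and should be treated as assumptions. The function name and the output keys are hypothetical, and the balance columns of the record are not covered here.

```python
import re

# "0x" followed by exactly 40 hex characters; the word boundaries keep
# longer hex strings such as transaction hashes from matching.
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def message_to_records(message):
    """Build one record per Ethereum address found in a chat message."""
    user = message.get("fromUser", {})
    records = []
    for address in ADDRESS_RE.findall(message.get("text", "")):
        records.append({
            "sent": message.get("sent"),              # timestamp of the faucet request
            "address": address.lower(),               # normalize case so duplicates compare equal
            "username": user.get("username"),         # GitHub username
            "display_name": user.get("displayName"),  # often the real full name
            "avatar": user.get("avatarUrlSmall"),
        })
    return records
```

Lowercasing the address discards the mixed-case checksum, which seems acceptable here because the address is only used as a lookup key.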

As you can see, we can extract the user's Gmail address and LinkedIn account, which basically disclose their real identity (which we already have anyway from the dataset).
Now let's take a look at the user's Ethereum address on the Kovan testnet explorer:

We can see that the address owner initially received some funds from the faucet, then created some contracts and sent some funds to the Kovan Chainlink contract — indicating the user is probably learning or developing an application.
Now let's take a look at the same address on the Ethereum mainnet:

The address owner used the same address for their testnet and mainnet activity, which now discloses some of their real-life financial activity. We can see that they own $34.44 worth of Ethereum and $152.01 worth of some other tokens at the current market price. The Ethereum owned in this address was most likely purchased on an exchange and withdrawn to this address. The address owner then attempted an interaction with OpenSea, the largest NFT marketplace (which failed due to lack of gas), and eventually swapped some of the ETH for SPANK tokens using MetaMask.
This is a simple example, but in the dataset you could find accounts that expose large portfolios and activity and can be mapped to real identities.
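A mainnet balance like the one in this example can also be looked up programmatically. The sketch below uses the standard Ethereum JSON-RPC method `eth_getBalance`, which is one possible approach and not necessarily the one used to build this dataset. The `rpc_url` argument stands for whatever node or provider endpoint you have access to, and the function names are made up for this example.

```python
import json
import urllib.request
from decimal import Decimal

WEI_PER_ETH = Decimal(10) ** 18


def build_balance_request(address, request_id=1):
    """JSON-RPC payload asking for the balance at the latest block."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        "params": [address, "latest"],
    }


def wei_hex_to_eth(wei_hex):
    """Convert the hex-encoded wei amount returned by the node into ETH."""
    return Decimal(int(wei_hex, 16)) / WEI_PER_ETH


def query_balance(rpc_url, address):
    """Send the request to a node (rpc_url is a placeholder) and return ETH."""
    body = json.dumps(build_balance_request(address)).encode("utf-8")
    request = urllib.request.Request(
        rpc_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return wei_hex_to_eth(json.load(response)["result"])
```

Using `Decimal` rather than floats keeps small balances such as the 0.008869738068446203 ETH in the record above exact.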
De-anonymize Other Cryptocurrency Address Owners
A simple Google search for “gitter faucet” shows that the same technique could be used for okexchain and evannetwork.

Of course, how this can be leveraged also depends on the properties of the blockchain. In Ethereum, testnet and mainnet addresses are generated in the same manner, so exposing a testnet address also exposes the mainnet activity of the same address. In Bitcoin, testnet and mainnet addresses look different, but they encode the same hash of the public key behind a different network prefix. Knowing a testnet address, you cannot recover the public key itself, because that would mean reversing a one-way hash such as SHA-256. You do not need to, however: decoding the testnet address yields the public-key hash, and re-encoding that hash with the mainnet prefix gives the mainnet address for the same key, which exposes its mainnet activity if the owner reused the key.
Airdrops as a Risk Factor
A known way for new DeFi protocols to test their protocols is deploying the dApp to testnet with an incentive campaign called an airdrop. Such campaigns attract users to “play around” in the testnet environment and perform some actions in the new protocol before it launches to mainnet.
The addresses that were active in the testnet during the incentivized testnet campaign later receive some governance or utility tokens of that protocol as a reward for their participation. This method forces users to use the same address on both testnet and mainnet. This increases the risk of exposing their real mainnet activities.
Conclusions and Further Discussion
Clearly, blockchain/crypto developers must be aware of their privacy as most of the ledgers are public and different techniques could be used to de-anonymize their activity. However, requiring users to link their address to their online identity in faucets is at best a poor design for such a system.
Some possible ways to avoid identification through faucets are: using a proxy, creating new accounts, avoiding the use of a testnet wallet on mainnet, and using privacy-enhancing crypto solutions.
Faucet analytics — Exploring the dataset shows that there was a massive spike in the faucet activity at the end of March, after which Ethereum price spiked to an ATH in mid April.
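A spike like this can be spotted by counting faucet requests per month. The following is a minimal sketch, assuming each record carries an ISO 8601 timestamp under a `sent` key as in the example record. The key name and the function name are hypothetical.

```python
from collections import Counter


def requests_per_month(records):
    """Count faucet requests per calendar month, keyed by 'YYYY-MM'."""
    # The first seven characters of an ISO 8601 timestamp are the year and month.
    return Counter(record["sent"][:7] for record in records)
```

Sorting the resulting counter by key gives a simple month-by-month activity series that can be plotted next to the ETH price.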

Other distinctive patterns, such as the same address being used by multiple different accounts, make it possible to map teams working on the same project.
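Finding such shared addresses amounts to grouping records by address. A rough sketch, assuming each record is a dictionary with `address` and `username` keys (hypothetical names chosen for this example):

```python
from collections import defaultdict


def shared_addresses(records):
    """Return the addresses requested by more than one account, with their usernames."""
    users_by_address = defaultdict(set)
    for record in records:
        # Lowercase the address so checksum-cased and lowercase forms group together.
        users_by_address[record["address"].lower()].add(record["username"])
    return {
        address: users
        for address, users in users_by_address.items()
        if len(users) > 1
    }
```

Each entry in the result is a candidate group of accounts that may belong to the same team, although a shared address could also simply mean one person using several accounts.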
Originally published on Medium on December 12, 2021.
Links: GitHub Repository
Any views or opinions expressed are my own.