Tor Onion Farming in 2021
Around four years ago, I wrote a blog post about creating vanity .onion
domains for Tor. To recap, .onion
domains are special domains understood by
Tor, and when visiting these sites, traffic never leaves the Tor network and
is end-to-end encrypted, just as if you were to use HTTPS. Furthermore, the
server’s identity is completely protected, if you are into that sort of
thing. Also, the domain name is linked to the server’s public key fingerprint,
and so knowing the .onion
domain is sufficient to authenticate the server
and preventing any MITM attacks or certificate authority compromises that
could happen with HTTPS.
Recently, I decided that my password generator correcthorse.pw
(GitHub) should also have a vanity onion domain, and naturally, I decided
to generate some onions.
The most important change since 2017 is probably proposal 224, which added v3 onions that are 56 characters long, instead of 16. When v3 onions first came out, I actually created one for my website, although, given the excellent write-up by Tudor, I didn’t bother writing my own blog post.
While the 16-character-long v2 onions are now deprecated, I still decided to
generate one for compatibility and also out of interest. For this, I used the
same tool as before: scallion. Having access to a GTX 1060 6GB compared
to the mobile GeForce 940MX that I had in 2017 allowed me to be far more
ambitious with my plans, however. Instead of spending a few hours generating
an onion matching the seven-character prefix followed by a number
(quantum2l7xnxwtb.onion
) as I had in 2017, I decided to generate a
nine-character prefix followed by a number (correctpw[234567]
). This took me
about 24 hours in total at 2.7 GH/s, and I finally obtained
correctpw3wmw7mw.onion
.
For the v3 onion, I used the tried-and-true tool mkp224o
, just as
Tudor had. This is a CPU-based tool, and could in fact be run simultaneously
with the scallion
.
However, things have changed quite a bit since 2018. In 2019, a batched mode
was introduced to mkp224o
, making it more than 10× faster than it had been.
On my 12-core Ryzen 9 3900X, I was able to hit 75 MH/s, compared to the paltry
4.8 MH/s Tudor managed with his grand fleet of cloud servers. However, despite
the much-increased capacity, I still only attempted a 7 character prefix
followed by a number (correct[234567]
), as 75 MH/s was two orders of
magnitude slower than the gigahashes per second I managed with my GPU.
Still, it took only around 10 hours for me to get a good collection of onions
matching the desired prefix, and from this list, I picked
correct2
.
In short, v2 onions hadn’t changed at all, while v3 vanity onions could be
generated an order of magnitude faster thanks to significant improvements to
mkp224o
.
Statistical Analysis
When generating onions, it is always helpful to be able to estimate how long the process should take before you commit to it. This estimate can not be very precise though, as the process is probabilistic and memoryless. This is a bit sad: if you have a 50% chance of getting the onion you want after 1 trillion hashes, and you have already done 1 trillion hashes, you still only have a 50% chance of getting the onion after another trillion hashes. In this situation, there is no progress.
In Tudor’s post, he modelled this as a Poisson process and used the exponential distribution to calculate the probability of finding a match after a certain time. This is, strictly speaking, not correct, as the hashing process is discrete: each hash either produces a match, or it doesn’t. Thus, each hash is best modelled as a Bernoulli trial, and the geometric distribution models this exact situation: the number of trials (hashes) it takes before we get one success (produces a match). The exponential distribution, on the other hand, is the continuous analogue of the geometric distribution and is best used to model continuous processes, although, in this situation, it is a good approximation.
In any case, let us move onto the derivation.
Probability of Single Hash
First, we need to look at the probability of a single hash matching our desired prefix.
For simple prefixes, such as quantum
, this is simple. Onion domain names are
base32-encoded, and so there are 32 different possibilities for each character.
Therefore, there is a chance that any given character will be what we
want. For a seven character prefix, seven characters need to match
simultaneously, and so the total probability is . In general, for
an character long prefix, the probability of a match is .
However, simple prefixes can be hard to read, and it’s generally better to end
the prefix with a number so that it is easy to tell where the prefix ends.
This is the form that I used for the onions described in this post. In base32,
there are six digits used: 234567
. Therefore, the probability that any given
character is a digit is , or . Therefore, for an
character long prefix followed by a number, the probability of a match is
.
For the prefix correctpw[234567]
, there are 9 characters followed by a
number, so the probability is
Hashes Required
Let be the probability that an individual hash will match our desired pattern, which we computed in the previous step. As described before, the number of hashes required is best modelled with the geometric distribution. Therefore, we define the random variable , where is the number of hashes required.
The expected value, i.e. the number of hashes after which there is a 50% chance that the desired onion will be generated, is .
To know the probability that after hashes, the desired onion is generated, i.e. , we can use the cumulative distribution function (CDF), which describes this exact quantity. For the geometric distribution, the CDF is .
Continuing with the example of correctpw[234567]
, the expected value would
be
This is around 188 trillion hashes (terahashes). Here is the plot for the CDF:
Conversion to Time
Now, we can just divide the number of hashes by the hash rate. Let be the time it takes for hashes and be the hash rate. Using my GTX 1060, which was able to do 2.7 GH/s, as an example:
From this we can see that it should have taken me around 19 hours.
Simplified Equations
For convenience, here are the expressions for the time required directly, using the convention that is the random variable for the time taken, and is the hash rate:
For simple prefixes of length , these reduce to:
For prefixes of length followed by a number, these reduce to: