Tor Onion Farming in 2021

Around four years ago, I wrote a blog post about creating vanity .onion domains for Tor. To recap, .onion domains are special domains understood by Tor, and when visiting these sites, traffic never leaves the Tor network and is end-to-end encrypted, just as if you were to use HTTPS. Furthermore, the server’s identity is completely protected, if you are into that sort of thing. Also, the domain name is linked to the server’s public key fingerprint, and so knowing the .onion domain is sufficient to authenticate the server and preventing any MITM attacks or certificate authority compromises that could happen with HTTPS.

Recently, I decided that my password generator correcthorse.pw (GitHub) should also have a vanity onion domain, and naturally, I decided to generate some onions.

The most important change since 2017 is probably proposal 224, which added v3 onions that are 56 characters long, instead of 16. When v3 onions first came out, I actually created one for my website, although, given the excellent write-up by Tudor, I didn’t bother writing my own blog post.

While the 16-character-long v2 onions are now deprecated, I still decided to generate one for compatibility and also out of interest. For this, I used the same tool as before: scallion. Having access to a GTX 1060 6GB compared to the mobile GeForce 940MX that I had in 2017 allowed me to be far more ambitious with my plans, however. Instead of spending a few hours generating an onion matching the seven-character prefix followed by a number (quantum2l7xnxwtb.onion) as I had in 2017, I decided to generate a nine-character prefix followed by a number (correctpw[234567]). This took me about 24 hours in total at 2.7 GH/s, and I finally obtained correctpw3wmw7mw.onion.

For the v3 onion, I used the tried-and-true tool mkp224o, just as Tudor had. This is a CPU-based tool, and could in fact be run simultaneously with the scallion.

However, things have changed quite a bit since 2018. In 2019, a batched mode was introduced to mkp224o, making it more than 10× faster than it had been. On my 12-core Ryzen 9 3900X, I was able to hit 75 MH/s, compared to the paltry 4.8 MH/s Tudor managed with his grand fleet of cloud servers. However, despite the much-increased capacity, I still only attempted a 7 character prefix followed by a number (correct[234567]), as 75 MH/s was two orders of magnitude slower than the gigahashes per second I managed with my GPU. Still, it took only around 10 hours for me to get a good collection of onions matching the desired prefix, and from this list, I picked correct2qlofpg4tjz5m7zh73lxtl7xrt2eqj27m6vzoyoqyw4d4pgyd.onion.

In short, v2 onions hadn’t changed at all, while v3 vanity onions could be generated an order of magnitude faster thanks to significant improvements to mkp224o.

Statistical Analysis

When generating onions, it is always helpful to be able to estimate how long the process should take before you commit to it. This estimate can not be very precise though, as the process is probabilistic and memoryless. This is a bit sad: if you have a 50% chance of getting the onion you want after 1 trillion hashes, and you have already done 1 trillion hashes, you still only have a 50% chance of getting the onion after another trillion hashes. In this situation, there is no progress.

In Tudor’s post, he modelled this as a Poisson process and used the exponential distribution to calculate the probability of finding a match after a certain time. This is, strictly speaking, not correct, as the hashing process is discrete: each hash either produces a match, or it doesn’t. Thus, each hash is best modelled as a Bernoulli trial, and the geometric distribution models this exact situation: the number of trials (hashes) it takes before we get one success (produces a match). The exponential distribution, on the other hand, is the continuous analogue of the geometric distribution and is best used to model continuous processes, although, in this situation, it is a good approximation.

In any case, let us move onto the derivation.

Probability of Single Hash

First, we need to look at the probability of a single hash matching our desired prefix.

For simple prefixes, such as quantum, this is simple. Onion domain names are base32-encoded, and so there are 32 different possibilities for each character. Therefore, there is a $1/32$ chance that any given character will be what we want. For a seven character prefix, seven characters need to match simultaneously, and so the total probability is $1/32^7$ . In general, for an $x$ character long prefix, the probability of a match is $1/32^x$ .

However, simple prefixes can be hard to read, and it’s generally better to end the prefix with a number so that it is easy to tell where the prefix ends. This is the form that I used for the onions described in this post. In base32, there are six digits used: 234567. Therefore, the probability that any given character is a digit is $6/32$ , or $3/16$ . Therefore, for an $x$ character long prefix followed by a number, the probability of a match is $6/32^{x+1}$ .

For the prefix correctpw[234567], there are 9 characters followed by a number, so the probability is

p = \frac{6}{32^{x+1}} = \frac{6}{32^{10}} = 5.329 \times 10^{-15}.

Hashes Required

Let $p$ be the probability that an individual hash will match our desired pattern, which we computed in the previous step. As described before, the number of hashes required is best modelled with the geometric distribution. Therefore, we define the random variable $X \sim \text{Geo}(p)$ , where $X$ is the number of hashes required.

The expected value, i.e. the number of hashes after which there is a 50% chance that the desired onion will be generated, is $1/p$ .

To know the probability that after $x$ hashes, the desired onion is generated, i.e. $\Pr(X \le x)$ , we can use the cumulative distribution function (CDF), which describes this exact quantity. For the geometric distribution, the CDF is $1-(1-p)^x$ .

Continuing with the example of correctpw[234567], the expected value would be

E[X] = \frac{1}{p} = \frac{32^{10}}{6} = 1.876 \times 10^{14}.

This is around 188 trillion hashes (terahashes). Here is the plot for the CDF:

Conversion to Time

Now, we can just divide the number of hashes by the hash rate. Let $T$ be the time it takes for $X$ hashes and $H$ be the hash rate. Using my GTX 1060, which was able to do 2.7 GH/s, as an example:

E[T] = \frac{E[X]}{H} = \frac{1.876 \times 10^{14}~\text{hashes}}{2.7 \times 10^9~\text{hashes/s}} \allowbreak = 69\,481~\text{s} = 19.3~\text{h}

From this we can see that it should have taken me around 19 hours.

Simplified Equations

For convenience, here are the expressions for the time required directly, using the convention that $T$ is the random variable for the time taken, and $H$ is the hash rate:

\begin{align*} E[T] &= \frac{E[X]}{H} = \frac{1}{pH} \\ E[T \le t] &= 1-(1-p)^{tH} \end{align*}

For simple prefixes of length $x$ , these reduce to:

\begin{align*} E[T] &= \frac{32^x}{H} \\ E[T \le t] &= 1-\left(1-\frac{1}{32^x}\right)^{tH} \end{align*}

For prefixes of length $x$ followed by a number, these reduce to:

\begin{align*} E[T] &= \frac{32^{x+1}}{6H} \\ E[T \le t] &= 1-\left(1-\frac{6}{32^{x+1}}\right)^{tH} \end{align*}