• A whirlwind tour of systemd-nspawn containers

    In the last yearly update, I talked about isolating my self-hosted LLMs running in Ollama, as well as Open WebUI, in systemd-nspawn containers and promised a blog post about it. However, while I was writing that post, a footnote on why I use systemd-nspawn instead of Docker accidentally turned into a full blog post of its own. Here’s the actual post on systemd-nspawn.

    Fundamentally, systemd-nspawn is a lightweight Linux namespaces-based container technology, not dissimilar to Docker. The difference is mostly in image management—whereas Docker images are built from Dockerfiles and distributed as prebuilt, read-only images of ready-to-run software, systemd-nspawn is typically used with a writable root filesystem, functioning more like a virtual machine. For those of you who remember using chroot to run software from a different Linux distro, it can also be described as chroot on steroids.

    I find systemd-nspawn especially useful in the following scenarios:

    1. When you want to run some software with some degree of isolation on a VPS, where you can’t create a full virtual machine due to nested virtualization not being available1;
    2. When you need to share access to hardware, such as a GPU (which is why I run LLMs in systemd-nspawn);
    3. When you don’t want the overhead of virtualization;
    4. When you want to directly access some files on the host system without resorting to virtiofs; and
    5. When you would normally use Docker but can’t or don’t want to. For the reasons why, see the footnote-turned-blog post.

    In this post, I’ll describe the process of setting up systemd-nspawn containers and how to use them in some common scenarios.
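
    As a taste of what’s involved, bootstrapping and booting a minimal Debian container looks roughly like this (a sketch with a placeholder machine name, not the exact commands from the post):

        # Bootstrap a minimal Debian root filesystem into the standard machine
        # directory ("mymachine" is just a placeholder name):
        debootstrap stable /var/lib/machines/mymachine

        # Boot it as a full container with its own init and journal:
        systemd-nspawn -D /var/lib/machines/mymachine -b

        # Or manage it like any other systemd-managed machine:
        machinectl start mymachine
        machinectl shell mymachine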

    (Read more...)
  • Docker considered harmful

    In the last yearly update, I talked about isolating my self-hosted LLMs running in Ollama, as well as Open WebUI, in systemd-nspawn containers. However, as I contemplated writing a blog post about that setup, I realized the inevitable question would be: why not run it in Docker?

    After all, Docker is super popular in self-hosting circles for its “convenience” and “security.” There’s a vast repository of images for almost any software you might want. You can run almost anything with a simple docker run, and it’ll run securely in a container. What’s not to like?

    This is probably going to be one of my most controversial blog posts, but the truth is that over the past decade, I’ve run into so many issues with Docker that I’ve simply had enough of it. I now avoid Docker like the plague. In fact, if some software is only available as a Docker container—or worse, requires Docker Compose—I sigh and create a full VM to lock away the madness.

    This may seem extreme, but fundamentally, it boils down to several things:

    1. The Docker daemon’s complete overreach;
    2. Docker’s lack of UID isolation by default;
    3. Docker’s lack of init by default; and
    4. The quality of Docker images.

    Let’s dive into this.
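
    To make points 2 and 3 concrete before diving in, here’s a quick experiment you can try on a stock Docker installation (assuming no user namespace remapping has been configured):

        # Point 2: by default, uid 0 inside a container is uid 0 on the host.
        docker run -d --rm --name uid-demo alpine sleep 300
        ps -o user,pid,cmd -C sleep   # the container's sleep shows up as root on the host
        docker rm -f uid-demo

        # Point 3: there is no init inside the container unless you opt in,
        # so PID 1 is whatever you ran, and nothing reaps orphaned zombies.
        docker run --rm alpine ps -o pid,comm          # PID 1 is ps itself
        docker run --rm --init alpine ps -o pid,comm   # PID 1 is docker-init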

    (Read more...)
  • On ECC RAM on AMD Ryzen

    Last time, I talked about how a bad stick of RAM drove me into buying ECC RAM for my Ryzen 9 3900X home server build—mostly because ECC would have been able to detect that something was wrong with the RAM and correct single-bit errors, which would have saved me a ton of headaches.

    Now that I’ve received the RAM and run it for a while, I’ll write about the entire experience of getting it working and my attempts to cause errors to verify the ECC functionality.

    Spoilers: Injecting faults was way harder than it appeared from online research.
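
    For context, once the kernel’s EDAC driver recognizes the memory controller, the reported error counts are easy to check (a sketch assuming a single memory controller and the rasdaemon tools; paths may differ on your system):

        # Corrected and uncorrected error counts from the EDAC subsystem:
        cat /sys/devices/system/edac/mc/mc0/ce_count
        cat /sys/devices/system/edac/mc/mc0/ue_count

        # rasdaemon keeps a persistent log of individual error events:
        ras-mc-ctl --status
        ras-mc-ctl --error-count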

    (Read more...)
  • 2024: Year in Review

    For the past two years, I’ve been writing year-end reviews to look back on the year gone by and reflect on what happened. I thought I might as well continue the tradition this year.

    However, I’ll try a new format—instead of grouping by month, I’ll group by area. I’ll focus on the following areas:

    1. BGP and operating my own autonomous system;
    2. My homebrew CDN for this blog;
    3. My home server;
    4. My new mechanical keyboard;
    5. My travel router project; and
    6. My music hobby.

    Without further ado, let’s begin.

    (Read more...)
  • On btrfs and memory corruption

    As you may have heard, I have a home server, which hosts mirror.quantum5.ca and doubles as my home NAS. To protect my data, I am running btrfs, which has data checksums so that bit rot can be detected, and I am using the raid1 mode so that btrfs can recover from such events and restore the correct data. In this mode, btrfs ensures that there are two copies of every file, each on a distinct drive. In theory, if one of the copies is damaged by bit rot, or even if an entire drive is lost, all the data can still be recovered.
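
    For reference, an array of this kind is created and checked with commands roughly like these (a generic sketch with placeholder device names and mount point, not my exact setup):

        # Keep two copies of both data and metadata, each on a different drive:
        mkfs.btrfs -d raid1 -m raid1 /dev/sdX /dev/sdY
        mount /dev/sdX /mnt/nas

        # Read back every block, verify checksums, and repair from the good copy:
        btrfs scrub start -B /mnt/nas

        # Per-device counters for read, write, and corruption errors:
        btrfs device stats /mnt/nas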

    For years, this setup has worked perfectly. I originally created this btrfs array on my old Atomic Pi, with drives inside a USB HDD dock, and the same array is still running on my current Ryzen home server—five years later—even after a bunch of drive changes and capacity upgrades.

    In the past week, however, my NAS has experienced some terrible data corruption issues. Most annoyingly, it damaged a backup that I needed to restore, forcing me to perform some horrific sorcery to recover most of the data. After a whole day of troubleshooting, I was eventually able to track the problem down to a bad stick of RAM. Removing it enabled my NAS to function again, albeit with less available RAM than before.

    I will now explain my setup and detail the entire event for posterity, including my thoughts on how btrfs fared against such memory corruption, how I managed to mostly recover the broken backup, and what might be done to prevent this in the future.

    (Read more...)
  • Implementing ASPA validation in the bird2 filter language

    When we looked at route authorization, we discussed how Resource Public Key Infrastructure (RPKI)—or more specifically, route origin authorizations—could prevent some types of BGP hijacking, but not all of them. We also mentioned that Autonomous System Provider Authorization (ASPA), a draft standard that extends RPKI to also authenticate the AS path, could prevent unauthorized networks from acting as upstreams. (For more information about upstreams, see my post on autonomous systems.)

    Essentially, an ASPA is a type of resource certificate in RPKI, just like Route Origin Authorizations (ROAs), which describe which ASNs are allowed to announce a certain IP prefix. ASPAs, however, describe which networks are allowed to act as upstreams for a given AS.

    There are two parts to deploying ASPA:

    1. Creating an ASPA resource certificate for your network and publishing it, so that everyone knows who your upstreams are; and
    2. Checking routes you receive from other networks, rejecting the ones that are invalid according to ASPA.

    The first part is fairly straightforward, with RPKI software like Krill offering support out of the box. One simply has to set up delegated RPKI with the RIR that issued the ASN. I’ll give a quick overview of the process, but it’s not the main focus today.

    Unfortunately, the second part is less than trivial, since ASPA is just a draft standard and not widely supported by router software. Only OpenBGPD, which I don’t use, has implemented experimental support. However, that doesn’t mean we can’t use ASPA today—we simply need to implement it ourselves. Thus, I embarked on this journey to implement ASPA filtering in the bird2 filter language.

    (Read more...)
  • Installing Debian (and Proxmox) by hand from a rescue environment

    Normally, installing Debian is a straightforward process: you boot from the installer CD image and follow the menu options in debian-installer. Simple, right? Or even easier, just use the Debian image provided by your server vendor, since Debian is quite popular and an image is bound to be available. Given this simplicity, you might have idly wondered: what’s actually going on behind debian-installer’s pretty menus? Well, you’re about to find out.

    You see, I recently got this cheap headless dedicated server without IPMI1—really, just an Intel N100 mini PC. To cut costs, there is no video feed, as that would require separate hardware to capture and stream the screen. Instead, there’s only the ability to power cycle the machine and boot it from PXE, which is used for a variety of tasks, such as booting rescue CDs or performing automated operating system installations. This shouldn’t have been a problem for my use case, since there was a Proxmox 8 image right there, and I just set it to install automatically.

    Of course that didn’t work, because I wouldn’t be writing about it if it had! As it turns out, the Proxmox 8 image (and the Debian 12 image, for that matter) didn’t include the firmware for the mini PC’s Realtek NICs, which prevented the network from working. I thought I just needed to install the firmware package, but when I booted into the included Finnix rescue system, it appeared that Debian wasn’t installed at all! Clearly, the PXE installer failed to start due to the missing firmware.

    What now? Well, I’ve done some pretty sketchy Debian installs in the past, so I thought I might as well go all out and install a full Debian system from the rescue environment. Unlike last time, though, I’ll do a completely clean install instead of keeping the existing partition scheme.
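
    The core of such an install is surprisingly small. It boils down to something like the following (a heavily simplified sketch with placeholder device names, glossing over partitioning, fstab, networking, non-free-firmware sources, and the BIOS/UEFI differences covered in the post):

        # Format and mount the target root filesystem:
        mkfs.ext4 /dev/sdX2
        mount /dev/sdX2 /mnt

        # Bootstrap a minimal Debian 12 into it:
        debootstrap bookworm /mnt http://deb.debian.org/debian

        # Bind the virtual filesystems and chroot in to install a kernel,
        # the missing Realtek firmware, and a bootloader:
        for fs in dev proc sys; do mount --rbind /$fs /mnt/$fs; done
        chroot /mnt apt install -y linux-image-amd64 firmware-realtek grub-pc
        chroot /mnt grub-install /dev/sdX
        chroot /mnt update-grub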

    (Read more...)
  • Custom mechanical keyboard: OS-specific custom RGB lighting with QMK

    My old Corsair keyboard has been struggling recently. It has some weird issues, either in hardware or firmware, that cause it to sometimes go crazy and randomly “press” the wrong keys, forcing me to pull out my backup keyboard until the lunacy1 passes. On top of that, managing it requires Corsair’s bloated, Windows-only iCUE software or a reverse-engineered alternative like ckb-next, which isn’t fun for a Linux user like me. Even with ckb-next, the customization is limited.

    So I figured I’d get a new keyboard. I have a few simple requirements:

    1. It should be a 100% keyboard because I use the numpad quite a bit for number entry, e.g. to manage my personal finances;
    2. It should have a backlight since I often use my computer at night in relative darkness, and while I can touch type just fine, being able to see the keyboard is nice;
    3. It should have tactile mechanical switches, but not the obnoxious clicky ones. For reference, my old keyboard has Cherry MX browns, which I liked; and
    4. It should have properly programmable and customizable firmware. QMK is the popular option, so I searched for keyboards supporting it or, failing that, at least keyboards with proper first-party Linux support.

    As it turned out, I couldn’t find any prebuilt mechanical keyboards that ticked all the boxes and were in stock, so I figured I might as well get into the custom mechanical keyboard scene and build my own. Thus began a journey of immense frustration and nerd-sniping…

    (Read more...)
  • On the Inter-RIR transfer of AS200351 from RIPE NCC to ARIN

    As you might know already, on May 24, 2024, at the RIPE NCC General Meeting, model C of the 2025 charging scheme was adopted. I will not go into the details here, such as the lack of an option to preserve the status quo1, but model C involved adding an annual fee of 50 EUR per ASN, billed to the sponsoring LIR. This meant that the sponsoring LIR for AS200351 would be forced to bill me at least 50 EUR annually for the ASN, plus some administrative overhead and fees for payment processing2.

    To protest against this fee and save myself some money, I decided to transfer AS200351 to ARIN, which charges nothing extra for me to hold an additional ASN, given that my current service category at ARIN allows up to 3 ASNs and I already had only one ASN there: AS54148.

    And so, on June 2nd, I decided to initiate the process to transfer AS200351, which was in active use, to ARIN. As it turned out, this became an ordeal, especially on the RIPE NCC end. Since I’ve been asked many times about the process, I am writing this post to share my experience, so that you know what to expect.

    (Read more...)
  • Cloning Proxmox with LVM Thin Pools

    During Black Friday last year, I got tempted by a super good offer for a dedicated server in Kansas City, with the option of connecting it to the Kansas City Internet Exchange (KCIX). Here are the specs:

    • Intel Xeon E5-2620 v4 (8 cores, 16 threads)
    • 64 GB DDR4 RAM
    • 500 GB SSD
    • 1 Gbps unmetered bandwidth

    It was the perfect thing for AS200351 (if a bit overkill), so I just had to take it. I set it up during the winter holidays, having decided to install Proxmox to run a bunch of virtual machines, and all was well. Except for one thing—the disk.

    You see, the server came with a fancy SAN: exactly 500 GiB of storage mounted over iSCSI via 10 Gbps Ethernet, backed by a highly reliable ZFS volume (zvol). While this all sounds good on paper, in practice I was barely able to exceed 200 MB/s of I/O, even at large block sizes. Nothing I did seemed to help, so I asked the provider to switch it to physical drives.

    Having configured Proxmox just the way I wanted it, I opted against reinstalling it from scratch, instead choosing to clone the disk. The provider suggested using Clonezilla, which should be able to do this sort of disk cloning very quickly. So we found an agreeable time, took the server down, and booted Clonezilla over PXE. All should be good, right?

    As it turns out, this ended up being a super painful experience.

    Editorial note: This story is based on my memory and incomplete console output. While the broad story is correct, the commands provided may not be.
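
    For a sense of what cloning LVM thin pools by hand involves, here’s a generic sketch (placeholder device names, volume names, and sizes; this is not the procedure from the post, which used Clonezilla): with both disks visible, rename the old volume group out of the way, rebuild the layout on the new disk, and copy each thin volume across.

        # Rename the old VG so the new disk can take over the "pve" name:
        vgrename pve pve-old

        # Recreate the volume group and thin pool on the new disk:
        pvcreate /dev/sdY3
        vgcreate pve /dev/sdY3
        lvcreate -L 400G --thinpool data pve

        # For each thin volume, create a matching one and copy the contents
        # (sizes taken from `lvs pve-old`):
        lvcreate -V 32G --thin -n vm-100-disk-0 pve/data
        dd if=/dev/pve-old/vm-100-disk-0 of=/dev/pve/vm-100-disk-0 bs=4M status=progress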

    (Read more...)