Category: Sysadmin

  • On btrfs and memory corruption

    As you may have heard, I have a home server, which hosts mirror.quantum5.ca and doubles as my home NAS. To ensure my data is protected, I am running btrfs, which has data checksums to ensure that bit rot can be detected, and I am using the raid1 mode to enable btrfs to recover from such events and restore the correct data. In this mode, btrfs ensures that there are two copies of every file, each on a distinct drive. In theory, if one of the copies is damaged due to bit rot, or even if an entire drive is lost, all of your data can still be recovered.
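    To make the idea of checksum-guided repair concrete, here is a toy model in plain shell. It is only an illustration under loud assumptions: btrfs checksums individual blocks (crc32c by default) inside the filesystem, whereas this sketch checksums whole files with SHA-256, and the helper names sum_of and repair_mirror are hypothetical.

```shell
# Toy model of checksum-guided repair across two mirrored copies.
# This is NOT how btrfs is implemented; it only mirrors the idea.

# sum_of FILE: print the SHA-256 hash of FILE.
sum_of() { sha256sum "$1" | cut -d' ' -f1; }

# repair_mirror COPY_A COPY_B EXPECTED_SHA256
# Overwrites whichever copy fails the checksum with the good one.
# Returns 1 if neither copy matches the expected checksum.
repair_mirror() {
  a=$1; b=$2; want=$3
  sum_a=$(sum_of "$a"); sum_b=$(sum_of "$b")
  if [ "$sum_a" = "$want" ] && [ "$sum_b" = "$want" ]; then
    echo "both copies OK"
  elif [ "$sum_a" = "$want" ]; then
    cp "$a" "$b"; echo "repaired $b from $a"
  elif [ "$sum_b" = "$want" ]; then
    cp "$b" "$a"; echo "repaired $a from $b"
  else
    echo "unrecoverable: both copies fail the checksum" >&2
    return 1
  fi
}
```

    The last branch is the uncomfortable case: if both copies are bad, or if the stored checksum itself cannot be trusted, there is nothing left to repair from.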

    For years, this setup has worked perfectly. I originally created this btrfs array on my old Atomic Pi, with drives inside a USB HDD dock, and the same array is still running on my current Ryzen home server—five years later—even after a bunch of drive changes and capacity upgrades.

    In the past week, however, my NAS has experienced some terrible data corruption issues. Most annoyingly, it damaged a backup that I needed to restore, forcing me to perform some horrific sorcery to recover most of the data. After a whole day of troubleshooting, I was eventually able to track the problem down to a bad stick of RAM. Removing it enabled my NAS to function again, albeit with less available RAM than before.

    I will now explain my setup and detail the entire event for posterity, including my thoughts on how btrfs fared against such memory corruption, how I managed to mostly recover the broken backup, and what might be done to prevent this in the future.

    (Read more...)
  • Installing Debian (and Proxmox) by hand from a rescue environment

    Normally, installing Debian is a simple process: you boot from the installer CD image and follow the menu options in debian-installer. Simple, right? Or even easier, just use the Debian image provided by your server vendor, since Debian is quite popular and an image is bound to be available. Given the simplicity of this, you might have idly wondered: what’s actually going on behind debian-installer’s pretty menus? Well, you are about to find out.

    You see, I recently got a cheap headless dedicated server without IPMI (really, just an Intel N100 mini PC). To cut costs, there was no video feed, as that would require separate hardware to receive and stream the screen. Instead, there’s only the ability to power cycle and boot from PXE, which is used to perform a variety of tasks, such as booting rescue CDs or performing automated installation of operating systems. This shouldn’t have been a problem for my use case, since there was a Proxmox 8 image right there, so I just set it to install automatically.

    Of course that didn’t work, because I wouldn’t be writing about it if it did! As it turns out, the Proxmox 8 image (and also the Debian 12 image) didn’t have the firmware for the Realtek NICs on the mini PC, which prevented them from working. I thought that I just needed to install the firmware package, but when I booted into the included Finnix rescue system, it appeared that Debian wasn’t installed at all! Clearly, the PXE installer failed to start due to the missing firmware.

    What now? Well, I’ve already done some pretty sketchy Debian installs in the past, so I thought I might as well just go all out and install a full Debian system through the rescue system. Unlike last time though, I’ll do a complete clean install, instead of keeping the partition scheme.

    (Read more...)
  • Cloning Proxmox with LVM Thin Pools

    During Black Friday last year, I got tempted by a super good offer of a dedicated server in Kansas City with the option of connecting it to the Kansas City Internet Exchange (KCIX). Here are the specs:

    • Intel Xeon E5-2620 v4 (8 cores, 16 threads)
    • 64 GB DDR4 RAM
    • 500 GB SSD
    • 1 Gbps unmetered bandwidth

    It was the perfect fit for AS200351 (if a bit overkill), so I just had to take it. I set it up during the winter holidays, having decided to install Proxmox to run a bunch of virtual machines, and all was well. Except for one thing: the disk.

    You see, the server came with a fancy SAN, with exactly 500 GiB of storage mounted over iSCSI via 10 Gbps Ethernet, backed by a highly reliable ZFS volume (zvol). While this all sounds good on paper, in practice I was barely able to exceed 200 MB/s when doing I/O, even at large block sizes. Nothing I did seemed to help, so I asked the provider to switch it to physical drives.

    Having configured Proxmox just the way I wanted it, I opted against reinstalling it from scratch, instead choosing to clone the disk. The provider suggested using Clonezilla, which should be able to do this sort of disk cloning very quickly. So we found an agreeable time, took the server down, and booted Clonezilla over PXE. All should be good, right?

    As it turns out, this ended up being a super painful experience.

    Editorial note: This story is based on my memory and incomplete console output. While the broad story is correct, the commands provided may not be correct.

    (Read more...)
  • Introducing my own mirroring service: mirror.quantum5.ca

    In January, I upgraded my home Internet connection to 3 Gbps symmetric, because, strangely enough, it was cheaper than the package I already had at the time (1500 Mbps down, 940 Mbps up). This was connected to the second port on my ConnectX-3, allowing my home server to achieve the full speed where 2.5 Gbps Ethernet would have failed. Unfortunately, nothing I was doing could have harnessed the full speed of this Internet connection, or anywhere near it, so I started thinking…

    In February, I realized that I could run a mirroring service for open-source software to serve the community at basically no additional cost—I am already paying for this 3 Gbps Internet connection and I have some spare disk space on my SSD. So I decided to do exactly that.

    Today, I am happy to announce that this mirror, mirror.quantum5.ca, has been tested for a few months and is fully ready for production. If you find the service helpful, please feel free to support me via GitHub Sponsors, Ko-fi, Liberapay, or directly with credit card or bank through Stripe (CAD), though this is of course strictly optional.

    If you are interested in how it’s all set up, please read on:

    (Read more...)
  • Microsecond Accurate Time Synchronization on LAN with PTP

    Last time, I built a stratum 1 NTP server with a PPS signal from a GPS receiver, synchronizing my server’s clock to within 10 microseconds of UTC. However, NTP was designed to synchronize clocks within a few tens of milliseconds over the Internet, and I’d be lucky to achieve millisecond accuracy on a LAN. I mentioned that PTP was the alternative that could achieve accuracy in the sub-microsecond range. Well, this time I’ll be setting up PTP between my server and my PC with the hardware timestamping on the ConnectX-3s.

    If you are following along at home, don’t despair if your hardware can’t do timestamping or PTP. I will also attempt to set up PTP with software timestamping later for my other devices.

    Naturally, I first turned to the gpsd documentation, since that was a decent reference for setting up NTP with the PPS signal. Well, this is what it says for PTP with hardware timestamping:

    Sadly, theory and practice diverge here. I have never succeeded in making hardware timestamping work. I have successfully trashed my host system clock. Tread carefully. If you make progress please pass on some clue.

    That didn’t sound encouraging at all. “Oh well, I guess I am on my own here,” I thought to myself. “How bad could digging through a few man pages and random online documentation be? Worst case, there is the source code, right?”

    (Read more...)
  • DIY a Stratum 1 NTP Server with a Serial Port

    These days, it seems like everyone is posting about turning Raspberry Pis into a stratum 1 NTP server by hooking up a cheap GPS module, most often the GT-U7 u-blox 7 clone with a PPS (pulse-per-second) signal output, whose rising edge indicates exactly the start of a second.

    While this seems like a cool idea, it suffers from one flaw—while the Raspberry Pi itself almost certainly has very accurate time, getting accurate time to the rest of the network would be problematic. This is because the Ethernet adapter on Raspberry Pis before the Pi 4 was hooked up via USB, and the polling nature of USB introduces jitter, preventing the accurate signal from reaching the rest of the network. Unfortunately, I only have a Raspberry Pi 3 model B in my possession, which suffers from the problem.

    Now, I could have gotten a Raspberry Pi 4, but those aren’t priced sanely at the moment and it would be just an exercise in copying. Instead, I looked at the various alternatives. The traditional way of doing this kind of thing involves hooking up a GPS receiver into a serial port, which generates an interrupt. If the PPS signal is delivered to the DCD (data carrier detect) signal (as described in RFC 2783), then the in-tree Linux driver pps_ldisc is able to do the timestamping in kernel mode for the highest possible accuracy.

    I found out that my server’s X570 motherboard came with a serial port header (labelled COM). This meant that I could buy some fancy GPS receiver with a serial port and hook it up. Unfortunately, those aren’t priced sanely either, so I decided to build my own with the GT-U7 module and a driver module for RS-232 (the common serial port standard).

    This was late last year. I ordered the components on AliExpress and they all arrived in January, so I finally started this project.

    (Read more...)
  • How to make a better ARM virtual machine (armhf/aarch64) with UEFI

    Over a year ago, I wrote about making ARM virtual machines. Times have changed a lot since then — the release of Apple M1 has dramatically changed the perception of ARM. No longer is ARM a niche platform for low-power gadgets like phones or tablets, but a viable desktop computing platform. Similarly, in the server space, Amazon Graviton and Ampere Altra have gained traction. My old blog post presented only a quick hack to get the ARM virtual machine to boot — by copying the kernel image. This leaves a lot to be desired. Somehow, despite this, it quickly became one of the popular posts on my blog. Today, we shall rectify that flaw and present a way to boot the latest kernel installed in the virtual machine using the Unified Extensible Firmware Interface (UEFI).

    Again, like before, this tutorial will use Debian as an example, but the same methodology should work for other distributions. If you are looking for a simple chroot, you should instead follow the original post.

    (Read more...)
  • Website Changes: New Domains, New Infrastructure

    I haven’t posted in a while now, but nevertheless, I have been working on this site. Perhaps I have been too bogged down in the minor details, but it was a relatively interesting experience nonetheless.

    You may have already noticed some differences:

    1. The most obvious change is probably the domain. Instead of using my old quantum2.xyz domain, I switched to the shiny new quantum5.ca.
    2. The second most obvious change is the short URL on every post.
    3. The last change is invisible: the backend is now distributed in three locations around the world. This is so that even if you are in faraway Australia, you can still load this website instantly, even if it’s not in the Cloudflare cache.

    Why did I make these changes? Well, this was because I became thoroughly nerd-sniped by some ideas…

    (Read more...)
  • Serving Static Files from the Cloudflare Edge

    Three years ago, I wrote about a way to purge only changed static files when deploying a static site. It is very useful and I still use it for this website to this day. Its main advantage is that it only needs to be run on deploys. However, its main disadvantage is that it must be run on every deployment. Sometimes, this is not feasible.

    For example, I run a bunch of APT repositories on apt.quantum2.xyz. These repositories are constantly being updated by Jenkins and me personally, and using purge-static would require adding a purge-static command to every script that updates the repositories, which is clearly infeasible. Wouldn’t it be nice to just have a background daemon that purged the CDN cache automatically?

    As it turns out, I already wrote it back in 2015 before starting this blog. It was massively out-of-date (until very recently) and required you to use your all-powerful Cloudflare API key, providing a massive attack surface. However, I recently updated it, and hopefully, it will prove useful for you.

    Here’s a quick introduction to using it:

    (Read more...)
  • On Backporting Kernel Modules with DKMS

    Recently, I bought a USB 3.0 2.5 Gbps Ethernet dongle for my Atomic Pi router. This dongle requires a version of the r8152 kernel driver with support for the RTL8156 chipset, which is only added in Linux 5.13. Now, I am running the Debian stable kernel and have no wish to backport the latest 5.13 kernel simply for that one driver. So of course, I came up with an approach to backport a driver from a newer version.

    In this blog post, I will walk you through the process of backporting a single kernel module, using the r8152 kernel driver as an example.

    (Read more...)
  • On Building Custom Debian Kernels (and Backporting)

    It’s not often in 2021 that you find yourself building new kernels, but nevertheless, the occasion comes when you need to either enable a flag or, even worse, patch the kernel. This happened recently: on DMOJ, we ran into a kernel issue that misreports the memory usage of processes as an “optimization.” For more information about this issue, see the excellent blog post by my friend Tudor. As a result, I was forced to build a patched kernel to work around it. Since the process was far from easy, I decided to write this blog post to help others in the future.

    Building a kernel is not too difficult, actually. The real challenge comes in the form of building the kernel in a maintainable way, which basically means that we should at least build the kernel into an easily installable package. For example, on DMOJ, we manage multiple judge virtual machines, and they all need to receive the same kernel. Furthermore, we want our custom build of the kernel to be distinct from the standard kernels that the operating system offers, as we don’t want a system upgrade to undo the patch that we applied.

    In this article, we will explore the process I used to build a custom kernel package on Debian for the scenario described above. This will involve both patching the kernel and subsequently changing a configuration option. Specifically, we will be applying this patch. These instructions should work with minor adaptations for other Debian-based distributions.

    (Read more...)
  • Sharing Unix sockets between multiple users

    I am sure that if you have managed a Linux system for a while, you have probably dealt with Unix sockets: special files that act like sockets. You have probably also run into permission issues when dealing with these socket files.

    In this post, I’ll describe some methods of dealing with these permission issues, and a situation in which each might apply.

    (Read more...)
  • How to make an ARM virtual machine (armhf/aarch64)

    Update (2022-03-19): I wrote about a new way to create an ARM virtual machine that’s simpler and handles kernel updates properly. I highly suggest you follow those instructions instead, unless you are building a chroot.

    I noticed that very few people seem to know how to create a full ARM virtual machine, so I decided to create a quick guide.

    This tutorial will use aarch64 and Debian as examples, but the same methodology should work for 32-bit ARM and other distributions. The instructions can also be adapted to create a simple chroot.

    (Read more...)
  • Run a Linux Program on a Different Network Interface

    Sometimes, you have multiple Internet connections, whether physical or virtual, and you want a few programs to access the Internet through one connection without making it the default gateway. For example, you might want a program to connect to the Internet through a VPN without forcing the entire system’s traffic through the VPN as well.

    The traditional way to do this is with packet marking with iptables and an ip rule to force marked packets through a different routing table to send the traffic to the correct destination. However, as the source IP was selected before routing, an SNAT rule in iptables is required to change the source IP. This is ugly and clearly a hack.

    However, since around 2013, Linux has supported network namespaces, which can be managed via ip netns as part of the iproute2 package. We can easily exploit this feature to achieve the desired goal with minimal fuss.
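    As a rough sketch of the namespace approach (not necessarily the exact commands from the full post), the helper below prints the ip commands that create a namespace, move an interface into it, and give it a default route. The function name netns_plan and the example values (vpn, eth1, and the 192.0.2.0/24 documentation addresses) are hypothetical.

```shell
# netns_plan NS IFACE ADDR GATEWAY
# Print the commands that set up a network namespace owning IFACE.
# Nothing is executed here; pipe the output to a root shell to apply it.
netns_plan() {
  ns=$1 iface=$2 addr=$3 gw=$4
  printf '%s\n' \
    "ip netns add $ns" \
    "ip link set $iface netns $ns" \
    "ip netns exec $ns ip link set lo up" \
    "ip netns exec $ns ip addr add $addr dev $iface" \
    "ip netns exec $ns ip link set $iface up" \
    "ip netns exec $ns ip route add default via $gw"
}
```

    One way to use it would be netns_plan vpn eth1 192.0.2.2/24 192.0.2.1 | sudo sh -e, after which sudo ip netns exec vpn followed by a command runs that command with eth1 as its only route to the Internet. The address is assigned after the move because an interface loses its addresses when it changes namespaces.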

    (Read more...)
  • Install Debian on a VPS Provider without Debian Images

    Recently, I came across a VPS provider that does not provide Debian images. This is rather annoying since I much prefer a fresh minimal install of Debian over a “minimal” Ubuntu image that still has a lot of stuff that I don’t want.

    Naturally, I decided to install Debian anyways, and came up with an approach to do so.

    If you are feeling particularly bold, you can try running my pre-made scripts that would convert a fresh Ubuntu install to a fresh Debian install.

    To use the scripts, you should download either the UEFI version or the BIOS version, depending on whether your current OS is using BIOS or UEFI.

    At the top of the script, change the variables to match your system configuration. The most important one is BOOT_DRIVE, which ensures that grub is installed on the correct device.

    The scripts will prompt you for a root password and SSH keys. Once the script finishes, the system will be rebooted and you should be able to SSH into the now-Debian machine as root via the SSH keys.

    If you don’t feel like using the script, I am also providing manual instructions. This also explains how the scripts work.

    (Read more...)
  • Simple NDP Proxy to Route Your IPv6 VPN Addresses

    If you tried setting up an IPv6-capable VPN on a VPS provider that gave you an IP range to play with, perhaps a /64 or larger, you would want to assign some of the IPv6 addresses you have to your clients. In this post, we suppose that you have the range 2001:db8::/64.

    This should be a simple process: set the sysctl option net.ipv6.conf.all.forwarding to 1 (or whatever the equivalent is on your system), use DHCPv6 or SLAAC to assign the addresses to the clients, and then your clients should have working IPv6.

    The Problem

    Unfortunately, this is not so simple. Most VPS providers are not actually routing the entire subnet 2001:db8::/64 to you. Rather, they just connect a number of VPSes onto the same virtual Ethernet network and rely on the Neighbour Discovery Protocol (NDP) to find the router.

    (Read more...)
  • On Invalidation of Aggressively Cached Static Sites

    I have always wanted to make this website load fast everywhere in the world, despite the server being in Montréal, Canada, without investing heavily. It shouldn’t be hard: after all, it is just a bunch of static files, generated with Jekyll.

    Cloudflare brings a free CDN. You can set a page rule to aggressively cache your website on their CDN edge nodes, allowing your site to load as if it is hosted locally, even if you are half a world away.

    There is just a little problem: how do you efficiently purge the cache when you update your site? It is quite easy to purge the entire cache on Cloudflare, but that is rather inefficient: most of your assets probably did not change, and now they will all have to be fetched again.

    Today I decided to tackle this problem by creating purge-static, a tool designed to purge your CDN cache. It can purge your Cloudflare cache for you. You can get started by running pip install purge-static.
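    The underlying idea, purging only what actually changed, can be illustrated with a small shell sketch. This is an assumption-laden illustration and not necessarily how purge-static is implemented: the helper names make_manifest and changed_urls are hypothetical, and the sketch only lists URLs instead of calling any CDN API.

```shell
# make_manifest DIR
# Print one "HASH  ./relative/path" line per file in DIR, sorted so that
# two manifests can be compared with comm.
make_manifest() {
  (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort)
}

# changed_urls OLD_MANIFEST NEW_MANIFEST BASE_URL
# Print the URL of every file that is new or whose content hash differs.
changed_urls() {
  # Lines found only in the new manifest are new or modified files.
  LC_ALL=C comm -13 "$1" "$2" | sed "s|^[0-9a-f]*  \./|$3/|"
}
```

    Saving a manifest after each deploy and comparing it with the next one yields the short list of URLs worth purging. Deleted files appear only in the old manifest, so this sketch does not report them.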

    (Read more...)
  • The fast way to install nginx.org debs on Debian

    I personally prefer the nginx.org packages for nginx over the ones that come with Debian. They are usually newer and have a saner number of dependencies. I also prefer the conf.d system over the sites-available and sites-enabled system.

    The main challenge in installing these packages on Debian is the trouble you have to go through to get the PGP keys and sources.list set up. nginx.org does not present a good setup script. This has become a repetitive and annoying experience, so I present a series of commands to set it up quickly.

    For stable:

    # Trust the nginx.org package signing key.
    curl https://nginx.org/keys/nginx_signing.key | sudo apt-key add -
    # Read the release codename from tzdata's Provides field, then append deb and deb-src entries for it.
    (codename="$(dpkg --status tzdata | grep Provides | cut -f2 -d'-')"; echo; for deb in deb deb-src; do echo "$deb" http://nginx.org/packages/debian/ "$codename" nginx; done) | sudo tee -a /etc/apt/sources.list
    sudo apt update && sudo apt install nginx
    

    For mainline:

    # Trust the nginx.org package signing key.
    curl https://nginx.org/keys/nginx_signing.key | sudo apt-key add -
    # Same as for stable, but pointing at the mainline repository.
    (codename="$(dpkg --status tzdata | grep Provides | cut -f2 -d'-')"; echo; for deb in deb deb-src; do echo "$deb" http://nginx.org/packages/mainline/debian/ "$codename" nginx; done) | sudo tee -a /etc/apt/sources.list
    sudo apt update && sudo apt install nginx
    
    (Read more...)
  • Installing Debian ARM64 on Raspberry Pi 3 with WiFi

    Most users are probably using Raspbian on their Raspberry Pi 3. However, Raspbian is designed for all Raspberry Pi devices, back to the original Raspberry Pi, which is ARMv6 with an FPU. This does not take advantage of the 64-bit support on the ARMv8 CPU on the Raspberry Pi 3.

    Debian has offered ARM64 support for a while, and being the base distribution for Raspbian, is quite similar. Conveniently, there is a pre-built Debian image for Raspberry Pi 3. You can download it and copy it to an SD card, and it should work out of the box.

    On Linux, the simple dd command shown on the Debian Wiki works. On other platforms, notably Windows, Etcher is reputed to work well and has an easy interface.

    The one flaw with this image is that the WiFi does not work.

    Update: The 20180108 image now works with WiFi out of the box. The following instructions are no longer necessary.

    (Read more...)