During Black Friday last year, I was tempted by a very good offer on a dedicated server in Kansas City, with the option of connecting it to the Kansas City Internet Exchange (KCIX). Here are the specs:

  • Intel Xeon E5-2620 v4 (8 cores, 16 threads)
  • 64 GB DDR4 RAM
  • 500 GB SSD
  • 1 Gbps unmetered bandwidth

It was the perfect thing for AS200351 (if a bit overkill), so I just had to take it. I set it up during the winter holidays, having decided to install Proxmox to run a bunch of virtual machines, and all was well. Except for one thing—the disk.

You see, the server came with a fancy SAN: exactly 500 GiB of storage mounted over iSCSI via 10 Gbps Ethernet, backed by a highly reliable ZFS volume (zvol). While this all sounds good on paper, in practice I could barely hit 200 MB/s doing I/O, even at large block sizes. Nothing I did seemed to help, so I asked the provider to switch it to physical drives.
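For reference, this is roughly the kind of crude sequential-throughput check I was doing (a sketch; the file name is arbitrary, and a proper tool like fio gives far more reliable numbers):

```shell
# Crude sequential write test. conv=fsync forces the data to disk before
# dd reports its rate, so the page cache doesn't inflate the number.
# 64 MiB keeps this example quick; use a much larger size for a real test.
dd if=/dev/zero of=ddtest.img bs=16M count=4 conv=fsync status=progress
```

Reads can be checked the same way, with if= pointing at the device and iflag=direct to bypass the cache.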

Having configured Proxmox just the way I wanted it, I opted against reinstalling it from scratch, instead choosing to clone the disk. The provider suggested using Clonezilla, which should be able to do this sort of disk cloning very quickly. So we found an agreeable time, took the server down, and booted Clonezilla over PXE. All should be good, right?

As it turns out, this ended up being a super painful experience.

Editorial note: This story is based on my memory and incomplete console output. While the broad story is correct, the commands provided may not be correct.

Firmware issues

The server came with a dual-port Broadcom BCM57810 Ethernet adapter. The first port is used for connecting to the Internet, and the second for the SAN. As it turns out, the BCM57810 requires firmware that isn’t included in the default Clonezilla image’s initrd, so booting over PXE simply crashed the moment the kernel took over the networking. So that was a problem.

No matter. The fancy IPMI on the server supports booting from virtual CD images. So we downloaded a full Clonezilla image and booted it from the virtual CD. After an excruciatingly long wait, we got into Clonezilla. Then, we simply configured the IPs on the SAN interface and brought it up…

# ip link set eth1 up
RTNETLINK answers: No such file or directory

Wait, what? A look at dmesg reveals the problem: the missing firmware for the BCM57810. The fix is obvious: the Debian package firmware-bnx2x, just a sudo apt install firmware-bnx2x away. Oh wait, we’d first need to bring up the Internet, which goes through the other port of the very same BCM57810. This is not good.

As it turns out, Clonezilla comes in two versions. One is Debian-based but apparently ships without any of Debian’s firmware packages; this is the one we were using. The alternate image is Ubuntu-based and supposedly comes with all the firmware. Naturally, we decided to boot the Ubuntu-based image as a virtual CD.

Given how slow it was last time, we opted to load the entirety of the Ubuntu-based Clonezilla into RAM, which might be faster (it was not). And then we configured the IPs on the SAN interface and brought it up…

# ip link set eth1 up
RTNETLINK answers: No such file or directory

What the !@#$? As it turns out, even the Ubuntu version of Clonezilla doesn’t come with the required Broadcom firmware. Now what?

Building a firmware ISO

As we’d seen, though, it’s possible to load virtual CDs over IPMI. This time we’d loaded Clonezilla entirely into RAM, so we could eject the Clonezilla CD and replace it with a firmware CD. Now, how do we get one? I could scour the Internet for one… or simply make one myself.

So I downloaded the Debian source package firmware-nonfree, which contains every non-free firmware blob Debian ships. I unpacked the .orig.tar.xz and built the ISO:

$ wget http://deb.debian.org/debian/pool/non-free-firmware/f/firmware-nonfree/firmware-nonfree_20230625.orig.tar.xz
...
2024-02-17 18:27:45 (206 MB/s) - ‘firmware-nonfree_20230625.orig.tar.xz’ saved [238122692/238122692]
$ tar xf firmware-nonfree_20230625.orig.tar.xz
$ mkisofs -J -r -o firmware.iso firmware-nonfree-20230625/
I: -input-charset not specified, using utf-8 (detected in locale settings)
Using IWLWI000.UCO;1 for  /iwlwifi-so-a0-gf-a0-77.ucode (iwlwifi-Qu-b0-jf-b0-73.ucode)
Using IWLWI001.UCO;1 for  /iwlwifi-Qu-b0-jf-b0-73.ucode (iwlwifi-so-a0-gf-a0-72.ucode)
Using IWLWI002.UCO;1 for  /iwlwifi-so-a0-gf-a0-72.ucode (iwlwifi-so-a0-gf-a0-74.ucode)
...
Total translation table size: 0
Total rockridge attributes bytes: 302768
Total directory bytes: 790528
Path table size(bytes): 3790
Max brk space used 2d9000
411937 extents written (804 MB)

Notice how I passed -J to enable the Joliet extensions, plus -r to enable Rock Ridge for good measure. Without them, plain ISO 9660 forces all filenames into upper case, and since Linux filenames are case-sensitive, the firmware would fail to load under the wrong case.

Then, we mounted this ISO file, copied the relevant firmware files to /lib/firmware, and ran ip link set eth1 up again. Success! We were able to attach the iSCSI storage.

Disk size issues

The astute among you may have noticed something. The iSCSI setup had 500 GiB of storage, while the physical SSD was 500 GB, which is only around 465.7 GiB. And cloning to a smaller disk is the one thing Clonezilla can’t do. Well, that’s not strictly true, since it has some limited support.
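The mismatch is just decimal versus binary units: drive vendors sell gigabytes of 10^9 bytes, while partitioning tools report GiB of 2^30 bytes. A quick check:

```shell
# 500 "marketing" GB expressed in GiB (2^30 bytes)
echo $(( 500000000000 / 1073741824 ))            # → 465
awk 'BEGIN { printf "%.1f\n", 500e9 / 2^30 }'    # → 465.7
```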

For example, there’s the -k1 option, which shrinks all partitions proportionally. That sounds like exactly what we want, right? Except it’s not.

For reference, this is how the 500 GiB SAN storage was partitioned:

Number  Start (sector)    End (sector)  Size       Code  Name
   1              34            2047   1007.0 KiB  EF02  
   2            2048         2099199   1024.0 MiB  EF00  
   3         2099200      1048575966   499.0 GiB   8E00  

Note that EF02 is the BIOS boot partition, because for some reason this system was installed on GPT but configured to boot with legacy BIOS instead of UEFI; EF00 is the EFI system partition; and 8E00 is the Linux LVM, which contains the Proxmox rootfs, the swap, and all the virtual machines in a thin pool.

This is how the -k1 allocated the storage:

Number  Start (sector)    End (sector)  Size       Code  Name
   1              34            1906   936.5 KiB   EF02  
   2            1907         2099058   1024.0 MiB  EF00  
   3         2099059       975175646   464.0 GiB   8E00 

You can already see the problems: all the partitions are now unaligned, and there is no way the BIOS boot partition will work with its tail chopped off. And for some reason, it didn’t bother to shrink the EFI system partition at all.

Clonezilla then proceeded to copy the partitions. Unsurprisingly, it failed on the BIOS boot partition with ENOSPC, since 1007 KiB will never fit into 936.5 KiB. Despite this, it carried on to the EFI system partition, which copied fine since it hadn’t been shrunk. Then it moved on to the LVM partition. You would think that a tool which purports to understand filesystems and copy them intelligently would understand LVM, but you would be wrong. No, it just happily spent the next half hour copying the LVM partition as a raw image. Then, 464 GiB in, it failed with ENOSPC (i.e. no space left on device). Shocking.

Full manual

Well, with Clonezilla failing, what else was left to do? Fortunately, the Clonezilla image is an Ubuntu live CD, and that was worth something: I had all the standard Linux tools to do the job by hand.

Note: /dev/sda is the new SSD and /dev/sdc is the iSCSI storage.

Partitioning

First, I had to clean up the mess Clonezilla created. So I ran gdisk /dev/sda and deleted all the broken partitions that it created. Then I recreated the partitions:

  • For the BIOS boot partition, I decided that it wasn’t necessary. I’d just boot as UEFI since the hardware most certainly supports it.
  • For the EFI system partition, I just created the same on /dev/sda, with the exact same offsets.
  • For the LVM partition, I gave it all the remaining space.

The result was:

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         2099199   1024.0 MiB  EF00  EFI system partition
   2         2099200       975175639   464.0 GiB   8E00  Linux LVM

Recreating LVM

With the partition created, I had to recreate the LVM setup. The original volume group for Proxmox is pve, so I uncreatively created pve2:

# vgcreate pve2 /dev/sda2

Then I recreated the logical volumes. Since I was about 35 GiB short, I decided to shrink the root volume, originally 96 GiB, down to 50 GiB; I didn’t need that much space anyway:

# lvcreate -n root -L 50G pve2
# lvcreate -n swap -L 8G pve2

Copying rootfs

Then, I had to copy the root filesystem. However, I couldn’t simply dd 96 GiB into 50 GiB, so now what?

Well, this sounded like a job Clonezilla could do, so I started Clonezilla again, this time in partition copying mode. To my horror, I discovered that Clonezilla could only copy “real” partitions, not LVs. I guess I had to do it the hard way then. So I shrank the filesystem on the iSCSI rootfs and then dded it over:

# resize2fs /dev/pve/root 50G
# dd if=/dev/pve/root of=/dev/pve2/root bs=16M count=3200 status=progress
...
53687091200 bytes (54 GB, 50 GiB) copied, 229.59 s, 223 MB/s
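The bs/count pair is just 50 GiB spelled out, matching the resize2fs target exactly:

```shell
# 3200 blocks of 16 MiB is exactly 50 GiB
echo $(( 3200 * 16 * 1024 * 1024 ))    # → 53687091200
```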

Copying swap

I didn’t bother copying this; I just recreated the swap:

# mkswap /dev/pve2/swap

Thin pool

As for the thin pool, at first I thought I could recreate it like a normal LV and just dd it over, but that wasn’t the case. I needed a proper new thin pool:

# lvdisplay --units B pve/data
  --- Logical volume ---
  LV Name                data
  VG Name                pve
  ...
  LV Size                398802812928 B
  ...
# lvcreate -T -n data -L 398802812928B --poolmetadatasize 4G pve2

Note: I had no idea how to view the metadata size for the pool, so I eyeballed it from lsblk and just decided to use 4G. This doesn’t really matter for what we are about to do.

Now, I needed to recreate all the thin LVs for the VMs:

# for i in {100..109}; do
>     lvcreate -V "$(lvdisplay --units B pve/vm-"$i"-disk-0 | grep 'LV Size' | cut -d' ' -f 20)"B -T pve2/data -n vm-"$i"-disk-0
> done
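A caveat on that one-liner: the cut -d' ' -f 20 field index is at the mercy of lvdisplay’s exact column padding. awk splits on runs of whitespace, which makes extracting the size less fragile. A sketch against a canned sample line (the size shown here is made up):

```shell
# awk numbers fields after collapsing whitespace, so the 'LV Size'
# value is always field 3, no matter how lvdisplay pads its columns.
sample='  LV Size                10737418240 B'
printf '%s\n' "$sample" | awk '/LV Size/ { print $3 }'    # → 10737418240
```

In the loop above, that would read lvcreate -V "$(lvdisplay --units B pve/vm-"$i"-disk-0 | awk '/LV Size/ { print $3 }')"B -T pve2/data -n vm-"$i"-disk-0.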

With that done, I needed to dd all of them over, using conv=sparse to keep the volumes as thin as possible:

# for i in {100..109}; do
>     echo Cloning "$i"...
>     dd if=/dev/pve/vm-"$i"-disk-0 of=/dev/pve2/vm-"$i"-disk-0 bs=256k conv=sparse status=progress
> done
Cloning 100...
...
40960+0 records in
40960+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 22.8633 s, 470 MB/s

Note: For conv=sparse to work properly, you need to use a small block size. If there is any non-zero byte in a block, dd will write the entire block. 256k seems to work decently well here.
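This is easy to demonstrate on a throwaway file, no root required (the file names are arbitrary):

```shell
# Build a 1 MiB source that is all zeros except the very last byte,
# then copy it with conv=sparse: only the final 4 KiB block is written,
# and the rest becomes a hole in the destination.
truncate -s 1M src.img
printf 'x' | dd of=src.img bs=1 seek=1048575 conv=notrunc 2>/dev/null
dd if=src.img of=dst.img bs=4k conv=sparse 2>/dev/null
stat -c %s dst.img    # logical size: 1048576
du -k dst.img         # allocated size: just a few KiB
```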

Fixing boot

At this point, everything in /etc, such as /etc/fstab, still referred to the volume group pve, so we couldn’t have it be called pve2 on the new SSD. This was easy to fix:

# vgrename pve pve-old
  Volume group "pve" successfully renamed to "pve-old"
# vgrename pve2 pve
  Volume group "pve2" successfully renamed to "pve"

Also, since I’d recreated all the LVM volume groups and volumes, the grub configuration was now invalid, as it hardcoded the paths to the old LVs. This meant I needed to chroot into the cloned system:

# mkdir /mnt/new
# mount /dev/pve/root /mnt/new/
# for dir in /proc /sys /sys/firmware/efi/efivars /dev /dev/pts; do
>     mount --bind "$dir" "/mnt/new$dir"
> done
# chroot /mnt/new /bin/bash
# mount /dev/sda1 /boot/efi/

This mounted all the required filesystems for the chroot to function, as well as the EFI system partition so that grub-install would work. Now we can fix grub:

# grub-install /dev/sda
Installing for x86_64-efi platform.
Installation finished. No error reported.
# update-grub
Generating grub configuration file ...
Found linux image: ...
Found initrd image: ...
Found memtest86+ 64bit EFI image: /boot/memtest86+x64.efi
done

And we are done! Now, all I needed to do was exit the chroot and reboot.

Conclusion

After the whole ordeal with Clonezilla, in which nothing worked right, I was pleasantly surprised that the system just booted up as if nothing had happened. The migration was a success.

Given that I wasted 3 hours with Clonezilla while the manual migration only took an hour or so, I must recommend against using Clonezilla for anything more complicated than copying plain filesystems in ordinary MBR or GPT partitions.

I leave these instructions here on my blog so you won’t be tempted to use Clonezilla or some other tool, only to have it fail spectacularly and waste your time. Use them well.