Cloning Proxmox with LVM Thin Pools
During Black Friday last year, I got tempted by a super good offer of a dedicated server in Kansas City with the option of connecting it to the Kansas City Internet Exchange (KCIX). Here are the specs:
- Intel Xeon E5-2620 v4 (8 cores, 16 threads)
- 64 GB DDR4 RAM
- 500 GB SSD
- 1 Gbps unmetered bandwidth
It was the perfect thing for AS200351 (if a bit overkill), so I just had to take it. I set it up during the winter holidays, having decided to install Proxmox to run a bunch of virtual machines, and all was well. Except for one thing—the disk.
You see, the server came with a fancy SAN, with exactly 500 GiB of storage mounted over iSCSI via 10 Gbps Ethernet, backed by a highly reliable ZFS volume (zvol). While this all sounds good on paper, in practice I was barely able to get above 200 MB/s of I/O, even at large block sizes. Nothing I did seemed to help, so I asked the provider to switch me to physical drives.
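For context, the kind of test I mean is a plain sequential read straight off the block device with the page cache bypassed, something along these lines (the device name is illustrative):
# dd if=/dev/sdX of=/dev/null bs=16M count=512 iflag=direct status=progress    # sdX = the iSCSI-backed disk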
Having configured Proxmox just the way I wanted it, I opted against reinstalling it from scratch, instead choosing to clone the disk. The provider suggested using Clonezilla, which should be able to do this sort of disk cloning very quickly. So we found an agreeable time, took the server down, and booted Clonezilla over PXE. All should be good, right?
As it turns out, this ended up being a super painful experience.
Editorial note: This story is based on my memory and incomplete console output. While the broad story is accurate, the commands provided may not be exact.
Firmware issues
The server came with a dual-port Broadcom BCM57810 Ethernet adapter. The first port is used for connecting to the Internet, and the second for the SAN. As it turns out, the BCM57810 requires firmware, which isn’t included in the default Clonezilla image’s initrd. Booting it over PXE therefore caused an immediate crash when the kernel took over the networking. So that was a problem.
No matter. The fancy IPMI on the server supports booting from virtual CD images. So we downloaded a full Clonezilla image and booted it from the virtual CD. After an excruciatingly long wait, we got into Clonezilla. Then, we simply configured the IPs on the SAN interface and brought it up…
# ip link set eth1 up
RTNETLINK answers: No such file or directory
Wait, what? A look at dmesg reveals the problem: the lack of firmware for the BCM57810. It’s obvious what we need: the Debian package firmware-bnx2x. Just a sudo apt install firmware-bnx2x away. Oh wait, we first need to bring up the Internet, which goes through the other port on the BCM57810. This is not good.
As it turns out, Clonezilla comes in two versions. One is Debian-based, but apparently doesn’t ship any of Debian’s firmware packages; this is the one we were using. The other is Ubuntu-based and supposedly comes with all the firmware. Naturally, we decided to boot the Ubuntu-based image as a virtual CD.
Given how slow it was last time, we opted to load the entirety of the Ubuntu-based Clonezilla into RAM, which might be faster (it was not). And then we configured the IPs on the SAN interface and brought it up…
# ip link set eth1 up
RTNETLINK answers: No such file or directory
What the !@#$? As it turns out, even the Ubuntu version of Clonezilla doesn’t come with the required Broadcom firmware. Now what?
Building a firmware ISO
As we’ve seen, though, it’s possible to load virtual CDs over IPMI. This time, since Clonezilla was loaded into RAM, we could eject the Clonezilla CD and replace it with a firmware CD. Now, how do we get one? I could scour the Internet for one… or simply make one myself.
So I downloaded the Debian source package firmware-nonfree, which contains literally every non-free firmware Debian has. I unpacked the .orig.tar.xz and built the ISO:
$ wget http://deb.debian.org/debian/pool/non-free-firmware/f/firmware-nonfree/firmware-nonfree_20230625.orig.tar.xz
...
2024-02-17 18:27:45 (206 MB/s) - ‘firmware-nonfree_20230625.orig.tar.xz’ saved [238122692/238122692]
$ tar xf firmware-nonfree_20230625.orig.tar.xz
$ mkisofs -J -r -o firmware.iso firmware-nonfree-20230625/
I: -input-charset not specified, using utf-8 (detected in locale settings)
Using IWLWI000.UCO;1 for /iwlwifi-so-a0-gf-a0-77.ucode (iwlwifi-Qu-b0-jf-b0-73.ucode)
Using IWLWI001.UCO;1 for /iwlwifi-Qu-b0-jf-b0-73.ucode (iwlwifi-so-a0-gf-a0-72.ucode)
Using IWLWI002.UCO;1 for /iwlwifi-so-a0-gf-a0-72.ucode (iwlwifi-so-a0-gf-a0-74.ucode)
...
Total translation table size: 0
Total rockridge attributes bytes: 302768
Total directory bytes: 790528
Path table size(bytes): 3790
Max brk space used 2d9000
411937 extents written (804 MB)
Notice how I passed -J to enable the Joliet extension and -r to enable the Rock Ridge extension for ISO 9660, for good measure. Without them, all the filenames would be in all caps. Remember that Linux is case-sensitive, so the firmware would fail to load with the wrong case.
Then, we mounted this ISO file, copied the relevant firmware files to /lib/firmware, and ran ip link set eth1 up again. Success! We were able to mount the iSCSI storage.
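From memory, the firmware copy went roughly like this; the /dev/sr1 device name is a guess, and the bnx2x/ subdirectory follows the upstream linux-firmware layout that the firmware-nonfree tarball mirrors:
# mount -o ro /dev/sr1 /mnt              # device name is a guess
# mkdir -p /lib/firmware/bnx2x
# cp /mnt/bnx2x/* /lib/firmware/bnx2x/   # path per the linux-firmware layout
# umount /mnt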
Disk size issues
The astute among you may have noticed something. The iSCSI setup had 500 GiB of storage, while the physical SSDs were 500 GB, which is actually around 465.7 GiB. And there’s one thing Clonezilla can’t do—that’s cloning to a smaller disk. Well, that’s not strictly true, since it has some support.
For example, there is the -k1 option, which shrinks all partitions proportionally. This sounds like exactly what we want, right? Except it’s not.
For reference, this is how the 500 GiB SAN storage was partitioned:
Number Start (sector) End (sector) Size Code Name
1 34 2047 1007.0 KiB EF02
2 2048 2099199 1024.0 MiB EF00
3 2099200 1048575966 499.0 GiB 8E00
Note that EF02 is the BIOS boot partition, because for some reason this system was installed on GPT but configured to boot with legacy BIOS instead of UEFI; EF00 is the EFI system partition; and 8E00 is the Linux LVM, which contains the Proxmox rootfs, the swap, and all the virtual machines in a thin pool.
This is how -k1 allocated the storage:
Number Start (sector) End (sector) Size Code Name
1 34 1906 936.5 KiB EF02
2 1907 2099058 1024.0 MiB EF00
3 2099059 975175646 464.0 GiB 8E00
You can already see the problems: all the partitions are now unaligned (it appears -k1 simply scales the sector numbers by the ratio of the new disk size to the old, about 0.93 here), and there is no way the BIOS boot partition will still work when chopped off. For some reason, it didn’t bother to shrink the EFI system partition at all.
Clonezilla then proceeded to copy the partitions. Unsurprisingly, it failed to copy the BIOS boot partition with ENOSPC, since 1007 KiB will never fit into 936.5 KiB. Despite this, it moved on to the EFI system partition, which copied fine since it hadn’t been shrunk. Then it proceeded to the LVM partition. You would think that a tool that purports to understand filesystems and copy them intelligently would understand LVM, but you would be wrong. No, it just happily spent the next half hour copying the LVM partition as a raw image. Then, 464 GiB in, it failed with ENOSPC (i.e. no space left on device). Shocking.
Full manual
Well, with Clonezilla failing, what else was left to do? Fortunately, the Clonezilla image is an Ubuntu live CD, and that was worth something: I had all the standard Linux tools to do the job by hand.
Note: /dev/sda is the new SSD and /dev/sdc is the iSCSI storage.
Partitioning
First, I had to clean up the mess Clonezilla created. So I ran gdisk /dev/sda and deleted all the broken partitions that it created. Then I recreated the partitions:
- For the BIOS boot partition, I decided that it wasn’t necessary. I’d just boot as UEFI, since the hardware most certainly supports it.
- For the EFI system partition, I just created the same one on /dev/sda, with the exact same offsets.
- For the LVM partition, I gave it all the remaining space.
The result was:
Number Start (sector) End (sector) Size Code Name
1 2048 2099199 1024.0 MiB EF00 EFI system partition
2 2099200 975175639 464.0 GiB 8E00 Linux LVM
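I did this interactively in gdisk, but the same layout could be reproduced non-interactively with sgdisk, roughly like this (a sketch, not the exact commands I ran):
# sgdisk --zap-all /dev/sda
# sgdisk -n 1:2048:2099199 -t 1:ef00 -c 1:"EFI system partition" /dev/sda
# sgdisk -n 2:2099200:0 -t 2:8e00 -c 2:"Linux LVM" /dev/sda    # an end of 0 means "rest of the disk"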
Recreating LVM
With the partition created, I had to recreate the LVM setup. The original volume group for Proxmox is pve, so I uncreatively created pve2:
# vgcreate pve2 /dev/sda2
Then, I recreated the logical volumes. Since I was 35 GiB short, I decided to shrink the root volume, which had been 96 GiB; I didn’t need that much space anyway:
# lvcreate -n root -L 50G pve2
# lvcreate -n swap -L 8G pve2
Copying rootfs
Then, I had to copy the root filesystem. However, I couldn’t simply dd 96 GiB into 50 GiB, so now what?
Well, this sounded like a job Clonezilla could do, so I started Clonezilla again, this time in partition copying mode. To my horror, I discovered that Clonezilla can only copy “real” partitions, not LVs. I guess I had to do it the hard way then. So I shrank the iSCSI rootfs and then dd’ed it over:
# resize2fs /dev/pve/root 50G
# dd if=/dev/pve/root of=/dev/pve2/root bs=16M count=3200 status=progress
...
53687091200 bytes (54 GB, 50 GiB) copied, 229.59 s, 223 MB/s
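One thing to watch out for: resize2fs normally refuses to shrink a filesystem that hasn’t had a recent check, so an fsck pass on the (unmounted) LV may be needed first:
# e2fsck -f /dev/pve/root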
Copying swap
I didn’t bother with this; I just recreated the swap:
# mkswap /dev/pve2/swap
Thin pool
As for the thin pool, at first I thought I could recreate it like a normal LV and just dd it over, but that wasn’t the case. I needed a proper new thin pool:
# lvdisplay --units B pve/data
--- Logical volume ---
LV Name data
VG Name pve
...
LV Size 398802812928 B
...
# lvcreate -T -n data -L 398802812928B --poolmetadatasize 4G pve2
Note: I had no idea how to view the metadata size for the pool, so I eyeballed it from lsblk and just decided to use 4G. This doesn’t really matter for what we are about to do.
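For what it’s worth, I have since learned that the pool’s metadata volume shows up as a hidden LV, so its size can be read with lvs -a, something like:
# lvs -a --units m -o lv_name,lv_size pve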
Now, I needed to recreate all the thin LVs for the VMs:
# for i in {100..109}; do
> lvcreate -V "$(lvdisplay --units B pve/vm-"$i"-disk-0 | grep 'LV Size' | cut -d' ' -f 20)"B -T pve2/data -n vm-"$i"-disk-0
> done
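That grep/cut pipeline is fragile, since it depends on the exact spacing of lvdisplay output; something like lvs with --noheadings would have been a more robust way to read the sizes (a sketch, not what I actually ran):
# for i in {100..109}; do
> size="$(lvs --noheadings --nosuffix --units b -o lv_size pve/vm-"$i"-disk-0 | tr -d ' ')"
> lvcreate -V "$size"b -T pve2/data -n vm-"$i"-disk-0
> done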
With that done, I needed to dd all of them over, using conv=sparse to keep the volumes as thin as possible:
# for i in {100..109}; do
> echo Cloning "$i"...
> dd if=/dev/pve/vm-"$i"-disk-0 of=/dev/pve2/vm-"$i"-disk-0 bs=256k conv=sparse status=progress
> done
Cloning 100...
...
40960+0 records in
40960+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 22.8633 s, 470 MB/s
Note: For conv=sparse to work properly, you need to use a small block size. If there is any non-zero byte in a block, dd will write the entire block. 256k seems to work decently well here.
Fixing boot
At this point, everything in /etc, such as /etc/fstab, still referred to the volume group pve, so we couldn’t have it be called pve2 on the new SSD. This was easy to fix:
# vgrename pve pve-old
Volume group "pve" successfully renamed to "pve-old"
# vgrename pve2 pve
Volume group "pve2" successfully renamed to "pve"
Also, since I’d recreated all the LVM volume groups and volumes, the grub configuration was now invalid, as it hardcoded the paths to the old LVs. This meant I needed to chroot into the cloned system:
# mkdir /mnt/new
# mount /dev/pve/root /mnt/new/
# for dir in /proc /sys /sys/firmware/efi/efivars /dev /dev/pts; do
> mount --bind "$dir" "/mnt/new$dir"
> done
# chroot /mnt/new /bin/bash
# mount /dev/sda1 /boot/efi/
This mounted all the required filesystems for the chroot to function, as well as the EFI system partition so that grub-install would work. Now we can fix grub:
# grub-install /dev/sda
Installing for x86_64-efi platform.
Installation finished. No error reported.
# update-grub
Generating grub configuration file ...
Found linux image: ...
Found initrd image: ...
Found memtest86+ 64bit EFI image: /boot/memtest86+x64.efi
done
And we are done! Now, all I needed to do was exit the chroot and reboot.
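To keep things tidy before the reboot, the chroot mounts can be torn down; with a reasonably recent util-linux, a recursive unmount handles the nested bind mounts (a sketch):
# umount /boot/efi     # still inside the chroot
# exit
# umount -R /mnt/new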
Conclusion
After the whole ordeal with Clonezilla, in which nothing worked right, I was pleasantly surprised that the system just booted up as if nothing had happened. The migration was a success.
Given that I wasted three hours with Clonezilla while the manual migration took only an hour or so, I must recommend against using Clonezilla for anything more complicated than copying plain filesystems on ordinary MBR or GPT partitions.
I leave these instructions here on my blog so you won’t be tempted to use Clonezilla or some other tool, only to have it fail spectacularly and waste your time. Use them well.