  • vmserver
  • Jonathan Haack
  • Haack's Networking
  • netcmnd@jonathanhaack.com

This tutorial documents the steps I took to create an entry-level enterprise VM server using a minimal Debian install on a SuperMicro host with 96GB of RAM and two 8-core, dual-thread CPUs (32 threads total). I estimate that this system can handle up to 28 small VMs with 1 CPU core and 3GB of RAM each (28 x 3GB = 84GB, leaving headroom for the host), or 4 larger VMs with 8 CPU cores and 16GB of RAM each. My intent for this system is to offer Big Blue Button instances to a few small schools and/or educational organizations. Here is how I set the machine up:

  1. One 120GB SSD boot volume for Debian 11 Bullseye

For the boot volume, I wanted some physical protection over contents I might store there, so I made the / partition only 32GB and saved the rest for a LUKS crypt holding my home directory. If you are unclear on how to use pam_mount and LUKS together to unlock your home directory crypt, check Jason's tutorial here. My server is at a data center, so this way I can reboot the remote system easily and still have some content protection if the physical device is compromised. I doubt that will happen, but why not - it just works. I also had 7 usable leftover bays, so I devoted another 1TB drive to a standalone crypt just for kicks, leaving 6 drives for a zfs pool. It is my preference to have LUKS underneath zfs, which I did as follows:

cryptsetup luksFormat /dev/sda1
cryptsetup luksOpen /dev/sda1 sda8fce
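To confirm that a crypt actually opened under the intended mapper name, a quick sanity check looks like this (using the first drive's name from above):

cryptsetup status sda8fce   # should report the mapping as active and show /dev/sda1 underneath
ls -lah /dev/mapper/        # every opened crypt appears here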

To keep straight which devices were used and how, I appended the last four characters of each block ID to its crypt label, as you see above. To find the corresponding block ID, just ls -lah /dev/disk/by-uuid and/or run blkid. I repeated this for each of the 6 drives (sdb, sdc, etc.) I intended to mirror and pool with zfs. Once these crypts were all created and opened (not mounted, just opened), I created the zfs pool as follows:

sudo apt install zfsutils-linux
zpool create -f -m /mnt/vms vms mirror sdafc11 sdb9322 mirror sdc8a33 sdh6444 mirror sde5b55 sdf8066

To make sure the pool was created correctly, I ran df -h to check the mountpoint and size of the pool. I paired two 2TB drives, two 1TB drives, and another two 1TB drives, for a total of around 4TB usable; the other 4TB of raw capacity goes to redundancy, since each mirror vdev is zfs's RAID1 equivalent.

I do not want these drives to unlock automatically with a key file, for two reasons: 1) security, and 2) if a drive breaks, the whole system could potentially fail to boot. I also don't want to use the zfs mountpoint options legacy or none, because they change the behavior of zfs in ways I don't like. However, this means that the automatic pool import zfs performs at boot via zfs-import-cache.service will fail. After much searching for how to stop this failure while preserving the otherwise automatic zfs features, I determined that there was no reliable way to do it. The closest I came was setting cachefile=none once the pool was created; that did stop the service from trying to import the pool at boot, but when I pointed the cachefile option back at its default location, the pool never mounted automatically again. For this reason, I decided that waiting 10 seconds and letting zfs-import-cache.service fail at boot was entirely acceptable to me.
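For reference, the cachefile experiment described above amounts to roughly the following (just a sketch of what I tried; as noted, pointing the cachefile back at its default location never brought the automatic mount back for me):

zpool set cachefile=none vms                    # nothing cached, so zfs-import-cache.service has nothing to import at boot
zpool set cachefile=/etc/zfs/zpool.cache vms    # later, restore the default cache location

So, here is how I set everything up on the server upon reboot: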

sudo -i
screen
su - user [pam_mount unlocks /home for the physical host's primary user, plus the spare 1TB vault]
ctrl-a-d [detaches from screen]

After unlocking my home directory and the spare 1TB vault, the next step is to unlock each LUKS volume in the pool, for which I decided a simple shell script would suffice. It looks like this:

cryptsetup luksOpen /dev/disk/by-uuid/2702e690-...-0c4267a6fc11 sdafc11
cryptsetup luksOpen /dev/disk/by-uuid/e3b568ad-...-cdc5dedb9322 sdb9322
cryptsetup luksOpen /dev/disk/by-uuid/d353e727-...-e4d66a9b8a33 sdc8a33
cryptsetup luksOpen /dev/disk/by-uuid/352660ca-...-5a8beae15b44 sde5b44
cryptsetup luksOpen /dev/disk/by-uuid/fa1a6109-...-f46ce1cf8055 sdf8055
cryptsetup luksOpen /dev/disk/by-uuid/86da0b9f-...-13bc38656466 sdh6466

Obviously not all of my drives have … in the middle of the block ID - this is just me obfuscating because I am paranoid. Also, even though I used short names like sda, sdb, etc., for convenience when setting up the LUKS volumes, I deliberately did not do so here, because those names can sometimes change upon reboot. This is why I did two things: I included the last four characters of the block ID in each LUKS device name, and I also made the script open each drive by its block ID instead of its short name. This ensures that if, for some odd reason, Debian decides to rename sda to sdb or whatever, the LUKS volumes will still open properly. Also note that since the zpool was created against the LUKS names as well, there is no way the pool could accidentally and/or incorrectly start against the short names (which some users report on SE, but which makes no sense to me). At any rate, I simply copy/paste the passphrase 6 times, and then re-import the pool as follows once the volumes are opened:

zpool import vms
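A quick zpool status vms afterwards confirms that all three mirrors are online. As an aside, the six passphrase pastes could be collapsed into one prompt by piping the passphrase to each luksOpen on stdin - a rough sketch (the UUIDs stay elided here just as above, and --key-file=- simply reads the passphrase from standard input):

#!/bin/bash
# prompt once, then open every LUKS member of the pool by UUID
read -r -s -p "LUKS passphrase: " PASS; echo
for entry in \
    "2702e690-...-0c4267a6fc11 sdafc11" \
    "e3b568ad-...-cdc5dedb9322 sdb9322" \
    "d353e727-...-e4d66a9b8a33 sdc8a33" \
    "352660ca-...-5a8beae15b44 sde5b44" \
    "fa1a6109-...-f46ce1cf8055 sdf8055" \
    "86da0b9f-...-13bc38656466 sdh6466"
do
    uuid=${entry%% *}    # first field: block ID
    name=${entry#* }     # second field: LUKS mapper name
    printf '%s' "$PASS" | cryptsetup luksOpen --key-file=- "/dev/disk/by-uuid/$uuid" "$name"
done
unset PASS
zpool import vms         # then bring the pool back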

Altogether, rebooting this server takes about 4 minutes of wait time, and then about 2 minutes to mount my home directory and the 1TB vault and to re-import my zpool. I know this sounds a bit old school, but it ensures that I don't have to travel 63 miles to my data center to figure out what happened. If a drive fails, I will know, and I will still be able to boot. And … because I have offsite backups of the VMs' hardrive.img files, I can easily destroy the zpool and run a smaller pool with only 4 drives (to keep uptime and production going) while I order a new pair of drives and schedule time to visit the center and replace them. Also, if/when the server does not start - remember, that could be a hard drive, or it could be something else - this workflow means you always know what happened (unless the boot volume itself crashes). Of course, the center has remote KVM and so on, but why rely on such kludgy access … this is cleaner, with fewer chances for failure. I do realize, however, that this won't scale to 100X this size, or even 50X, but again - this is entry-level enterprise for advanced self-hosters and/or robust residential use-cases that just exceed normal residential parameters.
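For what it's worth, that interim scenario would amount to rebuilding the pool from the two surviving mirror pairs and restoring images from the offsite backups, roughly like this (a sketch only, assuming the third pair is the one that died; substitute whichever four crypts survive):

zpool destroy vms                                                   # tear down the degraded pool (data is already backed up offsite)
zpool create -f -m /mnt/vms vms mirror sdafc11 sdb9322 mirror sdc8a33 sdh6444
# then copy the backed-up .img files for the affected VMs back into /mnt/vms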

Phew … that was a mouthful … but if I don't write it down, I won't remember what I did lmao. Criticism welcome.

oemb1905 2021/10/29 18:33
