Haack's Wiki


vmserver
Jonathan Haack
Haack's Networking
netcmnd@jonathanhaack.com

[Spoiler Alert: I changed the initial setup! Read below to skip the pain!]

Context: Why? I am a math/CS teacher for HS/college levels. I use free software to help my educational needs, and I would like to use self-hosted Big Blue Button to assist me when using the Inquirer-Presenter-Scribe thinking routine. I also figured it could replace my current business website and host 3/4 instances of NMBBB for small schools, etc. And thanks to Kilo Sierra for the kind hardware donation!

– First Setup –

I have a dual 8-core Xeon SuperMicro server (circa 08-12), with 8 HD bays in use, 96GB RAM, SAS to SATA SCU for the hard drives, and 1 drive being a 120GB SSD boot volume running Debian Bullseye. I calculated the load against the 500W PSU: the RAM would be around 360W at capacity but would rarely hit that or even come close; the HDs would often (especially on boot) hit up to 21.3W per drive, or around 150W for the 7 drives excluding the boot SSD volume; and the motherboard would be 100W, putting me at 610W. This is over the PSU's rating; however, I don't expect all the RAM and all the HDs to reach peak at once, even on boot.

After testing and confirming it worked (many times lol), I went on to install the physical host OS (Debian) and set up the basics of the system (hostname, DNS, etc., basic package installs). Again, I used the 120GB SSD for Debian (with a home crypt) and kept the other 7 drives for a pool/LVM/spare crypt. My initial idea was to do LUKS first, then zfs, meaning 6 could be mirrors in zfs and I would keep 1 as a spare LUKS crypt for keys, other crap, etc. To create the LUKS crypts, I did the following 6 times, each time appending the last 4 digits of the block ID to the LUKS crypt name:

cryptsetup luksFormat /dev/sda
cryptsetup luksOpen /dev/sda sdafc11
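The naming convention above (short name plus the last 4 characters of the ID) can be sketched as a small helper. This is only an illustration: luks_name and luks_cmds are hypothetical names, the UUID in the example is made up, and the commands are printed rather than run.

```shell
#!/bin/sh
# Sketch: derive the mapper name "<short name><last 4 characters of the UUID>".
# luks_name and luks_cmds are hypothetical helpers, not part of cryptsetup.
luks_name() (
    short=$1   # kernel short name, e.g. sda
    uuid=$2    # UUID string for the drive
    # Keep only the final 4 characters of the UUID.
    suffix=$(printf '%s' "$uuid" | tail -c 4)
    printf '%s%s\n' "$short" "$suffix"
)

# Print (not run) the two cryptsetup commands for one drive.
luks_cmds() (
    short=$1
    uuid=$2
    name=$(luks_name "$short" "$uuid")
    printf 'cryptsetup luksFormat /dev/%s\n' "$short"
    printf 'cryptsetup luksOpen /dev/%s %s\n' "$short" "$name"
)

# Example with a made-up UUID (an assumption, not one of the real drives).
luks_cmds sda 11111111-2222-3333-4444-55555555fc11
```

Printing the commands first makes it easy to eyeball the names before anything destructive like luksFormat is actually run.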

Make sure to use the LUKS crypt names when making the zpool, not the kernel short names (sda, sdb, etc.), which can change between reboots. I did this as follows:

sudo apt install zfsutils-linux
zpool create -f -m /mnt/vms vms mirror sdafc11 sdb9322 mirror sdc8a33 sdh6466 mirror sde5b44 sdf8055

ZFS by default executes its mount commands at boot. This is a problem if you don't use auto-unlocking with key files for LUKS (and/or a custom script that unlocks) at boot. The problem, in this use case, is that ZFS will try to mount the volumes before they are unlocked. The two other options are the none/legacy mountpoint modes, both of which rely on you mounting the volume using traditional methods. But the whole point of finally using zfs was to not use traditional methods lol, so I investigated whether there was a fix. The closest to a fix is setting cachefile=none, but this a) hosed the pool once and b) requires resetting, rebooting again, and/or manually re-mounting the pool, any of which defeats the point. Using key files, cache file adjustments, etc., and/or none/legacy were all no-gos for me, so in the end I decided to tolerate that zfs would fail at boot, and that I would zpool import the pool afterwards.

sudo -i
screen
su - user [pam_mount unlocks /home for physical host primary user and the spare 1TB vault]
ctrl-a-d [detaches from screen]

After unlocking my home directory and the spare 1TB vault, the next step is to unlock each LUKS volume. I decided a simple shell script, mount-luks.sh, would suffice, and it looks like this:

cryptsetup luksOpen /dev/disk/by-uuid/2702e690-…-0c4267a6fc11 sdafc11
cryptsetup luksOpen /dev/disk/by-uuid/e3b568ad-…-cdc5dedb9322 sdb9322
cryptsetup luksOpen /dev/disk/by-uuid/d353e727-…-e4d66a9b8a33 sdc8a33
cryptsetup luksOpen /dev/disk/by-uuid/352660ca-…-5a8beae15b44 sde5b44
cryptsetup luksOpen /dev/disk/by-uuid/fa1a6109-…-f46ce1cf8055 sdf8055
cryptsetup luksOpen /dev/disk/by-uuid/86da0b9f-…-13bc38656466 sdh6466

This script simply opens each LUKS crypt, so long as you enter or copy/paste your HD password 6 times. Once the volumes are opened, one has to re-mount the pool (which reassembles the quasi-RAID1 mirrors) with the import command as follows:

zpool import vms
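Since the import only makes sense once every crypt is open, a guard along these lines could sit in front of it. This is a sketch under assumptions: missing_crypts and import_if_ready are hypothetical helpers, MAPPER_DIR is an overridable variable I introduced so the check can be tried against a scratch directory, and the import command is only printed.

```shell
#!/bin/sh
# Sketch: only import the pool once every expected mapper device exists.
# MAPPER_DIR is an assumption added for testing; it defaults to /dev/mapper.
MAPPER_DIR=${MAPPER_DIR:-/dev/mapper}

# Print the name of every expected crypt that is not present in MAPPER_DIR.
missing_crypts() {
    for name in "$@"; do
        [ -e "$MAPPER_DIR/$name" ] || printf '%s\n' "$name"
    done
}

# Print the import command only when nothing is missing; otherwise list
# the locked crypts on stderr and return non-zero.
import_if_ready() {
    pool=$1; shift
    missing=$(missing_crypts "$@")
    if [ -n "$missing" ]; then
        # $missing is intentionally unquoted: one line per missing crypt.
        printf 'still locked: %s\n' $missing >&2
        return 1
    fi
    printf 'zpool import %s\n' "$pool"
}
```

For example, `import_if_ready vms sdafc11 sdb9322` prints the import command only if both names exist under /dev/mapper.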

Rebooting in this manner takes about 3-5 minutes for the host, and 2 minutes to screen into my user name, detach, and run the mount LUKS script (which ends by importing the pool). The above was the original setup. I changed that below.

– Alternate Setup –

Eventually, I ended up agreeing with a friend that it made no sense to do LUKS first, because that would preclude me from rebuilding degraded pools using zfs tools: if a drive failed, then the pool could never be imported, and thus never used without very complicated tinkering. So, I destroyed the LUKS-backed pool above and made a zfs pool with the same command structure, but using the regular short names only. After that, I created encrypted datasets (zfs' name for the filesystems that live inside a pool). The datasets each unlock by pulling a dd-generated key from the encrypted home partition on the SSD boot volume. I set up the keys/datasets as follows:

dd if=/dev/random of=/mnt/vault/example.key bs=1 count=32
zfs create -o encryption=on -o keyformat=raw -o keylocation=file:///mnt/vault/example.key pool/dataset
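A raw key for zfs has to be exactly 32 bytes, so it is worth verifying the file before pointing keylocation at it. Here is a minimal sketch, assuming a hypothetical helper named make_raw_key; note that it reads from /dev/urandom rather than /dev/random, which is my substitution, not what the commands above use.

```shell
#!/bin/sh
# Sketch: create a 32-byte raw key readable only by its owner, then verify the size.
# make_raw_key is a hypothetical helper; the key path is whatever you pass in.
make_raw_key() (
    keyfile=$1
    umask 077   # files created in this subshell get owner-only permissions
    # /dev/urandom is an assumption here; the original command uses /dev/random.
    dd if=/dev/urandom of="$keyfile" bs=1 count=32 2>/dev/null
    size=$(wc -c < "$keyfile" | tr -d ' ')
    if [ "$size" -ne 32 ]; then
        echo "key is $size bytes, expected 32" >&2
        exit 1
    fi
    echo "$keyfile: $size bytes"
)
```

Calling `make_raw_key /mnt/vault/example.key` would then produce a key suitable for the keyformat=raw option shown above.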

When you create this on the current running instance, it will also mount it for you as a courtesy, but upon reboot, you need to load the key, then mount the dataset using zfs commands. In my case, I created three datasets (one for raw isos, one for disk images, and a last one for backup sparse tarballs). After a reboot, each one is unlocked and mounted as follows:

zfs load-key pool/dataset
zfs mount pool/dataset
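With several datasets, the two commands above can be wrapped in a loop. The following is a rough sketch rather than my actual script: ZFS_CMD is a variable I made up that defaults to a dry run, and the three dataset names are placeholders.

```shell
#!/bin/sh
# Sketch: load the key for, then mount, each encrypted dataset in turn.
# ZFS_CMD defaults to "echo zfs", so this only prints what it would run;
# set ZFS_CMD=zfs to execute for real. Dataset names below are placeholders.
ZFS_CMD=${ZFS_CMD:-echo zfs}

unlock_datasets() {
    for ds in "$@"; do
        # Stop at the first failure so a missing key is noticed right away.
        $ZFS_CMD load-key "$ds" || return 1
        $ZFS_CMD mount "$ds" || return 1
    done
}

unlock_datasets pool/isos pool/images pool/backups
```

Running it as-is prints six zfs commands in order, which is a cheap way to check the dataset list before running it with ZFS_CMD=zfs after a reboot.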

Once I created all the datasets, I made a script that would load the keys and unlock all of them, then rebooted and tested it for functionality. Upon verifying that the datasets worked, I could now feel comfortable creating VMs again, since the hard drive images for those VMs would be stored in encrypted datasets with zfs. My next task was to set up snapshots within zfs, which would handle routine rollbacks and smaller errors/mistakes. I did that by creating a small script that runs via cron 4 times a day, or every 6 hours:

DATE=$(date +"%Y%m%d-%H:%M:%S")
/usr/sbin/zfs snapshot -r pool/vm1dataset@backup_$DATE
/usr/sbin/zfs snapshot -r pool/vm2dataset@backup_$DATE
/usr/sbin/zfs snapshot -r pool/@backup_$DATE
/usr/sbin/zfs snapshot pool@backup_$DATE
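The key idea in the snapshot script is to compute the timestamp once and reuse it, so every snapshot from the same run shares one name. A hedged sketch of that pattern follows; snap_stamp and snapshot_cmds are hypothetical helper names, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: build one timestamp and reuse it for every dataset's snapshot name.
snap_stamp() {
    date +"%Y%m%d-%H:%M:%S"
}

# Print the snapshot command for each given dataset (dry run only).
snapshot_cmds() (
    stamp=$(snap_stamp)   # computed once per run, shared by all datasets
    for ds in "$@"; do
        printf '/usr/sbin/zfs snapshot -r %s@backup_%s\n' "$ds" "$stamp"
    done
)

snapshot_cmds pool/vm1dataset pool/vm2dataset
```

For the every-6-hours schedule, a crontab entry of the form `0 */6 * * *` followed by the script's path (whatever you named it) would do the job.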

The snapshots allow me to perform rollbacks when end-users make mistakes, e.g., delete an instructional video after a class session. However, if the data center is compromised physically or their upstream goes down, I also need remote/failover options. So my next task was to find a way to take advantage of cp's and tar's understanding of sparse files, so that I could use rsync to bring over tarballs of the VM disks that only contain actual data, instead of the entire 1TB container. To do this, I used the c and S flags in tar, together with parallel bzip2 (pbzip2) compression for speed. I did this as follows; take care when adjusting this script, as most alterations will break the ability of tar to properly treat the .img file as sparse:

DATE=$(date +"%Y%m%d-%H:%M:%S")
cd /backups
cp -ar /vms/vol.img /backups/vol.img_QUICK_.bak
bsdtar --use-compress-program=pbzip2 -Scf vol.img_QUICK_.tar.bz2 vol.img_QUICK_.bak
mv /backups/vol.img_QUICK_.tar.bz2 /backups/tarballs/vol.img_QUICK_$DATE.tar.bz2
rm /backups/vol.img_QUICK_.bak
find /backups/tarballs -type f -mtime +30 -delete
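To see why the sparse handling matters, here is a small self-contained demo that can be run safely in a scratch directory. It is a sketch with a swapped-in tool: it uses GNU tar's -S (--sparse) flag with no compression, not the bsdtar/pbzip2 pair from the script above, and sparse_demo is a hypothetical name.

```shell
#!/bin/sh
# Sketch: show that a sparse-aware archive stores only real data.
# Assumes GNU tar and GNU coreutils (truncate, mktemp).
sparse_demo() (
    work=$(mktemp -d) || exit 1
    cd "$work" || exit 1
    # 10 MiB apparent size with no data written: a stand-in for a mostly empty vol.img.
    truncate -s 10M vol.img
    # -S tells GNU tar to detect holes and skip them in the archive.
    tar -cSf vol.tar vol.img
    apparent=$(wc -c < vol.img | tr -d ' ')
    archived=$(wc -c < vol.tar | tr -d ' ')
    echo "image: $apparent bytes, archive: $archived bytes"
    cd / && rm -rf "$work"
)

sparse_demo
```

The archive should come out far smaller than the image's apparent size, which is the same effect that keeps the tarballs of a 1TB container manageable.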

In addition to daily live images using the QUICK script above, I also run a version once every 3 days called SANE, which runs virsh shutdown domain before copying/tarballing and then runs virsh start domain at the end of the tarballing. The host is set to keep 30 days' worth of images, but you can easily adjust the -mtime flag in the last line above to your use case. After these run, pull the changes to the offsite backup computer using rsync on the offsite host as follows:

sudo rsync -av --log-file=/home/logs/backup-of-vm.log --ignore-existing -e 'ssh -i /home/user/.ssh/id_rsa' root@domain.com:/backups/tarballs/ /media/user/Backups/

Since the workstation is on rsnapshot, I get redundant dailies on its backup that extend beyond the quantity on the physical host (because of space on my primary workstation). This new setup runs the following domains in production:

Physical Host (The SuperMicro described above ;> )
VM1 - Haack's Networking (my business lol, which includes this site/post)
VM2 - NM Big Blue Button project (the driving reason for this change)

– Network Bridge Setup / VMs –

Once the physical host was set up, I created two vanilla VMs using the virt-manager GUI with X-forwarding over ssh prior to bringing the server on site. Once those were set up, I headed to the DataCenter figuring I might have to tinker with bridging and network configurations a bit onsite before leaving the device there indefinitely, subject to 24-hour-notice emergency KVM. Once there, I worked for about 3 hours configuring the interfaces for bridge mode, ultimately with two physical ethernet cables into the device, one on a non-bridged static IP / interface and the other on a static IP / interface dedicated to bridging. After about a week of thinking back on my Slackware phases, my FreeBSD phases and the late 90s and early 2000s, … AND … a lot of Stack Exchange tutorials, I decided on the manual command line approach, utilizing no desktop tools to manage interfaces, just stripped down Debian with no network-manager etc., and just manual entries for needed functionality. Here's what I came up with:

sudo nano /etc/network/interfaces

That file should look like this (adjust to your use-case, ofc):

#eth0 (alt name enp8s0g0) physical host base-connection
auto enp8s0g0
iface enp8s0g0 inet static
  address 8.25.76.160
  netmask 255.255.255.0
  gateway 8.25.76.1
  dns-nameservers 8.8.8.8

#eth1 (alt name enp8s0g1) interface for bridge
auto enp8s0g1
iface enp8s0g1 inet manual

auto br0
iface br0 inet static
  address 8.25.76.159
  netmask 255.255.255.0
  gateway 8.25.76.1
  bridge_ports enp8s0g1
  dns-nameservers 8.8.8.8
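Before rebooting with a file like the one above, it can help to double-check that each static address and its gateway really sit in the same network, since a typo there means no upstream link. A minimal sketch in plain sh, with hypothetical helper names (ip_to_int, network_of, same_network) that are not part of ifupdown:

```shell
#!/bin/sh
# Sketch: confirm that an address and its gateway fall in the same network.
# Convert a dotted quad to a single integer.
ip_to_int() (
    IFS=.
    set -- $1   # split the address on dots into $1..$4
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print the network address for a given IP and netmask.
network_of() (
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    net=$(( ip & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
)

# Succeed when both addresses share a network under the given netmask.
same_network() {
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}

# The bridge address and gateway from the file above.
same_network 8.25.76.159 8.25.76.1 255.255.255.0 && echo "br0 and gateway share a network"
```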

Once that's done, run ip a to make sure your primary interface connects upstream to the Data Center, and also make sure that the interface br0 appears at the bottom and that the secondary interface shows it as bound to the bridge in its output. Sometimes, I find that nameservers don't properly populate to resolv.conf, so I do the following so that my resolv.conf configurations stick and I don't lose upstream DNS. (Note: I do this because Debian - rightfully - still supports manual over-writing of /etc/resolv.conf.)

echo nameserver 8.8.8.8 > /etc/resolv.conf
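To confirm the entry is still there after a reboot (rather than finding out when DNS fails), a small check like this could be used. It is a sketch: has_nameserver is a hypothetical helper, and the file argument defaults to /etc/resolv.conf.

```shell
#!/bin/sh
# Sketch: check that a resolv.conf-style file lists a given nameserver.
has_nameserver() {
    ip=$1
    file=${2:-/etc/resolv.conf}
    # Escape the dots so they match literally, then look for an
    # uncommented "nameserver <ip>" line.
    pattern=$(printf '%s' "$ip" | sed 's/\./\\./g')
    grep -Eq "^[[:space:]]*nameserver[[:space:]]+${pattern}[[:space:]]*$" "$file"
}
```

For example, `has_nameserver 8.8.8.8 || echo "resolv.conf lost its entry"` could be run by hand or from a login script.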

Reboot the host and ping 8.8.8.8 and google.com to ensure you have link and upstream DNS.

Next up, it is time to configure the guest / VM machine. I saw a lot of good tutorials online, but most of them got sloppy at this stage as far as interfaces and bridging were concerned, so I'll try to be clear where they were not. When you set up the new VM (not covered here), instead of relying on the NAT-based default network, change the option to “Bridge” (this is in the virt-manager GUI) and enter the name of the bridge, in my case br0. (You can also use virsh for this step, but why lol - I just use X forwarding and open the GUI.) This step connects the VM's virtual NIC to the virtual switch of the bridge on the physical host. Once that's done, spin up the VM and open up the Terminal (the one inside the VM). In the VM's Terminal, configure the NIC interface as follows:

sudo nano /etc/network/interfaces

This file should look like this (adjust to your use-case - and again, this is inside the VM Terminal, and not on the Terminal of the physical host):

auto epr1
iface epr1 inet static
  address 8.25.76.158
  netmask 255.255.255.0
  gateway 8.25.76.1
  dns-nameservers 8.8.8.8

The VM interface is listed inside the guest/VM as epr1 - but remember, that's connected to the virtual switch and bridge through the previous steps, so don't worry. After this step, restart the networking service and check to see if your IP address is assigned. Also, in my use-case my VM is Ubuntu which does not allow manual over-writing of resolv.conf, so I also add upstream DNS as follows:

sudo service networking restart
ip a
sudo apt install resolvconf
sudo nano /etc/resolvconf/resolv.conf.d/tail

Enter the name server as follows:

nameserver 8.8.8.8

At this point, I would probably reboot and then from within the VM, ping 8.8.8.8, and then ping google.com to ensure you have link and upstream DNS. Everything should be rosy ;>. Some folks might be concerned about ARP and such, but virt-manager handles that with the gateway entry combined with the bridge, so no need to alter proc and pass traffic, etc. Of course, replace Google's DNS if you so choose, but I had reliability problems with Level 3 during testing myself (sad).

— oemb1905 2021/11/09 19:55
