
  • vmserver
  • Jonathan Haack
  • Haack's Networking
  • netcmnd@jonathanhaack.com

What? I put a large physical host at a data center in Northern New Mexico. I purchased a few blocks of IPv4 addresses and set up virt-manager to run guest OSes for my business and other projects. Put simply, I was creating virtualized infrastructure on a physical host that I owned and configured, in order to run VPSs for my own stuff, for clients, for my kids, for gaming projects, other hacking projects, and so on. I used a data center rather than hosting this at home because a) it's noisy as f, and b) the data center has a symmetric gig.

Why? I am a math/CS teacher at the HS/college level. I use free software to meet my educational needs, and I would like to use self-hosted services such as (but not limited to) Big Blue Button, Nextcloud, etc., to assist my instruction. Additionally, I run a small IT/education free software business, and since my client load is only around 10-20, I figured the server could also replace my current business website. Lastly, I serve on some non-profit boards, and I figured I could move those websites over as well. It would save money, and it would also make the systems much more portable, since restoring would be as simple as attaching a .img to a new virt-manager instance.

I was given a dual 8-core Xeon SuperMicro server (circa '08-'12) with 8 HD bays in use, 96 GB RAM, 8x 6 TB Western Digital drives in a ZFS mirrored (RAID1-style) pool (24 TB usable), and a 120 GB SSD boot volume tucked behind the front power panel, running non-GUI Debian. (Thanks to Kilo Sierra for the donation.) My first job was to calculate whether my PSU was up to the task I intended for it. I used a 500W PSU. From my calculations, the RAM would draw around 360W at capacity but would rarely come close to that; the HDs would often (especially at spin-up) hit up to 21.3W per drive, or around 150W total, excluding the boot SSD; and the motherboard would draw about 100W, putting the theoretical ceiling at roughly 610W. Since I did not expect the RAM, HDs, and other physical components to hit peak consumption concurrently, I figured no more than around 75% of that ceiling (roughly 460W) would be drawn at any one time, so I considered it safe to proceed.

The next step was to install the physical host OS (Debian) and set up the basics of the system (hostname, DNS, basic package installs, etc.). On the 120 GB SSD boot volume, I used a LUKS / pam_mount encrypted home directory, where I could store keys for the zfs pool and/or other sensitive data. I used a nifty trick to first create the pool simply with short device names, and then magically change them to block IDs without making the pool creation syntax cumbersome:

zpool create -m /mnt/vms vms -f mirror sda sdb mirror sdc sdd mirror sde sdf mirror sdg sdh
zpool export vms 
zpool import -d /dev/disk/by-id vms
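
At this point a quick status check should show the four mirror vdevs listed by their /dev/disk/by-id names rather than the short sda/sdb names (these are standard zfs commands, nothing specific to my setup):

zpool status vms   # shows each mirror vdev and its member disks by-id
zpool list vms     # confirms the pool size and free space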

Now that the pool was created, I created encrypted datasets; a dataset is ZFS's name for a filesystem inside the pool, and encryption is enabled per dataset. Each dataset unlocks by pulling a dd-generated key from the encrypted (and separate) home partition on the SSD boot volume. I set up the keys/datasets as follows:

dd if=/dev/random of=/secure/area/example.key bs=1 count=32
zfs create -o encryption=on -o keyformat=raw -o keylocation=file:///secure/area/example.key pool/dataset
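
To confirm the dataset actually came up encrypted and that its key is loaded, you can query the standard OpenZFS properties:

zfs get encryption,keyformat,keylocation,keystatus pool/dataset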

When you create the dataset on the current running instance, it will also mount it for you as a courtesy, but upon reboot you need to load the key and then mount the dataset using zfs commands. In my case, I created three datasets (one for raw ISOs, one for disk images, and one for backup sparse tarballs). Each one is unlocked and mounted as follows:

zfs load-key pool/dataset
zfs mount pool/dataset
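
Repeated per dataset, this lends itself to a small unlock script; here is a minimal sketch (the dataset names are illustrative, not my actual ones):

#!/bin/bash
# unlock-datasets.sh - load each dataset's key from the encrypted home, then mount it
for ds in pool/isos pool/images pool/backups; do
    /usr/sbin/zfs load-key "$ds"
    /usr/sbin/zfs mount "$ds"
done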

Once I created all the datasets, I made a script along those lines that loads the keys and mounts all of them, then rebooted and tested it for functionality. Having verified that the datasets worked, I could feel comfortable creating VMs again, since the disk images for those VMs would now be stored in encrypted zfs datasets. My next task was snapshots within zfs, which handle routine rollbacks and smaller errors/mistakes (remote/failover backups come later). I did that with a small script that runs via cron 4 times a day, i.e., every 6 hours:

#!/bin/bash
# recursive snapshots of each VM dataset, plus one of the pool root
DATE=$(date +"%Y%m%d-%H:%M:%S")
/usr/sbin/zfs snapshot -r pool/vm1dataset@backup_$DATE
/usr/sbin/zfs snapshot -r pool/vm2dataset@backup_$DATE
/usr/sbin/zfs snapshot pool@backup_$DATE
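
The cron entry for that schedule is a single line; the script path here is just an assumption about where you keep it:

# every 6 hours, on the hour
0 */6 * * * /root/zfs-snapshots.sh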

The snapshots allow me to perform rollbacks when end-users make mistakes, e.g., someone deletes an instructional video after a class session. To delete all snapshots and start over, run:

zfs list -H -o name -t snapshot | xargs -n1 zfs destroy
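
Restoring from a snapshot is the usual zfs rollback; the snapshot name below is illustrative (add -r if you also want to discard any snapshots newer than the one you roll back to):

zfs list -t snapshot pool/vm1dataset                    # find the snapshot you want
zfs rollback pool/vm1dataset@backup_20220726-06:00:00   # roll the dataset back to it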

However, if the data center is compromised physically or their upstream goes down, I also need remote/failover options. My next task, then, was to take advantage of the fact that cp and tar understand sparse files, so that I could use rsync to bring over tarballs of the VM disks that contain only the actual data instead of the entire 1TB container. To do this, I used the c and S flags of bsdtar, together with bzip2 compression for speed. Make sure to use bsdtar (sudo apt install libarchive-tools)! The script follows; take care when adjusting it, as most alterations will break tar's ability to treat the .img file as sparse:

#!/bin/bash
DATE=$(date +"%Y%m%d-%H:%M:%S")
cd /backups
cp -ar /vms/vol.img /backups/vol.img_QUICK_.bak           # sparse-aware copy of the live disk image
bsdtar --use-compress-program=pbzip2 -Scf vol.img_QUICK_.tar.bz2 vol.img_QUICK_.bak
mv /backups/vol.img_QUICK_.tar.bz2 /backups/tarballs/vol.img_QUICK_$DATE.tar.bz2
rm /backups/vol.img_QUICK_.bak
find /backups/tarballs -type f -mtime +30 -delete         # keep 30 days of tarballs

In addition to the daily live images made with the above script, I also run a version every 3 days called SANE, which runs virsh shutdown on the domain before copying/tarballing and runs virsh start on the domain once the tarballing finishes. The host keeps 30 days' worth of images, but you can easily adjust the -mtime flag in the last line above to your use case. After these run, I pull the changes to the offsite backup computer using rsync on the offsite host as follows:

sudo rsync -av --log-file=/home/logs/backup-of-vm.log --ignore-existing -e 'ssh -i /home/user/.ssh/id_rsa' root@domain.com:/backups/tarballs/ /media/user/Backups/

Since the offsite workstation runs rsnapshot, I get redundant dailies on its backup drive that extend further back than what the physical host keeps (my workstation has the space for it).
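
For reference, the SANE variant described above is essentially the same tarball script bracketed by a clean guest shutdown and restart; a rough sketch, with a hypothetical domain name (vm1) and the same paths as the QUICK script:

#!/bin/bash
DATE=$(date +"%Y%m%d-%H:%M:%S")
virsh shutdown vm1                      # ask the guest for a clean shutdown
sleep 120                               # crude wait for the guest to power off (assumption)
cd /backups
cp -ar /vms/vol.img /backups/vol.img_SANE_.bak
bsdtar --use-compress-program=pbzip2 -Scf vol.img_SANE_.tar.bz2 vol.img_SANE_.bak
mv /backups/vol.img_SANE_.tar.bz2 /backups/tarballs/vol.img_SANE_$DATE.tar.bz2
rm /backups/vol.img_SANE_.bak
virsh start vm1                         # bring the guest back up once the tarball is done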

– Network Bridge Setup / VMs –

Once the physical host was set up, I created two vanilla VMs using the virt-manager GUI with X-forwarding over ssh, prior to bringing the server on site. Once those were set up, I headed to the data center, figuring I might have to tinker with bridging and network configurations a bit onsite before leaving the device there indefinitely, subject to 24-hour-notice emergency KVM access. Once there, I worked for about 3 hours configuring the interfaces for bridge mode, ultimately with two physical ethernet cables into the device: one on a non-bridged static IP / interface, and the other on a static IP / interface dedicated to bridging. After about a week of thinking back on my Slackware phases, my FreeBSD phases, the late 90s and early 2000s ... AND ... a lot of Stack Exchange tutorials, I decided on the manual command line approach: no desktop tools to manage interfaces, just stripped-down Debian with no network-manager, and manual entries for the needed functionality. Here's what I came up with for interfaces on the physical host:

sudo nano /etc/network/interfaces

That file should look like this (adjust to your use-case, ofc):

# eth0 (alt name enp8s0g0): physical host base connection
auto enp8s0g0
iface enp8s0g0 inet static
  address 8.25.76.160
  netmask 255.255.255.0
  gateway 8.25.76.1
  dns-nameservers 8.8.8.8
# eth1 (alt name enp8s0g1): interface for the bridge
auto enp8s0g1
iface enp8s0g1 inet manual
auto br0
iface br0 inet static
  address 8.25.76.159
  netmask 255.255.255.0
  gateway 8.25.76.1
  bridge_ports enp8s0g1
  dns-nameservers 8.8.8.8

Once that's done, run ip a to make sure your primary interface connects upstream to the data center; also make sure that br0 appears at the bottom of the output and that the secondary interface shows as bound to the bridge. If you are doing the same overall project but are limited to a home network behind a DHCP router, then do the following, assuming eth1 is the primary interface:

auto eth1
iface eth1 inet manual
auto br0
iface br0 inet dhcp
  bridge_ports eth1
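
Either way, a quick sanity check after restarting networking confirms the bridge came up and owns its port (standard iproute2 commands):

sudo systemctl restart networking    # or just reboot
ip a                                 # br0 should hold the static or dhcp address
ip link show master br0              # should list the physical NIC bound to the bridge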

I included the dhcp / home version in case self-hosting home users want to try something similar. Okay, now as far as nameservers are concerned, I find they sometimes don't properly populate resolv.conf, so I do the following so that my resolv.conf configuration sticks and I don't lose upstream DNS. (Note: I do this because Debian - rightfully - still supports manually overwriting /etc/resolv.conf.)

echo nameserver 8.8.8.8 > /etc/resolv.conf

Reboot the host, then ping 8.8.8.8 and google.com to ensure you have link and upstream DNS. Next up, it is time to configure the guest / VM machine. I saw a lot of good tutorials online, but most of them got sloppy at this stage as far as interfaces and bridging were concerned, so I'll try to be clear where they were not. When you set up the new VM (not covered here), instead of relying on the NAT-based default network, change the option to “Bridge” (this is in the virt-manager GUI) and enter the name of the bridge, in my case br0. (You can also use virsh for this step, but why lol - I just use X forwarding and open the GUI.) This step connects the VM's virtual NIC to the bridge's virtual switch on the physical host.
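
If you do want to skip the GUI, the same attachment lives in the domain XML (virsh edit <domain>); the stanza below is the standard libvirt bridge form, and the virtio model line is simply my assumption/preference:

<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>

Once that's done, spin up the VM and open a terminal inside it. In the VM's terminal, configure the NIC interface as follows: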

sudo nano /etc/network/interfaces

This file should look like this (adjust to your use-case - and again, this is inside the VM Terminal, and not on the Terminal of the physical host):

auto epr1
iface epr1 inet static
  address 8.25.76.158
  netmask 255.255.255.0
  gateway 8.25.76.1
  dns-nameservers 8.8.8.8

The VM interface is listed inside the guest as epr1 - but remember, that interface is connected to the virtual switch and bridge through the previous steps in the virt-manager GUI, so don't worry. After this step, restart the networking service and check that your IP address is assigned. If you are doing this in a home setup and you want the VM to receive an IP address via the host bridge you set up above, then adjust its interface file as follows:

auto epr1
iface epr1 inet dhcp
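
In either case, applying the change and checking connectivity from inside the guest looks like this:

sudo systemctl restart networking    # or: sudo ifdown epr1 && sudo ifup epr1
ip a                                 # confirm the address landed on epr1
ping -c 3 8.8.8.8                    # confirm the gateway and upstream route work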

Now, to ensure DNS works flawlessly on both the physical host and the VM, I manually adjust resolv.conf. On Ubuntu VMs, which do not by default allow manual overwriting of resolv.conf, I add the resolvconf package and manage my upstream DNS as follows:

sudo apt install resolvconf
sudo nano /etc/resolvconf/resolv.conf.d/tail
# add the line: nameserver 8.8.8.8
sudo service networking restart
ip a

At this point, I would probably reboot and then, from within the VM, ping 8.8.8.8 and ping google.com to ensure you have link and upstream DNS. Everything should be rosy ;>. Some folks might be concerned about ARP and such, but virt-manager handles that with the gateway entry combined with the bridge, so there is no need to alter proc settings to pass traffic, etc. Of course, replace Google's DNS if you so choose, but I had reliability problems with Level 3 during testing myself (sad). As I alluded to above, the point of this project was to create my own virtualized VPS infrastructure, to run my own stuff and for clients. At present, I have ported my business site over and created a teaching Nextcloud for Talk with students and for resource sharing, a Big Blue Button instance (which proved to be a major problem and source of pain), a Minecraft server, some gamer sites, and some testing VPSs for my kids. Here are a few to check out:

The last one is my 10-year-old daughter's project. It's coming along nicely and serves as a great way to teach her basic HTML, CSS, and JS. The next part of this write-up covers how to do the same overall virtualization of infrastructure and VPS leveraging as above, but with LUKS first and ZFS on top. Read on to see what I originally did ... the reason I ultimately rejected this approach is that you can't use zfs tools to handle failed hard drives when it's set up that way. ;<

– LUKS FIRST, ZFS SECOND - (LEGACY SETUP, NOT CURRENT) –

My initial idea was to do LUKS first, then zfs on top, meaning six drives would be zfs mirrors and I would keep one as a spare LUKS crypt for keys, other crap, etc. To create the LUKS crypts, I did the following six times, each time appending the last 4 digits of the block ID to the LUKS crypt name:

cryptsetup luksFormat /dev/sda
cryptsetup luksOpen /dev/sda sdafc11
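
Once opened, the unlocked mapping appears under /dev/mapper, which is a quick way to confirm the name before building the pool:

ls -l /dev/mapper/    # the opened crypt (e.g., sdafc11) shows up here
lsblk                 # the crypt appears nested under its raw disk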

You then make sure to use the LUKS mapper names when making the zpool, not the short device names, which can change across reboots. I did this as follows:

sudo apt install zfsutils-linux bridge-utils
zpool create -m /mnt/vms vms -f mirror /dev/mapper/sdafc11 /dev/mapper/sdb9322 mirror /dev/mapper/sdc8a33 /dev/mapper/sdh6466 mirror /dev/mapper/sde5b44 /dev/mapper/sdf8055

ZFS by default executes its mount commands at boot. This is a problem if you don't use key files with LUKS to auto-unlock at boot (and/or a custom unlock script), because ZFS will try to mount the volumes before they are unlocked. The two other options are the none/legacy mountpoint modes, both of which rely on you mounting the volume using traditional methods. But the whole point of finally using zfs was to not use traditional methods lol, so I investigated whether there was a fix. The closest thing to a fix is setting cachefile=none, but this a) hosed the pool once, and b) requires resetting it, rebooting again, and/or manually re-mounting the pool - either of which defeats the point. Key files, cache file adjustments, and none/legacy mounts were all no-gos for me, so in the end I decided to tolerate zfs failing at boot, and to zpool import the pool afterwards.
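
For the record, the cachefile workaround mentioned above is just a property change on the pool; this is the approach I tried and abandoned:

zpool set cachefile=none vms

So instead, after each boot, the routine looks like this: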

sudo -i
screen
su - user [pam_mount unlocks /home for physical host primary user and the spare 1TB vault]
ctrl-a-d [detaches from screen]

After unlocking my home directory and the spare 1TB vault, the next step is to unlock each LUKS volume, for which I decided a simple shell script would suffice; it looks like this (mount-luks.sh):

cryptsetup luksOpen /dev/disk/by-uuid/2702e690-…-0c4267a6fc11 sdafc11
cryptsetup luksOpen /dev/disk/by-uuid/e3b568ad-…-cdc5dedb9322 sdb9322
cryptsetup luksOpen /dev/disk/by-uuid/d353e727-…-e4d66a9b8a33 sdc8a33
cryptsetup luksOpen /dev/disk/by-uuid/352660ca-…-5a8beae15b44 sde5b44
cryptsetup luksOpen /dev/disk/by-uuid/fa1a6109-…-f46ce1cf8055 sdf8055
cryptsetup luksOpen /dev/disk/by-uuid/86da0b9f-…-13bc38656466 sdh6466

This script simply opens each LUKS crypt, so long as you enter or copy/paste your HD password 6 times. After that, once the volumes are opened, you re-mount the pool / reassemble the mirror vdevs with the import command as follows:

zpool import vms

Rebooting in this manner takes about 3-5 minutes for the host, plus another 2 minutes to screen in as my user, detach, run the mount-luks script, and import the pool / mount the datasets. Again, I ultimately rejected this approach because you cannot use zfs tools to handle failed hard drives with this setup.

oemb1905 2022/07/26 13:07
