
  • vmserver
  • Jonathan Haack
  • Haack's Networking
  • netcmnd@jonathanhaack.com

I was given a dual 8-core Xeon SuperMicro server (32 threads) with 96GB of RAM and all 8 HD bays in use: 8x 6TB Western Digital drives in a RAID1-style zfs mirror (24TB usable), plus a 120GB SSD boot volume tucked behind the front power panel, running non-GUI Debian. (Thanks to Kilo Sierra for the donation.)

My first job was to calculate whether my 500W PSU was up to the task I intended for it. By my estimates, the RAM would draw around 360W at capacity but rarely come close to that, the HDs would often (especially on boot) hit up to 21.3W per drive, or around 150W total, not counting the boot SSD, and the motherboard would draw about 100W, for a theoretical peak of 610W. Since I did not expect the RAM, HDs, and other physical components to hit peak consumption concurrently, I figured no more than around 75% of that peak would be in use at any one time and considered it safe to proceed.

The next step was to install the physical host OS (Debian) and set up the basics of the system (hostname, DNS, basic package installs, etc.). On the 120GB SSD boot volume, I used a LUKS / pam_mount encrypted home directory, where I could store keys for the zfs pool and/or other sensitive data. To build the pool, I used a nifty trick: first create the pool with the short device names, then export it and re-import it by block ID, which avoids a cumbersome pool creation command.
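The power estimate above is simple enough to check in the shell. This is only a sketch using the wattage figures quoted in the text; the variable names are made up for illustration, and integer arithmetic rounds the 75% figure down.

```shell
#!/bin/sh
# Peak-draw estimates quoted in the text (watts)
ram_w=360
drives_w=150
board_w=100
psu_w=500

# Worst case if every component peaked at the same moment
total_w=$((ram_w + drives_w + board_w))
# Assumed realistic ceiling: 75% of that peak (integer math rounds down)
expected_w=$((total_w * 75 / 100))
headroom_w=$((psu_w - expected_w))

echo "theoretical peak: ${total_w}W"
echo "expected ceiling: ${expected_w}W"
echo "PSU headroom:     ${headroom_w}W"
```

The theoretical peak is above the PSU rating, while the assumed 75% ceiling is below it, which is the reasoning behind proceeding. Back to the pool: the short-name creation, followed by the export and the by-id import, looks like this: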

zpool create -f -m /mnt/pool pool mirror sda sdb mirror sdc sdd mirror sde sdf mirror sdg sdh
zpool export pool
zpool import -d /dev/disk/by-id pool
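After the re-import, zpool status should list the drives by their by-id names rather than sda, sdb, and so on. To see which by-id name belongs to which short name, a small helper along these lines can be used. It is only a sketch: the function name by_id_names is made up for illustration, and it assumes readlink -f is available, as it is on Debian.

```shell
#!/bin/sh
# by_id_names DEVICE [DIR]
# Print every symlink in DIR (default /dev/disk/by-id) that resolves to DEVICE.
by_id_names() {
    target=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob)
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            basename "$link"
        fi
    done
}
```

For example, by_id_names /dev/sda prints the by-id aliases of the first drive, which can be compared against the output of zpool status pool.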

Now that the pool was created, I created two encrypted datasets; a dataset is the zfs term for a file system inside the pool, and these have encryption enabled. The datasets each unlock by pulling a dd-generated key from the encrypted (and separate) home partition on the SSD boot volume. I set up the keys/datasets as follows:

dd if=/dev/random of=/mnt/vault/example.key bs=1 count=32
zfs create -o encryption=on -o keyformat=raw -o keylocation=file:///mnt/vault/example.key pool/dataset

When you create a dataset on the running system, zfs also mounts it for you as a courtesy, but after a reboot you need to load the key and then mount the dataset yourself using zfs commands. In my case, I created three datasets (one for raw isos, one for disk images, and a last one for backup sparse tarballs). After a reboot, each one is unlocked and mounted as follows:

zfs load-key pool/dataset
zfs mount pool/dataset
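With several datasets, the two commands above can be wrapped in a loop. The following is a minimal sketch of such an unlock helper, not the exact script used on this host: the function name, the ZFS_BIN variable, and the dataset names in the example are placeholders.

```shell
#!/bin/sh
# ZFS_BIN can be overridden (for example ZFS_BIN=echo) to preview the commands.
ZFS_BIN=${ZFS_BIN:-/usr/sbin/zfs}

# unlock_datasets DATASET...
# Load the key for each dataset, then mount it; stop at the first failure.
unlock_datasets() {
    for ds in "$@"; do
        "$ZFS_BIN" load-key "$ds" || return 1
        "$ZFS_BIN" mount "$ds" || return 1
    done
}
```

Run as root with something like unlock_datasets pool/isos pool/images pool/backups, after the partition holding the key files has been unlocked.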

Once I created all the datasets, I made a script that loads the keys and mounts all of them, then rebooted and tested it for functionality. Having verified that the datasets worked, I could now feel comfortable creating VMs again, since the hard drive images for those VMs would be stored in encrypted zfs datasets. My next task was backups, starting with snapshots within zfs, which handle routine rollbacks and smaller errors/mistakes. I did that by creating a small script that runs via cron 4 times a day, or every 6 hours:

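# One timestamp shared by every snapshot taken in this run
DATE=$(date +"%Y%m%d-%H:%M:%S")
/usr/sbin/zfs snapshot -r pool/vm1dataset@backup_$DATE
/usr/sbin/zfs snapshot -r pool/vm2dataset@backup_$DATE
# Repeat the line above for each remaining dataset; a dataset name is required before the @
# Non-recursive snapshot of the pool's root dataset
/usr/sbin/zfs snapshot pool@backup_$DATE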
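For the cron side, one option on Debian is a file in /etc/cron.d. The helper below is only a sketch that writes such an entry: the function name and both paths in the usage example are hypothetical, and the schedule 0 */6 * * * means minute 0 of every sixth hour, i.e. four runs a day.

```shell
#!/bin/sh
# write_snapshot_cron FILE SCRIPT
# Write a cron.d-style entry that runs SCRIPT as root every 6 hours.
write_snapshot_cron() {
    cat > "$1" <<EOF
# m h dom mon dow user command
0 */6 * * * root $2
EOF
}
```

As root, write_snapshot_cron /etc/cron.d/zfs-snapshots /usr/local/sbin/zfs-snapshots.sh would install it. Avoid dots in the cron.d file name, since cron on Debian skips files whose names contain them.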

The snapshots allow me to perform rollbacks when end-users make mistakes, e.g., deleting an instructional video after a class session. To delete all snapshots and start over, run:

zfs list -H -o name -t snapshot | xargs -n1 zfs destroy
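A gentler alternative to destroying everything is to filter the list by the date embedded in the snapshot name first. The sketch below assumes the backup_YYYYMMDD-HH:MM:SS naming used by the snapshot script; the function name snapshots_before is made up for illustration.

```shell
#!/bin/sh
# snapshots_before YYYYMMDD
# Read snapshot names on stdin and print those stamped before the given day.
snapshots_before() {
    awk -v cutoff="$1" '
        {
            # The stamp follows "@backup_"; its first 8 characters are YYYYMMDD
            i = index($0, "@backup_")
            if (i == 0) next
            day = substr($0, i + 8, 8)
            if (day ~ /^[0-9]+$/ && length(day) == 8 && day + 0 < cutoff + 0) print
        }'
}

# Example (GNU date and xargs assumed): destroy snapshots older than 30 days
# zfs list -H -o name -t snapshot \
#   | snapshots_before "$(date -d '30 days ago' +%Y%m%d)" \
#   | xargs -r -n1 zfs destroy
```

Snapshots that do not follow the naming scheme are left alone, and it is worth running the pipeline without the final xargs stage first to review what would be destroyed.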

However, if the data center is physically compromised or its upstream goes down, I also need remote/failover options. My next task was to take advantage of the way cp and tar handle sparse files, so that I could use rsync to bring over tarballs of the VM disks that contain only the actual data instead of the entire 1TB container. To do this, I used the c and S flags in bsdtar, together with parallel bzip2 (pbzip2) compression for speed. Make sure to use bsdtar (sudo apt install libarchive-tools)! The script follows; take care when adjusting it, as most alterations will break the ability of tar to properly treat the .img file as sparse:

DATE=$(date +"%Y%m%d-%H:%M:%S")
cd /backups || exit 1
# Copy the image first so the tarball is built from the copy, not the live file
cp -ar /vms/vol.img /backups/vol.img_QUICK_.bak
bsdtar --use-compress-program=pbzip2 -Scf vol.img_QUICK_.tar.bz2 vol.img_QUICK_.bak
mv /backups/vol.img_QUICK_.tar.bz2 /backups/tarballs/vol.img_QUICK_$DATE.tar.bz2
rm /backups/vol.img_QUICK_.bak
# Prune tarballs older than 30 days
find /backups/tarballs -type f -mtime +30 -delete
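Since the whole point is sparseness, it helps to check that a file really is sparse before and after a restore. A sketch, assuming GNU du (the --apparent-size option) as shipped with Debian; the function name sparse_report is made up for illustration.

```shell
#!/bin/sh
# sparse_report FILE
# Print the apparent size and the space actually allocated on disk, both in KiB.
sparse_report() {
    apparent=$(du -k --apparent-size "$1" | cut -f1)
    allocated=$(du -k "$1" | cut -f1)
    echo "apparent=${apparent}K allocated=${allocated}K"
}
```

Running sparse_report /vms/vol.img on the original image and again on an image extracted from a tarball should show an allocated figure well below the apparent one in both cases. If the two figures match after a restore, the sparse handling was lost somewhere along the way.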

In addition to daily live images using the above script, I also run an every-third-day version called SANE, which runs virsh shutdown domain before copying/tarballing and then runs virsh start domain at the end of the tarballing. The host is set to keep 30 days' worth of images, but you can easily adjust the -mtime value in the last line above to suit your use case. After these run, pull the changes to the offsite backup computer using rsync on the offsite host as follows:

sudo rsync -av --log-file=/home/logs/backup-of-vm.log --ignore-existing -e 'ssh -i /home/user/.ssh/id_rsa' root@domain.com:/backups/tarballs/ /media/user/Backups/

Since the workstation is on rsnapshot, I get redundant dailies on its backup that extend beyond the quantity on the physical host (because of space on my primary workstation).
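The SANE variant mentioned above can be sketched as a wrapper around any backup command. This is an illustration under stated assumptions rather than the actual script: the function names and the VIRSH variable are made up, and the only virsh subcommands it relies on are shutdown, domstate, and start. Because virsh shutdown only requests a shutdown, the sketch polls the domain state before touching the disk image.

```shell
#!/bin/sh
# VIRSH can be overridden, for example with a stub when testing.
VIRSH=${VIRSH:-virsh}

# wait_for_shutdown DOMAIN [TRIES]
# Poll once per second until libvirt reports the domain as "shut off".
wait_for_shutdown() {
    tries=${2:-300}
    while [ "$tries" -gt 0 ]; do
        [ "$("$VIRSH" domstate "$1")" = "shut off" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# sane_backup DOMAIN BACKUP_COMMAND...
# Shut the guest down, run the backup command, then start the guest again.
sane_backup() {
    domain=$1
    shift
    "$VIRSH" shutdown "$domain" || return 1
    wait_for_shutdown "$domain" || return 1
    "$@"
    rc=$?
    # Start the guest again even if the backup command failed
    "$VIRSH" start "$domain" || return 1
    return "$rc"
}
```

Usage would be along the lines of sane_backup domain /path/to/quick-backup.sh, where the second argument is whatever script does the copying and tarballing.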

– Network Bridge Setup / VMs –

Once the physical host was configured, I needed to edit its network settings and create a virtual switch through which VMs could be allocated IPs. To do this, I kept it simple and used the bridge-utils package and some manual editing in /etc/network/interfaces.

sudo apt install bridge-utils
sudo brctl addbr br0
sudo nano /etc/network/interfaces

Now that you have added the bridging package and created the virtual switch, you need to reconfigure your interfaces file so that your host OS knows how to negotiate a connection again. In my case, I used 2 of the 10 IPs I purchased at the data center for the physical host.

#eth0 (debian required the alt name for some reason) <- 1st physical port, backup
auto enp8s0g0
iface enp8s0g0 inet static
  address 8.25.76.160
  netmask 255.255.255.0
  gateway 8.25.76.1
# dns-nameservers is only honored when the resolvconf package is installed
  dns-nameservers 8.8.8.8
#eth1 (debian required the alt name for some reason) <- 2nd physical port, for the bridge
auto enp8s0g1
iface enp8s0g1 inet manual
auto br0
iface br0 inet static
  address 8.25.76.159
  netmask 255.255.255.0
  gateway 8.25.76.1
  bridge_ports enp8s0g1
  dns-nameservers 8.8.8.8

Once that's done, you can restart networking.service (or optionally network-manager if you prefer). After that, check whether your changes stuck with ip a. The entry for interface enp8s0g1 will now show master br0 and state UP, and further down you will see the bridge interface, br0. This interface, or virtual switch, is what you connect your virtualization software to. In my case, I just specify br0 in virt-manager in the network section. For smaller environments, for example at home and/or behind a dhcp router, the following configuration should be sufficient:

auto eth1
iface eth1 inet manual
auto br0
iface br0 inet dhcp
      bridge_ports eth1

The above home version lets users run a virtual machine that gets an ip address on the LAN, which makes ssh access far easier. Okay, back to the server setup. The next thing to do is to test whether or not you can send/receive packets on those interfaces. To do that, run a few ping tests:

ping 8.8.8.8
ping google.com

At this stage, these tests failed and I was not able to route and had no functional DNS servers. Running cat /etc/resolv.conf confirmed that DNS was only localhost, so it made sense I could not route. Since I use Debian, this was an easy fix, and I simply provided my host with nameservers as follows:

echo nameserver 8.8.8.8 > /etc/resolv.conf
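The two ping tests separate two different failures: pinging an IP exercises routing only, while pinging a name also exercises DNS. A small sketch that makes the distinction explicit follows; the function names are made up for illustration, and the ping flags assume the iputils ping shipped with Debian.

```shell
#!/bin/sh
# classify_connectivity IP_STATUS NAME_STATUS
# Interpret the exit statuses of a ping by IP and a ping by hostname.
classify_connectivity() {
    if [ "$1" -ne 0 ]; then
        echo "no route"
    elif [ "$2" -ne 0 ]; then
        echo "dns failure"
    else
        echo "ok"
    fi
}

# check_connectivity
# Run both pings (one packet, 2 second timeout) and report the result.
check_connectivity() {
    ping -c 1 -W 2 8.8.8.8 >/dev/null 2>&1
    ip_status=$?
    ping -c 1 -W 2 google.com >/dev/null 2>&1
    name_status=$?
    classify_connectivity "$ip_status" "$name_status"
}
```

Calling check_connectivity before and after changing /etc/resolv.conf shows whether the nameserver entry was the missing piece.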

After setting the nameserver, you can either restart networking.service (or network-manager.service, if you prefer) or reboot. Since I had just changed a lot, I decided to reboot. Upon rebooting, I ran the same ping tests as above, and both successfully received bytes back. Now that the physical host had two ips and could route, it was time to set up the VMs and make sure they could connect to the virtual switch, br0. To do this, I first configured a vanilla install of debian within virt-manager. Then, using the console of virt-manager for that VM, I edited the guest OS network configuration file as follows:

sudo nano /etc/network/interfaces
auto epr1
iface epr1 inet static
  address 8.25.76.158
  netmask 255.255.255.0
  gateway 8.25.76.1
  dns-nameservers 8.8.8.8

Remember, the configuration above is within the guest OS of the VM in virt-manager and not the physical host. In my example, I used epr1 because that's the name of the network interface when you run ip a within the guest OS. For smaller/home set-ups using dhcp, you would change the configuration files as follows:

sudo nano /etc/network/interfaces
auto epr1
iface epr1 inet dhcp

Notes for Ubuntu VMs: On some of my VMs, I am required to use Ubuntu. Ubuntu has deprecated ifupdown in favor of netplan and manages /etc/resolv.conf itself, so manual edits to that file do not persist. Unless you want to recreate the above interfaces in YAML for netplan, you have to temporarily enable NAT in virt-manager and reboot the VM. Once NAT is enabled and you can route, add ifupdown, remove netplan, and add the resolvconf package as follows:

sudo apt install ifupdown
sudo apt remove --purge netplan.io
# enter the bridge network config above in /etc/network/interfaces
sudo apt install resolvconf
sudo nano /etc/resolvconf/resolv.conf.d/tail
# add this line to the tail file: nameserver 8.8.8.8
sudo reboot

Make sure to restart networking.service or network-manager.service at this point and run some ping tests against both 8.8.8.8 and google.com. Sometimes, I find a reboot is required. Some online tutorials report that you need additional configuration for traffic to pass properly and/or for NAT to function. However, in my experience, this is all handled by virt-manager.

In summary, the point of this project was to create my own virtualized VPS infrastructure, to run my own stuff and for clients. At present, I have ported my business site over and created a teaching nextcloud for Talk with students and for resource sharing, a big blue button instance (which has proved to be a major problem and source of pain), a minecraft server, some gamer sites, and some testing VPSs for my kids. Here are a few to check out:

The last one is my 10 year old daughter's project. It's coming along nicely, and serves as a great way to teach her basic HTML, CSS, and JS. The next part of this write-up covers the same overall virtualization of infrastructure and VPS leveraging as above, but with LUKS first. Read on to see what I originally did … I ultimately rejected this approach because you can't use zfs tools when hard drives fail if you do it that way. ;<

– LUKS FIRST, ZFS SECOND - (LEGACY SETUP, NOT CURRENT) –

My initial idea was to do LUKS first, then zfs, meaning 6 drives could be mirrored in zfs and I would keep 1 as a spare LUKS crypt for keys, other crap, etc. To create the LUKS crypts, I did the following 6 times, each time appending the last 4 digits of the block ID to the LUKS crypt name:

cryptsetup luksFormat /dev/sda
cryptsetup luksOpen /dev/sda sdafc11

You then make sure to use the LUKS label names when making the zpool, not the short names, which can change at times during reboots. I did this as follows:

sudo apt install zfsutils-linux bridge-utils
zpool create -f -m /mnt/vms vms mirror sdafc11 sdb9322 mirror sdc8a33 sdh6466 mirror sde5b44 sdf8055

ZFS by default executes its mount commands at boot. This is a problem if you don't use auto-unlocking and key files with LUKS to also unlock on boot (and/or a custom script that unlocks), because ZFS will try to mount the volumes before they are unlocked. The two other options are the none/legacy mountpoint modes, both of which rely on you mounting the volume using traditional methods. But the whole point of finally using zfs was to not use traditional methods lol, so I investigated whether there was a fix. The closest thing to a fix is setting cachefile=none, but this a) hosed the pool once and b) requires resetting, rebooting again, and/or manually re-mounting the pool, any of which defeats the point. Using key files, cache file adjustments, etc., and/or none/legacy were all no-gos for me, so in the end I decided to tolerate that zfs would fail at boot and that I would zpool import the pool afterwards.

sudo -i
screen
su - user [pam_mount unlocks /home for physical host primary user and the spare 1TB vault]
ctrl-a-d [detaches from screen]

After unlocking my home directory and the spare 1TB vault, the next step is to unlock each LUKS volume. I decided a simple shell script would suffice; mount-luks.sh looks like this:

cryptsetup luksOpen /dev/disk/by-uuid/2702e690-…-0c4267a6fc11 sdafc11
cryptsetup luksOpen /dev/disk/by-uuid/e3b568ad-…-cdc5dedb9322 sdb9322
cryptsetup luksOpen /dev/disk/by-uuid/d353e727-…-e4d66a9b8a33 sdc8a33
cryptsetup luksOpen /dev/disk/by-uuid/352660ca-…-5a8beae15b44 sde5b44
cryptsetup luksOpen /dev/disk/by-uuid/fa1a6109-…-f46ce1cf8055 sdf8055
cryptsetup luksOpen /dev/disk/by-uuid/86da0b9f-…-13bc38656466 sdh6466

This script simply opens each LUKS crypt so long as you enter or copy/paste your HD password 6 times. Once the volumes are opened, one has to re-mount the pool and rebuild the quasi RAID1 mirror/logical volumes with the import command as follows:

zpool import vms

Rebooting in this manner takes about 3-5 minutes for the host, and 2 minutes to screen into my user name, detach, and run the mount LUKS script to mount the pools/datasets, etc. Again, I ultimately rejected this because you cannot use zfs tools when hard drives fail with this setup.

oemb1905 2022/07/26 19:31

computing/vmserver.txt · Last modified: 2022/08/09 18:07 by oemb1905