computing:vmserver [2024/02/17 20:54] – oemb1905
//vmserver//

-------------------------------------------

I am currently running a Supermicro 6028U-TRTP+ with dual 12-core Xeon E5-2650 CPUs at 2.2GHz, 384GB of RAM, four two-way mirrors of Samsung enterprise SSDs for the primary pool, and two two-way mirrors of 16TB platter drives for the backup pool. All drives use SAS. The PSU is 500W. I determined that the RAM would draw about 5-10W per stick, the motherboard about 100W, and that the drives would consume most of the rest at roughly 18-22W each. The next step was to install Debian on the bare metal to control and manage the virtualization environment; the virtualization stack is virsh and kvm/qemu. As for file systems and drive formatting, I used LUKS and pam_mount to open an encrypted home partition with a mapped home directory. I use this encrypted home directory to store keys for the zfs pools and other sensitive data, thus protecting them behind full-disk encryption. Additionally, I create encrypted zfs datasets within each pool, unlocked by the keys kept on the LUKS home partition. Instead of tracking down each disk's UUID on the initial build, create the pool with the short device names, then re-import it by id:

zpool create -m /mnt/pool pool -f mirror sda sdb mirror sdc sdd mirror sde sdf mirror sdg sdh
zpool export pool
zpool import -d /dev/disk/by-id pool

Once the pool is created, you can create your encrypted datasets. To do so, I generated unlock keys with ''dd'' and placed them in a hidden directory inside the LUKS-encrypted home partition mentioned above:

dd if=/dev/random of=/secure/area/example.key bs=1 count=32
zfs create -o encryption=on -o keyformat=raw -o keylocation=file:///secure/area/example.key pool/dataset

When the system reboots, the pools will mount automatically but the datasets won't, because the LUKS keys are not available until you mount the home partition by logging in to the user that holds them. For security reasons, this must be done manually or it defeats the entire purpose. So, once the administrator has logged in to that user in a screen session (remember, it uses pam_mount), they simply detach from the session and then load the keys and mount the datasets as follows:

zfs load-key pool/dataset
zfs mount pool/dataset

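With several datasets, ZFS can load every reachable key and mount everything in one pass rather than per-dataset. A minimal sketch, with a ''DRYRUN'' guard (my addition, not part of the original setup) so the commands can be previewed first:

```shell
#!/bin/sh
# Unlock and mount all encrypted datasets in one pass.
# Set DRYRUN=echo to preview the commands instead of running them.
DRYRUN="${DRYRUN:-}"

unlock_all() {
  $DRYRUN zfs load-key -a   # loads every key whose keylocation is reachable
  $DRYRUN zfs mount -a      # mounts all datasets that are now unlocked
}
```

On the real host you would simply call ''unlock_all'' after logging in to the key-holding user.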
If you have a lot of datasets, you can write a simple script to load them all at once. Since we have zfs, it's also a good idea to take regular snapshots. To do that, I created a small shell script with the following commands and set it to run from cron 4 times a day, i.e., every 6 hours:

DATE=`date +"%Y%m%d-%H:%M:%S"`
/usr/sbin/zfs snapshot pool@backup_$DATE

Make sure to manage your snapshots and retain only as many as you need, since very large numbers of snapshots can degrade performance. If you need to destroy all of them and start over, run:

zfs list -H -o name -t snapshot | xargs -n1 zfs destroy

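A gentler policy than destroying everything is to keep only the newest N snapshots. A sketch, assuming a pool named ''pool'' and the 4-per-day cron cadence above; the ''select_stale'' helper is hypothetical, not part of the original cron job:

```shell
#!/bin/sh
# Print every snapshot name after the newest $1 entries.
# Input on stdin must be sorted newest first.
select_stale() {
  tail -n +"$(($1 + 1))"
}

# On the real host, wired to zfs (28 snapshots = 7 days at 4/day):
#   zfs list -H -t snapshot -o name -S creation | grep '^pool@' \
#     | select_stale 28 | xargs -r -n1 zfs destroy
```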
Off-site //full// backups are essential, but they take a long time to download. For that reason, it's best to keep the images as small as possible. When using ''cp'' in your workflow, make sure to specify ''--sparse=always''. Before powering the virtual disk back up, run ''virt-sparsify'' on the image to free up blocks on the host that are not actually used inside the VM. For the VM to mark those blocks as empty in the first place, make sure fstrim runs within the guest. If you want ''ls'' to show the size that actually remains after the zeroing, rewrite the image with ''qemu-img convert'', which produces a fresh copy without the ballooned allocation. The purged disk image can then be copied to a backup directory, where it is tarballed and compressed to further reduce its size. I use BSD tar with pbzip2 compression, which makes ridiculously small images; GNU tar glitches with the script for some reason. BSD tar can be installed with ''sudo apt install libarchive-tools''. I made a script to automate all of these steps for a qcow2 image, and adapted it to work for raw images as well:

[[https://repo.haacksnetworking.org/haacknet/haackingclub/-/blob/main/scripts/virtualmachines/vm-bu-production-QCOW-loop.sh|vm-bu-production-QCOW-loop.sh]] \\
[[https://repo.haacksnetworking.org/haacknet/haackingclub/-/blob/main/scripts/virtualmachines/vm-bu-production-RAW-loop.sh|vm-bu-production-RAW-loop.sh]]
On the off-site backup machine, I originally pulled the tarballs down with a one-line rsync script, adjusting its cron timing to line up with when the tarballs are created:

sudo rsync -av --log-file=/home/logs/backup-of-vm-tarballs.log --ignore-existing -e 'ssh -i /home/user/.ssh/id_rsa' root@domain.com:/backups/tarballs/ /media/user/Backups/

Since then, I've switched to using rsnapshot to pull down the tarballs in some cases. The rsnapshot configurations can be found here:

[[https://repo.haacksnetworking.org/haacknet/haackingclub/-/tree/main/scripts/rsnapshot|Rsnapshot Scripts]]

Since the workstation uses rsnapshot, I also get redundant dailies on its backup drive that extend beyond what the physical host retains (a function of the space available on my primary workstation).
| |
****
-- Network Bridge Setup / VMs --

Up until now, I've covered how to provision machines with virt-manager, how to back them up on the physical host, and how to pull those backups down to an off-site workstation. Now I will discuss how to assign each VM an external IP. The first step is to provision the physical host with a virtual switch (wrongly called a bridge) to which VMs can connect. To do this, I kept it simple and used ''ifup'', the ''bridge-utils'' package, and some manual editing of ''/etc/network/interfaces''.

sudo apt install bridge-utils
sudo nano /etc/network/interfaces

Now that you have created the virtual switch, you need to reconfigure the physical host's ''/etc/network/interfaces'' file to use it. In my case, I used one IP for the host itself and another for the switch, meaning two ethernet cables are plugged into the physical host. I did this so that if I ever hose the virtual switch settings, I still have a separate connection to the box. Here's the configuration in ''interfaces'':

#eth0 [1st physical port]
nameserver 8.8.8.8

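Only a fragment of the file appears above. For illustration, a complete two-interface layout could look like the following; the addresses, gateway, and NIC names are placeholders for whatever your data center assigns, not the values from my host:

```
# /etc/network/interfaces -- sketch; substitute your own addresses
auto enp8s0g0
iface enp8s0g0 inet static
    address 203.0.113.10/24
    gateway 203.0.113.1

# second port carries no address; it is enslaved to the virtual switch
auto enp8s0g1
iface enp8s0g1 inet manual

auto br0
iface br0 inet static
    address 203.0.113.11/24
    bridge_ports enp8s0g1
    bridge_stp off
    bridge_fd 0
```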
After that, either reboot or run ''systemctl restart networking.service'' to apply the changes. Execute ''ip a'' and you should see both external IPs on their two interfaces, with ''br0 state UP'' in the output for the second interface, ''enp8s0g1''. Also run some ''ping 8.8.8.8'' and ''ping google.com'' tests to confirm you can route. For a home, small business, or other non-public-facing environment, you can simply use dhcp and provision the server's ''interfaces'' file as follows:

auto eth1
bridge_ports eth1

The above home version lets users have a virtual machine that gets an IP address on your LAN, which makes ssh/xrdp access far easier. If you have trouble routing on the physical host, it may be that no nameservers are set up. If so, do the following:

echo nameserver 8.8.8.8 > /etc/resolv.conf
systemctl restart networking.service

Now that the virtual switch is set up, I can provision VMs and connect them to ''br0'' in virt-manager. You can provision the VMs within the GUI using X passthrough, or use the command line. First, create a virtual disk of the desired size by executing ''sudo qemu-img create -f raw new.img 1000G'' and then run something like this:

sudo virt-install --name=new.img \
--os-type=Linux \
--os-variant=debian10 \
--vcpus=1 \
--ram=2048 \
--disk path=/mnt/vms/students/new.img \
--graphics spice \
--location=/mnt/vms/isos/debian-11.4.0-amd64-netinst.iso \
--network bridge:br0

The machine will open in virt-viewer, but if you lose the connection you can reconnect easily with:

virt-viewer --connect qemu:///system --wait new.img

Once you finish installation, configure the guest OS's interfaces file (''sudo nano /etc/network/interfaces'') with the IP you intend to assign it. You should have something like this:

auto epr1
nameservers 8.8.8.8

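The stanza above is abbreviated; a complete static configuration for the guest could look like this sketch (the address and gateway are placeholders, and ''dns-nameservers'' takes effect when the ''resolvconf'' package is installed):

```
auto epr1
iface epr1 inet static
    address 203.0.113.12/24
    gateway 203.0.113.1
    dns-nameservers 8.8.8.8
```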
If you are creating VMs attached to a virtual switch in the smaller home/business environment, then adjust the guest OS by executing ''sudo nano /etc/network/interfaces'' with something like this recipe:

auto epr1
iface epr1 inet dhcp

If your guest OS is Ubuntu, you will need a few extra steps to ensure that it can route. Ubuntu-based distros have deprecated ''ifupdown'' in favor of ''netplan'' and disabled manual editing of ''/etc/resolv.conf''. So, either learn netplan syntax and write the interface configuration in its YAML format, or install the ''resolvconf'' package to restore ''ifupdown'' functionality. For the latter, adjust the VM provision script above (or use the virt-manager GUI with X passthrough) to temporarily use NAT, then override the Ubuntu defaults and restore ''ifupdown'' as follows:

sudo apt install ifupdown
sudo apt remove --purge netplan.io
sudo apt install resolvconf
sudo nano /etc/resolvconf/resolv.conf.d/tail
<nameserver 8.8.8.8>
systemctl restart networking.service

You should once again execute ''ping 8.8.8.8'' and ''ping google.com'' to confirm you can route from within the guest OS. If the tests fail, reboot and try again. It's also a good idea at this point to check ''netstat -tulpn'' on both the host and in any VMs to ensure only approved services are listening. When I first began spinning up machines, I would make template machines and then use ''virt-clone'' to create new ones, which I would then tweak for each new use case. You always get ssh host-key warnings that way, and it is cumbersome and not clean. Over time, I learned how to pass preseed.cfg files to Debian through virt-install, so now I simply spin up new images with the desired parameters, and the preseed.cfg file passes nameservers, network configuration details, and ssh keys into the newly created machine. Although related, that topic stands on its own, so I wrote up the steps I took over at [[computing:preseed]]. This completes the tutorial on setting up a virtualization stack with virsh and qemu/kvm.

--- //[[webmaster@haacksnetworking.org|oemb1905]] 2024/02/17 20:46//