  * **Jonathan Haack**
  * **Haack's Networking**
  * **webmaster@haacksnetworking.org**
  
-------------------------------------------
  
//vmserver//
  
-------------------------------------------

This tutorial covers how to set up a production server intended to be used as a virtualization stack for a small business or educator. I am currently running a Supermicro 6028U-TRTP+ with dual 12-core Xeon E5-2650 CPUs at 2.2GHz and 384GB RAM, with four two-way mirrors of Samsung enterprise SSDs for the primary zpool and two two-way mirrors of 16TB platters for the backup zpool. All drives are SAS. I am using a 500W PSU: I determined the RAM would draw about 5-10W per stick, the motherboard about 100W, and the drives would consume most of the rest at roughly 18-22W per drive. The next step was to install Debian on the bare metal to control and manage the virtualization environment; the virtualization stack is virsh and kvm/qemu. As for the file system and drive formatting, I used LUKS and pam_mount to open an encrypted home partition and mapped home directory. I use this encrypted home directory to store keys for the zfs pools and/or other sensitive data, thus protecting them behind full disk encryption. Additionally, I create encrypted zfs datasets within each pool that are unlocked by the keys held on the LUKS home partition. Instead of tracking each UUID down on your initial build, do the following:
  
  zpool create -m /mnt/pool pool -f mirror sda sdb mirror sdc sdd mirror sde sdf mirror sdg sdh
  zpool export pool
  zpool import -d /dev/disk/by-id pool
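
For reference, the encrypted home partition that holds the keys is ordinary LUKS unlocked at login by pam_mount. A minimal sketch of that piece, assuming a dedicated partition ''/dev/sdX3'' and a user named ''admin'' (both placeholders, not the actual layout of this build):

  # create and format the LUKS container that will back /home
  cryptsetup luksFormat /dev/sdX3
  cryptsetup luksOpen /dev/sdX3 home_crypt
  mkfs.ext4 /dev/mapper/home_crypt
  # have pam_mount unlock and mount it at login by adding a volume entry
  # to /etc/security/pam_mount.conf.xml, e.g.:
  #   <volume user="admin" fstype="crypt" path="/dev/sdX3" mountpoint="/home/admin" />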
  
Once the pool is created, you can create your encrypted datasets. To do so, I made unlock keys with the dd command and placed them in a hidden directory inside the LUKS-encrypted home partition mentioned above:
  
  dd if=/dev/random of=/secure/area/example.key bs=1 count=32
  zfs create -o encryption=on -o keyformat=raw -o keylocation=file:///mnt/vault/example.key pool/dataset
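
To confirm a dataset picked up the right settings, you can query its encryption properties (shown for the example ''pool/dataset'' above):

  # verify encryption is on and the key is currently loaded
  zfs get encryption,keylocation,keystatus pool/dataset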
  
When the system reboots, the pool will mount automatically, but the encrypted datasets won't, because the LUKS keys are not available until you mount the home partition by logging in as the user that holds the keys. For security reasons, this must be done manually or it defeats the entire purpose. So, once the administrator has logged in to that user in a screen session (remember, it is using pam_mount), they simply detach from that session and then load the keys and mount the datasets as follows:
  
  zfs load-key pool/dataset
  zfs mount pool/dataset
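
If you have a lot of datasets, a short script can load them all, or you can let zfs walk every key that is currently reachable:

  # load all available keys, then mount every dataset
  zfs load-key -a
  zfs mount -a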
      
Since we have zfs, it's a good idea to take regular snapshots. To do that, I created a small shell script with the following commands and set it to run via cron four times a day, or every six hours:
  
  DATE=$(date +"%Y%m%d-%H:%M:%S")
  /usr/sbin/zfs snapshot pool@backup_$DATE
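
The corresponding cron entry for an every-six-hours schedule could look like this (the script path is a placeholder):

  # root crontab: snapshot every six hours
  0 */6 * * * /usr/local/bin/zfs-snapshots.sh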
  
Make sure to manage your snapshots and only retain as many as you actually need, as a large number of snapshots will impact performance. If you need to zap all of them and start over, you can use this command:
  
  zfs list -H -o name -t snapshot | xargs -n1 zfs destroy
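
A gentler alternative is to prune by count rather than destroy everything; a sketch, assuming the ''pool@backup_'' naming used above and a retention of the newest 28 snapshots:

  # list snapshots oldest-first, keep the newest 28, destroy the rest
  zfs list -H -o name -t snapshot -s creation | grep '^pool@backup_' | head -n -28 | xargs -r -n1 zfs destroy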
  
Off-site //full// backups are essential, but they take a long time to download. For that reason, it's best to keep the images as small as possible. When using ''cp'' in your workflow, make sure to specify ''--sparse=always''. Before powering the virtual hard disk back up, you should run ''virt-sparsify'' on the image to free up blocks on the host that are not actually used in the VM. In order for the VM to designate those blocks as empty, ensure that you are running fstrim within the VM. If you want the ls command to show the true size of the virtual disk after the zeroing, you will also need to run ''qemu-img convert'' on it, which writes a new copy of the image without the ballooned size. The purged virtual hard disk image can then be copied to a backup directory where one can compress and tarball it to further reduce its size. I use BSD tar with pbzip2 compression, which makes ridiculously small images; GNU tar glitches with the script for some reason. BSD tar can be installed with ''sudo apt install libarchive-tools''. I made a script to automate all of those steps for qcow2 images, and adapted it to work for raw images as well:
  
[[https://repo.haacksnetworking.org/haacknet/haackingclub/-/blob/main/scripts/virtualmachines/vm-bu-production-QCOW-loop.sh|vm-bu-production-QCOW-loop.sh]] \\
[[https://repo.haacksnetworking.org/haacknet/haackingclub/-/blob/main/scripts/virtualmachines/vm-bu-production-RAW-loop.sh|vm-bu-production-RAW-loop.sh]]
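
Those scripts loop over multiple VMs; the core sequence for a single qcow2 image looks roughly like the sketch below. The VM name ''vm1'' and the paths are placeholders, and the exact flags in the production scripts may differ:

  # shut the VM down cleanly before touching its disk (shutdown is asynchronous)
  virsh shutdown vm1
  # write a sparsified copy, reclaiming blocks the guest has already fstrim'd
  virt-sparsify /mnt/vms/vm1.qcow2 /mnt/backups/vm1-sparse.qcow2
  # re-copy the image so the ballooned size is gone, then bring the VM back up
  qemu-img convert -O qcow2 /mnt/backups/vm1-sparse.qcow2 /mnt/backups/vm1-small.qcow2
  virsh start vm1
  # compress with bsdtar + pbzip2, preserving sparseness
  cd /mnt/backups
  bsdtar --use-compress-program=pbzip2 -Scf vm1_$(date +%Y%m%d).tar.bz2 vm1-small.qcow2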
  
On the off-site backup machine, I originally pulled the tarballs down with a one-line rsync script, adjusting its cron timing so it runs after the tarballs are created:
  
  sudo rsync -av --log-file=/home/logs/backup-of-vm-tarballs.log --ignore-existing -e 'ssh -i /home/user/.ssh/id_rsa' root@domain.com:/backups/tarballs/ /media/user/Backups/

Since then, I've switched to using rsnapshot to pull down the tarballs in some cases. The rsnapshot configurations can be found here:
  
[[https://repo.haacksnetworking.org/haacknet/haackingclub/-/tree/main/scripts/rsnapshot|Rsnapshot Scripts]]
  
  
****
  systemctl restart networking.service
  
You should once again execute ''ping 8.8.8.8'' and ''ping google.com'' to confirm you can route within the guest OS. If it fails, reboot and try again. It's a good idea at this point to check ''netstat -tulpn'' on both the host and in any VMs to ensure only approved services are listening.

When I first began spinning up machines, I would make template machines and then use ''virt-clone'' to make new machines, which I would then tweak for the new use case. You always get ssh hash errors that way, and it is cumbersome and not clean. Over time, I found out how to pass preseed.cfg files to Debian through virt-install, so now I simply spin up new images with the desired parameters and the preseed.cfg file passes nameservers, network configuration details, and ssh keys into the newly created machine (a rough sketch of such an invocation follows below). Although related, that topic stands on its own, so I wrote up the steps I took over at [[computing:preseed]].
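
As an illustration only (the authoritative steps live at [[computing:preseed]]), a virt-install run that injects a preseed file might look something like this; the VM name, sizes, disk path, bridge name, and installer location are placeholders:

  # unattended Debian install; preseed.cfg is injected into the installer initrd
  virt-install \
    --name testvm \
    --memory 4096 --vcpus 2 \
    --disk path=/mnt/pool/images/testvm.qcow2,size=40,format=qcow2 \
    --network bridge=br0 \
    --location http://deb.debian.org/debian/dists/bookworm/main/installer-amd64/ \
    --initrd-inject=/root/preseed.cfg \
    --extra-args "auto=true priority=critical console=ttyS0" \
    --graphics none --os-variant debian12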
One other thing people might want to do is enable some type of GUI-based monitoring tool on the physical host, such as munin, cacti, or smokeping, in order to monitor snmp or other characteristics of the VMs. If so, make sure you only run those web administration panels locally and/or block 443/80 in a firewall. You will want to put the physical host behind a vpn, as I've documented in [[computing:vpnserver-debian]], and then access it by its internal IP. This completes the tutorial on setting up a virtualization stack with virsh and qemu/kvm.
  
 --- //[[webmaster@haacksnetworking.org|oemb1905]] 2024/02/17 20:46//