| - | Introduction | + | ------------------------------------------- |
| + | * **btrfsreminders** | ||
| + | * **Jonathan Haack** | ||
| + | * **Haack' | ||
| + | * **support@haacksnetworking.org** | ||
| + | |||
| + | ------------------------------------------- | ||
| + | |||
| + | // | ||
| + | |||
| + | ------------------------------------------- | ||
| + | |||
| + | === Introduction | ||
This tutorial is for Debian users who want to create a JBOD pool using BTRFS subvolumes, along with its RAID10 equivalent. These setups are common and useful for virtualization environments and for hosting multiple services, whether for serious home-hobbyist use or small-business production. They are not designed for enterprise or large-scale production.
| - | Overview of setups | + | === Overview of Design Model === |
| - | Encrypting the home partition is essential because it ensures that the pool key is never directly exposed; its behind LUKS on the boot volume and the sysadmin keeps this credential stored in KeePassXC offsite. Thus, the physical layer is protected by LUKS with integrity. As for Pam's mounting utilities, I use [[https:// | + | Encrypting the home partition is essential because it ensures that the pool key is never directly exposed; its behind LUKS on the boot volume and the sysadmin keeps this credential stored in KeePassXC offsite. Thus, the physical layer is protected by LUKS with integrity. As for Pam's mounting utilities, I use [[https:// |
| + | |||
| + | === Installation Instructions === | ||
| + | Let's install btrfs, LUKS, and identify your hard drives: | ||
| sudo apt-get install cryptsetup libpam-mount btrfs* | sudo apt-get install cryptsetup libpam-mount btrfs* | ||
  cryptsetup luksOpen /

Now that we have opened the crypts, create the mount points and build the two RAID10 pools:

  mkdir -p /mnt/vm
  mkdir -p /mnt/wh
  mkfs.btrfs -f -d raid10 -m raid1 --checksum=xxhash --nodesize=32k /
  mkfs.btrfs -f -d raid10 -m raid1 --checksum=xxhash --nodesize=32k /
  mount -o compress-force=zstd:
  mount -o compress=zstd:
  btrfs filesystem show /mnt/vm
  btrfs filesystem show /mnt/wh
  df -h #verify all looks right!

After the first reboot, I set persistent compression. I did this because I was getting errors when trying to set it during the initial pool build. Here's what I do for compression:

  btrfs property set /mnt/vm compression zstd:3
  btrfs property set /mnt/wh compression zstd:3
| + | === Maintenance and Monitoring === | ||
| + | Once that's done and you've rebooted a few times and tested things a few times, you can safely make a mount script for remote rebooting. This way, you reboot and then log in to your user and detach, run a simple script to unlock and mount the BTRFS subvolumes ... and you are done! Create '' | ||
| + | |||
| + | #!/bin/bash | ||
| + | #open SSD crypts | ||
| + | cryptsetup luksOpen / | ||
| + | cryptsetup luksOpen / | ||
| + | cryptsetup luksOpen / | ||
| + | cryptsetup luksOpen / | ||
| + | cryptsetup luksOpen / | ||
| + | cryptsetup luksOpen / | ||
| + | cryptsetup luksOpen / | ||
| + | cryptsetup luksOpen / | ||
| + | #open PLATTER crypts | ||
| + | cryptsetup luksOpen / | ||
| + | cryptsetup luksOpen / | ||
| + | cryptsetup luksOpen / | ||
| + | cryptsetup luksOpen / | ||
| + | #mount the btrfs r10 pool for vm | ||
| + | mount -o compress-force=zstd: | ||
| + | #mount the btrfs r10 pool for wh | ||
| + | mount -o compress=zstd: | ||
| + | | ||
| + | This script is designed to be run manually post reboot. In order, you reboot, log in to the admin user via ssh, unlock the crypt key directory with '' | ||
| + | |||
| + | / | ||
| + | / | ||
| + | | ||
| + | To check the status, you use: | ||
| + | |||
| + | / | ||
| + | / | ||
| + | | ||
| + | In addition to scrubbing, I compiled a slew of commands to assess pool health more granularly. I put this script on a cronjob which runs and sends me a statistics report every hour: | ||
| + | |||
| + | <code bash> | ||
| + | #!/bin/bash | ||
| + | DATE=`date +" | ||
| + | LOG="/ | ||
| + | |||
| + | echo "Here are the RAM usage stats ..." >> $LOG | ||
| + | free -h | ||
| + | |||
| + | echo "Here are the btrfs stats for the vm pool ..." >> $LOG | ||
| + | btrfs filesystem show /mnt/vm | ||
| + | btrfs filesystem df /mnt/vm | ||
| + | btrfs filesystem usage /mnt/vm | ||
| + | btrfs device usage /mnt/vm | ||
| + | btrfs scrub status /mnt/vm | ||
| + | btrfs device stats /mnt/vm | ||
| + | btrfs device stats /mnt/vm -c | ||
| + | mount | grep /mnt/vm | ||
| + | dmesg | grep -i btrfs | tail -n 40 | ||
| + | dmesg | grep -E ' | ||
| + | btrfs fi show /mnt/vm | grep -i missing | ||
| + | btrfs fi df -h /mnt/vm | ||
| + | btrfs fi usage -T /mnt/vm | ||
| + | btrfs qgroup show /mnt/vm 2>/ | ||
| + | btrfs subvolume list -a /mnt/vm | ||
| + | btrfs balance status /mnt/vm | ||
| + | |||
| + | echo "Here are the btrfs stats for the wh pool ..." >> $LOG | ||
| + | btrfs filesystem show /mnt/wh | ||
| + | btrfs filesystem df /mnt/wh | ||
| + | btrfs filesystem usage /mnt/wh | ||
| + | btrfs device usage /mnt/wh | ||
| + | btrfs scrub status /mnt/wh | ||
| + | btrfs device stats /mnt/wh | ||
| + | btrfs device stats /mnt/wh -c | ||
| + | mount | grep /mnt/wh | ||
| + | dmesg | grep -i btrfs | tail -n 40 | ||
| + | dmesg | grep -E ' | ||
| + | btrfs fi show /mnt/wh | grep -i missing | ||
| + | btrfs fi df -h /mnt/wh | ||
| + | btrfs fi usage -T /mnt/wh | ||
| + | btrfs qgroup show /mnt/wh 2>/ | ||
| + | btrfs subvolume list -a /mnt/wh | ||
| + | btrfs balance status /mnt/wh | ||
| + | |||
| + | for disk in \ | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | temp=$(sudo smartctl -a " | ||
| + | echo " | ||
| + | done | ||
| + | |||
| + | for disk in \ | ||
| + | / | ||
| + | / | ||
| + | temp=$(sudo smartctl -a " | ||
| + | echo " | ||
| + | done | ||
| + | </ | ||
| + | |||
| + | If you use the script above, you will also need to '' | ||
| + | |||
| + | <code bash> | ||
| + | root@net:~# free -h | ||
| + | | ||
| + | Mem: | ||
| + | Swap: | ||
| + | root@net:~# / | ||
| + | UUID: | ||
| + | Scrub started: | ||
| + | Status: | ||
| + | Duration: | ||
| + | Total to scrub: | ||
| + | Rate: | ||
| + | Error summary: | ||
| + | </ | ||
| + | |||
| + | To test or compare your new pool's speed to your prior setup and/or just to obtain some benchmarks, I recommend using '' | ||
| + | |||
| + | sudo apt install fio | ||
| + | sudo fio --name=seqread --rw=read --bs=128k --iodepth=32 --ioengine=libaio --direct=1 --size=4g --numjobs=8 --runtime=60 --group_reporting --filename=/ | ||
| + | sudo fio --name=seqwrite --rw=write --bs=128k --iodepth=32 --ioengine=libaio --direct=1 --size=4g --numjobs=8 --runtime=60 --group_reporting --filename=/ | ||
| + | sudo fio --name=seqread --rw=read --bs=128k --iodepth=32 --ioengine=libaio --direct=1 --size=4g --numjobs=8 --runtime=60 --group_reporting --filename=/ | ||
| + | sudo fio --name=seqwrite --rw=write --bs=128k --iodepth=32 --ioengine=libaio --direct=1 --size=4g --numjobs=8 --runtime=60 --group_reporting --filename=/ | ||
| + | |||
| + | With zfs on my production server, I found I was still getting the read speed of one hard drive, despite the presumed parallelization benefits from having 8 enterprise SAS SSDs in a R10 pool?! Since I migrated to BTRFS, the speeds are near hardware level caps. Here's the read test: | ||
| + | |||
| + | <code bash> | ||
| + | seqread: (g=0): rw=read, bs=(R) 128KiB-128KiB, | ||
| + | ... | ||
| + | fio-3.39 | ||
| + | Starting 8 processes | ||
| + | seqread: Laying out IO file (1 file / 4096MiB) | ||
| + | Jobs: 8 (f=8): [R(8)][100.0%][r=5797MiB/ | ||
| + | seqread: (groupid=0, jobs=8): err= 0: pid=2279596: | ||
| + | read: IOPS=42.1k, BW=5264MiB/ | ||
| + | slat (usec): min=11, max=28981, avg=106.92, stdev=402.06 | ||
| + | clat (usec): min=43, max=53886, avg=5831.38, | ||
| + | lat (usec): min=183, max=53910, avg=5938.30, | ||
| + | clat percentiles (usec): | ||
| + | | ||
| + | | 30.00th=[ 3064], 40.00th=[ 4113], 50.00th=[ 5080], 60.00th=[ 6063], | ||
| + | | 70.00th=[ 7242], 80.00th=[ 8717], 90.00th=[11469], | ||
| + | | 99.00th=[21365], | ||
| + | | 99.99th=[46924] | ||
| + | bw ( MiB/s): min= 4508, max= 6109, per=100.00%, | ||
| + | | ||
| + | lat (usec) | ||
| + | lat (msec) | ||
| + | lat (msec) | ||
| + | cpu : usr=2.22%, sys=28.68%, ctx=252056, majf=0, minf=8262 | ||
| + | IO depths | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | |||
| + | Run status group 0 (all jobs): | ||
| + | READ: bw=5264MiB/ | ||
| + | </ | ||
| + | |||
| + | Here's the write test: | ||
| + | <code bash> | ||
| + | seqwrite: (g=0): rw=write, bs=(R) 128KiB-128KiB, | ||
| + | ... | ||
| + | fio-3.39 | ||
| + | Starting 8 processes | ||
| + | seqwrite: Laying out IO file (1 file / 4096MiB) | ||
| + | Jobs: 6 (f=6): [W(6), | ||
| + | seqwrite: (groupid=0, jobs=8): err= 0: pid=2279720: | ||
| + | write: IOPS=12.2k, BW=1529MiB/ | ||
| + | slat (usec): min=38, max=33255, avg=595.61, stdev=1120.04 | ||
| + | clat (usec): min=176, max=96135, avg=18562.40, | ||
| + | lat (usec): min=264, max=96296, avg=19158.01, | ||
| + | clat percentiles (usec): | ||
| + | | ||
| + | | 30.00th=[14222], | ||
| + | | 70.00th=[19792], | ||
| + | | 99.00th=[53216], | ||
| + | | 99.99th=[79168] | ||
| + | bw ( MiB/s): min= 1074, max= 2563, per=100.00%, | ||
| + | | ||
| + | lat (usec) | ||
| + | lat (msec) | ||
| + | lat (msec) | ||
| + | cpu : usr=2.07%, sys=64.47%, ctx=142306, majf=0, minf=20562 | ||
| + | IO depths | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | |||
| + | Run status group 0 (all jobs): | ||
| + | WRITE: bw=1529MiB/ | ||
| + | </ | ||
| + | |||
| + | In lay terms, these reports confirm that read speed is 5,520 MB/s, or 5.5 GB/s, and write speed is 1,603 MB/s, or 1.6 GB/s. This is a 4x improvement for reads and 2x improvement for writes compared to zfs. For whatever reason, zfs was not benefitting from the parallelization. It's possible that I could get zfs to perform better with tinkering, but why? Every major upgrade I have to re-compile it with dkms against the new kernel headers, which takes forever. Additionally, | ||
| + | --- // | ||