-------------------------------------------
This tutorial is designed for Debian OS and LAMP stack users who want to track and/or prohibit bot scraping (or other URL requests) that might harm server performance or cause the server to fail. In my case, I have a multi-site WordPress installation that includes my tech blog, poetry, and teaching blog. Additionally,
  * [[https://
  ignoreregex =
It is important to note that this definition stops any URL requests that exceed 20K/min, not just AI bots, hence the title AI-bots and a “+” sign. Also, if you prefer a jail that only bans the bots, and not all heinously large URL request volumes, then adjust your jail to something like this instead:

  [Definition]
  failregex = ^<HOST> - - [.] "GET [^"
  ignoreregex =
  ignorecase = true
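For reference, a filter along those lines might look roughly like the sketch below. The filename and the bot names in the alternation are assumptions on my part, so substitute whatever user agents actually show up in your Apache access log (combined log format assumed); ''<HOST>'' is fail2ban's placeholder for the offending IP.

  # /etc/fail2ban/filter.d/ai-bots.conf  (filename and bot names are
  # assumptions; adjust to the user agents you see in your own logs)
  [Definition]
  # Match any GET line in Apache's combined log whose user-agent field
  # contains one of the listed bot names, case-insensitively
  failregex = (?i)^<HOST> -.*"GET [^"]*".*"[^"]*(GPTBot|ClaudeBot|CCBot|Bytespider)[^"]*"$
  ignoreregex =

Because a filter like this only ever matches the bots you name, the jail's ''maxretry'' can be set far lower than the 20K rate rule, even to 1, if you want a single bot hit to trigger the ban.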
Next, I restarted apache, fpm, and fail2ban, and because I'm paranoid, I restarted them a second time and checked each service's status to ensure that nothing was misconfigured. All was in order, so it was now time to DDOS my machine from another server of mine. To do this, I used ''ab'' (ApacheBench):
  ab -n 30000 -c 1000 http://
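To confirm the jail actually fires during a test like this, ''fail2ban-client'' and the fail2ban log are the quickest checks. The jail name ''ai-bots'' and the test IP below are assumptions, so swap in your own:

  # Show the jail's current state and any banned addresses
  sudo fail2ban-client status ai-bots
  # Watch ban/unban events arrive in real time
  sudo tail -f /var/log/fail2ban.log
  # Lift the ban on your own test machine early if needed
  sudo fail2ban-client set ai-bots unbanip 203.0.113.10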
It was overkill to do 30K, but I needed to make sure I did enough to trigger the rule. It worked properly and banned the IP accordingly for 10 minutes. In short, if the bots in the weeks ahead get too frisky, I can time them out at whatever rate (URL requests per minute) I choose, for however many minutes I feel is appropriate. Since my server and virtual appliance can handle it, I've set my rule to 20K requests in one minute as the ceiling I tolerate. Your use case, hardware, personal tolerance of bot behavior, and associated bias toward bot behavior might differ. Maybe your hardware is less or more robust, maybe you think the bots deserve a longer or shorter time out, or maybe you want to ban any IP that does this indefinitely. For me, I want bots to scrape my wiki, tech blog, poetry, or whatever else they want, and I don't think it's really fair or honest of me to request that they change their behavior for my public site. If I held that sentiment, I would not have made these resources public. But I also don't want my server to go down because they scrape or request too much, amounting to a de facto DDOS attack. So, this is what I cooked up to ensure I am prepared in the weeks ahead, and I hope it also helps others who use Debian + LAMP stacks protect their appliances using common tools available to Debian users.
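To make those knobs concrete, a jail stanza along these lines is one way to express the 20K/min ceiling and the 10-minute timeout. The jail and filter names and the log path are assumptions; ''maxretry'', ''findtime'', and ''bantime'' are where you dial in your own rate and ban length:

  # /etc/fail2ban/jail.local  (jail name, filter name, and log path
  # are assumptions; adjust them to your own setup)
  [ai-bots]
  enabled  = true
  port     = http,https
  filter   = ai-bots
  logpath  = /var/log/apache2/access.log
  # 20000 matches inside a 60-second window = the 20K/min ceiling
  maxretry = 20000
  findtime = 60
  # time offenders out for 10 minutes; bantime = -1 bans indefinitely
  bantime  = 600

Raising ''bantime'', or setting it to ''-1'', covers the longer or indefinite bans mentioned above.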
--- //