this post was submitted on 26 Nov 2025
128 points (97.1% liked)

Selfhosted

53155 readers
487 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago
MODERATORS
 

Got a warning for my blog going over 100GB in bandwidth this month... which sounded incredibly unusual. My blog is text and a couple images and I haven't posted anything to it in ages... like how would that even be possible?

Turns out it's possible when you have crawlers going apeshit on your server. Am I even reading this right? 12,181 with 181 zeros at the end for 'Unknown robot'? This is actually bonkers.

Edit: As Thunraz points out below, there's a footnote that reads "Numbers after + are successful hits on 'robots.txt' files" and not scientific notation.

Edit 2: After doing more digging, the culprit is a post where I shared a few wallpapers for download. The bots have been downloading these wallpapers over and over, using 100GB of bandwidth usage in the first 12 days of November. That's when my account was suspended for exceeding bandwidth (it's an artificial limit I put on there awhile back and forgot about...) that's also why the 'last visit' for all the bots is November 12th.

you are viewing a single comment's thread
view the rest of the comments
[–] Lee@retrolemmy.com 15 points 2 hours ago

A friend (works in IT, but asks me about server related things) of a friend (not in tech at all) has an incredibility low traffic niche forum. It was running really slow (on shared hosting) due to bots. The forum software counts unique visitors per 15 mins and it was about 15k/15 mins for over a week. I told him to add Cloudflare. It dropped to about 6k/15 mins. We excitemented turning Cloudflare off/on and it was pretty consistent. So then I put Anubis on a server I have and they pointed the domain to my server. Traffic drops to less than 10/15 mins. I've been experimenting with toggling on/off Anubis/Cloudflare for a couple months now with this forum. I have no idea how the bots haven't scrapped all of the content by now.

TLDR: in my single isolated test, Cloudflare blocks 60% of crawlers. Anubis blocks presumably all of them.

Also if anyone active on Lemmy runs a low traffic personal site and doesn't know how or can't run Anubis (eg shared hosting), I have plenty of excess resources I can run Anubis for you off one of my servers (in a data center) at no charge (probably should have some language about it not being perpetual, I have the right to terminate without cause for any reason and without notice, no SLA, etc). Be aware that it does mean HTTPS is terminated at my Anubis instance, so I could log/monitor your traffic if I wanted as well, so that's a risk you should be aware of.