this post was submitted on 26 Nov 2025
128 points (97.1% liked)

Selfhosted

Got a warning for my blog going over 100GB in bandwidth this month... which sounded incredibly unusual. My blog is text and a couple of images, and I haven't posted anything to it in ages... like how would that even be possible?

Turns out it's possible when you have crawlers going apeshit on your server. Am I even reading this right? 12,181 with 181 zeros at the end for 'Unknown robot'? This is actually bonkers.

Edit: As Thunraz points out below, there's a footnote that reads "Numbers after + are successful hits on 'robots.txt' files", so it's not scientific notation after all.

Edit 2: After doing more digging, the culprit is a post where I shared a few wallpapers for download. The bots have been downloading those wallpapers over and over, burning through 100GB of bandwidth in the first 12 days of November. That's when my account was suspended for exceeding bandwidth (an artificial limit I put on there a while back and forgot about...), which is also why the 'last visit' for all the bots is November 12th.
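For anyone else hitting this, here's a minimal robots.txt sketch for asking crawlers to stay out of the download path. The /wp-content/uploads/ path is just the WordPress default and an assumption here, and the misbehaving 'Unknown robot' traffic may well ignore robots.txt entirely, in which case a server-level block is still needed:

```
# Minimal robots.txt sketch. The uploads path below is the WordPress default
# and an assumption; point it at wherever the wallpapers actually live.
# Well-behaved crawlers will stop fetching the files after reading this;
# badly behaved ones ignore robots.txt and need blocking at the web server.
User-agent: *
Disallow: /wp-content/uploads/
```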

[–] Thunraz@feddit.org 26 points 3 hours ago (1 children)

It's 12,181 hits, and the number after the plus sign is the count of robots.txt hits. See the footnote at the bottom of your screenshot.

[–] benagain@lemmy.ml 11 points 3 hours ago (2 children)

Phew, so I'm a dumbass and not reading it right. I wonder how they've managed to use 3MB per visit?

[–] arandomthought@sh.itjust.works 9 points 3 hours ago (1 children)

The robots are a problem, but luckily we're not into the hepamegaquintogilarillions... Yet.

[–] benagain@lemmy.ml 5 points 3 hours ago* (last edited 3 hours ago)

12,000 visits, with 181 of those to the robots.txt file, makes way, way more sense. The 'Not viewed traffic' adds up to 136,957 too, so I should have figured it out sooner.

I couldn't wrap my head around how large the number was, or how many visits it would actually take to reach it in 25 days. Turns out that would be roughly 5.64 quinquinquagintillion visits per nanosecond. Call it a hunch, but I suspect my server might not handle that.

[–] EarMaster@lemmy.world 4 points 3 hours ago (1 children)

The traffic is really suspicious. Do you by any chance have a health or heartbeat endpoint that provides continuous output? That would explain why so many hits cause so much traffic.

[–] benagain@lemmy.ml 2 points 2 hours ago

It's super weird for sure. I'm not sure how the bots have managed to use so much more bandwidth with only 30k more hits than regular traffic. I guess they probably don't rely on any caching and fetch each page from scratch?

Still going through my stats, but it doesn't look like I've gotten much traffic via any API endpoint (I'm running WordPress). I had a few wallpapers available for download, and it looks like the bots have latched onto those for whatever reason.
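If anyone wants to do the same digging from raw logs rather than AWStats, here's a rough sketch assuming an Apache/nginx combined-format access log. The log path and the 'wallpaper' filename substring are assumptions; it counts which user agents are requesting the wallpaper files:

```sh
# Count requests mentioning "wallpaper" per user agent. In combined log format
# the user agent is the 6th double-quote-delimited field; log path is assumed.
awk -F'"' '/wallpaper/ {print $6}' /var/log/apache2/access.log \
  | sort | uniq -c | sort -rn | head
```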