this post was submitted on 07 Jul 2025
589 points (98.2% liked)

Open Source

top 50 comments
[–] unexposedhazard@discuss.tchncs.de 127 points 4 days ago* (last edited 4 days ago) (3 children)

Non-paywalled link: https://archive.is/VcoE1

It basically boils down to making the browser do some CPU-heavy calculations before allowing access. This is no problem for a single user, but for a bot farm it would increase the amount of compute power they need by 100x or more.
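
For a rough sense of how that looks, here is a minimal proof-of-work sketch (Python just for illustration; it assumes a simple "hash must start with N zero hex digits" target, which is the general idea rather than Anubis's exact challenge format):

```python
import hashlib
import itertools

def solve(challenge: str, difficulty: int) -> int:
    """Client side: grind nonces until the hash meets the difficulty target.
    This is the expensive part the browser has to do."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash to check, so verification stays cheap."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# One visitor solves one challenge and is done; a scraper farm hitting
# thousands of pages has to pay this cost over and over.
nonce = solve("example-challenge-token", 4)
assert verify("example-challenge-token", nonce, 4)
```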

[–] Mubelotix@jlai.lu 77 points 4 days ago (5 children)

Exactly. It's called proof-of-work and was originally invented to reduce email spam, but was later used by Bitcoin to control its growth speed.

[–] JackbyDev@programming.dev 8 points 3 days ago (2 children)

It's funny that older captchas could be viewed as proof-of-work algorithms now, because image recognition is so good. (From using captchas.)

[–] Mubelotix@jlai.lu 6 points 3 days ago* (last edited 3 days ago)

Interesting stance. I have bought many tens of thousands of captcha solves for legitimate reasons, and I have now completely lost faith in them.

[–] exu@feditown.com 12 points 3 days ago

It inherently blocks a lot of the simpler bots by requiring JavaScript as well.

[–] Panda@lemmy.today 33 points 3 days ago (2 children)

I've seen this pop up on websites a lot lately. Usually it takes a few seconds to load the website, but there have been occasions where it seemed to hang, stuck on that screen for minutes, and I ended up closing my browser tab because the website just wouldn't load.

Is this a (known) issue or is it intended to be like this?

[–] lime@feddit.nu 27 points 3 days ago (1 children)

Anubis is basically a Bitcoin miner with the difficulty turned way down (and obviously not resulting in any coins), so the time it takes is inherently random. If it takes minutes, something does seem wrong though. Maybe a network error?

[–] isolatedscotch@discuss.tchncs.de 21 points 3 days ago* (last edited 3 days ago) (1 children)

Adding to this, some sites set the difficulty way higher than others: nerdvpn's Invidious and Redlib instances take about 5 seconds and some ~20k hashes, while privacyredirect's instances are almost instant with fewer than 50 hashes each time.
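
Those numbers roughly track how the work scales: each extra leading zero hex digit the hash has to start with multiplies the expected hash count by 16 (a back-of-the-envelope sketch, assuming a leading-zero target like the one sketched above; actual instance settings may differ):

```python
# Expected hashes for a "digest must start with N zero hex digits" target.
for difficulty in range(1, 6):
    print(f"difficulty {difficulty}: ~{16 ** difficulty} hashes on average")
# difficulty 1: ~16, 2: ~256, 3: ~4096, 4: ~65536, 5: ~1048576
```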

[–] RepleteLocum@lemmy.blahaj.zone 7 points 3 days ago (4 children)

So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?

So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?

I mean, kinda? You are absolutely right that someone with an old PC might need to wait a few extra seconds, but the speed is ultimately throttled by the browser.

[–] MadPsyentist@lemmy.nz 5 points 3 days ago

Just wait till they hit my homepage with a 200 MB React frontend, 9 separate tracking/analytics scripts, and generic Shopify scripts on it :P

[–] CrabAndBroom@lemmy.ml 5 points 3 days ago

Isn't that just the way things work in general though? If you have a worse computer, everything is going to be slower, broadly speaking.

[–] JackbyDev@programming.dev 6 points 3 days ago

Well, it's the scrapers that are causing the problem.

[–] fuzzy_tinker@lemmy.world 93 points 4 days ago (7 children)

This is fantastic and I appreciate that it scales well on the server side.

AI scraping is a scourge, and I would love to know the collective amount of power wasted on necessary countermeasures like this, added to the total already wasted by AI.

[–] grysbok@lemmy.sdf.org 52 points 4 days ago

My archive's server uses Anubis and after initial configuration it's been pain-free. Also, I'm no longer getting multiple automated emails a day about how the server's timing out. It's great.

We went from about 3000 unique "pinky swear I'm not a bot" visitors per (iirc) half a day to 20 such visitors. Twenty is much more in line with expectations.

[–] Jankatarch@lemmy.world 43 points 4 days ago

Every time I see Anubis I get happy, because I know the website has some quality information.

[–] refalo@programming.dev 21 points 4 days ago* (last edited 4 days ago) (2 children)

I don't understand how/why this got so popular out of nowhere... the same solution has existed for years in the form of haproxy-protection and a couple of others... but nobody seems to care about those.

[–] Flipper@feddit.org 47 points 3 days ago (1 children)

Probably because the creator had a blog post that got shared around at a point in time where this exact problem was resonating with users.

It's not always about being first but about marketing.

[–] JohnEdwa@sopuli.xyz 27 points 3 days ago* (last edited 3 days ago) (1 children)

It’s not always about being first but about marketing.

And one has a cute catgirl mascot, the other a website that looks like a blockchain techbro startup.
I'm even willing to bet the number of people who set up Anubis just to get the cute splash screen isn't insignificant.

[–] JackbyDev@programming.dev 19 points 3 days ago

Compare and contrast.

High-performance traffic management and next-gen security with multi-cloud management and observability. Built for the enterprise — open source at heart.

Sounds like some overpriced, vacuous, do-everything solution. Looks and sounds like every other tech website. Looks like it is meant to appeal to the people who still say "cyber". Looks and sounds like fauxpen source.

Weigh the soul of incoming HTTP requests to protect your website!

Cute. Adorable. Baby girl. Protect my website. Looks fun. Has one clear goal.

[–] bdonvr@thelemmy.club 30 points 4 days ago (10 children)

Ooh can this work with Lemmy without affecting federation?

[–] beyond@linkage.ds8.zone 32 points 4 days ago (1 children)

Yes.

Source: I use it on my instance and federation works fine

[–] bdonvr@thelemmy.club 16 points 4 days ago (1 children)

Thanks. Anything special about configuring it?

[–] beyond@linkage.ds8.zone 20 points 4 days ago* (last edited 4 days ago)

I keep my server config in a public git repo, but I don't think you have to do anything really special to make it work with lemmy. Since I use Traefik I followed the guide for setting up Anubis with Traefik.

I don't expect to run into issues, as Anubis specifically looks for user-agent strings that look like human users (i.e. they contain the word "Mozilla", as most graphical web browsers do). Any request clearly coming from a bot that identifies itself is left alone, and Lemmy identifies itself as "Lemmy/{version} +{hostname}" in requests.
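
Roughly, that kind of User-Agent gate looks like this (an illustrative sketch, not Anubis's actual code; the Lemmy version and hostname below are made up for the example):

```python
def needs_challenge(user_agent: str) -> bool:
    """Only challenge clients that present themselves as graphical browsers.
    Requests that honestly identify as bots or services pass straight through."""
    return "Mozilla" in user_agent

# A normal browser gets the proof-of-work page...
assert needs_challenge("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0")
# ...while a federating Lemmy instance (hypothetical version/hostname) does not.
assert not needs_challenge("Lemmy/0.19 +https://lemmy.example")
```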

[–] thedeadwalking4242@lemmy.world 8 points 3 days ago (1 children)

I know people love anime, myself included, but this popping up on my work PC can be frustrating.

[–] ILikeBoobies@lemmy.ca 10 points 3 days ago

Contact the administrator to ask them to change the landing page

[–] medem@lemmy.wtf 24 points 4 days ago (7 children)

What advantage does this software provide over simply banning bots via robots.txt?

[–] kcweller@feddit.nl 89 points 4 days ago

Robots.txt expects the client to respect the rules, for instance by marking that they are a scraper.

AI scrapers don't respect this trust, and thus robots.txt is meaningless.

[–] medem@lemmy.wtf 47 points 4 days ago

Well, now that y'all put it that way, I think it was pretty naive of me to think that these companies, whose business model is basically theft, would honour a lousy robots.txt file...

[–] irotsoma@lemmy.blahaj.zone 29 points 4 days ago

TL;DR: You should have both due to the explicit breaking of the robots.txt contract by AI companies.

AI generally doesn't obey robots.txt. That file just tells scrapers what they shouldn't scrape, but it relies on the good faith of the scrapers. Many AI companies have explicitly chosen not to comply with robots.txt, thus breaking the contract, so this is a system that makes scrapers unwilling to comply get stuck in a black hole of junk and waste their time. This is a countermeasure, but not a solution. It's just way less complex than other options that simply block these connections but then make you get pounded with retries. This way the scraper bot gets stuck for a while, and you don't waste as many resources blocking it over and over again.
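
To make the good-faith point concrete: honouring robots.txt is something a crawler has to opt into. A quick sketch using Python's standard-library robotparser (example.org is just a stand-in URL):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.org/robots.txt")  # stand-in URL for the example
rp.read()

# A polite crawler asks before fetching and skips disallowed paths...
if rp.can_fetch("PoliteBot/1.0", "https://example.org/archive/"):
    print("allowed, go ahead")

# ...but nothing server-side enforces this: a scraper that never calls
# can_fetch() just requests the page anyway, which is why robots.txt on
# its own can't stop AI crawlers.
```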

The scrapers ignore robots.txt. It doesn't really ban them; it just asks them not to access things, but they are programmed by assholes.

[–] interdimensionalmeme@lemmy.ml 9 points 3 days ago

Open source is also the AI scraper bots AND the internet itself; it is every character in the story.
