hihi24522

joined 2 years ago
[–] hihi24522@lemm.ee 9 points 1 week ago

If it isn’t depressing, is it really The Great Gatsby?

[–] hihi24522@lemm.ee 4 points 1 week ago

People will get more from the idea he represented than the Jelly Bean he really was.

[–] hihi24522@lemm.ee 2 points 1 week ago

Garage motor special $100 off? Hooray!

Now if only I could afford a garage…

[–] hihi24522@lemm.ee 1 points 1 week ago

Sorry, the point I was trying to make is that we will be able to know if any statement that is testable is correct.

I just wanted to clarify that your initial comment is only true when you are counting things that don’t actually matter in science. Anything that actually matters can be tested/proven which means that science can be 100% correct for anything that’s actually relevant.

[–] hihi24522@lemm.ee 1 points 1 week ago

Gödel’s theorem is a logical proof about any axiomatic system in which basic arithmetic (addition and multiplication) can be expressed.

By nature, every scientific model that uses basic arithmetic relies on those kinds of axioms and is therefore incomplete.

Furthermore, the statement “we live in a simulation” is a logical statement with a truth value. Thus it is within the realm of first order logic, part of mathematics.

The reason you cannot prove the statement is because it itself is standalone. The statement tells you nothing about the universe, so you cannot construct any implication that can be proven directly, or by contradiction, or by proving the converse etc.

As for the latter half of your comment, I don’t think I’m the one who hasn’t thought about this enough.

You are the one repeating the line that “science doesn’t prove things” without realizing that it is a generalization, not an absolute statement. It also largely depends on what you call science.

Many people say that science doesn’t prove things, it disproves things. Technically both are mathematical proof. In fact, the scientific method is simply proving an implication wrong.

You form a hypothesis to test which is actually an implication “if (assumptions hold true), then (hypothesis holds true).” If your hypothesis is not true then it means your assumptions (your model) are not correct.
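That falsification step is just modus tollens. Writing $A$ for the assumptions (the model) and $H$ for the hypothesis’s prediction:

```latex
\[
\big( (A \Rightarrow H) \wedge \neg H \big) \;\Rightarrow\; \neg A
\]
```

If the prediction fails, at least one assumption in the model must be false.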

However, you can prove things directly in science very easily: Say you have a cat in a box and you think it might be dead. You open the box and it isn’t dead. You have now proven that the cat was not dead. You collected evidence and reached a true conclusion, and your limited model of the world with regard to the cat is proven correct. QED.

Say you have two clear crystals in front of you and you know one is quartz and one is calcite but you don’t remember which. But you have vinegar with you and you remember that it should cause a reaction with only the calcite. You place a drop of vinegar on the rocks and one starts fizzing slightly. Voilà, you have just directly proven that rock is the calcite.

Now you can only do this kind of proof when your axioms (that one rock is calcite, one rock is quartz, and only the calcite will react with the vinegar) hold true.
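As a toy sketch (my own illustration, not anyone’s real workflow), the vinegar test is a one-observation proof by elimination under those axioms:

```python
def identify(reacts_to_vinegar: bool) -> str:
    """Identify one of two clear crystals from the vinegar test.

    Axioms assumed true: exactly one rock is calcite, the other is
    quartz, and only the calcite fizzes in vinegar.
    """
    return "calcite" if reacts_to_vinegar else "quartz"
```

The conclusion is only as good as the axioms: if both rocks could react, the elimination no longer goes through.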

The quest of science, of philosophy, is to find axioms that hold true enough we can do these proofs to predict and manipulate the world around us.

Just like in mathematics, there are often multiple different sets of axioms that can explain the same things. It doesn’t matter if you have “the right ones.” You only need ones that are not wrong in your use case, and that are useful for whatever you want to prove things with.

The laws of thermodynamics have not been proven. They have been proven statistically but I get the feeling that you wouldn’t count statistics as a valid form of proof.

Fortunately, engineers don’t care what you think, and with those laws as axioms, engineers have proven that there cannot be any perpetual motion machines. Furthermore, Carnot was able to prove that there is a maximum efficiency heat engine and he was able to derive the processes needed to create one.
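Carnot’s bound, derived with the second law taken as an axiom, is the usual maximum efficiency for a heat engine running between a hot reservoir at absolute temperature $T_h$ and a cold one at $T_c$:

```latex
\[
\eta_{\max} \;=\; 1 - \frac{T_c}{T_h}
\]
```

Any engine claiming $\eta > \eta_{\max}$ would let you build a perpetual motion machine of the second kind, which is exactly the proof by contradiction described here.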

All inventions typically start as proofs based on axioms found by science. And oftentimes, science proves a model wrong by trying to do something, assuming the model was right, and then failing.

The point is that if our scientific axioms weren’t true, we would not be able to build things with them. We would not predict the world accurately. (Notice that statement is an implication) When this happens, (when that implication is proven false) science finds the assumption/axiom in our model that was proven wrong and replaces it with one or more assumptions that are more correct.

Science is a single massive logical proof by process of elimination.

The only arguments I’ve ever seen that it isn’t real proof are in the same vein as the “you can’t prove the world isn’t a simulation.” Yep, it’s impossible to be 100% certain that all of science is correct. However, that doesn’t matter.

It is absolutely possible to know/prove if science dealing with a limited scope is a valid model because if it isn’t, you’ll be able to prove it wrong. “Oh but there could be multiple explanations” yep, the same thing happens in mathematics.

You can usually find multiple sets of axioms that prove the same things. Some of them might allow you to prove more than the others. Maybe they even disagree on certain kinds of statements. But if you are dealing with statements in that zone of disagreement, you can prove which set of axioms is wrong, and if you don’t deal with those statements at all, then both are equally valid models.

Science can never prove that only a single model is correct… because it is certain that you can construct multiple models that will be equally correct. The perfect model doesn’t matter because it doesn’t exist. What matters is what models/axioms are true enough that they can be useful, and science is proving what that is and isn’t.

[–] hihi24522@lemm.ee 13 points 1 week ago (10 children)

This is false. Gödel’s incompleteness theorems only prove that there will be statements that are unprovable within that body of models.

Good news: Newton’s flaming laser sword says that if something can’t be proven, it isn’t worth thinking about.

Imagine I said, “we live in a simulation but it is so perfect that we’ll never be able to find evidence of it”

Can you prove my statement? No.

In fact no matter what proof you try to use I can just claim it is part of the simulation. All models will be incomplete because I can always say you can’t prove me wrong. But, because there is never any evidence, the fact we live in a simulation must never be relevant/required for the explanation of things going on inside our models.

Our models are “incomplete” already, but it doesn’t matter and it won’t, because anything that has an effect can be measured/catalogued and added to a model, and anything that doesn’t have an effect doesn’t matter.

TL;DR: Science as a body of models will never be able to prove/disprove every possible statement/hypothesis, but that does not mean it can’t prove/disprove every hypothesis/statement that actually matters.

[–] hihi24522@lemm.ee 1 points 2 weeks ago

I work in a lab, so yes, I understand how data science works. However, I think you have too much faith in the people running these scrapers.

I think it’s unlikely that ChatGPT would have had those early scandals of leaking people’s SSNs or other private information if the data was actually “cleared by a human team.” The entire point of these big companies is laziness; I doubt they have someone looking over the thousands of sites’ worth of data they feed to their models.

Maybe they do quality checks on the data but even in that event, forcing them to toss out a large data set because some of it was poisoned is a loss for the company. And if enough people poison their work or are able to feed poison to the scrapers, it becomes much less profitable to scrape images automatically.

I previously mentioned methods for possibly slipping through automatic filters in the scraper (though maybe I mentioned that in a different comment chain).

As for a scraper acting like a human by use of an LLM, that sounds hella computationally expensive on the side of the scrapers. Few would be willing to put in that much effort, and fewer scrapers makes the DDoS-like effect of scraping less likely. It would also take more time, which means the scraper spends less time harassing others.

But these are good suggestions. I suppose a drastic option for fighting a true AI mimicking a human would be to make all links have a random chance of sending any user to the tarpit. People would know to click back and try again, but the AI would at best have to render the site, process what it sees, decide it is in the tarpit, and then return. That would further slow down the scraper (or very likely stop/trap it) but that would make it slightly annoying for regular users.
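A minimal sketch of that “random chance” link idea (the function name and the 2% rate are my own assumptions, not from any real tarpit):

```python
import random

TARPIT_PROBABILITY = 0.02  # hypothetical: 2% of link resolutions divert to the tarpit

def resolve_link(real_url: str, tarpit_url: str = "/tarpit", rng=random) -> str:
    """Return the tarpit URL with a small probability, otherwise the real one.

    A human who lands in the tarpit just hits back and clicks again; a
    scraper has to render the page and decide it is trapped before backing out.
    """
    if rng.random() < TARPIT_PROBABILITY:
        return tarpit_url
    return real_url
```

Keeping the probability low is the whole trade-off: humans are barely inconvenienced, but a crawler following thousands of links per hour falls in constantly.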

In any case, at a certain point, trying to tailor an AI scraper to avoid a single specific website and navigate its traps would probably take more time and effort than sending a human to aggregate the content instead of an automated scraper.

[–] hihi24522@lemm.ee 1 points 2 weeks ago (2 children)

Oh when you said arms race I thought you were referring to all anti-AI countermeasures including Anubis and tarpits.

Were you only saying you think AI poisoning methods like Glaze and Nightshade are futile? Or do you also think AI mazes/tarpits are futile?

Both kind of seem like a more selfless version of protection like Anubis.

Instead of protecting your own site from scrapers, a tarpit will trap the scraper, stopping it from causing harm to other people’s services whether they have their own protections in place or not.

In the case of poisoning, you also protect others by making it riskier for AI to train on automatically scraped data, which disincentivizes automated scrapers from being used on the net in the first place.

[–] hihi24522@lemm.ee 1 points 2 weeks ago (4 children)

With aggressive scrapers, the “change” is having sites slowed or taken offline, basically DDoSed by scrapers ignoring robots.txt.

What is your proposed adaptation that’s better than countermeasures? Be rich enough to afford more powerful hardware? Simply stop hosting anything on the internet?

[–] hihi24522@lemm.ee 2 points 2 weeks ago

Okay, so I’m definitely not the most knowledgeable hacker, but the issue with an active AI hunter, one that hunts and kills instead of setting tarpits, is that you’d have to actually create an AI capable of hacking the scraper.

This would mean tracing it back to the actual source and then hacking that source to destroy the scraper, and I’d bet that’s not an easy task even for a human.

But yeah honestly, creating an AI capable of hacking and fucking up certain systems and then setting it loose on the net really could cause a Datakrash like event if it can replicate itself like a virus on the hardware it infects.

Even better if you could find some way to have it mutate as it goes along but that’s pretty far fetched even for this already far fetched hypothetical.

[–] hihi24522@lemm.ee 1 points 2 weeks ago (6 children)

Isn’t that what the arms race is? Adapting to new situations?

[–] hihi24522@lemm.ee 2 points 2 weeks ago

I guess diversity of tactics probably is a good way to stop scrapers from avoiding the traps we set. Good on you for helping out. Also I like the name lol

On a slightly unrelated note, is Rust a web dev language? I’ve been meaning to learn it myself since I’ve heard it’s basically a better, modern alternative to C++.

 

I came across Nepenthes today in the comments under a post about AI mazes. It has an option to purposefully generate not just an endless pit of links and pages, but also to deterministically generate random, human-like text for those pages to poison the LLM scrapers as they sink into the tarpit.
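The deterministic part can be sketched like this (a hypothetical illustration of the technique, not Nepenthes’ actual code): seed a PRNG from the page URL, so the same URL always serves the same filler text while every URL in the maze looks different.

```python
import hashlib
import random

# Tiny stand-in vocabulary; a real tarpit would use something like a
# Markov chain over real text to look more human.
WORDS = ["model", "river", "signal", "quiet", "thread", "garden",
         "lattice", "echo", "harbor", "copper", "drift", "ember"]

def page_text(url: str, n_words: int = 50) -> str:
    """Generate stable pseudo-content for a tarpit page.

    Seeding from a hash of the URL makes the output deterministic:
    revisiting a page yields identical text, so the maze looks like a
    static site rather than obvious noise.
    """
    seed = int.from_bytes(hashlib.sha256(url.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words))
```

Determinism also means the server stores nothing: the URL itself is the entire state of the page.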

After reading that, I thought, could you do something similar to poison image scrapers too?

Like if you have an art hosting site, as long as you can get an AI to fall into the tarpit, you could replace all the art it thinks should be there with distorted images from a dataset.

Or just send it to a kind of “parallel” version of the site that replaces (or heavily distorts) all the images but leaves the text descriptions and tags the same.

I realize there’s probably some sort of filter for any automated image scraper that attempts to sort out low quality images, but if one used similar images to the expected content, that might be enough to get through the filter.
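One way to sketch the “parallel, distorted version of the site” idea (all names here are hypothetical; a real site would pre-generate the distorted copies): serve from a poisoned directory when the client looks like a scraper, and pick the decoy deterministically so repeat crawls see consistent content.

```python
import hashlib
from pathlib import Path

REAL_DIR = Path("art")            # assumption: original images live here
POISON_DIR = Path("art_poison")   # assumption: pre-distorted copies live here

def decoy_for(filename: str, decoys: list[str]) -> str:
    """Deterministically map an artwork to one decoy image.

    Hashing the filename means repeat crawls always get the same decoy,
    so the poisoned image/tag pairs look like stable site content.
    """
    h = int.from_bytes(hashlib.sha256(filename.encode()).digest()[:4], "big")
    return decoys[h % len(decoys)]

def pick_image(filename: str, looks_like_scraper: bool, decoys: list[str]) -> Path:
    """Serve the real file to humans, a decoy to suspected scrapers."""
    if looks_like_scraper:
        return POISON_DIR / decoy_for(filename, decoys)
    return REAL_DIR / filename
```

The hard part this sketch punts on is `looks_like_scraper`: that is where Anubis-style checks or tarpit entry detection would plug in.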

I guess if someone really wanted to poison a model, generating AI replacement images would probably be the most effective way to speed up model decay, but that has much higher energy and processing power overhead.

Anyway, I’m definitely not skilled/knowledgeable enough to make this a thing myself even just as an experiment. But I thought you all might know if someone’s already done it, or you might find the idea fascinating.

What do you think? Any better ideas / suggestions for poisoning art scraping AI?

 

I don’t know if there already is a real Web 3.0 definition out there (the first search results I got were using Web 3.0 to promote crypto, so fuck that definition), but like, Web 1.0 was the internet being a way for specific scientists/hobbyists/organizations to send esoteric data, right?

Web 2.0 is the shift over to creating and sharing content on a broad scale, people reaching out through the web to interact and express themselves. Creators and companies trying to reach out to be accessible by lots of people.

We went from “you have to put in work to send/receive data on the net” to “it is easy for you to send stuff to the net and receive stuff from the net” to “the net knows where you live and begs you to give it data it can sell then takes that data even if you refuse”

We also went from “you want this info, you need to find someone with it, set up a connection, get it” to “now we have efficient search engines to help you easily find what you want” to “the internet is now the Library of Babel but worse, because all the nonsense is ‘AI,’ which can sometimes convincingly look like it isn’t nonsense.”

Both those paths seem like direct continuations, so I propose we use Web 3.0 as the term for the enshittified internet.

Thoughts? We can call the decentralization of the net 4.0 because it’s being spurred on in response to 3.0 yeah?
