If your blockchain isn’t distributed, it doesn’t need to be a blockchain, because then you already have trust established.
enumerator4829
Git gud
/s
It mostly affects people working with ”fun” enterprise hardware or special-purpose things.
To take one example: proprietary drivers for high-performance network cards, most likely from Nvidia.
You assume a uniform distribution. I’m guessing that it’s not. The question isn’t ”Does the model contain compressed representations of all works it was trained on”. Enough information on any single image is enough to be a copyright issue.
Besides, the situation isn’t as obviously flawed with image models as it is with LLMs. LLMs are just broken in this regard, because it only takes a handful of retained bytes to violate copyright.
I think there will be a ”find out” stage fairly soon. Currently, the US projects lots and lots of soft power on the rest of the world to enforce copyright terms favourable to Disney and friends. Accepting copyright violations for AI will erode that power internationally over time.
Personally, I do think we need to rework copyright anyway, so I’m not complaining that much. Change the law, go ahead and make the high seas legal. But set against current copyright laws, most large datasets and most models constitute copyright violations. Just imagine the shitshow if OpenAI were a European company training on material from Disney.
Stability and standardisation within the kernel for kernel modules. There are plenty of commercial products that use proprietary kernel modules that basically only work on a very specific kernel version, preventing upgrades.
Or they could just open source and inline their garbage kernel modules…
Document databases are the future /s
Or you know, trusted timestamps and cryptographic signatures via normal PKI. A Merkle tree isn’t worth shit legally if you can’t verify it against a trust outside of the tree.
All of the blockchain bullshit misses that part - you can create a cryptographic representation of money or contracts, but you can’t actually enforce, verify or trust anything in the real world without intermediaries. On the other hand, I can trust a certificate from a CA because there are verifiable actual real-world consequences for someone if that CA breaks legal agreements.
I’ll use a folder of actual papers, signed using a pen. Have some witnesses, make sure they have a legal stake and consequences, and you are golden.
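To make the Merkle tree point concrete, here is a minimal sketch in Python. The tree itself is just hashing; the only thing that ties it to the outside world is the stamp from a third party. Everything named here is made up for illustration: `AUTHORITY_KEY`, `authority_stamp` and `verify_stamp` are hypothetical, and the HMAC is only a stand-in for what a real timestamp authority would do with an asymmetric signature.

```python
import hashlib
import hmac


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a simple Merkle root; the last node is duplicated on odd levels."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# ASSUMPTION: stand-in for a key held only by an external, accountable party.
# A real timestamp authority would sign with a private key, not a shared secret.
AUTHORITY_KEY = b"held-by-the-trusted-third-party"


def authority_stamp(root: bytes, timestamp: str) -> bytes:
    """Hypothetical authority binding a root hash to a point in time."""
    return hmac.new(AUTHORITY_KEY, root + timestamp.encode(), hashlib.sha256).digest()


def verify_stamp(root: bytes, timestamp: str, stamp: bytes) -> bool:
    """Check a stamp against a root; this is the part that lives outside the tree."""
    return hmac.compare_digest(authority_stamp(root, timestamp), stamp)


timestamp = "2024-01-01T00:00:00Z"
docs = [b"contract v1", b"invoice 17", b"meeting notes"]
root = merkle_root(docs)
stamp = authority_stamp(root, timestamp)

# Anyone can build a perfectly valid tree over forged documents...
forged_root = merkle_root([b"contract v2 (forged)", b"invoice 17", b"meeting notes"])

# ...but only the original root matches what the outside party stamped.
original_ok = verify_stamp(root, timestamp, stamp)
forged_ok = verify_stamp(forged_root, timestamp, stamp)
```

The forged tree is just as well-formed as the real one. The only thing that separates them is the stamp, and the stamp is only worth something if the party behind it can be held accountable.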
Distributed blockchains are useful when all of the below are fulfilled:
- Need for distributed ledger
- Peers are adversarial w.r.t. contents of transactions in the ledger
- Enough peers exist so that no group can become a majority and thus assume control
- No trusted central authority exists
Here, we have a single peer creating entries in a ledger. We can get away with a copy of the ledger and one or more trusted timestamping authorities.
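The checklist above is really just an all-of-these test. A toy sketch, with class and field names invented here purely for illustration:

```python
from dataclasses import dataclass


@dataclass
class LedgerScenario:
    # Hypothetical fields mirroring the four conditions in the list above.
    needs_distributed_ledger: bool
    peers_are_adversarial: bool
    no_group_can_form_majority: bool
    trusted_central_authority_exists: bool


def blockchain_makes_sense(s: LedgerScenario) -> bool:
    """True only when every condition from the checklist holds."""
    return (
        s.needs_distributed_ledger
        and s.peers_are_adversarial
        and s.no_group_can_form_majority
        and not s.trusted_central_authority_exists
    )


# The case discussed here: one writer, and a timestamping authority we trust.
single_writer = LedgerScenario(
    needs_distributed_ledger=False,
    peers_are_adversarial=False,
    no_group_can_form_majority=False,
    trusted_central_authority_exists=True,
)

# A scenario where all four conditions are assumed to hold.
open_adversarial_network = LedgerScenario(
    needs_distributed_ledger=True,
    peers_are_adversarial=True,
    no_group_can_form_majority=True,
    trusted_central_authority_exists=False,
)
```

Fail any one condition and a plain ledger plus a timestamping authority does the job.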
There is an argument that training actually is a type of (lossy) compression. You can actually build (bad) language models by using standard compression algorithms to ”train”.
By that argument, any model contains lossy and unstructured copies of all data it was trained on. If you download a 480p low quality h264-encoded Bluray rip of a Ghibli movie, it’s not legal, despite the fact that you aren’t downloading the same bits that were on the Bluray.
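For the curious, the compression-as-training idea can be sketched with nothing but `zlib`. This is a toy, not how any real model works: the "training" is just keeping the corpus around, and the next character is whichever one makes corpus plus prompt compress smallest. The function names and the tiny corpus are made up for illustration.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def next_token(corpus: str, prompt: str, candidates: list[str]) -> str:
    """Pick the candidate that adds the fewest compressed bytes after the corpus."""
    base = corpus + prompt
    return min(candidates, key=lambda c: compressed_size(base + c))


def generate(corpus: str, prompt: str, candidates: list[str], steps: int) -> str:
    """Greedily extend the prompt one candidate at a time."""
    out = prompt
    for _ in range(steps):
        out += next_token(corpus, out, candidates)
    return out


# Tiny illustrative corpus; the "model" is the corpus itself plus the compressor.
corpus = "the cat sat on the mat. the dog sat on the log. " * 20
vocab = sorted(set(corpus))  # candidate next characters
prompt = "the cat sat on the "
text = generate(corpus, prompt, vocab, steps=10)
```

The output quality is bad, as promised, but the point stands: anything the compressor can predict well is something it has effectively retained from the training text.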
Besides, even if we consider the model itself to be fine, they did not buy all the media they trained the model on. The action of downloading media, regardless of purpose, is piracy. At least, that has been the interpretation for normal people sailing the seas; large companies are of course exempt from filthy things like laws.
What? Just base64 encrypt it before you store it in the git hub
You mean a transparency log? Just sign and publish. Or if it’s confidential, have a timestamp authority sign it, but what’s the point of a confidential blockchain? Sure, we can have a string of hashes chained together à la git, but that’s just an implementation detail. Where does the trust come from, who does the audit? That’s the interesting part.
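A rough sketch of why the hash chain is only an implementation detail. Whoever controls the log can rewrite an entry and recompute every hash after it, and the result is perfectly self-consistent. The tampering only shows up against a head hash that was published somewhere outside the operator’s control. `published_head` below is an assumed stand-in for that outside anchor; the helper names are invented for illustration.

```python
import hashlib


def chain(entries: list[bytes]) -> list[str]:
    """Return the head hash after each entry; each hash commits to the previous one."""
    head = "0" * 64  # arbitrary genesis value
    heads = []
    for entry in entries:
        head = hashlib.sha256(head.encode() + entry).hexdigest()
        heads.append(head)
    return heads


def verify(entries: list[bytes], heads: list[str]) -> bool:
    """Internal consistency check: do the stored hashes match the entries?"""
    return chain(entries) == heads


log = [b"entry 1", b"entry 2", b"entry 3"]
heads = chain(log)

# ASSUMPTION: this value was signed and published outside the operator's control.
published_head = heads[-1]

# The operator quietly changes history and recomputes the whole chain.
rewritten = [b"entry 1", b"entry 2 (quietly changed)", b"entry 3"]
rewritten_heads = chain(rewritten)

looks_fine_internally = verify(rewritten, rewritten_heads)
caught_by_external_anchor = rewritten_heads[-1] != published_head
```

The chain on its own happily verifies the rewritten history. Only the comparison against the externally published head catches it, which is the audit question again.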