LLMs are not designed to give you objective factual answers. They're designed to guess what you want to hear, like a middle school student writing a book report for a book they never read.
Case 1 isn't a good use case for AI. For Case 2, you're going to want a higher-quality model than o4: 4.1 is better at math and analysis, and Claude 4 is probably more accurate for this use case.
LLM image processing doesn’t work the same way reverse image lookup does.
TL;DR explanation: Multimodal LLMs turn pictures into a ~~thousand~~ 200-500 or so ~~words~~ tokens, but reverse image lookups create perceptual hashes of images and look up the hash of your uploaded image in a database.
Much longer explanation:
Multimodal LLMs (technically, LMMs - large multimodal models) use vision transformers to turn images into tokens. They use tokens for words, too, but these image tokens don’t correspond to words. There are multiple ways this can be implemented, but a common approach is to break the image down into a grid, then transform each “patch” of a specific size, e.g., 16x16 pixels, into a single token. The patches aren’t transformed individually - the whole image is processed together, in context - but the model still comes out of it with roughly 200 or so tokens that let it respond to the image the same way it would respond to text.
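If it helps to see where that ~200 number comes from, here’s a toy sketch in plain NumPy - not any real model’s code, and the 224x224 image size and 16x16 patch size are just common defaults I’m assuming:

```python
# Minimal sketch of patch tokenization: a 224x224 RGB image cut into 16x16
# patches gives a 14x14 grid, i.e. 196 "image tokens" before any projection.
import numpy as np

image = np.random.rand(224, 224, 3)   # stand-in for a real image
patch_size = 16

grid_h = image.shape[0] // patch_size          # 14
grid_w = image.shape[1] // patch_size          # 14
patches = []
for row in range(grid_h):
    for col in range(grid_w):
        patch = image[row * patch_size:(row + 1) * patch_size,
                      col * patch_size:(col + 1) * patch_size]
        patches.append(patch.flatten())        # 16*16*3 = 768 numbers per patch

patches = np.stack(patches)
print(patches.shape)                           # (196, 768)
# A real vision transformer projects each flattened patch into the model's
# embedding space, adds a positional embedding, and processes all 196 of
# them together, in context.
```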
Current vision transformers also struggle with spatial awareness. They embed basic positional data into the tokens, but it’s fragile and unsophisticated. Fortunately there’s a lot left to explore in that area, so I’m sure there will continue to be improvements.
One example improvement, beyond better spatial embeddings, would be a dynamic vision transformer that depends on the context, or that can re-evaluate an image based on new information. Outside of vision transformers, simply training LMMs to use other tools on images when appropriate could help with many of the current shortcomings of LMM image processing.
Given all that, asking an LLM to find the album for you - assuming you’ve given it the ability and permission to search the web - is like showing the image to someone with no context and asking them to identify a music video they’ve never seen, by an artist they can only describe with 10-20 generic words (none of which are the artist’s name), and hoping they noticed and remembered the specific details that would put it in the top ten Google results. That’s a convoluted way to say that it’s a hard task.
By contrast, reverse image lookup basically uses a perceptual hash generated for each image. That’s the tool that should be used for your particular problem, because it’s well suited to it. The LLM was the hammer and this problem was a Torx screw.
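If you’re curious what a perceptual hash actually is, here’s a minimal sketch of the simplest kind, an average hash - real reverse image search uses more robust hashes plus an enormous index of them, and the gradient image below is just a stand-in so the snippet runs on its own (needs Pillow):

```python
from PIL import Image

def average_hash(img, size=8):
    small = img.convert("L").resize((size, size))   # shrink to 8x8 grayscale
    pixels = list(small.getdata())
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]   # one bit per pixel

def hamming(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))      # how many bits differ

# Two copies of the same picture - even resized or re-encoded - hash to
# nearly the same bits, so a database lookup by hash can find the source.
original = Image.radial_gradient("L")               # stand-in for a real image
reupload = original.resize((100, 100))              # stand-in for a re-encoded copy
print(hamming(average_hash(original), average_hash(reupload)))  # small => match
```

The point is that the hash is built directly from the pixels, so near-identical images collide on purpose - no description, no guessing.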
Suggesting a reverse image lookup tool - or better, using one itself - is what the LLM should do in this instance. But it would need to have been trained to suggest this, to be capable of using a tool that could do the lookup, and to have both access and permission to do the lookup.
Here’s a paper that might help you understand the gaps between LMMs and tools built for that specific purpose: https://arxiv.org/html/2305.07895v7
So if I’m understanding it, the LLM isn’t using the easier option of reverse image search because it isn’t aware of it?
It may be aware of them, but not in that context. If you asked it how to solve the problem rather than to solve the problem for you, there’s a chance it would suggest you use a reverse image search.
LLMs are curve fitting the function of “input text” to “expected output text”.
So when you give it an input text, it generates an output text interpolated from the expected outputs for similar inputs.
That means it’s often right for very common prompts and often wrong for prompts that are subtly different from common prompts.
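Here’s a toy way to see that - nothing like a real transformer, just nearest-neighbour matching over two memorized prompts, with all the example questions made up:

```python
# A fake "model" that answers by finding the most similar prompt it was
# trained on. Common questions come out right; subtly different ones come
# out confidently wrong.
from difflib import SequenceMatcher

training_data = {
    "what is the capital of france": "Paris",
    "what is the capital of australia": "Canberra",
}

def toy_model(prompt):
    # "Interpolate" from the expected outputs for similar inputs:
    # answer with whatever the nearest training prompt expected.
    best = max(training_data,
               key=lambda seen: SequenceMatcher(None, prompt, seen).ratio())
    return training_data[best]

print(toy_model("what is the capital of france"))          # Paris - right
print(toy_model("what is the largest city of australia"))  # Canberra - wrong:
# the prompt is closest to a "capital" question, so it gets a capital answer.
```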
This is my observation as well. Generic questions are answered well but specific situations are not.
I was thinking about the question here and how to reframe it so that it answers itself. I think I have the right way to ask the question:
Why is a hyper-advanced game of mad libs so wrong all the time?
That would get across the point, I think.
Why are LLMs so wrong most of the time? Aren’t they processing high quality data from multiple sources?
Well that's the thing. LLMs don't generally "process" data as humans would. They don't understand the text they're generating. So they can't check their answers against reality.
(Except for Grok 4, but it's apparently checking its answers to make sure they agree with Elon Musk's Tweets, which is kind of the opposite of accuracy.)
I just don’t understand the point of even making this software if all it can do is sound smart while being wrong.
As someone who lived through the dotcom boom of the 2000s, and the crypto booms of 2017 and 2021, the AI boom is pretty obviously yet another fad. The point is to make money - from both consumers and investors - and AI is the new buzzword to bring those dollars in.
Don’t forget IoT, where the S stands for security! Or “The Cloud”! Make sure to rebuy the junk we will deprecate in 2 years time because we love electronic waste and planned obsolescence ;)
AI is definitely a bubble and it is going to crash the stock market one day, along with bitcoin
The thing about LLMs is that they "store" information about the shape of their training data, not the information contained therein. That information is lost.
An LLM will produce text that looks like the texts it was trained with, but it can only reproduce the information contained in them if it's common enough in its training data to statistically affect their shape, and even then it has a chance to get it wrong, since it has no way to check its output for factual accuracy.
Add to that that most models are pre-prompted to sound confident, helpful, and subservient (the companies' main goal not being to provide information, but to get their customers hooked on their product and coming back for more), and you get the perfect scammers and yes-men. Auto-complete mentalists that will give you as much confident-sounding, information-shaped nonsense as you want, doing their best to agree with you and confirm any biases you might have, with complete disregard for accuracy, truth, or the effects your trust in their output might have. That makes them extremely dangerous and addictive for suggestible or intellectually or emotionally vulnerable users.
Aren’t they processing high quality data from multiple sources?
Here's where the misunderstanding comes in, I think. And it's not the high quality data or the multiple sources. It's the "processing" part.
It's a natural human assumption to imagine that a thinking machine with access to a huge repository of data would have little trouble providing useful and correct answers. But the mistake here is in treating these things as thinking machines.
That's understandable. A multi-billion dollar propaganda machine has been set up to sell you that lie.
In reality, LLMs are word prediction machines. They try to predict the words that would likely follow other words. They're really quite good at it. The underlying technology is extremely impressive, allowing them to approximate human conversation in a way that is quite uncanny.
But what you have to grasp is that you're not interacting with something that thinks. There isn't even an attempt to approximate a mind. Rather, what you have is a confabulation engine; a machine for producing plausible fictions. It does this by creating unbelievably huge matrices of words - literally operating in billions of dimensions at once, graphs with many times more axes than we have letters - and probabilistically associating them with each other. It's all very clever, but what it produces is 100% fake, made up, totally invented.
Now, because of the training data they've been fed, those made-up answers will, depending on the question, sometimes end up being right. For certain types of question they can actually be right quite a lot of the time. For other types of question, almost never. But the point is, they're only ever right by accident. The "AI" is always, always constructing a fiction. That fiction just sometimes aligns with reality.
Confabulation is what it is, you are right.
Why on Earth are investors backing this? Usually money filters out useless endeavours.
Really?
Did money filter out the subprime mortgages before disaster?
Did money filter out cryptocurrency?
Did money filter out NFTs?
Hey, why stick to the recent past. Did money filter out tulip bulbs?
Money filters out nothing. Money is held by humans. Humans do stupid things. Humans run in packs. Humans do stupid things in packs. And that means money does stupid things in packs.
Yes, but all these things were actually filtered out by money. It took a while but it happened.
I'm not sure I understand, then, what you mean by "filtered out by money". If you mean "they collapsed eventually because they were idiotic ideas" then, well, yes. But they lasted for a long time before doing so and caused incalculable damage in the process. The tulip bulb craze (one of the earliest speculative crazes) lasted about 4 years. The subprime mortgage disaster took 8 years. The NFT fiasco lasted about 2 years. The dot-com bubble took 7 years to play out. The Japan real estate bubble was about 5 years.
We're only 3 years or so into the LLMbecile bubble. If you want to think of bubble collapses as "filtered out by money" we've got anywhere from next week to 2029 for the collapse.
Oh you sweet summer child.
If you remember anything from this thread, remember this: capitalist markets do not care whether something is useful or useless. Capitalist markets care whether something will make money for its investors. If something totally useless will make money for its investors, the market will throw money at it.
See: tulips, pet rocks, ethanol, cryptocurrency. And now AI.
Because people are stupid. And people will spend money on stupid shit. And the empty hand of capitalism will support whatever people will spend money on, whether it's stupid shit or not.
(And because, unfortunately, AI tools are amazing at gathering information from their users. And I think the big tech companies are really aggressively pushing AI because they want very much to have users talking to their AI tools about what they need and what they want and what their interests are, because that's the kind of big data they can make a lot of money from.)
money filters out useless endeavours.
That might have been true once, if ever, but it's certainly not true anymore. Actually, fabulation is where most of the money is. Most 'investors' have gotten rich by accident and by an incredible amount of luck. They will tell you it was hard work, sweat and blood, but that is never true; it's being born in the right family and being in the right place at the right time. These people aren't any smarter or better than you and me, and they're just as susceptible to bullshit as you and me. Maybe even more so, because they think their exceptional skill has gotten them where they are. This means they will quite happily put their money into any endeavour that sounds plausible and/or profitable in their minds, but which is usually complete nonsense. What's more, once a few of these have put money on the table, FOMO kicks in and all the bros from the gym want in too, kicking off a cycle of complete and utter waste of money. All the while telling everyone that this, THIS, this thing they have put money on, is the next big thing.
See Quantum computing.
Once governments started to set aside funding for it, the scams began. Google, Microsoft, they're all in on it
D-Wave is history, and for an AI example, Builder.ai was revealed to be 700 underpaid Indian engineers.
There are like two useful algorithms right now - which we also can't use, because we can't make matrices of qubits that are stable.
Once the money and hype train starts rolling, it becomes about money men exploiting that hype to multiply their money... and the technology is completely secondary.
Eh, I'll agree that quantum computing hasn't delivered much yet, but it shouldn't be mentioned in the same sentence as LLMs. There's a difference between tech that hasn't become practical yet, and tech that is a gigantic grift pretending to be something it will categorically never achieve.
And why do you think quantum computing isn't the latter one?
Saving this! Great write-up
Everyone here should already know, but read the wheresyouredat blog to fully understand the Business Idiot.
Even the "thinking engine" ones are wild to watch in motion, if you ever turn on debugging. It's like watching someone substitute the autosuggest of your keyboard for what words appear in your head when trying to think through something. It just generates something and then generates again using THAT output (multiple times maybe involved for each step).
I watched one I installed locally for Home Assistant, as a test for various operations, just start repeating itself over and over to nearly everything before it just spat out something completely wrong.
Garbage engines.
I assume by "thinking engine" you mean "Reasoning AI".
Reasoning AI is just more bullshit. What happens is that they produce the output the way they always do - by guessing at a sequence of words that is statistically adjacent to the input they're given - but then they produce a randomly generated "chain of thought" which is invented in the same way as the result; just pure statistical word association. Essentially they create the output the same way that a non-reasoning LLM does, then they give themselves the prompt "Write a chain of thought for this output." There's a little extra stuff going on where they sort of check their own output, but in essence that's just done by running the model multiple times and picking the output they converge on. So, just weighting the randomness, basically.
I'm simplifying a lot here obviously, but that's pretty much what's going on.
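For the curious, the "run it multiple times and pick the output they converge on" part looks roughly like this in miniature - sample_answer and its canned answers are pure stand-ins, not any vendor's API:

```python
# Rough sketch of self-consistency voting: sample several answers and keep
# the one the samples converge on (i.e. weight the randomness).
import random
from collections import Counter

def sample_answer(prompt):
    # Placeholder for one stochastic run of the model (temperature > 0,
    # so repeated calls can disagree with each other).
    return random.choice(["42", "42", "42", "41", "43"])

def self_consistent_answer(prompt, n_samples=9):
    votes = Counter(sample_answer(prompt) for _ in range(n_samples))
    return votes.most_common(1)[0][0]   # majority vote across the samples

print(self_consistent_answer("What is 6 * 7?"))  # almost always "42"
```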
Basically reworded what I was saying almost exactly, but yes.
I like to think of them as artificial con men. They sound great. They have confidence and are complimentary and are very agreeable, but they will tell you what they think you want to hear. Whether or not what they are telling you is truthful isn't even part of the equation.
Yeah, confidence is the problem. And they don’t accept that they don’t know something.
I almost always get perfect responses, but I'm very limited in what I'll input. Often I'm just using ChatGPT to remember a word or event I've forgotten. Pretty much 100% accurate on that bit.
Couldn't explain how I know what will and won't work, but I have a sense of it. Also, the farther you drill into a thing, the more off-topic it gets. I'm almost always one and done with a prompt.
You are getting more surface-level information from it, which is probably going to be correct unless there is a major problem in the training data.
LLMs only output fan-fiction of our reality.
Yeah, I have realised that now. It seems useful because it is well spoken, but it is not.
The first time I ever used it I got a bugged response. I asked it to give me a short summary of the 2022 Super Bowl, and it told me Patrick Mahomes won the Super Bowl with a field goal kick.
Now, those two things separately are true. Mahomes won. The game was won on a field goal.
The LLM just looks at how probable the sentences it's generating are based on its training data, and it smashed two correct statements together because that looked like the most probable reasonable response.
It does that a lot. Don't use GenAI without checking its output.
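You can reproduce that failure mode in miniature with a toy bigram "model" - obviously nothing like a real LLM, but the "most probable next word" mechanic is the same idea, and the two sentences below are just stand-ins for the training data:

```python
# Train a tiny word-to-word table on two true sentences, then generate.
import random

corpus = [
    "mahomes won the super bowl",
    "the game was won on a field goal kick",
]

next_words = {}
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_words.setdefault(current, []).append(following)

def generate(start, max_len=12):
    out = [start]
    while out[-1] in next_words and len(out) < max_len:
        out.append(random.choice(next_words[out[-1]]))  # pick a likely next word
    return " ".join(out)

for _ in range(10):
    print(generate("mahomes"))
# Alongside the true "mahomes won the super bowl" you'll almost always see
# blends like "mahomes won on a field goal kick" - fluent, confident, and
# assembled purely from which words tend to follow which.
```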
I have noticed that it is terrible when you know at least a little about the topic.
Ooh! Now do the press!
Or a more accurate way to say it is AI is terrible all the time but it is easier to notice when you know at least a little about the topic.
Much of the input comes from the Web of now, not the Web of 2003.
I've used it a few times to quickly check some reference values, calculations and symptoms (research/physiology), and most of the time it's fine, but occasionally it spits out some of the craziest shit I've ever seen - like dangerously wrong - while sounding just as confident.
The confidence is the problem. If a human does not know the answer, they say that they don’t know. LLMs don’t seem to know that’s an option.
Confidence isn't a good description either, since there is no thought process or motivation. It is a bullshit machine, spewing out something that looks like a coherent response.
It does sound more professional than most humans