this post was submitted on 28 Mar 2025

Gemini 2.5 "reasoning", no real improvement on river crossings. (awful.systems)

submitted 1 year ago by diz@awful.systems to c/techtakes@awful.systems

So I signed up for a free month of their crap because I wanted to test if it solves novel variants of the river crossing puzzle.

Like this one:

You have a duck, a carrot, and a potato. You want to transport them across the river using a boat that can take yourself and up to 2 other items. If the duck is left unsupervised, it will run away.

Unsurprisingly, it does not:

https://g.co/gemini/share/a79dc80c5c6c

https://g.co/gemini/share/59b024d0908b

The only 2 new things seem to be that old variants are no longer novel, and that it is no longer limited to producing incorrect solutions - now it can also incorrectly claim that the solution is impossible.

I think chain of thought / reasoning is a fundamentally dishonest technology. At the end of the day, just like older LLMs, it requires that someone has already solved a similar problem (either online, or perhaps in a problem-solution pair they generated, if they do that to augment the training data).

But it outputs quasi reasoning to pretend that it is actually solving the problem live.

[–] kata1yst@sh.itjust.works 0 points 1 year ago* (last edited 1 year ago) (1 children)

I just asked the same question to my preferred onprem model, Mistral-Small 3.1.

13s on a 7900xtx

justinledwards/mistral-small-3.1-Q6_K:latest

To successfully transport the duck, carrot, and potato across the river without losing any of them, you need to plan your trips carefully. Here’s a step-by-step strategy:

First Trip: Take the duck across the river first.

Items on this side: Carrot, Potato
Items on the other side: Duck

Return Trip: Go back alone to the original side.

Items on this side: Carrot, Potato (you)
Items on the other side: Duck

Second Trip: Take either the carrot or the potato across the river.

Let's say you take the carrot first for illustration.

Items on this side: Potato
Items on the other side: Duck, Carrot (you)

Return Trip: Go back with the duck to the original side.

Items on this side: Potato, Duck
Items on the other side: Carrot

Third Trip: Leave the duck behind and take the potato across the river.

Items on this side: Duck
Items on the other side: Carrot, Potato (you)

Return Trip: Go back alone to get the duck.

Items on this side: Duck (you)
Items on the other side: Carrot, Potato

Final Trip: Take the duck across the river. All items are now on the other side.

By following these steps, you ensure that the duck is never left unsupervised and all items make it safely across the river.
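The plan quoted above can be checked mechanically. Here is a minimal sketch, assuming "unsupervised" means the duck is on a different bank from you once a crossing ends; the function name `first_violation` and the encoding of the plan as one cargo set per crossing are illustrative choices, not anything from the model's output.

```python
ITEMS = {"duck", "carrot", "potato"}
CAPACITY = 2  # the boat takes you plus up to 2 items


def first_violation(plan):
    """Return the 1-based index of the first illegal crossing, or None.

    `plan` holds one cargo set per crossing. Crossings alternate
    direction, starting from the near bank.
    Assumption: the duck is "unsupervised" whenever it ends a crossing
    on the bank you are not on.
    """
    near, far = set(ITEMS), set()
    you_on_near = True
    for step, cargo in enumerate(plan, start=1):
        src, dst = (near, far) if you_on_near else (far, near)
        # Cargo must fit in the boat and actually be on your bank.
        if len(cargo) > CAPACITY or not cargo <= src:
            return step
        src.difference_update(cargo)
        dst.update(cargo)
        you_on_near = not you_on_near
        your_bank = near if you_on_near else far
        if "duck" not in your_bank:
            return step
    return None


# The seven crossings from the quoted output, in order.
MISTRAL_PLAN = [
    {"duck"}, set(), {"carrot"}, {"duck"}, {"potato"}, set(), {"duck"},
]

print(first_violation(MISTRAL_PLAN))  # 2
```

Under that reading of the rule, the check fails at the second crossing: you go back alone and the duck is by itself on the far bank.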

[–] MBM@lemmings.world 1 points 1 year ago

What's your point, that your favourite LLM doesn't reason either?

[–] BlueMonday1984@awful.systems 0 points 1 year ago (1 children)

I'm kinda tired, but this puzzle's shoved itself into my brain. The obvious solution I can see is, roughly speaking:

Take the duck and carrot across
Take the duck back
Take the duck and potato across
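The three steps above can also be found by brute force. This is a rough sketch of a breadth-first search over bank states, assuming the only rule is that the duck must end every crossing on the same bank as you; `solve` and the state encoding are made up for illustration.

```python
from collections import deque
from itertools import combinations

ITEMS = frozenset({"duck", "carrot", "potato"})
CAPACITY = 2  # the boat takes you plus up to 2 items


def solve():
    """Return a shortest list of cargo tuples, one per crossing, or None.

    A state is (items on the near bank, whether you are on the near bank).
    Assumption: a crossing is legal only if the duck ends up on your bank.
    """
    start = (ITEMS, True)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (near, you_on_near), path = queue.popleft()
        if not near and not you_on_near:
            return path  # everything, including you, is on the far bank
        bank = near if you_on_near else ITEMS - near
        for k in range(CAPACITY + 1):
            for cargo in combinations(sorted(bank), k):
                if you_on_near:
                    new_near = near - set(cargo)
                    your_bank = ITEMS - new_near  # you land on the far bank
                else:
                    new_near = near | set(cargo)
                    your_bank = new_near  # you land on the near bank
                if "duck" not in your_bank:
                    continue  # duck would be left alone
                state = (new_near, not you_on_near)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [cargo]))
    return None


for cargo in solve():
    print(", ".join(cargo) or "(nothing)")
```

Because the search is breadth-first, the plan it returns uses the fewest crossings possible under these rules, which is three, and the duck is in the boat on every one of them.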

[–] diz@awful.systems 1 points 1 year ago* (last edited 1 year ago) (1 children)

Yeah, exactly. There's no trick to it at all, unlike the original puzzle.

I also tested OpenAI's offerings a few months back with similarly nonsensical results: https://awful.systems/post/1769506

The all-vegetables, no-duck variant is solved correctly now, but I doubt that's due to improved reasoning as such; I think they may have augmented the training data with some variants of the river crossing. The river crossing is one of the best-known puzzles, and various people have been posting hilarious bot failures with variants of it. So it wouldn't be unexpected that their training data augmentation has river crossing variants.

Of course, there are a great many ways in which the puzzle can be modified, and their augmentation would only cover obvious stuff like variations on which items can be left with which, or the number of spots on the boat.

[–] SGforce@lemmy.ca 0 points 1 year ago (3 children)

It's just overtrained on the puzzle such that it mostly ignores your prompt. Changing a few words out doesn't change that it recognises the puzzle. Try writing it out in ASCII or uploading an image with it written or some other weird way that it hasn't been specifically trained on and I bet it actually performs better.

[–] froztbyte@awful.systems 1 points 1 year ago (1 children)

oh look it's a loadbearing "just" in the wild. better hope you can shore that fucker up with some facts

> Try writing it out in ASCII

my poster in christ, what in the fuck are you on about. stop prompting LLMs and go learn some things instead

> some other weird way that it hasn’t been specifically trained on and I bet it actually performs better

"no no see, you just need to prompt it different. just prompt it different bro it'll work bro I swear bro"

god, every fucking time

[–] sailor_sega_saturn@awful.systems 1 points 1 year ago (1 children)

All along my mistake was that I was prompting it in unicode instead of latin1, alphameric BCD, or "modified UTF-8".

[–] froztbyte@awful.systems 1 points 1 year ago (1 children)

I thought everyone knew that you had to structure prompts in ALGOL 420 to get the best performance by going close to the metal

[–] bitofhope@awful.systems 1 points 1 year ago (1 children)

I use UTF-9 to efficiently handle Unicode on my PDP-10.

[–] cstross@wandering.shop 1 points 1 year ago

@bitofhope @techtakes Surely you need a PDP-9 for that?

[–] dgerard@awful.systems 1 points 1 year ago

"it can't be that stupid, you must be prompting it wrong"

[–] self@awful.systems 1 points 1 year ago (1 children)

I bet it generates stochastic nonsense you’ll read like tea leaves

[–] SGforce@lemmy.ca 0 points 1 year ago (1 children)

Bet

[–] o7___o7@awful.systems 1 points 1 year ago

The accumulated filth of all their slop and murder will foam up about their waists and all the removed and prompt enjoyers will look up and shout: 'Bet' - and I'll whisper 'no.'