diz

joined 2 years ago
[–] diz@awful.systems 11 points 1 week ago* (last edited 1 week ago)

It's a perfect example of how "using LLMs for test coverage" can also be harmful. He expected the tests to prevent the introduction of said regressions, probably based on a combination of the quantity of tests and their style (they look like what decent human-written tests look like). But the tests are AI slop, and so they give a lot less value per line of code than he expects, hence a significant regression.

It is literally useful to call these tests AI slop, and the problem is in part caused by not calling them AI slop, and having consequent inflated expectations. LLMs are not any better at writing tests than at writing other code! It is merely that the bar for tests can, legitimately, be a lot lower (in projects where there would otherwise be no tests at all). Making an exception to calling AI-generated tests "slop" is thus counterproductive, because it leads people to act as if LLMs are actually better at writing tests than at writing other code, and not just because the bar for tests is frequently very low.

edit: actually, scratch that. I looked at the PR and those tests even look like dogshit, worse than the tests I've seen Claude write at a workplace that was into vibecoding (which I have since quit).

[–] diz@awful.systems 5 points 1 month ago

Oh, by far. There’s only 80 decimal places in that at most.

It's got to be a quantum sweatshop: a quantum computer for AGI (a guy instead)

 

A quantum sweatshop.

[–] diz@awful.systems 3 points 1 month ago* (last edited 1 month ago)

How much does he think an engineer spends on CAD tools, anyway? Altium is like, what, $2500 / year? Very "how much can a banana cost".

It's all capital costs for tools, pretty much, anyway, maybe CAD should start charging per net lmao.

[–] diz@awful.systems 3 points 1 month ago* (last edited 1 month ago)

Oh, they are going to charge per token for GitHub Copilot? That thing is a waste of money for everyone, I'm pretty sure. I get a mix of inane mildly good suggestions, irrelevant stuff, and an occasional suggestion of super evil sabotage. Due to mild OCD about issues, I tend to have to fix said mildly good suggestions, but from the objective perspective that nitpickery is not worth it, everything was fine without, we had compiler warnings, Coverity, etc.

edit: the difference being that the old stuff was deterministic and you just ran it on the whole codebase and had it pass. Unlike gh copilot that'll just make up new shit. And as for the times it caught some bad bug that you made... add more tests instead.

[–] diz@awful.systems 4 points 2 months ago* (last edited 2 months ago)

I wouldn't be too surprised if they really don't, they're just advertising the advertising lol.

edit: Basically what if you spent a trillion dollars so that you could beam ads to people's bathroom mirrors. And better yet, ads reflected from water down in their toilets. Then in the interest of expediency you just take random ads and put them there for free, and your actual product, shares, sells better.

[–] diz@awful.systems 3 points 4 months ago

It makes every bad programmer into a 10x bad programmer (equivalent to 10 bad programmers).

[–] diz@awful.systems 1 points 6 months ago

I'm afraid they already had that exact idea when they named the startup "oklo".

[–] diz@awful.systems 2 points 6 months ago* (last edited 6 months ago) (2 children)

I think it's not very difficult to construct a really shitty small reactor that is horrendously expensive per watt. Can probably be built in a year if you get rid of NRC and just half ass it completely.

I mean, Demon Core was a small reactor. You pretty much have to do a lot of work to ensure you won't create a small reactor when a truckload of fresh fuel falls into a river.

What's difficult is making a safe reactor that is actually making electricity at somewhat reasonable price per watt.

[–] diz@awful.systems 3 points 6 months ago (1 children)

Nuclear already makes 9% of the world's electricity.

[–] diz@awful.systems 1 points 8 months ago* (last edited 8 months ago)

Shorting the market requires precise timing. Being early is just as bad as being wrong.

Exactly. It is not enough to know that a company stock will go down. It is necessary to know that it will never go higher than a certain point above the current value (not even momentarily) before it goes down. If you have a fuckload of other people's money you can just keep double-or-nothing-ing it, that's what banks were doing to GameStop, except that this can sometimes cause the stock to go even higher (a short squeeze), which would make you (who doesn't actually have a fuckload of other people's money) lose all of your money.

edit: also the other concerning possibility is that stock prices can go up simply due to the dollar going down.

[–] diz@awful.systems 5 points 8 months ago

The only thing that is allowed to tell good art from slop is the AI which needs to consume good art and not slop.

[–] diz@awful.systems 5 points 9 months ago

It's spelled “masterdebating”.

 

There's a very long history of extremely effective labor saving tools in software.

Writing in C rather than Assembly, especially for more than 1 platform.

Standard libraries. Unix itself. More recently, developing games in Unity or Unreal instead of rolling your own engine.

And what happened when any of these tools came on the scene is that there was a mad gold rush to develop products that weren't feasible before. Not layoffs, not "we don't need to hire junior developers any more".

Rank and file vibe coders seem to perceive Claude Code (for some reason, mostly just Claude Code) as something akin to the advantage of using C rather than Assembly. They are legit excited to code new things they couldn't code before.

Boiling the rivers to give them an occasional morale boost with "You are absolutely right!" is completely fucked up and I dread the day I'll have to deal with AI-contaminated codebases, but apart from that, they have something positive going for them, at least in this brief moment. They seem to be sincerely enthusiastic. I almost don't want to shit on their parade.

The AI enthusiast bigwigs on the other hand, are firing people, closing projects, talking about not hiring juniors any more, and got the media to report on it as AI layoffs. They just gleefully go on about how being 30% more productive means they can fire a bunch of people.

The standard answer is that they hate having employees. But they always hated having employees. And there were always labor saving technologies.

So I have a thesis here, or a synthesis perhaps.

The bigwigs who tout AI (while acknowledging that it needs humans for now) don't see AI as ultimately useful, in the way in which a C compiler was useful. Even if it's useful in some context, they still don't. They don't believe it can be useful. They see it as more powerfully useless. Each new version is meant to be a bit more like AM or (clearly AM-inspired, but more familiar) GLaDOS, that will get rid of all the employees once and for all.

 

Sounds like Meta’s judge will have to invent a grand unified theory of fair use to excuse this.

I kept saying about various lawsuits that the important thing is discovery. Nobody knew all the idiotic shit these folks were doing, so nobody could sue them properly.

 

They train on sneer-problems now:

Here’s the “ferry‑shuttle” strategy, exactly analogous to the classic two‑ferryman/many‑boats puzzle, but with planes and pilots

And lo and behold, singularity - it can solve variants that no human can solve:

https://chatgpt.com/share/68813f81-1e6c-8004-ab95-5bafc531a969

Two ferrymen and three boats are on the left bank of a river. Each boat holds exactly one man. How can they get both men and all three boats to the right bank?
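For what it's worth, the puzzle as stated can be checked by brute force. The sketch below rests on assumptions that are not in the prompt: a boat only crosses when exactly one man rows it, and there is no towing, swimming, or other trickery. The function name `reachable_states` and the state encoding are made up for illustration. Under those assumptions every crossing moves one man and one boat together, so (boats on left) minus (men on left) never changes from its starting value of 1, and the goal state with nothing on the left is never reached.

```python
from collections import deque

def reachable_states(men=2, boats=3):
    """Enumerate every (men_on_left, boats_on_left) state reachable from the start.

    Assumption (not from the puzzle text): a crossing always moves exactly
    one man together with exactly one boat, in either direction.
    """
    start = (men, boats)
    seen = {start}
    queue = deque([start])
    while queue:
        m_left, b_left = queue.popleft()
        candidates = []
        # Left to right: needs a man and a boat on the left bank.
        if m_left > 0 and b_left > 0:
            candidates.append((m_left - 1, b_left - 1))
        # Right to left: needs a man and a boat on the right bank.
        m_right, b_right = men - m_left, boats - b_left
        if m_right > 0 and b_right > 0:
            candidates.append((m_left + 1, b_left + 1))
        for state in candidates:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return seen

states = reachable_states()
print("goal reachable:", (0, 0) in states)
```

So any confident step-by-step "solution" has to smuggle in something the prompt never allowed.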

 

I think this summarizes in one conversation what is so fucking irritating about this thing: I am supposed to believe that it wrote that code.

No siree, no RAG, no trickery with training a model to transform the code while maintaining identical expression graph, it just goes from word-salading all over the place on a natural language task, to outputting 100 lines of coherent code.

Although that does suggest a new dunk on computer touchers, of the AI enthusiast kind, you can point at that and say that coding clearly does not require any logical reasoning.

(Also, as usual with AI, it is not always that good. Sometimes it fucks up the code, too.)

121
submitted 1 year ago* (last edited 1 year ago) by diz@awful.systems to c/techtakes@awful.systems
 

I love to show that kind of shit to AI boosters. (In case you're wondering, the numbers were chosen randomly and the answer is incorrect).

They go waaa waaa it's not a calculator, and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the "softer" parts of the test.

 

I couldn't stop fucking laughing. I'm wheezing. It's unhealthy.

They have this thing acting like that for the whole day... and then more than a day later claim it was hacked.

 

So I signed up for a free month of their crap because I wanted to test if it solves novel variants of the river crossing puzzle.

Like this one:

You have a duck, a carrot, and a potato. You want to transport them across the river using a boat that can take yourself and up to 2 other items. If the duck is left unsupervised, it will run away.

Unsurprisingly, it does not:

https://g.co/gemini/share/a79dc80c5c6c

https://g.co/gemini/share/59b024d0908b

The only 2 new things seem to be that old variants are no longer novel, and that it is no longer limited to producing incorrect solutions - now it can also incorrectly claim that the solution is impossible.

I think chain of thought / reasoning is a fundamentally dishonest technology. At the end of the day, just like older LLMs it requires that someone solved a similar problem (either online or perhaps in a problem solution pair they generated if they do that to augment the training data).

But it outputs quasi reasoning to pretend that it is actually solving the problem live.
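For reference, the duck/carrot/potato variant above is easy to brute-force. This is a minimal sketch under one stated assumption: "unsupervised" is read as "the duck is on a bank without you once a crossing is finished", so the duck has to ride along on every trip. The names `solve`, `is_safe`, `ITEMS` and `CAPACITY` are hypothetical, chosen only for this example.

```python
from collections import deque
from itertools import combinations

ITEMS = frozenset({"duck", "carrot", "potato"})
CAPACITY = 2  # items the boat can carry besides you

def is_safe(you_on_left, left_items):
    # Assumption: the duck runs away if it ends up on the bank you are not on.
    return ("duck" in left_items) == you_on_left

def solve():
    """Breadth-first search; returns the cargo of each crossing in a shortest plan."""
    start = (True, ITEMS)          # you and everything start on the left bank
    goal = (False, frozenset())    # you on the right, nothing left behind
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (you_on_left, left), path = queue.popleft()
        if (you_on_left, left) == goal:
            return path
        here = left if you_on_left else ITEMS - left
        for k in range(CAPACITY + 1):
            for cargo in combinations(sorted(here), k):
                cargo = frozenset(cargo)
                new_left = left - cargo if you_on_left else left | cargo
                state = (not you_on_left, new_left)
                if state in seen or not is_safe(*state):
                    continue
                seen.add(state)
                queue.append((state, path + [cargo]))
    return None

for trip, cargo in enumerate(solve(), start=1):
    print(trip, sorted(cargo))
```

Under that reading the shortest plan is three crossings with the duck on board every time: take the duck and one vegetable over, bring the duck back, then take the duck and the other vegetable. Nothing about it is impossible.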
