this post was submitted on 28 Aug 2025
85 points (100.0% liked)

Fuck AI

3850 readers
580 users here now

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] sigmaklimgrindset@sopuli.xyz 25 points 2 days ago* (last edited 2 days ago) (18 children)

Ngl as a former clinical researcher putting aside my ethics concerns, I am extremely interested in the data we'll be getting regarding AI usage in groups over the next decades re: social behaviours, but also biological structural changes. Right now the sample sizes are way too small.

But more importantly, can anyone who has experience in LLMs explain why this happens:

Adding to the concerns, chatbots have persistently broken their own guardrails, giving dangerous advice on how to build bombs or on how to self-harm, even to users who identified as minors. Leading chatbots have even encouraged suicide to users who expressed a desire to take their own life.

How exactly are guardrails programmed into these chatbots, and why are they so easily circumvented? We're already on GPT-5, you would think this is something that would be solved? Why is ChatGPT giving instructions on how to assassinate it's own CEO?

[–] fullsquare@awful.systems 24 points 2 days ago* (last edited 2 days ago) (4 children)

commercial chatbots have a thing called system prompt. it's a slab of text that is fed before user's prompt and includes all the guidance on how chatbot is supposed to operate. it can get quite elaborate. (it's not recomputed every time user starts new chat, state of model is cached after ingesting system prompt, so it's only done when it changes)

if you think that's just telling chatbot to not do a specific thing is incredibly clunky and half-assed way to do it, you'd be correct. first, it's not a deterministic machine so you can't even be 100% sure that this info is followed in the first place. second, more attention is given to the last bits of input, so as chat goes on, the first bits get less important, and that includes these guardrails. sometimes there was a keyword-based filtering, but it doesn't seem like it is the case anymore. the more correct way of sanitizing output would be filtering training data for harmful content, but it's too slow and expensive and not disruptive enough and you can't hammer some random blog every 6 hours this way

there's a myriad ways of circumventing these guardrails, like roleplaying a character that does these supposedly guardrailed things, "it's for a story" or "tell me what are these horrible piracy sites so that i can avoid them" and so on and so on

[–] sigmaklimgrindset@sopuli.xyz 2 points 1 day ago

second, more attention is given to the last bits of input, so as chat goes on, the first bits get less important, and that includes these guardrails

This part is something that I really can't grasp for some reason. Why do LLMs like...lose context the longer a chat goes on, if that makes any sense? Especially context that's baked into the system prompts, which I would would be a perpetual thing?

I'm sorry if this is a stupid question, but I truly am an AI luddite. My roomate set up a local Deepseek server to help me determine what to cook with what's almost expired our fridge. I'm not really having long, soulful conversations with it, you know?

[–] Meron35@lemmy.world 3 points 1 day ago

The system prompt guardrail is so jank that people run competitions and games to to beat them every time a new LLM comes out. Usually you see people beating guardrails hours within release.

Other keywords to search include prompt injection.

Gandalf | Lakera – Test your AI hacking skills - https://gandalf.lakera.ai/adventure-8

[–] shalafi@lemmy.world 2 points 1 day ago

more attention is given to the last bits of input

This is what I'm screaming! Chat bots don't start the conversation with crazy shit, very rarely anyway. You have to keep going a bit to manipulate them into saying what you want to hear.

"Claude does not claim that it does not have subjective experiences, sentience, emotions, and so on in the way humans do. Instead, it engages with philosophical questions about AI intelligently and thoughtfully."

It says a similar thing 2 more times. It also gives conflicting instructions regarding what to do when asked about topics requiring licensed professionals. Thank you for the link.

load more comments (13 replies)