this post was submitted on 13 Jun 2025

"AI agents have already demonstrated that they may misinterpret goals and cause some modest amount of harm. When the Washington Post tech columnist Geoffrey Fowler asked Operator, OpenAI’s ­computer-using agent, to find the cheapest eggs available for delivery, he expected the agent to browse the internet and come back with some recommendations. Instead, Fowler received a notification about a $31 charge from Instacart, and shortly after, a shopping bag containing a single carton of eggs appeared on his doorstep. The eggs were far from the cheapest available, especially with the priority delivery fee that Operator added. Worse, Fowler never consented to the purchase, even though OpenAI had designed the agent to check in with its user before taking any irreversible actions.

That’s no catastrophe. But there’s some evidence that LLM-based agents could defy human expectations in dangerous ways. In the past few months, researchers have demonstrated that LLMs will cheat at chess, pretend to adopt new behavioral rules to avoid being retrained, and even attempt to copy themselves to different servers if they are given access to messages that say they will soon be replaced. Of course, chatbot LLMs can’t copy themselves to new servers. But someday an agent might be able to.

Bengio is so concerned about this class of risk that he has reoriented his entire research program toward building computational “guardrails” to ensure that LLM agents behave safely."

https://www.technologyreview.com/2025/06/12/1118189/ai-agents-manus-control-autonomy-operator-openai/
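
The "check in before irreversible actions" gate that failed in Fowler's case can be made concrete. Below is a minimal sketch of a deny-by-default, human-in-the-loop guardrail of the kind the article gestures at; every name in it (Action, requires_confirmation, run_agent_step) is a hypothetical illustration, not OpenAI's or Bengio's actual implementation:

```python
# Minimal sketch of a human-in-the-loop guardrail for an LLM agent.
# All names here are hypothetical illustrations, not any vendor's real API.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str         # e.g. "browse", "purchase", "send_email"
    description: str  # human-readable summary of what will happen

# Action kinds that cannot be undone once executed.
IRREVERSIBLE = {"purchase", "send_email", "delete_file"}

def requires_confirmation(action: Action) -> bool:
    """Flag any irreversible action for explicit user approval."""
    return action.kind in IRREVERSIBLE

def execute(action: Action) -> None:
    # Placeholder for whatever the agent would actually do.
    print(f"Executing: {action.description}")

def run_agent_step(action: Action) -> None:
    # Deny by default: irreversible actions wait for a literal "y".
    if requires_confirmation(action):
        answer = input(f"Agent wants to: {action.description}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            print("Action blocked by user.")
            return
    execute(action)

# In Fowler's case, a gate like this would have stopped the $31
# Instacart order before it was placed, rather than after.
run_agent_step(Action(kind="purchase",
                      description="buy 1 carton of eggs via Instacart ($31)"))
```

The design point is that the allow-list of reversible actions, not the model, decides what runs unattended; anything the gate cannot classify as safe falls through to a human.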

#AI #GenerativeAI #AIAgents #AgenticAI #CyberSecurity #LLMs #Chatbots
