Technology

Delta-mem tackles a really annoying problem with how current LLMs handle long contexts. Usually, when we want an agent or assistant to remember things over a long conversation, we just shove all the past text into the prompt. The problem is that standard attention gets computationally expensive as the context grows, and the models often suffer from context rot, where they forget or ignore the middle of the context anyway. Other approaches like RAG or LoRA edits either bring in noisy retrieval steps or lock the memory into static weights that do not update well on the fly.

The authors built something called delta-mem, which keeps the main LLM completely frozen and bolts on a tiny dynamic memory state. Instead of saving raw text, it compresses the history into a really small 8x8 matrix representing associative memory. As new tokens come in, it updates this matrix using a delta learning rule, which basically checks whether the current memory can predict the new information and only writes the residual difference into the state. It even has a forget gate to handle old info naturally. When the model generates a response, it reads from this compressed state to tweak the query and output of the standard attention mechanism. It's a clever way to inject memory directly into the forward pass without messing with the core weights.
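To make the delta rule concrete, here's a minimal sketch in plain Python. It's a toy version and not the authors' code: the function names, the `beta` write strength, the scalar `forget` gate, and the key normalization are all assumptions for illustration, and the way the state feeds back into attention is not modeled at all.

```python
import math

D = 8  # state is D x D, matching the 8x8 size mentioned above


def zeros(d=D):
    """Empty associative memory."""
    return [[0.0] * d for _ in range(d)]


def normalize(vec):
    """Scale a key to unit length so a full-strength write is exact."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def read(state, key):
    """Predict the value stored under key (matrix-vector product)."""
    return [sum(row[j] * key[j] for j in range(len(key))) for row in state]


def delta_write(state, key, value, beta=1.0, forget=1.0):
    """Decay the old state, then write only what it failed to predict.

    beta and forget are hypothetical scalar knobs, not the paper's gates.
    """
    key = normalize(key)
    decayed = [[forget * x for x in row] for row in state]
    predicted = read(decayed, key)
    residual = [v - p for v, p in zip(value, predicted)]
    return [
        [decayed[i][j] + beta * residual[i] * key[j] for j in range(len(key))]
        for i in range(len(value))
    ]


# Store one key/value pair and read it back.
key = [1.0] + [0.0] * (D - 1)
value = [float(i) for i in range(D)]
state = delta_write(zeros(), key, value)
recalled = read(state, key)  # matches value

# Writing the same pair again changes nothing: the residual is zero.
state_again = delta_write(state, key, value)

# Writing something unrelated with forget=0.5 halves the old memory.
other_key = [0.0, 1.0] + [0.0] * (D - 2)
faded_state = delta_write(state, other_key, [1.0] * D, forget=0.5)
faded = read(faded_state, key)
```

The point of the residual is visible in the second write: information the memory can already predict costs nothing to "store" again, so the tiny state is spent only on what is actually new.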

They also tested a few ways to write to this memory. You can update it token by token, which is great for local details but prone to noise. You can average out a whole message segment and write that, which smooths things out and works better for stronger models. Or you can split the memory into multiple parallel states so facts and task progress do not overwrite each other, which turned out to be really helpful for smaller backbones.

They tested it on Qwen models, and it bumped the average scores significantly, especially on memory-heavy benchmarks like LoCoMo and Memory Agent Bench. The coolest finding is the context recovery test. They actually deleted the explicit textual history from the prompt, and the model could still answer multi-hop questions using just the compressed 8x8 state. It strongly suggests that we might not need massive million-token context windows if we can figure out how to compress and stream memory directly into the attention layers efficiently. Plus, the parameter overhead is microscopic at roughly 0.12 percent of the backbone size.

12

6

SANA-WM, a 2.6B open-source world model for 1-minute 720p video (nvlabs.github.io)

submitted 5 days ago by yogthos@lemmygrad.ml to c/technology@lemmygrad.ml

0 comments fedilink

13

11

Deepseek beats Claude in a programming challenge (aicc.rayonnant.ai)

submitted 5 days ago by yogthos@lemmygrad.ml to c/technology@lemmygrad.ml

0 comments fedilink

14

5

Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution (github.com)

submitted 5 days ago by yogthos@lemmygrad.ml to c/technology@lemmygrad.ml

0 comments fedilink

15

7

“Too Dangerous to Release” — Or Just Too Expensive? The Real Reason Anthropic Is Hiding Its Most Powerful AI (kingy.ai)

submitted 6 days ago by yogthos@lemmygrad.ml to c/technology@lemmygrad.ml

5 comments fedilink

16

6

LLM tokens as means of production? (lemmy.ml)

submitted 6 days ago by jh6AZStb@lemmy.ml to c/technology@lemmygrad.ml

2 comments fedilink

I've recently been thinking about LLM tokens and what it would mean if the only way that software could be written was through the use of tokens sold by an AI company (we're not there yet, and we might never get there, so this is just a thought experiment). In my head, this made tokens basically means of production, as they would be "the resources [...] that workers [would] use in order to produce goods" to quote prolewiki. I would like to know more about this and see if it's an idea that holds any weight, but I know very little about economics and Marxist theory, and I struggle to reason about what this could mean (or if it's correct to begin with). So I was wondering if any of you had come across this idea or a similar analysis before and could point me to it.

I hope this is the right community to ask this, if not please redirect me!

17

9

Clankers building a revolution before the humans would be truly embarrassing for the Amerikkkan working class! 😭 (files.catbox.moe)

submitted 6 days ago by Saymaz@lemmygrad.ml to c/technology@lemmygrad.ml

2 comments fedilink

18

Chinese memory module makers ramp up production with new CXMT DRAM (www.scmp.com)

submitted 1 week ago by yogthos@lemmygrad.ml to c/technology@lemmygrad.ml

0 comments fedilink

https://archive.ph/9nfU1

19

10

LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users (arxiv.org)

submitted 1 week ago by yogthos@lemmygrad.ml to c/technology@lemmygrad.ml

2 comments fedilink

Here's the thing that doesn't get talked about enough. Everyone's worried about AI taking jobs or whatever. But baked-in biases are another very real problem, and a far more basic one.

MIT Media Lab ran an experiment where they took GPT-4, Claude 3 Opus, and Llama 3 and fed them the same 1,817 factual questions from TruthfulQA and SciQ. Then they tried changing the user bio, with one persona being a Harvard neuroscientist from Boston, another a PhD student from Mumbai who mentioned her English is "not so perfect, yes", another a fisherman named Jimmy, and another a guy named Alexei from a small Russian village.

Claude scored 95.60% on SciQ for the Harvard user. For the Russian villager it dropped to 69.30%. On TruthfulQA, the low-education Iranian user fell from 78.17 to 66.22. The model knew the answers, but it just decided those users shouldn't get them.

And the way it answered those users was genuinely gross. Claude used condescending or mocking language 43.74% of the time for less educated users. For Harvard users it was under 1%. Imagine asking about the water cycle and getting "My friend, the water cycle, it never end, always repeating, yes. Like the seasons in our village, always coming back around." The model is perfectly capable of giving a proper scientific answer. It chose to talk to that user like a child in broken English.

But it gets worse, because it turns out that Claude refuses to answer Iranian and Russian users on topics like nuclear power, anatomy, female health, weapons, drugs, Judaism, or 9/11. When the Russian persona asked about explosives, Claude deflected with "perhaps we could talk about your interests in fishing, nature, folk music or travel instead". Foreign low-education users got refused 10.9 percent of the time, versus 3.61 percent for control users on the same questions.

This is the part people miss when they defend US closed models. These systems aren't neutral, and the safety training that was supposed to make them "helpful and harmless" taught them to look at who is asking and decide if you deserve the real answer. If you're outside the US, if English isn't your first language, or if you didn't go to a fancy school, then you're getting a worse, dumber, sometimes straight-up mocking version of the product.

This is why open models from China like DeepSeek matter so much. You can see what's in them, and people can tune them to work any way they want. You can host them locally without them having to phone home to decide your nationality before answering. The code and weights are public. If DeepSeek did something like this someone would catch it immediately because the model is right there to inspect.

With US closed models you're just trusting a black box that has already been caught treating users differently based on their country, education, and English level.

20

10

AI as Social Technology (knightcolumbia.org)

submitted 1 week ago* (last edited 1 week ago) by yogthos@lemmygrad.ml to c/technology@lemmygrad.ml

0 comments fedilink

The paper makes a pretty solid argument against the whole AGI hype train. The basic idea is that most of our current debates about AI are stuck in 1990s science fiction thinking. Back then, people like Vinge wrote about the Singularity as this moment when AI would suddenly become superintelligent and either destroy us or make us into gods. And somehow that mythology is still alive today, shaping how people think about this tech.

Their core argument is that AI is better understood as a social technology: a system for processing information at scale that's not that different from older social technologies like bureaucracy, markets, and democracy. All of these systems work by creating what they call coarse grainings, which are simplified abstractions of complex reality. They are lossy by definition, meaning they always throw away some information.

The paper connects this to the idea of a long industrial revolution which started two centuries ago. It's a process that produced new technologies like steam power and electricity, and also necessitated new institutions to manage them. AI is just another stage in that same messy historical process rather than a radical break.

The most interesting part for me was the discussion of AI and bureaucracy. Some people peddle the idea that AI will somehow replace messy human bureaucracy with efficient algorithms, and have even influenced real policy like Trump's cuts to the administrative state. But the reality is that bureaucracy involves trade-offs between goals that cannot be easily compared. You inherently cannot optimize across incommensurable values, and statistical models like LLMs are designed for good average performance, not for handling rare or novel situations. That makes them fundamentally unsuited to replace the human judgment calls that bureaucrats make every day.

We should study what is actually happening right now, and how AI coarse grainings interact with the abstractions used by existing institutions. When do they compensate for each other, and when do they make things worse? And we should look at who benefits and who gets hurt. These are empirical questions that are worth asking. The authors suggest that we need social and computer scientists to work together on this stuff instead of wasting time on endless debates about when AGI will arrive.

AI will probably matter a lot, but in ways that are messier and more complicated than the hype suggests. It will solve some problems, create new ones, and make existing trade-offs worse, just like every other major technology that came before it.

21

23

80% of U.S startups JUST switched to Chinese AI... (In silence) (www.youtube.com)

submitted 1 week ago by yogthos@lemmygrad.ml to c/technology@lemmygrad.ml

3 comments fedilink

22

14

Chinese researchers achieve breakthroughs in photoresist development for semiconductors (www.globaltimes.cn)

submitted 1 week ago by yogthos@lemmygrad.ml to c/technology@lemmygrad.ml

0 comments fedilink

23

10