https://github.com/sapientinc/HRM

The Hierarchical Reasoning Model (HRM) is a new architecture inspired by neural computation principles observed in the brain, such as hierarchical processing, temporal separation of neural rhythms, and recurrent connectivity.

The bio-inspired design demonstrates significantly improved efficiency and accuracy on complex reasoning tasks compared with current LLMs.

The HRM architecture is designed to achieve significant computational depth while maintaining stability and efficiency during training. It consists of two interdependent recurrent modules operating at different speeds.

The High-Level module operates slowly and is responsible for abstract planning and deliberate reasoning. The Low-Level module functions rapidly, handling detailed computations.
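
To make the two-timescale design concrete, here is a minimal sketch in PyTorch (the module names, the choice of GRU cells, and the loop counts are my own illustrative assumptions, not the repo's actual code): the low-level module ticks several times for every single update of the high-level module, whose state in turn conditions the low-level dynamics.

```python
import torch
import torch.nn as nn

class HRMSketch(nn.Module):
    """Toy sketch of HRM's two-speed recurrence (not the official code)."""

    def __init__(self, dim=128, low_steps=4, high_steps=2):
        super().__init__()
        self.low_steps, self.high_steps = low_steps, high_steps
        # GRUCell stands in for whatever recurrent block HRM actually uses.
        self.low = nn.GRUCell(dim * 2, dim)   # fast module sees [x ; high state]
        self.high = nn.GRUCell(dim, dim)      # slow module sees low-level result
        self.readout = nn.Linear(dim, dim)

    def forward(self, x):
        z_low = torch.zeros_like(x)
        z_high = torch.zeros_like(x)
        for _ in range(self.high_steps):        # slow outer cycles
            for _ in range(self.low_steps):     # fast inner computation
                z_low = self.low(torch.cat([x, z_high], dim=-1), z_low)
            z_high = self.high(z_low, z_high)   # one slow update per cycle
        return self.readout(z_high)
```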

This dual-module system allows the HRM to perform sequential reasoning tasks in a single forward pass, without explicit supervision of intermediate steps. The model is also designed to be Turing-complete, meaning it can in principle simulate any Turing machine, overcoming the computational-depth limits of standard Transformer models.

Another interesting feature is the use of a one-step gradient approximation, which improves efficiency by avoiding the computationally intensive backpropagation-through-time (BPTT) method typically used to train recurrent networks. Sidestepping BPTT gives the model a constant memory footprint regardless of how many recurrent steps it takes, which makes it more scalable.
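
Here is a hedged sketch of the one-step gradient idea (a DEQ-style simplification of my own, not code from the repo): all but the final recurrent step run without building a computation graph, and gradients flow only through the last application of the update function.

```python
import torch

def one_step_grad_recurrence(f, x, z0, n_steps):
    """Approximate the gradient of a recurrence z = f(z, x) by
    backpropagating through only the final step, giving O(1) memory
    instead of the O(n_steps) memory BPTT would need."""
    z = z0
    with torch.no_grad():            # no graph kept for these steps
        for _ in range(n_steps - 1):
            z = f(z, x)
    z = f(z.detach(), x)             # only this step is differentiated
    return z
```

Gradients reach f's parameters only through that last application, which is exactly what keeps memory constant as the number of steps grows.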

The model also incorporates an Adaptive Computation Time mechanism, inspired by the brain's ability to switch between fast, automatic thinking and slow, deliberate reasoning. The HRM is thus able to dynamically allocate computational resources based on the complexity of the task.
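
As a rough illustration of adaptive halting (a deliberately simplified threshold scheme of my own; the names are hypothetical and the actual HRM mechanism is learned rather than hard-thresholded), a small head can read the current state and decide whether another reasoning segment is worth spending:

```python
import torch

def adaptive_compute(step_fn, halt_head, z, max_segments=8, threshold=0.9):
    """Run reasoning segments until a learned halt signal fires.

    step_fn:   one full high/low reasoning segment, mapping z -> z
    halt_head: e.g. nn.Linear(dim, 1), producing a halt logit from the state
    Easy inputs can stop after one segment; hard ones use all of them.
    """
    for seg in range(max_segments):
        z = step_fn(z)
        p_halt = torch.sigmoid(halt_head(z)).mean()  # batch-level halt score
        if p_halt > threshold:
            break
    return z, seg + 1  # final state plus the number of segments spent
```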

Despite having only 27 million parameters, the HRM achieves nearly perfect performance on difficult tasks like complex Sudoku puzzles and finding optimal paths in large mazes, areas where even advanced models using Chain-of-Thought (CoT) methods fail completely.

The HRM also outperforms much larger models on the Abstraction and Reasoning Corpus (ARC-AGI) benchmark for artificial general intelligence. It achieved 40.3% accuracy, surpassing models like o3-mini-high (34.5%) and Claude 3.7 8K (21.2%).

The model's design means that its training phase is much cheaper as well. It can be trained effectively with a small number of examples (around 1,000) and does not require pre-training or CoT data.

HRM conducts its computations within its internal hidden state space, which is more efficient than CoT, where reasoning is externalized into token-level language. That externalization can be brittle and requires extensive data to work.
