Technology

A tech news sub for communists

  • Thousands of Chinese researchers and scientists are leaving top jobs at leading US universities and companies to take positions in China.
  • The Cambridge area of Massachusetts is home to Harvard, MIT, and scores of leading companies, and was the number one source of returning Chinese research and engineering talent.
  • In second place is the Palo Alto-Berkeley cluster, which includes Stanford, University of California, and Silicon Valley.
  • The migration of top scientific and engineering talent back to China is accelerating, but it began nearly a decade ago. And while the political situation between China and the United States is certainly a major motivation for many scientists to return, the more important factor is the quality of the education systems.
  • Chinese universities are now claiming the top spots across all the hard science disciplines, while American colleges are tumbling in the rankings.

YouTube video.


Apple published a paper criticizing the capabilities of Large Language Models (LLMs) in reasoning and formal logic. The paper builds on previous arguments made by Gary Marcus and Subbarao Kambhampati about LLMs' limitations in generalizing beyond their training distribution.

The authors demonstrate that even the latest "reasoning models" fail to reason reliably on classic problems like the Tower of Hanoi: the models cannot solve it consistently even when the solution algorithm is given to them.
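
For contrast, the conventional algorithm the models were handed is a few lines of recursion. A minimal sketch in Python (this is the textbook algorithm, not the paper's exact prompt):

    def hanoi(n, source, target, spare, moves):
        """Append the optimal move sequence for n disks to `moves`."""
        if n == 0:
            return
        hanoi(n - 1, source, spare, target, moves)  # move the top n-1 disks out of the way
        moves.append((source, target))              # move the largest disk
        hanoi(n - 1, spare, target, source, moves)  # restack the n-1 disks on top of it

    moves = []
    hanoi(8, "A", "C", "B", moves)
    print(len(moves))  # 255 moves, i.e. 2**8 - 1

A program this small solves the puzzle exactly for any number of disks, which is the bar the "reasoning" models fail to clear.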

The paper argues that LLMs are not a substitute for well-specified conventional algorithms, and that their limitations are becoming clearer. LLMs are not a direct route to AGI; the field of neural networks is not dead, but the current approach has clear limitations.

The paper highlights the importance of combining human adaptiveness with computational brute force and reliability in AI development.


AI companies claim their tools couldn't exist without training on copyrighted material. It turns out they can; it just takes more work. To prove it, AI researchers trained a model on a dataset containing only public domain and openly licensed material.

What makes it difficult is curating the data, but once that curation has been done, in principle anyone can use the dataset without having to go through the painful part again. So the whole "we have to violate copyright and steal intellectual property" line is (as everybody already knew) total BS.
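
To illustrate what "curating the data" means in practice, here is a hypothetical sketch of a license filter; the field names and the allowlist are illustrative assumptions, not taken from the actual dataset:

    # Hypothetical license filter: keep only documents whose metadata marks
    # them as public domain or openly licensed.
    ALLOWED_LICENSES = {"public-domain", "cc0-1.0", "cc-by-4.0", "cc-by-sa-4.0"}

    def keep(record: dict) -> bool:
        """Keep a document only if its declared license is on the allowlist."""
        return record.get("license", "").lower() in ALLOWED_LICENSES

    corpus = [
        {"text": "an openly licensed document", "license": "CC0-1.0"},
        {"text": "a copyrighted document", "license": "all-rights-reserved"},
    ]
    clean_corpus = [r for r in corpus if keep(r)]
    print(len(clean_corpus))  # 1

The hard part is not the filter itself but collecting trustworthy license metadata at scale; once someone has done that, the resulting corpus can be reused by everyone.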


The project implements sparse multiplication and fuses the up/down projections in the MLP layers through low-rank weight activations. The work is based on Deja Vu and Apple's LLM in a Flash.

This approach avoids loading and computing activations for feed-forward layer weights whose outputs will eventually be zeroed out.

It's a lossless approach, since these weights do not contribute to the current token's prediction anyway. It does, however, need the predictors to be accurate in clustering the weights.
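
To make the mechanism concrete, here is a minimal sketch of Deja Vu-style contextual sparsity in PyTorch; the shapes, names, and ReLU-style activation are assumptions for illustration, not the project's actual code:

    import torch

    d_model, d_ff, rank, k = 2048, 8192, 64, 1024   # k = neurons predicted to be active

    # Full MLP weights (in practice these can stay off-GPU and be streamed on demand).
    W_up = torch.randn(d_ff, d_model)
    W_down = torch.randn(d_model, d_ff)

    # Low-rank predictor: two thin matrices cheaply score every neuron from the input.
    P1 = torch.randn(rank, d_model)
    P2 = torch.randn(d_ff, rank)

    def sparse_mlp(x: torch.Tensor) -> torch.Tensor:
        scores = P2 @ (P1 @ x)              # cheap estimate of which neurons will fire
        idx = scores.topk(k).indices        # indices of the predicted "awake" neurons
        h = torch.relu(W_up[idx] @ x)       # compute only the selected up-projection rows
        return W_down[:, idx] @ h           # and the matching down-projection columns

    y = sparse_mlp(torch.randn(d_model))    # output shape: (d_model,)

The weights the predictor skips are exactly the ones whose post-activation output would have been zero, which is why the scheme can be lossless when the predictor is accurate (Llama's gated MLP adds a gate projection, omitted here for brevity).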

The result? 5x faster MLP layer performance in transformers with 50% less memory consumption by skipping the sleeping nodes on every token prediction. For Llama 3.2, feed-forward layers account for roughly 30% of the total weights and forward-pass computation, resulting in a 1.6-1.8x increase in throughput:

Sparse LLaMA 3.2 3B vs LLaMA 3.2 3B (on HuggingFace Implementation):

- Time to First Token (TTFT):  1.51× faster (1.209s → 0.803s)
- Output Generation Speed:     1.79× faster (0.7 → 1.2 tokens/sec)  
- Total Throughput:            1.78× faster (0.7 → 1.3 tokens/sec)
- Memory Usage:                26.4% reduction (6.125GB → 4.15GB)