The Attention Mechanism Born for Cost Optimization (oilbeater.com)

submitted 2 weeks ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

dcdaML - devanagari character detection dataset training framework (github.com)

submitted 2 weeks ago by thickertoofan@lemm.ee to c/machinelearning@lemmy.ml

cross-posted from: https://lemm.ee/post/61282397

I'm open sourcing this project, which I made in just a weekend. I plan to keep working on it in my free time, adding synthetic data generation and some other modifications. Anyone is welcome to chip in; I'm not an ML expert. Inference is live here, running on TensorFlow.js. The model is just 1.92 MB!
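
For a sense of scale, a weight-file size can be turned into a rough parameter count. This is a back-of-the-envelope sketch, not something stated in the post: it assumes the 1.92 MB figure is decimal megabytes of raw weights only (TensorFlow.js models also ship a small JSON topology file), and that every weight uses the same number of bytes. The function name is hypothetical.

```python
def estimate_param_count(size_mb: float, bytes_per_param: int = 4) -> int:
    """Rough parameter count implied by a weight-file size.

    Assumes `size_mb` is in decimal megabytes (10**6 bytes), that the file holds
    only raw weights, and that every weight uses `bytes_per_param` bytes
    (4 for float32, 2 for float16, 1 for 8-bit quantized weights).
    """
    size_bytes = round(size_mb * 1_000_000)
    return size_bytes // bytes_per_param


# Compare the estimate across common storage precisions.
for dtype, width in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{estimate_param_count(1.92, width):,} parameters")
```

At float32 that works out to roughly 480,000 parameters; if the weights were quantized, the same file size would hold proportionally more.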

Neural Graffiti is an experiment in adding a "Spray Layer" to a transformer model, which injects a memory trace into the final stages of inference without finetuning or retraining (github.com)

submitted 3 weeks ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

Breaking GPT-5 News! (lemmy.world)

submitted 4 weeks ago by fubarx@lemmy.world to c/machinelearning@lemmy.ml

cross-posted from: https://lemmy.world/post/27657674

I want to open source a dataset but I'm not sure what license to use (lemmy.world)

submitted 4 weeks ago* (last edited 4 weeks ago) by 4Robato@lemmy.world to c/machinelearning@lemmy.ml

Hello!

Some time ago I made a map generator (pixel art; the largest maps are 300x200 pixels). To practice training a model, I generated maps in 3 sizes, 1,500 maps per size, and I thought I'd make that dataset open source.

Is that something people actually want or appreciate, or not really? I'm a bit lost on how to proceed and which license to use. Does the MIT License make sense, or which one would you recommend?

Thanks!

Why do LLMs make stuff up? New research peers under the hood. (arstechnica.com)

submitted 1 month ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

MLOps tips I gathered recently (www.readyforagents.com)

submitted 1 month ago by oba@lemmy.world to c/machinelearning@lemmy.ml

Hi all,

I've been experimenting with building and deploying ML and LLM projects for a while now, and honestly, it’s been a journey.

Training the models always felt more straightforward, but deploying them smoothly into production turned out to be a whole new beast.

I had a really good conversation with Dean Pleban (CEO @ DAGsHub), who shared some great practical insights based on his own experience helping teams go from experiments to real-world production.

Here's what he shared with me, along with what I've experienced myself:

Data matters way more than I thought. Initially, I focused a lot on model architectures and less on the quality of my data pipelines. Production performance depends heavily on robust data handling: proper data versioning, monitoring, and governance can save you a lot of headaches. This becomes far more important once your toy project turns into a collaboration with others.
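
At its simplest, data versioning means being able to name the exact state of a dataset. A minimal sketch, assuming the dataset is a plain directory of files: every file's relative path, length, and contents are hashed into one short identifier that changes whenever any file is added, removed, renamed, or edited. The function name and the 12-character truncation are illustrative choices; dedicated tools such as DVC also handle storage, remote sync, and lineage, which this does not.

```python
import hashlib
from pathlib import Path


def dataset_version(root: str, length: int = 12) -> str:
    """Return a short content hash identifying the exact state of a dataset directory.

    Every file's relative path, byte length, and contents feed one SHA-256 digest,
    so adding, removing, renaming, or editing any file yields a new version string.
    Files are visited in sorted path order so the result does not depend on how
    the filesystem happens to list them.
    """
    base = Path(root)
    files = sorted(
        (p for p in base.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(base).as_posix(),
    )
    digest = hashlib.sha256()
    for path in files:
        data = path.read_bytes()
        digest.update(path.relative_to(base).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator: paths cannot contain NUL
        digest.update(len(data).to_bytes(8, "big"))  # length prefix avoids ambiguity
        digest.update(data)
    return digest.hexdigest()[:length]
```

Recording this string next to every training run makes it possible to tell later exactly which data a given model saw.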

LLMs need their own rules. Working with large language models introduced challenges I wasn't fully prepared for, like hallucinations, biases, and resource demands. Dean suggested frameworks like RAES (Robustness, Alignment, Efficiency, Safety) to help tackle these issues, and it's something I'm actively trying out now. He also mentioned "LLM as a judge" (using one model to grade another model's outputs), a concept that seems to be getting a lot of attention recently.
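
The "LLM as a judge" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Dean's setup or any framework's API: `call_model` is a hypothetical placeholder for whatever client you use (prompt string in, reply text out), and the prompt wording and 1 to 5 scale are made up for the example.

```python
import re
from typing import Callable, Optional

# Hypothetical judge prompt; real ones usually add a rubric and worked examples.
JUDGE_PROMPT = (
    "You are grading an answer for factual accuracy.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    'Reply with "SCORE: <1-5>" on the first line, then a one-sentence reason.'
)


def judge(question: str, answer: str, call_model: Callable[[str], str]) -> Optional[int]:
    """Ask a judge model to score an answer; return None if the reply can't be parsed.

    `call_model` is a placeholder: it takes a prompt string and returns the
    judge model's text reply.
    """
    reply = call_model(JUDGE_PROMPT.format(question=question, answer=answer))
    match = re.search(r"SCORE:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None
```

Returning `None` on an unparseable reply, rather than guessing a score, keeps malformed judge outputs visible so they can be counted and inspected.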

Some practical tips Dean shared with me:

Save chain-of-thought output (the reasoning text that reasoning models produce); you never know when you might need it. This sometimes requires using the verbose parameter.

Log experiments thoroughly (parameters, hyperparameters, models used, data versions...).

Start with a Jupyter notebook, but move to production-grade tooling (all the tools are mentioned in the guide below 👇🏻)
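
The two logging tips above can be combined into one small habit. A minimal sketch, assuming a plain JSON Lines file rather than any particular experiment tracker; the function and field names are illustrative, not taken from the guide or from any tool. `data_version` is any string that pins the dataset state (a commit ID or content hash, for example), and `reasoning` holds the chain-of-thought text when the model or API exposes it.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, Optional


def log_run(
    log_path: str,
    *,
    model: str,
    params: Dict[str, Any],
    data_version: str,
    prompt: str,
    output: str,
    reasoning: Optional[str] = None,
) -> Dict[str, Any]:
    """Append one experiment record as a JSON line and return it.

    Storing the reasoning text next to the final output keeps it available
    for later debugging even if nobody looks at it today.
    """
    record = {
        "timestamp": time.time(),
        "model": model,
        "params": params,
        "data_version": data_version,
        "prompt": prompt,
        "output": output,
        "reasoning": reasoning,  # None when the model exposes no chain of thought
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

One line per run keeps the log append-only and easy to load or diff later; a dedicated tracker adds search, comparison views, and artifact storage on top of the same idea.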

To help myself (and hopefully others) visualize and internalize these lessons, I created an interactive guide that breaks down how successful ML/LLM projects are structured. If you're curious, you can explore it here:

https://www.readyforagents.com/resources/llm-projects-structure

I'd genuinely appreciate hearing about your experiences too. What are your favorite MLOps tools? I think that, to this day, dataset versioning, and especially versioning LLM experiments (data, model, prompt, parameters...), still isn't fully solved.

DeepSeek open source DeepEP – library for MoE training and Inference (github.com)

submitted 2 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (transformer-circuits.pub)

submitted 2 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub)

submitted 2 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arxiv.org)

submitted 3 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

Neurosymbolic AI -- Why, What, and How (arxiv.org)

submitted 4 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

Neurosymbolic AI is a hybrid approach aiming to bridge the gap between neural networks' ability to learn patterns and symbolic AI's capacity for logical reasoning and explainability.

This approach may offer the best of both worlds, combining robust learning from data with clear, understandable reasoning based on knowledge. It has the potential to outperform systems relying solely on either neural networks or symbolic logic, and to provide clear explanations for its decisions.

The approach involves encoding structured symbolic knowledge into a format that can be integrated with neural networks and then mapping information from neural patterns back to structured symbolic representations.
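
The mapping from neural patterns back to symbols can be illustrated with a toy example. This is a minimal sketch, not the paper's method: the predicate names, the probabilities (standing in for a trained classifier's outputs), the 0.5 threshold, and the rules are all hypothetical. Probabilities are thresholded into discrete facts, then simple forward chaining over if-then rules derives conclusions and records each step as a human-readable explanation.

```python
from typing import Dict, List, Set, Tuple

# A rule is (set of premises, conclusion); a stand-in for a symbolic knowledge base.
Rule = Tuple[frozenset, str]


def to_symbols(probs: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Map neural outputs (per-predicate probabilities) to discrete symbolic facts."""
    return {name for name, p in probs.items() if p >= threshold}


def forward_chain(facts: Set[str], rules: List[Rule]) -> Tuple[Set[str], List[str]]:
    """Apply rules until nothing new is derived; return all facts plus a reasoning trace."""
    known = set(facts)
    trace: List[str] = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                trace.append(f"{' & '.join(sorted(premises))} -> {conclusion}")
                changed = True
    return known, trace


rules = [
    (frozenset({"has_wings", "has_feathers"}), "bird"),
    (frozenset({"bird", "cannot_fly"}), "flightless_bird"),
]
# Pretend these probabilities came from an image classifier.
probs = {"has_wings": 0.93, "has_feathers": 0.88, "cannot_fly": 0.71, "has_fins": 0.04}
facts, explanation = forward_chain(to_symbols(probs), rules)
print(sorted(facts))
print(explanation)
```

The trace is what gives the hybrid its explainability: each derived conclusion points back to the facts and rule that produced it, while the neural part only has to decide which basic predicates hold.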

Classical Sorting Algorithms as a Model of Morphogenesis: self-sorting arrays reveal unexpected competencies in a minimal model of basal intelligence (arxiv.org)

submitted 4 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

Genie 2: A large-scale foundation world model (deepmind.google)

submitted 4 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

A good primer on what to expect running local LLMs (nullprogram.com)

submitted 5 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

A community statement supporting the Open Source Definition (OSD) (osd.fyi)

submitted 6 months ago* (last edited 6 months ago) by Shamar@feddit.it to c/machinelearning@lemmy.ml

Declaration

We, the undersigned members of the Open Source community, assert that Open Source is defined solely by the Open Source Definition (OSD) version 1.9.

Any amendments or new definitions shall only be recognized if declared by clear community consensus through a transparent process to be determined.

How ‘Embeddings’ Encode What Words Mean (www.quantamagazine.org)

submitted 7 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

New AI model “learns” how to simulate Super Mario Bros. from video footage (arstechnica.com)

submitted 7 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o) (huggingface.co)

submitted 7 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

"Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o).

It’s the top LLM in (at least) MMLU, MATH, IFEval, GSM8K.

Beats GPT-4o on every benchmark tested.

It clobbers Llama 3.1 405B. It’s not even close.

The technique that drives Reflection 70B is simple, but very powerful.

Current LLMs have a tendency to hallucinate, and can’t recognize when they do so.

Reflection-Tuning enables LLMs to recognize their mistakes, and then correct them before committing to an answer.

Additionally, we separate planning into a separate step, improving CoT potency and keeping the outputs simple and concise for end users.

Important to note: We have checked for decontamination against all benchmarks mentioned using @lmsysorg’s LLM Decontaminator.

The weights of our 70B model are available today on @huggingface here: https://huggingface.co/mattshumer/Reflection-70B

@hyperbolic_labs API available later today.

Next week, we will release the weights of Reflection-405B, along with a short report going into more detail on our process and findings.

Most importantly, a huge shoutout to @csahil28 and @GlaiveAI.

I’ve been noodling on this idea for months, and finally decided to pull the trigger a few weeks ago. I reached out to Sahil and the data was generated within hours.

If you’re training models, check Glaive out.

This model is quite fun to use and insanely powerful.

Please check it out — with the right prompting, it’s an absolute beast for many use-cases.

Demo here: https://reflection-playground-production.up.railway.app/

405B is coming next week, and we expect it to outperform Sonnet and GPT-4o by a wide margin.

But this is just the start. I have a few more tricks up my sleeve.

I’ll continue to work with @csahil28 to release even better LLMs that make this one look like a toy.

Stay tuned."

https://x.com/mattshumer_/status/1831767014341538166

It’s Not Intelligent If It Always Halts: A Critical Perspective on Current Approaches to AGI (www.lifeiscomputation.com)