mk · 2 days ago · post: The race for "AI Supremacy" is over — at least for now.

    What they did was to continue developing open source LLMs, which were trained at great expense by others, right?

It doesn't seem like it. These chain-of-thought models kinda broke the mold.

https://youtubetranscriptoptimizer.com/blog/05_the_short_case_for_nvda

mk · 2 days ago

    The team at DeepSeek wanted to test whether it's possible to train a powerful reasoning model using pure reinforcement learning (RL). This form of "pure" RL works without labeled data.

    Skipping labeled data? Seems like a bold move for RL in the world of LLMs.

    I've learned that pure RL is slower upfront (trial and error takes time), but it eliminates the costly, time-intensive labeling bottleneck. In the long run, it'll be faster, more scalable, and far more efficient for building reasoning models, mostly because the models learn on their own.

https://www.vellum.ai/blog/the-training-of-deepseek-r1-and-ways-to-use-it
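
To make the "no labeled data" point concrete, here is a rough sketch of the kind of loop the post describes: a rule-based reward (did the final answer check out?) plus group-relative advantages in the GRPO style that R1 reportedly uses. The function names and the toy prompt are mine, not DeepSeek's code.

    # Sketch of the "pure RL" recipe: no labeled reasoning traces, only a
    # verifiable reward and group-relative advantages. Hypothetical names.
    import numpy as np

    def rule_based_reward(answer: str, ground_truth: str) -> float:
        # A checkable final answer stands in for a human label.
        return 1.0 if answer.strip() == ground_truth.strip() else 0.0

    def group_relative_advantages(rewards):
        # Normalize each sample's reward against its own group of samples,
        # so no separate critic/value model is needed.
        r = np.array(rewards, dtype=float)
        return (r - r.mean()) / (r.std() + 1e-8)

    # Toy example: four sampled completions for one math prompt, ground truth "42".
    samples = ["41", "42", "forty-two", "42"]
    rewards = [rule_based_reward(s, "42") for s in samples]
    print(rewards)                              # [0.0, 1.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))   # roughly [-1, 1, -1, 1]

In the actual training loop those advantages would weight a policy-gradient update on the model that generated the samples; trial and error replaces labeled chain-of-thought data.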