DeepSeek R1 viral success and more!

This week we present a nice visual on how DeepSeek’s R1 was trained, discuss recent legal battles against AI in Europe, and show a new way of doing data science using a new reactive notebook called Marimo!

A visual guide to DeepSeek’s R1 reasoning model

The new big player in the AI field is the Chinese company DeepSeek. They took everyone by surprise when, seemingly out of nowhere, they released a new large language model that was much cheaper and faster to train than competing models, performs on par with or even slightly better than OpenAI’s latest models, and is open source as well.

To understand how R1 became the new #1, we created a few visuals on how the model was trained:

The Training Journey

  1. Start with DeepSeek V3 as the base

  2. First round of supervised fine-tuning (SFT) of V3 on a synthetic chain-of-thought dataset

  3. Use reasoning-oriented reinforcement learning (RL) with a rule-based reward model ⇒ V3 RL checkpoint

  4. Second round of SFT of V3 on synthetic reasoning and non-reasoning datasets

  5. Finally, a second round of comprehensive RL optimization for reasoning, helpfulness, and harmlessness, with rule-based and automated preference-based rewards
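
To make step 3 a bit more concrete, here is a minimal Python sketch of what a rule-based reward could look like. It rests on our own assumptions rather than anything stated above: that the rules check whether the response wraps its reasoning in <think> tags, and whether the final answer matches a reference. The function name, the tag format, and the reward values are all hypothetical.

```python
import re

# Hypothetical format rule: a <think>...</think> block followed by a final answer.
# The tag name and the reward values below are illustrative assumptions,
# not DeepSeek's actual reward function.
THINK_PATTERN = re.compile(r"^<think>(.+?)</think>\s*(.+)$", re.DOTALL)


def rule_based_reward(response: str, reference_answer: str) -> float:
    """Score a response with fixed rules instead of a learned reward model."""
    match = THINK_PATTERN.match(response.strip())
    if match is None:
        # Format rule failed: no reasoning block followed by an answer.
        return 0.0

    reward = 0.5  # format rule passed
    final_answer = match.group(2).strip()
    if final_answer == reference_answer.strip():
        reward += 1.0  # accuracy rule passed
    return reward


good = rule_based_reward("<think>2 + 2 is 4</think> 4", "4")
wrong = rule_based_reward("<think>maybe 5?</think> 5", "4")
unformatted = rule_based_reward("4", "4")
```

Because the score comes from simple checks like these, no human has to label responses during RL, which fits the point that this training scheme needs little human input.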

The synthetic datasets used in the training process

  • Initial SFT Round: Generated a high-quality reasoning dataset using V3 with chain-of-thought prompting, then refined it through post-processing by human annotators.

  • Second SFT Round: Created 800k examples (600k reasoning + 200k non-reasoning) using the RL checkpoint.

  • Quality Control: LLM-as-a-judge and rejection sampling (generate multiple responses and filter high-quality responses) to ensure only the best examples make it through
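
The quality-control step can be sketched in a few lines of Python. This is only an illustration of rejection sampling with a judge, under our own assumptions: the generate and judge callables stand in for real model calls, and the number of candidates and the score threshold are made-up values.

```python
from typing import Callable, List


def rejection_sample(
    prompt: str,
    generate: Callable[[str], str],
    judge: Callable[[str, str], float],
    num_candidates: int = 8,
    threshold: float = 0.8,
) -> List[str]:
    """Generate several candidates and keep those the judge scores >= threshold."""
    candidates = [generate(prompt) for _ in range(num_candidates)]
    return [c for c in candidates if judge(prompt, c) >= threshold]


# Toy stand-ins so the sketch runs without any model (both are hypothetical).
def make_toy_generator(answers: List[str]) -> Callable[[str], str]:
    """Return a 'model' that replays a fixed list of answers, one per call."""
    remaining = iter(answers)
    return lambda prompt: next(remaining)


def toy_judge(prompt: str, response: str) -> float:
    """Pretend LLM-as-a-judge: full score if the response ends with '4'."""
    return 1.0 if response.endswith("4") else 0.0


kept = rejection_sample(
    "What is 2 + 2?",
    make_toy_generator(["It is 4", "It is 5", "The answer is 4"]),
    toy_judge,
    num_candidates=3,
)
```

In a real pipeline both callables would be LLM calls, and only the kept responses would end up in the SFT dataset.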

Compared to other reasoning models, this training scheme by DeepSeek seems to require less human input, making it cheaper and faster to train.

AI lawsuits reaching Europe

With the vast amount of data that AI is trained on, it is practically impossible to filter out all copyrighted material. Sure enough, lawsuits have been filed against the larger AI players. Originality.ai recently listed 18 ongoing lawsuits against OpenAI in the US. Most of them relate to copyright infringement, with claims that:

  • intellectual property rights are violated
  • copyrighted material is used without permission and without compensation

In the last few weeks we saw this trend spill over to Europe as well. In the Netherlands, two Dutch large language models were taken offline under pressure from BREIN, the Dutch joint content protection program. Meanwhile, the German performance rights organization GEMA filed a lawsuit against Suno Inc., a provider of AI-generated music.

Another heavily discussed concern is user data privacy: Italy recently banned DeepSeek from offering its services in the country.

Now that legal and ethical challenges are catching up fast and Europe has entered the legal battleground, the coming months might bring surprising developments in how we can use AI in our daily lives.

Marimo notebook

Tired of running old notebooks where you can’t reproduce your results, or even worse, the code is not runnable because some definitions were either redefined or deleted? What about when you want to share some quick results with your stakeholders and you have to endlessly scroll through your notebook? Well, there’s good news! Marimo is a new type of notebook that solves these (and more) problems.

  • It is a reactive notebook. A change in one cell is immediately reflected everywhere to ensure consistency (no more notebooks that stop running after you restart the kernel).
  • It is easy to version control (internally it’s all saved as a plain Python file).
  • The environment is saved in the notebook, so you can share it with anyone and they can run it locally without worrying about missing requirements (powered by uv).
  • It is great for simple apps and demos! It comes with many cool widgets and you can easily deploy it online.
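
The reactive idea from the first bullet can be illustrated with a tiny dependency tracker in plain Python. This is a toy sketch of the concept, not Marimo's implementation or API: the ReactiveNotebook class and its methods are hypothetical, and cells here declare their inputs and outputs by hand, whereas a real reactive notebook works these out for you.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


class ReactiveNotebook:
    """Toy model of a reactive notebook: each cell reads variables and defines one."""

    def __init__(self) -> None:
        # Each cell is (defined variable, variables it reads, function to run).
        self.cells: List[Tuple[str, Tuple[str, ...], Callable]] = []
        self.values: Dict[str, object] = {}

    def cell(self, defines: str, reads: Sequence[str], fn: Callable) -> None:
        """Register a cell. Cells are assumed to be added in dependency order."""
        self.cells.append((defines, tuple(reads), fn))

    def set(self, name: str, value: object) -> None:
        """Change a variable and re-run every cell that depends on it."""
        self.values[name] = value
        self._rerun_dependents({name})

    def _rerun_dependents(self, changed: Set[str]) -> None:
        for defines, reads, fn in self.cells:
            depends_on_change = bool(changed & set(reads))
            inputs_ready = all(r in self.values for r in reads)
            if depends_on_change and inputs_ready:
                self.values[defines] = fn(*(self.values[r] for r in reads))
                # The output changed too, so cells further down may need a re-run.
                changed.add(defines)


nb = ReactiveNotebook()
nb.cell("total", ["price", "quantity"], lambda price, quantity: price * quantity)
nb.cell("label", ["total"], lambda total: f"Total: {total}")

nb.set("price", 10)
nb.set("quantity", 3)
first_label = nb.values["label"]

nb.set("price", 20)  # "total" and "label" are recomputed automatically
second_label = nb.values["label"]
```

Changing price a second time updates both total and label without re-running anything by hand, which is the kind of consistency a reactive notebook gives you.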

Plenty of examples can be found here (all of them runnable in your web browser!).

AI Hype?

In our last newsletter we mentioned that more and more scientific articles seem to use clickbait-like titles. This time we have some cool visuals to illustrate this!

What we were reading this week

  • A new startup is trying out some of Roger Penrose’s ideas on consciousness by combining quantum computing and AI. Hype or not? (video)
  • An AI decided to rewrite itself to overcome the restrictions imposed by its creators. (article)
  • Is AI too unpredictable to align with human goals and intentions? (article)
