Many new models were released this week, with Claude 3.7 Sonnet and Grok 3 the highlights. We discuss an agentic way of extracting useful information from PDFs, and how AI has had a measurable effect on online written text.
This week's newsletter dives into the challenge of evaluating LLM output: how can we trust AI to judge AI? I will be speaking about this topic at the DevWorld conference in Amsterdam this week!
Today we discuss how well language models really perform when given large amounts of context, results from a Microsoft survey showing that over-reliance on genAI impacts our critical thinking, and some recent safety concerns regarding DeepSeek's latest reasoning model, R1.
This week we present a nice visual on how DeepSeek's R1 was trained, discuss recent legal battles against AI in Europe, and show a new way of doing data science with a reactive notebook called Marimo!
This week we have some exciting news from OpenAI, which published a new agent; we examine a trend of expertly crafted clickbait research; and we review an augmentation technique that has received a lot of attention recently.
Whether or not AI survives the hype, I feel fortunate to live in this interesting time in human history. We are already able to do things that would have seemed unimaginable to our grandparents. It makes me really appreciate how fast the industry has progressed.