In This Issue:
👉 Introduction to AI Reasoners - Big Thinkers
👉 AI Reasoners and Creativity
👉 Student reproduces DeepSeek's R1 "Aha" Moment for $30
Reasoners - The AI Revolution No One Saw Coming
For most of AI’s recent history, chatbots have operated in a fairly predictable way: you ask a question, and they generate a response, token by token, in real time.
That means the AI is effectively thinking while talking, which, as any human who’s ever blurted out something dumb can tell you, is not the best approach to complex reasoning.
So researchers found a workaround. Instead of having the AI respond instantly, they could tell it to "think step by step" before answering.
This method, called chain-of-thought prompting, significantly improved performance. But it was still a hack, something imposed externally rather than a core capability of the model.
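To see how little machinery is involved, here is a minimal sketch of chain-of-thought prompting using the OpenAI Python client. The model name and the exact wording of the instruction are illustrative assumptions, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # an ordinary, non-reasoning chat model (illustrative choice)
    messages=[{
        "role": "user",
        "content": (
            "A train covers 120 km in 1.5 hours and then 80 km in 1 hour. "
            "What is its average speed for the whole trip? "
            "Think step by step before giving the final answer."
        ),
    }],
)
print(response.choices[0].message.content)
```

The entire trick is that last sentence of the prompt; remove it and the model is far more likely to jump straight to a number, right or wrong.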
Then came Reasoners.
The Birth of Thinking AI
Reasoners changed the game by automating the reasoning process.
Instead of diving straight into an answer, a Reasoner spends time producing “thinking tokens”—intermediate steps of reasoning—before committing to a final response.
This was a breakthrough for two key reasons:
- Better Thought Process: Instead of forcing a chatbot to simulate deep reasoning on the fly, Reasoners learn reasoning patterns from expert problem-solvers. This means they can generate higher-quality chains of thought than we ever could just by manually prompting them to “think step by step.”
- Longer Thinking = Smarter AI: The more a Reasoner thinks, the better its answer becomes. This is a radical shift from traditional AI development, where improving model performance meant making bigger and bigger models—a process that was both expensive and data-hungry. With Reasoners, you can improve AI performance not by scaling the model, but by letting it think longer before answering.
This shift has massive implications for the future of AI.
Big Thinker
Before Reasoners, AI advancement relied on one brute-force method: build bigger models with more parameters. If an AI wasn’t performing well enough, the answer was simple: throw more GPUs at it and train it on more data.
But that approach is hitting limits. Training massive models requires enormous amounts of energy, compute, and specialized hardware, making it an arms race between only the wealthiest AI labs. Reasoners introduce a different path: instead of making AI bigger, make it think better.
This change introduces inference-time compute as the new bottleneck. Instead of spending months training a behemoth model upfront, you let the AI spend more thinking time at the moment it needs to generate an answer. That means AI’s intelligence is no longer limited by the raw size of the model, but by how much computation we allow it to use per question.
And because inference-time compute is dynamic, it gives users control over AI quality. Want a quick answer? Get a fast but shallow response. Need deep reasoning? Let the model think longer. This is how humans operate—we don’t always use the same level of effort for every problem we solve.
This means AI is no longer a static intelligence. Instead, we can dial up or down its reasoning in real time.
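Some providers already expose that dial directly. A rough sketch, assuming OpenAI's o-series models and their reasoning_effort parameter (other providers use different knobs and names, so check the current docs):

```python
from openai import OpenAI

client = OpenAI()

question = "Prove that the product of any three consecutive integers is divisible by 6."

# Quick, shallow pass: cap the thinking budget.
fast = client.chat.completions.create(
    model="o3-mini",              # a reasoning model (illustrative choice)
    reasoning_effort="low",       # provider-specific dial for inference-time compute
    messages=[{"role": "user", "content": question}],
)

# Deep pass on the same question: let the model think longer before answering.
deep = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",
    messages=[{"role": "user", "content": question}],
)

print(fast.choices[0].message.content)
print(deep.choices[0].message.content)
```

Same model, same question; the only difference is how much computation it is allowed to spend before answering.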
Proof That It Works - Beating the Google-Proof Test
One of the biggest tests of AI reasoning ability is the Graduate-Level Google-Proof Q&A (GPQA) benchmark. Unlike most knowledge tests, it is specifically designed to be "Google-proof": even PhDs with full access to the internet struggle with questions outside their own specialty.
Here’s how humans score on it:
- 34% accuracy outside their area of expertise
- 81% accuracy inside their area of expertise
For a long time, AI models performed abysmally on this test. But with Reasoners, they have been improving rapidly: within months, OpenAI’s o1 models were outpaced by o3, and China’s DeepSeek R1 has introduced a more cost-efficient way of achieving strong reasoning performance.
This isn’t just a theoretical breakthrough—it’s an applied one.
The Next AI Arms Race - Who Wins?
Because Reasoners are so new, the landscape is shifting rapidly. Every major AI lab is racing to build better models:
- OpenAI has already transitioned its latest models to be more reasoning-focused, with early versions of Reasoners appearing in their o1 series and significantly improving in o3.
- DeepSeek has been particularly aggressive, pioneering new ways to reduce compute costs while maintaining reasoning ability.
- Google has entered the game with its first Reasoner models, suggesting they recognize this as the future.
And this is only the beginning. If you thought AI was advancing quickly before, Reasoners are going to accelerate progress even faster.
What This Means for the Future
- AI Will Be More Accessible
With Reasoners, smaller models can achieve high performance simply by reasoning more. This means you won’t need a trillion-dollar budget to build powerful AI. A well-trained, efficiently run Reasoner could outperform larger models simply by thinking longer.
- AI Performance Will Be Adjustable
Just like humans decide how much effort to put into a problem, future AI systems will let users choose the level of reasoning. Quick summary or deep analysis? The choice will be yours.
- AI Will Move From Prediction to Understanding
Traditional AI models are great at predicting the next word in a sentence but bad at understanding complex, multi-step reasoning. Reasoners shift AI from guessing well to thinking well.
- The Real AI Race is Just Beginning
If Reasoners can outperform massive models with just more thinking time, the industry’s entire incentive structure will change. Instead of centralizing all AI power in a handful of big companies, we could see more diverse, specialized AI systems. Startups could compete without needing billions of dollars in compute.
A More Human AI?
There’s something poetic about AI evolving in this direction. Instead of merely predicting words, AI is now learning to pause, reflect, and reason—things that make human intelligence so powerful.
For the first time, AI isn't just faster than us; it's starting to think like us. And that changes everything.
The Limits of Reasoners - Why Creativity is Still a Challenge
While Reasoners are great at tasks with clear right or wrong answers, their performance drops when the rules aren’t so rigid. In my own testing, I found that Reasoners—because they are trained using reinforcement learning—often struggle with creative tasks like storytelling, poetry, and even certain types of coding projects.
The problem lies in how they learn. Reinforcement learning is all about optimizing for well-defined success—rewarding the model when it gets the right answer and penalizing it when it doesn’t. This works brilliantly for structured tasks like math, logic, and factual reasoning, where there is an objective measure of correctness. But creativity doesn’t work that way.
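To see the asymmetry concretely, here is what a reinforcement-learning reward often boils down to. This is a hypothetical sketch, not code from any particular training pipeline:

```python
def reward(model_answer: str, ground_truth: str) -> float:
    # Binary reward: trivially defined whenever a single verifiable answer exists.
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

print(reward("x = 7", "x = 7"))  # math, logic, factual lookups: easy to score -> 1.0
# For "write me a villanelle about winter" there is no ground_truth string to pass in,
# so this kind of reward simply cannot be written down -- which is exactly the gap described above.
```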
Why Creativity Confuses Reasoners
When writing an essay, composing music, or coding something open-ended (like designing a new game mechanic rather than fixing a syntax error), there isn’t a single "correct" solution—just many possible good ones. And this is where Reasoners often fail.
Instead of generating imaginative or original content, they tend to become overly rigid or even boring, favouring responses that follow known patterns rather than taking risks. It’s as if, when faced with an open-ended creative challenge, they ask: “What’s the most likely correct answer?” rather than “What’s the most interesting or novel answer?”
I’ve seen this firsthand when testing Reasoners for creative writing tasks. Where traditional large language models like GPT-4 can come up with wild, unexpected plot twists or poetic turns of phrase, Reasoners often default to bland, predictable responses. Even when prompted with "be more creative," they seem hesitant to stray too far from safe, conventional ideas.
Coding - Not Always a Win for Reasoners
At first glance, coding seems like the perfect use case for Reasoners—after all, code has strict right and wrong answers, right? Not always.
For simple debugging or well-structured programming problems, Reasoners shine. They break problems down step by step, self-verify their work, and iterate toward correct solutions. But in projects that require architectural decisions, novel problem-solving, or unconventional approaches, Reasoners struggle. Instead of thinking like a creative engineer, they behave like an overly cautious assistant—repeating standard best practices rather than exploring outside-the-box solutions.
Can Reasoners Ever Be Creative?
The challenge of making AI both structured and creative is a fundamental one. Some researchers argue that a hybrid approach—combining reinforcement learning with techniques like self-supervised learning on creative datasets—might help Reasoners become more versatile.
But for now, if you need AI to solve math proofs, optimize code, or analyze data, Reasoners are a game-changer. If you need a compelling sci-fi short story or an original musical composition, you might still be better off with traditional LLMs that lean into randomness and pattern disruption rather than rigid correctness.
In the future, perhaps AI will be able to balance both structure and creativity, but for now, Reasoners remind us of an important truth: thinking is not the same as imagining.
Small Models, Big Brains - Why the Future of AI Might Cost Just $30
Imagine if you could recreate a major AI breakthrough for the price of a dinner. That’s exactly what happened when Jiayi Pan, a PhD student at UC Berkeley, took a $30 budget and a little bit of reinforcement learning magic to reproduce the core capability of DeepSeek R1—a model that was being hailed as a marvel of open-source AI.
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works
Through RL, the 3B base LM develops self-verification and search abilities all on its own
You can experience the Ahah moment yourself for < $30
Code: https://t.co/B2IsN1PrXV
Here's what we learned 🧵
— Jiayi Pan (@jiayi_pirate) January 24, 2025
This wasn't just a fun experiment; it’s a signal that something bigger is happening. We may be looking at the dawn of an AI era where smaller, cheaper, and more specialized models outperform their bloated, billion-dollar counterparts. And that should make us rethink everything we assume about how AI evolves.
The Myth of Bigger is Better
The AI industry, much like Silicon Valley itself, has always been obsessed with scale. More data, bigger models, more compute. The assumption has been that intelligence comes from sheer size—stacking trillions of parameters together like a child stacking Lego bricks, hoping that at some point the whole thing becomes self-aware.
And to be fair, this has worked—for now.
Models like GPT-4 and Claude operate at a scale so large that training them consumes staggering amounts of energy and compute. The race for bigger, more powerful models has become a corporate arms race, with OpenAI, Google, and Anthropic throwing hundreds of millions of dollars at GPUs in the hopes of reaching the next threshold of intelligence.
But Jiayi Pan’s experiment shows a crack in this philosophy. What if, instead of making models bigger, we made them think better?
This is where the "aha moment" comes in.
The "Aha Moment" and Why It Matters
DeepSeek R1 has a particularly fascinating feature: it can spend more time thinking when a problem is hard. Instead of blindly rushing to an answer, it pauses, reevaluates, and iterates—just like a human would when solving a tough math problem.
In the AI world, this is called an "emergent capability," which is a fancy way of saying the model surprised us by doing something we didn’t explicitly teach it to do. That’s a big deal because it suggests that reasoning isn’t something we need to hard-code into models. Instead, it might just emerge naturally when the right incentives are in place.
Jiayi Pan's key insight was that you don’t need a gigantic model to make this happen. All you need is reinforcement learning applied in the right way.
With a carefully designed training loop and a well-defined reward function, even a 3-billion-parameter model (a fraction of the size of modern LLMs) was able to develop this emergent reasoning ability. And it did so on a budget that wouldn’t even cover a single hour of compute for GPT-4.
Reinforcement Learning - The Secret Weapon of AI
Reinforcement learning (RL) is not new. It’s the same approach that allowed DeepMind’s AlphaGo Zero to reach superhuman Go play without ever studying human gameplay. Instead of mimicking humans, it played against itself, tweaking its strategy based on whether it won or lost.
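To make "tweaking its strategy based on whether it won or lost" concrete, here is a toy REINFORCE-style loop on a made-up three-answer puzzle. It is nothing like the real pipeline, which fine-tunes an actual language model with PPO-style updates over full reasoning traces; it only shows how a pass/fail reward reshapes a policy:

```python
import math
import random

# Three hypothetical candidate answers to one puzzle; only "56" earns a reward.
candidates = ["42", "56", "81"]
target = "56"
logits = [0.0, 0.0, 0.0]  # the "policy": starts with no preference


def policy_probs(logits):
    # Softmax over the candidate answers.
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


learning_rate = 0.5
for step in range(300):
    probs = policy_probs(logits)
    i = random.choices(range(len(candidates)), weights=probs)[0]  # sample an answer
    reward = 1.0 if candidates[i] == target else 0.0              # score it
    advantage = reward - 1.0 / len(candidates)                    # crude baseline
    # REINFORCE: push the log-probability of the sampled answer up or down
    # in proportion to its advantage (softmax gradient: 1{j == i} - probs[j]).
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += learning_rate * advantage * grad

print({c: round(p, 2) for c, p in zip(candidates, policy_probs(logits))})
# After training, nearly all of the probability mass sits on "56".
```

Swap the three canned answers for a language model generating full reasoning traces, and the equality check for a verifier like the Countdown one sketched further down, and you have the rough shape of the experiment.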
Jiayi Pan’s experiment used RL in a similar way. His model was trained on The Countdown Game, a puzzle where players combine numbers using arithmetic to reach a target value.
Why was this game the perfect test?
- It has clear right and wrong answers. AI thrives in structured environments where success is easily measurable.
- It forces iterative thinking. The best solutions aren’t always obvious on the first try.
- It mimics real-world problem-solving. The model has to check and refine its work instead of just blurting out the first answer it comes up with.
The result? The model learned to self-verify its own outputs—checking whether an answer made sense before presenting it.
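For a feel of how that checking can be automated on the training side, here is a sketch of a rule-based reward for the Countdown setting. The expected answer format and the parsing are assumptions on my part; the project's actual reward code may differ:

```python
import re

def countdown_reward(completion: str, numbers: list[int], target: int) -> float:
    """Rule-based reward: 1.0 if the response ends with an arithmetic expression that
    uses only the allowed numbers (each at most once) and evaluates to the target."""
    # Grab the last run of characters that looks like an arithmetic expression.
    chunks = [c.strip() for c in re.findall(r"[\d+\-*/() ]+", completion)]
    chunks = [c for c in chunks if any(ch.isdigit() for ch in c)]
    if not chunks:
        return 0.0
    expr = chunks[-1]

    # Every number used must come from the allowed pool, with no reuse.
    pool = list(numbers)
    for n in (int(x) for x in re.findall(r"\d+", expr)):
        if n in pool:
            pool.remove(n)
        else:
            return 0.0

    try:
        value = eval(expr, {"__builtins__": {}}, {})  # regex above limits input to digits and operators
    except Exception:
        return 0.0
    return 1.0 if value == target else 0.0


# Example: numbers 2, 3 and 10 with target 16 -> "2 * 3 + 10" earns the reward.
print(countdown_reward("Let me verify: 2 * 3 = 6, and 6 + 10 = 16. Answer: 2 * 3 + 10", [2, 3, 10], 16))
```

Everything the trained model "discovers" (re-checking its arithmetic, backtracking when the numbers don't add up) is driven by nothing more elaborate than a pass/fail signal like this one.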
That’s huge. It means AI can be trained to think more carefully, rather than just thinking faster. And it means we might not need trillion-dollar models to get better results.
The Future - Millions of Tiny, Specialized Models?
If a $30 experiment can create an AI model that reasons more effectively, what does that tell us about the future?
One possibility is that we stop worshiping at the altar of scale and start focusing on specialization. Instead of one monolithic model that tries to do everything, the future might look like millions of tiny AI models, each trained to handle a specific type of task with extreme efficiency.
Think about it:
- Why train a massive model to do everything when you can have a tiny model that’s perfect at one thing?
- What if your AI assistant wasn’t just one giant model but a network of smaller models, each tailored to your needs?
- What if AI became so cheap that everyone could have a personal model, running locally on their phone, adapting to their unique way of thinking?
The shift would be profound. AI would become more personal, more affordable, and more energy-efficient. Instead of relying on centralized cloud services (which are expensive and slow), AI could run on local hardware, dynamically fine-tuning itself to the user’s preferences and needs.
The Open-Source Revolution - AI for Everyone
One of the biggest reasons this experiment was possible in the first place? Open-source AI.
DeepSeek R1 is available for anyone to study, tweak, and build upon. And as soon as its capabilities were published, independent researchers like Jiayi Pan started experimenting with it.
Contrast that with companies like OpenAI and Google, which treat their models like trade secrets, tightly controlling access and limiting research. While corporate AI labs are racing toward monopoly-like control over AI, open-source developers are quietly making breakthroughs that may be just as significant—if not more.
This moment feels a lot like the early days of personal computing. Back then, companies like IBM and DEC dominated with big, expensive machines that only institutions could afford. Then Apple and Microsoft came along and showed that powerful computing could be personal, decentralized, and accessible to everyone.
AI is approaching a similar inflexion point. The closed-source, mega-scale approach of corporate AI labs may not be the only path forward. Open-source AI, driven by independent researchers and small teams, is proving that small, cheap, and creative can still win.
So What Comes Next?
Jiayi Pan’s experiment is just the beginning. We are likely to see:
- More breakthroughs in small-model efficiency. What happens when we apply RL techniques to other tasks? Could we make small models even smarter?
- A surge in AI accessibility. If a $30 AI can reason like this, what happens when we push it further? AI may soon be in the hands of every developer, every researcher, every tinkerer.
- The decline of the "bigger is better" mentality. If smaller, cheaper models can think better, the industry will have to rethink its addiction to scale.
This could be the beginning of a new AI era—one where brains beat brawn, and where intelligence isn’t just about size, but about how well a model can think.
And if that’s true, the future of AI might be a lot smaller, and a lot smarter, than we ever imagined.