Reasoning Models: When to Pay a Model to Think
A new class of models thinks before it answers — burning extra time and tokens to reason through hard problems. Sometimes that’s transformative. Often it’s an expensive way to answer a question a fast model already nails.
Between late 2024 and early 2025, models split into two temperaments. The familiar kind answers immediately, streaming tokens the moment you hit enter. The newer kind (reasoning models, the o1 and DeepSeek-R1 lineage) pauses first, working through a hidden chain of thought before it says anything. That pause is the product: these models spend extra computation at answer time to reason, and on hard problems it shows.
The catch is that the pause isn’t free. Reasoning models are slower, and they bill you for all that invisible thinking. Used on the right problem, the trade is a bargain — a correct answer to something a fast model fumbles. Used on the wrong one, it’s paying premium rates and waiting several seconds for a question that needed neither. Knowing which is which is the whole skill.
What a reasoning model does differently
Under the hood the difference is less mysterious than it sounds. A reasoning model is trained to generate a long internal reasoning trace before its final answer, and that changes both its economics and its behavior.
- It thinks in hidden tokens. Before answering, it produces a chain of thought you usually don’t see, but you still pay for those tokens. They are typically billed as output, and reasoning models often carry a higher per-token price.
- It spends compute at answer time. This is “test-time compute”: instead of being smarter only from training, it gets better answers by thinking longer on each request.
- It’s slower by design. Time-to-first-token can stretch from milliseconds to many seconds, because the model is reasoning before it speaks.
- It shines on hard, verifiable problems. Math, logic, multi-step code, and planning — tasks where working through steps beats pattern-matching an answer.
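To make the economics concrete, here is a rough sketch of how hidden reasoning tokens feed into the bill. The `Pricing` class, the prices, and the token counts are all made-up placeholders rather than any provider's real rates, and the sketch assumes reasoning tokens are billed like output tokens, which is worth checking against your provider's pricing page.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-million-token prices in dollars (not real rates)."""
    input_per_m: float
    output_per_m: float


def request_cost(pricing, input_tokens, visible_output_tokens, reasoning_tokens=0):
    """Cost of one request, assuming hidden reasoning tokens bill as output."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * pricing.input_per_m
            + billed_output * pricing.output_per_m) / 1_000_000


# Placeholder numbers chosen only to show the shape of the trade-off.
fast = Pricing(input_per_m=0.5, output_per_m=2.0)
reasoning = Pricing(input_per_m=2.0, output_per_m=8.0)

fast_cost = request_cost(fast, input_tokens=1_000, visible_output_tokens=500)
reasoning_cost = request_cost(
    reasoning, input_tokens=1_000, visible_output_tokens=500, reasoning_tokens=4_000
)

print(f"fast: ${fast_cost:.4f}  reasoning: ${reasoning_cost:.4f}")
print(f"ratio: {reasoning_cost / fast_cost:.1f}x")
```

With these invented numbers, most of the gap comes from the 4,000 tokens the user never sees, not from the visible answer.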
When the thinking pays off
Reasoning models earn their cost on a specific class of work. If your task looks like these, the extra time and tokens buy real accuracy.
- Hard reasoning, math, and code. Problems with a right answer that takes several correct steps to reach — exactly what a chain of thought is for.
- Multi-step planning. Breaking a goal into ordered actions, like the planning brain of an agent, benefits from deliberate thinking over a snap response.
- High-stakes correctness. When a wrong answer is expensive — a financial calculation, a medical summary — paying for more careful reasoning is cheap insurance.
- The hard parts of an agent. Use a reasoning model for the agent’s planning and a fast model for its many simple steps; don’t make one model do both.
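The planner-and-worker split in the last bullet might look something like the sketch below. The `Model` type, `run_agent`, and both stub models are hypothetical names invented for illustration; in a real system each would wrap whatever client you use.

```python
from typing import Callable, List

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out


def run_agent(goal: str, planner: Model, worker: Model) -> List[str]:
    """Plan once with the reasoning model, then run each step on the fast one."""
    plan = planner(f"Break this goal into short numbered steps:\n{goal}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [worker(f"Goal: {goal}\nDo this step: {step}") for step in steps]


# Stub models so the sketch runs without any API; swap in real clients.
def stub_planner(prompt: str) -> str:
    return "1. Fetch the invoice\n2. Sum the line items\n3. Draft the summary"


def stub_worker(prompt: str) -> str:
    step = prompt.splitlines()[-1].split(": ", 1)[1]
    return f"done: {step}"


results = run_agent("Summarize the latest invoice", stub_planner, stub_worker)
print(results)
```

The reasoning model is called once per goal, while the fast model is called once per step, so the expensive call count stays flat as plans get longer.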
When it’s overkill
For most requests, a reasoning model is the wrong default — slower and pricier with no benefit. Fast models already nail the bulk of real traffic, and that’s where they belong. The pattern most production systems settle on is to route by difficulty:
    # Route to a reasoning model only when the task is actually hard.
    def answer(task):
        if is_hard(task):                   # multi-step logic, math, planning
            return reasoning_model(task)    # slower, pricier - thinks first
        return fast_model(task)             # most requests land here, instantly

    def is_hard(task):
        # cheap heuristics beat paying the reasoning tax on everything
        return task.needs_planning or task.is_math or task.step_count > 3

That router is the pattern most production systems converge on: a cheap, fast model handles the common case, and only the genuinely hard requests get escalated to the reasoning model. You pay for thinking exactly when thinking is worth it, and nowhere else.
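For a version you can actually run, the sketch below feeds a few sample tasks through the same heuristic. The `Task` dataclass and the `route` helper are assumptions added for illustration; only the three fields (`needs_planning`, `is_math`, `step_count`) come from the router above, and the sample prompts are invented.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record; fields mirror the router's heuristics."""
    prompt: str
    needs_planning: bool = False
    is_math: bool = False
    step_count: int = 1


def is_hard(task: Task) -> bool:
    return task.needs_planning or task.is_math or task.step_count > 3


def route(task: Task) -> str:
    """Return which tier should handle the task."""
    return "reasoning" if is_hard(task) else "fast"


tasks = [
    Task("What are your opening hours?"),
    Task("Prove the sum of two odd numbers is even", is_math=True),
    Task("Migrate the billing service in stages", needs_planning=True, step_count=6),
    Task("Translate 'hello' to French"),
]

routes = [route(t) for t in tasks]
escalated_share = routes.count("reasoning") / len(routes)
print(routes)
print(f"share escalated: {escalated_share:.0%}")
```

Tracking the escalated share over real traffic is one simple way to see whether the heuristic is too eager or too stingy.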
A reasoning model is the specialist you call in for the hard cases, not the receptionist who answers every call. Pay for thinking when the problem rewards it — and let a fast model handle the rest.
Where teams get it wrong
- Using it for everything. Routing simple lookups and chit-chat through a reasoning model is the most common waste — slow, costly, and no better.
- Ignoring latency. The thinking pause is fine for a batch job and terrible for a live chat. Match the model to the interaction, not just the task.
- Not capping the thinking. Left unbounded, a reasoning model can burn a large token budget on one hard prompt. Set limits on reasoning effort where the API allows.
- Exposing the chain of thought. The hidden reasoning can be verbose, weird, or revealing. Show the user the answer, not the model’s scratchpad.
- No fast fallback. When the reasoning model is slow or unavailable, a system with no cheaper path just stalls. Keep a fast model in the loop.
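Two of these mistakes, ignoring latency and having no fast fallback, can be addressed with one small wrapper. The sketch below is a minimal illustration using only the standard library; `answer_with_fallback`, the stub models, and the timeout values are hypothetical, and real clients usually need their own request timeouts as well.

```python
import concurrent.futures
import time
from typing import Callable

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out

# Shared pool so a timed-out call does not block the caller while it finishes.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def answer_with_fallback(prompt: str, reasoning_model: Model,
                         fast_model: Model, timeout_s: float) -> str:
    """Try the reasoning model; use the fast one on timeout or outage."""
    future = _POOL.submit(reasoning_model, prompt)
    try:
        return future.result(timeout=timeout_s)
    except (concurrent.futures.TimeoutError, ConnectionError):
        # cancel() cannot stop a call that is already running, so a real
        # client should also enforce its own request timeout.
        future.cancel()
        return fast_model(prompt)


def slow_reasoner(prompt: str) -> str:
    time.sleep(0.5)  # stands in for a long thinking pause
    return f"reasoned: {prompt}"


def down_reasoner(prompt: str) -> str:
    raise ConnectionError("provider unavailable")


def fast(prompt: str) -> str:
    return f"fast: {prompt}"


live = answer_with_fallback("hi", slow_reasoner, fast, timeout_s=0.1)   # tight budget
batch = answer_with_fallback("hi", slow_reasoner, fast, timeout_s=5.0)  # loose budget
outage = answer_with_fallback("hi", down_reasoner, fast, timeout_s=5.0)
print(live, batch, outage, sep="\n")
```

The same task gets a different model depending on how long the caller can afford to wait, which is the point of matching the model to the interaction. Caps on reasoning effort are provider-specific settings and are deliberately left out of this sketch.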
Match the model to the problem
The arrival of reasoning models didn’t make fast models obsolete; it gave you a second gear. The teams using them well aren’t the ones who switched everything over — they’re the ones who learned to tell a hard problem from an easy one and route accordingly, paying the reasoning premium only where it changes the answer.
It’s the same lesson that keeps recurring in AI engineering: there’s no single best model, only the right model for the job in front of you. A reasoning model is a powerful, expensive tool. Reach for it when the problem genuinely needs to be thought through — and trust a fast model with everything else, which is most things.