- simple.ai by @dharmesh
How to Make Cheaper AI Models Work Smarter
Designing token-conscious AI workflows
When I talk with people about how they’re using AI, a question they often ask me is: which model should I be using?
The truth is that, for most people, it comes down to personal taste. While I often rely on Anthropic’s Claude Code (with Opus), it wouldn’t be a stretch for me to switch over to OpenAI’s Codex for a weekend of experimentation.
In fact, I ran exactly that kind of experiment a couple of days ago, when I plugged the newest and most powerful Claude model (Fable 5) into my dad joke generator. You can try it here.
Whether you prefer ChatGPT or Claude, you still have a strategic decision to make: which specific model you’ll entrust to answer your questions, help you come up with ideas, and carry out your work.
Today, most people use the same cutting-edge model to refactor their code and to answer questions about pasta recipes. That’s a wonderful luxury to indulge in!
But as we enter a new tier of token-hungry AI models, such as Claude’s Fable 5, we may need to be more deliberate about selecting the right model for the job.
If things play out that way, your future self may thank you for learning how to design token-conscious AI workflows today.
Today, I want to break down:
- How to get maximum leverage out of top models (like Fable 5)
- How the strongest models can make cheaper models work smarter
- A short practical test to find your new daily driver model

Picking The Right Model
Which AI model is right for your specific workflows?
Today, it’s easy to default to the very best models. If you use AI sparingly and don’t mind waiting a little longer for responses, that’s the path to take. There’s little reason to choose a weaker model if you’re at no risk of hitting usage limits.
But this could change.
Your AI usage could increase as you learn how to apply it to more and more tasks. That’s one reason. Another could be that the best models from here on out use more and more tokens, forcing most AI users to become more token-efficient when designing AI workflows.
If your social feeds are anything like mine, you’re seeing some AI users complain about how quickly they hit their usage limits after experimenting with Anthropic’s Claude Fable 5. The stronger the model, the fewer messages you can exchange with it before you hit your usage limit.
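To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes, as a simplification, that a usage limit behaves like a fixed token budget per window and that a stronger model simply consumes more tokens per exchange. Real plans may meter usage differently, and every figure below is made up for illustration rather than taken from any provider.

```python
# Back-of-the-envelope sketch. All numbers are hypothetical, not published
# limits or prices for any real model or plan.

def messages_before_limit(budget_tokens: int, tokens_per_message: int) -> int:
    """How many full message exchanges fit inside a fixed usage budget."""
    return budget_tokens // tokens_per_message

BUDGET = 1_000_000  # hypothetical tokens allowed per usage window

# Hypothetical average tokens consumed per message exchange.
everyday_model = 2_000
token_hungry_model = 8_000

print(messages_before_limit(BUDGET, everyday_model))      # 500
print(messages_before_limit(BUDGET, token_hungry_model))  # 125
```

Under these made-up figures, a model that uses four times as many tokens per exchange leaves you with a quarter as many exchanges, which is why heavy users feel the squeeze first.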
Back in 2023, when the models were much weaker and context windows were much smaller, I wrote about the idea of cognitive composability. A modified version of that idea that still applies today goes something like this: the strongest model plans the work up front and reviews it once it’s done, while a weaker, “everyday” model handles the execution.
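For readers who think in code, the plan, execute, review split can be sketched roughly as follows. This is a minimal illustration that assumes each model is just a function from prompt text to response text; `strong_model` and `everyday_model` are hypothetical stubs standing in for whatever real model calls you would use, not any vendor’s API.

```python
from typing import Callable

# A "model" here is any function that takes a prompt and returns text.
# In real use this would wrap a chat API call; these stubs are hypothetical
# placeholders so the sketch runs without network access.
Model = Callable[[str], str]


def strong_model(prompt: str) -> str:
    return f"[strong model output for: {prompt[:40]}...]"


def everyday_model(prompt: str) -> str:
    return f"[everyday model output for: {prompt[:40]}...]"


def plan_execute_review(task: str, planner: Model, worker: Model, reviewer: Model) -> dict:
    # 1. The strongest model does the thinking: turn the task into instructions.
    plan = planner(
        "Write step-by-step instructions that a less capable AI assistant "
        f"could follow to do this task well every time:\n{task}"
    )
    # 2. The cheaper model does the execution by following the plan.
    result = worker(f"Follow these instructions exactly:\n{plan}")
    # 3. The strong model grades the result in a separate call that sees only
    #    the plan and the result, not the worker's conversation.
    review = reviewer(
        "Here is the plan that was followed, and here is the result that was "
        "returned. Grade the result against the plan and flag anything missing "
        f"or wrong.\n\nPLAN:\n{plan}\n\nRESULT:\n{result}"
    )
    return {"plan": plan, "result": result, "review": review}


out = plan_execute_review(
    "Summarize this week's customer feedback emails into top themes.",
    planner=strong_model,
    worker=everyday_model,
    reviewer=strong_model,
)
print(out["review"])
```

The design choice worth noticing is that the expensive model is called twice per task regardless of how long the execution step runs, so the bulk of the work lands on the cheaper model.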
Let’s talk about what that looks like in practice.

Make The Strongest Models Do The Thinking
Writing a good process is real thinking work. Following a process somebody else wrote down for you is considerably easier. I think you see where I’m going with this.
Instead of asking the smartest, most token-intensive publicly available AI in the world to do everything for you, consider having it act as the orchestrator for cheaper-yet-capable AI models.
Restaurants figured this out a long time ago. The head chef doesn't cook every plate that leaves the kitchen. The head chef makes the menu and tastes the food before it goes out. The cooks -- also talented, but clearly junior to the head chef -- handle everything in between.
Let’s say you get a pile of customer feedback emails every week. Here’s a prompt you can use to test the simplest version of this concept:
I need to turn a week of customer feedback emails into a summary of the top themes, with one representative quote per theme. Write instructions that a less capable AI assistant could follow to do this well every time. Include the steps in order, what a great final output looks like, the most common mistakes to avoid, and a short checklist for verifying the result.
Then, paste the instructions you get back into a lesser model. If you’re not sure how to change the model you’re using, you can always just ask AI for help.
Both ChatGPT and Claude interfaces show the model dropdown selector to the left of the microphone icon. ChatGPT users could try GPT-5.4, and Claude users could try Sonnet 5 -- but really, it’s totally up to your discretion.
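Some items on the verification checklist can also be checked mechanically before you ask any model to grade the result. Here is a minimal sketch, assuming the everyday model was asked to return its summary as a list of `{"theme": ..., "quotes": [...]}` entries (a hypothetical format chosen for illustration) and that you still have the original email text. It only covers the structural rules of this particular task; judging whether the themes are the right ones is still a job for a person or the strong model.

```python
def check_summary(summary: list[dict], emails: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    seen_themes = set()
    for item in summary:
        theme = item.get("theme", "").strip()
        quotes = item.get("quotes", [])
        if not theme:
            problems.append("A theme is missing its name.")
            continue
        # Themes should be distinct (compared case-insensitively).
        if theme.lower() in seen_themes:
            problems.append(f"Duplicate theme: {theme!r}")
        seen_themes.add(theme.lower())
        # The task asks for exactly one representative quote per theme.
        if len(quotes) != 1:
            problems.append(f"{theme!r} has {len(quotes)} quotes; expected 1.")
        # A quote should be copied verbatim from a real email, not invented.
        for quote in quotes:
            if not any(quote in email for email in emails):
                problems.append(f"Quote under {theme!r} not found in any email: {quote!r}")
    return problems


# Hypothetical example data.
emails = ["The new dashboard is slow to load.", "Love the export feature!"]
summary = [
    {"theme": "Performance", "quotes": ["The new dashboard is slow to load."]},
    {"theme": "Export", "quotes": ["Exports are great"]},  # paraphrased, not verbatim
]
print(check_summary(summary, emails))
# ["Quote under 'Export' not found in any email: 'Exports are great'"]
```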
For bonus points, open a new conversation with the strong model, paste in both the plan and the output, and have it review the work using the following prompt:
Here is the plan that was followed, and here is the result that was returned. Grade the result against the plan and flag anything missing or wrong.
If you read last week’s newsletter on subagents, you’ll recognize why this review step works especially well: fresh eyes, clean context, and no attachment to the work being graded.

Try It On Your Own Work
Unless you’re constantly hitting your usage limits, you don’t need to redesign how you work with AI this week. That said, I’d still recommend trying the experiment above.
You may not even notice a meaningful difference when using a “lesser” model. To find out, pull up three tasks you completed with AI this week, re-run them on a model one tier down, and compare the results side by side.
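If you happen to script your AI usage, the side-by-side test is a few lines of code. This is a minimal sketch, assuming each model is a function from prompt to response; `current_model` and `cheaper_model` are hypothetical stand-ins for real model calls, and the three tasks are placeholder examples rather than anything from the original experiment.

```python
from typing import Callable

Model = Callable[[str], str]


def compare_tiers(tasks: list[str], current: Model, one_tier_down: Model) -> list[tuple[str, str, str]]:
    """Re-run each task on both models and pair the outputs for side-by-side review."""
    return [(task, current(task), one_tier_down(task)) for task in tasks]


# Hypothetical stand-ins; swap in real model calls.
def current_model(prompt: str) -> str:
    return f"current: {prompt}"


def cheaper_model(prompt: str) -> str:
    return f"cheaper: {prompt}"


# Placeholder tasks; use three you actually completed with AI this week.
tasks = [
    "Draft a reply to a customer asking about refunds.",
    "Summarize my meeting notes from Tuesday.",
    "Suggest five subject lines for a newsletter.",
]

for task, a, b in compare_tiers(tasks, current_model, cheaper_model):
    print(f"TASK: {task}\n  current tier: {a}\n  one tier down: {b}\n")
```

The judging step is deliberately left to you: the point is to read both outputs for your own work, not to compute a score.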
No benchmark can test this for you, because nobody else does your work. Solving your actual day-to-day problems is still the fastest way to level up your ability to leverage AI.
—Dharmesh (@dharmesh)
By the way, if you tried the dad joke generator I mentioned in the intro, reply to this email to let me know what you think, and share your favorite joke. 🙂


