- simple.ai - The Agent AI newsletter
I stayed up until 3AM testing Claude 4...
And honestly, I have no regrets

I had every intention of enjoying my long weekend and getting some rest.
Then Anthropic dropped Claude 4 Opus and Sonnet, and my plans went straight out the window. What started as "I'll just test this quickly" turned into a 3 AM deep-dive session that left me excited about where AI coding is headed.
Oh, and they also announced voice mode in beta on mobile yesterday. Anthropic has been absolutely everywhere in the news lately, and for good reason.
In today's newsletter, I’m breaking down:
What's actually new in Claude 4 (and why it matters)
My honest reactions after extensive late-night testing
A simple idea that could make AI memory much more useful

Claude 4: The "World's Best Coding Model"
In case you missed it over the weekend, Anthropic launched Claude Opus 4 and Sonnet 4, their next-generation models that can reason through problems step by step while using external tools.
Here's what caught my attention:
Hybrid thinking modes: Similar to Claude 3.7, both models can switch between instant responses and extended reasoning, with visible summaries showing their thought processes.
Benchmarks: Opus 4 achieved 72.5% on SWE-bench (a benchmark that measures software engineering capabilities) and can code autonomously for hours. Sonnet 4 replaces the previous 3.7 version with significant upgrades across the board.
New capabilities: Parallel tool use, memory functions for maintaining context across tasks, and integration with IDEs via Claude Code extensions.
Enhanced security: Anthropic has implemented ASL-3 security measures with safeguards against potential misuse.

Image credits to Anthropic
This release caps off an incredible couple of weeks in AI, with Anthropic positioning Claude 4 as the "world's best coding model."
Now, whether that claim holds up remains to be seen… but my early testing suggests they might not be exaggerating.

My 3AM Testing Marathon (No Regrets)
So what kept me up until the early hours?
I got straight into Claude Code (which is now generally available, by the way), using it to build test applications that query MCP servers, discover available tools, and process prompts through those tools.
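That loop, connect to a server, discover what tools it exposes, then route a prompt through one of them, is worth seeing in miniature. Here's a toy sketch of the pattern, not the actual MCP SDK; the server class, its tools, and the dispatch logic are all invented for illustration:

```python
# Toy sketch of the discover-then-dispatch pattern an MCP client follows.
# All names here (FakeServer, "shout", "word_count") are made up for the example.

class FakeServer:
    """Stands in for an MCP server: it advertises tools and executes calls."""

    def __init__(self):
        self._tools = {
            "shout": lambda text: text.upper(),
            "word_count": lambda text: len(text.split()),
        }

    def list_tools(self):
        # Discovery step: the client learns which tools exist.
        return sorted(self._tools)

    def call_tool(self, name, text):
        # Execution step: the client routes a prompt through a chosen tool.
        return self._tools[name](text)


def process_prompt(server, tool_name, prompt):
    # Refuse to call tools the server never advertised.
    if tool_name not in server.list_tools():
        raise ValueError(f"unknown tool: {tool_name}")
    return server.call_tool(tool_name, prompt)


if __name__ == "__main__":
    server = FakeServer()
    print(server.list_tools())                          # discovery
    print(process_prompt(server, "shout", "hi there"))  # dispatch
```

The real protocol adds sessions, schemas, and transports on top, but the shape of the interaction is the same: list first, then call.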

Image credits to Anthropic
Now, I wasn't building the next HubSpot or revolutionizing software development, but the experience left me genuinely impressed.
Here's the best way to describe it: Claude 4 Opus + Claude Code feels like having an actual AI software engineer on your team. Not an assistant that needs constant hand-holding, but a collaborator who understands what you're trying to accomplish.
What makes it different from other coding agents is that interacting with it feels like working with a human coworker. You don't need to learn special prompting techniques or adapt your workflow to the tool - the agent adapts to you.
I also spent a surprising amount of time just exploring one of my MCP servers and browsing through available tools. It was oddly satisfying, like discovering new features in software you thought you knew inside and out.
(Speaking of which, if you haven't set up an MCP server yet, I highly recommend it - check out my previous post to get started).
My one criticism is that I'm craving a larger context window. When models have truly usable context at scale, it unlocks entirely new ways of working - especially for complex, multi-step projects.
But even without the larger context, this kept me coding happily until 3 AM. That says something.

A Simple Idea: Personalized AI Memory

All this testing got me thinking about a broader opportunity I see in AI.
AI shouldn't just have bigger and better memory - it should have more personalizable memory.
Memory is one of the big opportunities in AI right now: by remembering things about us, models can deliver better results over time. That's great, and I'm a fan. But alongside building bigger and better memory, there's an opportunity to make that memory more customizable.
Here's a simple implementation: Imagine that in ChatGPT or Claude, there was a way to enter "Memory Instructions."
Example: "Remember things I tell you to improve future responses, especially related to business/work. DON'T remember things I reveal about health or anything related to my family. Don't keep memories around for longer than a year unless they are factual and do not change. The world changes quickly and my thinking or position on a topic may change too."
It's a bit like having a real personal assistant. You may have them open any mail that is a bill or junk mail — but not personal mail. You may want them to remember that you prefer an aisle seat when traveling, but not that you binge-watched a guilty-pleasure TV show last week.
The idea is to have some degree of control over what the system remembers. This would help people get more comfortable with the idea of AI having memory and context.
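To make the idea concrete, here's a minimal sketch of what such "Memory Instructions" might compile down to internally. Nothing here is a real ChatGPT or Claude feature; the topic labels, retention window, and field names are all assumptions invented for the example:

```python
# Hypothetical sketch of user-controlled memory rules. All names are invented;
# this is not an actual API of any assistant.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MemoryPolicy:
    allow_topics: set       # topics the assistant may remember
    deny_topics: set        # topics it must never store
    max_age: timedelta      # drop non-factual memories older than this

    def should_store(self, topic: str) -> bool:
        # Deny rules win; everything else must be explicitly allowed.
        if topic in self.deny_topics:
            return False
        return topic in self.allow_topics

    def should_keep(self, stored_at: datetime, now: datetime, factual: bool) -> bool:
        # Factual, stable memories are exempt from expiry.
        return factual or (now - stored_at) <= self.max_age


# Encoding the example instructions from above:
policy = MemoryPolicy(
    allow_topics={"business", "work"},
    deny_topics={"health", "family"},
    max_age=timedelta(days=365),
)
```

The interesting design choice is that deny rules take priority over allow rules, mirroring how you'd brief a real assistant: the "never mention this" list overrides everything else.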
It's such a straightforward concept that I'm honestly surprised no one has implemented it yet.

The Future Feels Closer Than Ever
After my unplanned late-night session with Claude 4, one thing is clear: we're moving rapidly toward AI that doesn't just assist with tasks but truly collaborates on them.
The combination of improved reasoning, seamless tool integration, and natural interaction patterns creates something that feels less like using software and more like working with a very capable partner.
The coding improvements alone were worth losing sleep over. But the broader implications—how these capabilities will democratize software development, enable new forms of creativity, and change how we approach complex problems—are what really keep me excited about this space.
We're getting to the point where working with AI feels less like managing a tool and more like collaborating with a colleague.
Now if you'll excuse me, I’m going to go back to tinkering. And maybe take a nap.
—Dharmesh (@dharmesh)


What'd you think of today's email? Click below to let me know.