The AI Memory Problem

(why context is your secret weapon)

There are tons of techniques for getting more out of LLMs like ChatGPT.

Meta prompting, tweaking Custom Instructions, prompt engineering frameworks, crafting longer conversations -- the list goes on.

And these techniques can help. I've written about many of them myself. But in the end, they're mostly working around a fundamental limitation of LLMs.

The next real breakthrough probably isn't going to come from cleverer prompts or better instructions. It's going to come from solving AI's memory problem.

So today, I want to break down:

  • Why AI systems fundamentally have no memory

  • The current workaround: using context as a memory substitute

  • How RAG (Retrieval-Augmented Generation) scales this approach

  • Why real AI memory is the next frontier

Why AI Has No Memory (The Fundamental Problem)

Large language models (LLMs) are just like that friend who can tell you everything about history, science, and literature, but can't remember what you talked about in your last conversation.

They learn an enormous amount during pre-training -- essentially cramming large parts of the internet into their "brain."

But here's the fundamental issue: they have no memory after that training point. Every time you start a new conversation with an LLM, its brain completely resets and goes back to just its pre-training data.

ChatGPT and Claude have started experimenting with some cross-conversation memory features, which have already made the chatbots far more useful. But once full memory capabilities arrive, the shift will be transformational.

Right now:

  • Chatbots don’t remember all of your previous conversations

  • They can't fully learn your preferences over time

  • By default, they have no context about your business, your role, or your specific challenges

This is why you find yourself constantly re-explaining the same context: who you are, what your company does, what you're trying to accomplish in every new chat. The AI isn't being stubborn -- it literally doesn't remember.

And this memory limitation isn't just an inconvenience. It's one of the primary barriers preventing AI from being truly useful for complex, ongoing work where context and history matter.

Context: The Current Memory ‘Hack’

All this said, there is a current workaround that most people don't use strategically enough.

Any information we give an LLM beyond its training data is called context. And when it comes to making the most of AI, ✨context is queen✨. The better the context, the better the results.

When we interact with an AI chatbot like ChatGPT, it puts everything in what's called the "context window."

You can visualize the context window as a single large sheet of paper. Anything we want to send to the LLM goes in the context window, and anything the LLM responds with also has to fit in that same space.

This is why longer conversations with ChatGPT eventually start to "forget" things from earlier in the chat -- the context window has a limit, and older information gets pushed out to make room for new information.
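
To make that concrete, here's a minimal sketch of how a chat app might trim history to fit the window. The 4,000-token limit and the whitespace "tokenizer" are made-up stand-ins for illustration, not any real model's numbers:

```python
CONTEXT_LIMIT = 4000  # max tokens the model can see at once (illustrative number)

def count_tokens(text: str) -> int:
    # Real systems use the model's tokenizer; word count is a rough proxy.
    return len(text.split())

def fit_to_window(messages: list[str], limit: int = CONTEXT_LIMIT) -> list[str]:
    """Keep the most recent messages that fit; older ones fall off the 'sheet of paper'."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        tokens = count_tokens(message)
        if used + tokens > limit:
            break  # everything older than this point is "forgotten"
        kept.append(message)
        used += tokens
    return list(reversed(kept))  # restore chronological order
```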

But what if you had thousands or hundreds of thousands of documents that you wanted the LLM to know about? You can't fit all that in a context window.

That's where something called Retrieval-Augmented Generation, or RAG, comes into play.

Here's how it works: imagine you had a large set of documents like every knowledge-base article in HubSpot, every website page, every customer conversation, every internal memo. We can't fit all that in the context window, but we can store all of them in what's called a vector database.

A vector database is a special kind of database that allows searching for documents based on the meaning of the content, not just keyword matching. It's like having a librarian who understands what you're actually looking for, not just the words you use.
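
Here's a toy sketch of that "search by meaning" idea. In a real system, an embedding model turns each document and query into a vector; here the vectors are hard-coded (and tiny) so the example runs on its own:

```python
import numpy as np

# Stand-in for a vector database: each document mapped to an embedding.
# Real embeddings come from a model and have hundreds of dimensions;
# these 3-number vectors are invented purely for illustration.
fake_embeddings = {
    "How do I reset my password?":       np.array([0.9, 0.1, 0.0]),
    "Steps to recover account access":   np.array([0.8, 0.2, 0.1]),
    "Q3 revenue summary for leadership": np.array([0.1, 0.9, 0.3]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # How closely two vectors point in the same direction (1.0 = same meaning).
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query_vec: np.ndarray, top_k: int = 2) -> list[str]:
    """Rank stored documents by how close their meaning is to the query."""
    scored = sorted(
        fake_embeddings.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc for doc, _ in scored[:top_k]]

# A query like "forgot my login" would embed near the password docs,
# even though it shares no keywords with them.
print(semantic_search(np.array([0.85, 0.15, 0.05])))
```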

When a question comes in, the system (sketched in code after this list):

  1. Searches the vector database for the most relevant documents

  2. Pulls those specific documents

  3. Puts them into the context window along with your question

  4. The LLM then generates a response based on that specific, relevant information
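
Here's what those four steps can look like stitched together. The `vector_db` client and `call_llm` function are hypothetical stand-ins, not any real library's API:

```python
def answer_with_rag(question: str, vector_db, call_llm, top_k: int = 5) -> str:
    """Sketch of the four RAG steps above, with dependencies passed in."""
    # 1 & 2. Search the vector database and pull the most relevant documents.
    documents = vector_db.search(question, top_k=top_k)

    # 3. Put those documents into the context window along with the question.
    context = "\n\n".join(documents)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # 4. The LLM generates a response grounded in the retrieved information.
    return call_llm(prompt)
```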

This whole process is called Retrieval-Augmented Generation (RAG) because we're retrieving relevant documents and augmenting the LLM with them so it can generate a better, more grounded response.

The beauty of RAG is that it feels like the AI has access to all your company's knowledge, when really it's just getting really good at finding the right information at the right time.

The Next Frontier: Real AI Memory

All of these approaches -- context windows, RAG, even the Custom Instructions I wrote about recently -- are fundamentally ‘hacks’. They're workarounds for the fact that AI systems don't have true memory.

But that's starting to change.

The next frontier in AI is going to be about memory. Real memory. AI systems that can learn from every interaction, remember your preferences, build understanding over time, and carry context across all your conversations and tasks.
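
To make "carry context across all your conversations" concrete, here's a toy illustration of the idea, nothing more: a persistent store keyed by user that survives between chats, instead of a context window that resets every time. The class and method names are invented for this sketch, not how Mem0 or any real product works:

```python
class MemoryLayer:
    """Toy persistent memory: facts survive across conversations."""

    def __init__(self):
        self.facts: dict[str, list[str]] = {}  # user_id -> remembered facts

    def remember(self, user_id: str, fact: str) -> None:
        # Store something learned during a conversation.
        self.facts.setdefault(user_id, []).append(fact)

    def recall(self, user_id: str) -> list[str]:
        # Surface past facts at the start of the *next* conversation.
        return self.facts.get(user_id, [])

memory = MemoryLayer()
memory.remember("dharmesh", "prefers concise answers with concrete examples")
# Days later, in a brand-new chat, these facts can be loaded into context:
print(memory.recall("dharmesh"))
```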

I'm incredibly excited about this direction. I'm an investor in a startup called Mem0, which is working on global memory that works across AI applications and agents. The HubSpot team is also working on infusing memory into our own AI products. We're exploring what AI looks like when it can truly remember and learn, not just retrieve and generate.

We're still in the early days of this transition, but I believe memory will be the defining characteristic that separates truly useful AI from the chatbots we have today.

The future of AI is about models that remember, learn, and grow alongside us and our businesses.

—Dharmesh (@dharmesh)
