AI Memory: Stop Building Stateless Agents

By Jack Herrington

AI Agent Development · LLM Memory Systems · Software Engineering

Key Concepts

Agentic Memory: The ability for an AI agent to retain, recall, and utilize information across different conversation sessions.
Episodic Memory: Information limited to the current chat session (context window).
Long-term User/Application Memory: Durable storage of facts, preferences, and summaries that persist across multiple sessions.
Memory Middleware: A software layer that intercepts chat turns to perform "recall" (before the response) and "retain" (after the response) operations.
Vector-based Search: A method of retrieving information based on conceptual similarity rather than exact keyword matching.
Memory Engine: The backend service (e.g., Hindsight, Mem0, Honcho, or custom SQLite) responsible for storing and retrieving facts.
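To make the vector-based search concept concrete, here is a minimal sketch of similarity ranking with cosine similarity. The `StoredFact` type, the `topK` helper, and the toy three-dimensional vectors are assumptions made for illustration. A real system would obtain embeddings from an embedding model, and the video does not show which similarity measure each engine uses.

```typescript
// Hypothetical types and data, for illustration only.
type StoredFact = { text: string; embedding: number[] };

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored facts by similarity to the query embedding and keep the best k.
function topK(query: number[], facts: StoredFact[], k: number): StoredFact[] {
  return facts
    .map((fact) => ({ fact, score: cosineSimilarity(query, fact.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.fact);
}

// Toy vectors standing in for real embeddings.
const facts: StoredFact[] = [
  { text: "User prefers TypeScript", embedding: [0.9, 0.1, 0.0] },
  { text: "User lives in Portland", embedding: [0.0, 0.2, 0.9] },
];

// Pretend this is the embedding of "Which language do I like?"
const queryEmbedding = [0.8, 0.2, 0.1];
const best = topK(queryEmbedding, facts, 1);
```

The query shares no keywords with the stored fact, yet the fact about TypeScript ranks first because its vector points in nearly the same direction as the query vector.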

1. Main Topics and System Architecture

The video introduces a prototype for TanStack AI memory, designed to move beyond simple "episodic" memory. The system is structured as a monorepo containing:

apps/web (Memory Bench): A testing application to compare different memory providers.
packages/ai-memory: The core library providing types, connectors, and a DIY framework for implementing memory.

The Memory Lifecycle:

Recall: Before a user prompt is processed, the system queries the memory engine for relevant facts and injects them into the LLM context.
Retain: After the LLM generates a response, the system analyzes the transcript to extract new facts and stores them in the database.
Tool-based Memory: Some providers (like Hindsight) allow the LLM to use "tools" to proactively store or retrieve information during the conversation.
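The recall and retain phases can be sketched as a single chat turn. Every name below (`MemoryEngine`, `Llm`, `runTurn`, `InMemoryEngine`) is hypothetical and is not the TanStack AI memory API. The toy engine stores user messages verbatim, whereas the engines in the video extract facts from the transcript.

```typescript
// All names here are hypothetical; this is not the TanStack AI memory API.
type Message = { role: "system" | "user" | "assistant"; content: string };

interface MemoryEngine {
  recall(userId: string, query: string): Promise<string[]>;
  retain(userId: string, transcript: Message[]): Promise<void>;
}

// Stand-in for a model call: takes the full context, returns the reply text.
type Llm = (messages: Message[]) => Promise<string>;

async function runTurn(
  engine: MemoryEngine,
  llm: Llm,
  userId: string,
  history: Message[],
  userInput: string,
): Promise<Message[]> {
  // Recall: fetch relevant facts and inject them as system context.
  const facts = await engine.recall(userId, userInput);
  const memoryPrompt: Message[] =
    facts.length > 0
      ? [{ role: "system", content: "Known facts about the user:\n" + facts.map((f) => `- ${f}`).join("\n") }]
      : [];

  const userMessage: Message = { role: "user", content: userInput };
  const reply = await llm([...memoryPrompt, ...history, userMessage]);
  const updated: Message[] = [...history, userMessage, { role: "assistant", content: reply }];

  // Retain: hand the transcript to the engine so it can store new facts.
  await engine.retain(userId, updated);
  return updated;
}

// Toy engine: keeps the latest user message verbatim instead of extracting facts.
class InMemoryEngine implements MemoryEngine {
  private facts = new Map<string, string[]>();

  async recall(userId: string): Promise<string[]> {
    return this.facts.get(userId) ?? [];
  }

  async retain(userId: string, transcript: Message[]): Promise<void> {
    const lastUser = [...transcript].reverse().find((m) => m.role === "user");
    if (lastUser) {
      this.facts.set(userId, [...(this.facts.get(userId) ?? []), lastUser.content]);
    }
  }
}
```

Calling `runTurn` twice with an empty `history` simulates two separate sessions: the second call receives a system message containing what the user said in the first, which is the cross-session behavior the lifecycle is meant to provide.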

2. Types of LLM Memory

The author categorizes memory into four distinct types:

Context Window Memory: The immediate chat history (episodic).
Model Memory: Information inherent to the model's training data.
Working Memory: Information processed within a single, specific response.
Long-term Memory: The focus of this project; durable storage of user preferences and observations.

3. Implementation and Frameworks

The system supports three primary third-party vendors and a custom DIY solution:

Vendors: Hindsight (noted for tool support), Mem0, and Honcho.
DIY Implementation: Uses SQLite for storage, OpenAI for vector embeddings, and an Anthropic model as the LLM that extracts facts.

Step-by-Step Integration:

Configuration: Define the scope (user identity and session context).
Middleware Setup: Use createMemoryMiddleware to wrap the chat engine.
Execution: Add the middleware to the application's middleware array. The system automatically handles the recall/retain phases.
Tool Injection: For engines supporting tools, retrieve scope-based tools and add them to the agent's tool array.
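The four steps above might look roughly like the following. This summary does not show the real signature of `createMemoryMiddleware`, so the sketch defines its own stand-in, `sketchMemoryMiddleware`, along with hypothetical `MemoryScope`, `ChatMiddleware`, `Tool`, and `ScopedEngine` shapes. Treat it as an outline of the wiring, not as the library's API.

```typescript
// Hypothetical shapes throughout; the prototype's real signatures may differ.
type MemoryScope = { userId: string; sessionId: string };
type Tool = { name: string; description: string; run: (input: string) => Promise<string> };
type ChatMiddleware = {
  name: string;
  beforeTurn?: (userInput: string) => Promise<string[]>; // extra system context
  afterTurn?: (userInput: string, reply: string) => Promise<void>;
};

interface ScopedEngine {
  recall(scope: MemoryScope, query: string): Promise<string[]>;
  retain(scope: MemoryScope, text: string): Promise<void>;
  // Optional: only engines with tool support would expose this.
  getTools?(scope: MemoryScope): Tool[];
}

// Stand-in for createMemoryMiddleware: binds an engine and a scope to turn hooks.
function sketchMemoryMiddleware(engine: ScopedEngine, scope: MemoryScope): ChatMiddleware {
  return {
    name: "memory",
    beforeTurn: (userInput) => engine.recall(scope, userInput),
    afterTurn: (userInput, reply) =>
      engine.retain(scope, `user: ${userInput}\nassistant: ${reply}`),
  };
}

// Step 1. Configuration: who the memory belongs to and which session is active.
const scope: MemoryScope = { userId: "user-123", sessionId: "session-abc" };

// Toy engine backed by an array, with one memory tool.
const stored: string[] = [];
const engine: ScopedEngine = {
  recall: async () => [...stored],
  retain: async (_scope, text) => {
    stored.push(text);
  },
  getTools: (s) => [
    {
      name: "remember",
      description: "Store a fact about the current user",
      run: async (input) => {
        stored.push(`${s.userId}: ${input}`);
        return "stored";
      },
    },
  ],
};

// Steps 2 and 3. Create the middleware and add it to the middleware array.
const middleware: ChatMiddleware[] = [sketchMemoryMiddleware(engine, scope)];

// Step 4. Tool injection: add scope-bound tools when the engine offers any.
const tools: Tool[] = [...(engine.getTools?.(scope) ?? [])];
```

Binding the scope once, at middleware creation, keeps user identity out of the per-turn hooks, so the chat loop never has to pass a user ID around.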

4. Key Arguments and Perspectives

Concept vs. Keyword Search: The author argues that vector-based search is better suited to AI memory because it matches stored facts by meaning, so the agent can retrieve a relevant fact even when the query shares no exact words with it.
Flexibility: By creating a generic memory provider interface, the author allows developers to swap between cloud-based vendors and local, self-hosted solutions without changing the core application logic.
Incognito Mode: The system supports "retain-only" roles, allowing users to contribute to their memory profile without necessarily recalling previous facts in a specific session.
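The flexibility and role ideas can be illustrated together. In this sketch the `MemoryProvider` interface, the `MemoryRole` values, and both provider factories are hypothetical. The "cloud" provider is a fake that keeps data in a local `Set`, because the vendors' real client APIs are not covered in this summary.

```typescript
// Hypothetical interface and role names, for illustration only.
interface MemoryProvider {
  readonly name: string;
  recall(query: string): Promise<string[]>;
  retain(fact: string): Promise<void>;
}

type MemoryRole = "recall-and-retain" | "retain-only";

// Local provider: keeps facts in an array (a DIY build might use SQLite instead).
function createLocalProvider(): MemoryProvider {
  const facts: string[] = [];
  return {
    name: "local",
    recall: async () => [...facts],
    retain: async (fact) => {
      facts.push(fact);
    },
  };
}

// Stand-in for a hosted vendor; a real one would call the vendor's API.
function createFakeCloudProvider(): MemoryProvider {
  const remote = new Set<string>();
  return {
    name: "fake-cloud",
    recall: async () => Array.from(remote),
    retain: async (fact) => {
      remote.add(fact);
    },
  };
}

// Application logic depends only on the interface and the role,
// so swapping providers requires no changes here.
async function rememberAndRecall(
  provider: MemoryProvider,
  role: MemoryRole,
  fact: string,
): Promise<string[]> {
  await provider.retain(fact);
  // In the retain-only role nothing is recalled into the session.
  return role === "retain-only" ? [] : provider.recall(fact);
}
```

Because `rememberAndRecall` never inspects which provider it was given, the same call works with either factory, and the role alone decides whether stored facts flow back into the conversation.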

5. Notable Quotes

"What I really want to do is actually retain information between chats."
"Recall... before the turn of the conversation, I'm going to take your input as the user and then I'm going to inject those into the system context."

6. Synthesis and Conclusion

The TanStack AI memory prototype provides a modular framework for adding long-term memory to AI agents. By abstracting the memory engine, the system lets developers choose between managed services (Hindsight, Mem0, Honcho) and a custom SQLite-based local implementation. The core takeaway is that effective agentic memory requires a two-phase approach: Recall primes the LLM with relevant context, and Retain continuously updates the stored knowledge about the user. Together, the two phases enable more personalized and context-aware AI interactions.
