Hierarchical Memory: Context Management in Agents — Sally-Ann Delucia

By AI Engineer

Key Concepts

Context Engineering: The strategic process of selecting and managing the information provided to an LLM, moving beyond simple prompt engineering.
Context Window: The maximum amount of data (tokens) an LLM can process in a single interaction.
Smart Truncation: A strategy involving keeping the "head" (start) and "tail" (end) of a conversation while compressing or removing the middle to stay within token limits.
Sub-agents: A modular architecture where a main agent delegates heavy, data-intensive tasks to specialized secondary agents to keep the primary context light.
Long Session Evals: A testing methodology where an agent is evaluated on the 11th turn of a conversation to ensure context management remains effective over time.
Observability: The practice of monitoring and tracing AI agent performance to identify where and why failures occur.
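A long-session eval can be sketched as a harness that plays a fixed script of setup turns through the agent and scores only the final probe turn (the 11th, when there are ten setup turns). This is a minimal sketch under stated assumptions. The `Agent` callable signature, `long_session_eval`, and the toy `make_window_agent` (which can only see its last `k` turns, mimicking naive truncation) are hypothetical and not taken from the talk.

```python
from typing import Callable, List, Tuple

# Hypothetical agent signature: (history of (user, reply) pairs, new user message) -> reply.
Agent = Callable[[List[Tuple[str, str]], str], str]


def long_session_eval(
    agent: Agent,
    setup_turns: List[str],
    probe: str,
    check: Callable[[str], bool],
) -> bool:
    """Play every setup turn through the agent, then score only the final probe turn."""
    history: List[Tuple[str, str]] = []
    for user_msg in setup_turns:
        reply = agent(history, user_msg)
        history.append((user_msg, reply))
    return check(agent(history, probe))


def make_window_agent(k: int) -> Agent:
    """Hypothetical toy agent that only 'sees' its last k turns, like naive truncation."""
    def agent(history: List[Tuple[str, str]], msg: str) -> str:
        visible = history[-k:] if k > 0 else []
        for user_msg, _ in visible:
            if user_msg.startswith("project:"):
                return user_msg.split(":", 1)[1].strip()
        return "unknown"
    return agent


# Ten setup turns, so the probe is turn 11. The key fact appears only in turn 1.
SETUP = ["project: checkout-service"] + [f"filler question {i}" for i in range(9)]
PROBE = "Which project are we debugging?"


def mentions_project(reply: str) -> bool:
    return "checkout-service" in reply


print(long_session_eval(make_window_agent(20), SETUP, PROBE, mentions_project))  # True
print(long_session_eval(make_window_agent(3), SETUP, PROBE, mentions_project))   # False
```

The agent with a 20-turn window still recovers the project name from turn one on turn 11, while the 3-turn agent answers "unknown" and fails. That late-turn regression is the kind of failure a single-turn eval would never surface.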

1. The Shift from Prompt Engineering to Context Engineering

The speaker, Sally-Ann Delucia (Head of Product at Arize), argues that while early AI development focused heavily on prompt engineering, the industry has shifted toward context engineering. The core argument is that an agent's success depends on its ability to "remember what it needs to and forget what it doesn't."

The Problem: Simply stuffing as much data as possible into a context window leads to failure.
The Reality: Context management is a product and UX problem, not just an engineering one. If an agent lacks the correct context, it provides poor answers, leading to user churn.

2. The "Vicious Loop" of Agent Failure

The team encountered a recursive failure loop while building "Alex," an AI agent designed to analyze trace and span data from the Arize platform:

The agent analyzes data (spans).
The data volume grows, exceeding the context window.
The agent fails.
Because the failure data is part of the trace, the agent attempts to re-process the failure, adding more data and further bloating the context.
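The compounding effect of the last step can be illustrated with a toy token-count simulation. Everything here is an assumption for illustration: `simulate_failure_loop` is hypothetical, the token counts are made-up integers, and the model takes the worst case in which the recorded failure trace echoes the entire oversized payload back into the agent's context.

```python
def simulate_failure_loop(window: int, tokens_per_step: int, steps: int) -> list[int]:
    """Return the agent's context size (in tokens) after each step of the loop."""
    context = 0
    sizes = []
    for _ in range(steps):
        context += tokens_per_step  # the agent pulls in more span data
        if context > window:        # the context window overflows and the agent fails
            # Assumed worst case: the failure is logged as trace data containing
            # the whole oversized payload, and the agent re-reads it.
            context += context
        sizes.append(context)
    return sizes


print(simulate_failure_loop(window=10_000, tokens_per_step=3_000, steps=6))
# [3000, 6000, 9000, 24000, 54000, 114000]
```

Once the window is exceeded, the context never drops back under it, so every later step fails as well. This is why the strategies in the next section focus on bounding what enters the context rather than simply retrying.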

3. Methodologies for Context Management

The team tested three primary strategies to escape the failure loop:

Naive Truncation: Keeping only the first 100 characters.
- Result: Failed. The agent lost all continuity, treating follow-up questions as entirely new conversations.
Summarization: Using an LLM to summarize the context into fewer tokens.
- Result: Failed. It was too inconsistent and lacked control over which specific data points were preserved.
Smart Truncation & Memory (Current Strategy):
- Process: Retain the first 100 characters (head) and the last 100 characters (tail). Truncate the middle but store it in a searchable memory database.
- Mechanism: The agent keeps the latest results and does not reset the system prompt. If the agent needs specific historical data or tool calls, it can retrieve them from the memory store.
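The smart-truncation step can be sketched as follows. It is applied here to a single block of text, such as a long tool result or a serialized conversation; which unit the team actually truncates is an assumption. `MemoryStore` is a hypothetical in-process stand-in for the searchable memory database and uses naive substring search. The 100-character head and tail follow the talk's rule, and `naive_truncate` is included only for contrast with the first strategy.

```python
from dataclasses import dataclass, field

HEAD_CHARS = 100  # "first 100" rule from the talk
TAIL_CHARS = 100  # "last 100" rule from the talk


def naive_truncate(text: str, n: int = HEAD_CHARS) -> str:
    """Strategy 1: keep only the start; everything after it is lost for good."""
    return text[:n]


@dataclass
class MemoryStore:
    """Hypothetical searchable memory store (in-memory, substring search)."""
    entries: dict[str, str] = field(default_factory=dict)

    def put(self, key: str, text: str) -> None:
        self.entries[key] = text

    def search(self, query: str) -> list[str]:
        # Return keys whose stored text contains the query, case-insensitively.
        q = query.lower()
        return [k for k, v in self.entries.items() if q in v.lower()]

    def get(self, key: str) -> str:
        return self.entries[key]


def smart_truncate(key: str, text: str, memory: MemoryStore,
                   head: int = HEAD_CHARS, tail: int = TAIL_CHARS) -> str:
    """Strategy 3: keep head and tail in context, move the middle into memory."""
    if len(text) <= head + tail:
        return text  # already small enough; nothing to truncate
    middle = text[head:len(text) - tail]
    memory.put(key, middle)
    marker = f"\n[... {len(middle)} chars moved to memory as {key!r} ...]\n"
    return text[:head] + marker + text[len(text) - tail:]


memory = MemoryStore()
tool_output = "A" * 100 + "span_id=42 status=ERROR timeout " + "B" * 300 + "Z" * 100
in_context = smart_truncate("tool_call_7", tool_output, memory)
print(memory.search("span_id=42"))  # ['tool_call_7']
```

Unlike `naive_truncate`, nothing is discarded: the dropped middle stays retrievable by key or search, which matches the idea that the agent can pull historical data or tool calls back from memory when it needs them.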

4. Architectural Framework: Sub-agents

To handle data-intensive operations (like searching through hundreds of spans), the team implemented a sub-agent architecture:

Main Agent: Maintains a "light" context, handling user interaction and high-level reasoning.
Sub-agents: Receive specific, heavy tasks from the main agent. They process the large data sets independently and return only the necessary results to the main agent.
Benefit: This keeps the main conversation context small and manageable, preventing the "bloat" that causes performance degradation.
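The delegation pattern can be sketched like this. In the talk the sub-agents are LLM agents; here a plain function (`span_triage_subagent`, hypothetical) stands in for one so the example runs without a model, and the `Span` fields are assumed rather than taken from any platform schema. The point is the data flow: hundreds of spans go into the sub-agent, and only a few lines come back into the main agent's context.

```python
from dataclasses import dataclass


@dataclass
class Span:
    span_id: str
    name: str
    latency_ms: float
    status: str  # "OK" or "ERROR"


def span_triage_subagent(spans: list[Span], top_k: int = 3) -> str:
    """Hypothetical sub-agent: reads every span in its own scope and returns
    only a short digest (error count plus the slowest spans)."""
    errors = sum(1 for s in spans if s.status == "ERROR")
    slowest = sorted(spans, key=lambda s: s.latency_ms, reverse=True)[:top_k]
    lines = [f"{len(spans)} spans scanned, {errors} errors"]
    lines += [f"{s.span_id} {s.name} {s.latency_ms:.0f}ms" for s in slowest]
    return "\n".join(lines)


class MainAgent:
    """Keeps a light context: user messages and sub-agent digests only."""

    def __init__(self) -> None:
        self.context: list[str] = []

    def handle(self, user_msg: str, spans: list[Span]) -> str:
        self.context.append(f"user: {user_msg}")
        digest = span_triage_subagent(spans)  # raw spans never enter self.context
        self.context.append(f"subagent: {digest}")
        return digest


spans = [
    Span(f"s{i}", "db.query" if i % 2 else "http.get", float(i),
         "ERROR" if i % 50 == 0 else "OK")
    for i in range(500)
]
agent = MainAgent()
print(agent.handle("Why is checkout slow?", spans))
```

After one request the main agent holds two short entries, regardless of how many spans the sub-agent had to read.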

5. Future Challenges and Research

Despite current successes, the team is actively working on:

Long-term Memory: Moving beyond session-based memory so the agent can reference past issues across different chat sessions.
Sophisticated Heuristics: Moving away from the "first 100/last 100" rule toward a more principled "context budget" and quality metrics.
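Caching: Investigating how to avoid invalidating the prompt cache during context management (rewriting or dropping earlier content changes the prompt prefix that caches are keyed on), a challenge highlighted in recent industry discussion (e.g., the Claude Code leak).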
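One way a token budget could replace the fixed "first 100/last 100" rule is sketched below. This is an assumption about what a context budget might look like, not the team's design: `fit_to_budget` and `count_tokens` are hypothetical, and `count_tokens` crudely counts whitespace-separated words in place of a real tokenizer.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def fit_to_budget(system: str, turns: list[str], budget: int, keep_head: int = 1) -> list[str]:
    """Keep the system prompt and the first `keep_head` turns unconditionally,
    then add the most recent turns, newest first, until the budget is spent.
    Middle turns that do not fit are dropped (or could go to a memory store)."""
    head = turns[:keep_head]
    used = count_tokens(system) + sum(count_tokens(t) for t in head)
    if used > budget:
        raise ValueError("budget cannot fit the system prompt and head turns")
    tail: list[str] = []
    for turn in reversed(turns[keep_head:]):
        cost = count_tokens(turn)
        if used + cost > budget:
            break  # stop at the first turn that does not fit, so the tail has no gaps
        tail.append(turn)
        used += cost
    return [system, *head, *reversed(tail)]


system = "You are a tracing assistant."
turns = ["t0 a b", "t1 c", "t2 d e f", "t3 g"]
print(fit_to_budget(system, turns, budget=10))
# ['You are a tracing assistant.', 't0 a b', 't3 g']
```

Because the system prompt and head always come first and never change, consecutive requests share that prefix, which is what prefix-based prompt caches match on. Dropping middle turns still changes everything after the head, so this alone does not solve the cache-invalidation problem.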

6. Notable Quotes

"The best context strategy is one that lets your agents remember what it needs to, and forget what it doesn't."
"Agents don't fail because of prompts; they fail because of context."

Synthesis/Conclusion

Context management is an iterative, non-negotiable component of building reliable AI agents. The transition from monolithic agents to a sub-agent architecture, combined with smart truncation and a dedicated memory store, is essential for handling long-running, data-heavy workflows. The team emphasizes that evaluation (specifically testing long-session performance) is the only way to ensure that context management strategies are actually working in production.
