3 New Context Engineering Skills for Agents

By Prompt Engineering


Key Concepts

  • Context Management: The process of efficiently handling and optimizing the information available to an AI agent within its limited context window.
  • Context Window: The maximum amount of text (tokens) an LLM can process at one time.
  • System Prompt: Initial instructions given to an LLM or agent to define its role, behavior, and constraints.
  • System Tools: Pre-defined functionalities or APIs available to an agent.
  • MCP Tools (Model Context Protocol): Tools exposed to an agent via the Model Context Protocol, enabling interaction with external systems and data sources.
  • Context Rot: The degradation of an agent's performance or efficiency due to an overloaded or irrelevant context window.
  • Prompt Engineering: The traditional practice of crafting effective prompts to guide LLM behavior, primarily for single-turn queries.
  • Context Engineering: An evolution of prompt engineering, focusing on dynamically managing the entire context (system instructions, tools, memory, documents, user input) for multi-turn agentic systems.
  • Attention Mechanism: A core component of transformer architecture in LLMs that determines the importance of different parts of the input sequence.
  • Compacting: A context management strategy involving summarizing past interactions or information to save context window space.
  • Structured Note-Taking (Agentic Memory): A strategy where agents write and retrieve notes from an external memory system, outside the immediate context window.
  • Sub-Agent Architecture: A design pattern where a main agent delegates specific tasks to specialized sub-agents, each with its own context, to prevent context pollution.
  • Goldilocks Zone: The optimal balance in system prompt design, avoiding both overly rigid and overly vague instructions.
  • Claude Sonnet 4.5's Context Awareness: A unique feature where the model understands its own context window limits and adapts its behavior accordingly.
  • Skywork Agent Orchestra: An innovative deep research agent that includes an MCP Manager Agent capable of creating and discarding tools on demand.

The Challenge of Context Management in Modern Agents

Modern AI agents face a significant challenge in context management due to the limited context window of Large Language Models (LLMs). The transcript illustrates this with a Claude Code example:

  • A new session typically starts with a system prompt consuming approximately 2,000 tokens.
  • System tools available to Claude Code take up about 6% of the context.
  • Initially, about 70% of the context is usable for the agent.
  • However, enabling MCP tools such as Context7, Playwright, and the Chrome DevTools MCP server drastically changes this. The descriptions of these tools alone occupy about 16% of the context, reducing the usable context to roughly 50%.
  • This reduction occurs even before any tools are used, highlighting that the problem isn't just about active usage but the static description of available functionalities.
  • The core reason for this limitation lies in how LLMs work, specifically the attention mechanism within the transformer architecture, which makes efficient context management critical for building effective agentic systems.
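As a back-of-envelope illustration, the overhead arithmetic above can be sketched in a few lines. All figures here are hypothetical placeholders, not measurements from Claude Code:

```python
# Hypothetical token-budget sketch: static tool descriptions shrink
# the usable context before any work begins. Numbers are illustrative.

CONTEXT_WINDOW = 200_000  # e.g., a 200k-token context window

def usable_fraction(overheads: dict[str, int], window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the window left after static overheads are loaded."""
    used = sum(overheads.values())
    return max(0.0, (window - used) / window)

base = {
    "system_prompt": 2_000,
    "system_tools": int(0.06 * CONTEXT_WINDOW),
}
# Enabling MCP servers adds their tool descriptions as static overhead,
# even before any tool is actually called.
with_mcp = dict(base, mcp_tool_descriptions=int(0.16 * CONTEXT_WINDOW))

print(f"usable without MCP tools: {usable_fraction(base):.0%}")
print(f"usable with MCP tools:    {usable_fraction(with_mcp):.0%}")
```

The point of the sketch is that the drop in usable context happens at load time: the `with_mcp` budget is smaller than `base` before a single tool call is made.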

From Prompt Engineering to Context Engineering

The concept of managing agent behavior has evolved from prompt engineering to context engineering:

  • Prompt Engineering (Pre-Agentic Era): Traditionally, a system prompt controlled the behavior of an LLM for single-turn queries. Tuning this prompt was crucial.
  • Context Engineering (Agentic Systems): As systems become more complex, involving multi-turn interactions with users, tools, and environments, the scope expands. Context engineering involves managing a broader range of inputs: system instructions, domain knowledge, memory, access to multiple tools, and internal knowledge bases (e.g., documents).
  • The danger of "stuffing all this into the context" is context rot, where irrelevant information clutters the context window, hindering performance. The goal is to prune information, preserving only what is needed at a specific point in time (e.g., after a tool call, results are fed in, but irrelevant details are removed).
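A minimal sketch of such pruning, assuming a simple role/content message format (an assumption, not any particular framework's schema), replaces stale tool payloads with short stubs while keeping the conversational thread intact:

```python
# Sketch of context pruning: once a tool result has been consumed,
# replace its raw payload with a placeholder so it stops occupying
# the context window. Message format is a simplifying assumption.

def prune_tool_results(messages: list[dict], keep_last: int = 1) -> list[dict]:
    """Replace all but the most recent tool results with placeholders."""
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_indices[:-keep_last]) if keep_last else set(tool_indices)
    pruned = []
    for i, m in enumerate(messages):
        if i in stale:
            pruned.append({"role": "tool", "name": m["name"],
                           "content": f"[result of {m['name']} pruned]"})
        else:
            pruned.append(m)
    return pruned

history = [
    {"role": "user", "content": "Find the bug"},
    {"role": "tool", "name": "grep", "content": "...3,000 lines of matches..."},
    {"role": "assistant", "content": "Narrowing down to parser.py"},
    {"role": "tool", "name": "read_file", "content": "...full file contents..."},
]
pruned = prune_tool_results(history)
```

After pruning, the old `grep` output is reduced to a stub while the most recent tool result stays verbatim, so the agent retains a record of what happened without paying for the full payloads.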

Calibrating the System Prompt

Even with context engineering, the system prompt remains critical as it controls the agent's overall behavior. The article emphasizes the need to engineer this prompt carefully:

  • Clarity and Simplicity: System prompts should be "extremely clear and use simple direct language that presents ideas at the right altitude for the agent."
  • The Goldilocks Zone: This "right altitude" avoids two common failure modes:
    1. Hardcoding Complex, Brittle Logic: Engineers sometimes embed overly specific, rigid logic, leading to fragility and increased maintenance complexity. This makes the model "laser focus on certain edge cases."
    2. Vague, High-Level Guidance: Providing insufficient detail fails to give the LLM concrete signals for desired outputs or falsely assumes shared context.
  • Recommendations for System Prompt Design:
    • Distinct Sections: Divide the system prompt into clear sections using delimiters like XML tags or markdown.
    • Human Understandability: A general rule of thumb is that if a human engineer cannot understand the instructions, the LLM will most likely fail.
    • Avoid Laundry Lists of Edge Cases: Instead of trying to articulate every possible rule, rely on the model's intelligence by providing enough information and diverse, canonical few-shot examples that effectively portray expected behavior without forcing focus on specific edge cases.
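A hypothetical system prompt following these recommendations might look like the following, with delimited sections and one canonical example in place of an edge-case laundry list. All content here is invented for illustration:

```python
# Illustrative system prompt with distinct XML-delimited sections.
# The role, rules, and example are hypothetical.

SYSTEM_PROMPT = """\
<role>
You are a code-review assistant. Be concise and concrete.
</role>

<instructions>
- Flag correctness issues first, style issues second.
- If you are unsure whether something is a bug, say so explicitly.
</instructions>

<examples>
<example>
Input: a function that removes items from a list while iterating over it.
Good output: "Iterating over `items` while removing from it skips
elements; iterate over a copy instead (`for x in list(items)`)."
</example>
</examples>
"""

# Quick structural check that each section is opened and closed.
for tag in ("role", "instructions", "examples"):
    assert f"<{tag}>" in SYSTEM_PROMPT and f"</{tag}>" in SYSTEM_PROMPT
```

The single canonical example demonstrates the expected altitude of the output without hardcoding a rule for every possible review scenario.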

Strategies for Effective Context Management

The transcript outlines three primary techniques for proper context management:

1. Compacting (Summarization)

  • Problem: LLMs tend to lose focus as they approach the limit of their context window.
  • Solution: Compacting involves summarizing the most important contents, rules, or failure points the agent has encountered.
  • Claude Code Example: The /compact command clears the conversation history but keeps a summary in context. Claude Code may run this command automatically as the context limit approaches.
  • Caution: Compacting too frequently can lead to the loss of useful information, reducing the summary to a generic overview.
  • Claude Sonnet 4.5's Context Awareness:
    • A notable development is Sonnet 4.5's unique ability to be "aware of its own context window."
    • This awareness shapes its behavior: it proactively summarizes its progress and becomes more decisive in implementing fixes as it approaches context limits.
    • This native behavior required Cognition to rebuild their agent, Devin, from scratch specifically for Sonnet 4.5, resulting in 2x faster performance and 12% better results on their evaluations.
    • This highlights that traditional prompt or context engineering rules may not directly apply to models with such advanced capabilities.
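A toy version of the compaction step might look like the following, with the LLM summarizer stubbed out. The token estimate and threshold are placeholder assumptions, not Claude Code's actual logic:

```python
# Sketch of compaction: when the history nears a token limit, fold
# older turns into a single summary message and keep recent turns
# verbatim. summarize() stands in for what would be an LLM call.

def estimate_tokens(messages: list[dict]) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return sum(len(m["content"]) for m in messages) // 4

def compact(messages: list[dict], summarize, limit: int = 1000) -> list[dict]:
    """Fold older turns into one summary once the estimate nears the limit."""
    if estimate_tokens(messages) < limit:
        return messages
    head, tail = messages[:-2], messages[-2:]  # keep the last two turns intact
    summary = {"role": "system",
               "content": "Summary of earlier turns: " + summarize(head)}
    return [summary] + tail

# Stub summarizer; a real agent would call the model here.
fake_summarize = lambda msgs: f"{len(msgs)} earlier messages about the task."

long_history = [{"role": "user", "content": "x" * 2000}] * 4
compacted = compact(long_history, fake_summarize)
```

Note the caution from the transcript applies directly here: running `compact` too often would repeatedly summarize summaries, degrading the record into a generic overview.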

2. Structured Note-Taking (Agentic Memory)

  • Concept: The agent regularly writes notes to an external memory system, outside the immediate context window. These notes are persisted and can be retrieved as needed.
  • Application: This allows the agent to refer to past information without cluttering the active context.
  • Claude Code Example: Claude Code uses a similar mechanism, writing notes to a Markdown file (something like a NOTES.md) for this purpose.
  • Goal: To use a memory system as a complementary component to provide useful, retrievable information alongside the active context window.
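A minimal sketch of this pattern, using a JSON file as the external memory store. The file name and schema are assumptions for illustration, not Claude Code's actual mechanism:

```python
# Sketch of structured note-taking: notes persist in a file outside
# the context window and are reloaded on demand.
import json
from pathlib import Path

NOTES_PATH = Path("agent_notes.json")
NOTES_PATH.unlink(missing_ok=True)  # start fresh for this demo

def write_note(topic: str, note: str) -> None:
    """Append a note under a topic in the external memory file."""
    notes = json.loads(NOTES_PATH.read_text()) if NOTES_PATH.exists() else {}
    notes.setdefault(topic, []).append(note)
    NOTES_PATH.write_text(json.dumps(notes, indent=2))

def recall(topic: str) -> list[str]:
    """Retrieve notes for a topic; returns [] if none exist."""
    if not NOTES_PATH.exists():
        return []
    return json.loads(NOTES_PATH.read_text()).get(topic, [])

write_note("build", "Tests fail unless FOO_ENV is set.")
write_note("build", "Use `make -j4`; higher parallelism OOMs.")
print(recall("build"))
```

Only the notes the agent actually recalls re-enter the context window; everything else stays on disk, which is what keeps the active context clean.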

3. Sub-Agent Architecture

  • Concept: This involves assigning specific tasks or purposes to specialized sub-agents.
  • Mechanism: Each sub-agent operates with its own separate context window, conducts its own research, and takes actions using available tools.
  • Benefit: The key advantage is that when sub-agents take actions, they do not "rot" or "pollute" the main agent's context window.
  • Information Flow: Only a summarized version of the sub-agent's input, actions taken, and final output is returned to the main context, preserving its integrity.
  • Distinction (Anthropic/Claude Code vs. Manus):
    • Anthropic/Claude Code: Discards tool actions after a certain time if they are deemed irrelevant.
    • Manus: Recommends "masking" actions instead of discarding them, so the agent keeps a contextual history of which actions occurred (without all related details).
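The isolation property can be illustrated with a toy sub-agent whose intermediate work never leaves its private context; only a compact report crosses back. Everything here is simulated for illustration:

```python
# Sketch of the sub-agent pattern: the sub-agent does multi-step work
# in its own context, and only a summary report reaches the main agent.

def run_subagent(task: str) -> dict:
    """Simulate a sub-agent doing multi-step work in a private context."""
    private_context = [f"step {i}: intermediate output for {task!r}"
                       for i in range(50)]
    final_answer = f"completed {task!r} in {len(private_context)} steps"
    # private_context is discarded here; only the report is returned.
    return {"task": task,
            "steps_taken": len(private_context),
            "result": final_answer}

main_context = [{"role": "user", "content": "Research topic X"}]
report = run_subagent("research topic X")
# The main context grows by one summary message, not fifty raw steps.
main_context.append({"role": "tool", "content": report["result"]})
```

The fifty intermediate steps never touch `main_context`, which is exactly the "no pollution" guarantee the architecture is built around.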

Sponsor Spotlight: Skywork and Agent Orchestra

The video highlights Skywork, a sponsor, for their contributions to open-source specialized agents, including their "world model" (comparable to Genie 3) and Agent Orchestra.

  • Agent Orchestra: A deep research agent featuring innovative context management techniques.
    • It comprises specialized agents like a deep researcher, browser use agent, and deep analyzer agent.
    • MCP Manager Agent: A unique component that can create tools on demand using its code execution capability. If a needed tool isn't available, the agent can generate it and then discard it when no longer required, representing a highly innovative approach to context management.
  • Skywork Super Agent Platform: Offers a platform for users to interact with agents, create documents, slides, sheets, or even an AI developer.
    • The AI developer uses a multi-agent system to collaborate and accomplish tasks, leveraging their MCP server to run and modify code, similar to other coding agents. An example website created by this AI developer is showcased.

Synthesis and Conclusion

Context engineering is rapidly becoming a critical field as AI systems evolve towards more complex, multi-turn agentic architectures. Effective context management is essential to overcome the limitations of LLM context windows and prevent "context rot." Strategies like compacting, structured note-taking (agentic memory), and sub-agent architectures offer robust solutions. Furthermore, advancements like Claude Sonnet 4.5's context awareness and innovative approaches from platforms like Skywork's Agent Orchestra (with its on-demand tool creation) demonstrate the dynamic and evolving nature of this domain, emphasizing the need for continuous adaptation in agent design and development.
