"Don’t Outsource Your Thinking" to your Agent

Coding Agents: Maximizing Benefits & Avoiding Pitfalls

Key Concepts:

Context Rot: Degradation of model performance due to irrelevant or excessive information in the context window.
CLAUDE.md/AGENTS.md: Onboarding documents for AI agents, containing project style guides, gotchas, and desired thinking approaches.
Incremental Development: Building features in small, testable chunks rather than large, monolithic prompts.
Sub-agents: Specialized AI agents designed for specific tasks within a larger workflow.
Test-Driven Development (TDD): A development approach where tests are written before the code itself.
Outsourcing Thinking: Relying on the AI to make high-level decisions or fill in significant gaps in understanding.

I. The Double-Edged Sword of Coding Agents

Coding agents, like Claude, offer significant productivity gains: Anthropic claims that 90% of Claude Code's own code is AI-generated, and work that once took weeks can shrink to days, or hours to minutes. However, unsupervised use can produce problematic codebases marked by duplicate logic, inconsistent naming, and no overall architecture. The core argument is that you cannot outsource your thinking; all effective use of these tools stems from this principle. Effective prompting isn’t about simply getting answers, but about learning to learn while interacting with the system.

II. Mitigating Context Rot & Establishing Workflow

Context rot is a critical issue where excessive or irrelevant information in the context window degrades model performance. The model may forget information, ignore questions, or struggle to prioritize relevant data. Solutions include regularly restarting sessions or utilizing commands like “catchup” in Claude to refocus the agent’s attention.
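
One way to picture the mitigation is as a budget on what stays in the context window. The sketch below is a hypothetical helper, not part of any agent's API: it always keeps pinned items (such as the plan) and then adds the most recent messages until a rough word budget is used up. The names `Message` and `trim_context`, and the use of word counts as a stand-in for tokens, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    pinned: bool = False  # pinned items (e.g. the plan) are always kept


def trim_context(messages: list[Message], budget_words: int) -> list[Message]:
    """Keep pinned messages, then the newest unpinned ones that fit the budget."""
    used = sum(len(m.text.split()) for m in messages if m.pinned)
    keep = {i for i, m in enumerate(messages) if m.pinned}
    # Walk backwards so the most recent messages are considered first.
    for i in range(len(messages) - 1, -1, -1):
        m = messages[i]
        if m.pinned:
            continue
        cost = len(m.text.split())
        if used + cost > budget_words:
            break  # stop at the first message that does not fit
        used += cost
        keep.add(i)
    # Return the survivors in their original order.
    return [m for i, m in enumerate(messages) if i in keep]
```

Restarting a session is the blunt version of the same idea: everything unpinned is dropped and only the written plan is carried forward.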

A recommended workflow emphasizes planning before coding:

Brainstorming: Initiate a new session by allowing the agent to ask clarifying questions.
Specification Document: Compile the brainstorming results into a detailed specifications document.
Step-by-Step Plan: Generate a granular plan breaking down the task into smaller, manageable steps.
Coding: Only after a plan is in place should coding begin.

This approach, while initially slower, reduces ambiguity and ensures that the developer and the agent are aligned. Without a plan, the result is often “slop”: poorly structured, difficult-to-maintain code. Because the plan is written down rather than held only in the conversation, the /clear command can be used to restart the session without losing it.
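
The planning steps above can be sketched as a small data structure that is rendered to a file, so the plan outlives any single session. This is a minimal illustration under assumptions: the `Plan` and `Step` classes and the checklist layout are hypothetical, not a format any particular agent requires.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    description: str
    done: bool = False


@dataclass
class Plan:
    goal: str
    steps: list[Step] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the plan as a checklist that can be saved to a file."""
        lines = [f"# Plan: {self.goal}", ""]
        for n, step in enumerate(self.steps, start=1):
            mark = "x" if step.done else " "
            lines.append(f"- [{mark}] {n}. {step.description}")
        return "\n".join(lines) + "\n"

    def next_step(self) -> Optional[Step]:
        """Return the first unfinished step, or None when the plan is complete."""
        return next((s for s in self.steps if not s.done), None)
```

Writing the output of `to_markdown()` to disk and pointing a fresh session at that file gives the agent a single, unambiguous next step to work on.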

III. Incremental Development & The Role of the Developer

Incremental development, favoring small chunks over large prompts, is crucial. Following a test-driven development (TDD) approach – code a step, test it, then move to the next – leverages the LLM’s strength in contained tasks.
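
A single increment might look like the sketch below: the test is written first and fixes the expected behaviour, and only then is the smallest passing implementation added. The `slugify` function is a made-up example task, chosen only because it is small and self-contained.

```python
import re
import unittest


# Step 1: the test is written first and pins down the expected behaviour.
class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_punctuation_and_extra_spaces(self):
        self.assertEqual(slugify("  Plan first, code later!  "), "plan-first-code-later")


# Step 2: only then is the smallest implementation that passes written.
def slugify(title: str) -> str:
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
```

Saved as a module, the tests can be run with `python -m unittest`. The agent gets a concrete pass/fail signal for this step before the next one is started.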

A key perspective is that developers must adopt a role similar to a team lead or technical manager, even without extensive programming expertise. Simon Willison is quoted: “Think of an LLM pair programmer as overconfident and prone to mistakes. It writes bugs with complete conviction.” This means treating every code snippet as if it were written by a junior developer: it requires thorough review, testing, and understanding. Accountability for every line of AI-generated code is paramount.

IV. The Importance of Testing & Grounding the Model

Test-driven development is essential, as coding agents operate “blind” without tests, assuming everything functions correctly. The quote, “Nobody gets promoted for writing unit tests, but unit tests never don't save your life,” highlights their critical importance.

Grounding the model involves using CLAUDE.md or AGENTS.md files as onboarding documents. These files should contain project style guides, known issues (“gotchas”), and guidance on how the agent should approach tasks. The /init command automatically generates a starting point. These files should be treated as living documents, updated with new issues and resolutions encountered during development.
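
Keeping the onboarding file "living" can be as simple as appending each new issue and its fix as it is discovered. The helper below is a hypothetical sketch, not a feature of any agent: the `record_gotcha` name and the "## Gotchas" heading are assumptions, and it presumes that section is the last one in the file.

```python
from pathlib import Path


def record_gotcha(doc: Path, issue: str, resolution: str) -> None:
    """Append an issue and its resolution under a 'Gotchas' heading.

    Assumes the Gotchas section, if present, is the last section in the file.
    """
    text = doc.read_text(encoding="utf-8") if doc.exists() else ""
    if "## Gotchas" not in text:
        # Create the section at the end of the existing content.
        text = text.rstrip("\n") + "\n\n## Gotchas\n"
    text += f"- {issue}: {resolution}\n"
    doc.write_text(text.lstrip("\n"), encoding="utf-8")
```

For example, `record_gotcha(Path("CLAUDE.md"), "Flaky login test", "mock the clock")` adds one bullet that every later session will see.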

V. Limitations & Strategic Use of Agents

Sub-agents are not always the optimal solution. They are unsuitable for exploratory work where clear completion criteria are lacking. Architectural decisions require human judgment, not simply iterative refinement. Security-critical code always requires human review, as passing tests does not guarantee security. Currently, AI excels at implementation but struggles with design.

Multi-agent systems can introduce complexity, creating a “combinatorial explosion” of potential failure points.
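
A rough back-of-the-envelope illustration of that growth, under two simplifying assumptions that are not from the talk: every pair of agents can hand work to each other, and each agent succeeds independently with the same probability.

```python
from math import comb


def pairwise_links(agents: int) -> int:
    """Number of distinct agent-to-agent handoffs if every pair can interact."""
    return comb(agents, 2)


def pipeline_success(per_agent_success: float, agents: int) -> float:
    """Chance that a chain succeeds when every agent must succeed independently."""
    return per_agent_success ** agents


for n in (2, 4, 8):
    print(n, pairwise_links(n), round(pipeline_success(0.95, n), 2))
```

With a 95% per-agent success rate, going from 2 to 8 agents raises the number of possible handoffs from 1 to 28 and drops the end-to-end success rate from about 0.9 to about 0.66.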

VI. Git & Model Diversity as Safety Nets

Git is a vital safety net. Frequent commits (ideally after each small, working step) allow for easy rollback to working versions and simplify debugging. However, never commit code you cannot explain.
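
The commit habit can be tied to the test habit with a small wrapper that commits only when the tests pass. This is a sketch under assumptions: the `checkpoint` helper and its injectable `run` parameter are hypothetical, while `git add -A` and `git commit -m` are standard Git commands.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def checkpoint(message: str, test_cmd: Sequence[str], run: Runner = run_command) -> bool:
    """Commit all changes only if the test command succeeds."""
    if run(test_cmd) != 0:
        return False  # tests failed: leave the working tree uncommitted
    if run(["git", "add", "-A"]) != 0:
        return False
    return run(["git", "commit", "-m", message]) == 0
```

A call such as `checkpoint("Add reset token model", ["python", "-m", "unittest"])` leaves a known-good commit to roll back to after each step. It does not replace reading the diff: the rule about never committing code you cannot explain still applies.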

Using multiple models (e.g., Claude for coding, Gemini for review, Codex for PR review) is recommended. Different models have different strengths and weaknesses. Reviewing code with a model from a different provider than the one used for development (e.g., using an OpenAI model to review code written by Claude) helps catch blind spots. This mirrors the practice of not reviewing your own pull requests.
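
The "different provider" rule is easy to encode if reviews are scripted. The function below is a hypothetical sketch: the `pick_reviewer` name and the model-to-provider mapping are illustrative assumptions, not a real API.

```python
def pick_reviewer(author_provider: str, models: dict[str, str]) -> str:
    """Return a model whose provider differs from the one that wrote the code.

    `models` maps a model name to its provider.
    """
    for name, provider in models.items():
        if provider != author_provider:
            return name
    raise ValueError("no reviewer available from a different provider")
```

Given `{"claude": "anthropic", "gemini": "google", "codex": "openai"}` and code written by an Anthropic model, the first eligible reviewer is `"gemini"`.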

VII. Synthesis & Key Takeaways

The effectiveness of coding agents hinges on a virtuous cycle: solid fundamentals lead to better AI results, which drive further learning and improved fundamentals. AI amplifies existing skills; it doesn’t replace them. The secret isn’t in the prompts, but in the process – planning, thinking, collaborating, providing context, and continuously learning.

Key takeaways:

Don't outsource thinking; learn to learn.
Plan first – design before coding.
Embrace the role of a senior engineer/technical manager.
Prioritize testing and utilize Git frequently.
Ground your model with comprehensive CLAUDE.md/AGENTS.md documentation.
The process, not just the prompts, is key to success.