LLM codegen fails and how to stop 'em — Danilo Campos, PostHog

By AI Engineer

Key Concepts

  • Autonomous Coding Agents: AI systems capable of performing software development tasks (e.g., integrations) with minimal human intervention.
  • Model Rot: The phenomenon where an AI model’s training data becomes outdated, leading to inaccurate code generation for modern software environments.
  • RAG (Retrieval-Augmented Generation): A technique to provide LLMs with fresh, external data (like documentation) to improve accuracy.
  • Model Airplanes: Thin "simulacrum" applications that show the AI the ideal architecture and patterns for an integration.
  • Breadcrumbing: A methodology of breaking down complex tasks into sequential, manageable steps to prevent the agent from improvising poorly.
  • Inference-Time Interrogation: The practice of asking the AI agent to reflect on its own performance after a task to identify errors or bottlenecks.
  • LLM Gateway: A service layer that manages token usage and provides controlled access to AI models.
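The RAG entry above can be illustrated with a minimal sketch. It assumes a hypothetical `Doc` record and a naive keyword ranking; the talk summary does not describe how PostHog actually selects or ranks documentation, so every name here is an illustration only.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """A markdown documentation page (hypothetical structure)."""
    title: str
    language: str  # e.g. "python" or "javascript"
    markdown: str


def select_docs(docs: list[Doc], language: str, query: str, limit: int = 2) -> list[Doc]:
    """Keep docs for the project's language, ranked by naive keyword overlap."""
    terms = set(query.lower().split())
    candidates = [d for d in docs if d.language == language]

    def score(doc: Doc) -> int:
        return len(terms & set(doc.markdown.lower().split()))

    return sorted(candidates, key=score, reverse=True)[:limit]


def build_prompt(task: str, docs: list[Doc]) -> str:
    """Place fresh documentation ahead of the task in the context window."""
    sections = [f"## {d.title}\n{d.markdown}" for d in docs]
    return "Reference documentation:\n\n" + "\n\n".join(sections) + f"\n\nTask: {task}"
```

Filtering by language before ranking is one simple way to avoid handing JavaScript instructions to a Python project.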

1. Main Topics and Key Points

The speaker, Danilo Campos from PostHog, discusses the development of the "PostHog Wizard," an autonomous agent that automates software integrations. The Wizard has 15,000 monthly users, so the primary challenge is ensuring that the agent writes high-quality, accurate code rather than "hallucinating" APIs or creating messy, unsupportable architectures.

  • The Problem of Model Rot: Models are snapshots of past data. To combat this, the team uses RAG, injecting fresh markdown documentation into the context window so the agent can reference current APIs.
  • The "Model Airplane" Strategy: To keep the agent from choosing odd architectural paths, the team maintains "model airplanes": thin, functional versions of real applications. These serve as templates, showing the agent exactly where and how to place event tracking code.
  • Breadcrumbing for Control: Instead of giving the agent a massive, complex prompt, the team uses a sequential approach:
    1. Identify files with business value (e.g., login, Stripe).
    2. List interesting events to track.
    3. Implement the integration using the established context.
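The three breadcrumb steps above can be sketched as a small loop. This is a minimal illustration, assuming the agent is any callable that maps a prompt string to a response string; the step wording and the `fake_agent` stand-in are hypothetical, not the Wizard's actual prompts.

```python
from typing import Callable

# Each breadcrumb is one narrow instruction; the wording is illustrative.
BREADCRUMBS = [
    "List the files in this project that carry business value (e.g., login, Stripe).",
    "Using the files identified so far, list interesting events to track.",
    "Implement the integration for the listed events, following the context so far.",
]


def run_breadcrumbs(agent: Callable[[str], str], steps: list[str]) -> list[str]:
    """Run steps in order, feeding each step the transcript of earlier steps."""
    transcript: list[str] = []
    for number, step in enumerate(steps, start=1):
        context = "\n".join(transcript)
        prompt = f"{context}\n\nStep {number}: {step}".strip()
        answer = agent(prompt)
        transcript.append(f"Step {number}: {step}\nResult: {answer}")
    return transcript


def fake_agent(prompt: str) -> str:
    """Stand-in for a real model call; reports how many prior results it saw."""
    return f"saw {prompt.count('Result:')} earlier result(s)"
```

Calling `run_breadcrumbs(fake_agent, BREADCRUMBS)` shows that each step only ever receives one new instruction plus the accumulated results, which is the point of the technique: the agent never has to improvise the whole plan at once.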

2. Important Examples and Real-World Applications

  • PostHog Wizard: A CLI tool that automates the integration of PostHog into user projects.
  • Bluesky/Twitter Feedback: The speaker uses unprompted positive social media feedback as a signal of the agent's success.
  • The "Auth" Example: In their model airplanes, authentication (OAuth) is simplified to accept any input, allowing the agent to focus on the placement of tracking events rather than the complexity of the auth logic.
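A model-airplane login along these lines might look like the sketch below. It is an assumption-laden illustration: the `capture` callback and the `user_logged_in` event name are hypothetical placeholders, not PostHog SDK calls or names taken from the talk.

```python
from typing import Callable

# Hypothetical event sink: (user id, event name) -> None.
Capture = Callable[[str, str], None]


def login(username: str, password: str, capture: Capture) -> dict:
    """Model-airplane login: any credentials succeed, so event placement is all there is to read."""
    # Real credential checks are deliberately omitted in this thin example app.
    session = {"user": username, "authenticated": True}
    # The tracking call sits right after the business-relevant moment.
    capture(username, "user_logged_in")
    return session
```

Because the auth logic is a stub, the only pattern left for the agent to copy is where the tracking call goes relative to the successful login.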

3. Methodologies and Frameworks

  • Inference-Time Interrogation: At the end of every run, the system asks the agent: "What could we have done better to set you up for success?" The answers revealed critical issues such as missing tool permissions and language mismatches (e.g., providing JavaScript instructions for a Python project).
  • Tool Lockdown: To prevent security "shenanigans," the agent is restricted from reading entire .env files. Instead, it is provided with specific, limited tools to only check for the existence of a key or write a new value.
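The tool lockdown described above can be sketched as two narrow functions that stand in for general file access. This is a minimal sketch with hypothetical function names, assuming simple `KEY=value` lines; it does not handle `export` prefixes or quoting, and it is not the Wizard's actual tool code.

```python
from pathlib import Path


def _defines(line: str, key: str) -> bool:
    """True if the line is a `KEY=value` assignment for `key`."""
    name, sep, _ = line.partition("=")
    return bool(sep) and name.strip() == key


def env_key_exists(env_path: str, key: str) -> bool:
    """Report whether `key` is defined, without returning any value."""
    path = Path(env_path)
    if not path.exists():
        return False
    return any(_defines(line, key) for line in path.read_text().splitlines())


def env_set_key(env_path: str, key: str, value: str) -> None:
    """Add or overwrite `key`; other lines are preserved but never exposed."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    kept = [line for line in lines if not _defines(line, key)]
    kept.append(f"{key}={value}")
    path.write_text("\n".join(kept) + "\n")
```

Neither function returns file contents, so an agent limited to these two tools can confirm that a key is configured and write a new one without ever seeing existing secrets.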

4. Key Arguments

  • Code as a Depreciating Asset: The speaker argues that writing "clever" code is less valuable than writing high-quality plain text prose. As models improve, they will be better at interpreting clear documentation than navigating complex, hand-coded logic.
  • Avoid Over-Scaffolding: Don't try to control every move of the agent. Instead, provide the right information in the right sequence and let the agent’s "octopus-like" ability to maneuver solve the problem.
  • Human Error is the Biggest Threat: The speaker emphasizes that developers are the primary source of agent failure due to "fragmentary" memory and contradictory instructions.

5. Notable Quotes

  • "The PostHog wizard skips two hours of misery that you will never get back in your life and it hands it back to you as 8 minutes of pseudo entertainment."
  • "An agent is an octopus; it can wriggle, it can squeeze into tight corners, it can maneuver itself around problems. You do not want to overconstrain the agent."
  • "Code has always been a depreciating asset."

6. Synthesis and Conclusion

The success of the PostHog Wizard relies on shifting from a "code-heavy" mindset to a "context-heavy" one. By providing the agent with fresh documentation (RAG), architectural templates (Model Airplanes), and a sequential task structure (Breadcrumbing), the team has automated complex integrations. The most critical takeaway is the importance of inference-time interrogation: treating the AI as a user and asking it how to improve the process lets developers identify and fix the "human errors" that typically cause agent failure.
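Inference-time interrogation can be sketched as one extra call at the end of a run. The reflection question is the one quoted in the talk; the `agent` callable, the prompt layout, and the `feedback_log` list are assumptions made for illustration.

```python
from typing import Callable

# The question quoted in the talk, asked at the end of every run.
REFLECTION_QUESTION = "What could we have done better to set you up for success?"


def run_with_interrogation(
    agent: Callable[[str], str], task: str, feedback_log: list[str]
) -> str:
    """Run the task, then ask the agent to reflect and record its answer."""
    result = agent(task)
    followup = f"Task: {task}\nYour result: {result}\n\n{REFLECTION_QUESTION}"
    feedback_log.append(agent(followup))
    return result
```

Collecting the answers in one place lets a human review them across runs and spot recurring complaints, such as missing tool permissions or documentation in the wrong language.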
