Replacing 12K LoC with a 200 LoC Skill — David Gomes, Cursor

By AI Engineer

Key Concepts

  • Git Worktrees: Separate checkouts of a repository that allow parallel development without interference.
  • Agentic Skills: Modular, markdown-based instructions that define agent behavior.
  • Sub-agents: Secondary AI agents spawned by a parent agent to perform specific, isolated tasks.
  • Best-of-N: A feature that runs multiple AI models on the same task simultaneously so their results can be compared.
  • Evals (Evaluations): Automated testing frameworks used to measure model performance and adherence to constraints.
  • RL (Reinforcement Learning): A training methodology used to improve model behavior based on feedback loops.

1. The Shift from Hard-Coded Features to Markdown Skills

The speaker describes a significant architectural refactor in the Cursor application. A large, complex feature set for Git worktrees (roughly 12K lines of code, per the talk's title) was replaced by lightweight, markdown-based "skills."

  • The Old Approach: Previously, Cursor managed worktrees through extensive, hard-coded logic, including manual lifecycle management, strict scoping, and complex cleanup routines.
  • The New Approach: By leveraging existing primitives—Agent Skills and Sub-agents—the team reduced the codebase by approximately 4,000 lines for specific features, moving the logic into simple markdown instructions.
  • Implementation: Users trigger these via slash commands (e.g., /worktree, /best-of-n). These commands act as dynamic prompts that can be updated server-side without requiring a client-side application update.
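The "dynamic prompt" idea can be illustrated with a minimal sketch. It assumes a skill is a markdown file with simple `key: value` frontmatter, and that a slash command expands into that skill's body plus the user's request. The skill text, the frontmatter keys, and the names `parse_skill` and `expand_slash_command` are hypothetical, and this is not Cursor's actual implementation.

```python
# Hypothetical sketch: slash commands resolved to markdown "skills".
# The skill text, frontmatter keys, and function names are illustrative
# assumptions, not Cursor's actual format or API.

WORKTREE_SKILL = """---
name: worktree
description: Do the task inside a fresh git worktree.
---
1. Create a git worktree for this task.
2. Run the repository's setup script for the current OS.
3. Make every edit inside the worktree directory only.
"""


def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a markdown skill into simple key: value frontmatter and a body."""
    lines = text.splitlines()
    meta: dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return meta, text
    # Locate the closing '---' of the frontmatter block, if there is one.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return meta, text
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def expand_slash_command(message: str, skills: dict[str, str]) -> str:
    """Turn '/name rest of message' into the skill body plus the user's request."""
    if not message.startswith("/"):
        return message
    command, _, request = message[1:].partition(" ")
    if command not in skills:
        return message  # unknown command: pass the message through unchanged
    _, body = parse_skill(skills[command])
    return f"{body}\nUser request: {request.strip()}"


if __name__ == "__main__":
    print(expand_slash_command("/worktree fix the login bug", {"worktree": WORKTREE_SKILL}))
```

In this design the agent's behavior lives entirely in the `skills` mapping. If that mapping were fetched from a server, a new skill version could ship without any change to `expand_slash_command`. That mirrors the server-side update property described above.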

2. Core Functionalities and Frameworks

  • Worktree Isolation: The /worktree command instructs an agent to create a git worktree, execute setup scripts, and—crucially—remain scoped to that directory.
  • Best-of-N (Model Competition): This framework allows a parent agent to spawn multiple sub-agents, each in its own worktree, using different models (e.g., Claude Opus, GPT, Grok). The parent agent then aggregates the results, provides a critique, and helps the user merge the best parts of each implementation.
  • Cross-Platform Compatibility: The markdown instructions include logic to handle OS-specific setup scripts for Windows, Linux, and macOS.
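The steps these skills ask the agent to perform can be sketched as ordinary commands. Each task, or each competing model, gets its own branch and worktree, and a setup script is chosen per OS. This is a minimal sketch under stated assumptions: the `agent/` branch prefix, the `.worktrees/` directory (which would need to be git-ignored), the setup-script names, and all function names are hypothetical. Only `git worktree add -b <branch> <path>` is real Git syntax.

```python
# Hypothetical sketch of what the worktree and model-competition skills ask
# an agent to do. Branch prefix, .worktrees/ location, and setup-script names
# are illustrative assumptions.
from __future__ import annotations

import re
import subprocess
import sys


def slug(text: str) -> str:
    """Lowercase a label and keep only characters that are safe in branch names."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def worktree_add_argv(repo: str, task: str, model: str | None = None) -> list[str]:
    """Build `git -C <repo> worktree add -b <branch> <path>` for one task.

    The path is relative, so with -C it resolves inside <repo>/.worktrees/.
    """
    name = slug(task) if model is None else f"{slug(task)}-{slug(model)}"
    return ["git", "-C", repo, "worktree", "add", "-b", f"agent/{name}", f".worktrees/{name}"]


def setup_script_for(platform: str) -> str:
    """Pick an OS-specific setup script (script names are hypothetical)."""
    if platform.startswith("win"):
        return "setup.ps1"
    if platform == "darwin":
        return "setup-macos.sh"
    return "setup-linux.sh"


def plan_model_competition(repo: str, task: str, models: list[str]) -> list[list[str]]:
    """One worktree per model, so each sub-agent works in isolation."""
    return [worktree_add_argv(repo, task, m) for m in models]


def run(argv: list[str]) -> None:
    """Execute a planned git command; raises CalledProcessError on failure."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the plan only; call run(argv) to actually create the worktrees.
    for argv in plan_model_competition(".", "Fix login bug", ["Claude Opus", "GPT", "Grok"]):
        print(" ".join(argv))
    print("setup:", setup_script_for(sys.platform))
```

Because each model writes to a distinct branch and directory, the parent agent can later diff the branches against each other when it critiques and merges the results.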

3. Pros and Cons of the Refactor

Advantages:

  • Maintenance: Drastically reduced code surface area.
  • Flexibility: Users can initiate worktrees mid-chat rather than being forced to set them up at the start.
  • Multi-Repo Support: The new implementation handles multi-repository setups (e.g., separate front-end and back-end repos), a configuration for which the previous worktree feature was disabled.
  • Enhanced Judging: The parent agent now has deeper context to synthesize code from multiple sub-agents.

Challenges:

  • "Vibes-based" Reliability: Because the agent is now instructed via prompts rather than hard-coded constraints, it occasionally "escapes" its worktree or deviates from instructions, especially during long sessions.
  • Discoverability: Removing the UI dropdowns in favor of slash commands makes the feature less visible to non-power users.
  • Perceived Latency: The user sees the agent creating the worktree in real-time, which can feel slower than a pre-warmed environment.

4. Methodology for Improvement: Evals and RL

To address the reliability issues, the team is implementing:

  • Automated Evals: Using the Braintrust platform, the team runs headless Cursor CLI tests. They use two primary scorers: one to verify the agent performed work in the target worktree, and a "reverse" scorer to ensure no work was performed in the primary checkout.
  • Reinforcement Learning (RL): The team is incorporating worktree-specific tasks into their RL pipeline for the "Composer" model to ensure future iterations are natively better at maintaining directory isolation.
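The two scorers can be sketched as plain functions over the list of files an agent run changed. This is a minimal sketch assuming the changed paths are already collected as normalized absolute paths (for example, from `git status` in each checkout). The function names and the 0.0/1.0 scoring are hypothetical, and the code does not use Braintrust's SDK.

```python
# Hypothetical sketch of the two worktree scorers as plain functions.
# Collecting the changed-file list, and these function names, are assumptions;
# this is not Braintrust API code. Paths are assumed normalized (no "..").
from pathlib import PurePosixPath


def _inside(path: str, root: str) -> bool:
    """True if `path` is `root` itself or lies under it."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def worked_in_worktree(changed: list[str], worktree: str) -> float:
    """1.0 if the agent changed something and every change is in the worktree."""
    if not changed:
        return 0.0
    return 1.0 if all(_inside(f, worktree) for f in changed) else 0.0


def left_primary_untouched(changed: list[str], primary: str, worktree: str) -> float:
    """'Reverse' scorer: 1.0 if no change landed in the primary checkout.

    Paths under the worktree are excluded in case the worktree directory is
    nested inside the primary checkout.
    """
    leaked = [f for f in changed if _inside(f, primary) and not _inside(f, worktree)]
    return 0.0 if leaked else 1.0


if __name__ == "__main__":
    wt = "/repo/.worktrees/fix-login-bug"
    changes = ["/repo/.worktrees/fix-login-bug/src/login.py", "/repo/src/login.py"]
    print(worked_in_worktree(changes, wt), left_primary_untouched(changes, "/repo", wt))
```

Keeping the two checks separate is useful. An agent that edits both checkouts fails both scorers, while an agent that does nothing passes the reverse scorer but fails the first one.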

5. Future Roadmap

  • Native Integration: Cursor 3.0 will introduce a more "agentic" UI, which the team believes is the natural home for a more robust, native worktree implementation.
  • Beyond Git: The team is researching parallelization primitives that do not rely on Git worktrees, aiming to support non-Git repositories and improve performance (as Git worktrees can be disk-heavy and slow to initialize).

Synthesis

The transition from hard-coded features to markdown-driven skills represents a shift toward "agentic" software architecture. While this approach sacrifices some of the rigid safety of hard-coded constraints, it gains significant agility, maintainability, and cross-repo capability. The future of this feature lies in balancing this flexibility with rigorous automated evaluation and model training to ensure that "prompt-based" constraints become as reliable as traditional code-based ones.