Replacing 12K LoC with a 200 LoC Skill — David Gomes, Cursor
By AI Engineer
Key Concepts
- Git Worktrees: Separate checkouts of a repository that allow parallel development without interference.
- Agentic Skills: Modular, markdown-based instructions that define agent behavior.
- Sub-agents: Secondary AI agents spawned by a parent agent to perform specific, isolated tasks.
- Best-of-N: A feature that runs several AI models on the same task in parallel so their results can be compared.
- Evals (Evaluations): Automated testing frameworks used to measure model performance and adherence to constraints.
- RL (Reinforcement Learning): A training method in which a model improves by receiving reward signals for the outcomes of its actions.
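To make the worktree concept concrete, here is a small sketch of the Git side in Python. It shows the command that creates a worktree on a new branch, plus a parser for the output of git worktree list --porcelain, which an agent or test harness could use to see which checkouts exist. The helper names and example paths are illustrative assumptions, not Cursor's code.

```python
from dataclasses import dataclass
from typing import Optional


def worktree_add_command(repo: str, path: str, branch: str) -> list[str]:
    """Build `git -C <repo> worktree add -b <branch> <path>` as an argv list."""
    return ["git", "-C", repo, "worktree", "add", "-b", branch, path]


@dataclass
class Worktree:
    path: str
    head: Optional[str] = None
    branch: Optional[str] = None  # e.g. "refs/heads/main"; None if detached
    detached: bool = False


def parse_worktree_porcelain(output: str) -> list[Worktree]:
    """Parse `git worktree list --porcelain`: one blank-line-separated record per worktree."""
    worktrees: list[Worktree] = []
    current: Optional[Worktree] = None
    for line in output.splitlines():
        key, _, value = line.partition(" ")
        if key == "worktree":
            # A new record starts; flush the previous one if no blank line preceded it.
            if current is not None:
                worktrees.append(current)
            current = Worktree(path=value)
        elif not line.strip():
            # A blank line ends the current record.
            if current is not None:
                worktrees.append(current)
            current = None
        elif current is None:
            continue  # attribute outside any record: ignore it
        elif key == "HEAD":
            current.head = value
        elif key == "branch":
            current.branch = value
        elif key == "detached":
            current.detached = True
        # Other attributes (bare, locked, prunable) are ignored in this sketch.
    if current is not None:
        worktrees.append(current)
    return worktrees
```

The argv list can be passed straight to subprocess.run(cmd, check=True); building it separately keeps the sketch testable without a Git repository on hand.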
1. The Shift from Hard-Coded Features to Markdown Skills
The speaker details a significant architectural refactor in the Cursor application, where a complex, 15,000-line feature set related to Git worktrees was replaced by lightweight, markdown-based "skills."
- The Old Approach: Previously, Cursor managed worktrees through extensive, hard-coded logic, including manual lifecycle management, strict scoping, and complex cleanup routines.
- The New Approach: By building on two existing primitives, Agent Skills and Sub-agents, the team removed approximately 4,000 lines of code from specific features and moved that logic into simple markdown instructions.
- Implementation: Users trigger these skills with slash commands (e.g., /worktree, /best-of-n). The commands act as dynamic prompts that can be updated server-side without requiring a client application update.
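A minimal sketch of how a slash command could expand into a markdown skill, assuming a SKILL.md-style file with simple name/description frontmatter. The skill text, WORKTREE_SKILL, and both helper functions are hypothetical; the talk does not show Cursor's actual skill format or loader. Because a skill is just text, a client could fetch it from a server at runtime, which is what allows the prompt to change without a client release.

```python
from dataclasses import dataclass

# Hypothetical skill text. In the design described above, content like this
# would be served remotely so it can change without shipping a new client build.
WORKTREE_SKILL = """---
name: worktree
description: Do the task in an isolated git worktree.
---
1. Create a git worktree for this task and run the repo's setup script.
2. Make every edit inside that worktree; never touch the main checkout.
"""


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill(markdown: str) -> Skill:
    """Split simple `key: value` frontmatter from the markdown body."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill must start with a frontmatter block")
    end = lines.index("---", 1)  # closing delimiter of the frontmatter
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(meta["name"], meta.get("description", ""), body)


def expand_slash_command(user_message: str, skills: dict[str, Skill]) -> str:
    """Turn '/name rest of message' into the skill body plus the user's request."""
    if not user_message.startswith("/"):
        return user_message
    command, _, rest = user_message[1:].partition(" ")
    skill = skills.get(command)
    if skill is None:
        return user_message  # unknown command: pass the message through unchanged
    return f"{skill.body}\n\nUser request: {rest.strip()}"
```

The point of the design is that the client only needs a generic "expand and send" step; everything worktree-specific lives in the text of the skill.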
2. Core Functionalities and Frameworks
- Worktree Isolation: The /worktree command instructs an agent to create a Git worktree, run setup scripts, and, crucially, stay scoped to that directory.
- Best-of-N (Model Competition): A parent agent spawns multiple sub-agents, each in its own worktree and each using a different model (e.g., Claude Opus, GPT, Grok). The parent agent then aggregates the results, critiques them, and helps the user merge the best parts of each implementation.
- Cross-Platform Compatibility: The markdown instructions include logic to handle OS-specific setup scripts for Windows, Linux, and macOS.
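The parent/sub-agent flow in the model-competition bullet above can be sketched as a parallel fan-out. This is a hypothetical harness, not Cursor's implementation: run_subagent stands in for whatever actually drives a model inside a worktree, the ../worktrees/<model> layout is an assumption, and pick_best replaces the parent agent's LLM-based critique with a plain numeric score so the example stays runnable.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    model: str
    worktree: str
    diff: str  # whatever the sub-agent reports back, e.g. a unified diff


def best_of_n(
    task: str,
    models: list[str],
    run_subagent: Callable[[str, str, str], str],  # hypothetical: (model, worktree, task) -> diff
    worktree_root: str = "../worktrees",  # hypothetical directory layout
) -> list[Attempt]:
    """Run one sub-agent per model, each in its own worktree, in parallel."""

    def attempt(model: str) -> Attempt:
        # A separate directory per model keeps the attempts from overwriting each other.
        worktree = f"{worktree_root}/{model}"
        return Attempt(model, worktree, run_subagent(model, worktree, task))

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        # pool.map yields results in the same order as `models`.
        return list(pool.map(attempt, models))


def pick_best(attempts: list[Attempt], score: Callable[[Attempt], float]) -> Attempt:
    """Stand-in for the parent agent's critique: keep the highest-scoring attempt."""
    return max(attempts, key=score)
```

In the feature as described, the parent reads every attempt and helps merge the best parts of each, so picking a single winner is a simplification; the fan-out and one-worktree-per-attempt structure is the part this sketch is meant to show.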
3. Pros and Cons of the Refactor
Advantages:
- Maintenance: Drastically reduced code surface area.
- Flexibility: Users can initiate worktrees mid-chat rather than being forced to set them up at the start.
- Multi-Repo Support: The new implementation supports multi-repository setups (e.g., separate front-end and back-end repos), which were disabled under the old implementation.
- Enhanced Judging: The parent agent now has deeper context to synthesize code from multiple sub-agents.
Challenges:
- "Vibes-based" Reliability: Because the agent is now instructed via prompts rather than hard-coded constraints, it occasionally "escapes" its worktree or deviates from instructions, especially during long sessions.
- Discoverability: Removing the UI dropdowns in favor of slash commands makes the feature less visible to non-power users.
- Perceived Latency: The user watches the agent create the worktree in real time, which can feel slower than a pre-warmed environment.
4. Methodology for Improvement: Evals and RL
To address the reliability issues, the team is implementing:
- Automated Evals: Using the Braintrust platform, the team runs the Cursor CLI headlessly as a test harness. They use two primary scorers: one verifies that the agent did its work in the target worktree, and a "reverse" scorer checks that no work was done in the primary checkout.
- Reinforcement Learning (RL): The team is incorporating worktree-specific tasks into their RL pipeline for the "Composer" model to ensure future iterations are natively better at maintaining directory isolation.
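The two scorers described above reduce to simple checks on the state of each checkout. The sketch below is a hypothetical version, not the team's actual scorers: it reads git status --porcelain (format v1) output from the target worktree and from the primary checkout and returns scores in [0, 1]. isolation_reward is an assumed way the same checks could be combined into a single RL reward; the talk does not describe the reward function.

```python
def changed_paths(porcelain: str) -> set[str]:
    """Paths listed by `git status --porcelain` (format v1)."""
    paths = set()
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status columns and the separating space
        if " -> " in path:  # renames are reported as "old -> new"
            path = path.split(" -> ", 1)[1]
        # Note: git quotes paths with unusual characters; this sketch keeps them as-is.
        paths.add(path)
    return paths


def worked_in_worktree(worktree_status: str) -> float:
    """Scorer 1: did the agent change anything inside the target worktree?"""
    return 1.0 if changed_paths(worktree_status) else 0.0


def left_primary_untouched(primary_status: str) -> float:
    """Reverse scorer: the primary checkout must show no changes at all."""
    return 0.0 if changed_paths(primary_status) else 1.0


def isolation_reward(worktree_status: str, primary_status: str) -> float:
    """Hypothetical combined signal: full credit only when both checks pass."""
    return worked_in_worktree(worktree_status) * left_primary_untouched(primary_status)
```

Multiplying the two scores means an agent that does good work but also leaks edits into the main checkout earns nothing, which matches the goal of treating an escape from the worktree as a hard failure.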
5. Future Roadmap
- Native Integration: Cursor 3.0 will introduce a more "agentic" UI, which the team believes is the natural home for a more robust, native worktree implementation.
- Beyond Git: The team is researching parallelization primitives that do not rely on Git worktrees, aiming to support non-Git repositories and improve performance (as Git worktrees can be disk-heavy and slow to initialize).
Synthesis
The transition from hard-coded features to markdown-driven skills represents a shift toward "agentic" software architecture. While this approach sacrifices some of the rigid safety of hard-coded constraints, it gains significant agility, maintainability, and cross-repo capability. The future of this feature lies in balancing this flexibility with rigorous automated evaluation and model training to ensure that "prompt-based" constraints become as reliable as traditional code-based ones.