AI Fixes My Code Better than Me Now?! (Here's How)
By Cole Medin
Key Concepts
- PIV Loop: A workflow for working with AI coding assistants, consisting of three phases: Planning, Implementation, and Validation.
- AI Coding Assistants: Tools that help developers write code, such as GitHub Copilot, CodeWhisperer, or Cursor.
- End-to-End Validation: A comprehensive testing process that simulates real user interactions and covers all aspects of an application.
- Meta Command: A command that analyzes a codebase and generates another command (e.g., validate.md) to perform a specific task.
- Living and Breathing System: An AI-driven validation process that continuously checks and adapts to the codebase.
- GitHub CLI: A command-line interface for interacting with GitHub, used here for creating issues and pull requests.
- Docker: A platform for developing, shipping, and running applications in containers, used for testing and isolation.
- Remote Coding Agent: An AI system that can perform coding tasks remotely, integrated with platforms like Telegram and GitHub.
- Unit Testing: Testing individual components or functions of the code.
- Linting: Analyzing code for stylistic errors and potential bugs.
- Type Checking: Verifying that data types are used correctly in the code.
Comprehensive Summary of AI-Powered End-to-End Validation
Introduction to AI Validation and the PIV Loop
The speaker was initially skeptical of trusting Large Language Models (LLMs) to implement and validate code, viewing them primarily as planning tools. Their perspective has since changed, and they now use AI coding assistants for all of their coding. A crucial element of their workflow is the PIV Loop (Planning, Implementation, Validation). Validation, in this context, means empowering the coding assistant to verify its own work after implementation. AI agents commonly check their work with unit tests and linting, but the speaker argues that end-to-end validation is the most undervalued part of AI coding workflows, because those checks fall short of comprehensive, real-world testing of how the application is actually used.
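To make the loop concrete, here is a minimal Python sketch of the PIV Loop as control flow. The `plan`, `implement`, and `validate` callables, the `ValidationResult` type, and the retry limit are all hypothetical stand-ins for what an AI coding assistant does at each stage; the only point illustrated is that validation failures are fed back into the next implementation pass.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ValidationResult:
    passed: bool
    failures: List[str] = field(default_factory=list)


def piv_loop(
    plan: Callable[[str], str],
    implement: Callable[[str, List[str]], str],
    validate: Callable[[str], ValidationResult],
    request: str,
    max_attempts: int = 3,
) -> Tuple[str, ValidationResult, int]:
    """Plan once, then alternate implementation and validation until checks pass."""
    the_plan = plan(request)  # Planning: turn the request into a concrete plan
    feedback: List[str] = []
    change, result = "", ValidationResult(passed=False)
    for attempt in range(1, max_attempts + 1):
        change = implement(the_plan, feedback)  # Implementation, informed by prior failures
        result = validate(change)  # Validation: the assistant checks its own work
        if result.passed:
            return change, result, attempt
        feedback = result.failures  # feed failures into the next pass
    return change, result, max_attempts
```

With a real assistant, `validate` would run the kind of end-to-end checks this summary describes rather than only unit tests.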
The Ultimate Meta Command for End-to-End Validation
Driven by the need for more robust validation, the speaker embarked on a challenge to push the boundaries of AI validation. This led to the creation of a process that enables a coding agent to build a "living and breathing system" for validating an entire application end-to-end, effectively replacing extensive manual testing of user flows. This process is encapsulated in a meta command that can be run on any codebase with any AI coding assistant.
Key Functionality of the Meta Command:
- Deep Research: Analyzes the project to understand its structure, user flows, and edge cases.
- End-to-End Validation Strategy: Determines how to comprehensively validate the application.
- Generation of validate.md: Creates a new command file (validate.md) that, when executed, initiates the full end-to-end validation cycle.
This meta command is designed to be easy to add to any codebase. In coding assistants that support slash commands (such as Claude Code), it can be invoked as a slash command (the "ultimate validate" command). For assistants without slash commands (e.g., Cursor), the command's contents can be pasted in and executed as instructions. Once validate.md has been generated, it can be run as its own command (e.g., /validate) to start the validation process. The speaker acknowledges that this might seem over-engineered, but reports that the results have been highly impressive and have saved significant time in development cycles.
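The meta command relies on the coding assistant's own research, so it cannot be reproduced as a fixed script. As a rough illustration of the kind of file it produces, the Python sketch below replaces the "deep research" step with a simple, hypothetical heuristic: it looks for a few marker files (`pyproject.toml`, `package.json`, `Dockerfile`) and emits an ordered list of validation phases as the text of a validate.md. The marker-to-phase mapping and the tool names in it are assumptions for illustration only.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from marker files to validation phases.
# The real meta command derives phases from deep research, not a fixed table.
PHASES_BY_MARKER: Dict[str, List[str]] = {
    "pyproject.toml": [
        "Run type checking (e.g. mypy)",
        "Run linting (e.g. ruff)",
        "Run unit tests (e.g. pytest)",
    ],
    "package.json": [
        "Run type checking (e.g. tsc --noEmit)",
        "Run linting (e.g. eslint)",
        "Run unit tests (e.g. npm test)",
    ],
    "Dockerfile": [
        "Build and start the app in Docker, then exercise each user flow against it",
    ],
}


def generate_validate_md(project_root: Path) -> str:
    """Return the text of a validate.md command listing phases in order."""
    lines = ["# Validate", "", "Run every phase in order and report pass/fail for each.", ""]
    step = 1
    for marker, phases in PHASES_BY_MARKER.items():
        if (project_root / marker).exists():  # only include phases that apply to this project
            for phase in phases:
                lines.append(f"{step}. {phase}")
                step += 1
    return "\n".join(lines) + "\n"
```

Writing the result is one call, e.g. `Path("validate.md").write_text(generate_validate_md(Path(".")))`. A validate.md produced by the actual meta command would be far more detailed, spelling out specific user flows and edge cases.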
Genesis of the Validation Process: The Remote Coding Agent Project
The development of this validation system was spurred by the speaker's work on their remote coding agent. The project, envisioned as a custom, extensible alternative to platforms like Claude Code, Codex, or Factory, integrates with various platforms (Telegram, GitHub, Slack) and coding assistants (Claude Code, Codex, OpenCode). The speaker plans to share this project in a live stream on November 29th.
During the development of this remote agent, the speaker faced a significant validation burden. Testing new features involved rigorous checks across different integrations (Telegram, GitHub) and coding assistants. This overwhelming validation requirement prompted the search for a way to automate and delegate this process to the coding assistant itself, while still maintaining confidence in the application's functionality.
Evolution of the Validation Prompt
The journey to the ultimate meta command began with a simpler prompt, aiming to guide the AI towards comprehensive validation. An example of an early prompt is provided:
Initial Prompt Structure:
- Problem Statement: "I need help validating my project. There are just so many different edge cases and user flows to test."
- Codebase Analysis Request: "Analyze the codebase deeply to understand all the different flows, edge cases, how the application is structured, how we can really validate things end to end."
- Validation Strategy and Tooling: "Think about how you can validate this app end to end and what tools you would use specifically. For example, this is a Docker application. We're using GitHub. I want you to think about how you could use the GitHub CLI. Think about all the tools you can use to validate things as a user would."
- Mimicking User Behavior: "If I were to do things end to end myself, how can you mimic that? When I say end to end, I mean end to end do not hold back."
- Command Generation: "Generate a massive validate document so that we have this command that goes for a really long time going through all these edge cases. Finally, I want you to help me create a command that does all the validation end to end. I want it to be extremely comprehensive and also include any of the other testing we're already doing here, like you know, the unit testing, the linting, the type checking."
The key insight that unlocked this next level of validation was instructing the AI to "act as a user" and leverage tools like the GitHub CLI for more than just basic unit testing. The goal was to create a comprehensive validation command that mimicked rigorous manual testing.
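One way to "act as a user" is to drive GitHub through the GitHub CLI the way a person would, by opening an issue that mentions the agent. In the sketch below, `gh issue create` with `--repo`, `--title`, and `--body` is a real GitHub CLI command; the helper names, repository, and agent handle are hypothetical.

```python
import subprocess
from typing import List


def issue_create_cmd(repo: str, agent_handle: str, request: str) -> List[str]:
    """Build a `gh issue create` call whose body @-mentions the agent, as a user would."""
    body = f"@{agent_handle} {request}"
    return ["gh", "issue", "create", "--repo", repo, "--title", request, "--body", body]


def create_issue(repo: str, agent_handle: str, request: str) -> str:
    """Run the command and return gh's stdout (the new issue's URL on success)."""
    proc = subprocess.run(
        issue_create_cmd(repo, agent_handle, request),
        capture_output=True,
        text=True,
        check=True,  # raise if gh exits non-zero (e.g. not authenticated)
    )
    return proc.stdout.strip()
```

For example, `create_issue("my-org/agent-test-repo", "remote-agent", "Add a footer section to the README")` would file the kind of issue a human tester would, with both names being placeholders. Separating command construction from execution keeps the command easy to inspect and test without touching GitHub.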
Detailed Breakdown of the validate.md Command Execution
The speaker showcases a highly accurate and consistent validate.md command that has been run multiple times on their project. The execution process is structured and comprehensive:
- High-Level Testing:
  - Type checking
  - Linting
  - Unit testing
- Test Repository Setup:
  - Creates a new GitHub repository for testing remote agents.
  - Cleans up previous test environments.
  - Sets up the GitHub repository and webhook for receiving mentions.
- Telegram Simulation:
  - Mimics testing for Telegram functionality.
  - Injects API endpoints to simulate Telegram behavior.
- Docker Testing (Test Adapter):
  - Performs validation within Docker containers.
- Database Validation:
  - Checks whether conversations are being stored correctly in the database.
- GitHub Integration Testing:
  - Creates an issue and mentions the agent.
  - Ensures a pull request is created and validates its content.
  - Tests the ability to kick off different coding agents in parallel (optional, but observed to work well).
  - Tests custom commands for priming, planning, and executing (related to the PIV loop).
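The phases above can be orchestrated with a small runner. This Python sketch executes each phase as a command and records pass/fail from its exit code. The phase names and commands (`mypy`, `ruff`, `pytest`) are hypothetical examples for a Python project, not the commands from the speaker's validate.md; Docker, Telegram, database, and GitHub phases would be appended as further entries.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PhaseResult:
    name: str
    passed: bool
    detail: str


# Hypothetical phases for a Python project; end-to-end phases would be appended.
EXAMPLE_PHASES: List[Tuple[str, List[str]]] = [
    ("Type checking", ["mypy", "."]),
    ("Linting", ["ruff", "check", "."]),
    ("Unit tests", ["pytest", "-q"]),
]


def run_phases(
    phases: List[Tuple[str, List[str]]], stop_on_failure: bool = False
) -> List[PhaseResult]:
    """Run each phase's command in order; a zero exit code counts as a pass."""
    results: List[PhaseResult] = []
    for name, cmd in phases:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
            passed = proc.returncode == 0
            detail = (proc.stdout + proc.stderr).strip()[-500:]  # keep only the tail of the output
        except OSError as exc:  # e.g. the tool is not installed
            passed, detail = False, f"could not run {cmd[0]}: {exc}"
        results.append(PhaseResult(name, passed, detail))
        if stop_on_failure and not passed:
            break
    return results
```

Continuing past failures (the default here) yields a complete picture of every phase, while `stop_on_failure=True` saves time when an early phase such as type checking already fails.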
The validation process generates a detailed report, often with numerous green checkmarks, indicating successful execution. While the validation is non-deterministic due to the nature of AI, it has successfully identified real bugs that the speaker might have missed during manual testing. The speaker emphasizes that this AI-driven validation, combined with their own testing, is significantly more effective than manual testing alone.
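A report of this kind can be rendered from per-phase results. The sketch below assumes each result is a hypothetical `(name, passed, detail)` tuple and emits a checkmark or cross per phase, attaching the detail to failures so a human can follow up on anything a non-deterministic run flagged.

```python
from typing import List, Tuple


def render_report(results: List[Tuple[str, bool, str]]) -> str:
    """Render (name, passed, detail) tuples as a checkmark report with a summary line."""
    lines = ["Validation report", ""]
    for name, passed, detail in results:
        if passed:
            lines.append(f"✅ {name}")
        else:
            # Attach the failure detail so a human can re-check it
            lines.append(f"❌ {name}: {detail}" if detail else f"❌ {name}")
    passed_count = sum(1 for _, passed, _ in results if passed)
    lines += ["", f"{passed_count}/{len(results)} phases passed"]
    return "\n".join(lines)
```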
Real-World Application and Impact
The speaker demonstrates the effectiveness of this validation process by showing examples from their private repositories. They highlight how the validate process automatically creates issues and pull requests. For instance, a simple issue like "add a footer section to the readme" triggers a complex automated workflow:
- Loading Commands: The system loads custom commands.
- PIV Loop Execution: The agent goes through the priming, planning, and executing stages of the PIV loop.
- Codebase Analysis: The agent reads the README and analyzes the project.
- Feature Planning: The agent plans the feature.
- Comment Verification: The system checks comments to ensure alignment with the request.
- Feature Branch Creation: A feature branch is created.
- Pull Request Creation: A pull request is generated.
- Final Validation: The pull request itself is then validated.
This entire process, which would typically take 10-20 minutes of manual effort, can now be initiated by the speaker while they attend to other tasks.
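The last step of this workflow, confirming that the agent actually opened a pull request for the issue, can also be checked from the command line. The sketch below builds a `gh pr list` call with `--json` output (a real GitHub CLI option) and searches the returned pull requests for one whose body references the issue number. The assumption that the agent writes a reference such as "Fixes #12" into the PR body is hypothetical, as are the helper names; adapt the match to whatever convention your agent follows.

```python
import json
import re
import subprocess
from typing import List, Optional


def pr_list_cmd(repo: str) -> List[str]:
    """Build a `gh pr list` call that returns pull requests as JSON."""
    return ["gh", "pr", "list", "--repo", repo, "--state", "all",
            "--json", "number,title,headRefName,body"]


def find_pr_for_issue(pr_json: str, issue_number: int) -> Optional[dict]:
    """Return the first PR whose body references #<issue_number>, else None."""
    pattern = re.compile(rf"#{issue_number}\b")  # \b stops #1 from matching #12
    for pr in json.loads(pr_json):
        if pattern.search(pr.get("body") or ""):
            return pr
    return None


def verify_pr(repo: str, issue_number: int) -> Optional[dict]:
    """Query GitHub via gh and look for the agent's pull request."""
    proc = subprocess.run(pr_list_cmd(repo), capture_output=True, text=True, check=True)
    return find_pr_for_issue(proc.stdout, issue_number)
```

The returned PR's `headRefName` can then be checked to confirm the work landed on a feature branch rather than the default branch.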
Conclusion and Call to Action
The speaker strongly encourages viewers to try the provided meta command, even if they already have a validation system in place. They believe this approach offers an unprecedented level of comprehensiveness and consistency, having personally been "blown away" by its capabilities. The command has been generalized to work on any codebase. Viewers are invited to share their experiences in the comments. The speaker also requests likes and subscriptions for future content on AI coding.