Beyond Code Coverage: Functionality Testing with Playwright — Marlene Mhangami, Microsoft
By AI Engineer
Key Concepts
- AI-Augmented Development: The integration of AI agents (like GitHub Copilot) into the software development lifecycle.
- Red-Green-Refactor TDD: A methodology where developers write a failing test (Red), write code to pass the test (Green), and then improve the code quality (Refactor).
- Playwright: An open-source framework for end-to-end (E2E) browser testing.
- Agentic Coding: Using AI agents to automate tasks like planning, generating, and healing code/tests.
- Codebase Entropy: The degradation of code quality caused by unchecked AI-generated code.
- MCP (Model Context Protocol): A standard for connecting AI assistants to systems, data, and tools.
1. The State of AI-Driven Development
Marlene, a Senior Developer Advocate at Microsoft and GitHub, highlights that 2025 was GitHub’s most active year, with over 1 billion commits. Projections for 2026 suggest an acceleration to approximately 14 billion commits annually, driven significantly by AI-assisted coding.
The Productivity Paradox:
- Research Finding: A Stanford University study of 120,000 developers indicates that AI productivity gains are not automatic.
- Key Argument: AI amplifies existing conditions. In clean codebases, AI boosts productivity; in messy codebases, it amplifies "entropy."
- Case Study: A team using AI in an unchecked manner saw an increase in Pull Requests (PRs) but a decrease in code quality, leading to excessive refactoring time and a net productivity gain of only 1%.
2. Methodologies for Maintaining Clean Code
To mitigate the risks of AI-generated "entropy," Marlene advocates for standardized practices, specifically Test-Driven Development (TDD).
- The Red-Green-Refactor Framework:
  - Red: Write a failing test based on a feature request.
  - Green: Generate code (via AI) to pass the test as quickly as possible.
  - Refactor: Manually improve code quality and bring the code in line with best practices.
- Critique of Traditional TDD: Citing DHH (creator of Ruby on Rails), Marlene notes that over-indexing on unit tests can lead to testing "implementation details" rather than system behavior. If a test breaks simply because a method was renamed, it is a poor test.
- Behavior-Driven Testing: The focus should be on the end-user experience and stable contracts (APIs) rather than internal code structure.
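The Red and Green steps, and the idea of asserting on behavior rather than internals, can be sketched in a few lines. This is a minimal, dependency-free illustration, not code from the talk: the `Product` type, the sample catalog, and `searchProducts` are hypothetical names invented for the example.

```typescript
// Hypothetical product type and sample data, invented for illustration.
interface Product {
  name: string;
  price: number;
}

const catalog: Product[] = [
  { name: "Furby", price: 40 },
  { name: "Furby Deluxe", price: 90 },
  { name: "Tamagotchi", price: 20 },
];

// Green: the simplest implementation that makes the behavior test pass.
// Its internals are free to change during the Refactor step.
function searchProducts(
  items: Product[],
  query: string,
  maxPrice: number = Infinity
): Product[] {
  const needle = query.trim().toLowerCase();
  return items.filter(
    (item) =>
      item.name.toLowerCase().indexOf(needle) !== -1 && item.price <= maxPrice
  );
}

// Red: written first, from the feature request
// "users can search by name and cap the price".
// It asserts only on observable output, so renaming internals cannot break it.
function testSearchReturnsMatchesUnderPriceCap(): void {
  const names = searchProducts(catalog, "furby", 50).map((item) => item.name);
  const expected = ["Furby"];
  if (JSON.stringify(names) !== JSON.stringify(expected)) {
    throw new Error(
      `expected ${JSON.stringify(expected)}, got ${JSON.stringify(names)}`
    );
  }
}

testSearchReturnsMatchesUnderPriceCap();
```

A test that instead asserted that `searchProducts` calls a particular private helper would fail after a rename even though users see no change, which is the kind of test the critique above targets.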
3. Implementing Playwright for AI-Assisted Testing
Playwright is presented as the solution for validating functionality rather than just code coverage.
- Playwright Agents: A specialized toolset that installs three files, one per agent, to manage the testing lifecycle:
  - Planner: Explores the application and produces a test plan of scenarios to cover.
  - Generator: Turns the plan into functional Playwright tests.
  - Healer: Runs the tests and automatically repairs the ones that fail.
- Workflow Integration:
  - Work IQ: A Microsoft tool used to pull requirements directly from Microsoft 365 (Outlook/Teams) into the terminal.
  - Execution: Developers can run tests in "headed" (visible browser) or "headless" (background) modes.
  - Validation: Playwright simulates real user interactions (e.g., searching for a "Furby" or filtering by price) to ensure the application behaves as expected.
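A functional test of the kind described above might look like the following sketch. It is an assumption-laden illustration rather than the test shown in the talk: the inline storefront page, its labels, and the product data are hypothetical and exist only so the example needs no running server. In a real project the test would call `page.goto()` against the application instead of `page.setContent()`.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical stand-in for the application under test.
const storefrontHtml = `
  <label>Search <input id="q" type="search"></label>
  <label>Max price <input id="max" type="number"></label>
  <ul id="results"></ul>
  <script>
    const products = [
      { name: 'Furby', price: 40 },
      { name: 'Furby Deluxe', price: 90 },
      { name: 'Tamagotchi', price: 20 },
    ];
    function render() {
      const q = document.getElementById('q').value.toLowerCase();
      const max = Number(document.getElementById('max').value) || Infinity;
      document.getElementById('results').innerHTML = products
        .filter(p => p.name.toLowerCase().includes(q) && p.price <= max)
        .map(p => '<li>' + p.name + '</li>')
        .join('');
    }
    document.addEventListener('input', render);
    render();
  </script>
`;

test('shopper can search for a Furby and filter by price', async ({ page }) => {
  await page.setContent(storefrontHtml);

  // Act as a user would: type into the labelled search box.
  await page.getByLabel('Search').fill('furby');
  await expect(page.getByRole('listitem')).toHaveText(['Furby', 'Furby Deluxe']);

  // Narrow the results with the price filter.
  await page.getByLabel('Max price').fill('50');
  await expect(page.getByRole('listitem')).toHaveText(['Furby']);
});
```

Running `npx playwright test` executes this headlessly by default; adding `--headed` opens a visible browser window. The test locates elements by label and role rather than by CSS class or internal ID, so it keeps passing when the markup is refactored but the user-facing behavior is unchanged.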
4. Best Practices for AI-Assisted Development
- Visual Documentation: Attach screenshots generated by Playwright to PRs to provide visual proof of functionality.
- Commit Discipline: Always commit code before asking an AI agent to refactor or "heal" it, ensuring a rollback point exists.
- Granularity: Generate one test per feature to maintain clarity and modularity.
- State Management: For complex state-heavy applications, use the agent.md instructions provided by Playwright Agents to handle specialized logic.
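One way to get screenshots for every test run, so they can be attached to a PR, is through the Playwright configuration file. The fragment below is a sketch under assumptions: the `./tests` directory and the choice of reporter are illustrative, not settings taken from the talk.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests', // assumed location of the generated tests
  use: {
    headless: true, // run in the background; set to false for headed runs
    screenshot: 'on', // capture a screenshot at the end of every test
    trace: 'on-first-retry', // record a trace when a failed test is retried
  },
  // The HTML report lists each test with its screenshot attached.
  reporter: [['html', { open: 'never' }]],
});
```

With `screenshot: 'on'`, the images are written to the test output directory after each run, where they can be picked up and attached to the pull request manually or by a CI step.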
5. Notable Quotes
- "Clean codebases amplify AI gains and AI productivity, while unchecked AI in a codebase is going to amplify entropy."
- "In this new world, what we want to focus on is the behavior. So we want to focus on a feature."
Synthesis and Conclusion
The rapid growth of AI-generated code necessitates a shift in how developers manage their workflows. The primary takeaway is that AI is a force multiplier, not a replacement for engineering discipline. By adopting a "Behavior-Driven TDD" approach—using Playwright to validate user-facing functionality and reserving the "Refactor" phase for human oversight—developers can harness the speed of AI agents without sacrificing the long-term health and maintainability of their codebases.