I'm Building an AI Dark Factory That Ships Its Own Code (Public Experiment)

By Cole Medin

Key Concepts

  • Dark Factory: A software development system where AI agents autonomously manage the entire lifecycle of a codebase (planning, implementing, testing, and deploying to production) without human intervention.
  • Archon: An open-source workflow engine used to create deterministic and repeatable AI coding workflows.
  • Hold-out Pattern: A validation strategy where the testing agent is kept isolated from the development context to prevent sycophancy and bias.
  • Agentic RAG (Retrieval-Augmented Generation): The technique behind the application being built, an AI tutor that answers user questions by retrieving from the creator's YouTube content.
  • Governance Layer: The set of configuration files (mission.md, factory_rules.md, claude.md) that provide the AI with high-level constraints, scope, and operational standards.

1. The Evolution of AI Coding (Levels of Autonomy)

The creator categorizes the use of generative AI in software engineering into six levels (0 through 5), modeled on the autonomy levels used for self-driving cars:

  • Level 0 (Spicy Autocomplete): AI as a reference tool (smarter Stack Overflow). The developer writes all code.
  • Level 1 (Coding Intern): AI handles boilerplate and minor tasks; developer maintains control.
  • Level 2 (Junior Developer): Interactive pair-programming; AI generates significant portions of code.
  • Level 3 (Human-in-the-loop): AI generates the majority of the codebase, but a human reviews every plan and pull request (PR).
  • Level 4 (Unattended Harnesses): AI works on long-term tasks autonomously, with humans reviewing results periodically.
  • Level 5 (Dark Factory): Full autonomy. No human review before production deployment. The engineer manages the system's goals and rules, not the code itself.

2. System Architecture

The Dark Factory is built on three primary pillars:

A. The Governance Layer

To ensure the AI stays within boundaries, three core files are loaded into every agentic context:

  • mission.md: Defines the application's purpose, core capabilities, and strict "out-of-scope" features (e.g., no payments, no support for other channels).
  • factory_rules.md: Operational guidelines, including PR size limits, labeling systems, and quality gates for auto-merging.
  • claude.md: Technical context, including the tech stack, repository layout, and testing standards.
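Loading the three files into every agent context can be sketched as follows. This is a minimal illustration, not the creator's or Archon's actual loader: the file names come from the summary, while `build_governance_context` and the section-header format are assumptions made for the example.

```python
from pathlib import Path

# File names are taken from the summary; everything else here is hypothetical.
GOVERNANCE_FILES = ("mission.md", "factory_rules.md", "claude.md")


def build_governance_context(repo_root: Path) -> str:
    """Concatenate the governance files into one preamble for an agent prompt."""
    sections = []
    for name in GOVERNANCE_FILES:
        path = repo_root / name
        if not path.is_file():
            # Fail loudly: an agent without its constraints should not run.
            raise FileNotFoundError(f"missing governance file: {path}")
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)
```

Raising on a missing file, rather than silently skipping it, reflects the idea that every agent session should start with the full set of constraints.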

B. The Harness (Archon)

Archon acts as the orchestration engine. It allows the creator to:

  • Define deterministic workflows (e.g., triage, implement, validate).
  • Run multiple agentic sessions in parallel, each in its own isolated Git worktree.
  • Route tasks to specific models (the creator uses MiniMax M2.7 for cost-efficiency and high throughput).
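The combination of parallel sessions, isolated worktrees, and per-step model routing might look roughly like the sketch below. It does not use Archon's API, which the summary does not describe. `Task`, `MODEL_FOR_STEP`, `run_parallel`, the branch naming, and the model identifiers are all hypothetical; the only real command is `git worktree add -b <branch> <path>`, which is built here but left for the caller to execute.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Sequence


@dataclass(frozen=True)
class Task:
    issue_number: int
    step: str  # e.g. "triage", "implement", "validate"


# Hypothetical routing table; the model identifiers are placeholders.
DEFAULT_MODEL = "default-model"
MODEL_FOR_STEP = {"implement": "high-throughput-model", "validate": "validator-model"}


def worktree_command(base: Path, task: Task) -> list[str]:
    """Build the git command that gives a task its own branch and directory."""
    branch = f"factory/issue-{task.issue_number}"
    return ["git", "worktree", "add", "-b", branch, str(base / f"issue-{task.issue_number}")]


def run_parallel(
    tasks: Sequence[Task],
    base: Path,
    run_session: Callable[[Task, list[str], str], object],
    max_workers: int = 4,
) -> list[object]:
    """Run one session per task concurrently; results keep the input order."""

    def one(task: Task) -> object:
        model = MODEL_FOR_STEP.get(task.step, DEFAULT_MODEL)
        return run_session(task, worktree_command(base, task), model)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, tasks))
```

Passing `run_session` in as a callable keeps the sketch independent of any particular agent runtime: a real caller would execute the worktree command and start an agent there, while a test can pass a stub.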

C. The Factory Loop

The system operates in a continuous, automated loop:

  1. Triage: A scheduled workflow fetches GitHub issues, uses an agent to reason about each one against mission.md, and either assigns it a priority or rejects it.
  2. Implementation: Approved issues trigger parallel Archon workflows to plan, code, and create PRs.
  3. Validation (The Hold-out Pattern): A separate agent, with no knowledge of the implementation process, tests the user journey using browser automation. This prevents the AI from "lazily" rewriting tests to match flawed code.
  4. Deployment: Once validated, the code is merged directly into the production branch.
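One pass of the loop above can be sketched as follows, with the hold-out boundary made explicit in the types. This is an illustration under stated assumptions, not the creator's implementation: the keyword match in `triage` is a stand-in for the agent's reasoning against mission.md, and `Issue`, `PullRequest`, `ValidationBrief`, `factory_cycle`, and the `implement`/`validate` callables are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Issue:
    number: int
    title: str
    user_journey: str  # what a user should be able to do once the issue is done


@dataclass(frozen=True)
class PullRequest:
    issue_number: int
    diff: str
    preview_url: str


@dataclass(frozen=True)
class ValidationBrief:
    """Everything the hold-out validator may see: no diff, no plan, no tests."""
    preview_url: str
    user_journey: str


def triage(issues: Sequence[Issue], out_of_scope: Sequence[str]):
    """Keyword stand-in for the triage agent: reject issues touching excluded scope."""
    accepted, rejected = [], []
    for issue in issues:
        hit = any(term in issue.title.lower() for term in out_of_scope)
        (rejected if hit else accepted).append(issue)
    return accepted, rejected


def factory_cycle(
    issues: Sequence[Issue],
    out_of_scope: Sequence[str],
    implement: Callable[[Issue], PullRequest],
    validate: Callable[[ValidationBrief], bool],
) -> list[int]:
    """Triage, implement, validate in isolation; return issue numbers cleared to merge."""
    accepted, _ = triage(issues, out_of_scope)
    merged = []
    for issue in accepted:
        pr = implement(issue)
        # The validator receives only the running app and the expected journey.
        brief = ValidationBrief(preview_url=pr.preview_url, user_journey=issue.user_journey)
        if validate(brief):
            merged.append(issue.number)
    return merged
```

Because `ValidationBrief` has no field for the diff or the implementation plan, the validator cannot adapt its checks to whatever the implementing agent wrote, which is the point of the hold-out pattern.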

3. Real-World Applications and Research

  • StrongDM: Cited as a primary inspiration. They successfully implemented a "Dark Factory" model, shipping thousands of lines of code without human review by utilizing strict validation harnesses.
  • Spotify: Mentioned as another entity utilizing background coding agents, though their internal systems remain proprietary.

4. Notable Quotes

  • "The Dark Factory is the ultimate use of generative AI—completely giving the reins of our codebase over to it."
  • "A test that is stored in a codebase can be lazily rewritten to match the code... that is why it is so important to have a key separation [between implementation and validation]."

5. Synthesis and Conclusion

The Dark Factory experiment represents a shift from "AI as a tool" to "AI as a system operator." By combining a strict governance layer with a decoupled validation harness (the hold-out pattern), the creator aims to prove that autonomous software engineering is viable for production-grade applications. The project is being built in public, with the ultimate goal of creating a self-maintaining, RAG-powered AI tutor that accepts and implements user-submitted GitHub issues without human oversight.
