Missions: Multi-Agent Systems That Ship for Days — Luke Alvoeiro, Factory

By AI Engineer

Key Concepts

  • Agentic AI: Systems where autonomous agents perform tasks with minimal human intervention.
  • Missions: A multi-agent framework that orchestrates long-running software development tasks.
  • Validation Contract: A pre-implementation specification defining "done" through assertions.
  • Droid Whispering: The skill of selecting and managing different LLMs for specific roles within an agent ecosystem.
  • Structured Handoffs: A mechanism for passing state and context between agents to prevent information loss.
  • Adversarial Validation: A design principle where validators are independent of the implementation agents to ensure objective quality control.

1. Main Topics and Frameworks

The speaker, Luke Alvoeiro of Factory, argues that the primary bottleneck in modern software engineering is human attention, not model intelligence. As a basis for addressing it, he lays out a taxonomy of five multi-agent frameworks:

  • Delegation: A parent agent spawns sub-agents for specific tasks (e.g., database schema design).
  • Creator-Verifier: Separation of concerns where one agent builds and another (with fresh context) reviews.
  • Direct Communication: Agents interacting without a central coordinator (noted as difficult due to state fragmentation).
  • Negotiation: Agents interacting over shared resources (e.g., APIs) to achieve win-win outcomes.
  • Broadcast: One-to-many communication for status updates and shared constraints.
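
As an illustration of the first pattern in the list above, delegation can be sketched in a few lines. This is a minimal sketch, not Factory's implementation: `Agent`, `SubTask`, `delegate`, and `make_stub_agent` are hypothetical names, and the stub agent only echoes its prompt where a real system would call a model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for an LLM-backed agent: task prompt in, result text out.
Agent = Callable[[str], str]


@dataclass
class SubTask:
    name: str
    prompt: str


def delegate(subtasks: list[SubTask], make_agent: Callable[[str], Agent]) -> dict[str, str]:
    """Parent spawns one fresh sub-agent per subtask and gathers the results."""
    results: dict[str, str] = {}
    for task in subtasks:
        agent = make_agent(task.name)          # new agent, no shared history
        results[task.name] = agent(task.prompt)
    return results


def make_stub_agent(role: str) -> Agent:
    # Stub only: a real sub-agent would call a model here.
    return lambda prompt: f"[{role}] done: {prompt}"


plan = [
    SubTask("schema", "design the database schema"),
    SubTask("api", "draft the REST endpoints"),
]
outputs = delegate(plan, make_stub_agent)
```

The parent keeps only the returned results, so each sub-agent's working context can be discarded once its task is finished.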

2. The "Missions" Architecture

Missions combine the above frameworks into a three-role system designed to run for days:

  • Orchestrator: Handles strategic planning, requirement scoping, and defining the "Validation Contract."
  • Workers: Handle implementation. They operate with "clean context" to avoid accumulated baggage and commit code via Git.
  • Validators: Handle verification. They are "adversarial by design," meaning they have no prior knowledge of the implementation, ensuring unbiased testing.
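
One way to picture "adversarial by design" is as a filter on what crosses from a worker to a validator. The sketch below assumes a hypothetical data model (`WorkerResult`, `ValidatorBrief`, `brief_for_validator`); the talk summary does not describe the actual interfaces, so treat every name as illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerResult:
    """What a worker produces. All field names here are hypothetical."""
    changed_files: dict[str, str]                               # path -> new contents
    reasoning_notes: list[str] = field(default_factory=list)    # worker's private rationale


@dataclass(frozen=True)
class ValidatorBrief:
    """The only things a validator is allowed to see."""
    contract: tuple[str, ...]       # assertions fixed during planning
    changed_files: dict[str, str]   # the artifact under test


def brief_for_validator(contract: list[str], result: WorkerResult) -> ValidatorBrief:
    # Copy the artifact but deliberately drop reasoning_notes, so the
    # validator cannot be anchored by the worker's own justification.
    return ValidatorBrief(contract=tuple(contract), changed_files=dict(result.changed_files))


result = WorkerResult(
    changed_files={"auth.py": "def login(user): ..."},
    reasoning_notes=["Skipped rate limiting; assumed it is out of scope."],
)
brief = brief_for_validator(["login rejects unknown users"], result)
```

Because the brief has no field for the worker's notes, the validator judges the artifact against the contract alone.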

3. Step-by-Step Methodology

  1. Planning: The user defines a goal; the Orchestrator scopes it and creates a plan with milestones and a validation contract.
  2. Execution: Features are executed serially to prevent conflicts and inconsistent architectural decisions.
  3. Validation: After each milestone, two types of validation occur:
    • Scrutiny Validator: Runs tests, type checks, lints, and spawns code-review agents.
    • User Testing Validator: Interacts with the live application (e.g., clicking buttons, filling forms) to ensure end-to-end functional correctness.
  4. Handoff: The worker provides a structured summary of completed tasks, exit codes, and discovered issues, allowing the system to self-heal.
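
The four steps above can be sketched as a single loop. This is an illustrative sketch under stated assumptions: the `Handoff` fields mirror the summary (completed tasks, exit codes, discovered issues), but `run_mission`, `Milestone`, the `max_rounds` cap, and the rule that failures become `fix:` work items are all assumptions made for the example, and the worker and validator are stubs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Handoff:
    feature: str
    completed: list[str]
    exit_codes: dict[str, int]
    discovered_issues: list[str] = field(default_factory=list)


@dataclass
class Milestone:
    name: str
    features: list[str]


Worker = Callable[[str], Handoff]
Validator = Callable[[Milestone], list[str]]   # returns failure messages


def run_mission(milestones, worker, validators, max_rounds=3):
    log = []
    for milestone in milestones:
        pending = list(milestone.features)
        for _ in range(max_rounds):
            # Features run one at a time, never concurrently.
            handoffs = [worker(feature) for feature in pending]
            log.extend(handoffs)
            # Validation runs after the milestone's work is done.
            failures = [msg for validate in validators for msg in validate(milestone)]
            failures += [issue for h in handoffs for issue in h.discovered_issues]
            if not failures:
                break
            # Assumed self-healing rule: each failure becomes new work.
            pending = [f"fix: {msg}" for msg in failures]
    return log


def stub_worker(feature: str) -> Handoff:
    return Handoff(feature=feature, completed=[feature], exit_codes={"tests": 0})


def make_flaky_validator() -> Validator:
    calls = {"n": 0}

    def validate(milestone: Milestone) -> list[str]:
        calls["n"] += 1   # fail on the first pass only
        return ["login button unresponsive"] if calls["n"] == 1 else []

    return validate


log = run_mission(
    [Milestone("auth", ["signup", "login"])], stub_worker, [make_flaky_validator()]
)
```

In this run the validator rejects the milestone once, so the log ends with a third handoff for the generated fix task before validation passes.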

4. Key Arguments and Evidence

  • Serial vs. Parallel Execution: Running features in parallel looks faster, but agents conflict with one another and duplicate work. Running features serially, with parallelism limited to read-only steps inside a feature (e.g., research), yields more correct results.
  • Validation Contracts: Tests written after implementation often just confirm existing decisions. By defining assertions before coding, the system prevents "drift" over long-running tasks.
  • Model-Agnosticism: No single model is best at planning, coding, and validation. Using different models for different roles (e.g., a slow, reasoning model for planning; a fast, creative model for coding) creates a structural advantage.
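
The validation-contract argument above can be made concrete with a toy example. This is a sketch under assumptions: `Assertion`, `evaluate`, and the slug-generation feature are invented for illustration and are not taken from the talk. The point it shows is ordering: the assertions exist before any implementation does, and every later implementation is scored against the same fixed list.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Assertion:
    id: str
    description: str
    check: Callable[[Any], bool]   # receives the system under test


def evaluate(contract: list[Assertion], system: Any) -> dict[str, bool]:
    """Score a system against every assertion in the contract."""
    return {a.id: a.check(system) for a in contract}


# Step 1: the contract is written during planning, before any code exists.
contract = [
    Assertion("A1", "lowercases input", lambda f: f("Hello") == "hello"),
    Assertion("A2", "replaces spaces with hyphens", lambda f: f("a b") == "a-b"),
]


# Step 2: implementations arrive later and are judged against it.
def slugify(text: str) -> str:
    return text.lower().replace(" ", "-")


def drifted_slugify(text: str) -> str:
    return text.lower()   # forgot the hyphen rule


passing = evaluate(contract, slugify)
drifted = evaluate(contract, drifted_slugify)
```

A test written after `drifted_slugify` would likely encode its behavior as correct; the pre-written assertion `A2` flags it instead.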

5. Notable Quotes

  • "The bottleneck in software engineering nowadays is not intelligence. It's now limited by human attention."
  • "Tests written after implementation don't catch bugs. They confirm decisions."
  • "Validation is adversarial by design."

6. Real-World Application & Data

  • Production Performance: A Slack clone project showed that 50% of the final codebase consisted of tests, with 90% code coverage.
  • Efficiency: Missions can run for 16–30 days. Most wall-clock time is spent on "User Testing" (interacting with the live app) rather than on token generation.
  • Use Cases: Prototyping, internal tool building, large-scale refactors, and modernizing legacy codebases.

7. Synthesis and Conclusion

The "Missions" framework shifts the human's role from active coder to project manager. By combining structured handoffs, adversarial validation, and a model-agnostic architecture, developers can scale their output significantly. The system is designed to improve as models improve, because its core logic lives in prompts and skills rather than in rigid, hard-coded state machines. The ultimate takeaway is that by offloading execution to an ecosystem of agents, human engineers can focus on high-level architecture and product strategy.
