Skills at Scale — Nick Nisi and Zack Proser, WorkOS

By AI Engineer

Key Concepts

  • Skills: Discrete, portable units of work (markdown files + optional scripts) that encode specific instructions, constraints, and workflows for AI agents.
  • Context Window Management: The practice of using "progressive disclosure" to load only necessary information (e.g., specific rubrics or migration guides) rather than bloating the AI's memory with global files like claude.md.
  • Deterministic Interleaving: Using scripts (e.g., bash commands) within skills to inject real-time, accurate data into a non-deterministic LLM conversation.
  • Confidence Scoring: A methodology where an agent evaluates its own understanding of a task and requests further clarification from the user until a specific confidence threshold is met.
  • Skill Routing: Using the description field in a skill's front matter to allow the LLM to automatically determine when a specific skill is relevant to the current task.
  • Evaluation Frameworks: Using automated rubrics or "skill builder" tools to test whether a skill improves or degrades AI performance compared to a baseline.
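
To make the evaluation idea concrete, here is a minimal sketch of a rubric comparison between a baseline answer and an answer produced with a skill loaded. The Check type, the score and compare helpers, and the two rubric entries are all hypothetical; the talk does not specify how its rubrics are implemented, and a real rubric would often be judged by an LLM rather than by string checks.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    """One rubric item: a label plus a predicate over the agent's output."""
    name: str
    passes: Callable[[str], bool]


def score(output: str, rubric: list[Check]) -> float:
    """Return the fraction of rubric checks the output satisfies (0.0 to 1.0)."""
    if not rubric:
        return 0.0
    return sum(1 for check in rubric if check.passes(output)) / len(rubric)


def compare(baseline: str, with_skill: str, rubric: list[Check]) -> dict[str, float]:
    """Score both outputs; a negative delta means the skill degraded the result."""
    base = score(baseline, rubric)
    skill = score(with_skill, rubric)
    return {"baseline": base, "with_skill": skill, "delta": skill - base}


# Hypothetical rubric, loosely modeled on the constraints quoted in the summary.
RUBRIC = [
    Check("cites a commit hash",
          lambda text: re.search(r"\b[0-9a-f]{7,40}\b", text) is not None),
    Check("avoids vague wording",
          lambda text: "some issues" not in text.lower()),
]
```

Running compare on the same prompt with and without the skill gives a single delta to track across edits, which is the comparison against a baseline that the summary describes.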

1. Main Topics and Key Points

  • The Problem with Generic Agents: Agents often start from zero, requiring repetitive context loading. While claude.md or agents.md files provide global memory, they often lead to context bloat and lack portability across different projects.
  • Skills as a Solution: Skills act as modular, composable units. They can be as simple as a 30-line markdown file or as complex as a folder containing scripts, images, and references.
  • Portability: Skills can be shared across teams via git-based plugin architectures or by zipping folders for use in desktop applications.
  • The Dev Loop: The workflow involves: Edit skill → Save → Invoke → Evaluate output → Refine.
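
The zip route to portability can be sketched in a few lines. This is an assumption-heavy illustration: package_skill is a hypothetical helper, and the layout a given desktop application expects inside the archive is not stated in the talk. The sketch simply keeps the skill folder as the top-level entry of the zip.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def package_skill(skill_dir: str, out_dir: str) -> Path:
    """Zip a skill folder so the folder itself is the archive's top-level entry."""
    src = Path(skill_dir).resolve()
    # Refuse to package a folder that has no skill file (name matched case-insensitively).
    if not any(p.name.lower() == "skill.md" for p in src.iterdir()):
        raise FileNotFoundError(f"no skill.md found in {src}")
    dest = Path(out_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(
        base_name=str(dest / src.name),  # ".zip" is appended by make_archive
        format="zip",
        root_dir=str(src.parent),        # archive paths are relative to this directory
        base_dir=src.name,               # only this folder is included
    )
    return Path(archive)
```

Scripts, references, and images inside the folder travel with the markdown file, which matches the folder-based view of skills described above.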

2. Step-by-Step Methodology: Building a Skill

  1. Define Front Matter: Create a SKILL.md file whose front matter contains a name and a specific, detailed description. The description is the "routing rule" that tells the AI when to trigger the skill.
  2. Codify Constraints: Instead of scripting every step, state firm rules the output must satisfy, both prohibitions and requirements (e.g., "Never be vague," "Always cite git commit references").
  3. Inject Determinism: Use the ! (bang) syntax followed by a backtick-wrapped command to execute scripts (e.g., !`git log -n 10`). The command's output is injected into the conversation, so the LLM works with real data rather than speculating.
  4. Progressive Disclosure: Use references to external markdown files (e.g., testing.md) that are only loaded when the AI is performing a specific task, keeping the context window clean.
  5. Iterate with Evals: Use the built-in "Skill Builder" or custom evaluation scripts to compare performance with and without the skill.
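
Steps 1 and 3 can be imitated outside any agent to see what the model ends up reading. The sketch below is not how an agent harness implements skills; it is a rough stand-in under stated assumptions. The repo-roast skill text, the split_front_matter and expand_bang_commands helpers, and the flat key: value front-matter parsing (not full YAML) are all hypothetical.

```python
from __future__ import annotations

import re
import subprocess
from typing import Callable

# Matches the bang-plus-backticks form, capturing the command between the backticks.
BANG = re.compile(r"!`([^`]+)`")

# Hypothetical skill file: front matter for routing, then constraints and a command.
SKILL = """---
name: repo-roast
description: Use when the user asks for a repository health review.
---
Never be vague. Always cite git commit references.

Recent commits:
!`git log -n 10 --oneline`
"""


def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (front-matter fields, body).

    Only flat `key: value` lines are handled; this is not a YAML parser.
    """
    if not text.startswith("---\n"):
        return {}, text
    header, sep, body = text[4:].partition("\n---\n")
    if not sep:
        return {}, text
    fields: dict[str, str] = {}
    for line in header.splitlines():
        key, colon, value = line.partition(":")
        if colon:
            fields[key.strip()] = value.strip()
    return fields, body


def run_shell(command: str) -> str:
    """Run a shell command and return its trimmed standard output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()


def expand_bang_commands(body: str, run: Callable[[str], str] = run_shell) -> str:
    """Replace each bang command with the text returned by `run` for it."""
    return BANG.sub(lambda match: run(match.group(1)), body)
```

Calling split_front_matter(SKILL) yields the name and description that routing depends on, and expand_bang_commands(body) run inside a git repository substitutes the actual log for the placeholder. Passing a custom run function makes the expansion testable without executing anything.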

3. Real-World Applications

  • Repo Roast: A workshop project that analyzes repository health, identifies stale TODOs, and detects "bus factor" risks using git logs.
  • Recruiting/HR: Automating candidate report generation by pulling data from Slack, Notion, and internal software.
  • Creative Media: Using the "Remotion" skill to generate video demos from git history or the "Nano Banana" skill to generate and animate images from text prompts.
  • Workflow Automation: Monitoring Slack for new requests and automatically creating tickets in Linear to prevent context switching.
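
As an illustration of the kind of deterministic signal a project like Repo Roast could feed to the model, here is one simple way to estimate bus factor from git history. The talk does not say how the workshop computes it; the commits_per_author and bus_factor helpers and the 50% threshold are assumptions. The input is the output of git log --format=%an, which prints one author name per commit.

```python
from __future__ import annotations

from collections import Counter


def commits_per_author(log_output: str) -> Counter:
    """Count commits per author from `git log --format=%an` output."""
    return Counter(line.strip() for line in log_output.splitlines() if line.strip())


def bus_factor(log_output: str, threshold: float = 0.5) -> int:
    """Smallest number of top authors whose commits exceed `threshold` of all commits.

    Returns 0 for an empty history. A result of 1 means a single person
    accounts for more than the threshold share of the commits.
    """
    counts = commits_per_author(log_output)
    total = sum(counts.values())
    covered = 0
    for rank, (_, commits) in enumerate(counts.most_common(), start=1):
        covered += commits
        if covered > threshold * total:
            return rank
    return 0
```

Commit counts are a crude proxy for knowledge of a codebase, so a number like this is better treated as a prompt for the agent to investigate than as a verdict.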

4. Key Arguments and Perspectives

  • "Don't Repeat Yourself" (DRY) for Agents: Skills allow developers to encode their specific coding conventions and project constraints once, ensuring consistency across a team.
  • The Value of "Thinking": The speakers argue that the value of confidence scoring isn't in the "math" (which is fuzzy), but in the iterative loop it forces between the human and the AI, which clarifies the user's own requirements.
  • Context is Gold: The speakers emphasize that even failed conversations are valuable. They suggest mining past logs to identify where the AI struggled, then building a skill to solve that specific friction point.
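
The confidence-scoring loop the speakers describe can be sketched as a plain control loop. Everything here is hypothetical: the assess callback stands in for the LLM's fuzzy self-assessment, ask_user stands in for the human, and the 0.9 threshold and five-round cap are arbitrary choices, since the talk only mentions "a specific confidence threshold."

```python
from __future__ import annotations

from typing import Callable


def clarify_until_confident(
    task: str,
    assess: Callable[[str], tuple[float, str]],
    ask_user: Callable[[str], str],
    threshold: float = 0.9,
    max_rounds: int = 5,
) -> tuple[str, float]:
    """Ask clarifying questions until confidence reaches the threshold.

    `assess` takes the accumulated context and returns (confidence in [0, 1],
    next clarifying question). Returns the final context and confidence.
    """
    context = task
    confidence, question = assess(context)
    rounds = 0
    while confidence < threshold and rounds < max_rounds:
        answer = ask_user(question)
        # Each answer is appended so the next assessment sees the full exchange.
        context += f"\nQ: {question}\nA: {answer}"
        confidence, question = assess(context)
        rounds += 1
    return context, confidence
```

The returned context, not the score, is the useful artifact: it is the record of questions and answers that the speakers credit with clarifying the user's own requirements.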

5. Notable Quotes

  • "It's a misnomer that skills are only a single markdown file... they're more like a folder... they can have references to other things, scripts that it should run, and images." — Nick Nisi
  • "Descriptions are routing rules... they're less for us and they're more for the AI to determine when to use it." — Zack Proser
  • "The answer is one turn request with skill builder is the fastest way [to improve a skill]." — Zack Proser

6. Synthesis/Conclusion

Skills represent a shift from generic AI interaction to a structured, engineering-led approach. By treating AI instructions as modular, versionable, and testable code, developers can move beyond simple prompting into building robust, automated workflows. The most effective skills are those that combine clear, constraint-based instructions with deterministic script execution, allowing the AI to act as a specialized team member rather than a generic chatbot.
