
Skill Issue: How We Used AI to Make Agents Actually Good at Supabase — Pedro Rodrigues, Supabase

By AI Engineer

Topics: MCP, Evals, Row-Level Security, PostgreSQL


Key Concepts

Skills: Folders containing instructions (skill.md) and reference files (scripts, markdown) that provide agents with custom workflows, context, or tools.
Progressive Disclosure: An architectural pattern where only the skill's metadata (front matter) is loaded into the agent's context initially, with the full content loaded only when the agent determines it is necessary.
MCP (Model Context Protocol): A standard for connecting AI agents to external data and tools. Unlike skills, MCP tools are often remote and do not require a local environment to execute.
Evals (Evaluations): A testing methodology for non-deterministic systems. Agent behavior is assessed by comparing the agent's actual outputs or tool calls against expected outputs or expected tool-calling patterns.
Row-Level Security (RLS): A PostgreSQL feature that restricts which rows a user can read or modify; critical for ensuring that database views do not inadvertently expose unauthorized data.
Security Invoker: A PostgreSQL view option (security_invoker, available since PostgreSQL 15) that makes a view execute with the permissions of the user querying it rather than those of the view's owner, so the RLS policies on the underlying tables are applied to that user.

1. Understanding Skills vs. MCP

Pedro addresses the common misconception that Skills and MCP are competing technologies.

MCP: Best for integrations and connecting to services where the agent does not have direct access to the underlying environment. Tools run server-side.
Skills: Best for providing "contextual intelligence" and defining repeatable workflows. Scripts within skills run in the local environment, making them dependent on the host OS (Linux/macOS/Windows).
Synthesis: Use both. Use MCP for service connectivity and Skills to provide the agent with the "how-to" instructions and domain-specific context that don't fit into standard tool descriptions.

2. The Anatomy of a Skill

A skill is structured like a "book" for the agent:

skill.md: The "index on steroids." It contains front matter (Name, Description) and references to other files.
Front Matter: The only required component. It acts as an envelope for progressive disclosure.
Reference Files: Markdown or scripts (Python/Bash) stored in a reference/ folder. These can be nested to create a graph of information.
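
To make progressive disclosure concrete, here is a minimal Python sketch of the loading step. It assumes a skill.md whose front matter is delimited by --- lines and holds simple key: value pairs; the names read_front_matter, initial_context, and the example skill are hypothetical, and a real agent runtime would use a proper YAML parser and its own loading logic. Only the name and description reach the initial context, while the body (including its pointer into reference/) is left for the agent to load on demand.

```python
from dataclasses import dataclass


@dataclass
class SkillMetadata:
    name: str
    description: str


def read_front_matter(skill_md: str) -> SkillMetadata:
    """Parse only the front matter of a skill.md document.

    Assumes the block is delimited by '---' lines and contains simple
    'key: value' pairs (a simplification; real loaders parse YAML).
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill.md must start with a '---' front matter block")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter: the body below is never read here
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    else:
        raise ValueError("front matter block is never closed")
    return SkillMetadata(name=fields["name"], description=fields["description"])


def initial_context(skill_docs: list[str]) -> str:
    """Build the startup context: one metadata line per skill, no bodies."""
    return "\n".join(
        f"- {meta.name}: {meta.description}"
        for meta in map(read_front_matter, skill_docs)
    )


# Hypothetical example skill; its body points at a nested reference file.
EXAMPLE_SKILL = """---
name: supabase-views
description: Use this skill when creating or altering Postgres views.
---
# Creating views safely
Read reference/security.md before writing any CREATE VIEW statement.
"""
```

With many skills installed, the startup cost stays at one short line per skill, which is the point of keeping the front matter as the only always-loaded part.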

3. Evaluation Framework (Evals)

Testing LLM-based agents requires a shift from deterministic unit testing to "Eval-driven development."

The Cycle: Define metrics → Create/Update Skill → Run Evals → Grade → Iterate.
LLM-as-a-Judge: For non-deterministic outputs, use a secondary LLM to grade the performance of the agent based on success criteria.
Testing Strategy: Pedro recommends testing two conditions: one with the skill loaded and one without. This isolates the impact of the skill on the agent's decision-making process.
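
The two-condition strategy can be sketched as a small Python harness. The Agent and Grader interfaces and the function names are hypothetical stand-ins: in practice the agent call would drive a real model with or without the skill installed, and the grader could be a deterministic check or an LLM-as-a-judge call. Each case runs several times because a single run of a non-deterministic agent says little on its own.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: an agent takes (prompt, skill_enabled) and returns
# its final output; a grader returns True if that output meets the criteria.
Agent = Callable[[str, bool], str]
Grader = Callable[[str], bool]


@dataclass
class EvalCase:
    prompt: str
    grader: Grader  # deterministic check or LLM-as-a-judge wrapper


def pass_rate(agent: Agent, cases: list[EvalCase], with_skill: bool, trials: int = 5) -> float:
    """Run every case `trials` times and return the fraction of passing runs."""
    if not cases or trials < 1:
        raise ValueError("need at least one case and one trial")
    passed = sum(
        case.grader(agent(case.prompt, with_skill))
        for case in cases
        for _ in range(trials)
    )
    return passed / (len(cases) * trials)


def skill_uplift(agent: Agent, cases: list[EvalCase], trials: int = 5) -> dict[str, float]:
    """Compare the two conditions: skill loaded versus the same agent without it."""
    with_skill = pass_rate(agent, cases, True, trials)
    baseline = pass_rate(agent, cases, False, trials)
    return {"with_skill": with_skill, "baseline": baseline, "uplift": with_skill - baseline}
```

A clearly positive uplift suggests the skill is changing the agent's behavior; an uplift near zero hints that the skill is either not being loaded or not needed.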

4. Real-World Application: Database Security

Pedro demonstrates a common vulnerability: creating a PostgreSQL view that bypasses RLS.

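The Problem: By default, a Postgres view runs with the privileges of its owner (usually the role that created it), so RLS policies on the underlying tables are evaluated against the owner rather than the querying user, and are skipped entirely when the owner bypasses RLS, as table owners and superusers do by default.
The Solution: The agent must be instructed to create views with the security_invoker option, for example WITH (security_invoker = true), which is available in PostgreSQL 15 and later.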
Actionable Insight: Pedro found that using the verb "use" in the skill description significantly increases the likelihood of the agent loading the skill during a task.
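
A deterministic grader for this scenario could flag any CREATE VIEW statement that does not enable security_invoker. The Python sketch below is a hypothetical, regex-based heuristic rather than a SQL parser: it handles the common CREATE [OR REPLACE] VIEW ... [WITH (...)] AS forms, accepts a simplified set of boolean spellings, and treats a bare security_invoker option as enabled, as PostgreSQL does for boolean options written without a value.

```python
import re

# Heuristic match for CREATE [OR REPLACE] [TEMP] [RECURSIVE] VIEW name
# [(columns)] [WITH (options)] AS; not a full SQL parser.
_VIEW_RE = re.compile(
    r"create\s+(?:or\s+replace\s+)?(?:temp(?:orary)?\s+)?(?:recursive\s+)?view\s+"
    r"(?P<name>[\w.\"]+)\s*"
    r"(?:\([^)]*\)\s*)?"  # optional column list
    r"(?:with\s*\((?P<opts>[^)]*)\)\s*)?"  # optional WITH (...) options
    r"as\b",
    re.IGNORECASE,
)


def _truthy(value: str) -> bool:
    # Simplified subset of the boolean spellings Postgres accepts.
    return value.strip().strip("'").lower() in {"true", "on", "yes", "1"}


def views_missing_security_invoker(sql: str) -> list[str]:
    """Return the names of views created without security_invoker enabled."""
    missing = []
    for match in _VIEW_RE.finditer(sql):
        enabled = False
        for option in (match.group("opts") or "").split(","):
            key, sep, value = option.partition("=")
            if key.strip().lower() == "security_invoker":
                # A bare option with no value means true.
                enabled = _truthy(value) if sep else True
        if not enabled:
            missing.append(match.group("name"))
    return missing
```

Run against a migration containing only create view public.profiles_public as select id from profiles;, the function returns ['public.profiles_public'], which the eval would count as a failure for that run.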

5. Best Practices for Production

Documentation: Treat skills as living documentation. If a workflow changes, the skill must be updated alongside the code.
Maintenance: Periodically audit skills. If a skill is not being loaded or used by the agent, it should be removed to keep the context window clean.
CI/CD: In production environments, keep the skill set minimal and specific to the task, rather than maintaining a bloated global library of skills.
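
The audit step can be automated from agent or eval traces. This Python sketch assumes a hypothetical log format, a flat list of skill names with one entry each time the agent loaded a skill's full body; skills that fall below a chosen load threshold become candidates for removal.

```python
from collections import Counter


def unused_skills(installed: list[str], load_events: list[str], min_loads: int = 1) -> list[str]:
    """Return installed skills loaded fewer than `min_loads` times.

    `load_events` is a hypothetical trace format: one skill name per time the
    agent pulled that skill's full body into its context.
    """
    counts = Counter(load_events)
    # Counter returns 0 for names that never appear, so never-loaded skills are caught.
    return sorted(name for name in installed if counts[name] < min_loads)
```

For example, with installed skills supabase-views, supabase-auth, and legacy-migrations, and traces showing only the first two being loaded, the function returns ['legacy-migrations'].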

Conclusion

Skills represent a powerful way to guide agent behavior through progressive disclosure. While the ecosystem is still maturing, the combination of MCP for connectivity and Skills for contextual guidance provides a robust framework for building agentic applications. The most effective way to ensure reliability is to implement an automated evaluation pipeline that compares agent performance with and without specific skills, ensuring that the agent consistently adheres to security and operational standards.
