Build Database Agents That Get Smarter With Every Query (n8n)

Key Concepts

Natural Language Query (NLQ): A retrieval method for AI agents to query structured data using natural language.
Vector Stores: Databases that retrieve text chunks by semantic similarity. Generally not suitable for structured data like spreadsheets or databases, because they return isolated chunks without context, calculations, or grouping, which leads to hallucinations.
Self-Improving Database Agents: AI agents that learn successful query patterns over time and adapt them for future similar questions.
Tool Calling: The ability of an AI agent to use external tools or services.
SQL Agents: AI agents specifically designed to interact with SQL databases.
Cypher: A query language for graph databases, relevant when applying these techniques to knowledge graph retrieval.
MCP (Model Context Protocol): An open protocol through which AI agents call tools exposed by external servers. Here, an MCP server abstracts database interaction complexities, allowing schema retrieval and SQL execution.
Supabase: A backend-as-a-service platform used in the demonstration, with an MCP server.
Schema: The structure of a database, including tables, columns, and relationships.
Normalized Tables: Database tables designed to reduce data redundancy and enforce data integrity.
Vector Store Node: A component in n8n used for semantic retrieval of past queries.
RAG (Retrieval Augmented Generation): A pattern that combines retrieval of information with generative AI.
System Prompt: Instructions given to an AI agent to guide its behavior and responses.
Database View: A virtual table based on the result-set of a stored SQL query, used to simplify complex data structures.
Parameterized Queries (Prepared Queries): Predefined queries with placeholders for dynamic data, enhancing security and control.
Row-Level Security (RLS): A security feature in databases that restricts which rows a user or role can access, based on policies defined on each table.
Principle of Least Privilege: Granting only the necessary permissions to users or agents.
CRUD Operations: Create, Read, Update, Delete operations on data.
SQL Injection Attacks: Attacks in which malicious SQL code is inserted into database queries through untrusted input.
Multi-User/Multi-Tenant: Systems designed to serve multiple users or organizations securely and independently.

Approaches to Interacting with Databases via AI Agents

This summary details five distinct methods for enabling AI agents to interact with databases, focusing on improving query accuracy, agent performance, and security. The core challenge addressed is moving beyond the limitations of vector stores for structured data and leveraging Natural Language Query (NLQ) for more effective database interaction.

1. Using MCP for Database Interaction with Self-Improvement

This approach utilizes MCP to abstract database complexities and incorporates a self-improving mechanism for the AI agent.

Core Functionality:
- Schema Retrieval: MCP is used to dynamically fetch the database schema (tables, columns, relationships) in JSON format, providing the agent with necessary context. This is achieved through tools like list tables and execute SQL via the MCP server.
- SQL Query Construction: The AI agent, armed with the schema and past successful queries, constructs SQL queries to answer user questions.
- Query Execution: The constructed SQL query is sent to the MCP server for execution against the database (e.g., Supabase).
- Response Handling: The database results are returned to the agent and then to the user.
Self-Improving Mechanism:
- Long-Term Memory: Successful queries (question and corresponding SQL) are stored in a vector database.
- Semantic Retrieval: When a new question is asked, the system searches the vector store for semantically similar past questions.
- Query Adaptation: If a similar question is found, the agent adapts the stored SQL query from its metadata to the current question.
- Feedback Loop: A "save successful queries to memory" tool triggers a sub-workflow that persists the question and its successful SQL query to the vector store, acting as a feedback loop for continuous improvement.
- RAG Implementation: This is described as a simple RAG implementation where the question is the content and the SQL query is stored in the metadata of the vector chunk.
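
The memory mechanism above can be sketched in a few lines. This is a minimal illustration, not the workflow from the video: the QueryMemory class, the word-count "embedding", the 0.5 similarity threshold, and the sample questions are all assumptions standing in for a real embedding model and vector store.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class QueryMemory:
    """Stores each question as the chunk content and its SQL in the chunk metadata."""
    chunks: list = field(default_factory=list)

    def save_successful_query(self, question: str, sql: str) -> None:
        self.chunks.append({"content": question, "vector": embed(question), "metadata": {"sql": sql}})

    def find_similar(self, question: str, min_score: float = 0.5):
        """Return (score, stored question, stored SQL) for the best match, or None."""
        query_vec = embed(question)
        best = max(self.chunks, key=lambda c: cosine(query_vec, c["vector"]), default=None)
        if best is None:
            return None
        score = cosine(query_vec, best["vector"])
        if score < min_score:
            return None
        return score, best["content"], best["metadata"]["sql"]


# Hypothetical questions and queries, saved after they succeeded.
memory = QueryMemory()
memory.save_successful_query("How many orders were shipped?", "SELECT COUNT(*) FROM orders WHERE status = 'shipped';")
memory.save_successful_query("What is the average price of products?", "SELECT AVG(price) FROM products;")

# A new, similar question retrieves the stored SQL for the agent to adapt.
match = memory.find_similar("How many orders were cancelled?")
no_match = memory.find_similar("Who is the CEO?")
```

The agent would receive the matched SQL as an example to adapt (here, swapping the status filter) rather than executing it unchanged.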
MCP Implementation Details:
- MCP Client vs. Community Node: The presenter encountered issues with the official n8n MCP client tool when interacting with Supabase MCP servers. They opted for an MCP client community node, which proved more successful.
- Connection to Supabase: Requires the MCP URL from the Supabase project and an access token. Read-only access is recommended for security.
- Tool Specification: Within the AI agent, the execute SQL tool is explicitly specified, and the MCP schema for this tool call is added for increased reliability.
System Prompt Components:
- Schema Analysis: The agent analyzes the database schema.
- Prior Successful Patterns: Utilizes retrieved successful queries from memory.
- Validate Filter Values: The agent can perform an initial query to inspect database values and then use them as filter values for the final query. This is crucial when the agent doesn't have direct access to data but needs to filter based on specific values.
- Handling Unanswerable Questions: If a question cannot be answered or a query returns no rows, the agent is instructed to reply with "Sorry I don't know" to prevent hallucinations.
- Saving Successful Queries: If a query is successful, the "save successful queries to memory" tool is called.
- Context Aggregation: The schema and gathered successful queries are added to the agent's context.
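
Two of the prompt rules above, validating filter values and falling back to "Sorry I don't know", can be illustrated as plain code. This is a sketch under assumptions: SQLite stands in for the real database, and the orders table, its columns, and the answer_orders_by_status helper are hypothetical. In the actual workflow the agent performs these steps through tool calls guided by the system prompt.

```python
import sqlite3

FALLBACK = "Sorry I don't know"


def answer_orders_by_status(conn: sqlite3.Connection, requested_status: str):
    # Step 1: inspect the values that actually exist in the column.
    known = [row[0] for row in conn.execute("SELECT DISTINCT status FROM orders")]
    # Step 2: map the user's wording onto a stored value (case-insensitive match here).
    match = next((s for s in known if s.lower() == requested_status.lower()), None)
    if match is None:
        return FALLBACK
    # Step 3: run the final query with the validated value bound as a parameter.
    rows = conn.execute(
        "SELECT id, customer FROM orders WHERE status = ? ORDER BY id", (match,)
    ).fetchall()
    # An empty result also triggers the fallback instead of a guessed answer.
    return rows if rows else FALLBACK


# Hypothetical sample data; note the stored value is "Shipped", not "shipped".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders (customer, status) VALUES (?, ?)",
    [("Ada", "Shipped"), ("Grace", "Pending"), ("Alan", "Shipped")],
)

shipped = answer_orders_by_status(conn, "shipped")
unknown = answer_orders_by_status(conn, "refunded")
```

Without the inspection step, a filter on 'shipped' would return no rows here, because the stored value is capitalized differently.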
Benefits of MCP:
- Abstracts away database interaction complexities.
- SQL is not required for the agent to get the schema.
Challenges with MCP:
- The official client had issues with Supabase MCP servers.
- The community node setup requires specific configuration.

2. Direct API Connection to PostgreSQL (without MCP)

This method bypasses MCP and uses a direct API connection to PostgreSQL for schema retrieval and query execution.

Core Functionality:
- Schema Retrieval: An SQL query is executed directly against the PostgreSQL database to fetch table names, columns, and their types. The query uses pg_catalog for efficiency. Note that pg_catalog is specific to PostgreSQL; information_schema is the more portable choice for other database types.
- Query Execution: A direct PostgreSQL connection is used to execute the constructed SQL queries.
Implementation Details:
- Credentials: Requires PostgreSQL connection credentials.
- Query for Schema: A specific SQL query is used to retrieve schema information. The presenter mentions a previous video where ChatGPT was used to create a view for this, but here it's executed directly.
- execute query Tool: A tool is used to execute queries directly within the PostgreSQL database.
Benefits:
- More reliable execution of SQL queries compared to some MCP configurations.
- A hybrid is also possible as a middle ground: MCP for table listing and a direct connection for query execution.
Considerations:
- Schema retrieval can be more complex.
- The generated schema data can be rich and potentially bloat the agent's context window.
- The schema can be hardcoded into the system prompt to reduce API calls.
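
As a rough illustration of schema retrieval and of keeping the result compact for the context window, the sketch below uses an information_schema query. This is not the presenter's pg_catalog query; the 'public' schema name, the format_schema helper, and the sample rows are assumptions.

```python
from itertools import groupby

# Portable variant using information_schema; run this through a PostgreSQL connection.
SCHEMA_QUERY = """
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
"""


def format_schema(rows):
    """Collapse (table, column, type) rows into one compact line per table.

    Rows must already be sorted by table name, as SCHEMA_QUERY guarantees.
    """
    lines = []
    for table, cols in groupby(rows, key=lambda r: r[0]):
        col_text = ", ".join(f"{name} {dtype}" for _, name, dtype in cols)
        lines.append(f"{table}({col_text})")
    return "\n".join(lines)


# Hypothetical rows, shaped like the query result.
sample_rows = [
    ("customers", "id", "integer"),
    ("customers", "name", "text"),
    ("orders", "id", "integer"),
    ("orders", "customer_id", "integer"),
    ("orders", "status", "text"),
]
schema_text = format_schema(sample_rows)
```

One line per table is considerably shorter than raw JSON rows, which helps with the context-bloat concern noted above.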

3. Hardcoding the Schema within the Agent

This approach involves manually embedding the database schema directly into the AI agent's system prompt.

Core Functionality:
- Static Schema: The schema is obtained once (e.g., via MCP or direct query) and then pasted into the AI agent's system prompt.
- No Dynamic Schema Retrieval: The agent does not need to query the database for schema information during runtime.
Implementation Details:
- Schema Snapshot: The schema details are copied from a previous workflow's output.
- Schema in Prompt: The copied schema is pasted into the system prompt of the AI agent.
- Exclusion of Sensitive Data: It's recommended to exclude sensitive tables, such as chat history, from the hardcoded schema.
Benefits:
- Reduces API calls to the database.
- Allows for selective inclusion of schema elements, focusing the agent on relevant tables and fields.
- Can exclude tables or fields the agent should ignore.
Drawbacks:
- Requires manual updates if the database schema changes.
- Less flexible than dynamic schema retrieval.
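
A small sketch of selective inclusion, assuming the schema snapshot has been saved as a dictionary. The table names, the hidden email column, and the build_system_prompt helper are hypothetical; in n8n the resulting text would simply be pasted into the agent's system prompt.

```python
# Hypothetical schema snapshot copied once from an earlier workflow run.
FULL_SCHEMA = {
    "orders": ["id", "customer_id", "status", "total"],
    "customers": ["id", "name", "email"],
    "chat_history": ["id", "session_id", "message"],
}


def build_system_prompt(schema, allowed_tables, hidden_columns=()):
    """Render only the allowed tables and columns into a static system prompt."""
    lines = ["You answer questions by writing SQL against this schema:"]
    for table in allowed_tables:
        columns = [c for c in schema[table] if (table, c) not in hidden_columns]
        lines.append(f"- {table}: {', '.join(columns)}")
    lines.append('If the question cannot be answered, reply "Sorry I don\'t know".')
    return "\n".join(lines)


# chat_history is left out entirely, and customers.email is hidden.
prompt = build_system_prompt(
    FULL_SCHEMA,
    allowed_tables=["orders", "customers"],
    hidden_columns={("customers", "email")},
)
```

Omitting a table from the prompt only keeps the agent from knowing about it. Database permissions are still needed to actually block access.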

4. Using a Database View

This method involves creating a denormalized view of the database that flattens relationships, simplifying querying for the AI agent.

Core Functionality:
- Denormalized View: A database view is created as a stored query that combines data from multiple tables into a single, flattened structure.
- Simplified Agent Querying: The AI agent queries this single view as if it were a regular table, eliminating the need for complex joins.
Implementation Details:
- View Creation: A CREATE VIEW statement is used in the SQL editor to define the view. ChatGPT or Claude can assist in generating this query.
- Schema Representation: The schema of this view is then included in the AI agent's system prompt.
Benefits:
- Greatly reduces query complexity for the AI agent.
- Ideal for complex databases or when restricting agent access to specific fields.
- The agent only needs to query one "table" (the view).
Example: A view named full_order_details is created, flattening order and customer information.
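
A runnable sketch of the idea, using SQLite as a stand-in for PostgreSQL. Only the view name full_order_details comes from the example above; the underlying tables, columns, and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    status TEXT,
    total REAL
);
INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders (id, customer_id, status, total) VALUES
    (10, 1, 'shipped', 40.0),
    (11, 2, 'pending', 15.5),
    (12, 1, 'shipped', 9.5);

-- The view hides the join: the agent sees one flat "table".
CREATE VIEW full_order_details AS
SELECT o.id AS order_id, o.status, o.total, c.name AS customer_name
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id;
""")

# The agent's query needs no JOIN, only a filter and an aggregate on the view.
rows = conn.execute(
    "SELECT customer_name, SUM(total) FROM full_order_details "
    "WHERE status = 'shipped' GROUP BY customer_name"
).fetchall()
```

Because the view exposes only the selected columns, it also doubles as a way to restrict which fields the agent can see.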

5. Parameterized Queries

This approach provides the AI agent with access to specific, predefined queries with placeholders for dynamic data, offering maximum control and security.

Core Functionality:
- Predefined Queries: The agent is given access to a set of specific SQL queries (e.g., get order list, get product list).
- Dynamic Data Insertion: Placeholders within these queries allow for the insertion of dynamic data (e.g., order status filter).
- No Arbitrary SQL Execution: The agent cannot execute arbitrary SQL commands, significantly reducing security risks.
Implementation Details:
- Tool Definition: Each parameterized query is defined as a tool for the AI agent.
- Variables: The queries include variables that can be populated with user-provided or agent-determined values.
Benefits:
- Increased Reliability: Predictable query execution.
- Faster Responses: Optimized for specific tasks.
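- Reduced Risk of Misuse: Helps prevent SQL injection attacks and unauthorized data access, because values are passed as parameters rather than concatenated into SQL.
- Enhanced Security: Makes it safer to grant CRUD access, since every operation the agent can perform is predefined.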
- Deterministic Workflows: Suitable for predictable and controlled operations.
- User/Tenant Specific Data: Can be used to hardcode user IDs or tenant IDs for data isolation.
Use Cases:
- Deterministic workflows.
- Customer-facing agents where specific data needs to be returned.
- Multi-user or multi-tenant applications requiring strict data separation.
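
The sketch below shows one way such a tool could look, again with SQLite standing in for the real database. The table layout, the tenant names, and the make_get_order_list_tool factory are hypothetical. The point is that the tenant ID is fixed outside the agent's control and the status filter is bound as a parameter.

```python
import sqlite3

# The query text is fixed; only the placeholder values vary.
GET_ORDER_LIST = "SELECT id, status FROM orders WHERE tenant_id = ? AND status = ? ORDER BY id"


def make_get_order_list_tool(conn: sqlite3.Connection, tenant_id: str):
    """Return a tool whose only agent-controlled input is the status filter."""
    def get_order_list(status: str):
        # Values are bound by the driver, never concatenated into the SQL string.
        return conn.execute(GET_ORDER_LIST, (tenant_id, status)).fetchall()
    return get_order_list


# Hypothetical data for two tenants.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, tenant_id, status) VALUES (?, ?, ?)",
    [(1, "acme", "shipped"), (2, "acme", "pending"), (3, "globex", "shipped")],
)

# The tenant is set when the tool is created, e.g. from the authenticated session.
acme_tool = make_get_order_list_tool(conn, "acme")
acme_shipped = acme_tool("shipped")
injection_attempt = acme_tool("shipped' OR '1'='1")
```

The injection attempt is treated as a literal status value and matches nothing, and the other tenant's order never appears in the results.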

Security Considerations for AI Agents Accessing Databases

Row-Level Security (RLS):
- Importance: Crucial for preventing unauthorized data access. If RLS is not enabled (in Supabase, this is indicated by warning signs), anyone with the public anon key can access all database data.
- Application: Essential when using public keys in applications.
Principle of Least Privilege:
- Read-Only User: Create a dedicated read-only user for the AI agent if it only needs to read data.
- Table Access: Grant access only to the specific tables required by the agent.
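
As an illustration of least privilege in PostgreSQL, the helper below generates the GRANT statements for a read-only agent role. The role name agent_reader, the table list, and the read_only_grants function are hypothetical; the sketch assumes the role already exists and that a database administrator runs the generated statements.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_only_grants(role: str, tables: list[str], schema: str = "public") -> list[str]:
    """Build GRANT statements giving a role SELECT on specific tables only."""
    role_q, schema_q = quote_ident(role), quote_ident(schema)
    # The role needs USAGE on the schema before it can reference tables in it.
    statements = [f"GRANT USAGE ON SCHEMA {schema_q} TO {role_q};"]
    for table in tables:
        statements.append(
            f"GRANT SELECT ON TABLE {schema_q}.{quote_ident(table)} TO {role_q};"
        )
    return statements


# Only the tables the agent needs are listed; everything else stays inaccessible.
grants = read_only_grants("agent_reader", ["orders", "customers"])
```

The agent then connects with this role's credentials, so even a badly constructed query cannot modify data or read tables outside the list.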
Parameterized Queries for Control:
- For highly controlled data surfacing, parameterized queries are the most secure option: the agent can only call specific predefined tools and supply values for their parameters. This sacrifices flexibility for enhanced control.
Exploratory vs. Deterministic Tools:
- Exploratory/Analytics/Co-pilot: Giving agents access to schema and arbitrary queries is suitable for these use cases.
- Deterministic/Controlled: Parameterized queries are the preferred method for controlled environments.
Data Separation in Multi-User/Multi-Tenant Scenarios:
- Requires careful attention to data separation, with resources recommended for further learning on "zero trust RAG."

Conclusion and Key Takeaways

The video emphasizes that vector stores are generally unsuitable for structured data like databases. Instead, Natural Language Query (NLQ) is presented as a powerful and underrated retrieval method. The core innovation discussed is the creation of self-improving database agents that learn from past successful queries, enhancing their accuracy and efficiency over time.

Five distinct approaches for AI agent-database interaction are detailed, each with its own trade-offs in terms of complexity, flexibility, and security:

MCP with Self-Improvement: Abstracts complexity and learns from past queries, but can have implementation challenges.
Direct PostgreSQL API: Offers more reliable query execution but requires direct credential management.
Hardcoded Schema: Simplifies runtime operations by embedding schema directly, but requires manual updates.
Database Views: Denormalizes data to simplify agent querying, ideal for complex schemas.
Parameterized Queries: Provides maximum security and control by limiting the agent to predefined, specific queries, best for deterministic workflows.

Crucially, robust security measures are highlighted, including enabling Row-Level Security (RLS), adhering to the Principle of Least Privilege, and leveraging parameterized queries for sensitive operations. The choice of approach depends on the specific database complexity, desired level of control, and the agent's intended use case, ranging from exploratory tools to highly deterministic customer-facing applications. The self-improvement mechanism, implemented via a RAG pattern with a vector store for long-term memory, is a key enabler for building more intelligent and reliable database agents.