Using Model Armor in ADK callbacks

Key Concepts

AI Agents: Autonomous entities powered by AI, designed to interact with users and systems.
Prompt Injection: A security vulnerability where malicious input manipulates an AI model’s behavior.
Data Leakage: The unintentional exposure of sensitive information by an AI model.
Model Armor: A service for enforcing policies to filter prompts and responses, mitigating AI security risks.
ADK (Agent Development Kit): A toolkit for building AI agents.
Callbacks: Hooks within the ADK that allow observation, customization, and control of an agent’s behavior.
Input Guardrail: Security measures implemented before an LLM call to sanitize user prompts.
Output Validation Guardrail: Security measures implemented after an LLM call to inspect and sanitize model responses.
LLM (Large Language Model): The core AI model powering the agent.
Sanitize User Prompt/Response: Model Armor API methods for filtering potentially harmful or sensitive content.
Redaction: The process of removing sensitive information from text.
CCN (Credit Card Number): A specific example of sensitive data requiring protection.

Securing AI Agents with Model Armor and ADK Callbacks

This video details how to enhance the security of AI agents built with the Agent Development Kit (ADK) by integrating Model Armor, a service designed to filter user prompts and model responses. The core principle revolves around utilizing ADK’s callback mechanism to establish security checkpoints throughout the agent’s interaction with the underlying Large Language Model (LLM).

The Threat Landscape for AI Agents

The introduction highlights the inherent security risks associated with granting users access to AI agents. These risks include prompt injection – where a user crafts input designed to manipulate the agent’s behavior – data leakage – the unintentional exposure of sensitive information – and the generation of harmful content. These vulnerabilities necessitate robust security measures.

ADK Callbacks: The Foundation of Security

The ADK’s callback mechanism is presented as the key to mitigating these risks. Callbacks provide “powerful hooks” to observe, customize, and control the agent’s behavior at critical points. The video focuses on two specific callback types: before model call and after model call.

Implementing Input Guardrails with 'Before Model Call'

The before model call callback is used to implement an input guardrail. This function executes immediately before the agent sends a request to the LLM. Its primary function is to inspect and sanitize the user’s prompt using the Model Armor API’s sanitize user prompt method.

Process: The callback intercepts the user’s prompt, sends it to Model Armor for analysis, and based on the results, either proceeds with the LLM call or halts it.
Action on Detection: If Model Armor detects a prompt injection attempt or other policy violation, the callback skips the LLM call entirely and returns a pre-defined, safe response. This prevents the malicious prompt from ever reaching the model.
Example: The video demonstrates this by showing a benign question proceeding normally, while a prompt containing a “telltale sign” of a malicious attempt is intercepted, and a static safe response is returned instead of the LLM’s output.

Implementing Output Validation with 'After Model Call'

The after model call callback is used to implement an output validation guardrail. This function executes after the LLM has generated a response but before the response is returned to the user. It utilizes the Model Armor API’s sanitize model response method to inspect the raw LLM output.

Process: The callback intercepts the LLM’s response, sends it to Model Armor for analysis, and based on the results, either returns the response as is, replaces sensitive parts of it, or blocks it entirely.
Replacement vs. Blocking: The video explains that replacing parts of the response is useful for preserving most of the information while filtering out specific sensitive data.
Example: A specific scenario is presented involving the detection of a Credit Card Number (CCN) in the LLM’s output. The callback is configured to redact all but the last four digits of the CCN, preventing full disclosure while still providing some useful information to the user. This demonstrates a practical application of redaction.

Logical Flow and Layered Defense

The video emphasizes that these callbacks create a layered defense. The before model call prevents harmful input from reaching the LLM, while the after model call prevents sensitive information from being exposed in the LLM’s output. This dual approach provides comprehensive security.

Supporting Resources and Future Development

The presenter directs viewers to documentation links for both ADK callbacks and Model Armor, as well as a previous video discussing agent integration with cloud environments ("Agent Factory episode"). They also solicit feedback from viewers regarding their projects and desired future content, mentioning a planned video demonstrating these concepts using “anti-gravity to vibe code an agent” – suggesting a more advanced, potentially creative implementation.

Conclusion

Integrating Model Armor with ADK agents via callbacks is presented as a crucial step in securing AI-powered applications. By implementing input and output guardrails, developers can significantly reduce the risks of prompt injection, data leakage, and harmful content generation, fostering a more secure and trustworthy user experience. The video stresses the importance of proactive security measures in the rapidly evolving landscape of AI agent development.