How to Use Real-Time Web Search with LLMs (Single Page, Search, Allowed Domains)

Key Concepts

AI Agent Development Pattern: Combining internal knowledge (e.g., handbooks) with web search capabilities.
Internal Knowledge: Data sources like company handbooks or policy documents.
Web Search:
- Specific URL: Retrieving data from a known web page.
- Broader Search: Allowing an agent to loop and collect information from the web.
RAG Pipeline (Retrieval-Augmented Generation): A framework for AI agents to access and utilize external knowledge.
Tools: Functions or APIs that an AI agent can call to perform specific tasks (e.g., web search, document retrieval).
Dockling Library: A Python library for converting web page HTML into usable formats like Markdown.
Pydantic Models: Used for defining data structures and enabling type-safe validation and structured output from AI models.
OpenAI API: Used for interacting with large language models (LLMs) and their tools.
Reasoning Models (e.g., GPT-5): LLMs capable of more complex decision-making and iterative search.
Structured Output: The ability of an LLM to return information in a predefined format (e.g., JSON, Pydantic models).
Citations: Providing the sources of information used by the AI agent.
Domain Filtering: Restricting web searches to specific websites or domains.
Tool Orchestration: The agent's ability to decide which tools to use and in what order.
Interactive Agent: An agent that can maintain conversation history and context.

Combining Internal Knowledge with Web Search for AI Agents

This tutorial outlines a common pattern in AI development: integrating internal knowledge bases with web search capabilities to create more robust and informed AI agents. The process involves building an agent that can handle user queries by deciding whether to consult internal documents, search a specific URL, or perform a broader web search, potentially filtered by domain. This approach is increasingly requested by clients for building internal agents with RAG pipelines.

1. Getting a Single Web Page

The first step demonstrates how to extract content from a specific web page.

Goal: To grab HTML content from a given URL and convert it into a format suitable for LLMs.
Tool: The dockling library is used for this purpose. It's a simple pip install and provides a convert function.
Process:
1. Define Pydantic models for data validation (e.g., a URL model).
2. Use dockling.Document.convert to fetch and convert the HTML content of a URL.
3. Export the converted content to Markdown using document.export_to_markdown().
4. Pass the Markdown content to an LLM (e.g., GPT-4.1 nano) with instructions to summarize it.
Example: The EU AI Act page is used as an example. The output is a Markdown representation of the page's content.
Technical Term: HTML (HyperText Markup Language): The standard markup language for documents designed to be displayed in a web browser.
Technical Term: Markdown: A lightweight markup language with plain-text formatting syntax.

2. Broader Web Search with Domain Filtering

This section details how to perform more dynamic and agentic web searches.

Goal: To allow an agent to search the internet, potentially within specified domains, and return structured results with citations.
Model: GPT-5 (nano version for speed) is used as a reasoning model.
Data Models: Pydantic models are defined for SearchResult (containing answer and citations) and Citation (containing text and URL).
Process:
1. Define allowed domains to restrict the search scope.
2. Use the OpenAI API's built-in web_search tool.
3. Specify the tool_choice as "auto" and include all sources.
4. Provide the user query as input.
5. Instruct the model to return results in the defined SearchResult Pydantic model.
Example: A query about current Dutch government policies is used, with the search restricted to specific government domains (e.g., regering.nl, overheid.nl).
Key Point: Reasoning models are crucial for complex web crawling and decision-making during search. Latency can be managed by scaling down to faster, non-reasoning models if needed.
Technical Term: Agentic: Refers to the autonomous and decision-making capabilities of an AI agent.

3. Searching Internal Knowledge (Handbook)

This part focuses on integrating internal documents into the agent's knowledge base.

Goal: To enable the agent to retrieve information from an internal handbook.
Methodology: For simplicity in this example, the entire handbook (a Markdown file) is loaded into memory. In production, a RAG pipeline would be more appropriate for larger documents.
Process:
1. Define a function to read the handbook file and return its content as text.
2. Create a tool definition for this function, making it available to the LLM.
3. When a query requires information from the handbook, the agent calls this tool.
4. The retrieved handbook content is then used by the LLM to formulate an answer, with citations pointing to the handbook.
Model: GPT-4.1 nano is used for speed.
Example: The agent is asked about requirements for registering an AI system. It first attempts a direct response based on its system prompt. If the information is not readily available, it calls the handbook search tool.
Key Argument: The agent intelligently decides when to use a tool versus responding directly from its internal prompt, demonstrating a core aspect of tool orchestration.
Structured Output: Pydantic models are used to separate the answer from citations.

4. Bringing All Components Together: The Search Agent

This section explains how to combine the previously developed functionalities into a single, dynamic agent.

Methodology: Code is structured into a tools folder to create simple abstractions and avoid monolithic files.
Process:
1. Each functionality (get single page, search handbook, web search) is encapsulated as a separate tool within the tools folder.
2. These tools are imported into the main agent file.
3. A unified tool definition is created, listing all available tools.
4. An ask_agent function is implemented to handle tool calls and process LLM outputs.
5. Prompting strategies are used to steer the agent's behavior.
Example: The agent is tested with scenarios requiring:
- Direct response (e.g., "What can you do?").
- Handbook search only.
- Specific web page retrieval.
- Broader web search.
- Multiple tool calls: The agent can sequentially use different tools (e.g., search handbook, then retrieve a specific web page) to answer a complex query.
Key Point: The system is flexible enough to handle multiple tool calls, and advanced models can manage a significant number of tools (15-20).
Technical Term: Tool Orchestration: The process by which an AI agent selects and sequences the use of available tools to fulfill a user's request.

5. Interactive Agent (Terminal Chat)

The final part demonstrates how to wrap the agent in an interactive chat interface.

Goal: To create a conversational agent that maintains chat history and memory.
Implementation: A Python script is provided that can be run in the terminal.
Process:
1. The agent's capabilities are exposed through a chat interface.
2. The agent keeps track of conversation turns and updates its memory.
3. This allows for multi-turn conversations, mimicking a real AI assistant.
Key Takeaway: This showcases how the combined functionalities can be integrated into a user-facing application.

Conclusion

The tutorial successfully demonstrates a practical pattern for building AI agents that leverage both internal knowledge and external web search. By breaking down the process into modular tools and using structured output with Pydantic models, developers can create sophisticated agents capable of complex information retrieval and synthesis. The emphasis on code organization and clear prompting strategies provides actionable insights for implementing similar functionalities in real-world applications. The provided examples serve as isolated snippets, enabling developers to quickly integrate these patterns into their own projects.