This Browser Agent Automates ANYTHING (N8N + Skyvern)
By Ben AI
No-Code AI Agent for Web Automation: A Detailed Breakdown
Key Concepts:
- No-code AI agent
- Autonomous web browsing
- n8n (workflow automation platform)
- Skyvern (web browsing AI agent)
- Google Gemini (LLM)
- Google SERP API (search engine results page API)
- Workflow automation
- API integration
- Prompt engineering
- Web scraping
- Guard rails for AI agents
1. Introduction
The video demonstrates how to build a no-code AI agent that browses the web autonomously and performs tasks such as filling out forms, solving CAPTCHAs, applying for jobs, and making purchases. The system uses n8n for workflow automation and Skyvern as the browsing agent. The presenter, Ben, emphasizes the potential of such agents for automating high-volume tasks, especially on websites that lack APIs.
2. Limitations of Current Browser Agents
Ben acknowledges that browser agents are still in their early stages and have limitations:
- Slowness: They are generally slower than traditional automation methods.
- Cost: They can be expensive due to the computational resources required. Skyvern charges 10 cents per screen analyzed.
- Reliability: They are not yet 100% reliable and may encounter errors.
Despite these limitations, Ben believes that browser agents have significant potential, especially when combined with traditional workflow automation. He stresses the importance of implementing "guard rails" to ensure their responsible and effective use.
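As a rough illustration of the cost point, the sketch below estimates spend from a screen count. The 10-cents-per-screen rate is the figure quoted in the video; the function names and the assumption that cost scales linearly with the number of screens analyzed are illustrative only and may not match actual billing.

```python
# Rate quoted in the video; real pricing may differ or change.
CENTS_PER_SCREEN = 10


def estimate_cost_cents(screens_per_run: int, runs: int = 1) -> int:
    """Estimate total cost in cents, assuming a flat per-screen rate."""
    if screens_per_run < 0 or runs < 0:
        raise ValueError("screens_per_run and runs must be non-negative")
    return CENTS_PER_SCREEN * screens_per_run * runs


def format_usd(cents: int) -> str:
    """Render an integer number of cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    # One run that analyzes six screens.
    print(format_usd(estimate_cost_cents(6)))
    # The same run repeated 100 times, to show how volume adds up.
    print(format_usd(estimate_cost_cents(6, runs=100)))
```

Under this assumption, a run that analyzes six screens costs 60 cents, and repeating it 100 times costs 60 dollars, which is why high-volume use benefits from filtering out unsuitable pages before the agent is called.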
3. Use Case Demos
Ben showcases three use cases:
- Job Application: The agent finds UI/UX designer jobs and applies to them using a provided resume and additional information. The agent asks clarifying questions to gather all necessary information before applying.
- Example: The agent successfully applied to two UI/UX designer jobs, filling out forms and uploading a resume. The cost was 10 cents per application.
- Lead Generation: The agent performs outreach to plumbers in Miami by filling out contact forms on their websites.
- Example: The agent drafted a message offering lead generation services and filled out contact forms for two plumbing companies.
- Product Purchasing: The agent finds and adds a cookbook to a shopping cart on Amazon.
- Example: The agent found an Italian dessert cookbook and added it to the cart. This process cost 60 cents due to the multiple screens analyzed and the CAPTCHA encountered.
4. System Overview
The system consists of an AI agent connected to WhatsApp, running on Google Gemini 2.0, and utilizing three tools:
- Job Applying Tool
- Contact Us Form Tool
- Product Purchasing Tool
The agent is instructed to gather all necessary information before using a tool. Each tool corresponds to a separate workflow in n8n.
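The "gather everything first" instruction can be thought of as a simple guard rail: before a tool runs, check which inputs are still missing and ask the user for them. The sketch below is a minimal Python illustration of that idea. The tool keys and field names are hypothetical and are not taken from the video, where this behavior is driven by the agent's prompt rather than by code.

```python
from __future__ import annotations

# Hypothetical required inputs per tool; the names are illustrative only.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "job_applying": ("job_title", "resume_url", "additional_info"),
    "contact_us_form": ("prospect_name", "prospect_email", "message"),
    "product_purchasing": ("user_prompt",),
}


def missing_fields(tool: str, collected: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank for a tool."""
    if tool not in REQUIRED_FIELDS:
        raise KeyError(f"unknown tool: {tool}")
    return [
        field
        for field in REQUIRED_FIELDS[tool]
        if not collected.get(field, "").strip()
    ]


if __name__ == "__main__":
    gaps = missing_fields("job_applying", {"job_title": "UI/UX designer"})
    if gaps:
        print("Ask the user for:", ", ".join(gaps))
```

An empty result means the tool can be called; a non-empty result is the list of clarifying questions the agent still needs to ask.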
5. n8n Setup
The n8n workflow is structured as follows:
- WhatsApp Trigger: Receives messages (text or audio) from WhatsApp.
- Audio messages are transcribed using the OpenAI Whisper model.
- Agent: Processes the message and determines which tool to use.
- Tool Workflows: Execute specific tasks based on the chosen tool.
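The text-or-audio branch at the start of this workflow can be sketched as follows. The message shape (a dict with "type", "text", and "audio_bytes" keys) is a hypothetical simplification, not the actual WhatsApp trigger payload, and the transcriber is passed in as a plain function so that no particular speech-to-text API is assumed.

```python
from typing import Callable


def message_to_text(message: dict, transcribe: Callable[[bytes], str]) -> str:
    """Normalize an incoming message to plain text for the agent.

    Text messages pass through; audio messages are transcribed first.
    """
    kind = message.get("type")
    if kind == "text":
        return message["text"].strip()
    if kind == "audio":
        return transcribe(message["audio_bytes"]).strip()
    raise ValueError(f"unsupported message type: {kind!r}")


if __name__ == "__main__":
    # Stand-in transcriber for demonstration; a real one would call a
    # speech-to-text model such as Whisper.
    def fake_transcribe(audio: bytes) -> str:
        return "find UI/UX designer jobs"

    print(message_to_text({"type": "text", "text": " hello "}, fake_transcribe))
    print(message_to_text({"type": "audio", "audio_bytes": b"..."}, fake_transcribe))
```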
6. Job Applying Workflow (n8n)
- Workflow Input Trigger: Receives job details, resume link, and additional information from the agent.
- Set Variables: Stores the input data for easy access.
- LLM Chain (Google Search Query Generator): Generates a Google search query to find relevant job postings.
- Google SERP API: Executes the search query and retrieves search results.
- Split Out URLs: Separates the URLs from the search results.
- Loop Over Items: Iterates through each URL.
- Skyvern API Call: Sends the job URL, resume, and additional information to Skyvern to apply for the job.
- Uses the Skyvern V1 API with a specific workflow ID for job applications.
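The "split out URLs, then loop" portion of this workflow can be sketched in Python as shown below. The search response shape (an "organic_results" list whose items carry a "link") and the payload field names are assumptions made for illustration; they are not confirmed by the video and are not the actual Skyvern request schema. The sketch only prepares one request body per job URL and makes no network calls.

```python
from __future__ import annotations


def split_out_urls(serp_response: dict) -> list[str]:
    """Collect unique http(s) links from an assumed search response shape."""
    seen: set[str] = set()
    urls: list[str] = []
    for result in serp_response.get("organic_results", []):
        link = result.get("link")
        if (
            isinstance(link, str)
            and link.startswith(("http://", "https://"))
            and link not in seen
        ):
            seen.add(link)
            urls.append(link)
    return urls


def build_job_payloads(
    urls: list[str], resume_url: str, additional_info: str
) -> list[dict[str, str]]:
    """Build one hypothetical request body per job URL."""
    return [
        {
            "job_url": url,  # hypothetical field name
            "resume_url": resume_url,  # hypothetical field name
            "additional_info": additional_info,  # hypothetical field name
        }
        for url in urls
    ]


if __name__ == "__main__":
    response = {
        "organic_results": [
            {"link": "https://example.com/jobs/1"},
            {"link": "https://example.com/jobs/1"},  # duplicate is dropped
            {"title": "result without a link"},
        ]
    }
    for payload in build_job_payloads(
        split_out_urls(response), "https://example.com/resume.pdf", "Open to remote"
    ):
        print(payload)
```

Deduplicating before the loop matters here because each URL sent to the browsing agent incurs a per-screen cost.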
7. Contact Us Form Workflow (n8n)
- Workflow Input Trigger: Receives prospect information (name, email, etc.) and the outreach message from the agent.
- Set Variables: Stores the input data.
- LLM Chain (Google Search Query Generator): Generates a Google search query to find relevant companies.
- Google SERP API: Executes the search query and retrieves search results.
- Split Out URLs: Separates the URLs from the search results.
- Loop Over Items: Iterates through each URL.
- Web Scraping (HTTP Request): Scrapes the website to check for input fields.
- Code Step (JavaScript): Uses a regular expression to determine if the page contains input fields.
- If Statement: Branches on the result of the code step.
- If input fields are present, pass the URL and prospect information on to the Skyvern API call.
- If no input fields are present, skip the URL and continue with the next one in the loop.
- Skyvern API Call: Sends the URL and prospect information to Skyvern to fill out the contact form.
- Uses the Skyvern V1 Tasks API.
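The input-field check acts as a cheap pre-filter so that the browsing agent is only called for pages likely to contain a form. The video implements it in an n8n JavaScript code step; the Python sketch below shows the same idea with a pattern of my own choosing, since the exact regular expression used in the video is not given here. It looks for visible input or textarea tags in the raw HTML and ignores hidden inputs.

```python
import re

# Matches <textarea ...> or <input ...> unless the input is type="hidden".
# This pattern is an illustrative assumption, not the one from the video.
INPUT_FIELD_PATTERN = re.compile(
    r"""<input\b(?![^>]*type\s*=\s*["']?hidden)[^>]*>|<textarea\b""",
    re.IGNORECASE,
)


def has_input_fields(html: str) -> bool:
    """Return True if the raw HTML appears to contain a fillable field."""
    return INPUT_FIELD_PATTERN.search(html) is not None


if __name__ == "__main__":
    contact_page = '<form><input type="text" name="email"><textarea></textarea></form>'
    about_page = "<p>Call us on weekdays.</p>"
    print(has_input_fields(contact_page))
    print(has_input_fields(about_page))
```

A regex over raw HTML is only a heuristic: forms rendered by client-side JavaScript will not appear in the fetched markup, and a search box or newsletter field will also count as an input field.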
8. Product Purchasing Workflow (n8n)
- Workflow Input Trigger: Receives a user prompt describing the desired product from the agent.
- Set Variables: Stores the input data.
- Skyvern API Call: Sends the user prompt to Skyvern to find and add the product to the cart.
- Uses the Skyvern V2 API, which requires only a user prompt and proxy location.
9. Skyvern Overview
Skyvern is a web browsing AI agent that has been trained on specific use cases like contact forms, job applications, invoice downloading, purchasing, and government form filling. It offers an API for integration with workflow automation platforms.
Key features of Skyvern:
- API Availability: Allows integration with n8n, Make.com, etc.
- Trained on Specific Use Cases: Improves performance and reliability for common tasks.
- Workflow Builder: Enables users to create custom workflows with guard rails for specific use cases.
- Planner, Task Execution, and Validator: Internal components that ensure tasks are performed correctly.
10. Skyvern Workflow Builder
The Skyvern workflow builder allows users to define the steps an agent should take for a specific task. This includes:
- Parsing the Resume: Extracting data from a resume.
- Looping Over Job URLs: Applying to multiple jobs.
- Adding Prompts: Providing guidelines for each step.
- Navigation Blocks: Guiding the agent through website navigation.
- Action Blocks: Specifying actions to take (e.g., filling out forms).
- Extraction Blocks: Scraping data from websites.
- Validation Blocks: Checking if the agent is on the right track.
11. Conclusion
The video walks through building a no-code AI agent for web automation using n8n and Skyvern. Browser agents are still in their early stages and remain slow, costly, and not fully reliable, but they offer significant potential for automating high-volume tasks, especially on websites that lack APIs. Ben's central recommendation is to combine traditional workflow automation with browser agent APIs and to put guard rails around the agent, so that the system is both powerful and efficient.