I built JARVIS from Iron Man with AI (NO CODE n8n tutorial)

Key Concepts

AI Personal Assistant: Building an automated personal assistant using n8n, inspired by Jarvis from Iron Man.
n8n: A workflow automation platform used to create the Jarvis personal assistant.
LLM Chain: A basic language model chain used for defining Jarvis's personality.
Child Agents: Separate workflows (email agent, calendar agent, contact agent, etc.) called by the main personal assistant agent to handle specific tasks.
HTTP Request Node: Used to interact with the ElevenLabs API for text-to-speech conversion.
ElevenLabs API: A service used to convert text into speech, utilizing a cloned voice of Jarvis.
Telegram Bot: Used as the primary interface for interacting with the Jarvis personal assistant.
Voice Transcription: Converting voice messages from Telegram into text using a transcription node.
Prompt Engineering: Crafting detailed prompts to define the behavior and personality of AI agents.
JSON Handling: Ensuring valid JSON format for data transfer between nodes, especially for the ElevenLabs API.

Jarvis Personal Assistant Build: A Detailed Breakdown

1. Introduction

The video demonstrates how to build an AI personal assistant, "Jarvis 2.0," using n8n, inspired by the Jarvis character from Iron Man. This version is an improvement over a previous iteration, offering enhanced functionality and a more refined personality. The assistant automates tasks like sending emails, managing calendars, and providing information.

2. Importing the Blueprint

The workflow is available as a JSON blueprint in the n8n community classroom.
The blueprint can be downloaded and imported into n8n to quickly set up the basic structure.
The resource file is located in the classroom section under "All n8n Automation" and then "Jarvis 2.0".

3. Demo Overview

The demo shows Jarvis responding to a Telegram message that asks it to email John Doe and tell him the user will miss an event.
Jarvis autonomously retrieves John Doe's email address from a contact database, composes the email, sends it, and provides a text and audio confirmation.
The email content is generated intelligently based on minimal instructions.

4. Building the Jarvis Personality and Voice

Jarvis Personality (LLM Chain):
- An "Advanced AI" -> "Basic LLM Chain" node is used to define Jarvis's personality.
- Anthropic's Claude model is selected as the language model (though an OpenAI GPT model could also be used).
- A detailed system prompt, available in the community resources, is crucial.
- The prompt defines Jarvis as a sophisticated, quick-witted AI assistant with a refined British accent, providing examples of how to respond.
- The prompt is pasted into the system prompt field with the field set to "Expression" mode.
Text-to-Speech (HTTP Request):
- An HTTP Request node is used to interact with the ElevenLabs API for text-to-speech conversion.
- The method is set to "POST."
- The URL points to the ElevenLabs API endpoint, including the ID of the cloned Jarvis voice: https://api.elevenlabs.io/v1/text-to-speech/{voice_id}.
- Authentication is handled using a "Generic Credential Type" with "Custom Auth."
- A JSON structure for the Custom Auth credential is provided in the community resources, requiring the user's ElevenLabs API key.
- The API key is obtained from the ElevenLabs account settings.
- A "Content-Type" header is added with the value "application/json."
- The request body is set to JSON, using a code snippet that cleans and validates the JSON output from the Jarvis personality LLM chain. This ensures compatibility with the ElevenLabs API.
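
As a rough illustration of what the HTTP Request node sends, the sketch below builds (but does not send) an equivalent request in Python. The `xi-api-key` header is how the ElevenLabs API accepts an API key. The `clean_text` helper, the `VOICE_ID` and `API_KEY` placeholders, and the function names are assumptions made for illustration; this is not the snippet from the community resources.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def clean_text(raw: str) -> str:
    """Collapse all whitespace runs (including newlines) into single spaces."""
    return " ".join(raw.split())


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build a POST request for the text-to-speech endpoint without sending it."""
    # json.dumps escapes quotes and control characters, so the body is valid JSON
    # even when the LLM output contains quotation marks.
    body = json.dumps({"text": clean_text(text)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "xi-api-key": api_key,  # placeholder value; supply your own key
        },
    )


if __name__ == "__main__":
    # Hypothetical LLM output containing a newline and embedded quotes.
    req = build_tts_request('Very good, sir.\nShall I "send" it?', "VOICE_ID", "API_KEY")
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Serializing the body with a JSON library, rather than pasting the LLM text between quotes by hand, is one way to avoid the invalid-JSON errors that the cleaning step guards against.
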
Output to Telegram (Send Audio):
- A Telegram "Send Audio" node is added to send the generated audio file back to the user.

5. Building the Main Workflow

Telegram Trigger:
- A Telegram trigger node is used to initiate the workflow upon receiving a message.
- The trigger is set to "On Message."
- Setting up the Telegram credentials involves interacting with the Telegram BotFather to obtain an access token.
Switch Node:
- A switch node is used to differentiate between voice and text messages.
- It routes voice messages to a transcription path and text messages directly to processing.
Download File (Voice Messages):
- A "Download File" node downloads the voice message file from Telegram.
Transcription (Voice-to-Text):
- A transcription node converts the downloaded audio file into text.
Set Node:
- A Set node normalizes both paths so that the message text always arrives in the same field (json.output), whether it was typed or transcribed.
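
The routing and normalization steps above can be sketched in Python as follows. Telegram delivers voice notes under a `voice` key and typed messages under a `text` key of the message object. The function names and the `"unsupported"` fallback are assumptions for illustration, and the transcript is passed in by hand, standing in for the download and transcription nodes.

```python
from typing import Any, Dict, Optional


def route_message(update: Dict[str, Any]) -> str:
    """Return "voice" for voice notes, "text" for typed messages."""
    message = update.get("message", {})
    if "voice" in message:
        return "voice"
    if "text" in message:
        return "text"
    return "unsupported"  # assumption: anything else is not handled


def normalize(update: Dict[str, Any], transcript: Optional[str] = None) -> Dict[str, str]:
    """Put the user's words into a single "output" field for both paths."""
    route = route_message(update)
    if route == "voice":
        if transcript is None:
            raise ValueError("voice messages need a transcript")
        return {"output": transcript}
    if route == "text":
        return {"output": update["message"]["text"]}
    raise ValueError("unsupported message type")


if __name__ == "__main__":
    typed = {"message": {"text": "Email John Doe"}}
    spoken = {"message": {"voice": {"file_id": "example-file-id"}}}
    print(normalize(typed))
    print(normalize(spoken, transcript="Email John Doe"))
```

Because both paths end in the same shape, the agent that follows only ever needs to read one field.
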
Personal Assistant Agent (Tools Agent):
- A "Tools Agent" node acts as the main personal assistant, delegating tasks to child agents.
- A system prompt defines the agent's role as efficiently delegating user queries to appropriate tools.
- The prompt is available in the community resources.
Child Agents (Separate Workflows):
- Child agents are separate n8n workflows that handle specific tasks (e.g., email, calendar, contacts, expenses).
- Examples include:
  - Email Agent: Sends emails.
  - Calendar Agent: Manages calendar events.
  - Contact Agent: Accesses contact information from an Airtable database.
  - Personal Expense Agent: Tracks and manages personal expenses, accessing data from a Google Sheet and a Pinecone vector database.
  - Research Agent: Gathers information from sources like Hacker News and SerpApi.
- Child agents are called using the "Call n8n Workflow" node, configured as a tool.
- Each child agent has its own system prompt defining its role and the tools it has access to.
- The "When executed by another workflow" trigger is used in child agents.
- The "From AI" functionality (indicated by three stars) in n8n is used to automatically define parameters for nodes within the child agents.
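
The parent/child delegation pattern can be pictured with a small Python sketch. Each child agent here is a stub function standing in for a separate n8n workflow, and the tool name is passed in explicitly, whereas in the actual build the language model chooses which tool to call. All names are hypothetical.

```python
from typing import Callable, Dict


def email_agent(query: str) -> str:
    """Stand-in for the email child workflow."""
    return f"[email agent] handled: {query}"


def calendar_agent(query: str) -> str:
    """Stand-in for the calendar child workflow."""
    return f"[calendar agent] handled: {query}"


def contact_agent(query: str) -> str:
    """Stand-in for the contact child workflow."""
    return f"[contact agent] handled: {query}"


# Registry of tools the parent agent may call, keyed by tool name.
TOOLS: Dict[str, Callable[[str], str]] = {
    "email_agent": email_agent,
    "calendar_agent": calendar_agent,
    "contact_agent": contact_agent,
}


def delegate(tool_name: str, query: str) -> str:
    """Forward a query to the named child agent and return its reply."""
    try:
        tool = TOOLS[tool_name]
    except KeyError:
        raise ValueError(f"unknown tool: {tool_name}") from None
    return tool(query)


if __name__ == "__main__":
    print(delegate("contact_agent", "Find John Doe's email address"))
    print(delegate("email_agent", "Tell John Doe I will miss the event"))
```

Keeping each agent behind a uniform interface is what lets new child workflows be added without changing the parent.
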
Calculator Tool:
- A calculator tool is included for handling mathematical operations.
Response to Telegram (Text Response):
- A Telegram "Send Message" node sends the text response from the personal assistant back to the user.
Jarvis Personality Integration (Voice Response):
- The text response from the personal assistant is also sent to the Jarvis personality LLM chain.
- Jarvis adds a witty comment to the response, reflecting his personality.
- The output from the Jarvis personality is then converted to speech using the ElevenLabs API.
- The resulting audio file is sent back to the user via Telegram.
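
The two response paths (plain text, plus a personality-styled spoken version) can be summarized in a short Python sketch. The `add_personality` and `synthesize` callables are hypothetical stand-ins for the LLM chain and the text-to-speech request, stubbed here so the example runs without any external service.

```python
from typing import Callable, Dict


def respond(
    assistant_text: str,
    add_personality: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Dict[str, object]:
    """Return the plain reply plus audio of the personality-styled reply."""
    styled = add_personality(assistant_text)
    return {
        "text": assistant_text,         # sent as a Telegram text message
        "spoken_text": styled,          # what the voice will actually say
        "audio": synthesize(styled),    # sent as a Telegram audio file
    }


if __name__ == "__main__":
    # Stubs: a real build would call the LLM chain and the speech API here.
    result = respond(
        "Email sent to John Doe.",
        add_personality=lambda text: f"{text} Consider it done, sir.",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(result["text"])
    print(result["spoken_text"])
```
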

6. Conclusion

The video provides a detailed guide on building a sophisticated AI personal assistant using n8n, leveraging various AI models, APIs, and workflow automation techniques. The key takeaways include the importance of well-defined prompts, the modularity of child agents, and the integration of text-to-speech for a more engaging user experience. The presenter encourages viewers to join the n8n community for further learning and support.