Extract Text From Images & PDFs Using AI (n8n tutorial)

Key Concepts

AI Agent, Telegram, PDF Processing, Image Processing, OCR (Optical Character Recognition), NADN, Blueprints, Workflows, Triggers, Tools, Chat Model, Memory, Google Sheets, JSON Data, OpenAI, API Keys, System Prompt, User Message, Assistant Message, Split Out, File ID, Caption, Unstructured Data, Structured Data.

AI Agent for Automated Invoice and Receipt Processing

Overview

The video demonstrates how to build an AI agent using NADN that automatically extracts data from PDF invoices and image receipts received via Telegram and uploads it into a Google Sheet. The agent uses OCR to process both document types, structuring the extracted data for easy analysis and reporting. The blueprints for this project are available for free download.

Workflow Breakdown

The build consists of three main workflows:

Main AI Agent (Telegram Trigger): Receives messages from Telegram, determines whether the message contains a PDF document or an image, and then calls the appropriate sub-workflow (PDF Processor or Image Processor).
PDF Processor: Extracts data from PDF invoices.
Image Processor: Extracts data from image receipts.

Building the Main AI Agent

Telegram Trigger Setup:
- Uses the "on message" trigger to initiate the workflow when a new message is received in Telegram.
- Connects to a Telegram bot using an access token obtained from BotFather.
- The access token is created by initiating a chat with BotFather, creating a new bot, and following the provided instructions.
- The bot name and username are defined during the bot creation process.
AI Agent Configuration:
- Uses the NADN AI Agent node to determine the appropriate action based on the message content.
- The "Chat Model" is set to OpenAI, acting as the "brain" of the operation.
- "Window Buffer Memory" is optionally used to enable the AI agent to recall past messages.
- The "System Message" provides context to the AI agent, defining its role as an intelligent document processing assistant.
- The system prompt instructs the agent to call the document processing tool for PDF documents and the image processing tool for image receipts.
- The "User Message" is set to the caption of the Telegram message.
Telegram Response:
- Sends a message back to the Telegram chat with the output from the AI agent.
- The "append NADN attribution" option is disabled to remove the default NADN branding from the message.

Building the PDF Processor Workflow

Execute Workflow Trigger:
- This workflow is triggered by the main AI agent.
- Defines input parameters: "ID" (file ID) and "Message" (caption).
Telegram Get File:
- Downloads the PDF file from Telegram using the file ID.
Extract from PDF:
- Extracts text from the PDF file using the "Extract from PDF" node.
OpenAI Message Model:
- Uses OpenAI's GPT model to structure the extracted text into JSON data.
- The "System Message" defines the AI agent as an intelligent bot capable of extracting data from invoices, receipts, or documents.
- The "User Message" provides the extracted text to the AI agent.
- The "Assistant Message" defines the desired JSON data format, including fields like invoice number, invoice date, due date, billing info, and line items.
- Sample JSON data is generated using ChatGPT to define the structure.
Split Out:
- Splits the line items from the JSON data into individual items for processing.
Google Sheets Append:
- Appends each line item to a Google Sheet.
- Connects to the Google Sheet using credentials.
- Selects the target spreadsheet and sheet.
- Maps the extracted data fields (name, price, quantity, invoice number, date, description) to the corresponding columns in the Google Sheet.

Building the Image Processor Workflow

Execute Workflow Trigger:
- This workflow is triggered by the main AI agent.
- Defines input parameters: "ID" (file ID) and "Message" (caption).
Telegram Get File:
- Downloads the image file from Telegram using the file ID.
Edit Image:
- Uses the "Edit Image" node to composite the image, effectively changing the MIME type to a format compatible with OpenAI.
OpenAI Analyze Image:
- Analyzes the image using OpenAI to extract text.
- The input type is set to "binary," and the data is pulled from the "Edit Image" node.
OpenAI Message Model:
- Uses OpenAI's GPT model to structure the extracted text into JSON data, similar to the PDF Processor workflow.
Split Out:
- Splits the line items from the JSON data into individual items for processing.
Google Sheets Append:
- Appends each line item to a Google Sheet, similar to the PDF Processor workflow.

Key Arguments and Perspectives

AI Agents as Intelligent Automation Tools: The video emphasizes the power of AI agents in automating complex tasks like document processing.
Importance of Structured Data: The video highlights the need to transform unstructured data (raw text) into structured data (JSON) for efficient analysis and reporting.
Flexibility and Customization: The demonstrated workflow can be adapted to handle various document types and integrate with different applications (e.g., QuickBooks, accounting software).

Conclusion

The video provides a detailed guide on building an AI agent that automates invoice and receipt processing using NADN, Telegram, and Google Sheets. By leveraging OCR and AI, the agent extracts data from PDF documents and images, structures it into JSON format, and uploads it to a Google Sheet for further analysis. The free blueprints enable users to easily implement this solution in their own businesses. The speaker also promotes his school community for additional resources and support on AI automation.