I Built an AI Photo-Generating Project in 8 Hours
By Harkirat Singh
Key Concepts
- Image Generation Platform: A SaaS platform for generating personalized images using AI models.
- Model Training: The process of customizing an AI model with user-provided images to generate images that resemble the user.
- Prompts: Textual descriptions used to guide the AI model in generating specific images.
- Packs: Pre-defined sets of prompts designed for specific themes or use cases (e.g., Tinder pack, Valentine's Day pack).
- Tech Stack: Node.js (backend), Next.js (frontend), Clerk (authentication), file.ai (model hosting), PostgreSQL (database), Prisma (ORM), Bun (runtime), DigitalOcean (deployment).
- File.ai & Replicate: Platforms that offer APIs for training and running AI models, including image generation models.
- Webhooks: HTTP callbacks that allow external services (e.g., file.ai) to notify the application about events such as model training completion.
- Pre-signed URLs: Secure URLs generated by the backend that allow clients to upload files directly to object storage (e.g., S3, Bunny) without exposing credentials.
- Strategy Pattern: A design pattern used to enable easy swapping of different AI model providers (e.g., file.ai, Replicate, self-hosted).
- JWT (JSON Web Token): A standard for securely transmitting information between parties as a JSON object. Used here for authentication.
Main Topics and Key Points
1. Overview of Photo AI Clone
- The video aims to clone photoai.com, a SaaS platform that earns about $122k/month (roughly 1 crore INR) by letting users train AI models on their faces and generate images.
- Key features to replicate:
- User-friendly interface for model training (uploading photos, specifying attributes).
- Image generation based on prompts.
- Pre-defined "packs" with curated prompts for specific use cases.
- The target audience is "dumb" users who want to generate images without understanding the underlying technology.
2. Technology Stack and Architecture
- Frontend: Next.js, Shadcn UI (component library), Tailwind CSS (styling).
- Backend: Node.js or Bun (runtime), PostgreSQL (database), Prisma (ORM).
- Authentication: Clerk (outsourced authentication).
- Model Hosting: file.ai (initially), with potential for self-hosting later.
- Object Storage: S3 or Bunny (for storing training images and generated images).
- Deployment: DigitalOcean (simple VM).
- The architecture is designed to be provider-agnostic, allowing easy swapping of model providers (file.ai, Replicate, self-hosted).
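One way to picture the provider-agnostic design is a strategy interface that each model provider implements. The sketch below is illustrative only: the interface name, method names, request shape, and the `provider-a.example` / `provider-b.example` URLs are assumptions made for this example, not the API of file.ai, Replicate, or the video's code.

```typescript
// Hypothetical description of an outgoing provider call (not a real provider API).
interface ProviderRequest {
  url: string;
  body: Record<string, unknown>;
}

// The strategy interface: every provider describes training and generation calls.
interface ModelProvider {
  readonly name: string;
  buildTrainRequest(zipUrl: string, triggerWord: string, webhookUrl: string): ProviderRequest;
  buildGenerateRequest(prompt: string, modelRef: string, webhookUrl: string): ProviderRequest;
}

// Two placeholder strategies with deliberately different request shapes.
class ProviderA implements ModelProvider {
  readonly name = "provider-a";
  buildTrainRequest(zipUrl: string, triggerWord: string, webhookUrl: string): ProviderRequest {
    return {
      url: "https://provider-a.example/train",
      body: { images_zip: zipUrl, trigger_word: triggerWord, webhook: webhookUrl },
    };
  }
  buildGenerateRequest(prompt: string, modelRef: string, webhookUrl: string): ProviderRequest {
    return {
      url: "https://provider-a.example/generate",
      body: { prompt, model: modelRef, webhook: webhookUrl },
    };
  }
}

class ProviderB implements ModelProvider {
  readonly name = "provider-b";
  buildTrainRequest(zipUrl: string, triggerWord: string, webhookUrl: string): ProviderRequest {
    return {
      url: "https://provider-b.example/v1/trainings",
      body: { input: { zip: zipUrl, token: triggerWord }, callback: webhookUrl },
    };
  }
  buildGenerateRequest(prompt: string, modelRef: string, webhookUrl: string): ProviderRequest {
    return {
      url: "https://provider-b.example/v1/predictions",
      body: { input: { prompt }, version: modelRef, callback: webhookUrl },
    };
  }
}

// Registry used by the routes; swapping providers means changing one key.
const providers = new Map<string, ModelProvider>([
  ["provider-a", new ProviderA()],
  ["provider-b", new ProviderB()],
]);

function getProvider(name: string): ModelProvider {
  const provider = providers.get(name);
  if (provider === undefined) {
    throw new Error(`Unknown model provider: ${name}`);
  }
  return provider;
}
```

With this shape, the /train and /generate handlers depend only on `ModelProvider`, so a self-hosted backend could later be added as a third class without touching the routes.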
3. Detailed Architecture and Endpoints
- Training Endpoint (/train):
- Receives a list of image URLs (uploaded to object storage by the client).
- Creates a zip file of the images.
- Uploads the zip file to object storage.
- Forwards the zip file URL to file.ai for model training.
- Uses a webhook to receive training completion notifications from file.ai.
- Image Generation Endpoint (/generate):
- Receives a prompt and model ID.
- Forwards the prompt and model ID to file.ai for image generation.
- Uses a webhook to receive image generation completion notifications from file.ai.
- Other Endpoints:
- /images: Load images with pagination (limit, offset).
- /generate-pack: Generate multiple images from a pre-defined pack.
- /packs: Get all available packs.
- /image: Get a specific image.
- Asynchronous Architecture:
- Model training is handled asynchronously using webhooks due to its long duration.
- Image generation can be implemented using polling or webhooks.
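The webhook-driven training flow can be sketched as a small state machine: /train records a pending job, and the webhook route later marks it as generated or failed. This is a minimal in-memory sketch under stated assumptions: the payload fields (`requestId`, `ok`, `tensorPath`) and the idea of storing a provider request ID for training jobs are hypothetical, and a real implementation would persist to the database and verify that the webhook actually came from the provider.

```typescript
type JobStatus = "pending" | "generated" | "failed";

interface TrainingJob {
  modelId: string;
  requestId: string; // assumed: ID the provider returns when training is queued
  status: JobStatus;
  tensorPath?: string;
}

// In-memory stand-in for the Models table, keyed by provider request ID.
const jobsByRequestId = new Map<string, TrainingJob>();

// Called by /train once the provider has accepted the job.
function recordTrainingStarted(modelId: string, requestId: string): TrainingJob {
  const job: TrainingJob = { modelId, requestId, status: "pending" };
  jobsByRequestId.set(requestId, job);
  return job;
}

// Hypothetical webhook payload; real providers define their own shape.
interface TrainingWebhookPayload {
  requestId: string;
  ok: boolean;
  tensorPath?: string;
}

// Called by the webhook route. Returns the HTTP status code to respond with.
function handleTrainingWebhook(payload: TrainingWebhookPayload): number {
  const job = jobsByRequestId.get(payload.requestId);
  if (job === undefined) {
    return 404; // no job was recorded for this request ID
  }
  if (job.status !== "pending") {
    return 200; // already processed: repeated deliveries are ignored
  }
  if (payload.ok && payload.tensorPath !== undefined) {
    job.status = "generated";
    job.tensorPath = payload.tensorPath;
  } else {
    job.status = "failed";
  }
  return 200;
}
```

Treating repeated deliveries as no-ops matters because webhook senders commonly retry, and a retry should not overwrite a finished job.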
4. Database Schema
- Users: ID, username, profile picture (optional).
- Models: ID, name, type (enum: man, woman, others), age, ethnicity (enum), eye color (enum), bald (boolean), zip URL, trigger word (optional), tensor path (optional), status (enum: pending, generated, failed), user ID.
- Output Images: ID, image URL, prompt, status (enum: pending, generated, failed), model ID, user ID, file.ai request ID, created at, updated at.
- Packs: ID, name.
- Pack Prompts: ID, prompt, pack ID.
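To show how the Pack Prompts and Output Images tables relate, the sketch below mirrors those two tables as TypeScript types and expands a pack into one pending Output Images row per prompt, which is roughly what /generate-pack has to do before calling the provider. The field names, the placeholder ID scheme, and the `expandPack` helper are assumptions for illustration; the video's actual Prisma schema may differ.

```typescript
type ImageStatus = "pending" | "generated" | "failed";

// Mirrors the Pack Prompts table from the schema above.
interface PackPrompt {
  id: string;
  prompt: string;
  packId: string;
}

// Mirrors the Output Images table from the schema above.
interface OutputImage {
  id: string;
  imageUrl: string; // empty until the provider reports completion
  prompt: string;
  status: ImageStatus;
  modelId: string;
  userId: string;
  providerRequestId: string | null; // set once the provider accepts the job
  createdAt: Date;
  updatedAt: Date;
}

// Hypothetical helper: one pending output row per prompt in the chosen pack.
function expandPack(
  packId: string,
  packPrompts: PackPrompt[],
  modelId: string,
  userId: string,
  now: Date = new Date()
): OutputImage[] {
  return packPrompts
    .filter((p) => p.packId === packId)
    .map((p, index): OutputImage => ({
      // Placeholder ID; a database would normally generate this.
      id: `${modelId}:${packId}:${index}`,
      imageUrl: "",
      prompt: p.prompt,
      status: "pending",
      modelId,
      userId,
      providerRequestId: null,
      createdAt: now,
      updatedAt: now,
    }));
}
```

In a Prisma-backed version, the returned rows would typically be written in one batch and each row updated later when its generation webhook arrives.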
5. Implementation Details
- Frontend:
- Uses Shadcn UI components for a consistent look and feel.
- Implements image upload using pre-signed URLs to S3-compatible storage (Cloudflare R2).
- Uses JSZip to create zip files of images in the browser.
- Uses Clerk for authentication.
- Backend:
- Uses file.ai's API for model training and image generation.
- Implements webhooks to receive notifications from file.ai.
- Uses Prisma for database access.
- Uses JWT for authentication.
- Authentication:
- Clerk is used for user authentication.
- JWTs are used to authenticate requests to the backend.
- A middleware is used to verify JWTs and extract the user ID.
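The middleware step above can be sketched as a pure function that pulls the bearer token out of the Authorization header, hands it to a verifier, and returns the user ID. Everything here is a hedged illustration: `authenticate`, `TokenVerifier`, and the result shape are hypothetical names, signature verification is delegated to an injected function (in practice a JWT library checking the token against Clerk's public key), and reading the user ID from the `sub` claim is an assumption about the token layout.

```typescript
// Claims this sketch cares about; real tokens carry more fields.
interface TokenClaims {
  sub?: string; // assumed to hold the user ID
  exp?: number; // expiry as seconds since the Unix epoch
}

// Injected verifier: must check the signature and throw if the token is invalid.
type TokenVerifier = (token: string) => TokenClaims;

interface AuthResult {
  status: 200 | 401;
  userId?: string;
  error?: string;
}

function authenticate(
  authorizationHeader: string | undefined,
  verify: TokenVerifier,
  nowSeconds: number = Math.floor(Date.now() / 1000)
): AuthResult {
  const prefix = "Bearer ";
  if (authorizationHeader === undefined || !authorizationHeader.startsWith(prefix)) {
    return { status: 401, error: "Missing bearer token" };
  }
  const token = authorizationHeader.slice(prefix.length).trim();

  let claims: TokenClaims;
  try {
    claims = verify(token);
  } catch (err) {
    return { status: 401, error: "Invalid token" };
  }

  // Reject expired tokens even if the verifier did not check expiry itself.
  if (claims.exp !== undefined && claims.exp <= nowSeconds) {
    return { status: 401, error: "Token expired" };
  }
  if (claims.sub === undefined || claims.sub === "") {
    return { status: 401, error: "Token has no subject" };
  }
  return { status: 200, userId: claims.sub };
}
```

An Express-style middleware would call this with `req.headers.authorization`, attach `userId` to the request on success, and respond with the returned status otherwise.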
6. Code Examples and Technical Terms
- Zod: A TypeScript-first schema declaration and validation library. Used to define and validate the structure of API requests.
- Prisma: A next-generation ORM for Node.js and TypeScript. Used to interact with the database.
- z.enum: Zod function to create an enum schema.
- z.object: Zod function to create an object schema.
- z.infer: Zod type utility to infer the TypeScript type from a schema.
- PrismaClient: The generated client from Prisma that allows you to interact with your database.
- createMany: Prisma function to create multiple records in a table.
- findUnique: Prisma function to find a single record in a table by a unique field.
- updateMany: Prisma function to update multiple records in a table.
- getSignedUrl: Function from the @aws-sdk/s3-request-presigner package to generate a pre-signed URL for uploading to S3.
- JSZip: A JavaScript library for creating, reading, and editing .zip files.
7. Logical Connections
- The video starts with the business opportunity and then breaks down the technical requirements.
- The architecture section logically leads into the database schema design.
- The backend implementation is driven by the need to interact with file.ai's API and handle webhooks.
- The frontend implementation is driven by the need to provide a user-friendly interface for model training and image generation.
- Authentication is integrated into both the frontend and backend to secure the application.
8. Data, Research Findings, or Statistics
- photoai.com generates $122k/month in revenue.
- The $50 plan on photoai.com allows users to generate 1,000 images.
- Replicate charges $32 for generating 1,000 images.
- file.ai charges $30 for generating 1,000 images.
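The figures above imply a rough gross margin per 1,000 images, which the sketch below works out. It uses only the numbers listed in this section and ignores model training, storage, payment fees, and any other costs, so it is a back-of-the-envelope calculation rather than a real pricing model; `grossMargin` is a hypothetical helper name.

```typescript
// Figures taken from the list above, all per 1,000 generated images.
const PLAN_PRICE_USD = 50;
const REPLICATE_COST_USD = 32;
const FILE_AI_COST_USD = 30;

interface Margin {
  marginUsd: number;     // plan price minus provider cost
  marginPercent: number; // margin as a share of the plan price
}

function grossMargin(planPriceUsd: number, providerCostUsd: number): Margin {
  const marginUsd = planPriceUsd - providerCostUsd;
  // Multiply before dividing to keep the arithmetic on whole numbers.
  const marginPercent = (marginUsd * 100) / planPriceUsd;
  return { marginUsd, marginPercent };
}

const replicateMargin = grossMargin(PLAN_PRICE_USD, REPLICATE_COST_USD);
const fileAiMargin = grossMargin(PLAN_PRICE_USD, FILE_AI_COST_USD);

console.log(replicateMargin); // { marginUsd: 18, marginPercent: 36 }
console.log(fileAiMargin);    // { marginUsd: 20, marginPercent: 40 }
```

On these numbers alone, generation through either provider leaves $18 to $20 of each $50 plan before other costs.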
9. Synthesis/Conclusion
The video provides a detailed walkthrough of cloning photoai.com, covering the technical architecture, database schema, frontend and backend implementation, and deployment. It emphasizes using outsourced services (Clerk, file.ai) to accelerate development and designing a provider-agnostic architecture. It also highlights the challenges of integrating AI models and handling asynchronous operations with webhooks. The final product is a functional image generation platform with user authentication, model training, and image generation capabilities.