The best free AI image model just got better! Z-Image Base
By AI Search
Z-Image Full Model: Comprehensive Review & Tutorial
Key Concepts:
- Z-Image Full: A recently released, full-capacity, undistilled AI image generation model from Alibaba’s Tongyi Lab. Not to be confused with the base model (Z-Image Omni Base).
- Z-Image Turbo: An earlier, faster, but less versatile distilled model from Tongyi Lab.
- LoRA (Low-Rank Adaptation): A small fine-tuned adapter for a specific style, character, or effect, loaded on top of a base model for customization.
- GGUF: A quantized model file format that lets Z-Image run on systems with less VRAM.
- VAE (Variational Autoencoder): Encodes images into, and decodes them out of, the latent space the model works in.
- CFG (Classifier-Free Guidance): A setting controlling how closely the model adheres to the prompt.
- Inpainting: Editing selected areas of an existing image with AI.
- Diffusion Model: The core technology behind image generation, gradually refining noise into an image.
- Flux 2: Another open-source image generation model, used for comparison.
- Qwen-Image: An open-source model particularly strong at rendering long text within images.
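The diffusion concept in the list above can be sketched with a toy loop. This is purely illustrative pure Python, not Z-Image's sampler: a real diffusion model uses a trained neural network to predict the noise to remove at each step.

```python
import random

def toy_denoise(steps: int, seed: int) -> list[float]:
    """Toy 1-D 'diffusion': start from pure noise and repeatedly
    nudge each value toward a target 'image' (here, all zeros).
    A real model predicts the noise with a neural network."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(8)]  # initial random noise
    for step in range(steps):
        # Pretend the model predicts the remaining noise exactly,
        # and remove a proportional fraction of it each step.
        x = [v * (1.0 - 1.0 / (steps - step)) for v in x]
    return x

result = toy_denoise(steps=30, seed=42)
print(max(abs(v) for v in result))  # the noise has been fully refined away
```

More steps give the model more chances to refine the image, which is why the undistilled Full model wants a higher step count than Turbo.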
1. Introduction & Model Overview
The video focuses on the newly released “Z-Image Full” model from Alibaba’s Tongyi Lab, highlighting its capabilities and comparing it to Z-Image Turbo and other open-source alternatives such as Flux 2. Previously, only Z-Image Turbo was available. Z-Image Full is described as a “raw” model, ideal for fine-tuning and for training LoRAs, unlike the pre-tuned Z-Image Turbo. It generates realistic and diverse images across a wide range of art styles, photography styles, emotions, and poses. The presenter emphasizes the model’s potential for customization and for offline, unlimited use.
2. Z-Image Full vs. Z-Image Turbo: Key Differences
A core comparison is drawn between Z-Image Full and Z-Image Turbo. While Z-Image Turbo generates images quickly, Z-Image Full offers significantly better diversity: rerunning the same prompt with different seeds in Z-Image Turbo often yields very similar results, whereas Z-Image Full produces substantial variation. In one example, a prompt for a selfie of four girls gives nearly identical faces in Z-Image Turbo but four distinct individuals in Z-Image Full.
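The seed behavior described here comes down to the noise a generation starts from. A minimal standard-library sketch (not Z-Image's actual sampler) of why a fixed seed reproduces an image while fresh seeds should vary it:

```python
import random

def sample_noise(seed: int, n: int = 4) -> list[float]:
    """Draw the initial latent noise for a generation.
    Same seed -> identical starting noise -> (near-)identical image.
    A model with good diversity produces a visibly different image
    for each new seed; Turbo's similar outputs across seeds suggest
    it collapses many different noises onto similar images."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

assert sample_noise(123) == sample_noise(123)   # reproducible
assert sample_noise(123) != sample_noise(124)   # new seed, new noise
```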
Furthermore, Z-Image Full properly supports negative prompts: instructions specifying what should not appear in the image. This feature is far less effective with Z-Image Turbo. Examples show negative prompts shifting a character’s ethnicity (putting “Westerner” in the negative prompt to generate more Asian features) or emotional expression (putting “sad” in the negative prompt to create a happier subject).
A comparison table summarizes:
- Z-Image Full: Needs more steps per generation, offers better diversity, and is better suited for fine-tuning and training LoRAs.
- Z-Image Turbo: Faster generation, tuned for realistic photos and visual aesthetics, with slightly higher out-of-the-box visual quality.
3. Comparative Performance: Z-Image vs. Flux 2
The video presents several comparative tests against Flux 2, another open-source model. Z-Image (both Full and Turbo) is markedly better at recognizing real people and fictional characters. In prompts featuring celebrities such as Anne Hathaway, Jackie Chan, and Messi, Z-Image renders them accurately while Flux 2 fails. Likewise, Z-Image correctly renders anime characters such as Miku, Nezuko, Gojo Satoru, and Sasuke, where Flux 2 struggles.
However, Z-Image Full sometimes produces “plasticky”-looking faces, reducing realism compared to Z-Image Turbo, so Z-Image Turbo is preferred for realistic portraits.
4. Testing Artistic Styles & Prompt Understanding
The presenter tests the models’ ability to handle various artistic styles:
- Flat Illustration (Dots): Flux 2 performs best, accurately rendering an image composed of dots.
- Manet-Style Impressionism: Z-Image excels, closely mimicking the rough brushstrokes characteristic of Manet’s paintings.
- Minimalist Chinese Watercolor: Both Z-Image models perform well, capturing the abstract brushstroke style; Flux 2’s output is too sharply defined.
- UI Design: All models generate reasonable UI designs, but Flux 2’s output is considered the least aesthetically pleasing.
Tests of prompt understanding reveal:
- Long Text Generation: All models struggle with long, coherent text; Qwen-Image is recommended for this task.
- Specific Challenges: A prompt requiring a specific time on a clock face (11:15) and a completely full wine glass fails consistently across all models.
- Anatomy: The Z-Image models follow anatomy prompts better than Flux 2.
- Bali Beach Scene: Z-Image Full correctly spells “Bali sunset” in the generated image, while Z-Image Turbo and Flux 2 misspell it.
5. Installation & Usage with ComfyUI
The video provides a step-by-step guide to installing and running Z-Image Full locally in ComfyUI, a customizable node-based platform for open-source AI generators.
- Updating ComfyUI: Instructions are given for updating to the latest version.
- Downloading Models: The necessary files are:
- Z-Image BF16 (12 GB): The core Z-Image Full model.
- Qwen3-4B (7.8 GB): The text encoder.
- VAE (327 MB): Encodes and decodes images.
- Loading Models in ComfyUI: The files go into the matching folders inside the ComfyUI directory (diffusion models, text encoders, VAE).
- Workflow Setup: The Z-Image text-to-image workflow is selected within ComfyUI.
- Key Settings: The presenter explains the seed, the step count (30-50 recommended for Z-Image Full), and the CFG scale (3-5 recommended for Z-Image Full), and reiterates the importance of negative prompts.
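The model-placement step above can be sketched as a few shell commands. The folder names follow ComfyUI's standard layout; the file names in the commented lines are placeholders, not the exact download names.

```shell
# Sketch of where the downloaded files go inside a ComfyUI install.
COMFY="$HOME/ComfyUI"   # adjust to your actual install path

mkdir -p "$COMFY/models/diffusion_models" \
         "$COMFY/models/text_encoders" \
         "$COMFY/models/vae"

# Move each download into its folder (placeholder file names):
# mv ~/Downloads/z_image_bf16.safetensors      "$COMFY/models/diffusion_models/"
# mv ~/Downloads/qwen_text_encoder.safetensors "$COMFY/models/text_encoders/"
# mv ~/Downloads/z_image_vae.safetensors       "$COMFY/models/vae/"
```

After restarting (or refreshing) ComfyUI, the files appear in the corresponding loader nodes' dropdown lists.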
6. Running Z-Image Full on Low-VRAM Systems
The presenter addresses running Z-Image Full on systems with limited VRAM by using GGUF (quantized) versions of the model. Smaller GGUF files (e.g., Z-Image Q2_K at about 4 GB) let the model run on lower-end hardware, albeit with some loss of quality. The process involves:
- Downloading a GGUF model.
- Replacing the standard model-loader node with a “Unet Loader (GGUF)” node in ComfyUI.
- Refreshing the model list and selecting the downloaded GGUF file.
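The relationship between quantization level and file size is simple arithmetic, sketched below. The ~6B parameter count is an assumption for illustration, and real GGUF files run somewhat larger than this estimate because some tensors (embeddings, norms) are kept at higher precision.

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough file-size estimate for a quantized model:
    parameters x bits per weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8

# BF16 stores 16 bits per weight: ~6B params -> ~12 GB, matching
# the full-precision file size mentioned above.
print(quantized_size_gb(6, 16))   # 12.0
# A mid-level quant around 4.5 bits per weight cuts that to roughly a quarter:
print(quantized_size_gb(6, 4.5))  # 3.375
```

The trade-off is monotone: fewer bits per weight means a smaller file and lower VRAM use, but more quantization error and thus lower output quality.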
7. Image-to-Image & Inpainting
The video demonstrates how to modify the workflow for image-to-image generation and inpainting:
- Image-to-Image: Replace the empty-latent noise canvas with a loaded image, run it through a VAE Encode node to convert it into latent space, and connect that to the KSampler.
- Inpainting: Load an image, select the area to modify with the mask editor, and apply a “Set Latent Noise Mask” node. The “denoise” setting controls how strongly the masked area is changed.
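The denoise setting can be read as "how many early sampling steps to skip before the input image starts being altered." A conceptual sketch of that mapping (exact behavior varies by sampler implementation; this is not ComfyUI source code):

```python
def img2img_start_step(total_steps: int, denoise: float) -> int:
    """Map a denoise strength in [0, 1] to a starting step.
    denoise = 1.0 starts from pure noise (full regeneration);
    lower values skip early steps, so the input image's broad
    structure survives and only details are reworked."""
    return round(total_steps * (1.0 - denoise))

print(img2img_start_step(30, 1.0))  # 0  -> full generation, input ignored
print(img2img_start_step(30, 0.5))  # 15 -> keep broad structure
print(img2img_start_step(30, 0.2))  # 24 -> only light retouching
```

This is why low denoise values are appropriate for subtle inpainting edits while high values effectively regenerate the masked region from scratch.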
8. LoRAs & Future Developments
The presenter discusses LoRAs and their potential with Z-Image Full. Existing Z-Image Turbo LoRAs are currently incompatible, but Z-Image Full is well suited for training new ones.
- AI Toolkit by Ostris: A platform for training LoRAs, requiring a dataset of labeled images.
- Z-Image Image-to-LoRA (Diff Studio): A new tool that builds a LoRA from just a few input images, offering a faster, though potentially lower-quality, alternative.
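The LoRA idea behind both tools can be sketched in a few lines of NumPy. The sizes here are illustrative, not Z-Image's real layer dimensions:

```python
import numpy as np

# A LoRA replaces a full fine-tune of a weight matrix W with a
# low-rank update W' = W + B @ A, where B is (d, r) and A is (r, d)
# with rank r much smaller than d. Only A and B are trained and
# shipped, which is why LoRA files are tiny next to a 12 GB model.
d, r = 64, 4                     # illustrative sizes
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))      # frozen base weights
A = rng.normal(size=(r, d))      # trained low-rank factor
B = rng.normal(size=(d, r))      # trained (initialized to zero in practice)

W_adapted = W + B @ A            # merged at load time: no extra inference cost
print(np.linalg.matrix_rank(B @ A))  # the update's rank can never exceed r
```

The rank cap is the whole trick: a full fine-tune of this layer would store d*d values, while the LoRA stores only 2*d*r.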
9. Conclusion & Resources
The video concludes with a summary of Z-Image Full’s strengths: its diversity, versatility, and potential for customization. The presenter encourages viewers to experiment with the model and links resources in the description, including:
- The official Hugging Face page.
- ComfyUI installation tutorials.
- Model download links.
- The AI Toolkit by Ostris.
- The Z-Image Image-to-LoRA tool.
- A link to the presenter’s weekly AI newsletter.
Notable Quote:
“Z-Image is a more raw and unpolished model, which you can fine-tune further.” – Highlighting the model’s potential for customization.
This summary aims to provide a detailed and specific overview of the video’s content, preserving the technical language and nuances of the original presentation.