This AI video generator does it all!

By AI Search

AI Video Generation · AI Model Comparison · AI Video Editing · AI Content Creation

Key Concepts

  • Cling01: A unified multimodal AI video model capable of processing text, images, and video inputs simultaneously for editing and generation.
  • Cling 2.6: Cling's latest model with native audio integration, excelling in high-action scenes and image-to-video generation.
  • Unified Multimodal Model: An AI model that can understand and process multiple types of data (text, images, video) concurrently.
  • Element Feature (Cling01): A function allowing users to create fine-tuned models of specific characters or objects for consistent insertion into videos.
  • Image-to-Video: The process of generating a video sequence from a static image.
  • Text-to-Video: The process of generating a video sequence from a text description.
  • Jiggle Physics: The simulation of realistic movement and deformation in soft objects or characters, particularly in animation.
  • ControlNet: A technique used in AI image generation to guide the output based on structural information from a reference image.
  • Nano Banana: Mentioned as a tool for generating reference images, possibly for use with Cling01.
  • Artlist: A platform for video creators offering AI models, music, sound effects, and digital assets.

Cling01: A Unified Multimodal Video Model

Cling has released Cling01, a powerful and flexible unified multimodal model. This model can accept any combination of text, images, and video as input simultaneously.

Key Capabilities and Features:

  • Versatile Editing: Cling01 can perform various editing tasks, including:
    • Replacing backgrounds.
    • Replacing characters.
    • Inserting new characters or objects from reference photos.
    • Changing the style of a video (e.g., to anime style).
  • Input Flexibility: Users can drag and drop reference images of characters or objects, merge them into a video, and prompt for stylistic changes.
  • Video Editing: Existing videos can be edited by prompting for specific changes. For example, a statue can be replaced with a Christmas tree.
  • Prompt-Based Editing: The model understands content within images or videos and edits them according to user prompts.
  • Object Transformation: Existing objects within a video can be transformed, such as turning a sword into a flaming blade, eliminating the need for manual editing.
  • Background Swapping: Cling01 can seamlessly swap backgrounds in existing videos while preserving original elements like characters and objects.
  • Green Screen Generation: The background of a video can be converted into a green screen for later post-processing.
  • Character Generation and Merging:
    • Characters from two separate reference images can be combined into one scene, such as dancing and singing on stage, with outfits consistent with the references.
    • Characters from different art styles can be merged seamlessly.
  • Character Replacement and Insertion:
    • A character in a video can be replaced with another character (e.g., replacing a girl with an assassin) while maintaining the original pose and interaction with the environment.
  • Multi-Angle Scene Generation: From an original video, Cling01 can generate new shots from different angles (e.g., from behind or a front view), preserving the overall look of the original scene.
  • Product Video Generation: Cling01 can create product videos from reference images, placing the product in specified environments with dynamic elements like falling petals.
  • Product Consistency: The model excels at maintaining product consistency when replacing items in a video with new reference images, preserving details like logos and stitching.
  • Element Feature for Custom Assets:
    • Users can create custom "elements" which are fine-tuned models of characters or objects.
    • This involves uploading a frontal image and then having Cling generate additional views at different angles.
    • These elements can then be named and described with text prompts.
    • Custom elements can be plugged into any video, allowing for consistent insertion of specific characters or objects.
    • Real-world Application: An influencer element can be used to generate marketing or UGC content: the influencer can be shown wearing specific clothing, holding a product, and discussing it in a given setting.
  • Complex Scene Merging: Cling01 can merge multiple assets (video, elements, images) into a single video, inserting characters or objects into existing videos while maintaining motion and original video elements.
  • Style Transfer: Videos can be transformed into different styles, such as anime or 3D Pixar style.

Limitations of Cling01:

  • Artifacts: Some generations may exhibit artifacts, such as poorly defined hands and fingers and less detailed faces.
  • Expression Inconsistencies: In style transfer, character expressions might not always align with the intended mood (e.g., talking when not supposed to, smiling when a serious face is expected).
  • Color Tone Preservation: Style transfer generations might appear overly saturated and not preserve the original color tone.

Usage and Access:

  • Cling01 can be accessed via a new "O" icon on the left menu after signing up for a free account.
  • It can generate videos up to 10 seconds long in various aspect ratios.
  • Credit System:
    • Cling01 costs 30 credits for generations without uploading a video.
    • It costs 45 credits when a reference video is uploaded.

Cling 2.6: Advanced Model with Native Audio

Cling has also released Cling 2.6, their latest and most advanced model, featuring native audio integration.

Key Capabilities and Features:

  • Text-to-Video and Image-to-Video: Cling 2.6 supports both text-to-video and image-to-video generation.
  • High-Action Scenes: The model excels at generating high-action cinematic scenes with dynamic camera movements and impressive physics.
    • Example: A sorceress casting fireballs clashing with icy dragons, with explosive shockwaves and dynamic camera pans.
  • Anatomy and Physics: Cling 2.6 demonstrates remarkable accuracy in anatomy and physics, even in complex movements.
    • Example: A gymnast performing a flip on a balance beam, with anatomically correct execution. This is highlighted as a difficult task that few other models can handle.
  • Detailed Scene Generation: The model can follow prompts with a high level of detail, accurately depicting complex scenarios.
    • Example: A courier escaping a warehouse with a security guard in pursuit, including specific environmental details like flickering shadows and crackling radios.
  • Dialogue and Lip-Sync: Cling 2.6 can generate videos with spoken dialogue and achieve seamless lip-sync.
    • Example: A podcast interview with accurate lip-sync and coherent dialogue, though sometimes with unexpected accents. The performance is considered on par with models like Veo and Sora.
  • Image-to-Video Excellence: Cling has a strong reputation for image-to-video generation, and Cling 2.6 continues this trend, often ranking high on leaderboards.
    • Example: Generating a video from an anime image with spoken dialogue, though it may translate the specified language to English.
    • Example: Generating a video from an image of Godzilla destroying a city, maintaining consistency and coherence.
  • Jiggle Physics: The model demonstrates impressive "jiggle physics," producing realistic movement in soft objects or characters.
  • Audio Generation: Native audio integration enhances the realism of generated scenes.
  • K-Pop/J-Pop Generation (with caveats): When prompted for specific music genres such as K-pop, the model may default to singing in English, and the coherence of hands and faces can be inconsistent.

Limitations of Cling 2.6:

  • Text Rendering: Cling models, including 2.6, are known to perform poorly at rendering legible text within videos. Prompts involving text on whiteboards or signs often result in gibberish.
  • World Understanding and Diagrams: The model may understand concepts (like the Pythagorean theorem) but struggle to accurately render associated diagrams and text on visual elements like whiteboards.
  • Language Translation: In image-to-video, the model may translate specified foreign language dialogue into English, even when not prompted to do so.
  • Character Inconsistencies: While good at anatomy, specific character details or poses can sometimes be inconsistent or unusual (e.g., a character running diagonally from a dragon).
  • Audio Quality: While audio is natively built-in, the quality can sometimes be described as "weird" in certain scenarios.
  • Resolution: Currently outputs 1080p resolution videos, not yet supporting 2K or 4K like some competitors.

Usage and Access:

  • Cling 2.6 can be accessed via the "video" icon on the left menu, selecting "video 2.6" from the dropdown.
  • Generations are 5 to 10 seconds long with various aspect ratios.
  • Credit System: Each generation from Cling 2.6 costs 35 credits.
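The credit pricing described for both models (30 credits per Cling01 generation, 45 when a reference video is uploaded, and a flat 35 credits per Cling 2.6 generation) can be summarized in a small helper. The sketch below is purely illustrative and is not an official Cling API; the numbers come from the figures quoted above.

```python
# Illustrative sketch of the per-generation credit costs described above.
# The prices are taken from the video; this is not an official Cling API.

def generation_cost(model: str, uses_reference_video: bool = False) -> int:
    """Return the credit cost of a single generation."""
    if model == "cling01":
        # Cling01: 30 credits normally, 45 when a reference video is uploaded.
        return 45 if uses_reference_video else 30
    if model == "cling2.6":
        # Cling 2.6: flat 35 credits per generation.
        return 35
    raise ValueError(f"unknown model: {model}")

def budget_allows(credits: int, jobs: list[tuple[str, bool]]) -> bool:
    """Check whether a credit balance covers a batch of planned generations."""
    return sum(generation_cost(m, ref) for m, ref in jobs) <= credits
```

For example, a 100-credit balance covers one Cling01 edit with a reference video plus one Cling 2.6 generation (45 + 35 = 80 credits), but not two of each.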

Comparison with Competitors

  • Cling 2.6 vs. Competitors: Cling 2.6 is presented as a "clear winner" in high-action cinematic scenes with audio.
  • Anatomy: Cling and Hailuo are noted for their ability to generate anatomically correct complex movements, surpassing models like Veo and Sora in specific instances.
  • Fictional Characters: Sora 2 is highlighted as being particularly good at understanding and generating existing fictional characters.
  • LTX2: Mentioned as a competitor that can produce 2K or 4K resolution videos, which Cling 2.6 currently does not.
  • Veo and Sora: Cling 2.6 is considered on par with these models for speaking and lip-sync.

Artlist Integration

  • Artlist is promoted as a comprehensive platform for video creators, integrating AI models like Nano Banana Pro and the latest Cling models (01 and 2.6) with curated music, sound effects, and digital assets.
  • It allows users to generate videos from prompts using these AI models directly within the platform.
  • Example: Generating a scene of kung fu masters fighting on a rooftop with rain and lightning using Cling01.

Future Outlook and Conclusion

  • The release of Cling01 and Cling 2.6 marks a significant advancement in AI video generation and editing.
  • The trend towards unified multimodal models that can process diverse inputs and allow for intuitive, prompt-based manipulation is seen as the future of AI video.
  • December is anticipated to be a "crazy month" for AI video releases, with more state-of-the-art generators expected.
  • Both Cling01 and Cling 2.6 are available for free trial with a limited number of credits.
  • The presenter encourages viewers to share their experiences and discoveries with these models.
  • A weekly newsletter is recommended for staying up-to-date with AI news and tools.
