Gemini 3.0 Pro (Lithiumflow & Orionmist - Tested): This Checkpoint of Gemini 3 is PRETTY GOOD!
By AICodeKing
Key Concepts
- Gemini 3: Google's upcoming AI model.
- LM Arena: A platform for testing and comparing AI models.
- Checkpoints: Different versions or stages of an AI model's development.
- Orion Mist: A new AI model checkpoint available on LM Arena, potentially Gemini 3 with grounding/search enabled.
- Lithium Flow: A new AI model checkpoint available on LM Arena, potentially Gemini 3 base model without grounding/search.
- Grounding/Search Tool: The ability of an AI model to access and use external information or perform searches.
- Quantization: A process of reducing the precision of a model's weights, often leading to smaller model sizes and faster inference, but potentially some performance degradation.
- ECPT: A previous AI model checkpoint mentioned for comparison.
- X28 checkpoint / X58 checkpoint: Earlier AI model checkpoints referenced for superior performance in specific areas.
- Tool Calling: The ability of an AI model to interact with external tools or APIs.
- Photogenius AI: A sponsor of the video, an AI-powered creation suite.
- Nano Banana: An AI image generation model available on Photogenius AI.
- VO3: A video generation model available on Photogenius AI.
New AI Model Checkpoints on LM Arena
The video discusses the recent launch of two new AI model checkpoints on LM Arena, which are strongly suspected to be related to Google's Gemini 3. These models are named Orion Mist and Lithium Flow.
- Orion Mist is believed to be the same model as Lithium Flow but with the grounding or search tool enabled, meaning it can pull in information about recent events and other external sources.
- Lithium Flow is considered the base model without the grounding or search functionality.
While not officially confirmed by Google, the performance and characteristics observed align with expectations for Gemini 3 checkpoints.
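For context on what "grounding/search enabled" usually means in practice, below is a minimal sketch using the google-genai Python SDK's Google Search grounding tool. This is only an assumption for illustration: the video does not reference any API, and the model name here is a placeholder.

```python
from google import genai
from google.genai import types

# Client reads the API key from the environment (e.g. GEMINI_API_KEY).
client = genai.Client()

# Enabling the Google Search tool lets the model ground its answer in
# fresh external information instead of relying only on its weights.
response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder model name
    contents="What AI model checkpoints appeared on LM Arena this week?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```

Without the tool in the config, the same call behaves like the ungrounded base model, which is the distinction the video draws between Orion Mist and Lithium Flow.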
Performance Comparison and Observations
The presenter tested Lithium Flow using 11 questions, as it represents the base model. The performance was compared against previous checkpoints, particularly ECPT.
- Floor Plan Generation: The floor plan generation was described as "not anything extraordinary" and "not as good" as previous checkpoints, similar to ECPT.
- SVG Panda Eating a Burger: This generation was rated as "pretty great," with good anatomy and color layout, surpassing ECPT and on par with better previous checkpoints.
- Pokeball Generation: The Pokeball generation was preferred over ECPT, with better colors and lighting. However, it lacked the background elements seen in previous checkpoints.
- Chessboard Generation: The chessboard generation was also considered good and better than ECPT, making good moves.
- 3D Minecraft Game: This generation was rated as "good" and similar to the 2HT checkpoint. It was described as "performant" and better than ECPT, though not as good in lighting as the X28 checkpoint, which is still considered superior.
- Majestic Butterfly Flying in the Garden: This generation was similar to ECPT and considered "pretty good." The environment was not as fleshed out as in the X58 checkpoint.
- Blender Script for Pokeball: This was rated as "pretty great," with correct lighting and functionality.
- General and Math Questions: The model performed "pretty great" on these questions, scoring above ECPT but below the two best previous checkpoints.
Analysis of Model Performance and Quantization
The presenter suggests that these new checkpoints might be more aggressively quantized versions of an existing model, i.e. the weights' precision has been reduced, most likely to make deployment cheaper and faster, which can cause a slight degradation in output quality (a generic sketch of the idea appears after the list below).
- The presenter notes that the endpoints served on LM Arena are typically the same ones that later get deployed to users, and if these are the deployed models, they might be running with "slightly lower thinking budgets."
- The observed performance is considered an improvement over ECPT but not necessarily groundbreaking.
- Some users on Twitter have reported that these models are "nerfed" or not as great as original checkpoints, which the presenter attributes to reliance on shared information rather than direct testing.
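To make the quantization trade-off concrete, here is a generic illustrative sketch (not anything shown in the video) of symmetric int8 quantization with NumPy: memory drops to roughly a quarter of the float32 size, at the cost of a small reconstruction error.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: store int8 values plus one scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float32 weights from the quantized tensor."""
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)  # stand-in weight matrix
q, scale = quantize_int8(w)
error = np.abs(w - dequantize(q, scale)).mean()
print(f"memory: {w.nbytes / 2**20:.0f} MiB -> {q.nbytes / 2**20:.0f} MiB, "
      f"mean abs error: {error:.5f}")
```

Production serving stacks use more sophisticated schemes (per-channel scales, 4-bit formats, calibration data), but the basic trade-off the presenter describes, smaller and faster at a slight quality cost, is the same.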
Comparison with "Flash" Model
There was speculation that these new models might be the "flash" variant. The presenter strongly disagrees, saying they are "in no way flash": they are worse than the first checkpoints, but not degraded down to "flash level" performance.
Call for Official Releases and Tool Calling
The presenter expresses frustration with the continuous release of checkpoints and urges Google to launch the final models and clearly label which checkpoint is being released.
A key concern for the presenter is the model's tool calling capability. While the raw generation capabilities are decent, the ability to interact with external tools is crucial for the many users who plug these models into coding agents and similar tooling. The presenter hopes that the deployed model will have "pristine tool calling."
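To illustrate what is at stake, below is a minimal, framework-agnostic sketch of tool calling: the model emits a structured call, and client code dispatches it to a real function. All names here are hypothetical; the video does not show any specific tool-calling API.

```python
import json

# Hypothetical tools a coding agent might expose to the model.
TOOLS = {
    "read_file": lambda path: open(path).read(),
    "run_tests": lambda target: f"pretend test results for {target}",
}

def dispatch(model_output: str) -> str:
    """Parse a model tool call like {"tool": "run_tests", "args": {"target": "parser"}}
    and execute the matching function. "Pristine" tool calling means the model reliably
    emits well-formed JSON whose arguments match the advertised schema."""
    call = json.loads(model_output)
    fn = TOOLS[call["tool"]]
    return fn(**call["args"])

# Example of handling one structured call returned by a model.
print(dispatch('{"tool": "run_tests", "args": {"target": "parser"}}'))
```

If the model returns malformed JSON or invents argument names, the dispatcher breaks, which is why coding-agent users care about this more than raw generation quality.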
Sponsor: Photogenius AI
The video is sponsored by Photogenius AI, an AI-powered creation suite.
- Features: Allows users to generate visuals from text prompts.
- Key Models: Supports Google's Nano Banana for images and VO3 for videos.
- Image Playground: Offers fast, high-quality image generation with Nano Banana, reference image support, and editing capabilities. It also includes Flux, Stable Diffusion, and Kandinsky.
- Video Playground: Supports Google VO3 with and without reference images, allowing rendering in different styles without complexity.
- 3D Model Generation: Enables users to upload a PNG (e.g., a simple robot) and get a printable 3D model, described as "cheap, quick, and surprisingly clean for rapid prototyping."
- Pricing: Competitive for VO3 and Nano Banana.
- Additional Tools: Includes avatars, background removal, logo, emoji, ads, and app icons.
- Discount: A 30% discount is available with the coupon code king30.
Conclusion and Future Outlook
The presenter finds the new checkpoints to be good and an improvement over ECPT. They are hopeful that the deployed model will be one of these checkpoints and not further degraded. The performance degradation due to quantization is considered justifiable for deployment. The presenter is also excited to see what the actual "flash" model will be.
The video encourages viewers to share their thoughts, subscribe to the channel, and check out the sponsor.
Notable Quotes
- "It doesn't seem that Google is going to launch Gemini 3 anytime soon, at least not this week, because they have now launched two new checkpoints on LM Arena."
- "Orion Mist is supposed to be the same model as lithium flow, but with the grounding or search tool enabled, while lithium flow is the base model without grounding or search enabled."
- "People on Twitter are saying that this model is supposedly not as great as the original checkpoints and a bit more nerfed, which aligns with what we saw in the ECP checkpoint."
- "I've only tested lithium flow because both models have very similar responses and lithium flow is the base model. So, we can test the base model this way."
- "This is great. The lighting of the scene is also good. But the previous checkpoints also used to add a background and stuff as well. So that is not available here."
- "It has been enough checkpoints at this point. It would be better if we just get the final model release now because I'm really getting fed up with all these checkpoints."
- "I really find all these models good and we'll always only get quantized models at deployment. We'll never see the base model. So the performance degradation is justifiable in my mind."
- "I also hope that this is good at tool calling as well because not many people use models for just raw capability. Many people use them with coders and that requires pristine tool calling."