Gemini 3.0 Pro (Early Checkpoint - Tested): OH MY GOD! IT'S #1 & This IS THE CRAZIEST SOTA Model!

Key Concepts

Gemini 3 Pro Checkpoint: An early, highly performant version of Google's Gemini 3 Pro model, accessible via an A/B test.
A/B Test: A method used by Google AI Studio to randomly serve different model versions (e.g., Gemini 2.5 Pro, Gemini 3.0 Flash, Gemini 3.0 Pro) to users.
Checkpoint ID (2HT): An identifier in network logs; IDs starting with "2HT" indicate the Gemini 3.0 Pro checkpoint.
One-shot Generations: The ability of the model to produce high-quality output on the first attempt without further prompting or refinement.
Thinking Variant/Model: An AI model that performs explicit reasoning steps, potentially indicated by longer first-token generation times.
Multimodal: The capability of an AI model to process and generate various types of data, such as text, images, and code.
Tool Calling: The ability of an AI model to effectively integrate and utilize external tools or APIs.
Leaderboard Performance: A ranking system used by the speaker to compare AI model capabilities, where Gemini 3 Pro achieved the top position.

Accessing and Testing Gemini 3 Pro

A checkpoint of Gemini 3 Pro is currently accessible through an A/B test within Google's AI Studio. Users can encounter it by selecting Gemini 2.5 Pro and sending messages; occasionally, the A/B test serves a response from either Gemini 3.0 Flash or Gemini 3.0 Pro instead. The presence of Gemini 3.0 Pro can be verified by checking the network logs for a checkpoint ID starting with "2HT".
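The network-log check can be sketched as a scan over a HAR export from the browser's developer tools. This is a minimal sketch under stated assumptions: the source only says the ID starts with "2HT", so the regular expression, the idea that the ID appears in a response body, and the sample capture below are all hypothetical. Base64-encoded bodies are ignored for brevity.

```python
import json
import re

# Assumption: the checkpoint ID appears as a token beginning with "2HT"
# somewhere in a response body. The source does not name the field.
ID_PATTERN = re.compile(r"\b2HT[A-Za-z0-9_-]*")


def find_checkpoint_ids(har: dict) -> set[str]:
    """Collect every token starting with "2HT" from the response bodies of a HAR capture."""
    found: set[str] = set()
    for entry in har.get("log", {}).get("entries", []):
        # HAR stores the response body under response.content.text (may be absent).
        body = entry.get("response", {}).get("content", {}).get("text") or ""
        found.update(ID_PATTERN.findall(body))
    return found


# Hypothetical capture with one matching response, one non-matching, one empty.
sample_har = {
    "log": {
        "entries": [
            {"response": {"content": {"text": '{"model": "2HT-demo-123"}'}}},
            {"response": {"content": {"text": "no id here"}}},
            {"response": {"content": {}}},
        ]
    }
}

ids = find_checkpoint_ids(sample_har)
print(ids)
```

For a real capture, load the exported file with `json.load(open("capture.har"))` (the filename is a placeholder) and pass the result to `find_checkpoint_ids`.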

The speaker conducted extensive testing on this model using 13 general questions. A significant challenge was the rarity of encountering the correct Gemini 3.0 Pro checkpoint, which appeared only approximately once in 50 prompts. Despite this, the testing revealed exceptionally high performance.
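To get a feel for what "once in 50 prompts" means in practice, the arithmetic can be sketched as follows. It assumes each prompt independently has a 1-in-50 chance of hitting the checkpoint, which is a simplification of however the A/B test actually assigns models; the function names are illustrative.

```python
def expected_prompts(hits_needed: int, one_in: int = 50) -> int:
    """Expected number of prompts to collect `hits_needed` responses
    when each prompt hits the checkpoint with probability 1/one_in."""
    return hits_needed * one_in


def prob_at_least_one_hit(prompts: int, one_in: int = 50) -> float:
    """Probability of seeing the checkpoint at least once in `prompts` tries."""
    miss = 1 - 1 / one_in
    return 1 - miss ** prompts


# One checkpoint answer for each of the 13 test questions.
print(expected_prompts(13))
# Chance of at least one hit after 50 prompts (roughly 64%, not 100%).
print(round(prob_at_least_one_hit(50), 3))
```

Under this assumption, collecting one checkpoint answer for each of the 13 questions takes about 650 prompts on average, and even 50 prompts leave roughly a one-in-three chance of never seeing the checkpoint.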

Generative Capabilities and Performance Benchmarks

The Gemini 3 Pro checkpoint demonstrated remarkable generative capabilities across a diverse range of tasks:

Floor Plan Generation: Described as "one of the most insane generations" and "the most sensible generation I've seen so far from any model." It correctly placed the entry, living room, kitchen, dining area, and door spaces. The only minor flaw noted was the placement of a washroom at the front of a room.
SVG Panda with a Burger: Produced a "pretty great" image with an "amazingly good" burger and realistic interaction from the panda.
Pokeball Built with Three.js: Considered "one of the best, if not the best generations I have seen yet," praised for its visual quality and effective lighting. These were all one-shot generations.
Autoplay Chess Game: Stood out as the "first model that I have seen that doesn't use the purple and blue colors," suggesting highly curated training data. It also uniquely generated a "proper chessboard with pieces placed at top when a piece is eliminated."
Minecraft Game in Kandinsky Style: Achieved the "best generation for this prompt" seen by the speaker, with excellent visual quality and a "great FPS," indicating strong performance.
Butterfly Flying in the Garden Simulation: While "really good," it was noted as "not the best generation," with GPT-5 performing better on this prompt. The speaker believed better results could be achieved with more attempts.
CLI Tool for Image Conversion: Performed "good but not the best."
Blender Script for Making a Pokeball: This generation was particularly impressive, as the model not only created the Pokeball but also "set up the lighting and the camera," with "lighting being reflected correctly." The speaker stated, "Only Opus is at the level of being this good. And this is even better than that."

Problem-Solving and Reasoning

Beyond generation, the model excelled in problem-solving:

AIME Questions and Riddles: It "aced every one of them." For one specific AIME question, Gemini 3 Pro succeeded in "one shot," a task that "even takes GPT-5 three or four tries." Sonnet models reportedly failed this question, even with "Max thinking." A simple riddle was also "obviously crushed" by Gemini 3 Pro.

Comparative Analysis and Leaderboard Position

The Gemini 3 Pro checkpoint took the number one position on the speaker's leaderboard, with a "wide margin of a 25% improvement over Sonnet 4.5." The speaker describes it as a major step up, potentially "a real upgrade after 3.5 Sonnet."

Future Implications and Pricing Expectations

The speaker speculates that this checkpoint represents the Pro model of Gemini 3. The exact checkpoint might never ship (the speaker points to a past "Zenith checkpoint of GPT-5" that was never released), but its current performance suggests a highly capable version.

The speaker expressed willingness to "gladly pay the price that Sonnet charges" for this model, believing it to be "genuinely better." The model is likely a thinking variant: responses in the A/B test showed a noticeable delay before the first token, even though no explicit "thinking traces" were displayed.
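The first-token delay mentioned above can be measured for any streaming response. The sketch below is generic and does not call a real API: `fake_stream` is a hypothetical stand-in that sleeps before yielding, imitating a model that reasons silently before answering.

```python
import time
from typing import Iterable, Iterator


def time_to_first_token(stream: Iterable[str]) -> tuple[float, str]:
    """Return (seconds until the first chunk arrived, the first chunk)."""
    start = time.perf_counter()
    first = next(iter(stream))
    return time.perf_counter() - start, first


def fake_stream(thinking_delay: float) -> Iterator[str]:
    """Hypothetical stream: waits `thinking_delay` seconds, then emits chunks."""
    time.sleep(thinking_delay)
    yield "Hello"
    yield " world"


latency, first_chunk = time_to_first_token(fake_stream(0.05))
print(f"first chunk {first_chunk!r} after {latency:.3f}s")
```

A long first-token delay is only a hint. Network latency and server queueing can produce the same effect, so comparing several runs against a known non-thinking model gives a fairer picture.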

Regarding pricing, the speaker calculated the cost based on token counts at Sonnet's rate, concluding that it "aces the price to performance chart." While it could potentially be a Gemini 3 Ultra model, the speaker believes Ultra models have been discontinued, making the Pro version more likely.
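The cost estimate described above is a simple per-token calculation. In the sketch below, the rates of $3 per million input tokens and $15 per million output tokens are an assumption standing in for "Sonnet's rate," and the token counts are made up for illustration; the source gives neither.

```python
# Assumed rates in USD per million tokens. Check the provider's current
# price list before relying on these numbers.
INPUT_RATE_PER_M = 3.0
OUTPUT_RATE_PER_M = 15.0


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a run at the assumed per-million-token rates."""
    return (
        input_tokens / 1_000_000 * INPUT_RATE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    )


# Hypothetical token counts for a 13-question test run.
print(f"${estimate_cost(20_000, 150_000):.2f}")
```

Dividing a leaderboard score by this cost gives one point on a price-to-performance chart of the kind the speaker refers to.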

Broader Impact on Google's AI Ecosystem

The speaker highlights ongoing improvements across Google's AI products, including Gemini CLI, Jules, and AI Studio's app generator, all receiving "great upgrades." The overall sentiment is that Google's offerings are "superior as a product compared to Anthropic's or OpenAI's offerings." The speaker concludes that "the only thing limiting these products right now is the model, and Gemini 3 will really resurrect all of them by a lot." The speaker also expects Gemini 3 to be multimodal.

Conclusion

The Gemini 3 Pro checkpoint, despite its infrequent appearance in the A/B test, showed very strong performance in both generative tasks and complex problem-solving in the speaker's tests. It produced sensible floor plans and 3D scenes with correct lighting, and it aced challenging reasoning questions, which put it ahead of Sonnet overall and ahead of GPT-5 on most prompts (GPT-5 still did better on the butterfly simulation). This early glimpse suggests that Gemini 3 Pro could be a substantial upgrade for Google's AI ecosystem, provided a model of this quality is what actually ships.