Google, OpenAI and MiniMax Just Dropped Insanely Powerful AI at Once (Shocking Update)

By AI Revolution

Large Language Models AI Software Development AI Agents AI Hardware

Key Concepts

GPT-5.3 Codex Spark: OpenAI’s model optimized for real-time coding assistance, prioritizing speed and responsiveness.
Gemini 3 Deep Think: Google’s specialized reasoning mode focused on complex problem-solving in science, research, and engineering.
MiniMax M2.5: A cost-effective AI model designed for continuous agent operation, emphasizing affordability and practical task completion.
Wafer Scale Engine 3 (WSE-3): Cerebras’ specialized hardware designed for low-latency AI inference.
Test-Time Compute: A technique where a model is given additional computation at inference time, for example to verify intermediate steps, in order to improve reliability.
Reinforcement Learning (RL): A training method where agents learn through trial and error, receiving rewards for desired outcomes.
Agentic Tool Use: The ability of AI agents to utilize various tools and APIs to accomplish tasks autonomously.

OpenAI Codex Spark: Real-Time Coding Assistance

OpenAI has launched Codex Spark, a smaller, faster version of GPT-5.3 Codex engineered for a “tight, real-time loop” during coding. Unlike models focused on long-running tasks, Spark prioritizes immediate responsiveness to maintain developer “flow”: quick edits, refactoring, and UI adjustments with minimal latency. OpenAI emphasizes that speed comes not only from the model itself but also from optimizing the entire request-response pipeline: streamlining client-server communication, improving session initialization, and using persistent WebSocket connections.
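To see why a persistent connection matters in a tight editing loop, here is a minimal sketch of the latency arithmetic. It is not OpenAI’s implementation: the function name and all timing values are hypothetical, chosen only to show how a one-time connection setup cost is amortized across many small requests.

```python
def total_overhead_ms(n_requests, setup_ms, round_trip_ms, persistent):
    """Connection setup plus round-trip time for n requests (model time excluded)."""
    if n_requests <= 0:
        return 0.0
    # A persistent connection pays the setup cost once; otherwise every request pays it.
    setups = 1 if persistent else n_requests
    return setups * setup_ms + n_requests * round_trip_ms


# Hypothetical numbers, for illustration only.
SETUP_MS = 150.0       # assumed handshake / session initialization cost
ROUND_TRIP_MS = 40.0   # assumed network round trip per request
EDITS = 20             # assumed number of small edits in one session

reconnecting = total_overhead_ms(EDITS, SETUP_MS, ROUND_TRIP_MS, persistent=False)
reused = total_overhead_ms(EDITS, SETUP_MS, ROUND_TRIP_MS, persistent=True)

print(f"new connection per request: {reconnecting:.0f} ms")
print(f"one persistent connection:  {reused:.0f} ms")
```

Under these assumed values the overhead is dominated by repeated setup, which is the kind of cost that pipeline work can remove without touching the model.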

Spark runs on Cerebras’ Wafer Scale Engine 3 (WSE-3), a specialized chip built on a single massive silicon wafer containing 4 trillion transistors. The deployment is part of a $10+ billion investment in a “mixed compute future,” where GPUs remain cost-effective for broad usage, while Cerebras provides a “latency-first tier” for immediate interaction.

Spark gives up some raw capability compared to the full GPT-5.3 Codex (58.4% on Terminal-Bench 2.0 vs. 77.3%), but this trade-off is intentional. It is designed for rapid iteration on small steps, not massive engineering projects. Spark is currently available to ChatGPT Pro users within the Codex app, CLI, and VS Code extension, with a 128k context window and text-only input. Usage doesn’t count against standard rate limits during the research preview.

Google Gemini 3 Deep Think: Advanced Reasoning and Sketch-to-3D

Google’s Gemini 3 Deep Think is positioned at the opposite extreme: a model focused on “practical use in science, research, and engineering.” It excels at handling incomplete data, fuzzy constraints, and uncertain problem-solving scenarios. Google highlights Deep Think’s performance on several benchmarks: 48.4% on Humanity’s Last Exam (HLE), 84.6% on ARC-AGI-2, a 3,455 Elo rating on Codeforces, and gold-medal performance on the International Mathematical Olympiad 2025.

These benchmarks demonstrate Deep Think’s capabilities in broad reasoning (HLE), generalization (ARC-AGI-2), algorithmic thinking (Codeforces), and mathematical problem-solving (IMO). Google also uses “test-time compute,” giving the model additional computation during inference to verify steps and reduce incorrect answers, which is crucial for high-stakes domains.
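The summary does not say how Deep Think spends its extra inference compute. As a rough illustration of the general idea, the sketch below uses one simple and well-known form of test-time compute: sampling several answers and taking a majority vote. The `noisy_solver` function is a hypothetical stand-in for a single model sample, and its 60% per-sample accuracy is an assumed number.

```python
import random
from collections import Counter

CORRECT_ANSWER = 42  # hypothetical ground truth for the toy problem


def noisy_solver(rng, p_correct=0.6):
    """Hypothetical stand-in for one model sample: right with probability p_correct."""
    if rng.random() < p_correct:
        return CORRECT_ANSWER
    # Wrong answers are scattered, so they rarely agree with each other.
    return rng.choice([41, 43, 44])


def answer_with_votes(rng, n_samples):
    """Spend more compute: draw n_samples answers and return the most common one."""
    votes = Counter(noisy_solver(rng) for _ in range(n_samples))
    answer, _count = votes.most_common(1)[0]
    return answer


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(answer_with_votes(rng, n_samples) == CORRECT_ANSWER for _ in range(trials))
    return hits / trials


for n in (1, 5, 15):
    print(f"samples per question: {n:2d}  accuracy: {accuracy(n):.3f}")
```

In this toy setup, accuracy rises as more samples are drawn per question, at the price of proportionally more inference work. Production systems may instead use step verification or search, which this sketch does not model.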

A key demonstration of Deep Think’s capabilities is its “sketch-to-3D printing” functionality. A user draws a concept, and Deep Think analyzes the drawing, models the shape, and generates a printable file so the object can be physically produced. This exemplifies the model’s ability to translate fuzzy human input into concrete output. Deep Think is available through the Gemini app for Google AI Ultra subscribers and via the Gemini API with early access for enterprise users.

MiniMax M2.5: Affordable and Always-On AI Agents

MiniMax’s M2.5 model takes a different approach, focusing on affordability to enable continuous agent operation. Trained with reinforcement learning across hundreds of thousands of real-world environments, M2.5 is designed for coding, agentic tool use, search, and office work. Reported benchmarks include 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp with context management, alongside a 37% speed improvement over M2.1 on SWE-Bench.

However, the most significant aspect of M2.5 is its cost. MiniMax claims that running the model continuously for an hour at 100 tokens per second costs approximately $1, dropping to around $0.30 at 50 tokens per second. This pricing is aimed at agent builders whose workloads involve frequent retries, exploration, and iterative loops.
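To put the hourly figures in more familiar terms, the following back-of-the-envelope sketch converts them into an implied price per million generated tokens. It assumes the whole hourly cost is attributed to generated tokens, which is a simplification: the claimed figures may also include input tokens or caching effects. The function names are illustrative.

```python
SECONDS_PER_HOUR = 3600


def tokens_per_hour(tokens_per_second):
    """Tokens generated in one hour of continuous operation."""
    return tokens_per_second * SECONDS_PER_HOUR


def implied_price_per_million(cost_per_hour_usd, tokens_per_second):
    """Implied USD per million generated tokens (assumes cost covers output only)."""
    return cost_per_hour_usd / tokens_per_hour(tokens_per_second) * 1_000_000


# Figures as claimed in the summary: (USD per hour, tokens per second).
claims = [(1.00, 100), (0.30, 50)]

for cost, tps in claims:
    print(
        f"{tps} tok/s: {tokens_per_hour(tps):,} tokens/hour, "
        f"about ${implied_price_per_million(cost, tps):.2f} per million tokens"
    )
```

At 100 tokens per second an hour yields 360,000 tokens, so $1 per hour works out to roughly $2.78 per million tokens; at 50 tokens per second, 180,000 tokens for $0.30 is roughly $1.67 per million. The slower tier is therefore cheaper per token as well as per hour.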

MiniMax offers two versions, M2.5 and M2.5 Lightning, which differ in speed. Both support caching. M2.5 is also designed to “plan like an architect,” breaking down features and structuring the UI before coding, which reduces rewrites and improves code quality. The model covers more than 10 programming languages and was trained across 200,000 environments, spanning the full software development lifecycle. MiniMax emphasizes strong tool calling and search capabilities, which are essential for autonomous tasks.

Internally, MiniMax reports that M2.5 completes approximately 30% of the company’s tasks autonomously and generates around 80% of newly committed code, indicating deep integration into its workflow. Its in-house framework, Forge, facilitates rapid training and deployment of agents.

Logical Connections and Synthesis

These three launches (OpenAI’s Spark, Google’s Deep Think, and MiniMax’s M2.5) point to growing specialization in AI development. Spark addresses the need for speed in interactive coding, Deep Think focuses on depth and reliability in complex reasoning, and MiniMax prioritizes affordability for continuous agent operation. They are not competing head to head so much as targeting different niches within the evolving AI landscape.

The common thread is a move towards more practical applications of AI. Spark aims to improve developer productivity, Deep Think tackles real-world scientific challenges, and MiniMax enables cost-effective automation. The emphasis on hardware optimization (Cerebras for Spark) and reinforcement learning (MiniMax) highlights the importance of tailoring both the model and the infrastructure to specific use cases.

The launches collectively suggest that the future of software development will involve a combination of AI-powered tools, specialized hardware, and continuous agent operation, ultimately reshaping how software is built and deployed. The question remains whether faster feedback (Spark) or deeper reasoning (Deep Think) will prove more crucial, but the emergence of affordable, always-on agents (MiniMax) is likely to be a disruptive force in the industry.
