Back to all videos

GLM-4.7 (Fully Tested): RIP Opus 4.5! The BEST Open Model is HERE!

By AICodeKing

Large Language Models AI Coding Benchmarking Open Source AI

Share:

GLM4.7: A Detailed Evaluation of the New Open-Weight Model

Key Concepts:

GLM4.7: The latest iteration of the GLM (General Language Model) series, an open-weight language model developed by a team the speaker has early access to.
Open-Weight Models: Language models where the model weights are publicly available, allowing for customization and self-hosting.
SWEBench, Multilingual SWEbench, Terminal Bench 2.0, T2Bench, HLE: Benchmarks used to evaluate the performance of language models in specific areas (Software Engineering, Tool Use, Mathematical Reasoning).
Agentic Benchmarks: Tests evaluating a model’s ability to perform complex tasks requiring planning, tool use, and iterative problem-solving.
UI Quality: The aesthetic appeal and usability of user interfaces generated by the model.
Kilo Code, Claude Code, Clin, Ru Code: Agent frameworks used for evaluating coding capabilities.
ZAI, Synthetic, Verdant: Third-party inference providers offering access to GLM4.7.
Coding Plan: A specialized, cheaper API access option for GLM4.7 focused on coding tasks.

I. Overview and Performance Improvements

The video focuses on a detailed evaluation of GLM4.7, positioning it as the best currently available open-weight language model. The speaker, having had early access, highlights significant improvements over previous GLM versions (4.5, 4.6, and earlier code models) and even challenges closed-source models like Claude 4.5 Sonnet. GLM4.7 demonstrates substantial gains across various benchmarks:

SWEBench: 6% improvement.
Multilingual SWEbench: 13% improvement.
Terminal Bench 2.0: Score of 16.5%.
HLE (Mathematical Reasoning): 42.8% – a significant boost compared to GLM 4.6.
Tool Using (T2Bench, BrowseComp): Significant performance improvements.

The speaker emphasizes that while GLM4.7 doesn’t aim to beat top-tier, closed-source models like Gemini or GPT-5.2 in raw benchmark scores, its performance is competitive and represents a substantial leap forward for open-weight alternatives. A key point is the model’s continued strength in visual design, previously a standout feature of GLM models, which has been further enhanced.

II. Visual Design and Code Generation Examples

The speaker presents several examples showcasing GLM4.7’s capabilities:

Floor Plan Generation: Functionally works, but the design layout is somewhat disorganized. The ability to hover over rooms to see their names is a positive feature.
SVG Panda with Burger: High-quality generation with good hand and body rendering, and includes animation (floating and blinking).
Pokeball in 3JS: Excellent rendering with realistic dimensions, light reflection, and overall visual quality.
Chessboard with Autoplay: Considered the best generation seen recently, with a sleek color scheme and realistic pieces (not emojis). The autoplay functionality adds to its appeal.
Minecraft Game: Successfully generates a playable Minecraft environment with mist, grass, and movement.
Majestic Butterfly: Accurate butterfly rendering with correct flight and wing flapping animation.
Rust CLI Tool & Blender Script: Good, but not exceptional, results.

These examples demonstrate GLM4.7’s improved ability to generate visually appealing and functional code-based outputs.

III. Agentic Benchmarks and Coding Performance

The speaker details GLM4.7’s performance on agentic benchmarks, which assess its ability to handle complex, multi-step tasks.

Go TUI Calculator (using Kylo Code): Successfully built a visual calculator using Lip Gloss and Bubble Tea, demonstrating strong coding capabilities.
Movie Tracker App (using Expo & TMDB API): Generated a functional app with a carousel of movies and detailed movie pages.
Spelta Conban App: Performance is mixed. While the login and inner pages are well-designed, the backend functionality is flawed.
Tori App (in Godo): Demonstrates improved game development capabilities, successfully generating life bar and jump mechanics.
Open Code Question: GLM4.7 successfully passes this challenging question, a significant achievement.

These benchmarks place GLM4.7 in fifth position on the leaderboard, above Claude 4.5 Sonnet and GPT-5.2, but below Opus and Gemini 3 Pro. The speaker notes that Gemini 3 Pro excels at one-shot questions but struggles with agentic tasks.

IV. Cost and Accessibility

A significant advantage of GLM4.7 is its affordability and accessibility.

ZAI’s GLM Coding: Offered at a very low price, starting around $3.
Coding Plan: A specialized API access option for coding tasks, offering even lower costs.
Open Weights: The open-weight nature allows for self-hosting and use through third-party inference providers like Synthetic and Verdant.

The speaker suggests that GLM4.7 is a compelling alternative to Gemini Flash, which he considers overpriced. He also notes that recent bug fixes have improved the model’s ability to handle long-running tasks.

V. Coding Plan Details & Reasoning

The speaker addresses questions about reasoning capabilities within the “Coding Plan” API. While the model does reason, the “thinking traces” (step-by-step reasoning logs) are not currently available through the Coding Plan API. The speaker acknowledges that the $6 plan can sometimes be slow but considers it worthwhile given the price.

VI. Conclusion and Future Outlook

The speaker concludes that GLM4.7 is currently the best open-weight model available, particularly for coding tasks. He highlights its speed, affordability, open weights, and strong performance. He anticipates that combining GLM4.7 with Opus could yield even more impressive results, which he plans to explore in future videos. He encourages viewers to share their thoughts and support the channel.

Notable Quote:

“It is the best open model yet by a long shot.” – The speaker, summarizing his overall impression of GLM4.7.

Technical Terms Explained:

API (Application Programming Interface): A set of rules and specifications that allow different software applications to communicate with each other.
Inference Provider: A service that hosts and runs language models, providing access to their capabilities through an API.
TUI (Text-based User Interface): A user interface rendered using text characters within a terminal or console.
3JS: A JavaScript library for creating 3D graphics in web browsers.
SVG (Scalable Vector Graphics): An XML-based vector image format.
Expo: A framework for building native mobile apps with JavaScript.
TMDB API (The Movie Database API): An API providing access to movie and TV show data.
Spelta: A full-stack web framework.
Godo: A game engine.

The video provides a comprehensive and positive assessment of GLM4.7, emphasizing its significant improvements over previous versions and its competitive position within the landscape of open-weight language models. The detailed examples and benchmark results offer concrete evidence of its capabilities, making it a valuable resource for developers and researchers interested in exploring this promising technology.

Chat with this Video

AI-Powered

Load the transcript when you're ready to chat so the initial page stays lighter.

Related Videos

Ready to summarize another video?

Summarize YouTube Video