Zhipu Just Dropped Full Stack AI Model on China Chips: West Panics!
By AI Revolution
Key Concepts
- GLM Image: A multimodal image generation model developed by Zhipu AI and Huawei, trained entirely on a domestic Chinese stack (Huawei's Ascend Atlas 800T A2 hardware and the MindSpore framework).
- Veo 3.1: Google’s upgraded text-to-video model with improvements in control, consistency, and format support (including vertical video and 4K upscaling).
- MedGemma 1.5: Google’s open-source medical AI model for processing medical imaging (CT, MRI, pathology slides) and text data. Available in 4B and 27B parameter sizes.
- MedASR: Google’s medical speech recognition model, designed for clinical audio transcription with significantly improved accuracy compared to Whisper Large V3.
- Rokid Eyeglasses Style: AI-powered glasses featuring voice assistance, a 4K camera, translation capabilities, and integrated payment systems (Alipay Plus & Glass Pay).
- Full Stack AI: The concept of controlling the entire AI development pipeline, from hardware and frameworks to models and applications, to achieve independence and optimization.
- SynthID: Google’s digital watermarking technology for AI-generated videos, aimed at transparency and content verification.
China’s AI Independence: Zhipu & Huawei’s GLM Image
Zhipu AI, in partnership with Huawei, has open-sourced GLM Image, a new-generation multimodal image generation model. The release is significant less for the model’s quality than for how it was built: it is the first such model in China trained entirely on domestically produced hardware, Huawei’s Ascend Atlas 800T A2, using Huawei’s MindSpore AI framework. This addresses a critical bottleneck in AI development: compute power and independence from foreign hardware, specifically Nvidia. Zhipu emphasized that the entire pipeline, from data processing to training, was built and optimized for this domestic stack.
Technically, GLM Image employs a hybrid architecture that combines an autoregressive model with a diffusion decoder, differing from the more common latent diffusion model (LDM) approach. Zhipu claims this hybrid approach tightens the integration between language and image generation, improving results in knowledge-intensive scenarios. Research fellows at Zhipu reported pushing the Ascend Atlas 800T A2 hardware close to its practical performance limits through joint debugging and optimization with Huawei.
Zang Wendy of Zhipu AI stated the project’s goal was “full stack innovation”: validating the architecture and adapting training and inference for Ascend devices, with Huawei’s support, to remove bottlenecks. Generating a single image via the API costs 0.1 yuan (approximately $0.014), a price point aimed at mass-scale use. Industry expert Tienfang of the FastThink Institute predicts this collaboration will boost confidence in the local AI supply chain, particularly around Ascend chips and frameworks, potentially reshaping the AI computing market and reducing reliance on foreign hardware. Zhipu’s stock has reportedly risen by over 80% following the announcement, reflecting investor enthusiasm. Huawei has also announced a full open-source release of its Ascend chip software ecosystem.
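To put the per-image price in context, the short sketch below scales it to a batch. It is only an illustration: the exchange rate is the one implied by the figures above (0.1 yuan ≈ $0.014, so about 0.14 USD per yuan), not a live rate, and the names `PRICE_CNY_PER_IMAGE`, `USD_PER_CNY`, and `batch_cost` are hypothetical. The summary does not describe volume discounts or other billing terms, so none are modeled.

```python
from decimal import Decimal

# Price quoted in the summary: 0.1 yuan per generated image.
PRICE_CNY_PER_IMAGE = Decimal("0.1")

# Assumption: rate implied by "0.1 yuan is approximately $0.014".
USD_PER_CNY = Decimal("0.14")


def batch_cost(n_images: int) -> tuple[Decimal, Decimal]:
    """Return (cost in yuan, cost in US dollars) for n_images API calls."""
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    cost_cny = PRICE_CNY_PER_IMAGE * n_images
    return cost_cny, cost_cny * USD_PER_CNY


if __name__ == "__main__":
    cny, usd = batch_cost(10_000)
    print(f"10,000 images: {cny} yuan (about ${usd})")
```

Under these assumptions, 10,000 images come to 1,000 yuan, or roughly $140.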
Google’s AI Ecosystem Expansion: Veo 3.1 & MedGemma 1.5/MedASR
Google is expanding its AI capabilities across multiple domains. Veo 3.1, its text-to-video model, has been upgraded with reference image input, native vertical video support, 1080p/4K upscaling, and SynthID watermarking. The update is rolling out across Gemini, YouTube Create, Vertex AI, and YouTube Shorts, reflecting a broad distribution strategy. Improvements in identity consistency and granular control over backgrounds and textures signal a move towards a more controllable production tool.
SynthID, Google’s digital watermark, is embedded in generated videos to promote transparency and content verification. Google positions Veo 3.1 as competitive on quality, format support, and trust/detection capabilities.
In healthcare, Google released MedGemma 1.5, an open-source multimodal model available in 4B and 27B parameter sizes. MedGemma 1.5 supports text, 2D images, 3D CT/MRI volumes, and whole slide pathology images. Benchmark improvements include:
- Disease-related CT findings accuracy: 58% to 61%
- MRI disease findings accuracy: 51% to 65%
- Histopathology (ROUGE-L score): 0.02 to 0.49 (matching a task-specific model)
- Chest X-ray anatomical localization (IoU): 3% to 38%
- Longitudinal chest X-ray comparison (Macro Accuracy): 61% to 66%
- Average accuracy across internal benchmarks: 59% to 62%
- Medical lab report extraction (Macro F1): 60% to 78%
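The localization figure above is reported as IoU (intersection over union), which measures how well a predicted region overlaps the annotated one. The sketch below shows the standard calculation for axis-aligned bounding boxes. It is an assumption that the benchmark scores boxes this way; the summary does not describe the evaluation setup, and the `Box` type and `iou` function are hypothetical names for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(pred: Box, truth: Box) -> float:
    """Intersection over union: overlap area divided by combined area."""
    overlap_w = max(0.0, min(pred.x_max, truth.x_max) - max(pred.x_min, truth.x_min))
    overlap_h = max(0.0, min(pred.y_max, truth.y_max) - max(pred.y_min, truth.y_min))
    intersection = overlap_w * overlap_h
    union = area(pred) + area(truth) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = Box(0, 0, 10, 10)
    annotated = Box(5, 5, 15, 15)
    print(f"IoU = {iou(predicted, annotated):.3f}")
```

An IoU of 1.0 means the predicted and annotated regions coincide exactly, and 0.0 means they do not overlap at all, so a move from 3% to 38% reflects a large gain in how closely predictions match the annotated anatomy.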
MedGemma also supports DICOM, the standard radiology file format, simplifying integration with existing systems. Alongside MedGemma, Google released MedASR, a Conformer-based medical speech recognition model. MedASR significantly outperforms Whisper Large V3 in medical dictation, measured by word error rate (WER):
- Chest X-ray dictation WER: 5.2% (MedASR) vs. 12.5% (Whisper Large V3), a 58% reduction in errors.
- General medical dictation WER: 5.2% (MedASR) vs. 28.2% (Whisper Large V3), an 82% reduction in errors.
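The error reductions quoted above are relative: they compare MedASR's word error rate (WER) against Whisper Large V3's, with 5.2% vs. 12.5% on chest X-ray dictation and 5.2% vs. 28.2% on general medical dictation. The sketch below shows the usual WER definition (word-level edit distance divided by the number of reference words) and reproduces the two percentages. It is a minimal illustration; the summary does not say how Google normalizes text before scoring, the sample sentences are invented, and the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one previous row.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors eliminated by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Invented example: one substitution plus one insertion over four words.
    reference = "no acute cardiopulmonary process"
    hypothesis = "no acute cardiac pulmonary process"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")

    # Figures from the summary (MedASR vs. Whisper Large V3).
    print(f"Chest X-ray: {relative_reduction(12.5, 5.2):.0%} fewer errors")
    print(f"General dictation: {relative_reduction(28.2, 5.2):.0%} fewer errors")
```

Working through the numbers: (12.5 − 5.2) / 12.5 ≈ 0.58 and (28.2 − 5.2) / 28.2 ≈ 0.82, which matches the 58% and 82% reductions cited.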
Rokid’s AI Glasses: Integrating AI into Everyday Wearables
Rokid unveiled its Eyeglasses Style AI glasses at CES 2026, focusing on voice assistance and a camera rather than a built-in display. Weighing 38.5 grams, they are lighter than Meta’s Ray-Bans. The glasses support integration with multiple AI and cloud services (ChatGPT, DeepSeek, Qwen, Google Maps, Microsoft Translation) and feature a 12MP Sony sensor for 4K video recording. Translation is available in 89 languages, and voice commands in 12. Interaction is enabled through voice, touch, AI shortcuts, and head gestures.
Rokid has partnered with Ant International to integrate Alipay Plus and Glass Pay for mobile payments via QR code scanning and biometric authentication. A microLED projection model with a display is also under development, having raised over $500,000 on Kickstarter with planned delivery in Spring 2026. The Rokid Eyeglasses Style will be available for $299 starting January 19, 2026, and offer prescription lens options.
Conclusion
This week showcased significant advancements across the AI landscape. China’s development of a fully domestic AI stack with Zhipu and Huawei’s GLM Image signals growing independence and competitiveness. Google is aggressively expanding its AI ecosystem through upgrades to Veo 3.1 and the release of powerful medical AI tools (MedGemma 1.5 and MedASR). Finally, Rokid’s AI glasses demonstrate the increasing integration of AI into everyday wearable technology, including innovative payment solutions. Together, these developments point towards a rapidly evolving AI landscape with increasing accessibility, specialization, and strategic competition.