The Era of 1 Trillion Models Is Upon Us

By Prompt Engineering

Key Concepts: Ling (Bailing), One Trillion Parameter Model, Open-weight model, Mixture of Experts (MoE), Sparse Mixture of Experts, Token Efficiency, Evolutionary Chain of Thought (EvoCoT), Non-cognitive/Non-reasoning model, FP8 Mixed Precision Training, Ant Group, Inclusion AI, AGI Initiative, ARC-AGI-1 score, Coding benchmarks, Context Window, ZenMux, Trolley Problem, Wolf, Goat, and Cabbage problem.

Introduction to the Ling Model

The video introduces Ling, a new one trillion parameter open-weight model developed by Inclusion AI, a research group under Ant Group's AGI initiative. Ant Group is an affiliate of Alibaba. Ling marks a significant entry into the era of truly large language models.

Key Features and Claims:

  • Performance: Ling claims to outperform both open-weight and proprietary models on several key math and reasoning benchmarks, evidenced by an "incredible" ARC-AGI-1 score. It also achieves state-of-the-art or near state-of-the-art performance on various coding benchmarks.
  • Token Efficiency: A standout feature is its exceptional token efficiency. On the AIME 2025 benchmark, Ling achieves state-of-the-art performance while using almost 40% fewer tokens than Gemini 2.5 Pro.
  • Model Architecture: It is a Mixture of Experts (MoE) model with a total of 1 trillion parameters. However, it is a sparse Mixture of Experts, meaning only about 50 billion parameters are activated per token (see the routing sketch after this list).
  • Nature: Intriguingly, Ling is described as a "non-cognitive," "non-thinking," or "non-reasoning" model, which makes its high performance even more remarkable.
  • Context Window: It offers a 128,000-token context window and a maximum output of 32,000 tokens.
  • Availability: The model weights are available on Hugging Face, and it can be tested on platforms like ZenMux.
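
To make the sparse-MoE idea concrete, here is a minimal PyTorch sketch of top-k expert routing: a router scores all experts, but only the top few actually run for each token, so most parameters stay inactive. All sizes here (hidden width, expert count, top-k) are hypothetical; Ling's real configuration is described in its paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Illustrative top-k sparse Mixture-of-Experts layer (not Ling's actual code)."""

    def __init__(self, d_model=512, n_experts=8, top_k=2):  # hypothetical sizes
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (n_tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)        # (n_tokens, n_experts)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)  # keep best k experts/token
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)    # renormalize kept weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += top_w[mask, k:k+1] * expert(x[mask])
        return out

moe = SparseMoE()
y = moe(torch.randn(10, 512))  # only 2 of the 8 expert MLPs run per token
```

This is the mechanism behind the "1 trillion total, ~50 billion active" figures: total parameter count grows with the number of experts, while per-token compute grows only with top-k.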

Architectural Details and Training Methodology

Ling's development is detailed in the paper "Towards Greater Leverage: Scaling Laws of Efficient Mixture-of-Experts Language Models."

Key Architectural and Training Aspects:

  • Scaling: The paper demonstrates the feasibility of scaling these models to a massive 1 trillion parameters.
  • Training Data: Ling was trained on approximately 20 trillion tokens of high-quality, high-reasoning density data.
  • Evolutionary Chain of Thought (EvoCoT): Although Ling is billed as a "non-reasoning" model, its reasoning capabilities stem from "Evolutionary Chain of Thought": reasoning traces produced during training and applied both in mid-training and post-training, which contribute significantly to its token efficiency.
  • Architecture: The architecture is noted to be very similar to DeepSeek's, though it does not incorporate specific techniques like Multi-head Latent Attention (MLA) introduced by DeepSeek.
  • FP8 Mixed Precision Training: Ling is the largest known foundation model trained entirely with FP8 mixed precision. DeepSeek-V3 (the base of R1) also used FP8 training, but Ling is a considerably larger model. A small illustration of the FP8 trade-off follows below.
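
To give a feel for what FP8 means in practice, here is a small PyTorch sketch of the per-tensor scaling step that FP8 training recipes typically apply before casting to 8-bit floats. This is purely illustrative, not Ling's training stack, and it requires PyTorch 2.1+ for the float8 dtype.

```python
import torch

# Pretend activations with a moderate dynamic range.
x = torch.randn(4096) * 3.0

# Per-tensor scaling: stretch values to fill FP8 e4m3's range (max ~448),
# cast down to 8 bits, then cast back up to measure the rounding loss.
amax = x.abs().max()
scale = torch.finfo(torch.float8_e4m3fn).max / amax
x_fp8 = (x * scale).to(torch.float8_e4m3fn)  # quantize to 8-bit float
x_back = x_fp8.to(torch.float32) / scale     # dequantize for comparison

rel_err = (x - x_back).norm() / x.norm()
print(f"FP8 round-trip relative error: {rel_err:.4f}")  # small but nonzero
```

The payoff is roughly halved memory traffic and faster matrix multiplies versus BF16; the cost is the rounding error shown above, which training recipes keep in check with careful scaling.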

Practical Testing and Performance Evaluation

The presenter conducted several tests on the ZenMux platform, where Ling is listed as inclusionAI/Ling-1T and signup comes with $5 in credits.
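
For readers who want to reproduce these tests programmatically, a sketch like the following would work against an OpenAI-compatible gateway. The base URL and model id here are assumptions (check ZenMux's documentation for the real values); only the prompt is taken from the video.

```python
from openai import OpenAI

# Hypothetical client setup: base_url and model id are assumptions,
# not confirmed ZenMux values.
client = OpenAI(
    base_url="https://zenmux.ai/api/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_ZENMUX_API_KEY",
)

resp = client.chat.completions.create(
    model="inclusionai/ling-1t",          # assumed platform model id
    messages=[{
        "role": "user",
        "content": "Hey, can you tell me about yourself and who created you?",
    }],
    max_tokens=1024,                       # well under the 32K output cap
)
print(resp.choices[0].message.content)
```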

Test Cases and Results:

  1. Self-Description:
    • Prompt: "Hey, can you tell me about yourself and who created you?"
    • Response: Ling identified itself as "Bailing," a large language model developed by Ant Group. It stated it lacks personal experiences or consciousness but can provide helpful, accurate, and context-aware responses. It also explained "Bailing" as a Chinese word reflecting values like inclusivity, technological agility, and precise intent understanding.
  2. 3D Visualization of Los Angeles (Coding):
    • Goal: Create an in-browser 3D visualization of LA with user-selectable spots and a fly-over effect.
    • V1: Produced a working map but lacked specific tourist points.
    • Iterated Version: Successfully highlighted points of interest (e.g., Downtown LA skyline, Griffith Observatory, Venice Beach Boardwalk) with zoom-in/out functionality, though the Venice Beach location was slightly off.
  3. Pokemon Encyclopedia (Website Creation/Coding):
    • Result: Generated a functional website with clickable cards displaying details and filtering capabilities, demonstrating strong coding prowess.
  4. Spinning Heptagon with 20 Balls (Overfitting Test/Coding):
    • Purpose: Test for overfitting on common benchmarks by providing a novel, specific coding challenge.
    • Initial Attempt: Produced a blank screen.
    • Follow-up: After a prompt indicating the blank screen, the model successfully fixed the code, displaying the spinning heptagon with 20 balls as instructed. It also added features like restarting the simulation and controlling the heptagon's rotation.
  5. Trolley Problem (Reasoning):
    • Scenario: A variation where five people are already dead, and one living person is tied up.
    • Initial Response: The model initially failed to capture that the five people were dead, giving a standard trolley problem response.
    • Profound Final Answer: Despite the initial misinterpretation, its concluding statement was remarkably profound: "Yes, I would pull the lever but I would do it with profound discomfort not because one life is worth less than five but because ethics forces us into tragic trade-offs where no option is clean. This dilemma reveals how our moral framework fractures under impossible scenarios." The presenter noted this as an unprecedented response from an LLM.
  6. Wolf, Goat, and Cabbage Problem (Misguided Attention/Reasoning):
    • Scenario: A variation where the user only wants to take the goat to the other side.
    • Result: The model failed, attempting to follow all the unnecessary steps of the classic puzzle instead of the single requested action. This failure mode is common even among large reasoning models.

Synthesis and Conclusion

Ling represents a significant advancement in large language models, particularly as an open-weight, one trillion parameter model from Ant Group's Inclusion AI. Its exceptional token efficiency, achieved through techniques like Evolutionary Chain of Thought, and its training entirely in FP8 mixed precision are notable technical achievements. While it exhibits some common failure modes in specific reasoning tests, its ability to generate complex code, iterate on solutions, and produce strikingly thoughtful answers in challenging scenarios like the Trolley Problem highlights its impressive capabilities, especially for a model described as "non-cognitive." The model's availability and reasonable pricing on platforms like ZenMux make it accessible for further exploration.
