GPT-4.5 shocks the world with its lack of intelligence...
By Fireship
Key Concepts:
- GPT-4.5: OpenAI's latest large language model, positioned as having improved "vibes" and lower hallucination rates.
- AI Hype Cycle: The cycle of inflated expectations followed by disillusionment in AI development.
- Technological Singularity: A hypothetical point in time when technological growth becomes uncontrollable and irreversible, resulting in unforeseeable changes to human civilization.
- Vibes Benchmark: A new, subjective metric introduced by OpenAI to measure creative thinking and the "feel" of interactions with AI models.
- Hallucination Rate: The frequency with which an AI model generates incorrect or nonsensical information.
- Aider Polyglot Coding Benchmark: A benchmark used to evaluate the coding capabilities of AI models across multiple programming languages.
- Grok: xAI's large language model, currently considered the best model in the world by betting markets.
- Pre-training: The initial training phase of a large language model, where it learns from a massive dataset of text and code.
- Parameters: The variables that a machine learning model uses to make predictions.
- Compute: The amount of computational resources used to train a machine learning model.
- Sigmoid of Sorrow: A term used to describe the shape of the AI hype cycle, where initial excitement is followed by a period of disappointment.
GPT-4.5: An Underwhelming Release
The video argues that the release of GPT-4.5 has dampened the AI hype: it did not crush any benchmarks, win any awards, or demonstrate novel capabilities. Its primary selling point is improved "vibes," a subjective measure of how naturally and human-like it chats. The presenter notes that Sam Altman did not attend the launch and reads this as a sign of limited confidence in the model.
Performance and Cost Analysis
GPT-4.5 is presented as an extremely expensive model: $75 per million input tokens and $150 per million output tokens. It is currently available only to Pro users paying $200 per month. It may seem to emit "chill vibes," but that impression is subjective. The presenter tested GPT-4.5 and found that it still makes silly mistakes, is not self-aware, and has no idea what GPT-4.5 even is. It correctly counted the R's in "strawberry" but gave the wrong number of L's in "lollapalooza." The presenter did not test programming or science tasks, since it is already known that GPT-4.5 does not perform as well there as deep-thinking models like o3. On the Aider Polyglot Coding Benchmark, it performs worse than DeepSeek while being hundreds of times more expensive.
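To make the pricing concrete, here is a minimal sketch that estimates the cost of a single request from the per-token prices quoted above. The function name and the example token counts are assumptions chosen for illustration; they do not come from the video.

```python
# Prices quoted in the summary, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 75.0
OUTPUT_PRICE_PER_MILLION = 150.0


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative request size only: 2,000 tokens in, 500 tokens out.
    print(f"Estimated cost: ${request_cost(2_000, 500):.4f}")
```

At these prices, output tokens cost twice as much as input tokens, so long responses dominate the bill.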
Benchmarking and Comparisons
The video compares GPT-4.5 to other models, including Claude and Grok. Claude is mentioned as being significantly cheaper at $15 per million output tokens. Grok, xAI's model, is currently considered the best in the world according to betting markets. The presenter notes that OpenAI's odds of having the best model by the end of 2025 are declining.
The Limits of Pre-training and Scaling
The presenter suggests that OpenAI may have reached the limits of pre-training in generative pre-trained transformers. Despite scaling up the number of parameters and the compute, GPT-4.5 doesn't show significant improvement over its predecessors. The presenter theorizes that OpenAI failed to train GPT-5 with any significant improvement. Sam Altman described GPT-5 as being more like a router that automatically chooses the best model based on your prompt.
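The "router" idea can be sketched as a function that inspects a prompt and picks a model to handle it. This is a toy illustration under loud assumptions: the keyword heuristic, the model names, and the `Route` type are all hypothetical and say nothing about how OpenAI actually implements routing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str   # hypothetical model label, not a real API identifier
    reason: str  # short explanation of why this route was chosen


# Hypothetical keywords suggesting the prompt needs multi-step reasoning.
REASONING_HINTS = ("prove", "debug", "step by step", "algorithm", "calculate")


def route_prompt(prompt: str) -> Route:
    """Pick a model for a prompt using a naive keyword heuristic."""
    text = prompt.lower()
    if any(hint in text for hint in REASONING_HINTS):
        return Route("reasoning-model", "prompt looks like a reasoning task")
    return Route("chat-model", "default conversational route")


if __name__ == "__main__":
    for prompt in ("Please debug this function", "Tell me a joke"):
        route = route_prompt(prompt)
        print(f"{prompt!r} -> {route.model} ({route.reason})")
```

A production router would more plausibly use a learned classifier than keyword matching; the sketch only shows the shape of the idea, in which the user sends one prompt and the system decides which model answers.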
Implications for the Future
The video suggests that the plateau in AI development is good news for computer science students: if models are no longer improving rapidly, AI coding tools remain aids for real human programmers who know what they are doing rather than replacements for them.
Brilliant.org Sponsorship
The video is sponsored by Brilliant.org, a platform that provides interactive lessons on deep learning, math, and computer science. The presenter recommends starting with Python and checking out their "How Large Language Models Work" course. Viewers can try Brilliant for free for 30 days by visiting brilliant.org/fireship or using the QR code on screen.
Conclusion
The video concludes that GPT-4.5 is an underwhelming release that signals a potential plateau in AI development. While AI tools are still valuable, they are most effective when used by skilled human programmers. The presenter expresses disappointment that artificial superintelligence has not yet arrived.