Creative Strategies' Ben Bajarin talks the AI chip race between Alphabet and Nvidia

Key Concepts

TPUs (Tensor Processing Units): Google's custom chips designed to accelerate AI workloads.
GPUs (Graphics Processing Units): General-purpose processors widely used for AI workloads, particularly those made by NVIDIA.
ASICs (Application-Specific Integrated Circuits): Custom-designed chips for specific functions, like Google's TPUs or AWS's Trainium and Inferentia.
AI Workloads: Computational tasks related to artificial intelligence, including training and inference.
Training: The process of teaching AI models by feeding them large datasets.
Inference: The process of using a trained AI model to make predictions or decisions.
GCP (Google Cloud Platform): Google's cloud computing service.
AWS (Amazon Web Services): Amazon's cloud computing service.
Multi-Cloud: Utilizing services from multiple cloud providers.
AI Middleware: Software layers that abstract away complexities and enable interoperability between different AI platforms and hardware.
Giga Cycle: A term used to describe a period of massive expansion in an industry's Total Addressable Market (TAM), measured in dollars.

Google's TPUs vs. NVIDIA's GPUs: A Competitive Landscape Analysis

This discussion explores the competitive dynamics between Google's specialized AI chips (TPUs) and NVIDIA's general-purpose GPUs, particularly in the context of potential deals with companies like Meta and the broader cloud computing market.

Google's TPU Strategy and Market Position

Primary Use Case: Google's TPUs are primarily designed and utilized for Google's own internal services, including YouTube, Search, and Gemini. Ben Bajarin, CEO of Creative Strategies, emphasizes that Google built these chips for their own specific needs.
Potential for Third-Party Adoption: Companies like Meta and Anthropic are exploring the use of TPUs, but their needs tend to be massive, customized AI tasks that run within a particular cloud environment (e.g., video recommendations on Reels).
Limited Competition with NVIDIA: Bajarin argues that Google's TPUs are not directly competing with NVIDIA in the broader chip market. The architectural compatibility and programmability of GPUs across various cloud environments make them more flexible for a wider range of third-party customers.
Google's Own Cloud Needs: Google Cloud Platform (GCP) itself uses NVIDIA GPUs and will continue to offer them to its customers, indicating that Google's strategy does not rely solely on its own TPUs for AI acceleration.

NVIDIA's Dominance and GPU Advantages

General-Purpose Nature: NVIDIA's GPUs are highly programmable and flexible, making them suitable for a vast array of AI workloads across different clouds and applications.
Largest Install Base: NVIDIA possesses the largest installed base of GPUs, which is a significant advantage for software developers who can leverage existing tools and optimizations.
Metaphorical Comparison: Bajarin illustrates the difference with a metaphor. Google's TPUs are like a specialized beef cow supplied for McDonald's Quarter Pounder (one specific, large-scale need), whereas NVIDIA's GPUs are like a standard type of beef that steak houses and other businesses can use for various purposes (general-purpose AI).
Third-Party Adoption: The adoption of specialized ASICs like TPUs by third parties in public cloud environments is not currently a significant trend, with NVIDIA GPUs remaining the preferred choice.

Cloud Providers' Custom ASIC Strategies (AWS Example)

AWS Trainium and Inferentia: The discussion extends to Amazon Web Services (AWS) and its custom chips, Trainium and Inferentia. Inferentia is noted as being more relevant for inference workloads, which are increasing as AI models mature.
Optimization of Own Workloads: The success of these custom ASICs is largely dependent on how much of their own workloads cloud providers can optimize to run on them.
Limited Third-Party Customer Acquisition: Similar to Google's TPUs, AWS's custom chips have faced challenges in acquiring a significant number of third-party customers at scale.
Market Share Impact: While these custom chips can lead to cost savings for cloud providers by optimizing their internal operations, their impact on taking market share away from NVIDIA is still uncertain.

The "Giga Cycle" and Market Opportunity

Massive TAM Expansion: Bajarin describes the current industry as a "giga cycle," representing one of the largest dollar Total Addressable Market (TAM) expansions ever seen.
Significant Market Size: Projections of $700 billion to $1 trillion in market size by 2030 highlight the immense opportunity.
NVIDIA's Current Dominance: Despite the emergence of custom chips, NVIDIA is expected to capture the vast majority of this market in the near term.

The Need for AI Middleware and Multi-Cloud Interoperability

VMware-like Play: The conversation raises the question of whether a "VMware-type play" will emerge to facilitate interoperability across different clouds and AI hardware.
AI Middleware Layer: There is a recognized need for an AI middleware layer that allows enterprises to benefit from the efficiency of specific cloud hardware without the burden of extensive software re-engineering for each platform.
Enterprise Multi-Cloud Deployments: Enterprises are increasingly adopting multi-cloud strategies, meaning they will want to deploy workloads across various cloud environments without custom tuning each one.
Standardization and Optimization: The ideal scenario involves a standard programming language or optimization set that can be applied across different clouds, offering flexibility and reducing development overhead. Cloud vendors are actively working towards optimizing their stacks to achieve this.
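To make the middleware idea concrete, here is a minimal sketch of what such a layer could look like in principle. It is not drawn from the discussion or from any real product: the names Backend, Middleware, run_dot, and reference_dot are hypothetical, and the "gpu" and "tpu" backends are stand-ins that both call the same plain-Python kernel. The point is only that the calling code states a hardware preference once and is not rewritten per platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical kernel signature: a dot product over two vectors.
Kernel = Callable[[Sequence[float], Sequence[float]], float]


def reference_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain-Python kernel used as a stand-in for a vendor-tuned one."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Backend:
    """One hardware target (assumed names such as 'gpu' or 'tpu')."""
    name: str
    dot: Kernel


class Middleware:
    """Routes a workload to the first available backend in a preference list."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, backend: Backend) -> None:
        self._backends[backend.name] = backend

    def run_dot(
        self,
        a: Sequence[float],
        b: Sequence[float],
        preferred: Sequence[str],
    ) -> Tuple[str, float]:
        # Walk the caller's preference order and use the first match.
        for name in preferred:
            backend = self._backends.get(name)
            if backend is not None:
                return backend.name, backend.dot(a, b)
        raise LookupError("no registered backend matches the preference list")


if __name__ == "__main__":
    mw = Middleware()
    # Only a 'gpu' backend is registered in this cloud, so the 'tpu'
    # preference falls through without any change to the calling code.
    mw.register(Backend("gpu", reference_dot))
    print(mw.run_dot([1.0, 2.0], [3.0, 4.0], preferred=["tpu", "gpu"]))
```

In a real deployment each backend would wrap a vendor-specific runtime, and the hard part would be making the optimized kernels behave identically across platforms, which this sketch deliberately leaves out.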

Conclusion and Key Takeaways

The current AI chip landscape is characterized by NVIDIA's strong position with its versatile GPUs, while specialized ASICs from Google (TPUs) and AWS (Trainium, Inferentia) are primarily focused on optimizing their own internal workloads. While these custom chips offer potential cost savings for cloud providers, they have not yet significantly challenged NVIDIA's dominance in the broader third-party market. The industry is experiencing a massive growth phase ("giga cycle"), and the future may see the emergence of AI middleware solutions to facilitate seamless multi-cloud AI deployments and abstract away hardware complexities for enterprises. The key challenge for custom ASIC providers remains acquiring third-party customers at scale.