API vs Subscriptions vs Local: I Measured Intelligence Per Dollar.

By Eduards Ruzga

LLM Pricing Models, AI Hardware Performance, AI Model Benchmarking

Key Concepts

Intelligence per Dollar: A metric calculated by multiplying token capacity by model quality (benchmarked via AI Index and LMSYS Chatbot Arena) and dividing by the cost of the service.
Token Efficiency: The number of input and output tokens a model consumes to complete a given task, which varies significantly between models.
Subsidization: The practice where AI companies (OpenAI, Anthropic) offer consumer subscriptions at a loss to gain market share, often leading to fluctuating usage limits.
Local Inference: Running AI models on personal or enterprise hardware (e.g., MacBooks, NVIDIA GPUs) to bypass API costs and data privacy concerns.
Compute Ceiling: The physical limitation of available GPU/TPU infrastructure, forcing companies to tighten usage limits and reduce subsidies.
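A minimal sketch of how the first two concepts combine, assuming the quality score is on a 0–100 scale (normalized to 0–1 here) and that token capacity and cost cover the same period. The function names and all figures are hypothetical, not taken from the speaker's repository. The second function shows why token efficiency matters: at the same tokens per dollar, a model that needs twice the tokens per task completes half as many tasks.

```python
def intelligence_per_dollar(tokens: float, quality_score: float, cost_usd: float) -> float:
    """Quality-adjusted tokens per dollar.

    tokens: token capacity the service provides for cost_usd
    quality_score: benchmark score on a 0-100 scale (normalized to 0-1; an assumption)
    cost_usd: what that capacity costs
    """
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return tokens * (quality_score / 100) / cost_usd


def tasks_per_dollar(tokens_per_dollar: float, tokens_per_task: float) -> float:
    """How many tasks a dollar buys, given a model's token efficiency on that task."""
    if tokens_per_task <= 0:
        raise ValueError("tokens_per_task must be positive")
    return tokens_per_dollar / tokens_per_task


# Hypothetical figures: 10M tokens for $20 from a model scoring 50/100.
print(intelligence_per_dollar(10_000_000, 50, 20))  # 250000.0
# Same tokens per dollar, but the second model needs twice the tokens per task.
print(tasks_per_dollar(500_000, 5_000))   # 100.0
print(tasks_per_dollar(500_000, 10_000))  # 50.0
```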

1. The Volatility of AI Pricing and Subscriptions

The AI landscape has shifted from "unlimited access" promises (e.g., OpenAI’s early $200 plan) to a highly restrictive, unpredictable environment.

Key Trend: Companies are frequently adjusting usage limits without clear communication.
Evidence: Anthropic’s "Claude Max" subscription saw limits tightened significantly within months of launch. Google’s "Gemini" free request quota was slashed from 250 to 20 in a single month.
Business Impact: Subscriptions are becoming less reliable. For example, GitHub Copilot is transitioning from flat-rate subscriptions to per-token pricing, signaling the end of the "unlimited" era.

2. Methodologies for Measuring Value

The speaker proposes a framework to compare disparate AI delivery methods (API, Local, Subscription) using a unified metric: Quality-Adjusted Tokens per Dollar.

API Methodology: Computes a blended per-token price by weighting input and output prices (default 75% input / 25% output), converts it to tokens per dollar, and multiplies by the model's intelligence score (0–100).
Local Hardware Methodology: Amortizes hardware costs over 36 months, calculates total token capacity based on tokens-per-second (TPS) telemetry, and adjusts for model quality.
Subscription Methodology: Uses CLI-based scripts to run standardized tasks in a loop, measuring how many tokens consume one percentage point of the plan's usage limit ("tokens per percentage"), then extrapolates that figure to monthly capacity.
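The three methodologies can be sketched as one function each, all returning quality-adjusted tokens per dollar so the results are directly comparable. This is a minimal illustration, not the speaker's actual scripts: the function names and parameters are hypothetical, and it assumes API prices quoted per million tokens, a 730-hour month, a local machine generating tokens around the clock unless `utilization` says otherwise (electricity is ignored), and a subscription limit that resets `windows_per_month` times per month.

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12


def api_score(input_price_per_m: float, output_price_per_m: float,
              quality_score: float, input_share: float = 0.75) -> float:
    """API: blend input/output prices, invert to tokens per dollar, weight by quality."""
    blended_per_m = input_share * input_price_per_m + (1 - input_share) * output_price_per_m
    tokens_per_dollar = 1_000_000 / blended_per_m
    return tokens_per_dollar * quality_score / 100


def local_score(hardware_cost: float, tokens_per_second: float, quality_score: float,
                months: int = 36, utilization: float = 1.0) -> float:
    """Local: amortize hardware over `months`, turn measured TPS into monthly capacity."""
    monthly_cost = hardware_cost / months
    # utilization=1.0 assumes the machine generates tokens 24/7 (an assumption).
    monthly_tokens = tokens_per_second * 3600 * HOURS_PER_MONTH * utilization
    return monthly_tokens / monthly_cost * quality_score / 100


def subscription_score(tokens_used: float, percent_used: float, windows_per_month: float,
                       monthly_price: float, quality_score: float) -> float:
    """Subscription: tokens per 1% of the limit, extrapolated to a full month."""
    tokens_per_percent = tokens_used / percent_used
    monthly_tokens = tokens_per_percent * 100 * windows_per_month
    return monthly_tokens / monthly_price * quality_score / 100


# Hypothetical inputs only; none of these numbers come from the video.
print(f"{api_score(1.0, 3.0, 50):,.0f}")                       # 333,333
print(f"{local_score(3600, 50, 50):,.0f}")                     # 657,000
print(f"{subscription_score(2_000_000, 10, 4, 20, 50):,.0f}")  # 2,000,000
```

Because every path ends in the same unit, the 75/25 input/output split and the 36-month amortization period are the only knobs that change the comparison, and both are exposed as parameters.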

3. Comparative Analysis: Who Wins?

The speaker’s data (as of April/May 2026) reveals surprising winners:

| Category | Winner | Key Insight |
| :--- | :--- | :--- |
| API | GLM-4-Flash (Chinese) | Open-source models served via APIs currently outperform Western commercial models in intelligence per dollar. |
| Local | RTX 4090 + Qwen | High-end consumer GPUs provide ~2 million tokens per dollar, significantly cheaper than many cloud APIs. |
| Subscription | ChatGPT Plus | Consumer plans are heavily subsidized; business plans (e.g., ChatGPT Team) are often 3x more expensive per unit of intelligence. |

4. Key Arguments and Perspectives

Economies of Scale vs. Local Efficiency: The speaker initially believed cloud infrastructure would always be cheaper thanks to economies of scale, but the data shows local models deliver only about 3x less intelligence per dollar than the most heavily subsidized cloud plans, suggesting local inference is becoming a viable enterprise alternative.
The "China" Factor: Chinese models (GLM, Qwen) currently dominate the intelligence-per-dollar rankings, but the speaker warns of potential political bias and data privacy concerns. These models can, however, be run on Western cloud platforms (e.g., Microsoft Azure, Amazon Bedrock) to mitigate data sovereignty risks.
The End of Subsidies: As companies hit a "compute ceiling" (evidenced by Anthropic and OpenAI buying compute from competitors like Google and xAI), the era of cheap, unlimited subscriptions is likely ending.

5. Notable Quotes

"Imagine your electricity provider promised infinite electricity for $200, then suddenly cut it off for 5 hours, and then banned your washing machine because it was too energy-hungry. That is the current state of AI subscriptions."
"If your task is simple enough for Haiku to handle, it allows you three times more intelligence per dollar than Opus."

6. Synthesis and Conclusion

The current AI market is in a state of flux where "intelligence-per-dollar" is the only reliable way to navigate the chaos.

Actionable Insight: For simple tasks, smaller, cheaper models (like Haiku or local open-source models) provide significantly better value than flagship models (Opus/GPT-5).
Strategic Recommendation: Businesses should avoid relying solely on consumer subscriptions, as these are subject to arbitrary limit changes. Instead, companies should build infrastructure that allows for "model-switching" based on the specific task requirements and current market pricing.
Future Outlook: The speaker expects the gap between subsidized cloud pricing and local/API open-source pricing to close as compute scarcity forces companies to stop subsidizing users.
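The model-switching recommendation can be reduced to a small routing rule: for each task, pick the option with the most tokens per dollar among those whose quality score clears the task's bar. This is a minimal sketch, not the speaker's tooling; `ModelOption`, `pick_model`, the model names, and all numbers are hypothetical. Keeping the catalog as plain data means it can be updated when a provider changes prices or usage limits, without touching the routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    quality_score: float      # 0-100 benchmark score
    tokens_per_dollar: float  # from the API, local, or subscription calculation


def pick_model(options: list[ModelOption], min_quality: float) -> ModelOption:
    """Return the option with the most tokens per dollar that still clears the quality bar."""
    eligible = [m for m in options if m.quality_score >= min_quality]
    if not eligible:
        raise ValueError(f"no model meets min_quality={min_quality}")
    return max(eligible, key=lambda m: m.tokens_per_dollar)


# Hypothetical catalog; refresh these numbers whenever prices or limits change.
catalog = [
    ModelOption("small-model", quality_score=40, tokens_per_dollar=3_000_000),
    ModelOption("flagship-model", quality_score=70, tokens_per_dollar=1_000_000),
]
print(pick_model(catalog, min_quality=30).name)  # small-model
print(pick_model(catalog, min_quality=60).name)  # flagship-model
```

With a low bar the cheaper small model wins, which is the same logic as the Haiku-versus-Opus quote: only tasks that genuinely need the higher score pay the flagship price.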

Note: The speaker provides an open-source repository and tools for users to track these metrics and contribute their own telemetry data to improve the accuracy of these comparisons.
