The Rise of Open Models in the Enterprise — Amir Haghighat, Baseten

By AI Engineer


Key Concepts

  • AI adoption in the enterprise
  • Vertical vs. horizontal AI solutions
  • Closed vs. open-source models
  • Inference infrastructure
  • Model optimization (latency, throughput)
  • Enterprise AI adoption journey
  • Production inference challenges (the "dragons")
  • Build vs. buy decision

Why Enterprise AI Adoption Matters

The speaker, Amir Haghighat, CTO of Baseten, emphasizes the importance of AI adoption in enterprises. While AI hype exists, the slow adoption by enterprises is a concern because enterprises possess massive reach and resources. Their slow adoption could hinder the realization of AI's full potential. Amir's perspective is informed by his experience selling horizontal AI tooling to over 100 enterprises, ranging from software companies to Fortune 50 corporations. He argues that the true value of AI is unlocked when enterprises build with AI themselves, rather than solely relying on verticalized AI solutions (e.g., AI for sales, AI for marketing). He draws an analogy to the 2000s, suggesting that the tech industry's growth wouldn't have been as significant if enterprises had only purchased verticalized products like Salesforce.

The Enterprise AI Adoption Journey

Enterprises typically begin their AI journey with OpenAI and Anthropic models, deploying them on Azure or AWS for security and privacy reasons. They transition their existing predictive ML teams into AI teams to build on top of these models. Initially, enterprises are content with this approach due to its ease of use and API-based nature. However, cracks soon appear in the assumption that they can build on closed frontier models indefinitely.

Evolution Over Time

  • 2023: AI was seen as a "toy" for engineers to experiment with. A CIO of a large insurance company mentioned dedicating OpenAI deployments for engineers to "toy around with."
  • 2024: Actual production use cases built on closed models emerged. Approximately 40-50% of the enterprises Amir spoke with had something in production.
  • 2025: Cracks in the assumption of relying solely on closed frontier models became apparent.

Cracks in the Closed Model Assumption

Amir identifies several reasons why enterprises are moving away from solely relying on closed models:

  1. Quality: While enterprises don't aim to surpass OpenAI in general model capabilities, they recognize opportunities to outperform frontier models for specific use cases.
    • Example: Health plans extracting medical document information (CPT codes, diagnosis codes, prescriptions). They found that fine-tuning models on their labeled data yielded better results than using generic APIs.
    • Example: Improving transcription models to understand medical jargon in the healthcare space.
  2. Latency: Frontier models are optimized for high throughput (queries per second), often at the expense of latency. Latency becomes critical in applications like AI voices and AI phone calls, where "time to first talk" and "time to first sentence" are crucial.
  3. Unit Economics: The initial belief that plummeting token prices would solve cost concerns was challenged by the rise of agentic use cases. Each user action can trigger up to 50 inference calls, leading to ballooning costs. Enterprises seek to control costs and demonstrate ROI by running models themselves, paying for compute instead of per-token pricing.
  4. Destiny: CIOs and CTOs are questioning their competitive advantage if they solely rely on the same frontier models as their competitors. They are considering bringing AI capabilities in-house to differentiate themselves at the AI level, not just at the workflow or application level.
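
The unit-economics point above can be made concrete with a back-of-the-envelope comparison. The script below is purely illustrative: the 50-calls-per-action figure comes from the talk, but every price, token count, and fleet size is a hypothetical assumption, not a quote from any provider.

```python
# Hypothetical cost comparison for an agentic workload: per-token API pricing
# vs. self-hosted compute. Only CALLS_PER_ACTION comes from the talk; every
# other number below is an illustrative assumption.

CALLS_PER_ACTION = 50       # talk's figure: up to 50 inference calls per user action
TOKENS_PER_CALL = 2_000     # assumed average prompt + completion tokens per call
ACTIONS_PER_DAY = 10_000    # assumed product usage
PRICE_PER_1K_TOKENS = 0.01  # assumed blended API price (USD)

GPU_HOURLY_RATE = 4.0       # assumed cost of one dedicated GPU (USD/hour)
GPUS_NEEDED = 8             # assumed fleet size to serve the same load

def api_cost_per_day() -> float:
    """Daily spend if every call is billed per token."""
    tokens = CALLS_PER_ACTION * TOKENS_PER_CALL * ACTIONS_PER_DAY
    return tokens / 1_000 * PRICE_PER_1K_TOKENS

def compute_cost_per_day() -> float:
    """Daily spend if the enterprise pays for dedicated compute instead."""
    return GPUS_NEEDED * GPU_HOURLY_RATE * 24

print(f"per-token API:   ${api_cost_per_day():,.0f}/day")   # $10,000/day
print(f"self-hosted GPU: ${compute_cost_per_day():,.0f}/day")  # $768/day
```

Under these (assumed) numbers, agentic fan-out turns per-token billing into a 13x premium over flat-rate compute, which is the cost-control argument the talk makes for running models in-house.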

Misconceptions

Amir clarifies that certain factors are not primary drivers of this shift:

  • Vendor Lock-in: The availability of multiple providers (OpenAI, Anthropic, Google) and a degree of interoperability mitigate vendor lock-in concerns.
  • Ballooning Cost (Initially): The initial drop in price per token led to the belief that cost would take care of itself.
  • Compliance, Privacy, Security: Frontier model companies and cloud providers address these concerns through dedicated deployments within existing VPCs.

The Dragons: Challenges of Building Inference Infrastructure

When enterprises adopt open-source models, they transition from a simple API-based world to one requiring them to build and manage their own inference infrastructure. Amir identifies several challenges, which he calls "the dragons":

  1. Performance Layer: Optimizing models for latency is complex, requiring both model-level and infrastructure-level optimizations.
    • Model-Level: Techniques like speculative decoding (using good draft models, Medusa heads, EAGLE-3, multi-token prediction) are crucial. Staying on top of research and implementing these techniques requires specialized expertise.
    • Infrastructure-Level: Prefix caching and disaggregated serving become important, especially in agentic use cases with massive but similar prompts. These optimizations impact "time to first token" and P99 latency.
  2. Reliability: Achieving four nines of availability for mission-critical inference is difficult. Hardware failures and vLLM/Triton crashes can cause tail latencies to spike. Building resilience requires careful planning and potentially over-provisioning, which impacts unit economics.
  3. Scalability: Rapidly scaling up during traffic bursts is essential. One enterprise reported an eight-minute delay to bring up a new replica of the same model, which is unacceptable.
  4. Engineering Velocity: Tooling, lifecycle management, and observability are critical for enabling engineers to move quickly. Observability is more complex than simply adding logs and metrics.
  5. Controls and Audits: Enterprises require robust controls and audit capabilities.

Build vs. Buy Decision

Enterprises face a "build vs. buy" decision when confronted with these challenges. Amir advocates for buying a pre-built inference infrastructure platform to avoid the complexities and costs of building it in-house.

Conclusion

The adoption of AI in the enterprise is evolving. While closed models offer a convenient starting point, enterprises are increasingly recognizing the need to build with AI themselves using open-source models to achieve better quality, latency, unit economics, and competitive differentiation. However, building and managing inference infrastructure presents significant challenges ("the dragons"), leading to a "build vs. buy" decision. Amir suggests that enterprises should consider buying a pre-built platform to overcome these challenges and accelerate their AI adoption journey.
