From Cloud Native to AI Native: Where Are We Going?
By The New Stack
Key Concepts
- Cloud Native Computing Foundation (CNCF): A neutral home for collaboration on critical cloud-native technologies like Kubernetes, Prometheus, and Envoy.
- AI-Driven Development: Leveraging AI to enhance software development processes, code analysis, and delivery of resilient software.
- Kubernetes: An open-source system for automating deployment, scaling, and management of containerized applications, now a leader in AI infrastructure.
- WebAssembly (Wasm): A binary instruction format for a stack-based virtual machine, enabling high-performance execution of code from various languages in web browsers and beyond.
- Serverless WebAssembly: Utilizing WebAssembly for serverless functions, offering fast execution and efficient resource utilization.
- MCP Servers: Servers that implement the Model Context Protocol (MCP), an open protocol for exposing tools and data to AI applications. WebAssembly is increasingly used to write and run MCP servers, enabling polyglot development and global distribution.
- Observability: The ability to understand the internal state of a system by examining its outputs, crucial for monitoring and debugging AI systems.
- OpenTelemetry: An open-source observability framework for instrumenting applications and collecting telemetry data.
- Guardrails: Mechanisms to ensure the safe and controlled execution of AI systems, especially in agentic deployments, to prevent malicious actions or unintended consequences.
- Agentic Systems: AI systems designed to act autonomously to achieve specific goals, often involving interaction with tools and resources.
- Inference: The process of using a trained machine learning model to make predictions or generate outputs based on new data.
- Inference Layer: The part of the AI stack responsible for serving trained models for use.
- Small Language Models (SLMs): Smaller, more specialized language models that can offer efficiency and accuracy for specific tasks.
- Hyperpersonalization: Tailoring user experiences in real-time based on individual data and context.
- Infrastructure as Code (IaC): Managing and provisioning infrastructure through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.
- AI Native: Companies or technologies designed from the ground up with AI at their core.
- WASI (WebAssembly System Interface): An interface that allows WebAssembly modules to interact with the host system's resources.
- Component Model: A WebAssembly feature that enables modularity and interoperability between different WebAssembly components.
Main Topics and Key Points
The Evolving Role of AI in Cloud Native
- Kubernetes as an AI Leader: Kubernetes has emerged as a significant platform for AI, facilitating the deployment and management of AI workloads, a leadership role few would have predicted a few years ago.
- Inflection Point for AI: The current era is characterized as an inflection point where AI is taking a more prominent role, moving beyond niche applications to broader adoption.
- Beyond Chatbots: While ChatGPT brought AI to mainstream awareness, the future of AI native companies will extend beyond conversational interfaces to encompass various applications and agents.
- Focus on Inference: A key area of focus is "inference," the process of serving trained models to answer questions or make predictions. This layer is crucial for realizing AI productivity but has sometimes been overshadowed by the excitement around conversational AI.
- AI as Another Application Layer: From an observability perspective, AI is viewed as another layer within the application stack, requiring similar principles of monitoring and management.
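Treating AI as "another layer" means a model call can be traced like any other dependency. The sketch below is a hypothetical, standard-library-only stand-in for a tracing SDK such as OpenTelemetry: `Span`, `traced`, and `fake_model` are illustrative names, not OpenTelemetry's API, and a real setup would export spans to a collector instead of appending them to a list.

```python
import functools
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Minimal stand-in for a tracing span (not the OpenTelemetry API)."""

    name: str
    attributes: dict = field(default_factory=dict)
    duration_s: float = 0.0
    error: Optional[str] = None


# In-memory sink; a real setup would export spans to a collector.
FINISHED_SPANS: List[Span] = []


def traced(name: str):
    """Decorator that records latency, request/response sizes, and errors."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(prompt: str, **kwargs):
            span = Span(name, {"prompt.chars": len(prompt), **kwargs})
            start = time.perf_counter()
            try:
                result = fn(prompt, **kwargs)
                span.attributes["response.chars"] = len(result)
                return result
            except Exception as exc:
                span.error = repr(exc)
                raise
            finally:
                # Runs on success and failure, so every call leaves one span.
                span.duration_s = time.perf_counter() - start
                FINISHED_SPANS.append(span)

        return wrapper

    return decorator


@traced("llm.generate")
def fake_model(prompt: str, model: str = "demo-slm") -> str:
    """Hypothetical model call; a real service would call an inference server."""
    return f"[{model}] echo: {prompt}"
```

Each call, successful or not, leaves one span recording prompt and response sizes, any keyword attributes such as the model name, the latency, and the exception if one was raised.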
WebAssembly's Contribution to AI and Edge Computing
- Edge AI and Inference: WebAssembly is well-suited for edge AI scenarios, enabling quick inference execution at the network edge (CDNs) and farther edges (IoT, smaller servers).
- Efficiency and Security: WebAssembly offers efficiency by allowing for smaller footprints and provides security through sandboxing, making it ideal for deploying models and AI compute on resource-constrained devices.
- MCP Servers in WebAssembly: A thriving use case for WebAssembly is writing MCP servers. Developers can use their preferred languages (Python, Rust, Go, TypeScript, JavaScript) and compile to WebAssembly bytecode, producing strongly typed, portable binaries that can run anywhere.
- Global Distribution and Developer Experience: Platforms like Fermyon Wasm Functions enable global distribution of WebAssembly-based MCP servers, offering fast execution, security, and a simpler developer model.
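At its core, an MCP server answers JSON-RPC 2.0 requests such as `tools/list` and `tools/call`. The following is a minimal sketch of that dispatch logic in plain Python; it omits the `initialize` handshake, notifications, and the transport (such as stdio or HTTP), and the `add` tool and the `TOOLS`/`handle` names are hypothetical. In the WebAssembly setup described above, logic like this is what would be compiled or componentized into a Wasm module; that toolchain step is not shown.

```python
import json

# Hypothetical tool registry: name -> (description, JSON Schema for inputs, handler).
TOOLS = {
    "add": (
        "Add two numbers",
        {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
        lambda args: args["a"] + args["b"],
    ),
}


def _reply(req_id, *, result=None, error=None) -> str:
    """Build a JSON-RPC 2.0 response carrying either a result or an error."""
    body = {"jsonrpc": "2.0", "id": req_id}
    if error is not None:
        body["error"] = error
    else:
        body["result"] = result
    return json.dumps(body)


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC request string and return the response string."""
    req = json.loads(raw)
    req_id, method = req.get("id"), req.get("method")
    params = req.get("params", {})

    if method == "tools/list":
        tools = [
            {"name": n, "description": d, "inputSchema": s}
            for n, (d, s, _) in TOOLS.items()
        ]
        return _reply(req_id, result={"tools": tools})

    if method == "tools/call":
        name = params.get("name")
        if name not in TOOLS:
            return _reply(req_id, error={"code": -32602, "message": f"Unknown tool: {name}"})
        handler = TOOLS[name][2]
        try:
            text, is_error = str(handler(params.get("arguments", {}))), False
        except (KeyError, TypeError) as exc:
            # Tool failures go inside the result so the calling model can see them.
            text, is_error = f"Tool error: {exc!r}", True
        content = [{"type": "text", "text": text}]
        return _reply(req_id, result={"content": content, "isError": is_error})

    return _reply(req_id, error={"code": -32601, "message": f"Method not found: {method}"})
```

For example, a `tools/call` request for `add` with arguments `{"a": 2, "b": 3}` returns a single text content item `"5"`. Unknown methods and tools produce JSON-RPC errors, while failures inside a tool come back as results with `isError: true`.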
Observability and Guardrails in AI Systems
- Data and Observability: The rise of AI, particularly LLMs, adds another layer of data complexity. Observability is essential for understanding the performance and behavior of AI systems, especially when dealing with vast amounts of data.
- Shift in User Data Focus: Observability has shifted from focusing on data locality to understanding users as individuals and their information needs.
- Challenges of Complexity: AI introduces new complexities, and observability helps in managing these challenges and simplifying operations.
- Importance of Guardrails: Guardrails are critical for agentic systems, especially when models are given access to code execution (e.g., Python REPL). They are necessary to prevent malicious actions, such as unauthorized data access or execution of harmful code, when processing untrusted data.
- TrustyAI and Architectural Considerations: Projects like Red Hat's TrustyAI aim to provide a common set of rules and integrations for agentic systems, highlighting that guardrails are an architectural consideration rather than a single-tool solution.
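Because guardrails are an architectural concern, one natural place for them is between the model and its tools. The sketch below is a hypothetical policy check, not TrustyAI's API: it enforces a tool allowlist, blocks code-execution tools (such as a Python REPL) once the conversation has ingested untrusted data, and flags a few prompt-injection phrases. The tool names and patterns are illustrative assumptions.

```python
import re
from dataclasses import dataclass

# Illustrative policy values; a real deployment would load these from config.
ALLOWED_TOOLS = {"search_docs", "python_repl", "get_weather"}
CODE_EXECUTION_TOOLS = {"python_repl"}
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"\bexfiltrate\b", re.IGNORECASE),
]


@dataclass
class Decision:
    allowed: bool
    reason: str


def check_tool_call(tool: str, arguments: str, context_tainted: bool) -> Decision:
    """Decide whether an agent's proposed tool call may run.

    context_tainted is True when the conversation has ingested data from an
    untrusted source (web pages, uploads, third-party tool output).
    """
    if tool not in ALLOWED_TOOLS:
        return Decision(False, f"tool {tool!r} is not on the allowlist")
    if tool in CODE_EXECUTION_TOOLS and context_tainted:
        return Decision(False, "code execution blocked after untrusted input")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(arguments):
            return Decision(False, f"suspicious content matched {pattern.pattern!r}")
    return Decision(True, "ok")
```

Keyword matching is easy to evade, so a check like this would sit alongside sandboxing, human approval for risky actions, and the observability described above rather than replace them.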
Infrastructure and Efficiency in the AI Era
- The Need for Infrastructure Orchestration: All AI components need to run somewhere, requiring robust infrastructure orchestration.
- Awareness of Deep Infrastructure: Unlike traditional cloud-native abstractions, GPU-bound AI workloads necessitate awareness of deep infrastructure details like PCIe buses, InfiniBand, and network performance.
- Efficiency as a Key Challenge: Managing infrastructure efficiently, especially with power constraints and the rapid growth of data centers, is a significant challenge.
- Software's Role in Efficiency: Software can improve efficiency much faster than hardware development can. Innovations in inference engines (e.g., vLLM, llm-d) are yielding significant performance gains.
- Optimizing Every Layer of the Stack: Efficiency gains are needed across all layers of the software stack, from Kubernetes managing infrastructure to GPU operators and inference layers.
- Kubernetes for Infrastructure Management: Kubernetes is evolving beyond container orchestration to manage the infrastructure itself, enabling layered optimization from OS drivers up to the inference layer.
- Power Constraints and Hardware Demand: There are significant power constraints, and hardware demand is high, with companies having years of orders. This emphasizes the need for software-driven efficiency.
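The point about deep infrastructure awareness shows up in even a simple placement decision. Below is a simplified sketch, with a hypothetical node inventory and hypothetical `Gpu`/`Node`/`place` names, of choosing a node where all of a multi-GPU job's GPUs sit behind the same PCIe switch, optionally requiring an InfiniBand link. Real schedulers track far more (NUMA, NVLink, network fabric topology); this only illustrates the kind of detail that cloud-native abstractions usually hide.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gpu:
    gpu_id: str
    pcie_switch: str  # PCIe switch the GPU sits behind
    free: bool


@dataclass
class Node:
    name: str
    gpus: List[Gpu]
    has_infiniband: bool  # fast interconnect for multi-node traffic


def best_switch(node: Node, needed: int) -> Optional[str]:
    """Return a PCIe switch on this node with at least `needed` free GPUs."""
    free_per_switch = Counter(g.pcie_switch for g in node.gpus if g.free)
    fits = [(count, sw) for sw, count in free_per_switch.items() if count >= needed]
    # Tightest fit first, so larger free groups stay available for bigger jobs.
    return min(fits)[1] if fits else None


def place(nodes: List[Node], needed: int, require_ib: bool) -> Optional[Tuple[str, str, List[str]]]:
    """Pick (node, switch, gpu_ids) so that all GPUs share one PCIe switch."""
    for node in nodes:
        if require_ib and not node.has_infiniband:
            continue
        switch = best_switch(node, needed)
        if switch is not None:
            ids = [g.gpu_id for g in node.gpus if g.free and g.pcie_switch == switch]
            return node.name, switch, ids[:needed]
    return None
```

The search is first-fit across nodes and best-fit within a node: a two-GPU job lands on a switch with exactly two free GPUs when one exists, keeping larger groups intact for larger jobs.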
The AI Revolution and Future Outlook
- AI as a New Revolution: The current AI landscape is compared to past revolutions (manufacturing, industrial) that fundamentally changed productivity and efficiency at scale.
- Uncertainty of the Future: The exact trajectory of AI and generative intelligence over the next 3-5 years is still uncertain, indicating that we are in the early stages.
- Realizing Use Cases: A key to moving forward is identifying specific use cases where AI is applicable and avoiding attempts to force-fit it into every scenario.
- AI Native Companies: Companies like Uber, with their extensive use of AI for real-time model training and deployment across hundreds of cities, are presented as examples of AI-native organizations.
- Every Company as an Algorithm Company: The trend suggests that in the next decade, every company will become an algorithm company, similar to how software became ubiquitous.
Use Cases and Model Specialization
- Hyperpersonalization: A significant use case for WebAssembly customers involves hyperpersonalization on websites, where content is dynamically adjusted based on user context and preferences.
- Edge AI in Industries: AI is already prevalent in industries like oil fields for machine health monitoring and sensor data analysis, often using traditional ML.
- Generative AI's Ripple Effects: Even when generative AI isn't directly applicable, its advancements (e.g., in model registries and attestation) have positive ripple effects on traditional ML operations.
- Contextual AI Application: The key to successful AI implementation is applying it in the right context. LLMs are good for general knowledge and human interaction, while traditional ML is suitable for specialized intelligence and direct data streams.
- Small Language Models for Specificity: SLMs, when fine-tuned on specialized datasets and integrated into agentic systems, can offer increased accuracy and efficiency by narrowing their scope to specific use cases and domains.
- Avoiding Hallucinations: Narrowly trained models are less prone to hallucinations.
- Deterministic Workflows for Models: Traditional software is deterministic, while models are typically non-deterministic. By placing models within deterministic workflows and frameworks, their outputs can be validated, and safety nets can be implemented.
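Placing a non-deterministic model inside a deterministic workflow can be sketched as a validate, retry, and fall-back loop. In the hypothetical example below (echoing the machine-health use case), `model` stands in for any LLM or SLM call; the label set, retry count, and `needs_review` fallback are illustrative assumptions.

```python
import json
from typing import Callable

ALLOWED_LABELS = {"healthy", "degraded", "failing"}


def classify_machine(
    reading: str,
    model: Callable[[str], str],
    max_attempts: int = 3,
) -> dict:
    """Ask a model for a JSON verdict and accept it only if it validates.

    The workflow around the model is deterministic: fixed validation rules,
    bounded retries, and a fixed fallback when no answer passes.
    """
    prompt = f'Classify this sensor reading. Reply as JSON {{"label": ...}}: {reading}'
    for attempt in range(1, max_attempts + 1):
        raw = model(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: retry
        if isinstance(parsed, dict) and parsed.get("label") in ALLOWED_LABELS:
            return {"label": parsed["label"], "attempts": attempt, "source": "model"}
        # Well-formed JSON with an unknown label also triggers a retry.
    # Safety net: escalate to a human instead of trusting an invalid answer.
    return {"label": "needs_review", "attempts": max_attempts, "source": "fallback"}
```

Whatever the model returns, the caller only ever sees one of the allowed labels or `needs_review`, which is what makes the surrounding workflow predictable and testable.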
Open Infrastructure and Community Evolution
- Open Infrastructure and Small Models: The emergence of small language models is significant, especially since many large language models are available only through closed, proprietary APIs.
- Evolution of Open Source: The discussion draws parallels to the evolution of OpenStack and the rise of cloud-native with Kubernetes, questioning the future open-source story for AI.
- Breaking Down AI Categories: To understand community development, AI needs to be broken down into categories like training, inference, and applications/agents.
- Robust Open Source for Training: The training aspect of AI has a strong open-source ecosystem, with projects like PyTorch dominating model repositories.
- Emergence of Layered Organizations: Different organizations are expected to emerge at various layers of the AI stack, similar to how database experts contribute to specific database technologies.
- CNCF's Role in Inference: The CNCF is focused on inference, which requires strong capabilities in deploying, scaling, observing, and securing complex systems – core tenets of cloud-native.
- Leveraging Experience: The combined experience of open infrastructure and CNCF communities (nearly 30 years) in running infrastructure systems and governing open-source projects is crucial for the future of AI.
- Massive Scale Infrastructure: The AI world is seeing unprecedented infrastructure growth, with thousands of GPUs deployed weekly and plans for data centers with 100,000 GPUs.
- Cloud Native as an Enabler: Cloud-native principles and technologies are seen as critical enablers for the current AI revolution, providing the foundation for scaling, managing hardware, and efficient application deployment.
Proprietary Technologies and Interface Definition
- Hindrances of Proprietary Technologies: Proprietary technologies, especially in hardware like GPUs, can create hindrances.
- WebAssembly for Interface Definition: WebAssembly, particularly through WASI and the component model, is effective at defining interfaces that sit on top of proprietary technologies, democratizing access.
- Defining Interfaces for Diverse Ecosystems: Communities are defining interfaces for various functionalities (key-value stores, WASI-NN, inferencing) to enable interoperability and access to diverse ecosystems.
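In WASI, this interface-first approach is expressed in WIT, the component model's interface definition language (WASI-NN is one such interface for inference). The Python sketch below uses `typing.Protocol` only as an analogy for that idea: application code depends on a shared contract, and a backend wrapping proprietary hardware can be swapped in behind it. `InferenceBackend`, `CpuBackend`, `VendorGpuBackend`, and `run` are hypothetical names, and the "vendor" backend merely simulates a proprietary SDK.

```python
from typing import List, Protocol, Sequence


class InferenceBackend(Protocol):
    """The shared contract: callers depend on this, never on a vendor SDK."""

    def load(self, model_name: str) -> None: ...

    def infer(self, inputs: Sequence[float]) -> List[float]: ...


class CpuBackend:
    """Portable reference implementation (a toy 'model' that doubles inputs)."""

    def load(self, model_name: str) -> None:
        self.model = model_name

    def infer(self, inputs: Sequence[float]) -> List[float]:
        return [x * 2 for x in inputs]


class VendorGpuBackend:
    """Stand-in for a backend wrapping a proprietary accelerator runtime."""

    def load(self, model_name: str) -> None:
        self.model = f"{model_name}@gpu"

    def infer(self, inputs: Sequence[float]) -> List[float]:
        # A real backend would call the vendor runtime here.
        return [x * 2 for x in inputs]


def run(backend: InferenceBackend, model_name: str, inputs: Sequence[float]) -> List[float]:
    """Application code written once against the interface."""
    backend.load(model_name)
    return backend.infer(inputs)
```

Swapping `CpuBackend` for `VendorGpuBackend` requires no change to `run`, which is the property that standardized interfaces aim to provide across hardware vendors.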
The Human Element and Practical Application
- People and Process Challenges: Beyond technological innovation, challenges remain in people and process management within the AI domain.
- Simplifying for Infrastructure and Developers: The goal is to simplify the lives of infrastructure teams and developers, extending efficiency to their scale.
- Context is Key: Applying AI, LLMs, SLMs, and deep learning in the correct context is paramount.
- Avoiding Unnecessary AI: Sometimes, AI is not necessary, and simpler solutions can be more efficient. The human tendency to experiment with new technologies can lead to over-application.
- Cloud Native Foundation: The advancements made in the cloud-native world over the past 10-15 years have enabled the complex AI capabilities we see today.
- Contextual and Valid AI: AI advancements are contextual; they hold for a specific point in time and situation.
Important Examples, Case Studies, and Real-World Applications
- Hyperpersonalization on Retail Sites: A customer example where WebAssembly is used to dynamically adjust website content (e.g., shoe recommendations based on location and weather) for a personalized experience.
- Oil Field AI: Traditional machine learning is used in oil fields for machine health monitoring and sensor data analysis.
- Uber's AI Practices: Uber's blog post on "AI Native" is cited as a prime example of an AI-native company, performing 20,000 training runs per month and running 5,000 models in production to adapt to hundreds of cities globally.
- Microsoft's Project for MCP Servers: Mentioned as a project enabling the creation of MCP servers for VS Code.
- fastertools' wasmcp: A product for creating and deploying MCP servers.
- Fermyon Wasm Functions: A platform for distributing MCP servers globally across 24 regions.
- llm-d (Red Hat): A project using caching and prefill techniques for efficiency in AI workloads.
- vLLM: An open-source LLM inference and serving engine that has shown significant performance gains (6-8x efficiency) in the last year.
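The caching idea behind projects like llm-d can be illustrated with a generic prefix-cache-aware router: send each request to the replica most likely to already hold the prompt's prefix in its KV cache, so prefill work is reused. This is a simplified sketch of the general technique, not llm-d's actual scheduler; the block size, scoring rule, and the `Replica`/`route` names are assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Set

BLOCK_CHARS = 16  # assumed cache granularity; real engines cache fixed-size token blocks


def prefix_block_hashes(prompt: str) -> List[str]:
    """Chain-hash full blocks so identical prefixes produce identical hash lists."""
    hashes, prev = [], ""
    for start in range(0, len(prompt) - BLOCK_CHARS + 1, BLOCK_CHARS):
        block = prompt[start:start + BLOCK_CHARS]
        prev = hashlib.sha256((prev + block).encode()).hexdigest()
        hashes.append(prev)
    return hashes


@dataclass
class Replica:
    name: str
    active_requests: int = 0
    cached_blocks: Set[str] = field(default_factory=set)


def cached_prefix_len(replica: Replica, blocks: List[str]) -> int:
    """Count how many leading blocks this replica already has cached."""
    n = 0
    for h in blocks:
        if h not in replica.cached_blocks:
            break
        n += 1
    return n


def route(prompt: str, replicas: List[Replica]) -> Replica:
    """Prefer the longest cached prefix; break ties by fewest active requests."""
    blocks = prefix_block_hashes(prompt)
    best = max(replicas, key=lambda r: (cached_prefix_len(r, blocks), -r.active_requests))
    best.active_requests += 1
    best.cached_blocks.update(blocks)  # after prefill, this replica holds the prefix
    return best
```

Requests that share a long system prompt or document context tend to land on the same replica and reuse its cached prefill. The sketch never decrements `active_requests` and ignores cache eviction; a production router would also limit how much load cache affinity may concentrate on one replica.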
Step-by-Step Processes, Methodologies, or Frameworks
- WebAssembly for MCP Servers:
  - Choose a preferred language (Python, Rust, Go, TypeScript, JavaScript) that compiles to WebAssembly.
  - Compile the code to WebAssembly bytecode.
  - Build the MCP server interface in WebAssembly.
  - Package the model with the WebAssembly component.
  - Deploy to GPU-equipped hardware for direct inference.
  - Rely on sandboxing for security.
- Guardrail Implementation in Agentic Systems:
  - Identify potential attack vectors (e.g., Python REPL access, untrusted data sources).
  - Scan and validate incoming data for malicious content.
  - Inspect and observe interactions between the model, MCP servers, tools, and resources.
  - Apply rules and controls to ensure safe execution and prevent unintended consequences.
  - Treat guardrails as an architectural concern, using tools such as TrustyAI.
- Optimizing for Configurations (IaC):
  - Define infrastructure layers (network, drivers, etc.) through APIs.
  - Adopt Infrastructure as Code (IaC) principles to define each level of the stack in repositories.
  - Enforce state reconciliation so the running system matches declared intent.
  - Implement inference-time scaling, varying model precision with task complexity.
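The "declare intent, then reconcile" step is the same control-loop idea Kubernetes controllers use. A minimal sketch, assuming hypothetical layer names and values and using a dictionary in place of real infrastructure APIs; a real controller would watch for changes and call provider APIs rather than mutate a dict.

```python
from typing import Dict, List

# Declared intent, e.g. loaded from a file in a Git repository (hypothetical keys/values).
DESIRED = {
    "gpu_driver": "driver-v2",
    "inference_replicas": 4,
    "model_precision": "fp8",
}


def diff(desired: Dict, actual: Dict) -> List[str]:
    """List the keys whose actual value drifts from the declared value."""
    keys = sorted(set(desired) | set(actual))
    return [k for k in keys if desired.get(k) != actual.get(k)]


def reconcile(desired: Dict, actual: Dict) -> List[str]:
    """One reconcile pass: converge actual state toward desired state.

    Returns the actions taken so the loop's behavior is observable.
    """
    actions = []
    for key in diff(desired, actual):
        if key not in desired:
            # Anything not declared is drift and gets removed.
            actions.append(f"remove {key}")
            del actual[key]
        else:
            actions.append(f"set {key}: {actual.get(key)!r} -> {desired[key]!r}")
            actual[key] = desired[key]
    return actions
```

Running `reconcile` a second time returns no actions: the pass is idempotent, which is what lets a controller run it repeatedly and safely whenever drift occurs.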
Key Arguments or Perspectives Presented
- Jonathan Bryce: Argues that inference is a critical, often overlooked, layer in AI development. He emphasizes the need to break down AI into categories (training, inference, apps/agents) for community development and highlights the importance of cloud-native principles for deploying and scaling complex AI systems.
- Kate Goldenring: Advocates for WebAssembly's role in enabling efficient, secure, and polyglot development for AI, particularly at the edge and for use cases like MCP servers and hyperpersonalization.
- James Harson: Stresses the critical need for guardrails in agentic AI systems due to their numerous attack vectors. He also points out that while generative AI is exciting, traditional ML remains valuable, and advancements in GenAI benefit other AI areas.
- Shauna Amara: Highlights that AI workloads require a deep understanding of underlying infrastructure, moving beyond cloud-native abstractions. She emphasizes efficiency in infrastructure management due to power constraints and the rapid growth of data centers.
- Sean Odell: Views AI as another application layer from an observability perspective, emphasizing the need for robust data collection and analysis to understand AI system behavior. He also points out that people and process challenges are as significant as technological ones.
- Alex Williams: Frames the current AI landscape as a new revolution, comparable to past industrial revolutions, and suggests that every company will eventually become an algorithm company.
Notable Quotes or Significant Statements
- Jonathan Bryce: "I'm obsessed with inference."
- Jonathan Bryce: "We're only beginning [in AI]."
- Kate Goldenring: "WebAssembly provides the opportunity to do that by sandboxing those executions as well."
- James Harson: "Guard rails are critical when we start talking about these agentic systems."
- Shauna Amara: "One of the key things we keep forgetting about all of this, the stuff has to run somewhere."
- Shauna Amara: "We have a power problem. We're running out of power to run these data centers."
- Jonathan Bryce: "Software can get better and software can become more efficient and software can improve at a much more rapid pace than building a nuclear power plant."
- James Harson: "I don't think that AI native or whatever the future of our AI infrastructure and AI capabilities are could have theoretically existed without cloud native."
- Sean Odell: "What we still have a problem with is people and process."
- Kate Goldenring: "Context is the word native."
- Jonathan Bryce: "Every company in the world is going to be an algorithm company. Just like, you know, we've talked about for the last 15 years, you know, software ate the world and every company became a software company. I think every company is going to become an algorithm company over the next 10 years."
Technical Terms, Concepts, or Specialized Vocabulary
- Inference: The process of using a trained machine learning model to make predictions or generate outputs.
- MLOps: Machine Learning Operations, a set of practices that combines machine learning, DevOps, and data engineering to deploy and maintain ML systems in production.
- WebAssembly (Wasm): A binary instruction format for a stack-based virtual machine, enabling high-performance execution of code from various languages.
- WASI (WebAssembly System Interface): An interface that allows WebAssembly modules to interact with the host system's resources.
- MCP Servers: Model Context Protocol servers, which expose tools, resources, and prompts to AI applications through a standard protocol.
- Polyglot: Supporting or written in multiple programming languages.
- Bytecode: An intermediate representation of code that can be executed by a virtual machine.
- Sandboxing: A security mechanism that isolates a process or application from the rest of the system.
- Observability: The ability to understand the internal state of a system by examining its outputs.
- OpenTelemetry: An open-source observability framework.
- Guardrails: Safety mechanisms to control AI system behavior.
- Agentic Systems: AI systems designed to act autonomously.
- Python REPL (Read-Eval-Print Loop): An interactive environment for executing Python code.
- PCIe Bus: Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard.
- InfiniBand: A high-performance interconnect for servers and storage systems.
- Infrastructure as Code (IaC): Managing infrastructure through code.
- LLM (Large Language Model): A type of AI model trained on vast amounts of text data.
- SLM (Small Language Model): A smaller, more specialized language model.
- Hallucinations (AI): When an AI model generates false or nonsensical information.
- Deterministic: A system whose output is entirely determined by its initial state and inputs.
- Non-deterministic: A system whose output can vary even with the same initial state and inputs.
Logical Connections Between Different Sections and Ideas
The discussion flows logically from the broad impact of AI on the cloud-native world to specific technical solutions and challenges.
- Introduction to AI's Growing Role: The conversation begins by establishing the significance of AI in the cloud-native landscape, highlighting Kubernetes's leadership and the current "inflection point."
- Focus on Inference: This leads to a deeper dive into "inference" as a core component of AI productivity, setting the stage for discussions on how to achieve it efficiently.
- WebAssembly as a Solution: WebAssembly is presented as a technology that addresses efficiency and security needs for AI, particularly at the edge and for specific applications like MCP servers. This connects the need for efficient inference with a practical implementation.
- Observability and Safety: The discussion then shifts to the operational aspects of AI, emphasizing the importance of observability for understanding complex AI systems and the critical need for guardrails to ensure safety and prevent misuse. This links the deployment of AI with its responsible management.
- Infrastructure and Efficiency: The conversation moves to the foundational layer, highlighting that AI workloads require robust infrastructure and that efficiency is paramount due to power constraints and hardware demand. This connects the software solutions with the underlying hardware and operational realities.
- AI Revolution and Future Vision: The broader implications of AI are explored, framing it as a revolution and envisioning a future where companies are inherently "algorithm companies." This provides a forward-looking perspective.
- Use Cases and Specialization: Specific use cases like hyperpersonalization and industrial AI are discussed, leading to the argument for specializing AI models (e.g., SLMs) for particular tasks to improve accuracy and efficiency. This bridges the gap between broad AI concepts and practical application.
- Open Infrastructure and Community: The discussion turns to the open-source ecosystem, exploring how communities can evolve to support AI, drawing parallels to past open-source movements and emphasizing the role of CNCF in inference. This connects technological advancements with community collaboration.
- Proprietary Technologies and Interfaces: The challenges posed by proprietary hardware are addressed, with WebAssembly's interface definition capabilities presented as a solution for interoperability.
- Human Factors and Context: The conversation concludes by emphasizing the importance of human factors, process, and applying AI in the correct context, acknowledging that while experimentation is human, practical application and efficiency are key. This brings the discussion back to the human element and the practical realities of AI adoption.
Data, Research Findings, or Statistics Mentioned
- WebAssembly Adoption: Customers use Fermyon Wasm Functions to distribute MCP servers globally across 24 regions.
- MCP Spec Release: The MCP specification came out in November 2024.
- vLLM Efficiency Gains: vLLM has achieved 6-8x efficiency gains in the last year.
- Hardware Orders: Companies in the hardware space have 5 years of orders in.
- GPU Deployment: Customers are deploying thousands of GPUs a week, with plans for 100,000 GPUs in a single data center.
- Uber Training Runs: Uber performs 20,000 training runs per month.
- Uber Production Models: Uber runs 5,000 models in production.
- PyTorch Market Share: PyTorch has 80%+ market share on Hugging Face.
- CNCF/Open Infra Experience: Combined nearly 30 years of experience in running infrastructure systems and governing open-source communities.
Brief Synthesis/Conclusion of the Main Takeaways
The discussion highlights that AI is fundamentally reshaping the cloud-native landscape, moving beyond initial hype to practical applications and infrastructure demands. Key takeaways include the critical role of inference and the need for efficiency across all software and infrastructure layers. WebAssembly emerges as a powerful tool for enabling efficient, secure, and polyglot AI development, especially at the edge. Observability and robust guardrails are essential for managing the complexity and safety of AI systems, particularly agentic ones. The future of AI is seen as a revolution requiring a deep understanding of infrastructure, a focus on specialized use cases and model optimization (including SLMs), and a continued reliance on open-source principles and community collaboration. Ultimately, successful AI adoption hinges on applying these technologies in the right context, balancing innovation with practical needs, and addressing both technological and human/process challenges. The cloud-native paradigm has laid a crucial foundation for this AI revolution, enabling the scale and management required for its widespread impact.