2026 Will Be the Year of Agentic Workloads in Production on Amazon EKS

By The New Stack

Share:

Key Concepts

  • Amazon Web Services (AWS): A cloud computing platform offering a wide range of services.
  • Kubernetes: An open-source system for automating deployment, scaling, and management of containerized applications.
  • Amazon Elastic Kubernetes Service (EKS): AWS's managed Kubernetes service.
  • Amazon Elastic Container Registry (ECR): AWS's managed Docker container registry.
  • Open Container Initiative (OCI): A specification for container image formats and runtimes.
  • Agentic AI: AI agents that take proactive tasks, make decisions, and pursue goals by interacting with tools and extensions.
  • Large Language Models (LLMs): AI models trained on vast amounts of text data, capable of understanding and generating human-like text.
  • Model-Centric Platform (MCP) Server: A tool that helps LLMs interact with specific services or platforms, like EKS.
  • AWS Labs: A division within AWS that experiments with new technologies and gathers feedback.
  • Knowledge Base: A repository of information, including documentation, support cases, and internal logs, used to inform AI agents.
  • AWS Bedrock: A service that offers access to various foundation models from leading AI companies.
  • Strands SDK: An AWS open-sourced SDK for building AI agents.
  • Cedar: An open-sourced framework for fine-grained conditional authorization in Kubernetes.
  • Kubernetes Enhancement Proposals (KEPs): Proposals for new features or changes to Kubernetes.
  • VLM (Virtual Language Model): A component that can handle batching and optimize GPU utilization for LLMs.
  • CubeFlow: An open-source platform for machine learning on Kubernetes.
  • K Gateway: A component within CubeFlow for managing agent gateways.
  • CRD (Custom Resource Definition): A way to extend Kubernetes with custom objects.
  • QuickSight: AWS's business intelligence service.
  • Quick suite: An internal AWS tool that aggregates information from various sources for LLM optimization.

EKS and ECR: AWS's Cloud Offerings for Open Source

Mike Stefaniac, Senior Manager of Product Management at AWS, discusses AWS's role in supporting open-source software in the cloud. He manages product for Amazon Elastic Kubernetes Service (EKS), AWS's managed Kubernetes service launched in 2018, and Amazon Elastic Container Registry (ECR). ECR is described as an "unsung hero" handling billions of image pulls daily, built on the Open Container Initiative (OCI) spec, making it a managed open-source product. EKS has evolved from a simple managed Kubernetes control plane to a more comprehensive service that handles significant "heavy lifting" within clusters, including add-ons and worker nodes.

Evolution of EKS Usage and AI Integration

The way users interact with EKS has shifted significantly, especially with the rise of AI. Initially, EKS targeted early adopters who preferred managing most aspects themselves. Now, AWS is adapting EKS to be more user-friendly for a broader audience, including the "late majority" and "laggards," who may not have extensive platform teams and prefer a less hands-on approach.

This evolution includes making services more compatible with agentic tools. The goal is for LLMs to understand how to interact with EKS, enabling them to run simple Kubernetes workloads.

Model-Centric Platform (MCP) Server and Knowledge Base

AWS has open-sourced an MCP server for EKS, initially as an AWS Labs project to gather feedback. Interesting feedback highlighted that the names of tools within the MCP server are crucial. LLMs can often figure out basic API calls themselves. However, more complex tasks like troubleshooting runbooks or deploying CloudFormation stacks are more valuable additions to the MCP server than simple "create cluster, deploy pod" commands. The MCP server is evolving to be more enterprise-ready and abstracting complexity.

A key component is the hosted knowledge base, which has been highly valuable to users. This knowledge base is kept up-to-date with new launches, features, and documentation, ensuring that AI recommendations are based on current best practices. It can incorporate public knowledge, thousands of support cases detailing various failure modes (both AWS-side and customer-induced), internal logs, and metrics not typically exposed to customers. Bundling this information into an agent and MCP server allows customers to self-serve and resolve issues more quickly, reducing support cases and enabling AWS to focus on new problems.

Building AI Products on Kubernetes and EKS

Kubernetes is a strong platform for AI workloads due to its ability to handle dynamic resource allocation (fine-grained CPU and GPU slices), complex inter-component communication, and standardized interfaces for ML.

Agentic AI vs. Traditional Inferencing

While traditional inferencing and reactive LLM applications are prominent on EKS, agentic AI is considered the frontier. Agentic AI involves agents that take proactive tasks, make decisions, and pursue goals by calling out to tools and extensions. This is a more fine-grained approach compared to broad LLMs, and while still in early experimental stages on Kubernetes and EKS, customer adoption is growing.

Key Decisions for Agentic Workloads on Kubernetes

  1. Model Deployment Location:

    • External Services: Using OpenAI directly, or services like AWS Bedrock (which offers models from Anthropic and OpenAI). This is generally easier for experimentation as these services handle GPU calculations.
    • Within Kubernetes: Some customers opt to run models directly within the cluster for cost optimization.
  2. Cost: This is the primary driver for deployment decisions. Token costs and GPU expenses are significant factors.

  3. Observability: Beyond traditional metrics (CPU, memory, networking), agentic workloads scale on metrics like incoming requests and tokens. Robust observability is crucial.

  4. Security: Standard security concerns remain and may even increase in importance with LLMs.

EKS vs. Managed Agent Services (e.g., Bedrock Agents)

  • Bedrock Agents: Offer a more managed experience, ideal for users seeking the simplest way to deploy an agent.
  • EKS: Preferred by customers who have already standardized on Kubernetes for their compute infrastructure. They have established observability, monitoring, security, and governance tools on Kubernetes and prefer not to introduce new production systems. While it requires more effort, it allows for standardization on a single compute platform.

Agent Frameworks and Prompt Engineering

Setting up agent frameworks like Strands or LangChain is relatively straightforward. Key components include:

  • The endpoint for the model (external or in-cluster).
  • The system prompt, which defines the agent's role (e.g., "travel booking agent," "Kubernetes troubleshooting agent").
  • Optional settings like metrics, CPU, memory, and GPU utilization.

Prompt engineering is often considered the most challenging aspect.

Security and Guardrails for Agentic Systems

Security must be considered throughout the entire stack, requiring more fine-grained control due to the broad capabilities of agents.

  • Fine-grained Authorization: Kubernetes' current authorization model is not granular enough for some agent actions. AWS is contributing to open-source solutions:

    • Cedar: An open-sourced framework for fine-grained conditional authorization.
    • KEP to integrate Cedar into Kubernetes: Allows for specific rules, such as allowing a principle to update a resource only if it doesn't modify an associated secret.
  • Security Considerations:

    • Container Security: Which models developers can use, whether they are fine-tuned on proprietary data or public data.
    • Network Policies: Crucial for controlling inter-agent communication.
    • Authorization Policies: Essential for managing agent permissions.

Common Mistakes in Setting Up Agentic Systems

  1. Lack of Human in the Loop: For troubleshooting, letting agents suggest solutions is common, but allowing them to autonomously remediate issues is still rare and risky. Many customers use a Slack channel where agents provide suggestions, and humans take over.
  2. Inadequate Model Deployment: Running LLMs as standard deployments within a cluster is often insufficient. Solutions like VLM are needed for batching and consistent GPU utilization.
  3. Complex Workflows: For sophisticated workflows and orchestrations, tools like CubeFlow's K Gateway are necessary for surfacing observability metrics and routing to the correct models, rather than relying on basic mechanisms like cube proxy for inter-agent communication.

Customer Use Cases and Adoption Trends

AWS sees two main types of EKS customers:

  1. Platform/Operations/SRE Teams: Using AI for troubleshooting, providing developer tools, and defining applications.
    • Developer Portals: Shifting left into the IDE via plugins, offering tools directly within the developer's workflow.
  2. Developers Building End-User Applications: These applications (e.g., travel booking) are still in earlier stages of agentic AI adoption due to higher bars for customer-facing products.

Multi-agent systems are also in early stages. While protocols like ADA exist for communication, and internal research tools utilize multi-agent systems, fully autonomous agents for complex tasks like booking travel are still some way off.

  • Production Deployments:
    • 2025: Expected for traditional LLM inferencing and experimenting with agentic workloads.
    • 2026: Projected as the year for widespread production deployments of agentic workloads.

Getting Started with Agents on EKS

  • Resources: Search for "AI on EKS" for a repository of examples and documentation.
  • SDK: Recommend the Strands SDK (AWS open-sourced) for simple Python agent development.
  • Deployment: Consider tools like Cube agentic for easy CRD-based agent deployment.
  • LLM Hosting: Advise against running the LLM within the cluster initially. Use external services like OpenAI or Bedrock.
  • Fun Exercise: Create a Kubernetes troubleshooting agent, intentionally break a pod, and see if the agent can diagnose and suggest a fix by analyzing logs and metrics.

Internal AWS Usage of Agentic Tools

AWS is increasingly using these technologies internally.

  • BI Agent: A BI agent can understand SQL tables and queries, allowing product managers to ask questions about customer usage and service performance, speeding up product development.
  • Quick suite: A recently relaunched internal tool that has gained significant traction. It acts as a central repository of information (documents, customer notes, public roadmaps) that LLMs and agents can be optimized for. The EKS product management team has built a Quick suite space optimized for their docs, BI dashboards, and customer feedback, enabling them to ask questions about customer pain points and potential development priorities.

Future Outlook

The tools are expected to improve rapidly. AWS is rolling out new versions of agents and research tools. The internal adoption of tools like Quick suite highlights the potential for enhanced productivity and insight generation.

Conclusion

AWS EKS and ECR are foundational services for building and running open-source software in the cloud, with a strong focus on evolving to support the growing demands of AI workloads. The integration of agentic AI on Kubernetes presents new opportunities and challenges, particularly in areas of security, cost optimization, and user experience. While still in its early stages, the rapid development and adoption of these technologies, both internally at AWS and by its customers, suggest a significant shift towards more intelligent and automated cloud operations and application development in the coming years.

Chat with this Video

AI-Powered

Hi! I can answer questions about this video "2026 Will Be the Year of Agentic Workloads in Production on Amazon EKS". What would you like to know?

Chat is based on the transcript of this video and may not be 100% accurate.

Related Videos

Ready to summarize another video?

Summarize YouTube Video