Microsoft wants to make service mesh invisible

By The New Stack

Key Concepts

  • Service Mesh: An infrastructure layer that handles service-to-service communication and provides security (mTLS), observability (traces, metrics, logs), and traffic control without changes to application code.
  • Istio: An open-source service mesh platform.
  • Ambient Mode: A simplified Istio architecture that removes sidecar proxies in favor of a per-node, Rust-based proxy, reducing operational complexity.
  • Azure Kubernetes Application Network (App Net): A managed service built on Istio Ambient mode, designed to provide "boring" (stable, automated) networking.
  • Gateway API: A collection of resources in Kubernetes that model service networking, now including inference extensions for AI.
  • Agent Gateway: A CNCF project focused on managing agentic traffic and AI-specific networking needs.
  • mTLS (Mutual TLS): A mode of TLS in which both parties present certificates, so each side of the connection is authenticated and the traffic is encrypted.
  • Day-Two Operations: The ongoing maintenance, upgrades, and management of software after initial deployment.
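
The mesh performs mTLS on the workload's behalf, but it can help to see what "mutual" means at the socket level. The following is a minimal sketch using Python's standard ssl module, not how Istio's proxies implement it. The function names and the certificate, key, and CA file arguments are hypothetical placeholders; in a mesh, workload certificates are issued and rotated automatically.

```python
import ssl


def make_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Server side: present our certificate and demand one from the client."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # This setting is what makes the TLS "mutual": clients without a
    # certificate signed by a trusted CA fail the handshake.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # our identity
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signs client certs
    return ctx


def make_mtls_client_context(cert_file=None, key_file=None, ca_file=None):
    """Client side: verify the server as usual, and also present our own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # our identity
    return ctx


server_ctx = make_mtls_server_context()
print(server_ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```

With real files, a server would wrap accepted sockets using server_ctx.wrap_socket(sock, server_side=True), and a client that presents no certificate would then fail the handshake.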

1. Evolution of Service Mesh: From Sidecars to Ambient

The traditional service mesh architecture relied on "sidecars"—an Envoy proxy injected into every pod. While effective, this created significant "day-two" operational burdens, such as requiring application restarts for every proxy upgrade.

Ambient Mode was introduced to solve this by:

  • Moving encryption and L4 policy enforcement to a lightweight, per-node Rust-based proxy.
  • Allowing L7 features (telemetry, weighted load balancing) to be handled by independent "waypoint" proxies that can be upgraded without restarting application pods.
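
The day-two difference can be made concrete with a rough model of what a proxy upgrade touches in each mode. This is an illustrative sketch with hypothetical names (Cluster, upgrade_plan), not an Istio API. It assumes one per-node proxy on each node and counts each waypoint as a single proxy to replace.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    nodes: int           # worker nodes; ambient runs one per-node proxy on each
    app_pods: int        # application pods enrolled in the mesh
    waypoints: int = 0   # waypoint proxies for L7 features (ambient only)


def upgrade_plan(cluster: Cluster, mode: str) -> dict[str, int]:
    """Roughly count what must restart to roll out a new proxy version."""
    if mode == "sidecar":
        # Every pod carries its own Envoy sidecar, so each application pod
        # is restarted to pick up the new proxy.
        return {"app_pod_restarts": cluster.app_pods,
                "proxy_restarts": cluster.app_pods}
    if mode == "ambient":
        # Only per-node proxies and waypoints are replaced; application
        # pods keep running.
        return {"app_pod_restarts": 0,
                "proxy_restarts": cluster.nodes + cluster.waypoints}
    raise ValueError(f"unknown mode: {mode!r}")


plan = upgrade_plan(Cluster(nodes=10, app_pods=500, waypoints=3), "ambient")
print(plan)  # {'app_pod_restarts': 0, 'proxy_restarts': 13}
```

For the same hypothetical cluster, the sidecar plan reports 500 application pod restarts. The ambient plan reports none, because the proxies live outside the application pods.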

2. The "Boring" Philosophy

Mitch Connors emphasizes that the ultimate goal for service mesh is to become "boring," meaning it should be so well integrated and automated that users don't need to think about it.

  • The Problem: 85% of manual Istio installations fail to keep up with critical CVE (Common Vulnerabilities and Exposures) patches.
  • The Solution: Transitioning from a DIY project to a managed product (Azure Kubernetes Application Network). By treating the mesh as a service, Microsoft handles upgrades, security patches, and maintenance windows, allowing platform engineers to focus on business logic rather than infrastructure plumbing.

3. AI and Networking: New Challenges

AI workloads are fundamentally changing network requirements. Unlike typical web requests, which cost roughly the same to serve, LLM requests are non-deterministic and vary widely in compute cost.

  • Token-Based Routing: Because LLM requests vary in complexity, routing based on simple request counts is insufficient. Istio is evolving to support token-count estimation to distribute load fairly.
  • Inference Extensions: The Gateway API now includes inference extensions to help route traffic based on the cost of serving a request.
  • AI Safety & Shadow AI: Enterprises face "Shadow AI," where employees use unapproved AI tools. The goal is to use the service mesh to inspect request bodies and ensure traffic is only routed to approved, secure AI endpoints.
  • Agent Gateway: For cutting-edge AI needs such as the A2A (Agent2Agent) and MCP (Model Context Protocol) protocols, Microsoft is partnering with the Agent Gateway project. This lets users opt into "alpha" AI features while keeping the core Istio infrastructure stable and "boring."
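
To see why request counts are a poor signal for LLM traffic, here is a minimal sketch of a least-outstanding-tokens balancer. Several parts are assumptions for illustration: the backend names, the estimate_tokens heuristic (roughly four characters per token, plus the requested output budget), and the class itself. This is not the algorithm Istio or the Gateway API inference extensions actually use.

```python
from dataclasses import dataclass, field


def estimate_tokens(prompt: str, max_output_tokens: int) -> int:
    """Crude cost estimate (assumption): ~4 characters per token, plus output budget."""
    return max(1, len(prompt) // 4) + max_output_tokens


@dataclass
class TokenAwareBalancer:
    backends: list[str]
    # Estimated tokens currently being served by each backend.
    in_flight: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for backend in self.backends:
            self.in_flight.setdefault(backend, 0)

    def pick(self, prompt: str, max_output_tokens: int) -> tuple[str, int]:
        """Send the request to the backend with the fewest estimated tokens in flight."""
        cost = estimate_tokens(prompt, max_output_tokens)
        backend = min(self.backends, key=lambda b: self.in_flight[b])
        self.in_flight[backend] += cost
        return backend, cost

    def release(self, backend: str, cost: int) -> None:
        """Call when the response completes, so the backend's load drops."""
        self.in_flight[backend] -= cost


lb = TokenAwareBalancer(["gpu-a", "gpu-b"])
print(lb.pick("x" * 8000, 1000))  # ('gpu-a', 3000)
print(lb.pick("hi", 50))          # ('gpu-b', 51)
print(lb.pick("hi", 50))          # ('gpu-b', 51)
```

After the first two requests, a least-requests balancer would see both backends as equally busy (one request each). The token-aware version keeps sending small requests to gpu-b, because gpu-a is still working through a much larger estimated load.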

4. Multi-Cluster and Capacity Management

A major development in the last year is Ambient support for multi-cluster environments.

  • Identity-Based Security: By using service accounts as the root of trust, security policies are enforced cryptographically across clusters, regardless of where the workload resides.
  • Capacity Constraints: AI training and inference require specific GPU hardware. Multi-cluster meshes allow organizations to move GPU-heavy workloads to regions with available capacity while maintaining seamless, secure connectivity between clusters.
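
Identity-based policy can be sketched as a check on the caller's workload identity rather than its network location. Istio encodes a workload's identity as a SPIFFE ID of the form spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>, and authorization policies match principals written as <trust-domain>/ns/<namespace>/sa/<service-account>. The sketch assumes that the peer certificate was already verified during the mTLS handshake and that all clusters share a trust domain. The function names and the payments/checkout identities are hypothetical.

```python
from typing import NamedTuple


class WorkloadIdentity(NamedTuple):
    trust_domain: str
    namespace: str
    service_account: str

    def principal(self) -> str:
        # Same shape as an Istio AuthorizationPolicy principal.
        return f"{self.trust_domain}/ns/{self.namespace}/sa/{self.service_account}"


def parse_spiffe_id(uri: str) -> WorkloadIdentity:
    """Parse spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    prefix = "spiffe://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 5 or parts[1] != "ns" or parts[3] != "sa":
        raise ValueError(f"unexpected SPIFFE ID layout: {uri!r}")
    return WorkloadIdentity(parts[0], parts[2], parts[4])


def is_allowed(peer_spiffe_id: str, allowed_principals: set[str]) -> bool:
    """Authorize on the caller's identity only; no IPs or cluster names involved."""
    return parse_spiffe_id(peer_spiffe_id).principal() in allowed_principals


allowed = {"cluster.local/ns/payments/sa/checkout"}
print(is_allowed("spiffe://cluster.local/ns/payments/sa/checkout", allowed))  # True
print(is_allowed("spiffe://cluster.local/ns/default/sa/checkout", allowed))   # False
```

The decision uses only the trust domain, namespace, and service account. A caller running as the checkout service account in the payments namespace is therefore allowed from any cluster in the mesh.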

5. Key Arguments and Perspectives

  • The "Land and Expand" Strategy: Most users adopt Istio for a single, high-priority requirement—usually mTLS encryption for compliance. Once the mesh is in place, they naturally expand to use ingress, traffic shifting, and observability.
  • Language Agnosticism: Application-level libraries (such as those for Spring Boot or .NET) can handle some networking tasks, but they break down in heterogeneous environments, where each language and framework needs its own library. A service mesh provides a consistent security and policy layer regardless of the programming language used.
  • Quote: "Success for me looks like most people not knowing what a service mesh is, even though they're using one." (Mitch Connors)

Synthesis

The service mesh landscape is shifting from complex, manual "sidecar" configurations toward managed, "boring" infrastructure. By abstracting the complexity through products like Azure Kubernetes Application Network, Microsoft aims to make mTLS and traffic control a default, invisible utility. Simultaneously, the project is evolving to address the unique, high-velocity demands of AI—specifically token-based load balancing and AI safety—by integrating specialized tools like Agent Gateway while maintaining the stability of the core Istio API.
