Before you scale: A guide to Cloud Run cost optimization

By Google Cloud Tech

Cloud Run Cost Optimization & Billing Management

Key Concepts:

  • Cloud Run: A fully managed serverless execution environment for containerized applications.
  • Max Instances: A setting to limit the maximum number of instances a Cloud Run service can scale to.
  • Concurrency Limit: The maximum number of requests a single Cloud Run instance can handle simultaneously.
  • Firebase App Check: A service to verify that requests to your Cloud Run service come from your legitimate apps.
  • Cloud Armor: A web application firewall (WAF) for protecting applications from attacks.
  • Budget Alerts: Notifications triggered when cloud spending reaches predefined thresholds.
  • Committed Use Discounts: Discounts offered for committing to a consistent level of resource usage.
  • vCPU Utilization: A measure of how much virtual CPU a Cloud Run service is using.
  • Cloud Hub & Optimization: A new page in the Google Cloud Console designed to help optimize application costs.
  • Cost Explorer: A tool within the Google Cloud Console for detailed cost breakdown and analysis.

Preventing Billing Surprises

Mitchell emphasizes two primary approaches to Cloud Run cost management: preventing unexpected bills and optimizing existing service costs. He begins by addressing billing surprises, a common concern among developers. Traditional server-based systems often turn away users during traffic spikes, but serverless platforms like Cloud Run offer scaling options.

The key is understanding how Cloud Run handles increased traffic. Users can either let the service scale out to absorb the load, or cap scaling and accept that some users may be turned away. The max instances setting provides that cap. For example, a max instances limit of two prevents the service from scaling beyond two instances, even if traffic exceeds their capacity.
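As a rough illustration of what such a cap means in practice, the sketch below works through the arithmetic in Python. It assumes the ceiling on simultaneous requests is simply max instances multiplied by the per-instance concurrency limit. The function names and the traffic figure are hypothetical, and real behavior (queuing, timeouts, error responses) is more nuanced.

```python
def max_in_flight_requests(max_instances: int, concurrency: int) -> int:
    """Upper bound on requests served at the same time once scaling is capped."""
    if max_instances < 1 or concurrency < 1:
        raise ValueError("max_instances and concurrency must be positive")
    return max_instances * concurrency


def excess_requests(in_flight: int, max_instances: int, concurrency: int) -> int:
    """Requests beyond the capped capacity; these may be delayed or turned away."""
    capacity = max_in_flight_requests(max_instances, concurrency)
    return max(0, in_flight - capacity)


# Two instances at a concurrency of 80 can serve at most 160 requests at once.
capacity = max_in_flight_requests(max_instances=2, concurrency=80)

# Hypothetical spike of 200 simultaneous requests: 40 exceed the cap.
overflow = excess_requests(in_flight=200, max_instances=2, concurrency=80)
```

The trade-off is explicit here: the cap bounds cost, and the overflow is the traffic that cost bound gives up.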

Protecting against malicious traffic is also crucial. While scaling is beneficial for legitimate users, it can be exploited by attackers. Several solutions are available:

  • Authentication: Requiring authentication via Identity-Aware Proxy, Identity Platform, or Firebase Authentication filters out unauthenticated requests.
  • Firebase App Check: Useful for services supporting anonymous users, verifying requests originate from legitimate apps.
  • Cloud Armor & Load Balancer: Cloud Armor, a web application firewall, allows for granular control over traffic, including rate limiting to prevent resource exhaustion and blocking common attacks like SQL injection, cross-site scripting, and bot traffic.
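To show the general idea behind rate limiting, here is a minimal fixed-window counter in Python. This is a conceptual sketch only, not how Cloud Armor implements its rules. The class name, the per-client key, and the limits are assumptions made for illustration.

```python
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each fixed time window."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client, window index) -> requests counted so far.
        # Old windows are never evicted in this sketch.
        self._counts = defaultdict(int)

    def allow(self, client: str, now: float) -> bool:
        """Return True if the request is within the client's quota."""
        window = int(now // self.window_seconds)
        key = (client, window)
        if self._counts[key] >= self.limit:
            return False  # over quota: reject instead of scaling up to serve it
        self._counts[key] += 1
        return True


limiter = FixedWindowRateLimiter(limit=2, window_seconds=60)
results = [
    limiter.allow("client-a", now=0),   # first request in the window
    limiter.allow("client-a", now=1),   # second request, still allowed
    limiter.allow("client-a", now=2),   # third request, rejected
    limiter.allow("client-a", now=61),  # new window, allowed again
]
```

Rejecting excess requests before they reach the service is what keeps abusive traffic from triggering additional instances.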

Finally, budget alerts are recommended. These alerts, triggered via email or Pub/Sub, notify users when predicted spending reaches a specified amount or exceeds a percentage of the previous month’s bill. As Mitchell states, “That will help me sleep better at night and worry less about cost overruns.”
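The threshold logic behind such alerts can be sketched in a few lines of Python. This is an illustrative model only, not the Cloud Billing API. The function name, the default thresholds, and the dollar figures are hypothetical; the baseline could be either a fixed amount or last month's bill.

```python
def crossed_thresholds(spend: float, baseline: float,
                       thresholds=(0.5, 0.9, 1.0)) -> list:
    """Return the alert thresholds (as fractions of the baseline) already reached.

    `spend` may be actual or forecast spending; `baseline` is the budget
    amount or the previous month's bill.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = spend / baseline
    return [t for t in thresholds if ratio >= t]


# Hypothetical: forecast spend of $95 against a $100 budget.
alerts = crossed_thresholds(spend=95.0, baseline=100.0)
```

With these sample numbers, the 50% and 90% thresholds have been reached and the 100% threshold has not.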

Optimizing Running Service Costs

While Cloud Run costs may be a small percentage of the total bill for some applications (less than 10% in the speaker’s experience), optimization is still worthwhile. Mitchell details how to analyze and improve Cloud Run costs using the new Cloud Hub & Optimization page in the Google Cloud Console. This page provides a comprehensive view of cost trends across all Google Cloud products.

The Cost and Utilization section breaks down costs, and the View Details and Cost Explorer link leads to a more granular analysis. The example presented shows a customer spending $370 on Cloud Run services and $157 on Cloud Run jobs in the last 30 days, with a 3% increase in Cloud Run costs and a 1% decrease in Cloud Logging costs. Significant cost fluctuations warrant investigation.

The analysis can be further refined by examining vCPU and memory utilization. The example highlights a “Discordbot” service using only 2% of its allocated CPU, suggesting potential savings by reducing CPU allocation. Similarly, an “animated WEBP” service utilizes only 0.5% of its allocated memory, indicating an opportunity to lower memory allocation.
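The same kind of check can be expressed as a small Python sketch that flags services whose utilization falls below a cutoff. The 10% cutoff, the data structure, and the utilization values not quoted above (Discordbot's memory and animated WEBP's CPU) are assumptions for illustration; the console report remains the source of real figures.

```python
from dataclasses import dataclass


@dataclass
class ServiceUsage:
    name: str
    cpu_utilization: float     # fraction of allocated vCPU in use, 0.0 to 1.0
    memory_utilization: float  # fraction of allocated memory in use, 0.0 to 1.0


def underutilized(services, threshold: float = 0.10):
    """Return (service name, resource) pairs whose utilization is below threshold."""
    findings = []
    for svc in services:
        if svc.cpu_utilization < threshold:
            findings.append((svc.name, "cpu"))
        if svc.memory_utilization < threshold:
            findings.append((svc.name, "memory"))
    return findings


services = [
    ServiceUsage("Discordbot", cpu_utilization=0.02, memory_utilization=0.40),
    ServiceUsage("animated WEBP", cpu_utilization=0.35, memory_utilization=0.005),
]
candidates = underutilized(services)
```

Each flagged pair is a candidate for a smaller allocation, subject to checking peak usage rather than averages alone.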

Concurrency & Resource Allocation

Focusing on the service driving the most cost ("API" in the example), Mitchell recommends increasing the concurrency limit. The default is 80, but some services can handle more. Higher concurrency allows each instance to process more requests, reducing the number of instances needed and lowering costs. However, he cautions against setting the limit too high, as overloaded instances can cause performance issues. Because Cloud Run also scales out based on CPU load, this risk is lower for CPU-bound workloads.
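The relationship between concurrency and instance count can be sketched with simple arithmetic in Python. This assumes requests are spread evenly and that concurrency is the only scaling factor, which ignores CPU-based scaling. The traffic figure and the higher limit of 200 are hypothetical.

```python
def instances_needed(concurrent_requests: int, concurrency: int) -> int:
    """Minimum instances required to hold the given number of simultaneous requests."""
    if concurrency < 1:
        raise ValueError("concurrency must be positive")
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests must not be negative")
    # Ceiling division using integers only.
    return -(-concurrent_requests // concurrency)


# Hypothetical steady load of 800 simultaneous requests.
at_default = instances_needed(800, concurrency=80)   # default limit of 80
at_higher = instances_needed(800, concurrency=200)   # assumed higher limit
```

Under these assumptions, raising the limit from 80 to 200 cuts the instance count from 10 to 4, provided each instance can actually sustain the extra load.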

After adjusting concurrency, it’s essential to verify memory utilization. If instances use minimal memory, reducing allocated memory can yield further savings. Mitchell summarizes the ideal scenario: “I guess I want high CPU and high memory utilization.”

Committed Use Discounts

Finally, Compute Flexible Committed Use Discounts offer potential savings for predictable workloads. Users commit to spending a specific amount per hour across Cloud Run, Compute Engine, or Kubernetes Engine, and receive a discounted rate in return. The commitment is not tied to a single region and applies across multiple services, which is what makes it flexible. Automated recommendations for these discounts may be available in the Cloud Console.
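Why commitments suit predictable workloads can be seen in a simplified Python cost model. It assumes the commitment is expressed as on-demand-equivalent dollars per hour, that the committed portion is always billed at the discounted rate whether used or not, and that usage above it is billed on demand. The 25% discount and the dollar amounts are placeholders, not published pricing.

```python
def hourly_cost(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Hourly bill under a simplified spend-based commitment model.

    on_demand_usage: what the hour's usage would cost at on-demand rates
    commitment:      committed on-demand-equivalent spend per hour
    discount:        fractional discount on the committed portion (placeholder)
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    committed_fee = commitment * (1 - discount)         # billed even if unused
    overage = max(0.0, on_demand_usage - commitment)    # billed at on-demand rates
    return committed_fee + overage


# Steady usage above the commitment: $8.00 instead of $10.00 on demand.
busy_hour = hourly_cost(on_demand_usage=10.0, commitment=8.0, discount=0.25)

# Usage well below the commitment: $6.00 instead of $4.00 on demand.
quiet_hour = hourly_cost(on_demand_usage=4.0, commitment=8.0, discount=0.25)
```

The second case shows the risk: when usage regularly falls below the committed amount, the commitment costs more than paying on demand.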

Recap & Key Takeaways

Mitchell concludes with a concise recap:

  1. Prevent billing surprises: Utilize max instances, authentication, Firebase App Check, or Cloud Armor, and set up budget alerts.
  2. Optimize running services: Use the optimization report to identify cost drivers and underutilized services, allocate fewer resources to them, increase concurrency limits, and consider committed use discounts.

The discussion highlights the importance of proactive cost management in Cloud Run, leveraging available tools and features to optimize resource allocation and prevent unexpected expenses. The new Cloud Hub & Optimization page provides a centralized location for cost analysis and optimization recommendations.
