How to scale to 60 million requests (with ZERO ops team) using Cloud Run

By Google Cloud Tech

Key Concepts

  • Cloud Run: A fully managed serverless platform for deploying containerized applications.
  • Feature Flag Service: A service that lets developers turn features on or off at runtime without deploying new code.
  • Multi-stage Docker Build: A method to create small, secure container images by separating the build environment from the final runtime environment.
  • Cloud Armor: A web application firewall (WAF) used to protect applications from malicious traffic and DDoS attacks.
  • Firestore & BigQuery: Scalable, serverless database and analytics services, respectively.
  • Batching: A performance optimization technique that groups multiple operations into a single request to reduce costs and overhead.
  • Goroutines: Lightweight threads managed by the Go runtime, used for concurrent execution.

1. Project Overview: Rocket Flag

JK, a developer and startup founder, built a feature flag service called "Rocket Flag." The service allows developers to roll out new features gradually (e.g., to 1% of users) and perform instant rollbacks without redeploying code. The application successfully scaled to 60 million requests per month.
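The source does not say how Rocket Flag picks which 1% of users see a feature. A common approach, shown here only as an assumption, is deterministic hash bucketing: hash the flag key together with the user ID, map the result to a bucket from 0 to 99, and enable the feature when the bucket is below the rollout percentage. `Flag`, `bucket`, and `IsOn` are hypothetical names, not Rocket Flag's API.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Flag is a hypothetical in-memory representation of a feature flag.
// RolloutPercent is the share of users (0-100) who should see the feature.
type Flag struct {
	Key            string
	Enabled        bool
	RolloutPercent uint32
}

// bucket deterministically maps a (flag, user) pair to a bucket in [0, 100).
// Hashing the flag key with the user ID keeps a user in the same bucket for a
// given flag, while different flags get independent buckets.
func bucket(flagKey, userID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(flagKey))
	h.Write([]byte{0}) // separator so "ab"+"c" and "a"+"bc" hash differently
	h.Write([]byte(userID))
	return h.Sum32() % 100
}

// IsOn reports whether the flag is active for the given user.
// Setting Enabled to false acts as an instant rollback: only the stored
// flag state changes, no code is redeployed.
func IsOn(f Flag, userID string) bool {
	if !f.Enabled {
		return false
	}
	return bucket(f.Key, userID) < f.RolloutPercent
}

func main() {
	f := Flag{Key: "new-checkout", Enabled: true, RolloutPercent: 1}
	on := 0
	for i := 0; i < 10000; i++ {
		if IsOn(f, fmt.Sprintf("user-%d", i)) {
			on++
		}
	}
	fmt.Printf("%d of 10000 users see the feature\n", on)
}
```

Because bucketing is deterministic, raising `RolloutPercent` from 1 to 10 keeps the original 1% of users enabled and adds more, while flipping `Enabled` off rolls everyone back at once.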

2. Architecture and Scaling Strategy

  • Serverless Approach: To avoid the overhead of an operations team and manage costs, JK utilized a serverless architecture.
  • Language Choice: The application was written in Go for its fast execution and quick startup, which keep "cold start" times (the delay while a new instance spins up to serve its first request) minimal, a critical factor for serverless performance.
  • Deployment: The app was deployed across multiple regions to ensure low latency for global users.
  • Security: To prevent exposure of sensitive files, JK used a multi-stage Docker build, copying only the final binary into a minimal scratch container.
  • Traffic Filtering: To handle "garbage traffic" and reduce log noise, JK implemented Cloud Armor. JK defined the valid URLs with regular expressions, so requests that match none of them are blocked before they reach the application or generate logs.
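The URL allowlist idea can be checked locally before it goes into a firewall rule. The Go sketch below only tests the pattern logic; it is not Cloud Armor configuration. In Cloud Armor, a pattern like this would typically appear in a custom rule expression using `request.path.matches()`, with a deny action for non-matching paths. Go's `regexp` package and CEL's `matches()` both use RE2 syntax. The routes in the pattern are hypothetical, since the source does not list Rocket Flag's endpoints.

```go
package main

import (
	"fmt"
	"regexp"
)

// allowedPath is a hypothetical allowlist: the real service's routes are not
// given in the source. The pattern is anchored with ^ and $ so it must match
// the whole path, not just a substring of it.
var allowedPath = regexp.MustCompile(`^/(api/v1/flags/[a-z0-9-]+|healthz)$`)

// isAllowed reports whether a request path is on the allowlist. Anything else,
// such as scanners probing /.env or /wp-login.php, would be rejected.
func isAllowed(path string) bool {
	return allowedPath.MatchString(path)
}

func main() {
	for _, p := range []string{"/api/v1/flags/new-checkout", "/healthz", "/.env", "/wp-login.php"} {
		fmt.Printf("%-28s allowed=%v\n", p, isAllowed(p))
	}
}
```

An allowlist (deny everything not explicitly valid) suits this case better than a blocklist, because scanner traffic probes an open-ended set of paths.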

3. Cost Optimization: The Batching Framework

A significant challenge arose when the application’s database costs (Firestore and BigQuery) spiked due to the high volume of write operations per request.

  • The Problem: Every request triggered a write to Firestore (for state) and BigQuery (for analytics), leading to high costs as traffic grew.
  • The Solution: JK implemented an in-memory batching system.
    • Requests are held in memory.
    • Once per minute, the application performs a single batch write to Firestore and a single batch load job to BigQuery.
  • Implementation: In Go, this was achieved with goroutines that run the batching work in the background while the HTTP handlers keep serving requests. For lower-traffic scenarios, JK suggests using Pub/Sub to queue messages for a worker to process.
  • Result: The cost curve for database writes flattened significantly immediately after deployment.

4. Financial and Operational Impact

  • Total Cost: For December 2025, the total cost to handle 60 million requests was 252 AUD (~180 USD).
  • Total Cost of Ownership (TCO): JK argues that while serverless components have per-request costs, the TCO is lower than traditional VM-based systems because it eliminates the need for an SRE (Site Reliability Engineering) team.
  • Operational Efficiency: The system allows a "team of one" to operate at scale with zero minutes per month spent on manual server maintenance or patching.
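As a rough unit cost, dividing the December 2025 bill by that month's request count gives the figures below. This spreads the entire bill across requests; the source does not break the total down by service.

```latex
\frac{252\ \text{AUD}}{60 \times 10^{6}\ \text{requests}} = 4.2\ \text{AUD per } 10^{6}\ \text{requests}
\qquad
\frac{180\ \text{USD}}{60 \times 10^{6}\ \text{requests}} = 3\ \text{USD per } 10^{6}\ \text{requests}
```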

5. Key Takeaways and Lessons Learned

  • Speed to Market: Cloud Run enables rapid deployment and scaling, allowing developers to launch ideas quickly.
  • Operational Leverage: Offloading infrastructure management to serverless platforms allows small teams to "punch above their weight."
  • The "Expensive Trap": While serverless is affordable, it requires careful architectural design. As traffic scales, developers must monitor their bills and optimize data access patterns (e.g., batching) to avoid unexpected costs.
  • Strategic Planning: Success in serverless is less about managing servers and more about thoughtful architecture and planning.

"Offloading operations allows you to punch way above your weight. A team of one can build, operate, an entire application with lots of traffic." — JK
