Cosmos DB Optimization

Key Concepts

Request Units (RUs): The normalized unit of cost in Cosmos DB, abstracting the CPU, memory, and I/O consumed by a database operation (read, write, or query).
Provisioned Throughput: A model where you reserve capacity (RUs) for your database or container.
Autoscale: A provisioned throughput mode that dynamically scales RUs based on usage (10% to 100% of the set maximum).
Manual Throughput: A fixed-capacity provisioned model best suited to consistent, predictable workloads.
Serverless: A consumption-based model where nothing is provisioned and you pay only for the RUs consumed, billed per million RUs.
Partition Key: A mandatory attribute used to distribute data across logical and physical partitions.
High Cardinality: The requirement for a partition key to have a large number of distinct values to ensure even data distribution.
Physical Partitions: The underlying infrastructure units (up to 10,000 RU/s of throughput and 50GB of storage each) that power Cosmos DB.
Hierarchical Partition Key: A feature allowing up to three levels of keys to manage data growth beyond the 20GB logical partition limit.
Global Secondary Index: A mechanism to create duplicate data sets with different partition keys to optimize query performance and cost.
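
As a rough worked example of the physical partition limits above, the following sketch estimates how many physical partitions a container needs and how much throughput each one receives. The function names are hypothetical, and the formula is only a rule of thumb: Cosmos DB manages the actual partition count internally.

```python
import math

RU_PER_PHYSICAL_PARTITION = 10_000  # RU/s limit per physical partition (from the notes)
GB_PER_PHYSICAL_PARTITION = 50      # storage limit per physical partition (from the notes)


def estimate_physical_partitions(provisioned_rus: float, storage_gb: float) -> int:
    """Rule-of-thumb estimate: whichever limit needs more partitions wins."""
    by_throughput = math.ceil(provisioned_rus / RU_PER_PHYSICAL_PARTITION)
    by_storage = math.ceil(storage_gb / GB_PER_PHYSICAL_PARTITION)
    return max(1, by_throughput, by_storage)


def rus_per_partition(provisioned_rus: float, storage_gb: float) -> float:
    """Provisioned throughput is split evenly across physical partitions."""
    return provisioned_rus / estimate_physical_partitions(provisioned_rus, storage_gb)


# 30,000 RU/s alone would need 3 partitions, but 400GB of data needs 8.
partitions = estimate_physical_partitions(30_000, 400)
share = rus_per_partition(30_000, 400)
print(partitions, share)
```

In this example storage drives the partition count, so each physical partition gets only a fraction of the provisioned RU/s. That is why a single "hot" partition key can be throttled even when the container as a whole looks underused.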

1. Service Models and Cost Optimization

Cosmos DB architecture is hierarchical: Account > Database > Container > Document. Cost optimization depends on selecting the right throughput model:

Manual Provisioned: Best for highly predictable, constant workloads. You pay for the fixed RU amount regardless of usage.
Autoscale (The "Superstar"): The default recommendation for 99% of use cases. It scales across a 10x range (10% to 100% of the configured maximum). It carries a 1.5x price premium over manual throughput, but it is usually cheaper in practice because it scales down during idle periods. The break-even point is about 66% average utilization (1 / 1.5): below that, Autoscale is more cost-effective; above it, manual provisioning is.
Serverless: Ideal for sporadic, bursty, or ad-hoc workloads. It is roughly 7.5x more expensive than provisioned throughput for consistent workloads, making it unsuitable for steady-state traffic.
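
The break-even arithmetic for Autoscale versus manual throughput can be sketched as follows. This is a simplified model, not a pricing calculator: it assumes cost is expressed in abstract units (one unit per RU/s per hour at the manual rate), that each entry in `hourly_utilization` is the peak fraction of the maximum reached in that hour, and that the 1.5x premium and 10% floor from the notes apply. All names are hypothetical.

```python
AUTOSCALE_PREMIUM = 1.5  # autoscale rate relative to manual (from the notes)
AUTOSCALE_FLOOR = 0.10   # autoscale never bills below 10% of the maximum


def manual_cost(max_rus: float, hours: int, unit_price: float = 1.0) -> float:
    """Manual throughput bills the full provisioned amount every hour."""
    return max_rus * hours * unit_price


def autoscale_cost(max_rus: float, hourly_utilization: list[float],
                   unit_price: float = 1.0) -> float:
    """Autoscale bills each hour at the utilized fraction, clamped to [10%, 100%]."""
    total = 0.0
    for utilization in hourly_utilization:
        billed_fraction = max(AUTOSCALE_FLOOR, min(1.0, utilization))
        total += max_rus * billed_fraction * AUTOSCALE_PREMIUM * unit_price
    return total


def break_even_utilization() -> float:
    """Average utilization at which autoscale and manual cost the same."""
    return 1.0 / AUTOSCALE_PREMIUM


# A day with 8 busy hours at full load and 16 idle hours (billed at the 10% floor).
day = [1.0] * 8 + [0.0] * 16
print(round(break_even_utilization(), 3))
print(autoscale_cost(1000, day), manual_cost(1000, 24))
```

For this sample day, average billed utilization is 40%, which is below the break-even point, so the Autoscale total comes out lower than the manual total. A workload pinned at 100% all day would flip the result.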

2. Data Modeling and Partitioning

The efficiency of your RU consumption is directly tied to your data model and partition key selection:

Partition Key Strategy: You must choose a partition key with high cardinality (thousands of distinct values) to ensure data is evenly distributed across physical partitions.
Avoiding Fan-out: Queries should ideally include the partition key. Queries that do not use the partition key result in "fan-out," where the system must search across multiple physical partitions, significantly increasing RU costs.
Document Size: The optimal document size is between 1KB and 10KB. While the limit is 2MB, smaller documents are more efficient for read/write operations.
Bounded vs. Unbounded Data: If data related to an entity is "bounded" (e.g., a few addresses), store it in the same document. If it is "unbounded" (e.g., an infinite list of comments), store it in separate documents using the same partition key to keep them in the same logical partition.

3. Advanced Optimization Techniques

Global Secondary Index: If you frequently query by an attribute other than the primary partition key, creating a global secondary index (a duplicate container synced via Change Feed) can be cheaper than paying for expensive cross-partition fan-out queries.
Hierarchical Partition Keys: Used when the data for a single partition key value would exceed the 20GB logical partition limit. By adding a second or third level to the key, you can further shard the data.
Write-Heavy Workloads: For IoT scenarios where you don't need to query by device ID, using a GUID as the partition key provides maximum cardinality and even distribution, preventing "hot" partitions.
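
To see why a GUID partition key avoids hot partitions under heavy writes, the sketch below compares how writes spread across partitions for a low-cardinality key and for GUIDs. It is a simulation under stated assumptions: the partition count, the device type names, and the md5-based hashing are hypothetical stand-ins, and a seeded generator is used so the run is repeatable.

```python
import hashlib
import random
import uuid
from collections import Counter

PHYSICAL_PARTITIONS = 10  # hypothetical partition count for illustration


def physical_partition(partition_key: str) -> int:
    """Stand-in for the service's internal hashing: md5 of the key, modulo the count."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % PHYSICAL_PARTITIONS


def write_distribution(keys: list[str]) -> Counter:
    """Count how many writes land on each physical partition."""
    return Counter(physical_partition(key) for key in keys)


def hottest_share(counts: Counter) -> float:
    """Fraction of all writes absorbed by the busiest partition."""
    return max(counts.values()) / sum(counts.values())


rng = random.Random(7)  # seeded so the simulation is repeatable
device_types = ["thermostat", "camera", "lock"]  # low cardinality: 3 distinct values

low_cardinality_keys = [rng.choice(device_types) for _ in range(10_000)]
guid_keys = [str(uuid.UUID(int=rng.getrandbits(128), version=4)) for _ in range(10_000)]

low = write_distribution(low_cardinality_keys)
high = write_distribution(guid_keys)

print(len(low), round(hottest_share(low), 2))
print(len(high), round(hottest_share(high), 2))
```

With only three distinct key values, at most three partitions ever receive writes, so the busiest one absorbs at least a third of the traffic. With GUIDs, writes spread across all partitions and each takes roughly an equal share.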

4. Operational Controls

Account Throughput Limit: A safety feature that caps the total throughput (RU/s) that can be provisioned across the account, guarding against an unexpectedly high bill.
Database-level vs. Container-level: While you can set throughput at the database level, it is generally discouraged for production because it leads to "noisy neighbor" issues where containers compete for shared RUs. The "happy path" is to set throughput at the container level.

Synthesis

Cost optimization in Cosmos DB is not about finding the cheapest service, but about aligning the service model with your workload's traffic pattern. Autoscale is the most versatile and cost-efficient choice for most applications. By prioritizing high-cardinality partition keys and designing data models that avoid cross-partition fan-out, developers can significantly reduce RU consumption and maintain a performant, cost-effective architecture.