Azure Cosmos DB: Choosing the Right NoSQL Solution

A practical guide to selecting APIs, modeling data, provisioning throughput, securing, and optimising cost for Azure Cosmos DB in enterprise applications.

What you will learn

Practical execution with concise explanations, real implementation patterns, and production-ready recommendations.

Azure Cosmos DB is a globally distributed, multi-model database service with single-digit millisecond latencies worldwide. Picking the right API, partition key, and throughput model is the difference between a system that scales effortlessly and one that throttles constantly.


API Selection Guide

flowchart TD
    Q1{What data model\ndo you need?}
    Q1 -- Documents / JSON --> Q2
    Q1 -- Key-Value --> API_TABLE[Table API\nSimple KV, Migration from Azure Table]
    Q1 -- Graphs --> API_GREMLIN[Gremlin API\nRelationship traversal]
    Q1 -- Cassandra compat --> API_CASS[Cassandra API\nLift-and-shift from Cassandra]
    Q1 -- MongoDB compat --> API_MONGO[MongoDB API\nMigration from MongoDB]

    Q2{New workload or\nMongoDB migration?}
    Q2 -- New workload --> API_NOSQL[NoSQL API ✅\nRecommended default\nSDK, LINQ, serverless]
    Q2 -- MongoDB migration --> API_MONGO

    style API_NOSQL fill:#d1fae5,stroke:#059669,color:#065f46
    style API_MONGO fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    style API_TABLE fill:#fef3c7,stroke:#f59e0b,color:#78350f
    style API_GREMLIN fill:#ede9fe,stroke:#8b5cf6,color:#4c1d95
    style API_CASS fill:#fee2e2,stroke:#ef4444,color:#7f1d1d

| API | Wire Protocol | Best For | SDK Support |
|---|---|---|---|
| NoSQL (default) | Cosmos native | New apps, rich queries, LINQ | .NET, Java, Python, Node.js, Go |
| MongoDB | MongoDB 4.x | Migration from MongoDB | All MongoDB drivers |
| Cassandra | Cassandra CQL | Migration from Cassandra | Cassandra drivers |
| Gremlin | TinkerPop | Graph traversal, recommendations | TinkerPop clients |
| Table | OData | Simple KV migration from Azure Tables | Table SDK |

Start with the NoSQL API for new workloads. It offers the richest SDK experience, server-side JavaScript, change feed, and the best integration with Azure services.
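
The selection flowchart can be read as a small lookup. The following is a minimal Python sketch of that decision logic only; the function name `choose_api` and the data-model labels are illustrative assumptions, not part of any Azure SDK.

```python
def choose_api(data_model: str, mongodb_migration: bool = False) -> str:
    """Return the API suggested by the selection flowchart.

    data_model is one of: "document", "key-value", "graph",
    "cassandra", "mongodb" (labels chosen for this sketch).
    """
    # Data models that map straight to one API in the flowchart.
    direct = {
        "key-value": "Table",
        "graph": "Gremlin",
        "cassandra": "Cassandra",
        "mongodb": "MongoDB",
    }
    model = data_model.strip().lower()
    if model in direct:
        return direct[model]
    if model == "document":
        # Documents: NoSQL for new workloads, MongoDB for migrations.
        return "MongoDB" if mongodb_migration else "NoSQL"
    raise ValueError(f"unknown data model: {data_model!r}")


print(choose_api("document"))                          # new JSON workload
print(choose_api("document", mongodb_migration=True))  # existing MongoDB app
```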


Consistency Levels

flowchart LR
    subgraph Levels["← Stronger consistency ——————————— Better performance →"]
        direction LR
        S[Strong]
        BS[Bounded\nStaleness]
        SE[Session\n★ Default]
        CP[Consistent\nPrefix]
        EV[Eventual]
    end

    S -.->|Higher latency\nHigher RU cost| EV
    EV -.->|Lower latency\nLower RU cost| S

    style SE fill:#d1fae5,stroke:#059669,color:#065f46
    style S fill:#fee2e2,stroke:#ef4444,color:#7f1d1d
    style EV fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
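
| Level | Guarantees | Read RU Cost (vs Session) | Use When |
|---|---|---|---|
| Strong | Linearisable reads | ~2× | Financial transactions requiring absolute accuracy |
| Bounded Staleness | Lag bounded by K ops or T time | ~2× | Global apps needing near-strong |
| Session | Consistent within a session | 1× (baseline) | Most OLTP apps (read your own writes) |
| Consistent Prefix | No out-of-order reads | 1× | Order matters but lag is acceptable |
| Eventual | No ordering guarantee; replicas converge over time | 1× | Analytics, counters, cache warm-up |

Strong and Bounded Staleness reads are served from two replicas instead of one, which is why they cost roughly twice the RUs of the other three levels.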
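
One way to use the table is to pick the weakest level that still meets the application's requirements. The sketch below is a simplified illustration of that rule of thumb; the function and parameter names are assumptions made for this example, and the returned strings match the values accepted by `--default-consistency-level`.

```python
def pick_consistency(
    linearizable: bool = False,
    bounded_lag: bool = False,
    read_your_writes: bool = False,
    ordered_reads: bool = False,
) -> str:
    """Return the weakest consistency level that satisfies the requirements."""
    # Check the strongest requirement first; each level also provides
    # the guarantees of the levels below it.
    if linearizable:
        return "Strong"
    if bounded_lag:
        return "BoundedStaleness"
    if read_your_writes:
        return "Session"
    if ordered_reads:
        return "ConsistentPrefix"
    return "Eventual"


print(pick_consistency(read_your_writes=True))  # typical OLTP app
```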

Partition Key Design

flowchart TD
    subgraph Good["✅ Good Partition Keys"]
        G1[tenantId — SaaS apps]
        G2[userId — User data]
        G3[orderId — Orders]
        G4[deviceId — IoT]
    end

    subgraph Bad["❌ Anti-Patterns"]
        B1[status — low cardinality\nhot partition]
        B2[createdDate — range\nhot partition today]
        B3[type — only a few values]
    end

    subgraph Synthetic["🔧 Synthetic Keys\nfor tricky cases"]
        S1["userId + categoryId\n→ 'u123_electronics'"]
        S2["tenantId + month\n→ 't456_2025-04'"]
    end

    style Good fill:#d1fae5,stroke:#059669,color:#065f46
    style Bad fill:#fee2e2,stroke:#ef4444,color:#7f1d1d
    style Synthetic fill:#fef3c7,stroke:#f59e0b,color:#78350f

Rules for choosing a partition key:

  1. High cardinality: thousands or millions of unique values
  2. Even distribution: storage and requests spread across partitions
  3. Matches your access pattern: most queries can filter on it
  4. Avoid hotspots: no single key value should dominate traffic or storage (a logical partition is capped at 20 GB)
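
The rules above can be checked against a sample of real data before committing to a key. The following Python sketch builds synthetic keys like the ones in the diagram and measures how much of a sample falls under the most common key value; `synthetic_key` and `largest_share` are hypothetical helper names, and the sample data is made up for illustration.

```python
from collections import Counter


def synthetic_key(*parts: str, sep: str = "_") -> str:
    """Join several properties into one partition key value."""
    return sep.join(parts)


def largest_share(keys) -> float:
    """Fraction of items that land on the most common key value.

    A value close to 1.0 points to a hot partition; a value close to
    1 / (number of distinct keys) points to an even spread.
    """
    counts = Counter(keys)
    if not counts:
        return 0.0
    return max(counts.values()) / sum(counts.values())


print(synthetic_key("u123", "electronics"))               # u123_electronics
print(largest_share(["Pending"] * 8 + ["Shipped"] * 2))   # low-cardinality status key
print(largest_share([f"u{i}" for i in range(100)]))       # high-cardinality userId key
```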

Step 1: Create a Cosmos DB Account

RESOURCE_GROUP="rg-data-platform"
ACCOUNT_NAME="acct-biz-global"

az group create --name $RESOURCE_GROUP --location eastus

# Multi-region account with Session consistency
az cosmosdb create \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --locations regionName=eastus failoverPriority=0 isZoneRedundant=false \
  --locations regionName=westeurope failoverPriority=1 isZoneRedundant=false \
  --default-consistency-level Session

# Create database and container
az cosmosdb sql database create \
  --account-name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --name ordersdb

az cosmosdb sql container create \
  --account-name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --database-name ordersdb \
  --name orders \
  --partition-key-path /partitionKey \
  --throughput 400

Sample Document Model

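{
  "id": "ord-8847",
  "partitionKey": "tenant-acme",
  "customerId": "cust-123",
  "status": "Pending",
  "lineItems": [
    { "sku": "A100", "qty": 2, "price": 49.95 },
    { "sku": "B200", "qty": 1, "price": 49.60 }
  ],
  "total": 149.50,
  "ttl": 604800
}

The per-item expiry property is named ttl (in seconds; 604800 is seven days). It only takes effect when time-to-live is enabled on the container.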

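Partition key path: /partitionKey, holding the tenant ID (here tenant-acme). Tenant IDs give a natural access boundary and, when there are many tenants of similar size, high cardinality and predictable distribution. A single very large tenant can still become a hot partition, which is where the synthetic keys shown earlier help.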
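
Denormalised fields such as total can drift from the line items they summarise, so it can be worth validating a document before writing it. The sketch below is one possible check, written under the assumption that the document has the shape of the sample above; `line_item_total` is a hypothetical helper, and Decimal is used to avoid binary floating-point rounding on currency values.

```python
from decimal import Decimal


def line_item_total(doc: dict) -> Decimal:
    """Sum qty * price over the document's line items using exact decimals."""
    return sum(
        (Decimal(str(item["price"])) * item["qty"] for item in doc["lineItems"]),
        Decimal("0"),
    )


order = {
    "id": "ord-8847",
    "partitionKey": "tenant-acme",
    "lineItems": [
        {"sku": "A100", "qty": 2, "price": 49.95},
        {"sku": "B200", "qty": 1, "price": 49.60},
    ],
    "total": 149.50,
}

# Reject the write if the stored total disagrees with the line items.
if line_item_total(order) != Decimal(str(order["total"])):
    raise ValueError("order total does not match line items")
```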


Step 2: Indexing Policy

By default Cosmos DB indexes every field. For large arrays or rarely queried sub-objects, explicitly exclude them to reduce write RU costs:

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [{ "path": "/*" }],
  "excludedPaths": [
    { "path": "/lineItems/*" },
    { "path": "/\"_etag\"/?" }
  ]
}

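Update via .NET SDK. CreateContainerIfNotExistsAsync leaves the policy of an existing container unchanged, so for the container created in Step 1, read its properties and replace them:

// Assumes `database` is the Microsoft.Azure.Cosmos.Database for "ordersdb".
Container container = database.GetContainer("orders");

// Start from the current properties so the partition key and other settings are preserved.
ContainerProperties props = (await container.ReadContainerAsync()).Resource;
props.IndexingPolicy = new IndexingPolicy
{
    Automatic = true,
    IndexingMode = IndexingMode.Consistent,
    IncludedPaths = { new IncludedPath { Path = "/*" } },
    ExcludedPaths =
    {
        new ExcludedPath { Path = "/lineItems/*" },
        new ExcludedPath { Path = "/\"_etag\"/?" }
    }
};
await container.ReplaceContainerAsync(props);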
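
When several containers share the same indexing conventions, the policy document can be generated rather than hand-edited. The following Python sketch produces the JSON shown earlier in this step from a list of excluded paths; `indexing_policy` is a hypothetical helper written for this example and does not call any Azure API.

```python
import json


def indexing_policy(excluded: list[str]) -> dict:
    """Build an indexing policy that indexes everything except `excluded`."""
    paths = [{"path": p} for p in excluded]
    # The sample policy in this step also excludes the system _etag property.
    etag = {"path": '/"_etag"/?'}
    if etag not in paths:
        paths.append(etag)
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": paths,
    }


policy = indexing_policy(["/lineItems/*"])
print(json.dumps(policy, indent=2))
```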

Step 3: CRUD and Bulk Operations (.NET)

using System.Linq;
using Microsoft.Azure.Cosmos;

// Create one CosmosClient per application and reuse it.
// Bulk mode batches operations for throughput at the cost of per-operation latency,
// so consider a separate client without it for latency-sensitive point operations.
var client = new CosmosClient(endpoint, key, new CosmosClientOptions
{
    AllowBulkExecution = true   // Enable for bulk inserts
});
var container = client.GetContainer("ordersdb", "orders");

// Create
await container.CreateItemAsync(order, new PartitionKey(order.partitionKey));

// Point read by id + partition key (about 1 RU for a 1 KB item, fastest path)
var response = await container.ReadItemAsync<Order>(
    order.id, new PartitionKey(order.partitionKey));

// Query with projection (lower RU than SELECT *)
var query = new QueryDefinition(
    "SELECT c.id, c.status, c.total FROM c " +
    "WHERE c.partitionKey = @tenant AND c.status = @status")
    .WithParameter("@tenant", tenantId)
    .WithParameter("@status", "Pending");

// Setting PartitionKey keeps the query inside a single logical partition.
var iterator = container.GetItemQueryIterator<OrderSummary>(
    query,
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = new PartitionKey(tenantId)
    });

double totalCharge = 0;
while (iterator.HasMoreResults)
{
    var page = await iterator.ReadNextAsync();
    totalCharge += page.RequestCharge;   // RUs consumed by this page
}

// Bulk insert: start all operations, then await them together so the SDK can batch them
var tasks = ordersToInsert.Select(o =>
    container.CreateItemAsync(o, new PartitionKey(o.partitionKey)));
await Task.WhenAll(tasks);

Throughput Models

flowchart LR
    subgraph Manual["Manual / Provisioned"]
        M_F[Fixed RU/s\ne.g. 400 RU always]
        M_F --> M_USE[Predictable cost\nSteady traffic]
    end

    subgraph Auto["Autoscale"]
        A_MIN[Min 10%\nof max RU/s]
        A_MAX[Max RU/s\nyou define]
        A_MIN <-->|Scales automatically| A_MAX
        A_MAX --> A_USE[Spiky workloads\nNo manual tuning]
    end

    subgraph Serverless["Serverless"]
        SL[Pay per\nrequest only]
        SL --> SL_USE[Dev / test\nUnpredictable low volume]
    end

    style Manual fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    style Auto fill:#d1fae5,stroke:#059669,color:#065f46
    style Serverless fill:#fef3c7,stroke:#f59e0b,color:#78350f

| Model | Cost Pattern | Scale Behaviour | Best For |
|---|---|---|---|
| Provisioned (Manual) | Fixed per hour | Fixed, no auto-scale | Steady, predictable workloads |
| Autoscale | Min 10% of max RU/s, up to max | Scales between 10% and 100% of max on demand | Spiky / variable workloads |
| Serverless | Per-request billing | No provisioned capacity | Dev/test, low or sporadic volume |
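
To compare manual and autoscale for a given traffic shape, it helps to work out what each hour would be billed at. The sketch below models autoscale as billing the hour's peak RU/s, clamped between 10% of the configured maximum and the maximum itself. The function names are hypothetical and the unit prices in the usage lines are placeholders chosen for illustration, not published rates; substitute the figures from the current pricing page.

```python
def autoscale_billed_rus(max_rus: int, peak_rus: float) -> float:
    """RU/s billed for one hour: the hour's peak, clamped to [10% of max, max]."""
    floor = max_rus / 10
    return max(floor, min(peak_rus, max_rus))


def hourly_cost(billed_rus: float, price_per_100_rus_hour: float) -> float:
    """Cost of one hour at a given price per 100 RU/s."""
    return billed_rus / 100 * price_per_100_rus_hour


# Placeholder prices (assumptions for this example only).
MANUAL_PRICE = 0.008
AUTOSCALE_PRICE = 0.012

quiet_hour = hourly_cost(autoscale_billed_rus(4000, peak_rus=150), AUTOSCALE_PRICE)
busy_hour = hourly_cost(autoscale_billed_rus(4000, peak_rus=4000), AUTOSCALE_PRICE)
manual_hour = hourly_cost(4000, MANUAL_PRICE)
print(quiet_hour, busy_hour, manual_hour)
```

With these placeholder prices, autoscale is cheaper than manual throughput in quiet hours and more expensive in hours that run at the maximum, so the break-even depends on how much of the day is spent near peak.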

Cost Optimisation Strategies

| Strategy | RU Impact | How To |
|---|---|---|
| Autoscale | Efficient burst handling | Set max RU/s, let Cosmos DB scale |
| Analytical Store | Removes RU overhead from analytics | Enable per container |
| TTL (Time-to-live) | Frees storage + index | Enable TTL on the container; set ttl on items to override |
| Excluded paths | Lowers write RU | Skip indexing for large unused arrays |
| Caching (Redis) | Reduces read RU dramatically | Cache hot items by id + partitionKey |
| Bulk execution | Reduces per-call overhead | Enable AllowBulkExecution in SDK |
| Query projection | Lower response/RU | SELECT c.id, c.total not SELECT * |
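
The caching row can be sketched as a cache-aside reader keyed by partition key and id. This is a minimal in-memory illustration: a dict stands in for Redis, `read_item` is a hypothetical callable that performs the real point read, and expiry is left out for brevity.

```python
class CachedReader:
    """Cache-aside wrapper: serve repeat reads from memory, not the database."""

    def __init__(self, read_item):
        # read_item(item_id, partition_key) -> dict is supplied by the caller
        # (hypothetical; in practice it would wrap an SDK point read).
        self._read_item = read_item
        self._cache = {}
        self.backend_reads = 0

    def get(self, item_id: str, partition_key: str) -> dict:
        key = (partition_key, item_id)
        if key not in self._cache:
            # Cache miss: pay the RUs once, then remember the result.
            self.backend_reads += 1
            self._cache[key] = self._read_item(item_id, partition_key)
        return self._cache[key]

    def invalidate(self, item_id: str, partition_key: str) -> None:
        """Call after a write so the next read fetches fresh data."""
        self._cache.pop((partition_key, item_id), None)


store = {("tenant-acme", "ord-8847"): {"id": "ord-8847", "status": "Pending"}}
reader = CachedReader(lambda item_id, pk: store[(pk, item_id)])
reader.get("ord-8847", "tenant-acme")
reader.get("ord-8847", "tenant-acme")
print(reader.backend_reads)  # only the first call reached the backend
```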

Security

# Use Managed Identity — no keys in application code
az cosmosdb sql role assignment create \
  --account-name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --role-definition-name "Cosmos DB Built-in Data Contributor" \
  --principal-id <managed-identity-object-id> \
  --scope "/"

# Private endpoint (adds a private path; public network access must also be disabled on the account to block public traffic)
az network private-endpoint create \
  --name pe-cosmos \
  --resource-group $RESOURCE_GROUP \
  --vnet-name vnet-app \
  --subnet snet-data \
  --private-connection-resource-id $(az cosmosdb show -n $ACCOUNT_NAME -g $RESOURCE_GROUP --query id -o tsv) \
  --group-id Sql \
  --connection-name cosmos-private-connection

Security checklist:

| Control | Recommendation |
|---|---|
| Authentication | Managed Identity via DefaultAzureCredential; no connection strings |
| Network | Private Endpoints + deny public access for production |
| Encryption | CMK (Customer Managed Keys) for regulated workloads |
| Least privilege | Assign Cosmos DB Built-in Data Reader to read-only identities |
| Auditing | Enable diagnostic logs → Log Analytics |

Monitoring and Alerts

SUB=$(az account show --query id -o tsv)

# Alert when total RU consumption exceeds 50,000 in a 5-minute window
az monitor metrics alert create \
  --name cosmos-ru-burst \
  --resource-group $RESOURCE_GROUP \
  --scopes "/subscriptions/$SUB/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.DocumentDB/databaseAccounts/$ACCOUNT_NAME" \
  --condition "total TotalRequestUnits > 50000" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action "/subscriptions/$SUB/resourceGroups/$RESOURCE_GROUP/providers/microsoft.insights/actionGroups/ag-oncall"

Key metrics to monitor:

| Metric | What to Watch | Alert Threshold |
|---|---|---|
| TotalRequestUnits | Overall RU consumption | > 80% of provisioned |
| NormalizedRUConsumption | Per-partition hotspots | Any partition > 90% |
| TotalRequests (filtered to StatusCode 429) | Throttled request rate | > 0 in production |
| ServerSideLatency | Data plane p99 | > 10 ms |
| ServiceAvailability | SLA compliance | < 99.99% |
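
The reason to watch normalized consumption alongside the total is that provisioned throughput is divided evenly across physical partitions, so one busy partition can throttle while the container as a whole looks idle. The sketch below illustrates the arithmetic with made-up numbers; the function names are hypothetical and this is a simplified model of the metric, not how Azure Monitor computes it internally.

```python
def normalized_ru_percent(consumed_per_partition: list[float], provisioned_rus: float) -> float:
    """Utilisation of the busiest partition, as a percentage of its share."""
    # Each physical partition gets an equal slice of the provisioned RU/s.
    per_partition_budget = provisioned_rus / len(consumed_per_partition)
    return max(consumed_per_partition) * 100 / per_partition_budget


def total_ru_percent(consumed_per_partition: list[float], provisioned_rus: float) -> float:
    """Utilisation of the container as a whole, as a percentage."""
    return sum(consumed_per_partition) * 100 / provisioned_rus


# 400 RU/s over four partitions: each partition can use 100 RU/s.
usage = [95, 5, 10, 10]
print(total_ru_percent(usage, 400))       # looks healthy overall
print(normalized_ru_percent(usage, 400))  # one partition is close to throttling
```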

Troubleshooting

| Symptom | Root Cause | Fix |
|---|---|---|
| High RU per query | SELECT * or full partition scans | Add projections; filter on the partition key and indexed properties |
| Hot partition (partition > 90% RU) | Low-cardinality partition key | Redesign key; add synthetic key |
| Frequent 429 throttling | RU under-provisioned or a hot partition | Increase RU or enable autoscale; check per-partition usage |
| Latency increase globally | Strong consistency across distant regions | Downgrade to Session where safe |
| High storage cost | Orphaned / stale documents | Enable TTL; archive data that must be kept |
| SDK timeout | Network / large document | Enable retry policy; split large docs |
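
The Cosmos DB SDKs already retry throttled requests up to a configurable limit, but the same idea applies to any code path that has to handle a 429 itself. The following Python sketch shows one common shape for such a retry policy: wait at least as long as the server asked, back off exponentially, and add a little jitter. `ThrottledError` and `with_retries` are hypothetical names for this example, not SDK types.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""

    def __init__(self, retry_after_s: float = 0.0):
        super().__init__("429 Too Many Requests")
        self.retry_after_s = retry_after_s


def with_retries(operation, max_attempts: int = 5, base_delay_s: float = 0.1, sleep=time.sleep):
    """Run `operation`, retrying on ThrottledError with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the throttling to the caller
            backoff = base_delay_s * 2 ** (attempt - 1)
            # Honour the server's hint, and add jitter so clients do not retry in lockstep.
            delay = max(err.retry_after_s, backoff) + random.uniform(0, base_delay_s)
            sleep(delay)
```

The `sleep` parameter is injectable so the policy can be exercised in tests without real waiting.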

Key Takeaways

  • Start with the NoSQL API for new workloads: it has the richest SDK and query support
  • Session consistency is the right default for most OLTP applications
  • Partition key choice is the most critical design decision: aim for high cardinality and even distribution
  • Autoscale handles burst traffic without manual intervention
  • Use Managed Identity and keep connection strings out of application code
  • Excluded indexing paths and query projections are the fastest way to cut RU costs

What partition key patterns have worked best for your Cosmos DB workloads? Share your data modelling experience below.