Cloud-Native Application Platform: AKS + Dapr + KEDA + Azure Service Bus

Build a production-grade cloud-native application platform on Azure Kubernetes Service using Dapr for microservice communication, KEDA for event-driven autoscaling, and Azure Service Bus for reliable messaging — complete with observability, CI/CD, and GitOps.

What you will learn

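How to provision AKS with Bicep, install and configure Dapr, autoscale with KEDA, handle Azure Service Bus dead letters, deploy through Flux, and collect telemetry with OpenTelemetry, with production-oriented recommendations for each phase.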

Introduction: Cloud-Native Done Right

Building microservices is easy. Building production-grade microservices that are resilient, observable, secure, and cost-efficient is hard. This deep dive assembles a complete cloud-native platform on AKS, using Dapr to abstract away infrastructure complexity, KEDA for intelligent autoscaling, Azure Service Bus for reliable async messaging, and Flux for GitOps-driven deployments. The result is a platform that development teams can build on without worrying about the underlying distributed systems challenges.

Prerequisites

  • Azure subscription with Contributor access
  • Azure CLI 2.50+ with aks-preview extension
  • kubectl and Helm 3.x installed
  • Familiarity with Kubernetes, Docker, and microservice patterns
  • Azure DevOps or GitHub for CI/CD pipelines

Phase 1: AKS Cluster Provisioning

Production-Grade Cluster with Bicep

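param location string = resourceGroup().location
param clusterName string = 'aks-cloudnative-prod'

// Existing Log Analytics workspace used by the omsagent add-on below
// (assumed to live in the same resource group as the cluster)
param logAnalyticsWorkspaceName string

resource logAnalytics 'Microsoft.OperationalInsights/workspaces@2022-10-01' existing = {
  name: logAnalyticsWorkspaceName
}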

resource aksCluster 'Microsoft.ContainerService/managedClusters@2024-01-01' = {
  name: clusterName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    kubernetesVersion: '1.29'
    dnsPrefix: '${clusterName}-dns'
    enableRBAC: true
    aadProfile: {
      managed: true
      enableAzureRBAC: true
      // Placeholder: replace with the object ID (GUID) of your Microsoft Entra admin group
      adminGroupObjectIDs: ['admin-group-id']
    }
    networkProfile: {
      networkPlugin: 'azure'
      networkPolicy: 'calico'
      serviceCidr: '10.0.0.0/16'
      dnsServiceIP: '10.0.0.10'
      loadBalancerSku: 'standard'
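      // userDefinedRouting requires a bring-your-own subnet (vnetSubnetID on each pool) whose
      // route table sends egress to a firewall or NVA; use 'loadBalancer' with an AKS-managed VNet
      outboundType: 'userDefinedRouting'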
    }
    agentPoolProfiles: [
      {
        name: 'system'
        count: 3
        vmSize: 'Standard_D4s_v5'
        osType: 'Linux'
        mode: 'System'
        availabilityZones: ['1', '2', '3']
        enableAutoScaling: true
        minCount: 3
        maxCount: 5
        nodeTaints: ['CriticalAddonsOnly=true:NoSchedule']
      }
      {
        name: 'apppool'
        count: 3
        vmSize: 'Standard_D8s_v5'
        osType: 'Linux'
        mode: 'User'
        availabilityZones: ['1', '2', '3']
        enableAutoScaling: true
        minCount: 3
        maxCount: 20
        nodeLabels: {
          workload: 'application'
        }
      }
      {
        name: 'gpupool'
        count: 0
        vmSize: 'Standard_NC6s_v3'
        osType: 'Linux'
        mode: 'User'
        enableAutoScaling: true
        minCount: 0
        maxCount: 4
        nodeLabels: {
          workload: 'gpu'
        }
        nodeTaints: ['nvidia.com/gpu=true:NoSchedule']
      }
    ]
    autoUpgradeProfile: {
      upgradeChannel: 'stable'
      nodeOSUpgradeChannel: 'NodeImage'
    }
    oidcIssuerProfile: { enabled: true }
    securityProfile: {
      workloadIdentity: { enabled: true }
      defender: { securityMonitoring: { enabled: true } }
      imageCleaner: { enabled: true, intervalHours: 48 }
    }
    addonProfiles: {
      omsagent: {
        enabled: true
        config: { logAnalyticsWorkspaceResourceID: logAnalytics.id }
      }
      azurepolicy: { enabled: true }
      azureKeyvaultSecretsProvider: {
        enabled: true
        config: { enableSecretRotation: 'true', rotationPollInterval: '2m' }
      }
    }
  }
}
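
The networkProfile above pairs a serviceCidr with a dnsServiceIP. AKS expects the DNS service IP to fall inside the service CIDR and not to be the first address in the range, which is used by the Kubernetes API service. A minimal pre-deployment check could look like the following sketch; the function name is hypothetical and not part of any Azure tooling.

```python
import ipaddress

def dns_ip_is_valid(service_cidr: str, dns_service_ip: str) -> bool:
    """Return True if dns_service_ip lies inside service_cidr and is neither
    the network address nor the first usable address (used by the API service)."""
    network = ipaddress.ip_network(service_cidr)
    ip = ipaddress.ip_address(dns_service_ip)
    first_usable = network.network_address + 1
    return ip in network and ip not in (network.network_address, first_usable)

# Values taken from the Bicep template above
print(dns_ip_is_valid("10.0.0.0/16", "10.0.0.10"))  # True
```

Running a check like this in the pipeline before the Bicep deployment catches a mismatched pair early, rather than at cluster creation time.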

Phase 2: Dapr Runtime for Microservice Abstractions

Installing and Configuring Dapr

# Install Dapr on AKS in HA mode (PowerShell syntax; in bash, replace the trailing backticks with backslashes)
helm repo add dapr https://dapr.github.io/helm-charts/
helm repo update
helm install dapr dapr/dapr `
    --namespace dapr-system `
    --create-namespace `
    --set global.ha.enabled=true `
    --set dapr_placement.replicaCount=3 `
    --set dapr_sentry.replicaCount=3 `
    --set dapr_operator.replicaCount=3 `
    --set global.mtls.enabled=true `
    --set global.logAsJson=true `
    --version 1.13

Dapr Components for Azure Service Bus

# dapr-components/servicebus-pubsub.yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: order-pubsub
  namespace: production
spec:
  type: pubsub.azure.servicebus.topics
  version: v1
  metadata:
    - name: connectionString
      secretKeyRef:
        name: servicebus-secret
        key: connectionString
    - name: maxActiveMessages
      value: "100"
    - name: maxConcurrentHandlers
      value: "10"
    - name: lockRenewalInSec
      value: "60"
    - name: maxConnectionRecoveryInSec
      value: "300"
    - name: publishMaxRetries
      value: "5"
    - name: publishInitialRetryIntervalInMs
      value: "500"
# auth is a top-level field of the Component, a sibling of spec
auth:
  secretStore: azure-keyvault
---
# dapr-components/statestore.yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: order-state
  namespace: production
spec:
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: "https://cloudnative-cosmos.documents.azure.com:443/"
    - name: masterKey
      secretKeyRef:
        name: cosmos-secret
        key: masterKey
    - name: database
      value: "orders"
    - name: collection
      value: "state"
    - name: actorStateStore
      value: "true"
# auth is a top-level field of the Component, a sibling of spec
auth:
  secretStore: azure-keyvault
---
# dapr-components/bindings-blob.yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: document-storage
  namespace: production
spec:
  type: bindings.azure.blobstorage
  version: v1
  metadata:
    - name: accountName
      value: "cloudnativedocs"
    - name: accountKey
      secretKeyRef:
        name: storage-secret
        key: accountKey
    - name: containerName
      value: "documents"
# auth is a top-level field of the Component, a sibling of spec
auth:
  secretStore: azure-keyvault

Microservice with Dapr SDK

// OrderService/Program.cs - .NET 8 microservice using Dapr
using Dapr.Client;
using Dapr.AspNetCore;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddDaprClient();
builder.Services.AddControllers().AddDapr();
builder.Services.AddHealthChecks();

var app = builder.Build();

app.UseCloudEvents();
app.MapSubscribeHandler();
app.MapControllers();
app.MapHealthChecks("/healthz");

app.Run();

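// Controllers/OrderController.cs
// Order, CreateOrderRequest, and the event types are application models not shown here.
using Dapr;
using Dapr.Client;
using Microsoft.AspNetCore.Mvc;
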
[ApiController]
[Route("api/orders")]
public class OrderController : ControllerBase
{
    private readonly DaprClient _daprClient;
    private readonly ILogger<OrderController> _logger;

    public OrderController(DaprClient daprClient, ILogger<OrderController> logger)
    {
        _daprClient = daprClient;
        _logger = logger;
    }

    [HttpPost]
    public async Task<IActionResult> CreateOrder([FromBody] CreateOrderRequest request)
    {
        var order = new Order
        {
            Id = Guid.NewGuid().ToString(),
            CustomerId = request.CustomerId,
            Items = request.Items,
            Status = OrderStatus.Created,
            CreatedAt = DateTime.UtcNow,
            TotalAmount = request.Items.Sum(i => i.Price * i.Quantity)
        };

        // Save state to Cosmos DB via Dapr state store
        await _daprClient.SaveStateAsync("order-state", order.Id, order);

        // Publish order created event via Dapr pub/sub (Service Bus)
        await _daprClient.PublishEventAsync("order-pubsub", "orders", new OrderCreatedEvent
        {
            OrderId = order.Id,
            CustomerId = order.CustomerId,
            TotalAmount = order.TotalAmount,
            CreatedAt = order.CreatedAt
        });

        _logger.LogInformation("Order {OrderId} created for customer {CustomerId}", order.Id, order.CustomerId);
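        return CreatedAtAction(nameof(GetOrder), new { id = order.Id }, order);
    }

    // Read endpoint referenced by CreatedAtAction above
    [HttpGet("{id}")]
    public async Task<IActionResult> GetOrder(string id)
    {
        var order = await _daprClient.GetStateAsync<Order>("order-state", id);
        return order is null ? NotFound() : Ok(order);
    }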

    [Topic("order-pubsub", "payments")]
    [HttpPost("payment-completed")]
    public async Task<IActionResult> HandlePaymentCompleted([FromBody] PaymentCompletedEvent evt)
    {
        var order = await _daprClient.GetStateAsync<Order>("order-state", evt.OrderId);
        if (order == null) return NotFound();

        order.Status = OrderStatus.Paid;
        order.PaidAt = evt.PaidAt;

        await _daprClient.SaveStateAsync("order-state", order.Id, order);

        // Invoke shipping service via Dapr service invocation
        await _daprClient.InvokeMethodAsync(HttpMethod.Post, "shipping-service", "api/shipments", new
        {
            OrderId = order.Id,
            Address = order.ShippingAddress,
            Items = order.Items
        });

        return Ok();
    }
}
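
The SDK calls above are thin wrappers over the Dapr sidecar's HTTP API, which is what makes the building blocks language-agnostic. As a rough sketch, the following builds the requests that correspond to PublishEventAsync and SaveStateAsync without sending them. It assumes the sidecar listens on Dapr's default HTTP port 3500; the helper names are hypothetical.

```python
import json

DAPR_HTTP_PORT = 3500  # Dapr's default sidecar HTTP port; assumed unchanged

def publish_request(pubsub: str, topic: str, event: dict) -> tuple[str, bytes]:
    """Build the URL and JSON body for a Dapr publish call."""
    url = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/publish/{pubsub}/{topic}"
    return url, json.dumps(event).encode("utf-8")

def save_state_request(store: str, key: str, value: dict) -> tuple[str, bytes]:
    """Build the URL and JSON body for a Dapr save-state call.
    The state API accepts a list of key/value items."""
    url = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/state/{store}"
    return url, json.dumps([{"key": key, "value": value}]).encode("utf-8")

url, body = publish_request("order-pubsub", "orders", {"OrderId": "o-1"})
print(url)  # http://localhost:3500/v1.0/publish/order-pubsub/orders
```

Both requests are sent as HTTP POST with a JSON content type. Because the component names (order-pubsub, order-state) are the only coupling to infrastructure, swapping Service Bus or Cosmos DB for another backend is a component change rather than a code change.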

Phase 3: KEDA Event-Driven Autoscaling

Scaling Based on Azure Service Bus Queue Depth

# keda/order-processor-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor
  pollingInterval: 15
  cooldownPeriod: 60
  minReplicaCount: 1
  maxReplicaCount: 30
  fallback:
    failureThreshold: 3
    replicas: 5
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: order-processing
        namespace: cloudnative-bus
        messageCount: "10"
        activationMessageCount: "1"
        connectionFromEnv: SERVICEBUS_CONNECTION
    - type: cron
      metadata:
        timezone: "America/New_York"
        start: "0 8 * * 1-5"
        end: "0 20 * * 1-5"
        desiredReplicas: "5"
---
# Scale based on HTTP traffic using Prometheus metrics
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-gateway-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-gateway
  pollingInterval: 10
  cooldownPeriod: 120
  minReplicaCount: 3
  maxReplicaCount: 50
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server.monitoring:9090
        # metricName is deprecated since KEDA 2.10 and can be omitted on current versions
        metricName: http_requests_per_second
        query: |
          sum(rate(http_requests_total{app="api-gateway"}[2m]))
        threshold: "200"
        activationThreshold: "50"
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"
    - type: memory
      metricType: Utilization
      metadata:
        value: "80"
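
To reason about the order-processor ScaledObject, it helps to see how a queue length turns into a replica count. The sketch below is a simplification under stated assumptions: each trigger proposes a count, the highest proposal wins, and the result is clamped to the configured bounds. It ignores HPA tolerance, stabilization windows, and polling delays, and the function name is hypothetical.

```python
import math

def desired_replicas(queue_length: int, messages_per_replica: int,
                     min_replicas: int, max_replicas: int,
                     cron_replicas: int = 0) -> int:
    """Approximate the replica count for a queue trigger plus an optional cron trigger.

    The queue trigger proposes ceil(queue_length / messages_per_replica);
    the highest proposal wins and is clamped to [min_replicas, max_replicas].
    """
    from_queue = math.ceil(queue_length / messages_per_replica)
    proposal = max(from_queue, cron_replicas)
    return max(min_replicas, min(max_replicas, proposal))

# Values from the ScaledObject: messageCount=10, min=1, max=30, cron desiredReplicas=5
print(desired_replicas(250, 10, 1, 30))                  # 25
print(desired_replicas(20, 10, 1, 30, cron_replicas=5))  # 5
print(desired_replicas(1000, 10, 1, 30))                 # 30
```

The second case shows why the cron trigger is useful: during business hours the deployment holds at five replicas even when the queue alone would justify only two.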

Phase 4: Azure Service Bus Patterns

Dead Letter Processing and Retry Logic

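using Azure.Messaging.ServiceBus;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// IsRetryable and ParkMessageAsync are application-specific helpers that are not shown here.
public class ServiceBusDeadLetterProcessor : BackgroundService
{
    private readonly ServiceBusClient _client;
    private readonly ILogger<ServiceBusDeadLetterProcessor> _logger;

    public ServiceBusDeadLetterProcessor(ServiceBusClient client, ILogger<ServiceBusDeadLetterProcessor> logger)
    {
        _client = client;
        _logger = logger;
    }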

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var dlqReceiver = _client.CreateReceiver(
            "order-processing",
            new ServiceBusReceiverOptions
            {
                SubQueue = SubQueue.DeadLetter,
                ReceiveMode = ServiceBusReceiveMode.PeekLock
            });

        while (!stoppingToken.IsCancellationRequested)
        {
            var messages = await dlqReceiver.ReceiveMessagesAsync(
                maxMessages: 10,
                maxWaitTime: TimeSpan.FromSeconds(30),
                cancellationToken: stoppingToken);

            foreach (var message in messages)
            {
                var deadLetterReason = message.DeadLetterReason;
                var errorDescription = message.DeadLetterErrorDescription;

                _logger.LogWarning(
                    "Dead letter: {Reason} - {Description} for message {Id}",
                    deadLetterReason, errorDescription, message.MessageId);

                // Analyze and route dead letters
                if (IsRetryable(deadLetterReason))
                {
                    // Re-enqueue with exponential backoff
                    var retryCount = message.ApplicationProperties.ContainsKey("RetryCount")
                        ? (int)message.ApplicationProperties["RetryCount"] + 1 : 1;

                    if (retryCount <= 5)
                    {
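                        // Created inline for brevity; in production, create the sender once and reuse it
                        var sender = _client.CreateSender("order-processing");
                        var retryMessage = new ServiceBusMessage(message.Body)
                        {
                            // Reusing the MessageId can cause the retry to be dropped if duplicate detection is enabled on the queue
                            MessageId = message.MessageId,
                            ScheduledEnqueueTime = DateTimeOffset.UtcNow.AddSeconds(Math.Pow(2, retryCount) * 10)
                        };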
                        retryMessage.ApplicationProperties["RetryCount"] = retryCount;
                        await sender.SendMessageAsync(retryMessage, stoppingToken);
                    }
                    else
                    {
                        await ParkMessageAsync(message);
                    }
                }
                else
                {
                    await ParkMessageAsync(message);
                }

                // Remove the message from the dead-letter queue once it has been re-enqueued or parked
                await dlqReceiver.CompleteMessageAsync(message, stoppingToken);
            }
        }
    }
}
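
The processor above schedules each retry with a delay of 2^retryCount × 10 seconds and parks the message once retryCount exceeds 5. Working the numbers through makes the total retry window explicit. This is a small sketch of that arithmetic only; the function name is hypothetical.

```python
def retry_delay_seconds(retry_count: int, base_seconds: int = 10) -> int:
    """Delay before the next attempt: 2^retry_count * base_seconds."""
    return (2 ** retry_count) * base_seconds

MAX_RETRIES = 5  # matches the retryCount <= 5 check in the processor

schedule = [retry_delay_seconds(n) for n in range(1, MAX_RETRIES + 1)]
print(schedule)       # [20, 40, 80, 160, 320]
print(sum(schedule))  # 620
```

A message that keeps failing is therefore parked after a little over ten minutes of scheduled delay, not counting the time each attempt spends being redelivered and dead-lettered again. If downstream outages typically last longer than that, raise the base delay or the retry limit.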

Phase 5: GitOps with Flux CD

Repository Structure and Flux Configuration

# flux-system/gotk-sync.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: platform-config
  namespace: flux-system
spec:
  interval: 1m
  url: https://dev.azure.com/org/project/_git/platform-config
  ref:
    branch: main
  secretRef:
    name: git-credentials
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 5m
  path: ./infrastructure
  prune: true
  sourceRef:
    kind: GitRepository
    name: platform-config
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: dapr-operator
      namespace: dapr-system
  dependsOn: []
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: applications
  namespace: flux-system
spec:
  interval: 5m
  path: ./applications/production
  prune: true
  sourceRef:
    kind: GitRepository
    name: platform-config
  dependsOn:
    - name: infrastructure
  postBuild:
    substitute:
      ENVIRONMENT: production
      CLUSTER_NAME: aks-cloudnative-prod
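
The dependsOn fields above define an ordering: applications is held back until infrastructure is ready. Flux enforces this by waiting on readiness rather than by sorting up front, but a topological sort is a convenient way to check the intended order and to catch accidental cycles as the repository grows. A sketch using Python's standard graphlib module, with the dependency map transcribed by hand from the two Kustomizations:

```python
from graphlib import TopologicalSorter

# Kustomization name -> names it depends on (from the dependsOn fields above)
depends_on = {
    "infrastructure": set(),
    "applications": {"infrastructure"},
}

# static_order() raises graphlib.CycleError if the dependencies form a cycle
order = list(TopologicalSorter(depends_on).static_order())
print(order)  # ['infrastructure', 'applications']
```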

Phase 6: Observability Stack

OpenTelemetry Configuration

# otel-collector/config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitoring
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      prometheus:
        config:
          scrape_configs:
            - job_name: 'dapr'
              scrape_interval: 15s
              kubernetes_sd_configs:
                - role: pod
              relabel_configs:
                - source_labels: [__meta_kubernetes_pod_annotation_dapr_io_metrics_port]
                  action: keep
                  regex: '.+'
    
    processors:
      batch:
        timeout: 10s
        send_batch_size: 1024
      memory_limiter:
        limit_mib: 512
        spike_limit_mib: 128
        check_interval: 5s
      resource:
        attributes:
          - key: environment
            value: production
            action: upsert
          - key: cloud.provider
            value: azure
            action: upsert
    
    exporters:
      azuremonitor:
        # Recent collector releases expect the explicit env: prefix for environment variables
        connection_string: ${env:APPLICATIONINSIGHTS_CONNECTION_STRING}
      prometheus:
        endpoint: 0.0.0.0:8889
    
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch, resource]
          exporters: [azuremonitor]
        metrics:
          receivers: [otlp, prometheus]
          processors: [memory_limiter, batch]
          exporters: [azuremonitor, prometheus]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [azuremonitor]
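
Collector configs tend to fail at startup when a pipeline references a component that was never declared, and the memory_limiter processor is generally recommended as the first processor in each pipeline. The sketch below lints for both conditions. The dictionaries are transcribed by hand from the config above, and the function is a hypothetical helper, not part of the collector.

```python
# Component names declared in the collector config above
defined = {
    "receivers": {"otlp", "prometheus"},
    "processors": {"batch", "memory_limiter", "resource"},
    "exporters": {"azuremonitor", "prometheus"},
}

pipelines = {
    "traces": {"receivers": ["otlp"],
               "processors": ["memory_limiter", "batch", "resource"],
               "exporters": ["azuremonitor"]},
    "metrics": {"receivers": ["otlp", "prometheus"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["azuremonitor", "prometheus"]},
    "logs": {"receivers": ["otlp"],
             "processors": ["memory_limiter", "batch"],
             "exporters": ["azuremonitor"]},
}

def pipeline_problems(defined: dict, pipelines: dict) -> list[str]:
    """Return human-readable problems found in the pipeline definitions."""
    problems = []
    for name, pipeline in pipelines.items():
        for kind, declared in defined.items():
            for component in pipeline.get(kind, []):
                if component not in declared:
                    problems.append(f"{name}: undeclared {kind[:-1]} '{component}'")
        processors = pipeline.get("processors", [])
        if "memory_limiter" in processors and processors[0] != "memory_limiter":
            problems.append(f"{name}: memory_limiter should be the first processor")
    return problems

print(pipeline_problems(defined, pipelines))  # []
```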

Platform Decision Matrix

| Concern | Technology | Why |
|---|---|---|
| Container Orchestration | AKS | Managed Kubernetes, Azure-native |
| Service Communication | Dapr | Infrastructure abstraction, mTLS, pub/sub |
| Autoscaling | KEDA | Event-driven, scales to zero, multi-trigger |
| Messaging | Azure Service Bus | Enterprise reliability, dead letter, sessions |
| Deployment | Flux CD | GitOps, drift detection, multi-tenancy |
| Observability | OpenTelemetry + Azure Monitor | Vendor-neutral, distributed tracing |
| Secrets | Key Vault + CSI Driver | Auto-rotation, managed identity |
| Networking | Calico | Network policies, microsegmentation |

Best Practices

  1. Use Dapr for service-to-service calls: Automatic retries, mTLS, and observability without code changes
  2. KEDA over HPA for event-driven workloads: Scale based on queue depth, not just CPU/memory
  3. GitOps is non-negotiable: Never kubectl apply in production — always go through Git
  4. Separate system and application node pools: Prevent application workloads from starving system components
  5. Implement circuit breakers: Use Dapr's built-in retry and circuit breaker policies
  6. Dead letter queues from day one: Every Service Bus subscription should have DLQ processing
  7. Use workload identity over service principals: Pod-level managed identity is more secure
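
Item 5 refers to circuit breakers, which Dapr configures declaratively through its resiliency policies. For readers unfamiliar with the pattern itself, the sketch below shows the underlying state machine: closed until a failure threshold is reached, open for a timeout, then half-open to allow a trial call. It illustrates the general pattern only and is not Dapr's implementation; the class and parameter names are hypothetical.

```python
from typing import Optional

class CircuitBreaker:
    """Minimal circuit breaker; the caller passes in the current time."""

    def __init__(self, max_failures: int, reset_after: float):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None  # None while the circuit is closed

    def allow(self, now: float) -> bool:
        """Closed circuits allow calls; open ones allow a trial call after the timeout."""
        if self.opened_at is None:
            return True
        return now - self.opened_at >= self.reset_after

    def record(self, success: bool, now: float) -> None:
        """A success closes the circuit; repeated failures open (or re-open) it."""
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = now

breaker = CircuitBreaker(max_failures=3, reset_after=30.0)
for _ in range(3):
    breaker.record(success=False, now=0.0)
print(breaker.allow(now=1.0))   # False: circuit is open
print(breaker.allow(now=30.0))  # True: trial call allowed after the timeout
```

Passing the clock in as an argument keeps the example deterministic; a real implementation would read a monotonic clock internally.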

Troubleshooting

| Issue | Root Cause | Resolution |
|---|---|---|
| Dapr sidecar not injecting | Missing annotation or namespace label | Add dapr.io/enabled: "true" annotation |
| KEDA not scaling | Trigger authentication failure | Verify TriggerAuthentication secret references |
| Service Bus message loss | Message not completed before lock expires | Increase lock duration, add renewal logic |
| Flux drift detected | Manual change outside GitOps | Revert manual change, update Git source |
| High latency between services | Missing Dapr service invocation | Use Dapr service invocation, not direct HTTP |

Architecture Decision and Tradeoffs

When designing integrated solutions on Azure, consider these key architectural trade-offs:

| Approach | Best For | Tradeoff |
|---|---|---|
| Managed / platform service | Rapid delivery, reduced ops burden | Less customisation, potential vendor lock-in |
| Custom / self-hosted | Full control, advanced tuning | Higher operational overhead and cost |

Recommendation: Start with the managed approach for most workloads and move to custom only when specific requirements demand it.

Validation and Versioning

  • Last validated: April 2026
  • Validate examples against your tenant, region, and SKU constraints before production rollout.
  • Keep module, CLI, and SDK versions pinned in automation pipelines and review quarterly.

Security and Governance Considerations

  • Apply least-privilege access using RBAC roles and just-in-time elevation for admin tasks.
  • Store secrets in managed secret stores and avoid embedding credentials in scripts or source files.
  • Enable audit logging, data protection policies, and periodic access reviews for regulated workloads.

Cost and Performance Notes

  • Define budgets and alerts, then monitor usage and cost trends continuously after go-live.
  • Baseline performance with synthetic and real-user checks before and after major changes.
  • Scale resources with measured thresholds and revisit sizing after usage pattern changes.

Key Takeaways

  • A cloud-native platform is more than Kubernetes — it requires service mesh, event-driven scaling, reliable messaging, and GitOps
  • Dapr eliminates boilerplate distributed systems code while remaining runtime-agnostic
  • KEDA enables true event-driven architectures that scale to zero during idle periods
  • Azure Service Bus provides enterprise-grade messaging with dead letter handling and sessions
  • GitOps with Flux ensures every deployment is auditable, reversible, and repeatable
