# Error Handling and Retry Patterns: Resilience
## Prerequisites
| Requirement | Details |
|---|---|
| Basic setup and tooling | Basic setup and tooling |
Figure: Flow architecture diagram for error handling and retry patterns—trigger configuration, action sequences, branching logic, and error handling patterns.
Figure: Integration pattern showing error handling and retry patterns—connector configuration, authentication setup, data transformation, and retry policies.
Figure: Enterprise governance model for error handling and retry patterns—DLP policies, environment isolation, audit logging, and compliance controls.
Classify each failure by status code before choosing a handling path:
- 429 or 503 → transient-retry
- 400 or 404 → permanent-fail
- 409 → concurrency-retry-limited
- 401 or 403 → security-terminate
- any other status, when duration > threshold → latency-monitor
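The classification rules above can be sketched as a small Python function. This is an illustration only, not the flow's actual implementation: the function name `classify` and the 5000 ms default threshold are assumptions (the default is borrowed from the `MaxLatencyMs` example value in the environment variable table).

```python
from typing import Optional

TRANSIENT = {429, 503}
PERMANENT = {400, 404}
SECURITY = {401, 403}


def classify(status: int, duration_ms: float = 0.0,
             latency_threshold_ms: float = 5000.0) -> Optional[str]:
    """Map an HTTP status (and call duration) to a handling category."""
    if status in TRANSIENT:
        return "transient-retry"
    if status in PERMANENT:
        return "permanent-fail"
    if status == 409:
        return "concurrency-retry-limited"
    # A membership test avoids the `status == 401 or 403` pitfall,
    # which is always truthy in most languages.
    if status in SECURITY:
        return "security-terminate"
    if duration_ms > latency_threshold_ms:
        return "latency-monitor"
    return None  # nothing to classify
```

Returning `None` for unclassified calls keeps the caller's happy path explicit; a flow would typically skip the error branch in that case.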
## 3. Scope Orchestration & Try/Catch Pattern
Recommended scopes:
1. Validation (input schema & guard conditions)
2. CoreBusiness (primary transaction logic)
3. ExternalIntegrations (API / connector calls)
4. Compensation (reverse partial side effects)
5. ErrorHandler (central logging, classification, alert routing)
Configure the ErrorHandler scope's Run After setting so that it runs when the preceding scope has failed, has timed out, or is skipped. Non-critical actions (notifications, metrics) may still execute for forensic completeness.
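As a rough analogy, the scope ordering and Run After behaviour resemble a try/except chain. The Python sketch below is hypothetical: `run_flow` and its callable parameters are illustrative names, and it assumes compensation is only needed once CoreBusiness has completed.

```python
from typing import Callable, List


def run_flow(validate: Callable[[], None],
             core_business: Callable[[], None],
             external_integrations: Callable[[], None],
             compensate: Callable[[], None],
             handle_error: Callable[[str, Exception], None]) -> List[str]:
    """Run the scopes in order; on failure, compensate and call the error handler."""
    completed: List[str] = []
    scopes = [("Validation", validate),
              ("CoreBusiness", core_business),
              ("ExternalIntegrations", external_integrations)]
    for name, scope in scopes:
        try:
            scope()
            completed.append(name)
        except Exception as exc:
            # Assumption: only CoreBusiness leaves side effects worth reversing.
            if "CoreBusiness" in completed:
                compensate()
                completed.append("Compensation")
            handle_error(name, exc)
            completed.append("ErrorHandler")
            break
    return completed
```

The returned list records which scopes ran, which is handy when asserting on the execution path in a test.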
Standardized Error Object Compose:
```json
{
  "runId": "@{workflow().run.name}",
  "flowName": "@{workflow().name}",
  "utcTimestamp": "@{utcNow()}",
  "statusCode": "@{coalesce(actions('CoreBusiness')?['error']?['code'],'0')}",
  "errorType": "@{coalesce(actions('CoreBusiness')?['error']?['type'],'unknown')}",
  "errorMessage": "@{coalesce(actions('CoreBusiness')?['error']?['message'],'n/a')}",
  "retryAttempt": "@{int(coalesce(actions('CoreBusiness')?['retryCount'],0))}"
}
```
> **Architecture Overview:** Persist the error object to Dataverse or Log Analytics, and attach the correlation header if present.

Resolve the correlation ID with:

`@{coalesce(triggerOutputs()?['headers']?['x-correlation-id'], guid())}`

If the upstream system did not supply an `x-correlation-id` header, the expression generates a GUID. Persist this value in every log row and external call header (the Custom Connector policy injects it automatically).
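The same reuse-or-generate rule is shown below as a minimal Python sketch for readers who want to mirror it in a custom service. The function name `resolve_correlation_id` is hypothetical, and the case-insensitive header lookup is an assumption based on standard HTTP header semantics.

```python
import uuid
from typing import Mapping, Optional


def resolve_correlation_id(headers: Optional[Mapping[str, str]]) -> str:
    """Return the caller's x-correlation-id, or a fresh GUID if none was supplied."""
    if headers:
        # HTTP header names are case-insensitive, so normalise before comparing.
        for name, value in headers.items():
            if name.lower() == "x-correlation-id" and value:
                return value
    return str(uuid.uuid4())
```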
### 20.2 Circuit Breaker State Variables
Environment variables (recommended) or solution-level variables:
- `CircuitFailureThreshold` = 3
- `CircuitCooldownMinutes` = 15
- `CircuitStateTable` (Dataverse table logical name)
Retrieve current state:
Get a row by alternate key (flowName) → circuitOpenedAt, consecutiveFailures
Update logic (pseudo):
```text
if (consecutiveFailures >= CircuitFailureThreshold) {
    if (utcNow() < addMinutes(circuitOpenedAt, CircuitCooldownMinutes)) {
        status = 'CircuitOpen'; terminate;
    }
}
```
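The check above can be exercised outside the flow with a small Python sketch. The `CircuitState` class and the `record_result` bookkeeping (reset on success, stamp the open time when the threshold is reached) are assumptions added for illustration; the source only specifies the open check itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CircuitState:
    consecutive_failures: int = 0
    circuit_opened_at: Optional[datetime] = None


def is_circuit_open(state: CircuitState, now: datetime,
                    failure_threshold: int = 3,
                    cooldown_minutes: int = 15) -> bool:
    """True while the threshold is met and the cooldown has not elapsed."""
    if state.consecutive_failures < failure_threshold:
        return False
    if state.circuit_opened_at is None:
        return False
    return now < state.circuit_opened_at + timedelta(minutes=cooldown_minutes)


def record_result(state: CircuitState, succeeded: bool, now: datetime,
                  failure_threshold: int = 3,
                  cooldown_minutes: int = 15) -> CircuitState:
    """Assumed bookkeeping: reset on success, count failures, stamp the open time."""
    if succeeded:
        return CircuitState()
    failures = state.consecutive_failures + 1
    opened_at = state.circuit_opened_at
    if failures >= failure_threshold:
        still_open = (opened_at is not None
                      and now < opened_at + timedelta(minutes=cooldown_minutes))
        if not still_open:
            opened_at = now  # open (or re-open) the circuit
    return CircuitState(failures, opened_at)
```

Passing `now` explicitly, rather than reading the clock inside the functions, keeps the cooldown logic deterministic under test.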
### 20.3 Dataverse Logging Table Schema (ResilienceLog)
| Column | Type | Notes |
|---|---|---|
| runId | Text | Alternate key candidate |
| flowName | Text | Index for analytics |
| actionName | Text | Failing or key action |
| errorType | Choice | Matches taxonomy enumeration |
| statusCode | Whole Number | HTTP or internal code |
| errorMessage | Text (max 4k) | Sanitized |
| correlationId | Text | Global trace |
| retryAttempt | Whole Number | 0 if initial |
| durationMs | Whole Number | Action execution duration |
| timestampUtc | DateTime | Logged moment |
| environmentName | Text | For multi-env rollups |
| flowVersion | Text | Semantic version (e.g., 2.4.1) |
| triggerType | Choice | Recurrence / HTTP / Dataverse |
| tenantId | Text | Optional multi-tenant scenarios |
| compensationApplied | Two Options | Yes / No |
| circuitState | Choice | Closed / Open / Cooldown |
| schemaVersion | Whole Number | Logging contract evolution |
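To make the "Sanitized" and "max 4k" notes concrete, here is a hedged Python sketch that assembles a subset of these columns. The bearer-token redaction rule, the 4000-character limit, and the function names are assumptions for illustration; the source does not define what sanitization involves.

```python
import re
from datetime import datetime, timezone
from typing import Any, Dict

MAX_MESSAGE_CHARS = 4000  # assumed reading of "max 4k"
# Hypothetical redaction rule: mask anything that looks like a bearer token.
_BEARER = re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+")


def sanitize_message(message: str) -> str:
    """Redact bearer tokens, then truncate to the column limit."""
    return _BEARER.sub("Bearer [REDACTED]", message)[:MAX_MESSAGE_CHARS]


def build_log_row(run_id: str, flow_name: str, action_name: str,
                  error_type: str, status_code: int, error_message: str,
                  correlation_id: str, retry_attempt: int = 0,
                  schema_version: int = 2) -> Dict[str, Any]:
    """Assemble a subset of the ResilienceLog columns as a plain dict."""
    return {
        "runId": run_id,
        "flowName": flow_name,
        "actionName": action_name,
        "errorType": error_type,
        "statusCode": status_code,
        "errorMessage": sanitize_message(error_message),
        "correlationId": correlation_id,
        "retryAttempt": retry_attempt,
        "timestampUtc": datetime.now(timezone.utc).isoformat(),
        "schemaVersion": schema_version,
    }
```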
### 20.4 Log Analytics HTTP Data Collector Payload

    {
      "records": [
        {
          "TimeGenerated": "@{utcNow()}",
          "runId": "@{workflow().run.name}",
          "flowName": "@{workflow().name}",
          "correlationId": "@{variables('correlationId')}",
          "errorType": "@{variables('errorType')}",
          "statusCode": "@{variables('statusCode')}",
          "retryAttempt": "@{variables('retryAttempt')}",
          "durationMs": "@{variables('durationMs')}",
          "circuitState": "@{variables('circuitState')}",
          "environmentName": "@{variables('environmentName')}",
          "flowVersion": "@{variables('flowVersion')}"
        }
      ]
    }

### 20.5 Compensation Loop Pseudo Implementation
1. Initialize Array `compensationSteps` as `[]`.
2. For each step, switch on its type → invoke the corresponding reversal connector.
3. Log success or failure.
4. If failure → push to `CompensationDeadLetter`.
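The compensation loop described above might look like the following in Python. This is a sketch under assumptions: the step shape (a dict with a `type` key), the `reversers` lookup table, and the newest-first ordering are illustrative choices that the source does not specify.

```python
from typing import Callable, Dict, List, Tuple

Step = Dict[str, str]  # hypothetical shape, e.g. {"type": "refund", "target": "order-42"}


def run_compensation(steps: List[Step],
                     reversers: Dict[str, Callable[[Step], None]]
                     ) -> Tuple[List[Step], List[Step]]:
    """Reverse recorded steps; failed reversals go to a dead-letter list."""
    reversed_ok: List[Step] = []
    dead_letter: List[Step] = []
    for step in reversed(steps):  # assumption: undo in reverse order of execution
        try:
            # The dict lookup plays the role of the Switch on step type.
            reversers[step["type"]](step)
            reversed_ok.append(step)
        except Exception:
            # Covers both reversal failures and unknown step types.
            dead_letter.append(step)
    return reversed_ok, dead_letter
```

Collecting failures instead of raising lets the loop attempt every reversal, so one broken step does not block the rest.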
### 20.6 Idempotent Create Pattern
```text
GET record by ExternalId
if (found) { skip create; log outcome='Skipped'; }
else { POST create; log outcome='Created'; }
```
Also insert an IdempotencyAudit row with the fields `key`, `outcome`, and `elapsedMs`.
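A minimal Python sketch of the check-then-create pattern with its audit row follows. The in-memory `store` and `audit` collections stand in for Dataverse tables, and `idempotent_create` is a hypothetical name; a real implementation would also rely on the unique alternate key to guard against concurrent creates.

```python
import time
from typing import Any, Dict, List


def idempotent_create(store: Dict[str, Dict[str, Any]],
                      audit: List[Dict[str, Any]],
                      external_id: str,
                      payload: Dict[str, Any]) -> str:
    """Create the record only if ExternalId is unseen; always write an audit row."""
    started = time.perf_counter()
    if external_id in store:           # GET record by ExternalId
        outcome = "Skipped"
    else:
        store[external_id] = dict(payload)  # POST create
        outcome = "Created"
    audit.append({
        "key": external_id,
        "outcome": outcome,
        "elapsedMs": (time.perf_counter() - started) * 1000.0,
    })
    return outcome
```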
## 21. Dynamic Threshold Automation
Static limits become stale. Implement adaptive thresholds derived from rolling averages:

    avgDurationLast24h = div(float(variables('totalDurationWindow')), float(variables('countWindow')))
    dynamicTimeout = mul(avgDurationLast24h, 2.5)

If `currentDuration > dynamicTimeout`, classify the run as a latency anomaly and emit a proactive warning before user impact escalates.
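The rolling-average threshold can be sketched in Python as follows. The 2.5 multiplier comes from the text; the min/max bounds, the zero-count fallback of 5000 ms, and the function names are assumptions (the fallback mirrors the `MaxLatencyMs` example value).

```python
def dynamic_timeout_ms(total_duration_ms: float, count: int,
                       multiplier: float = 2.5,
                       min_ms: float = 1000.0,
                       max_ms: float = 30000.0,
                       fallback_ms: float = 5000.0) -> float:
    """Scale the rolling mean by the multiplier, clamped to [min_ms, max_ms]."""
    if count <= 0:
        return fallback_ms  # no samples yet: fall back to a static ceiling
    average = total_duration_ms / count
    return max(min_ms, min(max_ms, average * multiplier))


def is_latency_anomaly(current_ms: float, total_duration_ms: float,
                       count: int) -> bool:
    """True when the current duration exceeds the dynamic timeout."""
    return current_ms > dynamic_timeout_ms(total_duration_ms, count)
```

For example, twelve runs totalling 24000 ms give a 2000 ms mean and a 5000 ms dynamic timeout, so a 6000 ms run would be flagged.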
Transient retry tuning example:

    baseDelaySeconds = 10
    jitter = rand(0,4)
    attemptDelay = pow(2, attemptNumber) * baseDelaySeconds + jitter

Cap `attemptDelay` at 300 seconds to prevent extreme waits.
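The backoff formula and its cap translate directly to Python. In this sketch the function name and the injectable `rng` parameter are assumptions, and the jitter is drawn as a float with `uniform`, which differs slightly from an integer `rand(0,4)`.

```python
import random
from typing import Optional


def attempt_delay_seconds(attempt_number: int,
                          base_delay_seconds: float = 10.0,
                          max_jitter_seconds: float = 4.0,
                          cap_seconds: float = 300.0,
                          rng: Optional[random.Random] = None) -> float:
    """Exponential backoff with additive jitter, capped at cap_seconds."""
    rng = rng or random.Random()
    jitter = rng.uniform(0.0, max_jitter_seconds)
    delay = (2 ** attempt_number) * base_delay_seconds + jitter
    return min(cap_seconds, delay)
```

With the defaults and jitter ignored, attempts 0 through 3 wait 10, 20, 40, and 80 seconds; attempt 5 would compute 320 seconds and is therefore capped at 300.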
## 22. Advanced Telemetry Queries (Kusto)
Dead Letter Backlog Age P95:

    ResilienceLog
    | where deadLetter == true
    | summarize backlogAgeP95 = percentile(datetime_diff('minute', now(), timestampUtc), 95)

Circuit Opens Trend:

    ResilienceLog
    | where circuitState == 'Open'
    | summarize opens=count() by bin(timestampUtc, 1d)

Retry Success Rate:

    ResilienceLog
    | where retryAttempt > 0
    | summarize successes = countif(errorType == 'None'), total = count()
    | extend rate = successes * 100.0 / total

Compensation Failure Ratio:

    ResilienceLog
    | where compensationApplied == 'Yes'
    | summarize failures = countif(errorType != 'None'), total = count()
    | extend failureRate = failures * 100.0 / total

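To sanity-check the retry success rate logic against exported rows before running it in Log Analytics, the same calculation can be mirrored in Python. This is a local approximation only: the function name is hypothetical, and it assumes each row is a dict with `retryAttempt` and `errorType` keys, with the string `'None'` marking success as in the query.

```python
from typing import Iterable, Mapping, Optional


def retry_success_rate(rows: Iterable[Mapping[str, object]]) -> Optional[float]:
    """Percentage of retried rows whose errorType is 'None'; None if no retries."""
    retried = [r for r in rows if int(r.get("retryAttempt", 0)) > 0]
    if not retried:
        return None  # avoid dividing by zero when nothing was retried
    successes = sum(1 for r in retried if r.get("errorType") == "None")
    return successes * 100.0 / len(retried)
```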
## 23. Resilience Architecture (Text Diagram)
```text
[Trigger]
  → CoreBusiness Scope
  → ExternalIntegrations Scope (retries, idempotency)
      ↘ ErrorHandler Scope (logging, classification, alerting)
          → Compensation Scope (reverse actions cascade)
  → Circuit State Check → Short-Circuit Terminate (if open)
  → Dead Letter Remediation Flow (scheduled)
```
## 24. Governance & Operational Cadence
| Cadence | Activity | Owner | Artifact |
|---------|----------|-------|---------|
| Daily | Review critical failures & dead letters | Operations | Dashboard snapshot |
| Weekly | Resilience KPI review & tuning | Engineering Lead | KPI report |
| Monthly | Pattern adoption audit | Architecture | Adoption matrix |
| Quarterly | Threshold recalibration & taxonomy updates | Architecture + Ops | Resilience baseline doc |
| Annual | ML predictive pilot evaluation | Data Science | Experiment summary |
## 25. Pattern Catalog Quick Reference
| Pattern | Purpose | Trigger | Guard Rails |
|---------|---------|--------|------------|
| Retry (Exponential) | Handle transient throttle | 429/503 | Max total wait <5m |
| Circuit Breaker | Prevent cascade failures | Repeated outage | Cooldown enforced |
| Idempotency Key | Avoid duplicate create | Timeout & retry scenario | Unique alternate key |
| Compensation | Reverse partial side effects | Downstream failure after commits | Log reversal attempt |
| Dead Letter | Isolate unrecoverable payload | Permanent 4xx / logic error | Purge schedule set |
| Structured Log | Enable analytics | All failures & key actions | No secrets stored |
| Dynamic Threshold | Adapt to performance drift | Sustained latency increases | Min/max bounds |
## 26. Sample Environment Variable Set
| Name | Example Value | Description |
|------|---------------|-------------|
| CircuitFailureThreshold | 3 | Failures before open |
| CircuitCooldownMinutes | 15 | Cooldown duration |
| MaxRetryAttempts | 4 | Global safety cap |
| BaseRetryDelaySeconds | 10 | Exponential seed |
| DeadLetterRetentionDays | 60 | Purge policy |
| WarningRetryCount | 3 | Alert threshold |
| MaxLatencyMs | 5000 | Static latency ceiling |
| LogSchemaVersion | 2 | Evolves contract |
| CompensationEnabled | true | Toggle reversal behavior |
| DynamicTimeoutMultiplier | 2.5 | Scales rolling mean |
## 27. ML / Predictive Roadmap
Phase 1: Baseline metrics (already implemented).
Phase 2: Anomaly detection (Kusto query detects deviation >3σ from mean latency).
Phase 3: Predictive throttle forecasting (model uses time-series of 429 counts to pre-emptively reduce concurrency).
Phase 4: Automated policy recalibration (function updates environment variables based on sustained trends).
Phase 5: Self-healing orchestration (flows re-route to alternate connector or cached data source during predicted outages).
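Phase 2's deviation check can be prototyped offline before committing to a Kusto query. The Python sketch below flags samples more than three standard deviations from the historical mean; the function name and the use of the population standard deviation are assumptions, not part of the roadmap.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def latency_anomalies(history: Sequence[float],
                      candidates: Sequence[float],
                      sigmas: float = 3.0) -> List[float]:
    """Return candidates deviating more than `sigmas` std devs from the history mean."""
    mu = mean(history)
    sd = pstdev(history)
    if sd == 0:
        return []  # flat history: no meaningful deviation band
    return [c for c in candidates if abs(c - mu) > sigmas * sd]
```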
## 28. FAQ
| Question | Answer |
|----------|--------|
| Why not retry 400 errors? | They signal client/data issues; retry wastes cost. |
| Should all flows implement circuit breaker? | Only high-volume or business-critical; simple sporadic flows can skip. |
| How many log fields are too many? | Favor essential analytics; keep schema lean (<20 columns) to control storage. |
| When to archive dead letters? | After remediation + age threshold (e.g., 30 days) to cheaper storage. |
| Difference between idempotency and deduplication? | Idempotency prevents duplicate side effects; deduplication filters identical payloads. |
| Can compensation replace formal rollback? | No; it's best-effort reversal for distributed operations lacking atomic transactions. |
| What if compensation causes new errors? | Log separately, escalate if pattern emerges, refine reversal logic. |
| How to test dynamic thresholds? | Replay historical logs into test harness, verify classification boundaries. |
## Architecture Decision and Tradeoffs
When designing process automation solutions with Power Automate, consider these key architectural trade-offs:
| Approach | Best For | Tradeoff |
|----------|----------|----------|
| Managed / platform service | Rapid delivery, reduced ops burden | Less customisation, potential vendor lock-in |
| Custom / self-hosted | Full control, advanced tuning | Higher operational overhead and cost |
> **Recommendation:** Start with the managed approach for most workloads and move to custom only when specific requirements demand it.
## Security and Governance Considerations
- **Least Privilege:** Grant only the permissions required for each role
- **Secret Management:** Store credentials in Azure Key Vault or equivalent; never hard-code secrets
- **Audit Logging:** Enable diagnostic and activity logs for compliance and forensic analysis
- **Data Protection:** Encrypt data at rest and in transit; classify data with sensitivity labels where applicable
## Cost and Performance Notes
- **Primary Cost Drivers:** Compute tier, storage volume, and network egress
- **Optimization Levers:** Right-size resources, use reserved instances or savings plans, and review Azure Advisor recommendations regularly
- **Performance Baseline:** Define SLAs, latency targets, and throughput thresholds before going live
- **Scaling Strategy:** Use auto-scale rules and monitor utilisation to balance cost and responsiveness
## Validation and Versioning
- **Last Validated:** April 2026
- **Tested With:** Current generally-available Power Automate APIs and SDKs
- **Known Constraints:** Check regional availability and service limits before production deployment