Home / Power BI / Dataflows and Shared Datasets: Architecture
Power BI

Dataflows and Shared Datasets: Architecture

Master enterprise-scale reusable semantic architecture with dataflows and shared datasets: layered design patterns, refresh orchestration, incremental refres...

What you will learn

Practical execution with concise explanations, real implementation patterns, and production-ready recommendations.

Orders_Raw = Source{[Schema="dbo",Item="Orders"]}[Data],

// Type conversion only (no business logic) TypedColumns = Table.TransformColumnTypes(Orders_Raw, { {"OrderID", Int64.Type}, {"OrderDate", type datetime}, {"CustomerID", Int64.Type}, {"Amount", type number}, {"Status", type text} }),

// Add metadata for lineage tracking AddedMetadata = Table.AddColumn(TypedColumns, "DataflowRefreshDate", each DateTime.LocalNow(), type datetime)``` in

AddedMetadata

### Implementation: Standardization Layer

```powerquery
// DF_Standardized_Customers
// References DF_Ingestion_Customers
// Cleanses, deduplicates, standardizes naming

let

> **Architecture Overview:** Reference ingestion dataflow (linked entity)

FinalColumns

### Implementation: Business Layer

```powerquery
// DF_Business_SalesMetrics
// References DF_Standardized_Orders and DF_Standardized_Customers
// Enriches with business calculations

let
```python
// Reference standardized dataflows
Orders = PowerBI.Dataflows(...){[entity="Dim_Orders"]}[Data],
Customers = PowerBI.Dataflows(...){[entity="Dim_Customer"]}[Data],

// Join orders with customer tier (for discount calculation)
MergedTier = Table.NestedJoin(
    Orders, {"CustomerID"},
    Customers, {"CustomerID"},
    "CustomerInfo", JoinKind.Inner
),

ExpandedTier = Table.ExpandTableColumn(MergedTier, "CustomerInfo", 
    {"CustomerTier", "Region"}, {"CustomerTier", "Region"}),

// Business Logic: Calculate discounted amount
AddDiscountRate = Table.AddColumn(ExpandedTier, "DiscountRate", 
    each if [CustomerTier] = "Gold" then 0.15
         else if [CustomerTier] = "Silver" then 0.10
         else if [CustomerTier] = "Bronze" then 0.05
         else 0.00,
    type number),

AddDiscountedAmount = Table.AddColumn(AddDiscountRate, "DiscountedAmount",
    each [Amount] * (1 - [DiscountRate]), type number),

// Calculate margin (requires COGS from product table)
AddMargin = Table.AddColumn(AddDiscountedAmount, "Margin",
    each [DiscountedAmount] - [COGS], type number),

AddMarginPercent = Table.AddColumn(AddMargin, "MarginPercent",
    each if [DiscountedAmount] <> 0 
         then [Margin] / [DiscountedAmount] 
         else null, type number)```
in
```text
AddMarginPercent

## Computed Entities vs Linked Entities

### Linked Entities (References)





```powerquery
// Linked Entity: References another dataflow entity without copying data
// Storage: Points to source entity (no data duplication)
// Refresh: Doesn't require re-querying source system

let
```text
// Links to DF_Ingestion_Orders
Source = PowerBI.Dataflows(null),
Workspace = Source{[workspaceId="workspace-guid"]}[Data],
Dataflow = Workspace{[dataflowId="ingestion-dataflow-guid"]}[Data],
Orders = Dataflow{[entity="Orders_Raw",version=""]}[Data]```
in
```text
Orders  // This is a LINKED entity (no data copied)

**Linked Entity Characteristics**:

- ✅ No data duplication (references source entity)
- ✅ Faster refresh (no data transfer)
- ✅ Lower storage costs
- ✅ Always up-to-date with source entity
- ❌ Requires source dataflow to be available
- ❌ Adds dependency between dataflows


### Computed Entities

```powerquery
// Computed Entity: Stores transformed data independently
// Storage: Materialized in Azure Data Lake (separate from source)
// Refresh: Queries source and stores result

let
```text
// References source entity
Source = PowerBI.Dataflows(...){[entity="Orders_Raw"]}[Data],

// Perform transformations (these materialize the computed entity)
Filtered = Table.SelectRows(Source, each [OrderDate] >= #date(2024, 1, 1)),
Aggregated = Table.Group(Filtered, {"CustomerID"}, {
    {"TotalAmount", each List.Sum([Amount]), type number},
    {"OrderCount", each Table.RowCount(_), Int64.Type}
})```
in
```text
Aggregated  // This is STORED as a computed entity

**Computed Entity Characteristics**:

- ✅ Materialized (stored independently)
- ✅ Decouples from source dataflow
- ✅ Supports Enhanced Compute Engine for complex transformations
- ✅ Can be incrementally refreshed
- ❌ Duplicates data (higher storage cost)
- ❌ Requires refresh to sync with source


**When to Use Each**:

| Scenario | Use Linked Entity | Use Computed Entity |
|----------|-------------------|---------------------|
| Simple pass-through | ✅ Yes | ❌ No |
| Complex aggregations/joins | ❌ No | ✅ Yes |
| Minimize storage | ✅ Yes | ❌ No |
| Incremental refresh needed | ❌ No | ✅ Yes |
| Decouple from source | ❌ No | ✅ Yes |
| Enhanced Compute Engine | ❌ No | ✅ Yes |

## Incremental Refresh Configuration

### Dataflow Incremental Refresh Setup





```powerquery
// Step 1: Create RangeStart and RangeEnd parameters (case-sensitive!)
// In Power BI Service: Create these in dataflow settings

RangeStart = #datetime(2020, 1, 1, 0, 0, 0) meta [IsParameterQuery=true, Type="DateTime"]
RangeEnd = #datetime(2025, 12, 31, 23, 59, 59) meta [IsParameterQuery=true, Type="DateTime"]

// Step 2: Filter table using parameters
let
```text
Source = Sql.Database("server", "DB"),
Orders = Source{[Schema="dbo",Item="Orders"]}[Data],

// Filter by date range (must use parameter names exactly as "RangeStart" and "RangeEnd")
FilteredRows = Table.SelectRows(Orders, 
    each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)```
in
```text
FilteredRows

**Incremental Refresh Policy Configuration** (Power BI Service):


> **Architecture Overview:** Dataflow Settings → Incremental Refresh:


### Dataset Incremental Refresh from Dataflow

```powerquery
// Dataset consuming dataflow with incremental refresh

// Step 1: Define RangeStart/RangeEnd parameters in dataset
RangeStart = #datetime(2024, 1, 1, 0, 0, 0) meta [IsParameterQuery=true, Type="DateTime"]
RangeEnd = #datetime(2025, 12, 31, 23, 59, 59) meta [IsParameterQuery=true, Type="DateTime"]

// Step 2: Connect to dataflow
let
```csharp
Source = PowerBI.Dataflows(null),
Workspace = Source{[workspaceId="workspace-guid"]}[Data],
Dataflow = Workspace{[dataflowId="dataflow-guid"]}[Data],
Orders = Dataflow{[entity="Fact_Sales",version=""]}[Data],

// Apply filter using parameters (enables incremental refresh)
FilteredRows = Table.SelectRows(Orders,
    each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)```
in
```text
FilteredRows

**Result**: Dataset only refreshes recent data, while dataflow handles full history.

### Incremental Refresh Best Practices

```text
☑ Design Considerations:
  ☐ Use consistent date column (OrderDate, TransactionDate)
  ☐ Ensure date column is indexed in source database
  ☐ Rolling window: 7-30 days (daily refresh needs)
  ☐ Historical window: 2-5 years (long-term storage)
  ☐ Test with "Refresh full data" option before enabling

☑ Performance Optimization:
  ☐ Partition large tables (> 10M rows)
  ☐ Use "Detect data changes" to skip unchanged partitions
  ☐ Enable "Only refresh complete days" to avoid partial data
  ☐ Schedule refreshes during off-peak hours

☑ Monitoring:
  ☐ Track refresh duration (should decrease with incremental)
  ☐ Monitor partition creation (should align with policy)
  ☐ Alert on full refresh (indicates policy issue)
  ☐ Validate historical data integrity regularly

Shared Dataset Architecture

Creating Enterprise Shared Datasets

Diagram: See the official Microsoft documentation for architecture details.

Dataset Certification Process

# Power BI REST API: Certify dataset

$headers = @{
```text
"Authorization" = "Bearer $accessToken"
"Content-Type" = "application/json"```
}

$certificationBody = @{
```text
"endorsementDetails" = @{
    "endorsement" = "Certified"  # Promoted, Certified


    "certifiedBy" = "Data Governance Team"
}```
} | ConvertTo-Json

$datasetId = "dataset-guid"

Invoke-RestMethod -Uri "https://api.powerbi.com/v1.0/myorg/datasets/$datasetId/endorsement" `
```text
-Method Post `
-Headers $headers `
-Body $certificationBody

Write-Host "✅ Dataset certified successfully"


**Certification Criteria Checklist**:

```text
☑ Data Quality:
  ☐ Data validated against source systems
  ☐ Refresh schedule tested and stable
  ☐ Data quality checks implemented (row counts, null checks)
  ☐ Historical data integrity validated

☑ Semantic Model:
  ☐ Relationships correctly defined (star schema preferred)
  ☐ DAX measures tested and optimized
  ☐ Measure naming convention followed (no spaces, PascalCase)
  ☐ Calculation groups used for time intelligence (if applicable)

☑ Security:
  ☐ Row-level security implemented and tested
  ☐ Object-level security applied (if sensitive columns exist)
  ☐ Build permission granted to report creators only
  ☐ Viewer permission for end users

☑ Documentation:
  ☐ Dataset description updated
  ☐ Measure descriptions added
  ☐ Source dataflow dependencies documented
  ☐ Refresh schedule and SLA documented

☑ Governance:
  ☐ Dataset owner assigned
  ☐ Support contact documented
  ☐ Change management process defined
  ☐ Version control implemented (export .pbix to Git)

Refresh Orchestration

Dependency Management

Diagram: See the official Microsoft documentation for architecture details.

PowerShell Orchestration Script

## Orchestrate dataflow and dataset refreshes with dependency checking

function Invoke-DataflowRefresh {
```powershell
param(
    [string]$WorkspaceId,
    [string]$DataflowId,
    [string]$DataflowName
)





Write-Host "🔄 Starting refresh: $DataflowName" -ForegroundColor Cyan

$headers = @{
    "Authorization" = "Bearer $accessToken"
}

## Trigger refresh
$refreshResponse = Invoke-RestMethod `
    -Uri "https://api.powerbi.com/v1.0/myorg/groups/$WorkspaceId/dataflows/$DataflowId/refreshes" `
    -Method Post `
    -Headers $headers





## Poll refresh status
do {
    Start-Sleep -Seconds 30
    $status = Invoke-RestMethod `
        -Uri "https://api.powerbi.com/v1.0/myorg/groups/$WorkspaceId/dataflows/$DataflowId/refreshes?`$top=1" `
        -Method Get `
        -Headers $headers




    
    $currentStatus = $status.value[0].status
    Write-Host "  Status: $currentStatus"
    
} while ($currentStatus -eq "InProgress")

if ($currentStatus -eq "Completed") {
    Write-Host "✅ Refresh completed: $DataflowName" -ForegroundColor Green
    return $true
} else {
    Write-Host "❌ Refresh failed: $DataflowName - $($status.value[0].error.message)" -ForegroundColor Red
    return $false
}```
}

## Orchestration workflow
$workspaceId = "workspace-guid"





## Layer 1: Ingestion (parallel)
$layer1Success = @(
```text
Invoke-DataflowRefresh -WorkspaceId $workspaceId -DataflowId "ingestion-orders-guid" -DataflowName "DF_Ingestion_Orders",
Invoke-DataflowRefresh -WorkspaceId $workspaceId -DataflowId "ingestion-customers-guid" -DataflowName "DF_Ingestion_Customers"```
) -notcontains $false





if (-not $layer1Success) {
```text
Write-Host "❌ Layer 1 refresh failed. Aborting." -ForegroundColor Red
exit 1```
}

## Layer 2: Standardization (depends on Layer 1)
$layer2Success = Invoke-DataflowRefresh -WorkspaceId $workspaceId `
```text
-DataflowId "standardized-orders-guid" `
-DataflowName "DF_Standardized_Orders"

if (-not $layer2Success) {

Write-Host "❌ Layer 2 refresh failed. Aborting." -ForegroundColor Red
exit 1```
}





## Layer 3: Business (depends on Layer 2)
$layer3Success = Invoke-DataflowRefresh -WorkspaceId $workspaceId `
```text
-DataflowId "business-sales-guid" `
-DataflowName "DF_Business_SalesMetrics"

if (-not $layer3Success) {

Write-Host "❌ Layer 3 refresh failed. Aborting." -ForegroundColor Red
exit 1```
}





## Layer 4: Dataset refresh (depends on Layer 3)
Write-Host "🔄 Refreshing dataset: Sales_Certified" -ForegroundColor Cyan
Invoke-PowerBIRestMethod -Url "groups/$workspaceId/datasets/$datasetId/refreshes" -Method Post





Write-Host "✅ Full refresh orchestration completed successfully" -ForegroundColor Green

Cost Optimization

Premium Capacity Cost Analysis

Dataflow Storage Costs (Premium):

Factors:
- Storage: Azure Data Lake Storage Gen2 (pay per GB)
- Compute: Premium capacity CU consumption
- Refresh frequency: More refreshes = higher CU usage

Cost Optimization Strategies:

1. Minimize Computed Entities
   ❌ 10 computed entities with duplicated aggregations
   ✅ 1 computed entity with comprehensive aggregations
   Savings: ~40% storage, ~30% refresh time

2. Incremental Refresh
   ❌ Full refresh of 100M row table daily
   ✅ Incremental refresh (7-day window)
   Savings: ~90% refresh time, ~60% CU usage

3. Linked Entities Where Possible
   ❌ Copy data across 5 dataflows (5x storage)
   ✅ Use linked entities (1x storage)
   Savings: ~80% storage cost

4. Archive Old Data
   ❌ Keep 10 years of detailed transactions
   ✅ Aggregate to monthly summaries after 2 years
   Savings: ~70% storage for historical data

5. Optimize Refresh Schedule
   ❌ Hourly refresh for rarely changing dimensions
   ✅ Daily refresh for dimensions, hourly for facts
   Savings: ~50% CU usage

PowerShell Cost Analysis Script

## Analyze dataflow storage and refresh costs

function Get-DataflowCostAnalysis {
```powershell
param([string]$WorkspaceId)





$dataflows = Invoke-PowerBIRestMethod -Url "groups/$WorkspaceId/dataflows" -Method Get | ConvertFrom-Json

$analysis = @()

foreach ($df in $dataflows.value) {
    $refreshes = Invoke-PowerBIRestMethod -Url "groups/$WorkspaceId/dataflows/$($df.objectId)/refreshes?`$top=30" -Method Get | ConvertFrom-Json
    
    $avgRefreshDuration = ($refreshes.value | Measure-Object -Property durationInMs -Average).Average / 1000 / 60  # minutes
    $refreshesPerMonth = $refreshes.value.Count * (30 / 7)  # Extrapolate to monthly
    
    $analysis += [PSCustomObject]@{
        DataflowName = $df.name
        EntityCount = $df.entities.Count
        AvgRefreshMinutes = [math]::Round($avgRefreshDuration, 2)
        EstimatedMonthlyRefreshHours = [math]::Round(($refreshesPerMonth * $avgRefreshDuration) / 60, 2)
        RefreshFrequency = if ($refreshes.value.Count -gt 0) { "$($refreshes.value.Count) in last 7 days" } else { "Unknown" }
        EstimatedMonthlyCost = [math]::Round(($refreshesPerMonth * $avgRefreshDuration / 60) * 0.20, 2)  # $0.20 per CU-hour estimate
    }
}

$analysis | Format-Table -AutoSize

$totalMonthlyCost = ($analysis | Measure-Object -Property EstimatedMonthlyCost -Sum).Sum
Write-Host "`n💰 Estimated Monthly Cost: `$$totalMonthlyCost" -ForegroundColor Yellow```
}

Get-DataflowCostAnalysis -WorkspaceId "workspace-guid"

Migration from Monolithic Datasets

Migration Strategy

Phase 1: Assessment (Week 1-2)
☐ Inventory all existing datasets
☐ Identify duplicated Power Query logic
☐ Document current refresh schedules
☐ Analyze dataset dependencies (which reports use which datasets)
☐ Prioritize datasets for migration (start with most duplicated)

Phase 2: Design Dataflow Architecture (Week 3-4)
☐ Design layered dataflow structure (Ingestion/Standardization/Business)
☐ Define naming conventions
☐ Plan incremental refresh policies
☐ Design shared datasets (one per subject area)
☐ Document dependencies and refresh order

Phase 3: Build Dataflows (Week 5-8)
☐ Create Ingestion dataflows (extract raw data)
☐ Create Standardization dataflows (cleanse, conform)
☐ Create Business dataflows (metrics, aggregations)
☐ Test refresh times and validate data accuracy
☐ Implement incremental refresh where applicable

Phase 4: Build Shared Datasets (Week 9-10)
☐ Create shared datasets consuming business dataflows
☐ Define relationships and DAX measures
☐ Implement RLS
☐ Test with sample reports
☐ Certify datasets

Phase 5: Migrate Reports (Week 11-14)
☐ Identify reports using old datasets
☐ Rebind reports to new shared datasets
☐ Remove duplicate visuals/calculations
☐ Test report functionality
☐ Deploy to production workspace
☐ Deprecate old datasets (after validation period)

Phase 6: Governance & Monitoring (Ongoing)
☐ Monitor refresh success rates
☐ Track cost savings (refresh time, storage)
☐ Collect user feedback
☐ Iterate and optimize

Migration Script Example

// Old Dataset (Monolithic):
// All logic in single dataset

let
```text
Source = Sql.Database("server", "DB"),
Orders = Source{[Schema="dbo",Item="Orders"]}[Data],

// All transformations in one place
Filtered = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1)),
Typed = Table.TransformColumnTypes(Filtered, {{"Amount", type number}}),
AddedMargin = Table.AddColumn(Typed, "Margin", each [Amount] * 0.30),

// ... 50 more transformation steps ...

Final = AddedMargin```
in
```text
Final

// New Approach (Dataflow + Shared Dataset):

// DATAFLOW: DF_Business_Sales let

// Reference standardized dataflow (reusable)
Source = PowerBI.Dataflows(...){[entity="Fact_Sales"]}[Data]

// Only business-specific logic here
// (No extraction, cleansing - already done in lower layers)```
in
```text
Source

// SHARED DATASET: // Consumes DF_Business_Sales // Adds DAX measures only // Multiple reports connect to this dataset


## Troubleshooting Guide

### Issue 1: Dataflow Refresh Failures





**Symptoms**:

- Dataflow shows "Refresh failed" status
- Timeout errors
- Data inconsistency


**Diagnosis**:

```powershell
## Check recent refresh history
$workspaceId = "workspace-guid"
$dataflowId = "dataflow-guid"





$refreshHistory = Invoke-PowerBIRestMethod `
```text
-Url "groups/$workspaceId/dataflows/$dataflowId/refreshes?`$top=10" `
-Method Get | ConvertFrom-Json

$refreshHistory.value | ForEach-Object {

[PSCustomObject]@{
    StartTime = $_.startTime
    EndTime = $_.endTime
    Status = $_.status
    Duration = if ($_.endTime) { 
        (New-TimeSpan -Start $_.startTime -End $_.endTime).TotalMinutes 
    } else { "In Progress" }
    ErrorMessage = $_.error.message
}```
} | Format-Table -AutoSize

Common Causes & Resolutions:

  1. Query timeout (> 2 hours)

    • Add incremental refresh
    • Optimize Power Query (enable query folding)
    • Partition large tables
  2. Memory limits exceeded

    • Reduce entity size (split into multiple entities)
    • Use computed entities instead of linked (materializes data)
    • Upgrade Premium capacity SKU
  3. Source system unavailable

    • Check gateway connectivity
    • Verify credentials haven't expired
    • Implement retry logic in orchestration

Issue 2: Duplicate Data in Reports

Symptoms:

  • Metrics don't match between reports
  • Same query returns different results
  • "Which number is correct?" from business users

Root Causes:

  • Multiple datasets with different transformation logic
  • Lack of certified shared dataset
  • Inconsistent DAX measure definitions

Resolution:

Architecture Overview: 1. Audit Current State:

Issue 3: Slow Dataset Refresh

Symptoms:

  • Dataset refresh takes hours
  • Reports show stale data
  • Capacity overload

Diagnosis:

-- Check dataset refresh duration trend
SELECT 
```text
DatasetName,
RefreshDate,
DATEDIFF(MINUTE, StartTime, EndTime) AS DurationMinutes```
FROM DatasetRefreshLog
WHERE DatasetName = 'Sales_Certified'
  AND RefreshDate >= DATEADD(DAY, -30, GETDATE())
ORDER BY RefreshDate DESC;

Optimization Steps:

  1. Enable incremental refresh on dataset
  2. Use dataflows for heavy transformations (pre-process data)
  3. Optimize DAX measures (avoid expensive iterators)
  4. Use aggregations for large fact tables
  5. Schedule refreshes during off-peak hours

Best Practices Checklist

Best Practices Checklist

Figure: SharePoint document library – metadata columns, views, and filter panel.

☑ Architecture:
  ☐ Use layered dataflow design (Ingestion/Standardization/Business)
  ☐ Minimize computed entities (prefer linked where possible)
  ☐ Create shared datasets for enterprise metrics
  ☐ Certify datasets used by multiple teams
  ☐ Document dependencies (dataflow → dataset → report lineage)





☑ Performance:
  ☐ Implement incremental refresh for large tables (> 1M rows)
  ☐ Use Enhanced Compute Engine for computed entities
  ☐ Optimize Power Query (enable query folding)
  ☐ Schedule refreshes to respect dependencies
  ☐ Monitor refresh duration trends

☑ Governance:
  ☐ Naming convention: DF_<Layer>_<Domain>_<Purpose>
  ☐ Assign dataflow owners
  ☐ Document refresh SLAs
  ☐ Version control dataflow definitions (export JSON)
  ☐ Regular access reviews (quarterly)
  ☐ Implement change management process

☑ Cost Optimization:
  ☐ Use incremental refresh (reduce CU usage)
  ☐ Linked entities over computed (reduce storage)
  ☐ Archive old data (aggregate to summaries)
  ☐ Optimize refresh schedules (avoid over-refreshing)
  ☐ Monitor Premium capacity utilization

☑ Security:
  ☐ Apply RLS at shared dataset level
  ☐ Use service principals for gateway connections
  ☐ Implement least-privilege access
  ☐ Audit data access logs

Architecture Decision and Tradeoffs

When designing business intelligence solutions with Power BI, consider these key architectural trade-offs:

Approach Best For Tradeoff
Managed / platform service Rapid delivery, reduced ops burden Less customisation, potential vendor lock-in
Custom / self-hosted Full control, advanced tuning Higher operational overhead and cost

Recommendation: Start with the managed approach for most workloads and move to custom only when specific requirements demand it.

Validation and Versioning

  • Last validated: April 2026
  • Validate examples against your tenant, region, and SKU constraints before production rollout.
  • Keep module, CLI, and SDK versions pinned in automation pipelines and review quarterly.

Security and Governance Considerations

  • Apply least-privilege access using RBAC roles and just-in-time elevation for admin tasks.
  • Store secrets in managed secret stores and avoid embedding credentials in scripts or source files.
  • Enable audit logging, data protection policies, and periodic access reviews for regulated workloads.

Cost and Performance Notes

  • Define budgets and alerts, then monitor usage and cost trends continuously after go-live.
  • Baseline performance with synthetic and real-user checks before and after major changes.
  • Scale resources with measured thresholds and revisit sizing after usage pattern changes.

Official Microsoft References

  • https://learn.microsoft.com/power-bi/
  • https://learn.microsoft.com/power-bi/guidance/
  • https://learn.microsoft.com/fabric/

Public Examples from Official Sources

  • These examples are sourced from official public Microsoft documentation and sample repositories.
  • Documentation examples: https://learn.microsoft.com/power-bi/
  • Sample repositories: https://github.com/microsoft/PowerBI-Developer-Samples
  • Prefer adapting these examples to your tenant, subscriptions, and governance requirements before production use.

Key Takeaways

  • Layered dataflow architecture (Ingestion/Standardization/Business) eliminates duplication and ensures consistency
  • Linked entities avoid data duplication, while computed entities enable complex transformations and incremental refresh
  • Incremental refresh is critical for large tables to reduce refresh time and capacity costs
  • Shared datasets provide single source of truth for enterprise metrics, with RLS and certification
  • Refresh orchestration must respect dependencies between dataflow layers and datasets
  • Cost optimization focuses on minimizing computed entities, enabling incremental refresh, and optimizing schedules
  • Migration from monolithic datasets requires phased approach: assess, design, build, migrate, monitor
  • Governance includes naming conventions, ownership, documentation, and change management
  • Performance monitoring tracks refresh duration, failure rates, and capacity utilization

Next Steps

  1. Assess current dataset landscape (identify duplication)
  2. Design layered dataflow architecture (Ingestion/Standardization/Business)
  3. Create first ingestion dataflow (start simple, single source system)
  4. Build standardization layer (cleanse and conform dimensions)
  5. Implement business layer (add calculated metrics)
  6. Create first shared dataset (import from business dataflow)
  7. Certify dataset and grant Build permissions
  8. Migrate first report to use shared dataset
  9. Implement refresh orchestration script
  10. Monitor refresh success rates and optimize
  11. Document architecture and governance process
  12. Scale to additional subject areas (Sales, Finance, Operations)

Additional Resources


Reusable. Governed. Scalable.

Discussion