Development Documentation

Data Sources

The Smart Data Platform ingests data from multiple source systems into a Bronze Lakehouse, where it serves as the immutable foundation for all downstream transformations. This page covers each source system, its ingestion method, and the Bronze layer's architecture.

Ingestion Architecture

graph LR
  DV[Dataverse D365] -->|Shortcut| BRZ[Bronze Lakehouse]
  AX[Dynamics AX 2012] -->|Cross-workspace Shortcut| BRZ
  SP[SharePoint Lists] -->|Shortcut| BRZ
  VS[Vesper API] -->|Azure Function| BRZ
  DC[Datacollect Excel] -->|Azure Function| BRZ
  BD[Broker Data] -->|Shortcut| BRZ
  MK[Market Data APIs] -->|Azure Function| BRZ
  MON[dbt Artifacts] -->|Post-build Script| BRZ

Data arrives in Bronze through two primary mechanisms: shortcuts (zero-copy references that incur no ingestion cost) and Azure Functions (scheduled or event-driven code that writes Parquet files to OneLake).

Source Inventory

Source | Owner | Refresh | Ingestion Method | Schema Prefix | Tables
Dynamics AX 2012 | IT (legacy) | Static archive | Cross-workspace shortcut to Bronze_AX | ax | 221 tables
Dataverse (D365) | Business Apps | Real-time sync | Dataverse shortcut | dataverse_d365 | CRM entities
SharePoint | Various departments | Near real-time | SharePoint shortcut | sharepoint_ad | Application data lists
Broker Data (BD) | Trading desk | Daily | Shortcut | bd | Marex, Stonex futures
Vesper API | Data Engineering | Scheduled (daily) | Azure Function | vesper | Futures, spot prices
Datacollect | Data Engineering | On-demand | Azure Function | datacollect | Vendor data, market forms
Market Data | Data Engineering | Scheduled (daily) | Azure Function | market_data | Exchange rates, futures
Monitoring | dbt Pipeline | Post-build | Python script | monitoring | Run results, test results

Why Shortcuts Over ETL?

The platform favors OneLake shortcuts over traditional ETL pipelines for most sources. The reasoning:

  1. Zero maintenance -- Shortcuts are declarative references, not code. There are no transformation jobs to monitor, no retry logic to maintain, no scheduling to configure.
  2. Real-time sync -- Dataverse shortcuts reflect changes within seconds. Traditional ETL would introduce batch latency.
  3. No data duplication -- Shortcuts point to the source data in place, so the Bronze layer incurs no additional storage cost.
  4. Automatic schema evolution -- When source tables add columns, shortcuts pick them up automatically. ETL pipelines would require schema change detection and handling.

Azure Functions are used only for sources that require active ingestion: external APIs (Vesper, market data) and file-based uploads (Datacollect Excel files).
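
The split between passive and active ingestion can be read straight off the Source Inventory table. The sketch below is illustrative only: `SOURCES` and `sources_by_method` are hypothetical names, and the method labels are shorthand for the "Ingestion Method" column, not identifiers used by the platform.

```python
# Hypothetical mapping: schema prefix -> ingestion method,
# transcribed from the Source Inventory table.
SOURCES = {
    "ax": "shortcut",
    "dataverse_d365": "shortcut",
    "sharepoint_ad": "shortcut",
    "bd": "shortcut",
    "vesper": "azure_function",
    "datacollect": "azure_function",
    "market_data": "azure_function",
    "monitoring": "python_script",
}

def sources_by_method(method: str) -> list[str]:
    """Return the schema prefixes ingested through the given method, sorted."""
    return sorted(prefix for prefix, m in SOURCES.items() if m == method)

# The sources that need actively running code rather than a declarative reference.
print(sources_by_method("azure_function"))
```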

Shortcut Lifecycle

Shortcuts follow a Git-managed lifecycle from creation to production deployment:

graph TD
  CREATE[Create shortcut in DEV Fabric UI] --> COMMIT[Commit via Source Control panel]
  COMMIT --> METADATA[shortcuts.metadata.json updated on main branch]
  METADATA --> PR_UAT[PR: main to release/uat]
  PR_UAT --> UAT_SYNC[UAT workspace auto-syncs from Git]
  UAT_SYNC --> PR_PROD[PR: release/uat to release/prod]
  PR_PROD --> PROD_SYNC[PROD workspace auto-syncs from Git]

Key details:

  • workspaces/bronze/Lakehouse_Bronze.Lakehouse/shortcuts.metadata.json is the IaC source of truth for all shortcuts deployed by scripts/deploy_shortcuts.py. The pipeline runs this script on every infra-deploy.
  • For shortcuts whose target item ID differs per environment (e.g. the 7 OneLake shortcuts pointing to the local Application_Data_Lists MirroredDatabase), the manifest uses "oneLakeLookupName": "Application_Data_Lists" instead of a hardcoded item ID. deploy_shortcuts.py resolves the ID at deploy time by listing workspace items, so no per-environment config is needed.
  • The pipeline also calls scripts/deploy_dataverse_shortcuts.py separately for the 96 PROD Dataverse shortcuts managed via deployment/bronze-dataverse-shortcuts.json.
  • The Git-based promotion flow (PR + merge) moves shortcuts across environments.
  • Always use "Update" (pull) before "Commit" (push) in the Fabric UI to avoid merge conflicts.
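
The "oneLakeLookupName" resolution described above could look roughly like the following. This is a minimal sketch, not the contents of deploy_shortcuts.py: the manifest shape (a "target" dict carrying the lookup name), the `resolve_lookup_names` function, and the `id`/`displayName` fields of the workspace listing are all assumptions made for illustration.

```python
import copy

def resolve_lookup_names(shortcuts, workspace_items):
    """Replace each "oneLakeLookupName" with the ID of the matching workspace item.

    shortcuts: list of dicts in an assumed, simplified manifest shape.
    workspace_items: list of {"id": ..., "displayName": ...} dicts (assumed shape).
    """
    ids_by_name = {item["displayName"]: item["id"] for item in workspace_items}
    resolved = []
    for shortcut in shortcuts:
        entry = copy.deepcopy(shortcut)  # leave the manifest itself untouched
        target = entry.get("target", {})
        name = target.pop("oneLakeLookupName", None)
        if name is not None:
            if name not in ids_by_name:
                raise KeyError(f"No workspace item named {name!r}")
            target["itemId"] = ids_by_name[name]
        resolved.append(entry)
    return resolved

# Placeholder values; real item IDs are environment-specific GUIDs.
manifest = [
    {"name": "budget", "path": "Tables/sharepoint",
     "target": {"oneLakeLookupName": "Application_Data_Lists"}},
]
items = [{"id": "dev-item-id", "displayName": "Application_Data_Lists"}]
print(resolve_lookup_names(manifest, items)[0]["target"])
```

Because the lookup runs against whichever workspace is being deployed to, the same manifest entry resolves to a different item ID in DEV, UAT, and PROD.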

Schema Organization

Bronze Lakehouse uses schema-enabled mode. Shortcuts are organized into schemas that map to source systems:

Schema | Source | Contents
dataverse_d365 | Dataverse, PROD only (operations-geris-prod.crm4.dynamics.com) | All D365 F&O business tables used by marts. 96 shortcuts. Canonical source for every downstream dbt model.
dataverse_d365_uat | Dataverse, UAT (operations-geris-uat.crm4.dynamics.com) | Parallel copies of 11 core business tables (salestable, salesline, purchtable, purchline, lgslogisticfiletable, lgslogisticfileline, inventtrans, inventdim, inventbatch, inventtransorigin, prodtable) for UAT-data inspection. Never joined by production dbt models.
sharepoint | SharePoint + local MirroredDatabase | 5 OneDriveSharePoint shortcuts (Stonex/Marex broker data) and 7 OneLake shortcuts pointing to the local Application_Data_Lists MirroredDatabase (budget, KPI, report-owner reference data).
datacollect | Datacollect | Market data collection forms
sharepoint_ad | SharePoint | Legacy application data lists (pre-2026 path)
ax | Dynamics AX 2012 | Cross-workspace shortcut to shared Bronze_AX lakehouse
dbo | Various | Legacy and miscellaneous tables

Environment split: before 2026-04, the 11 UAT-sourced tables sat under dataverse_d365 alongside the PROD shortcuts. The UAT definitions silently won the create race, so DEV Bronze served UAT data for those tables. The split into two schemas makes the source environment explicit and is enforced by scripts/deploy_dataverse_shortcuts.py, which now fails fast on duplicate (path, name) keys. See the Bronze shortcut env-split runbook for the one-time reconciliation procedure.
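
A fail-fast check on duplicate (path, name) keys can be sketched as follows. The function names and the entry shape (dicts with "path" and "name") are assumptions for illustration. How deploy_dataverse_shortcuts.py actually implements the check is not shown on this page.

```python
from collections import Counter

def find_duplicate_keys(shortcuts):
    """Return the (path, name) pairs that appear more than once, sorted."""
    counts = Counter((s["path"], s["name"]) for s in shortcuts)
    return sorted(key for key, n in counts.items() if n > 1)

def assert_unique(shortcuts):
    """Raise before any deployment work if two entries share a (path, name) key."""
    duplicates = find_duplicate_keys(shortcuts)
    if duplicates:
        raise ValueError(f"Duplicate shortcut keys: {duplicates}")

# Two definitions of the same table under one schema: the pre-split failure mode.
entries = [
    {"path": "Tables/dataverse_d365", "name": "salestable"},  # PROD definition
    {"path": "Tables/dataverse_d365", "name": "salestable"},  # UAT definition
    {"path": "Tables/dataverse_d365_uat", "name": "salesline"},
]
print(find_duplicate_keys(entries))
```

Rejecting the manifest up front means the result no longer depends on which definition happens to be created first.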

Bronze_AX: Shared Lakehouse

Bronze_AX is a special case. It is a shared lakehouse containing the AX 2012 archive (221 tables) that all environments reference via cross-workspace shortcuts. Unlike other shortcuts, AX shortcuts:

  • Are identical across DEV, UAT, and PROD (same workspaceId + itemId)
  • Should NOT be included in environment promotion PRs
  • Point to the same physical data regardless of environment

If Bronze_AX needs to move or be recreated, all three environments must be updated simultaneously using the migration script, not through the standard Git promotion flow.
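
Since AX shortcuts must carry the same workspaceId and itemId in every environment, drift between environments is easy to detect. The helper below is a hypothetical sketch, and the IDs are placeholders rather than real values.

```python
def ax_targets_match(env_targets):
    """Return True when every environment's AX shortcut has the same (workspaceId, itemId)."""
    pairs = {(t["workspaceId"], t["itemId"]) for t in env_targets.values()}
    return len(pairs) == 1

# Placeholder IDs: all three environments point at the one shared Bronze_AX lakehouse.
targets = {
    "DEV": {"workspaceId": "shared-ws", "itemId": "bronze-ax"},
    "UAT": {"workspaceId": "shared-ws", "itemId": "bronze-ax"},
    "PROD": {"workspaceId": "shared-ws", "itemId": "bronze-ax"},
}
print(ax_targets_match(targets))
```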

Bronze Immutability

Bronze data is treated as immutable -- the raw record of what was received from source systems. This is enforced through multiple layers:

SQL Write Protection

The security pipeline (security-deploy.yml) deploys DENY permissions that block INSERT, UPDATE, and DELETE operations on all Bronze schemas for non-admin roles. This is automated and deployed on every push to main, release/uat, or release/prod.
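
For illustration, schema-level write protection of this kind can be expressed as one T-SQL DENY statement per schema and role. The generator below is a sketch under assumptions: the `build_deny_statements` helper and the role name are hypothetical, and the statements that security-deploy.yml actually deploys are not shown on this page.

```python
def build_deny_statements(schemas, roles):
    """Build one T-SQL DENY statement per (schema, role) pair.

    Names are interpolated as-is, so they are assumed to come from trusted
    configuration and to contain no "]" characters.
    """
    return [
        f"DENY INSERT, UPDATE, DELETE ON SCHEMA::[{schema}] TO [{role}];"
        for schema in schemas
        for role in roles
    ]

# "bronze_reader" is a hypothetical non-admin role name.
for statement in build_deny_statements(["dataverse_d365", "ax"], ["bronze_reader"]):
    print(statement)
```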

OneLake Soft-Delete

Soft-delete is enabled on all three Bronze lakehouses with a 30-day retention period. Accidentally deleted files are retained and can be recovered via the OneLake File Explorer or REST API. This is a manual Fabric portal configuration (not automatable via Terraform).

Lakehouse | Purpose | Soft-Delete
Lakehouse_Bronze | D365, logistics, BD tables | 30-day retention
Lakehouse_Bronze_AX | AX 2012 historical data (221 tables) | 30-day retention
Lakehouse_Datacollect | Excel files, market data APIs | 30-day retention

AX Archive Immutability

The AX 2012 archive in Azure Blob Storage (gerisdbtartifacts/ax-archive) has a 7-year (2555-day) time-based retention policy. Once locked, blobs cannot be deleted until the retention period expires. This satisfies financial data retention requirements.
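
The 2555-day figure is 7 × 365, which ignores leap days. The sketch below works through the arithmetic, assuming the retention window is counted from a blob's creation date. The function name is hypothetical.

```python
from datetime import date, timedelta

# 7 years expressed as 365-day years, matching the 2555-day policy.
RETENTION_DAYS = 7 * 365

def retention_end(created: date) -> date:
    """Return the date on which the retention window for a blob ends."""
    return created + timedelta(days=RETENTION_DAYS)

print(RETENTION_DAYS)
print(retention_end(date(2024, 1, 1)))
```

For a blob created on 2024-01-01 the window ends on 2030-12-30, two days short of seven calendar years, because the period contains two leap days (2024 and 2028). If the retention requirement is defined in calendar years, that gap is worth checking against it.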

Validation

Run the Bronze immutability validation script to verify all protections are in place:

python scripts/validate_bronze_immutability.py

This checks that OneLake soft-delete is enabled on all Bronze lakehouses, that the SQL DENY permissions are deployed, and that the immutability policy is configured on the ax-archive container.
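
A validation script of this kind typically runs each check independently and reports every failure rather than stopping at the first. The sketch below shows only that aggregation step, with hypothetical check names and hardcoded results. It does not reproduce validate_bronze_immutability.py.

```python
def summarize_checks(results):
    """Turn a mapping of check name -> passed into (exit_code, failed_names)."""
    failed = sorted(name for name, passed in results.items() if not passed)
    exit_code = 1 if failed else 0
    return exit_code, failed

# Hardcoded example results; a real script would query Fabric, SQL, and Azure Storage.
checks = {
    "onelake_soft_delete": True,
    "sql_deny_permissions": True,
    "ax_archive_immutability": False,
}
code, failed = summarize_checks(checks)
print(code, failed)
```

Returning a non-zero exit code on any failure lets a CI pipeline treat a missing protection as a failed build.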
