Technology Stack
This page documents every major technology in the platform, why it was chosen, and the constraints you need to know when working with it.
Core Technologies
| Technology | Version / Constraint | Purpose | Auth Method |
|---|---|---|---|
| dbt Core | Latest (pip) | Data transformation framework. Dual-target: DuckDB local, Fabric Warehouse remote. | CLI (az login) |
| Python | 3.10 -- 3.12 only | All scripts, Azure Functions, dbt adapters. 3.13 is incompatible with dbt-fabric. | -- |
| Terraform | >= 1.8 (with Fabric provider) | Infrastructure provisioning: workspaces, warehouses, lakehouses, shortcuts, role assignments. | CLI (FABRIC_USE_CLI=true) |
| Microsoft Fabric | Cloud service | Lakehouse (Bronze), Warehouse (Gold), Semantic Models (DirectLake), Power BI Reports, OneLake. | SPN + CLI |
| Azure DevOps | Cloud service | Git hosting, CI/CD pipelines, service connections. Org: geris-devops, project: insights-requests. | Service connection |
| Azure Functions | Python runtime (Linux) | Data ingestion (Datacollect, broker tables), observability (pipeline metrics, CU monitoring), exports. | Managed Identity |
| Azure Key Vault | kv-fabric-dbt-keys | Single vault for all environment secrets. SPN credentials and connection strings. | SPN Get + List |
| SWA Managed Functions (Node 20 + TypeScript) | v4 programming model (@azure/functions) | Cloud-side portal API: reads feature envs, triggers actions. Authenticates to Fabric via platform-admin SPN read from Key Vault via system-assigned MI. | System-assigned MI → Key Vault → MSAL |
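
The Python constraint in the table above can be checked at script start-up so that an unsupported interpreter fails with a clear message. The following is a minimal sketch, not the project's actual guard; the function names are hypothetical.

```python
import sys

# Supported range taken from the table above: 3.10 through 3.12 inclusive.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 12)


def is_supported_python(version=None):
    """Return True if the (major, minor) version falls in the supported range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED


def require_supported_python():
    """Exit with a readable message instead of failing later inside pip or dbt."""
    if not is_supported_python():
        found = ".".join(str(part) for part in sys.version_info[:3])
        sys.exit(f"Python {found} is not supported; use 3.10 to 3.12 (dbt-fabric constraint).")
```

Calling require_supported_python() as the first statement of a script is one way to surface the problem before any dependency is imported.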
Key Libraries
| Library | Install Method | Purpose |
|---|---|---|
| dbt-fabric | pip | Fabric Warehouse adapter for dbt. Uses CLI auth (not SPN -- ODBC timeouts in CI). |
| dbt-duckdb | pip | Local development adapter. Fast iteration without Fabric connection. |
| dbt_utils | packages.yml (dbt Hub) | Utility macros. The ONLY dbt Hub package -- do not add pip packages here. |
| fabric-cicd | pip | Deploys semantic models and reports from git to Fabric with parameter substitution. |
| pyodbc | pip | Direct Fabric Warehouse connections. Uses access token auth (attrs_before=\{1256: token_struct\}). |
| azure-identity | pip | DefaultAzureCredential for Azure Functions, AzureCliCredential for local scripts. |
| azure-keyvault-secrets | pip | Key Vault secret retrieval in deployment scripts. |
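
The attrs_before entry in the pyodbc row refers to the ODBC pre-connect attribute 1256 (SQL_COPT_SS_ACCESS_TOKEN), which takes the token as a length-prefixed UTF-16-LE byte string. The sketch below shows one common way to build that structure. The helper names are hypothetical, and the connection string is assumed to contain no UID, PWD, or Authentication keywords, since those conflict with token auth.

```python
import struct

# ODBC connection attribute for passing a pre-acquired access token
# (SQL_COPT_SS_ACCESS_TOKEN in the Microsoft ODBC driver headers).
SQL_COPT_SS_ACCESS_TOKEN = 1256


def build_token_struct(access_token: str) -> bytes:
    """Pack the token as a 4-byte little-endian length followed by UTF-16-LE bytes."""
    token_bytes = access_token.encode("utf-16-le")
    return struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)


def connect_with_token(connection_string: str, access_token: str):
    """Open a pyodbc connection using token auth. Requires pyodbc and the ODBC driver."""
    import pyodbc  # imported lazily so the packing helper works without the driver

    return pyodbc.connect(
        connection_string,
        attrs_before={SQL_COPT_SS_ACCESS_TOKEN: build_token_struct(access_token)},
    )
```

Keeping the packing step in its own function makes it testable without a database or driver installed.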
Key Architectural Decisions
Why dbt (not Azure Data Factory)?
dbt provides version-controlled SQL transformations with built-in testing, documentation, and lineage tracking. ADF would require managing JSON pipeline definitions with limited testability and no local development story. With dbt, developers iterate on model logic in seconds against DuckDB, run the full test suite locally, and only deploy to Fabric when ready.
Why DuckDB for Local Development (not Docker Fabric)?
There is no Docker image for Microsoft Fabric Warehouse. The alternative -- always developing online against the DEV Fabric Warehouse -- was rejected because it is slow (minutes per build vs seconds), blocks on network availability, and consumes shared Fabric CU capacity.
DuckDB provides a fast, zero-dependency local environment. The cost is maintaining dual-dialect SQL:
| Feature | DuckDB | Fabric Warehouse (T-SQL) |
|---|---|---|
| Case sensitivity | Case-insensitive | Case-sensitive for quoted identifiers |
| datetime2 | Use \{\{ cast_timestamp() \}\} macro | Requires explicit datetime2(6) |
| bit (boolean) | Use \{\{ cast_boolean() \}\} macro | Resolves to bit |
| lpad() | Supported | Not available -- use right('00' \|\| cast(...), n) |
| Recursive CTEs | WITH RECURSIVE required | WITH (implicit recursion) |
| Bracket identifiers | Not supported | [column] works |
| varchar default | Unlimited | cast(x as varchar) defaults to varchar(30) -- silently truncates! Always specify length. |
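
The project resolves these differences in Jinja macros. The Python sketch below only illustrates the underlying idea, a lookup keyed on the dbt target type. The DuckDB type names and the target type strings ("duckdb", "fabric") are assumptions, and none of these function names exist in the project.

```python
# Hypothetical lookup mirroring the table above; the real logic lives in Jinja macros.
DIALECT = {
    "duckdb": {
        "timestamp": "timestamp",       # assumed DuckDB-side type
        "boolean": "boolean",           # assumed DuckDB-side type
        "recursive_cte": "with recursive",
    },
    "fabric": {
        "timestamp": "datetime2(6)",    # explicit precision is required
        "boolean": "bit",
        "recursive_cte": "with",
    },
}


def sql_type(logical_type: str, target_type: str) -> str:
    """Return the dialect-specific type name for a logical type."""
    return DIALECT[target_type][logical_type]


def cast_expr(column: str, logical_type: str, target_type: str) -> str:
    """Render a cast expression for the given target."""
    return f"cast({column} as {sql_type(logical_type, target_type)})"


def recursive_cte_keyword(target_type: str) -> str:
    """Return the keyword that opens a recursive CTE on the given target."""
    return DIALECT[target_type]["recursive_cte"]
```

For example, cast_expr("loaded_at", "timestamp", "fabric") renders a cast to datetime2(6), while the same call with "duckdb" renders a cast to timestamp.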
Why Terraform (not Bicep)?
Terraform has a first-party Microsoft Fabric provider that supports workspaces, warehouses, lakehouses, git connections, and role assignments. Bicep has no Fabric resource types -- it only covers ARM resources. Since the platform's infrastructure is primarily Fabric resources (not ARM), Terraform is the natural fit.
Why CLI Auth (not SPN in CI)?
ODBC Driver 18's ActiveDirectoryServicePrincipal authentication consistently times out on Azure DevOps Ubuntu hosted agents due to a libmsal library issue. The az account get-access-token workaround is reliable on the same agents. This means:
- All dbt profiles.yml targets use authentication: CLI
- CI pipeline dbt steps must run inside AzureCLI@2 tasks
- Local development uses az login sessions
- No SPN environment variables are needed locally
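
For scripts that need a token outside dbt, the az account get-access-token call mentioned above can be wrapped as follows. This is a sketch under assumptions: the resource URI is a guess at the SQL endpoint audience and should be confirmed for your environment, and the function names are hypothetical. The runner is injectable so the wrapper can be exercised without an az login session.

```python
import subprocess

# Assumed resource URI for SQL endpoints; confirm against your Fabric setup.
SQL_RESOURCE = "https://database.windows.net/"


def token_command(resource: str = SQL_RESOURCE) -> list:
    """Build the az CLI invocation that prints only the access token."""
    return [
        "az", "account", "get-access-token",
        "--resource", resource,
        "--query", "accessToken",
        "--output", "tsv",
    ]


def get_cli_access_token(resource: str = SQL_RESOURCE, run=subprocess.run) -> str:
    """Return a token from the current az login session, or raise if none is available."""
    result = run(token_command(resource), capture_output=True, text=True, check=True)
    token = result.stdout.strip()
    if not token:
        raise RuntimeError("az returned an empty token; is an az login session active?")
    return token
```

In CI this only works inside a step that already has an authenticated az session, which is why the dbt steps run inside AzureCLI@2 tasks.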
Why Cherry-Pick Promotion (not Branch Merges)?
Changes flow from main to release/uat to release/prod via cherry-pick PRs, not full merges. This gives granular control: a critical bug fix can be promoted to PROD immediately without carrying along an unfinished feature. UAT requires 1 reviewer; PROD requires 2 reviewers plus an approval gate.
MCP Servers (AI Tool Integration)
The project includes 9 MCP servers configured in .mcp.json at the repo root, providing AI tools with direct access to platform knowledge and operations.
Documentation and Knowledge Servers
| Server | Transport | Auth | Capabilities |
|---|---|---|---|
| Microsoft Learn (microsoft-learn) | Remote HTTP | None | Semantic search across all Microsoft docs, code samples |
| Fabric Pro-Dev (fabric-prodev) | npx (stdio) | None | Full Fabric OpenAPI specs, JSON schemas, best practices. Knowledge only -- no live Fabric connection. |
| Terraform (terraform) | Docker (stdio) | None | Live Terraform Registry: provider docs, module search, config validation |
Data Platform Operations Servers
| Server | Transport | Auth | Capabilities |
|---|---|---|---|
| Power BI Modeling (powerbi-modeling) | npx (stdio) | Browser login | TMDL import/export, DAX queries, measures, relationships, calculation groups |
| Fabric Ops (fabric-ops) | uvx (stdio) | az login | Read-only operational intel: workspace listing, lakehouse schemas, lineage, CU usage |
| DuckDB (duckdb) | uvx (stdio) | None | SQL queries against local ./dbt/fabric_datalake.duckdb |
| dbt Core (dbt-core) | uvx (stdio) | None | Lineage, impact analysis, column-level tracing, SQL execution with ref()/source() |
Infrastructure and DevOps Servers
| Server | Transport | Auth | Capabilities |
|---|---|---|---|
| Azure DevOps (azure-devops) | npx (stdio) | Browser login | Work items, pipelines, builds, PRs, repos, wiki for geris-devops org |
| Azure (azure) | npx (stdio) | az login | 276 tools across 57 Azure services including Key Vault, Storage, ARM |
What We Deliberately Do NOT Use
These are not oversights -- they are conscious decisions with specific reasoning.
| Technology | Why Not |
|---|---|
| dbt-fabric / dbt-duckdb in packages.yml | They are pip packages, not dbt Hub packages. Adding them to packages.yml breaks dbt deps. |
| ODBC SPN auth in CI | ActiveDirectoryServicePrincipal times out on Azure DevOps Ubuntu agents. CLI auth is reliable. |
| Per-environment Key Vaults | Unnecessary complexity for this project's scale. One kv-fabric-dbt-keys for all environments. |
| Azure DevOps variable groups | All config lives in deployment/ENV.yml -- single source of truth, version-controlled, auditable. |
| PySpark notebooks | Legacy datalake/ notebooks are reference-only. All new transforms are dbt SQL. |
| ADF / Synapse pipelines | dbt provides better testability, version control, and local development. |
| Bicep | No Fabric resource types. Terraform's Fabric provider covers the full infrastructure. |
| Docker for local dev | No Fabric Warehouse Docker image exists. DuckDB is faster and simpler. |
Version Constraints
These constraints have caused production issues in the past and must be respected:
| Constraint | Detail | Consequence of Violation |
|---|---|---|
| Python 3.10 -- 3.12 | dbt-fabric is incompatible with 3.13 | pip install dbt-fabric fails on 3.13 |
| datetime2(6) explicit precision | Fabric Warehouse requires the (6) suffix | Bare datetime2 fails with error 24597 |
| No lpad() in T-SQL | Not available in Fabric SQL | Use right('00' \|\| cast(value as varchar), n) |
| varchar needs explicit length | cast(x as varchar) defaults to varchar(30) in T-SQL | Silent data truncation, collapsed rows after GROUP BY |
| Case-sensitive quoted identifiers | Fabric Warehouse is case-sensitive; DuckDB is not | Works locally, fails in CI/DEV. Invisible bug. |
| WITH RECURSIVE vs WITH | DuckDB requires RECURSIVE keyword; T-SQL does not | Use Jinja target checks for recursive CTEs |
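
The varchar(30) row is the least obvious of these, so here is a small simulation of how truncation collapses rows after GROUP BY. It is plain Python rather than SQL, the sample values are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Length applied by cast(x as varchar) without an explicit size, per the table above.
T_SQL_DEFAULT_CAST_LENGTH = 30


def cast_varchar(value: str, length: int = T_SQL_DEFAULT_CAST_LENGTH) -> str:
    """Mimic silent truncation: keep only the first `length` characters."""
    return value[:length]


def group_counts(values, length: int = T_SQL_DEFAULT_CAST_LENGTH) -> Counter:
    """Count rows per key after casting, like GROUP BY cast(col as varchar(length))."""
    return Counter(cast_varchar(value, length) for value in values)


# Invented sample keys that differ only after the 30th character.
rows = [
    "customer-segment-enterprise-emea-north",
    "customer-segment-enterprise-emea-south",
]

truncated_groups = group_counts(rows)        # default length of 30
explicit_groups = group_counts(rows, 100)    # explicit, sufficient length
```

With the default length both keys share the same first 30 characters and fall into one group; with an explicit length of 100 they stay distinct. DuckDB does not truncate, so the same query returns two groups locally and one in Fabric.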