Modern Azure Lakehouse Architecture – Low-Cost, Scalable Data Engineering
Added Value
The project was a success because it delivered real value to the customer:
- Daily extraction with an overall runtime of < 1.5 hours
- Replacement of SSIS and the on-premises infrastructure, including their maintenance effort
- Running costs of < €250 per month for the entire architecture
- Proactive monitoring in case of pipeline failure
Manufacturing industry: Finance, Production, Sales.
In this project, I built a scalable, modern lakehouse platform orchestrated by Azure Synapse Analytics. Both the Synapse workspace and the Databricks repository are connected to Azure DevOps, which made migrating between two subscriptions easy. For disaster recovery and security, we stored all credentials (except managed identities, which need no stored secret) in Azure Key Vault.
Learnings
During this project, I ran into several issues that required a deep dive into Azure Synapse Analytics:
- SAP Table vs. SAP CDC connectors
- Migrating from a managed subscription to a customer-owned subscription, including:
  - the Synapse workspace
  - the serverless SQL pool database
  - all related resources
- Extracting data from SharePoint Online lists across sites and sub-sites
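Extracting list data from a whole site tree mostly comes down to visiting every sub-site recursively. The following Python sketch shows one way to do that; it is not the project's implementation. It assumes the SharePoint REST endpoints `/_api/web/webs` and `/_api/web/lists/getbytitle('<title>')/items`, JSON responses with the results under `value` (as returned with `Accept: application/json;odata=nometadata`), and a hypothetical `get_json` callable that handles HTTP and authentication.

```python
"""Hypothetical sketch: collect SharePoint Online list items from a site and all sub-sites.

Assumptions: the REST endpoints /_api/web/webs and
/_api/web/lists/getbytitle('<title>')/items, nometadata JSON with results under
"value", and an injected get_json(url) callable that performs the authenticated GET.
"""
from typing import Callable, Dict, List
from urllib.parse import quote

GetJson = Callable[[str], dict]


def collect_list_items(site_url: str, list_title: str, get_json: GetJson) -> Dict[str, List[dict]]:
    """Return {site_url: [items]} for the start site and every sub-site below it."""
    results: Dict[str, List[dict]] = {}
    # Escape single quotes for the OData string literal, then URL-encode the title.
    encoded_title = quote(list_title.replace("'", "''"))
    stack = [site_url.rstrip("/")]
    while stack:
        url = stack.pop()
        # Items of the list on the current site (paging is not handled in this sketch).
        items_url = f"{url}/_api/web/lists/getbytitle('{encoded_title}')/items"
        results[url] = get_json(items_url).get("value", [])
        # Direct sub-sites; pushing them onto the stack also visits their own sub-sites.
        for web in get_json(f"{url}/_api/web/webs").get("value", []):
            stack.append(web["Url"].rstrip("/"))
    return results
```

Injecting `get_json` keeps authentication (for example, a token whose client secret lives in Key Vault) separate from the traversal logic and makes the walk easy to test against stubbed responses. Paging and lists that do not exist on every sub-site are not handled here.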
Reducing Complexity in a Power BI Pro Environment
Before: Multiple Power BI dataflows, redundant and complex transformations, and maintenance overhead for duplicate datasets
Before our assessment, a single person maintained the implementation, which was quite complex. Whenever a refresh failed or the on-premises data gateway had issues, the current day's reporting was out of date. This led to misinformation and mistrust at C-level management.
- Poorly designed dataflows with redundant transformations
- Multiple datasets for different access levels
- Complex refresh planning
After: Fewer Power BI dataflows and datasets, leading to easier maintenance, quicker refreshes and fewer transformations.
After analyzing the required data sources and relevant transformations, we reduced the number of dataflows, which improved refresh cycles and lowered complexity across the different workspaces. As a result, the environment became easier to maintain, and the overhead for the Power BI developer dropped.
- Fewer dataflows and reuse of transformations
- Introducing dynamic row-level security (RLS) reduced the number of datasets
- Refresh planning adjusted to enhance daily reporting
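To illustrate why dynamic RLS removes the need for one dataset per access level, the following Python model reproduces the filter rule that a hierarchy-based role typically applies. In Power BI itself the rule would be written in DAX, commonly with `USERPRINCIPALNAME()`, `PATH()` and `PATHCONTAINS()`; the hierarchy, the security table and all names below are hypothetical.

```python
"""Hypothetical model of hierarchy-based dynamic RLS (illustration, not Power BI code).

A user may see a row when the unit assigned to them in the security table is the
row's unit or one of its ancestors in the parent-child hierarchy.
"""
from typing import Dict, List, Optional

# Hypothetical parent-child hierarchy: unit -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Group": None,
    "Sales": "Group",
    "Sales-DE": "Sales",
    "Sales-FR": "Sales",
    "Production": "Group",
}

# Hypothetical security table: user principal name -> assigned unit.
USER_UNIT: Dict[str, str] = {
    "ceo@example.com": "Group",
    "sales.lead@example.com": "Sales",
    "de.manager@example.com": "Sales-DE",
}


def path(unit: str) -> List[str]:
    """Return the chain from the root down to `unit` (similar to DAX PATH())."""
    chain: List[str] = []
    current: Optional[str] = unit
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return list(reversed(chain))


def visible_rows(upn: str, rows: List[dict]) -> List[dict]:
    """Keep rows whose unit path contains the user's unit (similar to PATHCONTAINS())."""
    unit = USER_UNIT.get(upn)
    if unit is None:
        return []  # users missing from the security table see nothing
    return [row for row in rows if unit in path(row["unit"])]
```

A single role definition serves every level: the CEO sees all rows, the sales lead sees both sales units, and the German sales manager sees only `Sales-DE`. Adding a user is a change to the security table, not a new dataset.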
Added Value
- Reduction of maintenance overhead
- Improved stability
- Reduction of data redundancy
Learnings
- Enhancing Power BI Pro dataflows to mimic Premium capabilities
- Dynamic RLS for different hierarchy levels
- REST API access and authentication
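For REST API access, one common pattern is a service principal that obtains a token via the OAuth 2.0 client-credentials flow and then calls the Power BI REST API, for example to trigger a dataset refresh. The sketch below only builds the two requests with the Python standard library and does not send them; the tenant, client, workspace and dataset IDs are placeholders, and whether the project used this exact flow is an assumption.

```python
"""Hypothetical sketch: service-principal authentication and a Power BI dataset refresh.

Assumptions: an app registration with access to the workspace, the Microsoft
identity platform v2.0 token endpoint, and the "Refresh Dataset" endpoint.
The requests are only constructed here; nothing is sent over the network.
"""
import json
from urllib.parse import urlencode
from urllib.request import Request

POWER_BI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"
POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"


def token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build the client-credentials token request; its JSON response holds 'access_token'."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,  # e.g. read from Azure Key Vault, never hard-coded
        "scope": POWER_BI_SCOPE,
    }).encode()
    return Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def dataset_refresh_request(access_token: str, group_id: str, dataset_id: str) -> Request:
    """Build POST .../groups/{groupId}/datasets/{datasetId}/refreshes with a bearer token."""
    return Request(
        f"{POWER_BI_API}/groups/{group_id}/datasets/{dataset_id}/refreshes",
        data=json.dumps({"notifyOption": "NoNotification"}).encode(),
        headers={"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the first request with `urllib.request.urlopen` and passing the returned `access_token` to the second would complete the flow. The refresh body requests `NoNotification`, so failure alerting has to be handled elsewhere.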