Date: September 1, 2025

@author: gggordon

This project demonstrates practical, end-to-end use of core Azure Data Engineering services by integrating batch and streaming data pipelines. It improves visibility into daily product category sales, enables faster responses to sales events through real-time processing, and enhances data accuracy using Cosmos DB and MDM. Additionally, it supports cost optimization through effective monitoring and orchestration, showcasing a cloud-native approach to building scalable, insights-driven data solutions.

This demo shows how AdventureWorks, a traditional relational dataset, can be modernized into a 360° analytics platform on Azure. By combining:

  • Data Factory for orchestration,
  • Data Lake for raw/curated storage,
  • Stream Analytics + Event Hub for real-time ingestion,
  • SQL & Synapse for analytics,
  • Cosmos DB for flexible JSON views,
  • Cost Analysis for governance,

the demo achieves a scalable, flexible architecture that addresses both batch reporting and real-time analytics use cases.


Project Goals

  • Increase visibility into enterprise data by unifying different data sources.
  • Decrease latency in reporting by introducing real-time streaming pipelines.
  • Maximize data quality and governance with master data management.
  • Enable flexible, scalable analytics across both structured (SQL) and semi-structured (JSON) data.
  • Provide a cost-conscious architecture with clear monitoring of usage and spend.

Architecture Overview

AdventureWorks360 – Unified Analytics Platform on Azure – Architectural Overview


Resource Group Overview

Azure Resource Group

Data Ingestion

AdventureWorks sales data is ingested using Azure Data Factory pipelines.

  • SQL → Blob
    ADF SQL to Blob
  • Blob → Data Lake Gen2
    ADF Blob to ADLS
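As a sketch of the first copy activity, the snippet below mimics an ADF SQL → Blob copy using an in-memory SQLite table as a stand-in for the source database; the `Sales` table and its columns are illustrative, not the actual AdventureWorks schema.

```python
import csv
import io
import sqlite3

def extract_sales_to_csv(conn: sqlite3.Connection) -> str:
    """Mimic an ADF SQL -> Blob copy: query a table and serialize rows to CSV."""
    cur = conn.execute("SELECT OrderID, Category, Amount FROM Sales ORDER BY OrderID")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([d[0] for d in cur.description])  # header row from cursor metadata
    writer.writerows(cur.fetchall())
    return buf.getvalue()

# Stand-in source database (in the demo this is the AdventureWorks SQL source).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (OrderID INTEGER, Category TEXT, Amount REAL)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)",
                 [(1, "Bikes", 1200.0), (2, "Helmets", 45.5)])
csv_blob = extract_sales_to_csv(conn)
print(csv_blob)
```

In ADF the same movement is configured declaratively as a Copy Activity between linked services; the second hop (Blob → ADLS Gen2) is another copy with the CSV landing unchanged in the raw zone.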

Data Lake Structure

  • Raw Layer
    ADLS Raw
  • Curated Layer
    ADLS Curated
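The raw → curated promotion can be sketched as a small cleansing step (column names are assumptions, not the actual schema): rows are type-cast, and records with a missing category or an unparseable amount are dropped before landing in the curated layer.

```python
import csv
import io

RAW_CSV = """OrderID,Category,Amount
1,Bikes,1200.0
2,,45.5
3,Helmets,not-a-number
4,Accessories,19.99
"""

def curate(raw_text: str) -> list[dict]:
    """Raw -> curated: parse types, drop rows with a missing category or bad amount."""
    curated = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        if not row["Category"]:
            continue  # reject: category is required downstream
        try:
            amount = float(row["Amount"])
        except ValueError:
            continue  # reject: amount must be numeric
        curated.append({"order_id": int(row["OrderID"]),
                        "category": row["Category"],
                        "amount": amount})
    return curated

print(curate(RAW_CSV))
```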

Master Data Management

Product categories are consolidated in an Azure SQL Database:
MDM Product Category
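A minimal sketch of the consolidation idea, with made-up aliases: raw category labels arriving from different source systems are mapped onto one canonical master record each.

```python
# MDM sketch: known aliases from source systems map to one canonical name.
# The alias table here is illustrative, not the actual master data.
CANONICAL = {"bikes": "Bikes", "road bikes": "Bikes",
             "accessories": "Accessories", "helmets": "Accessories"}

def to_master(raw_category: str) -> str:
    """Map a raw category label to its canonical MDM name ('Unmapped' if unknown)."""
    return CANONICAL.get(raw_category.strip().lower(), "Unmapped")

source_a = ["Bikes", "Helmets"]
source_b = ["road bikes", "Gloves"]
master = [to_master(c) for c in source_a + source_b]
print(master)  # ['Bikes', 'Accessories', 'Bikes', 'Unmapped']
```

"Unmapped" labels would surface for stewardship review rather than silently flowing into reports.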

Streaming Analytics

  • Streamed data from Event Hubs:
    Event Hub Resource Usage
    Event Hub Explorer
  • Stream Processing with Azure Stream Analytics:
    ASA Query
    ASA Resource Usage
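As a rough illustration of the tumbling-window pattern such ASA queries typically use, the sketch below buckets events into fixed 60-second windows and sums amounts per category (event shape and window size are assumptions):

```python
from collections import defaultdict

def tumbling_window_sum(events, window_seconds=60):
    """Mimic an ASA TumblingWindow aggregate: sum Amount per (window, category).

    Each event is (epoch_seconds, category, amount).
    """
    totals = defaultdict(float)
    for ts, category, amount in events:
        window_start = ts - (ts % window_seconds)  # fixed, non-overlapping windows
        totals[(window_start, category)] += amount
    return dict(totals)

events = [
    (5,  "Bikes", 100.0),
    (42, "Bikes", 50.0),
    (61, "Helmets", 20.0),  # falls into the next 60-second window
]
print(tumbling_window_sum(events))
```

In the real pipeline this aggregation runs continuously in Stream Analytics over the Event Hub input, rather than over a finished list.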

Analytics & Reporting

  • Sales by category/day in Azure Synapse Analytics:
    Synapse View
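The per-category, per-day aggregate behind such a view can be sketched in plain Python (the row shape is an assumption, not the actual view definition):

```python
from collections import defaultdict
from datetime import date

# Illustrative order rows: (order date, category, amount).
orders = [
    (date(2025, 9, 1), "Bikes", 1200.0),
    (date(2025, 9, 1), "Bikes", 300.0),
    (date(2025, 9, 1), "Helmets", 45.5),
    (date(2025, 9, 2), "Bikes", 150.0),
]

# Equivalent of GROUP BY order_date, category with SUM(amount).
sales_by_category_day = defaultdict(float)
for day, category, amount in orders:
    sales_by_category_day[(day, category)] += amount

print(dict(sales_by_category_day))
```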

Cosmos DB

Consolidated JSON view for customer analytics:
Cosmos DB Explorer
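A sketch of how such a consolidated document might be assembled; the field names and the `/customer_id` partition key are assumptions, though Cosmos DB does require every document to carry an `id` field.

```python
import json

# Hypothetical relational fragments for one customer (not the real schema).
customer = {"customer_id": "C001", "name": "Ada Rivera"}
orders = [{"order_id": 1, "category": "Bikes", "amount": 1200.0},
          {"order_id": 4, "category": "Accessories", "amount": 19.99}]

# Consolidate into one JSON document as it would be stored in a Cosmos DB container.
doc = {
    "id": customer["customer_id"],           # Cosmos DB requires an 'id' field
    "customer_id": customer["customer_id"],  # assumed partition key
    "name": customer["name"],
    "orders": orders,                        # orders embedded, not joined at read time
    "lifetime_value": sum(o["amount"] for o in orders),
}
print(json.dumps(doc, indent=2))
```

Embedding the orders denormalizes the relational model so a single point read serves the customer-360 view.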

Cost Analysis

Monitoring spend:
Cost Analysis

NB: Although the project itself did not require this much runtime, the view shows how quickly idle or forgotten resources can accumulate cost, and why activities and spend should be monitored. Note also that this view does not include alerts or scheduled activities, both of which can be used to monitor and manage costs.

A Few Additional Notes for Real-World Deployments

  • Security & Authentication: Use Managed Identities and Role-Based Access Control instead of SAS tokens in production.
  • Data Governance: Establish naming conventions, metadata tracking, and data lineage for the Data Lake.
  • Performance Tuning: Scale Stream Analytics and Synapse queries based on data volume.
  • Cost Management: Regularly review Azure Cost Analysis to avoid unused resources consuming budget.
  • Monitoring & Alerting: Implement Azure Monitor and Log Analytics for proactive monitoring of data pipelines and jobs.