Modern Data Engineering with Medallion Architecture

Project Overview

This project builds an end-to-end data engineering pipeline on the Azure cloud platform using Apache Spark, Azure Databricks, and dbt (data build tool). Following the Medallion Architecture, the pipeline covers data ingestion, integration, and transformation, refining data through Bronze, Silver, and Gold layers to prepare it for advanced analytics.

Architecture

Components

Apache Spark: Distributed engine used for large-scale data processing.
Azure Databricks: Managed, Spark-based analytics platform.
dbt (data build tool): Used for data modeling and transformations within the data lakehouse.
Azure Data Factory: Manages the pipelines for data integration and transformation.

Data Layers

Bronze: Raw data ingestion and storage.
Silver: Data cleaning and enrichment.
Gold: Aggregated data optimized for business intelligence.
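
The three layers can be sketched in a few lines of code. The example below is a minimal illustration in plain Python, standing in for the Spark DataFrames the project would use. The order records, field names, and the `to_silver` and `to_gold` helpers are hypothetical and do not come from this repository.

```python
from collections import defaultdict

# Hypothetical raw order events, standing in for data landed in the Bronze layer.
BRONZE_ORDERS = [
    {"order_id": "1", "region": " east ", "amount": "120.50"},
    {"order_id": "2", "region": "West", "amount": "80.00"},
    {"order_id": "2", "region": "West", "amount": "80.00"},  # duplicate row
    {"order_id": "3", "region": "east", "amount": None},      # missing amount
    {"order_id": "4", "region": "EAST", "amount": "19.50"},
]


def to_silver(bronze_rows):
    """Clean Bronze rows: drop incomplete records, deduplicate, normalize types."""
    seen = set()
    silver = []
    for row in bronze_rows:
        if row["amount"] is None:
            continue  # skip records that cannot be used downstream
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return silver


def to_gold(silver_rows):
    """Aggregate Silver rows into revenue per region for reporting."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver_orders = to_silver(BRONZE_ORDERS)
gold_revenue = to_gold(silver_orders)
print(gold_revenue)  # {'east': 140.0, 'west': 80.0}
```

Each layer reads only from the one before it, so the raw Bronze records stay untouched and the Silver and Gold outputs can be rebuilt at any time.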

Workflow Commands

dbt run             # Run transformation models
dbt test            # Execute data tests
dbt snapshot        # Capture snapshots to track slowly changing dimensions
dbt docs generate   # Generate project documentation
dbt docs serve      # Serve documentation locally
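
For scheduled or scripted runs, the commands above can be chained from a small wrapper. The sketch below assumes they are meant to run in the listed order and that `dbt` is on the `PATH`. The `build_dbt_commands` and `run_workflow` helpers are hypothetical names, not part of dbt or of this repository.

```python
import subprocess

# Order mirrors the command list above. `dbt docs serve` is left out because it
# starts a local web server and blocks until interrupted.
DBT_STEPS = ["run", "test", "snapshot", "docs generate"]


def build_dbt_commands(steps=DBT_STEPS):
    """Turn each step into an argv list suitable for subprocess.run."""
    return [["dbt", *step.split()] for step in steps]


def run_workflow(commands, runner=subprocess.run):
    """Run commands in order; with check=True the first failure raises and stops."""
    for argv in commands:
        runner(argv, check=True)


# Dry run: record the commands instead of executing them.
planned = []
run_workflow(build_dbt_commands(), runner=lambda argv, check: planned.append(argv))
print(planned)
```

Calling `run_workflow(build_dbt_commands())` from the dbt project directory would execute the steps for real. Because `check=True` raises on a non-zero exit code, a failed `dbt test` prevents the later steps from running.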
