
Commit 4a11742

Merge pull request #204 from databrickslabs/Issue_22
Issue 22: DAB's DEMO
2 parents 31daa0b + 96a922e commit 4a11742

13 files changed: +1732 −6 lines

demo/README.md

Lines changed: 81 additions & 6 deletions
@@ -4,10 +4,9 @@
3. [Append FLOW Autoloader Demo](#append-flow-autoloader-file-metadata-demo): Write to same target from multiple sources using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) and adding [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
4. [Append FLOW Eventhub Demo](#append-flow-eventhub-demo): Write to same target from multiple sources using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) and adding [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
5. [Silver Fanout Demo](#silver-fanout-demo): This demo showcases the implementation of fanout architecture in the silver layer.
-6. [Apply Changes From Snapshot Demo](#Apply-changes-from-snapshot-demo): This demo showcases the implementation of ingesting from snapshots in bronze layer
-7. [Lakeflow Declarative Pipelines Sink Demo](#dlt-sink-demo): This demo showcases the implementation of write to external sinks like delta and kafka
-The source argument is optional for the demos.
+6. [Apply Changes From Snapshot Demo](#apply-changes-from-snapshot-demo): This demo showcases the implementation of ingesting from snapshots in the bronze layer
+7. [Lakeflow Declarative Pipelines Sink Demo](#lakeflow-declarative-pipelines-sink-demo): This demo showcases the implementation of writing to external sinks like Delta and Kafka
+8. [DAB Demo](#dab-demo): This demo showcases how to use Databricks Asset Bundles with dlt-meta


# DAIS 2023 DEMO
@@ -224,7 +223,6 @@ This demo will perform following tasks:

![silver_fanout_dlt.png](../docs/static/images/silver_fanout_dlt.png)

# Apply Changes From Snapshot Demo
- This demo will perform following steps
  - Showcase onboarding process for apply changes from snapshot pattern([snapshot-onboarding.template](https://github.com/databrickslabs/dlt-meta/blob/main/demo/conf/snapshot-onboarding.template))
@@ -311,4 +309,81 @@ This demo will perform following tasks:
![dlt_demo_sink.png](../docs/static/images/dlt_demo_sink.png)
![dlt_delta_sink.png](../docs/static/images/dlt_delta_sink.png)
![dlt_kafka_sink.png](../docs/static/images/dlt_kafka_sink.png)


# DAB Demo

## Overview
This demo showcases how to use Databricks Asset Bundles (DABs) with DLT-Meta:
* This demo will perform the following steps:
  * Create dlt-meta schemas for dataflowspec and the bronze/silver layers
  * Upload necessary resources to a Unity Catalog volume
  * Create DAB files with catalog, schema, and file locations populated
  * Deploy the DAB to a Databricks workspace
  * Run onboarding using DAB commands
  * Run Bronze/Silver pipelines using DAB commands
  * Demo examples will showcase the fan-out pattern in the silver layer
  * Demo examples will showcase custom transformations for the bronze/silver layers
  * Adding custom columns and metadata to Bronze tables
  * Implementing SCD Type 1 on Silver tables
  * Applying expectations to filter data in Silver tables

## Prerequisites
1. Launch a command prompt / terminal

2. Install the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```commandline
   git clone https://github.com/databrickslabs/dlt-meta.git
   ```

4. ```commandline
   cd dlt-meta
   ```

5. Set the Python path environment variable in your terminal:
   ```commandline
   dlt_meta_home=$(pwd)
   ```
   ```commandline
   export PYTHONPATH=$dlt_meta_home
   ```

6. Generate DAB resources and set up schemas. This command will:
   - Generate DAB configuration files
   - Create DLT-Meta schemas
   - Upload necessary files to volumes
   ```commandline
   python demo/generate_dabs_resources.py --source=cloudfiles --uc_catalog_name=<your_catalog_name> --profile=<your_profile>
   ```
   > Note: If you don't specify `--profile`, you'll be prompted for your Databricks workspace URL and access token (see the authentication sketch after this section).

7. Deploy and run the DAB bundle:
   - Navigate to the DAB directory
     ```commandline
     cd demo/dabs
     ```

   - Validate the bundle configuration
     ```commandline
     databricks bundle validate --profile=<your_profile>
     ```

   - Deploy the bundle to the dev environment
     ```commandline
     databricks bundle deploy --target dev --profile=<your_profile>
     ```

   - Run the onboarding job
     ```commandline
     databricks bundle run onboard_people -t dev --profile=<your_profile>
     ```

   - Execute the pipelines
     ```commandline
     databricks bundle run execute_pipelines_people -t dev --profile=<your_profile>
     ```

![dab_onboarding_job.png](../docs/static/images/dab_onboarding_job.png)
![dab_dlt_pipelines.png](../docs/static/images/dab_dlt_pipelines.png)
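As noted in step 6, omitting `--profile` makes the generator prompt for a workspace URL and access token. The two authentication paths can be pictured with a small sketch using the Databricks Python SDK; this is illustrative only and is not the code inside `demo/generate_dabs_resources.py`:

```python
# Illustrative sketch only -- not the logic in demo/generate_dabs_resources.py.
# It shows the two authentication paths from step 6: a named CLI profile from
# ~/.databrickscfg, or a prompted workspace URL + personal access token.
from typing import Optional

from databricks.sdk import WorkspaceClient


def get_workspace_client(profile: Optional[str] = None) -> WorkspaceClient:
    if profile:
        # What --profile=<your_profile> resolves to.
        return WorkspaceClient(profile=profile)
    # What the prompt fallback amounts to.
    host = input("Databricks workspace URL: ")
    token = input("Personal access token: ")
    return WorkspaceClient(host=host, token=token)


if __name__ == "__main__":
    w = get_workspace_client(profile=None)
    print(w.current_user.me().user_name)  # quick sanity check that authentication works
```

Configuring a named profile once (for example via `databricks configure`) keeps the remaining steps non-interactive, since the same `--profile` value is reused by the `databricks bundle` commands.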
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
{
  "expect_or_drop": {
    "valid_id": "id IS NOT NULL"
  }
}
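`expect_or_drop` entries like this are what dlt-meta feeds into Delta Live Tables expectations for the silver table. A minimal hand-written equivalent looks roughly like the sketch below; dlt-meta generates this wiring from the JSON, and the table name here is hypothetical, for illustration only:

```python
# Minimal sketch of the expect_or_drop rule above, assuming it runs inside a
# DLT/Lakeflow pipeline (where the `dlt` module and `spark` session are provided).
import dlt
from pyspark.sql import DataFrame

expectations = {"valid_id": "id IS NOT NULL"}  # same rule as in the JSON

@dlt.table(name="people_silver_example")       # hypothetical table name
@dlt.expect_all_or_drop(expectations)          # rows failing any expectation are dropped
def people_silver_example() -> DataFrame:
    # dlt-meta would read the bronze source defined in the onboarding spec;
    # a placeholder read keeps this sketch self-contained.
    return spark.read.table("people_bronze")
```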
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
[
  {
    "data_flow_id": "007",
    "data_flow_group": "my_people",

    "source_system": "Manual_Download",
    "source_format": "cloudFiles",

    "source_details": {
      "source_path_dev": "{uc_volume_path}/demo/dabs/resources/data/people",
      "source_path_prod": "{uc_volume_path}/demo/dabs/resources/data/people",
      "source_metadata": {
        "include_autoloader_metadata_column": "True",
        "autoloader_metadata_col_name": "source_metadata",
        "select_metadata_cols": {
          "input_file_name": "_metadata.file_name",
          "input_file_path": "_metadata.file_path"
        }
      }
    },
    "bronze_catalog_dev": "{uc_catalog_name}",
    "bronze_database_dev": "{bronze_schema}",
    "bronze_catalog_prod": "{uc_catalog_name}",
    "bronze_database_prod": "{bronze_schema}",
    "bronze_table": "people_bronze",
    "bronze_quarantine_table": "people_bronze_quarantine",
    "bronze_cluster_by": ["country"],

    "bronze_table_properties": {
      "pipelines.autoOptimize.managed": "true",
      "pipelines.reset.allowed": "true",
      "delta.autoOptimize.optimizeWrite": "true",
      "delta.autoOptimize.autoCompact": "true",
      "delta.tuneFileSizesForRewrites": "true",
      "delta.columnMapping.mode": "name",
      "delta.checkpointRetentionDuration": "30 days",
      "delta.deletedFileRetentionDuration": "30 days",
      "delta.logRetentionDuration": "30 days"
    },

    "bronze_reader_options": {
      "cloudFiles.format": "csv",
      "cloudFiles.inferColumnTypes": "true",
      "cloudFiles.rescuedDataColumn": "_rescued_data"
    },
    "silver_catalog_dev": "{uc_catalog_name}",
    "silver_database_dev": "{silver_schema}",
    "silver_catalog_prod": "{uc_catalog_name}",
    "silver_database_prod": "{silver_schema}",
    "silver_table": "people_silver",
    "silver_cluster_by": ["country"],

    "silver_table_properties": {
      "pipelines.autoOptimize.managed": "true",
      "pipelines.reset.allowed": "true",
      "delta.autoOptimize.optimizeWrite": "true",
      "delta.autoOptimize.autoCompact": "true",
      "delta.tuneFileSizesForRewrites": "true",
      "delta.columnMapping.mode": "name",
      "delta.checkpointRetentionDuration": "30 days",
      "delta.deletedFileRetentionDuration": "30 days",
      "delta.logRetentionDuration": "30 days"
    },

    "silver_transformation_json_dev": "{uc_volume_path}/demo/dabs/conf/silver_queries_people.json",
    "silver_data_quality_expectations_json_dev": "{uc_volume_path}/demo/dabs/conf/dqe/dqe_silver_people.json",

    "silver_transformation_json_prod": "{uc_volume_path}/demo/dabs/conf/silver_queries_people.json",
    "silver_data_quality_expectations_json_prod": "{uc_volume_path}/demo/dabs/conf/dqe/dqe_silver_people.json",

    "silver_cdc_apply_changes": {
      "keys": [
        "id"
      ],
      "sequence_by": "id",
      "scd_type": "1"
    }
  }
]
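The `{uc_catalog_name}`, `{bronze_schema}`, `{silver_schema}`, and `{uc_volume_path}` tokens above are placeholders that the resource-generation step (step 6) fills in before uploading the files. A rough sketch of that substitution is shown below; the real logic lives in `demo/generate_dabs_resources.py` and may differ, and the file names and schema names used here are assumptions for illustration only:

```python
# Illustrative placeholder substitution for the onboarding template above.
import json
from pathlib import Path

# Example values -- in the demo these come from the CLI arguments in step 6.
params = {
    "uc_catalog_name": "my_catalog",                      # assumed
    "bronze_schema": "dltmeta_bronze",                    # assumed
    "silver_schema": "dltmeta_silver",                    # assumed
    "uc_volume_path": "/Volumes/my_catalog/dlt_meta/files",  # assumed
}

def render_template(template_text: str, params: dict) -> str:
    """Replace {placeholder} tokens without disturbing the surrounding JSON braces."""
    for key, value in params.items():
        template_text = template_text.replace("{" + key + "}", value)
    json.loads(template_text)  # sanity check: the rendered result must still be valid JSON
    return template_text

# Hypothetical file names, for illustration:
rendered = render_template(Path("onboarding_people.template").read_text(), params)
Path("onboarding_people.json").write_text(rendered)
```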
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
[
  {
    "data_flow_id": "0070",
    "data_flow_group": "my_people",
    "bronze_catalog_dev": "{uc_catalog_name}",
    "bronze_database_dev": "{bronze_schema}",
    "bronze_catalog_prod": "{uc_catalog_name}",
    "bronze_database_prod": "{bronze_schema}",
    "bronze_table": "people_bronze",

    "silver_catalog_dev": "{uc_catalog_name}",
    "silver_database_dev": "{silver_schema}",
    "silver_catalog_prod": "{uc_catalog_name}",
    "silver_database_prod": "{silver_schema}",
    "silver_table": "people_silver_sal_below_50K",

    "silver_transformation_json_dev": "{uc_volume_path}/demo/dabs/conf/silver_queries_people.json",
    "silver_transformation_json_prod": "{uc_volume_path}/demo/dabs/conf/silver_queries_people.json"
  },
  {
    "data_flow_id": "0071",
    "data_flow_group": "my_people",
    "bronze_catalog_dev": "{uc_catalog_name}",
    "bronze_database_dev": "{bronze_schema}",
    "bronze_catalog_prod": "{uc_catalog_name}",
    "bronze_database_prod": "{bronze_schema}",
    "bronze_table": "people_bronze",
    "silver_catalog_dev": "{uc_catalog_name}",
    "silver_database_dev": "{silver_schema}",
    "silver_catalog_prod": "{uc_catalog_name}",
    "silver_database_prod": "{silver_schema}",
    "silver_table": "people_silver_sal_above_50K",

    "silver_transformation_json_dev": "{uc_volume_path}/demo/dabs/conf/silver_queries_people.json",
    "silver_transformation_json_prod": "{uc_volume_path}/demo/dabs/conf/silver_queries_people.json"
  }
]
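These two dataflows read the same `people_bronze` table and fan it out into `people_silver_sal_below_50K` and `people_silver_sal_above_50K`. Conceptually the generated silver layer behaves like the hand-written sketch below (the filters come from `silver_queries_people.json`); this is a simplification of what dlt-meta generates, not its actual code:

```python
# Conceptual fan-out sketch: one bronze table feeding two filtered silver tables,
# assuming it runs inside a DLT/Lakeflow pipeline that also defines people_bronze.
import dlt
from pyspark.sql import DataFrame

@dlt.table(name="people_silver_sal_below_50K")
def people_silver_sal_below_50K() -> DataFrame:
    # Filter taken from the where_clause for this target table.
    return dlt.read("people_bronze").where("salary <= 50000")

@dlt.table(name="people_silver_sal_above_50K")
def people_silver_sal_above_50K() -> DataFrame:
    return dlt.read("people_bronze").where("salary > 50000")
```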
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
[
  {
    "target_table": "people_silver",
    "select_exp": [
      "id",
      "md5(concat_ws('-',firstName, middleName, lastName,gender,birthDate,ssn)) as row_id",
      "firstName as first_name",
      "middleName as middle_name",
      "lastName as last_name",
      "gender as gender",
      "birthDate as birth_date",
      "ssn as ssn",
      "salary as salary",
      "country as country",
      "_rescued_data"
    ]
  },
  {
    "target_table": "people_silver_sal_above_50K",
    "select_exp": [
      "id",
      "md5(concat_ws('-',firstName, middleName, lastName,gender,birthDate,ssn)) as row_id",
      "concat(firstName,' ',middleName,' ',lastName) as full_name",
      "gender as gender",
      "birthDate as birth_date",
      "ssn as ssn",
      "salary as salary",
      "country as country",
      "_rescued_data"
    ],
    "where_clause": ["salary > 50000"]
  },
  {
    "target_table": "people_silver_sal_below_50K",
    "select_exp": [
      "id",
      "md5(concat_ws('-',firstName, middleName, lastName,gender,birthDate,ssn)) as row_id",
      "concat(firstName,' ',middleName,' ',lastName) as full_name",
      "gender as gender",
      "birthDate as birth_date",
      "ssn as ssn",
      "salary as salary",
      "country as country",
      "_rescued_data"
    ],
    "where_clause": ["salary <= 50000"]
  }
]
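Each entry maps the bronze data to one silver target: the `select_exp` strings are Spark SQL expressions and the optional `where_clause` strings are row filters. Roughly, under the assumption that the bronze data is already loaded into a DataFrame, the transformation amounts to:

```python
# Sketch of how one silver_queries_people.json entry is applied to a bronze DataFrame.
from pyspark.sql import DataFrame

def apply_silver_query(bronze_df: DataFrame, query: dict) -> DataFrame:
    """Apply select_exp and where_clause from a single query entry (sketch)."""
    df = bronze_df.selectExpr(*query["select_exp"])       # SQL expressions, e.g. "salary as salary"
    for clause in query.get("where_clause", []):          # optional filters, e.g. "salary > 50000"
        df = df.where(clause)
    return df

# Example: a trimmed version of the people_silver_sal_above_50K entry above.
query = {
    "target_table": "people_silver_sal_above_50K",
    "select_exp": [
        "id",
        "concat(firstName,' ',middleName,' ',lastName) as full_name",
        "salary as salary",
    ],
    "where_clause": ["salary > 50000"],
}
# silver_df = apply_silver_query(bronze_df, query)   # bronze_df: the people_bronze DataFrame
```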
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a67ad808",
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "%pip install dlt-meta"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d59a07d",
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "\n",
    "from pyspark.sql import DataFrame\n",
    "from pyspark.sql.functions import lit\n",
    "from pyspark.sql.functions import current_date\n",
    "\n",
    "def custom_transform_func_test(input_df, _) -> DataFrame:\n",
    "\n",
    "    if layer == \"bronze\":\n",
    "        dummy_param = spark.conf.get(\"dummy_param\", None)\n",
    "    else:\n",
    "        dummy_param = \"Test NA\"\n",
    "\n",
    "    return (input_df\n",
    "            .withColumn('last_updated_on', current_date())\n",
    "            .withColumn('some_dummy_from_task_param', lit(dummy_param))\n",
    "            )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3a86db7",
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "\n",
    "layer = spark.conf.get(\"layer\", None)\n",
    "\n",
    "from src.dataflow_pipeline import DataflowPipeline\n",
    "DataflowPipeline.invoke_dlt_pipeline(spark, layer, bronze_custom_transform_func=custom_transform_func_test,\n",
    "                                     silver_custom_transform_func=custom_transform_func_test\n",
    ")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d0734f4",
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
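Flattened out of the notebook JSON above, the init notebook's logic is roughly the following. Note that `layer` and the optional `dummy_param` are supplied through the pipeline's Spark configuration, `spark` is provided by the pipeline runtime, and `custom_transform_func_test` only reads `layer` when the pipeline actually invokes it, after the later cell has set it:

```python
# Readable sketch of the DAB demo init notebook, written as a plain script.
# Assumes it runs inside a Lakeflow/DLT pipeline where `spark` exists and
# dlt-meta's src.dataflow_pipeline module is importable.
from pyspark.sql import DataFrame
from pyspark.sql.functions import current_date, lit

# Set by the pipeline configuration ("layer": "bronze" or "silver").
layer = spark.conf.get("layer", None)

def custom_transform_func_test(input_df: DataFrame, _) -> DataFrame:
    """Custom transformation applied to both bronze and silver dataflows."""
    if layer == "bronze":
        dummy_param = spark.conf.get("dummy_param", None)  # optional task parameter
    else:
        dummy_param = "Test NA"
    return (input_df
            .withColumn("last_updated_on", current_date())
            .withColumn("some_dummy_from_task_param", lit(dummy_param)))

from src.dataflow_pipeline import DataflowPipeline

DataflowPipeline.invoke_dlt_pipeline(
    spark,
    layer,
    bronze_custom_transform_func=custom_transform_func_test,
    silver_custom_transform_func=custom_transform_func_test,
)
```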
