Mart tables for GTFS Downloader Dashboard and Documentation #4575
97b410d to
e6537a2
|
Terraform plan in iac/cal-itp-data-infra/airflow/us
Plan: 9 to add, 8 to change, 0 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
!~ update in-place
Terraform will perform the following actions:
# google_storage_bucket_object.calitp-composer["dags/README.md"] will be created
+ resource "google_storage_bucket_object" "calitp-composer" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "dags/README.md"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../airflow/dags/README.md"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer["dags/create_external_tables/METADATA.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer" {
!~ crc32c = "TAOSvA==" -> (known after apply)
!~ detect_md5hash = "BWlENDF70NFJXo55EWKoiQ==" -> "different hash"
!~ generation = 1764100054786170 -> (known after apply)
id = "calitp-composer-dags/create_external_tables/METADATA.yml"
!~ md5hash = "BWlENDF70NFJXo55EWKoiQ==" -> (known after apply)
name = "dags/create_external_tables/METADATA.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-composer["dags/download_parse_and_validate_gtfs.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer" {
!~ crc32c = "G0BNVQ==" -> (known after apply)
!~ detect_md5hash = "MXFYtPprsFpWte9yivjiHQ==" -> "different hash"
!~ generation = 1765308834202492 -> (known after apply)
id = "calitp-composer-dags/download_parse_and_validate_gtfs.py"
!~ md5hash = "MXFYtPprsFpWte9yivjiHQ==" -> (known after apply)
name = "dags/download_parse_and_validate_gtfs.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-composer["dags/sync_ntd_data_api/METADATA.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer" {
!~ crc32c = "jC3FRw==" -> (known after apply)
!~ detect_md5hash = "FVme+riRchXahturQIFHlg==" -> "different hash"
!~ generation = 1765312247501642 -> (known after apply)
id = "calitp-composer-dags/sync_ntd_data_api/METADATA.yml"
!~ md5hash = "FVme+riRchXahturQIFHlg==" -> (known after apply)
name = "dags/sync_ntd_data_api/METADATA.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-composer["plugins/operators/bigquery_to_download_config_operator.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer" {
!~ crc32c = "KvENpw==" -> (known after apply)
!~ detect_md5hash = "T4VqE/DM0g8dGIROGUZ4Ww==" -> "different hash"
!~ generation = 1765308834176164 -> (known after apply)
id = "calitp-composer-plugins/operators/bigquery_to_download_config_operator.py"
!~ md5hash = "T4VqE/DM0g8dGIROGUZ4Ww==" -> (known after apply)
name = "plugins/operators/bigquery_to_download_config_operator.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-composer["plugins/operators/gcs_to_gtfs_download_operator.py"] will be created
+ resource "google_storage_bucket_object" "calitp-composer" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "plugins/operators/gcs_to_gtfs_download_operator.py"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../airflow/plugins/operators/gcs_to_gtfs_download_operator.py"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["dbt_project.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer-dags" {
!~ crc32c = "xy4uIA==" -> (known after apply)
!~ detect_md5hash = "jaSUWSXE+sudfy0c0AgiJA==" -> "different hash"
!~ generation = 1763589717446130 -> (known after apply)
id = "calitp-composer-data/warehouse/dbt_project.yml"
!~ md5hash = "jaSUWSXE+sudfy0c0AgiJA==" -> (known after apply)
name = "data/warehouse/dbt_project.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-composer-dags["models/intermediate/gtfs/_int_gtfs.yaml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer-dags" {
!~ crc32c = "JuVq+A==" -> (known after apply)
!~ detect_md5hash = "XuNJoijQihZNoiN67C5rlg==" -> "different hash"
!~ generation = 1761707840881407 -> (known after apply)
id = "calitp-composer-data/warehouse/models/intermediate/gtfs/_int_gtfs.yaml"
!~ md5hash = "XuNJoijQihZNoiN67C5rlg==" -> (known after apply)
name = "data/warehouse/models/intermediate/gtfs/_int_gtfs.yaml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-composer-dags["models/intermediate/gtfs/int_gtfs_datasets.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/intermediate/gtfs/int_gtfs_datasets.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/intermediate/gtfs/int_gtfs_datasets.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["models/intermediate/transit_database/dimensions/int_transit_database__gtfs_datasets_dim.sql"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer-dags" {
!~ crc32c = "cy68sQ==" -> (known after apply)
!~ detect_md5hash = "2FqX0bk2xM2PZaS2Esldww==" -> "different hash"
!~ generation = 1757531130514617 -> (known after apply)
id = "calitp-composer-data/warehouse/models/intermediate/transit_database/dimensions/int_transit_database__gtfs_datasets_dim.sql"
!~ md5hash = "2FqX0bk2xM2PZaS2Esldww==" -> (known after apply)
name = "data/warehouse/models/intermediate/transit_database/dimensions/int_transit_database__gtfs_datasets_dim.sql"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-composer-dags["models/mart/gtfs_audit/_mart_gtfs_audit.yml"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/gtfs_audit/_mart_gtfs_audit.yml"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/gtfs_audit/_mart_gtfs_audit.yml"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["models/mart/gtfs_audit/dim_gtfs_download_configs.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/gtfs_audit/dim_gtfs_download_configs.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/gtfs_audit/dim_gtfs_download_configs.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["models/mart/gtfs_audit/dim_gtfs_schedule_download_outcomes.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/gtfs_audit/dim_gtfs_schedule_download_outcomes.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/gtfs_audit/dim_gtfs_schedule_download_outcomes.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["models/mart/gtfs_audit/dim_gtfs_schedule_unzip_outcomes.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/gtfs_audit/dim_gtfs_schedule_unzip_outcomes.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/gtfs_audit/dim_gtfs_schedule_unzip_outcomes.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["models/mart/gtfs_audit/dim_gtfs_schedule_validation_notices.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/gtfs_audit/dim_gtfs_schedule_validation_notices.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/gtfs_audit/dim_gtfs_schedule_validation_notices.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["models/mart/gtfs_audit/dim_gtfs_schedule_validation_outcomes.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/gtfs_audit/dim_gtfs_schedule_validation_outcomes.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/gtfs_audit/dim_gtfs_schedule_validation_outcomes.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["models/staging/gtfs/_src_gtfs_schedule_external_tables.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer-dags" {
!~ crc32c = "JRuXXA==" -> (known after apply)
!~ detect_md5hash = "Caqsk8kIhrzLrYxkwYo53g==" -> "different hash"
!~ generation = 1751416666931203 -> (known after apply)
id = "calitp-composer-data/warehouse/models/staging/gtfs/_src_gtfs_schedule_external_tables.yml"
!~ md5hash = "Caqsk8kIhrzLrYxkwYo53g==" -> (known after apply)
name = "data/warehouse/models/staging/gtfs/_src_gtfs_schedule_external_tables.yml"
# (17 unchanged attributes hidden)
}
Plan: 9 to add, 8 to change, 0 to destroy.
📝 Plan generated in Plan Terraform for Warehouse and DAG changes #1199 |
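When reviewing a long plan like the one above, it can help to check that the per-resource headers add up to the summary line. The following is a minimal sketch, assuming the human-readable plan format shown in this comment; the function names are hypothetical, and the sample is an abbreviated excerpt with its summary line adjusted to match. For real automation, the JSON output of `terraform show -json` is a more robust input than scraped text.

```python
import re

# Matches the summary line that Terraform prints at the end of a plan.
SUMMARY_RE = re.compile(r"Plan: (\d+) to add, (\d+) to change, (\d+) to destroy\.")

# Header comments Terraform prints above each planned resource action.
# Replacements ("must be replaced") are deliberately not handled in this sketch.
ACTION_PATTERNS = {
    "add": r"^\s*# .+ will be created\s*$",
    "change": r"^\s*# .+ will be updated in-place\s*$",
    "destroy": r"^\s*# .+ will be destroyed\s*$",
}


def count_actions(plan_text):
    """Count resource headers per action in human-readable plan text."""
    return {
        action: len(re.findall(pattern, plan_text, flags=re.MULTILINE))
        for action, pattern in ACTION_PATTERNS.items()
    }


def summary_counts(plan_text):
    """Read the counts from the 'Plan: ...' summary line."""
    match = SUMMARY_RE.search(plan_text)
    if match is None:
        raise ValueError("no 'Plan:' summary line found")
    add, change, destroy = map(int, match.groups())
    return {"add": add, "change": change, "destroy": destroy}


# Abbreviated excerpt for illustration; the summary is adjusted to the excerpt.
sample = """\
  # google_storage_bucket_object.calitp-composer["dags/README.md"] will be created
  # google_storage_bucket_object.calitp-composer["dags/download_parse_and_validate_gtfs.py"] will be updated in-place
  # google_storage_bucket_object.calitp-composer["plugins/operators/gcs_to_gtfs_download_operator.py"] will be created
Plan: 2 to add, 1 to change, 0 to destroy.
"""

if count_actions(sample) == summary_counts(sample):
    print("headers match summary:", summary_counts(sample))
else:
    print("mismatch:", count_actions(sample), summary_counts(sample))
```

A mismatch would usually mean the plan text was truncated when it was pasted into the comment, rather than a problem with the plan itself.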
|
Terraform plan in iac/cal-itp-data-infra-staging/airflow/us
Plan: 0 to add, 4 to change, 0 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
!~ update in-place
Terraform will perform the following actions:
# google_storage_bucket_object.calitp-staging-composer["dags/download_parse_and_validate_gtfs.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "GvruJA==" -> (known after apply)
!~ detect_md5hash = "EVCNHN6Hq0uodzYuK8Zg4A==" -> "different hash"
!~ generation = 1765494172783784 -> (known after apply)
id = "calitp-staging-composer-dags/download_parse_and_validate_gtfs.py"
!~ md5hash = "EVCNHN6Hq0uodzYuK8Zg4A==" -> (known after apply)
name = "dags/download_parse_and_validate_gtfs.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/operators/bigquery_to_download_config_operator.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "6oRS3A==" -> (known after apply)
!~ detect_md5hash = "xxkAT0jhkh3LKcuWUQ40AA==" -> "different hash"
!~ generation = 1765492313306019 -> (known after apply)
id = "calitp-staging-composer-plugins/operators/bigquery_to_download_config_operator.py"
!~ md5hash = "xxkAT0jhkh3LKcuWUQ40AA==" -> (known after apply)
name = "plugins/operators/bigquery_to_download_config_operator.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer-catalog will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-catalog" {
!~ content = (sensitive value)
!~ crc32c = "C6pTUQ==" -> (known after apply)
!~ detect_md5hash = "G1/zRN8BpLbORlo3Fq9mFQ==" -> "different hash"
!~ generation = 1765493898541075 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/target/catalog.json"
!~ md5hash = "G1/zRN8BpLbORlo3Fq9mFQ==" -> (known after apply)
name = "data/warehouse/target/catalog.json"
# (16 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer-manifest will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-manifest" {
!~ content = (sensitive value)
!~ crc32c = "9rnV6w==" -> (known after apply)
!~ detect_md5hash = "ByFXB40BeDe5NkvaWS4/Qg==" -> "different hash"
!~ generation = 1765493899808953 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/target/manifest.json"
!~ md5hash = "ByFXB40BeDe5NkvaWS4/Qg==" -> (known after apply)
name = "data/warehouse/target/manifest.json"
# (16 unchanged attributes hidden)
}
Plan: 0 to add, 4 to change, 0 to destroy.
📝 Plan generated in Plan Terraform for Warehouse and DAG changes #1199 |
08b0863 to
438cdff
|
Warehouse report: Failed to add ci-report to a comment. Review the ci-report in the Summary. |
|
Thank you @erikamov! A couple of questions:
|
84c3eef to
4294d13
|
Answering @lauriemerrell's questions:
3.a. We heard you: we are splitting the Download from the unzip, parse, and validate steps and including that change in this PR. |
|
For 3b, I would be inclined to define this as an exposure on the dbt side so that the dependency is explicit and visible there, because this will be a unique pattern (a dependency between an ingest and something managed by dbt). |
|
We also might want to call this (again 3b) out on this page: https://docs.calitp.org/data-infra/architecture/data.html#architecture-data since this introduces a new dependency pattern |
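The ordering discussed in 3a and 3b can be pictured as a small dependency graph. This is a minimal sketch using Python's standard `graphlib`, not the project's Airflow or dbt configuration. The DAG and view names come from this PR, `sync_dags` is a hypothetical placeholder for the sync DAGs, and the edges into and out of `int_gtfs_datasets` are an assumption about how the download reads its configuration.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its value set.
# The edges are an illustrative reading of the discussion in this PR,
# not the actual Airflow/dbt wiring.
dependencies = {
    "create_external_tables": {"sync_dags"},          # external tables run after the syncs
    "int_gtfs_datasets": {"create_external_tables"},  # dbt view (assumed edge)
    "download_gtfs": {"int_gtfs_datasets"},           # ingest depends on dbt (3b, assumed)
    "parse_and_validate_gtfs": {"download_gtfs"},     # triggered by the download (3a)
}


def run_order(deps):
    """Return one valid execution order, predecessors first."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(" -> ".join(order))
```

The `download_gtfs` edge is the unusual one: it points from an ingest step to a dbt-managed model, which is the dependency an exposure would make visible on the dbt side.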
4294d13 to
ad3c623
|
Terraform plan in iac/cal-itp-data-infra-staging/composer/us
No changes. Your infrastructure matches the configuration.
📝 Plan generated in Plan Terraform for Warehouse and DAG changes #1199 |
3badf3f to
78e5e4f
|
Terraform plan in iac/cal-itp-data-infra-staging/dashboards/us
No changes. Your infrastructure matches the configuration.
📝 Plan generated in Terraform Plan #726 |
78e5e4f to
a90e6d6
|
Notes from live walkthrough convo 12/11:
|
Create a main DAG README file since new DAGS are not in folders [#4571]
…_datasets_dim to a new view int_gtfs_datasets
c8cda86 to
4e3b576
|
As decided by @vevetron we'll want to make 2 issues after this branch is merged:
|
4e3b576 to
432e9bf
Description
This PR creates materialized tables that feed a GTFS Downloader Dashboard, which is used to validate the results of the new GTFS Download DAG #4571.
In conversation with @vevetron and @lauriemerrell we noticed that the data_quality_pipeline filter was missing, although its absence did not change the datasets. This PR adds the filter.
As requested, we are splitting the Download process from Unzip, Parse, and Validate. A new download_gtfs DAG is responsible for downloading the GTFS data and then triggers parse_and_validate_gtfs, which does the rest.
To address the concern about getting current data, we moved the logic from the materialized table staging.int_transit_database__gtfs_datasets_dim to a new view, staging.int_gtfs_datasets, which gets the current datasets from airtable.california_transit__gtfs_datasets. staging.int_transit_database__gtfs_datasets_dim now reads from the new view so that there is only one source of truth.
Also, to make sure the external tables are created/updated in the correct order, I moved the create_external_table DAG to run after the sync DAGs, as part of Review DAGs' schedule #3714.
Type of change
How has this been tested?
Tested running dbt tables locally:
Post-merge follow-ups