[#334] Add Terraform variable validation; trim Python pre-flight#335
Draft
gavinelder wants to merge 12 commits intochore/v26-only-strip-legacyfrom
Draft
[#334] Add Terraform variable validation; trim Python pre-flight#335gavinelder wants to merge 12 commits intochore/v26-only-strip-legacyfrom
gavinelder wants to merge 12 commits intochore/v26-only-strip-legacyfrom
Conversation
…hecks
First pass at moving sanity checks from scripts/installer/validation/check_configuration.py
up to native `validation {}` blocks on the corresponding variables. Errors fire at
`terraform plan` time instead of via a separate Python pre-flight, and the message
text travels with the variable definition.
Variable validations added (variables.tf):
- tower_container_version: tag-shape regex + v26.1.0 floor.
- tower_server_url: rejects scheme prefixes.
- tower_root_users: rejects empty/REPLACE_ME.
- alb_certificate_arn: ACM ARN shape (REPLACE_ME passthrough when no ALB).
- private_cacert_bucket_prefix: s3:// shape (REPLACE_ME passthrough when not in use).
- db_engine_version, db_container_engine_version: reject 5.6.
- data_studio_eligible_workspaces, data_studio_ssh_eligible_workspaces,
pipeline_versioning_eligible_workspaces: comma-separated integer format.
Python residual (check_configuration.py):
- Drops the duplicate floor/tag check from __main__ (now in variable validation).
- verify_tower_server_url: keeps the port-reminder warning only.
- verify_tower_self_signed_certs: narrowed to the cross-variable rule
(private_cacert_bucket_prefix required when flag_use_private_cacert = true).
- verify_data_studio / verify_data_studio_ssh / verify_pipeline_versioning: drop
the now-duplicate workspace-id format checks.
- verify_docker_version: keeps the docker-compose template scan; drops the
db_engine_version 5.6 check (now a variable validation).
- verify_tower_root_users: deleted (replaced by variable validation).
Cross-variable rules and AWS-data-source rules deliberately stay in Python.
Black-formatted.
Refs #334
- variables.tf: db_engine_version and db_container_engine_version validations now require an "8." prefix instead of merely rejecting "5.6". Error message states the master-only-8.x policy explicitly. - check_configuration.py: verify_docker_version now flags any mysql:N.x pin below 8 in the docker-compose template (was: only flagged 5.6 specifically). Refs #334
First slice of native Terraform testing, stacked on top of the variable-validation
work in this branch. 21 `run` blocks across three .tftest.hcl files exercise the
new validation {} rules with positive and negative cases.
tests/terraform/version_validation.tftest.hcl
- Floor + tag-shape rules on tower_container_version (v26.1.0+, v26.x major lock).
tests/terraform/string_shape_validation.tftest.hcl
- tower_server_url scheme-prefix rejection.
- tower_root_users empty/REPLACE_ME rejection.
- alb_certificate_arn ACM-ARN shape.
- private_cacert_bucket_prefix s3:// shape.
- MySQL 8.x floor on db_engine_version and db_container_engine_version.
tests/terraform/workspace_id_validation.tftest.hcl
- Comma-separated-integer rule on three workspace-ID list variables.
Each file uses a top-level `override_data` block to stub the connection-strings
external Python data source, so variable-validation tests run without needing a
Docker daemon (the script reads the tfvars from disk via Docker).
Other changes
- variables.tf: tower_container_version floor rule rewritten to parse the tag
into numeric components (Terraform doesn't support `>=` between strings).
Now requires major == 26 and minor >= 1, with a guard that lets the tag-shape
validation report cleanly first.
- Makefile: new `terraform_test` target. Materialises the generated tfvars at
project root (the external Python script reads it by hard-coded path) and
cleans up on exit.
- tests/datafiles/generate_core_data.sh: bumps the test-fixture
tower_container_version from v25.3.0 to v26.1.0 so the baseline tfvars
satisfies the new floor.
- tests/terraform/README.md: usage notes; what's in vs out of scope.
Run: make generate_test_data && make terraform_test
Result: 21 passed, 0 failed.
Refs #334
…oss-variable validation
Replaces the verify_only_one_true_set and verify_tfvars_config_dependencies
groups in check_configuration.py with native cross-variable `validation {}`
blocks. Errors fire at terraform plan time and the rule travels with the
variable rather than living in a Python pre-flight.
Cross-variable rules added (variables.tf):
Only-one-of-N flag groups (each anchored on the first flag in the group):
- flag_create_new_vpc / flag_use_existing_vpc
- flag_create_external_db / flag_use_existing_external_db / flag_use_container_db
- flag_create_external_redis / flag_use_container_redis
- flag_create_load_balancer / flag_use_private_cacert / flag_do_not_use_https
- flag_use_aws_ses_iam_integration / flag_use_existing_smtp
Conditional-dependency rules (anchored on the dependent variable):
- flag_use_existing_vpc => vpc_existing_id != REPLACE_ME
- flag_create_load_balancer => alb_certificate_arn != REPLACE_ME (second
validation block alongside the existing ARN-shape check)
- flag_create_route53_private_zone => new_route53_private_zone_name set
- flag_use_existing_route53_public_zone => existing_route53_public_zone_name set
- flag_use_existing_route53_private_zone => existing_route53_private_zone_name set
Residual Python:
- verify_only_one_true_set, verify_tfvars_config_dependencies, the
only_one_true_set helper, and the ensure_dependency_populated helper
are deleted entirely (not stubbed) — no callers remain.
- verify_sensitive_keys (parsed-tfvars-dict shape) and verify_subnet_privacy
(AWS data source) stay; those can't be expressed as variable validations.
tests/terraform/cross_variable_validation.tftest.hcl
- 11 run blocks covering negative cases for all 11 new rules.
- All 32 .tftest.hcl run blocks now pass: 0 failed.
Refs #334
…ushes Adds .github/workflows/terraform-test.yml. Triggers on PRs targeting master or release/v* branches when relevant files change, and on pushes to those branches. Uses Terraform 1.14.x and Python 3.12, runs `make generate_test_data` followed by `make terraform_test`. The variable-validation suite uses `override_data` blocks to stub the connection-strings external Python data source, so the run does not need real cloud credentials. `make generate_test_data` does need Docker (extractors.py uses an hcl2json container to parse tfvars); GHA ubuntu-latest runners have Docker available by default. CX_SKIP_SSM=true is set for fixture generation to avoid touching real SSM parameters. Format-check is intentionally NOT enabled in this workflow yet — the repo currently has pre-existing fmt drift across ~10 files outside this PR's scope. A separate housekeeping pass will run `terraform fmt -recursive` and add the check. Drive-by: applied `terraform fmt` to two files this PR already touched (modules/connection_strings/v1.0.0/variables.tf, templates/TEMPLATE_terraform.tfvars) so they're at least clean within scope. Refs #334
The original triggers only fired on PRs targeting master or release/v*. Since this work is stacked (PR #335 -> PR #333 -> master), GHA reads the workflow config from the PR base branch, which doesn't yet have this file. Result: the workflow never ran on the stacked PR. Changes: - pull_request: drop the branch restriction so any PR with relevant path changes runs the suite. Cheap; the suite is variable-validation-only. - push: add chore/v26-** as a temporary trigger so commits to the in-flight stacked branches fire the workflow. Remove once master has the workflow. - workflow_dispatch: added so the suite can be triggered manually for ad-hoc runs (e.g. when iterating on a .tftest.hcl file). Refs #334
The previous CI run failed at `pip install -r tests/requirements.txt` because
that file (pre-existing) contains a malformed line ('pip install requests=2.31.0').
That file is for the pytest suite — terraform-test doesn't need it.
generate_testing_secrets.py uses only stdlib, and the data.external script is
mocked via override_data, so no third-party Python deps are required.
Also create tests/logs/ before running generate_test_data; the bash script
appends to a file in there and emits a warning when the directory is missing.
Refs #334
The first CI run failed because the test fixtures pin var.aws_profile = "development"
and CI doesn't have that AWS profile configured. Variable-validation tests don't
need a real provider — they should run without any cloud credentials.
Adds to each .tftest.hcl file:
- `mock_provider "aws"` so the AWS provider stops trying to read shared config.
- Four `override_data` blocks providing realistic JSON for the four SSM secret
bundles (tower, seqerakit, groundswell, wave_lite). The locals jsondecode each
payload and look up specific keys (TOWER_DB_USER, SWELL_DB_USER, ssm_key, etc.),
so the stub JSON has to carry both the value and ssm_key fields per entry.
The override blocks are duplicated across all four .tftest.hcl files because
terraform test doesn't have a shared-fixture mechanism. If a new SSM key reference
lands in the .tf files, all four .tftest.hcl files need the matching update.
Result: 32 passed, 0 failed locally without any AWS credentials or AWS_PROFILE set.
Refs #334
Wires two new gates into .github/workflows/terraform-test.yml:
- `terraform fmt -check -recursive -diff` — fails on any unformatted file.
- `tflint --recursive` (with .tflint.hcl at repo root) — fails on errors;
warnings are still informational and gated behind disabled rules in
.tflint.hcl pending follow-up housekeeping.
Real bugs surfaced + fixed by tflint (terraform_map_duplicate_keys):
- 004_iam.tf: `app_name` defined twice in instance_role_policy_main
templatefile. Removed the duplicate.
- 009_define_file_templates.tf: `db_database_name` defined twice in
groundswell_sql templatefile. Removed the duplicate.
Auto-fixes applied via `tflint --fix`:
- 23 deprecated interpolation expressions ("${var.x}" -> var.x) across
000_main.tf, 002_security_groups.tf, 004_iam.tf, 005_parameter_store.tf,
008_route53.tf, 009_define_file_templates.tf, 010_prepare_config_files.tf,
011_configure_vm.tf, modules/connection_strings/v1.0.0/main.tf, and
modules/elasticache/output.tf.
Manual residual:
- 5 deprecated interpolations in modules/connection_strings/v1.0.0/main.tf
and modules/elasticache/output.tf that --fix bailed on (path resolution
quirk in the fix tool with submodules).
Repo-wide fmt:
- `terraform fmt -recursive` applied to the whole tree. tests/datafiles/
fixtures (which are regenerated each test run) are now also fmt'd at the
end of generate_core_data.sh so the regenerated output stays clean.
.tflint.hcl (new, repo root):
- Enables the `terraform` recommended preset and the AWS ruleset.
- Disables a handful of rules with documented follow-up:
terraform_module_version (the repo pins via folder name, not version arg),
terraform_documented_variables / _outputs (large pre-existing gap;
docs pass scheduled separately),
terraform_required_version / _required_providers (need a minimum-version
decision; terraform test/override_data set the floor at 1.7+),
terraform_unused_declarations (per-item audit needed; some are TODO'd).
Local: terraform fmt -check -recursive (clean), tflint --recursive (exit 0),
make terraform_test (32 passed, 0 failed).
Refs #334
…bmodules
The previous CI run failed at the tflint step because:
1. tflint exits 2 on *any* issue, including warnings — so even though
errors were absent, warnings still failed the job.
2. The root .tflint.hcl ruleset (which disables a handful of pre-existing
housekeeping warnings) is not auto-discovered when --recursive walks
into modules/connection_strings/v1.0.0/, modules/subnet_collector/v1.0.0/,
and tests/datafiles/.
Fixes:
- Pass --minimum-failure-severity=error so warnings stay informational.
- Set TFLINT_CONFIG_FILE so submodules use the root config.
Tighten to --minimum-failure-severity=warning as part of the housekeeping
follow-up once the disabled rules in .tflint.hcl are addressed.
Refs #334
Addresses every remaining tflint finding instead of suppressing them.
After this commit `tflint --recursive` is fully clean (exit 0, no warnings)
and the CI step no longer needs --minimum-failure-severity=error.
terraform_required_version
- Root 000_main.tf: bumped from ">= 1.1.0" to ">= 1.7.0" — terraform test
override_data blocks (used by tests/terraform/) require >= 1.7.
- modules/connection_strings/v1.0.0/main.tf: added terraform { required_version }.
- modules/subnet_collector/v1.0.0/main.tf: added terraform { required_version }.
- tests/datafiles/generate_core_data.sh: emit a terraform { required_version }
block in the generated 012_testing_outputs.tf.
terraform_required_providers
- Root 000_main.tf: declared all five providers used in the tree —
aws (~> 5.12), tls (~> 4.0), null (~> 3.2), random (~> 3.6), external (~> 2.3).
- modules/connection_strings/v1.0.0/main.tf: declared external (~> 2.3).
- modules/subnet_collector/v1.0.0/main.tf: declared aws (~> 5.12).
terraform_unused_declarations
- Removed `flag_create_load_balancer` from the connection_strings module's
variables.tf and from the parent's module call in 000_main.tf — it was
declared but never referenced inside the module.
- Removed `mock_sp_vpc` and `mock_aws_subnets_all` locals from
modules/subnet_collector/v1.0.0/main.tf — leftover stubs from an earlier
mock pattern; never referenced.
.tflint.hcl
- Removed the temporary disables for terraform_required_version,
terraform_required_providers, and terraform_unused_declarations now that
the underlying findings are fixed.
CI workflow
- tflint step no longer passes --minimum-failure-severity=error; default
behaviour fails on any warning, which is what we want now.
Local: `terraform fmt -check -recursive` exit 0, `tflint --recursive` exit 0,
`make terraform_test` 32 passed / 0 failed.
Refs #334
Sweeps comments added during the variable-validation migration that document
"this used to be in check_configuration.py / now lives in validation {} blocks"
or otherwise narrate the migration story. Those belong in the PR description,
not in long-lived code where they will rot.
Cuts (≈120 lines removed, no behaviour change):
scripts/installer/validation/check_configuration.py
- Trimmed "Note: ... is now enforced by `validation {}` block" notes
from verify_tower_server_url, verify_tower_self_signed_certs,
verify_docker_version, verify_data_studio, verify_data_studio_ssh.
- Removed verify_pipeline_versioning entirely (no-op stub) and its
call site in __main__.
- Dropped two migration-narrative blocks above the residual __main__ calls.
variables.tf
- Dropped "replaces verify_only_one_true_set" preamble before the
cross-variable rules; kept a one-liner stating the anchor convention.
- Removed "REPLACE_ME passthrough" / "stays in Python because spans
variables" notes — the conditions and error_messages already say that.
- Trimmed the floor-rule comment to just the genuinely non-obvious part
(`!can(...)` short-circuit guard).
- Removed inline "master supports MySQL 8.x only" comments — duplicates
the error_message text.
tests/terraform/*.tftest.hcl
- File headers reduced to a single descriptive line. The repeated
4-line "000_main.tf jsondecodes..." block is now a 2-line note that
states the only non-obvious constraint (each entry needs both `value`
and `ssm_key`).
- Dropped "See version_validation.tftest.hcl for the baseline-tfvars
convention" cross-references; tests/terraform/README.md owns that.
.github/workflows/terraform-test.yml
- Header no longer references issue #334.
- Trimmed the long Python-deps explanation to one line.
- Dropped the inline default-behaviour explanation on the tflint step.
- Dropped the redundant CX_SKIP_SSM = "skip writing real SSM" comment
(the variable name is self-documenting).
.tflint.hcl
- Removed the "Initial baseline ... tighten over time" preamble and
restated the rule disables as compact one-liners.
tests/datafiles/generate_core_data.sh
- Tightened the terraform-fmt block comment.
Local: terraform fmt -check -recursive (clean), tflint --recursive (exit 0),
make terraform_test (32 passed, 0 failed).
Refs #334
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Refs #334. Stacked on PR #333 (`chore/v26-only-strip-legacy`) — review/merge that first.
Migrates sanity-check logic from `scripts/installer/validation/check_configuration.py` to native Terraform variable validation, adds the corresponding `terraform test` suite, and wires it into CI.
Commits (in order)
Coverage
Native validations now in variables.tf
`terraform test` suite (`tests/terraform/`)
CI
Python residual (`check_configuration.py`)
What stays:
What's deleted:
Out of scope (deferred under #334)
Verification
Automated triage by Claude Code, stacked on #333.