fix(test_cases): replace hardcoded user paths with overridable env vars#1092
fix(test_cases): replace hardcoded user paths with overridable env vars#1092Zhenye-Na wants to merge 1 commit into
Conversation
Introduce DATA_HOME_DIR, ESM2_DATA_DIR, and NSYS_OUTPUT_DIR environment variables that default to the original values. Fix bionemo enroot.sh removing wrong path for bionemo.sqsh.
KeitaW
left a comment
There was a problem hiding this comment.
Review Batch 1/2 — Deployment Pipeline & Operational Correctness
Thanks for this portability fix — it's small, consistent, and well-targeted. One note below worth confirming before merge.
| @@ -1,12 +1,17 @@ | |||
| #! /bin/bash -x | |||
| #!/bin/bash | |||
There was a problem hiding this comment.
Shebang change silently drops set -x tracing
I noticed the shebang changed from #! /bin/bash -x to #!/bin/bash. Normalizing the stray space after #! is a good cleanup, but the same edit also removes the -x flag, which was turning on execution tracing for the whole script. For an Nsight profiling helper, that trace is often deliberately there so the exact nsys profile ... invocation shows up in the Slurm logs — losing it makes failed profiling runs harder to debug. Could you confirm dropping -x is intentional? If the trace was load-bearing, I'd keep it as #!/bin/bash -x (or add an explicit set -x after the shebang) so the normalization doesn't change observable behavior.
There was a problem hiding this comment.
Thank you for the detailed explanation. I will send out a second revision to get this comment resolved
KeitaW
left a comment
There was a problem hiding this comment.
Review Batch 2/2 — Documentation Consistency
New override env vars aren't documented anywhere
Files: 3.test_cases/23.SMHP-esm2/, 3.test_cases/megatron/bionemo/bionemo_2.5/, 4.validation_and_observability/5.nsight/nemotron/ (READMEs)
The three new knobs — ESM2_DATA_DIR, DATA_HOME_DIR, NSYS_OUTPUT_DIR — are the user-facing payoff of this PR, but a reader can only discover them by opening the scripts. I checked the corresponding READMEs and none mention them (and the SMHP-esm2 README's prerequisites section would be the natural home). The PR checklist also leaves "README updated" unchecked. I'd suggest a one-line note per test case along the lines of "Set DATA_HOME_DIR to your shared-filesystem mount before running; it defaults to /fsxl/$USER/bionemo." so the override is discoverable without reading source. Not blocking, but it's the difference between a knob people use and one they never find.
Things That Look Great
- The bionemo
enroot.shbug fix is real and correct. The original removed/fsxl/awsankur/bionemo.sqsh(parent dir) but imported to/fsxl/awsankur/bionemo/bionemo.sqsh(subdir) — so the cleanup never actually deleted the stale image. The fix now removes and imports to the same${DATA_HOME_DIR}/bionemo.sqsh, and I verified that path matches theIMAGEdefault intrain-esm.sbatch. The inline comment explaining the path match is a nice touch. - Consistent, idiomatic env-var defaulting. Using
: "${VAR:=default}"(andexport VARseparately in the sbatch) is exactly the right pattern, and the defaults are kept consistent across sibling scripts (ESM2_DATA_DIRin SMHP-esm2'senroot.shand4.train_docker_dpp.sh;DATA_HOME_DIRacross all three bionemo scripts). - Quoting was tightened along the way. The SMHP-esm2
enroot.shwent from unquoted$file_nameto"$file_name"and gained a trailing newline — small but correct hygiene improvements bundled sensibly with the path change. - Helpful override comments. Every introduced default is preceded by a short "Override … if your shared filesystem mount differs" comment, which makes the intent obvious at the point of use.
- Tightly scoped. Six files, all related to the same portability theme, no drive-by refactors. Easy to review.
Introduce DATA_HOME_DIR, ESM2_DATA_DIR, and NSYS_OUTPUT_DIR environment variables that default to the original values. Fix bionemo enroot.sh removing wrong path for bionemo.sqsh.
Purpose
Fixes portability issues where scripts assumed a specific user's filesystem layout, making them unusable without manual edits.
Changes
DATA_HOME_DIRandESM2_DATA_DIRenv vars (defaulting to original values)NSYS_OUTPUT_DIRenv varenroot.shremoving wrong path forbionemo.sqsh(was targeting source instead of enroot import output)#!/bin/bashTest Plan
N/A - shell variable substitution only; no behavioral change when env vars are unset (defaults match original hardcoded values).
Environment:
Test commands:
Test Results
No runtime changes when env vars are unset. Scripts produce identical behavior to before.
Directory Structure
No new directories or files added. Changes are to existing scripts only.
Checklist
mainbranch.latest).