Implement YAML+Jinja2 support, code re-org, remove out-dated functions by hagertnl · Pull Request #264 · olcf/olcf-test-harness

hagertnl · 2026-04-30T19:25:22Z

A few interesting changes in this PR:

Implement a YAML test input file (rgt_test_input.yaml), plus a .template.j2 job template
Remove all the PYTHONPATH stuff from the modulefile in favor of a single sys.path addition in the core user-facing entry points (e.g., runtests.py). This is much better practice than polluting the heck out of the environment.
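
For anyone curious what the sys.path approach can look like, here is a minimal sketch. It is not the code from this PR: the helper name `add_harness_root` and the `<root>/bin/<script>.py` layout are assumptions made purely for illustration.

```python
import sys
from pathlib import Path


def add_harness_root(entry_point: str, levels_up: int = 1) -> Path:
    """Put the harness root on sys.path, derived from the script's own location.

    Assumes a hypothetical layout of <root>/bin/<script>.py, so the root is
    one directory above the script's parent (levels_up=1).
    """
    root = Path(entry_point).resolve().parents[levels_up]
    root_str = str(root)
    # Insert only once, so repeated calls do not grow sys.path.
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
    return root


if __name__ == "__main__":
    # In a real entry point this would run before any harness imports.
    print(add_harness_root(__file__))
```

The change stays local to the Python process that needs it, so nothing leaks into the user's shell environment the way a modulefile-level PYTHONPATH edit does.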

Planned changes before merging:

Re-structure machine_types to split out the schedulers, and move rgt_tests.py over to libraries, where it belongs
Possibly other modernization, as needed

hagertnl · 2026-04-30T20:26:06Z

Added a lot more cleanup here now:

broke out schedulers into their own directory
removed old RunTimeEnvironment section of .ini file (never used, probably broken)
removed get_new_environment functionality; will re-implement later if needed, but the scheduler provides most of this type of functionality nowadays
removed unnecessarily-spec'd replacements: walltime, total_processes, processes_per_node, executable_path. The harness doesn't need to know any of these; the user should be free to define whatever fields they want, for example to provide a time limit to their batch job. The primary example is providing both Slurm and LSF job scripts for a single test: a single walltime field can't serve both, because the two schedulers use different time formats
removed IBM_POWER9 "machine_type", since it can be fully represented by linux_x86_64. There is nothing specific to IBM POWER9 to require a different machine_type
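
To make the time-format point concrete, here is a small sketch of how the same limit is spelled for each scheduler. The function names are hypothetical and nothing here is harness code; it only relies on Slurm's `--time` accepting `hours:minutes:seconds` and LSF's `-W` accepting `[hour:]minute`.

```python
from typing import Tuple


def _split_minutes(total_minutes: int) -> Tuple[int, int]:
    """Split a positive minute count into (hours, minutes)."""
    if total_minutes <= 0:
        raise ValueError("time limit must be a positive number of minutes")
    return divmod(total_minutes, 60)


def slurm_time(total_minutes: int) -> str:
    """Format a limit as hours:minutes:seconds, one of the forms sbatch --time accepts."""
    hours, minutes = _split_minutes(total_minutes)
    return f"{hours:02d}:{minutes:02d}:00"


def lsf_time(total_minutes: int) -> str:
    """Format a limit as hour:minute, the form bsub -W accepts."""
    hours, minutes = _split_minutes(total_minutes)
    return f"{hours}:{minutes:02d}"


if __name__ == "__main__":
    print(slurm_time(90))  # 01:30:00
    print(lsf_time(90))    # 1:30
```

A test that ships both job scripts would therefore want two differently formatted values, which is why the field name and format are left to the user.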

hagertnl · 2026-05-04T15:07:11Z

I just finished implementing better CTRL+C handling as well, to bundle into this PR:

$ ./run_tests.sh
Failed to import Kafka backend: No module named 'confluent_kafka'
Failed to import Kafka backend: No module named 'confluent_kafka'
Starting tasks for harness_unit_tests.test_long_build_long_run: ['start_tests', 'stop_tests']
Using machine config: borg.ini
Failed to import Kafka backend: No module named 'confluent_kafka'
Failed to import Kafka backend: No module named 'confluent_kafka'
Path to Source: /autofs/nccs-svm1_proj/stf243/hagertnl/harness/unit_tests/harness_unit_tests/Source
Path to Build: /lustre/orion/proj-shared/stf243/hagertnl/harness_sspace/borg/05.04.26-10.58/harness_unit_tests/test_long_build_long_run/1777906730.3121712/build_directory
Path to Run_Archive: /autofs/nccs-svm1_proj/stf243/hagertnl/harness/unit_tests/harness_unit_tests/test_long_build_long_run/Run_Archive/1777906730.3121712
^CDetected CTRL+C, aborting build.
No submit action due to prior failed build.
The command 'test_harness_driver.py -r -l borg_test/hagertnl@2026-05-04T10:58:50.10 --loglevel WARNING' has exited with a failure.
The exit return value is 1.

Starting tasks for harness_unit_tests.test_long_build_long_run: ['start_tests', 'stop_tests']
Test harness_unit_tests.test_long_build_long_run failed to launch.


Using machine config: borg.ini
Failed to import Kafka backend: No module named 'confluent_kafka'
Failed to import Kafka backend: No module named 'confluent_kafka'
Path to Source: /autofs/nccs-svm1_proj/stf243/hagertnl/harness/unit_tests/harness_unit_tests/Source
Path to Build: /lustre/orion/proj-shared/stf243/hagertnl/harness_sspace/borg/05.04.26-10.58/harness_unit_tests/test_long_build_long_run/1777906733.029087/build_directory
Path to Run_Archive: /autofs/nccs-svm1_proj/stf243/hagertnl/harness/unit_tests/harness_unit_tests/test_long_build_long_run/Run_Archive/1777906733.029087
SLURM jobID = 600345
Test harness_unit_tests.test_long_build_long_run is launched.


Launched 1 tests, failed to launch 1 tests.
Failed tests:
	harness_unit_tests.test_long_build_long_run

A CTRL+C now cancels the currently-running build and lets that build log a failed build_end event before exiting, leaving behind appropriate breadcrumbs. The main thread effectively ignores any CTRL+C. In the future, we may want "two CTRL+C's within 3 seconds cancels everything" functionality, but I have a feeling we may move away from this method of multithreading test submissions as we push to modernize, so I don't think it's worth the development time now.

hagertnl · 2026-05-05T12:28:15Z

Full list of changes that I think are in this PR:

Added YAML input file & Jinja2 template support (must be used together)
Removed all the PYTHONPATH modifications from the modulefile in favor of a single sys.path addition in the user-facing entry points (e.g., runtests.py, test_harness_driver.py, etc.). This is better practice than polluting the heck out of the user's global Python environment.
Moved schedulers into their own directory instead of being under "machine_types"
Removed old RunTimeEnvironment section of .ini file (never used, probably doesn't even work, and not a feature I'd advise folks to use)
Removed the now-unused get_new_environment functionality (was part of RunTimeEnvironment); will re-implement later if needed, but the scheduler provides most of this type of functionality nowadays
Removed unnecessarily-spec'd replacements: walltime, total_processes, processes_per_node, executable_path. The harness doesn't need to know any of these; the user should be free to define whatever fields they want, for example to provide a time limit to their batch job. The primary example is providing both Slurm and LSF job scripts for a single test: a single walltime field can't serve both, because the two schedulers use different time formats. So the exact field should be up to the user. If they want to hard-code a 10-minute wall time in the batch script, more power to them.
Removed IBM_POWER9 "machine_type", since it can be fully represented by linux_x86_64. There is nothing specific to IBM POWER9 that requires a different machine_type
Added graceful handling of CTRL+C. New behavior is to gracefully cancel the currently-running build steps, but NOT cancel the parent process that launched them.
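
As a rough illustration of how a YAML input and a Jinja2 job template fit together, here is a minimal sketch. The keys (`job_name`, `nodes`, `batch_time`), the template contents, and `render_job_script` are all invented for this example and are not the schema or API from this PR; see the updated docs for the real format. The guarded import is also just one way to handle a missing PyYAML or Jinja2 install.

```python
try:
    import yaml
    from jinja2 import Environment, StrictUndefined
    HAVE_TEMPLATING = True
except ImportError:
    # Fail later with a clear message instead of at import time.
    HAVE_TEMPLATING = False

# Hypothetical test input; the time is quoted so YAML keeps it a string.
TEST_INPUT_YAML = """
job_name: hello_oth
nodes: 2
batch_time: "00:10:00"
"""

# Hypothetical Slurm job template; any user-defined key can be referenced.
JOB_TEMPLATE = """#!/bin/bash
#SBATCH -J {{ job_name }}
#SBATCH -N {{ nodes }}
#SBATCH -t {{ batch_time }}
srun -N {{ nodes }} ./a.out
"""


def render_job_script(yaml_text: str, template_text: str) -> str:
    """Load user-defined values from YAML and substitute them into the template."""
    if not HAVE_TEMPLATING:
        raise RuntimeError("PyYAML and Jinja2 are required for YAML-based test inputs")
    values = yaml.safe_load(yaml_text)
    # StrictUndefined turns a misspelled or missing key into an error
    # rather than silently rendering an empty string.
    env = Environment(undefined=StrictUndefined, keep_trailing_newline=True)
    return env.from_string(template_text).render(**values)


if __name__ == "__main__" and HAVE_TEMPLATING:
    print(render_job_script(TEST_INPUT_YAML, JOB_TEMPLATE))
```

Because the template only sees whatever keys the user put in the YAML, a test can carry a Slurm template and an LSF template side by side, each reading its own time field.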

hagertnl · 2026-05-11T14:31:09Z

@ddietz89 @AcerP-py, the docs have been updated to include YAML+Jinja2 descriptions & examples; let me know if there are any more changes you need! I'm currently evaluating how safe it is to switch /sw/acceptance/olcf-test-harness-dev to this branch for in-production testing now. Almost certain it's backwards compatible, at least for Frontier's context (obviously, POWER9 was removed, so it's not 100% backwards compatible).

hagertnl added 4 commits April 30, 2026 13:53

Move from PYTHONPATH in the modulefile to using sys.path modifications to the public interfaces of the OTH

c5c11ce

Implement Jinja2+YAML template functionality. Basic checks passing

ea83cf4

Re-organizing scheduler and rgt_test classes, moving sole class from fundamental_types into libraries

144df39

Updating __init__.py for new directory names

32cfb93

hagertnl linked an issue Apr 30, 2026 that may be closed by this pull request

Feature Request: Add support for YAML test input file and Jinja2 scheduler job templates #219

Open

hagertnl added 3 commits April 30, 2026 15:51

Removing IBM POWER9 type, since it can easily be covered by linux_x86_64 case and is an old architecture

1e53cfd

Removing unnecessarily-spec'd replacements

2d5f3ae

Removing RunTimeEnvironment options and new_environment functionality. Will re-implement if needed

f9b3f20

hagertnl linked an issue Apr 30, 2026 that may be closed by this pull request

Walltime required, but possible different time format required for LSF vs Slurm #178

Open

hagertnl marked this pull request as ready for review April 30, 2026 20:26

hagertnl changed the title ~~Implement YAML+Jinja2 support, code re-org~~ Implement YAML+Jinja2 support, code re-org, remove out-dated functions Apr 30, 2026

Fixing issue 168, logging build_end when CTRL+C'd. Still need to improve handling at the top launcher level.

1dca02d

hagertnl linked an issue Apr 30, 2026 that may be closed by this pull request

Feature Request: log end event when a test is CTRL+C'd #168

Open

hagertnl added 2 commits May 1, 2026 14:09

Fix template extension detection

ae0bd27

Fully implement a fix for issue olcf#168, CTRL+C behavior

d20b921

Merged devel into nick-issue219-yaml-jinja2

e4802ad

AcerP-py reviewed May 5, 2026

hagertnl added 9 commits May 5, 2026 16:38

Switch all user-facing script path additions to use modern pathlib

c8f00b8

Make several conversions from os.path to pathlib

4ddd390

Small typo/bug/merge fixes

9fdf15c

Replacing low-hanging os.path.exists fruit with the equivalent Path

aee3597

More os.path.exists to Path conversions

0f17a4d

Small pathlib-related fixes

80c9e79

Implemented variables and replacements blocks (with string formatting feature) in the YAML-based test inputs

a776c1f

Updating user guide

f18d0be

Re-building docs with latest content, added sphinx-design dependency to enable tab sets

f0161f8

hagertnl added 10 commits May 11, 2026 11:18

Fixed typo in update_databases.py, made report_cmd optional

1b389b1

A few more 'import sys' bug fixes

62bc5f0

Removed lone reference to OLCF_HARNESS_DIR envvar, replaced with a Path derived from absolute path of the current file

2611a25

Merge branch 'nick-issue219-yaml-jinja2' of github.com:hagertnl/olcf-test-harness into nick-issue219-yaml-jinja2

99a2780

Add a much better default value in status_file if PATH_TO_RGT_PACKAGE is not set

af2e6ff

Merge remote-tracking branch 'olcf/devel' into nick-issue219-yaml-jinja2

641b8f5

Merged upstream app/test filters into YAML branch

5b68bd3

Updating docs to cover app-filter and test-filter

c583350

Re-build docs

f6028d2

Hot-fix for un-handled Jinja2 import error in linux_utilities.py

7b860ff

hagertnl wants to merge 30 commits into olcf:devel from hagertnl:nick-issue219-yaml-jinja2