Updated test logging and timeouts #608
base: dev
Changes from 1 commit
33d8fe2
5e21c6c
aa587fb
2d5ca6f
f39b0f3
ba88e02
```diff
@@ -23,6 +23,17 @@ TEST_START_TS=`date +%s`
 #To disable some logs trimming
 export CI=1
 
+# Crash/hang visibility and bounding:
+# - PYTHONFAULTHANDLER dumps a Python traceback on fatal signals (segfaults).
+# - PYTEST_TIMEOUT bounds every individual test item so a single hang cannot
+#   stall the whole CI job; the offending test is recorded as a failure with a
+#   traceback instead of the run silently timing out hours later.
+# All are overridable from the environment.
+export PYTHONFAULTHANDLER=1
+: ${PYTEST_TIMEOUT:=1200} # per-test (per-parametrization) timeout, seconds
```
Collaborator
I would say 20 minutes for an individual test is overkill.
Contributor
Author
Updated to 5 min.
```diff
+: ${PYTEST_TIMEOUT_METHOD:=thread} # 'thread' reliably unsticks GPU/collective hangs
+: ${CTEST_TIMEOUT:=1200} # per-cpp-test timeout, seconds
+
 _script_error_count=0
 _run_error_count=0
 _ignored_error_count=0
@@ -213,6 +224,12 @@ get_pytest_junitxml() {
     fi
 }
 
+get_ctest_junitxml() {
```
Collaborator
Please also update the README to indicate that the core tests honor the JUNITXML* environment variables.
Contributor
Author
Updated.
```diff
+    if [ -n "$JUNITXML_PREFIX$JUNITXML_SUFFIX" ]; then
+        echo "--output-junit ${JUNITXML_PREFIX}$1${JUNITXML_SUFFIX}"
+    fi
+}
+
 check_test_filter() {
     test -z "$TEST_FILTER" && return 0
     for _tf in $TEST_FILTER; do
@@ -266,7 +283,12 @@ pytest_run() {
     check_test_filter $_test_name_tag || return
     _start_ts=`date +%s`
     echo "Run [$_test_variant_tag] $@ at `time_elapsed $TEST_START_TS`"
-    python3 -m pytest -v -rfEs `get_pytest_junitxml $_test_name_tag` $TEST_PYTEST_ARGS "$TEST_DIR/$@"
+    # A per-test timeout is applied to every item. Callers may still append their
+    # own --timeout/--timeout-method (e.g. distributed tests); since argparse
+    # takes the last value, a caller-supplied override wins over these defaults.
+    python3 -m pytest -v -rfEs \
+        --timeout=$PYTEST_TIMEOUT --timeout-method=$PYTEST_TIMEOUT_METHOD \
+        `get_pytest_junitxml $_test_name_tag` $TEST_PYTEST_ARGS "$TEST_DIR/$@"
     test $? -eq 0 || test_run_error "[$_test_variant_tag] $1"
     echo "Done [$_test_variant_tag] $1 in `time_elapsed $_start_ts`"
 }
```
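The comment in `pytest_run` relies on argparse keeping the last occurrence of a repeated option, so that anything in `$TEST_PYTEST_ARGS` overrides the defaults placed before it. A minimal sketch of that behavior follows. It uses a stand-alone `argparse.ArgumentParser` as a stand-in for pytest's own option handling, and the `parse_timeout` helper and the `300`/`60` values are illustrative assumptions, not taken from the PR.

```python
import argparse


def parse_timeout(argv):
    """Parse timeout options with plain argparse 'store' actions (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--timeout", type=float, default=None)
    parser.add_argument("--timeout-method", default="signal")
    return parser.parse_args(argv)


# Script defaults come first; a caller-supplied override is appended afterwards.
args = parse_timeout(["--timeout=300", "--timeout-method=thread", "--timeout=60"])
print(args.timeout, args.timeout_method)
```

The later `--timeout=60` replaces the earlier `--timeout=300`, while `--timeout-method`, given only once, keeps its value.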
@@ -0,0 +1,201 @@

```python
#!/usr/bin/env python3
# Copyright (c) 2026, Advanced Micro Devices, Inc. All rights reserved.
#
# See LICENSE for license information.

"""Summarize pytest/ctest JUnit XML results into a GitHub Actions report.

Reads every ``*.xml`` file in the given directory -- each produced by a single
pytest file invocation (see ``get_pytest_junitxml`` in ``_utils.sh``) or a
ctest run -- aggregates pass/fail/error/skip/timeout counts, writes a Markdown
digest to ``$GITHUB_STEP_SUMMARY`` (or stdout when run locally), and emits
``::error::`` workflow annotations for the failing tests.

Design notes:
* Standard library only, so it runs on any runner without provisioning.
* Purely informational -- it always exits 0 and never gates the job. The
  pass/fail gate stays with the existing ``FAIL_*`` markers / suite exit
  codes. This keeps the change strictly additive.
* A run that is cut off mid-way (hang/crash/job-timeout) still produces a
  digest for every test file that finished. Files whose XML is missing or
  truncated are surfaced explicitly as "incomplete" rather than silently
  dropped, which is exactly the signal that is invisible today.
"""

import glob
import os
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict

# UI shows at most ~10 annotations of each level; cap to keep the log readable.
ANNOTATION_CAP = 20


def iter_testsuites(root):
    """Yield every <testsuite>; handles both <testsuites> and bare roots."""
    if root.tag == "testsuite":
        yield root
    else:
        yield from root.iter("testsuite")


def classify(testcase):
    """Return (status, first-line-message) for a <testcase>.

    status is one of: passed, failed, error, skipped.
    """
    for tag, status in (("failure", "failed"), ("error", "error"), ("skipped", "skipped")):
        el = testcase.find(tag)
        if el is not None:
            msg = (el.get("message") or el.text or "").strip()
            return status, msg
    return "passed", ""


def is_timeout(message):
    m = message.lower()
    return "timeout" in m or "timed out" in m


def emit(lines):
    """Append the report to the step summary if available, else stdout."""
    text = "\n".join(lines) + "\n"
    summary = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary:
        with open(summary, "a", encoding="utf-8") as fh:
            fh.write(text)
    else:
        sys.stdout.write(text)


def main():
    if len(sys.argv) < 2:
        print("usage: junit_report.py <results-dir> [--title TITLE]", file=sys.stderr)
        return 0
    results_dir = sys.argv[1]
    title = "Test Results"
    if "--title" in sys.argv:
        title = sys.argv[sys.argv.index("--title") + 1]

    xml_files = sorted(glob.glob(os.path.join(results_dir, "*.xml")))

    lines = []
    lines.append(f"## {title}\n")

    if not xml_files:
        lines.append(
            "> :warning: **No JUnit XML files were produced.** No test file "
            "completed far enough to write results -- the run likely crashed or "
            "hung before any suite finished. Inspect the uploaded `*.log` "
            "artifacts to see where it stopped.\n"
        )
        emit(lines)
        return 0

    totals = defaultdict(float)  # passed/failed/error/skipped/timeout/incomplete/time
    per_file = []  # (name, counts, time)
    failures = []  # (file, testid, label, message)

    for xf in xml_files:
        name = os.path.basename(xf)[: -len(".xml")]
        counts = defaultdict(int)
        suite_time = 0.0

        try:
            root = ET.parse(xf).getroot()
        except (ET.ParseError, OSError) as exc:
            # Truncated/unreadable XML => the pytest process was killed while
            # writing it (hard timeout, segfault, or job cancellation).
            counts["incomplete"] += 1
            totals["incomplete"] += 1
            per_file.append((name, counts, 0.0))
            failures.append((name, "(whole file)", "incomplete",
                             f"unparseable/truncated XML: {exc}"))
            continue

        for ts in iter_testsuites(root):
            try:
                suite_time += float(ts.get("time") or 0.0)
            except ValueError:
                pass
            for tc in ts.findall("testcase"):
                status, msg = classify(tc)
                counts[status] += 1
                totals[status] += 1
                if status in ("failed", "error"):
                    label = status
                    if is_timeout(msg):
                        label = "timeout"
                        totals["timeout"] += 1
                    cls = tc.get("classname", "")
                    tcname = tc.get("name", "")
                    testid = f"{cls}::{tcname}" if cls else tcname
                    failures.append((name, testid, label,
                                     msg.splitlines()[0] if msg else ""))

        totals["time"] += suite_time
        per_file.append((name, counts, suite_time))

    n_pass = int(totals["passed"])
    n_fail = int(totals["failed"])
    n_err = int(totals["error"])
    n_skip = int(totals["skipped"])
    n_to = int(totals["timeout"])
    n_incomplete = int(totals["incomplete"])
    total_tests = n_pass + n_fail + n_err + n_skip

    ok = (n_fail + n_err + n_incomplete) == 0
    headline = ":white_check_mark:" if ok else ":x:"
    summary_line = (
        f"{headline} **{total_tests} tests** -- {n_pass} passed, {n_fail} failed, "
        f"{n_err} errored, {n_skip} skipped"
    )
    if n_to:
        summary_line += f" ({n_to} timed out)"
    if n_incomplete:
        summary_line += f"; **{n_incomplete} file(s) incomplete**"
    summary_line += f" -- {totals['time']:.0f}s across {len(xml_files)} files\n"
    lines.append(summary_line)

    # Per-file breakdown (collapsed to keep the summary scannable).
    lines.append("<details><summary>Per-file breakdown</summary>\n")
    lines.append("| Test file (backend.label) | Pass | Fail | Error | Skip | Time (s) |")
    lines.append("|---|---:|---:|---:|---:|---:|")
    for name, counts, t in per_file:
        bad = counts["failed"] + counts["error"] + counts["incomplete"]
        mark = "" if bad == 0 else " :warning:"
        lines.append(
            f"| {name}{mark} | {counts['passed']} | {counts['failed']} | "
            f"{counts['error']} | {counts['skipped']} | {t:.0f} |"
        )
    lines.append("\n</details>\n")

    # Failure / error / timeout detail (always expanded -- this is the payload).
    if failures:
        lines.append("### Failures / errors / timeouts\n")
        for fname, testid, label, msg in failures:
            entry = f"- **[{label}]** `{testid}` _(in {fname})_"
            if msg:
                entry += f" -- {msg}"
            lines.append(entry)
        lines.append("")

    emit(lines)

    # Inline workflow annotations.
    for i, (fname, testid, label, msg) in enumerate(failures):
        if i >= ANNOTATION_CAP:
            print(
                f"::warning::{len(failures) - ANNOTATION_CAP} more failures "
                "omitted from annotations; see the job summary for the full list."
            )
            break
        body = msg or label
        print(f"::error title={label}: {testid}::{body}")

    return 0


if __name__ == "__main__":
    sys.exit(main())
```
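One detail of the annotation step may be worth a second look. The `::error title=...::` line interpolates `testid` directly, and pytest-style IDs contain `::`, which is also the delimiter between a workflow command's properties and its message. GitHub's `@actions/core` toolkit percent-encodes `%`, CR and LF in messages, and additionally `:` and `,` in property values, before issuing a command. The sketch below applies the same encoding. The helper names and the sample test ID are hypothetical, and whether the unescaped form actually renders incorrectly in the UI was not verified here.

```python
def escape_data(value):
    """Percent-encode the message part of a workflow command."""
    return (str(value)
            .replace("%", "%25")   # must come first so later escapes are not re-encoded
            .replace("\r", "%0D")
            .replace("\n", "%0A"))


def escape_property(value):
    """Percent-encode a property value (e.g. title=...); ':' and ',' are encoded too."""
    return (escape_data(value)
            .replace(":", "%3A")
            .replace(",", "%2C"))


def error_annotation(label, testid, body):
    """Build an ::error annotation with an escaped title and message."""
    title = escape_property(f"{label}: {testid}")
    return f"::error title={title}::{escape_data(body)}"


# Hypothetical test ID, for illustration only.
annotation = error_annotation("failed", "tests.test_ops::test_gemm[bf16]", "assert 1 == 2")
print(annotation)
```

With this encoding the only literal `::` left after the command name is the one that separates the properties from the message.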
Can separate reports be generated for PyTorch/JAX/Core then? It would require passing JUNITXML_PREFIX=${JUNITXML_PREFIX}/[torch|jax|core], but it would probably be much more user-friendly than having all tests in one report.
Updated
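The thread does not show how the split was implemented. As a rough illustration of the reviewer's idea only, the sketch below groups result files by the first sub-directory under the results root, so that each group could be summarized under its own title. The `group_by_framework` helper, the `results` directory and the file names are all hypothetical.

```python
import os
from collections import defaultdict


def group_by_framework(xml_paths, results_root):
    """Map the first sub-directory under results_root to its sorted XML paths."""
    groups = defaultdict(list)
    for path in xml_paths:
        rel = os.path.relpath(path, results_root)
        parts = rel.split(os.sep)
        # Files sitting directly in the root have no framework sub-directory.
        key = parts[0] if len(parts) > 1 else "(ungrouped)"
        groups[key].append(path)
    return {key: sorted(paths) for key, paths in sorted(groups.items())}


# Hypothetical layout matching JUNITXML_PREFIX=<root>/[torch|jax|core].
paths = [
    os.path.join("results", "torch", "test_numerics.xml"),
    os.path.join("results", "jax", "test_layers.xml"),
    os.path.join("results", "core", "ctest.xml"),
    os.path.join("results", "torch", "test_fused_attn.xml"),
    os.path.join("results", "misc.xml"),
]
groups = group_by_framework(paths, "results")
for framework, files in groups.items():
    print(framework, len(files))
```

Each group could then be passed to the report script separately, for example with a distinct `--title` per framework.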