Updated test logging and timeouts #608

Open
Micky774 wants to merge 6 commits into dev from zain/ci-test-logging

Micky774 commented Jun 2, 2026

Description

This PR improves CI test logging by:

  1. Adding a fixed timeout to every pytest run so that CI cannot hang indefinitely
  2. Enabling the partially implemented JUNITXML_PREFIX logging
  3. Adding a parser script to digest the (newly enabled) XML test report
  4. Adding a step that always runs (even if previous steps fail) to digest the XML via the parser script
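
For readers unfamiliar with the report format, here is a minimal sketch of what digesting a JUnit XML report can look like. It is not the `ci/junit_report.py` added by this PR; the function name `summarize_junit` and the sample XML are hypothetical, and the sketch only assumes the usual pytest layout of `<testcase>` elements with optional `<failure>`, `<error>`, or `<skipped>` children.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_junit(xml_text: str) -> Counter:
    """Count test outcomes in one JUnit XML document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # iter() walks the whole tree, so this works whether the root is
    # <testsuites> or a bare <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None:
            counts["failed"] += 1
        elif case.find("error") is not None:
            counts["error"] += 1
        elif case.find("skipped") is not None:
            counts["skipped"] += 1
        else:
            counts["passed"] += 1
    return counts


# Made-up sample report, for illustration only.
SAMPLE = """<testsuites>
  <testsuite name="pytest" tests="3">
    <testcase classname="t" name="test_ok"/>
    <testcase classname="t" name="test_bad"><failure message="boom"/></testcase>
    <testcase classname="t" name="test_skip"><skipped message="n/a"/></testcase>
  </testsuite>
</testsuites>"""

summary = summarize_junit(SAMPLE)
print(dict(summary))
```

Running the digest in a step that always executes means a summary like this is still produced when an earlier test step fails or times out.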

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Micky774 added the "ci-level 3" (CI test level 3) label Jun 3, 2026
Comment thread ci/_utils.sh Outdated
# traceback instead of the run silently timing out hours later.
# All are overridable from the environment.
export PYTHONFAULTHANDLER=1
: ${PYTEST_TIMEOUT:=1200} # per-test (per-parametrization) timeout, seconds
Collaborator

I would say 20 minutes for an individual test is overkill.

Contributor Author

Updated to 5 minutes.
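
The snippet under review relies on the shell idiom `: ${NAME:=default}`, which makes the timeout overridable from the environment. A minimal Python sketch of the same semantics follows; the helper name `env_default` is hypothetical, and the value 300 is only an assumption derived from the "5 minutes" reply above, not taken from the final diff.

```python
import os


def env_default(name: str, default: str, env=None) -> str:
    """Mimic the shell idiom `: ${NAME:=default}` (hypothetical helper).

    The default is applied when NAME is unset *or* empty, and the
    resulting value is stored back so child processes would inherit it.
    """
    env = os.environ if env is None else env
    if not env.get(name):
        env[name] = default
    return env[name]


# A plain dict stands in for the real environment in this illustration.
fake_env = {"PYTEST_TIMEOUT": "60"}
timeout = env_default("PYTEST_TIMEOUT", "300", fake_env)      # caller's value wins
fault = env_default("PYTHONFAULTHANDLER", "1", fake_env)      # default is applied
print(timeout, fault)
```

The point of the pattern is that CI gets a sane default while anyone running the scripts locally can still set a different timeout before invoking them.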

Comment thread .github/workflows/rocm-ci.yml Outdated
if: always()
run: |
command -v python3 >/dev/null 2>&1 || { echo "python3 not available; skipping report"; exit 0; }
python3 ci/junit_report.py test-results \
Collaborator

Can separate reports be generated for PyTorch/JAX/Core then? It will require passing JUNITXML_PREFIX=${JUNITXML_PREFIX}/[torch|jax|core], but it will probably be much more user-friendly than having all tests in one report.

Contributor Author

Updated

Comment thread ci/_utils.sh
fi
}

get_ctest_junitxml() {
Collaborator

Please also update the README to indicate that the core tests honor the JUNITXML* environment variables.

Contributor Author

Micky774 Jun 5, 2026

Updated

Micky774 and others added 5 commits June 5, 2026 15:41
The sGPU job ran pytorch.sh/jax.sh/core.sh in parallel into a shared
test-results/ dir and merged them into one report. Give each suite its
own subdir via a per-subprocess JUNITXML_PREFIX override and emit one
junit_report.py report per suite.

This also fixes a latent collision: test_sanity_import.py runs in both
the torch and jax suites at level 1/auto, so both wrote
test-results/test_sanity_import.auto.xml in parallel and clobbered each
other. Per-suite subdirs resolve it.

mGPU is unchanged (already per-framework via the build matrix).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
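
The collision described in the commit message above can be illustrated with a small sketch. The naming rule `<prefix>/<test stem>.<level>.xml` is inferred from the single path quoted in the message, the function `report_path` is hypothetical, and the subdirectory names `torch` and `jax` are assumptions based on the review suggestion rather than the actual diff.

```python
from pathlib import PurePosixPath


def report_path(prefix: str, test_file: str, level: str) -> str:
    """Hypothetical naming rule: <prefix>/<test stem>.<level>.xml"""
    stem = PurePosixPath(test_file).stem
    return str(PurePosixPath(prefix) / f"{stem}.{level}.xml")


suites = ["torch", "jax"]

# Shared prefix: both suites map the same test file to the same report path.
shared = {
    s: report_path("test-results", "test_sanity_import.py", "auto")
    for s in suites
}

# Per-suite prefix: each suite writes into its own subdirectory.
per_suite = {
    s: report_path(f"test-results/{s}", "test_sanity_import.py", "auto")
    for s in suites
}

print(shared)
print(per_suite)
```

With a shared prefix, two suites running in parallel target one file, so whichever finishes last overwrites the other; with per-suite prefixes the paths are distinct.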

Micky774 commented Jun 5, 2026

Example of test results:
[image]
