Suspected OSSCI Related Inconsistencies / flakes #1043

Open
renxida opened this issue Mar 6, 2025 · 0 comments


renxida commented Mar 6, 2025

This issue tracks:

  • failures that reproduce on CI but not on dev machines, or vice versa
  • tests that fail intermittently on CI machines, possibly due to differences between machines

Huggingface Authentication Errors

E               huggingface_hub.errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-67c8d3e8-377d2afb1b6871bc6963be58;bb06d16a-9f8c-493f-9e37-fe7fd4f83d5a)
E               
E               Cannot access gated repo for url https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/vae/config.json.
E               Access to model black-forest-labs/FLUX.1-dev is restricted. You must have access to it and be authenticated to access it. Please log in.

Sometimes sharktank CI fails because Hugging Face refuses to serve model files, as in the gated-repo error above.
This section tracks which models fail, on which runner, with links to the logs.
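To make each entry easier to record, a test helper could wrap the download and print the model, runner, and run link when Hugging Face rejects the request. The sketch below is hypothetical: `HfDownloadFailure`, `describe_failure`, and `download_or_report` are illustrative names, not existing sharktank code. It assumes the GitHub Actions default variables `RUNNER_NAME`, `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, and `GITHUB_RUN_ID`, and that `huggingface_hub` picks up `HF_TOKEN` from the environment. Importing `GatedRepoError` from `huggingface_hub.errors` assumes a recent library version.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class HfDownloadFailure:
    """One tracking entry: which model, which runner, which log (hypothetical)."""
    repo_id: str
    filename: str
    runner: str
    log_url: str


def describe_failure(repo_id: str, filename: str,
                     env: Optional[Mapping[str, str]] = None) -> HfDownloadFailure:
    # GitHub Actions sets these default variables on every job.
    env = os.environ if env is None else env
    runner = env.get("RUNNER_NAME", "unknown-runner")
    server = env.get("GITHUB_SERVER_URL", "https://github.com")
    repo = env.get("GITHUB_REPOSITORY", "")
    run_id = env.get("GITHUB_RUN_ID", "")
    # Same URL shape as the run links quoted in this issue.
    log_url = f"{server}/{repo}/actions/runs/{run_id}" if repo and run_id else "unknown"
    return HfDownloadFailure(repo_id, filename, runner, log_url)


def download_or_report(repo_id: str, filename: str) -> str:
    # Imported lazily so describe_failure works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    from huggingface_hub.errors import GatedRepoError

    try:
        # With token unset, hf_hub_download falls back to HF_TOKEN / the cached login.
        return hf_hub_download(repo_id=repo_id, filename=filename)
    except GatedRepoError:
        # Print the tracking details, then let the test fail as before.
        print(describe_failure(repo_id, filename))
        raise
```

For the error above, `download_or_report("black-forest-labs/FLUX.1-dev", "vae/config.json")` would print the runner name and run URL before re-raising, so the entry can be copied into this issue.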

The Shortfin LLM Sharded Integration Tests pass locally but fail on CI with:

2025-03-05T19:17:22.6032392Z Memory access fault by GPU node-10 (Agent handle: 0x557ae2a43b20) on address 0x7ef88f5d0000. Reason: Unknown.

See: https://github.com/nod-ai/shark-ai/actions/runs/13681855838/job/38256089307?pr=1021
One suspected cause: the CI machines run ROCk module version 6.10.10, while the machine where the tests pass (mi300x-3) runs ROCk module version 6.12.3. Other version mismatches may also contribute.
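To compare this version across runners, one option is to read the loaded amdgpu module version from sysfs. This is a sketch under assumptions: it assumes the DKMS amdgpu (ROCk) driver exposes its version at `/sys/module/amdgpu/version`, and `parse_version` and `read_rock_version` are hypothetical helpers, not part of any ROCm tool.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumption: with the amdgpu DKMS driver loaded, the ROCk module version
# (e.g. "6.10.10") is readable here. Verify the path on your runners.
AMDGPU_VERSION_FILE = Path("/sys/module/amdgpu/version")


def parse_version(text: str) -> Tuple[int, ...]:
    # "6.10.10" -> (6, 10, 10); tuples compare component by component as integers.
    return tuple(int(part) for part in text.strip().split("."))


def read_rock_version(path: Path = AMDGPU_VERSION_FILE) -> Optional[Tuple[int, ...]]:
    try:
        return parse_version(path.read_text())
    except (OSError, ValueError):
        # Missing file (e.g. no DKMS driver) or a non-numeric version string.
        return None


if __name__ == "__main__":
    local = read_rock_version()
    print("ROCk module version:", ".".join(map(str, local)) if local else "unknown")
```

Versions are compared as integer tuples because plain string comparison misorders components such as `6.9` versus `6.10`.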
