Open
Labels
bug (Something isn't working)
Description
Bug report
Running the following:
python3 maxtext/tools/orchestration/multihost_runner.py \
--ZONE=$ZONE \
--TPU_PREFIX=$QR_NAME \
--COMMAND="bash maxtext/tools/setup/setup.sh MODE=nightly LIBTPU_GCS_PATH=${LIBTPU_GCS_PATH}"
leads to a CalledProcessError.
This was run on a 2-slice v6e-8 (created from a Queued Resource).
Logs/Output
Starting multihost runner...
2 slices found.
Traceback (most recent call last):
  File "maxtext/tools/orchestration/multihost_runner.py", line 493, in <module>
    main()
    ~~~~^^
  File "maxtext/tools/orchestration/multihost_runner.py", line 464, in main
    slices = get_slices()
  File "maxtext/tools/orchestration/multihost_runner.py", line 182, in get_slices
    completed_command = subprocess.run(command, capture_output=True, check=True)
  File "/usr/lib/python3.13/subprocess.py", line 577, in run
    raise CalledProcessError(retcode, process.args,
                             output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['gcloud', 'compute', 'tpus', 'describe', 'REDACTED_QR_NAME-1', '--flatten=networkEndpoints[]', '--format=csv[no-heading](networkEndpoints.ipAddress)', '--project=REDACTED_PROJECT_ID', '--zone=us-central2-b']' returned non-zero exit status 2.
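The traceback above reports only the exit status. Because the call uses capture_output=True, whatever gcloud wrote to stderr is stored on the exception and never printed. A minimal sketch of how that text could be surfaced, assuming a hypothetical helper named run_and_report (not part of multihost_runner.py) and using a stand-in child process in place of the real gcloud call:

```python
import subprocess
import sys


def run_and_report(command):
    """Run `command`; on failure, re-raise with the captured stderr attached.

    Hypothetical helper for illustration, not part of multihost_runner.py.
    """
    try:
        return subprocess.run(command, capture_output=True, check=True, text=True)
    except subprocess.CalledProcessError as err:
        # With capture_output=True the child's stderr is kept on the exception
        # and is not included in the default error message.
        details = (err.stderr or "").strip()
        raise RuntimeError(
            f"{command[0]} exited with status {err.returncode}: {details}"
        ) from err


# Stand-in for the failing gcloud call: a child process that writes to stderr
# and exits with status 2.
failing = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('no such resource'); sys.exit(2)",
]
try:
    run_and_report(failing)
except RuntimeError as exc:
    print(exc)
```

Wrapping the gcloud invocation this way would show the message gcloud itself printed, which the log above does not contain.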
Environment Information
Ran from commit 796eaeb.
Additional Context