I am getting the following error when trying to run the SageMaker notebook:
UnexpectedStatusException: Error for Training job pytorch-training-2023-05-19-19-07-59-014: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
Command "/opt/conda/bin/python3.6 train_xgboost_airline.py"
2023-05-19 19:13:50,343 INFO worker.py:1432 -- Connecting to existing Ray cluster at address: 10.2.116.47:9339...
2023-05-19 19:13:50,364 INFO worker.py:1625 -- Connected to Ray cluster.
Traceback (most recent call last):
  File "train_xgboost_airline.py", line 125, in <module>
    main()
  File "train_xgboost_airline.py", line 107, in main
    evals=[(dtrain, "train"), (dval, "val")])
  File "/opt/conda/lib/python3.6/site-packages/xgboost_ray/main.py", line 1565, in train
    placement_strategy,
  File "/opt/conda/lib/python3.6/site-packages/xgboost_ray/main.py", line 959, in _create_placement_group
    f"Placement group creation timed out after {timeout} seconds. "
TimeoutError: Placement group creation timed out after 100 seconds. Make sure your cluster either has enough resources or use an autoscaling cluster. Current resources available: {'node:10.2.116.47': 0.98, 'memory': 39562652059.0, 'CPU': 16.0, 'acc
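The timeout means the placement group for the training actors could not be reserved within 100 seconds, so the requested resources probably exceed what the cluster reports as available. The sketch below is one rough way to check a request against the resource dict printed in the error. It assumes the script asks for `num_actors` actors with `cpus_per_actor` CPUs (and optionally `gpus_per_actor` GPUs) each; the actual values used in train_xgboost_airline.py are not shown here, and the helper names `max_actors_that_fit` and `request_fits` are hypothetical. It only compares aggregate totals. A placement group also needs each bundle to fit on a single node, so the real limit can be lower.

```python
from typing import Mapping


def max_actors_that_fit(
    available: Mapping[str, float],
    cpus_per_actor: float,
    gpus_per_actor: float = 0.0,
) -> int:
    """Hypothetical helper: rough upper bound on how many actors can be placed.

    `available` is a resource dict like the one printed in the TimeoutError
    (in a live session, ray.available_resources() returns a dict of this shape).
    Only aggregate totals are checked, not per-node fit.
    """
    if cpus_per_actor <= 0:
        raise ValueError("cpus_per_actor must be positive")
    # How many CPU bundles of the requested size fit into the free CPUs.
    limit = int(available.get("CPU", 0.0) // cpus_per_actor)
    if gpus_per_actor > 0:
        # GPUs further constrain the count when each actor asks for them.
        limit = min(limit, int(available.get("GPU", 0.0) // gpus_per_actor))
    return limit


def request_fits(
    available: Mapping[str, float],
    num_actors: int,
    cpus_per_actor: float,
    gpus_per_actor: float = 0.0,
) -> bool:
    """Hypothetical helper: True if num_actors bundles fit by aggregate totals."""
    return num_actors <= max_actors_that_fit(available, cpus_per_actor, gpus_per_actor)


# Values taken from the (truncated) dict in the error message above.
available = {"CPU": 16.0, "memory": 39562652059.0}
print(max_actors_that_fit(available, cpus_per_actor=4))
print(request_fits(available, num_actors=4, cpus_per_actor=8))
```

If a request does not fit, lowering the counts passed to xgboost_ray, for example `RayParams(num_actors=4, cpus_per_actor=4)` on a 16-CPU cluster, or requesting no GPUs when the cluster reports none, is one thing to try, alongside the error's own suggestion of using an autoscaling cluster.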