Start 40th scheduler client in error for 1m nodes / 40 schedulers/ 25K nodes per scheduler #88

q131172019 · 2022-07-18T17:11:17Z

In the 'field goal' test (1M nodes / 40 schedulers / 25K nodes per scheduler, https://github.com/yb01/arktos/wiki/730-test), the 40th scheduler client can start in error when the schedulers are launched with the following bash for-loops without any delay between them.

ubuntu@ip-172-31-4-135:~/go/src/global-resource-service$ for i in {1..14}; do go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=6 > ~/TMP/simulator.6.$i.log.2022-07-15.v000052 2>&1 & done

ubuntu@ip-172-31-1-189:~/go/src/global-resource-service$ for i in {1..14}; do go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=9 > ~/TMP/simulator.8.$i.log.2022-07-15.v000052 2>&1 & done

ubuntu@ip-172-31-17-152:~/go/src/global-resource-service$ for i in {1..12}; do go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=9 > ~/TMP/simulator.7.$i.log.2022-07-15.v000052 2>&1 & done

40th client:
I0716 02:01:46.641067   77017 singleClientTest.go:184] End of results
I0716 02:01:46.641087   77017 stats.go:26] RegisterClientDuration: 53.424195ms
I0716 02:01:46.641101   77017 stats.go:39] ListDuration: 1.267543087s. Number of nodes listed: 25002
I0716 02:01:46.641111   77017 stats.go:56] Watch session last: 30m0.001236584s
I0716 02:01:46.641118   77017 stats.go:57] Number of nodes Added: 0
I0716 02:01:46.641126   77017 stats.go:58] Number of nodes Updated: 16693
I0716 02:01:46.641156   77017 stats.go:59] Number of nodes Deleted: 0
I0716 02:01:46.641162   77017 stats.go:60] Number of nodes watch prolonged than 1s: 10537

When a 2-second delay is added before starting each scheduler, the 40th and 41st schedulers cannot be allocated the 25K requested machines and fail with the error "Not enough hosts", which is expected.

--- 41 schedulers

ubuntu@ip-172-31-4-135:~/go/src/global-resource-service$ for i in {1..14}; do sleep 2;  go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=6 > ~/TMP/simulator.6.$i.log.2022-07-15.v000052 2>&1 & done
ubuntu@ip-172-31-1-189:~/go/src/global-resource-service$ for i in {1..14}; do sleep 2;  go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=9 > ~/TMP/simulator.8.$i.log.2022-07-15.v000052 2>&1 & done
ubuntu@ip-172-31-17-152:~/go/src/global-resource-service$ for i in {1..13}; do sleep 2;  go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=9 > ~/TMP/simulator.7.$i.log.2022-07-15.v000052 2>&1 & done


ubuntu@ip-172-31-8-82:~/go/src/global-resource-service$ redis-cli info keyspace
# Keyspace
db0:keys=1000041,expires=0,avg_ttl=0

40th client:
I0716 03:07:02.869481   28054 installer.go:30] handle /client. URL path: /clients
I0716 03:07:02.869532   28054 installer.go:48] handle client registration
E0716 03:07:02.870499   28054 distributor.go:66] Error allocate resource for client. Error Not enough hosts
I0716 03:07:02.870526   28054 installer.go:79] error register client. error Not enough hosts

41st client:
I0716 03:07:04.862654   28054 installer.go:30] handle /client. URL path: /clients
I0716 03:07:04.862707   28054 installer.go:48] handle client registration
E0716 03:07:04.863704   28054 distributor.go:66] Error allocate resource for client. Error Not enough hosts
I0716 03:07:04.863744   28054 installer.go:79] error register client. error Not enough hosts


yb01 · 2022-07-18T21:41:53Z

Assigning to Carl to confirm whether a different client failed instead of the 41st one.

q131172019 · 2022-07-18T21:44:33Z

For the test with 40 scheduler clients:

ubuntu@ip-172-31-4-135:~/go/src/global-resource-service$ grep -i "nodes from service" ~/TMP/simulator.6.*.log.2022-07-14.v000047

/home/ubuntu/TMP/simulator.6.1.log.2022-07-14.v000047:I0715 02:19:04.610735   65558 singleClientTest.go:145] Got [25047] nodes from service
/home/ubuntu/TMP/simulator.6.10.log.2022-07-14.v000047:I0715 02:19:04.616474   65584 singleClientTest.go:145] Got [25075] nodes from service
/home/ubuntu/TMP/simulator.6.11.log.2022-07-14.v000047:I0715 02:19:04.326803   65514 singleClientTest.go:145] Got [25090] nodes from service
/home/ubuntu/TMP/simulator.6.12.log.2022-07-14.v000047:I0715 02:19:04.521134   65573 singleClientTest.go:145] Got [25013] nodes from service
/home/ubuntu/TMP/simulator.6.13.log.2022-07-14.v000047:I0715 02:19:03.725418   65529 singleClientTest.go:145] Got [25055] nodes from service
/home/ubuntu/TMP/simulator.6.14.log.2022-07-14.v000047:I0715 02:19:04.112944   65601 singleClientTest.go:145] Got [25045] nodes from service
/home/ubuntu/TMP/simulator.6.2.log.2022-07-14.v000047:I0715 02:19:04.181524   65547 singleClientTest.go:145] Got [25051] nodes from service
/home/ubuntu/TMP/simulator.6.3.log.2022-07-14.v000047:I0715 02:19:04.229525   65569 singleClientTest.go:145] Got [25011] nodes from service
/home/ubuntu/TMP/simulator.6.4.log.2022-07-14.v000047:I0715 02:19:04.603221   65592 singleClientTest.go:145] Got [25088] nodes from service
/home/ubuntu/TMP/simulator.6.5.log.2022-07-14.v000047:I0715 02:19:03.679677   65506 singleClientTest.go:145] Got [25076] nodes from service
/home/ubuntu/TMP/simulator.6.6.log.2022-07-14.v000047:I0715 02:19:04.546804   65553 singleClientTest.go:145] Got [25092] nodes from service
/home/ubuntu/TMP/simulator.6.7.log.2022-07-14.v000047:I0715 02:19:04.404384   65515 singleClientTest.go:145] Got [25079] nodes from service
/home/ubuntu/TMP/simulator.6.8.log.2022-07-14.v000047:I0715 02:19:03.934076   65535 singleClientTest.go:145] Got [25043] nodes from service
/home/ubuntu/TMP/simulator.6.9.log.2022-07-14.v000047:I0715 02:19:04.008116   65501 singleClientTest.go:145] Got [25011] nodes from service

ubuntu@ip-172-31-1-189:~/go/src/global-resource-service$ grep -i "nodes from service" ~/TMP/simulator.8.*.log.2022-07-15.v000052

/home/ubuntu/TMP/simulator.8.1.log.2022-07-15.v000052:I0715 23:42:55.840128   75031 singleClientTest.go:145] Got [25042] nodes from service
/home/ubuntu/TMP/simulator.8.10.log.2022-07-15.v000052:I0715 23:42:55.944416   75082 singleClientTest.go:145] Got [25070] nodes from service
/home/ubuntu/TMP/simulator.8.11.log.2022-07-15.v000052:I0715 23:42:55.704695   75076 singleClientTest.go:145] Got [25038] nodes from service
/home/ubuntu/TMP/simulator.8.12.log.2022-07-15.v000052:I0715 23:42:55.367043   75020 singleClientTest.go:145] Got [25052] nodes from service
/home/ubuntu/TMP/simulator.8.13.log.2022-07-15.v000052:I0715 23:42:55.952733   75048 singleClientTest.go:145] Got [25049] nodes from service
/home/ubuntu/TMP/simulator.8.14.log.2022-07-15.v000052:I0715 23:42:55.712086   75057 singleClientTest.go:145] Got [25060] nodes from service
/home/ubuntu/TMP/simulator.8.2.log.2022-07-15.v000052:I0715 23:42:55.209840   74999 singleClientTest.go:145] Got [25026] nodes from service
/home/ubuntu/TMP/simulator.8.3.log.2022-07-15.v000052:I0715 23:42:55.136889   75089 singleClientTest.go:145] Got [25039] nodes from service
/home/ubuntu/TMP/simulator.8.4.log.2022-07-15.v000052:I0715 23:42:55.355580   75069 singleClientTest.go:145] Got [25093] nodes from service
/home/ubuntu/TMP/simulator.8.5.log.2022-07-15.v000052:I0715 23:42:55.356173   75012 singleClientTest.go:145] Got [25089] nodes from service
/home/ubuntu/TMP/simulator.8.6.log.2022-07-15.v000052:I0715 23:42:55.843556   75041 singleClientTest.go:145] Got [25035] nodes from service
/home/ubuntu/TMP/simulator.8.7.log.2022-07-15.v000052:I0715 23:42:54.562428   75000 singleClientTest.go:145] Got [25075] nodes from service
/home/ubuntu/TMP/simulator.8.8.log.2022-07-15.v000052:I0715 23:42:55.579352   75063 singleClientTest.go:145] Got [25064] nodes from service
/home/ubuntu/TMP/simulator.8.9.log.2022-07-15.v000052:I0715 23:42:55.886638   75025 singleClientTest.go:145] Got [25040] nodes from service

ubuntu@ip-172-31-17-152:~/go/src/global-resource-service$ grep -i "nodes from service" ~/TMP/simulator.7.*.log.2022-07-15.v000052

/home/ubuntu/TMP/simulator.7.1.log.2022-07-15.v000052:I0715 23:43:05.943321   41299 singleClientTest.go:145] Got [25062] nodes from service
/home/ubuntu/TMP/simulator.7.10.log.2022-07-15.v000052:I0715 23:43:06.272311   41311 singleClientTest.go:145] Got [25077] nodes from service
/home/ubuntu/TMP/simulator.7.11.log.2022-07-15.v000052:I0715 23:43:04.089682   41363 singleClientTest.go:145] Got [25039] nodes from service
/home/ubuntu/TMP/simulator.7.12.log.2022-07-15.v000052:I0715 23:43:05.740859   41372 singleClientTest.go:145] Got [25099] nodes from service
/home/ubuntu/TMP/simulator.7.2.log.2022-07-15.v000052:I0715 23:43:05.680797   41334 singleClientTest.go:145] Got [25006] nodes from service
/home/ubuntu/TMP/simulator.7.3.log.2022-07-15.v000052:I0715 23:43:04.461240   41354 singleClientTest.go:145] Got [25059] nodes from service
/home/ubuntu/TMP/simulator.7.4.log.2022-07-15.v000052:I0715 23:43:05.535765   41336 singleClientTest.go:145] Got [25080] nodes from service
/home/ubuntu/TMP/simulator.7.5.log.2022-07-15.v000052:I0715 23:43:06.353896   41348 singleClientTest.go:145] Got [25051] nodes from service
/home/ubuntu/TMP/simulator.7.7.log.2022-07-15.v000052:I0715 23:43:05.648190   41328 singleClientTest.go:145] Got [25033] nodes from service
/home/ubuntu/TMP/simulator.7.8.log.2022-07-15.v000052:I0715 23:43:05.163045   41307 singleClientTest.go:145] Got [25064] nodes from service
/home/ubuntu/TMP/simulator.7.9.log.2022-07-15.v000052:I0715 23:43:05.460942   41321 singleClientTest.go:145] Got [25066] nodes from service

In the 3rd batch, the 6th scheduler client was not allocated the 25K requested machines because there were not enough machines left:
ubuntu@ip-172-31-17-152:~/go/src/global-resource-service$ tail -f /home/ubuntu/TMP/simulator.7.6.log.2022-07-15.v000052

I0715 23:43:02.204094   41378 singleClientTest.go:120] Register client to service  ...
I0715 23:43:02.204288   41378 request.go:627] Request Body: {"client_info":{"client_name":"testclient","client_region":"Beijing"},"init_resource_request":{"total_machines":25000}}
I0715 23:43:02.254749   41378 request.go:627] Response Body:
E0715 23:43:02.254784   41378 singleClientTest.go:125] failed register client to service. error unexpected end of JSON input
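
The failed client was found by noticing that the grep output has no line for simulator.7.6. A minimal sketch of automating that check is below. It assumes the log names follow the simulator.<batch>.<index>.log pattern used in the commands above; the function name and the "batch" label are hypothetical and not part of the test tooling.

```python
import re

# Matches the simulator log paths seen in the grep output, e.g.
# "/home/ubuntu/TMP/simulator.7.12.log.2022-07-15.v000052:I0715 ..."
LOG_NAME = re.compile(r"simulator\.(\d+)\.(\d+)\.log")


def missing_clients(grep_output, batch, expected):
    """Return the client indices in 1..expected that have no line for `batch`.

    `grep_output` is the text printed by
    grep -i "nodes from service" ~/TMP/simulator.<batch>.*.log...
    """
    seen = set()
    for line in grep_output.splitlines():
        match = LOG_NAME.search(line)
        if match and int(match.group(1)) == batch:
            seen.add(int(match.group(2)))
    return [i for i in range(1, expected + 1) if i not in seen]
```

Fed the third listing above with batch=7 and expected=12, this would report index 6, the client whose registration failed.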

So this is not a real issue. It also verifies that our algorithm is correct so far.
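
A rough way to see why the last registration fails: every successful client received slightly more than 25,000 nodes, so 39 clients leave fewer than 25,000 of the 1M nodes. The sketch below sums the "Got [N] nodes from service" counts and checks whether one more request fits. It assumes that all 1M nodes are allocatable and that the service rejects a request when the remainder is smaller than the request; that rule is an assumption here, not taken from distributor.go, and the function names are hypothetical.

```python
import re

# Matches the client log line "Got [25047] nodes from service".
GOT_NODES = re.compile(r"Got \[(\d+)\] nodes from service")


def allocated_counts(grep_output):
    """Extract the per-client node counts from the grep output."""
    return [int(n) for n in GOT_NODES.findall(grep_output)]


def can_allocate(total_nodes, allocated, request):
    """True if `request` more nodes still fit after the existing allocations.

    Assumed rule: a request is accepted only when the nodes not yet
    handed out are at least as many as the request.
    """
    return total_nodes - sum(allocated) >= request
```

Taking the three grep listings above at face value (the first one comes from logs with a different date suffix), the 39 counts sum to 977,184, which leaves 22,816 of 1,000,000 nodes, so can_allocate(1_000_000, counts, 25_000) would return False for the next client.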

q131172019 · 2022-07-18T21:53:16Z

This is not a real issue, so I am closing it.

yb01 assigned q131172019 Jul 18, 2022

q131172019 closed this as completed Jul 18, 2022
