40th scheduler client starts in error for 1M nodes / 40 schedulers / 25K nodes per scheduler #88

Closed
q131172019 opened this issue Jul 18, 2022 · 3 comments

q131172019 commented Jul 18, 2022

In the test for 'field goal' (1M nodes / 40 schedulers / 25K nodes per scheduler), https://github.com/yb01/arktos/wiki/730-test, the 40th scheduler client can start in error when the schedulers are launched with the following bash for-loops without any delay between them.

ubuntu@ip-172-31-4-135:~/go/src/global-resource-service$ for i in {1..14}; do go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=6 > ~/TMP/simulator.6.$i.log.2022-07-15.v000052 2>&1 & done

ubuntu@ip-172-31-1-189:~/go/src/global-resource-service$ for i in {1..14}; do go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=9 > ~/TMP/simulator.8.$i.log.2022-07-15.v000052 2>&1 & done

ubuntu@ip-172-31-17-152:~/go/src/global-resource-service$ for i in {1..12}; do go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=9 > ~/TMP/simulator.7.$i.log.2022-07-15.v000052 2>&1 & done

40th client:
I0716 02:01:46.641067   77017 singleClientTest.go:184] End of results
I0716 02:01:46.641087   77017 stats.go:26] RegisterClientDuration: 53.424195ms
I0716 02:01:46.641101   77017 stats.go:39] ListDuration: 1.267543087s. Number of nodes listed: 25002
I0716 02:01:46.641111   77017 stats.go:56] Watch session last: 30m0.001236584s
I0716 02:01:46.641118   77017 stats.go:57] Number of nodes Added: 0
I0716 02:01:46.641126   77017 stats.go:58] Number of nodes Updated: 16693
I0716 02:01:46.641156   77017 stats.go:59] Number of nodes Deleted: 0
I0716 02:01:46.641162   77017 stats.go:60] Number of nodes watch prolonged than 1s: 10537

With a 2-second delay between scheduler starts, the 40th and 41st schedulers cannot be allocated the 25K requested machines and fail with the error "Not enough hosts", which is expected.

--- 41 schedulers

ubuntu@ip-172-31-4-135:~/go/src/global-resource-service$ for i in {1..14}; do sleep 2;  go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=6 > ~/TMP/simulator.6.$i.log.2022-07-15.v000052 2>&1 & done
ubuntu@ip-172-31-1-189:~/go/src/global-resource-service$ for i in {1..14}; do sleep 2;  go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=9 > ~/TMP/simulator.8.$i.log.2022-07-15.v000052 2>&1 & done
ubuntu@ip-172-31-17-152:~/go/src/global-resource-service$ for i in {1..13}; do sleep 2;  go run resource-management/test/e2e/singleClientTest.go --service_url=ec2-34-214-160-3.us-west-2.compute.amazonaws.com:8080 --request_machines=25000 --action=watch --repeats=1 --limit=26000 -v=9 > ~/TMP/simulator.7.$i.log.2022-07-15.v000052 2>&1 & done
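
The two launch patterns in this report differ only in the `sleep 2` before each client. A minimal sketch of a helper that covers both cases follows; `launch_clients` and its arguments are hypothetical names, not part of the repository, and the command to run is supplied by the caller.

```shell
# Hypothetical helper: start COUNT background clients, pausing DELAY seconds
# before each start (0 means start them back to back).
# Usage: launch_clients COUNT DELAY COMMAND [ARGS...]
# The client index (1..COUNT) is appended as the last argument of COMMAND.
launch_clients() {
  local count=$1 delay=$2
  shift 2
  local i=1
  while [ "$i" -le "$count" ]; do
    if [ "$delay" -gt 0 ]; then
      sleep "$delay"
    fi
    "$@" "$i" &          # run the client in the background
    i=$((i + 1))
  done
  wait                   # block until every client has exited
}
```

To reproduce the runs here, COMMAND would be a small wrapper function that takes the index, calls `go run resource-management/test/e2e/singleClientTest.go ...` with the flags shown above, and redirects output to the per-client log file.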


ubuntu@ip-172-31-8-82:~/go/src/global-resource-service$ redis-cli info keyspace
# Keyspace
db0:keys=1000041,expires=0,avg_ttl=0

40th client:
I0716 03:07:02.869481   28054 installer.go:30] handle /client. URL path: /clients
I0716 03:07:02.869532   28054 installer.go:48] handle client registration
E0716 03:07:02.870499   28054 distributor.go:66] Error allocate resource for client. Error Not enough hosts
I0716 03:07:02.870526   28054 installer.go:79] error register client. error Not enough hosts

41st client:
I0716 03:07:04.862654   28054 installer.go:30] handle /client. URL path: /clients
I0716 03:07:04.862707   28054 installer.go:48] handle client registration
E0716 03:07:04.863704   28054 distributor.go:66] Error allocate resource for client. Error Not enough hosts
I0716 03:07:04.863744   28054 installer.go:79] error register client. error Not enough hosts

yb01 commented Jul 18, 2022

Assigning to Carl to confirm whether a different client failed instead of the 41st one.


q131172019 commented Jul 18, 2022

For the test with 40 scheduler clients:

ubuntu@ip-172-31-4-135:~/go/src/global-resource-service$ grep -i "nodes from service" ~/TMP/simulator.6.*.log.2022-07-14.v000047

/home/ubuntu/TMP/simulator.6.1.log.2022-07-14.v000047:I0715 02:19:04.610735   65558 singleClientTest.go:145] Got [25047] nodes from service
/home/ubuntu/TMP/simulator.6.10.log.2022-07-14.v000047:I0715 02:19:04.616474   65584 singleClientTest.go:145] Got [25075] nodes from service
/home/ubuntu/TMP/simulator.6.11.log.2022-07-14.v000047:I0715 02:19:04.326803   65514 singleClientTest.go:145] Got [25090] nodes from service
/home/ubuntu/TMP/simulator.6.12.log.2022-07-14.v000047:I0715 02:19:04.521134   65573 singleClientTest.go:145] Got [25013] nodes from service
/home/ubuntu/TMP/simulator.6.13.log.2022-07-14.v000047:I0715 02:19:03.725418   65529 singleClientTest.go:145] Got [25055] nodes from service
/home/ubuntu/TMP/simulator.6.14.log.2022-07-14.v000047:I0715 02:19:04.112944   65601 singleClientTest.go:145] Got [25045] nodes from service
/home/ubuntu/TMP/simulator.6.2.log.2022-07-14.v000047:I0715 02:19:04.181524   65547 singleClientTest.go:145] Got [25051] nodes from service
/home/ubuntu/TMP/simulator.6.3.log.2022-07-14.v000047:I0715 02:19:04.229525   65569 singleClientTest.go:145] Got [25011] nodes from service
/home/ubuntu/TMP/simulator.6.4.log.2022-07-14.v000047:I0715 02:19:04.603221   65592 singleClientTest.go:145] Got [25088] nodes from service
/home/ubuntu/TMP/simulator.6.5.log.2022-07-14.v000047:I0715 02:19:03.679677   65506 singleClientTest.go:145] Got [25076] nodes from service
/home/ubuntu/TMP/simulator.6.6.log.2022-07-14.v000047:I0715 02:19:04.546804   65553 singleClientTest.go:145] Got [25092] nodes from service
/home/ubuntu/TMP/simulator.6.7.log.2022-07-14.v000047:I0715 02:19:04.404384   65515 singleClientTest.go:145] Got [25079] nodes from service
/home/ubuntu/TMP/simulator.6.8.log.2022-07-14.v000047:I0715 02:19:03.934076   65535 singleClientTest.go:145] Got [25043] nodes from service
/home/ubuntu/TMP/simulator.6.9.log.2022-07-14.v000047:I0715 02:19:04.008116   65501 singleClientTest.go:145] Got [25011] nodes from service

ubuntu@ip-172-31-1-189:~/go/src/global-resource-service$ grep -i "nodes from service" ~/TMP/simulator.8.*.log.2022-07-15.v000052

/home/ubuntu/TMP/simulator.8.1.log.2022-07-15.v000052:I0715 23:42:55.840128   75031 singleClientTest.go:145] Got [25042] nodes from service
/home/ubuntu/TMP/simulator.8.10.log.2022-07-15.v000052:I0715 23:42:55.944416   75082 singleClientTest.go:145] Got [25070] nodes from service
/home/ubuntu/TMP/simulator.8.11.log.2022-07-15.v000052:I0715 23:42:55.704695   75076 singleClientTest.go:145] Got [25038] nodes from service
/home/ubuntu/TMP/simulator.8.12.log.2022-07-15.v000052:I0715 23:42:55.367043   75020 singleClientTest.go:145] Got [25052] nodes from service
/home/ubuntu/TMP/simulator.8.13.log.2022-07-15.v000052:I0715 23:42:55.952733   75048 singleClientTest.go:145] Got [25049] nodes from service
/home/ubuntu/TMP/simulator.8.14.log.2022-07-15.v000052:I0715 23:42:55.712086   75057 singleClientTest.go:145] Got [25060] nodes from service
/home/ubuntu/TMP/simulator.8.2.log.2022-07-15.v000052:I0715 23:42:55.209840   74999 singleClientTest.go:145] Got [25026] nodes from service
/home/ubuntu/TMP/simulator.8.3.log.2022-07-15.v000052:I0715 23:42:55.136889   75089 singleClientTest.go:145] Got [25039] nodes from service
/home/ubuntu/TMP/simulator.8.4.log.2022-07-15.v000052:I0715 23:42:55.355580   75069 singleClientTest.go:145] Got [25093] nodes from service
/home/ubuntu/TMP/simulator.8.5.log.2022-07-15.v000052:I0715 23:42:55.356173   75012 singleClientTest.go:145] Got [25089] nodes from service
/home/ubuntu/TMP/simulator.8.6.log.2022-07-15.v000052:I0715 23:42:55.843556   75041 singleClientTest.go:145] Got [25035] nodes from service
/home/ubuntu/TMP/simulator.8.7.log.2022-07-15.v000052:I0715 23:42:54.562428   75000 singleClientTest.go:145] Got [25075] nodes from service
/home/ubuntu/TMP/simulator.8.8.log.2022-07-15.v000052:I0715 23:42:55.579352   75063 singleClientTest.go:145] Got [25064] nodes from service
/home/ubuntu/TMP/simulator.8.9.log.2022-07-15.v000052:I0715 23:42:55.886638   75025 singleClientTest.go:145] Got [25040] nodes from service

ubuntu@ip-172-31-17-152:~/go/src/global-resource-service$ grep -i "nodes from service" ~/TMP/simulator.7.*.log.2022-07-15.v000052

/home/ubuntu/TMP/simulator.7.1.log.2022-07-15.v000052:I0715 23:43:05.943321   41299 singleClientTest.go:145] Got [25062] nodes from service
/home/ubuntu/TMP/simulator.7.10.log.2022-07-15.v000052:I0715 23:43:06.272311   41311 singleClientTest.go:145] Got [25077] nodes from service
/home/ubuntu/TMP/simulator.7.11.log.2022-07-15.v000052:I0715 23:43:04.089682   41363 singleClientTest.go:145] Got [25039] nodes from service
/home/ubuntu/TMP/simulator.7.12.log.2022-07-15.v000052:I0715 23:43:05.740859   41372 singleClientTest.go:145] Got [25099] nodes from service
/home/ubuntu/TMP/simulator.7.2.log.2022-07-15.v000052:I0715 23:43:05.680797   41334 singleClientTest.go:145] Got [25006] nodes from service
/home/ubuntu/TMP/simulator.7.3.log.2022-07-15.v000052:I0715 23:43:04.461240   41354 singleClientTest.go:145] Got [25059] nodes from service
/home/ubuntu/TMP/simulator.7.4.log.2022-07-15.v000052:I0715 23:43:05.535765   41336 singleClientTest.go:145] Got [25080] nodes from service
/home/ubuntu/TMP/simulator.7.5.log.2022-07-15.v000052:I0715 23:43:06.353896   41348 singleClientTest.go:145] Got [25051] nodes from service
/home/ubuntu/TMP/simulator.7.7.log.2022-07-15.v000052:I0715 23:43:05.648190   41328 singleClientTest.go:145] Got [25033] nodes from service
/home/ubuntu/TMP/simulator.7.8.log.2022-07-15.v000052:I0715 23:43:05.163045   41307 singleClientTest.go:145] Got [25064] nodes from service
/home/ubuntu/TMP/simulator.7.9.log.2022-07-15.v000052:I0715 23:43:05.460942   41321 singleClientTest.go:145] Got [25066] nodes from service
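
Every client listed above received slightly more than the 25,000 nodes it requested, so 39 successful clients together hold more than 975,000 nodes and, assuming the pool is exactly 1M nodes and the counts come from one run, fewer than 25,000 remain for the last client. A minimal sketch for totaling the counts follows; `sum_allocated` and `remaining_capacity` are hypothetical helper names, and the only assumption about the log format is the `Got [N] nodes from service` line shown above.

```shell
# Read grep output (or raw client logs) on stdin and print
# "<clients> <total nodes>" for every "Got [N] nodes from service" line.
sum_allocated() {
  sed -n 's/.*Got \[\([0-9][0-9]*\)\] nodes from service.*/\1/p' |
    awk '{ total += $1; n++ } END { printf "%d %d\n", n, total }'
}

# Print how many nodes are left: remaining_capacity TOTAL_NODES ALLOCATED
remaining_capacity() {
  echo $(( $1 - $2 ))
}
```

For example, piping the three grep commands above into `sum_allocated` gives the number of successful clients and their combined node count; passing 1000000 and that count to `remaining_capacity` shows whether another 25,000-node request can still fit.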

In the 3rd batch, the 6th scheduler client was not allocated the 25K requested machines because there were not enough hosts left:
ubuntu@ip-172-31-17-152:~/go/src/global-resource-service$ tail -f /home/ubuntu/TMP/simulator.7.6.log.2022-07-15.v000052

I0715 23:43:02.204094   41378 singleClientTest.go:120] Register client to service  ...
I0715 23:43:02.204288   41378 request.go:627] Request Body: {"client_info":{"client_name":"testclient","client_region":"Beijing"},"init_resource_request":{"total_machines":25000}}
I0715 23:43:02.254749   41378 request.go:627] Response Body:
E0715 23:43:02.254784   41378 singleClientTest.go:125] failed register client to service. error unexpected end of JSON input
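
One way to find the failed client without tailing each log is to list the log files that never print the `Got [N] nodes from service` line. This is a minimal sketch; `failed_clients` is a hypothetical name, and it assumes every successful client logs that line exactly as in the grep output above.

```shell
# Print the names of the given log files that contain no
# "nodes from service" line, i.e. clients that never completed a list.
failed_clients() {
  grep -L 'nodes from service' "$@"
}
```

Running it over the same log globs used in the grep commands above (one glob per machine) should print only the logs of clients whose registration or list did not succeed.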

So this is not a real issue. It also confirms that our algorithm is correct so far.

@q131172019

This is not a real issue; closing it.
