Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate processEvent() performance #91

Closed
yb01 opened this issue Jul 19, 2022 · 7 comments
Closed

Investigate processEvent() performance #91

yb01 opened this issue Jul 19, 2022 · 7 comments
Assignees
Milestone

Comments

@yb01
Copy link
Collaborator

yb01 commented Jul 19, 2022

On my local mac box, with 25k nodes, initial pull processEvent() call took near 4.5 seconds, this seems a bit too long. Did you guys observe the same ?

Seems something worth to take look if not done yet.

start := time.Now()
eventProcess, crv = a.EventProcessor.ProcessEvents(minRecordNodeEvents)
end := time.Now()
klog.V(6).Infof("Event Processor Processed nodes results : %v. duration: %v", eventProcess, end.Sub(start))
I0719 15:50:06.180447   91364 aggregator.go:87] Starting loop pulling nodes from region: localhost:9119
I0719 15:50:06.750931   91364 aggregator.go:97] Total (25000) region node events are pulled successfully in (1) RPs
I0719 15:50:06.751101   91364 aggregator.go:104] Total (25000) mini node events are converted successfully with length (25000)
I0719 15:50:11.236493   91364 aggregator.go:116] Event Processor Processed nodes results : true. duration: 4.485234572s
@Sindica
Copy link
Collaborator

Sindica commented Jul 20, 2022

Not sure what is your set up. On my dev machine ubuntu 1604, t2.2xlarge, 8vCPU, 32GB mem, 1M update node events took 3.45s, add node took 3.78s. 10K add/update events took 33ms, 100K add/update events took <350ms

@yb01
Copy link
Collaborator Author

yb01 commented Jul 20, 2022

2.6 GHz Quad-Core Intel Core i7
16 GB 2133 MHz LPDDR3
250G SSD
The test is with real redis store on the same machine.

@yb01
Copy link
Collaborator Author

yb01 commented Jul 20, 2022

UT, with the fake store, runs much better as shown below on the same machine, which may indicate the redis store issue here.

/___TestAddNodes_in_global_resource_service_resource_management_pkg_distributor.test -test.v -test.paniconexit0 -test.run ^\QTestAddNodes\E$
=== RUN   TestAddNodes
    distributor_test.go:92: Processing 10 AddNode events took 65.531µs. Composite RVs map[{Beijing RP1}:10]
    distributor_test.go:92: Processing 100 AddNode events took 316.521µs. Composite RVs map[{Beijing RP1}:110]
    distributor_test.go:92: Processing 1000 AddNode events took 1.508836ms. Composite RVs map[{Beijing RP1}:1110]
    distributor_test.go:92: Processing 10000 AddNode events took 13.144886ms. Composite RVs map[{Beijing RP1}:11110]
    distributor_test.go:92: Processing 100000 AddNode events took 133.951788ms. Composite RVs map[{Beijing RP1}:111110]
    distributor_test.go:92: Processing 1000000 AddNode events took 2.02929915s. Composite RVs map[{Beijing RP1}:1111110]
--- PASS: TestAddNodes (3.86s)

with change to the real redis store with code change below in distributor UT, and processEvent() shows the perf issue as end to end test.

+       store := redis.NewRedisClient()
+
        // initialize persistent store
-       distributor.SetPersistHelper(fakeStorage)
+       distributor.SetPersistHelper(store)
/___TestAddNodes_in_global_resource_service_resource_management_pkg_distributor.test -test.v -test.paniconexit0 -test.run ^\QTestAddNodes\E$
=== RUN   TestAddNodes
    distributor_test.go:95: Processing 10 AddNode events took 11.276677ms. Composite RVs map[{Beijing RP1}:10]
    distributor_test.go:95: Processing 100 AddNode events took 7.372152ms. Composite RVs map[{Beijing RP1}:110]
    distributor_test.go:95: Processing 1000 AddNode events took 4.39070858s. Composite RVs map[{Beijing RP1}:1110]
    distributor_test.go:95: Processing 10000 AddNode events took 1.717518133s. Composite RVs map[{Beijing RP1}:11110]
    distributor_test.go:95: Processing 100000 AddNode events took 2.700634385s. Composite RVs map[{Beijing RP1}:111110]
    distributor_test.go:95: Processing 1000000 AddNode events took 31.158488938s. Composite RVs map[{Beijing RP1}:1111110]
--- PASS: TestAddNodes (41.59s)
PASS

@Sindica
Copy link
Collaborator

Sindica commented Jul 20, 2022

Can you turn on profiling for your test code?

@yb01
Copy link
Collaborator Author

yb01 commented Jul 20, 2022

options to take:

  1. redis pool size configuration, -- Carl to confirm if we need to explictly config or just use default
  2. ProcessEvent go routine control, use fixed number of workers instead of one routine per node
  3. lIkely ProcessEvent return after memorycache store is done, and leave redis store to continue to persist data.-- likely will introduce other issues when next batch of nodes come in.

@yb01
Copy link
Collaborator Author

yb01 commented Aug 8, 2022

in 730, used batch persist to avoid large number of go routines.

option 3, will need evaluated with 1) perf improvement 2) reliability for service failure cases.

@yb01 yb01 added this to the 2022-1230 milestone Aug 8, 2022
@yb01
Copy link
Collaborator Author

yb01 commented Oct 14, 2022

after use list/watch, list perf is satisfying for now.

@yb01 yb01 closed this as completed Oct 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants