
v0.12 appendOrMergeRPC inefficiency #581

Open
algorandskiy opened this issue Oct 1, 2024 · 5 comments
Comments

@algorandskiy
Contributor

algorandskiy commented Oct 1, 2024

Summary

I've been running some performance tests and discovered the following:

  1. doValidateTopic is blocked on the channel send v.p.sendMsg <- msg for a long time (from a block profile):
    382            .          .           func (v *validation) doValidateTopic(vals []*validatorImpl, src peer.ID, msg *Message, r ValidationResult) { 
    388            .          .            
    389            .          .           	switch result { 
    390            .          .           	case ValidationAccept: 
    391            .   19.14hrs           		v.p.sendMsg <- msg 
    392            .          .           	case ValidationReject: 
  2. Since the channel is consumed only in the single-threaded, non-blocking processLoop method, this means processLoop is no longer fast enough. Here are excerpts:
  Total:        50ms     23.66s (flat, cum) 12.34%
    562            .          .           func (p *PubSub) processLoop(ctx context.Context) { 
    574            .      1.93s           		select { 
    575         30ms       30ms           		case <-p.newPeers: 
    639            .          .           		case rpc := <-p.incoming: 
    640         10ms     17.53s           			p.handleIncomingRPC(rpc) 
    641            .          .            
    642            .          .           		case msg := <-p.sendMsg: 
    643            .      3.61s           			p.publishMessage(msg) 
  3. handleIncomingRPC boils down to appendOrMergeRPC, which accounts for 11/17 ≈ 64% of that time:
  Total:        10ms     11.62s (flat, cum)  6.06%
   1365            .          .           func appendOrMergeRPC(slice []*RPC, limit int, elems ...RPC) []*RPC {
   1381            .          .           	for _, elem := range elems { 
   1382            .          .           		lastRPC := out[len(out)-1] 
   1388            .          .           		for _, msg := range elem.GetPublish() { 
   1389         10ms     11.58s           			if lastRPC.Publish = append(lastRPC.Publish, msg); lastRPC.Size() > limit { 
   1390            .          .           				lastRPC.Publish = lastRPC.Publish[:len(lastRPC.Publish)-1] 

This loop with the lastRPC.Size() call gives us O(n^2) complexity: lastRPC.Size() itself iterates over lastRPC.Publish on every call, even though the message was just appended and the size before the append is known, so the new size could be maintained with a simple addition.
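For illustration, a minimal sketch of the "simple addition" idea (simplified stand-in types, not the actual pb-generated ones; as the follow-up comments note, a plain sum slightly undercounts because it ignores the per-message length prefix):

```go
// Simplified stand-ins for the generated pb types (illustration only).
type message struct{ data []byte }

// In the real code this would be the gogo-protobuf generated Size() method.
func (m *message) Size() int { return len(m.data) }

type rpc struct {
	publish []*message
	size    int // running serialized-size estimate for this RPC
}

// appendOrSplit keeps a running size per RPC instead of re-measuring the
// whole RPC with Size() after every append, turning the loop from O(n^2)
// into O(n). CAVEAT (see the comments below): msg.Size() excludes the field
// tag and varint length prefix the enclosing RPC adds per Publish entry,
// so this slightly undercounts the true serialized size.
func appendOrSplit(out []*rpc, limit int, msgs ...*message) []*rpc {
	last := out[len(out)-1]
	for _, msg := range msgs {
		s := msg.Size()
		if last.size+s > limit {
			last = &rpc{} // start a new RPC for this message
			out = append(out, last)
		}
		last.publish = append(last.publish, msg)
		last.size += s
	}
	return out
}
```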

Impact

This appears to be responsible for a 14% TPS loss after upgrading to v0.12.

@vyzo
Collaborator

vyzo commented Oct 1, 2024

Care for a patch? This was recently changed.

cc @MarcoPolo

@MarcoPolo
Contributor

Hi, thanks for investigating this. Is the 14% TPS drop something you noticed in production or doing a local stress test?

The n^2 issue was known to me, but it was preferred as the simpler solution. The alternative is not a simple addition: this is a protobuf with varint encoding for the length prefix, which means that when you cross certain boundaries the length prefix grows by a byte. For example, when the size increases from 127 to 128, the length prefix goes from 0x7f to 0x80 0x01.

The actual solution would involve this code being aware of the Protobuf encoding to catch that edge case. An alternative and slightly hacky solution would be to assume that all length prefixes are at most 8 bytes. If you're keen to submit a patch I'd prefer the hacky solution as it is likely less brittle than trying to be aware of the pb encoding.
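To illustrate the boundary behaviour (a standalone example, not code from this repository), the width of a protobuf varint grows by one byte each time the value crosses a 7-bit boundary:

```go
package main

import "fmt"

// varintLen returns how many bytes protobuf uses to encode x as a varint.
func varintLen(x uint64) int {
	n := 1
	for x >= 0x80 {
		x >>= 7
		n++
	}
	return n
}

func main() {
	fmt.Println(varintLen(127))   // 1 byte  (0x7f)
	fmt.Println(varintLen(128))   // 2 bytes (0x80 0x01)
	fmt.Println(varintLen(16383)) // 2 bytes
	fmt.Println(varintLen(16384)) // 3 bytes
}
```

So a running sum that only adds msg.Size() can miss the extra prefix byte when an embedded message length crosses one of these boundaries, which is the edge case described above.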

@vyzo
Collaborator

vyzo commented Oct 1, 2024

that sounds reasonable actually, not all that hacky!

@algorandskiy
Contributor Author

> Hi, thanks for investigating this. Is the 14% TPS drop something you noticed in production or doing a local stress test?

It was a load test in a single DC. We are going into production with 0.10 this week.

> This is a protobuf with a varint encoding for the length prefix

Exactly, that's why it is an issue and not a pull request.

> If you're keen to submit a patch I'd prefer the hacky solution as it is likely less brittle than trying to be aware of the pb encoding.

I agree, a small possible overestimation should be tolerable here. I'll try a patch in the next few days and report back.
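For the over-estimating variant, the per-message accounting could look something like this (a sketch under the assumption suggested above, that the field tag plus varint length prefix never exceeds 8 bytes; names are illustrative):

```go
// Assumed upper bound on per-message framing overhead (field tag plus varint
// length prefix) inside the enclosing RPC, per the suggestion above.
const maxPerMessageOverhead = 8

// fits reports whether adding a message of msgSize bytes keeps the RPC under
// limit, using a conservative overestimate: it never undercounts, at worst it
// splits an RPC a few bytes earlier than strictly necessary.
func fits(runningSize, msgSize, limit int) bool {
	return runningSize+msgSize+maxPerMessageOverhead <= limit
}
```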

@algorandskiy
Contributor Author

I published a draft (it is not yet delivering the expected performance gain; investigating), but I want to make sure we are aligned on the approach.
