
v0.12 appendOrMergeRPC inefficiency #581

Open
algorandskiy opened this issue Oct 1, 2024 · 5 comments
Comments

@algorandskiy
Contributor

algorandskiy commented Oct 1, 2024

Summary

I've been running some performance tests and discovered the following:

  1. doValidateTopic is blocked on the channel send v.p.sendMsg <- msg for a long time (from a block profile):
    382            .          .           func (v *validation) doValidateTopic(vals []*validatorImpl, src peer.ID, msg *Message, r ValidationResult) { 
    388            .          .            
    389            .          .           	switch result { 
    390            .          .           	case ValidationAccept: 
    391            .   19.14hrs           		v.p.sendMsg <- msg 
    392            .          .           	case ValidationReject: 
  2. Since the channel is consumed only in the single-threaded, non-blocking processLoop method, this means processLoop is no longer fast enough. Here are excerpts:
  Total:        50ms     23.66s (flat, cum) 12.34%
    562            .          .           func (p *PubSub) processLoop(ctx context.Context) { 
    574            .      1.93s           		select { 
    575         30ms       30ms           		case <-p.newPeers: 
    639            .          .           		case rpc := <-p.incoming: 
    640         10ms     17.53s           			p.handleIncomingRPC(rpc) 
    641            .          .            
    642            .          .           		case msg := <-p.sendMsg: 
    643            .      3.61s           			p.publishMessage(msg) 
  3. handleIncomingRPC boils down to appendOrMergeRPC, which accounts for 11/17 ≈ 64% of that time:
  Total:        10ms     11.62s (flat, cum)  6.06%
   1365            .          .           func appendOrMergeRPC(slice []*RPC, limit int, elems ...RPC) []*RPC {
   1381            .          .           	for _, elem := range elems { 
   1382            .          .           		lastRPC := out[len(out)-1] 
   1388            .          .           		for _, msg := range elem.GetPublish() { 
   1389         10ms     11.58s           			if lastRPC.Publish = append(lastRPC.Publish, msg); lastRPC.Size() > limit { 
   1390            .          .           				lastRPC.Publish = lastRPC.Publish[:len(lastRPC.Publish)-1] 

This loop with the lastRPC.Size() call gives us O(n^2) complexity: lastRPC.Size() itself iterates over lastRPC.Publish on every call, even though the message was just appended and the size before the append is known, so the new size could be maintained with a simple addition.
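For illustration, a minimal sketch of the "simple addition" idea (simplified stand-in types, not the actual pb-generated ones; as the follow-up comments note, a plain sum slightly undercounts because it ignores the per-message length prefix):

```go
// Simplified stand-ins for the generated pb types (illustration only).
type message struct{ data []byte }

// In the real code this would be the gogo-protobuf generated Size() method.
func (m *message) Size() int { return len(m.data) }

type rpc struct {
	publish []*message
	size    int // running serialized-size estimate for this RPC
}

// appendOrSplit keeps a running size per RPC instead of re-measuring the
// whole RPC with Size() after every append, turning the loop from O(n^2)
// into O(n). CAVEAT (see the comments below): msg.Size() excludes the field
// tag and varint length prefix the enclosing RPC adds per Publish entry,
// so this slightly undercounts the true serialized size.
func appendOrSplit(out []*rpc, limit int, msgs ...*message) []*rpc {
	last := out[len(out)-1]
	for _, msg := range msgs {
		s := msg.Size()
		if last.size+s > limit {
			last = &rpc{} // start a new RPC for this message
			out = append(out, last)
		}
		last.publish = append(last.publish, msg)
		last.size += s
	}
	return out
}
```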

Impact

This appears to be responsible for a 14% TPS loss after upgrading to v0.12.

@vyzo
Collaborator

vyzo commented Oct 1, 2024

Care for a patch? This was recently changed.

cc @MarcoPolo

@MarcoPolo
Contributor

Hi, thanks for investigating this. Is the 14% TPS drop something you noticed in production or doing a local stress test?

The n^2 issue was known to me, but it was preferred as the simpler solution. The alternative is not a simple addition: this is a protobuf with varint encoding for the length prefix, which means that when you cross certain boundaries the length prefix grows by a byte. For example, when the size increases from 127 to 128, the length prefix goes from 0x7f to 0x80 0x01.

The actual solution would involve this code being aware of the Protobuf encoding to catch that edge case. An alternative and slightly hacky solution would be to assume that all length prefixes are at most 8 bytes. If you're keen to submit a patch I'd prefer the hacky solution as it is likely less brittle than trying to be aware of the pb encoding.
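To illustrate the boundary behaviour (a standalone example, not code from this repository), the width of a protobuf varint grows by one byte each time the value crosses a 7-bit boundary:

```go
package main

import "fmt"

// varintLen returns how many bytes protobuf uses to encode x as a varint.
func varintLen(x uint64) int {
	n := 1
	for x >= 0x80 {
		x >>= 7
		n++
	}
	return n
}

func main() {
	fmt.Println(varintLen(127))   // 1 byte  (0x7f)
	fmt.Println(varintLen(128))   // 2 bytes (0x80 0x01)
	fmt.Println(varintLen(16383)) // 2 bytes
	fmt.Println(varintLen(16384)) // 3 bytes
}
```

So a running sum that only adds msg.Size() can miss the extra prefix byte when an embedded message length crosses one of these boundaries, which is the edge case described above.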

@vyzo
Collaborator

vyzo commented Oct 1, 2024

that sounds reasonable actually, not all that hacky!

@algorandskiy
Contributor Author

> Hi, thanks for investigating this. Is the 14% TPS drop something you noticed in production or doing a local stress test?

It was a load test in a single DC. We are going into production with 0.10 this week.

> This is a protobuf with a varint encoding for the length prefix

Exactly, that's why it is an issue and not a pull request.

> If you're keen to submit a patch I'd prefer the hacky solution as it is likely less brittle than trying to be aware of the pb encoding.

I agree, a small possible overestimation should be tolerable here. I'll try a patch in the next few days and report back.
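For the over-estimating variant, the per-message accounting could look something like this (a sketch under the assumption suggested above, that the field tag plus varint length prefix never exceeds 8 bytes; names are illustrative):

```go
// Assumed upper bound on per-message framing overhead (field tag plus varint
// length prefix) inside the enclosing RPC, per the suggestion above.
const maxPerMessageOverhead = 8

// fits reports whether adding a message of msgSize bytes keeps the RPC under
// limit, using a conservative overestimate: it never undercounts, at worst it
// splits an RPC a few bytes earlier than strictly necessary.
func fits(runningSize, msgSize, limit int) bool {
	return runningSize+msgSize+maxPerMessageOverhead <= limit
}
```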

@algorandskiy
Contributor Author

I published a draft (it is not yet delivering the expected performance gain; investigating), but I want to make sure we are aligned on the approach.
