Skip to content

Conversation

@Cuda-Chen
Copy link

Optimize CRC16 using multi-byte LUT.

@Cuda-Chen
Copy link
Author

Cuda-Chen commented Oct 31, 2025

Hi @zuiderkwast ,

I would like to share some benchmark results of this PR.
If we are going to merge this PR, there are at least two works waiting for finishing:

  • jump-table for reduction
  • code tidy (e.g., using macro)

Environment

  • CPU: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz
  • OS: Linux 5.15.0-157-generic
  • For all "reduction", it reduces from the target length to 1-byte (e.g., 8-byte: 8 -> 4 -> 2 -> 1)

Benchmark Results

baseline (commit f54818cc60597e9fe5dc03a52fd39ab944cd4932 on unstable branch

# -t get,set
SET: 292902.97 requests per second, p50=1.431 msec                    
GET: 350557.41 requests per second, p50=1.199 msec

# -t get
GET: 454277.00 requests per second, p50=0.975 msec

2-byte reduction

# -t set,get
SET: 321502.06 requests per second, p50=1.351 msec                    
GET: 387236.69 requests per second, p50=1.151 msec

# -t get
GET: 453309.16 requests per second, p50=0.959 msec

4-byte reduction

# -t set,get
SET: 324191.12 requests per second, p50=1.335 msec                    
GET: 376364.31 requests per second, p50=1.135 msec

# -t get
GET: 470167.88 requests per second, p50=0.943 msec

8-byte reduction

# -t set,get
SET: 319550.06 requests per second, p50=1.375 msec                    
GET: 384349.31 requests per second, p50=1.143 msec

# -t get
GET: 469329.34 requests per second, p50=0.959 msec

16-byte reduction

# -t set,get
SET: 300210.12 requests per second, p50=1.431 msec                    
GET: 391880.25 requests per second, p50=1.143 msec 

# -t get
GET: 443066.03 requests per second, p50=0.943 msec

@zuiderkwast
Copy link
Contributor

Interesting! But with a larger LUT, when it is loaded into L1 cache, other stuff are evicted that might be needed later, so it may depend on use case if it's actually faster in all cases.

How much memory would the 2-byte reduction variant use? The current draft is for the 16-byte reduction, right?

@Cuda-Chen
Copy link
Author

Cuda-Chen commented Nov 7, 2025

Hi @zuiderkwast ,

How much memory would the 2-byte reduction variant use?

It will use 1 KB memory for LUT (256 entries * 2 bytes of each entry * 2 times for 2-byte lookup at the same time = 1024 bytes = 1 KB).

The current draft is for the 16-byte reduction, right?

Yes, the current draft uses 16-byte reduction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants