@Chessray

Supersedes #11202 (@adamretter).

The previous PR was based on @alanpaxton's personal repository; this one comes from Evolved Binary's own space. @alanpaxton's original description is copied below for clarification.

Supersedes #6884 and #6877 (cc @adamretter)

This branch/PR is based on a copy of the code in #6884. Given the age of that PR we chose to restart, but the core of the merge operator is carried over.

This merge operator adds int64_t values. At a cursory glance it may appear similar to UInt64AddOperator, but this merge operator handles signed numbers. It can therefore perform both addition and subtraction, by merging positive or negative integers respectively.
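The add/subtract behaviour described above can be sketched without any RocksDB dependency. This is a minimal illustration, not the PR's implementation: `MergeInt64` is a hypothetical helper, treating a missing existing value as 0 is an assumption, and so is the wrap-around behaviour on overflow, which the description does not specify.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not from the PR): fold signed operands into an
// optional existing value, the way an associative "add" merge would.
inline int64_t MergeInt64(std::optional<int64_t> existing,
                          const std::vector<int64_t>& operands) {
  // Assumption: a missing existing value counts as 0.
  // Accumulate in unsigned arithmetic so that overflow wraps around instead
  // of being undefined behaviour; the real operator may behave differently.
  uint64_t acc = static_cast<uint64_t>(existing.value_or(0));
  for (int64_t op : operands) {
    acc += static_cast<uint64_t>(op);  // a negative operand subtracts
  }
  return static_cast<int64_t>(acc);
}
```

Merging the operands 5 and -8 into an existing value of 10 yields 7 in this sketch, so a single operator covers both increments and decrements.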

Values are stored in the database using an 8-bit variable-byte encoding (as suggested by @pdillinger). Because the length is known, no continuation bit flag is required, so for smaller positive numbers the stored value also consumes fewer bytes than with UInt64AddOperator. The most significant stored byte is a two's-complement value, which is sign-extended on decoding.

Therefore the range -128..127 can be stored in 1 byte, -32768..32767 in 2 bytes, etc.
Zig-zag encoded values are in turn encoded as varints in the data. Varints now implement the 8-bit variable-length encoding (known length) suggested by @pdillinger.
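A minimal sketch of a known-length, sign-extended 8-bit encoding of this kind follows. It is not the code in this PR: the names `EncodeInt64` and `DecodeInt64` are hypothetical, and the byte order (least significant byte first) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encode v in the fewest bytes whose sign extension reproduces v.
// Assumption: bytes are written least significant first.
inline std::string EncodeInt64(int64_t v) {
  const uint64_t bits = static_cast<uint64_t>(v);
  std::size_t len = 8;
  // Drop the top byte while it is pure sign extension of the byte below it.
  while (len > 1) {
    const uint8_t top = static_cast<uint8_t>(bits >> (8 * (len - 1)));
    const uint8_t next = static_cast<uint8_t>(bits >> (8 * (len - 2)));
    const uint8_t fill = (next & 0x80) ? 0xFF : 0x00;
    if (top != fill) break;
    --len;
  }
  std::string out;
  for (std::size_t i = 0; i < len; ++i) {
    out.push_back(static_cast<char>(static_cast<uint8_t>(bits >> (8 * i))));
  }
  return out;
}

// Decode by sign-extending from the most significant stored byte.
inline int64_t DecodeInt64(const std::string& data) {
  assert(!data.empty() && data.size() <= 8);
  // Start from all ones when the sign bit of the last stored byte is set.
  uint64_t bits =
      (static_cast<uint8_t>(data.back()) & 0x80) ? ~uint64_t{0} : uint64_t{0};
  for (std::size_t i = data.size(); i-- > 0;) {
    bits = (bits << 8) | static_cast<uint8_t>(data[i]);
  }
  return static_cast<int64_t>(bits);
}
```

In this sketch 127 and -128 each occupy one byte, while 128 and -129 need two, matching the ranges quoted above.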

A Java API to this merge operator has been added. In addition, a small Java utility is provided which implements the same encoding as the merge operator.

Changing UInt64AddOperator to use a similar varint encoding would require creating a "new" operator, in order to avoid breaking backward compatibility.

The Makefile now builds a single merge_operators_test binary which combines the GTEST for this merge operator with that of stringappend. This addresses your concerns re additional test binaries.
I have regenerated TARGETS as per instructions.

alanpaxton and others added 29 commits November 24, 2025 15:54
Add back int64_add operator, unify operator tests

Start to re-integrate int64_add merge operator. The core of the implementation (by @adamretter, from PR facebook#6884) and associated tests.

“Each additional test binary adds substantial work in linking and substantial space (~200MB) to each build.”
Therefore we now build a single merge_operators_test binary which at present links the GTESTs from int64add_test.cc and stringappend_test.cc
Build a Java API for the new Int64AddMergeOperator that looks like all the other merge operators. Add tests.

Individual operator tests which are part of merge_operators_test shouldn’t be in TEST_MAIN_SOURCES

Add int64 merge operator to the tests which mention the other merge operators.
Include the header that says `extern "C" { … `
Implement encoding and zigzag in Java so that we can test it more thoroughly.
Exporting the int64 merge operator to Java requires support for encoding to make it useful.

It also requires that support in order to test it from Java.
Accidentally re-introduced deleted loading code when rebasing.
Now removed that again, and added the loading of the new int64add merge operator into the new loading code.
Moved the new int64add operator into ROCKSDB_NAMESPACE because that’s where the other merge operators are. Split the class into .h and .cc
In the context in which we use them (values in RocksDB (key, value) pairs), value lengths are always known, so they do not need a continuation bit and a 7-bit encoding, but can be encoded more efficiently with an 8-bit encoding.

This has the added advantage of being consistent with fixed size encodings, which can be seen as having trailing-zeros in the most significant bytes, which are truncated in this encoding.
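The size difference between the two schemes can be illustrated for unsigned values. This is a sketch under stated assumptions: `Varint7Length` and `KnownLength8` are hypothetical names, and the signed encoding in this PR additionally needs room for the sign bit in the most significant stored byte.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes used by a 7-bit varint that spends one bit per byte on continuation.
inline std::size_t Varint7Length(uint64_t v) {
  std::size_t len = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++len;
  }
  return len;
}

// Bytes used when the length is known externally: the zero bytes at the
// most significant end of the fixed-size form are simply truncated.
inline std::size_t KnownLength8(uint64_t v) {
  std::size_t len = 1;
  while (v >= 0x100) {
    v >>= 8;
    ++len;
  }
  return len;
}
```

For example, 255 takes two bytes as a 7-bit varint but one byte with a known length, and the largest 64-bit value takes ten bytes versus eight.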
$ python3 buckifier/buckify_rocksdb.py
The problem only shows up when ASSERT_STATUS_CHECKED=1,
i.e. running changed merge_operators_test under CI
make format doesn’t format this but does complain
Don’t need to turn them into zigzag first. There end up being 2 branches in encoding anyway, but there’s perhaps a little less actual code all told.
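For context, the zigzag step that this commit drops is commonly written as below. This is a generic sketch of the zigzag mapping, not code taken from this PR; `ZigZagEncode` and `ZigZagDecode` are hypothetical names.

```cpp
#include <cstdint>

// Map signed to unsigned so that small magnitudes become small values:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
inline uint64_t ZigZagEncode(int64_t v) {
  // Shift in unsigned arithmetic, then flip all bits for negative inputs.
  return (static_cast<uint64_t>(v) << 1) ^
         (v < 0 ? ~uint64_t{0} : uint64_t{0});
}

// Inverse mapping: the low bit carries the sign.
inline int64_t ZigZagDecode(uint64_t z) {
  // (0 - (z & 1)) is all ones when the low bit is set, otherwise zero.
  return static_cast<int64_t>((z >> 1) ^ (uint64_t{0} - (z & 1)));
}
```

With sign extension on decode, the sign is carried by the most significant stored byte instead, so this extra mapping is not needed.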
Changed format in core C++ 8 bit varint encoding, so the Java shadow implementation (used for testing) needs to work the same way.
stdout or stderr was causing a decoding error. Make the error mode “backslashreplace” instead of the default “strict” so we have a chance of seeing the output.
@meta-cla

meta-cla bot commented Nov 24, 2025

Hi @Chessray!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@meta-cla meta-cla bot added the CLA Signed label Nov 25, 2025
@meta-cla

meta-cla bot commented Nov 25, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
