Add a new int64 add merge operator, also with Java API #14149

Chessray · 2025-11-24T16:02:16Z

The previous PR was based on @alanpaxton's personal repository; this one comes from Evolved Binary's own space. @alanpaxton's original description is copied below for clarification.

Supersedes #6884 and #6877 (cc @adamretter)

This branch/PR is based on a copy of the code in #6884. Given the age of that PR, we chose to restart, though the core of the merge operator is carried over.

This merge operator adds int64_t values. At a cursory glance it may appear similar to UInt64AddOperator, but this merge operator handles signed numbers, so it supports both addition and subtraction by merging positive or negative integers.
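As a rough illustration of these semantics, the C++ sketch below folds signed operands onto an optional existing value, so a negative operand acts as a subtraction. `FoldInt64Operands` is a hypothetical name, not the PR's API, and it works on already-decoded integers. The description does not say how overflow is handled, so wrapping two's complement arithmetic (done on `uint64_t` to avoid signed-overflow UB) is an assumption.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not the PR's code): applies a sequence of signed
// int64 merge operands to an optional existing value.
// Assumption: overflow wraps around (two's complement); the real operator
// may behave differently. Summing in uint64_t keeps wrap-around well defined.
int64_t FoldInt64Operands(const int64_t* existing,
                          const std::vector<int64_t>& operands) {
  uint64_t acc = existing != nullptr ? static_cast<uint64_t>(*existing) : 0;
  for (const int64_t op : operands) {
    acc += static_cast<uint64_t>(op);  // negative operands subtract
  }
  // Converting back is modular in C++20 and on all mainstream compilers.
  return static_cast<int64_t>(acc);
}
```

For example, an existing value of 10 merged with the operands 5 and -20 yields -5.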

Values are stored in the database using an 8-bit variable-byte encoding (as suggested by @pdillinger). Because the length is known, no continuation bit is required, so for small positive numbers the stored value also consumes fewer bytes than with UInt64AddOperator. The most significant stored byte is a two's complement value, which is sign-extended on decoding.
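To make the size rule concrete, here is a small C++ sketch that computes the minimal number of bytes for a value under this scheme: the smallest n for which the value fits in an n-byte two's complement integer, i.e. -2^(8n-1) <= v <= 2^(8n-1) - 1. The name `SignedVarint8Length` is hypothetical, and the PR's encoder may compute the length differently.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the smallest n in 1..8 such that v fits in an n-byte
// two's complement integer, i.e. -2^(8n-1) <= v <= 2^(8n-1) - 1.
std::size_t SignedVarint8Length(int64_t v) {
  std::size_t n = 1;
  while (n < 8) {
    // 2^(8n-1); n < 8 here, so the shift stays well below 63 bits.
    const int64_t limit = int64_t{1} << (8 * n - 1);
    if (v >= -limit && v < limit) {
      break;
    }
    ++n;
  }
  return n;  // every int64_t fits in 8 bytes
}
```

With n = 1 and n = 2 this reproduces the -128..127 and -32768..32767 ranges given in the description.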

Therefore the range -128..127 can be stored in 1 byte, -32768..32767 in 2 bytes, etc.

Zig-zag-encoded values are in turn encoded as varints in the data. Varints now implement the 8-bit variable-length encoding (known length) suggested by @pdillinger.
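For comparison, the zig-zag variant can be sketched in C++ as follows: zig-zag maps signed values onto unsigned ones (0, -1, 1, -2, ... become 0, 1, 2, 3, ...), and the result is then written as little-endian bytes until only zero bytes remain. Byte order, the one-byte minimum for zero, and all function names are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Zig-zag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
uint64_t ZigZagEncode(int64_t v) {
  // Left shift on the unsigned copy (no UB for negative v); v >> 63 is all
  // ones for negative v (arithmetic shift, guaranteed since C++20).
  return (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
}

int64_t ZigZagDecode(uint64_t u) {
  // 0 - (u & 1) is all ones when the low bit is set, zero otherwise.
  return static_cast<int64_t>((u >> 1) ^ (uint64_t{0} - (u & 1)));
}

// Unsigned, known-length 8-bit encoding: little-endian bytes with high zero
// bytes dropped (at least one byte is always written).
std::string EncodeUnsigned8(uint64_t u) {
  std::string out;
  do {
    out.push_back(static_cast<char>(u & 0xFF));
    u >>= 8;
  } while (u != 0);
  return out;
}

uint64_t DecodeUnsigned8(const std::string& in) {
  uint64_t u = 0;
  // Start from the last (most significant) byte and work down.
  for (std::size_t i = in.size(); i-- > 0;) {
    u = (u << 8) | static_cast<unsigned char>(in[i]);
  }
  return u;
}
```

Zig-zag maps [-128, 127] onto [0, 255], so it reaches the same one-byte range as direct two's complement truncation, which fits the later note that skipping zig-zag mainly saves code.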

A Java API to this merge operator has been added. In addition, a small Java utility is provided which implements the same encoding as the merge operator.

Changing the UInt64MergeOperator to use a similar varint encoding would require creating a "new" operator, to avoid breaking backward compatibility.

The Makefile now builds a single merge_operators_test binary, which combines the GTEST for this merge operator with that of stringappend. This addresses your concerns about additional test binaries.
I have regenerated TARGETS as per instructions.

@adamretter

Add back int64_add operator, unify operator tests. Start to re-integrate the int64_add merge operator: the core of the implementation (by @adamretter, from PR facebook#6884) and associated tests. “Each additional test binary adds substantial work in linking and substantial space (~200MB) to each build.” Therefore we now build a single merge_operators_test binary, which at present links the GTESTs from int64add_test.cc and stringappend_test.cc.

Build a Java API for the new Int64AddMergeOperator that looks like all the other merge operators, and add tests. Individual operator tests which are part of merge_operators_test shouldn’t be in TEST_MAIN_SOURCES. Add the int64 merge operator to the tests which mention the other merge operators.

Include the header that says `extern "C" { …`

Implement encoding and zigzag in Java so that we can test it more thoroughly.

Exporting the int64 merge operator to Java requires support for encoding to make it useful. It also requires that support in order to test it from Java.

Accidentally re-introduced deleted loading code when rebasing; removed it again and added loading of the new int64add merge operator to the new loading code. Moved the new int64add operator into ROCKSDB_NAMESPACE, because that’s where the other merge operators are. Split the class into .h and .cc files.

In the context in which we use them (values in RocksDB (key, value) pairs), value lengths are always known, so values do not need a continuation bit and a 7-bit encoding, but can be encoded more efficiently with an 8-bit encoding. This has the added advantage of being consistent with fixed-size encodings, which can be seen as having trailing zeros in the most significant bytes; this encoding truncates those bytes.
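The C++ sketch below compares byte counts for a classic 7-bit varint (one continuation bit per byte, LEB128-style) with the 8-bit known-length form, and shows that the 8-bit form is simply a little-endian fixed 8-byte encoding with its zero high bytes truncated. It covers unsigned values only; the little-endian layout and all names are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Classic 7-bit varint: each byte carries 7 payload bits plus a
// continuation flag, so the decoder can find the end on its own.
std::size_t Varint7Length(uint64_t u) {
  std::size_t n = 1;
  while (u >= 0x80) {
    u >>= 7;
    ++n;
  }
  return n;
}

// 8-bit known-length form: all 8 bits are payload; the stored value length
// tells the decoder where it ends.
std::size_t Varint8Length(uint64_t u) {
  std::size_t n = 1;
  while (u >= 0x100) {
    u >>= 8;
    ++n;
  }
  return n;
}

// Fixed 8-byte little-endian encoding (least significant byte first).
std::string Fixed64LE(uint64_t u) {
  std::string out(8, '\0');
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<char>((u >> (8 * i)) & 0xFF);
  }
  return out;
}
```

Under these assumptions the 8-bit form of `u` is `Fixed64LE(u).substr(0, Varint8Length(u))`, and the dropped suffix is all zero bytes. For instance, 200 needs two bytes as a 7-bit varint but one byte in the 8-bit form, and the full 64-bit range needs up to 10 versus 8 bytes.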

Fix the test inheritance

$ python3 buckifier/buckify_rocksdb.py

The problem only shows up when ASSERT_STATUS_CHECKED=1, i.e., when running the changed merge_operators_test under CI.

`make format` doesn’t format this, but does complain.

Values don’t need to be turned into zigzag first. There end up being 2 branches in the encoding anyway, but there’s perhaps a little less code overall.
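A minimal C++ sketch of encoding signed values directly, without zig-zag: high bytes equal to 0x00 (non-negative values) or 0xFF (negative values) are dropped as long as the new top byte still carries the right sign bit, and decoding sign-extends from the top stored byte. Here the two branches are folded into a filler/sign-bit pair; little-endian order and the names `EncodeSigned8`, `DecodeSigned8` and `ByteAt` are assumptions, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Byte i (0 = least significant) of a 64-bit value.
static unsigned ByteAt(uint64_t u, std::size_t i) {
  return static_cast<unsigned>((u >> (8 * i)) & 0xFF);
}

// Minimal little-endian two's complement encoding of v (1..8 bytes).
std::string EncodeSigned8(int64_t v) {
  const uint64_t u = static_cast<uint64_t>(v);
  // Byte value that may be dropped from the top, and the sign bit the new
  // top byte must still have for the value to decode to v.
  const unsigned filler = v < 0 ? 0xFF : 0x00;
  const unsigned sign_bit = v < 0 ? 0x80 : 0x00;
  std::size_t n = 8;
  while (n > 1 && ByteAt(u, n - 1) == filler &&
         (ByteAt(u, n - 2) & 0x80) == sign_bit) {
    --n;
  }
  std::string out;
  for (std::size_t i = 0; i < n; ++i) {
    out.push_back(static_cast<char>(ByteAt(u, i)));
  }
  return out;
}

// Decodes 1..8 bytes and sign-extends from the top stored byte.
int64_t DecodeSigned8(const std::string& in) {
  assert(!in.empty() && in.size() <= 8);
  uint64_t u = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    u |= static_cast<uint64_t>(static_cast<unsigned char>(in[i])) << (8 * i);
  }
  const std::size_t bits = 8 * in.size();
  if (bits < 64 && ((u >> (bits - 1)) & 1) != 0) {
    u |= ~uint64_t{0} << bits;  // fill the high bits on the unsigned value
  }
  return static_cast<int64_t>(u);
}
```

Doing the sign extension on `uint64_t` avoids left-shifting a negative signed value.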

The format of the core C++ 8-bit varint encoding changed, so the Java shadow implementation (used for testing) needs to work the same way.

Output on stdout or stderr was causing a decoding error. Set the error mode to `backslashreplace` instead of the default `strict`, so we have a chance of seeing the output.

meta-cla · 2025-11-24T16:02:31Z

Hi @Chessray!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

meta-cla · 2025-11-25T12:22:05Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

alanpaxton and others added 29 commits November 24, 2025 15:54

Missed include so the method was mangled

fcb76c4

Include the header that says `extern "C" { …`

Java test/support for Int64Add merge operator

c0f74cf

Implement encoding and zigzag in Java so that we can test it more thoroughly.

Extract merge encodings as a public API

7e293cb

Exporting the int64 merge operator to Java requires support for encoding to make it useful. It also requires that support in order to test it from Java.

Fix naming inconsistency and typo-ed test

6941313

File missing the (c) header

6a58407

Misnamed new file in CMakeLists

262af06

Add missing ASSERT_STATUS in tests

de2ad5d

fix format

d6e925e

simple source code checks

0d67f85

Remove LITE references

2300f0e

Fix format, add compulsory unnecessary brackets

c6d786e

encode/decode testing was closing an unopened DB

207120d

Fix the test inheritance

Regenerate TARGETS as per command in header

6b87638

$ python3 buckifier/buckify_rocksdb.py

Suppress PMD warning

8583783

Add omitted check of DB destroy status on test

a062fd4

The problem only shows up when ASSERT_STATUS_CHECKED=1, i.e. running changed merge_operators_test under CI

Repair rebase of test

8425423

make format doesn’t format this but does complain

Encode signed 8-bit variable length directly

ead1254

Don’t need to turn them into zigzag first. There end up being 2 branches in encoding anyway, but there’s perhaps a little less actual code all told.

Make Java merge encoding code consistent w/C++

76eb4b1

Changed format in core C++ 8 bit varint encoding, so the Java shadow implementation (used for testing) needs to work the same way.

Attempt to fix UBSAN complaint with some casts

e389529

Try again ubsan

c209608

Format fix

b091728

Work round ubsan of (-ve << n)

77683cb

Comment UBSAN workaround

e6d5001

Try to debug db_crashtest.py in CI

05f30dc

stdout or stderr was causing a decoding error. Make the error mode “backslashreplace” instead of the default “strict” so we have a chance of seeing the output.

Add int64 add merge operator to CMake java

7a1739d
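The UBSan commits ("Work round ubsan of (-ve << n)") concern left-shifting a negative signed value, which is undefined behaviour before C++20. One common way to sign-extend without ever doing that is the XOR-and-subtract trick on unsigned integers, sketched below in C++. This is a generic technique, not necessarily the workaround the PR uses, and `SignExtend` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extends the low `bits` bits of `raw` to a 64-bit signed value.
// Everything is done on uint64_t, so no negative value is ever shifted left
// (e.g. `int64_t{-1} << bits`, which UBSan reports before C++20).
int64_t SignExtend(uint64_t raw, unsigned bits) {
  assert(bits >= 1 && bits <= 64);
  // Keep only the low `bits` bits (a shift by 64 would itself be UB).
  const uint64_t mask = bits == 64 ? ~uint64_t{0} : (uint64_t{1} << bits) - 1;
  const uint64_t sign = uint64_t{1} << (bits - 1);
  // XOR flips the sign bit; subtracting it back borrows through all higher
  // bits exactly when the original sign bit was 1.
  return static_cast<int64_t>(((raw & mask) ^ sign) - sign);
}
```

For example, `SignExtend(0x80, 8)` gives -128 and `SignExtend(0x7F, 8)` gives 127.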

meta-cla bot added the CLA Signed label Nov 25, 2025
