@donghao526 donghao526 commented Aug 19, 2025

ISSUE

It closes #3063.

Proposed Changes

- Add the TDIGEST.REVRANK command implementation
- Add C++ unit tests

@PragmaTwice PragmaTwice changed the title from "feat(tdigest): add TDIGEST.Revrank command implementation #3063" to "feat(tdigest): add the support of TDIGEST.REVRANK command" Aug 19, 2025
@PragmaTwice
Member

Thank you for your contribution. Could you add some golang test cases for it?

Refer to https://github.com/apache/kvrocks/blob/unstable/tests/gocase/unit/type/tdigest/tdigest_test.go.

@donghao526 donghao526 closed this Aug 19, 2025
@donghao526
Author

@PragmaTwice OK, I will add some Golang tests.

@donghao526 donghao526 reopened this Aug 19, 2025
@sonarqubecloud

Quality Gate failed

Failed conditions
39.7% Coverage on New Code (required ≥ 50%)

See analysis details on SonarQube Cloud

Member

@LindaSummer LindaSummer left a comment

Hi @donghao526 ,

Thanks very much for your contribution! 😊

Left some comments.

Best Regards,
Edward

Comment on lines 193 to 228
```cpp
{
  LockGuard guard(storage_->GetLockManager(), ns_key);

  if (auto status = getMetaDataByNsKey(ctx, ns_key, &metadata); !status.ok()) {
    return status;
  }

  if (metadata.total_observations == 0) {
    result->resize(inputs.size(), -2);
    return rocksdb::Status::OK();
  }

  if (metadata.unmerged_nodes > 0) {
    auto batch = storage_->GetWriteBatchBase();
    WriteBatchLogData log_data(kRedisTDigest);
    if (auto status = batch->PutLogData(log_data.Encode()); !status.ok()) {
      return status;
    }

    if (auto status = mergeCurrentBuffer(ctx, ns_key, batch, &metadata); !status.ok()) {
      return status;
    }

    std::string metadata_bytes;
    metadata.Encode(&metadata_bytes);
    if (auto status = batch->Put(metadata_cf_handle_, ns_key, metadata_bytes); !status.ok()) {
      return status;
    }

    if (auto status = storage_->Write(ctx, storage_->DefaultWriteOptions(), batch->GetWriteBatch()); !status.ok()) {
      return status;
    }

    ctx.RefreshLatestSnapshot();
  }
}
```
Member

Hi @donghao526 ,

The merge action could be refactored into a function to reduce duplication of the same logic.

Best Regards,
Edward

Comment on lines 240 to 246
```cpp
for (auto value : inputs) {
  auto status_or_rank = TDigestRevRank(dump_centroids, value);
  if (!status_or_rank) {
    return rocksdb::Status::InvalidArgument(status_or_rank.Msg());
  }
  result->push_back(*status_or_rank);
}
```
Member

Hi @donghao526 ,

We could sort the inputs and get all the ranks with a single scan of the centroids, since the centroids are already sorted.

Best Regards,
Edward
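The single-scan idea can be sketched as follows. This is a simplified model, not the kvrocks implementation: each centroid is treated as a point mass at its mean (no interpolation inside a centroid), the reverse rank of a value is taken as the weight of centroids with a larger mean plus half the weight of centroids with an equal mean (rounded down, an assumption), and out-of-range values are checked against the first and last centroid means rather than the tracked min/max. `Centroid` and `RevRankOneScan` are hypothetical names. Because the inputs are visited from largest to smallest, the running sum of "larger" weight only grows, so each centroid is consumed once; adjacent centroids that share a mean, like the weight-1 centroids from the `TDIGEST.ADD s 10 10 10 10 20 20` test in this thread, are summed together.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical, simplified centroid: treated as a point mass at `mean`.
struct Centroid {
  double mean;
  double weight;
};

// Centroids must be sorted by ascending mean. Results keep the input order.
// revrank(v) = weight(mean > v) + floor(weight(mean == v) / 2),
// -1 above the largest mean, total weight below the smallest mean.
std::vector<int64_t> RevRankOneScan(const std::vector<Centroid>& centroids,
                                    const std::vector<double>& inputs) {
  std::vector<int64_t> result(inputs.size(), -2);  // -2: empty sketch
  if (centroids.empty()) return result;

  double total = 0;
  for (const auto& c : centroids) total += c.weight;

  // Visit input positions from the largest value to the smallest.
  std::vector<size_t> order(inputs.size());
  std::iota(order.begin(), order.end(), size_t{0});
  std::sort(order.begin(), order.end(),
            [&inputs](size_t a, size_t b) { return inputs[a] > inputs[b]; });

  size_t pos = centroids.size();  // centroids[pos..] all have mean > current value
  double greater = 0;             // total weight of centroids[pos..]
  for (size_t idx : order) {
    const double v = inputs[idx];
    if (v > centroids.back().mean) {
      result[idx] = -1;
      continue;
    }
    if (v < centroids.front().mean) {
      result[idx] = static_cast<int64_t>(total);
      continue;
    }
    // Consume every centroid whose mean is strictly greater than v.
    while (pos > 0 && centroids[pos - 1].mean > v) {
      greater += centroids[pos - 1].weight;
      --pos;
    }
    // Sum adjacent centroids whose mean equals v without consuming them,
    // because a duplicate input with the same value needs them again.
    double equal = 0;
    for (size_t j = pos; j > 0 && centroids[j - 1].mean == v; --j) {
      equal += centroids[j - 1].weight;
    }
    result[idx] = static_cast<int64_t>(greater + equal / 2);
  }
  return result;
}
```

For six weight-1 centroids (four at 10, two at 20), the inputs `{20, 10, 5, 25, 10, 15}` yield `{1, 4, 6, -1, 4, 2}`.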

Author

@donghao526 donghao526 Aug 25, 2025

Hi @LindaSummer
I ran into a problem while testing. After the nodes are merged, can two adjacent centroids have the same mean?

I tested with:

```
TDIGEST.CREATE s COMPRESSION 1000
TDIGEST.ADD s 10 10 10 10 20 20
```

After merging, the centroids were:
(1) mean: 10 weight: 1
(2) mean: 10 weight: 1
(3) mean: 10 weight: 1
(4) mean: 10 weight: 1
(5) mean: 20 weight: 1
(6) mean: 20 weight: 1

Is this expected, or is it a bug?

Member

@LindaSummer LindaSummer Aug 26, 2025

Hi @donghao526 ,

It is expected, and you could refer to #2878 for more details.
So we need a stable way for both serialization and deserialization.

The merge is triggered by weight, not by mean, so we can treat the mean as just a label on a centroid. The whole logic is driven by weight.

Best Regards,
Edward

Author

Got it.

```cpp
}

template <typename TD>
inline StatusOr<int> TDigestRevRank(TD&& td, double value) {
```
Member

Hi @donghao526 ,

We need a stable way to compare doubles.

It is fragile to assume that two double values are exactly equal, or that one is greater than or equal to the other.

After solving this, we should add some test cases for this corner case.

Best Regards,
Edward
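A common way to make such comparisons stable is a combined absolute and relative tolerance check. The sketch below is illustrative only: `DoubleEqual`, `DoubleGreater`, `DoubleGreaterOrEqual` and both epsilon values are assumptions, not existing kvrocks helpers, and the right tolerance depends on how the centroid means are computed and serialized.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical tolerances: an absolute floor for values near zero and a
// relative bound for larger magnitudes.
constexpr double kAbsEpsilon = 1e-12;
constexpr double kRelEpsilon = 1e-9;

// Treats a and b as equal if they differ by at most kAbsEpsilon, or by at
// most kRelEpsilon times the larger magnitude.
inline bool DoubleEqual(double a, double b) {
  const double diff = std::fabs(a - b);
  if (diff <= kAbsEpsilon) return true;
  return diff <= kRelEpsilon * std::max(std::fabs(a), std::fabs(b));
}

// a > b only when the gap is larger than the tolerance.
inline bool DoubleGreater(double a, double b) { return a > b && !DoubleEqual(a, b); }

// a >= b, where nearly equal values count as equal.
inline bool DoubleGreaterOrEqual(double a, double b) { return a > b || DoubleEqual(a, b); }
```

Tolerance-based equality is not transitive (a ≈ b and b ≈ c do not imply a ≈ c), so helpers like these should not be used as a sort comparator.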

Member

Hi @donghao526 ,

Since the other code snippets use this approach now, you could keep the current logic.

I will try to create a new PR to solve the unstable comparison problem in this file.

Best Regards,
Edward

Author

OK. After your new PR, I can help fix it here.

@donghao526 donghao526 marked this pull request as draft August 20, 2025 14:27
zhixinwen and others added 17 commits August 28, 2025 13:06
…e#3136)

The usage of this function was removed in
apache#395.

Co-authored-by: Twice <[email protected]>
…pache#3139)

Accessing undeclared keys in Lua scripting may lead to unexpected
behavior in the current design of kvrocks (and also in Redis; refer to
https://redis.io/docs/latest/commands/eval/), so in this PR we add a new
option `lua-strict-key-accessing` to prevent users from accessing undeclared
keys in Lua, e.g.

```
EVAL "return redis.call('set', 'a', 1)" // ERROR!

EVAL "return redis.call('set', KEYS[1], 1)" 1 a // ok
```

This check is performed in both Lua scripts and Lua functions.
Also, use hardcoded IDs when creating the user and group, to ensure they
remain stable.

Closes apache#3135.

Co-authored-by: Aleks Lozovyuk <[email protected]>
…()` (apache#3145)

Historically, it was used for doing some checks.

https://github.com/apache/kvrocks/blob/2bbfe5aa9531f6e76bfd10a2cc450b9bfa0f15d9/src/storage.cc#L175-L182


But these checks no longer exist today, so this operation should be
unnecessary.

`ListColumnFamilies()` needs to iterate through the MANIFEST, which might
be costly for a large one.
For example, it took 1.6 seconds for an 80 MB MANIFEST.
…che#3146)

Pattern-based SCAN iterations may skip remaining keys in a hash slot,
causing incomplete key retrieval.

**Bug Reproduction:**
- 10 keys in the same hash slot: ["119483", "166988", "210695",
"223656", "48063", "59022", "65976", "74937", "88379", "99338"]
- Initial SCAN with pattern `2*`:
  - Returns cursor C1 and **empty keyset** (no keys match `2*`)
  - Records "119483" as last scanned key
- Subsequent SCAN with cursor C1 and same pattern:
  1. RocksDB iterator seeks to "119483"
  2. Calls `Next()` → gets "166988" (next key in slot)
  3. "166988" ∉ `2*` pattern → no key returned
  4. **Error**: Scan incorrectly increments slot index
  5. **Result**: Remaining 8 keys in slot are skipped

**Bug Fix Implementation:**
When scanning with a previous scan cursor and match pattern:
1. If the last scanned key is lexicographically before the pattern's
start range:
   → Use the pattern's minimum matching key as the seek key
   → Instead of using the last scanned key
   
**Example:**
- Pattern: `2*` → Minimum matching key = `"2"` (hex: \x32)
- Last scanned key: `"119483"` (hex: \x31\x31...)
- Since `"119483"` < `"2"` lexically:
  ✓ **Correct:** Seek to `"2"` 
  ✗ **Buggy:** Seek to `"119483"`
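The seek-key rule above can be sketched like this. The helper names `LiteralPrefix` and `ChooseSeekKey` are hypothetical, and taking the pattern's minimum matching key as everything before the first glob metacharacter (`*`, `?`, `[` or `\`) is an assumption; the actual patch may derive it differently.

```cpp
#include <string>

// Hypothetical helper: the literal prefix of a glob pattern, i.e. every
// character before the first metacharacter. For "2*" this is "2".
std::string LiteralPrefix(const std::string& pattern) {
  const std::string::size_type pos = pattern.find_first_of("*?[\\");
  return pattern.substr(0, pos);  // npos keeps the whole pattern
}

// Hypothetical helper: choose where the next SCAN iteration should seek.
// If the last scanned key sorts before the smallest key that can match the
// pattern, seek to the pattern's prefix instead of the last scanned key.
std::string ChooseSeekKey(const std::string& last_scanned_key, const std::string& pattern) {
  const std::string prefix = LiteralPrefix(pattern);
  if (!prefix.empty() && last_scanned_key < prefix) {
    return prefix;
  }
  return last_scanned_key;
}
```

With the reproduction's data, `ChooseSeekKey("119483", "2*")` returns `"2"`, while a last key such as `"223656"`, which already sorts after `"2"`, is kept. `std::string` comparison goes through `std::char_traits<char>`, which compares characters as `unsigned char`, so this ordering matches RocksDB's default bytewise comparator.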

---------

Co-authored-by: Twice <[email protected]>
@donghao526
Author

@LindaSummer I have modified the code according to your suggestions; please review.

  1. The merge action has been refactored into the mergeNodes function.
  2. I have sorted the inputs in TDigestRevRank, so now we can get the ranks with just one scan of the centroids.

Best Regards

@donghao526 donghao526 requested a review from LindaSummer August 28, 2025 05:29
@donghao526 donghao526 marked this pull request as ready for review August 28, 2025 05:29
Member

@LindaSummer LindaSummer left a comment

Hi @donghao526 ,

Thanks very much for your great effort! 😊

Left some comments.

Best Regards,
Edward

```cpp
  return {Status::RedisExecErr, s.ToString()};
}

if (result.data()) {
```
Member

Suggested change

```diff
-if (result.data()) {
+if (!result.empty()) {
```

```cpp
i--;

// handle the previous inputs that have the same value
while ((i > 0) && (inputs[indices[i]] == inputs[indices[i - 1]])) {
```
Member

Maybe we could use a `std::map<double, std::vector<size_t>>` to build a sorted view of the inputs and group the indices of duplicate values under one key, so they share a single rank. It may simplify our logic.
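A minimal sketch of the grouping step, with hypothetical names (`InputGroups`, `GroupInputs`, `ScatterRanks`): ordering the map with `std::greater<double>` visits values from largest to smallest, which is the order a reverse-rank scan over ascending centroids needs, and each key collects every position where that value appears, so the rank is computed once per distinct value and copied to all of them. `rank_of` stands in for whatever computes the rank, for example a running-sum scan over the centroids.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Distinct input value -> positions where it appears, largest value first.
using InputGroups = std::map<double, std::vector<std::size_t>, std::greater<double>>;

InputGroups GroupInputs(const std::vector<double>& inputs) {
  InputGroups groups;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    groups[inputs[i]].push_back(i);
  }
  return groups;
}

// Calls rank_of once per distinct value (in descending order) and writes the
// result to every original position of that value.
template <typename RankFn>
std::vector<int> ScatterRanks(const InputGroups& groups, std::size_t n, RankFn&& rank_of) {
  std::vector<int> result(n, 0);
  for (const auto& [value, positions] : groups) {
    const int rank = rank_of(value);
    for (std::size_t pos : positions) {
      result[pos] = rank;
    }
  }
  return result;
}
```

Grouping relies on exact key equality, so values that differ only by rounding error end up in separate groups, and NaN inputs would break the map's ordering, so they need to be rejected before this step.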

```cpp
    i--;
  }
  return Status::OK();
}
```
Member

Add a newline at the end of the file.

```cpp
}

template <typename TD>
inline Status TDigestRank(TD&& td, const std::vector<double>& inputs, std::vector<int>& result) {
```
Member

This function seems specialized for RevRank rather than Rank.

Could we implement Rank first, then wrap the iterator in a reverse version and reuse the same logic to construct RevRank?
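One way to follow this suggestion, sketched under a simplified point-mass model (rank = weight strictly on one side of the value plus half the weight equal to it, rounded down) and without the out-of-range special cases: a single template walks any iterator range with a comparator, `Rank` passes the centroids in ascending order with `std::less`, and `RevRank` passes `rbegin()`/`rend()` with `std::greater`, so "strictly before" becomes "strictly larger". `Centroid`, `RankImpl`, `Rank` and `RevRank` are hypothetical names, not the kvrocks API.

```cpp
#include <functional>
#include <vector>

// Hypothetical, simplified centroid: treated as a point mass at `mean`.
struct Centroid {
  double mean;
  double weight;
};

// Walks [first, last), which must be ordered by `before`, and returns the
// weight strictly before `value` plus half the weight equal to it.
template <typename It, typename Before>
long long RankImpl(It first, It last, double value, Before before) {
  double strictly_before = 0;
  double equal = 0;
  for (; first != last; ++first) {
    if (before(first->mean, value)) {
      strictly_before += first->weight;
    } else if (!before(value, first->mean)) {  // neither before nor after: equal
      equal += first->weight;
    } else {
      break;  // every remaining centroid comes after `value`
    }
  }
  return static_cast<long long>(strictly_before + equal / 2);
}

// Centroids sorted by ascending mean.
long long Rank(const std::vector<Centroid>& c, double value) {
  return RankImpl(c.begin(), c.end(), value, std::less<double>());
}

// Same logic with the range and the comparator reversed.
long long RevRank(const std::vector<Centroid>& c, double value) {
  return RankImpl(c.rbegin(), c.rend(), value, std::greater<double>());
}
```

For six weight-1 centroids at 10, 10, 10, 10, 20, 20, this gives `Rank` 2 and 5 and `RevRank` 4 and 1 for the values 10 and 20.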

@donghao526 donghao526 marked this pull request as draft September 18, 2025 12:10

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TDigest: Implement TDIGEST.REVRANK command

8 participants