Adding support for sharing memory between the module and the engine #2299
Resolves valkey-io#2267. Timed-out tests are now logged at the end of the test run.
```
!!! WARNING The following tests failed:
*** [TIMEOUT]: WAIT should not acknowledge 2 additional copies of the data in tests/unit/wait.tcl
Cleanup: may take some time... OK
```
Signed-off-by: Sarthak Aggarwal <[email protected]>
…alkey-io#2283) When a replica does a CLUSTER RESET SOFT, we should generate a new shard_id for it. A replica inherits the shard_id of its primary, and after the CLUSTER RESET SOFT it becomes a new empty primary, so it should have its own shard_id. Otherwise, when it joins back into the cluster or responds to the PONG, we end up with two primaries in the same shard. --------- Signed-off-by: Binbin <[email protected]>
In the unit test framework, there is a check that `zmalloc_used_memory() == 0` after each test suite. If an assertion fails in any test case, the test case is aborted and often it means some memory is not freed. The check for used memory == 0 then fails for all test suites that run after the failed one. This PR makes sure a memory leak is only reported once, for the test suite that didn't free its memory. Two test cases that explicitly checked that all memory had been freed are deleted. This is handled by the test framework. Signed-off-by: Viktor Söderqvist <[email protected]>
… and fix zadd --sequential to create expected data (valkey-io#2102) This extends the `__rand_int__` pattern with `__rand_1st__` through `__rand_9th__`. For the new placeholders, multiple occurrences within a command will have the same value. With the `--sequential` option, each placeholder uses a separate counter. For `__rand_int__`, the behavior is unchanged: multiple occurrences of the pattern within a command will have different values.
Example of using `__rand_int__` with `--sequential`:
```
$ ./valkey-benchmark -r 300 -n 300 --sequential -q -- set key__rand_int__ val__rand_int__
set key__rand_int__ val__rand_int__: 60000.00 requests per second, p50=0.391 msec
$ ./valkey-cli info keyspace
# Keyspace
db0:keys=150,expires=0,avg_ttl=0
$ ./valkey-cli get key000000000050
"val000000000051"
```
For the new patterns, multiple occurrences within the command will have the same value.
```
$ ./valkey-benchmark -r 300 -n 300 --sequential -q -- set key__rand_1st__ val__rand_1st__
set key__rand_1st__ val__rand_1st__: 60000.00 requests per second, p50=0.383 msec
$ ./valkey-cli info keyspace
# Keyspace
db0:keys=300,expires=0,avg_ttl=0
$ ./valkey-cli get key000000000050
"val000000000050"
```
I also fixed the zadd benchmark so it produces the expected number of keys when used with `--sequential`. (By using the same counter twice per command, it was effectively counting by twos.) I made this for myself but raised a PR in case y'all like it. 🙂 --------- Signed-off-by: Rain Valentine <[email protected]>
During memory eviction, we may end up with a primary kv hash table that's below the 13% `MIN_FILL_PERCENT_SOFT`, but inside `resize()` for the kv table we check with `ht->type->resizeAllowed` whether allocating a new table crosses the configured `maxmemory` limit, without taking into account that shrinking, while temporarily increasing the memory usage, will ultimately reduce the memory usage in all situations. Blocking hash tables from shrinking in situations where we would only temporarily cross the maxmemory limit has at least three negative side effects:
* The performance of valkey is negatively impacted if, for example, we're moving items from table A to table B, since table A would eventually end up being a very sparse hash table that never shrinks.
* In extreme situations, the performance of valkey slows down significantly when trying to find more keys to evict, since we're trying to find keys in a sparse table that only becomes more sparse over time.
* In more extreme situations, such as when reducing `maxmemory` by around 90% or writing large data to a single key, valkey can end up in a situation where, no matter how many keys are being evicted from the kv table, the table bucket struct overhead uses more memory than the data stored in the table, causing all keys to be evicted from the table and leaving valkey unusable until maxmemory is set above the current memory usage:
```
127.0.0.1:6379> config set maxmemory 2gb
OK
127.0.0.1:6379> debug populate 20000000
OK
(8.94s)
127.0.0.1:6379> debug htstats 0
[Dictionary HT]
Hash table 0 stats (main hash table):
 table size: 29360128
 number of entries: 20000000
[Expires HT]
127.0.0.1:6379> config set maxmemory 200mb
OK
127.0.0.1:6379> dbsize
(integer) 18725794
[...]
127.0.0.1:6379> dbsize
(integer) 0
127.0.0.1:6379> keys *
(empty array)
127.0.0.1:6379> set foo bar
(error) OOM command not allowed when used memory > 'maxmemory'.
127.0.0.1:6379> config get maxmemory
1) "maxmemory"
2) "209715200"
127.0.0.1:6379> info memory
# Memory
used_memory:269890904
used_memory_human:257.39M
[...]
127.0.0.1:6379> config set maxmemory 300mb
OK
127.0.0.1:6379> info memory
# Memory
used_memory:1455512
used_memory_human:1.39M
[...]
127.0.0.1:6379> set foo bar
OK
```
This PR addresses these issues by allowing hash tables to allocate a new table for shrinking, even if allocating a new table causes the maxmemory limit to be crossed during resize. Additionally, the check for whether we should fast-forward the resize operation has been moved to before checking whether we are allowed to allocate a new resize table. Because we don't start a resize in this section of the function yet, a fast-forward usually drops the memory usage enough to allow another resize operation to start immediately after the fast-forward resize completes. As a side effect of this PR, we now also refuse to start a hashtable resize if the policy is set to `HASHTABLE_RESIZE_FORBID`. --------- Signed-off-by: Fusl <[email protected]> Signed-off-by: Viktor Söderqvist <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]>
…el (valkey-io#2308) Sentinel sends the switch-master message more quickly, ref [valkey-io#2282](valkey-io#2282). Signed-off-by: youngmore1024 <[email protected]>
- Subset of valkey-io#2183 --------- Signed-off-by: Josh Soref <[email protected]>
- Subset of valkey-io#2183 --------- Signed-off-by: Josh Soref <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
Signed-off-by: Ran Shidlansik <[email protected]>
…rectory as path (valkey-io#630) Helps test specific functionality like `cluster` rather than running the entire test suite.
### Usage example
#### Run all the tests under the tests/unit/cluster directory
```
./runtest --single tests/unit/cluster
```
#### Run all the tests under tests/unit but skip the file `memefficiency`
```
./runtest --single tests/unit --skipunit tests/unit/memefficiency.tcl
```
#### Run all the tests under tests/integration
```
./runtest --single tests/integration
```
Note: this doesn't work with tests under `tests/cluster`. We should migrate all of the tests under `tests/cluster` to `tests/unit/cluster`. --------- Signed-off-by: Harkrishn Patro <[email protected]>
) Needs backporting to 8.1 to fix valkey-io#2320 --------- Signed-off-by: Harkrishn Patro <[email protected]>
Resolves test failures like https://github.com/valkey-io/valkey/actions/runs/16093360514/job/45412856532. Valgrind tests can run slowly, so the 100 ms expire time may have partially begun to expire items during the ingestion phase. Signed-off-by: Madelyn Olson <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## unstable #2299 +/- ##
============================================
- Coverage 71.43% 71.41% -0.02%
============================================
Files 123 123
Lines 66908 67163 +255
============================================
+ Hits 47795 47964 +169
- Misses 19113 19199 +86
cc @zuiderkwast
test_vest uses a mock_defrag function in order to simulate the defrag flow. In the scenario where there is no system malloc_size, we need to use zmalloc_usable_size in order to prevent stepping over the "to-be-replaced" buffer. This partially fixes valkey-io#2435. Signed-off-by: Ran Shidlansik <[email protected]>
Fix missing check for executing client in `lookupKey` function
### Issue
The `lookupKey` function in db.c accesses `server.executing_client->cmd->proc` without first verifying that `server.executing_client` is not NULL. This was introduced in valkey-io#1499, where the check for the executing client was added without verifying that it could be NULL. The server crashes with a null pointer dereference when the current_client's flag.no_touch is set.
```
27719 valkey-server *
/lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0]
src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7]
src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc]
src/valkey-server 127.0.0.1:21113[0x52b8f1]
src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f]
src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9]
src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5]
src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5]
src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b]
src/valkey-server 127.0.0.1:21113[0x57daa6]
src/valkey-server 127.0.0.1:21113[0x57e280]
src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259]
src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450]
src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a]
src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a]
```
### Fix
Added a null check for `server.executing_client` before attempting to dereference it; a sketch of this kind of guard is shown below.
### Tests
Added a regression test in tests/unit/type/list.tcl.
--------- Signed-off-by: Uri Yagelnik <[email protected]> Signed-off-by: Ran Shidlansik <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]>
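A minimal, self-contained illustration of the kind of NULL guard described above. This is not the actual patch from db.c; the structures are simplified stand-ins, and the touchCommand comparison is an assumption about what the surrounding code checks.
```c
#include <stddef.h>

/* Minimal stand-ins for the server/client structures involved; the real
 * definitions live in server.h. Only the fields needed by the guard are modeled. */
typedef void (*commandProc)(void);
struct command { commandProc proc; };
struct client { struct command *cmd; };
struct serverState { struct client *executing_client; };
static struct serverState server;

static void touchCommand(void) {}

/* The shape of the fix: verify executing_client (and its cmd) before
 * dereferencing, instead of assuming a command is always executing. */
static int executing_client_is_touch(void) {
    return server.executing_client != NULL &&
           server.executing_client->cmd != NULL &&
           server.executing_client->cmd->proc == touchCommand;
}

int main(void) {
    server.executing_client = NULL; /* e.g. a key lookup triggered outside normal command execution */
    return executing_client_is_touch(); /* safely returns 0 instead of crashing */
}
```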
We used to print the context, but after valkey-io#2276 we lost it. unstable:
```
*** Extract version and sha1 details from info command and print in tests/unit/info-command.tcl
```
now:
```
*** [err]: Extract version and sha1 details from info command and print in tests/unit/info-command.tcl
Expected '0' to be equal to '1' (context: type source line 7 file /xxx/info-command.tcl cmd {assert_equal 0 1} proc ::test)
```
We can see the difference: we now provide enough context when an assertion fails. Otherwise we would need to scroll back (usually through a lot of server logs) to see the context. Signed-off-by: Binbin <[email protected]>
…2440) This might help shed some light on the flaky failure, e.g.: https://github.com/valkey-io/valkey/actions/runs/16754537665/job/47433388425?pr=2431 Signed-off-by: Ran Shidlansik <[email protected]>
Remove unused includes Signed-off-by: Xiaolong Chen <[email protected]>
…o primary (valkey-io#2446) fixes: valkey-io#2445 Signed-off-by: Ran Shidlansik <[email protected]>
1. Fix the way we toggle active expire.
2. Reduce the number of spawned servers during the test: some start_server blocks were merged to reduce the time spent starting and shutting down servers.
--------- Signed-off-by: Ran Shidlansik <[email protected]>
Across multiple runs of the big list test in defrag, the latency check
is tripping because the maximum observed defrag cycle latency
occasionally spikes above our 5 ms limit. While most cycles complete in
just a few milliseconds, rare slowdowns push some cycles into the double
digit millisecond range, so a 5 ms hard cap is too aggressive for stable
testing.
```
./runtest --verbose --tls --single unit/memefficiency --only '/big list' --accurate --loop --fastfail
```
```
[err]: Active Defrag big list: standalone in tests/unit/memefficiency.tcl
Expected 12 <= 5 (context: type proc line 18 cmd {assert {$max_latency <= $limit_ms}} proc ::validate_latency level 1)
(Fast fail: test will exit now)
[err]: Active Defrag big list: standalone in tests/unit/memefficiency.tcl
Expected 21 <= 5 (context: type proc line 18 cmd {assert {$max_latency <= $limit_ms}} proc ::validate_latency level 1)
(Fast fail: test will exit now)
[err]: Active Defrag big list: standalone in tests/unit/memefficiency.tcl
Expected 25 <= 5 (context: type proc line 18 cmd {assert {$max_latency <= $limit_ms}} proc ::validate_latency level 1)
(Fast fail: test will exit now)
[err]: Active Defrag big list: standalone in tests/unit/memefficiency.tcl
Expected 17 <= 5 (context: type proc line 18 cmd {assert {$max_latency <= $limit_ms}} proc ::validate_latency level 1)
(Fast fail: test will exit now)
```
---------
Signed-off-by: Seungmin Lee <[email protected]>
Co-authored-by: Seungmin Lee <[email protected]>
Fixes valkey-io#626 - Skip CI tests when only documentation files are modified --------- Signed-off-by: Hanxi Zhang <[email protected]>
…o#2455) Fixes the GitHub Actions error where both `paths` and `paths-ignore` were defined for the same event, which is not allowed. Resolves the error: "you may only define one of `paths` and `paths-ignore` for a single event" Removed the conflicting `paths` section from the `pull_request` trigger, keeping only `paths-ignore` to skip documentation changes while allowing the workflow to run on all other changes. This is a follow-up fix to address the issue identified in the previous PR. Signed-off-by: Hanxi Zhang <[email protected]>
Test `Instance valkey-io#5 is still a slave after some time (no failover)` is supposed to verify that the command `CLUSTER FAILOVER` will not promote a replica without quorum from the primary; later in the file (`Instance 5 is a master after some time`), we verify that `CLUSTER FAILOVER FORCE` does promote a replica under the same conditions. There are a couple of issues with the tests:
1. `Instance valkey-io#5 is still a slave after some time (no failover)` should verify that instance 5 is a replica (i.e. that there's no failover), but we call `assert {[s -5 role] eq {master}}`.
2. The reason the above assert works is that we previously send `DEBUG SLEEP 10` to the primary, which pauses the primary for longer than the 3 seconds configured for `cluster-node-timeout`. The primary is marked as failed from the perspective of the rest of the cluster, so quorum can be established and instance 5 is promoted to primary.
This commit fixes both by shortening the sleep to less than 3 seconds and then asserting that the role is still replica. Test `Instance valkey-io#5 is a master after some time` is updated to sleep for a shorter duration to ensure that `FAILOVER FORCE` succeeds under the exact same conditions.
### Testing
`./runtest --single unit/cluster/manual-failover --loop --fastfail`
Signed-off-by: Tyler Amano-Smerling <[email protected]>
…-io#2452) Adds a simple check for the backup directory and removes the TODO. Signed-off-by: Sarthak Aggarwal <[email protected]>
The bug caused many invalid HMSET entries to be added to the AOF during rewriting. This now properly skips emitting HMSET until we find an entry with no expiry. Signed-off-by: Jacob Murphy <[email protected]>
…lkey-io#2462) In the current implementation, `HGETEX`, when applied to a non-existing object, simply returns a null value instead of an array (as in the `HMGET` case). --------- Signed-off-by: Ran Shidlansik <[email protected]>
closes valkey-io#2352 --------- Signed-off-by: Xiaolong Chen <[email protected]> Signed-off-by: Ran Shidlansik <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]>
…y-io#2464) Following the new API presented in valkey-io#2089, we might access out-of-bounds memory for some illegal command input. Signed-off-by: Ran Shidlansik <[email protected]>
Based on the review suggestion valkey-io#2432 (comment) --------- Signed-off-by: Xiaolong Chen <[email protected]>
### Description
User data is logged when a crash is caused by moduleRDBLoadError.
### Change
Redact user data when hide-user-data-from-log is enabled.
Signed-off-by: VanessaTang <[email protected]>
…ey-io#2469) Right now, if a TLS connect fails, you get an unhelpful error message in the log since it prints out NULL. This change makes sure that the reported error is always a string (never NULL) and also tries to print out the underlying errors. Signed-off-by: Madelyn Olson <[email protected]>
Introduces a new family of commands for migrating slots via replication. The procedure is driven by the source node, which pushes an AOF-formatted snapshot of the slots to the target, followed by a replication stream of changes on that slot (a la manual failover). This solution is an adaptation of the solution provided by @enjoy-binbin, combined with the solution I previously posted at valkey-io#1591, modified to meet the designs we had outlined in valkey-io#23.
## New commands
* `CLUSTER MIGRATESLOTS SLOTSRANGE start end [start end]... NODE node-id`: Begin sending the slot via replication to the target. Multiple targets can be specified by repeating `SLOTSRANGE ... NODE ...`
* `CLUSTER CANCELMIGRATION ALL`: Cancel all slot migrations
* `CLUSTER GETSLOTMIGRATIONS`: See a recent log of migrations
This PR only implements "one shot" semantics with an asynchronous model. Later, "two phase" (e.g. slot level replicate/failover commands) can be added with the same core.
## Slot migration jobs
Introduces the concept of a slot migration job. While active, a job tracks a connection created by the source to the target over which the contents of the slots are sent. This connection is used for control messages as well as replicated slot data. Each job is given a 40 character random name to help uniquely identify it. All jobs, including those that finished recently, can be observed using the `CLUSTER GETSLOTMIGRATIONS` command.
## Replication
* Since the snapshot uses AOF, the snapshot can be replayed verbatim to any replicas of the target node.
* We use the same proxying mechanism used for chaining replication to copy the content sent by the source node directly to the replica nodes.
## `CLUSTER SYNCSLOTS`
To coordinate the state machine transitions across the two nodes, a new command is added, `CLUSTER SYNCSLOTS`, that performs this control flow. Each end of the slot migration connection is expected to install a read handler in order to handle `CLUSTER SYNCSLOTS` commands:
* `ESTABLISH`: Begins a slot migration. Provides slot migration information to the target and authorizes the connection to write to unowned slots.
* `SNAPSHOT-EOF`: Appended to the end of the snapshot to signal that the snapshot is done being written to the target.
* `PAUSE`: Informs the source node to pause whenever it gets the opportunity.
* `PAUSED`: Added to the end of the client output buffer when the pause is performed. The pause is only performed after the buffer shrinks below a configurable size.
* `REQUEST-FAILOVER`: Requests the source to either grant or deny a failover for the slot migration. The failover is only granted if the target is still paused. Once a failover is granted, the pause is refreshed for a short duration.
* `FAILOVER-GRANTED`: Sent to the target to inform it that REQUEST-FAILOVER is granted.
* `ACK`: Heartbeat command used to ensure liveness.
## Interactions with other commands
* FLUSHDB on the source node (which flushes the migrating slot) will result in the source dropping the connection, which will flush the slot on the target and reset the state machine back to the beginning. The subsequent retry should very quickly succeed (it is now empty).
* FLUSHDB on the target will fail the slot migration. We can iterate with better handling, but for now it is expected that the operator would retry.
* Generally, FLUSHDB is expected to be executed cluster wide, so preserving partially migrated slots doesn't make much sense.
* SCAN and KEYS are filtered to avoid exposing importing slot data.
## Error handling
* For any transient connection drops, the migration will be failed and require the user to retry.
* If there is an OOM while reading from the import connection, we will fail the import, which will drop the importing slot data.
* If a client output buffer limit is reached on the source node, it will drop the connection, which will cause the migration to fail.
* If at any point the export loses ownership or either node is failed over, a callback will be triggered on both ends of the migration to fail the import. The import will not reattempt with a new owner.
* The two ends of the migration routinely ping each other with SYNCSLOTS ACK messages. If at any point there is no interaction on the connection for longer than `repl-timeout`, the connection will be dropped, resulting in migration failure.
* If a failover happens, we will drop keys in all unowned slots. The migration does not persist through failovers and would need to be retried on the new source/target.
## State machine
Target/Importing node: SLOT_IMPORT_WAIT_ACK → (ACK) → SLOT_IMPORT_RECEIVE_SNAPSHOT → (SNAPSHOT-EOF) → SLOT_IMPORT_WAITING_FOR_PAUSED → (PAUSED) → SLOT_IMPORT_FAILOVER_REQUESTED → (FAILOVER-GRANTED) → SLOT_IMPORT_FAILOVER_GRANTED → (takeover performed) → SLOT_MIGRATION_JOB_SUCCESS. From any of these states, an error moves the job to SLOT_IMPORT_FINISHED_WAITING_TO_CLEANUP and, once unowned slots are cleaned up, to SLOT_MIGRATION_JOB_FAILED. Error conditions: 1. OOM 2. Slot ownership change 3. Demotion to replica 4. FLUSHDB 5. Connection lost 6. No ACK from source (timeout).
Source/Exporting node: SLOT_EXPORT_CONNECTING → (connected) → SLOT_EXPORT_AUTHENTICATING → (authenticated) → SLOT_EXPORT_SEND_ESTABLISH → (ESTABLISH command written) → SLOT_EXPORT_READ_ESTABLISH_RESPONSE → (full response read, +OK) → SLOT_EXPORT_WAITING_TO_SNAPSHOT → (no other child process) → SLOT_EXPORT_SNAPSHOTTING → (snapshot done) → SLOT_EXPORT_STREAMING → (PAUSE) → SLOT_EXPORT_WAITING_TO_PAUSE → (buffer drained) → SLOT_EXPORT_FAILOVER_PAUSED → (failover request granted) → SLOT_EXPORT_FAILOVER_GRANTED → (new topology received) → SLOT_MIGRATION_JOB_SUCCESS. From any of these states, an error moves the job to SLOT_MIGRATION_JOB_FAILED, and a user cancellation moves it to SLOT_MIGRATION_JOB_CANCELLED. Error conditions: 1. User sends CANCELMIGRATION 2. Slot ownership change 3. Demotion to replica 4. FLUSHDB 5. Connection lost 6. AUTH failed 7. ERR from ESTABLISH command 8. Unpaused before failover completed 9. Snapshot failed (e.g. child OOM) 10. No ACK from target (timeout) 11. Client output buffer overrun.
Co-authored-by: Binbin <[email protected]> --------- Signed-off-by: Binbin <[email protected]> Signed-off-by: Jacob Murphy <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Co-authored-by: Binbin <[email protected]> Co-authored-by: Ping Xie <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
When we added the safeguard against the sub-replicas logic, we didn't add many tests for it. As time has shown, it can happen in different scenarios. Here's a test case for a situation that used to happen in a failover scenario. Signed-off-by: Binbin <[email protected]>
…alkey-io#2431) The assert was added in valkey-io#2301, and we found that some situations trigger the assert and crash the server. The reason we added the assert is because, in the code:
1. sender_claimed_primary and sender are in the same shard
2. sender is the old primary and sender_claimed_primary is the old replica
3. now sender has become a replica and sender_claimed_primary has become a primary
That means a failover happened in the shard, and sender should be the primary of sender_claimed_primary. But obviously this assumption may be wrong: we rely on shard_id to determine whether nodes are in the same shard, and assume that a shard can only have one primary. This is wrong; from valkey-io#2279 we know there is a case where we can create two primaries in the same shard due to the untimely update of shard_id. So we can create a test that triggers the assert in this way:
1. Precondition: two primaries in the same shard, one has slots and one is empty.
2. The replica does a cluster failover.
3. The empty primary does a cluster replicate with the replica (the new primary).
We change the assert to an if condition to fix it. Closes valkey-io#2423. Note that the test written here also exposes the issue in valkey-io#2441, so these two may need to be addressed together. Signed-off-by: Binbin <[email protected]>
This PR was based on Ran's hash TTL branch. Closing this PR in favor of #2472.
Overview
Sharing memory between the module and engine reduces memory overhead by eliminating redundant copies of stored records in the module. This is particularly beneficial for search workloads that require indexing large volumes of documents.
Vectors
Vector similarity search requires storing large volumes of high-cardinality vectors. For example, a single vector with 512 dimensions (4 bytes each) consumes 2048 bytes, and typical workloads often involve millions of vectors. Because there is no memory-sharing mechanism between the module and the engine, valkey-search currently doubles memory consumption when indexing vectors, significantly increasing operational costs. This limitation creates adoption friction and reduces valkey-search's competitiveness.
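To put rough numbers on the duplication cost, here is an illustrative calculation (the workload size is an assumption, not a figure from this PR):
```c
#include <stdio.h>

int main(void) {
    /* Illustrative workload: 1,000,000 vectors of 512 dimensions, 4 bytes each. */
    const double vectors = 1e6;
    const double bytes_per_vector = 512 * 4; /* 2048 bytes per vector */
    const double one_copy_gib = vectors * bytes_per_vector / (1024.0 * 1024 * 1024);
    printf("one copy:   %.2f GiB\n", one_copy_gib);     /* ~1.91 GiB */
    printf("two copies: %.2f GiB\n", 2 * one_copy_gib); /* ~3.81 GiB without sharing */
    return 0;
}
```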
Memory Allocation Strategy
At a fundamental level, there are two primary allocation strategies: either the engine allocates the memory and the module references it, or the module allocates the memory and the engine references it.
For valkey-search, it is crucial that vectors reside in cache-aligned memory to maximize SIMD optimizations. Allowing the module to allocate memory provides greater flexibility for different use cases, though it introduces slightly higher implementation complexity.
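One reason module-side allocation matters here is alignment: the module can request cache-line or SIMD-friendly alignment itself. A minimal, self-contained C sketch using plain libc (not the module API); the 64-byte alignment is an assumption about what the SIMD kernels want:
```c
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define VECTOR_DIMS 512   /* example dimensionality from the Vectors section */
#define SIMD_ALIGNMENT 64 /* assumed cache-line / wide-SIMD friendly alignment */

/* Allocate one float32 vector aligned for SIMD loads. aligned_alloc requires
 * the size to be a multiple of the alignment, which holds here (512 * 4 = 2048). */
static float *alloc_vector(void) {
    return aligned_alloc(SIMD_ALIGNMENT, VECTOR_DIMS * sizeof(float));
}

int main(void) {
    float *vec = alloc_vector();
    if (!vec) return 1;
    memset(vec, 0, VECTOR_DIMS * sizeof(float));
    printf("vector at %p, %zu bytes\n", (void *)vec, VECTOR_DIMS * sizeof(float));
    free(vec);
    return 0;
}
```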
Old Implementation
The old implementation was based on ref-counting and introduced a new SDS type. After further discussion, we agreed to simplify the design by removing ref-counting and avoiding the introduction of a new SDS type.
New Implementation - Key Points
VM_HashExternalize, which externalizes a buffer as a hash field. The externalized buffer is owned by the module. The function accepts the hash key, hash field, and a buffer along with its length.
ExternalizedValue is a new data type that captures the externalized buffer and its length.
FIELD_SDS_AUX_BIT_EXTERNALIZED is a new SDS aux flag used to distinguish between regular/embedded SDS values and ExternalizedValue.
valkey-search Usage
Insertion
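A possible shape of the insertion flow, based on the description of VM_HashExternalize above. The exact signature, the ctx parameter, and the index_vector helper are assumptions for illustration, not code from this PR:
```c
#include <stdlib.h>
#include <string.h>
#include "valkeymodule.h"

/* Assumed signature, reconstructed from the description above (hash key,
 * hash field, buffer and length); the real declaration may differ. */
int VM_HashExternalize(ValkeyModuleCtx *ctx, ValkeyModuleString *key,
                       ValkeyModuleString *field, const char *buf, size_t len);

/* Hypothetical helper: add the vector to the module's index. */
void index_vector(const float *vec, size_t dims);

/* Possible insertion path in a search module: copy the vector into
 * module-owned, SIMD-aligned memory, index it, then externalize that same
 * buffer as the hash field value so the engine does not keep a second copy. */
int on_vector_inserted(ValkeyModuleCtx *ctx, ValkeyModuleString *key,
                       ValkeyModuleString *field,
                       const float *vector, size_t dims) {
    size_t len = dims * sizeof(float);
    float *owned = aligned_alloc(64, len); /* module-owned, cache aligned */
    if (!owned) return VALKEYMODULE_ERR;
    memcpy(owned, vector, len);

    index_vector(owned, dims);

    /* The engine's hash field now references the module-owned buffer. */
    return VM_HashExternalize(ctx, key, field, (const char *)owned, len);
}
```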
Deletion
When receiving a key space notification for a deleted hash key or hash field that was indexed as a vector, valkey-search deletes the corresponding entry from the index.
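A rough sketch of that notification handling, assuming the standard module keyspace-notification API. The remove_key_from_index helper is hypothetical, and the events handled here cover only whole-key deletion and expiry, not the per-field case mentioned above:
```c
#include <string.h>
#include "valkeymodule.h"

/* Hypothetical helper that removes every vector indexed under this key. */
void remove_key_from_index(const char *keyname, size_t keylen);

/* Keyspace notification callback: when an indexed hash key is deleted or
 * expires, drop the corresponding index entries (and hence release the
 * externalized buffers the module owns for them). */
static int on_keyspace_event(ValkeyModuleCtx *ctx, int type,
                             const char *event, ValkeyModuleString *key) {
    (void)ctx; (void)type;
    if (strcmp(event, "del") == 0 || strcmp(event, "expired") == 0) {
        size_t keylen;
        const char *keyname = ValkeyModule_StringPtrLen(key, &keylen);
        remove_key_from_index(keyname, keylen);
    }
    return VALKEYMODULE_OK;
}

/* During module load: listen for generic and hash events. */
int setup_notifications(ValkeyModuleCtx *ctx) {
    return ValkeyModule_SubscribeToKeyspaceEvents(
        ctx, VALKEYMODULE_NOTIFY_GENERIC | VALKEYMODULE_NOTIFY_HASH,
        on_keyspace_event);
}
```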
Update
Handled similarly to insertion.