Name and Version
llama-server --version
version: 6992 (aa3b7a9)
built with clang version 19.1.5 for x86_64-pc-windows-msvc
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa on --jinja --reasoning-format none --port 8080 --n-gpu-layers 11
Problem description & steps to reproduce
Start the server and send requests to the completions endpoint.
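The original request payloads are not part of this report; the sketch below only illustrates the kind of traffic involved, assuming the server is reachable on the port from the command line above and that several chat completion requests with varying prompts are sent one after another (prompt contents and request count are placeholders, not the original data).

# Minimal reproduction sketch (assumptions: server started as above on 127.0.0.1:8080;
# prompts and loop count are placeholders, not the requests from the failing session).
import json
import urllib.request

URL = "http://127.0.0.1:8080/v1/chat/completions"

def chat(prompt):
    # Send one OpenAI-compatible chat completion request and return the reply text.
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")
    req = urllib.request.Request(URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Vary the prompt length so the server keeps saving/restoring prompt cache state between requests.
for i in range(1, 100):
    print(chat("Summarize this text: " + ("some filler text. " * (10 * i))))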
After some requests, the server crashes with:
slot launch_slot_: id 1 | task 2821 | processing task
slot update_slots: id 1 | task 2821 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 6325
slot update_slots: id 1 | task 2821 | n_past = 331, slot.prompt.tokens.size() = 1517, seq_id = 1, pos_min = 1378, n_swa = 128
slot update_slots: id 1 | task 2821 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 1 | task 2821 | erased invalidated context checkpoint (pos_min = 460, pos_max = 1102, n_swa = 128, size = 15.078 MiB)
slot update_slots: id 1 | task 2821 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 1 | task 2821 | prompt processing progress, n_tokens = 2045, batch.n_tokens = 2048, progress = 0.323320
slot update_slots: id 1 | task 2821 | n_tokens = 2045, memory_seq_rm [2045, end)
slot update_slots: id 1 | task 2821 | prompt processing progress, n_tokens = 4090, batch.n_tokens = 2048, progress = 0.646640
slot update_slots: id 1 | task 2821 | n_tokens = 4090, memory_seq_rm [4090, end)
slot update_slots: id 1 | task 2821 | prompt processing progress, n_tokens = 6135, batch.n_tokens = 2048, progress = 0.969960
slot update_slots: id 1 | task 2821 | n_tokens = 6135, memory_seq_rm [6135, end)
slot update_slots: id 1 | task 2821 | prompt processing progress, n_tokens = 6261, batch.n_tokens = 129, progress = 0.989881
slot update_slots: id 1 | task 2821 | n_tokens = 6261, memory_seq_rm [6261, end)
slot update_slots: id 1 | task 2821 | prompt processing progress, n_tokens = 6325, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 1 | task 2821 | prompt done, n_tokens = 6325, batch.n_tokens = 67
slot update_slots: id 1 | task 2821 | created context checkpoint 1 of 8 (pos_min = 5621, pos_max = 6260, size = 15.008 MiB)
slot print_timing: id 2 | task 2816 |
prompt eval time = 80068.15 ms / 3072 tokens ( 26.06 ms per token, 38.37 tokens per second)
eval time = 866753.46 ms / 755 tokens ( 1148.02 ms per token, 0.87 tokens per second)
total time = 946821.62 ms / 3827 tokens
slot release: id 2 | task 2816 | stop processing: n_tokens = 3826, truncated = 0
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
slot get_availabl: id 2 | task -1 | selected slot by LCP similarity, sim_best = 0.371 (> 0.100 thold), f_keep = 0.087
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 3826, total state size = 93.632 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.087, sim = 0.371
srv load: - found better prompt with f_keep = 0.264, sim = 0.420
state_read_meta: failed to find available cells in kv cache
state_seq_set_data: error loading state: failed to restore kv cache
srv load: failed to restore state with size 39783992
D:/a/llama.cpp/llama.cpp/tools/server/server.cpp:3843: pos_min == -1, but n_past > 0 - should not happen: https://github.com/ggml-org/llama.cpp/pull/13833#discussion_r2116181237
slot prompt_load: id 2 | task -1 | failed to load prompt from cache
srv update: - cache state: 14 prompts, 1000.673 MiB (limits: 8192.000 MiB, 131072 tokens, 246847 est)
First Bad Commit
No response
Relevant log output
slot launch_slot_: id 1 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 1 | task 2806 | processing task
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 2807 | processing task
slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = 360477965
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 2436, total state size = 60.100 MiB
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.027, sim = 0.083
srv update: - cache state: 1 prompts, 70.582 MiB (limits: 8192.000 MiB, 131072 tokens, 282730 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv get_availabl: prompt cache update took 33.75 ms
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 2808 | processing task
slot get_availabl: id 2 | task -1 | selected slot by LRU, t_last = 590723549
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 3972, total state size = 114.173 MiB
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.016, sim = 0.059
srv update: - cache state: 2 prompts, 205.790 MiB (limits: 8192.000 MiB, 131072 tokens, 255087 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv get_availabl: prompt cache update took 56.64 ms
slot launch_slot_: id 2 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 2 | task 2809 | processing task
slot update_slots: id 0 | task 2807 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 824
slot update_slots: id 0 | task 2807 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 2807 | prompt processing progress, n_tokens = 760, batch.n_tokens = 760, progress = 0.922330
slot update_slots: id 1 | task 2806 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 811
slot update_slots: id 1 | task 2806 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 1 | task 2806 | prompt processing progress, n_tokens = 747, batch.n_tokens = 1507, progress = 0.921085
slot update_slots: id 2 | task 2809 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 1097
slot update_slots: id 2 | task 2809 | n_past = 65, slot.prompt.tokens.size() = 3972, seq_id = 2, pos_min = 3075, n_swa = 128
slot update_slots: id 2 | task 2809 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 2 | task 2809 | erased invalidated context checkpoint (pos_min = 2137, pos_max = 3033, n_swa = 128, size = 21.034 MiB)
slot update_slots: id 2 | task 2809 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 2 | task 2809 | prompt processing progress, n_tokens = 541, batch.n_tokens = 2048, progress = 0.493163
slot update_slots: id 0 | task 2807 | n_tokens = 760, memory_seq_rm [760, end)
slot update_slots: id 0 | task 2807 | prompt processing progress, n_tokens = 824, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id 0 | task 2807 | prompt done, n_tokens = 824, batch.n_tokens = 64
slot update_slots: id 0 | task 2807 | created context checkpoint 1 of 8 (pos_min = 633, pos_max = 759, size = 2.978 MiB)
slot update_slots: id 1 | task 2806 | n_tokens = 747, memory_seq_rm [747, end)
slot update_slots: id 1 | task 2806 | prompt processing progress, n_tokens = 811, batch.n_tokens = 128, progress = 1.000000
slot update_slots: id 1 | task 2806 | prompt done, n_tokens = 811, batch.n_tokens = 128
slot update_slots: id 1 | task 2806 | created context checkpoint 1 of 8 (pos_min = 518, pos_max = 746, size = 5.370 MiB)
slot update_slots: id 2 | task 2809 | n_tokens = 541, memory_seq_rm [541, end)
slot update_slots: id 2 | task 2809 | prompt processing progress, n_tokens = 1033, batch.n_tokens = 620, progress = 0.941659
slot update_slots: id 3 | task 2808 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 786
slot update_slots: id 3 | task 2808 | n_past = 65, slot.prompt.tokens.size() = 2436, seq_id = 3, pos_min = 2309, n_swa = 128
slot update_slots: id 3 | task 2808 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 3 | task 2808 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 2808 | prompt processing progress, n_tokens = 722, batch.n_tokens = 1342, progress = 0.918575
slot update_slots: id 2 | task 2809 | n_tokens = 1033, memory_seq_rm [1033, end)
slot update_slots: id 2 | task 2809 | prompt processing progress, n_tokens = 1097, batch.n_tokens = 66, progress = 1.000000
slot update_slots: id 2 | task 2809 | prompt done, n_tokens = 1097, batch.n_tokens = 66
slot update_slots: id 2 | task 2809 | created context checkpoint 1 of 8 (pos_min = 906, pos_max = 1032, size = 2.978 MiB)
slot update_slots: id 3 | task 2808 | n_tokens = 722, memory_seq_rm [722, end)
slot update_slots: id 3 | task 2808 | prompt processing progress, n_tokens = 786, batch.n_tokens = 130, progress = 1.000000
slot update_slots: id 3 | task 2808 | prompt done, n_tokens = 786, batch.n_tokens = 130
slot update_slots: id 3 | task 2808 | created context checkpoint 2 of 8 (pos_min = 79, pos_max = 721, size = 15.078 MiB)
slot print_timing: id 1 | task 2806 |
prompt eval time = 95657.19 ms / 811 tokens ( 117.95 ms per token, 8.48 tokens per second)
eval time = 166955.00 ms / 329 tokens ( 507.46 ms per token, 1.97 tokens per second)
total time = 262612.19 ms / 1140 tokens
slot release: id 1 | task 2806 | stop processing: n_tokens = 1139, truncated = 0
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
slot get_availabl: id 1 | task -1 | selected slot by LCP similarity, sim_best = 0.296 (> 0.100 thold), f_keep = 0.291
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 1139, total state size = 32.712 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.291, sim = 0.296
srv update: - cache state: 3 prompts, 243.872 MiB (limits: 8192.000 MiB, 131072 tokens, 253514 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv get_availabl: prompt cache update took 522.21 ms
slot launch_slot_: id 1 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 1 | task 2810 | processing task
slot update_slots: id 1 | task 2810 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 1118
slot update_slots: id 1 | task 2810 | n_past = 331, slot.prompt.tokens.size() = 1139, seq_id = 1, pos_min = 883, n_swa = 128
slot update_slots: id 1 | task 2810 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 1 | task 2810 | erased invalidated context checkpoint (pos_min = 518, pos_max = 746, n_swa = 128, size = 5.370 MiB)
slot update_slots: id 1 | task 2810 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 1 | task 2810 | prompt processing progress, n_tokens = 1054, batch.n_tokens = 1057, progress = 0.942755
slot update_slots: id 1 | task 2810 | n_tokens = 1054, memory_seq_rm [1054, end)
slot update_slots: id 1 | task 2810 | prompt processing progress, n_tokens = 1118, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 1 | task 2810 | prompt done, n_tokens = 1118, batch.n_tokens = 67
slot update_slots: id 1 | task 2810 | created context checkpoint 1 of 8 (pos_min = 411, pos_max = 1053, size = 15.078 MiB)
slot print_timing: id 0 | task 2807 |
prompt eval time = 95656.19 ms / 824 tokens ( 116.09 ms per token, 8.61 tokens per second)
eval time = 266007.77 ms / 463 tokens ( 574.53 ms per token, 1.74 tokens per second)
total time = 361663.96 ms / 1287 tokens
slot release: id 0 | task 2807 | stop processing: n_tokens = 1286, truncated = 0
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 958041133
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 1286, total state size = 36.253 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.257, sim = 0.083
srv update: - cache state: 4 prompts, 283.103 MiB (limits: 8192.000 MiB, 131072 tokens, 255596 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv update: - prompt 0000023F7DEC66D0: 1286 tokens, checkpoints: 1, 39.231 MiB
srv get_availabl: prompt cache update took 639.83 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 2811 | processing task
slot update_slots: id 0 | task 2811 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 3971
slot update_slots: id 0 | task 2811 | n_past = 331, slot.prompt.tokens.size() = 1286, seq_id = 0, pos_min = 1026, n_swa = 128
slot update_slots: id 0 | task 2811 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 2811 | erased invalidated context checkpoint (pos_min = 633, pos_max = 759, n_swa = 128, size = 2.978 MiB)
slot update_slots: id 0 | task 2811 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 2811 | prompt processing progress, n_tokens = 2045, batch.n_tokens = 2048, progress = 0.514984
slot update_slots: id 0 | task 2811 | n_tokens = 2045, memory_seq_rm [2045, end)
slot update_slots: id 0 | task 2811 | prompt processing progress, n_tokens = 3907, batch.n_tokens = 1865, progress = 0.983883
slot update_slots: id 0 | task 2811 | n_tokens = 3907, memory_seq_rm [3907, end)
slot update_slots: id 0 | task 2811 | prompt processing progress, n_tokens = 3971, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 0 | task 2811 | prompt done, n_tokens = 3971, batch.n_tokens = 67
slot update_slots: id 0 | task 2811 | created context checkpoint 1 of 8 (pos_min = 3264, pos_max = 3906, size = 15.078 MiB)
slot print_timing: id 3 | task 2808 |
prompt eval time = 46249.84 ms / 786 tokens ( 58.84 ms per token, 16.99 tokens per second)
eval time = 369453.44 ms / 476 tokens ( 776.16 ms per token, 1.29 tokens per second)
total time = 415703.28 ms / 1262 tokens
slot release: id 3 | task 2808 | stop processing: n_tokens = 1261, truncated = 0
slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.144 (> 0.100 thold), f_keep = 0.262
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 1261, total state size = 32.829 MiB
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.262, sim = 0.144
srv update: - cache state: 5 prompts, 341.492 MiB (limits: 8192.000 MiB, 131072 tokens, 242143 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv update: - prompt 0000023F7DEC66D0: 1286 tokens, checkpoints: 1, 39.231 MiB
srv update: - prompt 0000023FD00829A0: 1261 tokens, checkpoints: 2, 58.389 MiB
srv get_availabl: prompt cache update took 625.95 ms
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 2812 | processing task
slot update_slots: id 3 | task 2812 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 2303
slot update_slots: id 3 | task 2812 | n_past = 331, slot.prompt.tokens.size() = 1261, seq_id = 3, pos_min = 1122, n_swa = 128
state_read_meta: failed to find available cells in kv cache
state_seq_set_data: error loading state: failed to restore kv cache
slot update_slots: id 3 | task 2812 | failed to restore context checkpoint (pos_min = 79, pos_max = 721, size = 15.078 MiB)
slot update_slots: id 3 | task 2812 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 3 | task 2812 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 2812 | prompt processing progress, n_tokens = 2045, batch.n_tokens = 2048, progress = 0.887972
slot update_slots: id 3 | task 2812 | n_tokens = 2045, memory_seq_rm [2045, end)
slot update_slots: id 3 | task 2812 | prompt processing progress, n_tokens = 2239, batch.n_tokens = 197, progress = 0.972210
slot update_slots: id 3 | task 2812 | n_tokens = 2239, memory_seq_rm [2239, end)
slot update_slots: id 3 | task 2812 | prompt processing progress, n_tokens = 2303, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 3 | task 2812 | prompt done, n_tokens = 2303, batch.n_tokens = 67
slot update_slots: id 3 | task 2812 | created context checkpoint 3 of 8 (pos_min = 1599, pos_max = 2238, size = 15.008 MiB)
slot print_timing: id 2 | task 2809 |
prompt eval time = 100466.86 ms / 1097 tokens ( 91.58 ms per token, 10.92 tokens per second)
eval time = 429549.97 ms / 481 tokens ( 893.04 ms per token, 1.12 tokens per second)
total time = 530016.83 ms / 1578 tokens
slot release: id 2 | task 2809 | stop processing: n_tokens = 1577, truncated = 0
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
slot get_availabl: id 2 | task -1 | selected slot by LCP similarity, sim_best = 0.329 (> 0.100 thold), f_keep = 0.210
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 1577, total state size = 40.051 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.210, sim = 0.329
srv update: - cache state: 6 prompts, 384.521 MiB (limits: 8192.000 MiB, 131072 tokens, 248643 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv update: - prompt 0000023F7DEC66D0: 1286 tokens, checkpoints: 1, 39.231 MiB
srv update: - prompt 0000023FD00829A0: 1261 tokens, checkpoints: 2, 58.389 MiB
srv update: - prompt 0000023FD0091A00: 1577 tokens, checkpoints: 1, 43.030 MiB
srv get_availabl: prompt cache update took 611.19 ms
slot launch_slot_: id 2 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 2 | task 2813 | processing task
slot update_slots: id 2 | task 2813 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 1005
slot update_slots: id 2 | task 2813 | n_past = 331, slot.prompt.tokens.size() = 1577, seq_id = 2, pos_min = 1446, n_swa = 128
slot update_slots: id 2 | task 2813 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 2 | task 2813 | erased invalidated context checkpoint (pos_min = 906, pos_max = 1032, n_swa = 128, size = 2.978 MiB)
slot update_slots: id 2 | task 2813 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 2 | task 2813 | prompt processing progress, n_tokens = 941, batch.n_tokens = 944, progress = 0.936318
slot update_slots: id 2 | task 2813 | n_tokens = 941, memory_seq_rm [941, end)
slot update_slots: id 2 | task 2813 | prompt processing progress, n_tokens = 1005, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 2 | task 2813 | prompt done, n_tokens = 1005, batch.n_tokens = 67
slot update_slots: id 2 | task 2813 | created context checkpoint 1 of 8 (pos_min = 299, pos_max = 940, size = 15.055 MiB)
slot print_timing: id 1 | task 2810 |
prompt eval time = 30955.28 ms / 1118 tokens ( 27.69 ms per token, 36.12 tokens per second)
eval time = 341132.77 ms / 300 tokens ( 1137.11 ms per token, 0.88 tokens per second)
total time = 372088.05 ms / 1418 tokens
slot release: id 1 | task 2810 | stop processing: n_tokens = 1417, truncated = 0
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
slot get_availabl: id 1 | task -1 | selected slot by LCP similarity, sim_best = 0.328 (> 0.100 thold), f_keep = 0.234
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 1417, total state size = 37.073 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.234, sim = 0.328
srv update: - cache state: 7 prompts, 436.673 MiB (limits: 8192.000 MiB, 131072 tokens, 245531 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv update: - prompt 0000023F7DEC66D0: 1286 tokens, checkpoints: 1, 39.231 MiB
srv update: - prompt 0000023FD00829A0: 1261 tokens, checkpoints: 2, 58.389 MiB
srv update: - prompt 0000023FD0091A00: 1577 tokens, checkpoints: 1, 43.030 MiB
srv update: - prompt 0000023F7E06FD60: 1417 tokens, checkpoints: 1, 52.151 MiB
srv get_availabl: prompt cache update took 638.57 ms
slot launch_slot_: id 1 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 1 | task 2814 | processing task
slot update_slots: id 1 | task 2814 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 1010
slot update_slots: id 1 | task 2814 | n_past = 331, slot.prompt.tokens.size() = 1417, seq_id = 1, pos_min = 1253, n_swa = 128
slot update_slots: id 1 | task 2814 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 1 | task 2814 | erased invalidated context checkpoint (pos_min = 411, pos_max = 1053, n_swa = 128, size = 15.078 MiB)
slot update_slots: id 1 | task 2814 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 1 | task 2814 | prompt processing progress, n_tokens = 946, batch.n_tokens = 949, progress = 0.936634
slot update_slots: id 1 | task 2814 | n_tokens = 946, memory_seq_rm [946, end)
slot update_slots: id 1 | task 2814 | prompt processing progress, n_tokens = 1010, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 1 | task 2814 | prompt done, n_tokens = 1010, batch.n_tokens = 67
slot update_slots: id 1 | task 2814 | created context checkpoint 1 of 8 (pos_min = 303, pos_max = 945, size = 15.078 MiB)
slot print_timing: id 0 | task 2811 |
prompt eval time = 101880.49 ms / 3971 tokens ( 25.66 ms per token, 38.98 tokens per second)
eval time = 258165.89 ms / 277 tokens ( 932.01 ms per token, 1.07 tokens per second)
total time = 360046.39 ms / 4248 tokens
slot release: id 0 | task 2811 | stop processing: n_tokens = 4247, truncated = 0
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.230 (> 0.100 thold), f_keep = 0.078
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 4247, total state size = 103.879 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.078, sim = 0.230
srv update: - cache state: 8 prompts, 555.630 MiB (limits: 8192.000 MiB, 131072 tokens, 255580 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv update: - prompt 0000023F7DEC66D0: 1286 tokens, checkpoints: 1, 39.231 MiB
srv update: - prompt 0000023FD00829A0: 1261 tokens, checkpoints: 2, 58.389 MiB
srv update: - prompt 0000023FD0091A00: 1577 tokens, checkpoints: 1, 43.030 MiB
srv update: - prompt 0000023F7E06FD60: 1417 tokens, checkpoints: 1, 52.151 MiB
srv update: - prompt 0000023FD00806B0: 4247 tokens, checkpoints: 1, 118.957 MiB
srv get_availabl: prompt cache update took 775.48 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 2815 | processing task
slot update_slots: id 0 | task 2815 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 1441
slot update_slots: id 0 | task 2815 | n_past = 331, slot.prompt.tokens.size() = 4247, seq_id = 0, pos_min = 4064, n_swa = 128
slot update_slots: id 0 | task 2815 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 2815 | erased invalidated context checkpoint (pos_min = 3264, pos_max = 3906, n_swa = 128, size = 15.078 MiB)
slot update_slots: id 0 | task 2815 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 2815 | prompt processing progress, n_tokens = 1377, batch.n_tokens = 1380, progress = 0.955586
slot update_slots: id 0 | task 2815 | n_tokens = 1377, memory_seq_rm [1377, end)
slot update_slots: id 0 | task 2815 | prompt processing progress, n_tokens = 1441, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 0 | task 2815 | prompt done, n_tokens = 1441, batch.n_tokens = 67
slot update_slots: id 0 | task 2815 | created context checkpoint 1 of 8 (pos_min = 734, pos_max = 1376, size = 15.078 MiB)
slot print_timing: id 2 | task 2813 |
prompt eval time = 26964.44 ms / 1005 tokens ( 26.83 ms per token, 37.27 tokens per second)
eval time = 205077.87 ms / 262 tokens ( 782.74 ms per token, 1.28 tokens per second)
total time = 232042.31 ms / 1267 tokens
slot release: id 2 | task 2813 | stop processing: n_tokens = 1266, truncated = 0
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
slot get_availabl: id 2 | task -1 | selected slot by LCP similarity, sim_best = 0.108 (> 0.100 thold), f_keep = 0.261
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 1266, total state size = 32.712 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.261, sim = 0.108
srv update: - cache state: 9 prompts, 603.396 MiB (limits: 8192.000 MiB, 131072 tokens, 252536 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv update: - prompt 0000023F7DEC66D0: 1286 tokens, checkpoints: 1, 39.231 MiB
srv update: - prompt 0000023FD00829A0: 1261 tokens, checkpoints: 2, 58.389 MiB
srv update: - prompt 0000023FD0091A00: 1577 tokens, checkpoints: 1, 43.030 MiB
srv update: - prompt 0000023F7E06FD60: 1417 tokens, checkpoints: 1, 52.151 MiB
srv update: - prompt 0000023FD00806B0: 4247 tokens, checkpoints: 1, 118.957 MiB
srv update: - prompt 0000023FD0BD15E0: 1266 tokens, checkpoints: 1, 47.766 MiB
srv get_availabl: prompt cache update took 582.22 ms
slot launch_slot_: id 2 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 2 | task 2816 | processing task
slot update_slots: id 2 | task 2816 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 3072
slot update_slots: id 2 | task 2816 | n_past = 331, slot.prompt.tokens.size() = 1266, seq_id = 2, pos_min = 1137, n_swa = 128
slot update_slots: id 2 | task 2816 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 2 | task 2816 | erased invalidated context checkpoint (pos_min = 299, pos_max = 940, n_swa = 128, size = 15.055 MiB)
slot update_slots: id 2 | task 2816 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 2 | task 2816 | prompt processing progress, n_tokens = 2045, batch.n_tokens = 2048, progress = 0.665690
slot update_slots: id 2 | task 2816 | n_tokens = 2045, memory_seq_rm [2045, end)
slot update_slots: id 2 | task 2816 | prompt processing progress, n_tokens = 3008, batch.n_tokens = 966, progress = 0.979167
slot update_slots: id 2 | task 2816 | n_tokens = 3008, memory_seq_rm [3008, end)
slot update_slots: id 2 | task 2816 | prompt processing progress, n_tokens = 3072, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 2 | task 2816 | prompt done, n_tokens = 3072, batch.n_tokens = 67
slot update_slots: id 2 | task 2816 | created context checkpoint 1 of 8 (pos_min = 2365, pos_max = 3007, size = 15.078 MiB)
slot print_timing: id 3 | task 2812 |
prompt eval time = 58219.87 ms / 2303 tokens ( 25.28 ms per token, 39.56 tokens per second)
eval time = 400455.46 ms / 431 tokens ( 929.13 ms per token, 1.08 tokens per second)
total time = 458675.32 ms / 2734 tokens
slot release: id 3 | task 2812 | stop processing: n_tokens = 2733, truncated = 0
slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.380 (> 0.100 thold), f_keep = 0.121
srv get_availabl: updating prompt cache
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
srv prompt_save: - saving prompt with length 2733, total state size = 70.582 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.121, sim = 0.380
srv load: - found better prompt with f_keep = 0.263, sim = 0.383
srv update: - cache state: 9 prompts, 666.779 MiB (limits: 8192.000 MiB, 131072 tokens, 246553 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv update: - prompt 0000023F7DEC66D0: 1286 tokens, checkpoints: 1, 39.231 MiB
srv update: - prompt 0000023FD00829A0: 1261 tokens, checkpoints: 2, 58.389 MiB
srv update: - prompt 0000023FD0091A00: 1577 tokens, checkpoints: 1, 43.030 MiB
srv update: - prompt 0000023F7E06FD60: 1417 tokens, checkpoints: 1, 52.151 MiB
srv update: - prompt 0000023FD00806B0: 4247 tokens, checkpoints: 1, 118.957 MiB
srv update: - prompt 0000023FD08B5B90: 2733 tokens, checkpoints: 3, 111.149 MiB
srv get_availabl: prompt cache update took 995.10 ms
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 2817 | processing task
slot update_slots: id 3 | task 2817 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 870
slot update_slots: id 3 | task 2817 | n_past = 333, slot.prompt.tokens.size() = 1266, seq_id = 3, pos_min = 1137, n_swa = 128
slot update_slots: id 3 | task 2817 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 3 | task 2817 | erased invalidated context checkpoint (pos_min = 299, pos_max = 940, n_swa = 128, size = 15.055 MiB)
slot update_slots: id 3 | task 2817 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 2817 | prompt processing progress, n_tokens = 806, batch.n_tokens = 809, progress = 0.926437
slot update_slots: id 3 | task 2817 | n_tokens = 806, memory_seq_rm [806, end)
slot update_slots: id 3 | task 2817 | prompt processing progress, n_tokens = 870, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 3 | task 2817 | prompt done, n_tokens = 870, batch.n_tokens = 67
slot update_slots: id 3 | task 2817 | created context checkpoint 1 of 8 (pos_min = 166, pos_max = 805, size = 15.008 MiB)
slot print_timing: id 1 | task 2814 |
prompt eval time = 26832.53 ms / 1010 tokens ( 26.57 ms per token, 37.64 tokens per second)
eval time = 360577.32 ms / 410 tokens ( 879.46 ms per token, 1.14 tokens per second)
total time = 387409.84 ms / 1420 tokens
slot release: id 1 | task 2814 | stop processing: n_tokens = 1419, truncated = 0
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
slot get_availabl: id 1 | task -1 | selected slot by LCP similarity, sim_best = 0.284 (> 0.100 thold), f_keep = 0.233
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 1419, total state size = 37.941 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.233, sim = 0.284
srv update: - cache state: 10 prompts, 719.798 MiB (limits: 8192.000 MiB, 131072 tokens, 244542 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv update: - prompt 0000023F7DEC66D0: 1286 tokens, checkpoints: 1, 39.231 MiB
srv update: - prompt 0000023FD00829A0: 1261 tokens, checkpoints: 2, 58.389 MiB
srv update: - prompt 0000023FD0091A00: 1577 tokens, checkpoints: 1, 43.030 MiB
srv update: - prompt 0000023F7E06FD60: 1417 tokens, checkpoints: 1, 52.151 MiB
srv update: - prompt 0000023FD00806B0: 4247 tokens, checkpoints: 1, 118.957 MiB
srv update: - prompt 0000023FD08B5B90: 2733 tokens, checkpoints: 3, 111.149 MiB
srv update: - prompt 0000023FD02D51A0: 1419 tokens, checkpoints: 1, 53.019 MiB
srv get_availabl: prompt cache update took 693.50 ms
slot launch_slot_: id 1 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 1 | task 2818 | processing task
slot update_slots: id 1 | task 2818 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 1167
slot update_slots: id 1 | task 2818 | n_past = 331, slot.prompt.tokens.size() = 1419, seq_id = 1, pos_min = 1220, n_swa = 128
slot update_slots: id 1 | task 2818 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 1 | task 2818 | erased invalidated context checkpoint (pos_min = 303, pos_max = 945, n_swa = 128, size = 15.078 MiB)
slot update_slots: id 1 | task 2818 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 1 | task 2818 | prompt processing progress, n_tokens = 1103, batch.n_tokens = 1106, progress = 0.945159
slot update_slots: id 1 | task 2818 | n_tokens = 1103, memory_seq_rm [1103, end)
slot update_slots: id 1 | task 2818 | prompt processing progress, n_tokens = 1167, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 1 | task 2818 | prompt done, n_tokens = 1167, batch.n_tokens = 67
slot update_slots: id 1 | task 2818 | created context checkpoint 1 of 8 (pos_min = 460, pos_max = 1102, size = 15.078 MiB)
slot print_timing: id 0 | task 2815 |
prompt eval time = 39020.95 ms / 1441 tokens ( 27.08 ms per token, 36.93 tokens per second)
eval time = 420117.91 ms / 546 tokens ( 769.45 ms per token, 1.30 tokens per second)
total time = 459138.86 ms / 1987 tokens
slot release: id 0 | task 2815 | stop processing: n_tokens = 1986, truncated = 0
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 1778651419
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 1986, total state size = 52.339 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.167, sim = 0.058
srv update: - cache state: 11 prompts, 787.215 MiB (limits: 8192.000 MiB, 131072 tokens, 244267 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv update: - prompt 0000023F7DEC66D0: 1286 tokens, checkpoints: 1, 39.231 MiB
srv update: - prompt 0000023FD00829A0: 1261 tokens, checkpoints: 2, 58.389 MiB
srv update: - prompt 0000023FD0091A00: 1577 tokens, checkpoints: 1, 43.030 MiB
srv update: - prompt 0000023F7E06FD60: 1417 tokens, checkpoints: 1, 52.151 MiB
srv update: - prompt 0000023FD00806B0: 4247 tokens, checkpoints: 1, 118.957 MiB
srv update: - prompt 0000023FD08B5B90: 2733 tokens, checkpoints: 3, 111.149 MiB
srv update: - prompt 0000023FD02D51A0: 1419 tokens, checkpoints: 1, 53.019 MiB
srv update: - prompt 0000023F7E1CBE30: 1986 tokens, checkpoints: 1, 67.417 MiB
srv get_availabl: prompt cache update took 904.06 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 2819 | processing task
slot update_slots: id 0 | task 2819 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 5752
slot update_slots: id 0 | task 2819 | n_past = 331, slot.prompt.tokens.size() = 1986, seq_id = 0, pos_min = 1740, n_swa = 128
slot update_slots: id 0 | task 2819 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 2819 | erased invalidated context checkpoint (pos_min = 734, pos_max = 1376, n_swa = 128, size = 15.078 MiB)
slot update_slots: id 0 | task 2819 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 2819 | prompt processing progress, n_tokens = 2045, batch.n_tokens = 2048, progress = 0.355529
slot update_slots: id 0 | task 2819 | n_tokens = 2045, memory_seq_rm [2045, end)
slot update_slots: id 0 | task 2819 | prompt processing progress, n_tokens = 4090, batch.n_tokens = 2048, progress = 0.711057
slot update_slots: id 0 | task 2819 | n_tokens = 4090, memory_seq_rm [4090, end)
slot update_slots: id 0 | task 2819 | prompt processing progress, n_tokens = 5688, batch.n_tokens = 1601, progress = 0.988873
slot update_slots: id 0 | task 2819 | n_tokens = 5688, memory_seq_rm [5688, end)
slot update_slots: id 0 | task 2819 | prompt processing progress, n_tokens = 5752, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 0 | task 2819 | prompt done, n_tokens = 5752, batch.n_tokens = 67
slot update_slots: id 0 | task 2819 | created context checkpoint 1 of 8 (pos_min = 5045, pos_max = 5687, size = 15.078 MiB)
slot print_timing: id 3 | task 2817 |
prompt eval time = 25345.06 ms / 870 tokens ( 29.13 ms per token, 34.33 tokens per second)
eval time = 430894.74 ms / 468 tokens ( 920.72 ms per token, 1.09 tokens per second)
total time = 456239.80 ms / 1338 tokens
slot release: id 3 | task 2817 | stop processing: n_tokens = 1337, truncated = 0
slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.102 (> 0.100 thold), f_keep = 0.248
srv get_availabl: updating prompt cache
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
srv prompt_save: - saving prompt with length 1337, total state size = 35.831 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.248, sim = 0.102
srv update: - cache state: 12 prompts, 838.053 MiB (limits: 8192.000 MiB, 131072 tokens, 242518 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv update: - prompt 0000023F7DEC66D0: 1286 tokens, checkpoints: 1, 39.231 MiB
srv update: - prompt 0000023FD00829A0: 1261 tokens, checkpoints: 2, 58.389 MiB
srv update: - prompt 0000023FD0091A00: 1577 tokens, checkpoints: 1, 43.030 MiB
srv update: - prompt 0000023F7E06FD60: 1417 tokens, checkpoints: 1, 52.151 MiB
srv update: - prompt 0000023FD00806B0: 4247 tokens, checkpoints: 1, 118.957 MiB
srv update: - prompt 0000023FD08B5B90: 2733 tokens, checkpoints: 3, 111.149 MiB
srv update: - prompt 0000023FD02D51A0: 1419 tokens, checkpoints: 1, 53.019 MiB
srv update: - prompt 0000023F7E1CBE30: 1986 tokens, checkpoints: 1, 67.417 MiB
srv update: - prompt 0000023F7DCAE6E0: 1337 tokens, checkpoints: 1, 50.838 MiB
srv get_availabl: prompt cache update took 818.14 ms
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 2820 | processing task
slot update_slots: id 3 | task 2820 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 3255
slot update_slots: id 3 | task 2820 | n_past = 331, slot.prompt.tokens.size() = 1337, seq_id = 3, pos_min = 1146, n_swa = 128
state_read_meta: failed to find available cells in kv cache
state_seq_set_data: error loading state: failed to restore kv cache
slot update_slots: id 3 | task 2820 | failed to restore context checkpoint (pos_min = 166, pos_max = 805, size = 15.008 MiB)
slot update_slots: id 3 | task 2820 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 3 | task 2820 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 2820 | prompt processing progress, n_tokens = 2045, batch.n_tokens = 2048, progress = 0.628264
slot update_slots: id 3 | task 2820 | n_tokens = 2045, memory_seq_rm [2045, end)
slot update_slots: id 3 | task 2820 | prompt processing progress, n_tokens = 3191, batch.n_tokens = 1149, progress = 0.980338
slot update_slots: id 3 | task 2820 | n_tokens = 3191, memory_seq_rm [3191, end)
slot update_slots: id 3 | task 2820 | prompt processing progress, n_tokens = 3255, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 3 | task 2820 | prompt done, n_tokens = 3255, batch.n_tokens = 67
slot update_slots: id 3 | task 2820 | created context checkpoint 2 of 8 (pos_min = 2548, pos_max = 3190, size = 15.078 MiB)
slot print_timing: id 1 | task 2818 |
prompt eval time = 33308.17 ms / 1167 tokens ( 28.54 ms per token, 35.04 tokens per second)
eval time = 422492.45 ms / 351 tokens ( 1203.68 ms per token, 0.83 tokens per second)
total time = 455800.62 ms / 1518 tokens
slot release: id 1 | task 2818 | stop processing: n_tokens = 1517, truncated = 0
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
slot get_availabl: id 1 | task -1 | selected slot by LRU, t_last = 2076151011
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 1517, total state size = 38.832 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.218, sim = 0.052
srv update: - cache state: 13 prompts, 891.963 MiB (limits: 8192.000 MiB, 131072 tokens, 241793 est)
srv update: - prompt 0000023F7DA5B150: 2436 tokens, checkpoints: 1, 70.582 MiB
srv update: - prompt 0000023F7E263940: 3972 tokens, checkpoints: 1, 135.207 MiB
srv update: - prompt 0000023FD0091AE0: 1139 tokens, checkpoints: 1, 38.082 MiB
srv update: - prompt 0000023F7DEC66D0: 1286 tokens, checkpoints: 1, 39.231 MiB
srv update: - prompt 0000023FD00829A0: 1261 tokens, checkpoints: 2, 58.389 MiB
srv update: - prompt 0000023FD0091A00: 1577 tokens, checkpoints: 1, 43.030 MiB
srv update: - prompt 0000023F7E06FD60: 1417 tokens, checkpoints: 1, 52.151 MiB
srv update: - prompt 0000023FD00806B0: 4247 tokens, checkpoints: 1, 118.957 MiB
srv update: - prompt 0000023FD08B5B90: 2733 tokens, checkpoints: 3, 111.149 MiB
srv update: - prompt 0000023FD02D51A0: 1419 tokens, checkpoints: 1, 53.019 MiB
srv update: - prompt 0000023F7E1CBE30: 1986 tokens, checkpoints: 1, 67.417 MiB
srv update: - prompt 0000023F7DCAE6E0: 1337 tokens, checkpoints: 1, 50.838 MiB
srv update: - prompt 0000023FD0B76580: 1517 tokens, checkpoints: 1, 53.910 MiB
srv get_availabl: prompt cache update took 728.26 ms
slot launch_slot_: id 1 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 1 | task 2821 | processing task
slot update_slots: id 1 | task 2821 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 6325
slot update_slots: id 1 | task 2821 | n_past = 331, slot.prompt.tokens.size() = 1517, seq_id = 1, pos_min = 1378, n_swa = 128
slot update_slots: id 1 | task 2821 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 1 | task 2821 | erased invalidated context checkpoint (pos_min = 460, pos_max = 1102, n_swa = 128, size = 15.078 MiB)
slot update_slots: id 1 | task 2821 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 1 | task 2821 | prompt processing progress, n_tokens = 2045, batch.n_tokens = 2048, progress = 0.323320
slot update_slots: id 1 | task 2821 | n_tokens = 2045, memory_seq_rm [2045, end)
slot update_slots: id 1 | task 2821 | prompt processing progress, n_tokens = 4090, batch.n_tokens = 2048, progress = 0.646640
slot update_slots: id 1 | task 2821 | n_tokens = 4090, memory_seq_rm [4090, end)
slot update_slots: id 1 | task 2821 | prompt processing progress, n_tokens = 6135, batch.n_tokens = 2048, progress = 0.969960
slot update_slots: id 1 | task 2821 | n_tokens = 6135, memory_seq_rm [6135, end)
slot update_slots: id 1 | task 2821 | prompt processing progress, n_tokens = 6261, batch.n_tokens = 129, progress = 0.989881
slot update_slots: id 1 | task 2821 | n_tokens = 6261, memory_seq_rm [6261, end)
slot update_slots: id 1 | task 2821 | prompt processing progress, n_tokens = 6325, batch.n_tokens = 67, progress = 1.000000
slot update_slots: id 1 | task 2821 | prompt done, n_tokens = 6325, batch.n_tokens = 67
slot update_slots: id 1 | task 2821 | created context checkpoint 1 of 8 (pos_min = 5621, pos_max = 6260, size = 15.008 MiB)
slot print_timing: id 2 | task 2816 |
prompt eval time = 80068.15 ms / 3072 tokens ( 26.06 ms per token, 38.37 tokens per second)
eval time = 866753.46 ms / 755 tokens ( 1148.02 ms per token, 0.87 tokens per second)
total time = 946821.62 ms / 3827 tokens
slot release: id 2 | task 2816 | stop processing: n_tokens = 3826, truncated = 0
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
slot get_availabl: id 2 | task -1 | selected slot by LCP similarity, sim_best = 0.371 (> 0.100 thold), f_keep = 0.087
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 3826, total state size = 93.632 MiB
srv params_from_: Chat format: GPT-OSS
srv load: - looking for better prompt, base f_keep = 0.087, sim = 0.371
srv load: - found better prompt with f_keep = 0.264, sim = 0.420
state_read_meta: failed to find available cells in kv cache
state_seq_set_data: error loading state: failed to restore kv cache
srv load: failed to restore state with size 39783992
D:/a/llama.cpp/llama.cpp/tools/server/server.cpp:3843: pos_min == -1, but n_past > 0 - should not happen: https://github.com/ggml-org/llama.cpp/pull/13833#discussion_r2116181237
slot prompt_load: id 2 | task -1 | failed to load prompt from cache
srv update: - cache state: 14 prompts, 1000.673 MiB (limits: 8192.000 MiB, 131072 tokens, 246847 est)