@@ -13,7 +13,7 @@ Feature: Results
 
   Scenario Outline: consistent results with same seed
     Given <n_slots> slots
-    And 0.0 temperature
+    And 1.0 temperature
     Then the server is starting
     Then the server is healthy
 
@@ -27,7 +27,8 @@ Feature: Results
     Examples:
       | n_slots |
       | 1       |
-      | 2       |
+      # FIXME: unified KV cache nondeterminism
+      # | 2       |
 
   Scenario Outline: different results with different seed
     Given <n_slots> slots
@@ -73,14 +74,13 @@ Feature: Results
     Examples:
       | n_parallel | temp |
       | 1          | 0.0  |
-      | 2          | 0.0  |
-      | 4          | 0.0  |
       | 1          | 1.0  |
-      # FIXME: These tests fail on master.
-      # Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
+      # FIXME: unified KV cache nondeterminism
       # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
       # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
       # and https://github.com/ggerganov/llama.cpp/pull/7347 .
+      # | 2          | 0.0  |
+      # | 4          | 0.0  |
       # | 2          | 1.0  |
       # | 4          | 1.0  |
 
@@ -108,12 +108,11 @@ Feature: Results
     Examples:
       | n_slots | n_kv | n_predict | n_parallel |
       | 4       | 1024 | 1         | 1          |
-      | 4       | 1024 | 1         | 4          |
-      # FIXME: These tests fail on master.
-      # Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
+      # FIXME: unified KV cache nondeterminism
       # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
       # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
       # and https://github.com/ggerganov/llama.cpp/pull/7347 .
+      # | 4       | 1024 | 1         | 4          |
       # | 4       | 1024 | 100       | 1          |
       # This test still fails even the above patches; the first token probabilities are already different.
       # | 4       | 1024 | 100       | 4          |