Commit 3bc10cb

server : fix temperature + disable some tests (ggml-org#7409)

* server : fix temperature
* server : disable tests relying on parallel determinism
* ci : change server Debug -> RelWithDebInfo
1 parent 6bf9b66 commit 3bc10cb

2 files changed, +9 -15 lines changed


.github/workflows/server.yml

+1 -6
@@ -33,13 +33,10 @@ jobs:
     strategy:
       matrix:
         sanitizer: [ADDRESS, THREAD, UNDEFINED]
-        build_type: [Debug]
+        build_type: [RelWithDebInfo]
         include:
           - build_type: Release
             sanitizer: ""
-          - build_type: Debug
-            sanitizer: THREAD
-            disabled_on_pr: true
       fail-fast: false # While -DLLAMA_SANITIZE_THREAD=ON is broken
 
     steps:
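GitHub Actions expands `strategy.matrix` into one job per combination of `sanitizer` and `build_type`, then applies `include`. The Python sketch below is a simplified model of the documented `include` rule, not GitHub's implementation: an entry is merged into every original combination whose matrix values it would not overwrite, and otherwise becomes a new combination. It ignores `exclude`, and `expand_matrix` is a hypothetical helper.

```python
from itertools import product


def expand_matrix(axes, include=()):
    """Hypothetical, simplified model of GitHub Actions matrix expansion (no `exclude`)."""
    keys = list(axes)
    # Cartesian product of the original matrix axes.
    combos = [dict(zip(keys, values)) for values in product(*(axes[k] for k in keys))]
    # Snapshot of the original values; include entries may not overwrite these.
    originals = [dict(c) for c in combos]
    for entry in include:
        matched = False
        # Only original combinations are candidates (zip stops before appended ones).
        for combo, original in zip(combos, originals):
            if all(original.get(k, v) == v for k, v in entry.items()):
                combo.update(entry)
                matched = True
        if not matched:
            # The entry fits no original combination, so it becomes a new job.
            combos.append(dict(entry))
    return combos


# Matrix before this commit.
before = expand_matrix(
    {"sanitizer": ["ADDRESS", "THREAD", "UNDEFINED"], "build_type": ["Debug"]},
    include=[
        {"build_type": "Release", "sanitizer": ""},
        {"build_type": "Debug", "sanitizer": "THREAD", "disabled_on_pr": True},
    ],
)

# Matrix after this commit.
after = expand_matrix(
    {"sanitizer": ["ADDRESS", "THREAD", "UNDEFINED"], "build_type": ["RelWithDebInfo"]},
    include=[{"build_type": "Release", "sanitizer": ""}],
)

for job in after:
    print(job)
```

Under this model, the matrix after the commit produces four jobs (the three sanitizer builds as `RelWithDebInfo` plus an unsanitized `Release` build), while the matrix before it attached `disabled_on_pr: true` to the THREAD/Debug job only.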
@@ -103,10 +100,8 @@ jobs:
               -DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON ;
           cmake --build build --config ${{ matrix.build_type }} -j $(nproc) --target server
 
-
       - name: Tests
         id: server_integration_tests
-        if: ${{ !matrix.disabled_on_pr || !github.event.pull_request }}
         run: |
           cd examples/server/tests
           PORT=8888 ./tests.sh
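The deleted `if:` line skipped the integration tests only when a job had `disabled_on_pr` set and the run was triggered by a pull request. A minimal Python model of that expression, assuming that a missing `matrix.disabled_on_pr` and a missing `github.event.pull_request` both evaluate as falsy (modeled here as `None`); `should_run_tests` is a hypothetical name:

```python
def should_run_tests(disabled_on_pr, pull_request):
    """Hypothetical model of `${{ !matrix.disabled_on_pr || !github.event.pull_request }}`.

    `disabled_on_pr` is None when the matrix entry does not set it, and
    `pull_request` is None for non-PR events (both assumptions of this sketch).
    """
    return (not disabled_on_pr) or (not pull_request)


# Hypothetical stand-in for a pull_request event payload.
pr_event = {"number": 1}

print(should_run_tests(True, pr_event))  # False: THREAD/Debug job on a PR is skipped
print(should_run_tests(True, None))      # True: same job on a push still runs
print(should_run_tests(None, pr_event))  # True: any other job runs on a PR
```

With both the `disabled_on_pr` include entry and the `if:` line removed, the Tests step is no longer gated and runs for every matrix job, including on pull requests.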

examples/server/tests/features/results.feature

+8 -9
@@ -13,7 +13,7 @@ Feature: Results
 
 Scenario Outline: consistent results with same seed
 Given <n_slots> slots
-And 0.0 temperature
+And 1.0 temperature
 Then the server is starting
 Then the server is healthy
 
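One plausible reading of the `0.0` to `1.0` change: if a temperature of 0.0 is handled as greedy (argmax) decoding, the seed never influences the output, so "consistent results with same seed" would pass even if seeding were broken. The toy Python sketch below illustrates this with a hypothetical `sample` function (a temperature-scaled softmax drawn with `random.Random(seed)`), not the llama.cpp sampler; the logits and `generate` helper are made up.

```python
import math
import random


def sample(logits, temperature, rng):
    """Toy sampler (hypothetical): argmax at temperature 0, else draw from softmax(logits / T)."""
    if temperature == 0.0:
        # Greedy decoding: the random generator is never consulted.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    # Subtracting the max keeps exp() stable; choices() normalizes the weights.
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(seed, temperature, steps=20):
    """Hypothetical stand-in for a completion: fixed logits, seeded RNG."""
    rng = random.Random(seed)
    logits = [1.0, 0.9, 0.8, 0.7]
    return [sample(logits, temperature, rng) for _ in range(steps)]


# Temperature 0.0: the output does not depend on the seed at all.
print(generate(seed=42, temperature=0.0) == generate(seed=7, temperature=0.0))  # True
# Temperature 1.0: the seed drives sampling, and the same seed reproduces the output.
print(generate(seed=42, temperature=1.0) == generate(seed=42, temperature=1.0))  # True
```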
@@ -27,7 +27,8 @@ Feature: Results
 Examples:
 | n_slots |
 | 1 |
-| 2 |
+# FIXME: unified KV cache nondeterminism
+# | 2 |
 
 Scenario Outline: different results with different seed
 Given <n_slots> slots
@@ -73,14 +74,13 @@ Feature: Results
 Examples:
 | n_parallel | temp |
 | 1 | 0.0 |
-| 2 | 0.0 |
-| 4 | 0.0 |
 | 1 | 1.0 |
-# FIXME: These tests fail on master.
-# Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
+# FIXME: unified KV cache nondeterminism
 # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
 # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
 # and https://github.com/ggerganov/llama.cpp/pull/7347 .
+# | 2 | 0.0 |
+# | 4 | 0.0 |
 # | 2 | 1.0 |
 # | 4 | 1.0 |
 
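The FIXME comments attribute the disabled parallel cases to unified KV cache nondeterminism, and the removed comment also named SIMD nondeterminism. One general mechanism behind this class of failure (the linked issues discuss the llama.cpp specifics) is that floating-point addition is not associative: if batching several parallel requests changes the order in which a reduction is accumulated, the result can differ in the last bits. A minimal Python illustration with IEEE 754 doubles; `accumulate` is a hypothetical helper:

```python
def accumulate(values):
    """Hypothetical helper: left-to-right float accumulation, like a simple scalar loop."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers, summed in two different orders.
order_a = [1e16, 1.0, -1e16]
order_b = [1e16, -1e16, 1.0]

print(accumulate(order_a))  # 0.0: the 1.0 is lost to rounding next to 1e16
print(accumulate(order_b))  # 1.0
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

In a model, last-bit differences in logits are usually harmless, but they can change a sampled token, and every later token is then conditioned on a different context.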
@@ -108,12 +108,11 @@ Feature: Results
 Examples:
 | n_slots | n_kv | n_predict | n_parallel |
 | 4 | 1024 | 1 | 1 |
-| 4 | 1024 | 1 | 4 |
-# FIXME: These tests fail on master.
-# Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
+# FIXME: unified KV cache nondeterminism
 # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
 # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
 # and https://github.com/ggerganov/llama.cpp/pull/7347 .
+# | 4 | 1024 | 1 | 4 |
 # | 4 | 1024 | 100 | 1 |
 # This test still fails even the above patches; the first token probabilities are already different.
 # | 4 | 1024 | 100 | 4 |
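The comment on the last disabled row says the first-token probabilities already differ. Even with an identical seed, and therefore an identical random draw, a sampler that maps the draw onto the cumulative distribution (inverse-CDF sampling, assumed here for illustration rather than taken from llama.cpp) picks a different token whenever the draw lands between the two runs' bucket boundaries. The probabilities below are made up; `sample_with_draw` is a hypothetical helper:

```python
import bisect
import itertools


def sample_with_draw(probs, u):
    """Hypothetical inverse-CDF sampler: first token whose cumulative probability exceeds u."""
    cumulative = list(itertools.accumulate(probs))
    index = bisect.bisect_right(cumulative, u)
    # Guard against rounding that leaves the final cumulative value just below u.
    return min(index, len(probs) - 1)


u = 0.50000005  # identical draw in both runs, as if produced by the same seed

run_a = [0.5, 0.5]              # first-token probabilities in one run (made up)
run_b = [0.5000001, 0.4999999]  # almost the same probabilities in another run (made up)

print(sample_with_draw(run_a, u))  # 1
print(sample_with_draw(run_b, u))  # 0
```

With `n_predict` set to 100, one such flip early on changes the context for all remaining tokens, so the two completions can diverge completely.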
