Commit fd9f92b

llama : update llama_timings.n_p_eval setting (ggml-org#7160)
This commit changes the value assigned to llama_timings.n_p_eval when ctx->n_p_eval is 0 to be 0 instead of 1, which is the current value.

The motivation for this change is that if session caching is enabled, for example using the `--prompt-cache main-session.txt` command line argument for the main example, and the same prompt is used on subsequent runs, the prompt tokens will not actually be passed to llama_decode, and n_p_eval will not be updated by llama_synchronize. But the value of n_p_eval will be set to 1 by llama_get_timings because ctx->n_p_eval will be 0. This could be interpreted as 1 token having been evaluated for the prompt, which could be misleading for applications using this value.

Signed-off-by: Daniel Bevenius <[email protected]>
1 parent 2284216 commit fd9f92b
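
For context, here is a minimal sketch (not part of the commit) of how an application might interpret n_p_eval after this change. It assumes `ctx` is a valid llama_context * for which a generation has already completed, and that llama_timings exposes the t_p_eval_ms field alongside the n_p_eval field touched by this diff.

// Minimal sketch (not from this commit): interpreting n_p_eval after the change.
// Assumes `ctx` is a valid llama_context * and generation has completed.
#include <cstdio>

#include "llama.h"

static void print_prompt_eval_stats(struct llama_context * ctx) {
    const struct llama_timings timings = llama_get_timings(ctx);

    if (timings.n_p_eval == 0) {
        // With this change, a prompt restored entirely from a session cache
        // (e.g. --prompt-cache main-session.txt) is reported as 0 evaluated
        // prompt tokens rather than 1.
        printf("prompt fully restored from the session cache\n");
    } else {
        printf("evaluated %d prompt tokens in %.2f ms (%.2f ms/token)\n",
               timings.n_p_eval, timings.t_p_eval_ms,
               timings.t_p_eval_ms / timings.n_p_eval);
    }
}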

File tree

1 file changed: +1 -1 lines changed

llama.cpp

Lines changed: 1 addition & 1 deletion
@@ -17879,7 +17879,7 @@ struct llama_timings llama_get_timings(struct llama_context * ctx) {
         /*.t_eval_ms =*/ 1e-3 * ctx->t_eval_us,

         /*.n_sample =*/ std::max(1, ctx->n_sample),
-        /*.n_p_eval =*/ std::max(1, ctx->n_p_eval),
+        /*.n_p_eval =*/ std::max(0, ctx->n_p_eval),
         /*.n_eval =*/ std::max(1, ctx->n_eval),
     };
