BatchPrefillWithPagedKVCacheWrapper.plan() got an unexpected keyword argument 'head_dim' #3165

Open
ruckc opened this issue Apr 11, 2025 · 0 comments · May be fixed by #3166

ruckc commented Apr 11, 2025

Given this code in text-generation-inference:

state.plan(
qo_indptr=cu_seqlens,
paged_kv_indptr=indptr,
paged_kv_indices=block_tables,
paged_kv_last_page_len=last_page_len,
num_qo_heads=num_heads,
num_kv_heads=num_kv_heads,
head_dim=head_size,
kv_data_type=kv_dtype,
q_data_type=q_dtype,
page_size=page_size,
)

and this code from flashinfer:

https://github.com/flashinfer-ai/flashinfer/blob/55576c626421b5ee7e7ebe74afd26465c8ae863f/flashinfer/prefill.py#L1164-L1188

    def plan(
        self,
        qo_indptr: torch.Tensor,
        paged_kv_indptr: torch.Tensor,
        paged_kv_indices: torch.Tensor,
        paged_kv_last_page_len: torch.Tensor,
        num_qo_heads: int,
        num_kv_heads: int,
        head_dim_qk: int,
        page_size: int,
        head_dim_vo: Optional[int] = None,
        custom_mask: Optional[torch.Tensor] = None,
        packed_custom_mask: Optional[torch.Tensor] = None,
        causal: bool = False,
        pos_encoding_mode: str = "NONE",
        use_fp16_qk_reduction: bool = False,
        sm_scale: Optional[float] = None,
        window_left: int = -1,
        logits_soft_cap: Optional[float] = None,
        rope_scale: Optional[float] = None,
        rope_theta: Optional[float] = None,
        q_data_type: Union[str, torch.dtype] = "float16",
        kv_data_type: Optional[Union[str, torch.dtype]] = None,
        non_blocking: bool = True,
    ) -> None:

I'm getting the following error during warmup:

2025-04-11T13:41:22.164186Z ERROR warmup{max_input_length=None max_prefill_tokens=4096 max_total_tokens=None max_batch_size=None}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: BatchPrefillWithPagedKVCacheWrapper.plan() got an unexpected keyword argument 'head_dim'
2025-04-11T13:41:22.164259Z ERROR text_generation_launcher: Method Warmup encountered an error.
Traceback (most recent call last):
  File "/app/venv/bin/text-generation-server", line 10, in <module>
    sys.exit(app())
  File "/app/venv/lib/python3.12/site-packages/typer/main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
  File "/app/venv/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/app/venv/lib/python3.12/site-packages/typer/core.py", line 743, in main
    return _main(
  File "/app/venv/lib/python3.12/site-packages/typer/core.py", line 198, in _main
    rv = self.invoke(ctx)
  File "/app/venv/lib/python3.12/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/app/venv/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/app/venv/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/app/venv/lib/python3.12/site-packages/typer/main.py", line 698, in wrapper
    return callback(**use_params)
  File "/app/venv/lib/python3.12/site-packages/text_generation_server/cli.py", line 119, in serve
    server.serve(
  File "/app/venv/lib/python3.12/site-packages/text_generation_server/server.py", line 315, in serve
    asyncio.run(
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/usr/lib/python3.12/asyncio/base_events.py", line 674, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.12/asyncio/base_events.py", line 641, in run_forever
    self._run_once()
  File "/usr/lib/python3.12/asyncio/base_events.py", line 1987, in _run_once
    handle._run()
  File "/usr/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/app/venv/lib/python3.12/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
> File "/app/venv/lib/python3.12/site-packages/text_generation_server/interceptor.py", line 24, in intercept
    return await response
  File "/app/venv/lib/python3.12/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
    raise error
  File "/app/venv/lib/python3.12/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/app/venv/lib/python3.12/site-packages/text_generation_server/server.py", line 144, in Warmup
    self.model.warmup(batch, max_input_tokens, max_total_tokens)
  File "/app/venv/lib/python3.12/site-packages/text_generation_server/models/flash_causal_lm.py", line 1548, in warmup
    _, _batch, _ = self.generate_token(batch)
  File "/usr/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
  File "/app/venv/lib/python3.12/site-packages/text_generation_server/models/flash_causal_lm.py", line 1928, in generate_token
    out, speculative_logits = self.forward(batch, adapter_data)
  File "/app/venv/lib/python3.12/site-packages/text_generation_server/models/flash_causal_lm.py", line 1810, in forward
    with self._forward_context(
  File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
  File "/app/venv/lib/python3.12/site-packages/text_generation_server/layers/attention/flashinfer.py", line 86, in use_prefill_with_paged_kv_state
    state.plan(
TypeError: BatchPrefillWithPagedKVCacheWrapper.plan() got an unexpected keyword argument 'head_dim'
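
The root cause looks like a parameter rename in flashinfer: the plan() signature above takes head_dim_qk (plus an optional head_dim_vo) where the text-generation-inference call site still passes head_dim. A minimal sketch of an updated call, assuming head_size should be used for both the QK and VO head dimensions:

state.plan(
    qo_indptr=cu_seqlens,
    paged_kv_indptr=indptr,
    paged_kv_indices=block_tables,
    paged_kv_last_page_len=last_page_len,
    num_qo_heads=num_heads,
    num_kv_heads=num_kv_heads,
    head_dim_qk=head_size,  # renamed from head_dim
    head_dim_vo=head_size,  # assumption: VO head dim equals QK head dim here
    kv_data_type=kv_dtype,
    q_data_type=q_dtype,
    page_size=page_size,
)

If the call site also has to keep working against older flashinfer releases that still accept head_dim, one option is to choose the keyword at runtime by inspecting the signature (head_dim_kwargs is a hypothetical helper, not part of either codebase):

import inspect

def head_dim_kwargs(plan_fn, head_size):
    # Hypothetical helper: pick whichever keyword the installed
    # flashinfer version accepts ('head_dim_qk' in newer releases,
    # 'head_dim' in older ones).
    params = inspect.signature(plan_fn).parameters
    if "head_dim_qk" in params:
        return {"head_dim_qk": head_size}
    return {"head_dim": head_size}

# usage: state.plan(qo_indptr=cu_seqlens, ..., **head_dim_kwargs(state.plan, head_size))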
ruckc linked pull request #3166 on Apr 11, 2025 that will close this issue.