"RuntimeError: Cannot find compilation output, compilation failed" error message appears, and no output is generated during the compilation process #1

Open
YosAwed opened this issue Jan 30, 2025 · 3 comments

Comments

@YosAwed

YosAwed commented Jan 30, 2025

When I ran this on Windows, the initial screen came up after uv run app.py, but no matter which question I ask, the following error occurs and no result is saved.

[2025-01-30 21:46:38] INFO auto_config.py:70: Found model configuration: model\mlc-chat-config.json
[2025-01-30 21:46:38] INFO auto_target.py:91: Detecting target device: vulkan:0
[2025-01-30 21:46:38] INFO auto_target.py:93: Found target: {"thread_warp_size": runtime.BoxInt(1), "supports_float32": runtime.BoxBool(true), "supports_int16": runtime.BoxBool(true), "max_threads_per_block": runtime.BoxInt(1024), "supports_storage_buffer_storage_class": runtime.BoxBool(true), "supports_int8": runtime.BoxBool(true), "supports_8bit_buffer": runtime.BoxBool(true), "supports_int64": runtime.BoxBool(true), "max_num_threads": runtime.BoxInt(256), "kind": "vulkan", "tag": "", "max_shared_memory_per_block": runtime.BoxInt(49152), "supports_16bit_buffer": runtime.BoxBool(true), "supports_int32": runtime.BoxBool(true), "keys": ["vulkan", "gpu"], "supports_float16": runtime.BoxBool(true)}
[2025-01-30 21:46:38] INFO auto_target.py:110: Found host LLVM triple: x86_64-pc-windows-msvc
[2025-01-30 21:46:38] INFO auto_target.py:111: Found host LLVM CPU: znver3
[2025-01-30 21:46:38] INFO auto_config.py:154: Found model type: qwen2. Use `--model-type` to override.
Compiling with arguments:
  --config          QWen2Config(hidden_act='silu', hidden_size=1536, intermediate_size=8960, num_attention_heads=12, num_hidden_layers=28, num_key_value_heads=2, rms_norm_eps=1e-06, rope_theta=1000000.0, vocab_size=151936, tie_word_embeddings=True, context_window_size=32768, prefill_chunk_size=2048, tensor_parallel_shards=1, head_dim=128, dtype='float32', max_batch_size=128, kwargs={})
  --quantization    GroupQuantize(name='q4f32_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float32', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7, tensor_parallel_shards=0)
  --model-type      qwen2
  --target          {"thread_warp_size": runtime.BoxInt(1), "host": {"mtriple": "x86_64-pc-windows-msvc", "tag": "", "kind": "llvm", "mcpu": "znver3", "keys": ["cpu"]}, "supports_float32": runtime.BoxBool(true), "supports_int16": runtime.BoxBool(true), "max_threads_per_block": runtime.BoxInt(1024), "supports_storage_buffer_storage_class": runtime.BoxBool(true), "supports_int8": runtime.BoxBool(true), "supports_8bit_buffer": runtime.BoxBool(true), "supports_int64": runtime.BoxBool(true), "max_num_threads": runtime.BoxInt(256), "kind": "vulkan", "tag": "", "max_shared_memory_per_block": runtime.BoxInt(49152), "supports_16bit_buffer": runtime.BoxBool(true), "supports_int32": runtime.BoxBool(true), "keys": ["vulkan", "gpu"], "supports_float16": runtime.BoxBool(true)}
  --opt             flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
  --system-lib-prefix ""
  --output          c:\Temp\tmp93lo6nn0\lib.dll
  --overrides       context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None;pipeline_parallel_stages=None;disaggregation=None
[2025-01-30 21:46:38] INFO compile.py:140: Creating model from: QWen2Config(hidden_act='silu', hidden_size=1536, intermediate_size=8960, num_attention_heads=12, num_hidden_layers=28, num_key_value_heads=2, rms_norm_eps=1e-06, rope_theta=1000000.0, vocab_size=151936, tie_word_embeddings=True, context_window_size=32768, prefill_chunk_size=2048, tensor_parallel_shards=1, head_dim=128, dtype='float32', max_batch_size=128, kwargs={})
[2025-01-30 21:46:38] INFO compile.py:158: Exporting the model to TVM Unity compiler
[2025-01-30 21:46:40] INFO compile.py:164: Running optimizations using TVM Unity
[2025-01-30 21:46:40] INFO compile.py:186: Registering metadata: {'model_type': 'qwen2', 'quantization': 'q4f32_1', 'context_window_size': 32768, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 2048, 'tensor_parallel_shards': 1, 'pipeline_parallel_stages': 1, 'disaggregation': False, 'kv_state_kind': 'kv_cache', 'max_batch_size': 128}
[2025-01-30 21:46:41] INFO pipeline.py:55: Running TVM Relax graph-level optimizations
[2025-01-30 21:46:43] INFO pipeline.py:55: Lowering to TVM TIR kernels
[2025-01-30 21:46:48] INFO pipeline.py:55: Running TVM TIR-level optimizations
[2025-01-30 21:46:54] INFO pipeline.py:55: Running TVM Dlight low-level optimizations
[2025-01-30 21:46:55] INFO pipeline.py:55: Lowering to VM bytecode
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `alloc_embedding_tensor`: 12.00 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `argsort_probs`: 0.00 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_decode`: 91.69 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_prefill`: 354.94 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_verify`: 1467.00 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `create_tir_paged_kv_cache`: 0.00 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `decode`: 0.72 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `embed`: 12.00 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `multinomial_from_uniform`: 0.00 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `prefill`: 280.59 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `renormalize_by_top_p`: 0.00 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `sample_with_top_p`: 0.00 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `sampler_take_probs`: 0.01 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `sampler_verify_draft_tokens`: 0.00 MB
[2025-01-30 21:46:57] INFO estimate_memory_usage.py:58: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2025-01-30 21:46:57] INFO pipeline.py:55: Compiling external modules
[2025-01-30 21:46:57] INFO pipeline.py:55: Compilation complete! Exporting to disk
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\__main__.py", line 69, in <module>
    main()
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\__main__.py", line 34, in main
    cli.main(sys.argv[2:])
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\cli\compile.py", line 129, in main
    compile(
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\interface\compile.py", line 244, in compile
    _compile(args, model_config)
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\interface\compile.py", line 189, in _compile
    args.build_func(
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\support\auto_target.py", line 301, in build
    relax.build(
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\tvm\relax\vm_build.py", line 353, in build
    return _vmlink(
           ^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\tvm\relax\vm_build.py", line 249, in _vmlink
    lib = tvm.build(
          ^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\tvm\driver\build_module.py", line 297, in build
    rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\tvm\_ffi\_ctypes\packed_func.py", line 245, in __call__
    raise_last_ffi_error()
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\tvm\_ffi\base.py", line 481, in raise_last_ffi_error
    raise py_err
ValueError: Traceback (most recent call last):
  File "D:\a\package\package\tvm\src\ir\expr.cc", line 94
InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (549755814848 vs. 2147483648) : ValueError: Literal value 549755814848 exceeds maximum of int32
Traceback (most recent call last):
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\queueing.py", line 625, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\blocks.py", line 2044, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\blocks.py", line 1603, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\utils.py", line 728, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\utils.py", line 833, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\chat_interface.py", line 898, in _stream_fn
    first_response = await utils.async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\utils.py", line 728, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\utils.py", line 722, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 2461, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 962, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\utils.py", line 705, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\app.py", line 88, in generate_response
    self.initialize_engine()
  File "F:\TinySwallow-ChatUI-Local\app.py", line 84, in initialize_engine
    self.engine = MLCEngine(self.model)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\serve\engine.py", line 1466, in __init__
    super().__init__(
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\serve\engine_base.py", line 590, in __init__
    ) = _process_model_args(models, device, engine_config)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\serve\engine_base.py", line 171, in _process_model_args
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\serve\engine_base.py", line 164, in _convert_model_info
    model_lib = jit.jit(
                ^^^^^^^^
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\interface\jit.py", line 164, in jit
    _run_jit(
  File "F:\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\interface\jit.py", line 124, in _run_jit
    raise RuntimeError("Cannot find compilation output, compilation failed")
RuntimeError: Cannot find compilation output, compilation failed
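
For reference, the failure happens inside jit.jit() while MLCEngine compiles the model library on the fly for the Vulkan target, and the underlying cause is the int32 overflow shown above (the literal 549755814848 does not fit in a 32-bit signed integer). One possible, untested workaround, assuming the oversized literal is tied to the default 32768-token context window, is to pre-compile the library with smaller overrides and hand it to the engine explicitly so the failing JIT step is skipped. The sketch below uses a hypothetical output path and override values; the --overrides and --output flags appear in the compile arguments logged above.

# Untested workaround sketch; paths and override values are assumptions.
# Pre-compile the Vulkan library first, in a shell:
#   mlc_llm compile model/mlc-chat-config.json --device vulkan --overrides "context_window_size=8192;prefill_chunk_size=1024" --output model/qwen2-vulkan.dll
from mlc_llm import MLCEngine

engine = MLCEngine(
    model="model",                       # local model directory, as found in the log above
    model_lib="model/qwen2-vulkan.dll",  # pre-built library produced by the compile step
)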
@YosAwed

YosAwed commented Jan 30, 2025

As a supplementary note: when I ran the same thing on a Mac mini (M4 pro max), it worked without any problems.

@MNeMoNiCuZ

RuntimeError: Cannot find compilation output, compilation failed
[2025-01-30 20:51:27] INFO auto_config.py:70: Found model configuration: model\mlc-chat-config.json
[2025-01-30 20:51:27] INFO auto_target.py:91: Detecting target device: vulkan:0
[2025-01-30 20:51:27] INFO auto_target.py:93: Found target: {"thread_warp_size": runtime.BoxInt(1), "supports_float32": runtime.BoxBool(true), "supports_int16": runtime.BoxBool(true), "max_threads_per_block": runtime.BoxInt(1024), "supports_storage_buffer_storage_class": runtime.BoxBool(true), "supports_int8": runtime.BoxBool(true), "supports_8bit_buffer": runtime.BoxBool(true), "supports_int64": runtime.BoxBool(true), "max_num_threads": runtime.BoxInt(256), "kind": "vulkan", "tag": "", "max_shared_memory_per_block": runtime.BoxInt(49152), "supports_16bit_buffer": runtime.BoxBool(true), "supports_int32": runtime.BoxBool(true), "keys": ["vulkan", "gpu"], "supports_float16": runtime.BoxBool(true)}
[2025-01-30 20:51:27] INFO auto_target.py:110: Found host LLVM triple: x86_64-pc-windows-msvc
[2025-01-30 20:51:27] INFO auto_target.py:111: Found host LLVM CPU: znver4
[2025-01-30 20:51:27] INFO auto_config.py:154: Found model type: qwen2. Use `--model-type` to override.
Compiling with arguments:
  --config          QWen2Config(hidden_act='silu', hidden_size=1536, intermediate_size=8960, num_attention_heads=12, num_hidden_layers=28, num_key_value_heads=2, rms_norm_eps=1e-06, rope_theta=1000000.0, vocab_size=151936, tie_word_embeddings=True, context_window_size=32768, prefill_chunk_size=2048, tensor_parallel_shards=1, head_dim=128, dtype='float32', max_batch_size=128, kwargs={})
  --quantization    GroupQuantize(name='q4f32_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float32', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7, tensor_parallel_shards=0)
  --model-type      qwen2
  --target          {"thread_warp_size": runtime.BoxInt(1), "host": {"mtriple": "x86_64-pc-windows-msvc", "tag": "", "kind": "llvm", "mcpu": "znver4", "keys": ["cpu"]}, "supports_float32": runtime.BoxBool(true), "supports_int16": runtime.BoxBool(true), "max_threads_per_block": runtime.BoxInt(1024), "supports_storage_buffer_storage_class": runtime.BoxBool(true), "supports_int8": runtime.BoxBool(true), "supports_8bit_buffer": runtime.BoxBool(true), "supports_int64": runtime.BoxBool(true), "max_num_threads": runtime.BoxInt(256), "kind": "vulkan", "tag": "", "max_shared_memory_per_block": runtime.BoxInt(49152), "supports_16bit_buffer": runtime.BoxBool(true), "supports_int32": runtime.BoxBool(true), "keys": ["vulkan", "gpu"], "supports_float16": runtime.BoxBool(true)}
  --opt             flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
  --system-lib-prefix ""
  --output          C:\Users\User\AppData\Local\Temp\tmpg71v4or4\lib.dll
  --overrides       context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None;pipeline_parallel_stages=None;disaggregation=None
[2025-01-30 20:51:27] INFO compile.py:140: Creating model from: QWen2Config(hidden_act='silu', hidden_size=1536, intermediate_size=8960, num_attention_heads=12, num_hidden_layers=28, num_key_value_heads=2, rms_norm_eps=1e-06, rope_theta=1000000.0, vocab_size=151936, tie_word_embeddings=True, context_window_size=32768, prefill_chunk_size=2048, tensor_parallel_shards=1, head_dim=128, dtype='float32', max_batch_size=128, kwargs={})
[2025-01-30 20:51:27] INFO compile.py:158: Exporting the model to TVM Unity compiler
[2025-01-30 20:51:28] INFO compile.py:164: Running optimizations using TVM Unity
[2025-01-30 20:51:28] INFO compile.py:186: Registering metadata: {'model_type': 'qwen2', 'quantization': 'q4f32_1', 'context_window_size': 32768, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 2048, 'tensor_parallel_shards': 1, 'pipeline_parallel_stages': 1, 'disaggregation': False, 'kv_state_kind': 'kv_cache', 'max_batch_size': 128}
[2025-01-30 20:51:29] INFO pipeline.py:55: Running TVM Relax graph-level optimizations
[2025-01-30 20:51:30] INFO pipeline.py:55: Lowering to TVM TIR kernels
[2025-01-30 20:51:34] INFO pipeline.py:55: Running TVM TIR-level optimizations
[2025-01-30 20:51:38] INFO pipeline.py:55: Running TVM Dlight low-level optimizations
[2025-01-30 20:51:39] INFO pipeline.py:55: Lowering to VM bytecode
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `alloc_embedding_tensor`: 12.00 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `argsort_probs`: 0.00 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_decode`: 91.69 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_prefill`: 354.94 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_verify`: 1467.00 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `create_tir_paged_kv_cache`: 0.00 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `decode`: 0.72 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `embed`: 12.00 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `multinomial_from_uniform`: 0.00 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `prefill`: 280.59 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `renormalize_by_top_p`: 0.00 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `sample_with_top_p`: 0.00 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `sampler_take_probs`: 0.01 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `sampler_verify_draft_tokens`: 0.00 MB
[2025-01-30 20:51:40] INFO estimate_memory_usage.py:58: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2025-01-30 20:51:41] INFO pipeline.py:55: Compiling external modules
[2025-01-30 20:51:41] INFO pipeline.py:55: Compilation complete! Exporting to disk
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\__main__.py", line 69, in <module>
    main()
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\__main__.py", line 34, in main
    cli.main(sys.argv[2:])
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\cli\compile.py", line 129, in main
    compile(
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\interface\compile.py", line 244, in compile
    _compile(args, model_config)
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\interface\compile.py", line 189, in _compile
    args.build_func(
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\support\auto_target.py", line 301, in build
    relax.build(
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\tvm\relax\vm_build.py", line 353, in build
    return _vmlink(
           ^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\tvm\relax\vm_build.py", line 249, in _vmlink
    lib = tvm.build(
          ^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\tvm\driver\build_module.py", line 297, in build
    rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\tvm\_ffi\_ctypes\packed_func.py", line 245, in __call__
    raise_last_ffi_error()
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\tvm\_ffi\base.py", line 481, in raise_last_ffi_error
    raise py_err
ValueError: Traceback (most recent call last):
  File "D:\a\package\package\tvm\src\ir\expr.cc", line 94
InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (549755814848 vs. 2147483648) : ValueError: Literal value 549755814848 exceeds maximum of int32
Traceback (most recent call last):
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\queueing.py", line 625, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\blocks.py", line 2044, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\blocks.py", line 1603, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\utils.py", line 728, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\utils.py", line 833, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\chat_interface.py", line 898, in _stream_fn
    first_response = await utils.async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\utils.py", line 728, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\utils.py", line 722, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 2461, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 962, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\gradio\utils.py", line 705, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\app.py", line 88, in generate_response
    self.initialize_engine()
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\app.py", line 84, in initialize_engine
    self.engine = MLCEngine(self.model)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\serve\engine.py", line 1466, in __init__
    super().__init__(
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\serve\engine_base.py", line 590, in __init__
    ) = _process_model_args(models, device, engine_config)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\serve\engine_base.py", line 171, in _process_model_args
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\serve\engine_base.py", line 164, in _convert_model_info
    model_lib = jit.jit(
                ^^^^^^^^
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\interface\jit.py", line 164, in jit
    _run_jit(
  File "C:\AI\LLM\TinySwallow-ChatUI-Local\.venv\Lib\site-packages\mlc_llm\interface\jit.py", line 124, in _run_jit
    raise RuntimeError("Cannot find compilation output, compilation failed")
RuntimeError: Cannot find compilation output, compilation failed

@hanpen72

hanpen72 commented Jan 31, 2025

I also tried this in a Windows environment and got the same error as YosAwed.
The contents of my error log are identical to YosAwed's, so I will omit them.
