
Conversation

@yiliu30 (Contributor) commented Jan 5, 2026

Resolves #30663
Rendered version:

TODO

cc @hshen14 @thuang6 @wenhuach21 @jikunshang @kzawora-intel @xuechendi


Note

Unifies Intel quantization under INC and removes the standalone AutoRound implementation.

  • Map quantization="auto-round" to INCConfig; delete auto_round.py and related imports; add override so auto-round checkpoints resolve to inc
  • Extend inc.py to support weight-only recipes (e.g., W4A16/W8A16), per-layer configs, fp16/bf16 activations, and backends (GPTQ/AWQ/Marlin/IPEX)
  • Update quantization registry: return INCConfig for auto-round; include inc in override order
  • Simplify loading: remove special CPU "online quantization" handling; treat only gguf as config-less
  • Docs: replace AutoRound page with consolidated "Intel Quantization Support" (install/CLI/API, deploy/eval); update quantization README (link Intel Neural Compressor, adjust hardware table by dropping Gaudi and noting migration to vLLM-Gaudi)
  • Tests: streamline AutoRound test by removing deprecated flag

Written by Cursor Bugbot for commit 6b499ff. This will update automatically on new commits. Configure here.
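
As a rough usage sketch (not taken from this PR's diff; the model id below is a placeholder, and exact behavior depends on the checkpoint's quantization config), an AutoRound-quantized checkpoint would be loaded through the unified INC path roughly like this:

from vllm import LLM, SamplingParams

# Placeholder AutoRound W4A16 checkpoint id (hypothetical). With this change,
# requesting quantization="auto-round" resolves to INCConfig instead of the
# removed standalone AutoRound config.
llm = LLM(
    model="Intel/some-autoround-w4a16-model",
    quantization="auto-round",
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)

On the serving side, the equivalent would presumably be vllm serve <model> --quantization auto-round; the consolidated "Intel Quantization Support" page added in this PR carries the authoritative install/CLI/API examples.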

mergify bot commented Jan 5, 2026

Documentation preview: https://vllm--31716.org.readthedocs.build/en/31716/

@mergify mergify bot added the documentation Improvements or additions to documentation label Jan 5, 2026
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request effectively consolidates the Intel Quantization Toolkit integration by merging AutoRound into the Intel Neural Compressor (INC) configuration. The changes are well-structured, removing redundant files and clarifying the documentation. The code refactoring correctly updates the quantization configuration mappings and removes obsolete logic. My only feedback concerns a minor but important typo in the documentation that could confuse users.

@jikunshang (Collaborator)

cc @robertgshaw2-redhat @xuechendi PTAL

mergify bot commented Jan 5, 2026

Hi @yiliu30, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: yiliu30 <[email protected]>
@heheda12345 (Collaborator)

also CC @yewentao256

mergify bot commented Jan 7, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @yiliu30.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@yewentao256 (Member) left a comment


Thanks for the work! Just found one error in the doc.
I might not be the best person to review this PR; perhaps you can find someone else to give a formal review @heheda12345

mergify bot commented Jan 13, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @yiliu30.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 13, 2026
Signed-off-by: yiliu30 <[email protected]>
@mergify mergify bot removed the needs-rebase label Jan 13, 2026
@robertgshaw2-redhat (Collaborator)

Thanks for the hard work on this!

@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) January 14, 2026 05:24
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 14, 2026
@robertgshaw2-redhat robertgshaw2-redhat merged commit 50632ad into vllm-project:main Jan 14, 2026
60 checks passed
sammysun0711 pushed a commit to sammysun0711/vllm that referenced this pull request Jan 16, 2026
sangbumlikeagod pushed a commit to sangbumlikeagod/vllm that referenced this pull request Jan 16, 2026

Labels

documentation (Improvements or additions to documentation), ready (ONLY add when PR is ready to merge/full CI is needed)

Development

Successfully merging this pull request may close these issues.

[RFC]: Consolidate Intel Quantization Toolkit Integration in vLLM

5 participants