KeyError: 'packing' in Quantizer.dequantize after save/load round-trip with nbits=8 #25

@NotShrirang

Summary

Models quantized with nbits=8, saved via HF save_pretrained, and reloaded via from_pretrained crash on the first forward pass with KeyError: 'packing'. The same model works fine when used in-memory without the save/load round-trip. The bug reproduces for any 8-bit quantized model; I hit it with google/gemma-4-E4B-it via the transformers SinqConfig integration (huggingface/transformers#46050), but the fault is in sinq itself.

Reproduction

from transformers import AutoProcessor, AutoModelForCausalLM, SinqConfig

model_id = 'google/gemma-4-E4B-it'
save_dst = './gemma-4-E4B-it-sinq/'

quant_cfg = SinqConfig(
    nbits=8, group_size=64, tiling_mode='2D', method='sinq',
    modules_to_not_convert=["lm_head", "model.audio_tower"],
)

# Quantize + save
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map='cpu', quantization_config=quant_cfg,
)
model.save_pretrained(save_dst)

# Reload + run
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(save_dst, device_map='cpu')
inputs = processor.apply_chat_template(
    [{'role': 'user', 'content': 'this is a test.'}],
    add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors='pt',
).to(model.device)
model.generate(**inputs, max_new_tokens=32)   # KeyError: 'packing'

Traceback tail:

  File ".../sinq/sinqlinear.py", line 329, in dequantize
    W_est = Quantizer.dequantize(W_q, meta, use_unpack_kernel=self.use_unpack_kernel)
  File ".../sinq/quantizer.py", line 246, in dequantize
    if meta["packing"]:
       ~~~~^^^^^^^^^^^
KeyError: 'packing'

Root cause

Two sites interact:

  1. SINQLinear.load_state_dict (sinq/sinqlinear.py:430-431) drops a stale "packing" flag when the stored tensor's element count already matches the unpacked size:

    if self.meta.get("packing") and numel == expected_unpacked:
        self.meta.pop("packing", None)

    For nbits=8, expected_packed = N*K*8//8 = N*K = expected_unpacked, so this branch always fires and the "packing" key is always removed on reload. (For nbits=4 the two sizes differ, the key survives, and this path is fine.)

  2. Quantizer.dequantize (sinq/quantizer.py:246) uses bracket access on a key the rest of the package treats as optional:

    if meta["packing"]:        # KeyError after step 1
        ...
        W_r = cls.unpack[meta["packing"]](W_q, dtype=compute_dtype)
    else:
        W_r = W_q.to(compute_dtype)

    The else branch is the correct one for unpacked 8-bit tensors; only the bracket access on the if line crashes before we get there.
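
The size coincidence behind step 1 can be checked with plain arithmetic. The sketch below is a stand-in and not sinq's code: the helper names (expected_counts, drop_stale_packing, reload_meta) and the "8bit"/"4bit" flag values are hypothetical, and the general formula N*K*nbits//8 is an assumption extrapolated from the nbits=8 case quoted above.

```python
def expected_counts(n_rows, n_cols, nbits):
    """Return (packed, unpacked) element counts for an n_rows x n_cols weight.

    Assumption: codes are packed into 8-bit storage elements, so the packed
    count is n_rows * n_cols * nbits // 8.
    """
    unpacked = n_rows * n_cols
    packed = unpacked * nbits // 8
    return packed, unpacked


def drop_stale_packing(meta, numel, expected_unpacked):
    """Mimic the reload check: pop "packing" when the tensor looks unpacked."""
    if meta.get("packing") and numel == expected_unpacked:
        meta.pop("packing", None)
    return meta


def reload_meta(n_rows, n_cols, nbits):
    """Simulate reloading a packed tensor and return the surviving meta dict."""
    packed, unpacked = expected_counts(n_rows, n_cols, nbits)
    meta = {"nbits": nbits, "packing": f"{nbits}bit"}  # hypothetical flag value
    # The tensor on disk holds `packed` elements.
    return drop_stale_packing(meta, numel=packed, expected_unpacked=unpacked)


meta_8bit = reload_meta(128, 64, nbits=8)
meta_4bit = reload_meta(128, 64, nbits=4)
print("packing" in meta_8bit)  # False: packed == unpacked, so the key is popped
print("packing" in meta_4bit)  # True: packed is half of unpacked, key survives
```

With 8 bits per code there is nothing to pack, so the "looks unpacked" heuristic cannot tell a packed tensor from an unpacked one and always removes the key.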

Note that every other consumer in sinqlinear.py (lines 184, 298, 430) already guards with meta.get("packing"). Line 246 of quantizer.py is the lone outlier.

Proposed fix

--- a/sinq/quantizer.py
+++ b/sinq/quantizer.py
@@ -243,7 +243,7 @@ class Quantizer:
         compute_dtype = meta.get("compute_dtype", torch.float16)

         # 1) Unpack to per-element codes
-        if meta["packing"]:
+        if meta.get("packing"):
             if meta.get("view_as_float", False):
                 W_q = W_q.view(meta["unpack_view_dtype"])
             W_r = cls.unpack[meta["packing"]](W_q, dtype=compute_dtype)

I've verified this locally against the repro above: inference completes and the outputs are sensible. Happy to open a PR with the patch plus a regression test that exercises an 8-bit quantize → save → load → forward round-trip if it would help.
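
In miniature, the behaviour such a regression test would pin down looks roughly like this. It is a toy model of the two access patterns only: dequantize_bracket and dequantize_get are hypothetical stand-ins, plain lists replace tensors, and the unpack path is deliberately left out. A real test would go through sinq and the save/load round-trip instead.

```python
def dequantize_bracket(codes, meta):
    """Pre-patch shape: bracket access raises if "packing" is absent."""
    if meta["packing"]:
        raise NotImplementedError("unpack path omitted in this sketch")
    return [float(c) for c in codes]


def dequantize_get(codes, meta):
    """Patched shape: a missing key is treated as 'not packed'."""
    if meta.get("packing"):
        raise NotImplementedError("unpack path omitted in this sketch")
    return [float(c) for c in codes]


# Meta as it looks after an 8-bit reload: "packing" has already been popped.
reloaded_meta = {"nbits": 8}
codes = [0, 127, 255]

try:
    dequantize_bracket(codes, reloaded_meta)
    bracket_result = "ok"
except KeyError as exc:
    bracket_result = f"KeyError: {exc}"

get_result = dequantize_get(codes, reloaded_meta)
print(bracket_result)  # KeyError: 'packing'
print(get_result)      # [0.0, 127.0, 255.0]
```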

Environment

  • sinq: 0.2.0
  • transformers: 5.8.1
  • torch: 2.11.0+cu130
  • Python: 3.12.13
  • Platform: Linux x86_64
  • GPU: NVIDIA RTX 3070 Ti (issue is CPU-path; GPU not exercised)

Related: huggingface/transformers#46050
