KeyError: 'packing' in Quantizer.dequantize after save/load round-trip with nbits=8

# Title: `KeyError: 'packing'` in `Quantizer.dequantize` after save/load round-trip with `nbits=8`

## Summary

Models quantized with `nbits=8`, saved via Hugging Face `save_pretrained`, and reloaded via `from_pretrained` crash on the first forward pass with `KeyError: 'packing'`. The same model works fine when used in-memory without the save/load round-trip. The bug reproduces for any 8-bit quantized model; I hit it with `google/gemma-4-E4B-it` via `transformers`'s `SinqConfig` integration ([huggingface/transformers#46050](https://github.com/huggingface/transformers/issues/46050)), but the fault is in `sinq` itself.

## Reproduction

```python
from transformers import AutoProcessor, AutoModelForCausalLM, SinqConfig

model_id = 'google/gemma-4-E4B-it'
save_dst = './gemma-4-E4B-it-sinq/'

quant_cfg = SinqConfig(
    nbits=8, group_size=64, tiling_mode='2D', method='sinq',
    modules_to_not_convert=["lm_head", "model.audio_tower"],
)

# Quantize + save
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map='cpu', quantization_config=quant_cfg,
)
model.save_pretrained(save_dst)

# Reload + run
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(save_dst, device_map='cpu')
inputs = processor.apply_chat_template(
    [{'role': 'user', 'content': 'this is a test.'}],
    add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors='pt',
).to(model.device)
model.generate(**inputs, max_new_tokens=32)   # KeyError: 'packing'
```

Traceback tail:

```
  File ".../sinq/sinqlinear.py", line 329, in dequantize
    W_est = Quantizer.dequantize(W_q, meta, use_unpack_kernel=self.use_unpack_kernel)
  File ".../sinq/quantizer.py", line 246, in dequantize
    if meta["packing"]:
       ~~~~^^^^^^^^^^^
KeyError: 'packing'
```

## Root cause

Two sites interact:

1. `SINQLinear.load_state_dict` (`sinq/sinqlinear.py:430-431`) drops a stale `"packing"` flag when the stored tensor's element count already matches the unpacked size:

   ```python
   if self.meta.get("packing") and numel == expected_unpacked:
       self.meta.pop("packing", None)
   ```

   For `nbits=8`, `expected_packed = N*K*8//8 = N*K = expected_unpacked`, so this branch *always* fires and the `"packing"` key is *always* removed on reload. (For `nbits=4` the two sizes differ, the key survives, and this path is fine.)

2. `Quantizer.dequantize` (`sinq/quantizer.py:246`) uses bracket access on a key the rest of the package treats as optional:

   ```python
   if meta["packing"]:        # KeyError after step 1
       ...
       W_r = cls.unpack[meta["packing"]](W_q, dtype=compute_dtype)
   else:
       W_r = W_q.to(compute_dtype)
   ```

   The `else` branch is the correct one for unpacked 8-bit tensors; only the bracket access on the `if` line crashes before we get there.

Note that every other consumer in `sinqlinear.py` (lines 184, 298, 430) already guards with `meta.get("packing")`. Line 246 of `quantizer.py` is the lone outlier.
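To make the size coincidence concrete, here is a minimal, dependency-free sketch of the arithmetic from step 1. The helpers `packed_numel` and `reload_meta` are hypothetical stand-ins written for this issue, not functions from `sinq`, and the `"placeholder"` packing value is made up; only the `if` condition is copied from the snippet above.

```python
def packed_numel(n_rows: int, n_cols: int, nbits: int) -> int:
    """Byte count needed to hold n_rows * n_cols codes of `nbits` bits each."""
    return n_rows * n_cols * nbits // 8


def reload_meta(meta: dict, numel: int, expected_unpacked: int) -> dict:
    """Hypothetical stand-in that applies the same condition as step 1."""
    meta = dict(meta)  # work on a copy so the caller's dict is untouched
    if meta.get("packing") and numel == expected_unpacked:
        meta.pop("packing", None)
    return meta


N, K = 128, 64  # arbitrary example shape
for nbits in (4, 8):
    stored = packed_numel(N, K, nbits)
    meta = reload_meta({"packing": "placeholder"}, stored, N * K)
    print(nbits, stored, "packing" in meta)
# prints:
# 4 4096 True
# 8 8192 False
```

With 4-bit codes the stored buffer is half the unpacked size, so the key survives; with 8-bit codes the two counts are equal and the key is removed.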

## Proposed fix

```diff
--- a/sinq/quantizer.py
+++ b/sinq/quantizer.py
@@ -243,7 +243,7 @@ class Quantizer:
         compute_dtype = meta.get("compute_dtype", torch.float16)

         # 1) Unpack to per-element codes
-        if meta["packing"]:
+        if meta.get("packing"):
             if meta.get("view_as_float", False):
                 W_q = W_q.view(meta["unpack_view_dtype"])
             W_r = cls.unpack[meta["packing"]](W_q, dtype=compute_dtype)
```

I've verified this locally against the repro above: inference completes and outputs are sensible. Happy to open a PR with the patch plus a regression test that exercises an 8-bit quantize → save → load → forward round-trip if it would help.
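For reference, the behavioural difference between the two access styles can be shown without `sinq` or `torch` at all. This is only a sketch of the branch selection: `unpack_branch` is a hypothetical helper that models the `if` line of `Quantizer.dequantize`, and the metadata dict is a made-up minimal example of what remains after reload.

```python
def unpack_branch(meta: dict, guarded: bool) -> str:
    """Report which branch a dequantize-like function would take.

    guarded=False models the current bracket access, guarded=True the
    proposed meta.get("packing") access.
    """
    packing = meta.get("packing") if guarded else meta["packing"]
    return "unpack" if packing else "cast"


# Assumed state after reload with nbits=8: the "packing" key is gone.
reloaded_meta = {"compute_dtype": "float16"}

try:
    strict_result = unpack_branch(reloaded_meta, guarded=False)
except KeyError as exc:
    strict_result = f"KeyError: {exc}"

guarded_result = unpack_branch(reloaded_meta, guarded=True)

print(strict_result)   # KeyError: 'packing'
print(guarded_result)  # cast
```

A real regression test would still need to go through the actual quantize, save, load, and forward path; the sketch above only pins down why the one-line change is sufficient for the missing-key case.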

## Environment

- `sinq`: 0.2.0
- `transformers`: 5.8.1
- `torch`: 2.11.0+cu130
- Python: 3.12.13
- Platform: Linux x86_64
- GPU: NVIDIA RTX 3070 Ti (issue is CPU-path; GPU not exercised)

Related: [huggingface/transformers#46050](https://github.com/huggingface/transformers/issues/46050)
