convert : improve model arch handling #13122

Open · ngxson wants to merge 3 commits into base: master

Conversation

@ngxson ngxson (Collaborator) commented Apr 26, 2025

This improves handling of the case where architectures is defined in both vision_config and text_config. For example:

{
  "architectures": [
    "InternVLChatModel"
  ],
  "text_config": {
    "architectures": [
      "Qwen2ForCausalLM"
    ],
    ...
  },
  "vision_config": {
    "architectures": [
      "InternVisionModel"
    ],
    ...
  },
  ...
}

The arch will be mapped correctly:

  • ModelType.TEXT --> Qwen2ForCausalLM
  • ModelType.VISION --> InternVisionModel

If the arch in a sub-config is missing, we simply fall back to the arch in the root-level config. Example:

{
  "architectures": [
    "SmolVLMForConditionalGeneration"
  ],
  "text_config": {
    "architectures": [
      "VLlama3ForCausalLM"
    ],
    ...
  },
  "vision_config": {
    ...
  },
  ...
}
  • ModelType.TEXT --> VLlama3ForCausalLM
  • ModelType.VISION --> SmolVLMForConditionalGeneration

The same fallback applies when "architectures" is null.
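
A rough sketch of that resolution order (a hypothetical helper for illustration, not the PR's actual code):

def resolve_arch(hparams: dict, model_type: str) -> str:
    # Prefer the sub-config's "architectures"; fall back to the root-level
    # entry when the sub-config omits it or sets it to null.
    sub_key = "text_config" if model_type == "text" else "vision_config"
    sub_cfg = hparams.get(sub_key) or {}
    archs = sub_cfg.get("architectures") or hparams.get("architectures") or []
    if not archs:
        raise ValueError("config.json does not declare any architecture")
    return archs[0]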

@ngxson ngxson requested a review from compilade April 26, 2025 09:28
@github-actions github-actions bot added the python (python script changes) label Apr 26, 2025
@@ -1078,8 +1081,12 @@ def __init__(self, *args, **kwargs):
raise TypeError("VisionModel must be subclassed with model_arch = gguf.MODEL_ARCH.CLIP_VISION")

# small hack to correct the number of layers
self.tensor_map = gguf.get_tensor_name_map(gguf.MODEL_ARCH.CLIP_VISION, 128)
self.n_embd_text = self.find_hparam(["hidden_size", "n_embd"])
self.block_count = 512 # vision models are small, this "ought to be enough for anybody"
compilade (Collaborator) commented:

"small" with up to 512 layers :)
What is the max you've seen in the wild?

I think this number could be taken from the config, since vision_config seems to contain num_hidden_layers at least for Llama-3.2-11B-Vision. (there's also num_global_layers, I guess the max of the layer counts should be used)

What model doesn't specify how many layers the vision part has?
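
(A rough, purely illustrative sketch of that suggestion, using the field names mentioned above:)

def vision_block_count(hparams: dict, fallback: int = 512) -> int:
    # Take the layer count from vision_config when present, using the larger
    # of the two layer-count fields, and only fall back to a generous
    # hard-coded cap when the config omits them.
    vision_config = hparams.get("vision_config", {})
    return max(
        vision_config.get("num_hidden_layers", 0),
        vision_config.get("num_global_layers", 0),
    ) or fallback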

@ngxson ngxson (Collaborator Author) commented Apr 26, 2025:

I've never seen any model with more than 64 layers, but I'm putting this number here to be future-proof.

We could get this number from the config, but the problem is that many config.json files nowadays are missing that number, as the transformers library omits it when it matches the default value. For example, this, this and this, where you cannot find num_hidden_layers in vision_config.

The frustrating thing is that this is starting to happen with some text_config too.

One way to fix this could be to use AutoConfig, but this won't work for models not supported by the transformers library. While I'm pretty sure this kind of model is rare nowadays, I can't know for sure whether people are still using them. WDYT?

@compilade compilade (Collaborator) commented Apr 26, 2025:

> One way to fix this could be to use AutoConfig

That could be the way to go, since convert_hf_to_gguf.py already has hf in the name and mostly already expects HF-like models which are supported by transformers. I don't know how much it would change how hparams is used in set_gguf_parameters, though.

> but this won't work for models not supported by the transformers library. While I'm pretty sure this kind of model is rare nowadays, I can't know for sure whether people are still using them. WDYT?

I guess if this is a problem (e.g. for very new architectures), it could always be possible to temporarily define a PretrainedConfig subclass and use AutoConfig.register (sketched below).

But! Something which seems important is that AutoConfig uses the model_type field instead of the architectures field, which may change the assumptions in a bunch of places. I'm not sure if it would really be compatible with the idea of using sub-architectures like in this PR.

I guess it's probably fine to keep a high block count, but it makes the tensor map dict bigger than it needs to be.
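
A minimal sketch of that stop-gap (class name and model_type are made up for illustration):

from transformers import AutoConfig, PretrainedConfig

class MyNewArchConfig(PretrainedConfig):
    # model_type must match the "model_type" field in the model's config.json
    model_type = "my-new-arch"

# Register the temporary config class so AutoConfig.from_pretrained() can
# resolve this model_type even though transformers does not ship it.
AutoConfig.register("my-new-arch", MyNewArchConfig)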

ngxson (Collaborator Author) commented:

> I don't know how much it would change how hparams is used in set_gguf_parameters, though.

It should not change much; AutoConfig has a to_dict() function which basically returns the same config.json, but with all hparams pre-filled.

The simple plan is to change load_hparams from open(...) to AutoConfig.from_pretrained(dir_model).to_dict(), roughly as sketched below.
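
A minimal sketch of that plan (illustrative only; the error handling and fallback path are assumptions, not the actual patch):

from pathlib import Path
import json

from transformers import AutoConfig


def load_hparams(dir_model: Path) -> dict:
    # Let AutoConfig fill in the default hyperparameters that config.json may
    # omit; fall back to reading config.json directly when transformers does
    # not recognize the architecture.
    try:
        return AutoConfig.from_pretrained(dir_model).to_dict()
    except Exception:
        with open(dir_model / "config.json", "r", encoding="utf-8") as f:
            return json.load(f)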

@ngxson ngxson (Collaborator Author) commented Apr 26, 2025:

Here is what I mean: 4840d2f

It does work better now; num_hidden_layers is no longer missing. However, for smolvlm, some hparams are still missing entirely from the to_dict() output, like num_attention_heads or hidden_size. Though I think that's not very important for now. Alternatively, we can get them from the AutoConfig object before calling to_dict().

hparams["architectures"] = architectures
return hparams
try:
return AutoConfig.from_pretrained(dir_model, trust_remote_code=True).to_dict()
@compilade compilade (Collaborator) commented Apr 26, 2025:

Suggested change:
- return AutoConfig.from_pretrained(dir_model, trust_remote_code=True).to_dict()
+ return AutoConfig.from_pretrained(dir_model, trust_remote_code=False).to_dict()

I don't think trust_remote_code=True should be the default here.

If a model uses a custom module, then hopefully it also has the relevant information in config.json (and/or we can assume some defaults as usual).

I would prefer to avoid running arbitrary code from the config of the models.

(Rethinking about this: maybe removing that trust would (partially) defeat the purpose of using AutoConfig...)

ngxson (Collaborator Author) commented:

The reason I left trust_remote_code=True there was that we already had it in some places in the code, mostly to load the tokenizer. But on second thought, yeah, this can be a huge security risk. Basically, a bad actor could trick users into trying their model and slip a command execution into it.

I think the fallback to config.json should work well for now, given that not many models need trust_remote_code.

One thing I'm a bit concerned about, though: should we guard all the other places that use trust_remote_code=True behind a flag, for example --trust-remote-code (roughly as sketched below)? This is good for security, but it can be quite a bad UX for people who don't know why it's needed.
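
For illustration, the flag-gated approach might look like this (a rough sketch; the wiring is an assumption, not what convert_hf_to_gguf.py currently does):

import argparse

from transformers import AutoConfig, AutoTokenizer

# Hypothetical sketch: a single opt-in flag gating every call that would
# otherwise pass trust_remote_code=True unconditionally.
parser = argparse.ArgumentParser()
parser.add_argument("model", help="path to the HF model directory")
parser.add_argument("--trust-remote-code", action="store_true",
                    help="allow running custom model code shipped with the checkpoint")
args = parser.parse_args()

config = AutoConfig.from_pretrained(args.model, trust_remote_code=args.trust_remote_code)
tokenizer = AutoTokenizer.from_pretrained(args.model, trust_remote_code=args.trust_remote_code)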

Labels: python (python script changes)