Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 22 additions & 2 deletions src/transformers/processing_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -1432,14 +1432,34 @@ def apply_chat_template(
"template_kwargs": {},
}

# Get the kwargs type annotation from __call__, if any
typing_processor_kwargs_class = self.__call__.__annotations__.get("kwargs")
processor_defaults_kwargs = {}

# Retrieve processor default kwargs
if typing_processor_kwargs_class:
processor_kwargs_class = typing_processor_kwargs_class.__args__[0]
processor_defaults = getattr(processor_kwargs_class, "_defaults", {})

# This combines all default values from different categories (like text_kwargs, images_kwargs, etc.)
processor_defaults_kwargs = {k: v for values in processor_defaults.values() for k, v in values.items()}

Comment on lines +1435 to +1446
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is getting way too overengineered imo and it's not this PR's fault. Prob it works for a while but a longer term solution I had is to use dataclasses or similar to split out kwargs per type

Couldn't get time to draft it though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this is too complicated 😅 I considered an attempt to reuse _merge_kwargs (which hides a lot of similarly complex logic), but it was not obvious at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A short term option would be to just special-case the missing params (fps and sampling_rate) instead of doing it in a general way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, same, but _merge_kwargs wasn't much generalizable unless rewritten 🥲

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I wanted to go for a general approach but it's indeed overengineered. Since it is only relevant for fps and sampling_rate, I am gonna special-case if. Thanks both for the feedback!

for kwarg_type in processed_kwargs:
for key in AllKwargsForChatTemplate.__annotations__[kwarg_type].__annotations__.keys():
kwarg_type_defaults = AllKwargsForChatTemplate.__annotations__[kwarg_type]
default_value = getattr(kwarg_type_defaults, key, None)

# handle the two naming conventions for fps: video_fps in load kwargs and fps in processor kwargs
processor_key = key if key != "video_fps" else "fps"
default_value = processor_defaults_kwargs.get(processor_key, getattr(kwarg_type_defaults, key, None))
Comment on lines +1451 to +1453
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, totally missed this one, better if we align in naming for future models and use either video_fps or fps. Since we already added qwen with fps, imo we can converge under that name and deprecate video_fps in this PR

value = kwargs.pop(key, default_value)

if value is not None and not isinstance(value, dict):
processed_kwargs[kwarg_type][key] = value

# If the key is in the processor defaults, we need to pass it to the processor
if processor_key in processor_defaults_kwargs and processor_key not in kwargs:
kwargs[processor_key] = value

if isinstance(conversation, (list, tuple)) and (
isinstance(conversation[0], (list, tuple)) or hasattr(conversation[0], "content")
):
Expand Down Expand Up @@ -1509,7 +1529,7 @@ def apply_chat_template(
num_frames=mm_load_kwargs.get("num_frames", None),
fps=mm_load_kwargs.get("video_fps", None),
backend=mm_load_kwargs["video_load_backend"],
**kwargs,
**{k: v for k, v in kwargs.items() if k != "fps"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaik qwen doesn't have a special load_for_model defined, why do we need this?

)
videos.append(video)
video_metadata.append(metadata)
Expand Down