Conversation

@vovanphuc (Contributor)

Summary

Add multimodal (vision-language) support for LiquidAI's LFM2.5-VL to LLaMA-Factory.

Changes

  • Add LFMVLPlugin class with dynamic image token expansion based on spatial shapes
  • Add lfm2_vl chat template with multimodal plugin support
  • Register LFM2.5-VL-1.6B model with multimodal=True
  • Add transformers version check (requires >=4.58.0)
  • Add unit test for LFMVLPlugin

Supported Models

Model            HuggingFace ID
LFM2.5-VL-1.6B   LiquidAI/LFM2.5-VL-1.6B

Key Features

  • Dynamic image token count: (spatial_h × spatial_w) / downsample_factor² (see the sketch after this list)
  • Two-phase token expansion pattern (prevents an infinite loop during placeholder replacement)
  • Uses the <image> token (ID 396) with the SigLIP2 NaFlex vision encoder
  • Inherits tool calling support from LFM text template
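
As a rough illustration of the token-count formula above (a minimal sketch; the parameter names mirror the formula, not necessarily the plugin's actual attributes, and the default downsample factor is an assumption):

```python
def image_seqlen(spatial_h: int, spatial_w: int, downsample_factor: int = 2) -> int:
    """Number of <image> tokens one image expands to."""
    return (spatial_h * spatial_w) // (downsample_factor**2)

# e.g. a 16 x 16 patch grid with downsample_factor=2 -> 64 image tokens
assert image_seqlen(16, 16) == 64
```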

Test plan

  • Unit tests pass for LFMVLPlugin
  • Template encoding verified
  • Model loads with trust_remote_code=True
  • LoRA SFT training verified with multimodal dataset

References

LFM2.5-VL requires transformers>=4.58.0 or a specific commit
(3c2517727ce28a30f5044e01663ee204deb1cdbe) because the new
TokenizersBackend class is not available in transformers 4.57.1.

This adds a version check in patcher.py that raises an informative
error message with installation instructions when the model is
loaded with an incompatible transformers version.
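
A minimal sketch of what such a guard can look like (the exact wording and placement in patcher.py may differ from the merged code):

```python
import transformers
from packaging import version

def check_lfm2_vl_dependency() -> None:
    # LFM2.5-VL relies on the TokenizersBackend class, which is missing in 4.57.1.
    if version.parse(transformers.__version__) < version.parse("4.58.0"):
        raise ImportError(
            "LFM2.5-VL requires transformers>=4.58.0. Please run "
            "`pip install -U 'transformers>=4.58.0'` or install transformers "
            "from commit 3c2517727ce28a30f5044e01663ee204deb1cdbe."
        )
```
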
Fix an infinite-loop bug in LFMVLPlugin.process_messages() that occurred
when expanding image tokens: both IMAGE_PLACEHOLDER and self.image_token
were `<image>`, so the replacement loop kept finding new placeholders to
expand.

Solution: use a two-phase replacement pattern (matching Qwen2VLPlugin; see the sketch after these two steps):
1. First replace `<image>` → `{{image}}` × N (intermediate placeholder)
2. After loop, replace `{{image}}` → `<image>` (actual token)
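
A minimal sketch of the pattern, simplified from the actual process_messages() implementation (seqlens stands in for the per-image token counts derived from spatial_shapes):

```python
IMAGE_PLACEHOLDER = "<image>"

def expand_image_tokens(content: str, seqlens: list[int], image_token: str = "<image>") -> str:
    num_image = 0
    # Phase 1: expand each <image> into an intermediate {{image}} run, so the
    # loop never re-matches the tokens it just inserted.
    while IMAGE_PLACEHOLDER in content:
        content = content.replace(IMAGE_PLACEHOLDER, "{{image}}" * seqlens[num_image], 1)
        num_image += 1
    # Phase 2: swap the intermediate placeholder back to the real token.
    return content.replace("{{image}}", image_token)

# "<image>What is this?" with seqlens=[3] -> "<image><image><image>What is this?"
```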

Also adds a proper _get_mm_inputs override to process images through
the LFM2.5-VL image processor and retrieve the spatial_shapes tensor
for calculating dynamic token counts per image (see the sketch below).

Token calculation: (spatial_h × spatial_w) / (downsample_factor²)
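
A hedged sketch of how the spatial_shapes output can be turned into per-image token counts (the processor call and the downsample_factor argument are assumptions for illustration, not the exact merged code):

```python
def get_image_seqlens(images, image_processor, downsample_factor: int = 2) -> list[int]:
    # spatial_shapes: a (num_images, 2) tensor of per-image (h, w) patch-grid sizes
    mm_inputs = image_processor(images=images, return_tensors="pt")
    return [
        int(h) * int(w) // (downsample_factor**2)
        for h, w in mm_inputs["spatial_shapes"].tolist()
    ]
```
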
Rename template and plugin from `lfm_vl` to `lfm2_vl` to match
the model's config.model_type ("lfm2_vl"), following the same
pattern as qwen2_vl.

Files updated:
- mm_plugin.py: Plugin registration
- template.py: Template name and mm_plugin reference
- constants.py: Model group template reference
- test_mm_plugin.py: Test function and variable names
@gemini-code-assist (Contributor)

Summary of Changes

Hello @vovanphuc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates LiquidAI's LFM2.5-VL vision-language model into LLaMA-Factory, significantly enhancing its multimodal capabilities. The changes introduce a novel approach to handling image tokens dynamically based on image resolution, alongside a tailored chat template and necessary system configurations, allowing the framework to process and generate responses that incorporate both text and visual information.

Highlights

  • LFM2.5-VL Model Support: Added comprehensive support for LiquidAI's LFM2.5-VL vision-language model, enabling multimodal capabilities within the framework.
  • Dynamic Image Token Expansion: Implemented a new LFMVLPlugin class that dynamically expands image tokens based on the spatial resolution of input images, using a two-phase token expansion pattern.
  • New Chat Template: Introduced an lfm2_vl chat template, specifically designed to integrate with the new multimodal plugin and support the LFM2.5-VL model's conversational structure.
  • Model Registration and Version Check: Registered the LFM2.5-VL-1.6B model and added a transformers version check, requiring version >=4.58.0 for compatibility.
  • Unit Testing: Included a dedicated unit test for the LFMVLPlugin to ensure its correct instantiation and functionality.


@hiyouga (Owner) left a comment

LGTM

@hiyouga merged commit 958fb52 into hiyouga:main on Jan 7, 2026 (17 checks passed).
@hiyouga added the "solved" label (This problem has been already solved) on Jan 7, 2026.
@gemini-code-assist (bot) left a comment

Code Review

This pull request adds support for LiquidAI's LFM2.5-VL vision-language model. The changes include a new LFMVLPlugin for dynamic image token expansion, a corresponding chat template, model registration, and a transformers version check. The implementation is mostly correct, but I've found a critical issue in the LFMVLPlugin regarding image batching that will affect training with batch sizes greater than one. I've also suggested an improvement to the unit test to cover the new plugin's core logic.
