
Conversation

@Jinghao-Guo (Collaborator)

This pull request introduces support for the new LLaVAOneVision1_5 model variant in the codebase, including its configuration, registration, and integration with the Liger kernel for efficient training and inference. The main changes are grouped below.

New Model Integration

  • Added LLaVAOneVision1_5 model and configuration classes (Llavaonevision1_5Config, LLaVAOneVision1_5_ForConditionalGeneration, and related kernel patch) to the codebase, enabling use of this model for conditional generation tasks.
  • Registered the new model with the internal mapping function to ensure it is discoverable and usable through the standard model loading APIs.
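The PR wires the model into the repo's own internal mapping function; for readers unfamiliar with that mechanism, the sketch below shows the analogous Transformers Auto-class registration pattern, which achieves the same discoverability. The model_type string "llavaonevision1_5" and the package import path are assumptions for illustration, not verified against the merged code.

# Illustrative only: the repo uses its internal mapping function, but the effect
# is comparable to registering the classes with the Transformers Auto APIs.
from transformers import AutoConfig, AutoModelForVision2Seq

from lmms_engine.models.llava_onevision1_5 import (  # assumed package-level exports
    Llavaonevision1_5Config,
    LLaVAOneVision1_5_ForConditionalGeneration,
)

# "llavaonevision1_5" must match the config class's model_type; assumed here.
AutoConfig.register("llavaonevision1_5", Llavaonevision1_5Config)
AutoModelForVision2Seq.register(Llavaonevision1_5Config, LLaVAOneVision1_5_ForConditionalGeneration)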

API and Import Updates

  • Updated the __init__.py files to expose the new model, its configuration, and kernel patch functions at the package level, making them available for import and use.

Liger Kernel Integration

  • Implemented a custom forward method for LLaVAOneVision1_5_ForConditionalGeneration using the Liger fused linear cross-entropy loss for efficient training, with support for sequence packing and inference modes.
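As a rough illustration of the memory saving this bullet describes, the sketch below computes a causal-LM loss with LigerFusedLinearCrossEntropyLoss directly from hidden states and the lm_head weight, so the full (batch * seq_len, vocab_size) logits tensor is never materialized. This is a minimal standalone example, not the PR's forward method; the tensor names and toy shapes are assumptions.

import torch
from liger_kernel.transformers.fused_linear_cross_entropy import (
    LigerFusedLinearCrossEntropyLoss,
)

def fused_lm_loss(lm_head: torch.nn.Linear, hidden_states: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Standard causal shift: tokens < n predict token n.
    shift_hidden = hidden_states[..., :-1, :].contiguous().view(-1, hidden_states.size(-1))
    shift_labels = labels[..., 1:].contiguous().view(-1)
    # The fused kernel consumes the lm_head weight and hidden states directly,
    # avoiding the intermediate logits tensor.
    loss_fct = LigerFusedLinearCrossEntropyLoss()
    return loss_fct(lm_head.weight, shift_hidden, shift_labels)

# Toy usage: batch 2, sequence length 8, hidden size 64, vocab 128.
device = "cuda"  # Liger's Triton kernels require a GPU
lm_head = torch.nn.Linear(64, 128, bias=False, device=device)
hidden = torch.randn(2, 8, 64, device=device, requires_grad=True)
labels = torch.randint(0, 128, (2, 8), device=device)
print(fused_lm_loss(lm_head, hidden, labels))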

Model Configuration

  • Defined detailed configuration classes for both the vision and text backbones of LLaVAOneVision1_5, supporting advanced features like rotary position embeddings, sliding window attention, and flexible backbone initialization.
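A minimal sketch of what instantiating the composite config might look like, assuming the PR follows the usual Transformers pattern of nesting the backbones under vision_config / text_config; those attribute names and the import path are assumptions, not confirmed by the diff shown in this thread.

# Assumed usage sketch; vision_config / text_config follow the common
# composite-config convention and may differ in the actual implementation.
from lmms_engine.models.llava_onevision1_5 import Llavaonevision1_5Config

config = Llavaonevision1_5Config()               # default vision + text backbones
print(type(config.vision_config).__name__)       # expected: RiceConfig
print(type(config.text_config).__name__)         # expected: LLaVAOneVision1_5_TextConfig
print(config.image_token_id, config.video_token_id)  # multimodal special-token ids set by __init__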

These changes collectively enable the use of the new LLaVAOneVision1_5 model variant, ensure it is properly registered and exposed, and optimize its training and inference performance with Liger kernel support.

Motivation

Modifications

Commit Message Convention

Please follow our standardized commit message format:

  • [feat] - New features or functionality
  • [fix] - Bug fixes
  • [docs] - Documentation changes only
  • [style] - Code style changes (formatting, missing semicolons, etc.)
  • [refactor] - Code refactoring without changing functionality
  • [perf] - Performance improvements
  • [test] - Adding or updating tests
  • [chore] - Maintenance tasks, dependency updates, etc.
  • [ci] - CI/CD configuration changes

Examples:

  • [feat] add qwen omni iterable dataset support
  • [fix] resolve bagel model configuration error
  • [docs] update training guide with YAML examples

See CONTRIBUTING.md for more details.

CI/CD Checks

Your PR will automatically run the following checks:

  • Linting: Code formatting with black (line-length=120) and import sorting with isort
  • Run pre-commit run --all-files locally to verify before pushing

Checklist

  • Follow commit message convention (see above)
  • Run pre-commit run --all-files and ensure all checks pass
  • Format your code with black (line-length=120) and isort
  • Add unit tests for new functionality
  • Update documentation as needed, including docstrings or example tutorials
  • Ensure all CI/CD checks pass

- Introduced a new monkey patching module for LLaVAOneVision1_5 to apply Liger kernel optimizations.
- Implemented support for rotary positional embeddings, RMS normalization, cross-entropy loss, and SwiGLU activations.
- Added functionality to patch existing model instances with optimized components.
- Included version checks and warnings for transformers compatibility.
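For orientation, the bullets above follow the same pattern as the repo's other apply_liger_kernel_to_* helpers: swap module-level references in the modeling file for Liger equivalents. The sketch below is a generic version of that pattern, assuming hypothetical attribute names on modeling_llavaonevision1_5; it is not the PR's monkey_patch.py.

# Hedged sketch of the monkey-patch pattern. The attribute names on the modeling
# module (apply_rotary_pos_emb, ...RMSNorm, ...MLP) are assumptions chosen for
# illustration; the real patch targets whatever the modeling file defines.
from liger_kernel.transformers.rms_norm import LigerRMSNorm
from liger_kernel.transformers.rope import liger_rotary_pos_emb
from liger_kernel.transformers.swiglu import LigerSwiGLUMLP

def apply_liger_kernel_to_llava_onevision1_5(rope: bool = True, rms_norm: bool = True, swiglu: bool = True) -> None:
    # Imported lazily so the patch is applied against the live module object.
    from lmms_engine.models.llava_onevision1_5 import modeling_llavaonevision1_5 as modeling

    if rope:
        modeling.apply_rotary_pos_emb = liger_rotary_pos_emb   # fused RoPE
    if rms_norm:
        modeling.LLaVAOneVision1_5RMSNorm = LigerRMSNorm       # fused RMSNorm
    if swiglu:
        modeling.LLaVAOneVision1_5MLP = LigerSwiGLUMLP         # fused SwiGLU MLP

Class-level swaps like these only affect modules constructed after the patch; patching an already-built model instance (the third bullet) would instead walk model.modules() and replace the norm/MLP submodules in place.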
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +86 to +90
original_inputs = inputs_embeds
inputs_embeds, indices, cu_seq_lens, max_seqlen_in_batch = _unpad_input(inputs_embeds, attention_mask)
if get_ulysses_sequence_parallel_world_size() > 1:
    inputs_embeds = slice_input_tensor(inputs_embeds, dim=0, padding=True)
bs, seqlen = original_inputs.shape[:2]


P1: model_forward crashes when cu_seq_lens is supplied

model_forward uses bs, seqlen = original_inputs.shape[:2], but original_inputs is only assigned inside the two branches guarded by cu_seq_lens is None. When callers provide an explicit cu_seq_lens/indices (e.g., reusing already-unpadded inputs for sequence packing or Ulysses shards), neither branch runs and original_inputs is undefined, causing an UnboundLocalError before any computation. This makes the optimized path unusable for pre-packed inputs until original_inputs is initialized unconditionally or seqlen is derived from the provided lengths.
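A self-contained sketch of the idea behind one possible fix: derive the batch geometry from whichever inputs are actually present, so nothing depends on a variable assigned only in the padded branch. It is illustrative, not the PR's code.

import torch
from typing import Optional

def derive_batch_geometry(inputs_embeds: torch.Tensor, cu_seq_lens: Optional[torch.Tensor], max_seqlen: Optional[int]):
    """Return (bs, seqlen) whether inputs arrive padded (bs, seqlen, dim) or pre-packed (total_tokens, dim)."""
    if cu_seq_lens is None:
        # Padded path: geometry comes straight from the padded tensor.
        bs, seqlen = inputs_embeds.shape[:2]
    else:
        # Pre-packed path: geometry comes from the cumulative lengths instead,
        # so no name is bound in only one branch.
        bs = int(cu_seq_lens.numel()) - 1
        seqlen = int(max_seqlen) if max_seqlen is not None else int((cu_seq_lens[1:] - cu_seq_lens[:-1]).max())
    return bs, seqlen

# Packed example: three sequences of lengths 4, 2, 5.
packed = torch.randn(11, 32)
cu = torch.tensor([0, 4, 6, 11])
print(derive_batch_geometry(packed, cu, None))  # (3, 5)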


Comment on lines +1980 to +1984
image_token_id = self.config.image_token_id
video_token_id = self.config.video_token_id
vision_start_token_id = self.config.vision_start_token_id

vision_start_mask = input_ids == vision_start_token_id


P1: Missing vision_start_token_id in config breaks generation utilities

Both get_rope_index and _get_image_nums_and_video_nums rely on self.config.vision_start_token_id, but Llavaonevision1_5Config.__init__ never defines that attribute (it sets only image_token_id, video_token_id, etc., at lines 249‑273 of the config). Creating a config without an explicit vision_start_token_id entry (the default class instantiation or any checkpoint omitting the key) will raise AttributeError as soon as these helpers run during position-id construction or beam expansion. A default or propagated vision_start_token_id needs to be added to the config to keep generation paths working.
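To make the reviewer's point concrete, here is a minimal, self-contained illustration of binding the attribute in __init__ so the generation helpers never hit AttributeError. The class is a stand-in, and the particular token ids are Qwen2-VL's defaults, used here only as an assumption.

from transformers import PretrainedConfig

class ExampleOneVisionConfig(PretrainedConfig):
    """Illustrative stand-in, not the PR's Llavaonevision1_5Config."""
    model_type = "example_onevision"

    def __init__(self, image_token_id=151655, video_token_id=151656,
                 vision_start_token_id=151652, **kwargs):
        self.image_token_id = image_token_id
        self.video_token_id = video_token_id
        # Always bound, so get_rope_index / _get_image_nums_and_video_nums can read
        # config.vision_start_token_id even when a checkpoint omits the key.
        self.vision_start_token_id = vision_start_token_id
        super().__init__(**kwargs)

cfg = ExampleOneVisionConfig()      # no explicit value supplied
print(cfg.vision_start_token_id)    # 151652 instead of AttributeError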


Copilot AI (Contributor) left a comment

Pull request overview

This pull request introduces support for the LLaVAOneVision1_5 model variant, a multimodal vision-language model. The implementation includes model configuration, architecture definition, Liger kernel optimizations for efficient training, and sequence packing utilities for improved throughput.

Key changes:

  • New model architecture with Rice vision encoder and LLaVA text decoder supporting image/video inputs
  • Liger kernel integration for fused linear cross-entropy loss and optimized attention/normalization
  • Sequence packing support via custom operations for variable-length batch processing
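The sequence-packing bullet above is easiest to picture with a toy version of the unpad step: real tokens are gathered into one flat (total_tokens, hidden) tensor, and cumulative lengths (cu_seqlens) record the sequence boundaries for variable-length attention. The helper below is a simplified stand-in for the PR's _unpad_input, written only to show the data layout.

import torch
import torch.nn.functional as F

def unpad_input_sketch(hidden: torch.Tensor, attention_mask: torch.Tensor):
    # hidden: (bs, seqlen, dim); attention_mask: (bs, seqlen), 1 for real tokens, 0 for padding.
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
    packed = hidden.reshape(-1, hidden.size(-1))[indices]      # (total_tokens, dim)
    return packed, indices, cu_seqlens, int(seqlens.max())

hidden = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
packed, indices, cu_seqlens, max_len = unpad_input_sketch(hidden, mask)
print(packed.shape, cu_seqlens.tolist(), max_len)  # torch.Size([5, 8]) [0, 3, 5] 3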

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 31 comments.

Summary per file:

  • configuration_llavaonevision1_5.py: Defines configuration classes for vision (RiceConfig), text (LLaVAOneVision1_5_TextConfig), and the main model (Llavaonevision1_5Config) with support for rotary embeddings and sliding window attention
  • modeling_llavaonevision1_5.py: Implements the complete model architecture, including the Rice vision transformer, the text model, and the conditional generation class with multimodal fusion
  • monkey_patch.py: Provides Liger kernel patching functions for optimizing text model components (RoPE, RMS norm, SwiGLU MLP) and sequence packing integration
  • llava_onevision1_5_ops.py: Implements sequence packing operations for the text model, decoder layers, and flash attention with variable-length support
  • llava_onevision1_5_liger.py: Custom forward pass using fused linear cross-entropy loss for memory-efficient training with optional sequence packing
  • __init__.py files: Register the model and expose relevant classes/functions at the package level

Comments suppressed due to low confidence (3)

src/lmms_engine/models/llava_onevision1_5/llava_onevision1_5_liger.py:15

  • Except block directly handles BaseException.
except:

src/lmms_engine/models/llava_onevision1_5/llava_onevision1_5_ops.py:49

  • Except block directly handles BaseException.
except:

src/lmms_engine/models/llava_onevision1_5/monkey_patch.py:15

  • Except block directly handles BaseException.
except:


Comment on lines +140 to +146
rope=rope,
cross_entropy=cross_entropy,
fused_linear_cross_entropy=False, # Already handled at the top level
rms_norm=rms_norm,
swiglu=swiglu,
model=language_model,
use_rmpad=use_rmpad,
Copilot AI Dec 2, 2025

The indentation uses tabs instead of spaces. Python PEP 8 recommends using 4 spaces for indentation, not tabs. This is inconsistent with the rest of the codebase and can lead to mixed indentation issues.


try:
    from flash_attn.layers.rotary import apply_rotary_emb_func
except:
Copilot AI Dec 2, 2025

Bare except clause catches all exceptions including system exceptions like KeyboardInterrupt and SystemExit. This should be except ImportError: or at minimum except Exception: to avoid catching system-level exceptions that should propagate.

Suggested change
- except:
+ except ImportError:

    from liger_kernel.transformers.fused_linear_cross_entropy import (
        LigerFusedLinearCrossEntropyLoss,
    )
except:
Copilot AI Dec 2, 2025

Bare except clause catches all exceptions including system exceptions like KeyboardInterrupt and SystemExit. This should be except ImportError: or at minimum except Exception: to avoid catching system-level exceptions that should propagate.

Suggested change
- except:
+ except ImportError:

@@ -0,0 +1,276 @@
"""LLaVALLaVAOneVision1_5 model configuration"""
Copilot AI Dec 2, 2025

There's a typo in the docstring: "LLaVALLaVAOneVision1_5" should be "LLaVAOneVision1_5" (the "LLaVA" prefix is duplicated).

Suggested change
- """LLaVALLaVAOneVision1_5 model configuration"""
+ """LLaVAOneVision1_5 model configuration"""

super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)


class Llavaonevision1_5Config(PretrainedConfig):
Copilot AI Dec 2, 2025

Inconsistent naming: The class name Llavaonevision1_5Config uses lowercase "llava" while other classes in the same file use LLaVAOneVision1_5 with proper casing. This should be LLaVAOneVision1_5Config to maintain consistency with naming conventions throughout the codebase.

Comment on lines +47 to +51
try:
    from flash_attn.layers.rotary import apply_rotary_emb_func
except:
    apply_rotary_emb_func = None
    logger.warning_once("fail to load faster rotary ops, use PyTorch version by default. Please check image version")
Copilot AI Dec 2, 2025

Import of 'apply_rotary_emb_func' is not used.

Suggested change
- try:
-     from flash_attn.layers.rotary import apply_rotary_emb_func
- except:
-     apply_rotary_emb_func = None
-     logger.warning_once("fail to load faster rotary ops, use PyTorch version by default. Please check image version")

from loguru import logger
from transformers import PreTrainedModel

from lmms_engine.models.aero.monkey_patch import apply_liger_kernel_to_aero
Copilot AI Dec 2, 2025

Import of 'apply_liger_kernel_to_aero' is not used.

Suggested change
- from lmms_engine.models.aero.monkey_patch import apply_liger_kernel_to_aero


from lmms_engine.models.aero.monkey_patch import apply_liger_kernel_to_aero
from lmms_engine.models.monkey_patch import MONKEY_PATCHER
from lmms_engine.models.qwen3.monkey_patch import apply_liger_kernel_to_qwen3
Copilot AI Dec 2, 2025

Import of 'apply_liger_kernel_to_qwen3' is not used.

Suggested change
- from lmms_engine.models.qwen3.monkey_patch import apply_liger_kernel_to_qwen3

Comment on lines +15 to +17
except:
    print("Liger Kernel is not installed, pip install liger-kernel to use this patch")

Copilot AI Dec 2, 2025

Print statement may execute during import.

Suggested change
- except:
-     print("Liger Kernel is not installed, pip install liger-kernel to use this patch")
+     liger_kernel_available = True
+ except:
+     liger_kernel_available = False

    from liger_kernel.transformers.rope import liger_rotary_pos_emb
    from liger_kernel.transformers.swiglu import LigerSwiGLUMLP
except:
    print("liger kernel not installed, please install it with `pip install liger-kernel`")
Copilot AI Dec 2, 2025

Print statement may execute during import.

Suggested change
- print("liger kernel not installed, please install it with `pip install liger-kernel`")
+ raise ImportError("liger kernel not installed, please install it with `pip install liger-kernel`")

@kcz358 (Collaborator) left a comment

Most of the parts LGTM. It would be better to also add examples and a test case.

I have a small question regarding the Hugging Face auto class. I remember that LLaVA OV 1.5 ships an auto mapping on the Hugging Face Hub, so the class type returned by AutoConfig would be different from the pretrained config you defined here, and thus the patching ops would not work?

@kcz358 (Collaborator) commented Dec 10, 2025

Hi @Jinghao-Guo, could you run a lint before the final merge? Thanks!

@kcz358 merged commit 498e0dc into main on Dec 10, 2025
2 checks passed
@kcz358 deleted the dev/llava_onevision_1.5 branch on December 10, 2025 at 14:01
