
Conversation

@Jinghao-Guo (Collaborator)

This pull request introduces support for the new LLaVAOneVision1_5 model variant in the codebase, including its configuration, registration, and integration with the Liger kernel for efficient training and inference. The main changes are grouped below.

New Model Integration

  • Added LLaVAOneVision1_5 model and configuration classes (Llavaonevision1_5Config, LLaVAOneVision1_5_ForConditionalGeneration, and related kernel patch) to the codebase, enabling use of this model for conditional generation tasks.
  • Registered the new model with the internal mapping function to ensure it is discoverable and usable through the standard model loading APIs.
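The PR wires the model into the repo's own internal mapping function; for readers unfamiliar with that mechanism, the sketch below shows the analogous Transformers Auto-class registration pattern, which achieves the same discoverability. The model_type string "llavaonevision1_5" and the package import path are assumptions for illustration, not verified against the merged code.

# Illustrative only: the repo uses its internal mapping function, but the effect
# is comparable to registering the classes with the Transformers Auto APIs.
from transformers import AutoConfig, AutoModelForVision2Seq

from lmms_engine.models.llava_onevision1_5 import (  # assumed package-level exports
    Llavaonevision1_5Config,
    LLaVAOneVision1_5_ForConditionalGeneration,
)

# "llavaonevision1_5" must match the config class's model_type; assumed here.
AutoConfig.register("llavaonevision1_5", Llavaonevision1_5Config)
AutoModelForVision2Seq.register(Llavaonevision1_5Config, LLaVAOneVision1_5_ForConditionalGeneration)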

API and Import Updates

  • Updated the __init__.py files to expose the new model, its configuration, and kernel patch functions at the package level, making them available for import and use.

Liger Kernel Integration

  • Implemented a custom forward method for LLaVAOneVision1_5_ForConditionalGeneration using the Liger fused linear cross-entropy loss for efficient training, with support for sequence packing and inference modes.
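As a rough illustration of the memory saving this bullet describes, the sketch below computes a causal-LM loss with LigerFusedLinearCrossEntropyLoss directly from hidden states and the lm_head weight, so the full (batch * seq_len, vocab_size) logits tensor is never materialized. This is a minimal standalone example, not the PR's forward method; the tensor names and toy shapes are assumptions.

import torch
from liger_kernel.transformers.fused_linear_cross_entropy import (
    LigerFusedLinearCrossEntropyLoss,
)

def fused_lm_loss(lm_head: torch.nn.Linear, hidden_states: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Standard causal shift: tokens < n predict token n.
    shift_hidden = hidden_states[..., :-1, :].contiguous().view(-1, hidden_states.size(-1))
    shift_labels = labels[..., 1:].contiguous().view(-1)
    # The fused kernel consumes the lm_head weight and hidden states directly,
    # avoiding the intermediate logits tensor.
    loss_fct = LigerFusedLinearCrossEntropyLoss()
    return loss_fct(lm_head.weight, shift_hidden, shift_labels)

# Toy usage: batch 2, sequence length 8, hidden size 64, vocab 128.
device = "cuda"  # Liger's Triton kernels require a GPU
lm_head = torch.nn.Linear(64, 128, bias=False, device=device)
hidden = torch.randn(2, 8, 64, device=device, requires_grad=True)
labels = torch.randint(0, 128, (2, 8), device=device)
print(fused_lm_loss(lm_head, hidden, labels))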

Model Configuration

  • Defined detailed configuration classes for both the vision and text backbones of LLaVAOneVision1_5, supporting advanced features like rotary position embeddings, sliding window attention, and flexible backbone initialization.
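A minimal sketch of what instantiating the composite config might look like, assuming the PR follows the usual Transformers pattern of nesting the backbones under vision_config / text_config; those attribute names and the import path are assumptions, not confirmed by the diff shown in this thread.

# Assumed usage sketch; vision_config / text_config follow the common
# composite-config convention and may differ in the actual implementation.
from lmms_engine.models.llava_onevision1_5 import Llavaonevision1_5Config

config = Llavaonevision1_5Config()               # default vision + text backbones
print(type(config.vision_config).__name__)       # expected: RiceConfig
print(type(config.text_config).__name__)         # expected: LLaVAOneVision1_5_TextConfig
print(config.image_token_id, config.video_token_id)  # multimodal special-token ids set by __init__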

These changes collectively enable the use of the new LLaVAOneVision1_5 model variant, ensure it is properly registered and exposed, and optimize its training and inference performance with Liger kernel support.

Motivation

Modifications

Commit Message Convention

Please follow our standardized commit message format:

  • [feat] - New features or functionality
  • [fix] - Bug fixes
  • [docs] - Documentation changes only
  • [style] - Code style changes (formatting, missing semicolons, etc.)
  • [refactor] - Code refactoring without changing functionality
  • [perf] - Performance improvements
  • [test] - Adding or updating tests
  • [chore] - Maintenance tasks, dependency updates, etc.
  • [ci] - CI/CD configuration changes

Examples:

  • [feat] add qwen omni iterable dataset support
  • [fix] resolve bagel model configuration error
  • [docs] update training guide with YAML examples

See CONTRIBUTING.md for more details.

CI/CD Checks

Your PR will automatically run the following checks:

  • Linting: Code formatting with black (line-length=120) and import sorting with isort
  • Run pre-commit run --all-files locally to verify before pushing

Checklist

  • Follow commit message convention (see above)
  • Run pre-commit run --all-files and ensure all checks pass
  • Format your code with black (line-length=120) and isort
  • Add unit tests for new functionality
  • Update documentation as needed, including docstrings or example tutorials
  • Ensure all CI/CD checks pass

- Introduced a new monkey patching module for LLaVAOneVision1_5 to apply Liger kernel optimizations.
- Implemented support for rotary positional embeddings, RMS normalization, cross-entropy loss, and SwiGLU activations.
- Added functionality to patch existing model instances with optimized components.
- Included version checks and warnings for transformers compatibility.
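For orientation, the bullets above follow the same pattern as the repo's other apply_liger_kernel_to_* helpers: swap module-level references in the modeling file for Liger equivalents. The sketch below is a generic version of that pattern, assuming hypothetical attribute names on modeling_llavaonevision1_5; it is not the PR's monkey_patch.py.

# Hedged sketch of the monkey-patch pattern. The attribute names on the modeling
# module (apply_rotary_pos_emb, ...RMSNorm, ...MLP) are assumptions chosen for
# illustration; the real patch targets whatever the modeling file defines.
from liger_kernel.transformers.rms_norm import LigerRMSNorm
from liger_kernel.transformers.rope import liger_rotary_pos_emb
from liger_kernel.transformers.swiglu import LigerSwiGLUMLP

def apply_liger_kernel_to_llava_onevision1_5(rope: bool = True, rms_norm: bool = True, swiglu: bool = True) -> None:
    # Imported lazily so the patch is applied against the live module object.
    from lmms_engine.models.llava_onevision1_5 import modeling_llavaonevision1_5 as modeling

    if rope:
        modeling.apply_rotary_pos_emb = liger_rotary_pos_emb   # fused RoPE
    if rms_norm:
        modeling.LLaVAOneVision1_5RMSNorm = LigerRMSNorm       # fused RMSNorm
    if swiglu:
        modeling.LLaVAOneVision1_5MLP = LigerSwiGLUMLP         # fused SwiGLU MLP

Class-level swaps like these only affect modules constructed after the patch; patching an already-built model instance (the third bullet) would instead walk model.modules() and replace the norm/MLP submodules in place.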
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +86 to +90
original_inputs = inputs_embeds
inputs_embeds, indices, cu_seq_lens, max_seqlen_in_batch = _unpad_input(inputs_embeds, attention_mask)
if get_ulysses_sequence_parallel_world_size() > 1:
    inputs_embeds = slice_input_tensor(inputs_embeds, dim=0, padding=True)
bs, seqlen = original_inputs.shape[:2]


P1: model_forward crashes when cu_seq_lens is supplied

model_forward uses bs, seqlen = original_inputs.shape[:2], but original_inputs is only assigned inside the two branches guarded by cu_seq_lens is None. When callers provide an explicit cu_seq_lens/indices (e.g., reusing already-unpadded inputs for sequence packing or Ulysses shards), neither branch runs and original_inputs is undefined, causing an UnboundLocalError before any computation. This makes the optimized path unusable for pre-packed inputs until original_inputs is initialized unconditionally or seqlen is derived from the provided lengths.
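A self-contained sketch of the idea behind one possible fix: derive the batch geometry from whichever inputs are actually present, so nothing depends on a variable assigned only in the padded branch. It is illustrative, not the PR's code.

import torch
from typing import Optional

def derive_batch_geometry(inputs_embeds: torch.Tensor, cu_seq_lens: Optional[torch.Tensor], max_seqlen: Optional[int]):
    """Return (bs, seqlen) whether inputs arrive padded (bs, seqlen, dim) or pre-packed (total_tokens, dim)."""
    if cu_seq_lens is None:
        # Padded path: geometry comes straight from the padded tensor.
        bs, seqlen = inputs_embeds.shape[:2]
    else:
        # Pre-packed path: geometry comes from the cumulative lengths instead,
        # so no name is bound in only one branch.
        bs = int(cu_seq_lens.numel()) - 1
        seqlen = int(max_seqlen) if max_seqlen is not None else int((cu_seq_lens[1:] - cu_seq_lens[:-1]).max())
    return bs, seqlen

# Packed example: three sequences of lengths 4, 2, 5.
packed = torch.randn(11, 32)
cu = torch.tensor([0, 4, 6, 11])
print(derive_batch_geometry(packed, cu, None))  # (3, 5)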


Comment on lines +1980 to +1984
image_token_id = self.config.image_token_id
video_token_id = self.config.video_token_id
vision_start_token_id = self.config.vision_start_token_id

vision_start_mask = input_ids == vision_start_token_id


P1: Missing vision_start_token_id in config breaks generation utilities

Both get_rope_index and _get_image_nums_and_video_nums rely on self.config.vision_start_token_id, but Llavaonevision1_5Config.__init__ never defines that attribute (it sets only image_token_id, video_token_id, etc., at lines 249‑273 of the config). Creating a config without an explicit vision_start_token_id entry (the default class instantiation or any checkpoint omitting the key) will raise AttributeError as soon as these helpers run during position-id construction or beam expansion. A default or propagated vision_start_token_id needs to be added to the config to keep generation paths working.
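To make the reviewer's point concrete, here is a minimal, self-contained illustration of binding the attribute in __init__ so the generation helpers never hit AttributeError. The class is a stand-in, and the particular token ids are Qwen2-VL's defaults, used here only as an assumption.

from transformers import PretrainedConfig

class ExampleOneVisionConfig(PretrainedConfig):
    """Illustrative stand-in, not the PR's Llavaonevision1_5Config."""
    model_type = "example_onevision"

    def __init__(self, image_token_id=151655, video_token_id=151656,
                 vision_start_token_id=151652, **kwargs):
        self.image_token_id = image_token_id
        self.video_token_id = video_token_id
        # Always bound, so get_rope_index / _get_image_nums_and_video_nums can read
        # config.vision_start_token_id even when a checkpoint omits the key.
        self.vision_start_token_id = vision_start_token_id
        super().__init__(**kwargs)

cfg = ExampleOneVisionConfig()      # no explicit value supplied
print(cfg.vision_start_token_id)    # 151652 instead of AttributeError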


Copilot AI (Contributor) left a comment

Pull request overview

This pull request introduces support for the LLaVAOneVision1_5 model variant, a multimodal vision-language model. The implementation includes model configuration, architecture definition, Liger kernel optimizations for efficient training, and sequence packing utilities for improved throughput.

Key changes:

  • New model architecture with Rice vision encoder and LLaVA text decoder supporting image/video inputs
  • Liger kernel integration for fused linear cross-entropy loss and optimized attention/normalization
  • Sequence packing support via custom operations for variable-length batch processing
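The sequence-packing bullet above is easiest to picture with a toy version of the unpad step: real tokens are gathered into one flat (total_tokens, hidden) tensor, and cumulative lengths (cu_seqlens) record the sequence boundaries for variable-length attention. The helper below is a simplified stand-in for the PR's _unpad_input, written only to show the data layout.

import torch
import torch.nn.functional as F

def unpad_input_sketch(hidden: torch.Tensor, attention_mask: torch.Tensor):
    # hidden: (bs, seqlen, dim); attention_mask: (bs, seqlen), 1 for real tokens, 0 for padding.
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
    packed = hidden.reshape(-1, hidden.size(-1))[indices]      # (total_tokens, dim)
    return packed, indices, cu_seqlens, int(seqlens.max())

hidden = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
packed, indices, cu_seqlens, max_len = unpad_input_sketch(hidden, mask)
print(packed.shape, cu_seqlens.tolist(), max_len)  # torch.Size([5, 8]) [0, 3, 5] 3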

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 31 comments.

Summary per file:

  • configuration_llavaonevision1_5.py: Defines configuration classes for vision (RiceConfig), text (LLaVAOneVision1_5_TextConfig), and the main model (Llavaonevision1_5Config) with support for rotary embeddings and sliding window attention
  • modeling_llavaonevision1_5.py: Implements the complete model architecture, including the Rice vision transformer, the text model, and the conditional generation class with multimodal fusion
  • monkey_patch.py: Provides Liger kernel patching functions for optimizing text model components (RoPE, RMS norm, SwiGLU MLP) and sequence packing integration
  • llava_onevision1_5_ops.py: Implements sequence packing operations for the text model, decoder layers, and flash attention with variable-length support
  • llava_onevision1_5_liger.py: Custom forward pass using fused linear cross-entropy loss for memory-efficient training with optional sequence packing
  • __init__.py files: Register the model and expose relevant classes/functions at the package level

Comments suppressed due to low confidence (3)

src/lmms_engine/models/llava_onevision1_5/llava_onevision1_5_liger.py:15

  • Except block directly handles BaseException.
except:

src/lmms_engine/models/llava_onevision1_5/llava_onevision1_5_ops.py:49

  • Except block directly handles BaseException.
except:

src/lmms_engine/models/llava_onevision1_5/monkey_patch.py:15

  • Except block directly handles BaseException.
except:


Comment on lines +140 to +146
rope=rope,
cross_entropy=cross_entropy,
fused_linear_cross_entropy=False, # Already handled at the top level
rms_norm=rms_norm,
swiglu=swiglu,
model=language_model,
use_rmpad=use_rmpad,
Copilot AI Dec 2, 2025

The indentation uses tabs instead of spaces. Python PEP 8 recommends using 4 spaces for indentation, not tabs. This is inconsistent with the rest of the codebase and can lead to mixed indentation issues.


try:
    from flash_attn.layers.rotary import apply_rotary_emb_func
except:
Copilot AI Dec 2, 2025

Bare except clause catches all exceptions including system exceptions like KeyboardInterrupt and SystemExit. This should be except ImportError: or at minimum except Exception: to avoid catching system-level exceptions that should propagate.

Suggested change
- except:
+ except ImportError:

    from liger_kernel.transformers.fused_linear_cross_entropy import (
        LigerFusedLinearCrossEntropyLoss,
    )
except:
Copilot AI Dec 2, 2025

Bare except clause catches all exceptions including system exceptions like KeyboardInterrupt and SystemExit. This should be except ImportError: or at minimum except Exception: to avoid catching system-level exceptions that should propagate.

Suggested change
- except:
+ except ImportError:

@@ -0,0 +1,276 @@
"""LLaVALLaVAOneVision1_5 model configuration"""
Copilot AI Dec 2, 2025

There's a typo in the docstring: "LLaVALLaVAOneVision1_5" should be "LLaVAOneVision1_5" (the "LLaVA" prefix is duplicated).

Suggested change
- """LLaVALLaVAOneVision1_5 model configuration"""
+ """LLaVAOneVision1_5 model configuration"""

super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)


class Llavaonevision1_5Config(PretrainedConfig):
Copilot AI Dec 2, 2025

Inconsistent naming: The class name Llavaonevision1_5Config uses lowercase "llava" while other classes in the same file use LLaVAOneVision1_5 with proper casing. This should be LLaVAOneVision1_5Config to maintain consistency with naming conventions throughout the codebase.

Comment on lines +47 to +51
try:
    from flash_attn.layers.rotary import apply_rotary_emb_func
except:
    apply_rotary_emb_func = None
    logger.warning_once("fail to load faster rotary ops, use PyTorch version by default. Please check image version")
Copilot AI Dec 2, 2025

Import of 'apply_rotary_emb_func' is not used.

Suggested change
- try:
-     from flash_attn.layers.rotary import apply_rotary_emb_func
- except:
-     apply_rotary_emb_func = None
-     logger.warning_once("fail to load faster rotary ops, use PyTorch version by default. Please check image version")

from loguru import logger
from transformers import PreTrainedModel

from lmms_engine.models.aero.monkey_patch import apply_liger_kernel_to_aero
Copilot AI Dec 2, 2025

Import of 'apply_liger_kernel_to_aero' is not used.

Suggested change
- from lmms_engine.models.aero.monkey_patch import apply_liger_kernel_to_aero


from lmms_engine.models.aero.monkey_patch import apply_liger_kernel_to_aero
from lmms_engine.models.monkey_patch import MONKEY_PATCHER
from lmms_engine.models.qwen3.monkey_patch import apply_liger_kernel_to_qwen3
Copilot AI Dec 2, 2025

Import of 'apply_liger_kernel_to_qwen3' is not used.

Suggested change
- from lmms_engine.models.qwen3.monkey_patch import apply_liger_kernel_to_qwen3

Comment on lines +15 to +17
except:
    print("Liger Kernel is not installed, pip install liger-kernel to use this patch")

Copilot AI Dec 2, 2025

Print statement may execute during import.

Suggested change
- except:
-     print("Liger Kernel is not installed, pip install liger-kernel to use this patch")
+     liger_kernel_available = True
+ except:
+     liger_kernel_available = False

    from liger_kernel.transformers.rope import liger_rotary_pos_emb
    from liger_kernel.transformers.swiglu import LigerSwiGLUMLP
except:
    print("liger kernel not installed, please install it with `pip install liger-kernel`")
Copilot AI Dec 2, 2025

Print statement may execute during import.

Suggested change
- print("liger kernel not installed, please install it with `pip install liger-kernel`")
+ raise ImportError("liger kernel not installed, please install it with `pip install liger-kernel`")

@kcz358 (Collaborator) left a comment

Most of the parts LGTM. It would be better to also add examples and a test case.

I have a small question regarding the Hugging Face auto class. I remember that LLaVA OV 1.5 ships an auto mapping on the Hugging Face Hub, so the class type returned by AutoConfig would be different from the pretrained config you defined here, and thus the patching ops would not work?

@kcz358 (Collaborator) commented Dec 10, 2025

Hi @Jinghao-Guo, could you run a lint before the final merge? Thanks!

@kcz358 merged commit 498e0dc into main on Dec 10, 2025
2 checks passed
@kcz358 deleted the dev/llava_onevision_1.5 branch on December 10, 2025 at 14:01
