LLaVAOneVision1_5 Support #101
Conversation
…LLaVA OneVision 1.5

- Introduced a new monkey patching module for LLaVAOneVision1_5 to apply Liger kernel optimizations.
- Implemented support for rotary positional embeddings, RMS normalization, cross-entropy loss, and SwiGLU activations.
- Added functionality to patch existing model instances with optimized components.
- Included version checks and warnings for transformers compatibility.
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
original_inputs = inputs_embeds
inputs_embeds, indices, cu_seq_lens, max_seqlen_in_batch = _unpad_input(inputs_embeds, attention_mask)
if get_ulysses_sequence_parallel_world_size() > 1:
    inputs_embeds = slice_input_tensor(inputs_embeds, dim=0, padding=True)
bs, seqlen = original_inputs.shape[:2]
```
model_forward crashes when cu_seq_lens supplied

`model_forward` uses `bs, seqlen = original_inputs.shape[:2]` after setting `original_inputs` only inside the two `cu_seq_lens is None` branches. When callers provide an explicit `cu_seq_lens`/`indices` (e.g., reusing already-unpadded inputs for sequence packing or Ulysses shards), neither branch runs and `original_inputs` is undefined, causing an `UnboundLocalError` before any computation. This makes the optimized path unusable for prepacked inputs until `original_inputs` is initialized or `seqlen` is derived from the provided lengths.
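A simplified sketch of one possible fix, reusing the names from the hunk above (`_unpad_input`, `cu_seq_lens`, `max_seqlen_in_batch`); the exact branch structure in the PR may differ:

```python
# Sketch: define `original_inputs` before any branching so `bs`/`seqlen`
# are always available, and derive them from the provided lengths when
# callers pass prepacked inputs.
original_inputs = inputs_embeds
if cu_seq_lens is None:
    inputs_embeds, indices, cu_seq_lens, max_seqlen_in_batch = _unpad_input(inputs_embeds, attention_mask)
    if get_ulysses_sequence_parallel_world_size() > 1:
        inputs_embeds = slice_input_tensor(inputs_embeds, dim=0, padding=True)
    bs, seqlen = original_inputs.shape[:2]
else:
    # Prepacked path: cu_seq_lens has one boundary per sequence plus a
    # leading zero, so the batch size is its length minus one.
    bs = cu_seq_lens.numel() - 1
    seqlen = int(max_seqlen_in_batch)
```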
```python
image_token_id = self.config.image_token_id
video_token_id = self.config.video_token_id
vision_start_token_id = self.config.vision_start_token_id

vision_start_mask = input_ids == vision_start_token_id
```
Missing vision_start_token_id in config breaks generation utilities

Both `get_rope_index` and `_get_image_nums_and_video_nums` rely on `self.config.vision_start_token_id`, but `Llavaonevision1_5Config.__init__` never defines that attribute (it sets only `image_token_id`, `video_token_id`, etc., at lines 249-273 of the config). Creating a config without an explicit `vision_start_token_id` entry (the default class instantiation or any checkpoint omitting the key) will raise `AttributeError` as soon as these helpers run during position-id construction or beam expansion. A default or propagated `vision_start_token_id` needs to be added to the config to keep the generation paths working.
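A minimal sketch of the suggested config change, assuming the attribute is simply propagated from kwargs; `None` is only a placeholder default, and the real value would have to match the released checkpoints:

```python
# Hypothetical excerpt of Llavaonevision1_5Config.__init__: the point is
# that the attribute is defined unconditionally, so get_rope_index and
# _get_image_nums_and_video_nums never hit an AttributeError.
class Llavaonevision1_5Config(PretrainedConfig):
    def __init__(
        self,
        image_token_id=None,
        video_token_id=None,
        vision_start_token_id=None,  # placeholder default; must match the tokenizer
        **kwargs,
    ):
        self.image_token_id = image_token_id
        self.video_token_id = video_token_id
        self.vision_start_token_id = vision_start_token_id
        super().__init__(**kwargs)
```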
Pull request overview
This pull request introduces support for the LLaVAOneVision1_5 model variant, a multimodal vision-language model. The implementation includes model configuration, architecture definition, Liger kernel optimizations for efficient training, and sequence packing utilities for improved throughput.
Key changes:
- New model architecture with Rice vision encoder and LLaVA text decoder supporting image/video inputs
- Liger kernel integration for fused linear cross-entropy loss and optimized attention/normalization (see the sketch after this list)
- Sequence packing support via custom operations for variable-length batch processing
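For context on the fused loss: a minimal sketch, assuming the liger-kernel API where `LigerFusedLinearCrossEntropyLoss.forward` takes the `lm_head` weight, the hidden states, and the labels. The dimensions below are placeholders, and a CUDA device is required for the Triton kernels:

```python
import torch
from liger_kernel.transformers.fused_linear_cross_entropy import (
    LigerFusedLinearCrossEntropyLoss,
)

# The fused loss consumes hidden states and the lm_head weight directly,
# so the full (num_tokens x vocab_size) logits tensor is never materialized.
loss_fn = LigerFusedLinearCrossEntropyLoss()

hidden = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16, requires_grad=True)
lm_head_weight = torch.randn(151936, 4096, device="cuda", dtype=torch.bfloat16)
labels = torch.randint(0, 151936, (16,), device="cuda")

loss = loss_fn(lm_head_weight, hidden, labels)
loss.backward()
```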
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 31 comments.
Summary per file:

| File | Description |
|---|---|
| `configuration_llavaonevision1_5.py` | Defines configuration classes for vision (`RiceConfig`), text (`LLaVAOneVision1_5_TextConfig`), and the main model (`Llavaonevision1_5Config`), with support for rotary embeddings and sliding window attention |
| `modeling_llavaonevision1_5.py` | Implements the complete model architecture, including the Rice vision transformer, the text model, and the conditional generation class with multimodal fusion |
| `monkey_patch.py` | Provides Liger kernel patching functions for optimizing text model components (RoPE, RMS norm, SwiGLU MLP) and sequence packing integration |
| `llava_onevision1_5_ops.py` | Implements sequence packing operations for the text model, decoder layers, and flash attention with variable-length support |
| `llava_onevision1_5_liger.py` | Custom forward pass using fused linear cross-entropy loss for memory-efficient training, with optional sequence packing |
| `__init__.py` files | Register the model and expose relevant classes/functions at the package level |
Comments suppressed due to low confidence (3):

- `src/lmms_engine/models/llava_onevision1_5/llava_onevision1_5_liger.py:15`: Except block directly handles BaseException (`except:`)
- `src/lmms_engine/models/llava_onevision1_5/llava_onevision1_5_ops.py:49`: Except block directly handles BaseException (`except:`)
- `src/lmms_engine/models/llava_onevision1_5/monkey_patch.py:15`: Except block directly handles BaseException (`except:`)
```python
rope=rope,
cross_entropy=cross_entropy,
fused_linear_cross_entropy=False,  # Already handled at the top level
rms_norm=rms_norm,
swiglu=swiglu,
model=language_model,
use_rmpad=use_rmpad,
```
Copilot AI · Dec 2, 2025
The indentation uses tabs instead of spaces. Python PEP 8 recommends using 4 spaces for indentation, not tabs. This is inconsistent with the rest of the codebase and can lead to mixed indentation issues.
```python
try:
    from flash_attn.layers.rotary import apply_rotary_emb_func
except:
```
Copilot AI · Dec 2, 2025
Bare except clause catches all exceptions, including system exceptions like `KeyboardInterrupt` and `SystemExit`. This should be `except ImportError:` or at minimum `except Exception:` to avoid catching system-level exceptions that should propagate.
Suggested change:

```diff
- except:
+ except ImportError:
```
```python
    from liger_kernel.transformers.fused_linear_cross_entropy import (
        LigerFusedLinearCrossEntropyLoss,
    )
except:
```
Copilot AI · Dec 2, 2025
Bare except clause catches all exceptions, including system exceptions like `KeyboardInterrupt` and `SystemExit`. This should be `except ImportError:` or at minimum `except Exception:` to avoid catching system-level exceptions that should propagate.
Suggested change:

```diff
- except:
+ except ImportError:
```
```python
"""LLaVALLaVAOneVision1_5 model configuration"""
```
Copilot AI · Dec 2, 2025
There's a typo in the docstring: "LLaVALLaVAOneVision1_5" should be "LLaVAOneVision1_5" (the "LLaVA" prefix is duplicated).
| """LLaVALLaVAOneVision1_5 model configuration""" | |
| """LLaVAOneVision1_5 model configuration""" |
```python
super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)


class Llavaonevision1_5Config(PretrainedConfig):
```
Copilot AI · Dec 2, 2025
Inconsistent naming: the class name `Llavaonevision1_5Config` uses lowercase "llava" while other classes in the same file use `LLaVAOneVision1_5` with proper casing. This should be `LLaVAOneVision1_5Config` to maintain consistency with naming conventions throughout the codebase.
```python
try:
    from flash_attn.layers.rotary import apply_rotary_emb_func
except:
    apply_rotary_emb_func = None
    logger.warning_once("fail to load faster rotary ops, use PyTorch version by default. Please check image version")
```
Copilot AI · Dec 2, 2025
Import of 'apply_rotary_emb_func' is not used.
Suggested change:

```diff
- try:
-     from flash_attn.layers.rotary import apply_rotary_emb_func
- except:
-     apply_rotary_emb_func = None
-     logger.warning_once("fail to load faster rotary ops, use PyTorch version by default. Please check image version")
```
```python
from loguru import logger
from transformers import PreTrainedModel

from lmms_engine.models.aero.monkey_patch import apply_liger_kernel_to_aero
```
Copilot AI · Dec 2, 2025
Import of 'apply_liger_kernel_to_aero' is not used.
Suggested change:

```diff
- from lmms_engine.models.aero.monkey_patch import apply_liger_kernel_to_aero
```
```python
from lmms_engine.models.aero.monkey_patch import apply_liger_kernel_to_aero
from lmms_engine.models.monkey_patch import MONKEY_PATCHER
from lmms_engine.models.qwen3.monkey_patch import apply_liger_kernel_to_qwen3
```
Copilot AI · Dec 2, 2025
Import of 'apply_liger_kernel_to_qwen3' is not used.
Suggested change:

```diff
- from lmms_engine.models.qwen3.monkey_patch import apply_liger_kernel_to_qwen3
```
```python
except:
    print("Liger Kernel is not installed, pip install liger-kernel to use this patch")
```
Copilot AI · Dec 2, 2025
Print statement may execute during import.
Suggested change:

```diff
- except:
-     print("Liger Kernel is not installed, pip install liger-kernel to use this patch")
+     liger_kernel_available = True
+ except:
+     liger_kernel_available = False
```
```python
    from liger_kernel.transformers.rope import liger_rotary_pos_emb
    from liger_kernel.transformers.swiglu import LigerSwiGLUMLP
except:
    print("liger kernel not installed, please install it with `pip install liger-kernel`")
```
Copilot AI · Dec 2, 2025
Print statement may execute during import.
| print("liger kernel not installed, please install it with `pip install liger-kernel`") | |
| raise ImportError("liger kernel not installed, please install it with `pip install liger-kernel`") |
kcz358 left a comment:
Most of this LGTM. It would be better to also add examples and a test case.

I have a small question regarding the Hugging Face auto classes. As I recall, LLaVA OV 1.5 ships an auto mapping on the Hugging Face Hub, so the class type returned by `AutoConfig` would differ from the pretrained config you defined here. Would that not prevent the patching ops from working?
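For reference, one way this is sometimes reconciled (a sketch, not necessarily what this PR does; the `model_type` string and import path are assumptions that would need to match the Hub checkpoint and this repo's layout):

```python
from transformers import AutoConfig

from lmms_engine.models.llava_onevision1_5.configuration_llavaonevision1_5 import (
    Llavaonevision1_5Config,
)

# Registering the local config under the checkpoint's model_type makes
# AutoConfig.from_pretrained(...) resolve to this class instead of the
# remote-code class from the Hub's auto_map, so isinstance checks in the
# patching ops see the expected type.
AutoConfig.register("llava_onevision1_5", Llavaonevision1_5Config)
```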
Hi @Jinghao-Guo, can you run a lint before the final merge? Thanks!
This pull request introduces support for the new `LLaVAOneVision1_5` model variant in the codebase, including its configuration, registration, and integration with the Liger kernel for efficient training and inference. The main changes are grouped below.

New Model Integration

- Added the `LLaVAOneVision1_5` model and configuration classes (`Llavaonevision1_5Config`, `LLaVAOneVision1_5_ForConditionalGeneration`, and the related kernel patch) to the codebase, enabling use of this model for conditional generation tasks. [1] [2] [3] [4]

API and Import Updates

- Updated the `__init__.py` files to expose the new model, its configuration, and the kernel patch functions at the package level, making them available for import and use. [1] [2] [3]

Liger Kernel Integration

- Added a custom forward pass for `LLaVAOneVision1_5_ForConditionalGeneration` using the Liger fused linear cross-entropy loss for efficient training, with support for sequence packing and inference modes (a usage sketch follows below).
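A hypothetical usage sketch: the function name follows the pattern of the existing `apply_liger_kernel_to_aero` / `apply_liger_kernel_to_qwen3` helpers and the keyword arguments visible in the review hunks above, but the actual signature in the module may differ:

```python
from lmms_engine.models.llava_onevision1_5.monkey_patch import (
    apply_liger_kernel_to_llava_onevision1_5,  # hypothetical name
)

# Patch the text model components in place before training.
apply_liger_kernel_to_llava_onevision1_5(
    rope=True,                        # Liger rotary position embeddings
    rms_norm=True,                    # fused RMSNorm
    swiglu=True,                      # LigerSwiGLUMLP
    fused_linear_cross_entropy=True,  # avoid materializing the full logits
    use_rmpad=True,                   # sequence packing / unpadded attention
)
```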
Model Configuration

- Added configuration classes for `LLaVAOneVision1_5`, supporting advanced features like rotary position embeddings, sliding window attention, and flexible backbone initialization.

These changes collectively enable the use of the new `LLaVAOneVision1_5` model variant, ensure it is properly registered and exposed, and optimize its training and inference performance with Liger kernel support.

Motivation
Modifications
Commit Message Convention
Please follow our standardized commit message format:
- `[feat]` - New features or functionality
- `[fix]` - Bug fixes
- `[docs]` - Documentation changes only
- `[style]` - Code style changes (formatting, missing semicolons, etc.)
- `[refactor]` - Code refactoring without changing functionality
- `[perf]` - Performance improvements
- `[test]` - Adding or updating tests
- `[chore]` - Maintenance tasks, dependency updates, etc.
- `[ci]` - CI/CD configuration changes
Examples:

- `[feat] add qwen omni iterable dataset support`
- `[fix] resolve bagel model configuration error`
- `[docs] update training guide with YAML examples`

See CONTRIBUTING.md for more details.
CI/CD Checks
Your PR will automatically run the following checks:
- Code formatting with `black` (line-length=120) and import sorting with `isort`
- Run `pre-commit run --all-files` locally to verify before pushing

Checklist
- [ ] Run `pre-commit run --all-files` and ensure all checks pass
- [ ] Code formatted with `black` (line-length=120) and `isort`