
Conversation

pctablet505
Collaborator

Fix for failing TAP presubmit tests on the TensorFlow backend

This pull request addresses integer dtype compatibility issues across backends (TensorFlow, JAX, PyTorch) by explicitly setting the dtype to "int32" when creating position ID weights and casting relative positions. This ensures consistent behavior and prevents failures, particularly with the TensorFlow backend.

Cross-backend dtype compatibility improvements:

  • Changed the dtype argument from int to "int32" in the add_weight calls for position_ids in both clip_layers.py and siglip_layers.py, resolving failures with TensorFlow and ensuring compatibility across all supported backends. [1] [2] [3]
  • Added an explicit cast to "int32" for rel_pos in the _make_log_bucket_position method of disentangled_self_attention.py to standardize dtype handling.
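The distinction the description draws between `int` and "int32" can be illustrated outside any framework: the Python `int` alias resolves to a platform- and backend-dependent integer width, while the string "int32" pins the dtype explicitly. A minimal NumPy sketch (NumPy stands in for the Keras backend here; the actual fix touches `add_weight` calls in keras-hub):

```python
import numpy as np

# `dtype=int` resolves to the platform default integer width
# (int64 on most 64-bit Linux builds, but not guaranteed everywhere).
implicit = np.arange(4, dtype=int)

# `dtype="int32"` pins the width explicitly, so every platform and
# backend materializes the same tensor type.
explicit = np.arange(4, dtype="int32")

print(explicit.dtype)  # int32 regardless of platform
```

The same reasoning applies to the Keras weights: a string dtype removes any room for backend-specific interpretation.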

pctablet505 and others added 28 commits April 17, 2025 10:26
Added checks for invalid inputs
Added tests to check invalid inputs
Fix for model not loading when using numpy behaviour with tensorflow
Updated the convert_tokenizer function to support both string and list formats for merges in the tokenizer config, improving compatibility with different tokenizer export formats.
Removed debug print statements and redundant merge format handling logic. Now only converts list-of-lists merge format to space-separated strings, streamlining the tokenizer conversion process.
Changed the dtype of position_ids from int to 'int64' in the CLIPVisionEmbedding layer to ensure correct device placement, particularly for TensorFlow backends.
Added explicit casting of rel_pos and abs_pos to int32 in the _make_log_bucket_position method to ensure correct data types for subsequent operations.
Corrects dtype casting in _make_log_bucket_position and _get_rel_pos methods to ensure consistency and prevent potential type errors. Uses the input dtype for bucket_pos and explicitly casts ids to int32.
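For context on the commits above, the DeBERTa-style log-bucket mapping that `_make_log_bucket_position` computes can be sketched in NumPy as follows. This mirrors the standard DeBERTa reference formulation with the explicit "int32" cast the commits describe; the real implementation uses `keras.ops` and lives in disentangled_self_attention.py, so treat this as an illustrative sketch, not the project's code:

```python
import numpy as np

def make_log_bucket_position(rel_pos, bucket_size, max_position):
    """Map relative positions to log-spaced buckets (DeBERTa-style)."""
    sign = np.sign(rel_pos)
    mid = bucket_size // 2
    # Positions inside (-mid, mid) keep their exact value; clamp the
    # absolute position so the log below stays well defined.
    abs_pos = np.where(
        (rel_pos < mid) & (rel_pos > -mid), mid - 1, np.abs(rel_pos)
    )
    log_pos = (
        np.ceil(
            np.log(abs_pos / mid)
            / np.log((max_position - 1) / mid)
            * (mid - 1)
        )
        + mid
    )
    bucket_pos = np.where(abs_pos <= mid, rel_pos, log_pos * sign)
    # The explicit cast mirrors the fix: return "int32" rather than
    # whatever float dtype the log arithmetic produced.
    return bucket_pos.astype("int32")

print(make_log_bucket_position(np.array([2, 10]), 8, 64))  # → [2 5]
```

Small offsets pass through unchanged while distant ones are compressed logarithmically, which is why the intermediate float math needs the final integer cast.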

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @pctablet505, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical data type compatibility issues that were causing tests to fail, particularly with the TensorFlow backend. By standardizing integer data types to int32 for position ID tensors and related operations across various layers, the changes ensure consistent behavior and prevent future failures across different deep learning backends.

Highlights

  • Standardized Integer Dtypes: The data type for position_ids weights in clip_layers.py and siglip_layers.py has been explicitly set to "int32" to resolve TensorFlow backend failures and ensure cross-backend compatibility.
  • Consistent Relative Position Casting: The rel_pos tensor in the _make_log_bucket_position method within disentangled_self_attention.py is now explicitly cast to "int32" to maintain consistent data type handling.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses the dtype compatibility issues that were causing test failures on the TensorFlow backend. By explicitly setting integer dtypes to "int32", you've ensured consistent behavior across TensorFlow, JAX, and PyTorch, which aligns with the project's goal of being backend-agnostic.

The changes in clip_layers.py, siglip_layers.py, and deberta_v3/disentangled_self_attention.py are all sound. I've left a few suggestions to improve the comments in clip_layers.py and siglip_layers.py for better long-term maintainability, as the current versions are either a bit informal or potentially confusing.

Overall, this is a good fix that improves the robustness of the models across different backends.

Comment on lines 197 to 200

- # Let the backend determine the int32 dtype. For example, tf
- # requires int64 for correct device placement, whereas jax and torch
- # don't.
- dtype=int,
+ dtype="int32",

medium

Similar to the change above, this comment is confusing. It says "Let the backend determine the int32 dtype", but the code now explicitly sets dtype="int32" to ensure consistency across backends. Updating the comment to reflect this would make the code easier to understand and align with the project's style of writing backend-agnostic code.¹

Suggested change

- # Let the backend determine the int32 dtype. For example, tf
- # requires int64 for correct device placement, whereas jax and torch
- # don't.
- dtype=int,
+ # Explicitly use "int32" for backend compatibility. Using `int` can
+ # lead to dtype mismatches between backends (e.g., int64 on
+ # TensorFlow), while "int32" is consistent across all backends.
+ dtype="int32",

Style Guide References

Footnotes

  1. All code must be backend-agnostic, supporting TensorFlow, JAX, and PyTorch. This change enforces that by using a specific dtype instead of letting the backend decide. (link)

pctablet505 and others added 2 commits September 18, 2025 13:55
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Collaborator

@JyotinderSingh JyotinderSingh left a comment


The dtype issue is coming up due to a bug that was recently introduced in Keras. We should fix it at the core library level instead of modifying the keras-hub models.

I've reported the issue for tracking here: keras-team/keras#21677

  sign = ops.sign(rel_pos)
- mid = self.bucket_size // 2
- mid = ops.cast(mid, dtype=dtype)
+ mid = ops.cast(self.bucket_size // 2, dtype=dtype)
Collaborator


This change doesn't seem to do anything; we can omit it.

Comment on lines +73 to +76

+ # Explicitly use "int32" for backend compatibility. Using `int` can
+ # lead to dtype mismatches between backends (e.g., int64 on
+ # TensorFlow), while "int32" is consistent across all backends.
+ dtype="int32",
Collaborator


We can't modify this to be int32 directly. The original, removed comment explains why.

In TensorFlow, "int32" tensors are typically placed on the CPU, which can cause issues with graph generation as parts of the model would reside on different devices. "int" usually maps to "int64" and stays on the GPU.

reference: https://stackoverflow.com/questions/37439299/no-gpu-kernel-for-an-int32-variable-op/37452938#37452938

Comment on lines +197 to +200

+ # Explicitly use "int32" for backend compatibility. Using `int` can
+ # lead to dtype mismatches between backends (e.g., int64 on
+ # TensorFlow), while "int32" is consistent across all backends.
+ dtype="int32",
Collaborator


Same as the other comment.

  def _get_rel_pos(self, num_positions):
      ids = ops.arange(num_positions)
-     ids = ops.cast(ids, dtype="int")
+     ids = ops.cast(ids, dtype="int32")
Collaborator


Same as the other comment.

@JyotinderSingh
Collaborator

This issue was fixed in keras-team/keras#21679.
