Fix for failing tests due to dtype issue. #2407
Added checks for invalid inputs
Added tests to check invalid inputs
Fix for model not loading when using numpy behaviour with tensorflow
This reverts commit 3fdc7fd.
Updated the convert_tokenizer function to support both string and list formats for merges in the tokenizer config, improving compatibility with different tokenizer export formats.
Removed debug print statements and redundant merge format handling logic. Now only converts list-of-lists merge format to space-separated strings, streamlining the tokenizer conversion process.
Changed the dtype of position_ids from int to 'int64' in the CLIPVisionEmbedding layer to ensure correct device placement, particularly for TensorFlow backends.
Added explicit casting of rel_pos and abs_pos to int32 in the _make_log_bucket_position method to ensure correct data types for subsequent operations.
Corrects dtype casting in _make_log_bucket_position and _get_rel_pos methods to ensure consistency and prevent potential type errors. Uses the input dtype for bucket_pos and explicitly casts ids to int32.
Summary of Changes
Hello @pctablet505, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical data type compatibility issues that were causing tests to fail, particularly with the TensorFlow backend. By standardizing integer data types to `int32` for position ID tensors and related operations across various layers, the changes ensure consistent behavior and prevent future failures across different deep learning backends.
Highlights
- **Standardized Integer Dtypes:** The data type for `position_ids` weights in `clip_layers.py` and `siglip_layers.py` has been explicitly set to `"int32"` to resolve TensorFlow backend failures and ensure cross-backend compatibility.
- **Consistent Relative Position Casting:** The `rel_pos` tensor in the `_make_log_bucket_position` method within `disentangled_self_attention.py` is now explicitly cast to `"int32"` to maintain consistent data type handling.
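The mismatch behind these highlights can be sketched outside Keras. In the snippet below, NumPy stands in for a backend: Python's `int` resolves to a platform-default integer width, while the string `"int32"` pins the width explicitly.

```python
import numpy as np

# Python `int` maps to the platform's default integer width
# (int64 on most 64-bit Linux and macOS builds), so two backends,
# or two platforms, can disagree on what `dtype=int` means.
a = np.arange(5, dtype=int)

# "int32" pins the width explicitly, so every backend agrees.
b = np.arange(5, dtype="int32")

print(a.dtype.kind, b.dtype)  # i int32
```

This is the core of the fix: replacing the platform-dependent `int` with an explicit `"int32"` string wherever position IDs are created.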
Code Review

This pull request correctly addresses the `dtype` compatibility issues that were causing test failures on the TensorFlow backend. By explicitly setting integer dtypes to `"int32"`, you've ensured consistent behavior across TensorFlow, JAX, and PyTorch, which aligns with the project's goal of being backend-agnostic.

The changes in `clip_layers.py`, `siglip_layers.py`, and `deberta_v3/disentangled_self_attention.py` are all sound. I've left a few suggestions to improve the comments in `clip_layers.py` and `siglip_layers.py` for better long-term maintainability, as the current versions are either a bit informal or potentially confusing.

Overall, this is a good fix that improves the robustness of the models across different backends.
```diff
  # Let the backend determine the int32 dtype. For example, tf
  # requires int64 for correct device placement, whereas jax and torch
  # don't.
- dtype=int,
+ dtype="int32",
```
Similar to the change above, this comment is confusing. It says "Let the backend determine the int32 dtype", but the code is now explicitly setting `dtype="int32"` to ensure consistency across backends. Updating the comment to reflect this would make the code easier to understand and align with the project's style of writing backend-agnostic code.
Suggested change:

```diff
- # Let the backend determine the int32 dtype. For example, tf
- # requires int64 for correct device placement, whereas jax and torch
- # don't.
- dtype=int,
- dtype="int32",
+ # Explicitly use "int32" for backend compatibility. Using `int` can
+ # lead to dtype mismatches between backends (e.g., int64 on
+ # TensorFlow), while "int32" is consistent across all backends.
+ dtype="int32",
```
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
The dtype issue is coming up due to a bug that was recently introduced in Keras. We should fix it at the core library level instead of modifying the keras-hub models.
I've reported the issue for tracking here: keras-team/keras#21677
```diff
  sign = ops.sign(rel_pos)
- mid = self.bucket_size // 2
- mid = ops.cast(mid, dtype=dtype)
+ mid = ops.cast(self.bucket_size // 2, dtype=dtype)
```
This change doesn't seem to do anything, we can omit it.
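To see why the reviewer calls this change a no-op, here is a runnable sketch in which a small helper stands in for `keras.ops.cast` (the helper is hypothetical, added only so the sketch runs without Keras; the two call patterns come from the diff):

```python
import numpy as np

def cast(x, dtype):
    # Hypothetical stand-in for `keras.ops.cast`.
    return np.asarray(x).astype(dtype)

bucket_size = 7
dtype = "float32"

# Original two-step form:
mid = bucket_size // 2
mid_two_step = cast(mid, dtype=dtype)

# Condensed one-line form from the diff:
mid_one_line = cast(bucket_size // 2, dtype=dtype)

# Same value, same dtype: the diff changes style, not behavior.
assert mid_two_step == mid_one_line
```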
```diff
+ # Explicitly use "int32" for backend compatibility. Using `int` can
+ # lead to dtype mismatches between backends (e.g., int64 on
+ # TensorFlow), while "int32" is consistent across all backends.
+ dtype="int32",
```
We can't modify this to be `int32` directly. The original, removed comment explains why. In TensorFlow, `"int32"` tensors are typically placed on the CPU, which can cause issues with graph generation, as parts of the model would reside on different devices. `int` usually maps to `"int64"` and stays on the GPU.
```diff
+ # Explicitly use "int32" for backend compatibility. Using `int` can
+ # lead to dtype mismatches between backends (e.g., int64 on
+ # TensorFlow), while "int32" is consistent across all backends.
+ dtype="int32",
```
Same as the other comment.
```diff
  def _get_rel_pos(self, num_positions):
      ids = ops.arange(num_positions)
-     ids = ops.cast(ids, dtype="int")
+     ids = ops.cast(ids, dtype="int32")
```
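The pattern in this hunk can be exercised with NumPy standing in for `keras.ops` (the function below is a sketch of the diffed code's shape, not the actual keras-hub implementation):

```python
import numpy as np

def get_rel_pos(num_positions):
    # `arange` starts from a backend- or platform-default integer
    # dtype; the explicit cast pins the ids to "int32", mirroring
    # the diff above.
    ids = np.arange(num_positions)
    ids = ids.astype("int32")
    return ids

rel = get_rel_pos(4)
print(rel, rel.dtype)  # [0 1 2 3] int32
```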
Same as the other comment.
This issue was fixed in keras-team/keras#21679. |
Fix for failing TAP presubmit tests:
Fix for failing tests on tensorflow backend:
This pull request addresses data type compatibility issues across different backends by standardizing the data type for position ID tensors and related operations to `int32`. This change ensures consistent behavior and prevents failures, particularly with the TensorFlow backend.

Backend compatibility improvements:

- Updated the `dtype` argument for `position_ids` weights in both `clip_layers.py` and `siglip_layers.py` to use `"int32"` instead of `int`, resolving failures with TensorFlow and ensuring compatibility across all backends. [1] [2] [3]
- Cast `rel_pos` to `"int32"` in the `_make_log_bucket_position` method of `disentangled_self_attention.py` to maintain consistent data types.

This pull request addresses integer dtype compatibility issues across different backends (TensorFlow, JAX, Torch) by explicitly setting the dtype to `"int32"` when creating position ID weights and casting relative positions. This ensures consistent behavior and prevents failures, particularly with TensorFlow.
Cross-backend dtype compatibility improvements:

- Changed the `dtype` argument from `int` to `"int32"` in the `add_weight` calls for `position_ids` in both `clip_layers.py` and `siglip_layers.py` to ensure compatibility with all supported backends, especially TensorFlow. [1] [2] [3]
- Explicitly cast to `"int32"` for `rel_pos` in the `_make_log_bucket_position` method in `disentangled_self_attention.py` to standardize dtype handling.
to standardize dtype handling.## Description of the change