Add deduplication pass for initializer tensors (#66) #67


Status: Open. Wants to merge 12 commits into base branch main.

Conversation

AbhishekHerbertSamuel

Summary

This PR adds a new graph transformation pass: DeduplicateInitializersPass.

It removes duplicate initializer tensors (typically model weights) based on a unique fingerprint derived from:

  • Tensor byte content (tobytes())
  • Data type (dtype)
  • Shape

All redundant initializers are removed, and nodes referencing them are updated to use the canonical (first-seen) tensor.


Implementation Details

  • Fingerprints are tracked using a dictionary: (tobytes, dtype, shape) → name
  • Redundant initializers are removed using graph.initializers.pop(...)
  • Node inputs are updated via node.replace_input_with(...) for correctness and safety
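The fingerprint-and-remap flow described above can be sketched standalone. This is a minimal illustration only: plain NumPy arrays stand in for onnx_ir initializers, and the function and variable names are made up, not the PR's actual API.

```python
# Standalone sketch of the fingerprint dictionary described above.
# NumPy arrays stand in for onnx_ir initializer tensors.
import numpy as np

def deduplicate(initializers):
    """Return a map from duplicate names to the canonical (first-seen) name."""
    seen = {}      # (bytes, dtype, shape) -> canonical name
    name_map = {}  # duplicate name -> canonical name
    for name, arr in initializers.items():
        key = (arr.tobytes(), str(arr.dtype), arr.shape)
        if key in seen:
            name_map[name] = seen[key]  # redundant: remap to first-seen tensor
        else:
            seen[key] = name
    return name_map

inits = {
    "w1": np.array([1, 2, 3]),
    "w2": np.array([1, 2, 3]),   # byte-identical duplicate of w1
    "w3": np.array([1.0, 2.0]),  # different dtype and shape
}
print(deduplicate(inits))  # {'w2': 'w1'}
```

In the real pass, the `name_map` step corresponds to rewriting node inputs to the canonical tensor via `node.replace_input_with(...)`.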

Benefits

  • Reduces memory and file size by eliminating duplicated weight tensors
  • Simplifies graph structure for downstream optimization and export

File Added

  • src/onnx_ir/passes/common/deduplicate_initializers.py

Closes

Closes #66

# Iterate over all initializers in the graph
for initializer in list(graph.initializers.values()):
key = (
initializer.const_value.tobytes(), # Content fingerprint
Member:

This is memory consuming and thus highly inefficient. Consider comparing the dtype and shape first, and only compare values when you need to
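A sketch of the ordering the reviewer suggests, with cheap metadata checked first and tensor values compared only within a matching bucket. Helper names are hypothetical and NumPy arrays stand in for initializers.

```python
# Bucket by cheap metadata (dtype, shape) first; touch tensor values only
# when a bucket already has a candidate with the same metadata.
import numpy as np

def dedup_grouped(initializers):
    groups = {}    # (dtype, shape) -> list of (canonical name, array)
    name_map = {}  # duplicate name -> canonical name
    for name, arr in initializers.items():
        meta = (str(arr.dtype), arr.shape)
        bucket = groups.setdefault(meta, [])
        for canon_name, canon_arr in bucket:
            # Value comparison happens only on a metadata match.
            if np.array_equal(arr, canon_arr):
                name_map[name] = canon_name
                break
        else:  # no identical tensor in this bucket: becomes canonical
            bucket.append((name, arr))
    return name_map

print(dedup_grouped({"a": np.ones(3), "b": np.ones(3), "c": np.zeros(3)}))  # {'b': 'a'}
```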

from onnx_ir.passes.base import GraphTransformPass


class DeduplicateInitializersPass(GraphTransformPass):
Member:

I understand this may be generated by some AIs. Please ensure the class names etc. are correct, and follow coding style from other files in this directory.

seen[key] = initializer.name

# Update node inputs to use the canonical initializer names
for node in graph:
Member:

you probably need to check nodes from the subgraphs too. You may use the ir.traversal recursive iterator for this.

Author:

Hi @justinchuby,
I’ve addressed your feedback in the latest commit:

  • Optimized memory usage by grouping by (dtype, shape) before comparing tobytes()
  • Used iterate_graph(graph) to handle nodes in subgraphs as well

Let me know if any further changes are needed. Thanks again for the thoughtful review!

@AbhishekHerbertSamuel force-pushed the add-deduplicate-initializers-pass branch from f99fa0c to ae8f078 on June 5, 2025 at 07:57
@justinchuby (Member) left a comment:

It’s fine to use an AI for contribution. Please ensure however that the code actually works

@AbhishekHerbertSamuel (Author):

Thank you for the feedback, Justin. I'll verify that it works before sending it here.

…bgraph traversal

Address reviewer feedback:
- Optimized memory by grouping by dtype and shape before comparing values
- Used iterate_graph to handle subgraphs
- Validated on normal and subgraph models; deduplication works as expected

Signed-off-by: Abhishek Herbert Samuel <[email protected]>
@AbhishekHerbertSamuel (Author):

AbhishekHerbertSamuel commented Jun 6, 2025

Hi Justin,

Thanks again for your feedback! I've verified that the updated implementation works as intended. Here's the test setup and output: (I ran the test locally, didn't push it here)

Local file path for the test: /Users/abhishekherbertsamuel/ir-py/src/test_local_dedup.py

Test code:

import numpy as np
from onnx_ir._core import Graph, Node, Tensor, Value
from onnx_ir.passes.common.deduplicate_initializers import DeduplicateInitializersPass


def test_normal_and_subgraph_dedup():
    print("\n=== TEST: Normal Graph and Subgraph Deduplication ===")

    # Shared tensor content
    arr = np.array([1, 2, 3])
    t1 = Tensor(arr)
    t2 = Tensor(arr.copy())  # clone with same content

    # Main graph values
    v1 = Value(name="w1", const_value=t1)
    v2 = Value(name="w2", const_value=t2)

    # Subgraph has its own separate Value object (same tensor, new graph-safe instance)
    sub_tensor = Tensor(arr.copy())
    sub_val = Value(name="w3", const_value=sub_tensor)

    # Subgraph node and graph
    sub_node = Node("", "Conv", inputs=[sub_val], outputs=[])
    subgraph = Graph(
        inputs=[],
        outputs=[],
        nodes=[sub_node],
        initializers=[sub_val],
        name="subgraph",
    )

    # Main graph node
    main_node = Node("", "Add", inputs=[v1, v2], outputs=[])

    # Attach subgraph manually to the node (mimics nested block structure)
    main_node.blocks = [subgraph]

    # Construct main graph
    parent_graph = Graph(
        inputs=[],
        outputs=[],
        nodes=[main_node],
        initializers=[v1, v2],
        name="main_graph",
    )

    print("Before Deduplication:")
    print("Main Graph Initializers:", list(parent_graph.initializers.keys()))
    print("Main Node inputs:", [v.name for v in main_node.inputs])
    print("Subgraph Initializers:", list(subgraph.initializers.keys()))
    print("Subgraph Node inputs:", [v.name for v in sub_node.inputs])

    # Apply deduplication
    DeduplicateInitializersPass().apply(parent_graph)

    print("\nAfter Deduplication:")
    print("Main Graph Initializers:", list(parent_graph.initializers.keys()))
    print("Main Node inputs:", [v.name for v in main_node.inputs])
    print("Subgraph Initializers:", list(subgraph.initializers.keys()))
    print("Subgraph Node inputs:", [v.name for v in sub_node.inputs])


if __name__ == "__main__":
    test_normal_and_subgraph_dedup()

Test screenshot: attached (Screenshot 2025-06-06 at 11:58:10 AM)

If I have missed out on anything, please let me know.

With regards,
Abhishek Herbert Samuel

@AbhishekHerbertSamuel (Author):

Hi @justinchuby,

I've pushed the finalized implementation and test as separate, signed commits. The following have been addressed:

DeduplicateInitializersPass: Added under passes/common, follows repo conventions, uses (dtype, shape) → {tobytes: name} grouping for memory efficiency, and traverses all subgraphs via RecursiveGraphIterator.

Test coverage: A dedicated unittest verifies correct deduplication in the main graph and ensures subgraphs remain isolated.

Coding standards: Followed the structure and documentation style of other passes (e.g., topological_sort.py).

Commit signed: Used -s with a clean message summarizing the functionality.

I have also attached a screenshot of the unit test which passed successfully on my local copy of this repository.

Please let me know if any final changes are needed. Thanks again for your guidance and mentorship throughout this PR!

Best,
Abhishek Herbert Samuel

from onnx_ir.traversal import RecursiveGraphIterator


class DeduplicateInitializersPass:
Member:

Suggested change:
- class DeduplicateInitializersPass:
+ class DeduplicateInitializersPass(ir.passes.InPlacePass):

please subclass ir.passes.InPlacePass. You can use https://github.com/AbhishekHerbertSamuel/ir-py/blob/ef46092b5f10303bb9fe126eef0f5b44585e3b16/src/onnx_ir/passes/common/constant_manipulation.py#L23 as an example.

using RecursiveGraphIterator.
"""

def apply(self, graph: Graph) -> Graph:
Member:

Please implement the call method. The first argument should be an ir.Model. You may use other passes in this directory as examples. Be sure to import modules only: https://google.github.io/styleguide/pyguide.html#224-decision

import unittest
import numpy as np

from onnx_ir._core import Tensor, Value, Node, Graph

@justinchuby (Member):

Please feel free to ask questions when you are going through the code base or need help understanding parts of the code. It would be helpful to take a look at other existing passes and usages to ensure they are implemented in a similar style.

@justinchuby (Member):

My concern with this pass in particular is that we are using the full bytes in the look up table. This is memory intensive. I wonder if there is a good (efficient) hash method that can be apply to the bytes content, and use the hash value in the look up table. Only when the hash matches do we compare the actual bytes.
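One way to realize that suggestion, sketched under the assumption that SHA-256 is the chosen digest. The helper names are hypothetical and NumPy arrays stand in for initializer values; only 32-byte digests live in the table keys, with full bytes compared only on a digest match.

```python
# Hash-then-verify lookup: key on (dtype, shape, sha256 digest), compare
# full bytes only when the digest matches, guarding against collisions.
import hashlib
import numpy as np

def dedup_hashed(initializers):
    table = {}     # (dtype, shape, digest) -> list of (name, array)
    name_map = {}  # duplicate name -> canonical name
    for name, arr in initializers.items():
        digest = hashlib.sha256(arr.tobytes()).digest()
        key = (str(arr.dtype), arr.shape, digest)
        bucket = table.setdefault(key, [])
        for canon_name, canon_arr in bucket:
            # Exact byte check guards against (rare) hash collisions.
            if arr.tobytes() == canon_arr.tobytes():
                name_map[name] = canon_name
                break
        else:
            bucket.append((name, arr))
    return name_map

weights = {"a": np.arange(1000), "b": np.arange(1000), "c": np.arange(999)}
print(dedup_hashed(weights))  # {'b': 'a'}
```

Note this still materializes bytes transiently for hashing; the saving is that the lookup table retains only digests and array references rather than full byte strings.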

@AbhishekHerbertSamuel (Author):

Hi @justinchuby,
Thanks a lot for your detailed feedback :)

I’ll update the class to inherit from ir.passes.InPlacePass as suggested and move the main logic into the call method, following the repo’s conventions (like in constant_manipulation.py).
I’ll also change the test imports to follow the module-only import guideline — thanks for pointing me to the correct example!

Regarding the memory concern:
You're absolutely right — using tobytes() directly is memory-intensive. I’ll switch to using sha256 to hash the tensor bytes first, which helps group potential duplicates quickly. Then, to avoid any risk of false positives from rare hash collisions, I’ll still compare the full bytes only when the hashes match. This keeps things memory-efficient while still being safe and accurate. Thanks again for the suggestion!

Will push the changes shortly. Please let me know if I missed anything else. Appreciate your guidance!

Warm regards,
Abhishek Herbert Samuel

- Implemented DeduplicateInitializersPass to remove redundant initializers
  with identical shape, dtype, and values within individual graphs.
- Ensured deduplication is confined to the same graph scope (no cross-subgraph merging).
- Added unit tests covering:
  - Exact duplicates
  - Different shapes/dtypes
  - Scalars
  - Multiple duplicates
  - Non-deduplicable distinct values
- Removed subgraph-related tests due to ONNX serialization behavior omitting their initializers.

Signed-off-by: Abhishek Herbert Samuel <[email protected]>
@AbhishekHerbertSamuel (Author):

Hi @justinchuby,
I've pushed the finalized version of DeduplicateInitializersPass along with a focused set of unit tests. The current tests comprehensively validate deduplication behavior across various scenarios—shape, dtype, scalar, and value uniqueness.

Tests involving subgraph initializers were removed, as ONNX drops those during serialization, making them unreliable to assert against. Let me know if you'd like a different strategy for subgraph coverage.

Thanks again for your guidance throughout!

Warm regards,
Abhishek Herbert Samuel


codecov bot commented Jun 9, 2025

Codecov Report

Attention: Patch coverage is 76.47059% with 12 lines in your changes missing coverage. Please review.

Project coverage is 73.71%. Comparing base (6656096) to head (8f6dbdc).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
.../onnx_ir/passes/common/deduplicate_initializers.py 76.47% 5 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #67      +/-   ##
==========================================
+ Coverage   73.57%   73.71%   +0.14%     
==========================================
  Files          37       38       +1     
  Lines        4492     4543      +51     
  Branches      902      915      +13     
==========================================
+ Hits         3305     3349      +44     
- Misses        858      861       +3     
- Partials      329      333       +4     

☔ View full report in Codecov by Sentry.

@justinchuby justinchuby self-assigned this Jun 9, 2025
@AbhishekHerbertSamuel force-pushed the add-deduplicate-initializers-pass branch from a00be10 to 6b3e0b7 on June 11, 2025 at 08:40
@AbhishekHerbertSamuel (Author):

AbhishekHerbertSamuel commented Jun 11, 2025

Hi @justinchuby, I tried to make the remaining changes based on the CI workflow's results. While my tests ran and passed locally (6/6), the codecov bot and workflows were not triggered automatically this time, as they were previously. Is that OK, or a sign of an error?

With regards,
Abhishek Herbert Samuel

break # only break when deduplication is successful
else:
# no matching content found: append as a new entry
group[content_hash].append((initializer.name, content))
Member:

You may store the values instead in the hash table, and use const_val.tobytes() for comparison. This way the bytes do not stay in the memory or take up space?

Suggested change:
- group[content_hash].append((initializer.name, content))
+ group[content_hash].append(const_val)

if initializer.name is not None:
graph.initializers.pop(initializer.name)
break # only break when deduplication is successful
else:
Member:

indent?

@justinchuby (Member):

Thank you! could you also take a look at the lint errors?

@AbhishekHerbertSamuel (Author):

Hi @justinchuby, will work on it and send the corrected code here:)

@AbhishekHerbertSamuel (Author):

Hi @justinchuby, I have committed the changes with the lint issues and byte optimization addressed. Please find attached a screenshot showing the absence of lint errors on my local system.

Do let me know if there are any other changes :)

@@ -27,16 +27,16 @@ class DeduplicateInitializersPass(onnx_ir.passes.InPlacePass):

def call(self, model: onnx_ir.Model) -> onnx_ir.passes.PassResult:
Member:

Can you import onnx_ir as ir and use it as such, to stay consistent with the rest of the code base?

@justinchuby (Member):

Thanks! I will do a more detailed review soon

@AbhishekHerbertSamuel (Author):

Sure @justinchuby, will fix it and maintain code consistency :)

continue # Skip if initializer has no constant value
dtype = const_val.dtype.name
shape = tuple(int(dim) if isinstance(dim, int) else -1 for dim in const_val.shape)
content = const_val.tobytes()
Contributor:

This is going to be very slow on big tensors. Is there a way to compare rawdata directly to save some time?

Member:

After discussion it seems a good idea to avoid comparing big tensors at all. @AbhishekHerbertSamuel Could you limit the size to 1024 values? You can find the element count of the tensor with tensor.size


A threshold of 1024 is a very small number in production; there are cases where model size is reduced from 150MB to 100MB by tying weights. Maybe we can parameterize this.
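A parameterized guard along those lines might look like the sketch below. The helper and parameter names (`eligible_for_dedup`, `size_limit`) are made up for illustration; `arr.size` mirrors the element-count check suggested earlier in the thread.

```python
# Configurable size guard: tensors above the element-count limit are
# skipped entirely; passing None disables the limit for production cases
# such as tied weights in large models.
import numpy as np

def eligible_for_dedup(arr, size_limit=1024):
    return size_limit is None or arr.size <= size_limit

print(eligible_for_dedup(np.zeros(512)))                    # True
print(eligible_for_dedup(np.zeros(4096)))                   # False
print(eligible_for_dedup(np.zeros(4096), size_limit=None))  # True
```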


And when comparing those constants, we can compare shape and dtype first instead of comparing the raw data directly; this will save tremendous time.

Member:

Thanks. I agree that’s a good idea

@justinchuby justinchuby requested a review from Copilot June 13, 2025 03:08
@Copilot Copilot AI (Contributor) left a comment:

Pull Request Overview

Adds a new graph transformation pass to remove duplicate initializer tensors by hashing their content and updating node inputs to the canonical tensor.

  • Introduces DeduplicateInitializersPass to group initializers by dtype, shape, and content hash, remove duplicates, and rewrite node inputs.
  • Implements content‐based deduplication with SHA-256 fingerprinting and exact byte comparison to avoid collisions.
  • Provides unit tests covering identical, shape/dtype differences, scalar, multiple duplicates, and unique-value scenarios.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/onnx_ir/passes/common/deduplicate_initializers.py Implements new pass to deduplicate initializer tensors
src/onnx_ir/passes/common/deduplicate_initializers_test.py Adds unit tests for various deduplication scenarios
Comments suppressed due to low confidence (3)

src/onnx_ir/passes/common/deduplicate_initializers.py:31

  • [nitpick] The name 'seen' is ambiguous; consider renaming it to something more descriptive like 'initializer_groups' or 'fingerprint_groups'.
seen: dict[tuple[str, tuple[int, ...]], dict[str, list[onnx_ir.Value]]] = {}

src/onnx_ir/passes/common/deduplicate_initializers.py:33

  • [nitpick] The variable 'name_map' could be renamed to 'duplicate_to_canonical' or similar to clarify its purpose.
name_map = {}

src/onnx_ir/passes/common/deduplicate_initializers.py:67

  • There are no tests covering deduplication behavior within nested subgraphs; consider adding a test case to verify that duplicate initializers in subgraphs are also handled.
for node in onnx_ir.traversal.RecursiveGraphIterator(graph):

@AbhishekHerbertSamuel (Author):

Thank you @xadupre @inisis @justinchuby for the feedback. Will make the requested changes and ensure that the PR is ready to be merged.

…nd size limit

- Avoids comparing large tensors >1024 elements to reduce performance overhead
- Compares shape and dtype before accessing tensor content
- Adds test coverage for subgraph deduplication (If node branches)
- Passes all linters: ruff, mypy, editorconfig

Signed-off-by: Abhishek Herbert Samuel <[email protected]>
@AbhishekHerbertSamuel (Author):

@xadupre @justinchuby @inisis I have made the requested changes. Please check and let me know if it's ready for merging or if other changes need to be made prior to that. Thank you once again :)


Successfully merging this pull request may close these issues.

Create a tensor de-duplication pass
4 participants