Conversation

samiuc (Contributor) commented on Dec 18, 2025

During our testing, we found that memory grows without bound when processing large datasets, eventually causing OOM errors. The cause is that PIL Image objects opened during dataset iteration are never closed: images opened via Image.open() in builders, and images decoded via Features_Image().decode_example() when loading from parquet, accumulate in memory.

This PR auto-closes PIL Images in as_record_dict() after serialization. It adds a _close_images() method to DatasetRecord and DatasetRecordWithPrediction that releases image resources, and adds try/finally blocks in the builders (pixparse, funsd, xfund, file_dataset) to close images after local processing.
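
For readers who want the shape of the change without opening the diff, here is a minimal sketch of the close-after-serialization pattern. It is not the code in this PR: `FakeImage`, `RecordSketch`, and `build_record` are hypothetical names, and `FakeImage` is a dependency-free stand-in that models only the `close()` method that real PIL images also expose.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


class FakeImage:
    """Hypothetical stand-in for a PIL image; models only the close() lifecycle."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True


@dataclass
class RecordSketch:
    """Hypothetical, simplified record; not the project's DatasetRecord."""

    doc_id: str
    page_images: List[Any] = field(default_factory=list)

    def _close_images(self) -> None:
        # Close every held image, then drop the references so they can be freed.
        for img in self.page_images:
            try:
                img.close()
            except Exception:
                # A failed close should not mask the serialization result.
                pass
        self.page_images = []

    def as_record_dict(self) -> Dict[str, Any]:
        try:
            # Serialize first, so only plain data leaves the record.
            return {
                "doc_id": self.doc_id,
                "pages": [img.name for img in self.page_images],
            }
        finally:
            # Runs whether serialization succeeds or raises.
            self._close_images()


def build_record(doc_id: str, names: List[str]) -> Dict[str, Any]:
    """Builder-style helper: an image opened for local processing is closed in finally."""
    scratch = FakeImage("scratch")  # assumed temporary image used only locally
    try:
        record = RecordSketch(doc_id, [FakeImage(n) for n in names])
        return record.as_record_dict()
    finally:
        scratch.close()
```

One consequence of this design is that the record no longer holds usable images once as_record_dict() has returned, so any caller that needs the pixels has to read them before serializing.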

github-actions bot (Contributor) commented on Dec 18, 2025

DCO Check Passed

Thanks @samiuc, all your commits are properly signed off. 🎉

mergify bot commented on Dec 18, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

samiuc changed the title from "PIL Image Memory Leaks in Dataset Builders" to "fix: PIL Image Memory Leaks in Dataset Builders" on Dec 18, 2025
I, samiuc <[email protected]>, hereby add my Signed-off-by to this commit: a592458
I, samiuc <[email protected]>, hereby add my Signed-off-by to this commit: ac622cc
I, samiuc <[email protected]>, hereby add my Signed-off-by to this commit: 3648bd9

Signed-off-by: samiuc <[email protected]>
I, samiuc <[email protected]>, hereby add my Signed-off-by to this commit: a592458
I, samiuc <[email protected]>, hereby add my Signed-off-by to this commit: ac622cc
I, samiuc <[email protected]>, hereby add my Signed-off-by to this commit: 3648bd9
I, samiuc <[email protected]>, hereby add my Signed-off-by to this commit: c05b85e

Signed-off-by: samiuc <[email protected]>

2 participants