-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH some steps to make cloudpickle dynamic function/classes more deterministic #524
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #524 +/- ##
==========================================
+ Coverage 96.08% 96.12% +0.04%
==========================================
Files 3 3
Lines 562 568 +6
Branches 116 123 +7
==========================================
+ Hits 540 546 +6
Misses 11 11
Partials 11 11 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some early feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a pass of feedback. I included some comments on my analysis of the xfailed test about string literals in class attributes. Interning is probably not a safe solution in this case...
Co-authored-by: Olivier Grisel <[email protected]>
I found a better solution to string interning: prevent memoization for problematic strings that might be interned or not. For the use of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More feedback on the new strategy.
# to ensure deterministic pickles, that depends wheter the name is interned | ||
# or not. This is not guaranteed for reconstructed dynamic function, so we make | ||
# sure it is never interned. | ||
"__name__": "".join(func.__name__), | ||
"__qualname__": func.__qualname__, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea why we don't need this treatment for the __qualname__
value nor for the keys of func.__dict__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, at least for the latter, there are still things to solver. I will push a slight change to the test to show what I mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with string interning is mostly due to collision between the string defined in two different places: typically method names that are interned as class attributes, vs the dynamic func name.
So I would say this won't happen for __qualname__
as this is only present here but maybe I am wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a changelog entry?
There seems to be a leftover call to sys.intern
(see below).
Also, I would rather avoid duplicating variations of the same inline comment all over the place and instead make them all point to the inline comment of _class_setstate
to limit redundancy and make it more maintainable.
Co-authored-by: Olivier Grisel <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with the missing changelog entry.
Thanks @tomMoral! |
cloudpickle
stream, leads to the same pickle file:One case seems unsolvable: when updating the class state, all
attrname
for a class are interned. This is why we are also interning the dynamic func names, to be consistent between their name and the method names (all interned).But if some values in the pickle stream (anywhere) have the same value as one
attrname
, then these two strings which are the same in the original pickle file become different in the unpickled objects. Typically, pickling the following original dynamic class results in memorization ofjoin
between the method name and the argument value, while pickling the unpickled class leads to no memorization:arg_1 = "join"
value is not interned. In contrast, thejoin
method name is.Not sure if there is a solution for this one.
Note that this test can be run locally with
pytest -vv --runxfail -k determinist_subworker_str_interning
Related issues: #510, #453.