-
Notifications
You must be signed in to change notification settings - Fork 13
Implement readline(s) for all stream-seek classes
#194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Implement readline(s) for all stream-seek classes
#194
Conversation
|
Pinging @agoscinski @GeigerJ2 @khsrali for a review, thanks! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #194 +/- ##
==========================================
- Coverage 99.63% 97.91% -1.73%
==========================================
Files 8 8
Lines 1931 2060 +129
==========================================
+ Hits 1924 2017 +93
- Misses 7 43 +36 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
mypy is complaining, but not sure if we actually need the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explaining mypy error: So the PackedObjectReader accepts fhandlers of type StreamSeekBytesType which is defined as
StreamSeekBytesType = Union[
BinaryIO,
'PackedObjectReader',
'CallbackStreamWrapper',
'ZlibLikeBaseStreamDecompresser',
]
Since PackedObjectReader readline method uses the fhandler's readline method, the type checker correctly gives you the error that you need to implement this function for these file handlers as well. If you consider not to implement the methods in the other classe, you at least need to do an isinstance check and error out in case it is a type that does not support readline.
| return b'' | ||
|
|
||
| readline_size = remaining if (size is None or size < 0) else min(size, remaining) | ||
| line = self._fhandle.readline(readline_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this reads only up until some terminator, at least for BinaryIO this is b'\n', even though it can be changed for some file handler, the readline_size is just a maximum value up until it reads. I am not sure if this is the intended behavior, since my understanding of the PackedObjectReader is that it contains already "one line". It seems like you just make reading slower since you randomly chunk your data whenever the you find a b'\n' which has no meaning for arbitrary packed objects (unlike your text example in the tests). What is more meaningful in terms of packed data is to have some maximum number to limit memory consumption (for example 65536 bytes for 64KB). You would loose however the behavior for texts where the terminator makes sense, which IMO is not important for this abstraction but maybe I am wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Alex!
I generally agree and was also thinking if b'\n' really makes sense, or something "fixed". In the end, I followed the "new-line" approach, based on what I saw for the readline method of the BytesIO / BufferedReader that would be used when the repository isn't packed https://github.com/python/cpython/blob/main/Lib/_pyio.py#L509-L518. Moreover, think that I was mostly biased by the pickle application of that. I just had a quick look there and it seems that pickle is also most relying on b'\n' in the readline method.
But again, I agree that this might not be the most reasonable approach for such a packed file.
Sure. My comment was just to say if it's necessary to implement those (from our practical perspective). In case we would agree to not do it for now, and leave this for another PR, I'd of course add such a check, totally agree on that. |
This commit adds a `readline` method to the `PackedObjectReader`, which was necessary to make it comatible with pickle. The current version causes problems in `aiida-workgraph` and `aiida-pythonjob`, see aiidateam#174.
for more information, see https://pre-commit.ci
8f66e0d to
caceab7
Compare
|
Notes from discussion with @giovannipizzi:
|
for more information, see https://pre-commit.ci
readline method to PackedObjectReaderreadline(s) to all stream classes
readline(s) to all stream classesreadline(s) for all stream-seek classes
for more information, see https://pre-commit.ci
Adding a
readlinemethod toPackedObjectReaderto address #174. This is important as it causes errors inaiida-workgraphandaiida-pythonjob.According to the test and some manual comparison, it seems that this is working fine. Would be great if we could quickly iterate and get this in, as @superstar54 was also keen in fixing this for the
WorkGraph/PythonJob