-
Notifications
You must be signed in to change notification settings - Fork 62
EmergePlanner: Optimize _get_emerge_parts() for better grouping of parts in the upload buffer #535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…rts in the upload buffer
we were discussing back then splitting implementations for streaming and "full knowledge" planner - then we would be able to plan better in such edge cases - but this is only for small parts - so we decided to keep single implementation and leave such edge cases suboptimal - in most applications parts will be larger than 5MB and there is no problem - if there is valid usecase then I think dedicated implementation of planner should be produced for "full knowledge" case - modifying streaming logic is nightmare :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't read it in detail, because it doesn't have a single test yet. The emerger planner has very nice tests, if I remember correctly - could you please add one showing the case described in task description, make sure that it does what it should do?
copy_parts = self._get_copy_parts( | ||
current_intent, | ||
start_offset=upload_buffer.end_offset, | ||
end_offset=current_end, | ||
end_offset=current_end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use trailing commas on multi-line statements so that if someone wants to add an argument, the diff consists of just one "added" line and not 2 added 1 removed
upload_buffer = UploadBuffer(current_end) | ||
else: | ||
upload_buffer = UploadBuffer(current_end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upload_buffer = UploadBuffer(current_end) | |
else: | |
upload_buffer = UploadBuffer(current_end) | |
upload_buffer = UploadBuffer(current_end) |
# we "borrow" a fragment of current intent to upload buffer | ||
# to fill it to minimum part size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# we "borrow" a fragment of current intent to upload buffer | |
# to fill it to minimum part size | |
# we "borrow" a fragment of current intent to upload buffer | |
# to fill it to minimum part size |
EmergePlanner optimization solution as a task for job recruiting process.
Task:
`implementation of
b2-sdk-python/b2sdk/transfer/emerge/planner/planner.py
Line 191 in 10a5e72
is a little bit naiive, because (with 5MB minimum part size) if we mark
"u" as a megabyte of data to upload
"c" as a megabyte of data to copy
"d" as a megabyte of data that will be downloaded and uploaded
then for "input", "master" pattern will be executed by the current version of the code while ideally in such case the "ideal" pattern would be used:
input: uuu cccccccccc uu ccccc ...
master: uuu ddcccccccc uu ddddd ...
ideal: uuu ddcccccddd uu ccccc ...
This is because Planner is only considering to bundle an upload with a part of a copy from the right side of the upload and not also from the left. Appropriately to the quality of the code in this project, tests should be provided and the code of the planner should remain readable. The behavior of the planner should strictly improve and the changes done to it should not have any negative impact on any input case (beyond marginally higher CPU consumption during planning).
`