Skip to content

EmergePlanner: Optimize _get_emerge_parts() for better grouping of parts in the upload buffer #535

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

LewyMaster
Copy link

EmergePlanner optimization solution as a task for job recruiting process.

Task:
`implementation of

# completely flush the upload buffer

is a little bit naiive, because (with 5MB minimum part size) if we mark
"u" as a megabyte of data to upload
"c" as a megabyte of data to copy
"d" as a megabyte of data that will be downloaded and uploaded
then for "input", "master" pattern will be executed by the current version of the code while ideally in such case the "ideal" pattern would be used:

input: uuu cccccccccc uu ccccc ...

master: uuu ddcccccccc uu ddddd ...

ideal: uuu ddcccccddd uu ccccc ...

This is because Planner is only considering to bundle an upload with a part of a copy from the right side of the upload and not also from the left. Appropriately to the quality of the code in this project, tests should be provided and the code of the planner should remain readable. The behavior of the planner should strictly improve and the changes done to it should not have any negative impact on any input case (beyond marginally higher CPU consumption during planning).
`

@mzukowski-reef
Copy link
Contributor

mzukowski-reef commented Mar 26, 2025

we were discussing back then splitting implementations for streaming and "full knowledge" planner - then we would be able to plan better in such edge cases - but this is only for small parts - so we decided to keep single implementation and leave such edge cases suboptimal - in most applications parts will be larger than 5MB and there is no problem - if there is valid usecase then I think dedicated implementation of planner should be produced for "full knowledge" case - modifying streaming logic is nightmare :)

Copy link
Collaborator

@ppolewicz ppolewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't read it in detail, because it doesn't have a single test yet. The emerger planner has very nice tests, if I remember correctly - could you please add one showing the case described in task description, make sure that it does what it should do?

copy_parts = self._get_copy_parts(
current_intent,
start_offset=upload_buffer.end_offset,
end_offset=current_end,
end_offset=current_end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use trailing commas on multi-line statements so that if someone wants to add an argument, the diff consists of just one "added" line and not 2 added 1 removed

Comment on lines +208 to +210
upload_buffer = UploadBuffer(current_end)
else:
upload_buffer = UploadBuffer(current_end)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
upload_buffer = UploadBuffer(current_end)
else:
upload_buffer = UploadBuffer(current_end)
upload_buffer = UploadBuffer(current_end)

Comment on lines +212 to +213
# we "borrow" a fragment of current intent to upload buffer
# to fill it to minimum part size
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# we "borrow" a fragment of current intent to upload buffer
# to fill it to minimum part size
# we "borrow" a fragment of current intent to upload buffer
# to fill it to minimum part size

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants