EmergePlanner: Optimize _get_emerge_parts() for better grouping of parts in the upload buffer #535

LewyMaster · 2025-03-26T05:18:07Z

EmergePlanner optimization solution as a task for job recruiting process.

Task:
`implementation of

b2-sdk-python/b2sdk/transfer/emerge/planner/planner.py

Line 191 in 10a5e72

# completely flush the upload buffer

is a little bit naiive, because (with 5MB minimum part size) if we mark
"u" as a megabyte of data to upload
"c" as a megabyte of data to copy
"d" as a megabyte of data that will be downloaded and uploaded
then for "input", "master" pattern will be executed by the current version of the code while ideally in such case the "ideal" pattern would be used:

input: uuu cccccccccc uu ccccc ...

master: uuu ddcccccccc uu ddddd ...

ideal: uuu ddcccccddd uu ccccc ...

This is because Planner is only considering to bundle an upload with a part of a copy from the right side of the upload and not also from the left. Appropriately to the quality of the code in this project, tests should be provided and the code of the planner should remain readable. The behavior of the planner should strictly improve and the changes done to it should not have any negative impact on any input case (beyond marginally higher CPU consumption during planning).
`

…rts in the upload buffer

mzukowski-reef · 2025-03-26T08:58:52Z

we were discussing back then splitting implementations for streaming and "full knowledge" planner - then we would be able to plan better in such edge cases - but this is only for small parts - so we decided to keep single implementation and leave such edge cases suboptimal - in most applications parts will be larger than 5MB and there is no problem - if there is valid usecase then I think dedicated implementation of planner should be produced for "full knowledge" case - modifying streaming logic is nightmare :)

ppolewicz

I didn't read it in detail, because it doesn't have a single test yet. The emerger planner has very nice tests, if I remember correctly - could you please add one showing the case described in task description, make sure that it does what it should do?

ppolewicz · 2025-03-27T15:06:29Z

b2sdk/transfer/emerge/planner/planner.py

                    copy_parts = self._get_copy_parts(
                        current_intent,
                        start_offset=upload_buffer.end_offset,
-                        end_offset=current_end,
+                        end_offset=current_end


Please use trailing commas on multi-line statements so that if someone wants to add an argument, the diff consists of just one "added" line and not 2 added 1 removed

ppolewicz · 2025-03-27T15:06:47Z

b2sdk/transfer/emerge/planner/planner.py

+                        upload_buffer = UploadBuffer(current_end)
+                    else:
+                        upload_buffer = UploadBuffer(current_end)


Suggested change

upload_buffer = UploadBuffer(current_end)

else:

upload_buffer = UploadBuffer(current_end)

upload_buffer = UploadBuffer(current_end)

ppolewicz · 2025-03-27T15:07:17Z

b2sdk/transfer/emerge/planner/planner.py

+                        # we "borrow" a fragment of current intent to upload buffer
+                        # to fill it to minimum part size


Suggested change

# we "borrow" a fragment of current intent to upload buffer

# to fill it to minimum part size

# we "borrow" a fragment of current intent to upload buffer

# to fill it to minimum part size

EmergePlanner: Optimize _get_emerge_parts() for better grouping of pa…

3a47e5d

…rts in the upload buffer

ppolewicz reviewed Mar 27, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EmergePlanner: Optimize _get_emerge_parts() for better grouping of parts in the upload buffer #535

EmergePlanner: Optimize _get_emerge_parts() for better grouping of parts in the upload buffer #535

LewyMaster commented Mar 26, 2025

mzukowski-reef commented Mar 26, 2025 •

edited

Loading

ppolewicz left a comment

ppolewicz Mar 27, 2025

ppolewicz Mar 27, 2025

ppolewicz Mar 27, 2025

		# we "borrow" a fragment of current intent to upload buffer
		# to fill it to minimum part size

EmergePlanner: Optimize _get_emerge_parts() for better grouping of parts in the upload buffer #535

Are you sure you want to change the base?

EmergePlanner: Optimize _get_emerge_parts() for better grouping of parts in the upload buffer #535

Conversation

LewyMaster commented Mar 26, 2025

input: uuu cccccccccc uu ccccc ...

master: uuu ddcccccccc uu ddddd ...

ideal: uuu ddcccccddd uu ccccc ...

mzukowski-reef commented Mar 26, 2025 • edited Loading

ppolewicz left a comment

Choose a reason for hiding this comment

ppolewicz Mar 27, 2025

Choose a reason for hiding this comment

ppolewicz Mar 27, 2025

Choose a reason for hiding this comment

ppolewicz Mar 27, 2025

Choose a reason for hiding this comment

mzukowski-reef commented Mar 26, 2025 •

edited

Loading