feat: s3 transfer manager v2 #3079
base: master
This is the initial phase of the S3 transfer manager v2, which includes:
- Progress Tracker with a default Console Progress Bar.
- Dedicated Multipart Download Listener for listening to events specific to multipart downloads.
- Generic Transfer Listener that can be used in either a multipart upload or a multipart download. The progress tracker depends on the Generic Transfer Listener and, when enabled, is provided through the same parameter as the listener. This matters because if you need to listen to transfer-specific events and also track progress, you must write a custom implementation that combines both; otherwise only one of the two can be used.
- Single Object Download
- Multipart Object Download
This initial implementation does not include test cases.
- Refactor setting a single argument in the console progress bar so that it works even when the argument does not exist yet.
- Add a dedicated parameter for where the progress is rendered, defaulting to STDOUT.
- Add test cases for ConsoleProgressBar.
- Add test cases for DefaultProgressTracker.
- Add test cases for ObjectProgressTracker.
- Add test cases for TransferListener.
- Add test cases for multipart download listener.
- Add a trait to the MultipartDownloader implementation to keep the main implementation cleaner.
- Add test cases for the multipart downloader, specifically testing the part and range get multipart downloaders.
Looking good on the first pass- there are some nits like function braces needing newlines, new files needing newlines, naming conventions, etc. also had some questions about design
Refactor:
- Move opening braces onto a new line.
- Make requestArgs an optional argument.
- Remove unnecessary traits.
- Use traditional declarations.
Adds:
- Download directory feature.
Refactor:
- Add a message placeholder for progress status, for example in case of errors.
Adds:
- Upload feature, still missing multipart functionality.
- Add upload directory feature
- Add a dedicated multipart upload implementation.
- Add transfer progress to multipart upload.
- Add upload directory with the required options.
- Create specific response models for upload and upload directory.
- Add multipart upload test cases.
- Fix transfer listener completion evaluation.
Shorten namespace from `Aws\S3\Features\S3Transfer` to `Aws\S3\S3Transfer`.
A few more comments- I think a few from the last round were left unaddressed also. I'd do another check for function opening braces (needing to be moved to a new line) and new files that are missing a newline at the end. More test classes are needed as well, but I'm assuming those are on the way
- Implement progress tracker based on the SEP spec.
- Add a default progress bar implementation.
- Add different progress tracker formats:
-- Plain progress format: [|progress_bar|] |percent|%
-- Transfer progress format: [|progress_bar|] |percent|% |transferred|/|tobe_transferred| |unit|
-- Colored progress format: |object_name|:\n\033|color_code|[|progress_bar|] |percent|% |transferred|/|tobe_transferred| |unit| |message|\033[0m
- Add a default single progress tracker implementation.
- Add a default multi progress tracker implementation for tracking directory transfers.
- Include unit tests just for the console progress bar.
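A minimal sketch of how the plain and transfer formats listed above could be filled in. The function name `renderProgress`, the bar width, and the `#` fill character are assumptions made for illustration; they are not taken from the PR.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the PR's implementation): fills a template
 * such as "[|progress_bar|] |percent|%" with the current values.
 */
function renderProgress(
    string $template,
    int $transferred,
    int $total,
    string $unit = 'B',
    int $width = 20
): string
{
    // Treat an empty object as complete to avoid dividing by zero.
    $ratio = $total > 0 ? min(1.0, $transferred / $total) : 1.0;
    $filled = (int) floor($ratio * $width);
    $bar = str_repeat('#', $filled) . str_repeat(' ', $width - $filled);

    // strtr replaces every placeholder in a single pass.
    return strtr($template, [
        '|progress_bar|' => $bar,
        '|percent|' => (string) (int) floor($ratio * 100),
        '|transferred|' => (string) $transferred,
        '|tobe_transferred|' => (string) $total,
        '|unit|' => $unit,
    ]);
}

echo renderProgress('[|progress_bar|] |percent|%', 50, 200), PHP_EOL;
```

Keeping the format as a plain template string means a new format only needs a new template, not a new rendering routine.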
- Fix current test cases for:
-- MultipartUploader
-- MultipartDownloader
-- ProgressTracker
- Remove progress bar color enum since the colors were moved into the specific format that requires them.
TransferListener must be tested through the implementations that extend and use this abstract class.
Add nullable type to listenerNotifier property in the MultipartUploader implementation.
- Tests for MultiProgressTracker - Tests for SingleProgressTracker - Tests for ProgressBarFormat - Tests for TransferProgressSnapshot - Tests for TransferListenerNotifier
Looking better- still needing some more unit tests, along with integ tests. Left some comments and nits on formatting. It seems each new file is missing a newline so I'd check those as well
- Refactor code to address some styling related feedback. - Add upload and uploadDirectory unit tests.
- Fix MultipartUpload tests by increasing the part size from 1024 to 10240000 so that it falls within the allowed part size range of 5 MB to 5 GB.
- Rename tobe to to_be in the progress formatting.
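The range check behind that test fix can be sketched as follows. The constant and function names are hypothetical, and the limits are written in binary units (5 MiB and 5 GiB), which is how S3 defines its multipart part size bounds.

```php
<?php
declare(strict_types=1);

// Hypothetical names; the bounds are S3's multipart part size limits.
const MIN_PART_SIZE = 5 * 1024 * 1024;        // 5 MiB
const MAX_PART_SIZE = 5 * 1024 * 1024 * 1024; // 5 GiB

/**
 * Returns true when the part size is within the allowed range.
 */
function isValidPartSize(int $partSize): bool
{
    return $partSize >= MIN_PART_SIZE && $partSize <= MAX_PART_SIZE;
}

var_dump(isValidPartSize(1024));     // the old test value
var_dump(isValidPartSize(10240000)); // the new test value
```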
- Add download tests - Add download directory tests - Minor naming refactor
Looking good. Just a few nits this time around. Still needing integ tests- will do another round of reviews once those are up.
- Add upload integ tests for:
-- Single uploads
-- Multipart uploads
-- Checksum in single uploads
-- Checksum in multipart uploads
- Add download integ tests for:
-- Single downloads
-- Multipart downloads
- Add integ tests for directory uploads - Add integ tests for directory downloads
Mostly nits, but the most important requests are: adding `upload()` and `download()` methods on the uploader and downloader classes that call `promise()` (similar to the old implementation) and testing the `resolvesOutsideTargetDirectory` logic.
I would do an audit of hard-coded values that can be moved to classes, line length (max 85 char) and all new files that do not end with a newline.
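As an illustration of the kind of check that `resolvesOutsideTargetDirectory` presumably performs, here is a minimal sketch. The function `keyResolvesOutsideTarget` is a hypothetical stand-in with an assumed signature; the PR's actual logic may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a "resolves outside target directory" check.
 * Walks the key's segments and reports whether ".." ever climbs above
 * the directory the object is supposed to be written under.
 */
function keyResolvesOutsideTarget(string $objectKey): bool
{
    $depth = 0;
    // Treat both separators alike so "..\x" is caught as well.
    $segments = explode('/', str_replace('\\', '/', $objectKey));
    foreach ($segments as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // empty and "." segments do not change depth
        }
        if ($segment === '..') {
            $depth--;
            if ($depth < 0) {
                return true; // climbed above the target directory
            }
        } else {
            $depth++;
        }
    }
    return false;
}

var_dump(keyResolvesOutsideTarget('photos/2024/a.jpg'));
var_dump(keyResolvesOutsideTarget('photos/../../etc/passwd'));
```

Tests for this kind of logic would usually cover a plain key, a key that dips into `..` but stays inside, and a key that escapes.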
- Move some fixed values out of the methods into consts.
- Address a line that exceeded 80 chars.
- Declare keys used across different implementations as consts.
- Fix keys declaration in TransferListener.php - Make use of DIRECTORY_SEPARATOR const instead of hardcoding `/`
- Some implementations using TransferListener were missing the import statement.
Some of the prior comments I "un-resolved" may be related to the new diff view I've been using. let me know if you're seeing any discrepancies
 *
 * @return PromiseInterface
 */
public function download(
Probably the wrong entry point for the comment, but I saw we had recursive directory upload functionality. We should have that for downloads as well per the spec, but I don't think that's a hard requirement
- Make data model final - Refactor test cases to use correct parameters for S3TransferManager APIs
- Fix S3TransferManagerContext.php to use the correct parameters expected by the different APIs exposed by S3TransferManager.
- Refactor some formatting and styling.
- Added empty line at the end of files. - Add a more descriptive documentation in $failsWhenDestinationExists. - Remove unnecessary break line.
- Remove spaces between union type definitions - Add documentation for config parameters in UploadDirectoryRequest. - Add missing new lines in a few places.
- Rename upload and download request args. - Refactor tests to clean up resources specifically in the finally block instead of using cleanUpFns. - Add Throwable and Result type. - Make parameter optional by adding a default value.
- Source should have been provided as null when the bucket and key are already provided as part of the download request args.
- When the checksum type resolves to FULL_OBJECT, the only supported algorithms are those in the CRC family.
Getting close! Another couple of comments
847cda0 to e7842ed
- Fix the checksum parameter in the request: it should have been ChecksumCRC32, not ChecksumCrc32.
- Validate that the next part fetched in a multipart download operation is sequentially correct.
- Fix the checksum calculation in multipart part processing: hash_final should return the raw binary digest, which is then base64 encoded.
- Prevent parts from being uploaded after a failure when running with concurrency. When running concurrently, parts may not be uploaded in order, and a failure in one part was not preventing other parts from being uploaded. This fix prevents any part from being uploaded after a failure has happened.
- Force the multipart_download_type parameter to lowercase, even when provided in upper case.
- Add tests to validate that input fields are copied in the different multipart upload operations.
- Add tests to validate that IfMatch is present in subsequent part or range get requests.
- Add test runners for upload and download for modeled test cases.
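The checksum fix described above (binary digest first, then base64) can be sketched with PHP's incremental hashing functions. The helper name `crc32Base64` and the chunk size are assumptions for illustration; `crc32b` is the PHP hash algorithm that matches the standard CRC32.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: computes a base64-encoded CRC32 checksum over a
 * stream, reading it in chunks so the whole body is never held in memory.
 *
 * @param resource $stream readable stream positioned at the start
 */
function crc32Base64($stream, int $chunkSize = 8192): string
{
    $context = hash_init('crc32b');
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            throw new RuntimeException('Unable to read from stream');
        }
        hash_update($context, $chunk);
    }

    // Passing true yields the raw binary digest. Base64-encoding the
    // default hexadecimal string would produce a different value.
    return base64_encode(hash_final($context, true));
}

$stream = fopen('php://memory', 'w+');
fwrite($stream, 'hello world');
rewind($stream);
echo crc32Base64($stream), PHP_EOL;
fclose($stream);
```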
- Expected part count validation was added, which caused failures in the tests where no part counts were provided.
- Replace rmdir with the clean up directory helper method from the TestUtility implementation.
- Rename Exceptions to Exception
Looking good- a few more nits, but I don't see any reason not to approve once those are addressed
- Remove the full object checksum calculation since it is not recommended.
- Address some styling issues.
- Make some statements multilines. - Add comments describing functionality.
- Fix: exceptions from upload and download directory must be returned, not thrown.
- Return how many objects were transferred and how many failed from upload or download directory operations when there are failures.
- Handle circular folder traversal when following symbolic links.
- Add max_concurrency config for upload and download directory APIs.
- Remove s3_delimiter config parameter from download directory API.
- Add modeled test cases runner for upload and download directory operations.
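One common way to handle circular traversal when following symbolic links is to remember the resolved path of every directory already visited. The sketch below assumes that approach; `listFilesFollowingLinks` is a hypothetical helper and may not match how the PR does it.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: lists files under $dir, following symbolic links
 * but entering each real directory at most once.
 *
 * @param array<string, bool> $visited resolved directories seen so far
 * @return string[] file paths
 */
function listFilesFollowingLinks(string $dir, array &$visited = []): array
{
    $real = realpath($dir);
    if ($real === false || isset($visited[$real])) {
        return []; // unreadable, or already visited (possibly a cycle)
    }
    $visited[$real] = true;

    $files = [];
    foreach (scandir($real) ?: [] as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $path = $real . DIRECTORY_SEPARATOR . $entry;
        if (is_dir($path)) {
            // is_dir follows links, so linked directories recurse too.
            $files = array_merge(
                $files,
                listFilesFollowingLinks($path, $visited)
            );
        } else {
            $files[] = $path;
        }
    }

    return $files;
}
```

A side effect of this design is that a directory reachable through two different links is listed only once, even when no cycle is involved.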
abstract protected function getTotalSize(): int;

/**
 * @param ResultInterface $result
docblock or return type might need an update
What would be the update about?
There should be tests for multipart uploads with custom checksums/checksum algorithms testing different checksum types
if (isset($headers['ChecksumAlgorithm'])) {
    // Checksum injection when expected to succeed at checksum validation
    // This is needed because the checksum in the test is wrong
Which checksums are wrong?
The checksum in the modeled test case.
Here is an example of an expected response defined here:
"response": {
"status": 200,
"headers": {
"Content-Length": "8388608",
"Content-Range": "bytes 0-8388607/8388608",
"PartsCount": 1,
"ETag": "\"a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6\"",
"ChecksumAlgorithm": "CRC32",
"ChecksumCRC32": "abcdef12"
}
What about it is wrong?
 *
 * @return S3Client
 */
private function getS3ClientMock(
All of this is hard to follow- Is there a reason `UsesServiceTrait` can't be used for mocking service clients?
I found this way easier given how I wrote the tests. `getS3ClientMock` just returns a mocked S3 client with a default implementation for the `getCommand` and `executeAsync` methods when they are not provided as parameters.
Maybe I'm misunderstanding- why would we not want `getCommand` and `executeAsync` available on each mock client? `GetTestClient()` on `UsesServiceTrait` implements this functionality as far as I can tell. And sequential results are added by using `AddMockResults()`
 *
 * @return bool True if all expected words are found in order, false otherwise
 */
private function assertEachWordMatchesTheErrorMessage(
Are the error messages so different that `assertEqualsIgnoringCase` and a few trims or some adjustments to our error messages won't work?
This is actually not for handling messages that differ in casing; instead, it checks whether the words of an expected message are found, in order, in the exception message, ignoring any intermediate words.
For example, the following scenario will resolve to true:
// Expected message
$expected = "There was an issue downloading file";
// Message actually received
$gotten = "There was an issue downloading the file `/path/message/` in the operation";
// Then, it will resolve to true
assertEachWordMatchesTheErrorMessage($expected, $gotten);
Why will it resolve to true?
- Because the following words were found in order within the received message:
$expected = ["There", "was", "an", "issue", "downloading", "file"]
$gotten = ["There", "was", "an", "issue", "downloading", "the `ignored`", "file", ....]
Why was this needed?
- Because the expected messages defined in the modeled test cases do not match the messages we throw from those API operations; however, I found that some of them match in that way.
However, if you have a better idea I am open to it.
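The in-order word matching described in this comment could be sketched roughly as follows. `wordsAppearInOrder` is a hypothetical name, and this is not the PR's `assertEachWordMatchesTheErrorMessage`; it only illustrates the subsequence check on whitespace-separated words.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns true when every word of $expected occurs
 * in $actual in the same order, allowing extra words in between.
 */
function wordsAppearInOrder(string $expected, string $actual): bool
{
    $expectedWords = preg_split('/\s+/', $expected, -1, PREG_SPLIT_NO_EMPTY);
    $actualWords = preg_split('/\s+/', $actual, -1, PREG_SPLIT_NO_EMPTY);

    $matched = 0;
    $needed = count($expectedWords);
    foreach ($actualWords as $word) {
        // Advance only when the next expected word is found.
        if ($matched < $needed && $word === $expectedWords[$matched]) {
            $matched++;
        }
    }

    return $matched === $needed;
}

var_dump(wordsAppearInOrder(
    'There was an issue downloading file',
    'There was an issue downloading the file `/path/message/` in the operation'
));
```

Because it accepts any extra words between matches, a check like this is looser than a substring comparison, which is the trade-off the thread is discussing.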
I'd rather see us use a combination of `ExpectException()` and `ExpectExceptionMessage()` (passes on a partial match) when possible, or if we're catching an exception, we can use `AssertStringContainsIgnoringCase`, along with adjusting our exception messages to match the modeled error messages more closely
- Avoid allocating memory for each part to be uploaded; instead, each part is read from the file through a new file handle that starts reading at that part's offset.
- Improve config validation in S3TransferManager.
- Rename MultipartDownloader to AbstractMultipartDownloader.
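A minimal sketch of the offset-based read described above, assuming each part gets its own file handle and is streamed to its destination rather than loaded into a string. `copyPartTo` is a hypothetical helper, not the PR's code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: streams one part of a file into $dest through a
 * dedicated handle, without buffering the whole part in a PHP string.
 *
 * @param resource $dest writable stream that receives the part
 * @return int number of bytes copied
 */
function copyPartTo(string $path, int $offset, int $length, $dest): int
{
    $source = fopen($path, 'rb');
    if ($source === false) {
        throw new RuntimeException("Unable to open {$path}");
    }

    try {
        // Starts at $offset and copies at most $length bytes.
        $copied = stream_copy_to_stream($source, $dest, $length, $offset);
        if ($copied === false) {
            throw new RuntimeException("Unable to read part at {$offset}");
        }
        return $copied;
    } finally {
        fclose($source); // each part owns and closes its own handle
    }
}
```

Since every part opens its own handle, parts can be read concurrently without sharing a file position.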
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.