[DO NOT MERGE] TransferManager: DirectoryUploader & DirectoryDownloader #3288

jterapin · 2025-08-29T19:58:26Z

Experimental prototype

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

To make sure we include your contribution in the release notes, please make sure to add description entry for your changes in the "unreleased changes" section of the CHANGELOG.md file (at corresponding gem). For the description entry, please make sure it lives in one line and starts with Feature or Issue in the correct format.
For generated code changes, please checkout below instructions first:
https://github.com/aws/aws-sdk-ruby/blob/version-3/CONTRIBUTING.md

Thank you for your contribution!

github-actions · 2025-09-09T17:46:53Z

Detected 1 possible performance regressions:

aws-sdk-s3.gem_size_kb - z-score regression: 513.71 -> 517.5. Z-score: 15.04

mullermp

I think this is on the right path but we should break this down into chunks. I think you should first refactor file uploader/downloader to use the executor, and we should release that, then add directory upload/download after that.

mullermp · 2025-09-25T16:42:38Z

gems/aws-sdk-s3/lib/aws-sdk-s3/default_executor.rb

+        @queue = Queue.new
+        @max_threads = options[:max_threads] || 10
+        @pool = []
+        @running = true


Does running state default to true in the initializer for other implementations of this in ruby-concurrency?

Eventually yes - concurrent-ruby's implementation goes a bit deeper but I simplified our implementation. Links below (most common executors eventually inherits RubyExecutorService:

AbstractExecutorService: https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent-ruby/concurrent/executor/abstract_executor_service.rb#L52-L54

RubyExecutorService: https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent-ruby/concurrent/executor/ruby_executor_service.rb#L70-L72

SimpleExecutorService: https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent-ruby/concurrent/executor/simple_executor_service.rb#L98

mullermp · 2025-09-25T16:43:59Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+module Aws
+  module S3
+    # Raised when DirectoryDownloader fails to download objects from S3 bucket
+    class DirectoryDownloadError < StandardError


By convention we were putting these in separate files right? If you want to promote the other two (multipart errors) to the files where they are used that's fine too, but let's stay consistent.

Yup, I'm planning to separate them out.

mullermp · 2025-09-25T16:45:52Z

gems/aws-sdk-s3/lib/aws-sdk-s3/multipart_file_uploader.rb

      # @option options [Integer] :thread_count (DEFAULT_THREAD_COUNT)
      def initialize(options = {})
        @client = options[:client] || Client.new
        @thread_count = options[:thread_count] || DEFAULT_THREAD_COUNT


The thread count wouldn't matter in this class anymore right?

mullermp · 2025-09-25T16:47:06Z

gems/aws-sdk-s3/lib/aws-sdk-s3/multipart_file_uploader.rb

+        parts = upload_parts(upload_id, source, file_size, options)
+        complete_upload(upload_id, parts, file_size, options)
+      ensure
+        @executor.shutdown if @executor.running? && @options[:executor].nil?


This seems odd. I would think the executor always exists because we provide a default - if nil we still provide a default. Also you would check if it were nil before you attempt to call a method on it?

Yeah, I was still trying to figure out executor but I have streamlined the executor flow a bit.

mullermp · 2025-09-25T16:49:36Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        end
+
+        download_opts = options.dup
+        @bucket = bucket


It's odd to set instance state like bucket in a method. Wouldn't you pass bucket, errors, configuration, etc, down to relevant methods and return back errors?

Yeah, I'd agree. I'm not against this being a "one-shot" class - IE, for each directory download, you must create this object again - in which case bucket/destination would be set on initialization instead.

Either way though - setting it as a member variable here is a little weird.

I streamlined a bit here so there's less of floating instance states

mullermp · 2025-09-25T16:50:05Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        downloads = process_download_queue(producer, downloader, download_opts)
+        build_result(downloads)
+      ensure
+        @executor.shutdown unless @options[:executor]


We should always assume an executor I think.

Since this is an internal/private api - I think I'd agree that its reasonable to always require an executor to be provided, and then we don't ever shut it down

gems/aws-sdk-s3/lib/aws-sdk-s3/default_executor.rb

alextwoods · 2025-09-25T17:45:38Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        downloads = process_download_queue(producer, downloader, download_opts)
+        build_result(downloads)
+      ensure
+        @executor.shutdown unless @options[:executor]


Since this is an internal/private api - I think I'd agree that its reasonable to always require an executor to be provided, and then we don't ever shut it down

alextwoods · 2025-09-25T17:47:38Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        end
+
+        download_opts = options.dup
+        @bucket = bucket


Yeah, I'd agree. I'm not against this being a "one-shot" class - IE, for each directory download, you must create this object again - in which case bucket/destination would be set on initialization instead.

Either way though - setting it as a member variable here is a little weird.

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

alextwoods · 2025-09-25T18:36:55Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        download_attempts = 0
+        completion_queue = Queue.new
+        queue_executor = DefaultExecutor.new
+        while (object = producer.object_queue.shift) != :done


Suggestion on the Producer interface - I think the object_queue should be an internal detail here. I'd lean towards having this implement enumerable, so here you would just do:

producer.each do |object| break if @abord_download # rest of code end

calling each on the producer would start it (so no need to call run) and would handle yielding objects from the internal object_queue.

Thanks for the feedback! I made some changes on the interface and I really like it! It flows pretty well :)

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_uploader.rb

alextwoods · 2025-09-25T18:45:01Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_uploader.rb

+
+        upload_opts = options.dup
+        @ignore_failure = upload_opts.delete(:ignore_failure) || false
+        @errors = []


Same comment as on the DirectoryDownloader - I think we need to decide if these classes are "one shot" or re-usable. If they are re-usable, then we cannot set state on the class.

Because the File uploader/downloader are NOT one shot, I guess I'd lean towards consistency here and having a DirectoryUploader that is re-useable and then I think the easiest way to manage this is to create a private class DirectoryUpload which is one-shot (ie, represents the state of a single directory upload.

As part of this - I think we would move management of the queue executor to the top level object- that is, the same queue executor would be shared across any concurrent DirectoryUploads (so that effectively a configured max_concurrent_file_uploads setting would apply across all concurrent uploads started from the same DirectoryUploader).

I made some heavy changes around this. I'm not sure if I want to abstract out one-shot DirectoryUpload yet. 🤔 open to any thoughts after you take a look at the revisions!

alextwoods · 2025-09-25T18:48:54Z

gems/aws-sdk-s3/lib/aws-sdk-s3/file_downloader.rb

+        File.rename(opts[:temp_path], destination) if opts[:temp_path]
      ensure
-        File.delete(@temp_path) if @temp_path && File.exist?(@temp_path)
+        @executor.shutdown if @executor.running? && @options[:executor].nil?


I think there is an issue with this - the FileDowloader can be used to download multiple files right (ie, without creating a new one)? so here, if we shutdown the executor on the first download call, we'll never be able to call download again.

I think we'd probably want to call shutdown only when the downloader is GC'ed (and maybe at that point, we would just call kill on it instead?)

After thinking a bit through - given that executor is always passed in at init - i think we can omit this/shutdown outside of the FileDownloader (at Object/TM level).

jterapin and others added 12 commits August 27, 2025 08:44

Initial setup

6cb37fe

Minor adjustments

d31ae4d

Directory downloader impl

74ed189

Directory uploader impl

4e8db17

Merge branch 'version-3' into tm-directory-features

098049d

Merge branch 'version-3' into tm-directory-features

8f387d2

Add default executor

7749ba5

Add running check to default executor

99f0de6

Refactor MultipartFileUploader with executor

441fa82

Fix typo in MultipartFileUploader

c792439

Update TM upload file with executor

adce496

Merge branch 'version-3' into tm-directory-features

012c2bc

jterapin added 17 commits September 10, 2025 12:50

Merge branch 'version-3' into tm-directory-features

ee9c9da

Merge from version-3

75df844

Merge branch 'version-3' into tm-directory-features

e5d3245

Merge branch 'version-3' into tm-directory-features

173f5e4

Merge branch 'version-3' into tm-directory-features

cf88ff2

Update to only spawn workers when needed

2758c4d

Update directory uploader

b92d3b3

Update directory uploader

6afb495

Update uploader

86b53e8

Merge branch 'version-3' into tm-directory-features

d587ae1

Add minor improvements to directory uploader

eae3814

Merge branch 'version-3' into tm-directory-features

14010ef

Fix specs

8ab4edc

Minor updates to multipart file uploader

face84d

Minor refactors

36a1e87

Fix options

7dd9f98

Refactor DirectoryUploader

77ab1ba

jterapin added 5 commits September 24, 2025 15:52

Merge version-3 into branch

e843137

Update multipartfileuploader

009127d

Refactor FileDownloader

39912fd

Implement Directory Downloader

f9fb117

Add TODO

d307555

mullermp reviewed Sep 25, 2025

View reviewed changes

alextwoods reviewed Sep 25, 2025

View reviewed changes

jterapin added 15 commits September 29, 2025 08:14

Merge version-3 into branch

a14649a

Feedback - update default executor

b9231e7

Refactor file downloader

d991128

Support FileDownloader changes

bc533a0

Extra updates to FileDownloader

9efc77f

Address feedback for FileUploader and MultipartFileUploader

1cc3fcf

Merge branch 'version-3' into tm-directory-features

7b6b220

Add improvements to directory uploader

45d2f5d

Update DirectoryDownloader based on feedbacks

64d481e

Minor feedback updates

2ab63fb

Merge branch 'version-3' into tm-directory-features

7af3e32

Update executor

747965f

Improve Directory Uploader

2230478

Handle failure cases correctly

cb145a0

Improve Executor

0cb35cd

[DO NOT MERGE] TransferManager: DirectoryUploader & DirectoryDownloader #3288

Are you sure you want to change the base?

[DO NOT MERGE] TransferManager: DirectoryUploader & DirectoryDownloader #3288

Uh oh!

Conversation

jterapin commented Aug 29, 2025

Uh oh!

github-actions bot commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mullermp left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Sep 9, 2025 •

edited

Loading