Nothing super specific here, but wanted to brain dump and get a broader discussion going.
As part of my CMIP work, my recipes often download many files from sometimes slow servers. This takes a very long time and frequently scales up to many workers, which increases cost.
Looking at the Dataflow resource metrics, it seems like one worker is spun up per file. There is an initial spike in CPU usage, but after that the worker mostly sits idle.
Can we maybe modify the level of concurrency here and have one worker download/cache multiple files via threads to improve performance and/or save costs?
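The threaded-download idea above could look roughly like the sketch below. This is only an illustration, not the recipes' actual caching code: `fetch` and `fetch_all` are hypothetical names, and a real recipe would replace the stub body of `fetch` with its fsspec-based open/cache step. The point is just that one worker process can overlap many slow network transfers with a thread pool instead of leaving one file per worker.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch(url: str) -> bytes:
    # Stand-in for the real download/cache step; in a recipe this would be
    # something like opening the URL with fsspec and writing to the cache.
    time.sleep(0.01)  # simulate a slow remote server
    return f"contents of {url}".encode()


def fetch_all(urls, max_workers=8):
    # One worker process handles many files: the threads spend most of
    # their time waiting on the network, so the otherwise-idle CPU can
    # drive several transfers at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))


urls = [f"https://example.com/cmip/file_{i}.nc" for i in range(16)]
results = fetch_all(urls)
```

With 8 threads and 16 URLs the simulated downloads complete in roughly two "round trips" instead of sixteen; tuning `max_workers` per worker would be the knob for trading cost against throughput.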
Perhaps something to chat about on Thu @ranchodeluxe @moradology ?