Adds FutureAdapter that delegates executions to a threadpool for parallelization #1264

skrawcz · 2025-01-01T06:18:45Z

Taking inspiration from #1263, I implemented
a similar adapter to how async works.

We get away with this because we don't encounter
SERDE boundaries.

If you run the example DAG you'll see that:

it is parallelized as it should be
you can use caching and the tracking adapter

Rough edges:

doesn't support DAGs with parallelizable / collect.
haven't tested it with other adapters

Changes

adds adapter that lazily enables parallelism via threadpool executor
docs
example

How I tested this

runs locally using the example

See this image from the tracker showing that things are running in parallel:

Notes - TODO:

unit tests
example against async & comparison.
~~- [ ] example uses materializers & caching to exercise things.~~
check error experience

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

Taking inspiration from #1263, I implemented a similar adapter to how async works. We get away with this because we don't encounter SERDE boundaries. If you run the example DAG you'll see that: 1. it is parallelized as it should be 2. you can use caching and the tracking adapter Rough edges: - haven't tested this extensively, but seems to just work. - need to add tests for it & docs, etc.

elijahbenizzy

Mark as experimental IMO

elijahbenizzy · 2025-01-01T17:58:49Z

hamilton/plugins/h_threadpool.py

+        :return: the result of the execution of the graph.
+        """
+        for k, v in outputs.items():
+            if isinstance(v, Future):


I think you want concurrent.futures.as_completed(...). Can't convince myselfe that we're not deadlocking in certain cases, but I think the fact that we're doing topological order should be good enough...

I don't think this matters here. Since this wont block anything executing in a thread...

It's IMO slightly cleaner, but yeah, it'll not return until the slowest does regardless. Might also not want to mutate the outputs dictionary (copying is cleaner). But yes, nits.

ellipsis-dev

👍 Looks good to me! Reviewed everything up to 037c915 in 1 minute and 12 seconds

More details

Looked at 501 lines of code in 9 files
Skipped 2 files when reviewing.
Skipped posting 7 drafted comments based on config settings.

1. tests/plugins/test_h_threadpool.py:5

Draft comment:
Consider adding tests for edge cases and error handling, such as:
Submitting a function that raises an exception.
Handling of non-Future objects in build_result.
Using FutureAdapter with no max_workers specified.
Reason this comment was not posted:
Confidence changes required: 50%
The test file is missing tests for edge cases and error handling.

2. examples/parallelism/lazy_threadpool_execution/run.py:11

Draft comment:
Typo in filename: "my_funtions.png" should be "my_functions.png".
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
While this is a real typo, it's a very minor issue in an example file. The typo would be immediately obvious when the code runs since the generated file would have a misspelled name. It's not a logic error or code quality issue. According to the rules, we shouldn't make comments about obvious or unimportant issues.
The typo could potentially cause confusion for users following the example, and inconsistent filenames across examples might be worth fixing for documentation purposes.
While consistency is good, this is still a very minor cosmetic issue that would be immediately apparent when running the code, and fixing it isn't critical to the code's functionality.
The comment should be deleted as it points out an obvious and unimportant issue that doesn't affect code functionality or quality.

3. examples/parallelism/lazy_threadpool_execution/run.py:20

Draft comment:
Consider making project_id and username configurable or document that these need to be changed by the user.
Reason this comment was not posted:
Confidence changes required: 50%
The run.py script uses a hardcoded project ID and username, which might not be suitable for all users.

4. examples/parallelism/lazy_threadpool_execution/run.py:17

Draft comment:
Ensure that hamilton_sdk is a required dependency or provide instructions for its installation.
Reason this comment was not posted:
Confidence changes required: 50%
The run.py script imports adapters from hamilton_sdk, which might not be available in all environments.

5. hamilton/plugins/h_threadpool.py:71

Draft comment:
Consider adding exception handling for the function execution to prevent unhandled exceptions from crashing the thread pool.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
ThreadPoolExecutor automatically captures exceptions in Future objects, which can be retrieved via Future.exception(). The code already uses Future.result() which will raise any captured exceptions. Adding additional exception handling would be redundant since the executor already provides this functionality. The comment seems to misunderstand how ThreadPoolExecutor works.
Maybe there are specific types of exceptions that should be handled differently or logged in a particular way for this application.
While custom exception handling could be added, it should be driven by specific requirements rather than general concerns about unhandled exceptions, which are already managed by ThreadPoolExecutor.
The comment should be deleted as it suggests adding unnecessary exception handling that's already provided by ThreadPoolExecutor.

6. examples/parallelism/lazy_threadpool_execution/my_functions.py:4

Draft comment:
Function names like a, b, c, etc., are not descriptive. Consider using more descriptive names to improve code readability. This applies to all similar functions in this file and in my_functions_async.py.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
Since this is in an examples/parallelism directory, these functions appear to be deliberately simplified to demonstrate a concept about parallel execution. The single-letter names might actually be beneficial here to keep the example focused on the parallelism patterns rather than domain-specific logic. The file name and directory structure suggest this is educational/example code rather than production code.
The comment raises a valid point about code readability in general, but I might be overlooking the pedagogical value of keeping example code extremely simple.
In example code demonstrating technical concepts like parallelism, overly descriptive names could actually detract from the learning objective by adding unnecessary complexity.
Delete the comment. While descriptive names are generally good practice, in this specific context of example code demonstrating parallelism concepts, the simple single-letter names serve a valid pedagogical purpose.

7. examples/parallelism/lazy_threadpool_execution/run.py:20

Draft comment:
Avoid hardcoding sensitive information like project_id and username. Consider using environment variables or configuration files to manage such data securely. This applies to all similar instances in this file.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
While hardcoded credentials are generally a security concern, this is in an examples directory and appears to be demo code. The project_id is just "21" and has a comment "modify this as needed". The email address appears to be a real email but is likely just for demonstration. Example code often includes placeholder values to show the expected format.
The comment raises a valid security best practice. Hardcoded credentials could be copied by users into production code.
Since this is clearly example code and the values are meant to be replaced, adding configuration management would make the example more complex and harder to understand.
Delete the comment. While the security practice is valid, this is example code where simple placeholder values are more appropriate for demonstration purposes.

Workflow ID: wflow_jgnKlIGIxZMkXrlc

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on cecc050 in 24 seconds

More details

Looked at 30 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. examples/parallelism/lazy_threadpool_execution/run.py:11

Draft comment:
There's a typo in the filename. It should be my_functions.png instead of my_funtions.png.
Reason this comment was not posted:
Comment looked like it was already resolved.

Workflow ID: wflow_7kOHRbqPlOa0bhhM

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 4ac52f1 in 4 seconds

More details

Looked at 9 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. examples/parallelism/lazy_threadpool_execution/requirements.txt:1

Draft comment:
It's recommended to specify exact versions for dependencies to ensure reproducibility.
Reason this comment was not posted:
Confidence changes required: 50%
The requirements file should specify exact versions to ensure reproducibility.

2. examples/parallelism/lazy_threadpool_execution/requirements.txt:3

Draft comment:
Remove the empty line at the end of the file for consistency.
Reason this comment was not posted:
Confidence changes required: 10%
The requirements file should not have an empty line at the end. This is a minor issue but should be corrected for consistency.

Workflow ID: wflow_TnxeLrcDmtuh7gwY

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

So that people can adjust the result accordingly.

ellipsis-dev

👍 Looks good to me! Incremental review on 4f136d4 in 45 seconds

More details

Looked at 133 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 6 drafted comments based on config settings.

1. tests/plugins/test_h_threadpool.py:4

Draft comment:
The import from hamilton import lifecycle is unused and can be removed to clean up the code.
Reason this comment was not posted:
Confidence changes required: 10%
The import statement for lifecycle is not used in the test file. Removing it will clean up the code.

2. hamilton/plugins/h_threadpool.py:72

Draft comment:
Returning [Any] from input_types is not very informative. Consider returning an empty list or a more specific type if applicable.
Reason this comment was not posted:
Confidence changes required: 50%
The input_types method in FutureAdapter returns a list with Any, which is not very informative. It might be better to return an empty list or a more specific type if possible.

3. hamilton/plugins/h_threadpool.py:80

Draft comment:
Returning Any from output_type is not very informative. Consider returning a more specific type if applicable.
Reason this comment was not posted:
Confidence changes required: 50%
The output_type method in FutureAdapter returns Any when result_builder is not provided. It might be better to return a more specific type if possible.

4. hamilton/plugins/h_threadpool.py:82

Draft comment:
The do_remote_execute method in FutureAdapter is not following the single responsibility principle as it both submits a function to the executor and wraps it with _new_fn. Consider splitting this into two methods for clarity.
Reason this comment was not posted:
Comment was on unchanged code.

5. hamilton/plugins/h_threadpool.py:48

Draft comment:
The FutureAdapter class is handling two responsibilities: managing thread execution and building results. Consider separating these concerns to adhere to the Single Responsibility Principle.
Reason this comment was not posted:
Comment was not on a valid diff hunk.

6. hamilton/plugins/h_threadpool.py:64

Draft comment:
The methods input_types and output_type in FutureAdapter could have more descriptive names to better convey their purpose.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:

These method names are part of an interface (ResultBuilder) implementation. 2. The names are clear and descriptive for their purpose. 3. Changing interface method names would break compatibility. 4. The methods have good docstrings explaining their purpose. 5. The names follow standard Python type annotation patterns.
The names could potentially be more specific about their role in the FutureAdapter context. However, they're interface methods.
Interface method names should be consistent across implementations, and these names are already clear and well-documented.
The comment should be deleted as it suggests renaming interface methods that are already well-named and documented.

Workflow ID: wflow_PfXblcHcDhyI1ynw

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

skrawcz mentioned this pull request Jan 1, 2025

Caching for async driver #1263

Open

skrawcz requested a review from elijahbenizzy January 1, 2025 06:26

elijahbenizzy approved these changes Jan 1, 2025

View reviewed changes

skrawcz added 2 commits January 1, 2025 21:22

Adds docs, tests, and example

f72bfb6

Fix file name

037c915

skrawcz marked this pull request as ready for review January 2, 2025 05:37

skrawcz changed the title ~~Adds lazy threadpool DAG parallelization~~ Adds FutureAdapter that delegates executions to a threadpool for parallelization Jan 2, 2025

ellipsis-dev bot reviewed Jan 2, 2025

View reviewed changes

Fixes example image

cecc050

ellipsis-dev bot reviewed Jan 2, 2025

View reviewed changes

adds requirements.txt

4ac52f1

ellipsis-dev bot reviewed Jan 2, 2025

View reviewed changes

Adds wrapping a result builder

4f136d4

So that people can adjust the result accordingly.

ellipsis-dev bot reviewed Jan 2, 2025

View reviewed changes

elijahbenizzy merged commit 161b2a6 into main Jan 3, 2025
24 checks passed

elijahbenizzy deleted the add_lazy_threadpool_execution branch January 3, 2025 17:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds FutureAdapter that delegates executions to a threadpool for parallelization #1264

Adds FutureAdapter that delegates executions to a threadpool for parallelization #1264

skrawcz commented Jan 1, 2025 •

edited

Loading

elijahbenizzy left a comment

elijahbenizzy Jan 1, 2025

skrawcz Jan 2, 2025 •

edited

Loading

elijahbenizzy Jan 2, 2025 •

edited

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

Adds FutureAdapter that delegates executions to a threadpool for parallelization #1264

Adds FutureAdapter that delegates executions to a threadpool for parallelization #1264

Conversation

skrawcz commented Jan 1, 2025 • edited Loading

Changes

How I tested this

Notes - TODO:

Checklist

elijahbenizzy left a comment

Choose a reason for hiding this comment

elijahbenizzy Jan 1, 2025

Choose a reason for hiding this comment

skrawcz Jan 2, 2025 • edited Loading

Choose a reason for hiding this comment

elijahbenizzy Jan 2, 2025 • edited Loading

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

skrawcz commented Jan 1, 2025 •

edited

Loading

skrawcz Jan 2, 2025 •

edited

Loading

elijahbenizzy Jan 2, 2025 •

edited

Loading