Commit
[1.9] AssetKey path as a tuple (dagster-io#25240)
Storing `path` as a tuple and avoiding custom `__eq__` and `__hash__` functions results in a substantial performance improvement for operations like building up a large global asset graph.

## How I Tested These Changes

For this target large graph, function calls decreased by ~70% and execution time decreased by ~50%.

Before:

```
Profiling asset graph...
         353461 function calls in 0.083 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   119166    0.021    0.000    0.027    0.000 asset_key.py:56(__hash__)
        1    0.012    0.012    0.080    0.080 remote_asset_graph.py:303(_build)
        1    0.008    0.008    0.018    0.018 remote_asset_graph.py:355(<dictcomp>)
   119166    0.006    0.000    0.006    0.000 {built-in method builtins.hash}
        1    0.005    0.005    0.012    0.012 remote_asset_graph.py:502(_build_execution_set_index)
     7790    0.003    0.000    0.006    0.000 external_data.py:1313(key)
        2    0.003    0.002    0.006    0.003 remote_asset_graph.py:481(_warn_on_duplicates_within_subset)
    17250    0.003    0.000    0.007    0.000 {method 'add' of 'set' objects}
        1    0.002    0.002    0.083    0.083 remote_asset_graph.py:276(from_workspace_snapshot)
     5012    0.002    0.000    0.004    0.000 remote_asset_graph.py:51(__init__)
     9598    0.002    0.000    0.002    0.000 asset_key.py:59(__eq__)
        1    0.002    0.002    0.003    0.003 remote_asset_graph.py:330(<dictcomp>)
     5012    0.002    0.000    0.002    0.000 remote_asset_graph.py:64(<listcomp>)
        1    0.001    0.001    0.003    0.003 remote_asset_graph.py:329(<dictcomp>)
     7790    0.001    0.000    0.002    0.000 <string>:1(<lambda>)
        1    0.001    0.001    0.002    0.002 remote_asset_graph.py:350(<dictcomp>)
        1    0.001    0.001    0.008    0.008 remote_asset_graph.py:455(_warn_on_duplicate_nodes)
     7798    0.001    0.000    0.001    0.000 {built-in method __new__ of type object at 0x10320bcb0}
        1    0.001    0.001    0.002    0.002 remote_asset_graph.py:328(<setcomp>)
    21747    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
    19842    0.001    0.000    0.001    0.000 {built-in method builtins.isinstance}
        1    0.001    0.001    0.001    0.001 remote_asset_graph.py:256(<dictcomp>)
        2    0.001    0.000    0.001    0.000 remote_asset_graph.py:489(<dictcomp>)
...
```

After:

```
Profiling asset graph...
         105531 function calls in 0.043 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.009    0.009    0.040    0.040 remote_asset_graph.py:303(_build)
        1    0.006    0.006    0.010    0.010 remote_asset_graph.py:355(<dictcomp>)
        1    0.004    0.004    0.006    0.006 remote_asset_graph.py:502(_build_execution_set_index)
        2    0.003    0.001    0.004    0.002 remote_asset_graph.py:481(_warn_on_duplicates_within_subset)
     7790    0.003    0.000    0.004    0.000 external_data.py:1313(key)
     5012    0.002    0.000    0.002    0.000 remote_asset_graph.py:64(<listcomp>)
        1    0.002    0.002    0.043    0.043 remote_asset_graph.py:276(from_workspace_snapshot)
     5012    0.002    0.000    0.004    0.000 remote_asset_graph.py:51(__init__)
    17250    0.002    0.000    0.002    0.000 {method 'add' of 'set' objects}
        1    0.001    0.001    0.001    0.001 remote_asset_graph.py:329(<dictcomp>)
        1    0.001    0.001    0.005    0.005 remote_asset_graph.py:455(_warn_on_duplicate_nodes)
        1    0.001    0.001    0.001    0.001 remote_asset_graph.py:330(<dictcomp>)
     7790    0.001    0.000    0.002    0.000 <string>:1(<lambda>)
        1    0.001    0.001    0.001    0.001 remote_asset_graph.py:350(<dictcomp>)
     7798    0.001    0.000    0.001    0.000 {built-in method __new__ of type object at 0x1050b3cb0}
    21747    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
    19842    0.001    0.000    0.001    0.000 {built-in method builtins.isinstance}
        1    0.001    0.001    0.001    0.001 remote_asset_graph.py:256(<dictcomp>)
        2    0.001    0.000    0.001    0.000 remote_asset_graph.py:489(<dictcomp>)
        1    0.001    0.001    0.001    0.001 remote_asset_graph.py:328(<setcomp>)
...
```

## Changelog

[breaking] `AssetKey` can no longer be iterated over or indexed into. This behavior was never an intended access pattern and in all observed cases was a mistake.
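
To illustrate why dropping the custom `__hash__` and `__eq__` removes so many function calls, here is a minimal sketch. It is not the actual `AssetKey` implementation: `ListPathKey` and `TuplePathKey` are hypothetical stand-ins, and the key paths are made up. The list-backed class has to hash through a Python-level method on every dict or set operation, while the tuple-backed class inherits hashing and equality from `tuple`, so no Python function is called.

```python
from typing import NamedTuple, Sequence, Tuple


class ListPathKey:
    """Hypothetical 'before' shape: a list-backed path needs hand-written hashing."""

    hash_calls = 0  # counts Python-level __hash__ invocations, for illustration only

    def __init__(self, path: Sequence[str]) -> None:
        self.path = list(path)

    def __hash__(self) -> int:
        ListPathKey.hash_calls += 1
        return hash(tuple(self.path))  # rebuilds a tuple on every hash

    def __eq__(self, other: object) -> bool:
        return isinstance(other, ListPathKey) and self.path == other.path


class TuplePathKey(NamedTuple):
    """Hypothetical 'after' shape: hashing and equality are inherited from tuple."""

    path: Tuple[str, ...]


# Made-up key paths standing in for a large asset graph.
paths = [("warehouse", f"table_{i}") for i in range(1000)]

list_keys = [ListPathKey(p) for p in paths]
tuple_keys = [TuplePathKey(p) for p in paths]

# Building and probing a dict hashes each key once per operation; only the
# list-backed class pays for a Python function call each time.
list_index = {k: i for i, k in enumerate(list_keys)}
tuple_index = {k: i for i, k in enumerate(tuple_keys)}

found_list = sum(1 for p in paths if ListPathKey(p) in list_index)
found_tuple = sum(1 for p in paths if TuplePathKey(p) in tuple_index)

print(found_list, found_tuple, ListPathKey.hash_calls)
```

Both variants find all 1000 keys, but the list-backed class records one Python-level `__hash__` call per insert and per lookup. Note that a plain `NamedTuple` is still iterable and indexable, so this sketch does not reproduce the breaking change described in the changelog; it only demonstrates the hashing cost.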