Commit 5de07ac

Add example for get_cache_data() (#625)
* Add example for get_cache_data()
* fix minimal
* fix list of tools
* hid list
* add list step by step
* add to list
* ignore
* add strings
* extend type hint
* fix test to increase coverage
1 parent 55286eb commit 5de07ac

3 files changed: +147 -54 lines changed

executorlib/__init__.py

Lines changed: 10 additions & 2 deletions
@@ -9,11 +9,19 @@
     SlurmJobExecutor,
 )
 
-__version__ = _get_versions()["version"]
-__all__: list = [
+__all__: list[str] = [
     "FluxJobExecutor",
     "FluxClusterExecutor",
     "SingleNodeExecutor",
     "SlurmJobExecutor",
     "SlurmClusterExecutor",
 ]
+
+try:
+    from executorlib.standalone.hdf import get_cache_data
+except ImportError:
+    pass
+else:
+    __all__ += ["get_cache_data"]
+
+__version__ = _get_versions()["version"]

notebooks/1-single-node.ipynb

Lines changed: 136 additions & 51 deletions
@@ -26,13 +26,17 @@
 "id": "b1907f12-7378-423b-9b83-1b65fc0a20f5",
 "metadata": {},
 "outputs": [],
-"source": "from executorlib import SingleNodeExecutor"
+"source": [
+"from executorlib import SingleNodeExecutor"
+]
 },
 {
 "cell_type": "markdown",
 "id": "1654679f-38b3-4699-9bfe-b48cbde0b2db",
 "metadata": {},
-"source": "It is recommended to use the `SingleNodeExecutor` class in combination with a `with`-statement. This guarantees the processes created by the `SingleNodeExecutor` class to evaluate the Python functions are afterward closed and do not remain ghost processes. A function is then submitted using the `submit(fn, /, *args, **kwargs)` function which executes a given function `fn` as `fn(*args, **kwargs)`. The `submit()` function returns a [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object, as defined by the Python Standard Library. As a first example we submit the function `sum()` to calculate the sum of the list `[1, 1]`:"
+"source": [
+"It is recommended to use the `SingleNodeExecutor` class in combination with a `with`-statement. This guarantees the processes created by the `SingleNodeExecutor` class to evaluate the Python functions are afterward closed and do not remain ghost processes. A function is then submitted using the `submit(fn, /, *args, **kwargs)` function which executes a given function `fn` as `fn(*args, **kwargs)`. The `submit()` function returns a [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object, as defined by the Python Standard Library. As a first example we submit the function `sum()` to calculate the sum of the list `[1, 1]`:"
+]
 },
 {
 "cell_type": "code",
@@ -45,8 +49,8 @@
 "output_type": "stream",
 "text": [
 "2\n",
-"CPU times: user 100 ms, sys: 70.7 ms, total: 171 ms\n",
-"Wall time: 1.94 s\n"
+"CPU times: user 84.4 ms, sys: 59.3 ms, total: 144 ms\n",
+"Wall time: 482 ms\n"
 ]
 }
 ],
@@ -61,7 +65,9 @@
 "cell_type": "markdown",
 "id": "a1109584-9db2-4f9d-b3ed-494d96241396",
 "metadata": {},
-"source": "As expected the result of the summation `sum([1, 1])` is `2`. The same result is retrieved from the [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object received from the submission of the `sum()` as it is printed here `print(future.result())`. For most Python functions and especially the `sum()` function it is computationally not efficient to initialize the `SingleNodeExecutor` class only for the execution of a single function call, rather it is more computationally efficient to initialize the `SingleNodeExecutor` class once and then submit a number of functions. This can be achieved with a loop. For example the sum of the pairs `[2, 2]`, `[3, 3]` and `[4, 4]` can be achieved with a for-loop inside the context of the `SingleNodeExecutor()` class as provided by the `with`-statement."
+"source": [
+"As expected the result of the summation `sum([1, 1])` is `2`. The same result is retrieved from the [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object received from the submission of the `sum()` as it is printed here `print(future.result())`. For most Python functions and especially the `sum()` function it is computationally not efficient to initialize the `SingleNodeExecutor` class only for the execution of a single function call, rather it is more computationally efficient to initialize the `SingleNodeExecutor` class once and then submit a number of functions. This can be achieved with a loop. For example the sum of the pairs `[2, 2]`, `[3, 3]` and `[4, 4]` can be achieved with a for-loop inside the context of the `SingleNodeExecutor()` class as provided by the `with`-statement."
+]
 },
 {
 "cell_type": "code",
@@ -74,8 +80,8 @@
 "output_type": "stream",
 "text": [
 "[4, 6, 8]\n",
-"CPU times: user 49.4 ms, sys: 29.2 ms, total: 78.7 ms\n",
-"Wall time: 1.75 s\n"
+"CPU times: user 39.7 ms, sys: 26.8 ms, total: 66.5 ms\n",
+"Wall time: 524 ms\n"
 ]
 }
 ],
@@ -105,8 +111,8 @@
 "output_type": "stream",
 "text": [
 "[10, 12, 14]\n",
-"CPU times: user 40.5 ms, sys: 28.1 ms, total: 68.6 ms\n",
-"Wall time: 1.09 s\n"
+"CPU times: user 28 ms, sys: 23.1 ms, total: 51.1 ms\n",
+"Wall time: 517 ms\n"
 ]
 }
 ],
@@ -121,7 +127,9 @@
 "cell_type": "markdown",
 "id": "ac86bf47-4eb6-4d7c-acae-760b880803a8",
 "metadata": {},
-"source": "These three examples cover the general functionality of the `SingleNodeExecutor` class. Following the [Executor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor) interface as it is defined in the Python standard library."
+"source": [
+"These three examples cover the general functionality of the `SingleNodeExecutor` class. Following the [Executor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor) interface as it is defined in the Python standard library."
+]
 },
 {
 "cell_type": "markdown",
@@ -349,8 +357,8 @@
 "output_type": "stream",
 "text": [
 "2\n",
-"CPU times: user 37.1 ms, sys: 21.8 ms, total: 58.9 ms\n",
-"Wall time: 1.09 s\n"
+"CPU times: user 31.1 ms, sys: 19.1 ms, total: 50.1 ms\n",
+"Wall time: 394 ms\n"
 ]
 }
 ],
@@ -388,7 +396,9 @@
 "cell_type": "markdown",
 "id": "9e1212c4-e3fb-4e21-be43-0a4f0a08b856",
 "metadata": {},
-"source": "Still the resource dictionary parameter can still be set during the initialisation of the `SingleNodeExecutor` class. Internally, this groups the created Python processes in fixed allocations and afterwards submit Python functions to these allocations."
+"source": [
+"Still the resource dictionary parameter can still be set during the initialisation of the `SingleNodeExecutor` class. Internally, this groups the created Python processes in fixed allocations and afterwards submit Python functions to these allocations."
+]
 },
 {
 "cell_type": "code",
@@ -413,34 +423,7 @@
 "experience performance degradation.\n",
 "\n",
 " Local host: MacBook-Pro.local\n",
-" System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22031/1/vader_segment.MacBook-Pro.501.17620001.1\n",
-" Error: No such file or directory (errno 2)\n",
-"--------------------------------------------------------------------------\n",
-"--------------------------------------------------------------------------\n",
-"A system call failed during shared memory initialization that should\n",
-"not have. It is likely that your MPI job will now either abort or\n",
-"experience performance degradation.\n",
-"\n",
-" Local host: MacBook-Pro.local\n",
-" System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22028/1/vader_segment.MacBook-Pro.501.17610001.1\n",
-" Error: No such file or directory (errno 2)\n",
-"--------------------------------------------------------------------------\n",
-"--------------------------------------------------------------------------\n",
-"A system call failed during shared memory initialization that should\n",
-"not have. It is likely that your MPI job will now either abort or\n",
-"experience performance degradation.\n",
-"\n",
-" Local host: MacBook-Pro.local\n",
-" System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22030/1/vader_segment.MacBook-Pro.501.17630001.1\n",
-" Error: No such file or directory (errno 2)\n",
-"--------------------------------------------------------------------------\n",
-"--------------------------------------------------------------------------\n",
-"A system call failed during shared memory initialization that should\n",
-"not have. It is likely that your MPI job will now either abort or\n",
-"experience performance degradation.\n",
-"\n",
-" Local host: MacBook-Pro.local\n",
-" System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22029/1/vader_segment.MacBook-Pro.501.17600001.1\n",
+" System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.55070/1/vader_segment.MacBook-Pro.501.96730001.1\n",
 " Error: No such file or directory (errno 2)\n",
 "--------------------------------------------------------------------------\n"
 ]
@@ -486,7 +469,9 @@
 "cell_type": "markdown",
 "id": "d07cf107-3627-4cb0-906c-647497d6e0d2",
 "metadata": {},
-"source": "The function `calc_with_preload()` requires three inputs `i`, `j` and `k`. But when the function is submitted to the executor only two inputs are provided `fs = exe.submit(calc, 2, j=5)`. In this case the first input parameter is mapped to `i=2`, the second input parameter is specified explicitly `j=5` but the third input parameter `k` is not provided. So the `SingleNodeExecutor` automatically checks the keys set in the `init_function()` function. In this case the returned dictionary `{\"j\": 4, \"k\": 3, \"l\": 2}` defines `j=4`, `k=3` and `l=2`. For this specific call of the `calc_with_preload()` function, `i` and `j` are already provided so `j` is not required, but `k=3` is used from the `init_function()` and as the `calc_with_preload()` function does not define the `l` parameter this one is also ignored."
+"source": [
+"The function `calc_with_preload()` requires three inputs `i`, `j` and `k`. But when the function is submitted to the executor only two inputs are provided `fs = exe.submit(calc, 2, j=5)`. In this case the first input parameter is mapped to `i=2`, the second input parameter is specified explicitly `j=5` but the third input parameter `k` is not provided. So the `SingleNodeExecutor` automatically checks the keys set in the `init_function()` function. In this case the returned dictionary `{\"j\": 4, \"k\": 3, \"l\": 2}` defines `j=4`, `k=3` and `l=2`. For this specific call of the `calc_with_preload()` function, `i` and `j` are already provided so `j` is not required, but `k=3` is used from the `init_function()` and as the `calc_with_preload()` function does not define the `l` parameter this one is also ignored."
+]
 },
 {
 "cell_type": "code",
@@ -538,8 +523,8 @@
 "output_type": "stream",
 "text": [
 "[2, 4, 6]\n",
-"CPU times: user 547 ms, sys: 161 ms, total: 708 ms\n",
-"Wall time: 1.33 s\n"
+"CPU times: user 512 ms, sys: 138 ms, total: 650 ms\n",
+"Wall time: 865 ms\n"
 ]
 }
 ],
@@ -571,8 +556,8 @@
 "output_type": "stream",
 "text": [
 "[2, 4, 6]\n",
-"CPU times: user 52.1 ms, sys: 41.1 ms, total: 93.2 ms\n",
-"Wall time: 1.13 s\n"
+"CPU times: user 56.7 ms, sys: 32.5 ms, total: 89.2 ms\n",
+"Wall time: 620 ms\n"
 ]
 }
 ],
@@ -583,6 +568,106 @@
 " print([f.result() for f in future_lst])"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "5144a035-633e-4e60-a362-f3b15b28848b",
+"metadata": {},
+"source": [
+"An additional advantage of the cache is the option to gather the results of previously submitted functions. Using the `get_cache_data()` function the results of each Python function is converted to a dictionary. This list of dictionaries can be converted to a `pandas.DataFrame` for further processing:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 19,
+"id": "f574b9e1-de55-4e38-aef7-a4bed540e040",
+"metadata": {},
+"outputs": [
+{
+"data": {
+"text/html": [
+"<div>\n",
+"<style scoped>\n",
+" .dataframe tbody tr th:only-of-type {\n",
+" vertical-align: middle;\n",
+" }\n",
+"\n",
+" .dataframe tbody tr th {\n",
+" vertical-align: top;\n",
+" }\n",
+"\n",
+" .dataframe thead th {\n",
+" text-align: right;\n",
+" }\n",
+"</style>\n",
+"<table border=\"1\" class=\"dataframe\">\n",
+" <thead>\n",
+" <tr style=\"text-align: right;\">\n",
+" <th></th>\n",
+" <th>function</th>\n",
+" <th>input_args</th>\n",
+" <th>input_kwargs</th>\n",
+" <th>output</th>\n",
+" <th>runtime</th>\n",
+" <th>filename</th>\n",
+" </tr>\n",
+" </thead>\n",
+" <tbody>\n",
+" <tr>\n",
+" <th>0</th>\n",
+" <td>&lt;built-in function sum&gt;</td>\n",
+" <td>([1, 1],)</td>\n",
+" <td>{}</td>\n",
+" <td>2</td>\n",
+" <td>0.001686</td>\n",
+" <td>sum0d968285d17368d1c34ea7392309bcc5.h5out</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>1</th>\n",
+" <td>&lt;built-in function sum&gt;</td>\n",
+" <td>([3, 3],)</td>\n",
+" <td>{}</td>\n",
+" <td>6</td>\n",
+" <td>0.136151</td>\n",
+" <td>sum0102e33bb2921ae07a3bbe3db5d3dec9.h5out</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>2</th>\n",
+" <td>&lt;built-in function sum&gt;</td>\n",
+" <td>([2, 2],)</td>\n",
+" <td>{}</td>\n",
+" <td>4</td>\n",
+" <td>0.136006</td>\n",
+" <td>sum6270955d7c8022a0c1027aafaee64439.h5out</td>\n",
+" </tr>\n",
+" </tbody>\n",
+"</table>\n",
+"</div>"
+],
+"text/plain": [
+" function input_args input_kwargs output runtime \\\n",
+"0 <built-in function sum> ([1, 1],) {} 2 0.001686 \n",
+"1 <built-in function sum> ([3, 3],) {} 6 0.136151 \n",
+"2 <built-in function sum> ([2, 2],) {} 4 0.136006 \n",
+"\n",
+" filename \n",
+"0 sum0d968285d17368d1c34ea7392309bcc5.h5out \n",
+"1 sum0102e33bb2921ae07a3bbe3db5d3dec9.h5out \n",
+"2 sum6270955d7c8022a0c1027aafaee64439.h5out "
+]
+},
+"execution_count": 19,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"source": [
+"import pandas\n",
+"from executorlib import get_cache_data\n",
+"\n",
+"df = pandas.DataFrame(get_cache_data(cache_directory=\"./cache\"))\n",
+"df"
+]
+},
 {
 "cell_type": "markdown",
 "id": "68092479-e846-494a-9ac9-d9638b102bd8",
@@ -593,15 +678,15 @@
 },
 {
 "cell_type": "code",
-"execution_count": 19,
+"execution_count": 20,
 "id": "34a9316d-577f-4a63-af14-736fb4e6b219",
 "metadata": {},
 "outputs": [
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"['sumb6a5053f96b7031239c2e8d0e7563ce4.h5out', 'sum5171356dfe527405c606081cfbd2dffe.h5out', 'sumd1bf4ee658f1ac42924a2e4690e797f4.h5out']\n"
+"['sum0d968285d17368d1c34ea7392309bcc5.h5out', 'sum0102e33bb2921ae07a3bbe3db5d3dec9.h5out', 'sum6270955d7c8022a0c1027aafaee64439.h5out']\n"
 ]
 }
 ],
@@ -637,7 +722,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 20,
+"execution_count": 21,
 "id": "d8b75a26-479d-405e-8895-a8d56b3f0f4b",
 "metadata": {},
 "outputs": [],
@@ -658,7 +743,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 21,
+"execution_count": 22,
 "id": "35fd5747-c57d-4926-8d83-d5c55a130ad6",
 "metadata": {},
 "outputs": [
@@ -692,7 +777,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 22,
+"execution_count": 23,
 "id": "f67470b5-af1d-4add-9de8-7f259ca67324",
 "metadata": {},
 "outputs": [

tests/test_singlenodeexecutor_cache.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 from executorlib.standalone.serialize import cloudpickle_register
 
 try:
-    from executorlib.standalone.hdf import get_cache_data
+    from executorlib import get_cache_data
 
     skip_h5py_test = False
 except ImportError:
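
Note on the hunk above: it ends at `except ImportError:`, so the rest of the test module is not visible here. As a rough sketch of how such a flag is commonly consumed, the example below assumes the `except` branch sets `skip_h5py_test = True` and that the tests are guarded with `unittest.skipIf`. The class name `CacheSketchTest`, its test body, and the direct `import h5py` probe are illustrative assumptions, not code from this commit.

```python
import unittest

try:
    # Probe for the optional dependency (assumed here to be h5py).
    import h5py  # noqa: F401

    skip_h5py_test = False
except ImportError:
    # Assumption: the flag flips to True when the import fails.
    skip_h5py_test = True


@unittest.skipIf(skip_h5py_test, "h5py is not installed, so the cache tests are skipped.")
class CacheSketchTest(unittest.TestCase):
    """Hypothetical test case that is only meaningful with h5py present."""

    def test_sum(self):
        # Placeholder assertion standing in for a real cache test.
        self.assertEqual(sum([1, 1]), 2)
```

With this layout the suite passes whether or not the optional dependency is installed; without it, the test is reported as skipped rather than failed.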
