|
26 | 26 | "id": "b1907f12-7378-423b-9b83-1b65fc0a20f5", |
27 | 27 | "metadata": {}, |
28 | 28 | "outputs": [], |
29 | | - "source": "from executorlib import SingleNodeExecutor" |
| 29 | + "source": [ |
| 30 | + "from executorlib import SingleNodeExecutor" |
| 31 | + ] |
30 | 32 | }, |
31 | 33 | { |
32 | 34 | "cell_type": "markdown", |
33 | 35 | "id": "1654679f-38b3-4699-9bfe-b48cbde0b2db", |
34 | 36 | "metadata": {}, |
35 | | - "source": "It is recommended to use the `SingleNodeExecutor` class in combination with a `with`-statement. This guarantees that the processes created by the `SingleNodeExecutor` class to evaluate the Python functions are closed afterwards and do not remain as ghost processes. A function is then submitted using the `submit(fn, /, *args, **kwargs)` method, which executes a given function `fn` as `fn(*args, **kwargs)`. The `submit()` method returns a [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object, as defined by the Python standard library. As a first example, we submit the function `sum()` to calculate the sum of the list `[1, 1]`:" |
| 37 | + "source": [ |
| 38 | + "It is recommended to use the `SingleNodeExecutor` class in combination with a `with`-statement. This guarantees that the processes created by the `SingleNodeExecutor` class to evaluate the Python functions are closed afterwards and do not remain as ghost processes. A function is then submitted using the `submit(fn, /, *args, **kwargs)` method, which executes a given function `fn` as `fn(*args, **kwargs)`. The `submit()` method returns a [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object, as defined by the Python standard library. As a first example, we submit the function `sum()` to calculate the sum of the list `[1, 1]`:" |
| 39 | + ] |
36 | 40 | }, |
37 | 41 | { |
38 | 42 | "cell_type": "code", |
|
45 | 49 | "output_type": "stream", |
46 | 50 | "text": [ |
47 | 51 | "2\n", |
48 | | - "CPU times: user 100 ms, sys: 70.7 ms, total: 171 ms\n", |
49 | | - "Wall time: 1.94 s\n" |
| 52 | + "CPU times: user 84.4 ms, sys: 59.3 ms, total: 144 ms\n", |
| 53 | + "Wall time: 482 ms\n" |
50 | 54 | ] |
51 | 55 | } |
52 | 56 | ], |
|
61 | 65 | "cell_type": "markdown", |
62 | 66 | "id": "a1109584-9db2-4f9d-b3ed-494d96241396", |
63 | 67 | "metadata": {}, |
64 | | - "source": "As expected, the result of the summation `sum([1, 1])` is `2`. The same result is retrieved from the [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object returned by the submission of `sum()`, as printed by `print(future.result())`. For most Python functions, and especially for the `sum()` function, it is not computationally efficient to initialize the `SingleNodeExecutor` class for the execution of a single function call. It is more efficient to initialize the `SingleNodeExecutor` class once and then submit a number of functions, which can be achieved with a loop. For example, the sums of the pairs `[2, 2]`, `[3, 3]` and `[4, 4]` can be calculated with a for-loop inside the context of the `SingleNodeExecutor()` class as provided by the `with`-statement." |
| 68 | + "source": [ |
| 69 | + "As expected, the result of the summation `sum([1, 1])` is `2`. The same result is retrieved from the [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object returned by the submission of `sum()`, as printed by `print(future.result())`. For most Python functions, and especially for the `sum()` function, it is not computationally efficient to initialize the `SingleNodeExecutor` class for the execution of a single function call. It is more efficient to initialize the `SingleNodeExecutor` class once and then submit a number of functions, which can be achieved with a loop. For example, the sums of the pairs `[2, 2]`, `[3, 3]` and `[4, 4]` can be calculated with a for-loop inside the context of the `SingleNodeExecutor()` class as provided by the `with`-statement." |
| 70 | + ] |
65 | 71 | }, |
66 | 72 | { |
67 | 73 | "cell_type": "code", |
|
74 | 80 | "output_type": "stream", |
75 | 81 | "text": [ |
76 | 82 | "[4, 6, 8]\n", |
77 | | - "CPU times: user 49.4 ms, sys: 29.2 ms, total: 78.7 ms\n", |
78 | | - "Wall time: 1.75 s\n" |
| 83 | + "CPU times: user 39.7 ms, sys: 26.8 ms, total: 66.5 ms\n", |
| 84 | + "Wall time: 524 ms\n" |
79 | 85 | ] |
80 | 86 | } |
81 | 87 | ], |
|
105 | 111 | "output_type": "stream", |
106 | 112 | "text": [ |
107 | 113 | "[10, 12, 14]\n", |
108 | | - "CPU times: user 40.5 ms, sys: 28.1 ms, total: 68.6 ms\n", |
109 | | - "Wall time: 1.09 s\n" |
| 114 | + "CPU times: user 28 ms, sys: 23.1 ms, total: 51.1 ms\n", |
| 115 | + "Wall time: 517 ms\n" |
110 | 116 | ] |
111 | 117 | } |
112 | 118 | ], |
|
121 | 127 | "cell_type": "markdown", |
122 | 128 | "id": "ac86bf47-4eb6-4d7c-acae-760b880803a8", |
123 | 129 | "metadata": {}, |
124 | | - "source": "These three examples cover the general functionality of the `SingleNodeExecutor` class, which follows the [Executor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor) interface as it is defined in the Python standard library." |
| 130 | + "source": [ |
| 131 | + "These three examples cover the general functionality of the `SingleNodeExecutor` class, which follows the [Executor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor) interface as it is defined in the Python standard library." |
| 132 | + ] |
125 | 133 | }, |
126 | 134 | { |
127 | 135 | "cell_type": "markdown", |
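Review note: the submit-and-collect pattern that the markdown cells in this hunk describe can be sketched with the standard library alone. This is a minimal sketch, assuming `concurrent.futures.ThreadPoolExecutor` as a stand-in for `SingleNodeExecutor`; it only illustrates the shared Executor interface, not executorlib itself. The use of `map()` for the third result is also an assumption, since the corresponding notebook cell is not part of this diff.

```python
from concurrent.futures import ThreadPoolExecutor

# ThreadPoolExecutor is a standard-library stand-in; it is not executorlib.
with ThreadPoolExecutor(max_workers=2) as exe:
    # A single submission: submit(fn, *args, **kwargs) returns a Future.
    future = exe.submit(sum, [1, 1])
    single_result = future.result()

    # Several submissions reuse the same executor instead of creating one per call.
    future_lst = [exe.submit(sum, [i, i]) for i in range(2, 5)]
    loop_results = [f.result() for f in future_lst]

    # map() is also part of the Executor interface and preserves input order.
    map_results = list(exe.map(sum, [[5, 5], [6, 6], [7, 7]]))

print(single_result)  # 2
print(loop_results)   # [4, 6, 8]
print(map_results)    # [10, 12, 14]
```

Leaving the `with` block shuts the executor down, which is the cleanup guarantee the first markdown cell refers to.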
|
349 | 357 | "output_type": "stream", |
350 | 358 | "text": [ |
351 | 359 | "2\n", |
352 | | - "CPU times: user 37.1 ms, sys: 21.8 ms, total: 58.9 ms\n", |
353 | | - "Wall time: 1.09 s\n" |
| 360 | + "CPU times: user 31.1 ms, sys: 19.1 ms, total: 50.1 ms\n", |
| 361 | + "Wall time: 394 ms\n" |
354 | 362 | ] |
355 | 363 | } |
356 | 364 | ], |
|
388 | 396 | "cell_type": "markdown", |
389 | 397 | "id": "9e1212c4-e3fb-4e21-be43-0a4f0a08b856", |
390 | 398 | "metadata": {}, |
391 | | - "source": "The resource dictionary parameter can still be set during the initialisation of the `SingleNodeExecutor` class. Internally, this groups the created Python processes into fixed allocations and afterwards submits Python functions to these allocations." |
| 399 | + "source": [ |
| 400 | + "The resource dictionary parameter can still be set during the initialisation of the `SingleNodeExecutor` class. Internally, this groups the created Python processes into fixed allocations and afterwards submits Python functions to these allocations." |
| 401 | + ] |
392 | 402 | }, |
393 | 403 | { |
394 | 404 | "cell_type": "code", |
|
413 | 423 | "experience performance degradation.\n", |
414 | 424 | "\n", |
415 | 425 | " Local host: MacBook-Pro.local\n", |
416 | | - " System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22031/1/vader_segment.MacBook-Pro.501.17620001.1\n", |
417 | | - " Error: No such file or directory (errno 2)\n", |
418 | | - "--------------------------------------------------------------------------\n", |
419 | | - "--------------------------------------------------------------------------\n", |
420 | | - "A system call failed during shared memory initialization that should\n", |
421 | | - "not have. It is likely that your MPI job will now either abort or\n", |
422 | | - "experience performance degradation.\n", |
423 | | - "\n", |
424 | | - " Local host: MacBook-Pro.local\n", |
425 | | - " System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22028/1/vader_segment.MacBook-Pro.501.17610001.1\n", |
426 | | - " Error: No such file or directory (errno 2)\n", |
427 | | - "--------------------------------------------------------------------------\n", |
428 | | - "--------------------------------------------------------------------------\n", |
429 | | - "A system call failed during shared memory initialization that should\n", |
430 | | - "not have. It is likely that your MPI job will now either abort or\n", |
431 | | - "experience performance degradation.\n", |
432 | | - "\n", |
433 | | - " Local host: MacBook-Pro.local\n", |
434 | | - " System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22030/1/vader_segment.MacBook-Pro.501.17630001.1\n", |
435 | | - " Error: No such file or directory (errno 2)\n", |
436 | | - "--------------------------------------------------------------------------\n", |
437 | | - "--------------------------------------------------------------------------\n", |
438 | | - "A system call failed during shared memory initialization that should\n", |
439 | | - "not have. It is likely that your MPI job will now either abort or\n", |
440 | | - "experience performance degradation.\n", |
441 | | - "\n", |
442 | | - " Local host: MacBook-Pro.local\n", |
443 | | - " System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22029/1/vader_segment.MacBook-Pro.501.17600001.1\n", |
| 426 | + " System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.55070/1/vader_segment.MacBook-Pro.501.96730001.1\n", |
444 | 427 | " Error: No such file or directory (errno 2)\n", |
445 | 428 | "--------------------------------------------------------------------------\n" |
446 | 429 | ] |
|
486 | 469 | "cell_type": "markdown", |
487 | 470 | "id": "d07cf107-3627-4cb0-906c-647497d6e0d2", |
488 | 471 | "metadata": {}, |
489 | | - "source": "The function `calc_with_preload()` requires three inputs: `i`, `j` and `k`. But when the function is submitted to the executor, only two inputs are provided: `fs = exe.submit(calc, 2, j=5)`. In this case, the first input parameter is mapped to `i=2` and the second input parameter is specified explicitly as `j=5`, but the third input parameter `k` is not provided. So the `SingleNodeExecutor` automatically checks the keys set in the `init_function()` function. In this case, the returned dictionary `{\"j\": 4, \"k\": 3, \"l\": 2}` defines `j=4`, `k=3` and `l=2`. For this specific call of the `calc_with_preload()` function, `i` and `j` are already provided, so the preloaded `j=4` is not used, but `k=3` is taken from the `init_function()`. As the `calc_with_preload()` function does not define the `l` parameter, this one is also ignored." |
| 472 | + "source": [ |
| 473 | + "The function `calc_with_preload()` requires three inputs: `i`, `j` and `k`. But when the function is submitted to the executor, only two inputs are provided: `fs = exe.submit(calc, 2, j=5)`. In this case, the first input parameter is mapped to `i=2` and the second input parameter is specified explicitly as `j=5`, but the third input parameter `k` is not provided. So the `SingleNodeExecutor` automatically checks the keys set in the `init_function()` function. In this case, the returned dictionary `{\"j\": 4, \"k\": 3, \"l\": 2}` defines `j=4`, `k=3` and `l=2`. For this specific call of the `calc_with_preload()` function, `i` and `j` are already provided, so the preloaded `j=4` is not used, but `k=3` is taken from the `init_function()`. As the `calc_with_preload()` function does not define the `l` parameter, this one is also ignored." |
| 474 | + ] |
490 | 475 | }, |
491 | 476 | { |
492 | 477 | "cell_type": "code", |
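Review note: the argument resolution described in the markdown cell of this hunk can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not executorlib's implementation: `call_with_preload()` is a hypothetical helper, and the body of `calc_with_preload()` (`i + j + k`) is assumed because the notebook cell defining it is not part of this diff.

```python
import inspect


def init_function():
    # Values preloaded once, matching the dictionary quoted in the markdown cell.
    return {"j": 4, "k": 3, "l": 2}


def calc_with_preload(i, j, k):
    # Assumed body; the original definition is not shown in this diff.
    return i + j + k


def call_with_preload(fn, preloaded, *args, **kwargs):
    """Hypothetical helper: fill parameters the caller left out from `preloaded`."""
    signature = inspect.signature(fn)
    bound = signature.bind_partial(*args, **kwargs)
    for name in signature.parameters:
        # Explicit arguments win; keys such as "l" that fn does not define are never used.
        if name not in bound.arguments and name in preloaded:
            bound.arguments[name] = preloaded[name]
    return fn(*bound.args, **bound.kwargs)


result = call_with_preload(calc_with_preload, init_function(), 2, j=5)
print(result)  # 10, from i=2 (positional), j=5 (explicit), k=3 (preloaded)
```

The explicit `j=5` takes precedence over the preloaded `j=4`, and `l=2` is dropped because `calc_with_preload()` has no `l` parameter.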
|
538 | 523 | "output_type": "stream", |
539 | 524 | "text": [ |
540 | 525 | "[2, 4, 6]\n", |
541 | | - "CPU times: user 547 ms, sys: 161 ms, total: 708 ms\n", |
542 | | - "Wall time: 1.33 s\n" |
| 526 | + "CPU times: user 512 ms, sys: 138 ms, total: 650 ms\n", |
| 527 | + "Wall time: 865 ms\n" |
543 | 528 | ] |
544 | 529 | } |
545 | 530 | ], |
|
571 | 556 | "output_type": "stream", |
572 | 557 | "text": [ |
573 | 558 | "[2, 4, 6]\n", |
574 | | - "CPU times: user 52.1 ms, sys: 41.1 ms, total: 93.2 ms\n", |
575 | | - "Wall time: 1.13 s\n" |
| 559 | + "CPU times: user 56.7 ms, sys: 32.5 ms, total: 89.2 ms\n", |
| 560 | + "Wall time: 620 ms\n" |
576 | 561 | ] |
577 | 562 | } |
578 | 563 | ], |
|
583 | 568 | " print([f.result() for f in future_lst])" |
584 | 569 | ] |
585 | 570 | }, |
| 571 | + { |
| 572 | + "cell_type": "markdown", |
| 573 | + "id": "5144a035-633e-4e60-a362-f3b15b28848b", |
| 574 | + "metadata": {}, |
| 575 | + "source": [ |
| 576 | + "An additional advantage of the cache is the option to gather the results of previously submitted functions. Using the `get_cache_data()` function, the result of each Python function is converted to a dictionary. The resulting list of dictionaries can be converted to a `pandas.DataFrame` for further processing:" |
| 577 | + ] |
| 578 | + }, |
| 579 | + { |
| 580 | + "cell_type": "code", |
| 581 | + "execution_count": 19, |
| 582 | + "id": "f574b9e1-de55-4e38-aef7-a4bed540e040", |
| 583 | + "metadata": {}, |
| 584 | + "outputs": [ |
| 585 | + { |
| 586 | + "data": { |
| 587 | + "text/html": [ |
| 588 | + "<div>\n", |
| 589 | + "<style scoped>\n", |
| 590 | + " .dataframe tbody tr th:only-of-type {\n", |
| 591 | + " vertical-align: middle;\n", |
| 592 | + " }\n", |
| 593 | + "\n", |
| 594 | + " .dataframe tbody tr th {\n", |
| 595 | + " vertical-align: top;\n", |
| 596 | + " }\n", |
| 597 | + "\n", |
| 598 | + " .dataframe thead th {\n", |
| 599 | + " text-align: right;\n", |
| 600 | + " }\n", |
| 601 | + "</style>\n", |
| 602 | + "<table border=\"1\" class=\"dataframe\">\n", |
| 603 | + " <thead>\n", |
| 604 | + " <tr style=\"text-align: right;\">\n", |
| 605 | + " <th></th>\n", |
| 606 | + " <th>function</th>\n", |
| 607 | + " <th>input_args</th>\n", |
| 608 | + " <th>input_kwargs</th>\n", |
| 609 | + " <th>output</th>\n", |
| 610 | + " <th>runtime</th>\n", |
| 611 | + " <th>filename</th>\n", |
| 612 | + " </tr>\n", |
| 613 | + " </thead>\n", |
| 614 | + " <tbody>\n", |
| 615 | + " <tr>\n", |
| 616 | + " <th>0</th>\n", |
| 617 | + " <td><built-in function sum></td>\n", |
| 618 | + " <td>([1, 1],)</td>\n", |
| 619 | + " <td>{}</td>\n", |
| 620 | + " <td>2</td>\n", |
| 621 | + " <td>0.001686</td>\n", |
| 622 | + " <td>sum0d968285d17368d1c34ea7392309bcc5.h5out</td>\n", |
| 623 | + " </tr>\n", |
| 624 | + " <tr>\n", |
| 625 | + " <th>1</th>\n", |
| 626 | + " <td><built-in function sum></td>\n", |
| 627 | + " <td>([3, 3],)</td>\n", |
| 628 | + " <td>{}</td>\n", |
| 629 | + " <td>6</td>\n", |
| 630 | + " <td>0.136151</td>\n", |
| 631 | + " <td>sum0102e33bb2921ae07a3bbe3db5d3dec9.h5out</td>\n", |
| 632 | + " </tr>\n", |
| 633 | + " <tr>\n", |
| 634 | + " <th>2</th>\n", |
| 635 | + " <td><built-in function sum></td>\n", |
| 636 | + " <td>([2, 2],)</td>\n", |
| 637 | + " <td>{}</td>\n", |
| 638 | + " <td>4</td>\n", |
| 639 | + " <td>0.136006</td>\n", |
| 640 | + " <td>sum6270955d7c8022a0c1027aafaee64439.h5out</td>\n", |
| 641 | + " </tr>\n", |
| 642 | + " </tbody>\n", |
| 643 | + "</table>\n", |
| 644 | + "</div>" |
| 645 | + ], |
| 646 | + "text/plain": [ |
| 647 | + " function input_args input_kwargs output runtime \\\n", |
| 648 | + "0 <built-in function sum> ([1, 1],) {} 2 0.001686 \n", |
| 649 | + "1 <built-in function sum> ([3, 3],) {} 6 0.136151 \n", |
| 650 | + "2 <built-in function sum> ([2, 2],) {} 4 0.136006 \n", |
| 651 | + "\n", |
| 652 | + " filename \n", |
| 653 | + "0 sum0d968285d17368d1c34ea7392309bcc5.h5out \n", |
| 654 | + "1 sum0102e33bb2921ae07a3bbe3db5d3dec9.h5out \n", |
| 655 | + "2 sum6270955d7c8022a0c1027aafaee64439.h5out " |
| 656 | + ] |
| 657 | + }, |
| 658 | + "execution_count": 19, |
| 659 | + "metadata": {}, |
| 660 | + "output_type": "execute_result" |
| 661 | + } |
| 662 | + ], |
| 663 | + "source": [ |
| 664 | + "import pandas\n", |
| 665 | + "from executorlib import get_cache_data\n", |
| 666 | + "\n", |
| 667 | + "df = pandas.DataFrame(get_cache_data(cache_directory=\"./cache\"))\n", |
| 668 | + "df" |
| 669 | + ] |
| 670 | + }, |
586 | 671 | { |
587 | 672 | "cell_type": "markdown", |
588 | 673 | "id": "68092479-e846-494a-9ac9-d9638b102bd8", |
|
593 | 678 | }, |
594 | 679 | { |
595 | 680 | "cell_type": "code", |
596 | | - "execution_count": 19, |
| 681 | + "execution_count": 20, |
597 | 682 | "id": "34a9316d-577f-4a63-af14-736fb4e6b219", |
598 | 683 | "metadata": {}, |
599 | 684 | "outputs": [ |
600 | 685 | { |
601 | 686 | "name": "stdout", |
602 | 687 | "output_type": "stream", |
603 | 688 | "text": [ |
604 | | - "['sumb6a5053f96b7031239c2e8d0e7563ce4.h5out', 'sum5171356dfe527405c606081cfbd2dffe.h5out', 'sumd1bf4ee658f1ac42924a2e4690e797f4.h5out']\n" |
| 689 | + "['sum0d968285d17368d1c34ea7392309bcc5.h5out', 'sum0102e33bb2921ae07a3bbe3db5d3dec9.h5out', 'sum6270955d7c8022a0c1027aafaee64439.h5out']\n" |
605 | 690 | ] |
606 | 691 | } |
607 | 692 | ], |
|
637 | 722 | }, |
638 | 723 | { |
639 | 724 | "cell_type": "code", |
640 | | - "execution_count": 20, |
| 725 | + "execution_count": 21, |
641 | 726 | "id": "d8b75a26-479d-405e-8895-a8d56b3f0f4b", |
642 | 727 | "metadata": {}, |
643 | 728 | "outputs": [], |
|
658 | 743 | }, |
659 | 744 | { |
660 | 745 | "cell_type": "code", |
661 | | - "execution_count": 21, |
| 746 | + "execution_count": 22, |
662 | 747 | "id": "35fd5747-c57d-4926-8d83-d5c55a130ad6", |
663 | 748 | "metadata": {}, |
664 | 749 | "outputs": [ |
|
692 | 777 | }, |
693 | 778 | { |
694 | 779 | "cell_type": "code", |
695 | | - "execution_count": 22, |
| 780 | + "execution_count": 23, |
696 | 781 | "id": "f67470b5-af1d-4add-9de8-7f259ca67324", |
697 | 782 | "metadata": {}, |
698 | 783 | "outputs": [ |
|